Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
-
Apache Arrow Flight is the future protocol for querying Databases! It use columnar data and leverages Apache Arrow to avoid unnecessary copy of data, which makes it able to query large data much (about 100x) faster than ODBC and JDBC.
-
Overall speaking, Python has better ODBC support than JDBC support. Most database related packages in Python support or rely on ODBC. Currently, ODBC packages also have richer features than JDBC packages. However, it is not an easy job to install and configure ODBC drivers for non-open source databases (e.g., Teradata), in which situations JDBC is more convenient.
JDBC Packages in Python
-
JDBC + pyarrow.jvm is currently the best way to query SQL databases.
-
jpype.dbapi2 provides an implementation of the Python DBAPI2 . However, it is about 20x slower than the approach of JDBC + pyarrow.jvm.
-
JayDeBeApi is outdated and unmaintained.
-
PyAthenaJDBC is a JDBC package specifically for Amazon Athena.
ODBC Packages in Python
-
TurbODBC is likely the fastest ODBC Python package. pyodbc is another (less efficient) alternative.
-
SQLAlchemy are general purpose packages relying on ODBC.
-
There are lots of database specific packages relying on ODBC. For example, teradata is a Python package for Teradata SQL which relies on ODBC or RESTful.
Database Modules
-
json: JSON parsing.
-
sqlite3
-
PyMySQL, MySQLdb
-
PyMongo
-
teradata
-
pyodbc, pypyodbc: Python ODBC bridget.
-
SQLAlchemy
-
sqlalchemy-teradata
Misc
http://docs.python-guide.org/en/latest/scenarios/db/
the teradata package is weird, the file option can be used to run multiple statements in file ... but if I manually pass a string, it doesn't work ... check the implementation ... https://support.microsoft.com/en-us/help/3103282/teradata-odbc-configuration-on-linux https://developer.teradata.com/tools/articles/teradata-sqlalchemy-introduction https://github.com/Teradata/sqlalchemy-teradata http://developer.teradata.com/tools/reference/teradata-python-module#Installing http://stackoverflow.com/questions/34948453/read-teradata-query-into-pandas
ORM
- SQLAlchemy
is the most popular ORM package for Python.
peewee
and
orator
are lightweight ORM solutions compared to
SQLAlchemy
.
http://stackoverflow.com/questions/10797794/multiple-queries-executed-in-java-in-single-statement
http://stackoverflow.com/questions/4493614/sqlalchemy-equivalent-of-pyodbc-connect-string-using-freetds
https://developer.teradata.com/tools/articles/teradata-sqlalchemy-introduction
http://stackoverflow.com/questions/12047193/how-to-convert-sql-query-result-to-pandas-data-structure
http://stackoverflow.com/questions/29525808/sqlalchemy-orm-conversion-to-pandas-dataframe/29528804#29528804
JDBC
https://analyticsanvil.wordpress.com/2016/06/08/python-jdbc-dyanmic-hive-scripting/
https://github.com/minatverma/pythonWorks/blob/master/DQM.py
https://www.fullstackpython.com/databases.html
http://docs.python-guide.org/en/latest/scenarios/db/
References
https://www.fullstackpython.com/object-relational-mappers-orms.html
https://docs.python-guide.org/scenarios/db/
https://www.python.org/dev/peps/pep-0249/