Ben Chuanlong Du's Blog

It is never too late to learn.

Get Size of Tables on HDFS

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The HDFS Way

You can use the hdfs dfs -du /path/to/table command or hdfs dfs -count -q -v -h /path/to/table to get the size of an HDFS path (or table). However, this only works if the cluster supports HDFS. If a Spark cluster exposes only JDBC/ODBC APIs, this method does not work.

Access Control in Spark SQL

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Grant Permission to Users

GRANT
    priv_type [, priv_type ] ...
    ON database_table_or_view_name
    TO principal_specification [, principal_specification] ...
    [WITH GRANT OPTION];

Examples:

GRANT SELECT ON table1 TO USER user1;
GRANT SELECT ON DATABASE db1 TO USER user1 …

SQL Translation Tools

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The ORM library SQLAlchemy

can be leverage

SQL Translation Tools

JOOQ Parser

Hands on SQLite3

Tips and Traps

  1. Do NOT use SQLite3 on network filesystems (NFS, SAMBA, etc).

  2. .schema show create statement of a table

  3. You can force query to keep the original order of rows by applying order by rowid.

  4. SQLite3 supports full-text search by the FTS5 extension (since 3.9.0). It is suggested that you use the porter

Use Column Alias in SQL

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. Logically any SELECT is processed in following order:

    1. from
    2. where
    3. group by
    4. having
    5. olap functions
    6. qualify
    7. select
    8. sample
    9. order by

    Besides the proprietary QUALIFY/SAMPLE every DBMS will do it exactly …

Volatile CTE and Subqueries in SQL

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. A volatile table persistents in the duration of the connection that creates it while a CTE is only accessible by the query following it. That is the scope of CTE is …