Ben Chuanlong Du's Blog

It is never too late to learn.

Select All Columns Except a Few from a Table

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Comments

Standard SQL has no direct way to select all columns except a few, although some dialects add their own extension (for example, SELECT * EXCEPT in BigQuery). It is, however, easy to do with DataFrame APIs (pandas, Spark/PySpark, etc.).
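Where only plain SQL is available, one indirect workaround is to read the column names first and build the SELECT list in code. The following is a minimal sketch using Python's built-in sqlite3 module; the helper name select_all_except and the people table are hypothetical, and the table name is interpolated into the query as-is, so it must come from a trusted source.

```python
import sqlite3


def select_all_except(conn, table, excluded):
    """Build a SELECT statement listing every column of `table` except `excluded`."""
    # A zero-row query is enough to populate cursor.description,
    # whose entries start with the column name.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    excluded_set = set(excluded)
    keep = [col[0] for col in cursor.description if col[0] not in excluded_set]
    if not keep:
        raise ValueError("every column was excluded")
    column_list = ", ".join(f'"{name}"' for name in keep)
    return f'SELECT {column_list} FROM "{table}"'


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER, name TEXT, ssn TEXT, age INTEGER)")
    conn.execute("INSERT INTO people VALUES (1, 'Ann', '000', 30)")
    query = select_all_except(conn, "people", ["ssn"])
    print(query)
    print(conn.execute(query).fetchall())
```

With pandas the same result needs no query building: df.drop(columns=["ssn"]) returns a DataFrame without the listed columns.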

Get Size of Tables on HDFS

The HDFS Way

You can use the hdfs dfs -du /path/to/table command or hdfs dfs -count -q -v -h /path/to/table to get the size of an HDFS path (or table). However, this works only if you have direct access to HDFS on the cluster. If a Spark cluster exposes only JDBC/ODBC APIs, this method does not work.
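If the size is needed inside a script, the command can be wrapped and its output parsed. This is a rough sketch, assuming the hdfs client is on the PATH and that hdfs dfs -du -s (without -h) prints the size in bytes as the first whitespace-separated field of its output; the function names are hypothetical.

```python
import subprocess


def parse_du_summary(output: str) -> int:
    """Return the byte count from the first line of `hdfs dfs -du -s` output."""
    # Assumed line layout: "<size> [<disk space consumed>] <path>".
    first_line = output.strip().splitlines()[0]
    return int(first_line.split()[0])


def hdfs_path_size(path: str) -> int:
    """Run `hdfs dfs -du -s <path>` and return the summarized size in bytes."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return parse_du_summary(result.stdout)


if __name__ == "__main__":
    print(hdfs_path_size("/path/to/table"))
```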

Tips on NetworkX

Alternatives to Docker Containers

  1. LXD and Multipass are alternatives to Docker containers. Docker is more lightweight than LXD, which in turn is more lightweight than Multipass (Docker < LXD < Multipass).

  2. Neither Docker nor LXD requires a CPU which …

Tips on Omegaconf

  1. OmegaConf can parse command-line options too. However, unlike argparse, it does not enforce any constraints on command-line options.
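To make the contrast concrete, here is a small sketch of the argparse side only, showing the kind of constraints it enforces: a type, a fixed set of choices, and rejection of undeclared options. The option names (--epochs, --optimizer) and the helper try_parse are made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="train")
    # The value must be convertible to int.
    parser.add_argument("--epochs", type=int, default=1)
    # The value must be one of the listed choices.
    parser.add_argument("--optimizer", choices=["sgd", "adam"], default="sgd")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message to stderr and exits on invalid input.
        return None


if __name__ == "__main__":
    print(try_parse(["--epochs", "3", "--optimizer", "adam"]))  # accepted
    print(try_parse(["--epochs", "three"]))                     # wrong type
    print(try_parse(["--lr", "0.1"]))                           # undeclared option
```

Checks like these have to be written separately when the options are read through OmegaConf instead.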

Tips on Redash

Creating a new query runner (data source)

https://discuss.redash.io/t/creating-a-new-query-runner-data-source-in-redash/347