Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
The HDFS Way
You can use the command `hdfs dfs -du /path/to/table` or `hdfs dfs -count -q -v -h /path/to/table` to get the size of an HDFS path (or table). However, this only works if the cluster supports HDFS; if a Spark cluster exposes only JDBC/ODBC APIs, this method does not work.
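When you do have HDFS access, you can also call the `hdfs` CLI from Python and parse its output. Below is a minimal sketch, assuming the `hdfs` command is on the `PATH`; the path used here is a placeholder.

```python
import subprocess


def hdfs_du(path: str) -> int:
    """Return the total size in bytes of an HDFS path
    by shelling out to ``hdfs dfs -du -s``.
    """
    # -s prints a single summary line whose first field is the size in bytes
    # (later fields, which vary by Hadoop version, include the space
    # consumed with replication and the path itself).
    out = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return int(out.split()[0])


if __name__ == "__main__":
    print(hdfs_du("/path/to/table"))
```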
HDFS in Python
mtth/hdfs
- Supports using a proxy. See this issue for details.
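A minimal sketch using mtth/hdfs (installed as the `hdfs` package on PyPI); the WebHDFS URL and user name below are placeholders. The content summary of a path includes its size.

```python
from hdfs import InsecureClient

# Placeholder WebHDFS endpoint and user name.
client = InsecureClient("http://namenode:50070", user="some_user")

# The content summary of a path includes its logical size
# and the space consumed once replication is taken into account.
summary = client.content("/path/to/table")
print(summary["length"])         # total file size, in bytes
print(summary["spaceConsumed"])  # size including replication
```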
Apache Arrow - HDFS
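A minimal sketch with the `pyarrow.fs` API, assuming `libhdfs` and the usual Hadoop environment variables are configured; the host and path are placeholders. It sums the sizes of all regular files under a path.

```python
from pyarrow import fs

# "default" picks up the namenode from the Hadoop configuration files.
hdfs = fs.HadoopFileSystem("default")

# Recursively list the path and sum the sizes of regular files.
selector = fs.FileSelector("/path/to/table", recursive=True)
total = sum(info.size for info in hdfs.get_file_info(selector) if info.is_file)
print(f"{total} bytes")
```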
References
http://wesmckinney.com/blog/python-hdfs-interfaces/