Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
## The HDFS Way
You can use the command `hdfs dfs -du /path/to/table`
or `hdfs dfs -count -q -v -h /path/to/table`
to get the size of an HDFS path (or table).
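For example, a minimal sketch (the path is a placeholder; `-s` summarizes the whole directory and `-h` prints human-readable sizes):

:::bash
# summarized, human-readable size of a table's directory
hdfs dfs -du -s -h /path/to/table

# quota and usage report; the CONTENT_SIZE column is the space used
hdfs dfs -count -q -v -h /path/to/table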
However, this only works if the cluster supports HDFS. If a Spark cluster exposes only JDBC/ODBC APIs, this method does not work.
## The SQL Query Way
### Size of One Table
`SHOW TBLPROPERTIES` reports the size of the table and can be used to grab just that value if needed.
:::sql
describe formatted table_name;
show tblproperties table_name;
-- or grab a single property
show tblproperties table_name("rawDataSize");
Yes, the output is in bytes. Also, this only works for non-partitioned tables on which statistics have already been computed.
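If statistics are missing, you can compute them first; a minimal sketch (the table name is a placeholder):

:::sql
-- noscan collects only file-level statistics (numFiles, totalSize)
analyze table table_name compute statistics noscan;
-- a full scan additionally fills in numRows and rawDataSize
analyze table table_name compute statistics;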
### Size of Multiple Tables
`SHOW TABLE EXTENDED` lists metadata, including total file size, for every table in a database matching a pattern; run `ANALYZE TABLE` to (re)compute statistics where they are missing.

:::sql
show table extended in some_db like '*';

-- Syntax: ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];
ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;
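For a partitioned table, per-partition sizes can then be read back with `DESCRIBE FORMATTED`; a minimal sketch (the partition value below is hypothetical):

:::sql
-- per-partition statistics (totalSize, numFiles, etc.) appear under Partition Parameters
describe formatted ops_bc_log partition(day='2021-01-01');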