Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
-
Hive is case-insensitive, both keywords and functions
-
You can use both double and single quotes for strings
-
use
=
rather than==
for equality comparison but it seems that==
also works -
use
%
rather than*
for matching multiple characters -
use quit or exit
-
when filtering use the where clause while when joining use the on clause
-
it seems that it's not necessary to use group before running aggregation functions in hive, e.g., when you count the number of records in a table.
-
must separate fields/column names with comma in the select clause or group by clause.
-
it seems to me that Hive runs command from right to left?
-
Random sample from a Hive table. http://www.joefkelley.com/736/
SELECT * FROM my_table WHERE rand() <= 0.0001 DISTRIBUTE BY rand() SORT BY rand() LIMIT 10000 ;
Common Mistakes
- Forget to separate fields with comma in the select or group by clause.
Hive
http://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv
https://hadoopsters.net/2015/09/18/hadoop-tutorial-how-to-export-hive-table-to-csv-file/
https://analyticsanvil.wordpress.com/2016//python-jdbc-dyanmic-hive-scripting/
Hive table, if I need a small part of a big Hive table, does hive load in all data or try to be smart? it seems that it's hard ...
References
http://www.joefkelley.com/736/