Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- Logically, any `SELECT` is processed in the following order (a concrete query is sketched right after this list):

  - from
  - where
  - group by
  - having
  - olap functions
  - qualify
  - select
  - sample
  - order by
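
  A minimal PySpark sketch of that logical order; the `employees` table, its columns, and the data are made up for illustration, the numbered comments refer to the steps above, and the Teradata-specific `qualify`/`sample` steps are omitted:

  ```python
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Tiny made-up data set, registered so it can be queried with Spark SQL.
  spark.createDataFrame(
      [("sales", 100), ("sales", 200), ("hr", 150), ("hr", 250)],
      ["dept", "salary"],
  ).createOrReplaceTempView("employees")

  spark.sql("""
      SELECT   dept,                                                   -- 7. select
               avg(salary)                             AS avg_salary,
               rank() OVER (ORDER BY avg(salary) DESC) AS salary_rank  -- 5. olap functions
      FROM     employees                                               -- 1. from
      WHERE    salary > 0                                              -- 2. where (avg_salary is not visible yet)
      GROUP BY dept                                                    -- 3. group by
      HAVING   count(*) > 1                                            -- 4. having
      ORDER BY avg_salary                                              -- 9. order by (aliases are visible here)
  """).show()
  ```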

  Besides the proprietary `qualify`/`sample`, every DBMS processes a query in essentially the same order. When you use a column alias in `where`, `group by`, `having`, a window function, or `qualify`, the select column list has not yet been built, so referencing the alias should fail. However, there are a few exceptions: both Teradata SQL and MySQL allow using column aliases in `where`, `group by`, `having`, OLAP (window) functions, and `qualify`. For the SQL variants that do not support a column alias in `where`, `group by`, `having`, window functions, or `qualify`, you can use a subquery instead, as sketched below.
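
  For example, here is a minimal sketch of the subquery workaround in Spark SQL, reusing the made-up `employees` view from the sketch above and assuming a `SparkSession` named `spark`: the inner query computes and names `avg_salary`, so the outer query can filter on it as an ordinary column even though the inner query's own `where`/`having` could not see the alias.

  ```python
  spark.sql("""
      SELECT dept, avg_salary
      FROM (
          SELECT   dept,
                   avg(salary) AS avg_salary
          FROM     employees
          GROUP BY dept
      ) t
      WHERE avg_salary > 150
  """).show()
  ```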

- Spark/Hive SQL does not support using column aliases in `where`, `group by`, `having`, or window functions. However, column aliases can be used in the Spark DataFrame API as long as the alias is referenced in a subsequent method invocation. For example, the following code does not work in PySpark because the column alias `new_column_alias` is referenced within the same method invocation.

  ```python
  from pyspark.sql.functions import col

  # `new_column_alias` is defined and referenced inside the same select() call.
  df.select(
      col("col1").alias("new_column_alias"),
      (col("new_column_alias") + 1).alias("another_column_alias"),
  )
  ```

  However, the following code works because the column alias `new_column_alias` is referenced in a subsequent method invocation.

  ```python
  df.select(
      col("col1").alias("new_column_alias")
  ).withColumn("another_column_alias", col("new_column_alias") + 1)
  ```

  Since Spark SQL and the DataFrame API can be mixed in a Spark application, things become very flexible and thus convenient (see the sketch after this item).
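
  A minimal sketch of such mixing, assuming a `SparkSession` named `spark` and a made-up single-column DataFrame: the alias is created through the DataFrame API, materialized as a temp view, and then referenced from Spark SQL as an ordinary column.

  ```python
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import col

  spark = SparkSession.builder.getOrCreate()
  df = spark.createDataFrame([(1,), (2,), (3,)], ["col1"])  # made-up data

  # DataFrame API step: create the alias first ...
  aliased = df.select(col("col1").alias("new_column_alias"))
  aliased.createOrReplaceTempView("aliased")

  # ... then switch to Spark SQL, where the alias is just another column.
  spark.sql("""
      SELECT new_column_alias,
             new_column_alias + 1 AS another_column_alias
      FROM   aliased
  """).show()
  ```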

- Even if column aliases are allowed in Teradata and MySQL, you should never alias an expression to an existing column name, to avoid confusing the optimizer and/or the end user. If you do alias to an existing column name in Teradata, the original column, not the alias, is used in `where`, `group by`, `having`, window functions, and `qualify`; the sketch below shows the shape of such a query.
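
  An illustrative sketch of the gotcha above, kept as a plain string since it targets Teradata rather than Spark; the table and column names are made up:

  ```python
  # Teradata SQL shown as a string purely to illustrate the note above.
  teradata_gotcha = """
      SELECT   salary * 12 AS salary     -- alias reuses the existing column name
      FROM     employees
      WHERE    salary > 100000           -- per the note above, Teradata resolves this
                                         -- against the original salary column, not salary * 12
  """
  ```
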
Allow Column Alias
- Teradata SQL
- MySQL
Disallow Column Alias
- Spark SQL
- Oracle SQL