Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- Logically, any `SELECT` is processed in the following order:
  - `from`
  - `where`
  - `group by`
  - `having`
  - olap (window) functions
  - `qualify`
  - `select`
  - `sample`
  - `order by`

  Aside from the proprietary `qualify`/`sample` clauses, every DBMS processes a query in essentially this order. When you use a column alias in `where`, `group by`, `having`, a window function, or `qualify`, the select list has not been evaluated yet, so referencing the alias should fail. However, there are a few exceptions. Both Teradata SQL and MySQL allow column aliases in `where`, `group by`, `having`, OLAP (window) functions, and `qualify`. For SQL dialects that do not support column aliases in those clauses, you can use a subquery instead, as sketched below.
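
  A minimal sketch of the subquery workaround, assuming a hypothetical table `sales` with columns `price` and `quantity` (not from the original notes): the inner query computes and names the expression, and the outer query can then filter on that alias in any dialect.

  ```sql
  -- Filtering on an alias directly fails in dialects that do not allow
  -- aliases in WHERE (e.g., Oracle SQL or Spark SQL per the notes above),
  -- because `revenue` does not exist yet when WHERE is evaluated:
  --   SELECT price * quantity AS revenue FROM sales WHERE revenue > 100;

  -- Portable workaround: compute the alias in a derived table first.
  SELECT revenue
  FROM (
      SELECT price * quantity AS revenue
      FROM sales
  ) t
  WHERE revenue > 100;
  ```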

- Spark/Hive SQL does not support using column aliases in `where`, `group by`, `having`, or window functions. However, a column alias can be used in the Spark DataFrame API as long as it is referenced in a subsequent method call. For example, the following code does not work in PySpark because the column alias `new_column_alias` is referenced within the same `select` call.

  ```python
  # Fails: the alias `new_column_alias` is not visible to other expressions
  # in the same select() call.
  df.select(
      col("col1").alias("new_column_alias"),
      (col("new_column_alias") + 1).alias("another_column_alias"),
  )
  ```

  However, the following code works because the column alias `new_column_alias` is referenced in a subsequent method call.

  ```python
  # Works: the alias is referenced in a later withColumn() call.
  df.select(
      col("col1").alias("new_column_alias")
  ).withColumn("another_column_alias", col("new_column_alias") + 1)
  ```

  Since Spark SQL and the DataFrame API can be mixed freely in a Spark application, this makes things quite flexible and convenient.

- Even though column aliases are allowed in Teradata and MySQL, you should never reuse an existing column name as an alias, to avoid confusing the optimizer and/or the end user. If you do alias to an existing column name in Teradata, the original column, not the alias, is used in `where`, `group by`, `having`, window functions, and `qualify`, as illustrated below.
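
  A minimal sketch of this pitfall, assuming a hypothetical Teradata table `t` with columns `x` and `y` (not from the original notes): because the alias reuses an existing column name, Teradata resolves `y` in the `where` clause to the original column `t.y`, not to the computed expression.

  ```sql
  -- Hypothetical Teradata table: t(x INTEGER, y INTEGER)
  SELECT x + 1 AS y      -- the alias reuses the existing column name y
  FROM t
  WHERE y > 10;          -- Teradata resolves y to the original column t.y here,
                         -- not to the computed expression x + 1
  ```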

| Allow Column Alias | Disallow Column Alias |
| ------------------ | --------------------- |
| Teradata SQL       | Spark SQL             |
| MySQL              | Oracle SQL            |