Ben Chuanlong Du's Blog

Sort DataFrame in Spark

Comments

  1. After sorting, the rows of a DataFrame are ordered across partitions by partition ID, and the rows within each partition are sorted as well. This property can be leveraged to implement a global ranking of rows. For more details, please refer to Computing global rank of a row in a DataFrame with Spark SQL. However, notice that multi-layer ranking is often more efficient than global ranking in big data applications.
In [2]:
import findspark

findspark.init("/opt/spark")

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import *
from pyspark.sql.types import StructType

spark = (
    SparkSession.builder.appName("PySpark_Sorting").enableHiveSupport().getOrCreate()
)
In [3]:
import pandas as pd
In [12]:
df_p = pd.DataFrame(
    [
        ("Ben", "Du", 1),
        ("Ben", "Du", 2),
        ("Ken", "Xu", 1),
        ("Ken", "Xu", 9),
        ("Ben", "Tu", 3),
        ("Ben", "Tu", 4),
    ],
    columns=["first_name", "last_name", "id"],
)
df_p
Out[12]:
  first_name last_name  id
0        Ben        Du   1
1        Ben        Du   2
2        Ken        Xu   1
3        Ken        Xu   9
4        Ben        Tu   3
5        Ben        Tu   4
In [13]:
df = spark.createDataFrame(df_p)
df.show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ben|       Du|  1|
|       Ben|       Du|  2|
|       Ken|       Xu|  1|
|       Ken|       Xu|  9|
|       Ben|       Tu|  3|
|       Ben|       Tu|  4|
+----------+---------+---+

In [14]:
df.orderBy(["first_name", "last_name"]).show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ben|       Du|  1|
|       Ben|       Du|  2|
|       Ben|       Tu|  4|
|       Ben|       Tu|  3|
|       Ken|       Xu|  9|
|       Ken|       Xu|  1|
+----------+---------+---+

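Note: The ascending keyword below cannot be omitted! DataFrame.orderBy treats its positional arguments as columns, so the sort directions must be passed by keyword as ascending=[...].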

In [16]:
df.orderBy(["first_name", "last_name"], ascending=[False, False]).show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ken|       Xu|  9|
|       Ken|       Xu|  1|
|       Ben|       Tu|  3|
|       Ben|       Tu|  4|
|       Ben|       Du|  1|
|       Ben|       Du|  2|
+----------+---------+---+
