Ben Chuanlong Du's Blog

It is never too late to learn.

Reset the Index of a pandas DataFrame

reset_index

By default, reset_index returns a copy rather than modifying the original DataFrame. You can pass inplace=True to override this behavior and modify the DataFrame in place.
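A minimal sketch of the two behaviors described above. The frame, its column name x, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame with a non-default (string) index.
df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Default: a new DataFrame is returned and the old index becomes a column
# named "index"; df itself is left untouched.
df_copy = df.reset_index()

# inplace=True: the frame is modified directly and the call returns None.
df_inplace = df.copy()
result = df_inplace.reset_index(inplace=True)

print(df_copy)
print(df.index.tolist())
print(result)
```

Because the in-place call returns None, assigning its result back to a variable (for example df = df.reset_index(inplace=True)) would discard the frame.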

Series

  1. If you drop the original index, you still have a Series. However, if you reset the index of a Series without dropping the original index, you get a DataFrame.
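The difference in return type can be checked directly. This is a small sketch; the Series contents and the name value are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical named Series with a string index.
s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="value")

# drop=True discards the old index, so the result is still a Series.
dropped = s.reset_index(drop=True)

# Without drop, the old index is kept as a column, so the result
# has two columns and must be a DataFrame.
kept = s.reset_index()

print(type(dropped).__name__)
print(type(kept).__name__)
print(kept.columns.tolist())
```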

New Features in Spark 3

AQE (Adaptive Query Execution)

To enable AQE, set spark.sql.adaptive.enabled to true, either by passing --conf spark.sql.adaptive.enabled=true to spark-submit or by calling spark.conf.set("spark.sql.adaptive.enabled", "true") in Spark/PySpark code.
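A sketch of setting this option from PySpark, either when the session is built or on a session that already exists. The helper names build_session and enable_aqe and the application name are hypothetical; pyspark is imported lazily so the configuration mapping can be inspected without a Spark installation.

```python
# Configuration key taken from the text above.
AQE_CONF = {"spark.sql.adaptive.enabled": "true"}


def build_session(app_name="aqe-demo"):
    """Create (or reuse) a SparkSession with AQE switched on."""
    from pyspark.sql import SparkSession  # requires pyspark to be installed

    builder = SparkSession.builder.appName(app_name)
    for key, value in AQE_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def enable_aqe(spark):
    """Switch AQE on for an already running session."""
    for key, value in AQE_CONF.items():
        spark.conf.set(key, value)
```

Calling spark.conf.get("spark.sql.adaptive.enabled") on the resulting session is one way to confirm which value is in effect.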

Pandas UDFs

Pandas UDFs are user-defined functions that Spark executes using Arrow to transfer data to and from pandas, which allows vectorized operations on the data. A Pandas UDF is defined using pandas_udf
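A minimal sketch of a scalar Pandas UDF, assuming pyspark and pyarrow are installed. The function plus_one and the helper make_plus_one_udf are hypothetical names. The vectorized logic is kept as a plain pandas function so it can be exercised without a Spark session, and it is wrapped with pandas_udf only on demand.

```python
import pandas as pd


def plus_one(values: pd.Series) -> pd.Series:
    """Vectorized logic: operates on a whole batch of values at once."""
    return values + 1


def make_plus_one_udf():
    """Wrap plus_one as a Pandas UDF that returns a long column."""
    from pyspark.sql.functions import pandas_udf  # requires pyspark + pyarrow

    return pandas_udf(plus_one, returnType="long")


# The pandas function can be checked on its own, without Spark.
print(plus_one(pd.Series([1, 2, 3])).tolist())
```

With a running session, the wrapped function could be applied to a column like any other column expression, for example df.select(make_plus_one_udf()("x")) for a hypothetical long column x.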