Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Conversion Between PySpark DataFrames and pandas DataFrames
Comments
A PySpark DataFrame can be converted to a pandas DataFrame by calling the method DataFrame.toPandas, and a pandas DataFrame can be converted to a PySpark DataFrame by calling SparkSession.createDataFrame. Notice that when you call DataFrame.toPandas to convert a Spark DataFrame to a pandas DataFrame, the whole Spark DataFrame is collected to the driver machine! This means that you should only call the method DataFrame.toPandas