Environment Variables
export
unset
Tips and Traps
explainshell.com is a great place for learning shell.
Bash-it/bash-it is a great community-driven Bash framework.
It is suggested that you avoid writing complicated Bash scripts. IPython is a much better alternative.
Do NOT use ; to delimit paths passed to a shell command because ; …
Use a Class in the Definition of the Class in Python
Comments
- As long as the class name is not needed at definition time of the class, it is OK to use it.
- You cannot use a class in default values of the __init__ function of the class.
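The two comments above can be illustrated with a minimal sketch. The class names Node and Bad are hypothetical and only serve the illustration. The first class refers to itself inside a method body, which is looked up only when the method runs; the second tries to use itself in a default value, which is evaluated while the class body is still executing.

```python
class Node:
    def __init__(self, value=0):
        self.value = value

    def clone(self):
        # OK: the name Node is resolved when clone() is called,
        # long after the class statement has finished.
        return Node(self.value)


default_failed = False
try:
    class Bad:
        # Not OK: default values are evaluated at definition time,
        # before the name Bad has been bound.
        def __init__(self, other=Bad()):
            self.other = other
except NameError:
    default_failed = True

copy = Node(3).clone()
print(copy.value, default_failed)
```

A common workaround is to use None as the default and create the instance inside __init__.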
Shell in Docker
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Configure the Shell for the RUN Command
https://docs.docker.com/engine/reference/builder/#shell
Configure the Default Shell for Terminals in Docker Containers
Just set the SHELL environment variable in …
Check Whether a Python Object Is Callable
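One way to do this is the built-in callable function. The sketch below is only an illustration; the Greeter class is a hypothetical example of an object made callable through __call__.

```python
class Greeter:
    """Hypothetical class whose instances are callable via __call__."""

    def __call__(self):
        return "hi"


checks = {
    "built-in function": callable(len),
    "lambda": callable(lambda: None),
    "class": callable(Greeter),
    "instance with __call__": callable(Greeter()),
    "int": callable(42),
    "str": callable("len"),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

Note that callable only reports that an object looks callable; the call itself can still fail, for example when the wrong arguments are passed.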
Broadcast Join in Spark
Tips and Traps
BroadcastHashJoin, i.e., map-side join, is fast. Use BroadcastHashJoin if possible. Notice that Spark will automatically use BroadcastHashJoin if a table in an inner join has a size less than the configured BroadcastHashJoin limit (spark.sql.autoBroadcastJoinThreshold).
Notice that BroadcastJoin only works for inner joins. If you have an outer join, BroadcastJoin won't happen even if you explicitly broadcast a DataFrame.
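To see why a map-side join is fast, here is a plain-Python sketch of the idea behind a broadcast hash join. It is not Spark's implementation, and the function name, the partition layout, and the sample rows are all assumptions made for illustration. The small table is turned into a hash table once, and every partition of the large table is joined locally against it, so the large table never has to be shuffled.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of a large table with a small table."""
    # Build phase: hash the small ("broadcast") table by the join key.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: each partition is processed independently,
    # using only its own rows and the shared lookup table.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                yield {**row, **match}


# Hypothetical data: the large table is split into two partitions.
orders = [
    [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}],
    [{"id": 3, "country": "US"}, {"id": 4, "country": "JP"}],
]
countries = [
    {"country": "US", "name": "United States"},
    {"country": "FR", "name": "France"},
]

joined = list(broadcast_hash_join(orders, countries, "country"))
for row in joined:
    print(row)
```

The order with country JP has no match in the small table and is dropped, which is the inner-join behavior.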
Conversion Between PySpark DataFrames and pandas DataFrames
Comments
A PySpark DataFrame can be converted to a pandas DataFrame by calling the method DataFrame.toPandas, and a pandas DataFrame can be converted to a PySpark DataFrame by calling SparkSession.createDataFrame. Notice that when you call DataFrame.toPandas to convert a Spark DataFrame to a pandas DataFrame, the whole Spark DataFrame is collected to the driver machine! This means that you should only call the method DataFrame.toPandas …