Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
spark-submit and spark-shell
Overwriting the PATH environment variable before invoking spark-submit
and/or spark-shell
often resolves the issue (typically, the wrong Spark installation being picked up from PATH).
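As an illustration, here is a minimal Python sketch, assuming Spark lives at /opt/spark-3.1.1-bin-hadoop3.2 (the same path used elsewhere on this page); the spark-submit arguments are illustrative only.

import os
import shutil
import subprocess

# Prepend the desired Spark installation's bin directory to PATH
# so that spark-submit and spark-shell resolve to this installation.
spark_bin = "/opt/spark-3.1.1-bin-hadoop3.2/bin"
os.environ["PATH"] = spark_bin + os.pathsep + os.environ["PATH"]

# Confirm which spark-submit the shell would now pick up.
print(shutil.which("spark-submit"))

# Launch spark-submit with the overwritten PATH.
subprocess.run(["spark-submit", "--version"], check=False)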
Spark in Jupyter/JupyterLab Notebooks
Removing or resetting the environment variable HADOOP_CONF_DIR
resolves the issue:
import os

# Reset HADOOP_CONF_DIR so Spark does not pick up an unwanted Hadoop configuration.
os.environ["HADOOP_CONF_DIR"] = ""

import findspark

# Point findspark at the local Spark installation before importing pyspark.
findspark.init("/opt/spark-3.1.1-bin-hadoop3.2/")

from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.appName("PySpark_Notebook") \
    .enableHiveSupport() \
    .getOrCreate()
...
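A quick sanity check (the query is illustrative) to confirm the session started and Hive support is enabled:

# List databases via the Hive metastore; requires Hive support to be enabled.
spark.sql("SHOW DATABASES").show()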
More Spark-Related Environment Variables
- HADOOP_CONF_DIR: directory containing the Hadoop client configuration files (core-site.xml, hdfs-site.xml, etc.)
- SPARK_HOME: root directory of the Spark installation
- HADOOP_HOME: root directory of the Hadoop installation
- HIVE_HOME: root directory of the Hive installation
- PIG_HOME: root directory of the Pig installation
- HBASE_HOME: root directory of the HBase installation
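If a notebook needs specific values for these variables, a minimal sketch (all paths below are assumptions, not prescribed values) is to set or clear them via os.environ before initializing Spark:

import os

# Hypothetical installation paths; adjust to your environment.
os.environ["SPARK_HOME"] = "/opt/spark-3.1.1-bin-hadoop3.2"
os.environ["HADOOP_HOME"] = "/opt/hadoop"
# Clear HADOOP_CONF_DIR if a stale Hadoop configuration interferes with Spark.
os.environ.pop("HADOOP_CONF_DIR", None)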