Spark Issue: Duplicated Partitions

Aug 21, 2019

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Spark sometimes seems to fail to overwrite existing files even when the mode of DataFrame.write is set to "overwrite".
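
One rough way to check whether an "overwrite" left stale files behind is to look at the part-file names in a single output directory. The sketch below assumes the file naming that recent Spark versions use by default for data files (`part-<task>-<uuid>-c<counter>.<ext>`, where files written by the same job share the UUID). The helper names and the example file names are hypothetical, and the heuristic does not apply to custom committers, other naming schemes, or directories that were intentionally appended to.

```python
import re
from collections import defaultdict

# Assumed file name layout: part-00000-<uuid>-c000.snappy.parquet
PART_FILE = re.compile(
    r"^part-\d+-"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def group_by_write_job(filenames):
    """Group part files by the UUID embedded in their names."""
    groups = defaultdict(list)
    for name in filenames:
        match = PART_FILE.match(name)
        if match:  # skips _SUCCESS, hidden .crc files, etc.
            groups[match.group(1)].append(name)
    return dict(groups)


def looks_duplicated(filenames):
    """True if part files from more than one write job are present."""
    return len(group_by_write_job(filenames)) > 1


# Hypothetical directory listing: the last file carries a different UUID.
files = [
    "_SUCCESS",
    "part-00000-1b4e28ba-2fa1-11d2-883f-0016d3cca427-c000.snappy.parquet",
    "part-00001-1b4e28ba-2fa1-11d2-883f-0016d3cca427-c000.snappy.parquet",
    "part-00000-6fa459ea-ee8a-3ca4-894e-db77e160355e-c000.snappy.parquet",
]
print(looks_duplicated(files))
```

If more than one UUID shows up after a write that was supposed to overwrite, the older files were probably not removed.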

Spark Issue: Too Many Containers Asked

May 21, 2019

Error Message

org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Too many containers asked, 16731530.

Possible Causes

"Too many containers asked" is a protection mechanism of the YARN ResourceManager. It might be triggered …

Spark Issue: Total Size of Serialized Results Is Bigger than spark.driver.maxResultSize

Feb 21, 2019

Issue

Total size of serialized results is bigger than spark.driver.maxResultSize

Solutions

Eliminate unnecessary broadcast or collect.
If one of the tables being joined has too many partitions …
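
When a large collect or broadcast really cannot be avoided, a commonly used fallback (not one of the solutions listed above) is to raise `spark.driver.maxResultSize` at submit time. The sketch below only renders the `--conf` arguments for `spark-submit`. The helper name `spark_submit_conf_args`, the values `4g` and `8g`, and the script name `my_job.py` are assumptions for illustration, not recommendations.

```python
def spark_submit_conf_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# Hypothetical values: the driver needs enough memory to hold the results.
conf = {
    "spark.driver.maxResultSize": "4g",
    "spark.driver.memory": "8g",
}
command = ["spark-submit", *spark_submit_conf_args(conf), "my_job.py"]
print(" ".join(command))
```

Raising the limit only moves the threshold. Removing the unnecessary collect or broadcast is still the more robust fix.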

Spark Issue: Unable to Find Encoder Type

May 21, 2019

Issue

Unable to find encoder for type stored in a Dataset

Solution …

Access Control in Spark SQL

Jul 22, 2020

Grant Permission to Users

GRANT
    priv_type [, priv_type ] ...
    ON database_table_or_view_name
    TO principal_specification [, principal_specification] ...
    [WITH GRANT OPTION];

Examples:

GRANT SELECT ON table1 TO USER user1;
GRANT SELECT ON DATABASE db1 TO USER user1 …
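
The GRANT syntax above can be assembled programmatically when permissions have to be granted to many users. This is a minimal sketch: `build_grant` is a hypothetical helper, it does no quoting or validation of identifiers, and whether the resulting statement is accepted depends on the authorization setup of the deployment.

```python
def build_grant(privileges, target, principals, with_grant_option=False):
    """Assemble a GRANT statement following the syntax above.

    privileges: list of privilege names, e.g. ["SELECT"]
    target:     e.g. "table1" or "DATABASE db1"
    principals: list of (kind, name) pairs, e.g. [("USER", "user1")]
    """
    if not privileges or not principals:
        raise ValueError("need at least one privilege and one principal")
    sql = "GRANT {} ON {} TO {}".format(
        ", ".join(privileges),
        target,
        ", ".join(f"{kind} {name}" for kind, name in principals),
    )
    if with_grant_option:
        sql += " WITH GRANT OPTION"
    return sql + ";"


print(build_grant(["SELECT"], "table1", [("USER", "user1")]))
```

Because the helper does plain string formatting, it should only be fed trusted names.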

Koalas is pandas API on PySpark

Dec 11, 2020

References

https://github.com/databricks/koalas

https://databricks.com/blog/2020/08/11/interoperability-between-koalas-and-apache-spark.html

https://notebooks.gesis.org/binder/jupyter/user/databricks-koalas-mxw72n1l/notebooks/docs/source/getting_started/10min.ipynb

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions

Ben Chuanlong Du's Blog

It is never too late to learn.
