Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: Unable to Find Encoder Type

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Issue Unable to find encoder for type stored in a Dataset

Solution …

Access Control in Spark SQL

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Grant Permission to Users

GRANT
    priv_type [, priv_type ] ...
    ON database_table_or_view_name
    TO principal_specification [, principal_specification] ...
    [WITH GRANT OPTION];

Examples:

GRANT SELECT ON table1 TO USER user1;
GRANT SELECT ON DATABASE db1 TO USER user1 …

Logging in PySpark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. Excessive logging is better than no logging! This is generally true in distributed big data applications.

  2. Use loguru if it is available. If you have to use the logging module, be …

Use XGBoost With Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The split-by-leaf mode (grow_policy="lossguide") is not supported in distributed training, which makes XGBoost4J on Spark much slower than LightGBM on Spark.

XGBoost with Spark

https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d

https://xgboost …