Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: GetQuotaUsage

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptom I

py4j.protocol.Py4JJavaError: An error occurred while calling o156.getQuotaUsage.

Symptom II

org.apache.hadoop.ipc.RemoteException(java.io.IOException): The quota system is disabled in Router.

Possible Causes …

Spark Issue: RuntimeException: Could not find any configured addresses for URI

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

Caused by: java.lang.RuntimeException: Could not find any configured addresses for URI hdfs://clustername-router/

Possible Causes

This is due to missing clustername-router settings in the property dfs.nameservices in …

Spark Issue: UriSyntaxException

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs::/cluster-name/user/dclong/feature_example/features/train/2022-03-11

Possible Causes

As the error message points out, there's a syntax …

Handle Categorical Variables in LightGBM

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

LightGBM support pandas columns of category type. As a matter of fact, this is the suggested way of handling categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")

A LightGBM model (which is a Booster object) records categories of each categorical feature. This information is used to set categories of each categorical feature during prediction, which ensures that a LightGBM model can always handle categorical features correctly.

Handling Categorical Variables in Machine Learning

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Categorical variables are very common in a machine learning project. On a high level, there are two ways to handle a categorical variable.

  1. Drop a categorical variable if a categorical variable …

Tips on LightGBM

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. It is strongly suggested that you load data into a pandas DataFrame and handle categorical variables by specifying a dtype of "category" for those categorical variables.

    df.cat_var = df.cat_var.astype …