Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: AnalysisException: Found Duplicated Columns

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

pyspark.sql.utils.AnalysisException: Found duplicate column(s) when inserting into ...

Possible Causes

As the error message says, there are duplicated columns in your Spark SQL code.
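A quick way to locate the offending names is to count them before running the insert. The sketch below is only an illustration: `find_duplicate_columns` is a hypothetical helper, not part of Spark, and it assumes the default setting `spark.sql.caseSensitive=false`, under which `id` and `ID` count as the same column.

```python
from collections import Counter


def find_duplicate_columns(columns, case_sensitive=False):
    """Return the column names that occur more than once.

    Hypothetical helper for illustration. With case_sensitive=False,
    names are compared in lower case, mirroring Spark's default
    case-insensitive column resolution.
    """
    keys = [c if case_sensitive else c.lower() for c in columns]
    counts = Counter(keys)
    return sorted(name for name, n in counts.items() if n > 1)


# Example column list, e.g. what a join might produce.
cols = ["id", "name", "ID", "amount"]
dups = find_duplicate_columns(cols)
print(dups)
```

For a PySpark DataFrame `df`, the same helper can be called as `find_duplicate_columns(df.columns)`.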

Possible Solutions

Fix …

Spark Issue: GetQuotaUsage

Symptom I

py4j.protocol.Py4JJavaError: An error occurred while calling o156.getQuotaUsage.

Symptom II

org.apache.hadoop.ipc.RemoteException(java.io.IOException): The quota system is disabled in Router.

Possible Causes …

Spark Issue: RuntimeException: Could not find any configured addresses for URI

Symptoms

Caused by: java.lang.RuntimeException: Could not find any configured addresses for URI hdfs://clustername-router/

Possible Causes

This is due to missing clustername-router settings in the property dfs.nameservices in …

Spark Issue: UriSyntaxException

Symptoms

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs::/cluster-name/user/dclong/feature_example/features/train/2022-03-11

Possible Causes

As the error message points out, there's a syntax …
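In the URI from the error message, the scheme `hdfs` is followed by `::/` rather than `://`. A cheap sanity check before handing a path to Spark might look like the sketch below. The function `has_wellformed_scheme` is a hypothetical helper, and it checks only the `scheme://` prefix, not whether the cluster or path exists.

```python
from urllib.parse import urlsplit


def has_wellformed_scheme(uri: str) -> bool:
    """Return True if the URI starts with '<scheme>://'.

    Hypothetical helper for illustration; it does not validate the
    authority or the path.
    """
    scheme = urlsplit(uri).scheme  # lower-cased by urlsplit
    return bool(scheme) and uri.lower().startswith(scheme + "://")


bad = "hdfs::/cluster-name/user/dclong/feature_example/features/train/2022-03-11"
good = "hdfs://cluster-name/user/dclong/feature_example/features/train/2022-03-11"

print(has_wellformed_scheme(bad))
print(has_wellformed_scheme(good))
```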

Handle Categorical Variables in LightGBM

LightGBM supports pandas columns of category type. As a matter of fact, this is the suggested way of handling categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")

A LightGBM model (which is a Booster object) records categories of each categorical feature. This information is used to set categories of each categorical feature during prediction, which ensures that a LightGBM model can always handle categorical features correctly.
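To see why the recorded categories matter, the pandas-only sketch below compares the integer codes obtained when new data is converted on its own with the codes obtained when the training categories are reused. It illustrates the idea only and is not LightGBM's internal code; the column name `color` and the sample values are made up.

```python
import pandas as pd

# Training data: the categories are inferred (and sorted) from the values.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
train["color"] = train["color"].astype("category")
train_dtype = train["color"].dtype  # categories: blue, green, red

# New data that contains only a subset of the training categories.
new = pd.DataFrame({"color": ["green", "red"]})

# Converting the new data on its own yields different integer codes ...
naive_codes = new["color"].astype("category").cat.codes.tolist()

# ... while reusing the training categories keeps the codes consistent.
aligned_codes = new["color"].astype(train_dtype).cat.codes.tolist()

print(naive_codes, aligned_codes)
```

With the naive conversion, `green` is encoded as 0, whereas it was 1 in the training data. Reusing the training categories avoids this mismatch.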

Hands on the Deque Collection in Python

Tips and Traps

  1. In CPython, a deque is implemented as a doubly linked list of fixed-size blocks, and it has O(1) time complexity for appending and popping at both ends.

  2. Unlike list and tuple collections, a deque CANNOT be sliced!
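A short sketch of both points: appending at either end, and what happens when a slice is attempted. Using `itertools.islice` is one possible workaround for slicing, shown here as a suggestion rather than the only option.

```python
from collections import deque
from itertools import islice

d = deque([1, 2, 3])
d.append(4)      # add to the right end
d.appendleft(0)  # add to the left end

# Slicing a deque raises TypeError, unlike a list or a tuple.
try:
    d[1:3]
    slice_supported = True
except TypeError:
    slice_supported = False

# islice walks the deque and yields the requested range of items.
middle = list(islice(d, 1, 3))

print(list(d), slice_supported, middle)
```

Note that `islice` iterates from the left end, so it takes time proportional to the stop index. Converting to a list first, e.g. `list(d)[1:3]`, is another option when the deque is small.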