Ben Chuanlong Du's Blog

It is never too late to learn.

Handle Categorical Variables in LightGBM

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

LightGBM supports pandas columns of the category dtype. In fact, this is the recommended way to handle categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")

A LightGBM model (which is a Booster object) records the categories of each categorical feature at training time. During prediction, it applies the recorded categories to the corresponding columns, so a given value always maps to the same category the model was trained on.
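The idea of reusing training-time categories at prediction time can be illustrated with pandas alone. This is only a minimal sketch of the concept, not LightGBM's actual code; the column name `color` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical training data with one categorical feature.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
train["color"] = train["color"].astype("category")

# Remember the categories seen during training
# (analogous to what the model records).
train_categories = train["color"].cat.categories

# Hypothetical prediction data containing an unseen value, "purple".
test = pd.DataFrame({"color": ["blue", "purple", "red"]})

# Re-apply the training categories so that each value gets the same
# integer code as in training; unseen values become missing (code -1).
test["color"] = test["color"].astype("category").cat.set_categories(train_categories)

train_codes = train["color"].cat.codes.tolist()
test_codes = test["color"].cat.codes.tolist()
print(list(train_categories), train_codes, test_codes)
```

Without the `set_categories` step, the prediction data would get its own category ordering, and the same string could end up with a different integer code than it had during training.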

Hands on the Deque Collection in Python

Tips and Traps

  1. In CPython, a deque is implemented as a doubly linked list of fixed-size blocks, and it supports O(1) appends and pops at both ends.

  2. Unlike list and tuple collections, a deque CANNOT be sliced!
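The two points above can be checked with a short sketch. It uses only the standard library; `itertools.islice` is shown as one possible workaround for the lack of slicing, not the only one.

```python
from collections import deque
from itertools import islice

d = deque([1, 2, 3])
d.append(4)       # O(1) append on the right
d.appendleft(0)   # O(1) append on the left

# Slicing a deque raises TypeError.
try:
    d[1:3]
    slicing_works = True
except TypeError:
    slicing_works = False

# Workaround: iterate over a range of positions with islice.
middle = list(islice(d, 1, 3))

first = d.popleft()  # O(1) pop from the left
last = d.pop()       # O(1) pop from the right
print(slicing_works, middle, first, last, d)
```

Note that `islice` walks the deque from the left end, so taking a range far from the start costs time proportional to its position. Converting to a list first (`list(d)[1:3]`) is another option when the deque is small.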

Spark Issue: SIGBUS

Symptoms

CalledProcessError: Command './pine' died with .

Possible Causes

SIGBUS (bus error) is a signal raised when a process tries to access memory that has not been physically mapped. There are several …

Ubuntu Crashes When Opening Settings

I encountered an issue with Ubuntu 20.04 on my machine. Ubuntu crashes when I try to open settings or video players. After searching online, it seems that the problem happens …

Spark Issue: Could Not Execute Broadcast in 300S

Symptoms

Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting …