Ben Chuanlong Du's Blog

It is never too late to learn.

Preparing Data for AI

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

General Tips

  1. When you label individual images, it is better to use numerical labels (even though text labels are easier to understand) so that you can avoid mapping between numbers (use …

Loss Functions for Machine Learning Models

Tips and Traps

  1. Commonly used loss functions (e.g., mean squared error or cross entropy on discrete labels) are non-negative. If you get a negative loss when training a model with such a loss function, there is likely something wrong with the code. For example, maybe you …

Entropy

  1. Entropy
  2. Shannon Entropy
  3. Cross Entropy
  4. K-L divergence

Tips

  1. The entropy concept was first introduced for discrete distributions (called Shannon entropy), which is defined as

    $$H(X) = E …
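As a point of reference for the topics listed above, here is a minimal sketch of the textbook definitions for discrete distributions: Shannon entropy H(p) = -sum p_i log p_i, cross entropy H(p, q) = -sum p_i log q_i, and K-L divergence D(p || q) = H(p, q) - H(p). The function names and the example distributions `p` and `q` are made up for illustration, and natural logarithms are assumed (so results are in nats).

```python
import math


def shannon_entropy(p):
    """Shannon entropy of a discrete distribution, in nats.

    Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i), in nats.

    Returns infinity if q assigns zero probability to an outcome
    that has positive probability under p.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total -= pi * math.log(qi)
    return total


def kl_divergence(p, q):
    """K-L divergence D(p || q) = H(p, q) - H(p)."""
    return cross_entropy(p, q) - shannon_entropy(p)


# Hypothetical example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(shannon_entropy(p))   # entropy of a fair coin, equal to log(2)
print(cross_entropy(p, q))
print(kl_divergence(p, q))  # non-negative, and zero only when p == q
```

Writing the K-L divergence as the difference between cross entropy and entropy makes it clear why minimizing cross entropy with respect to q (for a fixed p) is the same as minimizing the K-L divergence.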

Handle Categorical Variables in LightGBM

LightGBM supports pandas columns of the category dtype. In fact, this is the suggested way of handling categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")

A LightGBM model (which is a Booster object) records the categories of each categorical feature seen during training. At prediction time, this information is used to set the categories of each categorical feature in the input, which ensures that category values are encoded consistently between training and prediction.
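To see why recording the training categories matters, here is a rough illustration that uses only pandas. It mimics the category alignment described above and is not LightGBM's actual code; the column name `color` and its values are made up.

```python
import pandas as pd

# Hypothetical training and prediction data with one categorical column.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["green", "purple", "red"]})

# Convert the training column to the category dtype and remember its categories.
train["color"] = train["color"].astype("category")
train_categories = list(train["color"].cat.categories)

# Naive conversion: the prediction data gets its own categories,
# so the same value can map to a different integer code.
naive_codes = test["color"].astype("category").cat.codes.tolist()

# Aligned conversion: reuse the categories recorded at training time,
# so codes match the training encoding; unseen values get code -1.
aligned = pd.Categorical(test["color"], categories=train_categories)
aligned_codes = aligned.codes.tolist()

print(train_categories)
print(naive_codes)
print(aligned_codes)
```

In the naive conversion "green" is encoded differently than in the training data, while the aligned conversion keeps the training encoding and marks the unseen value "purple" as missing.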

Handling Categorical Variables in Machine Learning

Categorical variables are very common in machine learning projects. At a high level, there are two ways to handle a categorical variable.

  1. Drop a categorical variable if a categorical variable …