Ben Chuanlong Du's Blog

It is never too late to learn.

Loss Functions for Machine Learning Models

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps

  1. Most common loss functions (e.g., squared error, or cross entropy on discrete class probabilities) are non-negative. If you get a negative loss when training a model with such a loss, there is probably something wrong with the code. For example, maybe you …

Entropy

  1. Entropy
  2. Shannon Entropy
  3. Cross Entropy
  4. K-L divergence

Tips

  1. The entropy concept was first introduced for discrete distributions (called Shannon entropy), which is defined as

    $$H(X) = E …
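As a small numerical illustration, the sketch below computes Shannon entropy for a discrete distribution using the standard definition $H(X) = -\sum_i p_i \log p_i$. The function name `shannon_entropy` and the choice of base 2 (bits) are assumptions made for this example, not something taken from the post.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution given as a list of probabilities."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Outcomes with p == 0 contribute nothing, by the convention p * log(p) -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])    # maximal uncertainty for two outcomes
certain = shannon_entropy([1.0, 0.0])      # no uncertainty at all
uniform_four = shannon_entropy([0.25] * 4)  # uniform over four outcomes
```

With base 2 the result is measured in bits: a fair coin carries one bit, a certain outcome carries none, and a uniform distribution over four outcomes carries two.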

Handle Categorical Variables in LightGBM

LightGBM supports pandas columns of the category dtype. In fact, this is the suggested way of handling categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")

A LightGBM model (a Booster object) records the categories of each categorical feature seen during training. When you predict on a pandas DataFrame, this information is used to set the categories of each categorical feature, so that the same value maps to the same internal code at training time and at prediction time.
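To see why recording the training categories matters, here is a minimal sketch in plain pandas. It does not call LightGBM and is not LightGBM's internal code; the column name `color` and the toy values are made up for illustration. It only shows how integer codes drift when categories are inferred separately, and how reusing the training categories keeps them aligned.

```python
import pandas as pd

# Training data: the category list is inferred from the observed values.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
train["color"] = train["color"].astype("category")
train_categories = train["color"].cat.categories  # sorted: blue, green, red

# Prediction data arrives with a different set of observed values.
new = pd.DataFrame({"color": ["green", "red", "purple"]})

# Naive conversion infers its own categories, so the integer codes shift.
naive_codes = new["color"].astype("category").cat.codes.tolist()

# Reusing the recorded training categories keeps the codes consistent;
# a value unseen in training ("purple") becomes missing (code -1).
aligned = pd.Categorical(new["color"], categories=train_categories)
aligned_codes = aligned.codes.tolist()
```

Here "green" is encoded as 1 in the training data but as 0 under the naive conversion, while the aligned conversion keeps it at 1. Values that never appeared in training end up as missing rather than silently taking another category's code.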

Handling Categorical Variables in Machine Learning

Categorical variables are very common in machine learning projects. At a high level, there are two ways to handle a categorical variable.

  1. Drop a categorical variable if a categorical variable …

Tips on LightGBM

  1. It is strongly suggested that you load data into a pandas DataFrame and handle categorical variables by setting their dtype to "category".

    df.cat_var = df.cat_var.astype …