Ben Chuanlong Du's Blog

It is never too late to learn.

Preparing Data for AI

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

General Tips

  1. When you label individual images, it is better to use numerical labels (even though text labels are easier to understand) so that you can avoid mapping between numbers (use …

Loss Functions for Machine Learning Models

Tips and Traps

  1. Commonly used loss functions (e.g., mean squared error, or cross entropy computed on valid probabilities) are non-negative. If you get a negative loss when training with such a loss, something is likely wrong with the code. For example, maybe you …
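One way a cross-entropy loss can turn negative is when the values passed to it are not valid probabilities. The following is a minimal sketch of that situation, offered as a hypothetical illustration and not necessarily the case the author had in mind; the function name `cross_entropy_loss` and the sample numbers are assumptions made for the example.

```python
import math


def cross_entropy_loss(true_index, scores):
    """Return -log(scores[true_index]).

    This is non-negative only when `scores` are probabilities in (0, 1].
    """
    return -math.log(scores[true_index])


# Valid probabilities (sum to 1): the loss is non-negative.
probs = [0.7, 0.2, 0.1]
loss_ok = cross_entropy_loss(0, probs)

# Unnormalized scores passed by mistake: a value above 1 makes the loss negative.
raw_scores = [2.5, 0.3, 0.1]
loss_bad = cross_entropy_loss(0, raw_scores)

print(loss_ok, loss_bad)
```

A cheap safeguard is to assert during debugging that the inputs lie in the expected range before the loss is computed.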

Entropy

  1. Entropy
  2. Shannon Entropy
  3. Cross Entropy
  4. K-L divergence

Tips

  1. The entropy concept was first introduced for discrete distributions (called Shannon entropy), which is defined as

    $$H(X) = E …
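As a numerical companion to the quantities listed above, here is a minimal sketch that assumes the standard textbook definitions for discrete distributions: entropy as -Σ p log p, cross entropy as -Σ p log q, and K-L divergence as their difference. The function names and the example distributions are illustrative and do not come from the original post.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy (in nats) of q relative to p; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """K-L divergence D(p || q), computed as cross entropy minus entropy."""
    return cross_entropy(p, q) - shannon_entropy(p)


p = [0.5, 0.5]  # a fair coin
q = [0.9, 0.1]  # a biased model of that coin

print(shannon_entropy(p))   # equals log(2) for a fair coin
print(cross_entropy(p, q))
print(kl_divergence(p, q))  # positive, since q differs from p
```

Using natural logarithms gives results in nats; replacing `math.log` with `math.log2` would give bits.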

Rule-based Image Processing

If you face a relatively simple image recognition problem that has not been studied before, so that no public data is available for it, it is probably less effort …

Handle Categorical Variables in LightGBM

LightGBM supports pandas columns of the category dtype. In fact, this is the suggested way of handling categorical columns in LightGBM.

# Convert the column to the pandas category dtype (data is a pandas DataFrame).
data[feature] = data[feature].astype("category")

A LightGBM model (which is a Booster object) trained on a pandas DataFrame records the categories of each categorical feature. At prediction time, this information is used to set the categories of each categorical column, so that the same value maps to the same internal code as it did during training.
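To see why recording the training categories matters, here is a minimal pandas-only sketch. It makes no LightGBM calls and only illustrates the idea, under the assumption that a model ultimately works with the integer codes of a category column; the variable names and sample values are hypothetical.

```python
import pandas as pd

# Training column: categories are inferred in sorted order,
# i.e. ['blue', 'green', 'red'], so the codes are blue=0, green=1, red=2.
train = pd.Series(["red", "green", "blue"], dtype="category")
train_categories = list(train.cat.categories)

# New data contains only a subset of the training values.
new = pd.Series(["red", "green"], dtype="category")

# Codes inferred from the new data alone do not match training:
# the categories here are ['green', 'red'], so red=1 and green=0.
naive_codes = new.cat.codes.tolist()

# Re-applying the recorded training categories restores the training codes.
aligned = new.cat.set_categories(train_categories)
aligned_codes = aligned.cat.codes.tolist()

print(naive_codes, aligned_codes)
```

Without the alignment step, "red" would be encoded as 1 at prediction time even though it was 2 during training, which is the kind of mismatch that recorded categories prevent.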