Ben Chuanlong Du's Blog

It is never too late to learn.

Double Dipping in Machine Learning

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Double dipping is a term for overfitting a model by both building and evaluating it on the same data set, a form of circular logic that yields inappropriately high statistical significance.
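The pitfall can be illustrated with a toy 1-nearest-neighbour classifier on synthetic data (the data and model here are invented for illustration): evaluating on the training set itself scores perfectly, because every point is its own nearest neighbour.

```python
import random

random.seed(0)

# Synthetic binary-classification data: class 0 centered at 0, class 1 at 1,
# with enough spread that the classes overlap.
data = [(random.gauss(cls, 0.7), cls) for cls in (0, 1) for _ in range(50)]
random.shuffle(data)

def nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier fit on `train`, scored on `test`."""
    correct = 0
    for x, y in test:
        # When train == test, each point is its own nearest neighbour at
        # distance 0 -- that is the double dip.
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(test)

train, test = data[:70], data[70:]

print(nn_accuracy(train, train))  # double dipping: a perfect 1.0
print(nn_accuracy(train, test))   # honest estimate on held-out data
```

The held-out score is the one that reflects generalization; the in-sample score is an artifact of reusing the same data twice.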


Preparing Data for AI


General Tips

  1. When you label individual images, it is better to use numerical labels (even though text labels are easier to understand) so that you can avoid mapping between numbers (use …
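The point of the tip above can be sketched as follows (the label names are made up): with text labels you must maintain lookup tables in both directions, while numeric labels are already the class indices a model works with.

```python
# With text labels, two lookup tables are needed to move between model
# output indices and human-readable names.
text_labels = ["cat", "dog", "cat", "bird"]
label_to_index = {lbl: i for i, lbl in enumerate(sorted(set(text_labels)))}
index_to_label = {i: lbl for lbl, i in label_to_index.items()}
y = [label_to_index[lbl] for lbl in text_labels]

# With numeric labels the mapping step disappears: the label IS the class index.
numeric_labels = [1, 2, 1, 0]
assert y == numeric_labels
```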

Entropy


  1. Entropy
  2. Shannon Entropy
  3. Cross Entropy
  4. K-L divergence

Tips

  1. The entropy concept was first introduced for discrete distributions (where it is called Shannon entropy) and is defined as

    $$H(X) = E …
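For reference, the standard textbook definitions of the three quantities listed in the outline (not necessarily in the post's exact notation) are:

$$H(X) = E[-\log p(X)] = -\sum_x p(x) \log p(x)$$

$$H(p, q) = E_p[-\log q(X)] = -\sum_x p(x) \log q(x)$$

$$D_{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = H(p, q) - H(p)$$

Cross entropy thus decomposes into the entropy of $p$ plus the K-L divergence from $p$ to $q$.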

Rule-based Image Processing


If you face a relatively simple image recognition problem that hasn't been studied before, so that no public data set is available for it, it is probably less effort …
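A toy illustration of the rule-based approach (the "image", threshold, and class names are all invented): instead of training a model, classify a tiny grayscale image by counting pixels above a brightness threshold.

```python
# Rule-based classifier: label a grayscale "image" (a list of pixel rows,
# values 0-255) as "bright" or "dark" by simple thresholding.
def classify(image, threshold=128):
    pixels = [p for row in image for p in row]
    bright = sum(p >= threshold for p in pixels)
    return "bright" if bright > len(pixels) / 2 else "dark"

img = [
    [200, 210, 190],
    [220, 205, 215],
    [ 30,  40, 200],
]
print(classify(img))  # most pixels exceed the threshold
```

Hand-written rules like this need no labeled data at all, which is the trade-off the paragraph above is pointing at.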

Handling Categorical Variables in Machine Learning


Categorical variables are very common in machine learning projects. At a high level, there are two ways to handle a categorical variable.

  1. Drop a categorical variable if a categorical variable …
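One common way to keep a categorical variable rather than drop it is one-hot encoding; a minimal sketch in plain Python (the column values here are made up):

```python
# One-hot encode a categorical column: one indicator column per category.
colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))  # fixed column order: ['blue', 'green', 'red']
one_hot = [[int(c == cat) for cat in categories] for c in colors]
print(one_hot)
```

In practice a library encoder (e.g. pandas' `get_dummies`) does the same thing while also remembering the category order for new data.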