Ben Chuanlong Du's Blog

It is never too late to learn.

Cross Validation in Machine Learning

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Training and Testing Data Set

  • good when you have large amount of data

  • usually use 1/5 to 1/3 of the data as testing data set.

K-fold CV

  • suitable when you have medium number of data

  • K=10 is popular

  • computationally extensive

Leave-k-out CV

  • Use this way only when you have very limited number of data.

  • Leave-1-out is a specially case of the K-fold CV.

  • computationally very extensive

Some Rules:

  • 10 times number of parameters, probably in good shape

  • 20 times number of prameters, usually perfect good

Comments