Split a Dataset into Train and Test Datasets in Python
Scikit-learn Compatible Packages
sklearn.model_selection.train_test_split
is the best way to split a dataset into train and test subsets
for scikit-learn compatible packages (scikit-learn, XGBoost, LightGBM, etc.).
It supports splitting both iterable objects (numpy arrays, lists, pandas Series) and pandas DataFrames.
When splitting an iterable object,
it returns (train, test), where train and test have the same type as the input (lists for a list, numpy arrays for a numpy array).
When splitting a pandas DataFrame,
it returns (train, test)
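A minimal sketch of the calls described above, assuming scikit-learn, numpy, and pandas are installed. The toy data, the 80/20 split, and the variable names are made up for illustration; the printed type names show what each call returns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration.
values = list(range(10))
array = np.arange(10)
frame = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})

# test_size=0.2 holds out 20% of the rows for testing;
# random_state makes the shuffle reproducible.
list_train, list_test = train_test_split(values, test_size=0.2, random_state=42)
array_train, array_test = train_test_split(array, test_size=0.2, random_state=42)
frame_train, frame_test = train_test_split(frame, test_size=0.2, random_state=42)

# Report the type and size of each split.
print(type(list_train).__name__, len(list_train), len(list_test))
print(type(array_train).__name__, array_train.shape, array_test.shape)
print(type(frame_train).__name__, frame_train.shape, frame_test.shape)
```

Passing several objects in one call, for example train_test_split(X, y, test_size=0.2), splits them with the same shuffled indices, which keeps features and labels aligned.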
Resizing and Padding for Image Recognition
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- The best way to deal with differently sized images is to downscale them to match the dimensions of the smallest image available.
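The downscaling idea can be sketched with numpy alone. This is an illustration under assumptions, not the author's method: the helper name downscale_to_smallest is hypothetical, "smallest" is taken to mean the minimum height and minimum width across the batch (which may come from different images), and nearest-neighbor sampling stands in for a proper resampling filter from an image library. Aspect ratios are not preserved.

```python
import numpy as np

def downscale_to_smallest(images):
    """Downscale every image to the smallest height and width in the batch.

    Hypothetical helper: uses nearest-neighbor sampling to stay
    dependency-free and does not preserve aspect ratios.
    """
    target_h = min(img.shape[0] for img in images)
    target_w = min(img.shape[1] for img in images)
    resized = []
    for img in images:
        h, w = img.shape[:2]
        # Pick evenly spaced source rows and columns for each target pixel.
        rows = np.arange(target_h) * h // target_h
        cols = np.arange(target_w) * w // target_w
        resized.append(img[rows][:, cols])
    # All images now share one shape, so they stack into a single batch.
    return np.stack(resized)

# Three fake RGB images of different sizes (height, width, channels).
images = [np.zeros((64, 48, 3)), np.ones((32, 40, 3)), np.zeros((50, 36, 3))]
batch = downscale_to_smallest(images)
print(batch.shape)  # (3, 32, 36, 3)
```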
Convolutional Neural Networks
CS231n: Convolutional Neural Networks for Visual Recognition is a great introduction to CNNs.
References
http://vision.stanford.edu/teaching/cs231n/
Tips on Darknet and Yolo
https://pjreddie.com/darknet/tiny-darknet/
https://pjreddie.com/darknet/
Build Spark from Source
You can download a prebuilt Spark binary at https://spark.apache.org/downloads.html. This is where you should start, and it will likely satisfy your needs most of the time …