Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
General Tips
- By default, internet access from a Kaggle notebook/kernel is turned off. You have to turn it on manually from the right-side panel before the notebook can access the internet.
Tips for Competition on Kaggle
Choose the Right Competitions
If you are a beginner, tackle the "Getting Started" competitions first. You can try "Research" and "Featured" competitions once you have more experience.
Set incremental goals
Review most voted kernels
Ask questions on the forums
Keep a logbook or use a model versioning tool
It is good to keep a logbook or use a model versioning tool, since you will typically go through many iterations of models. If you like a simple tool, use a spreadsheet or a markdown doc. Otherwise, find a model versioning tool.
- Establish a single baseline model to compare all future changes to.
- Come up with a bunch of tweaks you want to try, and run a modified version of the baseline for each tweak independently rather than in a cumulative fashion.
- Keep the same (and smallest) CNN architecture for as long as possible. It makes iteration quicker, and with some luck many of the hyper-parameters will transfer decently to larger, more complex models.
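As a rough illustration of the logbook idea, here is a minimal sketch that appends one row per experiment to a CSV file and records how each tweak compares to the baseline. The function name `log_run`, the column names, and the file layout are all made up for this example; a spreadsheet or a dedicated versioning tool would serve the same purpose.

```python
import csv
from pathlib import Path

# Hypothetical column layout for the logbook; adapt it to your own needs.
FIELDS = ["run", "change", "val_score", "delta_vs_baseline"]


def log_run(path, run, change, val_score, baseline_score):
    """Append one experiment to a CSV logbook and return the logged row."""
    path = Path(path)
    is_new = not path.exists()
    row = {
        "run": run,
        "change": change,
        "val_score": val_score,
        # Each tweak is compared to the same baseline, not to the previous run.
        "delta_vs_baseline": round(val_score - baseline_score, 6),
    }
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only once
        writer.writerow(row)
    return row
```

Calling `log_run("logbook.csv", "run-002", "added horizontal flips", 0.82, baseline_score=0.80)` would add one line to the log with the change description and its difference from the baseline score.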
Get More Data
If you are doing an image, voice or text related competition, do some research before you start coding: see if a similar competition has been run before, or if there are any databases of similar labelled training sets you can use. More data is never really harmful to your model (assuming the quality of labelling is decent), so get as much of it as you can. Just don't forget to take your validation and test sets from the original dataset provided to you, or you may end up with a train-test mismatch.
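The point about validation data can be sketched as follows: hold out the validation set from the original competition data first, and only then add the external data to the training portion. The function name `split_with_external` and its parameters are hypothetical, and the samples can be any Python objects.

```python
import random


def split_with_external(original, external, val_fraction=0.2, seed=0):
    """Hold out validation samples from the original data only.

    External data is appended to the training set after the split, so the
    validation set keeps the distribution of the competition data.
    """
    rng = random.Random(seed)
    shuffled = list(original)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    val = shuffled[:n_val]
    train = shuffled[n_val:] + list(external)
    return train, val
```

With 10 original samples, 5 external samples and the default `val_fraction`, this yields 2 validation samples (all from the original data) and 13 training samples.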
Leveraging Existing Kernels
It is always good practice to learn from others and leverage existing work.
Note: If a kernel suggests a bunch of techniques to use for your model, check whether it states the resulting performance gains. If it does not, be skeptical and run tests yourself before blindly incorporating them into your own models :)
Preprocessing Data
Images
Cropping & Other Augmentations
Transfer Learning
Start from Easy/Simple/Small Models
Download Datasets from Kaggle Using Kaggle API
- Install the Python package kaggle.
- Generate a token file kaggle.json and place it into your directory $HOME/.kaggle. To create the token, go to www.kaggle.com -> Your Account -> Create New API token.
- Make sure that $HOME/.kaggle/kaggle.json is readable only by you.
  chmod 600 $HOME/.kaggle/kaggle.json
- Search for datasets on Kaggle using the following command.
  kaggle datasets list -s [keywords]
- Download a dataset using the command below.
  kaggle datasets download user/dataset
For more details, please refer to the Kaggle Public API documentation and "Easy way to use Kaggle datasets in Google Colab".
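If you prefer to script the steps above, the following is a minimal sketch using only the Python standard library. It applies the same permissions as chmod 600 to the token file and builds the two commands shown above as argument lists that could be passed to `subprocess.run`. The helper names (`secure_kaggle_token`, `search_command`, `download_command`) are invented for this example and are not part of the kaggle package.

```python
import os
import stat
from pathlib import Path


def secure_kaggle_token(config_dir=None):
    """Make kaggle.json readable and writable by the owner only (chmod 600)."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token = config_dir / "kaggle.json"
    if not token.is_file():
        raise FileNotFoundError(f"No token file at {token}")
    os.chmod(token, stat.S_IRUSR | stat.S_IWUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(token.stat().st_mode)


def search_command(keywords):
    """Build the argument list for: kaggle datasets list -s [keywords]"""
    return ["kaggle", "datasets", "list", "-s", keywords]


def download_command(dataset):
    """Build the argument list for: kaggle datasets download user/dataset"""
    return ["kaggle", "datasets", "download", dataset]
```

For example, `subprocess.run(download_command("user/dataset"), check=True)` would run the download, assuming the kaggle command is installed and the token file is in place.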
Useful Datasets for Learning
There are lots of machine learning ready datasets available to use for fun or practice on Kaggle's Public Datasets platform. Here is a short list of some of our favorites that we've already had the chance to review. They're all (mostly) cleaned and ready for analysis!
Binary Classification
- Indian Liver Patient Records
- Synthetic Financial Data for Fraud Detection
- Business and Industry Reports
- Can You Predict Product Backorders?
- Exoplanet Hunting in Deep Space
- Adult Census Income
Multiclass Classification
Regression
NLP
- The Enron Email Dataset
- Ubuntu Dialogue Corpus
- Old Newspapers: A cleaned subset of HC Corpora newspapers
- Speech Accent Archive
- Blog Authorship Corpus
Time Series Analysis
Image Processing
Mapping and Prediction
- Seattle Police Department 911 Incident Response
- Baltimore 911 Calls
- Crimes in Chicago
- Philadelphia Crime Data
- London Crime
Large Datasets
Misc
The Beginner’s Guide to Kaggle
https://www.kaggle.com/getting-started/44919
https://towardsdatascience.com/how-to-improve-your-kaggle-competition-leaderboard-ranking-bcd16643eddf
https://www.kdnuggets.com/2016/11/rank-ten-precent-first-kaggle-competition.html
https://towardsdatascience.com/how-i-got-in-the-top-1-on-kaggle-79ddd7c07f1c