Ben Chuanlong Du's Blog

It is never too late to learn.

Data Engineering Tools

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

https://github.com/linkedin/datahub

https://engineering.linkedin.com/blog/2019/data-hub DataHub: A generalized metadata search & discovery tool

GPU Related Issues and Solutions

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips

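  1. Training a model requires significantly more CPU/GPU memory than running inference with the model, because training must also keep gradients, optimizer state, and the intermediate activations needed for backpropagation.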

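  2. torch.cuda.empty_cache() doesn't help if there is simply not enough GPU memory. It only releases unused cached memory held by PyTorch's caching allocator; it cannot free memory occupied by live tensors.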

  3. It is suggested that you …
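To put rough numbers on the first tip, here is a back-of-the-envelope sketch. It assumes fp32 weights (4 bytes per value) and the Adam optimizer, which keeps two extra moment buffers per parameter. It deliberately ignores activations, the CUDA context, and allocator overhead, so real training usage is higher still. The 100M-parameter model is hypothetical.

```python
# Rough estimate of memory for model *state* only.
# Assumptions: fp32 values (4 bytes each), Adam optimizer;
# activations and framework/CUDA overhead are NOT counted.
BYTES_PER_FP32 = 4


def inference_state_bytes(num_params: int) -> int:
    # Inference only needs the weights themselves.
    return num_params * BYTES_PER_FP32


def training_state_bytes(num_params: int) -> int:
    # Training with Adam keeps, per parameter: the weight, its gradient,
    # and two moment estimates, i.e. 4 fp32 values per parameter.
    return num_params * BYTES_PER_FP32 * 4


n = 100_000_000  # hypothetical 100M-parameter model
print(f"inference: {inference_state_bytes(n) / 2**30:.2f} GiB")
print(f"training:  {training_state_bytes(n) / 2**30:.2f} GiB")
```

Even before counting activations, training state is about four times the size of the weights under these assumptions.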

Tips on Scikit-Learn

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. Cross-validation in scikit-learn works with pipelines as well as with plain estimators. Please refer to Cross Validation Pipeline for more details.

  2. Label encoding is an easy way to convert a categorical response …
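A minimal sketch of both tips, using the bundled iris dataset as a stand-in. The string class names are constructed only to mimic a categorical response, and the choice of `StandardScaler` plus `LogisticRegression` in the pipeline is illustrative, not taken from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Load the data and turn the integer target into string class names
# to mimic a categorical response.
iris = load_iris()
X = iris.data
y_str = iris.target_names[iris.target]

# Label encoding: map each class name to an integer in [0, n_classes).
# LabelEncoder sorts the distinct labels to build this mapping.
encoder = LabelEncoder()
y = encoder.fit_transform(y_str)

# Cross-validate the whole pipeline: the scaler is refit on each training
# fold, so no statistics leak from the held-out fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)

print(encoder.classes_)
print(scores.mean())
```

Predictions made on the encoded labels can be mapped back to the original names with `encoder.inverse_transform`.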

Common Issues in PyTorch

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

This means that the input data and the model are on different …
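A minimal sketch of the usual fix for this error: make sure the input tensor lives on the same device as the model's weights. The `Conv2d` layer and tensor shapes are illustrative assumptions; the code falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Pick a GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative model: after .to(device) its weights live on `device`.
model = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3).to(device)

# New tensors are created on the CPU by default. Passing this directly to a
# model on the GPU is what triggers the "Input type ... and weight type ..." error.
x = torch.randn(1, 3, 32, 32)

# Fix: move the input to the device that holds the model's weights.
x = x.to(next(model.parameters()).device)
y = model(x)
print(y.shape, y.device)
```

Reading the device from `model.parameters()` rather than hard-coding it keeps the fix correct even if the model is later moved to another device.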