Ben Chuanlong Du's Blog

It is never too late to learn.

Extracting Data from PDF Files

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Sometimes a PDF file is corrupted or encrypted, which makes it hard to extract data from it directly. In this case, you can convert a PDF page to an image first and …
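
As a rough illustration of the page-to-image step only, here is a minimal sketch. It assumes the third-party PyMuPDF package (imported as `fitz`), which the excerpt does not name; the function names and the output naming scheme are hypothetical, and what to do with the image afterwards is left open.

```python
from pathlib import Path


def page_image_path(pdf_path: str, page_index: int, out_dir: str = ".") -> Path:
    """Build the output file name for one rendered page (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    # Pages are 0-indexed in code but numbered from 1 in the file name.
    return Path(out_dir) / f"{stem}_page{page_index + 1}.png"


def render_page(pdf_path: str, page_index: int = 0, dpi: int = 200, out_dir: str = ".") -> Path:
    """Render one PDF page to a PNG image and return the image path."""
    # Assumption: PyMuPDF is installed. It is imported lazily so that
    # page_image_path() above still works without it.
    import fitz

    out_path = page_image_path(pdf_path, page_index, out_dir)
    with fitz.open(pdf_path) as doc:
        page = doc.load_page(page_index)   # 0-based page index
        pix = page.get_pixmap(dpi=dpi)     # rasterize the page at the given resolution
        pix.save(str(out_path))            # image format is inferred from the .png suffix
    return out_path
```

Calling `render_page("report.pdf", page_index=0)` would write `report_page1.png` to the current directory. A higher `dpi` generally gives a sharper image at the cost of a larger file.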

Tips on Google Gemini CLI

Tips & Traps

  1. Gemini CLI is extremely useful because it doesn't require your favorite IDE to integrate with LLM tools. You can use whichever IDE you like. What you need is …

Tips on Large Language Models

Data Sources

Data Platforms

https://www.oxen.ai/

https://github.com/quiltdata/quilt

https://registry.opendata.aws/

https://www.google.com/publicdata/directory

https://proxycrawl.com

Data for Computer Vision

Data for NLP …

Tips on NotebookLM

No, not directly in the way you might hope (i.e., pasting a URL and having it automatically fetch and parse the article content as a source). NotebookLM is designed to …