Extracting Data from PDF Files

Nov 15, 2024

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Sometimes a PDF file is corrupted or encrypted, making it hard to extract data from it directly. In this case, you can convert a PDF page to an image first and …

Tips on Google Gemini CLI

Jul 06, 2025

Tips & Traps

Gemini CLI is extremely useful because it doesn't require your favorite IDE to integrate with LLM tools. You can use whichever IDE you like. What you need is …

Tips on Large Language Models

May 06, 2023

Tips on Gemini

May 06, 2025

Gemini API Cookbook

Data Sources

Feb 27, 2019

Data Platforms

https://www.oxen.ai/

https://github.com/quiltdata/quilt

https://registry.opendata.aws/

https://www.google.com/publicdata/directory

https://proxycrawl.com

Data for Computer Vision

Data for NLP …

Tips on NotebookLM

May 31, 2025

No, not directly in the way you might hope (i.e., pasting a URL and having it automatically fetch and parse the article content as a source). NotebookLM is designed to …


Ben Chuanlong Du's Blog

It is never too late to learn.
