Ben Chuanlong Du's Blog

It is never too late to learn.

Extracting Data from PDF Files

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Sometimes a PDF file is corrupted or encrypted, which makes it hard to extract data from it directly. In that case, you can convert a PDF page to an image first and then use AI tools (e.g., Table Image to CSV Converter) to extract the data from that image.
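The page-to-image step can be sketched in Python. This is a minimal sketch, not a method this page prescribes: it assumes the third-party `pdf2image` package (which needs the Poppler utilities installed on the system), and the function names and output naming scheme are hypothetical.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, out_dir="."):
    """Build the output PNG path for a 1-based page number (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page{page_number}.png"


def pdf_page_to_png(pdf_path, page_number, out_dir=".", dpi=200):
    """Render one page of a PDF to a PNG file and return the file's path."""
    # Assumption: pdf2image and Poppler are installed. The import is lazy so
    # that page_image_path() stays usable without them.
    from pdf2image import convert_from_path

    # first_page and last_page are 1-based; rendering a single page keeps
    # memory use low for large documents.
    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
    )
    out_path = page_image_path(pdf_path, page_number, out_dir)
    images[0].save(out_path, "PNG")
    return out_path
```

For example, `pdf_page_to_png("report.pdf", 2)` would write `report_page2.png` to the current directory, and that image can then be uploaded to a table-extraction tool. A higher `dpi` usually helps such tools read small text, at the cost of larger files.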

AI-powered Tools

Python Libraries

  • pdf2text

  • pdfplumber
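When a PDF still has a readable text layer, a library such as pdfplumber can pull tables out directly. The sketch below assumes the `pdfplumber` package is installed. The helper names `rows_to_csv` and `extract_tables_as_csv` are made up for illustration, and how well tables are detected depends heavily on the layout of the PDF.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize table rows to CSV text, writing empty cells (None) as ''."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables_as_csv(pdf_path):
    """Return one CSV string per table that pdfplumber detects in the PDF."""
    # Assumption: pdfplumber is installed. The import is lazy so that
    # rows_to_csv() stays usable without it.
    import pdfplumber

    csv_tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list
            # of rows, and each row is a list of cell strings (or None).
            for table in page.extract_tables():
                csv_tables.append(rows_to_csv(table))
    return csv_tables
```

If this returns an empty list for a page that clearly contains a table, the page is probably a scanned image or has a damaged text layer, which is the case where converting the page to an image first is the better route.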

