I’ve been working on a project that extracts data from a PDF document, then transforms and loads it into a more useful format. It uses the PyPDF2 library and relies heavily on regular expressions.
The project is essentially complete, but with plenty of room for improvement. I would welcome any comments, suggestions, or questions.
I’m posting the Jupyter Notebook here. I’ve also posted all the files for the project on a GitHub repository.
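For readers who want a feel for the general approach before opening the notebook, here is a minimal sketch of pulling text out of a PDF with PyPDF2 and parsing it with a regular expression. It is not the code from the project: the record layout (name, ISO date, amount), the pattern `RECORD_RE`, and the helper names `read_pdf_text` and `parse_records` are all assumptions made up for illustration, and `PdfReader` assumes PyPDF2 2.0 or later.

```python
import re

# Hypothetical record layout, assumed purely for illustration:
# "<name> <YYYY-MM-DD> <amount with two decimals>", one record per line.
RECORD_RE = re.compile(
    r"^(?P<name>[A-Za-z ]+?)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<amount>\d+\.\d{2})$",
    re.MULTILINE,
)


def read_pdf_text(path):
    """Concatenate the extracted text of every page in the PDF at `path`."""
    # Imported lazily so the parsing code below runs without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can yield nothing for image-only pages, hence the `or ""`.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_records(text):
    """Turn every line matching RECORD_RE into a dict; other lines are skipped."""
    return [
        {"name": m["name"], "date": m["date"], "amount": float(m["amount"])}
        for m in RECORD_RE.finditer(text)
    ]


# Stand-in for text extracted from a PDF, so the example runs without a file.
sample = "Alice Smith 2021-03-04 12.50\nnot a record\nBob Jones 2021-03-05 7.25\n"
records = parse_records(sample)
```

With a real file, the same parsing step would be `parse_records(read_pdf_text("some_file.pdf"))`, where the filename is a placeholder. Keeping extraction and parsing in separate functions makes it possible to test the patterns against plain strings.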
First of all, sincere apologies for this delayed response. I came across your project the same week you uploaded it, but somehow I couldn’t respond at the time, and afterwards I lost track of it. I searched for it using “PDF” as a keyword but only found other posts.
Thank you for sharing the project. I hope to break down the regex patterns you have devised so that I can understand them better myself.