How I finally solved ABN's data challenge
In the previous article, I discussed the challenges of obtaining processable bank statements from ABN-AMRO for data older than 18 months. One of the main obstacles was that the scanned PDFs of the bank statements did not have a clear table structure, making it difficult for automated methods to accurately extract data. However, I’m excited to share that I’ve found a solution to overcome this challenge.
To start, I converted the scanned PDFs of bank statements into images for further processing. However, automated methods did not recognize these images as tables, because the statements have an inconsistent layout and no explicit structure. To address this, I developed a script that enhances the images and draws lines on them to give them a more readable, table-like format. This involved applying image processing techniques to improve the clarity and layout of the statements, making them more amenable to automated data extraction.
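To illustrate the idea of adding lines, here is a minimal sketch of one possible approach, not the actual script. It assumes the page is already available as a grid of grayscale values (0 is black, 255 is white), for example loaded with an imaging library, and it draws a horizontal rule in the middle of every blank band between two rows of text. The function names and the threshold of 128 are hypothetical.

```python
def find_row_gaps(pixels, dark_threshold=128):
    """Return the middle row index of each blank band between two text bands."""
    # A row "has ink" if at least one pixel is darker than the threshold.
    has_ink = [any(value < dark_threshold for value in row) for row in pixels]
    gaps = []
    start = None        # first row of the blank band currently being scanned
    seen_text = False   # used to skip the blank margin above the first text row
    for y, ink in enumerate(has_ink):
        if ink:
            if start is not None and seen_text:
                # The blank band spans rows start .. y - 1; take its middle.
                gaps.append((start + y - 1) // 2)
            start = None
            seen_text = True
        elif start is None:
            start = y
    # A blank band that runs to the bottom of the page is ignored as a margin.
    return gaps


def draw_horizontal_lines(pixels, rows, colour=0):
    """Return a copy of the image with a full-width line on each given row."""
    out = [list(row) for row in pixels]
    for y in rows:
        out[y] = [colour] * len(out[y])
    return out
```

Calling `draw_horizontal_lines(page, find_row_gaps(page))` returns a new grid and leaves the original untouched. Vertical column separators could be added the same way by scanning columns instead of rows.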
With the enhanced images, I was able to use Amazon Web Services to extract data from the images and organize it in a structured table format. I then used the pandas library for data manipulation and cleaning of the extracted data. This involved handling inconsistencies in the data and ensuring that the extracted information was accurate and reliable.
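As a rough sketch of what the pandas cleaning step can look like: the column names `date`, `description` and `amount`, the day-month-year date format, and the Dutch number notation (`.` for thousands, `,` for decimals) are all assumptions made for illustration, not details of the actual extraction output. Cells that fail to parse, typically because of OCR misreads, are flagged rather than dropped.

```python
import pandas as pd


def clean_statement(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert extracted text cells to typed columns and flag doubtful rows."""
    df = raw.copy()
    # Extracted cells often carry stray whitespace.
    for col in df.columns:
        df[col] = df[col].astype(str).str.strip()

    # Assumed Dutch notation: "1.234,56" becomes "1234.56" before conversion.
    amount = (
        df["amount"]
        .str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False)
    )
    # errors="coerce" turns unparseable values into NaN / NaT instead of raising.
    df["amount"] = pd.to_numeric(amount, errors="coerce")
    df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y", errors="coerce")

    # Mark rows that need a manual look instead of silently discarding them.
    df["needs_review"] = df["amount"].isna() | df["date"].isna()
    return df
```

Keeping a `needs_review` column makes it easy to filter the rows that need manual correction later.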
Additionally, I needed to upload the pandas DataFrames to Google for amending and download them again for further processing.
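One simple way to support such a round trip is to exchange the data as CSV. The sketch below assumes CSV as the transfer format and reuses the hypothetical column names `date`, `description` and `amount`; the upload and download themselves are left out. The function names are made up for this example.

```python
import io

import pandas as pd


def export_for_review(df: pd.DataFrame, target) -> None:
    """Write the DataFrame as CSV without the pandas index column."""
    df.to_csv(target, index=False)


def import_after_review(source) -> pd.DataFrame:
    """Read the amended CSV back, restoring the date column as datetimes."""
    return pd.read_csv(source, dtype={"description": str}, parse_dates=["date"])
```

Both functions accept a file path or a file-like object such as `io.StringIO`, which is convenient for trying the round trip without touching the disk.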
Despite the additional effort required to enhance the images and make them more table-like, I am pleased with the solution I developed. If you’re facing a similar issue with obtaining bank statements in a processable format, don’t hesitate to reach out to me for assistance in overcoming this challenge and preparing your data for your accounting software.