Data analysis pipeline help

Hi! I’m a Java developer working on a sales data project that needs to process a few hundred invoices a day.

I’m developing a Java application that cleans the item descriptions. After that, I run some amount checks to identify outliers, then split each invoice by item and store the result in a bucket as a JSON file. This file will later be used to aggregate the mean and median by item description for further amount checks.
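
To make the aggregation step concrete, here is a minimal sketch of what I have in mind. It is not my actual code: the `ItemStats` class, the `LineItem` and `Summary` records, and the sample values are all made up for illustration, and it assumes Java 16+ for records. It also uses `double` for amounts to keep it short, although `BigDecimal` is probably safer for real currency values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ItemStats {

    // Hypothetical shape of one per-item record after the description is cleaned.
    record LineItem(String description, double amount) {}

    // Mean and median of the amounts seen for one item description.
    record Summary(double mean, double median) {}

    // Median of a non-empty list; sorts a copy so the input is left untouched.
    static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    // Groups amounts by description, then computes mean and median per group.
    static Map<String, Summary> summarize(List<LineItem> items) {
        Map<String, List<Double>> byDescription = new TreeMap<>();
        for (LineItem item : items) {
            byDescription
                    .computeIfAbsent(item.description(), k -> new ArrayList<>())
                    .add(item.amount());
        }
        Map<String, Summary> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : byDescription.entrySet()) {
            List<Double> amounts = entry.getValue();
            double sum = 0;
            for (double amount : amounts) {
                sum += amount;
            }
            result.put(entry.getKey(), new Summary(sum / amounts.size(), median(amounts)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the call.
        List<LineItem> items = List.of(
                new LineItem("widget", 10.0),
                new LineItem("widget", 20.0),
                new LineItem("widget", 60.0),
                new LineItem("gadget", 5.0));
        System.out.println(summarize(items));
    }
}
```

This keeps every amount in memory, which is fine for a few hundred invoices a day but is part of what I’m unsure about as the data grows.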

I also save the invoice metadata as JSON so I can compare inbound with outbound quantities.

The challenge I’m facing is finding the best practices for storing and then processing this data, which I expect to keep growing over the years.

Any thoughts and suggestions are much appreciated. If this kind of question doesn’t belong here, please delete it.