I have 170 different models of ~5 MB each on disk. The in-memory footprint of each model is roughly 50 MB (± 10 MB), and loading a single model takes 1-2 minutes. The problem at hand is a multi-label prediction task, so I load all the models up front, before inference, since this significantly reduces inference time. On my 8 GB RAM machine, loading all the models occupies nearly all of my memory.
When the same code is dockerized and deployed on a 16 GB VM, the models consume nearly all 16 GB of memory.
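For reference, the loading pattern looks roughly like the sketch below. It eagerly unpickles every model file into a dict and then reports peak resident memory. The `models/*.pkl` layout and the use of `pickle` are assumptions for illustration; the actual serialization format may differ.

```python
import glob
import pickle
import resource

# Assumed layout: one serialized model per file under ./models/ (hypothetical path)
models = {}
for path in glob.glob("models/*.pkl"):
    with open(path, "rb") as f:
        models[path] = pickle.load(f)  # each model ~50 MB resident once loaded

# Peak resident set size of this process (KB on Linux, bytes on macOS)
peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"Loaded {len(models)} models, peak RSS so far: {peak_rss}")
```

With 170 models at ~50 MB each, this naive approach needs on the order of 8.5 GB just for the model objects, before accounting for interpreter, library, and allocator overhead, which is consistent with the numbers observed above.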