Incremental Learning in Production Environment

How does one build an incremental learning model that serves predictions and receives incremental updates concurrently? It feels like a distributed-database kind of question involving eventual consistency.

My goal is to have a web app that acts like a contextual bandit. It needs to respond to requests for which bandit to show and also update from a queue once the reward is realized. In most examples I’ve seen, the reward is known immediately, so serving and updating can be handled in a single transaction. In my use case, I want to serve a recommendation whose reward is a purchase, which may not be known until minutes or hours later.
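To make the setup concrete, here is a rough sketch of the serve-now, update-later flow I have in mind. Everything in it is an assumption for illustration: the class name `DelayedRewardBandit`, the epsilon-greedy policy, the per-arm linear model, and the in-memory `pending` dict (which in a real deployment would probably be a database or cache with an expiry). It is not a recommended implementation, just the shape of the problem.

```python
import random
import uuid


class DelayedRewardBandit:
    """Hypothetical epsilon-greedy bandit with one linear model per arm."""

    def __init__(self, arms, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        self.weights = {arm: [0.0] * n_features for arm in self.arms}
        # Decisions that were served but have no reward yet, keyed by id.
        self.pending = {}

    def _score(self, arm, context):
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def recommend(self, context):
        """Serve path: pick an arm and remember what was shown."""
        if self.rng.random() < self.epsilon:
            arm = self.rng.choice(self.arms)  # explore
        else:
            arm = max(self.arms, key=lambda a: self._score(a, context))
        decision_id = str(uuid.uuid4())
        self.pending[decision_id] = (arm, list(context))
        return decision_id, arm

    def record_reward(self, decision_id, reward):
        """Update path: called by the queue consumer when a reward arrives."""
        entry = self.pending.pop(decision_id, None)
        if entry is None:
            return False  # unknown, expired, or duplicate event
        arm, context = entry
        # One SGD step on squared error for the arm that was shown.
        error = reward - self._score(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x for w, x in zip(self.weights[arm], context)
        ]
        return True
```

The web request would call `recommend(context)` and hand the `decision_id` to the client. The queue consumer would call `record_reward(decision_id, 1.0)` when the purchase event shows up, possibly hours later. Decisions that never receive a reward would need a timeout that either drops them or counts them as zero reward, which this sketch leaves out.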

In addition, I am not sure how, when a single worker in the web application is updating and serving, all the other workers can use the same “copy” of the model without redeploying the web application.
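One pattern I could imagine for this (I am not sure it is the right one) is to have a single updater publish versioned snapshots of the model to a shared store, with every web worker checking the version and reloading when it changes. The sketch below uses an in-memory class as a stand-in for that store; `InMemoryModelStore`, `ServingWorker`, and their methods are hypothetical names, and in practice the store would be something external to the worker processes, such as a cache or object storage.

```python
import json
import threading


class InMemoryModelStore:
    """Stand-in for a shared external store (assumption for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._blob = json.dumps({})

    def publish(self, weights):
        """Called only by the updater: store a new snapshot, bump the version."""
        with self._lock:
            self._version += 1
            self._blob = json.dumps(weights)
            return self._version

    def latest_version(self):
        with self._lock:
            return self._version

    def fetch(self):
        """Return the current version together with its weights."""
        with self._lock:
            return self._version, json.loads(self._blob)


class ServingWorker:
    """A web worker that serves from a local copy and refreshes on demand."""

    def __init__(self, store):
        self.store = store
        self.version, self.weights = store.fetch()

    def refresh(self):
        """Cheap version check; reload only if the updater has published."""
        if self.store.latest_version() != self.version:
            self.version, self.weights = self.store.fetch()
            return True
        return False
```

Each worker would call `refresh()` on a timer or every N requests. Workers can briefly serve from different versions, which is the eventual-consistency trade-off: predictions may lag the newest update by one refresh interval, but nothing needs to be redeployed.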

Lots of questions here - let me know what you think!


Seems like a Reinforcement Learning problem? I haven’t really dived into RL myself, but I’m interested in it, so maybe someone else will be able to better assist you. Or did you already find the answer to your question somewhere else? Curious to know!