I have an event booking database as above (120,000 rows) with actual utilization (minimum 0, maximum 1), average event price, household income, date, and hour block. I am trying to understand at what price I can maximize utilization in each hour block on different days. What would be an appropriate machine learning algorithm for this? I tried random forest regression, but my accuracy was only 43%. Is there a different algorithm I could use? Any suggestions for the analysis? Thank you for your help!
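For reference, a minimal sketch of the kind of baseline described (the column names and the synthetic data are assumptions, just to make the example self-contained; the real dataset will differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features: price, household income, hour block, day of week.
price = rng.uniform(10, 100, n)
income = rng.normal(60000, 15000, n)
hour_block = rng.integers(0, 8, n)
day = rng.integers(0, 7, n)

# Synthetic utilization in [0, 1] with a peak in price, purely illustrative.
util = np.clip(1 - ((price - 50) / 50) ** 2 + rng.normal(0, 0.05, n), 0, 1)

X = np.column_stack([price, income, hour_block, day])
X_train, X_test, y_train, y_test = train_test_split(X, util, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", r2_score(y_test, model.predict(X_test)))
```

Note that for a regression target like utilization, "accuracy" usually means R² or an error metric such as RMSE, so it is worth checking which one the 43% refers to.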
With the trained model, will you be predicting price by setting actual_utilization to 1?
Assuming there is indeed a peak in a graph of utilization (y-axis) vs. price (x-axis), I may want to find the upward-sloping part of the graph and the downward-sloping part to estimate the location of the peak.
Because there are so many categoricals (hour block, day) and to-be categoricals (binned household_income), there could be an explosion of combinations, with little data in each partition, if you treat every group separately and use partition-based models like random forest. So you could look for correlations between categoricals (such as one column explaining another) and do some feature engineering/selection to shrink the number of groups (e.g., creating a categorical from information in multiple columns, such as binned household income + binned actual utilization), so there are more data members in each group to help with the estimation of the peak.
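Once the groups are coarse enough to hold sufficient data, one simple way to estimate the peak within a group is to fit a quadratic of utilization on price and take the vertex. A sketch with synthetic data (the concavity check and the helper name are my own choices, not anything from the question):

```python
import numpy as np

def estimate_peak_price(price, utilization):
    """Fit utilization ~ a*price^2 + b*price + c and return the vertex
    price -b/(2a) if the parabola is concave (a < 0), else None."""
    a, b, _c = np.polyfit(price, utilization, deg=2)
    if a >= 0:
        return None  # no interior peak: fitted curve is convex or flat
    return -b / (2 * a)

# Synthetic group whose utilization peaks around price = 40.
rng = np.random.default_rng(1)
price = rng.uniform(10, 90, 500)
util = np.clip(1 - ((price - 40) / 60) ** 2 + rng.normal(0, 0.03, 500), 0, 1)

peak = estimate_peak_price(price, util)
print("Estimated peak price:", peak)
```

The upward- and downward-sloping parts mentioned above correspond to the two sides of the fitted parabola, so the vertex is the estimated utilization-maximizing price for that group.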
I can’t think of a model where you can just throw this data in and get a good fit. Have you tried neural networks?
Thank you for the suggestion. I am not really experienced with neural networks, but I will give it a try. Thanks!