How do I select which machine learning model to use for a problem?


I would like more advice on how to select the best machine learning model for a given problem.

There are so many different types. So far I have learned about KNN, deep learning, linear regression, and logistic regression. But when I have a prediction problem, which one should I pick? Or do we usually pick 2-3 models, run them, and test which one has the highest accuracy?

Please advise.



@ashleychoy: my main experience lies in deep learning, so I will address that and leave the rest of the community to address other machine learning algorithms, which I have not really ventured into so far.

Usually, you should follow the universal machine learning workflow:

I quote:

  • Define the problem: What data is available, and what are you trying to predict? Will you need to collect more data or hire people to manually label a dataset?
  • Identify a way to reliably measure success on your goal.
  • Prepare the validation process that you will use to evaluate your models. In particular, you should define a training set, a validation set, and a test set.
  • Vectorize the data by turning it into vectors and preprocessing it in a way that makes it more easily approachable by a neural network.
  • Develop a first model that beats a trivial common-sense baseline.
  • Gradually refine your model architecture by tuning hyperparameters and adding regularization.
  • Be aware of validation-set overfitting when tuning hyperparameters (i.e. overspecialized to the validation set).
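As a rough illustration of the workflow above, here is a minimal sketch that splits data into training, validation, and test sets, fits a trivial baseline and one candidate model, picks the better one on the validation set, and only then looks at the test set. The data is synthetic, and the helper names (`fit_mean_baseline`, `fit_linear`) are made up for this example; in practice you would probably use a library such as scikit-learn or Keras rather than hand-written fitting code.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_mean_baseline(xs, ys):
    """Trivial baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """One-feature least-squares linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# 60% training, 20% validation, 20% test.
train, val, test = slice(0, 180), slice(180, 240), slice(240, 300)

candidates = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
models, val_scores = {}, {}
for name, fit in candidates.items():
    models[name] = fit(xs[train], ys[train])
    val_preds = [models[name](x) for x in xs[val]]
    val_scores[name] = mean_absolute_error(ys[val], val_preds)

# Choose on the validation set; touch the test set once, at the very end.
best_name = min(val_scores, key=val_scores.get)
test_mae = mean_absolute_error(ys[test], [models[best_name](x) for x in xs[test]])
print(val_scores, best_name, test_mae)
```

The same loop extends to more candidates: add them to `candidates` and compare validation scores. If you compare many models this way, the validation score itself becomes optimistic, which is why the test set is held back.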

So, based on what you want to predict: if it's a binary/categorical classification of images (of animals or food, for instance), you would use a Convolutional Neural Network. If the dataset is small, you may choose to use data augmentation. However, when dealing with sequence data such as text (in NLP) or audio, you may instead choose a Recurrent Neural Network to predict, say, the next word in a phrase given the first few words.

Next, find a way to measure success. Often this is the accuracy of the model for classification, or the mean squared error or mean absolute error for regression.

The weights of the neural network are typically initialized somewhat randomly, and during training the model adjusts its weights and biases based on the loss score. You could plot training and validation accuracy/loss against the epoch to determine at which epoch the model starts to overfit. You may then need to tune hyperparameters such as the activation function at each layer or the number of epochs, add dropout layers, and so on. Repeat these steps until you are satisfied that the model has reached the best accuracy you can get.
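To make the "find the epoch where it overfits" step concrete, here is a small sketch that reads a validation-loss history and reports the best epoch, plus the epoch at which a simple patience-based early-stopping rule would halt. The loss values are made-up numbers for illustration only, and both function names are hypothetical. Deep learning libraries usually ship their own early-stopping utilities, which would be the natural choice in real training code.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1

def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop: the first epoch
    at which validation loss has failed to improve `patience` times in a row."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: ran all epochs

# Made-up histories: training loss keeps falling, validation loss turns upward.
train_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.17, 0.13]
val_loss = [0.95, 0.70, 0.58, 0.52, 0.50, 0.53, 0.57, 0.62]

print(best_epoch(val_loss))               # 5
print(early_stopping_epoch(val_loss, 2))  # 7
```

In this made-up history the training loss keeps improving after epoch 5 while the validation loss gets worse, which is the usual sign of overfitting.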

Hope this helps!


@masterryan.prof, thanks for your detailed guidance.

So, I have two more questions now:

  • If I am using time series data, such as the market price of energy every 30 minutes, and I want to use the past 6 hours of records to predict the next 30-minute record, should I just pick linear regression?
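Whichever model ends up being used, the setup described here (6 hours of 30-minute records, so 12 lagged values, predicting the next one) can be framed as a supervised learning problem with a sliding window. The sketch below is only an illustration: `make_windows` is a hypothetical helper and `prices` is placeholder data, not real market prices. It also computes a persistence baseline ("the next value equals the latest one"), which is a common-sense baseline that a linear regression on the lagged values, or any other model, would need to beat.

```python
def make_windows(series, n_lags=12):
    """Build (features, target) pairs: each target is paired with the
    n_lags values that immediately precede it."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

# Placeholder series: two "days" of 30-minute readings (48 per day).
prices = [float(i % 48) for i in range(96)]

# 6 hours of history at 30-minute resolution = 12 lagged values per sample.
X, y = make_windows(prices, n_lags=12)

# Persistence baseline: predict that the next value equals the latest one.
persistence_pred = [window[-1] for window in X]
persistence_mae = sum(abs(t - p) for t, p in zip(y, persistence_pred)) / len(y)
print(len(X), len(X[0]), persistence_mae)
```

With time series, the training/validation/test split should normally be chronological (train on the earlier part, validate on the later part) rather than shuffled, so that the model is never evaluated on data older than what it was trained on.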

Also, I cannot use the mean absolute percentage error (MAPE) when my actual data contains 0 values. Could you please advise whether I should switch to sMAPE or MAAPE?

Please advise.
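For reference, here is a minimal sketch of how the three metrics behave when an actual value is 0. It uses one common definition of sMAPE (several variants exist) and the arctangent form of MAAPE. Treating the case where both the actual and the forecast are 0 as zero error is an assumption made for this example, not part of either definition. The sample numbers are made up.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error; fails if any actual value is 0."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, using the 2|f - a| / (|a| + |f|) variant."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumption: count a = f = 0 as zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(f - a) / denom
    return total / len(actual)

def maape(actual, forecast):
    """Mean arctangent absolute percentage error."""
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # arctan of an infinite ratio is pi/2; a = f = 0 is counted
            # as zero error (an assumption, as above).
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)

actual = [0.0, 10.0, 20.0]
forecast = [1.0, 12.0, 20.0]

try:
    mape(actual, forecast)
except ZeroDivisionError:
    print("MAPE is undefined when an actual value is 0")

print(smape(actual, forecast))
print(maape(actual, forecast))
```

Note that with this sMAPE variant, any nonzero forecast against an actual of 0 contributes the maximum value of 2 regardless of how small the miss is, while MAAPE contributes its maximum of pi/2 in the same situation. Neither metric distinguishes a small miss from a large one at those points, so if zeros are frequent, a scale-based metric such as the mean absolute error may be worth reporting alongside.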



Hi @ashleychoy: I’m not very familiar with specific machine learning algorithms, so I am not in the best position to answer you at this point. Perhaps @Rucha could offer some additional advice.



hey @masterryan.prof and @ashleychoy

I am a student at DQ too, and I come second :trophy: in a race with the snail studying along with me!

I have yet to start the ML part of my DS track. So all I did was google “how to choose best ml algo”, and these were the top 3 posts.

  1. (https://towardsdatascience.com/do-you-know-how-to-choose-the-right-machine-learning-algorithm-among-7-different-types-295d0b0c7f60)

  2. (https://docs.microsoft.com/en-us/azure/machine-learning/how-to-select-algorithms)

  3. (https://blog.statsbot.co/machine-learning-algorithms-183cc73197c)

There are many others, and they may not directly help you. They might just give you more to research.

Apologies I am of no help here :frowning:


Thanks a lot for all of your help.

Let me try to play around and understand how predictive modelling works with time series data first :slight_smile:
