Parameter min_samples_split

Screen Link: Learn data science with Python and R projects

This screen is about using the min_samples_split parameter when creating a DecisionTreeClassifier. I read the definition of min_samples_split but still couldn't understand what it means. Can you give me more details about this parameter?

Thank you.

A decision tree consists of a series of nodes, and each internal node splits its samples into child nodes based on a certain criterion.
min_samples_split is the minimum number of samples a node must contain before it is allowed to split. If the number of samples in a node is less than the min_samples_split value, no further split is done on that node and it becomes a leaf (terminal) node. In scikit-learn the default is 2, and a float value is interpreted as a fraction of the total number of training samples.

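The purpose of this parameter is to curb overfitting: the higher the min_samples_split value, the fewer splits the tree can make on small groups of samples, and the lower the chance that the model learns the noise in the data and overfits.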
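Here is a small sketch of how you could see this effect yourself. It is only an illustration: the synthetic dataset from make_classification, the values 2, 20 and 100, and the helper name fit_tree are my own assumptions, not something from the screen. It fits the same classifier with different min_samples_split values and prints how many leaves and how much depth each tree ends up with.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (not the dataset from the screen).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)


def fit_tree(min_samples_split):
    """Hypothetical helper: fit a tree with the given min_samples_split."""
    clf = DecisionTreeClassifier(
        min_samples_split=min_samples_split, random_state=0
    )
    return clf.fit(X, y)


# Fit one tree per candidate value (the values are arbitrary examples).
trees = {m: fit_tree(m) for m in (2, 20, 100)}

for m, clf in trees.items():
    # A node holding fewer than m samples is never split, so larger
    # values of m stop the tree from growing on small groups of samples.
    print(
        f"min_samples_split={m}: "
        f"leaves={clf.get_n_leaves()}, depth={clf.get_depth()}"
    )
```

If you run it, you would typically expect the tree to get smaller as min_samples_split grows, since every node that was split must have held at least that many samples.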


This blog post can be helpful for further explanation:
https://julienbeaulieu.gitbook.io/wiki/sciences/machine-learning/decision-trees


Thank you so much for your help!