Parameter min_samples_split

Screen Link: Learn data science with Python and R projects

This screen is about using the min_samples_split parameter when creating a DecisionTreeClassifier. I read the definition of min_samples_split but still couldn’t understand it. Could you give me more details about this parameter?

Thank you.

A decision tree consists of a series of nodes, and each internal node splits its samples into child nodes based on some criterion.
min_samples_split is the minimum number of samples a node must contain before it can be split further. If the number of samples in a node is less than the min_samples_split value, no further split is done on that node and it becomes a leaf (terminal) node.

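The purpose of this is to curb overfitting: the higher the min_samples_split value, the fewer splits the tree can make on small groups of samples, and the lower the chance that the model learns the noise in the data and overfits.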
This blog post can be helpful for further explanation
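
To make the effect concrete, here is a minimal sketch that fits two trees and compares them. The synthetic dataset from make_classification, the value 50, and the helper names fit_tree and internal_node_sizes are my own assumptions for illustration; they are not from the course screen.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (flip_y randomly flips some labels).
# This dataset is an assumption for illustration only.
X, y = make_classification(
    n_samples=500, n_features=10, flip_y=0.1, random_state=0
)


def fit_tree(min_samples_split):
    """Fit a tree that only splits nodes holding at least min_samples_split samples."""
    clf = DecisionTreeClassifier(
        min_samples_split=min_samples_split, random_state=0
    )
    return clf.fit(X, y)


def internal_node_sizes(clf):
    """Return the sample counts of every node that was actually split."""
    tree = clf.tree_
    is_internal = tree.children_left != -1  # leaves have no children (-1)
    return tree.n_node_samples[is_internal]


default_tree = fit_tree(2)      # 2 is the scikit-learn default
restricted_tree = fit_tree(50)  # nodes with fewer than 50 samples become leaves

print("nodes with min_samples_split=2: ", default_tree.tree_.node_count)
print("nodes with min_samples_split=50:", restricted_tree.tree_.node_count)
print("smallest node that was split:   ", internal_node_sizes(restricted_tree).min())
```

The restricted tree should come out much smaller, and no node in it with fewer than 50 samples is ever split. Note that scikit-learn also accepts a float for min_samples_split, in which case it is treated as a fraction of the training samples rather than a count.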


Thank you so much for your help!