Parameter "Max_Features" is a Valid Hyper-parameter for RandomForestRegressor but Causes Error when Passed to RandomSearchCV

Screen Link:

While working through this guided project, I was forced to avoid adding max_features as a parameter to be optimized by the randomized search estimator. The parameter max_features is listed as a parameter of RandomForestRegressor in the scikit-learn documentation. It also appears when calling the get_params method, a method that is supposed to list all "optimizable" parameters of a model. For some reason, regardless of whether the RandomForestRegressor model I tried to optimize had bootstrap set to True, adding a dictionary item with max_features as a key caused my code to return the error:

ValueError: Invalid parameter max_samples for estimator RandomForestRegressor. Check the list of available parameters with `estimator.get_params().keys()`.
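As the error message itself suggests, one quick diagnostic (not part of the original notebook) is to print exactly which hyper-parameter names the installed estimator accepts and check the keys in question against them:

```python
from sklearn.ensemble import RandomForestRegressor

# List every parameter name the installed version of
# RandomForestRegressor will accept in a search dictionary.
valid_params = RandomForestRegressor().get_params().keys()
print(sorted(valid_params))

# Check the specific keys in question.
print("max_features" in valid_params)
print("max_samples" in valid_params)  # False on older scikit-learn versions
```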

Here is the entire code I used when attempting to run randomized search optimization on a RandomForestRegressor instance. It is a Python notebook exported as a plain Python file.

# ### Conclusions About Results of Decision Tree Regressor
# Any hyper-parameter of the randomized search that ended up stuck at the minimum or maximum of its range will have to be explored further, with the range of values extended accordingly. This avoids the case where the hyper-parameter score stops improving only because the search hit the limit of the allowed values, even though moving further would yield a higher accuracy score.

# ## Modeling Using a Random Forest Regressor

# In[102]:

import numpy as np

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

# In[103]:

#For reference, the previous hyper-parameter value ranges are copied.
parameter_values = {
    "max_depth": range(1, 25),
    "max_features": np.linspace(start = 0.10, stop = 1.0, num = 10),
    "min_samples_leaf": np.linspace(start = 0.05, stop = 0.50, num = 10),
    "min_samples_split": np.linspace(start = 0.05, stop = 0.50, num = 10),
    #"max_samples": [n for n in range(500, len(df_rentals), 100)]
    "max_samples": np.linspace(start = 0.20, stop = 1.0, num = 10)
}

mdl_random_forest_optimized = RandomForestRegressor(bootstrap = True)

#The best scoring hyper-parameter settings from the simple decision tree:

# In[104]:

#Add item for hyper-parameter choices:
for param, val_array in parameter_values.items():
    print(param, val_array)

cv_randomized_search_random_forest = RandomizedSearchCV(
    estimator = mdl_random_forest_optimized,
    param_distributions = parameter_values,
    n_iter = 64,
    n_jobs = -1,
    cv = 5
)

# In[105]:

cv_randomized_search_random_forest.fit(X = df_rentals.drop(columns = ["cnt"]), y = df_rentals["cnt"])

# In[ ]:

mdl_random_forest_reg_optimized = cv_randomized_search_random_forest.best_estimator_

#  ### Cross-Validation on Random Forest Regressor With Best Hyper-Parameters

# In[ ]:

ndar_crossval_scores = cross_val_score(
    estimator = mdl_random_forest_reg_optimized,
    X = df_rentals.drop(columns = ["cnt"]),
    y = df_rentals["cnt"],
    n_jobs = -1,
    scoring = "neg_mean_squared_error",
    cv = 5
)

# In[ ]:

#Average RMSE across the folds; the scores are negative MSE values.
rmse = np.mean( np.sqrt(np.abs(ndar_crossval_scores)) )

The issue is that you are checking the documentation for the latest stable release of sklearn.

The documentation you linked is for version 0.23, but Dataquest uses version 0.18.

So you will have to check the documentation for that version instead. The left side of the page has an Other versions link, from which you can access the correct documentation.
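To confirm which scikit-learn version is actually installed in your environment (and therefore which documentation page applies), you can print the version string:

```python
import sklearn

# The installed version determines which hyper-parameters exist.
print(sklearn.__version__)
```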

Here is the direct link -

max_samples is not present in version 0.18. max_samples controls the sub-sample size, and for 0.18 it’s stated that -

The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).
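A defensive pattern (my own sketch, not from the original post) is to filter the candidate dictionary against get_params() before passing it to RandomizedSearchCV, so the search only ever sees keys the installed version supports:

```python
from sklearn.ensemble import RandomForestRegressor

def supported_params(estimator, candidate_params):
    """Keep only the candidate hyper-parameters that the
    installed estimator version actually accepts."""
    valid = estimator.get_params().keys()
    return {k: v for k, v in candidate_params.items() if k in valid}

candidate = {
    "max_depth": range(1, 25),
    "max_samples": [0.5, 0.75, 1.0],  # dropped if the installed version lacks it
}
safe_grid = supported_params(RandomForestRegressor(), candidate)
print(sorted(safe_grid.keys()))
```

Keys that the installed version does not recognize are silently omitted, so the same notebook runs on both old and new scikit-learn versions.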