This is an independent project modeled on the guided project from the “Machine Learning Fundamentals” course in the Python for Data Science track, which practices KNN (K-Nearest Neighbors).
My project seeks to predict the age of abalone. Abalone are large marine snails that can live up to 40 years. They have beautiful, shiny white shells, which Native Americans widely used as money along the Pacific Coast, especially in California. They can grow as large as a small dinner plate. Here is a picture so you can see how cool they are. I always look for them when I go to the beach, although they are rare to find:
I used the abalone age-prediction dataset from UC Irvine’s Machine Learning Repository. The dataset is attached below. Scientists can estimate the age of an abalone in years by counting the number of rings on its shell and adding 1.5. In my project, the number of rings is the target variable, and physical measurements of the abalone such as weight, height, and diameter are the features.
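The rings-plus-1.5 rule and the feature/target split described above can be sketched in a few lines. This is only an illustration: the rows are made-up values, not records from abalone.csv, and the column names (`diameter`, `height`, `whole_weight`, `rings`) and the helper `age_in_years` are assumptions chosen for the example.

```python
# Made-up rows for illustration only; these are NOT records from abalone.csv,
# and the column names are assumed.
rows = [
    {"diameter": 0.40, "height": 0.10, "whole_weight": 0.50, "rings": 10},
    {"diameter": 0.30, "height": 0.08, "whole_weight": 0.25, "rings": 7},
]

FEATURES = ["diameter", "height", "whole_weight"]  # model inputs
TARGET = "rings"                                   # value to predict


def age_in_years(rings):
    """Estimated age from the ring count: rings + 1.5 (hypothetical helper)."""
    return rings + 1.5


# Split each row into a feature vector and a target value.
X = [[row[name] for name in FEATURES] for row in rows]
y = [row[TARGET] for row in rows]

# A predicted ring count can be converted to an age estimate afterwards.
ages = [age_in_years(r) for r in y]
```

Because age is just the ring count shifted by a constant, an RMSE measured in rings is the same as the RMSE measured in years.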
I have a specific question about my project and need your help:
I could easily find a combination of features that reduced the RMSE. However, once I switched to searching for the best value of k, my RMSE jumped much higher for every value of k I tried. What does this mean, and how should I interpret it in my results?
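For context, here is a minimal sketch of the kind of k sweep described above. It is not the code from my project: the data is synthetic, the helpers `knn_predict` and `rmse` are hypothetical names written in plain Python so the example runs without extra libraries, and a real run would more likely use scikit-learn's `KNeighborsRegressor`. The point it illustrates is that RMSE values are only comparable across values of k when the features and the train/test split are the same ones used in the feature search.

```python
import math
import random


def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training rows closest to `query`."""
    nearest = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / k


def rmse(train_X, train_y, test_X, test_y, k):
    """Root mean squared error of k-NN predictions on the held-out rows."""
    squared = [
        (knn_predict(train_X, train_y, x, k) - t) ** 2
        for x, t in zip(test_X, test_y)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-in data (NOT abalone.csv): the target depends on two
# made-up features plus Gaussian noise.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [10 * a + 5 * b + rng.gauss(0, 1) for a, b in X]

# One fixed split, reused for every k, so the errors are comparable.
train_X, test_X = X[:150], X[150:]
train_y, test_y = y[:150], y[150:]

# Sweep k with everything else held constant.
results = {k: rmse(train_X, train_y, test_X, test_y, k) for k in range(1, 21)}
best_k = min(results, key=results.get)

for k, err in results.items():
    print(f"k={k:2d}  RMSE={err:.3f}")
print("best k:", best_k)
```

If the errors from a sweep like this are all far above the error from the feature search, the first thing to compare is whether both steps used the same feature columns, the same split, and the same scaling.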
Thank you in advance for your time and help. It will greatly enhance my education!
abalone.csv (187.5 KB)