Hi,
I just completed the first project. I decided to aggregate about half of the brands into an “Other” brand so that they could still be included in the analysis. This involved selecting the brands that each appeared in less than 2% of the rows in the overall dataset, then building a loop that looked for each of those brands in the dataframe and replaced the actual brand with “Other”.
Is there a more efficient way to do this? And on second thought, I probably shouldn’t have overwritten the column, but instead created a second column, “brand2”, so I didn’t destroy the original data.
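To make the question concrete, here is a rough sketch of the kind of loop-free version I have in mind. It runs on a made-up toy dataframe, not the notebook's data. The column name `brand`, the toy values, and the variable names `brand_share` and `rare_brands` are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for the real dataset (hypothetical values).
df = pd.DataFrame({"brand": ["bmw"] * 60 + ["audi"] * 39 + ["lada"] * 1})

# Fraction of rows that each brand accounts for.
brand_share = df["brand"].value_counts(normalize=True)

# Brands that fall below the 2% threshold.
rare_brands = brand_share[brand_share < 0.02].index

# Keep the original column untouched and write the grouped labels to "brand2":
# rows whose brand is not rare keep their value, the rest become "Other".
df["brand2"] = df["brand"].where(~df["brand"].isin(rare_brands), "Other")
```

If this is right, `value_counts`, `isin`, and `where` would do the whole replacement in one pass over the column instead of once per rare brand, and the original `brand` column would stay intact.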
GuidedProjectOne.ipynb (250.5 KB)