Screens: Data Cleaning Walkthrough: Analyzing and Visualizing the Data, Screen 5 to Screen 7.
The sat_score column in combined has the value 0 (zero) in some of the rows, so when I plot the scatterplot there are some points at the bottom that are messing with the plot and, consequently, with the rest of the exercises.
How can I remove the 0 values so that I can have a clean scatter plot?
Everything stemmed from this code, where we converted the values to numbers using pd.to_numeric() with the argument errors='coerce':

```python
sat_columns = ["SAT Math Avg. Score", "SAT Critical Reading Avg. Score", "SAT Writing Avg. Score"]

# converting the values to numeric using pd.to_numeric()
for i in sat_columns:
    data["sat_results"][i] = pd.to_numeric(data["sat_results"][i], errors='coerce')

data["sat_results"]["sat_score"] = data["sat_results"][sat_columns].sum(axis=1)
data["sat_results"]["sat_score"]
```
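One possible source of the zeros, sketched under assumptions: errors='coerce' turns unparseable entries into NaN, and DataFrame.sum(axis=1) skips NaN by default, so a row where all three scores are missing sums to 0 instead of NaN. The small sat_results frame below is a hypothetical stand-in for data["sat_results"], and the "s" placeholder for a missing score is an assumption. Passing min_count is one way to keep such rows as NaN.

```python
import numpy as np
import pandas as pd

sat_columns = [
    "SAT Math Avg. Score",
    "SAT Critical Reading Avg. Score",
    "SAT Writing Avg. Score",
]

# Hypothetical stand-in for data["sat_results"]; "s" mimics a non-numeric placeholder.
sat_results = pd.DataFrame({
    "SAT Math Avg. Score": ["404", "s"],
    "SAT Critical Reading Avg. Score": ["355", "s"],
    "SAT Writing Avg. Score": ["363", "s"],
})

# Non-numeric entries become NaN.
for col in sat_columns:
    sat_results[col] = pd.to_numeric(sat_results[col], errors="coerce")

# Default behaviour: NaN values are skipped, so an all-NaN row sums to 0.
default_sum = sat_results[sat_columns].sum(axis=1)

# Requiring all three scores to be present keeps the incomplete row as NaN.
strict_sum = sat_results[sat_columns].sum(axis=1, min_count=len(sat_columns))

print(default_sum.tolist())
print(strict_sum.tolist())
```

If this is what is happening, the data set itself is not wrong: the 0 rows are schools with no reported scores, not schools that scored 0.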
Is the data set wrong? Can anyone give me a way to filter it?
Also, when selecting the names of the schools that had a sat_score lower than 1000, I got a lot of schools with a score of 0. Is this correct?
I proceeded to filter them out using
low_enrollment = low_enrollment[low_enrollment['sat_score'] > 0] so I can get the schools that actually have an SAT score. Would this method of filtering out the schools with a 0 be correct in a real-life situation?
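As an alternative to filtering each subset separately, here is a minimal sketch that marks the 0 placeholders as missing once, so that later plots and filters can exclude them. The combined frame, its values, and the "SCHOOL NAME" and total_enrollment columns are hypothetical stand-ins, and treating every 0 as "no score reported" is an assumption that should be checked against the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the combined DataFrame.
combined = pd.DataFrame({
    "SCHOOL NAME": ["School A", "School B", "School C"],
    "total_enrollment": [400, 350, 900],
    "sat_score": [1122.0, 0.0, 980.0],
})

# Treat 0 as "no score reported" rather than a real score (assumption).
cleaned = combined.copy()
cleaned["sat_score"] = cleaned["sat_score"].replace(0, np.nan)

# Drop the rows without a score before plotting or filtering.
with_scores = cleaned.dropna(subset=["sat_score"])

# Schools with a real SAT score below 1000.
low_sat = with_scores[with_scores["sat_score"] < 1000]
print(low_sat["SCHOOL NAME"].tolist())
```

Keeping the missing rows as NaN in cleaned, rather than deleting them from combined, leaves the other columns for those schools available for analyses that do not involve sat_score.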
I know there are a lot of questions, and I would highly appreciate your help. Thank you for taking the time to read and answer them.
Thank you in advance!