Joined data subsets not returning the correct number of rows when function is applied

Screenshots have been uploaded.

I have filtered my dataset by grade level and inner-joined the Mathematics and ELA exam results per grade level on the ‘DBN’ column. As can be seen from the ‘initial_joins’ screenshot, this worked as intended; the number of printed rows is displayed below the join code. However, when I apply a function to these newly created joined subsets (‘applied_function’ screenshot) and print each subset with the function applied, it no longer shows the correct number of rows for each subset. Rather, it shows the Grade 3 result for each grade level (749 rows - ‘function_applied’ screenshot).
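For readers without the screenshots, here is a rough sketch of the setup described above: filter each exam table to one grade, then inner-join Mathematics and ELA on DBN. The real column names aren't shown in the post, so `Grade`, `School_Name`, `Mean_Score`, the toy rows, and the helper `join_grade` are all hypothetical. Note that columns present in both tables get the default `_x`/`_y` suffixes, which is where a name like `School_Name_x` comes from.

```python
import pandas as pd

# Hypothetical toy data; 'Grade', 'School_Name' and 'Mean_Score' are assumed column names.
math_results = pd.DataFrame({
    "DBN": ["01M015", "01M019", "01M020", "01M015"],
    "School_Name": ["PS 015", "PS 019", "PS 020", "PS 015"],
    "Grade": ["3", "3", "3", "4"],
    "Mean_Score": [667, 672, 660, 670],
})
ela_results = pd.DataFrame({
    "DBN": ["01M015", "01M019", "01M015"],
    "School_Name": ["PS 015", "PS 019", "PS 015"],
    "Grade": ["3", "3", "4"],
    "Mean_Score": [659, 664, 661],
})

def join_grade(grade):
    """Filter both tables to one grade, then inner-join them on DBN (hypothetical helper)."""
    math_g = math_results[math_results["Grade"] == grade]
    ela_g = ela_results[ela_results["Grade"] == grade]
    # Overlapping column names get the default suffixes _x (math) and _y (ELA).
    return math_g.merge(ela_g, on="DBN", how="inner")

grade_3_combined = join_grade("3")
grade_4_combined = join_grade("4")
print(grade_3_combined.shape[0], grade_4_combined.shape[0])  # 2 1
```

Only DBNs present in both tables survive the inner join, so each grade's row count can differ.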

Any insight would be appreciated. Thanks.



You did well in trying not to repeat any work, though you kind of ended up doing both :)

I see where you were going with it. You don't strictly need the function, but I can see why you would want one; functions are mainly there for reusability. If you want to stick with your current approach, you could do it as below.
Notice I added the inplace=True parameter to the methods called on each df. Without it, drop and rename return a new DataFrame and leave the original untouched; with it, they modify the original df directly. If you now run grade_3_combined.shape[0], it should show the value you were expecting.
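A small sketch of that difference, using a hypothetical one-row DataFrame. It also shows a common pitfall that may explain the original symptom (this is a guess, since the function screenshot isn't visible here): reassigning the loop variable creates a new DataFrame but leaves the list and the original variables unchanged.

```python
import pandas as pd

# Hypothetical stand-in for one merged frame.
df = pd.DataFrame({"DBN": ["01M015"], "School_Name_x": ["PS 015"], "extra": [1]})

# Without inplace, drop() returns a new DataFrame; df itself is unchanged.
df.drop(columns=["extra"])
print(df.shape[1])  # 3

# Rebinding the loop variable only changes the local name, not the list item.
frames = [df]
for frame in frames:
    frame = frame.drop(columns=["extra"])
print(frames[0].shape[1])  # 3

# With inplace=True the original object is modified (the method returns None).
for frame in frames:
    frame.drop(columns=["extra"], inplace=True)
    frame.rename(columns={"School_Name_x": "School_Name"}, inplace=True)
print(df.shape[1], list(df.columns))  # 2 ['DBN', 'School_Name']
```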

Try this:

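```python
combined_same_grade = [grade_3_combined, grade_4_combined]  # ... through grade_n_combined

for df in combined_same_grade:
    # Drop the columns at positions 2, 4, 5 and 6 directly on this DataFrame
    df.drop(df.columns[[2, 4, 5, 6]], axis=1, inplace=True)
    # Rename the merged column back to its original name, also in place
    df.rename(columns={"School_Name_x": "School_Name"}, inplace=True)
    print(df, df.shape[0])
```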
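Since functions are mainly about reusability, the same cleanup can also be written as a function that returns a new DataFrame instead of using inplace. This is only a sketch: `clean_combined`, `combined_by_grade`, and the sample columns are hypothetical, and the positions [2, 4, 5, 6] are copied from the loop above, so they depend on your actual merge column order. The key point is that the returned frames must be stored somewhere (here a dict keyed by grade), otherwise the work is discarded.

```python
import pandas as pd

def clean_combined(df):
    """Return a cleaned copy; the input DataFrame is left unchanged."""
    # Positional drop mirrors the loop above; adjust to your real columns.
    out = df.drop(columns=df.columns[[2, 4, 5, 6]])
    return out.rename(columns={"School_Name_x": "School_Name"})

# Hypothetical stand-in for one merged grade-level frame.
sample = pd.DataFrame(
    [["01M015", "PS 015", "3", 667, "PS 015", "3", 659]],
    columns=["DBN", "School_Name_x", "Grade_x", "Mean_Score_x",
             "School_Name_y", "Grade_y", "Mean_Score_y"],
)

# Keep each frame tied to its grade label so results can't get mixed up.
combined_by_grade = {"3": sample, "4": sample.iloc[0:0]}
cleaned = {grade: clean_combined(df) for grade, df in combined_by_grade.items()}
for grade, df in cleaned.items():
    print(grade, df.shape[0], list(df.columns))
```

Returning copies avoids surprises when the same DataFrame is referenced from several places, at the cost of having to reassign the result.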

Curious if this helps you at all.

This helped. Thank you!
