2011 NYC School Survey Data Dictionary Mismatch

Screen Link: https://app.dataquest.io/m/136/data-cleaning-walkthrough/9/cleaning-up-the-surveys

In Step 2, Course 6, Mission 1 (Data Cleaning Walkthrough), screen 9, "Cleaning Up the Surveys", the lesson refers to the Data Dictionary for the 2011 NYC School Survey. The lesson states that the relevant columns from the dictionary are:

["dbn", "rr_s", "rr_t", "rr_p", "N_s", "N_t", "N_p", "saf_p_11", "com_p_11", "eng_p_11", "aca_p_11", "saf_t_11", "com_t_11", "eng_t_11", "aca_t_11", "saf_s_11", "com_s_11", "eng_s_11", "aca_s_11", "saf_tot_11", "com_tot_11", "eng_tot_11", "aca_tot_11"]

However, if you look closely at the Data Dictionary, the corresponding column names end in 10 wherever the list above ends in 11 (so saf_p_11 above appears there with a 10 suffix instead).

The column headers in the dataframe used in the lesson end in 11, not 10, so it seems the discrepancy was introduced by whatever Dataquest did to curate the original data set.
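If you want to look up the lesson's columns in the Data Dictionary, a small mapping between the two naming schemes can help. This is only a rough sketch: it assumes the sole difference is a trailing `_10` in place of `_11`, and `to_dictionary_name` is a hypothetical helper name I made up, not something from the lesson.

```python
import re

# Column names exactly as listed in the lesson (suffix _11 where applicable).
lesson_fields = [
    "dbn", "rr_s", "rr_t", "rr_p", "N_s", "N_t", "N_p",
    "saf_p_11", "com_p_11", "eng_p_11", "aca_p_11",
    "saf_t_11", "com_t_11", "eng_t_11", "aca_t_11",
    "saf_s_11", "com_s_11", "eng_s_11", "aca_s_11",
    "saf_tot_11", "com_tot_11", "eng_tot_11", "aca_tot_11",
]


def to_dictionary_name(column):
    """Hypothetical helper: swap a trailing _11 for _10.

    Assumes the Data Dictionary differs from the lesson only in this suffix.
    """
    return re.sub(r"_11$", "_10", column)


# Map each lesson column to the name to search for in the Data Dictionary.
dictionary_lookup = {col: to_dictionary_name(col) for col in lesson_fields}

# Columns whose name differs between the lesson and the dictionary.
mismatched = [col for col, name in dictionary_lookup.items() if col != name]

for col in mismatched:
    print(f"{col} -> {dictionary_lookup[col]}")
```

Columns without a year suffix, such as dbn and rr_s, map to themselves, so only the suffixed score columns show up in the printed list.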

The lesson still works, but I thought I’d bring it to your attention in case anyone tries comparing the original and Dataquest data sets.
