Column that leaks data about final sale isn't dropped in the solution, at 240-4

Hello

In this mission: https://app.dataquest.io/m/240/guided-project%3A-predicting-house-sale-prices/4/train-and-test

I’ve noticed that the solution mentions at one point that we must drop columns that
"leak data about the final sale, read more about columns here",

and at that point they are in fact dropped:

In [14]:

# Drop columns that aren't useful for ML

df = df.drop(["PID", "Order"], axis=1)

# Drop columns that leak info about the final sale

df = df.drop(["Mo Sold", "Sale Condition", "Sale Type", "Yr Sold"], axis=1)

However, in the final cell, where all the functions are updated:
In [23]:

def transform_features(df):
    num_missing = df.isnull().sum()
    drop_missing_cols = num_missing[(num_missing > len(df)/20)].sort_values()
    df = df.drop(drop_missing_cols.index, axis=1)

    text_mv_counts = df.select_dtypes(include=['object']).isnull().sum().sort_values(ascending=False)
    drop_missing_cols_2 = text_mv_counts[text_mv_counts > 0]
    df = df.drop(drop_missing_cols_2.index, axis=1)

    num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()
    fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()
    replacement_values_dict = df[fixable_numeric_cols.index].mode().to_dict(orient='records')[0]
    df = df.fillna(replacement_values_dict)

    years_sold = df['Yr Sold'] - df['Year Built']
    years_since_remod = df['Yr Sold'] - df['Year Remod/Add']
    df['Years Before Sale'] = years_sold
    df['Years Since Remod'] = years_since_remod
    df = df.drop([1702, 2180, 2181], axis=0)

    df = df.drop(["PID", "Order", "Mo Sold", "Sale Condition", "Sale Type", "Year Built", "Year Remod/Add"], axis=1)

The column 'Yr Sold' isn’t dropped here.

Is there a reason for that, or is it just a bug?
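
For reference, here is a minimal sketch of what I expected the final drop to look like, assuming 'Yr Sold' is meant to be removed once the two year-difference features have been computed. The `toy` DataFrame and the `engineer_and_drop` helper are names I made up for illustration; they are not part of the solution notebook, and the values are invented.

```python
import pandas as pd

# Toy stand-in for the housing data; the values are invented for illustration.
toy = pd.DataFrame({
    "PID": [1, 2],
    "Order": [1, 2],
    "Year Built": [1960, 2001],
    "Year Remod/Add": [1990, 2005],
    "Yr Sold": [2010, 2008],
    "Mo Sold": [5, 7],
    "Sale Condition": ["Normal", "Normal"],
    "Sale Type": ["WD", "WD"],
    "SalePrice": [150000, 210000],
})

def engineer_and_drop(df):
    """Hypothetical helper: build the year features, then drop the source columns."""
    df = df.copy()
    # 'Yr Sold' is still needed at this point to compute the engineered features.
    df["Years Before Sale"] = df["Yr Sold"] - df["Year Built"]
    df["Years Since Remod"] = df["Yr Sold"] - df["Year Remod/Add"]
    # Same list as in the final cell, with "Yr Sold" added (my assumption).
    return df.drop(
        columns=["PID", "Order", "Mo Sold", "Sale Condition", "Sale Type",
                 "Yr Sold", "Year Built", "Year Remod/Add"]
    )

result = engineer_and_drop(toy)
print(result.columns.tolist())
```

With this ordering the engineered columns are created before 'Yr Sold' is removed, so nothing that depends on it breaks.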
