Hi guys,
In the Predicting House Sale Prices Guided Project, we drop columns such as “Mo Sold”, “Sale Condition”, “Sale Type” and “Yr Sold” because they describe the actual sale and hence ‘leak’ data about it.
However, in the suggested solution, why are the two created features “Years Before Sale” and “Years Since Remod” not considered leaky as well? They are both computed from “Yr Sold”, so they wouldn’t be available on new data either.
years_sold = df['Yr Sold'] - df['Year Built']
years_since_remod = df['Yr Sold'] - df['Year Remod/Add']
df['Years Before Sale'] = years_sold
df['Years Since Remod'] = years_since_remod
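
To make the question concrete, here is a minimal sketch of what I understand the solution to be doing. The two-row DataFrame is made up for illustration (not rows from the real dataset), and the order of the steps is my assumption: the derived columns are computed from “Yr Sold” first, and the raw sale columns are dropped afterwards.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from the actual housing dataset.
df = pd.DataFrame({
    "Year Built": [1960, 2005],
    "Year Remod/Add": [1995, 2005],
    "Yr Sold": [2010, 2008],
    "Mo Sold": [5, 7],
})

# Derived features: both subtract from the sale year.
df["Years Before Sale"] = df["Yr Sold"] - df["Year Built"]
df["Years Since Remod"] = df["Yr Sold"] - df["Year Remod/Add"]

# Drop the raw sale columns (assumed to happen after the features are built).
df = df.drop(columns=["Mo Sold", "Yr Sold"])

print(df)
```

So “Yr Sold” itself is gone from the final DataFrame, but both new columns were still derived from it, which is the part I am unsure about.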