Hi Guys,
One step in the Guided Project ‘Predicting House Sale Prices’ asks us to drop columns that “leak data about the final sale”. The target feature is the house price.
The answer was to drop these four columns: “Mo Sold”, “Sale Condition”, “Sale Type”, and “Yr Sold”.
Below is the info for those four features.
What I don’t understand is how these four features leak data about the final sale. Why can’t we just convert them to categorical data and dummy-code them? Thanks, I really appreciate it.
Mo Sold: Month Sold (MM)
Sale Condition: Condition of sale
  Normal: Normal Sale
  Abnorml: Abnormal Sale - trade, foreclosure, short sale
  AdjLand: Adjoining Land Purchase
  Alloca: Allocation - two linked properties with separate deeds, typically condo with a garage unit
  Family: Sale between family members
  Partial: Home was not completed when last assessed (associated with New Homes)
Sale Type: Type of sale
  WD: Warranty Deed - Conventional
  CWD: Warranty Deed - Cash
  VWD: Warranty Deed - VA Loan
  New: Home just constructed and sold
  COD: Court Officer Deed/Estate
  Con: Contract 15% Down payment regular terms
  ConLw: Contract Low Down payment and low interest
  ConLI: Contract Low Interest
  ConLD: Contract Low Down
  Oth: Other
Yr Sold: Year Sold (YYYY)
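
To make the question concrete, here is a minimal sketch of the two options I am comparing. It assumes pandas; the tiny DataFrame is made up for illustration, and the "Gr Liv Area" and "SalePrice" column names are my assumptions rather than something stated in the project step.

```python
import pandas as pd

# Made-up rows, only to show the mechanics. "Gr Liv Area" and "SalePrice"
# are assumed column names; the values are not real data.
df = pd.DataFrame({
    "Gr Liv Area": [1500, 2100, 900],
    "Mo Sold": [5, 6, 3],
    "Yr Sold": [2010, 2009, 2008],
    "Sale Type": ["WD", "New", "COD"],
    "Sale Condition": ["Normal", "Partial", "Abnorml"],
    "SalePrice": [200000, 320000, 95000],
})

sale_cols = ["Mo Sold", "Sale Condition", "Sale Type", "Yr Sold"]

# Option A (the project's answer): drop the four sale-related columns.
dropped = df.drop(columns=sale_cols)

# Option B (what I am asking about): treat the four columns as
# categorical and replace each with one indicator column per value.
dummied = pd.get_dummies(df, columns=sale_cols)

print(list(dropped.columns))
print(list(dummied.columns))
```

Option B runs without any problem, which is why I am unsure what makes Option A the required choice.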