Looking to get a definition of 'Data Leakage'

Hi there, I have been working through the guided project for predicting house prices. The instructions advise me to ‘remove any columns that leak information about the sale’ and give the year of sale as an example of leaky information.
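
To make the instruction concrete, here is a minimal sketch of what I take it to be asking for. The rows, the column names (`year_sold`, `sale_price`, and so on) and the `drop_leaky` helper are all made up for illustration and are not the project's actual data or code. The assumption is that a leaky column is one whose value is only known once the sale has happened.

```python
# Hypothetical rows; the column names are illustrative, not the project's dataset.
rows = [
    {"lot_area": 8450, "year_built": 2003, "year_sold": 2008, "sale_price": 208500},
    {"lot_area": 9600, "year_built": 1976, "year_sold": 2007, "sale_price": 181500},
]

# Assumption: these columns are only known after the sale, so they "leak".
leaky_columns = {"year_sold"}
target = "sale_price"


def drop_leaky(row, leaky, target_name):
    """Return a copy of the row without the leaky columns or the target."""
    return {k: v for k, v in row.items() if k not in leaky and k != target_name}


# Features the model may see, and the value it should predict.
features = [drop_leaky(r, leaky_columns, target) for r in rows]
labels = [r[target] for r in rows]

print(features[0])
```

With a pandas DataFrame, the same step would be something like `df.drop(columns=["year_sold"])`.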

I was wondering if anyone could provide some advice on what makes a variable leak information, as I have not encountered this term in any of the missions leading up to this one.

Mission link: https://app.dataquest.io/m/240/guided-project%3A-predicting-house-sale-prices/2/feature-engineering

Thank you very much,


Hi @nick.creed98

I haven’t completed this project yet, so these links come from my own search on the topic. Perhaps they will be helpful to you too: