Screen Link:
https://app.dataquest.io/m/290/boolean-indexing-with-numpy/6/assigning-values-in-ndarrays
In question 3 of this mission, we are asked to replace incorrect data with the mean value of a column.
The provided solution computes that mean over the whole column, so the incorrect values are included in it.
Wouldn't it be better to calculate the mean without the incorrect data?
I realise the difference between the two means is very small here, but I was wondering what the best approach is in the real world. Is it worth the trouble of adding some extra code to compute a mean without the incorrect data? Or is the mean provided in the answer good enough?
Question 3:
The values at column index 7 (trip_distance) of rows index 1800 and 1801 are incorrect. Use assignment to change these values in the taxi_modified ndarray to the mean value for that column.
Provided answer:
taxi_modified[1800:1802,7] = taxi_modified[:,7].mean()
In addition, I wrote the code below to compute the mean without the two incorrect values. However, I would like to know if there is a more elegant way of doing it.
mean = 0
for item in taxi_modified[:1800,7]:
    mean += item
for item in taxi_modified[1802:,7]:
    mean += item
mean = mean / (taxi_modified.shape[0] - 2)
print(mean)
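For comparison, here is a rough sketch of how the same mean might be computed without loops, using a boolean mask to leave out the two rows. The array below is random stand-in data (the real taxi_modified comes from the mission's dataset), and the names valid_rows and clean_mean are just ones I made up for illustration.

```python
import numpy as np

# Stand-in data: random values in place of the mission's real taxi_modified array.
rng = np.random.default_rng(0)
taxi_modified = rng.uniform(0.5, 30.0, size=(2000, 15))
taxi_modified[1800:1802, 7] = 9999.0  # pretend these two values are the incorrect ones

# Boolean mask: True for every row except the two incorrect ones.
valid_rows = np.ones(taxi_modified.shape[0], dtype=bool)
valid_rows[1800:1802] = False

# Mean of column 7 computed from the valid rows only.
clean_mean = taxi_modified[valid_rows, 7].mean()

# Replace the incorrect values with that mean.
taxi_modified[1800:1802, 7] = clean_mean
print(clean_mean)
```

np.delete(taxi_modified[:, 7], [1800, 1801]).mean() should give the same number; the mask version just stays closer to the boolean indexing this mission is about.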
Thank you for reading!