I just finished this project, and I have to say great job! You’ve cleaned it up really well and explained everything nicely. I thought the extra analysis on the difference between categories of the 2 datasets was a great addition.
The code below has a bug that would show up if more than one row had to be deleted.
for i in range(len(google)):
    if len(google[i])!=len(google_header): # Check if length of each entry does not coincide with length of the header
        print('Row ',i,' contains errors.')
        errors_g.append(i) # In case of error saves the row number in list (errors_g)
for e in errors_g: # Loop over list (errors_g)
    del google[e] # and delete rows containing failures from Google Play Market dataset
    print('Row ',e,' deleted')
Each deletion shifts every later row up by one position, so the indexes saved in errors_g no longer point at the intended rows after the first del. From the second deletion onward, rows with correct data would be deleted and rows with incorrect data would stay. This is demonstrated using the code below.
index_del_test = [0, 1, 2, 3, 4, 5] # create list where each value matches its index
to_del = [1, 4] # create list of indexes to delete
for num in to_del:
    print('Deleting index ' + str(num)) # shows the index to delete, should match number below
    print('Index ' + str(num) + '\'s value is ' + str(index_del_test[num])) # shows the value of the index, should match number above
    del index_del_test[num] # delete the entry, which shifts every later value down by one index
The output of the code is then:
Deleting index 1
Index 1's value is 1
Deleting index 4
Index 4's value is 5
This did not affect the project since there was only one row to delete, but it is something to be aware of.
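If it helps, here is a minimal sketch of two common ways around the shifting-index problem. The `header` and `rows` values are made-up stand-ins for your header and dataset, not your actual data. The first option deletes from the highest index down, so earlier indexes are never disturbed; the second skips deletion and builds a new list of only the valid rows.

```python
# Hypothetical stand-ins for the project's header and dataset
header = ['App', 'Category', 'Rating']
rows = [
    ['A', 'GAME', '4.1'],
    ['B', '4.5'],            # malformed: missing a column
    ['C', 'TOOLS', '3.9'],
    ['D', 'FAMILY', '4.0'],
    ['E', '4.7'],            # malformed: missing a column
    ['F', 'GAME', '4.3'],
]

# Collect the indexes of rows whose length does not match the header
errors = [i for i, row in enumerate(rows) if len(row) != len(header)]

# Option 1: delete from the highest index down, so the indexes
# that are still waiting to be deleted never shift
cleaned = list(rows)  # work on a copy to keep the original for comparison
for e in sorted(errors, reverse=True):
    del cleaned[e]

# Option 2: skip deletion and keep only the rows that match the header
filtered = [row for row in rows if len(row) == len(header)]

print('Bad row indexes:', errors)
print('Apps kept:', [row[0] for row in cleaned])
```

Both options leave the same rows here. The second is usually simpler if you do not need the list of bad row numbers for anything else.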
Overall, this looked fantastic, and is motivating me to improve my project now! Keep on learning and you’ll do amazing!