I am working on the remove-duplicates part of the Google Play Store app project and wondered if there is anything inherently wrong with my code. I wrote my solution before following the DQ recommendation.
To find duplicates, Dataquest suggests the code shown under "DQ Code" below. I used a different method: I capture the index of each duplicate row so that I can delete those rows later.
Can someone have a look and tell me if there is anything wrong with what I did?
My Code
duplicate_rows = []
unique_rows = []
unique_dict = {}
rows_for_deletion = []

for row in google_data:
    app_name = row[0]
    no_reviews = row[2]
    row_index = google_data.index(row)
    if app_name not in unique_dict:
        unique_dict[app_name] = [no_reviews, row_index]
    else:
        if unique_dict[app_name][0] < no_reviews:
            rows_for_deletion.append(unique_dict[app_name][1])
            unique_dict[app_name] = [no_reviews, row_index]
        else:
            rows_for_deletion.append(row_index)

clean_data = google_data
for row in rows_for_deletion:
    del clean_data[row]
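A few things in the loop above may be worth checking, sketched here on a made-up toy list. `google_data.index(row)` returns the position of the first row equal to `row`, so exact duplicate rows all report the same index. `clean_data = google_data` creates a second name for the same list rather than a copy, and deleting by index in the order collected shifts every later row, so later deletions can hit the wrong rows. The review counts are also compared as strings, where `"9" > "100"`. The sketch below assumes the review count sits in column 3 (as in the DQ code) and that the header row has already been removed; `find_rows_for_deletion` and `sample` are hypothetical names used only for illustration.

```python
def find_rows_for_deletion(rows, name_col=0, reviews_col=3):
    """Return the set of indices of rows that should be dropped as duplicates."""
    best = {}           # app name -> (review count, index of the row kept so far)
    to_delete = set()
    # enumerate gives each row's true position, even when rows are identical
    for index, row in enumerate(rows):
        name = row[name_col]
        n_reviews = float(row[reviews_col])   # compare numbers, not strings
        if name not in best:
            best[name] = (n_reviews, index)
        elif best[name][0] < n_reviews:
            to_delete.add(best[name][1])      # the earlier row becomes the duplicate
            best[name] = (n_reviews, index)
        else:
            to_delete.add(index)
    return to_delete


# Made-up rows in the form [name, category, rating, reviews]
sample = [
    ["A", "GAME", "4.1", "100"],
    ["B", "TOOLS", "4.0", "50"],
    ["A", "GAME", "4.1", "100"],   # exact duplicate of row 0
    ["A", "GAME", "4.2", "900"],   # same app, more reviews
    ["B", "TOOLS", "4.0", "9"],
]

rows_for_deletion = find_rows_for_deletion(sample)

# Build a new list instead of deleting in place, so no index ever shifts
clean_data = [row for i, row in enumerate(sample) if i not in rows_for_deletion]
print(len(clean_data))
```

Building a new list leaves the original data untouched. If in-place deletion is preferred, deleting over `sorted(rows_for_deletion, reverse=True)` on a real copy (for example `list(google_data)`) would also avoid the shifting problem.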
DQ Code
reviews_max = {}
for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print(len(reviews_max))

google_clean = []
already_added = []
for row in google_data:
    name = row[0]
    n_reviews = float(row[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        google_clean.append(row)
        already_added.append(name)
print(len(google_clean))
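To compare results between the two approaches, the DQ logic can be wrapped in a function and run on a small made-up list. This is only a sketch: `dedupe_keep_max_reviews` and `sample` are hypothetical names, and `already_added` is a set here rather than a list purely so that membership checks stay fast. The behaviour is otherwise intended to match the code above.

```python
def dedupe_keep_max_reviews(rows, name_col=0, reviews_col=3):
    """Keep one row per app name: the first row that has the highest review count."""
    # First pass: record the highest review count seen for each app name
    reviews_max = {}
    for row in rows:
        name = row[name_col]
        n_reviews = float(row[reviews_col])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Second pass: keep the first row per app that matches that maximum
    cleaned = []
    already_added = set()
    for row in rows:
        name = row[name_col]
        n_reviews = float(row[reviews_col])
        if reviews_max[name] == n_reviews and name not in already_added:
            cleaned.append(row)
            already_added.add(name)
    return cleaned


# Made-up rows in the form [name, category, rating, reviews]
sample = [
    ["A", "GAME", "4.1", "100"],
    ["B", "TOOLS", "4.0", "50"],
    ["A", "GAME", "4.1", "100"],
    ["A", "GAME", "4.2", "900"],
    ["B", "TOOLS", "4.0", "9"],
]

google_clean = dedupe_keep_max_reviews(sample)
print(len(google_clean))
```

Running an index-based solution on the same toy list and checking that both keep the same rows is a quick way to spot a mismatch before trying the full dataset.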