List and Loops Question

Hi everyone.

Can somebody explain this to me further? I’m working on 10. Average app rating and I’m confused about the loop.

The hint says this:

You can easily get the number of ratings by using len(apps_data). But the first row of apps_data contains the column names, so we’ll need to exclude it; this means that we need to use either len(apps_data[1:]) or len(apps_data) - 1.

How come I have to exclude the first row? I’m having difficulty picturing the data set here. While I do understand most of the looping code, I don’t get why you have to take the number of rows minus the row of column names. Isn’t it already excluded?

Hi @scchoi31, welcome to the community!

The reason we need to exclude the header row for the number of rows is because in len(apps_data) we’re including all the rows – including the header. When we created the loop starting with for row in apps_data[1:]:, we gave the command to loop through the entire dataset except for the first row (the header row). We haven’t deleted the first row, though – it’s still there in apps_data! We just excluded it from the loop so that we could get the ratings from each row and add them together. When we want to find the number of rows with len() so we can calculate the average, we have to separately exclude the first row using apps_data[1:] again, or alternatively, take the length of the whole dataset and subtract 1 to account for that header row.

If you want, just for fun and to make sure you understand how it works, you can actually use your loop to count all the rows. We create another variable num_rows outside the loop, and add 1 to it in each iteration of the loop.

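rating_sum = 0  # running total of all ratings
num_rows = 0    # counts the data rows we loop over
for row in apps_data[1:]:  # [1:] skips the header row
    rating = float(row[7])  # the rating is stored as text, so convert it
    rating_sum = rating_sum + rating
    num_rows += 1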

When the loop is done, compare num_rows with what you get from len(apps_data). In the loop, we excluded the header row from our count, but len(apps_data) includes the header row, so len(apps_data) is larger than num_rows by exactly 1.
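To make the off-by-one concrete, here is a small sketch using a made-up three-app dataset. The rows and values are invented for illustration, and the rating sits at index 1 in this toy data rather than index 7 as in the real file.

```python
# Toy stand-in for apps_data: one header row plus three data rows (values invented).
apps_data = [
    ["track_name", "user_rating"],
    ["App A", "4.0"],
    ["App B", "3.5"],
    ["App C", "4.5"],
]

rating_sum = 0
num_rows = 0
for row in apps_data[1:]:        # skip the header row
    rating_sum += float(row[1])  # rating is at index 1 in this toy data
    num_rows += 1

print(num_rows)               # 3
print(len(apps_data))         # 4, because the header row is counted
print(len(apps_data[1:]))     # 3
print(len(apps_data) - 1)     # 3
print(rating_sum / num_rows)  # 4.0
```

All three ways of counting the data rows agree; only the bare len(apps_data) is one too high.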

I hope that helps!


Hi @scchoi31

The answer is yes.
Before the loop, apps_data is redefined by

apps_data = apps_data[1:]

This means that you have already excluded the first row, so to get the number of ratings you can just use len(apps_data).
I hope that helps!

@bahmed21 That’s one way to get around the issue that was talked about in the lesson. Thanks for bringing that up, because that was a situation I didn’t think about when answering the question! I don’t think the mission exercise intended us to redefine apps_data without the header row though, which is why the hint talks about using len(apps_data[1:]).

In practice it’s probably better to not completely eliminate the header row because it contains the information about what each index item means. If you decide to go that route, you’ll want to make sure to do something like apps_data_header = apps_data[0] before eliminating that row so that you preserve that information.
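A minimal sketch of that approach, again on an invented toy dataset (the column names and values are placeholders, not the real file):

```python
# Toy stand-in for apps_data (values invented).
apps_data = [
    ["track_name", "user_rating"],
    ["App A", "4.0"],
    ["App B", "3.5"],
]

apps_data_header = apps_data[0]  # save the column names first
apps_data = apps_data[1:]        # now apps_data holds only data rows

print(apps_data_header)  # ['track_name', 'user_rating']
print(len(apps_data))    # 2, no need to subtract 1 any more
```

The order matters: if apps_data is sliced first, the header row is gone before it can be saved.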

@april.g

Thank you, April, for the explanation.

I have a question about loops, because they seem like a big deal in terms of learning the basic concepts.

When we look at For Loops (section 9), why do we need to add the variable rating_sum = 0? Is this always the case for loops, or just in this example?

Happy to help! In this particular mission, our end goal is to find out the average app rating, and to do that, we need to add up all the ratings and then divide by the number of ratings. Our example only had 5 rows so it’s easy to do by hand, but later on we’ll be working with thousands of rows – and this is where loops can come in handy!

Our loop cycles through each row and gets the rating from that row. However, it doesn’t remember the ratings from the previous rows. We need the rating_sum variable set up before the loop starts so that we can keep a running tally. If we created rating_sum inside the loop instead, it would be reset on every iteration and the earlier ratings would be lost. My general rule of thumb: if I need to keep something, have a variable outside the loop I can write to.

In other missions, you’ll use loops to make lists and dictionaries, where you’ll initialize a list or dictionary variable and append values to them from the loop.
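As a rough preview of that pattern, here is a sketch that collects values into a list and a dictionary instead of a sum. The dataset and the names ratings and rating_by_app are made up for illustration.

```python
# Toy stand-in for apps_data (values invented).
apps_data = [
    ["track_name", "user_rating"],
    ["App A", "4.0"],
    ["App B", "3.5"],
    ["App C", "4.5"],
]

ratings = []        # initialized outside the loop, like rating_sum
rating_by_app = {}  # same idea, but with a dictionary

for row in apps_data[1:]:
    name = row[0]
    rating = float(row[1])
    ratings.append(rating)        # add to the list
    rating_by_app[name] = rating  # add a key/value pair

print(ratings)        # [4.0, 3.5, 4.5]
print(rating_by_app)  # {'App A': 4.0, 'App B': 3.5, 'App C': 4.5}
```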

Here is a good exercise so you can see what is happening with your loop: make use of print(). I use the print function a lot when I’m trying to understand what’s going on with a bit of code or troubleshooting. Put the line print(rating_sum) in your loop after the line where you add the rating. Every time your loop runs, it will print the current status of the rating_sum variable!
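For example, on a small invented dataset (rating at index 1 here, not index 7 as in the real file), the print trick looks something like this:

```python
# Toy stand-in for apps_data (values invented).
apps_data = [
    ["track_name", "user_rating"],
    ["App A", "4.0"],
    ["App B", "3.5"],
    ["App C", "4.5"],
]

rating_sum = 0
for row in apps_data[1:]:
    rating = float(row[1])
    rating_sum = rating_sum + rating
    print(rating_sum)  # shows the running total after each row

# Printed output, one line per iteration:
# 4.0
# 7.5
# 12.0
```

Each printed number is the total so far, which is only possible because rating_sum lives outside the loop.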

I hope that helps you out. Happy coding!

@april.g

I see what you mean. The concept is there; it’s just that when I try to picture it all going through the code, I get a little bit intimidated. Is this normal for new Python users?

Also, in previous lessons we set row_1, row_2, row_3, etc. as separate variables. That I get. (I.e., I understand setting up a list of lists, but where was that done in this example?) Section 7, Opening a File, said this would be covered in future courses, so I did not dig more into it.

But in this example, we used row[7] as the index number for rating of the app. How did we get to this conclusion? Was it preset in the beginning of the code during the open, from/import, reader, and list as an output?

Thank you! Final question before I move on to next section.

I think it’s normal! It takes a while to train your brain to think in another language. :grin: I just started learning Python in August and I feel more comfortable with it now than I did then. What’s really great about learning by doing is that after a while things start to stick better as time goes on.

The setting of lists of lists was started on screen 6 of the lists and loops mission, where you had 5 lists that you combined into 1.

If you go to screen 7 on opening a file, it shows you a section of the dataset that you’re working with. It has 7,197 rows and 16 columns. On that screen we’re opening up the file and having the functions turn that table of data into a list of lists, where each row is a list. If you count (starting at 0) from the id column up to 7, you’ll end up at the 8th column, which is the user_rating column. So in your loop, you’re going through each row and getting the item at index 7 of the list: row[7].
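If counting columns by hand feels error-prone, one option is to let Python find the index from the header row. The sketch below uses an in-memory stand-in for the file with only three invented columns, so the index comes out as 2 here rather than 7; with the real file you would pass the opened file to the reader instead.

```python
import csv
import io

# Stand-in for an opened CSV file; the columns and values are invented.
fake_file = io.StringIO(
    "id,track_name,user_rating\n"
    "1,App A,4.0\n"
    "2,App B,3.5\n"
)

apps_data = list(csv.reader(fake_file))  # list of lists, one list per row

header = apps_data[0]
rating_index = header.index("user_rating")  # position of the column name

print(header)                      # ['id', 'track_name', 'user_rating']
print(rating_index)                # 2
print(apps_data[1][rating_index])  # 4.0
```

Note that the reader returns every value as text, which is why the loop in the mission wraps the rating in float().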

Usually in the missions, when Dataquest introduces a dataset, they’ll show it in tabular form like they did on screen 7. What helped me out when I was getting started was taking a screenshot of that page so I could refer to it later.