Python Data Analysis Basics Practice Problems: Analyzing Game Sales 6 — Need Help for the Answer

Screen Link: https://app.dataquest.io/m/1000331/python-data-analysis-basics-practice-problems/7/analyzing-game-sales-6

DQ Code (answer):

import csv
# We solved the problem by using functions to organize our code,
# but this was not a requirement; you can solve it another way.

def read_games(filename):
    """
    Auxiliary function to read a CSV
    """
    with open(filename) as f:
        reader = csv.reader(f)
        rows = list(reader)
    return rows[1:]

def group_games_by_col(games, col_index):
    """
    Given a column index, creates a dictionary
    where the keys are the values on that column
    and each value is the list of all games with
    the same value on that column.
    """
    games_per_col = {}
    for game in games:
        key = game[col_index]
        if key not in games_per_col:
            games_per_col[key] = []
        games_per_col[key].append(game)
    return games_per_col

def compute_publisher_sales_per_zone(games):
    """
    Compute the sales of each publisher by zone.
    """
    games_per_publisher = group_games_by_col(games, 4)
    publisher_sales_by_zone = {}
    for publisher in games_per_publisher:
        publisher_sales_by_zone[publisher] = {zone: 0 for zone in zones_index}
        for games in games_per_publisher[publisher]:
            for zone in zones_index:
                zone_sales = games[zones_index[zone]]
                publisher_sales_by_zone[publisher][zone] += float(zone_sales)
    return publisher_sales_by_zone

zones_index = {
    'NA_Sales': 5,
    'EU_Sales': 6,
    'JP_Sales': 7,
    'Other_Sales': 8
}

games = read_games('game_sales.csv')
publisher_sales_by_zone = compute_publisher_sales_per_zone(games)
sales_ubisoft = publisher_sales_by_zone['Ubisoft']
most_sales_zone_ubisoft = max(sales_ubisoft, key=sales_ubisoft.get)

I have solved this exercise, but my code is not very elegant, so I tried to understand the answer provided by DQ. I am struggling to understand the zones_index part (see below):

for publisher in games_per_publisher:
    publisher_sales_by_zone[publisher] = {zone: 0 for zone in zones_index}
    for games in games_per_publisher[publisher]:
        for zone in zones_index:
            zone_sales = games[zones_index[zone]]
            publisher_sales_by_zone[publisher][zone] += float(zone_sales)

zones_index = {
    'NA_Sales': 5,
    'EU_Sales': 6,
    'JP_Sales': 7,
    'Other_Sales': 8
}

Could someone explain to me this part?

Thanks in advance!

What exactly is confusing you here?

If you understand how dictionaries work in Python, this should not be too confusing. Try to write down, as a reply, which steps of that code you do understand and which ones you don’t, and then we can narrow down the issue you are having.

Hi @the_doctor,

Thanks for asking for the clarification!

I do not understand {zone: 0 for zone in zones_index} for the code below:

publisher_sales_by_zone[publisher] = {zone: 0 for zone in zones_index}

After getting your reply, I printed out publisher_sales_by_zone[publisher] and I got this:

{'NA_Sales': 0, 'EU_Sales': 0, 'JP_Sales': 0, 'Other_Sales': 0}

It seems to me that the line of code above assigns 0 to each sales zone ('NA_Sales', 'EU_Sales', etc.).
However, the values (5, 6, 7, 8) in the original zones_index dictionary are column indices (as below), and they are used later to get zone_sales:

zones_index = {
    'NA_Sales': 5,
    'EU_Sales': 6,
    'JP_Sales': 7,
    'Other_Sales': 8
} 
zone_sales = games[zones_index[zone]]

This is the part that confuses me. I hope that my question is clear now. It would be great if you could explain this part to me.

Good work on that detailed response!

Python has a feature called comprehensions, such as list comprehensions and dictionary comprehensions.

As a quick overview, a list comprehension looks something like this -

a = [i for i in range(5)]

The above results in a being a list that stores the values 0 to 4. It is equivalent to -

a = []

for i in range(5):
    a.append(i)

List comprehensions offer certain advantages over normal for loops, which we won’t get into now.

Similarly, there are dictionary comprehensions. That’s what this code is -

{zone: 0 for zone in zones_index}

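It creates a dictionary where the keys are the zones and the values are initialized to 0. Iterating over zones_index gives only its keys ('NA_Sales', 'EU_Sales', and so on), so the column indices 5, 6, 7, 8 play no role in this line. In the solution code, we create such a dictionary for each publisher.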
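
If it helps, here is a small sketch of that same comprehension written out as a plain loop. The name zero_totals_loop is just something I made up for the comparison; it is not part of the DQ solution.

```python
zones_index = {
    'NA_Sales': 5,
    'EU_Sales': 6,
    'JP_Sales': 7,
    'Other_Sales': 8
}

# Dictionary comprehension, as written in the solution.
zero_totals = {zone: 0 for zone in zones_index}

# The same thing as an explicit loop (illustrative name, not from the solution).
zero_totals_loop = {}
for zone in zones_index:  # iterating over a dict yields its keys
    zero_totals_loop[zone] = 0

print(zero_totals)
print(zero_totals == zero_totals_loop)
```

Both versions build the same dictionary; the comprehension is simply the shorter way to write it.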

So, we have a dictionary publisher_sales_by_zone where for each publisher we store the information for each zone.

What information does each zone correspond to then?

We need it to save the total number of sales of that publisher in that zone.

But as of now those values are all 0s. This is where we use our games dataset.

We first group games by publisher and store the result in games_per_publisher. So games_per_publisher is a dictionary where the keys are the publishers and, for each publisher key, the value is a list of lists holding the rows from our dataset that belong to that publisher.
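
To see what that grouping looks like, here is a rough sketch that runs group_games_by_col on three rows. The rows, game names, and publisher names are made up for illustration; they are not real data from game_sales.csv.

```python
def group_games_by_col(games, col_index):
    """Group rows by the value found in the given column."""
    games_per_col = {}
    for game in games:
        key = game[col_index]
        if key not in games_per_col:
            games_per_col[key] = []
        games_per_col[key].append(game)
    return games_per_col

# Made-up rows, same column layout as the dataset (publisher is column 4).
sample_games = [
    ['Game A', 'PS2', '2005', 'Misc', 'Publisher X', '1.0', '0.5', '0.0', '0.25', '1.75'],
    ['Game B', 'Wii', '2008', 'Sports', 'Publisher Y', '0.2', '0.3', '0.4', '0.0', '0.9'],
    ['Game C', 'PC', '2010', 'Action', 'Publisher X', '0.5', '1.5', '0.0', '0.25', '2.25'],
]

games_per_publisher = group_games_by_col(sample_games, 4)

print(list(games_per_publisher))                 # the publisher keys
print(len(games_per_publisher['Publisher X']))   # rows stored under one key
```

Each key maps to a list of full rows, which is why the next loop can still read the sales columns from every row.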

This brings us to the following part of the code -

for games in games_per_publisher[publisher]:
    for zone in zones_index:
        zone_sales = games[zones_index[zone]]
        publisher_sales_by_zone[publisher][zone] += float(zone_sales)

So, as we know from above, games_per_publisher[publisher] is a list of lists containing the rows from our dataset for that particular publisher. At each iteration of the loop, games is one of those rows, that is, a single list (despite the plural name).

Here’s an example of a single row that games could hold -

['Wheel of Fortune', 'PS2', 'N/A', 'Misc', 'Unknown', '0.47', '0.36', '0', '0.12', '0.95']

Since the above list is just a row from our dataset, it has the following structure -

[Name, Platform, Year, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales]

What variables do we have at index positions 5, 6, 7, 8?

That’s right, it’s the sales values that we need! We stored those indices in our dictionary zones_index.

That’s why we have the following -

games[zones_index[zone]]

We get the index from zones_index using zone, and using that index we get the corresponding sales value from games. And in the final step -

publisher_sales_by_zone[publisher][zone] += float(zone_sales)

For a particular publisher and a particular zone, we convert the sales value (which csv.reader gives us as a string) to a float and add it to that zone’s running total.
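
Putting the lookup and the accumulation together, here is a minimal sketch for a single publisher. The two rows are made up (not real data from game_sales.csv), and the names rows, totals, and best_zone are mine rather than the solution's.

```python
zones_index = {
    'NA_Sales': 5,
    'EU_Sales': 6,
    'JP_Sales': 7,
    'Other_Sales': 8
}

# Made-up rows for one hypothetical publisher.
rows = [
    ['Game A', 'PS2', '2005', 'Misc', 'Publisher X', '1.0', '0.5', '0.0', '0.25', '1.75'],
    ['Game C', 'PC', '2010', 'Action', 'Publisher X', '0.5', '1.5', '0.0', '0.25', '2.25'],
]

totals = {zone: 0 for zone in zones_index}  # every zone starts at 0
for row in rows:
    for zone in zones_index:
        col = zones_index[zone]          # zone name -> column index, e.g. 'EU_Sales' -> 6
        totals[zone] += float(row[col])  # column index -> sales value in this row

# Same idea as the last line of the solution: the key with the largest value.
best_zone = max(totals, key=totals.get)

print(totals)
print(best_zone)
```

So the keys of zones_index label the totals, while its values tell us where to look inside each row.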

Let me know if the above helps or not. There could be a simpler way to handle this, but it should give you an idea of how dictionaries are being used here.


Thanks a lot! Your explanation is very clear! Now I get it, especially the dictionary comprehension part! :)
