Potus data set - finding who visited the most each month

Hi all, this is my first post! I’m actually not sure if this question should be posted on this forum or on the slack forum.

My question comes from the “Working with dates and times in Python” mission from the “Python for data science: intermediate” course, which uses the “potus” data set. One of the suggested extension exercises is to find out who visited the White House the most each month. Below is what I came up with, but I wonder if there is a better way of doing this (I’m sure there is). Thanks in advance!

def most_freq_visitor_that_month(month):
    visitor_freq = {}
    for row in potus:
        visitor_name = row[0]
        start_date = row[2]
        appt_month = start_date.month
        if appt_month == month:
            if visitor_name in visitor_freq:
                visitor_freq[visitor_name] += 1
            else:
                visitor_freq[visitor_name] = 1
    return max(visitor_freq)

month_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
for mo in month_list:
    print(mo, ":", most_freq_visitor_that_month(mo))

Hey @kevinl! Discourse is the best place! It will be our community moving forward.

Happy Learning!

The code posted isn’t super efficient, since it iterates through the dataset 12 times - once for each month. If you know for sure you want to collect the results for every month, you could do it with one loop through the dataset.

You could create a list of dicts - one for each month - and when you loop through the data update the dict associated with the month from each row. You could also store each month’s dict in another dict where the keys would be the month numbers or names.

This would be more efficient for getting data for all twelve months, but if you just want to look up a single month, your function looks good for that purpose!


A solution to this problem using pandas.

import pandas as pd

potus = pd.read_csv('potus_visitors_2015.csv')
potus['month'] = pd.to_datetime(potus['appt_start_date'],format="%m/%d/%y %H:%M")
potus['month'] = potus['month'].dt.strftime('%m')
potus_counter = potus.groupby(['name', 'month'], as_index=False)['visitee_namefirst'].count()
result = potus_counter[potus_counter['visitee_namefirst'] == potus_counter.groupby(['month'])['visitee_namefirst'].transform('max')]

print(result[['month', 'name', 'visitee_namefirst']].set_index('month').sort_values('month', ascending=False))

Hi @collin thanks for the reply! I wasn’t able to come up with a solution with a list of dicts, but I found a way to do it with a dict of dicts:

january = {}
february = {}
march = {}
april = {}
may = {}
june = {}
july = {}
august = {}
september = {}
october = {}
november = {}
december = {}

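month_dict = {
    "January": january, "February": february, "March": march,
    "April": april, "May": may, "June": june,
    "July": july, "August": august, "September": september,
    "October": october, "November": november, "December": december,
}

for row in potus:
    name = row[0]
    month = row[2].strftime("%B")
    for mo in month_dict:
        if mo == month:
            if name not in month_dict[mo]:
                month_dict[mo][name] = 1
            else:
                month_dict[mo][name] += 1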

for mo in month_dict:
    print(mo, ": ", max(month_dict[mo])) 

Is this what you were suggesting?

Thanks for the reply @moriturus7! I haven’t made it to the pandas section of the course yet, but once I’ve covered that I’ll definitely come back to your suggestion.

Overall, you can improve your code further by using defaultdict and removing code you don’t need. See below for details:

From your code

for mo in month_dict:
    if mo == month:
        if name not in month_dict[mo]:
            month_dict[mo][name] = 1
        else:
            month_dict[mo][name] += 1

Next-level improvement: use the dict.get method

month_dict[month][name] = month_dict[month].get(name, 0) + 1

  1. You don’t need to iterate over all 12 months each time to update one particular month. Take advantage of the dictionary and access the month directly by its key.

  2. With the .get(key, default_value) method, in our case .get(name, 0), the default value of 0 is returned the first time a name is accessed.
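
Here is a minimal sketch of the .get counting pattern on its own. The list of names is made up for illustration and is not taken from the potus data set.

```python
# Hypothetical sample data, not from the potus data set.
names = ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"]

counts = {}
for name in names:
    # .get returns 0 the first time a name is seen,
    # so no membership check is needed before adding 1.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

The same one-line update works for the nested case once you have picked the month’s dict by its key.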

To further improve the code, use defaultdict from the collections library.

from collections import defaultdict
months = ["jan", ........, "dec"]

Set up the dict with defaultdict

month_dict = {}
for m in months:
    # int() returns 0, so every missing name starts at a count of 0
    month_dict[m] = defaultdict(int)

Alternatively, set up the dict of defaultdicts in a single line.

month_dict = dict(zip(months, [defaultdict(int) for _ in range(len(months))]))

You no longer need to check whether the key already exists, since defaultdict takes care of it. All you need to do is increment the value by 1.

for row in potus:
    name = row[0]
    month = row[2].strftime("%B")
    month_dict[month][name] += 1

Optional: convert back to dict if you don’t need defaultdict

for m in months:
    month_dict[m] = dict(month_dict[m])
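
To tie the pieces together, here is a small runnable sketch of the defaultdict approach. The rows in sample_rows are hypothetical stand-ins shaped like potus rows (name at index 0, a datetime at index 2), and building the month names from calendar.month_name is my assumption, chosen so the keys match what strftime("%B") returns.

```python
import calendar
from collections import defaultdict
from datetime import datetime

# Hypothetical rows shaped like potus: name at index 0, datetime at index 2.
sample_rows = [
    ["Ann", "unused", datetime(2015, 1, 5, 9, 0)],
    ["Bob", "unused", datetime(2015, 1, 6, 9, 0)],
    ["Ann", "unused", datetime(2015, 1, 7, 9, 0)],
    ["Cy", "unused", datetime(2015, 2, 1, 9, 0)],
]

# Full month names in calendar order, matching strftime("%B").
months = list(calendar.month_name[1:])

# One defaultdict(int) per month; missing names start at 0.
month_dict = {m: defaultdict(int) for m in months}

for row in sample_rows:
    name = row[0]
    month = row[2].strftime("%B")
    month_dict[month][name] += 1

# Print only the months that have at least one visit.
for m in months:
    if month_dict[m]:
        print(m, dict(month_dict[m]))
```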

On another note, be careful with the following pattern:

  • From your code
january = {}
month_dict = dict()
month_dict["January"] = january
  • Assign a value through january.
january["A"] = 1

month_dict["January"] outputs: {'A': 1}

So far, month_dict["January"] and january refer to the same dict, so a change made through one name is visible through the other.

  • Rebind month_dict["January"] to a new value
month_dict["January"] = [1, 2, 4]

january outputs: {'A': 1}

Now month_dict["January"] and january refer to different objects. Rebinding the key (here, to a list) does not change january, and later changes to month_dict["January"] will not be reflected in january.

Conversely, while the two names still share one dict, an in-place change through either name changes both, so you may end up with unwanted changes to january.

  • To fix it:
    Assign january at the end, after you have computed month_dict["January"].
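
If it helps, the difference between sharing one dict and rebinding a key can be checked with the `is` operator. This is only a small sketch; the key "A" is a placeholder.

```python
january = {}
month_dict = {"January": january}

# Both names refer to the same dict object, so the change shows up in both.
january["A"] = 1
same_before = month_dict["January"] is january
seen_through_key = dict(month_dict["January"])  # copy of what the key sees now

# Rebinding the key points it at a new object; january itself is untouched.
month_dict["January"] = [1, 2, 4]
same_after = month_dict["January"] is january

print(same_before, seen_through_key, same_after, january)
```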

Took me about 35+ minutes to edit this.

Hope it helps,


So the beauty of a dict of dicts is that you can then use a key - the month name - to look up each month’s dict. When iterating through potus you can just look up which dict to use with the following:

target_dict = month_dict[month]

So you can simplify the loop by not looping through every month in each iteration:

for row in potus:
    name = row[0]
    month = row[2].strftime("%B")
    target_dict = month_dict[month]
    if name not in target_dict:
        target_dict[name] = 1
    else:
        target_dict[name] += 1

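Also, the dict of dicts can be created a bit more cleanly. Instead of naming each internal dict, you could use an empty dict as the value for each key in month_dict:

month_dict = {'January': {}, 'February': {}, 'March': {}, 'April': {}, 'May': {}, 'June': {},
              'July': {}, 'August': {}, 'September': {}, 'October': {}, 'November': {}, 'December': {}}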

or you could use integers as the keys and create the dict easily with dict comprehension:

month_dict = {i:{} for i in range(1,13)}

and then use the month number instead of name to look things up.

For the list of dicts, I was thinking you could just use the list indices to reference each month, so the code would become:

month_visits = [{} for _ in range(12)]  # list comprehension creates 12 empty dicts

for row in potus:
    name = row[0]
    month = row[2].month - 1  # subtract 1 to get the correct index

    if name not in month_visits[month]:
        month_visits[month][name] = 1
    else:
        month_visits[month][name] += 1

# start=1 so the printed month numbers run from 1 to 12
for month, month_dict in enumerate(month_visits, start=1):
    print(month, ": ", max(month_dict, key=month_dict.get))

In the printing loop, using key=month_dict.get means the max function will look at the values (counts) for each key (name) and return the key (name) with largest value (count).
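
A quick way to see the difference is to call max both ways on a tiny dict. The names and counts here are invented for illustration.

```python
# Hypothetical visit counts, not from the potus data set.
visits = {"Zed": 1, "Ann": 3, "Bob": 2}

# Without key=, max compares the names themselves (alphabetical order).
by_key = max(visits)

# With key=visits.get, max compares each name's count instead.
by_count = max(visits, key=visits.get)

print(by_key)    # Zed
print(by_count)  # Ann
```

If two names tie for the highest count, max returns whichever of them it meets first while iterating over the dict.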

Hope this helps!

Thanks so much for this @alvinctk. I’ve learnt a lot from your post, namely:

  1. Not needing to iterate over a dictionary
  2. The get method
  3. defaultdict - in my particular example, is the main advantage being able to avoid checking if the value exists in the dictionary? Compared to using the get method is it just neater code?
  4. Being able to create my dict of dicts more efficiently
  5. Avoiding the pitfall of assigning a name to a dict in a dict unnecessarily

Yes. With a defaultdict of a particular type, let’s say int, you don’t need to check whether the key exists in the dictionary. The default value for a missing key will always be zero.

Raymond Hettinger, a Python core developer, suggests defaultdict as an improvement over the .get method. It’s an example of code transformation - “When you see this, do that instead.”


I am also stuck on this same problem. My code is virtually identical to what kevinl posted at the beginning of this thread. I am wondering if there is another approach to solving this problem, using concepts that have been introduced thus far in the Dataquest Python courses? I also know that finding the maximum value in a dictionary will return the maximum key, not the value of the key - how can I solve this, again, using what we have been exposed to so far in the Dataquest Python course?
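
One possible approach that needs only loops, if statements and dictionaries is to track the best name and count by hand while looping over the frequency dict. This is only a sketch: top_visitor is a hypothetical helper name, and the sample counts are invented.

```python
def top_visitor(freq):
    """Return (name, count) for the highest count, or (None, 0) if freq is empty."""
    best_name = None
    best_count = 0
    for name in freq:
        # Keep the name only if its count beats the best seen so far.
        if freq[name] > best_count:
            best_name = name
            best_count = freq[name]
    return best_name, best_count


# Hypothetical counts, not from the potus data set.
visitor_freq = {"Zed": 1, "Ann": 3, "Bob": 2}
print(top_visitor(visitor_freq))
```

Returning the result of this helper instead of max(visitor_freq) would give the most frequent visitor rather than the alphabetically last name.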

Have you been able to resolve your issue?

Define a function to find the most frequent visitor in a given set of rows
(We haven’t learned this yet, but I needed a way to return the key with the max value in a dictionary, so I searched for it on Google)

def max_person(data):
    person = {}
    for row in data:
        if row[0] not in person:
            person[row[0]] = 1
        else:
            person[row[0]] += 1
    top = max(person, key=person.get)  # name with the highest count
    return [top, person[top]]

create a list of months

mnth = []
for row in potus:
    if row[2].month not in mnth:
        mnth.append(row[2].month)
find max visiting person for each month

for i in mnth:
    lst = []
    for row in potus:
        if row[2].month == i:
            lst.append(row)
    print(['Month ' + str(i)] + max_person(lst))


['Month 1', 'Jesus MurilloKaram', 3]
['Month 2', 'Sheila JacksonLee', 5]
['Month 3', 'Anna M. Yeo', 3]
['Month 4', 'Alan C. Prather', 7]
['Month 5', 'Chris Coons', 6]
['Month 6', 'Marilda W. Averbug', 6]
['Month 7', 'Russell A. Wilson', 4]
['Month 8', 'ELIZABETH C. NUNEZ', 3]
['Month 9', 'Shaojun ■■■■', 6]
['Month 10', 'AnnaMaria R. Mottola', 8]
['Month 11', 'MICHAEL LISZEWSKI', 3]
['Month 12', 'Brian E. Fallon', 4]

First-time Python learner here. Let me know if this helps or if you have any suggestions for improvement. Thanks