Question on list comprehension and Lambda functions - mission 7/14

Hi everyone,
first help post in the community!
I am struggling to understand mission 7/14 in the List Comprehension mission.

Code reported below:
hn_clean is a list of dictionaries grabbed with the json module.

def retrieve_num_comments(dictionary):
    return dictionary['numComments']

most_comments = max(hn_clean, key=retrieve_num_comments) 

Given that retrieve_num_comments should take a dictionary as an argument, it is not clear to me how the syntax works for the most_comments variable.

  • I am applying max() on an iterable, the hn_clean list
  • however the retrieve_num_comments function is called on dictionaries… does it mean that key is called on each element of the list?

Hope I was able to explain myself…
Thanks for any help!

Hello nlong! Welcome to our community!

Functions like max(), min(), and sorted() have an optional key parameter that you can use to specify a function to be called on each element of the iterable you pass in! The comparison is then made on the values that function returns.

This page in the documentation has an example that illustrates this very well:

So yes, the key parameter is used to pass in the function that you want executed on each of the elements of that list. When you pass in the function this way, you’re handing the function itself to max() as an argument, rather than calling it yourself.
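As a small illustration of the key parameter, here is a sketch using a made-up list of words (the data and variable names are assumptions for the example, not part of the mission):

```python
# Hypothetical sample data, not taken from the mission.
words = ["kiwi", "banana", "fig", "cherry"]

# Without key, strings are compared alphabetically.
print(max(words))               # kiwi

# With key=len, len() is called on each element and the lengths are compared.
longest = max(words, key=len)
shortest = min(words, key=len)
by_length = sorted(words, key=len)

print(longest)    # banana (first of the two 6-letter words)
print(shortest)   # fig
print(by_length)  # ['fig', 'kiwi', 'banana', 'cherry']
```

Note that max() still returns an element of the original list (a word), not the value the key function produced (a length).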

The use of a function you’re most familiar with is probably something like the following:

def add_two(x):
    return x+2

add_two(2)

Here you’re simply calling the function directly. You’re passing in an argument and telling the function: “here, take this integer and do your thing”.

The main difference when using the key parameter to pass in a function as an argument is that you use it to tell the “primary” function (i.e. max(), min(), etc.): “Here, take this list and take this function to use on elements of that list. And then do your thing.”
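To make the difference between calling a function and passing it along concrete, here is a minimal sketch. The helper apply_to_each is hypothetical and written only for this example; it stands in for a “primary” function such as max().

```python
def add_two(x):
    return x + 2

# Calling the function directly: the parentheses run it right away.
print(add_two(2))  # 4

# apply_to_each is a hypothetical helper for this example. It receives a
# function as an argument and decides for itself when to call it.
def apply_to_each(items, func):
    return [func(item) for item in items]

# Passing the function itself: note there are no parentheses after add_two.
print(apply_to_each([1, 2, 3], add_two))  # [3, 4, 5]
```

This is why the mission code says key=retrieve_num_comments and not key=retrieve_num_comments(...): max() does the calling, once per element.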

Hopefully that clears it up!

To answer your questions more directly:

Precisely. The function (an argument) gets called on each value of the list (the list also being an argument), and only then does the max()/min() function do its job. The key parameter is what makes this possible.

most_comments = max(hn_clean, key=retrieve_num_comments)

In essence, this is what’s going on:

You call a function, max(), and pass in two arguments - an iterable, and a function you want used on the individual components of said iterable. The key parameter is what communicates this intention of yours to the max() function.

As you know, your function returns the value associated with the 'numComments' key in each dictionary. As max() iterates behind the scenes, it therefore ends up with one 'numComments' value per dictionary. So it finally does its job, and selects the maximum of those values.

It then returns the element within the original iterable that contained this maximum 'numComments' value it found.
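The steps above can be sketched as a hand-written loop. This is only an approximation of the behavior for illustration: the built-in max() is not implemented this way in Python code, max_by_key is a hypothetical name, and hn_sample is made-up data standing in for hn_clean.

```python
def retrieve_num_comments(dictionary):
    return dictionary['numComments']

# Hypothetical stand-in for hn_clean; the real list comes from the mission's data.
hn_sample = [
    {'title': 'A', 'numComments': 3},
    {'title': 'B', 'numComments': 12},
    {'title': 'C', 'numComments': 7},
]

def max_by_key(iterable, key):
    """Simplified sketch of max(iterable, key=...); assumes a non-empty iterable."""
    iterator = iter(iterable)
    best = next(iterator)           # the first element is the starting candidate
    best_value = key(best)          # key is called on a single element
    for element in iterator:
        value = key(element)        # here: one dictionary at a time
        if value > best_value:
            best, best_value = element, value
    return best                     # the element itself, not the key's value

manual = max_by_key(hn_sample, retrieve_num_comments)
built_in = max(hn_sample, key=retrieve_num_comments)

print(manual)              # {'title': 'B', 'numComments': 12}
print(manual is built_in)  # True
```

Both versions hand back the whole dictionary with the most comments, which is what ends up in most_comments.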

Hey blueberrypudding85, thanks for taking time to go through this!
I hadn’t seen the key function page in the Python documentation, which is why I was slightly confused.

Reading that page + your quote definitely made this clearer to me.

Onwards then! And thanks for the super fast support!

Here are 2 more uses of key:

  1. operator.itemgetter: https://docs.python.org/3.7/howto/sorting.html
  2. Terminal node of decision tree: https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/

    This second example is less intuitive because it uses 2 pieces of information. The key function is called on each unique outcome from set(outcomes), but it also needs the full outcomes list to count how often that outcome occurs, so it is not working only from the item it receives, as you would usually expect.
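Both uses can be sketched in a few lines. The data below is made up for illustration, and the second call assumes the pattern being described is a bound list method, outcomes.count, passed as the key:

```python
from operator import itemgetter

# Hypothetical stand-in for hn_clean.
hn_sample = [
    {'title': 'A', 'numComments': 3},
    {'title': 'B', 'numComments': 12},
    {'title': 'C', 'numComments': 7},
]

# 1. itemgetter('numComments') builds a function equivalent to
#    retrieve_num_comments, so no def is needed.
most_comments = max(hn_sample, key=itemgetter('numComments'))
print(most_comments)  # {'title': 'B', 'numComments': 12}

# 2. Most frequent value in a list. The key is the bound method
#    outcomes.count, so it is called as outcomes.count(item) for each unique
#    item and already "knows" the full list.
outcomes = [0, 1, 1, 0, 1]
most_common = max(set(outcomes), key=outcomes.count)
print(most_common)  # 1
```

In the second case the key is still called once per element of the iterable, set(outcomes); the extra information comes from the list that the method is bound to.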

Thank you blueberrypudding85 for your explanation. However, one more point that confuses me is that the parameter in the original function is a dictionary, while it is a list when we call the max function.

def retrieve_num_comments(dictionary):
    return dictionary['numComments']

most_comments = max(hn_clean, key=retrieve_num_comments)

hn_clean is a list while in retrieve_num_comments the parameter is a dictionary.

Could you please explain this discrepancy?

Hi @Aly!

It’s been a while since I made that post - but this is what I think the confusion is -

While hn_clean itself is a list, the individual elements inside that list are dictionaries. max() takes the elements out of the list one at a time and passes each one to retrieve_num_comments, so the dictionary parameter receives a single dictionary, never the whole list. When the function then does dictionary['numComments'], it’s fetching the numComments value of that one dictionary.

Essentially what’s happening is you’re taking a list of dictionaries, extracting the numComments value of each dictionary, and then based on those values, you’re selecting the maximum (and returning the dictionary that contained this max numComments value).
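A quick way to see the list-versus-dictionary distinction is to inspect the types directly. The sketch below uses a made-up hn_sample in place of hn_clean, and the lambda at the end is just an inline alternative to retrieve_num_comments, not something the mission requires:

```python
# Hypothetical stand-in for hn_clean.
hn_sample = [
    {'title': 'A', 'numComments': 3},
    {'title': 'B', 'numComments': 12},
    {'title': 'C', 'numComments': 7},
]

first = hn_sample[0]
print(type(hn_sample))       # <class 'list'>
print(type(first))           # <class 'dict'>
print(first['numComments'])  # 3

# The key function only ever sees one element (a dict) at a time.
# A lambda does the same job as retrieve_num_comments, written inline.
most_comments = max(hn_sample, key=lambda d: d['numComments'])
print(most_comments)         # {'title': 'B', 'numComments': 12}
```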

This snippet of my earlier post I think partly addresses this: