Prison Break Syntax clarification

There are two tables at the URL in the exercise. Why did the second table, “Escapes in fiction”, not get parsed?
How was only the first table, “Actual attempts”, selected? How did we specify the table, or the location of the CSV, to be parsed? I don’t see it anywhere.

Second question in the same vein:

Line 1: max_year = max(data, key=lambda x: x[0])[0]
Line 2: data[index1][index2]

Are Line 1 and Line 2 saying the same thing?

Hello,

All the functions are in the helper module which you can read yourself by downloading the notebook. I’m going to attach it here for reference:

helper.py (1.9 KB)

Here’s the relevant snippet:

def data_from_url(url):
    df = pd.read_html(url)[1]
    lol = df.to_numpy().tolist()
    return lol

All of the tables are parsed, but the only one we are interested in is the table of non-fictional escape attempts, which sits at index 1 of the list that pd.read_html returns. If you change the [1] to [2], you’ll get the fictional attempts instead. For example:

from helper import *

# override the helper's data_from_url function
def data_from_url(url):
    df = pd.read_html(url)[2]
    lol = df.to_numpy().tolist()
    return lol

url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"

data = data_from_url(url)

data # to print
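To make the indexing idea concrete without pandas or a network connection, here is a small sketch. The tables list is a made-up stand-in for what pd.read_html(url) returns (the real entries are DataFrames, and the rows below are placeholders, not real data), and rows_from_tables is a hypothetical name, not something from helper.py:

```python
# Stand-in for the list returned by pd.read_html(url): one entry per table
# found on the page. Placeholder rows only, not the real Wikipedia data.
tables = [
    [["placeholder for whatever table comes first"]],           # index 0
    [[1971, "actual attempt A"], [1973, "actual attempt B"]],   # index 1
    [[1990, "fictional escape A"]],                             # index 2
]


def rows_from_tables(tables, table_index=1):
    """Return the rows of the table at table_index as a list of lists."""
    return [list(row) for row in tables[table_index]]


actual = rows_from_tables(tables)                   # same choice as the helper's [1]
fiction = rows_from_tables(tables, table_index=2)   # same choice as the override's [2]

print(actual)
print(fiction)
```

So the table is never named anywhere. It is picked purely by its position in the list.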

As for your second question, whether Line 1 and Line 2 are saying the same thing:

I’m not sure if I understand your question clearly, but I’ll try:

If by “the same” you mean whether they give you the same output, then it depends. The second line returns max_year only when index1 happens to point at the row with the highest year and index2 is 0.

For the first line, max is normally used to find the largest item in a list, for example the highest year in a list of integers. But data is a list of lists rather than a list of years, so max needs to be told what to compare. The key argument does that: it is a function that max calls on each inner list, and the value it returns is what gets compared. Here the lambda returns index 0, so the year becomes the basis for comparison.

The max function will then return the list that contains the max year, and we can then get the value by indexing it with [0].

The second line is just multidimensional indexing (I think). You “dig” into the list of lists twice: first to get a list, and second to get the value inside it. In a way the first and second lines are the same in this regard, because both involve “digging” twice into the list, but the former is slightly more complicated than the latter.

I’m unsure if my answers are clear enough, so please tell me if they aren’t.
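Here is a small runnable comparison of the two lines. The rows are made up for illustration (only the shape, year first, follows the exercise data):

```python
# Made-up rows in the same shape as the exercise data: [year, other columns...]
data = [
    [1985, "row A"],
    [2001, "row B"],
    [1993, "row C"],
]

# Line 1: max compares the rows by their first element and returns a whole row,
# which is then indexed with [0] to get the year.
row_with_max_year = max(data, key=lambda x: x[0])
max_year = row_with_max_year[0]
print(row_with_max_year)  # [2001, 'row B']
print(max_year)           # 2001

# Line 2: plain double indexing; the result depends entirely on the indexes.
print(data[0][0])  # 1985
print(data[1][0])  # 2001, equal to max_year only because row 1 holds the max
```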

Thank you for the excellent explanation. I understood that the helper function was behind it; I should have looked into it more.

Regarding

max(data, key=lambda x: x[0])[0]

The thing that is getting me is the relationship between the x and the list in data. For example, inside the max function, is this effectively equivalent, i.e. saying the same thing at execution?

max(data, key=lambda data: data[0])[0]

The reason I thought it is the same as data[0][0] is because at index 0 it is data[0][0].

Thank you for the clarification.

Using your example, the data referred to in the lambda is not the same as the original data. It’s just a parameter that stands for one element of data at a time, and its name is arbitrary. A better name for it might be row or el, giving row[0] or el[0].

Here’s the difference between the two data for example:

Outside the lambda: data[0] equals [1985, "Value 1", "Value 2", ...]

Inside the lambda: data[0] equals 1985.
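A quick way to check that the parameter name makes no difference is the sketch below. The rows are made up, and year_of is a hypothetical helper name used only for illustration:

```python
# Made-up rows: [year, other columns...]
data = [[1985, "row A"], [2001, "row B"], [1993, "row C"]]

# Three spellings of the same key function; only the parameter name differs.
result_x = max(data, key=lambda x: x[0])
result_data = max(data, key=lambda data: data[0])  # this "data" is the parameter
result_row = max(data, key=lambda row: row[0])


# The same key written as a named function.
def year_of(row):
    return row[0]


result_def = max(data, key=year_of)

print(result_x == result_data == result_row == result_def)  # True
print(data[0])  # the outer list is untouched: [1985, 'row A']
```

Inside the second lambda, the name data temporarily refers to one row, which is why data[0] there is a year rather than a list.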

Maybe it’s clearer if you look at it this way:

data = [ 
        ["1", "2"], # an element or a row
        ["3", "4"]
       ]

max(data, key=lambda row: row[0])[0] # row here is an element/a part of data

Essentially, max calls the lambda once for each row, much like a for loop would:

# the lambda works a bit like this
years = []
for row in data:
    years.append(row[0]) # collects the year of each row

# the years are then used to determine which row has the max value

Or we can also structure it differently by splitting the indexing into two separate lines:

row_with_max_year = max(data, key=lambda row: row[0])

max_year          = row_with_max_year[0] 

Note that row_with_max_year is one of the rows of data. It could be data[0], data[1], data[2], and so forth, depending on which row holds the largest year.

I will read your replies a couple of times but I think I got it. It’s just the logic and information flow, once understood everything will be ok.

Thanks Again

Abstraction, as in hiding the inner workings of a function, can be confusing for those who are starting out. One thing you can try is to implement your own max function, and you’ll probably understand it better than from any explanation or documentation.
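If you want a starting point, here is one possible sketch. The name my_max is made up, and this is a simplified illustration of the idea, not how Python implements max internally:

```python
def my_max(items, key=None):
    """Return the item whose key is largest (simplified stand-in for max)."""
    if key is None:
        key = lambda item: item  # no key given: compare the items themselves

    iterator = iter(items)
    try:
        best = next(iterator)  # start with the first item as the best so far
    except StopIteration:
        raise ValueError("my_max() got an empty sequence") from None
    best_key = key(best)

    for item in iterator:
        item_key = key(item)      # this is where the lambda gets called
        if item_key > best_key:   # strict ">" keeps the first item on ties
            best, best_key = item, item_key
    return best


# Made-up rows: [year, other columns...]
data = [[1985, "row A"], [2001, "row B"], [1993, "row C"]]

print(my_max(data, key=lambda row: row[0]))     # [2001, 'row B']
print(my_max(data, key=lambda row: row[0])[0])  # 2001
```

Notice that the key function only decides what is compared. The value returned is still the whole row, which is why the extra [0] is needed at the end.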

Good luck learning.
