# Stuck on CPU Bound Programs

Hi!

So I’m working on the CPU Bound Programs mission and got to step 13 (Practicing writing efficient algorithms), where the instruction is to ‘Use pandas groupby to find the `product_link` with the highest `relevance` for each unique `query`.’ Course link

This is where I got stuck. How do I start thinking about solving this? I had initialized a dictionary and the loop for iterating through ‘query’ with ‘item’ but then? This is also suggested by the hint of the step. But how do I ‘track the highest relevance for each term and the associated link in a dictionary’? Eventually I checked the answer and I see stuff I haven’t come across before so that makes me think I could not have figured this out on my own without seriously Googling for this stuff.

The following concepts are new (and therefore unknown) to me. Could you point me to where I can learn them, perhaps in other courses on Dataquest?

• lambda
• enumerate
• loop with two iterators

I have been looking for courses with lambda in them and came across:

Neither course is in the Data Engineer Path.

Appreciate a response. Thanks!


Hi @DataBuzzer,

It is true that the proposed solution uses some Python programming concepts that we haven’t taught yet. We are working on improving that.

However, it is possible to solve this question without them.

## Lambda

A lambda function is like a regular function but defined with another syntax. The code:

``````
def pandas_algo():
    # For each group, return the product_link of the row with the highest relevance
    get_max_relevance = lambda x: x.loc[x["relevance"].idxmax(), "product_link"]
    return data.groupby("query").apply(get_max_relevance)
``````

Is the same as the following:

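``````
def get_max_relevance(x):
    # Same expression as the body of the lambda above
    return x.loc[x["relevance"].idxmax(), "product_link"]

def pandas_algo():
    return data.groupby("query").apply(get_max_relevance)
``````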

There is a course on lambda functions later in the DE path, but I agree that we should not be using them before teaching them.
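To see the equivalence without pandas, here is a minimal sketch in plain Python. The `rows` list and both function names are made up for illustration and are not part of the mission; the point is only that a lambda and a `def` can be used interchangeably wherever a function is expected.

```python
# Hypothetical toy rows; the real mission data is not shown in this thread.
rows = [
    {"relevance": 2.0, "product_link": "link_a"},
    {"relevance": 3.0, "product_link": "link_b"},
    {"relevance": 1.0, "product_link": "link_c"},
]

# Lambda version: a one-expression function written inline.
get_relevance_lambda = lambda row: row["relevance"]

# Regular version: same behavior, just the usual def syntax.
def get_relevance_def(row):
    return row["relevance"]

# max() accepts any function as its key, so both versions pick the same row.
best_with_lambda = max(rows, key=get_relevance_lambda)
best_with_def = max(rows, key=get_relevance_def)

print(best_with_lambda["product_link"])  # link_b
print(best_with_def["product_link"])     # link_b
```

A lambda is limited to a single expression, so anything that needs several statements has to be written with `def`.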

## Enumerate

When you do a `for` loop using `enumerate()` you get access to both the index and the value rather than just the value.

For example, a simple `for` loop will iterate over the values:

``````
for value in [5, 7, 3, 8]:
    print(value)
``````
``````
5
7
3
8
``````

Using `enumerate` will iterate over the indexes and the values at the same time:

``````
for index, value in enumerate([5, 7, 3, 8]):
    print(index, value)
``````
``````
0 5
1 7
2 3
3 8
``````
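The "loop with two iterators" from the question is really one loop over pairs. As a small sketch (the variable names `values` and `pairs` are just for illustration): `enumerate()` produces `(index, value)` tuples, and writing two names after `for` unpacks each tuple.

```python
values = [5, 7, 3, 8]

# enumerate() yields one (index, value) tuple per element.
pairs = list(enumerate(values))
print(pairs)  # [(0, 5), (1, 7), (2, 3), (3, 8)]

# Looping with a single name gives the whole tuple,
# which can then be unpacked by hand.
for pair in enumerate(values):
    index, value = pair
    print(index, value)

# Writing "for index, value in ..." does the same unpacking in one step.
for index, value in enumerate(values):
    print(index, value)
```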

The solution uses `enumerate()`, but it is entirely possible to solve the problem without it. The `algo()` function can be rewritten without `enumerate()` as follows:

``````
def algo():
    for i in range(len(query)):
        row = query[i]  # the same value enumerate() would give alongside i
``````
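For the hint about tracking the highest relevance and its link in a dictionary, one possible sketch is below. It assumes the three columns are available as plain lists named `query`, `relevance` and `product_link`; those lists, their values and the name `algo_sketch` are made up for illustration and are not the mission's actual solution.

```python
# Hypothetical toy columns standing in for the mission's data.
query = ["tv", "tv", "phone", "phone"]
relevance = [2.0, 3.0, 1.5, 1.0]
product_link = ["link_a", "link_b", "link_c", "link_d"]

def algo_sketch():
    # Maps each query to [best relevance seen so far, its product link].
    links = {}
    for i in range(len(query)):
        row = query[i]
        # Store the row if the query is new, or if it beats the stored relevance.
        if row not in links or relevance[i] > links[row][0]:
            links[row] = [relevance[i], product_link[i]]
    return links

print(algo_sketch())
# {'tv': [3.0, 'link_b'], 'phone': [1.5, 'link_c']}
```

The loop visits every row once and each dictionary lookup takes constant time on average, so the whole pass is linear in the number of rows.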

## Pandas concepts

The `pandas_algo()` function uses some pandas methods that I am not sure we taught earlier.

The `DataFrame.groupby()` method groups the rows of the dataframe by a given column. Imagine that you have this dataset:

``````
   Animal  Speed
0  Falcon  380.0
1  Falcon  370.0
2  Parrot   24.0
3  Parrot   26.0
``````

If we group by `Animal` then we have two groups each with two rows. For example:

``````
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Speed': [380., 370., 24., 26.]
})
# Iterating over a groupby yields (group name, sub-dataframe) tuples
for group in df.groupby('Animal'):
    print(group)
``````
``````
('Falcon',    Animal  Speed
0  Falcon  380.0
1  Falcon  370.0)
('Parrot',    Animal  Speed
2  Parrot   24.0
3  Parrot   26.0)
``````

We can use the `apply()` method to apply a function to each group. For example, we can compute the maximum speeds of the animals in each group by doing:

``````
def max_speed(group):
    return group['Speed'].max()

print(df.groupby('Animal').apply(max_speed))
``````
``````
Animal
Falcon    380.0
Parrot     26.0
dtype: float64
``````

In the context of this problem, we group by `query` and apply a function that returns the `product_link` of the row with maximum `relevance` within each group.
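Putting the pieces together, here is a rough sketch on made-up data. The column names `query`, `relevance` and `product_link` come from the mission, but the values are invented. Selecting the two needed columns before `apply()` is my own choice, so that the function only sees the columns it uses.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the mission.
data = pd.DataFrame({
    "query": ["tv", "tv", "phone", "phone"],
    "relevance": [2.0, 3.0, 1.5, 1.0],
    "product_link": ["link_a", "link_b", "link_c", "link_d"],
})

def get_max_relevance(group):
    # idxmax() gives the index label of the row with the highest relevance;
    # loc then reads the product_link on that same row.
    return group.loc[group["relevance"].idxmax(), "product_link"]

# One result per unique query, returned as a Series indexed by query.
best = data.groupby("query")[["relevance", "product_link"]].apply(get_max_relevance)

print(best.to_dict())
# {'phone': 'link_c', 'tv': 'link_b'}
```

The groups come back sorted by key, which is why `phone` appears before `tv`.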

I hope this helps


Hi @Francois,

Thanks for taking the time and effort for this great reply! Really appreciated.

After some practice and research I understand your approaches and how they work. I recognize that my skill level and intuition are just not there yet, but they’re coming.

Gr. Ron


@Francois I appreciate the explanation, although I do think this question was really not well thought out.
The solution checker doesn’t ask for an output to validate either algorithm against.
It says to “develop an algorithm”, but doesn’t specify whether this should be a function and (if it is a function) what its inputs and outputs should be.
To that effect:
`pandas_algo()` will return product links in a pandas Series with `query` as its index.
`algo()` will return a dictionary whose keys are queries and whose values are lists containing the relevance and the product link. You could pull out only the product link with a dictionary comprehension:
`links = {k: v[1] for k, v in links.items()}`
Alternatively, you could track the max relevance and the product link for each query in separate dictionaries. I feel like this looks slightly cleaner, and so far as I can tell it doesn’t affect the order of complexity.

``````def algo():