Hi @DataBuzzer,
It is true that the proposed solution uses some Python programming concepts that we haven't taught yet. We are working on improving that.
However, it is possible to solve this question without them.
Lambda
A lambda function is like a regular function but defined with another syntax. The code:
def pandas_algo():
    get_max_relevance = lambda x: x.loc[x["relevance"].idxmax(), "product_link"]
    return data.groupby("query").apply(get_max_relevance)
Is the same as the following:
def get_max_relevance(x):
    return x.loc[x["relevance"].idxmax(), "product_link"]

def pandas_algo():
    return data.groupby("query").apply(get_max_relevance)
There is a course on lambda functions later in the DE path, but I agree that we should not be using them before teaching them.
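If it helps to see the equivalence on something smaller, here is a minimal sketch that does not involve pandas. The names double and double_lambda are made up for this illustration and are not part of the exercise.

```python
# A regular function definition
def double(x):
    return 2 * x

# The same function written with the lambda syntax
double_lambda = lambda x: 2 * x

# Both can be called in exactly the same way
print(double(5))         # 10
print(double_lambda(5))  # 10
```

The only difference is how the function is written; calling it works the same way in both cases.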
Enumerate
When you do a for loop using enumerate() you get access to both the index and the value rather than just the value.
For example, a simple for loop will iterate over the values:
for value in [5, 7, 3, 8]:
    print(value)
5
7
3
8
Using enumerate() will iterate over the indexes and the values at the same time:
for index, value in enumerate([5, 7, 3, 8]):
    print(index, value)
0 5
1 7
2 3
3 8
In the solution we use it, but it is entirely possible to solve the problem without it. The algo() function can be rewritten without enumerate() as follows:
def algo():
    links = {}
    for i in range(len(query)):
        row = query[i]
        if row not in links:
            links[row] = [0, ""]
        if relevance[i] > links[row][0]:
            links[row] = [relevance[i], product_link[i]]
    return links
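To try this version on its own, here is a small runnable sketch. The three lists below are made-up sample data that I am assuming for illustration; in the exercise, query, relevance and product_link come from the dataset.

```python
# Hypothetical sample data (not the real dataset from the exercise)
query = ["laptop", "laptop", "phone", "phone"]
relevance = [2.5, 3.0, 1.0, 0.5]
product_link = ["link_a", "link_b", "link_c", "link_d"]

def algo():
    links = {}
    # range(len(query)) gives the indexes 0, 1, 2, ... one at a time
    for i in range(len(query)):
        row = query[i]
        # First time we see this query: start with relevance 0 and no link
        if row not in links:
            links[row] = [0, ""]
        # Keep the link with the highest relevance seen so far
        if relevance[i] > links[row][0]:
            links[row] = [relevance[i], product_link[i]]
    return links

result = algo()
print(result)
# {'laptop': [3.0, 'link_b'], 'phone': [1.0, 'link_c']}
```

Here the index i plays the role that enumerate() would otherwise provide: it lets us read the matching entries of relevance and product_link for each query.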
Pandas concepts
The pandas_algo() function uses some methods that I am not sure we teach beforehand.
The DataFrame.groupby() method groups the rows of the dataframe by a given column. Imagine that you have this dataset:
   Animal  Speed
0  Falcon  380.0
1  Falcon  370.0
2  Parrot   24.0
3  Parrot   26.0
If we group by Animal then we have two groups, each with two rows. For example:
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Speed': [380., 370., 24., 26.]
})

# Each item is a (group name, group rows) pair
for group in df.groupby('Animal'):
    print(group)
('Falcon', Animal Speed
0 Falcon 380.0
1 Falcon 370.0)
('Parrot', Animal Speed
2 Parrot 24.0
3 Parrot 26.0)
We can use the apply() method to apply a function to each group. For example, we can compute the maximum speeds of the animals in each group by doing:
def max_speed(group):
    return group['Speed'].max()

print(df.groupby('Animal').apply(max_speed))
Animal
Falcon 380.0
Parrot 26.0
dtype: float64
In the context of this problem, we group by query and apply a function that returns the product_link of the row with maximum relevance within each group.
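Putting these pieces together, here is a rough sketch of what that looks like on a tiny dataframe. The rows below are sample data I made up for illustration; only the column names (query, relevance, product_link) are taken from the problem.

```python
import pandas as pd

# Hypothetical sample data using the column names from the problem
data = pd.DataFrame({
    "query": ["laptop", "laptop", "phone", "phone"],
    "relevance": [2.5, 3.0, 1.0, 0.5],
    "product_link": ["link_a", "link_b", "link_c", "link_d"],
})

def get_max_relevance(group):
    # idxmax() gives the index label of the row with the highest relevance
    best_row = group["relevance"].idxmax()
    # loc[row label, column name] picks a single value out of the group
    return group.loc[best_row, "product_link"]

best_links = data.groupby("query").apply(get_max_relevance)
print(best_links)
```

The result is a Series indexed by query, with one product_link per group: link_b for laptop and link_c for phone in this sample.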
I hope this helps!