What does lambda x, y: x+y mean in PySpark's reduceByKey function?

Screen Link: https://app.dataquest.io/m/60/introduction-to-spark/9/explanation

I have a question about the example code below.

daily_show.map(lambda x: (x[0], 1)).reduceByKey(lambda x, y: x+y)

In this PySpark code, what does lambda x, y: x+y do? I understand that the map function returns tuples consisting of two values, but I am not sure about the reduceByKey function.
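
For reference, here is a rough plain-Python sketch of the map step only. It runs without Spark; the rows are hypothetical stand-ins for daily_show, and it assumes, as the question implies, that x[0] is the key (the year).

```python
# Hypothetical rows standing in for the daily_show RDD; only x[0] is used.
rows = [
    ["1999", "guest A"],
    ["1999", "guest B"],
    ["2000", "guest C"],
]

# Plain-Python equivalent of daily_show.map(lambda x: (x[0], 1)):
# every row becomes a (key, 1) pair.
pairs = list(map(lambda x: (x[0], 1), rows))
print(pairs)  # [('1999', 1), ('1999', 1), ('2000', 1)]
```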

Sincerely,

Hello @tokeihananda, I’ve not yet done this mission.
As you can see, lambda x, y: x+y is a function that takes two parameters and returns their sum. reduceByKey() uses it to combine all the values that share the same key, so the result contains each unique key together with its count. Check out the article below:
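
To make the explanation concrete, here is a minimal single-machine imitation of what reduceByKey() does. reduce_by_key is a hypothetical helper, not part of PySpark. The real method returns an RDD of (key, value) tuples rather than a dict, and it may combine values in any order across partitions, which is why the function you pass should be associative and commutative.

```python
from functools import reduce

def reduce_by_key(pairs, func):
    """Hypothetical local imitation of RDD.reduceByKey (not the Spark API)."""
    grouped = {}
    for key, value in pairs:
        # Collect every value that shares the same key.
        grouped.setdefault(key, []).append(value)
    # Collapse each key's list of values into one value with func.
    return {key: reduce(func, values) for key, values in grouped.items()}

pairs = [("1999", 1), ("1999", 1), ("2000", 1), ("1999", 1)]
counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)  # {'1999': 3, '2000': 1}
```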

1 Like

@tokeihananda Here, reduceByKey(f) treats the first element of each tuple as the key and combines the second elements of all tuples that share that key using the function passed in, lambda x, y: x+y. So four ('1991', 1) pairs become ('1991', 4), since 1+1+1+1 = 4.
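
The lambda only ever sees two values at a time; reduceByKey applies it repeatedly until one value per key remains. This plain-Python trace uses functools.reduce as a stand-in to show the pairwise calls for the key '1991'. Spark may group the calls differently, e.g. (1+1)+(1+1) across partitions, but addition gives the same total either way.

```python
from functools import reduce

calls = []

def add(x, y):
    calls.append((x, y))  # record each pairwise call
    return x + y

values_for_1991 = [1, 1, 1, 1]  # the values of four ('1991', 1) pairs
total = reduce(add, values_for_1991)
print(total)  # 4
print(calls)  # [(1, 1), (2, 1), (3, 1)]
```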
If you instead multiplied the values for each key, using lambda x, y: x*y, the result would be ('1991', 1), since 1 * 1 * 1 * 1 = 1.
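
A quick plain-Python comparison of the two lambdas on the same four values for '1991' (functools.reduce stands in for Spark here):

```python
from functools import reduce

values_for_1991 = [1, 1, 1, 1]
summed = reduce(lambda x, y: x + y, values_for_1991)      # 1+1+1+1
multiplied = reduce(lambda x, y: x * y, values_for_1991)  # 1*1*1*1
print(("1991", summed))      # ('1991', 4)
print(("1991", multiplied))  # ('1991', 1)
```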
Please mark my answer as the solution if you find it useful.

2 Likes