Screen Link: https://app.dataquest.io/m/60/introduction-to-spark/9/explanation
I have a question regarding an example code below.
daily_show.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y)
In this PySpark code, what does
lambda x, y: x+y do? I understand that the map function returns tuples of two values, but I am not sure about the reduceByKey function.
Hello @tokeihananda, I've not yet done this mission, but as you can see,
lambda x, y: x+y is a function that takes two parameters and returns their sum. reduceByKey() applies that sum function to the values of all the pairs that share the same key, so the result contains each unique word together with its count. Check out the below article:
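To make the mechanics concrete without a running Spark cluster, here is a minimal pure-Python sketch of what map plus reduceByKey do conceptually (the helper `reduce_by_key` and the sample `years` list are my own illustrations, not Spark's actual implementation):

```python
from functools import reduce
from collections import defaultdict

def reduce_by_key(pairs, f):
    """Sketch of Spark's reduceByKey: group the values by key,
    then fold each key's list of values together with f."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return [(key, reduce(f, values)) for key, values in groups.items()]

# Mimics daily_show.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
years = ['1991', '1991', '1991', '1991', '1992']
pairs = [(x, 1) for x in years]                  # the map step: (key, 1) tuples
counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)  # [('1991', 4), ('1992', 1)]
```

So the lambda never sees the keys at all; Spark hands it two values at a time (here, the 1s) for a given key and keeps folding until only one value per key remains.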
@tokeihananda
reduceByKey(f) groups the tuples by their first element (the key) and combines the second elements using the lambda function declared here,
lambda x, y: x+y. For a key that appears four times with value 1, the values fold as
(1+1+1+1) = 4.
If instead you multiplied the corresponding values, like
lambda x, y: x*y, then the result would be
('1991', 1), since
1 * 1 * 1 * 1 = 1.
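You can check that folding behaviour locally with Python's functools.reduce, which combines a list pairwise exactly the way the lambda is applied per key (the `values` list below stands in for the four 1s paired with the key '1991'):

```python
from functools import reduce

values = [1, 1, 1, 1]  # the four 1s that share the key '1991'

# Addition folds as ((1 + 1) + 1) + 1 -> ('1991', 4)
total = reduce(lambda x, y: x + y, values)
print(total)    # 4

# Multiplication folds as ((1 * 1) * 1) * 1 -> ('1991', 1)
product = reduce(lambda x, y: x * y, values)
print(product)  # 1
```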
Please mark my answer as the solution if you find it useful.