hamlet_text_only = (hamlet_with_ids
                    .filter(lambda line: len(line) != 1)
                    .map(lambda line: line.remove('')))
hamlet_text_only.take(5)
What I expected to happen:
I expected the filter to drop every row that contains nothing but the line number. Then I expected the map to remove all the empty strings ('') from each remaining line.
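To make the two intended steps concrete, here is a minimal plain-Python sketch. It assumes each row of hamlet_with_ids is a list of strings whose first element is the line number; the sample rows and the helper name drop_empty_strings are hypothetical, and the helper uses a list comprehension rather than remove.

```python
# Hypothetical sample rows: each row is a list whose first element is the line number.
sample_rows = [
    ['0', '', 'HAMLET', ''],
    ['1'],
    ['2', 'ACT I', '', ''],
]


def drop_empty_strings(line):
    """Return a new list with every empty string removed.

    Unlike list.remove, this removes all occurrences, does not fail when
    there is nothing to remove, and returns the list instead of None.
    """
    return [token for token in line if token != '']


# Step 1: keep only rows that contain more than the line number.
kept = [line for line in sample_rows if len(line) != 1]
# Step 2: strip the empty strings from each remaining row.
cleaned = [drop_empty_strings(line) for line in kept]

print(cleaned)  # [['0', 'HAMLET'], ['2', 'ACT I']]

# With an RDD, the same helper could be passed to map, e.g.:
# hamlet_with_ids.filter(lambda line: len(line) != 1).map(drop_empty_strings)
```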
What actually happened:
Instead, I end up with a massive error log. I’m unsure how the take function has any bearing on this.
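One thing worth checking (the error log isn't shown, so this is an assumption about its cause): Spark's filter and map are lazy transformations, and nothing runs until an action such as take asks for results, so any exception raised inside the lambda surfaces at the take call. Python 3's built-in map is lazy in the same way, so this small local analogy, using hypothetical rows and a hypothetical remove_empty helper, shows an error that only appears once the iterator is consumed:

```python
# Hypothetical rows; the second one contains no empty string.
rows = [['0', '', 'HAMLET', ''], ['1', 'ACT I']]


def remove_empty(line):
    # list.remove raises ValueError when '' is not in the list.
    line.remove('')
    return line


# Python 3's map is lazy: creating it does not call remove_empty yet.
pipeline = map(remove_empty, rows)
print("pipeline built, nothing has run yet")

# Consuming the iterator, like a Spark action, is what runs remove_empty.
try:
    results = list(pipeline)
except ValueError as err:
    print("error surfaced only on consumption:", err)
```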
I’m pretty sure the remove function works, because I ran it separately on a single line and it succeeded:
def clean(line):
    line.remove('')
    return line

print(clean(['0', '', 'HAMLET', '']))
I’m aware that this approach is incomplete, since remove only deletes the first empty string. However, I’m asking because of the error that came up, which seems unnecessary.
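For reference, list.remove mutates the list in place, deletes only the first matching element, and returns None. That last point matters inside map: a lambda like lambda line: line.remove('') hands None back for every row instead of the cleaned list, whereas the clean function above works because it returns line explicitly. A short plain-Python check of that behavior:

```python
line = ['0', '', 'HAMLET', '']

# remove mutates the list in place and returns None.
returned = line.remove('')
print(returned)  # None

# Only the first '' was removed; the trailing one is still there.
print(line)      # ['0', 'HAMLET', '']

# So a mapper written as `lambda l: l.remove('')` yields None for each row.
mapper = lambda l: l.remove('')
print(mapper(['1', '', 'ACT I']))  # None
```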
Any help would be appreciated.