Just a quick question about the provided solution vs my own. The mission suggests adding every word from each row in the SMS column to a list called vocabulary. When it's done, it says to convert the vocabulary list to a set using the set() function, then convert it back to a list using the list() function. Apparently this allows you to remove duplicates.
Before reading the whole question (a mistake I often make!), I iterated over each word, but instead of using set() and list() at the end, I used an if statement within the for loop to skip words that were already in the list.
I believe it resulted in the same list (once sorted), but is my way inefficient/slow in comparison? Is using an if statement in this manner frowned upon in some way?
Apologies if this is a dumb question or answered elsewhere in the course!
My code:

    trainingset['SMS'] = trainingset['SMS'].str.split()
    vocabulary = []
    for row in trainingset['SMS']:
        for word in row:
            # Only add words that are not already in the list
            if word not in vocabulary:
                vocabulary.append(word)
Solution Code (using my variable names):
    trainingset['SMS'] = trainingset['SMS'].str.split()
    vocabulary = []
    for row in trainingset['SMS']:
        for word in row:
            vocabulary.append(word)
    # Converting to a set drops duplicates; list() turns it back into a list
    vocabulary = list(set(vocabulary))
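To check the "same list once sorted" claim without the course data, here is a minimal sketch that runs both approaches side by side. The `messages` list and the names `vocab_if` and `vocab_set` are made up for illustration, and a plain Python list stands in for the SMS column, so this is only a toy comparison, not the course's dataset.

```python
# Hypothetical toy data standing in for the SMS column (not the real dataset).
messages = ["free prize now", "call me now", "free call"]
rows = [message.split() for message in messages]

# Approach 1: skip words that are already in the list.
vocab_if = []
for row in rows:
    for word in row:
        if word not in vocab_if:
            vocab_if.append(word)

# Approach 2: collect every word, then deduplicate through a set.
vocab_set = []
for row in rows:
    for word in row:
        vocab_set.append(word)
vocab_set = list(set(vocab_set))

# Compare after sorting, because a set does not keep insertion order.
same = sorted(vocab_if) == sorted(vocab_set)
print(same)
```

On this toy input both approaches hold the same five unique words. Two differences are worth noting: the if version keeps words in first-seen order while the set version's order is arbitrary, and `word not in vocab_if` has to scan the list on every check, whereas set membership is a hash lookup, so the if version would be expected to slow down more as the vocabulary grows.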