Finding Word Frequencies - Neither my solution nor the recommended solution runs

Screen Link: https://app.dataquest.io/m/171/quickly-analyzing-data-with-parallel-processing/9/finding-word-frequencies

My Code:

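import concurrent.futures
import os
import re
from collections import Counter

def word_frequencies(filename):
    # Read the whole file; the with block closes it afterwards
    with open(filename, 'r') as opened:
        file_text = opened.read()
    # Replace every run of non-alphanumeric characters with a single space
    file_text = re.sub('[^0-9a-zA-Z]+', ' ', file_text)
    file_text = file_text.lower()
    word_list = file_text.split(" ")

    def remove_small_words(word):
        # Keep only words longer than 4 characters
        return len(word) > 4

    word_list = filter(remove_small_words, word_list)
    counts = Counter(word_list)

    return counts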

results = []
pool = concurrent.futures.ProcessPoolExecutor(max_workers=4)
filenames = ["lines/{}".format(f) for f in os.listdir("lines")]
word_counts = pool.map(word_frequencies, filenames)
word_counts = list(word_counts)

total_word_counts = sum(word_counts, Counter())
top_200 = total_word_counts.most_common(200)

What I expected to happen: Both my solution and the example answer should produce the expected list of the 200 most common words.

What actually happened: NEITHER my solution NOR the provided solution runs successfully. Both time out.

Your code run has timed out.
This could be caused by writing an infinite loop, or an issue with our system.
Issue still persisting? Check our status page, or read about how to troubleshoot this error.

Hi @bbartley,

The solution code works; it is just written in a somewhat chaotic way. Copy it starting at the line "from collections import Counter" (inclusive) and continuing to the end.

About your code, there are some issues to be considered and fixed:

  • Don’t define one function inside another.
  • In your code, word_list is actually a filter object, not a real list.
  • Counter accepts any iterable, so it can consume the filter object directly. Still, converting word_list to a list makes it easier to inspect and reuse.
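
The points above could be applied roughly as follows. This is only a minimal sketch, not the official solution: the names is_long_word, count_long_words, and MIN_WORD_LENGTH are hypothetical, and the length threshold is assumed to mirror the len(word) > 4 check in the original code.

```python
import re
from collections import Counter

# Assumption: equivalent to the original len(word) > 4 check
MIN_WORD_LENGTH = 5


def is_long_word(word):
    """Hypothetical module-level helper replacing the nested function."""
    return len(word) >= MIN_WORD_LENGTH


def count_long_words(text):
    """Count lowercase alphanumeric words of at least MIN_WORD_LENGTH characters."""
    # Replace runs of non-alphanumeric characters with a space, then lowercase
    cleaned = re.sub('[^0-9a-zA-Z]+', ' ', text).lower()
    # filter() returns a lazy iterator; list() turns it into a real list
    word_list = list(filter(is_long_word, cleaned.split()))
    return Counter(word_list)


def word_frequencies(filename):
    """Read a file and count its long words."""
    with open(filename, 'r') as opened:
        return count_long_words(opened.read())
```

For example, count_long_words("Hello hello, world! tiny words") counts 'hello' twice and 'world' and 'words' once each, while 'tiny' is dropped for being too short. Splitting the text handling out of word_frequencies also makes it possible to check the counting logic without touching any files.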

Eventually, without modification, both solutions were able to run successfully. I am wondering if it was a server load or resource allocation issue in the environment.

I appreciate the feedback on naming, and will make those updates.
