Hi everyone,
Wow, it seems that the topic of another project of mine is not super-popular in the Community!
Anyway, it’s about a very necessary tool nowadays: a spam filter for SMS messages. It turned out to be quite an accurate filter, with 98.74% accuracy. However, making the system more complicated by taking the letter case of words into account made it much less accurate, so I did not pursue that experiment further.
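For readers who have not opened the notebook, here is a minimal sketch of how a multinomial Naive Bayes filter with Laplace smoothing can work. It is not the notebook's code: the function names (`tokenize`, `train`, `classify`), the `lowercase` switch, and the four toy messages are all assumptions made up for illustration. The `lowercase` flag only shows where a case-sensitive variant would differ.

```python
import math
import re
from collections import Counter


def tokenize(text, lowercase=True):
    """Split a message into word tokens; optionally fold everything to lower case."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9']+", text)


def train(messages, labels, alpha=1.0, lowercase=True):
    """Count words per class and keep what is needed to score new messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter(labels)
    for text, label in zip(messages, labels):
        counts[label].update(tokenize(text, lowercase))
    vocab = set(counts["spam"]) | set(counts["ham"])
    return {"counts": counts, "n_docs": n_docs, "vocab": vocab,
            "alpha": alpha, "lowercase": lowercase}


def classify(model, text):
    """Return the label with the higher log-posterior under Naive Bayes."""
    total_docs = sum(model["n_docs"].values())
    n_vocab = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label in ("spam", "ham"):
        word_counts = model["counts"][label]
        n_words = sum(word_counts.values())
        # Start from the log prior P(label).
        score = math.log(model["n_docs"][label] / total_docs)
        for word in tokenize(text, model["lowercase"]):
            if word not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Laplace-smoothed log likelihood P(word | label).
            score += math.log((word_counts[word] + alpha) /
                              (n_words + alpha * n_vocab))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
messages = [
    "WIN a FREE prize now",
    "free cash prize call now",
    "are we still meeting for lunch",
    "call me when you are home",
]
labels = ["spam", "spam", "ham", "ham"]

model = train(messages, labels)
print(classify(model, "free prize"))
print(classify(model, "lunch at home"))
```

Passing `lowercase=False` to `train` makes "FREE" and "free" separate vocabulary entries. That spreads the same evidence over more words, which is one possible reason a case-sensitive variant can score worse on a small dataset.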
I also investigated the few messages that the filter classified wrongly and found some features they have in common. It seems that spam senders have a clear idea of how spam filters work, so they have also figured out ways to get around them.
Additionally, I created a word cloud to display the 100 most “spamish” words and found patterns in them. Also, throughout the project, I used pretty-printing a lot, including ways to better visualize numbers, tables, and sections. And for writing pieces of formulas in markdown, I found a good trick for displaying lower and upper indices (I used only the former). If you need to know it as well, just ask me. Hopefully, you’ll find these things useful for your future projects as well.
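One possible way to pick the most “spamish” words for a word cloud is to rank each word by how much more likely it is in spam than in ham. The sketch below is an assumption about how such a ranking could be done, not the method used in the notebook: the helper names (`spamminess`, `top_spam_words`) and the toy counts are hypothetical.

```python
import math
from collections import Counter


def spamminess(spam_counts, ham_counts, alpha=1.0):
    """Score each word by log( P(word | spam) / P(word | ham) ), Laplace-smoothed."""
    vocab = set(spam_counts) | set(ham_counts)
    n_spam = sum(spam_counts.values())
    n_ham = sum(ham_counts.values())
    n_vocab = len(vocab)
    scores = {}
    for word in vocab:
        p_spam = (spam_counts.get(word, 0) + alpha) / (n_spam + alpha * n_vocab)
        p_ham = (ham_counts.get(word, 0) + alpha) / (n_ham + alpha * n_vocab)
        scores[word] = math.log(p_spam / p_ham)
    return scores


def top_spam_words(spam_counts, ham_counts, n=100, alpha=1.0):
    """Return the n words with the highest spam-to-ham log ratio."""
    scores = spamminess(spam_counts, ham_counts, alpha)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy word counts, invented for this example only.
spam_counts = Counter({"free": 4, "prize": 3, "call": 2, "now": 2})
ham_counts = Counter({"call": 3, "lunch": 2, "home": 2, "now": 1})

top_words = top_spam_words(spam_counts, ham_counts, n=2)
print(top_words)
```

The resulting list, or the scores behind it, can then be handed to whatever word-cloud tool you prefer. A log ratio is used here so that words common in both classes, such as "call", do not rank high just because they are frequent.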
Any feedback from you is very welcome. What can be improved / modified / optimized in terms of code, storytelling, project structure? Or if you find any typos, discrepancies, issues, etc., please let me know.
Thanks a lot in advance!
Building a Spam Filter with Naive Bayes.ipynb (572.9 KB)
Click here to view the jupyter notebook file in a new tab