Hello everyone! I would like to share my project, Building a Spam Filter with Naive Bayes. I am sharing it because I ran a scikit-learn MultinomialNB classification on the same dataset and compared its accuracy with that of the manually written algorithm. I hope this comparison will be interesting for you. I would also appreciate feedback on my code in general and on my use of the scikit-learn library. I believe it is quite practical to compare a manual approach with industry-accepted libraries, since we will probably use the latter in our daily jobs.
Thank you in advance. Peace.
Mission screen: Learn data science with Python and R projects
Building a Spam Filter with Naive Bayes.ipynb (41.5 KB)
Welcome back to the Community with another awesome project! You’ve done a great job applying both the manual approach and the scikit-learn classifier, and you even obtained a higher accuracy in your experiment. Your code is clean and perfectly commented, and the storytelling is really cool, with all the necessary links and mathematical references, and with the most important points emphasized. Well done!
Some minor suggestions:
- You can combine the adjacent code cells if they have no output or markdown cells in between (e.g. -, -, -, -).
- Right before the code cell , something happened in the last sentence in markdown (selecting alpha).
- You can create the classify function without that intermediate step.
- Consider rounding outputs for the accuracy values in both cases. It’s especially easy since you’re using f-string formatting, which is great.
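
To illustrate the rounding suggestion, here is a minimal sketch. The variable names and accuracy values are hypothetical placeholders, not taken from your notebook, so adjust them to whatever your cells actually compute:

```python
# Hypothetical accuracy values and names, used only for illustration;
# in the notebook these would come from the manual filter and from MultinomialNB.
manual_accuracy = 0.987654321
sklearn_accuracy = 0.912345678

# ".2%" multiplies by 100, rounds to two decimals and appends a percent sign.
manual_line = f"Manual Naive Bayes accuracy: {manual_accuracy:.2%}"

# ".4f" keeps the value as a fraction, rounded to four decimal places.
sklearn_line = f"MultinomialNB accuracy: {sklearn_accuracy:.4f}"

print(manual_line)
print(sklearn_line)
```

The format specifier only changes how the number is displayed, so the underlying value keeps its full precision for any later calculation.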
Hope my ideas were helpful. Good luck with your future projects and keep up this high level!
Thank you for your valuable comments. The end of the markdown before that cell is definitely a mistake that I somehow missed.