Personal Project: E-mail Spam Detection

Hi DQ!

This is a personal project born out of guilt from doing the SMS spam project, which was entirely guided. I thought that in order to truly learn, I had to do a project of my own.

The following is the fruit of that labour.
E-mail Spam Detection.ipynb (937.8 KB)

As always, any feedback is welcome. I would love to know if section 3 in the report makes sense. I was trying to explain Bayes classification and Logistic Regression, and it started becoming really long, so I had to cut it short.

Happy reading!!


Hey @jesmaxavier! Thanks for sharing your project with the Community :slight_smile: You have a nice portfolio project with excellent narrative and code. I’m also glad that you’ve learned a lot while doing it. I also liked your style of naming the project’s sections!

I think your explanations in section 3 are very clear and absolutely worth including. You have not been too wordy, no worries.

Some suggestions from my side:

  • You have some typos
  • You have some code style inconsistencies, like missing spaces after variable assignments, etc.
  • I believe you wanted to use the Google docstring style for function documentation. Don’t forget that the Returns section has to state the data type, not an arbitrary name of the variable that you return. Also, in [13], you gave the data type as dataframe, but I believe it should be pd.DataFrame, which indicates the package it comes from (the abbreviation pd for pandas is fine) and uses the correct name for the data structure
  • You can also use Markdown to write down the formulas :slight_smile: It may not be as easy as inserting images, but it delivers better quality :slight_smile:
  • “This means that 1 out of every 10 e-mails could be incorrectly classified which seems to be a very good statistic.” Is it? A 10% error rate is actually a lot! You may say that it’s OK for a simple algorithm like Naive Bayes, but I wouldn’t entrust my mailbox to it
  • Make sure to clearly label the axes of the plot, increase its size, remove the top and right spines, and experiment with other colors
  • You explain what sensitivity and specificity are only after they have already appeared in the functions. You may consider moving these explanations further up in the project
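
To make the docstring and the sensitivity/specificity points a bit more concrete, here is a minimal sketch of what I have in mind. The function name, the argument names, and the toy labels are my own inventions for illustration and are not taken from your notebook, so treat it as one possible shape rather than the way it has to be done. Note how the Returns section names the type (pd.DataFrame) instead of a variable.

```python
import pandas as pd


def classification_summary(actual, predicted, positive="spam"):
    """Summarise binary classification results.

    Args:
        actual (list[str]): True labels.
        predicted (list[str]): Labels produced by the classifier.
        positive (str): Label treated as the positive class.

    Returns:
        pd.DataFrame: One row with accuracy, sensitivity and specificity.
    """
    pairs = list(zip(actual, predicted))
    # Count the four cells of the confusion matrix.
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return pd.DataFrame(
        {
            "accuracy": [(tp + tn) / len(pairs)],
            "sensitivity": [tp / (tp + fn)],  # share of spam that was caught
            "specificity": [tn / (tn + fp)],  # share of non-spam left alone
        }
    )


# Made-up labels, purely for illustration (not results from the project).
actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam"]
summary = classification_summary(actual, predicted)
print(summary)
```

This sketch assumes both classes occur in the true labels; otherwise one of the divisions would fail. Reporting sensitivity and specificity next to accuracy also helps with the "1 out of 10" question, because it shows which kind of mistake the classifier makes.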

Hope you find my suggestions helpful. Happy coding :slight_smile:

