Hello everyone! I am so excited about this, I don’t know where to start.
I recently completed this guided project on my way to my hometown. You can find the last screen of the project here. The Guided Project was sooo much fun to do (thanks a lot, Dataquest!), so I decided to make one more like it. I looked for datasets on various sites and eventually ended up using this news dataset I found on Kaggle. However, I had read somewhere that Logistic Regression is also good at text classification, and I had some prior knowledge about it (thanks, Andrew Ng!). So I decided to challenge my skills a bit and used Logistic Regression for this, again with term frequencies as the word weights.
Soooo, I made this! It’s a news classifier that labels news as “fake” or “real” by their titles. I thoroughly cleaned, visualized, and analyzed the data before fitting the model, and I feel so proud of myself. Completing this project in two days has motivated and inspired me; I hope you like it, too! Shoutout to the Dataquest team for their AMAZING teaching methods! This has been a really nice journey. I am now 90% into the Data Analyst path and will soon be graduating from it. Thank you to all the community members as well, you have all been really nice and helpful to me. God, I’m a sentimental one, lol.
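The original notebook isn’t shown in this thread, so here’s just a minimal sketch of the same idea in scikit-learn: term-frequency features plus logistic regression, trained on titles. The titles below are made-up toy examples, not the Kaggle data.

```python
# Sketch of a title-based fake/real classifier:
# term frequencies (raw counts) as features + logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data standing in for the Kaggle dataset (assumption, not the real rows).
titles = [
    "Scientists confirm water found on distant moon",
    "Government report details new budget plan",
    "SHOCKING miracle cure doctors don't want you to know",
    "You won't BELIEVE what this celebrity did next",
]
labels = ["real", "real", "fake", "fake"]

# CountVectorizer turns each title into term frequencies;
# LogisticRegression then learns one weight per term.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(titles, labels)

print(model.predict(["miracle cure you won't believe"]))
```

With real data you would of course hold out a test set and tune the vectorizer (stop words, n-grams) before trusting the accuracy.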
Congratulations on having this much fun! I must admit that the Naive Bayes spam filter project was one of my favorites too.
And great fake news classifier! It’s good to see that you’re taking your skills into the almost real world!
On the fake news classifier I have a remark:
Your feature engineering was quite nicely done, especially because the category and date columns were giving away a lot, so your model reached a really high accuracy.
However, in the real world the model should probably be able to recognize fake news without the date column and without the category column. It is quite unrealistic that every single message from the ‘all’ category would be fake. Therefore I would like to suggest a challenge: try to detect real or fake news using only the title column.
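To make the leakage point concrete, here’s a tiny pandas sketch (toy rows and assumed column names, not the actual Kaggle schema): first check whether a non-title column almost perfectly predicts the label, and if so, keep only the title as model input.

```python
import pandas as pd

# Toy stand-in for the news data; column names are assumptions.
df = pd.DataFrame({
    "title": ["Moon water confirmed", "SHOCKING cure revealed",
              "Budget plan announced", "You won't BELIEVE this"],
    "category": ["science", "all", "politics", "all"],
    "label": ["real", "fake", "real", "fake"],
})

# Leakage check: does `category` give the label away on its own?
# Here every 'all' row is fake, so the column would leak the answer.
print(df.groupby("category")["label"].value_counts())

# If it does, drop it and train on the title only.
X = df["title"]
y = df["label"]
```

A column that maps (almost) one-to-one onto the label will inflate accuracy without teaching the model anything about the text itself.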
One other thing, I have recently finished the data scientist path, and life after that becomes quite interesting. As you can see in this post I have no clue what to do haha. I’m curious what you are going to do afterwards!
I am very glad you have linked your post, I completely missed it! I checked it out and I will be back to it for further reading later on!
About the challenge you suggested: I actually used only the title column as a feature. I used the date and category columns only while analyzing the data for points I might be missing, but I didn’t feed them to the model. So, challenge complete!
I am honored that you liked my project. As for what I’m planning to do, I have already begun applying to internships and entry-level jobs. I have probably applied to more than 40 jobs/internships by now and am going to be patiently waiting for a while. You can check out this YouTube playlist by Stanford Online; it’s Andrew Ng’s 2018 Autumn CS229 class, and you can find the curriculum here. You will surely not regret it; it’s really good and detail-oriented. I will probably go back and watch all of it again after I complete the Data Analyst path.
Oh dear, some poor reading led to a very easy challenge. I should really read more notebooks haha.
Good luck in your search for a job! I’m really curious to hear how you will fare. Let’s hope you’ll get paid really well soon!
I can see that the notes of the course are very detailed; thanks for sharing this, it will be valuable when I dive deeper into ML. I will most likely watch the videos as well (but only when I use a certain model, haha). But your YouTube link is broken.
At the moment I have devoted my time to learning Scrapy:
I think Scrapy is a really cool tool to get a lot of data in a short amount of time. I mean, with it you can basically get anything from the web. It is a real challenge on a programming level, though.
Good luck with your last 10%
Cheers @DavidMiedema! Also, thanks for inspiring me to use Scrapy, I will definitely check it out to improve my data collection skills.