Hello community members,
I just completed the guided project “Popular Data Science Questions”.
My results are in this notebook:
PopularDataScienceQuestions.ipynb (1.0 MB)
It was an extensive and challenging project for me, in which I could apply a lot of what I have learned so far in the Dataquest courses, including data extraction using SQL, a lot of dataframe operations (pandas), and visualization with both matplotlib and seaborn.
As always, I'm interested in any feedback that anyone may have!
Thanks for sharing your project with the DQ community. I have gone through it and learned a lot. The background information on the data is well worked out, and the explanations, comments, links, and visualizations are very informative. The way you did your data extraction is really interesting, and your explanations of it are thorough; good work indeed. I also picked up some techniques from your technical preparations, which are very cool.
I have a few suggestions to raise:
- I don’t think the commented-out code lines in cell serve any purpose in your work, so you could consider deleting the whole code cell. If they do serve a purpose, kindly help me understand.
- Most of your verifications rely on printing numerical values, which is okay, but I think in most cases a Boolean result speaks much louder. It would also reduce the repetition of values in the output.
- I think you should explain the meaning of the ratios displayed in cell and cell, and likewise the correlation values displayed as a graph in cell.
- Check the second sentence of the observation on your last graph; I think the word ‘However’ is repeated.
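To illustrate the second suggestion, here is a minimal sketch on a hypothetical toy `posts` table (invented for illustration, not the notebook's data) contrasting a check that prints raw numbers with one that prints a single Boolean:

```python
import pandas as pd

# Hypothetical toy data standing in for a posts table
posts = pd.DataFrame({"Id": [1, 2, 3, 4], "ViewCount": [10, 25, 5, 40]})

# Number-based check: the reader has to compare the two printed numbers themselves
print(posts["Id"].nunique(), len(posts))

# Boolean check: one True/False answers "are all Ids unique?"
ids_unique = posts["Id"].is_unique
print(ids_unique)
```

`Series.is_unique` is an existing pandas attribute; the explicit comparison `posts["Id"].nunique() == len(posts)` gives the same answer and may read more clearly to someone new to pandas.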
Otherwise, from my side I can affirm that everything is well done, and I just want to congratulate you on the good work, mate. All the best in your upcoming projects.
Thank you for reviewing my project, and for your feedback - much appreciated!
I agree with all your suggestions. Regarding your 2nd comment, if you have the chance, would you be able to elaborate a bit further? Do you have an example of a cell where I could do this in a better way? Would that, e.g., be cell? And what would you suggest instead? An output like “Checking that the total number of viewed tags after grouping is the same as the original number of tags…: correct!” or something like that?
I find myself doing a lot of verification of the code I write by checking numbers, examples, etc. However, I am never sure how many of those checks I should keep in my notebook, or whether I should delete them again. On the one hand, they show myself and any audience that the results should be correct; on the other hand, they may hurt readability. Any thoughts on that?
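One way to keep such checks without cluttering the notebook is a small helper that prints a labeled one-line result, close to the output format proposed above, or a bare `assert` that stays silent unless something is wrong. The helper `report_check` and the `post_tags` table below are hypothetical, made up for illustration:

```python
import pandas as pd

def report_check(description: str, passed: bool) -> bool:
    """Hypothetical helper: print a one-line, human-readable verification result."""
    status = "correct!" if passed else "MISMATCH"
    print(f"{description}: {status}")
    return passed

# Hypothetical long-format table: one row per (post, tag) pair
post_tags = pd.DataFrame({
    "Tags": ["python", "pandas", "python", "sql"],
    "ViewCount": [100, 100, 50, 30],
})
tag_views = post_tags.groupby("Tags")["ViewCount"].sum()

# Labeled check that always prints a short status line
ok = report_check(
    "Checking that the total number of views after grouping equals the original total",
    tag_views.sum() == post_tags["ViewCount"].sum(),
)

# Alternative: a silent assertion that only raises (with a message) on failure
assert tag_views.sum() == post_tags["ViewCount"].sum(), "view totals differ after grouping"
```

The `assert` style keeps the rendered notebook quiet while still failing loudly if a later change breaks the invariant, which may be a reasonable compromise between trustworthiness and readability.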
My apologies for the late reply; I would still like to clarify my second point. The code cell below evaluates to True if the sum of `FavoriteCount` equals the sum of `tags_favoritecount`. With this, you just need to add a few comments so the reader understands exactly what the comparison checks, for example: we expect True if the two sums match; otherwise False is displayed.
```python
# Verify correctness by checking totals
# We expect True if the sums are the same, otherwise False
print(poststags["FavoriteCount"].sum() == tags_favoritecount.sum())
```
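The same check can be tried end to end on toy data. This sketch assumes, without confirmation from the notebook, that `tags_favoritecount` is a per-tag `groupby` sum over a long-format `poststags` table; the rows are invented. It also shows one way the check can legitimately print False: `groupby` drops rows whose key is missing by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's table: one row per (post, tag) pair,
# including one row whose tag is missing
poststags = pd.DataFrame({
    "Tags": ["python", "pandas", "python", np.nan],
    "FavoriteCount": [3, 3, 1, 2],
})

# Assumed definition: total favorites per tag (groupby drops NaN keys by default)
tags_favoritecount = poststags.groupby("Tags")["FavoriteCount"].sum()

# Verify correctness by checking totals
# We expect True if no favorites were lost during grouping, otherwise False
totals_match = poststags["FavoriteCount"].sum() == tags_favoritecount.sum()
print(totals_match)  # prints False: the NaN-tag row (2 favorites) was dropped

# Keeping the missing-tag group restores the match
tags_favoritecount_all = poststags.groupby("Tags", dropna=False)["FavoriteCount"].sum()
print(poststags["FavoriteCount"].sum() == tags_favoritecount_all.sum())  # prints True
```

The `dropna` parameter of `groupby` is available from pandas 1.1 onward; on older versions the second comparison would need a different approach, such as filling missing tags before grouping.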
Hi @brayanopiyo18, a belated “thank you!” for your additional comments on my question. (I was unavailable for some time and therefore not able to respond sooner.) I see what you mean with your suggestions, and they make sense, thank you! I will take them into account for an update of this project (when the time is right) and/or for future projects!
This is great, @jasperquak,