Thank you very much for the detailed feedback! I’ve done my best to address each point, and have put my responses as a numbered list below each of your bullets so they’re easy to tell apart. I’ve also reworked part of the last section for a stronger conclusion. Let me know what you think of the updates!
Popular Data Science Questions on Stack Exchange.ipynb (321.7 KB)
- I would recommend having sections that clearly mark the introduction and conclusion to help non-technical viewers quickly go over the main thread of your project and its outcome. In addition, I would include a link from which viewers could download the dataset, to help them go over the data if they so choose.
- Introduction and conclusion sections have been added with a stronger, more actionable conclusion.
- I’ve included a link to the DETE database, which is where this pull is from. Do you recommend I include a host for the csv export or should I show SQL code?
- The Data exploration section is apt. It clearly gives details about the data. However, the Data Clean-up section could use some more work. While you have given details on the columns you intend to clean, it would be best to provide context on why you have chosen those columns to clean and how they might affect your data analysis.
- I’ve added more description, along with the reasoning behind each cleaning step.
- Of your two graphs the Most Popular Tags graph is really neat and seems to follow most of the Gestalt Principles. In contrast the Top 20 Tags by Usage & Views graph needs some work.
- I agree and wasn’t satisfied with the way this graph turned out either. Is it bad practice to include exploratory graphs that don’t provide rich information themselves, but lead to something better? I’d sooner remove this section since the results don’t end up in the conclusion. One way I could maybe take it is to create a stacked bar chart with all the metrics, but I think there would be too much noise.
- You have shared your thoughts and analysis after each table and graph, which is really good. However, I noticed that you ended the section Popularity Marker Overlap with the assumption that knowledge of the datascience domain is commonplace, with the use of “With our knowledge of the datascience domain, we can see that…”. It may not be for the reader, so it is best to provide a link to some references that may help the user get a better understanding of the domain. You could also have a special section that gives the user an idea of the domain.
- I added links to high level descriptions of those libraries and languages and a link to an overview of what Python is and what a Python library is.
- I’m unsure where I picked up the habit of using inclusive language, but I see it a lot in projects. Do you believe it’s fine to do so or should I be using passive language? i.e. don’t include: I, we, our, me, my, etc.
I only did a quick analysis of your coding and I feel your coding style is good.
- I think you have done a good job with coding. The style seems to be consistent. All the comments are in place
- You have not shirked away from using descriptive variable names which is really helpful
- I noticed that you have SettingWithCopyWarning warnings. I did run your code and I was able to remove them. I would recommend that you go over this blog post, which should help you remove the warning. I know it’s a long read, but it’s a worthy investment.
- I was able to get rid of the first one through appropriate use of .loc described in the article, but the second one was a mystery to me. I ended up just turning off the alert and turning it back on to avoid the flag. Would you mind sharing if you agree with this method or if there’s a clear reason why this happened? I’ve read that there can be false flagging.
- This next one I’m not sure of, but in the 41st cell you have created a function for normalization. I believe that you are trying to standardize the columns, but your formula is one that I’m not aware of. I believe the correct formula for standardizing is z = (x − mean) / standard deviation. I got this from here. I could be wrong, and I would recommend that someone else comment.
- I haven’t gotten to z-scores yet, but here is where I got the function from. This was taught leading up to the section. Google says min-max functions are a type of normalization, but I don’t understand how they differ from z-scores.
- You have re-run the project from the beginning and that’s very helpful in numbering and identifying the cells correctly
- It appears neat if you leave space above each comment so that the code and its associated comment can be segregated neatly.
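On the SettingWithCopyWarning question above, here is a minimal sketch of one common cause and fix. It may or may not match what happened in the notebook. The DataFrame, column names, and values are hypothetical stand-ins, not taken from the project. The idea is that a filtered subset can still be treated as a possible view of its parent, so taking an explicit `.copy()` when the subset is created and then assigning through `.loc` usually avoids the warning without having to switch it off.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's questions table.
questions = pd.DataFrame({
    "Tags": ["python", "pandas", "r", "python"],
    "ViewCount": [120, 45, 30, 300],
})

# An explicit copy makes the subset independent of `questions`,
# so later assignments to it are unambiguous.
python_only = questions[questions["Tags"] == "python"].copy()

# Assign through .loc, giving the row selector and column label in one call.
python_only.loc[:, "ViewsThousands"] = python_only["ViewCount"] / 1000

print(python_only)
```

If the warning still appears after this kind of change, it is worth checking whether the DataFrame being modified was itself derived from an earlier slice further up the notebook.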
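On the min-max versus z-score question above, a small side-by-side sketch may help. It assumes the notebook's function is a min-max rescale, which is what the reply suggests but is not confirmed here. The function names and sample values are illustrative only. Min-max normalization squeezes values into the range 0 to 1 based on the smallest and largest value, while z-score standardization recenters values to a mean of 0 and a standard deviation of 1.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to the range [0, 1] using the min and max."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def z_score_standardize(values):
    """Recenter values to mean 0 and population standard deviation 1."""
    center = mean(values)
    spread = pstdev(values)
    return [(v - center) / spread for v in values]


# Illustrative sample, not data from the project.
view_counts = [10, 20, 30, 40, 50]

scaled = min_max_normalize(view_counts)
standardized = z_score_standardize(view_counts)

print(scaled)
print(standardized)
```

Both approaches put columns with different units on a comparable scale. Min-max keeps results bounded but is sensitive to extreme values, whereas z-scores are unbounded and express how many standard deviations each value sits from the mean.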