
[Re-upload] Project Feedback - Popular Data Science Questions

[This is a reupload - my notebook should now be visible.]

Hello DQ community!

Please provide general feedback on my latest guided project. I felt like this one provided less direct guidance, so I’m curious how I did.

Shoutout to @Elana_Kosourova - the post where you shared your example guided project was really helpful! (I also borrowed some ideas with some changes.)

https://app.dataquest.io/c/84/m/469/guided-project%3A-popular-data-science-questions/11/next-steps

Popular Data Science Questions on Stack Exchange.ipynb (246.7 KB)


Hi,
I’ve been going through your project and thought I could offer some tips to make it better. It is clear that you’ve put in effort, and ironing out a few more kinks would help it get to the top.

Presentation
  • I would recommend adding clearly marked introduction and conclusion sections to help non-technical viewers quickly follow the main thread of your project and its outcome. In addition, I would include a link where viewers can download the dataset, so they can go over the data if they so choose.
  • The Data exploration section is apt. It clearly gives details about the data. However, the Data Clean-up section could use some more work. While you have given details on the columns you intend to clean, it would be best to provide context on why you have chosen those columns to clean and how they might affect your data analysis.
  • Of your two graphs, the Most Popular Tags graph is really neat and seems to follow most of the Gestalt Principles. In contrast, the Top 20 Tags by Usage & Views graph needs some work.
  • You have shared your thoughts and analysis after each table and graph, which is really good. However, I noticed that you ended the section Popularity Marker Overlap with the assumption that knowledge of the data science domain is commonplace, with the use of “With our knowledge of the datascience domain, we can see that…”. It may not be for the reader, so it is best to provide a link to some references that may help the reader get a better understanding of the domain. You could also have a special section that gives the reader an idea of the domain.
Coding Style

I only did a quick analysis of your coding and I feel your coding style is good.

  • I think you have done a good job with coding. The style seems to be consistent. All the comments are in place.
  • You have not shied away from using descriptive variable names, which is really helpful.
Bugs
  • I noticed that you have SettingWithCopyWarning warnings. I did run your code and I was able to remove them. I would recommend that you go over this blog post, which should help you remove the warning. I know it’s a long read, but it’s a worthy investment.
  • This next one I’m not sure of, but in the 41st cell you have created a function for normalization. I believe that you are trying to standardize the columns, but your formula is one I’m not familiar with. I believe the correct formula for standardizing is z = (x - μ) / σ, where μ is the column mean and σ is its standard deviation. I got this from here. I could be wrong, and I would welcome a comment from someone else.
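
For anyone following along, here is a minimal sketch of the two usual ways to avoid SettingWithCopyWarning. It does not reproduce the notebook's code: the `questions` frame, its column names, and the `TagCount` column are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the questions data; the column names are assumptions.
questions = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Tags": ["<python><pandas>", None, "<machine-learning>", "<python>"],
    "ViewCount": [120, 45, 300, 80],
})

# A pattern that commonly triggers the warning in older pandas versions:
#   tagged = questions[questions["Tags"].notna()]
#   tagged["TagCount"] = ...   # pandas cannot tell if this targets a copy or a view

# Fix 1: take an explicit copy of the filtered rows, then assign freely.
tagged = questions[questions["Tags"].notna()].copy()
tagged["TagCount"] = tagged["Tags"].str.count("<")

# Fix 2: when the goal is to change the original frame, select rows and
# column in a single .loc call instead of chaining two indexing steps.
questions.loc[questions["Tags"].isna(), "Tags"] = ""
```

If the warning appears several cells after a filter, the explicit `.copy()` at the point of filtering is often what resolves it, though that is a guess about the cause rather than a diagnosis of this notebook.
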
Miscellaneous
  • You have re-run the project from the beginning and that’s very helpful in numbering and identifying the cells correctly
  • It appears neat if you leave space above each comment so that the code and its associated comment can be segregated neatly.

That’s all from me. Overall, good job :grinning_face_with_smiling_eyes: :+1: and keep projecting on :rocket:

Hi jesmaxavier,

Thank you very much for the detailed feedback! I’ve done my best to address each point, and have put my response to each as a numbered list below your bullets so it’s distinctive. I’ve also redone some of the last section for a stronger conclusion. Let me know what you think of the updates!

Popular Data Science Questions on Stack Exchange.ipynb (321.7 KB)

Presentation
  • I would recommend adding clearly marked introduction and conclusion sections to help non-technical viewers quickly follow the main thread of your project and its outcome. In addition, I would include a link where viewers can download the dataset, so they can go over the data if they so choose.
  1. Introduction and conclusion sections have been added with a stronger, more actionable conclusion.
  2. I’ve included a link to the Stack Exchange Data Explorer (SEDE), which is where this pull is from. Do you recommend I host the CSV export, or should I show the SQL code?
  • The Data exploration section is apt. It clearly gives details about the data. However, the Data Clean-up section could use some more work. While you have given details on the columns you intend to clean, it would be best to provide context on why you have chosen those columns to clean and how they might affect your data analysis.
  1. Added more description, including the reason for each cleaning step.
  • Of your two graphs, the Most Popular Tags graph is really neat and seems to follow most of the Gestalt Principles. In contrast, the Top 20 Tags by Usage & Views graph needs some work.
  1. I agree and wasn’t satisfied with the way this graph turned out either. Is it bad practice to include exploratory graphs that don’t provide rich information themselves, but lead to something better? I’d sooner remove this section since the results don’t end up in the conclusion. One way I could maybe take it is to create a stacked bar chart with all the metrics, but I think there would be too much noise.
  • You have shared your thoughts and analysis after each table and graph, which is really good. However, I noticed that you ended the section Popularity Marker Overlap with the assumption that knowledge of the data science domain is commonplace, with the use of “With our knowledge of the datascience domain, we can see that…”. It may not be for the reader, so it is best to provide a link to some references that may help the reader get a better understanding of the domain. You could also have a special section that gives the reader an idea of the domain.
  1. I added links to high level descriptions of those libraries and languages and a link to an overview of what Python is and what a Python library is.
  2. I’m unsure where I picked up the habit of using inclusive language, but I see it a lot in projects. Do you believe it’s fine to do so, or should I be using passive language (i.e., avoiding I, we, our, me, my, etc.)?
Coding Style

I only did a quick analysis of your coding and I feel your coding style is good.

  • I think you have done a good job with coding. The style seems to be consistent. All the comments are in place.
  • You have not shied away from using descriptive variable names, which is really helpful.

Bugs

  • I noticed that you have SettingWithCopyWarning warnings. I did run your code and I was able to remove them. I would recommend that you go over this blog post, which should help you remove the warning. I know it’s a long read, but it’s a worthy investment.
  1. I was able to get rid of the first one through appropriate use of .loc as described in the article, but the second one was a mystery to me. I ended up just turning off the alert and turning it back on to avoid the flag. Would you mind sharing whether you agree with this method, or whether there’s a clear reason why this happened? I’ve read that there can be false flagging.
  • This next one I’m not sure of, but in the 41st cell you have created a function for normalization. I believe that you are trying to standardize the columns, but your formula is one I’m not familiar with. I believe the correct formula for standardizing is z = (x - μ) / σ, where μ is the column mean and σ is its standard deviation. I got this from here. I could be wrong, and I would welcome a comment from someone else.
  1. I haven’t gotten to z-scores yet, but here is where I got the function from. This was taught leading up to the section. Google says min-max functions are a type of normalization, but I don’t understand how they are different from z-scores.
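
To make the difference concrete, here is a small sketch comparing the two rescalings on the same column. It assumes the usual min-max form, (x - min) / (max - min), which may not match the notebook's function exactly. The `views` data and both function names are hypothetical.

```python
import pandas as pd

# Made-up view counts with one large value, to show how each method reacts.
views = pd.Series([10, 20, 30, 40, 100], name="ViewCount")

def min_max(s):
    """Rescale to the range [0, 1] using only the smallest and largest values."""
    return (s - s.min()) / (s.max() - s.min())

def z_score(s):
    """Standardize to mean 0 and standard deviation 1 (population std, ddof=0)."""
    return (s - s.mean()) / s.std(ddof=0)

scaled = min_max(views)
standardized = z_score(views)

print(pd.DataFrame({"raw": views, "min_max": scaled, "z_score": standardized}))
```

Min-max depends entirely on the two extreme values, so a single outlier squeezes everything else toward 0. A z-score instead expresses each value as a number of standard deviations from the mean, so it uses every observation in the column.
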

Miscellaneous
  • You have re-run the project from the beginning and that’s very helpful in numbering and identifying the cells correctly
  • It appears neat if you leave space above each comment so that the code and its associated comment can be segregated neatly.


I appreciate you considering my feedback for your project. :grinning:
I’ve tried to give more feedback based on the changes you’ve made.

  • I think your introduction and conclusion are now more appropriate :+1:
  • I think including the SQL code would be appropriate. That way the reader is at liberty to explore the source further.
  • As the old adage goes, a picture speaks a thousand words, so even exploratory graphs have something to say. If, however, you feel that you cannot justify that graph, it’s best to keep it out. I also just noticed that your graph includes the data point NaN, which does not make sense there because we are only focusing on posts that have tags, so I would recommend removing it if you plan to keep the graph.
  • The reason I felt the first graph needed more work is that it could be made cleaner by following the Gestalt Principles, for example by removing spines and increasing the size of the title.
  • Besides adding the links, it would add more legitimacy if you could give your own understanding of those topics. That shows you have some understanding of the topic and are giving the reader references to learn more.
  • Nothing wrong with having inclusive language because you are providing your opinion.
  • Ah good! I actually forgot about this formula from the Fuzzy Questions lesson. I will leave it to you to revisit this part after you have gone through the Statistics lessons to understand how z-scores are more reliable than using maximum and minimum values. You’ll appreciate the lesson even more.
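
The two graph suggestions above (dropping the NaN tag, then removing spines and enlarging the title) could look roughly like this in matplotlib. This is only a sketch: the tag counts, axis label, and chart title are placeholder values, not numbers from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder tag counts, including a NaN label like the one seen in the graph.
tag_counts = pd.Series(
    [900, 750, 400, 120],
    index=["machine-learning", "python", "deep-learning", float("nan")],
)

# Keep only rows whose tag label is present, sorted so the largest bar is on top.
plot_data = tag_counts[tag_counts.index.notna()].sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(list(plot_data.index), plot_data.values, color="steelblue")

# Remove the spines that carry no information and the tick marks next to labels.
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)
ax.tick_params(left=False)

# A larger, left-aligned title gives the eye a clear starting point.
ax.set_title("Most Popular Tags", fontsize=16, loc="left")
ax.set_xlabel("Number of questions")
fig.tight_layout()
```
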

Hope that helps!! :+1: