Guided Project (201): Star Wars Survey

Hi all,

here is my take on the solution.

This was an interesting project. I also learned some neat tricks while checking some other solutions, like building navigation links.

While doing it I came across some indications that people who hadn’t watched a particular episode were still able to rank it, which should not have been allowed. It’s possible that this aspect of the survey was flawed, unless there is something I am missing.

Link to DataQuest last screen

Basics.ipynb on GitHub

Basics.ipynb (873.0 KB)


Hi Ivan,

Thank you for sharing your project with the Community!

Your data analysis is very thorough, and the code is clean and well-commented. Also, it was a great approach to combine both header rows into one and introduce the necessary substitutions in the resulting column names. Another interesting idea was to collect all the oddities in the dataset and put them in the appendix; they look really curious and deserve further consideration and fixing!

Here are some suggestions from my side, which hopefully will be helpful to you:

  • The navigation links, while being in general a good idea to add in the project, don’t open (at least in GitHub). This should be fixed.
  • It’s better to use bold font sparingly, only to emphasize important details in the project.
  • It’s recommended to re-run the whole project when it’s already completed, to have all the code cells in order and starting from 1.
  • A good practice is to use “we” instead of “I” throughout the project (as I read in some guidelines and then saw in many projects).
  • Visualizations. It’s better to add a title to each plot (and make its font big enough for better readability), despine the plots, and for some of them change the labels to something shorter (or alternatively, consider horizontal bar plots for such cases).
  • Please add a conclusion section, better before the appendix.
  • The code cells [75] and [83]: here probably some automation can be applied, using a for-loop.
  • The code cells [76] and [79]: better to check here only the just converted columns.
  • The code cells [89]-[93]. In this part of the project, I would emphasize that the rating 1 is actually the best one and 6 the worst. Otherwise, mentioning a “negative correlation” between the rating and the number of viewers looks scary and confusing :blush:
  • The code cell [96]. In this way, you’re losing the categories of people between 30 and 60 y.o. (also for the further analysis), as well as people with the income between 25 and 150k. They represent quite big middle categories, which should be kept and, probably, even further subdivided for subsequent data analysis.
  • Just out of curiosity, what does “201” mean in the project title? Is it the year of the survey, accidentally truncated?
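
To illustrate the for-loop suggestion and the point about checking only the converted columns, here is a minimal sketch. I am assuming those cells convert “Yes”/“No” answers to booleans; the tiny DataFrame and the column names below are made up for illustration and are not taken from your notebook.

```python
import pandas as pd

# Hypothetical miniature of the survey data; the column names are
# placeholders, not the ones used in the notebook.
star_wars = pd.DataFrame({
    "seen_any": ["Yes", "No", "Yes"],
    "is_fan": ["Yes", None, "No"],
    "age": ["18-29", "30-44", "> 60"],
})

yes_no = {"Yes": True, "No": False}
converted_cols = ["seen_any", "is_fan"]

# One loop replaces a separate, near-identical statement per column.
for col in converted_cols:
    star_wars[col] = star_wars[col].map(yes_no)

# Inspect only the columns that were just converted,
# keeping missing answers visible in the counts.
for col in converted_cols:
    print(star_wars[col].value_counts(dropna=False))
```

Keeping the column names in one list means the conversion and the follow-up check always cover the same columns, and the rest of the DataFrame stays out of the output.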

That’s all about my observations. In general, your project looks very nice and solid, and the insights are interesting and sometimes even unexpected.

Good luck with your further projects and happy holidays! :christmas_tree:


Hi Elena,

thanks so much for the very thorough feedback. After reading it I went back and made changes.

Updated GitHub link

Basics_feedback_v01.ipynb (1.1 MB)

I am adding some comments below.

  • markdown links - I learned about them recently and they work wonderfully in the project viewer, but unfortunately they do not work on GitHub. I googled this topic and only came across open issues about it. Apparently this is because GitHub renders .ipynb files in a different way.
  • use of bold font - I think I agree. I went through my solution and made some changes.
  • making sure cells start with 1 - great tip
  • use of “We” vs “I” - I tend to default to “we” most of the time. However, could “We” imply that this is a team project, especially when shared on GitHub or elsewhere? A lot of code shared this way is created by teams. If my goal is to showcase my skill to a potential employer, would it be permissible to use “I” instead?
  • visualizations - I’ll admit my plots were rather lazy, so I made changes like you suggested
  • as I stated in the solution, I did omit some ways to slice the data, including some age brackets. My goal here specifically was to be selective and compare the most distant age brackets for brevity. I added notes to make this clearer.
  • 201 in the title refers to the DataQuest mission number. I changed it to make it clearer.


Hi Ivan,

Great, I’m happy that my suggestions were useful!

As for the markdown links in GitHub, yes, it seems it renders them differently. I had a similar problem with pictures that rendered perfectly in Jupyter but were not shown at all in GitHub and, correspondingly, in nbviewer. I discovered it only after sharing one of my guided projects, which was quite frustrating. After a lot of unsuccessful googling (exactly as you said, I found only open issues), I asked people here in the Community, and luckily they suggested some solutions that were not obvious at all. So, for your future projects, be careful with pictures in markdown cells too, and check the project before sharing. And I am now curious to check whether the links in my own projects work or whether I have the same problem.

As for this “we”, personally, I don’t like this convention. Why on earth should I refer to some mythical “we” if I did this project myself? :sweat_smile: I’d prefer the passive voice, but those guidelines clearly said to avoid that too. Well, judging by other people’s projects here, everybody really uses this strange “we” following the guidelines, and so do I.