Hey DQ Community,
This project is about discovering benchmark metrics for introducing an educational app to the market, from both the Google Play Store and Apple App Store datasets. (DQ staff might be interested in this, too, haha)
I am open to any critical feedback on this project. Please let me know how it reads, whether you get bored and where you got bored/lost interest. (I’m very interested in knowing that)
I also have a very basic question from using Seaborn: how do you get rid of unwanted text output before a graph? (They just pop up in the output)
For example, a lot of my graphs have “caveats in the documentation” alerts that I want to delete before publishing somehow. I got the graphs with the data I wanted.
I’m not sure if it’s something like a setting I can turn off. (??)
Anyone with more experience in Seaborn would probably be able to point me in the right direction.
I’m sure others will face this issue, too.
Thanks bunches! ~Jacqueline
DataVisualizationProject_EducationalApps.ipynb (3.9 MB)
P.S. I came back to this project and just realized that the price would not correspond to the Release Year, but to the Scraped Time’s year…oh mami…
Hello @jacqisbackyessheis! Thanks for sharing your project with the Community! I see that you’ve done an extensive analysis of these two data sets and written great conclusions. Every step is supported by your reasoning :)
The warning you are getting comes from pandas. You should be aware of this warning and not ignore it, because it may cause trouble with your data set and you risk working with wrong data. You can read more about this warning and possible solutions here.
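Here is a minimal sketch of one common fix, assuming the warning is pandas’ `SettingWithCopyWarning` (its message includes the “caveats in the documentation” text). The DataFrame and column names are made up for illustration and are not taken from the notebook:

```python
import pandas as pd

# Hypothetical data; the real notebook's column names may differ
apps = pd.DataFrame({
    "category": ["Education", "Games", "Education"],
    "price": ["$0.99", "$0", "$4.99"],
})

# Filtering returns a subset of `apps`; .copy() makes `edu` an independent DataFrame
edu = apps[apps["category"] == "Education"].copy()

# Assigning to the explicit copy is unambiguous and leaves `apps` untouched
edu["price_usd"] = edu["price"].str.replace("$", "", regex=False).astype(float)

print(edu)
```

Fixing the cause this way is usually safer than silencing the message, because the output then reflects data you know was modified as intended.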
And a few suggestions:
- Make sure that you are not using the bold blue text in the introduction. The project’s style should be consistent!
- Correctly format the links to the data sets. I guess you’ve added a space between the square and round brackets, which results in incorrect markdown formatting
- It’s better to import all packages in the first code cell so the reader can have an idea of what’s used to run the project
- In the first section where you describe the columns, it’s better to create a bullet/numbered list to tell us what these columns are. It’ll improve readability :) What do you think?
- You can also format columns’ names as inline code
- When you delete the columns you can say that they are not relevant to answer the questions.
- In the plot “Google vs. Apple Educational Apps as % Total” it’s not clear which pie chart refers to the Google dataset and which to the Apple dataset. The “Google” text is outside both of them
- I guess it’s better to describe each plot with text below it rather than explain what’s on it beforehand. A good plot should talk about the data autonomously. The text is still necessary, but only to confirm what the reader has already seen on the plot :)
- It’s better to increase the size of labels and titles of all your plots to increase readability
- You should be able to use the datetime module to easily extract different parts of a date
- It’s a good idea to make the key points of your project bold. For example, you can make this phrase bold: higher prices are offered in February-March and/or September-October
- I think it’s better to get rid of the confidence intervals in the line plots because they make the plots very difficult to read
“The Covid years on the Google Play Store reveal a sharp peak in prices during March 2021, otherwise prices were fairly stable throughout 2020.” - that’s a very high increase indeed. Have you investigated what happened that month?
- You have some typos
- Color only the key bars (with the same color) in the “Google Play Store's Educational App Ratings (Paid & Free Apps)” plot and make all the other bars gray. The colored bars will catch the reader’s attention, and this increases the data-ink ratio. Do the same for the Apple data set
- If you just want to demonstrate the rating ranges at different prices, you should remove the grouping by year as it only decreases readability of the plots
“Apps priced between 3.99 and 4.99 appear to have higher concentrated ratings between 3 and 5, however it remains undetermined whether this is because those prices occur less frequently.” - you repeat this phrase twice
- In the code cells, use print statements instead of comments
- In the “Google Educational Apps: Price vs. User Rating Per Year” plot, the main title overlaps with the titles of the top two plots
- Anyway, I don’t know what the usefulness of the plots “Apple Educational Apps: Price vs. User Rating Per Year” and “Google Educational Apps: Price vs. User Rating Per Year” is
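For the suggestion above about extracting parts of a date, here is a small sketch. It uses the pandas `.dt` accessor rather than the standard-library `datetime` module directly, since the data lives in a DataFrame. The `scraped_time` column name and the values are assumptions for illustration, not taken from the notebook:

```python
import pandas as pd

# Hypothetical column name and values, for illustration only
apps = pd.DataFrame({
    "scraped_time": ["2020-09-14 10:32:00", "2021-03-02 18:05:00"],
    "price": [1.99, 4.99],
})

# Convert the strings to real datetimes once
apps["scraped_time"] = pd.to_datetime(apps["scraped_time"])

# The .dt accessor exposes the individual parts of each date
apps["scraped_year"] = apps["scraped_time"].dt.year
apps["scraped_month"] = apps["scraped_time"].dt.month
apps["scraped_month_name"] = apps["scraped_time"].dt.month_name()

print(apps[["scraped_year", "scraped_month", "scraped_month_name", "price"]])
```

With the year and month in their own columns, grouping prices by month or by scraped year becomes a plain `groupby` on those columns.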
That’s it for me! Happy coding and good luck with your next projects @jacqisbackyessheis
Dear Artur Sannikov,
Thank you so very much for your thoughtful, thorough and comprehensive feedback. I appreciate the time you put into this, and I will be going through each bullet point line by line to make the changes.
All of your comments are very useful and make sense to me. And I am both inspired and humbled by the attention you gave my project. Thank you so much!