Hi Dataquest friends,
This is my first guided project in the Data Science track. I would appreciate feedback and comments on the following, so that I can keep improving and move towards a professional level in this track:
- Use of Markdown
- Overall flow / storytelling
- Data cleaning approach
- Analysis and critical reasoning behind my recommendation
My Jupyter notebook file below:
What can I say about your project?! This is one of the most detailed and original Profitable App projects I have personally seen in the DQ community. I feel that I could only review it if I had done a better job than you. Since that is not the case, I will stick to the things I learned and enjoyed from this project.
Most people I have seen (myself included) remove the whole row when it has missing data. This is the first time I have seen someone work out what is missing, find the missing value, and insert it in the right place. That was nice. I just wish you had included the step showing how you found the row with the missing value.
The whole project was easy to read. It also gave a lot of explanations and insights. Overall, it is great work.
Your use of Markdown is awesome. I loved the headings, subheadings and structure, and the smart use of preformatted words, so the flow is good too. I have already mentioned the data cleaning approach, which was great as well. By this point you had built up quite a lot of expectation, so I was very curious to read your conclusion. But when I got there, it felt rushed and slightly too close to the DQ instructions. I'm pretty sure you would have come up with a more groundbreaking conclusion with a bit more time. Anyway, it is a really amazing project. I look forward to going through more of your future projects.
Thanks for the feedback and motivational comments.
Re: missing category data and fixing it: I did two things, and it would probably be better to add them to the commentary in the notebook.
- I read the dataset discussion on Kaggle and the DQ commentary about that row.
- Then I used an Excel pivot table to see the distinct categories, and this one stuck out like a sore thumb.
I could have used a frequency table / dictionary as well, but I did not think of it and already had Excel open to check a few things.
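For anyone who wants the frequency-table route instead of the Excel pivot, here is a minimal sketch. The `freq_table` helper name, the column index and the sample rows are all assumptions made up for illustration, not the notebook's actual code or data. The idea is simply to count each distinct category so that an odd value shows up on its own line.

```python
def freq_table(rows, index):
    """Count how often each value appears in column `index`."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    return counts


# Made-up sample rows in the form [app name, category]; not real dataset values.
sample_rows = [
    ["App A", "FAMILY"],
    ["App B", "GAME"],
    ["App C", "FAMILY"],
    ["App D", "1.9"],  # a value that does not look like a category
]

category_counts = freq_table(sample_rows, 1)

# Listing the distinct values makes an out-of-place category easy to spot.
for category, count in sorted(category_counts.items()):
    print(category, count)
```

With only a few dozen distinct categories, scanning the printed list by eye does the same job as the pivot table.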
Before this, because the dataset is a few years old, I had started searching for a few apps in the current AppStore and PlayStore to compare. So I did the same for this one in the PlayStore.
On another note: I see that PlayStore has since cleaned up its genres/categories, as they are a lot better than what is in the dataset from a few years back.
Re: Conclusion section: You are right that I rushed it a bit. Pure curiosity to find more and more information, versus not progressing in the course as much as I had hoped, led me to draw the conclusion purely from the AppStore analysis and superimpose it on the PlayStore analysis, so the finish felt a bit rushed.
Thanks again for the overall feedback. There are more good things to note and continue, and a few to remind myself to do better in the next project.
Glad to know that you found the review to be of use.
I completely understand this feeling. It happened to me even in the latest guided project.
Thanks for explaining the missing details here in the reply.
Very methodical and impressive shas.
First of all, great job!! I have enjoyed following your project and taking it as an inspiration to go one step further.
Digging into the FAMILY category, I found your solution a little complicated, and I have come up with a much simpler one that I would like your opinion on. I took advantage of the functions defined during the project to write less code.
- Create a dataset with the apps of the FAMILY category.
- Categorize between educational and fun as you have done.
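    family_dataset = []
    for row in android_eng_free:
        if row[1] == 'FAMILY':         # assumption: Category is column 1
            family_dataset.append(row)

    educational = 0
    for row in family_dataset:
        if 'Education' in row[9]:      # assumption: Genres is column 9
            educational += 1

    # Share of educational apps; everything else is counted as fun.
    educational_sum = round(educational / len(family_dataset) * 100, 2)
    fun_sum = round(100 - educational_sum, 2)
    print('education apps: ', educational_sum)
    print('fun apps: ', fun_sum)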
I would also like your opinion on this function to find missing values, as @jithins123 asked:
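    for row in dataset[1:]:                      # skip the header row
        for a in row:
            if a == '' or a.lower() == 'nan':    # empty cell, or NaN in any casing
                print(row)                       # assumption: report the row
                break                            # one report per row is enough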
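One caveat with checking cell by cell: when a value is left out of a CSV row entirely, the remaining columns shift left, so no cell is empty but the row ends up shorter than the header. A rough sketch that checks both cases is below. The function name `find_suspect_rows`, the marker list and the sample data are assumptions for illustration only.

```python
def find_suspect_rows(dataset, missing_markers=("", "nan")):
    """Return (row_number, reason) pairs for rows that look incomplete.

    Assumes dataset[0] is the header row and every cell is a string.
    """
    header_length = len(dataset[0])
    suspects = []
    for row_number, row in enumerate(dataset[1:], start=1):
        if len(row) != header_length:
            # A dropped value shifts the columns, so the row length changes.
            suspects.append((row_number, "wrong number of columns"))
        elif any(cell.strip().lower() in missing_markers for cell in row):
            suspects.append((row_number, "empty or NaN cell"))
    return suspects


# Made-up sample data; the first row is the header.
sample = [
    ["App", "Category", "Rating"],
    ["App A", "FAMILY", "4.1"],
    ["App B", "1.9"],           # a value was dropped, so the row is short
    ["App C", "", "3.5"],       # empty cell
    ["App D", "GAME", "NaN"],   # missing rating
]

suspects = find_suspect_rows(sample)
for row_number, reason in suspects:
    print(row_number, reason)
```

Returning the row numbers rather than printing the rows makes it easier to inspect or fix the rows afterwards.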
Thanks for sharing your project!!!
Thanks for expanding on the project.
I like the idea of taking all the educational tags from the list of genres and treating the rest as fun. It is definitely a much clearer way of separating the FAMILY category. I will make a note of this and bring it into the FAMILY category analysis section of my Jupyter notebook, as I am hoping to revisit my earlier projects when I hit certain milestones in my learning journey.
Re: The missing value: I think I got a bit complacent, knowing that I had found it using Excel pivots.
But a general question on CSV datasets: would data scientists do that first-pass analysis in Python, or do it quickly and easily in Excel? (I have no experience from a data scientist's perspective, as I come from a database/SQL developer background.)
It’s great to see the DQ community expanding on and sharing constructive feedback about each other’s projects.
Your analysis absolutely blew my mind. Thank you for sharing your work; it’s inspiring and motivating.