I feel like I’m getting the hang of these guided projects now. I used to spend a lot more time wrapping up the projects. Now I anticipate how I’ll use various parts of the analysis and decide how to present them as I go. This reduces the amount of rework and editing I have to do at the end.
After investigating the distribution of spending, I decided to create some custom bins for analyzing it, rather than repeatedly scrubbing out outliers. Instead of worrying as much about the mean and median, I looked at how many people seemed like they’d be willing to spend $59/month on our (hypothetical) offering.
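For anyone curious what the custom bins might look like in pandas, here is a minimal sketch. The spending figures, bin edges and labels are all made up for illustration and are not the ones from the notebook; only the $59 price point comes from the post.

```python
import pandas as pd

# Hypothetical monthly spending figures in USD, made up for illustration.
spending = pd.Series([0, 0, 5, 20, 35, 59, 80, 150, 400, 5000], name="monthly_spend")

# Hypothetical bin edges; the $59 edge matches the subscription price in the post.
edges = [0, 1, 30, 59, float("inf")]
labels = ["$0", "$1-29", "$30-58", "$59+"]

# right=False makes each bin include its left edge, so exactly $59 lands in "$59+".
binned = pd.cut(spending, bins=edges, labels=labels, right=False)

# sort=False keeps the bins in their natural order instead of sorting by count.
counts = binned.value_counts(sort=False)
print(counts)
```

The open-ended top bin means a single $5000 respondent counts once in "$59+" instead of dragging an average upward, which is why no outlier scrubbing is needed with this approach.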
I also did a preliminary analysis of various channels for advertising and sponsorships to show the (hypothetical) marketing team how we might be able to help them figure out the details of advertising campaigns.
I’d appreciate feedback on my presentation of the analysis. Was it easy to follow? What parts were challenging?
Where do you guys get the big brain to analyze a dataset like this?! Amazing work, start to finish.
You know that annoying journalist who calls you and praises you on their show first thing, but then the first question they throw at you is quite controversial?! Yup, that’s me.
So here comes my question: I didn’t quite understand the numbers you got for the Reach Analysis.
What exactly are we comparing there? I understand that they represent percentages, but how do I actually read them?
For instance, the ResourceAny column has 99, 99, 99 and 100%. What exactly accounts for the 100 for the USA?
Now that I have raised my question, recall what happens after you give a befitting reply to the journalist and they are closing the show: they pretend they are still best friends with you, but make some closing remarks that are really directed “for your improvement and benefit”. Here comes that…
Thanks for sharing this project with the community, and for teaching me a new bit of slang, “boffins”. Also, try adding these 2 lines to the code block where you import your libraries, re-run the entire notebook, and look at the plots again.
import seaborn as sns
sns.set(style="whitegrid", font_scale=1.3)  # font_scale: try values between 1 and 1.5; the whitegrid style is optional
I see now that I should have added more context for the reach analysis.
The percentages represent the portion of survey respondents in each country who indicated that they used one or more resources in each category.
As for the small differences in the % of people reporting the use of web resources (like Free Code Camp or Stack Overflow), I can’t exactly account for them. I suspect that it’s user error and/or laziness. These are people, after all, who answered an online survey on learning programming. It seems unlikely that they’d never use a website in their learning pursuits. In any case, I’d regard the small difference as measurement noise.
I will try the chart formatting tip. I’ve been reading a bit about creating my own chart styles, but I’m not there, yet.
I’m sorry it’s not clear. I’m updating the notebook. Perhaps it would help if I explained the underlying data better (as I should have)?
The survey asked people whether they’d engaged with a number of specific coding events, podcasts, websites and YouTube channels. For each of those categories, I created a summary column. The column was marked True if they’d answered in the affirmative for one or more of the individual questions in that category; otherwise it was marked False. I then aggregated each of these summary columns by country to get the percentage of respondents marked True.
So, the 100% for the USA in the ResourceAny column (which I’ve renamed in my updated draft) means that 100.0% of respondents in the USA indicated that they’d used one or more websites (like StackOverflow) in their learning pursuits.
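In case it helps, here is a rough sketch of that calculation on a toy table. The column names, the rows and the `categories` mapping are all hypothetical, and it assumes the individual answers are already True/False (the real survey columns may need cleaning first).

```python
import pandas as pd

# Hypothetical survey rows; the column names are made up for illustration.
df = pd.DataFrame({
    "country": ["USA", "USA", "India", "India", "India", "Canada"],
    "resource_site_a": [True, False, True, False, False, True],
    "resource_site_b": [False, True, False, False, True, False],
    "podcast_a": [False, False, True, False, False, False],
    "podcast_b": [True, False, False, False, False, False],
})

# Hypothetical mapping from each summary column to its individual questions.
categories = {
    "any_website": ["resource_site_a", "resource_site_b"],
    "any_podcast": ["podcast_a", "podcast_b"],
}

# A summary column is True when at least one question in the category is True.
for summary_col, cols in categories.items():
    df[summary_col] = df[cols].any(axis=1)

# The mean of a boolean column is the fraction of True values, so scaling
# by 100 gives the percentage of respondents per country.
reach = df.groupby("country")[list(categories)].mean().mul(100).round(1)
print(reach)
```

Each cell of `reach` is read the same way as the table in the notebook: the percentage of that country's respondents who reported using at least one resource in the category.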
Does that help?
The guidance was to calculate the mean amount respondents spent per month on learning to program. This was very sensitive to outliers and to the fact that significant numbers of people answered that they spent nothing. After one round of scrubbing out outliers and still being left with rather vague numbers, I decided to take an alternative approach. I categorized people’s monthly spending and then looked at the number who were spending enough to fit our offering into their budget. I used this to conclude that India was a better market to advertise in than Canada or the UK, despite large differences in per-capita GDP.
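To illustrate why the mean was hard to work with, here is a small sketch that compares the mean, the median and the share of respondents at or above the price point. The two countries and all the spending values are invented; only the $59 price comes from the project.

```python
import pandas as pd

# Invented data: two placeholder countries, four respondents each.
df = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "monthly_spend": [0, 0, 60, 5000, 0, 59, 70, 100],
})

PRICE = 59  # monthly price of the hypothetical offering

summary = df.groupby("country")["monthly_spend"].agg(
    mean="mean",
    median="median",
    # Percentage of respondents already spending at least PRICE per month.
    pct_at_or_above_price=lambda s: (s >= PRICE).mean() * 100,
)
print(summary)
```

In this toy data, country A has by far the higher mean because of one $5000 respondent, yet country B has the larger share of people spending at least $59, which is the kind of reversal the binned approach is meant to surface.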
The Reach analysis looked at a different question, one that wasn’t asked in the assignment but that I judged to be useful and answerable with the data. I realized that I could help guide where marketing chose to advertise in order to reach the largest group of potential customers in each market. I identified advertising on programming-related websites as the best single channel, ahead of advertising on or sponsoring YouTube channels, podcasts, or coding-related events. The dataset actually includes finer detail about specific websites (and events, podcasts and YouTube channels), but this analysis offers a starting point for further discussion with our hypothetical marketing team.