Sample Representativeness


I am doing the Finding the Best Markets to Advertise In GP and am a bit stuck on what DQ means by “Sample representativeness”. I think it means that our data set correctly describes the target population (it’s not biased because some group is overrepresented or underrepresented). However, in this case, we are supposed to conclude that the sample is representative enough because the proportion of Mobile and Web developers is over 80%. Why?

Hi Artur. I’m facing the same problem in this first phase of the project.
Indeed, the column “JobRoleInterest” has more than 60% missing values, i.e. people who didn’t declare any job role interest.
How can we conclude that it is representative just because 86% of those who declared an interest are in the web/mobile domains?
With 60% of the values missing, anything could potentially be going on, and nobody is considering the issue, even in the solution notebook…

I tried to find further representativeness-related info in the employment-related columns, but nothing emerged…

It’s not the first time that I don’t agree with some of the approaches taken, so I’d like to discuss this point.



I think I’ll move on with the analysis considering only those 6992 of 18175 respondents who declared a job role interest.
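A quick check of those counts in plain Python. Only the two counts come from this thread; the variable names are just illustrative.

```python
# Counts quoted in this thread
total_respondents = 18175
answered = 6992  # respondents who declared a job role interest

missing = total_respondents - answered
share_answered = answered / total_respondents
share_missing = missing / total_respondents

print(f"answered: {share_answered:.1%}")
print(f"missing:  {share_missing:.1%}")
```

So roughly 38% of respondents answered and roughly 62% did not, which matches the “more than 60% missing” figure mentioned above.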

I also think part of the problem is the fuzziness of the description of our fictitious company: the instructions say that we produce courses on web, mobile, data science, games, etc.
That “etc.” makes it impossible to really define which people we are interested in.
I suppose the instructions were written with the structure of the dataset in mind, but I think sample representativeness is such a serious matter that more clarity would be important :).

Maybe I’m overthinking, but since this is a project about fundamental statistics concepts, I’d like to make a sound analysis.

Hey Artur, just read the 4th screen of the instructions. There you can find info that would probably have made more sense on the previous screen (the one about representativeness):

“To make sure you’re working with a representative sample, drop all the rows where participants didn’t answer what role they are interested in.
Where a participant didn’t respond, we can’t know for sure what their interests are, so it’s better if we leave out this category of participants.”
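In pandas, that instruction might look roughly like the sketch below. It runs on a made-up toy frame: only the column name `JobRoleInterest` comes from this thread, while the frame name `survey` and its values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the survey data; the values are made up for illustration.
survey = pd.DataFrame({
    "JobRoleInterest": [
        "Full-Stack Web Developer",
        None,
        "Mobile Developer",
        None,
        "Data Scientist",
    ],
})

# Keep only respondents who declared a job role interest.
answered = survey.dropna(subset=["JobRoleInterest"]).copy()

print(len(survey), "rows before,", len(answered), "rows after")
```

Note that this only removes the respondents whose interests are unknown; it does not by itself tell us whether the remaining rows resemble the population we care about.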

Hi @fab.pellegr! Thanks for your answers :) I had already dropped all rows with no answer, but I’m still not sure what representativeness means in this case. I suppose it’s also somehow connected to the number of respondents interested in each of the different roles. For example, if we have 10 web developers and 6000 mobile developers, I wouldn’t say that this sample is representative, because it does not represent our target population.
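To put numbers on that hypothetical, here is a small sketch in plain Python. The 10 and 6000 counts are the made-up ones from the example above, not figures from the survey.

```python
from collections import Counter

# Hypothetical sample from the example: 10 web and 6000 mobile developers.
sample = ["web"] * 10 + ["mobile"] * 6000

counts = Counter(sample)
total = sum(counts.values())
shares = {role: n / total for role, n in counts.items()}

for role, share in shares.items():
    print(f"{role}: {share:.2%}")
```

Here the combined web/mobile share is 100%, yet web developers are well under 1% of the sample, so a high combined share alone says nothing about how balanced the groups are.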


I think representativeness is treated in a somewhat “relaxed” way in this project, something like “let’s see if it makes sense to consider all of the respondents to the survey, or just a part of them, knowing that our courses are mainly on web and mobile development” ;).
