In case it helps someone out, here are the minutes of my call.
A mutual friend recently got me access to a thought leader who's in such demand that he's literally in a different country every week. Here's how it went:
Q: You've done a lot of online courses. If you had to pick just five to recommend to someone, which would they be? The ones that are, in your opinion, the best return on investment.
i) Question Formulation Technique (QFT) from the Right Question Institute in Boston: how to ask the right question…
ii) Cambridge Advanced Leadership Programme
iii) MIT: Inquisitive Data Science (couldn't find this; do I have it right?)
iv) Coursera : Philosophy of Science
v) Google's Digital Marketing Certificate (how to use social media, etc.)
Q: You mentioned in the … interview that there are 500k datasets available on Americans. Could you name a few that are worth getting comfortable analyzing in Python?
i) The US Population Census is a gold mine of data
ii) Bureau of Labor Statistics datasets
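As a hedged sketch of what "getting comfortable analyzing in Python" can look like: the Census Bureau exposes much of its data through a simple HTTP API. The endpoint layout and the `P1_001N` variable name below are assumptions from my own experience, not something mentioned on the call, so verify them against the Census data API documentation.

```python
def census_url(year: int, dataset: str, variables: list[str], geography: str) -> str:
    """Build a query URL for the US Census Bureau data API."""
    return (
        f"https://api.census.gov/data/{year}/{dataset}"
        f"?get={','.join(variables)}&for={geography}"
    )

# Total population (P1_001N) for every state from the 2020 decennial PL file
url = census_url(2020, "dec/pl", ["NAME", "P1_001N"], "state:*")

# Once online, pandas can read the JSON response directly:
# import pandas as pd
# rows = pd.read_json(url)  # first row of the response is the header
```

Light queries like this work without an API key; heavier use requires registering for one.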
Q: Do you have any side projects that you can't devote enough time to, that I could maybe help out with to gain experience?
A: I am interested in the future of work: which kinds of jobs will grow, how trends are shifting in the pre/post-COVID landscape, and how the job market has changed.
Q: Do you know anyone else who is trying to ramp up to your level that you would recommend I connect with?
A: Abhishek Thakur is a 4x Kaggle Grandmaster, big in ML. (I obviously wasn't clear enough.)
Q: What are some mistakes you made, or see others making, on their learning journey that cost time and effort?
A: My biggest regret was not letting go of the stick-it-out, don't-give-up mentality, which kept me in toxic work environments much longer than I otherwise would have stayed.
Q: You mentioned you still spend time doing actual data analysis. How much time do you spend on data cleaning?
A: About 80%. (Comment: it's interesting that your clients don't take the time to clean the data ahead of time so they get more value from yours.)
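That 80% figure is easy to believe once you see what routine cleaning involves. A minimal pandas sketch on hypothetical data (the column names and missing-value markers are invented for illustration):

```python
import io
import pandas as pd

# Hypothetical messy extract: mixed missing-value markers, a duplicate row,
# and stray whitespace after commas
raw = io.StringIO(
    "name,age,salary\n"
    "Alice, 34,72000\n"
    "Bob,N/A,58000\n"
    "Alice, 34,72000\n"
    "Carol,29,unknown\n"
)

df = pd.read_csv(raw, skipinitialspace=True, na_values=["N/A", "unknown"])
df = df.drop_duplicates()             # remove the repeated Alice row
df["age"] = pd.to_numeric(df["age"])  # enforce a numeric dtype

print(df.shape)  # (3, 3)
```

Real datasets add inconsistent date formats, unit mismatches, and encoding issues on top of this, which is where the hours actually go.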
Q: What are some utilities you wish you had?
A: It would be good to have utilities that:
i) Check whether the dataset has been fabricated: whether the data follows a mathematical pattern when it should be random, or is random but with a distribution that does not match the typical nature of the underlying phenomenon.
ii) Perform fact checking, i.e. cross-validate key metrics derived from the data against other available sources. For example, if the data implies the average age of the population is 80, flag that by checking against established data sources. Basically, establish the trustworthiness of the data.
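For the fabrication check in (i), one common heuristic (my suggestion, not something discussed on the call) is Benford's law: in many naturally occurring, wide-ranging positive datasets, the leading digit 1 appears about 30% of the time, 2 about 17.6%, and so on. A large deviation can be a red flag:

```python
import math
from collections import Counter

def benford_deviation(values):
    """Mean absolute deviation between observed leading-digit frequencies
    and Benford's law. Larger values suggest the data may not have arisen
    naturally. Only meaningful for positive data spanning several orders
    of magnitude; not a proof of fabrication either way."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    counts = Counter(digits)
    n = len(digits)
    dev = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford probability of digit d
        observed = counts.get(d, 0) / n
        dev += abs(observed - expected)
    return dev / 9

# Log-uniform data follows Benford closely; uniform leading digits do not
loguniform = [10 ** (i / 100) for i in range(1, 300)]
uniform = list(range(1, 10)) * 10
```

A real fabrication-flagging utility would combine several such tests (digit tests, distribution-shape checks, duplicate-pattern detection) rather than rely on one.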
Follow-up:
Regarding the dataset fabrication-flagging and cross-checking, you might be interested in Kristin Sainani's (Stanford) webinar "How to Be a Statistical Data Detective": https://www.youtube.com/watch?v=JG_gCIGFaQI. She mentions statcheck.io and GRIM as two tools for analyzing published research; they operate on the publication, not the dataset.
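For context, the core of the GRIM test is simple arithmetic: for integer-valued data (e.g. Likert responses or counts), a reported mean must be expressible as an integer total divided by the sample size. A minimal sketch of that check (my own reconstruction, not the actual tool's code):

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: can integer data with sample size n produce a mean that,
    rounded to `decimals` places, equals the reported mean?"""
    candidate = round(reported_mean * n)
    # Check the nearest achievable integer totals
    for total in (candidate - 1, candidate, candidate + 1):
        if round(total / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# With n = 28, achievable means near 5.2 are 145/28 ≈ 5.18 and 146/28 ≈ 5.21,
# so a reported mean of 5.19 is impossible for integer data
```

GRIM only needs the reported mean and n from a paper, which is why it works on publications without access to the underlying dataset.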