Ten Minutes with a World-Class Data Scientist

In case it helps someone out, here are the minutes of my call…

A mutual friend recently got me access to a thought leader who’s in such high demand that he’s literally in a different country every week. Here’s how it went:

  1. You’ve done a lot of online courses. If you had to pick just five to recommend to someone, which would they be? The ones that, in your opinion, give the best return on investment.

    i) Question Formulation Technique (QFT) from the Right Question Institute in Boston: how to ask the right questions…

    ii) Cambridge Advanced Leadership Programme

    iii) MIT: Inquisitive Data Science (couldn’t find this; do I have it right?)

    iv) Coursera: Philosophy of Science

    v) Google’s Digital Marketing Certificate (how to use social media, etc.)

  2. You mentioned in the … interview that there are 500k datasets available on Americans. Could you name a few that are worth getting comfortable analyzing in Python?

    i) The US Population Census is a gold mine of data

    ii) Bureau of Labor Statistics datasets

  3. Do you have any side projects that you are not able to devote enough time to that I could maybe help out with so I gain experience?

    A: I am interested in the future of work: which kinds of jobs will grow, how trends are shifting in the pre- and post-COVID landscape, and how the job market has changed.

  4. Do you know anyone else who is trying to ramp up to get to your level that you would recommend I connect with?

    A: Abhishek Thakur is a 4x Kaggle Grandmaster, big in ML. (I obviously wasn’t clear enough :slight_smile:)

  5. What are some mistakes you made or you see others making on their learning journey that cost time and effort?

    A: My biggest regret was not letting go of the stick-it-out, never-give-up mentality, which kept me in toxic work environments much longer than I otherwise would have stayed.

  6. You mentioned you do still spend time doing actual data analysis. How much time do you spend on data cleaning?

    A: 80%. (My comment: it’s interesting that your clients don’t clean the data ahead of time so they get more value out of your time.)

  7. What are some utilities you wish you had?

    A: It would be good to have utilities that:

    i) check whether a dataset has been fabricated: for example, the data follows a mathematical pattern and is not truly random, or it is random but its distribution does not match the typical behavior of the underlying phenomenon.

    ii) perform fact checking, i.e. cross-validate key metrics derived from the data against other available sources. For example, if the population data says the average age is 80, flag that by checking it against established data sources. Basically, establish the trustworthiness of the data.
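For question 2, here is a minimal sketch of pulling Census data into Python using only the standard library. It assumes the public Census Data API endpoint for the 2020 decennial redistricting file (`https://api.census.gov/data/2020/dec/pl`) and the variable `P1_001N` (total population); both are my assumptions to verify against the API documentation, and heavier use may require a free API key. The function names are hypothetical. The API answers with a JSON array whose first row is the header, so the helper zips every other row with it.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumed endpoint: 2020 decennial census, redistricting (PL 94-171) file.
CENSUS_BASE = "https://api.census.gov/data/2020/dec/pl"


def census_url(variables: List[str], geography: str = "state:*") -> str:
    """Build a query URL such as ...?get=NAME,P1_001N&for=state:*"""
    query = urllib.parse.urlencode(
        {"get": ",".join(variables), "for": geography},
        safe=",:*",  # keep the API's separators readable instead of percent-encoding them
    )
    return f"{CENSUS_BASE}?{query}"


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn the header-first JSON array into one dict per geography."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def fetch_census(variables: List[str], geography: str = "state:*") -> List[Dict[str, str]]:
    """Download and parse one query (needs network access; values arrive as strings)."""
    with urllib.request.urlopen(census_url(variables, geography), timeout=30) as resp:
        return rows_to_records(json.load(resp))
```

Under those assumptions, `fetch_census(["NAME", "P1_001N"])` would return one dict per state; convert the population strings with `int()` before doing any arithmetic. Keeping parsing separate from fetching makes the parsing easy to test offline.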
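One common screen for fabrication (item i) is Benford’s law: in many naturally occurring datasets that span several orders of magnitude (populations, transaction amounts), the leading digit d appears with probability log10(1 + 1/d), so roughly 30% of values start with 1. This technique wasn’t named on the call; I’m swapping it in as an illustration. The sketch compares observed leading-digit counts with the Benford expectation using a chi-square statistic. The function names and the 0.05 significance cut-off are my own choices. A high score is a prompt to look closer, not proof of fabrication, and narrow-range data such as ages or heights is not expected to follow Benford at all.

```python
import math
from collections import Counter
from typing import Iterable, Optional

# Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# Chi-square critical value for 8 degrees of freedom at alpha = 0.05
CHI2_CRITICAL_8DF_05 = 15.507


def leading_digit(value: float) -> Optional[int]:
    """Return the first significant digit, or None for zero/inf/nan."""
    x = abs(float(value))
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading significant digit first, e.g. '4.56...e-02'
    return int(f"{x:.14e}"[0])


def benford_chi_square(values: Iterable[float]) -> float:
    """Chi-square distance between observed leading digits and Benford's law."""
    digits = [d for d in (leading_digit(v) for v in values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero finite values to test")
    counts = Counter(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def looks_non_benford(values: Iterable[float]) -> bool:
    """True if the leading digits deviate from Benford at the 5% level."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_05
```

For instance, powers of two are a classic Benford-conforming sequence and score low, while the integers 100 to 999 (every leading digit equally common) are flagged.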
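For item ii), the core of a cross-check can be as simple as comparing each derived metric with a trusted reference value and flagging relative deviations beyond a tolerance. A minimal sketch; the function name, the 10% default tolerance, and the reference figure in the usage example are hypothetical placeholders, not official statistics.

```python
from typing import Dict, List


def flag_metrics(
    derived: Dict[str, float],
    reference: Dict[str, float],
    rel_tol: float = 0.10,
) -> List[str]:
    """Return names of metrics that deviate from the reference by more than rel_tol."""
    flagged = []
    for name, ref_value in reference.items():
        if name not in derived:
            continue  # nothing derived for this metric, so nothing to compare
        value = derived[name]
        if ref_value == 0:
            deviates = value != 0
        else:
            # Relative deviation, measured against the trusted reference
            deviates = abs(value - ref_value) / abs(ref_value) > rel_tol
        if deviates:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical placeholder reference, not an official figure
    reference = {"mean_age": 39.0}
    print(flag_metrics({"mean_age": 80.0}, reference))  # the suspicious average age is flagged
```

The hard part in practice is sourcing and versioning the reference values; the comparison itself stays deliberately simple so it can run on every refresh of the data.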

Follow-up:

Regarding flagging fabricated datasets and cross-checking, you might be interested in Kristin Sainani’s (Stanford) webinar “How to Be a Statistical Data Detective”: https://www.youtube.com/watch?v=JG_gCIGFaQI. She mentions statcheck.io and GRIM as two tools for analyzing published research. Note that they operate on the publication, not the dataset.
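To give a feel for how GRIM (Granularity-Related Inconsistency of Means) works: if n people each answer with a whole number, their total is an integer, so only certain means are possible at a given number of reported decimals. A minimal sketch, assuming a single integer-valued item and round-half-up reporting; the function name is hypothetical. The mean is passed as a string so the number of reported decimals is preserved.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Could n integer-valued responses produce this mean at the reported precision?"""
    if n <= 0:
        raise ValueError("n must be positive")
    mean = Decimal(reported_mean)
    # Rounding step implied by the reported decimals, e.g. Decimal('0.01') for '5.19'
    quantum = Decimal(10) ** mean.as_tuple().exponent
    # The true integer total must be the nearest integer to mean * n, i.e. floor or ceil
    low = math.floor(mean * n)
    for total in (low, low + 1):
        candidate = (Decimal(total) / Decimal(n)).quantize(quantum, rounding=ROUND_HALF_UP)
        if candidate == mean:
            return True
    return False
```

For example, with n = 28 a mean of 5.18 is possible (145 / 28 ≈ 5.179), but 5.19 is not: 145 / 28 and 146 / 28 round to 5.18 and 5.21. The check only has teeth when n is small relative to the precision (below 100 for two decimals). For multi-item scales, my understanding is that n is multiplied by the number of items.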


@ananth.ch Thanks for a great share. Lots of valuable resources in there. :star_struck:

My pleasure. BTW, I would be very interested in resources that let you measure your progress as a data scientist. Are there online testing tools, etc.? I know there are Kaggle contests, but that’s the extreme end. I’m thinking more along the lines of the Python code snippets you should be able to write in your sleep for essential tasks.

Wow! What a cool chat! :star_struck:

Thank you so much for taking notes on your call and sharing them with the community!! @ananth.ch :heart:


Thanks for sharing, very helpful information :)

I don’t know of an online testing tool that is widely recognized for evaluating a data scientist’s general skill level. From what I’ve read, the job requirements for the title “data scientist” can differ a lot both from company to company and over time.

If your goal is to land a job, a data scientist friend of mine suggested starting with A Collection of Data Science Take-Home Challenges. That book and my portfolio are what I planned to work on after finishing the DQ course. I also came across this article a few months ago that I find very useful for the job search later on. The author lays out every step of the process of getting DS offers in two months.

I’d like to be more in touch with the data science community, but learning takes up so much time and energy that I just don’t feel like going on social media and socializing anymore. :weary:


Thanks for sharing your output with the community:) I really appreciate that.


Thanks! Emma is clearly one in a million. She’ll rise to the top.

I got into it because I thought DS was a skill worth having. As a circuit designer, I was spending more time than I liked on analysis: tabulating simulation results and looking for blemishes. So I eventually invested in scripts that generated Excel output with sheets showing the worst cases and links back to the data, and even hired freelancers on Upwork to build utilities that saved me a lot of time. A job as a data scientist would be nice if it involves working on interesting problems, rather than just being a job :slight_smile:


Another good article from Towards Data Science is this one:
https://towardsdatascience.com/what-no-one-will-tell-you-about-data-science-job-applications-bff2d4b5e983

You could also try reaching out to people at https://www.datahelpers.org/ for advice or mentoring.