I did half a year of insurance claim classification, which helped with the classification-metrics questions, but all of it used open-source tools; I never touched GCP except for copy-pasting commands from fastai while trying to set up a deep learning VM instance. I also did one week of image classification using
`train_datagen.flow_from_directory` and the tf.data API with TFRecords. Experience from the latter was helpful in the exam.
On Day 1, I had no idea what 53 of the 81 bullet points in the exam guide meant, or how to achieve them. After studying https://developers.google.com/machine-learning/crash-course, I also realized that some of the other 28 points, which I thought I knew, were not what I had assumed.
I don’t think having a ton of ML knowledge is necessary, for a few reasons:
- The exam has very few implementation/debugging questions; it is mostly GCP tool selection and solution architecting (open source is sometimes given as an option, but the GCP tool usually wins for serverless scalability). I would definitely not have attempted it after 20 days of study if implementation were required.
- Even if someone has done something in the past (e.g. handling imbalanced data), they may not have done it the Google-suggested way. Yes, the exam is not entirely objective, and there are indeed Google-recommended practices to memorize.
- A good portion of the exam is on GCP-specific tools, commands, and workflows. If you don’t study GCP, you won’t know what’s possible, or how development, test, deploy, and monitoring workflows are done with GCP tools. Knowing how to do something outside GCP doesn’t mean it’s the correct answer; often on-prem tools, or doing it locally, is the wrong choice.
- It is not in Google’s favour to make the exam incredibly hard. People with enough experience wouldn’t need the certificate to prove anything, and making it too hard discourages people from studying for the exam. That means fewer GCP users, less exam-fee revenue, and fewer companies employing these test takers and switching to GCP at a company level.
There are some arguments supporting the benefits of previous experience:
- Dataflow is based on Apache Beam, Cloud Composer on Airflow, and AI Platform Pipelines on Kubeflow, so if you have already used the open-source version, you can go through the code in tutorials faster, and you will know when a tool is overkill and obviously the wrong choice compared to another option in the multiple choice. But remember, implementation is rarely tested. What’s more important is knowing which GCP-specific sources and sinks are available for Dataflow, and how a GCP pipeline allows certain workflows/shortcuts that may not be possible with open-source tools.
- People who have read or experienced more can better distinguish which business metric applies to which situation, or what ML problem can be framed from given features and vague requirements. However, only very basic ML and technical jargon are required before common sense can take over.
- People who have read or experienced more will know more ways to do something, and more ways something can go wrong and its negative impact, and can use that knowledge to identify and infer what went wrong in a given scenario and what steps to take to fix it (e.g. data leakage, a bad train-test split, training-serving skew, underfitting). However, knowing the solutions is not enough; you must also know what to try first, and here again come the Google-recommended practices to study.
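To make the leakage/skew failure modes above concrete, here is a minimal sketch (my own toy example, not from any exam material) showing how fitting a preprocessing statistic on the full dataset leaks test-set information into training, while fitting it on the training split only keeps serving consistent with training:

```python
import random

random.seed(0)

# Toy 1-D dataset: the test split is drawn from a shifted distribution,
# mimicking the drift you would see between training data and live traffic.
train = [random.gauss(0.0, 1.0) for _ in range(200)]
test = [random.gauss(0.5, 1.0) for _ in range(200)]

def mean(xs):
    return sum(xs) / len(xs)

# Data leakage: the normalisation statistic is fit on train AND test,
# so the model indirectly sees test-set information during training.
leaky_center = mean(train + test)

# Correct: fit the statistic on the training split only, and reuse that
# same value at serving time to avoid training-serving skew.
train_center = mean(train)

# The two statistics disagree, so a model trained on leaky features
# would see differently-scaled inputs in production.
print(f"leaky center:   {leaky_center:.3f}")
print(f"correct center: {train_center:.3f}")
```

The same principle applies to any fitted transform (scalers, vocabularies, imputers): fit on the training split, then freeze and reuse at evaluation and serving time.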
Basically, I see the exam as more about memorization of best practices and familiarity with tool selection for a given set of constraints/requirements/situations than about reasoning from experience, so I downweight the importance of experience before attempting the exam.
I’m sure you won’t be a newbie after following their recommended practice path, given that you’ll have more time to really understand the implementation rather than just focusing on API connections as I did. By the time the final exam version comes out (maybe in one to a few months), you’ll be an expert already. I did it because I wanted to stop being a cloud newbie, and to expose myself to more data engineering and CI/CD. I believe engineering is a prerequisite to any science, and people should be able to write good code before writing fast code.