
Lessons from 4 weeks of Qwiklabs

Beginning 6 November, I signed up for a one-month Qwiklabs promo by Google Cloud and completed 140 labs across the range of Analytics and Machine Learning services that Google Cloud provides (e.g. BigQuery, Cloud Storage, Cloud Functions, Google Kubernetes Engine, ML APIs, AI Platform, Looker).

This article aims to help others who plan to learn through Qwiklabs make the most of such opportunities (the next one being https://cloudonair.withgoogle.com/events/apac-best-of-next21).

https://images.unsplash.com/photo-1455577380025-4321f1e1dca7?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

What Qwiklabs is good for

Bite-sized introduction to workflows

Qwiklabs organizes its Labs into Quests. Labs are also part of Qwiklabs Courses, and Courses are part of Learning Paths. The catalog search lets you filter by Solution and Level of difficulty, so learners with different levels of experience can find exactly what they need.

Courses piece together video materials from Coursera specializations. They have prerequisite structures (e.g. MLOps (Machine Learning Operations) Fundamentals comes before ML Pipelines on Google Cloud). The videos provide a gentler introduction to the topics than jumping directly into the lab, which could be tough if your coding skills are not advanced enough to help yourself from lab code and documentation.

The courses/specializations also have lab solution videos where someone walks through the lab and explains how things work. These serve as a preview of what to focus on when you start the lab (I prefer watching the video before rather than after attempting the lab). Because these videos fast forward long-running processes (GKE cluster startup, Dataflow jobs), they give an estimate of how much attention is really needed purely for implementation, not counting time spent researching documentation, understanding the given code, and waiting on long-running processes. This lets you plan when to leave your computer for a break.

For infrastructure tasks, the labs provide the essential lines to run to achieve something. That cuts through the clutter and gives a good skeleton to start building more knowledge from relevant sections in documentation instead of diving into documentation blindly from the start.

Links to github code

My favourite feature so far is that you can read the lab lesson before starting it, and often there are links to GitHub repos that you will clone when doing the lab. You can go directly to these repos and click around for more examples to broaden your learning.

Tip: Press . (dot) on any GitHub page to preview the repo in a VS Code editor. For files you don’t have to edit, I prefer this to using the Cloud Shell editor, because the “outline view” function there doesn’t always work, and the editor dies when lab time ends and your session gets shut down.

Up to 5 attempts to re-do lab

You may not be able to complete the lab on time if you’re inexperienced with the tools introduced. Fortunately, the monthly subscription promo allows 5 attempts (you have to redo everything though, so keep the parameterised commands in another file for convenient copy-pasting to get past the already completed parts).
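One way to keep those parameterised commands is a small template file that gets re-rendered for each attempt. The sketch below is only an illustration: the parameter names and values are hypothetical, and the two commands are generic examples rather than ones taken from a specific lab.

```python
from string import Template

# Hypothetical lab parameters; replace with the values generated for each attempt.
params = {
    "PROJECT_ID": "qwiklabs-gcp-00-example",
    "REGION": "us-central1",
    "BUCKET": "qwiklabs-gcp-00-example-bucket",
}

# Commands saved from a previous attempt, with lab-specific values as placeholders.
command_templates = [
    "gcloud config set project $PROJECT_ID",
    "gsutil mb -l $REGION gs://$BUCKET",
]


def render_commands(templates, values):
    """Fill each template; raises KeyError if a placeholder has no value."""
    return [Template(t).substitute(values) for t in templates]


commands = render_commands(command_templates, params)
for command in commands:
    print(command)
```

Because substitute() raises on a missing placeholder, a forgotten parameter shows up before anything is pasted into Cloud Shell.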

Besides running out of time, you may also want to re-attempt a lab if you learned something relevant from another lab and want to come back to apply new ideas, new ways of implementing something, or to observe outputs/logs/metrics that you missed on the first attempt.

https://images.unsplash.com/photo-1439396874305-9a6ba25de6c6?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

What Qwiklabs lacks

Why this way

Qwiklabs usually only tells you what to do (run this command here, replace the parameters and run this notebook), but not why that tool is used, or what would happen if you did it another way.

The learner should actively complement the lab with design guidelines from documentation to understand why something is done, what the alternatives are, and their pros/cons. The amount of learning you get out of the lab depends directly on how curious you are to ask why and how tenacious you are in poring through docs.

Updated code

It is difficult to know as a beginner whether the code is up to date. Often I see deprecated command line options and older tools used (e.g. subprocess.check_call() instead of the newer subprocess.run()). There are also labs using TensorFlow 1 when TensorFlow 2 is already out.
However, these are small issues and do not detract from the main value of Qwiklabs, which is to teach the workflow.
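To make the subprocess example concrete, here is a minimal sketch of the two styles side by side. The child command is a made-up one that just prints a line, chosen so the snippet runs anywhere Python is installed; it is not taken from any lab.

```python
import subprocess
import sys

# A harmless command that works wherever Python is installed.
cmd = [sys.executable, "-c", "print('hello from a child process')"]

# Older style often seen in lab code: returns 0 or raises CalledProcessError.
return_code = subprocess.check_call(cmd)

# Newer style: check=True gives the same raise-on-failure behaviour, and the
# returned CompletedProcess also carries the output (capture_output needs 3.7+).
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
print(result.returncode, result.stdout.strip())
```

Both calls still work, so code written in the older style is fine to run; the newer call simply returns more in one object.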

https://images.unsplash.com/photo-1504194104404-433180773017?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

How to make good use of it

Before starting the lab

  1. Expand the Checkpoints panel on the top right to view the learning objectives. Read through instructions to identify possible long running processes and plan what time to do what.
  2. Open the relevant GitHub repos (ctrl+f git clone) and get familiar with the folder structure. Identify sections of code where you need more time to understand/debug. Getting exposed to some concepts beforehand tells you what to focus on during the lab and helps prevent missing important parts and having to redo the lab.
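If you save the instructions as text, the ctrl+f step can also be done in a few lines. This is only a sketch: the instruction excerpt is hypothetical, and the pattern assumes each clone command sits on one line with an https URL.

```python
import re

# Hypothetical excerpt of lab instructions; real labs will differ.
instructions = """
In Cloud Shell, run:
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
cd training-data-analyst
Then clone the second repo:
git clone --depth 1 https://github.com/GoogleCloudPlatform/mlops-on-gcp.git
"""

# Find "git clone", skip any options on the same line, then capture the URL.
pattern = re.compile(r"git clone\b.*?(https?://\S+)")
repo_urls = pattern.findall(instructions)

for url in repo_urls:
    print(url)
```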

During the lab

  1. Look at relevant documentation of the command line options to understand what the options mean and what are the other unspecified arguments and their default values, and build a framework of how the resource is organized.
  2. Tweak BigQuery queries you are given and try to come up with a solution other than the suggested one so it doesn’t become another copy-paste-run-pass-forget loop.
  3. While there’s time left, look at other Jupyter notebooks in the same repository. Some of these notebooks do not require additional hardware and shouldn’t get your lab shut down or account banned for using more resources than the lab needs.
    Through exploring beyond the instructed folder, I learned both the CLI and Python ways of interacting with BigQuery and Cloud Storage. I found the Python way, which is broken down into more steps and has more structure, easier to remember than the list of options that the CLI takes.
  4. Open the relevant console pages to get a sense of what’s going on under the hood, because the lab instructions won’t always specify. Very often, Cloud Storage buckets are created to store temporary objects before jobs are executed. VM instances have a startup-script that is already set up for you and runs when you start the lab.
  5. Note down statements in the instructions or code patterns that make no sense yet. You may understand them after going through more labs/docs in future.
  6. For notebook labs, substitute in the necessary credentials, add autotime https://stackoverflow.com/a/66931419/8621823, then run-all (some steps take time, so you can spend that time reading the instructions after triggering the runs). You can delete resources and run again if you want to follow the creation process more slowly later.

After the lab

  1. Review the checkpoints again and try to recall how all the pieces fit together. For larger labs, it could be confusing which service is calling which service using which input files stored where, and returning what artifacts stored where for which downstream service to do what.
  2. Think back to how you interacted with the same cloud services in previous labs and reflect on what new tools you learned (e.g. interactions using the CLI, a Python client calling the REST API, the console).
  3. At the end of the lab under “Finish your Quest” section, you can see which quests the lab comes under, and from those quests find other relevant labs. Also, under the “Next steps” section, look at what links could be helpful to bookmark.

https://images.unsplash.com/photo-1540940046315-20e42773d0d6?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

Some quirks to watch

  1. There can be a mismatch between how the instructions want you to name variables and what the lab checker checks for, so read carefully.
  2. Sometimes commands copy-pasted directly from the instructions don’t work. It could be that a file created by a previous step doesn’t exist yet. Try running the command for the broken step again.
  3. For BigQuery labs, the suggested solution may not be fully loaded, so the SQL will have a syntax error. Refresh the lab to get the full SQL.
  4. Answer checkers sometimes fail a correct result. This may happen for some BigQuery labs that autogenerate parameters like dates for your queries. I restarted the lab on a new set of parameters using the exact same query and it passed on the 2nd attempt.
  5. Randomly generated query parameters of BigQuery labs can cause instructions to make no sense. For example, in Create ML Models with BigQuery ML: Challenge Lab instructions ask to filter the Single Trip subscriber type. However, this type only exists in year 2019 (another lab parameter) and my lab generated 2020 as year so I couldn’t find Single Trip but eventually solved it using other similarly named subscriber types.
  6. There are wrong solutions in Insights from Data with BigQuery: Challenge Lab, Query 4: Fatality Ratio. The checker expects a query doing SUM(cumulative_confirmed) which “double” counts as many times as there are rows since cumulative is already a summed concept.
  7. Often labs require actions (e.g. viewing public BigQuery datasets) that pop up new tabs. These new tabs can open in a personal Google account instead of staying in the randomly generated Qwiklabs account. (I don’t like to use incognito because tab switching is faster than window switching, and incognito + non-incognito cannot be in the same Chrome window.)
    The projects dropdown would also change to “Select a project”. Usually this is easy to spot and you get errors when you try to run something, but sometimes there’s no warning and you may be silently running huge queries on your own credit card.
  8. Some labs (e.g. Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab) will not pass with a correct solution until gcloud auth login (not required in the instructions) is done.
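The double counting in quirk 6 is easy to see with a few made-up rows. In the sketch below the dates and numbers are invented, and cumulative_deceased is an assumed companion column used only for illustration; only cumulative_confirmed is named in the lab.

```python
# Hypothetical daily rows for one country:
# (date, cumulative_confirmed, cumulative_deceased)
rows = [
    ("2020-04-01", 100, 2),
    ("2020-04-02", 150, 3),
    ("2020-04-03", 210, 5),
]

# Summing a cumulative column adds every day's running total again.
summed_confirmed = sum(r[1] for r in rows)
summed_deceased = sum(r[2] for r in rows)
ratio_from_sums = summed_deceased / summed_confirmed * 100

# The latest row already holds the total to date.
latest = max(rows, key=lambda r: r[0])
total_confirmed, total_deceased = latest[1], latest[2]
ratio_from_latest = total_deceased / total_confirmed * 100

print(summed_confirmed, total_confirmed)
print(round(ratio_from_sums, 2), round(ratio_from_latest, 2))
```

With these numbers the summed column gives 460 confirmed cases against a true total of 210, and the two ratios differ as well, so the choice affects the answer and not just the intermediate counts.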

https://images.unsplash.com/photo-1521336575822-6da63fb45455?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

What I learned

Improved SQL skills

  1. Reading longer and deeply nested data ingestion and cleaning queries taught me to collapse sections of the .sql file and read from the inside out to manage the nesting.
  2. Looking at the SQL generated by Looker under the hood after some clicks on the front end strengthened certain SQL concepts like GROUP BY and PIVOT.
  3. Working with BigQuery, which uses ARRAYs and STRUCTs, strengthened my mental model of how SQL executes, and helps me move more freely across various levels of aggregation.
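The "levels of aggregation" idea in the last point can be mimicked in plain Python. This is a rough analogy, not BigQuery code: the session/hit layout below is a hypothetical, simplified stand-in for a nested schema, with a list of dicts playing the role of an ARRAY of STRUCTs.

```python
# Hypothetical nested rows: one dict per session, with a repeated "hits" field
# (comparable to an ARRAY of STRUCTs with page and revenue fields).
sessions = [
    {"visit_id": 1, "hits": [{"page": "/home", "revenue": 0.0},
                             {"page": "/checkout", "revenue": 20.0}]},
    {"visit_id": 2, "hits": [{"page": "/home", "revenue": 0.0}]},
]

# Session level: one output row per session, aggregating inside the array.
revenue_per_session = {s["visit_id"]: sum(h["revenue"] for h in s["hits"])
                       for s in sessions}

# Hit level: flatten the array so each hit becomes its own row,
# which is roughly what UNNEST does in a query.
hit_rows = [(s["visit_id"], h["page"], h["revenue"])
            for s in sessions for h in s["hits"]]

# Page level: regroup the flattened rows by a field that lived inside the struct.
views_per_page = {}
for _, page, _ in hit_rows:
    views_per_page[page] = views_per_page.get(page, 0) + 1

print(revenue_per_session, len(hit_rows), views_per_page)
```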

How SSH works

Most labs just ask the learner to use nano because that is the easiest way out, but that’s mistake-prone and I wanted something better. I learned to set up the remote-ssh extension in VS Code to work with VMs and explored how gcloud compute ssh [email protected]_name and ssh -i id.rsa [email protected]_ip behave in both enable-oslogin modes.
If you’re not familiar enough with ssh, using cloud shell editor by clicking “open in new window” lets you see both the editor and terminal on same screen. The editor there is powerful enough to do debugging so you can observe the structure and all the returned values from the response of calling some ML API.
Jupyter lab notebooks that import peripheral python files can also be worked with through remote-ssh, just that you have to change your user directory to jupyter instead of the default student-xxx user generated by the lab.

Shell skills

My shell skills improved from the numerous examples of how scripts are written and organized. This got me thinking more broadly about when something should be written in Python vs bash. I used https://www.shellscript.sh/ to get a foundation, but I am still trying to get better at all the quirks of shell expansion, quoting behaviour and regex escaping in grep vs egrep.
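One small illustration of the Python vs bash trade-off is quoting. The sketch below uses a made-up filename and shows how Python's standard library can quote a value for a POSIX shell, and how passing an argument list avoids the shell altogether.

```python
import shlex
import subprocess
import sys

filename = "my report (final).txt"  # hypothetical name with spaces and parentheses

# shlex.quote wraps the value so a POSIX shell treats it as a single word.
quoted = shlex.quote(filename)

# shlex.split shows how a shell would break the command line into words.
words = shlex.split(f"cat {quoted}")

# Passing a list to subprocess skips the shell entirely, so no quoting is needed.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True, text=True, check=True,
)

print(quoted)
print(words)
print(result.stdout.strip())
```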

Kubernetes

I gained a deeper appreciation of how difficult it is to learn kubernetes. It is one thing to know how to write yaml definitions of Objects but a totally different skill to know how many replicas on what machine types to use. Pod scaling up/down can be effected through a cascade of actions (eg. Horizontal Pod Autoscaler waits 5 min before downscaling replicas, after which GKE autoscaler waits 10 min before removing nodes from nodepool) which complicates debugging.
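As a back-of-the-envelope aid for that cascade, the two waits can be added up to see the earliest time nodes could disappear after load drops. The sketch below simply uses the 5 and 10 minute figures quoted above as assumptions; actual delays depend on how the cluster is configured, and the timestamp is made up.

```python
from datetime import datetime, timedelta

# Delays quoted in the text; actual values depend on cluster configuration.
HPA_DOWNSCALE_WAIT = timedelta(minutes=5)
NODE_REMOVAL_WAIT = timedelta(minutes=10)


def scale_down_timeline(load_drop):
    """Return the earliest times at which each stage could react to a load drop."""
    replicas_reduced = load_drop + HPA_DOWNSCALE_WAIT
    nodes_removed = replicas_reduced + NODE_REMOVAL_WAIT
    return replicas_reduced, nodes_removed


load_drop = datetime(2021, 11, 20, 14, 0)  # hypothetical timestamp
replicas_reduced, nodes_removed = scale_down_timeline(load_drop)
print(replicas_reduced.time(), nodes_removed.time())
```

So a change made at 14:00 may not show up in the node count until about 14:15, which is worth keeping in mind before concluding that a scale-down did not work.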

https://images.unsplash.com/photo-1493713838217-28e23b41b798?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

What I still wish to improve

BI skills (Looker, DataPrep)

Most labs on these tools are about finding the right buttons to click instead of solving exploratory questions, which is a limitation of labs designed to be checked by automated checkers.

Data engineering (DataFlow, DataProc)

Some of these require understanding Java because there are no python samples. Also, Apache Beam and (py)spark are big data tools that take extra resources outside labs to learn.

https://images.unsplash.com/photo-1630359753833-985920943d8c?ixlib=rb-1.2.1&q=85&fm=jpg&crop=entropy&cs=srgb

Resources picked up along the way

Github Repositories

  1. https://github.com/GoogleCloudPlatform/training-data-analyst
  2. https://github.com/GoogleCloudPlatform/data-science-on-gcp
  3. https://github.com/GoogleCloudPlatform/mlops-on-gcp
  4. https://github.com/GoogleCloudPlatform/professional-services
  5. https://github.com/googlecodelabs

Miscellaneous

  1. Ecommerce Schema (to practise working with ARRAY/STRUCT):
    https://support.google.com/analytics/answer/3437719?hl=en

  2. Architecture Reference Patterns:
    https://cloud.google.com/architecture/reference-patterns/overview

  3. Speaker Series recordings:
    [NEW] Google Cloud Speaker Series - Perform Foundational Infrastructure Tasks in Google Cloud - 12.09.21.mp4 - Google Drive

Recommendations

Just go through Qwiklabs if your coding skills are good enough to understand what the GitHub repos are doing, you can help yourself through documentation, and you want to complete more labs in limited time (the course videos can be long-winded).

Go through courses if you are completely new to the subject. They get you on board with some concepts and components under the hood, so you understand why things are done and what to watch for. There are labs as part of courses that can’t be searched as individual labs from the catalog, and they have only 3 allowed attempts instead of 5.

Go through full Coursera specializations (may not be free) if you are coming as a complete beginner who can’t understand SQL, shell scripts or Python.
