Workarounds as Proof of Experience

Overcoming limitations is valuable experience

Sep 23, 2022

We used Google Cloud Platform (GCP) at one of my previous employers, where I led our Cloud Center of Excellence team. Part of leading a team like that is keeping up to date with all the changes your vendors make. This isn’t so hard to do because they tend to have handy release/news pages.

For instance, I visited the following pages every day:

In my current role, I have less need to visit these pages every day, but I still do every once in a while. Today, on GCP’s release notes page, I saw this change:

From the GCP release notes. Automation engineers rejoice!

These Cloud SQL updates GCP made stuck out to me because the previous functionality was a source of frustration for my team. Let me explain.

Previously, deleting a Cloud SQL instance did not release the instance name from GCP’s backend name pool, whatever that is. Why? I have no idea, but I suspect GCP might wisely keep your database around for a while after you “delete it” just in case one of their big customers has second thoughts [1]. Maybe GCP wants to be the hero when they get the following message from a customer:

“Hello Google, we didn’t actually mean to delete our production user database that we forgot to take backups of. Is there anything you can do? Also, our spend is $5 million a month with you.”

So if this is wise of GCP, why was it frustrating for my team?

Consider the case where you are a cloud engineer. Part of your job is developing automation and testing that it works. This might involve running and re-running Terraform commands such as plan, apply, and destroy. You might run these against the following snippet to create a database instance called receipts:

resource "google_sql_database_instance" "main" {
  name             = "receipts"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
  }
}

Straightforward and not so bad! However, once you destroy it and run it again, you’d get something like this message [2] after running terraform plan/apply:

Error: database name "receipts" already exists

Do you see our source of frustration now? Once you use a name, it sticks around for a while after being deleted, which means you have to pick a new name each time you fully test the automation! To work around this frustration, we’d do stuff like this:

resource "random_id" "name" {
  byte_length = 8
}

resource "google_sql_database_instance" "main" {
  name             = "receipts-${random_id.name.dec}"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"
  }
}

This makes the code only slightly more complicated, but your database instance names won’t be as clean. Like, I would rather look at receipts in the console than receipts-37563856!
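If the long numeric suffix is the main complaint, a shorter one is possible. What follows is a minimal sketch rather than a definitive setup: it assumes the `hex` attribute of the random provider's `random_id` resource, and the resource name `suffix` is made up for illustration. It also sets `deletion_protection = false`, on the assumption that your version of the Google provider defaults it to true, which would otherwise block terraform destroy in a test loop.

```hcl
# Hypothetical variation: a 2-byte random ID renders as 4 hex characters,
# so the instance name looks something like "receipts-1a2b".
resource "random_id" "suffix" {
  byte_length = 2
}

resource "google_sql_database_instance" "main" {
  name             = "receipts-${random_id.suffix.hex}"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  # Assumption: recent Google provider versions default this to true,
  # which makes terraform destroy fail until it is set to false.
  deletion_protection = false

  settings {
    tier = "db-f1-micro"
  }
}
```

The trade-off is that two bytes only allow 65,536 possible suffixes, so a collision with a recently deleted name is more likely than with a longer ID, though still rare across a handful of test runs.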

But now GCP has fixed this… two years too late to make my life simpler, but it’s good to see they fixed it.

So why write about this frustration of mine? I hope you’ll keep reading…

One of the major differences between an experienced engineer and someone new to the field is that the experienced engineer will have gone through problems and frustrations like the one above, and will have found workarounds for them. You can’t easily learn these things without experiencing them yourself because they are rarely covered in the official documentation or training materials. It is often simple to Google the answers to these problems, but not always. This is why real-world experience is better than certifications and training.

I’ll also add that finding a workaround to a problem will help you solve similar problems in the future or avoid them altogether. From our example, the “workaround lesson” is that using suffixes in your cloud resource names can help you avoid problems!
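As a sketch of how that lesson generalizes, one suffix can be shared across several resources through a local value. The bucket here is a hypothetical example, not part of the original setup. It assumes the Google provider's `google_storage_bucket` resource, whose names have to be globally unique, and the names `shared`, `exports` and `receipts-exports` are made up for illustration.

```hcl
resource "random_id" "shared" {
  byte_length = 4
}

locals {
  # One suffix reused by every resource in this configuration.
  suffix = random_id.shared.hex
}

# Hypothetical second resource that benefits from the same suffix.
resource "google_storage_bucket" "exports" {
  name     = "receipts-exports-${local.suffix}"
  location = "US"
}
```

Keeping the suffix in one place means everything created by a given test run carries the same marker, which also makes leftovers easier to spot in the console.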

Anyways, your troubles will vary based on whatever technology you use. Here are some from my experience that you might find fun or have had the misfortune of experiencing yourself:

Kubernetes - if you’ve used Kubernetes, then you’ve almost definitely been burned by the now-deprecated PodSecurityPolicy. This monster caused us SO many headaches in my roles where we used Kubernetes.
Terraform - ever tried looping over a provider?
AWS CDK - Stack Exports for inter-stack references are magic until you have to remove one.
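
For anyone who hasn’t hit the Terraform one: at the time of writing, provider blocks accept neither `count` nor `for_each`, so you can’t generate one provider configuration per region from a list. The usual workaround is to write out one aliased provider block per target by hand. A minimal sketch, with made-up aliases, project ID and bucket names:

```hcl
# Each provider configuration has to be spelled out by hand.
provider "google" {
  alias   = "central"
  project = "example-project" # hypothetical project ID
  region  = "us-central1"
}

provider "google" {
  alias   = "east"
  project = "example-project" # hypothetical project ID
  region  = "us-east1"
}

# Resources pick a configuration with a static reference;
# the provider argument cannot be computed from a variable.
resource "google_storage_bucket" "central" {
  provider = google.central
  name     = "example-central-bucket" # hypothetical name
  location = "US-CENTRAL1"
}

resource "google_storage_bucket" "east" {
  provider = google.east
  name     = "example-east-bucket" # hypothetical name
  location = "US-EAST1"
}
```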

When interviewing an engineer for a cloud engineering role, one of my favorite questions is “What frustrations have you had with <insert-tool-here>?” If an engineer has no frustrations with a tool, then they probably don’t have much experience with it. It also feels pretty validating when I hear people have had the same problems I have had, but maybe that’s selfish.

[1] DO NOT DELETE YOUR IMPORTANT DATABASES TO TEST THIS OUT. THIS IS SPECULATION ON MY PART.

[2] I made up this message. Now that GCP has changed the functionality, I can’t reproduce it. I hope you’ll forgive me.

Sheep Code
