Google kills Site Reliability Engineering?

Is SRE gone forever?

Jul 24, 2023

On the “Blind” app I recently read a rumor that Google is transitioning away from having dedicated Site Reliability Engineering (SRE) teams. This came as a mild shock to me because Google is the company that introduced the SRE position to the world. Google’s book on the subject is great, and you should read it if you haven’t.

As a quick reminder, Site Reliability Engineering treats software operations as a software engineering problem and seeks to solve operational problems using software. This is distinct from DevOps, where the software engineering team for a particular service is also responsible for running that service in production. As always, the distinction is muddied by the fact that there are “DevOps” teams that would be better named “SRE” teams because they don’t actually do any software engineering for a service.

Anyway, getting back to Google. I still have not been able to find out whether the rumor is true, but I suspect that, given the current market conditions and their layoff earlier in the year, Google might be taking a deeper look at how they run their software.

Much of what Site Reliability Engineering originally set out to accomplish can now be achieved using various services, platforms, and practices. The container image has made packaging software standardized (mostly). Kubernetes has standardized container orchestration. CloudFormation/CDK/Terraform/etc. have made cloud infrastructure more manageable. There are a million managed observability (logging, tracing, metrics, alarms, etc.) services to choose from as well. Even if you are running in a data center and use few managed services, you can still use open source tools like Kubernetes/Elasticsearch/Grafana and have what feels like a modern managed platform (at least for software engineering teams).

I suspect Google (if the rumor is true) intends to shift operations responsibilities fully to the software engineering teams. Many successful software companies already operate that way. I am speculating here, but I also suspect that Google may have identified that they would get more value by having their SRE teams focus on producing cloud services in the observability realm.

I do not think that SRE (including misnamed DevOps teams) will go away in most of the industry. A typical “SRE/DevOps Engineer” is well-versed in the services/tools/practices I mentioned above, whereas many software engineers prefer to solve problems primarily in their application codebase. However, many companies do tend to follow Google’s lead, so we’ll see.

Sheep Code
