Using Machine Learning to Improve AWS Operations

Published On: January 17, 2023

You’ve used Lambda, S3, and RDS to build a successful application on AWS. However, as your application grows, you may start to run into problems with load and scaling. Is an underperforming service to blame? Are you in the wrong AWS Region? Is your application design not up to the task? These issues can be hard to diagnose and may leave you wondering what your next move is.

In this video, I discuss solutions that developers and development teams can consider when facing these types of problems. One option I suggest is hiring a Site Reliability Engineer (SRE). An SRE is a specialized engineer responsible for keeping an application highly available, reliable, and performant. SREs work closely with development and operations teams to identify and resolve issues related to load and scaling. Google popularized the SRE role, and you can find substantial research on the practice.

What if you could apply AWS’ years of experience developing and troubleshooting applications to your own application infrastructure and operations? What if an AWS SRE could look at your CloudWatch data and alerts and direct your developers to trouble spots in your code or your Amazon VPC? How much would you pay for that expertise? While that level of professional services may be cost-prohibitive, AWS offers an alternative in Amazon DevOps Guru.

DevOps Guru is an ML-driven assistant that automates many of the tasks an SRE undertakes. It analyzes operational data from your AWS resources, such as CloudWatch metrics, AWS CloudTrail events, and logs, looking for anomalous behavior and known inefficiencies. It then prioritizes its findings and alerts you to them. I haven’t used the service myself. However, I imagine that, like most AI, it augments human capability rather than replacing it. For a small operations team, DevOps Guru may reduce the need to hire a full-time SRE. It doesn’t excuse the team from much of the routine hygiene that smooth operations require.
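DevOps Guru reports its findings as "insights" (reactive and proactive), each carrying a severity. As a rough sketch of the "prioritize" step, assuming the AWS SDK for Python (boto3) and the `list_insights` operation on its `devops-guru` client, the function below collects ongoing insights and orders them most severe first. The function name `ongoing_insights_by_severity` and the ranking table are hypothetical illustrations, not part of any AWS API; check the DevOps Guru API reference for the exact response fields before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical ranking of the Severity values DevOps Guru attaches to insights.
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def ongoing_insights_by_severity(client: Any) -> List[Dict[str, Any]]:
    """Return ongoing reactive and proactive insights, most severe first.

    `client` is assumed to behave like boto3.client("devops-guru");
    only its list_insights method is called.
    """
    insights: List[Dict[str, Any]] = []
    for insight_type in ("REACTIVE", "PROACTIVE"):
        kwargs: Dict[str, Any] = {"StatusFilter": {"Ongoing": {"Type": insight_type}}}
        # Follow NextToken until every page of results has been read.
        while True:
            page = client.list_insights(**kwargs)
            insights.extend(page.get("ReactiveInsights", []))
            insights.extend(page.get("ProactiveInsights", []))
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    # Sort by severity; any unrecognized severity sorts last.
    return sorted(
        insights,
        key=lambda i: SEVERITY_RANK.get(i.get("Severity"), len(SEVERITY_RANK)),
    )
```

With real credentials you would pass `boto3.client("devops-guru")` in a Region where DevOps Guru is enabled. Taking the client as a parameter keeps the ranking logic testable without AWS access.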


DevOps Guru isn’t the only Day 2 product we’ve covered in our AWS Everyday series, in which we take a fresh look at each of AWS’ 238 products. CodeGuru provides ML-based code review, and Amazon Detective applies ML to investigating security findings. How useful are these tools? As an anecdotal example, I asked ChatGPT to write an initial version of this blog post based on the transcript of my video on DevOps Guru. Only about 15% of the suggested text survived my edits. However, the ML tool pointed me in a direction, and I ran with it. I’d approach these operations tools with the same expectation: a useful starting point rather than a finished answer.
