Using Machine Learning to Improve AWS Operations

By Keith Townsend | Published On: January 17, 2023

You’ve used Lambda, S3, and RDS to build a successful application on AWS. However, as your application grows, you may start to run into problems with load and scaling. Is an underperforming service to blame? Are you in the wrong AWS region? Is your application design not up to the task? These issues can be challenging to diagnose and may leave you wondering what your next move should be.

In this video, I discuss some solutions that developers and development teams can consider when facing these types of problems. One potential solution is hiring a Site Reliability Engineer (SRE). An SRE is a specialized engineer responsible for ensuring that an application is highly available, reliable, and performing well. SREs work closely with developers and operations teams to identify and resolve issues related to load and scaling. Google popularized the SRE role, and you can find substantial research on the practice.

What if you could apply AWS’ years of experience developing and troubleshooting applications to your own application infrastructure and operations? What if an AWS SRE could review your CloudWatch alert data and direct your developers to trouble spots in your code or your AWS VPC? How much would you pay for that expertise? While that level of professional services may be cost-prohibitive, AWS offers an alternative in Amazon DevOps Guru.

DevOps Guru is an ML-driven assistant that automates many of the tasks an SRE undertakes. It analyzes operational data from your AWS resources, such as CloudWatch metrics and logs, for known inefficiencies and other issues, then prioritizes its findings and raises alerts. I haven’t used the service. However, I imagine that, like most AI, it augments human capability rather than replacing it. For a small operations team, DevOps Guru may reduce the need to hire a full-time SRE. It doesn’t, however, excuse an operations team from the routine hygiene required to ensure smooth operations.
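To make the "prioritizes and alerts" idea concrete, here is a minimal sketch, assuming Python with boto3 and a region where DevOps Guru is already enabled. It pages through the ListInsights API (the boto3 `devops-guru` client's `list_insights` call) for ongoing reactive insights and orders them by their reported severity (LOW, MEDIUM, or HIGH). The helper names `fetch_ongoing_reactive_insights` and `prioritize`, and the ranking itself, are hypothetical illustrations, not part of any AWS SDK or of DevOps Guru's own prioritization logic.

```python
from typing import Any, Dict, List

# Hypothetical ranking for illustration: DevOps Guru reports insight severity
# as LOW, MEDIUM, or HIGH; lower rank sorts first.
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def fetch_ongoing_reactive_insights(client: Any) -> List[Dict[str, Any]]:
    """Collect every ongoing reactive insight, following NextToken pagination.

    `client` is expected to behave like boto3.client("devops-guru"); it is
    passed in so the function can be exercised without AWS credentials.
    """
    insights: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}
    while True:
        response = client.list_insights(**kwargs)
        insights.extend(response.get("ReactiveInsights", []))
        token = response.get("NextToken")
        if not token:
            return insights
        kwargs["NextToken"] = token  # request the next page on the next loop


def prioritize(insights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return insights sorted HIGH first; missing or unknown severities go last."""
    unknown = len(SEVERITY_RANK)
    return sorted(insights, key=lambda i: SEVERITY_RANK.get(i.get("Severity", ""), unknown))
```

In practice you would pass `boto3.client("devops-guru")` to `fetch_ongoing_reactive_insights` and feed the result to `prioritize`, then print each insight's `Severity` and `Name`. Injecting the client keeps the paging logic testable with a stub and leaves credential and region handling to the caller.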

Conclusion

DevOps Guru isn’t the only Day 2 product we’ve covered during our AWS Everyday series, in which we take a fresh look at each of AWS’ 238 products. Amazon CodeGuru provides ML-based code review, and Amazon Detective provides ML-based security investigation. How useful are these tools? As an anecdotal example, I asked ChatGPT to write an initial version of this blog post based on the transcript of my video on DevOps Guru. Approximately 15% of the suggested text survived my edits. However, the ML tool pointed me in a direction, and I ran with it. I’d approach these operations tools with the same expectations.

Keith Townsend

Keith Townsend is a seasoned technology leader and Chief Technology Advisor at Futurum Group, specializing in IT infrastructure, cloud technologies, and AI. With expertise spanning cloud, virtualization, networking, and storage, Keith has been a trusted partner in transforming IT operations across industries, including pharmaceuticals, manufacturing, government, software, and financial services.

Keith’s career highlights include leading global initiatives to consolidate multiple data centers, unify disparate IT operations, and modernize mission-critical platforms for “three-letter” federal agencies. His ability to align complex technology solutions with business objectives has made him a sought-after advisor for organizations navigating digital transformation.

A recognized voice in the industry, Keith combines his deep infrastructure knowledge with AI expertise to help enterprises integrate machine learning and AI-driven solutions into their IT strategies. His leadership has extended to designing scalable architectures that support advanced analytics and automation, empowering businesses to unlock new efficiencies and capabilities.

Whether guiding data center modernization, deploying AI solutions, or advising on cloud strategies, Keith brings a unique blend of technical depth and strategic insight to every project.