I’m revisiting a report I wrote for one of my first independent advisory customers. I cringe at some of the structural parts of the report. I didn’t provide an overview of the customer’s business or restate why they engaged me, and I’ve lost some of that fidelity over the past five years. I want to critique some of my recommendations with my modern knowledge of infrastructure. So, let’s grade five-years-younger Keith’s work.
Overall Report Mechanics
First off, I love the technical recommendations and that ultimately the customer paid us. You have a commanding knowledge of the technology and a deep understanding of disaster recovery, application availability, and the impact of network design on a distributed system. You also thought outside the box for the period. You recognized the public cloud as a viable solution for a customer without the internal skill or resources to operate a warm site disaster recovery environment.
Keith, I’ll give you a C+. You are a management consultant, and you didn’t focus the executive summary on the business outcomes. Reading the report today, I have no context for why you made the recommendations you made. You and your sponsor may have understood those outcomes. However, if your sponsor were to share the report with his CEO, the context would be missing. You would not accept this report as final today. You’d send it back to the team for additional background and a more well-rounded executive overview. I expect better from you then and now.
While you hired an editor, the editor was technical. Next time spend a little more money and hire an editor to help with the overall document flow and basic grammatical errors. Did you really miss “publically”? The spell check should have picked that up.
I still agree that memory represented the most significant risk to growing the customer base by 50%. In a virtualized environment, the two common bottlenecks are memory and I/O. The environment at the time ran between 80% and 90% memory utilization. With such high rates, customers would have complained about poor application response time. You missed the benefit of improved application response time.
You did identify the highest risk of serving 50% more customers. I see the environment ran a Citrix-delivered app for multiple tenants. CPU utilization remained low at 20% to 30%. Storage I/O looked fine. While the customer could benefit from an overall hardware refresh, increasing memory would also meet the performance needs while deferring the additional cost of new server infrastructure. No HCI for you, 2015 Keith!
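To make the memory-versus-CPU reasoning concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that utilization scales linearly with the number of customers, which the original report never claimed. The function name `projected_utilization` is hypothetical; the utilization ranges are the ones quoted above.

```python
def projected_utilization(current_pct: float, growth_pct: float) -> float:
    """Project utilization after customer growth.

    Assumes utilization scales linearly with customer count (an
    illustrative assumption, not a measurement from the report).
    """
    return current_pct * (1 + growth_pct / 100)


GROWTH_PCT = 50  # customer growth target discussed in the report

# Utilization ranges quoted in the text, as (low, high) percentages.
observed = {"memory": (80, 90), "cpu": (20, 30)}

for resource, (low, high) in observed.items():
    projected_low = projected_utilization(low, GROWTH_PCT)
    projected_high = projected_utilization(high, GROWTH_PCT)
    if projected_low > 100:
        verdict = "exceeds capacity"
    elif projected_high <= 100:
        verdict = "has headroom"
    else:
        verdict = "is borderline"
    print(f"{resource}: {projected_low:.0f}% to {projected_high:.0f}% projected, {verdict}")
```

Under that linear assumption, memory lands well past 100% of what is installed while CPU stays comfortably below it, which is why adding memory alone could plausibly defer a full server refresh.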
Oh, customers were running all of their VM traffic through physical firewalls back then. Today it may seem like a no-brainer to implement a software-defined network (SDN). But in 2015, it wasn’t as widely practiced. Your customer for this project didn’t have the staff to implement and maintain the VXLAN design you suggested. VMware NSX wasn’t yet mature. It was a cutting-edge recommendation you would have struggled to help make real. Be grateful they didn’t bite on that recommendation. You should have gone the safe route and recommended virtual firewalls.
The physical network was 1Gbps? You sure they weren’t having I/O problems? If I remember correctly, they were running iSCSI for the storage network. Even by 2015 standards, 1Gbps seems relatively slow. Even our CTOAHI runs on 5-year-old Arista 10Gbps switches. No complaints about your recommendation here. They should have taken you up on the 10Gbps upgrade. I do now remember why you suggested VXLAN. You were a big Arista switch fan, and if I remember correctly, Arista’s VXLAN implementation was reliable.
Disaster Recovery and vCloud Air
Keith! Your recommendation that the customer leverage vCloud Air came before you tried to use it! I had to reach back and pull up your takedown of the solution. Except for vCloud Air not being a great solution at the time, the recommendation was the right one on paper. The client had a Recovery Point Objective (RPO) and Recovery Time Objective (RTO) that they could not meet. I like that you called out that it would take a couple of hundred man-hours to recover the environment vs. the service-level agreement of 24 hours.
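The gap between the recovery effort and the SLA is easy to put in numbers. The sketch below is illustrative only: it reads "a couple of hundred man-hours" as 200, and it assumes the work divides perfectly across staff with no hand-offs or dependencies, which real recoveries rarely allow. The function names are hypothetical.

```python
import math


def elapsed_hours(effort_hours: float, staff: int) -> float:
    """Wall-clock recovery time if the effort splits evenly across staff."""
    return effort_hours / staff


def staff_needed(effort_hours: float, rto_hours: float) -> int:
    """Smallest team that meets the RTO under perfect parallelism."""
    return math.ceil(effort_hours / rto_hours)


EFFORT_HOURS = 200  # assumed reading of "a couple of hundred man-hours"
RTO_HOURS = 24      # service-level agreement quoted in the text

for team_size in (1, 2, 4):
    hours = elapsed_hours(EFFORT_HOURS, team_size)
    print(f"{team_size} staff: about {hours:.0f} hours elapsed")

print(f"Staff needed to meet a {RTO_HOURS}-hour RTO: {staff_needed(EFFORT_HOURS, RTO_HOURS)}")
```

Even in this best case, the customer would have needed nine people working around the clock in perfect parallel, which is not realistic for a shop without DR documentation, automation, or testing.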
The customer lacked proper DR documentation, automation, and testing. You called each of these defects out and provided a remediation plan. Obviously, the modern solution is to replicate to the public cloud and maintain some minimum VMware public cloud footprint. Just note that capacity isn’t guaranteed. It’s a lesson learned from the pandemic.
You had the perfect customer in a CFO. It’s not often a business leader calls a technology advisory firm. You should have approached the report through the lens of a CFO. You realized that too late. He moved on once you got your feet underneath you. We learned a great deal from this report. With all of your former peers from PwC, you didn’t ask for feedback? I think you were scared and, as much as you don’t want to admit it, had a little imposter syndrome. Leverage your network. You may have missed out on a long-term customer I’d have to this day.
Kudos to you on getting the work. It was $15K of side income. Don’t ever be shy about asking for help again. You don’t know everything. I don’t expect you to know everything. I do expect you to leverage your network.