When Small Isn’t Simple: Lessons from Deploying Granite 13B on IBM Cloud
I recently spent a few days experimenting with watsonx and Granite 13B on IBM Cloud. The cost? $69, easily absorbed by the promotional credit.
But the real takeaway wasn’t the billing.
It was the architectural shift required to work effectively with smaller LLMs.
Small Models Aren’t Just Smaller
Smaller models are often framed as more efficient or “lighter” alternatives to foundation models. But that framing obscures a critical truth: they’re not just scaled-down versions of large models—they operate under different constraints, and those constraints have real architectural implications.
The most obvious constraint is the smaller context window (i.e., less space to hold conversation history or detailed reference documents). That forces a rethink in how you handle everything from prompts to memory to application design.
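To make that constraint concrete, here is a minimal sketch of the kind of budget check a small-window deployment forces on you. The window size and the 4-characters-per-token heuristic are assumptions for illustration; a real application should use the model's actual tokenizer and published limits.

```python
# Rough token-budget check for a small context window.
# CONTEXT_WINDOW and the chars-per-token heuristic are assumptions;
# use the model's real tokenizer and documented limits in practice.

CONTEXT_WINDOW = 8192      # assumed window for a Granite-13B-class model
RESERVED_FOR_OUTPUT = 512  # leave headroom for the completion

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fits_in_window(prompt: str) -> bool:
    """True if the prompt leaves enough room for the response."""
    return estimate_tokens(prompt) <= CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
```

With a large frontier model you rarely think about this check; with a small model it becomes a gate that every prompt, every retrieved document, and every turn of chat history has to pass.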
This is why the enterprise focus on RAG (Retrieval-Augmented Generation) isn’t going away. A small model’s limited working memory requires external systems to deliver timely, domain-specific knowledge. The model alone isn’t enough—you have to build the rest of the scaffolding around it.
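The scaffolding described above can be sketched in a few lines. This is a deliberately naive illustration: keyword overlap stands in for the embedding model and vector search a production RAG pipeline would use, and all names here are hypothetical.

```python
# Minimal RAG sketch: retrieve relevant chunks, then assemble a prompt.
# Keyword overlap is a stand-in for real semantic (vector) retrieval.

def score(query: str, chunk: str) -> int:
    """Count shared words between query and chunk (toy relevance score)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks for the query."""
    return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved context so the small model answers from it."""
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The point isn't the retrieval algorithm; it's that the retrieval-and-assembly step lives outside the model, and you own it.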
Prompt Engineering is Back (and Never Really Left)
The most surprising element of the Granite 13B experience was how much I had to return to explicit prompt engineering.
It felt like working with GPT-3.5-era ChatGPT all over again, before the reasoning-heavy models made prompt sloppiness feel acceptable.
We’ve grown accustomed to large models that can infer our intent from loosely structured input. But smaller models don’t carry that same cognitive slack. With limited parameters and smaller context windows, they force you to be intentional.
You can get solid outputs—but you have to earn them.
Every token counts. Prompt structure matters. You’re not just writing prompts; you’re designing interactions.
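Here is a hypothetical illustration of the difference. The structured template below is an assumption about what "intentional" looks like, not a Granite-specific format: role, task, output format, and constraints are all spelled out rather than left for the model to infer.

```python
# A loose prompt a large model might tolerate:
loose = "summarize this doc"

# The kind of explicit structure a smaller model rewards
# (template is illustrative, not a Granite-specific format):
structured = """You are a technical summarizer.

Task: Summarize the document between <doc> tags.
Format: Exactly 3 bullet points, each under 20 words.
Constraints: No speculation; use only facts from the document.

<doc>
{document}
</doc>

Summary:"""

prompt = structured.format(document="Granite 13B was deployed on IBM Cloud...")
```

Every element in the structured version is doing work: the role anchors tone, the format constrains output length, and the delimiters keep the document from bleeding into the instructions.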
Architectural Shifts: It’s Not Just About Cost
When evaluating smaller models, most conversations center on cost, latency, or control. But none of those benefits come for free.
Smaller models change how your application architecture must work.
You’re not just swapping out a model—you’re rethinking the design space around it.
You have to make decisions about:
- How you compress and structure prompts
- How to use system prompts and chaining more deliberately
- How you manage memory over time—especially for chat-based apps
- What tooling (RAG, vector search, orchestration) is required to close gaps in model capability
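The memory question in particular has a concrete shape. One common pattern, sketched here under the same rough token heuristic as before (a real app would use the model's tokenizer), is to pin the system prompt and drop the oldest turns first:

```python
# Sliding-window chat memory under a token budget:
# keep the system prompt, evict the oldest turns first.
# Token counts are estimated; use the model's tokenizer in practice.

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(system: str, turns: list[str], budget: int) -> list[str]:
    """Return the most recent turns that fit alongside the system prompt."""
    remaining = budget - estimate_tokens(system)
    kept: list[str] = []
    for turn in reversed(turns):        # walk newest-first
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    return list(reversed(kept))         # restore chronological order
```

Variants abound (summarizing evicted turns instead of dropping them, pinning key facts), but all of them are application architecture, not model configuration.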
This whole exercise is a reminder of a core principle:
Simplicity is a strategy.
When you choose a smaller, specialized model—whether for cost, privacy, or deployment flexibility—you are sacrificing the simplicity that comes with a general-purpose foundation model.
And that means architects must design that simplicity back into the platform.
Final Thought
Smaller LLMs like Granite 13B can absolutely deliver value. But you can’t approach them as plug-and-play replacements for larger models.
You have to treat them as a different class of model—with different affordances and very different constraints.
And that demands architecture, not just engineering.
If you’ve gone down this path—evaluating open-weight or efficient LLMs for production use—I’d love to hear what tradeoffs you’ve had to navigate. Drop a comment or email.

Keith Townsend is a seasoned technology leader and Founder of The Advisor Bench, specializing in IT infrastructure, cloud technologies, and AI. With expertise spanning cloud, virtualization, networking, and storage, Keith has been a trusted partner in transforming IT operations across industries, including pharmaceuticals, manufacturing, government, software, and financial services.
Keith’s career highlights include leading global initiatives to consolidate multiple data centers, unify disparate IT operations, and modernize mission-critical platforms for “three-letter” federal agencies. His ability to align complex technology solutions with business objectives has made him a sought-after advisor for organizations navigating digital transformation.
A recognized voice in the industry, Keith combines his deep infrastructure knowledge with AI expertise to help enterprises integrate machine learning and AI-driven solutions into their IT strategies. His leadership has extended to designing scalable architectures that support advanced analytics and automation, empowering businesses to unlock new efficiencies and capabilities.
Whether guiding data center modernization, deploying AI solutions, or advising on cloud strategies, Keith brings a unique blend of technical depth and strategic insight to every project.




