
AI, Orchestration, and the Illusion of Speed

February 3, 2026 by Shahriar Ahmed Shovon


Prologue

I know all the stuff I am doing. I’ve done it hundreds of times before. So why bother doing the same boilerplate again and again? Why not off-load it to AI and invest my cognition and time into something better?

That’s what I told myself back in mid-June 2025, less than a year ago.

This post is a set of observations about what quietly breaks, both technically and mentally, when AI is introduced into long-lived software systems.

The 80% Problem

Fast forward to January 2026. I feel overwhelmed looking at my own codebase. I fear adding a new feature or fixing a bug because it often introduces new issues. Things break in multiple places when I intended to fix one minor thing. Keeping the AI orchestration working eats most of my time, time that should go into architecture and design decisions.

And before you ask, I didn’t just prompt it once and let it do everything. I divided problems into multiple parts, generated specs with very specific details, and then let it handle the scoped work. Even after all of that, my 75K+ LOC codebase ended up bloated with inconsistent files, random helper functions, and the same utility duplicated in global libs and then again inside local files, used everywhere. All of it required manual cleanup, often right after the prompt response.

Despite the coding rules and guides I gave it, for certain things it simply ignored them.

The worst part is that it worked 80% of the time. And that’s exactly why it’s the worst. When you get 80% comfort, it becomes nearly impossible to track down the remaining 20% of inconsistency. Reviewing every single piece of code turns into a massive time bleed.

This isn’t an anti-AI piece. It’s about long-lived systems and sustained ownership. The cognitive effects and junior-developer risks are downstream consequences, not the core claim.

This is where most advice about “using AI correctly” begins.

So let’s talk about what a proper workflow actually looks like.

The “Correct” Way to Use AI

You start by doing the architecture manually. You lock down the modules, folder structure, naming rules, API contracts, tech stack. Everything. You spend hours thinking about how the project should look at 120K LOC, just to give AI boundaries so it doesn’t spit out a mess. Not because you enjoy it, but because you don’t want your AI tool going rogue and producing Frankenstein code across ten domains.

Then you split the project into domains. Auth, billing, notifications, user service, shared libs, infra, whatever applies. One module at a time. You create context files, a README_AI.md for the project, module-specific guides, conventions, and rules.
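To make the prep concrete, here is the kind of skeleton a README_AI.md or module guide might contain. This is an illustrative sketch, not the actual file from my project; the stack, layout, and rules are hypothetical.

    README_AI.md (illustrative sketch)
    Stack: TypeScript, NestJS, PostgreSQL (assumed for this example)
    Layout: src/<domain>/{controller,service,dto,entities,tests}
    Naming: PascalCase classes, camelCase functions, kebab-case files
    API contracts: DTOs at module boundaries only, never expose entities
    Rules:
    - Reuse helpers from libs/shared, never duplicate utilities locally
    - One prompt covers one scoped task inside one module
    - Generated code must pass lint, type checks, and tests before review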

Yes, it takes hours of prep. But this is the only way AI produces code you can merge without regretting it later.

After all that, you still only use AI for small tasks. CRUD, schemas, DTOs, migrations, tests. One prompt for one small task. Not an entire service. Not a whole domain.

You don’t skip tests either. You ask AI to generate the test skeleton first, then the code, then refactor. Every generated piece gets reviewed manually afterward. You rename things, simplify, enforce consistency. You refactor with AI after reviewing, not before.
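To give a feel for the test-skeleton-first step, here is a minimal sketch in TypeScript with a Jest-style runner. The module and function names are hypothetical; the point is that the skeleton pins down expected behavior before any implementation is generated.

    // billing.service.spec.ts, written before the implementation exists
    // (hypothetical names, Jest-style runner assumed)
    import { describe, it, expect } from '@jest/globals';
    import { calculateInvoiceTotal } from './billing.service';

    describe('calculateInvoiceTotal', () => {
      it('sums line items and applies tax', () => {
        const total = calculateInvoiceTotal([{ amount: 100 }, { amount: 50 }], 0.1);
        expect(total).toBeCloseTo(165); // (100 + 50) * 1.1
      });

      it('returns 0 for an empty invoice', () => {
        expect(calculateInvoiceTotal([], 0.1)).toBe(0);
      });

      it.todo('rejects negative line items');
    });

Only once a skeleton like this exists does the AI get asked for the implementation, and the refactor pass happens after the review, not before.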

The Cost of Doing It Right

Then you do weekly cleanup passes. Dead code, duplicated logic, unnecessary complexity. Sometimes AI finds it. Sometimes it misses it. Sometimes it mixes things up. You fix it.

Before merging anything, you add quality gates. Linting, type checks, tests, and a manual sanity check. You don’t auto-merge AI code unless you want a dumpster fire.
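As a rough illustration of what a gate can look like in a TypeScript project, a small script like the one below can run everything before a merge. The specific tools (eslint, tsc, jest) are assumptions about the stack, not a prescription.

    // scripts/quality-gate.ts, a minimal pre-merge gate (illustrative sketch)
    import { execSync } from 'node:child_process';

    const steps = [
      'npx eslint .',        // lint rules catch style drift in generated code
      'npx tsc --noEmit',    // type check without producing build output
      'npx jest --ci',       // run the test suite
    ];

    for (const cmd of steps) {
      console.log(`Running: ${cmd}`);
      execSync(cmd, { stdio: 'inherit' }); // throws and stops the gate on failure
    }

    console.log('Automated gates passed. The manual sanity check is still on you.');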

You also keep context small. You don’t feed the entire repo and expect miracles. Module-level context, summaries, and only relevant files. Smaller context works better.

Now step back and look at this honestly. All these workflow steps, clever hacks around tools, hundreds of dollars paid, all to save the time it takes to type code manually. Is that really a good trade?

I don’t.

Neural Networks & Me

On top of that, I lost confidence in myself. I stopped trusting myself to make irreversible changes in my own codebase. My mind would scream, “Ask the AI tool, it might give you a better idea.”

The problem isn’t delegation. It’s delayed ownership. When an external system proposes structure under uncertainty, feedback loops lengthen. You stop forming sharp error models. Decisions become reversible by default. Over time, judgment shifts from “I know why this works” to “this looks reasonable.” That’s not a tooling issue. It’s a cognition issue.

After optimizing workflows, switching tools, doing spec-driven development, injecting context properly, going back and forth endlessly, I eventually retired from all of it.

Think about this for a second. You spend two extra hours just so the AI tool doesn’t hallucinate. And you know what? You could have typed all that yourself in one and a half hours. With less crappy code, better structure, better understanding, and predictable lines of code.

No guesswork. Pure firsthand experience.

At this point, it’s easy to dismiss all of this as emotional frustration. So instead of ranting, let’s switch to numbers.

The Numbers That Almost Work

Imagine you’re building a SaaS application. Not a toy. Not a demo. A real system that keeps changing while it’s being built. The system eventually reached about 120K lines of code, split into 24 modules, each roughly 5,000 LOC. It was built from scratch, module by module, and intended to be long-lived.

Before AI entered the picture, I already knew this work.

Without AI

Once the architecture settled, I hit an efficient manual pace per module. Boilerplate, CRUD, DTOs, and migrations took around 15 hours. Tests took about 5 hours. Core business logic took around 10 hours. Refactoring, documentation, and cleanup took another 2 hours. That put each module at roughly 32 hours.

Across 24 modules, that comes out to about 32 x 24 = 768 hours.

With AI

AI entered the loop where it supposedly shines: boilerplate, repetitive structure, and test scaffolding. Core logic and final integration stayed manual. What changed wasn’t just who wrote the code, but how time was spent. A new task appeared: orchestration.

Per module, orchestration meant preparing context, writing and refining prompts, reviewing generated code, refactoring to align with conventions, and verifying tests and edge cases. That alone averaged about 9 to 10 hours per module.

With AI, a typical module broke down differently. Human-written logic and integration took around 12 hours. AI-generated code, after review and refactor, took about 6 hours. Orchestration took about 9.5 hours. That puts a module at roughly 27.5 hours.

Across all 24 modules, the manual total was about 768 hours. The AI-assisted total landed around 660 hours. Net savings: roughly 108 hours, or about 14 percent.
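For anyone who wants to check the arithmetic, here it is restated as a tiny script. Nothing new in it; it only recomputes the numbers already given above.

    // Recomputing the per-module and total numbers from this post
    const manualPerModule = 15 + 5 + 10 + 2;       // boilerplate + tests + core logic + cleanup = 32 h
    const aiPerModule = 12 + 6 + 9.5;              // human logic + reviewed AI code + orchestration = 27.5 h
    const modules = 24;

    const manualTotal = manualPerModule * modules; // 768 h
    const aiTotal = aiPerModule * modules;         // 660 h
    const saved = manualTotal - aiTotal;           // 108 h
    const savedPct = (saved / manualTotal) * 100;  // about 14 percent

    console.log({ manualTotal, aiTotal, saved, savedPct: savedPct.toFixed(1) });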

I understand these numbers aren’t universal. But they aren’t vague hand-waving either.

Up to this point, the math works.

What the Numbers Don’t Capture

The math assumes things that don’t hold up under sustained use. It assumes reviews stay strict. Attention doesn’t degrade. Boundaries remain intact. Architectural judgment stays fully human.

In practice, AI-generated code often looks clean and reasonable. Reviews get faster. “Looks fine” becomes enough. Over time, reviews become probabilistic. Small inconsistencies slip in. Boundaries soften. Understanding shifts from deep to good enough.

In a greenfield system, this happens even earlier, because AI isn’t just filling gaps. It’s participating in shaping the system itself. Nothing breaks immediately, and that’s the real problem.

The numbers capture real reductions in repetitive work and a real but capped speedup. What they don’t capture are the missed review errors, the quiet inconsistencies, the partial ownership of early design decisions, and the cost of change later.

Those costs don’t show up in velocity charts. They show up when you touch the system again, for maintenance or extension.

Predictions & Possibilities

This is where the numbers stop being abstract and start affecting who should use these tools at all.

The workflow I described did help, by about 10 to 15 percent. For a senior developer with years of experience, that trade-off can be justified. I have no argument there.

But the equation flips when your internal error models are still forming. When you haven’t yet accumulated enough personal mistakes. When intuition is still being built.

At that stage, AI doesn’t just help you move faster. It replaces feedback loops. You don’t just outsource typing. You outsource judgment. And once that happens early, it becomes very hard to tell which parts of your understanding are actually yours.

That’s why I decided: AI never touches my code, system design, or debugging.

Speed can be added later. Integrity cannot.

In a future article, I will talk about the why.

