AI & Tools

Most of What You're Calling an AI Agent Should Just Be a Script

Most of the work an SMB wants to hand to an AI agent is a deterministic workflow that a script runs cheaper, faster, and more auditably. A look at the failure rates, the consumption-pricing surprises, and the security exposure, plus a simple test for the few cases where agency actually earns its keep.

Every vendor pitch I sit through lately has the same shape. The product is an "AI agent." It "works autonomously." It "thinks." It will "transform your operations." And then the demo shows the thing resetting a password or moving a record from one column to another.

That is not an agent. That is an if statement with a marketing budget.

I want to be careful here, because there is real capability in this category and I am not interested in being the guy who waves it all away. But the gap between what agentic AI is being sold as and what it actually does for a small business in 2026 is wide enough to drive a budget overrun through. So let me say the unpopular thing plainly: most of the problems an SMB is trying to solve with an "agent" are deterministic workflows that do not need agency, do not benefit from it, and get more expensive and less auditable the moment you bolt a language model onto them.

The interesting part is figuring out the small number of cases where that is not true.

"Agentic" Is Doing a Lot of Work in That Sentence

An agent, in the strict sense, is a system that plans, makes decisions inside defined boundaries, calls tools, evaluates the result, and adjusts. The defining feature is non-determinism. You give it a goal, not a script, and it figures out the path. That is genuinely useful when the path cannot be known in advance.

The catch is that almost everything an SMB wants to automate has a knowable path. Password resets. Invoice routing. Pulling a contact into the CRM. Sending the onboarding sequence. Classifying a support ticket and dropping it in the right queue. These are not reasoning problems. They are sequences. The same input should always produce the same output, and if it doesn't, that's a bug, not a feature.
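
To make that concrete, here is a minimal sketch of invoice routing written as a plain function. The `Invoice` fields, the dollar thresholds, and the queue names are all hypothetical, picked only for illustration. What matters is that every branch is written down, so the same invoice lands in the same queue every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount: float
    has_po: bool  # whether a purchase order is attached


def route_invoice(invoice: Invoice) -> str:
    """Return the approval queue for an invoice using fixed rules.

    Thresholds and queue names are illustrative assumptions,
    not taken from any real accounting system.
    """
    if not invoice.has_po:
        return "needs-purchase-order"
    if invoice.amount >= 5000:
        return "owner-approval"
    if invoice.amount >= 500:
        return "manager-approval"
    return "auto-approve"


if __name__ == "__main__":
    sample = Invoice(vendor="Acme Supply", amount=1200.0, has_po=True)
    print(route_invoice(sample))  # manager-approval
```

There is nothing to evaluate, tune, or meter here. If the routing is wrong, you read four lines and find out why.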

Here's the tell. Microsoft, which is selling agents about as hard as anyone, describes workflows in its own Copilot Studio documentation as completing actions in a "deterministic, reliable way," and positions agents as the thing you embed at the specific step where the work has real variability. Read that again, because it is the whole argument. Even the company with the strongest commercial incentive to make everything an agent is telling you that the backbone should be deterministic and the agent is the exception you reach for when a step genuinely needs judgment.

That is the right mental model, and it is almost the exact inverse of how the technology is being marketed to people who don't have an IT team to know better.

The Pilot Graveyard Is Real, and It's Not the Model's Fault

If agents were the no-brainer the hype suggests, the deployment numbers would show it. They don't.

Gartner's 2026 survey found that only 17 percent of organizations have actually deployed AI agents, even as more than 60 percent say they plan to within two years. MIT's research on enterprise AI put it more bluntly: 95 percent of enterprise AI pilots fail to scale, and only about 5 percent deliver measurable profit impact. IDC found that 88 percent of AI proofs-of-concept never reach wide deployment. And Gartner expects more than 40 percent of agentic AI projects to be canceled outright by the end of 2027, citing escalating costs, unclear value, and inadequate risk controls.

Sit with that cancellation number. Gartner expects more than four in ten of these projects to be killed, and the reasons are not "the model wasn't smart enough." Forrester's read is that agent failures come from ambiguity, miscoordination, and unpredictable system behavior rather than traditional bugs. The model is rarely the bottleneck. The bottleneck is that nobody defined what success looked like, the agent didn't have clean access to the data or tools it needed, and there was no discipline for evaluating its output once it was live.

None of those failure modes get fixed by a better model. They get fixed by the boring engineering work that agents were supposed to let you skip. Which is exactly why so many SMBs are about to pay for an expensive lesson: the autonomy is the easy part to buy and the hard part to operate.

The Meter Is the Part Nobody Budgets For

This is the section I wish more people led with, because it is where SMBs actually get hurt.

Agentic platforms have quietly introduced a second meter that runs alongside your per-seat licensing, and it charges for what the agent does, not how many people have access. Microsoft 365 Copilot is $30 per user per month for enterprise, or $18 for the small-business add-on on annual commitment (a promo that reverts to $21 after June 30, 2026). That part is predictable. Then you build an agent in Copilot Studio, and the consumption meter starts: a penny per Copilot Credit, or prepaid packs of 25,000 credits for $200 a month. Credits are not one-per-message. A single agent response can burn several credits depending on how much it retrieves and how many actions it takes.
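
The arithmetic is worth running before you build anything. The sketch below uses only the two prices quoted above (one cent per credit pay-as-you-go, or 25,000 credits for $200). The response volumes and credits-per-response figures are made-up inputs, and the assumption that you buy whole packs with no mixing of pay-as-you-go overage is mine, so check the vendor's current billing rules before relying on it.

```python
import math

PAYG_CENTS_PER_CREDIT = 1   # "a penny per Copilot Credit"
PACK_CREDITS = 25_000       # credits in one prepaid pack
PACK_PRICE_DOLLARS = 200    # price of one pack per month


def monthly_credits(responses: int, credits_per_response: float) -> int:
    """Credits consumed in a month, rounded up to a whole credit."""
    return math.ceil(responses * credits_per_response)


def payg_cost(credits: int) -> float:
    """Dollar cost if every credit is bought pay-as-you-go."""
    return credits * PAYG_CENTS_PER_CREDIT / 100


def pack_cost(credits: int) -> float:
    """Dollar cost if usage is covered by whole prepaid packs (assumption)."""
    packs = math.ceil(credits / PACK_CREDITS)
    return float(packs * PACK_PRICE_DOLLARS)


if __name__ == "__main__":
    # Hypothetical agent: 6,000 responses a month at 5 credits each.
    credits = monthly_credits(6_000, 5)
    print(credits, payg_cost(credits), pack_cost(credits))
```

With those hypothetical inputs the agent burns 30,000 credits: $300 pay-as-you-go, or $400 if you have to buy a second pack to cover the overage. Double the credits per response and both numbers double with it, which is the part that never shows up in a per-seat quote.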

Salesforce plays the same game from the other side. Agentforce add-ons start at $125 per user per month, the full Agentforce 1 edition runs $550 per user per month, and a mid-size operation handling 50,000 customer interactions a month can spend around $100,000 monthly on agent conversations alone (about $2 per conversation), on top of existing seat fees.

Then there's the multiplier nobody mentions in the demo. When a human clicks "resolve ticket," that's one action. When an agent resolves the same ticket, it might retrieve context, call a tool, evaluate the result, take a follow-up action, and log the outcome, and depending on how the vendor meters, that's several billable events for one resolved ticket. Gartner's forecast is that by 2027, 40 percent of enterprises using consumption-priced AI tooling will see unplanned costs exceeding twice their budget. The FinOps community already ranks AI cost visibility as its single hardest problem, because each layer of the bill arrives separately, in a different format, under a different cost center.

For a 200-person enterprise with a finance team, that's a painful quarterly surprise. For a 12-person company without one, it's the kind of bill that ends the whole experiment and sours them on AI for two years. The deterministic version of the same workflow, running as a Power Automate flow or a scheduled script, costs a rounding error by comparison and does not have a meter that speeds up when the agent decides to think harder.

Every Agent Is a Privileged User You Didn't Interview

The cost problem is at least visible on an invoice. The security problem is the one that doesn't show up until something goes wrong.

An agent that can act on your behalf is, functionally, a new identity in your environment with credentials and access. The industry has a name for this now: non-human identities, and they are multiplying faster than anyone's ability to govern them. The trouble is that an agent behaves like a privileged service account, except it makes adaptive decisions and, unlike a service account, it can be talked into things.

That's not hypothetical. OWASP's Agentic AI Top 10 now lists prompt injection, memory poisoning, and tool misuse as the leading risks, and the reason prompt injection is so dangerous is that it can escalate privilege without ever breaking authentication. The agent is working as designed; it just got instructions from the wrong place, embedded in a document or an email it was asked to process, and interpreted a vague input as authorization to do something broader. Your existing security stack does not catch this, because a web application firewall doesn't understand a reasoning chain and your DLP doesn't inspect what's inside a model's context window. As one analysis put it, the gap isn't a misconfiguration, it's a category mismatch.

And the supply chain underneath these agents is young and soft. In March 2026, a backdoored version of LiteLLM, the language-model gateway used by CrewAI, Microsoft GraphRAG, and dozens of other agent frameworks, sat on PyPI for three hours. In that window it was downloaded roughly 47,000 times, shipping an autonomous attack bot to everyone who pulled the update. Three hours.

Here is what makes this acute for SMBs specifically: the whole appeal of an agent for a small business is that it acts without supervision, and the whole risk of an agent is that it acts without supervision. Those are the same sentence. A company with no internal IT is the least equipped to scope agent permissions correctly, monitor agent behavior, and respond when an agent does something it shouldn't, which is precisely the company being told that agents will let them run lean.

The Test I Actually Use

None of this means "don't deploy agents." It means deploy them deliberately, and default to the boring option until the boring option genuinely can't do the job. Here's the test I run on any candidate workflow before it gets near an agent.

Can you write down every step? If you can describe the process as a fixed sequence of steps with clear branches, it is a workflow, and it should be a deterministic flow or a script. It will be cheaper, faster, fully auditable, and it will do the same thing every single time. Reaching for an agent here is paying a premium for non-determinism you actively don't want.

Is there exactly one step that requires judgment? This is the sweet spot, and it's what Microsoft's own design points at. Build the deterministic backbone, and embed an agent only at the single node where the input is genuinely variable: classifying a freeform support message, extracting structured data from a messy document, drafting a response that needs to read tone. Everything around that step stays deterministic and auditable. The agent earns its keep at one point in the chain rather than running the whole thing.
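
Here is roughly what that shape looks like, as a sketch rather than a reference design. The queue names and the `keyword_stub` classifier are hypothetical. In a real build, `classify` would be the one place a model gets called; I have left it as a plain callable so the example runs without any vendor SDK. Everything around it is fixed code, and the classifier's answer is treated as untrusted until it matches a known queue.

```python
from typing import Callable

ALLOWED_QUEUES = {"billing", "technical", "sales"}  # illustrative names
FALLBACK_QUEUE = "human-review"


def route_ticket(message: str, classify: Callable[[str], str]) -> str:
    """Deterministic backbone with exactly one judgment step.

    `classify` is the only non-deterministic node. Its output is
    validated against a fixed allowlist before anything acts on it.
    """
    text = message.strip()
    if not text:
        return FALLBACK_QUEUE  # deterministic guard: nothing to classify

    label = classify(text).strip().lower()  # the single judgment step

    if label not in ALLOWED_QUEUES:
        return FALLBACK_QUEUE  # unexpected answers go to a person
    return label


def keyword_stub(text: str) -> str:
    """Stand-in classifier so the sketch runs without a model."""
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "unsure"


if __name__ == "__main__":
    print(route_ticket("I was charged twice, please refund me", keyword_stub))
    print(route_ticket("hello?", keyword_stub))
```

The design choice that matters is the allowlist check. Whatever the judgment step returns, the only possible outcomes are the queues you defined or a handoff to a human.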

Does the cost survive the heaviest case, not the average? Model your highest-volume, highest-retrieval workflow against consumption pricing, not your typical one. If the math only works at the average, it doesn't work, because the bill is set by your worst month.
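
A small sketch of that check, assuming a vendor that bills per agent event. Every number in it (ticket volumes, events per ticket, the five-cent event price, the $500 budget) is invented for illustration. Substitute your own history and your vendor's actual metering unit.

```python
def cost_dollars(tickets: int, events_per_ticket: int, cents_per_event: int) -> float:
    """Monthly cost when each ticket triggers several billable events."""
    return tickets * events_per_ticket * cents_per_event / 100


def budget_check(
    monthly_tickets: list[int],
    typical_events: int,
    heavy_events: int,
    cents_per_event: int,
    budget_dollars: float,
) -> dict:
    """Compare the average month against the heaviest plausible month."""
    typical_costs = [
        cost_dollars(t, typical_events, cents_per_event) for t in monthly_tickets
    ]
    average_cost = sum(typical_costs) / len(typical_costs)
    # Worst case: the busiest month, with every ticket on the expensive path.
    worst_cost = cost_dollars(max(monthly_tickets), heavy_events, cents_per_event)
    return {
        "average_cost": average_cost,
        "worst_cost": worst_cost,
        "survives_worst_month": worst_cost <= budget_dollars,
    }


if __name__ == "__main__":
    # Hypothetical history: three normal months and one spike.
    result = budget_check(
        monthly_tickets=[800, 900, 1000, 2400],
        typical_events=3,
        heavy_events=8,
        cents_per_event=5,
        budget_dollars=500.0,
    )
    print(result)
```

On these made-up inputs the average month costs $191.25 and looks comfortable against a $500 budget. The spike month, with every ticket taking the expensive path, costs $960. That gap is the one to plan around.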

Can you scope it, log it, and kill it? Before an agent touches anything, it needs least-privilege access to only what that specific task requires, a complete audit trail of what it did, and a kill switch. If you can't answer all three, you're not ready to deploy it, and a deterministic alternative that needs none of that scaffolding is sitting right there.
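
As a rough illustration of all three in one place, the sketch below wraps an agent's tools in a small gatekeeper. The class and tool names (`ScopedToolbox`, `lookup_order`, `issue_refund`) are hypothetical and not part of any agent framework. A real deployment would also enforce scope at the credential level, not only in application code.

```python
from datetime import datetime, timezone
from typing import Callable


class KillSwitchEngaged(RuntimeError):
    """Raised when a tool call is attempted after the agent is stopped."""


class ScopedToolbox:
    """Gatekeeper for tool calls: allowlist, audit log, kill switch."""

    def __init__(self, tools: dict[str, Callable[..., object]]):
        self._tools = dict(tools)          # scope: only these tools exist
        self.audit_log: list[dict] = []    # log: every attempt is recorded
        self.killed = False                # kill: one flag stops everything

    def kill(self) -> None:
        self.killed = True

    def call(self, name: str, **kwargs: object) -> object:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": name,
            "args": kwargs,
        }
        self.audit_log.append(entry)  # logged before anything runs

        if self.killed:
            entry["outcome"] = "blocked: kill switch"
            raise KillSwitchEngaged(name)
        if name not in self._tools:
            entry["outcome"] = "denied: not in allowlist"
            raise PermissionError(f"tool {name!r} is not allowed for this task")
        try:
            result = self._tools[name](**kwargs)
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        entry["outcome"] = "ok"
        return result


if __name__ == "__main__":
    box = ScopedToolbox({"lookup_order": lambda order_id: {"id": order_id}})
    print(box.call("lookup_order", order_id="A-100"))
    box.kill()
    print(box.audit_log)
```

Note that denied and blocked attempts are logged too. When an agent has been talked into something, the attempts it was not allowed to make are often the most useful lines in the trail.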

The Bottom Line

The honest version of the agentic AI pitch for a small business is much less exciting than the one you're being sold, and much more useful. A handful of your workflows have a genuinely variable step where an agent will do something a script can't, and for those, the technology is real and worth the money and the governance overhead. The rest, the overwhelming majority, are deterministic processes that a flow or a script will run more cheaply, more reliably, and more auditably, with no meter accelerating in the background and no new privileged identity to babysit.

The companies that win with this technology won't be the ones that deployed the most agents. They'll be the ones that correctly identified the few places agency was worth paying for and ran everything else as the boring, predictable automation it always should have been. Deterministic by default. Agentic on purpose. That's the whole strategy, and it fits on an index card.

Everyone selling you the index-card version as a platform subscription would prefer you didn't notice.


Ironwright helps small businesses cut through the AI agent hype and automate what's actually worth automating. No autonomous systems you can't afford to run or govern, just practical automation, deterministic where it should be and agentic where it earns it.
