- The model is a commodity: the bottleneck is communication with the expert and workflow redesign.
- The treasure is what the expert doesn’t mention, not what they ask for. Trust is the technical condition.
- Connect the data and own the sources of truth: every well-modeled piece of data is a brick that stays yours.
- Redesign the workflow the “ignorant” way, AI-native. The moat is the workflow plus the expertise.
- Make quality predictable with evals and the expert’s signature. Own the harness, swap the model.
- Start from a deliverable worth weeks and do it in days. AI is a multiplier, not a discount.
No LLM handed me this knowledge. We learned it in the field, with real people and real projects, getting it wrong and building on the mistakes.
The feedback that told me we were on the right track came from a colleague: “with this new AI-native workflow we calmly handle deadlines that used to be structurally impossible”. That sentence is worth more than any benchmark, because it describes the only thing that counts: an agent that isn’t a demo, but something a company actually uses, every day.
The model has stopped being the problem
Let’s start with an uncomfortable fact. MIT, in its State of AI in Business 2025 report, writes that against 30-40 billion dollars invested in GenAI, 95% of organizations see no return[1]. The sentence that matters is another one: this gap doesn’t depend on the quality of the model, but on the approach. McKinsey reaches the same point by another road: workflow redesign is the lever with the biggest effect on AI’s economic impact[5]. And the MAST study, analyzing more than 1,600 traces of multi-agent systems, concludes that the failures are failures of design, not of model capability[2].
- How capable the technology is: model quality, context, reliability.
- Transferring years of hard-won expertise in days, with real trust.
- Rebuilding a human-born process into one designed for agents.
Translated: the model is now an extraordinarily powerful commodity. The bottleneck has moved to two deeply human things. Communication with whoever knows the domain, and the ability to redesign a process born for humans into one meant to work with agents.
That’s why staying current with the state of the art has become a prerequisite, not an advantage. Knowing MCP, skills, and long-horizon agentic flows serves one purpose only: to stop thinking about technology. When implementation is no longer the problem, all attention goes to how the work should be done.
It’s a paradox only in appearance. The more technical you are, the less technology is your job.
The model isn’t the moat. The workflow and the expertise are.
The hard part is talking to each other
The domain expert is the most valuable person in the room and, almost always, the one who struggles most to change their mental model. Not because they’re closed off: their standards have settled over years of work, and those standards are their guarantee of quality. Now they have to transfer those same years of expertise in a few days, at the same quality as always, through a different way of working.
Hence the first predictable mistake: the expert asks for the features they think they want, usually to automate what they already see. The real treasure is what they don’t mention, because they’re so used to the old system they can’t imagine it being touched. The parts most in need of AI are often the ones the expert doesn’t even realize they do sub-optimally.
The question I always ask, at the first meeting, is this: what’s the thing that would change your life and that we haven’t even named, because we assume it’s impossible? That’s how you find what a person believes is unfeasible and desperately needs.
Then communication has to be fed with honest feedback from both sides. Trust is the technical condition, not an extra. The two common fears, “I’ll lose my job” and “I’ll work more with tighter deadlines”, both come from AI used badly: it burns budget on tokens, pushes people away, and misses the point.
What would change your life and we assume is impossible? That’s where you start.
Connecting the data levels the field
When communication holds, the technical work starts with data. a16z put it in black and white: enterprise agents often fail for lack of context, because “revenue is a business definition, it isn’t hard-coded in a data warehouse”[3]. Company data lives scattered across systems, and an LLM dropped onto fragmented data hallucinates. In one of our projects on asset data, a model went as far as saying a holding owned 257% of itself. The fault wasn’t the model’s: it was the way the data reached it.
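A minimal sketch of the kind of guard that catches a “257% of itself” before it ever reaches the model. The record shape `(owner, entity, percent)` and the function name `validate_stakes` are assumptions made for illustration, not the schema from the project.

```python
from collections import defaultdict

# Hypothetical record shape: (owner, owned_entity, percent_stake).
Stake = tuple[str, str, float]


def validate_stakes(stakes: list[Stake], tolerance: float = 0.01) -> list[str]:
    """Return human-readable problems found in a list of ownership stakes."""
    problems: list[str] = []
    totals: dict[str, float] = defaultdict(float)

    for owner, entity, pct in stakes:
        # A single stake can never be negative or exceed 100%.
        if not 0.0 <= pct <= 100.0:
            problems.append(f"{owner} -> {entity}: stake {pct}% is outside 0-100")
        # Self-ownership is flagged for human review, not silently accepted.
        if owner == entity:
            problems.append(f"{entity}: listed as its own owner, review needed")
        totals[entity] += pct

    # All stakes in one entity together cannot exceed 100%.
    for entity, total in totals.items():
        if total > 100.0 + tolerance:
            problems.append(f"{entity}: stakes sum to {total:.1f}%, above 100")
    return problems


if __name__ == "__main__":
    sample = [("HoldCo", "HoldCo", 257.0), ("A", "B", 60.0), ("C", "B", 40.0)]
    for problem in validate_stakes(sample):
        print(problem)
```

Run at ingestion, a check like this turns a hallucination into a data-quality ticket, which is where the problem actually lives.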
Connecting the sources levels the field, and it isn’t trivial. You have to figure out which data to connect, how to model it so the LLM keeps performing, where to keep the single source of truth (SSoT), and how to transform it to build proprietary datasets over time. Here’s a compounding competitive advantage: every piece of data you bring in and model well is a brick that stays yours.
There’s a recurring mistake, especially in agencies: experts already use powerful platforms in the usual “human” way and almost never notice that those same vendors have already shipped AI or MCP features that would change everything. I’ve often discovered that the feature we needed was already inside a tool the company had been paying for, for years. Connecting systems the right way is worth more than building a new one the wrong way.
The same goes for the agent’s tools. A tool must be designed for the agent, not bolted on as a thin wrapper over an API, and the agent-computer interface deserves the same care as interfaces built for humans. By presenting MCP servers as code the agent can call, instead of as direct tool calls, Anthropic documented a workflow going from 150,000 to 2,000 tokens[4]. It’s a vendor number, and I flag it for honesty’s sake, but the direction is right.
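To make “designed for the agent” concrete, here is a small sketch under stated assumptions: `RAW_ROWS` stands in for a verbose API response, and `campaign_summary` is a hypothetical tool, not a real platform call. The idea is that inputs are validated strictly and the filtering happens in code, so the agent receives a few typed fields instead of the raw payload.

```python
from dataclasses import dataclass

# Stand-in for a raw API response; a real one could be thousands of rows.
RAW_ROWS = [
    {"campaign": "spring", "day": d, "spend": 100.0 + d, "clicks": 10 * d}
    for d in range(1, 31)
]


@dataclass(frozen=True)
class CampaignSummary:
    campaign: str
    days: int
    total_spend: float
    total_clicks: int


def campaign_summary(campaign: str, max_days: int = 30) -> CampaignSummary:
    """Hypothetical tool entry point: strict inputs, small typed output."""
    # Reject bad arguments with explicit messages the agent can act on.
    if not campaign or not campaign.strip():
        raise ValueError("campaign must be a non-empty name")
    if not 1 <= max_days <= 90:
        raise ValueError("max_days must be between 1 and 90")

    # Filter and aggregate in code, so raw rows never enter the model context.
    rows = [r for r in RAW_ROWS if r["campaign"] == campaign and r["day"] <= max_days]
    if not rows:
        raise LookupError(f"no data for campaign {campaign!r}")

    return CampaignSummary(
        campaign=campaign,
        days=len(rows),
        total_spend=sum(r["spend"] for r in rows),
        total_clicks=sum(r["clicks"] for r in rows),
    )


if __name__ == "__main__":
    print(campaign_summary("spring", max_days=7))
```

The thin-wrapper alternative would hand the agent all thirty rows and let it do the arithmetic in tokens.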
Every piece of data you bring in and model well is a brick that stays yours.
Redesigning the workflow the “ignorant” way
At this point you design the flow. I call my preferred way “ignorant”, with affection: ignore the existing traditional system and try to rebuild it from scratch, forcing one constraint, “one single person has to be able to do this, with these AI tools”. AI-native thinking naturally develops an AI-centered workflow that replaces the human-centered one. With the expert’s help, the result is better, faster, and genuinely cheaper.
At Intarget this step has a name we chose to say out loud, in a lecture at Bocconi: Fullstack AI Company. A company rebuilt around AI, not a company that uses AI.
- Domain expertise, taste, judgment, accountability.
- Focus groups, feedback loops, shared methods.
- Specialists and orchestrators. The model lives here: one piece, not the system.
- Architecture, data, observability, governance.
The same goes for how you build. Personally I no longer write code line by line: I write architecture, constraints, acceptance criteria. AI writes the code, I review, iterate, ship. What used to take months becomes weeks, then days.
A real case, instructive precisely because it failed three times before it worked. At Intarget the strategic knowledge about the most important clients lived in people’s heads: every meeting with a C-level took hours of rebuilding the context, and when someone left, the memory left with them. Three attempts based on hand-filled PowerPoints had died for the same reason: no update ritual, no owner, an unmaintainable format. The AI-native version flipped the constraint: the Business Partner doesn’t fill in a dossier, they answer questions, and the system keeps a structured knowledge base alive on its own (a company brain fed by the CRM and public sources). The deliverable is no longer a file. It’s a system that feeds itself.
The deliverable is no longer a file. It’s a system that feeds itself.
The expert’s signature against the token lottery
Here I get to the part that separates serious work from AI slop. A powerful LLM, well guided and with the right data, does almost everything. But what distinguishes a reproducible delivery from winning the token lottery is the signature of an expert who knows more than the model and oversees standards, output, and continuous updating. A brilliant model tells you how things are done based on its training data. The secret sauce only comes from someone who actually does the job.
For this to work, quality has to be made predictable, not just checked at the output. The tool is called an eval: sets of tests that measure the system’s reliability on real scenarios.
The harness
- Tools with strict contracts
- Evals and verification (pass^k)
- Domain context
- Observability
The model
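A sketch of what pass^k in the harness list can look like in practice, assuming the usual definition: the probability that k independent runs of the same scenario all succeed. The estimator below, C(c, k) / C(n, k) for c passes in n recorded runs, is the standard unbiased one. The function names and the shape of `results` are my own choices for illustration.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that k independent runs of one scenario ALL pass,
    given c passes observed in n runs."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) is 0 when c < k, so fewer than k passes gives 0.0.
    return comb(c, k) / comb(n, k)


def suite_pass_hat_k(results: dict[str, list[bool]], k: int) -> float:
    """Average pass^k over scenarios; results maps scenario name -> run outcomes."""
    scores = [pass_hat_k(len(runs), sum(runs), k) for runs in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    runs = {
        "pitch_outline": [True] * 8,
        "competitor_table": [True, True, True, False, True, True, False, True],
    }
    for k in (1, 2, 4):
        print(k, round(suite_pass_hat_k(runs, k), 3))
```

Unlike a single pass rate, the score drops quickly as k grows when a scenario is flaky, which is exactly the difference between a reproducible delivery and the token lottery.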
The clearest example of an operational signature is one of our agents that does quality control on ad creatives before they go live.
- What would stop the upload: formats, specs, platform requirements.
- Brand safety, policy, claims: what can’t go live.
- The expert’s standard: performance, consistency, creative strength.
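The three levels above can be sketched as a tiered check. This is an illustration under assumptions, not the agent we run: the formats, size limit, banned claims, and note rules are hypothetical placeholders, and I assume the first two tiers block while the third only produces notes for the expert to sign off on.

```python
from dataclasses import dataclass
from typing import Optional

# All specs and rules below are hypothetical, not real platform requirements.
ALLOWED_FORMATS = {"jpg", "png", "mp4"}
MAX_FILE_MB = 30
BANNED_CLAIMS = ("guaranteed", "risk-free")


@dataclass
class Report:
    status: str            # "blocked" or "ready_for_expert"
    tier: Optional[str]    # which tier blocked the creative, if any
    issues: list[str]      # blocking problems
    notes: list[str]       # non-blocking notes for the expert


def technical_issues(creative: dict) -> list[str]:
    issues = []
    if creative.get("format") not in ALLOWED_FORMATS:
        issues.append(f"unsupported format: {creative.get('format')}")
    if creative.get("size_mb", 0) > MAX_FILE_MB:
        issues.append(f"file larger than {MAX_FILE_MB} MB")
    return issues


def policy_issues(creative: dict) -> list[str]:
    text = str(creative.get("copy", "")).lower()
    return [f"banned claim: {claim}" for claim in BANNED_CLAIMS if claim in text]


def expert_notes(creative: dict) -> list[str]:
    notes = []
    if len(str(creative.get("copy", ""))) > 90:
        notes.append("copy is long; check it against the brand's usual tone")
    if not creative.get("cta"):
        notes.append("no call to action found")
    return notes


def review(creative: dict) -> Report:
    # Blocking tiers run in order; the first one with issues stops the review.
    for tier, check in (("technical", technical_issues), ("policy", policy_issues)):
        issues = check(creative)
        if issues:
            return Report("blocked", tier, issues, [])
    # Nothing blocks: hand over to the expert with advisory notes.
    return Report("ready_for_expert", None, [], expert_notes(creative))
```

The ordering matters: there is no point spending an expert’s attention on a creative the platform would reject anyway.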
There’s a detail I particularly love. When one of our orchestrators produced a presentation coordinating six specialist agents, every slide carried the signature of the agent that wrote it. Radical transparency: every piece has an owner, and quality can be traced.
Own the harness, swap the model. The durable asset is the environment, not the model.
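One way to read “own the harness, swap the model” in code, as a sketch: the harness depends only on a small interface, and any model that satisfies it can be plugged in. `Model`, `Harness`, and the two toy models are hypothetical names; a real adapter would wrap a vendor SDK behind the same `complete` method.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """The only thing the harness assumes about a model."""

    def complete(self, prompt: str) -> str: ...


class UpperModel:
    """Toy stand-in for one vendor's model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ReverseModel:
    """Toy stand-in for a different vendor's model."""

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


class Harness:
    """Owns context, verification, and logging; the model is injected."""

    def __init__(self, model: Model, context: str, check: Callable[[str], bool]):
        self.model = model
        self.context = context
        self.check = check
        self.log: list[tuple[str, bool]] = []

    def run(self, task: str) -> str:
        output = self.model.complete(f"{self.context}\n{task}")
        passed = self.check(output)
        self.log.append((task, passed))  # observability lives in the harness
        if not passed:
            raise ValueError(f"output for {task!r} failed verification")
        return output


if __name__ == "__main__":
    non_empty = lambda text: bool(text.strip())
    for model in (UpperModel(), ReverseModel()):
        harness = Harness(model, context="brand guidelines", check=non_empty)
        print(type(model).__name__, repr(harness.run("draft a headline")))
```

Swapping the model is a one-line change; the context, the checks, and the log stay where they are.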
Start from a deliverable worth weeks, do it in days
Theory becomes practice when you tackle a very specific task. Pick a deliverable that today takes several people, weeks of manual and research work, needs careful validation, and produces high value. Start there with the AI-native flow, the ignorant way, and do it in days. Watch the quality and prepare the evals, so it stays predictably high over time.
To pick the right deliverable, the matrix from the first article in the series comes in handy: you cross how much it matters to the business with how standardized it already is. Top right, high-impact and already standardized processes, is the “start here”.
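The matrix can be sketched as a simple filter. The 1-5 scales, the threshold, and the example deliverables are assumptions for illustration; the only thing taken from the matrix itself is that high impact plus high standardization means “start here”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    business_impact: int   # hypothetical 1-5 score
    standardization: int   # hypothetical 1-5 score


def start_here(candidates: list[Candidate], threshold: int = 4) -> list[Candidate]:
    """Return the top-right quadrant, highest impact first."""
    picks = [
        c for c in candidates
        if c.business_impact >= threshold and c.standardization >= threshold
    ]
    return sorted(
        picks,
        key=lambda c: (c.business_impact, c.standardization),
        reverse=True,
    )


if __name__ == "__main__":
    backlog = [
        Candidate("strategic pitch", 5, 4),
        Candidate("weekly report", 2, 5),
        Candidate("one-off workshop", 4, 1),
        Candidate("creative QA", 4, 5),
    ]
    for c in start_here(backlog):
        print(c.name)
```

The scores are a conversation starter with the expert, not a measurement: what matters is agreeing on which deliverable lands in that corner.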
A case, anonymized, from Intarget’s Innovation Hub: a seventy-page strategic pitch for a big education-sector brief, with competitive analysis, personas, and insights. Normally a team of six to eight people does it over several weeks. We built it with two people and a content-strategy orchestrator, saving 50-60% of the time versus the historical baseline. And here’s the important thing: quality went up, not down. The senior’s comment was “insight the team couldn’t have generated on its own”. AI didn’t remove value from the work. It removed the mechanical part.
AI didn’t remove value from the work. It removed the mechanical part.
A multiplier, not a discount on margin
The biggest mistake is treating AI as a way to do the same things a bit faster and shave a few points of margin. Andrej Karpathy puts it well: AI unlocks what you were never able to do, and treating it as a way to copy-paste what you already do makes you miss its real value. If you aim only at efficiency, it’s easier for AI to eat your margin than to grow it. McKinsey notes that the organizations getting the most value add growth and innovation goals, not just savings.
If instead you aim at “impossible before, easy now”, the returns can take off, and the next bottleneck becomes go-to-market. Production stops being the limit, and marketing and sales become it.
That’s why I come back to the colleague’s sentence, the one about “structurally impossible” deadlines becoming manageable. It doesn’t describe a saving: it describes a threshold that shifts. AI didn’t make consulting less human, it made it less mechanical.
It doesn’t describe a saving. It describes a threshold that shifts.
In production, not in slides
Putting agents into production, effectively, is less a matter of model and more a matter of method. Stay current with the state of the art, so you can put communication and domain expertise first. Earn the experts’ trust and dig out what they consider impossible. Connect the data and own your sources of truth. Redesign the workflow the ignorant way. Make quality predictable with evals and the signature of those who know. Start from a deliverable that’s worth it, and do it in days.
One chapter deserves an article of its own: how you actually capture AI’s value before bad governance eats it. I’ll tackle that another time. For now the principle we use as a compass is enough: use AI as a system, not as a session.
The AI others leave you in slides, we put into production.
Do you have a deliverable that costs weeks today?
On a call we figure out whether it’s the right candidate and what the first concrete step would be. At Yempik we build custom agents and automations, with a fixed price and code that stays yours. If you’d rather get a sense of costs first, see our pricing.
The questions we get asked most
If the model isn’t the problem, does the model still matter?
It matters, but it has become a powerful commodity: Claude, GPT, and Gemini do things that were science fiction two years ago. The difference between a demo and a system in production isn’t the model, it’s the environment around it: connected data, tools with contracts, evals, and an expert’s signature. Own the harness, and you can swap the model whenever you want.
Where do you start to put an agent into production?
From a single deliverable that today takes weeks of manual work, needs careful validation, and is worth a lot to the business. You rebuild it the AI-native way and do it in days, with evals that keep quality predictable over time. You don’t start from technology, you start from the process.
Do people need to be replaced?
No. The domain expert is the most valuable person: it’s their signature that makes quality reproducible. AI removes the mechanical part, not the value. The fears “I’ll lose my job” or “I’ll work more” come from AI used badly; used well, it shifts the threshold of what the team can do.
Is this a Yempik or an Intarget project?
It’s a point of view by Simone Bova. The cases come from his work as an AI Engineer at Intarget; Simone is also a co-founder of Yempik, which builds custom AI agents and automations for companies, from prototype to production. It isn’t a Yempik engagement: it’s the method, told by someone who practices it in the field.
How much does it cost and how long does it take?
It depends on the deliverable, but the logic is “done in days, not weeks”. At Yempik we work with a fixed price and stated timelines, and the source code stays yours. The first step is a call to figure out whether the process is the right candidate.
I wrote this article myself. The method, the cases, and the opinions come from my work as an AI Engineer at Intarget and from Yempik, which I co-founded. It isn’t a Yempik engagement: it’s a point of view. For the writing I got help from Claude on editing, clarity, and layout; the substance is mine, the tool is declared.
Sources
- [1] MIT Project NANDA, “The GenAI Divide: State of AI in Business 2025”. nanda.media.mit.edu
- [2] Cemri et al., “Why Do Multi-Agent LLM Systems Fail?” (MAST), NeurIPS 2025. arxiv.org
- [3] Andreessen Horowitz (a16z), “Your Data Agents Need Context”. a16z.com
- [4] Anthropic, “Code execution with MCP: building more efficient AI agents”. www.anthropic.com
- [5] McKinsey QuantumBlack, “The State of AI in 2025”. www.mckinsey.com
- [6] Anthropic Economic Index (March 2026), on real-world agent use by sector, read by Garry Tan (Y Combinator). www.anthropic.com