llm · claude-code · ai · productivity · pragmatism

Conversation as an Interface

3/31/2026

5 minute read

The Setup That Never Ships

You know the feeling. You discover a new LLM workflow tool. It has skills, plugins, custom agents, prompt templates, slash commands, MCP servers, and a YAML config file with 47 optional fields. You spend an afternoon setting it up. You write a skill that formats your commit messages. You write another skill that summarizes PRs. You write a third skill that generates release notes from the first two.

By the end of the day, you have not shipped anything. But you feel productive. You have configured productivity. That is not the same thing.

This is productivity theater. The elaborate setup. The nested abstractions. The tooling that exists to manage other tooling. It looks like work. It feels like progress. But the code you needed to write is still unwritten, and now you have a new maintenance burden that has nothing to do with your actual project.

I have done this. More than once. The setup is seductive because it promises future efficiency. "Once this is configured, everything will be faster." But the future never arrives. The skill you wrote needs updating because the LLM drifted. The plugin conflicts with another plugin. The YAML schema changed in a minor version bump. You are now debugging your productivity system instead of using it.

The Pragmatic Programmer Had a Tip for This

Tip 27 in The Pragmatic Programmer is blunt: Don't Outrun Your Headlights. It is about not building more than you can see clearly ahead. Do not design abstractions for problems you do not have yet. Do not optimize workflows before you know what the workflow is.

Most LLM tooling violates this principle immediately. You are asked to configure skills, agents, hooks, and automations before you have even used the base tool enough to know what you need. You are building infrastructure for hypothetical productivity instead of solving the problem in front of you.

The Math Works Against You

Here is the part nobody wants to talk about: LLMs get worse as you give them more to read.

Context windows are not infinite memory. They are attention mechanisms with decay. The further a piece of information is from the model's current position, the weaker its influence on the output. This is not a bug. It is how transformers work. The architecture itself encodes a preference for recency.
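The recency preference can be illustrated with a toy calculation. The sketch below is illustrative only: it uses an ALiBi-style linear distance penalty, one mechanism some transformers use to bias attention toward nearby tokens (real models combine positional encodings, learned attention, and training effects). Four tokens with identical raw scores receive sharply different weights once distance is factored in:

```python
import math

def attention_weights(scores, distances, penalty=0.1):
    """Toy attention: subtract a linear distance penalty (ALiBi-style)
    from each raw score before the softmax, so that tokens farther from
    the current position receive less weight."""
    biased = [s - penalty * d for s, d in zip(scores, distances)]
    m = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Equal raw scores, but increasing distance from the current position:
weights = attention_weights([1.0, 1.0, 1.0, 1.0], [0, 10, 50, 100])
# The nearest token dominates; the farthest contributes almost nothing.
```

Nothing about the content of the distant tokens changed; only their position did. That is the decay the rest of this section is about.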

When you load a conversation with skills, plugin documentation, custom instructions, MCP tool schemas, and elaborate system prompts, you are front-loading the context with information the model will progressively ignore. Your carefully crafted skill that formats commit messages according to your team's conventions? By the time the model gets to the part where it actually writes the commit message, that skill is thousands of tokens away. The model has drifted.

Instruction drift is measurable. Research consistently shows that LLMs follow early instructions less reliably as context grows.

[Figure: multi-document QA accuracy by document position (20 documents); axes: document position (1-20) vs. accuracy (roughly 50-80%). Data from Liu et al., 2023.]

The more you stuff into the beginning, the less of it survives to the end. Your custom skills are not being ignored because the model is bad. They are being ignored because you buried them under so much context that attention decayed before it reached the task.

This is the paradox: the more you configure, the less effective your configuration becomes. Skills designed to make the model more consistent introduce the very context bloat that makes the model less consistent.

The Hallucination Amplifier

Context bloat does something else: it increases hallucination rates.

This is not intuitive. You would think more context means more grounding, more facts for the model to anchor on. But LLMs do not retrieve information from context the way a database retrieves rows. They interpolate. They pattern-match. They predict the next likely token given everything they have seen.

When you give an LLM a clean, focused prompt with minimal context, there are fewer patterns to interpolate between. The model's output is constrained by the narrow space you defined. When you give it a sprawling context with skills, tools, instructions, and documentation, the model has more raw material for interpolation — and more opportunities to combine that material in ways that sound plausible but are wrong.

Your skill library is a hallucination surface. Every skill you add is another pattern the model might incorrectly blend into its response. The commit message skill might leak phrasing into a code comment. The PR summary skill might influence how the model describes a function. These are not hypotheticals. I have watched it happen.

What the Tool Was Designed For

Here is the thing that gets lost in the tooling arms race: LLMs were designed for conversation.

Not for executing skill trees. Not for navigating plugin hierarchies. Not for following 47 YAML configuration fields. Conversation. You say something. The model responds. You clarify. It adjusts. You guide. It follows. This is the interface the architecture was optimized for.

The conversational loop has a property that skill-based systems do not: immediate feedback. When you tell the model to do something and it misunderstands, you see it immediately. You correct it in the next message. The model adapts. The correction is fresh in context, not buried under thousands of tokens of preamble.

Skills remove this feedback loop. A skill fires, produces output, and you get what you get. If the output is wrong, you do not correct the skill — you either edit the result manually or you go modify the skill definition and try again. The latency between mistake and correction is orders of magnitude higher.

Conversation keeps the model honest by keeping the feedback loop tight.
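The structural reason the correction stays effective is simple: in a chat API, each turn is appended to the end of the message list, which is exactly the position generation attends to most strongly. A minimal sketch (the message shape follows common chat APIs; the contents are invented for illustration):

```python
def build_context(system_prompt, turns):
    """Assemble the message list sent to a chat model: the system prompt
    sits at the front (farthest from the output), while each new turn —
    including corrections — is appended at the end."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(turns)
    return messages

turns = [
    {"role": "user", "content": "Write a commit message for this diff."},
    {"role": "assistant", "content": "Added stuff."},
    {"role": "user", "content": "Too vague. Use imperative mood and name the bug you fixed."},
]
context = build_context("You are a careful coding assistant.", turns)
# The correction is the final message: closest to where generation starts,
# while a skill definition would sit at the front, thousands of tokens away.
```

Compare that with a skill baked into the system prompt: the correction would live at the position where attention is weakest, not strongest.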

The Skills You Do Not Need

Here is a thought experiment. Look at the skills and plugins you have installed. For each one, ask: how often do I actually use this? And when I use it, does it work correctly without adjustment?

Most people discover that 80% of their tooling is dormant. It was configured once, used twice, and now sits in the context contributing to decay while providing no value. The remaining 20% works sometimes, but requires enough manual correction that you might as well have just asked the model directly.

The skills you need are the ones that provide capabilities the model does not have — web searches, file operations, external API calls. MCP servers for tools make sense because they expose actions the model cannot take on its own. But skills that are just prompt templates? Those are context bloat with extra steps.

Constrain the Attack Surface

So what do you actually do? You use engineering.

The mistake is thinking the LLM is the system. It is not. The LLM is a component in your system. Your job is to design the system so that when the component fails — and it will fail — the failure is contained.

Write tests. This is the single most effective constraint on LLM output. The model cannot hallucinate past a failing test. If your test suite expects a function to return an integer and the LLM writes code that returns a string, the test fails. The hallucination is caught before it merges. You did not need a skill for this. You needed a test.
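The "test as constraint" idea can be made concrete. Suppose you asked the model to implement a small parser (the function name and contract here are hypothetical, purely for illustration). A few plain assertions encode the contract; a hallucinated version that returns a string, or an out-of-range value, fails before it ever merges:

```python
def parse_port(value):
    """Implementation proposed by the model (hypothetical example):
    parse a port number from a possibly padded string."""
    return int(value.strip())

def test_parse_port():
    result = parse_port(" 8080 ")
    assert isinstance(result, int)   # a string-returning version fails here
    assert 0 < result < 65536        # an out-of-range value fails here

test_parse_port()
```

The test is shorter than any skill definition, never decays with context length, and fails loudly instead of silently drifting.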

Review the code. Human review is not a fallback for when tooling fails. It is the primary mechanism. The LLM proposes. You review. You accept, reject, or modify. This loop is exactly what conversation was designed for. The review happens in the conversation itself, not in some external quality gate that fires after the fact.

Write the architecture yourself. I have written about this before with Mermaid flowcharts. The LLM should not be deciding your system's structure. You decide the structure. You write the diagram. You specify the contracts. The LLM implements the details within the boundaries you drew. This is how you prevent architectural drift — not by configuring a skill, but by defining the constraints yourself.
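As a sketch of what "write the diagram yourself" can look like in practice, here is a minimal Mermaid flowchart of the kind you might hand the model (the component names are invented for illustration; yours would name your actual modules and contracts):

```mermaid
flowchart LR
    CLI[CLI entrypoint] --> Parser[Config parser]
    Parser --> Core[Core pipeline]
    Core --> Store[(Local store)]
    Core --> Report[Report writer]
```

The model fills in the boxes; it does not get to redraw the arrows.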

Keep prompts focused. A single clear task in a single message gets better results than a skill that tries to anticipate every variation. If you need the model to format a commit message, tell it how in that conversation. If you need it to summarize a PR, describe what you want in that conversation. The context is fresh. The attention is on the task. The instruction has not decayed.

The Pragmatic Path

The path forward is simpler than the productivity theater suggests:

  1. Use the LLM through conversation. That is what it is for.
  2. Write tests so hallucinations cannot survive to production.
  3. Define architecture yourself so the model operates within boundaries.
  4. Add tools only when the model literally cannot do something without them.

The best interface for an LLM is the one it was designed for. Conversation. Everything else is overhead.