Beyond Vibe Coding: OpenSpec and the 16-Million-Token-Hour

Categories: ai

Tags: ai workflow openspec engineering-standards golang

A few weeks ago, I explored the concept of Vibe Coding—a “hands-off-the-wheels” workflow where I acted as the architect and let AI agents handle the implementation. It was an exhilarating experiment in pure intent, but it quickly hit a wall. As I pushed deeper into OpenCode’s big-pickle model and reached the inevitable 40% Context Cliff, the “vibe” turned sour. The AI started re-implementing existing logic and “hallucinating” fixes that were actually just suppressed errors.

To solve this, I’ve moved from vibe coding to a more rigorous model: Agent-Assisted Software Engineering. This isn’t just a catchy phrase; it’s an emerging paradigm supported by researchers at Princeton (SWE-agent) and the National University of Singapore, who define the agent as a “team member” capable of handling repository-level tasks through a structured Agent-Computer Interface.

The Problem: Intent Decay at 40k Tokens

The primary challenge with pure vibe coding is Intent Decay. In a transient chat window, the “source of truth” is fragile. Even with modern models, I’ve found that the architectural goals start to blur once the conversation crosses the 40,000-token mark. At this point, the model suffers a cognitive collapse—it stops looking at the internal utilities it helped you write an hour ago and begins re-implementing logic inline.

Automated quality gates like golangci-lint can keep the code structurally sound by enforcing complexity limits (gocyclo < 8), but they are blind to the purpose of the feature. A perfectly linted function can still be fundamentally wrong.

OpenSpec: Behavior Driven Development for Agents

OpenSpec bridges the gap between human intent and AI execution by moving the planning layer directly into the repository. It turns a “suggestion” into a “verifiable contract” using a workflow that bears a striking resemblance to Behavior Driven Development.

If you are familiar with the work of Dan North or the Cucumber ecosystem, OpenSpec will feel like home. It takes the core tenets of Behavior Driven Development—collaborative discovery and living documentation—and adapts them for an agent-centric world.

Specification by Example

At the heart of this transition is the principle of Specification by Example, popularized by Gojko Adzic. Specification by Example is a collaborative approach to defining requirements through concrete, real-world scenarios. In the world of LLMs, these examples serve as “grounding” points. Just as Specification by Example uses examples to align stakeholders and developers, it can now align the human architect and the AI agent. By providing concrete “GIVEN/WHEN/THEN” scenarios, we remove the ambiguity where hallucinations thrive.

The OpenSpec workflow follows this disciplined, product-focused cycle:

Explore: Analyze the current repository state and identify the functional gap.
Propose: The agent generates a proposal.md (the “What”), a design.md (the “How”), and a tasks.md breakdown.
Review: You review the Spec Delta. This is the “Three Amigos” meeting of the AI era, where you verify the scenarios before a single line of code is written.
Apply: The agent implements the code based on the approved tasks.
Verify: The implementation is validated against the scenarios.
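The cycle above can be pictured as a small state machine whose only hard gate is the human review. The sketch below is a hypothetical model of that idea, not OpenSpec's implementation; the Stage and Change types, the Approved flag, and the Done stage are names assumed for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage mirrors the five steps of the cycle, plus a terminal Done state.
// The type is an illustrative assumption, not part of OpenSpec.
type Stage int

const (
	Explore Stage = iota
	Propose
	Review
	Apply
	Verify
	Done
)

func (s Stage) String() string {
	return [...]string{"Explore", "Propose", "Review", "Apply", "Verify", "Done"}[s]
}

// Change tracks one proposed change as it moves through the cycle.
type Change struct {
	Stage    Stage
	Approved bool // set by the human reviewer during Review
}

// ErrNotApproved is returned when a change tries to leave Review unapproved.
var ErrNotApproved = errors.New("spec delta not approved: cannot leave Review")

// Advance moves the change to the next stage. The only gate modelled here
// is the human approval required before Apply.
func (c *Change) Advance() error {
	if c.Stage == Done {
		return nil
	}
	if c.Stage == Review && !c.Approved {
		return ErrNotApproved
	}
	c.Stage++
	return nil
}

func main() {
	c := &Change{}
	for c.Stage != Done {
		if err := c.Advance(); err != nil {
			fmt.Println("blocked:", err)
			c.Approved = true // the human signs off on the scenarios
			continue
		}
		fmt.Println("now at:", c.Stage)
	}
}
```

The point of the model is the asymmetry: every transition is automatic except the one out of Review, which is where the human architect's intent is locked in.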

This structure is essential for building any complex product, but it is mandatory when working with AI. It ensures the agent is always working against a persistent, repository-resident source of truth that doesn’t evaporate when the chat context rolls over.

The Token Tax: 16 Million per Hour

Rigorous engineering isn’t free. Moving from “vibes” to “specs” requires a massive amount of communication between the agent and the model.

When you involve an agent in the full Explore-to-Verify cycle, the token consumption is staggering. I’ve seen these workflows drive 16 million tokens per hour while generating complex features. At this scale, you aren’t just “chatting” with an AI; you are operating a high-throughput industrial engine of code generation.

This “Token Tax” is the price of reliability. The tokens go to recursive repository-level searches, reading multiple files in parallel for cross-reference, and exhaustive plan-to-code verification. You are spending tokens to ensure the agent doesn’t take a wrong turn at the 40k context mark.

Conclusion: The Renewal of Behavior Driven Development

The most important lesson from the last two months is that using AI requires more discipline, not less.

The rise of Large Language Models isn’t replacing the need for rigor; it is renewing the relevance of Behavior Driven Development. That practice was always about bridging the communication gap between people with different levels of technical expertise. Today, that gap exists between the human architect and the AI agent.

By combining automated structural gates (linters) with verifiable functional gates (OpenSpec), we make the principles of rigorous engineering accessible to everyone. The “vibe” might be the spark, but the Spec is the fuel that keeps the engine running, even at 16 million tokens per hour.

Ready to stop vibing and start engineering? Check out OpenSpec and start building your own living documentation.

Stream of Consciousness

Mark Eschbach's random writings on various topics.