Stream of Consciousness

Mark Eschbach's random writings on various topics.

Vibe Coding: Architecture, Automation, and the 50% Context Wall

Categories: ai

Tags: ai workflow reflection

I’ve spent the last two weeks attempting to live in the “post-syntax” future. The goal was an experiment in pure intent: I wanted to be the architect, the strategist, and the “vibe” setter for my project, Marvin, while leaving the literal implementation to AI agents. I called it “Vibe Coding”—a workflow where you provide the vision and let the machine handle the boilerplate.

But after two weeks of rotating between Gemini CLI and OpenCode, and pushing “Big Pickle” models (like glm-4.7) to their limits, I’ve realized that vibe coding is a misnomer. It isn’t a relaxed, hands-off experience. It is a high-stakes game of architectural cat-and-mouse. If you don’t have automated gates and a “Git Time Machine” at the ready, the “vibe” quickly turns into a monolithic nightmare.

The Tooling: OpenCode’s “Big Pickle” and Gemini CLI

In this experiment, I’ve been staying on the “free side” of the fence. No $200-a-month custom enterprise stacks. Just Gemini CLI and OpenCode for the heavy lifting of writing code. I’ve preferred OpenCode due to the Plan-then-Execute model it encourages.

I use the Plan mode to kick out high-level ideas and begin to decompose them. Once I’m happy with the high-level plan, which is generally composed of multiple phases, I’ll write it to a file. In a new context each time I’ll go through a loop of the following prompts:

  • Plan: Review the plan file and verify what we've completed and what is remaining. The intent is for the AI to understand the broader changes being made and what has already been done. This reduces thrashing over work we’ve already finished.
  • Plan: Build a plan for Phase X. I want to understand the mechanical transformations to be made here and hopefully shrink the context the system has to work against. I will iterate on this until the plan seems reasonable.
  • Execute: Let's implement the plan is what I run when I’m happy with a specific plan.

This works pretty well for staying within the free quotas of tokens. However, occasionally the agents take a wrong turn and you’ll have to interrupt them.

40% Context Cliff

Modern models advertise massive context windows—hundreds of thousands, even millions of tokens. But in a coding environment, “usable” context is a different beast entirely. Around the 40% mark of the context window, most models suffer a cognitive collapse.

It isn’t a sudden crash; it’s slow decay. The model starts favoring “concrete” code over abstractions. It stops looking at the internal/utils package it helped you write an hour ago and instead begins re-implementing logic inline. It stops following the SOLID principles you established in the first 10,000 tokens. By the time you hit 60% context, the AI is no longer a collaborator; it’s a junior dev in a panic, trying to “make it work” by any means necessary.

The Failure of the Source of Truth

I quickly learned that Instruction Fatigue is real. In my attempt two months ago to build a local text-to-speech (TTS) pipeline with diarization, the system failed spectacularly. Gemini started fast but flipped over the handlebars as soon as we got away from established Swift macOS patterns. Testing turned into a nightmare, regressions piled up, and Gemini began thrashing. I shelved the project as a case study to learn more about OpenCode and how LLMs work. My “vibes” became negative, but I hope to return some day with more knowledge.

To combat this decay, I established a Source of Truth hierarchy: AGENTS.md for behavioral rules and docs/future for the architectural target state. At first these worked well. However, as I gave the agents more autonomy and the project grew in complexity, more symptoms appeared. It felt like playing whack-a-mole; eventually the agents started forgetting rules entirely.

Automated Gates: The Guardrails

Much like with human team members, it doesn’t make sense to argue about guard rails and standards more than once. Putting on my platform engineering hat, what better way to handle quality gates and guard rails than by automating them? I shifted my focus from “prompting” to “engineering via automated gates.”

I pulled in pre-commit with golangci-lint to ensure the AI behaved.

pre-commit configuration

For each commit, on a high level, we want to ensure the following:

  • Our code is of reasonable quality
  • All tests pass
  • Our build artifacts actually build
# Pre-commit hooks for Marvin project
# Ensures code quality standards with fast developer feedback

repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        pass_filenames: false
        description: Spell check Markdown and HCL files
  - repo: local
    hooks:
      # Phase 1: Fast Local Checks
      - id: go-fmt
        name: go fmt
        entry: gofmt
        language: system
        args: ["-w", "."]
        files: '\.go$'
        description: Format Go code according to standard conventions

      - id: go-mod-tidy
        name: go mod tidy
        entry: go
        language: system
        args: ["mod", "tidy"]
        files: '(go\.mod|go\.sum)$'
        pass_filenames: false
        description: Clean up go.mod and go.sum dependencies

      - id: go-vet
        name: go vet
        entry: go
        language: system
        args: ["vet", "./..."]
        files: '\.go$'
        pass_filenames: false
        description: Analyze Go code for potential issues

      - id: golangci-lint
        name: golangci-lint
        entry: golangci-lint
        language: system
        args: ["run", "--timeout=5m"]
        files: '\.go$'
        pass_filenames: false
        description: Comprehensive Go linting with default settings

      # Phase 2: Unit Tests (Fast - matches CI unit tests)
      - id: go-test-unit
        name: go test unit
        entry: go
        language: system
        args: ["test", "-count", "1", "-timeout", "10s", "./internal/...", "./pkg/..."]
        files: '\.go$'
        pass_filenames: false
        description: Run unit tests to verify functionality

      # Phase 3: Build Verification
      - id: go-build-marvin
        name: go build marvin
        entry: go
        language: system
        args: ["build", "-o", "marvin", "./cmd/marvin"]
        files: '\.go$'
        pass_filenames: false
        description: Verify marvin binary builds successfully

      - id: go-build-slacker
        name: go build slacker
        entry: go
        language: system
        args: ["build", "-o", "slacker", "./cmd/slacker"]
        files: '\.go$'
        pass_filenames: false
        description: Verify slacker binary builds successfully
# Global configuration
default_stages: [pre-commit]
fail_fast: false
default_language_version:
  golang: '1.25'

Unit and component tests are a key part of our quality control system. Full system tests still need to be run by human hand, but this is good enough.

This was a good first step, forcing the AI to ensure things compile and tests pass. Unfortunately, the agents still found interesting ways to optimize their time to goal achievement, which brings us to the next level of gates.

Next Level: golangci-lint

Out of the gate, golangci-lint provides a lot of reasonable defaults. However, to keep the agents heading in the right direction and not lighting tokens on fire, I would recommend the following settings.

The Complexity Cage: gocyclo (Target: 8)

Cyclomatic complexity is the measure of how many paths exist through your code. Humans usually tolerate a complexity of 10 or 15. I set mine to 8. Why? Because an AI left to its own devices loves a good 20-line nested if-else block. By setting the threshold at 8, the linter rejects the code the moment the AI gets “lazy.” It forces the model to decompose logic into smaller, discrete functions. It pushes towards the Single Responsibility Principle, makes code more literate, and helps readers build understanding quickly.

linters:
  settings:
    gocyclo:
      min-complexity: 8
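To make the effect of the gate concrete, here’s a minimal sketch of the shape gocyclo pushes the agent toward. The Order type and validators are hypothetical illustrations, not code from Marvin: each helper owns one concern, so the top-level function never accumulates the nested if-else pyramid that would trip a complexity-8 threshold.

```go
package main

import (
	"errors"
	"fmt"
)

// Order is a hypothetical type used only to illustrate decomposition.
type Order struct {
	ID    string
	Qty   int
	Email string
}

// validateOrder stays well under a complexity of 8 by delegating each
// concern to a small helper instead of nesting if-else blocks.
func validateOrder(o Order) error {
	if err := validateID(o.ID); err != nil {
		return err
	}
	if err := validateQty(o.Qty); err != nil {
		return err
	}
	return validateEmail(o.Email)
}

func validateID(id string) error {
	if id == "" {
		return errors.New("order ID is required")
	}
	return nil
}

func validateQty(qty int) error {
	if qty <= 0 {
		return errors.New("quantity must be positive")
	}
	return nil
}

func validateEmail(email string) error {
	if email == "" {
		return errors.New("email is required")
	}
	return nil
}

func main() {
	fmt.Println(validateOrder(Order{ID: "a1", Qty: 2, Email: "x@example.com"}))
}
```

Each helper here has a complexity of 2, and the coordinator sits at 3, so the linter stays quiet even as new validation rules are added.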

Preventing “Concrete” Bloat: funlen

Function length is the primary indicator of agent decay. As the model tires, functions grow. I set a strict limit on function length (roughly 150 lines and 100 statements). If the AI tries to “vibe” a monolithic 500-line function because it forgot how to use your existing interfaces, the gate rejects the change.

Cyclomatic complexity is about state space, whereas function length is about imperative steps. The limit forces the agent to be more selective about what it colocates. In other languages such as Ruby, a statement count of 10 is more reasonable. In Go it’s not uncommon to spend 4+ statements on each invocation managing errors, so 100 is a conservative ceiling here. I will have to experiment with 40 lines in the future.

linters:
  settings:
    funlen:
      lines: 150
      statements: 100

Protecting the Context Chain: forbidigo

In the Go ecosystem, context is everything. It carries deadlines, cancellation signals, and trace IDs. AI agents hate managing the context chain. They frequently try to “cheat” by using context.Background() or context.TODO() just to make the compiler happy.

I used forbidigo to explicitly ban these calls. If the AI wants a context, it must propagate it from the caller. Furthermore, I’ve leaned into the Go 1.24 release, specifically t.Context() in tests. This is a massive improvement for AI-driven development. Instead of the AI struggling to manage test timeouts and cleanup, t.Context() provides a standard, lifecycle-managed context that is “cleaner” for the agent to hook into. It removes one more place where the AI can introduce a “hallucinated” fix.

linters:
  settings:
    forbidigo:
      forbid:
        - pattern: "context.Background"
          msg: "Pass a context in or use the passed context"
        - pattern: "context.TODO"
          msg: "Use a specific context or pass a context in"

The Workflow: Git as a Token Insurance Policy

I found it best in this workflow to treat Git as your “Save State” button. Although Gemini and OpenCode both have rewind features allowing us to roll back the code and conversation, I’ve found they’re hit-or-miss. A lot of the time it’s just easier to pop the stack and start over again.

In manual coding, we commit when a feature is “done.” In vibe coding, I commit every time a Phase or Checkpoint is reached. Because the AI is prone to sudden architectural drift once it hits that 50% context mark, you need a way to revert without wasting tokens.

If the AI takes a wrong turn and produces a “Big Ball of Mud,” do not try to “fix” it by talking to the AI. You will waste thousands of tokens and likely end up with more technical debt. Instead:

  1. Reset: Run git reset --hard to the last known good commit.
  2. Refresh the Context: If using a chat-based tool, start a new session to clear the model’s “mental fog.”
  3. Refine the Plan: Update your ai/plan.md with more specific constraints based on why the last attempt failed.

This “Reset” loop is the only way to maintain a clean codebase when the “vibe” starts to sour.

The Future: The Rise of the Critic Agent

Currently, the biggest bottleneck in this process is having to chase after bad code generation. Although the agents are given tools like golangci-lint and pre-commit, they run them with varying levels of success. Gemini rarely runs them at all. OpenCode will generate somewhat conformant code and then spend many tokens fixing its errors.

“Agents of agents” is a light at the end of the tunnel! I envision a system where I can delegate this pipeline to be run automatically. I would imagine it could be achieved with something like:

  • Orchestrator: Responsible for taking a high-level design and breaking it into features.
  • Feature Leader: Takes a feature and breaks it into chunks which should be completed in about 5 minutes of automated coding.
    • Worker: Implements the work according to the project standards.
    • Reviewer: Hard checks against the worker to verify (1) the work was completed and (2) it was completed to the project standard.

Once each subagent succeeds, the spawning agent would review its work. Until the future arrives, I’ll keep doing the grind!

Final Thoughts

If vibe coding is to survive, it cannot mean a lack of rigor. In fact, it requires more architectural discipline than coding by hand. You have to be a better architect because you are managing an AI agent developer who is fast, incredibly forgetful, and prone to hand-waving away the complicated parts of the system.

By using automated gates like gocyclo and forbidigo, and by leveraging the latest Go features like t.Context(), you can turn the “vibe” into something tangible and maintainable. But remember: once you hit that 50% context wall, it’s time to commit, reset, and refresh.