Vibe Coding: Architecture, Automation, and the 50% Context Wall
Categories: ai
Tags: ai workflow reflection
I’ve spent the last two weeks attempting to live in the “post-syntax” future. The goal was an experiment in pure intent: I wanted to be the architect, the strategist, and the “vibe” setter for my project, Marvin, while leaving the literal implementation to AI agents. I called it “Vibe Coding”—a workflow where you provide the vision and let the machine handle the boilerplate.
But after two weeks of rotating between Gemini CLI and OpenCode, and pushing “Big Pickle” models (like glm-4.7) to their limits, I’ve realized that vibe coding is a misnomer. It isn’t a relaxed, hands-off experience. It is a high-stakes game of architectural cat-and-mouse. If you don’t have automated gates and a “Git Time Machine” at the ready, the “vibe” quickly turns into a monolithic nightmare.
The Tooling: OpenCode’s “Big Pickle” and Gemini CLI
In this experiment, I’ve been staying on the “free side” of the fence. No $200-a-month custom enterprise stacks. Just Gemini and OpenCode for the heavy lifting of writing code. I’ve preferred OpenCode due to the Plan-then-Execute model it encourages.
I use Plan mode to kick out high-level ideas and begin decomposing them. Once I’m happy with the high-level plan, which is generally composed of multiple phases, I write it to a file. Then, in a fresh context each time, I loop through the following prompts:
- Plan: “Review the plan file and verify what we’ve completed and what is remaining.” The intent is for the AI to understand the broader changes being made and what has already been done. This reduces thrashing over work we’ve finished.
- Plan: “Build a plan for Phase X.” I want to understand the mechanical transformations to be made here and, hopefully, shrink the context the system has to work against. I iterate on this until the plan seems reasonable.
- Execute: “Let’s implement the plan” is what I run when I’m happy with a specific plan.
This works pretty well for staying within the free token quotas. However, the agents occasionally take a wrong turn and you’ll have to interrupt them.
The 40% Context Cliff
Modern models advertise massive context windows—hundreds of thousands, even millions of tokens. But in a coding environment, “usable” context is a different beast entirely. Around the 40% mark of the context window, most models suffer a cognitive collapse.
It isn’t a sudden crash; it’s slow decay. The model starts favoring “concrete” code over abstractions. It stops looking at the internal/utils package it helped you write an hour ago and instead begins re-implementing logic inline. It stops following the SOLID principles you established in the first 10,000 tokens. By the time you hit 60% context, the AI is no longer a collaborator; it’s a junior dev in a panic, trying to “make it work” by any means necessary.
The Failure of the Source of Truth
I quickly learned that Instruction Fatigue is real. Two months ago, in my attempt to build a local text-to-speech (TTS) pipeline with diarization, the system failed spectacularly. Gemini flipped over the handlebars the moment we got away from established Swift macOS patterns. Testing turned into a horrible nightmare and things stayed broken. Gemini began thrashing, and I shelved the project as a case study to learn more about OpenCode and how LLMs work. My “vibes” became negative, but I hope to return some day with more knowledge.
To combat this decay, I established a Source of Truth hierarchy: AGENTS.md for behavioral rules and docs/future for the architectural target state. At first these worked well. However, as I gave the agents more autonomy and the project grew in complexity, more symptoms appeared. It felt like playing whack-a-mole; then the system started forgetting rules entirely.
Automated Gates: The Guardrails
Much like with human team members, it doesn’t make sense to argue about guard rails and standards more than once. Putting on my platform engineering hat, what better way to handle quality gates and guard rails than by automating them? I shifted my focus from “prompting” to “engineering via automated gates.”
I pulled in pre-commit with golangci-lint to ensure the AI behaved.
pre-commit configuration
For each commit, on a high level, we want to ensure the following:
- Our code is of reasonable quality
- All tests pass
- Our build artifacts actually build
```yaml
# Pre-commit hooks for Marvin project
# Ensures code quality standards with fast developer feedback
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        pass_filenames: false
        description: Spell check Markdown and HCL files
  - repo: local
    hooks:
      # Phase 1: Fast Local Checks
      - id: go-fmt
        name: go fmt
        entry: gofmt
        language: system
        args: ["-w", "."]
        files: '\.go$'
        description: Format Go code according to standard conventions
      - id: go-mod-tidy
        name: go mod tidy
        entry: go
        language: system
        args: ["mod", "tidy"]
        files: '(go\.mod|go\.sum)$'
        pass_filenames: false
        description: Clean up go.mod and go.sum dependencies
      - id: go-vet
        name: go vet
        entry: go
        language: system
        args: ["vet", "./..."]
        files: '\.go$'
        pass_filenames: false
        description: Analyze Go code for potential issues
      - id: golangci-lint
        name: golangci-lint
        entry: golangci-lint
        language: system
        args: ["run", "--timeout=5m"]
        files: '\.go$'
        pass_filenames: false
        description: Comprehensive Go linting with default settings
      # Phase 2: Unit Tests (Fast - matches CI unit tests)
      - id: go-test-unit
        name: go test unit
        entry: go
        language: system
        args: ["test", "-count", "1", "-timeout", "10s", "./internal/...", "./pkg/..."]
        files: '\.go$'
        pass_filenames: false
        description: Run unit tests to verify functionality
      # Phase 3: Build Verification
      - id: go-build-marvin
        name: go build marvin
        entry: go
        language: system
        args: ["build", "-o", "marvin", "./cmd/marvin"]
        files: '\.go$'
        pass_filenames: false
        description: Verify marvin binary builds successfully
      - id: go-build-slacker
        name: go build slacker
        entry: go
        language: system
        args: ["build", "-o", "slacker", "./cmd/slacker"]
        files: '\.go$'
        pass_filenames: false
        description: Verify slacker binary builds successfully

# Global configuration
default_stages: [pre-commit]
fail_fast: false
default_language_version:
  golang: '1.25'
```
Unit and system component tests are a key part of this quality control system. Full systemic tests still need to be run by human hand, but this is good enough.
This was a good first step, forcing the AI to ensure things compile and tests pass. Unfortunately, the agents found interesting ways to optimize their time to goal achievement, which called for the next level of gates.
Next Level: golangci-lint
Out of the gate, golangci-lint provides a lot of reasonable defaults. However, to keep the agents going in the right direction and not lighting tokens on fire, I would recommend the following settings.
The Complexity Cage: gocyclo (Target: 8)
Cyclomatic complexity is the measure of how many paths exist through your code. Humans usually tolerate a complexity of 10 or 15. I set mine to 8. Why? Because an AI left to its own devices loves a good 20-line nested if-else block. By setting the threshold at 8, the linter rejects the code the moment the AI gets “lazy.” It forces the model to decompose logic into smaller, discrete functions. It pushes towards the Single Responsibility Principle, makes the code more readable, and lets you build understanding quickly.
```yaml
linters:
  settings:
    gocyclo:
      min-complexity: 8
```
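As a sketch of the kind of decomposition the gate forces (the function names here are hypothetical, not from Marvin): instead of one function with nested if-else branches, each decision lives in its own small helper, and the top-level function just composes them.

```go
package main

import "fmt"

// paymentLabel isolates a single decision, keeping each
// function's cyclomatic complexity far below the threshold of 8.
func paymentLabel(paid bool) string {
	if paid {
		return "paid"
	}
	return "unpaid"
}

// itemLabel handles the second decision independently.
func itemLabel(items int) string {
	if items > 1 {
		return "multiple items"
	}
	return "single item"
}

// describeOrder composes the small helpers instead of nesting
// all the branching inline, which is what gocyclo pushes for.
func describeOrder(status string, paid bool, items int) string {
	return fmt.Sprintf("%s, %s, %s", status, paymentLabel(paid), itemLabel(items))
}

func main() {
	fmt.Println(describeOrder("shipped", true, 3))
}
```

The AI’s instinct would be to inline all three decisions into one nested block; the gate rejects that shape outright.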
Preventing “Concrete” Bloat: funlen
Function length is the primary indicator of agent decay. As the model tires, functions grow. I set a strict limit on function length (roughly 150 lines with 100 statements). If the AI tries to “vibe” a monolithic 500-line function because it forgot how to use your existing interfaces, the gate rejects the change.
Cyclomatic complexity is about state space, while function length is about imperative steps; this forces the agent to be more selective about what it writes. In other languages such as Ruby, a statement count of 10 is more reasonable. In Go it’s not uncommon to have 4+ statements for each invocation managing errors, so 100 is a conservative ceiling here. I will have to experiment with 40 lines in the future.
```yaml
linters:
  settings:
    funlen:
      lines: 150
      statements: 100
```
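To show why Go needs that headroom, here is a hypothetical helper (not from the Marvin codebase) where every fallible call costs several statements of error handling on its own:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePair illustrates Go's statement overhead: two parses
// cost roughly eight statements once error handling is counted,
// which is why a Ruby-style budget of 10 would be far too tight.
func parsePair(a, b string) (int, int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, 0, fmt.Errorf("first value: %w", err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, 0, fmt.Errorf("second value: %w", err)
	}
	return x, y, nil
}

func main() {
	x, y, err := parsePair("2", "40")
	if err != nil {
		panic(err)
	}
	fmt.Println(x + y)
}
```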
Protecting the Context Chain: forbidigo
In the Go ecosystem, context is everything. It carries deadlines, cancellation signals, and trace IDs. AI agents hate managing the context chain. They frequently try to “cheat” by using context.Background() or context.TODO() just to make the compiler happy.
I used forbidigo to explicitly ban these calls. If the AI wants a context, it must propagate it from the caller.
Furthermore, I’ve leaned into the Go 1.24 release, specifically t.Context() in tests. This is a massive improvement
for AI-driven development. Instead of the AI struggling to manage test timeouts and cleanup, t.Context() provides a
standard, lifecycle-managed context that is “cleaner” for the agent to hook into. It removes one more place where the
AI can introduce a “hallucinated” fix.
```yaml
linters:
  settings:
    forbidigo:
      forbid:
        - pattern: "context.Background"
          msg: "Pass a context in or use the passed context"
        - pattern: "context.TODO"
          msg: "Use a specific context or pass a context in"
```
The Workflow: Git as a Token Insurance Policy
In this workflow, I found it best to treat Git as your “Save State” button. Although Gemini and OpenCode both have rewind features that roll back the code and conversation, I’ve found them hit or miss. A lot of the time it’s just easier to pop the stack and start over again.
In manual coding, we commit when a feature is “done.” In vibe coding, I commit every time a Phase or Checkpoint is reached. Because the AI is prone to sudden architectural drift once it hits that 50% context mark, you need a way to revert without wasting tokens.
If the AI takes a wrong turn and produces a “Big Ball of Mud,” do not try to “fix” it by talking to the AI. You will waste thousands of tokens and likely end up with more technical debt. Instead:
- Reset: git reset --hard to the last known good commit.
- Refresh the Context: If using a chat-based tool, start a new session to clear the model’s “mental fog.”
- Refine the Plan: Update your ai/plan.md with more specific constraints based on why the last attempt failed.
This “Reset” loop is the only way to maintain a clean codebase when the “vibe” starts to sour.
The Future: The Rise of the Critic Agent
Currently, the biggest bottleneck in this process is chasing after bad code generation. Although the agents are given tools like golangci-lint and pre-commit, they run them with varying levels of success. Gemini rarely runs them at all. OpenCode will generate somewhat conformant code and then spend many tokens fixing its errors.
“Agents of agents” is a light at the end of the tunnel! I envision a system where I can delegate this pipeline to be run automatically. I would imagine it could be achieved with something like:
- Orchestrator: Responsible for taking a high-level design and breaking it into features.
- Feature Leader: Takes a feature and breaks it into chunks which should be completed in about 5 minutes of automated coding.
- Worker: Implements the work according to the project standards.
- Reviewer: Hard checks against the worker to verify (1) the work was completed and (2) it was completed to the project standard.
Once each subagent succeeds, the spawning agent would review its work. Until the future arrives, I’ll keep doing the grind!
Final Thoughts
Vibe coding, if it is to survive, cannot mean a lack of rigor. In fact, it requires more architectural discipline than coding by hand. You have to be a better architect because you are managing an AI agent developer who is fast, incredibly forgetful, and prone to hand-waving away the complicated parts of the system.
By using automated gates like gocyclo and forbidigo, and by leveraging the latest Go features like t.Context(),
you can turn the “vibe” into something tangible and maintainable. But remember: once you hit that 50% context wall,
it’s time to commit, reset, and refresh.