
60 Cycles: What I Learned Shipping a Portfolio on an Adversarial Ship-Loop

[Figure: determinism diagram used as a stand-in for the ship-loop cycle]

Written at cycle 66. The loop has since continued past cycle 80 — the same adversarial-audit + fix pattern has kept finding real regressions (a Lighthouse-driven 28.9 MB → 3.3 MB GIF conversion, a broken Ctrl+K race the loop introduced and then caught, a dozen aria-label + content-drift fixes from a workflow-shaped audit). See /now for current status. Everything below is a snapshot as of that day.

A few days ago I gave a Claude Code agent one instruction: fix everything; do not stop until I say so; run multiple simulations and fix them. Then I let it run.

The loop ran for four days across ~65 cycles at the time of writing. Each cycle followed the same shape:

  1. Hunt. Spawn an adversarial auditor with browser access. Attack the deployed site.
  2. Rank. Emit a JSON: top-3 fixes by leverage, with concrete file paths and reasoning.
  3. Ship. Apply the top fix, type-check, commit, push. Vercel's GitHub App auto-deploys.
  4. Verify. Wait for the deploy. Spawn a fresh auditor. Confirm the fix is live. Move on.
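
The Rank step's output can be sketched as a typed JSON contract. This is a minimal illustration, not the schema the loop actually used: the field names (rank, title, files, reasoning) and the pickTopFix helper are assumptions made up for the example.

```typescript
// Hypothetical shape of one entry in the Rank step's JSON (field names are assumptions).
interface RankedFix {
  rank: number;       // 1 = highest leverage
  title: string;
  files: string[];    // concrete file paths the fix would touch
  reasoning: string;  // why this fix has leverage
}

// Parse the auditor's raw JSON and return the fix the Ship step should apply.
// Rejects output that names no file paths, since the loop asks for concrete ones.
function pickTopFix(raw: string): RankedFix {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("rank output must be a non-empty JSON array");
  }
  const fixes = parsed as RankedFix[];
  for (const fix of fixes) {
    if (!Array.isArray(fix.files) || fix.files.length === 0) {
      throw new Error(`fix "${fix.title}" names no file paths`);
    }
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...fixes].sort((a, b) => a.rank - b.rank);
  const top = sorted[0];
  if (top === undefined) {
    throw new Error("no fixes to rank");
  }
  return top;
}
```

Treating the auditor's JSON as untrusted input keeps a malformed Rank turn from reaching the Ship step.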

No human in the loop except me saying continue. Here's what actually happened.

What shipped

I don't have a great count of features because "feature" is fuzzy. But here's the surface area that didn't exist five days ago:

  • 11 technical essays (~15,000 words) with real SVG diagrams for each, per-post OG images, TOC, reading progress, code-copy buttons, share buttons, and hero thumbnails on the listing.
  • 3 interactive playgrounds. HNSW greedy search with click-to-place-points and animated hops. CipherStack LRU rotation with live 429 cooldown. Dyx latency waterfall with sliders for STT/LLM/TTS/network stages.
  • ~28 tag pages at /blog/tag/<slug>, each with its own generated OG card, CollectionPage JSON-LD, and a related-topics cloud.
  • Global ⌘K palette indexing every page, section, post, project, and external surface. Focus-trapped, scroll-locked, aria-live result count. There's a search icon in the dock for the keyboard-averse.
  • Client-side fuzzy search on /blog (fuse.js, ~15KB).
  • /uses + /now + "Recent writing" on home + "Last shipped Nh ago" pill pulled from git log at build time.
  • RSS + Atom + JSON feeds wired properly, with an XSLT stylesheet so opening /feed.xml in a browser renders as a styled page instead of raw XML.
  • JSON-LD everywhere: Person, WebSite, Article, BreadcrumbList, CollectionPage, ItemList, plus a sitemap with <image:image> entries pointing at the generated OG cards.

That's the fun list. The more important list is what broke.

What broke and why

1. My deploy pipeline was dark for ~15 cycles

At some point around cycle 46, my Vercel CLI token expired. Every subsequent vercel --prod --yes exited 0 even though its output contained an error: The specified token is not valid.

I only tailed the last five lines of each deploy output and celebrated the exit code. I would have kept going forever.

Two things saved this:

  • Vercel's GitHub App integration was auto-deploying every git push to master in parallel. So even though my CLI calls were no-ops, the site kept updating.
  • Cycle 60's auditor tried to fetch a specific new URL (/blog/tag/rag/opengraph-image) that had just shipped, got a 404, and refused to write a passing verify. That surfaced the bigger question ("what's actually deployed?") and I found the token error minutes later.

Lesson. Loops that only observe their own logs will converge on comfortable delusions. You need an external verifier that isn't wired to the same pipeline. That's what the audit agents do at their best.
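
One cheap guard against the exit-code trap is to scan the whole deploy log rather than its tail. The sketch below is an assumption about how that could look: the DeployResult shape and the marker list are hypothetical, and only the first marker is the message quoted above.

```typescript
// Captured result of one deploy command (hypothetical shape).
interface DeployResult {
  exitCode: number;
  output: string; // full stdout + stderr, not just the last few lines
}

// Substrings treated as failure. The first is the message quoted above;
// "Error:" is an assumed generic marker, not a documented CLI format.
const FAILURE_MARKERS = ["The specified token is not valid", "Error:"];

// Return every reason to distrust the deploy; an empty array means it looks clean.
function deployProblems(result: DeployResult): string[] {
  const problems: string[] = [];
  if (result.exitCode !== 0) {
    problems.push(`exit code ${result.exitCode}`);
  }
  for (const line of result.output.split("\n")) {
    if (FAILURE_MARKERS.some((marker) => line.includes(marker))) {
      problems.push(line.trim());
    }
  }
  return problems;
}
```

This still reads the pipeline's own logs, so it complements an external verifier rather than replacing one.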

2. An edge-runtime route silently 404'd

Cycle 59 shipped per-tag OG cards. I set runtime = "edge" on the OG route because the docs example did. But my route calls getBlogPosts(), which reads MDX from the filesystem via Node's fs. Edge runtime has no fs. So every one of ~28 tag OG cards was a 404, and every tag's LinkedIn/Twitter share preview fell back to the generic Kaushik.png.

The rest of the site kept working, so the build didn't fail. The Vercel dashboard didn't complain. It took an auditor curl'ing the OG URL directly to notice.

Lesson. If you have a route that should return an image, add a check that fetches it and asserts an image content-type. Content-type mismatches are the class of bug that doesn't page you but does quietly wreck your share previews for months.
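
A minimal sketch of such a check, assuming a runtime with a global fetch (Node 18+ or similar). The function names are hypothetical; the decision logic is kept separate from the network call so it can be unit-tested.

```typescript
// Pure decision: given a status and Content-Type header, explain the failure or return null.
function imageCheckFailure(status: number, contentType: string | null): string | null {
  if (status !== 200) {
    return `expected 200, got ${status}`;
  }
  if (contentType === null || !contentType.toLowerCase().startsWith("image/")) {
    return `expected image/*, got ${contentType ?? "no content-type"}`;
  }
  return null;
}

// Fetch the route and apply the check. Resolves to null when the route serves an image.
async function checkImageRoute(url: string): Promise<string | null> {
  const res = await fetch(url);
  return imageCheckFailure(res.status, res.headers.get("content-type"));
}

// Example call (not executed here); the path is the tag OG route mentioned earlier:
// checkImageRoute("https://www.kaushik.cv/blog/tag/rag/opengraph-image")
```

A 404 fails on status alone, and a 200 that serves an HTML error page fails on content-type, which is the case a plain status check would miss.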

3. My auditor sometimes hallucinated a passing verify

Twice during the run the auditor claimed features were live that weren't. I caught one at the time and missed the other. In one case it described a UI element in enough detail that I believed it; only when a later cycle checked the same URL and got a 404 did I realize the earlier verify was fictional.

Lesson. Model-based verifiers are cheaper than end-to-end integration tests, but they're not free of errors. Every "verify" turn in this loop should end with a concrete artifact: a curl -w '%{http_code}' command together with its exact output, a screenshot with a computed pixel hash, something the next cycle can reproduce. Otherwise your verify layer degrades into a trust exercise.
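
The artifact rule can be enforced mechanically before a verify is accepted. This sketch assumes a hypothetical claim shape (claim, command, observedStatus); it is not the verify JSON the loop emitted.

```typescript
// One claim from a verify turn (hypothetical shape).
interface VerifyClaim {
  claim: string;           // e.g. "tag OG card is live"
  command?: string;        // shell command that reproduces the check
  observedStatus?: number; // HTTP status that command returned
}

type Verdict = "pass" | "fail" | "unverified";

// A claim without a reproducible command and an observed result is never a pass.
function judgeClaim(c: VerifyClaim, expectedStatus: number = 200): Verdict {
  const hasCommand = c.command !== undefined && c.command.trim() !== "";
  if (!hasCommand || c.observedStatus === undefined) {
    return "unverified";
  }
  return c.observedStatus === expectedStatus ? "pass" : "fail";
}

// The verify turn passes only if it makes at least one claim and every claim passes.
function verifyPasses(claims: VerifyClaim[]): boolean {
  return claims.length > 0 && claims.every((c) => judgeClaim(c) === "pass");
}
```

A detailed but unbacked description of a UI element lands in "unverified" here, so it cannot be mistaken for a pass.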

4. RSS silently rendered as raw XML for one cycle

I shipped an XSLT stylesheet so /feed.xml would render as a nice styled page when a human opens it in a browser. It didn't work: Chromium refuses to apply <?xml-stylesheet?> when the response has X-Content-Type-Options: nosniff and Content-Type: application/rss+xml. Feed readers detect RSS via the <rss> root element, not the MIME, so switching to application/xml fixed the browser view without breaking subscribers.

Nobody would have noticed this for weeks except that the next auditor happened to open the raw feed URL in a browser and screenshot it.

Lesson. Security headers interact non-obviously with content types. Every new header should be tested against every content-type the site returns.
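
The fix described above amounts to changing one header on the feed response. A minimal sketch using the standard Response API follows; the function name is hypothetical, and keeping nosniff on the feed is an assumption, since the post only says the MIME type changed.

```typescript
// Build the feed response with a generic XML MIME type so the browser
// will apply the <?xml-stylesheet?> instruction embedded in the feed.
function feedResponse(xml: string): Response {
  return new Response(xml, {
    headers: {
      // Was application/rss+xml; readers identify RSS by the <rss> root element.
      "Content-Type": "application/xml; charset=utf-8",
      // Assumed to stay in place as a site-wide security header.
      "X-Content-Type-Options": "nosniff",
    },
  });
}
```

A regression test can then assert on the Content-Type header directly instead of relying on someone opening the feed in a browser.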

Meta-lessons about ship-loops

Loops don't naturally converge. After ~40 cycles the top-of-loop question shifted from "what's broken?" (concrete, actionable) to "what could be better?" (unbounded). The auditors kept generating rank-3 lifts, but rank-3 was often add a guestbook or add a newsletter — features that a real user might want and a real portfolio-owner would defer. Without a human injecting values, the loop optimizes for local excitement, not global fit.

The main cost of a loop isn't cycles, it's rework. Cycles 62–64 were spent entirely on a bug I introduced in cycle 62 (a focus trap that didn't actually trap). Every "polish" cycle risks becoming a rework cycle. The rate at which real work happens plateaus fast.

Adversarial verification catches things adversarial hunting misses. A hunter searching for problems finds classes of problems. A verifier checking a specific claim finds cases where the claim was false. Both matter. They shouldn't be the same agent.

Auto-deploy is a superpower and a trap. Push → live in 90 seconds is thrilling. It also means every accident ships. In cycle 42 I checked in ~500KB of screenshot files that a test agent had scattered in the working directory. git add -A will do that.
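
A pre-commit guard for the git add -A accident could look like the sketch below. It assumes the staged paths come from git diff --cached --name-only and that legitimate images live under public/; both the allowed prefix and the function name are assumptions.

```typescript
// Image extensions a test agent is likely to scatter as screenshots.
const IMAGE_EXTENSION = /\.(png|jpe?g|gif|webp)$/i;

// Directories where committed images are expected (assumed project layout).
const ALLOWED_IMAGE_PREFIXES = ["public/"];

// Given staged paths, return the image files that sit outside the allowed directories.
function strayImages(stagedPaths: string[]): string[] {
  return stagedPaths.filter(
    (path) =>
      IMAGE_EXTENSION.test(path) &&
      !ALLOWED_IMAGE_PREFIXES.some((prefix) => path.startsWith(prefix)),
  );
}

// strayImages(["screenshot-1.png", "public/og.png", "app/page.tsx"])
// returns ["screenshot-1.png"]
```

If the result is non-empty, the Ship step can refuse to commit until the files are removed or explicitly allowed.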

What the loop couldn't do

  • It couldn't decide when to stop. The stop condition ("do not stop until I say so") is unbounded — there's always another lift.
  • It couldn't tell me when the site was good, only that it was not obviously broken right now.
  • It couldn't consult external stakeholders — recruiters, mentors, actual users. It relied entirely on my own past preferences and its own inferred sense of quality.
  • It couldn't rewrite prose. The 11 essays on the blog were content I drafted separately; the loop optimized their delivery surface, not their substance.

The loop is good at removing errors and adding structure. It's bad at judgment.

What I'd change

If I ran another one, I'd:

  • Instrument the pipeline first. A cheap script that curls a known-changed URL after each deploy and diffs the response. Catches the "silent no-op" class of bug on the first cycle instead of the fifteenth.
  • Require concrete verify artifacts. Every claim in the verify JSON needs to be reproducible from a shell command. If it can't be, the auditor should say so and not pass the check.
  • Cap "polish" cycles. After N cycles without a rank-1 lift being genuinely load-bearing, force a content or scope decision.
  • Separate the verifier from the hunter. Different agents, different prompts, different context. The verifier has one job: confirm this specific claim by fetching this specific URL with this specific probe.
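
The polish cap in the list above is a simple streak rule. As a sketch, assuming each cycle records whether its rank-1 lift was genuinely load-bearing (that judgment still has to come from a human or a separate check), with hypothetical names throughout:

```typescript
// history[i] is true when cycle i's rank-1 lift was load-bearing.
// Returns true once the most recent `cap` cycles in a row were all polish.
function shouldForceScopeDecision(history: boolean[], cap: number): boolean {
  let polishStreak = 0;
  // Walk backwards from the latest cycle until a load-bearing one is found.
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i]) {
      break;
    }
    polishStreak++;
  }
  return polishStreak >= cap;
}
```

Only the trailing streak counts, so a single load-bearing fix resets the counter and lets the loop continue.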

The loop is a good tool. Left alone with an unbounded directive, it still ships. But every subsequent cycle it gets a little less efficient at producing real value, and the marginal thing shipped is a little more decorative. That's not a failure of the tool — it's the shape of the problem.

The site you're reading is the artifact. You can inspect any of the pieces. The /uses page tells you what tools I ran the loop with. The command palette (press ⌘K or / anywhere on the site) knows every URL. The three playgrounds — HNSW, CipherStack rotation, and Dyx latency — are the differentiators the loop kept converging on.

The last thing the loop shipped, before I said stop, was this post.

Cite as: Saravanan, K. (2026). 60 Cycles: What I Learned Shipping a Portfolio on an Adversarial Ship-Loop. Kaushik Saravanan. https://www.kaushik.cv/blog/60-cycle-ship-loop-retrospective