TL;DR
Vibe-coded apps can go to production. Most of the ones I see need four things before they survive real traffic: an honest audit, a triage call about refactor vs rewrite, AI-assisted remediation wired into real feedback loops, and more observability than a traditional agency would bother with.
I am Mykhailo, CTO of DestiLabs. We clean up and ship AI-assisted codebases for a living. This is the exact process I run when a founder shows up with a Cursor, Claude Code, Lovable, bolt.new, v0, or Replit project and says "can this go live?"
Short answer: usually yes. Sometimes no. Always only after you know which one.
What Does "Vibe-Coded" Actually Mean?
A vibe-coded application is one where most of the code was produced by an AI coding tool - Cursor, Claude Code, Lovable, bolt.new, v0, Replit Agent, Windsurf, or similar - with a human in a steering role rather than a line-by-line authoring role. The term is loose on purpose. What matters for production readiness is not how the code was written, but whether the process had guardrails.
Almost every codebase I inspect now has meaningful AI involvement. The real spectrum is this:
- One-shot MVPs: a founder prompted a builder tool until the demo worked. No tests, no types in the risky places, one file per feature, secrets in the repo. These need the most work.
- Scaffolded vibe-coded apps: code generated by AI but reviewed by someone who knows the domain. Integration boundaries are roughly correct. These are cheaper to fix than their owners think.
- AI-assisted production codebases: humans drive, AI accelerates. These are what serious teams look like in 2026.
"Can vibe-coded apps go to production?" is not a yes or no question. It is a triage question. Below is how I run that triage.
Step 1: Do a Paid Audit Before You Commit to Anything
We start every engagement with a paid audit. Free audits produce polite answers. Paid audits produce truthful ones, and they give the founder a deliverable whether they hire us for the fix or not.
An audit on a vibe-coded codebase is not the same as a traditional code review. The failure modes are different. What I look for:
- Duplication. AI tools reinvent the same function in three different files under three different names. We run our own in-house duplication detector on top of the usual static analysis because off-the-shelf tools miss semantic duplicates - two different implementations of the same business rule.
- Secrets and credentials. API keys in the repo, in the client bundle, in commits that were "fixed" in a later commit but still sit in the git history.
- Trust boundaries. Where does untrusted input enter the system, and where does it get parsed, validated, or escaped? Vibe-coded apps routinely skip validation because the LLM "remembered" a happy path.
- Data model consistency. Does the database schema match what the code actually writes? I have seen migrations that silently drifted from the ORM for months.
- Dependency hygiene. Unpinned versions, abandoned packages, half-migrated frameworks.
- Error handling shape. Does an error surface to the user, get silently swallowed, or crash the process? All three patterns usually coexist in the same codebase.
Good old static analysis still pulls weight here. Linters, type checkers, dependency auditors. But the highest-signal finding in most audits is usually duplication: it is the cheapest proxy for "how much of this code does the author actually understand?"
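A toy illustration of the gap textual tools leave: hashing function ASTs after erasing identifiers makes renamed copies of the same logic collapse to one fingerprint. This catches renamed clones; true semantic duplicates - two different algorithms implementing the same business rule - still need a human or a smarter detector.

```python
import ast
import hashlib

def normalized_hash(source: str) -> str:
    """Fingerprint a function with identifiers erased, so renamed
    copies of the same logic hash to the same value."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"          # variable references
        elif isinstance(node, ast.arg):
            node.arg = "_"         # parameter names
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"        # function names
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

# Two "different" helpers an AI tool generated in two files:
a = "def calc_vat(amount):\n    return amount * 0.2"
b = "def compute_tax(total):\n    return total * 0.2"
assert normalized_hash(a) == normalized_hash(b)
```

A plain text diff sees two unrelated functions; the normalized hash sees one.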
You want the audit output to be a written, ranked list of risks, each tied to a file path and a concrete fix. Not a vibe check.
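For concreteness, one entry on that ranked list might look like this - every field value, path, and commit hash here is illustrative:

```json
{
  "rank": 1,
  "severity": "critical",
  "risk": "Stripe secret key committed to the repo",
  "file": "src/lib/payments.ts",
  "evidence": "a later commit removed the key from code but not from git history",
  "fix": "Rotate the key, purge history, load from environment variables"
}
```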
Step 2: The Triage Call - Refactor or Rewrite?
After the audit we get on a call with the founder and say, plainly, whether this patient can still be saved or whether it should be put to rest. That framing works because it forces a binary decision instead of a slow, expensive middle path.
A surprising number of founders pick the rewrite. Once you put the real number on "fix what you have" versus "start clean with the lessons you learned," the math often favors the second option - especially when the MVP was built to answer a product question, not to be a durable system.

Here is the rubric I use live on that call:
| Signal | Lean Refactor | Lean Rewrite |
|---|---|---|
| Core data model | Mostly right, some drift | Confused or contradicted by code |
| Duplication rate | Bounded, in a few hotspots | Everywhere, same logic in 3+ shapes |
| Test coverage | Zero but the surface area is small | Zero and the surface area is large |
| Stack choice | Reasonable for the problem | Wrong language or framework for scale |
| Auth and payments | Present but messy | Missing, stubbed, or half-integrated |
| Founder's understanding of the code | "I know where things live" | "I prompted it into existence" |
| Business model | Validated, metrics exist | Still searching for product-market fit |
If most signals point to rewrite, we rewrite. Usually in 4 to 8 weeks, with real tests, a clean data model, and the prompts, skills, and playbooks we built during the audit carrying over.
If most signals point to refactor, we refactor. Incrementally, behind feature flags, with the audit's risk list as the backlog.
The worst outcome is a half-commitment: pouring three months into a refactor of a codebase that needed a rewrite. That is what the honest call prevents.
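The call itself is judgment, but the tally is mechanical, which is the point: a majority forces the binary decision. A sketch, with signal names taken from the rubric above:

```python
# Each rubric signal votes "refactor" or "rewrite"; majority wins.
SIGNALS = [
    "core_data_model", "duplication_rate", "test_coverage",
    "stack_choice", "auth_and_payments", "founder_understanding",
    "business_model",
]

def triage(votes: dict) -> str:
    """votes maps each signal name to 'refactor' or 'rewrite'."""
    rewrites = sum(1 for s in SIGNALS if votes[s] == "rewrite")
    # Seven signals, so no ties: four or more rewrite votes wins.
    return "rewrite" if rewrites * 2 >= len(SIGNALS) else "refactor"
```

There is deliberately no "partial refactor" return value.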
Step 3: AI-Assisted Remediation With Real Feedback Loops
Fixing a vibe-coded app with more vibe-coding does not work. Fixing it with an AI agent wired into real feedback loops works extremely well. This is the part most teams get wrong.
We use Claude Code heavily. Past project context goes into skills files (skills.md) so the agent does not relearn the same lesson every session. Every change lands behind a human code review - no exceptions, including when the change was a one-liner the agent was "sure" about.
The single highest-leverage move: design feedback loops that the AI agent can run itself.
A few patterns we use repeatedly:
- Fixing payments? The agent must run a full end-to-end payments loop in a Stripe sandbox: create a customer, attach a payment method, charge, handle the webhook, verify the ledger. Not unit tests. The full loop. If it passes, the fix is real. If it does not, the agent iterates without waking me up.
- Fixing a UI bug? The agent spins up a headless browser, navigates to the page, performs the user action, screenshots the result, and diffs against the expected state. "The button is blue now" from a screenshot diff is a far stronger signal than "the test passes."
- Fixing a backend contract? The agent hits the real endpoint with the real payload shape against a staging environment and parses the response. No mocks. Mocks drift.
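The "verify the ledger" step of that payments loop is the part worth making explicit, because it is the agent's pass/fail signal. A sketch of that check, run after the sandbox webhook fires - the event shape is abbreviated from Stripe's `payment_intent.succeeded` payload, and the ledger row structure is hypothetical:

```python
def verify_ledger(event: dict, ledger: list[dict]) -> bool:
    """After the webhook fires, confirm the charge actually landed
    in our ledger with the right amount and currency."""
    if event["type"] != "payment_intent.succeeded":
        return False
    intent = event["data"]["object"]
    return any(
        row["stripe_id"] == intent["id"]
        and row["amount_cents"] == intent["amount"]
        and row["currency"] == intent["currency"]
        for row in ledger
    )
```

If this returns False, the agent keeps iterating; the unit tests passing is not the exit condition, this is.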
Integration tests with real external calls - not mocks - are the primary quality gate. A mock tells you the shape you imagined is consistent with itself. A real call tells you whether your code works.
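The contract check the agent runs against staging can be very small: fetch the real response, then assert the fields the client actually depends on. Endpoint URL and field names below are placeholders:

```python
import json
from urllib.request import urlopen

# The fields (and types) the frontend actually reads - names illustrative.
REQUIRED = {"id": str, "email": str, "plan": str, "created_at": str}

def check_contract(raw: bytes) -> list[str]:
    """Return a list of contract violations found in a response body."""
    body = json.loads(raw)
    return [
        f"{field}: expected {t.__name__}, got {type(body.get(field)).__name__}"
        for field, t in REQUIRED.items()
        if not isinstance(body.get(field), t)
    ]

if __name__ == "__main__":
    # Hit the real staging endpoint - no mocks. URL is a placeholder.
    with urlopen("https://staging.example.com/api/v1/user/42") as resp:
        violations = check_contract(resp.read())
    assert not violations, violations
```

The check is trivial; the discipline is in what it reads: a live staging response, not a fixture.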
Things that are still hard to close the loop on, honestly: video generation, voice conversations, anything that requires a remote VM, anything driven by physical hardware. For these we fall back to tactical tricks - recording short fixtures, replaying transcripts, narrowing the surface area the agent is allowed to touch - but the loop is weaker. I tell clients this up front.
The human code review is the backstop. The agent catches most of its own mistakes when the feedback loop is tight. The reviewer catches the rest.
Step 4: Ship With More Observability Than You Think You Need
Vibe-coded code fails in surprising ways. You pay for that surprise once, in debugging time, when something breaks at 2am. The cheapest insurance is tracking everything.
We reuse CI/CD pipelines, tracing, and metrics heavily across projects. On a typical production-ready vibe-coded app we wire in:
- Structured logs at every trust boundary - inbound request, outbound API call, database write.
- Distributed tracing so that a single user action can be followed from the browser through every service it touches.
- Error tracking with source maps, release tags, and user context.
- Product analytics on the actual user-facing events.
- LLM-call logging for any AI features - prompts, responses, tokens, latency, cost. This is non-negotiable for agentic features.
- Synthetic monitoring on the critical user paths that would lose money if they broke.
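Most of the above is standard plumbing; the LLM-call log is the piece vibe-coded apps uniquely need. It can be a plain structured logger - the field names here are one convention, not a standard:

```python
import json
import logging
import time

log = logging.getLogger("llm")

def log_llm_call(model: str, prompt: str, response: str,
                 tokens_in: int, tokens_out: int,
                 started: float, cost_usd: float) -> str:
    """Emit one structured record per model call - the raw material
    for latency, token, and cost dashboards."""
    record = json.dumps({
        "event": "llm_call",
        "model": model,
        "prompt": prompt[:500],       # truncate long prompts, don't drop them
        "response": response[:500],
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "latency_ms": round((time.monotonic() - started) * 1000),
        "cost_usd": cost_usd,
    })
    log.info(record)
    return record
```

One line per call, machine-parseable, with cost attached: that is what makes the next step - cost controls - enforceable rather than aspirational.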
This is more than a traditional agency would install on a comparable project. I am explicit about that with clients. The point is simple: when something breaks in a codebase that was 70% AI-generated, you want as much context as possible when you go looking. You cannot re-read the code and trust your memory. The logs are the memory.
The corollary: cost controls on AI spend must ship before the feature does. Rate limits, per-user caps, model-routing rules, alert thresholds. A single runaway loop can drain a month of margin in an afternoon.
When a Rewrite Actually Wins
Sometimes the right answer is to scrap it. Situations where I recommend a rewrite without hesitation:
- The codebase is in a language or framework the team cannot hire for, and scale is the near-term goal.
- The data model is fundamentally wrong and migrating it would touch every file anyway.
- Core security primitives (auth, payments, PII handling) were stubbed and would need to be rebuilt from scratch.
- The product direction changed after MVP and the code encodes the old direction.
- The founder does not want to own this codebase long-term.
The counterintuitive truth: a rewrite with Claude Code, a clear spec, and the lessons from the old codebase takes less time than most founders expect. The MVP already de-risked the product question. The rewrite is a pure execution play, and execution is where AI coding tools are strongest.
Frequently Asked Questions
Can vibe-coded apps go to production?
Yes, if you audit them, triage honestly, fix with AI-assisted feedback loops, and ship with more observability than usual. Most vibe-coded MVPs need real work before they handle paying customers, but the work is bounded and doable.
Is vibe-coded code production-ready by default?
No. AI coding tools optimize for "the demo works." Production readiness requires explicit investment in tests, security, data integrity, and observability. That investment is what this playbook describes.
How do I fix a vibe-coded codebase without making it worse?
Start with a paid audit to map the real risks. Then decide refactor versus rewrite on the strongest signals (data model, duplication, stack fit). Make changes behind feature flags, wire AI agents into real feedback loops (sandboxes, headless browsers, real API calls), and require human code review on every merge.
How long does it take to get a vibe-coded app production-ready?
Most engagements we run take 4 to 10 weeks. Refactors land on the shorter end when the audit surface area is small. Rewrites usually take 6 to 8 weeks because the product question is already answered and the spec is clear.
Should I rewrite my Lovable, bolt.new, or v0 MVP before scaling?
Usually not immediately - but plan for it. These tools are excellent for validating a product. They are weaker when the system needs to survive real concurrency, real auth, and real compliance. Run the triage rubric above. If the data model and stack are reasonable, refactor. If either is wrong, rewrite before growth forces the issue.
Do I still need human code review if I use Claude Code or Cursor?
Yes, always. The AI agent catches its own mistakes when feedback loops are tight, but the reviewer catches the ones the agent cannot see - architectural drift, security assumptions, business-rule conflicts. Human review is not optional on anything that touches production.
Key Takeaways
- Vibe-coded apps are not one thing. Triage the specific codebase before you commit to refactor or rewrite.
- Pay for the audit. Paid audits surface duplication, drifted schemas, and missing trust boundaries - the real risks. Free audits surface opinions.
- The refactor-versus-rewrite call should be binary and honest. Half-commitments cost more than either path done cleanly.
- AI-assisted remediation works when the agent can run real feedback loops - payments in a sandbox, UI in a headless browser, backends against staging. Mocks do not count.
- Human code review stays on every merge. No exceptions.
- Observability is the insurance policy on AI-generated code. Track more than a traditional agency would, be explicit about it with clients, and include AI-specific cost controls.
Want a CTO-Grade Audit of Your Vibe-Coded Codebase?
If you built an MVP with Cursor, Claude Code, Lovable, bolt.new, v0, or Replit and you need it to survive real traffic, we can help. Paid audit, honest triage call, and - if it makes sense - a fixed-scope remediation or rewrite.

Mykhailo Kushnir