March 2026
Smoke Tests & Harnesses in Vibe Coding
Testing discipline that keeps AI-generated code from blowing up in production.
Smoke Testing
A smoke test is the simplest check you can write — and the most important one. Does the thing start? Does the main path run without immediately blowing up? That's it.
The name comes from hardware: power on a new circuit board and watch for literal smoke. Smoke means stop everything. No smoke means keep going.
In vibe coding, Claude generates 200 lines in 30 seconds. You are not reading all of it carefully. A smoke test tells you whether the output is real before you build anything on top of it.
Survival
Does it import? Does the main function run? Does it reach the database? One assertion is enough. You're checking for smoke, not correctness.
Everything Else
Edge cases, bad inputs, every branch — that's for unit tests. Smoke tests are deliberately blunt. They run fast and they run first.
# Does it even start?
def test_smoke():
    from my_app import run
    result = run(limit=1)
    assert result is not None  # smoke cleared
Test Harness
A test harness is the scaffolding that lets you run code safely — without hitting real databases, real APIs, or real production. Mocked credentials, in-memory databases, fake responses. Build it once at the start of a project and it pays off forever.
Without a harness, testing means running the real thing, which means you can't run tests freely, can't run them in CI, and can't run them at all without real credentials sitting in the codebase.
Before Claude writes a single route or function, ask it to set up the test harness. Mock the database, stub the external APIs, write a conftest.py. Build on top of that — not the other way around.
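One harness piece worth building first is an in-memory database. A minimal stdlib-only sketch, assuming SQLite is an acceptable stand-in for your real database (the `memory_db` helper and `deals` table are illustrative; in a real project this would live in conftest.py as a pytest fixture):

```python
# A fresh, throwaway in-memory database per test: no file, no server,
# no credentials. Helper name and schema are illustrative.
import sqlite3
from contextlib import contextmanager

@contextmanager
def memory_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY, score REAL)")
    try:
        yield conn
    finally:
        conn.close()

# A test uses it like any other connection:
with memory_db() as db:
    db.execute("INSERT INTO deals (score) VALUES (0.9)")
    count = db.execute("SELECT COUNT(*) FROM deals").fetchone()[0]
    assert count == 1  # real SQL ran, zero infrastructure needed
```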
# conftest.py — runs before anything else is collected
from unittest.mock import MagicMock
import bq_helpers
# BQ client fires at module level — patch it before any test loads
bq_helpers.get_bq_client = lambda: MagicMock()
# Every test now runs without GCP credentials or a live database

# Route write_dataframe to a local file via env var — no DOMO account needed
import os

os.environ['DOMO_OUTPUT_NAME'] = 'My Output Dataset'
exec(compile(open('t1_prep.py').read(), 't1_prep.py', 'exec'), {})
del os.environ['DOMO_OUTPUT_NAME']
# Tile runs exactly as in production, but locally and instantly
Stop Whack-a-Mole.
Reproduce the Error.
When something breaks in vibe coding, the instinct is to describe the symptom to Claude, accept a fix, and move on. That works about 60% of the time. The other 40% of the time, the same bug comes back in a different form, because neither you nor Claude understood what actually caused it.
Before you ask Claude to fix anything, reproduce the error in a test first. Now you have something concrete. Claude fixes against a failing test rather than a vague description — and you know the fix worked because the test goes green.
Reproduce it first
Write the smallest possible test that triggers the failure. Don't fix anything yet. If you can't reproduce it in a test, you don't understand it well enough to fix it.
Give Claude the failing test, not the symptom
"This test fails — find out why" is a better prompt than "scoring seems wrong on some deals sometimes." One is concrete. One is a guess.
Fix until the test passes
Green test means the bug is gone. Not "I think that's probably fixed." No more hoping.
Leave the test in
That bug is now a permanent regression test. It can never silently come back. This is how you stop playing whack-a-mole on that issue forever.
# Bug: "some deals are being scored wrong". Too vague to fix directly.
# Step 1: reproduce it
def test_clean_deal_approves():
    bgcheck = MagicMock()  # looks fine. it's not.
    result = score_deal(bgcheck)
    assert result.decision == "approve"  # FAILS: gets "review"

# Step 2: now we understand it
# MagicMock().ftc_action_found is truthy: it's a MagicMock, not False
# The scorer treats it as a red flag and downgrades every deal

# Step 3: fix the test setup with explicit safe defaults on every field
def test_clean_deal_approves():
    bgcheck = MagicMock()
    bgcheck.ftc_action_found = False
    bgcheck.ofac_hit = False
    bgcheck.risk_flags = []
    assert score_deal(bgcheck).decision == "approve"  # passes

# Test passes. Bug documented. Locked in as a regression test.
Security Prompts
AI-generated code is fast and usually functional. It is not written with security in mind — it's written to pass your prompt. Claude won't volunteer that it just wrote a SQL injection vector. You have to ask.
SQL Injection
Dynamic queries built with f-strings instead of parameterized inputs.
Hardcoded Secrets
API keys or tokens written directly into the code instead of env vars.
Unguarded Endpoints
Admin routes or delete handlers with no auth check.
Open CORS
allow_origins=["*"] is fine in dev. Not when there's real data behind it.
Data Exposure
API responses returning full DB rows when you only needed two fields.
Dependencies
Packages with known CVEs that Claude pulled in. npm audit or pip-audit catches them in one command.
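To make the first two rows concrete, here is a sketch of the safe patterns (table, data, variable, and env var names are all illustrative, not from the article):

```python
# Parameterized SQL and env-var secrets: the fixes for the first two findings.
# Table, data, and names are illustrative.
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

name = "alice'; DROP TABLE users; --"  # hostile input

# Bad:  conn.execute(f"SELECT * FROM users WHERE name = '{name}'")
# Good: let the driver bind the value; the input can never become SQL
rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
assert rows == []  # hostile string matched nothing; the table survived

# Bad:  API_KEY = "sk-live-..." hardcoded in the source
# Good: read from the environment (os.environ["API_KEY"] to fail loudly)
api_key = os.environ.get("API_KEY")
```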
Use these prompts at the end of every session, before you push anything.
Security audit what we just built. Look specifically for: SQL injection via string formatting, hardcoded credentials or API keys, endpoints missing authentication, overly permissive CORS, and API responses leaking more data than needed.
Review the packages we added this session. Are there known vulnerabilities in any of them? Are we importing anything we don't actually need?
List every route or endpoint in what we built. For each one, tell me what would happen if an unauthenticated user hit it directly.
# Hardcoded secrets
grep -rE "api_key[[:space:]]*=[[:space:]]*['\"]" --include="*.py" .
# f-string SQL (injection risk)
grep -r 'f"SELECT' --include="*.py" .
# Known package vulnerabilities
npm audit
pip-audit
Work with Nektar
We build production tools for businesses with real domain expertise. If you're sitting on data you've never been able to use — let's talk.
Book a free data audit