
March 2026

Smoke Tests & Harnesses in Vibe Coding

Testing discipline that keeps AI-generated code from blowing up in production.

Smoke Testing · Test Harness · Debugging · Security
01

Smoke Testing

A smoke test is the simplest check you can write — and the most important one. Does the thing start? Does the main path run without immediately blowing up? That's it.

The name comes from hardware: power on a new circuit board and watch for literal smoke. Smoke means stop everything. No smoke means keep going.

In vibe coding, Claude generates 200 lines in 30 seconds. You are not reading all of it carefully. A smoke test tells you whether the output is real before you build anything on top of it.

What it checks

Survival

Does it import? Does the main function run? Does it reach the database? One assertion is enough. You're checking for smoke, not correctness.

What it ignores

Everything Else

Edge cases, bad inputs, every branch — that's for unit tests. Smoke tests are deliberately blunt. They run fast and they run first.

Minimal smoke test
# Does it even start?
def test_smoke():
    from my_app import run
    result = run(limit=1)
    assert result is not None   # smoke cleared
02

Test Harness

A test harness is the scaffolding that lets you run code safely — without hitting real databases, real APIs, or real production. Mocked credentials, in-memory databases, fake responses. Build it once at the start of a project and it pays off forever.

Without a harness, testing means running the real thing — which means you can't run tests freely, can't run them in CI, and can't run them at all without real credentials sitting in the codebase.
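The in-memory database piece of a harness can be sketched with nothing but the standard library. This is a minimal sketch, not the article's own code, and the `deals` schema is hypothetical:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def test_db():
    """In-memory stand-in for the real database: no credentials, no network."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE deals (id INTEGER PRIMARY KEY, score REAL)")
    try:
        yield conn
    finally:
        conn.close()   # each test gets a fresh, disposable database

with test_db() as db:
    db.execute("INSERT INTO deals (score) VALUES (0.9)")
    count = db.execute("SELECT COUNT(*) FROM deals").fetchone()[0]
```

Wrap the same idea in a pytest fixture and every test in the suite gets a throwaway database for free.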

Ask Claude to build it first

Before Claude writes a single route or function, ask it to set up the test harness. Mock the database, stub the external APIs, write a conftest.py. Build on top of that — not the other way around.

Patch BigQuery before any test imports run
# conftest.py — runs before anything else is collected
from unittest.mock import MagicMock
import bq_helpers

# BQ client fires at module level — patch it before any test loads
bq_helpers.get_bq_client = lambda: MagicMock()

# Every test now runs without GCP credentials or a live database
DOMO tile harness — runs a 4.5M row pipeline locally in 8 seconds
# Route write_dataframe to a local file via env var — no DOMO account needed
import os

os.environ['DOMO_OUTPUT_NAME'] = 'My Output Dataset'
exec(compile(open('t1_prep.py').read(), 't1_prep.py', 'exec'), {})
del os.environ['DOMO_OUTPUT_NAME']

# Tile runs exactly as in production — but locally, instantly
03

Stop Whack-a-Mole.
Reproduce the Error.

When something breaks in vibe coding, the instinct is to describe the symptom to Claude, accept a fix, and move on. This works about 60% of the time. The other 40% of the time, the same bug comes back in a different form — because neither you nor Claude understood what actually caused it.

Before you ask Claude to fix anything, reproduce the error in a test first. Now you have something concrete. Claude fixes against a failing test rather than a vague description — and you know the fix worked because the test goes green.

1

Reproduce it first

Write the smallest possible test that triggers the failure. Don't fix anything yet. If you can't reproduce it in a test, you don't understand it well enough to fix it.

2

Give Claude the failing test, not the symptom

"This test fails — find out why" is a better prompt than "scoring seems wrong on some deals sometimes." One is concrete. One is a guess.

3

Fix until the test passes

Green test means the bug is gone. Not "I think that's probably fixed." No more hoping.

4

Leave the test in

That bug is now a permanent regression test. It can never silently come back. This is how you stop playing whack-a-mole on that issue forever.

Vague bug → failing test → fix (csl-deal-pipeline)
# Bug: "some deals are being scored wrong" — too vague

# Step 1: reproduce it
from unittest.mock import MagicMock

def test_clean_deal_approves():
    bgcheck = MagicMock()              # looks fine. it's not.
    result = score_deal(bgcheck)
    assert result.decision == "approve"  # FAILS — gets "review"

# Step 2: now we understand it
# MagicMock().ftc_action_found is truthy — it's a MagicMock, not False
# The scorer treats it as a red flag and downgrades every deal

# Step 3: fix — explicit safe defaults on every field
bgcheck.ftc_action_found = False
bgcheck.ofac_hit         = False
bgcheck.risk_flags       = []

# Test passes. Bug documented. Locked in as a regression test.
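The root cause in step 2 can be demonstrated in isolation, which is a useful sanity check before trusting the fix:

```python
from unittest.mock import MagicMock

bgcheck = MagicMock()
# Any unset attribute on a MagicMock is itself a MagicMock, and therefore truthy.
assert bool(bgcheck.ftc_action_found) is True   # silently reads as "red flag found"

# Explicit safe defaults make the mock behave like a genuinely clean record.
bgcheck.ftc_action_found = False
bgcheck.ofac_hit = False
bgcheck.risk_flags = []
assert bool(bgcheck.ftc_action_found) is False
```

This is the general trap with MagicMock in harnesses: it never raises, it just returns more mocks, so truthiness checks pass when they shouldn't.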
04

Security Prompts

AI-generated code is fast and usually functional. It is not written with security in mind — it's written to pass your prompt. Claude won't volunteer that it just wrote a SQL injection vector. You have to ask.

💉

SQL Injection

Dynamic queries built with f-strings instead of parameterized inputs.

🔑

Hardcoded Secrets

API keys or tokens written directly into the code instead of env vars.

🚪

Unguarded Endpoints

Admin routes or delete handlers with no auth check.

🌐

Open CORS

allow_origins=["*"] is fine in dev. Not when there's real data behind it.

📤

Data Exposure

API responses returning full DB rows when you only needed two fields.

📦

Dependencies

Packages with known CVEs that Claude pulled in. npm audit / pip-audit — one command.
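The first card, f-string SQL, is worth seeing side by side with its fix. A minimal sketch against a throwaway SQLite table (the `users` schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

name = "x' OR '1'='1"   # attacker-controlled input

# Injectable: the input is spliced into the SQL and rewrites the WHERE clause
leaked = conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

# Parameterized: the driver treats the input as data, never as SQL
safe = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```

`leaked` comes back with every row in the table; `safe` comes back empty, because no user is literally named `x' OR '1'='1`.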

Use these prompts at the end of every session, before you push anything.

Prompt 1 — General sweep

Security audit what we just built. Look specifically for: SQL injection via string formatting, hardcoded credentials or API keys, endpoints missing authentication, overly permissive CORS, and API responses leaking more data than needed.

Prompt 2 — Dependency check

Review the packages we added this session. Are there known vulnerabilities in any of them? Are we importing anything we don't actually need?

Prompt 3 — Auth audit

List every route or endpoint in what we built. For each one, tell me what would happen if an unauthenticated user hit it directly.

Quick manual grep
# Hardcoded secrets
grep -rE "api_key[[:space:]]*=[[:space:]]*['\"]" --include="*.py" .

# f-string SQL (injection risk)
grep -r 'f"SELECT' --include="*.py" .

# Known package vulnerabilities
npm audit
pip-audit

Work with Nektar

We build production tools for businesses with real domain expertise. If you're sitting on data you've never been able to use — let's talk.

Book a free data audit