Pavel Gajvoronski

Posted on Apr 15

I Built 23 Pages in One Day With AI. Then One API Key Almost Killed Everything

#ai #agents #buildinpublic #architecture

This is a build-in-public update on Kepion, an AI platform that deploys companies from a text description. First post here.

Two days ago I shared the architecture. Today I want to share what actually happened when I started building — the wins, the disasters, and the numbers.

The disaster: 3 hours lost to a phantom API key

I sat down at 8am ready to build. Opened my terminal. Ran GSD-2 (my build orchestrator). Got this:

Error: All credentials for "anthropic" are in a cooldown window.

My Max plan showed 3% usage. The tool said I was rate-limited. For three hours I debugged, restarted, cleared caches, filed a support ticket. The fix?

unset ANTHROPIC_API_KEY

An old API key from a previous tool installation was silently overriding my subscription. One environment variable. Three hours gone.

The lesson: invisible defaults are the most dangerous bugs in AI tooling.

I'm sharing this because every developer building with AI agents will hit this. Your LLM provider's auth layer has more failure modes than your application code.
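
A startup check along these lines would have surfaced the stray variable in seconds. This is a minimal sketch, not part of GSD-2: the helper names are hypothetical, and ANTHROPIC_API_KEY is the only variable taken from the incident above.

```python
import os

# Only ANTHROPIC_API_KEY comes from the incident above; extend the tuple as needed.
SUSPECT_VARS = ("ANTHROPIC_API_KEY",)


def find_overriding_keys(environ=None, names=SUSPECT_VARS):
    """Return the names of credential variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if env.get(name)]


def mask(secret, visible=4):
    """Show only the last few characters so the key can be identified safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    for name in find_overriding_keys():
        print(f"warning: {name} is set ({mask(os.environ[name])}) "
              "and may override subscription auth")
```

Running it before the build tool starts turns a silent override into a visible warning, without printing the full key.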

What GSD-2 actually built in one day

Once the auth was fixed, I pointed GSD-2 at Kepion and let it work. Here's the raw output from a single day:

Security hardening (10 items)

Deny-by-default auth middleware — every new route is blocked unless explicitly whitelisted
Path traversal fix in vault manager
WebSocket authentication (was anonymous before)
CORS whitelist replacing wildcard *
Password policy: 12+ chars, uppercase, digit, special char
Rate limiting by user email instead of IP
Upload validation: file extension whitelist, 5MB limit
Business ownership verification on all endpoints
Session scoping by user_id
Login attempt tracking with 30-minute lockout after 10 failures
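
To make the path traversal item concrete, here is a minimal sketch of one common way to confine user-supplied paths to a vault directory. It is not Kepion's actual fix: the function name and the /srv/vault root are assumptions for illustration.

```python
from pathlib import Path


def resolve_in_vault(vault_root, user_path):
    """Resolve user_path under vault_root, rejecting anything that escapes it.

    Hypothetical helper: resolves ".." segments and symlinks first, then
    checks that the result still lives inside the vault root.
    """
    root = Path(vault_root).resolve()
    candidate = (root / user_path).resolve()
    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes vault: {user_path!r}")
    return candidate


if __name__ == "__main__":
    print(resolve_in_vault("/srv/vault", "notes/plan.md"))
    try:
        resolve_in_vault("/srv/vault", "../../etc/passwd")
    except PermissionError as exc:
        print("blocked:", exc)
```

The key design choice is to resolve first and compare afterwards. Filtering the string for "../" is easy to bypass, while comparing resolved paths also covers absolute paths and symlinks.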

Observability (shipped)

Every HTTP request gets a trace_id
Every agent call becomes a span linked to the trace
Slow trace detection (>5s)
Error trace listing
All persisted in SQLite
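
A rough sketch of what trace and span storage in SQLite could look like. The table layout and function names are assumptions, not Kepion's schema, and the slow-trace query assumes spans run one after another so their durations can be summed.

```python
import sqlite3
import uuid

SLOW_TRACE_SECONDS = 5.0  # threshold from the list above


def open_store(path=":memory:"):
    """Open the span store. The schema here is a guess for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spans ("
        "trace_id TEXT NOT NULL, name TEXT NOT NULL, "
        "duration_s REAL NOT NULL, error TEXT)"
    )
    return conn


def new_trace_id():
    """One id per HTTP request; every agent call reuses it for its span."""
    return uuid.uuid4().hex


def record_span(conn, trace_id, name, duration_s, error=None):
    conn.execute(
        "INSERT INTO spans (trace_id, name, duration_s, error) VALUES (?, ?, ?, ?)",
        (trace_id, name, duration_s, error),
    )
    conn.commit()


def slow_traces(conn, threshold_s=SLOW_TRACE_SECONDS):
    """Return (trace_id, total_seconds) for traces above the threshold."""
    return conn.execute(
        "SELECT trace_id, SUM(duration_s) FROM spans "
        "GROUP BY trace_id HAVING SUM(duration_s) > ? "
        "ORDER BY SUM(duration_s) DESC",
        (threshold_s,),
    ).fetchall()


def error_traces(conn):
    """Return the ids of traces that contain at least one failed span."""
    rows = conn.execute(
        "SELECT DISTINCT trace_id FROM spans WHERE error IS NOT NULL"
    ).fetchall()
    return [row[0] for row in rows]
```

In a real service the trace_id would be created in HTTP middleware and passed down to each agent call, so every span lands under the request that caused it.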

Cost intelligence (shipped)

Per-agent, per-model, per-business cost breakdown
Anomaly detection: flags agents whose cost is more than 2σ above the mean (z-score > 2)
Cost circuit breaker: blocks requests at configurable limits
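
The anomaly check can be sketched in a few lines. This is an illustration under assumptions, not the shipped code: the function name is made up, and it uses the population standard deviation across agents in a single window.

```python
from statistics import mean, pstdev


def flag_cost_anomalies(costs_by_agent, z_threshold=2.0):
    """Return agents whose cost sits more than z_threshold sigmas above the mean.

    costs_by_agent maps an agent name to its spend (USD) in one time window.
    """
    values = list(costs_by_agent.values())
    if len(values) < 2:
        return []  # not enough data to define a spread
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every agent spent the same amount
    return [
        agent
        for agent, cost in costs_by_agent.items()
        if (cost - mu) / sigma > z_threshold
    ]
```

One caveat worth knowing: with a population standard deviation, a single outlier among n agents can never score above sqrt(n - 1), so with five or fewer agents nothing will cross a threshold of 2. Small fleets may need a lower threshold or a comparison against each agent's own history.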

Team Memory (shipped)

Agents save learnings across sessions
Effectiveness scoring (0.0–1.0)
Auto context injection — relevant memories prepended to prompts
Categories: solution, pattern, mistake, optimization
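
Here is one way scoring and context injection could fit together. Everything beyond the 0.0 to 1.0 range and the four categories is an assumption: the starting score, the moving-average update, the cut-off, and the prompt format are all made up for illustration, and relevance matching against the prompt is left out.

```python
from dataclasses import dataclass

CATEGORIES = {"solution", "pattern", "mistake", "optimization"}


@dataclass
class Memory:
    text: str
    category: str
    effectiveness: float = 0.5  # assumed neutral starting score

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def update_effectiveness(memory, helped, rate=0.2):
    """Move the score toward 1.0 if the memory helped, toward 0.0 if not.

    An exponential moving average is an assumed update rule; it keeps the
    score inside [0.0, 1.0] and lets unhelpful memories fade.
    """
    target = 1.0 if helped else 0.0
    score = memory.effectiveness + rate * (target - memory.effectiveness)
    memory.effectiveness = min(1.0, max(0.0, score))
    return memory.effectiveness


def inject_context(prompt, memories, top_k=3, min_score=0.3):
    """Prepend the highest-scoring memories to the prompt."""
    useful = sorted(
        (m for m in memories if m.effectiveness >= min_score),
        key=lambda m: m.effectiveness,
        reverse=True,
    )[:top_k]
    if not useful:
        return prompt
    lines = "\n".join(f"[{m.category}] {m.text}" for m in useful)
    return f"Relevant memories:\n{lines}\n\n{prompt}"
```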

Checkpoint & Replay (shipped)

Checkpoint after every chain step
Resume on failure with can_resume: true
Dead letter queue for chains that fail after all retries
Configurable retry policies: default, critical, fast_fail
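
The checkpoint and resume flow can be sketched with plain dictionaries. This is a simplified illustration, not the shipped code: the attempt counts per policy are guesses (the list above only names the policies), and the in-memory checkpoint store and dead letter list stand in for real persistence.

```python
# Attempt counts are assumptions; only the policy names come from the list above.
RETRY_ATTEMPTS = {"default": 3, "critical": 5, "fast_fail": 1}


def run_chain(chain_id, steps, initial_state, checkpoints, dead_letters,
              policy="default"):
    """Run steps in order, checkpointing after each one.

    steps is a list of callables that take the state and return a new state.
    A rerun with the same chain_id resumes from the last saved checkpoint.
    """
    saved = checkpoints.get(chain_id)
    start = saved["next_step"] if saved else 0
    state = saved["state"] if saved else initial_state

    for index in range(start, len(steps)):
        last_error = None
        for _ in range(RETRY_ATTEMPTS[policy]):
            try:
                state = steps[index](state)
                break
            except Exception as exc:  # a sketch; real code would narrow this
                last_error = exc
        else:
            # All retries failed: park the chain in the dead letter queue.
            dead_letters.append(
                {"chain_id": chain_id, "step": index, "error": repr(last_error)}
            )
            return {"status": "failed", "failed_step": index, "can_resume": True}
        checkpoints[chain_id] = {"next_step": index + 1, "state": state}

    return {"status": "done", "state": state}
```

Because the checkpoint is written only after a step succeeds, a resumed chain repeats the failed step but never the completed ones.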

Event-driven triggers (shipped)

5 trigger types: schedule, webhook, event_pattern, vault_change, threshold
4 action types: run_agent, run_chain, webhook_out, notify

Web UI: 23 pages (shipped)

Full Next.js 16 dashboard with collapsible sidebar
Dashboard, Chat, Agents, Pipelines, Businesses, Integrations
Vault, Research, Patterns, YouTube, Workflows, Gate
Costs, Traces, Triggers, Admin, Pricing, Account
Live support chat widget with typing indicators
Pricing page with 5 tiers and competitive comparison table

Telegram bot: fully functional (shipped)

/start with auto-registration and JWT token storage
/agents, /agent, /business, /status, /costs, /help
Free text → auto-routing to the right agent
Typing indicators while agents think
Auth headers on every API call

The numbers

Metric	Value
Services	30+
API endpoints	40+
Agent prompts (v3)	31 × 17 sections each
Tests	180+
Web UI pages	23
Telegram commands	15
Lines changed in one day	3,000+

One person. One AI build tool. One day.

What I learned

1. Security is invisible until it isn't. Nobody sees path traversal protection. But without it, the first user with ../../etc/passwd in a vault search owns your server. I'm glad GSD-2 caught every item from the CONCERNS.md audit.

2. Observability changes everything. Before traces, debugging a 5-agent chain was guesswork. Now I can see: request → router (2ms) → researcher (4.3s) → sentinel (1.1s) → warden (0.8s) → response. The bottleneck is always the researcher.

3. Cost circuit breakers are non-negotiable. Without them, one hallucinating agent in a loop burns through your OpenRouter budget in minutes. Our circuit breaker has 4 levels: per-request ($2), per-agent-hourly ($10), per-business-daily ($50), platform-hourly ($100).

4. Team Memory is the moat. Every business Kepion creates makes the next one better. Agents save what worked and what failed. Business #5 benefits from patterns discovered in businesses #1-4. This compounds. Competitors can copy the code — they can't copy the accumulated knowledge.
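
The four circuit-breaker levels from lesson 3 can be sketched like this. The dollar limits are the ones quoted above; the class, its method names, and the fixed hourly and daily windows are assumptions for illustration.

```python
import time
from collections import defaultdict

# Limits in USD, taken from lesson 3.
LIMITS_USD = {
    "per_request": 2.0,
    "per_agent_hourly": 10.0,
    "per_business_daily": 50.0,
    "platform_hourly": 100.0,
}


class CostCircuitBreaker:
    """Hypothetical breaker using fixed time buckets (not sliding windows)."""

    def __init__(self, limits=LIMITS_USD, clock=time.time):
        self._limits = dict(limits)
        self._clock = clock
        self._spend = defaultdict(float)  # bucket key -> USD spent

    def _buckets(self, agent, business):
        now = self._clock()
        hour, day = int(now // 3600), int(now // 86400)
        return {
            "per_agent_hourly": ("agent", agent, hour),
            "per_business_daily": ("business", business, day),
            "platform_hourly": ("platform", hour),
        }

    def check(self, agent, business, estimated_cost):
        """Return the name of the first limit that would trip, or None."""
        if estimated_cost > self._limits["per_request"]:
            return "per_request"
        for level, key in self._buckets(agent, business).items():
            if self._spend[key] + estimated_cost > self._limits[level]:
                return level
        return None

    def record(self, agent, business, cost):
        """Add the actual cost of a finished request to every bucket."""
        for key in self._buckets(agent, business).values():
            self._spend[key] += cost
```

A caller would run check() before each request and refuse it when a level name comes back, then call record() with the real cost afterwards. Old buckets are never evicted in this sketch, so a long-running process would need cleanup.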

What's next

Autonomous Operations — agents posting to Twitter, sending emails, running outreach. Every output goes through Sentinel (fact-check) and Warden (quality gate) before publishing. Quality over spam.
Full Deploy Pipeline — /deploy chess-school → buy domain → deploy frontend (Vercel) → deploy backend (Railway) → configure Paddle payments → live URL. One command.
Code Ownership — all generated code pushes to the user's GitHub. You own everything. Kepion is the builder, not the landlord.

Questions for you

I'm genuinely curious:

How do you handle AI agent costs in production? We built a 4-tier model routing system (Free → Budget → Performance → Premium) with auto-escalation on failure. Is anyone doing this differently?
Team Memory vs RAG — what's your experience? We went with vault-based memory with effectiveness scoring instead of pure vector search. The scoring means bad memories decay. Has anyone combined both approaches?
What's your threshold for "good enough" security in an MVP? We went aggressive (deny-by-default, path traversal, rate limiting) before launch. Some say ship fast, secure later. Curious where others draw the line.
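
For context on the first question, here is the escalation idea in its simplest form. The tier names are the four above; the function and the call_model callback are hypothetical stand-ins for whatever client actually talks to the models.

```python
TIERS = ["free", "budget", "performance", "premium"]


def call_with_escalation(prompt, call_model, start_tier="free"):
    """Try the cheapest tier first and move up one tier on each failure.

    call_model(tier, prompt) is a caller-supplied function that returns the
    model output or raises on failure. Returns (tier_used, output).
    """
    errors = []
    for tier in TIERS[TIERS.index(start_tier):]:
        try:
            return tier, call_model(tier, prompt)
        except Exception as exc:  # a sketch; real code would narrow this
            errors.append(f"{tier}: {exc}")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

What counts as a failure is the interesting part: an exception is easy to detect, while a low-quality answer needs a separate check before escalating.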

Follow the build: GitHub | kepion.app
