Originally published on AIdeazz — cross-posted here with canonical link.
After shipping a dozen AI agents into production—from financial advisors that text Venezuelan migrants via WhatsApp to multi-agent procurement systems running on Oracle Cloud—I've developed a specific definition. An AI agent observes its environment, decides on actions, executes those actions, and persists state between runs. Everything else is a chatbot with extra steps.
The Four Components That Define an AI Agent
When I started AIdeazz, I built chat wrappers. Called them agents. They weren't. Real agents need four components working together:
1. Observation: The agent must gather data from its environment. Not just user messages—system logs, API responses, database states, time of day, previous interaction history. My WhatsApp financial literacy agent observes not just the current question but the user's entire conversation history, their timezone, and their progress through educational modules.
2. Decision: Based on observations, the agent selects actions from its available set. This isn't prompt engineering—it's routing logic. My procurement agent decides whether to query inventory, contact a supplier, or escalate to a human based on stock levels, order urgency, and supplier response times.
3. Action: The agent executes in the real world. Database writes. API calls. Message sends. File generation. My invoice processing agent doesn't just extract data—it updates Oracle Fusion, triggers approval workflows, and sends notifications.
4. State Persistence: The agent remembers across sessions. Not just conversation history—goals, progress, pending tasks, learned preferences. Without state, you have a stateless function, not an agent.
Most "AI agents" fail this definition at step 3 or 4. They observe (read prompts) and decide (generate responses) but can't act beyond returning text or persist meaningful state beyond chat history.
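The four components fit into one loop. Here's a minimal sketch; the class and method names are illustrative, not from any of the production systems described:

```python
import time

class MinimalAgent:
    """Sketch of the observe/decide/act/persist cycle. All names are hypothetical."""

    def __init__(self):
        self.state = {"handled": []}  # persisted between runs

    def observe(self, environment):
        # Gather more than the latest message: include history and time.
        return {"events": environment, "history": self.state["handled"], "ts": time.time()}

    def decide(self, observation):
        # Routing logic, not prompt engineering: act only on new events.
        return [e for e in observation["events"] if e not in observation["history"]]

    def act(self, actions):
        # Real side effects go here: DB writes, API calls, message sends.
        return [f"handled:{a}" for a in actions]

    def persist(self, actions):
        # In production this would be a durable write (disk, database).
        self.state["handled"].extend(actions)

    def run(self, environment):
        obs = self.observe(environment)
        actions = self.decide(obs)
        results = self.act(actions)
        self.persist(actions)
        return results
```

Run it twice on the same event and the second run does nothing—that's the state persistence doing its job.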
Production Architecture: Multi-Agent Systems on Oracle
In production, single agents rarely work alone. My procurement system runs five specialized agents:
- Inventory Monitor: Observes stock levels, predicts depletion, triggers reorder workflows
- Supplier Liaison: Manages vendor communications, tracks response times, maintains relationship scores
- Price Optimizer: Analyzes market data, suggests order timing, negotiates bulk discounts
- Quality Auditor: Reviews shipments, tracks defect rates, updates supplier ratings
- Orchestrator: Coordinates the other four, resolves conflicts, maintains system state
Each agent runs as a containerized service on Oracle Cloud Infrastructure. They communicate through a message queue (Oracle Streaming) with a shared state store (Autonomous Database). The orchestrator prevents race conditions and ensures consistency.
Architecture decisions that matter:
- Synchronous vs Asynchronous: Async everywhere except user-facing confirmations. Agents process observations in parallel.
- State Storage: JSON documents in Oracle Autonomous. Each agent maintains local state with periodic sync to global state.
- Failure Handling: Each action is idempotent. Failed actions retry with exponential backoff. State rollback on critical failures.
- Resource Limits: CPU/memory caps per agent. Groq for speed-critical decisions, Claude for complex reasoning. Hard timeout of 30 seconds per decision cycle.
This isn't theoretical. This system processes ~1,200 purchase orders monthly for three companies in Panama City.
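The failure-handling rule above—idempotent actions retried with exponential backoff—can be sketched as a decorator. The attempt count and base delay here are illustrative, not the production values:

```python
import functools
import time

def retry(attempts=4, base_delay=0.5):
    """Retry an idempotent action with exponential backoff: 0.5s, 1s, 2s, ..."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # exhausted: caller triggers state rollback
                    time.sleep(base_delay * 2 ** attempt)
        return inner
    return wrap
```

Because each action is idempotent, retrying a half-completed one is safe—that property is what makes this decorator harmless.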
Routing Decisions: When to Use Groq vs Claude
Model routing is where production differs from demos. My agents use three decision paths:
Groq (Llama 3.1 70B): Speed-critical, well-defined decisions. Inventory level checks, simple classifications, standard response generation. 200ms average latency. Costs $0.002 per decision.
Claude (3.5 Sonnet): Complex reasoning, multi-step planning, nuanced communication. Supplier negotiation, quality analysis, conflict resolution. 2-3 second latency. Costs $0.015 per decision.
Hardcoded Logic: Deterministic operations. Database queries, calculations, data validation. 10ms latency. Free.
The router itself uses Groq—a quick classification of complexity and urgency. High-urgency + low-complexity goes to Groq. Low-urgency + high-complexity goes to Claude. Everything else tries hardcoded logic first.
Real example: Invoice processing agent receives a PDF. Router sends to:
- Hardcoded logic: Check file format, size, sender domain
- Groq: Extract standard fields (invoice number, date, amount)
- Claude: Handle exceptions (non-standard formats, multiple currencies, partial payments)
This hybrid approach cut our per-invoice processing cost from $0.08 (all Claude) to $0.012 (mixed routing) while maintaining accuracy.
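The two-axis routing described above can be sketched in a few lines. The thresholds are illustrative, and the complexity/urgency scores are assumed to be precomputed by the fast classifier:

```python
def route(task):
    """Pick an execution path from complexity/urgency scores in [0, 1].
    In production the scores come from a quick Groq classification;
    here they are assumed to already sit on the task dict."""
    c, u = task["complexity"], task["urgency"]
    if u >= 0.7 and c <= 0.3:
        return "groq"       # high urgency, low complexity: speed wins
    if u <= 0.3 and c >= 0.7:
        return "claude"     # low urgency, high complexity: reasoning wins
    return "hardcoded"      # everything else tries deterministic logic first
```

The deterministic fallback is deliberate: a free 10ms path should always get first refusal before any paid inference.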
State Management: Beyond Chat Memory
Chat memory isn't agent state. Agent state includes:
Goal Trees: What the agent is trying to accomplish. My tutoring agent maintains a tree of learning objectives per student. Each interaction updates progress percentages.
Task Queues: Pending actions with priorities and deadlines. The procurement orchestrator maintains queues per supplier with SLA tracking.
Relationship Maps: Understanding of entities and their connections. My CRM agent tracks not just contact info but interaction patterns, response rates, deal probabilities.
Learned Parameters: Optimized thresholds and weights. The pricing agent adjusts markup percentages based on win/loss rates.
We store state in three layers:
- Hot State: Redis cache for active sessions. Sub-millisecond access. 24-hour TTL.
- Warm State: Oracle Autonomous JSON. Recent history and active goals. 50ms query time.
- Cold State: Object Storage for full history. Compressed monthly archives. Retrieved only for analysis.
State sync is eventually consistent. Agents write to hot state immediately, batch sync to warm state every 60 seconds. Critical state changes (goal completion, task creation) trigger immediate sync.
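The three-layer read/write path can be sketched with plain dictionaries standing in for Redis, Oracle Autonomous JSON, and Object Storage. Promoting cold hits back into hot state is an assumption for the sketch, not necessarily how the production system behaves:

```python
hot, warm, cold = {}, {}, {}   # stand-ins for Redis / Autonomous JSON / Object Storage

def read_state(key):
    """Check hot first, fall back to warm, then cold; promote hits upward."""
    for tier in (hot, warm, cold):
        if key in tier:
            value = tier[key]
            hot[key] = value   # promote so the next read is sub-millisecond
            return value
    return None

def write_state(key, value, critical=False):
    hot[key] = value           # agents write to hot state immediately
    if critical:
        warm[key] = value      # critical changes trigger immediate sync
    # non-critical changes batch-sync to warm every 60 seconds (not shown)
```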
Telegram and WhatsApp: Agents in Messaging Constraints
Half my agents operate through Telegram or WhatsApp. This introduces specific constraints:
Message Limits: WhatsApp Business API allows 1,000 free conversations/month, then $0.005-0.08 each. Agents must batch responses and avoid conversation splits.
Media Handling: Both platforms compress images, limit file types. My document analysis agent converts everything to PDF before processing.
Session Windows: WhatsApp's 24-hour session window means agents must prompt users before timeout or pay to reinitiate.
User Expectations: Instant responses expected. My architecture uses streaming responses—send "typing" indicator immediately, stream chunks as processed.
Telegram is more flexible (no conversation charges, better API limits) but reaches fewer users in Latin America. I route tech-savvy users to Telegram, everyone else to WhatsApp.
Implementation pattern that works:
1. Receive message via webhook
2. Send immediate acknowledgment (typing indicator)
3. Queue for processing with 5-second timeout
4. If simple query: respond directly
5. If complex: send progress updates every 2 seconds
6. If timeout: apologize, offer simpler options
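The pattern above can be sketched with asyncio. The `send` and `typing` callables stand in for the platform API calls, `process` is a placeholder for the real pipeline, and the 5-second budget matches step 3:

```python
import asyncio

async def handle_message(text, send, typing):
    """Webhook pattern: ack immediately, process under a timeout, degrade gracefully."""
    await typing()                                   # step 2: immediate acknowledgment
    try:
        reply = await asyncio.wait_for(process(text), timeout=5.0)  # step 3
        await send(reply)                            # step 4: respond directly
    except asyncio.TimeoutError:
        # step 6: apologize, offer simpler options
        await send("This is taking longer than expected. Want a summary instead?")

async def process(text):
    # Placeholder for routing, LLM inference, and database writes.
    await asyncio.sleep(0.01)
    return f"echo: {text}"
```

Progress updates for complex queries (step 5) would hang off a second timer inside the `try` block; omitted here to keep the sketch short.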
Common Failure Modes in Agent Design
From my failures and partial successes:
The Overconfident Agent: Early versions made decisions without confidence thresholds. My financial advisor once recommended cryptocurrency to a risk-averse retiree. Now every decision includes confidence scoring. Below 0.7 triggers human review.
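The confidence gate is a thin wrapper. The 0.7 threshold matches the figure above; the dict shape is illustrative:

```python
REVIEW_THRESHOLD = 0.7

def gated_decision(recommendation, confidence):
    """Route low-confidence decisions to a human instead of executing them."""
    if confidence < REVIEW_THRESHOLD:
        return {"action": "escalate_to_human", "pending": recommendation,
                "reason": f"confidence {confidence:.2f} below {REVIEW_THRESHOLD}"}
    return {"action": "execute", "recommendation": recommendation}
```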
The State Explosion: Agents that remember everything eventually slow down. My first CRM agent loaded full history for every decision. Now we use sliding windows—last 30 days hot, last 180 days warm, everything else cold.
The Infinite Loop: Agent A asks Agent B for data. B needs approval from A. Deadlock. Solution: global interaction counter with hard limits. Any agent conversation over 10 exchanges triggers orchestrator intervention.
The Hallucination Chain: One agent's hallucination becomes another's trusted input. My procurement agent once ordered 10,000 units instead of 100 because the inventory agent hallucinated a decimal point. Now all inter-agent communications include source citations and confidence scores.
The Cost Spiral: Early multi-agent system burned $500 in a day due to recursive Claude calls. Now every agent has daily spend limits and falls back to cheaper models when approaching limits.
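The spend guard can be sketched as a per-agent budget tracker that downgrades the model before the cap is hit. The per-decision prices match the figures earlier in this post; the cap and fallback margin are illustrative:

```python
PRICES = {"claude": 0.015, "groq": 0.002}

class SpendGuard:
    """Track daily spend per agent; fall back to the cheaper model near the cap."""
    def __init__(self, daily_cap=25.0, fallback_margin=0.9):
        self.daily_cap = daily_cap
        self.fallback_at = daily_cap * fallback_margin
        self.spent = 0.0

    def choose(self, preferred):
        if self.spent >= self.daily_cap:
            raise RuntimeError("daily spend cap reached; halt agent")
        model = preferred
        if self.spent >= self.fallback_at and preferred == "claude":
            model = "groq"             # approaching the limit: degrade, don't die
        self.spent += PRICES[model]
        return model
```

The hard `RuntimeError` at the cap is the important part—it converts a runaway recursion into a loud, bounded failure.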
Real Constraints: Latency, Cost, and Reliability
Production agents face constraints demos ignore:
Latency Budget: Users tolerate 3-5 seconds for complex requests, under 1 second for simple ones. My architecture allocates:
- Network round trip: 200ms
- Database query: 100ms
- LLM inference: 500ms-3s (model dependent)
- Business logic: 200ms
- Buffer: 500ms
Cost Per Decision: Running agents at scale means watching pennies:
- Groq: $0.002/decision
- Claude: $0.015/decision
- Database read: $0.0001
- Database write: $0.0003
- Message send: $0.005 (WhatsApp)
A complex user interaction might involve 5 decisions, 10 database operations, 3 messages. Total cost: ~$0.10. Price accordingly.
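A quick worked version of that estimate, using the unit costs above. The exact decision mix is an assumption; the worst case prices all five decisions at Claude rates:

```python
COSTS = {
    "groq_decision": 0.002,
    "claude_decision": 0.015,
    "db_read": 0.0001,
    "db_write": 0.0003,
    "whatsapp_send": 0.005,
}

def interaction_cost(groq=0, claude=0, reads=0, writes=0, messages=0):
    """Sum the unit costs for one user interaction."""
    return (groq * COSTS["groq_decision"] + claude * COSTS["claude_decision"]
            + reads * COSTS["db_read"] + writes * COSTS["db_write"]
            + messages * COSTS["whatsapp_send"])

# Worst case: all five decisions on Claude, 10 DB ops (7 reads, 3 writes), 3 messages.
worst = interaction_cost(claude=5, reads=7, writes=3, messages=3)  # ≈ $0.09
```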
Reliability Requirements: My financial advisory agent maintains 99.5% uptime. Architecture implications:
- Multi-region deployment (Oracle Ashburn and Phoenix)
- Database replication with automatic failover
- Circuit breakers on all external APIs
- Graceful degradation (if Claude is down, use Groq with warnings)
- Health checks every 30 seconds with auto-restart
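Two of those bullets—circuit breakers and graceful degradation—combine naturally into one pattern. A sketch, with illustrative failure threshold and cool-down:

```python
import time

class CircuitBreaker:
    """Open after consecutive failures; route to a fallback until the cool-down expires."""
    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                return fallback()          # degraded mode, e.g. Groq with a warning
            self.opened_at = None          # cool-down over: try primary again
            self.failures = 0
        try:
            result = primary()             # e.g. the Claude call
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
```

While the breaker is open, the primary is never even attempted—that's what stops a dead dependency from eating the whole latency budget.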
Building Your First Real Agent
Skip the chat wrapper. Build something that acts:
Pick One Real Action: Database write, API call, file generation. Not just text responses.
Design State First: What must persist between runs? User preferences? Task progress? Learned parameters?
Start Single-Agent: Multi-agent is powerful but complex. One agent that does one thing well beats five that step on each other.
Instrument Everything: Log every observation, decision, action, state change. You'll need these when debugging why your agent ordered 10,000 widgets at 3 AM.
Plan for Failure: Every external call will fail. Every LLM will hallucinate. Every state will corrupt. Design recovery into the architecture.
My simplest production agent monitors server logs and creates Jira tickets for errors. 200 lines of Python. Saves 2 hours daily. That's an agent—observes (logs), decides (error severity), acts (creates tickets), persists (tracks which errors are already reported).
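That log-monitor agent fits in a few dozen lines. A sketch with the Jira call stubbed out—the real integration is one REST request:

```python
import re

SEEN = set()   # persisted state: which errors were already ticketed

def observe(log_lines):
    """Pull ERROR lines out of the log stream."""
    return [line for line in log_lines if re.search(r"\bERROR\b", line)]

def decide(errors):
    """Only report errors we haven't ticketed yet (dedupe on message text)."""
    unique = dict.fromkeys(errors)   # drop duplicates, keep order
    return [e for e in unique if e not in SEEN]

def act(new_errors, create_ticket):
    """Create one ticket per new error, persist what was reported."""
    for e in new_errors:
        create_ticket(summary=e)     # in production: one Jira REST API call
        SEEN.add(e)
    return len(new_errors)
```

Observe, decide, act, persist—all four components, no chat in sight.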
What is an AI agent? A system that observes, decides, acts, and remembers. Everything else is varying degrees of chatbot. The difference matters when your code starts moving money, sending messages, or making promises on your behalf.