I built a fully working voice-controlled AI agent that transcribes speech, classifies intent, and executes local tools, all powered by Groq's free AI APIs.
🎯 What It Does
You speak → It listens → It understands → It acts.
- Say "Write a Python retry function" → generates code and saves it to a file
- Say "Summarize this text" → returns a clean summary
- Say "What is machine learning?" → responds conversationally
- Say "Create a file called notes.txt" → creates the file safely
🏗️ Architecture
Audio Input → Groq Whisper (STT) → Groq LLaMA 3.3 70B (Intent) → Tool Execution → Streamlit UI
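As a rough sketch, that chain can be wired as a pipeline of pluggable stages. The names below (`run_pipeline`, `transcribe`, `classify`, `tools`) are illustrative placeholders, not the project's actual code; in the real app the first two stages would call Groq Whisper and LLaMA 3.3 70B.

```python
# Minimal pipeline skeleton with injected stages, so the STT and LLM
# calls can be swapped out or stubbed for testing. All names here are
# illustrative assumptions, not the repo's API.
from typing import Callable, Dict

def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],         # audio file -> transcript
    classify: Callable[[str], dict],          # transcript -> intent dict
    tools: Dict[str, Callable[[dict], str]],  # intent name -> handler
) -> str:
    text = transcribe(audio_path)             # Groq Whisper in the real app
    intent = classify(text)                   # Groq LLaMA 3.3 70B in the real app
    handler = tools.get(intent["intent"], tools["chat"])  # unknown -> chat
    return handler(intent)
```

Keeping the stages injectable also makes the fallback behavior easy: an unrecognized intent simply routes to the conversational handler.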
🧠 Models I Chose & Why
Speech-to-Text: Groq Whisper large-v3
- Transcribes audio in under 2 seconds
- Free tier available, no GPU needed
LLM: Groq LLaMA 3.3 70B Versatile
- Accurately classifies intent from natural speech
- Handles compound commands like "write X and save to Y.py"
⚙️ Tech Stack
- Streamlit: Web UI
- Groq API: STT + LLM
- Python: Backend logic
🔧 Challenges I Faced
1. Model Deprecation
During development, llama3-8b-8192 was decommissioned by Groq, so I switched to llama-3.3-70b-versatile, which is more powerful and still available on the free tier.
2. Compound Commands
Handling commands like "Write a bubble sort and save it to sort.py" required careful prompt engineering to extract both the intent and filename simultaneously.
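One way to make that extraction robust (a sketch under my own assumptions, not the post's actual prompt or parser) is to have the model answer with a small JSON object and then parse the reply defensively:

```python
import json
import re

# Hypothetical reply format: the LLM is prompted to answer with JSON like
# {"intent": "code", "task": "write a bubble sort", "filename": "sort.py"}.
def parse_intent_reply(reply: str) -> dict:
    """Pull the first JSON object out of a model reply, tolerating code
    fences or chatter around it; fall back to plain chat on failure."""
    fallback = {"intent": "chat", "task": reply, "filename": None}
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        return fallback
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    data.setdefault("filename", None)  # simple commands carry no filename
    return data
```

With a schema like this, "Write a bubble sort and save it to sort.py" yields both the coding task and the target filename in a single round trip.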
3. Safe File Operations
All file writes are sandboxed to an output/ folder with path traversal protection so no system files can be accidentally overwritten.
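A guard along these lines is enough to keep writes inside output/. The function name and exact checks are my assumption, not the repo's code:

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_write(filename: str, content: str) -> Path:
    """Write content under output/ only; reject '../' traversal and
    absolute paths that would escape the sandbox."""
    target = (OUTPUT_DIR / filename).resolve()
    if not target.is_relative_to(OUTPUT_DIR):  # Python 3.9+
        raise ValueError(f"unsafe path outside output/: {filename}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

Resolving the joined path first is the key step: it normalizes any `..` segments before the containment check, so tricks like `sub/../../etc/passwd` are caught.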
✨ Bonus Features
- ✅ Human-in-the-loop confirmation before file operations
- ✅ Session memory: last 4 turns passed as context
- ✅ Auto fallback if the API fails
- ✅ Compound command support
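The session-memory feature can be sketched as a simple window over the chat history. The function name and message shape are assumptions; the 4-turn window comes from the post:

```python
def build_messages(system_prompt: str, history: list[dict],
                   user_msg: str, max_turns: int = 4) -> list[dict]:
    """Assemble a chat request keeping only the last `max_turns`
    user/assistant exchanges (2 messages per turn) as context."""
    recent = history[-2 * max_turns:]  # drop everything older
    return ([{"role": "system", "content": system_prompt}]
            + recent
            + [{"role": "user", "content": user_msg}])
```

Capping the window keeps latency and token usage flat no matter how long the session runs.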
🔗 Links
Thanks for reading! Feel free to star the repo if you found it useful ⭐