Built as part of the Mem0 AI — MLOps and AI Infra Internship Assignment
A voice-controlled AI agent that accepts audio input, classifies intent, executes local tools, and displays results in a clean UI — all powered by Groq's free API.
🎥 Video Demo: [YouTube Link Here]
📝 Article: [Dev.to/Medium Article Link Here]
- 🎤 Dual Audio Input — Record from microphone OR upload .wav/.mp3 files
- 🔊 Speech-to-Text — Powered by Groq Whisper Large V3
- 🧠 Intent Detection — LLaMA3-70B classifies user commands into 4 intents
- ⚡ Tool Execution — Creates files, writes code, summarizes text, or chats
- 🧩 Session Memory — Tracks all actions taken during the session
- 🖥️ Clean Gradio UI — Shows transcription, intent, action, and output
User speaks
↓
Audio file (.wav/.mp3)
↓
[stt.py] Groq Whisper → Transcribed Text
↓
[intent.py] LLaMA3-70B → Intent Classification
↓
[tools.py] Tool Router
├── create_file → Creates file in output/
├── write_code → Generates code, saves to output/
├── summarize → Summarizes text, saves to output/summary.txt
└── general_chat → LLM conversation
↓
[app.py] Gradio UI → Displays all results
↓
Session History (In-memory)
| Component | Technology | Why Chosen |
|---|---|---|
| UI | Gradio 4.x | Fast to build, professional look |
| STT | Groq Whisper Large V3 | Free, fast, no GPU required |
| LLM | Groq LLaMA3-70B | Free API, excellent performance |
| File Ops | Python pathlib | Built-in, reliable |
| Env Vars | python-dotenv | Secure API key management |
This project uses Groq's API for both Speech-to-Text and LLM inference instead of running models locally. This was chosen because:
- Groq provides free tier access
- It achieves ultra-fast inference (faster than most local setups)
- It makes the project hardware-agnostic — runs on any machine
- Whisper Large V3 and LLaMA3-70B are available on Groq for free
- Python 3.9 or higher installed
- A free Groq account
git clone https://github.com/YOUR_USERNAME/voice-ai-agent.git
cd voice-ai-agentpip install -r requirements.txt- Go to console.groq.com
- Sign up for free
- Click API Keys → Create API Key
- Open the
.envfile and replaceyour_groq_api_key_herewith your actual key:
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxx
python app.pyThen open your browser and go to: http://localhost:7860
voice-ai-agent/
├── app.py ← Main Gradio UI application
├── stt.py ← Speech-to-Text (Groq Whisper)
├── intent.py ← Intent classification (LLaMA3)
├── tools.py ← Tool execution (files, code, summarize, chat)
├── config.py ← Loads .env file
├── requirements.txt ← Python dependencies
├── .env ← Your API key (never commit this!)
├── .gitignore ← Excludes .env from git
└── output/ ← All generated files go here (auto-created)
| Intent | Example Command | Action |
|---|---|---|
| Create File | "Create a text file called notes" | Creates empty file in output/ |
| Write Code | "Write Python code for a retry function" | Generates & saves code to output/ |
| Summarize | "Summarize the benefits of machine learning" | Generates summary, saves to output/summary.txt |
| General Chat | "What is artificial intelligence?" | Responds conversationally |
- ✅ Session Memory — All actions tracked throughout the session
- ✅ Compound Commands — Handles multiple intents in one command
- ✅ Graceful Degradation — Clear error messages for all failure cases
- ✅ Safety Constraint — ALL file operations restricted to
output/folder
Built for the Mem0 AI MLOps & AI Infra Internship assignment. Inspired by the concept of persistent memory in AI agents — a core problem that Mem0 is solving.
