🎙️ Voice-Controlled Local AI Agent

Built as part of the Mem0 AI — MLOps and AI Infra Internship Assignment

A voice-controlled AI agent that accepts audio input, classifies intent, executes local tools, and displays results in a clean UI — all powered by Groq's free API.

📸 Demo

🎥 Video Demo: [YouTube Link Here]
📝 Article: [Dev.to/Medium Article Link Here]

✨ Features

🎤 Dual Audio Input — Record from microphone OR upload .wav/.mp3 files
🔊 Speech-to-Text — Powered by Groq Whisper Large V3
🧠 Intent Detection — LLaMA3-70B classifies user commands into 4 intents
⚡ Tool Execution — Creates files, writes code, summarizes text, or chats
🧩 Session Memory — Tracks all actions taken during the session
🖥️ Clean Gradio UI — Shows transcription, intent, action, and output

🏗️ Architecture

User speaks
    ↓
Audio file (.wav/.mp3)
    ↓
[stt.py] Groq Whisper → Transcribed Text
    ↓
[intent.py] LLaMA3-70B → Intent Classification
    ↓
[tools.py] Tool Router
    ├── create_file  → Creates file in output/
    ├── write_code   → Generates code, saves to output/
    ├── summarize    → Summarizes text, saves to output/summary.txt
    └── general_chat → LLM conversation
    ↓
[app.py] Gradio UI → Displays all results
    ↓
Session History (In-memory)

🛠️ Tech Stack

Component	Technology	Why Chosen
UI	Gradio 4.x	Fast to build, professional look
STT	Groq Whisper Large V3	Free, fast, no GPU required
LLM	Groq LLaMA3-70B	Free API, excellent performance
File Ops	Python pathlib	Built-in, reliable
Env Vars	python-dotenv	Secure API key management

🔧 Hardware Note

This project uses Groq's API for both Speech-to-Text and LLM inference instead of running models locally. This was chosen because:

Groq provides free tier access
It achieves ultra-fast inference (faster than most local setups)
It makes the project hardware-agnostic — runs on any machine
Whisper Large V3 and LLaMA3-70B are available on Groq for free

🚀 Setup Instructions

Step 1 — Prerequisites

Python 3.9 or higher installed
A free Groq account

Step 2 — Clone the Repository

git clone https://github.com/YOUR_USERNAME/voice-ai-agent.git
cd voice-ai-agent

Step 3 — Install Dependencies

pip install -r requirements.txt

Step 4 — Set Up API Key

Go to console.groq.com
Sign up for free
Click API Keys → Create API Key
Open the .env file and replace your_groq_api_key_here with your actual key:

GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxx

Step 5 — Run the App

python app.py

Then open your browser and go to: http://localhost:7860

📁 Project Structure

voice-ai-agent/
├── app.py              ← Main Gradio UI application
├── stt.py              ← Speech-to-Text (Groq Whisper)
├── intent.py           ← Intent classification (LLaMA3)
├── tools.py            ← Tool execution (files, code, summarize, chat)
├── config.py           ← Loads .env file
├── requirements.txt    ← Python dependencies
├── .env                ← Your API key (never commit this!)
├── .gitignore          ← Excludes .env from git
└── output/             ← All generated files go here (auto-created)

🎯 Supported Intents

Intent	Example Command	Action
Create File	"Create a text file called notes"	Creates empty file in output/
Write Code	"Write Python code for a retry function"	Generates & saves code to output/
Summarize	"Summarize the benefits of machine learning"	Generates summary, saves to output/summary.txt
General Chat	"What is artificial intelligence?"	Responds conversationally

🌟 Bonus Features Implemented

✅ Session Memory — All actions tracked throughout the session
✅ Compound Commands — Handles multiple intents in one command
✅ Graceful Degradation — Clear error messages for all failure cases
✅ Safety Constraint — ALL file operations restricted to output/ folder

🙏 Acknowledgements

Built for the Mem0 AI MLOps & AI Infra Internship assignment. Inspired by the concept of persistent memory in AI agents — a core problem that Mem0 is solving.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎙️ Voice-Controlled Local AI Agent

📸 Demo

✨ Features

🏗️ Architecture

🛠️ Tech Stack

🔧 Hardware Note

🚀 Setup Instructions

Step 1 — Prerequisites

Step 2 — Clone the Repository

Step 3 — Install Dependencies

Step 4 — Set Up API Key

Step 5 — Run the App

📁 Project Structure

🎯 Supported Intents

🌟 Bonus Features Implemented

🙏 Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
app.py		app.py
config.py		config.py
intent.py		intent.py
requirements.txt		requirements.txt
stt.py		stt.py
tools.py		tools.py

Folders and files

Latest commit

History

Repository files navigation

🎙️ Voice-Controlled Local AI Agent

📸 Demo

✨ Features

🏗️ Architecture

🛠️ Tech Stack

🔧 Hardware Note

🚀 Setup Instructions

Step 1 — Prerequisites

Step 2 — Clone the Repository

Step 3 — Install Dependencies

Step 4 — Set Up API Key

Step 5 — Run the App

📁 Project Structure

🎯 Supported Intents

🌟 Bonus Features Implemented

🙏 Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages