
DEV Community

Nishtha Darji

How I Built a Voice-Controlled AI Agent with Groq & Streamlit

I built a fully working voice-controlled AI agent that transcribes speech, classifies intent, and executes local tools, all powered by Groq's free AI APIs.

🎯 What It Does

You speak → It listens → It understands → It acts.

  • Say "Write a Python retry function" → generates code and saves it to a file
  • Say "Summarize this text" → returns a clean summary
  • Say "What is machine learning?" → responds conversationally
  • Say "Create a file called notes.txt" → creates the file safely

πŸ—οΈ Architecture

Audio Input → Groq Whisper (STT) → Groq LLaMA 3.3 70B (Intent) → Tool Execution → Streamlit UI
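
At a high level, the pipeline can be sketched as a thin orchestrator: each stage is a function, and a registry maps the classified intent to a tool handler. The intent names and the `TOOLS` registry below are illustrative placeholders, not the project's actual code.

```python
# Minimal orchestration sketch. Intent labels and handlers are
# hypothetical; the real project wires these to Groq calls and a UI.

def answer_question(args: dict) -> str:
    # Placeholder conversational handler.
    return f"Answering: {args.get('query', '')}"

def create_file(args: dict) -> str:
    # Placeholder file-creation handler.
    return f"Would create {args.get('filename', 'untitled.txt')}"

# Registry mapping a classified intent to its tool function.
TOOLS = {
    "chat": answer_question,
    "create_file": create_file,
}

def dispatch(intent: str, args: dict) -> str:
    """Route a classified intent to the matching tool, falling back to chat."""
    handler = TOOLS.get(intent, answer_question)
    return handler(args)
```

Keeping tools in a plain dict makes adding a new voice command a one-line change: write the handler, register it under a new intent label.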

🧠 Models I Chose & Why

Speech-to-Text: Groq Whisper large-v3

  • Transcribes audio in under 2 seconds
  • Free tier available, no GPU needed
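
A minimal transcription sketch using the Groq Python SDK (its audio endpoint mirrors the OpenAI client shape). The file path and the extension allow-list are my assumptions, and the client reads `GROQ_API_KEY` from the environment:

```python
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}

def is_supported_audio(path: str) -> bool:
    """Cheap pre-check before uploading: only send common audio formats."""
    return any(path.lower().endswith(ext) for ext in ALLOWED_EXTENSIONS)

def transcribe(path: str) -> str:
    """Send an audio file to Groq-hosted Whisper and return the text."""
    from groq import Groq  # assumes `pip install groq`

    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio format: {path}")
    client = Groq()  # picks up GROQ_API_KEY from the environment
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=f,
            model="whisper-large-v3",
        )
    return result.text
```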

LLM: Groq LLaMA 3.3 70B Versatile

  • Accurately classifies intent from natural speech
  • Handles compound commands like "write X and save to Y.py"

⚙️ Tech Stack

  • Streamlit – Web UI
  • Groq API – STT + LLM
  • Python – Backend logic

🚧 Challenges I Faced

1. Model Deprecation
During development, llama3-8b-8192 was decommissioned by Groq, so I switched to llama-3.3-70b-versatile, which is more powerful and still free.

2. Compound Commands
Handling commands like "Write a bubble sort and save it to sort.py" required careful prompt engineering to extract both the intent and filename simultaneously.
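
One robust pattern here is to ask the model for both fields in a single JSON object, then parse defensively, since models sometimes wrap JSON in prose or a markdown fence. A sketch (the field names are assumptions carried over from the illustrative prompt above):

```python
import json
import re

def parse_intent_reply(raw: str) -> dict:
    """Parse the model's JSON reply; fall back to plain chat on failure."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # grab the first {...} span
    if match:
        try:
            data = json.loads(match.group(0))
            return {
                "intent": data.get("intent", "chat"),
                "filename": data.get("filename"),
            }
        except json.JSONDecodeError:
            pass
    # Anything unparseable is treated as a conversational turn.
    return {"intent": "chat", "filename": None}
```

So a reply like `Sure! {"intent": "write_code", "filename": "sort.py"}` still yields both the intent and the target filename in one pass.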

3. Safe File Operations
All file writes are sandboxed to an output/ folder with path traversal protection so no system files can be accidentally overwritten.
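
The sandboxing idea can be sketched in a few lines with `pathlib`: resolve the requested path and refuse it unless the `output/` directory is among its ancestors. Function and directory names are placeholders:

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_write(filename: str, content: str) -> Path:
    """Write only inside output/, rejecting path-traversal attempts."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    target = (OUTPUT_DIR / filename).resolve()
    # A traversal like "../../etc/passwd" resolves outside OUTPUT_DIR,
    # so OUTPUT_DIR will not appear among its parents.
    if OUTPUT_DIR not in target.parents:
        raise ValueError(f"Refusing to write outside output/: {filename}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

Resolving *before* the ancestor check is the key step: it also catches absolute paths, since `Path("output") / "/etc/passwd"` resolves straight to `/etc/passwd`.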

✨ Bonus Features

  • ✅ Human-in-the-loop confirmation before file operations
  • ✅ Session memory – the last 4 turns are passed as context
  • ✅ Automatic fallback if an API call fails
  • ✅ Compound command support
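
The session-memory feature above fits naturally on a bounded deque: old turns fall off automatically once the cap is reached. A sketch (class and method names are mine, not the project's):

```python
from collections import deque

class SessionMemory:
    """Keep the last N user/assistant turns to pass as chat context."""

    def __init__(self, max_turns: int = 4):
        # Each turn is a (user, assistant) pair; deque(maxlen=...) evicts
        # the oldest turn automatically when a new one is appended.
        self.turns = deque(maxlen=max_turns)

    def add(self, user_text: str, assistant_text: str) -> None:
        self.turns.append((user_text, assistant_text))

    def as_messages(self) -> list:
        """Flatten stored turns into chat-completion message dicts."""
        messages = []
        for user_text, assistant_text in self.turns:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_text})
        return messages
```

Capping at 4 turns keeps the prompt short (and the free-tier token bill low) while still letting follow-ups like "now save that to a file" resolve correctly.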

🔗 Links

Thanks for reading! Feel free to star the repo if you found it useful ⭐
