This project implements a local AI agent that processes user input through voice or text, understands intent, performs actions, and returns results through a structured interface.
The system uses local models for speech recognition and language understanding, ensuring it works without paid APIs.
- Audio input via file upload (.wav, .mp3)
- Text input support
- Speech-to-text using Whisper (local)
- Intent detection using rule-based logic with an LLM fallback (Ollama)
- Tool execution:
  - File creation
  - Python code generation and saving
  - Text summarization
  - General chat responses
- Compound command handling (multiple actions in one input)
- Persistent chat history using JSON storage
- Clean Streamlit UI displaying:
  - Transcription
  - Intent
  - Final output
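Rule-based intent detection with an LLM fallback can be sketched as below. The patterns and the `llm_fallback` hook are illustrative assumptions, not the project's actual rules in intent.py; in the real app the fallback would wrap a call to the local Ollama model.

```python
import re

# Illustrative rule patterns mapped to intent labels (not the project's
# actual rule set). Order matters: more specific patterns come first.
RULES = [
    (r"\bcreate\b.*\.py\b", "create_python_file"),
    (r"\bcreate\b.*\bfile\b", "create_file"),
    (r"\bsummar(y|ize)\b", "summarize"),
]

def detect_intent(text: str, llm_fallback=None) -> str:
    """Return an intent label; defer to an LLM only when no rule matches."""
    lowered = text.lower()
    for pattern, intent in RULES:
        if re.search(pattern, lowered):
            return intent
    # Hypothetical hook: a function wrapping the Ollama chat API would go here.
    if llm_fallback is not None:
        return llm_fallback(text)
    return "chat"  # default: treat unmatched input as general chat
```

Keeping the rules first makes the common commands fast and deterministic; the LLM is only consulted for inputs the rules cannot classify.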
Input (Audio/Text)
↓
Speech-to-Text (Whisper)
↓
Intent Detection
↓
Action Execution
↓
Output + Storage
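The four stages above can be wired together in a few lines. Everything here is a stand-in for the real modules (stt.py, intent.py, tools.py, memory.py); the point is only to show the control flow.

```python
# Stand-ins for the real modules; each comment names the module it mimics.

def transcribe(audio_path):            # stt.py: Whisper in the real app
    return "create a file notes.txt"   # stubbed transcription result

def detect_intent(text):               # intent.py: rules + LLM fallback
    return "create_file" if "create" in text and "file" in text else "chat"

def execute(intent, text):             # tools.py: action execution
    return f"[{intent}] handled: {text}"

history = []

def save(entry):                       # memory.py: JSON persistence
    history.append(entry)

def run_pipeline(audio_path=None, text=None):
    # Step 1: speech-to-text, only when audio is supplied
    if audio_path is not None:
        text = transcribe(audio_path)
    # Step 2: intent detection
    intent = detect_intent(text)
    # Step 3: action execution
    output = execute(intent, text)
    # Step 4: output + storage
    save({"input": text, "intent": intent, "output": output})
    return output
```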
voice-ai-agent/
│
├── app.py # Streamlit application
├── stt.py # Speech-to-text (Whisper)
├── intent.py # Intent detection logic
├── tools.py # Action execution
├── memory.py # Persistent chat storage
├── output/ # Generated files
├── chat_history.json # Saved chat history
└── README.md
git clone <your-repository-link>
cd voice-ai-agent

Install dependencies:
uv add streamlit openai-whisper ollama

Whisper requires FFmpeg.
Download: https://www.gyan.dev/ffmpeg/builds/
Add to PATH:
C:\ffmpeg\bin
Verify:
ffmpeg -version

Install Ollama.
Download: https://ollama.com
Pull model:
ollama pull llama3
Verify:
ollama run llama3

Run the app:
uv run streamlit run app.py

Example inputs:
what is artificial intelligence
create a python file hello.py with a hello function
create a file notes.txt
summarize artificial intelligence is transforming industries worldwide
create a file test.txt and write a python file hello.py
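The last example is a compound command. One simple way to handle it is to split the input on " and " whenever an action verb follows; the verb list below is an assumption for illustration, not the project's actual logic in intent.py.

```python
import re

# Illustrative action verbs; the real project may recognize a different set.
ACTION_VERBS = ("create", "write", "summarize", "make")

def split_compound(command: str) -> list:
    """Split one input into sub-commands at ' and ' before a known verb."""
    pattern = r"\s+and\s+(?=(?:%s)\b)" % "|".join(ACTION_VERBS)
    parts = re.split(pattern, command, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]
```

Splitting only before a verb avoids breaking inputs like "summarize cats and dogs", where "and" is part of the content rather than a command separator.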
Chat history is stored locally in:
chat_history.json
This allows the application to retain conversation history across sessions without using a database.
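A minimal version of this JSON persistence looks like the following; the function names are illustrative and the real logic lives in memory.py.

```python
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # name taken from the project layout

def load_history(path=HISTORY_FILE):
    """Return the saved history, or an empty list for a fresh session."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))

def append_entry(entry, path=HISTORY_FILE):
    """Read, append, and rewrite the whole file (fine for small histories)."""
    history = load_history(path)
    history.append(entry)
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
```

Rewriting the full file on every message is simple and robust at this scale; a database only becomes worth the complexity once histories grow large or concurrent sessions are needed.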
All generated files are saved only inside:
output/
This prevents unintended modifications to system files.
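One way to enforce that restriction is to resolve every requested filename against output/ and reject anything that escapes it. This is a sketch of the idea, not the project's actual check in tools.py.

```python
from pathlib import Path

OUTPUT_DIR = Path("output")  # directory name taken from the project layout

def safe_output_path(filename: str) -> Path:
    """Resolve filename inside output/ and reject path traversal outside it."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    target = (OUTPUT_DIR / filename).resolve()
    # After resolving symlinks and "..", output/ must be an ancestor of target.
    if OUTPUT_DIR.resolve() not in target.parents:
        raise ValueError(f"refusing to write outside {OUTPUT_DIR}/: {filename}")
    return target
```

Resolving before checking is the important step: a naive string prefix check can be defeated by inputs like `../output2/x` or `..\..\x`, while `Path.resolve()` normalizes them first.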
- Handling noisy or inaccurate speech transcription
- Ensuring correct intent classification for short or ambiguous inputs
- Cleaning LLM outputs for consistent formatting
- Managing compound commands reliably
- Maintaining persistence without a database
- Real-time microphone input
- Improved intent classification using structured prompts
- Support for document upload and summarization
- Multi-session chat management
- More robust error handling for unclear inputs
Author: Thamizha
This project demonstrates a complete local AI pipeline combining speech recognition, intent understanding, and action execution with persistent memory and modular design.