AI agents with many similar tools pick the wrong one and waste tokens. This demo builds a travel agent with Strands Agents and uses FAISS to filter 29 tools down to the top 3 most relevant, comparing filtered vs unfiltered tool selection accuracy.
Based on research: "Internal Representations as Indicators of Hallucinations in Agent Tool Selection"
Research (Internal Representations, 2025) identifies 5 critical agent failure modes when tools scale:
- Function selection errors - Calling non-existent tools
- Function appropriateness errors - Choosing semantically wrong tools
- Parameter errors - Malformed or invalid arguments
- Completeness errors - Missing required parameters
- Tool bypass behavior - Generating outputs instead of calling tools
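Two of these failure modes can be caught mechanically. As a purely illustrative sketch (not part of the demo), a proposed tool call can be validated against a registry mapping tool names to required parameters; `classify_call` and the registry shape here are hypothetical:

```python
def classify_call(call: dict, registry: dict) -> str:
    """Return the failure mode a proposed tool call exhibits, or 'ok'.

    registry maps tool name -> set of required parameter names (hypothetical).
    """
    if call["name"] not in registry:
        return "function selection error"  # calling a non-existent tool
    missing = registry[call["name"]] - set(call.get("args", {}))
    if missing:
        return "completeness error"  # missing required parameters
    return "ok"

registry = {"get_hotel_pricing": {"hotel_name"}}
print(classify_call({"name": "get_hotel_price"}, registry))
print(classify_call({"name": "get_hotel_pricing", "args": {}}, registry))
print(classify_call({"name": "get_hotel_pricing",
                     "args": {"hotel_name": "Marriott"}}, registry))
```

Appropriateness errors and tool bypass are harder: they need semantics, which is where the filtering below comes in.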
The dual problem:
- ❌ Hallucination risk: More tools = more inappropriate selections
- ❌ Token waste: Sending all tool descriptions on every call (29 tools = ~4,000 tokens per query)
Semantic tool selection filters tools before the agent sees them:
Results: Improved accuracy, fewer tokens
Strands Agents provides native capabilities that enable semantic tool selection in production:
1. Dynamic Tool Swapping
# Add/remove tools at runtime without recreating the agent
agent.tool_registry.register_tool(new_tool)
agent.tool_registry.unregister_tool(old_tool)

2. Conversation Memory Preservation
# Swap tools between queries while keeping conversation history
swap_tools(agent, new_tools)  # agent.messages preserved

3. Runtime Tool Discovery
- Agent picks up tool changes automatically at each event loop cycle
- No manual refresh needed; just modify tool_registry
- Zero-downtime tool updates in production
Traditional frameworks require agent recreation to change tools, losing conversation state. Strands maintains memory while tools change dynamically.
Learn more: Strands Tool Registry
- Python 3.9+
- Strands Agents — AI agent framework
- Optional: Neo4j connection for real hotel data (from ../01-hotel-rag-demo)
This demo uses OpenAI with GPT-4o-mini by default (requires OPENAI_API_KEY environment variable).
You can swap the model for any provider supported by Strands — Amazon Bedrock, Anthropic, Ollama, etc. See Strands Model Providers for configuration.
Create a .env file with your OpenAI API key:
# OpenAI API Key (required)
OPENAI_API_KEY=your_openai_api_key_here

How to get your API key: platform.openai.com/api-keys
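A minimal fail-fast check before constructing the model saves a confusing downstream error; `require_api_key` is a hypothetical helper, not part of the demo code:

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fail fast with a clear message if the key is missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; create the .env file described above")
    return key
```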
uv venv && uv pip install -r requirements.txt

| File | Purpose |
|---|---|
| `test_semantic_tools_hallucinations.ipynb` | Main demo - comprehensive notebook with 29 tools and ground truth verification |
| `token_comparison_app.py` | Token savings verification - standalone script to measure token reduction |
| `enhanced_tools.py` | 31 travel agent tools (29 generic + 2 with optional Neo4j data) |
| `registry.py` | FAISS-based semantic tool filtering |
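The filtering in `registry.py` can be sketched as follows. This stand-in uses hashed bag-of-words vectors and a brute-force NumPy search instead of real sentence embeddings and a FAISS index (both substitutions are assumptions about the actual implementation; on unit vectors, the brute-force inner-product top-k is the same search FAISS's `IndexFlatIP` performs):

```python
import hashlib

import numpy as np

DIM = 64  # toy dimensionality; real sentence embeddings are much larger

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence embedding: hashed bag-of-words, L2-normalized."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def build_index(tools: dict):
    """Embed each tool description once; FAISS would store these vectors in an index."""
    names = list(tools)
    matrix = np.stack([embed(tools[name]) for name in names])
    return names, matrix

def search_tools(query: str, names, matrix, top_k: int = 3):
    """Inner-product top-k over unit vectors, i.e. cosine similarity."""
    scores = matrix @ embed(query)
    return [names[i] for i in np.argsort(scores)[::-1][:top_k]]

tools = {
    "get_hotel_pricing": "get nightly price and total cost for a hotel stay",
    "search_hotels": "search hotels by city, country, and rating",
    "book_flight": "book a flight between two airports",
    "get_weather": "get the current weather forecast for a city",
}
names, matrix = build_index(tools)
print(search_tools("How much does a hotel room cost per night", names, matrix))
```

The index is built once at startup; only the cheap query embedding and search run per request.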
Open `test_semantic_tools_hallucinations.ipynb` in Jupyter, Kiro, or your preferred notebook environment.

What it does:
- Tests 13 travel queries on 29 tools
- Compares Traditional (all 29 tools) vs Semantic (top 3 filtered)
- Verifies against ground truth (real hotel database)
- Shows token savings and error reduction
Key features:
- Real hotel data from Neo4j graph database
- Objective accuracy measurement
- Detailed error analysis
- Token cost comparison
Run the standalone token comparison script to verify the savings claimed in Part 3 of the notebook:
uv run token_comparison_app.py

What it measures:
- Compares 3 approaches: Traditional, Semantic, Semantic+Memory
- Shows actual token usage per query
- Demonstrates memory accumulation cost
- Verifies swap_tools() preserves conversation history
Expected output:
Token breakdown:
- Traditional: 29 tools × 50 tokens = ~1450 tokens/query (constant)
- Semantic: 3 tools × 50 tokens = ~150 tokens/query (constant)
- Memory: ~150 tokens + conversation history (~400 tokens/turn, accumulates)
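The arithmetic behind these figures, using the rough 50-tokens-per-description estimate from above:

```python
TOKENS_PER_TOOL = 50  # rough per-description estimate used in the breakdown above

traditional = 29 * TOKENS_PER_TOOL  # all tool descriptions sent every query
semantic = 3 * TOKENS_PER_TOOL      # only the filtered top 3
savings = 1 - semantic / traditional

print(traditional, semantic, f"{savings:.1%}")  # 1450 150 89.7%
```

This matches the ~89% token reduction reported for production systems later in this README.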
# Agent sees ALL 31 tools on every query
agent = Agent(tools=ALL_TOOLS, model=model)
agent("How much does Hotel Marriott cost?")
# Token cost: ~4,500 tokens (31 tool descriptions)
# Risk: Picks wrong tool from 31 options

# 1. Build FAISS index once
build_index(ALL_TOOLS)
# 2. Filter tools per query
query = "How much does Hotel Marriott cost?"
relevant_tools = search_tools(query, top_k=3)
# Returns: [get_hotel_pricing, get_hotel_details, search_hotels]
# 3. Agent sees only 3 relevant tools
agent = Agent(tools=relevant_tools, model=model)
agent(query)
# Token cost: ~500 tokens (3 tool descriptions)
# Risk: Picks correct tool from 3 focused options

For multi-turn conversations, use Strands' native tool swapping to maintain conversation history:
def swap_tools(agent, new_tools):
    """Swap the agent's tools without losing conversation memory."""
    agent.tool_registry.registry.clear()
    agent.tool_registry.dynamic_tools.clear()
    for tool in new_tools:
        agent.tool_registry.register_tool(tool)
# Create agent once
agent = Agent(tools=initial_tools, model=model)
# Multi-turn conversation with dynamic tool filtering
for query in queries:
    selected = search_tools(query, top_k=3)
    swap_tools(agent, selected)  # Tools change, agent.messages preserved
    agent(query)  # Full conversation history intact

Why this works: Strands calls tool_registry.get_all_tools_config() at each event loop cycle, automatically picking up runtime changes. No agent recreation needed.
Key advantages:
- Zero conversation loss across tool swaps
- Same agent instance handles all queries
- Add/remove tools between any two queries
- Production-ready for long conversations
Learn more: Strands Agent Architecture
The notebook includes 6 tools connected to the Neo4j hotel database, for example:
@tool
def search_real_hotels(country: str, min_rating: float = 0.0) -> str:
    """Search real hotels in a specific country from our database."""
    # Executes Cypher query on Neo4j
    # Returns actual hotel data from 515K reviews

@tool
def get_top_hotels(country: str, limit: int = 5) -> str:
    """Get top-rated hotels in a country."""
    # Real aggregation from graph database

These tools provide ground truth for objective accuracy measurement.
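The Cypher such a tool might run can be sketched as a parameterized query builder. The node labels, relationship type, and property names below are assumptions for illustration, not taken from the demo's actual schema:

```python
def build_hotel_query(country: str, min_rating: float = 0.0):
    """Build a parameterized Cypher query for a hotel search.

    Hypothetical schema: (Hotel)-[:LOCATED_IN]->(Country), with an
    avg_rating property on Hotel. Parameters keep the query injection-safe.
    """
    cypher = (
        "MATCH (h:Hotel)-[:LOCATED_IN]->(c:Country {name: $country}) "
        "WHERE h.avg_rating >= $min_rating "
        "RETURN h.name AS name, h.avg_rating AS rating "
        "ORDER BY rating DESC"
    )
    return cypher, {"country": country, "min_rating": min_rating}

cypher, params = build_hotel_query("Netherlands", 8.0)
# A neo4j driver session would then execute: session.run(cypher, **params)
```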
This demo implements findings from:
- Internal Representations as Indicators of Hallucinations - Tool selection hallucinations increase with tool count
- Production systems report 89% token reduction (rconnect.tech)
- Demo 03 - Multi-Agent Validation — Cross-validate tool selections with Executor → Validator → Critic
- Demo 04 - Neurosymbolic Guardrails — Add symbolic rules to block invalid tool calls
Contributions are welcome! See CONTRIBUTING for more information.
If you discover a potential security issue in this project, notify AWS/Amazon Security via the vulnerability reporting page. Please do not create a public GitHub issue.
This library is licensed under the MIT-0 License. See the LICENSE file for details.