# Local Deployment

Deploy RAG agents locally using Podman for development and testing.
## Overview

Local deployment provides a complete RAG environment with containers running on your machine using Podman. The system uses watsonx.ai (IBM Cloud) for embedding generation and LLM inferencing, while vector storage and search run locally. This is ideal for:

- Development and testing
- Learning the system
- Cost-effective experimentation (you pay only for watsonx.ai API usage)
- Local data processing with cloud AI capabilities
## Architecture

```mermaid
graph TB
    subgraph "IBM Cloud"
        WX[watsonx.ai]
    end
    subgraph "Local Machine"
        A2A[A2A Agent<br/>:8001]
        MCP[MCP Server<br/>:8000]
        Milvus[Milvus<br/>:19530]
        etcd[etcd]
        MinIO[MinIO<br/>:9001]
        Loader[Data Loader<br/>one-time]
        A2A -->|Query| MCP
        MCP -->|Vector Search| Milvus
        MCP -->|Embeddings| WX
        MCP -->|LLM Inference| WX
        Milvus -->|Metadata| etcd
        Milvus -->|Storage| MinIO
        Loader -.->|Load Documents| Milvus
        Loader -->|Generate Embeddings| WX
    end
```
**Key Components:**

- **watsonx.ai (IBM Cloud)**: Provides embedding generation and LLM inferencing
  - **Embedding Model**: `ibm/granite-embedding-278m-multilingual`
  - **LLM Model**: `openai/gpt-oss-120b`
- **Local Containers**: Run on your machine using Podman
  - **A2A Agent**: Agent-to-Agent protocol interface
  - **MCP Server**: Model Context Protocol server with RAG tools
  - **Milvus**: Vector database for semantic search
  - **etcd**: Metadata storage for Milvus
  - **MinIO**: Object storage for Milvus
  - **Data Loader**: One-time job to load and index documents
## Prerequisites

### Required Software

- **Podman** - Container runtime
- **podman-compose** - Compose tool
- **Python 3.10+** - For local development
### Required Credentials

- **watsonx.ai API Key** - From IBM Cloud
- **watsonx.ai Project ID** - From your watsonx.ai project
## Quick Start

Run the deployment script, or start the stack directly:
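A minimal direct start, assuming `podman-compose.yml` sits in the current directory (the exact deployment script name is not shown in this guide; the commands below reuse only commands that appear in the Management and Troubleshooting sections):

```bash
# Build and start the full stack (Milvus, etcd, MinIO, MCP Server, A2A Agent)
podman-compose up -d --build

# Run the one-time data loader once the stack is healthy
podman-compose up data-loader
```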
## Configuration

### Environment Variables

Create `.env` from `.env.example`:

```bash
# watsonx.ai Configuration (Required)
WATSONX_API_KEY=your-api-key-here
WATSONX_PROJECT_ID=your-project-id-here
WATSONX_URL=https://us-south.ml.cloud.ibm.com

# Embedding Model Configuration
EMBEDDING_MODEL=ibm/granite-embedding-278m-multilingual
EMBEDDING_DIMENSION=768

# LLM Configuration
LLM_MODEL=openai/gpt-oss-120b
LLM_MAX_TOKENS=16384
LLM_TEMPERATURE=0.7

# Milvus Configuration
MILVUS_HOST=milvus
MILVUS_PORT=19530
MILVUS_COLLECTION_NAME=rag_knowledge_base
MILVUS_METRIC_TYPE=COSINE

# RAG Configuration
RAG_CHUNK_SIZE=512
RAG_CHUNK_OVERLAP=50
RAG_TOP_K=5
RAG_SCORE_THRESHOLD=0.7

# Logging
LOG_LEVEL=INFO
```
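To make `RAG_CHUNK_SIZE` and `RAG_CHUNK_OVERLAP` concrete, here is an illustrative sketch of sliding-window chunking. The real data loader may split on tokens or sentence boundaries rather than raw characters; only the parameter names come from the configuration above.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows.

    Illustrative only: mirrors RAG_CHUNK_SIZE / RAG_CHUNK_OVERLAP, but the
    actual loader's splitting logic may differ (e.g. token-based).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts `step` chars after the last
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

# With the defaults, consecutive chunks share their last/first 50 characters,
# so no sentence is cut off without context at a chunk boundary.
chunks = chunk_text("To be, or not to be, that is the question" * 40)
```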
## Deployment Process

The deployment script performs these steps:

1. **Validate Prerequisites**
   - Check Podman installation
   - Check podman-compose installation
   - Verify `.env` configuration
2. **Start Milvus Stack**
   - Start etcd (metadata store)
   - Start MinIO (object storage)
   - Start Milvus (vector database)
   - Wait for health checks
3. **Build and Start MCP Server**
   - Build container from Containerfile
   - Start MCP Server
   - Wait for health check
4. **Build and Start A2A Agent**
   - Build container from Containerfile
   - Start A2A Agent
   - Wait for health check
5. **Load Shakespeare Data**
   - Build data loader container
   - Process Shakespeare text
   - Generate embeddings via watsonx.ai
   - Index embeddings in Milvus
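The repeated "wait for health check" steps can be sketched as a small polling loop. This is a hedged sketch, not the actual script: the function name, retry count, and interval are illustrative.

```bash
# Poll a health endpoint until it answers, or give up after ~60s.
wait_healthy() {
  local url="$1" retries=30
  until curl -sf "$url" > /dev/null; do
    retries=$((retries - 1))
    if [ "$retries" -le 0 ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
  done
}

# Usage (with the stack running):
#   wait_healthy http://localhost:8000/health
```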
## Service Endpoints

After deployment, services are available at:
| Service | Endpoint | Description |
|---|---|---|
| MCP Server | http://localhost:8000 | RAG tools API |
| MCP Health | http://localhost:8000/health | Health check |
| A2A Agent | http://localhost:8001 | Agent API |
| A2A Health | http://localhost:8001/health | Health check |
| Milvus | localhost:19530 | Vector database |
| MinIO Console | http://localhost:9001 | Object storage UI |
## IBM watsonx Orchestrate Integration

Once the RAG agent is deployed locally, you can integrate it with IBM watsonx Orchestrate Developer Edition for enterprise-grade orchestration and workflow management.
### Prerequisites

- **Orchestrate Developer Edition**: Download and install from IBM Developer
- **Entitlement Key**: Obtain from IBM Marketplace (wxo/myibm)
- **Running RAG Agent**: Ensure the A2A agent is running on http://localhost:8001
### Setup Orchestrate

1. **Configure Environment**
2. **Start Orchestrate**
3. **Create and Import Shakespeare Agent**:

   ```bash
   # Activate virtual environment
   source .venv/bin/activate

   # Create and import Shakespeare knowledge agent
   orchestrate agents create \
     -n shakespeare-rag-agent \
     -t "Shakespeare Knowledge Agent" \
     -k external \
     --description "RAG agent with complete works of Shakespeare. Use for questions about Shakespeare's plays, sonnets, characters, quotes, and literary analysis." \
     --api http://host.lima.internal:8001 \
     --provider external_chat/A2A/0.3.0 \
     -o rag-agent-config.yml
   ```

   **Note**: Use `host.lima.internal` to access the host machine from the Lima VM where Orchestrate runs.

   **Knowledge Base**: This agent contains the complete works of William Shakespeare and is ideal for literary questions, character analysis, and quote identification.
### Verify Integration

```bash
# List imported agents
orchestrate agents list

# Test agent health
curl http://localhost:8001/health
```
### Configuration Options

The RAG agent configuration supports:

- **A2A Protocol**: Full Agent-to-Agent communication
- **Capabilities**: `rag_query`, `knowledge_search`, `document_qa`
- **Resource Limits**: CPU and memory constraints
- **Retry Policies**: Exponential backoff with configurable limits
- **Health Monitoring**: Automatic health checks and recovery

For detailed configuration options, see:

- Orchestrate configuration: `orchestrate/rag-agent-config.yml`
- IBM watsonx Orchestrate documentation
### Orchestrate Benefits

Integrating with Orchestrate provides:

- **Workflow Orchestration**: Coordinate multiple agents and services
- **Enterprise Security**: OAuth 2.0, RBAC, audit logging
- **Scalability**: Dynamic scaling based on workload
- **Monitoring**: Comprehensive metrics and alerting
- **Integration**: Connect to 100+ enterprise systems
## Testing the Deployment

### Health Checks
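With the stack up, the two health endpoints from the Service Endpoints table can be probed directly:

```bash
# MCP Server health
curl http://localhost:8000/health

# A2A Agent health
curl http://localhost:8001/health
```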
### Query Shakespeare

```bash
# Simple query
curl -X POST http://localhost:8000/tools/rag_query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What did Hamlet say about being?"
  }'

# Get collection stats
curl -X POST http://localhost:8000/tools/rag_stats
```
### Access MinIO Console

- Open http://localhost:9001
- Login with:
  - Username: `minioadmin`
  - Password: `minioadmin`
## Management

### View Logs

```bash
# All services
podman-compose logs -f

# Specific service
podman-compose logs -f mcp-server
podman-compose logs -f a2a-agent
podman-compose logs -f milvus
podman-compose logs -f data-loader
```
### Check Status
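The status command appears later in the Troubleshooting section; for reference:

```bash
# Show container state for every service in the stack
podman-compose ps
```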
### Stop Services

```bash
# Stop all services
podman-compose down

# Stop and remove volumes (clears data)
podman-compose down -v
```
### Restart Services
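No restart command is shown here; one way, using only commands that appear elsewhere in this guide, is to bounce the stack:

```bash
# Stop everything, then bring it back up (volumes, and thus data, are preserved)
podman-compose down
podman-compose up -d
```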
### Rebuild After Changes

```bash
# Rebuild specific service
podman-compose up -d --build mcp-server

# Rebuild all services
podman-compose up -d --build
```
## Data Persistence

Data is stored in Podman volumes:

- `etcd_data` - etcd configuration
- `minio_data` - Object storage
- `milvus_data` - Vector database

To clear all data:
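As noted under Stop Services, removing the volumes clears everything:

```bash
# Stop services and delete all volumes (etcd_data, minio_data, milvus_data)
podman-compose down -v
```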
## Troubleshooting

### Milvus Won't Start

```bash
# Check logs
podman-compose logs milvus

# Check dependencies
podman-compose ps

# Restart with fresh volumes
podman-compose down -v
podman-compose up -d milvus
```
### Port Already in Use

Edit `podman-compose.yml` to change port mappings:
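A hedged sketch of remapping a conflicting host port. The service key `mcp-server` and the ports are taken from the endpoints table above; your compose file's actual service names may differ.

```yaml
services:
  mcp-server:
    ports:
      - "8080:8000"   # host port 8080 now maps to the MCP Server's container port 8000
```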
### Data Loader Fails

```bash
# View logs
podman-compose logs data-loader

# Check Shakespeare file
ls -la ../../data/reference/

# Run manually
podman-compose up data-loader
```
### Connection Issues

```bash
# Test Milvus from MCP Server
podman exec -it rag-mcp-server curl http://milvus:9091/healthz

# Check environment
podman exec -it rag-mcp-server env | grep MILVUS
```
## Development Workflow

### Access Container Shell

```bash
# MCP Server
podman exec -it rag-mcp-server /bin/bash

# A2A Agent
podman exec -it rag-a2a-agent /bin/bash
```
### Monitor Resources
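No command is shown for this step; Podman's built-in stats view is the usual choice:

```bash
# Live CPU, memory, and I/O usage for all running containers
podman stats
```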
### Update Code

1. Make code changes
2. Rebuild the container: `podman-compose up -d --build mcp-server`
3. Check logs: `podman-compose logs -f mcp-server`
## Performance Tuning

### Resource Limits

Edit `podman-compose.yml` to adjust resources:
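A sketch of per-service limits. Compose syntax varies by version: `mem_limit`/`cpus` shown here work with compose-v2-style files; the service key and values are illustrative, so verify against your actual file.

```yaml
services:
  milvus:
    mem_limit: 4g    # cap Milvus memory
    cpus: "2.0"      # cap Milvus CPU
```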
### Milvus Configuration

Adjust Milvus settings in `podman-compose.yml`:
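As an assumed example, the two environment variables below appear in Milvus's standard standalone compose files, wiring Milvus to this stack's etcd and MinIO containers; most other Milvus settings live in a mounted `milvus.yaml` rather than in compose.

```yaml
services:
  milvus:
    environment:
      ETCD_ENDPOINTS: etcd:2379     # where Milvus finds its metadata store
      MINIO_ADDRESS: minio:9000     # where Milvus finds its object storage
```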