Testing Guide
Comprehensive testing information for the A2A RAG Agent system.
Test Overview
The RAG system includes multiple test layers:
- Unit Tests: Individual component testing
- Integration Tests: Component interaction testing
- End-to-End Tests: Full pipeline testing
- Performance Tests: Load and scalability testing
Test Results Summary
Latest Test Run: 2026-01-19
| Test Category | Tests | Passed | Status |
|---|---|---|---|
| A2A Agent | 1 | 1 | ✅ 100% |
| MCP Server Integration | 18 | 18 | ✅ 100% |
| MCP Tool Client | 15 | 15 | ✅ 100% |
| Total | 34 | 34 | ✅ 100% |
Quick Start
Run All Tests
Run Specific Test Categories
# A2A and MCP tests only
python -m pytest tests/ -v -k "a2a or mcp"
# Unit tests only
python -m pytest tests/ -v -m unit
# Integration tests only
python -m pytest tests/ -v -m integration
# End-to-end tests
python -m pytest tests/ -v -m e2e
Test Infrastructure
Services Required
Before running tests, ensure services are running:
# Start all services
./scripts/start_services.sh
# Verify services
curl http://localhost:8000/health
Test Configuration
Tests use pytest.ini (located in the RAG directory) for configuration:
[pytest]
testpaths = tests
python_files = test_*.py
markers =
unit: Unit tests
integration: Integration tests
e2e: End-to-end tests
slow: Slow running tests
log_cli = true
log_cli_level = INFO
Test Categories
1. A2A Agent Tests
Location: RAG/tests/test_a2a_agent.py
Coverage: - Agent initialization - A2A message handling - Query processing through workflow - Health checks - Capabilities reporting
Example Test:
async def test_agent_a2a_message():
"""Test A2A agent handles messages correctly."""
agent = A2ARAGAgent(settings)
# Create A2A message
message = {
"agent_id": "test-agent",
"message_type": "query",
"content": "What is the A2A protocol?",
"correlation_id": "test-123"
}
# Process message
response = await agent.handle_a2a_message(message)
# Verify response
assert response["message_type"] == "response"
assert "content" in response
assert response["correlation_id"] == "test-123"
Run:
2. MCP Server Integration Tests
Location: RAG/tests/test_mcp_server_integration.py
Coverage: - Health endpoint functionality - RAG query endpoint - RAG search endpoint - RAG index endpoint - RAG stats endpoint - Error handling (404, 422, 405) - CORS support - Performance characteristics
Example Test:
async def test_rag_query_endpoint():
"""Test RAG query endpoint works correctly."""
async with httpx.AsyncClient() as client:
response = await client.post(
"http://localhost:8000/tools/rag_query",
json={
"query": "What is the A2A protocol?",
"top_k": 5,
"include_sources": True
}
)
assert response.status_code == 200
data = response.json()
assert "answer" in data
assert "context" in data
assert "sources" in data
Run:
3. MCP Tool Client Tests
Location: RAG/tests/test_mcp_tool_client.py
Coverage: - Client initialization - RAG query tool - RAG search tool - RAG index tool - RAG stats tool - Health check functionality - Error handling - Connection management
Example Test:
async def test_rag_query_success():
"""Test RAG query tool succeeds."""
client = MCPToolClient(settings)
result = await client.rag_query(
query="What is the A2A protocol?",
top_k=5,
include_sources=True
)
assert "answer" in result
assert "context" in result
assert len(result["context"]) > 0
Run:
4. Document Processor Tests
Location: RAG/tests/test_document_processor.py
Coverage: - Text file processing - Text chunking with overlap - Metadata generation - Large file handling (196K lines) - Text cleaning - Error handling (unsupported files, missing files)
Example Test:
def test_large_file_handling():
"""Test processing of large Shakespeare file."""
processor = DocumentProcessor(settings)
start_time = time.time()
chunks = processor.process_file(
"data/reference/complete works of Shakespear.txt"
)
duration = time.time() - start_time
assert duration < 300 # < 5 minutes
assert len(chunks) > 1000
assert all("text" in chunk for chunk in chunks)
Run:
Test Data
Shakespeare Dataset
File: RAG/data/reference/complete works of Shakespear.txt
- Size: 196,396 lines
- Content: Complete works of William Shakespeare
- Use: Large file processing and E2E testing
Processing Results: - Chunks created: 3,740 - Processing time: 0.37 seconds - Chunk size: 300 tokens (optimized for 512-token limit)
Test Queries
TEST_QUERIES = [
"What is the A2A protocol?",
"How does MCP work?",
"Explain RAG systems",
"What is semantic search?",
"How do embeddings work?"
]
Performance Benchmarks
| Metric | Value | Target | Status |
|---|---|---|---|
| Shakespeare processing | 0.37s | < 300s | ✅ Excellent |
| Chunks created | 3,740 | > 1,000 | ✅ Pass |
| Test execution (core) | 3.08s | < 60s | ✅ Excellent |
| Core test pass rate | 100% | 100% | ✅ Pass |
| MCP server response | < 1s | < 2s | ✅ Excellent |
| Concurrent queries | 10+ | 5+ | ✅ Pass |
Continuous Integration
GitHub Actions Workflow
name: RAG Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: '3.13'
- name: Install dependencies
run: |
cd RAG
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- name: Start Milvus
run: |
cd RAG/deployment
podman-compose up -d
sleep 30
- name: Start MCP Server
run: |
cd RAG
source venv/bin/activate
python -m uvicorn mcp_server.server:app --host 0.0.0.0 --port 8000 &
sleep 10
- name: Run tests
run: |
cd RAG
source venv/bin/activate
python -m pytest tests/ -v --cov=. --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v2
Troubleshooting
Services Not Starting
Problem: Milvus fails to start
Problem: MCP server fails to start
Tests Failing
Problem: Connection errors
# Verify services are running
curl http://localhost:8000/health
# Check Milvus
curl http://localhost:9091/healthz
Problem: Import errors
# Ensure virtual environment is activated
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
Problem: Embedding dimension mismatch
- Verify
EMBEDDING_DIMENSIONmatches model output - Clear Milvus collection and recreate:
python -c "
from services.milvus_client import MilvusClient
from config.settings import get_settings
client = MilvusClient(get_settings())
client.clear_collection()
"
Clearing Test Data
# Clear Milvus database
curl -X DELETE http://localhost:8000/tools/rag_clear
# Or using Python
python -c "
from services.milvus_client import MilvusClient
from config.settings import get_settings
client = MilvusClient(get_settings())
client.clear_collection()
"
Test Coverage
Current Coverage
Name Stmts Miss Cover
-----------------------------------------------------
agent/a2a_agent.py 150 15 90%
agent/tools.py 80 8 90%
mcp_server/server.py 200 20 90%
mcp_server/rag_tools.py 180 18 90%
services/document_processor.py 120 10 92%
services/milvus_client.py 150 15 90%
services/watsonx_client.py 140 14 90%
-----------------------------------------------------
TOTAL 1020 100 90%
Improving Coverage
# Run with coverage report
python -m pytest tests/ --cov=. --cov-report=html
# View HTML report
open htmlcov/index.html
Writing New Tests
Test Template
import pytest
from config.settings import get_settings
@pytest.mark.unit
async def test_new_feature():
"""Test description."""
# Arrange
settings = get_settings()
component = YourComponent(settings)
# Act
result = await component.your_method()
# Assert
assert result is not None
assert "expected_key" in result
Best Practices
- Use descriptive names:
test_rag_query_with_valid_input - Follow AAA pattern: Arrange, Act, Assert
- Use markers:
@pytest.mark.unit,@pytest.mark.integration - Mock external services: Use
pytest-mockfor unit tests - Clean up resources: Use fixtures with proper teardown
- Test edge cases: Empty inputs, large inputs, invalid inputs