Skip to content

Troubleshooting Guide

Common issues and solutions for the A2A RAG Agent system.

Service Issues

Milvus Connection Problems

Issue: Cannot connect to Milvus

Symptoms:

pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Fail connecting to server)>

Solutions:

  1. Check if Milvus is running:

    podman ps | grep milvus
    

  2. View Milvus logs:

    cd RAG/deployment
    podman-compose logs milvus
    

  3. Restart Milvus:

    cd RAG/deployment
    podman-compose restart
    

  4. Verify Milvus health:

    curl http://localhost:9091/healthz
    

  5. Check port availability:

    lsof -i :19530
    

Issue: Milvus fails to start

Symptoms:

Error: port 19530 already in use

Solutions:

  1. Find and kill process using port:

    lsof -i :19530
    kill -9 <PID>
    

  2. Use different port:

    # Edit deployment/podman-compose.yml
    ports:
      - "19531:19530"  # Changed from 19530
    
    # Update config/.env
    MILVUS_PORT=19531
    

MCP Server Issues

Issue: MCP server won't start

Symptoms:

ERROR: Address already in use

Solutions:

  1. Check if port 8000 is in use:

    lsof -i :8000
    

  2. Kill existing process:

    kill -9 <PID>
    

  3. Use different port:

    # Start on different port
    python -m uvicorn mcp_server.server:app --port 8001
    
    # Update config/.env
    MCP_SERVER_PORT=8001
    

  4. Check server logs:

    tail -f logs/mcp_server.log
    

Issue: MCP server crashes on startup

Symptoms:

ImportError: cannot import name 'FastAPI'

Solutions:

  1. Verify virtual environment:

    which python
    # Should show: /path/to/RAG/venv/bin/python
    

  2. Reinstall dependencies:

    cd RAG
    source venv/bin/activate
    pip install -r requirements.txt
    

  3. Check Python version:

    python --version
    # Should be 3.11-3.13 for Watsonx.ai 1.5.0
    

Watsonx.ai Issues

Authentication Errors

Issue: Invalid API key

Symptoms:

401 Unauthorized: Invalid API key

Solutions:

  1. Verify API key in .env:

    cat config/.env | grep WATSONX_API_KEY
    

  2. Check API key format:

  3. Should start with apikey_
  4. No extra spaces or quotes
  5. No newlines

  6. Regenerate API key:

  7. Go to IBM Cloud console
  8. Navigate to Watsonx.ai
  9. Generate new API key
  10. Update config/.env

Issue: Project ID not found

Symptoms:

404 Not Found: Project not found

Solutions:

  1. Verify project ID:

    cat config/.env | grep WATSONX_PROJECT_ID
    

  2. Check project exists:

  3. Log into Watsonx.ai console
  4. Verify project is active
  5. Copy correct project ID

  6. Check project permissions:

  7. Ensure API key has access to project
  8. Verify project is not archived

Model Issues

Issue: Model not found

Symptoms:

404 Not Found: Model 'openai/gpt-oss-120b' not found

Solutions:

  1. Check available models:

    from ibm_watsonx_ai import APIClient, Credentials
    
    credentials = Credentials(
        api_key="your-key",
        url="https://us-south.ml.cloud.ibm.com"
    )
    client = APIClient(credentials)
    client.set.default_project("your-project-id")
    
    # List available models
    models = client.foundation_models.get_model_specs()
    for model in models['resources']:
        print(model['model_id'])
    

  2. Use alternative model:

    # In config/.env
    WATSONX_LLM_MODEL=ibm/granite-3-8b-instruct
    

Issue: Token limit exceeded

Symptoms:

400 Bad Request: Token limit exceeded (660 > 512)

Solutions:

  1. Reduce chunk size:

    # In config/.env
    RAG_CHUNK_SIZE=200  # Reduced from 300
    

  2. Reduce chunk overlap:

    RAG_CHUNK_OVERLAP=20  # Reduced from 40
    

  3. Use model with higher limit:

    # Some models support up to 8192 tokens
    WATSONX_EMBEDDING_MODEL=ibm/granite-embedding-278m-multilingual
    

Document Processing Issues

Indexing Failures

Issue: Unsupported file type

Symptoms:

ValueError: Unsupported file type: .xyz

Solutions:

  1. Check supported formats:
  2. PDF (.pdf)
  3. Word (.docx)
  4. Text (.txt)
  5. Markdown (.md)

  6. Convert file to supported format:

    # Convert to PDF or text
    pandoc document.xyz -o document.pdf
    

Issue: File not found

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'data/documents/file.pdf'

Solutions:

  1. Check file path:

    ls -la data/documents/
    

  2. Use absolute path:

    # Instead of relative path
    file_path="/full/path/to/RAG/data/documents/file.pdf"
    

  3. Verify file permissions:

    chmod 644 data/documents/file.pdf
    

Issue: PDF extraction fails

Symptoms:

PDFSyntaxError: PDF file is corrupted

Solutions:

  1. Repair PDF:

    # Using ghostscript
    gs -o repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress corrupted.pdf
    

  2. Convert to text first:

    pdftotext document.pdf document.txt
    # Then index the text file
    

Query Issues

No Results Returned

Issue: Query returns empty results

Symptoms:

{
  "results": [],
  "count": 0
}

Solutions:

  1. Check if documents are indexed:

    curl http://localhost:8000/tools/rag_stats
    

  2. Lower score threshold:

    # In config/.env
    RAG_SCORE_THRESHOLD=0.6  # Lowered from 0.7
    

  3. Increase top-k:

    RAG_TOP_K=10  # Increased from 5
    

  4. Verify embeddings:

    from services.watsonx_client import WatsonxClient
    from config.settings import get_settings
    
    client = WatsonxClient(get_settings())
    embedding = client.generate_embedding("test query")
    print(f"Embedding dimension: {len(embedding)}")
    # Should match EMBEDDING_DIMENSION in config
    

Poor Quality Responses

Issue: Irrelevant or incorrect answers

Solutions:

  1. Adjust LLM temperature:

    # More deterministic
    WATSONX_LLM_TEMPERATURE=0.3  # Reduced from 0.7
    

  2. Increase context:

    RAG_TOP_K=10  # More context chunks
    RAG_MAX_CONTEXT_LENGTH=3000  # More tokens
    

  3. Improve system prompt:

    RAG_SYSTEM_PROMPT="You are a helpful assistant. Answer based only on the provided context. If the context doesn't contain the answer, say so."
    

  4. Re-index with better chunking:

    # Clear and re-index
    curl -X DELETE http://localhost:8000/tools/rag_clear
    curl -X POST http://localhost:8000/tools/rag_index_directory \
      -H "Content-Type: application/json" \
      -d '{"directory_path": "data/documents"}'
    

Performance Issues

Slow Query Responses

Issue: Queries take too long

Solutions:

  1. Optimize Milvus index:

    # In config/.env
    MILVUS_INDEX_TYPE=IVF_SQ8  # Faster than IVF_FLAT
    MILVUS_NLIST=256
    

  2. Reduce top-k:

    RAG_TOP_K=3  # Reduced from 5
    

  3. Reduce max tokens:

    WATSONX_LLM_MAX_TOKENS=256  # Reduced from 512
    

  4. Enable caching:

    ENABLE_EMBEDDING_CACHE=true
    CACHE_TTL=3600
    

High Memory Usage

Issue: System uses too much memory

Solutions:

  1. Use quantized index:

    MILVUS_INDEX_TYPE=IVF_SQ8  # Uses less memory
    

  2. Reduce batch size:

    EMBEDDING_BATCH_SIZE=16  # Reduced from 32
    BATCH_SIZE=50  # Reduced from 100
    

  3. Limit concurrent processing:

    MAX_CONCURRENT_DOCUMENTS=3  # Reduced from 5
    MAX_CONCURRENT_QUERIES=5  # Reduced from 10
    

Testing Issues

Test Failures

Issue: Tests fail with connection errors

Solutions:

  1. Ensure services are running:

    ./scripts/start_services.sh
    

  2. Wait for services to be ready:

    # Wait 30 seconds after starting
    sleep 30
    

  3. Check service health:

    curl http://localhost:8000/health
    

Issue: Import errors in tests

Solutions:

  1. Activate virtual environment:

    source venv/bin/activate
    

  2. Install test dependencies:

    pip install pytest pytest-asyncio
    

  3. Set PYTHONPATH:

    export PYTHONPATH=/path/to/RAG:$PYTHONPATH
    

Data Issues

Collection Errors

Issue: Dimension mismatch

Symptoms:

MilvusException: dimension mismatch: expected 384, got 768

Solutions:

  1. Clear collection:

    curl -X DELETE http://localhost:8000/tools/rag_clear
    

  2. Update dimension in config:

    # Match your embedding model
    EMBEDDING_DIMENSION=384  # or 768
    

  3. Re-index documents:

    curl -X POST http://localhost:8000/tools/rag_index_directory \
      -H "Content-Type: application/json" \
      -d '{"directory_path": "data/documents"}'
    

Issue: Collection not found

Symptoms:

MilvusException: Collection 'rag_knowledge_base' not found

Solutions:

  1. Restart Milvus:

    cd RAG/deployment
    podman-compose restart
    

  2. Check collection name:

    # In config/.env
    MILVUS_COLLECTION_NAME=rag_knowledge_base
    

  3. Create collection manually:

    from services.milvus_client import MilvusClient
    from config.settings import get_settings
    
    client = MilvusClient(get_settings())
    # Collection will be created automatically
    

Logging and Debugging

Enable Debug Logging

# In config/.env
LOG_LEVEL=DEBUG

View Logs

# MCP server logs
tail -f logs/mcp_server.log

# Milvus logs
cd RAG/deployment
podman-compose logs -f milvus

# Application logs
tail -f logs/rag.log

Debug Mode

# Enable debug mode in code
import logging
logging.basicConfig(level=logging.DEBUG)

# Or set environment variable
export LOG_LEVEL=DEBUG

Getting Help

Collect Diagnostic Information

# System information
python --version
podman --version

# Service status
curl http://localhost:8000/health
curl http://localhost:9091/healthz

# Configuration
cat config/.env | grep -v API_KEY

# Logs
tail -100 logs/mcp_server.log

Report Issues

When reporting issues, include:

  1. Error message and stack trace
  2. Configuration (without sensitive data)
  3. Steps to reproduce
  4. System information
  5. Relevant logs

Resources

Common Error Messages

Error Cause Solution
Connection refused Service not running Start services with ./scripts/start_services.sh
401 Unauthorized Invalid API key Check WATSONX_API_KEY in .env
404 Not Found Wrong endpoint/model Verify URL and model name
422 Unprocessable Entity Invalid parameters Check request body format
500 Internal Server Error Server-side error Check logs for details
503 Service Unavailable Dependencies down Check Milvus and Watsonx.ai
Token limit exceeded Chunk too large Reduce RAG_CHUNK_SIZE
Dimension mismatch Wrong embedding model Clear collection and re-index
Collection not found Milvus not initialized Restart Milvus
File not found Wrong path Check file path and permissions