Skip to content

A2A RAG Agent Overview

The A2A RAG Agent is a production-ready Retrieval-Augmented Generation system that integrates with IBM watsonx Orchestrate using the A2A 0.3.0 protocol. It combines LangGraph workflows, MCP tools, and watsonx.ai to provide intelligent query services over a knowledge base.

Architecture

graph TB
    subgraph "IBM watsonx Orchestrate"
        UI[Chat Interface]
        WF[Workflow Engine]
    end

    subgraph "A2A Agent Server"
        AC[Agent Card]
        RH[Request Handler]
        AE[Agent Executor]
        EQ[Event Queue]
    end

    subgraph "RAG Agent Core"
        LG[LangGraph Workflow]
        ST[Agent State]
        MCP[MCP Tool Client]
    end

    subgraph "Backend Services"
        MCPS[MCP Server<br/>FastAPI]
        MILVUS[Milvus<br/>Vector DB]
        WX[Watsonx.ai<br/>LLM + Embeddings]
    end

    UI --> WF
    WF -->|JSON-RPC 2.0| AC
    AC --> RH
    RH --> AE
    AE --> LG
    AE --> EQ
    EQ -->|Task Updates| WF

    LG --> ST
    LG --> MCP
    MCP -->|HTTP/REST| MCPS
    MCPS --> MILVUS
    MCPS --> WX

    style UI fill:#0f62fe
    style WF fill:#0f62fe
    style AC fill:#ff832b
    style RH fill:#ff832b
    style AE fill:#ff832b
    style LG fill:#e1f5ff
    style MCP fill:#fff4e1

Key Components

1. A2A Agent Server

  • Purpose: Provides A2A 0.3.0 protocol compliance for Orchestrate integration
  • Technology: a2a-server framework with Starlette
  • Components:
  • Agent Card: Describes agent capabilities at /.well-known/agent-card.json
  • Request Handler: Processes JSON-RPC 2.0 requests from Orchestrate
  • Agent Executor: Implements task execution logic
  • Event Queue: Sends real-time task updates to Orchestrate

2. LangGraph Workflow

  • Purpose: Orchestrates RAG operations using state machine patterns
  • Technology: LangGraph for workflow management
  • Workflow Nodes:
  • process_input: Validates and prepares user query
  • retrieve_context: Fetches relevant context from knowledge base
  • generate_response: Finalizes response with metadata
  • handle_error: Manages error conditions
  • Features:
  • Asynchronous processing
  • Conditional routing based on state
  • Error handling and recovery
  • Conversation history tracking

3. MCP Tool Client

  • Purpose: Communicates with MCP server for RAG operations
  • Technology: httpx AsyncClient
  • Methods:
  • rag_query(): Query with LLM generation
  • rag_search(): Semantic search only
  • rag_index(): Index documents
  • rag_stats(): Knowledge base statistics
  • health_check(): Service health verification

4. MCP Server (FastAPI)

  • Purpose: Exposes RAG operations as RESTful API endpoints
  • Technology: FastAPI with Pydantic validation
  • Endpoints:
  • POST /tools/rag_query - Query with LLM generation
  • POST /tools/rag_search - Semantic search only
  • POST /tools/rag_index - Index documents
  • GET /tools/rag_stats - Knowledge base statistics
  • GET /health - Health check

5. Watsonx.ai Integration

  • Purpose: Provides AI capabilities
  • Models:
  • Embeddings: ibm/granite-embedding-278m-multilingual (768 dimensions)
  • LLM: openai/gpt-oss-120b (16384 max tokens)
  • Features:
  • Multilingual semantic embeddings
  • Context-aware response generation
  • Retry logic with exponential backoff

6. Milvus Vector Store

  • Purpose: High-performance vector similarity search
  • Configuration:
  • Metric: COSINE similarity
  • Index: IVF_FLAT
  • Dimension: 768 (matches embedding model)
  • Deployment: Podman containerized

Complete Request Flow

sequenceDiagram
    participant User
    participant Orchestrate as IBM Orchestrate
    participant Server as A2A Server
    participant Executor as Agent Executor
    participant Agent as LangGraph Agent
    participant MCP as MCP Client
    participant MCPS as MCP Server
    participant Watsonx as Watsonx.ai
    participant Milvus as Milvus DB

    User->>Orchestrate: Ask question
    Orchestrate->>Server: JSON-RPC request
    Server->>Executor: execute(context, queue)
    Executor->>Orchestrate: Task created (pending)
    Executor->>Orchestrate: Status: working

    Executor->>Agent: process_query(query)
    Agent->>Agent: process_input
    Agent->>MCP: rag_query(query)
    MCP->>MCPS: POST /tools/rag_query
    MCPS->>Watsonx: Generate embedding
    Watsonx-->>MCPS: Query vector
    MCPS->>Milvus: Search similar vectors
    Milvus-->>MCPS: Top-K results
    MCPS->>Watsonx: Generate answer with context
    Watsonx-->>MCPS: Generated response
    MCPS-->>MCP: Answer + sources
    MCP-->>Agent: Result
    Agent->>Agent: generate_response
    Agent-->>Executor: Final state

    Executor->>Orchestrate: Add artifact (answer)
    Executor->>Orchestrate: Task completed
    Orchestrate-->>User: Display answer

Features

Document Processing

  • Supported Formats: PDF, DOCX, TXT, Markdown
  • Chunking Strategy:
  • Configurable chunk size (default: 300 tokens)
  • Overlap for context preservation (default: 40 tokens)
  • Metadata: Source tracking, chunk indexing, timestamps
  • Vector Similarity: COSINE metric for relevance
  • Configurable Top-K: Retrieve 1-20 most relevant chunks
  • Score Threshold: Filter low-relevance results (default: 0.7)

Response Generation

  • Context-Aware: Uses retrieved chunks as context
  • Source Attribution: Includes source documents and scores
  • Streaming Support: Real-time response generation

Agent State Management

The agent uses TypedDict for state management:

class AgentState(TypedDict):
    """State for the A2A RAG agent."""
    query: str                              # User query
    messages: List[Dict[str, str]]          # Conversation history
    context: Optional[List[str]]            # Retrieved context chunks
    sources: Optional[List[Dict[str, Any]]] # Source information
    response: Optional[str]                 # Generated response
    metadata: Optional[Dict[str, Any]]      # Additional metadata
    error: Optional[str]                    # Error message if any
    next_action: Optional[str]              # Next workflow action

Performance Characteristics

Metric Value Notes
Document Indexing ~0.37s for 196K lines Shakespeare complete works
Query Response Time < 5 seconds Including LLM generation and Orchestrate overhead
Concurrent Queries 10+ simultaneous Tested with async handling
Vector Search < 1 second Average search time
Memory Usage < 2GB For typical workloads
A2A Protocol Overhead < 100ms Task management and updates

Integration with IBM watsonx Orchestrate

Registration

orchestrate agents create \
  -n shakespeare-rag-agent \
  -t "Shakespeare Knowledge Agent" \
  -k external \
  --description "RAG agent with complete works of Shakespeare" \
  --api http://host.lima.internal:8001 \
  --provider external_chat/A2A/0.3.0

Usage

Once registered, the agent can be: - Invoked through Orchestrate's chat interface - Included in multi-agent workflows - Called by other agents via A2A protocol - Monitored through Orchestrate's observability tools

Use Cases

1. Literary Analysis

  • Shakespeare Knowledge Base: Complete works of Shakespeare indexed
  • Character Analysis: Questions about characters, relationships, themes
  • Quote Attribution: Find and attribute famous quotes
  • Plot Summaries: Understand play structures and storylines

2. Educational Applications

  • Student Research: Help students understand Shakespeare's works
  • Teaching Aid: Provide context and analysis for educators
  • Comparative Analysis: Compare themes across different plays
  • Historical Context: Understand Elizabethan era references

3. Content Creation

  • Writing Inspiration: Find relevant quotes and passages
  • Script Development: Reference authentic Shakespearean language
  • Literary References: Verify quotes and attributions
  • Theme Exploration: Discover related content across works

4. Enterprise Knowledge Management

  • Document Q&A: Adapt for corporate documentation
  • Policy Retrieval: Answer questions about policies and procedures
  • Technical Documentation: Index and query technical manuals
  • Research Repositories: Search academic papers and research

Getting Started

See the Quick Start Guide for installation and setup instructions.

For detailed API documentation, see API Reference.

For testing information, see Testing Guide.