# How to Build a Local RAG Pipeline for OpenClaw using MCP and ClawRAG

If you use OpenClaw to control your home server or manage your daily tasks via WhatsApp or Telegram, you've probably hit a wall: **it can't read your private documents.**

Sure, you could upload your lease agreements, medical records, or tax returns to OpenAI or Claude, but for privacy-conscious users, that's a dealbreaker. You need a **Retrieval-Augmented Generation (RAG)** system that runs entirely on your own metal.

Enter **ClawRAG**, a self-hosted RAG engine designed specifically to connect to OpenClaw via the **Model Context Protocol (MCP)**.

## Why ClawRAG and MCP?

Most RAG systems are either too complex for a solo developer's home setup (requiring massive Postgres/pgvector databases) or they integrate poorly with autonomous agents. ClawRAG solves this by running in a single Docker container under 2 GB of RAM.

Instead of connecting to the agent through a standard REST API, ClawRAG uses MCP, which provides structured schemas that OpenClaw understands natively. It exposes `query_knowledge` as a dynamic tool, allowing your agent to decide *when* to search your documents versus when to rely on its general knowledge.

### The Tech Stack

- **Parsing:** Docling 2.13.0 (handles nested tables and legacy PDFs).
- **Storage:** ChromaDB (lightweight, file-based vector storage).
- **Search:** Hybrid search that combines vector similarity with BM25 keyword search, fused using Reciprocal Rank Fusion (RRF) for high accuracy on legal and technical jargon.

## Step-by-Step Installation Guide

### 1. Spin up ClawRAG

First, you'll need Docker installed on your machine. ClawRAG comes with a pre-configured `docker-compose.yml`. Run the following from the directory that contains it:

```bash
# Start the ClawRAG engine and ChromaDB vector store
docker compose up -d
```

### 2. Ingest Your Private Documents

You can use a simple cURL command to push your PDFs into the local vector database.
For example, let's upload a lease agreement:

```bash
curl -X POST http://localhost:8080/api/v1/rag/documents/upload \
  -F "files=@my_lease.pdf" \
  -F "collection_name=personal"
```

### 3. Connect to OpenClaw via MCP

Now, we tell OpenClaw to attach this RAG engine as a native tool using the MCP transport layer. Run this in your OpenClaw CLI:

```bash
openclaw mcp add --transport stdio clawrag npx -y @clawrag/mcp-server
```

## The Result: Chatting with Your Documents

Once connected, you can pull out your phone and text your OpenClaw bot on WhatsApp or Telegram:

**You:** *"Search my lease for gardening obligations."*

**OpenClaw:** *"According to Section 4.2 of your lease agreement (my_lease.pdf), the tenant is responsible for mowing the lawn bi-weekly and clearing snow from the driveway within 24 hours of a storm."*

Answers grounded in your own files, with citations, directly in your chat interface. No documents uploaded to the cloud.

Ready to take your OpenClaw agent to the next level? Give ClawRAG a spin and let your agent finally read your files securely!
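
If you have a whole folder of documents rather than a single lease, the upload from step 2 can be wrapped in a small loop. The following is a minimal sketch, assuming the upload endpoint and form fields from step 2 behave the same way for every file. The names `upload_pdfs`, `CLAWRAG_URL`, and `DRY_RUN` are made up for this example and are not part of ClawRAG.

```bash
#!/usr/bin/env bash
# Hypothetical helper: upload every PDF in a folder to a local ClawRAG instance.
# The endpoint and form fields are copied from the single-file example in step 2;
# upload_pdfs, CLAWRAG_URL, and DRY_RUN are names invented for this sketch.

CLAWRAG_URL="${CLAWRAG_URL:-http://localhost:8080/api/v1/rag/documents/upload}"

upload_pdfs() {
  local dir="$1"
  local collection="${2:-personal}"
  local pdf

  for pdf in "$dir"/*.pdf; do
    # When the glob matches nothing, the literal pattern is returned; skip it.
    [ -e "$pdf" ] || continue

    if [ -n "${DRY_RUN:-}" ]; then
      # Print the command instead of running it, so the loop can be checked offline.
      echo "curl -X POST $CLAWRAG_URL -F files=@$pdf -F collection_name=$collection"
    else
      # --fail makes curl return a non-zero status on HTTP errors.
      curl --fail --silent --show-error -X POST "$CLAWRAG_URL" \
        -F "files=@$pdf" \
        -F "collection_name=$collection" \
        || echo "upload failed: $pdf" >&2
    fi
  done
}
```

After sourcing the script, `DRY_RUN=1 upload_pdfs ~/Documents/leases personal` prints the commands it would run, and dropping `DRY_RUN` performs the uploads. Uploading one file per request keeps a single bad PDF from aborting the whole batch.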