# How to Build a Local RAG Pipeline for OpenClaw Using MCP and ClawRAG
If you use OpenClaw to control your home server or manage your daily tasks via WhatsApp or Telegram, you've probably hit a wall: **it can't read your private documents.**
Sure, you could upload your lease agreements, medical records, or tax returns to ChatGPT or Claude, but for privacy-conscious users, that's a dealbreaker. You need a **Retrieval-Augmented Generation (RAG)** system that runs entirely on your own hardware.
Enter **ClawRAG**, a self-hosted RAG engine designed specifically to connect to OpenClaw via the **Model Context Protocol (MCP)**.
## Why ClawRAG and MCP?
Most RAG systems are either too complex for a solo developer's home setup (requiring massive Postgres/pgvector databases) or they integrate poorly with autonomous agents.
ClawRAG addresses both problems. It runs in a single Docker container in under 2 GB of RAM, and it talks to the agent over **MCP** rather than a custom REST integration. MCP describes each tool with a structured schema that OpenClaw understands natively, so ClawRAG can expose `query_knowledge` as a tool the agent discovers at runtime. The agent then decides *when* to search your documents and when to rely on its general knowledge.
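To make the tool idea concrete, here is a rough sketch of the message an MCP client sends when it invokes a tool. The `tools/call` method and the `name`/`arguments` layout come from the MCP specification. The argument names `query` and `collection` are guesses for illustration, since the real schema is whatever ClawRAG advertises, and `mcp_tool_call` is a hypothetical helper.

```bash
# Print the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
# Assumption: the "query" and "collection" argument names are hypothetical;
# the actual schema is defined by the ClawRAG MCP server.
mcp_tool_call() {
  local id="$1" tool="$2" arguments_json="$3"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$id" "$tool" "$arguments_json"
}

mcp_tool_call 1 query_knowledge '{"query":"gardening obligations","collection":"personal"}'
```

You never write this message by hand. OpenClaw builds it whenever the model decides a question needs your documents.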
### The Tech Stack
- **Parsing:** Docling 2.13.0 (handles nested tables and legacy PDFs well).
- **Storage:** ChromaDB (lightweight, file-based vector storage).
- **Search:** Hybrid search that combines vector similarity with BM25 keyword search and fuses the two rankings using Reciprocal Rank Fusion (RRF). The keyword side catches exact legal and technical terms that embeddings alone can miss.
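The fusion step from the Search bullet can be sketched in a few lines of shell and awk. This is only an illustration of the RRF formula, `score(d) = sum of 1 / (k + rank)` over every ranking that contains `d`. It is not ClawRAG's actual code: the chunk IDs are made up, `rrf_fuse` is a hypothetical helper, and `k = 60` is a commonly used default rather than a documented ClawRAG setting.

```bash
# Reciprocal Rank Fusion sketch.
# Assumptions: chunk IDs are invented and k=60 is a common default,
# not a value taken from ClawRAG.
rrf_fuse() {
  # Args: k, then one or more files listing document IDs best-first, one per line.
  local k="$1"
  shift
  # FNR restarts at 1 for each input file, so it is the rank within that ranking.
  LC_ALL=C awk -v k="$k" '
    NF { score[$1] += 1 / (k + FNR) }
    END { for (id in score) printf "%.5f %s\n", score[id], id }
  ' "$@" | LC_ALL=C sort -k1,1nr -k2,2
}

vector_hits=$(mktemp)
bm25_hits=$(mktemp)
printf '%s\n' chunk_a chunk_b chunk_c > "$vector_hits"   # vector similarity order
printf '%s\n' chunk_c chunk_a chunk_d > "$bm25_hits"     # BM25 keyword order

rrf_fuse 60 "$vector_hits" "$bm25_hits"
rm -f "$vector_hits" "$bm25_hits"
```

With these inputs `chunk_a` comes out on top, followed by `chunk_c`. Both appear in both rankings, so they outscore chunks that only one retriever found. RRF uses only ranks, so the vector and BM25 scores never need to be put on a common scale.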
## Step-by-Step Installation Guide
### 1. Spin up ClawRAG
First, you'll need Docker installed on your machine. ClawRAG comes with a pre-configured `docker-compose.yml`, so run the following from the directory that contains it:
```bash
# Start the ClawRAG engine and ChromaDB vector store
docker compose up -d
```
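Containers can take a few seconds to start accepting requests, so it may help to wait before uploading anything. Here is a small polling sketch. `wait_for_http` is a hypothetical helper, and the port 8080 in the usage comment is taken from the upload command in the next step, so adjust it if your compose file maps a different port.

```bash
# Poll a URL until the server answers, or give up after a number of tries.
# Any HTTP response (even a 404) counts as "up"; only connection errors retry.
wait_for_http() {
  local url="$1" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage (assumes ClawRAG listens on localhost:8080):
#   docker compose ps && wait_for_http http://localhost:8080/
```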
### 2. Ingest Your Private Documents
You can use a simple cURL command to push your PDFs into the local vector database. For example, let's upload a lease agreement:
```bash
curl -X POST http://localhost:8080/api/v1/rag/documents/upload \
  -F "files=@my_lease.pdf" \
  -F "collection_name=personal"
```
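If you have a whole folder of PDFs, a loop saves some typing. The sketch below reuses the endpoint and form fields from the command above, one request per file. The `upload_pdfs` helper, the `CLAWRAG_URL` variable, and the example folder are assumptions for illustration, not part of ClawRAG.

```bash
# Upload every PDF in a directory to one collection, one request per file.
# Endpoint and form fields are copied from the single-file command;
# CLAWRAG_URL and upload_pdfs are hypothetical names.
CLAWRAG_URL="${CLAWRAG_URL:-http://localhost:8080}"

upload_pdfs() {
  local dir="$1" collection="$2" pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip when the glob matches nothing
    echo "Uploading $pdf"
    curl --fail --silent --show-error -X POST \
      "$CLAWRAG_URL/api/v1/rag/documents/upload" \
      -F "files=@$pdf" \
      -F "collection_name=$collection" || echo "Failed: $pdf" >&2
  done
}

# Usage: upload_pdfs ~/Documents/leases personal
```

The `--fail` flag makes curl return a non-zero status on HTTP errors, so a rejected file is reported instead of passing silently.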
### 3. Connect to OpenClaw via MCP
Now tell OpenClaw to attach this RAG engine as a native tool over MCP's stdio transport. The server is launched through `npx`, so Node.js needs to be installed. Run this with the OpenClaw CLI:
```bash
openclaw mcp add --transport stdio clawrag npx -y @clawrag/mcp-server
```
## The Result: Chatting with Your Documents
Once connected, you can pull out your phone and text your OpenClaw bot on WhatsApp/Telegram:
**You:** *"Search my lease for gardening obligations."*
**OpenClaw:** *"According to Section 4.2 of your lease agreement (my_lease.pdf), the tenant is responsible for mowing the lawn bi-weekly and clearing snow from the driveway within 24 hours of a storm."*
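The answer is grounded in text retrieved from your own files and cites its source, so you can check it against the original. Your documents are parsed, indexed, and stored locally instead of being uploaded to a third-party service, though the retrieved excerpts still pass through whichever model and chat app OpenClaw is configured to use.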
Ready to take your OpenClaw agent to the next level? Give ClawRAG a spin and let your agent finally read your files securely!