AI Tools · 14 min read · March 23, 2026

RAGFlow: The Open-Source RAG Engine That Actually Understands Your Documents

A practical guide to RAGFlow — the open-source RAG engine with deep document understanding, grounded citations, and intelligent chunking. Setup, usage, 5 killer features, and honest pros/cons.

#RAGFlow #RAG #LLM #Open Source #Document AI #Self-hosted AI
Neel Shah · Tech Lead · Senior Data Engineer · Ottawa

You have tried RAG pipelines before. You chunked your PDFs, embedded them, stored them in a vector database, and asked questions. Sometimes you got answers. Often you got hallucinations dressed up as answers — confident, detailed, and wrong.

The problem is usually not the LLM. It is the quality of what you fed it. Garbage chunks in, garbage answers out.

RAGFlow is built around one idea: quality in, quality out. It is an open-source RAG engine that treats document understanding as a first-class problem — not an afterthought. It knows the difference between a table in a PDF and a paragraph. It knows a slide deck is not a wall of text. It chunks intelligently, shows you exactly how it chunked, and lets you correct it before the LLM ever sees a word.

75,900+ GitHub stars. Actively developed. Free to self-host. This guide walks through every step.


Before We Start: What You Need

RAGFlow runs as a Docker Compose stack. The requirements are not negotiable — the deep document parsing alone is CPU and memory intensive.

Minimum requirements
─────────────────────────────────────────
  CPU      4 cores
  RAM      16 GB
  Disk     50 GB free
  Docker   24.0.0 or later
  OS       Linux (x86_64)

One important note up front: ARM64 is not officially supported. If you are on an Apple Silicon Mac or an ARM server, you may hit issues. The community has workarounds, but they are not stable. For this guide, assume x86_64 Linux.

Check your Docker version before starting:

docker --version
docker compose version

If docker compose (without a hyphen) is not recognised, you only have the older standalone docker-compose binary. RAGFlow requires Docker Compose v2, which ships as a Docker plugin. Install it:

sudo apt-get install docker-compose-plugin

How RAGFlow Works: The Big Picture

Before installing, it helps to understand what you are actually running. RAGFlow is not a single service — it is five services that work together.

Your browser / API client


┌────────────────────────────────────────┐
│              RAGFlow API               │
│          (FastAPI backend)             │
└──────┬─────────┬──────────┬────────────┘
       │         │          │
       ▼         ▼          ▼
  ┌─────────┐ ┌───────┐ ┌────────┐
  │  MySQL  │ │ MinIO │ │ Redis  │
  │ (meta)  │ │(files)│ │(cache) │
  └─────────┘ └───────┘ └────────┘


  ┌────────────────────────┐
  │  Elasticsearch /       │
  │  Infinity (vectors)    │
  └────────────────────────┘


  Your LLM + Embedding Model
  (OpenAI, Ollama, Gemini, etc.)
Service                   Role
─────────────────────────────────────────────────────────
RAGFlow API               The brain — handles document parsing, chunking, retrieval, and chat
MySQL                     Stores metadata: knowledge bases, documents, conversations
MinIO                     Object storage for uploaded files
Redis                     Caching and task queuing
Elasticsearch / Infinity  Vector and keyword search engine

You do not configure any of these individually. Docker Compose handles all of it.


Installation

Step 1 — Clone the repository

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker

Step 2 — Choose your Compose file

RAGFlow ships with two versions:

ls
# docker-compose.yml       ← full version (includes Elasticsearch)
# docker-compose-gpu.yml   ← for machines with NVIDIA GPU

For most setups, use the default:

docker compose -f docker-compose.yml up -d

This will pull about 15–20 GB of images the first time. Go make a coffee.

Step 3 — Monitor startup

docker compose logs -f ragflow-server

Wait until you see:

     ____   ___    ______ ______ __
    / __ \ /   |  / ____// ____// /____  _      __
   / /_/ // /| | / / __ / /_   / // __ \| | /| / /
  / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
 /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/

 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:9380

Step 4 — Open the UI

Navigate to http://YOUR_MACHINE_IP in your browser. On a local machine, that is http://localhost.

The default port is 80. If something else is already using port 80, edit docker/.env and set SVR_HTTP_PORT to a free port before starting.

You will be greeted by the RAGFlow login screen. Create an account — it is local, no email confirmation needed.


Your First Knowledge Base: Step by Step

A “knowledge base” in RAGFlow is a named collection of documents that you can query. Think of it like a folder that the LLM can search.

Step 1 — Connect an LLM

Before you create a knowledge base, you need to tell RAGFlow which LLM to use.

Go to Settings → Model Providers and add your provider. RAGFlow supports:

Commercial:  OpenAI, Anthropic, Google Gemini, Azure OpenAI, Cohere
Self-hosted: Ollama, LM Studio, LocalAI, vLLM

For a local Ollama setup:

  • Provider: Ollama
  • Base URL: http://host.docker.internal:11434
  • Model name: whatever you have pulled (e.g. llama3.1:8b)

From inside Docker, use host.docker.internal to reach your host machine. On Linux this hostname is not defined by default; use the bridge IP 172.17.0.1 instead, or map it by adding host.docker.internal:host-gateway under extra_hosts in the Compose file.

Step 2 — Create a knowledge base

Click Knowledge Base → Create Knowledge Base. Give it a name. The important settings are:

Chunking method:    General (start here)
Embedding model:    Choose from your configured providers
Language:           English (or match your documents)

Step 3 — Upload documents

Drag and drop files directly into the knowledge base. Supported formats:

Documents:   PDF, DOCX, DOC, TXT, MD
Spreadsheets: XLSX, XLS, CSV
Presentations: PPT, PPTX
Images:      PNG, JPG, JPEG, TIFF (OCR applied automatically)
Web pages:   URL import

Step 4 — Parse and inspect the chunks

After uploading, click Parse. RAGFlow will process the documents. When it finishes, click on a document to see the chunks.

This is where RAGFlow is different from every other RAG pipeline. You can see exactly how your document was cut up, and you can edit individual chunks directly.

Example: A PDF with mixed content
─────────────────────────────────────────────────────────
  Page 1: Title + Introduction → 1 chunk
  Page 2: Table with 12 rows   → 12 chunks (one per row)
  Page 3: Paragraph text       → 3 chunks (by semantic boundary)
  Page 4: Code block           → 1 chunk (kept together)
  Page 5: Scanned image        → OCR text → 2 chunks
─────────────────────────────────────────────────────────

If a chunk looks wrong — too big, split at a bad boundary — you can fix it right there.

Step 5 — Start chatting

Go to Chat → Create Assistant. Link it to your knowledge base. Ask a question.

Every answer includes citations — clickable references that show you exactly which chunk the answer came from.


The 5 Killer Features

Feature 1: Template-Based Intelligent Chunking

Most RAG systems chunk by character count or sentence count. They do not know what the content is. A table gets split down the middle. A code block gets cut in half. A heading gets separated from its paragraph.

RAGFlow uses what it calls template-based chunking — pre-built chunking strategies tuned for specific document types.

Document type → Chunking template → Result
──────────────────────────────────────────────────────────
PDF (general)     General           Semantic paragraph chunks
Table/Excel       Table rows        One chunk per row
Presentation      Slides            One chunk per slide
Whole document    One               One chunk per entire document
Q&A pairs         Q&A               Question + answer as one chunk
Book              Book              Chapter-aware chunking

You choose the template per document type. This is not magic — it is domain knowledge encoded into the chunking logic.

Why this matters: The retriever can only find what was stored coherently. A table chunked correctly as rows can answer “which quarter had the highest revenue?” A table chunked as arbitrary text blobs probably cannot.
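To make the contrast concrete, here is a minimal Python sketch, illustrative only and not RAGFlow's actual code; the table data and helper names are invented for the example. It compares naive fixed-size chunking against row-aware table chunking.

```python
# Sketch (not RAGFlow internals): why chunking a table by rows
# beats chunking it by character count.

def chunk_fixed(text: str, size: int = 40) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def chunk_table_rows(rows: list[dict]) -> list[str]:
    """Row-aware chunking: one self-describing chunk per row, so a
    retriever can match 'Q3 revenue' to a single coherent chunk."""
    return [", ".join(f"{k}: {v}" for k, v in row.items()) for row in rows]

table = [
    {"quarter": "Q1", "revenue": "1.2M"},
    {"quarter": "Q2", "revenue": "0.9M"},
    {"quarter": "Q3", "revenue": "1.8M"},
]

# Fixed-size chunking of the flattened table splits rows mid-record...
flat = " ".join(chunk_table_rows(table))
print(chunk_fixed(flat))
# ...while row chunks keep each fact intact:
print(chunk_table_rows(table))  # ['quarter: Q1, revenue: 1.2M', ...]
```

A question like "which quarter had the highest revenue?" can now land on the single chunk that contains both the quarter and its figure.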


Feature 2: Grounded Citations With Source Tracing

Every RAGFlow answer includes references. Not just “Sources: document1.pdf” — actual clickable links to the specific chunk, with the relevant text highlighted.

User: What is the refund policy for enterprise customers?

RAGFlow: Enterprise customers are entitled to a full refund within
30 days of purchase, provided the request is submitted via the
support portal. [1]

[1] contracts/enterprise-agreement-2025.pdf
    Section 4.2, Page 12
    "Enterprise customers shall receive a full refund..."
    ████████████████████ (highlighted in document viewer)

This is not a cosmetic feature. It fundamentally changes how you verify AI answers. Instead of trusting the model, you can click through to the source in seconds.

Why this matters: In any serious use case — legal, medical, financial, customer support — you need to know where an answer came from. Grounded citations make RAGFlow answers auditable, not just useful.
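As a rough sketch of what makes a citation auditable, the structure below captures the three pieces RAGFlow surfaces: the source file, a locator inside it, and the exact retrieved text. Field names are illustrative, not RAGFlow's actual schema.

```python
# Sketch: the information a grounded citation needs to carry.
# Field names are invented for the example, not RAGFlow's schema.
from dataclasses import dataclass

@dataclass
class Citation:
    doc: str      # source file
    locator: str  # section / page inside the document
    quote: str    # the exact retrieved text the answer rests on

def render(answer: str, cites: list[Citation]) -> str:
    """Append numbered, traceable references to an answer."""
    refs = "\n".join(
        f'[{i}] {c.doc}\n    {c.locator}\n    "{c.quote}"'
        for i, c in enumerate(cites, 1)
    )
    return f"{answer}\n\n{refs}"

out = render(
    "Enterprise customers get a full refund within 30 days. [1]",
    [Citation("contracts/enterprise-agreement-2025.pdf",
              "Section 4.2, Page 12",
              "Enterprise customers shall receive a full refund...")],
)
print(out)
```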


Feature 3: Multiple Recall Strategies With Fused Re-Ranking

Retrieval is not one-size-fits-all. RAGFlow supports three retrieval modes and can combine them:

Retrieval modes
─────────────────────────────────────────────────────────
  1. Vector similarity     Dense embedding cosine search
                           Good for: semantic questions

  2. Full-text search      Keyword-based BM25
                           Good for: exact terms, names, codes

  3. Knowledge Graph       Entity + relationship traversal
                           Good for: "how does X relate to Y"
─────────────────────────────────────────────────────────

Hybrid mode runs vector and full-text search in parallel, then fuses the results using a re-ranking model. The re-ranker scores each candidate chunk for relevance to the actual question — not just similarity.

Hybrid retrieval pipeline
─────────────────────────────────────────────
  Query: "What is the early termination fee?"

        ├──▶ Vector search  → 10 candidate chunks

        └──▶ BM25 search    → 10 candidate chunks


          Merge + deduplicate → 15 unique chunks


          Re-ranker model    → Score each chunk


          Top 5 chunks → LLM → Answer
─────────────────────────────────────────────

Why this matters: Pure vector search misses exact terms. Pure keyword search misses semantic meaning. Hybrid with re-ranking catches both. For real-world documents with numbers, names, and jargon, this makes a measurable difference in answer quality.
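RAGFlow uses a learned re-ranking model for the final scoring step. As a simple, model-free stand-in, the sketch below fuses the two candidate lists with reciprocal rank fusion (RRF), so the merge-and-promote behaviour of the pipeline is visible; the chunk ids are invented for the example.

```python
# Sketch of the hybrid pipeline's merge step. RAGFlow uses a learned
# re-ranker; reciprocal rank fusion (RRF) stands in here as a simple,
# model-free way to fuse two ranked lists.

def rrf_fuse(vector_hits: list[str], bm25_hits: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked chunk-id lists; chunks found by both retrievers rise."""
    scores: dict[str, float] = {}
    for hits in (vector_hits, bm25_hits):
        for rank, chunk_id in enumerate(hits):
            # Each retriever contributes 1/(k + rank + 1) to a chunk's score
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vec  = ["c3", "c7", "c1", "c9"]   # semantic matches
bm25 = ["c7", "c2", "c3", "c8"]   # exact-term matches
print(rrf_fuse(vec, bm25)[:3])    # c7 and c3, found by both retrievers, lead
```

Chunks that only one retriever found still survive, but chunks both retrievers agree on float to the top, which is exactly the behaviour you want before handing the top candidates to the LLM.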


Feature 4: Agentic Data Sync From Cloud Sources

RAGFlow is not just a place to upload files. It can pull documents directly from external sources and keep them in sync.

Supported sources as of March 2026:

Cloud storage:   AWS S3, Google Drive, MinIO
Wikis / docs:    Confluence, Notion
Communication:   Discord
Web:             URL crawling

To set up a Confluence sync, for example:

  1. Go to Knowledge Base → Data Sources → Add Source
  2. Choose Confluence
  3. Enter your Confluence URL and API token
  4. Select the spaces to sync
  5. Set a sync schedule (manual, hourly, daily)

RAGFlow will crawl those spaces, parse each page with the same document understanding pipeline, and add them to your knowledge base. When a page is updated and a sync runs, the chunks for that page are refreshed.

Why this matters: The hardest part of enterprise RAG is keeping your knowledge base current. Manual uploads go stale. Automated sync from Confluence or Notion means your assistant always has the latest documentation.
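The refresh step can be sketched with a content-hash comparison; this is an illustration of the logic a sync needs, not RAGFlow's internals, and the page ids are invented for the example.

```python
# Sketch (illustrative, not RAGFlow internals): re-parse a page only
# when its content hash differs from what was last indexed.
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def pages_to_refresh(remote: dict[str, str], indexed: dict[str, str]) -> list[str]:
    """Compare remote page text against stored hashes; return stale page ids."""
    return [pid for pid, text in remote.items()
            if indexed.get(pid) != content_hash(text)]

indexed = {"page-1": content_hash("old policy text")}
remote  = {"page-1": "new policy text",      # edited since last sync
           "page-2": "brand new page"}       # never indexed
print(pages_to_refresh(remote, indexed))     # ['page-1', 'page-2']
```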


Feature 5: MCP Integration and Code Executor for Agents

RAGFlow is not just a search engine. It has an agentic layer built in.

MCP (Model Context Protocol) support means you can connect RAGFlow to any MCP-compatible client — including Claude Desktop, Cursor, or any tool that speaks MCP. Your knowledge bases become accessible to those tools directly.

The Code Executor lets you build agent workflows that include running code. This is how you go from “question and answer” to actual automated tasks:

Agent workflow example
─────────────────────────────────────────────────────────
  User: "Analyse the sales data from last quarter and
         summarise the top 3 underperforming regions"

  Workflow:
    Step 1: RAG retrieval → find relevant sales chunks
    Step 2: Code executor → run Python to aggregate data
    Step 3: LLM → write summary from results
    Step 4: Return answer with citations
─────────────────────────────────────────────────────────

Memory support (added in 2025) means agents can persist context across conversations — the agent remembers what you discussed last session.

Why this matters: Most RAG systems stop at retrieval. RAGFlow’s agent layer means it can take retrieval output and do something with it — compute, aggregate, transform — before generating an answer.
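The retrieve, compute, summarise pattern above can be sketched as three composed functions. All three steps are stubs with invented data; in RAGFlow each would be a node in an agent workflow.

```python
# Minimal sketch of the retrieve -> compute -> summarise pattern.
# Every step is a stub; in RAGFlow each is an agent workflow node.

def retrieve(question: str) -> list[dict]:
    # stand-in for RAG retrieval: returns structured sales chunks
    return [{"region": "North", "sales": 120},
            {"region": "South", "sales": 80},
            {"region": "West",  "sales": 95}]

def compute(chunks: list[dict]) -> list[str]:
    # stand-in for the code executor: aggregate with real code, not the LLM
    worst = sorted(chunks, key=lambda c: c["sales"])
    return [c["region"] for c in worst[:2]]

def summarise(regions: list[str]) -> str:
    # stand-in for the final LLM call
    return f"Underperforming regions: {', '.join(regions)}"

print(summarise(compute(retrieve("top underperforming regions?"))))
# Underperforming regions: South, West
```

The key design point is the middle step: the aggregation is done by executed code, so the numbers in the final answer come from computation rather than from the model's arithmetic.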


Connecting External LLMs: Quick Reference

RAGFlow supports a wide range of model providers. Here is a quick reference for the most common setups:

Provider        Type    Notes
─────────────────────────────────────────────────────
OpenAI          Cloud   Needs OPENAI_API_KEY
Anthropic       Cloud   Needs ANTHROPIC_API_KEY
Google Gemini   Cloud   Gemini 3 Pro supported as of early 2026
Azure OpenAI    Cloud   Needs endpoint + key
Ollama          Local   http://host.docker.internal:11434
LM Studio       Local   OpenAI-compatible endpoint
vLLM            Local   OpenAI-compatible endpoint

You can mix providers — use a cloud LLM for answering but a local embedding model for indexing (which keeps your document embeddings private even if answers go to an API).


Troubleshooting

RAGFlow starts but the UI does not load

Check which port it is running on:

docker compose ps

Look for the ragflow-server container and its port mapping. If port 80 is taken, edit .env:

nano docker/.env
# Change: SVR_HTTP_PORT=80
# To:     SVR_HTTP_PORT=8080
docker compose down && docker compose up -d

Documents parse but chunks look wrong

Chunks come out wrong when the template does not match the content type. Go to the document settings and change the chunking method. For a PDF that is mostly tables, switch from General to Table rows. For a dense technical manual, try One to avoid splitting mid-explanation.

LLM connection fails with “connection refused”

From inside Docker, localhost points to the container, not your host. Use host.docker.internal:

# Test Ollama is reachable from the RAGFlow container
docker exec -it ragflow-server curl http://host.docker.internal:11434/api/tags

If that fails, find the Docker bridge IP:

ip addr show docker0
# Use the inet address (usually 172.17.0.1) in your Ollama base URL

Not enough disk space during image pull

The full RAGFlow stack needs about 15–20 GB of images plus your documents and vector indexes. Before starting:

df -h /var/lib/docker
docker system prune -f   # clean unused images

What Is Great, What Is Good, What Still Needs Work

What is great

Quality-in-quality-out document understanding. RAGFlow’s template-based chunking is the best open-source implementation of intelligent document parsing available today. The difference between a table correctly chunked and a table arbitrarily split is the difference between a useful answer and a hallucination.

Grounded citations with visual tracing. Every answer links back to its source with the exact text highlighted. In any serious use case, this is not optional — it is what makes RAG trustworthy.

Explainable chunks. You can see, inspect, and correct every chunk before the LLM sees it. No other major open-source RAG framework gives you this level of transparency and control.

What is good

Active development. The project shipped memory support, Gemini 3 Pro, Discord sync, and MCP integration all within early 2026. The GitHub issues page is responsive and the roadmap is public.

Heterogeneous source support. PDF, DOCX, Excel, PPTX, scanned images, URLs, Confluence, Notion, S3 — it handles all of them with the same chunking quality.

Multiple retrieval strategies. Hybrid vector + BM25 with re-ranking is a genuinely better approach than pure vector search, and it is built in rather than requiring you to wire it together yourself.

What still needs work

ARM64 support. ARM is not officially supported. Apple Silicon developers and ARM server users have to rely on community patches that may break on version updates. This is a meaningful gap as ARM becomes more common in both laptops and cloud infrastructure.

Non-Docker setup. The only supported installation path is Docker Compose. There is no clean way to run RAGFlow on a machine where Docker is not an option — a restricted enterprise server, for example.

Documentation depth. The quick-start docs are good. The deeper features — agent workflows, MCP integration, custom chunking templates — are sparsely documented and often require reading the source code or GitHub issues to understand fully.


Summary

RAGFlow is the open-source RAG engine to reach for when answer quality actually matters. Not for quick prototypes where any retrieval will do — for production-grade document question-answering where wrong answers have real costs.

The five features worth remembering:

Feature                        Why it matters
─────────────────────────────────────────────────────────────
Template-based chunking        Documents chunked by type, not by character count
Grounded citations             Every answer links to its exact source
Hybrid retrieval + re-ranking  Vector + keyword + reranker beats any single approach
Cloud source sync              Confluence, Notion, S3 — always current, not stale uploads
MCP + code executor            Retrieval plus action, not just retrieval

To get started:

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
docker compose up -d

Then open http://localhost and build your first knowledge base.

The GitHub repo is at github.com/infiniflow/ragflow. With 75,900+ stars and active development, it is worth following.
