Perplexity charges a subscription. ChatGPT Deep Research requires a Plus tier and runs on OpenAI’s infrastructure. Every query you submit to either service can be used to train their models, feeds their analytics, and leaves a record of your research interests on someone else’s server.
Local Deep Research (LDR) is the open-source answer. It performs the same class of deep, multi-source, cited research — using any LLM you choose, running entirely on your own hardware, with zero telemetry and AES-256 encrypted local storage. One Docker Compose command and it is running.
This guide covers what LDR is, how the research pipeline works, how to set it up, and where it actually outperforms its proprietary competitors.
What problem this solves
“Deep research” as a product category means something specific: not a single web search and a summary paragraph, but an agentic loop that plans a research strategy, dispatches multiple searches, reads the results, decides what to look for next, and synthesizes everything into a structured document with real citations.
The gap between a simple chatbot answer and a deep research report is the difference between asking a colleague a question and asking them to spend an afternoon properly investigating it. LDR automates the afternoon.
The reasons to self-host this are practical:
- Privacy. Research queries reveal intent. Corporate competitive analysis, personal health questions, legal research, investment theses — you may not want these associated with your account on an external service.
- Cost. LDR is free software. Run it with a local model and your marginal cost per query is electricity. Use an API-backed model and you pay token costs without any platform markup.
- Model choice. Perplexity’s backend is opaque. LDR works with Ollama, LM Studio, llama.cpp, OpenAI, Anthropic, Google, or any of 100+ models via OpenRouter. You pick the right model for the task.
- Search control. 20+ configurable search engines: academic databases, privacy-preserving search aggregators, specialized corpora. You decide what gets searched and what does not.
How the research pipeline works
```
User Query
   │
   ▼
Strategy Selection
(quick / detailed / focused-iteration / langgraph-agent)
   │
   ▼
Autonomous Search Dispatch
(web + academic + documents — parallel)
   │
   ▼
Result Synthesis via LLM
   │
   ▼
Source Attribution + Citation
   │
   ▼
Valuable Papers → Encrypted Local Library
   │
   ▼
Structured Report (PDF / Markdown)
```
The four research strategies cover different tradeoffs:
| Strategy | Time | Best For |
|---|---|---|
| Quick Summary | 30 sec – 3 min | Fast factual lookups with citations |
| Detailed Research | 5 – 15 min | Multi-angle topic exploration |
| Focused Iteration | 10 – 20 min | High-accuracy deep dives (best benchmark scores) |
| LangGraph Agent | variable | Autonomous — selects search engines and strategies dynamically |
The LangGraph agentic mode is the most powerful: it does not follow a fixed search plan. It evaluates partial results mid-research and decides whether to drill deeper, search a different engine, or synthesize what it has. It is closer to how a human researcher actually works than a fixed pipeline.
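To make that concrete, the control flow reduces to a loop like the following sketch. This is an illustration of the idea, not LDR’s internal code; the helper functions are dummy stand-ins for real search dispatch and an LLM-based coverage check.

```python
# Schematic of the agentic control flow -- an illustration only, not LDR's
# internal code. The helpers are stand-ins: in the real system, run_searches
# dispatches to real engines and assess is an LLM call.
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str                               # "synthesize" or "drill_down"
    followup: list[str] = field(default_factory=list)

def run_searches(queries: list[str]) -> list[str]:
    return [f"result for: {q}" for q in queries]

def assess(question: str, findings: list[str]) -> Decision:
    # Real version: ask the LLM whether coverage is sufficient and, if not,
    # which thread to pull on or which engine to switch to.
    done = len(findings) >= 4
    return Decision("synthesize" if done else "drill_down",
                    followup=[f"narrower angle on: {question}"])

def agentic_research(question: str, max_rounds: int = 5) -> str:
    findings, queries = [], [question]
    for _ in range(max_rounds):
        findings += run_searches(queries)     # web + academic + local library
        decision = assess(question, findings)
        if decision.action == "synthesize":   # enough evidence: stop searching
            break
        queries = decision.followup           # otherwise follow the new thread
    return f"report synthesized from {len(findings)} findings"

print(agentic_research("What changed between Gemma 1 and Gemma 2?"))
```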
Knowledge compounding is the other architectural difference worth understanding. Every research session optionally downloads valuable sources into a personal encrypted library. Future queries search both live web results and your accumulated private knowledge base simultaneously. The library grows over time, and subsequent research on related topics benefits from everything you have previously collected.
Supported LLM backends
LDR works with any model accessible via an OpenAI-compatible API — local or cloud.
Local (no API costs)
| Backend | Default Endpoint | Notes |
|---|---|---|
| Ollama | http://localhost:11434 | Easiest local setup; pulls models via CLI |
| LM Studio | http://localhost:1234/v1 | GUI-based; good for model switching |
| llama.cpp + llama-server | http://localhost:8080/v1 | Best performance-per-watt |
Tested models: Llama 3.x, Mistral, Gemma 2, DeepSeek, Qwen 2.5. Any model your hardware can serve will work.
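Since every backend in the table exposes an OpenAI-compatible endpoint, a single client works against all of them. A minimal smoke test against Ollama, assuming you have already pulled llama3.2:

```python
# Minimal smoke test of a local backend through its OpenAI-compatible API.
# Assumes Ollama is running and `ollama pull llama3.2` has been done; swap
# base_url for LM Studio (:1234/v1) or llama-server (:8080/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1",
                api_key="ollama")  # local servers ignore the key, but the client requires one
reply = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "In one sentence: what is SearXNG?"}],
)
print(reply.choices[0].message.content)
```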
Cloud (API key required)
- OpenAI: GPT-4o, GPT-4o-mini, GPT-3.5-turbo
- Anthropic: Claude 3 Opus, Sonnet, Haiku
- Google: Gemini 1.5 Pro, Flash
- OpenRouter: 100+ models with a single key
The model choice meaningfully affects output quality. On the SimpleQA benchmark, GPT-4o-mini with the focused-iteration strategy reached 95% accuracy; Gemini-2.0-flash reached 82%. Local models score lower on this benchmark, but the gap closes significantly on domain-specific research where the model’s training data is relevant.
Supported search engines
This is where LDR genuinely separates from proprietary alternatives. You can configure exactly what gets searched:
Academic databases
- arXiv — preprints in physics, CS, math, economics
- PubMed — biomedical literature
- Semantic Scholar — AI-powered academic search
- NASA ADS — astrophysics and space science
- Zenodo — open research data and publications
General web
- SearXNG (self-hosted) — privacy-respecting aggregator that queries Google, Bing, DuckDuckGo and others without any individual engine account
- Wikipedia — structured encyclopedic content, excellent citation density
- Wayback Machine — archived versions of web pages
- Google via SerpAPI / PSE — if you have keys
Specialized
- GitHub — source code, READMEs, issues
- OpenClaw — legal case law
- Elasticsearch — your own indexed document collections
Premium options
- Tavily — AI-optimized web search API with high extraction quality
- Brave Search — independent index, no Google dependency
- The Guardian — journalism and long-form content
LangChain retrievers (custom)
FAISS, Chroma, Pinecone, Weaviate, and any other LangChain-compatible vector store can be plugged in as a search source. This means your internal documentation, codebase, or proprietary knowledge base becomes a first-class search source alongside the public web.
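As a sketch of what that looks like: the retriever side is standard LangChain, and the hookup into LDR is shown only as a comment, since the exact parameter name should be taken from the project docs.

```python
# Retriever side is standard LangChain (pip install langchain-community
# langchain-huggingface faiss-cpu). How the retriever is registered with LDR
# is an assumption -- take the exact parameter name from the project docs.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

docs = [
    "Internal runbook: restart the ingest service before any failover.",
    "Design note: the API gateway caches auth tokens for 60 seconds.",
]
store = FAISS.from_texts(
    docs, HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
retriever = store.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke("How do I fail over the ingest service?"))

# Hypothetical hookup, following the pattern described above -- verify the keyword:
# quick_summary(query="...", retrievers={"runbooks": retriever}, search_tool="runbooks")
```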
Journal quality scoring
LDR integrates 212,000+ indexed sources via OpenAlex and DOAJ for journal reputation scoring. Research results from predatory journals or low-quality sources can be filtered or flagged.
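OpenAlex is a public API, so you can inspect the raw reputation signal yourself. The snippet below queries it directly; how LDR weighs these fields internally is its own logic and not shown here.

```python
# Inspect the reputation signal for a journal via the public OpenAlex API
# (no key required). How LDR weighs these fields is its own logic, not shown.
import requests

resp = requests.get(
    "https://api.openalex.org/sources",
    params={"search": "Nature Communications"},
    timeout=30,
)
source = resp.json()["results"][0]
print(source["display_name"],
      "| works:", source["works_count"],
      "| listed in DOAJ:", source.get("is_in_doaj"))
```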
Installation
Docker Compose is the recommended path. It bundles LDR, Ollama, and SearXNG in one command:
```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
```
Open http://localhost:5000 after about 30 seconds.
That is the complete quickstart. Ollama serves local models. SearXNG handles web search without requiring any external API keys. The whole stack is self-contained.
pip install (developer setup)
If you want to integrate LDR into a Python project or customize the codebase:
```bash
pip install local-deep-research
```
Works on Windows, macOS, and Linux. Encryption libraries are pre-built — no manual compilation needed.
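Programmatic use then looks roughly like this. The import path and keyword names follow the project docs at the time of writing; treat it as a sketch and verify against your installed version, since the API surface moves.

```python
# Programmatic use after `pip install local-deep-research`. Import path and
# keyword names follow the project docs at the time of writing -- verify
# against your installed version.
from local_deep_research.api import quick_summary

result = quick_summary(
    query="What do recent arXiv papers say about speculative decoding?",
)
print(result["summary"])  # cited answer; source metadata rides along in the dict
```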
Unraid
LDR has a dedicated Unraid deployment guide for homelab setups where Docker Compose is not the primary workflow.
Configuration
LLM settings
In the web UI (Settings → LLM), configure:
- Provider: Ollama, OpenAI, Anthropic, etc.
- Model name: `llama3.2`, `gpt-4o-mini`, `claude-3-5-sonnet-20241022`, etc.
- API key and endpoint URL for non-local providers
- Temperature, max tokens, request timeout
Search settings
Per search engine, configure:
- Enable/disable per research type
- Rate limiting and retry behavior
- API keys for premium engines (Tavily, SerpAPI, Brave)
- Custom SearXNG instance URL if you run your own
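If you do run your own SearXNG instance, it is worth sanity-checking it independently of LDR. SearXNG exposes a JSON output format, which must be enabled in the instance’s settings.yml; adjust the port to wherever your instance listens:

```python
# Sanity-check a self-hosted SearXNG instance outside of LDR. The JSON output
# format must be enabled in the instance's settings.yml.
import requests

resp = requests.get(
    "http://localhost:8080/search",
    params={"q": "SQLCipher AES-256", "format": "json"},
    timeout=30,
)
for hit in resp.json()["results"][:3]:
    print(hit["title"], "->", hit["url"])
```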
Research settings
- Default strategy (quick / detailed / focused-iteration / langgraph-agent)
- Citation format
- Export format (PDF, Markdown)
- Journal quality filter threshold
The REST API and MCP server
LDR exposes a full REST API with per-user authentication and WebSocket support for real-time progress updates. This makes it usable as a backend service in larger workflows — trigger a research job programmatically, stream the progress, and collect the output.
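The interaction pattern is: authenticate, submit a job, then poll or stream progress. The routes below are illustrative placeholders, not verified endpoints; take the real paths from the API documentation.

```python
# Shape of a programmatic research job: authenticate, submit, poll. The routes
# and payload keys below are illustrative placeholders, NOT verified endpoints;
# take the real paths from LDR's API documentation.
import requests

BASE = "http://localhost:5000"
session = requests.Session()
# ...authenticate the session here (per-user auth, per the LDR docs)...

job = session.post(f"{BASE}/api/research", json={        # hypothetical route
    "query": "State of solid-state battery manufacturing, 2024",
    "strategy": "focused-iteration",
}).json()

report = session.get(f"{BASE}/api/research/{job['id']}").json()  # hypothetical route
print(report)
```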
The MCP (Model Context Protocol) server integration lets you connect LDR directly to Claude Desktop or Claude Code. Once connected, you can invoke research from within a Claude conversation:
| Tool | Duration | What it does |
|---|---|---|
| `search` | 5–30 seconds | Single-engine lookup, no LLM processing |
| `quick_research` | 1–5 minutes | Fast cited answer |
| `detailed_research` | 5–15 minutes | Multi-source synthesis |
| `generate_report` | 10–30 minutes | Full structured report |
| `analyze_documents` | 30 sec – 2 min | Query your local library |
The search tool without LLM processing is particularly useful for monitoring use cases — you can query specific engines programmatically without burning tokens.
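In practice, a monitoring loop built on that tool reduces to something like this sketch, where fetch_results is a stand-in for whichever engine-level call you use:

```python
# Token-free monitoring sketch. fetch_results is a stand-in for whichever
# engine-level call you use (the MCP `search` tool, or a direct engine query);
# no LLM is involved, so polling hourly costs nothing in tokens.
import time

def fetch_results(query: str) -> list[dict]:  # stand-in for the real call
    return [{"title": "example hit", "url": "https://example.com/post-1"}]

seen: set[str] = set()
while True:
    for hit in fetch_results("local-deep-research new release"):
        if hit["url"] not in seen:            # alert only on unseen material
            seen.add(hit["url"])
            print("NEW:", hit["title"], hit["url"])
    time.sleep(3600)                          # hourly poll
```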
Privacy and security architecture
Encryption
LDR uses SQLCipher with AES-256 — the same encryption standard as Signal — for local database storage. Each user gets an isolated database, and there is no master decryption key: a zero-knowledge design in which the application cannot decrypt another user’s data even if it wanted to.
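To make the encryption claim concrete, this is what SQLCipher access looks like in general (illustrative, not LDR’s code). Without the correct key, the database file is indistinguishable from random bytes.

```python
# What SQLCipher-style access looks like in general -- illustrative, not LDR's
# code (pip install sqlcipher3-binary). Without the correct PRAGMA key the
# database file is unreadable ciphertext.
import sqlcipher3

conn = sqlcipher3.connect("research.db")
conn.execute("PRAGMA key = 'per-user-passphrase'")  # derives the AES-256 key
conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('encrypted at rest')")
conn.commit()
print(conn.execute("SELECT body FROM notes").fetchone())
```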
Zero telemetry
- No analytics, no tracking, no phone-home calls
- Network activity only when you initiate a search
- No external scripts loaded into the UI
- Usage metrics stay in your local encrypted database
The documentation is transparent about the one limitation: credentials held in process memory during an active session cannot be encrypted. This is “an industry-wide accepted reality” shared by password managers, browsers, and API clients. Mitigation: session-scoped credential lifetimes and core dump exclusion.
Security scanning
The CI pipeline runs CodeQL, Semgrep, DevSkim, and Bearer for static analysis. OWASP ZAP for dynamic testing. Dockle, Hadolint, and Checkov for container security. Gitleaks and OSV-Scanner for dependency and secret scanning. Docker images are signed with Cosign and ship with SLSA provenance. For a privacy-focused tool, the security posture is unusually thorough.
Comparison to proprietary alternatives
| | LDR | Perplexity | ChatGPT Deep Research |
|---|---|---|---|
| Privacy | Fully local option, zero telemetry | Cloud, analytics | Cloud |
| Model choice | Any LLM | Proprietary | OpenAI models only |
| Cost | Free + optional API costs | Subscription | Plus tier |
| Citation transparency | Sources + reasoning visible | Sources shown | Sources shown |
| Self-hosted | Yes | No | No |
| REST API | Full REST + Python SDK | Indirect | Indirect |
| Search engine control | 20+ configurable engines | Proprietary selection | Proprietary selection |
| Academic databases | arXiv, PubMed, Semantic Scholar | Limited | Limited |
| Local documents | Yes (encrypted library) | No | No |
The benchmark numbers support real performance parity on the right tasks: 95% accuracy on SimpleQA with GPT-4o-mini and the focused-iteration strategy, which is the same territory Perplexity and ChatGPT Deep Research occupy at the top of independent evaluations.
Performance expectations
Research time scales with depth and model speed:
| Mode | Typical Duration |
|---|---|
| Quick summary | 30 seconds – 3 minutes |
| Detailed research | 5 – 15 minutes |
| Full report generation | 10 – 30 minutes |
Local model speed is the dominant variable. A well-quantized 7B model on a modern GPU runs fast enough for comfortable detailed research. A 70B model or a slow CPU inference setup will push toward the upper end of these ranges.
Who this is for
Researchers and academics — literature review across arXiv, PubMed, and Semantic Scholar in a single query, with the results downloaded to a growing personal library. No subscription. No query caps.
Journalists and investigators — source-attributed research with Wayback Machine access for archived content. Everything local, nothing logged externally.
Enterprise teams — combine a private Elasticsearch or vector store of internal documents with live web search. LDR queries both simultaneously. The REST API integrates into existing workflows.
Privacy-conscious individuals — run everything on local hardware with a local model. No query leaves your machine except the web searches you explicitly configure.
Cost-sensitive deployments — free software plus whatever API costs you choose to incur. No per-query fees, no seat pricing, no tier restrictions.
The news and subscription system
LDR includes an AI-filtered topic monitoring system. Define topics to watch, set a schedule, and LDR runs searches periodically and filters results through the LLM to surface only genuinely relevant updates. This works without burning LLM tokens on the search step itself — the raw search results are checked first, and the LLM only processes results that pass an initial relevance filter.
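The two-stage gate is simple to picture. A schematic, not LDR’s code:

```python
# Schematic of the two-stage filter -- not LDR's actual code. A cheap lexical
# gate runs on every raw result; the LLM only sees survivors, which is what
# keeps scheduled monitoring cheap in tokens.
def lexical_gate(result: dict, keywords: list[str]) -> bool:
    text = (result["title"] + " " + result["snippet"]).lower()
    return any(k in text for k in keywords)          # zero tokens spent here

def filter_updates(results, keywords, llm_is_relevant):
    survivors = [r for r in results if lexical_gate(r, keywords)]
    return [r for r in survivors if llm_is_relevant(r)]  # tokens spent only here

hits = filter_updates(
    [{"title": "Gemma 2 release notes", "snippet": "new 9B weights"}],
    keywords=["gemma"],
    llm_is_relevant=lambda r: True,  # stand-in for the real LLM relevance check
)
print(hits)
```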
Getting started
```bash
# Docker Compose (recommended — includes Ollama and SearXNG)
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
# → http://localhost:5000

# pip (developer/integration use)
pip install local-deep-research
```
The project is MIT licensed. The community is active on Discord and r/LocalDeepResearch. The benchmark leaderboard is on Hugging Face if you want to compare configurations before committing to a hardware or model choice.
Repository: github.com/LearningCircuit/local-deep-research