Perplexity charges a subscription. ChatGPT Deep Research requires a Plus tier and runs on OpenAI’s infrastructure. Every query you submit to either service can be used to train their models, feeds their analytics, and leaves a record of your research interests on someone else’s server.
Local Deep Research (LDR) is the open-source answer. It performs the same class of deep, multi-source, cited research — using any LLM you choose, running entirely on your own hardware, with zero telemetry and AES-256 encrypted local storage. One Docker Compose command and it is running.
This guide covers what LDR is, how the research pipeline works, how to set it up, and where it actually outperforms its proprietary competitors.
What problem this solves
“Deep research” as a product category means something specific: not a single web search and a summary paragraph, but an agentic loop that plans a research strategy, dispatches multiple searches, reads the results, decides what to look for next, and synthesizes everything into a structured document with real citations.
The gap between a simple chatbot answer and a deep research report is the difference between asking a colleague a question and asking them to spend an afternoon properly investigating it. LDR automates the afternoon.
The reasons to self-host this are practical:
- Privacy. Research queries reveal intent. Corporate competitive analysis, personal health questions, legal research, investment theses — you may not want these associated with your account on an external service.
- Cost. LDR is free software. Run it with a local model and your marginal cost per query is electricity. Use an API-backed model and you pay token costs without any platform markup.
- Model choice. Perplexity’s backend is opaque. LDR works with Ollama, LM Studio, llama.cpp, OpenAI, Anthropic, Google, or any of 100+ models via OpenRouter. You pick the right model for the task.
- Search control. 20+ configurable search engines: academic databases, privacy-preserving search aggregators, specialized corpora. You decide what gets searched and what does not.
How the research pipeline works
```
User Query
   │
   ▼
Strategy Selection
(quick / detailed / focused-iteration / langgraph-agent)
   │
   ▼
Autonomous Search Dispatch
(web + academic + documents — parallel)
   │
   ▼
Result Synthesis via LLM
   │
   ▼
Source Attribution + Citation
   │
   ▼
Valuable Papers → Encrypted Local Library
   │
   ▼
Structured Report (PDF / Markdown)
```
The four research strategies cover different tradeoffs:
| Strategy | Time | Best For |
|---|---|---|
| Quick Summary | 30 sec – 3 min | Fast factual lookups with citations |
| Detailed Research | 5 – 15 min | Multi-angle topic exploration |
| Focused Iteration | 10 – 20 min | High-accuracy deep dives (best benchmark scores) |
| LangGraph Agent | variable | Autonomous — selects search engines and strategies dynamically |
The LangGraph agentic mode is the most powerful: it does not follow a fixed search plan. It evaluates partial results mid-research and decides whether to drill deeper, search a different engine, or synthesize what it has. It is closer to how a human researcher actually works than a fixed pipeline.
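To make that concrete, the control flow reduces to a loop like the following sketch. This is an illustration of the idea, not LDR’s internal code; the helper functions are dummy stand-ins for real search dispatch and an LLM-based coverage check.

```python
# Schematic of the agentic control flow -- an illustration only, not LDR's
# internal code. The helpers are stand-ins: in the real system, run_searches
# dispatches to real engines and assess is an LLM call.
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str                               # "synthesize" or "drill_down"
    followup: list[str] = field(default_factory=list)

def run_searches(queries: list[str]) -> list[str]:
    return [f"result for: {q}" for q in queries]

def assess(question: str, findings: list[str]) -> Decision:
    # Real version: ask the LLM whether coverage is sufficient and, if not,
    # which thread to pull on or which engine to switch to.
    done = len(findings) >= 4
    return Decision("synthesize" if done else "drill_down",
                    followup=[f"narrower angle on: {question}"])

def agentic_research(question: str, max_rounds: int = 5) -> str:
    findings, queries = [], [question]
    for _ in range(max_rounds):
        findings += run_searches(queries)     # web + academic + local library
        decision = assess(question, findings)
        if decision.action == "synthesize":   # enough evidence: stop searching
            break
        queries = decision.followup           # otherwise follow the new thread
    return f"report synthesized from {len(findings)} findings"

print(agentic_research("What changed between Gemma 1 and Gemma 2?"))
```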
Knowledge compounding is the other architectural difference worth understanding. Every research session optionally downloads valuable sources into a personal encrypted library. Future queries search both live web results and your accumulated private knowledge base simultaneously. The library grows over time, and subsequent research on related topics benefits from everything you have previously collected.
Supported LLM backends
LDR works with any model accessible via an OpenAI-compatible API — local or cloud.
Local (no API costs)
| Backend | Default Endpoint | Notes |
|---|---|---|
| Ollama | http://localhost:11434 | Easiest local setup; pulls models via CLI |
| LM Studio | http://localhost:1234/v1 | GUI-based; good for model switching |
| llama.cpp + llama-server | http://localhost:8080/v1 | Best performance-per-watt |
Tested models: Llama 3.x, Mistral, Gemma 2, DeepSeek, Qwen 2.5. Any model your hardware can serve will work.
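Since every backend in the table exposes an OpenAI-compatible endpoint, a single client works against all of them. A minimal smoke test against Ollama, assuming you have already pulled llama3.2:

```python
# Minimal smoke test of a local backend through its OpenAI-compatible API.
# Assumes Ollama is running and `ollama pull llama3.2` has been done; swap
# base_url for LM Studio (:1234/v1) or llama-server (:8080/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1",
                api_key="ollama")  # local servers ignore the key, but the client requires one
reply = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "In one sentence: what is SearXNG?"}],
)
print(reply.choices[0].message.content)
```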
Cloud (API key required)
- OpenAI: GPT-4o, GPT-4o-mini, GPT-3.5-turbo
- Anthropic: Claude 3 Opus, Sonnet, Haiku
- Google: Gemini 1.5 Pro, Flash
- OpenRouter: 100+ models with a single key
The model choice meaningfully affects output quality. On the SimpleQA benchmark, GPT-4o-mini with the focused-iteration strategy reached 95% accuracy; Gemini-2.0-flash reached 82%. Local models score lower on this benchmark, but the gap closes significantly on domain-specific research where the model’s training data is relevant.
Supported search engines
This is where LDR genuinely separates from proprietary alternatives. You can configure exactly what gets searched:
Academic databases
- arXiv — preprints in physics, CS, math, economics
- PubMed — biomedical literature
- Semantic Scholar — AI-powered academic search
- NASA ADS — astrophysics and space science
- Zenodo — open research data and publications
General web
- SearXNG (self-hosted) — privacy-respecting aggregator that queries Google, Bing, DuckDuckGo and others without any individual engine account
- Wikipedia — structured encyclopedic content, excellent citation density
- Wayback Machine — archived versions of web pages
- Google via SerpAPI / PSE — if you have keys
Specialized
- GitHub — source code, READMEs, issues
- OpenClaw — legal case law
- Elasticsearch — your own indexed document collections
Premium options
- Tavily — AI-optimized web search API with high extraction quality
- Brave Search — independent index, no Google dependency
- The Guardian — journalism and long-form content
LangChain retrievers (custom)
FAISS, Chroma, Pinecone, Weaviate, and any other LangChain-compatible vector store can be plugged in as a search source. This means your internal documentation, codebase, or proprietary knowledge base becomes a first-class search source alongside the public web.
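As a sketch of what that looks like: the retriever side is standard LangChain, and the hookup into LDR is shown only as a comment, since the exact parameter name should be taken from the project docs.

```python
# Retriever side is standard LangChain (pip install langchain-community
# langchain-huggingface faiss-cpu). How the retriever is registered with LDR
# is an assumption -- take the exact parameter name from the project docs.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

docs = [
    "Internal runbook: restart the ingest service before any failover.",
    "Design note: the API gateway caches auth tokens for 60 seconds.",
]
store = FAISS.from_texts(
    docs, HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)
retriever = store.as_retriever(search_kwargs={"k": 2})
print(retriever.invoke("How do I fail over the ingest service?"))

# Hypothetical hookup, following the pattern described above -- verify the keyword:
# quick_summary(query="...", retrievers={"runbooks": retriever}, search_tool="runbooks")
```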
Journal quality scoring
LDR integrates 212,000+ indexed sources via OpenAlex and DOAJ for journal reputation scoring. Research results from predatory journals or low-quality sources can be filtered or flagged.
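OpenAlex is a public API, so you can inspect the raw reputation signal yourself. The snippet below queries it directly; how LDR weighs these fields internally is its own logic and not shown here.

```python
# Inspect the reputation signal for a journal via the public OpenAlex API
# (no key required). How LDR weighs these fields is its own logic, not shown.
import requests

resp = requests.get(
    "https://api.openalex.org/sources",
    params={"search": "Nature Communications"},
    timeout=30,
)
source = resp.json()["results"][0]
print(source["display_name"],
      "| works:", source["works_count"],
      "| listed in DOAJ:", source.get("is_in_doaj"))
```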
Installation
Docker Compose is the recommended path. It bundles LDR, Ollama, and SearXNG in one command:
```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
```
Open http://localhost:5000 after about 30 seconds.
That is the complete quickstart. Ollama serves local models. SearXNG handles web search without requiring any external API keys. The whole stack is self-contained.
pip install (developer setup)
If you want to integrate LDR into a Python project or customize the codebase:
```bash
pip install local-deep-research
```
Works on Windows, macOS, and Linux. Encryption libraries are pre-built — no manual compilation needed.
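Programmatic use then looks roughly like this. The import path and keyword names follow the project docs at the time of writing; treat it as a sketch and verify against your installed version, since the API surface moves.

```python
# Programmatic use after `pip install local-deep-research`. Import path and
# keyword names follow the project docs at the time of writing -- verify
# against your installed version.
from local_deep_research.api import quick_summary

result = quick_summary(
    query="What do recent arXiv papers say about speculative decoding?",
)
print(result["summary"])  # cited answer; source metadata rides along in the dict
```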
Unraid
LDR has a dedicated Unraid deployment guide for homelab setups where Docker Compose is not the primary workflow.
Configuration
LLM settings
In the web UI (Settings → LLM), configure:
- Provider: Ollama, OpenAI, Anthropic, etc.
- Model name: `llama3.2`, `gpt-4o-mini`, `claude-3-5-sonnet-20241022`, etc.
- API key and endpoint URL for non-local providers
- Temperature, max tokens, request timeout
Search settings
Per search engine, configure:
- Enable/disable per research type
- Rate limiting and retry behavior
- API keys for premium engines (Tavily, SerpAPI, Brave)
- Custom SearXNG instance URL if you run your own
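If you do run your own SearXNG instance, it is worth sanity-checking it independently of LDR. SearXNG exposes a JSON output format, which must be enabled in the instance’s settings.yml; adjust the port to wherever your instance listens:

```python
# Sanity-check a self-hosted SearXNG instance outside of LDR. The JSON output
# format must be enabled in the instance's settings.yml.
import requests

resp = requests.get(
    "http://localhost:8080/search",
    params={"q": "SQLCipher AES-256", "format": "json"},
    timeout=30,
)
for hit in resp.json()["results"][:3]:
    print(hit["title"], "->", hit["url"])
```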
Research settings
- Default strategy (quick / detailed / focused-iteration / langgraph-agent)
- Citation format
- Export format (PDF, Markdown)
- Journal quality filter threshold
The REST API and MCP server
LDR exposes a full REST API with per-user authentication and WebSocket support for real-time progress updates. This makes it usable as a backend service in larger workflows — trigger a research job programmatically, stream the progress, and collect the output.
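The interaction pattern is: authenticate, submit a job, then poll or stream progress. The routes below are illustrative placeholders, not verified endpoints; take the real paths from the API documentation.

```python
# Shape of a programmatic research job: authenticate, submit, poll. The routes
# and payload keys below are illustrative placeholders, NOT verified endpoints;
# take the real paths from LDR's API documentation.
import requests

BASE = "http://localhost:5000"
session = requests.Session()
# ...authenticate the session here (per-user auth, per the LDR docs)...

job = session.post(f"{BASE}/api/research", json={        # hypothetical route
    "query": "State of solid-state battery manufacturing, 2024",
    "strategy": "focused-iteration",
}).json()

report = session.get(f"{BASE}/api/research/{job['id']}").json()  # hypothetical route
print(report)
```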
The MCP (Model Context Protocol) server integration lets you connect LDR directly to Claude Desktop or Claude Code. Once connected, you can invoke research from within a Claude conversation:
| Tool | Duration | What it does |
|---|---|---|
| `search` | 5–30 seconds | Single-engine lookup, no LLM processing |
| `quick_research` | 1–5 minutes | Fast cited answer |
| `detailed_research` | 5–15 minutes | Multi-source synthesis |
| `generate_report` | 10–30 minutes | Full structured report |
| `analyze_documents` | 30 sec – 2 min | Query your local library |
The search tool without LLM processing is particularly useful for monitoring use cases — you can query specific engines programmatically without burning tokens.
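In practice, a monitoring loop built on that tool reduces to something like this sketch, where fetch_results is a stand-in for whichever engine-level call you use:

```python
# Token-free monitoring sketch. fetch_results is a stand-in for whichever
# engine-level call you use (the MCP `search` tool, or a direct engine query);
# no LLM is involved, so polling hourly costs nothing in tokens.
import time

def fetch_results(query: str) -> list[dict]:  # stand-in for the real call
    return [{"title": "example hit", "url": "https://example.com/post-1"}]

seen: set[str] = set()
while True:
    for hit in fetch_results("local-deep-research new release"):
        if hit["url"] not in seen:            # alert only on unseen material
            seen.add(hit["url"])
            print("NEW:", hit["title"], hit["url"])
    time.sleep(3600)                          # hourly poll
```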
Privacy and security architecture
Encryption
LDR uses SQLCipher with AES-256 — the same encryption standard as Signal — for local database storage. Each user gets an isolated database, and there is no master decryption key: a zero-knowledge design in which the application cannot decrypt another user’s data even if it wanted to.
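To make the encryption claim concrete, this is what SQLCipher access looks like in general (illustrative, not LDR’s code). Without the correct key, the database file is indistinguishable from random bytes.

```python
# What SQLCipher-style access looks like in general -- illustrative, not LDR's
# code (pip install sqlcipher3-binary). Without the correct PRAGMA key the
# database file is unreadable ciphertext.
import sqlcipher3

conn = sqlcipher3.connect("research.db")
conn.execute("PRAGMA key = 'per-user-passphrase'")  # derives the AES-256 key
conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('encrypted at rest')")
conn.commit()
print(conn.execute("SELECT body FROM notes").fetchone())
```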
Zero telemetry
- No analytics, no tracking, no phone-home calls
- Network activity only when you initiate a search
- No external scripts loaded into the UI
- Usage metrics stay in your local encrypted database
The documentation is transparent about the one limitation: credentials held in process memory during an active session cannot be encrypted. This is “an industry-wide accepted reality” shared by password managers, browsers, and API clients. Mitigation: session-scoped credential lifetimes and core dump exclusion.
Security scanning
The CI pipeline runs CodeQL, Semgrep, DevSkim, and Bearer for static analysis. OWASP ZAP for dynamic testing. Dockle, Hadolint, and Checkov for container security. Gitleaks and OSV-Scanner for dependency and secret scanning. Docker images are signed with Cosign and ship with SLSA provenance. For a privacy-focused tool, the security posture is unusually thorough.
Comparison to proprietary alternatives
| | LDR | Perplexity | ChatGPT Deep Research |
|---|---|---|---|
| Privacy | Fully local option, zero telemetry | Cloud, analytics | Cloud |
| Model choice | Any LLM | Proprietary | OpenAI models only |
| Cost | Free + optional API costs | Subscription | Plus tier |
| Citation transparency | Sources + reasoning visible | Sources shown | Sources shown |
| Self-hosted | Yes | No | No |
| REST API | Full REST + Python SDK | Indirect | Indirect |
| Search engine control | 20+ configurable engines | Proprietary selection | Proprietary selection |
| Academic databases | arXiv, PubMed, Semantic Scholar | Limited | Limited |
| Local documents | Yes (encrypted library) | No | No |
The benchmark numbers support real performance parity on the right tasks: 95% accuracy on SimpleQA with GPT-4o-mini and the focused-iteration strategy, which is the same territory Perplexity and ChatGPT Deep Research occupy at the top of independent evaluations.
Performance expectations
Research time scales with depth and model speed:
| Mode | Typical Duration |
|---|---|
| Quick summary | 30 seconds – 3 minutes |
| Detailed research | 5 – 15 minutes |
| Full report generation | 10 – 30 minutes |
Local model speed is the dominant variable. A well-quantized 7B model on a modern GPU runs fast enough for comfortable detailed research. A 70B model or a slow CPU inference setup will push toward the upper end of these ranges.
Who this is for
Researchers and academics — literature review across arXiv, PubMed, and Semantic Scholar in a single query, with the results downloaded to a growing personal library. No subscription. No query caps.
Journalists and investigators — source-attributed research with Wayback Machine access for archived content. Everything local, nothing logged externally.
Enterprise teams — combine a private Elasticsearch or vector store of internal documents with live web search. LDR queries both simultaneously. The REST API integrates into existing workflows.
Privacy-conscious individuals — run everything on local hardware with a local model. No query leaves your machine except the web searches you explicitly configure.
Cost-sensitive deployments — free software plus whatever API costs you choose to incur. No per-query fees, no seat pricing, no tier restrictions.
The news and subscription system
LDR includes an AI-filtered topic monitoring system. Define topics to watch, set a schedule, and LDR runs searches periodically and filters results through the LLM to surface only genuinely relevant updates. This works without burning LLM tokens on the search step itself — the raw search results are checked first, and the LLM only processes results that pass an initial relevance filter.
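The two-stage gate is simple to picture. A schematic, not LDR’s code:

```python
# Schematic of the two-stage filter -- not LDR's actual code. A cheap lexical
# gate runs on every raw result; the LLM only sees survivors, which is what
# keeps scheduled monitoring cheap in tokens.
def lexical_gate(result: dict, keywords: list[str]) -> bool:
    text = (result["title"] + " " + result["snippet"]).lower()
    return any(k in text for k in keywords)          # zero tokens spent here

def filter_updates(results, keywords, llm_is_relevant):
    survivors = [r for r in results if lexical_gate(r, keywords)]
    return [r for r in survivors if llm_is_relevant(r)]  # tokens spent only here

hits = filter_updates(
    [{"title": "Gemma 2 release notes", "snippet": "new 9B weights"}],
    keywords=["gemma"],
    llm_is_relevant=lambda r: True,  # stand-in for the real LLM relevance check
)
print(hits)
```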
Getting started
```bash
# Docker Compose (recommended — includes Ollama and SearXNG)
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
# → http://localhost:5000

# pip (developer/integration use)
pip install local-deep-research
```
The project is MIT licensed. The community is active on Discord and r/LocalDeepResearch. The benchmark leaderboard is on Hugging Face if you want to compare configurations before committing to a hardware or model choice.
Repository: github.com/LearningCircuit/local-deep-research