AI Tools · 16 min read · May 8, 2026

Local Deep Research: Run Perplexity-Level Research Entirely on Your Own Machine

Local Deep Research (LDR) is an open-source AI research assistant that searches the web, academic databases, and your own documents — then synthesizes everything into cited reports. Fully local, zero telemetry, any LLM.

Tags: Local AI · Deep Research · Ollama · Privacy · Self-Hosted · Open Source · LangChain · AI Search · RAG · Research Tools
Neel Shah Tech Lead · Senior Data Engineer · Ottawa

Perplexity charges a subscription. ChatGPT Deep Research requires a Plus tier and runs on OpenAI’s infrastructure. Every query you submit to either service trains their models, enriches their analytics, and leaves a record of your research interests on someone else’s server.

Local Deep Research (LDR) is the open-source answer. It performs the same class of deep, multi-source, cited research — using any LLM you choose, running entirely on your own hardware, with zero telemetry and AES-256 encrypted local storage. One Docker Compose command and it is running.

This guide covers what LDR is, how the research pipeline works, how to set it up, and where it actually outperforms its proprietary competitors.


What problem this solves

“Deep research” as a product category means something specific: not a single web search and a summary paragraph, but an agentic loop that plans a research strategy, dispatches multiple searches, reads the results, decides what to look for next, and synthesizes everything into a structured document with real citations.

The gap between a simple chatbot answer and a deep research report is the difference between asking a colleague a question and asking them to spend an afternoon properly investigating it. LDR automates the afternoon.

The reasons to self-host this are practical:

  • Privacy. Research queries reveal intent. Corporate competitive analysis, personal health questions, legal research, investment theses — you may not want these associated with your account on an external service.
  • Cost. LDR is free software. Run it with a local model and your marginal cost per query is electricity. Use an API-backed model and you pay token costs without any platform markup.
  • Model choice. Perplexity’s backend is opaque. LDR works with Ollama, LM Studio, llama.cpp, OpenAI, Anthropic, Google, or any of 100+ models via OpenRouter. You pick the right model for the task.
  • Search control. 20+ configurable search engines: academic databases, privacy-preserving search aggregators, specialized corpora. You decide what gets searched and what does not.

How the research pipeline works

User Query
    ↓
Strategy Selection
(quick / detailed / focused-iteration / langgraph-agent)
    ↓
Autonomous Search Dispatch
(web + academic + documents — parallel)
    ↓
Result Synthesis via LLM
    ↓
Source Attribution + Citation
    ↓
Valuable Papers → Encrypted Local Library
    ↓
Structured Report (PDF / Markdown)

The four research strategies cover different tradeoffs:

Strategy             Time              Best For
Quick Summary        30 sec – 3 min    Fast factual lookups with citations
Detailed Research    5 – 15 min        Multi-angle topic exploration
Focused Iteration    10 – 20 min       High-accuracy deep dives (best benchmark scores)
LangGraph Agent      variable          Autonomous — selects search engines and strategies dynamically

The LangGraph agentic mode is the most powerful: it does not follow a fixed search plan. It evaluates partial results mid-research and decides whether to drill deeper, search a different engine, or synthesize what it has. It is closer to how a human researcher actually works than a fixed pipeline.
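The control flow of that agentic mode can be sketched in a few lines. This is an illustration of the evaluate-and-decide loop described above, not LDR's actual internals: `agentic_research`, `fake_search`, and `fake_decide` are all hypothetical names, and the decision step stands in for an LLM call.

```python
# Minimal sketch of an agentic research loop: after each search round, a
# decision function (an LLM in the real system, a stub here) chooses
# whether to drill deeper, switch engines, or stop and synthesize.

def agentic_research(query, search, decide, max_rounds=5):
    """Run search rounds until the decision function says to synthesize."""
    findings = []
    engine = "web"
    for _ in range(max_rounds):
        results = search(engine, query)           # dispatch one search round
        findings.extend(results)
        action, detail = decide(query, findings)  # LLM-driven control step
        if action == "synthesize":                # enough evidence gathered
            break
        elif action == "switch_engine":           # try a different source
            engine = detail
        elif action == "refine_query":            # drill deeper on a subtopic
            query = detail
    return findings

# Stubs that show the control flow without a real LLM or search backend.
def fake_search(engine, query):
    return [f"{engine}:{query}"]

def fake_decide(query, findings):
    if len(findings) >= 2:
        return ("synthesize", None)
    return ("switch_engine", "arxiv")

print(agentic_research("quantum error correction", fake_search, fake_decide))
# → ['web:quantum error correction', 'arxiv:quantum error correction']
```

The point of the structure is that the loop has no fixed plan: each round's results feed the next decision, which is what distinguishes this mode from the fixed-pipeline strategies.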

Knowledge compounding is the other architectural difference worth understanding. Every research session optionally downloads valuable sources into a personal encrypted library. Future queries search both live web results and your accumulated private knowledge base simultaneously. The library grows over time, and subsequent research on related topics benefits from everything you have previously collected.


Supported LLM backends

LDR works with any model accessible via an OpenAI-compatible API — local or cloud.

Local (no API costs)

Backend                    Default Endpoint           Notes
Ollama                     http://localhost:11434     Easiest local setup; pulls models via CLI
LM Studio                  http://localhost:1234/v1   GUI-based; good for model switching
llama.cpp + llama-server   http://localhost:8080/v1   Best performance-per-watt

Tested models: Llama 3.x, Mistral, Gemma 2, DeepSeek, Qwen 2.5. Any model your hardware can serve will work.
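All three backends speak the OpenAI-compatible chat completions format (Ollama serves it under the /v1 path of its default port), so one request builder covers them. The helper below is illustrative, not part of LDR; only the endpoints come from the table above.

```python
# Build an OpenAI-compatible /chat/completions request for any of the
# three local backends. Note Ollama's OpenAI-compatible API lives at /v1,
# alongside its native API on the same port.

LOCAL_ENDPOINTS = {
    "ollama": "http://localhost:11434/v1",
    "lm_studio": "http://localhost:1234/v1",
    "llama_cpp": "http://localhost:8080/v1",
}

def chat_request(backend, model, prompt):
    """Return the URL and JSON body for an OpenAI-compatible chat call."""
    url = LOCAL_ENDPOINTS[backend] + "/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature suits research synthesis
    }
    return url, body

url, body = chat_request("ollama", "llama3.2", "Summarize RAG in one line.")
print(url)  # → http://localhost:11434/v1/chat/completions
```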

Cloud (API key required)

  • OpenAI: GPT-4o, GPT-4o-mini, GPT-3.5-turbo
  • Anthropic: Claude 3 Opus, Sonnet, Haiku
  • Google: Gemini 1.5 Pro, Flash
  • OpenRouter: 100+ models with a single key

The model choice meaningfully affects output quality. On the SimpleQA benchmark, GPT-4o-mini with the focused-iteration strategy reached 95% accuracy; Gemini-2.0-flash reached 82%. Local models score lower on this benchmark, but the gap closes significantly on domain-specific research where the model’s training data is relevant.


Supported search engines

This is where LDR genuinely separates from proprietary alternatives. You can configure exactly what gets searched:

Academic databases

  • arXiv — preprints in physics, CS, math, economics
  • PubMed — biomedical literature
  • Semantic Scholar — AI-powered academic search
  • NASA ADS — astrophysics and space science
  • Zenodo — open research data and publications

General web

  • SearXNG (self-hosted) — privacy-respecting aggregator that queries Google, Bing, DuckDuckGo and others without any individual engine account
  • Wikipedia — structured encyclopedic content, excellent citation density
  • Wayback Machine — archived versions of web pages
  • Google via SerpAPI / PSE — if you have keys

Specialized

  • GitHub — source code, READMEs, issues
  • OpenClaw — legal case law
  • Elasticsearch — your own indexed document collections

Premium options

  • Tavily — AI-optimized web search API with high extraction quality
  • Brave Search — independent index, no Google dependency
  • The Guardian — journalism and long-form content

LangChain retrievers (custom)

FAISS, Chroma, Pinecone, Weaviate, and any other LangChain-compatible vector store can be plugged in as a search source. This means your internal documentation, codebase, or proprietary knowledge base becomes a first-class search source alongside the public web.
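The retriever contract is small, which is why so many stores slot in. The sketch below mirrors the shape of a LangChain-style `get_relevant_documents` interface without depending on LangChain itself; the keyword-overlap scoring is a toy stand-in for real embedding similarity, and all names are illustrative.

```python
# Toy retriever showing the interface a private document store exposes
# when plugged in as a search source. Real deployments would score by
# embedding similarity; term overlap keeps the sketch self-contained.

class DocRetriever:
    def __init__(self, docs):
        self.docs = docs

    def get_relevant_documents(self, query, k=2):
        """Rank stored docs by naive term overlap with the query."""
        terms = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(terms & set(d.lower().split())),
            reverse=True,
        )
        return scored[:k]

internal_docs = [
    "deployment runbook for the payments service",
    "incident postmortem: search latency regression",
    "API style guide for internal services",
]
retriever = DocRetriever(internal_docs)
print(retriever.get_relevant_documents("payments service deployment"))
```

Anything implementing this shape, whether backed by FAISS, Chroma, or Elasticsearch, can sit alongside the web engines in a research run.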

Journal quality scoring

LDR integrates 212,000+ indexed sources via OpenAlex and DOAJ for journal reputation scoring. Research results from predatory journals or low-quality sources can be filtered or flagged.
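A minimal version of that filtering step looks like the following. The field names and threshold are assumptions for illustration; the real scores would come from the OpenAlex/DOAJ index.

```python
# Toy journal-quality gate: partition results into kept and flagged lists
# based on per-journal reputation scores. Field names are illustrative.

def filter_by_journal_quality(results, scores, threshold=0.5):
    """Split results by journal score; unknown journals score 0."""
    kept, flagged = [], []
    for r in results:
        score = scores.get(r["journal"], 0.0)
        (kept if score >= threshold else flagged).append(r)
    return kept, flagged

scores = {"Nature": 0.98, "Predatory Weekly": 0.05}
results = [
    {"title": "Real finding", "journal": "Nature"},
    {"title": "Dubious claim", "journal": "Predatory Weekly"},
]
kept, flagged = filter_by_journal_quality(results, scores)
print([r["title"] for r in kept])     # → ['Real finding']
print([r["title"] for r in flagged])  # → ['Dubious claim']
```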


Installation

Docker Compose is the recommended path. It bundles LDR, Ollama, and SearXNG in one command:

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d

Open http://localhost:5000 after about 30 seconds.

That is the complete quickstart. Ollama serves local models. SearXNG handles web search without requiring any external API keys. The whole stack is self-contained.

pip install (developer setup)

If you want to integrate LDR into a Python project or customize the codebase:

pip install local-deep-research

Works on Windows, macOS, and Linux. Encryption libraries are pre-built — no manual compilation needed.

Unraid

LDR has a dedicated Unraid deployment guide for homelab setups where Docker Compose is not the primary workflow.


Configuration

LLM settings

In the web UI (Settings → LLM), configure:

  • Provider: Ollama, OpenAI, Anthropic, etc.
  • Model name: llama3.2, gpt-4o-mini, claude-3-5-sonnet-20241022, etc.
  • API key and endpoint URL for non-local providers
  • Temperature, max tokens, request timeout

Search settings

Per search engine, configure:

  • Enable/disable per research type
  • Rate limiting and retry behavior
  • API keys for premium engines (Tavily, SerpAPI, Brave)
  • Custom SearXNG instance URL if you run your own

Research settings

  • Default strategy (quick / detailed / focused-iteration / langgraph-agent)
  • Citation format
  • Export format (PDF, Markdown)
  • Journal quality filter threshold

The REST API and MCP server

LDR exposes a full REST API with per-user authentication and WebSocket support for real-time progress updates. This makes it usable as a backend service in larger workflows — trigger a research job programmatically, stream the progress, and collect the output.
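The submit-poll-collect pattern that paragraph describes can be sketched with the transport injected as callables, which keeps the endpoint paths and payload fields (assumptions here, not LDR's documented routes) out of the control logic.

```python
# Sketch of driving a research job through a REST-style API: submit the
# job, poll until it completes, return the report. Fake transport stands
# in for HTTP calls; payload fields are illustrative.

import time

def run_research_job(submit, poll, query, interval=0.0, max_polls=100):
    """Submit a job, poll its status, and return the finished report."""
    job_id = submit({"query": query, "strategy": "focused-iteration"})
    for _ in range(max_polls):
        status = poll(job_id)
        if status["state"] == "completed":
            return status["report"]
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"job {job_id} did not finish")

# Fake transport: completes after three polls.
_jobs = {}
def fake_submit(payload):
    _jobs["j1"] = {"polls": 0, "query": payload["query"]}
    return "j1"

def fake_poll(job_id):
    job = _jobs[job_id]
    job["polls"] += 1
    if job["polls"] < 3:
        return {"state": "running"}
    return {"state": "completed", "report": f"Report on {job['query']}"}

print(run_research_job(fake_submit, fake_poll, "solid-state batteries"))
# → Report on solid-state batteries
```

In a real integration the WebSocket stream would replace the polling loop, but the job lifecycle is the same.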

The MCP (Model Context Protocol) server integration lets you connect LDR directly to Claude Desktop or Claude Code. Once connected, you can invoke research from within a Claude conversation:

Tool                 Duration         What it does
search               5–30 seconds     Single engine lookup, no LLM processing
quick_research       1–5 minutes      Fast cited answer
detailed_research    5–15 minutes     Multi-source synthesis
generate_report      10–30 minutes    Full structured report
analyze_documents    30 sec – 2 min   Query your local library

The search tool without LLM processing is particularly useful for monitoring use cases — you can query specific engines programmatically without burning tokens.


Privacy and security architecture

Encryption

LDR uses SQLCipher with AES-256 — the same encryption standard as Signal — for local database storage. Each user gets an isolated database, and there is no master decryption key: a zero-knowledge design in which the application cannot decrypt another user’s data even if it wanted to.
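The zero-knowledge property follows from how per-user keys are derived. The sketch below illustrates the idea with PBKDF2 from Python's standard library; the function name and KDF parameters are illustrative, not LDR's actual settings.

```python
# Each user's database key is derived from their own password, so there
# is no master key that could decrypt every database. Parameters are
# illustrative, not LDR's actual KDF configuration.

import hashlib, os

def derive_db_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key (AES-256 sized) from a user password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

salt_a, salt_b = os.urandom(16), os.urandom(16)
key_a = derive_db_key("alice-passphrase", salt_a)
key_b = derive_db_key("bob-passphrase", salt_b)

print(len(key_a))      # → 32, suitable as an AES-256 key
print(key_a != key_b)  # → True: users share no key material
```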

Zero telemetry

  • No analytics, no tracking, no phone-home calls
  • Network activity only when you initiate a search
  • No external scripts loaded into the UI
  • Usage metrics stay in your local encrypted database

The documentation is transparent about the one limitation: credentials held in process memory during an active session cannot be encrypted. This is “an industry-wide accepted reality” shared by password managers, browsers, and API clients. Mitigation: session-scoped credential lifetimes and core dump exclusion.

Security scanning

The CI pipeline runs CodeQL, Semgrep, DevSkim, and Bearer for static analysis. OWASP ZAP for dynamic testing. Dockle, Hadolint, and Checkov for container security. Gitleaks and OSV-Scanner for dependency and secret scanning. Docker images are signed with Cosign and ship with SLSA provenance. For a privacy-focused tool, the security posture is unusually thorough.


Comparison to proprietary alternatives

Feature                  LDR                                   Perplexity               ChatGPT Deep Research
Privacy                  Fully local option, zero telemetry    Cloud, analytics         Cloud
Model choice             Any LLM                               Proprietary              GPT-4 only
Cost                     Free + optional API costs             Subscription             Plus tier
Citation transparency    Sources + reasoning visible           Sources shown            Sources shown
Self-hosted              Yes                                   No                       No
REST API                 Full REST + Python SDK                Indirect                 Indirect
Search engine control    20+ configurable engines              Proprietary selection    Proprietary selection
Academic databases       arXiv, PubMed, Semantic Scholar       Limited                  Limited
Local documents          Yes (encrypted library)               No                       No

The benchmark numbers support real performance parity for the right tasks: 95% accuracy on SimpleQA with GPT-4o-mini and the focused-iteration strategy, the same territory Perplexity and ChatGPT Deep Research occupy at the top of independent evaluations.


Performance expectations

Research time scales with depth and model speed:

Mode                      Typical Duration
Quick summary             30 seconds – 3 minutes
Detailed research         5 – 15 minutes
Full report generation    10 – 30 minutes

Local model speed is the dominant variable. A well-quantized 7B model on a modern GPU runs fast enough for comfortable detailed research. A 70B model or a slow CPU inference setup will push toward the upper end of these ranges.


Who this is for

Researchers and academics — literature review across arXiv, PubMed, and Semantic Scholar in a single query, with the results downloaded to a growing personal library. No subscription. No query caps.

Journalists and investigators — source-attributed research with Wayback Machine access for archived content. Everything local, nothing logged externally.

Enterprise teams — combine a private Elasticsearch or vector store of internal documents with live web search. LDR queries both simultaneously. The REST API integrates into existing workflows.

Privacy-conscious individuals — run everything on local hardware with a local model. No query leaves your machine except the web searches you explicitly configure.

Cost-sensitive deployments — free software plus whatever API costs you choose to incur. No per-query fees, no seat pricing, no tier restrictions.


The news and subscription system

LDR includes an AI-filtered topic monitoring system. Define topics to watch, set a schedule, and LDR runs searches periodically and filters results through the LLM to surface only genuinely relevant updates. This works without burning LLM tokens on the search step itself — the raw search results are checked first, and the LLM only processes results that pass an initial relevance filter.
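That two-stage design can be sketched as a cheap keyword pass that prunes raw hits before anything reaches the LLM. The function and field names below are illustrative, and the keyword check is a deliberately simple stand-in for the real relevance filter.

```python
# First stage of a two-stage monitoring filter: keep only results whose
# title/snippet mention enough topic terms; only survivors would be sent
# on to the LLM, saving tokens on obviously irrelevant hits.

def prefilter(results, topic_terms, min_hits=2):
    """Keep results matching at least min_hits topic terms."""
    terms = {t.lower() for t in topic_terms}
    kept = []
    for r in results:
        text = (r["title"] + " " + r["snippet"]).lower()
        if sum(term in text for term in terms) >= min_hits:
            kept.append(r)  # worth spending LLM tokens on
    return kept

results = [
    {"title": "Rust 1.80 released", "snippet": "borrow checker improvements"},
    {"title": "Gardening tips", "snippet": "spring planting guide"},
]
survivors = prefilter(results, ["rust", "borrow checker", "release"])
print([r["title"] for r in survivors])  # → ['Rust 1.80 released']
```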


Getting started

# Docker Compose (recommended — includes Ollama and SearXNG)
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
docker compose up -d
# → http://localhost:5000

# pip (developer/integration use)
pip install local-deep-research

The project is MIT licensed. The community is active on Discord and r/LocalDeepResearch. The benchmark leaderboard is on Hugging Face if you want to compare configurations before committing to a hardware or model choice.


Repository: github.com/LearningCircuit/local-deep-research

Frequently asked questions

What is Local Deep Research?

Local Deep Research is an open-source AI research assistant that can search multiple sources, read documents, and synthesize cited reports while running on user-controlled infrastructure.

Why run deep research locally?

Running deep research locally improves privacy, gives more control over models and search sources, can reduce subscription costs, and keeps sensitive research queries off third-party platforms.

Who should use Local Deep Research?

Local Deep Research is useful for developers, researchers, analysts, and teams that need cited research workflows with stronger privacy, model choice, and self-hosting control.