If you are building AI products for India, you have been patching around a real problem: the big models from OpenAI and Anthropic are trained mostly on English. They handle Hindi, Tamil, Bengali — but with obvious gaps. Code-switching (mixing languages mid-sentence), regional dialects, script accuracy for Devanagari and Tamil, culturally grounded reasoning — these are not afterthoughts you can fine-tune away.
Sarvam AI was built to solve this from the ground up. The Sarvam-105B model is their flagship large language model — 105 billion parameters, pre-trained on a multilingual corpus that prioritises all 22 scheduled Indian languages alongside English. This guide walks you through what the model actually does well, how to access it, and how to build with it.
What Is Sarvam AI?
Sarvam AI is a Bangalore-based AI company co-founded by Vivek Raghavan and Pratyush Kumar — both researchers with deep roots in speech and language technology for Indian languages.
The company’s stated mission is building full-stack AI for Bharat: not just language models, but the entire pipeline from speech-to-text and translation to LLM inference and text-to-speech. Every layer is built with Indian languages as a first-class target, not an add-on.
Their model line-up before the 105B:
- Sarvam-2B — open-source, released in 2024, focused on 10 Indian languages
- Sarvam Translate — specialised translation API supporting 22 Indian languages
- Sarvam Speech — ASR and TTS models tuned for Indian accents and scripts
Sarvam-105B is the new flagship. It is the first Indian-developed LLM at this parameter scale designed for multilingual reasoning across all 22 scheduled Indian languages.
What Makes 105B Different
Three things set this model apart from using a general-purpose LLM and prompting it in Hindi.
1. Script-Native Training
Most large models see Indian scripts through tokenisers built for English. A Hindi sentence gets fragmented into sub-word tokens in a way that inflates context length and degrades coherence. Sarvam-105B uses a tokeniser built from scratch on a balanced multilingual corpus — each Indian script is tokenised efficiently. A sentence in Tamil uses roughly the same number of tokens as its English translation.
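The penalty is visible even at the byte level, without any tokeniser at all. A quick sketch in plain Python: every Devanagari codepoint occupies 3 bytes in UTF-8, so a byte-level BPE trained mostly on English starts from roughly three times as many base symbols per character for Hindi as for ASCII text.

```python
# Why English-centric byte-level tokenisers penalise Indic scripts:
# each Devanagari codepoint is 3 bytes in UTF-8, so a byte-level BPE
# with few Hindi merges starts from ~3x more base symbols per character.
hindi = "मेरा ऑर्डर अभी तक नहीं आया"
english = "My order has not arrived yet"

print(len(hindi), len(hindi.encode("utf-8")))      # far more bytes than characters
print(len(english), len(english.encode("utf-8")))  # bytes == characters (ASCII)
```

A script-aware tokeniser closes most of this gap, which is what keeps Tamil and English prompts at comparable token counts.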
2. Code-Switching Fluency
Real Indian communication mixes languages constantly. A customer support message might read: “Mera order abhi tak nahi aaya, please help.” (My order hasn’t arrived yet, please help.) Standard LLMs handle this poorly — they either default to English or produce awkward responses that don’t match the register.
Sarvam-105B is trained on naturally occurring code-switched text. It reads the input as a whole and responds in the same register, not a cleaned-up translation of it.
3. Cultural and Contextual Grounding
Names, places, festivals, legal frameworks, government schemes — these are all part of the model’s training distribution rather than gaps in it. When a user asks about PMAY, MNREGA, or GST filing, the model has substantive knowledge rather than a thin English-language summary.
Getting Access
Sarvam AI provides API access through their developer platform.
Requirements
─────────────────────────────────────────
Account     sarvam.ai (free tier available)
API key     From the developer dashboard
Python      3.9 or later
pip         Any recent version
─────────────────────────────────────────
Step 1: Create an account
Go to sarvam.ai, create an account, and navigate to the API keys section of your dashboard.
Step 2: Get your API key
Your API key will look like: sk-...
Store it as an environment variable — never hardcode keys in source files:
export SARVAM_API_KEY="sk-your-key-here"
Or add it to your .bashrc / .zshrc for persistence:
echo 'export SARVAM_API_KEY="sk-your-key-here"' >> ~/.bashrc
source ~/.bashrc
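On the Python side it is worth failing fast if the variable is missing, rather than getting a bare KeyError deep inside a request. A small helper (the function name is my own, not part of any SDK):

```python
import os

def get_api_key() -> str:
    """Return the Sarvam API key, with a clear error if it is not set."""
    key = os.environ.get("SARVAM_API_KEY")
    if not key:
        raise RuntimeError(
            "SARVAM_API_KEY is not set; export it in your shell first."
        )
    return key
```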
Step 3: Install the SDK
pip install sarvam-python
For a minimal setup without the SDK, you can use the REST API directly with requests:
pip install requests
Your First API Call
Let’s verify the setup with a simple Hindi prompt.
import os
from sarvam import SarvamClient

client = SarvamClient(api_key=os.environ["SARVAM_API_KEY"])

response = client.chat.complete(
    model="sarvam-105b",
    messages=[
        {
            "role": "user",
            "content": "भारत में आर्टिफिशियल इंटेलिजेंस के क्या फायदे हैं?"
        }
    ]
)

print(response.choices[0].message.content)
You should get a coherent, well-structured Hindi response about AI benefits in India. No translation artifacts, no broken Devanagari script.
Using the REST API directly:
import os
import requests

response = requests.post(
    "https://api.sarvam.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['SARVAM_API_KEY']}",
        "Content-Type": "application/json"
    },
    json={
        "model": "sarvam-105b",
        "messages": [
            {"role": "user", "content": "Hello, what languages do you speak?"}
        ]
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
Five Features That Matter
Feature 1: Multilingual Chat
The model handles all 22 scheduled Indian languages. You can switch languages between turns and it maintains context correctly.
messages = [
    {"role": "user", "content": "नमस्ते! मेरा नाम नील है।"},
    {"role": "assistant", "content": "नमस्ते नील! आप कैसे हैं?"},
    {"role": "user", "content": "I'm good. Can you explain what you just said?"},
]

response = client.chat.complete(
    model="sarvam-105b",
    messages=messages
)
The model correctly identifies that the user has switched to English and responds accordingly, summarising the Hindi exchange.
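If you manage the running message list yourself, a thin wrapper keeps multi-turn, language-switching conversations tidy. A minimal sketch, assuming the `SarvamClient` interface used above (the `Conversation` class is illustrative, not part of the SDK):

```python
class Conversation:
    """Keeps the full message history so the model can resolve
    references across turns, including language switches."""

    def __init__(self, client, model: str = "sarvam-105b"):
        self.client = client
        self.model = model
        self.messages = []

    def send(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        response = self.client.chat.complete(
            model=self.model, messages=self.messages
        )
        reply = response.choices[0].message.content
        # Store the reply too, so the next turn sees the whole exchange
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```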
Supported languages include:
Hindi (hi) Bengali (bn) Telugu (te)
Tamil (ta) Marathi (mr) Gujarati (gu)
Kannada (kn) Malayalam (ml) Punjabi (pa)
Odia (or) Assamese (as) Urdu (ur)
Sanskrit (sa) Sindhi (sd) Kashmiri (ks)
Konkani (kok) Manipuri (mni) Maithili (mai)
Dogri (doi) Bodo (brx) Santhali (sat)
Nepali (ne)
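For request validation it helps to keep these codes in one place. A small helper (the mapping mirrors the list above; the function name is my own):

```python
# Codes for the 22 scheduled languages, as listed above.
SCHEDULED_LANGUAGES = {
    "hi": "Hindi",    "bn": "Bengali",   "te": "Telugu",
    "ta": "Tamil",    "mr": "Marathi",   "gu": "Gujarati",
    "kn": "Kannada",  "ml": "Malayalam", "pa": "Punjabi",
    "or": "Odia",     "as": "Assamese",  "ur": "Urdu",
    "sa": "Sanskrit", "sd": "Sindhi",    "ks": "Kashmiri",
    "kok": "Konkani", "mni": "Manipuri", "mai": "Maithili",
    "doi": "Dogri",   "brx": "Bodo",     "sat": "Santhali",
    "ne": "Nepali",
}

def is_supported(code: str) -> bool:
    # Reject unknown codes before spending tokens on a request
    return code in SCHEDULED_LANGUAGES
```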
Feature 2: Document Analysis in Regional Languages
Useful for processing scanned government documents, legal filings, or customer correspondence in Indian languages.
def analyse_document(text: str, question: str) -> str:
    # Prompt (in Hindi): "Read the document below and answer the question."
    prompt = f"""नीचे दिया गया दस्तावेज़ पढ़ें और प्रश्न का उत्तर दें।
दस्तावेज़:
{text}
प्रश्न: {question}
उत्तर:"""
    response = client.chat.complete(
        model="sarvam-105b",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500
    )
    return response.choices[0].message.content
# Example usage
doc = """ग्राम पंचायत कार्यालय
दिनांक: 15 मार्च 2026
प्रमाण पत्र
यह प्रमाणित किया जाता है कि श्री राम प्रसाद शर्मा, पुत्र श्री हरि प्रसाद शर्मा,
ग्राम रामपुर, जिला लखनऊ के स्थायी निवासी हैं।"""
result = analyse_document(doc, "इस दस्तावेज़ में किस व्यक्ति का उल्लेख है?")
print(result)
Feature 3: Code-Switched Customer Support
This is where Sarvam-105B genuinely outperforms general-purpose models. Indian customer support conversations naturally mix Hindi and English — the model handles this without special prompting.
def customer_support_agent(user_message: str) -> str:
    system_prompt = """You are a helpful customer support agent for an Indian e-commerce platform.
Respond naturally in the same language mix the user uses. If they write in Hindi, respond in Hindi.
If they mix Hindi and English (Hinglish), match that style. Be warm and helpful."""
    response = client.chat.complete(
        model="sarvam-105b",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        temperature=0.7
    )
    return response.choices[0].message.content

# Hinglish input: "My order has been pending for 3 days, when will I get it?"
reply = customer_support_agent(
    "Mera order 3 din se pending hai, kab milega? Very frustrated hoon."
)
print(reply)
Feature 4: Structured Output in Indian Languages
For applications that need JSON extraction from Hindi or Tamil text — forms, applications, data entry automation.
import json

def extract_from_hindi_form(form_text: str) -> dict:
    # Prompt (in Hindi): "Extract the information from the form below
    # and return it as JSON. Return only JSON, no other text."
    prompt = f"""नीचे दिए गए फॉर्म से जानकारी निकालें और JSON में दें।
फॉर्म:
{form_text}
केवल JSON लौटाएं, कोई अन्य टेक्स्ट नहीं:
{{
  "name": "",
  "age": "",
  "address": "",
  "phone": ""
}}"""
    response = client.chat.complete(
        model="sarvam-105b",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    raw = response.choices[0].message.content.strip()
    # Clean markdown code blocks if present
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return json.loads(raw.strip())
form_data = """
नाम: सुनीता देवी
आयु: 34 वर्ष
पता: 12, गांधी नगर, जयपुर, राजस्थान
मोबाइल: 9876543210
"""
result = extract_from_hindi_form(form_data)
print(result)
# {'name': 'सुनीता देवी', 'age': '34', 'address': '12, गांधी नगर, जयपुर, राजस्थान', 'phone': '9876543210'}
Feature 5: Streaming for Real-Time Applications
For chatbots and interfaces where users need to see tokens as they arrive:
def stream_response(prompt: str) -> None:
    stream = client.chat.complete(
        model="sarvam-105b",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()  # newline at end

# Prompt (in Hindi): "Tell me three interesting facts about Indian culture."
stream_response("भारत की संस्कृति के बारे में तीन रोचक तथ्य बताइए।")
Real-World Use Cases
Government and Civic Tech
India’s digital governance stack — from Aadhaar to UMANG to DigiLocker — serves hundreds of millions of citizens who interact in regional languages. Sarvam-105B fits naturally into:
- Chatbots for scheme eligibility queries (PM Kisan, Ujjwala Yojana, etc.)
- Automated processing of handwritten or scanned forms in regional languages
- Legal aid tools that explain rights and procedures in local languages
EdTech
India has the world’s largest student population. Tutoring AI that can switch between English explanations and Hindi/Tamil clarifications within the same conversation removes a significant barrier for learners in tier-2 and tier-3 cities.
Financial Services
KYC document processing, loan application analysis, and customer onboarding often involve documents and conversations in multiple Indian languages. The model’s structured output and document understanding capabilities apply directly here.
Healthcare
Patient intake forms, discharge summaries, symptom triage — all of these generate text in regional languages in Indian hospitals. Automated summarisation and extraction that actually works in Marathi or Gujarati has real clinical value.
Comparing with Alternatives
Model               Indian Languages    Code-switching   Context (tokens)
────────────────────────────────────────────────────────────────────────
Sarvam-105B         22 (first-class)    Strong           128K
GPT-4o              English-primary     Moderate         128K
Claude 3.5 Sonnet   English-primary     Moderate         200K
Gemini 1.5 Pro      Hindi support       Moderate         1M
IndicBERT           10 (encoder only)   N/A              512
────────────────────────────────────────────────────────────────────────
The comparison is not that Sarvam-105B beats GPT-4o at everything — it does not. For complex reasoning in English, general-purpose frontier models are still ahead. The comparison that matters is: for Indian language applications, Sarvam-105B produces noticeably better output at lower cost, since API pricing is calibrated for Indian developers and startups.
Pricing and Rate Limits
Sarvam AI offers tiered pricing designed for Indian market economics:
Tier         Tokens/month   Rate limit    Price (per 1M tokens)
──────────────────────────────────────────────────────────────────────
Free         2M input       5 req/min     ₹0
Developer    50M input      60 req/min    ₹500 (~$6)
Startup      500M input     300 req/min   ₹3,500 (~$42)
Enterprise   Unlimited      Custom        Custom
──────────────────────────────────────────────────────────────────────
Check the current pricing on the Sarvam AI dashboard — these figures are approximate and subject to change.
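A back-of-envelope cost estimate is often enough to pick a tier. A small helper using the Developer-tier rate from the table above (figures approximate, as noted):

```python
def monthly_cost_inr(tokens_per_request: int, requests_per_day: int,
                     price_per_million_inr: float = 500.0) -> float:
    """Rough monthly input-token cost at the Developer-tier rate above."""
    monthly_tokens = tokens_per_request * requests_per_day * 30
    return monthly_tokens / 1_000_000 * price_per_million_inr

# e.g. 800-token prompts at 2,000 requests/day ≈ 48M tokens/month
print(monthly_cost_inr(800, 2_000))  # → 24000.0
```

At 48M tokens/month this workload would also blow through the Developer tier's 50M allowance with little headroom, which is exactly the kind of thing worth knowing before launch.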
Common Issues and Fixes
Issue: Response contains mixed script artifacts
If you see garbled output when mixing scripts in the same prompt, be explicit about the expected output language:
# Add language instruction to system prompt
system = "Respond only in Hindi (Devanagari script). Do not mix scripts."
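If you need this in several places, the instruction can be wrapped into a reusable message builder (the helper name and defaults are my own, purely illustrative):

```python
def pinned_messages(user_text: str, language: str = "Hindi",
                    script: str = "Devanagari") -> list:
    """Build a message list that pins the response language and script."""
    system = (
        f"Respond only in {language} ({script} script). "
        "Do not mix scripts."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```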
Issue: Rate limit errors (429)
Add exponential backoff:
import time
def complete_with_retry(client, messages, max_retries=3):
for attempt in range(max_retries):
try:
return client.chat.complete(
model="sarvam-105b",
messages=messages
)
except Exception as e:
if "429" in str(e) and attempt < max_retries - 1:
wait = 2 ** attempt
print(f"Rate limited. Waiting {wait}s...")
time.sleep(wait)
else:
raise
Issue: JSON extraction fails for some Indian language inputs
Use temperature=0 for extraction tasks and add a validation step:
import json
from typing import Optional

def safe_json_extract(raw: str) -> Optional[dict]:
    try:
        # Strip markdown fences if present
        if "```" in raw:
            raw = raw.split("```")[1].removeprefix("json").strip()
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
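When validation fails, re-asking with a stricter instruction usually recovers. A sketch that takes the model call as a plain callable so it works with either the SDK or the REST API (the wrapper itself is illustrative):

```python
import json
from typing import Callable, Optional

def extract_with_retry(ask: Callable[[str], str], prompt: str,
                       retries: int = 2) -> Optional[dict]:
    """Re-ask with a stricter instruction if the reply is not valid JSON."""
    for _ in range(retries):
        raw = ask(prompt)
        # Strip markdown fences if present
        if "```" in raw:
            raw = raw.split("```")[1].removeprefix("json").strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt += "\nReturn ONLY valid JSON, nothing else."
    return None
```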
Getting Started Checklist
Setup
─────────────────────────────────────────────────────────
[ ] Create account at sarvam.ai
[ ] Copy API key to environment variable SARVAM_API_KEY
[ ] pip install sarvam-python
[ ] Run first API call — verify Hindi/English response
First Build
─────────────────────────────────────────────────────────
[ ] Pick one of the five features above that matches your use case
[ ] Run the code example with your own prompt
[ ] Test with code-switched input (mix Hindi + English)
[ ] Verify structured output extraction if you need JSON
Production
─────────────────────────────────────────────────────────
[ ] Add exponential backoff for rate limit handling
[ ] Store API key in environment — never in source code
[ ] Choose the right tier for your expected token volume
[ ] Monitor latency — enable streaming for user-facing features
The Bottom Line
Sarvam-105B is not trying to beat GPT-5 at MMLU benchmarks. It is trying to be the best model for building products that serve India — and for that narrower, more specific goal, it is genuinely well-suited.
If your application involves Indian users who communicate in regional languages, the model’s script-native tokenisation, code-switching fluency, and cultural grounding translate directly into better product quality. The API pricing makes it accessible for Indian startups in a way that frontier model pricing often is not.
For English-primary applications, the general frontier models are still the default choice. For anything India-first, Sarvam-105B should be your starting point.