What Happened
- Indian firms are developing Large Language Models (LLMs) trained on Indian soil using Indian compute infrastructure, enabled by the IndiaAI Mission's subsidised GPU access programme.
- Training a frontier LLM in India has historically been prohibitively expensive due to the scarcity and high cost of high-performance GPUs — the primary bottleneck for AI model training.
- The IndiaAI Mission has subsidised GPU access for startups and research institutions at rates as low as ₹65/hour, dramatically lowering the barrier to LLM development in India.
- The Mixture of Experts (MoE) architecture — used in models like Sarvam AI's 105-billion-parameter model and BharatGen's Param 2 — has emerged as a cost-effective alternative to dense transformer models, reducing inference costs substantially.
- Out of 506 foundational AI model proposals received under the IndiaAI Mission, 43 specifically target LLM development; four companies (Sarvam AI, Soket AI Labs, Gnani.ai, and Gan.ai) have received government approval and subsidies.
Static Topic Bridges
IndiaAI Mission — Institutional Framework and Objectives
The IndiaAI Mission was approved by the Union Cabinet in March 2024 with a total budgetary outlay of Rs 10,372 crore, to be implemented by IndiaAI — an independent business division (IBD) under the Ministry of Electronics and Information Technology (MeitY). The Mission's seven pillars cover compute capacity, foundational models (the IndiaAI Innovation Centre), datasets, application development, skilling (FutureSkills), startup financing, and safe and trusted AI.
- Cabinet approval: March 2024; Total outlay: Rs 10,372 crore (approximately $1.25 billion)
- Nodal ministry: MeitY (Ministry of Electronics and Information Technology)
- Initial target: 10,000 GPUs for national compute infrastructure; current status: over 38,000 GPUs commissioned
- Startups receive subsidised GPU access at rates as low as ₹65/hour (vs. market rates of ₹350-500/hour for H100 GPUs)
- Sarvam AI received 4,096 NVIDIA H100 SXM GPUs with a subsidy of ₹98.68 crore against a project cost of ₹246.71 crore — selected to build India's Sovereign LLM Ecosystem
- Total subsidies disbursed under GPU-allocation scheme: approximately ₹111.85 crore
Connection to this news: The IndiaAI Mission's compute subsidisation is the direct enabler of LLM training on domestic soil; without it, GPU economics would make domestic training uncompetitive with foreign cloud infrastructure, as the rough cost sketch below illustrates.
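A back-of-the-envelope comparison makes the point. The cluster size (4,096 H100s) and the hourly rates come from the figures above; the 30-day training window is an assumed illustration, not a reported number.

```python
# Rough cost comparison: subsidised vs. market GPU rates for a large
# training run. Cluster size and rates are from the section above; the
# 30-day duration is an assumption for illustration only.
gpus = 4096                      # Sarvam AI's reported H100 allocation
hours = 30 * 24                  # assumed one-month training run
subsidised_rate = 65             # ₹/GPU-hour under IndiaAI Mission
market_rate = (350 + 500) / 2    # midpoint of quoted H100 market rates

gpu_hours = gpus * hours
for label, rate in [("subsidised", subsidised_rate), ("market", market_rate)]:
    cost_crore = gpu_hours * rate / 1e7   # 1 crore = 10^7 rupees
    print(f"{label}: ₹{cost_crore:,.1f} crore for {gpu_hours:,} GPU-hours")
```

Under these assumptions the same run costs roughly ₹19 crore subsidised versus about ₹125 crore at market rates; that gap is what the subsidy closes.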
Large Language Models (LLMs) — Technical Fundamentals
A Large Language Model (LLM) is an AI system based on the transformer architecture, trained on massive text corpora to predict and generate language. Training adjusts billions of parameters through iterative gradient-descent optimisation over trillions of tokens of text. The process is compute-intensive: frontier models like GPT-4 reportedly required thousands of NVIDIA A100 or H100 GPUs running for weeks.
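As a concrete illustration of what "iterative gradient-descent optimisation" means in practice, here is a minimal PyTorch sketch of one next-token-prediction training step on a toy transformer. All sizes are illustrative assumptions; real frontier runs differ by many orders of magnitude.

```python
# Minimal sketch of the next-token-prediction objective that LLM
# training optimises (toy scale, for illustration only).
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1000, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.encoder(x, mask=mask))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

tokens = torch.randint(0, vocab_size, (8, seq_len))   # stand-in for real text
logits = model(tokens[:, :-1])                        # predict the next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()   # gradients for every parameter
opt.step()        # one gradient-descent update; training repeats this
```

Scaling this loop to 105B parameters and trillions of tokens is what consumes thousands of GPU-weeks.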
- Transformer architecture (Vaswani et al., 2017 — "Attention Is All You Need") is the foundation of all modern LLMs including GPT, Llama, and Indian models
- Parameter count indicates model scale: GPT-3 had 175B parameters; Sarvam's model has 105B; BharatGen Param 2 has 17B
- India-specific LLM challenge: Most global training data is in English; Indian-language training data (22 scheduled languages) is sparse, requiring curated multilingual datasets
- Key GPU for LLM training: NVIDIA H100 (80 GB HBM3 memory, 3.35 TB/s memory bandwidth, roughly 2 PFLOPS of dense FP8 compute) is the current industry standard; the memory sketch below shows why many are needed
Connection to this news: The article's question — "Why is training an LLM on Indian soil a challenge?" — is answered by GPU scarcity: India had negligible domestic GPU capacity before the IndiaAI Mission, forcing reliance on expensive foreign cloud providers (AWS, Azure, GCP), with data-sovereignty implications.
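To make the hardware requirement concrete, here is a rough memory estimate for training a 105B-parameter model. The 2-bytes-per-weight (bf16) and ~16-bytes-per-parameter training-state figures are standard rules of thumb from the memory-optimisation literature, not numbers from this article.

```python
# Rough memory arithmetic for a 105B-parameter model on 80 GB H100s.
# Rule-of-thumb constants (bf16 weights; Adam mixed-precision training
# state ≈ 16 bytes/parameter) are assumptions, not article figures.
params = 105e9
weight_bytes = 2 * params   # bf16 weights only
train_bytes = 16 * params   # weights + gradients + optimiser state
h100_mem = 80e9             # bytes of HBM3 per H100

print(f"weights alone: {weight_bytes / 1e9:.0f} GB "
      f"(~{weight_bytes / h100_mem:.0f} H100s just to hold them)")
print(f"training state: {train_bytes / 1e12:.2f} TB "
      f"(~{train_bytes / h100_mem:.0f} H100s before activations)")
```

Activations, parallelism overheads, and throughput targets push real clusters well past this floor, which is why allocations like Sarvam's run to thousands of GPUs.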
Mixture of Experts (MoE) Architecture — Why It Reduces Cost
A Mixture of Experts (MoE) model is a neural network architecture where only a subset of the model's parameters (called "experts") is activated for any given input token. Instead of routing every token through all 105 billion parameters, an MoE router selects 2-8 expert sub-networks for each token. This dramatically reduces the computational cost per inference while maintaining the total parameter count (and thus the model's knowledge capacity).
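The routing mechanism is compact enough to sketch directly. Below is a generic top-k MoE layer in PyTorch; the linear router, feed-forward experts, and all sizes are illustrative assumptions, not details of Sarvam's or BharatGen's implementations.

```python
# Generic top-k mixture-of-experts layer (illustrative sketch, not any
# specific model's implementation).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)   # learned gating
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                    # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)    # normalise over chosen experts
        out = torch.zeros_like(x)
        # Each token runs through only k of n_experts sub-networks, so
        # per-token compute scales with k, not the total expert count.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                chosen = idx[:, slot] == e
                if chosen.any():
                    w = weights[chosen, slot].unsqueeze(1)
                    out[chosen] += w * expert(x[chosen])
        return out

x = torch.randn(16, 64)    # 16 token embeddings
y = MoELayer()(x)          # each token activated only 2 of 8 experts
```

The design choice that matters is the top-k selection: total parameters (knowledge capacity) grow with the number of experts, while per-token compute grows only with k.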
- MoE vs dense model: A 105B-parameter MoE model may activate only 14-20B parameters per token — inference cost equivalent to a 14-20B dense model, but with 105B total knowledge capacity
- Efficiency gain: MoE models can be 3-5x cheaper to run at inference (serving users) than an equivalently sized dense model; the arithmetic sketch at the end of this sub-section makes the scaling concrete
- Prominent MoE models globally: Mistral's Mixtral 8x7B, Google's Switch Transformer, and, reportedly, OpenAI's GPT-4
- Indian MoE deployments: Sarvam AI's 105B-parameter model and BharatGen Param 2 (17B, multilingual) both use MoE architecture
Connection to this news: MoE architecture is the technical answer to India's compute constraint: it allows Indian firms to build models with large total parameter counts (and thus broad language/domain coverage) while keeping per-token GPU costs manageable — critical when subsidised compute is finite.
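The efficiency claim in the list above follows from simple arithmetic: per-token compute scales with active parameters, not total ones. The sketch below uses only the activation figures quoted in this sub-section.

```python
# FLOPs-per-token ratio implied by the activation figures quoted above.
# Per-token compute scales roughly with active (not total) parameters.
total = 105e9
for active in (14e9, 20e9):
    print(f"{active / 1e9:.0f}B active of {total / 1e9:.0f}B total "
          f"-> ~{total / active:.1f}x fewer FLOPs per token than dense")
```

The raw ratio comes out around 5-7.5x; serving overheads (all experts must stay in memory, plus routing and communication costs) eat into this, which is consistent with the 3-5x real-world cost figure quoted above.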
Digital India and AI Policy — Governance Frameworks
India's AI policy landscape involves multiple frameworks and institutional actors. NITI Aayog published the "National Strategy for Artificial Intelligence" (2018) under the #AIforAll banner. The Ministry of Electronics and Information Technology (MeitY) coordinates digital governance and AI policy. The IndiaAI Mission represents the most significant government investment in AI infrastructure to date.
- NITI Aayog AI Strategy (2018): Identified 5 priority sectors — healthcare, agriculture, education, smart cities, and smart mobility
- AI Ethics: MeitY has released draft guidelines on responsible AI development; India was a founding member of the Global Partnership on Artificial Intelligence (GPAI) in 2020
- Data governance: The Digital Personal Data Protection Act, 2023 (DPDP Act) regulates personal data — relevant for LLMs trained on user-generated content
- National Data Governance Framework Policy (draft, 2022): Aims to create India Datasets Platform for sharing non-personal public sector data — a prerequisite for high-quality Indian-language LLM training data
Connection to this news: India's ambition of "AI sovereignty" — training and running frontier LLMs on domestic infrastructure with Indian data — sits at the intersection of IndiaAI's compute mission, the DPDP Act's data governance rules, and the broader Digital India policy ecosystem.
Key Facts & Data
- IndiaAI Mission budget: Rs 10,372 crore (approved March 2024)
- Nodal ministry: MeitY (Ministry of Electronics and Information Technology)
- National GPU compute capacity: 38,000+ GPUs (surpassing initial 10,000-GPU target)
- Subsidised GPU rate for startups: as low as ₹65/hour
- LLM proposals under IndiaAI Mission: 43 out of 506 total foundational model proposals
- Sarvam AI model: 105 billion parameters, MoE architecture, trained from scratch on domestic compute
- BharatGen Param 2: 17 billion parameters, multilingual MoE model
- Total subsidies disbursed (GPU scheme): approximately ₹111.85 crore
- Sarvam AI GPU allocation: 4,096 NVIDIA H100 SXM GPUs; subsidy ₹98.68 crore
- India's scheduled languages: 22 (listed in Eighth Schedule of the Constitution)