What Happened
- A commentary argues that "algorithmic sovereignty" — the ability of a nation to independently develop, deploy, and govern its AI systems — should be India's foremost priority in the emerging global AI order.
- The article highlights a structural problem: AI systems in widespread use today are trained predominantly on Western datasets, shaped by Western legal scholarship and cultural assumptions, making their outputs systematically biased when applied to Indian or Global South contexts, particularly in geopolitical and legal domains.
- The argument situates algorithmic dependence alongside more familiar forms of technological dependence (semiconductors, platforms, operating systems) as a sovereignty-level concern.
- India's IndiaAI Mission and the Digital Personal Data Protection Act, 2023 are positioned as initial policy responses — but the article argues they are insufficient without a comprehensive sovereign AI infrastructure strategy.
Static Topic Bridges
Algorithmic Sovereignty — Concept and Strategic Dimensions
Algorithmic sovereignty refers to a nation's capacity to independently develop, operate, and regulate the artificial intelligence systems it relies on — without dependence on foreign-owned algorithms, training data, or computational infrastructure. It is an extension of the broader concept of digital sovereignty, applied specifically to the layer of AI models and the data that trains them.
- The concern: When core AI systems — large language models, recommendation algorithms, judicial or legal AI tools — are built on Western data and by Western companies, their embedded assumptions about law, geopolitics, social norms, and cultural values are exported globally
- For India: AI systems trained on Western legal scholarship may produce systematically skewed outputs on issues like Kashmir, border disputes, or India's historical narratives — with real consequences if these systems are used in research, policy, or media
- Three layers of dependence: (i) Hardware (chips, GPUs — dominated by US firms and manufactured in Taiwan); (ii) Data (training datasets — overwhelmingly English-language and Western-origin); (iii) Models (foundational LLMs — OpenAI, Google, Meta)
- Sovereignty gap: Dependence at all three layers means that a nation "effectively outsources cognition itself" — a higher-order vulnerability than hardware supply chains
Connection to this news: The article argues that without indigenous AI models trained on Indian datasets, India's AI ecosystem will reflect imported values and assumptions — a deeper form of dependence than software or hardware reliance alone.
India's Policy Response — IndiaAI Mission and DPDPA
India has launched two major policy instruments to advance digital and AI sovereignty. The IndiaAI Mission, approved in March 2024 with an outlay of Rs. 10,372 crore over five years, is the primary government initiative to build domestic AI capacity. The Digital Personal Data Protection Act (DPDPA), 2023 establishes the legal framework for data governance.
- IndiaAI Mission: Implemented by IndiaAI (an Independent Business Division under MeitY); twin goals — "Make AI in India" and "Make AI Work for India"
- Key components of India's National AI Stack: Strategic Integration Layer, Hardware Layer, Compute Layer, Data Layer, Model Layer, API and Applications Layer, Security and Compliance Layer
- Compute infrastructure: IndiaAI Mission proposes building a shared AI compute facility with 10,000+ GPU capacity for researchers and startups
- DPDPA, 2023: India's first comprehensive data protection law; permits cross-border transfer of personal data except to countries the government notifies as restricted; establishes a Data Protection Board
- NDEAR (National Digital Education Architecture): India's open, federated framework for education sector data — an example of domain-specific data sovereignty
Connection to this news: The commentary implicitly critiques the gap between India's policy ambitions (IndiaAI Mission, DPDPA) and the ground reality — that most AI tools deployed in India, including in public sector applications, continue to rely on foreign foundation models, defeating the purpose of data localisation.
Western Dataset Bias and Global South Implications
A foundational technical challenge of modern AI is that training data shapes model outputs. The dominance of English-language, US-centric data in training corpora for large language models creates systematic biases — in factual claims, ethical framings, and geopolitical perspectives. For countries like India, where indigenous datasets are sparse and often unevenly digitised, the risk of adopting foreign AI without correction mechanisms is acute.
- Dataset dominance: Common Crawl — the largest public web dataset used in LLM training — is approximately 45–50% English content despite English speakers being a minority of the world's population
- India-specific risks: AI systems may reflect Western legal interpretations of India's territorial claims, colonial-era historical framings, or value systems at odds with Indian constitutional principles
- Caste and gender bias: Indian datasets themselves carry embedded biases from social hierarchies — meaning even indigenously trained models require careful curation
- Global South concern: The argument connects to India's leadership positioning in forums like the G20 (New Delhi 2023), where India championed Global South perspectives on AI governance and digital infrastructure
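One standard correction mechanism for the language imbalance described above is temperature-based sampling, used in multilingual model training to upweight low-resource languages when drawing training batches. A minimal sketch with hypothetical corpus counts (the language names and figures below are illustrative, not from the article):

```python
# Temperature-based sampling for rebalancing a multilingual training corpus.
# Hypothetical document counts; real web corpora such as Common Crawl are
# heavily skewed toward English, as the article notes.
corpus_counts = {"english": 900_000, "hindi": 60_000, "tamil": 25_000, "bengali": 15_000}

def sampling_weights(counts, alpha=0.3):
    """Raise each language's corpus share to the power alpha (0 < alpha <= 1)
    and renormalise. alpha=1 keeps the raw skew; smaller alpha flattens it,
    so low-resource languages are sampled more often during training."""
    total = sum(counts.values())
    shares = {lang: n / total for lang, n in counts.items()}
    powered = {lang: p ** alpha for lang, p in shares.items()}
    norm = sum(powered.values())
    return {lang: p / norm for lang, p in powered.items()}

raw = sampling_weights(corpus_counts, alpha=1.0)   # mirrors the 90% English skew
flat = sampling_weights(corpus_counts, alpha=0.3)  # English share drops to roughly half
```

This does not remove embedded bias in the text itself (the caste and gender point above), but it is a concrete example of the kind of curation step indigenous model-building would involve.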
Connection to this news: The article situates algorithmic sovereignty within a broader geopolitical frame — it is not merely a technology policy question but a matter of strategic autonomy analogous to energy security, food security, or military capability. India's aspiration to be a Vishwaguru (global leader) in the AI era requires it to generate, not merely consume, foundational AI systems.
India's Digital Sovereignty Framework — Existing Architecture
India's approach to digital sovereignty combines regulatory measures (data localisation, licensing requirements), infrastructure investments (BharatNet, UPI, DigiYatra), and institutional capacity building. The AI dimension adds a new layer to this framework.
- Digital Public Infrastructure (DPI): India Stack (Aadhaar, UPI, ONDC, DigiLocker) — globally recognised as a model for sovereign digital infrastructure
- Data localisation: RBI mandates local storage of payment data; DPDPA permits the government to restrict cross-border transfers of personal data to notified countries
- AI governance: India is a founding member of the Global Partnership on AI (GPAI, 2020) and chaired it in 2022–23, hosting the GPAI Summit in New Delhi in December 2023; championed "AI for Good" framing at G20
- Sovereign AI models: Government is working toward indigenous foundational AI models, including language models trained on Indian language data — part of IndiaAI Mission's mandate
- National Language Translation Mission (NLTM — Bhashini): Develops AI tools for 22 scheduled languages, reducing dependence on English-centric AI for Indian users
Connection to this news: The article calls for algorithmic sovereignty to be placed at the same policy priority level as energy or food security — recognising that the AI systems a nation uses to make decisions about itself must reflect its own values, data, and legal traditions.
Key Facts & Data
- IndiaAI Mission allocation: Rs. 10,372 crore over five years; implemented by MeitY
- DPDPA, 2023: India's first comprehensive data protection legislation; establishes Data Protection Board
- India Stack: Aadhaar (1.4 billion enrolled), UPI (140 billion+ transactions in FY2025), ONDC, DigiLocker — a sovereign DPI model
- Bhashini/NLTM: AI translation tools for 22 Indian scheduled languages — reducing English-AI dependence
- GPAI: India chaired the Global Partnership on Artificial Intelligence in 2022–23 and hosted the GPAI Summit in New Delhi (December 2023)
- G20 2023 (New Delhi): India championed inclusive AI governance and Global South digital equity
- Common Crawl dataset: ~45–50% English content — structural over-representation in LLM training
- Key risk: AI systems trained on Western data interpreting India's geopolitical positions, historical narratives, or legal frameworks through foreign interpretive lenses