44 million unknown variants: What India's Genome Project found, and why it matters

What Happened

The Genome India Project (GIP), coordinated by the Indian Institute of Science (IISc), Bengaluru, has completed the whole-genome sequencing of 10,000 Indians from 99 diverse population groups.
Analysis of this dataset revealed approximately 44 million genetic variants unique to the Indian population — variants not previously catalogued in global genomic databases such as gnomAD or the 1000 Genomes Project.
The findings represent the most comprehensive genetic map of the Indian population ever assembled, covering the country's extraordinary ethnic, linguistic, and geographic diversity.
The data has been submitted to a publicly accessible repository, enabling researchers globally to build precision medicine tools tailored to Indian patients.
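Calling a variant "not previously catalogued" comes down to checking it against existing catalogues. The Python sketch below is a simplified illustration with several assumptions. Each variant is keyed by chromosome, position, reference allele and alternate allele. The `Variant` type and `novel_variants` helper are hypothetical names. The toy records are invented and are not real GIP, gnomAD or 1000 Genomes entries.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def novel_variants(study: set[Variant], *catalogues: set[Variant]) -> set[Variant]:
    """Return study variants that appear in none of the reference catalogues."""
    known: set[Variant] = set().union(*catalogues)
    return study - known


# Invented toy data for illustration only.
study_cohort = {
    Variant("chr1", 10_500, "A", "G"),
    Variant("chr2", 20_000, "C", "T"),
    Variant("chr7", 55_000, "G", "A"),
}
catalogue_a = {Variant("chr1", 10_500, "A", "G")}
catalogue_b = {Variant("chr2", 20_000, "C", "T")}

print(novel_variants(study_cohort, catalogue_a, catalogue_b))
# {Variant(chrom='chr7', pos=55000, ref='G', alt='A')}
```

In practice, the same insertion or deletion can be written in more than one way. Pipelines therefore normalise variants, for example by left-aligning indels, before they compare them. Skipping that step would inflate the count of apparently novel variants.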

Static Topic Bridges

Genome India Project (GIP)

The Genome India Project was launched in 2020 under the Department of Biotechnology (DBT), Government of India. IISc, Bengaluru served as the coordinating institution, and more than 20 research institutions across India took part. The project aimed to sequence 10,000 whole genomes representing India's diverse population groups. This was a critical step because global genomic databases are drawn overwhelmingly from people of European ancestry. As a result, they are less reliable for assessing disease risk and predicting drug responses in Indian patients.

Launched: 2020 under the Department of Biotechnology (DBT)
Coordinating institution: Indian Institute of Science (IISc), Bengaluru
Scale: 10,000 whole genomes from 99 population groups
Completion of sequencing: announced in April 2024; data analysis published 2026

Connection to this news: The 44 million unique variants discovered are a direct output of this sequencing effort, establishing a reference genomic database specific to India.

IndiGen Programme (Precursor Initiative)

Before GIP, the CSIR launched the IndiGen Genome Project in April 2019 as a pilot. IndiGen sequenced 1,008 genomes of healthy Indian individuals from diverse ethnic groups. It identified 55.9 million single nucleotide variants, and approximately 32% of them were unique to India and absent from global databases. IndiGen demonstrated both the feasibility and the scientific necessity of India-specific genome mapping.

Launched: April 2019 by CSIR (Council of Scientific & Industrial Research)
Scale: 1,008 whole genomes
Key finding: ~32% of identified variants unique to India
Objective: disease predisposition mapping, personalized medicine, population genetics

Connection to this news: GIP scaled up the IndiGen pilot nearly 10-fold (10,000 versus 1,008 genomes). Its 44 million previously uncatalogued variants far exceed the roughly 18 million unique variants implied by IndiGen's figures (about 32% of 55.9 million), which validates the case for a larger reference population.
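The comparison with IndiGen can be checked with simple arithmetic using the figures quoted in this article. This sketch derives IndiGen's unique-variant count from the ~32% share, so that number is an approximation. The two projects may also have tested novelty against different databases or database versions, so treat the ratio as indicative only. The constant and function names are hypothetical.

```python
# Rounded figures as quoted in this article.
INDIGEN_GENOMES = 1_008
INDIGEN_VARIANTS = 55_900_000      # ~55.9 million variants identified
INDIGEN_UNIQUE_SHARE = 0.32        # ~32% not found in global databases
GIP_GENOMES = 10_000
GIP_NOVEL_VARIANTS = 44_000_000    # ~44 million previously uncatalogued variants


def derived_figures() -> dict[str, float]:
    """Derive the ratios behind the GIP vs IndiGen comparison."""
    indigen_unique = INDIGEN_VARIANTS * INDIGEN_UNIQUE_SHARE  # approximate, derived
    return {
        "genome_scale_up": GIP_GENOMES / INDIGEN_GENOMES,
        "indigen_unique_variants": indigen_unique,
        "novel_variant_ratio": GIP_NOVEL_VARIANTS / indigen_unique,
    }


figs = derived_figures()
print(f"Genome scale-up: {figs['genome_scale_up']:.2f}x")  # 9.92x, just under 10-fold
print(f"IndiGen unique variants: ~{figs['indigen_unique_variants'] / 1e6:.1f} million")  # ~17.9 million
print(f"GIP novel / IndiGen unique: {figs['novel_variant_ratio']:.1f}x")  # 2.5x
```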

Precision Medicine and Pharmacogenomics

Precision medicine is an approach to disease treatment and prevention that accounts for individual variability in genes, environment, and lifestyle. Pharmacogenomics — the study of how genes affect a person's response to drugs — is a core application. Population-specific genomic databases are essential because genetic variants that determine drug metabolism (e.g., CYP450 enzyme variants) differ substantially across ancestral groups. Without an India-specific reference, Indian patients face risks of incorrect dosing or adverse drug reactions.
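The Hardy-Weinberg principle shows why allele frequencies matter for dosing. Suppose a reduced-function allele of a drug-metabolising gene has frequency q. Under random mating, about q² of people then carry two copies. This is a simplified sketch. The allele frequencies are hypothetical, and the three genotype labels only loosely mirror normal, intermediate and poor metaboliser categories. Real pharmacogenomic classification, for example of CYP450 genes, combines several named alleles rather than a single two-allele locus.

```python
def genotype_frequencies(q: float) -> dict[str, float]:
    """Hardy-Weinberg genotype frequencies for a two-allele locus.

    q is the frequency of a hypothetical reduced-function allele; p = 1 - q.
    Assumes random mating within the population.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {
        "two_normal": p * p,       # loosely: normal metabolisers
        "one_reduced": 2 * p * q,  # loosely: intermediate metabolisers
        "two_reduced": q * q,      # loosely: poor metabolisers
    }


# Hypothetical allele frequencies for two illustrative populations.
for label, q in [("population A", 0.10), ("population B", 0.30)]:
    share = genotype_frequencies(q)["two_reduced"]
    print(f"{label}: q = {q:.2f}, expected poor metabolisers ~ {share:.1%}")
```

With these hypothetical values, a three-fold difference in allele frequency (0.10 versus 0.30) produces a nine-fold difference in the expected share of poor metabolisers (1.0% versus 9.0%). This is why frequencies measured in one ancestry group transfer poorly to another.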

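Global genomic resources are dominated by European-ancestry data. By widely cited estimates, roughly 80% or more of genome-wide association study participants are of European ancestry, and South Asians form only a small share of references such as gnomAD and the 1000 Genomes Project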
India has ~4,635 distinct population groups with high genetic heterogeneity
India's population structure has been shaped by ancient founder events (population bottlenecks) and long-standing endogamy. As a result, many variants are specific to particular groups, and some rare recessive disorders occur at raised frequencies
GIP data will support drug discovery, rare disease diagnosis, and population health surveillance
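Endogamy matters for rare disease because mating within a group raises the chance that a child inherits the same recessive allele from both parents. Population geneticists capture this with the inbreeding coefficient F, using Wright's formula P(aa) = q² + F·p·q, where p = 1 − q. In the sketch below, the allele frequency and the F = 0.01 value are hypothetical. F = 1/16 is the standard value for a child of first cousins. It is used here only as a reference point, not as an estimate for any Indian population group.

```python
def homozygous_recessive_frequency(q: float, f: float) -> float:
    """Expected frequency of the homozygous recessive (aa) genotype.

    Wright's formula: P(aa) = q**2 + f * p * q, with p = 1 - q.
    With f = 0 this reduces to the Hardy-Weinberg value q**2.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    p = 1.0 - q
    return q * q + f * p * q


# Hypothetical rare recessive allele frequency (illustrative only).
Q = 0.01

# f = 0: random mating; 0.01: hypothetical mild endogamy; 1/16: child of first cousins.
for f in (0.0, 0.01, 1 / 16):
    per_100k = homozygous_recessive_frequency(Q, f) * 100_000
    print(f"F = {f:.4f}: about {per_100k:.0f} affected per 100,000")
```

For a hypothetical recessive allele at q = 0.01, random mating gives about 10 affected people per 100,000. At F = 1/16, that rises to about 72 per 100,000. This illustrates why group-specific reference data help rare disease diagnosis.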

Connection to this news: The 44 million previously unknown variants form the essential reference backbone for developing India-specific diagnostic kits, risk algorithms, and drug response models.

Genomic Data Privacy and Governance

Large-scale genome sequencing raises concerns around data privacy, genetic discrimination, consent, and data sovereignty. In India, the Digital Personal Data Protection Act, 2023 applies to digital personal data, including genetic data. Unlike the earlier Personal Data Protection Bill, 2019, however, it does not create a separate "sensitive personal data" category with heightened protection standards. Policy debates continue over how genomic data in shared repositories should be governed to prevent misuse by insurance companies or employers.

Digital Personal Data Protection Act, 2023 provides the current general legal framework for digital personal data, including genomic data, without a separate "sensitive personal data" category
Ethical protocols for genome projects require informed consent from all participants
Data sovereignty — keeping Indian genomic data within Indian jurisdiction — is a stated policy priority

Connection to this news: As GIP data enters public repositories, the governance framework for access, use, and sharing of these 44 million variants becomes a live policy question directly relevant to GS Paper 4 ethics and GS Paper 3 technology governance.

Key Facts & Data

44 million previously uncatalogued genetic variants identified in the Genome India Project dataset, absent from global databases such as gnomAD and the 1000 Genomes Project
10,000 whole genomes sequenced from 99 population groups across India
Coordinating institution: Indian Institute of Science (IISc), Bengaluru; funded by Department of Biotechnology (DBT)
Precursor: IndiGen (CSIR, 2019) — sequenced 1,008 genomes; found ~32% unique variants
India accounts for approximately 17.5% of global population but was severely underrepresented in global genomic databases before GIP
Applications: precision medicine, rare disease diagnosis, population genetics, pharmacogenomics, drug discovery
Data governance: privacy safeguards under the Digital Personal Data Protection Act, 2023, which applies to genomic data as digital personal data