Explained | What is genome sequencing and why does the Genome India Project matter?

What Happened

The Genome India Project, funded and coordinated by the Department of Biotechnology (DBT), completed the sequencing of 10,074 whole genomes from 83 diverse Indian populations across 99 sites — achieving its foundational target of 10,000 Indian genomes.
The data has been archived at the Indian Biological Data Centre (IBDC), India's first national life-sciences data repository, making it the largest Indian-specific genomic reference dataset ever assembled.
The project identified over 130 million genetic variants in the Indian population, of which approximately 44 million were previously unknown and absent from global genomic databases (which are heavily skewed toward populations of European ancestry).
Key significance: many identified variants affect drug metabolism (pharmacogenomics) — meaning therapies designed around global (predominantly Western) genomes may be suboptimal or unsafe for large segments of India's population.
The dataset spans India's four major linguistic-genetic families: Indo-European, Dravidian, Austro-Asiatic, and Tibeto-Burman — capturing the country's extraordinary genetic diversity rooted in ancient population structures.
The project's long-term goal is to enable precision medicine: customising diagnoses, risk predictions, and treatment protocols to the specific genetic variants prevalent in Indian populations.

Static Topic Bridges

What is Genome Sequencing?

The human genome is the complete set of genetic instructions contained in the DNA of a human cell. It consists of approximately 3 billion base pairs arranged along 23 pairs of chromosomes. Genome sequencing is the laboratory process of determining the precise order of these base pairs — i.e., reading the complete DNA sequence of an individual. The 1990–2003 Human Genome Project (HGP), an international collaboration, produced the first reference human genome at a cost exceeding USD 2 billion over 13 years. Subsequent advances in Next-Generation Sequencing (NGS) technology have reduced the cost from ~USD 1 million per genome in 2007 to approximately USD 200 in 2024–25, democratising large-scale genomic research.

Human genome: ~3 billion base pairs; ~20,000–25,000 protein-coding genes.
Human Genome Project (HGP): Launched 1990; completed 2003; cost >USD 2 billion; international consortium of 20 sequencing centres in six countries.
Sanger sequencing (first-generation): Dideoxy chain-termination method; accurate but slow and expensive.
Next-Generation Sequencing (NGS): High-throughput parallel sequencing; enabled the post-2007 cost collapse; forms the backbone of Genome India.
Current cost: ~USD 200 per genome (2024–25); approaching the USD 100 "threshold" at scale.
Whole Genome Sequencing (WGS): Reads the entire genome; contrasts with targeted sequencing (specific genes) or exome sequencing (protein-coding regions only).
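
To put the cost figures above in perspective, the following minimal Python sketch works out the fold reduction implied by the two per-genome prices quoted in this section. The function name is illustrative, and the inputs are the approximate values stated above, not precise market prices.

```python
def fold_reduction(old_cost_usd: float, new_cost_usd: float) -> float:
    """Return how many times cheaper the new cost is than the old one."""
    if new_cost_usd <= 0:
        raise ValueError("new cost must be positive")
    return old_cost_usd / new_cost_usd


# Approximate per-genome costs as quoted in this section.
COST_2007_USD = 1_000_000
COST_2024_USD = 200

reduction = fold_reduction(COST_2007_USD, COST_2024_USD)
print(f"Per-genome cost fell roughly {reduction:,.0f}-fold")
```

On these figures the drop is about 5,000-fold, which is what makes sequencing thousands of individuals in a single national project financially feasible.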

Connection to this news: The Genome India Project applied WGS to 10,074 individuals; the scale and Indian-population specificity of this dataset are what make it scientifically novel and clinically significant.

The Genome India Project — Structure and Significance

The Genome India Project (GIP) was launched on January 3, 2020, by the Department of Biotechnology (DBT), Government of India. It is led by the Centre for Brain Research at the Indian Institute of Science (IISc), Bengaluru, with collaboration from 20 major scientific and medical institutions across India. The project's immediate goal was to sequence 10,000 whole genomes from India's diverse populations; it has now expanded with plans for a much larger dataset. The data is housed at the Indian Biological Data Centre (IBDC), established by DBT as India's national repository for life-sciences data — analogous to international databases like NCBI (US) or EMBL-EBI (Europe).

Nodal agency: Department of Biotechnology (DBT), under the Ministry of Science & Technology.
Lead institution: Centre for Brain Research, Indian Institute of Science (IISc), Bengaluru.
Launch: January 3, 2020.
Sample size: 10,074 individuals from 83 Indian populations across 99 geographic sites.
Data archive: Indian Biological Data Centre (IBDC) — India's first national life-sciences data repository.
Variants identified: ~130 million genetic variants; ~44 million previously unknown (not in global databases).
Population coverage: Spans all four major Indian linguistic-genetic families (Indo-European, Dravidian, Austro-Asiatic, Tibeto-Burman).
Predecessor: IndiGen Initiative (2019), a CSIR-led pilot that sequenced 1,008 Indian genomes, carried out by CSIR-CCMB (Hyderabad) and CSIR-IGIB (Delhi).
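
The distinction between known and previously unknown variants can be illustrated with a toy example. The Python sketch below treats each variant as a (chromosome, position, reference allele, alternate allele) record and labels as novel any variant absent from a set of already catalogued variants. Every variant record here is made up for illustration, and this is an assumption about how such a comparison might look, not the project's actual analysis pipeline. The final lines compute the novel share implied by the approximate totals quoted above.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name
    pos: int    # 1-based position on the chromosome
    ref: str    # allele in the reference genome
    alt: str    # alternate allele observed in the cohort


def novel_variants(cohort: set[Variant], catalogued: set[Variant]) -> set[Variant]:
    """Return cohort variants that are missing from the catalogued set."""
    return cohort - catalogued


# Hypothetical records, invented purely for illustration.
cohort_calls = {
    Variant("1", 1000, "A", "G"),
    Variant("2", 2500, "C", "T"),
    Variant("7", 42, "G", "A"),
}
global_catalogue = {
    Variant("1", 1000, "A", "G"),
    Variant("X", 999, "T", "C"),
}

novel = novel_variants(cohort_calls, global_catalogue)
print(f"{len(novel)} of {len(cohort_calls)} toy variants are novel")

# Approximate totals quoted in this article.
TOTAL_VARIANTS = 130_000_000
NOVEL_VARIANTS = 44_000_000
novel_share = NOVEL_VARIANTS / TOTAL_VARIANTS
print(f"Novel share of reported variants: {novel_share:.1%}")
```

On the quoted totals, roughly one in three variants found in the cohort had not been recorded in global databases.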

Connection to this news: GIP goes far beyond IndiGen in scale and geographic coverage, making it the definitive genomic baseline for Indian clinical and research applications.

Precision Medicine and Pharmacogenomics — Why Indian Data Matters

Precision medicine (also called personalised medicine) refers to the tailoring of medical treatment to the individual characteristics of each patient — particularly their genetic profile. Pharmacogenomics is the branch that studies how genes affect a person's response to drugs. Drug efficacy and adverse effects are significantly influenced by genetic variants in metabolising enzymes (e.g., CYP450 family). Most global pharmaceutical research and clinical guidelines are developed from data drawn predominantly from populations of European descent — meaning drugs may be calibrated to genetic profiles that do not match the Indian population's specific variant landscape.

The GIP found that many Indian populations carry gene variants implicated in reducing the efficiency and efficacy of antiviral drugs — variants absent from global reference databases.
India's genetic diversity is extreme: due to ancient population bottlenecks, endogamy, and the caste system, Indian sub-populations are genetically distinct from each other and from global reference groups.
Pharmacogenomics relevance: Variants in genes like CYP2C9, CYP2C19, and SLCO1B1 affect metabolism of common drugs (warfarin, statins, antidepressants) and are distributed differently across Indian populations.
A sequenced population baseline enables: (i) population-specific drug dosing guidelines; (ii) identification of genetic disease risk variants prevalent in India (e.g., sickle cell disease, thalassemia, long QT syndrome (LQTS)); (iii) development of diagnostic tools calibrated to Indian genetic architecture.
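
The claim that variants are "distributed differently" across populations is usually expressed as a difference in allele frequency. The Python sketch below computes the alternate-allele frequency from genotype counts using the standard formula (heterozygotes carry one copy, alternate homozygotes carry two, and each person has two alleles). The population names and counts are entirely hypothetical and do not come from the Genome India data.

```python
def alt_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Alternate-allele frequency from genotype counts at one variant site."""
    n_individuals = hom_ref + het + hom_alt
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each individual contributes two alleles at an autosomal site.
    return (het + 2 * hom_alt) / (2 * n_individuals)


# Hypothetical genotype counts: (hom_ref, het, hom_alt) per population.
genotype_counts = {
    "population_A": (810, 180, 10),
    "population_B": (640, 320, 40),
}

frequencies = {
    name: alt_allele_frequency(*counts)
    for name, counts in genotype_counts.items()
}

for name, freq in frequencies.items():
    print(f"{name}: alternate-allele frequency = {freq:.2f}")
```

In this toy case the variant is twice as common in the second population (0.20 versus 0.10). For a drug-metabolism gene, a gap of that kind is what would motivate population-specific dosing guidance.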

Connection to this news: Sequencing 10,000+ Indian genomes creates the reference dataset without which Indian-specific precision medicine is impossible — therapies developed using global (predominantly European) datasets cannot be assumed safe or optimal for Indian patients.

Data Sovereignty and the IBDC Framework

Genomic data is widely regarded as sensitive personal data because it carries lifelong, familial, and population-level health information. The Indian Biological Data Centre (IBDC), set up by the Department of Biotechnology at the Regional Centre for Biotechnology (RCB), Faridabad, is designed as a sovereign data repository: India's population genomic data is stored domestically under Indian jurisdiction and released only through controlled access, rather than being deposited openly in international databases.

IBDC: Established by DBT; located at the Regional Centre for Biotechnology (RCB), Faridabad; India's first national repository for biological data.
Digital Personal Data Protection Act, 2023: India's framework for protection of personal data, including health data — genomic data could be classified as sensitive personal data requiring higher protection standards.
Data access: Researchers can apply to IBDC for controlled access to the GIP dataset — analogous to dbGaP (US) or EGA (Europe) controlled-access frameworks.
Global context: GenBank (NCBI, US), the European Nucleotide Archive (EMBL-EBI), and the DNA Data Bank of Japan together form the International Nucleotide Sequence Database Collaboration (INSDC); IBDC complements this with India-specific population data.

Connection to this news: By archiving at IBDC rather than submitting to international databases without controls, India retains sovereignty over its population's genomic data — a policy choice with significant long-term strategic and ethical implications.

Key Facts & Data

Human genome: ~3 billion base pairs; ~20,000–25,000 protein-coding genes; 23 pairs of chromosomes.
Human Genome Project: 1990–2003; cost >USD 2 billion; international consortium of 20 sequencing centres in six countries; produced the first reference human genome.
Cost of genome sequencing (2007): ~USD 1 million; (2024–25): ~USD 200.
Genome India Project: Launched January 3, 2020; nodal agency: Department of Biotechnology (DBT), Ministry of Science & Technology.
Lead institution: Centre for Brain Research, Indian Institute of Science (IISc), Bengaluru.
Collaborating institutions: 20 across India.
Sample size: 10,074 individuals; 83 Indian populations; 99 geographic sites.
Variants identified: ~130 million total; ~44 million previously unknown to global databases.
Predecessor: IndiGen (2019), CSIR-led, 1,008 genomes.
Data repository: Indian Biological Data Centre (IBDC), Regional Centre for Biotechnology (RCB), Faridabad.
Linguistic-genetic families covered: Indo-European, Dravidian, Austro-Asiatic, Tibeto-Burman.
Global context: In most global genomic databases, more than 80% of the individuals are of European descent, which creates a representation gap for Indian and South Asian populations.