
OpenAI, Anthropic accuse Chinese rivals of mass AI data theft


What Happened

  • OpenAI and Anthropic have publicly accused Chinese AI companies — specifically DeepSeek, Moonshot AI, and MiniMax — of conducting coordinated "distillation attack" campaigns against their AI models.
  • Anthropic stated that these Chinese firms flooded its Claude model with large volumes of specially crafted prompts designed to harvest outputs for use as training data for their own proprietary models.
  • OpenAI submitted documentation to US legislators citing "accounts associated with DeepSeek employees developing methods to circumvent OpenAI's access restrictions" to obtain outputs for distillation at scale.
  • Both companies have framed these distillation campaigns as national security threats — arguing that authoritarian governments could deploy the resulting AI models for offensive cyber operations, disinformation, and surveillance.
  • The accusation raises fundamental questions about intellectual property protection in the AI industry, where the boundary between permitted use and theft is legally unsettled.

Static Topic Bridges

Knowledge Distillation — Technical Concept

Knowledge distillation is a machine learning technique in which a smaller, more efficient "student" model is trained to mimic the outputs of a larger, more capable "teacher" model. Rather than training from scratch on raw data, the student model learns from the probability distributions (soft labels) over outputs that the teacher model produces — capturing the teacher's "knowledge" more efficiently than training on hard labels alone. The technique was introduced by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean in a 2015 paper.
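The soft-label idea above can be sketched in a few lines. This is a minimal NumPy illustration of the classic distillation loss (cross-entropy of the student against temperature-softened teacher probabilities, per Hinton et al., 2015); the toy logits are invented for illustration, not drawn from any real model.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative confidence across wrong answers ("dark knowledge").
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's soft labels:
    -sum p_teacher * log p_student (equals KL divergence up to a constant)."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = np.log(softmax(student_logits, temperature))
    return -(teacher_probs * student_log_probs).sum(axis=-1).mean()

# Toy example: the student is pulled toward the teacher's full output
# distribution, not just the single argmax class of a hard label.
teacher = np.array([[4.0, 1.0, 0.5]])   # confident but not one-hot
student = np.array([[2.0, 1.5, 1.0]])   # less decided
loss = distillation_loss(student, teacher)
```

The loss is minimised exactly when the student reproduces the teacher's distribution, which is why the soft labels carry more training signal per example than a hard "correct answer" label.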

  • Legitimate use: AI companies routinely distill their own models to create smaller, cheaper inference versions (e.g., Anthropic's Claude Haiku, OpenAI's GPT-4o-mini)
  • Adversarial/commercial distillation: Using a competitor's commercially available API to generate large volumes of input-output pairs → training a competing model on these pairs
  • Cost efficiency: Researchers at Berkeley reportedly replicated an OpenAI reasoning model's capabilities for ~$450 of compute in about 19 hours, using distillation from API outputs alone
  • Legal ambiguity: If distillation uses only publicly available API outputs (not stolen code or model weights), its legality under existing IP law is disputed — current copyright law does not clearly protect AI model outputs
  • DeepSeek's achievements: Chinese AI firm DeepSeek released its V3 (December 2024) and R1 (January 2025) models, which matched or approached US frontier model capabilities at a fraction of the training cost, amplifying concerns about distillation-enabled catch-up

Connection to this news: The accusation is that DeepSeek, Moonshot AI, and MiniMax went beyond normal API use — using "obfuscated methods" and large-scale systematic extraction that crosses into adversarial data harvesting.
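API-based distillation of the kind alleged here does not require access to the teacher's probability distributions at all: the student is simply fine-tuned on (prompt, response) pairs harvested from the teacher's public API, treating the teacher's text as hard targets (so-called sequence-level distillation). A minimal sketch of the data-assembly step, where `query_teacher_api` is a hypothetical stand-in for a call to any hosted model endpoint and `distill.jsonl` is an illustrative output path:

```python
import json

def query_teacher_api(prompt: str) -> str:
    # Hypothetical placeholder for a hosted model API call;
    # an actual campaign would issue such queries at massive scale.
    return f"teacher answer to: {prompt}"

def build_distillation_dataset(prompts, path="distill.jsonl"):
    """Collect (prompt, completion) pairs in the JSONL shape commonly
    used for supervised fine-tuning of a student model."""
    records = [{"prompt": p, "completion": query_teacher_api(p)}
               for p in prompts]
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    return records

dataset = build_distillation_dataset(["What is 2+2?", "Define entropy."])
```

Because only ordinary API calls are involved, the dispute turns on scale and intent (and on terms-of-service prohibitions) rather than on any technical break-in.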

Intellectual Property Law and AI — A Legal Grey Zone

Existing intellectual property law (primarily copyright and trade secret law) was not designed for the AI era. Model weights (the trained parameters of an AI model) may not be protectable under copyright because they lack human authorship. API outputs (text, images) generated by an AI may themselves be protectable, but using them to train a competing model falls into legal grey zones that courts have not yet resolved.

  • Trade secret protection (US): The Defend Trade Secrets Act (DTSA, 2016) protects confidential business information obtained through misappropriation — but if distillation uses only public APIs, it may not qualify as trade secret theft
  • Copyright law: US Copyright Office (2023) guidance: AI-generated content without human authorship is not copyrightable — this limits IP protection for AI outputs
  • OpenAI's Terms of Service: Explicitly prohibit using OpenAI outputs to "develop models that compete with OpenAI" — this is contractual, not statutory IP protection
  • Indian IT law context: Information Technology Act, 2000 (amended 2008) — Section 66 (computer-related offences), Section 43 (unauthorized access to computer systems); India's proposed Digital India Act (draft stage) addresses AI governance but not specifically distillation
  • US AI regulation: Executive Order on Safe, Secure and Trustworthy AI (October 2023, Biden); Trump administration issued an AI Action Plan (February 2025) emphasising competitiveness over regulation

Connection to this news: The legal vacuum around AI distillation means US companies are currently relying on contractual ToS enforcement and lobbying for new legislation rather than existing IP law to protect against adversarial distillation.

US-China Technology Competition — AI as a Strategic Domain

The US-China technology rivalry has increasingly centred on AI supremacy. Key inflection points include: US export control rules restricting Nvidia A100/H100-class GPU exports to China (from October 2022), the CHIPS and Science Act (2022) subsidising domestic semiconductor manufacturing, China's national AI strategy (2017), and DeepSeek's January 2025 release demonstrating that China had significantly closed the AI capability gap despite chip restrictions.

  • US CHIPS and Science Act (2022): $52 billion in subsidies for domestic semiconductor manufacturing and research; export restrictions on chips to China were imposed separately through BIS rules
  • Export controls (Bureau of Industry and Security, October 2022 and updated 2023): Banned export of advanced AI chips (Nvidia A100/H100 equivalents) to China without licence
  • China's "New Generation AI Development Plan" (2017): National strategy to become global AI leader by 2030
  • DeepSeek V3 (December 2024) and R1 (January 2025): released with open weights; approached US frontier-model performance; trained on reportedly far fewer H100-equivalent chips — demonstrated that export controls had limited effectiveness
  • Huawei AI chips: China developing domestic GPU alternatives (Huawei Ascend 910) to reduce dependence on Nvidia — export controls accelerate this
  • India's AI policy: National Strategy for Artificial Intelligence (NITI Aayog, 2018); IndiaAI Mission (2024, ₹10,372 crore) — India positioning as "responsible AI" leader

Connection to this news: The distillation allegations are part of the broader US-China AI competition — US companies argue that China is using distillation to circumvent export controls and close the capability gap without the R&D investment, framing it as an existential competitive and national security threat.

Data Governance and Cybersecurity — Relevance for India

The distillation controversy has implications for India's digital governance: it touches on AI companies' data practices (what they collect and how it is used), data localisation requirements under India's Digital Personal Data Protection (DPDP) Act, 2023, and India's emerging AI regulatory framework.

  • DPDP Act, 2023: Governs processing of digital personal data of Indian citizens; applies to AI companies processing Indian user data (including foreign companies)
  • Section 16, DPDP Act: Restrictions on transfer of personal data outside India (cross-border data flows) — government may restrict transfers to specific countries
  • IT Act 2000, Section 43A: Liability for body corporates for negligent handling of sensitive personal data — applicable to AI companies
  • India's proposed AI framework: MeitY working on AI governance guidelines; focus on "responsible AI" principles (safety, accountability, transparency)
  • Cybersecurity dimension: If distillation is used to train AI models deployed for cyber attacks, it constitutes an indirect cybersecurity threat — relevant under India's National Cyber Security Policy

Connection to this news: The allegations illustrate a new category of technology-enabled IP theft that existing cybersecurity frameworks were not designed to address — highlighting the need for AI-specific governance mechanisms, relevant for India's own AI policy deliberations.

Key Facts & Data

  • Chinese firms accused: DeepSeek, Moonshot AI, MiniMax
  • Technique: "Distillation attack", i.e. high-volume querying of Claude/GPT models with specially crafted prompts to harvest outputs for use as training data
  • Estimated cost of distillation-based model training: ~$450 for a reasoning model (Berkeley researchers, 2025)
  • DeepSeek R1 release: January 2025; matched GPT-4 class performance at claimed fraction of cost
  • Anthropic's allegation: Claude was flooded with systematic, specially crafted prompts whose outputs were used to train proprietary models
  • US CHIPS Act: Signed August 2022; $52 billion for domestic semiconductor manufacturing
  • Nvidia export controls: Advanced AI chips (A100/H100 equivalents) restricted for China since October 2022
  • IndiaAI Mission: ₹10,372 crore; approved 2024; focus on AI compute, datasets, and application
  • DPDP Act: Passed August 2023; cross-border data transfer restrictions under Section 16