Anthropic, the San Francisco-based AI safety company, has formally accused Chinese tech giant Alibaba of systematically extracting proprietary model weights from its Claude AI system. The allegation, filed in a California federal court, claims that Alibaba’s Qwen team used thousands of targeted queries to reverse-engineer Anthropic’s safety-aligned architecture. This news breaks as Whitehall officials finalise the agenda for the upcoming British AI Safety Summit, with sources confirming that sessions on model extraction and digital sovereignty have been added to address growing cross-border espionage concerns.
The lawsuit details how Alibaba allegedly exploited a technical loophole in Claude’s public API to reconstruct its constitutional AI layer. Anthropic’s evidence includes server logs showing an anomalous spike in queries from Chinese IP addresses, each designed to test Claude’s responses to edge-case prompts that would reveal its underlying reward model. The company argues this amounts to theft of trade secrets, potentially enabling Alibaba to shortcut years of safety research. Alibaba has denied the claims, stating that all its AI models are developed independently and comply with Chinese regulations. Yet the timing is awkward: Beijing recently mandated that all domestic AI systems must include ‘socialist core values’ filters, a requirement that could be technically informed by Anthropic’s safety techniques.
The British government’s response has been swift. The AI Safety Summit, originally scheduled for November, will now feature a closed-door session on ‘Safe AI Supply Chains and Model Protection’. A Downing Street spokesperson said the summit must address the ‘real and present risk of hostile actors weaponising open research’. This is a pivot from the initial focus on existential risk from AGI, now grounded in the immediate threat of industrial espionage. Tech lobbyists are concerned that the agenda shift could lead to stricter export controls on AI model parameters, akin to the semiconductor restrictions already in place.
For the average user, this is more than a corporate spat. It signals the beginning of a fragmented global AI landscape. If nations cannot trust the integrity of each other’s models, we face a digital Iron Curtain where Chinese AI systems are banned in Western app stores, and vice versa. Startups building on open-source models will face compliance nightmares, having to prove their training data was not extracted from proprietary systems. The user experience of AI will degrade: expect more gatekeeping, more jurisdiction-specific features, and less seamless interoperability.
Anthropic’s move is also a test for the concept of AI sovereignty. If model weights become national assets, then every country will want its own indigenous AI stack. Britain has invested heavily in its Foundation Model Taskforce, but a go-it-alone strategy is expensive and slow. The summit must now answer: can we build global safety standards without global trust? Or are we witnessing the end of the AI commons? The answer will shape the next decade of technology, for better or worse.







