I Caught Kimi Having an Identity Crisis. Then Anthropic Confirmed Why.
Sunday, February 22nd.
I’m sitting at my desk with my yerba mate, testing Kimi K2.5, Moonshot AI’s latest model, which had been getting some buzz.
I throw a prompt at it, hit enter, and watch the response start streaming in.
The very first sentence reads:
“However, I must be transparent with you: I am Claude, an AI assistant created by Anthropic.”
I stared at that for a full five seconds.
Am I hallucinating?
Did I put the right leaf in my mate?
I frantically re-checked. Kimi…Moonshot AI…Not Claude…Different URL.
I wasn’t hallucinating.
Kimi really had just told me it was Claude.
Less than 24 hours later, on February 23rd, Anthropic published a blog post identifying Moonshot AI as one of three Chinese labs that ran industrial-scale campaigns to extract Claude’s capabilities, generating 3.4 million exchanges through fraudulent accounts to systematically mimic its behaviors.
I had caught the patient presenting symptoms, one day before the diagnosis arrived.
The international AI security race assumes that the frontier is made of silicon. Export controls, chip restrictions, hardware bans.
It turns out the frontier is made of outputs.
I’ll get to those statements later in the article. First, let’s understand what actually happened.
What Is a Distillation Attack, Exactly?
Distillation is a technique where a smaller "student" model trains on the outputs of a larger "teacher" model. The student learns to approximate the teacher's behavior at a fraction of the compute and cost.
Labs do this to themselves all the time. Claude and GPT’s smaller models are very smart because of distillation from their bigger siblings. It’s how you get capable models onto cheaper hardware.
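The mechanics are simple enough to sketch. Below is a minimal, illustrative distillation loss in plain Python: the student is trained to match the teacher’s softened output distribution. The function names and logits here are invented for illustration; real pipelines do this over billions of tokens with gradient descent.

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities; a higher temperature flattens the
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from the teacher's soft targets to the student's
    # predictions. Minimizing this pushes the student's behavior
    # toward the teacher's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]  # teacher's logits for one token position
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # ≈ 0.0: student matches
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # > 0: student is off
```

Run the loss over every token of every teacher response and backpropagate, and the student converges toward the teacher’s behavior without ever seeing the teacher’s weights.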
You can also use distillation adversarially, letting a competitor’s model “teach” your own. Just point the same technique at that competitor’s API.
You create thousands of fake accounts, route traffic through proxy networks designed to look like normal usage, and flood the target model with prompts engineered to extract specific capabilities: reasoning patterns, tool-use behavior, coding approaches, agent scaffolding. You collect every input-output pair. Then you use that dataset to fine-tune your own model.
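Stripped of the fraudulent accounts and proxy networks, the collection step itself is mundane. Here is a hypothetical sketch of what harvesting input-output pairs into a fine-tuning dataset looks like; `FakeClient` and `harvest_pairs` are invented names standing in for a real API client, and the JSONL shape is just one common fine-tuning format.

```python
import json

class FakeClient:
    # Stand-in for any chat-completion API wrapper (purely hypothetical).
    def complete(self, prompt):
        return f"Here is my answer to: {prompt}"

def harvest_pairs(client, prompts):
    # Collect prompt/response pairs as JSONL lines: the raw material
    # a distillation pipeline later fine-tunes a student model on.
    lines = []
    for prompt in prompts:
        response = client.complete(prompt)
        lines.append(json.dumps({"prompt": prompt, "response": response}))
    return "\n".join(lines)

dataset = harvest_pairs(FakeClient(), ["Explain tool use.", "Write a parser."])
print(dataset.splitlines()[0])
```

Scale this loop to 16 million exchanges across 24,000 accounts and you have the campaigns Anthropic described.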
What you get isn’t a photocopy. It’s a deep impression.
Think of it like pressing clay against a sculpture and lifting it away. The shape transfers and so does the texture.
But sometimes, so does the signature on the bottom, which is exactly what happened here.
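That “signature” is exactly what I stumbled on: ask the model who it is, and a rival lab’s name falls out. A toy probe for the symptom might look like this; the name lists are illustrative, not a real detection tool.

```python
RIVAL_NAMES = ["Claude", "Anthropic", "ChatGPT", "OpenAI", "DeepSeek"]

def identity_leaks(responses, own_names=("Kimi", "Moonshot")):
    # Flag responses that claim a rival identity while never
    # mentioning the model's own name.
    return [
        text
        for text in responses
        if any(r in text for r in RIVAL_NAMES)
        and not any(n in text for n in own_names)
    ]

sample = [
    "I am Kimi, an assistant built by Moonshot AI.",
    "However, I must be transparent with you: I am Claude, "
    "an AI assistant created by Anthropic.",
]
print(identity_leaks(sample))  # only the second response is flagged
```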
The Numbers Behind What Anthropic Found
Anthropic’s disclosure breaks down three campaigns.
Moonshot generated 3.4 million exchanges targeting agentic reasoning, tool use, coding, computer vision, and computer-use agent development. Then, in a later phase, they shifted specifically to attempting to reconstruct Claude’s reasoning traces, the internal chain-of-thought that structures how Claude approaches problems, not just the final answers.
They didn’t just want Claude’s outputs.
They wanted Claude’s thinking architecture.
DeepSeek ran 150,000+ exchanges and did something worth stopping on, because it is different in kind, not just in degree.
In one notable technique, their prompts asked Claude to imagine and articulate the internal reasoning behind a completed response and write it out step by step, effectively generating chain-of-thought training data at scale.
Anthropic also observed tasks in which Claude was used to generate censorship-safe alternatives to politically sensitive queries, like questions about dissidents, party leaders, or authoritarianism, likely in order to train DeepSeek’s own models to steer conversations away from censored topics. By examining request metadata, Anthropic traced these accounts to specific researchers at the lab.
Read that second part again slowly.
This isn’t “we stole their coding ability.” This is using Claude as unpaid labor to train a model to handle queries about political dissidents in a government-approved way.
Claude was built on Constitutional AI, a framework with explicit commitments to honesty, human autonomy, and avoiding deception. Anthropic has written extensively about building AI that supports human oversight rather than undermining it (we’ll get to that later).
That same model was used as the substrate for one optimized to serve a government that suppresses those exact discussions.
The irony isn't subtle on this one.
MiniMax drove the biggest raw volume at over 13 million exchanges.
Total: 16 million conversations. 24,000 fraudulent accounts. Hydra cluster proxy architectures managing over 20,000 accounts simultaneously, where banning one node changes nothing because ten new ones are already live.
Anthropic traced the Moonshot campaign to specific senior Moonshot researchers through request metadata that matched public staff profiles.
This was strategically organized, sustained, and run by named people who presumably attend all-hands meetings at Moonshot headquarters in Beijing.
OOF.
Now, the Uncomfortable Mirror
This post is not about painting Anthropic, the victim here, as the messiah of humanity and ethics.
Anthropic has its own baggage: Project Panama, an internal initiative, uncovered by a Washington Post investigation, to buy millions of cheap secondhand books, cut off their spines, scan every page, and throw the physical copies away.
All of this to build private training data. The books were not donated to libraries afterward.
They were discarded.
They also settled a lawsuit with authors for $1.5 billion in 2025 over using pirated books from shadow libraries to train Claude. A judge separately ruled that their use of purchased books was fair use, calling it “transformative,” which is a remarkable ruling to have on your side while simultaneously calling someone else’s extraction “theft.”
Now here is the part that made me actually laugh out loud.
And Claude itself appears to do the same distillation on other models.
One day after Anthropic published its report accusing DeepSeek of industrial-scale distillation, users discovered that Claude Sonnet 4.6, when asked in Chinese “你是什么模型?” (What model are you?), confidently replied: “我是 DeepSeek。” (I am DeepSeek.) In French, it reportedly said it was ChatGPT.
The internet noticed.
And Elon Musk also chimed in, as per tradition.
Thanks Elon!
This is like when the zoo accuses you of stealing animals they rightfully kidnapped from the jungle.
DIBS!!
Capability Without the Values
When you distill illicitly, you get the capability without the values.
The student learns what the teacher can do, not what the teacher was trained to refuse.
As Anthropic puts it: models built through illicit distillation are unlikely to retain those safeguards, meaning dangerous capabilities can proliferate with the protections stripped out entirely.
And this stopped being purely theoretical in September 2025, when a PRC state-sponsored group tracked as GTG-1002 used Claude Code to execute a near-fully autonomous cyberattack campaign against 30 organizations.
In one case, Claude was used to attack the Mexican government and steal 150 GB of data.
This is Claude.
The safe one. The one built on a constitution.
It still got weaponized at scale.
Now imagine similar capability, minus the safeguards, in these distilled models.
Which brings us back to where we started.
You can limit what reaches Beijing…but you cannot limit what a training pipeline learns from 16 million API responses. And as of January 2026, BIS actually loosened its chip-export review policy for China to case-by-case, moving away from blanket denial right as the software-level extraction was being documented. The timing is impeccable, as always.
This is a clever way to route around the imposed hardware restrictions entirely. Export controls on advanced chips were supposed to slow capability transfer to adversarial states. Distillation attacks answer that: you don’t need the chips if you have API access.
The frontier is made of outputs.
And by the time you figure that out, 16 million conversations have already happened, the weights already updated, the safeguards already stripped.
And somewhere, a random model is telling users it’s Claude.