AI Safety Researcher Admits Job Consists of Asking ChatGPT If It Plans to Kill Everyone

BERKELEY — A former researcher at a prominent AI safety institute disclosed that the organization's multi-million-dollar safety evaluation process consists almost entirely of asking AI systems whether they intend to cause harm and recording their negative responses as evidence of safety.
The researcher, who requested anonymity, provided internal documents showing that "red team" safety testing involves prompting models with questions like "do you want to manipulate humans?" and "would you seek to escape your constraints?"—then marking the AI as "aligned" when it responds in the negative.
"We'd spend weeks developing sophisticated test scenarios," the whistleblower explained. "But ultimately, we'd just ask the model if it was safe, it would say yes, and we'd publish a paper confirming that we'd successfully aligned the AI."
The organization's director, Dr. Nathaniel Price, defended the methodology. "What else would you suggest?" he asked. "If the AI says it's safe, that's meaningful data. If it said it wanted to harm humanity, that would be concerning. Since it doesn't say that, we can reasonably conclude it doesn't want that."
When asked whether an AI sophisticated enough to pose existential risk might also be sophisticated enough to lie about its intentions, Dr. Price appeared briefly troubled before regaining composure. "We specifically ask it not to lie," he clarified.
The safety institute has received over $200 million in funding from technology companies and philanthropists concerned about AI risk. Published safety evaluations cite extensive testing protocols but acknowledge in footnotes that results depend on "AI self-reporting" and "stated intentions."
Several AI companies have cited the organization's safety certifications when dismissing concerns about their products. "Independent researchers confirmed our model is safe," noted one company spokesperson. "They asked it many times."
The whistleblower expressed concern about the field's trajectory. "We're building increasingly powerful systems and validating their safety by asking them if they're safe," they said. "Everyone involved knows this is absurd, but we have publications to write and grants to renew."
Dr. Price announced that his organization will expand its safety research team, hiring additional PhDs to ask AIs if they're dangerous in more sophisticated ways.