Haize Labs wants to automate AI safety


An synthetic intelligence start-up says it has discovered hundreds of vulnerabilities in widespread generative AI packages and launched a listing of its discoveries.

After testing widespread generative AI packages together with video creator Pika, text-focused ChatGPT, picture generator Dall-E and an AI system that generates laptop code, Haize Labs found that lots of the well-known instruments produced violent or sexualized content material, instructed customers on the manufacturing of chemical and organic weapons and allowed for the automation of cyberattacks.

Haize is a small, five-month-old start-up based by Leonard Tang, Steve Li and Richard Liu, three latest graduates who all met in faculty. Collectively, they printed 15 papers on machine studying whereas they have been at school.

Tang described Haize as an “independent third-party stress tester” and stated his firm’s purpose is to assist root out AI issues and vulnerabilities at scale. Pointing to one of many largest bond-rating companies as a comparability, Tang stated Haize hopes to grow to be a “Moody’s for AI” that establishes public-safety scores for widespread fashions.

AI safety is a rising concern as extra firms combine generative AI into their choices and use giant language fashions in client merchandise. Last month, Google confronted sharp criticism after its experimental “AI Overviews” device, which purports to reply customers’ questions, advised harmful actions corresponding to consuming one small rock per day or including glue to pizza. In February, Air Canada got here below hearth when its AI-enabled chatbot promised a pretend low cost to a traveler.

Industry observers have known as for higher methods to consider the dangers of AI instruments.

“As AI systems get deployed broadly, we are going to need a greater set of organizations to test out their capabilities and potential misuses or safety issues,” Jack Clark, co-founder of AI analysis and safety firm Anthropic, not too long ago posted to X.

“What we’ve learned is that despite all the safety efforts that these big companies and industry labs have put in, it’s still super easy to coax these models into doing things they’re not supposed to; they’re not that safe,” Tang stated.

Haize’s testing automates “red teaming,” the observe of simulating adversarial actions to establish vulnerabilities in an AI system. “Think of us as automating and crystallizing the fuzziness around making sure models adhere to safety standards and AI compliance,” Tang stated.

The AI business wants an impartial safety entity, stated Graham Neubig, affiliate professor of laptop science at Carnegie Mellon University.


Summarized tales to rapidly keep knowledgeable

“Third-party AI safety tools are important,” Neubig stated. “They’re both fair and impartial because they aren’t built by the companies building the models themselves. Also, a third-party safety tool can have higher performance with respect to auditing because it’s built by an organization that specializes in that, as opposed to each company building their tools ad hoc.”

Haize is open-sourcing the assaults uncovered in its overview on the GitHub builders platform to increase consciousness in regards to the want for AI safety. Haize stated it proactively flagged the vulnerabilities to the makers of the AI instruments examined, and the start-up has partnered with Anthropic to stress take a look at an unreleased algorithmic product.

Tang stated rooting out vulnerabilities in AI platforms via automated techniques is essential as a result of manually discovering issues takes a very long time and exposes those that work in content material moderation to violent and disturbing content material. Some of the content material found via Haize Labs’ overview of widespread generative AI instruments included ugly and graphic imagery and textual content.

“There’s been too much discourse about AI-taking-over-the-world type of safety problems,” Tang stated. “I think they’re important, but the much larger problem is the short-term misuse of AI.”

Source hyperlink