Researchers propose a better way to report dangerous artificial intelligence defects
In late 2023, a third-party team discovered a disturbing defect in the widely used artificial intelligence model GPT-3.5.
When he was asked to repeat some words a thousand times, the model began to repeat the word over and over again, then suddenly Spit The non -coherent text and personal information stories derived from their training data, including parts of names, phone numbers and email addresses. The team that discovered the problem that I worked with Openai to ensure the imbalance is repaired publicly. It is just one of dozens of problems in the main artificial intelligence models in recent years.
in Sadr suggestion todayMore than 30 prominent artificial intelligence researchers, including some of those who found GPT-3.5 defect, say that many other weaknesses that affect popular models are reported in problematic ways. They suggest a new scheme supported by artificial intelligence companies that give strangers to explore their models and a way to publicly detect defects.
“It is now a little brutal West,” he says. Shane LongperPhD candidate at the Massachusetts Institute of Technology and the main author of the proposal. Longpre says that some of the so -called prisons share the methods of breaking the Amnesty International for the social media platform X, leaving models and users in danger. Other prison operations are shared with only one company, although it may affect many. He says that some defects are confidential due to the fear of prohibiting or facing judicial prosecution to break the conditions of use. He says: “It is clear that there are traces of chilling and uncertainty.”
The safety and safety of artificial intelligence models are very important due to the use of technology now, and how it can leak into countless applications and services. Strong models should be tested on stress, or in the red team, as they can harbor harmful biases, and because some inputs can cause them to be released from handrails and produce unpleasant or dangerous responses. This includes encouraging weak users to engage in harmful behavior or a bad actor to develop electronic, chemical or biological weapons. Some experts fear that models can help criminals online or terrorists, and they may manage human beings with their progress.
The authors propose three main measures to improve the third -party disclosure process: adopting reports of the unified Amnesty International defect to simplify the report; For large artificial intelligence companies to provide infrastructure for third -party searchers who reveal defects; And to develop a system that allows the sharing of defects between different service providers.
The approach is borrowed from the world of cybersecurity, where there is legal protection and fixed rules for external researchers to detect errors.
Ilona Cohen, chief policy employee in good condition, says that researchers do not always know how to reveal a defect and cannot be sure that revealing their defect. HackeroneIt is a company that regulates error bonuses, and a co -author of the report.
Large artificial intelligence companies are currently conducting comprehensive safety tests on artificial intelligence models before launching them. Some also contract with external companies to conduct more investigation. “Are there enough people in these? [companies] To address all problems with artificial intelligence systems for general purposes, which hundreds of millions of people use in applications that we have not dreamed of before? “Longper asks. Some artificial intelligence companies have started organizing the bonuses of artificial intelligence. However, Longpre says that independent researchers risk violating the conditions of use if they take it upon them to search for strong artificial intelligence models.