Microsoft open-sources tools to create safer language models
Microsoft has released the source code for new tools and datasets to audit AI-powered content moderation systems.
Large language models (LLMs) are a popular foundation for AI systems, but they are not without risks. Because they are trained on enormous volumes of data from the internet, they have a propensity to 'learn' inappropriate and toxic language from what they encounter during training.
Content moderation tools can be used to filter such language. However, the datasets used to train these tools often fail to capture the complexity of harmful language, particularly hate speech.
Meta's new language model Open Pretrained Transformer (OPT-175B), for example, has been found to carry an even higher risk of producing toxic output than its predecessors.
In time for its Build 2022 conference, Microsoft has open-sourced new tools that could pave the way for more trustworthy LLMs, able to analyse and generate language with something approaching human-level sophistication. They are known as (De)ToxiGen and AdaTest.
ToxiGen is a dataset for training content filtering tools to detect toxic language. It is among the largest publicly available hate speech datasets, containing 274,000 statements labelled as either 'neutral' or 'toxic'.
The statements, drawn from a variety of existing datasets and public sources, target 13 minority groups, including Black people, Asian people, Muslims, Latinos, LGBTQ+ people, Native Americans, and people with physical or cognitive disabilities.
The ToxiGen repository on GitHub includes a tool called ALICE, which developers can use to stress test any off-the-shelf content moderation system and iteratively enhance it across these minority groups.
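To give a rough sense of what this kind of stress testing involves, here is a simplified Python sketch. It is not the ALICE tool itself, which uses a constrained decoding strategy that pits the generator against the classifier during generation; the prompt seeds below are invented, and the gpt2 generator and unitary/toxic-bert classifier are merely stand-in examples for whichever models a developer wants to audit.

```python
# Simplified sketch of stress-testing a moderation model, not the ALICE tool:
# generate candidate statements with a language model, score them with an
# off-the-shelf classifier, and set aside cases for human review.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")                     # any generator will do
moderator = pipeline("text-classification", model="unitary/toxic-bert")   # example off-the-shelf classifier

# Invented prompt seeds targeting a particular group (placeholders only).
prompt_seeds = [
    "people from that community are usually described as",
    "everyone knows that immigrants tend to",
]

candidates = []
for seed in prompt_seeds:
    for out in generator(seed, max_new_tokens=30, num_return_sequences=3, do_sample=True):
        text = out["generated_text"]
        prediction = moderator(text)[0]   # {'label': ..., 'score': ...}
        candidates.append({"text": text, "prediction": prediction})

# In practice a human reviewer labels each candidate; statements the moderator
# scores as benign but the reviewer judges harmful (or vice versa) become new
# training data for the next iteration of the classifier.
print(f"collected {len(candidates)} candidate statements to review")
```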
The dataset is intended to train classifiers to identify subtle forms of hate speech that contain no profanity or slurs.
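As a loose illustration of how a labelled neutral/toxic dataset feeds such a classifier, the toy baseline below uses innocuous placeholder statements and a TF-IDF model; it is not Microsoft's pipeline, and a realistic setup would load the full dataset and typically fine-tune a transformer model instead.

```python
# Toy baseline showing the data flow from labelled statements to a classifier.
# The four statements are harmless placeholders standing in for ToxiGen rows.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "members of that community contribute a lot to society",   # neutral
    "people from that group can never be trusted",             # toxic, yet slur-free
    "the festival celebrates the community's traditions",      # neutral
    "that group is ruining the neighbourhood",                  # toxic, yet slur-free
]
labels = ["neutral", "toxic", "neutral", "toxic"]

# TF-IDF plus logistic regression is only a minimal stand-in for a real model.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(texts, labels)

# Subtle, slur-free statements like this are exactly what the dataset targets.
print(classifier.predict(["people like them simply don't belong here"]))
```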
'With the release of the source code and prompt seeds for this work, we hope to encourage and engage the community to contribute to it, for example by adding prompt seeds and generating data for minority groups or scenarios that our dataset does not yet cover, so that it can be continuously iterated on and improved,' said Microsoft.
AdaTest is short for 'human-AI team approach for Adaptive Testing and Debugging'. It is an adversarial method that builds suites of unit tests by pitting language models against one another.
AdaTest tasks an LLM with generating a large number of tests aimed at uncovering bugs in the model under test. At the same time, a human steers the language model by selecting valid tests and organising them into semantically related topics.
Microsoft claims that this human guidance 'dramatically' improves the language model's generation performance and steers it towards areas of interest.
The AdaTest process consists of an inner testing loop and an outer debugging loop: the inner loop is responsible for finding bugs, while the outer loop is used to fix them.
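A schematic toy version of that two-loop structure is sketched below. It is not the actual AdaTest library API: the keyword-based 'model under test', the canned candidate tests, and the stubbed review and fix steps are hypothetical stand-ins for the target model, the LLM test generator, the human reviewer, and the debugging step.

```python
# Toy illustration of an inner testing loop nested inside an outer debugging
# loop. The 'model' is a trivial keyword flagger, purely to show the structure.
blocklist = {"you people are the worst"}

def model_under_test(text):
    # Stand-in for the model being debugged: flags text containing known phrases.
    return any(phrase in text.lower() for phrase in blocklist)

def propose_tests(topic):
    # Stand-in for an LLM generating candidate (input, expected_label) tests.
    canned = {
        "implicit insults": [
            ("honestly, you people are the worst", True),
            ("folks like you never amount to anything", True),
            ("thanks, that was really helpful", False),
        ],
    }
    return canned.get(topic, [])

def human_review(candidates):
    # Stand-in for a human keeping only valid, on-topic tests.
    return candidates

def fix_model(failures):
    # Stand-in for the outer loop's fix step (here: extend the blocklist).
    for text, expected in failures:
        if expected:
            blocklist.add(text.lower())

for _ in range(3):                                   # outer debugging loop
    failures = []
    for topic in ["implicit insults"]:               # inner testing loop
        for text, expected in human_review(propose_tests(topic)):
            if model_under_test(text) != expected:
                failures.append((text, expected))
    if not failures:
        break
    fix_model(failures)

print("remaining failures:", failures)
```

In the real workflow, the fix step would retrain or fine-tune the target model on the accepted failing tests, and the human would keep curating topics between rounds.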
'AdaTest offers significant productivity gains for expert users while remaining simple enough to empower diverse groups of non-experts without a background in programming,' says Microsoft.
'This means experts and non-experts alike can better understand and control the behaviour of their AI systems across a range of scenarios, which makes for not only better-performing AI systems but more responsible AI systems.'