Meta releases massive AI language model for research, aiming to avoid bias

Training data is important for AI systems, but 'bad' data can make the tools unusable

Meta is to debut a gigantic language model for AI research, in the hope of fighting toxicity and bias in these systems.

The Open Pretrained Transformer (OPT-175B) has 175 billion parameters, on par with commercial models like GPT-3.

In the past, developers have used these types of systems to build functionality like content moderation and automated copywriting. However, because they are trained on massive volumes of existing text, they can generate outputs that are biased, inaccurate or just plain racist.

Training AI on a 'bad' dataset can lead to a system full of flaws and inaccuracies, like Amazon's (never-released) recruitment tool that scored women lower than men, or facial recognition programmes that misidentify based on race.

Meta believes that restrictions on access to large language models perpetuate known issues like bias and toxicity. OPT-175B is the first such model to be made available to the wider AI research community under a non-commercial license.

Academic researchers, people affiliated with government, civil society and academic organisations, and industry research laboratories will be able to use the full model for free, along with the code to train and use it. Meta is also releasing smaller versions of the model - up to 66 billion parameters - for anyone to use.

In a paper accompanying the announcement, Meta's researchers note that they trained the model using 992 Nvidia 80GB A100 GPUs, reaching a performance of 147 TFLOPS per chip. Using the latest Nvidia hardware enabled them to cut the carbon output to 1/7th of the footprint of GPT-3.
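For a sense of scale, the two figures quoted above can be multiplied out into a rough aggregate throughput for the whole cluster. This is only a back-of-the-envelope sketch: it assumes every GPU sustains the quoted 147 TFLOPS at the same time, and the function name is illustrative rather than anything taken from Meta's code.

```python
# Figures quoted in the article
NUM_GPUS = 992          # Nvidia 80GB A100 GPUs used for training
TFLOPS_PER_GPU = 147    # reported performance per chip


def aggregate_pflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Total throughput in PFLOPS, assuming all GPUs sustain the same rate."""
    return num_gpus * tflops_per_gpu / 1000  # 1 PFLOPS = 1,000 TFLOPS


total = aggregate_pflops(NUM_GPUS, TFLOPS_PER_GPU)
print(f"Approximate cluster throughput: {total:.1f} PFLOPS")
```

On those assumptions the cluster works out to roughly 146 PFLOPS in total.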

The code for Meta's smaller pre-trained models is available online, and researchers can request access to the full version by filling in a form.
