Deepfakes and the detection challenge

The issue of deepfakes, as depicted in the BBC TV drama The Capture, is only going to get worse, warn Kemp Little's Joanna Conway and Oriana Williams

Deepfakes are a hot topic - there's no denying it.

Back in October 2019, iProov (the biometric facial authentication firm) reported that more than 70 per cent of the UK's population did not know what a deepfake was. However, deepfakes are rapidly becoming part of the zeitgeist.

There is good reason for this. Deepfakes hit all the buzzwords. They epitomise disruptive technological development, offering an innovative and readily understandable example of AI being used to create something we are all familiar with.

Deepfakes also capture the public imagination. The BBC TV series The Capture, for example, released at around the time of iProov's survey in late 2019, was premised on the potential to frame someone using undetectable deepfake CCTV footage.

Deepfakes have also been catapulted into the limelight by the global political landscape. In the post-Cambridge Analytica fake news era, with the US presidential campaign in full swing, the threat posed by deepfake electioneering has caught the attention of governments and regulators alike, resulting in a slew of new laws and initiatives to combat the threat. Add to this heady mix the vexed question of regulating big tech and social media and making it responsible for the content it hosts, and it's no surprise that deepfakes are getting attention.

Given the threats deepfakes pose, the ability to detect synthesised media content is critical. The digital identity and document verification market could be worth $15 billion by 2024, according to Goode Intelligence research, so there is money to be made here.

Deepfakes are beginning to represent security risks to private companies and individuals. In August 2019 insurer Euler Hermes reported that AI software had been used to imitate a CEO's voice and successfully request a fraudulent fund transfer of €220,000 from the CEO of a group company. Companies such as iProov offer technological solutions for biometric authentication. Deeptrace is another, offering a comprehensive tool for deepfake monitoring, detection and mitigation.

Big tech also needs to be able to detect the threat. This will become particularly important as the regulatory landscape tightens up and pressure to "tag" synthesised content on digital platforms increases. Indeed, big tech, academia and industry bodies are acting and collaborating to tackle the issue.

Facebook, the Partnership on AI, Microsoft, and academics from Cornell Tech, MIT, University of Oxford, UC Berkeley, University of Maryland, College Park, and University at Albany-SUNY have all come together to build the Deepfake Detection Challenge (DFDC), launched in 2019.

The DFDC's stated aim is to catalyse more research and development in this area and to spur researchers around the world to build innovative new technologies that can help detect deepfakes and manipulated media. The DFDC includes a dataset and leaderboard, as well as grants and awards, to encourage the industry to create new ways of detecting and preventing media manipulated via AI from being used to mislead others.
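
To illustrate, the sketch below shows how a leaderboard of this kind might score a submission of per-video deepfake probabilities against ground-truth labels, assuming a simple binary log-loss metric; the metric, variable names and example values are illustrative assumptions rather than the DFDC's actual scoring rules.

```python
# Illustrative sketch only: scoring a submission against ground truth,
# assuming a binary log-loss metric (an assumption, not the DFDC's
# published scoring rules).
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy; lower scores rank higher on a leaderboard."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    return total / len(y_true)


# Ground truth: 1 = deepfake, 0 = real. Predictions are each model's
# estimated probability that a video is a deepfake.
labels = [1, 0, 1, 1, 0]
predictions = [0.92, 0.10, 0.65, 0.88, 0.30]

print(f"submission score (log loss): {log_loss(labels, predictions):.4f}")
```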

The DFDC represents an interesting example of "open innovation", essentially crowd-sourcing innovation with the intention of creating open source solutions. Facebook states that it has invested $10 million in the project. Challenge participants submit their code into a black-box environment for testing. Open proposals are eligible for challenge prizes and must abide by the open source licensing terms. Closed proposals are possible and remain proprietary, but those participants are not eligible to accept the prizes. $1 million in prizes is available and the competition ends on 31 March 2020.

Part of the challenge in reliable deepfake detection is the availability of datasets. Big tech has the data capability and the financial weight, and it appears willing to use both for detection challenges. Google released a dataset of synthetic speech in support of another international challenge, this one on automatic speaker verification (fake audio detectors); after the challenge the dataset was made freely available to the public. Google has since released a large dataset of visual deepfakes, which has been incorporated into the new FaceForensics benchmark (which Google co-sponsors). This dataset, too, is now freely available to the research community on the FaceForensics github page for use in developing synthetic video detection methods.
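
As a rough illustration of how researchers might use such a dataset, the sketch below fine-tunes a pretrained image classifier to distinguish real from synthesised video frames; the directory layout, model choice and hyperparameters are assumptions for illustration, not details of the FaceForensics benchmark or Google's release.

```python
# Minimal sketch: frame-level real-vs-fake classification using a public
# dataset of real and synthesised faces. The "frames/real" and "frames/fake"
# folder layout and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet-style preprocessing for the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Expects frames/real/*.png and frames/fake/*.png (assumed layout).
dataset = datasets.ImageFolder("frames", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Pretrained ResNet-18 with a two-class head: real vs. deepfake.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```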

To be usable, the datasets need to be obtained with proper consent. Google reports that it paid consenting actors to record hundreds of videos and then, using publicly available deepfake generation methods, created thousands of deepfakes from those videos. Similarly, Facebook was at pains to point out that it commissioned the DFDC dataset in the same manner and that no Facebook user data was used in this dataset.

With financial and other pressures and incentives driving the development of deepfake detection technology, the need for datasets, and the contrast between big tech's open-source approach and specialised tech companies' closed, proprietary models, it remains to be seen where all of this will land. One thing is for sure: any bad actors intent on using deepfake technology will be trying to stay one step ahead in this particular race.

Joanna Conway is a partner, and Oriana Williams an associate, at technology and digital specialist law firm, Kemp Little LLP