Verifying videos in the era of the ‘deepfake’

Building a newsroom culture of instinctive questioning

Prior to 2017, the term ‘deepfake’ was barely known on the Internet; Google registered only around 100 searches for the expression globally that year. But as the phenomenon exploded and was picked up by mainstream media, people began to worry about it. The expression is a fusion of the words ‘deep learning’ and ‘fake’. “This AI [Artificial Intelligence] is what drives the computer program to create new, realistic imagery”, explains Hazel Baker, head of UGC newsgathering at Reuters, who gave the talk Preparing the next wave: video fake news at the International Journalism Festival in Perugia, Italy.

This type of footage attracted the attention of the news industry in 2017, when the University of Washington created a video of Barack Obama talking about terrorism, fatherhood, and other topics. The audio clips were real, but the video placed Obama in a different setting: his voice was taken from various clips and edited together to create new, synthetic speech. “It looked like an Obama interview but, in fact, his words were completely taken out of context”, said Baker.

Last year, a group of researchers from the Max Planck Institute and the Universities of Munich, Bath, and Stanford developed ‘deep video portraits’, a system that transfers an actor’s facial expressions onto video footage of any personality. The system can control facial expressions, head movements, eye gaze, and blinks just like a puppet, and can produce highly convincing videos of people saying whatever the creator wants, in any environment. According to Baker, what is worrying about these technologies is that they can manipulate discourse, whether intentionally or accidentally.

An estimated 10,000 fake videos circulate on the Internet, mostly on entertainment sites. Baker asked herself whether they represent a serious threat to the news industry, and concluded that newsrooms must learn how to spot these videos and be prepared to defend authentic material online, especially as public awareness of the technology grows. “People are beginning to question whether real authentic material could actually be a deep fake,” she said, adding that the best way to learn how to spot deepfakes is to analyze real cases.

An experiment in the newsroom

Baker wanted to prepare her news team to verify videos, so she came up with the idea of creating a fake video and testing how the journalists in the newsroom would react. The experiment had two objectives: learning what it takes to make a deepfake, and finding out whether her team could spot one. Knowing how TV interviews work, Baker considered that the clearest threat emerges when there is only one camera in the room where the interview takes place and no witnesses around to testify to the authenticity of the recording.

Baker filmed one of her colleagues, Tess, in a remote studio with a robotically controlled camera. Tess played a fictional chief executive of an imaginary electronics firm giving an interview in French about the company’s expansion. First, Tess read the interview script in English in order to produce a range of shapes with her mouth, lips, and tongue. Afterward, a French colleague came into the same TV studio and repeated the interview in her native language, and then, with the help of an external company specializing in artificial intelligence, they created what is called a ‘reanimated video’.

Baker explained that her team never looks at a video in isolation. To verify it, they check how it arrived, examine the source, and review what they know about the subject. In this case, however, she simply shared the video with the rest of the newsroom and asked for their thoughts, and whether anything about it seemed unusual. “I wasn’t trying to trick or embarrass my colleagues in any way, I just wanted to know: could our verification workflow stand up to a piece of video like this?”, Baker clarified.

Thanks to her colleagues’ reactions, Baker identified some red flags that offer insight into how to spot manipulated videos. The first is audio-to-video synchronization; her clip wasn’t synchronized. The second was problems around mouth shapes and sibilant sounds: a native speaker can tell how the mouth should look when producing certain sounds, so Baker suggests asking native speakers for help whenever possible. Third, the team found that the interviewee lacked naturalness; she looked robotic.
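
Sync problems of this kind can also be checked numerically. The sketch below is not part of Baker’s experiment or Reuters’ workflow; it is a minimal Python illustration of one way to estimate the offset between speech and lip movement, assuming an audio loudness envelope and a per-frame mouth-openness measure (for example from lip landmarks) have already been extracted and resampled to the same frame rate.

```python
# Minimal sketch (not Reuters' tooling): estimate the lag between an audio
# loudness envelope and a per-frame "mouth openness" signal by cross-correlation.
import numpy as np

def estimate_lag_frames(audio_envelope: np.ndarray, mouth_openness: np.ndarray) -> int:
    """Return the lag (in frames) at which the two signals align best.

    A lag far from zero suggests speech and lip motion are out of sync,
    one of the red flags in Baker's test clip.
    """
    a = (audio_envelope - audio_envelope.mean()) / (audio_envelope.std() + 1e-9)
    m = (mouth_openness - mouth_openness.mean()) / (mouth_openness.std() + 1e-9)
    corr = np.correlate(a, m, mode="full")
    # Shift the argmax so that 0 means "perfectly aligned".
    return int(np.argmax(corr) - (len(m) - 1))

if __name__ == "__main__":
    # Toy example: the mouth signal is the audio envelope delayed by 5 frames.
    rng = np.random.default_rng(0)
    audio = np.abs(rng.normal(size=300))
    mouth = np.roll(audio, 5)
    print(estimate_lag_frames(audio, mouth))  # about -5: mouth motion trails the audio
```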

Human instinct also came into play when reviewing the video. Those who knew about the experiment clearly noticed where the flaws were, but even those who did not know what was happening sensed there was something wrong with the clip. “Our instinct is a really powerful thing and we should trust it, but it shows we can respond more effectively if we confront the issue in advance,” said Baker, underlining the importance of training in verification.

A strategy to tackle fake videos

After the experiment, Baker developed a sliding scale of deceit to identify the level of manipulation in videos. The most common type of video fake news is an original clip taken out of context and mislabeled. Next come edited videos, followed by staged videos, which are not necessarily created to spread misinformation but sometimes to raise awareness of another issue. At the far end of the scale sit deepfakes, which require a high degree of manipulation and preparation.

Baker also shared some of the tools Reuters uses to verify video material. These include reverse image search, useful for debunking content; geolocation with satellite imagery and Google Earth to establish where a clip was filmed; metadata examination; corroborating imagery; directly questioning the source; and consulting subject experts.
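
None of these tools were demonstrated in code during the talk, but the metadata step is straightforward to illustrate. The sketch below is an assumption-laden example, not Reuters’ tooling: it uses ffprobe (part of FFmpeg) to dump a clip’s container metadata, where a creation date, encoder tag, or resolution that contradicts the claimed source can be an early red flag.

```python
# Illustrative sketch: dump a video file's metadata with ffprobe (FFmpeg must be installed).
import json
import subprocess
import sys

def probe(path: str) -> dict:
    """Return container and stream metadata for a video file as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

if __name__ == "__main__":
    info = probe(sys.argv[1])
    fmt = info.get("format", {})
    print("container:", fmt.get("format_name"))
    print("duration (s):", fmt.get("duration"))
    print("tags:", fmt.get("tags", {}))  # may include creation_time, encoder, device info
```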

Baker’s final reflection was on whether technology alone can solve the problem. She recognized that companies and newsrooms are making efforts to tackle deepfakes with initiatives such as FaceForensics, a database project for learning the differences between original and doctored images. Yet technology alone cannot be the solution. Baker emphasized that as the technology advances, it is only a matter of time before people have software on their phones to create their own fake videos, with potentially serious consequences. “A combination of human judgments, subject specialism and frameworks to train and verify and prepare are actually more important, at the moment, than the current technology solutions”, Baker concluded.