Researchers have found a way to extract audio from still images and silent video after a professor was inspired to do so by the sci-fi TV show Fringe.

In the television show, the FBI is able to extract sound recorded in a melted pane of glass. Den of Geek called the idea a "ridiculous pseudo-science technique", which seems fair enough. However, Kevin Fu, professor of electrical and computer engineering and computer science at Northeastern University, saw the review and set about showing that extracting audio from images and silent video, at least, is possible.

"Imagine someone is doing a TikTok video and they mute it and dub music," Fu said in a press release. "Have you ever been curious about what they're really saying? Was it 'watermelon watermelon' or 'Here's my password'? Was somebody speaking behind them? You can actually pick up what is being said off camera."

So, how can this happen? Cameras, while designed to capture visual data, are unwittingly picking up audio information too. Virtually all phone cameras have image stabilisation technology built in. Springs hold the lens suspended in liquid, while an electromagnet pushes the lens around to reduce camera shake.

While a cool feature, it is this mechanism that enables the capture of audio. When someone or something makes a noise near the camera, the springs vibrate slightly and bend the light ever so slightly. It's not noticeable "unless you're looking for it", according to Fu. On its own, this wouldn't provide you with useful audio. However, another feature of modern phone cameras helps turn it into something worth listening to.

"The way cameras work today to reduce cost basically is they don't read all pixels of an image at the same time; they do it one row at a time," Fu explained. "[That happens] hundreds of thousands of times in a single photo. What this essentially means is you're able to amplify by over a thousand times how much frequency information you can get, essentially the granularity of the audio."
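The row-by-row readout Fu describes is usually called a rolling shutter. A back-of-the-envelope way to see why it matters: if each row is treated as a separate moment in time, a hypothetical camera recording 30 frames per second with 1,080 rows yields 30 × 1,080 = 32,400 samples per second, roughly 1,080 times the frame rate. The Python below is a simplified illustration under stated assumptions, not the Side Eye pipeline: the frame rate, row count and 440 Hz test tone are made up, it ignores the gap between frames, and it models the lens wobble as a plain sine wave in row brightness before recovering the tone's frequency with a naive DFT scan.

```python
import math

# Hypothetical camera settings, chosen for illustration only.
FRAME_RATE = 30         # frames per second
ROWS_PER_FRAME = 1080   # rows read out one after another (rolling shutter)


def row_sample_rate(frame_rate: int, rows: int) -> int:
    """Samples per second when every row counts as its own time sample.

    Simplification: assumes rows are read back-to-back with no gap between
    frames; a real sensor's effective rate would be somewhat lower.
    """
    return frame_rate * rows


def simulate_rows(tone_hz: float, frame_rate: int, rows: int, n_frames: int) -> list[float]:
    """Toy model: each row's brightness follows a sine wave at the wobble frequency."""
    fs = row_sample_rate(frame_rate, rows)
    return [math.sin(2 * math.pi * tone_hz * i / fs) for i in range(rows * n_frames)]


def dominant_frequency(samples: list[float], fs: int, max_hz: int, step_hz: int = 10) -> int:
    """Naive DFT scan: return the candidate frequency with the largest magnitude."""
    best_f, best_mag = 0, -1.0
    for f in range(step_hz, max_hz + 1, step_hz):
        re = sum(s * math.cos(2 * math.pi * f * i / fs) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * i / fs) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


if __name__ == "__main__":
    fs = row_sample_rate(FRAME_RATE, ROWS_PER_FRAME)
    print(fs, fs // FRAME_RATE)   # samples per second, and the gain over the frame rate
    print(fs / 2)                 # Nyquist limit in Hz for this toy camera
    rows = simulate_rows(440, FRAME_RATE, ROWS_PER_FRAME, n_frames=1)
    print(dominant_frequency(rows, fs, max_hz=2000))
```

Sampling once per frame would give only 30 samples per second, a 15 Hz Nyquist limit, so a 440 Hz tone would be badly aliased; sampling once per row is what brings audio-band frequencies within reach in this toy model.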

Using this information, captured as a by-product of how photographs are taken (a design known as a rolling shutter), it's possible to extract muffled audio from pretty much any photo that contains light. By applying a machine-learning algorithm, which the team named Side Eye, they can then recover useful audio.

"If you want to know if I said yes or no, you can train [Side Eye] on people saying yes and no, then look at the patterns, and when I get an image later, know with high confidence whether someone said yes or no."
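Fu's yes/no example describes ordinary supervised classification: learn what each word's pattern looks like from labelled examples, then match new recordings against those patterns. The Python below is a deliberately simple stand-in, a nearest-centroid classifier, not the model the team actually used; the three-number feature vectors are invented placeholders for whatever features might be extracted from the camera signal.

```python
import math
from statistics import fmean

# A labelled example: (feature vector, word). Features here are hypothetical.
Example = tuple[list[float], str]


def train_centroids(examples: list[Example]) -> dict[str, list[float]]:
    """Average the feature vectors for each label, e.g. 'yes' and 'no'."""
    by_label: dict[str, list[list[float]]] = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(centroids: dict[str, list[float]], features: list[float]) -> str:
    """Pick the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Invented toy features, not measurements from the study.
    training = [
        ([0.9, 0.2, 0.1], "yes"),
        ([0.8, 0.3, 0.2], "yes"),
        ([0.1, 0.7, 0.9], "no"),
        ([0.2, 0.8, 0.8], "no"),
    ]
    centroids = train_centroids(training)
    print(classify(centroids, [0.85, 0.3, 0.1]))
    print(classify(centroids, [0.2, 0.6, 0.9]))
```

A real system would need far richer features and a more capable model; the sketch only shows the train-then-match structure Fu describes.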

Testing their system on 10 different smartphones, Fu's team found that it could recognise spoken digits with 80.66 percent accuracy, identify which of 20 speakers said the word with 91.28 percent accuracy, and guess the gender of speakers with 99.67 percent accuracy.

This could, of course, be a cybersecurity nightmare if people with nefarious intentions are able to hear what is being said in still images and videos where no audio was (intentionally) captured. The team also outlined possible solutions, including stronger springs, locking lenses, and randomising how the rolling shutter captures pixels.

Ultimately, though, the team is more interested in how extracted audio could be used in legal cases.

"Maybe there's an alibi and it's being admitted to court and somebody wants to prove somebody was or wasn't there," Fu said. "You might be able to use this technique if you have an authenticated video with a known timestamp to confirm one way or the other. If you hear the person's voice, they're more than likely there."

The study is posted on the pre-print server arXiv, and was presented at the 2023 IEEE Symposium on Security and Privacy.