Author Topic: Spectrograms to audio  (Read 5107 times)

Online Daniel

« on: December 19, 2013, 07:07:57 PM »
Has anyone played with this?
There are a number of free programs out there that will take a spectrogram and convert that image back into a playable audio file. (Apparently it's a somewhat popular gimmick among musicians, embedding some iconic image in their albums.)
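For anyone curious how a conversion like this could work in principle, here is a minimal sketch. I don't know what any particular program actually does internally; this assumes the simplest approach I can think of, additive synthesis, where each pixel row drives one sine wave and pixel brightness sets its loudness. The function name `image_to_samples` and all of its parameters are made up for illustration. Note that the image holds no phase information, so every sine here simply starts at phase zero, which is probably one reason the result doesn't sound like natural speech.

```python
import math

def image_to_samples(image, duration_s, max_freq_hz, sample_rate=8000):
    """Additive synthesis from a grayscale spectrogram image (a sketch).

    image: list of rows, top row = highest frequency; each row is a list
           of brightness values in 0.0..1.0, one per time column.
    max_freq_hz should stay below sample_rate / 2 to avoid aliasing.
    Returns a list of float samples scaled into -1.0..1.0.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    n_samples = int(duration_s * sample_rate)
    samples = [0.0] * n_samples
    for r, row in enumerate(image):
        # Row 0 is the top of the image, so it gets the highest frequency.
        freq = max_freq_hz * (n_rows - r) / n_rows
        for n in range(n_samples):
            # Map the sample index to the image column covering that time.
            col = min(n * n_cols // n_samples, n_cols - 1)
            amp = row[col]
            if amp > 0.0:
                # Phase is unknown from the image; assume it starts at zero.
                samples[n] += amp * math.sin(2 * math.pi * freq * n / sample_rate)
    # Normalize so the loudest sample has magnitude 1.0 (skip if all silent).
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]
```

The returned samples could then be scaled to 16-bit integers and written out with Python's standard `wave` module to get a playable file.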

I happened to be reading a book about a Khoisan language, very unfamiliar to me, and there were some spectrograms of clicks. I decided to try scanning one and listening. It wasn't a very clear scan, and it was a small image (low resolution) in the book. But it actually worked. It sounded nothing like natural speech, but I could detect phoneme boundaries and so forth. I could even (with some imagination) see it as potentially contrastive to other sounds (say, in the vowels).

So mostly this is just a fun thing to play with if that catches your attention.

But a little more academically, does anyone know much about this? It's a fun toy, but is it at all practical?
The audio quality was absolutely terrible in any practical sense, so does this imply that spectrograms lose a huge amount of information? Is that relevant to using them for phonetics and phonology research?
Or to rephrase, what resolution (width = pixels per ms? height = pixels per kHz?) would the image need to have in order to preserve reasonable quality from a recording? Would this be helpful in phonetics and phonology research, or is the information loss expected and irrelevant if you're just using a spectrogram for analysis, not playback?
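To make the resolution question concrete, here is a rough sketch of the arithmetic. It assumes the spectrogram was computed as a standard short-time Fourier transform, where the frequency bins are spaced `sample_rate / window` Hz apart and there is one frame per hop. The function names and all the numbers in the usage lines are hypothetical, not measurements from the scan I tried.

```python
def pixel_resolution(width_px, height_px, duration_ms, max_freq_khz):
    """Return (pixels per ms, pixels per kHz) for a spectrogram image."""
    return width_px / duration_ms, height_px / max_freq_khz

def stft_grid(sample_rate_hz, window_samples, hop_samples):
    """Return (frames per ms, bins per kHz) of a standard STFT."""
    # One frame every hop_samples samples.
    frames_per_ms = sample_rate_hz / (hop_samples * 1000.0)
    # Bin spacing is sample_rate / window Hz, so invert it and scale to kHz.
    bins_per_khz = window_samples * 1000.0 / sample_rate_hz
    return frames_per_ms, bins_per_khz

# Hypothetical example: a 600 x 200 pixel image covering 500 ms and 0-8 kHz,
# compared with an analysis at 16 kHz using a 256-sample window and 64-sample hop.
image_res = pixel_resolution(600, 200, 500, 8)
analysis_res = stft_grid(16000, 256, 64)
print("image:   ", image_res)
print("analysis:", analysis_res)
```

If the image has fewer pixels per ms or per kHz than the analysis grid has frames or bins, the image is presumably throwing away detail that the analysis contained. Even when it has more, the missing phase and the limited range of printed gray levels would still cost something.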


As a side note, it's interesting to think that you can hear words as recorded on a printed page. This means that in a sense spectrograms are "personally identifiable data" (if the quality is good enough!) and might need a special kind of consent, something we might not think about much. If there's any chance of something like a one-to-one mapping between audio file and spectrogram image, then technically publishing the image releases as much data as publishing the audio file... (Although I suppose a phoneme or two might not be equivalent to personally identifiable sentences!)