Linguist Forum

Specializations => Computational Linguistics => Topic started by: zaba on March 12, 2014, 09:13:44 AM

Title: Why do speech-to-text projects not rely on phonology?
Post by: zaba on March 12, 2014, 09:13:44 AM
Most speech-to-text projects are based entirely on statistics, right? Isn't phonology more convenient?
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: MalFet on March 12, 2014, 09:27:49 AM
Phonology is the quintessential example of a gestalt effect. Designing algorithms to effectively recognize holistic forms turns out to be incredibly hard.

This fact is, ultimately, one of the better criticisms of computational theories of mind. Human minds and digital computers just seem to work...differently.
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: zaba on March 12, 2014, 10:27:27 AM
Wow, thanks.
Quote
Designing algorithms to effectively recognize holistic forms turns out to be incredibly hard.
Sure, I guess that sounds right. After all, there's a lot of phonology goin' on!

Quote
Phonology is the quintessential example of a gestalt effect.

Can you kindly elaborate on this with a sentence or two? I'm an ignoramus.
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: MalFet on March 12, 2014, 10:37:42 AM
Have you looked at the Wikipedia page on gestalt? I'm happy to expand on that however I'm able, but starting from scratch I'll only do a worse job of explaining things than any introductory blurb out there.
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: zaba on March 12, 2014, 10:40:39 AM
Do you think anything is lost in the process for lack of phonologists? How would things be different if phonologists were involved?

Where can I see the repercussions of the lack of phonologists in speech-to-text, e.g. in Siri?

Sorry to bombard you with these likely idiotic questions.
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: MalFet on March 12, 2014, 10:49:07 AM
If a computer had access to a proper phonology, it would understand natural speech as well as humans do.
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: zaba on March 12, 2014, 12:39:07 PM
So with a proper model of the phonetics <> phonology interface, speech-to-text could be improved, if only theoretically. Is that true?
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: lx on March 12, 2014, 01:12:06 PM
There are speech-to-text algorithms that don't use a phonological model, but many, many algorithms and processes do. When you employ any sort of language model, you restrict the domain of applicability significantly, and there is a hell of a lot of money to be made in a system that can be applied across multiple languages. Many algorithms rely on models and statistics from specific languages, but the idea is that you can plug in a hefty-enough corpus, draw the same statistics, and get fairly comparable accuracy. Once you start putting in phonological information, you really do become language-specific, which is less desirable when you want your algorithm(s) to be usable across a broad spectrum; still, it remains a very popular approach. So I'd take issue with the idea that phonological modelling in speech-to-text is as rare as your first post made it out to be.
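
To make the "plug in a hefty-enough corpus" point concrete, here's a minimal sketch in Python (the toy corpus and function names are invented for illustration, not taken from any real system) of the kind of language-agnostic statistics involved: a bigram language model estimated from raw text and used to rank competing transcription hypotheses.

Code:
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(hypothesis, unigrams, bigrams, alpha=1.0):
    """Add-alpha smoothed log-probability of a hypothesis string."""
    vocab = len(unigrams)
    tokens = ["<s>"] + hypothesis.split() + ["</s>"]
    score = 0.0
    for prev, curr in zip(tokens, tokens[1:]):
        score += math.log((bigrams[(prev, curr)] + alpha) /
                          (unigrams[prev] + alpha * vocab))
    return score

# Toy corpus; a real system would use millions of sentences.
corpus = ["recognize speech", "wreck a nice beach", "recognize speech well"]
uni, bi = train_bigram_model(corpus)

# Acoustically confusable hypotheses get ranked by the text statistics alone.
for hyp in ["recognize speech", "wreck a nice beach"]:
    print(hyp, log_prob(hyp, uni, bi))

Swap in a corpus from another language and the counts change but the code doesn't, which is the portability point above.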
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: jkpate on March 12, 2014, 06:25:22 PM
Basing a system on statistics is not mutually exclusive with taking advantage of phonological facts. If "using phonology" means "using Optimality Theory constraints" or "using generative rules", then you are right that ASR systems, as far as I know, don't "use phonology." But if it means identifying relevant phonological features from data or learning facts about assimilation, then ASR systems do use phonology.
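
To illustrate that second sense of "using phonology", here's a toy sketch (the mini-lexicon, the phone symbols, and the single rewrite rule are invented for the example, not drawn from any real recogniser) of the phone-level layer most ASR pipelines carry: a pronunciation lexicon expands words into phone sequences, and a context-sensitive rule models place assimilation of /n/ before bilabials.

Code:
# Pronunciation lexicon: words expand into phone sequences. The entries
# and phone symbols below are invented for this example.
LEXICON = {
    "green": ["g", "r", "iy", "n"],
    "boat":  ["b", "ow", "t"],
}

BILABIALS = {"b", "p", "m"}

def expand(words):
    """Concatenate dictionary pronunciations for a word sequence."""
    phones = []
    for word in words:
        phones.extend(LEXICON[word])
    return phones

def assimilate(phones):
    """Rewrite /n/ as /m/ when the next phone is bilabial."""
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out

print(assimilate(expand(["green", "boat"])))
# ['g', 'r', 'iy', 'm', 'b', 'ow', 't']

A real system would learn where rewrites like this apply from data rather than hard-coding them, but the representation is recognisably phonological either way.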
Title: Re: Why do speech-to-text projects not rely on phonology?
Post by: MalFet on March 12, 2014, 10:29:28 PM
Right, that's an important distinction. If the term "phonology" is used broadly to just mean something like "category", then certainly speech recognition uses something along those lines. That said, I'd hesitate to treat phonology as nothing more specific than category.

I'd be curious, though...are there any systems that actually model a full phonology, including things like allophony and underspecification? That seems very cumbersome for little payoff, but I'd be delighted to learn that some algorithm out there was actually trying to mimic human (rather than computer) perceptual qualities.