Linguist Forum
Specializations => Computational Linguistics => Topic started by: zaba on March 12, 2014, 09:13:44 AM
-
Most speech-to-text projects are based entirely on statistics, right? Isn't phonology more convenient?
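By "based entirely on statistics" I mean something like the standard noisy-channel setup (my paraphrase of the textbook formulation, not any one system):

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid A) \;=\; \arg\max_{W} \underbrace{P(A \mid W)}_{\text{acoustic model}} \, \underbrace{P(W)}_{\text{language model}}
```

where A is the acoustic signal. Nothing in that formula mentions phonology as such.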
-
Phonology is the quintessential example of a gestalt effect. Designing algorithms to effectively recognize holistic forms turns out to be incredibly hard.
This fact is, ultimately, one of the better criticisms of computational theories of mind. Human minds and digital computers just seem to work...differently.
-
Wow, thanks.

Quote: "Designing algorithms to effectively recognize holistic forms turns out to be incredibly hard."

Sure, I guess that sounds right. After all, there's a lot of phonology goin' on!

Quote: "Phonology is the quintessential example of a gestalt effect."

Can you kindly elaborate on this with a sentence or two? I'm an ignoramus.
-
Have you looked at the Wikipedia page on gestalt? I'm happy to expand on that however I'm able, but starting from scratch I'll only do a worse job explaining things than any introductory blurb out there.
-
Do you think anything is lost in the process for lack of phonologists? How would things be different if phonologists were involved?
In what ways could I see the repercussions of that absence in speech-to-text, e.g. in Siri?
Sorry to bombard you with these likely idiotic questions.
-
If a computer had access to a proper phonology, it would understand natural speech as well as humans do.
-
So with a proper model of the phonetics-phonology interface, speech-to-text could be improved, at least in theory. Is that true?
-
There are speech-to-text algorithms that don't use a phonological model, but many, many algorithms and processes do. When you employ any sort of language model, you restrict the domain of applicability significantly, and there is a hell of a lot of money to be made in a system that can be applied across multiple languages. Many algorithms rely on models and statistics from specific languages, but the idea is that you can plug in a hefty-enough corpus, draw the same statistics, and get fairly comparable error rates. Once you start putting in phonological information, you really do become language-specific, and that is less desirable when you want your algorithm(s) to be usable across a broad spectrum of languages; even so, it's still a very popular approach. I would take issue with the suggestion that phonological modelling in speech-to-text is as rare as your first post made it out to be.
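To make the "plug in a hefty-enough corpus and draw the same statistics" point concrete, here's a toy sketch of the language-model half of that pipeline. It's my own illustration with made-up data, not any real system's code: a smoothed bigram model rescoring candidate transcriptions.

```python
# Toy sketch: a smoothed bigram language model trained from raw text and
# used to rescore candidate transcriptions. Illustrative only -- real ASR
# systems use far larger models and better smoothing than add-one.
import math
from collections import defaultdict

def train_bigrams(corpus_sentences):
    """Count unigram and bigram frequencies from whitespace-tokenised text."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    for sent in corpus_sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for w in tokens:
            unigrams[w] += 1
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
    return unigrams, bigrams

def sentence_logprob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a candidate transcription."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

corpus = ["recognise speech", "wreck a nice beach", "recognise speech today"]
uni, bi = train_bigrams(corpus)
V = len(uni)
candidates = ["recognise speech", "wreck a nice beach"]
best = max(candidates, key=lambda c: sentence_logprob(c, uni, bi, V))
print(best)  # "recognise speech": the corpus statistics, not phonology, break the tie
```

Swap in a corpus from another language and the same code runs unchanged; that language-independence is where the money mentioned above is.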
-
Basing a system on statistics doesn't preclude taking advantage of phonological facts. If "using phonology" means "using Optimality Theory constraints" or "using generative rules", then you're right that ASR systems, as far as I know, don't "use phonology." But if it means identifying relevant phonological features from data, or learning facts about assimilation, then ASR systems do use phonology.
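For the second sense, here's a sketch of what encoding a fact about assimilation can look like in practice: generating pronunciation variants for an ASR lexicon from English nasal place assimilation. The feature table and the single rule below are deliberate simplifications on my part, not a claim about any particular recogniser.

```python
# Toy illustration of "phonology without OT or generative rules":
# pronunciation variants derived from a known assimilation fact
# (English /n/ takes on the place of a following stop), the kind of
# thing an ASR lexicon can encode without a full phonological theory.
PLACE = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
         "t": "alveolar", "d": "alveolar", "n": "alveolar",
         "k": "velar",    "g": "velar",    "N": "velar"}  # N = engma

NASAL_AT = {"bilabial": "m", "alveolar": "n", "velar": "N"}

def assimilation_variants(phones):
    """Yield the citation form, plus a variant where /n/ assimilates in
    place of articulation to a following stop."""
    yield list(phones)
    variant = list(phones)
    changed = False
    for i in range(len(phones) - 1):
        nxt = phones[i + 1]
        if phones[i] == "n" and PLACE.get(nxt) in NASAL_AT:
            variant[i] = NASAL_AT[PLACE[nxt]]
            changed = True
    if changed:
        yield variant

# "input": the lexicon gets both the citation form and the assimilated one
for v in assimilation_variants(["I", "n", "p", "@", "t"]):
    print("".join(v))  # Inp@t, then Imp@t
```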
-
Right, that's an important distinction. If "phonology" is used broadly to mean something like "category", then speech recognition certainly uses something along those lines. I'd hesitate, though, to let "phonology" mean nothing more specific than "category".
I'd be curious, though...are there any systems that actually model full phonological systems, including things like allophony and underspecification? That seems very cumbersome for little payoff, but I'd be delighted to learn that some algorithm out there was actually trying to mimic human (rather than computer) perceptual qualities.
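For what it's worth, here is roughly what I imagine the allophony part of such a system would have to do somewhere in its pipeline. This is purely my own toy sketch, using textbook aspiration and flapping rules for English /t/, not an existing algorithm:

```python
# Sketch of what "modelling allophony" could look like: underlying
# phonemes mapped to context-dependent surface allophones before any
# acoustic matching. The rules below are textbook simplifications.
VOWELS = set("aeiou")

def surface_form(phonemes):
    """Map underlying phonemes to plausible surface allophones."""
    out = []
    for i, p in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else None
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "t":
            if i == 0:
                out.append("th")   # word-initial aspiration: [t^h]
            elif prev in VOWELS and nxt in VOWELS:
                out.append("4")    # intervocalic flapping: [flap]
            else:
                out.append("t")
        else:
            out.append(p)
    return out

print(surface_form(list("top")))   # ['th', 'o', 'p']
print(surface_form(list("atom")))  # ['a', '4', 'o', 'm']
```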