There are speech-to-text algorithms that don't use a phonological model, but many, many algorithms and processes do. When you employ any sort of language model, you restrict the domain of applicability significantly, and there is a hell of a lot of money to be made in a system that can be applied across multiple languages. Many algorithms rely on models and statistics drawn from a specific language, but the idea is that you can plug in a hefty-enough corpus from another language, derive the same statistics, and get fairly comparable error rates.

Once you start putting in phonological information, though, you really do become language-specific, and that is less desirable when you want your algorithm(s) to be usable across a broad spectrum of languages; even so, it's still a very popular approach. So I'd take issue with the claim that phonological modelling in speech-to-text is as rare as your first post made it out to be.
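To make the "plug in a corpus" point concrete, here's a minimal sketch (Python, purely illustrative, not any particular system's implementation) of a bigram language model. Note that nothing in it is language-specific: the same code derives the same kind of statistics from whatever corpus you hand it, which is exactly why the statistical side ports across languages in a way phonological modelling doesn't.

```python
from collections import Counter
import math

def train_bigram_lm(corpus_tokens):
    """Train a simple add-one-smoothed bigram LM from any token stream.

    Nothing here is language-specific: feed it tokens from an English,
    Mandarin, or Swahili corpus and it derives the same statistics.
    """
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(prev, word):
        # Add-one smoothing so unseen bigrams still get non-zero probability.
        return math.log((bigrams[(prev, word)] + 1) /
                        (unigrams[prev] + vocab_size))

    return log_prob

# Usage: the identical function applies to any language's corpus.
tokens = "the cat sat on the mat the cat ran".split()
lm = train_bigram_lm(tokens)
print(lm("the", "cat"))  # frequent pairs score higher than unseen ones
```

Swapping in a hefty-enough corpus for another language is just a matter of changing `corpus_tokens`; a phonological model, by contrast, encodes facts about one language's sound system and can't be retrained away like this.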