Character Clusters (Blends) in English - Complete List

I searched around the internet for a little while and was able to put together a preliminary list of character clusters, inclusive of both consonants and vowels, specific to the English language, or blends composed of character clusters that might even have been accounted for outside of a linguistics field definition.  In terms of published works, I only noticed one general work that discussed the phenomenon, and it applied not just to the English language but to all languages.  Can anyone point me to a very refined lexicological resource on this topic?

Are you referring to https://en.wikipedia.org/wiki/Orthographic_ligature ?

Like this: https://en.wikipedia.org/wiki/Consonant_cluster (and I'm looking for a definitive or complete list of these for the English language).  However, the list I've compiled also includes VVCV letter arrangements and arrangements similar to these.  When the average longest word consists of 5 letters (both consonants and vowels), a very refined list of these specific to the English language would work wonders, since I wouldn't have to print out all the possible combinations of this alphabet topography (I think the results come to ~50,000 different combinations and arrangements).

You're mixing some terminology: characters are written symbols (e.g. letters), but now it's clear you're asking about pronunciation. Yet you still seem to be framing your question in terms of spelling, rather than a more accurate approach like a phonetic transcription. You asked originally about "blends", which would suggest complex units behaving as a whole, but now it seems like you're asking about all possible combinations of letters.

What is your goal? How will you use this information? It sounds like it might be something that would be used in Natural Language Processing, for example. Others certainly have made lists like this before, although I don't know if they are freely available or if the format is what you're looking for. Depending on how extensive a "complete" list must be, this also could be something you could generate relatively easily from a large word list (if you only care about spelling, not pronunciation), and those are certainly available online.
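If spelling alone is enough, generating such a list from a word list might look something like the sketch below. It is only a rough illustration: the tiny `sample_words` list stands in for a real word list, and treating "y" as a consonant letter is an assumption you may want to change.

```python
import re
from collections import Counter

# Runs of two or more consonant LETTERS (a, e, i, o, u excluded).
# Assumption: "y" is treated as a consonant letter here.
CLUSTER_RE = re.compile(r"[b-df-hj-np-tv-z]{2,}")

def letter_clusters(words):
    """Count every run of 2+ consonant letters found in the given words."""
    counts = Counter()
    for word in words:
        for cluster in CLUSTER_RE.findall(word.lower()):
            counts[cluster] += 1
    return counts

# Hypothetical stand-in for a large word list read from a file.
sample_words = ["string", "blends", "cluster", "number"]
cluster_counts = letter_clusters(sample_words)

for cluster, count in sorted(cluster_counts.items()):
    print(cluster, count)
```

Note that this counts letter sequences, not sounds, so it will include things like "ng" or "th" that are spelled with two letters but pronounced as one sound.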

(Also because this is a question about pronunciation, I'll move it to the Phonetics and Phonology board. It might actually belong in Computational Linguistics, but that's fine for now anyway.)

Given what you might be interested in, I'd tentatively recommend the CMU Pronouncing Dictionary, which is a fairly good list of words (spelled) along with their "transcriptions". Here is a sample entry:


The numbers mark stress level, and each letter group after the spelled word is a phoneme. If you speak American English or know someone who does, you can figure out how to map these to IPA (e.g. AE is [æ]). It includes a lot of stuff that isn't "real English", being proper nouns that might show up in some text, and I don't always agree with their transcriptions. It is not "complete", since it omits words like "splang" and "frell", but those are kind of marginal words.
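To pull consonant clusters out of transcriptions in this style, something like the following sketch could work. It assumes each line is a spelled word followed by space-separated phoneme symbols, and that vowels are exactly the symbols ending in a stress digit. The two sample lines are typed by hand in that format for illustration, and the function names are made up for this example.

```python
def parse_entry(line):
    """Split 'WORD  PH1 PH2 ...' into (word, [phoneme symbols])."""
    word, *phones = line.split()
    return word, phones

def is_vowel(phone):
    # Assumption: vowels carry a trailing stress digit (0, 1, or 2).
    return phone[-1].isdigit()

def consonant_clusters(phones):
    """Return each run of 2+ consecutive consonant phonemes as a tuple."""
    clusters, run = [], []
    for phone in phones:
        if is_vowel(phone):
            if len(run) > 1:
                clusters.append(tuple(run))
            run = []
        else:
            run.append(phone)
    if len(run) > 1:  # cluster at the very end of the word
        clusters.append(tuple(run))
    return clusters

# Illustrative lines in the dictionary's general format.
sample_lines = ["STRING  S T R IH1 NG", "NUMBER  N AH1 M B ER0"]

for line in sample_lines:
    word, phones = parse_entry(line)
    print(word, consonant_clusters(phones))
```

Because the runs are found without regard to syllable boundaries, a word-internal sequence like M B is reported as a single cluster.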

Usually, we analyze consonant clusters into "possible syllable beginnings" and "possible syllable ends", so that the set of possible clusters is the product of the two (at least theoretically). For example, you don't get syllables beginning with [mb] and you don't get syllables ending with [mb], but you can get [mb] by combining syllable-final [m] and syllable-initial [b], as in "number". The CMU dictionary does not note where syllable breaks are (for good reason: nobody knows for sure).
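As a toy illustration of that "product" idea, here is a small sketch. The `codas` and `onsets` lists are tiny hypothetical samples, not real inventories of English syllable ends and beginnings.

```python
from itertools import product

# Hypothetical, deliberately tiny samples of syllable ends and beginnings.
codas = ["m", "n", "mp"]
onsets = ["b", "s", "st", "str"]

# Every syllable-final cluster followed by every syllable-initial cluster.
medial_clusters = [coda + onset for coda, onset in product(codas, onsets)]

print(len(medial_clusters))     # 3 codas x 4 onsets
print("mb" in medial_clusters)  # final "m" + initial "b"
```

With real inventories the product will overgenerate, which is why it only holds "theoretically": some of the resulting combinations may never actually occur in any word.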


