The reason I'm using the term "character cluster" is that "consonant cluster" is limited to consonants only (as reiterated somewhere in this thread).
But there is still a fundamental difference between sounds and spelling. If you're just looking for a general term, then you could say, for example, "phoneme cluster", but that sounds like odd phrasing because "consonant cluster" specifically refers to two (or more) consonants that cluster as a unit, almost as if they were a single sound.* Combinations of random sounds wouldn't form "clusters" in the same sense, because those are just sounds that happen to be adjacent. Linguists don't generally speak of "consonant-vowel clusters" for that reason, for example. Vowels, on the other hand, can form diphthongs (complex vowels of two or more parts), so the term "vowel cluster" would be clear but is not typically used.
(*Note that in this sense we might say that /ʃt/ ["sh+t"] is a consonant cluster in English as in the word "wished" /wɪʃt/, but that for example /ʃf/ is not a consonant cluster because it only occurs at syllable boundaries as in "wishful", but not "*wishf" or "*shful". So "cluster" refers to something acting as a unit in a particular sense within syllable structures, not just adjacency.)
However, more generally there is a whole subfield of phonology called phonotactics that looks at how sounds combine and which combinations are valid. You seem to be asking, basically, "What is the phonotactic system of English?" A lot has been written on that, but I don't know of an easily accessible list. Instead, most linguists would study this via rules/patterns as generalizations rather than by enumerating examples in a list.
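If you did want to poke at this computationally, here's a rough sketch of the list-extraction idea in Python. Note that it operates on spelling only, so "sh" counts as two letters here, which is exactly the letters-vs-sounds confusion to watch out for; a real phonotactic survey would need a pronunciation lexicon instead:

```python
# Sketch: enumerate word-initial consonant-letter clusters from a word list.
# CAUTION: this works on spelling, not phonemes -- "shr" is three letters
# but only two sounds. A serious version would use a pronunciation lexicon.
VOWEL_LETTERS = set("aeiou")

def initial_cluster(word: str) -> str:
    """Return the run of consonant letters at the start of a word."""
    cluster = []
    for ch in word.lower():
        if ch in VOWEL_LETTERS:
            break
        cluster.append(ch)
    return "".join(cluster)

words = ["street", "wished", "shrink", "apple", "cluster"]
clusters = {initial_cluster(w) for w in words if initial_cluster(w)}
print(sorted(clusters))  # ['cl', 'shr', 'str', 'w']
```

The same loop run over a full dictionary file would give you the kind of brute-force inventory you're describing, but as noted, it hands you data, not the generalizations linguists are actually after.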
A computational linguistics approach to this did not seem apparent because I think my question presumes a purely linguistic approach that, yeah, could technically lead into programming.
No, I just meant that I was trying to imagine a relevant application for this, and enumerating lists like that is something computational approaches might find valuable (e.g. for training an algorithm). Most theoretical approaches aren't list-based (see above).
There's also a note that it might still be a very complicated thing to gather an extensive list of the clustered phonetic forms or arrangements I would like to touch on here.
As I said, linguists would generally approach this via generalizations rather than lists (although some informal lists might be intermediate steps in working out the theory). For example, there are, as far as I know, no restrictions on which consonants can combine with which vowels, so CV and VC combinations can contain any C or V. There are more restrictions on CC and VV.
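To make that CV-vs-CC asymmetry concrete, here's a toy sketch. The symbols are rough phoneme stand-ins, and the onset set is a small illustrative sample I typed in, not an exhaustive inventory of English:

```python
# Sketch: contrast the (roughly) unrestricted CV space with the much
# sparser set of attested CC onsets. The onset list is a toy sample only.
consonants = ["p", "t", "k", "b", "d", "g", "s", "f", "l", "r", "w"]
attested_cc_onsets = {"pl", "pr", "tr", "tw", "kl", "kr", "bl", "br",
                      "dr", "gl", "gr", "sl", "fr", "fl", "sp", "st", "sk"}

# Every logically possible ordered pair of distinct consonants:
all_cc = {c1 + c2 for c1 in consonants for c2 in consonants if c1 != c2}

print(len(all_cc), "logically possible CC pairs")
print(len(attested_cc_onsets), "attested in this sample")
# The gaps are the phonotactic restrictions: e.g. no English word
# begins with 'tl' or 'dl', even though both sounds exist in English.
print(len(all_cc - attested_cc_onsets), "gaps")
```

The interesting linguistic question is precisely why the gaps are where they are, which is what a rule-based phonotactic account tries to explain.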
The lexical index that I finally put together consisted of a list of 1.) frequently used core parts of speech which are whole words
Why? What relationship are you assuming between whole words and combinations of sounds? Why look beyond pairs or triplets (etc.) that are actually related to each other? Once you get beyond a syllable (or two) you will find few relevant phonological relationships. The vast majority will just be via simple adjacency between two segments, and a few beyond that.
2.) frequently used whole words no longer than 5 letters because 5 letters is the average sized word
As above, I can't see how that is relevant. More importantly, averages would tend to obscure less common patterns, and I thought you wanted to find all of the variation (i.e. the full range).
3.) became a mixture of traditional consonant clusters, a combination of letters that runs parallel to vowel diphthongs, and a combination of the first 2 letter combinations.
Again it is very important to not confuse letters/characters with sounds/phonemes. There is no one-to-one relationship. There are about 45 phonemes in English (varying by dialect, mostly in vowels), but only 26 letters, and some letters also do not represent distinct sounds (e.g. C, Q).
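A quick illustration of the mismatch (the phoneme counts are hand-tallied from standard broad transcriptions; nothing here is computed from the spelling):

```python
# Letter count vs. phoneme count for a few words, showing there is
# no one-to-one mapping between spelling and sound in English.
examples = {
    "through": 3,   # /θruː/ -- 7 letters, 3 phonemes
    "box":     4,   # /bɒks/ -- 3 letters, 4 phonemes ('x' spells /ks/)
    "wished":  4,   # /wɪʃt/ -- 6 letters, 4 phonemes
}
for word, n_phonemes in examples.items():
    print(f"{word}: {len(word)} letters, {n_phonemes} phonemes")
```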
Just using a list of IE morphemes was not going to achieve rational results, because the way the reconstructions are presented, they don't always correspond to English-language renderings. This ultimately prompts a question: what uncomplicated method could there be to take a list of IE morphemes and systematically render them into only their English-language written/phonetic collocations? Are there any quick references I can refer to?
What? You mean Proto-Indo-European? Why would that have anything to do with (all of) the sound patterns in English today?
I'm still not sure why you're doing this, so if you can explain your project, as I asked above, I might be able to give a more directly helpful response.
Lastly, to demonstrate what I mean I'll simply post a preliminary of this list that I think could serve to bypass having to procure an extremely vast list of these phonetic combinations, tack off the erroneous ones or ones that are not partial to English morphology (which would result in just as much of a complexity on its own):
OK, but why? You're basically enumerating part of a dictionary list, plus sub-parts of some words. It resembles the sort of dictionary lists often used in computational linguistics (e.g. in developing spell-checkers), or for other purposes like a 'dictionary attack' that tries to crack someone's password by throwing real words (or combinations of them) at it rather than random character strings, since real words are more likely to have been chosen as a password. There really are lists like that out there, which is why I suggested a computational approach before (it doesn't necessarily need to be any more complicated than finding and reading the list, or whatever you'd like). An open-source spell-checker's dictionary would probably be an easy place to start looking at a list of English words. But what you'd do with that, I'm not sure. Again, linguists typically want to identify patterns, not just long lists of data.
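As a minimal sketch of "finding and reading the list": the path below is an assumption (a word list commonly ships at that location on Unix-like systems, but it's not guaranteed), so substitute whatever word list you actually track down:

```python
# Sketch: load a raw word list of the kind a spell-checker dictionary
# provides. The path is an ASSUMPTION (common on Unix-like systems);
# replace it with the word list you actually have.
WORDLIST = "/usr/share/dict/words"

def load_words(path: str) -> list[str]:
    """Read one word per line, keeping only purely alphabetic entries."""
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip().isalpha()]

# Uncomment once you have a word list at that path:
# words = load_words(WORDLIST)
# print(len(words), "entries")
```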
--
As a slightly off-topic comment, but the sort of thing that others have thought about in the past, one of my professors once mentioned some work that had gone into surveying all English words and noticing that the level of ambiguity in representation is quite low (something like a 50% chance of accurately identifying any individual word) if representing only broad categories of sounds in that word, e.g. vowel vs. fricative vs. stop, or something like that. I think it was a total of 5 categories (rather than 26 letters, or 45 phonemes), and that was in itself enough information to narrow down the guess (e.g. by a computer doing speech recognition) to a 50/50 chance at identifying the word correctly, which seems surprising, but is an interesting result of phonotactic research combined with a computational application. (I think I'm paraphrasing those numbers relatively accurately but I'm not positive about the specifics, just that about "half" of English phonological information can be captured by a much rougher representation of about 5 categories.)
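To sketch that idea (with made-up toy categories and toy data, since I don't remember the study's actual category set): map each phoneme to a broad class and count how many words collapse onto the same coarse code. The more words share a code, the more ambiguous that coarse representation is:

```python
# Toy sketch of a "broad category" representation: NOT the categories
# from the study described above, just an illustration of the method.
from collections import Counter

BROAD_CLASS = {  # toy phoneme -> class mapping (assumed for illustration)
    "p": "stop", "t": "stop", "k": "stop", "b": "stop", "d": "stop",
    "s": "fric", "f": "fric", "z": "fric", "v": "fric",
    "m": "nas",  "n": "nas",
    "l": "liq",  "r": "liq", "w": "liq",
    "i": "vow",  "e": "vow", "a": "vow", "o": "vow", "u": "vow",
}

lexicon = {  # word -> rough phoneme string (toy data)
    "pat": "pat", "bat": "bat", "sat": "sat",
    "mat": "mat", "pad": "pad", "seen": "sin",
}

def coarse(phonemes: str) -> tuple[str, ...]:
    """Collapse a phoneme string to its sequence of broad classes."""
    return tuple(BROAD_CLASS[p] for p in phonemes)

# Count how many words share each coarse code: shared codes = ambiguity.
codes = Counter(coarse(p) for p in lexicon.values())
for code, n in codes.items():
    print(code, "->", n, "word(s)")
```

In this toy lexicon, "pat", "bat", and "pad" all collapse to stop-vowel-stop, while "sat", "mat", and "seen" each keep a unique code; scaling that count up over a full lexicon is roughly the kind of measurement the survey I'm describing would have made.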