Topic: Character Clusters (Blends) in English - Complete List

Offline mojobadshah

  • Jr. Linguist
  • **
  • Posts: 6
Character Clusters (Blends) in English - Complete List
« on: January 02, 2020, 02:54:02 PM »
I searched around the internet for a little while and was able to put together a preliminary list of character clusters, covering both consonants and vowels, that are specific to the English language, including blends made up of character clusters that might fall outside the definition used in linguistics.  In terms of published works, I only noticed one general work that discussed the phenomenon, and it dealt not just with English but with all languages.  Can anyone point me to a very refined lexicological resource on this topic?

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1968
  • Country: us
    • English
Re: Character Clusters (Blends) in English - Complete List
« Reply #1 on: January 02, 2020, 03:47:57 PM »
Welcome to Linguist Forum! If you have any questions, please ask.

Offline mojobadshah

Re: Character Clusters (Blends) in English - Complete List
« Reply #2 on: January 03, 2020, 03:16:49 PM »
Like this: https://en.wikipedia.org/wiki/Consonant_cluster (and I'm looking for a definitive or complete list of these for the English language).  The list I've put together, however, also includes vvcv letter arrangements and arrangements similar to these.  Given that the average word is about 5 letters long (consonants and vowels together), a very refined list of these specific to the English language would work wonders, since it would avoid having to print out every possible combination of letters (I think that comes to ~50,000 different combinations and arrangements).

Offline Daniel

Re: Character Clusters (Blends) in English - Complete List
« Reply #3 on: January 03, 2020, 03:48:53 PM »
You're mixing some terminology: characters are written symbols (e.g. letters), but now it's clear you're asking about pronunciation. But you still seem to be relying on spelling as the way you are framing your question, rather than a more accurate approach like a phonetic transcription. You asked originally about "blends", which would suggest complex units behaving as a whole, but now it seems like you're asking about all possible combinations of letters.

What is your goal? How will you use this information? It sounds like it might be something that would be used in Natural Language Processing, for example. Others certainly have made lists like this before, although I don't know if they are freely available or if the format is what you're looking for. Depending on how extensive a "complete" list must be, this also could be something you could generate relatively easily from a large word list (if you only care about spelling, not pronunciation), and those are certainly available online.
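As a rough illustration of the "generate it from a word list" idea, here is a minimal Python sketch that counts every run of adjacent letters attested in a set of words. The function name `letter_ngrams`, the `n_max` parameter, and the three sample words are all made up for the example; a real run would read a large word list from a file instead. Note that this works purely on spelling, not pronunciation.

```python
from collections import Counter

def letter_ngrams(words, n_max=3):
    """Count every run of 1..n_max adjacent letters attested in `words`."""
    counts = Counter()
    for word in words:
        w = word.lower()
        for n in range(1, n_max + 1):
            for i in range(len(w) - n + 1):
                counts[w[i:i + n]] += 1
    return counts

# Hypothetical stand-in for a real word list (e.g. one word per line in a file).
sample_words = ["street", "strong", "three"]
counts = letter_ngrams(sample_words)

print(counts["str"])   # 2 (street, strong)
print(counts["thr"])   # 1 (three)
print("xq" in counts)  # False: never attested in the sample
```

Only combinations that actually occur in the input end up in the result, so there is no need to enumerate every possible letter string first and then strike out the impossible ones.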

(Also because this is a question about pronunciation, I'll move it to the Phonetics and Phonology board. It might actually belong in Computational Linguistics, but that's fine for now anyway.)

Offline panini

  • Linguist
  • ***
  • Posts: 175
Re: Character Clusters (Blends) in English - Complete List
« Reply #4 on: January 04, 2020, 01:03:12 PM »
Given what you might be interested in, I sort of recommend the CMU pronouncing dictionary, which is a fairly good list of words (spelled) and "transcribed". Here is a sample entry:

ABNORMALLY  AE0 B N AO1 R M AH0 L IY0

The numbers mark stress level, each letter group after the spelled word is a phoneme, and if you speak American English or know someone who does you can figure out how to map to IPA (e.g. AE is [æ]). It includes a lot of stuff that isn't "real English", being proper nouns that might show up in some text, and I don't always agree with their transcriptions. This is not "complete" since it omits words like splang, frell, but those are kind of marginal words.
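For anyone who wants to work with entries like the one above programmatically, here is a minimal Python sketch that splits a single line into the spelled word and its phonemes, separating the stress digit (0, 1 or 2) from each vowel symbol. The function name `parse_cmu_line` is made up for this example, and the sketch assumes a line shaped exactly like the sample entry; it does not handle comment lines or alternate-pronunciation markers that a full dictionary file may contain.

```python
import re

def parse_cmu_line(line):
    """Split one dictionary line into (word, [(phoneme, stress), ...]).

    Stress is an int for vowels and None for consonants.
    """
    word, *symbols = line.split()
    phonemes = []
    for sym in symbols:
        m = re.fullmatch(r"([A-Z]+)([0-2])?", sym)
        if m is None:
            raise ValueError(f"unexpected symbol: {sym!r}")
        stress = int(m.group(2)) if m.group(2) is not None else None
        phonemes.append((m.group(1), stress))
    return word, phonemes

word, phonemes = parse_cmu_line("ABNORMALLY  AE0 B N AO1 R M AH0 L IY0")

print(word)                                  # ABNORMALLY
print([p for p, _ in phonemes])              # phoneme symbols without stress
print([p for p, s in phonemes if s == 1])    # ['AO'] (primary stress)
```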

Usually, we analyze consonant clusters into "possible syllable beginnings" and "possible syllable ends", so that the set of possible clusters is the product of the two (at least in theory). For example, you don't get syllables beginning with [mb] and you don't get syllables ending with [mb], but you can get [mb] by combining syllable-final [m] and syllable-initial [b]. The CMU dictionary does not note where syllable breaks are (for good reason: nobody knows for sure).
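The "product of beginnings and ends" idea can be sketched in Python as follows. Because syllable breaks are not marked, this sketch makes a simplifying assumption: it treats word-initial consonant runs as possible syllable beginnings and word-final consonant runs as possible syllable ends. The three transcriptions are a hand-made toy sample in CMU-style symbols with stress digits removed, and `edge_clusters` is a made-up helper name.

```python
from itertools import product

# The vowel symbols of the CMU phoneme set (stress digits removed).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def edge_clusters(phonemes):
    """Return (word-initial consonant run, word-final consonant run)."""
    is_vowel = [p in VOWELS for p in phonemes]
    if not any(is_vowel):
        return (), ()
    first = is_vowel.index(True)
    last = len(phonemes) - 1 - is_vowel[::-1].index(True)
    return tuple(phonemes[:first]), tuple(phonemes[last + 1:])

# Toy sample, not real dictionary output.
words = {
    "lamb": ["L", "AE", "M"],
    "bat": ["B", "AE", "T"],
    "strong": ["S", "T", "R", "AO", "NG"],
}

onsets, codas = set(), set()
for phones in words.values():
    onset, coda = edge_clusters(phones)
    onsets.add(onset)
    codas.add(coda)

# Every end followed by every beginning is a candidate word-medial sequence.
medial = {coda + onset for coda, onset in product(codas, onsets)}

print(("M", "B") in medial)  # True: final [m] plus initial [b]
print(("M", "B") in onsets)  # False: no word in the sample begins with [mb]
print(("M", "B") in codas)   # False: no word in the sample ends with [mb]
```

The product overgenerates on purpose: it lists what the two edge inventories would allow, which can then be compared against what is actually attested inside words.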

Offline mojobadshah

Re: Character Clusters (Blends) in English - Complete List
« Reply #5 on: January 04, 2020, 02:56:52 PM »
The reason I'm using the term "character cluster" is that "consonant cluster" is limited to consonants only (as noted somewhere in this thread). A computational linguistics approach did not seem apparent because I think my question presumes a purely linguistic approach that, yeah, could technically lead into programming. There's also the point that it might still be very complicated to gather an extensive list of the clustered phonetic forms or arrangements I would like to touch on here.  After a brief search on Google Books, the insight into English consonant clusters appears to be meager.  The lexical index that I finally put together consists of 1.) frequently used core parts of speech, which are whole words; 2.) frequently used whole words no longer than 5 letters, because 5 letters is the average word length; and 3.) a mixture of traditional consonant clusters, letter combinations that run parallel to vowel diphthongs, and combinations of the first two kinds of letter combinations.  Just using a list of IE morphemes was not going to give sensible results because the reconstructions, as they are written, don't always correspond to English renderings.  This ultimately prompts a question: what uncomplicated method could there be to take a list of IE morphemes and systematically render them in only their English-language written and phonetic forms?  Are there any quick references I can refer to?  Lastly, to demonstrate what I mean, I'll simply post a preliminary version of this list, which I think could serve to bypass having to procure an extremely vast list of these phonetic combinations and strike off the erroneous ones or the ones that do not belong to English morphology (which would be just as complex a task on its own):

perpendicular
miscellaneous
individualize
adaptability
decentralize
commissioned
troubleshoot
rehabilitate
familiarize
accommodate
incorporate
responsible
deteriorate
manufacture
demonstrate
orchestrate
interpolate
systematize
consolidate
standardize
restructure
horizontal
generalize
transition
experiment
manipulate
obliterate
specialize
neutralize
capitalize
anticipate
mastermind
accomplish
depreciate
revitalize
experience
substitute
eviscerate
strengthen
photograph
coordinate
facilitate
leadership
synthesize
distribute
streamline
straighten
centralize
categorize
reposition
reorganize
accumulate
concerning
throughout
themselves
everything
beginning
advertise
improvise
outsource
transform
diversify
acclimate
emphasize
volunteer
eliminate
undertake
implement
terminate
stimulate
challenge
normalize
spearhead
modernize
establish
reinforce
discharge
lubricate
safeguard
transpose
reproduce
transport
represent
recognize
entertain
duplicate
penetrate
alleviate
dramatize
supervise
authorize
intervene
designate
arbitrate
institute
enlighten
mechanize
calibrate
liquidate
interface
construct
technical
subdivide
structure
reconcile
rearrange
integrate
formalize
associate
officiate
supposing
therefore
following
including
ourselves
everybody
something
shouldn't
wouldn't
couldn't
shan't
vertical
straight
overcome
simulate
simplify
initiate
vitalize
increase
demolish
heighten
overhaul
dedicate
optimize
forecast
finalize
mitigate
expedite
maximize
maintain
leverage
dispatch
diminish
withdraw
minimize
decrease
continue
sanction
consider
conserve
restrict
immunize
conclude
complete
commence
generate
transfer
function
purchase
exercise
practice
advocate
dispense
instruct
delegate
appraise
penalize
navigate
mobilize
engineer
litigate
download
automate
regulate
activate
register
condense
localize
classify
preserve
separate
schedule
assemble
research
organize
document
allocate
inasmuch
provided
wherever
whenever
although
whatever
anything
somebody
everyone
yourself
outside
degrees
forward
utilize
reshape
clarify
broaden
justify
whittle
protect
improve
uncover
prevent
deliver
compete
surpass
further
succeed
fulfill
nullify
fortify
sharpen
shatter
exhibit
execute
amplify
achieve
enhance
service
nurture
witness
lighten
control
satisfy
license
contain
connect
confirm
conduct
restore
reserve
combine
furnish
release
rectify
tighten
receive
capture
finance
canvass
sustain
provide
promote
produce
bombard
bolster
solicit
enlarge
perform
bargain
educate
sponsor
recruit
preside
approve
appoint
oversee
advance
command
comfort
enforce
program
compute
operate
upgrade
augment
convert
actuate
realign
install
remodel
compile
collect
qualify
channel
situate
process
prepare
catalog
package
outline
arrange
because
whether
insofar
without
between
through
towards
despite
against
nowhere
nothing
certain
another
anybody
whoever
herself
himself
someone
doesn't
haven't
weren't
should
nobody
myself
anyone
itself
mustn't
didn't
hadn't
hasn't
wasn't
aren't
upside
inside
finish
bottom
beside
middle
center
steady
uphold
change
modify
endure
adjust
revise
reduce
launch
embark
verify
effect
unveil
propel
impose
devote
polish
handle
target
commit
negate
battle
shield
assume
secure
retain
master
ensure
regain
select
sculpt
market
depict
screen
scrape
update
triple
reveal
insure
travel
repeat
repair
charge
relate
recall
extend
supply
breach
expand
borrow
enrich
solder
siphon
orient
manage
direct
assign
decide
govern
advise
enlist
adhere
import
upload
deploy
action
record
splice
gather
budget
revamp
filter
divide
divert
though
matter
unless
around
except
beyond
behind
across
within
before
during
front
below
angle
right
words
focus
adopt
defer
reach
raise
probe
check
start
carry
avert
learn
enact
yield
serve
nurse
seize
widen
weigh
scope
visit
value
knock
tutor
issue
treat
gross
grade
close
trade
clear
trace
frame
cause
cater
forge
teach
boost
study
print
enter
block
plant
blast
begin
drive
award
paint
shear
chair
edify
pilot
coach
equip
build
debug
verbs
match
label
chart
index
group
route
place
align
merge
while
that
order
above
along
under
since
after
about
among
until
would
could
might
shall
being
where
there
these
which
yours
whose
those
their
can't
don't
next
left
back
earn
hunt
trap
gain
tune
zone
nail
seal
weld
wage
load
copy
save
view
lift
sand
keep
trim
halt
fund
tour
time
find
care
tend
stun
spot
play
sell
shop
draw
open
lead
host
hire
head
name
code
list
rate
rank
sort
plan
form
file
till
case
even
once
soon
lest
than
both
when
your
then
also
near
down
plus
over
like
upon
into
from
with
must
does
will
have
been
were
self
same
some
mine
none
what
this
that
them
they
it's
top
end
mid
zap
win
fix
aim
net
wax
cut
use
buy
sew
run
aid
add
act
ray
map
log
set
now
far
how
why
and
the
off
out
but
for
can
may
did
had
has
was
are
all
lot
she
his
you
him
her
our
its
who
no
if
so
or
as
up
by
on
in
to
of
at
do
be
is
it
me
us
he
my
we


language
thousand
possible
interest
remember
complete
question
mountain
children
together
sentence
perhaps
brought
surface
nothing
produce
special
develop
contain
correct
machine
certain
science
pattern
against
several
morning
hundred
problem
numeral
product
measure
example
between
thought
country
picture
through
weight
wonder
common
record
island
decide
object
course
street
behind
strong
minute
beauty
figure
notice
govern
appear
person
center
toward
simple
travel
listen
ground
during
better
happen
direct
family
though
enough
friend
second
letter
always
school
answer
should
father
mother
animal
change
follow
differ
before
little
people
number
among
bring
shape
check
laugh
plane
force
wheel
clear
sleep
quick
green
final
teach
front
stood
drive
pound
field
power
voice
serve
money
vowel
table
reach
early
heard
whole
piece
south
order
class
short
black
state
leave
above
ready
young
usual
plain
color
watch
horse
north
began
carry
group
river
until
those
music
often
paper
begin
white
night
close
press
while
don't
story
might
start
since
cross
never
cover
plant
learn
still
study
found
stand
earth
build
world
point
again
house
light
spell
large
small
three
right
cause
think
great
under
every
round
after
where
place
first
water
sound
could
thing
these
write
would
about
their
which
other
there
east
fill
snow
heat
miss
game
gold
boat
test
busy
foot
moon
deep
blue
full
stay
inch
fact
tail
mind
free
warm
gave
week
done
able
rest
noun
star
plan
wait
note
dark
lead
unit
fine
town
fall
cold
pull
rule
road
love
slow
less
sing
five
fast
west
hold
step
true
hour
best
size
king
farm
pass
knew
told
fire
rock
half
area
ship
wind
song
pose
body
soon
bird
talk
feel
list
ever
girl
main
wood
face
sure
hear
base
once
fish
idea
room
rain
took
care
feet
mile
book
mark
both
ease
walk
next
seem
open
stop
life
real
late
left
draw
hard
tree
city
door
last
keep
four
food
grow
page
head
self
near
need
kind
went
such
high
must
here
land
even
port
hand
read
home
play
also
well
want
tell
does
move
mean
same
turn
line
help
much
form
just
very
name
give
good
show
came
year
only
back
live
made
take
part
work
find
been
side
down
call
than
know
over
most
come
more
look
make
long
like
them
then
many
will
time
each
said
word
when
your
were
what
some
from
this
have
they
with
that
sit
bed
hot
yes
ran
ago
dry
age
yet
lot
box
cry
fly
map
lay
war
ten
six
top
dog
red
cut
eat
car
got
few
run
sea
far
saw
eye
let
sun
own
try
off
men
ask
why
act
big
add
put
end
air
set
too
old
boy
low
say
our
man
get
new
any
now
may
who
did
day
has
two
him
see
her
way
she
how
use
all
out
can
but
hot
had
one
his
are
for
was
you
and
the
oh
am
us
me
no
my
go
so
if
do
an
up
we
by
or
at
be
as
on
he
it
is
in
to
of
I
a


tion
less
eeba
eeca
eeda
eefa
eega
eeha
eeja
eeka
eela
eema
eena
eepa
eeqa
eera
eesa
eeta
eeva
eexa
eeza
eewa
eeya
eeae
eebe
eece
eede
eefe
eege
eehe
eeje
eeke
eele
eeme
eene
eepe
eeqe
eere
eese
eete
eeve
eexe
eeze
eewe
eeye
eeai
eebi
eeci
eedi
eefi
eegi
eehi
eeji
eeki
eeli
eemi
eeni
eepi
eeqi
eeri
eesi
eeti
eevi
eexi
eezi
eewi
eeyi
eeao
eebo
eeco
eedo
eefo
eego
eeho
eejo
eeko
eelo
eemo
eeno
eepo
eeqo
eero
eeso
eeto
eevo
eexo
eezo
eewo
eeyo
eeau
eebu
eecu
eedu
eefu
eegu
eehu
eeju
eeku
eelu
eemu
eenu
eepu
eequ
eeru
eesu
eetu
eevu
eexu
eezu
eewu
eeyu
eeay
eeby
eecy
eedy
eefy
eegy
eehy
eejy
eeky
eely
eemy
eeny
eepy
eeqy
eery
eesy
eety
eevy
eexy
eezy
eewy
eeyy
ooba
ooca
ooda
oofa
ooga
ooha
ooja
ooka
oola
ooma
oona
oopa
ooqa
oora
oosa
oota
oova
ooxa
ooza
oowa
ooya
ooae
oobe
ooce
oode
oofe
ooge
oohe
ooje
ooke
oole
oome
oone
oope
ooqe
oore
oose
oote
oove
ooxe
ooze
oowe
ooye
ooai
oobi
ooci
oodi
oofi
oogi
oohi
ooji
ooki
ooli
oomi
ooni
oopi
ooqi
oori
oosi
ooti
oovi
ooxi
oozi
oowi
ooyi
ooao
oobo
ooco
oodo
oofo
oogo
ooho
oojo
ooko
oolo
oomo
oono
oopo
ooqo
ooro
ooso
ooto
oovo
ooxo
oozo
oowo
ooyo
ooau
oobu
oocu
oodu
oofu
oogu
oohu
ooju
ooku
oolu
oomu
oonu
oopu
ooqu
ooru
oosu
ootu
oovu
ooxu
oozu
oowu
ooyu
ooay
ooby
oocy
oody
oofy
oogy
oohy
oojy
ooky
ooly
oomy
oony
oopy
ooqy
oory
oosy
ooty
oovy
ooxy
oozy
oowy
ooyy
oeba
oeca
oeda
oefa
oega
oeha
oeja
oeka
oela
oema
oena
oepa
oeqa
oera
oesa
oeta
oeva
oexa
oeza
oewa
oeya
oeae
oebe
oece
oede
oefe
oege
oehe
oeje
oeke
oele
oeme
oene
oepe
oeqe
oere
oese
oete
oeve
oexe
oeze
oewe
oeye
oeai
oebi
oeci
oedi
oefi
oegi
oehi
oeji
oeki
oeli
oemi
oeni
oepi
oeqi
oeri
oesi
oeti
oevi
oexi
oezi
oewi
oeyi
oeao
oebo
oeco
oedo
oefo
oego
oeho
oejo
oeko
oelo
oemo
oeno
oepo
oeqo
oero
oeso
oeto
oevo
oexo
oezo
oewo
oeyo
oeau
oebu
oecu
oedu
oefu
oegu
oehu
oeju
oeku
oelu
oemu
oenu
oepu
oequ
oeru
oesu
oetu
oevu
oexu
oezu
oewu
oeyu
oeay
oeby
oecy
oedy
oefy
oegy
oehy
oejy
oeky
oely
oemy
oeny
oepy
oeqy
oery
oesy
oety
oevy
oexy
oezy
oewy
oeyy
eaba
eaca
eada
eafa
eaga
eaha
eaja
eaka
eala
eama
eana
eapa
eaqa
eara
easa
eata
eava
eaxa
eaza
eawa
eaya
eaae
eabe
eace
eade
eafe
eage
eahe
eaje
eake
eale
eame
eane
eape
eaqe
eare
ease
eate
eave
eaxe
eaze
eawe
eaye
eaai
eabi
eaci
eadi
eafi
eagi
eahi
eaji
eaki
eali
eami
eani
eapi
eaqi
eari
easi
eati
eavi
eaxi
eazi
eawi
eayi
eaao
eabo
eaco
eado
eafo
eago
eaho
eajo
eako
ealo
eamo
eano
eapo
eaqo
earo
easo
eato
eavo
eaxo
eazo
eawo
eayo
eaau
eabu
eacu
eadu
eafu
eagu
eahu
eaju
eaku
ealu
eamu
eanu
eapu
eaqu
earu
easu
eatu
eavu
eaxu
eazu
eawu
eayu
eaay
eaby
eacy
eady
eafy
eagy
eahy
eajy
eaky
ealy
eamy
eany
eapy
eaqy
eary
easy
eaty
eavy
eaxy
eazy
eawy
eayy
ness
able
ment


ous
age
shr
sch
scr
sph
thw
thr
uae
ube
uce
ude
ufe
uge
uhe
uje
uke
ule
ume
une
upe
uqe
ure
use
ute
uve
uxe
uze
uwe
uye
oae
obe
oce
ode
ofe
oge
ohe
oje
oke
ole
ome
one
ope
oqe
ore
ose
ote
ove
oxe
oze
owe
oye
iae
ibe
ice
ide
ife
ige
ihe
ije
ike
ile
ime
ine
ipe
iqe
ire
ise
ite
ive
ixe
ize
iwe
iye
aae
abe
ace
ade
afe
age
ahe
aje
ake
ale
ame
ane
ape
aqe
are
ase
ate
ave
axe
aze
awe
aye
eae
ebe
ece
ede
efe
ege
ehe
eje
eke
ele
eme
ene
epe
eqe
ere
ese
ete
eve
exe
eze
ewe
eye
ary
ish
ory
ous
acy
asm
dom
ism
ity
ric
ion
ant
ard
art
ean
eer
ent
iff
ist
yte
ian
ana
ery
ful
cle
kin
let
ock
ern
ess
est
eth
fid
gen
gon
ful
ics
ify
ing
ior
isk
fit
oid
oma
oon
red
abs
acr
end
aer
gas
all
amb
api
apo
avi
azo
bar
bio
bis
two
cat
cen
cis
con
cog
col
com
cor
cry
sac
cyt
toe
dec
dek
dia
dis
dif
dys
ect
not
old
epi
erg
exo
for
gam
geo
gem
hal
sea
hem
hex
way
hol
hom
hyl
hyp
iso
log
lyo
mal
meg
mes
met
mid
mis
mon
myo
myc
neo
non
oct
ont
oro
oto
out
ovi
ovo
oxy
pan
par
ped
per
pel
pre
pro
por
pur
pyo
pyr
sex
sub
suc
suf
sug
sum
sup
sus
sur
syn
syl
sym
tel
the
tox
tra
tri
uni
vas
xen
zoo
zyg
zym


xe
gg
ll
mm
nn
rr
ty
dw
tw
bl
br
ch
cl
cr
dr
fl
fr
gl
gr
pl
pr
sc
sh
sk
sl
sm
sn
sp
st
sw
th
tr
wh
wr
qu
ba
ca
da
fa
ga
ha
ja
la
ma
na
pa
qa
ra
sa
ta
va
za
wa
ya
be
ce
de
fe
ge
he
je
ke
le
me
ne
pe
qe
re
se
te
ve
ze
we
ye
bi
ci
di
fi
hi
ki
li
mi
ni
pi
ri
si
ti
vi
wi
bo
co
do
fo
go
ho
jo
ko
lo
mo
no
po
ro
so
to
vo
wo
zo
wo
yo
bu
cu
du
fu
gu
hu
ju
ku
lu
mu
pu
ru
su
tu
by
ly
my
ny
py
ry
ac
al
ar
ic
id
of
an
er
or
en
el
et
ed
ab
in
ad
am
em
ex
un
aa
ae
ai
ao
au
ea
ee
ei
eo
eu
ia
ie
io
iu
oa
oe
oi
oo
ou
ua
ue
ui
uo
uu


a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

Offline Daniel

Re: Character Clusters (Blends) in English - Complete List
« Reply #6 on: January 04, 2020, 08:40:53 PM »
Quote
The reason I'm using term: "character cluster" is because consonant cluster is limited to consonants only (iterated in this thread somewhere).
But there is still a fundamental difference between sounds and spelling. If you're just looking for a general term then you could say for example "phoneme cluster", but that sounds like odd phrasing because "consonant cluster" specifically refers to two (or more) consonants that cluster as a unit, almost as if they are a single sound.* Combinations ('clusters') of random sounds wouldn't form "clusters" in the same sense because that's just what sounds happen to be adjacent. Linguists don't generally speak of "consonant-vowel clusters" for that reason, for example. Vowels, on the other hand, can form diphthongs (complex vowels of two or more parts), so the term "vowel cluster" is clear but not typically used.
(*Note that in this sense we might say that /ʃt/ ["sh+t"] is a consonant cluster in English as in the word "wished" /wɪʃt/, but that for example /ʃf/ is not a consonant cluster because it only occurs at syllable boundaries as in "wishful", but not "*wishf" or "*shful". So "cluster" refers to something acting as a unit in a particular sense within syllable structures, not just adjacency.)

However, more generally there is a whole subfield of phonology called phonotactics looking at how sounds combine and what combinations are valid. You seem to be asking basically "What is the phonotactic system of English?" And there's been a lot written on that, but I don't know of an easily accessible list. Instead, most linguists would study this via rules/patterns as generalizations rather than enumerating examples in a list.

Quote
A computational linguistics approach to this did not seem apparent because I think my question presumes a purely linguistic approach that, yeah, could technically lead into programming.
No, I just meant that I was trying to imagine a relevant application for this, and enumerating out lists like that is something that computational approaches might find valuable (e.g. training an algorithm). Most theoretical approaches aren't list-based (see above).

Quote
There's also a note that it might still be a very complicated thing to gather an extensive list of the clustered phonetic forms or arrangements I would like to touch on here.
As I said, linguists would generally approach this via generalizations rather than lists (although some informal lists would be intermediate steps in working out the theory). For example, there are no restrictions as far as I know on which consonants can combine with which vowels. So CV and VC combinations can be any C or V elements. There are more restrictions on CC and VV.
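One way to look at this in data is to tally which adjacent pairs of sounds are attested, grouped by type (CC, CV, VC, VV). The Python sketch below does that for two hand-written transcriptions in CMU-style symbols with stress digits removed; the function name `pair_types` and the sample are made up for the example. Note that it records plain adjacency, including pairs that straddle a syllable boundary, so an attested CC pair here is not automatically a cluster in the stricter sense.

```python
from collections import defaultdict

# The vowel symbols of the CMU phoneme set (stress digits removed).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def pair_types(transcriptions):
    """Group every attested adjacent phoneme pair under CC, CV, VC or VV."""
    attested = defaultdict(set)
    for phones in transcriptions:
        for a, b in zip(phones, phones[1:]):
            kind = ("V" if a in VOWELS else "C") + ("V" if b in VOWELS else "C")
            attested[kind].add((a, b))
    return attested

# Toy sample, not real dictionary output.
sample = [
    ["W", "IH", "SH", "T"],             # wished
    ["W", "IH", "SH", "F", "AH", "L"],  # wishful
]
attested = pair_types(sample)

print(sorted(attested["CC"]))  # [('SH', 'F'), ('SH', 'T')]
print(sorted(attested["CV"]))  # [('F', 'AH'), ('W', 'IH')]
print(len(attested["VV"]))     # 0 in this tiny sample
```

On a full dictionary, comparing the size of each attested set with the number of theoretically possible pairs would give a rough picture of how restricted each type is.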

Quote
The lexical index that I finally put together consisted of a list of 1.) frequently used core parts of speech which are whole words
Why? What relationship are you assuming between whole words and combinations of sounds? Why look beyond pairs or triplets (etc.) that are actually related to each other? Once you get beyond a syllable (or two) you will find few relevant phonological relationships. The vast majority will just be via simple adjacency between two segments, and a few beyond that.

Quote
2.) frequently used whole words no longer than 5 letters because 5 letters is the average sized word
As above, I can't see how that is relevant. More importantly, averages would tend to obscure less common patterns, and I thought you wanted to find all of (e.g. the full range of) variation.
Quote
3.) became a mixture of traditional consonant clusters, a combination of letters that runs parallel to vowel diphthongs, and a combination of the first 2 letter combinations.
Again it is very important to not confuse letters/characters with sounds/phonemes. There is no one-to-one relationship. There are about 45 phonemes in English (varying by dialect, mostly in vowels), but only 26 letters, and some letters also do not represent distinct sounds (e.g. C, Q).
Quote
Just using a list of IE. morphemes was not going to achieve rational results because the way the reconstructions are addressed they don't always correspond to English language renderings.  This ultimately prompts a question.  What uncomplicated method could there be to take a list of IE. morphemes and systematically render them to only their English language, writing, phonetic system collocations.  Are there any quick references I can refer to?
What? You mean Proto-Indo-European? Why would that have anything to do with (all of) the sound patterns in English today?

I'm still not sure why you're doing this, so if you can explain your project, as I asked above, I might be able to give a more directly helpful response.

Quote
Lastly, to demonstrate what I mean I'll simply post a preliminary of this list that I think could serve to bypass having to procure an extremely vast list of these phonetic combinations, tack off the erroneous ones or ones that are not partial to English morphology (which would result in just as much of a complexity on its own):
OK, but why? You're basically enumerating part of a dictionary list, plus sub-parts of some words. It resembles the sort of dictionary lists often used in computational linguistics (e.g. developing spell-checkers), or for other purposes like a 'dictionary attack' to hack someone's password by throwing random words (or combinations) at it rather than just random combinations of characters (that are less likely to be chosen than real words). There really are lists out there that you can find like that, which is why I suggested a computational approach before (it doesn't necessarily need to be any more complicated than just finding and reading the list, or whatever you'd like). Probably finding an open source spell-checker would be an easy place to start looking at a list of English words in their dictionary list. But what you'd do with that, I'm not sure. Again, linguists typically want to identify patterns, not just long lists of data.

--

As a slightly off-topic comment, but the sort of thing that others have thought about in the past, one of my professors once mentioned some work that had gone into surveying all English words and noticing that the level of ambiguity in representation is quite low (something like a 50% chance of accurately identifying any individual word) if representing only broad categories of sounds in that word, e.g. vowel vs. fricative vs. stop, or something like that. I think it was a total of 5 categories (rather than 26 letters, or 45 phonemes), and that was in itself enough information to narrow down the guess (e.g. by a computer doing speech recognition) to a 50/50 chance at identifying the word correctly, which seems surprising, but is an interesting result of phonotactic research combined with a computational application. (I think I'm paraphrasing those numbers relatively accurately but I'm not positive about the specifics, just that about "half" of English phonological information can be captured by a much rougher representation of about 5 categories.)
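The kind of measurement described above can be sketched in Python. The five broad classes used here (vowel, stop/affricate, fricative, nasal, approximant) are an assumption made for the example, not necessarily the categories used in the work being recalled, and the five-word lexicon is a toy sample, so the resulting figure says nothing about English as a whole. The helper names `pattern` and `unique_fraction` are also made up.

```python
from collections import Counter

# Assumed broad classes over the CMU phoneme set (stress digits removed).
BROAD = {}
for cls, phones in {
    "V": "AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW",  # vowels
    "S": "P B T D K G CH JH",                             # stops and affricates
    "F": "F V TH DH S Z SH ZH HH",                        # fricatives
    "N": "M N NG",                                        # nasals
    "A": "L R W Y",                                       # approximants
}.items():
    for p in phones.split():
        BROAD[p] = cls

def pattern(phones):
    """Reduce a transcription to its string of broad classes."""
    return "".join(BROAD[p] for p in phones)

def unique_fraction(lexicon):
    """Share of words whose broad-class pattern matches no other word."""
    patterns = {w: pattern(p) for w, p in lexicon.items()}
    freq = Counter(patterns.values())
    unique = [w for w, pat in patterns.items() if freq[pat] == 1]
    return len(unique) / len(lexicon)

# Toy sample, not real dictionary output.
lexicon = {
    "bat": ["B", "AE", "T"],
    "cat": ["K", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "man": ["M", "AE", "N"],
    "strong": ["S", "T", "R", "AO", "NG"],
}

print(pattern(lexicon["strong"]))  # FSAVN
print(unique_fraction(lexicon))    # 0.4 (only "man" and "strong" are unique)
```

Run over a full pronouncing dictionary, the same function would give an estimate of how many words remain identifiable from the coarse pattern alone.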
« Last Edit: January 04, 2020, 08:47:39 PM by Daniel »

Offline mojobadshah

Re: Character Clusters (Blends) in English - Complete List
« Reply #7 on: January 05, 2020, 02:49:42 PM »
Quote
The reason I'm using term: "character cluster" is because consonant cluster is limited to consonants only (iterated in this thread somewhere).
But there is still a fundamental difference between sounds and spelling. If you're just looking for a general term then you could say for example "phoneme cluster", but that sounds like odd phrasing because consonant clusters specifically refers to two (or more) consonants that cluster as a unit, almost as if they are a single sound.* The combinations ('clusters') of random sounds wouldn't form "clusters" in the same sense because that's just what sounds happen to be adjacent. Linguists don't generally speak of "consonant-vowel clusters" for that reason, for example. Vowels, on the other hand, can form diphthongs (complex vowels of two or more parts), so the term "vowel cluster" is clear but not typically used.
(*Note that in this sense we might say that /ʃt/ ["sh+t"] is a consonant cluster in English as in the word "wished" /wɪʃt/, but that for example /ʃf/ is not a consonant cluster because it only occurs at syllable boundaries as in "wishful", but not "*wishf" or "*shful". So "cluster" refers to something acting as a unit in a particular sense within syllable structures, not just adjacency.)

However, more generally there is a whole subfield of phonology called phonotactics looking at how sounds combine and what combinations are valid. You seem to be asking basically "What is the phonotactic system of English?" And there's been a lot written on that, but I don't know of an easily accessible list. Instead, most linguists would study this via rules/patterns as generalizations rather than enumerating examples in a list.

Quote
A computational linguistics approach to this did not seem apparent because I think my question presumes a purely linguistic approach that, yeah, could technically lead into programming.
No, I just meant that I was trying to imagine a relevant application for this, and enumerating out lists like that is something that computational approaches might find valuable (e.g. training an algorithm). Most theoretical approaches aren't list-based (see above).

Quote
There's also a note that it might still be a very complicated thing to gather an extensive list of the clustered phonetic forms or arrangements I would like to touch on here.
As I said, linguists would generally approach this via generalizations rather than lists (although some informal lists would be intermediate steps in working out the theory). For example, there are no restrictions as far as I know on which consonants can combine with which vowels. So CV and VC combinations can be any C or V elements. The are more restrictions on CC and VV.

Quote
The lexical index that I finally put together consisted of a list of 1.) frequently used core parts of speech which are whole words
Why? What relationship are you assuming between whole words and combinations of sounds? Why look beyond pairs or triplets (etc.) that are actually related to each other? Once you get beyond a syllable (or two) you will find few relevant phonological relationships. The vast majority will just be via simple adjacency between two segments, and a few beyond that.

Quote
2.) frequently used whole words no longer than 5 letters because 5 letters is the average sized word
As above, I can't see how that is relevant. More importantly, averages would tend to obscure less common patterns, and I thought you wanted to find all of (e.g. the full range of) variation.
Quote
3.) became a mixture of traditional consonant clusters, a combination of letters that runs parallel to vowel dipthongs, and a combination of the first 2 letter combinations.
Again it is very important to not confuse letters/characters with sounds/phonemes. There is no one-to-one relationship. There are about 45 phonemes in English (varying by dialect, mostly in vowels), but only 26 letters, and some letters also do not represent distinct sounds (e.g. C, Q).
Quote
Just using a list of IE. morphemes was not going to achieve rational results because the way the reconstructions are addressed they don't always correspond to English language renderings.  This ultimately prompts a question.  What uncomplicated method could there be to take a list of IE. morphemes and systematically render them to only their English language, writing, phonetic system collocations.  Are there any quick references I can refer to?
What? You mean Proto-Indo-European? Why would that have anything to do with (all of) the sound patterns in English today?

I'm still not sure why you're doing this, so if you can explain your project, as I asked above, I might be able to give a more directly helpful response.

Quote
Lastly, to demonstrate what I mean, I'll simply post a preliminary version of this list, which I think could bypass having to procure an extremely vast list of these phonetic combinations and then strike off the erroneous ones or the ones that are not part of English morphology (which would be just as complex a task in its own right):
OK, but why? You're basically enumerating part of a dictionary list, plus sub-parts of some words. It resembles the sort of dictionary lists often used in computational linguistics (e.g. developing spell-checkers), or for other purposes like a 'dictionary attack' to hack someone's password by throwing random words (or combinations) at it rather than just random combinations of characters (that are less likely to be chosen than real words). There really are lists out there that you can find like that, which is why I suggested a computational approach before (it doesn't necessarily need to be any more complicated than just finding and reading the list, or whatever you'd like). Probably finding an open source spell-checker would be an easy place to start looking at a list of English words in their dictionary list. But what you'd do with that, I'm not sure. Again, linguists typically want to identify patterns, not just long lists of data.
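
To make the computational suggestion concrete, here is a minimal sketch in Python, under stated assumptions. It counts the runs of consonant letters at the start of each word in a list. The function names and the sample words are made up for illustration; in practice the words would come from whatever spell-checker word list you find. Note that it counts letters, not sounds, and it treats "y" as a consonant letter, which is a simplification.

```python
from collections import Counter

VOWEL_LETTERS = set("aeiou")  # assumption: 'y' is treated as a consonant letter

def initial_cluster(word):
    """Return the run of consonant letters at the start of a word."""
    cluster = ""
    for ch in word.lower():
        if not ch.isalpha() or ch in VOWEL_LETTERS:
            break
        cluster += ch
    return cluster

def count_initial_clusters(words, min_len=2):
    """Count word-initial consonant-letter runs of at least min_len letters."""
    counts = Counter()
    for w in words:
        c = initial_cluster(w)
        if len(c) >= min_len:
            counts[c] += 1
    return counts

# Hypothetical sample; a real run would read a word list from a file.
sample_words = ["string", "street", "play", "plant", "apple",
                "through", "knee", "school"]
clusters = count_initial_clusters(sample_words)
for cluster, n in clusters.most_common():
    print(cluster, n)
```

Even this toy run shows the letter/sound problem: "thr" is three letters but two sounds, and "kn" is two letters but one sound, so a letter-based list would still need to be checked against pronunciations.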

--

As a slightly off-topic comment, but the sort of thing that others have thought about in the past, one of my professors once mentioned some work that had gone into surveying all English words and noticing that the level of ambiguity in representation is quite low (something like a 50% chance of accurately identifying any individual word) if representing only broad categories of sounds in that word, e.g. vowel vs. fricative vs. stop, or something like that. I think it was a total of 5 categories (rather than 26 letters, or 45 phonemes), and that was in itself enough information to narrow down the guess (e.g. by a computer doing speech recognition) to a 50/50 chance at identifying the word correctly, which seems surprising, but is an interesting result of phonotactic research combined with a computational application. (I think I'm paraphrasing those numbers relatively accurately but I'm not positive about the specifics, just that about "half" of English phonological information can be captured by a much rougher representation of about 5 categories.)
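
As a rough illustration of that idea (not the original study's method, which I don't remember in detail), the sketch below collapses each word to a string of broad sound classes and checks how many words remain uniquely identifiable. The five classes, the toy lexicon, and the rough phoneme spellings are all assumptions made up for this example.

```python
from collections import defaultdict

# Assumed five-way classification; the study's actual categories may differ.
BROAD_CLASS = {}
for symbols, label in [
    ("p b t d k g", "S"),      # stops
    ("f v s z sh th h", "F"),  # fricatives
    ("m n ng", "N"),           # nasals
    ("l r w y", "L"),          # liquids and glides
    ("a e i o u", "V"),        # vowels, all collapsed into one class
]:
    for sym in symbols.split():
        BROAD_CLASS[sym] = label

# Hypothetical toy lexicon: word -> rough phoneme list (not real IPA).
LEXICON = {
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
    "sip": ["s", "i", "p"],
    "man": ["m", "a", "n"],
    "plan": ["p", "l", "a", "n"],
    "split": ["s", "p", "l", "i", "t"],
}

def broad_code(phonemes):
    """Map a phoneme list to a string of broad class labels."""
    return "".join(BROAD_CLASS[p] for p in phonemes)

# Group words that share the same broad-class string.
groups = defaultdict(list)
for word, phonemes in LEXICON.items():
    groups[broad_code(phonemes)].append(word)

unique = [ws[0] for ws in groups.values() if len(ws) == 1]
print(dict(groups))
print(f"{len(unique)} of {len(LEXICON)} words are uniquely identified")
```

With a real pronunciation dictionary in place of the toy lexicon, the same grouping would give an estimate of how much ambiguity the coarse representation leaves.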

Well, I don't disagree with anything that you covered here.  Your reflections and breakdown of the linguistic jargon that I put to this thread help to put some of those phenomena into perspective. I think the best way to rephrase my inquiry here is to simply put everything in terms of only "consonant clusters" and ask the question (because I know there are attempts online to put all these clusters and blends together), but I'm always in question about how definitive any one of them is and would be very appreciative of a link or 2 to one of the better-known competent references specific to "consonant clusters in English."
« Last Edit: January 05, 2020, 02:51:21 PM by mojobadshah »

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1968
  • Country: us
    • English
Re: Character Clusters (Blends) in English - Complete List
« Reply #8 on: January 05, 2020, 05:37:12 PM »
I've asked several times above: why do you want this information? Are you trying to learn (or teach) English? Are you trying to understand the theoretical structure of English phonology?

The subfield of phonotactics has thousands of publications in it. Beyond just which sounds can combine, there is even more extensive research on what happens when sounds combine, within the general field of segmental phonology. Assimilation is the process by which two sounds (typically adjacent) affect each other so that the pronunciation changes. That's why, for example, "ti" is sometimes pronounced "sh" as in "nation", via a process called palatalization (specifically, a high front vowel affects neighboring consonants so that they are pronounced at the hard palate), and over time this can lead to extensive changes like that one. (That's just one example of many.) So if you're interested in how to pronounce the combinations, you'll need more than just a list. Even just limiting it to phonotactics in general, to get a sense of how much research has been put into this topic, take a look at https://en.wikipedia.org/wiki/Sonority_hierarchy -- and despite extensive research that's still a controversial (but sometimes useful) idea.
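
To give a feel for what the sonority idea predicts, here is a small sketch that checks whether a word-initial cluster rises in sonority toward the vowel. The numeric scale is one common textbook-style ranking and is an assumption here (versions differ), and the one-letter symbols are only stand-ins for sounds.

```python
# Assumed sonority scale (one common textbook version; details are debated).
SONORITY = {}
for symbols, rank in [
    ("p b t d k g", 1),  # stops
    ("f v s z", 2),      # fricatives
    ("m n", 3),          # nasals
    ("l r", 4),          # liquids
    ("w y", 5),          # glides
]:
    for sym in symbols.split():
        SONORITY[sym] = rank

def rises_in_sonority(onset):
    """True if each consonant is more sonorous than the one before it."""
    ranks = [SONORITY[c] for c in onset]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

for onset in ["pl", "tr", "vl", "lp", "st"]:
    print(onset, rises_in_sonority(onset))
```

The check accepts "pl" and "tr" and rejects "lp", as expected, but it also rejects "st", which is a perfectly ordinary English onset. Exceptions like that are part of why the idea remains controversial.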

As for learning (or teaching) these combinations, it's generally more relevant to focus on specific challenging cases than to go through all combinations, and which cases are challenging depends on the first language of the learners. (A linguist with experience in both languages can identify the relevant items to practice, and that can be very helpful.)

For a comprehensive list, something like this looks like a relevant place to start:
https://www.jstor.org/stable/454173
It summarizes several previous studies. (In this case, it's probably beneficial that this is an old article because current research would, as I wrote above, focus on generalizations rather than lists.)
Just skimming over that, it looks like a good and reliable list. I do notice one that might be "missing", which is "vl-", as in the Russian name "Vlad[imir]", which English speakers seem to have no trouble pronouncing (so in principle it can be considered part of English phonology, even though it seems to be an accidental gap in that it's coincidentally missing from dictionaries). That's another question you'd want to address, depending on your purpose, whether you care about what is common versus what is possible.
« Last Edit: January 05, 2020, 05:40:23 PM by Daniel »

Offline mojobadshah

  • Jr. Linguist
  • **
  • Posts: 6
Re: Character Clusters (Blends) in English - Complete List
« Reply #9 on: January 06, 2020, 01:17:48 PM »
I was looking for a very concise work involving strictly consonant clusters/blends relevant to English texts.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1968
  • Country: us
    • English
Re: Character Clusters (Blends) in English - Complete List
« Reply #10 on: January 06, 2020, 09:09:07 PM »
Yes, that article includes a list like that.