Author Topic: Showing Irish and Punjabi are connected  (Read 2055 times)

Offline Forbes

  • Jr. Linguist
  • **
  • Posts: 48
Showing Irish and Punjabi are connected
« on: July 11, 2021, 12:51:48 AM »
Some years ago I posted the following as a question/thought experiment in another forum. The question does rather suppose a knowledge of both Irish and Punjabi.

We know that Irish and Punjabi are genetically related, but suppose they were the only two IE languages around today with no records of any other IE language. How easy would it be for a linguist using the comparative method to show their relationship?

I choose these two languages because, quite apart from the fact that they are totally mutually unintelligible, both have developed features that are unusual in IE languages - in the case of Irish, initial consonant mutation and in the case of Punjabi, phonemic tones. Would those very differences in fact be sufficient so that no one would in fact even start to look for a connection?

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 2074
  • Country: us
    • English
Re: Showing Irish and Punjabi are connected
« Reply #1 on: July 11, 2021, 02:47:08 AM »
This is an interesting question. It is important to remember that the IE family is well-established because there is so much data available to us, and especially because we have historical records of classical languages. It was originally similarity between Latin and Ancient Greek compared to Sanskrit that suggested a relationship, not a similarity between Italian and Hindi (or Irish and Punjabi) or other modern languages.

The relationship between the classical IE languages is often quite clear, and the time depth there is around 3,000-4,000 years, versus 6,000+ for the modern languages.

Having just two languages also makes the comparison harder because we only have two data points. If the two languages happen to vary by chance (e.g. borrowing in one or both languages), we wouldn't have any way to observe the original similarity. If we compared all of the Celtic languages to all of the Indo-Iranian languages, for example, we'd have a lot more data to work with, and more chances to observe similarity. But having even more branches means we get to try to understand the full story, and how Irish and Punjabi each fit into it, rather than just comparing them individually with a lot of historical gaps.

But to take an extreme example for comparison, Basque has been extensively studied to look for possible relationships with just about every other language out there that might be even remotely plausible. (None of those possibilities has worked out.) So if Irish and Punjabi held the same status as linguistic isolates today, presumably someone would have tried to compare the two: both would be mysteries, and both would interest people who study isolates. So I don't doubt that someone would actually try this study.

That is, assuming historical linguistics had progressed as far as it did on the basis of Indo-European, and the same questions were being asked today. I'd think that with Austronesian, for example, where the reconstruction is more transparent and in some ways easier than Indo-European, the scientific methods would eventually have ended up where they are today, although it might have taken longer, because the original interest in IE was partly due to nationalism, race, etc. (Tangentially, that was part of what fueled the World Wars, Nazism, etc., and some of those ideas of nationalism, e.g. "who really belongs here", still persist today, including in debates about where the IE homeland was.) So let's set all of that aside and assume similar methods in reconstruction, and also that someone attempted this study.

In terms of the difficulty of showing a relationship, Peter (1991, an honors undergraduate thesis at the University of Illinois at Urbana-Champaign) showed that if we compare just modern English and Hindi, looking for apparent "cognates" in the languages, only about 35% of those would actually reflect a common origin. I'll assume the figure is similar for Irish and Punjabi (maybe somewhat higher, given that the extensive borrowing in English [from French] makes direct reconstruction harder there, so maybe 50% of apparent cognates would be actual cognates). What this means is that we need a reliable method to separate real cognates from unreliable data. But that's the foundation of Historical Linguistics. If you look at any two languages, you can find coincidentally similar words. That's not enough to show a relationship. Instead, what you need to find are patterns. And specifically, you don't want to find identical things in two languages, but systematic differences. Consider Grimm's Law for Germanic: you would find a correspondence between Germanic /f/ and /p/ in other languages (e.g. Romance), and so forth. You aren't looking just for simple equivalence as in /p/=/p/, but evidence of regular sound change producing systematic, corresponding differences like /f/ and /p/.

As a basic starting point, we need data, and many linguists would start with the Swadesh list or something like it. We happen to have those available for Irish and Punjabi:

Now if you just skim through those lists, they look quite different, and it's not immediately apparent that the languages are related. But we can use the sound patterns to look for possible connections. I should admit that my perspective here is somewhat biased by already knowing about PIE reconstruction, which the hypothetical linguist in our scenario would not. But it's really not all that different from other groups that have been reconstructed.

Let's look at a few basic words, which happen to often be similar in related languages:
English: Irish / Punjabi
I: mé / maĩ
you: tú / tū̃
he: sé / iha
(Caveat: we'll rely on these forms even if they are likely somewhat obscured by orthography, especially for Irish. [Some IPA transcriptions are given there, but not all.] Regardless, that's not a practical problem for serious research and for linguists familiar with the languages, just a limitation of my answer here based on that source.)

These pronouns are far from conclusive, but the first two certainly look similar. Assuming the languages are related, then either /m/ and /t/ are inherited directly with no sound changes, or they changed in one or both languages but ended up converging again. As for "he", there's no obvious connection there, although we might wonder about some kind of relationship between /s/ and /h/, which is attested in some languages (e.g. Coastal Latin American Spanish at the end of a word, where los niños may be pronounced as lo(h) niño(h)). Regardless, we need more data.

Looking at another set of words that are commonly shared:
English: Irish / Punjabi
one: aon / ikka
two: dó / do
three: trí / tinn
four: ceathair / cār
five: cúig / pañj

Some of those have obvious apparent correspondences, and others do not. "One" is quite variable across languages, so that's not surprising. You would think that "four" might show a relationship with /c...r/, but then "five" is either a different root (borrowing or other lexical change) or suggests instead /c/ = /p/. So this is starting to get complicated. Following up from above about /t/, here we see /t/=/tr/, so we would want to try to understand what's going on there. Maybe the sequence /tr/ was original, and /r/ disappeared in Punjabi, either in general or just in the cluster /tr/. But from the pronouns above we know that original /t/ probably corresponds directly to /t/ in both languages, so it's unlikely that, e.g. /t/ became /tr/ in Irish, unless of course it has something to do with the adjacent vowels or something else. Or maybe these /t/ phonemes actually represent different original phonemes in the proto-language.

To summarize at this point, even with very little data we've seen some hints of possible relationship, but so far nothing conclusive. We would want to gather some additional hypotheses about possible correspondences. We can do that just skimming through the list. Again, here it would be especially helpful to find possible correspondences between different sounds, and see if we can verify those as general patterns. Today, we might attempt this with a computer: look at all possible correspondences, and see if some seem to appear repeatedly. Or we might try it by hand, following traditional research. Regardless, this is just guessing to see if anything happens to fit.

The results would be mostly noise, to be thrown out. But if we do stumble upon a genuine correspondence, and then observe that correspondence in many more words, that is worth pursuing. If we can find enough such correspondences, we may be able to build up a systematic reconstruction of the two languages back to the proto-language and list the sound changes that occurred in the development of each. Just glancing through these lists, it is immediately apparent that there are a lot of borrowings (or other kinds of lexical change) obscuring any possible relationships. Syllable structures are different, etc. But if we keep looking at the data and trying out various hypotheses of correspondence, we may find something.
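The "look at all possible correspondences" idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming the orthographic forms quoted in this thread and using the first letter of each word as a crude stand-in for its initial sound (the helper name `initial_correspondences` is made up for the example). Letters are not phonemes: Irish "c" spells /k/, while the "c" in the Punjabi transliteration is a different sound, so a real study would need phonemic transcriptions and far more data.

```python
from collections import Counter

# (gloss, Irish, Punjabi) forms copied from the lists in this thread.
PAIRS = [
    ("I", "mé", "maĩ"),
    ("you", "tú", "tū̃"),
    ("two", "dó", "do"),
    ("three", "trí", "tinn"),
    ("four", "ceathair", "cār"),
    ("five", "cúig", "pañj"),
    ("who", "cé", "kauṇ"),
    ("what", "cad", "kī"),
    ("where", "cá", "kitthē"),
    ("when", "cathain", "kadõ"),
    ("mother", "máthair", "mātā"),
]

def initial_correspondences(pairs):
    """Count how often each (Irish initial, Punjabi initial) letter pair occurs."""
    return Counter((irish[0], punjabi[0]) for _, irish, punjabi in pairs)

counts = initial_correspondences(PAIRS)
for (irish_letter, punjabi_letter), n in counts.most_common():
    print(f"Irish {irish_letter!r} : Punjabi {punjabi_letter!r}  x{n}")
```

On this tiny sample the Irish "c" : Punjabi "k" pairing recurs four times, but all four are wh-words that probably go back to a single root, so they should not be read as four independent confirmations.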

So a couple more pairs to consider:
English: Irish / Punjabi
and: agus / atē
to see: feic / vēkhṇā
mother: máthair / mātā

The first pair suggests /g/ = /t/. That happens to be incorrect (I know this for reasons our hypothetical researcher would not, based on the etymology of the words), but we could test it with more words. There's no reason not to test that. Do we see that correspondence often? If not, we can reject it as coincidence.
The second pair would suggest both /f/=/v/ and /c/=/kh/. I believe the first doesn't really work out (at least not as a general correspondence, because it was a somewhat unusual change in Irish), but the second probably does.
And the third pair suggests "th" [h] =/t/, and these words are in fact historically related, so looking at more might reveal this as a pattern. But this is actually misleading: the Punjabi word for mother was actually borrowed from Sanskrit. So reconstructing this would lead us in the wrong direction, although it might also appear in other words borrowed from Sanskrit.
If we repeat that process enough times, we might be able to separate out the noise from the genuine correspondences.

But at this point one thing does become clear: we need more than the 100-200 words of the Swadesh list. These words are generally thought to be among the most likely to be cognates, so longer lists would give even sparser results, but we could still find some. A fundamental problem here is borrowing, so it's not entirely clear to me whether this pair really is reconstructable. It might be similar to many cases we see today of proposed relationships that cannot be sufficiently demonstrated.

In the time it's taken me to write up this answer, I've failed to sufficiently demonstrate a relationship, and that's even with the help of knowing something about the relationship between these languages (although for the most part I was guided by the data itself, i.e. not looking up the 'answer' to this question directly).

In terms of systematic correspondences, though, I do see one area in the data that looks promising. The wh-question words and the demonstratives seem to possibly show some correspondences. The Irish wh-words begin with "c" (=/c~k/) corresponding to /k/ in Punjabi. Of course those probably aren't independent data points (all going back to one root), but the repeated correspondences are what we're looking for in order to have some confidence in the correspondences we find. Similarly, I see Irish "s" (=/ʃ/) and Punjabi /h/ or /th/ in demonstratives or related words like 'this', 'here', etc., so I'd check that in more pairs.

At this point, I'd think the next step would be one of the following:
1. Learn each of these languages well (or spend a lot of time looking at dictionaries) to see what additional pairs can be identified. It would be very helpful to remove/identify any non-root parts of words (e.g. inflectional suffixes, as well as compounds, etc.) if possible, and a speaker of each language could do that more easily than I can. That might remove some of what's obscuring the relationships. Similarly, rather than relying just on the one-item Swadesh list pairs, we could ask speakers to name various synonyms or related meanings for each item to see if there are some other words that do show correspondences not demonstrated by these most basic translations from English.
2. Try a computational approach that simply looks through every possible correspondence and then lists those that seem to repeat fairly often in the data.
3. If we had any information about borrowings and neighboring languages, we could try to remove some of that from the data. Your question suggested that English and Sanskrit don't exist, or at least there is no record of them, but in a real scenario of this type we'd probably know something about some of the neighboring languages, and even if not related they might be a source for borrowings. In the specific scenario that Indo-European developed exactly as it did but then mysteriously the only remaining languages today are Irish and Punjabi with no trace of the others, that would make this extremely difficult.
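One way to make option 2 more concrete is to ask how many "repeated" correspondences would turn up by chance alone. The sketch below uses a simple shuffle baseline; that choice is an assumption of this example, not a method named in the thread, and the function names are hypothetical. It counts word pairs whose initial-letter correspondence occurs at least twice, then repeats the count after randomly re-pairing the Punjabi words so that meaning no longer lines up.

```python
import random
from collections import Counter

# Orthographic forms quoted in this thread; first letters stand in for sounds.
IRISH = ["mé", "tú", "dó", "trí", "ceathair", "cúig",
         "cé", "cad", "cá", "cathain", "máthair"]
PUNJABI = ["maĩ", "tū̃", "do", "tinn", "cār", "pañj",
           "kauṇ", "kī", "kitthē", "kadõ", "mātā"]

def recurring_pairs(words_a, words_b):
    """Count word pairs whose initial-letter correspondence occurs at least twice."""
    counts = Counter((a[0], b[0]) for a, b in zip(words_a, words_b))
    return sum(n for n in counts.values() if n >= 2)

def shuffle_baseline(words_a, words_b, trials=1000, seed=0):
    """Scores obtained after randomly re-pairing the words (meaning ignored)."""
    rng = random.Random(seed)
    shuffled = list(words_b)
    scores = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        scores.append(recurring_pairs(words_a, shuffled))
    return scores

observed = recurring_pairs(IRISH, PUNJABI)
baseline = shuffle_baseline(IRISH, PUNJABI)
share = sum(score >= observed for score in baseline) / len(baseline)
print(f"observed: {observed}; share of shuffles scoring at least as high: {share:.3f}")
```

With a list this short and this skewed (six of the eleven Irish words begin with "c"), random re-pairings also produce repeated correspondences, which is the reason a baseline is worth computing before trusting any count.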

In short, this is very difficult, but not impossible. Persistence in research would suggest that eventually someone would stumble upon some reliable patterns in the data, and we might be able to sort out the rest from there. But having only these two data points makes it very difficult to understand anything like the full story we know about Indo-European today.

Quote from: Forbes
...both have developed features that are unusual in IE languages - in the case of Irish, initial consonant mutation and in the case of Punjabi, phonemic tones.
Those features are not, in themselves, dealbreakers here. Tones came from somewhere, so we could possibly reconstruct that (I'm assuming some kind of coda consonant), but vowels aren't that reliable anyway for reconstruction, so maybe we just set them aside. The initial consonant mutations make the reconstruction trickier, but just give us more options for correspondences, and instead we could focus on non-initial sounds. Together, it seems like mostly medial sounds might be most transparent. But my guess is that these features don't make this pairing much more difficult than, for example English and Hindi, or other similar pairs. (There are some pairings that would be easier, though, especially any languages that don't have as many borrowings as both of these. I wonder if Swedish and Russian, for example, would work out better.)
« Last Edit: July 11, 2021, 02:55:26 AM by Daniel »
Welcome to Linguist Forum! If you have any questions, please ask.

Offline panini

  • Linguist
  • ***
  • Posts: 215
Re: Showing Irish and Punjabi are connected
« Reply #2 on: July 12, 2021, 11:36:16 AM »
Tone and mutation would not be absolute barriers to finding cognates, but they would be complicating factors that would reduce the probability of establishing the hypothesis. Now, I do not really know Punjabi but I've looked at a bunch of Indo-Aryan languages including a tonal one, and I do not believe that one would find evidence for a common source language from the morphology (this was crucial in establishing PIE in the first place). So it would rest on the lexicon and the possibility of finding sound laws.

Also as a prelude, we have to say how this strange situation came about. Let's say that everything is as it was up to today, then suddenly aliens vaporize speakers of all Indo-European languages, and any record of any Indo-European language, except that (some) speakers of modern Punjabi and modern Irish are preserved. The point is to preserve all existing influences on Punjabi and Irish (e.g. Sanskrit, Hindi, Persian, Arabic or English words borrowed into Punjabi, etc.). I suppose we also need to fill in the now-unoccupied land with random (?) speakers of the remaining languages of the world.

The other thing is, we would not even ask the question without some reason. Let's say that the reason is to computationally disprove some relatedness claim, such as that Spanish and Turkish, or Russian and Chinese, are (lexically) related, drawing on some list of lexical similarities. The skeptical scientist picks a bunch of unrelated languages like Quechua, Lardil, Khoekhoe, Cree, Ket and Tibetan, and computes a "best case scenario" of relatedness. The project also compares within Volta-Congo, Afro-Asiatic and Austronesian.

Setting aside the specific comparison that you want to perform, we know that known loan words would in fact be eliminated from the mass-comparison database, unless this study was carried out by a very sharp person who realizes that you cannot have special rules of data-filtering just in case you have some suspicion (based on....) that a word is borrowed. Within the set of languages known to be related, there is a degree of similarity attributable to actual derivation from a proto-language source, versus the ubiquity of loanwords from certain languages (many of which later disappeared due to the Mars attack). How does this comparison deal with similarities between Logoori and Sotho that are because they are both Bantu languages, and because they both have a bunch of English loan words?

Part of this procedural background involves creating word-pairs. They could be semantically-driven, comparing "kalb" (the dominant language for semantic research in this new world is Arabic, so that word is English "dog") in Punjabi [kuta:] versus Irish [madra] or [ɡəiɾˠ] (adopting the Wiktionary transcription). The [kuta:] and [ɡəiɾˠ] are not related, but we don't know that, so it is inevitable that this pair would go into a database of "similar words". Comparisons can be semantically sloppy as well: "fenugreek" and "female" are related (via a PIE root meaning "suck"). Seriously, nobody in their right mind would say "Fenugreek, female: obviously related via the meaning 'suck'". We accept semantic stretches when there is e.g. good historical support (which was vaporized by the Martians), or when they seem common enough.

Because we don't have a pre-compiled list of Irish-Punjabi cognates, we're dealing not just with words that we know are related, but also ones that are not related (in the relevant sense, i.e. "both come from PIE", not "both come from English"). Here is a seriously defective comparison set, based on Wiki Swadesh lists for Irish, Punjabi and Finnish. The Irish forms are orthographic and they deviate massively from pronunciation. In other words, because of Irish spelling we are not comparing Irish and Punjabi; we are comparing Punjabi with some older Celtic language which happens to have caused Irish spelling to look a wee bit more like PIE than the actual pronunciation does. Here is my collection of the first 60 words on the list:

#   English   Punjabi   Irish   Finnish
1   1sg   mē̃   mé    minä
2   2sg   tū̃   tú    sinä
3   3sg   iha   sé    se
4   we    asī̃   sinn    me
5   2pl   tusī̃   sibh    te
6   3pl   iha   siad    he
7   this   iha   seo    tämä
8   that   uha   sin   se
9   here   itthē   anseo    täällä
10   there   utthē   ansin   tuolla
11   who   kauṇ   cé   kuka
12   what   kī   cad   mikä
13   where   kitthē   cá   missä
14   when   kadõ   cathain   kun
15   how   kivē̃   chaoi   kuinka
16   not   nahī̃    ní    ei
17   all   sārā   uile    kaikki
18   many   bahut   a lán    paljon
19   some   kujjha   roinnt    jokin
20   few   virlē   beagán    vähän
21   other   dūjā   eile    toinen
22   one   ikka   aon    yksi
23   two   do   dó    kaksi
24   three   tinn   trí    kolme
25   four   cār   ceathair    neljä
26   five   pañj   cúig    viisi
27   big   vaḍḍā   mór    iso
28   long   lammā   fada    pitkä
29   wide   cauṛā   leathan    leveä
30   thick   moṭā   tiubh    paksu
31   heavy   bhārī   trom    raskas
32   small   choṭā   beag    pieni
33   short   choṭā   gearr    lyhyt
34   narrow   sauṛā   cúng    kapea
35   thin   patlā   tanaí    ohut
36   woman   ōrat   bean    nainen
37   man (adult male)   ādamī   fear    mies
38   man (human being)   inasān   duine    ihminen
39   child   baccā   leanbh   lapsi
40   wife   patnī   bean chéile    vaimo
41   husband   patī   fear céile    aviomies
42   mother   mātā   máthair    emä
43   father   abbā   athair    isä
44   animal   jānvar   ainmhí    eläin
45   fish   macchī   iasc    kala
46   bird   pañchī   éan    lintu
47   dog   kuttā   madra    koira
48   louse   jū̃   míol cnis    täi
49   snake   sappa   nathair    käärme
50   worm   kīṛā   péist    mato
51   tree   rukkha   crann    puu
52   forest   jaṅgal   coill    metsä
53   stick   soṭī   bata    keppi
54   fruit   phal   toradh    hedelmä
55   seed   bīj   síol    siemen
56   leaf   patā   duilleog    lehti
57   root   jaṛh   fréamh    juuri
58   bark (tree)   khalla   rúsc    kuori
59   flower   phulla   bláth    kukka
60   grass   ghāh   féar    ruoho

Given unguided comparison I feel in my gut (which is where most linguistic hypotheses are tested) that the three languages would come out to be about equally related in a "best case relatedness scenario". leanbh and lapsi (child) are "obviously related" although they are not, and so on.
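That gut feeling can be checked against the list in a deliberately crude way. The sketch below is not the comparative method, only a naive look-alike count: it takes rows 1-16 of the list above and scores each language pair by how many glosses have forms starting with the same letter. The scoring rule and the name `lookalike_score` are assumptions made for this illustration.

```python
from itertools import combinations

# Rows 1-16 of the list above: (gloss, Punjabi, Irish, Finnish).
ROWS = [
    ("1sg", "mē̃", "mé", "minä"),
    ("2sg", "tū̃", "tú", "sinä"),
    ("3sg", "iha", "sé", "se"),
    ("we", "asī̃", "sinn", "me"),
    ("2pl", "tusī̃", "sibh", "te"),
    ("3pl", "iha", "siad", "he"),
    ("this", "iha", "seo", "tämä"),
    ("that", "uha", "sin", "se"),
    ("here", "itthē", "anseo", "täällä"),
    ("there", "utthē", "ansin", "tuolla"),
    ("who", "kauṇ", "cé", "kuka"),
    ("what", "kī", "cad", "mikä"),
    ("where", "kitthē", "cá", "missä"),
    ("when", "kadõ", "cathain", "kun"),
    ("how", "kivē̃", "chaoi", "kuinka"),
    ("not", "nahī̃", "ní", "ei"),
]
LANGS = ("Punjabi", "Irish", "Finnish")

def lookalike_score(rows, col_a, col_b):
    """Number of rows whose two forms start with the same letter."""
    return sum(1 for row in rows if row[col_a][0] == row[col_b][0])

# Column 0 is the gloss, so language i lives in column i + 1.
scores = {
    (LANGS[i], LANGS[j]): lookalike_score(ROWS, i + 1, j + 1)
    for i, j in combinations(range(len(LANGS)), 2)
}
for pair, score in scores.items():
    print(pair, score)
```

On these 16 rows the naive score is 5 for Punjabi and Finnish but only 3 for Punjabi and Irish, partly because Irish spells /k/ as "c" while the Punjabi and Finnish forms use "k". Surface look-alikes clearly do not separate the related pair from the unrelated ones.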

We do insist on systematic correspondence relations, so you can't derive [p,p] from the same thing as [p,h] or [p,v], unless there is some other contextual difference, or perhaps the proto-language had a richer consonant series (p,b,ph,bh). We know that Punjabi [p] with low tone will correspond to Irish [b] (I guess) and Punjabi [b] with H tone does too, because [bh] → [p] in Punjabi. But maybe there was also [p', f, v] in the proto-language, if there was a proto-language.
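The requirement that one sound should not correspond to several different sounds without an explanation can be turned into a simple bookkeeping check. This is a minimal sketch with a hypothetical helper, `split_correspondences`; the observations are initial letters taken from the orthographic forms quoted in this thread, so they are spellings rather than phonemes.

```python
from collections import defaultdict

# (Irish initial, Punjabi initial, gloss), read off the forms in this thread.
OBSERVED = [
    ("c", "c", "four"),   # ceathair / cār
    ("c", "p", "five"),   # cúig / pañj
    ("c", "k", "who"),    # cé / kauṇ
    ("c", "k", "what"),   # cad / kī
    ("t", "t", "you"),    # tú / tū̃
    ("t", "t", "three"),  # trí / tinn
]

def split_correspondences(observed):
    """Return sounds in the first language that pair with more than one
    sound in the second; each such split needs a conditioning context,
    a richer proto-inventory, or rejection as noise."""
    reflexes = defaultdict(set)
    for sound_a, sound_b, _gloss in observed:
        reflexes[sound_a].add(sound_b)
    return {a: sorted(bs) for a, bs in reflexes.items() if len(bs) > 1}

splits = split_correspondences(OBSERVED)
print(splits)
```

Here Irish "c" is flagged because it lines up with three different Punjabi initials, while "t" is not flagged. The check only points at where an explanation is owed; it cannot say which explanation is right.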

I think this would be an excellent methodological experiment, best implemented via a computer program. Really, I have no idea how the experiment would turn out.

Offline Forbes

Re: Showing Irish and Punjabi are connected
« Reply #3 on: July 19, 2021, 03:38:51 PM »
Thanks to Daniel and Panini for their long and informative replies.

I was prompted to ask the question because, as mentioned above, many have sought to do things like showing that Basque and Georgian are related and the answer from linguists is always that no demonstrable relationship can be shown. I was therefore idly wondering if it would be possible, in the absence of all our knowledge of other Indo-European languages both ancient and modern, to show that Irish and Punjabi are related.

I have just remembered reading that "Punjab" is Persian "panj ab" meaning "five rivers" and that "ab" is etymologically the same as Welsh "afon" (hence all the River Avons). That prompted me to Google the Irish for river, which is "abhainn". A case of at least one instance of cognates being recognisable in languages separated by a long distance.

Offline Daniel

Re: Showing Irish and Punjabi are connected
« Reply #4 on: July 19, 2021, 09:46:42 PM »
Quote from: Forbes
I have just remembered reading that "Punjab" is Persian "panj ab" meaning "five rivers" and that "ab" is etymologically the same as Welsh "afon" (hence all the River Avons). That prompted me to Google the Irish for river, which is "abhainn". A case of at least one instance of cognates being recognisable in languages separated by a long distance.
Yes, exactly. The question is whether enough data points like that could be gathered to verify the correspondences suggested by that pair.

Note that it isn't such an interesting pair if both have /ab/, because we don't see any systematic differences, which is the real key to unlocking relationships, however counterintuitive that may seem. Welsh is actually more interesting with /f/, but that was excluded in your original premise. Again we'd need to get more data to verify whether these apparent correspondences are actually informative.

I'll leave you with just one more example to ponder: Grimm's law shows correspondences between IE voiceless stops (as found in Latin and other languages), versus Germanic voiceless fricatives. So we have pater and father, etc. One of those changes was /k/>/h/ (with an additional adjustment for place of articulation shifting from /x/ to /h/). If you happen to know a Romance language, you may be able to think of some examples. One that you might think of is English heart and Spanish corazón (there's an extra suffix added in Spanish but we'll set that aside). And that's a legitimate cognate pair, related also to English borrowings cordial (<Latin) and cardiac (<Greek). So far so good, right? What about other examples? Another pair that would seem obvious, and might be one of the first you'd think of if you're learning Spanish, would be house and casa. The correspondence is so clear, and exactly as predicted. Right? Not so fast... actually that's not a cognate pair at all, even though it seems like it should be. It's just a coincidence. The lesson is that even data points that fit your hypothesis might be noise in the data. And with just those two data points, we'd have no way of knowing that. So this is why we can't rely on single examples. We need repeated, systematic correspondences (preferably differences) to reliably demonstrate relationship.
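The house/casa trap is easy to reproduce mechanically. The sketch below encodes only the three voiceless-stop shifts mentioned above as a lookup table and checks whether an English word begins with the predicted sound. It works on spelling, treats Latin or Spanish initial "c" as /k/ (true before a, o, u, which covers these examples), and writes the dental fricative as "th"; the function names and the cognate flags are supplied for the illustration, with the flags following the post.

```python
# Voiceless stops elsewhere in IE -> Germanic voiceless fricatives (Grimm's law).
GRIMM_VOICELESS = {"p": "f", "t": "th", "k": "h"}

def predicted_germanic_initial(romance_word):
    """Predicted English initial for a Latin/Spanish word, or None if no rule applies."""
    first = romance_word[0].lower()
    if first == "c":  # assumption: 'c' spells /k/ in these examples
        first = "k"
    return GRIMM_VOICELESS.get(first)

def fits_grimm(romance_word, english_word):
    """True if the English word starts with the initial Grimm's law predicts."""
    expected = predicted_germanic_initial(romance_word)
    return expected is not None and english_word.lower().startswith(expected)

# (Latin/Spanish, English, genuinely cognate?)
EXAMPLES = [
    ("pater", "father", True),
    ("tres", "three", True),
    ("corazón", "heart", True),
    ("casa", "house", False),  # fits the pattern, but is a coincidence
]

for romance, english, cognate in EXAMPLES:
    print(f"{romance} / {english}: fits={fits_grimm(romance, english)}, cognate={cognate}")
```

All four pairs pass the check, including casa/house, so a single pair that fits a known correspondence is still not evidence of common origin.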