Author Topic: Croatian toponyms  (Read 2202 times)

Offline FlatAssembler

  • Linguist
  • ***
  • Posts: 78
Croatian toponyms
« on: August 12, 2017, 08:26:47 AM »
I am trying to help Croatian historians by interpreting toponyms. Many toponyms appear to be easily explainable by PIE. That is to be expected, since IE languages have been spoken here ever since the mid-3rd millennium BCE (Vucedol culture). There are a few astounding examples. The ancient name for the river Kupa is Colapis, and that's obviously *kwol+*h2ep (water with meanders). The ancient name for Zagreb is Andautonia, and that can quite easily be h2en+dheh2+o(n)t+on(=om), so that it means "near that which flows".
However, mainstream Croatian toponymy quite often doesn't appear to have looked into PIE. Issa, the ancient name for the island Vis, is widely stated to have an unknown, perhaps Pre-Indo-European, etymology. However, it can easily be derived from *yos+*eh2, in the sense "where a lot of springs are". There were spas there in Roman times. And it appears that all the ancient names for the places in Croatia where the Roman spas were share the same root. Daruvar was called Balissa (I believe Bal means bright, from *bhel) and Varazdin was called Iasa.
There are multiple rivers and streams whose names appear to be derived from *h3rews. On Risnjak, the mountain, there is a stream with the same name. Many people say that the stream was named after the mountain, although it could easily be the other way around. Also, the ancient name for the river Rasa is Arsia, and, in Slavonia, there is a stream called Ervenica. Cibalae, the ancient name for Vinkovci, could easily be from *kjey+*bel (strong house), and it seems to me that nobody has suggested it.
The IE word for valley, *h1eyn, also appears in multiple toponyms. Incerum, the ancient name for Pozega, is often said to have an unknown etymology. However, it can easily be *h1eyn+*kjer, so that it means "the heart of the valley". The ancient name for Donji Miholjac is Mariniana.
It could be from Marinus, a common Roman name, but it's more likely a Latin folk-etymology of *mory+*h1eyn, "marshy valley", which is what Donji Miholjac actually is. The mountain Papuk is said to be named after the Papuk stream, but the stream is said to be of unknown etymology. I believe it is actually from *bhebhogj (repetitive participle of *bhogj, "that which flows and flows"). Mount Psunj is also said to be of unknown etymology, even though its ancient name, Pisunus, is very similar to the PIE word for resin, *pisnu, and Psunj has a lot of softwood. The river Sutla is also said to be of unknown etymology, although it can very easily be *suh1nt, participle of *sewh1, so that it means "that which waters the ground". Pazin is also said to be of unknown etymology, even though it makes sense as *ph2senti (pasture). The same goes for Aenona, the ancient name for Nin: it's said to be of unknown etymology, although it can be from *h2ekj+*mon (where a lot of stones are).
There are many toponyms which are more sensibly explainable using PIE than using Croatian. Mainstream etymology connects the river Vuka with the Croatian word "vuk", for "wolf". However, it's more likely from the zero grade of *welk (a PIE onomatopoeia for "to flow"; syllabic l often vocalizes to u in Croatian even in today's loanwords), isn't it? Baranja is usually derived from "baran", a spurious Croatian word for lamb, but isn't it more sensible to derive it from the PIE word for marshland, *beh3r? The ancient name for Baranja was Valeriana. It's usually derived from the name Valerius, but isn't it more likely that it comes from *wel+*h1er (wet valley)? The river Orljava is said to be derived from the Croatian word for echo, "oriti", but isn't it more sensible to derive it from *h1or (to flow)? There are some villages whose names mainstream etymology derives from "daleko" (far away), like Dalj and Daljok. Aren't they more sensibly derived from *dhel, in the sense "milkmen"?
The name of the city Osijek is said to come from the Croatian word for tide, "oseka", but couldn't it just as easily be *h1es+*seg (healthy, fertile field)? Some historical sources also spell "Osijek" as "Esseg". Tarda can be explained similarly as coming from *ters (dry land).
Though, there are some place names that would appear fanciful in PIE. The ancient name for Valpovo is Iovalum, which would mean "magical beer" (*yow+*h2elut). Or maybe "magical herb" (*yow+h2elom). Even if it comes from *wel (valley), the prefix Io- is still unexplained. There was quite a demonstrable word there, something like *ker, meaning "to flow", occurring in many streams and rivers (Krapina, Karasica, Krka, Korana, Krndija [the stream]…), without an obvious IE root. Or the suffix *-la in the river names like Orljava and Sutla. There are some toponyms which multiple languages could give a sensical explanation for, for instance, Pannonia (both Latin "pannis" and PIE *pen appear as sensical origins). I've tried to reconstruct some grammar of the ancient language of Slavonia based on the toponyms. Obviously, it was a centum language. I believe it had an ablaut, but not with the vowels e and o, but with a and u. For example, Mursa (the ancient name for Osijek) and Marsonia (the ancient name for Slavonski Brod) obviously share the same root (probably *mreys), and Papuk would then be a grammatical repetitive of *bhogj. Because of the epenthetic vowels (Ervenica), I'd suggest that the accent was on the first syllable (as in Ancient Greek "aster", "oros" or "ennea"). Do you think I am doing it right?
« Last Edit: August 12, 2017, 12:23:15 PM by FlatAssembler »

Offline FlatAssembler

Re: Croatian toponyms
« Reply #1 on: August 17, 2017, 05:56:51 AM »
I will continue writing, since there doesn't appear to be much opposition yet. So, I think I also know where the name of the mountain Krndija comes from. There are several mainstream theories. One is that it's related to the Greek word χορδή, string, in the sense "border between two territories". The other is that it comes from the Croatian word "krčiti", meaning "to cut wood". My theory is that it comes from PIE *(s)ker-nt, in the sense "steep". I am also tempted to think that the nominative singular actually ended in -i in Illyrian. The suffix -i- is seen as well in, for instance, Serapia (an unidentified stream in ancient Slavonia; its name, of course, comes from PIE *ser-h2ep, "flowing water"), Colapis, and possibly also in Andautonia.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1576
  • Country: us
    • English
Re: Croatian toponyms
« Reply #2 on: August 17, 2017, 07:13:29 PM »
Your etymologies seem plausible, given that I'm not an expert on these languages in particular.
The only potentially helpful commentary I could give would be to point out that some of them look quite transparent, so perhaps they are too convenient/easy (and plausible), and therefore just coincidental? The problem with toponyms is that there is no correlating factor to check. Have you lined up systematic sound changes that would correlate with those developments? But even if so, it's hard to verify (with any independent third factor) that those etymologies are more than coincidence (or that all of them are). Again I can't comment specifically on these points, and you may very well be right. But this is a very narrow/specialized field and it's hard to say more.
Welcome to Linguist Forum! If you have any questions, please ask.

Offline FlatAssembler

Re: Croatian toponyms
« Reply #3 on: August 18, 2017, 12:40:28 AM »
Thanks for responding!
Quote
The only potentially helpful commentary I could give would be to point out that some of them look quite transparent, so perhaps they are too convenient/easy (and plausible), and therefore just coincidental?
Which ones? And why would being transparent indicate that it's just a coincidence? Sound changes are derived from the obvious etymologies.
Quote
Have you lined up systematic sound changes that would correlate with those developments?
Well, a few of them. Of course, there is a loss of the laryngeals. Then there are a few more for consonants:
*kw>k (Colapis)
*kj>k (Cibalae, Incerum)
*bh>p (Papuk, written as "Papugh" in historical sources)
*gj>gh (in Papugh, whatever sound that represented)
*mr>b (in Bosut, if it comes from *mreys), but this is uncertain.
*kjm>ym (in Aenona), but this is also uncertain.
As for the vowels, I've written that the rules of ablaut probably changed from the primary vocalism being e/o to being a/u (as in the toponyms Mursa, Marsonia and Mariniana, all on marshy land). Also, it probably had an epenthetic vowel e, as in the hydronym Ervenica.
Shouldn't we first try to find obvious etymologies, and then try to derive the regular sound changes and explain the less obvious ones?
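One way to make the consonant changes listed above checkable is to write them as ordered rewrite rules and run the reconstructed roots through them. The following is only a minimal sketch: the ASCII notation (kj, gj, h1 to h3), the rule ordering, and the function name `apply_sound_changes` are assumptions made for illustration. The proposed a/u vocalism and the epenthetic e are not modeled, so the output is not expected to match the attested names.

```python
import re

# Consonant changes as listed in the post, in ASCII notation.
# Ordering is an assumption: the uncertain rules run first so that
# "kjm" is tried before the shorter pattern "kj".
CERTAIN_RULES = [
    ("kw", "k"),   # *kw > k  (Colapis)
    ("kj", "k"),   # *kj > k  (Cibalae, Incerum)
    ("bh", "p"),   # *bh > p  (Papuk)
    ("gj", "gh"),  # *gj > gh (Papugh)
]
UNCERTAIN_RULES = [
    ("kjm", "ym"),  # *kjm > ym (Aenona), marked uncertain in the post
    ("mr", "b"),    # *mr > b   (Bosut), marked uncertain in the post
]

def apply_sound_changes(root, include_uncertain=False):
    """Apply the listed changes to a root or compound such as '*kwol+*h2ep'."""
    # Drop the reconstruction asterisks and the compound separator.
    form = root.replace("*", "").replace("+", "")
    # Loss of the laryngeals h1, h2, h3.
    form = re.sub(r"h[123]", "", form)
    rules = (UNCERTAIN_RULES if include_uncertain else []) + CERTAIN_RULES
    for old, new in rules:
        form = form.replace(old, new)
    return form

for root in ("*kwol+*h2ep", "*bhebhogj", "*h1eyn+*kjer"):
    print(root, "->", apply_sound_changes(root))
```

Running every proposed root through the same rule list makes it easier to spot an etymology that only works with an extra, ad hoc change.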
Quote
But even if so, it's hard to verify (with any independent third factor) that those etymologies are more than coincidence (or that all of them are).
I don't know. Look, the only places in which there were Illyrian thermae in Croatia were called: Issa, Balissa and Iasa. What's the probability of the same element occurring in all three places with Illyrian thermae if it was a coincidence? And of it being so simply explainable as derived from PIE *yos (spring)?
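The coincidence question can at least be framed numerically. A minimal sketch, assuming one had a list of ancient toponyms from the region, a count of how many of them contain the element in question, and that the spa sites were otherwise an arbitrary draw from that list: the chance that all of them contain the element is then a hypergeometric probability. The function name and the counts in the example are placeholders, not real data.

```python
from math import comb

def prob_all_sites_share(total_names, names_with_element, sites):
    """Chance that every one of `sites` names, drawn at random without
    replacement from `total_names`, contains the element."""
    if sites > names_with_element:
        return 0.0
    return comb(names_with_element, sites) / comb(total_names, sites)

# Placeholder counts for illustration only (hypothetical, not from a corpus).
p = prob_all_sites_share(total_names=200, names_with_element=10, sites=3)
print(f"{p:.6f}")
```

The result is only as good as the two counts that go into it, and those would have to come from an actual inventory of attested names.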
« Last Edit: August 18, 2017, 08:01:33 AM by FlatAssembler »

Offline FlatAssembler

Re: Croatian toponyms
« Reply #4 on: August 19, 2017, 04:20:48 AM »
I'll share a few more of my ideas until I get another response. I believe I know where the toponym Scardona (the ancient name for Skradin) comes from. It's explainable as Proto-Indo-European *(s)kwor-dhos (big cliff). The same root can perhaps be seen in Cersia (the ancient name for the island of Cres).

Offline FlatAssembler

Re: Croatian toponyms
« Reply #5 on: August 19, 2017, 06:26:56 AM »
I know this is a bit of a politicized issue, but does anyone here have an idea where the name "Croatia" comes from? I have a theory which, as far as I can tell, nobody has suggested before. See, the Croatian word for "Croat" is "Hrvat". This could mean "one from the river *Hrva". Then the hydronym *Hrva can be analyzed as PIE *ser-h2ekw-eh2. What do you think?
To give some more context, the first mention of the name "Croat" is on the Tanais Tablets on the east coast of the Sea of Azov.

Offline Daniel

Re: Croatian toponyms
« Reply #6 on: August 19, 2017, 01:32:28 PM »
Quote
Quote
The only potentially helpful commentary I could give would be to point out that some of them look quite transparent, so perhaps they are too convenient/easy (and plausible), and therefore just coincidental?
Which ones? And why would being transparent indicate that it's just a coincidence? Sound changes are derived from the obvious etymologies.
It reminds me of the proposed cognate between Indo-European and Semitic/Afro-Asiatic for 'bull' (something along the lines of 'taurus'), which did indeed look similar. The best counterargument (aside from a lack of systematic correspondences) was simply that the time depth was too great for it to not have changed more substantially to be less transparent. (Of course your derivations are not quite that transparent, and they also have a shallower time depth, so my comment was not as strong an objection.)

Quote
Shouldn't we first try to find obvious etymologies, and then try to derive the regular sound changes and explain the less obvious ones?
Sure! But the important distinction is whether you're trying to determine a large-scale statistical question (like "are these two languages related?") or trying to identify small-scale data points accurately (like "what is the etymology of 'Croatia'?"). Statistical methods (even informal, traditional approaches) work because you have enough data that even if one data point is off the whole analysis still works out (maybe one proposed cognate-- or a few-- just developed by coincidence, but overall most of the proposed cognates are still accurate). But the comparative method and reconstruction in general involve a lot of guessing, which is fine if you're going for the statistical approach, but it's much, much harder to provide direct evidence for individual etymologies. Even for various well-known words with some direct evidence about their original usage (one example is "OK"), we don't know exactly what their etymology is. Of course many of the examples you're looking at probably do have simple (even obvious) etymologies, but it's hard to verify that you're correct, because it's a sample of one (i.e., not really statistics at all), so there's no way to compare your results to know if you're right. I suppose we could look at all of your proposed etymologies and get a sense that they're reasonable (as I've said) and then use a statistical comparison of your personal skill but that's a sort of ad hominem argument rather than real statistics about any individual data point. Again, I'm not saying you're wrong, but I also don't know how to verify that you're right.

And for most of these there are already other proposed etymologies, right? How would you propose selecting one rather than the other? Let's say for the sake of argument that you're right about half of your proposed etymologies. In that case, which ones? I don't know how to answer that question. Plausibility is not evidence per se. One possibility would be to try to find a pattern (for example, you notice a lot of rivers being relevant to your proposed etymologies, so if you can find something systematic in that maybe it would help), but it would still end up being about individual data points overall.

Etymology is not a major focus of linguistics publishing these days, but there's still some work in philology on the subject, and actually toponyms in particular are still frequently researched. So I imagine you could publish some of these results in a journal if that interests you. But it would simply be one perspective with no clear way to decide whether your proposal is right. (On the other hand, looking at known sound changes you might be able to find some flaws in the proposals of others and try to falsify those, which is all we can really do in science. Show flaws in the others, and no known flaws in yours, thus leaving yours as the only known remaining possibility.)

Offline FlatAssembler

Re: Croatian toponyms
« Reply #7 on: August 20, 2017, 05:48:56 AM »
I'm glad to hear another educated opinion!
Quote
The best counterargument (aside from a lack of systematic correspondences) was simply that the time depth was too great for it to not have changed more substantially to be less transparent.
I am not sure I understand. Of course the proportion of words that look similar is higher among cognates than in those that aren't cognates (aside from borrowing). If there are a lot of cognates between languages, of course there are going to be cognates such as Croatian "svinja" and English "swine" or Croatian "sunce" and English "sun". It's not like all the phonemes are bound to change to dissimilar phonemes in a few thousand years.

This is not really related to the topic, but why are you guys so much against the attempts to reconstruct older proto-languages? If the first human languages were sign languages that gradually evolved into fully spoken languages, isn't it reasonable to assume that all spoken languages share a common ancestor? Though I personally wouldn't assume an especially close connection between Indo-European and Uralic, as most of the people who try to reconstruct an older proto-language do, but between Indo-European and Austronesian. Look at the pronouns. Most of the proto-languages have a nasal in the 1st person singular, while both Indo-European and Austronesian have a velar. In PIE, it's *egjoh2, in PAN, it's *aku. Then look at the PAN Swadesh list. Doesn't it seem to you that PIE *r corresponds to PAN *l, that PIE *s corresponds to PAN *q and that PIE *d corresponds to PAN *d?
*treys (three)-*telu (three)
*romk (hand)-*lima (hand/five)
*ser (to flow)-*qalur (to flow)
*skend (skin)-*qanic (skin)
*stembh (to walk)-*qaqay (foot)
*smew (smoke)-*qabu (ash)
*serw (to watch)-*qalayaw (day)
*bheh2s (to talk)-*baqbaq (mouth)
*dwoh1 (two)-*dusa (two)
*dyews (sky)-*daya (upwards/height/sky)
*danu (river)-*danaw (lake)
Yes, I should study PAN a lot more before making such extraordinary statements, but why wouldn't this method be legitimate? I may be missing something very important.
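As a quick check on the list above, one can tabulate how often each pair of initial consonants recurs. This sketch uses only the eleven pairs given in this post and reduces each form to its first letter, so *bh and *b are not distinguished and medial correspondences such as the r/l in *treys and *telu are not counted. It says nothing about whether the counts exceed what chance would produce.

```python
from collections import Counter

# (PIE form, PAN form) pairs exactly as listed in the post.
PAIRS = [
    ("treys", "telu"), ("romk", "lima"), ("ser", "qalur"),
    ("skend", "qanic"), ("stembh", "qaqay"), ("smew", "qabu"),
    ("serw", "qalayaw"), ("bheh2s", "baqbaq"), ("dwoh1", "dusa"),
    ("dyews", "daya"), ("danu", "danaw"),
]

def initial_correspondences(pairs):
    """Count how often each (PIE initial, PAN initial) pairing occurs."""
    return Counter((pie[0], pan[0]) for pie, pan in pairs)

counts = initial_correspondences(PAIRS)
for (pie, pan), n in counts.most_common():
    print(f"PIE *{pie}- : PAN *{pan}-  x{n}")
```

A fuller test would run the same tabulation on the whole Swadesh list, including the pairs that do not fit, rather than on a hand-picked subset.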
Quote
Statistical methods (even informal, traditional approaches) work because you have enough data that even if one data point is off the whole analysis still works out (maybe one proposed cognate-- or a few-- just developed by coincidence, but overall most of the proposed cognates are still accurate).
Why do you think those statistical methods work any better than common sense does? If you count the English dictionary equivalents of the Croatian words that start with a 't', do you think that a significant portion of them will start with a 'th'? Wouldn't the early loanwords and coincidences average them out?
Quote
Of course many of the examples you're looking at probably do have simple (even obvious) etymologies, but it's hard to verify that you're correct, because it's a sample of one (i.e., not really statistics at all), so there's no way to compare your results to know if you're right.
Well, finding a few hydronyms such as Colapis or Serapia, or a toponym near a river such as Andautonia, in a Native American language would prove that my methodology is flawed. But I don't think there are any.
Quote
And for most of these there are already other proposed etymologies, right? How would you propose selecting one rather than the other?
Well, for example, far away from the sea, there is a marshy valley called Mariniana. In language A, this means "of the marine". In language B, this means "marshy valley". Which etymology is more likely? Every child knows the answer. It's B.
My most serious opponent is probably the Croatian etymologist Petar Skok, who ascribed many toponyms to a "Mediterranean substratum", that is, the supposed Pelasgian language, spoken all the way from Italy to Turkey. His evidence was allegedly the same or similar elements appearing in toponyms in that area. However, he didn't ascribe any meaning to those supposedly repeating elements. So, I am pretty sure that his hypothesis is invalid. I waged a Wikipedia war against his hypotheses there:
https://en.wikipedia.org/wiki/Talk:Vis_(town)
As for the interpretations using Croatian, listen, Croatian language in most of the cases doesn't give any explanation of a toponym. If "Krndija" really comes from "krčiti", where did the 'č' disappear? And what does the ending "-ndija" mean? If "Daljok" really comes from "dal", what does the ending "-jok" mean? I am a native speaker of Croatian and I know something about linguistics, so I think I can safely tell you that those etymologies are invalid. And when Croatian does give some explanation, it's almost always complete nonsense. If there was a town called "Far", would you assume that its name means "far away", or would you assume it's actually not an English name? That's why I assume the toponym "Dalj" doesn't come from Croatian. And why would anyone call a river "a female wolf" (Vuka)? And the same goes for the other supposed explanations using Croatian which I tried to contradict.
Quote
Etymology is not a major focus of linguistics publishing these days.
Perhaps because it takes less knowledge of linguistics to discuss morphosyntax than to discuss etymology. I could probably write two full pages about what part of speech is "plus" in Croatian without doing any actual research. But that probably won't be interesting to anybody whom I know.
« Last Edit: August 20, 2017, 10:31:33 AM by FlatAssembler »

Offline Daniel

Re: Croatian toponyms
« Reply #8 on: August 20, 2017, 05:36:53 PM »
Quote
I am not sure I understand. Of course the proportion of words that look similar is higher among cognates than in those that aren't cognates (aside from borrowing). If there are a lot of cognates between languages, of course there are going to be cognates such as Croatian "svinja" and English "swine" or Croatian "sunce" and English "sun". It's not like all the phonemes are bound to change to dissimilar phonemes in a few thousand years.
Correct. I don't disagree with you about that, and the problem is less extreme than the example I gave. My point was that the most likely flaw in your argument (if there is one) would be that you are finding relatively obvious examples (that look convincing!), and that perhaps the true etymology is more obscure and less transparent than that. It's not about any of your particular derivations or anything, just a general comment on something to watch out for: for example, you might find some of your etymologies more intuitive than existing proposals (which might actually be correct because they are less intuitive, if you follow what I mean).

It is great though to see you incorporating the known sound changes into your work, so as I've said, your proposals seem plausible to me, and unfortunately I can't comment further than that.

Quote
This is not really related to the topic, but why are you guys so much against the attempts to reconstruct older proto-languages? If the first human languages were sign languages that gradually evolved into fully spoken languages, isn't it reasonable to assume that all spoken languages share a common ancestor?
That's a topic for another thread! But I'd be happy to discuss it. In short, after about 10,000 years the changes have piled up too much to be sure of what's a borrowing or a chance resemblance. I personally am not opposed to trying to go back farther, but there's a point where it becomes almost impossible. Reconstructing proto-world is just not possible, since it's at least 5 or 10 times older than that 10,000-year 'limit'. Maybe we can push the limit to 20,000 years, but 50,000 or 100,000? The two methods we have are broad statistical comparison, which breaks down around 10,000 years (the noise can no longer be distinguished from the real data), or reconstructing based on our reconstructions, which is a viable possibility but ends up stretching our hypotheses (and layering them) too much to be sure of much. I could go into more detail (start a new topic?) if you'd like.

Quote
Why do you think those statistical methods work any better than common sense does? If you count the English dictionary equivalents of the Croatian words that start with a 't', do you think that a significant portion of them will start with a 'th'? Wouldn't the early loanwords and coincidences average them out?
I'm not arguing for blind statistics at all! What I'm saying is that if you have a large sample, you'll probably be right on average. If you have a single data point, there's no reason to assume you'll be right that one time. Imagine a game of darts where you win by guessing what the thrower is trying to hit. With a sample of one, you can only assume they were trying to hit wherever the dart landed (or near there). With a sample of 10 or 100 or more, you can much more likely guess where they were aiming overall. So figuring out whether two languages are related (do they share a lot of cognates) is a much easier question than figuring out the etymology of an individual word-- if you're wrong sometimes, you can still be right statistically, but not for an individual data point. Thus without any direct evidence, an individual etymology is harder to figure out and there's no clear way to verify that you're right.
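The darts analogy can be simulated directly. A minimal sketch, assuming throws scatter around the aim point with Gaussian noise (the spread, trial count, seed, and function name are arbitrary choices for illustration): the guess is the average landing point, and its typical error shrinks as the number of throws grows.

```python
import math
import random

def mean_guess_error(n_throws, trials=2000, spread=1.0, seed=0):
    """Average distance between the true aim point (0, 0) and a guess
    made by averaging the landing points of n_throws darts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, spread) for _ in range(n_throws)]
        ys = [rng.gauss(0.0, spread) for _ in range(n_throws)]
        guess_x = sum(xs) / n_throws
        guess_y = sum(ys) / n_throws
        total += math.hypot(guess_x, guess_y)
    return total / trials

# Compare a single throw with larger samples.
for n in (1, 10, 100):
    print(n, round(mean_guess_error(n), 3))
```

With one throw the guess is simply wherever the dart landed, which is the situation of a single etymology with no independent evidence.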

But that doesn't mean we should just dump the dictionary into a statistical program and see what happens! (And far too often non-linguists do something along those lines and claim they've solved a major linguistics problem like where the homeland of Indo-European was-- they're almost always wrong, even though, unfortunately, papers like that can get a lot of attention in the world outside of linguistics.)

Quote
Well, finding a few hydronyms such as Colapis or Serapia, or a toponym near a river such as Andautonia, in a Native American language would prove that my methodology is flawed. But I don't think there are any.
Your methodology isn't flawed. It's just not clear how to verify that your individual results are correct. (They might be!!) And it's certain that in general many places are named after bodies of water.

Quote
My most serious opponent is probably the Croatian etymologist Petar Skok, who ascribed many toponyms to a "Mediterranean substratum", that is, the supposed Pelasgian language, spoken all the way from Italy to Turkey. His evidence was allegedly the same or similar elements appearing in toponyms in that area. However, he didn't ascribe any meaning to those supposedly repeating elements. So, I am pretty sure that his hypothesis is invalid.
Unclear. You might be right. The question is how to decide between the two.

Quote
I waged a Wikipedia war against his hypotheses there:
Wikipedia is not the place for original research, and a 'war' there is utterly meaningless, even if you 'win'. If you have new ideas, publish them. Sometimes science is no better than just sharing ideas and hoping other people read them. After they are published (and especially if others start to accept them as useful/good/reliable/etc. ideas) then you can cite them on Wikipedia (and elsewhere). And in publishing them you'll get a peer review from someone who really knows the specific subject in detail and can determine whether your contribution is worth sharing. The result will be your hypothesis as a competing hypothesis with the others out there. And time will tell what happens. Maybe nothing. Maybe just two plausible hypotheses with no clear way (currently) to decide between the two (that's what I observe at the moment). But yours will be on par with the other then.

Quote
I am a native speaker of Croatian and I know something about linguistics, so I think I can safely tell you that those etymologies are invalid.
Neither of those is a qualification for knowing etymologies. Native speakers have no intuition whatsoever about the historical state of their languages (I wasn't born knowing how Shakespeare wrote, for example, or with any intuition that my American English was somehow inherited from England, or before that older Germanic tribes). And knowing something about linguistics means you can attempt to figure it out (as you are doing!), not that your answers are necessarily correct.

Quote
And the same goes for the other supposed explanations using Croatian which I tried to contradict.
The way to dispute them is to show that you have an alternative explanation that is equally coherent, and that there may be some flaws in the existing argumentation. If so, you probably have something publishable. Again this isn't my personal area so I can't tell you if that's the case.

Quote
Quote
Etymology is not a major focus of linguistics publishing these days.
Perhaps because it takes less knowledge of linguistics to discuss morphosyntax than to discuss etymology. I could probably write two full pages about what part of speech is "plus" in Croatian without doing any actual research. But that probably won't be interesting to anybody whom I know.
Not that I'm personally offended, but that's a major oversimplification of things.
I don't disagree with you that there are some things in linguistics (including some published papers) that don't take much background to understand. But there are vastly more things that take a lot of background (e.g., a PhD, or lots and lots of self study). If you really know the subject well, the next step is to publish. If you're not ready for that, that's fine! But until you can publish about it, you're not at that level yet. (Admittedly there are some theory-internal political factors for some journals in terms of what will be likely to be published, but rarely will it matter who you are because for good journals the process is double-blind so they don't know your name or whether you're affiliated with a university or whatever; watch out for predatory pay-to-publish journals that will publish just about anything they receive, but that's another story).

As for the real reasons why etymology isn't a major topic in linguistics today, here are some bullet points:
--Historical linguistics in general is not a major topic anymore. I personally don't like that (and some others also want to promote it), but most research now is about how languages 'work' (in the brain, in the mind, in society, in their structure, etc.) rather than where they came from.
--Emphasis is on how languages work rather than the details of individual lexical items.
--Etymology is still alive and well (relatively speaking) in the field of philology, which is like linguistics, but more interested in describing details like etymology rather than an explanatory science (linguistics) about how language works.
--Plus, etymology requires high level knowledge of individual languages, and (unfortunately in my opinion) the personal language knowledge of individual linguists seems to be in decline, in favor of things like statistical, experimental, computational, corpus, etc., methods. To oversimplify, not all linguists today have studied Latin and Greek (as was once common), and few study Indo-European roots in detail-- Historical linguists do, but there are fewer of them, as I said.
--Overall, there is also the fact that much of Historical Linguistics and etymology, in the sense that it interested the first historical linguists a couple centuries ago (and which eventually led to modern Linguistics as a science), has actually been solved. There are dictionaries of etymologies for Proto-Indo-European roots, and while they are far from flawless, a large amount of research has been done, and has been done well. So linguists have moved on to other projects, while of course some still work on issues related to those original topics.

(Personally I study syntax, as well as morphology and semantics, mostly from the perspective of 'how language works', but also from historical and comparative perspectives. So I can certainly comment on this topic, but you'll need to find someone who specializes in Indo-European etymology, and perhaps Croatian in particular, to give you specialized feedback on this topic. It's nice to see you working on it though!)
« Last Edit: August 20, 2017, 08:20:04 PM by Daniel »

Offline FlatAssembler

Re: Croatian toponyms
« Reply #9 on: August 21, 2017, 07:15:11 AM »
Sorry if I misinterpreted what you were trying to say. I have poor reading and listening comprehension in general, both in English and in Croatian (and people often lose their patience with me). As for my idea about PAN and PIE being genetically related, I posted it on a forum where I know one expert on PAN who also knows something about PIE. Let's see what he says (I know my arguments are more than likely not convincing):
https://www.theflatearthsociety.org/forum/index.php?topic=70713.msg1944261#msg1944261
Though I don't know any forum where I can discuss the Croatian toponyms. I tried to post some of my ideas on a Croatian forum about linguistics (which is actually mostly about the prescriptive grammar), but only one person there knew something about PIE. I just got bombarded with irrelevant and nonsensical critiques I wasn't prepared for.
https://www.forum.hr/showpost.php?p=64488535&postcount=143
Which forum would you recommend me?

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1576
  • Country: us
    • English
Re: Croatian toponyms
« Reply #10 on: August 21, 2017, 06:51:08 PM »
Quote
As for my idea about PAN and PIE being genetically related...
At some extreme time depth, probably. But not more closely than other groupings (for example there is relatively more evidence for Indo-European and Uralic being related, and there are also the proposals for Nostratic and Eurasiatic).

flatearthsociety forums? Discuss where you want of course, but I'd be surprised if that's the best place for scientific perspectives!

Quote
Though I don't know any forum where I can discuss the Croatian toponyms. ...
Which forum would you recommend me?
It depends on what your goals are. If you are seeking feedback in order to revise your ideas, then this approach may work. However, very few people could be considered experts in Croatian toponyms (perhaps only a handful) so you actually might do better contacting them personally (find some academic profiles for professors who do research on the topic and send them an email). You might find someone at various forums online, but since there are so few there's no way of guessing where they might be. And internet forums are not going to provide you great opportunities for such a specialized topic!

If your goal instead is to share your ideas more broadly, then you should seriously consider publishing them. You'll get feedback. It's not uncommon for papers to be rejected (even from full-time academic researchers) but the advantage then is having the feedback from the peer review. Another probably easier way to start off would be to present your work at a linguistics (toponymy?) conference. You could even just stop by to see what's going on at a conference you're not presenting at (there may be some fee for attendance). You can see dozens of upcoming conferences at linguistlist.org. Note that for most conferences you must submit your work at least 3 months ahead of time, often 6 months (or more). Are you in Croatia? There may be some conferences locally for you, but if not you'll find many in not so distant countries-- Germany, Italy, etc. They happen all the time. If you aren't interested in a full-time academic career (so you don't want to go to grad school in linguistics, or end up as a professor) you can still keep up with things by reading current research and by attending conferences as an 'independent scholar'. The review process should also be blind so that you won't be rejected simply because you aren't affiliated with a university. If you write a decent abstract, it will probably be accepted somewhere. (Don't submit to multiple conferences at once, but you can revise and resubmit elsewhere if it's rejected.)

The type of feedback you'll get will vary based on specialization: as I said very few people will specialize in Croatian toponyms. But you can get some feedback from Croatian experts. And you can get some feedback from historical linguists who can check the sound changes in your derivations, etc. Overall the reviewers will most likely just be looking to see that your approach seems reasonable overall, rather than fact-checking any individual details, so it's up to you to present your best work, and then see what happens at the conference, or during the article's peer review. Hope that helps!

That said, there are some other places online you can try to talk to people. Nothing specific comes to mind for this topic, though. You might find some people with Croatian interests on Croatian forums, as you're already trying, and perhaps some of them have an interest/hobby in etymology. The forum here is a bit slow these days, but I'm always hoping more people will start using it, so it's nice to have active members like you around-- sorry we don't have any Croatian toponymy experts though!

Offline FlatAssembler

  • Linguist
  • ***
  • Posts: 78
Re: Croatian toponyms
« Reply #11 on: August 22, 2017, 11:30:28 AM »
Quote
At some extreme time depth, probably. But not more closely than other groupings (for example there is relatively more evidence for Indo-European and Uralic being related, and there are also the proposals for Nostratic and Eurasiatic).
What kind of evidence counts there actually? I have found a simple sound law (that PIE *s corresponds to PAN *q), and there are six examples of that on the Swadesh list. What's the probability of that if they aren't closely related?
Check my math. We are mostly dealing with 2-consonantal roots. Let's be generous and say I allowed myself the semantic drift of 3 words. Both proto-languages have about 20 consonants. So, if the word in PIE has an *s, the probability that I find a word in PAN in which PIE *s corresponds to PAN *q is 1-((1-(1/20))^3)^2=26%. The Swadesh list has 100 words, so we can expect 100/20=5 words where the first consonant in PAN is *q, and 5 words where the second consonant is *q, so that there are 10 words where PIE *s can potentially correspond to PAN *q. The probability that the rule coincidentally works in 6/10 words is 1-((1-0.26^6)^10)=0.31%. The probability of finding such a pattern in two truly unrelated languages is 1-(1-0.0031)^20=6%. That's pretty low.
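The arithmetic can be checked with a few lines of Python. This is only a sketch: the names `at_least_once`, `p_word`, `p_rule` and `p_final` are made up for illustration, and the first step assumes the 26% means "at least one hit in 3 candidate words times 2 consonant positions, each with a 1/20 chance". Intermediate values are rounded the same way as in the post, so carrying full precision would give slightly larger figures.

```python
# Sketch of the back-of-the-envelope estimate; all names are illustrative.

def at_least_once(p: float, trials: int) -> float:
    """Probability that an event with per-trial probability p happens at least once."""
    return 1 - (1 - p) ** trials

consonants = 20      # assumed inventory size of each proto-language
drift_words = 3      # candidate words allowed by semantic drift
root_positions = 2   # consonant slots in a two-consonant root

# Chance that one PIE word containing *s finds a PAN match containing *q,
# rounded to two decimals as in the post.
p_word = round(at_least_once(1 / consonants, drift_words * root_positions), 2)

# Next step of the post: 1 - (1 - p_word**6)**10, rounded to four decimals.
p_rule = round(at_least_once(p_word ** 6, 10), 4)

# Final step of the post: 1 - (1 - p_rule)**20.
p_final = at_least_once(p_rule, 20)

print(f"{p_word:.0%}  {p_rule:.2%}  {p_final:.0%}")
```

The script only reproduces the numbers. It does not test whether the model behind them (independent positions, equally frequent consonants) is a reasonable one.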
Quote
flatearthsociety forums? Discuss where you want of course, but I'd be surprised if that's the best place for scientific perspectives!
Well, I wanted to have some fun with my knowledge of linguistics and discuss a conspiracy theory that the laryngeal theory was incorrect and that the linguists who study the Anatolian languages are hiding that. The Flat Earth Society forum appeared to be a good place to do that.
Quote
Another probably easier way to start off would be to present your work at a linguistics (toponymy?) conference.
Look, I am a 17-year-old from a small town in Croatia, so I don't think such options are available to me. Thanks for your time, though.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1576
  • Country: us
    • English
Re: Croatian toponyms
« Reply #12 on: August 24, 2017, 02:50:00 AM »
Quote
What kind of evidence counts there actually?
Extensive, systematic evidence. Not chance similarities. I would recommend reading a textbook about historical linguistics. A good one would cover this. Hock & Joseph's "Language History, Language Change, and Language Relationship" (or Hock's longer and more detailed "Principles of Historical Linguistics") or Campbell's "Historical Linguistics" are good introductory texts.

Quote
I have found a simple sound law (that PIE *s corresponds to PAN *q), and there are six examples of that on the Swadesh list. What's the probability of that if they aren't closely related?
.... The probability of finding such a pattern in two truly unrelated languages is 1-(1-0.0031)^20=6%. That's pretty low.
You have calculated the probability of that particular correspondence, which indeed is very unlikely. But you weren't just looking for one particular correspondence. You were looking for any correspondence-- q/s, q/r, q/q, etc., etc. So with about 20 consonants in each language, there are 20*20 = 400 possible correspondences. So you must consider what the odds are that ANY of the 400 correspondences would appear.

Let's look at an analogous problem:
What are the odds that you share the same birthday with your friend? 1/365, right? (Or 1/366, or 1/365.25 to be precise.)
This means it would be surprising if any friends had the same birthday, right? No. Surprising things happen all the time. That's how statistics work.
Imagine a classroom: what are the odds that someone else in the room shares your birthday? If there are 30 other people, then that's roughly 30/365-- low odds, but still possible. But what about any two people in the room sharing a birthday with each other?

See: https://en.wikipedia.org/wiki/Birthday_problem

With just 23 people in the room, it is more likely than not that some pair shares a birthday! So finding any correspondence between languages is not unlikely at all.

So assuming .31% is right (it's more complicated than that*), that's roughly 1/322 (.31/100). 322 is close to 365, so we can use the birthday problem by analogy. Let's assume 20 sounds in each language: there's a 41.1% chance of a 'shared birthday' (see Wikipedia), or in our case a chance sound correspondence. Actually, it's a little higher than that (or substantially, because 365-322 is a big difference when multiplied out a lot as fractions), so let's say about 50%. That's very different from the 0.31% you calculated!
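For anyone who wants to check the birthday figures, here is a minimal Python sketch. The function name `shared_birthday` is made up for illustration, and it assumes every 'birthday' is equally likely. It takes the analogy at face value: 20 'people' stand for the 20 sounds, and 322 'days' stand for the 1-in-322 odds.

```python
def shared_birthday(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share one of `days` equally likely birthdays."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

print(f"{shared_birthday(20):.1%}")       # 20 people, 365 days
print(f"{shared_birthday(20, 322):.1%}")  # 20 'sounds', 322 'days'
print(f"{shared_birthday(23):.1%}")       # classic break-even point
```

With 322 days instead of 365 the result comes out at roughly 45%, so "about 50%" is a fair round number for the analogy.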

(*The main problem is starting with the Swadesh list because it's too small: you need systematic correspondences across a wide variety of words. Ideally, the correspondences would be exceptionless, setting aside borrowings, and of course any contextually conditioned words. The Swadesh list gives you a good starting point, but it's far from enough to fully determine linguistic relationship.)

--

Regardless, that math is irrelevant because it's a more complex problem. Finding a recurring correspondence in different words is actually less likely than the math given by either of us. But I'm certain that if you looked at more vocabulary you would start to find exceptions, so it's again irrelevant anyway. The issue is that you would need to find genuine cognates (not just the same number in the Swadesh list) to start to establish real patterns. It's an interesting coincidence, but coincidences happen all the time. Just think about how the odds are stacked against an American PhD student 'randomly' talking to a Croatian 17-year-old online! (But the odds are very high that any two 'random' people would be talking online-- a very different question.)

Anyway, there is a great article here that can explain chance correspondences in much more detail (and much better than I did above):
http://www.zompist.com/chance.htm
Having no chance correspondences would actually be surprising. The burden of proof is more than that: it's finding multiple, systematic, widespread correspondences.

Quote
Well, I wanted to have some fun with my knowledge of linguistics and discuss a conspiracy theory that the laryngeal theory was incorrect and that the linguists who study the Anatolian languages are hiding that. The Flat Earth Society forum appeared to be a good place to do that.
No judgment here. It's up to you to decide what you want to do: play linguistics, or do linguistics. The internet is not usually too serious/academic, but there are exceptions.

Quote
Look, I am a 17-year-old from a small town in Croatia, so I don't think such options are available to me. Thanks for your time, though.
Good to get an early start! There are several different ways I can respond to this:

You're young. That's good. There's time to do a lot. Do you plan to go to a university to study linguistics? Many linguistics undergrads hadn't even heard of it until they started studying it. You're ahead, not behind. But if you don't want to go to a university to study linguistics (or take classes, if you're planning to study something else), you can still do some things on your own.

As a high school student, you might still be able to go to a conference. You're right that you may not be able to present, but you could stop by. Or at least you can in a year or two. I don't know of any laws (I'm not from Croatia though!) about age limits on attending conferences, as long as you are polite/respectful and you pay the registration fee (you can pay the lower 'student' price!). There are conferences multiple times per year in Croatia (I see them on Linguist List!), so you really could do that. Or you can wait a few years until you're at a university studying linguistics. Or until later when you're a grad student. Undergrads are not usually required to do research or attend conferences or publish. But they can, and it's a good experience, usually enjoyable too. And you get to find out if you like it-- is that a good career, or would you rather do something else?

Overall, whether you're 17 or 70, you do not need to be affiliated with a university to publish or present at a conference. That's why the process is "blind". That means they won't know your name, your age, your status, your university, or anything else. The term "Independent Scholar" is used in place of "Studies/Works at University of X" for anyone not affiliated with a university. Maybe a hobbyist, maybe a fieldworker who doesn't currently have a university job, maybe a professional working outside of academia, or maybe someone who couldn't find a job in academia this year. Or even a high school student. Really, I don't think there are rules against that. And they won't know it until you get accepted. It's possible the editors (who would know who you are) won't accept the paper for review, but that's unlikely. And usually impossible for conferences where the abstracts are submitted anonymously online, so there's no one to "check" and stop you. Of course there is an expectation of responsibility that you will be presenting legitimate research (I'd ask for a professor at a local university to give you some comments on an abstract you plan to submit!), but if your intention is genuine, I don't see a problem with that. But again, you can wait a few years if you want, no hurry!

I will add, though, that academia is not a fast thing. So you have a lot of time. But you also should not expect fast results. Publishing can take a year or more. Presenting at a conference often requires planning 6 months in advance. And most importantly, just because you have some ideas (even good ideas) doesn't mean they'll get out there and be accepted right away. It might take a whole career (if they're "big" ideas).

Most importantly I would sincerely recommend that you visit a linguistics conference (not presenting yet, unless you want to try that also) just to get a sense of it. I didn't guess you were 17 (from your phrasing I could tell you were not in academia, or just starting), and it seems like you'd really like this stuff. So check it out if you can. And if you can't, hopefully you can in a few years. Regardless, there's nothing wrong with emailing a professor with similar interests and asking about the topic, if you plan to keep working on it, and maybe get some advice about your plans for going to a university!
« Last Edit: August 24, 2017, 02:55:17 AM by Daniel »

Offline FlatAssembler

  • Linguist
  • ***
  • Posts: 78
Re: Croatian toponyms
« Reply #13 on: August 24, 2017, 12:05:35 PM »
Quote
So you must consider what the odds are that ANY of the 400 correspondences would appear.
I have been thinking about this a little more. You are right, I didn't calculate exactly what I intended. 6% is the probability of such a rule appearing to exist for PIE *s. The probability of it existing for any phoneme is actually 1-((1-0.06)^20)=71%.
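The same step as a short Python sketch, with a couple of other per-phoneme values for comparison. The name `any_phoneme` is made up for illustration, and the calculation assumes the 20 phonemes behave independently, which is a simplification.

```python
def any_phoneme(p_single: float, phonemes: int = 20) -> float:
    """Chance that an apparent rule shows up for at least one of `phonemes` phonemes,
    assuming each has the same independent probability p_single."""
    return 1 - (1 - p_single) ** phonemes

# 0.06 is the figure from the earlier estimate; the other values only show sensitivity.
for p in (0.03, 0.06, 0.10):
    print(f"per-phoneme {p:.0%} -> any phoneme {any_phoneme(p):.0%}")
```

Even a modest per-phoneme chance adds up quickly once all 20 phonemes are allowed.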
Quote
The main problem is starting with the Swadesh list because it's too small: you need systematic correspondences across a wide variety of words.
Really? I always thought limiting yourself to the basic vocabulary (like the Swadesh list) makes your argument more compelling.
Quote
Anyway, there is a great article here that can explain chance correspondences in much more detail (and much better than I did above).
I've actually read that article before. It talks only about chance similarities that appear if you aren't attempting to find regular sound laws. It doesn't say that the same problem appears even if you try to establish them, let alone explain the correct mathematical formula for predicting them.
That's probably because very few people would notice something like that when looking at the Proto-Austronesian Swadesh list, and most of those who do will immediately dismiss it.
Quote
It's up to you to decide what you want to do: play linguistics, or do linguistics.
You know, I like reading non-mainstream science. It's always interesting to ask yourself: "Suppose you are wrong about some well-known or usually assumed thing. How could you know it?". It's not good to read alternative science before looking into the mainstream science, because, well, they will bombard you with controversial claims you won't be able to evaluate. If you don't know much about astronomy, it's counter-productive to look into the arguments made by the Flat-Earthers. But if you know a lot about linguistics, perhaps it would be interesting to hear and discuss the arguments for the Laryngeal Theory being false.
Quote
Do you plan to go to a university to study linguistics?
Lots of things interest me. I also know a lot about the programming languages, and people tell me I could make the most money by studying informatics.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1576
  • Country: us
    • English
Re: Croatian toponyms
« Reply #14 on: August 26, 2017, 02:19:56 AM »
Quote
I always thought limiting yourself to the basic vocabulary (like the Swadesh list) makes your argument more compelling.
That's also an important consideration. But it's a little complicated. Think of the Swadesh list as just a rough draft of what might be not-borrowed vocabulary. That's it. It's not in any sense complete, and it's not even accurate in some cases (some of those words don't actually have basic equivalents in some languages). It's just some good guesses about what "basic" vocabulary might be in different languages-- and it's a reasonable starting point, but even Swadesh had a few versions-- look on Wikipedia and you'll see a version with 100 words, another with 207, etc. Those words are unlikely to be borrowed, and also likely to exist in many languages, so they're good first words to consider. But if two languages are related, you should find much more substantial patterns (also) beyond the list!

Quote
It talks only about chance similarities that appear if you aren't attempting to find regular sound laws. It doesn't say that the same problem appears even if you try to establish them, let alone explain the correct mathematical formula for predicting them.
That's true. And the fact that you find what appears to be (in a small set) a regular correspondence is interesting. But it's not entirely surprising because you're looking at a small set of data, and allowing any correspondence. The sections there about semantic flexibility are also important-- you'll notice many Indo-European correspondences are not to words that mean exactly the same thing, but to words that can be attributed similar origins with sometimes complex etymologies. At a time depth of (proposed) PIE and PAN unity, those changes would be substantial, so I again refer to the 'bull' example for Semitic/Indo-European, which was just too good (transparent) to be true.

Quote
You know, I like reading non-mainstream science.
Nothing wrong with that-- assuming it's science rather than unfounded theories. But where to draw the line? An unanswerable question...

And yes, you can get some good ideas by looking at bad ideas, but don't take them too seriously.

Quote
But if you know a lot about linguistics, perhaps it would be interesting to hear and discuss the arguments for the Laryngeal Theory being false.
Correct! But that's not 'non-mainstream science'. That's fact-checking, hypothesizing, and (attempted) falsification, all important parts of normal mainstream science.

Quote
Lots of things interest me. I also know a lot about the programming languages, and people tell me I could make the most money by studying informatics.
What you study should interest you, and it should also prepare you for life after your studies (whatever that is, and however you want). So if you want to make lots of money, linguistics is almost certainly not the answer. And linguistics is almost purely academic, so a BA in linguistics is in itself not going to get you very far (unless you plan to also become a language teacher, or something like that), and even an MA (in theoretical linguistics) won't do much-- it's a PhD that you need, if you want to do research, as in these questions you're discussing here. But there are many other ways to approach this also: get a double major, so you can do some linguistics while also studying something else ('for the real world'), or take a few classes at least. At my university they recently created a joint major (single program) that is linguistics and computer science. Regardless, linguistics is usually a relatively small program (few classes) so it's often easy to combine it with something else (as a minor or second major). And if you end up not wanting to stay in academia (getting a PhD and becoming a professor), then there are some other paths as well: a master's in teaching ESL, a master's in speech & hearing science / speech pathology, and so on. Getting a degree in Applied Linguistics (broadly defined) is better for the real world, while theoretical linguistics will more directly address your questions here. As for informatics and computer science, if you learn statistics and/or programming, there are also some ways to combine that with your linguistics knowledge, whether that's working on something like Google Translate, or doing cutting edge statistical research on historical linguistics (but hopefully with a strong background in linguistics to avoid blindly running algorithms on 'big data' and having bad results, as in even some published research).

Regardless, you don't need to decide anything now. But if it interests you, try taking a class (and if you're really motivated, go to a conference!) when you have some time.