Author Topic: About describing local history by means of linguistics  (Read 1785 times)

Offline Juan_007

  • New Linguist
  • *
  • Posts: 3
About describing local history by means of linguistics
« on: February 04, 2016, 07:07:51 AM »
Dear linguists,

Since this is my first ever post on this forum, I want to give a short introduction about myself (if you are not interested, you can skip ahead to the line of ***’s, after which I formulate my main questions/discussion points):

I'm a Master Marine Sciences student at Utrecht University (The Netherlands) with a Bachelor in Biology. My Master is a very interdisciplinary study combining Earth Sciences, Chemistry, Biology and Physics, and I even had an obligatory course about international law.

I have always had a keen interest in history in a wide variety of manifestations. So much so that I want to invest my precious time, effort, wits and creativity in combining my passions for History and Environmental Sciences by developing a very ambitious project.

My academic studies are for a great part about reconstructing and describing our physical world by taking and analysing samples and by descriptive and predictive (computer) models.

Now I’m looking for ways to describe our historical world in a similarly accurate way, and to combine this with reconstructions in my own field of science. And I think linguistics could assist me in this to some extent. I have been browsing the internet with the aim of building a database describing political and geographical human history. That is: how did people(s) spread across and occupy the Earth, and how did they organize themselves?
I’ve set the timeframe I’m interested in at roughly 1100 - 1800 AD, partly because of the quantity and quality of data available in numerous respects.
I can define a certain population, or ethnicity, as a group of people sharing (either, both, or all) a similar culture, religion, or language.

****************************************************
My main question is:
How can linguistics assist me in describing world history at a local level?

I think I’m not so much interested in studying the specific structures, morphologies, lexicons, or whatever you call them, of specific languages/dialects. But I am interested in the relations and similarities between languages. Since realizing this I’ve been browsing the internet learning about language families, how they are constructed and what they sometimes tell us about how people spread or organized themselves.
(side note:) I have downloaded a list with ISO 639-3 codes, but comparing this with the database of peoples and ethnicities I have been constructing over the years, I found out it is very comprehensive, but not complete. Moreover I ideally want to use endonyms when describing peoples, and ISO 639 is full of exonyms, or even pejoratives. But I reckon for the scope of the database I want to construct I need some universal standard to be able to collect data from different sources.

(1) Are there databases available with numerical data on the relationships between languages? That is, with language distances and mutual intelligibility within language families and from one family to another? I’ve been reading literature about how families are constructed, but never found comprehensive data on the numerical similarities or distances on which the authors base their conclusions.

(2) Would it be possible to come up with a model to account for how these relationships change due to factors like language contact intensity, geographical isolation, lexical borrowing, etc.?

(3) Any factors you can point me towards which are highly relevant to how mutual similarity is influenced, or comprehensive literature about this subject would be highly appreciated!

I'm very much looking forward to any input, views and thoughts from members of this forum about what I just typed.

Best wishes!
« Last Edit: February 04, 2016, 07:10:38 AM by Juan_007 »

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1575
  • Country: us
    • English
Re: About describing local history by means of linguistics
« Reply #1 on: February 04, 2016, 01:04:12 PM »
This sounds like an interesting project, but also difficult.

Very broadly, I would say that computational historical linguistics is on the verge of being able to attempt something like your project, but you won't find lots of options in the current research and not a lot of free data (or even anything already compiled anywhere).

The other problem is that the best computer models tend to be poorly linguistically informed. For example:
http://science.sciencemag.org/content/337/6097/957.full
That paper has some good techniques but some bad data and the results are largely rejected by linguists. (I can link you to some arguments about why if you want.)

The trouble is that a lot of historical linguistics is educated guesses, and computers don't make good educated guesses based on the data that is already itself just educated guesses. A linguist needs to look at the details to determine plausibility, and the results are only as strong as the data itself. So we may know that there were different dialects at one time, but we won't know exactly who spoke them where, while we can make guesses about all of the above.

The problem is that the bad results are interesting and then taken by mainstream media and others outside of linguistics as more reliable than they are, along with some cool graphics:
http://www.businessinsider.com/map-how-indo-european-languages-evolved-2014-12

The best thing you can do is take the reliable data that you have and then systematically fit the less reliable data into that model, making predictions as appropriate.

But I don't know how well any of this will help you if your goal is then to look for correlations with other data sets. Perhaps it will work if your scale is large enough, e.g., Latin dialects vs. Germanic dialects.
Welcome to Linguist Forum! If you have any questions, please ask.

Offline Juan_007

  • New Linguist
  • *
  • Posts: 3
Re: About describing local history by means of linguistics
« Reply #2 on: February 05, 2016, 06:08:41 PM »
Difficult indeed. It's a highly complex problem for which the solution may need more than just one approach. But that will not necessarily stop me.

First off, that movie is a bit ridiculous even in my inexpert view. But I believe that’s why you showed it to me. I don’t think it really communicates anything effectively, and what it tries to communicate is at best an oversimplification of a misinterpretation…
It makes it look like there was no language in Europe before this origin of the Indo-European languages, while Homo sapiens sapiens already occupied most of Europe some 20,000 years before the scope of the article. And then we can even discuss whether another assumption made in the video is true: that any species other than (both) H. sapiens species would lack language. I know from memory there is no consensus anymore about H. neanderthalensis lacking complex language. There is abundant circumstantial evidence in favour of complex language in H. neanderthalensis; they seem to have had all the tools in place to develop verbal, syntactic and therewith ‘complex’ language. There was a time when ‘we’ coexisted with other Homo species, and extensive contact with H. neanderthalensis might be hard to deny. This also opens up the possibility of occasional interbreeding and, for instance, of influencing each other’s languages. To me this makes the discussion about language origin almost futile, especially when discussed within a H. sapiens sapiens context. I believe I’ve just discovered I’m an agnostic when it comes to the origin of language.

But back to topic!
I do think the article you showed me is useful for the discussion I want to hold. The method in the article is fortunately not very alien to me. I’ve worked with phylogeny more than once during my Biology study. I’m aware of most of the principles (should be all; I don’t want to shame my professors, I just mistrust how my brain works).
But in Biology big databases have been created (and are being expanded probably as we speak) with abundant (genetic) data of species (plants, animals, fungi, viruses, bacteria) from multiple sources and when you have access to these data you can run software to create your own trees based off available data, using different approaches relatively easily by changing the way the software works. The software will make the comparisons for you. During my Bachelor for instance, students had to confirm HIV actually came from contact with Chimpanzees by using these kinds of databases and software.
So, since I’m looking into linguistics, I’m wondering if there is anything similar in this branch of science. Maybe not as sophisticated as what I just described; ideally there would be a database full of morphological and lexical data and some software to determine cognates and lexicostatistics between languages using any desired approach.
But really anything that centralizes data on just linguistic distances or intelligibility would in my opinion be useful, not only to my project, but to linguistic studies in general. I’ve looked into Ethnologue, but there you only occasionally see how mutually intelligible a language is with one or a couple of others. It would take me an immense amount of time to build a database that way, and it would still be Swiss cheese consisting almost solely of holes. Is there really nowhere in the linguistic literature where the statistics behind the formation of a certain family of languages are collectively published? How else could Terence Kaufman come up with his ‘conservative’ grouping of South American languages that he believes aren’t very likely to be broken apart?

On a side note: I get that classification gets difficult when you only have one or a handful of words available, and it will never be as reliable as taking DNA samples (though in days now passed phylogeny in Biology also relied more heavily on morphological differences, and as long as you keep the unreliabilities this presents in mind you can accomplish a lot), but isn’t it very useful to centralize the data we do have? It would make it easier to apply different approaches to the same body of data.
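To make the "run software on shared data and get a tree" idea concrete, here is a minimal sketch of UPGMA, one of the simplest distance-based tree-building methods from biological phylogenetics. It is only an illustration under stated assumptions: the labels A to D are placeholders and the distances are invented toy numbers, not measurements of any real languages. Whether a tree built this way means anything depends entirely on how the distances were obtained.

```python
def upgma(distances):
    """Build a tree by repeatedly merging the two closest clusters.

    distances: dict mapping frozenset({x, y}) -> float for every pair of labels.
    Returns the tree as nested tuples.
    """
    d = dict(distances)
    # Every leaf starts as a cluster of size 1.
    sizes = {leaf: 1 for pair in d for leaf in pair}
    while len(sizes) > 1:
        closest = min(d, key=d.get)           # pair with the smallest distance
        a, b = sorted(closest, key=str)       # fixed order for reproducibility
        merged = (a, b)
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average of the distances from a and b to 'other'.
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / new_size
        # Drop every distance that still refers to the old clusters a or b.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))


# Toy distances, invented purely for illustration (0 = identical, 1 = unrelated).
TOY_DISTANCES = {
    frozenset(("A", "B")): 0.10,
    frozenset(("A", "C")): 0.30,
    frozenset(("B", "C")): 0.35,
    frozenset(("A", "D")): 0.80,
    frozenset(("B", "D")): 0.85,
    frozenset(("C", "D")): 0.82,
}

tree = upgma(TOY_DISTANCES)
print(tree)
```

With these toy numbers A and B are joined first, then C, then D. Swapping in a different distance measure only means changing the input dictionary, which is the kind of flexibility a centralized data set would allow.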

For my project I’m not so much interested in the data underlying the statistics, but more so in the abstract outcomes themselves, like linguistic distance, lexical similarity, or mutual intelligibility (which, like I said, sometimes pops up in the descriptions in Ethnologue). Is there any way to obtain this kind of data relatively quickly? Or would it mean ticking off every source on the construction of every language family in the hopes of finding the desired numerical, lexicostatistical data?

The scale of my project is ultimately global. I want to describe what the world looked like at a local level between roughly 1100 - 1800 AD.

For Europe there is relatively abundant and comprehensive data on what it looked like politically, so I would be able to determine the smallest possible territorial building blocks for a map with relevant local data (e.g.: parish, manor, lordship, amt, etc.). There are some (sometimes profound) differences between different regions and cultures of Europe, but with a little care, prudence, caution, a lot of reading and patience I think I'm on the right path to creating a map with smallest possible territorial units of comparable ‘value’. My working definition is the smallest locality that local inhabitants would have identified themselves with at that time. This could be church/religion related in Christian Europe, although you could also look at it from a different perspective and look for where an inhabitant would seek justice. Oftentimes these perspectives overlap in different ways in different regions and areas, so it is a bit of guesswork here and there and by no means universally true. I think it should be possible somehow to do the same for the Arabic world and Asiatic civilizations of the period with their counterparts of administrative organization, but I haven't looked into it more than skin deep for now.
Last summer I worked on actually mapping my home country, The Netherlands, in QGIS (and took in big chunks of contemporary Belgium in the same breath). For now it consists of a lot of dots sitting on reference maps, because drawing all the boundaries between them accurately will take a lot of time. So I won't do that until I'm satisfied with which dots to use, so to speak, and I can’t be completely sure of that until I have a clear image of what the total world map will look like. Alternatively, I could let QGIS draw some generic boundaries for me, which would not be historically accurate, and then adjust some of them.
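As a rough illustration of what "generic boundaries" around dots could mean, the sketch below assigns every cell of a small grid to its nearest dot, which is a discrete version of a Voronoi (nearest-neighbour) partition. This is plain Python, not QGIS code. The place names and coordinates are hypothetical, and the coordinates are assumed to be planar (already projected), so straight-line distance is meaningful.

```python
from math import hypot

# Hypothetical dots: name -> (x, y) in some projected coordinate system.
SEEDS = {
    "parish_a": (1.0, 1.0),
    "parish_b": (5.0, 1.0),
    "parish_c": (3.0, 4.0),
}


def nearest_seed(point, seeds):
    """Return the name of the seed closest to 'point' (straight-line distance)."""
    x, y = point
    return min(seeds, key=lambda name: hypot(x - seeds[name][0], y - seeds[name][1]))


def rasterize(seeds, width, height):
    """Label each unit grid cell (by its centre point) with its nearest seed."""
    return [
        [nearest_seed((col + 0.5, row + 0.5), seeds) for col in range(width)]
        for row in range(height)
    ]


grid = rasterize(SEEDS, width=6, height=5)
for row in grid:
    # Print the last letter of each name so the three regions are visible.
    print("".join(name[-1] for name in row))
```

The borders between differently labelled cells are the generic boundaries. They carry no historical information at all, so they are only a starting point to be corrected against old maps.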
For England, Wales and Denmark I actually found GIS shapefiles with the kinds of results I'm ideally after in the national archives of respective countries. Complete with boundaries accurately drawn in GIS from dozens and dozens of old local maps as references. I was very thrilled to be granted access to download them for personal research. I guess I will need to get back to them once I want anything with it beyond personal research to negotiate about licences…
And I'm aware of similar efforts going on in Scotland, and I know there is a similar sort of GIS map created for Sweden out there somewhere, because I found a study on plague spreading in Sweden that made use of such a map.
All in all, it for once feels like I was born at exactly the right time.

I've also been working over the past few years on trying to determine what the Pre-Columbian Americas could have looked like politically and ethnically. I've often used this Wikipedia page as a starting point. There are many flaws and weaknesses in it, but overall it has been a helpful starting point. I’ve learned about how we use terms like ‘tribe’, ‘band’, ‘village’, ‘chiefdom’, ‘clan’, ‘kin group’ and many more in very different ways depending on who is talking and about what. Oftentimes definitions overlap and change from topic to topic (or even within the same topic). It can be a confusing mess.
For peoples with a (at least sufficiently) sedentary lifestyle and organization, I want to look at what data I can get on the number and locations of their villages. Then, for the cases where I can get very complete data, I will compare these locations with contemporary populated places and see if I can determine some kind of relation. That way I might be able to use data on contemporary populated places to visualize how peoples were located through time.
For some peoples there is data on how they’ve organized themselves in terms of local administration. Think about the big empire states and some of the better described peoples in parts of North America.
I’ve also discovered that South America is the most challenging area of the Americas in terms of coming up with any useful way of determining what kind and how many peoples were inhabiting the continent before western contact.
Even though I think of historical linguistics as a probable tool to bring some kind of order in the less well described areas of the world, even then things become sometimes blurry when a people, or group of peoples (it’s too blurry) speak about the same language, share very similar cultural and religious traits, but don’t recognize each other as being ‘same’ or ‘similar’. Almost opposite to for example the Mexica recognizing other Nahuatl speaking peoples with different cultural and/or religious traits as being somehow related or even similar. I even believe they recognized (some) Mayan peoples as being somehow related because of sharing common cultural and religious traits with their common ancestors the Toltec.
But that’s where linguistics comes in handy, I guess. I need it as a tool to give me some kind of order. I already note down which language a people spoke and, if we know it, the overarching grouping/family. But I feel there is more I can get out of this with applicable data. That’s when my need for a discussion with some proper linguists arose. And thinking about it now, it would probably be a very useful tool for African peoples as well. And to check how European linguistics relates to its political structures, if only to create a frame of reference for when I rely more heavily on linguistics for less intensively described regions of the world.

Offline Juan_007

  • New Linguist
  • *
  • Posts: 3
Re: About describing local history by means of linguistics
« Reply #3 on: February 05, 2016, 06:23:51 PM »
Discovered my hyperlink didn't work and I can't add one when editing my post.
Where I said "this Wikipedia page" I wanted to create a hyperlink with those words to this:
https://en.wikipedia.org/wiki/Classification_of_indigenous_peoples_of_the_Americas

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1575
  • Country: us
    • English
Re: About describing local history by means of linguistics
« Reply #4 on: February 08, 2016, 10:57:12 AM »
Here's a bit of a short reply for you:

1. The video is showing the origin of the Indo-European languages, which basically everyone agrees come from a shared ancestor somewhere in central Eurasia between 5,000 and 9,000 years ago. It doesn't mean that the unlabeled areas did not have speakers of other languages or that all languages are related to those. The video is pretty good. The problems are (1) the homeland is probably somewhere in/around modern Ukraine rather than in Anatolia (Turkey), and (2) the dating is probably closer to 6,000 than 8,000 years. The problems are due to letting a computer decide everything based on raw statistics rather than using solid and informed linguistic analysis.

There is no big central database that I am aware of. I would suggest possibly emailing the authors of that study and others.

The data that is generally used in experiments like this is based just on word lists, which are easy to get (if you can find a dictionary or find word lists somewhere else) but also problematic because words (whether they are similar across languages, or even looking at small sound changes in those shared words) don't really capture the nuance of the history of languages. It might give a ballpark answer, but it's really not going to do much more than that. And as for more specific data, we just don't have it in many cases.
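To give a concrete feel for what "comparing word lists" can mean computationally, here is a minimal sketch: a normalized edit (Levenshtein) distance averaged over aligned word lists. This is one simple option, not the method of the study linked above. The four-word lists use ordinary spelling rather than phonetic transcription and are far too short to mean anything; real work would use longer, phonetically transcribed lists and, ideally, cognate judgements by a linguist.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length, so the result is in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def list_distance(words_a, words_b) -> float:
    """Average normalized distance over word pairs with the same meaning."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)


# Meanings, in order: 'water', 'hand', 'night', 'stone' (orthographic forms).
WORD_LISTS = {
    "English": ["water", "hand", "night", "stone"],
    "Dutch":   ["water", "hand", "nacht", "steen"],
    "German":  ["wasser", "hand", "nacht", "stein"],
    "Spanish": ["agua", "mano", "noche", "piedra"],
}

for other in ("English", "German", "Spanish"):
    print("Dutch vs", other, round(list_distance(WORD_LISTS["Dutch"], WORD_LISTS[other]), 3))
```

A measure like this treats chance resemblances, loanwords and inherited words alike, which is exactly the limitation described above: it can give a ballpark figure, but not the history behind it.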

You might want to take a look at this video. Although it doesn't go into much depth, it gives a relevant perspective on how linguists typically research these things, and then you can look into the computational methods elsewhere, which as I said are basically the same idea of making word lists and comparing them.
http://www.pbs.org/wgbh/nova/transcripts/2120glang.html
Unfortunately I'm not seeing any links to the video itself online, but it looks like that transcript has the relevant information.

As for some computational methods, a friend of mine wrote his dissertation about comparing dialects (looking at dialect distance) in Arabic using word lists (and more complex approaches) computationally. The approaches could be used equally well for looking at historical aspects (assuming word lists are good data in the first place, which is not always a good assumption, see above).
http://hdl.handle.net/2142/78346

I don't think anyone would disagree that a centralized database is a good idea. But I don't think anyone has made it yet either. You can fairly easily find word lists for various languages, though.

Quote
The scale of my project is ultimately global. I want to describe what the world looked like at a local level between roughly 1100 - 1800 AD.
Sounds very interesting.

Quote
For Europe there is relatively abundant and comprehensive data on what it looked like politically, so I would be able to determine the smallest possible territorial building blocks for a map with relevant local data (e.g.: parish, manor, lordship, amt, etc.). ...
I'm certain that this information exists, but I don't believe it is centralized. One source you could consider is to look at dialect atlases (often written in the relevant local language!) which actually do a good job at a very local level, although probably mostly within the last 100-200 years, and maybe only for a few European countries (like Germany).

Quote
I've also been working over the past few years on trying to determine what the Pre-Columbian Americas could have looked like politically and ethnically....
Much harder. Certainly the general location of nations/peoples can be approximated, but it will be very challenging when you go back toward 1000 years. And regardless, it just isn't entirely known what different areas looked like.

In the end, you will need to either do what that video did and make a good guess based on perhaps bad data/analysis, or not make a guess at all and not present an answer. You could alternatively compile a lot of different opinions about what might possibly be the right answer, without selecting one yourself, but that doesn't make for a very interesting map.


As an example, take Proto-Indo-European: basically everyone agrees there was a single ancestor to all of those relevant languages, but we don't know when it was spoken, where it was spoken, or who spoke it. But we can make some guesses (which may be wrong). And that's both the best studied and most confident case like that. It is going back another 5000+ years before the time you're interested in, of course, but it just gives you a sense of the problem: we aren't even sure what languages are related to each other in much of the world (although there are lots of theories, some of which are better than others).


As for compiling maps and a central data source, this one is great:
http://wals.info/
But it's not going to give you historical data.


In short, you won't find any good/reliable answers in just one location. The only way to do this well is to spend a lot of time with various sources for each region and work out the best answer possible for that region at this time. It won't be the same source material for each region, and you'll find different (and sometimes similar) problems for each. The data is out there to do something, but not conveniently, nor necessarily reliably, at the level you'd like.
« Last Edit: February 08, 2016, 10:58:50 AM by djr33 »

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1575
  • Country: us
    • English
Re: About describing local history by means of linguistics
« Reply #5 on: February 09, 2016, 04:34:43 AM »
Just came across this, which might interest you:
https://doi.org/10.1515/dig.2006.2006.14.104