Recent Posts

English / Derivatives in -ist
« Last post by vox on August 20, 2018, 03:41:08 AM »
I have questions about the category of these derivatives in English.
1. Can all of them be both nouns and adjectives?
2. Do morphologists consider this a conversion A>N or N>A?
3. Are there adjectival doublets in -ist and -istic that are perfectly synonymous?
Thank you.
Feedback, Help and Forum Policy / Re: Why is this forum Eurocentric?
« Last post by Daniel on August 18, 2018, 03:11:18 PM »
The longest threads speak to individuals, not general interest.

As for Eurocentrism in linguistics in general, it's because more linguists come from Europe and America than from other areas, and it's something that many linguists hope will improve, both by focusing our research on other areas and by getting people from those areas and speakers of non-Western languages involved as linguists themselves. There's also a bit of a feedback loop. For example, a historical linguistics class could certainly be taught based on the Austronesian family: there's plenty of data, and it's conveniently very clean data (often languages separated on different islands and not in too much contact after separation). But we've all been taught mostly about Indo-European history, for reasons of tradition, available textbooks, the knowledge of our instructors, etc. Things are improving, slowly, especially as some of the areas of research about European languages begin to dry up (relatively few big new things are left to be discovered, although even for English many minor details that could have profound influence on theory are still being investigated and debated). So, for example, as Indo-Europeanists turn their attention elsewhere, because they already have a reasonable understanding of IE and aren't answering many new questions, we might see more of a focus on other areas.

The familiar will probably always dominate discussions, questions and research, but we can work on also representing other things well, and also shifting what is familiar. Intro to Linguistics classes around the world should emphasize signed languages more, for example, because that is a severely understudied area, even more than geographically diverse oral languages. It's happening, but slowly.

Especially at the amateur level, it shouldn't be surprising that people are interested in things close to them. But there are also plenty of questions about things elsewhere including on this forum.
Feedback, Help and Forum Policy / Why is this forum Eurocentric?
« Last post by Voynichologist on August 18, 2018, 10:13:55 AM »
So, what do you think, why is this forum Eurocentric? I mean, the longest thread on this forum (according to the statistics) is "The Language of Old Europe" and the third longest thread is called "Croatian toponyms". Why do those things interest people more than the Native American or the Aboriginal languages?
Computational Linguistics / Re: Frequency and log-likelihood applied to a subcorpus
« Last post by Daniel on August 17, 2018, 11:39:58 AM »
Your hesitation is appropriate.

However, what is interesting about Zipf's Law is that it applies regardless of the particular data set you're looking at. (Obviously more approximately on smaller data sets.) So, it's possible there could be reasons for using it even on small data sets in your approach. But yes, look into existing research, and make sure there is a principled reason for approaching it that way.
Computational Linguistics / Re: Frequency and log-likelihood applied to a subcorpus
« Last post by Nemi on August 17, 2018, 11:37:25 AM »
Daniel, thank you very much for your comment!  :) You're right, I will search more for already existing measures. I wish I had more mathematical background; maybe it's also time for me to read more about statistics in general. I was too tempted to use log-likelihood just because the results were amazing, but the more I think about it, the more I'm convinced that it is not statistically relevant here. It would be if I used the whole corpus, so maybe I will look into that.
Computational Linguistics / Re: Frequency and log-likelihood applied to a subcorpus
« Last post by Daniel on August 17, 2018, 11:04:15 AM »
I can't answer this from the technical perspective of a computational linguist, but I'll try to comment briefly.

When we use a statistical test, we are usually comparing two things. We want to show (roughly) that one is significantly more likely than another. There are many more complex statistical tests, but they all boil down to basically that core. This means that you can either compare two values to see whether one is bigger than the other (whether the difference between them is significant), or test whether one value is significantly different from an expected value (e.g., 0, where the question is whether an unexpected event is observed at all, or 50%, as in a coin toss). Since the things you will be comparing will be similar, the comparison is usually unit-less. So in that sense, a log-transform may not be a problem at all. Of course it might mess up the statistical test mathematically, so beware of that. But in principle this may be fine, if you pick an appropriate statistical test. (Note that your intuitions about statistical distributions are irrelevant, so you shouldn't use logs just because you think they look better, but rely instead on a statistical test to tell you whether the results are 'interesting', that is, significant, in a technical sense.)
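
To make the "compare one value to an expected value" case concrete, here is a minimal sketch of an exact two-sided binomial test against a 50% expectation, as in the coin toss example. The function name and the numbers are just illustrative assumptions, not a recommendation for any particular corpus study.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p0=0.5):
    """Exact two-sided binomial p-value against an expected rate p0.

    Sums the probability of every outcome that is at most as likely
    as the observed one (hypothetical helper, for illustration only).
    """
    probs = [
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small tolerance so that ties are not lost to floating-point noise.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)

# 5 heads in 10 tosses is exactly what 50% predicts; 10 in 10 is not.
p_typical = binomial_two_sided_p(5, 10)
p_extreme = binomial_two_sided_p(10, 10)
```

A small p-value (as for 10 heads out of 10) is what "significantly different from the expected value" means in the technical sense; the test decides that, not how the numbers look.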

Regarding frequency and log likelihood for words in a corpus, you probably already know about Zipf's law.
That partially addresses what you're observing: the relationship between a word's frequency and its frequency rank is typically close to linear when both are plotted on logarithmic scales. So there may be some sound motivation behind your decision.
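
As a rough illustration of the Zipf's law point, one could rank the word counts and fit a straight line to log(frequency) against log(rank); a slope near -1 is the classic Zipfian pattern. This is only a sketch using the standard library, with a made-up toy token list; `zipf_slope` is a hypothetical helper name.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy data built to be exactly Zipfian: the word at rank r occurs 120 / r times.
toy_tokens = []
for rank in range(1, 6):
    toy_tokens.extend([f"w{rank}"] * (120 // rank))

slope = zipf_slope(toy_tokens)
```

Real corpora, and especially small subcorpora, will only follow the line approximately.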

The best answer is to consult current sources (e.g., journal articles) doing something like what you're trying to do, and then using the same (or slightly adjusted) methodology for your project. That's how you're likely to get published, at least. If you don't have a good reason for doing something else, that's where to start.

There are various approaches to dealing with frequencies in corpora (that's really the whole field of corpus linguistics!), and there are a number of ways to try to standardize values, find a balance between frequent words and frequent pairings to look at interesting patterns of attraction, etc. Don't reinvent the wheel if there's already a method out there, and that way your results will be comparable anyway.
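
For what it's worth, here is a minimal sketch of one such measure, the log-likelihood ratio (G2) computed from a 2x2 contingency table. The cell layout and all counts below are assumptions made up for illustration. Note that the table needs counts from outside the keyword window (cells b and d), so the score is inherently a comparison against the rest of the corpus.

```python
import math

def log_likelihood(a, b, c, d):
    """G2 for a 2x2 table of counts.

    a: target word inside the keyword window
    b: target word in the rest of the corpus
    c: all other tokens inside the keyword window
    d: all other tokens in the rest of the corpus
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count for each cell = row total * column total / grand total.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# A word that is equally common inside and outside the window scores 0,
# however frequent it is; a word concentrated near the keyword scores high.
g2_evenly_spread = log_likelihood(10, 90, 90, 810)
g2_attracted = log_likelihood(50, 50, 50, 850)
```

That is why words like "and" top a raw frequency list but drop out of a log-likelihood list: they are frequent everywhere, not specifically near the keyword.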
Linguist's Lounge / Re: Introduction Thread
« Last post by Nemi on August 17, 2018, 10:50:25 AM »
Hi to all! My name is Nemi and I just started to dive into Corpus Linguistics. Currently I'm doing some research on Twitter. As I am pretty new to all this (my former field was philosophy), I have a lot of questions and was very happy when I discovered this forum. I hope my questions are not too benign and apologize in advance if they are :) Thank you for reading and nice to meet all of you!
Computational Linguistics / Frequency and log-likelihood applied to a subcorpus
« Last post by Nemi on August 17, 2018, 10:42:44 AM »
Dear all,

I am currently involved in a research project regarding a Twitter Corpus. I'd like to analyze the emotional tendency of tweets with specific keywords. Right now I'm compiling lists of frequent collocations and here is my question:

I compiled frequency lists of collocations for my keywords, which are fine. However, log-likelihood scores give me much more interesting results, to be honest. They contain more hashtags and emotional adjectives, whereas frequency (obviously) lists "and" and "I" at the top.
As I am a noob regarding statistics, however, I'm not sure whether I can use the log-likelihood score to analyze a subcorpus without comparing it to something else. (I know that the MI-score has shortcomings, as it ranks less common words more highly. That's why I ruled it out.) I have the feeling that the log-likelihood results would be a pitfall, since the score is not measured against the whole corpus.

When I analyze collocations and just want to know what people mostly type around a keyword, not in comparison to anything (not even the whole corpus itself!), is it sufficient just to go by frequency, or would a corpus linguist cringe? The scores measure probability, but since I already have my subcorpus targeting my keyword, is the score even applicable?

Best regards
Linguist's Lounge / Re: Is Burushaski, at its core, an Indo-European language?
« Last post by Daniel on August 17, 2018, 09:06:33 AM »
That's a fringe argument. Similar ideas have been proposed for decades, and they just aren't conclusive, because if there is such a relationship, it's too distant to demonstrate beyond a reasonable doubt. I'm not particularly opposed to the idea, but just about any theory for Burushaski's relationship is as good as any other at this point. So all you get from papers like that is some limited evidence in favor of one argument, and it's really just "circumstantial" (in the legal sense) because it doesn't show the relationship beyond a reasonable doubt. It would merely be compatible with the explanation and, if the explanation is correct, is probably a remaining trace of the original relationship.

In the end, questions like this are interesting, but there's a reason they haven't been answered. Of course Burushaski is related to something-- that should surprise no one. But we haven't yet been able to show which living language(s) that would be.

There are three reasons to remain skeptical:
1. If this were shown beyond a reasonable doubt, it would be big news. The fact that linguists have not reached consensus is telling.
2. Indo-European is an easy and lazy possibility. Given the extreme time depth (something like 10,000 years or more?), there are many, many other viable possibilities, and the Indo-European-centrism is just an artifact of the sociology of the field. It's no more likely than any other family to be related to Burushaski, but there has been a huge amount of research trying to link those up, so in a sense this is almost evidence against that particular possibility. It might be right, but why not also look just as hard at a possible connection to Tungusic or whatever other families haven't gotten that much attention? In the end, if you look hard enough for patterns, you'll find something that looks like a pattern, but that doesn't mean it's really evidence, especially when it's weak.
3. Clear evidence of relationships comes from widespread, systematic correspondences in languages. Pointing out individual features (e.g., pronominal paradigms) that happen to look similar in two languages leaves open the very real possibility of coincidence, or even borrowing. When we see one similarity, but an absence of other corresponding similarities, we should be skeptical. There is a reasonable argument for an ancient relationship between Indo-European and Uralic based on some pronominal forms, for example, and while I wouldn't completely reject the possibility, I'll remain skeptical until we know more.

It's good to think about these issues in terms of two subtly but importantly different questions:
1. What is our best guess at the moment?
2. Should we accept that guess as a probable fact?

There's a "why not" argument for thinking Burushaski might be related to Indo-European, and that's not entirely unreasonable. Maybe! But a "why not" or "best guess" argument gives us no reason to assume it is correct or a resolved issue.

The problem is that many people looking at these issues want answers, rather than more questions or just interesting discussions. And all we have most of the time is complicated details, not conclusions.

To frame this from a slightly different perspective, consider the macro-family theories such as Eurasiatic or Nostratic. On the one hand, the current iterations of those theories are probably wrong and do not have enough evidence to back them up. However, I personally like them in the sense that they give me a vague intuitive idea of what the past might have been like. So I think something like that (I don't even mind calling it "Eurasiatic"), in a very vague sense (that is, plus or minus several language families, unknown at this point), is probably a reasonable understanding. But I am not saying in any sense that either (1) "Eurasiatic" as a narrow hypothesis is correct, or (2) we have enough evidence to reject alternative explanations.

In short, the quality of explanation corresponds to the availability of data. There's nothing fundamentally wrong with having a working understanding of a problem as your current best guess, but there is something wrong with taking that to the next step. It's the difference between "maybe" or "I wonder", and "Scientists have discovered that Burushaski is Indo-European!"

It's fine to be interested in these questions, but it comes with the risk of never finding definite answers.

So, what do I think? I think I don't know. Actually, I know I don't know. And at this point for the case of Burushaski, the evidence is too weak to even be leaning one way or another as a working hypothesis. It's related to something, surely, and at some time depth (maybe very extreme, even undetectably so), but Indo-European is not really a better explanation, given available data, than anything else, at least not by much of a margin. The most likely alternative explanation, given even only the data in that paper, would be ancient contact between the families. So, at this point, we don't know. That explanation might turn out to eventually be correct, but I wouldn't bet on it yet.
Linguist's Lounge / Is Burushaski, at its core, an Indo-European language?
« Last post by Voynichologist on August 17, 2018, 05:50:27 AM »
So, guys, what do you think about the thesis that Burushaski is, at its core, an Indo-European language?