### Topic: Man vs. Beast

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #60 on: July 31, 2015, 01:33:21 AM »
But isn't that just a question of what kind of information you are measuring? So a particular information theoretic analysis may very well be incorrect. But can the entire approach of measuring information content in a domain be wrong? We just need to find the right domain.

#### jkpate

• Forum Regulars
• Linguist
• Posts: 130
• American English
##### Re: Man vs. Beast
« Reply #61 on: August 01, 2015, 01:48:18 AM »
Quote
That metaphor seems to hold up at the sentence level, but it ignores the fact that sentences only convey meaning in an assumed context.  However, once you start looking at the level of discourse processing, it breaks down rather quickly.

...

How does probability theory actually help you generate linguistic structure?  There are two sides to language--production and comprehension.  The best that probability theory can do for you is provide you with a cloud of more or less related words.  How do you assemble those words into structured phrases that can be understood in a given context?   Where does probability help you to decide the quantifier scope?  You can extract meaning from clouds of words in a document, but constructing the document requires knowledge about how to structure the information for a discourse context.  Probabilistic approaches help with certain types of linguistic processing, but they are a dead end when it comes to real text understanding.

Probability theory helps with generating structure by using graphical models. A graphical model describes relationships between (potentially infinite) random variables. Standard results in probability theory show how to perform inference for the values of some of those variables even if we never observe them. For example, for dependency parsing, we may propose a graphical model that includes a variable for each word, a variable for each potential directed arc between words, and a constraint that only variable configurations that correspond to a tree receive non-zero probability. To parse a particular sentence, or to update our grammar, we "clamp" the word variables to the values of the words of the sentence, and then use probabilistic inference techniques to compute the probability distribution over possible trees given those words or find the tree with the highest probability. This is how the Dependency Model with Valence works (and subsequent variants). The same basic strategy has since been pursued for CCG and Tree Substitution Grammar, and inspired a new kind of grammar called Adaptor Grammars.
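A minimal sketch of the dependency parsing setup described above, in Python. It is a brute-force toy and not the Dependency Model with Valence itself: the arc weights in `ARC_WEIGHTS`, the single-root restriction, and the names `arc_weight`, `is_tree`, and `tree_posterior` are all assumptions made for illustration. The words are clamped to the observed sentence, every assignment of heads is enumerated, non-tree configurations receive zero probability, and the remaining weights are normalized into a distribution over trees.

```python
import itertools

# Hypothetical arc weights (head word, dependent word); unlisted arcs get 1.0.
ARC_WEIGHTS = {
    ("ROOT", "barked"): 5.0,
    ("barked", "dog"): 4.0,
    ("dog", "the"): 4.0,
}

def arc_weight(head_word, dep_word):
    """Weight of one directed arc; an illustrative lookup table, not a trained model."""
    return ARC_WEIGHTS.get((head_word, dep_word), 1.0)

def is_tree(heads):
    """heads[i] is the head of word i+1, with 0 standing for an artificial ROOT.

    Returns True when exactly one word attaches to ROOT and every word
    reaches ROOT without running into a cycle.
    """
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False  # cycle (this also catches self-loops)
            seen.add(node)
            node = heads[node - 1]
    return True

def tree_posterior(words):
    """Distribution over dependency trees given the clamped word variables."""
    n = len(words)
    weights = {}
    for heads in itertools.product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue  # the tree constraint: non-trees get zero probability
        w = 1.0
        for dep, head in enumerate(heads, start=1):
            head_word = "ROOT" if head == 0 else words[head - 1]
            w *= arc_weight(head_word, words[dep - 1])
        weights[heads] = w
    total = sum(weights.values())
    return {heads: w / total for heads, w in weights.items()}

WORDS = ["the", "dog", "barked"]
posterior = tree_posterior(WORDS)
best = max(posterior, key=posterior.get)  # highest-probability tree
print(best, round(posterior[best], 3))
```

Enumeration is only workable for very short sentences, since the number of candidate trees grows exponentially with sentence length; practical parsers get the same quantities with dynamic programming.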

There has also been work on graphical models that relate strings to logical forms via unobserved syntactic structure, using CCG or Hyper-edge Replacement Grammars (like a context-free grammar for graphs). I'm not aware of a model that includes a probability distribution over discourse structures, such as those provided by Discourse Representation Theory, but I don't see any reason in principle that would be impossible. It's still just variables (that presumably have an infinite domain) that have various relationships with each other.

Broadly speaking, linguistic theories build structures by selecting reusable components (such as local subtrees and lambda expressions) from a potentially-infinite bag of possible components. Graphical models work exactly the same way, except they also define a probability distribution over different ways to assemble the components. All of the information-theoretic quantities are defined with respect to probability distributions, so any time we have a probability distribution we also have all the potentially-useful information-theoretic quantities.
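As a small illustration of that last point, here is a sketch in Python showing that once a distribution over analyses is in hand, surprisal, entropy, and KL divergence follow from it directly. The distribution `P_ANALYSES` and its labels are invented for the example; only the formulas are standard.

```python
import math

def surprisal(p):
    """Surprisal in bits of an outcome with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

# Hypothetical distribution over three ways of assembling the same components.
P_ANALYSES = {"analysis_a": 0.5, "analysis_b": 0.25, "analysis_c": 0.25}
UNIFORM = {k: 1.0 / len(P_ANALYSES) for k in P_ANALYSES}

print(entropy(P_ANALYSES))                 # uncertainty over analyses
print(surprisal(P_ANALYSES["analysis_b"]))  # cost of seeing analysis_b
print(kl_divergence(P_ANALYSES, UNIFORM))   # distance from a uniform guess
```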
« Last Edit: August 01, 2015, 03:25:50 AM by jkpate »
All models are wrong, but some are useful - George E P Box

#### Copernicus

• Linguist
• Posts: 61
##### Re: Man vs. Beast
« Reply #62 on: August 01, 2015, 09:45:07 AM »
Quote
But isn't that just a question of what kind of information you are measuring? So a particular information theoretic analysis may very well be incorrect. But can the entire approach of measuring information content in a domain be wrong? We just need to find the right domain.
The point I was trying to make is that context is not actually part of the signal, but information-theoretic approaches are all about signal processing.  They are very useful for many different types of text processing tasks, but they don't really work well for a model of how humans actually process language.  They analyze structure in signals, but they tend not to help us understand how the structure got into the signal or what it is there for in the first place.  The term "information" in "information theory" is really about the transformational processing of data from one form to another.  It is not really about understanding what natural language expressions mean.  For that, you need to have a theory that explains the relationship between thought and language.  Signal processing approaches do not.

#### Copernicus

• Linguist
• Posts: 61
##### Re: Man vs. Beast
« Reply #63 on: August 01, 2015, 10:49:05 AM »
Quote
Probability theory helps with generating structure by using graphical models. A graphical model describes relationships between (potentially infinite) random variables. Standard results in probability theory show how to perform inference for the values of some of those variables even if we never observe them. For example, for dependency parsing, we may propose a graphical model that includes a variable for each word, a variable for each potential directed arc between words, and a constraint that only variable configurations that correspond to a tree receive non-zero probability. To parse a particular sentence, or to update our grammar, we "clamp" the word variables to the values of the words of the sentence, and then use probabilistic inference techniques to compute the probability distribution over possible trees given those words or find the tree with the highest probability. This is how the Dependency Model with Valence works (and subsequent variants). The same basic strategy has since been pursued for CCG and Tree Substitution Grammar, and inspired a new kind of grammar called Adaptor Grammars.
OK, but I'm familiar with all of that.  I've worked in Natural Language Processing for a few decades, so I've seen variations on all of those approaches.  Right now, people are very interested in building "hybrid" parsers, which I think is what you are suggesting here.  Language generation is quite a bit more challenging than language analysis, but people have come up with marvelously clever techniques for dialog interactions.  Dialog modeling (which non-computational linguists like to call "discourse modeling") is now a very active area of research, and language generation is a part of that.  Speaking as a computational linguist, I would say that all of these approaches show varying degrees of promise for human-computer linguistic interfaces, but, speaking as a theoretical linguist, I would say that they cannot scale up to a plausible model of linguistic behavior in humans.  And, to be clear, I am talking about a causal model, not a statistical or probabilistic one.

Quote
There has also been work on graphical models that relate strings to logical forms via unobserved syntactic structure, using CCG or Hyper-edge Replacement Grammars (like a context-free grammar for graphs). I'm not aware of a model that includes a probability distribution over discourse structures, such as those provided by Discourse Representation Theory, but I don't see any reason in principle that would be impossible. It's still just variables (that presumably have an infinite domain) that have various relationships with each other.
I am more or less familiar with those approaches.  Always been a fan of Mark Steedman and categorial grammars.  The linguistic work is all very important as a contribution to our understanding of how linguistic signals are structured, and I do see a role for probabilistic approaches in human-computer discourse interactions.  However, if we are interested in more than just simulated discourse, I become more pessimistic that such approaches lead us in a useful direction.

Why do people choose to use the words they do?  The interesting thing about a linguistic expression is that the same expression can be used to convey completely different thoughts in different contexts, but different expressions can be used to convey the same thought in a specific discourse.  You can "canoe across a lake", "cross a lake in a canoe", or "go across a lake with a canoe".  The information content differs slightly in those three expressions in that they aren't interchangeable in all discourse contexts, but they are interchangeable in some.  In some contexts, "the boy died in a fire" is understood to mean that the fire caused his death.  In others, it could just mean that he died of some other cause while in a fire.  The expression itself has no inherent meaning, although it does contain information.  It only means something in a discourse context.

Quote
Broadly speaking, linguistic theories build structures by selecting reusable components (such as local subtrees and lambda expressions) from a potentially-infinite bag of possible components. Graphical models work exactly the same way, except they also define a probability distribution over different ways to assemble the components. All of the information-theoretic quantities are defined with respect to probability distributions, so any time we have a probability distribution we also have all the potentially-useful information-theoretic quantities.
Broadly speaking, I would agree with you.  However, "potentially-useful information-theoretic quantities" raises the question of whether they are useful as explanatory models of human linguistic behavior.  For that, you really need a causal model.  My view is that the causal model is essentially that of causing mental events to take place, i.e. understanding or comprehension.  It is about meaningful exchanges.  The communicative function of language ultimately drives its structural properties, although linguists have shown that one can describe those structural properties while largely ignoring their communicative function.  You can use statistical modeling to predict how likely a person is to use a relative clause, but that doesn't explain why that person uses a relative clause or how it affects the thinking of a listener.
« Last Edit: August 01, 2015, 10:58:25 AM by Copernicus »

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #64 on: August 01, 2015, 07:35:51 PM »
Quote
The point I was trying to make is that context is not actually part of the signal, but information-theoretic approaches are all about signal processing.  They are very useful for many different types of text processing tasks, but they don't really work well for a model of how humans actually process language.  They analyze structure in signals, but they tend not to help us understand how the structure got into the signal or what it is there for in the first place.  The term "information" in "information theory" is really about the transformational processing of data from one form to another.  It is not really about understanding what natural language expressions mean.  For that, you need to have a theory that explains the relationship between thought and language.  Signal processing approaches do not.
Why not include context in "the signal"? The "signal" component is based on a literal transmission over a certain channel (like radio) due to the history of Information Theory (e.g., Shannon's work). There's no reason we need to assume the auditory linguistic information is the only Information in the model. So then the question is, as I suggested earlier, picking the right signal, not whether somehow context is embedded within someone's speech.

#### Copernicus

• Linguist
• Posts: 61
##### Re: Man vs. Beast
« Reply #65 on: August 01, 2015, 10:14:00 PM »
Quote
The point I was trying to make is that context is not actually part of the signal, but information-theoretic approaches are all about signal processing.  They are very useful for many different types of text processing tasks, but they don't really work well for a model of how humans actually process language.  They analyze structure in signals, but they tend not to help us understand how the structure got into the signal or what it is there for in the first place.  The term "information" in "information theory" is really about the transformational processing of data from one form to another.  It is not really about understanding what natural language expressions mean.  For that, you need to have a theory that explains the relationship between thought and language.  Signal processing approaches do not.
Why not include context in "the signal"? The "signal" component is based on a literal transmission over a certain channel (like radio) due to the history of Information Theory (e.g., Shannon's work). There's no reason we need to assume the auditory linguistic information is the only Information in the model. So then the question is, as I suggested earlier, picking the right signal, not whether somehow context is embedded within someone's speech.
My response to that is that a signal is something different than the information content that it "contains".  Thoughts are not really transmitted, unless we are talking about real "mental telepathy".  In that case, one might consider brain waves, or some such thing, to be the "signal".  But that isn't the case.  Thought takes place independently of the linguistic signal.

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #66 on: August 02, 2015, 04:58:45 AM »
It isn't thought that is transmitted, no. It is linguistic information (e.g., acoustics) and the information regarding context (location of utterance, shared knowledge regarding the history of the conversation, shared goals, etc.).

We might say that the message is the thought, but the signal of course is not. The thought is encoded using both a literal linguistic signal and the information in the context. Taken together, we could see this as the whole linguistic signal: the speaker and hearer share information and thereby have related (but probably not identical) thoughts.

#### Copernicus

• Linguist
• Posts: 61
##### Re: Man vs. Beast
« Reply #67 on: August 02, 2015, 05:50:14 PM »
Quote
It isn't thought that is transmitted, no. It is linguistic information (e.g., acoustics) and the information regarding context (location of utterance, shared knowledge regarding the history of the conversation, shared goals, etc.).

We might say that the message is the thought, but the signal of course is not. The thought is encoded using both a literal linguistic signal and the information in the context. Taken together, we could see this as the whole linguistic signal: the speaker and hearer share information and thereby have related (but probably not identical) thoughts.
I used "thought" rather than "meaning", because I wanted to emphasize that thought is not linguistic in nature and does not require language to take place.  Generative semanticists typically assumed, back in the 1970s, that meaning was linguistically structured (so-called "natural logic").  That assumption failed.  So I don't have a problem with calling it the "message".  Ultimately, though, linguistic structure must somehow be tightly integrated with thought, because its purpose is to communicate thought.  Language is the "RNA" to mental "DNA".

To get back to the original topic here, it seems clear that other animals can think and plan very much like humans.  Broca's aphasia (or motor aphasia--loss of command of parts of the "grammar") does not seem to impair comprehension too seriously.  We can understand people who speak ungrammatically or agrammatically.  However, it is clear that language production strategies necessarily require some command of grammar.  For that reason, I consider linguistic grammars not to be neutral between perception and production strategies, but to be production-biased.  I would say that what generative linguists refer to as "the grammar" is essentially a mental process for producing language--an important component of what Chomsky called "performance".  He was wrong to assume that the purpose of the grammar was to calculate grammaticality intuitions.
« Last Edit: August 02, 2015, 05:57:01 PM by Copernicus »

#### Guijarro

• Forum Regulars
• Linguist
• Posts: 97
• Spanish
##### Re: Man vs. Beast
« Reply #68 on: August 05, 2015, 03:47:13 AM »
I see that the debate went on and on and my contribution (or rather, Jan's contribution) may be a bit late to make sense. However, this is what he has just written to me today in response to jkpate's answer to his message (a page above this one):

As far as I understand jkpate's point on context selection, I think a number of doubts are raised by findings about relevance.

1. Relevance crucially has to do with changes to one's beliefs (surprises, new information, funniness etc.) rather than with merely conforming to probabilistic frames of what is typically expected.

2. If the model is still meant to be a code model, all speakers and hearers would have to use the same frames of the world around them to construct the relevant contexts (as with a codebook). But relevance is always relevance to an individual, and the "sophisticated understanding" (Sperber) required in adult communication takes differences and changes in individual perspectives into account. I suspect that this is one thing which makes humans different from animals.

3. "Sophisticated understanding" is not just based in processes in individual brains, but in social processes happening between people. So a simulation of what goes on in an individual would not be enough to solve the puzzle.

I don't know if this makes any sense? Personally, I tend to think that the task is ultimately not tractable in an information-theoretic framework or even in a cognitive-science one, but that we need a relevance theory integrated with a social theory (I argued some of this in a Journal of Pragmatics article in 2010).

I think that I am getting lost in the information theory framework, so I would just need to understand (as simply as you may express it --think of me as a moron and try to put some sense into my thick skull as clearly as you can) the point you seem to be making.

For you, I take it, linguistic meaning is exactly the same as the speaker's meaning.

So, for instance,

If I say,

e.1. Here is a school for boys and girls of wealthy parents

There is no ambiguity in the syntactico-semantic meaning of the sentence, and, therefore, there's no need to indulge in non-coded inferencing processes to get MY meaning, which, in this case is, say, that the school allows all kinds of boys and only those girls whose parents are rich.

Or, suppose I say:

e.2. The beach is full

The "fullness" I am thinking about is not full of sand, nor full of pets, nor full of people, but rather, full of empty coke bottles, which disgusts me. Are you saying that your linguistic decoding is enough to get to this thought of mine?

Suppose, now, I tell you

e.3. Don't you dare!

Do you maintain that the coded meaning of the sentence is enough to make my wish clear, namely, that you don't dare smoke in the hospital, or that you abstain from cheating in an exam? Or thousands of other thoughts that may be UNDER-covered by the use of that linguistic expression at different moments? One does not need to indulge in inferencing processes? Is that what you mean?

But these operations (solving ambiguities, determining scope, and fixing references) are not the only problems to solve with a code model. Take a simple straightforward coded linguistic expression, like, say, I have been to X, and tell me please how you account for the altogether different consequences you may extract in the following two examples:

e.4.a. I have been to the bar

e.4.b. I have been to the Republic of Congo

How come, in the most natural interpretation of e.4.a, one assumes that the event lies in the recent past and is something you do routinely rather than once or a few times in your life, whereas the reverse is true when interpreting e.4.b?

Now, if you answer me with little formulae, I will be stuck as a non-winged duck in the desert, and will not be able to respond, unless Jan comes to my help again --which will perhaps stretch his patience a bit too much.

You see, I thought it was obvious that semantics (coded organisation of linguistic material) cannot cover the whole range of human mental representations one wants at one moment or other to make manifest through a communicative process.

It seems I was wrong. It is far from obvious to intelligent and dedicated scientists like you seem to be.

I am astonished!

[Afterthought: if this fact is far from obvious, although we indulge in communication processes trillions of times in our life and we can watch what happens DIRECTLY, can we imagine what a debate on other less familiar ideas (i.e., evolution, the existence of god, art, heliocentric systems ... and whatnot) will be? An enormous fuss!]

« Last Edit: August 05, 2015, 12:52:50 PM by Guijarro »

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #69 on: August 05, 2015, 05:49:55 AM »
Quote
You see, I thought it was obvious that semantics (coded organisation of linguistic material) cannot cover the whole range of human mental representations one wants at one moment or other make manifest through a communicative process.
Where did we say it did? I for one do make that distinction. But why can't you measure pragmatic/contextual information as information as well? That's where this seems to be getting controversial, not what semantics and pragmatics refer to.

#### jkpate

• Forum Regulars
• Linguist
• Posts: 130
• American English
##### Re: Man vs. Beast
« Reply #70 on: August 05, 2015, 08:39:18 AM »
Quote
Where did we say it did? I for one do make that distinction. But why can't you measure pragmatic/contextual information as information as well? That's where this seems to be getting controversial, not what semantics and pragmatics refer to.

I think this is exactly right. Information in information theory is just about ruling out alternatives, and it's certainly possible to define probability distributions, and so codes, over infinite spaces of possible pragmatic interpretations. Guijarro's examples show that there are many possible alternative interpretations for a non-situated utterance, and that the situation provides more information so that situated utterances may be less ambiguous (and may rule out what seems to be the most likely interpretation of the non-situated utterance, such as the one that the beach is full of people). Graphical models provide a language for exploring how these information sources are integrated.
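To make "ruling out alternatives" concrete for the beach example, here is a toy sketch in Python. All of the numbers are invented: `PRIOR` is an assumed distribution over interpretations of the non-situated utterance, and `CONTEXT_LIKELIHOOD` is an assumed probability of the observed situation (say, the speaker gesturing at litter) under each interpretation. Conditioning on the situation is just Bayes' rule.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits of {interpretation: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def condition_on_context(prior, likelihood):
    """Bayes' rule: posterior over interpretations given the observed situation."""
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

# Hypothetical numbers for "The beach is full" (full of what?).
PRIOR = {"people": 0.7, "sand": 0.1, "pets": 0.1, "coke bottles": 0.1}
CONTEXT_LIKELIHOOD = {"people": 0.01, "sand": 0.01, "pets": 0.01, "coke bottles": 0.9}

situated = condition_on_context(PRIOR, CONTEXT_LIKELIHOOD)

print(max(PRIOR, key=PRIOR.get))        # most likely reading out of context
print(max(situated, key=situated.get))  # most likely reading in the situation
print(entropy_bits(PRIOR), entropy_bits(situated))
```

With these assumed numbers the situation both overturns the default reading and leaves less uncertainty than before. A single observation does not always lower entropy, so the drop here is a property of the toy numbers and not a general guarantee.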

Copernicus may well be right that a graphical modeling approach can't scale to real world situations (but then I would wonder how any approach would succeed at a task that is information-theoretically impossible). It certainly is a difficult task, and the positive case in favor of an information-theoretic view is far from complete. My only gripe is that the negative case has not been made in a serious way.

#### Guijarro

• Forum Regulars
• Linguist
• Posts: 97
• Spanish
##### Re: Man vs. Beast
« Reply #71 on: August 05, 2015, 12:58:25 PM »

However, after all my efforts to understand your (apparently) simple arguments, I confess I don't have a hint of what you are claiming. My problem, of course.

Where's the back door, pray, so that I may silently step out of this blatant proof of my intellectual inability, without losing too much face?

Cheers!

« Last Edit: August 06, 2015, 02:32:00 AM by Guijarro »

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #72 on: August 06, 2015, 09:39:46 AM »
Guijarro, very basically, you might think of Information Theory as a theory of decision making. Not about what you'll have for breakfast or where you'll go on your next vacation (though I suppose you could come up with some application like that too), but decisions regarding the information content in a signal.

And likewise, Pragmatics is the study of how humans determine the intended (information) content of an utterance. So it's also a study of decisions.

I'm not claiming anything specific (so if you found that lacking in my posts, you're right). But there's no reason to rule out information theory as a sort of quantified theory of pragmatics.

--

jkpate, regarding some of these more difficult problems, I think the challenge comes from attempting to apply methods that try to solve them rather than just optimize over the possible answers using heuristics. If we could somehow understand the heuristics that the human brain uses, we might come very close to understanding how "language" works.

#### Copernicus

• Linguist
• Posts: 61
##### Re: Man vs. Beast
« Reply #73 on: August 06, 2015, 12:02:26 PM »
Quote
Guijarro, very basically, you might think of Information Theory as a theory of decision making. Not about what you'll have for breakfast or where you'll go on your next vacation (though I suppose you could come up with some application like that too), but decisions regarding the information content in a signal.

And likewise, Pragmatics is the study of how humans determine the intended (information) content of an utterance. So it's also a study of decisions.

I'm not claiming anything specific (so if you found that lacking in my posts, you're right). But there's no reason to rule out information theory as a sort of quantified theory of pragmatics.
Insofar as pragmatic information is encoded in a signal.  If some (or most) of the information is not extracted from the signal, then the relevance of an information-theoretic approach becomes less obvious.

#### Daniel

• Experienced Linguist
• Posts: 2043
• English
##### Re: Man vs. Beast
« Reply #74 on: August 06, 2015, 04:50:48 PM »
Again, what is "the signal"?

The entire point is that context is not some vague unknown completely unassociated with the utterance. It's not part of the acoustic signal transmitted to a listener, no, but it's part of what they receive. If we're talking about syntax or semantics, the signal is obviously the language itself; if we're talking about pragmatics, then the signal is the Information in the world that is relevant to the utterance as well as the utterance itself.
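One way to put numbers on this, as a rough sketch in Python: treat the message, the utterance, and the context as three random variables and compare how much uncertainty about the message remains given the utterance alone versus given the utterance plus the context. The joint distribution `JOINT` is made up for the "Don't you dare!" example, and `conditional_entropy` is a hypothetical helper written for this post, not a library function.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, target, given):
    """H(target | given) in bits.

    joint maps outcome tuples to probabilities; target and given are tuples
    of positions within each outcome tuple.
    """
    p_given = defaultdict(float)
    p_both = defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        p_given[g] += p
        p_both[(t, g)] += p
    return -sum(p * math.log2(p / p_given[g])
                for (t, g), p in p_both.items() if p > 0)

# Hypothetical joint distribution over (message, utterance, context).
JOINT = {
    ("don't smoke", "Don't you dare!", "hospital"): 0.45,
    ("don't cheat", "Don't you dare!", "hospital"): 0.05,
    ("don't smoke", "Don't you dare!", "exam"): 0.05,
    ("don't cheat", "Don't you dare!", "exam"): 0.45,
}

MESSAGE, UTTERANCE, CONTEXT = 0, 1, 2
h_utterance_only = conditional_entropy(JOINT, (MESSAGE,), (UTTERANCE,))
h_with_context = conditional_entropy(JOINT, (MESSAGE,), (UTTERANCE, CONTEXT))
print(h_utterance_only, h_with_context)
```

On average, conditioning on more variables never increases conditional entropy, so adding context to what the hearer "receives" can only help in this sense; how much it helps depends entirely on the joint distribution one assumes.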