Author Topic: The number of grammatical sentences  (Read 5929 times)

Offline enunciativo

  • Jr. Linguist
  • **
  • Posts: 6
The number of grammatical sentences
« on: February 10, 2014, 01:45:39 PM »
The claim that there is an infinite number of grammatical sentences is often used as a tenet underlying generative grammars and language-acquisition models. This claim never made sense to me. We don't have an infinite number of words to use to build our sentences. Additionally, sentences can't be infinite in length: a potentially endless sentence would always be incomplete because we could never finish composing or analyzing it (unless, of course, we could do this at infinite speed). I think of the assertion that there is no longest sentence the same way that I see "For any finite integer n, there is a larger integer n + 1, which is also finite." Asserting that we can apply recursion indefinitely to compose a sentence of infinite length to me is begging the question. I'm very skeptical of arguments in this area that contain words like "theoretically" or "in principle". However, I don't see that any linguistic theory is undermined by conceding that a language contains "only" a googol of grammatical sentences instead of ℵ₀ (aleph-null). (Is there an easy way to enter a Hebrew letter? I'm having a terrible time.)

Offline Corybobory

  • Global Moderator
  • Linguist
  • *****
  • Posts: 138
  • Country: gb
    • English
    • Coryographies: Handmade Creations by Cory
Re: The number of grammatical sentences
« Reply #1 on: February 10, 2014, 02:50:58 PM »
We do have an infinite number of words and sentences, because we all have formulae to create an endless number of words and sentences.

It's like saying numbers aren't infinite because no one has the time to count far enough, or we haven't thought of all the numbers and therefore they're finite.
BA Linguistics, MSt Palaeoanthropology and Palaeolithic Archaeology, current PhD student (Archaeology, 1st year)

Blog: http://www.palaeolinguist.blogspot.com
My handmade book jewellery: http://www.coryographies.etsy.com

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #2 on: February 10, 2014, 07:54:26 PM »
This is a generally accepted fact about language, but it is often presented in a simplified way. It's something I have thought about (a lot), and I do question sometimes.

In the end, obviously the number of potential sentences is unbounded, but does that mean that the English in my head is truly infinite at the moment? Does the computational system represent infinitely many combinatorial possibilities? Or does it simply allow for expansion as required, so that infinity exists only in potential, not as a capability at the moment?

Consider the difference between having $1 and a job, and having $1,000,000 and no job. Which is more money?
If time is ignored (it is irrelevant in this case), having $1 plus a job is actually more money: ignoring irrelevant practicalities like death, retirement or lay-offs, the job will continue forever to bring in a small amount of money per day that adds up toward infinity. However, we wouldn't say that you "have infinite money" in any real sense, because "have" is usually interpreted temporally, in a context.

Now consider a slightly different set of situations:
John has infinite money. He has a special credit card that simply will pay any amount and the world must accept it.
Bill has $1 and a job. He earns money daily.

Those really represent the two potential theories for how language is "infinite". In John's case, language is a combinatorial system in the brain that inherently generates infinity. In Bill's case, he is capable of potentially coming up with novel sentences but only by developing that over time.

The idea in generativism (and broadly in linguistics) is that we have a system like John's in our heads. Our Language Faculty is something that can (at the moment) generate infinitely many sentences with various combinatorial possibilities. For example, we might have a rule like the following:
S -> S and S
So we can easily generate infinitely many sentences of the form:
[[[I don't have any money and I don't like that] and I don't like that] and I don't like that] and I don't like that....
(Clearly, a very self-reflective person!)
John jumped, and then he jumped, and then he jumped, and then he jumped....
(Not interesting, but well formed enough. It's a story about a jump-roper who never stops!)
And there are various children's songs of that form.
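A minimal Python sketch of how the single rule S -> S and S yields an unbounded family of sentences. The function name `conjoin` and the choice to always expand with the same base clause are assumptions made for illustration, not part of any particular grammar formalism.

```python
def conjoin(base: str, times: int) -> str:
    """Apply the rule S -> S and S `times` times, always adding `base` on the right."""
    sentence = base
    for _ in range(times):
        # One application of the rule: the existing S becomes "S and S".
        sentence = f"{sentence} and {base}"
    return sentence


# Every output is a finite string, yet there is one for every natural number.
for k in range(3):
    print(conjoin("I don't like that", k))
```

Each call returns a finite sentence; the "infinity" lies only in the fact that `times` can be any natural number.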


However, Generativism is strictly defined as not being about processing but about competence: it does not explain HOW we do this, just that we do it.

Personally I'm more inclined to believe the second option:
We have certain kinds of productivity, like adding an adjective to a noun. But true recursion may not exist. Instead, it exists simply out of practice and step by step extension of the existing forms.

A really interesting example of this may be center embedding:
The man the woman saw slept.
Most English speakers can understand that relatively easily (though the right intonation helps).
Recursively we can add a bit more:
The rat the cat the dog chased bit squeaked.
With a little work, we can understand this as English speakers.
But then in theory we can also add more:
The chef the waiter the customer the owner knew insulted blamed cried.
At this point, it starts to look like complete nonsense in every way. Trust me, that's a coherent idea. Here is the "answer":
One day a friend of the man who owned a restaurant came to eat there, and the chef prepared the food incorrectly so the customer insulted the waiter, and then the waiter blamed the chef who became upset and started to cry.
Now, even if you happen to be able to understand that (with a little practice, it does get easier!), what about 4 or 5 or 100 levels of embedding? Is a (nearly) infinitely long sentence like that actually grammatical?
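To make the pattern behind these examples explicit, here is a small Python sketch that builds a center-embedded sentence from a list of (noun, verb) pairs. The function name `center_embed` and the pair-based input are assumptions for illustration; the sketch only reproduces the word order, not any theory of how speakers process it.

```python
def center_embed(pairs):
    """Build 'The N1 the N2 ... the Nk Vk ... V2 V1.' from (noun, verb) pairs.

    Each pair gives a noun and the verb whose subject it is. Nouns appear in
    order; verbs appear in reverse order, which is what makes the structure
    nested rather than simply chained.
    """
    nouns = " ".join(f"the {noun}" for noun, _ in pairs)
    verbs = " ".join(verb for _, verb in reversed(pairs))
    sentence = f"{nouns} {verbs}."
    return sentence[0].upper() + sentence[1:]


print(center_embed([("man", "slept"), ("woman", "saw")]))
print(center_embed([("rat", "squeaked"), ("cat", "bit"), ("dog", "chased")]))
```

The same function accepts 4, 5, or 100 pairs without complaint, which is exactly the gap between what the rule permits and what a reader can follow.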

The point is... center embedding is very hard to process. Why?
The answer in Generativism is "that's irrelevant, just a performance limitation", with the computational system still fully capable.

An alternative answer is to think that center embedding is done one step at a time and that we can build up an ability to add more layers, one at a time. And that basically works: with some practice it's easy to understand the next level of embedding, up to a certain point at least.


Whether it's potentially accurate to classify "language" as an infinite system, I don't know. As Competence perhaps. But what is Competence then? It's supposed to be our "knowledge". Yet we don't necessarily actually have knowledge of how to make infinitely many sentences, just that as needed we can figure it out. So we're competent to expand our sentence inventory. Is that the same thing?
More than anything, just remember that Generativism/Competence is making no claims about the human mind. That is its fatal flaw. There are implications that whatever is in the head must be analogous to the system, but far too often the line is blurred to assume something about the actual state of the mind.



...deep question :)
Welcome to Linguist Forum! If you have any questions, please ask.

Offline MalFet

  • Global Moderator
  • Serious Linguist
  • *****
  • Posts: 282
  • Country: us
Re: The number of grammatical sentences
« Reply #3 on: February 10, 2014, 10:59:21 PM »
Quote
The claim that there is an infinite number of grammatical sentences is often used as a tenet underlying generative grammars and language-acquisition models. This claim never made sense to me. We don't have an infinite number of words to use to build our sentences. Additionally, sentences can't be infinite in length: a potentially endless sentence would always be incomplete because we could never finish composing or analyzing it (unless, of course, we could do this at infinite speed). I think of the assertion that there is no longest sentence the same way that I see "For any finite integer n, there is a larger integer n + 1, which is also finite." Asserting that we can apply recursion indefinitely to compose a sentence of infinite length to me is begging the question. I'm very skeptical of arguments in this area that contain words like "theoretically" or "in principle". However, I don't see that any linguistic theory is undermined by conceding that a language contains "only" a googol of grammatical sentences instead of ℵ₀ (aleph-null). (Is there an easy way to enter a Hebrew letter? I'm having a terrible time.)

This is one of those debates in which the various sides involved are mostly just talking past each other. When generativists claim that there are infinitely many grammatical sentences in a given language, they're making a very specific claim about the behavior of sets. Within that framework, it is very straightforward to prove that a grammar can generate infinitely many grammatical sentences. On the other hand, when the cognitivists claim that there *aren't* infinitely many sentences, they're making a very different, very specific claim about functional interpretability. In other words, both claims are true; they just mean different things.

To this end, anyone who wants to take a position on the statement "There are infinite sentences in language X" needs to be very clear on what is meant by the word "are". The theoretical work involved in making a sentence *be* is actually quite considerable.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #4 on: February 10, 2014, 11:52:12 PM »
That makes sense. But what about the predictions/implications?

For example, Generativism relies on grammaticality judgments and intuition. Is it reasonable to rely on hypothetically acceptable very long sentences? Once a pattern appears to be recursive, Generativism assumes it actually is.

To take my center embedding example above, Generativism seems to have no way to describe a language in which exactly 3 levels of center embedding are grammatical but no more. Instead, it would necessarily claim that in all languages (with some construction along those lines) it is necessarily recursive, regardless of what is in the mind of the speaker or in "performance".
This is a huge theoretical claim: no languages have enumerable limits on recursion.


The "Poverty of the Stimulus" argument relies on many assumptions about the implementation of this infinite set of sentences. If, for example, analogy were assumed to be more powerful than it currently is, the PoS argument would be a lot less convincing. Only a theoretical claim about infinitely many sentences is logically/mathematically provable. In fact, in real life the PoS argument relies on much weaker evidence and the assumption that the hypothetical argument is an effective substitute for the real-world one. I'm not entirely convinced.

Offline jkpate

  • Forum Regulars
  • Linguist
  • *
  • Posts: 130
  • Country: us
    • American English
    • jkpate.net
Re: The number of grammatical sentences
« Reply #5 on: February 11, 2014, 12:51:37 AM »
I see the concrete aspect of this claim as coming down to the assertion/observation that language is structured and productive. Linguistic forms are not drawn from a finite list, but composed out of structured, reusable pieces. Remember the origin of this assertion: Chomsky wanted to oppose Skinner's behaviorist approach that relied on a "black box" association between form and meaning. Among other problems, this unstructured association failed to account for the appearance of unseen associations between form and meaning. Asserting that languages are infinite then had two points of appeal for Chomsky: it meant that Skinner's approach was logically inadequate, while Chomsky's own formalism entailed infinite languages.

However, I agree with your intuition that it is weird to say we see almost none of the sentences of our language, and even in principle can never see more than an infinitesimal proportion of the sentences of our language. One approach that may in principle have the potential to possibly address this (and I probably should add even more qualifiers...) is to define a probabilistic notion of grammaticality. It is possible to define probability distributions over infinite sets of structures and strings that yield the nice Zipfian properties we observe and that correlate with a range of behavioral measures. The general intuition would be that an infinite range of structures or strings receive non-zero probability, but virtually all of them receive probability so near zero as to be negligible. More concretely, the set of grammatical sentences would be (some version of) the typical set of the probability distribution over strings. By way of illustration, the standard bell curve gives non-zero probability to all real numbers, but virtually no probability mass exists outside of [-9,9].
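The bell-curve illustration can be checked numerically. This is a small Python sketch using only the standard library; the function name `mass_outside` is an assumption for illustration. For a standard normal variable, the probability of falling outside [-b, b] equals erfc(b / sqrt(2)).

```python
import math


def mass_outside(bound: float) -> float:
    """Probability that a standard normal variable falls outside [-bound, bound]."""
    return math.erfc(bound / math.sqrt(2))


# Non-zero, but on the order of 1e-19: negligible for any practical purpose.
print(mass_outside(9.0))
```

The value is strictly positive, so no real number is excluded, yet essentially all of the probability mass sits inside the interval.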

I qualified this suggestion so heavily, however, because it doesn't work in the simple form. The probability of a structure decays exponentially as the number of pieces it is composed out of increases, so that short sequences, even ones that are not what we would call sentences, will have higher probability than long sentences. I see two ways to address this. One way is to "normalize" for the sentence length, essentially creating one model of grammaticality for sentences of length 1, another for sentences of length 2, and so on (presumably with some interpolation or smoothing to handle "gaps": maybe we haven't seen any sentences of length 67 but we have seen sentences of length 66 and 68). This first way is not very theoretically satisfying, since it really comes down to just a hard constraint that there are no sentences much longer than what we've seen, and in principle allows the grammaticality of 4-word sentences to vary freely from 5 word sentences (although maybe there's a principled way to tie these length-dependent grammaticality models together).
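The length problem described above can be seen in a toy model. The following Python sketch assumes a made-up unigram model (`TOY_MODEL`, with invented probabilities) in which a string's probability is the product of its pieces' probabilities; it is a deliberately crude stand-in for a real probabilistic grammar.

```python
import math

# Hypothetical per-word probabilities, invented purely for illustration.
TOY_MODEL = {"the": 0.2, "dog": 0.05, "barked": 0.02, "loudly": 0.01}


def string_probability(words, model=TOY_MODEL):
    """Probability of a word sequence as the product of its pieces' probabilities."""
    return math.prod(model[w] for w in words)


fragment = ["the"]                             # not a sentence
sentence = ["the", "dog", "barked", "loudly"]  # a full sentence

# Every extra piece multiplies in a factor below 1, so the fragment wins.
print(string_probability(fragment), string_probability(sentence))
```

Because each additional piece contributes a factor below 1, the non-sentence fragment outscores the well-formed sentence, which is why raw probability cannot serve directly as a grammaticality measure.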

A second, somewhat more satisfying, possibility is to say that structure probabilities use exponentially bigger pieces as we deal with sentences that are longer. That is, instead of always building trees out of local subtrees (i.e. using context free grammar rules), we can build trees out of bigger subtrees. This approach would eliminate the exponential decay between structure probability and structure length, and would correspond to an assumption of a grammatical representation in terms of huge subtrees and pervasive reliance on memorization. However, this still has an unsatisfying consequence: reliable local dependencies, such as the "small piece" generalization that adjectives precede nouns, need to be essentially irrelevant for very long sentences, since those sentences must be composed out of the huge subtrees.
« Last Edit: February 11, 2014, 12:56:55 AM by jkpate »
All models are wrong, but some are useful - George E P Box

Offline jkpate

  • Forum Regulars
  • Linguist
  • *
  • Posts: 130
  • Country: us
    • American English
    • jkpate.net
Re: The number of grammatical sentences
« Reply #6 on: February 11, 2014, 03:03:46 PM »
A third possibility would be to say that sentences are not grammatical on their own but only given a context. For example, syntacticians are often concerned with anaphor resolution, which is dependent upon both the real-world context and preceding sentences. Thus, we may be tempted to say that a string is a grammatical sentence if it is a substring of a member of the typical set of that language (i.e. a sentence in a typical, potentially multi-sentence, speaker turn), and is dominated by "S." This approach would eliminate the problem with sentence length because the typicality of a sentence would no longer be dependent on the probability of the string itself, but upon the probability of the sequences of which it is a subsequence, which can be arbitrarily long.

Of course, at this point this is just speculation and would need to be tested ;)
« Last Edit: February 11, 2014, 07:20:35 PM by jkpate »

Offline enunciativo

  • Jr. Linguist
  • **
  • Posts: 6
The number of grammatical sentences
« Reply #7 on: February 12, 2014, 04:09:01 PM »
We can certainly assume the legitimacy of infinite recursion as an axiom, and that assumption may simply disallow any discussion; but, to me, saying that a procedure can be applied a finite number of times while preserving grammaticality doesn't imply that it would work the same way an infinite number of times.  I was thinking of jkpate's suggestion of structural decay:  clearly center embedding turns awful after relatively few iterations, and any kind of subordination or coordination or anaphora may break down with enough recursion.  It may have been in my projective geometry class that my instructor smiled when a classmate showed him a proof and said, "You have to realize that at infinity, a straight line is perpendicular to itself."
If a language contains n words and the maximum length of a grammatical sentence is p, the maximum number of word sequences m, grammatical or not, would be m = [(1 - n^(p+1)) / (1 - n)] - 1, that is, n + n^2 + ... + n^p. Clearly, the grammatical subset would be much smaller. So, if n is finite (and I think that we should all agree that it is) and p is finite (apparently no one agrees with me on this one), then m would also be finite. Do we at least all agree that an infinite number of grammatical sentences is impossible without a grammatical sentence of infinite length?
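The count can be checked for small values. Counting the non-empty word sequences of length 1 through p over a vocabulary of n words gives the geometric sum n + n^2 + ... + n^p. The Python sketch below compares a closed form for that sum against brute-force enumeration; the function names are assumptions for illustration.

```python
from itertools import product


def closed_form(n: int, p: int) -> int:
    """n + n^2 + ... + n^p, written as a closed-form geometric sum."""
    if n == 1:
        return p
    return (n ** (p + 1) - n) // (n - 1)


def brute_force(n: int, p: int) -> int:
    """Count every sequence of length 1..p over an n-word vocabulary by listing them."""
    vocabulary = range(n)
    return sum(1 for k in range(1, p + 1) for _ in product(vocabulary, repeat=k))


print(closed_form(3, 4), brute_force(3, 4))
```

For any finite n and p the result is finite; the count grows without bound only if p is allowed to grow without bound.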
Choosing a metaphor that's instructive and generally acceptable is surprisingly difficult.  Is building a skyscraper of infinite height possible?  In some ways, I'm untroubled by the unavailability of enough steel or glass or concrete; nor do I care about the limits of funding or of strength of materials.  What bothers me in a different way is that the skyscraper can be built only at a finite speed, so an infinite height will never be achieved.  For me, this compositional constraint is what the physical skyscraper shares with the endless sentence.  Perhaps this barrier is just one of performance.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #8 on: February 12, 2014, 06:59:37 PM »
Quote
Is building a skyscraper of infinite height possible?  In some ways, I'm untroubled by the unavailability of enough steel or glass or concrete; nor do I care about the limits of funding or of strength of materials.  What bothers me in a different way is that the skyscraper can be built only at a finite speed, so an infinite height will never be achieved.  For me, this compositional constraint is what the physical skyscraper shares with the endless sentence.  Perhaps this barrier is just one of performance.
Precisely. That's what a Generativist would say. I don't find it to be a very interesting answer, but you answered it. Now... does that matter?

Quote
Do we at least all agree that an infinite number of grammatical sentences is impossible without a grammatical sentence of infinite length?
It doesn't seem crazy to say "there is no limit on how long a sentence can be". It does seem crazy to say "a single sentence can be infinitely long".

No individual sentence is actually infinitely long. Instead, there is no limit.

In your formula above, p is unbounded. So sentences might be 100 words long, or 1000. None are infinitely long. But one can always be p+1 long. 1001, 1002, etc.

To then address your skyscraper metaphor again, consider this question:
Can there always be a taller skyscraper?

If yes, then there are infinitely many possible skyscrapers to be built.


In language, the general understanding is that there is no "longest sentence". It may be worth questioning that, but that's the current state of the field.

Offline jkpate

  • Forum Regulars
  • Linguist
  • *
  • Posts: 130
  • Country: us
    • American English
    • jkpate.net
Re: The number of grammatical sentences
« Reply #9 on: February 12, 2014, 07:04:10 PM »
Quote
Do we at least all agree that an infinite number of grammatical sentences is impossible without a grammatical sentence of infinite length?
Actually, no, this is not the case. It is possible to have a set of infinite size, but for every member of that set to be finite. And this is the standard Generativist view: every sentence is of finite length, but there is no upper bound on sentence lengths.
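The point about an infinite set with only finite members can be sketched with a lazy generator in Python. The names `sentences` and `first_longer_than` are assumptions for illustration, and the "jumped" pattern is borrowed from earlier in the thread.

```python
from itertools import count


def sentences():
    """Yield an endless stream of sentences, each one finite."""
    for k in count(1):
        yield "John jumped" + ", and then he jumped" * (k - 1) + "."


def first_longer_than(bound: int) -> str:
    """Return the first sentence in the stream with more than `bound` words."""
    return next(s for s in sentences() if len(s.split()) > bound)


# Whatever bound is proposed, a finite sentence exceeding it exists.
print(len(first_longer_than(100).split()))
```

The generator never produces an infinitely long string, yet no proposed maximum length survives: the set is infinite while every member is finite.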

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #10 on: February 12, 2014, 09:58:27 PM »
It's basically the same as the idea of infinity in general: it's not entirely clear that it's a number (that's a philosophical question really), but it certainly represents a (lack of) endpoint for the number line. Every number is finite, but infinity is there at the end.

Offline MalFet

  • Global Moderator
  • Serious Linguist
  • *****
  • Posts: 282
  • Country: us
Re: The number of grammatical sentences
« Reply #11 on: February 12, 2014, 11:45:44 PM »
Quote
Generativism seems to have no way to describe a language in which exactly 3 levels of center embedding are grammatical but no more. Instead, it would necessarily claim that in all languages (with some construction along those lines) it is necessarily recursive, regardless of what is in the mind of the speaker or in "performance".
This is a huge theoretical claim: no languages have enumerable limits on recursion.

Why can't generativism describe a language with exactly three levels of center embedding?

For sentence xAy:
A => a; A => aB;
B => b; B => bC;
C => c;

This produces xay, xaby, xabcy, but not xabcdy. My notation here is out of date, but if more recent models of syntax have blocked this kind of transformation I'd be very surprised to learn it.
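The claim about this grammar can be verified by enumerating everything it derives. The Python sketch below encodes the five rules in a dictionary (`RULES`) and expands the leftmost nonterminal until none remain; treating uppercase letters as nonterminals and the function name `expand` are assumptions for illustration. The enumeration terminates because no nonterminal can reach itself.

```python
# The rules as posted: uppercase letters are nonterminals, lowercase are terminals.
RULES = {"A": ["a", "aB"], "B": ["b", "bC"], "C": ["c"]}


def expand(form: str) -> set:
    """Return every terminal string derivable from `form` under RULES."""
    for i, symbol in enumerate(form):
        if symbol in RULES:
            results = set()
            # Rewrite the leftmost nonterminal with each of its alternatives.
            for rhs in RULES[symbol]:
                results |= expand(form[:i] + rhs + form[i + 1:])
            return results
    # No nonterminals left: the form is a finished string.
    return {form}


print(sorted(expand("xAy"), key=len))
```

The result is exactly three strings, so this grammar generates a finite language with a hard depth limit, with no recursion involved.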

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #12 on: February 13, 2014, 12:48:14 AM »
Sure, you can list them out. But recursion is the normal approach for this.

In something like Minimalism, I think it's very possible that one couldn't restrict this kind of recursion. Based on the local selectional criteria and phrase types, it would be hard to rule it out. One could arbitrarily assign multiple categories and phrases to the same lexical items/structures, but that's a bit much.

So, yes, it's entirely possible in (most) formalisms. But it isn't really compatible with the style of formalism since, say, the 1980s or so. Certainly it isn't implemented like that!

Offline MalFet

  • Global Moderator
  • Serious Linguist
  • *****
  • Posts: 282
  • Country: us
Re: The number of grammatical sentences
« Reply #13 on: February 13, 2014, 01:40:40 AM »
Do you actually know of any languages that have fixed limits on center-embedding, or is this a strictly hypothetical scenario? If we're talking hypotheticals here, saying "it isn't implemented like that!" is a bit vacuous, eh? If these languages don't exist, it's not implemented like anything.

Offline Daniel

  • Administrator
  • Experienced Linguist
  • *****
  • Posts: 1857
  • Country: us
    • English
Re: The number of grammatical sentences
« Reply #14 on: February 13, 2014, 02:59:03 AM »
Right. So the theories don't account for it. It has not been implemented. (It also has not been observed.)

Of course there's no evidence that there are no limits either. Maybe English has a strict limit of 42 levels of center embedding. ;)



Edit: Pirahã. If you believe the analyses of non-recursive syntax. It's not exactly lots of center embedding, but it does have, for example, adjectives, non-recursively. So it's just one level, not 1<n<infinity.
« Last Edit: February 13, 2014, 03:06:26 AM by djr33 »