“It is often pointed out that, thanks to their grammars and huge lexicons, human languages are incomparably richer codes than the small repertoire of signals used in animal communication.
Another striking difference –but one that is hardly ever mentioned– is that human languages are quite defective when regarded simply as codes. In an optimal code, every signal must be paired with a unique message, so that the receiver of the signal can unambiguously recover the initial message. Typically, animal codes (and also artificial codes) contain no ambiguity. Linguistic sentences, on the other hand, are full of semantic ambiguities and referential indeterminacies, and do not encode at all many other aspects of the meaning they are used to convey. This does not mean that human languages are inadequate for their function. Instead, what it strongly suggests is that the function of language is not to encode the speaker’s meaning, or, in other terms, that the code model of linguistic communication is wrong.” (p. 332)
It depends on what you mean by "optimal" here. One definition could be no errors, but another could be an acceptable error rate. Formally, we can understand the error rate by considering the conditional entropy of the message $M$ given the signal $S$, denoted $H(M \mid S)$. If the signal is completely unambiguous, $H(M \mid S) = 0$ and there is no risk of an error. If there are two equally likely messages for $S$, then $H(M \mid S) = 1$ bit and there is a risk of an error. If there are two possible messages but one is much more likely, then $H(M \mid S) < 1$ bit.
The noisy-channel coding theorem uses this conditional entropy to establish bounds both for codes with a pre-specified error rate greater than zero and for codes whose error rate is arbitrarily close to zero. A non-zero error rate might well be tolerable if it is easy to recover from the errors, or if the errors barely matter (indeed, this is how lossy compression algorithms such as JPEG and MP3 manage to provide such exceptional compression: they accept errors that viewers and listeners hardly notice).
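To make these quantities concrete, here is a minimal sketch in Python; the toy distributions and function names are mine, invented purely for illustration. It computes $H(M \mid S)$ for the three cases above, and then uses Fano's inequality (which underlies the converse part of the coding theorem) to turn that equivocation into a lower bound on the error rate of any decoder.

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(joint):
    """H(M | S) in bits, where joint[m, s] = P(M = m, S = s)."""
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=0)                       # marginal P(S)
    return sum(ps * entropy(joint[:, s] / ps)     # P(S=s) * H(M | S=s)
               for s, ps in enumerate(p_s) if ps > 0)

def fano_error_lower_bound(h_m_given_s, n_messages, grid=100001):
    """Smallest error probability P_e consistent with Fano's inequality:
    H(M|S) <= H_b(P_e) + P_e * log2(n_messages - 1)."""
    log_term = np.log2(max(n_messages - 1, 1))
    for pe in np.linspace(0.0, 1.0 - 1.0 / n_messages, grid):
        if entropy([pe, 1.0 - pe]) + pe * log_term >= h_m_given_s:
            return pe
    return 1.0 - 1.0 / n_messages

# Case 1: unambiguous code -- each signal pairs with a unique message.
unambiguous = [[0.5, 0.0],
               [0.0, 0.5]]
# Case 2: a single signal with two equally likely messages.
equivocal = [[0.5],
             [0.5]]
# Case 3: a single signal with two messages, one far more likely.
skewed = [[0.9],
          [0.1]]

print(conditional_entropy(unambiguous))  # 0.0 bits -> errors avoidable
print(conditional_entropy(equivocal))    # 1.0 bit
print(conditional_entropy(skewed))       # ~0.47 bits (< 1 bit)

# With ~0.47 bits of equivocation over 2 messages, no decoder can do
# better than ~10% error -- exactly the error rate of always guessing
# the more likely message.
print(fano_error_lower_bound(conditional_entropy(skewed), 2))  # ~0.10
```

The grid search in `fano_error_lower_bound` is just a convenience; any root-finder over the binary entropy function would do.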
Moreover, many potential ambiguities are ruled out by the real-world context -- let's call it $C$. By a general property of entropy, the conditional entropy of the message given the signal and the context is less than or equal to the entropy of the message given the signal alone: $H(M \mid S, C) \leq H(M \mid S)$, with equality iff the message and the context are conditionally independent given the signal. Presumably for natural language they are not: given the same utterance, some readings are more likely in some contexts than in others (i.e. $P(M \mid S, C) \neq P(M \mid S)$ in general). So, for natural language, an information-theoretic approach entails that isolated utterances are more ambiguous than situated utterances.
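Here is the same kind of sketch for the context inequality, again with made-up numbers: a single maximally ambiguous signal (say, the word "bank") whose intended reading is almost entirely settled by a two-valued context.

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy model: the signal S is always the single ambiguous word "bank",
# so conditioning on S alone tells us nothing beyond the prior.
# Rows: messages (riverbank, financial bank); columns: contexts
# (fishing trip, cash withdrawal). Numbers are invented for illustration.
joint_mc = np.array([[0.475, 0.025],   # riverbank reading
                     [0.025, 0.475]])  # financial-bank reading

p_m = joint_mc.sum(axis=1)   # marginal P(M)
p_c = joint_mc.sum(axis=0)   # marginal P(C)

# With a single constant signal, H(M | S) = H(M).
h_m_given_s = entropy(p_m)

# H(M | S, C) = sum_c P(C=c) * H(M | S, C=c).
h_m_given_sc = sum(pc * entropy(joint_mc[:, c] / pc)
                   for c, pc in enumerate(p_c))

print(h_m_given_s)    # 1.0 bit: "bank" in isolation is maximally ambiguous
print(h_m_given_sc)   # ~0.29 bits: context resolves most of the ambiguity
```

The gap between the two numbers is the conditional mutual information $I(M; C \mid S)$: the information the context contributes about the intended message over and above the utterance itself.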
This is all just to say that the concerns you raise do not challenge the "language as code" view, and to show how an information-theoretic approach provides a natural treatment.