NON-UNIQUENESS OF TEXT ⇒ MEANING MAPPING: HOMONYMY

In the opposite direction, from Texts to Meanings, a text or a fragment of it can have two or more different meanings. That is, one element of the surface edge of the mapping (i.e., the text) can correspond to two or more elements of the deep edge. We have already discussed this phenomenon in the section on automatic translation, where the example of the Spanish word gato was given (see page 72). Many such examples can be found in any Spanish-English dictionary. A few more examples from Spanish are given below.

· The Spanish adjective real has two quite different meanings corresponding to the English real and royal.

· The Spanish verb querer has three different meanings corresponding to English to wish, to like, and to love.

· The Spanish noun antigüedad has three different meanings:

– ‘antiquity’, i.e. a thing belonging to an ancient epoch,

– ‘antique’, i.e. a memorial of classic antiquity,

– ‘seniority’, i.e. length of period of service in years.

Words with the same textual representation but different meanings are called homonymous words, or homonyms, with respect to each other, and the phenomenon itself is called homonymy. Larger fragments of text, such as word combinations (phrases) or sentences, can also be homonymous. In that case the term homonymous expressions is used.

To explain the phenomenon of homonymy in more detail, we should resort again to the strict terms lexeme and wordform, rather than to the vague term word. Then we can distinguish the following important cases of homonymy:

· Lexico-morphologic homonymy: two wordforms belong to two different lexemes. This is the most general case of homonymy. For example, the string aviso is a wordform of both the verb AVISAR and the noun AVISO. The wordform clasificación belongs to both the lexeme CLASIFICACIÓN1 ‘process of classification’ and the lexeme CLASIFICACIÓN2 ‘result of classification,’ though the wordform clasificaciones belongs only to CLASIFICACIÓN2, since CLASIFICACIÓN1 does not have a plural form. Note that it is irrelevant whether the name of the lexeme coincides with the specific homonymous wordform or not.

Another case of lexico-morphologic homonymy is represented by two different lexemes whose sets of wordforms intersect in more than one wordform. For example, the lexemes RODAR and RUEDA share two homonymous wordforms, rueda and ruedas; the lexemes IR and SER have a number of wordforms in common: fui, fuiste, ..., fueron.

· Purely lexical homonymy: two or more lexemes have the same sets of wordforms, like Spanish REAL1 ‘real’ and REAL2 ‘royal’ (both have the same wordform set {real, reales}) or QUERER1 ‘to wish,’ QUERER2 ‘to like,’ and QUERER3 ‘to love.’

· Morpho-syntactic homonymy: the whole sets of wordforms are the same for two or more lexemes, but these lexemes differ in meaning and in one or more morpho-syntactic properties. For example, Spanish lexemes (el) frente ‘front’ and (la) frente ‘forehead’ differ, in addition to meaning, in gender, which influences syntactical properties of the lexemes.

· Purely morphologic homonymy: two or more wordforms are different members of the wordform set for the same lexeme. For example, fáciles is the wordform for both masculine plural and feminine plural of the Spanish adjective FÁCIL ‘easy.’ We should admit this type of homonymy, since wordforms of Spanish adjectives generally differ in gender (e.g., nuevos, nuevas ‘new’).
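The distinctions between lexemes in this list can be illustrated with a minimal Python sketch. It assumes a toy representation that is not part of the original text: each lexeme is a set of wordform strings plus a tuple of lexeme-level properties, and the names Lexeme and kind_of_homonymy are hypothetical. The sketch treats any partial overlap of wordform sets as lexico-morphologic homonymy, and it does not cover purely morphologic homonymy, which arises inside a single lexeme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    name: str
    forms: frozenset   # all wordform strings of the lexeme
    props: tuple = ()  # lexeme-level properties, e.g. part of speech, gender


def kind_of_homonymy(a: Lexeme, b: Lexeme):
    """Classify the homonymy between two different lexemes (None if none)."""
    if a.forms.isdisjoint(b.forms):
        return None                   # no shared wordform, no homonymy
    if a.forms != b.forms:
        return "lexico-morphologic"   # wordform sets only partly coincide
    if a.props != b.props:
        return "morpho-syntactic"     # same wordforms, different properties
    return "purely lexical"           # same wordforms, same properties


# Toy data (assumption): the wordform sets are deliberately incomplete.
AVISAR = Lexeme("AVISAR", frozenset({"aviso", "avisas", "avisa"}), ("verb",))
AVISO = Lexeme("AVISO", frozenset({"aviso", "avisos"}), ("noun", "masculine"))
REAL1 = Lexeme("REAL1", frozenset({"real", "reales"}), ("adjective",))
REAL2 = Lexeme("REAL2", frozenset({"real", "reales"}), ("adjective",))
FRENTE_M = Lexeme("FRENTE (el)", frozenset({"frente", "frentes"}), ("noun", "masculine"))
FRENTE_F = Lexeme("FRENTE (la)", frozenset({"frente", "frentes"}), ("noun", "feminine"))

for pair in [(AVISAR, AVISO), (REAL1, REAL2), (FRENTE_M, FRENTE_F)]:
    print(pair[0].name, pair[1].name, kind_of_homonymy(*pair))
```

A real analyzer would also keep the grammatical features of every wordform, which is what makes purely morphologic homonymy (as in fáciles) visible.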

Resolution of all these kinds of homonymy is performed by the human listener or reader according to the context of the wordform or based on the extralinguistic situation in which this form is used. In general, the reader or listener does not even take notice of any ambiguity. The corresponding mental operations are immediate and very effective. However, resolution of such ambiguity by computer requires sophisticated methods.

It is commonly held that the resolution of homonymy (and of ambiguity in general) is one of the most difficult problems of computational linguistics and must be dealt with as an essential and integral part of the language-understanding process.

Without automatic homonymy resolution, all attempts to automatically “understand” natural language will be highly error-prone and of rather limited utility.

MORE ON HOMONYMY

In the field of computational linguistics, homonymous lexemes usually form separate entries in dictionaries. Linguistic analyzers must resolve the homonymy automatically, by choosing the correct option among those described in the dictionary.

To distinguish homonyms formally, their description in conventional dictionaries is usually divided into several subentries. The names of lexical homonyms are supplied with indices (numbers) attached to the words in their standard dictionary form, just as we do in this book. Of course, in text generation, when the program compiles a text containing such words, the indices are removed.

Purely lexical homonymy is perhaps the most difficult to resolve, since at the morphologic stage of text processing it is impossible to determine which homonym is the right one in the given context. Since morphologic considerations are useless here, it is necessary to process the hypotheses about several homonyms in parallel.
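Processing several hypotheses in parallel can be sketched as follows. This is only an illustration under stated assumptions: the candidate table CANDIDATES and the function hypotheses are hypothetical, unknown tokens are simply mapped to their uppercase form, and the later stages that would actually choose among the hypotheses (syntactic and semantic analysis) are not shown.

```python
from itertools import product

# Hypothetical dictionary lookup: wordform -> candidate lexemes.
CANDIDATES = {
    "el": ["EL"],
    "palacio": ["PALACIO"],
    "real": ["REAL1", "REAL2"],  # 'real' vs. 'royal': morphology cannot decide
}


def hypotheses(tokens, candidates):
    """Enumerate every combination of lexeme choices for the tokens."""
    # Assumption: a token missing from the table gets one placeholder lexeme.
    options = [candidates.get(tok, [tok.upper()]) for tok in tokens]
    return [list(combo) for combo in product(*options)]


for reading in hypotheses(["el", "palacio", "real"], CANDIDATES):
    print(reading)
```

The number of hypotheses is the product of the numbers of candidates per token, so it grows quickly with sentence length; practical systems prune hypotheses as early as later analysis stages allow.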

As to the similarity of meaning of different lexical homonyms, various situations can be observed in any language. In some cases, such homonyms have no elements of meaning in common at all, like the Spanish REAL1 ‘real’ and REAL2 ‘royal.’ In other cases, the intersection of meaning is obvious, as in QUERER2 ‘to like’ and QUERER3 ‘to love,’ or CLASIFICACIÓN1 ‘process of classification’ and CLASIFICACIÓN2 ‘result of classification.’ In the latter cases, the relation can be exposed through the decomposition of the meanings of the homonymous lexemes. The cases in which meanings intersect are referred to in general linguistics as polysemy.

For theoretical purposes, we can refer to the whole set of homonymous lexemes connected in their meaning as a vocable. For example, we may introduce the vocable {QUERER1, QUERER2, QUERER3}. Alternatively, we can consider a single united lexeme, which is then called polysemic.

In computational linguistics, the intricate semantic structures of various lexemes are usually ignored. Thus, similarity in meaning is ignored too.

Nevertheless, for purely technical purposes, sets of any homonymous lexemes, no matter whether they are connected in meaning or not, can be considered. They might be referred to as pseudo-vocables. For example, the pseudo-vocable REAL = {REAL1, REAL2} can be introduced.
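Under the indexing convention used in this book, pseudo-vocables can be built mechanically by stripping the numeric index from lexeme names and grouping the results. The sketch below assumes that every index is a run of digits at the end of the name; the functions base_name and pseudo_vocables are hypothetical names introduced only for illustration.

```python
import re
from collections import defaultdict


def base_name(lexeme_name):
    """Strip a trailing numeric index, e.g. 'REAL2' -> 'REAL'."""
    return re.sub(r"\d+$", "", lexeme_name)


def pseudo_vocables(lexeme_names):
    """Group indexed lexeme names that share the same base name."""
    groups = defaultdict(list)
    for name in lexeme_names:
        groups[base_name(name)].append(name)
    # Only groups with two or more members are sets of homonyms.
    return {base: members for base, members in groups.items() if len(members) > 1}


names = ["REAL1", "REAL2", "QUERER1", "QUERER2", "QUERER3", "GATO"]
print(pseudo_vocables(names))
```

The same stripping step is what a text generator would apply when it removes the indices before output.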

A more versatile approach to handling polysemy in computational linguistics has been developed in recent years using object-oriented ideas. Polysemic lexemes are represented as one superclass that reflects the common part of their meaning, and a number of subclasses that reflect their semantic differences.
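The object-oriented idea can be illustrated with a small Python sketch for the vocable {QUERER1, QUERER2, QUERER3}. Everything in it is an assumption made for illustration: the class names, the attribute names, and in particular the wording of the common part of the meaning, which is not given in the text.

```python
class Querer:
    """Superclass: the part of the meaning shared by all senses."""
    lemma = "querer"
    # Assumption: this wording of the shared meaning is purely illustrative.
    common_meaning = "positive attitude of the subject toward the object"


class Querer1(Querer):
    gloss = "to wish"


class Querer2(Querer):
    gloss = "to like"


class Querer3(Querer):
    gloss = "to love"


def senses(vocable):
    """List the glosses of all direct subclasses of a vocable class."""
    return [sense.gloss for sense in vocable.__subclasses__()]


print(senses(Querer))          # glosses of the three senses
print(Querer3.common_meaning)  # inherited from the superclass
```

Each subclass inherits the lemma and the common meaning, so only the semantic difference has to be stated per sense.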

A serious complication for computational linguistics is that new senses of old words are constantly being created in natural language. Older words are used in new meanings, for new situations or in new contexts. It has been observed that natural language has the property of self-enrichment and thus is very productive.

The ways in which language is enriched are rather numerous; the main ones are the following:

· A former lexeme is used in a metaphorical way. For example, numerous nouns denoting a process are used in many languages to refer also to a result of this process (cf. Spanish declaración, publicación, interpretación, etc.). The semantic proximity is thus exploited. As another example, the Spanish word estética ‘esthetics’ has rather recently acquired the meaning of hairdressing salon in Mexico. Since highly professional hairdressing really does achieve esthetic goals, the semantic proximity is also evident here. The problem of resolution of metaphorical homonymy has been a topic of much research [51].

· A former lexeme is used in a metonymical way. Some proximity in place, form, configuration, function, or situation is used for metonymy. As the first example, the Spanish words lentes ‘lenses,’ espejuelos ‘glasses,’ and gafas ‘hooks’ are used in the meaning ‘spectacles.’ Thus, a part of a thing gives its name to the whole thing. As the second example, in many languages the name of an organization with a stable residence can be used to designate its seat. For instance, Ha llegado a la universidad means that the person arrived at the building or the campus of the university. As the third example, the Spanish word pluma ‘feather’ is also used as ‘pen.’ As recently as the middle of the nineteenth century, feathers were used for writing, and in several languages the newly invented writing tool then kept the name of its functional predecessor.

· A new lexeme is borrowed from a foreign language. Meanwhile, the former, “native,” lexeme can remain in the language, with essentially the same meaning. For example, English adopted the Russian word sputnik in 1957, but the term artificial satellite is still used as before.

· Commonly used abbreviations become common words, losing their marking by uppercase letters. For example, the Spanish words sida and ovni are now used more frequently than their synonymous counterparts síndrome de inmunodeficiencia adquirida and objeto volante no identificado.

One can see that metaphors, metonymies, loans, and former abbreviations broaden both homonymy and synonymy of language.

Returning to the description of all possible senses of homonymous words, we should admit that this problem does not have an agreed solution in lexicography. This can be seen by comparing any two large dictionaries. Below are two entries for the same Spanish lexeme estante ‘rack/bookcase/shelf,’ one taken from the Dictionary of Anaya group [22] and the other from the Dictionary of Royal Academy of Spain (DRAE) [23].

estante (in Anaya Dictionary)

1. m. Armario sin puertas y con baldas.

2. m. Balda, anaquel.

3. m. Cada uno de los pies que sostienen la armadura de algunas máquinas.

4. adj. Parado, inmóvil.

estante (in DRAE)

1. a. p. us. de estar. Que está presente o permanente en un lugar. Pedro, ESTANTE en la corte romana.

2. adj. Aplícase al ganado, en especial lanar, que pasta constantemente dentro del término jurisdiccional en que está amillarado.

3. Dícese del ganadero o dueño de este ganado.

4. Mueble con anaqueles o entrepaños, y generalmente sin puertas, que sirve para colocar libros, papeles u otras cosas.

5. Anaquel.

6. Cada uno de los cuatro pies derechos que sostienen la armadura del batán, en que juegan los mazos.

7. Cada uno de los dos pies derechos sobre que se apoya y gira el eje horizontal de un torno.

8. Murc. El que en compañía de otros lleva los pasos en las procesiones de Semana Santa.

9. Amér. Cada uno de los maderos incorruptibles que, hincados en el suelo, sirven de sostén al armazón de las casas en las ciudades tropicales.

10. Mar. Palo o madero que se ponía sobre las mesas de guarnición para atar en él los aparejos de la nave.

One does not need to know Spanish to see that these two descriptions diverge in numerous ways.

Some homonyms in a given language are translated into another language by non-homonymous lexemes, like the Spanish antigüedad.

In other cases, a set of homonyms in a given language is translated into a similar set of homonyms in the other language, like the Spanish plato when translated into the English dish (two possible interpretations are ‘portion of food’ and ‘kind of crockery’).

Thus, bilingual considerations sometimes help to find homonyms and to distinguish their meanings, though the main consideration should be given to the internal facts of the given language.

It can be concluded that synonymy and homonymy are important and unavoidable properties of any natural language. They bring many serious problems into computational linguistics, homonymy especially.

Classical lexicography can help to define these problems, but resolving them during analysis is the task of computational linguistics.