ANALOGY IN NATURAL LANGUAGES

Analogy is the prevalence of a pattern (i.e., one rule or a small set of rules) in the formal description of certain linguistic phenomena. In the simplest case, the pattern can be represented by a partially filled table like the one on page 20:

revolución        revolution
investigación     ?
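In the simplest computational reading, such a table is completed by finding where the two known words differ and transferring that difference to the third word. The following Python sketch assumes the difference lies entirely in the word endings; the function names are hypothetical, and real analogy-based systems use more elaborate string alignment.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def solve_analogy(a: str, b: str, c: str):
    """Solve the proportion a : b :: c : ? by suffix substitution.

    Assumes a and b differ only in their endings; returns None when
    c does not carry the ending that a has.
    """
    k = common_prefix_length(a, b)
    old_ending, new_ending = a[k:], b[k:]  # here: "ción" -> "tion"
    if not c.endswith(old_ending):
        return None
    return c[: len(c) - len(old_ending)] + new_ending


print(solve_analogy("revolución", "revolution", "investigación"))  # investigation
```

For a word that lacks the ending -ción, such as casa, the function returns None instead of inventing a form.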

The history of any natural language contains numerous cases in which a phonological or morphological pattern became prevalent and, by analogy, spread to other words with similar properties.

An example of analogy in Spanish phonology is the e that appears before word-initial consonant clusters beginning with s, such as sp-, st-, sl-, sm-, sn-, or sf-. In Latin, the clusters sp- and st- in initial position were quite common: specialis, spectaculum, spiritus, statua, statura, etc.

When Spanish developed from Vulgar Latin, all such words came to be felt as awkward to pronounce and were supplied with an initial e-: especial, espectáculo, espíritu, estatua, estatura, etc. Thus a law of “hispanicizing by analogy” was formed, according to which any word with such an initial cluster that is borrowed from a foreign language acquires e as its initial letter.

We can compile the following table of analogy, where the right column gives the Spanish forms of words borrowed from various languages:

statura (Lat.)          estatura
sphaira (Gr.)           esfera
slogan (Eng.)           eslogan
smoking (Eng.)          esmoquin
standardize (Eng.)      estandarizar
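The initial-e rule itself is easy to state as a program. The Python sketch below models only that rule (s followed by a consonant at the start of a word); the function name is hypothetical, and the other adaptations visible in the table, such as ph becoming f or k becoming qu, are deliberately left out.

```python
VOWELS = set("aeiouáéíóúü")


def add_prothetic_e(word: str) -> str:
    """Prepend 'e' to a word that begins with s + consonant.

    Only the initial-e rule is modeled; other spelling adaptations
    (ph -> f, k -> qu, etc.) are ignored.
    """
    w = word.lower()
    if len(w) >= 2 and w[0] == "s" and w[1].isalpha() and w[1] not in VOWELS:
        return "e" + w
    return w


for source in ["statura", "slogan", "sphaira", "sol"]:
    print(source, "->", add_prothetic_e(source))
```

The sketch turns statura into estatura and slogan into eslogan, leaves sol unchanged, but yields only esphaira for sphaira, because the change from ph to f belongs to a different rule.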

As another example, one can observe a multiplicity of nouns ending in -ción in Spanish, even though other suffixes express the same meaning of an action and/or its result: -miento, -aje, -azgo, -anza, etc. In recent centuries, Spanish has produced a great number of -ción words by analogy, so that a special effort is sometimes necessary to avoid clustering them in one sentence. This stylistic problem has even been called cacophony.
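A style checker can detect such clustering mechanically. The sketch below counts the words ending in -ción or -ciones in a sentence and flags the sentence when the count reaches a threshold; both the purely orthographic test and the threshold of three are assumptions chosen for illustration.

```python
import re


def cion_words(sentence: str) -> list:
    """Return the words ending in -ción or -ciones (a crude orthographic test)."""
    words = re.findall(r"\w+", sentence.lower())  # \w matches accented letters in Python 3
    return [w for w in words if w.endswith("ción") or w.endswith("ciones")]


def too_many_cion(sentence: str, threshold: int = 3) -> bool:
    """Flag a sentence whose number of -ción words reaches the (hypothetical) threshold."""
    return len(cion_words(sentence)) >= threshold


s = "La organización de la investigación requiere la coordinación de la administración."
print(cion_words(s))
print(too_many_cion(s))
```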

Nevertheless, an important feature of language restricts the law of analogy. If analogy would generate too many homonyms, easy understanding of speech is hampered. In such situations, analogy usually does not apply.

A more general tendency can also be observed. The lexicon and the levels of a natural language are conceptual systems of intricately interrelated subsystems. If a feature of one subsystem tends to change and this change hinders the correct functioning of another subsystem, the conflict can be resolved in one of two ways. First, the innovation in the initiating subsystem may be rejected. Second, the affected subsystem may change its own rules, introducing innovations of its own.

For example, if a metonymic change of meaning produces a new word that frequently occurs in the same contexts as the original one, comprehension can be hindered. Hence, either the new word or the original one tends to be eliminated from the language.

In modern languages, the immediate impact of analogy can be seen in the fact that a great number of scientific, technical, and political terms are created according to just a few morphological rules. For example, the Spanish verbs automatizar, pasteurizar, globalizar, etc., are constructed from a noun (possibly a proper name) denoting a concept (autómata, Pasteur, globo, etc.) plus the suffix -izar/-alizar, which expresses the idea of subjecting something to the concept or making it function according to it.
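The derivation pattern can be approximated in a few lines. The Python sketch below uses a simplified rule of thumb that is our assumption, not a complete account of Spanish word formation: lowercase the noun, drop acute accents, drop a final vowel, and append -izar or -alizar. The function names are hypothetical.

```python
import unicodedata

VOWELS = "aeiou"


def strip_acute(text: str) -> str:
    """Remove acute accents (ó -> o) but keep other diacritics such as the tilde of ñ."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def derive_izar_verb(noun: str, suffix: str = "izar") -> str:
    """Build a verb from a noun plus -izar/-alizar under the simplified rule above."""
    stem = strip_acute(noun.lower())
    if stem and stem[-1] in VOWELS:
        stem = stem[:-1]  # autómata -> automat, globo -> glob
    return stem + suffix


print(derive_izar_verb("autómata"))         # automatizar
print(derive_izar_verb("Pasteur"))          # pasteurizar
print(derive_izar_verb("globo", "alizar"))  # globalizar
print(derive_izar_verb("Internet"))         # internetizar
```

Which of the two suffixes to use is passed in by the caller, since the sketch has no way to predict it.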

Computational linguistics directly uses the laws of analogy in the processing of unknown words. Any online dictionary is limited in size, so many words already in use in the language are absent from it (say, because they entered the language after the dictionary was compiled). To “understand” such words in some way, the program can assume that they have the most common and frequent properties.
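A minimal sketch of this fallback: look the word up in the machine dictionary and, if it is missing, guess its properties from its ending. The suffix table, the toy dictionary, and the choice of noun as the default class for words with no recognizable ending are all illustrative assumptions.

```python
# Toy machine dictionary (hypothetical contents).
LEXICON = {
    "revolución": {"pos": "noun", "gender": "feminine"},
    "globo": {"pos": "noun", "gender": "masculine"},
}

# Hypothetical table: word ending -> most frequent properties of words with that ending.
SUFFIX_GUESSES = {
    "mente": {"pos": "adverb"},
    "ción": {"pos": "noun", "gender": "feminine"},
    "izar": {"pos": "verb", "form": "infinitive"},
}


def properties(word: str) -> dict:
    """Return dictionary properties, or guessed ones for an unknown word."""
    w = word.lower()
    if w in LEXICON:
        return dict(LEXICON[w])
    # Try longer endings first so that the most specific pattern wins.
    for ending in sorted(SUFFIX_GUESSES, key=len, reverse=True):
        if w.endswith(ending):
            return dict(SUFFIX_GUESSES[ending], guessed=True)
    # Assumed default: an unknown word is most often a noun.
    return {"pos": "noun", "guessed": True}


for w in ["revolución", "linuxización", "internetizar", "blog"]:
    print(w, properties(w))
```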

Let us imagine, for instance, a Spanish-speaking reader who meets the word internetizarán in a text. Based on morphological rules, he or she readily reconstructs the infinitive of the hypothetical verb internetizar. However, this verb is unfamiliar as well, whereas the word Internet may already be in his or her mental dictionary. By the analogy implied by -izar, the reader can thus conclude that internetizar means ‘to make something function on the principles of the Internet.’

A natural language processor can reason in exactly the same way. Moreover, when such a program meets a word like linuxizar, it can suppose that a concept linux exists even if it is absent from the machine dictionary. Such a supposition yields a very rough “comprehension” of the unknown word, ‘to make something function on the principles of linux,’ even though the word linux itself remains incomprehensible.
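The reasoning about internetizarán and linuxizar can be sketched as follows: strip a known inflectional ending of -izar verbs, recover the infinitive and the base, look the base up, and build a rough gloss from the meaning of -izar. This is a sketch under stated assumptions: the ending list covers only a handful of forms, the dictionary is a toy, and the minimum stem length of three letters is an arbitrary guard against words such as tiza (‘chalk’), which merely end in -iza.

```python
# Hypothetical subset of inflected endings of regular -izar verbs.
IZAR_ENDINGS = ["izar", "iza", "izan", "izamos", "izará", "izaré",
                "izarán", "izaremos", "izaría", "izado"]
MIN_STEM = 3  # arbitrary guard against words that merely end in -iza, e.g. "tiza"
CONCEPTS = {"internet": "the Internet"}  # toy machine dictionary


def analyze_izar(word: str):
    """Guess infinitive, base concept, and a rough gloss for an unknown -izar form."""
    w = word.lower()
    # Longest endings first, so "izarán" is preferred over shorter matches.
    for ending in sorted(IZAR_ENDINGS, key=len, reverse=True):
        if w.endswith(ending) and len(w) - len(ending) >= MIN_STEM:
            stem = w[: -len(ending)]
            known = CONCEPTS.get(stem)
            concept = known if known is not None else stem  # unknown base stays a bare string
            return {
                "infinitive": stem + "izar",
                "concept": concept,
                "concept_known": known is not None,
                "gloss": f"to make something function on the principles of {concept}",
            }
    return None


for form in ["internetizarán", "linuxizar", "tiza"]:
    print(form, "->", analyze_izar(form))
```

For linuxizar the program still produces a gloss, but reports concept_known as False, which mirrors the situation described above: the structure of the word is understood while linux itself is not.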