CURRENT STATE OF APPLIED RESEARCH ON SPANISH

In our books, the stress on the Spanish language is intentional and purposeful. For historical reasons, the majority of the literature on natural language processing is not only written in English, but also takes English as the target language of study. In our opinion, this is counterproductive, and it has become one of the causes of a lag in applied research on natural language processing in many countries, as compared to the United States. The Spanish-speaking world is no exception to this rule.

The number of Spanish-speaking people in the world now exceeds 400 million, and Spanish is one of the official languages of the United Nations. As far as human-oriented teaching is concerned, Spanish is well described, and the Royal Academy of Madrid constantly supports orthographic [33] and grammatical [30] research and standardization. There are also several good academic dictionaries of Spanish, one of the best of which is [28].

However, the lexicographic research reflected in these dictionaries is too human-oriented. Along with some historical information, these dictionaries provide semantic explanations, but without a formal description of the main linguistic properties of lexemes, even in their morphologic and syntactic aspects.

Formal description and algorithmization of a language is the objective of research teams in computational linguistics. Several teams in this field oriented to Spanish now work in Barcelona and Madrid, Spain. However, this is rather little for a country of the European Community, where monolingual and multilingual efforts are well supported by governmental and international agencies. Some research on Spanish is also conducted in the United States, for example, at New Mexico State University.

In Mexico, the world’s largest Spanish-speaking country, activity in computational linguistics has been rather low in past decades. Now, the team headed by Prof. L. A. Pineda Cortés at the National Autonomous University of Mexico is working on the very difficult task of creating a program able to carry on a dialogue in Spanish with a human. A very useful dictionary of modern Mexican Spanish, developed by the team headed by Prof. L. F. Lara Ramos [26] (see also [47]), is oriented to human users, giving semantic interpretations and suggestions on the good usage of words.

Some additional information on Spanish-oriented groups can be found in the Appendix on page 173.

As to the books by Helena Beristáin [11], Irene Gartz [15], and J. L. Fuentes [14] on Spanish grammar, they are just well-structured[1] language manuals oriented to native speakers, and thus cannot be used directly as a source of grammatical information for a computer program.

Microsoft, one of the most powerful corporations in the world, has announced the development of a natural language processing system for Spanish based on the idea of multistage processing. As usual with commercial developments, the details of the project are still rather scarce. We can only guess that the rather slow progress of the grammar checker in the Word text processor for Windows is somehow related to these developments.

Thus, anyone who needs to compile all the facts of Spanish relevant to its automatic processing faces a small set of rather old monographs and manuals oriented to human learners, mainly written and first published in Spain and then sometimes reprinted elsewhere in Latin America.

Meanwhile, the development of natural language processing tools is necessary for any country to be competitive in the twenty-first century. We hope that our books will contribute to such developments in Mexico.

CONCLUSIONS

The twenty-first century will be the century of the total information revolution. The development of tools for the automatic processing of the natural language spoken in a country, or a whole group of countries, is extremely important for that country to remain competitive both in science and in technology.

To develop such applications, specialists in computer science need adequate tools to investigate language with a view to its automatic processing. One such tool is a deep knowledge of both computational linguistics and general linguistic science.

 

II. A HISTORICAL OUTLINE

A COURSE ON LINGUISTICS usually follows one of the general models, or theories, of natural language, as well as the corresponding methods of interpretation of the linguistic phenomena.

A comparison with physics is appropriate here once more. For a long time, the Newtonian theory excluded all other methods of interpreting phenomena in mechanics. Later, Einstein’s theory of relativity incorporated the Newtonian theory as a limiting case and, in its turn, for a long time excluded other methods of interpreting a rather vast class of phenomena. Such exclusivity can be explained by the great power of purely mathematical description of natural phenomena in physics, where theories describe well-known facts and predict with good accuracy other facts that have not yet been observed.

In general linguistics, the phenomena under investigation are much more complicated and variable from one object (i.e., language) to another than in physics. Therefore, the criteria for accuracy of description and prediction of new facts are not so clear-cut in this field, allowing different approaches to coexist, affect each other, compete, or merge. Because of this, linguistics has a rich history with many different approaches that formed the basis for the current linguistic theories.

Let us now give a short retrospective of the development of general linguistics in the twentieth century. The reader should not worry about encountering terms in this review that have not yet been introduced; strict definitions will be given later in this book.