Challenges and future prospects

Speech translation technology is now available in products that instantly translate free-form, multilingual conversations as continuous speech. Several challenges remain in accomplishing this well. Speaker-dependent variation in speaking style and pronunciation must be handled in order to provide high-quality translation for all users. Moreover, speech recognition systems must cope with external factors such as acoustic noise and speech from other speakers in real-world use of speech translation systems.

Because the user does not understand the target language when speech translation is used, a method must be provided for the user to check whether a translation is correct, for example by translating it back into the user's own language. And to achieve the goal of erasing the language barrier worldwide, many languages must be supported: this requires speech corpora, bilingual corpora, and text corpora for each of the estimated 6,000 languages said to exist on our planet today.

Because the collection of corpora is extremely expensive, collecting data from the Web is an attractive alternative to conventional methods. Secondary use of news and other media published in multiple languages would be an effective way to improve the performance of speech translation. Current copyright law, however, does not take secondary uses such as these corpora into account, and it will need to be revised to become more flexible.


 

Lecture 5. The significance of speech translation research and history to date

Speech translation is a technology that translates spoken language into speech in another language. Speech-translation technology is significant because it enables speakers of different languages from around the world to communicate, erasing the language divide in global business and cross-cultural exchange. Achieving speech translation would have tremendous scientific, cultural, and economic value for humankind.

Speech translation first grabbed attention at the 1983 ITU Telecom World (Telecom ’83), when NEC Corporation performed a demonstration of speech translation as a concept exhibit. Recognizing that many years of basic research would be required to implement speech translation, the Advanced Telecommunications Research Institute International (ATR) was founded in 1986 and began a project to research speech translation. Researchers from a wide range of research institutes, both in Japan and internationally, joined this project. In 1993, an experiment in speech translation was conducted linking three sites around the world: ATR, Carnegie Mellon University (CMU), and Siemens. After the start of ATR’s project, speech translation projects were launched around the world: Germany launched the Verbmobil project; the European Union, the Nespole! and TC-Star projects; and the United States, the TransTac and GALE projects. The GALE project was started in 2006 to automatically translate Arabic and Chinese into English. Its goal is to automate the extraction of vital multilingual information that had until then been performed by humans; the project architecture consists of a batch text-output system. In contrast, the objective of ATR and NEC is speech translation enabling face-to-face and non-face-to-face cross-language communication in real time. Online speech-to-speech translation is thus an integral component of this research, and immediacy of processing is a key factor.

Speech translation integrates three components: speech recognition, language translation, and speech synthesis, each of which presents its own difficulties. In particular, the technology must recognize and translate spoken language, which is much more difficult than translating text: spoken language contains ungrammatical, colloquial expressions, and it lacks punctuation such as question marks, exclamation marks, and quotation marks. Mistakes in speech recognition also cause major translation errors.
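As a purely illustrative sketch of how the three components compose, the toy program below stands in for each stage with a lookup table or a tag. None of the function names or behaviors belong to any real system:

```python
# Toy sketch of the three-component pipeline: recognition -> translation -> synthesis.
# All three stages are invented stand-ins, shown only to make the composition concrete.

def recognize(audio: str) -> str:
    # Toy "speech recognition": we pretend the audio is already a transcript.
    return audio

def translate(text_ja: str) -> str:
    # Toy "translation": a one-entry phrase table.
    table = {"こんにちは": "Hello"}
    return table.get(text_ja, "<unknown>")

def synthesize(text_en: str) -> str:
    # Toy "speech synthesis": tag the text instead of producing a waveform.
    return f"[spoken] {text_en}"

print(synthesize(translate(recognize("こんにちは"))))  # [spoken] Hello
```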

Consequently, researchers have chosen a development strategy of raising accuracy to a usable level by initially restricting the system to relatively simple conversation, rather than supporting all forms of conversation from the start. Table 1 shows the history of speech-translation technology. Research and development has moved step by step from relatively simple to more advanced translation tasks: from scheduling meetings, to hotel reservations, to travel conversation. Moving forward, however, the supported fields must be expanded further to include a wide range of everyday conversation and sophisticated business conversation.

 


 

Lecture 6. The history of automatic translation

Of the three components of speech translation, recent advances in text translation technology have made the largest contribution to the realization of automatic speech translation. Research into text translation has a long history going back more than half a century. Warren Weaver of the Rockefeller Foundation advocated research into automatic translation of text in 1946, shortly after the birth of the first computer. At the time, the Rockefeller Foundation had a huge influence on US science and technology policy. In 1953, Georgetown University and IBM began joint research on automatic translation using the 701, IBM's first commercial computer. In 1954, the world’s first automatic translation system was built on this computer, demonstrating the possibility of translation from Russian to English. With a dictionary of 250 terms and 6 rules, the system's translation capabilities were extremely limited, but the demonstration had a huge impact on society, and people at the time felt that the language barrier would soon be knocked down. Subsequently, partly in response to the shock of the Sputnik launch, the US government invested some $20 million in research on automatic translation.

In 1965, however, the Automatic Language Processing Advisory Committee (ALPAC) presented a grave report to the US National Academy of Sciences. The report stated that because automatic translation would not be practical for the foreseeable future, research efforts should instead be directed at the language theory and understanding that would underpin the technology. US budgets for automatic translation were subsequently cut, and the focus turned to basic research, with meaning and understanding as the key concepts. One famous result from this period is Winograd's work around 1970 on language understanding using world knowledge. The knowledge bases behind this kind of research, however, were insufficient, and it cannot be said to have tied directly into improved automatic-translation performance in any general or practical sense.

Automatic translation has since seen three great waves of technological innovation: rule-based translation, example-based translation, and statistical translation. The first wave hit Japan in the 1980s, when a project to translate abstracts of the science and technology literature of the Science and Technology Agency (dubbed the Mu project) succeeded. As a result, research and development into rule-based automatic translation, based on dictionaries and rules (analytic grammar rules, conversion rules, and generative grammar rules), began to gain popularity. A venture named Bravis launched sales of a commercial translation program, which spurred the commercialization of automatic-translation software by such big-name IT companies as Fujitsu, Toshiba, NEC, and Oki Electric Industry. All of the commercial software packages in the world today, and nearly all of the Web-based software, have this rule-based technology at their core. Because better and more complete specialized dictionaries were an effective way to improve translation quality, slow but steady efforts have increased dictionary sizes from a few tens of thousands of entries to millions of entries.

Meanwhile, in 1981 Professor Makoto Nagao of Kyoto University, taking a hint from the translation process carried out by humans, proposed an example-based translation method that uses sentences similar to the input sentence together with their translations (called "translation examples"). Example-based translation, combined with further research at Kyoto University and ATR around 1990, created a second wave that spread from Japan to the rest of the world. The method has been incorporated into some commercial rule-based systems, and it is currently the core method of a Japanese-to-Chinese translation project for scientific and technical publications led by the National Institute of Information and Communications Technology (NICT).

Then in 1988, IBM proposed a method called statistical machine translation, which combines purely statistical processing, excluding grammatical and other linguistic knowledge, with a bilingual corpus. The method received little attention for some time, for a number of reasons: the paper was difficult to understand, computer performance was lacking, translation corpora were too small, the method of execution was published only in patent specifications, and it was not effective except for closely related language pairs such as English and French. Around 2000, however, a new method called phrase-based statistical machine translation was proposed, and buoyed by more complete bilingual corpora and more powerful computers, it created the third major wave.
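In the notation commonly used in the later literature (this formulation is standard textbook material, not a quotation from the 1988 proposal), the statistical approach seeks, for a source sentence $f$, the target sentence

$$\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\,P(e),$$

where the translation model $P(f \mid e)$ is estimated from a bilingual corpus and the language model $P(e)$ from monolingual text. Phrase-based statistical machine translation keeps this decomposition but estimates the translation model over multi-word phrases rather than individual words.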

Today, nine out of ten research papers in the field are on statistically-based translation. It is difficult to tell at this time whether this research domain will continue to grow.

The three waves above are now overlapping. We have gradually come to learn the strengths and weaknesses of the rule-based, example-based, and statistical approaches to automatic translation, and current opinion holds that the best performance is achieved by fusing these three approaches in some way, rather than by using any one of them in isolation. The three methods, however, share a common problem: they all translate at the sentence level and cannot use contextual information. In other words, they do not make use of relationships with the surrounding text and thus cannot ensure cohesion. Statistical machine translation in particular performs automatic translation without analyzing the meaning of the input sentence, and so sometimes generates nonsensical translations. Methods that use example-based and statistical techniques are together called "corpus-based translation". A corpus is a database of text with supplementary linguistic information added, such as pronunciations, part-of-speech information, and dependency information. The next and subsequent lectures primarily describe corpus-based translation methods, with an emphasis on statistical machine translation.

 


 

Lecture 7. Multilingual speech translation processing architecture

Figure 1 shows the overall architecture of the speech-translation system, illustrating an example in which a spoken Japanese utterance is recognized and converted into Japanese text, which is then translated into English text and synthesized into English speech.

The multilingual speech recognition module compares the input speech against a phonological model trained on a large quantity of speech data from many speakers (the model covers the individual phonemes making up speech utterances) and converts the input into a string of phonemes, represented in the Japanese katakana syllabary. Next, this string of phonemes is converted into the string of words, written in the Japanese writing system (mixed kana and kanji characters), whose probability is highest. In this conversion, a word string appropriate as a Japanese utterance is generated based on the occurrence probabilities of three-word sequences, using an engine trained on large quantities of Japanese text.

These words are then translated by a conversational-language translation module: each Japanese word in the string is replaced with the corresponding English word, using a translation model trained on pairs of Japanese-English translations, and the order of the English words is then changed. To rearrange the words into a proper English utterance, a word string appropriate as an English utterance is generated based on the occurrence probabilities of three-word sequences, using an engine trained on large quantities of English text. The result is sent to the speech synthesis module, which estimates the pronunciation and intonation matching the string of English words, selects matching waveforms from a database containing many hours of speech data, and connects them to perform high-quality speech synthesis. Speech recognition and synthesis using statistical modeling and machine learning over massive speech corpora is called "corpus-based speech recognition and synthesis."

ATR developed its speech-translation system by collecting a corpus of general spoken travel conversation, in order to implement speech translation of travel conversation. To date, the project has created a Basic Travel Expression Corpus (BTEC) consisting of 1,000,000 matched pairs of Japanese and English sentences, and 500,000 matched pairs each for Japanese-Chinese and Japanese-Korean: the world’s largest translation corpus of multilingual travel conversation. The English sentences in the corpus average seven words in length and cover such everyday travel-conversation topics as greetings, problems, shopping, transportation, lodging, sightseeing, dining, communication, airports, and business. Below is an example of spoken English translations of a Japanese sentence.
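The "occurrence probability of a string of three words" used in both recognition and translation above is a trigram language model. The sketch below illustrates the idea on an invented three-sentence corpus, with simple additive smoothing; real systems train on vastly larger corpora and use more refined smoothing:

```python
import math
from collections import defaultdict

# Toy trigram language model: the decoder prefers word strings whose
# three-word sequences are most probable under a model trained on text.

def train_trigrams(corpus):
    tri, bi = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(len(words) - 2):
            bi[(words[i], words[i + 1])] += 1
            tri[(words[i], words[i + 1], words[i + 2])] += 1
    return tri, bi

def log_prob(sentence, tri, bi, alpha=0.1, vocab=10000):
    words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for i in range(len(words) - 2):
        h, w = (words[i], words[i + 1]), words[i + 2]
        # Additive smoothing gives unseen trigrams a small, nonzero probability.
        lp += math.log((tri[(h[0], h[1], w)] + alpha) / (bi[h] + alpha * vocab))
    return lp

corpus = ["where is the station", "the station is near", "where is the hotel"]
tri, bi = train_trigrams(corpus)
# A fluent candidate outscores a scrambled one.
print(log_prob("where is the station", tri, bi) >
      log_prob("station the is where", tri, bi))  # True
```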


 

Lecture 8. Comparative study with human speech translation capability

It is theoretically very difficult to evaluate the accuracy of speech translation. If evaluation of the speech synthesis module is excluded, evaluation is performed by feeding a number of test sentences into the system and judging the quality of the output. In this sense, the method for evaluating speech translation is essentially the same as that for evaluating automatic text translation; for speech translation, however, the utterances being evaluated are not strings of text but speech.

Two methods are used to evaluate translation quality: in one, the translations are manually given subjective ratings on a five-point scale; the other compares the similarity between the output of the system and previously prepared reference translations. A number of rating scales have been proposed for the latter, including BLEU, NIST, and word error rate (WER), and these have recently come into wide use. Since the results are simple numerical values, they can be used to compare two different systems. What these scores cannot answer, however, is how the system with the higher score will perform in the real world.
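Of these scales, word error rate is the simplest to state precisely: the minimum number of word substitutions, insertions, and deletions needed to turn the system output into a reference translation, divided by the reference length. A minimal sketch, with invented example sentences:

```python
# Word error rate via the standard dynamic-programming edit distance
# (substitutions, insertions, and deletions each cost 1), counted in words.

def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(wer("please show me the way to the station",
          "please show me way to station"))  # 2 deletions / 8 words = 0.25
```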

A method has been proposed to resolve this issue by estimating system performance in human terms: estimating the system's corresponding Test of English for International Communication (TOEIC) score. First, native speakers of Japanese with known TOEIC scores ("TOEIC takers") listen to test Japanese sentences and are asked to translate them into spoken English. Next, the TOEIC takers' translations are compared against the output of the speech-translation system by Japanese-English bilingual evaluators. The human win rate is then calculated as the proportion of test sentences for which the human translation is better. Once the human win rate has been calculated for all TOEIC takers, regression analysis is used to estimate the TOEIC score of the speech-translation system. Figure 2 shows system performance converted into TOEIC scores. On relatively short utterances like those in basic travel conversation (BTEC), the speech-translation system is nearly always accurate. Its performance on conversational speech (MAD and FED), however, is equivalent to that of Japanese speakers with a TOEIC score of about 600, and performance drops significantly on long, rare, or complex utterances. There is thus still room for improvement in performance.
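The regression step can be illustrated with a minimal sketch. All numbers below are invented: the method fits the human win rate against the humans' TOEIC scores and reads off the score at which the fitted win rate crosses 50%:

```python
# Least-squares fit of win rate vs. TOEIC score, then solve for the 50% point.
# Data are hypothetical; a real evaluation uses many takers and test sentences.

toeic = [450, 550, 650, 750, 850]           # human takers' known TOEIC scores
win_rate = [0.30, 0.42, 0.55, 0.68, 0.79]   # fraction of sentences the human won

n = len(toeic)
mean_x, mean_y = sum(toeic) / n, sum(win_rate) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(toeic, win_rate))
         / sum((x - mean_x) ** 2 for x in toeic))
intercept = mean_y - slope * mean_x

# The system is "equivalent" to a human whom it beats exactly half the time.
print(round((0.5 - intercept) / slope))  # ~611 with these invented numbers
```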

Field experiments using a speech-translation device. A field experiment was conducted in downtown Kyoto from 30 July to 24 August 2007, with the objective of evaluating the characteristics of communication mediated by a stand-alone speech-translation device about the size of a personal organizer, as well as evaluating the usability of the device. The experiment was set up as follows, in order to minimize restrictions on the test subjects:

(1) The people with whom the subjects conversed were not selected ahead of time, in order to collect a diverse range of expressions while using the speech translation device in realistic travel situations, such as transportation, shopping, and dining.

(2) Although the subjects were told the purpose of the dialog ahead of time, no restrictions were placed on the exact destination or proper names of items to purchase.

(3) Subjects were allowed to change the topic freely depending on the flow of the conversation.

(4) Subjects were allowed to move to different locations as appropriate, in accordance with the task.

(5) No time limit was placed on a single dialog.

In the case of transportation, the objective was considered to have been met if the subject was able to obtain information about the destination or to actually travel there. For shopping and dining, the objective was met if the subject completed the purchase of the article or the meal and received a receipt. In addition to quantitative evaluations of speech recognition rates, dialog response rates, and translation rates, the experiment also evaluated the level of understanding based on questionnaires. As shown in Figure 3, in the evaluation of the level of understanding of 50 native English speakers, about 80% said that the other person understood nearly everything that they said, and over 80% said they understood at least half of what the other person said. This result suggests that the performance of speech-translation devices could be sufficient for communication.

 

Lecture 9. Worldwide trends in research and development

International evaluation workshops give a strong boost to the development of speech-translation technologies. An international evaluation workshop is a kind of contest: the organizers provide a common dataset, and the participating research institutes compete by building systems that are quantitatively evaluated. The strengths and weaknesses of the various proposed algorithms are judged from the results of the evaluation, and the top algorithms are then widely adopted in subsequent research and development. This allows research institutes to work both competitively and cooperatively, promoting efficient research. Some representative international evaluation workshops are presented here, along with the automatic evaluation technologies that support this competitive research style.

(a) The International Workshop on Spoken Language Translation (IWSLT) is organized by C-STAR, an international consortium for research on speech translation including ATR in Japan, CMU in the United States, the Institute for Research in Science and Technology (IRST) in Italy, the Chinese Academy of Sciences (CAS), and the Electronics and Telecommunications Research Institute (ETRI) in Korea. The workshop has been held since 2004. Every year, the number of participating institutes increases, and it has become a core event for speech translation research. The subject of the workshop is speech translation of travel conversation from Japanese, Chinese, Spanish, Italian, and other languages into English. Two distinguishing features of the IWSLT are that it is for peaceful uses (travel conversation) and that the accuracy of the translation is fairly good, because it is a compact task.

(b) Global Autonomous Language Exploitation (GALE) [8] is a project of the US Defense Advanced Research Projects Agency (DARPA). It is closed and non-public, with some US$50 million invested per year. The purpose of the project is to translate Arabic and Chinese text and speech into English and to extract intelligence from them. A large number of institutions are divided into three teams that compete over performance. Targets are assigned on a fiscal-year basis, and every year performance is evaluated by outside institutions. In the United States, research on automatic translation is currently strongly dependent on DARPA budgets, and the inclinations of the US Department of Defense are strongly reflected in it.

Methods for evaluating translation quality have become a major point of debate at these workshops. There are various perspectives on translation quality, such as fluency and adequacy, and evaluation has long been considered a highly knowledge-intensive task. A recently proposed evaluation method called BLEU can automatically calculate evaluation scores that correlate highly with subjective evaluations by humans. This makes it possible to develop and evaluate systems repeatedly in short cycles, without great cost in time or money, and has made translation research and development much more efficient.
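As a rough illustration of the idea behind BLEU, the sketch below computes smoothed n-gram precisions against a single reference and combines them geometrically with a brevity penalty. This is a simplified toy, not the official BLEU definition, which uses corpus-level counts and multiple references:

```python
import math
from collections import Counter

# Simplified single-sentence BLEU: modified n-gram precision for n = 1..4,
# geometric mean, and a penalty for outputs shorter than the reference.

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
        total = max(sum(hyp_ng.values()), 1)
        # Add-one smoothing so one missing n-gram order does not zero the score.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the hotel is near the station",
                 "the hotel is close to the station"), 3))  # ~0.411
```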


 

 

Lecture 10. Practical applications of speech translation technology

The improved processing power and larger memories of computers, together with more widespread networks, are beginning to make portable speech-translation devices possible. Development is advancing along two lines: standalone implementations in compact hardware, and distributed implementations that connect mobile phones and other devices to high-performance servers over a network. Because of size, weight, and battery-lifetime constraints, it is not feasible to implement the standalone method on a general-purpose portable computer; moreover, demand is expected in situations where wireless and other network infrastructure is not available. In light of these issues, efforts have been directed toward the commercialization of dedicated mobile devices with built-in speech-translation functionality. In 2006, NEC developed the world's first commercial mobile device (with a 400-MHz MPU and 64 MB of RAM) with onboard Japanese-to-English speech translation.

Meanwhile, in November 2007 ATR developed a speech-translation system for the DoCoMo 905i series of mobile phones as a distributed implementation using mobile phones and network servers. The system, called "shabette honyaku" (see Figure 4), was released by ATR-Trek and is the world's first speech-translation service on a mobile phone. In May 2008, a Japanese-to-Chinese speech-translation service followed on the DoCoMo 906i series. Figure 5 shows the architecture of the speech recognition module used in distributed speech translation. The mobile phone (front end) performs background-noise suppression, acoustic analysis, and ETSI ES 202 050-compliant encoding, and sends only the resulting bit-stream data to the speech recognition server. The server (back end) expands the received bit stream, performs speech recognition, and calculates word reliability. One benefit of this architecture is that it is not bound by the information-processing limitations of the mobile phone, allowing large-scale, highly precise phonological and linguistic models to be used. Since these models reside on the server rather than on the phone, they are easy to update, making it possible to keep them current at all times. The system is already in wide use: as of June 2008, it had received a cumulative total of over five million accesses.
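The division of labor can be sketched schematically as below. Everything here, from the function names to the JSON payload, is a hypothetical stand-in: a real handset emits ETSI-standard acoustic feature vectors rather than JSON, and the server runs full recognition models rather than returning canned output:

```python
import json
import zlib

# Schematic sketch of the distributed (front end / back end) split described above.

def front_end(samples: list[float]) -> bytes:
    # Stand-in for noise suppression + acoustic analysis + encoding on the phone:
    # reduce the audio to compact features and compress them into a bit stream.
    features = [round(s, 3) for s in samples]
    return zlib.compress(json.dumps(features).encode())

def back_end(bitstream: bytes) -> dict:
    # Stand-in for the server: expand the bit stream and run recognition with
    # large phonological and language models that live (and are updated) here.
    features = json.loads(zlib.decompress(bitstream))
    return {"words": ["<recognized>", "<text>"],
            "reliability": [0.9, 0.8],
            "frames_received": len(features)}

print(back_end(front_end([0.01, -0.02, 0.03])))
```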

Standardization for support of multiple languages in speech translation

As speech translation technology overcomes linguistic barriers, it would be preferable for researchers and research institutions from many different countries to research it jointly. The C-STAR international consortium for joint research of speech translation, in which ATR and CMU play a central role, has been quite active in international joint research.

Meanwhile, the foreign travel destinations of Japanese people – whether for tourism, emigration, or study abroad – are becoming more diverse, and people from a large number of countries are coming to Japan in increasing numbers for tourism, study, and employment. These and other changes are heightening the need for means of interaction with people from non-English speaking countries.

In particular, Japan is strengthening its social and economic ties in the Asian region, including Russia, and enhancing mutual understanding and economic relations at the grassroots level has become a key challenge. Relations with the rest of Asia are more vital to Japan than ever before. Consequently, rather than English, Japan needs to be able to get along in the languages of its neighbors, such as Chinese, Korean, Indonesian, Thai, Vietnamese, and Russian – languages that until now have not been widely taught or spoken in this country.

Against this backdrop, A-STAR was founded as a speech-translation consortium for creating the basic infrastructure for spoken-language communication that overcomes the language barriers in the Asia-Pacific region. Rather than research and development of the technology proper, the consortium's objective is to establish an international joint research organization that will design the formats of the bilingual corpora essential to advancing research and development of this technology, design and compile basic bilingual corpora between Asian languages, and standardize the interfaces and data formats needed to connect speech-translation modules internationally, jointly with research institutions working in this field across the Asia-Pacific region.

The consortium's activities are contracted as research under the Asia Science and Technology Cooperation Promotion Strategy, a project of the special coordination funds for promoting science and technology. The project has also been proposed and adopted as an APEC TEL (Telecommunications and Information) project [10], and the consortium is moving to create an expert group within the APT ASTAP (Asia-Pacific Telecommunity Standardization Program) to draft the standardized interfaces and data formats for connecting speech-translation modules.

Figure 6 illustrates the standardized connections under consideration. The interfaces and data formats of the modules making up the speech-translation architecture will be standardized so that the modules can be connected over the Internet. It is also necessary to create common speech-recognition and translation dictionaries and to compile standardized bilingual corpora. The basic communication interface will be Web-based HTTP 1.1, and a markup language called STML (speech translation markup language) is being developed as the data format for connecting applications.


 

Lecture 11. Challenges for the development of speech translation

As described above, speech translation is a technology that enables communication between speakers of different languages. Many research challenges remain, however: in particular, speaker dependency and diversity of expression are great, and new words and concepts are constantly being created as society changes. Speech translation technology currently handles simple utterances about seven words long, such as travel conversation, so many challenges remain unsolved before it can handle long, complex speech such as newspaper articles or lectures. Some of the immediate technical challenges are listed below.

1) Evaluating and improving usability in practical applications. Human speakers have many inherent differences: people differ in speaking style, accent, and form of expression. Speech translation must aim to suppress variations in performance due to these differences and provide the same high level of performance for all users. Additionally, acoustic noise, reverberation, and speech by other speakers have a huge impact during real-world use, and measures to remedy these external factors are also extremely vital. Meanwhile, from the standpoint of usability as a communication tool, it is essential to further reduce the time from speech recognition through translation to speech synthesis. When speech translation is used, the user does not understand the target language and so has no direct way of judging whether a translation is correct. A method must thus be provided for the user to check the translation, for example by translating it back into the user's own language (back-translation; a toy sketch of this round-trip check appears after this list). Considering speech translation as a tool for gathering information while traveling, it is also essential to provide a means of gathering information via the Internet in multiple languages, not only by asking people. These challenges require field testing and technology development to proceed in parallel, in a growth loop of data collection, performance improvement, usability improvement, and trial service.

2) Support for multiple languages. Although English is becoming the de facto worldwide lingua franca, what is needed is not a system that translates only into English, but one that translates directly among the 6,000 languages said to exist on our planet today. Multilingual speech translation requires speech recognition, translation, and speech synthesis for each of these languages; in other words, massive speech corpora, bilingual corpora, and text corpora are required for each. The collection of speech corpora in particular is extremely expensive. This technology could also have great value in preserving languages in decline or facing extinction.

3) Standardization for connecting speech translation worldwide via the network. Module connections are already being standardized in the Asia-Pacific region. Moving forward, it will be necessary to advance standardization for wide international connectivity and to develop a joint research structure.

4) Relaxing of copyright to enable example translations to be used via the Web. The development of speech translation technology requires a text corpus of the source language, a text corpus of the target language, a bilingual corpus of translations between the two, and speech corpora. Creating and collecting these corpora by conventional methods is extremely expensive. One approach now gaining attention is to collect data from the Web, which continues its explosive growth. For example, the secondary use of news and other media published in multiple languages would be an effective way to improve the performance of speech translation. As of this time, however, the copyright issues have not been resolved.

5) Using the latest proper nouns based on the user’s current location. There are huge numbers of proper names of people, places, and things. Incorporating all of these proper nouns into the speech-translation system at the same time would be nearly impossible, both in terms of performance and time. It would therefore be efficient to automatically acquire proper nouns corresponding to the user’s location using GPS or the like, and perform speech recognition, translation, and speech synthesis tailored to that location.
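Returning to the back-translation check mentioned in challenge 1), the round trip can be sketched as below. The one-entry dictionaries are invented stand-ins for real translation engines, and the function name is hypothetical:

```python
# Toy sketch of a back-translation check: translate, translate back, and let
# the user compare the round trip with what they originally said.

ja_to_en = {"駅はどこですか": "Where is the station?"}   # invented one-entry "engine"
en_to_ja = {"Where is the station?": "駅はどこですか"}   # invented reverse "engine"

def check_by_back_translation(source_ja: str) -> tuple[str, str]:
    forward = ja_to_en.get(source_ja, "<untranslatable>")
    back = en_to_ja.get(forward, "<untranslatable>")
    return forward, back

forward, back = check_by_back_translation("駅はどこですか")
# The user cannot read the English, but can see whether the round trip
# reproduces something close to the original utterance.
print(forward, "/", back)
```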


 

Lecture 12. Machine Translation – Terminology Management

Terminology must be managed in order to produce high-quality translations. When a specialized term is translated a certain way, that choice must be recorded so that it can be retrieved later and the same translation equivalent used for the same concept, whether later in the document or in a subsequent document.

In general-language texts, it is undesirable to use the same word over and over. In specialized texts, the opposite holds: it is undesirable to use different terms for the same concept in different parts of the translation. The termbase appropriate for a given translation job must therefore be managed somehow; it will not just magically appear when needed or remain up to date as terminology evolves. More and more, we will see specialized and general-purpose database management software being used to manage termbases. Every organization that produces a substantial amount of specialized text should consider building an organization-level termbase.

The requirement for terminology management applies to both human and machine translation. Neither a human nor a computer can magically select consistent equivalents for specialized terms. Using machine translation instead of human translation does not reduce the need for terminology management; if anything, it increases it, because a machine translation system can only put out the terms that are put into it. If the termbase does not yet exist, the translation job (whether human or machine) should be delayed until the termbase is ready.
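At its simplest, a termbase is a mapping from source-language terms to approved target-language equivalents, applied the same way every time. A minimal sketch, with invented entries:

```python
# Minimal sketch of termbase lookup: every occurrence of a recorded term is
# rendered with its approved equivalent, giving the consistency described above.

termbase = {
    "hard disk": "disque dur",               # invented example entries
    "random access memory": "mémoire vive",
}

def apply_termbase(text: str) -> str:
    # Replace longer terms first so multiword entries are not split apart.
    for term in sorted(termbase, key=len, reverse=True):
        text = text.replace(term, termbase[term])
    return text

print(apply_termbase("install the hard disk and the random access memory"))
# -> install the disque dur and the mémoire vive
```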

The notion of sublanguage

Skilled human translators are able to adapt to various kinds of source text. Some translators can even start with poorly written source texts and produce translations that exceed the quality of the original. However, current machine translation systems strictly adhere to the principle of “garbage in — garbage out.” Therefore, if high quality translation is needed yet the source text is poorly written, forget about machine translation. There is more. Machine translation systems cannot currently produce high-quality translations of general-language texts even when well written. It is well-known within the field of machine translation that current systems can only produce high-quality translations when the source text is restricted to a narrow domain of knowledge and, furthermore, conforms to some sublanguage. A sublanguage is restricted not just in vocabulary and domain but also in syntax and metaphor. Only certain grammatical constructions are allowed and metaphors must be of the frozen variety (that is, used over and over in the same form) rather than dynamic (that is, creatively devised for a particular text).

Making the Decision

At first glance, post-editing may seem like a panacea [ˌpænəˈsiːə]. Why not use machine translation for everything and then have a human post-edit the raw output up to normal publication quality as needed? The answer is an economic one. For a source text not restricted to a sublanguage, the cost of post-editing can be very high. If the post-editor must consult both the source and target texts, the effort, and therefore the time and cost, of post-editing can easily approach the cost of paying a professional translator to translate the source text from scratch, without the benefit of the raw machine-translation output. It is sometimes argued that raw machine translation, no matter how bad, is useful because it uses equivalents for specialized terms consistently. That argument does not stand up when modern translator tools are considered: such tools automatically look up source terms in a termbase and display the corresponding target-language terms.

Ambiguity

What makes machine translation so difficult? Part of the problem is that language is highly ambiguous when looked at word by word. For example, consider the word "cut" without knowing which sentence it came from. It could have come from any of the following:

He told me to cut off a piece of cheese.

The child cut out a bad spot from the apple.

My son cut out early from school again.

The old man cut in line without knowing it.

The cut became infected because it was not bandaged.

Cut it out! You’re driving me crazy.

His cut of the profit was too small to pay the rent.

Why can’t you cut me some slack?

I wish you could be serious, and not cut up all the time.

His receiver made the cut much sooner than the quarterback expected.

Hardly anyone made the cut for the basketball team.

If you give me a cut like that, I’ll have your barber’s license revoked.

Lousy driver! Look before you cut me off like that!

The cut of a diamond is a major determiner of its value.

If a computer (or a human) is allowed to see only the word "cut" while the rest of the sentence is covered up, it is impossible to know which meaning of "cut" is intended. This may not matter if everything stays in English, but when the sentence is translated into another language, it is unlikely that the various meanings of "cut" will all be translated the same way. We call this property of languages "asymmetry".

We will illustrate an asymmetry between English and French with the word "bank." The principal translation of the French word banque (a financial institution) is the English word "bank." If banque and "bank" were symmetrical, then "bank" would always be translated back into French as banque. However, this is not the case: "bank" can also be translated into French as rive, when it refers to the edge of a river. Now you may object that this is unfair because the meaning of "bank" was allowed to shift. But a computer does not deal with meaning; it deals with sequences of letters, and both meanings, the financial-institution one and the edge-of-a-river one, consist of the same four letters, even though they are different words in French. Thus English and French are asymmetrical.
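The asymmetry can be written out as data. The two dictionaries below contain only the example words from the text; a real lexicon would of course be far larger:

```python
# English-French asymmetry: two French words collapse into one English form,
# so the mapping cannot be inverted without knowing the meaning.

fr_to_en = {"banque": "bank", "rive": "bank"}
en_to_fr = {"bank": ["banque", "rive"]}  # one four-letter form, two French words

print(fr_to_en["banque"])  # bank
print(fr_to_en["rive"])    # bank
print(en_to_fr["bank"])    # ['banque', 'rive'] -- the letters alone cannot decide
```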


 

 

Lecture 13-14. Translation Models

When machine translation work was beginning, computer designers invited a group of experienced translators and asked them a question, seemingly naive but going straight to the heart of their profession: how do you translate? Could you tell us in detail everything about the translation process? What goes on in a translator's brain? What operation follows what? This simple question took everyone by surprise, for it is a terribly difficult thing to explain what the process of translation is.

Attempts to conceptualize the translation process have given rise to several theories, or models, of translation. A translation model is a conventional description of the mental operations that a translator performs on speech and language units, together with their explanation.

Roughly, three translation models can be singled out:

1. Situational (denotative) model of translation

2. Transformational model of translation

3. Psycholinguistic model of translation

Each model explains the process of translation in a restricted way, from its own angle, and therefore cannot be considered a comprehensive or complete depiction of the mechanism of translation. Together, however, they make the picture of the translation process more vivid, and they provide the translator with a set of operations for carrying out translation.

1. Situational model of translation

One and the same situation is denoted by the source and the target language, but each language does so in its own way.

To denote means to indicate either the thing a word names or the situation a sentence names. Hence the terms denotative meaning, or referential meaning, i.e. the meaning relating a language unit to the external world, and denotation, a particular and explicit meaning of a symbol.

To translate correctly, a translator has to comprehend the situation denoted by the source text (as P. Newmark stressed, one should translate ideas, not words) and then find the proper means of the target language to express this situation (idea). If the translator does not understand the situation denoted by the source text, his or her translation will not be adequate, which sometimes happens when an inexperienced translator attempts to translate a technical text. The main requirement of translation is that the denotation of the source text be equal to the denotation of the target text. That is why a literal word-for-word translation sometimes results in a failure of communication. The Kazakh Наубайханадан нан сатып ал and the Russian Возьми хлеба в булочной (literally, 'Take some bread at the bakery') are equivalent to the English Buy some bread at the bakery only because the receptor of the Russian sentence knows that the situation of buying can be denoted in Russian by the more general verb взять ('to take'), whose primary equivalent, to take, does not in other contexts contain the meaning of paying money.

Thus, this model of translation emphasizes identification of the situation as the principal phase of the translation process.

This theory of translation is helpful in translating neologisms and realia: to give a proper equivalent to the phrase Red Guards, an English calque from Chinese, we should know what notion the phrase implies. On finding out that it means 'members of a Chinese Communist youth movement in the late 1960s, committed to the militant support of Mao Zedong', we arrive at the Kazakh equivalent of this historical term, хунвэйбиндер.

As a matter of fact, this model of translation is used for attaining equivalence at the level of the situation. It is the situation that determines the translation equivalent among the variants: instant coffee is equivalent to ерігіш кофе (literally, 'soluble coffee') but not *лезде кофе ('instantly coffee').

The situation helps to determine whether a translation is acceptable or not. For example, suppose we have to translate the sentence Somebody was baited by the rights. Without knowing the situation, we might translate it as Кто-то подвергался травле со стороны правых ('Somebody was persecuted by the Right'), since the dictionary equivalent for to bait is травить, подвергать травле. But if we know that the somebody in question is President Roosevelt, this translation becomes inappropriate, and we had better use Президент Рузвельт подвергался резким нападкам со стороны правых ('President Roosevelt came under sharp attack from the Right').

A weak point of this model is that it does not explain the translation mechanism itself. One situation can be designated by various linguistic means; why choose this or that variant over the others? The model gives no answer to this question.

 

Another flaw in this theory is that it does not describe the systemic character of linguistic units. Why do the elements of the idiom to lead somebody by the nose not correspond to those of the Russian водить за нос ('to deceive somebody')? Why does this idiom instead correspond to the Russian держать верх над кем-то ('to have the upper hand over somebody')? This model does not describe the relations between the language units in a phrase or sentence and thus gives no explanation of the relations between source- and target-language units; it refers only to the extralinguistic situation designated by the sentence.

2. Transformational model of translation

When translating, a person transforms the source text into a new form. Transformation is the conversion of one form into another. There are two concepts of transformation in the theory of translation.

In one of them, transformation is understood as an interlinguistic process: converting the source text into the structures of the target text, which is translation proper. Special rules can be described for transforming source-language structures into corresponding target-language structures. For example, to translate an "adverbial verb", one must introduce into the target-language structure an adverb denoting the way the action is performed: She stared at me. – Она пристально смотрела на меня. ('She looked at me intently.')

In the second concept, transformation is understood not simply as replacing source-language structures with target-language structures, but as part of a translation process with three phases:

- Analysis: the source-language structures are transformed into basic units of the source language. For example, the sentence I saw him enter the room is transformed into I saw him. He entered the room.

- Translation proper: the basic units of the source language are translated into basic units of the target language: Мен оны көрдім. Ол бөлмеге кірді. / Я видел его. Он вошел в комнату.

- Synthesis: the basic units of the target language are transformed into the terminal structures of the target language: Оның бөлмеге кіргенін көрдім. / Я видел, что он вошел в комнату.

What are the advantages and disadvantages of this model? It is employed in contrastive analysis of two language forms that are considered to be translation equivalents, as it verbalizes what has been transformed in them and how. This model provides us with transformation techniques. It explains how we translate equivalent-lacking structures into another language. This model is important for teaching translation because it recommends that one transform a complex structure into a simple one.

However, a disadvantage of this model is its inability to explain the choice of the transformation made, especially at the third, synthesis phase. It does not explain the facts of translation equivalence at the situational level, and it ignores sociocultural and extralinguistic aspects of translation.

3. Psycholinguistic model of translation

Translation is a kind of speech event, and it develops according to the psychological rules of a speech event. The scheme of a speech event consists of the following phases:

- The speech event is motivated;

- An inner code program for the would-be message is developed;

- The inner code is verbalized into an utterance.

Translation develops according to the same phases: the translator comprehends the message (motive), transforms the idea of the message into his or her own inner speech program, and then unfolds this inner code into the target text.

The strength of this theory is that it considers translation, alongside speaking, listening, reading, and writing, as a speech event. But there is evidence to suggest that translators and interpreters listen and read, speak and write differently from other language users, basically because they operate under a different set of constraints. While a monolingual receiver is sender-oriented, paying attention to the speaker's or writer's message in order to respond to it, the translator is essentially receiver-oriented, paying attention to the sender's message in order to re-transmit it to the receiver of the target text while suppressing personal reactions to the message.

Two stages are essential to the process of translating and interpreting: analysis and synthesis. A third stage, revision, is available only to the translator working with a written text. During the analysis stage, the translator reads or listens to the source text, drawing on background knowledge to comprehend the features contained in the text. During synthesis, the target text is produced. The draft written translation is then revised and edited.

However, the explanatory force of this model is very restricted, since inner speech remains a deeply disputed problem in both psychology and linguistics.

 


 

 

Lecture 15. Translator-client communication

In our digital age, electronic formats concern not just our texts, but also our communications with clients and other translators. Thanks to the Internet, professionals from all over the world can be in regular contact by email or various forms of instant messaging. Work can be sent and received electronically, across national and cultural borders. This has several consequences.