Supporting Communication among Different Languages
The UNL system allows people to communicate with speakers of other languages while each uses his/her own mother tongue. UNL is a common language for exchanging information through computers that can process natural languages. The UNL system basically consists of language servers, UNL editors and UNL viewers.
A system that converts a native language into UNL is called an "enconverter", and one that converts UNL back into a native language is called a "deconverter". Information "enconverted" from any language is exchanged in UNL format over networks. Information represented in UNL is then "deconverted" into the native language at each terminal on the network.
A language server consists of an enconverter and a deconverter. The processes of "enconversion" and "deconversion" are provided by a language server that resides on the Internet. The "enconverter" converts a particular language into UNL, while the "deconverter" converts UNL back into that language.
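The data flow through two language servers can be sketched as follows. This is a minimal illustration only: the `LanguageServer` class, the `relay` function and the tuple of concept labels standing in for UNL are all hypothetical, and the word-by-word lookup is a toy substitute for real enconversion and deconversion.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Toy stand-in for UNL: an ordered tuple of language-neutral concept labels.
# Real UNL expressions are far richer; this only illustrates the data flow.
Concepts = Tuple[str, ...]


@dataclass
class LanguageServer:
    """Hypothetical language server pairing an enconverter and a deconverter."""

    language: str
    word_to_concept: Dict[str, str]  # assumed one-to-one for this toy example

    def enconvert(self, text: str) -> Concepts:
        """Native language -> shared representation (word-by-word lookup)."""
        return tuple(self.word_to_concept[word] for word in text.lower().split())

    def deconvert(self, concepts: Concepts) -> str:
        """Shared representation -> native language."""
        concept_to_word = {c: w for w, c in self.word_to_concept.items()}
        return " ".join(concept_to_word[c] for c in concepts)


def relay(text: str, source: LanguageServer, target: LanguageServer) -> str:
    """Enconvert with the sender's server, deconvert with the receiver's."""
    return target.deconvert(source.enconvert(text))


english = LanguageServer("en", {"hello": "GREETING", "world": "WORLD"})
spanish = LanguageServer("es", {"hola": "GREETING", "mundo": "WORLD"})
print(relay("Hello world", english, spanish))
```

In this sketch each server only maps between its own language and the shared representation, so the two servers never need to know about each other.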
A "deconverter" is software that automatically deconverts UNL into native languages. It is important that it produces correct, high-quality results. It is also important that the basic architecture of the "deconverter" is widely shared throughout the world, so that all languages are treated with the same standards of quality and precision. Technology developed for one language can be applied to other languages as long as the architecture is shared.
The "deconverter", which generates natural language from UNL, plays a core role in the UNL system. It is essential that the "deconverter" can express UNL information with very high accuracy. It follows that information, once composed in UNL, can be understood in any language for which a "deconverter" exists.
An "enconverter" is software that automatically or interactively enconverts natural language text into UNL. UNU/IAS developed enconversion software called "EnCo", which constitutes an enconverter together with a word dictionary, a co-occurrence dictionary and conversion rules for a language. "EnCo" itself is language-independent software, so it is applicable to any language.
Because an "enconverter" generates UNL from natural languages, it enables people to make UNL documents without any knowledge of UNL. This means that users of the UNL system do not need to learn UNL. This makes UNL quite different from Esperanto, for instance.
UNL Editor and Viewer
The UNL editor is used to make UNL documents. It is linked to a language server equipped with an "enconverter" and a "deconverter" for a natural language. As the author writes a document, e-mail or any other text in his/her language, the UNL editor "enconverts" it into a UNL document. In this process, UNL expressions are produced automatically or interactively with the author.
There are four kinds of UNL editor according to the method of enconversion:
1) fully automatic enconversion for natural language texts;
2) fully automatic enconversion for controlled or tagged language texts;
3) interactive enconversion for natural language texts;
4) word-by-word input method.
The correctness of the generated UNL increases from the first method to the fourth, but so does the cost of making UNL documents. Users can choose the enconversion method according to the purpose of the UNL documents they want to make.
The UNL editor also displays the content of a UNL document in the author's native language, showing how the editor has understood the original document. The author can thus check the correctness of the "enconversion". Because this check relies on text deconverted from UNL, the high accuracy of "deconversion" matters greatly. When the result is found not to be correct enough, the author can either rewrite the original document or modify the UNL interactively, following the guidance provided by the editor. In this way the author can produce a UNL document that is as correct as desired.
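The verification cycle described above can be sketched as a simple loop. Everything here is an assumption made for illustration: the function `author_unl_document`, its callback parameters and the toy "bank" converters are hypothetical, and the sketch covers only the option of rewriting the original text, not interactive modification of the UNL itself.

```python
from typing import Callable, Optional


def author_unl_document(
    text: str,
    enconvert: Callable[[str], str],
    deconvert: Callable[[str], str],
    approve: Callable[[str, str], bool],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Enconvert, echo the result back in the author's language, and repeat
    until the author approves. Returns the UNL, or None if never approved."""
    for _ in range(max_rounds):
        unl = enconvert(text)
        echoed = deconvert(unl)       # how the editor understood the text
        if approve(text, echoed):     # author checks the echoed reading
            return unl
        text = revise(text, echoed)   # author rewrites the original
    return None


# Toy converters (hypothetical): "bank" is misread unless disambiguated.
def toy_enconvert(text: str) -> str:
    sense = "riverside" if "river bank" in text else "financial-institution"
    return "bank:" + sense


def toy_deconvert(unl: str) -> str:
    return unl.split(":", 1)[1]


result = author_unl_document(
    "we sat on the bank",
    toy_enconvert,
    toy_deconvert,
    approve=lambda original, echoed: echoed == "riverside",
    revise=lambda original, echoed: original.replace("bank", "river bank"),
)
print(result)
```

In this toy run the first pass picks the wrong sense, the author rewrites the sentence, and the second pass is approved.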
The UNL viewer is used to read a UNL document in the user's native language. The UNL viewer uses a language server to deconvert UNL documents into the user's native language.
In Internet communication, the dominant text format, HTML, can hold many links to other documents, enabling readers to refer to various kinds of related documents. In short, an electronic document carries various supplementary information, which increases its usability. UNL information should be treated in the same way on the network.
One of the merits of HTML is that the whole document can be produced in plain text. In general, the information contained in an electronic document is divided into text and embedded instructions. In HTML, however, even the embedded instructions are written in plain text. This characteristic makes HTML universally adaptable to any editing system while retaining the advantages of hyper-text. Furthermore, the description format for embedding in HTML is open to the public. HTML conventions are still expanding and developing, and conventions for handling UNL information are expected to be regarded as one such extension of HTML.
In order to achieve this universality, it is proposed that the description format for UNL expressions be treated as an extension of the HTML conventions. UNL information can be embedded in an HTML document with tags attached at its beginning and end to mark it as UNL information. Extensions of the conventions should conform to existing HTML so that UNL expressions can be handled like other documents, without damaging the HTML hyper-text structure. In order to conform to the HTML conventions, descriptions in UNL will be in plain text.
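As a rough illustration of embedding plain-text UNL between a pair of tags, the following sketch wraps an expression in an HTML fragment and recovers it again. The tag name `unl`, the helper functions and the placeholder expression are assumptions made for this example; the text above does not specify the actual tag names or the UNL syntax.

```python
import html
import re
from typing import List

# Hypothetical tag names; the real convention is not given in the text above.
UNL_OPEN, UNL_CLOSE = "<unl>", "</unl>"


def embed_unl(visible_text: str, unl_expression: str) -> str:
    """Return an HTML fragment with the UNL carried as escaped plain text."""
    return (
        "<p>" + html.escape(visible_text) + "</p>\n"
        + UNL_OPEN + html.escape(unl_expression) + UNL_CLOSE
    )


def extract_unl(document: str) -> List[str]:
    """Collect every UNL expression found between the marker tags."""
    pattern = re.escape(UNL_OPEN) + r"(.*?)" + re.escape(UNL_CLOSE)
    found = re.findall(pattern, document, flags=re.DOTALL)
    return [html.unescape(item) for item in found]


# The expression below is a placeholder, not real UNL syntax.
fragment = embed_unl("Hello, world", "placeholder(x>y, z)")
print(fragment)
print(extract_unl(fragment))
```

Escaping the expression keeps characters such as `>` from being mistaken for markup, so the rest of the HTML document is left intact.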