Natural Language Generation

Natural Language Generation is a subfield of Computational Linguistics and language-oriented Artificial Intelligence research devoted to studying and simulating the production of written or spoken discourse. The study of human language generation is a multidisciplinary enterprise, requiring expertise in linguistics, psychology, engineering and computer science. One of its central goals is to investigate how computer programs can be made to produce high-quality natural language text from computer-internal representations of information.

Natural language generation is often characterized as a process that starts from the communicative goals of the writer or speaker and employs some form of planning to convert them progressively into written or spoken words. In this view, the general aims of the language producer are refined into goals that are increasingly linguistic in nature, culminating in low-level goals to produce particular words. The generation process is usually assumed to be modular, with a rough division between a strategic part (deciding what to say) and a tactical part (deciding how to say it).

This strategy-tactics distinction is partly mirrored by a distinction between text planning and sentence generation. Text planning is concerned with working out the large-scale structure of the text to be produced and may also include content selection. The result of this subprocess is commonly taken to be a tree-like discourse structure in which each leaf holds an instruction to produce a single sentence. These instructions are then passed in turn to a sentence generator, whose task can be further subdivided into sentence planning, i.e., organizing the content of each sentence, and the final step of surface realization, i.e., converting sentence-sized chunks of representation into grammatically correct sentences.
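The division of labour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular system: the `Node` class, the function names, and the `agent`/`action`/`patient`/`relevant` keys of the input facts are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of a tree-like discourse structure (hypothetical representation)."""
    relation: str                                  # e.g. "sequence" or "leaf"
    children: list = field(default_factory=list)
    message: Optional[dict] = None                 # set only at leaves: one sentence each


def plan_text(facts):
    """Text planning: select content and arrange it in a discourse tree."""
    selected = [f for f in facts if f.get("relevant", True)]   # content selection
    leaves = [Node(relation="leaf", message=f) for f in selected]
    return Node(relation="sequence", children=leaves)


def plan_sentence(message):
    """Sentence planning: organize the content of a single sentence."""
    return {
        "subject": message["agent"],
        "verb": message["action"],
        "object": message["patient"],
    }


def realize(spec):
    """Surface realization: turn a sentence-sized specification into a string."""
    sentence = f"{spec['subject']} {spec['verb']} {spec['object']}"
    return sentence[0].upper() + sentence[1:] + "."


def leaf_messages(node):
    """Yield the sentence instructions at the leaves, left to right."""
    if node.message is not None:
        yield node.message
    for child in node.children:
        yield from leaf_messages(child)


def generate(facts):
    """Run the strategic part (text planning), then the tactical part."""
    tree = plan_text(facts)
    return " ".join(realize(plan_sentence(m)) for m in leaf_messages(tree))
```

Here `plan_text` plays the strategic role, while `plan_sentence` and `realize` together form the sentence generator. A real text planner would build a deeper tree with rhetorical relations between its parts; the flat "sequence" node is a simplification.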

The different types of generation techniques can be classified into four main categories:

Canned text systems constitute the simplest approach for single-sentence and multi-sentence text generation. They are trivial to create, but very inflexible.

Template systems, the next level of sophistication, rely on pre-defined templates or schemas whose slots can be filled with varying content, so that the same text can be produced with slight alterations. The template approach is used mainly for multi-sentence generation, particularly in applications whose texts are fairly regular in structure.
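The difference between canned text and templates can be illustrated with a short Python sketch. The train-departure domain, the constant names, and the slot names are invented for the example; only `string.Template` from the standard library is assumed.

```python
from string import Template

# Canned text: a fixed string, trivial to write but impossible to vary.
CANNED_NOTICE = "No trains are running today."

# Template: the same fixed wording, but with slots filled from the input.
DEPARTURE = Template("The $train to $destination leaves at $time.")


def departure_report(departures):
    """Produce a multi-sentence report by filling one template per record."""
    return " ".join(DEPARTURE.substitute(d) for d in departures)
```

Because every sentence has the same shape, this approach suits texts that are regular in structure, but any wording not foreseen in the template requires a new template.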

Phrase-based systems employ what can be seen as generalized templates. In such systems, a phrasal pattern is first selected to match the top level of the input, and then each part of the pattern is recursively expanded into a more specific phrasal pattern that matches some subportion of the input. At the sentence level, the phrases resemble phrase structure grammar rules and at the discourse level they play the role of text plans.
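A minimal Python sketch of this recursive expansion follows. The pattern table, the slot names (`agent`, `action`, `patient`, `head`, `modifier`), and the matching rule (a pattern matches when its slots are exactly the keys of the input) are assumptions chosen to keep the example small, not features of any particular generator.

```python
# Each category has candidate patterns. In a pattern, a plain string is a
# literal word and a (slot, category) pair means "expand the part of the
# input stored under `slot` using a pattern of `category`".
PATTERNS = {
    "clause": [
        [("agent", "np"), ("action", "word"), ("patient", "np")],   # transitive
        [("agent", "np"), ("action", "word")],                      # intransitive
    ],
    "np": [
        ["the", ("modifier", "word"), ("head", "word")],
        ["the", ("head", "word")],
    ],
}


def select_pattern(category, content):
    """Select the first pattern whose slots match the top level of the input."""
    for pattern in PATTERNS[category]:
        slots = {part[0] for part in pattern if isinstance(part, tuple)}
        if slots == set(content):
            return pattern
    raise ValueError(f"no {category} pattern matches {sorted(content)}")


def expand(category, content):
    """Recursively expand a pattern against the matching subportion of the input."""
    if category == "word":              # base case: the input is already a word
        return [content]
    words = []
    for part in select_pattern(category, content):
        if isinstance(part, str):
            words.append(part)          # literal word from the pattern
        else:
            slot, sub_category = part
            words.extend(expand(sub_category, content[slot]))
    return words


def generate_sentence(content):
    text = " ".join(expand("clause", content))
    return text[0].upper() + text[1:] + "."
```

The `clause` and `np` patterns behave like phrase structure rules; a discourse-level pattern whose slots expand into clauses would play the role of a text plan in the same scheme.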

Feature-based systems, which are as yet restricted to single-sentence generation, represent each possible minimal alternative of expression by a single feature. Accordingly, each sentence is specified by a unique set of features. In this framework, generation consists of incrementally collecting the features appropriate for each portion of the input. Feature collection can be based either on unification or on the traversal of a feature selection network. The expressive power of the approach is very high, since any distinction in language can be added to the system as a feature. Sophisticated feature-based generators, however, require very complex input, and it becomes difficult to maintain feature interrelationships and to control feature selection.
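The following Python sketch illustrates feature collection by traversing a very small selection network; it does not show the unification-based alternative. The three choice points, the input keys (`question`, `time`, `negated`), and the English realization rules are assumptions made for the example, and the subject is assumed to be third person singular.

```python
# A toy feature selection network: each choice point inspects the input and
# contributes exactly one feature (one minimal alternative of expression).
NETWORK = [
    ("mood", lambda inp: "interrogative" if inp.get("question") else "declarative"),
    ("tense", lambda inp: "past" if inp.get("time") == "past" else "present"),
    ("polarity", lambda inp: "negative" if inp.get("negated") else "positive"),
]


def collect_features(inp):
    """Traverse the network and incrementally collect one feature per choice point."""
    features = {}
    for name, choose in NETWORK:
        features[name] = choose(inp)
    return features


def realize(inp, features):
    """Build the sentence that the collected feature set specifies."""
    subject, verb = inp["subject"], inp["verb"]
    aux = "did" if features["tense"] == "past" else "does"
    negative = features["polarity"] == "negative"
    if features["mood"] == "interrogative":
        words = [aux, subject] + (["not"] if negative else []) + [verb["base"]]
        end = "?"
    elif negative:
        words = [subject, aux, "not", verb["base"]]
        end = "."
    else:
        words = [subject, verb[features["tense"]]]   # inflected form from the input
        end = "."
    text = " ".join(words)
    return text[0].upper() + text[1:] + end


def generate(inp):
    return realize(inp, collect_features(inp))
```

Even in this toy version the realization rules must consider how mood, tense, and polarity interact, which hints at why feature interrelationships become hard to maintain as the number of features grows.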

Many natural language generation systems follow a hybrid approach by combining components that utilize different techniques.