TightRope 1/95 - Articles


KNOWBOTS AND INTERACTIVE TELEVISION

The Knowbotic-Interface-Project as challenge to AI

Abbreviated version of a lecture held as part of a workshop on "Interactive TV" at the Institute for New Media, Frankfurt, June 7, 1994

by Dr. Gerd Döben-Henisch, Institute for New Media, Frankfurt

The author is a graduate in cognitive science and a philosopher. He is the head of the Knowbotic-Interface-Project at the Institute for New Media. The task of the project is the automatic translation of natural-language texts into images of a pictorial world. This is made possible by using knowbots. Knowbots are intelligent programs which can be "educated". They live in virtual realities and are capable of accumulating knowledge about the world on their own. Relative to this knowledge, they are able to learn any natural language.

doeben@connectinc.com

Translated by Annika Blunck


1 Knowbots and Interactive TV

In the computer science magazines 1/94 and 2/94 the question was raised of what the role of today's information scientists is, or is supposed to be. Once again the German information scientist was accused of being too little practice-oriented and of lacking interdisciplinarity. A solution must be found in view of a highly relevant and economically extremely explosive type of problem: the introduction of "interactive television" (ITV). The following text describes a project which tries to solve this problem. This, however, requires that all the classical core subjects of AI be brought together in a thoroughly interdisciplinary way. Even though the project is practice-oriented, it should be seen as a contribution to pure research. All kinds of cooperation are welcome.

1 What distinguishes ITV

If one were to characterise ITV in a couple of words, technical terms such as feedback, multi-user and interprocess communication would be most suitable. This requires a short explanation:

The technical implementation of these three minimal requirements (M1 to M3) has deliberately been left open, to allow the widest possible range of technical solutions in practice. But although these minimal requirements are very general, far-reaching conclusions can be drawn from them. The most important consequence is that computers are required in addition to the necessary communication links (telephone lines, ISDN, cable, fibre, etc.). Only computers are capable of complying with M2 and M3. They, in turn, need suitable software. To fulfil M1, the software has to ensure friendly and flexible communication with the user; to fulfil M2, it has to be able to serve as many users as possible; and M3 is achieved by putting the products of the various suppliers at the users' disposal and by establishing direct interaction between them. Access to the services on offer can take the most varied forms: the user downloads data from a database (e.g. a videotape), or dials into another computer, or is connected to a currently running process (such as shopping in a store, travel information, an electronic meeting-point, or participation in a live programme).

How the future ITV is to be realised is still open at the moment. In the current debate, two main strategies are discussed by which traditional TV could be transformed into ITV: either by adding suitable hardware to the traditional TV set (the famous set-top box, a disguised computer; this variant is abbreviated TV-ITV), or by using an existing computer (a variant to be called PC-ITV). At the moment a final decision on which variant is the more suitable does not seem possible. For both variants, however, the realisation of M1 to M3 implies a network into which the user can dial. The software needed will not differ from the software used today in high-quality networks. In all applications usability will play a key role. A user interface without a fully developed language ability will not stand a chance in the long term.

2 Speech understanding as a Key

Speech recognition, as it is available today in the form of speech-recognition software on the market (1), is not sufficient for a thoroughly linguistically proficient system. Speech-recognition software assigns definite intervals of a digitised speech signal to specific phonemes or letter sequences. But the software offers no starting point for determining the meaning of these letter sequences. Moreover, defining what linguistic meaning is remains difficult (2). On such a vague conceptual basis, constructing an adequate formal model of linguistic meaning is not easy.

The author starts from the assumption that an object X is a sign only if there exists a classification rule F which maps classes of X-objects into classes of Y-objects. The Y-objects assigned to X by F then represent the F-meaning of X. From the philosophical point of view, such a meaning relation is realised within the structures relevant to the consciousness of a subject. From the perspective of cognitive science, the relation is realised within the cognitive processes of the agent under examination. If the linguistic meaning of a natural language is to be reconstructed formally, the meaning relations have to be modelled in adequate similarity to the human speaker-listener.
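The sign definition above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule `F`, the helper `x_class` and the toy vocabulary are hypothetical and are not taken from the project.

```python
def x_class(token):
    """Assign a concrete X-object (a written token) to its class (hypothetical: case-folded form)."""
    return token.strip().lower()

# Hypothetical classification rule F: classes of X-objects -> classes of Y-objects.
F = {
    "dog": "ANIMAL",
    "hound": "ANIMAL",
    "chair": "FURNITURE",
}

def is_sign(x, rule=F):
    """X is a sign only if the rule assigns something to its class."""
    return x_class(x) in rule

def meaning(x, rule=F):
    """Return the F-meaning of x, or None if x is not a sign under F."""
    return rule.get(x_class(x))
```

Under these assumptions, `meaning("Dog")` yields the class `"ANIMAL"`, while a string that F does not cover is simply not a sign.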

The various attempts made so far to solve the problem of linguistic meaning for particular applications have not been convincing. By fixing a small part of the world (e.g. reservations, schedules, etc.) within a data structure, it has been attempted to construct a fixed meaning relation for a chosen part of language (4). Quite apart from the enormous effort needed to model even part of the world knowledge and to adjust the meaning relation accordingly, these attempts have not solved the problem of the permanently changing world at all.

For this reason, the requirement is set that complete language ability is only reached when a user interface is able to learn any language from any language area, depending on its user. This includes a process of permanent learning for the interface, as well as self-correction.

3 The Term 'Knowbot'

To fulfil the requirement that an intelligent user interface can learn any kind of knowledge together with any language, two strategies offer themselves:

  1. Robots are built which resemble human beings in every aspect of their connection to the world and of their internal assimilation of the world.

  2. Virtual agents are constructed within appropriate environments; they resemble human beings in those functions on which the establishment and use of verbal meaning are based.

Strategy 1 will be found in the Real World Computing Program of the Japanese MITI (5).

Strategy 2 forms the basis of the Knowbotic-Interface-Project. The working hypothesis here is that a sufficient isomorphism of the data structures and functions is enough to achieve interesting results. (6)

Within the context of the Knowbotic-Interface-Project the virtual agents are called knowbots (7). This term is used in order to distinguish knowbots from the robots of the RWC programme, and to avoid the still diverse use of the term 'agent' within AI. Some terms frequently used are 'autonomous agents', 'intelligent agents', 'information-processing agents', agents as 'robots', agents as 'acting organisms', agents as 'processes', etc. (8)

Crucial for the knowbots within the project's context is their ability to learn any kind of knowledge about the world they live in. They must also be able to learn any language in relation to this knowledge. Since the exact workings of the human ability to learn and of human language use are still poorly understood, all modelling attempts are of an extremely hypothetical character. In view of this, the following draft of the knowbotic interface can only be seen as one possible proposal among various interesting alternatives.

4 The Knowbotic Interface Architecture

The knowbotic interface has to be understood as a program which is offered on a server and which solves the following tasks:




The internal structure of Knowbot Administration is shown in Fig. 1. Knowbots form part of a virtual world into which one can look through a window. The information on the objects which exist in this virtual reality, and on their messages and actions, is stored in specific tables: there is an object table (OT), a message table (MT) and an action table (AT). Three kinds of functions have access to these tables: (1) A knowbot-function of type X simulates knowbots of type X. The concrete data of a knowbot of type X are stored in a structure or class of type X. (2) An environment-function calculates the general features of the environment independently of the individual knowbots. (3) Finally, the environment-manager calculates the appearance of the environment with the help of information from a specific visual database. This can then be seen on the screen. As knowbots normally have visual sensors, they can perceive the visual image of the environment in their own terms.
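The interplay of the three tables (OT, MT, AT) and the three kinds of functions might be sketched in Python as follows. This is only a toy model under stated assumptions: the field layouts, the single "move" action and the one-character visual database are hypothetical and say nothing about the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class World:
    objects: dict = field(default_factory=dict)   # OT: object id -> attributes
    messages: list = field(default_factory=list)  # MT: (sender, text)
    actions: list = field(default_factory=list)   # AT: (actor, action, argument)

def knowbot_function(world, knowbot_id):
    """Simulate one step of a knowbot (hypothetical: it always asks to move right)."""
    world.actions.append((knowbot_id, "move", (1, 0)))

def environment_function(world):
    """Apply pending actions to the object table, independently of knowbot internals."""
    for actor, action, arg in world.actions:
        if action == "move":
            x, y = world.objects[actor]["pos"]
            world.objects[actor]["pos"] = (x + arg[0], y + arg[1])
    world.actions.clear()

def environment_manager(world, visual_db):
    """Look up each object's appearance in a visual database and return a display list."""
    return [(visual_db.get(obj["kind"], "?"), obj["pos"])
            for obj in world.objects.values()]

world = World(objects={"k1": {"kind": "knowbot", "pos": (0, 0)},
                       "t1": {"kind": "tree", "pos": (3, 2)}})
visual_db = {"knowbot": "K", "tree": "T"}  # hypothetical: one glyph per object kind

knowbot_function(world, "k1")
environment_function(world)
frame = environment_manager(world, visual_db)
```

Keeping the appearance in a separate visual database means the same tables can be rendered differently without touching the simulation, which matches the separation of functions described above.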

Because of the postulated ability to learn, we generally expect knowbots to have an inner space which is separated from the environment. The inner space is linked to the environment by passive and active interfaces. A passive interface transforms the state of the environment into, e.g., visual, acoustic or tactile signals. Predefined transformation mechanisms produce features from which the knowbot is able to distinguish objects, relations and dynamics. In interaction with its memory system it is able to map the permanent flow of such sets of features onto time-varying structures and processes. These form the so-called memory contents.

The language system is a specific component of the memory. On the one hand, it makes it possible to build up a dictionary, which links the sign material to possible sign values, the so-called meanings. On the other hand, a grammar can be established. However, such a dictionary and grammar can only be developed to the extent that world knowledge exists. They must be related to a current world-model which accurately represents the actual communication situation.

Active interfaces define the ways in which a knowbot can influence its environment. Normally a knowbot is designed so that it can move in relation to its environment, i.e. it is able to change its position within the virtual space. In addition, one can give the knowbot manipulators which can deform parts of the environment, or which enable it to consume things, in analogy to human drinking and eating.

In a more detailed description (Fig.2), the two main components of a knowbot are:

a) the so-called memory, to be understood as an accumulation of several partial systems, and

b) several learning processes, which serve the current world-model on the one hand and, on the other, the control components for consistency and planning.


The construction of the current world-model starts with sensory data, which are classified by a perception function. This process can be automatically modified by existing memory contents as well as by the current conclusions and plans, which can take effect as expectations. The continuously produced classified sensory data are then used to update specific parts of the current world-model. The contents of the current world-model serve as pointers to available memory contents, which are automatically activated by them. Verbal expressions also take effect in this context: they are recognised as parts of the current world-model and then interpreted through memory contents with the help of dictionary, syntax and pragmatics. A general inference function automatically draws all conclusions that can be formed from the contents activated in this way. The agreement or disagreement of these automatic expectations with the actual events is checked continuously. A disagreement leads to a global alarm. Aims are also determined, influenced by the current world-model as well as by the current expectations and by a wide variety of needs. The most important aims always lead to the construction of possible plans for future behaviour. If the aims are accepted, they have an effect on behaviour, affect perception and are taken into account in the continuous checking of current results. (9)
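One pass through the cycle described above (perception, update of the world-model, activation of memory contents, inference, checking of expectations) can be sketched as a small Python loop. The brightness sensor, the single memory rule and all function names are assumptions chosen for illustration; aims and plans are left out.

```python
def perceive(raw):
    """Classify raw sensor data (hypothetical: brightness values per object)."""
    return {name: ("bright" if value > 0.5 else "dark") for name, value in raw.items()}

# Hypothetical memory contents: an observed fact activates an expected fact.
MEMORY = {("lamp", "bright"): ("room", "bright")}

def infer(world_model):
    """Draw the conclusions allowed by memory contents that the model activates."""
    expectations = {}
    for fact in world_model.items():
        if fact in MEMORY:
            obj, state = MEMORY[fact]
            expectations[obj] = state
    return expectations

def cycle(world_model, expectations, raw):
    """Check earlier expectations against new percepts, update the model, infer anew."""
    classified = perceive(raw)
    # Global alarm: something that was expected has been observed to be otherwise.
    alarm = any(obj in classified and classified[obj] != state
                for obj, state in expectations.items())
    world_model.update(classified)
    return infer(world_model), alarm

model = {}
exp1, alarm1 = cycle(model, {}, {"lamp": 0.9})    # lamp is bright: expect a bright room
exp2, alarm2 = cycle(model, exp1, {"room": 0.1})  # room turns out dark: alarm
exp3, alarm3 = cycle(model, exp2, {"room": 0.8})  # room is bright: no alarm
```

The alarm here is a plain boolean; in the description above it is the signal that would trigger re-planning and learning.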

The internal models which a knowbot is able to construct (and especially the current world-model) can be used as a basis for producing a new virtual reality within the knowbotic interface, a reality in which other knowbots are able to live. Concretely: W1 is a virtual world in which the knowbot K1 lives. Depending on its passive interface, its perception, its memory, etc., K1 constructs a current model M1. A specific management function of the knowbotic interface, called the environment manager, uses the model M1 as the basis for a new virtual reality W2 in which a knowbot K2 can live. W1 could then be called real in relation to W2, and W2 virtual in relation to W1. This characteristic of the knowbotic interface is called recursive virtualisation.
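Recursive virtualisation can be made concrete with a short Python sketch. It assumes, purely for illustration, that a world is a dictionary of objects and that a knowbot's model is the part of that world its passive interface lets it perceive; the class and function names are hypothetical.

```python
class World:
    def __init__(self, objects, level):
        self.objects = dict(objects)  # object name -> description
        self.level = level            # 1 for W1, 2 for W2, ...

class Knowbot:
    def __init__(self, visible):
        self.visible = set(visible)   # hypothetical passive interface: what it can sense

    def build_model(self, world):
        """Current model: only the part of the world this knowbot perceives."""
        return {k: v for k, v in world.objects.items() if k in self.visible}

def environment_manager(model, parent):
    """Turn a knowbot's model into a new virtual world one level deeper."""
    return World(model, parent.level + 1)

w1 = World({"tree": "green", "house": "red", "cloud": "white"}, level=1)
k1 = Knowbot(visible={"tree", "house"})
m1 = k1.build_model(w1)           # M1, K1's model of W1
w2 = environment_manager(m1, w1)  # W2, built from M1
k2 = Knowbot(visible={"tree"})
m2 = k2.build_model(w2)           # K2 lives in W2 and models it in turn
```

Because W2 contains only what K1 perceived, whatever K1 missed in W1 (the cloud in this toy case) does not exist for K2.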

Because of the recursive virtualisation the knowbotic interface offers two very interesting applications:

  1. If a knowbot K1, speaking the language S1, reads a text T_S1 in the language S1, it is able to construct an internal model M(T_S1) because of its ability to understand. Since visual databases are available, which are already used to visualise the virtual reality in which the knowbot itself lives, the model M(T_S1) can be made visible as V(M(T_S1)). The visualisation V(M(T_S1)) can be a very simple static image, a series of cartoons, an entire comic, an animation, an animated video, a storyboard, etc., depending on the chosen concept of visualisation.

  2. Instead of visualising M(T_S1), one could also have a new virtual reality W(M(T_S1)) constructed from it, into which a different knowbot K2 is introduced, speaking the language S2. K2 describes W(M(T_S1)) as a text T_S2 in the language S2. At this time we can only speculate about the efficiency of this new form of translation technique.

Within a knowbotic interface a knowbot, like a normal human user, is able to visit and change the database. If a human user were to meet it while it is strolling through the database, the human would not immediately be able to tell whether he is facing a knowbot or another human being.

Furthermore a knowbot, if the system manager allows it, is able to contact other users via the various connected lines, including telephone connections.

Because knowbots, like little children, begin by learning a language used in their environment, they are in theory able to learn any language. They need not be programmed for this, but rather have to be trained, at least to a certain extent. A trained knowbot can be copied endlessly.


If a user wants to dial into the virtual reality of a knowbot with his computer in order to get to know its environment, this is quite simple thanks to the client-server structure. The prerequisite is a terminal program that is extended by a few functions. With this extension, the server's object, message and action tables are mirrored within the client, i.e. on the user's side. At the start the contents of the tables are sent as ASCII strings over the link (normal telephone lines are sufficient); after that only the changes are transferred, again as ASCII strings. If the user requests a current version of a visual database, the object information serves as an index into the visual database in order to build a graphical user interface from it. Of course exactly the same client structure can be used to move within the database.
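The transfer scheme described above, a full copy of the tables first and only the changes afterwards, both as ASCII strings, could look roughly like this in Python. The line format `table|key|value` is a made-up assumption, since the article does not specify the encoding, and deletions are ignored for brevity.

```python
import copy

def encode(table, key, value):
    """Encode one table entry as an ASCII line (hypothetical format)."""
    return f"{table}|{key}|{value}"

def snapshot(tables):
    """Initial transfer: every entry of every table."""
    return [encode(t, k, v) for t, rows in tables.items() for k, v in rows.items()]

def delta(old, new):
    """Later transfers: only entries that are new or have changed."""
    return [encode(t, k, v) for t, rows in new.items()
            for k, v in rows.items() if old.get(t, {}).get(k) != v]

def apply_lines(mirror, lines):
    """Client side: update the local mirror of the server tables."""
    for line in lines:
        t, k, v = line.split("|", 2)
        mirror.setdefault(t, {})[k] = v

server = {"OT": {"k1": "knowbot@0,0", "t1": "tree@3,2"}, "MT": {}, "AT": {}}
client = {}

first = snapshot(server)          # sent once when the user dials in
apply_lines(client, first)

before = copy.deepcopy(server)
server["OT"]["k1"] = "knowbot@1,0"
changes = delta(before, server)   # only the moved knowbot is sent
apply_lines(client, changes)
```

Sending one changed line instead of the whole table is what makes the scheme plausible over an ordinary telephone line.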


In the first phase, the user's interface is a PC with keyboard, monitor and audio output. In the second phase the PC is extended by a microphone (the user can also contact the server by telephone, including wireless telephone). Finally, in the third phase, the user has a personal assistant the size of a waistcoat pocket (or smaller) which, similar to today's palmtops, is able to contact any telephone partner, and which is intelligent enough to build its own world-model. This assistant is linked to the real situation via sensors and is normally able to speak to its user. It has an image of its owner's needs, interests and preferences, is able to determine his mood, and behaves accordingly.

Using language alone, the user is able to contact a knowbot directly in order to talk to it. If graphics are included, the user is able to communicate with a specific knowbot and can, in addition, see parts of the virtual environment in which the knowbot is located at that moment. As a pseudo-knowbot, a user can enter the virtual world and act within it. This is important if the user wants a new knowbot to learn a specific language. The knowbot's behaviour is influenced by its environment. If a new knowbot "is looking for" a new word to describe a particular object of perception, it will prefer the word its 'forerunner' used in the same situation.
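How a new knowbot might come to prefer its forerunner's word can be illustrated with a simple frequency count in Python. Counting how often a word was heard for a percept is an assumption made here for illustration; the article does not say which mechanism the knowbots use, and the class name `Lexicon` is hypothetical.

```python
from collections import Counter, defaultdict

class Lexicon:
    """Hypothetical dictionary: percept class -> counts of words heard for it."""

    def __init__(self):
        self.heard = defaultdict(Counter)

    def observe(self, percept, word):
        """Record that a forerunner (e.g. a user acting as pseudo-knowbot) used this word."""
        self.heard[percept][word] += 1

    def word_for(self, percept):
        """Prefer the word heard most often for this percept, if any."""
        counts = self.heard.get(percept)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

lex = Lexicon()
lex.observe("round-red-object", "apple")
lex.observe("round-red-object", "apple")
lex.observe("round-red-object", "Apfel")
```

With these observations the knowbot would say "apple"; trained by a German-speaking forerunner instead, the same mechanism would yield "Apfel".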

If a user dials into a logical database, he is able to walk through the internal database of a knowbotic interface, to look at the contents, and also to create new contents. Because several users can be present in the same database simultaneously, the database can become a meeting-point. Each room of the database can have its own rules of interaction. Depending on the user's current machine, the walk through a database can be either text-based only or based on images (graphics). (10)

Dialling into visual databases is of interest because it enables the user to influence the visual appearance of the objects in the knowbot's environment. One element of the visual database is a text that describes the appearance of an object. For the prototype there will be at least one visual database which holds, among other things, the graphical information on an object in the form of a PostScript text. (11)

5 The Need for Visions

By now the idea of a knowbotic interface should be clear. To conclude, some visions of possible applications derived from these new technologies will be presented. This is deliberately directed against the current trend of dominating discussions on ITV with typical questions such as: costs of the net, availability of lines, costs of broadcasting, net monopolies, technical standards, etc.

I agree with the committed thesis of Dr.-Ing. Werner KNETSCH, managing director of Arthur D. Little International (12). According to him, the only point that will decide the success or failure of ITV is whether interesting applications for ITV exist or not - something like a killer application, an application which is able to attract masses of users. Such applications will not be found by debating possible scales and possible shares of broadcasting costs, but by creating applications with regard to the actual and potential interests of possible users. And in the end they will change the broadcasting mandate or the fees through a new practice.

The following list of examples does not claim to be complete. However, it must be emphasised that numerous applications are possible with the help of a knowbotic interface even while knowbots are not yet available, applications whose potential has not yet been exhausted. Here are some examples:

These are some examples which are already possible without knowbots. With knowbots, additional new possibilities open up. Here is a small selection: