Klaus H. Wolf <firstname.lastname@example.org>
Konrad Froitzheim <email@example.com>
Michael Weber <firstname.lastname@example.org>
Department of Distributed Systems
Computer Science Faculty
University of Ulm
89069 Ulm, Germany
From a system-architectural point of view the World Wide Web is a client-server system. Documents are exchanged between WWW clients and WWW servers using the Hypertext Transfer Protocol (HTTP) [Berne93]. The currently used protocol version, HTTP 1.0, defines a stateless request-response mechanism. To retrieve a WWW page, a transport system connection is established for each document referenced on that page. The client requests the document by its name and the server responds by transferring the document's data. Documents containing non-continuous media are typically downloaded first and presented afterwards. Even interactive elements, such as forms, use the same request-response mechanism. There is no provision for persistent connections between client and server.
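The stateless request-response exchange described above can be illustrated with a short sketch. The hostname, path and helper names are our own illustrations, not part of the original system:

```python
# Sketch of a stateless HTTP 1.0 exchange: one GET request per
# referenced document; the server closes the connection after
# transferring the response.

def build_request(path, host):
    """Construct a minimal HTTP 1.0 GET request for one document."""
    return ("GET %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "\r\n" % (path, host))

def parse_status(response):
    """Extract the numeric status code from the response's status line."""
    status_line = response.split("\r\n", 1)[0]   # e.g. "HTTP/1.0 200 OK"
    return int(status_line.split()[1])

req = build_request("/index.html", "www.example.org")
assert req.startswith("GET /index.html HTTP/1.0")
assert parse_status("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n") == 200
```

A page with several inline images repeats this exchange once per image, which is precisely the per-document connection overhead noted above.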
The World Wide Web is rapidly growing into the largest hypertext information system world-wide. Such extensive usage demands advanced features and more flexibility than the system provides today. The system was intentionally designed as a distributed information system which links locally available documents into a global hypertext document. But many applications, especially in the commercial domain, require enhanced features like security, interactivity, transport efficiency, improved layout capabilities, and server control over the client display. Some of these features, e.g. interactive elements and security, have already been added to the World Wide Web. Others, e.g. HTTP 2.0, are still in the design phase or under discussion. Most enhancements, however, are based on extensions to the underlying standards, or even replacements for them, and therefore face a problematic deployment phase.
WWW pages currently contain text, graphics and dialog elements. Presentation of other data and media is accomplished by external programs, so-called viewers. WWW clients do not directly support synchronous playback of stream-oriented media such as audio and video. Audio and video are treated as files which are processed in three phases: retrieve, store in the file system, and present through a viewer. Usually, the presentation of such data begins only after the document has been retrieved completely from the server. Thus continuous media cannot be displayed directly (inline) on a WWW page.
Applications which require continuous updates of the client's view have to force the user to retrieve a document repeatedly [Crocker94]. The discussion on the inclusion of inline media other than graphics, e.g. audio [Uhler94] and video, is currently very lively. How this integration can be achieved is subject to active research. New protocols, protocol elements and extensions to HTML have been proposed [Soo94], [KaasPinTaub94]. However, changes to widely and intensively used existing standards should be made very carefully and, if possible, avoided altogether.
A standard movie format is the ISO standard MPEG (Moving Picture Experts Group) [MPEG]. QuickTime and Video for Windows are other movie formats in use, but they are not vendor- and platform-independent. Nevertheless they can be decoded on nearly all platforms, depending on the availability of the proper decoder software on each client system. The decoders are external programs because these data types are not decoded directly by WWW clients. Thus movies are presented off-line, after the movie files have been retrieved and stored locally.
However, there are now application scenarios which require moving images based on other formats and mechanisms. Remote control and remote visualization are applications making use of animated computer graphics generated in realtime, or of live video. In contrast to MPEG decoders, still-image decoders are readily available in WWW clients. They can be used to show sequences of images which are perceived as video if the images replace each other at a reasonable rate.
Regardless of the chosen video stream format, a client has to support continuous decoding explicitly. The WWW client has to be able to retrieve documents concurrently and present them continuously while successive data arrives. The implementation architecture of WWW clients must be event- and data-driven. Clients must not block while retrieving a document.
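The required non-blocking behaviour can be sketched as an incremental decoder driven by network events. The fixed frame size and all names here are our illustrative simplifications:

```python
class IncrementalDecoder:
    """Consumes arbitrarily sized chunks of an image stream and emits
    complete frames as soon as their data is available, instead of
    blocking until the whole document has arrived."""
    FRAME_SIZE = 4          # illustrative fixed frame length in bytes

    def __init__(self):
        self.buffer = b""
        self.frames = []

    def feed(self, chunk):
        """Called from the client's event loop for every network event."""
        self.buffer += chunk
        while len(self.buffer) >= self.FRAME_SIZE:
            frame = self.buffer[:self.FRAME_SIZE]
            self.buffer = self.buffer[self.FRAME_SIZE:]
            self.frames.append(frame)   # in a real client: display immediately

dec = IncrementalDecoder()
for chunk in (b"ab", b"cdef", b"gh"):   # data arrives in arbitrary pieces
    dec.feed(chunk)
assert dec.frames == [b"abcd", b"efgh"]
```

The point of the sketch is the control flow: presentation happens inside `feed`, frame by frame, so the client never waits for the end of the document.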
However, this animation method has some disadvantages. Combining a series of documents into a single structure which fits into an HTTP response requires a description of the contents of the response. This means adding a new layer of control structures between the document layer and the HTTP protocol layer. Existing WWW clients and servers have to be modified in order to deal with streams of documents instead of single documents.
The method allows the existing decoder software of WWW clients to be exploited. It supports all image formats, including GIF and JPEG. The major disadvantage of such a pseudo-animation, if used for moving images, is that entire images always have to be encoded and transmitted. The encoded animation contains considerable redundancy: all images are encoded completely independently of each other, they all contain header information, and they often encode the same unchanged image parts over and over again.
An image stream encoded in a multi-image format is accessed via hyperlinks like any other document. The WWW client opens the transport system connection, sends the HTTP request and waits for the response. The response contains HTTP header information and the image stream as the HTTP body. The HTTP header indicates the document type to the client, encoded in a string consisting of type and subtype, e.g. image/gif. This document type description does not have to be changed in order to support multiple images. A GIF-encoded image sequence has the same document type as a single image; a single image is regarded as a special case of a sequence containing only one image.
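A minimal sketch of how a client finds the announced type in the header; the helper name and sample header lines are illustrative:

```python
def content_type(header_lines):
    """Find the MIME type/subtype announced in an HTTP response header."""
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.lower() == "content-type":
            return value.strip()
    return None

# The same type string is used whether one image or many follow,
# so existing dispatch logic in clients needs no change.
hdr = ["HTTP/1.0 200 OK", "Content-Type: image/gif"]
assert content_type(hdr) == "image/gif"
```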
The main advantage of image stream formats over sequences of separate images is the possibility to exploit format-specific optimization methods like frame differencing. In addition, image streams are backward compatible with WWW clients which do not support moving images. Clients which do not support the multiple-image feature of multi-image-capable formats, but tolerate it, will show the first image of the sequence. They will terminate the connection after having decoded the first image, or simply stop decoding.
Figure 1: The format of a GIF image (upper) and a GIF stream (lower).
However, support for multiple images can easily be added to existing GIF decoder software. A GIF sequence consists of a global header and a series of separately encoded images (Figure 1). The only difference between a GIF stream and a GIF image is the number of images following the global header. There is no additional information or control structure signalling the presence of more than one image; the global header does not contain the number of subsequent images.
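This structure can be illustrated with a toy scan over GIF block introducers. Real tokenization (skipping LZW data sub-blocks and extension blocks) is omitted for brevity, and the function name is ours; only the marker values are taken from the GIF specification:

```python
IMAGE_SEPARATOR = 0x2C   # ',' introduces each image descriptor
TRAILER         = 0x3B   # ';' terminates the data stream

def count_images(blocks):
    """Walk a pre-tokenized sequence of GIF block introducers and
    count images until the trailer (or end of data) is reached.
    A decoder upgraded for streams simply keeps looping here
    instead of stopping after the first image."""
    n = 0
    for b in blocks:
        if b == TRAILER:
            break
        if b == IMAGE_SEPARATOR:
            n += 1
    return n

# A single image and a three-image stream share the same layout:
assert count_images([0x2C, 0x3B]) == 1
assert count_images([0x2C, 0x2C, 0x2C]) == 3   # live stream: no trailer yet
```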
An investigation of available GIF image decoder software showed that the required changes for an upgrade from an image decoder to a stream decoder are very small. Thus we modified the GIF decoders of two publicly available WWW clients (Chimera, Mosaic) in such a way that they continue decoding as long as a GIF data stream does not terminate. The modified versions present moving images on a WWW page until the WWW server stops sending images.
Usually WWW servers try to transmit documents as fast as possible. The frame rate at the client display therefore depends on the quality of the transport system connection. The rate may be too high in a local environment and too low over slow links. The first case results in a time-compressed presentation of the video. The latter creates a backlog of frames, defeating the realtime capabilities. Our experiments showed that some changes to the server system are very useful in order to provide controlled and smooth delivery of image sequences.
We identified three different approaches to showing image sequences. In the first approach, called the 'best effort' method, transmission and decoding are performed as fast as possible, with the assumption that the connection is either slower than or just fast enough for realtime display. If a transport system connection is not fast enough for realtime display, the frame rate will be low. In any case such a system will show every image of a sequence; no frames will be skipped. The second method is time synchronisation. A video transmission system which provides time synchronisation tries to transmit only those images of a sequence which fit into the time scale of the video. It will skip images if transmission or display is too slow, and it will delay playback at the receiver in the opposite case. A combination of the previous approaches is 'best effort with upper rate limit', which tries to achieve realtime display but limits the frame rate to an upper bound.
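Server-side time synchronisation with frame skipping can be sketched as follows; the function name and the timing parameters are illustrative, not taken from our implementation:

```python
def frames_to_send(n_frames, interval, send_duration):
    """Time-synchronised delivery: frame i is due at t = i*interval.
    Sending one frame occupies the connection for send_duration;
    any frame whose due time has already passed meanwhile is skipped
    so the presentation stays on the video's time scale.
    All times are in seconds."""
    sent, t = [], 0.0
    for i in range(n_frames):
        due = i * interval
        if due < t:          # connection still busy past this frame's slot
            continue         # skip the frame
        t = due + send_duration
        sent.append(i)
    return sent

# Fast link (sending takes half a frame interval): nothing is skipped.
assert frames_to_send(5, 0.1, 0.05) == [0, 1, 2, 3, 4]
# Slow link (sending takes two intervals): every second frame is skipped.
assert frames_to_send(5, 0.1, 0.2) == [0, 2, 4]
```

The 'best effort with upper rate limit' variant corresponds to enforcing a minimum `interval` while never delaying a frame that is already due.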
The latter two methods require a component which controls playback at the client. In addition, this component needs a feedback mechanism between client and server in order to avoid overflowing the client's storage space. Synchronization by the client, however, would require a major change to the client's software. We therefore propose time synchronization by the server: a software module in the server controls the transmission speed between the video source and the WWW client.
Figure 2: HTTP servers can be extended via the common gateway interface (CGI). A URI points to an executable program (right) instead of a file.
Supported synchronisation mechanisms are: best effort, time synchronisation, and best effort with an upper rate limit.
Supported video sources are: live sources (camera/digitizer) and pre-recorded streams from file.
Supported output formats are: GIF and JPEG.
For performance and availability reasons the World Wide Web relies on caching at different levels. Clients maintain local caches, and institutions use caching WWW servers, called proxies, to reduce remote accesses. In the case of live video and animated graphics, caching has to be avoided: playback of a pre-recorded video stream from file will not be synchronized if the file is retrieved from a cache rather than from the synchronizing server extension. The server therefore marks stream-oriented documents as already expired.
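A sketch of such an already-expired response header; the particular past date is our assumption, as any expiry date in the past serves the purpose:

```python
def stream_response_header(mime_type):
    """HTTP 1.0 header for a stream document, marked as already
    expired so that neither local caches nor proxies will store
    and replay it."""
    return ("HTTP/1.0 200 OK\r\n"
            "Content-Type: %s\r\n"
            "Expires: Thu, 01 Jan 1970 00:00:00 GMT\r\n"
            "\r\n" % mime_type)

hdr = stream_response_header("image/gif")
assert "Expires: Thu, 01 Jan 1970" in hdr
```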
A video stream from a live source is encoded once in one of the supported image or stream formats. Each encoded image of this stream is put into a shared memory space to be accessible for the synchronizing CGI programs. Many instances of stream synchronizers may retrieve the encoded images simultaneously from the shared space of the image queue. They may even retrieve different images at the same time to keep up with the state of their client connections. Encoding the stream only once allows many clients to connect to a live source at the same time without overloading the server computationally.
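The single-writer, many-reader queue can be sketched as follows. The class, its capacity and the sequence-number scheme are our illustrative simplifications of the shared memory space:

```python
class SharedImageQueue:
    """Single-writer, many-reader image queue modelling the shared
    memory space: the stream server appends each encoded image once,
    and every synchronizing CGI instance keeps its own read position,
    so slow clients lag behind without forcing re-encoding."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.images = []          # list of (sequence_number, data)
        self.seq = 0

    def put(self, data):
        """Called by the encoder, once per frame, for all readers."""
        self.images.append((self.seq, data))
        self.seq += 1
        if len(self.images) > self.capacity:
            self.images.pop(0)    # oldest frames are overwritten

    def get_after(self, last_seq):
        """Return the next image a given reader has not seen yet,
        silently skipping frames it missed while lagging."""
        for seq, data in self.images:
            if seq > last_seq:
                return seq, data
        return None

q = SharedImageQueue(capacity=2)
for frame in (b"f0", b"f1", b"f2"):
    q.put(frame)
# A fast reader that saw frame 1 gets frame 2; a lagging reader that
# last saw frame 0 resumes at frame 1 (frame 0 was overwritten anyway).
assert q.get_after(1) == (2, b"f2")
assert q.get_after(-1) == (1, b"f1")
```

Each synchronizing CGI instance would hold only its own `last_seq`, so encoding happens once no matter how many clients are connected.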
Figure 3: Many instances of synchronizing CGI programs retrieve the image stream simultaneously. Each of them serves one remote WWW client.
Figure 4: The stream server converts between the image format of a live source and the stream/image format used in the World Wide Web. The converted images are put into a shared memory queue.
An image stream from a live source is encoded by a stream server which fills the shared memory queue for the synchronizing CGI programs (Figure 4). The stream server's front-end connects directly to the live video source or digitizer. Its back-end serves a number of stream synchronizers via the shared memory queue described above. The main purpose of the stream server is the conversion from the image data format of the video source to the target stream format for transmission to the WWW clients. The front-end comprises image decoder modules which accept different image formats. The back-end is currently equipped with GIF and JPEG encoders.
Explicit command confirmations are not necessary if feedback is given visually as a graphics stream. A remote control WWW page contains animated graphics showing the state of the remote system and forms for user input. The page is not replaced to show a command confirmation. Instead, the WWW server is forced by its form-evaluating extension to return an empty HTTP response (status code 204, 'No Content') to the client. The form-evaluating extension forwards the user input in an adequate format to the controlled system, which in turn reflects its new behaviour through a video or graphics stream. The WWW client stays on the same page and shows the effect on the controlled system through the animated graphics or live video parts of the page.
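A minimal sketch of such a form-evaluating extension; the forwarding callback stands in for the actual interface (e.g. a serial line) to the controlled system, and all names are illustrative:

```python
def handle_form_submission(form_data, forward):
    """Form-evaluating server extension: pass the user's command on
    to the controlled system and answer with status 204, so the WWW
    client keeps showing the current page. Confirmation arrives
    visually through the inline video/graphics stream instead."""
    forward(form_data)
    return "HTTP/1.0 204 No Content\r\n\r\n"

sent = []
response = handle_form_submission({"train": "3", "dest": "7"}, sent.append)
assert response.startswith("HTTP/1.0 204")
assert sent == [{"train": "3", "dest": "7"}]
```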
Figure 5: User commands evaluated by server extensions control system parameters. Visualization of feedback fed into the WWW by another server extension.
Encoding and decoding of GIF is fast enough to allow about 5 QCIF-sized frames per second in our set-up. We tested the performance in a local environment with a Sun workstation as server and Sun and Macintosh clients. The limiting factors in our demonstration scenario are the speed of the available frame grabber and the color conversions (dithering, colormap merging) for 8-bit pseudo-color X Window displays.
We do not yet exploit the frame differencing capabilities of GIF streams. We will add this feature to both the GIF encoder and the WWW clients in the near future. We expect higher frame rates because encoder and decoder will then have to process only the changed image parts. This will result in a speedup proportional to the ratio of static to dynamic image parts.
Figure 6: Remote WWW users can operate the model railroad and watch it in realtime. The HTTP request shown in the picture is issued in order to get the contents of the inline image which is referenced by the base HTML document. The URI points to a stream synchronizing server extension. This server extension gets images from a live camera. The response to the HTTP request is an infinite image stream displayed at the client as video.
An HTML form is used to submit commands to the WWW server. The server forwards the commands to the model railroad controller via a serial interface. After hitting the 'Go!' button, the chosen train begins to move to the selected destination. An additional confirmation is not necessary.
Soon clients will be able to operate the robot remotely and watch the synthesis. The status will be displayed as live video showing the equipment and as a graphics animation of the changing absorption spectrum. A client can dynamically modify synthesis parameters or even stop the process in case of problems. Of course the product still has to be sent by postal service, but the client knows about its quality instantly.
Upcoming WWW clients which support inline presentation of MPEG streams will allow the integration of video into the WWW at much lower bandwidth than the current solutions. This is especially true for pre-encoded movies. Transmission of live video from a camera, however, requires realtime encoding. Due to the computational cost of motion compensation, software encoding on desktop computers is currently not able to deliver MPEG streams with very high compression rates. The bandwidth requirements of such MPEG streams are smaller than, but still comparable to, those of a sequence of JPEG-coded images. We will nevertheless add an MPEG back-end to the stream server mentioned above. This means more than just integrating available MPEG encoder software: the MPEG software has to be adapted to the stream server system so that it supports multiple clients at different transmission rates at the same time while encoding only once.
[Crocker94] G. Crocker: web2mush: Serving Interactive Resources to the Web, 1994; http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/crocker/tech.html
[Uhler94] S. Uhler: Incorporating real-time audio on the Web, 1994; http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/uhler/uhler.html
[GIF87] CompuServe, Incorporated: CompuServe GIF 87a, http://icib.igd.fhg.de/icib/it/defacto/company/compuserve/gif87a/gen.html
[Soo94] J. C. Soo: Live Multimedia over HTTP, 1994; http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/soo/www94a.html
[KaasPinTaub94] M. F. Kaashoek, T. Pinckney, J. A. Tauber: Dynamic Documents: Extensibility and Adaptability in the WWW, 1994; http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/pinckney/dd.html
[JPEG] International Organization for Standardization: Information Technology - Digital Compression and Coding of Continuous-tone Still Images; ISO/IEC DIS 10918-1; ISO 1991.
[MPEG] International Organization for Standardization: Information Technology - Coding of moving pictures and associated audio for digital storage up to about 1.5 Mbit/s; ISO/IEC DIS 11172; ISO 1992.
[PNG95] T. Boutell, M. Adler, L. D. Crocker, T. Lane: PNG (Portable Network Graphics) Specification, 1995; http://sunsite.unc.edu/boutell/png.html
[Netscape95] Netscape Communications Corporation: An Exploration of Dynamic Documents; 1995; http://home.netscape.com/assist/net_sites/dynamic_docs.html
[WWW] The World Wide Web Consortium; http://www.w3.org/hypertext/WWW/
[HTML] Specification of the HyperText Markup Language; http://www.w3.org/pub/WWW/MarkUp/MarkUp.html
[DTD] SGML Document Type Definition of the HyperText Markup Language; http://www.w3.org/pub/WWW/MarkUp/html3/html3.dtd