Collaborative Browsing in the World Wide Web
Proceedings of the 8th Joint European Networking Conference, Edinburgh, May 12.-15. 1997
M.S.E.E. Gabriel Sidler, ETH Zurich
Dr. Andrew Scott, Lancaster University
Dipl. Physiker Heiner Wolf, University of Ulm
Table of Contents
I. Why Collaboration on the WWW?
II. The CoBrow Approach
III. Applications
V. Ongoing and Future Work
The World Wide Web (WWW) is today the most successful service of the Internet. The richness of information available, combined with easy access to this information, makes it a premier information gathering tool for researchers and consumers. However, the model of today's WWW does not include the users. The WWW is a purely information focused environment, consisting of documents and links between these documents. The virtual world formed by the linked information on the WWW is completely separated from the world of its users. The CoBrow project proposes to extend the model of the WWW to include its users. This will enable many new applications like WWW based conferencing, help desks, online presentations, online tours, and group entertainment. We believe that the WWW is well suited to become the unifying platform for synchronous, interactive collaboration across the Internet.
CoBrow has developed a model of a WWW that includes its users. The approach chosen is to associate users of the WWW with a location in the WWW; this association is based on the document that the user is viewing. Furthermore, users can have attributes like Interest, Language, Time of Presence etc. Based on the location of users in the WWW and on these attributes, virtual neighbourhoods are formed. Users within the same neighbourhood are made visible and can collaborate with each other through conferencing and application sharing tools.
Three prototypes were implemented and allowed us to gain experience with different approaches and technologies. An important aspect is the seamless integration into the current WWW infrastructure without requiring changes to existing protocols.
The WWW was originally envisioned by its designers to become a collaborative tool for the Internet. However, the early implementations were little more than a tool to publish and retrieve documents to and from the Internet. Nevertheless, from the users' point of view it was a big step forward compared to the tools that existed at that time for the search and transfer of documents. The graphical user interface that could be operated by novice users, the capability to link documents and the platform independence of the content format were the main advantages.
Collaboration on the Internet is certainly not a new research topic. Initial efforts focused on asynchronous collaboration and produced such successful tools as Usenet News, Email, mailing lists, and the file transfer protocol. More recently the focus of research has shifted to synchronous collaborative tools such as Internet telephony and audio and video conferencing tools. It has been shown that the Internet can support such real-time collaborative services if sufficient resources are available.
Most of today's synchronous collaborative tools are not integrated into the WWW. Integration is to be understood in two ways. First, the tools are typically standalone applications outside of the WWW client software. More importantly, there does not exist a mechanism to become aware of other users on the WWW and establish contact with them.
During the past few years the WWW has experienced enormous growth. The number of WWW servers providing information is now at 275,000 and still growing exponentially. Today the WWW provides an extremely rich virtual world of linked information. However, WWW users are not a part of this virtual world. In today's model of the WWW, users are watching this exciting and colorful world of information from the sidelines. They request information by specifying a URL and then their WWW client fetches the requested document from the WWW and displays it. This separation of the information world from the world of the information consumers and producers clearly limits the usefulness of the WWW. WWW users are not aware of each other and therefore have no means to communicate. CoBrow attempts to unite the two worlds by developing a model of the WWW that includes its users. This will enable new collaborative services including WWW based conferencing, help desks, presentations, online tours, group entertainment, etc. We believe the WWW is well suited to becoming the unifying platform for synchronous, interactive collaboration across the Internet for several reasons:
These four features make the WWW a unique environment, capable of becoming the universal collaborative platform of the future.
The WWW is a meshed network of hypermedia documents. WWW users have random access to virtually every document from their desktops. The WWW is ideally suited for browsing through the information space, but many people are searching for specific, possibly related, information. They often start their search at starting points that are known to be linked to relevant information.
Alternatively, people can submit keywords to search engines and hope to get a significant answer directing them to the desired information. There is a vast amount of information available on the WWW, but information about any given subject is not concentrated on one server; it is distributed over the Internet. Chunks of information are available from many different sources. WWW users who search for information usually get access to a small subset of the information resources available. One of the most important reasons for this is that people do not have enough time to explore all the resources offered by search engines. Everyone searching for a certain piece of information will explore a different subset of resources. Over time everyone learns about useful information sources, but the knowledge gained covers only a subset of that available. However, the combined knowledge of all the people interested in the same subject covers virtually all the information available at all sites.
Searching for information would be much more efficient if people were able to share their knowledge about information resources. The problem is to bring together people who can benefit from each other. The WWW is crowded by several million people world wide, and somewhere in the order of one million people are estimated to be active concurrently.
If someone browses for information, there is a high probability that someone else is interested in the same subject at the same time, but people browsing the WWW are unaware of the presence of any fellow browsers. Even if they are submitting the same keywords to a search engine or are searching at the same starting point, there is no possibility of getting in touch with each other. It is the goal of CoBrow to bring these people together.
CoBrow is a distributed tool set providing support for collaborative browsing in the WWW. It essentially generates lists of people (clients) who share the same interests, can learn from each other, or are interested in meeting each other. The decisions made to generate these lists are based on information available about
These four metrics can be represented as orthogonal axes in the vicinity space (Figure 1). Despite their orthogonality, only combinations of metrics are adequate to limit the number of potential participants in a group.
Space Metric: The document structure in the Web can be represented by directed graphs with the HTML documents as vertices and the links between documents as edges. The distance is calculated as the smallest number of hypertext references that link documents. Using the space metric enables groups to be formed of people browsing within a spatial environment defined by the links between documents. The application of the space metric combined with a time metric could, for example, serve as the basis for an online-help system in the Web.
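The distance described above can be sketched as a breadth-first search over the link graph. This is a minimal illustration only, not the CoBrow implementation: the `links` dictionary, the function names and the `max_hops` cut-off are assumptions introduced for the example, as is the choice to accept a path in either direction of the directed graph.

```python
from collections import deque


def link_distance(links, source, target):
    """Smallest number of hyperlinks leading from source to target.

    `links` maps a document URL to the URLs it links to (a directed graph).
    Returns None if target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, hops = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def in_spatial_vicinity(links, page_a, page_b, max_hops):
    """True if the documents are at most max_hops links apart.

    Assumption: since the graph is directed, a path in either
    direction is accepted and the shorter one counts.
    """
    candidates = (link_distance(links, page_a, page_b),
                  link_distance(links, page_b, page_a))
    distances = [d for d in candidates if d is not None]
    return bool(distances) and min(distances) <= max_hops


links = {
    "/index.html": ["/a.html", "/b.html"],
    "/a.html": ["/c.html"],
    "/c.html": [],
}
print(link_distance(links, "/index.html", "/c.html"))
print(in_spatial_vicinity(links, "/c.html", "/index.html", max_hops=2))
```

Two users would then be candidates for the same group whenever the documents they are viewing satisfy `in_spatial_vicinity` for the chosen cut-off.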
Reliable tracking of user movements through the Web is crucial to detecting a spatial vicinity among users. CoBrow needs up-to-date information about the document being displayed by the user's WWW client, a task complicated by the number of web servers a user may contact and by the lack of any proper record of a user's state at a server. A number of ways of addressing the problems of maintaining persistent state information have been tried within the project and are discussed in the next chapter.
Semantic Metric: The semantic metric is based on document semantics. The semantics can be covered by introducing weighted hyperlinks or by using content-based information from the documents. Combining the semantic metric with spatial metrics considerably reduces search spaces.
For the weighted link metric, hyperlinks between HTML documents are tagged with weight attributes. These weight attributes express the intensity of the relationship between documents as a hint for content correlation. A high weight implies a strong correlation. A threshold value for the computation of the logical vicinity brings together only those users whose sets of visited documents fulfil strong correlation criteria. Link weights can be added to documents by the author or by analysis tools based on, for example, keyword matching algorithms.
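One possible reading of the threshold rule is sketched below. It is an illustration under stated assumptions, not the project's algorithm: weights are assumed to lie between 0 and 1, only direct links are considered, and the names `weights`, `visited` and `threshold` are hypothetical.

```python
from itertools import combinations


def strongly_linked(weights, doc_a, doc_b, threshold):
    """True if a direct link between the documents reaches the threshold.

    `weights` maps (source, destination) pairs to a link weight.
    Assumption: a link in either direction counts, and a document
    is always strongly related to itself.
    """
    if doc_a == doc_b:
        return True
    weight = max(weights.get((doc_a, doc_b), 0.0),
                 weights.get((doc_b, doc_a), 0.0))
    return weight >= threshold


def logical_vicinity_pairs(visited, weights, threshold):
    """Pairs of users who have visited strongly correlated documents.

    `visited` maps a user name to the set of documents that user has seen.
    """
    pairs = []
    for user_a, user_b in combinations(sorted(visited), 2):
        if any(strongly_linked(weights, a, b, threshold)
               for a in visited[user_a] for b in visited[user_b]):
            pairs.append((user_a, user_b))
    return pairs


weights = {
    ("/java.html", "/applets.html"): 0.9,
    ("/java.html", "/coffee.html"): 0.2,
}
visited = {
    "anna": {"/java.html"},
    "ben": {"/applets.html"},
    "carl": {"/coffee.html"},
}
print(logical_vicinity_pairs(visited, weights, threshold=0.5))
```

Raising the threshold shrinks the vicinity to users on closely related documents; lowering it admits weaker relationships.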
The document semantic-based metric is more challenging to bring about. Documents which are not necessarily linked, but deal with the same topic, should be included in a vicinity. To obtain the relevant content of a document, information retrieval algorithms are necessary. The algorithms required are very similar to those used by Internet search engines. A simple CoBrow could use the services of search engines or just rely on keywords contained in HTML META-tags.
Time Metric: People browsing at the same time or in overlapping intervals of time are in a temporal vicinity. The time metric is a supporting metric for the other metrics discussed and is not particularly useful on its own. Using time criteria alone would bring more or less all concurrent users of the WWW together in one vicinity.
Temporal and spatial information about users browsing the WWW is available and we are currently exploring a time metric which takes into account the length of time documents are displayed at the client as well as the time at which presentation took place. The longer a user reads a document, the longer it is associated with the user, and subject to spatial-user matching. We are considering display duration as an indication of importance. We are aware that a long duration may also be an indication of complexity, or simply of a break for lunch.
Figure 2: Value of time metric is at a maximum during presentation of a page to the
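A time metric with this behaviour could be sketched as follows. The text only states that the value is at a maximum while a page is presented and that a longer reading time keeps the document associated with the user for longer. The linear decay after the page is hidden, the `decay_factor` parameter and the function name are assumptions made for this example.

```python
def time_metric(t, shown_at, hidden_at, decay_factor=1.0):
    """Value in [0, 1] of the association between a user and a document.

    The value is 1.0 while the document is displayed. Afterwards it is
    assumed to fall linearly to 0 over a period proportional to the
    display duration, so longer reading keeps the association longer.
    """
    if t < shown_at:
        return 0.0
    if t <= hidden_at:
        return 1.0
    decay_period = (hidden_at - shown_at) * decay_factor
    if decay_period <= 0:
        return 0.0
    return max(0.0, 1.0 - (t - hidden_at) / decay_period)


# A page displayed from t=0 to t=10 (arbitrary time units).
for t in (5, 15, 20):
    print(t, time_metric(t, shown_at=0, hidden_at=10))
```

A cap on the display duration could be added to limit the effect of a page left open during a lunch break; this sketch does not attempt that.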
User Interest Metric: Whereas the previous metrics have been WWW server and document-centred, the user interest metric is human-centred. Here, vicinity is defined to correlate users according to attributes assigned to them. Attribute matching could be based on shared interest, the ability to speak a common language, membership of the same cultural group, and many other definable characteristics. Attributes are collected at the client side and exchanged with the CoBrow system automatically, or on request from the user.
The creation of user interest profiles is beyond the scope of the core CoBrow service. We are exploring mechanisms to create user interest profiles automatically using information available at the client side, for example, the news reader configuration or statistics about WWW traffic. If the CoBrow service is provided in connection with the service of a search engine then CoBrow can directly profit from the information available from WWW users. Users submit keywords to search engines in order to ask for information about a certain topic, and by submitting keywords users implicitly express their current interests. The normal search engine will use these keywords to find related documents and CoBrow, as a search engine for people, will work on exactly the same keywords to find people with related interests. A user who enters a discussion forum on the WWW also expresses his current interests by joining. The subject of the forum entered clearly indicates his interest. In this case an interest based metric is applied very easily. The list of users generated by CoBrow is just the set of users in the forum.
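The idea of a search engine for people working on submitted keywords can be illustrated with a simple set overlap. The source does not specify a matching algorithm; the Jaccard similarity used here, the `min_similarity` cut-off and all names are assumptions chosen for the sketch.

```python
def keyword_similarity(keywords_a, keywords_b):
    """Jaccard overlap of two keyword collections, ignoring case."""
    a = {k.lower() for k in keywords_a}
    b = {k.lower() for k in keywords_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def people_with_related_interests(queries, user, min_similarity=0.3):
    """Other users whose keywords overlap with those of `user`.

    `queries` maps a user name to the keywords that user submitted.
    Results are ordered by decreasing similarity, then by name.
    """
    mine = queries[user]
    scored = [(keyword_similarity(mine, kws), other)
              for other, kws in queries.items() if other != user]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for score, other in scored if score >= min_similarity]


queries = {
    "anna": ["Java", "applet", "tracking"],
    "ben": ["java", "applet"],
    "carl": ["model", "railroad"],
}
print(people_with_related_interests(queries, "anna"))
```

The forum case described above corresponds to the degenerate situation in which every member shares the single keyword naming the forum.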
There are many situations where discussion based on web pages would be useful, for example:
In other words, a synchronous multimedia conferencing system built around the web would be useful in providing communication for the mutual benefit of users with similar interests, particularly when these users were not previously aware of each other's existence or of their shared interest.
A secondary, but no less important, reason for developing a range of early prototypes was to enable members of the project team to get feedback from real users. Many of the issues faced during the design of our final system depend on how real users respond to different styles of interfaces and we were particularly interested in discovering what people's reactions would be to being tracked, presented with the names of users with similar
The lightweight conferencing prototype makes use of animated-GIF functionality to provide video images on standard web pages. Early trials of this approach to adding video took place at the University of Ulm and resulted in the model railroad demonstration.
A CoBrow system using animated-GIFs, or web-video, in this way depends on each client host running a small daemon application that provides an interface to the world that looks like a WWW server (Figure 3). This web-video server simply distributes a page containing an infinitely long animated-GIF consisting of frames taken from the host machine's video camera.
Figure 3: Lightweight conferencing using personal web-video servers
A problem with this approach is that a separate web page must be displayed for each user in a meeting place. While this works well for small numbers of users, larger groups take up too much desktop space as the number of browser windows increases.
We are currently using a Java based mechanism to get information about the start and end of the presentation of documents. A small Java applet is inserted into every HTML document subject to the space metric on CoBrow enabled servers. This happens transparently for the WWW server. The applet notifies a tracking component on the server about its start up and shutdown, which correspond to the beginning and end of the period the document is displayed. The tracking system is just one source for spatial information about users; other sources include the WWW server log files and proxies.
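The server-side tracking component that receives these notifications might look like the sketch below. It is written in Python for brevity rather than Java, and the class name, method names and in-memory storage are hypothetical; the source describes only that start-up and shutdown events are reported.

```python
class PresenceTracker:
    """Records which documents each user currently has on display."""

    def __init__(self):
        self._displayed = {}  # user id -> set of URLs currently shown

    def document_shown(self, user, url):
        # Called when the applet embedded in `url` reports its start-up.
        self._displayed.setdefault(user, set()).add(url)

    def document_hidden(self, user, url):
        # Called when the applet reports its shutdown.
        pages = self._displayed.get(user)
        if pages is None:
            return
        pages.discard(url)
        if not pages:
            del self._displayed[user]

    def viewers(self, url):
        """Users who currently display `url`, in alphabetical order."""
        return sorted(user for user, pages in self._displayed.items()
                      if url in pages)


tracker = PresenceTracker()
tracker.document_shown("anna", "/docs/intro.html")
tracker.document_shown("ben", "/docs/intro.html")
tracker.document_hidden("ben", "/docs/intro.html")
print(tracker.viewers("/docs/intro.html"))
```

A real component would also need to handle shutdown notifications that never arrive, for example when a client crashes; a timeout per entry is one option.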
Work on developing a similar system for audio is currently under way.
If web pages are generated on-the-fly, their content does not have to be pre-defined. In other words, the same request sent by different users can result in two entirely different pages. This is useful because pages can then be made to contain the current state of the system. The importance of pages being able to carry around state information becomes apparent when one considers that web servers are effectively stateless, and retain no information about page requests. This idea was used at Lancaster a few years ago to construct a virtual map of the department, through which people could wander and meet members of the department, and is now used in a diverse range of applications including the common search engines.
This approach has allowed a CoBrow to be built with dynamic creation and updates of meeting places and conferences within meeting places without daemons running on client machines. In fact, the system works with no server or client modifications.
An interesting side effect of using the CGI approach is that the system is proxy safe. One problem a CoBrow system faces is that proxy servers often respond to user requests by sending previously cached copies of pages. Unfortunately this means the server that originally contained the page never sees the request and therefore it is difficult to maintain a central record of the people asking for the page. A CGI based system can get round this by requesting that all requests for the base text of a page be referred back to the main server on which the page resides. Proxy servers will then only respond to requests for objects within the page, for example, images. This means we get the traffic reducing advantages of using proxy servers but still have a central record of requests for each page.
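The source does not say which mechanism the prototype uses to have page requests referred back to the origin server. One common way for a CGI program to discourage caching of the page text is to emit standard HTTP headers such as `Pragma: no-cache`, `Cache-Control: no-cache` and an already expired `Expires` value. The sketch below assumes that approach; the function name is hypothetical.

```python
def cgi_page_response(html_body):
    """Build the output of a CGI program for the base text of a page.

    Assumption: caching of the page text is discouraged with standard
    HTTP headers, so that proxies forward each request to the origin
    server. Images referenced by the page are served separately and
    remain cacheable.
    """
    headers = [
        "Content-Type: text/html",
        "Pragma: no-cache",         # understood by HTTP/1.0 caches
        "Cache-Control: no-cache",  # HTTP/1.1 equivalent
        "Expires: 0",               # invalid date, treated as already expired
    ]
    # A blank line separates the headers from the document body.
    return "\r\n".join(headers) + "\r\n\r\n" + html_body


print(cgi_page_response("<html><body>Meeting place</body></html>"))
```

Because only the generated page text carries these headers, proxies can still cache the embedded objects, which matches the traffic behaviour described above.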
Although shared browsing capabilities are provided within the user's browser, multimedia conferencing functionality is provided by external applications which can be downloaded and registered by the user. Once these are registered as helper applications, the CoBrow system can automatically invoke all the tools necessary for participation in a conference.
The conferencing tools available are not restricted by the system as it was felt important that new tools should be available when released, and that users should be able to choose for themselves the best set of tools for the type of conference they wish to have. Currently the recommended tool set includes CUSeeMe, and the Internet Multicast Backbone (MBone) tools. Management of created conferences is handled by the CoBrow system and no user configuration of the tools is necessary.
Within this prototype we have also experimented with the use of multiple servers. If successful, there would be many CoBrow servers around the Internet, each managing meeting places and conferences for a relatively small number of users - corresponding to groups or departments. For the system to work effectively, servers must be able to share information on the meeting places, conferences and users they manage. The prototype handles this by providing a CGI program that generates a complete dump of a server's local state; remote servers can pull this information to complete their view of global state.
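How a server might combine pulled dumps into a global view is sketched below. The dump format is not described in the source, so the structure used here (a mapping from meeting place to the users present) and all names are assumptions; fetching the dumps over HTTP is left out.

```python
def merge_state(local_name, local_state, remote_dumps):
    """Combine the local state with dumps pulled from remote servers.

    Each state is assumed to map a meeting place to the list of users
    present in it. Because every dump is complete, a newly pulled dump
    simply replaces whatever was previously known about that server.
    """
    view = {local_name: local_state}
    for server, dump in remote_dumps.items():
        view[server] = dump
    return view


def users_in_place(view, place):
    """All users in a meeting place, across every known server."""
    return sorted(user
                  for state in view.values()
                  for user in state.get(place, ()))


local = {"java-help": ["anna"]}
remote = {
    "server-b": {"java-help": ["ben"], "railroad": ["carl"]},
}
view = merge_state("server-a", local, remote)
print(users_in_place(view, "java-help"))
```

Pulling complete dumps keeps the exchange simple at the cost of transferring unchanged data; this is acceptable while each server manages a relatively small number of users.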
A drawback of this approach is that the user interface is limited by the available HTML tags, which currently only allow simple buttons, menus, and text entry boxes.
Figure 4: A browser showing the Web-only interface
The use of Java means that complex user interfaces, far superior to those previously possible within web pages, can be built quite easily. The level of sophistication possible with this approach can be seen in a number of recently developed systems, for example, the Java implementation of the Corel Office suite. There is considerable experience with Java within the CoBrow team from other projects and all the partners are keen to see Java based interfaces adopted in the final implementation of CoBrow.
As Java is an interpreted language, run within the web browser, user interfaces can be far more responsive than possible using the simple submit and timed-response techniques available to CGI based systems. This opens up a whole new class of more interactive interfaces.
The first interface to be tried was a simple text based chat application where users could discuss material using similar functionality to that available through applications such as Unix talk and irc. The more immediate response and feedback from the system results in a very usable interface. However, due to the interpreted nature of Java and the resulting low performance, there is still a place for more conventional collaborative tools, such as the MBone software mentioned earlier.
Figure 5: Sophisticated user interfaces are possible with Java
The development of prototypes showing different types of user interfaces and also different approaches to collecting and collating the information needed to maintain and operate a CoBrow system has proved extremely valuable. We have gained considerable knowledge of the relative performance and efficiency of the various underlying implementation techniques and built three systems able to support user-trials.
Our experiences with Java have removed any doubt in our minds that a collaborative system built around the web must be based on Java-like technology, and as we move from the prototype stages of the project to the development of the final system we see more and more emphasis on Java.
Early feedback from users, following small scale trials of the three systems, has been positive. Users have found the ability to contact other users with similar interests without prior arrangement and without previous knowledge of their existence a useful addition to the familiar web environment.
It has also been interesting to note the almost immediate adoption of the prototypes, even the early and more restricted versions, among the developers for their day to day work. We believe this and the level of positive user feedback clearly show a need for such collaborative systems.
An issue that must be addressed is how a multicast based service could be efficiently extended to users connecting via Internet providers over modem links. An even greater problem is the growing demand for mobility to be considered in the development of this type of service. IPv6 should help, but only by allowing movable endpoints to connections; the problem of variable levels of service and connectivity will still be a significant issue.
The wide scale adoption of proxy servers now prevents many page requests getting through to the main server for the page; instead an intermediate proxy server will reply with a previously cached copy of the page. Getting an accurate picture of which users are looking at which pages is obviously an important requirement when trying to establish vicinities, and unfortunately obtaining such information is made far more difficult by the advent of proxy servers. Any real collaborative browsing system must work correctly even when users have proxy servers configured. The 'web only' prototype avoids this problem by making use of a characteristic of the page retrieval mechanism, but this cannot easily be transferred to either of the other prototypes - more work clearly has to be done in this area.
Another issue for 'real' systems is support for systems behind firewalls. Firewalls can place severe restrictions on connections to hosts outside an institution and could easily reduce the effectiveness of a collaborative conferencing system. A significant and growing number of organisations, particularly commercial ones, have installed firewalls, and so solutions must be found to these problems without compromising the security which these firewalls have been installed to provide.
Finally, in the longer term, more sophisticated ways of identifying users' interests need to be investigated, and also ways in which these more complex definitions of interest can be matched in order to support and enable collaboration. There is obviously scope for investigating existing work on automatic abstract generation.
This work has been performed in the framework of the project CoBrow, which is partly funded by the European Community under the Telematics for Research program and the Swiss Department of Education and Science. The authors would also like to thank their colleagues on the CoBrow project at ETH Zurich, Lancaster University, UK, and at the University of Ulm, Germany for their contributions.
AltaVista Search, Nov. 1996.
Corel Corporation homepage, Nov. 1996.
Forrester Research, "Forecast of users of on-line services", Computer Zeitung, Vol. 8, 23. Feb. 1995.
C. Huitema, "IPv6 - The New Internet Protocol", Prentice Hall, ISBN 0-13-241936-X, 1996.
Interactive Model Railroad, Universität Ulm, Nov. 1996.
The CoBrow Project, Nov. 1996.
Dr. Andrew Scott is a lecturer at the computing department of Lancaster University. His research interests include computer supported collaborative work, multimedia, high speed networking and wireless networking.
Gabriel Sidler received a diploma in computer science from HTL Brugg-Windisch, Switzerland and a master's degree in electrical engineering from Northwestern University, Chicago. He is currently a research assistant at the computer engineering and networking laboratory of the Swiss Federal Institute of Technology, Zurich. His research interests include multimedia communication and computer supported collaborative work.
Heiner Wolf is a research assistant at the distributed systems department of the University of Ulm, where he is pursuing a Ph.D. degree in computer science. His research interests include computer supported collaborative work, multimedia, high speed networks and ATM.