Eric Chang and Steven Matuszek, May 1998
This project sought to design and implement an application that uses the Web as a distributed computer system. Specifically, our project deals with the field of three-dimensional graphics. Rendering three-dimensional images is a computation-intensive process that could be helped by having several processors work on it concurrently. This is an approach that has been implemented previously in hardware and in local-area shared-rendering systems, but we propose to use the ability of the Web to harness the computational power of remote computers browsing on it.
Ideally such a system could take advantage of any computer that visited its Web site, so special software on the client side cannot be a requirement. However, the majority of Web browsers have the ability to interpret and execute Java programs. Thus, we propose to create an architecture by which a single browser accessing a web site can use its host machine's computing ability to execute a given subsection of the calculations that must be performed to render an image, and to return its results to the server site. The program at the server site would be responsible for first assigning these client programs their share of the work and later reintegrating the results into a final image.
The process of writing these programs will be one of successive addition of functionality: first, simply to get a reliable client-server model up and running; second, to ensure that the interchange of data goes as planned; third, to get the "slave" programs calculating correctly; fourth, to get the "master" program running the show correctly. Issues that will have to be dealt with include file writing and crash recovery. Also, issues such as synchronization and the reliability of network transfer are handled at length in the Java network packages; this project should also serve as a real-world evaluation of how well those features hold up. Of course, the most interesting issue is performance: how well our system does in comparison to a Java renderer running on a single system.
Traditionally, three-dimensional graphical rendering has required access to high-end computers. This is because the time needed to perform the necessary calculations would be prohibitively high on readily available individual personal computers. Such high-speed graphics engines have in the past been supercomputers or multi-processor connection machines, or more recently, dedicated high-end workstations, often arranged into parallel-computing "farms" of renderers.
Although such methods have worked well, with advances in computer architecture technology leading to the capability to render ever-more complex models, there are a number of drawbacks. Because rendering three-dimensional images is a tremendously computationally intensive operation, expensive hardware is needed to complete the tasks in a reasonable time -- high-end computing resources do not come cheaply. Even the relatively inexpensive method of linking less powerful computers over a local area network to render images in parallel still requires a significant amount of investment capital -- the entire set of computers must be purchased by a single institution, and the networking technology installed. With the increased demand for 3-D graphics, the proposed method of utilizing the computational potential available over the Web may be an attractive alternative to traditional hardware-intensive systems.
Taking its cue from multiprocessor systems, Web-based rendering involves breaking the monolithic task into a number of smaller pieces that can be solved independently of one another. This allows for parallel computation, reducing the overall rendering time by having calculations performed simultaneously or, as in the proposed method, even asynchronously. In rendering farms or locally networked arrays, the idea is to assign the work to smaller and cheaper machines rather than an expensive single-purpose one. Web-based rendering uses this idea, but employs a Web-based distributed application to preclude the necessity of having all the computing power "under one roof," by exploiting the idle time of remote clients.
The benefits of such a system are attractive. Theoretically, three-dimensional images can then be generated quickly without a significant investment in equipment or local computing power. Even the toll on the remote clients is not prohibitive -- each only performs a small part of the total calculation. The platform-independent nature of the proposed Web application means that the great multitudes of economical personal computers being used to access the Web -- of which there are far more than high-end machines -- can contribute to the rendering process.
Collaborative System Design
A collaborative system, such as the one proposed in this project, utilizes a number of different elements to complete its task. These include agents, which collaborate with each other, and centralized operational and/or data servers, which serve as the process mediators. As the agents perform their share of the calculation, dynamic and persistent data repositories are used to store the partial solutions to the rendered image. The final, yet integral, components of a collaborative system are the transactions between agents and mediator. In the absence of effective communication between the different elements, the collaborative effort can degenerate into chaos.
Collaborative systems can be used for such varied tasks as shared interactive forums like chat rooms or whiteboards, or coordinated search engines like Web crawlers. However, in this project, they are used to implement a distributed parallel computing engine. It uses autonomous agents to perform the calculations on the remote computers, as opposed to agents under direct human control, because the human factor is non-existent -- the rendering software does not require remote user input, only client processor time. The most important design considerations in establishing a collaborative system are communication, identity assignment, shared state information and performance.
Communication in a collaborative system can occur in a variety of formats. Messages may be point-to-point, broadcast, or "narrowcast," depending on the intended number and selection of receivers desired by the transmitting process. The communication complexity can be as simple as straightforward message passing, the transfer of object data between agents and mediators, or as involved as direct interaction via distributed object interfaces.
Another important aspect of ensuring the proper operation of a collaborative system is the maintenance of agent identities. The identification of individual agents is necessary for the correct delivery of inter-process messages. Because each agent works on a small portion of the overall problem, identification is also used to assign sub-tasks and to allocate data and resources. Finally, identification is used for authentication; that is, to determine which agents are associated with which process. It also helps to ensure that the agents are indeed part of the extant system and not some rogue process without authorized access to the resources of the collaborative system.
In many collaborative systems, maintaining the consistency of the shared state information is important to the overall integrity of the task. In the rendering problem, the shared state information is the partial solution to the final result -- the fully rendered image. Because it is located centrally, and there is no real need to broadcast it to the clients participating in the computation, consistency is not a large issue in this project. However, the mediators must be able to deal with asynchronous broadcasts from the collaborators, and be able to perform update ordering on the results as they are returned in order to correctly assemble the final rendered image.
Finally, the last design consideration is the performance of the system. This measure of system efficiency depends on many of the other design considerations; the goal is to maximize performance by making choices that are well fitted to the given problem. For example, there is a trade-off between maintaining shared state consistency and overall performance -- the better all the agents working on the task are kept up to date, the slower the system becomes, due to the inter-agent communication overhead. For other design choices, the answers are not so obvious. For example, the collaborative design can follow a central-mediator model, which is simple to implement but may experience bottlenecking as the mediator struggles to coordinate the actions of its agents. The design can also be of the peer-to-peer type, which distributes the bottlenecking issue and offers good update rates, but can also have consistency problems.
Collaborative System Structure
In terms of the structure of a collaborative system, there are two main parts -- the structure of the collaborators and mediators, and the communications standard used. A collaborator needs a unique identifier, for the reasons described in the previous section, and a communications package by which it can connect with remote agents, send messages, and receive messages. A mediator must also be able to communicate with its collaborators, as well as maintain a register of its agents. Agent registration involves creating new agents, assigning them unique identifiers, and removing them after they have completed their tasks.
For inter-agent communication, a number of modes of varying complexity are supported by Java: basic socket communications, message passing, RMI remote interface objects, and CORBA remote objects. For the development of this project, the design of the system considered primarily message passing and RMI.
The message passing mode utilizes a message handler to manage incoming and outgoing messages. Upon receiving a message, the handler triggers an event-style response; the message passing mode is fairly simple. The RMI remote interface mode internalizes many of the functions needed to set up a collaborator / mediator relationship. For example, under the RMI model, the collaborator registers itself with client stubs and server skeletons, and the message handling functions are internalized by the model.
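As a rough illustration, an RMI-based mediator might export a remote interface along the following lines; the interface and method names here are hypothetical, not taken from the project's code. The stubs and skeletons generated from such an interface would then carry the message traffic internally.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface a rendering mediator might export under RMI.
// Collaborators would look this up in the RMI registry and invoke its
// methods as if they were local calls.
public interface RenderMediator extends Remote {
    // Serialized world data the collaborator needs before rendering.
    byte[] getWorldData() throws RemoteException;

    // Index of the next scanline this collaborator should render.
    int nextScanline() throws RemoteException;

    // Return a completed scanline's packed RGB pixels to the mediator.
    void submitScanline(int index, int[] pixels) throws RemoteException;
}
```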
To determine which model to implement, one must weigh their relative benefits and drawbacks. Message passing is less robust, but uses less overhead, and is adequate for simple communication needs. It is also more widely supported by existing available technology, but lacks the sophistication and flexibility that RMI offers.
The Web was chosen as the distributed computer platform for a number of reasons. The most obvious is that given the expensive nature of the high-speed or numerous machines needed for fast rendering, using remote clients via the Web means that all of the computers and networking do not have to exist in-house. Instead, the system uses an existing network of computers as its work force by exploiting their idle time. Also, because of the global nature and extent of the Web, the number of computers that can contribute to the rendering problem is limited only by the number of machines accessing the Web, a number well into the tens of millions.
Java was chosen as the language in which to implement the rendering software. It is a fairly standardized language with good features for solving this problem, because it was designed for use on the Web. For example, Java has object classes that give it built-in network support, making the Web-standard communications protocols readily accessible. Java, when compiled into byte codes, also offers platform independence, so any available PC or workstation can theoretically host a rendering agent, and because most browsers support Java, almost all of these potential hosts will be able to do so.
In terms of actually implementing the collaborative software, a number of design choices had to be made. The first was to determine how to break up the image rendering problem into pieces that could be solved by remote agents. To do so, the image space was broken up into horizontal scanlines. This is a natural division for a rendering problem, because in most implementations, the color of one given pixel depends only upon the geometry that is being rendered, and not on the values of any other pixels.
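The per-scanline division can be sketched as follows; the trivial gradient here is only a placeholder for the real ray calculations, and the class and method names are illustrative rather than the project's actual code. The point is that each task needs only the scene description and a row index, never another row's pixels.

```java
// Illustrative sketch of the scanline division: rendering row y is a
// self-contained task, independent of every other row.
public class ScanlineTask {
    // Render one scanline of a width x height image. The shade computed
    // here stands in for the color a real renderer would derive from the
    // scene geometry alone.
    public static int[] renderLine(int width, int height, int y) {
        int[] pixels = new int[width];
        int shade = (255 * y) / Math.max(1, height - 1); // placeholder "lighting"
        for (int x = 0; x < width; x++) {
            pixels[x] = (shade << 16) | (shade << 8) | shade; // packed RGB
        }
        return pixels;
    }
}
```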
The collaborator model
The collaborator takes the form of an applet, since the stated purpose is to have the client code be small programs that can run on any machine running a Web browser. The collaborator loads the Renderer and World classes, which respectively do the work of rendering and hold the data to be rendered. It then makes a connection to the mediator, which sends it the world data and the number of the scanline it needs rendered. The client's Renderer then makes the calculations, resulting in a Scanline, which it returns to the mediator.
The mediator model
The mediator, or server, waits for a connection from a collaborator, sends it the world data that it will need, and tells it which scanline to render. It keeps track internally of which scanlines it still needs rendered.
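The basic exchange can be sketched as a self-contained loopback demonstration. This is a minimal sketch: the class name and the bare-int message are assumptions for illustration, whereas the actual system transfers serialized World and Scanline objects in both directions.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Loopback sketch of the mediator/collaborator handshake: a mediator
// thread accepts one connection and assigns a scanline number; the
// collaborator reads its assignment off the socket.
public class HandshakeDemo {
    public static int runOnce(final int scanline) throws Exception {
        final ServerSocket server = new ServerSocket(0); // any free port
        Thread mediator = new Thread(new Runnable() {
            public void run() {
                try {
                    Socket s = server.accept();
                    DataOutputStream out = new DataOutputStream(s.getOutputStream());
                    out.writeInt(scanline); // assign the work
                    out.flush();
                    s.close();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        });
        mediator.start();

        // The collaborator side: connect and learn which scanline to render.
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        DataInputStream in = new DataInputStream(client.getInputStream());
        int assigned = in.readInt();
        client.close();
        mediator.join();
        server.close();
        return assigned;
    }
}
```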
At any given point, the next scanline to be calculated is picked as follows: take the number of the last scanline given out, add a number relatively prime to the number of scanlines, take the modulus of the sum by the number of scanlines, and return it if it has not been given out. If it has been given out before, this indicates that all scanlines have been given and that we are now simply waiting on slow replies; to fill in for replies that we may never get, we now start assigning scanlines from the first uncompleted index. This serves to maintain consistency of the shared output, as it mathematically reduces the probability that two clients will be trying to work with the same data.
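In code, the selection scheme described above might look like the following sketch, assuming a fixed stride chosen relatively prime to the scanline count; the fallback phase that reassigns uncompleted lines is signalled here by a -1 return rather than implemented.

```java
// Sketch of the mediator's scanline-selection scheme: stepping by a
// stride relatively prime to the total visits every index exactly once
// before wrapping around.
public class ScanlineScheduler {
    private final int numLines;
    private final int stride;        // must be relatively prime to numLines
    private final boolean[] given;   // which lines have been handed out
    private int last = 0;

    public ScanlineScheduler(int numLines, int stride) {
        this.numLines = numLines;
        this.stride = stride;
        this.given = new boolean[numLines];
    }

    // Returns the next scanline index to hand out, or -1 once every line
    // has been assigned (the cue to start reassigning unfinished lines).
    public int next() {
        int candidate = (last + stride) % numLines;
        if (given[candidate]) {
            return -1; // wrapped around: all lines assigned at least once
        }
        given[candidate] = true;
        last = candidate;
        return candidate;
    }
}
```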
When it has received all entries, the server writes the results to a file and exits.
Other design considerations and choices
The collaboration model that we found to work best was a simple socket-based client-server model with Object Serialization. Object Serialization is a simple but powerful feature of Java 1.1, in which an entire class can be encoded into a file stream, and decoded into its original state. Thus, a World object can be passed whole, with its data initialized by the server and utilized by the clients. The alternative would be to pass each integer and floating-point number that was needed by hand, including creating a scheme for assigning each value correctly on the other end. This becomes infeasible very quickly for large data sets.
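A minimal sketch of that round trip is shown below; this World class is a hypothetical stand-in with a couple of fields, not the project's actual scene class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of Object Serialization: a whole Serializable object is encoded
// to a stream and decoded back intact, instead of shipping each field by hand.
public class SerializeDemo {
    static class World implements Serializable {
        int width, height;
        double[] sphereRadii;
        World(int w, int h, double[] r) { width = w; height = h; sphereRadii = r; }
    }

    static byte[] encode(World w) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(w);   // the entire object graph goes onto the stream
        oos.flush();
        return bos.toByteArray();
    }

    static World decode(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        return (World) ois.readObject(); // reconstructed in its original state
    }
}
```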
Note that since the client is implemented as an applet, it is untrusted and allowed only to open sockets on the same machine where it was loaded from. For this project, this is fine, as it is presumably the same server running the mediator program as serving the applets. [Note: later browsers would offer a little more control over this security constraint.]
In Netscape 3.01 and 4.04 under Irix 6.2, applets often refused to load. This is believed to be a problem with Netscape, not the Java code.
Message passing turned out to impose rather pointless overhead, as there was essentially only one thing the server would want to tell the client to do.
RMI, or Remote Method Invocation, is a technology that seeks to blur the line between what is happening on the server and what is happening on the client. It includes built-in security management and requires the implementation of only a few methods of the RMI interfaces. As of this build, the authors had not gotten a working RMI implementation of the Web-based renderer, and intend to comment on its workings in the future.
In visits to the server by multiple clients, both sides performed admirably, the server sending the correct data and correctly cycling through the scanlines, the clients calculating the rendered scanlines correctly and quickly.
Here is the applet upon startup, running under Windows 2000. [These screenshots were added much later.]
Here is the server, running under SunOS. It has created a world, populated it with spheres, opened a socket, output its HTML file, and begun to distribute the work to clients and receive results.
Here is the applet, doing its client work. It has received the information about the world and the coordinates for the line it should render, and is displaying the results of its computations. (The scanline is shown 10 pixels high for visibility.)
Here is the final result. The scanlines, as discussed above, are rendered very much out of vertical order; you can see that they were reassembled correctly.
Besides the algorithmic base for three-dimensional rendering and the implementation choices made during development of this collaborative system, there are some meta-design issues that are worth discussing. These deal with the unpredictable nature of efficiency, client availability, fault tolerance, and possible sociological incentives for using the proposed distributed methodology.
The core flaw in applying a collaborative system across the Web to computationally expensive problems such as graphical rendering is that it is effective, but not very efficient, because of the language and hardware used. Java results in slow-running code for two major reasons. Java executable byte code must be platform-independent, so it cannot be optimized for any particular computer architecture. Also, it runs on a Java virtual machine on the host computer, which removes it again by a degree from actual executable machine code. In terms of hardware, the Web is full of sub-optimal machines, and the communication overhead and time lag involved in transmitting data over the Web are considerably higher than for any locally connected set of machines.
Given such drawbacks, can Web-based rendering really ever be faster than a dedicated machine, or is the hardware cost saving the only benefit of using such a system? Because there are tens of millions of computers comprising the Web, distributed computing has a singular advantage over any single amassment of machines -- scalability. The piecemeal computation can be performed in parallel, asynchronously, over a practically unlimited number of computers, and can thereby theoretically outperform a single farm of machines, even with the overhead involved, depending on the complexity of the problem.
However, such computational advantage relies entirely on the ability to recruit volunteer clients; that is, to convince many people to visit the server site and donate their processor time to the rendering problem. The collaborative approach also requires clients to stay connected to the server site long enough to contribute the processor resources needed to complete at least one piece of the partial image. Thus, the size of the sub-problem, and the time needed to solve it, are also important factors that should be considered in designing the rendering engine.
Related to this issue is the problem of how to deal with disconnections before computation is done; for example, if a browsing client leaves the site before its portion of the calculation is finished. Clearly, that portion must be reassigned to an agent on another visiting client and recomputed from the start. Concerns about fault tolerance and recovery from repeated failures should also be addressed, both in terms of the integrity of the solution and the load on the mediating server.
So, how can any Web-based distributed computational engine solve the client availability dilemma? Some Web sites achieve tens of thousands of hits a day, so the desired large number of participants is not an unreasonable target. One way is simply to ask a large number of Web users to point their browsers at the server when they are not using their machines for other purposes; for example, by appealing to the readers of a topically germane newsgroup to donate their idle time to the problem, enough altruistic people may join the effort. An alternative more compatible with capitalistic endeavor and on-line commerce is to offer some service for which users are willing to stay on the site for a long period of time. In this way, it is not dissimilar to charging clients CPU time as a site toll, making them pay for the services provided by the server. Contrariwise, if one has a popular page, selling computational time to organizations desiring the capability of a Web-renderer can result in an economic benefit for the owner of the server, in much the same way that selling advertisement space is profitable.
Future Work and Conclusion
The most important future work on this project would be to implement RMI more thoroughly, in order to more fully evaluate it as a distributed-computing framework. As for the socket implementation, it should be threaded to properly handle multiple clients.
Future work on this project may implement security measures such as true authentication and encryption. These features are all supported by Java, and would address the issue of preserving data integrity, precluding malicious attacks on the rendering system by rogue processes submitting incorrect or unauthorized data as part of the rendered solution.
Although this implementation of Web technology as a distributed computer system dealt with three-dimensional graphical rendering, it is not unimaginable that a similar system could be used for other computationally intensive operations. It can be the basis for an economical means to perform complex simulations, or for cryptographic attacks. Because of the vast computational resources available across the Web, these tasks no longer require extensive financial investments for expensive in-house machines.
In conclusion, this project has demonstrated the feasibility of exploiting Web resources toward the end of rendering three-dimensional images. Although it reduces the need for expensive hardware, its effectiveness does depend on the availability of volunteer clients. The slow nature of Java code and the high network traffic of the Web mean that this method will be slower than optimized software, or even local-area networked computer banks that use more efficient networking protocols. Fortunately, there are also a number of economic incentives that can attract remote browsers, and with enough of a volunteer base, the total calculation time can be reduced accordingly. Using collaborative systems as a parallel distributed computing engine by exploiting the resources of the Web is a field that holds promise for any number of computationally expensive operations.
© 1998 by Eric Chang and Steven Matuszek.