Othecko: A Distributed Voting-Based Game System

The files:
README
GameException.java
GameServer.java
GameState.java
MessageObject.java
MiddleMan.java
OtheckoApplet.java
RemoteSystem.java
Vote.java

Steven Matuszek and Jack Freelander
University of North Carolina at Chapel Hill
December 14, 1999

version 1.1
Steven Matuszek
December 20, 2000


Introduction: Provision of an ongoing, reliable, responsive service to many users

We set out to create a distributed vote-based gaming system. Each move of one of the players in the game is chosen by vote, with votes being cast by anyone who cares to visit a Web page and play the game. The administration of the game is handled by a central server.

To implement this in a distributed manner, it is incumbent upon us to ensure that it is reliable, responsive, and scalable to many users. We would naturally also prefer that it be maintainable and extensible; for this reason we chose to implement the system in Java.


Provision of a Service

To the user, the service appears as a game-playing Java applet on a Web page. The user views the game state, makes a choice of moves, submits the move, and waits for the applet to reflect the resolution of moves and the next game state.


Click to view full size

The applet handles all interaction with the user, including informing him or her when it is waiting for the next update to the game state. Hence, the user is protected from knowing the details of the system.


Scalability

If one server had to accept all the connections from users wanting to vote, it would quickly become impractical as the number of players grew. For this reason, we have designed the system to produce and support a tree-like hierarchy of connections.

The primary server (PS in this diagram) makes the ultimate decision as far as advancing the game, and sends that information to the backup servers (BS, only one shown in this diagram) and to the MiddleMen (MM). At the bottom are the client applets (CL).

The internal nodes are all instances of a program named MiddleMan. MiddleMan maintains a list of subscribers and a list of parents. Whenever it receives a game state from above, it propagates that information to all of its subscribers. Whenever it receives a vote, it checks to see whether it has enough votes to propagate this information upwards. The current system uses a quorum-based method to determine this.

When it is time to send votes upwards, the MiddleMan does not simply send all the information upwards. Instead, it checks to see whether some of its votes represent the same move, using an Equals function whose internal definition is dependent on the semantics of the game. In this manner, it makes a count of the number of votes it has received for each of the distinct moves, and sends up a structure containing those votes and a count for each vote. As a matter of fact, a single vote is actually a degenerate vote-collection object, with one move and a count of one. This means that the MiddleMen directly above the applet layer do not need to handle information any differently.

The result of this is that the structure of the system, besides reducing load on each individual node, reduces network traffic -- the maximum size of every message is a constant, no matter how many votes it represents.

The need or desire to obtain all votes before proceeding with a move is largely dependent on the semantics of the game. Since we wish to support a system where any given client may elect not to vote -- or the user may get bored and wander off to another web site -- the determining factor for when to send moves up could be either time-based or quorum based. We have implemented the quorum scheme; the time-based system would simply involve another message type.

In conjunction with the primary server, the design provides for a genealogy server to be made available at startup. This server will inform every MiddleMan that contacts it who its parents should be. More importantly (since the MiddleMen could, if necessary, be started by hand by the administrators), it gives a URL to any Web browser that contacts it. The URL refers to a page containing the game applet and the applet parameters; the parameters include the list of parents that the applet should contact. It is important that the Web browser be redirected to this information from a location that never changes -- e.g., http://www.othecko.com.

This genealogy server does provide a single point of failure, but it would be a startup failure, not a service-time failure, and is therefore perfectly acceptable in a distributed system.


Fault Tolerance

Without backups, the design of the message-passing system would be such that if any one component lost power, or lost its connection to the network, the votes and game states could not be propagated. For that reason, every MiddleMan and every applet maintains a list of several possible parents to whom it could talk. If it times out waiting to connect to send a message, it will give up temporarily on that parent and connect to the next on its list. Since all messages are sent over reliable sockets, we need not concern ourselves with lost or duplicate messages; any failure, be it of the target system or of the intervening network, is treated the same way.

The first time any MiddleMan or applet connects to a parent, it sends a SUBSCRIBE message, which lets the parent (be it a MiddleMan or a game server) know that whenever it has a state to propagate downwards, it should send it to the subscribing child. Upon receipt of a SUBSCRIBE message, the parent immediately sends the current game state. This permits a user to join in the middle of a game. A variation of this message type is the EMERGENCY_SUBSCRIBE message, which tells the parent not to send the current game state. This is the type of message that is sent the first time a child makes an emergency connection to one of its backup parents. After that, the child is a regular subscriber to that parent, and will send it all its votes; the parent will update it every time it receives a new game state from above.

Note that in the case of transient failures, the original parent may come back online and resume sending game states to the child. The child may then send an UNSUBSCRIBE message to the backup parent, or simply ignore any duplicate game states it receives. This is configurable by the game administrator, and is a tradeoff in redundancy -- load versus reliability.

As for the game server, without backups, it too would represent a single point of failure for the system. Instead, an arbitrary number of game servers may be running at any given time. The backup servers are all subscribed to the primary server, so they receive the new game state every time it is updated. The primary server also updates the backup servers every time it receives a tally of votes, so that the backup servers have at any time as much of the vote information as it is possible to receive. The primary server also provides the backup servers with a list of its subscribers.

If at any time a MiddleMan immediately beneath the game server cannot reach it to propagate vote tallies upwards, it will instead contact the first backup parent. (Recall that the MiddleMan does not know what its parent is; the interface between layers provides that Middlemen and game servers receive messages from below identically. When the genealogy server assigns parents, it will assign the same ordering of game servers to this layer of Middlemen.) The first backup server, upon receiving an emergency subscription, will assume that the primary server has gone down, and use its vote information to resolve the next state and send it to all subscribers. This should result in the game carrying on automatically, with the only lost votes being those that the main server might be carrying in between receiving a new vote tally and updating the backup servers.

If the primary server comes back online, it will be updated with the game state and the current vote tally by the backup server. The backup server will hand authority back to the main server, by not sending any more game states downwards. The child is informed that it should resubscribe to the main server by a SUGGEST_PARENT message, the same kind the genealogy server sends.


Protocol

The protocol used in our system is based on the serialization functionality provided in Java:

// MessageObject.java :

public class MessageObject implements Serializable
{	
	private int type;
	private int messageID;
	private Object object;
	private RemoteSystem sender;	
	
...
}

The protocol consists of an atomic message containing the serializable object that is to be sent. This permits us not to worry about writing a transfer protocol ourselves; any class that we define can be wrapped up in a message, sent along a socket, and rebuilt at the other side. This is advantageous because the design can be used with potentially any game; all that is required is that the Java representation of the game state be serializable.

The use of a unique messageID allows the passing of messages to be idempotent, as messages received twice, or from a component we are currently ignoring, can be ignored. No connection is maintained between any two communicating systems. This aspect of the protocol is advantageous in that the server part of the MiddleMan can be implemented in a non-threaded way.


Message types


Failures

Below is a summary of the failure cases we have considered, and how we chose to handle them:


Conclusions

The proof-of-concept implementation of our system was done using Tic-Tac-Toe. The MiddleMen were set up with a quorum of one, so that they would pass votes along as soon as receiving them. The MiddleMen correctly collected and merged votes, and the redundancy introduced by having more than one possible parent proved useful. Testing indicated that the failure of any given MiddleMan or MiddleMen did not result in the failure of the game, as the nodes below them successfully contacted their alternate parents.

We ran the server requiring 1 vote, and observed that a single applet played a complete and quickly responding game. (No AI is involved, however; the server makes moves at random.) With a requirement of 2 votes, two applets played correctly, even when connected to different MiddleMen, or to one MiddleMan and to the server directly. (Recall that the protocol is the same, so messages down are the same whether sent to the applet or to a MiddleMan, and messages up the same whether sent to a MiddleMan or the server.) With larger vote requirements and more applets, the system still responded at the correct times.

Furthermore, our tree-like structure and subscription-based propagation of game state was successful at reducing the amount of network traffic and distributing the overall assembly of votes. Our design is easily extensible to account for the cases of fault-tolerance that were not explicitly handled by the current system. Taken as a whole, our system provides an ongoing reliable game system which can be easily modified to handle nearly any type of voting interaction.