# Distributed Systems Simulator

This blog post introduces the Java-based Distributed Systems Simulator I created for simulating distributed systems protocols. It offers built-in implementations of common algorithms as well as an extensible framework that lets researchers and practitioners implement and test their own protocols within the simulation environment.

## Table of Contents

* [⇢ Distributed Systems Simulator](#distributed-systems-simulator)
* [⇢ ⇢ Motivation](#motivation)
* [⇢ ⇢ Fundamentals](#fundamentals)
* [⇢ ⇢ ⇢ Client/Server Model](#clientserver-model)
* [⇢ ⇢ ⇢ Processes and Their Roles](#processes-and-their-roles)
* [⇢ ⇢ ⇢ Messages](#messages)
* [⇢ ⇢ ⇢ Local and Global Clocks](#local-and-global-clocks)
* [⇢ ⇢ ⇢ Events](#events)
* [⇢ ⇢ ⇢ Protocols](#protocols)
* [⇢ Graphical User Interface (GUI)](#graphical-user-interface-gui)
* [⇢ ⇢ Simple Mode](#simple-mode)
* [⇢ ⇢ ⇢ The Menu Bar](#the-menu-bar)
* [⇢ ⇢ ⇢ The Toolbar](#the-toolbar)
* [⇢ ⇢ ⇢ The Visualization](#the-visualization)
* [⇢ ⇢ ⇢ Color Differentiation](#color-differentiation)
* [⇢ ⇢ ⇢ The Sidebar](#the-sidebar)
* [⇢ ⇢ ⇢ The Log Window](#the-log-window)
* [⇢ ⇢ Expert Mode](#expert-mode)
* [⇢ ⇢ ⇢ New Functions in the Sidebar](#new-functions-in-the-sidebar)
* [⇢ ⇢ ⇢ Lamport Time, Vector Time, and Anti-Aliasing Switches](#lamport-time-vector-time-and-anti-aliasing-switches)
* [⇢ ⇢ ⇢ The Log Filter](#the-log-filter)
* [⇢ ⇢ Events](#events)
* [⇢ ⇢ ⇢ Key Features of Events:](#key-features-of-events)
* [⇢ ⇢ ⇢ Event Types Available:](#event-types-available)
* [⇢ ⇢ Summary](#summary)

## Motivation

Distributed systems are notoriously complex: multiple nodes interact in intricate ways, and network partitions and failure scenarios are difficult to understand and debug in production environments. A distributed systems simulator is a learning tool that lets developers and students experiment with different architectures, observe how systems behave under various failure conditions, and gain hands-on experience with concepts like consensus algorithms, replication strategies, and fault tolerance, all within a controlled, repeatable environment. By abstracting away the operational overhead of managing real distributed infrastructure, simulators enable focused exploration of system design principles and help bridge the gap between theoretical knowledge and a practical understanding of how distributed systems behave.

In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system:

> "A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum

The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system.

This thesis aims to make it easier for users to view distributed systems from a different perspective. Here, the viewpoint of an end user is not adopted; instead, the functional methods of protocols and their processes in distributed systems should be made comprehensible, while simultaneously making all relevant events of a distributed system transparent.

To achieve this goal, a simulator was developed, particularly for teaching and learning purposes at the University of Applied Sciences Aachen. With the simulator, protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there is ample room for personal experiments, with no restriction to a fixed number of protocols. It is therefore important that users are enabled to design their own protocols.

## Fundamentals

For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters.

### Client/Server Model

```
┌─────────────────────────────────────────────┐
│                                             │
│     ┌────────┐         ┌────────┐           │
│     │ Client │◄-------►│ Server │           │
│     └────────┘         └────────┘           │
│                                             │
│         Sending of Messages                 │
│                                             │
└─────────────────────────────────────────────┘

Figure 1.1: Client/Server Model
```

The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate.

### Processes and Their Roles

A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID).

### Messages

In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID).

### Local and Global Clocks

In a simulation, there is exactly one global clock. It represents the current time and is always correct.

Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If a local clock differs from the global time, then it was either reset during the simulation, or it is running at the wrong rate due to clock drift. The clock drift indicates by what factor the clock runs too fast or too slow. This will be discussed in more detail later.
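As a minimal sketch of the drift idea, the following Java class models a local clock whose rate deviates from the global clock. It assumes that drift is expressed as a relative rate error (0.1 means the clock runs 10% fast); the class and method names are hypothetical and not taken from the simulator's source.

```java
/** Hypothetical local process clock; names are illustrative, not the simulator's API. */
final class LocalClock {
    private final double drift;   // assumed: relative rate error, 0.1 = runs 10% fast
    private double localTimeMs;   // current local time in milliseconds

    LocalClock(double drift) {
        this.drift = drift;
    }

    /** Advance the local clock while the global clock moves forward by globalDeltaMs. */
    void advance(long globalDeltaMs) {
        localTimeMs += globalDeltaMs * (1.0 + drift);
    }

    /** Set the local clock to a new value, e.g. after a simulated clock reset. */
    void set(long newLocalTimeMs) {
        localTimeMs = newLocalTimeMs;
    }

    long timeMs() {
        return Math.round(localTimeMs);
    }
}

class ClockDriftDemo {
    public static void main(String[] args) {
        LocalClock exact = new LocalClock(0.0);
        LocalClock fast = new LocalClock(0.1);
        exact.advance(1000); // one global second passes
        fast.advance(1000);
        System.out.println("exact=" + exact.timeMs() + "ms fast=" + fast.timeMs() + "ms");
    }
}
```

With a drift of 0 the local clock follows the global clock exactly; any other value makes the two diverge further the longer the simulation runs.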

```
┌─────────────────────┐     ┌─────────────────────┐
│    Process 1        │     │    Process 2        │
│                     │     │                     │
│ ┌─────────────────┐ │     │ ┌─────────────────┐ │
│ │Server Protocol A│ │     │ │Client Protocol A│ │
│ └─────────────────┘ │     │ └─────────────────┘ │
│                     │     │                     │
│ ┌─────────────────┐ │     └─────────────────────┘
│ │Client Protocol B│ │     
│ └─────────────────┘ │     ┌─────────────────────┐
│                     │     │    Process 3        │
└─────────────────────┘     │                     │
                            │ ┌─────────────────┐ │
                            │ │Server Protocol B│ │
                            │ └─────────────────┘ │
                            │                     │
                            └─────────────────────┘

Figure 1.2: Client/Server Protocols
```

In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in Chapter 3.11.1.
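For readers unfamiliar with these logical clocks, here is a short sketch of the standard textbook update rules. It is not the simulator's implementation: the class names are hypothetical, and it is an assumption that the simulator increments its timestamps on exactly these events.

```java
/** Textbook Lamport clock; hypothetical class, not the simulator's code. */
final class LamportClock {
    private long time;

    /** Local event or message send: increment and return the new timestamp. */
    long tick() {
        return ++time;
    }

    /** Message receive: jump past the sender's timestamp if it is ahead. */
    long onReceive(long receivedTime) {
        time = Math.max(time, receivedTime) + 1;
        return time;
    }
}

/** Textbook vector clock for a fixed number of processes. */
final class VectorClock {
    private final int[] entries;
    private final int self; // index of the owning process (assumed: PID - 1)

    VectorClock(int processCount, int self) {
        this.entries = new int[processCount];
        this.self = self;
    }

    /** Local event or message send: increment only the own entry. */
    int[] tick() {
        entries[self]++;
        return entries.clone();
    }

    /** Message receive: element-wise maximum, then increment the own entry. */
    int[] onReceive(int[] received) {
        for (int i = 0; i < entries.length; i++) {
            entries[i] = Math.max(entries[i], received[i]);
        }
        entries[self]++;
        return entries.clone();
    }
}
```

Because each process only ever sees its own logical clock plus whatever timestamps arrive in messages, there is no single "global" Lamport or vector time to compare against.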

### Events

A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. This is not noticeable to the simulator's users, because from their perspective such events happen in parallel.
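One common way to get this behavior is a priority queue ordered by occurrence time. The sketch below is an illustration under that assumption, with hypothetical names; it adds a sequence number because `java.util.PriorityQueue` does not guarantee any particular order for equal elements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical event record: occurrence time, insertion order, and a description. */
record ScheduledEvent(long timeMs, long seq, String description) {}

final class EventQueue {
    // Order by time first; the sequence number keeps same-time events in insertion order.
    private final PriorityQueue<ScheduledEvent> queue = new PriorityQueue<>(
            Comparator.comparingLong(ScheduledEvent::timeMs)
                      .thenComparingLong(ScheduledEvent::seq));
    private long nextSeq;

    void schedule(long timeMs, String description) {
        queue.add(new ScheduledEvent(timeMs, nextSeq++, description));
    }

    /** Remove and return all events due at or before nowMs, in execution order. */
    List<ScheduledEvent> drainUntil(long nowMs) {
        List<ScheduledEvent> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().timeMs() <= nowMs) {
            due.add(queue.poll());
        }
        return due;
    }
}

class EventQueueDemo {
    public static void main(String[] args) {
        EventQueue events = new EventQueue();
        events.schedule(321, "revive process 1");
        events.schedule(123, "crash process 1");
        events.schedule(123, "process 2 sends a message");
        events.drainUntil(200).forEach(e -> System.out.println(e.timeMs() + "ms: " + e.description()));
    }
}
```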

### Protocols

A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol.

In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other.

Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol in Chapter 3.3).
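The compatibility rules can be expressed in a few lines of Java. This is a sketch with hypothetical types (`Side`, `Role`, `SimProcess`), not the simulator's actual classes; it only encodes the rule that two processes can talk over a protocol when one supports its client side and the other its server side.

```java
import java.util.Set;

enum Side { CLIENT, SERVER }

/** A role is one side of one protocol, e.g. the server side of protocol "A". */
record Role(String protocol, Side side) {}

/** Hypothetical process model: a PID plus the set of roles it has taken on. */
record SimProcess(int pid, Set<Role> roles) {
    boolean supports(String protocol, Side side) {
        return roles.contains(new Role(protocol, side));
    }
}

final class ProtocolRules {
    /** True if one process is a client and the other a server of the given protocol. */
    static boolean canCommunicate(SimProcess a, SimProcess b, String protocol) {
        return (a.supports(protocol, Side.CLIENT) && b.supports(protocol, Side.SERVER))
            || (a.supports(protocol, Side.SERVER) && b.supports(protocol, Side.CLIENT));
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        // The three processes from Figure 1.2.
        SimProcess p1 = new SimProcess(1, Set.of(new Role("A", Side.SERVER), new Role("B", Side.CLIENT)));
        SimProcess p2 = new SimProcess(2, Set.of(new Role("A", Side.CLIENT)));
        SimProcess p3 = new SimProcess(3, Set.of(new Role("B", Side.SERVER)));
        System.out.println("1<->2 via A: " + ProtocolRules.canCommunicate(p1, p2, "A"));
        System.out.println("1<->3 via B: " + ProtocolRules.canCommunicate(p1, p3, "B"));
        System.out.println("2<->3 via A: " + ProtocolRules.canCommunicate(p2, p3, "A"));
    }
}
```

Giving two processes both the client and the server role of the same protocol makes `canCommunicate` true in either direction, which is the workaround described above.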

# Graphical User Interface (GUI)

## Simple Mode

![Figure 2.1: The simulator after first launch]

The simulator requires JDK 21 and can be started with the command `java -jar target/ds-sim-VERSION.jar`.

The simulator then presents itself as shown in Figure 2.1. To create a new simulation, select "New Simulation" from the "File" menu (see Fig. 2.2), after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used.

By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later.

### The Menu Bar

In the File menu (see Fig. 2.2), you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations.

![Figure 2.2: File Menu]

Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. This will also be discussed in more detail later. The Simulator menu offers the same options as the toolbar, which is described in the next section.

Some menu items are only accessible when a simulation has already been created or loaded in the current window.

![Figure 2.3: A new simulation]

### The Toolbar

The toolbar is located at the top left of the simulator (see Fig. 2.4). The toolbar contains the functions most frequently needed by users.

![Figure 2.4: The menu line including toolbar]

The toolbar offers four functions:

* Reset simulation: can only be activated when the simulation has been paused or has finished
* Repeat simulation: cannot be activated if the simulation has not yet been started 
* Pause simulation: can only be activated when the simulation is currently running
* Start simulation: can only be activated when the simulation is not currently running and has not yet finished
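The enable/disable rules in this list amount to a small state table. The following sketch encodes them, assuming the simulation is always in one of four states (not started, running, paused, finished); the enum names are hypothetical and the state model is inferred from the list, not taken from the simulator's source.

```java
/** Assumed simulation states, inferred from the toolbar rules above. */
enum SimState { NOT_STARTED, RUNNING, PAUSED, FINISHED }

enum ToolbarAction {
    RESET, REPEAT, PAUSE, START;

    /** Whether this toolbar button can be activated in the given simulation state. */
    boolean isEnabled(SimState state) {
        return switch (this) {
            case RESET -> state == SimState.PAUSED || state == SimState.FINISHED;
            case REPEAT -> state != SimState.NOT_STARTED;
            case PAUSE -> state == SimState.RUNNING;
            case START -> state == SimState.NOT_STARTED || state == SimState.PAUSED;
        };
    }
}

class ToolbarDemo {
    public static void main(String[] args) {
        for (SimState state : SimState.values()) {
            StringBuilder line = new StringBuilder(state + ":");
            for (ToolbarAction action : ToolbarAction.values()) {
                if (action.isEnabled(state)) {
                    line.append(' ').append(action);
                }
            }
            System.out.println(line);
        }
    }
}
```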

### The Visualization

![Figure 2.5: Visualization of a simulation that has not yet been started]

The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. Figure 2.5 shows 3 processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time.

![Figure 2.6: Right-click on a process bar]

The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later).

Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options (see Fig. 2.6). A process can only be forced to crash or be revived via the popup menu during a running simulation.

In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time.

### Color Differentiation

Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the colors listed in Table 2.1. These are only the default colors, which can be changed via the settings.

```
Table 2.1: Color differentiation of processes and messages

| Process Color | Meaning                                           |
|---------------|---------------------------------------------------|
| Black         | The simulation is not currently running           |
| Orange        | The mouse is over the process bar                 |
| Red           | The process has crashed                           |

| Message Color | Meaning                                                                 |
|---------------|-------------------------------------------------------------------------|
| Green         | The message is still in transit and has not yet reached its destination |
| Blue          | The message has successfully reached its destination                    |
| Red           | The message was lost                                                    |
```

### The Sidebar

![Figure 2.7: The sidebar with empty event editor]

The sidebar is used to program process events. At the top of Figure 2.7, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events (none present here yet) along with their occurrence times and PIDs.

![Figure 2.8: The event editor with 3 programmed events]

To create a new event, the user can either right-click on a process bar (see Fig. 2.6) and select "Insert local event", or select an event below the event table (see Fig. 2.9), enter the event occurrence time in the text field below, and click "Apply". For example, in Figure 2.8, three events were added: crash after 123ms, revival after 321ms, and another crash after 3000ms of the process with ID 1.

![Figure 2.9: Event selection via sidebar]

Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective.

In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process (see Fig. 2.13 left). There, all variables of the process can be edited, providing another way to access a process editor.

### The Log Window

The log window (see Fig. 2.3, bottom) logs all occurring events in chronological order. Figure 2.10 shows the log window after creating the demo simulation with 3 participating processes. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples.

![Figure 2.10: The log window]

```
000000ms: New Simulation
000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)
000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)
000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)

□ Expert mode ☑ Logging
```

By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivating logging can improve simulator performance, because the Swing `JTextArea` class used for the log window is slow to update.
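The deferred behavior of the logging switch can be sketched as a small buffer in front of the actual log output. This is an illustration only: `DeferredLog` is a hypothetical class, and the `Consumer<String>` sink stands in for whatever component appends lines to the log window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical log front end that buffers lines while logging is switched off. */
final class DeferredLog {
    private final Consumer<String> sink;            // stand-in for the log window
    private final List<String> pending = new ArrayList<>();
    private boolean enabled = true;

    DeferredLog(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Write the line immediately, or hold it back while logging is disabled. */
    void log(String line) {
        if (enabled) {
            sink.accept(line);
        } else {
            pending.add(line);
        }
    }

    /** Toggle the switch; re-enabling flushes everything that was held back, in order. */
    void setEnabled(boolean on) {
        enabled = on;
        if (on) {
            pending.forEach(sink);
            pending.clear();
        }
    }
}

class DeferredLogDemo {
    public static void main(String[] args) {
        DeferredLog log = new DeferredLog(System.out::println);
        log.log("000000ms: New Simulation");
        log.setEnabled(false);
        log.log("000123ms: buffered while logging is off");
        log.setEnabled(true); // the buffered line is printed now
    }
}
```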

## Expert Mode

The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings. Figure 2.11 shows the simulator in expert mode. When comparing expert mode with simple mode, several differences are noticeable:

![Figure 2.11: The Simulator in Expert Mode]

### New Functions in the Sidebar

The first difference is visible in the sidebar (see Fig. 2.12). In addition to local events, global events can now also be edited. As already mentioned, local events are those events that occur when a specific local time of the associated process has been reached. Global events, on the other hand, are those events that occur when a specific global time has been reached. A global event takes the global simulation time and a local event takes the local process time as the entry criterion. Global events thus only make a difference when the local process times differ from the global time.
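The difference between the two entry criteria is easiest to see in code. The sketch below uses hypothetical names and assumes that an event becomes due as soon as its reference clock reaches or passes the programmed time.

```java
/** Which clock an event is measured against. */
enum TimeBase { LOCAL, GLOBAL }

/** Hypothetical programmed event: owning PID, reference clock, trigger time, action. */
record TimedEvent(int pid, TimeBase base, long timeMs, String action) {

    /** Due once the chosen reference clock has reached the programmed time. */
    boolean isDue(long globalTimeMs, long localTimeMs) {
        long reference = (base == TimeBase.GLOBAL) ? globalTimeMs : localTimeMs;
        return reference >= timeMs;
    }
}

class TimeBaseDemo {
    public static void main(String[] args) {
        TimedEvent local = new TimedEvent(1, TimeBase.LOCAL, 1000, "crash");
        TimedEvent global = new TimedEvent(1, TimeBase.GLOBAL, 1000, "crash");

        // Process 1 runs fast: at global time 950ms its local clock already shows 1045ms.
        long globalNow = 950;
        long localNow = 1045;
        System.out.println("local event due:  " + local.isDue(globalNow, localNow));
        System.out.println("global event due: " + global.isDue(globalNow, localNow));
    }
}
```

When the local clock equals the global clock, both variants trigger at the same moment, which is why the distinction only matters once local times deviate.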

Furthermore, the user can directly select the associated PID when programming a new event. In simple mode, the PID of the currently selected process (in the topmost ComboBox) was always used by default (here with PID 1).

![Figure 2.12: The Sidebar in Expert Mode]

### Lamport Time, Vector Time, and Anti-Aliasing Switches

Further differences are noticeable below the log window. Among other things, there are two new switches "Lamport time" and "Vector time". If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. To maintain clarity, the user can only have one of these two switches activated at the same time.

The anti-aliasing switch allows the user to activate or deactivate anti-aliasing. With anti-aliasing, all graphics in the visualization are drawn with smoothed edges (see [Bra03]). For performance reasons, anti-aliasing is not active by default.

### The Log Filter

As a simulation becomes more complex, the log window fills up and it becomes increasingly difficult to keep track of all events. To counteract this, expert mode includes a log filter that displays only the log lines of interest.

The log filter is activated and deactivated using the associated "Filter" switch. A regular expression in Java syntax can be entered in the input field next to it (regular expressions in Java are covered in [Fri06]). With the filter active, only the log lines that match the specified regular expression are displayed. For example, with `"PID: (1|2)"` only log lines that contain either "PID: 1" or "PID: 2" are displayed, while lines that only contain "PID: 3" are hidden. The log filter can also be activated retroactively, as already logged events are filtered again after each filter change.

The log filter can also be used during a running simulation. When the filter is deactivated, all messages are displayed again. Log messages that have never been displayed due to the filter are then displayed retroactively.
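The filter semantics described in this section can be reproduced with `java.util.regex`. The sketch below assumes the filter keeps a line when the pattern is found anywhere in it (`Matcher.find()`), which matches the "contains" wording of the example; the `LogFilter` class itself is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical log filter: keeps the lines in which the regex occurs somewhere. */
final class LogFilter {
    static List<String> filter(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return lines.stream()
                    .filter(line -> pattern.matcher(line).find())
                    .toList();
    }
}

class LogFilterDemo {
    public static void main(String[] args) {
        List<String> log = List.of(
            "000000ms: New Simulation",
            "000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)",
            "000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)",
            "000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)");
        LogFilter.filter(log, "PID: (1|2)").forEach(System.out::println);
    }
}
```

Note that under this assumption `PID: (1|2)` would also match lines such as "PID: 10" or "PID: 25"; a stricter pattern like `PID: (1|2);` avoids that in simulations with more than nine processes.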

![Figure 2.13: The Process Editor in the Sidebar]

## Events

Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as the arrival of a message or the execution of an action due to an alarm (more on this later).

### Key Features of Events:

* Local Events: Triggered when a specific local time of the associated process is reached
* Global Events (Expert Mode only): Triggered when a specific global simulation time is reached
* Event Programming: Users can add events by right-clicking on a process bar and selecting "Insert local event", or by using the event editor in the sidebar and specifying the event time and type

### Event Types Available:

* Process crash
* Process revival
* 1-Phase Commit Protocol events
* 2-Phase Commit Protocol events
* Basic Multicast Protocol events

The event editor allows users to:

* Copy or delete selected events (right-click functionality)
* Select multiple events using Ctrl key
* Edit time and PID values after creation
* Move events to different times or assign to different processes

> **Important**: Remember to press Enter after changing event occurrence times, otherwise the changes won't take effect.

## Summary

The expert mode significantly extends the simulator's capabilities, providing:

* Enhanced visualization options with Lamport and vector timestamps
* Global event programming in addition to local events
* Advanced log filtering for complex simulations
* Anti-aliasing for improved graphics
* Direct PID selection for event programming

These features make the simulator more powerful for advanced distributed systems simulation while maintaining the option to work in simple mode for basic use cases.

E-Mail your comments to `paul@nospam.buetow.org`

[Back to the main site](../)