# I/O Replay
## Overview
I/O Replay is an I/O benchmarking tool for Linux-based operating systems. It captures I/O operations on a (possibly production) server in order to replay the exact same I/O operations on a load test machine.
I/O Replay is operated in 5 steps:
1. Capture: Record all I/O operations over a given period of time to a capture log.
2. Initialize: Copy the log to a load test machine and initialize the load test environment.
3. Replay: Drop all OS caches and replay all I/O operations.
4. Analyze: Look at the OS and hardware stats (throughput, I/O ops, load average) from the replay phase and draw conclusions. The aim is to identify possible I/O bottlenecks.
5. Repeat: Repeat steps 2-4, adjusting OS and hardware settings in order to improve I/O performance.
Examples of OS and hardware settings and adjustments:
* Change of system parameters (file system mount options, file system caching, file system type, file system creation flags).
* Replay the I/O at different speed(s).
* Replay the I/O with modified pattern(s) (e.g. remove reads from the replay journal).
* Replay the I/O on different types of hardware.
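One of the adjustments listed above, removing reads from the replay journal, can be done with a simple filter over the capture log before pre-processing it. A minimal sketch, assuming the `o=<operation>;` field layout shown in the capture log example in the Capture section:

```sh
# filter_reads drops read and readv operations from a capture log
# stream; the trailing ";" avoids also matching readlink, readdir
# or readahead.
filter_reads() { grep -Ev 'o=read(v)?;' ; }

# Usage: filter_reads < io.capture > io.noreads.capture
```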
File system fragmentation (which depends on the file system type and utilisation) may affect I/O performance as well. Replaying the I/O will therefore not give exactly the same result as on a production system, but it is still a good way to identify I/O bottlenecks. As a rule of thumb, fragmentation is not an issue unless the file system begins to fill up; modern file systems (such as Ext4) then slowly start to suffer from fragmentation and slow down.
## Benefits
In contrast to traditional I/O benchmarking tools, I/O Replay reproduces real production I/O, and does not rely on a pre-defined set of I/O operations.
Also, I/O Replay only requires one server machine for capturing and another for replaying. A traditional load test environment is usually a distributed system consisting of many components and machines. Such a system can become quite complex, which makes it difficult to isolate possible I/O bottlenecks. For example, in order to trigger I/O events a client application would usually have to call a remote server application; the remote server application would query a database, and the database would trigger the actual I/O operations in Linux. Furthermore, it is not easy to switch back and forth between hardware and OS settings. For example, without a backup and restore procedure a database would most likely be corrupt after reformatting the data partitions with a different file system type.
The benefits of I/O replay are:
* It is easy to determine whether a new hardware type is suitable for an already existing application.
* It is easy to change OS and hardware for performance tests and optimizations.
* Findings can be applied to production machines in order to optimize OS configuration and to save hardware costs.
* Benchmarks are based on production I/O patterns and not on artificial I/O patterns.
* Log files can be modified to see whether a change in the application behavior would improve I/O performance (without actually touching the application code).
* Log files could be generated synthetically in order to find out how a new application would perform (even before any code for the new application exists).
* It identifies possible flaws in the applications (e.g. Java programs which produce I/O operations on the server machines). Findings can be reported to the corresponding developers so that changes can be introduced to improve the application's I/O performance.
* It captures I/O in Linux kernel space (very efficient, no system slowdowns even under heavy I/O load).
* It replays I/O via a tool developed in C with as little overhead as possible.
# Send in patches
Patches of any kind (bug fixes, new features...) are welcome! I/O Replay is new software and not everything might be perfect yet. Also, I/O Replay is used for a very specific use case at Mimecast. It may need tuning or extension for your use case. It will grow and mature over time.
This is also potentially a great tool just for analysing (not replaying) the I/O, so it would be a great opportunity to add more features in that area (e.g. more stats, filters, etc.).
Future work will also include file hole support and I/O support for memory mapped files.
# Getting started
I/O Replay consists of a set of SystemTap kernel modules (capturing I/O) and the tool ``ioreplay`` (replaying I/O). Usually you want to capture I/O from a production machine and want to replay it on a separate load testing machine.
## System requirements
I/O replay has been tested on
* CentOS 7.4 64Bit (latest version, all packages up to date, booted into the installed kernel)
* SystemTap (from the default CentOS repository)
* GCC C-Compiler (from the default CentOS repository)
Before proceeding please ensure that the latest CentOS 7 kernel is installed and running on all machines involved. Also ensure that the capture machine and the load test machine have the same mount points mounted, so that I/O is replayed on the corresponding data drives on the load test machine.
## Compiling and installing ioreplay
I/O Replay has to be installed on all machines involved. To install I/O Replay perform the following steps:
```sh
sudo yum install gcc systemtap yum-utils kernel-devel-$(uname -r)
sudo debuginfo-install kernel-$(uname -r)
make && sudo make install
export PATH=$PATH:/opt/ioreplay/bin
```
This will install the ``ioreplay`` utility to ``/opt/ioreplay/bin/`` and the SystemTap kernel modules to ``/opt/ioreplay/systemtap/``. Run ``ioreplay -h`` to print out a brief help.
However, best practice is not to install any compilers on a production machine. You can either compile I/O Replay from scratch on all machines involved as shown above, or compile it only on a build machine and distribute the ``/opt/ioreplay`` directory to the remaining machines. In the latter case you will also need to install the ``systemtap-runtime`` package as an additional dependency.
In case you decide to uninstall I/O Replay you can do so by running
```sh
sudo ioreplay -P # purges all test files created by ioreplay
sudo make uninstall
```
# Operating I/O Replay
## 1. Capture
The following steps are required to capture all I/O operations of the entire (Linux) system to the file ``io.capture``. For efficiency and security only the metadata (the number of bytes written and read) is captured, not the actual data itself. The capture also includes the system time in microseconds, the process IDs (PIDs) and thread IDs (TIDs) involved, and all relevant options and flags of the corresponding I/O syscalls. Capturing stops automatically after 60 minutes:
* 1) Stop all applications on the machine. Otherwise the kernel module won't recognize any already opened file handles, and all I/O operations on unknown file handles will be ignored. Stopping the applications before starting the capture is essential for recording the flags with which files are opened.
* 2) Run:
```sh
sudo ioreplay -c ~/io.capture
```
* 3) Start all applications again.
* 4) To stop capturing I/O press Ctrl+C. Alternatively, wait one hour for the kernel module to exit automatically.
To capture only I/O caused by Java processes, run:
```sh
sudo ioreplay -c ~/io.capture -m javaioreplay.ko
```
To capture the I/O of a specific process, run:
```sh
sudo ioreplay -c ~/io.capture -m targetedioreplay.ko -p PID
```
The resulting capture log looks like this and can be multiple GB in size:
```sh
t=1511381122062;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-9:1;:,f=0;:,m=438;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=0;:,
t=1511381122062;:,i=7764:8093;:,o=close;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-8:1;:,f=0;:,m=438;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=0;:,
t=1511381122062;:,i=7764:8093;:,o=close;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-9:0;:,f=0;:,m=438;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122062;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
t=1511381122062;:,i=7764:8093;:,o=read;:,d=162;:,b=0;:,
t=1511381122062;:,i=7764:8093;:,o=close;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-7:1;:,f=0;:,m=438;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
t=1511381122063;:,i=7764:8093;:,o=read;:,d=162;:,b=0;:,
t=1511381122063;:,i=7764:8093;:,o=close;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-8:0;:,f=0;:,m=438;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
t=1511381122063;:,i=7764:8093;:,o=read;:,d=162;:,b=0;:,
t=1511381122063;:,i=7764:8093;:,o=close;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=open;:,d=162;:,p=///usr/local/mimecast/someapp/somesubdir/vd11-6:1;:,f=0;:,m=438;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=read;:,d=162;:,b=12;:,
t=1511381122063;:,i=7764:8093;:,o=fstat;:,d=162;:,s=0;:,
t=1511381122063;:,i=7764:8093;:,o=lseek;:,d=162;:,O=0;:,W=1;:,b=12;:,
```
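Since the capture log is plain text, it can also be analysed directly with standard tools. A minimal sketch that summarises per-operation counts and byte totals, assuming (based on the example above) that `o=` carries the operation name and `b=` the number of bytes transferred:

```sh
# capture_stats reads a capture log from stdin and prints, per
# operation: "<op> <count> <total_bytes>". Field meanings are
# inferred from the example capture log above (o= operation,
# b= bytes); b= is only present on read/write style operations.
capture_stats() {
  awk -F';:,' '{
    op = ""; bytes = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^o=/) op = substr($i, 3)
      if ($i ~ /^b=/) bytes = substr($i, 3) + 0
    }
    if (op != "") { count[op]++; total[op] += bytes }
  }
  END { for (op in count) printf "%s %d %d\n", op, count[op], total[op] }' | sort
}

# Usage: capture_stats < io.capture
```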
### Using a RAMdisk (optional)
It is beneficial to write ``io.capture`` to a RAMdisk so that the capture itself interferes as little as possible with the system I/O being measured:
```sh
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
```
Make sure that there is enough system memory available for such a RAMdisk plus all processes running on the machine. The RAM used by the RAMdisk is taken away from the Linux caches, which could decrease system I/O performance. Run the following command to capture to the RAMdisk:
```sh
sudo ioreplay -c /mnt/ramdisk/io.capture
```
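Whether the machine has enough memory for such a RAMdisk can be checked beforehand. A hedged sketch (the helper name is made up) that reads `MemAvailable` from a meminfo-style file, normally `/proc/meminfo`:

```sh
# ramdisk_fits succeeds if MemAvailable (in KiB) exceeds the
# requested RAMdisk size in GiB. The optional second argument
# allows pointing at a different meminfo-style file for testing.
ramdisk_fits() {  # usage: ramdisk_fits SIZE_GIB [MEMINFO_FILE]
  size_kb=$(( $1 * 1024 * 1024 ))
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "${2:-/proc/meminfo}")
  [ "${avail_kb:-0}" -gt "$size_kb" ]
}

# Example: only create the 32g RAMdisk when the memory is there:
# ramdisk_fits 32 && sudo mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk
```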
## 2. Initialize
### 2.1 Pre-process the capture log / generate a replay log
After producing ``io.capture`` it must be pre-processed. The resulting replay log introduces the following changes (to improve replay performance and simplify parsing):
* Time stamps begin from 0
* Use of internal opcodes rather than strings (e.g. ``30`` instead of ``open``) for faster parsing.
* All operations on unknown file handles are _removed_.
* All incomplete or corrupt lines from the capture file are ignored. Corrupt lines can occur because SystemTap may skip a few probe points if it decides that capturing I/O is causing too much overhead.
* Rewrite of all file paths. ``ioreplay`` adds ``/.ioreplay/NAME`` to all file paths for each file system mount point.
To generate the replay log ``io.replay`` from the capture log ``io.capture`` run:
```sh
sudo ioreplay -c io.capture -r io.replay -n NAME -u USER
```
Here NAME is a freely chosen name and USER must be a valid system user; it is the user under which the replay test will run. This command also creates all required top level directories, such as ``/.ioreplay/NAME/`` and ``/mnt/.ioreplay/NAME/``, in all mounted file systems. These are the directories the replay test will read/write files from/to, and they will belong to user USER.
``ioreplay`` will filter out many operations, especially all operations on pseudo file systems (e.g. sysfs, procfs), as it does not make much sense to replay I/O on these. I/O operations on unknown file handles are filtered out as well. These occur when capturing starts *after* an application has already opened a file, so the capture never sees how that file was opened. Best practice is therefore to stop all applications on the machine first, start capturing the I/O, and then start all applications again. This may be improved in future releases of I/O Replay.
The resulting replay log will look like this: the first line is a meta header containing information about the test configuration. The meta header is followed by all the I/O operations. At the end of the file is the INIT section, which lists all files (including their sizes) and directories that must be present before replaying the I/O.
```sh
#|num_timelines=509591|num_mapped_pids=19189|num_mapped_fds=4292067|num_lines=55040114|replay_version=1|user=ioreplayuser|name=test0|init_offset=2578735248|
23|1|1|0|0|30|11|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-9:1|438|0|open@31|
23|1|1|0|0|0|11|0|fstat@32|
23|1|1|0|0|10|11|12|read@33|
23|1|1|0|0|0|11|0|fstat@34|
23|1|1|0|0|72|11|0|1|12|lseek@35|
23|1|1|0|0|10|11|0|read@36|
23|1|1|0|0|50|11|0|close@37|
23|2|1|0|0|30|12|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-8:1|438|0|open@38|
23|2|1|0|0|0|12|0|fstat@39|
23|2|1|0|0|10|12|12|read@40|
23|2|1|0|0|0|12|0|fstat@41|
23|2|1|0|0|72|12|0|1|12|lseek@42|
23|2|1|0|0|10|12|0|read@43|
23|2|1|0|0|50|12|0|close@44|
23|3|1|0|0|30|13|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-9:0|438|0|open@45|
23|3|1|0|0|0|13|0|fstat@46|
23|3|1|0|0|10|13|12|read@47|
23|3|1|0|0|0|13|0|fstat@48|
23|3|1|0|0|72|13|0|1|12|lseek@49|
23|3|1|0|0|10|13|0|read@50|
23|3|1|0|0|50|13|0|close@51|
23|4|1|0|0|30|14|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-7:1|438|0|open@52|
23|4|1|0|0|0|14|0|fstat@53|
23|4|1|0|0|10|14|12|read@54|
23|4|1|0|0|0|14|0|fstat@55|
23|4|1|0|0|72|14|0|1|12|lseek@56|
23|4|1|0|0|10|14|0|read@57|
23|4|1|0|0|50|14|0|close@58|
23|5|1|0|0|30|15|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-8:0|438|0|open@59|
23|5|1|0|0|0|15|0|fstat@60|
23|5|1|0|0|10|15|12|read@61|
23|5|1|0|0|0|15|0|fstat@62|
23|5|1|0|0|72|15|0|1|12|lseek@63|
23|5|1|0|0|10|15|0|read@64|
23|5|1|0|0|50|15|0|close@65|
23|6|1|0|0|30|16|/usr/local/mimecast/.ioreplay/test0/someapp/somesubdir/vd11-6:1|438|0|open@66|
23|6|1|0|0|0|16|0|fstat@67|
23|6|1|0|0|10|16|12|read@68|
23|6|1|0|0|0|16|0|fstat@69|
.
.
.
#INIT
0|1|688|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/8/b_dv01_11_vd11-11_a|@55290437
0|1|2592|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/3/b_dv01_11_vd11-11_b|@33907067
0|1|768|/mnt/14/.ioreplay/test0/bmnt/2/20171101/b/d/b_dv01_11_vd11-11_c|@64247527
0|1|1440|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/0/b_dv01_11_vd11-11_d|@2014896
0|1|960|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/9/b_dv01_11_vd11-11_e|@17724079
0|1|928|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/1/b_dv01_11_vd11-11_f|@4534389
0|1|1712|/mnt/14/.ioreplay/test0/bmnt/2/20171101/b/5/b_dv01_11_vd11-11_g|@2738458
0|1|784|/mnt/14/.ioreplay/test0/bmnt/2/20171101/b/b/b_dv01_11_vd11-11_h|@21136612
0|1|624|/mnt/14/.ioreplay/test0/bmnt/2/20171101/b/6/b_dv01_11_vd11-11_i|@24683427
0|1|672|/mnt/14/.ioreplay/test0/bmnt/2/20171101/b/9/b_dv01_11_vd11-11_j|@12584061
0|1|336|/mnt/15/.ioreplay/test0/bmnt/2/20171101/b/5/b_dv01_11_vd11-11_k|@7737434
0|1|12|/mnt/06/.ioreplay/test0/bmnt/tmp/b|@42498106
.
.
.
```
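Both the meta header and the INIT section are easy to inspect with standard tools. A sketch with two hypothetical helpers: one extracts a named field from the header, the other sums the file sizes listed in the INIT section (assuming, per the example above, that the third `|`-separated field of an INIT line is the size in bytes):

```sh
# replay_header_field prints the value of one header field, e.g.
# num_lines, user or init_offset (field names as in the example
# header above).
replay_header_field() {  # usage: replay_header_field FIELD REPLAY_LOG
  head -n 1 "$2" | tr '|' '\n' | awk -F= -v k="$1" '$1 == k { print $2 }'
}

# init_total_bytes sums the sizes of all files listed after the
# #INIT marker; assumption: the third |-separated field is the size.
init_total_bytes() {  # usage: init_total_bytes REPLAY_LOG
  awk -F'|' 'insec { total += $3 } /^#INIT/ { insec = 1 } END { print total + 0 }' "$1"
}
```

This gives a quick sanity check of how much disk space the initialization phase will need before actually running it.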
### 2.2 Initialize the replay test
It is very likely that the replay test wants to access already existing files. Therefore it has to be ensured that all of these exist already before starting the test. To create all files and directories required by the test run the following command:
```sh
sudo ioreplay -i io.replay
```
For that ``ioreplay`` makes use of the INIT section in ``io.replay``.
## 3. Replay
It has to be ensured that user USER may open a large number of files and spawn many processes. Run the following to create ``/etc/security/limits.d/ioreplay.conf``:
```sh
cat <<END | sudo tee /etc/security/limits.d/ioreplay.conf
* soft nofile 369216
* hard nofile 369216
* soft nproc 30768
* hard nproc 30768
END
```
To replay the log run:
```sh
sudo ioreplay -r io.replay
```
It is beneficial to read ``io.replay`` from a RAMdisk so that reading the log interferes as little as possible with the I/O being replayed.
*Init and replay in one go*
It is possible to initialise the test and run it with one single command; just replace option `-r` with `-R`:
```sh
sudo ioreplay -R io.replay
```
*Speed factor*
By default `ioreplay` tries to replay all I/O operations as fast as it can. To replay the I/O at a different speed it is possible to configure the speed factor by using the `-s` command line option.
The following pseudo code demonstrates how the speed factor affects the replay speed. Here `current_time` represents the current time while replaying the I/O, `time_in_log` the time as logged in `io.replay`, and `time_ahead` how far ahead of schedule the replay is.
```code
if (speed_factor != 0) {
time_ahead = time_in_log / speed_factor - current_time
if (time_ahead > 0) {
sleep(time_ahead)
}
}
```
A speed factor of `0` is interpreted as "replay as fast as possible". A speed factor of `1` replays everything at the original speed (the same speed as on the host where the I/O was captured). A speed factor of `2` doubles the speed and a speed factor of `0.5` halves it.
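The arithmetic above can be sketched as a small helper that computes the sleep time in microseconds (a hypothetical illustration, not part of `ioreplay` itself):

```sh
# sleep_time_us mirrors the pseudo code: how long to sleep before
# issuing the next operation, clamped to 0 when behind schedule.
sleep_time_us() {  # usage: sleep_time_us TIME_IN_LOG_US SPEED_FACTOR CURRENT_TIME_US
  awk -v t="$1" -v s="$2" -v c="$3" 'BEGIN {
    if (s == 0) { print 0; exit }   # 0 means "as fast as possible"
    a = t / s - c
    if (a < 0) a = 0
    print a
  }'
}

# Example: an operation logged at t=2s, speed factor 2, current time 0.5s:
# sleep_time_us 2000000 2 500000   -> 500000 (sleep half a second)
```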
To replay the I/O at the original speed, use a factor of `1` as follows:
```sh
sudo ioreplay -R io.replay -s 1
```
## 4. Analyse
Look at various operating system statistics during the test. Useful commands are for example ``iostat -x 1``, ``dstat --disk`` and ``sudo iotop -o``. Ideally, collect all I/O statistics of all drives into a time series database with graphing capabilities, such as Collectd/Graphite/Whisper.
## 5. Repeat
It is important to understand the I/O statistics observed. The same test can be repeated at any time, each time with different settings applied.
## Cleanup
To purge all temporary data files of all tests run
```sh
sudo ioreplay -P
```
Note: It's not required to clean up any test data manually when you intend to re-run a test or run a new test. During initialization (``-i`` or ``-R`` switch) ``ioreplay`` will automatically move all old data to ``.ioreplay/.trash/`` sub-folders, where it will be ignored. However, once you intend to completely delete all test files and directories (e.g. you run out of disk space or want to uninstall ``ioreplay``) you should purge them with ``-P`` as shown above.
## Supported file systems
Currently I/O Replay supports replaying I/O on ``ext2``, ``ext3``, ``ext4`` and ``xfs``. However, it should be straightforward to add support for additional file systems.
## Supported syscalls
Currently, these file I/O related syscalls are supported (as of CentOS 7):
```code
open
openat
lseek
fcntl
creat
write
writev
unlink
unlinkat
rename
renameat
renameat2
read
readv
readahead - Initial support only
readdir
readlink
readlinkat
fdatasync
fsync
sync_file_range - Initial support only
sync
syncfs
close
getdents
mkdir
rmdir
mkdirat
stat
statfs - Initial support only
statfs64 - Initial support only
fstatfs - Initial support only
fstatfs64 - Initial support only
lstat
fstat
fstatat
chmod
fchmodat
fchmod
chown
chown16
lchown
lchown16
fchown
fchown16
fchownat
mmap2 - Initial support only
mremap - Initial support only
munmap - Initial support only
msync - Initial support only
exit_group - To detect process termination (closing all open file handles)
```
## Source code documentation
The documentation of the source code can be generated via Doxygen. To install Doxygen run ``sudo yum install doxygen``; to generate the documentation run ``make doxygen`` in the top level source directory. The resulting documentation can then be found in the ``docs/html`` subfolder of the project. It is worthwhile to start from ``ioreplay/src/main.c`` and read your way through. Functions are generally documented in the header files, except static functions, which don't have separate declarations.