# KISS server monitoring with Gogios

> Published at 2023-06-01T21:10:17+03:00

Gogios is a minimalistic and easy-to-use monitoring tool I programmed in Go, designed specifically for small-scale self-hosted servers and virtual machines. Its primary purpose is to monitor my personal server infrastructure for `foo.zone`: my MTAs, my authoritative DNS servers, my NextCloud, Wallabag and Anki sync server installations, and so on.

Being compatible with the Nagios Check API, Gogios offers a simple yet effective way to monitor a limited number of resources, although in theory it scales to a couple of thousand checks. You can clone it from Codeberg here:

=> https://codeberg.org/snonux/gogios

=> ./kiss-server-monitoring-with-gogios/gogios-small.png Gogios logo

<< template::inline::toc

```
    _____________________________    ____________________________
   /                             \  /                            \
  |    _______________________    ||    ______________________    |
  |   /                       \   ||   /                      \   |
  |   | # Alerts with status c|   ||   | # Unhandled alerts:  |   |
  |   | hanged:               |   ||   |                      |   |
  |   |                       |   ||   | CRITICAL: Check Pizza|   |
  |   | OK->CRITICAL: Check Pi|   ||   | : Late delivery      |   |
  |   | zza: Late delivery    |   ||   |                      |   |
  |   |                       |   ||   | WARNING: Check Thirst|   |
  |   |                       |   ||   | : OutofKombuchaExcept|   |
  |   \_______________________/   ||   \______________________/   |
  |  /|\ GOGIOS MONITOR 1    _    ||  /|\ GOGIOS MONITOR 2   _    |
   \_____________________________/  \____________________________/
     !_________________________!      !________________________!

------------------------------------------------
ASCII art was modified by Paul Buetow
The original can be found at
https://asciiart.website/index.php?art=objects/computers
```

## Motivation

I have experience with monitoring solutions like Nagios, Icinga, Prometheus and OpsGenie, but these tools come with many features that I don't need for personal use. Contact groups, host groups, check clustering, and the requirement of operating a DBMS and a WebUI would add complexity and bloat to my monitoring setup.

My primary goal was to have a single email address for notifications and a simple mechanism to periodically execute standard Nagios check scripts and notify me of any state changes. I wanted the most minimalistic monitoring solution possible but wasn't satisfied with the available options.

This led me to create Gogios, a lightweight monitoring tool tailored to my specific needs. I chose the Go programming language for this project as it offers, in my opinion, the best balance of ease of use and performance.

## Features

* Compatible with Nagios Check scripts: Gogios leverages the widely-used Nagios Check API, allowing you to use existing Nagios plugins.
* Lightweight and Minimalistic: Gogios is designed to be simple and fairly easy to set up.
* Configurable Check Timeout and Concurrency: Gogios allows you to set a timeout for checks and configure the number of concurrent checks, offering flexibility in monitoring your resources.
* Configurable check dependency: A check can depend on another check, which enables scenarios like not executing an HTTP check when the server isn't pingable.
* Retries: Check retry and retry intervals are configurable per check.
* Email Notifications: Gogios can send email notifications regarding the status of monitored services, ensuring you stay informed about potential issues.
* CRON-based Execution: Gogios can be quickly scheduled to run periodically via CRON, allowing you to automate monitoring without needing a complex setup.
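
For readers unfamiliar with the Nagios Check API: a plugin prints a one-line status message and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following is a minimal sketch of such a plugin in Go. The `evaluate` function, the thresholds and the hard-coded measurement are made up for illustration and are not part of Gogios or of the monitoring-plugins package.

```go
package main

import (
	"fmt"
	"os"
)

// Exit codes as defined by the Nagios plugin convention.
const (
	OK       = 0
	Warning  = 1
	Critical = 2
	Unknown  = 3
)

// evaluate is a hypothetical helper: it maps a measured disk usage
// (in percent) onto a status code and a one-line status message.
func evaluate(usedPercent, warn, crit float64) (int, string) {
	switch {
	case usedPercent >= crit:
		return Critical, fmt.Sprintf("CRITICAL: disk %.0f%% used", usedPercent)
	case usedPercent >= warn:
		return Warning, fmt.Sprintf("WARNING: disk %.0f%% used", usedPercent)
	default:
		return OK, fmt.Sprintf("OK: disk %.0f%% used", usedPercent)
	}
}

func main() {
	// 42 is a made-up value; a real plugin would measure something here.
	code, line := evaluate(42, 80, 90)
	fmt.Println(line) // the status line a monitoring tool reads from stdout
	os.Exit(code)     // the exit code carries the actual status
}
```

Any executable following this convention, in whatever language, should be usable as a check.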

## Example alert

This is an example alert report received via email. In the subject, `[C:2 W:0 U:0 OK:51]` means that two checks are in status critical, none in warning, none in unknown, and 51 are OK.

```
Subject: GOGIOS Report [C:2 W:0 U:0 OK:51]

This is the recent Gogios report!

# Alerts with status changed:

OK->CRITICAL: Check ICMP4 vulcan.buetow.org: Check command timed out
OK->CRITICAL: Check ICMP6 vulcan.buetow.org: Check command timed out

# Unhandled alerts:

CRITICAL: Check ICMP4 vulcan.buetow.org: Check command timed out
CRITICAL: Check ICMP6 vulcan.buetow.org: Check command timed out

Have a nice day!
```
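
To illustrate how such a subject line can be derived, here is a small sketch that tallies Nagios status codes into the four counters. It is not taken from the Gogios source; the `subject` function is hypothetical, and counting every code other than 0, 1 and 2 as unknown is my assumption.

```go
package main

import "fmt"

// subject tallies Nagios status codes (0 OK, 1 WARNING, 2 CRITICAL)
// into a report subject. Any other code is counted as UNKNOWN, which
// is an assumption made for this sketch.
func subject(codes []int) string {
	var ok, warn, crit, unknown int
	for _, code := range codes {
		switch code {
		case 0:
			ok++
		case 1:
			warn++
		case 2:
			crit++
		default:
			unknown++
		}
	}
	return fmt.Sprintf("GOGIOS Report [C:%d W:%d U:%d OK:%d]", crit, warn, unknown, ok)
}

func main() {
	codes := make([]int, 53)  // 53 checks, all OK (0) to begin with
	codes[0], codes[1] = 2, 2 // two of them turn CRITICAL
	fmt.Println(subject(codes))
}
```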

## Installation

### Compiling and installing Gogios

This document is primarily written for OpenBSD, but applying the corresponding steps to any Unix-like (e.g. Linux-based) operating system should be easy. On systems other than OpenBSD, you may have to replace `doas` with the `sudo` command and the `/usr/local/bin` path with `/usr/bin`.

To compile and install Gogios on OpenBSD, follow these steps:

```shell
git clone https://codeberg.org/snonux/gogios.git
cd gogios
go build -o gogios cmd/gogios/main.go
doas cp gogios /usr/local/bin/gogios
doas chmod 755 /usr/local/bin/gogios
```

You can use cross-compilation if you want to compile Gogios for OpenBSD on a Linux system without installing the Go compiler on OpenBSD. Follow these steps:

```shell
export GOOS=openbsd
export GOARCH=amd64
go build -o gogios cmd/gogios/main.go
```

On your OpenBSD system, copy the binary to `/usr/local/bin/gogios` and set the correct permissions as shown above. You could automate all steps described here with your configuration management system of choice. I use Rexify, the friendly configuration management system, to automate the installation, but that is out of the scope of this document.

=> https://www.rexify.org

### Setting up user, group and directories

It is best to create a dedicated system user and group for Gogios to ensure proper isolation and security. Here are the steps to create the `_gogios` user and group under OpenBSD:

```shell
doas adduser -group _gogios -batch _gogios
doas usermod -d /var/run/gogios _gogios
doas mkdir -p /var/run/gogios
doas chown _gogios:_gogios /var/run/gogios
doas chmod 750 /var/run/gogios
```

Please note that creating a user and group might differ depending on your operating system. For other operating systems, consult their documentation for creating system users and groups.

### Installing monitoring plugins

Gogios relies on external Nagios or Icinga monitoring plugin scripts. On OpenBSD, you can install the `monitoring-plugins` package for use with Gogios. It is a collection of monitoring plugins, similar to Nagios plugins, that can be used to monitor various services and resources:

```shell
doas pkg_add monitoring-plugins
doas pkg_add nrpe # If you want to execute checks remotely via NRPE.
```

Once the installation is complete, you can find the monitoring plugins in the `/usr/local/libexec/nagios` directory, which then can be configured to be used in `gogios.json`.

## Configuration

### MTA

Gogios requires a local Mail Transfer Agent (MTA) such as Postfix or OpenBSD SMTPD running on the same server where the CRON job (described further below) is executed. The local MTA handles email delivery, allowing Gogios to send email notifications about status changes. Before using Gogios, ensure that a properly configured MTA is installed and running on your server.

You can use the `mail` command to send an email from the command line on OpenBSD. Here's how to send a test email to verify that your mail setup works:

```
echo 'This is a test email from OpenBSD.' | mail -s 'Test Email' your-email@example.com
```

Check the recipient's inbox to confirm the delivery of the test email. If the email is delivered successfully, it indicates that your email server is configured correctly and functioning. Please check your MTA logs in case of issues.

### Configuring Gogios

To configure Gogios, create a JSON configuration file (e.g., `/etc/gogios.json`). Here's an example configuration:

```json
{
  "EmailTo": "paul@dev.buetow.org",
  "EmailFrom": "gogios@buetow.org",
  "CheckTimeoutS": 10,
  "CheckConcurrency": 2,
  "StateDir": "/var/run/gogios",
  "Checks": {
    "Check ICMP4 www.foo.zone": {
      "Plugin": "/usr/local/libexec/nagios/check_ping",
      "Args": [ "-H", "www.foo.zone", "-4", "-w", "50,10%", "-c", "100,15%" ],
      "Retries": 3,
      "RetryInterval": 10
    },
    "Check ICMP6 www.foo.zone": {
      "Plugin": "/usr/local/libexec/nagios/check_ping",
      "Args": [ "-H", "www.foo.zone", "-6", "-w", "50,10%", "-c", "100,15%" ],
      "Retries": 3,
      "RetryInterval": 10
    },
    "www.foo.zone HTTP IPv4": {
      "Plugin": "/usr/local/libexec/nagios/check_http",
      "Args": ["www.foo.zone", "-4"],
      "DependsOn": ["Check ICMP4 www.foo.zone"]
    },
    "www.foo.zone HTTP IPv6": {
      "Plugin": "/usr/local/libexec/nagios/check_http",
      "Args": ["www.foo.zone", "-6"],
      "DependsOn": ["Check ICMP6 www.foo.zone"]
    },
    "Check NRPE Disk Usage foo.zone": {
      "Plugin": "/usr/local/libexec/nagios/check_nrpe",
      "Args": ["-H", "foo.zone", "-c", "check_disk", "-p", "5666", "-4"]
    }
  }
}
```

* `EmailTo`: Specifies the recipient of the email notifications.
* `EmailFrom`: Indicates the sender's email address for email notifications.
* `CheckTimeoutS`: Sets the timeout for checks in seconds.
* `CheckConcurrency`: Determines the number of concurrent checks that can run simultaneously.
* `StateDir`: Specifies the directory where Gogios stores its persistent state in a `state.json` file.
* `Checks`: Defines the checks to be performed as a JSON object, keyed by unique check name, each with a plugin path and arguments.
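
As a rough sketch of how a configuration with these keys can be read in Go, the following uses `encoding/json` from the standard library. The `Config` and `Check` types and the `parse` function are my own illustration (field names simply mirror the JSON keys above) and may differ from what Gogios does internally. The email address in the sample is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Check mirrors one entry of the "Checks" object (hypothetical type).
type Check struct {
	Plugin        string
	Args          []string
	DependsOn     []string
	Retries       int
	RetryInterval int
}

// Config mirrors the top-level keys of the example (hypothetical type).
type Config struct {
	EmailTo          string
	EmailFrom        string
	CheckTimeoutS    int
	CheckConcurrency int
	StateDir         string
	Checks           map[string]Check
}

// parse decodes a JSON document into a Config. Malformed JSON, such
// as a missing comma between two checks, is returned as an error.
func parse(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
	  "EmailTo": "you@example.org",
	  "CheckTimeoutS": 10,
	  "Checks": {
	    "Check ICMP4 www.foo.zone": {
	      "Plugin": "/usr/local/libexec/nagios/check_ping",
	      "Args": ["-H", "www.foo.zone", "-4"],
	      "Retries": 3,
	      "RetryInterval": 10
	    }
	  }
	}`)
	cfg, err := parse(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for name, check := range cfg.Checks {
		fmt.Println(name, "->", check.Plugin)
	}
}
```

Optional keys such as `DependsOn` or `Retries` are simply left at their zero values when absent.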

Adjust the configuration file according to your needs, specifying the checks you want Gogios to perform.

If you want to execute checks only when another check succeeded (status OK), use `DependsOn`. In the example above, the HTTP checks won't run when the hosts aren't pingable. They will show up as `UNKNOWN` in the report.
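
The dependency rule can be sketched as follows. This is not the Gogios implementation: the `statusWithDeps` function is hypothetical, and it assumes that the checks a check depends on have already been executed and their results recorded.

```go
package main

import "fmt"

// statusWithDeps runs a check only if all checks it depends on are OK (0).
// Otherwise the check is skipped and reported as UNKNOWN (3).
// results holds the status codes of checks that ran earlier.
func statusWithDeps(dependsOn []string, results map[string]int, run func() int) int {
	for _, dep := range dependsOn {
		if code, seen := results[dep]; !seen || code != 0 {
			return 3 // dependency missing or not OK: skip this check
		}
	}
	return run()
}

func main() {
	// The ping check is CRITICAL, so the HTTP check must not run.
	results := map[string]int{"Check ICMP4 www.foo.zone": 2}
	code := statusWithDeps(
		[]string{"Check ICMP4 www.foo.zone"},
		results,
		func() int { return 0 }, // stand-in for executing check_http
	)
	fmt.Println(code) // prints 3
}
```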

`Retries` and `RetryInterval` are optional check configuration parameters. In case of failure, Gogios will retry the check `Retries` times, waiting `RetryInterval` seconds between attempts.
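
A retry loop along these lines could look like the sketch below. The `runWithRetries` function is hypothetical, and I assume here that `Retries` counts the additional attempts after the first failed one; the actual Gogios behaviour may differ in such details.

```go
package main

import (
	"fmt"
	"time"
)

// runWithRetries executes check once and, while the result is not OK (0),
// repeats it up to retries more times, sleeping interval in between.
// It returns the status code of the last attempt.
func runWithRetries(check func() int, retries int, interval time.Duration) int {
	code := check()
	for i := 0; i < retries && code != 0; i++ {
		time.Sleep(interval)
		code = check()
	}
	return code
}

func main() {
	attempts := 0
	// A made-up flaky check: CRITICAL twice, then OK.
	flaky := func() int {
		attempts++
		if attempts < 3 {
			return 2
		}
		return 0
	}
	code := runWithRetries(flaky, 3, 10*time.Millisecond)
	fmt.Println(code, attempts) // prints 0 3
}
```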

For remote checks, use the `check_nrpe` plugin. You also need to have the NRPE server set up correctly on the target host (out of scope for this document).

The `state.json` file mentioned above keeps track of the monitoring state and check results between Gogios runs, so that Gogios sends email notifications only when a check status changes.
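
The idea of comparing the previous state with the current one can be sketched like this. The `changes` function and the plain name-to-status maps are assumptions for illustration; the real layout of `state.json` is not described here and may well be different. Checks without a previous state are ignored in this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// changes compares the statuses of the previous run with the current
// ones and returns one "OLD->NEW: name" line per check whose status
// differs. Checks that have no previous status are skipped.
func changes(prev, curr map[string]string) []string {
	names := make([]string, 0, len(curr))
	for name := range curr {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random, so sort for stable output

	var out []string
	for _, name := range names {
		before, seen := prev[name]
		if seen && before != curr[name] {
			out = append(out, fmt.Sprintf("%s->%s: %s", before, curr[name], name))
		}
	}
	return out
}

func main() {
	prev := map[string]string{"Check ICMP4 vulcan.buetow.org": "OK"}
	curr := map[string]string{"Check ICMP4 vulcan.buetow.org": "CRITICAL"}
	for _, line := range changes(prev, curr) {
		fmt.Println(line)
	}
}
```

An empty result would then mean that there is nothing new to report.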

## Running Gogios

Now it is time to give it a first run. On OpenBSD, do:

```shell
doas -u _gogios /usr/local/bin/gogios -cfg /etc/gogios.json
```

To run Gogios via CRON on OpenBSD as the `_gogios` user and check all services every five minutes, follow these steps:

Run `doas crontab -e -u _gogios` to open the crontab file of the `_gogios` user for editing, then add the following lines:

```
*/5 8-22 * * * /usr/local/bin/gogios -cfg /etc/gogios.json
0 7 * * * /usr/local/bin/gogios -renotify -cfg /etc/gogios.json
```

Gogios is now configured to run via CRON as the `_gogios` user every five minutes from 8 am until just before 11 pm (the hour range `8-22` includes the 10 pm hour). It executes the checks and sends a report via email whenever a check status changes. Gogios also runs once at 7 am every morning with `-renotify` to re-send all unhandled alerts as a reminder.

### High-availability

To create a high-availability Gogios setup, you can install Gogios on two servers that will monitor each other using the NRPE (Nagios Remote Plugin Executor) plugin. By running Gogios in alternate CRON intervals on both servers, you can ensure that even if one server goes down, the other will continue monitoring your infrastructure and sending notifications.

* Install Gogios on both servers following the compilation and installation instructions provided earlier.
* Install the NRPE server (out of scope for this document) and plugin on both servers. This plugin allows you to execute Nagios check scripts on remote hosts.
* Configure Gogios on both servers to monitor each other using the NRPE plugin. Add a check to the Gogios configuration file (`/etc/gogios.json`) on both servers that uses the NRPE plugin to execute a check script on the other server. For example, if you have Server A and Server B, the configuration on Server A should include a check for Server B, and vice versa.
* Set up alternate CRON intervals on both servers. Configure the CRON job on Server A to run Gogios at minutes 0, 10, 20, ..., and on Server B to run at minutes 5, 15, 25, ... This will ensure that if one server goes down, the other server will continue monitoring and sending notifications. 
* Gogios doesn't support clustering. This means that when both servers are up, unhandled alerts are sent via email twice, once from each server. That's the trade-off for simplicity.

There are plans to make it possible to execute certain checks only on certain nodes (e.g. on elected leader or master nodes). This is still in progress (check out my Gorum Git project).

## Conclusion

Gogios is a lightweight and straightforward monitoring tool that is perfect for small-scale environments. With its compatibility with the Nagios Check API, email notifications, and CRON-based scheduling, Gogios offers an easy-to-use solution for those looking to monitor a limited number of resources. I personally use it to execute around 500 checks on my personal server infrastructure. I am very happy with this solution.

E-Mail your comments to `paul@nospam.buetow.org` :-)

Other KISS-related posts are:

<< template::inline::rindex kiss simple-and-stupid

=> ../ Back to the main site