path: root/internal
Age    Commit message    Author
2026-02-16fix: show OK checks regardless of stalenessv1.4.1Paul Buetow
The staleness filter was incorrectly hiding stale OK checks from the HTML report and email notifications. OK checks should always be shown since stale OK alerts are not concerning.
2026-02-08feat: add peer failover alertingv1.4.0Paul Buetow
Introduce peer URL monitoring with active/passive alert suppression, skip checks when passive, and bump version to v1.4.0.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-08feat: write JSON status report next to HTMLPaul Buetow
Add a JSON report alongside the HTML status page with matching sections and summary counts, plus a last-updated timestamp for remote consumption.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-27feat: add minimum notification interval for email batchingPaul Buetow
Add MinNotifyIntervalS config option to batch email notifications over a time interval. When configured, Gogios only sends an email when:
1. The interval has elapsed since the last notification, AND
2. There's been a state change since the last notification.
HTML status page and text reports continue updating on every run. The --force flag bypasses the interval for immediate notifications.
Notification state (timestamp + check states snapshot) is persisted to {StateDir}/notify_state.json for comparison across runs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23feat: implement version management and improved error handlingPaul Buetow
Co-authored-by: aider (ollama/qwen3-coder:latest) <aider@aider.chat>
2026-01-22add SU: (suppressed) count to status summary linePaul Buetow
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22fix OK checks incorrectly shown in suppressed sectionPaul Buetow
OK checks should appear in "OK checks" section, not "Suppressed alerts". Suppression now only applies to non-OK checks (Critical, Warning, Unknown).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21add alert suppressionPaul Buetow
2026-01-21add OnlyIfNotExists alert suppression featurePaul Buetow
Adds ability to suppress alerts during maintenance windows by checking for the existence of a file. When the file exists and is recent (within configured max age), matching alerts are excluded from email reports.
Features:
- Global PrometheusOnlyIfNotExists config for Prometheus alerts
- Per-check OnlyIfNotExists config for individual checks
- Configurable max age (default 86400s) for suppression file
- New "Suppressed alerts" section in email and HTML reports
- Suppressed checks excluded from counts and unhandled sections
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18add status page URL to email notificationsPaul Buetow
Include a configurable link to the HTML status page in email notifications. Defaults to https://gogios.buetow.org but can be customized via the StatusPageURL config option.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18exclude OK status from stale alerts reportPaul Buetow
Stale OK alerts are not concerning and shouldn't clutter the stale alerts section. Only report stale alerts with warning/critical/unknown status.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18clear resolved prometheus alerts from statePaul Buetow
When Prometheus alerts stop firing, they were previously left in state and became stale. Now they are automatically removed when no longer in the firing alerts list from Prometheus.
Also fix Magefile Openbsd target to run build before deploy sequentially instead of using mg.Deps which runs them in parallel.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
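The pruning step described here can be sketched as a pass over the persisted state. The checkState type, its FromPrometheus field and the pruneResolved function are hypothetical names chosen for illustration; how the real state marks Prometheus-sourced entries is not shown in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// checkState is a hypothetical persisted entry. FromPrometheus marks
// entries that came from alert scraping rather than a local check.
type checkState struct {
	Status         string
	FromPrometheus bool
}

// pruneResolved deletes Prometheus-sourced entries that are absent from
// the current firing set and returns the removed names, sorted.
// Entries from local checks are left alone.
func pruneResolved(state map[string]checkState, firing map[string]bool) []string {
	var removed []string
	for name, cs := range state {
		if cs.FromPrometheus && !firing[name] {
			delete(state, name) // deleting during range is safe in Go
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	state := map[string]checkState{
		"disk":     {Status: "OK"},
		"HighLoad": {Status: "WARNING", FromPrometheus: true},
		"NodeDown": {Status: "CRITICAL", FromPrometheus: true},
	}
	firing := map[string]bool{"HighLoad": true}
	fmt.Println(pruneResolved(state, firing), len(state))
}
```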
2026-01-17Add OK checks section to HTML reportPaul Buetow
2026-01-15fixPaul Buetow
2026-01-10fix prometheus handlingPaul Buetow
2026-01-10only as a warningPaul Buetow
2026-01-08Add special handling for Prometheus Watchdog alertPaul Buetow
- Treat firing Watchdog as OK status to confirm Alertmanager is working
- Treat absent/non-firing Watchdog as CRITICAL to alert on Alertmanager issues
- Add comprehensive tests for both scenarios
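The inverted semantics of the Watchdog alert can be sketched as below. The function name watchdogStatus and the plain string statuses are assumptions for illustration; only the alert name and the OK/CRITICAL mapping come from the commit message.

```go
package main

import "fmt"

// watchdogAlert is the always-firing alert named in the commit message.
const watchdogAlert = "Watchdog"

// watchdogStatus inverts the usual meaning of a firing alert:
// a firing Watchdog proves the alerting pipeline is alive (OK),
// while its absence suggests the pipeline is broken (CRITICAL).
func watchdogStatus(firing []string) string {
	for _, name := range firing {
		if name == watchdogAlert {
			return "OK"
		}
	}
	return "CRITICAL"
}

func main() {
	fmt.Println(watchdogStatus([]string{"Watchdog", "HighLoad"}))
	fmt.Println(watchdogStatus([]string{"HighLoad"}))
}
```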
2026-01-08Add Prometheus alert scraping with configurable timeout and host failoverPaul Buetow
2026-01-06Add HTML status page generation (v1.3.0)v1.3.0Paul Buetow
- Add configurable HTML status page generation after each check-run
- Default output: /var/www/htdocs/buetow.org/self/gogios/index.html
- New config options: HTMLStatusFile (path) and HTMLDisable (bool)
- Auto-creates output directory if it doesn't exist
- Uses atomic writes (tmp file + rename) to prevent corruption
- Auto-refresh every 5 minutes via meta tag
- Minimal, clean styling based on f3s_fallback template
- W3C HTML5 compliant output
- Email notifications on I/O errors via existing notifyError mechanism
- Mirrors email report structure (status changes, unhandled alerts, stale alerts)
- Comprehensive unit tests including W3C compliance validation
- All user-generated content properly HTML-escaped for security
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-11-22more on federationPaul Buetow
2025-10-27feat: Add randomSpread and RunInterval to checksPaul Buetow
This commit introduces two new optional parameters to the check configuration:
- `randomSpread`: This parameter allows specifying a random sleep time up to N seconds before a check is executed. This is useful to avoid all checks running at the same time.
- `RunInterval`: This parameter defines the minimum interval in seconds between two executions of a check. This is useful if gogios is run more frequently than a specific check should be.
The `README.md` has been updated to document these new features.
fix: Fix deadlock when skipping checks
This commit also fixes a deadlock that occurred when a check was skipped due to the `RunInterval` setting. The `inputWg.Done()` was not being called, causing the main goroutine to wait forever.
build: Replace Taskfile with Magefile
The `Taskfile.yml` has been replaced with a `Magefile.go` to manage the build process. This provides more flexibility and is more idiomatic for Go projects.
2025-06-12Add comprehensive unit tests for federated featurePaul Buetow
- Tests successful federation from remote endpoints
- Tests error handling (server errors, invalid JSON, timeouts)
- Tests multiple endpoint federation
- Tests duplicate check name conflict resolution
- Uses httptest.NewServer for realistic HTTP simulation
- Validates federated check marking and status reporting
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-01more on federatedPaul Buetow
2025-06-01more on federationPaul Buetow
2025-05-29add commentPaul Buetow
2025-05-29also show last checked ago in stale reportPaul Buetow
2025-05-29can report stale alertsPaul Buetow
2025-05-28initial merge statePaul Buetow
2025-05-28add epoch to check statusPaul Buetow
2025-05-28persisting report to a file as wellPaul Buetow
2025-05-28fix unknown handlingPaul Buetow
2024-05-02add -force flagPaul Buetow
2023-10-04wrap errorPaul Buetow
2023-07-28lowercase errorPaul Buetow
2023-06-29use log.FatalPaul Buetow
2023-06-18add vet and lint checking - fix some lint errorsPaul Buetow
2023-06-18create state dir if it doesnt exist yetPaul Buetow
2023-06-07restylePaul Buetow
2023-05-17add retry and retry interval check config optionsPaul Buetow
2023-05-01fix dependencyPaul Buetow
2023-04-25add DependsOnPaul Buetow
2023-04-23remove comments for the obviousv0.0.0Paul Buetow
2023-04-22print out when there were no status changes or there are no unhandled alertsPaul Buetow
2023-04-22dont show status change for unhandled reportPaul Buetow
2023-04-20add renotifyPaul Buetow
2023-04-20fix JSON tagsPaul Buetow
2023-04-19rename execute to runChecksPaul Buetow
2023-04-19add global timeoutPaul Buetow
2023-04-19clarifyPaul Buetow
2023-04-19add run.Paul Buetow