-rw-r--r-- README.md | 117
1 file changed, 108 insertions, 9 deletions
diff --git a/README.md b/README.md
index f142cbc..3ac637f 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,17 @@
# foostats
-Small Perl script reporting anonymous site stats for my foo.zone web and gemini capsule running on OpenBSD using `httpd` webserver and `relayd`, `inetd` + `vger` for Gemini.
+A privacy-respecting web analytics tool for OpenBSD that processes HTTP/HTTPS and Gemini protocol logs to generate anonymous site statistics. It is designed for the foo.zone ecosystem and similar sites. It reports requests, unique visitors, top URLs, and feed subscribers, and it preserves visitor privacy by hashing IP addresses with SHA3-512.
+
+## Features
+
+- **Privacy-First**: IP addresses are hashed with SHA3-512 before storage; no personal information is retained
+- **Multi-Protocol Support**: Processes both traditional web server logs (httpd) and Gemini protocol logs (vger/relayd)
+- **Distributed Architecture**: Replicates statistics between partner nodes so that reports cover the traffic served by every node
+- **Security Filtering**: Excludes and logs suspicious requests that match configurable URI patterns
+- **Comprehensive Reporting**: Generates daily, monthly, yearly, and 30-day summary reports in Gemtext format
+- **Feed Analytics**: Tracks Atom/RSS and Gemfeed subscribers
+- **IPv4/IPv6 Support**: Handles both IPv4 and IPv6 client addresses
+
## Installation
On OpenBSD, install dependencies:
@@ -11,22 +22,110 @@ doas pkg_add p5-Digest-SHA3 p5-PerlIO-gzip p5-JSON p5-String-Util p5-LWP-Protoco
## Usage
-To parse the logs, run:
+### Basic Operations
+Parse web and Gemini logs:
```sh
-doas perl foostats.pl --parse-logs
+doas perl foostats.pl --parse-logs
```
-Note, expected are the logs in OpenBSD's "forwarded" format (see `httpd.conf(5)`).
-
-To fetch logs from partner server, run:
-
+Replicate statistics from partner nodes:
```sh
doas perl foostats.pl --replicate
```
-To pretty print the (merged) logs, run:
+Generate reports from statistics:
+```sh
+doas perl foostats.pl --report
+```
+Perform all operations in sequence:
```sh
-doas perl foostats.pl --pretty-print
+doas perl foostats.pl --all
+```
+
+### Command-Line Options
+
```
+--parse-logs               Parse web and gemini logs
+--replicate                Replicate stats from partner node
+--report                   Generate a report from the stats
+--all                      Perform all of the above actions
+--stats-dir <path>         Directory to store stats files
+                           Default: /var/www/htdocs/buetow.org/self/foostats
+--odds-file <path>         File with odd URI patterns to filter
+                           Default: <stats-dir>/fooodds.txt
+--filter-log <path>        Log file for filtered requests
+                           Default: /var/log/fooodds
+--partner-node <hostname>  Hostname of the partner node for replication
+                           Default: fishfinger.buetow.org or blowfish.buetow.org
+--help                     Show help message
+```
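
The option table maps directly onto Perl's core `Getopt::Long` module. What follows is a minimal sketch under stated assumptions. It is not the parser from `foostats.pl`. The sub name `parse_options` is hypothetical, and `--all` is assumed to switch on the three individual actions. `--partner-node` has no default here because the table does not say how one of the two hosts is chosen.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option parser mirroring the table above (not foostats.pl's code).
sub parse_options {
    my @argv = @_;
    my %opt = (
        'stats-dir'  => '/var/www/htdocs/buetow.org/self/foostats',
        'filter-log' => '/var/log/fooodds',
    );
    GetOptionsFromArray(
        \@argv, \%opt,
        'parse-logs', 'replicate', 'report', 'all',
        'stats-dir=s', 'odds-file=s', 'filter-log=s', 'partner-node=s',
        'help',
    ) or die "invalid command-line options\n";

    # Assumption: --all simply enables the three individual actions.
    if ($opt{all}) {
        $opt{$_} = 1 for qw(parse-logs replicate report);
    }
    # The odds file defaults to a path inside the (possibly overridden) stats dir.
    $opt{'odds-file'} //= "$opt{'stats-dir'}/fooodds.txt";
    return \%opt;
}
```

With this approach, `parse_options(@ARGV)` returns a single hash reference. Later steps can read `$opt->{'stats-dir'}` from it without relying on global variables.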
+
+## Configuration
+
+### Log Format
+
+Logs are expected in OpenBSD httpd's "forwarded" log style (see `httpd.conf(5)`). The tool reads:
+- httpd access logs from `/var/www/logs/access.log`
+- Gemini logs from `/var/log/gemini` (vger) and `/var/log/relayd` (relayd)
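
The following minimal Perl sketch shows how a web log line might be split into fields. It makes two assumptions. First, each `forwarded` line carries the combined-format fields followed by the `X-Forwarded-For` and `X-Forwarded-Port` values. Second, when `relayd` sits in front of `httpd`, the real client is the first `X-Forwarded-For` entry. Both the field layout and the sub name `parse_line` are assumptions, so check them against `httpd.conf(5)` and your own logs. Gemini (vger) lines are not covered.

```perl
use strict;
use warnings;

# Assumed field layout of one "forwarded" httpd log line:
#   server remote - user [time] "METHOD path VERSION" status size "referer" "agent" xff xfp
my $LINE_RE = qr{
    ^(\S+) \s+ (\S+) \s+ - \s+ \S+ \s+   # server name, remote address, user
    \[ ([^\]]+) \] \s+                   # timestamp
    " (\S+) \s+ (\S+) [^"]* " \s+        # method and path from the request line
    (\d{3}) \s+ \d+ \s+                  # status code, response size
    " [^"]* " \s+ " [^"]* " \s+          # referer and user agent (ignored)
    (.+) \s+ (\S+) \s*$                  # X-Forwarded-For, X-Forwarded-Port
}x;

# Hypothetical helper: returns a hash ref of fields, or nothing if the line
# does not match the assumed layout.
sub parse_line {
    my ($line) = @_;
    my ($host, $remote, $time, $method, $path, $status, $xff) =
        $line =~ $LINE_RE or return;

    # Behind a proxy the remote address is the proxy itself, so prefer the
    # first X-Forwarded-For entry when the header was present ("-" if absent).
    my ($client) = split /,\s*/, $xff;
    $client = $remote if !defined $client || $client eq '-';

    return {
        host   => $host,
        ip     => $client,
        time   => $time,
        method => $method,
        path   => $path,
        status => $status,
    };
}
```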
+
+### Filter Configuration
+
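+Create a `fooodds.txt` file in your stats directory (or point `--odds-file` at another path) listing URI patterns, one per line. Requests whose URI matches a pattern are excluded from the statistics and recorded in the filter log (`--filter-log`). Example patterns: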
+```
+.php
+.asp
+/wp-admin
+/wordpress
+/phpmyadmin
+```
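
Below is a minimal sketch of pattern filtering. It assumes each line of `fooodds.txt` is a plain substring, not a regular expression, and that a match anywhere in the request URI counts. The helper names `load_patterns` and `odd_pattern_for` are hypothetical, and the matching rule in `foostats.pl` may differ.

```perl
use strict;
use warnings;

# Hypothetical loader: one pattern per line, surrounding whitespace and
# blank lines ignored.
sub load_patterns {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my @patterns;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        push @patterns, $line if length $line;
    }
    close $fh;
    return \@patterns;
}

# Returns the first pattern that occurs as a substring of the URI, or undef.
sub odd_pattern_for {
    my ($uri, $patterns) = @_;
    for my $p (@$patterns) {
        return $p if index($uri, $p) >= 0;
    }
    return;
}
```

Returning the matching pattern rather than a plain boolean lets the caller record why a request was filtered.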
+
+## Architecture
+
+The tool consists of several modules:
+
+- **FileHelper**: Handles JSON and gzip file I/O operations
+- **DateHelper**: Manages date-related operations
+- **Logreader**: Parses httpd and Gemini (vger/relayd) logs
+- **Filter**: Filters out suspicious requests based on patterns
+- **Aggregator**: Aggregates statistics from log entries
+- **FileOutputter**: Outputs statistics to compressed JSON files
+- **Replicator**: Replicates stats between partner nodes
+- **Merger**: Merges statistics from multiple sources
+- **Reporter**: Generates human-readable Gemtext reports
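
The **Merger** step matters for unique-visitor counts. Simply adding per-node visitor totals would count a client that reached both nodes twice. The sketch below assumes a simplified, hypothetical per-node layout: a request count, a set of hashed visitor IDs, and per-URL visitor sets. This is not the real foostats JSON schema. The sketch sums the request counts and takes the union of the visitor sets.

```perl
use strict;
use warnings;

# Hypothetical per-node layout (not the actual foostats schema):
#   { requests => 12, visitors => { HASH => 1 }, urls => { PATH => { HASH => 1 } } }
sub merge_stats {
    my @nodes = @_;
    my %merged = (requests => 0, visitors => {}, urls => {});
    for my $s (@nodes) {
        # Request counts from different nodes simply add up.
        $merged{requests} += $s->{requests} // 0;
        # Visitor sets are unioned so a client seen on two nodes counts once.
        $merged{visitors}{$_} = 1 for keys %{ $s->{visitors} // {} };
        for my $url (keys %{ $s->{urls} // {} }) {
            $merged{urls}{$url}{$_} = 1 for keys %{ $s->{urls}{$url} };
        }
    }
    return \%merged;
}

sub unique_visitors { return scalar keys %{ $_[0]{visitors} } }
```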
+
+## Output
+
+### Statistics Files
+
+Statistics are stored as gzip-compressed JSON files in the stats directory:
+- Daily stats: `YYYY-MM-DD-hostname.json.gz`, one file per node and day
+- Each file holds aggregated data: unique visitors, request counts, filtered requests, top URLs, and feed subscribers
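
Below is a minimal sketch of writing and reading one daily stats file. The tool itself depends on `PerlIO::gzip` and `JSON`. This sketch swaps in the core modules `IO::Compress::Gzip`, `IO::Uncompress::Gunzip`, and `JSON::PP` so that it runs on a stock Perl. The sub names and the example data are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Builds the daily file name described above: YYYY-MM-DD-hostname.json.gz
sub stats_file {
    my ($dir, $date, $host) = @_;
    return "$dir/$date-$host.json.gz";
}

# Encodes the stats as UTF-8 JSON (sorted keys) and gzips it to $file.
sub write_stats {
    my ($file, $stats) = @_;
    my $json = JSON::PP->new->utf8->canonical->encode($stats);
    gzip(\$json => $file) or die "gzip failed: $GzipError\n";
}

# Reverses write_stats: gunzips $file and decodes the JSON.
sub read_stats {
    my ($file) = @_;
    my $json;
    gunzip($file => \$json) or die "gunzip failed: $GunzipError\n";
    return JSON::PP->new->utf8->decode($json);
}
```

Sorting the keys with `canonical` is a small design choice. It keeps files with the same content byte-identical, which makes them easier to compare across nodes.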
+
+### Reports
+
+Gemtext reports are written to `stats/gemtext/`:
+- Daily reports: `YYYY-MM-DD.gmi`
+- Monthly reports: `YYYY-MM.gmi`
+- 30-day summary: `30-day-summary.gmi`
+- Yearly reports: `YYYY.gmi`
+
+Reports include:
+- Total requests and unique visitors
+- Protocol breakdown (HTTP vs Gemini, IPv4 vs IPv6)
+- Top hosts and URLs by unique visitors
+- Feed subscriber counts
+- Filtered/suspicious request statistics
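
Gemtext is line-oriented. Lines starting with `#` are headings, `* ` lines are list items, and `=>` lines are links. A minimal, hypothetical renderer for one report section might look like the sketch below. The sub name `render_report`, its arguments, and the layout are assumptions, not the exact output of the **Reporter** module.

```perl
use strict;
use warnings;

# Hypothetical renderer: turns already-aggregated numbers into Gemtext.
sub render_report {
    my ($title, $totals, $top_urls, $limit) = @_;
    $limit //= 10;
    my @out = ("# $title", '', '## Summary', '');
    push @out, "* Total requests: $totals->{requests}";
    push @out, "* Unique visitors: $totals->{visitors}";
    push @out, '', '## Top URLs by unique visitors', '';

    # Rank by visitor count (descending), then by URL for a stable order.
    my @ranked = sort { $top_urls->{$b} <=> $top_urls->{$a} || $a cmp $b }
                 keys %$top_urls;
    splice @ranked, $limit if @ranked > $limit;
    push @out, sprintf('* %s: %d', $_, $top_urls->{$_}) for @ranked;

    return join("\n", @out) . "\n";
}
```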
+
+## Privacy Considerations
+
+- IP addresses are immediately hashed using SHA3-512
+- No cookies or tracking scripts
+- Only aggregated statistics are stored
+- Individual user behavior is not tracked
+- Excessive requests (>1/second) are filtered
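
Below is a minimal sketch of the first and last points above, using `Digest::SHA3` (installed as `p5-Digest-SHA3`). The address family is recorded before hashing because it cannot be recovered from the digest afterwards. This is what still allows an IPv4 vs IPv6 breakdown. The rate rule is an assumed reading of ">1/second": every request after the first one from the same hashed client within the same second is filtered. The sub names are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA3 qw(sha3_512_hex);

# Returns (address family, SHA3-512 hex digest). The family is derived from
# the raw address first, since the digest reveals nothing about it.
sub anonymize_ip {
    my ($ip) = @_;
    my $family = index($ip, ':') >= 0 ? 'IPv6' : 'IPv4';
    return ($family, sha3_512_hex($ip));
}

# Hypothetical rate filter: returns a closure that answers "filter this
# request?" for a (client hash, epoch second) pair. The counts live only for
# the lifetime of the closure, e.g. one log-parsing run.
sub make_rate_filter {
    my %seen;   # "hash:second" => number of requests seen so far
    return sub {
        my ($hash, $epoch) = @_;
        return ++$seen{"$hash:$epoch"} > 1;
    };
}
```

In this sketch the hash is unsalted, so the same address always maps to the same digest. That property is what lets unique visitors be counted and merged across nodes.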
+
+## License
+
+BSD 3-Clause License (see LICENSE file)