authorPaul Buetow <paul@buetow.org>2021-12-15 16:06:48 +0000
committerPaul Buetow <paul@buetow.org>2021-12-16 09:22:35 +0000
commit895ed15df5144e367a5143d1c36d8abe2fec8f08 (patch)
tree027e080ea75a8d0f3bb0030194558c13ce1f3ccb /doc
parentb1f3760dc2f452c3dba7883a538fd14d62a581e9 (diff)
documenting how to implement a custom log format
Diffstat (limited to 'doc')
-rw-r--r--doc/index.md2
-rw-r--r--doc/logformats.md100
-rw-r--r--doc/querylanguage.md22
-rw-r--r--doc/testing.md12
4 files changed, 96 insertions, 40 deletions
diff --git a/doc/index.md b/doc/index.md
index 2fc790e..565253b 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -9,6 +9,6 @@ DTail Documentation
## Advanced topics
-* The [DTail Query Language](./querylanguage.md) is the starting point to dig deeper into DTail's own SQL-like mapreduce language for extracting stats from log files.
+* The [DTail Query Language](./querylanguage.md) is the starting point to dig deeper into DTail's own SQL-like mapreduce language for extracting and aggregating stats from log files.
* [Log Formats](./logformats.md) explains how to create your own custom log format for use with mapreduce queries.
* Check out the [Testing Guide](./testing.md) for unit and integration testing.
diff --git a/doc/logformats.md b/doc/logformats.md
index 06fff76..dd49c7c 100644
--- a/doc/logformats.md
+++ b/doc/logformats.md
@@ -1,19 +1,17 @@
Log Formats
===========
-You may have looked at the [DTail Query Language](./querylanguage.md) and wondered how to make DTail understand your own log formats. Otherwise, DTail won't be able to extract information from your logs (e.g. extract fields and variables from your log lines to be used in the query language).
+You may have looked at the [DTail Query Language](./querylanguage.md) and wondered how to make DTail understand your own log format(s). If DTail doesn't know your log format, it won't be able to extract much useful information from your logs. That information can then be used as fields (e.g. variables) by the Query Language.
-You could either make your application follow the DTail default log format, or you would need to implement a custom log format in Go.
+You could either make your application follow the DTail default log format, or you would need to implement a custom log format. Have a look at `./integrationtests/mapr_testdata.log` for an example of a log file in the DTail default format.
-## Current log formats
+## Available log formats
The following log formats are currently available out of the box:
-* `default` - The default DTail log format.
-* `generic` - A generic log format with a very simple set of fields.
-* `generickv` - A simple log format expecting all log lines in form of `field1=value1|field2=value2|...`.
-
-For details, have a look at the implementations at `./internal/mapr/logformat/`.
+* `default` - The default DTail log format
+* `generic` - A generic log format with a very simple set of fields
+* `generickv` - A simple log format expecting all log lines in form of `field1=value1|field2=value2|...`
### Selecting a log format
@@ -23,43 +21,75 @@ By default, DTail will use the `default` log format. You can override the log fo
% dmap --files /var/log/example.log --query 'from EXAMPLE select ....queryhere.... logformat generickv'
```
-Alternatively, you can override the default log format via `MapreduceLogFormat` in the Server section of `dtail.json`.
+Alternatively, you can override the default log format with `MapreduceLogFormat` in the Server section of `dtail.json`.
-## Log format fields
+## Under the hood: generickv
-TODO: Difference between field and variables.
+As an example, let's have a look at the `generickv` log format's implementation. It's located at `internal/mapr/logformat/generickv.go`:
-## Log format variables
+```go
+// MakeFieldsGENERICKV is the generic key-value logfile parser.
+func (p *Parser) MakeFieldsGENERICKV(maprLine string) (map[string]string, error) {
+ splitted := strings.Split(maprLine, protocol.FieldDelimiter)
+ fields := make(map[string]string, len(splitted))
+
+ fields["*"] = "*"
+ fields["$line"] = maprLine
+ fields["$empty"] = ""
+ fields["$hostname"] = p.hostname
+ fields["$server"] = p.hostname
+ fields["$timezone"] = p.timeZoneName
+ fields["$timeoffset"] = p.timeZoneOffset
+
+ for _, kv := range splitted[0:] {
+ keyAndValue := strings.SplitN(kv, "=", 2)
+ if len(keyAndValue) != 2 {
+ // dlog.Common.Debug("Unable to parse key-value token, ignoring it", kv)
+ continue
+ }
+ fields[strings.ToLower(keyAndValue[0])] = keyAndValue[1]
+ }
+
+ return fields, nil
+}
+```
+
+... where:
-This is the list of pre-defined variables. Please note that these vary depending on the log format used.
+* `maprLine` is the whole raw log line to be parsed by the log format.
+* `protocol.FieldDelimiter` is the field delimiter used by the log format, here: `|`.
+* All field names starting with `$` are variables. They store some custom values.
+* All other fields are bareword-fields and are extracted from the log lines directly, e.g. `field1=value1|field2=value2|...`
+
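The bareword extraction above can be tried out in isolation. The following is a minimal, self-contained sketch of the same key-value splitting logic (not DTail's actual code; the function name `parseKV` and the hard-coded `|` delimiter are illustrative assumptions):

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV mimics the generickv field extraction: split the raw line on the
// field delimiter, then split each token into key and value at the first "=".
func parseKV(line string) map[string]string {
	tokens := strings.Split(line, "|")
	fields := make(map[string]string, len(tokens))
	for _, kv := range tokens {
		keyAndValue := strings.SplitN(kv, "=", 2)
		if len(keyAndValue) != 2 {
			continue // Ignore tokens without a key-value structure.
		}
		fields[strings.ToLower(keyAndValue[0])] = keyAndValue[1]
	}
	return fields
}

func main() {
	fields := parseKV("Foo=bar|baz=42|garbage")
	fmt.Println(fields["foo"], fields["baz"])
}
```

Note how keys are lowercased on insertion, so `Foo=bar` becomes queryable as the bareword field `foo`, and how malformed tokens are silently skipped.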
+## Log format variables
### Common variables:
-The common variables may exist in all log formats.
+The common variables may exist in all log formats:
* `$empty` - The empty string `""`
* `$hostname` - The server FQDN
-* `$line` - The current log line
+* `$line` - The whole log line
* `$server` - Alias for `$hostname`
* `$timeoffset` - Offset of $timezone
* `$timezone` - The current time zone
-* `*` - Special placeholder
+* `*` - Special placeholder, e.g. used by the query language to group by everything.
### Default log format variables:
-These variables may only exist when your logs are in the DTail default log format:
+These variables may only exist in the DTail default log format (see `internal/mapr/logformat/default.go` for more details):
*Date and time:*
-* `$hour` - The current hour in format HH
-* `$minute` - The current minute in format MM
-* `$second` - The current second in format SS.
-* `$time` - The current time in format YYYYMMDD-HHMMSS
+* `$hour` - The hour in format HH
+* `$minute` - The minute in format MM
+* `$second` - The second in format SS
+* `$time` - The time in format YYYYMMDD-HHMMSS
*Log level/severity:*
* `$loglevel` - Alias for `$severity`
-* `$severity` - The log severity
+* `$severity` - The log severity, one of `FATAL`, `ERROR`, `WARN`, `INFO`, `VERBOSE`, `DEBUG`, `DEVEL`, `TRACE`
*System and Go runtime:*
@@ -67,6 +97,30 @@ These variables may only exist when your logs are in the DTail default log forma
* `$cgocalls` - Num of DTail server CGo calls
* `$cpus` - Num of DTail server CPUs used
* `$goroutines` - Num of DTail server Goroutines used
-* `$loadavg` - 1 min. average load average
+* `$loadavg` - 1 min. load average
* `$pid` - DTail server process ID
* `$uptime` - DTail server uptime
+
+## Implementing your own log format
+
+All that needs to be done is to place your own implementation into the `logformat` source directory. As a template, you can copy an existing format ...
+
+```shell
+% cp internal/mapr/logformat/generic.go internal/mapr/logformat/yourcustomformat.go
+```
+
+... and replace `GENERIC` with your format's name in capital letters (DTail looks up the log format parser method via reflection by its name, so it is important to name it correctly):
+
+```go
+// MakeFieldsCUSTOMLOGFORMAT is your own custom log format.
+func (p *Parser) MakeFieldsCUSTOMLOGFORMAT(maprLine string) (map[string]string, error) {
+	fields := make(map[string]string)
+	// .. Your own format implementation goes here:
+	// .. parse maprLine and store the extracted values into the fields map.
+	return fields, nil
+}
+```
+
+Once done, recompile DTail. DTail now understands `... logformat customlogformat` (see "Selecting a log format" above).
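Since the parser method is resolved by name, the dispatch can be pictured roughly as follows. This is a hedged, self-contained sketch under stated assumptions: the `lookupParser` helper and the exact dispatch details are illustrative, not DTail's actual implementation.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Parser carries per-host parsing state; parser methods hang off it.
type Parser struct{}

// MakeFieldsGENERICKV is one registered log format parser.
func (p *Parser) MakeFieldsGENERICKV(maprLine string) (map[string]string, error) {
	return map[string]string{"$line": maprLine}, nil
}

// lookupParser resolves a log format name such as "generickv" to the
// corresponding MakeFields<NAME> method via reflection.
func lookupParser(p *Parser, logformat string) (func(string) (map[string]string, error), error) {
	name := "MakeFields" + strings.ToUpper(logformat)
	method := reflect.ValueOf(p).MethodByName(name)
	if !method.IsValid() {
		return nil, fmt.Errorf("unknown log format: %s", logformat)
	}
	fn, ok := method.Interface().(func(string) (map[string]string, error))
	if !ok {
		return nil, fmt.Errorf("unexpected parser signature for %s", name)
	}
	return fn, nil
}

func main() {
	fn, err := lookupParser(&Parser{}, "generickv")
	if err != nil {
		panic(err)
	}
	fields, _ := fn("foo=bar")
	fmt.Println(fields["$line"])
}
```

This illustrates why the method name matters: a misspelled `MakeFields<NAME>` simply won't be found when `logformat <name>` is requested.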
diff --git a/doc/querylanguage.md b/doc/querylanguage.md
index 725b635..41e95de 100644
--- a/doc/querylanguage.md
+++ b/doc/querylanguage.md
@@ -1,34 +1,34 @@
DTail Query Language
====================
-The query language allows you to run mapreduce queries on log files. This page intends to be a reference to the language.
+The query language allows you to run mapreduce queries on log files. This page is the reference for the language.
## Prerequisites
For this to work, DTail needs to understand your log format. DTail already understands its own log format. Have a look at the examples on the [examples](./examples.md) page that use `-query` (all of the `dmap` examples, and some of the `dtail` examples).
-DTail also ships with a generic log format, which only allows you to run very basic queries. Check out the [log format](./logformats.md) documentation for this. To implement your own log format, please also check out the log format documentation.
+DTail also ships with a generic log format, which only allows you to run very basic queries. Check out the [log format](./logformats.md) documentation for this. That page also documents how to implement your own log format parser.
## The language
-These are the fundamental types of the query language:
+These are the fundamental types of the query language:
```shell
NUMBER := A whole number (e.g. 42)
FLOAT := A float number, e.g. 3.14
STRING := A quoted string, e.g. "foo"
-FIELD := BAREWORD|VARIABLE
+FIELD := BAREWORD|$VARIABLE
BAREWORD := A bare string without quotes, e.g. foo. This usually contains a value
extracted from a log line.
-VARIABLE := Like a bareword, but with a $ prefix, e.g. $foo. This usually contains
+$VARIABLE := Like a bareword, but with a $ prefix, e.g. $foo. This usually contains
a special value set by DTail itself (not necessarily from the log line).
```
This is the overall structure of a query:
```shell
-QUERY := from TABLE
- select SELECT1[,SELECT2...]
+QUERY := select SELECT1[,SELECT2...]
+ [from TABLE]
[where CONDITION1[,CONDITION2...]]
[group by FIELD1[,FIELD2...]]
[order|rorder by ORDERFIELD]
@@ -39,7 +39,7 @@ QUERY := from TABLE
[logformat LOGFORMAT]
```
-Whereas....
+... where:
```shell
TABLE := The mapreduce table name, e.g. STATS in MAPREDUCE:STATS
@@ -50,7 +50,7 @@ OPERATOR := FLOATOPERATOR|STRINGOPERATOR
FLOATOPERATOR := One of: == != < <= > >=
STRINGOPERATOR := eq|ne|contains|ncontains|lacks|hasprefix|nhasprefix|hassuffix|nhassuffix
ORDERFIELD := FIELD|AGGREGATION(FIELD)
-SET := VARIABLE = FLOAT|STRING|FIELD|FUNCTION(FIELD)
+SET := $VARIABLE = FLOAT|STRING|FIELD|FUNCTION(FIELD)
LOGFORMAT := default|generic|generickv|...
AGGREGATION := count|sum|min|max|avg|last|len
FUNCTION := md5sum|maskdigits
@@ -58,6 +58,6 @@ FUNCTION := md5sum|maskdigits
*Notes:*
-* `lacks` is an alias for `ncontains` (not contains)
-* `rorder` stands for reverse order and is the inverse of `order`
+* `rorder` stands for reverse order.
+* `lacks` is an alias for `ncontains` (not contains).
* Available fields (variables and barewords) vary depending on the log format used. Check out the [log format](./logformats.md) documentation for more information.
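Putting the grammar together, a complete query might look like the following (the bareword field `latency` is a hypothetical example, not a predefined field):

```shell
% dmap --files /var/log/example.log \
    --query 'select $hostname,count($line),avg(latency)
             where latency > 0.5
             group by $hostname
             rorder by count($line)
             limit 10
             logformat generickv'
```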
diff --git a/doc/testing.md b/doc/testing.md
index 0e802a7..123a5c3 100644
--- a/doc/testing.md
+++ b/doc/testing.md
@@ -7,7 +7,9 @@ Currently, there are 3 different ways of how DTail can be tested.
2. Integration tests (automatic)
3. Semi-manual tests with DTail server instances running in Docker.
-Also, not actually testing, DTail is being linted and vetted before each release. For this run `make lint` and `make vet` at the top level source directory.
+## Quality control
+
+While not testing per se, DTail is also linted and vetted before each release. For this, run `make lint` and `make vet` in the top-level source directory. Furthermore, to improve the quality of the software even more, the code is scanned by SonarQube and Black Duck periodically. DTail is also audited and pen-tested by Mimecast staff. And, of course, new features are peer reviewed as well.
## Unit tests
@@ -21,7 +23,7 @@ It will run unit tests for each source directory one after another and abort imm
## Integration tests
-Other than the unit tests, which only test the internal code, the integration tests will run a set of DTail commands externally and thus simulating common end user use cases.
+Unlike the unit tests, which only test the internal code, the integration tests run a set of DTail commands externally, thus simulating common end user scenarios.
This means that you will need to compile all DTail binaries prior to running these tests:
@@ -38,7 +40,7 @@ The integration tests can be enabled setting the following environment variable:
% export DTAIL_INTEGRATION_TEST_RUN_MODE=yes
```
-To run the integration test together with all the unit tests simply run `make test` in the top level source tree. In case you only want to run the integration tests without the normal unit tests, then just do:
+To run the integration test together with all the unit tests, simply run `make test` in the top level source tree. In case you only want to run the integration tests without the normal unit tests, then just do:
```shell
% go clean -testcache
@@ -51,7 +53,7 @@ To run the integration test together with all the unit tests simply run `make te
### Requirements
-This assumes, that you have Docker up and running on your system. The following has been tested only on Fedora Linux 35. For other versions of Fedora (or Linux) you might need to change the Docker base image used (see Dockerfile) as otherwise you might run into issues with the `GLIBC` major version used.
+This assumes that you have Docker up and running on your system. The following has been tested only on Fedora Linux 35. For other versions of Fedora (or Linux) you might need to change the Docker base image used (see Dockerfile), as otherwise you might run into issues with a different `GLIBC` major version.
This also assumes that you have compiled all the DTail binaries already (with `make` in the top level source directory).
@@ -81,7 +83,7 @@ make: Leaving directory '/home/paul/git/dtail/docker'
### Starting a DTail server farm
-To spin up 10 instances of `dserver` run:
+To spin up 10 instances of the `dserver` Docker image, run:
```shell
% make -C docker spinup