Diffstat (limited to 'gemfeed')
| -rw-r--r-- | gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.md | 1 |
| -rw-r--r-- | gemfeed/2022-03-06-the-release-of-dtail-4.0.0.md | 1 |
| -rw-r--r-- | gemfeed/2022-10-30-installing-dtail-on-openbsd.md | 1 |
| -rw-r--r-- | gemfeed/2023-09-25-dtail-usage-examples.md | 244 |
| -rw-r--r-- | gemfeed/DRAFT-dtail-usage-examples.md | 230 |
| -rw-r--r-- | gemfeed/DRAFT-site-reliability-engineering.md | 67 |
| -rw-r--r-- | gemfeed/W | 228 |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dcat.gif | bin | 0 -> 602213 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dgrep.gif | bin | 0 -> 1309227 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dmap.gif | bin | 0 -> 1154423 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dtail-map.gif | bin | 0 -> 298895 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dtail-map2.gif | bin | 0 -> 271416 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/dtail.gif | bin | 0 -> 2290260 bytes |
| -rw-r--r-- | gemfeed/dtail-usage-examples/testing.gif | bin | 0 -> 2637253 bytes |
| -rw-r--r-- | gemfeed/index.md | 1 |
15 files changed, 759 insertions, 14 deletions
diff --git a/gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.md b/gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.md
index 9fe46a5e..8a88d1f5 100644
--- a/gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.md
+++ b/gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.md
@@ -108,6 +108,7 @@ Other related posts are:

 [2021-04-22 DTail - The distributed log tail program (You are currently reading this)](./2021-04-22-dtail-the-distributed-log-tail-program.md)
 [2022-03-06 The release of DTail 4.0.0](./2022-03-06-the-release-of-dtail-4.0.0.md)
 [2022-10-30 Installing DTail on OpenBSD](./2022-10-30-installing-dtail-on-openbsd.md)
+[2023-09-25 DTail usage examples](./2023-09-25-dtail-usage-examples.md)

 E-Mail your comments to `foo@paul.cyou` :-)
diff --git a/gemfeed/2022-03-06-the-release-of-dtail-4.0.0.md b/gemfeed/2022-03-06-the-release-of-dtail-4.0.0.md
index b40dee1f..7400ba78 100644
--- a/gemfeed/2022-03-06-the-release-of-dtail-4.0.0.md
+++ b/gemfeed/2022-03-06-the-release-of-dtail-4.0.0.md
@@ -291,6 +291,7 @@ Other related posts are:

 [2021-04-22 DTail - The distributed log tail program](./2021-04-22-dtail-the-distributed-log-tail-program.md)
 [2022-03-06 The release of DTail 4.0.0 (You are currently reading this)](./2022-03-06-the-release-of-dtail-4.0.0.md)
 [2022-10-30 Installing DTail on OpenBSD](./2022-10-30-installing-dtail-on-openbsd.md)
+[2023-09-25 DTail usage examples](./2023-09-25-dtail-usage-examples.md)

 Thanks!
diff --git a/gemfeed/2022-10-30-installing-dtail-on-openbsd.md b/gemfeed/2022-10-30-installing-dtail-on-openbsd.md index 61191976..873173ce 100644 --- a/gemfeed/2022-10-30-installing-dtail-on-openbsd.md +++ b/gemfeed/2022-10-30-installing-dtail-on-openbsd.md @@ -344,6 +344,7 @@ Other related posts are: [2021-04-22 DTail - The distributed log tail program](./2021-04-22-dtail-the-distributed-log-tail-program.md) [2022-03-06 The release of DTail 4.0.0](./2022-03-06-the-release-of-dtail-4.0.0.md) [2022-10-30 Installing DTail on OpenBSD (You are currently reading this)](./2022-10-30-installing-dtail-on-openbsd.md) +[2023-09-25 DTail usage examples](./2023-09-25-dtail-usage-examples.md) E-Mail your comments to `foo@paul.cyou` :-) diff --git a/gemfeed/2023-09-25-dtail-usage-examples.md b/gemfeed/2023-09-25-dtail-usage-examples.md new file mode 100644 index 00000000..d077ae3e --- /dev/null +++ b/gemfeed/2023-09-25-dtail-usage-examples.md @@ -0,0 +1,244 @@ +# DTail usage examples + +> Published at 2023-09-25T14:57:42+03:00 + +Hey there. As I am pretty busy this month personally (I am now on Paternity Leave) and as I still want to post once monthly, the blog post of this month will only be some DTail usage examples. They're from the DTail documentation, but not all readers of my blog may be aware of those! + +DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once which I programmed in Go. + +[https://dtail.dev](https://dtail.dev) + +``` + ,_---~~~~~----._ + _,,_,*^____ _____``*g*\"*, + ____ _____ _ _ / __/ /' ^. / \ ^@q f + | _ \_ _|_ _(_) | @f | ((@| |@)) l 0 _/ + | | | || |/ _` | | | \`/ \~____ / __ \_____/ \ + | |_| || | (_| | | | | _l__l_ I + |____/ |_|\__,_|_|_| } [______] I + ] | | | | + ] ~ ~ | + | Let's tail those logs! | + | | +``` + +DTail consists out of a server and several client binaries. In this post, I am showcasing their use! 
+ +* Use `dtail` to follow logs +* Use `dtail` to aggregate logs while they are followed +* Use `dcat` to display logs and other text files already written +* Use `dgrep` to grep (search) logs and other text files already written +* Use `dmap` to aggregate logs and other text files already written +* `dserver` is the DTail server, where all the clients can connect to + +## Following logs + +The following example demonstrates how to follow logs of several servers at once. The server list is provided as a flat text file. The example filters all records containing the string `INFO`. Any other Go compatible regular expression can also be used instead of `INFO`. + +```shell +% dtail --servers serverlist.txt --grep INFO --files "/var/log/dserver/*.log" +``` + +Hint: you can also provide a comma separated server list, e.g.: `servers server1.example.org,server2.example.org:PORT,...` + +[](./dtail-usage-examples/dtail.gif) + +> Hint: You can also use the shorthand version (omitting the `--files`) + +```shell +% dtail --servers serverlist.txt --grep INFO "/var/log/dserver/*.log" +``` + +## Aggregating logs + +To run ad-hoc map-reduce aggregations on newly written log lines you must add a query. The following example follows all remote log lines and prints out every few seconds the result to standard output. + +> Hint: To run a map-reduce query across log lines written in the past, please use the `dmap` command instead. + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Beware: For map-reduce queries to work, you have to ensure that DTail supports your log format. Check out the documentaiton of the DTail query language and the DTail log formats on the DTail homepage for more information. 
+ +[](./dtail-usage-examples/dtail-map.gif) + +> Hint: You can also use the shorthand version: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Here is another example: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +[](./dtail-usage-examples/dtail-map2.gif) + +You can also continuously append the results to a CSV file by adding `outfile append filename.csv` to the query: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select ... outfile append result.csv' +``` + +## How to use `dcat` + +The following example demonstrates how to cat files (display the full content of the files) on several servers at once. + +As you can see in this example, a DTail client also creates a local log file of all received data in `~/log`. You can also use the `noColor` and `-plain` flags (this all also work with other DTail commands than `dcat`). + +```shell +% dcat --servers serverlist.txt --files /etc/hostname +``` + +[](./dtail-usage-examples/dcat.gif) + +> Hint: You can also use the shorthand version: + +```shell +% dcat --servers serverlist.txt /etc/hostname +``` + +## How to use `dgrep` + +The following example demonstrates how to grep files (display only the lines which match a given regular expression) of multiple servers at once. In this example, we look after some entries in `/etc/passwd`. This time, we don't provide the server list via an file but rather via a comma separated list directly on the command line. We also explore the `-before`, `-after` and `-max` flags (see animation). 
+ +```shell +% dgrep --servers server1.example.org:2223 \ + --files /etc/passwd \ + --regex nologin +``` + +Generally, `dgrep` is also a very useful way to search historic application logs for certain content. + +[](./dtail-usage-examples/dgrep.gif) + +> Hint: `-regex` is an alias for `-grep`. + +## How to use `dmap` + +To run a map-reduce aggregation over logs written in the past, the `dmap` command can be used. The following example aggregates all map-reduce fields `dmap` will print interim results every few seconds. You can also write the result to an CSV file by adding `outfile result.csv` to the query. + +```shell +% dmap --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +Remember: For that to work, you have to make sure that DTail supports your log format. You can either use the ones already defined in `internal/mapr/logformat` or add an extension to support a custom log format. The example here works out of the box though, as DTail understands its own log format already. + +[](./dtail-usage-examples/dmap.gif) + +## How to use the DTail serverless mode + +Until now, all examples so far required to have remote server(s) to connect to. That makes sense, as after all DTail is a *distributed* tool. However, there are circumstances where you don't really need to connect to a server remotely. For example, you already have a login shell open to the server an all what you want is to run some queries directly on local log files. + +The serverless mode does not require any `dserver` up and running and therefore there is no networking/SSH involved. + +All commands shown so far also work in a serverless mode. All what needs to be done is to omit a server list. The DTail client then starts in serverless mode. 
+ +### Serverless map-reduce query + +The following `dmap` example is the same as the previously shown one, but the difference is that it operates on a local log file directly: + +```shell +% dmap --files /var/log/dserver/dserver.log + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +As a shorthand version the following command can be used: + +```shell +% dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' \ + /var/log/dsever/dserver.log +``` + +You can also use a file input pipe as follows: + +```shell +% cat /var/log/dserver/dserver.log | \ + dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +### Aggregating CSV files + +In essence, this works exactly like aggregating logs. All files operated on must be valid CSV files and the first line of the CSV must be the header. E.g.: + +```shell +% cat example.csv +name,lastname,age,profession +Michael,Jordan,40,Basketball player +Michael,Jackson,100,Singer +Albert,Einstein,200,Physician +% dmap --query 'select lastname,name where age > 40 logformat csv outfile result.csv' example.csv +% cat result.csv +lastname,name +Jackson,Michael +Einstein,Albert +``` + +DMap can also be used to query and aggregate CSV files from remote servers. + +### Other serverless commands + +The serverless mode works transparently with all other DTail commands. Here are some examples: + +```shell +% dtail /var/log/dserver/dserver.log +``` + +```shell +% dtail --logLevel trace /var/log/dserver/dserver.log +``` + +```shell +% dcat /etc/passwd +``` + +```shell +% dcat --plain /etc/passwd > /etc/test +# Should show no differences. 
+diff /etc/test /etc/passwd +``` + +```shell +% dgrep --regex ERROR --files /var/log/dserver/dsever.log +``` + +```shell +% dgrep --before 10 --after 10 --max 10 --grep ERROR /var/log/dserver/dsever.log +``` + +Use `--help` for more available options. Or go to the DTail page for more information! Hope you find DTail useful! + +Other related posts are: + +[2021-04-22 DTail - The distributed log tail program](./2021-04-22-dtail-the-distributed-log-tail-program.md) +[2022-03-06 The release of DTail 4.0.0](./2022-03-06-the-release-of-dtail-4.0.0.md) +[2022-10-30 Installing DTail on OpenBSD](./2022-10-30-installing-dtail-on-openbsd.md) +[2023-09-25 DTail usage examples (You are currently reading this)](./2023-09-25-dtail-usage-examples.md) + +I hope you find the tools presented in this post useful! + +Paul + +E-Mail your comments to `foo@paul.cyou` :-) + +[Back to the main site](../) diff --git a/gemfeed/DRAFT-dtail-usage-examples.md b/gemfeed/DRAFT-dtail-usage-examples.md new file mode 100644 index 00000000..bd0d9f10 --- /dev/null +++ b/gemfeed/DRAFT-dtail-usage-examples.md @@ -0,0 +1,230 @@ +# DTail usage examples + +DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once which I programmed in Go. + +[https://dtail.dev](https://dtail.dev) + +``` + ,_---~~~~~----._ + _,,_,*^____ _____``*g*\"*, + ____ _____ _ _ / __/ /' ^. / \ ^@q f + | _ \_ _|_ _(_) | @f | ((@| |@)) l 0 _/ + | | | || |/ _` | | | \`/ \~____ / __ \_____/ \ + | |_| || | (_| | | | | _l__l_ I + |____/ |_|\__,_|_|_| } [______] I + ] | | | | + ] ~ ~ | + | Let's tail those logs! | + | | +``` + +DTail consists out of a server and several client binaries. In this post, I am showcasing their use! + +## Following logs + +The following example demonstrates how to follow logs of several servers at once. The server list is provided as a flat text file. The example filters all records containing the string `INFO`. 
Any other Go compatible regular expression can also be used instead of `INFO`. + +```shell +% dtail --servers serverlist.txt --grep INFO --files "/var/log/dserver/*.log" +``` + +Hint: you can also provide a comma separated server list, e.g.: `servers server1.example.org,server2.example.org:PORT,...` + +[](./dtail-usage-examples/dtail.gif) + +> Hint: You can also use the shorthand version (omitting the `--files`) + +```shell +% dtail --servers serverlist.txt --grep INFO "/var/log/dserver/*.log" +``` + +## Aggregating logs + +To run ad-hoc map-reduce aggregations on newly written log lines you must add a query. The following example follows all remote log lines and prints out every few seconds the result to standard output. + +> Hint: To run a map-reduce query across log lines written in the past, please use the `dmap` command instead. + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Beware: For map-reduce queries to work, you have to ensure that DTail supports your log format. Check out the documentaiton of the DTail query language and the DTail log formats on the DTail homepage for more information. 
+ +[](./dtail-usage-examples/dtail-map.gif) + +> Hint: You can also use the shorthand version: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Here is another example: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +[](./dtail-usage-examples/dtail-map2.gif) + +You can also continuously append the results to a CSV file by adding `outfile append filename.csv` to the query: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select ... outfile append result.csv' +``` + +## How to use `dcat` + +The following example demonstrates how to cat files (display the full content of the files) on several servers at once. + +As you can see in this example, a DTail client also creates a local log file of all received data in `~/log`. You can also use the `noColor` and `-plain` flags (this all also work with other DTail commands than `dcat`). + +```shell +% dcat --servers serverlist.txt --files /etc/hostname +``` + +[](./dtail-usage-examples/dcat.gif) + +> Hint: You can also use the shorthand version: + +```shell +% dcat --servers serverlist.txt /etc/hostname +``` + +## How to use `dgrep` + +The following example demonstrates how to grep files (display only the lines which match a given regular expression) of multiple servers at once. In this example, we look after some entries in `/etc/passwd`. This time, we don't provide the server list via an file but rather via a comma separated list directly on the command line. We also explore the `-before`, `-after` and `-max` flags (see animation). 
+ +```shell +% dgrep --servers server1.example.org:2223 \ + --files /etc/passwd \ + --regex nologin +``` + +Generally, `dgrep` is also a very useful way to search historic application logs for certain content. + +[](./dtail-usage-examples/dgrep.gif) + +Hint: `-regex` is an alias for `-grep`. + +## How to use `dmap` + +To run a map-reduce aggregation over logs written in the past, the `dmap` command can be used. The following example aggregates all map-reduce fields `dmap` will print interim results every few seconds. You can also write the result to an CSV file by adding `outfile result.csv` to the query. + +```shell +% dmap --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +Remember: For that to work, you have to make sure that DTail supports your log format. You can either use the ones already defined in `internal/mapr/logformat` or add an extension to support a custom log format. The example here works out of the box though, as DTail understands its own log format already. + +[](./dtail-usage-examples/dmap.gif) + +## How to use the DTail serverless mode + +Until now, all examples so far required to have remote server(s) to connect to. That makes sense, as after all DTail is a *distributed* tool. However, there are circumstances where you don't really need to connect to a server remotely. For example, you already have a login shell open to the server an all what you want is to run some queries directly on local log files. + +The serverless mode does not require any `dserver` up and running and therefore there is no networking/SSH involved. + +All commands shown so far also work in a serverless mode. All what needs to be done is to omit a server list. The DTail client then starts in serverless mode. 
+ +### Serverless map-reduce query + +The following `dmap` example is the same as the previously shown one, but the difference is that it operates on a local log file directly: + +```shell +% dmap --files /var/log/dserver/dserver.log + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +As a shorthand version the following command can be used: + +```shell +% dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, +lifetimeConnections group by $hostname order by max($cgocalls)' /var/log/dsever/dserver.log +``` + +You can also use a file input pipe as follows: + +```shell +% cat /var/log/dserver/dserver.log | \ + dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +### Aggregating CSV files + +In essence, this works exactly like aggregating logs. All files operated on must be valid CSV files and the first line of the CSV must be the header. E.g.: + +```shell +% cat example.csv +name,lastname,age,profession +Michael,Jordan,40,Basketball player +Michael,Jackson,100,Singer +Albert,Einstein,200,Physician +% dmap --query 'select lastname,name where age > 40 logformat csv outfile result.csv' example.csv +% cat result.csv +lastname,name +Jackson,Michael +Einstein,Albert +``` + +DMap can also be used to query and aggregate CSV files from remote servers. + +### Other serverless commands + +The serverless mode works transparently with all other DTail commands. Here are some examples: + +```shell +% dtail /var/log/dserver/dserver.log +``` + +```shell +% dtail --logLevel trace /var/log/dserver/dserver.log +``` + +```shell +% dcat /etc/passwd +``` + +```shell +% dcat --plain /etc/passwd > /etc/test +# Should show no differences. 
+diff /etc/test /etc/passwd +``` + +```shell +% dgrep --regex ERROR --files /var/log/dserver/dsever.log +``` + +```shell +% dgrep --before 10 --after 10 --max 10 --grep ERROR /var/log/dserver/dsever.log + +Use `--help` for more available options. Or go to the DTail page for more information! Hope you find DTail useful! + +Other related posts are: + +[2021-04-22 DTail - The distributed log tail program](./2021-04-22-dtail-the-distributed-log-tail-program.md) +[2022-03-06 The release of DTail 4.0.0](./2022-03-06-the-release-of-dtail-4.0.0.md) +[2022-10-30 Installing DTail on OpenBSD](./2022-10-30-installing-dtail-on-openbsd.md) + +Thanks! + +Paul + +E-Mail your comments to `foo@paul.cyou` :-) + +[Back to the main site](../) diff --git a/gemfeed/DRAFT-site-reliability-engineering.md b/gemfeed/DRAFT-site-reliability-engineering.md index c73735f4..60a8897e 100644 --- a/gemfeed/DRAFT-site-reliability-engineering.md +++ b/gemfeed/DRAFT-site-reliability-engineering.md @@ -1,34 +1,73 @@ +## System Design and Incident Analysis: Building Resilience in the SRE Landscape + +A significant portion of the work revolves around system design and incident analysis. + +The first axiom is the acceptance of a bitter truth: things will always break. No matter the precision of which a system is crafted, the inevitability of failures looms large. However, what distinguishes a well-designed system from a mediocre one is its ability to minimise and contain cascading failures. These failures, if left unchecked, can spiral into global outages with come with consequences. + +There's a growing emphasis on building resilient systems to avoid such cascading failures to circumvent this. Such resilience requires foresight in system design, wherein potential weakpoints are identified and addressed before deployed to production. Prevention is better than cure. The primary objective is ensuring that services remain uninterrupted and dependable. 
+ +Yet, despite these preventative measures, when incidents do arise, their analysis becomes a goldmine of learning. Every incident exposes gaps within the system. Instead of attributing these incidents to nebulous concepts like "human error," the onus is on dissecting them to uncover underlying systemic issues. Whether it's a tooling gap where operational tools prove insufficient or an operational expertise gap where engineers lack critical skills, incident analysis shines a light on these deficiencies. + +In doing so, incident analysis is about rectifying the immediate issue and learning and evolving the system design. Every incident offers an opportunity, a feedback loop, to refine the system further. Through rigorous postmortems focusing on customer impact, organisations can distil valuable lessons. These lessons, when incorporated, make the system more robust and less susceptible to similar failures in the future. + +Moreover, as systems grow more complex, the importance of observability tools cannot be overstated. These tools, designed to query against high cardinality data, provide granular insights into system operations. They enable engineers to diagnose problems rapidly, especially in the chaotic aftermath of an incident, giving clarity amidst the turmoil. + +In conclusion, the symbiotic relationship between system design and incident analysis underscores the evolving ethos of SRE. While impeccable system design lays the foundation for reliable operations, incident analysis ensures that this foundation remains robust and dynamic, adapting to challenges. Together, they form the pillars of a resilient, customer-centric service environment that stands the test of time. + +Add paragraph about product wants features, but observability is often an afterthought. So often, during an incident, people start agreeing, and then it was already too late. 
+ +[6 minutes to wt.](add) + ## The Heroic Facade and Team Dynamics: Rethinking Success in SRE The realm of Site Reliability Engineering is punctuated by the constant ebb and flow of system challenges. While individual excellence is commendable, the overarching belief in the SRE culture should be that true success lies in cohesive teamwork and not in individual heroics. -The allure of the "hero" is undeniable. There's a certain appeal in being the one who swoops in, fixes critical incidents, and saves the day. However, this hero culture, while often romanticised, has its pitfalls. Heroes are necessary, no doubt, but a hero culture can often obscure the collaborative essence of SRE. Recognising that heroes do their best work as part of a team is a profound acknowledgement that true heroes don't need a hero culture to excel. +he SRE Hero is an anti-pattern that can occur when a few individuals consistently step in to save the day during incidents or emergencies, earning themselves the status of heroes. While this might seem positive at first, it can lead to several negative outcomes and should be addressed to ensure the reliability and sustainability of the SRE team's operations. These individuals might possess specialized knowledge, quick problem-solving skills, or simply a willingness to work long hours. As a result, they become the go-to people whenever something goes wrong. -The danger of a hero-driven approach is that it can lead to an over-reliance on specific individuals. The assumption that certain team members will always be there to address and mitigate issues can be a dangerous precedent. It fosters a reactive culture rather than a proactive one. Instead of developing inherently more resilient and reliable systems, the organisation starts relying on these heroes as a Band-Aid® solution, masking deeper systemic problems. +This culture can emerge for various reasons: -A further dimension to this issue is the impact on team morale. 
Continually being in the spotlight, heroes might be inadvertently sidelining other team members, leading to feelings of underappreciation or undervaluation. Such a dynamic can hinder sharing knowledge, collaboration, and preparation – the pillars that successful SRE teams are built on. +- Immediate Problem Solving: Heroes are praised for their ability to solve issues quickly. However, this may lead to bypassing proper post-incident analysis and learning, as the focus is on getting systems up and running as fast as possible. -However, this isn't to say that individual excellence should be curbed. Instead, it's about shifting the narrative. Building a team culture based on collaboration ensures that knowledge sharing becomes second nature. Such an environment propels teams towards a dynamic where preparation and proactive measures are valued over-reactive heroics. When success stories are shared as a collective win, it boosts team morale and fosters a sense of shared responsibility. +- Burnout and Fatigue: Heroes are often overworked and stressed, leading to burnout and high turnover rates. -In the broader spectrum of SRE, it's also crucial to recognise the silent work – the preventive measures, the well-thought-out systems, the meticulous planning – that ensures incidents don't occur. This proactive approach often goes unnoticed because, in a well-functioning system, the absence of issues is the norm. But this 'silence' is a testament to a team working harmoniously, with every member contributing towards system reliability. +- Skill Asymmetry: If only a few team members possess specific knowledge or skills, others may not have the chance to learn, grow, and take on more responsibilities. -To conclude, while the heroics in SRE can often be the stuff of legends, it's vital to see beyond this facade. The countless hours of teamwork, collaboration, and shared responsibility lie in the shadows of these heroic acts. 
The future of SRE lies not in individual heroics but in teams that operate like well-oiled machines, with every cog, big or small, playing its part to perfection. +- Dependency: Teams become dependent on heroes, leading to a lack of collaboration and shared ownership of systems. -## System Design and Incident Analysis: Building Resilience in the SRE Landscape +How can you fix it? -In the intricate domain of Site Reliability Engineering, a significant portion of the professional narrative revolves around system design and incident analysis. +- Incident Reviews and Post-Mortems: Conduct thorough post-incident reviews to understand the root causes of issues. Focus on learning and prevention rather than just quick fixes. -The first axiom in the world of system reliability is the acceptance of a bitter truth: things will always break. No matter the precision or the prowess with which a system is crafted, the inevitability of failures looms large. However, what distinguishes a well-designed system from a mediocre one is its ability to minimise and contain cascading failures. These failures, if left unchecked, can spiral into global outages with dire consequences. +- Distribute Knowledge: Encourage knowledge sharing by documenting incidents, solutions, and best practices. Consider implementing a knowledge-sharing platform or wiki. -There's a growing emphasis on building resilient systems to avoid such cascading failures to circumvent this. Such resilience is a testament to the foresight in system design, wherein potential chokepoints and vulnerabilities are identified and fortified. Prevention, as the age-old adage goes, is indeed better than cure. This is particularly pertinent to SRE, whose primary objective is ensuring that services remain uninterrupted and dependable. +- Rotating Responsibilities: Rotate on-call and incident response responsibilities among team members. This prevents burnout and ensures that everyone gains experience. 
-Yet, despite these preventative measures, when incidents do arise, their analysis becomes a goldmine of learning. Every incident, irrespective of its severity, exposes gaps within the system. Instead of attributing these incidents to nebulous concepts like "human error," the onus is on dissecting them to uncover underlying systemic issues. Whether it's a tooling gap where operational tools prove insufficient or an operational expertise gap where engineers lack critical skills, incident analysis shines a light on these deficiencies. +- Automation and Tooling: Develop automation and tools that enable the entire team to handle incidents more effectively, reducing the reliance on individual heroics. -In doing so, incident analysis is about rectifying the immediate issue and learning and evolving the system design. Every incident offers an opportunity, a feedback loop, to refine the system further. Through rigorous postmortems focusing on customer impact, organisations can distil valuable lessons. These lessons, when incorporated, make the system more robust and less susceptible to similar failures in the future. +- Training and Skill Development: Provide training and resources to help all team members enhance their skills. This levels the playing field and reduces skill asymmetry. -Moreover, as systems grow more complex, the importance of observability tools cannot be overstated. These tools, designed to query against high cardinality data, provide granular insights into system operations. They enable engineers to diagnose problems rapidly, especially in the chaotic aftermath of an incident, giving clarity amidst the turmoil. +- Recognize Collaborative Efforts: Shift the focus from individual heroics to collaborative efforts. Recognize and reward team members who contribute to preventive measures, incident response improvements, and system stability. 
-In conclusion, the symbiotic relationship between system design and incident analysis underscores the evolving ethos of SRE. While impeccable system design lays the foundation for reliable operations, incident analysis ensures that this foundation remains robust and dynamic, adapting to challenges. Together, they form the pillars of a resilient, customer-centric service environment that stands the test of time. +- Leadership Support: Management should actively support efforts to address the hero culture. This might involve setting expectations for collaboration, learning, and shared responsibility. + +- Celebrate Learning: Emphasize that learning from failures is a positive outcome. This encourages a culture of continuous improvement rather than blame. + +By addressing the hero culture and fostering a collaborative, learning-oriented environment, SRE teams can enhance their overall effectiveness, prevent burnout, and ensure the long-term stability of the systems they manage. + + + + +The allure of the "hero" is undeniable. There's a certain appeal in being the one who swoops in, fixes critical incidents, and saves the day. However, this hero culture, while often romanticised, has its pitfalls. Heroes are necessary, no doubt, but a hero culture can often obscure the collaborative essence of SRE. Recognising that heroes do their best work as part of a team is a profound acknowledgement that true heroes don't need a hero culture to excel. + +The danger of a hero-driven approach is that it can lead to an over-reliance on specific individuals. The assumption that certain team members will always be there to address and mitigate issues can be a dangerous precedent. It fosters a reactive culture rather than a proactive one. Instead of developing inherently more resilient and reliable systems, the organisation starts relying on these heroes as a Band-Aid® solution, masking deeper systemic problems. + +A further dimension to this issue is the impact on team morale. 
Continually being in the spotlight, heroes might be inadvertently sidelining other team members, leading to feelings of underappreciation or undervaluation. Such a dynamic can hinder sharing knowledge, collaboration, and preparation – the pillars that successful SRE teams are built on. + +However, this isn't to say that individual excellence should be curbed. Instead, it's about shifting the narrative. Building a team culture based on collaboration ensures that knowledge sharing becomes second nature. Such an environment propels teams towards a dynamic where preparation and proactive measures are valued over reactive heroics. When success stories are shared as a collective win, it boosts team morale and fosters a sense of shared responsibility. + +In the broader spectrum of SRE, it's also crucial to recognise the silent work – the preventive measures, the well-thought-out systems, the meticulous planning – that ensures incidents don't occur. This proactive approach often goes unnoticed because, in a well-functioning system, the absence of issues is the norm. But this 'silence' is a testament to a team working harmoniously, with every member contributing towards system reliability. + +To conclude, while the heroics in SRE can often be the stuff of legends, it's vital to see beyond this facade. The countless hours of teamwork, collaboration, and shared responsibility lie in the shadows of these heroic acts. The future of SRE lies not in individual heroics but in teams that operate like well-oiled machines, with every cog, big or small, playing its part to perfection. ## Monitoring, Observability, and the SRE Arsenal: Navigating the Nuances of System Reliability diff --git a/gemfeed/W b/gemfeed/W new file mode 100644 index 00000000..7b753922 --- /dev/null +++ b/gemfeed/W @@ -0,0 +1,228 @@ +# DTail usage examples + +DTail is a distributed DevOps tool for tailing, grepping, and catting logs and other text files on many remote machines at once, which I programmed in Go.
+ +=> https://dtail.dev + +``` + ,_---~~~~~----._ + _,,_,*^____ _____``*g*\"*, + ____ _____ _ _ / __/ /' ^. / \ ^@q f + | _ \_ _|_ _(_) | @f | ((@| |@)) l 0 _/ + | | | || |/ _` | | | \`/ \~____ / __ \_____/ \ + | |_| || | (_| | | | | _l__l_ I + |____/ |_|\__,_|_|_| } [______] I + ] | | | | + ] ~ ~ | + | Let's tail those logs! | + | | +``` + +DTail consists of a server and several client binaries. In this post, I am showcasing their use! + +## Following logs + +The following example demonstrates how to follow logs of several servers at once. The server list is provided as a flat text file. The example filters for all records containing the string `INFO`. Any other Go-compatible regular expression can be used instead of `INFO`. + +```shell +% dtail --servers serverlist.txt --grep INFO --files "/var/log/dserver/*.log" +``` + +Hint: you can also provide a comma-separated server list, e.g.: `--servers server1.example.org,server2.example.org:PORT,...` + +=> ./dtail-usage-examples/dtail.gif Tail example + +> Hint: You can also use the shorthand version (omitting the `--files`) + +```shell +% dtail --servers serverlist.txt --grep INFO "/var/log/dserver/*.log" +``` + +## Aggregating logs + +To run ad-hoc map-reduce aggregations on newly written log lines you must add a query. The following example follows all remote log lines and prints the result to standard output every few seconds. + +> Hint: To run a map-reduce query across log lines written in the past, please use the `dmap` command instead. + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Beware: For map-reduce queries to work, you have to ensure that DTail supports your log format. Check out the documentation of the DTail query language and the DTail log formats on the DTail homepage for more information.
+ +=> ./dtail-usage-examples/dtail-map.gif Tail map-reduce example + +> Hint: You can also use the shorthand version: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + 'from STATS select sum($goroutines),sum($cgocalls), + last($time),max(lifetimeConnections)' +``` + +Here is another example: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +=> ./dtail-usage-examples/dtail-map2.gif Tail map-reduce example 2 + +You can also continuously append the results to a CSV file by adding `outfile append filename.csv` to the query: + +```shell +% dtail --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select ... outfile append result.csv' +``` + +## How to use `dcat` + +The following example demonstrates how to cat files (display the full content of the files) on several servers at once. + +As you can see in this example, a DTail client also creates a local log file of all received data in `~/log`. You can also use the `-noColor` and `-plain` flags (these also work with DTail commands other than `dcat`). + +```shell +% dcat --servers serverlist.txt --files /etc/hostname +``` + +=> ./dtail-usage-examples/dcat.gif Cat example + +> Hint: You can also use the shorthand version: + +```shell +% dcat --servers serverlist.txt /etc/hostname +``` + +## How to use `dgrep` + +The following example demonstrates how to grep files (display only the lines which match a given regular expression) on multiple servers at once. In this example, we look for some entries in `/etc/passwd`. This time, we don't provide the server list via a file but rather via a comma-separated list directly on the command line. We also explore the `-before`, `-after` and `-max` flags (see animation).
+ +```shell +% dgrep --servers server1.example.org:2223 \ + --files /etc/passwd \ + --regex nologin +``` + +Generally, `dgrep` is also a very useful way to search historical application logs for certain content. + +=> ./dtail-usage-examples/dgrep.gif Grep example + +Hint: `-regex` is an alias for `-grep`. + +## How to use `dmap` + +To run a map-reduce aggregation over logs written in the past, the `dmap` command can be used. The following example aggregates all map-reduce fields; `dmap` will print interim results every few seconds. You can also write the result to a CSV file by adding `outfile result.csv` to the query. + +```shell +% dmap --servers serverlist.txt \ + --files '/var/log/dserver/*.log' \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +Remember: For that to work, you have to make sure that DTail supports your log format. You can either use the ones already defined in `internal/mapr/logformat` or add an extension to support a custom log format. The example here works out of the box though, as DTail understands its own log format already. + +=> ./dtail-usage-examples/dmap.gif DMap example + +## How to use the DTail serverless mode + +All examples so far required remote server(s) to connect to. That makes sense, as after all DTail is a *distributed* tool. However, there are circumstances where you don't really need to connect to a server remotely. For example, you already have a login shell open on the server and all you want is to run some queries directly on local log files. + +The serverless mode does not require a running `dserver`, and therefore there is no networking/SSH involved. + +All commands shown so far also work in serverless mode. All that needs to be done is to omit the server list. The DTail client then starts in serverless mode.
+ +### Serverless map-reduce query + +The following `dmap` example is the same as the one shown previously, except that it operates on a local log file directly: + +```shell +% dmap --files /var/log/dserver/dserver.log \ + --query 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +As a shorthand version the following command can be used: + +```shell +% dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, +lifetimeConnections group by $hostname order by max($cgocalls)' /var/log/dserver/dserver.log +``` + +You can also use a file input pipe as follows: + +```shell +% cat /var/log/dserver/dserver.log | \ + dmap 'from STATS select $hostname,max($goroutines),max($cgocalls),$loadavg, + lifetimeConnections group by $hostname order by max($cgocalls)' +``` + +### Aggregating CSV files + +In essence, this works exactly like aggregating logs. All files operated on must be valid CSV files and the first line of the CSV must be the header. E.g.: + +```shell +% cat example.csv +name,lastname,age,profession +Michael,Jordan,40,Basketball player +Michael,Jackson,100,Singer +Albert,Einstein,200,Physician +% dmap --query 'select lastname,name where age > 40 logformat csv outfile result.csv' example.csv +% cat result.csv +lastname,name +Jackson,Michael +Einstein,Albert +``` + +DMap can also be used to query and aggregate CSV files from remote servers. + +### Other serverless commands + +The serverless mode works transparently with all other DTail commands. Here are some examples: + +```shell +% dtail /var/log/dserver/dserver.log +``` + +```shell +% dtail --logLevel trace /var/log/dserver/dserver.log +``` + +```shell +% dcat /etc/passwd +``` + +```shell +% dcat --plain /etc/passwd > /etc/test +# Should show no differences.
+diff /etc/test /etc/passwd +``` + +```shell +% dgrep --regex ERROR --files /var/log/dserver/dserver.log +``` + +```shell +% dgrep --before 10 --after 10 --max 10 --grep ERROR /var/log/dserver/dserver.log + +Use `--help` for more available options. Or go to the DTail page for more information! Hope you find DTail useful! + +Other related posts are: + +<< template::inline::index dtail + +Thanks! + +Paul + +E-Mail your comments to `foo@paul.cyou` :-) + +=> ../ Back to the main site diff --git a/gemfeed/dtail-usage-examples/dcat.gif b/gemfeed/dtail-usage-examples/dcat.gif Binary files differ new file mode 100644 index 00000000..a5b9369d --- /dev/null +++ b/gemfeed/dtail-usage-examples/dcat.gif diff --git a/gemfeed/dtail-usage-examples/dgrep.gif b/gemfeed/dtail-usage-examples/dgrep.gif Binary files differ new file mode 100644 index 00000000..e5314604 --- /dev/null +++ b/gemfeed/dtail-usage-examples/dgrep.gif diff --git a/gemfeed/dtail-usage-examples/dmap.gif b/gemfeed/dtail-usage-examples/dmap.gif Binary files differ new file mode 100644 index 00000000..d2701038 --- /dev/null +++ b/gemfeed/dtail-usage-examples/dmap.gif diff --git a/gemfeed/dtail-usage-examples/dtail-map.gif b/gemfeed/dtail-usage-examples/dtail-map.gif Binary files differ new file mode 100644 index 00000000..0bcfb156 --- /dev/null +++ b/gemfeed/dtail-usage-examples/dtail-map.gif diff --git a/gemfeed/dtail-usage-examples/dtail-map2.gif b/gemfeed/dtail-usage-examples/dtail-map2.gif Binary files differ new file mode 100644 index 00000000..1220b732 --- /dev/null +++ b/gemfeed/dtail-usage-examples/dtail-map2.gif diff --git a/gemfeed/dtail-usage-examples/dtail.gif b/gemfeed/dtail-usage-examples/dtail.gif Binary files differ new file mode 100644 index 00000000..24a3eb35 --- /dev/null +++ b/gemfeed/dtail-usage-examples/dtail.gif diff --git a/gemfeed/dtail-usage-examples/testing.gif b/gemfeed/dtail-usage-examples/testing.gif Binary files differ new file mode 100644 index 00000000..696921d2 --- /dev/null +++
b/gemfeed/dtail-usage-examples/testing.gif diff --git a/gemfeed/index.md b/gemfeed/index.md index 86b2d103..f8de7575 100644 --- a/gemfeed/index.md +++ b/gemfeed/index.md @@ -2,6 +2,7 @@ ## To be in the .zone! +[2023-09-25 - DTail usage examples](./2023-09-25-dtail-usage-examples.md) [2023-08-20 - Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect](./2023-08-20-site-reliability-engineering-part-3.md) [2023-08-19 - Site Reliability Engineering - Part 2: Operational Balance in SRE](./2023-08-19-site-reliability-engineering-part-2.md) [2023-08-18 - Site Reliability Engineering - Part 1: SRE and Organizational Culture](./2023-08-18-site-reliability-engineering-part-1.md) |
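A note on the "Aggregating CSV files" example in the new post: the query `select lastname,name where age > 40` can be cross-checked against the sample data with plain `awk`. The following is only a rough sketch of the same filter, not how DTail evaluates queries internally. The helper name `csv_older_than` and the temporary file are made up for this illustration, and the sketch assumes simple CSV without quoted fields or embedded commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of DTail): print "lastname,name" for every
# row whose age column (3rd field) is greater than the given threshold.
csv_older_than() {
    threshold="$1"
    file="$2"
    awk -F, -v min="$threshold" '
        NR == 1 { print "lastname,name"; next }   # emit the result header
        $3 + 0 > min + 0 { print $2 "," $1 }      # numeric comparison on age
    ' "$file"
}

# Sample rows copied from the example.csv shown in the post.
example_csv="$(mktemp)"
cat > "$example_csv" <<'EOF'
name,lastname,age,profession
Michael,Jordan,40,Basketball player
Michael,Jackson,100,Singer
Albert,Einstein,200,Physician
EOF

csv_older_than 40 "$example_csv"
rm -f "$example_csv"
```

On the sample rows this prints the header followed by the Jackson and Einstein rows, which matches the `result.csv` shown in the post; the Jordan row is dropped because 40 is not greater than 40.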
