author Paul Buetow <paul@buetow.org> 2023-08-18 23:12:28 +0300
committer Paul Buetow <paul@buetow.org> 2023-08-18 23:12:28 +0300
commit 5b1c86ab678c5281cd6f152ec8096446560a29ec (patch)
tree 5e5b1bf21587d47930cf97af94573883cc515302 /gemfeed/DRAFT-site-reliability-engineering.html
parent b971ef4988ae9c87cff55765c64616420676fb1c (diff)
Update content for html
Diffstat (limited to 'gemfeed/DRAFT-site-reliability-engineering.html')
-rw-r--r-- gemfeed/DRAFT-site-reliability-engineering.html 16
1 files changed, 0 insertions, 16 deletions
diff --git a/gemfeed/DRAFT-site-reliability-engineering.html b/gemfeed/DRAFT-site-reliability-engineering.html
index 873a5b76..c8fba6ad 100644
--- a/gemfeed/DRAFT-site-reliability-engineering.html
+++ b/gemfeed/DRAFT-site-reliability-engineering.html
@@ -8,22 +8,6 @@
<link rel="stylesheet" href="style-override.css" />
</head>
<body>
-<h2 style='display: inline'>Operational Balance in SRE: Finding the Equilibrium in Reliability and Velocity</h2><br />
-<br />
-<span>Site Reliability Engineering has established itself as more than just a set of best practices or methodologies. Instead, it stands as a beacon of operational excellence, which guides engineering teams through the turbulent waters of modern software development and system management.</span><br />
-<br />
-<span>In the universe of software production, two fundamental forces are often at odds: the drive for rapid feature release (velocity) and the need for system reliability. Traditionally, the faster teams moved, the more risk was introduced into systems. SRE offers a profound approach to reconciling these conflicting drives through concepts like error budgets and SLIs/SLOs. These mechanisms provide a tangible metric, allowing teams to quantify how much they can push changes while ensuring they don&#39;t compromise system health. Thus, the error budget becomes a balancing act, where teams weigh the trade-offs between innovation and reliability.</span><br />
-<br />
-<span>A quintessential component of this balance is the dichotomy between operations and coding. According to SRE principles, an engineer should ideally spend an equal amount of time on operations work and coding—50% on each. This isn&#39;t just a random metric; it&#39;s a reflection of the value SRE places on both maintaining operational excellence and progressing forward with innovations. This balance ensures that while SREs are solving today&#39;s problems, they are also preparing for tomorrow&#39;s challenges. </span><br />
-<br />
-<span>However, not all operational tasks are equal. SRE differentiates between &#39;ops work&#39; and &#39;toil&#39;. While ops work is integral to system maintenance and can provide value, toil represents repetitive, mundane tasks which offer little value in the long run. Recognising and minimising toil is crucial. A culture that allows engineers to drown in toil stifles innovation and growth. Hence, an organisation&#39;s approach to toil indicates its operational health and commitment to balance.</span><br />
-<br />
-<span>A cornerstone of achieving operational balance lies in the tools and processes SREs use. Effective monitoring, observability tools, and ensuring that tools can handle high cardinality data are foundational. These aren&#39;t just technical requisites but reflective of an organisational culture prioritising proactive problem-solving. By having systems that effectively flag potential issues before they escalate, SREs can maintain the delicate balance between system stability and forward momentum.</span><br />
-<br />
-<span>Moreover, operational balance isn&#39;t just a technological or process challenge; it&#39;s a human one. The health of on-call engineers is as crucial as the health of the services they manage. On-call postmortems, continuous feedback loops, and recognising gaps (be it tooling, operational expertise, or resources) ensure that the human elements of operations are noticed. </span><br />
-<br />
-<span>In conclusion, operational balance in SRE is not a static goalpost but an ongoing journey. It requires organisations to constantly evaluate their practices, tools, and, most importantly, their culture. By achieving this balance, organisations can ensure that they are poised for innovation while maintaining the robustness and reliability of their systems, resulting in sustainable long-term success.</span><br />
-<br />
<h2 style='display: inline'>On-Call Culture and the Human Aspect: Prioritising Well-being in the Realm of Reliability</h2><br />
<br />
<span>Site Reliability Engineering is synonymous with ensuring system reliability, but the human factor is an often-underestimated component of this discipline. It is evident that fostering a healthy on-call culture is as critical as any technical solution. In the world of constant alerts, pages, and incident management, the well-being of the engineers becomes paramount.</span><br />