Diffstat (limited to 'gemfeed')
-rw-r--r--  gemfeed/2025-06-22-task-samurai.gmi                  |  28
-rw-r--r--  gemfeed/2025-06-22-task-samurai.gmi.tpl              |  26
-rw-r--r--  gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-6.gmi | 443
-rw-r--r--  gemfeed/atom.xml                                     |  30
4 files changed, 239 insertions, 288 deletions
diff --git a/gemfeed/2025-06-22-task-samurai.gmi b/gemfeed/2025-06-22-task-samurai.gmi
index 8e353242..503f0106 100644
--- a/gemfeed/2025-06-22-task-samurai.gmi
+++ b/gemfeed/2025-06-22-task-samurai.gmi
@@ -13,7 +13,7 @@
* ⇢ ⇢ Where and how to get it
* ⇢ ⇢ Lessons learned from building Task Samurai with agentic coding
* ⇢ ⇢ ⇢ Developer workflow
-* ⇢ ⇢ ⇢ How it went down
+* ⇢ ⇢ ⇢ How it went
* ⇢ ⇢ ⇢ What went wrong
* ⇢ ⇢ ⇢ Patterns that helped
* ⇢ ⇢ ⇢ What I learned using agentic coding
@@ -30,6 +30,7 @@ Task Samurai is a fast terminal interface for Taskwarrior written in Go using th
### Why does this exist?
I wanted to tinker with agentic coding. This project was implemented entirely using OpenAI Codex. (After this blog post was published, I also used the Claude Code CLI.)
+
* I wanted a faster UI for Taskwarrior than other options, like Vit, which is Python-based.
* I wanted something built with Bubble Tea, but I never had time to dive deep into it.
* I wanted to build a toy project (like Task Samurai) first, before tackling the big ones, to get started with agentic coding.
@@ -56,17 +57,19 @@ And follow the `README.md`!
### Developer workflow
-I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am trying out currently) credits (it still happens!), but Codex was still available to me. So, I seized the opportunity to push agentic coding a bit more using another platform.
+I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am currently trying out) credits (it still happens!), but Codex was still available to me. So, I took the opportunity to push agentic coding a bit further with another platform.
I didn't really love the web UI you have to use for Codex, as I usually live in the terminal. But this is all I have for Codex for now, and I thought I'd give it a try regardless. The web UI is simple and pretty straightforward. There's also a Codex CLI one could use directly in the terminal, but I didn't get it working. I will try again soon.
+> Update: Codex CLI now works for me, after OpenAI released a new version!
+
For every task given to Codex, it spins up its own container. From there, you can drill down and watch what it is doing. At the end, the result (in the form of a code diff) will be presented. From there, you can make suggestions about what else to change in the codebase. What I found inconvenient is that for every additional change, there's an overhead because Codex has to spin up a container and bootstrap the entire development environment again, which adds extra delay. That could be eliminated by setting up predefined custom containers, but that feature still seems somewhat limited.
-Once satisfied, you can ask Codex to create a GitHub PR; from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
+Once satisfied, you can ask Codex to create a GitHub PR (too bad only GitHub is supported, and no other Git hosting services); from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
-### How it went down
+### How it went
-Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits. Here are the broad strokes:
+Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits:
* June 19: Scaffolded the Go boilerplate, set up tests, integrated the Bubble Tea UI framework, and got the first table views showing up.
* June 20: (The big one—120 commits!) Added hotkeys, colourized tasks, annotation support, undo/redo, and, for fun, fireworks on quit (which never worked and was removed later). This is where most of the bugs, merges, and fast-paced changes happened.
@@ -79,7 +82,7 @@ It's worth noting that I worked on it in the evenings when I had some free time,
### What went wrong
-Going agentic isn't all smooth sailing. Here are the hiccups I ran into, plus a few hard-earned lessons:
+Going agentic isn't all smooth. Here are the hiccups I ran into, plus a few lessons:
* Merge Floods: Every minor feature or fix lived on its own branch, so merging was a constant process. It kept progress flowing but also drowned the commit history in noise and the occasional conflict. I found this to be an issue with OpenAI's Codex in particular, and not so much with other agentic coding tools like Claude Code CLI (not covered in this blog post).
* Fixes on fixes: Features like "fireworks on exit" had chains of "fix exit," "fix cell selection," etc. Sometimes, new additions introduced bugs that needed rapid patching.
@@ -92,29 +95,28 @@ Despite the chaos, a few strategies kept things moving:
* Tiny PRs: Small, atomic merges meant feedback came fast (and so did fixes).
* Tests Matter: A solid base of unit tests for task manipulations kept things from breaking entirely when experimenting.
* Live Documentation: Documentation, such as the README, is updated regularly to reflect all the hotkey and feature changes.
+
Maybe a better approach would have been to design the whole application from scratch before letting Codex do any of the coding. I will try that with my next toy project.
### What I learned using agentic coding
-Stepping into agentic coding with Codex as my "pair programmer" was a genuine shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at breakneck speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).
-
-Discussing requirements with Codex forced me to clarify features and spot logical pitfalls earlier. All those fast iterations meant I was constantly coaxing more helpful, less ambiguous code out of the model—making me rethink how to break features into clear, testable steps.
+Stepping into agentic coding with Codex as my "pair programmer" was a big shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at high speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).
### How much time did I save?
-Did it buy me speed? Let's do some back-of-the-envelope math:
+Did it buy me speed?
* Say each commit takes Codex about 5 minutes to generate, of which roughly 2 minutes need your active review and guidance; 179 commits comes to about _6 hours of active development_.
* If you coded it all yourself, including all the bug fixes, features, design, and documentation, you might spend _10–20 hours_.
-* That's a couple of days potential savings.
+* That's a couple of days of potential savings—and I am by no means an expert in agentic coding, since this was my first completed agentic coding project.
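The back-of-the-envelope numbers above can be sketched in a few lines of shell. A commit may take Codex around 5 minutes to generate, but you are only actively engaged for part of that; the 2 minutes of active attention per commit is my rough assumption, not a measured value:

```shell
# Rough cost model for the agentic run (all inputs are assumptions)
commits=179
min_per_commit=2                          # assumed active attention per commit
total_min=$((commits * min_per_commit))   # total active minutes
echo "active agentic time: ${total_min} minutes (~6 hours)"
echo "manual estimate: 10-20 hours"
```

Changing `min_per_commit` is enough to see how sensitive the "savings" claim is to that one assumption.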
## Conclusion
-Building Task Samurai with agentic coding was a wild ride—rapid feature growth, plenty of churns, countless fast fixes, and more merge commits I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the raw power (and some hazards) of agentic development.
+Building Task Samurai with agentic coding was a wild ride—rapid feature growth, countless fast fixes, and more merge commits than I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the power of agentic development.
Am I an agentic coding expert now? I don't think so. There are still many things to learn, and the landscape is constantly evolving.
-While working on Task Samurai, there were times I genuinely missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
+While working on Task Samurai, there were times I missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
Personally, I am not sure whether I like where the industry is going with agentic coding. I love "traditional" coding, and with agentic coding you operate at a higher level and don't interact directly with code as often, which I would miss. I think that in the future, designing, reviewing, and being able to read and understand code will be more important than writing code by hand.
diff --git a/gemfeed/2025-06-22-task-samurai.gmi.tpl b/gemfeed/2025-06-22-task-samurai.gmi.tpl
index 59ccd54f..6b94be4d 100644
--- a/gemfeed/2025-06-22-task-samurai.gmi.tpl
+++ b/gemfeed/2025-06-22-task-samurai.gmi.tpl
@@ -16,6 +16,7 @@ Task Samurai is a fast terminal interface for Taskwarrior written in Go using th
### Why does this exist?
I wanted to tinker with agentic coding. This project was implemented entirely using OpenAI Codex. (After this blog post was published, I also used the Claude Code CLI.)
+
* I wanted a faster UI for Taskwarrior than other options, like Vit, which is Python-based.
* I wanted something built with Bubble Tea, but I never had time to dive deep into it.
* I wanted to build a toy project (like Task Samurai) first, before tackling the big ones, to get started with agentic coding.
@@ -42,17 +43,19 @@ And follow the `README.md`!
### Developer workflow
-I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am trying out currently) credits (it still happens!), but Codex was still available to me. So, I seized the opportunity to push agentic coding a bit more using another platform.
+I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am currently trying out) credits (it still happens!), but Codex was still available to me. So, I took the opportunity to push agentic coding a bit further with another platform.
I didn't really love the web UI you have to use for Codex, as I usually live in the terminal. But this is all I have for Codex for now, and I thought I'd give it a try regardless. The web UI is simple and pretty straightforward. There's also a Codex CLI one could use directly in the terminal, but I didn't get it working. I will try again soon.
+> Update: Codex CLI now works for me, after OpenAI released a new version!
+
For every task given to Codex, it spins up its own container. From there, you can drill down and watch what it is doing. At the end, the result (in the form of a code diff) will be presented. From there, you can make suggestions about what else to change in the codebase. What I found inconvenient is that for every additional change, there's an overhead because Codex has to spin up a container and bootstrap the entire development environment again, which adds extra delay. That could be eliminated by setting up predefined custom containers, but that feature still seems somewhat limited.
-Once satisfied, you can ask Codex to create a GitHub PR; from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
+Once satisfied, you can ask Codex to create a GitHub PR (too bad only GitHub is supported, and no other Git hosting services); from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
-### How it went down
+### How it went
-Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits. Here are the broad strokes:
+Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits:
* June 19: Scaffolded the Go boilerplate, set up tests, integrated the Bubble Tea UI framework, and got the first table views showing up.
* June 20: (The big one—120 commits!) Added hotkeys, colourized tasks, annotation support, undo/redo, and, for fun, fireworks on quit (which never worked and was removed later). This is where most of the bugs, merges, and fast-paced changes happened.
@@ -65,7 +68,7 @@ It's worth noting that I worked on it in the evenings when I had some free time,
### What went wrong
-Going agentic isn't all smooth sailing. Here are the hiccups I ran into, plus a few hard-earned lessons:
+Going agentic isn't all smooth. Here are the hiccups I ran into, plus a few lessons:
* Merge Floods: Every minor feature or fix lived on its own branch, so merging was a constant process. It kept progress flowing but also drowned the commit history in noise and the occasional conflict. I found this to be an issue with OpenAI's Codex in particular, and not so much with other agentic coding tools like Claude Code CLI (not covered in this blog post).
* Fixes on fixes: Features like "fireworks on exit" had chains of "fix exit," "fix cell selection," etc. Sometimes, new additions introduced bugs that needed rapid patching.
@@ -78,29 +81,28 @@ Despite the chaos, a few strategies kept things moving:
* Tiny PRs: Small, atomic merges meant feedback came fast (and so did fixes).
* Tests Matter: A solid base of unit tests for task manipulations kept things from breaking entirely when experimenting.
* Live Documentation: Documentation, such as the README, is updated regularly to reflect all the hotkey and feature changes.
+
Maybe a better approach would have been to design the whole application from scratch before letting Codex do any of the coding. I will try that with my next toy project.
### What I learned using agentic coding
-Stepping into agentic coding with Codex as my "pair programmer" was a genuine shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at breakneck speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).
-
-Discussing requirements with Codex forced me to clarify features and spot logical pitfalls earlier. All those fast iterations meant I was constantly coaxing more helpful, less ambiguous code out of the model—making me rethink how to break features into clear, testable steps.
+Stepping into agentic coding with Codex as my "pair programmer" was a big shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at high speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).
### How much time did I save?
-Did it buy me speed? Let's do some back-of-the-envelope math:
+Did it buy me speed?
* Say each commit takes Codex about 5 minutes to generate, of which roughly 2 minutes need your active review and guidance; 179 commits comes to about _6 hours of active development_.
* If you coded it all yourself, including all the bug fixes, features, design, and documentation, you might spend _10–20 hours_.
-* That's a couple of days potential savings.
+* That's a couple of days of potential savings—and I am by no means an expert in agentic coding, since this was my first completed agentic coding project.
## Conclusion
-Building Task Samurai with agentic coding was a wild ride—rapid feature growth, plenty of churns, countless fast fixes, and more merge commits I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the raw power (and some hazards) of agentic development.
+Building Task Samurai with agentic coding was a wild ride—rapid feature growth, countless fast fixes, and more merge commits than I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the power of agentic development.
Am I an agentic coding expert now? I don't think so. There are still many things to learn, and the landscape is constantly evolving.
-While working on Task Samurai, there were times I genuinely missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
+While working on Task Samurai, there were times I missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
Personally, I am not sure whether I like where the industry is going with agentic coding. I love "traditional" coding, and with agentic coding you operate at a higher level and don't interact directly with code as often, which I would miss. I think that in the future, designing, reviewing, and being able to read and understand code will be more important than writing code by hand.
diff --git a/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-6.gmi b/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-6.gmi
index f0ed0bf5..df1d7b40 100644
--- a/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-6.gmi
+++ b/gemfeed/DRAFT-f3s-kubernetes-with-freebsd-part-6.gmi
@@ -22,23 +22,17 @@ This is the sixth blog post about the f3s series for self-hosting demands in a h
* ⇢ ⇢ ⇢ Generating encryption keys
* ⇢ ⇢ ⇢ Configuring `zdata` ZFS pool encryption
* ⇢ ⇢ ⇢ Migrating Bhyve VMs to encrypted `bhyve` ZFS volume
-* ⇢ ⇢ CARP (Common Address Redundancy Protocol)
-* ⇢ ⇢ ⇢ How CARP Works
-* ⇢ ⇢ ⇢ Configuring CARP
-* ⇢ ⇢ ⇢ CARP State Change Notifications
* ⇢ ⇢ ZFS Replication with zrepl
* ⇢ ⇢ ⇢ Understanding Replication Requirements
-* ⇢ ⇢ ⇢ Why zrepl instead of HAST?
+* ⇢ ⇢ ⇢ Why `zrepl` instead of HAST?
* ⇢ ⇢ ⇢ Installing zrepl
* ⇢ ⇢ ⇢ Checking ZFS pools
-* ⇢ ⇢ ⇢ Configuring zrepl with WireGuard tunnel
-* ⇢ ⇢ ⇢ Configuring zrepl on f0 (source)
-* ⇢ ⇢ ⇢ Configuring zrepl on f1 (sink)
-* ⇢ ⇢ ⇢ Enabling and starting zrepl services
+* ⇢ ⇢ ⇢ Configuring `zrepl` with WireGuard tunnel
+* ⇢ ⇢ ⇢ Configuring `zrepl` on f0 (source)
+* ⇢ ⇢ ⇢ Configuring `zrepl` on `f1` (sink)
+* ⇢ ⇢ ⇢ Enabling and starting `zrepl` services
* ⇢ ⇢ ⇢ Verifying replication
* ⇢ ⇢ ⇢ Monitoring replication
-* ⇢ ⇢ ⇢ A note about the Bhyve VM replication
-* ⇢ ⇢ ⇢ Quick status check commands
* ⇢ ⇢ ⇢ Verifying replication after reboot
* ⇢ ⇢ ⇢ Understanding Failover Limitations and Design Decisions
* ⇢ ⇢ ⇢ ⇢ Why Manual Failover?
@@ -50,6 +44,10 @@ This is the sixth blog post about the f3s series for self-hosting demands in a h
* ⇢ ⇢ ⇢ Configuring automatic key loading on boot
* ⇢ ⇢ ⇢ Troubleshooting: Replication broken due to modified destination
* ⇢ ⇢ ⇢ Forcing a full resync
+* ⇢ ⇢ CARP (Common Address Redundancy Protocol)
+* ⇢ ⇢ ⇢ How CARP Works
+* ⇢ ⇢ ⇢ Configuring CARP
+* ⇢ ⇢ ⇢ CARP State Change Notifications
* ⇢ ⇢ Future Storage Explorations
* ⇢ ⇢ ⇢ MinIO for S3-Compatible Object Storage
* ⇢ ⇢ ⇢ MooseFS for Distributed High Availability
@@ -183,11 +181,7 @@ Using USB flash drives as hardware key storage provides an elegant solution. The
### UFS on USB keys
-
-We'll format the USB drives with UFS (Unix File System) rather than ZFS for several reasons:
-
-* Simplicity: UFS has less overhead for small, removable media
-* Reliability: No ZFS pool import/export issues with removable devices
+We'll format the USB drives with UFS (Unix File System) rather than ZFS: for small, removable key-storage media, UFS is simpler, and ZFS would add unnecessary overhead.
Let's see the USB keys:
@@ -348,107 +342,9 @@ zroot/bhyve/rocky encryptionroot zroot/bhyve -
zroot/bhyve/rocky keystatus available -
```
-## CARP (Common Address Redundancy Protocol)
-
-High availability is crucial for storage systems. If the NFS server goes down, all pods lose access to their persistent data. CARP provides a solution by creating a virtual IP address that automatically moves between servers during failures.
-
-### How CARP Works
-
-CARP allows multiple hosts to share a virtual IP address (VIP). The hosts communicate using multicast to elect a MASTER, while others remain as BACKUP. When the MASTER fails, a BACKUP automatically promotes itself, and the VIP moves to the new MASTER. This happens within seconds, minimizing downtime.
-
-Key benefits for our storage system:
-* Automatic failover: No manual intervention required for basic failures
-* Transparent to clients: Pods continue using the same IP address
-* Works with stunnel: The VIP ensures encrypted connections follow the active server
-* Simple configuration: Just a single line in rc.conf
-
-### Configuring CARP
-
-First, add the CARP configuration to `/etc/rc.conf` on both f0 and f1:
-
-```sh
-# The virtual IP 192.168.1.138 will float between f0 and f1
-ifconfig_re0_alias0="inet vhid 1 pass testpass alias 192.168.1.138/32"
-```
-
-Parameters explained:
-* `vhid 1`: Virtual Host ID - must match on all CARP members
-* `pass testpass`: Password for CARP authentication (use a stronger password in production)
-* `alias 192.168.1.138/32`: The virtual IP address with a /32 netmask
-
-Next, update `/etc/hosts` on all nodes (n0, n1, n2, r0, r1, r2) to resolve the VIP hostname:
-
-```
-192.168.1.138 f3s-storage-ha f3s-storage-ha.lan f3s-storage-ha.lan.buetow.org
-192.168.2.138 f3s-storage-ha f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
-```
-
-This allows clients to connect to `f3s-storage-ha` regardless of which physical server is currently the MASTER.
-
-### CARP State Change Notifications
-
-To properly manage services during failover, we need to detect CARP state changes. FreeBSD's devd system can notify us when CARP transitions between MASTER and BACKUP states.
-
-Add this to `/etc/devd.conf` on both f0 and f1:
-
-paul@f0:~ % cat <<END | doas tee -a /etc/devd.conf
-notify 0 {
- match "system" "CARP";
- match "subsystem" "[0-9]+@[0-9a-z.]+";
- match "type" "(MASTER|BACKUP)";
- action "/usr/local/bin/carpcontrol.sh $subsystem $type";
-};
-END
-
-Next, create the CARP control script that will restart stunnel when CARP state changes:
-
-```sh
-paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
-#!/bin/sh
-# CARP state change handler for storage failover
-
-subsystem=$1
-state=$2
-
-logger "CARP state change: $subsystem is now $state"
-
-case "$state" in
- MASTER)
- # Restart stunnel to bind to the VIP
- service stunnel restart
- logger "Restarted stunnel for MASTER state"
- ;;
- BACKUP)
- # Stop stunnel since we can't bind to VIP as BACKUP
- service stunnel stop
- logger "Stopped stunnel for BACKUP state"
- ;;
-esac
-EOF
-
-paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
-
-# Copy the same script to f1
-paul@f0:~ % scp /usr/local/bin/carpcontrol.sh f1:/tmp/
-paul@f1:~ % doas mv /tmp/carpcontrol.sh /usr/local/bin/
-paul@f1:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
-```
-
-Enable CARP in /boot/loader.conf:
-
-```sh
-paul@f0:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
-carp_load="YES"
-paul@f1:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
-carp_load="YES"
-```
-
-Then reboot both hosts or run `doas kldload carp` to load the module immediately.
-
-
## ZFS Replication with zrepl
-Data replication is the cornerstone of high availability. While CARP handles IP failover, we need continuous data replication to ensure the backup server has current data when it becomes active. Without replication, failover would result in data loss or require shared storage (like iSCSI), which introduces a single point of failure.
+Data replication is the cornerstone of high availability. While CARP handles IP failover (see later in this post), we need continuous data replication to ensure the backup server has current data when it becomes active. Without replication, failover would result in data loss or require shared storage (like iSCSI), which introduces a single point of failure.
### Understanding Replication Requirements
@@ -459,32 +355,23 @@ Our storage system has different replication needs:
The replication frequency determines your Recovery Point Objective (RPO) - the maximum acceptable data loss. With 1-minute replication, you lose at most 1 minute of changes during an unplanned failover.
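The RPO arithmetic is simple enough to sketch in shell. The snapshot interval comes from the zrepl config; the incremental transfer time is an assumed illustrative value, not a measurement:

```shell
# Worst-case RPO ~= snapshot interval + time to ship one incremental snapshot
snapshot_interval_s=60      # zrepl snapshots the NFS data every minute
transfer_s=15               # assumed incremental send/receive duration
worst_case_rpo_s=$((snapshot_interval_s + transfer_s))
echo "worst-case data loss window: ${worst_case_rpo_s}s"
```

In practice, incrementals over the WireGuard link are small, so the interval dominates the window.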
-### Why zrepl instead of HAST?
+### Why `zrepl` instead of HAST?
-While HAST (Highly Available Storage) is FreeBSD's native solution for high-availability storage, I've chosen zrepl for several important reasons:
+While HAST (Highly Available Storage) is FreeBSD's native solution for high-availability storage, I've chosen `zrepl` for several important reasons:
-1. HAST can cause ZFS corruption: HAST operates at the block level and doesn't understand ZFS's transactional semantics. During failover, in-flight transactions can lead to corrupted zpools. I've experienced this firsthand - the automatic failover would trigger while ZFS was still writing, resulting in an unmountable pool.
+* HAST can cause ZFS corruption: HAST operates at the block level and doesn't understand ZFS's transactional semantics. During failover, in-flight transactions can lead to corrupted zpools. I've experienced this firsthand - the automatic failover would trigger while ZFS was still writing, resulting in an unmountable pool.
+* ZFS-aware replication: `zrepl` understands ZFS datasets and snapshots. It replicates at the dataset level, ensuring each snapshot is a consistent point-in-time copy. This is fundamentally safer than block-level replication.
+* Snapshot history: With zrepl, you get multiple recovery points (every minute for NFS data in our setup). If corruption occurs, you can roll back to any previous snapshot. HAST only gives you the current state.
+* Easier recovery: When something goes wrong with zrepl, you still have intact snapshots on both sides. With HAST, a corrupted primary often means a corrupted secondary too.
-2. ZFS-aware replication: zrepl understands ZFS datasets and snapshots. It replicates at the dataset level, ensuring each snapshot is a consistent point-in-time copy. This is fundamentally safer than block-level replication.
-
-3. Snapshot history: With zrepl, you get multiple recovery points (every minute for NFS data in our setup). If corruption occurs, you can roll back to any previous snapshot. HAST only gives you the current state.
-
-4. Easier recovery: When something goes wrong with zrepl, you still have intact snapshots on both sides. With HAST, a corrupted primary often means a corrupted secondary too.
-
-5. Network flexibility: zrepl works over any TCP connection (in our case, WireGuard), while HAST requires dedicated network configuration.
-
-The 5-minute replication window is perfectly acceptable for my personal use cases. This isn't a high-frequency trading system or a real-time database - it's storage for personal projects, development work, and home lab experiments. Losing at most 5 minutes of work in a disaster scenario is a reasonable trade-off for the reliability and simplicity of snapshot-based replication.
+The 1-minute replication window is perfectly acceptable for my personal use cases. This isn't a high-frequency trading system or a real-time database—it's storage for personal projects, development work, and home lab experiments. Losing at most 1 minute of work in a disaster scenario is a reasonable trade-off for the reliability and simplicity of snapshot-based replication. Also, in the case of "1 minute of data loss," I would very likely still have the data available on the client side.
### Installing zrepl
-First, install zrepl on both hosts:
+First, install `zrepl` on both hosts involved (we will replicate data from `f0` to `f1`):
-```
-# On f0
+```sh
paul@f0:~ % doas pkg install -y zrepl
-
-# On f1
-paul@f1:~ % doas pkg install -y zrepl
```
### Checking ZFS pools
@@ -513,7 +400,7 @@ NAME USED AVAIL REFER MOUNTPOINT
zdata/enc 200K 899G 200K /data/enc
```
-### Configuring zrepl with WireGuard tunnel
+### Configuring `zrepl` with WireGuard tunnel
Since we have a WireGuard tunnel between f0 and f1, we'll use TCP transport over the secure tunnel instead of SSH. First, check the WireGuard IP addresses:
@@ -526,7 +413,7 @@ paul@f1:~ % ifconfig wg0 | grep inet
inet 192.168.2.131 netmask 0xffffff00
```
-### Configuring zrepl on f0 (source)
+### Configuring `zrepl` on f0 (source)
First, create a dedicated dataset for NFS data that will be replicated:
@@ -535,7 +422,7 @@ First, create a dedicated dataset for NFS data that will be replicated:
paul@f0:~ % doas zfs create zdata/enc/nfsdata
```
-Create the zrepl configuration on f0:
+Create the `zrepl` configuration on f0:
```sh
paul@f0:~ % doas tee /usr/local/etc/zrepl/zrepl.yml <<'EOF'
@@ -554,7 +441,7 @@ jobs:
filesystems:
"zdata/enc/nfsdata": true
send:
- encrypted: false
+ encrypted: true
snapshotting:
type: periodic
prefix: zrepl_
@@ -575,7 +462,7 @@ jobs:
filesystems:
"zroot/bhyve/fedora": true
send:
- encrypted: false
+ encrypted: true
snapshotting:
type: periodic
prefix: zrepl_
@@ -590,16 +477,21 @@ jobs:
EOF
```
-Key configuration notes:
-* We're using two separate replication jobs with different intervals:
- - `f0_to_f1_nfsdata`: Replicates NFS data every minute for faster failover recovery
- - `f0_to_f1_fedora`: Replicates Fedora VM every 10 minutes (less critical for NFS operations)
+We're using two separate replication jobs with different intervals:
+
+* `f0_to_f1_nfsdata`: Replicates NFS data every minute for faster failover recovery
+* `f0_to_f1_fedora`: Replicates Fedora VM every 10 minutes (less critical for NFS operations)
+
+The Fedora VM is only used for development purposes, so it doesn't require replication as frequently as the NFS data. While slightly off-topic for this blog series, it showcases `zrepl`'s flexibility in handling datasets with varying replication needs.
+
+Furthermore:
+
* We're specifically replicating `zdata/enc/nfsdata` instead of the entire `zdata/enc` dataset. This dedicated dataset will contain all the data we later want to expose via NFS, keeping a clear separation between replicated NFS data and other local encrypted data.
-* The `send: encrypted: false` option disables ZFS native encryption for the replication stream. Since we're using a WireGuard tunnel between f0 and f1, the data is already encrypted in transit. Disabling ZFS stream encryption reduces CPU overhead and improves replication performance.
+* The `send: encrypted: true` option sends the snapshots as raw, ZFS-encrypted streams. This keeps the data encrypted at rest on f1 as well, and the WireGuard tunnel between f0 and f1 additionally encrypts it in transit.
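As a side note, the `zrepl_` prefix plus timestamp in the snapshot names makes it easy to eyeball the snapshot cadence straight from `zfs list` output. A small sketch, using sample snapshot names rather than live output:

```sh
# Sketch: extract the timestamps from zrepl snapshot names (sample names,
# not live `zfs list` output) to verify the one-minute snapshot cadence.
snaps='zdata/enc/nfsdata@zrepl_20250701_193148_000
zdata/enc/nfsdata@zrepl_20250701_193248_000'
timestamps=$(printf '%s\n' "$snaps" | sed -E 's/.*@zrepl_[0-9]{8}_([0-9]{6})_.*/\1/')
echo "$timestamps"   # 193148 and 193248: exactly one minute apart
```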
-### Configuring zrepl on f1 (sink)
+### Configuring `zrepl` on `f1` (sink)
-Create the zrepl configuration on f1:
+On `f1` we configure `zrepl` to receive the data as follows:
```sh
# First create a dedicated sink dataset
@@ -613,7 +505,7 @@ global:
format: human
jobs:
- - name: "sink"
+ - name: sink
type: sink
serve:
type: tcp
@@ -627,41 +519,41 @@ jobs:
EOF
```
-### Enabling and starting zrepl services
+### Enabling and starting `zrepl` services
-Enable and start zrepl on both hosts:
+Enable and start `zrepl` on both hosts:
```sh
# On f0
paul@f0:~ % doas sysrc zrepl_enable=YES
zrepl_enable: -> YES
paul@f0:~ % doas service zrepl start
Starting zrepl.
# On f1
paul@f1:~ % doas sysrc zrepl_enable=YES
zrepl_enable: -> YES
paul@f1:~ % doas service zrepl start
Starting zrepl.
```
### Verifying replication
-Check the replication status:
+To check the replication status, we run:
```sh
# On f0, check zrepl status (use raw mode for non-tty)
paul@f0:~ % doas zrepl status --mode raw | grep -A2 "Replication"
"Replication":{"StartAt":"2025-07-01T22:31:48.712143123+03:00"...
# Check if services are running
paul@f0:~ % doas service zrepl status
zrepl is running as pid 2649.
paul@f1:~ % doas service zrepl status
zrepl is running as pid 2574.
# Check for zrepl snapshots on source
paul@f0:~ % doas zfs list -t snapshot -r zdata/enc | grep zrepl
zdata/enc@zrepl_20250701_193148_000 0B - 176K -
@@ -683,91 +575,37 @@ You can monitor the replication progress with:
```sh
# Real-time status
paul@f0:~ % doas zrepl status --mode interactive
# Check specific job details
paul@f0:~ % doas zrepl status --job f0_to_f1
```
-With this setup, both `zdata/enc/nfsdata` and `zroot/bhyve/fedora` on f0 will be automatically replicated to f1 every 5 minutes, with encrypted snapshots preserved on both sides. The pruning policy ensures that we keep the last 10 snapshots while managing disk space efficiently.
+With this setup, both `zdata/enc/nfsdata` and `zroot/bhyve/fedora` on f0 will be automatically replicated to f1 (every minute and every 10 minutes, respectively), with encrypted snapshots preserved on both sides. The pruning policy ensures that we keep the last 10 snapshots while managing disk space efficiently.
The replicated data appears on f1 under `zdata/sink/` with the source host and dataset hierarchy preserved:
* `zdata/enc/nfsdata` → `zdata/sink/f0/zdata/enc/nfsdata`
* `zroot/bhyve/fedora` → `zdata/sink/f0/zroot/bhyve/fedora`
-This is by design - zrepl preserves the complete path from the source to ensure there are no conflicts when replicating from multiple sources. The replication uses the WireGuard tunnel for secure, encrypted transport between nodes.
-
-### A note about the Bhyve VM replication
-
-While replicating a Bhyve VM (Fedora in this case) is slightly off-topic for the f3s series, I've included it here as it demonstrates zrepl's flexibility. This is a development VM I use occasionally to log in remotely for certain development tasks. Having it replicated ensures I have a backup copy available on f1 if needed.
-
-### Quick status check commands
-
-Here are the essential commands to monitor replication status:
-
-```sh
-# On the source node (f0) - check if replication is active
-paul@f0:~ % doas zrepl status --job f0_to_f1 | grep -E '(State|Last)'
-State: done
-LastError:
-
-# List all zrepl snapshots on source
-paul@f0:~ % doas zfs list -t snapshot | grep zrepl
-zdata/enc/nfsdata@zrepl_20250701_202530_000 0B - 200K -
-zroot/bhyve/fedora@zrepl_20250701_202530_000 0B - 2.97G -
-
-# On the sink node (f1) - verify received datasets
-paul@f1:~ % doas zfs list -r zdata/sink
-NAME USED AVAIL REFER MOUNTPOINT
-zdata/sink 3.0G 896G 200K /data/sink
-zdata/sink/f0 3.0G 896G 200K none
-zdata/sink/f0/zdata 472K 896G 200K none
-zdata/sink/f0/zdata/enc 272K 896G 200K none
-zdata/sink/f0/zdata/enc/nfsdata 176K 896G 176K none
-zdata/sink/f0/zroot 2.9G 896G 200K none
-zdata/sink/f0/zroot/bhyve 2.9G 896G 200K none
-zdata/sink/f0/zroot/bhyve/fedora 2.9G 896G 2.97G none
-
-# Check received snapshots on sink
-paul@f1:~ % doas zfs list -t snapshot -r zdata/sink | grep zrepl | wc -l
- 3
-
-# Monitor replication progress in real-time (on source)
-paul@f0:~ % doas zrepl status --mode interactive
-
-# Check last replication time (on source)
-paul@f0:~ % doas zrepl status --job f0_to_f1 | grep -A1 "Replication"
-Replication:
- Status: Idle (last run: 2025-07-01T22:41:48)
-
-# View zrepl logs for troubleshooting
-paul@f0:~ % doas tail -20 /var/log/zrepl.log | grep -E '(error|warn|replication)'
-```
-
-These commands provide a quick way to verify that:
-
-* Replication jobs are running without errors
-* Snapshots are being created on the source
-* Data is being received on the sink
-* The replication schedule is being followed
+This is by design - `zrepl` preserves the complete path from the source to ensure there are no conflicts when replicating from multiple sources. The replication uses the WireGuard tunnel for secure, encrypted transport between nodes.
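To make the mapping explicit: the sink path is simply the sink's `root_fs`, plus the source host (the client identity), plus the full source dataset path. A tiny illustrative helper (the function name is made up for this sketch, assuming `root_fs` is `zdata/sink` and the client identity equals the source host name):

```sh
# Hypothetical helper for illustration only: compute the path a source
# dataset lands under on the sink, given root_fs zdata/sink.
sink_path() {
  src_host=$1   # client identity, e.g. f0
  src_fs=$2     # full source dataset path
  printf 'zdata/sink/%s/%s\n' "$src_host" "$src_fs"
}
sink_path f0 zdata/enc/nfsdata    # zdata/sink/f0/zdata/enc/nfsdata
```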
### Verifying replication after reboot
-The zrepl service is configured to start automatically at boot. After rebooting both hosts:
+The `zrepl` service is configured to start automatically at boot. After rebooting both hosts:
```sh
paul@f0:~ % uptime
11:17PM up 1 min, 0 users, load averages: 0.16, 0.06, 0.02
paul@f0:~ % doas service zrepl status
zrepl is running as pid 2366.
paul@f1:~ % doas service zrepl status
zrepl is running as pid 2309.
# Check that new snapshots are being created and replicated
paul@f0:~ % doas zfs list -t snapshot | grep zrepl | tail -2
zdata/enc/nfsdata@zrepl_20250701_202530_000 0B - 200K -
zroot/bhyve/fedora@zrepl_20250701_202530_000 0B - 2.97G -
@@ -780,6 +618,8 @@ The timestamps confirm that replication resumed automatically after the reboot,
### Understanding Failover Limitations and Design Decisions
#### Why Manual Failover?
This storage system intentionally uses manual failover rather than automatic failover. This might seem counterintuitive for a "high availability" system, but it's a deliberate design choice based on real-world experience:
@@ -816,7 +656,7 @@ For true high-availability NFS, you might consider:
Note: While HAST+CARP is often suggested for HA storage, it can cause filesystem corruption in practice, especially with ZFS. The block-level replication of HAST doesn't understand ZFS's transactional model, leading to inconsistent states during failover.
-The current zrepl setup, despite requiring manual intervention, is actually safer because:
+The current `zrepl` setup, despite requiring manual intervention, is actually safer because:
* ZFS snapshots are always consistent
* Replication is ZFS-aware (not just block-level)
@@ -912,12 +752,12 @@ paul@f0:~ % doas zfs destroy zdata/enc/nfsdata@failback
paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
paul@f1:~ % doas zfs destroy zdata/sink/f0/zdata/enc/nfsdata@failback
# Stop zrepl services first - CRITICAL!
paul@f0:~ % doas service zrepl stop
paul@f1:~ % doas service zrepl stop
# Clean up any zrepl snapshots on f0
paul@f0:~ % doas zfs list -t snapshot -r zdata/enc/nfsdata | grep zrepl | \
awk '{print $1}' | xargs -I {} doas zfs destroy {}
# Clean up and destroy the entire replicated structure on f1
@@ -953,19 +793,19 @@ paul@f1:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
zdata/sink/f0/zdata/enc/nfsdata
paul@f1:~ % doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
# Now restart zrepl services
paul@f0:~ % doas service zrepl start
paul@f1:~ % doas service zrepl start
# Verify replication is working
paul@f0:~ % doas zrepl status --job f0_to_f1
```
Important notes about failback:
* The `-F` flag forces a rollback on f0, destroying any local changes
* Replication often won't resume automatically after a forced receive
-* You must clean up old zrepl snapshots on both sides
+* You must clean up old `zrepl` snapshots on both sides
* Creating a manual snapshot helps re-establish the replication relationship
* Always verify replication status after the failback procedure
* The first replication after failback will be a full send of the current state
@@ -976,7 +816,7 @@ Here's a real test of the failback procedure:
```sh
# Simulate failure: Stop replication on f0
paul@f0:~ % doas service zrepl stop
# On f1: Take over by making the dataset writable
paul@f1:~ % doas zfs set readonly=off zdata/sink/f0/zdata/enc/nfsdata
@@ -1015,7 +855,7 @@ Success! The failover data from f1 is now on f0. To resume normal replication, y
1. Clean up old snapshots on both sides
2. Create a new manual baseline snapshot
-3. Restart zrepl services
+3. Restart `zrepl` services
Key learnings from the test:
@@ -1086,9 +926,9 @@ Important notes:
If you see the error "cannot receive incremental stream: destination has been modified since most recent snapshot", it means the read-only flag was accidentally removed on f1. To fix without a full resync:
```sh
# Stop zrepl on both servers
paul@f0:~ % doas service zrepl stop
paul@f1:~ % doas service zrepl stop
# Find the last common snapshot
paul@f0:~ % doas zfs list -t snapshot -o name,creation zdata/enc/nfsdata
@@ -1101,8 +941,8 @@ paul@f1:~ % doas zfs rollback -r zdata/sink/f0/zdata/enc/nfsdata@zrepl_20250705_
paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
# Restart zrepl
paul@f0:~ % doas service zrepl start
paul@f1:~ % doas service zrepl start
```
### Forcing a full resync
@@ -1111,8 +951,8 @@ If replication gets out of sync and incremental updates fail:
```sh
# Stop services
paul@f0:~ % doas service zrepl stop
paul@f1:~ % doas service zrepl stop
# On f1: Release holds and destroy the dataset
paul@f1:~ % doas zfs holds -r zdata/sink/f0/zdata/enc/nfsdata | \
@@ -1137,17 +977,122 @@ paul@f1:~ % doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
# Clean up and restart
paul@f0:~ % doas zfs destroy zdata/enc/nfsdata@resync
paul@f1:~ % doas zfs destroy zdata/sink/f0/zdata/enc/nfsdata@resync
paul@f0:~ % doas service zrepl start
paul@f1:~ % doas service zrepl start
```
TODO: ZFS auto-scrubbing?
TODO: Backup of the keys on the key locations (all keys on all 3 USB keys)
+## CARP (Common Address Redundancy Protocol)
+
+High availability is crucial for storage systems. If the storage server goes down, all pods lose access to their persistent data. CARP provides a solution by creating a virtual IP address that automatically moves between servers during failures.
+
+### How CARP Works
+
+CARP allows two hosts to share a virtual IP address (VIP). The hosts communicate using multicast to elect a MASTER, while the other remains in the BACKUP state. When the MASTER fails, the BACKUP automatically promotes itself, and the VIP moves to the new MASTER. This happens within seconds.
+
+Key benefits for our storage system:
+
+* Automatic failover: No manual intervention is required for basic failures, although there are a few limitations. The backup will only have read-only access to the available data, as we will learn later. However, we could manually promote it to read-write if needed.
+* Transparent to clients: Pods continue using the same IP address
+* Works with stunnel: A `stunnel` process listens on the VIP, so encrypted connections always reach the currently active server
+* Simple configuration
+
+### Configuring CARP
+
+First, add the CARP configuration to `/etc/rc.conf` on both f0 and f1:
+
+```sh
+# The virtual IP 192.168.1.138 will float between f0 and f1
+ifconfig_re0_alias0="inet vhid 1 pass testpass alias 192.168.1.138/32"
+```
+
+Where:
+
+* `vhid 1`: Virtual Host ID - must match on all CARP members
+* `pass testpass`: Password for CARP authentication (if you follow this, use a different password!)
+* `alias 192.168.1.138/32`: The virtual IP address with a /32 netmask
+
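Since the vhid and password must match on all CARP members, it can be handy to extract them from the rc.conf line for a quick comparison between f0 and f1. A sketch, with the line above as sample input:

```sh
# Sketch: pull the vhid out of the rc.conf CARP line so it can be compared
# between f0 and f1 (vhid and password must match on all CARP members).
line='ifconfig_re0_alias0="inet vhid 1 pass testpass alias 192.168.1.138/32"'
vhid=$(printf '%s\n' "$line" | sed -E 's/.*vhid ([0-9]+).*/\1/')
echo "$vhid"   # 1
```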
+Next, update `/etc/hosts` on all nodes (n0, n1, n2, r0, r1, r2) to resolve the VIP hostname:
+
+```
+192.168.1.138 f3s-storage-ha f3s-storage-ha.lan f3s-storage-ha.lan.buetow.org
+```
+
+This allows clients to connect to `f3s-storage-ha` regardless of which physical server is currently the MASTER.
+
+### CARP State Change Notifications
+
+To properly manage services during failover, we need to detect CARP state changes. FreeBSD's devd system can notify us when CARP transitions between MASTER and BACKUP states.
+
+Add this to `/etc/devd.conf` on both f0 and f1:
+
+```sh
+paul@f0:~ % cat <<'END' | doas tee -a /etc/devd.conf
+notify 0 {
+    match "system" "CARP";
+    match "subsystem" "[0-9]+@[0-9a-z.]+";
+    match "type" "(MASTER|BACKUP)";
+    action "/usr/local/bin/carpcontrol.sh $subsystem $type";
+};
+END
+```
+
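The subsystem devd reports for CARP events has the form `<vhid>@<interface>` (e.g. `1@re0`), which is what the regular expression in the `match` statement is meant to catch. A quick offline check of that pattern against a sample value:

```sh
# Offline check: a CARP devd subsystem looks like "<vhid>@<iface>", e.g.
# "1@re0"; verify the devd.conf pattern matches such a value.
subsystem='1@re0'
echo "$subsystem" | grep -Eq '^[0-9]+@[0-9a-z.]+$' && result=match || result=nomatch
echo "$result"   # match
```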
+Next, create the CARP control script that starts or stops the NFS and `stunnel` services when the CARP state changes:
+
+```sh
+paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
+#!/bin/sh
+# CARP state change control script
+
+case "$1" in
+ MASTER)
+ logger "CARP state changed to MASTER, starting services"
+ service rpcbind start >/dev/null 2>&1
+ service mountd start >/dev/null 2>&1
+ service nfsd start >/dev/null 2>&1
+ service nfsuserd start >/dev/null 2>&1
+ service stunnel restart >/dev/null 2>&1
+ logger "CARP MASTER: NFS and stunnel services started"
+ ;;
+ BACKUP)
+ logger "CARP state changed to BACKUP, stopping services"
+ service stunnel stop >/dev/null 2>&1
+ service nfsd stop >/dev/null 2>&1
+ service mountd stop >/dev/null 2>&1
+ service nfsuserd stop >/dev/null 2>&1
+ logger "CARP BACKUP: NFS and stunnel services stopped"
+ ;;
+ *)
+ logger "CARP state changed to $1 (unhandled)"
+ ;;
+esac
+EOF
+
+paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
+
+# Copy the same script to f1
+paul@f0:~ % scp /usr/local/bin/carpcontrol.sh f1:/tmp/
+paul@f1:~ % doas mv /tmp/carpcontrol.sh /usr/local/bin/
+paul@f1:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
+```
+
+The `carpcontrol.sh` script starts or stops all the services required for an NFS server running over an encrypted tunnel (via `stunnel`), depending on the CARP state. We will set up all of those services later in this blog post!
+
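Since devd normally invokes the script as root, it helps to dry-run the dispatch logic first by stubbing out `service` and `logger`. A reduced sketch of the same case statement (the `carp_dispatch` name and the shortened service list are for this sketch only):

```sh
# Dry-run sketch of the carpcontrol.sh dispatch: stub service/logger so the
# case logic can be exercised without touching real services.
service() { echo "service $1 $2"; }
logger()  { :; }  # discard log messages in the dry run
carp_dispatch() {
  case "$1" in
    MASTER) service nfsd start; service stunnel restart ;;
    BACKUP) service stunnel stop; service nfsd stop ;;
    *) echo "unhandled state: $1" ;;
  esac
}
carp_dispatch MASTER
```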
+To load the CARP kernel module at boot, add it to `/boot/loader.conf`:
+
+```sh
+paul@f0:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
+carp_load="YES"
+paul@f1:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
+carp_load="YES"
+```
+
+Then reboot both hosts or run `doas kldload carp` to load the module immediately.
+
## Future Storage Explorations
-While zrepl provides excellent snapshot-based replication for disaster recovery, there are other storage technologies worth exploring for the f3s project:
+While `zrepl` provides excellent snapshot-based replication for disaster recovery, there are other storage technologies worth exploring for the f3s project:
### MinIO for S3-Compatible Object Storage
@@ -1913,7 +1858,7 @@ With NFS servers running on both f0 and f1 and stunnel bound to the CARP VIP:
-* Data consistency: ZFS replication ensures f1 has recent data (within 5-minute window)
+* Data consistency: ZFS replication ensures f1 has recent data (within the 1-minute window)
* Read-only replica: The replicated dataset on f1 is always mounted read-only to prevent breaking replication
* Manual intervention required for full RW failover: When f1 becomes MASTER, you must:
- 1. Stop zrepl to prevent conflicts: `doas service zrepl stop`
+ 1. Stop `zrepl` to prevent conflicts: `doas service zrepl stop`
2. Make the replicated dataset writable: `doas zfs set readonly=off zdata/sink/f0/zdata/enc/nfsdata`
3. Ensure encryption keys are loaded (should be automatic with zfskeys_enable)
4. NFS will automatically start serving read/write requests through the VIP
@@ -2116,7 +2061,7 @@ To check if replication is working correctly:
```sh
# Check replication status
paul@f0:~ % doas zrepl status
# Check recent snapshots on source
paul@f0:~ % doas zfs list -t snapshot -o name,creation zdata/enc/nfsdata | tail -5
@@ -2128,8 +2073,8 @@ paul@f1:~ % doas zfs list -t snapshot -o name,creation zdata/sink/f0/zdata/enc/n
paul@f1:~ % ls -la /data/nfs/k3svolumes/
```
-Important: If you see "connection refused" errors in zrepl logs, ensure:
-* Both servers have zrepl running (`doas service zrepl status`)
+Important: If you see "connection refused" errors in `zrepl` logs, ensure:
+* Both servers have `zrepl` running (`doas service zrepl status`)
* No firewall or hosts.allow rules are blocking port 8888
* WireGuard is up if using WireGuard IPs for replication
@@ -2156,9 +2101,9 @@ paul@f0:~ % doas showmount -e localhost
# Test write access
[root@r0 ~]# echo "Test after reboot $(date)" > /data/nfs/k3svolumes/test-reboot.txt
# Verify zrepl is running and replicating
paul@f0:~ % doas service zrepl status
paul@f1:~ % doas service zrepl status
```
### Integration with Kubernetes
@@ -2615,7 +2560,7 @@ For reference, with AES-256-GCM on a typical mini PC:
### Replication Bandwidth
-ZFS replication with zrepl is efficient, only sending changed blocks:
+ZFS replication with `zrepl` is efficient, only sending changed blocks:
* Initial sync: Full dataset size (can be large)
* Incremental: Typically <1% of dataset size per snapshot
@@ -2737,7 +2682,7 @@ The storage layer is the foundation for any serious Kubernetes deployment. By bu
* FreeBSD CARP documentation: https://docs.freebsd.org/en/books/handbook/advanced-networking/#carp
* ZFS encryption guide: https://docs.freebsd.org/en/books/handbook/zfs/#zfs-encryption
* Stunnel documentation: https://www.stunnel.org/docs.html
-* zrepl documentation: https://zrepl.github.io/
+* `zrepl` documentation: https://zrepl.github.io/
Other *BSD-related posts:
diff --git a/gemfeed/atom.xml b/gemfeed/atom.xml
index 58e2e2a2..e7e59869 100644
--- a/gemfeed/atom.xml
+++ b/gemfeed/atom.xml
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
- <updated>2025-07-02T00:37:08+03:00</updated>
+ <updated>2025-07-12T22:45:27+03:00</updated>
<title>foo.zone feed</title>
<subtitle>To be in the .zone!</subtitle>
<link href="gemini://foo.zone/gemfeed/atom.xml" rel="self" />
@@ -781,7 +781,7 @@
<li>⇢ <a href='#where-and-how-to-get-it'>Where and how to get it</a></li>
<li>⇢ <a href='#lessons-learned-from-building-task-samurai-with-agentic-coding'>Lessons learned from building Task Samurai with agentic coding</a></li>
<li>⇢ ⇢ <a href='#developer-workflow'>Developer workflow</a></li>
-<li>⇢ ⇢ <a href='#how-it-went-down'>How it went down</a></li>
+<li>⇢ ⇢ <a href='#how-it-went'>How it went</a></li>
<li>⇢ ⇢ <a href='#what-went-wrong'>What went wrong</a></li>
<li>⇢ ⇢ <a href='#patterns-that-helped'>Patterns that helped</a></li>
<li>⇢ ⇢ <a href='#what-i-learned-using-agentic-coding'>What I learned using agentic coding</a></li>
@@ -798,6 +798,7 @@
<h3 style='display: inline' id='why-does-this-exist'>Why does this exist?</h3><br />
<br />
<span>I wanted to tinker with agentic coding. This project was implemented entirely using OpenAI Codex. (After this blog post was published, I also used the Claude Code CLI.)</span><br />
+<br />
<ul>
<li>I wanted a faster UI for Taskwarrior than other options, like Vit, which is Python-based.</li>
<li>I wanted something built with Bubble Tea, but I never had time to dive deep into it.</li>
@@ -825,17 +826,19 @@
<br />
<h3 style='display: inline' id='developer-workflow'>Developer workflow</h3><br />
<br />
-<span>I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am trying out currently) credits (it still happens!), but Codex was still available to me. So, I seized the opportunity to push agentic coding a bit more using another platform.</span><br />
+<span>I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am currently trying out) credits (it still happens!), but Codex was still available to me. So, I took the opportunity to push agentic coding a bit further with another platform.</span><br />
<br />
<span>I didn&#39;t really love the web UI you have to use for Codex, as I usually live in the terminal. But this is all I have for Codex for now, and I thought I&#39;d give it a try regardless. The web UI is simple and pretty straightforward. There&#39;s also a Codex CLI one could use directly in the terminal, but I didn&#39;t get it working. I will try again soon.</span><br />
<br />
+<span class='quote'>Update: Codex CLI now works for me, after OpenAI released a new version!</span><br />
+<br />
<span>For every task given to Codex, it spins up its own container. From there, you can drill down and watch what it is doing. At the end, the result (in the form of a code diff) will be presented. From there, you can make suggestions about what else to change in the codebase. What I found inconvenient is that for every additional change, there&#39;s an overhead because Codex has to spin up a container and bootstrap the entire development environment again, which adds extra delay. That could be eliminated by setting up predefined custom containers, but that feature still seems somewhat limited.</span><br />
<br />
-<span>Once satisfied, you can ask Codex to create a GitHub PR; from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.</span><br />
+<span>Once satisfied, you can ask Codex to create a GitHub PR (too bad only GitHub is supported and no other Git hosting services); from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.</span><br />
<br />
-<h3 style='display: inline' id='how-it-went-down'>How it went down</h3><br />
+<h3 style='display: inline' id='how-it-went'>How it went</h3><br />
<br />
-<span>Task Samurai&#39;s codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits. Here are the broad strokes:</span><br />
+<span>Task Samurai&#39;s codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits:</span><br />
<br />
<ul>
<li>June 19: Scaffolded the Go boilerplate, set up tests, integrated the Bubble Tea UI framework, and got the first table views showing up.</li>
@@ -849,7 +852,7 @@
<br />
<h3 style='display: inline' id='what-went-wrong'>What went wrong</h3><br />
<br />
-<span>Going agentic isn&#39;t all smooth sailing. Here are the hiccups I ran into, plus a few hard-earned lessons:</span><br />
+<span>Going agentic isn&#39;t all smooth. Here are the hiccups I ran into, plus a few lessons:</span><br />
<br />
<ul>
<li>Merge Floods: Every minor feature or fix existed on its branch, so merging was a constant process. It kept progress flowing but also drowned the committed history in noise and the occasional conflict. I found this to be an issue with OpenAI&#39;s Codex in particular. Not so much with other agentic coding tools like Claude Code CLI (not covered in this blog post.)</li>
@@ -865,29 +868,28 @@
<li>Tests Matter: A solid base of unit tests for task manipulations kept things from breaking entirely when experimenting.</li>
<li>Live Documentation: Documentation, such as the README, is updated regularly to reflect all the hotkey and feature changes.</li>
</ul><br />
+<span>Maybe a better approach would have been to design the whole application up front before letting Codex do any of the coding. I will try that with my next toy project.</span><br />
<br />
<h3 style='display: inline' id='what-i-learned-using-agentic-coding'>What I learned using agentic coding</h3><br />
<br />
-<span>Stepping into agentic coding with Codex as my "pair programmer" was a genuine shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at breakneck speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).</span><br />
-<br />
-<span>Discussing requirements with Codex forced me to clarify features and spot logical pitfalls earlier. All those fast iterations meant I was constantly coaxing more helpful, less ambiguous code out of the model—making me rethink how to break features into clear, testable steps.</span><br />
+<span>Stepping into agentic coding with Codex as my "pair programmer" was a big shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at high speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).</span><br />
<br />
<h3 style='display: inline' id='how-much-time-did-i-save'>How much time did I save?</h3><br />
<br />
-<span>Did it buy me speed? Let&#39;s do some back-of-the-envelope math:</span><br />
+<span>Did it buy me speed?</span><br />
<br />
<ul>
<li>Say each commit takes Codex 5 minutes to generate, and you need to review/guide 179 commits = about _6 hours of active development_.</li>
<li>If you coded it all yourself, including all the bug fixes, features, design, and documentation, you might spend _10–20 hours_.</li>
-<li>That&#39;s a couple of days potential savings.</li>
+<li>That&#39;s a couple of days of potential savings—and I am by no means an expert in agentic coding, since this was my first completed agentic coding project.</li>
</ul><br />
<h2 style='display: inline' id='conclusion'>Conclusion</h2><br />
<br />
-<span>Building Task Samurai with agentic coding was a wild ride—rapid feature growth, plenty of churns, countless fast fixes, and more merge commits I&#39;d expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the raw power (and some hazards) of agentic development.</span><br />
+<span>Building Task Samurai with agentic coding was a wild ride—rapid feature growth, countless fast fixes, and more merge commits than I&#39;d expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a polished terminal UI in days instead of weeks is a testament to the power of agentic development.</span><br />
<br />
<span>Am I an agentic coding expert now? I don&#39;t think so. There are still many things to learn, and the landscape is constantly evolving.</span><br />
<br />
-<span>While working on Task Samurai, there were times I genuinely missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don&#39;t use AI, your next performance review may be awkward.</span><br />
+<span>While working on Task Samurai, there were times I missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don&#39;t use AI, your next performance review may be awkward.</span><br />
<br />
<span>Personally, I am not sure whether I like where the industry is going with agentic coding. I love "traditional" coding, and with agentic coding you operate at a higher level and don&#39;t interact directly with code as often, which I would miss. I think that in the future, designing, reviewing, and being able to read and understand code will be more important than writing code by hand.</span><br />
<br />