buetow.org feed - Having fun with computers!
gemini://buetow.org/
2021-05-15T18:38:00+01:00

gemini://buetow.org/gemfeed/2021-04-24-welcome-to-the-geminispace.gmi
2021-04-24T19:28:41+01:00, Paul Buetow, comments@mx.buetow.org

Welcome to the Geminispace

Written by Paul Buetow 2021-04-24, last updated 2021-04-30, ASCII Art by Andy Hood

Have you reached this article via Gemini already? You need a special client for that, because web browsers such as Firefox, Chrome and Safari don't support the Gemini protocol. The Gemini address of this site (or the address of this capsule, as people say in Geminispace) is:

gemini://buetow.org

If, however, you are still using HTTP, you are just surfing the fallback HTML version of this capsule. In that case I suggest reading on to find out what this is all about :-).


    /\
   /  \
  |    |
  |NASA|
  |    |
  |    |
  |    |
 '      `
 |Gemini|
 |      |
 |______|
  '-`'-`   .
  / . \'\ . .'
 ''( .'\.' ' .;'
'.;.;' ;'.;' ..;;' AsH

Motivation

My urge to revamp my personal website

For some time I had the urge to revamp my personal website. Not to update its technology and design, but to update all the content (and keep it current) and also to start a small tech blog again. So, unconsciously, I started to search for a good platform and/or software to do all of that in a KISS (keep it simple & stupid) way.

My still great Laptop running hot

Earlier this year (2021) I noticed that my 6-year-old but still great Laptop became hot and slowed down while surfing the web. Also, the Laptop's fan became quite noisy. This was all due to the additional bloat on the website I was visiting: JavaScript, excessive use of CSS, tracking cookies and pixels, ads and so on.

All I wanted was to read an interesting article, but after a big advertising pop-up banner appeared and made everything worse, I gave up and closed the browser tab.

Discovering the Gemini internet protocol

Around the same time I discovered a relatively new, more lightweight protocol named Gemini, which does not support all the CPU-intensive features that HTML, JavaScript and CSS bring along. Also, tracking and ads are not supported by the Gemini protocol.

The "downside" is that due to the limited capabilities of the Gemini protocol all sites look very old and spartan. But that is not really a downside; it is in fact a deliberate design choice. It is up to the client software how your capsule looks. For example, you could use a graphical client with nice font rendering and colors to improve the appearance. Or you could just use a very minimalistic black-and-white command-line Gemini client. It's your (the user's) choice.

Screenshot Amfora Gemini terminal client surfing this site

Why is there a need for a new protocol? As the modern web is a superset of Gemini, can't we just use simple HTML 1.0? That's a good and valid question. It is not a technical problem but a human problem. We tend to abuse the features once they are available. You can be sure that things stay simple and efficient as long as you are using the Gemini protocol. On the other hand you can't force every website in the modern web to only create plain and simple looking HTML pages.
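To give an idea of how small the protocol is, here is a minimal Python sketch of the request and response framing only, without any networking. It assumes the framing described in the Gemini specification: the client sends the absolute URL followed by CRLF over TLS (default port 1965), and the server answers with a header line consisting of a two-digit status code, a space and a meta field. The function names are made up for illustration and do not come from any Gemini library.

```python
# Sketch of the Gemini request/response framing (no networking, no TLS).
# build_request and parse_response_header are illustrative names.

def build_request(url):
    """A Gemini request is just the absolute URL followed by CRLF."""
    encoded = url.encode("utf-8")
    if len(encoded) > 1024:
        raise ValueError("Gemini URLs are limited to 1024 bytes")
    return encoded + b"\r\n"


def parse_response_header(header):
    """Split a header such as b'20 text/gemini\\r\\n' into (status, meta)."""
    line = header.decode("utf-8").rstrip("\r\n")
    status, _, meta = line.partition(" ")
    if len(status) != 2 or not status.isdigit():
        raise ValueError("malformed status: %r" % status)
    return int(status), meta


print(build_request("gemini://buetow.org/"))
print(parse_response_header(b"20 text/gemini\r\n"))
```

There are no cookies and no request headers to parse here, which is what keeps clients and servers so simple.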

My own Gemini capsule

As it is very easy to set up and maintain your own Gemini capsule (Gemini server + content composed via the Gemtext markup language) I decided to create my own. What I really like about Gemini is that I can just use my favorite text editor and get typing. I don't need to worry about the style and design of the presence and I also don't have to test anything in ten different web browsers. I can focus on the content only! As a matter of fact, I am using the Vim editor + its spellchecker + auto word completion functionality to write this.
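Gemtext is line-oriented, so a client only has to look at the first few characters of each line to decide how to present it. The following Python sketch classifies lines roughly along the line types of the Gemtext format (headings, links, list items, quotes, preformatted blocks and plain text). It is a simplified illustration, not a complete parser, and the function name and the returned tuples are my own invention.

```python
# Sketch: classify Gemtext lines the way a client might before rendering.
# classify_gemtext and its (kind, content) tuples are illustrative only.

def classify_gemtext(text):
    """Return a list of (kind, content) pairs, one per rendered line."""
    result = []
    preformatted = False
    for line in text.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle lines are not rendered
            continue
        if preformatted:
            result.append(("preformatted", line))
        elif line.startswith("=>"):
            parts = line[2:].strip().split(maxsplit=1)
            url = parts[0] if parts else ""
            label = parts[1] if len(parts) > 1 else url
            result.append(("link", (url, label)))
        elif line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            result.append(("heading%d" % min(level, 3), line.lstrip("#").strip()))
        elif line.startswith("* "):
            result.append(("list", line[2:]))
        elif line.startswith(">"):
            result.append(("quote", line[1:].strip()))
        else:
            result.append(("text", line))
    return result
```

How each kind is drawn (fonts, colors, indentation) is then entirely up to the client.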

Advantages summarised

  • Supports an alternative to the modern bloated web
  • Easy to operate and easy to write content
  • No need to worry about various web browser compatibilities
  • It's the client's responsibility how the content is designed+presented
  • Lightweight (although not as lightweight as the Gopher protocol)
  • Supports privacy (no cookies, no request header fingerprinting, TLS encryption)
  • Fun to play with (it's a bit geeky yes, but a lot of fun!)

Dive into deep Gemini space

Check out one of the following links for more information about Gemini. For example, you will find a FAQ which explains why the protocol is named "Gemini". Many Gemini capsules are dual hosted via Gemini and HTTP(S), so that people new to Gemini can take a sneak peek at the content with a normal web browser. As a matter of fact, some people go as far as tri-hosting all their content via HTTP(S), Gemini and Gopher.

gemini://gemini.circumlunar.space
https://gemini.circumlunar.space

E-Mail me your thoughts at comments@mx.buetow.org!

gemini://buetow.org/gemfeed/2021-04-22-dtail-the-distributed-log-tail-program.gmi
2021-04-22T19:28:41+01:00, Paul Buetow, comments@mx.buetow.org

DTail - The distributed log tail program

Written by Paul Buetow 2021-04-22, last updated 2021-04-26

DTail logo image

This article first appeared at the Mimecast Engineering Blog but I made it available here in my personal Gemini capsule too.

Original Mimecast Engineering Blog post at Medium

Running a large cloud-based service requires monitoring the state of huge numbers of machines, a task for which many standard UNIX tools were not really designed. In this post, I will describe a simple program, DTail, that Mimecast has built and released as Open-Source, which enables us to monitor log files of many servers at once without the costly overhead of a full-blown log management system.

At Mimecast, we run over 10 thousand server boxes. Most of them host multiple microservices and each of them produces log files. Even with the use of time series databases and monitoring systems, raw application logs are still an important source of information when it comes to analysing, debugging, and troubleshooting services.

Every engineer familiar with UNIX or a UNIX-like platform (e.g., Linux) is well aware of tail, a command-line program for displaying the content of a text file on the terminal, which is especially useful for following application or system log files with tail -f logfile.

Think of DTail as a distributed version of the tail program which is very useful when you have a distributed application running on many servers. DTail is an Open-Source, cross-platform, fairly easy to use, support and maintain log file analysis & statistics gathering tool designed for Engineers and Systems Administrators. It is programmed in Google Go.

A Mimecast Pet Project

DTail got its inspiration from public domain tools available already in this area but it is a blue sky from-scratch development which was first presented at Mimecast’s annual internal Pet Project competition (awarded with a Bronze prize). It has gained popularity since and is one of the most widely deployed DevOps tools at Mimecast (reaching nearly 10k server installations) and many engineers use it on a regular basis. The Open-Source version of DTail is available at:

https://dtail.dev

Try it out — We would love any feedback. But first, read on…

Differentiating from log management systems

Why not just use a full-blown log management system? There are various Open-Source and commercial log management solutions available on the market you could choose from (e.g. the ELK stack). Most of them store the logs in a centralized location and are fairly complex to set up and operate. Possibly they are also pretty expensive to operate if you have to buy dedicated hardware (or pay fees to your cloud provider) and have to hire support staff for it.

DTail does not aim to replace any of the log management tools already available but is rather an additional tool crafted especially for ad-hoc debugging and troubleshooting purposes. DTail is cheap to operate because it does not require any dedicated hardware for log storage: it operates directly on the source of the logs. This means that there is a DTail server installed on all server boxes producing logs. This decentralized approach has the direct advantage that no delay is introduced, because the logs are not shipped to a central log storage device. The reduced complexity also makes it more robust against outages. You won’t be able to troubleshoot your distributed application very well if the log management infrastructure isn’t working either.

DTail sample session animated gif

As a downside, you won’t be able to access any logs with DTail when the server is down. Furthermore, a server can store logs only up to a certain capacity as disks will fill up. For the purpose of ad-hoc debugging, these are not typically issues. Usually, it’s the application you want to debug and not the server. And disk space is rarely an issue for bare metal and VM-based systems these days, with sufficient space for several weeks’ worth of log storage being available. DTail also supports reading compressed logs. The currently supported compression algorithms are gzip and zstd.

Combining simplicity, security and efficiency

DTail also has a client component that connects to multiple servers concurrently to read log files (or any other text files).

The DTail client interacts with a DTail server on port TCP/2222 via the SSH protocol and does not interact in any way with the system’s SSH server (e.g., OpenSSH Server) which might be running at port TCP/22 already. As a matter of fact, you don’t need a regular SSH server running for DTail at all. There is no support for interactive login shells at TCP/2222 either, as by design that port can only be used for text data streaming. The SSH protocol is used for the public/private key infrastructure and transport encryption only and DTail implements its own protocol on top of SSH for the features provided. There is no need to set up or buy any additional TLS certificates. The port 2222 can be easily reconfigured if you prefer to use a different one.

The DTail server, which is a single static binary, will not fork an external process. This means that all features are implemented in native Go code (exception: Linux ACL support is implemented in C, but it must be enabled explicitly at compile time), which helps to make it robust, secure, efficient, and easy to deploy. A single client, running on a standard Laptop, can connect to thousands of servers concurrently while still maintaining a small resource footprint.

Recent log files are very likely still in the file system caches on the servers. Therefore, there tends to be a minimal I/O overhead involved.

The DTail family of commands

Following the UNIX philosophy, DTail includes multiple command-line commands each of them for a different purpose:

  • dserver: The DTail server, the only binary required to be installed on the servers involved.
  • dtail: The distributed log tail client for following log files.
  • dcat: The distributed cat client for concatenating and displaying text files.
  • dgrep: The distributed grep client for searching text files for a regular expression pattern.
  • dmap: The distributed map-reduce client for aggregating stats from log files.

DGrep sample session animated gif

Usage example

The use of these commands is almost self-explanatory for a person already used to the standard command line in Unix systems. One of the main goals is to make DTail easy to use. A tool that is too complicated to use under high-pressure scenarios (e.g., during an incident) can be quite detrimental.

The basic idea is to start one of the clients from the command line and provide a list of servers to connect to with --servers. You also must provide a path of remote (log) files via --files. If you want to process multiple files per server, you could either provide a comma-separated list of file paths or make use of file system globbing (or a combination of both).

The following example would connect to all DTail servers listed in the serverlist.txt, follow all files with the ending .log and filter for lines containing the string error. You can specify any Go compatible regular expression. In this example we add the case-insensitive flag to the regex:

dtail --servers serverlist.txt --files '/var/log/*.log' --regex '(?i:error)'

You usually want to specify a regular expression as a client argument. This means that responses are pre-filtered for matching lines on the server side, so that only the relevant lines are sent back to the client. If your logs are growing very rapidly and the regex is not specific enough, your client might not be fast enough to keep up with processing all of the responses. This could be due to a network bottleneck or something as simple as a slow terminal emulator displaying the log lines on the client side.
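The effect of server-side pre-filtering can be sketched in a few lines of Python. This is only an illustration of the idea and not how DTail is implemented: Python's re module stands in for Go's regular expressions here (both understand the (?i:...) group used above), and the function name and log lines are made up.

```python
import re

# Sketch: keep only the lines matching the client's regex before anything
# is sent over the network. prefilter and the sample lines are hypothetical.

def prefilter(lines, pattern):
    """Yield only the lines that match the given regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


log_lines = [
    "INFO service started",
    "ERROR disk full",
    "WARN retrying after error",
    "DEBUG heartbeat ok",
]
matching = list(prefilter(log_lines, r"(?i:error)"))
print(matching)
```

Only two of the four sample lines would have to travel to the client, which is the whole point of filtering at the source.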

A green 100 in the client output before each log line received from the server always indicates that there were no such problems and 100% of all log lines could be displayed on your terminal (have a look at the animated Gifs in this post). If the percentage falls below 100 it means that some of the channels used by the servers to send data to the client are congested and lines were dropped. In this case, the color will change from green to red. The user then could decide to run the same query but with a more specific regex.
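One way to picture this is a server that never blocks on a congested channel but drops lines instead and keeps track of how many got through. The Python sketch below uses a bounded queue as a stand-in for such a channel. How DTail actually calculates the percentage is not described here, so treat the formula and all names as assumptions.

```python
import queue

# Sketch: non-blocking sends into a bounded channel, dropping lines when it
# is full. send_lines and the percentage formula are assumptions.

def send_lines(lines, channel):
    """Try to enqueue every line; return the share delivered in percent."""
    total = 0
    sent = 0
    for line in lines:
        total += 1
        try:
            channel.put_nowait(line)
            sent += 1
        except queue.Full:
            pass  # channel congested: drop the line instead of stalling
    return 100 if total == 0 else int(100 * sent / total)


congested = queue.Queue(maxsize=3)  # nobody is reading from this channel
print(send_lines(["a", "b", "c", "d"], congested))
```

With a channel holding three lines and four lines to send, one line is dropped and the delivered share falls to 75.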

You could also provide a comma-separated list of servers as opposed to a text file. There are many more options you could use. The ones listed here are just the very basic ones. There are more instructions and usage examples on the GitHub page. Also, you can study even more of the available options via the --help switch (some real treasures might be hidden there).

Fitting it in

DTail integrates nicely into the user management of existing infrastructure. It follows normal system permissions and does not open new “holes” on the server, which helps to keep security departments happy. Users do not get more or fewer file read permissions than they would have via a regular SSH login shell. There is full support for SSH keys, traditional UNIX permissions, and Linux ACLs. There is also a very low resource footprint involved. On average, tailing and searching log files requires less than 100MB RAM and less than a quarter of a CPU core per participating server. Complex map-reduce queries on big data sets will require more resources accordingly.

Advanced features

The features listed here are out of the scope of this blog post but are worthwhile to mention:

  • Distributed map-reduce queries on stats provided in log files with dmap. dmap comes with its own SQL-like aggregation query language.
  • Stats streaming with continuous map-reduce queries. The difference to normal queries is that the stats are aggregated over a specified interval only on the newly written log lines. This gives a de-facto live stat view for each interval.
  • Server-side scheduled queries on log files. The queries are configured in the DTail server configuration file and scheduled at certain time intervals. Results are written to CSV files. This is useful for generating daily stats from the log files without the need for an interactive client.
  • Server-side stats streaming with continuous map-reduce queries. This for example can be used to periodically generate stats from the logs at a configured interval, e.g., log error counts by the minute. These then can be sent to a time-series database (e.g., Graphite) and then plotted in a Grafana dashboard.
  • Support for custom extensions. E.g., for different server discovery methods (so you don’t have to rely on plain server lists) and log file formats (so that map-reduce queries can parse more stats from the logs).
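
The idea behind the map-reduce queries mentioned above can be sketched in Python: every server maps its own log lines to partial counts, and the client reduces the partial results into one total. This does not use dmap's SQL-like query language; the log line format and the function names are assumptions for illustration.

```python
from collections import Counter

# Sketch of a distributed "errors per minute" aggregation. The log format
# (ISO timestamp, space, message) and both function names are hypothetical.

def map_errors_per_minute(lines):
    """Map step, run per server: count ERROR lines per minute."""
    counts = Counter()
    for line in lines:
        timestamp, _, message = line.partition(" ")
        if "ERROR" in message:
            counts[timestamp[:16]] += 1  # keep 'YYYY-MM-DDTHH:MM'
    return counts


def reduce_counts(partials):
    """Reduce step, run on the client: sum up the per-server counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


server_a = [
    "2021-04-22T19:28:41 ERROR timeout",
    "2021-04-22T19:28:59 INFO ok",
    "2021-04-22T19:29:03 ERROR timeout",
]
server_b = ["2021-04-22T19:28:05 ERROR connection refused"]
print(reduce_counts(map_errors_per_minute(s) for s in (server_a, server_b)))
```

Because only the small partial counts cross the network, the raw log lines never have to leave the servers.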

For the future

There are various features we want to see in the future.

  • A spartan mode, not printing out any extra information but the raw remote log files would be a nice feature to have. This will make it easier to post-process the data produced by the DTail client with common UNIX tools. (To some degree this is possible already, just disable the ANSI terminal color output of the client with -noColors and pipe the output to another program).
  • Tempting would be implementing the dgoawk command, a distributed version of the AWK programming language purely implemented in Go, for advanced text data stream processing capabilities. There are 3rd party libraries available implementing AWK in pure Go which could be used.
  • A more complex change would be the support of federated queries. You can connect to thousands of servers from a single client running on a laptop. But does it scale to 100k of servers? Some of the servers could be used as middleware for connecting to even more servers.
  • Another aspect is to extend the documentation. Especially the advanced features such as map-reduce query language and how to configure the server-side queries currently do require more documentation. For now, you can read the code, sample config files or just ask the author for that! But this will be certainly addressed in the future.

Open Source

Mimecast highly encourages you to have a look at DTail and submit an issue for any features you would like to see. Have you found a bug? Maybe you just have a question or comment? If you want to go a step further: We would also love to see pull requests for any features or improvements. Either way, if in doubt just contact us via the DTail GitHub page.

https://dtail.dev

E-Mail me your thoughts at comments@mx.buetow.org!

gemini://buetow.org/gemfeed/2018-06-01-realistic-load-testing-with-ioriot-for-linux.gmi
2018-06-01T14:50:29+01:00, Paul Buetow, comments@mx.buetow.org

Realistic load testing with I/O Riot for Linux

       .---.
      /     \
      \.@-@./
      /`\_/`\
     //  _  \\
    | \     )|_
   /`\_`>  <_/ \
jgs\__/'---'\__/

Written by Paul Buetow 2018-06-01, last updated 2021-05-08

Foreword

This text was first published in the German IT-Administrator computer magazine. Three years have passed since, and I decided to publish it on my blog too.

https://www.admin-magazin.de/Das-Heft/2018/06/Realistische-Lasttests-mit-I-O-Riot

I haven't worked on I/O Riot for some time now, but everything written here is still valid. I still use I/O Riot to debug I/O issues and patterns once in a while, so the tool is by no means obsolete. It even helped to resolve a major production incident at work involving I/O.

I am eagerly looking forward to revamping I/O Riot so that it uses the new BPF Linux capabilities instead of plain old Systemtap (alternatively, as I have learned, newer versions of Systemtap can also use BPF as the backend). Also, when I wrote I/O Riot initially, I didn't have any experience with the Go programming language yet and therefore I wrote it in C. Once it gets revamped I might consider using Go instead of C as it would spare me from many segmentation faults and headaches during development ;-). I might also just stick to C for plain performance reasons and just refactor the code dealing with concurrency.

Please note that some of the screenshots show the command "ioreplay" instead of "ioriot". That's because the name was changed after they were taken.

The article

With I/O Riot IT administrators can load test and optimize the I/O subsystem of Linux-based operating systems. The tool makes it possible to record I/O patterns and replay them at a later time as often as desired. This means bottlenecks can be reproduced and eradicated.

When storing huge amounts of data, such as more than 200 billion archived emails at Mimecast, it's not only the available storage capacity that matters, but also the data throughput and latency. At the same time, operating costs must be kept as low as possible. The more systems involved, the more important it is to optimize the hardware, the operating system and the applications running on it.

Background: Existing Techniques

Conventional I/O benchmarking: Administrators usually use open source benchmarking tools like IOZone and bonnie++. Available database systems such as Redis and MySQL come with their own benchmarking tools. The common problem with these tools is that they work with prescribed artificial I/O patterns. Although this can test both sequential and randomized data access, the patterns do not correspond to what can be found on production systems.

Testing by load test environment: Another option is to use a separate load test environment in which, as far as possible, a production environment with all its dependencies is simulated. However, an environment consisting of many microservices is very complex. Microservices are usually managed by different teams, which means extra coordination effort for each load test. Another challenge is to generate the load as authentically as possible so that the patterns correspond to a productive environment. Such a load test environment can only handle as many requests as its weakest link can handle. For example, load generators send many read and write requests to a frontend microservice, whereby the frontend forwards the requests to a backend microservice responsible for storing the data. If the frontend service does not process the requests efficiently enough, the backend service is not well utilized in the first place. As a rule, all microservices are clustered across many servers, which makes everything even more complicated. Under all these conditions it is very difficult to test I/O of separate backend systems. Moreover, for many small and medium-sized companies, a separate load test environment would not be feasible for cost reasons.

Testing in the production environment: For these reasons, benchmarks are often carried out in the production environment. In order to derive value from this such tests are especially performed during peak hours when systems are under high load. However, testing on production systems is associated with risks and can lead to failure or loss of data without adequate protection.

Benchmarking the Email Cloud at Mimecast

For email archiving, Mimecast uses an internally developed microservice, which is operated directly on Linux-based storage systems. A storage cluster is divided into several replication volumes. Data is always replicated three times across two secure data centers. Customer data is automatically allocated to one or more volumes, depending on throughput, so that all volumes are automatically assigned the same load. Customer data is archived on conventional, but inexpensive hard disks with several terabytes of storage capacity each. I/O benchmarking proved difficult for all the reasons mentioned above. Furthermore, there are no ready-made tools for this purpose in the case of self-developed software. The service operates on many block devices simultaneously, which can make the RAID controller a bottleneck. None of the freely available benchmarking tools can test several block devices at the same time without extra effort. In addition, emails typically consist of many small files. Randomized access to many small files is particularly inefficient. In addition to many software adaptations, the hardware and operating system must also be optimized.

Mimecast encourages employees to be innovative and pursue their own ideas in the form of an internal competition, Pet Project. The goal of the pet project I/O Riot was to simplify OS and hardware level I/O benchmarking. The first prototype of I/O Riot was awarded an internal roadmap prize in the spring of 2017. A few months later, I/O Riot was used to reduce write latency in the storage clusters by about 50%. The improvement was first verified by I/O replay on a test system and then successively applied to all storage systems. I/O Riot was also used to resolve a production incident related to disk I/O load.

Using I/O Riot

First, all I/O events are logged to a file on a production system with I/O Riot. The file is then copied to a test system where all events are replayed in the same way. The crucial point here is that you can reproduce I/O patterns as they are found on a production system as often as you like on a test system. This makes it possible to adjust the system's tuning knobs after each run.

Installation

I/O Riot was tested under CentOS 7.2 x86_64. For compiling, the GNU C compiler and Systemtap including kernel debug information are required. Other Linux distributions are theoretically compatible but untested. First of all, you should update the systems involved as follows:

% sudo yum update

If the kernel is updated, please restart the system. The installation could be done without a restart, but that would complicate it. The installed kernel version should always correspond to the currently running kernel. You can then install I/O Riot as follows:

% sudo yum install gcc git systemtap yum-utils kernel-devel-$(uname -r)
% sudo debuginfo-install kernel-$(uname -r)
% git clone https://github.com/mimecast/ioriot
% cd ioriot
% make
% sudo make install
% export PATH=$PATH:/opt/ioriot/bin

Note: It is not best practice to install any compilers on production systems. For further information please have a look at the enclosed README.md.

Recording of I/O events

All I/O events are kernel related. If a process wants to perform an I/O operation, such as opening a file, it must inform the kernel of this by a system call (syscall for short). I/O Riot relies on the Systemtap tool to record I/O syscalls. Systemtap, available for all popular Linux distributions, lets you take a look at the running kernel in production environments, which makes it well suited to monitoring all I/O-relevant Linux syscalls and logging them to a file. Other tools, such as strace, are not an alternative because they slow down the system too much.

During recording, ioriot acts as a wrapper and executes all relevant Systemtap commands for you. Use the following command to log all events to io.capture:

% sudo ioriot -c io.capture

Screenshot I/O recording

A Ctrl-C (SIGINT) stops recording prematurely. Otherwise, ioriot terminates itself automatically after 1 hour. Depending on the system load, the output file can grow to several gigabytes. Only metadata is logged, not the read and written data itself. When replaying later, only random data is used. Under certain circumstances, Systemtap may omit some system calls and issue warnings. This is to ensure that Systemtap does not consume too many resources.

Test preparation

Then copy io.capture to a test system. The log also contains all accesses to the pseudo file systems devfs, sysfs and procfs. This makes little sense, which is why you must first generate a cleaned and playable version io.replay from io.capture as follows:

% sudo ioriot -c io.capture -r io.replay -u $USER -n TESTNAME

The parameter -n allows you to assign a freely selectable test name. The parameter -u specifies an arbitrary system user under which the test is to be replayed.
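The cleaning step can be pictured as a simple filter over the recorded events. The actual capture file format is not shown in this article, so the Python sketch below uses made-up (syscall, path) records and assumes the pseudo file systems are mounted at their conventional locations /dev, /sys and /proc.

```python
# Sketch: drop events that touch pseudo file systems. The record layout
# (syscall, path) and the function names are hypothetical; the real
# io.capture format is not shown here.

PSEUDO_FS_PREFIXES = ("/dev/", "/sys/", "/proc/")


def is_replayable(path):
    """True if the path is not on one of the pseudo file systems."""
    return not path.startswith(PSEUDO_FS_PREFIXES)


def clean_capture(events):
    """Keep only the events that make sense to replay on a test system."""
    return [event for event in events if is_replayable(event[1])]


captured = [
    ("open", "/proc/meminfo"),
    ("open", "/store/00/mail/1.eml"),
    ("read", "/sys/block/sda/stat"),
    ("write", "/var/log/app.log"),
]
print(clean_capture(captured))
```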

Test Initialization

The test will most likely want to access existing files. These are files the test wants to read but does not create by itself. The existence of these must be ensured before the test. You can do this as follows:

% sudo ioriot -i io.replay

To avoid any damage to the running system, ioriot only works in special directories. The tool creates a separate subdirectory for each file system mount point (e.g. /, /usr/local, /store/00, ...), here: /.ioriot/TESTNAME, /usr/local/.ioriot/TESTNAME, /store/00/.ioriot/TESTNAME, ... By default, the working directory of ioriot is /usr/local/ioriot/TESTNAME.
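The mapping from a recorded path to such a per-mount-point test directory could look roughly like the following Python sketch. The .ioriot/TESTNAME part follows the examples above. How the remainder of the path is laid out below that directory is my assumption, and the function name is made up.

```python
import posixpath

# Sketch: redirect a recorded absolute path into the test directory of the
# mount point it lives on. The layout below .ioriot/TESTNAME is assumed.

def sandbox_path(path, mount_points, test_name):
    """Map an absolute path into <mount point>/.ioriot/<test_name>/..."""
    candidates = [
        m for m in mount_points
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    mount = max(candidates, key=len)  # the longest match is the real mount
    relative = posixpath.relpath(path, mount)
    return posixpath.join(mount, ".ioriot", test_name, relative)


mounts = ["/", "/usr/local", "/store/00"]
print(sandbox_path("/store/00/mail/1.eml", mounts, "TESTNAME"))
print(sandbox_path("/etc/hosts", mounts, "TESTNAME"))
```

Keeping the redirected files on the same mount point matters because the replay should hit the same block devices as the original workload did.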

Screenshot test preparation

You must re-initialize the environment before each run. Data from previous tests will be moved to a trash directory automatically, which can be finally deleted with "sudo ioriot -P".

Replay

After initialization, you can replay the log with -r. You can use -R to initiate both test initialization and replay in a single command and -S can be used to specify a file in which statistics are written after the test run.

You can also influence the playback speed: "-s 0" is interpreted as "Playback as fast as possible" and is the default setting. With "-s 1" all operations are performed at original speed. "-s 2" would double the playback speed and "-s 0.5" would halve it.
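Expressed as arithmetic, the speed factor simply divides the time gap between two recorded operations. The Python sketch below illustrates the semantics of -s as described above. It is not taken from the I/O Riot source, and the function name is made up.

```python
# Sketch: how a speed factor could translate a recorded gap between two
# operations into the wait time during replay. replay_delay is hypothetical.

def replay_delay(original_gap_seconds, speed):
    """Return the wait time before the next operation during replay."""
    if speed < 0:
        raise ValueError("speed must not be negative")
    if speed == 0:
        return 0.0  # "-s 0": replay as fast as possible
    return original_gap_seconds / speed


for speed in (0, 1, 2, 0.5):
    print(speed, replay_delay(10.0, speed))
```

A gap of 10 seconds in the recording thus becomes 0, 10, 5 and 20 seconds for the four speed values mentioned above.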

Screenshot replaying I/O

As an initial test, for example, you could compare the two Linux I/O schedulers CFQ and Deadline and check with which scheduler the test runs fastest. To do so, run the test separately for each scheduler. The following shell loop iterates through all attached block devices of the system and changes their I/O scheduler to the one specified in the variable $new_scheduler (in this case either cfq or deadline). Subsequently, all I/O events from the io.replay log are played back. At the end, an output file with statistics is generated:

% new_scheduler=cfq
% for scheduler in /sys/block/*/queue/scheduler; do
    echo "$new_scheduler" | sudo tee "$scheduler"
done
% sudo ioriot -R io.replay -S cfq.txt
% new_scheduler=deadline
% for scheduler in /sys/block/*/queue/scheduler; do
    echo "$new_scheduler" | sudo tee "$scheduler"
done
% sudo ioriot -R io.replay -S deadline.txt

According to the results, the test ran 940 seconds faster with the Deadline scheduler:

% cat cfq.txt
Num workers: 4
Threads per worker: 128
Total threads: 512
Highest loadavg: 259.29
Performed ioops: 218624596
Average ioops/s: 101544.17
Time ahead: 1452s
Total time: 2153.00s
% cat deadline.txt
Num workers: 4
Threads per worker: 128
Total threads: 512
Highest loadavg: 342.45
Performed ioops: 218624596
Average ioops/s: 180234.62
Time ahead: 2392s
Total time: 1213.00s
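
The comparison can also be done programmatically. The following Python sketch parses the "key: value" lines of the two statistics outputs shown above and computes the difference. Only the fields needed for the calculation are included, and the function name is made up.

```python
# Sketch: parse the statistics written by "-S" and compare two runs.
# parse_stats is a hypothetical helper; the numbers are copied from above.

def parse_stats(text):
    """Turn 'Key: value' lines into a dict of floats (a trailing 's' is dropped)."""
    stats = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            stats[key.strip()] = float(value.strip().rstrip("s"))
    return stats


cfq = parse_stats("Performed ioops: 218624596\nTotal time: 2153.00s")
deadline = parse_stats("Performed ioops: 218624596\nTotal time: 1213.00s")

saved = cfq["Total time"] - deadline["Total time"]
print(saved)  # 940.0
print(round(cfq["Performed ioops"] / cfq["Total time"], 2))
print(round(deadline["Performed ioops"] / deadline["Total time"], 2))
```

Dividing the number of performed operations by the total time reproduces the "Average ioops/s" values from the two outputs.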

In any case, you should also set up a time series database, such as Graphite, where the I/O throughput can be plotted. The following two figures show the read and write access times of both tests. The dip marks the point where the CFQ test ended and the Deadline test was started. The read latency of both tests is similar. Write latency is dramatically improved with the Deadline scheduler.

Graphite visualization of the mean read access times in ms with CFQ and Deadline Scheduler.
Graphite visualization of the average write access times in ms with CFQ and Deadline Scheduler.

You should also take a look at the iostat tool. The iostat screenshot shows the output of iostat -x 10 during a test run. As you can see, a block device is fully loaded with 99% utilization, while all other block devices still have sufficient buffer. This could be an indication of poor data distribution in the storage system and is worth pursuing. It is not uncommon for I/O Riot to reveal software problems.

Output of iostat. The block device sdy seems to be almost fully utilized by 99%.

I/O Riot is Open Source

The tool has already proven to be very useful and will continue to be actively developed as time and priority permits. Mimecast intends to be an ongoing contributor to Open Source. You can find I/O Riot at:

https://github.com/mimecast/ioriot

Systemtap

Systemtap is a tool for the instrumentation of the Linux kernel. The tool provides an AWK-like programming language. Programs written in it are first translated to C and then compiled into a dynamically loadable kernel module. Loaded into the kernel, the program has access to Linux internals. A Systemtap program written for I/O Riot monitors which I/O syscalls take place, at which time, with which parameters, from which process, and with which return values.

For example, the open syscall opens a file and returns the corresponding file descriptor. The read and write syscalls operate on a file descriptor and return the number of bytes read or written. The close syscall closes a given file descriptor. I/O Riot comes with a ready-made Systemtap program, which you have already compiled into a kernel module and installed to /opt/ioriot. In addition to open, read and close, it logs many other I/O-relevant calls.

https://sourceware.org/systemtap/

More references

IOZone
Bonnie++
Graphite
Memory mapped I/O

E-Mail me your thoughts at comments@mx.buetow.org!

Methods in C gemini://buetow.org/gemfeed/2016-11-20-methods-in-c.gmi 2016-11-20T18:36:51+01:00 Paul Buetow comments@mx.buetow.org You can do some sort of object oriented programming in the C Programming Language. However, that is very limited. But also very easy and straight forward to use.. .....to read on please visit my site.

Methods in C

Written by Paul Buetow 2016-11-20

You can do some sort of object oriented programming in the C Programming Language. However, that is very limited. But it is also very easy and straightforward to use.

Example

Let's have a look at the following sample program. Basically all you have to do is to add a function pointer such as "calculate" to the definition of struct "something_s". Later, during the struct initialization, assign a function address to that function pointer:

#include <stdio.h>

typedef struct {
    double (*calculate)(const double, const double);
    char *name;
} something_s;

double multiplication(const double a, const double b) {
    return a * b;
}

double division(const double a, const double b) {
    return a / b;
}

int main(void) {
    something_s mult = (something_s) {
        .calculate = multiplication,
        .name = "Multiplication"
    };

    something_s div = (something_s) {
        .calculate = division,
        .name = "Division"
    };

    const double a = 3, b = 2;

    printf("%s(%f, %f) => %f\n", mult.name, a, b, mult.calculate(a,b));
    printf("%s(%f, %f) => %f\n", div.name, a, b, div.calculate(a,b));
}

As you can see, you can call the function (pointed to by the function pointer) the same way as in C++ or Java via:

printf("%s(%f, %f) => %f\n", mult.name, a, b, mult.calculate(a,b));
printf("%s(%f, %f) => %f\n", div.name, a, b, div.calculate(a,b));

However, that's just syntactic sugar for:

printf("%s(%f, %f) => %f\n", mult.name, a, b, (*mult.calculate)(a,b));
printf("%s(%f, %f) => %f\n", div.name, a, b, (*div.calculate)(a,b));

Output:

pbuetow ~/git/blog/source [38268]% gcc methods-in-c.c -o methods-in-c
pbuetow ~/git/blog/source [38269]% ./methods-in-c
Multiplication(3.000000, 2.000000) => 6.000000
Division(3.000000, 2.000000) => 1.500000

Not complicated at all, but nice to know and helps to make the code easier to read!

The flaw

That's actually not really how it works in object oriented languages such as Java and C++. The method call in this example is not really a method call, as "mult" and "div" are not "message receivers". What I mean by that is that the functions cannot access the state of the "mult" and "div" struct objects. If you wanted to access the state of "mult" from within the calculate function in C, you would have to pass it as an argument:

mult.calculate(mult, a, b);

How to overcome this? You need to take it further...

Taking it further

If you want to take it further, type "Object-Oriented Programming with ANSI-C" into your favorite internet search engine and you will find some crazy stuff. Some go as far as writing a C preprocessor in AWK, which takes some object oriented pseudo-C and transforms it to plain C so that the C compiler can compile it to machine code. This is actually similar to how the C++ language had its origins.

E-Mail me your thoughts at comments@mx.buetow.org!

Spinning up my own authoritative DNS servers gemini://buetow.org/gemfeed/2016-05-22-spinning-up-my-own-authoritative-dns-servers.gmi 2016-05-22T18:59:01+01:00 Paul Buetow comments@mx.buetow.org Finally, I had time to deploy my own authoritative DNS servers (master and slave) for my domains 'buetow.org' and 'buetow.zone'. My domain name provider is Schlund Technologies. They allow their customers to manually edit the DNS records (BIND files). And they also give you the opportunity to set your own authoritative DNS servers for your domains. From now I am making use of that option.. .....to read on please visit my site.

Spinning up my own authoritative DNS servers

Written by Paul Buetow 2016-05-22

Background

Finally, I had time to deploy my own authoritative DNS servers (master and slave) for my domains "buetow.org" and "buetow.zone". My domain name provider is Schlund Technologies. They allow their customers to manually edit the DNS records (BIND files). And they also give you the opportunity to set your own authoritative DNS servers for your domains. From now on, I am making use of that option.

Schlund Technologies

All FreeBSD Jails

In order to set up my authoritative DNS servers I installed a FreeBSD Jail dedicated for DNS with Puppet on my root machine as follows:

include freebsd

freebsd::ipalias { '2a01:4f8:120:30e8::14':
  ensure    => up,
  proto     => 'inet6',
  preflen   => '64',
  interface => 're0',
  aliasnum  => '5',
}

include jail::freebsd

class { 'jail':
  ensure              => present,
  jails_config        => {
    dns                     => {
      '_ensure'             => present,
      '_type'               => 'freebsd',
      '_mirror'             => 'ftp://ftp.de.freebsd.org',
      '_remote_path'        => 'FreeBSD/releases/amd64/10.1-RELEASE',
      '_dists'              => [ 'base.txz', 'doc.txz', ],
      '_ensure_directories' => [ '/opt', '/opt/enc' ],
      'host.hostname'       => "'dns.ian.buetow.org'",
      'ip4.addr'            => '192.168.0.15',
      'ip6.addr'            => '2a01:4f8:120:30e8::15',
    },
    .
    .
  }
}

PF firewall

Please note that "dns.ian.buetow.org" is just the Jail name of the master DNS server (and "caprica.ian.buetow.org" the name of the Jail for the slave DNS server) and that I am using the DNS names "dns1.buetow.org" (master) and "dns2.buetow.org" (slave) for the actual service names (these are the DNS servers visible to the public). Please also note that the IPv4 address is an internal one. I am using PF for NAT and PAT. The DNS ports are forwarded (TCP and UDP) to that Jail. By default, all ports are blocked, so I am adding an exception rule for the IPv6 address as well. These are the PF rules in use:

% cat /etc/pf.conf
.
.
# dns.ian.buetow.org 
rdr pass on re0 proto tcp from any to $pub_ip port {53} -> 192.168.0.15
rdr pass on re0 proto udp from any to $pub_ip port {53} -> 192.168.0.15
pass in on re0 inet6 proto tcp from any to 2a01:4f8:120:30e8::15 port {53} flags S/SA keep state
pass in on re0 inet6 proto udp from any to 2a01:4f8:120:30e8::15 port {53} flags S/SA keep state
.
.

Puppet managed BIND zone files

In "manifests/dns.pp" (the Puppet manifest for the Master DNS Jail itself) I configured the BIND DNS server this way:

class { 'bind_freebsd':
  config         => "puppet:///files/bind/named.${::hostname}.conf",
  dynamic_config => "puppet:///files/bind/dynamic.${::hostname}",
}

The Puppet module is actually a pretty simple one. It installs the file "/usr/local/etc/namedb/named.conf" and it populates the "/usr/local/etc/namedb/dynamic" directory with all my zone files.

Once applied by Puppet inside of the Jail, I get this:

paul uranus:~/git/blog/source [4268]% ssh admin@dns1.buetow.org.buetow.org pgrep -lf named
60748 /usr/local/sbin/named -u bind -c /usr/local/etc/namedb/named.conf
paul uranus:~/git/blog/source [4269]% ssh admin@dns1.buetow.org.buetow.org tail -n 13 /usr/local/etc/namedb/named.conf
zone "buetow.org" {
    type master;
    notify yes;
    allow-update { key "buetoworgkey"; };
    file "/usr/local/etc/namedb/dynamic/buetow.org";
};

zone "buetow.zone" {
    type master;
    notify yes;
    allow-update { key "buetoworgkey"; };
    file "/usr/local/etc/namedb/dynamic/buetow.zone";
};
paul uranus:~/git/blog/source [4277]% ssh admin@dns1.buetow.org.buetow.org cat /usr/local/etc/namedb/dynamic/buetow.org
$TTL 3600
@    IN   SOA   dns1.buetow.org. domains.buetow.org. (
     25       ; Serial
     604800   ; Refresh
     86400    ; Retry
     2419200  ; Expire
     604800 ) ; Negative Cache TTL
; Infrastructure domains
@ IN NS dns1
@ IN NS dns2
* 300 IN CNAME web.ian
buetow.org. 86400 IN A 78.46.80.70
buetow.org. 86400 IN AAAA 2a01:4f8:120:30e8:0:0:0:11
buetow.org. 86400 IN MX 10 mail.ian
dns1 86400 IN A 78.46.80.70
dns1 86400 IN AAAA 2a01:4f8:120:30e8:0:0:0:15
dns2 86400 IN A 164.177.171.32
dns2 86400 IN AAAA 2a03:2500:1:6:20::
.
.
.
.

That is my master DNS server. My slave DNS server runs in another Jail on another bare metal machine, which is located in a different DC and in different IP subnets. Apart from that, everything is set up similarly to the master DNS server. The only configuration difference is the "named.conf": It's configured to be a slave, and that means that the "dynamic" zone directory gets populated by BIND itself while doing zone transfers from the master.

paul uranus:~/git/blog/source [4279]% ssh admin@dns2.buetow.org tail -n 11 /usr/local/etc/namedb/named.conf
zone "buetow.org" {
    type slave;
    masters { 78.46.80.70; };
    file "/usr/local/etc/namedb/dynamic/buetow.org";
};

zone "buetow.zone" {
    type slave;
    masters { 78.46.80.70; };
    file "/usr/local/etc/namedb/dynamic/buetow.zone";
};

The end result

The end result looks like this now:

% dig -t ns buetow.org
; <<>> DiG 9.10.3-P4-RedHat-9.10.3-12.P4.fc23 <<>> -t ns buetow.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37883
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;buetow.org.			IN	NS

;; ANSWER SECTION:
buetow.org.		600	IN	NS	dns2.buetow.org.
buetow.org.		600	IN	NS	dns1.buetow.org.

;; Query time: 41 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Sun May 22 11:34:11 BST 2016
;; MSG SIZE  rcvd: 77

% dig -t any buetow.org @dns1.buetow.org
; <<>> DiG 9.10.3-P4-RedHat-9.10.3-12.P4.fc23 <<>> -t any buetow.org @dns1.buetow.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49876
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 7

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;buetow.org.			IN	ANY

;; ANSWER SECTION:
buetow.org.		86400	IN	A	78.46.80.70
buetow.org.		86400	IN	AAAA	2a01:4f8:120:30e8::11
buetow.org.		86400	IN	MX	10 mail.ian.buetow.org.
buetow.org.		3600	IN	SOA	dns1.buetow.org. domains.buetow.org. 25 604800 86400 2419200 604800
buetow.org.		3600	IN	NS	dns2.buetow.org.
buetow.org.		3600	IN	NS	dns1.buetow.org.

;; ADDITIONAL SECTION:
mail.ian.buetow.org.	86400	IN	A	78.46.80.70
dns1.buetow.org.	86400	IN	A	78.46.80.70
dns2.buetow.org.	86400	IN	A	164.177.171.32
mail.ian.buetow.org.	86400	IN	AAAA	2a01:4f8:120:30e8::12
dns1.buetow.org.	86400	IN	AAAA	2a01:4f8:120:30e8::15
dns2.buetow.org.	86400	IN	AAAA	2a03:2500:1:6:20::

;; Query time: 42 msec
;; SERVER: 78.46.80.70#53(78.46.80.70)
;; WHEN: Sun May 22 11:34:41 BST 2016
;; MSG SIZE  rcvd: 322

Monitoring

For monitoring I am using Icinga2 (I am operating two Icinga2 instances in two different DCs). I may have to post another blog article about Icinga2 but to get the idea these were the snippets added to my Icinga2 configuration:

apply Service "dig" {
    import "generic-service"

    check_command = "dig"
    vars.dig_lookup = "buetow.org"
    vars.timeout = 30

    assign where host.name == "dns.ian.buetow.org" || host.name == "caprica.ian.buetow.org"
}

apply Service "dig6" {
    import "generic-service"

    check_command = "dig"
    vars.dig_lookup = "buetow.org"
    vars.timeout = 30
    vars.check_ipv6 = true

    assign where host.name == "dns.ian.buetow.org" || host.name == "caprica.ian.buetow.org"
}

DNS update workflow

Whenever I have to change a DNS entry, all I have to do is:

  • Git clone or update the Puppet repository
  • Update/commit and push the zone file (e.g. "buetow.org")
  • Wait for Puppet. Puppet will deploy that updated zone file. And it will reload the BIND server.
  • The BIND server will notify all slave DNS servers (at the moment only one). And it will transfer the new version of the zone.

That's much more comfortable now than manually clicking through some web UIs at Schlund Technologies.

E-Mail me your thoughts at comments@mx.buetow.org!

Offsite backup with ZFS (Part 2) gemini://buetow.org/gemfeed/2016-04-16-offsite-backup-with-zfs-part2.gmi 2016-04-16T22:43:42+01:00 Paul Buetow comments@mx.buetow.org I enhanced the procedure a bit. From now on I am using two external 2TB USB hard drives. Both are set up in exactly the same way. To decrease the probability that both fail at about the same time, the drives are of different brands. One drive is kept at the secret location. The other one is kept at home right next to my HP MicroServer. ...to read on visit my site.

Offsite backup with ZFS (Part 2)

 ________________
|# :           : #|
|  : ZFS/GELI  :  |________________ 
|  :   Offsite : |# :           : #|
|  :  Backup 1 : |  : ZFS/GELI  :  |
|  :___________: |  :   Offsite :  |
|     _________  |  :  Backup 2 :  |
|    | __      | |  :___________:  |
|    ||  |     | |     _________   |
\____||__|_____|_|    | __      |  |
                 |    ||  |     |  |
                 \____||__|_____|__|

Written by Paul Buetow 2016-04-16

Read the first part before reading any further here...

I enhanced the procedure a bit. From now on I am using two external 2TB USB hard drives. Both are set up in exactly the same way. To decrease the probability that both fail at about the same time, the drives are of different brands. One drive is kept at the secret location. The other one is kept at home right next to my HP MicroServer.

Whenever I update the offsite backup, I do it on the drive which is kept locally. Afterwards I bring it to the secret location, swap the drives and bring the other one back home. This ensures that I will always have an offsite backup available at a different location than my home - even while updating one copy of it.

Furthermore, I added scrubbing (*zpool scrub...*) to the script. It ensures that the file system is consistent and that there are no bad blocks on the disk and the file system. To increase the reliability I also run a *zfs set copies=2 zroot*. That setting is also synchronized to the offsite ZFS pool. ZFS stores every data block to disk twice now. Yes, it consumes twice as much disk space, but it makes the pool more fault tolerant against hardware errors (e.g. only individual disk sectors going bad).

E-Mail me your thoughts at comments@mx.buetow.org!

Offsite backup with ZFS gemini://buetow.org/gemfeed/2016-04-03-offsite-backup-with-zfs.gmi 2016-04-03T22:43:42+01:00 Paul Buetow comments@mx.buetow.org When it comes to data storage and potential data loss I am a paranoid person. It is not just due to my job but also due to a personal experience I encountered over 10 years ago: A single drive failure and loss of all my data (pictures, music, ....). ...to read on visit my site.

Offsite backup with ZFS

 ________________
|# :           : #|
|  : ZFS/GELI  :  |
|  :   Offsite :  |
|  :  Backup   :  |
|  :___________:  |
|     _________   |
|    | __      |  |
|    ||  |     |  |
\____||__|_____|__|

Written by Paul Buetow 2016-04-03

Please don't lose all my pictures again!

When it comes to data storage and potential data loss I am a paranoid person. It is not just due to my job but also due to a personal experience I encountered over 10 years ago: A single drive failure and loss of all my data (pictures, music, ....).

A little about my personal infrastructure: I am running my own (mostly FreeBSD based) root servers (across several countries: Two in Germany, one in Canada, one in Bulgaria) which store all my online data (E-Mail and my Git repositories). I am syncing incremental (and encrypted) ZFS snapshots back and forth between these servers, so that the data of either server could be recovered from another one.

Local storage box for offline data

Also, I am operating a local server (an HP MicroServer) at home in my apartment. Full snapshots of all ZFS volumes are pulled from the "online" servers to the local server every other week and the incremental ZFS snapshots every day. That local server has a ZFS ZMIRROR with 3 disks configured (local triple redundancy). I keep up to half a year's worth of ZFS snapshots of all volumes. That local server also contains all my offline data such as pictures, private documents, videos, books, various other backups, etc.

Once weekly all the data of that local server is copied to two external USB drives as a backup (without the historic snapshots). For simplicity these USB drives are not formatted with ZFS but with good old UFS. This gives me a chance to recover from a (potential) ZFS disaster. ZFS is a complex thing. Sometimes it is good not to trust complex things!

Storing it at my apartment is not enough

Now I am thinking about an offsite backup of all this local data. The problem is that all the data remains in a single physical location: My local MicroServer. What happens when the house burns down or someone steals my server including the internal disks and the attached USB drives? My first thought was to back up everything to the "cloud". The major issue here is however the limited amount of available upload bandwidth (only 1MBit/s).

The solution is adding another USB drive (2TB) with an encryption container (GELI) and a ZFS pool on it. The GELI encryption requires a secret key and a secret passphrase. I am updating the data on that drive once every 3 months (my calendar is reminding me about it) and afterwards I keep that drive at a secret location outside of my apartment. All the information needed to decrypt (mounting the GELI container) is stored at another (secure) place. Key and passphrase are kept at different places though. Even if someone knew of them, they would not be able to decrypt the drive, as some additional insider knowledge would be required as well.

Walking one round less

I am thinking of buying a second 2TB USB drive and setting it up the same way as the first one, so that I could alternate the backups. One drive would be at the secret location, and the other drive would be at home. And these drives would swap location after each cycle. This would give some protection against the failure of one drive, and I would have to go to the secret location only once (swapping the drives) instead of twice (picking that drive up in order to update the data + bringing it back to the secret location).

E-Mail me your thoughts at comments@mx.buetow.org!

Perl Daemon (Service Framework) gemini://buetow.org/gemfeed/2011-05-07-perl-daemon-service-framework.gmi 2011-05-07T22:26:02+01:00 Paul Buetow comments@mx.buetow.org PerlDaemon is a minimal daemon for Linux and other Unix like operating systems programmed in Perl. It is a minimal but pretty functional and fairly generic service framework. This means that it does not do anything useful other than providing a framework for starting, stopping, configuring and logging. In order to do something useful, a module (written in Perl) must be provided.. .....to read on please visit my site.

Perl Daemon (Service Framework)

   a'!   _,,_ a'!   _,,_     a'!   _,,_
     \\_/    \  \\_/    \      \\_/    \.-,
      \, /-( /'-,\, /-( /'-,    \, /-( /
      //\ //\\   //\ //\\       //\ //\\jrei

Written by Paul Buetow 2011-05-07, last updated 2021-05-07

PerlDaemon is a minimal daemon for Linux and other Unix like operating systems programmed in Perl. It is a minimal but pretty functional and fairly generic service framework. This means that it does not do anything useful other than providing a framework for starting, stopping, configuring and logging. In order to do something useful, a module (written in Perl) must be provided.

Features

PerlDaemon supports:

  • Automatic daemonizing
  • Logging
  • Log rotation (via SIGHUP)
  • Clean shutdown support (SIGTERM)
  • Pid file support (incl. check on startup)
  • Easy to configure
  • Easy to extend
  • Multi instance support (just use a different directory for each instance).

Quick Guide

# Starting
 ./bin/perldaemon start (or shortcut ./control start)

# Stopping
 ./bin/perldaemon stop (or shortcut ./control stop)

# Alternatively: Starting in foreground 
./bin/perldaemon start daemon.daemonize=no (or shortcut ./control foreground)

To stop a daemon running in foreground mode "Ctrl+C" must be hit. To see more available startup options run "./control" without any argument.

How to configure

The daemon instance can be configured in "./conf/perldaemon.conf". If you want to change a property only once, it is also possible to specify it on command line (that then will take precedence over the config file). All available config properties can be viewed via "./control keys":

pb@titania:~/svn/utils/perldaemon/trunk$ ./control keys
# Path to the logfile
daemon.logfile=./log/perldaemon.log

# The amount of seconds until the next event look takes place
daemon.loopinterval=1

# Path to the modules dir
daemon.modules.dir=./lib/PerlDaemonModules

# Specifies either the daemon should run in daemon or foreground mode
daemon.daemonize=yes

# Path to the pidfile
daemon.pidfile=./run/perldaemon.pid

# Each module should run every runinterval seconds
daemon.modules.runinterval=3

# Path to the alive file (is touched every loopinterval seconds, usable to monitor)
daemon.alivefile=./run/perldaemon.alive

# Specifies the working directory
daemon.wd=./

Example

So let's start the daemon with a loop interval of 10 seconds:

$ ./control keys | grep daemon.loopinterval
daemon.loopinterval=1
$ ./control keys daemon.loopinterval=10 | grep daemon.loopinterval
daemon.loopinterval=10
$ ./control start daemon.loopinterval=10; sleep 10; tail -n 2 log/perldaemon.log
Starting daemon now...
Mon Jun 13 11:29:27 2011 (PID 2838): Triggering PerlDaemonModules::ExampleModule 
(last triggered before 10.002106s; carry: 7.002106s; wanted interval: 3s)
Mon Jun 13 11:29:27 2011 (PID 2838): ExampleModule Test 2
$ ./control stop
Stopping daemon now...

If you want to change that property forever either edit perldaemon.conf or do this:

$ ./control keys daemon.loopinterval=10 > new.conf; mv new.conf conf/perldaemon.conf

HiRes event loop

PerlDaemon uses `Time::HiRes` to make sure that all events run at the correct intervals. On each loop run, a time carry value is recorded and added to the next loop run in order to catch up on lost time.

Writing your own modules

Example module

This is one of the example modules you will find in the source code. It should be quite self-explanatory if you know Perl :-).

package PerlDaemonModules::ExampleModule;

use strict;
use warnings;

sub new ($$$) {
  my ($class, $conf) = @_;

  my $self = bless { conf => $conf }, $class;

  # Store some private module stuff
  $self->{counter} = 0;

  return $self;
}

# Runs periodically in a loop (set interval in perldaemon.conf)
sub do ($) {
  my $self = shift;
  my $conf = $self->{conf};
  my $logger = $conf->{logger};

  # Calculate some private module stuff
  my $count = ++$self->{counter};

  $logger->logmsg("ExampleModule Test $count");
}

1;

Your own module

Want to give it some better use? It's just as easy as:

 cd ./lib/PerlDaemonModules/
 cp ExampleModule.pm YourModule.pm
 vi YourModule.pm
 cd -
 ./bin/perldaemon restart (or shortcut ./control restart)

Now watch `./log/perldaemon.log` closely. It is a good practice to test your modules in 'foreground mode' (see above how to do that).

BTW: You can install as many modules within the same instance as desired. But they are run in sequential order (in the future they may also run in parallel using several threads or processes).

May the source be with you

You can find PerlDaemon (including the examples) at:

https://github.com/snonux/perldaemon

E-Mail me your thoughts at comments@mx.buetow.org!

The Fype Programming Language gemini://buetow.org/gemfeed/2010-05-09-the-fype-programming-language.gmi 2010-05-09T12:48:29+01:00 Paul Buetow comments@mx.buetow.org Fype is an interpreted programming language created by me for learning and fun. The interpreter is written in C. It has been tested on FreeBSD and NetBSD and may also work on other Unix like operating systems such as Linux based ones. To be honest, besides learning and fun there is really no other use case of why Fype actually exists as many other programming languages are much faster and more powerful.. .....to read on please visit my site.

The Fype Programming Language

      ____                                      _        __       
     / / _|_   _ _ __   ___    _   _  ___  __ _| |__    / _|_   _ 
    / / |_| | | | '_ \ / _ \  | | | |/ _ \/ _` | '_ \  | |_| | | |
 _ / /|  _| |_| | |_) |  __/  | |_| |  __/ (_| | | | |_|  _| |_| |
(_)_/ |_|  \__, | .__/ \___|   \__, |\___|\__,_|_| |_(_)_|  \__, |
           |___/|_|            |___/                        |___/ 

Written by Paul Buetow 2010-05-09, last updated 2021-05-05

Fype is an interpreted programming language created by me for learning and fun. The interpreter is written in C. It has been tested on FreeBSD and NetBSD and may also work on other Unix like operating systems such as Linux based ones. To be honest, besides learning and fun there is really no other use case of why Fype actually exists as many other programming languages are much faster and more powerful.

The Fype syntax is very simple. It uses a maximum look ahead of 1 and a very easy top down parsing mechanism. Fype parses and interprets its code simultaneously. This means that syntax errors are only detected at program runtime.

Fype is a recursive acronym and means "Fype is For Your Program Execution" or "Fype is Free Yak Programmed for ELF". You could also say "It's not a hype - it's Fype!".

Object oriented C style

The Fype interpreter is written in an object oriented style of C. Each "main component" has its own .h and .c file. Most components have a struct type, which can be initialized using a "COMPONENT_new" function and destroyed using a "COMPONENT_delete" function. Method calls follow the same schema, e.g. "COMPONENT_METHODNAME". There is no such thing as class inheritance or polymorphism involved.

To give you an idea of how it works, here is a snippet from the main Fype "class header":

typedef struct {
   Tupel *p_tupel_argv; // Contains command line options
   List *p_list_token; // Initial list of token
   Hash *p_hash_syms; // Symbol table
   char *c_basename;
} Fype;

And here is a snippet from the main Fype "class implementation":

Fype*
fype_new() {
   Fype *p_fype = malloc(sizeof(Fype));

   p_fype->p_hash_syms = hash_new(512);
   p_fype->p_list_token = list_new();
   p_fype->p_tupel_argv = tupel_new();
   p_fype->c_basename = NULL;

   garbage_init();

   return (p_fype);
}

void
fype_delete(Fype *p_fype) {
   argv_tupel_delete(p_fype->p_tupel_argv);

   hash_iterate(p_fype->p_hash_syms, symbol_cleanup_hash_syms_cb);
   hash_delete(p_fype->p_hash_syms);

   list_iterate(p_fype->p_list_token, token_ref_down_cb);
   list_delete(p_fype->p_list_token);

   if (p_fype->c_basename)
      free(p_fype->c_basename);

   garbage_destroy();
}

int
fype_run(int i_argc, char **pc_argv) {
   Fype *p_fype = fype_new();

   // argv: Maintains command line options
   argv_run(p_fype, i_argc, pc_argv);

   // scanner: Creates a list of token
   scanner_run(p_fype);

   // interpret: Interpret the list of token
   interpret_run(p_fype);

   fype_delete(p_fype);

   return (0);
}

Data types

Fype uses auto type conversion. However, if you want to know what's going on you may take a look at the following basic data types:

  • integer - Specifies a number
  • double - Specifies a double precision number
  • string - Specifies a string
  • number - May be an integer or a double number
  • any - May be any type above
  • void - No type
  • identifier - It's a variable name or a procedure name or a function name

There is no boolean type, but we can use the integer values 0 for false and 1 for true. There is support for explicit type casting too.

Syntax

Comments

Text from a # character until the end of the current line is considered a comment. Multi line comments start with #* and end with *# and may appear anywhere. The exception is when those characters are inside of strings.

Variables

Variables can be defined with the "my" keyword (inspired by Perl :-). If you don't assign a value during declaration, then it's using the default integer value 0. Variables may be changed during program runtime. Variables may be deleted using the "undef" keyword! Example:

my foo = 1 + 2;
say foo; 

my bar = 12, baz = foo;
say 1 + bar;
say bar;

my baz;
say baz; # Will print out 0

You may use the "defined" keyword to check if an identifier has been defined or not:

ifnot defined foo {
	say "No foo yet defined";
}

my foo = 1;

if defined foo {
	put "foo is defined and has the value ";
	say foo;
}

Synonyms

Each variable can have as many synonyms as desired. A synonym is another name to access the content of a specific variable. Here is an example of how to use it:

my foo = "foo";
my bar = \foo;
foo = "bar";

# The synonym variable should now also be set to "bar"
assert "bar" == bar;

Synonyms can be used for all kinds of identifiers. They are not limited to normal variables but can also be used for function and procedure names etc. (more about functions and procedures later).

# Create a new procedure baz
proc baz { say "I am baz"; }

# Make a synonym bay for baz, and undefine baz
my bay = \baz;

undef baz;

# bay still has a reference of the original procedure baz
bay; # this prints out "I am baz"

The "syms" keyword gives you the total number of synonyms pointing to a specific value:

my foo = 1;
say syms foo; # Prints 1

my baz = \foo; 
say syms foo; # Prints 2
say syms baz; # Prints 2

undef baz;
say syms foo; # Prints 1

Statements and expressions

A Fype program is a list of statements. Each keyword, expression or function call is part of a statement. Each statement is ended with a semicolon. Example:

my bar = 3, foo = 1 + 2; 
say foo;
exit foo - bar;

Parentheses

All parentheses for function arguments are optional. They help to make the code more readable. They also help to force the precedence of expressions.

Basic expressions

Any "any" value holding a string will be automatically converted to an integer value.

(any) <any> + <any>
(any) <any> - <any>
(any) <any> * <any>
(any) <any> / <any>
(integer) <any> == <any>
(integer) <any> != <any>
(integer) <any> <= <any>
(integer) <any> >= <any>
(integer) <any> < <any>
(integer) <any> > <any>
(integer) not <any>

Bitwise expressions

(integer) <any> :< <any>
(integer) <any> :> <any>
(integer) <any> and <any>
(integer) <any> or <any>
(integer) <any> xor <any>

Numeric expressions

(number) neg <number>

... returns the negative value of "number".

(integer) no <integer>

... returns 1 if the argument is 0, otherwise it will return 0! If no argument is given, then 0 is returned!

(integer) yes <integer>

... always returns 1. The parameter is optional. Example:

# Prints out 1, because foo is not defined
if yes { say no defined foo; } 

Control statements

Control statements available in Fype:

if <expression> { <statements> }

... runs the statements if the expression evaluates to a true value.

ifnot <expression> { <statements> }

... runs the statements if the expression evaluates to a false value.

while <expression> { <statements> }

... runs the statements as long as the expression evaluates to a true value.

until <expression> { <statements> }

... runs the statements as long as the expression evaluates to a false value.

Scopes

A new scope starts with a { and ends with a }. An exception is a procedure, which does not use its own scope (see later in this manual). Control statements and functions support scopes. The "scope" function prints out all available symbols at the current scope. Here is a small example:

my foo = 1;

{
	# Prints out 1
	put defined foo;
	{
		my bar = 2;

		# Prints out 1
		put defined bar;

		# Prints out all available symbols at this
		# point to stdout. Those are: bar and foo
		scope;
	}

	# Prints out 0
	put defined bar;

	my baz = 3;
}

# Prints out 0
say defined bar;

Another example including an actual output:

./fype -e 'my global; func foo { my var4; func bar { my var2, var3; func baz { my var1; scope; } baz; } bar; } foo;'
Scopes:
Scope stack size: 3
Global symbols:
SYM_VARIABLE: global (id=00034, line=-0001, pos=-001, type=TT_INTEGER, dval=0.000000, refs=-1)
SYM_FUNCTION: foo
Local symbols:
SYM_VARIABLE: var1 (id=00038, line=-0001, pos=-001, type=TT_INTEGER, dval=0.000000, refs=-1)
1 level(s) up:
SYM_VARIABLE: var2 (id=00036, line=-0001, pos=-001, type=TT_INTEGER, dval=0.000000, refs=-1)
SYM_VARIABLE: var3 (id=00037, line=-0001, pos=-001, type=TT_INTEGER, dval=0.000000, refs=-1)
SYM_FUNCTION: baz
2 level(s) up:
SYM_VARIABLE: var4 (id=00035, line=-0001, pos=-001, type=TT_INTEGER, dval=0.000000, refs=-1)
SYM_FUNCTION: bar

Definedness

(integer) defined <identifier>

... returns 1 if "identifier" has been defined. Returns 0 otherwise.

(integer) undef <identifier>

... tries to undefine/delete the "identifier". Returns 1 if it succeeded, otherwise 0 is returned.

System

The following system and interpreter specific built-in functions are supported:

(void) end

... exits the program with the exit status of 0.

(void) exit <integer>

... exits the program with the specified exit status.

(integer) fork

... forks a subprocess. It returns 0 in the child process and the pid of the child in the parent process. Example:

my pid = fork;

if pid {
	put "I am the parent process; child has the pid ";
	say pid;

} ifnot pid {
	say "I am the child process";
}

To execute the garbage collector do:

(integer) gc

It returns the number of items freed. You may wonder why it returns 0 most of the time: Fype tries to free memory that is no longer needed as soon as possible. This may change in future versions in order to gain execution speed.

I/O

(any) put <any>

... prints out the argument

(any) say <any>

... is the same as put, but also prints a trailing newline.

(void) ln

... just prints a newline.

Procedures and functions

Procedures

A procedure can be defined with the "proc" keyword and deleted with the "undef" keyword. A procedure does not return any value and does not support parameter passing. It uses already defined variables (e.g. global variables). A procedure does not have its own namespace; it uses the calling namespace. It is possible to define new variables inside of a procedure in the current namespace.

proc foo {
	say 1 + a * 3 + b;
	my c = 6;
}

my a = 2, b = 4;

foo; # Run the procedure. Print out "11\n"
say c; # Print out "6\n";

Nested procedures

It's possible to define procedures inside of procedures. Since procedures don't have their own scope, nested procedures will be available to the current scope as soon as the main procedure has run for the first time. You may use the "defined" keyword in order to check whether a procedure has been defined or not.

proc foo {
	say "I am foo";

	undef bar;
	proc bar {
		say "I am bar";
	}
}

# Here bar would produce an error because 
# the proc is not yet defined!
# bar; 

foo; # Here the procedure foo will define the procedure bar!
bar; # Now the procedure bar is defined!
foo; # Here the procedure foo will redefine bar again!

Functions

A function can be defined with the "func" keyword and deleted with the "undef" keyword. Functions do not yet return values and do not yet support parameter passing. A function uses local (lexically scoped) variables. If a certain variable does not exist locally, then it uses an already defined variable (e.g. from one scope above).

func foo {
	say 1 + a * 3 + b;
	my c = 6;
}

my a = 2, b = 4;

foo; # Run the function. Print out "11\n"
say c; # Will produce an error, because c is out of scope!

Nested functions

Nested functions work the same way the nested procedures work, with the exception that nested functions will not be available anymore after the function has been left!

func foo {
	func bar {
		say "Hello i am nested";
	}

	bar; # Calling nested
}

foo;
bar; # Will produce an error, because bar is out of scope!

Arrays

Some progress on arrays has been made too. The following example creates a multidimensional array "foo". Its first element is the return value of the func "bar". The fourth value is the string "3" converted to a double number. The last element is an anonymous array which itself contains another anonymous array as its last element:

func bar { say "bar" }
my foo = [bar, 1, 4/2, double "3", ["A", ["BA", "BB"]]];
say foo;

It produces the following output:

% ./fype arrays.fy
bar
01
2
3.000000
A
BA
BB

Fancy stuff

Fancy stuff like OOP or Unicode or threading is not planned. But fancy stuff like function pointers and closures may be considered. :)

May the source be with you

You can find all of this on the GitHub page. There is also an "examples" folder containing some Fype scripts!

https://github.com/snonux/fype

E-Mail me your thoughts at comments@mx.buetow.org!

Standard ML and Haskell gemini://buetow.org/gemfeed/2010-04-09-standard-ml-and-haskell.gmi 2010-04-09T22:57:36+01:00 Paul Buetow comments@mx.buetow.org I am currently looking into the functional programming language Standard ML (aka SML). The purpose is to refresh my functional programming skills and to learn something new too. Since I already know a little Haskell, I could not help myself and implemented the same exercises in Haskell too. ... to read on please visit my site.

Standard ML and Haskell

Written by Paul Buetow 2010-04-09

I am currently looking into the functional programming language Standard ML (aka SML). The purpose is to refresh my functional programming skills and to learn something new too. Since I already know a little Haskell, I could not help myself and implemented the same exercises in Haskell too.

As you will see, SML and Haskell are very similar (at least when it comes to the basics). However, the syntax of Haskell is a bit more "advanced". Haskell utilizes fewer keywords (e.g. no val, end, fun, fn ...). Haskell also allows you to write down the function types explicitly. What I have been missing in SML so far are the so-called pattern guards. Although this is a very superficial comparison for now, so far I like Haskell more than SML. Nevertheless, I thought it would be fun to demonstrate a few simple functions of both languages to show off the similarities.

Haskell is also a "pure functional" programming language, whereas SML also makes explicit use of imperative concepts. I am by far not a specialist in either of these languages but here are a few functions implemented in both, SML and Haskell:

Defining a multi data type

Standard ML:

datatype 'a multi
	= EMPTY
	| ELEM of 'a
	| UNION of 'a multi * 'a multi

Haskell:

data Multi a
    = Empty
    | Elem a
    | Union (Multi a) (Multi a)
    deriving Show

Processing a multi

Standard ML:

fun number (EMPTY) _ = 0
	| number (ELEM x) w = if x = w then 1 else 0
	| number (UNION (x,y)) w = (number x w) + (number y w)
fun test_number w = number (UNION (EMPTY,
    UNION (ELEM 4, UNION (ELEM 6,
    UNION (UNION (ELEM 4, ELEM 4), EMPTY))))) w

Haskell:

number Empty _ = 0
number (Elem x) w = if x == w then 1 else 0
number (Union x y) w = (number x w) + (number y w)
test_number w = number (Union Empty
    (Union (Elem 4) (Union (Elem 6)
    (Union (Union (Elem 4) (Elem 4)) Empty)))) w
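
To try the counting function out, here is a minimal, self-contained Haskell sketch. It assumes that the data type is declared without a type class context and that number has a Union case which sums both branches, mirroring the SML version. The name "sample" and the main function are hypothetical and only there for illustration:

```haskell
-- Multi data type, declared without a datatype context (an assumption).
data Multi a
    = Empty
    | Elem a
    | Union (Multi a) (Multi a)
    deriving Show

-- Count how often w occurs in the multi.
number :: Eq a => Multi a -> a -> Int
number Empty _ = 0
number (Elem x) w = if x == w then 1 else 0
number (Union x y) w = number x w + number y w

-- Hypothetical sample value: the same multi that test_number uses.
sample :: Multi Int
sample = Union Empty
    (Union (Elem 4) (Union (Elem 6)
    (Union (Union (Elem 4) (Elem 4)) Empty)))

main :: IO ()
main = do
    print (number sample 4) -- 3
    print (number sample 6) -- 1
    print (number sample 5) -- 0
```

The element 4 occurs three times in the sample, 6 once and 5 not at all.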

Simplify function

Standard ML:

fun simplify (UNION (x,y)) =
    let fun is_empty (EMPTY) = true | is_empty _ = false
        val x' = simplify x
        val y' = simplify y
    in if (is_empty x') andalso (is_empty y')
            then EMPTY
       else if (is_empty x')
            then y'
       else if (is_empty y')
            then x'
       else UNION (x', y')
    end
  | simplify x = x

Haskell:

simplify (Union x y)
    | (isEmpty x') && (isEmpty y') = Empty
    | isEmpty x' = y'
    | isEmpty y' = x'
    | otherwise = Union x' y'
    where
        isEmpty Empty = True
        isEmpty _ = False
        x' = simplify x
        y' = simplify y
simplify x = x
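
A small, self-contained Haskell sketch shows what simplify does with empty branches. The type signature and the two inputs in main are my own additions for illustration, and the data type is again assumed to be declared without a type class context:

```haskell
data Multi a
    = Empty
    | Elem a
    | Union (Multi a) (Multi a)
    deriving Show

-- Remove Empty branches from nested unions, bottom-up.
simplify :: Multi a -> Multi a
simplify (Union x y)
    | isEmpty x' && isEmpty y' = Empty
    | isEmpty x' = y'
    | isEmpty y' = x'
    | otherwise = Union x' y'
    where
        isEmpty Empty = True
        isEmpty _ = False
        x' = simplify x
        y' = simplify y
simplify x = x

main :: IO ()
main = do
    -- Both Empty branches vanish, only the element is left.
    print (simplify (Union Empty (Union (Elem 1) Empty)) :: Multi Int) -- Elem 1
    -- A union of nothing but Empty collapses to Empty.
    print (simplify (Union Empty Empty) :: Multi Int) -- Empty
```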

Delete all

Standard ML:

fun delete_all m w =
    let fun delete_all' (ELEM x) = if x = w then EMPTY else ELEM x
          | delete_all' (UNION (x,y)) = UNION (delete_all' x, delete_all' y)
          | delete_all' x = x
    in simplify (delete_all' m)
    end

Haskell:

delete_all m w = simplify (delete_all' m)
    where
        delete_all' (Elem x) = if x == w then Empty else Elem x
        delete_all' (Union x y) = Union (delete_all' x) (delete_all' y)
        delete_all' x = x

Delete one

Standard ML:

fun delete_one m w =
    let fun delete_one' (UNION (x,y)) =
            let val (x', deleted) = delete_one' x
                in if deleted
                   then (UNION (x', y), deleted)
                   else let val (y', deleted) = delete_one' y
                       in (UNION (x, y'), deleted)
                   end
                end
          | delete_one' (ELEM x) =
            if x = w then (EMPTY, true) else (ELEM x, false)
          | delete_one' x = (x, false)
            val (m', _) = delete_one' m
        in simplify m'
    end

Haskell:

delete_one m w = do
    let (m', _) = delete_one' m
    simplify m'
    where
        delete_one' (Union x y) =
            let (x', deleted) = delete_one' x
            in if deleted
                then (Union x' y, deleted)
                else let (y', deleted) = delete_one' y
                    in (Union x y', deleted)
        delete_one' (Elem x) =
            if x == w then (Empty, True) else (Elem x, False)
        delete_one' x = (x, False)
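
To compare both delete functions side by side, here is a self-contained Haskell sketch. It is a slightly restructured variant of the code above and not the exact original: delete_one uses fst instead of the do block, and the inner flag is called deleted' to avoid shadowing. The value "sample" and the main function are hypothetical and only serve as an illustration:

```haskell
data Multi a
    = Empty
    | Elem a
    | Union (Multi a) (Multi a)
    deriving Show

-- Remove Empty branches from nested unions, bottom-up.
simplify :: Multi a -> Multi a
simplify (Union x y)
    | isEmpty x' && isEmpty y' = Empty
    | isEmpty x' = y'
    | isEmpty y' = x'
    | otherwise = Union x' y'
    where
        isEmpty Empty = True
        isEmpty _ = False
        x' = simplify x
        y' = simplify y
simplify x = x

-- Delete every occurrence of w.
delete_all :: Eq a => Multi a -> a -> Multi a
delete_all m w = simplify (delete_all' m)
    where
        delete_all' (Elem x) = if x == w then Empty else Elem x
        delete_all' (Union x y) = Union (delete_all' x) (delete_all' y)
        delete_all' x = x

-- Delete only the leftmost occurrence of w.
delete_one :: Eq a => Multi a -> a -> Multi a
delete_one m w = simplify (fst (delete_one' m))
    where
        delete_one' (Union x y) =
            let (x', deleted) = delete_one' x
            in if deleted
                then (Union x' y, True)
                else let (y', deleted') = delete_one' y
                     in (Union x y', deleted')
        delete_one' (Elem x) =
            if x == w then (Empty, True) else (Elem x, False)
        delete_one' x = (x, False)

-- Hypothetical sample value containing the element 4 twice.
sample :: Multi Int
sample = Union (Elem 4) (Union (Elem 6) (Elem 4))

main :: IO ()
main = do
    print (delete_all sample 4) -- Elem 6
    print (delete_one sample 4) -- Union (Elem 6) (Elem 4)
    print (delete_one sample 7) -- Union (Elem 4) (Union (Elem 6) (Elem 4))
```

delete_all removes both occurrences of 4, whereas delete_one stops after the first match and leaves the multi untouched if nothing matches.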

Higher order functions

The first line is always the SML code, the second line always the Haskell variant:

fun make_map_fn f1 = fn (x,y) => f1 x :: y
make_map_fn f1 = \x y -> f1 x : y

fun make_filter_fn f1 = fn (x,y) => if f1 x then x :: y else y
make_filter_fn f1 = \x y -> if f1 x then x : y else y

fun my_map f l = foldr (make_map_fn f) [] l
my_map f l = foldr (make_map_fn f) [] l

fun my_filter f l = foldr (make_filter_fn f) [] l
my_filter f l = foldr (make_filter_fn f) [] l
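
The Haskell variants can be tried out with the following self-contained sketch. The type signatures and the inputs in main are my own additions for illustration, and the filter function is assumed to apply the predicate to the element (f1 x):

```haskell
-- Builds the folding step for map: apply f1 and prepend the result.
make_map_fn :: (a -> b) -> a -> [b] -> [b]
make_map_fn f1 = \x y -> f1 x : y

-- Builds the folding step for filter: keep x only if f1 x holds.
make_filter_fn :: (a -> Bool) -> a -> [a] -> [a]
make_filter_fn f1 = \x y -> if f1 x then x : y else y

my_map :: (a -> b) -> [a] -> [b]
my_map f l = foldr (make_map_fn f) [] l

my_filter :: (a -> Bool) -> [a] -> [a]
my_filter f l = foldr (make_filter_fn f) [] l

main :: IO ()
main = do
    print (my_map (* 2) [1, 2, 3 :: Int])  -- [2,4,6]
    print (my_filter even [1 .. 6 :: Int]) -- [2,4,6]
```

Because foldr walks the list from the right, both functions keep the original order of the elements.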

E-Mail me your thoughts at comments@mx.buetow.org!

Perl Poetry gemini://buetow.org/gemfeed/2008-06-26-perl-poetry.gmi 2008-06-26T21:43:51+01:00 Paul Buetow comments@mx.buetow.org Here are some Perl Poems I wrote. They don't do anything useful when you run them but they don't produce a compiler error either. They only exist for fun and demonstrate what you can do with Perl syntax. ... to read on please visit my site.

Perl Poetry

 '\|/'                                  *
-- * -----
  /|\      ____
 ' | '    {_   o^>       *
   :        -_  /)
   :         (   (        .-''`'.
   .          \   \      /       \
   .           \    \   /         \
                \    `-'           `'.
                 \    . '        /    `.
                  \  ( \  )     (     .')
   ,,   t          '. |  /       |     (
  '|``_/^\___        '|  |`'-..-'|   ( ()
_~~|~/_|_|__/|~~~~~~~ |  / ~~~~~ |   | ~~~~~~~~
 -_  |L[|]L|/         | |\ MJP   )   )
                      ( |(       /  /|
   ~~ ~  ~ ~~~~       | /\\     / /| |
                      ||  \\  _/ / | |
             ~ ~ ~~~ _|| (_/ (___)_| |Nov291999
                    (__)         (____)

Written by Paul Buetow 2008-06-26, last updated 2021-05-04

Here are some Perl Poems I wrote. They don't do anything useful when you run them, but they don't produce a compiler error either. They only exist for fun and demonstrate what you can do with Perl syntax.

Wikipedia: "Perl poetry is the practice of writing poems that can be compiled as legal Perl code, for example the piece known as Black Perl. Perl poetry is made possible by the large number of English words that are used in the Perl language. New poems are regularly submitted to the community at PerlMonks."

https://en.wikipedia.org/wiki/Perl

math.pl

#!/usr/bin/perl

# (C) 2006 by Paul C. Buetow (http://paul.buetow.org) 

goto library for study $math;
BEGIN { s/earching/ books/ 
and read $them, $at, $the } library:

our $topics, cos and tan, 
require strict; import { of, tied $patience };

do { int'egrate'; sub trade; };
do { exp'onentize' and abs'olutize' };
study and study and study and study;

foreach $topic ({of, math}) {
you, m/ay /go, to, limits }

do { not qw/erk / unless $success 
and m/ove /o;$n and study };

do { int'egrate'; sub trade; };
do { exp'onentize' and abs'olutize' };
study and study and study and study;

grep /all/, exp'onents' and cos'inuses';
/seek results/ for @all, log'4rithms';

'you' =~ m/ay /go, not home 
unless each %book ne#ars
$completion;

do { int'egrate'; sub trade; };
do { exp'onentize' and abs'olutize' };

#at
home: //ig,'nore', time and sleep $very =~ s/tr/on/g;
__END__

christmas.pl

#!/usr/bin/perl

# (C) 2006 by Paul C. Buetow (http://paul.buetow.org) 

Christmas:{time;#!!!

Children: do tell $wishes;

Santa: for $each (@children) { 
BEGIN { read $each, $their, wishes and study them; use Memoize#ing

} use constant gift, 'wrapping'; 
package Gifts; pack $each, gift and bless $each and goto deliver
or do import if not local $available,!!! HO, HO, HO;

redo Santa, pipe $gifts, to_childs;
redo Santa and do return if last one, is, delivered; 

deliver: gift and require diagnostics if our $gifts ,not break;
do{ use NEXT; time; tied $gifts} if broken and dump the, broken, ones;
The_children: sleep and wait for (each %gift) and try { to => untie $gifts };

redo Santa, pipe $gifts, to_childs;
redo Santa and do return if last one, is, delivered; 

The_christmas_tree: formline s/ /childrens/, $gifts;
alarm and warn if not exists $Christmas{ tree}, @t, $ENV{HOME};  
write <<EMail
 to the parents to buy a new christmas tree!!!!111
 and send the
EMail
;wait and redo deliver until defined local $tree;

redo Santa, pipe $gifts, to_childs;
redo Santa and do return if last one, is, delivered ;}

END {} our $mission and do sleep until next Christmas ;}

__END__

This is perl, v5.8.8 built for i386-freebsd-64int

shopping.pl

#!/usr/bin/perl

# (C) 2007 by Paul C. Buetow (http://paul.buetow.org) 

BEGIN{} goto mall for $shopping; 

m/y/; mall: seek$s, cool products(), { to => $sell };
for $their (@business) { to:; earn:; a:; lot:; of:; money: }

do not goto home and exit mall if exists $new{product};
foreach $of (q(uality rich products)){} package products; 

our $news; do tell cool products() and do{ sub#tract
cool{ $products and shift @the, @bad, @ones;

do bless [q(uality)], $products 
and return not undef $stuff if not (local $available) }};

do { study and study and study for cool products() }
and do { seek $all, cool products(), { to => $buy } };

do { write $them, $down } and do { order: foreach (@case) { package s } };
goto home if not exists $more{money} or die q(uerying) ;for( @money){};

at:;home: do { END{} and:; rest:; a:; bit: exit $shopping } 
and sleep until unpack$ing, cool products();

__END__
This is perl, v5.8.8 built for i386-freebsd-64int

More...

Did you like what you saw? Have a look at GitHub to see my other poems too:

https://github.com/snonux/perl-poetry

E-Mail me your thoughts at comments@mx.buetow.org!