Monday, 9 June 2025

Computers Sure Have Come A Long Way

It's amazing how far computers have come :-) My first computer was a 486 DX2 with a 100 MB hard drive. That's no typo, it was measured in megabytes, not gigabytes! It, along with the CD-ROM drive we got later on, was connected to the computer via an IDE bus. This bus used a ribbon cable about 2 inches wide with 40 separate tiny wires running through it. If you needed to reconfigure the system, there were no two ways about it -- you had to completely shut the system down and disconnect the power before you started changing what wires were plugged into what. It seems awkward, but the reason the name is "IDE" -- "Integrated Drive Electronics" -- is that in even earlier systems, hard drives _didn't know how to access themselves_. You needed another specialized controller in between the computer and the disk for it to even work! But I digress.

Today, I had a minor annoyance with a server. That server provides Internet for the house I grew up in. It is a _virtualized_ server, which means that the "computer" it's running on doesn't even physically exist. It's just a program running on _another_ computer. That other computer ran into a problem: it ran out of hard drive space. The hard drive in it was only 120,000 MB, and the virtual hard drive for the actual server just kept growing for whatever reason. (I say that because the actual filesystem inside of it is only about 6 GB, so why is the virtual disk over 87 GB? :-P)

So, I needed to upgrade it to a bigger hard drive. In the "good ol' days", that would have meant shutting the computer down, installing the second hard drive (which probably would have meant disconnecting something else to free up a slot, because you could only connect two IDE devices per cable), then starting the system up in a special mode running a program to copy data from one disk to another.
That process would take hours to complete, after which you'd shut it down again, reorganize the disks, plug that optical drive back in or what have you, and then cross your fingers that the boot sector copied over properly and is still properly configured for the new drive -- otherwise you have a whole new category of problems to solve! Instead, I was able to do it without shutting the computer down at all!

The first thing that made this possible is the way hard drives connect to computers now. Back in the day, when home users were struggling with IDE hard drives, fancy high-end systems used a different connection technology called SCSI ("Small Computer System Interface"). With SCSI, you could connect up to 8 things to the same cable, but one of those things was the computer, so you could only connect 7 hard drives. Except you could do fancy tricks: for instance, you could connect 6 hard drives and 2 computers, and the computers could both access the hard drives at the same time. And, you could connect new devices at runtime, without having to shut everything down. If you needed to disconnect something, you'd need to make sure it was fully "released" by the software first, but that was also possible.

Well, SCSI eventually fell by the wayside, because its cables, like IDE cables, are huge fat wide ribbon cables -- in the case of SCSI, with 68 separate tiny wires in them, twisted into pairs. That poses some limitations, and as electronics really properly matured, it became possible to blast data down a single wire really, really fast. You couldn't send 16 bits in one go like with a SCSI cable, but you could send 16 bits one at a time in less time overall! This led to a new standard called SAS -- Serial Attached SCSI. With SAS, the devices are all still SCSI devices, but the cables are much thinner. Like SCSI, SAS was made for higher-end situations, and full SAS support makes a device more complex and more expensive.
But us little guys struggling with IDE got an upgrade as well. A simpler sibling standard called SATA -- "Serial ATA" (ATA itself is a very boring acronym :-) ) -- was created. It doesn't do all of the fancy tricks SAS can do, but it does do some of them. One of those things is that, like with the full SCSI standard, devices can be attached and detached at runtime. It's built into the standard, so to claim SATA support at all, a system has to provide at least basic support for hot-plugging. Today's computers are pretty good at this. :-)

The second thing that made this possible is something very commonly done to decrease the chance of losing data to a hard drive failure. That thing is ... store the data twice, on two hard drives! This sounds almost banal and silly, but it is literally a thing. It's a concept that has been given the fancy name "RAID" -- Redundant Array of Inexpensive Disks. RAID has different "levels" to it, and RAID level 1 is just straight-up mirroring. I use that pretty much everywhere, and it has saved my bacon multiple times.

Back in the day, if you wanted to use RAID, you needed a specialized add-on circuit board between the computer and the drives. You _could_ just plug two hard drives in and use the operating system to interact with multiple drives, but it was horrendously slow doing it that way. With IDE, each cable could connect two devices, and you'd get two cables, one per "channel". Each channel could only talk to one of its drives at a time. In addition, cheaply-made consumer controllers could only operate one channel at a time. So if you had two hard drives connected to the system and you wanted to mirror the data -- so that when you hit Save, that file gets written twice, once to each drive -- it now took twice as long to save the file.
These days, though, the SATA cables that connect the hard drives can transfer data considerably faster than any hard drive can actually process it, and even cheap controllers can send data down more than one cable at the same time anyway. The bottleneck is gone. So, you don't need fancy hardware, you just need the right software, and all modern operating systems come with that software. You just tell the operating system, "Hey, you see those two disks? Please pretend they're one disk, and whenever I save a file, just do exactly the same write on both of them at the same time." (There are fancier RAID schemes as well, but this one is the simplest and easiest to maintain.)

So, back to my server. My server is actually two servers, because there's the server that's running on the hardware, and then there's the server running inside a program running on the first server (which is where the actual important work is taking place :-) ). The "outer" computer, which people have decided should be called a "hypervisor" because it manages virtual computers running inside of it, is running Slackware Linux. It is using a feature of the Linux operating system called "md" to mirror its (real, physical) hard drives. Anything that is written to disk is written to two disks. If/when a disk fails, it can be replaced without losing anything, and in some cases without even shutting the server off!

The process of replacing the disks with larger disks is actually modelled as pretending that a hard disk has failed and needs to be replaced. Following instructions I found online, I ran a health check just to make sure, and then I told md that one of the drives had failed. (It hadn't really.) md then took that disk out of rotation, and with another command, I was able to tell Linux to fully detach from the disk, so that the operating system no longer considered it to even exist any more.
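For the curious, the steps described so far can be sketched with mdadm, the standard administration tool for md. The device names here (/dev/md0, /dev/sda1, /dev/sdb1) are examples, not the actual names from my system:

```shell
# Create a RAID-1 ("mirror") array out of two disks' partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Health check: inspect the array's state before touching anything
mdadm --detail /dev/md0

# Pretend one member has failed, then pull it out of the array
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1

# Tell Linux to fully detach from the disk, so that the kernel
# no longer considers it to exist at all
echo 1 > /sys/block/sdb/device/delete
```

These are standard mdadm incantations, but which device name maps to which physical drive will of course differ from system to system, so double-check before failing anything.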
At that point, without shutting the computer off, I was able to unplug the first of the two disks from the computer, which continued running just fine on the second disk (just without redundancy at that point). As soon as I plugged the replacement disk in, the system immediately detected that it was connected (because connecting new devices at runtime is a core feature of SATA), and then I was able to tell md, "Hey, here's your replacement disk". md brought it into the array and proceeded to copy the entirety of the remaining original drive onto it.

This is another area where computers have advanced insanely far. Suppose I had had to copy that old 486 computer's 100 MB hard drive to a replacement disk. That copy process would have taken about 15 minutes to complete. As computers progressed into the gigabytes of storage, the time required for shuffling data around increased correspondingly. If you set up a computer with a 1 GB IDE drive in 1995, well, the IDE standard was still the same standard, it wasn't any faster, and duplicating that drive could now take something like 3 hours to complete! Fast-forward to today, replacing 120 GB hard drives with 512 GB hard drives: when I told md that it had a second disk again, and it needed to duplicate 120 GB of data from the remaining original disk onto the new disk, that process took ... a bit under 10 minutes. :-D

So to summarize, with off-the-shelf consumer-grade components and a completely free, community-made operating system, today's computer technology allowed me to:

- Trivially use RAID-1 mirroring without any hardware support needed
- Hotswap the hard disks (that is, plug/unplug them while the computer is powered on)
- Remove and add RAID member devices at runtime (without having to shut down the operating system)
- Regenerate a 120 GB RAID-1 array in 10 minutes

And the virtual server, the one doing the actual work? It never even knew anything took place!
It just kept chugging along the entire time blissfully unaware that the foundations of its existence were being shuffled about.
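The re-attach half of the procedure, continuing the earlier sketch (again with example device and host names), looks something like this:

```shell
# A hot-plugged SATA disk is normally detected automatically; rescanning
# the port is only needed if it isn't (the host number is an example)
echo '- - -' > /sys/class/scsi_host/host1/scan

# Partition the new, larger disk, then hand the partition back to md
mdadm --manage /dev/md0 --add /dev/sdb1

# Watch the mirror rebuild itself while everything keeps running
cat /proc/mdstat
```

Once the rebuild finishes, repeating the whole procedure for the other disk -- and then telling md to grow the array into the newly-available space -- completes the upgrade.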

Friday, 5 July 2024

DQD.RealTimeBackup - Update 29

Firstly, a significant surprise and upset! My Microsoft account finally transitioned to my personal e-mail address, so I went to publish DeltaQ.CommandLineParser and found that somebody else had registered DeltaQ as a reserved prefix. I have been calling my personal things "Delta Q Development" for literally decades now, but hadn't published anything on NuGet.org. It was a huge surprise to discover that somebody else, for completely different reasons, had decided on exactly the same name for their thing!

So, DeltaQ.RTB is now DQD.RealTimeBackup, and DeltaQ.CommandLineParser is now DQD.CommandLineParser, and is published on NuGet.org for the world to enjoy.

Incidentally, version 1.0.2 of DQD.CommandLineParser has a fancy new feature: Dynamic command-line completion, if you're using a supported shell (PowerShell or Bash)! Worth checking out. https://github.com/logiclrd/DQD.CommandLineParser/.

Back to DQD.RealTimeBackup. After a few weeks of running and debugging, the backup engine is now in a much stronger state. The tracking of ZFS snapshots is now pretty solid. They no longer leak over time. A few small but important bugs in file operations have been fixed. The Backup Agent runs stably in the background.

The user interface has also been given a bit of polish. As described in update 27, there is a status window written using Avalonia UI. Avalonia UI currently has a bug that causes the process to crash if the user interacts with the Tray Icon. The fix for this has already been merged into the main line and should be in whichever version is published after 11.1.0.

Some polishing has also been done to ensure that the UI looks good in both Light and Dark themes, and the UI window now displays notifications about important events related to the backup.

Thursday, 6 June 2024

DeltaQ.RTB - Update 28

I am encountering difficulties with Avalonia UI, the UI library I've been coding against. When an action is triggered from a menu attached to a tray icon, the application crashes due to an exception inside one of Avalonia's own threads. I have reported this, along with as much information as I could gather, to the Avalonia development team via GitHub. In the meantime, we wait :-)

In other news, some improvement has been made to the semantics of merging and uploading Remote File State Cache batches. They had a tendency to get backed up: to keep things up-to-date, a short window of time is used, which produces many small batches that need to be consolidated. The code now consolidates up to 5 batches at a time, and the queue action thread isn't woken up until everything is queued, which means the previously-added logic to ignore uploads that are going to be deleted anyway can avoid a lot of time-consuming redundant file transfer.

I have also made the unhappy discovery that notification toasts in Gnome currently always have all newlines stripped. It isn't possible to inject even the most rudimentary of structure into a notification. The best currently possible is to make certain words bold or italic. I have begun inquiries with the Gnome design & development teams to see if this can be fixed in a future version.

Line count: 21,337

Monday, 3 June 2024

DeltaQ.RTB - Update 27

It has a face! I have begun the process of making the UI I'd like to see. Ultimately, I need to figure out asynchronous notification of important errors, and right now the application doesn't shut down properly when you tell it to exit. But, it's coming along.

Friday, 31 May 2024

DeltaQ.RTB - Update 26

I am of the opinion that this service is of limited usefulness if it has no way to bring issues to the user's (i.e., to my) attention. The inspiration for this project, Backblaze for Windows, displays a Notification Area icon ("system tray") that communicates continuously with the underlying backup process. DeltaQ.RTB needs that too.

This is a complicated thing to implement.

  • I have no experience at all with GUI development on Linux. A helpful commenter on Stack Overflow has pointed me in the direction of Avalonia UI, which looks like it brings some of the essence of WPF to a cross-platform modality, and, crucially, has a TrayIcon class.
  • A robust mechanism for communication is needed. I also want it to be lightweight. To this end, I have implemented my own scheme from scratch.

Today's work is by far the largest chunk of new code in one sitting since the start of the project!

I have added:

  • A class called `ByteBuffer` that underpins all of the communication between processes. The code likes nicely structured objects, but all of the underlying communications mechanisms operate on one-dimensional arrays of bytes. Converting between the two is called "serialization" (and "deserialization"). `ByteBuffer` needs to be efficient and robust, and to that end, I created a very comprehensive set of unit tests for it to drive debugging and build confidence.
  • An entire infrastructural module to the main DeltaQ.RTB code called `Bridge` which has a server type, a client type, a bunch of message types, and a corresponding bunch of message processor types.
  • A command-line tool called DeltaQ.RTB.Console to allow the IPC to be put into action, tested and debugged.

This lays the groundwork for making a graphical interface that can provide the aforementioned "tray icon". There may be changes to the protocol before it stabilizes, because I need to figure out how to thread asynchronous notifications from the server into the request/response model currently implemented.

Today's work added a whopping 3,756 lines of code to the project!

Line count (including a stub Avalonia UI project): 16,811

Thursday, 30 May 2024

DeltaQ.RTB - Update 25

Filtering – better path filtering was needed. I had implemented two schemes independently: exclude paths starting with a given prefix, and exclude paths containing a given component. Each was a separate list of strings in the configuration, and two separate bits of code handled them. It wasn't flexible enough, though. I wanted the ability to say, "Ignore paths that start with this subpath inside any home directory", or, more generally, "Ignore paths that match this regular expression."

So, I did it :-)

First, I wrote some code to enumerate users on the system. On POSIX systems, this consists of parsing a text file called /etc/passwd. I wanted to be able to unit test this, though, and a unit test that includes a read of the external file /etc/passwd isn't a unit test, it's (to some degree at least) an integration test. So, I abstracted accessing the file behind an interface that tests can mock out.
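As a rough illustration of what that enumeration boils down to on a POSIX system (the UID cutoff of 1000 for "real" users is a common Linux convention, not something taken from the actual code):

```shell
# List the login name and home directory of root plus regular users.
# /etc/passwd fields are colon-separated; field 3 is the UID, field 6
# is the home directory. Most Linux distributions start regular
# accounts at UID 1000.
awk -F: '$3 == 0 || $3 >= 1000 {print $1, $6}' /etc/passwd
```

Abstracting the file access behind an interface means a unit test can feed in a fabricated passwd file instead of whatever happens to be on the build machine.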

Then, I created a new concept of a PathFilter. At its core, the PathFilter uses a regular expression, but it can build those for you in several modes. You can create a Prefix-type PathFilter, which creates a regular expression that does the same filtering as the old ExcludePaths collection. You can create a Component-type PathFilter, which creates a regular expression that does the same filtering as the old ExcludePathsWithComponent collection. Or, you can create a Regex-type PathFilter and just write your own expression. This last one has an extension to the base regular expression syntax: if the string starts with ~, then it automatically enumerates all real users on the system and replaces the ~ with an expression that matches any home directory. On my system, ~ gets turned into ^(/root|/home/logiclrd)/.
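The three modes can be illustrated with plain extended regular expressions. These are my approximations of what the generated expressions look like, not the literal output of PathFilter:

```shell
path="/home/logiclrd/.cache/mozilla/firefox/thing.sqlite"

# Prefix-type filter: the path starts with a given prefix
echo "$path" | grep -Eq '^/home/logiclrd/\.cache/' && echo "prefix match"

# Component-type filter: the path contains a given component anywhere
echo "$path" | grep -Eq '(^|/)\.cache(/|$)' && echo "component match"

# Regex-type filter with the ~ extension already expanded: the path is
# a .cache subpath inside any home directory on the system
echo "$path" | grep -Eq '^(/root|/home/logiclrd)/\.cache/' && echo "home match"
```

All three filters match this example path; a path like /etc/hosts would sail past all of them.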

Then came polishing up that initial implementation, because freshly-written code is always full of bugs. I created a suite of unit tests that covers the intended functionality through all code paths, and used it to drive debugging until all the tests passed. Two minor complications:

  • With DeltaQ.RTB running out of the code directory, sometimes (though for some reason not always) I can't build because the build output files are in use by the running process.
  • For some reason, the C# Dev Kit extension for Visual Studio Code has stopped working on my system. I've tried reinstalling it, to no avail thus far. So, I don't get any of the fancy helper introspection right now. Good thing I have a mind like a steel thingamajig for all the framework details. :-)

With these changes, DeltaQ.RTB is running in realtime mode right now, ignoring Firefox's incessant modification of cache files, Visual Studio Code's state files, and the GPUCache and DawnCache of, I believe, all the Electron-based apps running on the system, in addition to the baseline of path prefixes I previously determined.

Line count: 13,055

DeltaQ.RTB - Update 24

Exciting! Initial backup complete. The associated B2 bucket now has about 170 GB uploaded in about 346,000 files. I expect the file count to drop, because I added additional path exceptions after observing it uploading things I don't really need backed up – but the size probably won't significantly drop, because the new exceptions contain mostly tiny files.

The code is really maturing. It's developing all those tiny wrinkles that only come from actually running it and discovering the tricky and unexpected combinations of things that can happen.

  • The filesystem enumerator now artificially puts /home mounts first, because you want your home directory to be one of the first things backed up.
  • Added a work-around for a bug in a bunch of B2 APIs. Basically, they converted parts of their API to use HTTP GET requests instead of HTTP POST to be more idiomatic (you're "getting" data, so it should be HTTP GET – which isn't wrong, per se), and in the process they accidentally introduced additional filename format limitations into a bunch of their endpoints. So now you can e.g. upload a file whose filename contains a comma, a square bracket or an ampersand, but when you try to download that file, the download endpoint rejects the filename. Sigh. 😛
  • When you enumerate ZFS mounts, at least the way it's configured on recent Ubuntu versions, you find volumes like rpool/USERDATA/logiclrd_ltzio4 for /home/logiclrd and bpool/BOOT/ubuntu_znaqup for /boot. But, the way I'm gathering the mount info somehow also sees rpool and bpool devices mounted to the same mount points. This causes problems, because you can create a snapshot on these and it isn't a snapshot on the actual volume. So, the code now ignores ZFS devices that don't contain slashes. Seems right. 🙂
  • There was a bug in the code that tested whether the Backup Agent was busy, to determine whether an Initial Backup operation had completed -- but I only just now completed an Initial Backup operation, so only now could the bug manifest. The bug was that I wrote "queue size >= 0" instead of "queue size > 0" -- basically, "it's busy if the queue size is greater than 0 or if the queue size is 0", when a queue size of 0 means it's not busy!
  • Fixed a performance issue that arises with this sequence of batch uploads in a thing called the Remote File State Cache. The state files start to get quite large after a while, such that the background thread that uploads them can't keep up with their creation during Initial Backup. The result is a logged series of actions that uploads file #1, then deletes file #1 and uploads file #2, then deletes file #2 and uploads file #3. This is a simplification, but the essence is there: files get uploaded, and then pretty much immediately get replaced by a newer file. This presented an opportunity for optimization: If the file is going to be deleted anyway, don't bother uploading it!
  • I've just added a proper error logging infrastructure, intended to be only used for logging important exceptions that might have some bearing on consistency or completeness.
  • If the filesystem monitor detects a Move event, and the "from" path isn't being tracked, that's actually not the end of the world. Previously, it would just bomb out of the Process Backup Queue thread (and probably take the whole process down in the process), but now it just says, "Okay, well, let's just queue the 'to' path as though it is a newly-created file." 🙂
  • Watching it run in filesystem monitoring mode, there is pretty much a continuous stream of activity from Firefox in its .mozilla and .cache folders. This is categorically content that does not need to be backed up, so I've added a filter mechanism to ignore paths that have certain components in them.

My laptop running Ubuntu 24.04 now has real-time cloud backup!

Line count: 12,535