Friday, 31 May 2024

DeltaQ.RTB - Update 26

I am of the opinion that this service is of limited usefulness if it has no way to bring issues to the user's (i.e., to my) attention. The inspiration for this project, Backblaze for Windows, displays a Notification Area icon ("system tray") that communicates continuously with the underlying backup process. DeltaQ.RTB needs that too.

This is a complicated thing to implement.

  • I have no experience at all with GUI development on Linux. A helpful commenter on Stack Overflow has pointed me in the direction of Avalonia UI, which looks like it brings some of the essence of WPF to a cross-platform modality, and, crucially, has a TrayIcon class.
  • A robust mechanism for communication is needed. I also want it to be lightweight. To this end, I have implemented my own scheme from scratch.

Today's work is by far the largest chunk of new code in one sitting since the start of the project!

I have added:

  • A class called `ByteBuffer` that underpins all of the communication between processes. The code likes nicely structured objects, but all of the underlying communications mechanisms operate on one-dimensional arrays of bytes. Converting between the two is called "serialization" (and "deserialization"). `ByteBuffer` needs to be efficient and robust, and to that end, I created a very comprehensive set of unit tests for it to drive debugging and build confidence.
  • An entire infrastructural module to the main DeltaQ.RTB code called `Bridge` which has a server type, a client type, a bunch of message types, and a corresponding bunch of message processor types.
  • A command-line tool called DeltaQ.RTB.Console to allow the IPC to be put into action, tested and debugged.
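To illustrate the kind of job a class like `ByteBuffer` does, here is a minimal sketch in Python (the project itself is C#). The method names, the little-endian integers and the length-prefixed strings are all assumptions made for this example; the actual DeltaQ.RTB wire format is not described in this post.

```python
import struct


class ByteBuffer:
    """Append-only writer plus cursor-based reader over a flat byte array."""

    def __init__(self, data: bytes = b""):
        self._data = bytearray(data)
        self._pos = 0  # read cursor

    def append_int32(self, value: int) -> None:
        # Fixed-width, little-endian (an arbitrary choice for this sketch).
        self._data += struct.pack("<i", value)

    def append_string(self, value: str) -> None:
        # Length prefix in bytes, followed by the UTF-8 payload.
        encoded = value.encode("utf-8")
        self.append_int32(len(encoded))
        self._data += encoded

    def _take(self, count: int) -> bytes:
        # Refuse to read past the end instead of returning short data.
        if count < 0 or self._pos + count > len(self._data):
            raise ValueError("read past end of buffer")
        chunk = bytes(self._data[self._pos:self._pos + count])
        self._pos += count
        return chunk

    def read_int32(self) -> int:
        return struct.unpack("<i", self._take(4))[0]

    def read_string(self) -> str:
        length = self.read_int32()
        return self._take(length).decode("utf-8")

    def to_bytes(self) -> bytes:
        return bytes(self._data)
```

A message type would then serialize itself as a sequence of `append_*` calls and deserialize with the matching `read_*` calls in the same order, which is exactly the sort of symmetry unit tests are good at pinning down.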

This lays the groundwork for making a graphical interface that can provide the aforementioned "tray icon". There may be changes to the protocol before it stabilizes, because I need to figure out how to thread asynchronous notifications from the server into the request/response model currently implemented.

Today's work added a whopping 3,756 lines of code to the project!

Line count (including a stub Avalonia UI project): 16,811

Thursday, 30 May 2024

DeltaQ.RTB - Update 25

Filtering – better path filtering was needed. I had implemented two schemes independently: exclude paths starting with a given prefix, and exclude paths containing a given component. Each was a separate list of strings in the configuration, and two separate bits of code handled them. It wasn't flexible enough, though. I wanted the ability to say, "Ignore paths that start with this subpath inside any home directory", or, more generally, "Ignore paths that match this regular expression."

So, I did it :-)

First, I wrote some code to enumerate users on the system. On POSIX systems, this consists of parsing a text file called /etc/passwd. I wanted to be able to unit test this, though, and a unit test that includes a read of the external file /etc/passwd isn't a unit test, it's (to some degree at least) an integration test. So, I abstracted accessing the file behind an interface that tests can mock out.
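As a rough sketch of that abstraction (in Python rather than the project's C#, with hypothetical names), the parser can take "something that yields the lines of the file" as a parameter, so a test can hand it canned text instead of touching the real /etc/passwd. The seven colon-separated fields are the standard passwd layout.

```python
from typing import Callable, Iterable, List, NamedTuple


class User(NamedTuple):
    name: str
    uid: int
    home: str
    shell: str


def read_passwd_lines() -> Iterable[str]:
    # The only function that touches the real file.
    with open("/etc/passwd", encoding="utf-8") as f:
        return f.read().splitlines()


def enumerate_users(
    read_lines: Callable[[], Iterable[str]] = read_passwd_lines,
) -> List[User]:
    users = []
    for line in read_lines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip malformed entries
        users.append(User(fields[0], int(fields[2]), fields[5], fields[6]))
    return users
```

A unit test passes a lambda returning a list of strings; production code calls `enumerate_users()` with no arguments.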

Then, I created a new concept of a PathFilter. At its core, the PathFilter uses a regular expression, but it can build those for you in several modes. You can create a Prefix-type PathFilter, which creates a regular expression that does the same filtering as the old ExcludePaths collection. You can create a Component-type PathFilter, which creates a regular expression that does the same filtering as the old ExcludePathsWithComponent collection. Or, you can create a Regex-type PathFilter and just write your own expression. This last one has an extension to the base regular expression syntax: if the string starts with ~, then it automatically enumerates all real users on the system and replaces the ~ with an expression that matches any home directory. On my system, ~ gets turned into ^(/root|/home/logiclrd)/.
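The three modes can be sketched as small functions that each produce a regular expression. This is a Python approximation with hypothetical function names, not the actual C# PathFilter. In particular, the exact patterns generated for the Prefix and Component modes are guesses, and the list of home directories is passed in rather than enumerated.

```python
import re
from typing import Iterable


def prefix_pattern(prefix: str) -> str:
    # Matches any path that starts with the given literal prefix.
    return "^" + re.escape(prefix)


def component_pattern(component: str) -> str:
    # Matches any path in which the literal appears as a whole component.
    return "(^|/)" + re.escape(component) + "(/|$)"


def expand_home(expression: str, home_dirs: Iterable[str]) -> str:
    # A leading ~ becomes an anchored alternation over all home directories;
    # the rest of the expression is left untouched.
    if not expression.startswith("~"):
        return expression
    alternatives = "|".join(re.escape(home) for home in home_dirs)
    return "^(" + alternatives + ")" + expression[1:]


def is_excluded(path: str, patterns: Iterable[str]) -> bool:
    return any(re.search(pattern, path) for pattern in patterns)
```

With home directories /root and /home/logiclrd, an expression such as `~/\.cache/` expands to something that matches /home/logiclrd/.cache/x but not /srv/.cache/x.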

Then it was time to polish up that initial implementation, because freshly-written code is always full of bugs. I created a suite of unit tests that cover the intended functionality through all code paths, and used that to drive debugging until all the tests passed. Two minor complications:

  • With DeltaQ.RTB running out of the code directory, sometimes (though for some reason not always) I can't build because the build output files are in use by the running process.
  • For some reason, the C# Dev Kit extension for Visual Studio Code has stopped working on my system. I've tried reinstalling it, to no avail thus far. So, I don't get any of the fancy helper introspection right now. Good thing I have a mind like a steel thingamajig for all the framework details. :-)

With these changes, DeltaQ.RTB is running right now in realtime mode, ignoring Firefox's incessant modification of cache files and Visual Studio Code's state files and the GPUCache and DawnCache of, I believe, all the Electron-based apps running on the system, in addition to the baseline of path prefixes I previously determined.

Line count: 13,055

DeltaQ.RTB - Update 24

Exciting! Initial backup complete. The associated B2 bucket now has about 170 GB uploaded in about 346,000 files. I expect the file count to drop, because I added additional path exceptions after observing it uploading things I don't really need backed up – but the size probably won't significantly drop, because the new exceptions contain mostly tiny files.

The code is really maturing. It's developing all those tiny wrinkles that only come from actually running it and discovering the tricky and unexpected combinations of things that can happen.

  • The filesystem enumerator now artificially puts /home mounts first, because you want your home directory to be one of the first things backed up.
  • Added a work-around for a bug in a bunch of B2 APIs. Basically, they converted parts of their API to use HTTP GET requests instead of HTTP POST to be more idiomatic (you're "getting" data, so it should be HTTP GET – which isn't wrong, per se), and in the process they accidentally introduced additional filename format limitations into a bunch of their endpoints. So now you can e.g. upload a file whose filename contains a comma, a square bracket or an ampersand, but when you try to download that file, the download endpoint rejects the filename. Sigh. 😛
  • When you enumerate ZFS mounts, at least the way it's configured on recent Ubuntu versions, you find volumes like rpool/USERDATA/logiclrd_ltzio4 for /home/logiclrd and bpool/BOOT/ubuntu_znaqup for /boot. But, the way I'm gathering the mount info somehow also sees rpool and bpool devices mounted to the same mount points. This causes problems, because you can create a snapshot on these and it isn't a snapshot on the actual volume. So, the code now ignores ZFS devices that don't contain slashes. Seems right. 🙂
  • There was a bug in the code that tested whether the Backup Agent was busy, to determine whether an Initial Backup operation has completed. But an Initial Backup operation had to actually complete for the bug to manifest, and that only just happened. The bug was that I wrote "queue size >= 0" instead of "queue size > 0" – basically, "it's busy if the queue size is greater than 0 or if the queue size is 0", when a queue size of 0 means it's not busy!
  • Fixed a performance issue that arises with this sequence of batch uploads in a thing called the Remote File State Cache. The state files start to get quite large after a while, such that the background thread that uploads them can't keep up with their creation during Initial Backup. The result is a logged series of actions that uploads file #1, then deletes file #1 and uploads file #2, then deletes file #2 and uploads file #3. This is a simplification, but the essence is there: files get uploaded, and then pretty much immediately get replaced by a newer file. This presented an opportunity for optimization: If the file is going to be deleted anyway, don't bother uploading it!
  • I've just added a proper error logging infrastructure, intended to be only used for logging important exceptions that might have some bearing on consistency or completeness.
  • If the filesystem monitor detects a Move event, and the "from" path isn't being tracked, that's actually not the end of the world. Previously, it would just bomb out of the Process Backup Queue thread (and probably take the whole process down in the process), but now it just says, "Okay, well, let's just queue the "to" path as though it is a newly-created file." 🙂
  • Watching it run in filesystem monitoring mode, there is pretty much a continuous stream of activity from Firefox in its .mozilla and .cache folders. This is categorically content that does not need to be backed up, so I've added a filter mechanism to ignore paths that have certain components in them.
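The Remote File State Cache optimization from the list above can be sketched as a pass over the pending actions: if an upload has not started yet and a delete for the same file is already queued behind it, both can be dropped. This is a simplified Python illustration (the project is C#), and the action representation is invented for the example.

```python
from typing import List, Tuple

# ("upload" | "delete", file number); a stand-in for the real queue entries.
Action = Tuple[str, int]


def drop_superseded(pending: List[Action]) -> List[Action]:
    """Cancel out upload/delete pairs for files that were never uploaded."""
    result: List[Action] = []
    for kind, number in pending:
        if kind == "delete" and ("upload", number) in result:
            # The upload is still pending, so skip it entirely;
            # there is then nothing on the remote side to delete either.
            result.remove(("upload", number))
            continue
        result.append((kind, number))
    return result
```

For the sequence "upload #1, delete #1, upload #2, delete #2, upload #3", only "upload #3" survives. A delete for a file that was uploaded earlier (and so has no pending upload) is kept.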

My laptop running Ubuntu 24.04 now has real-time cloud backup!

Line count: 12,535

Wednesday, 29 May 2024

DeltaQ.RTB - Update 23

Huh, weird. So, I hadn't actually read B2's documentation on what constitute valid filenames. It's pretty straightforward, though:

You set a file name when you upload a file. After a file is uploaded, you cannot change this name. Names should be a UTF-8 string up to 1024 bytes with the following exceptions: Character codes below 32 are not allowed. DEL characters (127) are not allowed. Backslashes are not allowed. File names cannot start with /, end with /, or contain //.

Thing is, without having read this, I just blithely started uploading files whose names are "rooted" paths -- i.e., that start with '/'. And it worked.

But now I'm encountering two files that seem to be being rejected because of their filenames. One of them contains a comma, the other one contains square brackets. Per the documentation, these should be perfectly valid filenames. What gives? 🤔
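For reference, the documented rules quoted above translate into a handful of checks. This is a Python sketch with a hypothetical function name; it encodes only what the quoted paragraph says, not whatever the B2 endpoints actually enforce.

```python
from typing import List


def b2_file_name_problems(name: str) -> List[str]:
    """Return the documented rules that a proposed file name violates."""
    problems = []
    if len(name.encode("utf-8")) > 1024:
        problems.append("longer than 1024 bytes in UTF-8")
    if any(ord(ch) < 32 for ch in name):
        problems.append("contains a character code below 32")
    if "\x7f" in name:
        problems.append("contains DEL (127)")
    if "\\" in name:
        problems.append("contains a backslash")
    if name.startswith("/"):
        problems.append("starts with /")
    if name.endswith("/"):
        problems.append("ends with /")
    if "//" in name:
        problems.append("contains //")
    return problems
```

By these rules a name containing a comma or square brackets is fine, while a rooted path such as /home/user/file is flagged for starting with /, which is the opposite of what the service did in practice.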

Tuesday, 28 May 2024

DeltaQ.RTB - Update 22

So, there are still some unknowns at this point. I haven't gotten past the "initial backup" stage yet, so I don't know what I'll encounter when it hits the real-time monitoring stage. I don't know that my periodic rescan implementation is correct. But, these things are mostly tested and observed working.

But there is one big unknown that is a problem right now. The Backup Agent takes ZFS snapshots as part of inspecting files. It's supposed to release them when it's done.

It doesn't seem to be releasing them.

I'm not sure why. This is a bug and this is a problem. I'm going to have to figure out what's going on with the ZFS mounts and the reference counting, and it's a hard thing to debug. I can't even run the code in a debugger right now, and even if I could, there are a lot of moving pieces in different places.

My system currently has 292 ZFS snapshots. They're quick and easy to clean up, except that the backup agent is running right now, and a few of those snapshots are actually in use. Don't want to pull the rug out from under it. :-)

UPDATE 5:46 PM:

I've added ZFS-specific debugging, but in the course of adding it, a thought occurred to me: What if it's not buggy at all, but it simply hasn't gotten to the point of releasing any of them because the initial backup queue is so large? I think this may be the case. Time to just let it run and see what happens, I guess.

After adding the debugging, I fired it back up, and it promptly created a snapshot that won't be released until 162,809 files are processed! Well, down to 162,712 now.

Meanwhile a second snapshot is tracking a measly 100 files, and a third snapshot is already tracking 145,168 files.

UPDATE 5:49 PM:

It's chewing through the files. Down to 152,704 now on [1].

UPDATE 5:50 PM:

Hmm, there may be a problem yet. It was processing files quickly because the remote file state cache said they were already uploaded. It has now hit files that were not already uploaded. Uploads are completing, but the reference count isn't dropping...

UPDATE 5:53 PM:

Ohh, these are all small files. Which means that they have their snapshot reference released before they enter the upload queue (they get staged to /tmp). Everything in the upload queue is already released. Until the queue hits the low water mark, it won't be pumping any more entries from the backup queue into it, which means it won't be releasing any more snapshot references. So, working as designed? 🙂 Fingers crossed.

UPDATE 5:56 PM:

Upload queue has to go from 10,000 entries to 5,000 before it'll put more into it. It's at about 9,200 right now. Just gotta wait and see.
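The high/low water mark behaviour described here is a form of hysteresis. A minimal Python sketch (the real code is C#, and the class and method names are made up for illustration) looks something like this, using the 10,000 and 5,000 thresholds mentioned above as defaults:

```python
from collections import deque


class WatermarkFeeder:
    """Feeds the upload queue until it is full, then waits for it to drain."""

    def __init__(self, high: int = 10_000, low: int = 5_000):
        self.high = high
        self.low = low
        self.filling = True

    def pump(self, backup_queue: deque, upload_queue: deque) -> int:
        """Move entries across if allowed; returns how many were moved."""
        # Only resume filling once the upload queue has drained to the low mark.
        if not self.filling and len(upload_queue) <= self.low:
            self.filling = True
        moved = 0
        while self.filling and backup_queue:
            if len(upload_queue) >= self.high:
                self.filling = False  # hit the high mark; stop until drained
                break
            upload_queue.append(backup_queue.popleft())
            moved += 1
        return moved
```

Between the two marks nothing is moved at all, which is why the reference counts sit still while the upload queue works its way from 10,000 down to 5,000.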

UPDATE 5:59 PM:

Hmm, perhaps trouble after all. The reference count is 152,704, but the backup queue only has 71,419 entries in it. That's less than 152,704. Even after they're all processed, the reference count isn't going to get to zero...

UPDATE 6:45 PM:

Welp, it woke up, dropped another 5,000 files into the upload queue, released about as many snapshot references, but it still has more references than there are files in the queue. Hmm...

UPDATE 7:06 PM:

I think I found the hole. 🙂 There's a bit of code that checks, "Hey, is this exact file already uploaded (according to the cache)? If so, we don't have to do anything." It was interpreting "don't have to do anything" a little bit liberally – still has to release the snapshot reference! Let's see if it behaves a bit more as expected with that fixed. 🙂
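The shape of that bug, and one way to make it hard to reintroduce, can be sketched as follows. This is illustrative Python with hypothetical names, not the project's C# code: the release sits in a `finally` block, so the early "already uploaded" return cannot skip it.

```python
from typing import Callable, Set


class SnapshotReferences:
    """Counts outstanding references and destroys the snapshot at zero."""

    def __init__(self, name: str, destroy: Callable[[str], None]):
        self.name = name
        self._destroy = destroy
        self.count = 0

    def add_reference(self) -> None:
        self.count += 1

    def release(self) -> None:
        self.count -= 1
        if self.count == 0:
            self._destroy(self.name)


def process_file(
    path: str,
    snapshot: SnapshotReferences,
    already_uploaded: Set[str],
    stage: Callable[[str], None],
) -> None:
    try:
        if path in already_uploaded:
            return  # nothing to upload, but the reference is still released
        stage(path)  # e.g. copy the file out of the snapshot for upload
    finally:
        snapshot.release()
```

With the release on every exit path, the reference count reaches zero once each tracked file has been looked at, whether or not it needed uploading.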

UPDATE 7:12 PM:

Yey! From the log:

[1] Releasing reference for path: /code/D2.diff
[1] Reference count is now 1
[1] Releasing reference for path: /snap/README
[1] Reference count is now 0
[1] => Disposing of the snapshot
Running: zfs destroy rpool/ROOT/ubuntu_znaqup@RTB-638525379026955749
No longer tracking snapshot, now tracking 8 snapshots

UPDATE 8:08 PM:

Huh, hit long polling for the first time and revealed a problem there. And, the main entrypoint code wasn't setting and checking all of the things it should have to ensure a proper shutdown on SIGINT or SIGTERM.

DeltaQ.RTB - Update 21

What good is a backup system that doesn't offer any way to restore files? 🙂 I created a command-line utility for listing & restoring backed-up files:

As demonstrated here, I also extended DeltaQ.CommandLineParser to support automatic usage display, built a new version of the associated package and updated DeltaQ.RTB to use it and implement support. 🙂

Sunday, 26 May 2024

DeltaQ.RTB - Update 21

During the initial backup operation, the program uses a VT510 code called DECSTBM to set off a region of the screen space for persistent usage – stats and upload progress... Annoyingly, it seems that occasionally, the Gnome Terminal doesn't enforce this properly, and text written while the margins are in effect blows past the margins and makes a mess. I don't know if this is actually my bug, but I don't think my code is accidentally doing output while the margins are released... I've reviewed it, and as far as I can tell, the mutex logic is correct, such that during the brief intervals where it releases the margins to update the stats, any output that is received simply has to wait until the margins are re-established. So I'm leaning toward a bug in the Gnome Terminal. Honestly, this is probably a very rarely-used feature, and it's entirely plausible that it's buggy.
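For anyone unfamiliar with it, DECSTBM ("Set Top and Bottom Margins") is the escape sequence ESC [ top ; bottom r, and ESC [ r with no parameters resets the margins to the full screen. A small Python sketch of building those sequences follows. The helper names, and the choice to keep the status lines at the bottom of the screen, are assumptions for illustration only and say nothing about where DeltaQ.RTB puts them.

```python
ESC = "\x1b"


def set_scroll_region(top: int, bottom: int) -> str:
    # DECSTBM uses 1-based row numbers, and the top margin must be
    # above the bottom margin.
    if top < 1 or bottom <= top:
        raise ValueError("need 1 <= top < bottom")
    return f"{ESC}[{top};{bottom}r"


def reset_scroll_region() -> str:
    # No parameters: the margins go back to the whole screen.
    return f"{ESC}[r"


def reserve_status_lines(screen_rows: int, status_rows: int) -> str:
    # Scrolling output is confined to the rows above the status area.
    return set_scroll_region(1, screen_rows - status_rows)
```

On a 24-row terminal, `reserve_status_lines(24, 4)` confines scrolling to rows 1 through 20, leaving rows 21 through 24 for the stats.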

DeltaQ.RTB - Update 20

There was a fundamental misunderstanding about how the B2 API handles file deletions (and file downloads, for that matter). In some cases, there are API endpoints that just randomly take a bucket name instead of a bucket id for no obvious reason. In the case of file deletions, if all you already have is the file's path, you need to look up the file id in order to submit a deletion request -- I assume this is to make sure you don't send deletions accidentally, a safeguard as it were. Because of this, the remote file state cache was failing to transfer and built up a considerable backlog. There is now (configurable) detailed debug logging from the remote file state cache, and based on that I was able to fix the broken B2 API operations. Between that and some other polishing, we're up to 10,600 lines of code now -- and the remote file state cache is now functioning as designed. The design of the delivery mechanism was such that it was robust to its own failure and no data was lost due to the bug -- it was just delayed, but that is now being rectified. 🙂

Stats:

  • Current line count: 10,600
  • Tests: 123

DeltaQ.RTB - Update 19

I was getting wildly inaccurate transfer speed values reported by the Backblaze B2 library I am using. I dug into it, and I suspect that, on Linux at least, the Stopwatch class (provided by .NET) is under-reporting short durations. The library is using Stopwatch to capture the time each write operation takes, so this makes it think the operation takes less time than it actually does and is thus faster than it is. But it's coming through way faster. I just watched it take about 2 minutes to upload a 28 MB file (not terribly impressive), and the entire time it was telling me the transfer rate was over 10 MB/sec!

So, I went upstream to that library's code, which is on GitHub, and reworked it to use a rolling average instead of computing the speed individually for each write operation. In the process, the timestamping is reworked so that it is not possible for it to be under-reported. I have submitted my changes for consideration by the package's authors: https://github.com/microcompiler/backblaze/pull/28
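The general idea of a rolling average can be sketched like this. To be clear, this is not the code from the pull request: it is a generic Python illustration with invented names, in which byte counts are timestamped as they complete and the rate is the total over a fixed trailing window.

```python
import time
from collections import deque
from typing import Callable


class RollingThroughput:
    """Bytes per second averaged over a trailing time window."""

    def __init__(self, window_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._samples = deque()  # (timestamp, byte count)

    def record(self, byte_count: int) -> None:
        now = self._clock()
        self._samples.append((now, byte_count))
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Forget samples that have aged out of the window.
        while self._samples and now - self._samples[0][0] > self._window:
            self._samples.popleft()

    def bytes_per_second(self) -> float:
        self._trim(self._clock())
        total = sum(count for _, count in self._samples)
        # Dividing by the full window means a very short measured duration
        # can never inflate the reported rate.
        return total / self._window
```

The trade-off is that the figure ramps up over the first few seconds of a transfer instead of jumping straight to the true rate.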

I don't have a way to see it in action just yet, because I've got an upload operation running that looks like it might actually run through to completion this time, and it's already uploaded 9 GB, and the only way to switch to the new Backblaze library would be to restart it. I don't want to do that!

DeltaQ.RTB - Update 18

Periodic rescan is now a thing, and giant files are now uploaded in chunks as required by the Backblaze B2 API. The status screen during the "initial backup" now shows the state of uploads as well. 🙂

Saturday, 25 May 2024

DeltaQ.RTB - Update 17

Added some more status detail 🙂 Activity throttle is in place.

10,004 lines of code right now!

Friday, 24 May 2024

DeltaQ.RTB - Update 16

Well, here's a problem 🙂 Need to figure out a way to throttle the activity.

DeltaQ.RTB - Update 15

Current line count: 8,815

There are no more classes to write, no more tests needed, no TODOs at this point. It's now at the stage of "run it, see what screws up, fix it, repeat". 😛

This is the stage at which code becomes robust.

Changes since the last update:

  • The B2 integration is now resilient to B2 randomly saying "no tomes available". This is apparently a thing. 😛
  • The Backup Agent should now be resilient to files being deleted while it's halfway through processing them.
  • Symlinks are excluded.
  • One minor and two major performance issues resolved.
  • Improved the polish on ongoing status updates.
  • Initial backup operations can now be cancelled properly.
  • The enumeration of ZFS mounts returns some red herrings, directories that are listed as ZFS mounts but don't have a ".zfs" metadata directory. These are now skipped; without a ".zfs" folder, it's not possible to read snapshot information, which is core to the algorithm.

It's fun to watch it operating, seeing the numbers fly by. With any luck, it'll become stable enough that it can do so to completion, instead of running into one thing or another and crashing after a minute or two. 😛

DeltaQ.RTB - Update 14

Coding late into the night 🙂

  • Current line count: 8,629
  • Tests: 120
  • TODOs: 0

Polishing things up. Starting to actually run the code, see it in action, fix the multitude of little logic errors that are invariably there after writing complex code and not having tested it yet. Fixed a bug upstream in the Backblaze B2 library I'm using – apparently, Backblaze added some new Capability codes and this library's code doesn't know about them yet (I guess the author isn't actively using it!).

So, it feels like I'm getting close to something that'll perform a useful job for me. 🙂

Thursday, 23 May 2024

DeltaQ.RTB - Update 13

Another little burst 🙂

  • Current line count: 8,292
  • Tests: 120
  • TODOs: 1

Fixed some bugs in the initial implementation of InitialBackupOrchestrator.cs. Fixed the code to use ZFS wrappers that are properly attached to the associated volumes when creating snapshots. Implemented the fallback file staging code. Fixed IoC registration; all types should be single instance.

DeltaQ.RTB - Update 12

Phew, busy afternoon!

  • Current line count: 8,060
  • Tests: 120
  • TODOs: 3

Did an initial implementation of initial backup and also command-line options to notify the backup agent of specific paths to process. Eventually, I think I need to move this to a dbus or similar communication system so that there can be a singleton instance and operations can be passed to it by invoking the command-line.

DeltaQ.RTB - Update 11

Stats:

  • Current line count: 7,013
  • Tests: 117
  • TODOs: 3

Split the concept of surface area out of the filesystem monitor class, so that it can be employed in launch modes that don't invoke the filesystem monitor at all. Updated and added unit tests accordingly.

Added a TODO for figuring out modes of operation for initial backup & specified file sets.

Wednesday, 22 May 2024

DeltaQ.RTB - Update 10

Stats:

  • Current line count: 6,694
  • Tests: 110
  • TODOs: 4

Implemented the long polling strategy. Polished up Start/Stop semantics in the Backup Agent. Wired up all the event types. Did some research to get a reasonably efficient way to compare two files. Better than baseline, anyway.
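The post doesn't say which comparison technique was settled on. One common approach that beats a naive byte-by-byte loop is to compare sizes first and then read both files in large blocks, stopping at the first difference. Here is a Python sketch of that approach; it is not necessarily what the C# code does.

```python
import os


def files_equal(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    # Different sizes can never be equal, and checking costs no reads.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing block ends the comparison
            if not chunk_a:
                return True   # both files exhausted without a difference
```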

Tuesday, 21 May 2024

DeltaQ.RTB - Update 9

Stats:

  • Current line count: 6,045
  • Tests: 89
  • TODOs: 8

Significant polishing of the fanotify integration and handling of mount points. Fixed a bunch of issues there, and got some good advice from the linux-fsdevel mailing list that helped get the code into a state that consistently returns paths for all of the desired event types.

Created a tool DeltaQ.RTB.FileActivityTrace, whose existence is really to facilitate testing & development.

Wednesday, 15 May 2024

DeltaQ.RTB - Update 8

Stats:

  • Current line count: 4,970
  • TODOs: 9

Polished the entrypoint up a bit. It now handles SIGINT and SIGTERM properly, and the command-line can specify a verbosity level, whether more or less verbose than normal.

The TODO that was completed was reading configuration from a file on startup.

Tuesday, 14 May 2024

DeltaQ.RTB - Update 7

Stats:

  • Current line count: 4,844
  • Passing tests: 58, usually
  • Unreliable tests: 1
  • Tested classes: 14
  • TODOs: 10

No more stubbed test classes left 🙂 Tests are now caught up with the initial coding.

DeltaQ.RTB - Update 6

Quick update before bed 🙂

  • Current line count: 4,573
  • Passing tests: 48, usually
  • Unreliable tests: 1
  • Tested classes: 12
  • Untested classes: 2
  • TODOs: 8

Monday, 13 May 2024

DeltaQ.RTB - Update 5

Phew, it's been a busy day! So, today, I made a new NuGet package. Except, I can't actually upload it to NuGet for a month because I neglected to update my e-mail address from the old company address. Microsoft does let you change your e-mail address from one that doesn't exist any more, but they make you wait 30 days to make absolutely sure that it isn't a case of attempted hijacking.

So, what does this NuGet package do? It parses command-lines. It's based on one I made years ago as a personal project and then later made use of in code I wrote for iQmetrix. In the work I did today, though, I reworked/rewrote it significantly based on my experience using it in its initial design.

The command-line parser project, which is fully configured for deployment to NuGet once the time comes around, and which is fully unit tested, is some 1,997 lines of code. Writing the tests to cover the implementation and hammer out bugs in it was enough effort that I decided to get myself sorted out with Visual Studio Code, as it has considerably better source introspection than VIM 😛

Then I updated the project I've been tracking here to use this command-line parser (I had a particular switch in mind when I started this), and reworked its entrypoint method. It was hosting an implementation class for testing purposes, but the next stage of testing, when I get there, is going to be of the service I'm implementing itself.

GitHub: DeltaQ.CommandLineParser

Stats:

  • Current line count: 4,046
    • Plus 1,997 lines in a subsidiary project. (I won't report this every time.)
  • Passing tests: 42, usually
  • Unreliable tests: 1
  • Tested classes: 11
  • Untested classes: 3
  • TODOs: 8

Sunday, 12 May 2024

DeltaQ.RTB - Update 4

Stats:

  • Current line count: 3,983
  • Passing tests: 42, usually
  • Unreliable tests: 1
  • Tested classes: 11
  • Untested classes: 3
  • TODOs: 5

This is about the one-week mark 🙂 Coming along! Well, I said that on the second day too. So now it's coming-er along-er. 🙂

DeltaQ.RTB - Update 3

Stats:

  • Current line count: 3,762
  • Passing tests: 39, usually
  • Unreliable tests: 1
  • Tested classes: 9
  • Untested classes: 5
  • TODOs: 5

Now uses dependency inversion (using Autofac).

Saturday, 11 May 2024

DeltaQ.RTB - Update 2

Just some stats :-)

  • Current line count: 3,203
  • Passing tests: 25, usually
  • Unreliable tests: 1
  • Tested classes: 7
  • Untested classes: 7
  • TODOs: 5

Thursday, 9 May 2024

DeltaQ.RTB - Update 1

Project maturing 🙂

  • I disabled implicit usings. I think they are silly, personally.
  • I renamed the project file and put it into a subdirectory.
  • I put a solution file in the parent directory.
  • I created a subdirectory for a ".Tests" project and put stubs in it for all of the classes needing testing.
  • I turned some of the low-hanging fruit stubs into actual test classes, and promptly found and fixed a bug.

Current line count: 2,108

Monday, 6 May 2024

DeltaQ.RTB

I started writing a program yesterday. I've now written about 870 lines of code. It's coming along. :-)

So far:

  • It has code to access Linux's fanotify kernel facility, including a graceful shutdown mechanism that sends the polling thread SIGINT to interrupt its blocking read operation.
  • It contains a (not yet well-tested) wrapper of the lsof utility, using its -F option to get machine-parseable output.
  • It contains code to enumerate mount points using the setmntent/getmntent/endmntent API.
  • It has the start of a wrapper of the ZFS utility, including a basic structure for scoped objects representing snapshots (so I can use 'using' on them and when they go out of scope, the snapshot is automatically removed).
  • It has a STRATEGY document writing up (in English) the long-term strategy I'm working toward. The STRATEGY document isn't counted in the line count.
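As an illustration of why lsof's -F output is convenient: each field arrives on its own line, prefixed by a one-character identifier (p for process ID, c for command name, f for file descriptor, n for file name). A p line starts a new process and an f line starts a new file within it. The following Python sketch parses that shape, assuming an invocation along the lines of `lsof -F pcfn`; the function name and the dictionary layout are invented here, and the project's actual wrapper is C#.

```python
from typing import Dict, List, Optional


def parse_lsof_fields(output: str) -> List[Dict[str, object]]:
    """Turn `lsof -F` style output into one dict per open file."""
    files: List[Dict[str, object]] = []
    pid: Optional[int] = None
    command: Optional[str] = None
    current: Optional[Dict[str, object]] = None
    for line in output.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        if key == "p":      # new process set
            pid, command, current = int(value), None, None
        elif key == "c":    # command name of the current process
            command = value
        elif key == "f":    # new file set within the current process
            current = {"pid": pid, "command": command, "fd": value, "name": None}
            files.append(current)
        elif key == "n" and current is not None:
            current["name"] = value
    return files
```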

It builds without warnings with dotnet build and runs with dotnet run, successfully getting notifications of file accesses.

The source code is in GitHub.

Programming is good. Programming is life. :-)