Sunday, 5 April 2026

QBX: Cross-platform SHELL

It's been a while since I posted a QBX update. What I've been working on is the SHELL statement. In QuickBASIC, if you enter SHELL "command", then it pops down to the underlying command interpreter (where you enter commands when you're not running some other program -- recall that DOS doesn't have a GUI of its own and fundamentally works by typing in commands at a text prompt), runs the command you specified allowing its output to go to the screen buffer, and then returns to your QuickBASIC code. If you run SHELL with no argument, then it runs a copy of COMMAND.COM and gives you a prompt right there inside your running QuickBASIC program. When you type 'exit', it returns back to QuickBASIC and picks up execution with the next statement after SHELL.

I wanted to support this with QBX. Right off the bat, I had a decision to make: Do I try to emulate DOS itself, so that a program that does SHELL "dir" will get the same text as it would on DOS, or do I integrate with the host operating system, so that programs can actually run any program on the host? I chose the latter approach.

.NET has a class Process built in that lets you redirect the standard input, standard output and standard error of a process while it is being launched. But, this redirection is not terribly flexible. It's line-buffered, and it doesn't provide any means to capture formatting like colours or cursor movements. But, C# lets you directly call into the operating system, and using that, I was able to create my own code to run a child process using features that the Process class doesn't encapsulate.

Starting in a 2018 version of Windows 10, Microsoft added a new function to the operating system. This function aims to bring some parity between Windows and other operating systems for consoles. To understand what this really means, it's important to understand a fundamental difference in how Windows handles consoles compared to other systems. Other systems have a legacy that dates back to when computers were in a different room and you sat at a terminal that consisted of a keyboard and a literal printer with paper running through it. This was called a "teletype", which became abbreviated TTY. With the initial TTYs, it was just a simple unformatted stream of bytes. Some byte values were given special meanings; a "carriage return" would instruct the printer to move its print head to the left, like when you slap the carriage to the left on a typewriter. A "line feed" would instruct the printer to advance the paper by the height of one line of text. These operated independently; you could give a line a bold effect, for instance, by sending a carriage return and then reprinting the same text over top of itself.

As things became more advanced, computers started to have monitors we'd recognize today instead of printers, and this introduced a new aspect to the codes a terminal might need to recognize. On a printer, it isn't really meaningful to say "put the cursor at this (x, y)" (though you could, to a limited extent, "backfeed" the paper), but this is a core function on a screen. Screens could also do things like make text blink. So, these new "attributes" of the text needed a way to be expressed. The Digital Equipment Corporation, as I understand it, pioneered the use of "escape codes" which used a particular sequence of bytes unlikely to appear in actual text to signal to the terminal that it was receiving a command instead of text to display. These eventually got picked up and standardized by ANSI and have since been the official standard for how "teletype" communication should format text (and also handle keyboard input for things that aren't plain text). On UNIX systems, when a program runs, it is given three "streams": one for reading input, one for writing output, and one for writing output that is about errors and isn't part of the normal program output. As a rough approximation, the input stream uses ANSI codes to tell you about nonprintable keys, and the program can use ANSI codes in the output streams to format the text or control the terminal.
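
To make the escape-code idea concrete, here is a minimal Python sketch (not QBX code, which is C#) of both directions: building a couple of common output sequences, and picking arrow keys out of an input stream. The function names and the small key table are made up for illustration; the byte sequences themselves are the usual ANSI/VT ones.

```python
ESC = "\x1b"

def move_cursor(row: int, col: int) -> str:
    """CSI row;col H: move the cursor to a 1-based (row, col)."""
    return f"{ESC}[{row};{col}H"

def set_blink(on: bool) -> str:
    """SGR 5 turns blinking on; SGR 25 turns it off."""
    return f"{ESC}[{5 if on else 25}m"

# Input side: sequences terminals commonly send for the arrow keys.
KEY_SEQUENCES = {
    f"{ESC}[A": "Up",
    f"{ESC}[B": "Down",
    f"{ESC}[C": "Right",
    f"{ESC}[D": "Left",
}

def decode_keys(data: str) -> list[str]:
    """Split an input stream into key names and plain characters."""
    keys = []
    i = 0
    while i < len(data):
        for seq, name in KEY_SEQUENCES.items():
            if data.startswith(seq, i):
                keys.append(name)
                i += len(seq)
                break
        else:
            # Not a known escape sequence: treat it as an ordinary character.
            keys.append(data[i])
            i += 1
    return keys
```

A real terminal emulator has to handle far more than this (partial sequences split across reads, parameters, private modes), but the principle is the same: commands and text share one byte stream.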

Windows took a very different approach. Instead of programs having these input and output streams using arcane byte sequences to encode things beyond plain text, they created an array of functions in a system library that code could call to give the operating system instructions related to "consoles". A console in Windows is an abstract representation of one of those terminals that you might hook up to a computer. It's a rectangular array of characters, with attribute information, and it maintains a queue for input. The input isn't encoded using escape sequences, but rather lays out all the information about each keypress in a structured way. Instead of having to recognize ESC [ A and say "Oh, yes, that's the Up Arrow key", a program simply looks for an INPUT_RECORD whose EventType is KEY_EVENT, and then checks its attached KeyEvent data to see whether its wVirtualKeyCode is equal to VK_UP. This makes the code far easier to write, to reason about, to get right, and it makes complex console applications much simpler to write. It also is less flexible and makes it quite difficult to transport this data from one place to another such as a remote computer -- something that is trivial with TTYs where everything is byte streams from the ground up, whether local or not.

Over the years, people have sometimes struggled with this dichotomy, because it means that writing a text mode program that works on both Windows and UNIX systems is a bit of a pain in the butt. In 2018, Microsoft finally addressed the problem. They created a new system for Windows called "ConPTY", a translation layer that sits between actual consoles and things interacting with them from the outside and restructures the input and output into byte streams with ANSI sequences. With ConPTY, you can now connect to a Windows console application from a remote computer using a TTY-style byte stream, and whatever that application does to the console using the system API calls will get translated into the ANSI sequences needed for it to show up in your xterm or whatever. With ConPTY, you can also write a console application that doesn't use the system library calls at all, instead updating its UI using the same ANSI sequences that UNIX applications use. It bridges that gap.

So, tying this back around to QBX, in my early experimentation, I looked into how to use ConPTY on Windows. It's not terribly difficult but it has some non-obvious things you have to take into account. One of those things is the key sequence Ctrl+Break. On Windows, this gets handled "out of band" -- it isn't part of the input sequence an application reads, but rather gets sent as a special kind of signal to handlers applications can register. These signals are broadcast to "process groups", so if you want to run a child program that can receive Ctrl+Break without it affecting your program, you start it in a new process group. It turns out, though, that if you use ConPTY, the act of creating the "pseudoconsole" with ConPTY creates its own process and you have no control over how it's created. It will always be attached to the current process group. If you then create your console process in that pseudoconsole in a new process group, then it can't receive Ctrl+Break events. The ConPTY process generates them in a different process group.

(Incidentally, this is actually largely parallel to how break events work on UNIX systems. On UNIX systems, the part of the operating system that handles TTY input to programs recognizes an "interrupt" character (by default Ctrl+C) and turns it into a "signal". The same terminology is even used; processes are in "process groups", and signals are broadcast to all processes in a process group. But I digress.)

Anyway, I did get ConPTY support working on Windows and tied it back into a SHELL command in QBX, so now you could run SHELL to execute a child process and get the output appearing in the QBX window as though it were an actual DOS SHELL. But, it had a few quirks. Not all keys map perfectly across the boundary (apparently there's a type of encoding called KiTTY that can be used that provides better fidelity, but I never figured that out). Not all kinds of updates map cleanly to ANSI codes through ConPTY. So, having written a terminal emulator that could recognize all these VT and ANSI sequences and turn them into QBX-style screen buffer updates, I ended up setting that aside and writing my own mini-ConPTY. Just like ConPTY, QBX creates a second process attached to the same Windows OS-level console, and when output is detected, it scans the characters and turns any changes into updates that it sends back to the main QBX process. It doesn't bother trying to use ANSI sequences, though; it just sends raw snapshots of the characters and attributes. So, it can't miss anything or be unable to encode some change. This works really well.

So, that gets us to having a really solid SHELL in Windows. But QBX runs on Linux and Mac OS X as well. How to do the same thing on these operating systems?

The first thing to contend with is that Windows is the odd one out with respect to how you even create a child process to begin with. Windows' approach is very straightforward and intuitive: There's a system library function called CreateProcess. You tell it what file to run. It runs it, and returns the information you need to interact with that new process, which is now running alongside your program independently. That's it. Nothing fancy needed. This is what the aforementioned Process class wraps, and it's easy enough to call it yourself using a feature of .NET called "Platform Invoke".

On UNIX systems, though, the way processes are created is really very different. Fundamentally different. It's just a completely different way of thinking about the problem from the ground up. Like with TTYs, it comes from UNIX having an absolutely ancient history with ties back to early systems that worked differently, and support being carried forward continuously. Some of the earliest mainframe systems (pre-UNIX) didn't have the ability to run more than one program at a time. They may have filled entire rooms with their hardware but in abilities they were quite similar to the little hobby systems of the 1980s like the TRS-80. If we jump in not quite at the very beginning, but a little ways along, you find systems that had a concept of files, and of programs being represented by the data in files. You'd issue a command to such a system to run a program by saying "EXEC file" (or something similar), and the system would load that file in and run it. The process of loading and running that file would _completely replace_ the program that let you type "EXEC file" in the first place. This made sense, because the idea of there being multiple programs running at the same time didn't exist at all yet. So, "exec replaces the process with something else" was the paradigm.

As mainframes grew in capabilities, they reached a point where running multiple programs concurrently (by rapidly switching between them and giving little "time slices" to them in turn) was a reality. But how do you actually run these programs? A lot of the time, when you want multiple programs running at the same time, it's actually the _same_ program, just working with different data in each process. So, they came up with a new operation: FORK. If you call the "fork" function, it does some stuff and then returns and your program keeps executing. But, that stuff that it did before returning means that "fork" gets to return a second time -- in a new process that is a clone of the one you started with. Everything copied, its own independent state, at exactly the same point in the program. A function that you call once and it returns twice.

It was observed that this was actually good enough. You didn't also need a function like Windows has to create a process running a different program, because you still had "exec". So, if program A wants to run program B alongside it, it can just "fork" -- now there are two As running -- and then one of them can "exec B". The exec completely replaces one of the As, and now there's one A running and one B running. Tada!
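
The fork-then-exec pattern described above can be sketched in a few lines of Python, whose os module exposes the same system calls. This is only an illustration for a UNIX-like system; run_alongside is a made-up helper name, and the exit code 127 for a failed exec is a shell convention borrowed here.

```python
import os

def run_alongside(argv: list[str]) -> int:
    """Run argv as a child process via fork + exec; return its exit code."""
    pid = os.fork()  # returns twice: 0 in the child, the child's PID in the parent
    if pid == 0:
        # Child: still a clone of the parent at this point. Replace it.
        try:
            os.execvp(argv[0], argv)  # on success this never returns
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: carries on independently, then waits for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print("child exited with", run_alongside(["echo", "hello from the child"]))
```

Between the fork and the exec there are briefly two copies of the same program running, which is exactly the "two As" moment.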

This fork/exec model is exactly how all programs have run on pretty much every full-featured operating system since, except Windows (and possibly Mac Classic?). When you run Linux today, or Mac OS X, or FreeBSD, or any of the other plethora of operating systems whose lineage ties back to UNIX, every time a program is run, it's actually a fork/exec happening under the hood. That means that if QBX wants to run child processes, it also has to fork/exec.

We loop back at this point to the Process class that's part of .NET. When you're running .NET code on Linux or OS X, the Process class is in fact using fork/exec to run the programs you're requesting. But, it has the same limitations. It can attach "pipes", as they're called, to the standard input, output and error of the process, and that's it. There's a bunch of management involved with running a child process that can operate as its own shell/independent terminal, and none of that is offered by or possible with the Process class.

So, back to the drawing board again. As with Windows, we can use Platform Invoke to make calls into the system, this time the fork and exec functions (exec has a bunch of variants and technically in this case the appropriate one is "execvp" specifically). In addition to this, we can use a function called openpty to set up a pseudo TTY for the process to run in. This is all straightforward code to write.

So I wrote it, and ... it didn't work. It crashed. Every single time. At first, it was because I had found the wrong definition of a structure in the system header files and a buffer was too small, so the call to openpty would corrupt the parent process. But, I hammered away at the bugs and still could not get it to work. I was pretty sure all the details were exactly right, and the call to fork in the parent process would return properly and tell me about the child process it had started, but on the child process side, nothing would happen.

After struggling with this for some time I eventually figured out how to use GDB, the native debugger, and configure it to follow the execution into the child process to see what was happening. What I eventually discovered is that while fork returns to the parent process with the child process ID successfully, the second return within the child process was not succeeding. The process crashed before control got back to the C# side. Something inside Platform Invoke couldn't handle the transition to a different process. Ultimately, I had to conclude that cheekily calling fork and exec directly from C# code was just not going to be possible.

What would be needed, then, was a "shared object" of non-.NET code, written in a language like C (a "bare metal" language), to call fork and exec on behalf of the .NET code. The problem with such a library, though, is that unlike .NET code which runs anywhere .NET runs, a native library contains machine code that can only work on the exact CPU and operating system it was made for. So, if I make such a library for Linux and it works on Linux, it's not going to work on OS X, and vice versa.

In the .NET world, this is not an unknown problem, and it is addressed by something called a Dependency Manifest. With a Dependency Manifest, you can put different versions of a library in folders separated by what kind of machine they're for, and then when you use Platform Invoke, it picks from the available implementations automatically. The popular package system NuGet ties into this functionality; you can make a NuGet package that packages multiple versions of a native library, and when it is used by another project, the build output is set up properly with an appropriate Dependency Manifest ("deps.json" file) that tells it how to find the right implementation.

So, I took a step back from QBX and created a brand new project. DQD.ForkPTY is a NuGet package that you can use from any .NET project and which lets you do the fork/exec dance in conjunction with openpty (technically using a common combined wrapper operating systems provide called "forkpty"), and it's presented with a nice, simple .NET interface at the top end.
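
For a feel of what forkpty does, here is a rough Python sketch using the standard library's pty.fork(), which wraps the same idea (a new pseudo-terminal plus a fork, with the child attached to the terminal). This is not the DQD.ForkPTY interface; run_in_pty is a hypothetical helper and the code assumes a UNIX-like system.

```python
import os
import pty

def run_in_pty(argv: list[str]) -> bytes:
    """Run argv attached to a fresh pseudo-terminal; return everything it wrote."""
    pid, master_fd = pty.fork()  # child gets the pty as its controlling terminal
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # replace the child with the requested program
        finally:
            os._exit(127)  # only reached if exec failed
    output = bytearray()
    while True:
        try:
            chunk = os.read(master_fd, 4096)
        except OSError:  # Linux reports an I/O error once the child side is closed
            break
        if not chunk:  # other systems may report end-of-file instead
            break
        output += chunk
    os.close(master_fd)
    os.waitpid(pid, 0)
    return bytes(output)
```

Because the child believes it is talking to a terminal, the bytes that come back are what a terminal would receive, including any escape sequences and the terminal's own line-ending translation.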

I now have DQD.ForkPTY working on Linux, FreeBSD and OS X, and using DQD.ForkPTY, QBX is able to offer SHELL functionality not only on Windows, using my custom proxy mechanism, but also on Linux and OS X. The terminal emulator that I wrote for ConPTY actually worked pretty much out of the box for Linux. When I tried it on OS X, it immediately ran into things that I had forgotten to implement -- functions that simply weren't being used by the things I had run previously on Windows with ConPTY and on Linux. But, 200-some lines later, with all those missing functions implemented, SHELL worked perfectly in QBX on Mac OS X!

Here's a video showing it in action:

(You may notice that I kill the process to get out of GORILLAX.BAS -- this is because of a key mapping issue that's on my TODO list. Ctrl+Break isn't working on OS X right now. Will figure that out though 🙂)

DQD.ForkPTY can be found here, for those interested:

Wednesday, 18 March 2026

QBX: Progress Update

In the course of Discord updates about QBX, it came to my attention that it had been about a month and a half since I'd made any update in that forum. Development had definitely not been stagnant, so I decided to review changes since the previous update there -- about 6 weeks prior. Here's what happened with QBX over the course of the past 6 weeks:
  • Over 450 commits.
  • The completion of INT 21h emulation, along with a comprehensive test suite that verifies the interrupt interface layer on top of the DOS kernel.
  • Maturation of the DOS kernel's file I/O layer, including, to some degree, the Short File Names support for non-Windows platforms.
  • File I/O support in the QuickBASIC parser & interpreter.
  • Lots of IDE bugs fixed. Lots still remain, I'm sure (as well as gaps).
  • Improved number formatting & parsing.
  • Conversion of numbers to/from strings (CVI, MKL$, etc.), including to/from MBF binary representation for floats (CVSMBF, etc.).
  • Help file parsing & display (I don't distribute the HLP files, but if you give it the files from the QB71 distribution, it'll display them.)
  • On that note, contextual help for menus and implemented dialogs, and F1 to jump to the topic for the word under the cursor.
  • Partial support for having files loaded that aren't part of the project source (such as .TXT files).
  • Event registration & dispatch.
  • Proper handling of error handlers and event handlers in cross-module situations.
  • New core dialogs, such as the F2 "SUBs" dialog.
  • One small but key detail in line parsing/formatting: End-of-line comments stay at the column they're at unless the reformatted code bumps them to the right.
  • Support for "pinning" variables to memory locations (VARSEG/VARPTR, SSEG/SADD) so that buffers can be passed to interrupts, etc.
  • BLOAD/BSAVE support.
  • COMMON blocks for sharing variables between modules in a multi-module program.
  • Proper segregation of namespaces between modules, including allowing SUBs and FUNCTIONs to be declared with differing but semantically-identical TYPEs.
  • VARPTR$() for "X" PLAY and DRAW commands.
  • VGA ROM font at segment F000.
  • Starting SUBs and FUNCTIONs by typing the opening line.
  • STATIC variables & SUBs.
  • Identifier canonicalization.

So, you know. A few things. :-)

Tuesday, 10 March 2026

QBX: Help Files

This makes me really happy 🙂

https://youtu.be/GBkzloes26U

QuickBASIC comes with help files that are encoded in a proprietary binary format (with two layers of compression, no less!). Fortunately, someone out there put in the work reverse-engineering the format and wrote a document accurately describing how to decode and interpret the bytes. The document does an excellent job describing what could otherwise be a nightmare of complex binary formats.

The QuickHelp format employs two layers of compression. First, the text is compressed using a combination of tokenization and run-length encoding.

The tokenization involves identifying words or phrases that are repeated and adding them to a table, after which every instance of them in the text can be replaced with a reference to that table. Run-length encoding is much simpler: If you see the same byte multiple times in a row, just encode how many times it was repeated. A horizontal rule of 78 horizontal line characters can then just be "repeat this character 78 times" rather than literally 78 characters directly.
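
As an illustration of the run-length idea only, here is a small Python sketch. The (count, byte) pair layout is an assumption made for the example and is not the actual QuickHelp encoding.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)
```

With this layout, a rule of 78 copies of the Code Page 437 horizontal line character (0xC4) shrinks from 78 bytes to 2. Text with few repeats would grow instead, which is one reason real formats tend to mark runs explicitly rather than encode every byte as a pair.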

Then, after the tokenization pass, the resulting byte stream is compressed using Huffman compression. Huffman compression is based on a simple idea: Instead of using a rigid scheme of 8 bits for every byte, use fewer bits for bytes that show up more commonly, at the expense of rare bytes which then take longer than 8 bits. As long as there is a noticeable bias to some byte values, it can quite effectively compress things. As with many compression algorithms, Huffman compression requires you to treat a file that is really a stream of bytes as a stream of bits, automatically transitioning from one byte to the next as needed.
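
Here is a compact Python sketch of generic Huffman coding, to show the "fewer bits for common bytes" idea. It is not the QuickHelp decoder: the tree is built from the data itself, and bits are represented as a string of "0" and "1" characters for readability, where a real implementation would pack them into bytes.

```python
import heapq
from collections import Counter

def build_codes(data: bytes) -> dict[int, str]:
    """Map each byte value to a bit string; common bytes get shorter strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate case: only one distinct byte
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, {byte: code so far})
    heap = [(w, i, {b: ""}) for i, (b, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {b: "0" + c for b, c in left.items()}
        merged.update({b: "1" + c for b, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(data: bytes, codes: dict[int, str]) -> str:
    return "".join(codes[b] for b in data)

def decode(bits: str, codes: dict[int, str]) -> bytes:
    """Walk the bit stream, emitting a byte each time a complete code is seen."""
    lookup = {code: b for b, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return bytes(out)
```

Decoding works without separators because no code is a prefix of another. It also shows why a single wrong bit is so destructive: everything after it is read at the wrong boundaries.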

To get back the original data, you have to undo these steps in reverse order, first the Huffman compression and then the keyword compression. If you get even one detail wrong, everything after that point will almost certainly be indecipherable noise. Fortunately, though, the documentation I found was precise and accurate enough that it was relatively straightforward to write the code and it works a treat. 😃

Once the encoding in the file is sorted out, you then have the semantic meaning of the data to worry about. In a QuickHelp database, as it's called, there's a list of Topics, and each Topic can be linked to by one or more Context Strings. Within the text of a Topic, each line stores its text as a series of spans, each with its own formatting, and then a series of links, each of which specifies a start & end character on the row and the Context String or Topic Index (index into the list of topics) to which to link.

The lookup of help topics for keywords is pretty simple. The context strings simply are the keyword. But, help topics providing contextual help for menus and dialogs are a bit less obvious. Reverse-engineering the mappings required some trial and error with the actual QuickBASIC, checking which help pages with what text appeared from each dialog. QBX only has a small subset of the dialogs anyway, but it was important that the help context strings be mapped correctly.

In this video, the program you're seeing is my QBX project, but the help data is coming from the actual BAS7QCK.HLP file from a QuickBASIC 7.1 installation.

Sunday, 8 March 2026

QBX: File I/O! More generally, DOS INT 21h!

All has been quiet on the QBX-tern front, as one might say, for a few weeks now. The reason is that I've been coding my butt off creating an INT 21h DOS API interrupt emulation layer and a test suite that exercises every single function. I finished that a few days ago, and now QBX implements file I/O on top of that:

Since QBX implements file I/O using the emulation layer, it should be possible for a program that mixes and matches QuickBASIC's file access abstraction with DOS interrupt calls to work.

Monday, 26 January 2026

QBX: Border Fill Algorithm

In paint programs, a common operation is Flood Fill. You pick a colour, click on a region, and that region is "flooded" with that colour. Every adjacent pixel of the same colour is replaced by your new colour.

In QuickBASIC, there is a statement PAINT which provides a related function, but it is different in a subtle but very important way: It doesn't paint as long as it sees pixels of the same colour, it paints up until a specified border. It doesn't matter what pixels it is replacing. This is a related operation called a Border Fill.
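
The difference between the two operations is easiest to see side by side. Below is a deliberately naive Python sketch on a grid of colour numbers. It is not QBX's algorithm (which works on spans), and it makes no claim to match every detail of PAINT; it only shows the two stopping rules.

```python
def flood_fill(grid, x, y, new):
    """Replace the connected region of pixels matching the start pixel's colour."""
    old = grid[y][x]
    if old == new:
        return
    stack = [(x, y)]
    while stack:
        x, y = stack.pop()
        if 0 <= y < len(grid) and 0 <= x < len(grid[0]) and grid[y][x] == old:
            grid[y][x] = new
            stack += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def border_fill(grid, x, y, new, border):
    """Paint outward from (x, y) over any colour, stopping only at `border`."""
    height, width = len(grid), len(grid[0])
    # Pixel colour can't tell us what has been visited, so track it separately.
    done = [[False] * width for _ in range(height)]
    stack = [(x, y)]
    while stack:
        x, y = stack.pop()
        if not (0 <= y < height and 0 <= x < width):
            continue
        if done[y][x] or grid[y][x] == border:
            continue
        grid[y][x] = new
        done[y][x] = True
        stack += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
```

In a flood fill, the colour of a pixel doubles as the "already handled" marker. A border fill loses that shortcut, since a pixel that already has the fill colour may still need to be passed through, so some separate record of processed pixels is required.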

It cannot be overstated: Border Fill is much trickier to implement than Flood Fill!

But, for QBX to faithfully execute QuickBASIC code, it needed a Border Fill algorithm that was reliable and fast.

The implementation of QBX's Border Fill uses an interesting data structure: It tracks a subset of a 2D area as a series of intervals using an Interval Set structure built on a B-Tree. The B-Tree can very quickly enumerate all elements whose keys are greater than or equal to a specified key. The intervals are placed into the tree using their right edges as the key. It is then possible to test whether a given coordinate is in the set or not by enumerating the intervals whose right edge is greater than or equal to the given coordinate. This immediately eliminates all intervals that come before it, and the enumeration will either find an interval that contains the point, or it will find an interval that comes after the point -- in which case all future intervals also come after the point. Fast algorithms for intersecting intervals are also possible.
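
The right-edge lookup can be sketched in Python for a single row of pixels. This version swaps the B-Tree for a sorted list with binary search, which gives the same "first key greater than or equal to x" query; the class and method names are made up for illustration, and integer coordinates with closed intervals are assumed.

```python
import bisect

class IntervalSet:
    """Disjoint closed intervals [left, right], kept sorted by right edge."""

    def __init__(self):
        self._rights = []  # sorted right edges (the keys)
        self._lefts = []   # _lefts[i] is the left edge paired with _rights[i]

    def contains(self, x: int) -> bool:
        # First interval whose right edge is >= x; everything before it ends too early.
        i = bisect.bisect_left(self._rights, x)
        return i < len(self._rights) and self._lefts[i] <= x

    def add(self, left: int, right: int) -> None:
        # First interval that could overlap or touch [left, right].
        i = bisect.bisect_left(self._rights, left - 1)
        j = i
        # Absorb every interval that overlaps or is directly adjacent.
        while j < len(self._rights) and self._lefts[j] <= right + 1:
            left = min(left, self._lefts[j])
            right = max(right, self._rights[j])
            j += 1
        self._rights[i:j] = [right]
        self._lefts[i:j] = [left]

    def intervals(self):
        return list(zip(self._lefts, self._rights))
```

Merging on insert means adjacent spans collapse into one entry automatically, so the set never holds two pieces of what is really a single run.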

The initial implementation tried very hard to maintain a traditional queue of spans needing to be processed, but I just couldn't get it perfect. There was always some edge case that would either fail to paint or enter an infinite loop, processing areas it had already processed. I came to the conclusion that this was because:

  • Spans in the queue to be processed don't get merged together if they're adjacent, and
  • When advancing to a new scan and trying to expand left or right, the expansions are queued as independent entries (because they have different propagation flags)

But, it suddenly occurred to me that the merging problem could be solved with the existing interval set implementation I was already using to track which parts were processed, and that once I did that, there were no propagation flags any more, which meant that the extension could simply be processed as part of the span it came from, rather than being queued independently.

So, I reworked it to do exactly that, and that solved all the problems.

Saturday, 17 January 2026

QBX: PC Speaker Emulation

I'm excited by this. 🙂

Getting closer to running NIBBLES. The latest advancement: PC speaker sound emulation.

You might think, looking at this, that it's not _really_ PC speaker sound emulation, it's QBASIC PLAY statement emulation, right?

Wrong 🙂

Behind the scenes, there's a simulated 8253 timer chip. Its Timer 2 configuration is linked to a simulated 8042 keyboard controller chip (well, this last one is a bit of a stretch, because it only cares about the two speaker control bits). (For performance reasons, it simulates frequencies and ticks instead of communicating actual raw ticks.)

The function in charge of playing a note as part of a PLAY statement does so in a manner that reconfigures the 8253 to generate the target frequency. The sound you're hearing is the raw square wave resulting from the simulated output from the timer chip. Woohoo!
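
As a rough picture of what "reconfigure the timer, get a square wave" means, here is a small Python sketch. It assumes the standard PC timer input clock of about 1.193182 MHz; the function names and the simple two-level sampling are illustrative and not taken from QBX.

```python
PIT_CLOCK_HZ = 1_193_182  # input clock of the PC's 8253/8254 timer

def divisor_for(frequency_hz: float) -> int:
    """Reload value to program into the timer for a target frequency."""
    return round(PIT_CLOCK_HZ / frequency_hz)

def square_wave(divisor: int, num_samples: int, sample_rate: int = 44_100) -> list[int]:
    """Sample the timer's square-wave output as +1/-1 values."""
    frequency = PIT_CLOCK_HZ / divisor  # the frequency the timer actually produces
    samples = []
    for n in range(num_samples):
        phase = (n * frequency / sample_rate) % 1.0  # position within the current cycle
        samples.append(1 if phase < 0.5 else -1)
    return samples
```

Because the divisor is an integer, the achieved pitch is only approximately the requested one: asking for 440 Hz gives a divisor of 2712 and an actual frequency just under 440 Hz.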

Wednesday, 7 January 2026

QBX: Now Runs Code

A small update: My QuickBASIC clone now runs code 🙂 At this point, only the specific, limited set of statements and functions actually used in FERN.BAS are supported. But, this proves the strategy.

Friday, 2 January 2026

Presenting: QBX

About 2 weeks ago, I got properly started on my QuickBASIC environment emulator. I call it QBX. It's now a bit over 21,000 lines of code, and it features:

  • A code model for the QuickBASIC programming language that can take a structured representation of a program and output it with canonical formatting.
  • A lexer and a parser that can read QuickBASIC code and build code models for it.
  • Automated testing of the lexer and parser.
  • A mostly-complete low-level emulator of the VGA chipset, the most popular graphics system for PCs in the late '80s.
  • Code Page 437 translation to match the emulated VGA fonts.
  • Keyboard handling that should do exactly what's needed to, eventually, be wired up to INKEY$, the QBASIC keyboard input function. This one is more involved than it looks on the surface. 🙂
  • Maybe 10% of the QuickBASIC IDE, showcased here.

The classic blue screen you see here is fully emulated. The VGA memory is, as in the real chipset, divided into 4 64KB planes. In the mode shown here, characters are read from plane 0 and attributes from plane 1, and within each character box, the font from which the actual dots are generated is read from plane 2. The emulation also supports CGA/EGA/VGA graphics modes, including the classic 640x480x16 and 320x200x256 VGA modes. The emulation is based on the actual VGA hardware registers and timing. The idea is that, eventually, when this thing can actually run programs, a program that directly accessed the VGA hardware should work in this too.
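
To give a flavour of how a text-mode cell turns into dots, here is a small Python sketch of two pieces: decoding an attribute byte, and fetching a character's glyph rows from the font plane. It assumes the conventional VGA layout (low nibble foreground, next three bits background, top bit blink, and 32 bytes reserved per glyph in plane 2); the function names are made up and this is not QBX's code.

```python
def decode_attribute(attr: int, blink_enabled: bool = True):
    """Split a text-mode attribute byte into (foreground, background, blink)."""
    foreground = attr & 0x0F
    if blink_enabled:
        background = (attr >> 4) & 0x07  # top bit is the blink flag
        blink = bool(attr & 0x80)
    else:
        background = (attr >> 4) & 0x0F  # top bit selects a bright background
        blink = False
    return foreground, background, blink

def glyph_rows(plane2: bytes, char_code: int, height: int = 16) -> bytes:
    """Return the bitmap rows for one character; each glyph occupies a 32-byte slot."""
    start = char_code * 32
    return bytes(plane2[start:start + height])
```

Under these assumptions, light grey text on a blue background corresponds to the attribute byte 0x17.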

The incomplete IDE here doesn't yet know how to run code, but it does parse code to the abstract code model and format it back out. You can see keywords being capitalized and expressions being spaced out here. The formatting should be a very close match to actual QuickBASIC, if not identical to it. (For instance, if I run the well-known game NIBBLES.BAS through it, the output it produces is byte-for-byte identical to the input file.) The editing experience is also as close as I've been able to make it and, as I find mistakes, will get better. Microsoft's DOS TUIs had interesting mechanics around selection and the clipboard, and those are also shown here.

The menu doesn't actually do anything yet, but it is complete and fully behaves the way the menu should.

I feel really good about this project and the way it's coming along. And, it was really awesome, having done all the development so far on a Windows machine, to pull the code down on a Linux machine, type "dotnet run", and have it start up immediately on a completely different operating system and do exactly what it's supposed to do 🙂