Pith: Getting it from here to there

My last post was all about EDI and its horrific intertwining of data structure with meaning. Suppose you are a business that integrates with another business via EDI. You've just created an EDI document describing some business object, such as an invoice, that your partner needs to see (almost certainly by means of translating your document from a less-arcane internal format -- there might be no help for you if your business uses EDI internally!). How do you get it to them?
While the EDI X12 standard is a product of private industry and costs many pennies to get ahold of, the people charged with figuring out ways to get data from one place to another took a different approach, one more in line with the atmosphere of open communication that characterizes the technological core of the Internet.
There are many standards out there (many, many standards) that define how the Internet works. Each of these standards is called an RFC, short for Request for Comments, and describes how one small piece works. (What does it mean to say "how the Internet works"? The Internet is a "network of networks". Within each network, it's a fair bet that, most of the time, the computers are all of the same type, and even where they're not, there's a local system administrator making sure every machine knows how to talk to every other machine. Two disparate networks, however, could be running very different systems. The whole purpose of the Internet is to provide a consistent set of protocols so that a computer in network A and a computer in network B have a common language that allows them to address each other and share information.)
To give an example, back when people were first inventing e-mails, an e-mail message consisted of a block of monospaced text. There was no formatting, no special meaning, no means to attach files. It was just a block of text. Within one mainframe system, users sending each other e-mail was handled by a piece of code on the server that had the special privilege to go into the recipient's files and add the text of the new message to the end of a "mailbox" file. How exactly the file was formatted wasn't particularly important, as long as the program for reading mail within that system could understand the files produced by the code for delivering mail.
Next thing you know, though, people at one site (a university, most likely) want to send messages to people at another site, and the two mainframes aren't compatible. Their mail programs don't write the same format, and even if they did, since they're different computers, it isn't possible for the mail delivery software on one to get its fingers into the files on the other. The two computers could speak to each other only by establishing connections. A connection was a lot like a file in that you could read from it and write to it, but it had one very significant difference. With a file, you can "rewind" or "fast-forward" -- you can jump to a particular point in the file and do your reading or writing there. With a network connection (really, with any stream, of which a connection is a particular type), you can write data, but it must come after the data you wrote just before it, and the other end will get it in just the same order. Similarly, you can read data, but it'll always come in exactly the order the other end sent it. So, you want to have a piece of software at site A talk to a piece of software at site B and deliver mail. Clearly, they must be speaking the same language.
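
To make the file-versus-stream difference concrete, here is a minimal Python sketch. It is only an illustration: an in-memory buffer stands in for a file, and a local socket pair stands in for the connection between two sites.

```python
import io
import socket

# A file-like object allows random access: jump ahead, then rewind.
f = io.BytesIO(b"first second")
f.seek(6)
tail = f.read()        # everything from offset 6 onward
f.seek(0)
head = f.read(5)       # back at the start again

# A connected socket pair stands in for a network connection.
sender, receiver = socket.socketpair()
reader = receiver.makefile("rb")
stream_can_seek = reader.seekable()   # a stream has no "rewind"

sender.sendall(b"first ")
sender.sendall(b"second")
sender.close()         # closing signals end-of-stream to the reader

received = reader.read()   # bytes arrive in the order they were written
reader.close()
receiver.close()
```

The reader never gets to choose where to start: it sees `first ` and then `second`, in that order, or nothing at all.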
Enter the RFC. In the early 1980s, some smart men got together and put some thought into exactly what pieces of information would need to be exchanged in what order, and they came up with a generic way for any site to present an e-mail message to another site. Once received, the other site could do with it what it wanted, such as appending it to a user's "mailbox" file. The key thing was that over the wire, there was a standard way to deliver the mail. It might have looked something like this:

MAIL FROM: <alice>
RCPT TO: <bob>
DATA
Hello Bob,
Want to go for lunch?
-Alice
.
QUIT

E-mail today is still sent using a protocol very much like this, called SMTP, or Simple Mail Transfer Protocol. It was first defined by RFC 821 and is currently specified by RFC 5321.
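
As a rough sketch of the client's side of that conversation, the Python function below produces the same simplified sequence of lines as the exchange above (the opening greeting and the server's replies are left out, just as they are there). The function name is made up for this illustration. It also shows one detail the example glosses over: since a lone "." ends the message, a body line that starts with "." gets an extra "." in front so it can't be mistaken for the terminator.

```python
def smtp_client_lines(sender, recipient, body_lines):
    """Return the lines a client would send, in order (hypothetical helper)."""
    lines = [
        f"MAIL FROM: <{sender}>",
        f"RCPT TO: <{recipient}>",
        "DATA",
    ]
    for line in body_lines:
        # "Dot-stuffing": protect body lines that begin with a dot.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")      # a lone dot marks the end of the message
    lines.append("QUIT")
    return lines


conversation = smtp_client_lines(
    "alice", "bob", ["Hello Bob,", "Want to go for lunch?", "-Alice"]
)
```

Because the connection is a stream, order is everything here: the envelope comes first, then the message, then the terminator.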
Now, you could imagine an argument taking place between two computer scientists at different universities. One might be telling the other one, "Our mailboxes store the date that a message was sent. It's very useful! You should do the same thing." But where to put this date? The other might reply, "That's all very well, but we've found it more useful to be able to send the same message to more than one person, for a group discussion. Our system stores a list of people involved in such a group discussion." Where to put this list?
A separate RFC describing an Internet Mail Format takes care of these types of details. Completely separate from how the e-mail is delivered from one place to another, it specifies how the e-mail is laid out. The SMTP protocol moves a block of bytes from one server to another, and the mail format says what exactly those bytes are.
The Internet Mail Format tells you not only how to indicate the date, but exactly what layout to use for it, and it tells you to indicate who the message is from and who it's for and exactly how to lay out their e-mail addresses, and it provides for a clear separation of these bits of "extra" information, which it calls headers, from the message itself, which it calls the body.
An SMTP delivery of a message in this format looks something like this:

MAIL FROM: <alice>
RCPT TO: <bob>
DATA
From: Alice MacBob <alice>
To: Bob MacAlice <bob>
Subject: lunch
Date: Fri, 26 Oct 2012 10:30:00 -0500

Hello Bob,
Want to go for lunch?
-Alice
.
QUIT
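
Here is a small Python sketch of the format half of that split, using the standard library's email package to lay out headers and body. The example.com addresses are placeholders invented for this sketch; the message above uses bare names.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.message import EmailMessage
from email.utils import format_datetime

# The timestamp from the example above, with its -0500 offset.
sent_at = datetime(2012, 10, 26, 10, 30, tzinfo=timezone(timedelta(hours=-5)))

msg = EmailMessage()
msg["From"] = "Alice MacBob <alice@example.com>"   # placeholder address
msg["To"] = "Bob MacAlice <bob@example.com>"       # placeholder address
msg["Subject"] = "lunch"
msg["Date"] = format_datetime(sent_at)             # the mail format's date layout
msg.set_content("Hello Bob,\nWant to go for lunch?\n-Alice\n")

# Headers and body are separated by the first blank line.
wire_text = msg.as_string()
header_block, _, body = wire_text.partition("\n\n")

# Parsing the text back recovers the same headers.
parsed = message_from_string(wire_text)
```

The library adds a few headers of its own (such as Content-Type), but the shape is the same: named headers, a blank line, then the message.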

This split between headers and body allows a fair degree of flexibility, but it doesn't give you the ability to attach a video file of a cat doing something hilariously stupid. To allow for this, an extension of the Internet Mail Format was created called Multipurpose Internet Mail Extensions, or MIME.
MIME was defined so that any valid Internet Mail Format message is also a valid MIME entity, but MIME allows you to do so much more. To start with, it allows you to define a message in multiple parts. These parts could be alternative versions of the same thing, such as one using HTML to provide nice formatting to the content of a message, and another with the same message stripped down to plain text for mail programs that don't support HTML, or it could be multiple documents, such as a message body and attachments associated with it.
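
Both uses of multiple parts can be sketched with Python's standard email package. The attachment bytes and the file name below are stand-ins invented for the example, not a real video.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "lunch"

# Two alternative versions of the same message: plain text and HTML.
msg.set_content("Want to go for lunch?\n")
msg.add_alternative("<p>Want to go for <b>lunch</b>?</p>", subtype="html")

# A separate document attached alongside the message (placeholder bytes).
msg.add_attachment(
    b"\x00\x01 not really a video",
    maintype="video",
    subtype="mp4",
    filename="cat.mp4",
)

# Walk the tree of parts, outermost first.
structure = [part.get_content_type() for part in msg.walk()]
```

The result is a tree: a multipart/mixed container holding a multipart/alternative (with the text/plain and text/html versions inside it) next to the video/mp4 attachment.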
So, we started out talking about the delivery of EDI from one place to another, and we ended up talking about how e-mails work. What could these two things possibly have in common? To misuse a metaphor, we've gone off on a tangent onto a curve, and that curve has come full circle. :-)
I mentioned earlier that a group of people was tasked with coming up with a standardized way of getting EDI data from one business server to another. The result of their work was a series of documents they called Applicability Statements, each published as an RFC: AS1, AS2 and AS3. (A later AS4 also exists, but it was published by OASIS rather than as an RFC.) These strangely-named documents all have one thing in common: They don't actually specify protocols or formats. Instead, they list other protocols and formats and describe how they can be applied to business-to-business communication. The first of those standards, AS1? Yes, it does specify business-to-business communication by e-mail.
Setting aside for the moment how crazy this really is, let's look at how it recommends you do this. First, here's a series of headings from within the AS1 specification:


3.0 Referenced RFCs and Their Contribution
3.1 RFC 821 SMTP [7]
3.2 RFC 822 Text Message Format [3]
3.3 RFC 1847 MIME Security Multiparts [6]
3.4 RFC 1892 Multipart/report [9]
3.5 RFC 1767 EDI Content [2]
3.6 RFC 2015, 3156, 2440 PGP/MIME [4]
3.7 RFC 2045, 2046, and 2049 MIME [1]
3.8 RFC 2298 Message Disposition Notification [5]
3.9 RFC 2633 and 2630 S/MIME Version 3 Message Specifications [8]

Yes, that is fourteen different RFCs upon which this standard is building.
The basic idea is that the EDI document gets treated in roughly the same way as that "Hello Bob, Want to go for lunch? -Alice" message earlier -- a sequence of bytes that gets wrapped up in the structure defined by the MIME RFCs. The resulting MIME entity is then delivered by SMTP to the remote server. Once delivered, the remote server can process it however it wants -- it might be spooled into a mailbox for another application to read, or the mail server might be customized for AS1 and pass it off directly to a handler on-the-spot.
In the integration I'm working on, we are fortunately not using AS1. The idea of involving a mail server in a critical business flow frankly gives me nightmares. Our integration uses Applicability Statement 2, which describes the official, standard means of delivering EDI documents via HTTP.
HTTP, as a protocol, is far better-suited to this type of activity. It has long had as one of its primary responsibilities moving machine-readable data from one process to another for the specific purpose of processing that data and producing a response.
An HTTP request is actually remarkably similar to a document encoded according to the rules of the Internet Mail Format -- similar enough as to arouse suspicions that it may have been intentionally designed to be similar. Here is a comparison, with the HTTP request first and the mail message second:

POST /handler HTTP/1.1
Host: servername
Content-Type: text/plain
Content-Length: 123
Accept: text/plain, */*

Hello Bob,
Want to go for lunch?
-Alice

From: Alice MacBob <alice>
To: Bob MacAlice <bob>
Subject: lunch
Date: Fri, 26 Oct 2012 10:30:00 -0500

Hello Bob,
Want to go for lunch?
-Alice
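
One way to see how close the two layouts are is to run both through the same parser. The Python sketch below feeds an HTTP request (minus its first line) and a mail message to the standard library's mail parser; Python's own http.client module parses HTTP headers the same way. Line endings are simplified to "\n" here, where real HTTP uses CRLF, and the helper name is made up for the illustration.

```python
from email.parser import Parser


def split_headers_and_body(text):
    """Parse 'Name: value' lines up to the first blank line; the rest is body."""
    message = Parser().parsestr(text)
    return dict(message.items()), message.get_payload()


BODY = "Hello Bob,\nWant to go for lunch?\n-Alice\n"

http_request = (
    "POST /handler HTTP/1.1\n"
    "Host: servername\n"
    "Content-Type: text/plain\n"
    "\n" + BODY
)
mail_message = (
    "From: Alice MacBob <alice>\n"
    "To: Bob MacAlice <bob>\n"
    "Subject: lunch\n"
    "\n" + BODY
)

# HTTP has one extra line up front (the request line); strip it first.
request_line, _, http_rest = http_request.partition("\n")
http_headers, http_body = split_headers_and_body(http_rest)
mail_headers, mail_body = split_headers_and_body(mail_message)
```

Apart from the request line, the mail parser has no trouble with the HTTP request: headers, a blank line, and a body.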

Now, here's where things get a bit weird. AS2 says, on the one hand, to use MIME, with its header/data structure, as the "packaging" for the EDI data, and, on the other hand, to use HTTP, with its header/data structure, as the "delivery method", but it says to treat them as the same thing!
When delivering an AS2 request, the MIME entity that represents the wrapped-up EDI document has a bunch of headers, a blank line, and then data. The HTTP request that does the actual delivery also has a bunch of headers, a blank line, and then data. The fully-formed AS2 request merges the MIME headers (such as Content-Type) into the HTTP headers (such as Host and Content-Length), so that the body of the MIME entity is the body of the HTTP request:

POST /path/to/request/handler HTTP/1.1
Host: servername
Content-Type: application/edi-x12
Content-Length: 123
AS2-From: Fabrikam
AS2-To: Contoso
AS2-Version: 1.0
Message-ID: 09A8D1403A2F4C87BF3FEC1120C4EFA3
Date: Fri, 26 Oct 2012 10:30:00 -0500

edi data here
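
The merge can be sketched in a few lines of Python. This is only an illustration of the idea, not a conforming AS2 client: the function name and its parameters are invented for the example, and the values are the placeholder ones from the request above.

```python
def build_as2_request(path, host, as2_headers, mime_headers, mime_body):
    """Hoist a MIME entity's headers into the HTTP header block (sketch).

    as2_headers and mime_headers are lists of (name, value) pairs;
    mime_body is the MIME entity's body as bytes.
    """
    header_lines = [f"POST {path} HTTP/1.1", f"Host: {host}"]
    # The MIME headers travel as ordinary HTTP headers...
    header_lines += [f"{name}: {value}" for name, value in mime_headers]
    header_lines += [f"{name}: {value}" for name, value in as2_headers]
    # ...and HTTP needs to know how long the shared body is.
    header_lines.append(f"Content-Length: {len(mime_body)}")
    head = "\r\n".join(header_lines).encode("ascii")
    # One blank line, then the MIME body doubles as the HTTP body.
    return head + b"\r\n\r\n" + mime_body


request = build_as2_request(
    "/path/to/request/handler",
    "servername",
    as2_headers=[("AS2-From", "Fabrikam"), ("AS2-To", "Contoso"), ("AS2-Version", "1.0")],
    mime_headers=[("Content-Type", "application/edi-x12")],
    mime_body=b"edi data here",
)
```

Nothing in the output marks where the MIME headers stop and the HTTP headers start, which is exactly what makes this awkward to take apart again on the other end.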

This is weird. It's also interesting to implement, because while web servers are very good at the request processing model, they aren't typically designed to give you full control over the request or response. In most cases, they handle the headers for you, since in most cases, only the body of the request and the body of the response really matter significantly to the application. Meanwhile, most components for working with MIME entities have no reason to suspect this split of the headers from the body data, so parsing the MIME entity being sent in the request and sending the response MIME entity back are both a bit tricky.
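
On the receiving side, one workaround is to glue the relevant headers back onto the body before handing it to a MIME parser. The Python sketch below assumes the web server has already given us the request headers as a dictionary and the body as bytes. Which headers count as MIME headers is an assumption made for this sketch (MIME-Version and the Content-* family, minus Content-Length), not something taken from the AS2 specification.

```python
from email import message_from_bytes


def rebuild_mime_entity(http_headers, body):
    """Re-attach MIME-relevant HTTP headers to the body and parse the result.

    Assumption for this sketch: the MIME-relevant headers are MIME-Version
    and the Content-* family, minus Content-Length, which belongs to HTTP.
    """
    lines = []
    for name, value in http_headers.items():
        lower = name.lower()
        is_mime = lower == "mime-version" or lower.startswith("content-")
        if is_mime and lower != "content-length":
            lines.append(f"{name}: {value}\r\n")
    raw = "".join(lines).encode("ascii") + b"\r\n" + body
    return message_from_bytes(raw)


entity = rebuild_mime_entity(
    {
        "Host": "servername",
        "Content-Type": "application/edi-x12",
        "Content-Length": "13",
        "AS2-From": "Fabrikam",
    },
    b"edi data here",
)
content_type = entity.get_content_type()
payload = entity.get_payload(decode=True)
```

With the entity rebuilt, an ordinary MIME library can take over. That matters most when the body is itself a multipart structure rather than a bare EDI document.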
Just for completeness, I'll mention that the AS3 standard takes delivery of business documents for on-demand processing into the realm of FTP. It models the delivery of a MIME-encoded EDI document around a file upload. E-mail may not be the best fit for on-demand request processing, but surely FTP uploads represent the polar opposite of what is desirable!
I think the message to take away from all of this is that the people out there making the standards that everyone ends up having to implement are not geniuses, and they're not focused on ways to simplify the implementation. They really should be, though, because the simpler the implementation is, the fewer bugs there will be, and the more reliable the software will be in the long run. :-)

Friday, 26 October 2012
