Copyright © 2004, Glenn Story
I could see this was happening when I returned from Japan and began looking for a new job. I ended up going to a company I had almost joined at the time I went to Calma—Tandem Computers.
Tandem had unique hardware and software. The essence of Tandem’s uniqueness is that it is fault-tolerant: any single hardware component can fail and the computer is guaranteed to keep running. It does this by having redundant components for everything: CPU, memory, I/O devices, power supplies. The operating system is designed to detect component failures and switch in the backup. In most cases, though, the backup has not been sitting idle; it has been doing its own processing.
The Tandem OS was all written in a higher-level language rather than assembly language. The language was Tandem-proprietary: TAL (Transaction Application Language). The name is misleading in several ways: It’s not necessarily related to using transactions, and it’s not only for writing application code.
The original Tandem computers were not based on microprocessors. Instead they had processor boards, which were microcoded machines. The main microcode was stored in a file on disk and loaded during system boot. That created a chicken-and-egg problem, since the instructions executed by the boot process were themselves microcode-based. The solution was to have a subset of the microcode stored in ROM. This provided enough machine instructions to boot the system and load the “real” microcode. However, the TAL compiler could generate machine instructions that weren’t in the ROM subset, so the boot code had to be written in assembly language to ensure that it used only the op-codes supported by the ROM version of the microcode. Every other system component was written in TAL.
Tandem systems used what Jim Gray referred to as a “shared nothing” architecture. He used this phrase primarily to distinguish the Tandem architecture from the shared-memory systems that are very common. By not sharing memory between processors, Tandem eliminated a possible “single point of failure,” i.e., a component whose failure could bring the whole system down.
Tandem systems included up to sixteen processors, each with its own memory and I/O bus. There were two inter-processor buses (IPBs) that connected to each processor for the purpose of allowing communication between them.
If a process on one processor needed to communicate with an I/O device (such as a disk) that was physically attached to a different processor, it would use the IPB to send the request to the other processor.
There is a software component, called the Message System, that provides the support for communication between processes. If the two processes are in the same processor, the Message System uses shared memory to transfer the message. If, on the other hand, the two processes were on separate processors, the Message System used the IPB. (The careful reader will note a mixture of verb tenses in this paragraph. At the time of this writing, the Tandem system still exists; thus the present tense. However the IPB has been replaced by something called ServerNet, which is why I write about it in the past tense.)
The Message System can also send messages to other systems. Systems that are relatively close together can be linked via a fiber-optic ring called FOX; systems that are farther apart can be connected via LAN or WAN networks. Thus I/O requests, such as file reads, can be extended across a customer’s entire network with very little additional system code and no additional application code, since the Message System takes on most of the burden of determining how to deliver a message to its destination process.
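To make that routing decision concrete, here is a toy sketch in C (not TAL and not actual Tandem code; the process_location structure and choose_path function are invented purely for illustration) of the three delivery choices just described:

#include <stdio.h>

/* Where a process lives: which system, and which CPU within that system.
   (Invented for this sketch; not an actual Tandem data structure.) */
typedef struct { int system_id; int cpu_id; } process_location;

/* Pick the transport the Message System would use for a message. */
static const char *choose_path(process_location from, process_location to)
{
    if (from.system_id != to.system_id)
        return "network link (FOX ring, LAN, or WAN)";
    if (from.cpu_id != to.cpu_id)
        return "inter-processor bus (later ServerNet)";
    return "shared memory within the CPU";
}

int main(void)
{
    process_location requester = { 1, 2 };
    process_location same_cpu  = { 1, 2 };
    process_location other_cpu = { 1, 5 };
    process_location remote    = { 7, 0 };

    printf("%s\n", choose_path(requester, same_cpu));   /* shared memory */
    printf("%s\n", choose_path(requester, other_cpu));  /* IPB           */
    printf("%s\n", choose_path(requester, remote));     /* network       */
    return 0;
}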
In order for an application to be fault-tolerant (at least originally) it was written to run as a process pair. This means that the program actually runs in two processes, each on a separate processor. One process of the pair is designated the primary, and the other the backup. If the CPU or memory for the primary process fails, the backup process takes over. In order for the backup to be kept apprised of the current state of processing in the primary process, the primary periodically sends checkpoint messages to the backup process.
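The following is a minimal sketch of that checkpointing pattern, in plain C rather than TAL, with an in-memory stand-in for the backup process instead of a real checkpoint message sent to another CPU; the structure and function names are invented for illustration:

#include <stdio.h>
#include <string.h>

/* State the primary must preserve so the backup can take over. */
typedef struct {
    long next_sequence_number;
    char last_request[64];
} pair_state;

static pair_state primary_state;
static pair_state backup_state;   /* in reality this lives in another CPU */

/* Stand-in for sending a checkpoint message over the inter-processor bus. */
static void checkpoint_to_backup(const pair_state *s)
{
    backup_state = *s;            /* the backup applies the checkpoint */
}

static void primary_handle_request(const char *request)
{
    strncpy(primary_state.last_request, request,
            sizeof primary_state.last_request - 1);
    primary_state.next_sequence_number++;

    /* Checkpoint before replying, so the backup never lags behind
       anything the requester has already been told. */
    checkpoint_to_backup(&primary_state);
    printf("primary: completed '%s' (seq %ld)\n",
           request, primary_state.next_sequence_number);
}

static void backup_takeover(void)
{
    printf("backup: taking over at seq %ld, last request '%s'\n",
           backup_state.next_sequence_number, backup_state.last_request);
}

int main(void)
{
    primary_handle_request("OPEN $DATA.FILE1");
    primary_handle_request("WRITE record 42");
    backup_takeover();            /* simulate a failure of the primary's CPU */
    return 0;
}

Even in this toy version the hard part is visible: deciding exactly where to place the checkpoint call so that the backup’s picture of the world is never inconsistent with what the outside world has already seen.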
Programs using this model of fault-tolerant programming, with primary and backup processes in a process pair, were difficult to write. The concept of sending checkpoints from the primary to the backup was not difficult to grasp or implement, but deciding when to take these checkpoints was hard to get right. Worse, testing the application to make sure it failed over correctly in all circumstances was very difficult. Tandem was very successful despite this and soon had a number of imitators, all of whom zeroed in on the problem of having to program process pairs.

So Tandem came up with a simplified mechanism: an application server had only to (1) be stateless and (2) be transaction-protected. If a process or its processor failed, the transaction would be rolled back, and since there was no state to lose, a new incarnation of the process could be created on any CPU. The transactions were monitored by TMF (Transaction Monitoring Facility). Although Tandem did not invent the concept of transactions, they certainly developed it, including the concepts of the distributed transaction and the two-phase commit. The application no longer had to be written as a process pair, but the TMF processes did; this shifted the burden of writing and testing process pairs from the application to Tandem-supplied code.

Tandem components used what we called a “requestor server” model. This model is virtually identical to the “client server” model in widespread use today.
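As a contrast with the process-pair sketch above, here is the simplified model in the same illustrative style: plain C, with stand-in functions for the transaction monitor (the real TMF procedure names and signatures differ):

#include <stdio.h>

/* Stand-ins for the transaction monitor; invented for this sketch. */
static void begin_transaction(void)  { puts("TMF: begin"); }
static void commit_transaction(void) { puts("TMF: commit"); }

/* A stateless, transaction-protected server: no data survives
   from one request to the next. */
static void handle_one_request(const char *request)
{
    begin_transaction();
    printf("server: updating the database for '%s'\n", request);
    commit_transaction();
    /* Had the process or its CPU failed before the commit, TMF would
       have rolled the work back, and a fresh incarnation of this
       server, started on any CPU, could safely reprocess the request. */
}

int main(void)
{
    handle_one_request("debit account 1001");
    handle_one_request("credit account 2002");
    return 0;
}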
In addition to doing pioneering work in distributed processing, Tandem was also an early developer of distributed data. Tandem files could be partitioned and distributed across a network.
Most of my time at Tandem was spent working on file systems (we had more than one). In most operating systems, the file system uses some kind of programming interface to communicate with I/O drivers that in turn control the hardware. For the reasons described earlier, the Tandem file systems instead create message packets and send them, via the Message System, to an I/O process (which may be on a different processor).
The Message System is a privileged system, meaning that the callers of Message-System functions must themselves be privileged. If a user process wants to communicate with another process, it uses the file system to do so. An application program can open another process in a manner almost identical to opening a disk file or other I/O device.
The file-system support for process-to-process communication consisted of two parts: a requester (client) part and a server part. Tandem processes (or process pairs) could be given names. If a requester wanted to send a message to another process, it would OPEN that process using its name. In the case of a process pair, the name would route the message to the primary; if the primary failed, the name would be inherited by the backup. The requester would use a file-system WRITE to send the message to the server and then a READ to get the reply, or, more frequently, a WRITEREAD, which combined the two operations.
The server needed a file from which it could read incoming requests. This file had a special name: $RECEIVE.
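Here is a sketch of that requester/server exchange in plain C. Everything in it (pair_open, writeread, serve_one_request, and the global buffers standing in for the Message System) is invented for illustration; the real Guardian file-system procedures have different names and calling conventions:

#include <stdio.h>
#include <string.h>

static char pending_request[64];   /* stand-in for the Message System */
static char pending_reply[64];

static void serve_one_request(void);

/* --- requester side --- */
static int pair_open(const char *process_name)
{
    printf("requester: OPEN %s\n", process_name);
    return 1;                               /* pretend file number */
}

static void writeread(int fnum, const char *request, char *reply, size_t len)
{
    printf("requester: WRITEREAD on file %d\n", fnum);
    strncpy(pending_request, request, sizeof pending_request - 1);
    pending_request[sizeof pending_request - 1] = '\0';
    serve_one_request();                    /* "delivery" to the server */
    strncpy(reply, pending_reply, len - 1);
    reply[len - 1] = '\0';
}

/* --- server side: conceptually, read the next request from $RECEIVE
   and reply to it --- */
static void serve_one_request(void)
{
    printf("server: read from $RECEIVE: '%s'\n", pending_request);
    strcpy(pending_reply, "OK");
}

int main(void)
{
    char reply[64];
    int server = pair_open("$SRV");         /* open the server by its process name */
    writeread(server, "look up customer 42", reply, sizeof reply);
    printf("requester: reply was '%s'\n", reply);
    return 0;
}

If the server were a process pair, the name $SRV would route the request to whichever half was currently the primary, which is exactly the point of opening by name.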
Of course, the most frequent thing one wants to do with the file system is access disk files. As a consequence, I worked closely with the developers of the Disk Process (DP2), since DP2 was a major consumer of the request messages generated by the file system.
When I started working for Tandem, they had only one file system. Then they started building a SQL system, and that project invented its own file system to support the specific needs of SQL. Tandem created the first SQL implementation whose performance made it commercially viable; even IBM’s DB2 was originally too slow to be useful for “real world” applications. I never worked on the SQL file system.
The first major project I worked on in the File System group was nothing less than a full rewrite of the file system as part of a project called “Exceed”. Exceed was started because the original Tandem OS allowed a maximum of 255 processes per processor. Like a lot of limits built into software, this number seemed huge when the OS was created but eventually became a real constraint. So the Exceed project was created to remove that limit, along with any other arbitrary limitations we discovered in the OS code.
I ended up writing the code to create files, rename them, and purge them. To this day if someone creates a file on a Tandem system they’re using code I wrote.
When I started at Tandem I worked on a Tandem 6530 terminal. This terminal operated in both a block mode, like the IBM 3270, and a character mode, like most other terminals of the time. I could also connect to a Tandem system over a dial-up modem, using software supplied by Tandem to emulate the 6530 on my IBM PC clone. This was the first time I could seriously do work from home. But when I needed to use a lab machine to test operating-system changes, I generally had to be physically present in the lab, so I couldn’t do such work from my office, let alone from home.
I was generally in two-person offices at Tandem. When I first started there was a shortage of office space so I was in a three-person office. My office-mate resisted adding a third person, but it didn’t really bother me: I had just come from an environment where we were all in cubicles, and in Japan it had been open office space without even dividers.
The Tandem operating system (originally known as “Guardian”) was a proprietary operating system, designed specifically for Tandem hardware, and using its own set of APIs and user and operator interfaces, such as the command language TACL (Tandem Advanced Command Language), written by Roland Findlay, whom I later worked with and became friends with.
Just before I started working at Tandem I had started working with UNIX. While I was at Tandem, UNIX became increasingly important, and so Tandem decided to mount a project to add UNIX APIs to its OS. Among other things, this required a new file system that contained UNIX interfaces and semantics. This caused the Tandem OS to be renamed to “NonStop Kernel” with two “personalities”: “Guardian” (the traditional proprietary interfaces) and “OSS” (Open System Services) with UNIX interfaces.
I worked on the OSS project, with my first assignment being to write the File Manager, a once-per-CPU process that received requests from the Disk Process and the Memory Manager. The original Guardian file system was virtually all client code, and each process’s file-system operations were isolated from those of other processes. UNIX file-system semantics, however, require some sharing of information between parent and child processes. We also decided, for performance reasons, to have client-side caching, which created additional requirements for shared data between processes on a given CPU. This in turn created a need for communication from DP2 and the Memory Manager to request changes to these CPU-wide data structures; it was the File Manager’s role to handle those communications.
My second assignment in the OSS file-system group was to work on an NFS (Network File System) server. Three of us on that project received a patent for our work.
United States Patent 6,081,807
Story, et al.
June 27, 2000
Method and apparatus for interfacing with a stateless network file system server
Abstract
A method and apparatus for interfacing with a stateless NFS (Network File System) server. A pseudo-open state is created for a file when a request from a network client for accessing the file is received in a network server. The term pseudo-open data relates to a set of data that is kept in a network server. The pseudo-open describes the state of a file being currently accessed via an NFS server in the network server. The pseudo-open data differs from normal file state data in that it can be created or recreated at will, thus preserving the stateless functionality of the NFS server. Thus, if a request is received at any time and there is no pseudo-open state established for the file, the pseudo-open state will be established or reestablished at that time. If, on the other hand, a request is received for which a pseudo-open state already exists, the overhead of creating the pseudo-open state is avoided, and the existing data is used. The pseudo-open state is stored in a file-system data structure called VNODE. Each active file has an associated VNODE. The pseudo-open state of a file can be then closed. The state of the file can be changed to a higher or lower level of access privilege via open-promotion or open-demotion operations, respectively. Open-demotion refers to the change of a file state to a lower level of access privilege.
Inventors: Story, Glenn (Palo Alto, CA); Sodhi, Amardeep S. (Fremont, CA); Tom, Gary (San Jose, CA); Yee, Mon For (San Francisco, CA)
Assignee: Compaq Computer Corporation (Cupertino, CA)
Appl. No.: 874426
Filed: June 13, 1997
Essentially we invented the “pseudo-open,” a concept that bridges the gap between the stateless world of NFS and the stateful world of file-system access expected by the Disk Process.
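A rough sketch of the idea, in C with invented structure and function names (not the actual NonStop kernel code), looks like this: each NFS request either reuses the pseudo-open state already hanging off the file’s VNODE or creates it on the fly, which is what keeps the server stateless from NFS’s point of view.

#include <stdio.h>
#include <stdlib.h>

/* The per-open state we want to avoid recreating on every request. */
typedef struct {
    int access_level;        /* e.g. 1 = read-only, 2 = read-write */
    int disk_process_open;   /* stand-in for the real open kept with DP2 */
} pseudo_open;

/* One VNODE per active file; the pseudo-open hangs off it. */
typedef struct {
    const char  *file_name;
    pseudo_open *popen;      /* NULL until some request touches the file */
} vnode;

static pseudo_open *get_pseudo_open(vnode *v, int wanted_access)
{
    if (v->popen == NULL) {
        /* No state yet: (re)create it at will, as a stateless server must. */
        v->popen = malloc(sizeof *v->popen);
        v->popen->access_level = wanted_access;
        v->popen->disk_process_open = 1;
        printf("created pseudo-open for %s\n", v->file_name);
    } else if (v->popen->access_level < wanted_access) {
        /* Open promotion: raise the access level of the existing state. */
        v->popen->access_level = wanted_access;
        printf("promoted pseudo-open for %s\n", v->file_name);
    }
    return v->popen;         /* existing state reused; creation cost avoided */
}

int main(void)
{
    vnode v = { "/users/glenn/report", NULL };
    get_pseudo_open(&v, 1);  /* first read request creates the state */
    get_pseudo_open(&v, 1);  /* a second read reuses it              */
    get_pseudo_open(&v, 2);  /* a write request promotes it          */
    free(v.popen);
    return 0;
}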
On the OSS project, I traded my 6530 terminal for a Unix workstation. At first I had a MIPS workstation; this was later traded for a Sun workstation. These gave me a GUI environment, and there was a Sun 6530 emulator for use in connecting to the Tandem system. But Tandem developed cross compilers that ran on Sun, so I only had to do testing on the Tandem hardware. By now we could do most of that from our office over the network. By extension I could work on a lab machine from home. I could even reboot the system from home. It would then drop off the network while it rebooted, and if it failed to boot, I was out of luck. That would require a trip to the lab.
At some point Tandem decided to port their code to Microsoft Windows NT. The goal was to have a clustered SQL system running on NT. I became part of that project, originally assigned to help port the file system. But the file-system port was fairly mechanical and easy, so they were pretty much done by the time I joined the project. I was then given the task of developing the administrative user interface for starting and stopping the system. I spent the rest of my time at Tandem working on “Broadway,” the administrative user-interface portion of the NT project.
Eventually the NT project was cancelled, the victim of changing corporate strategies. Broadway was the only portion that survived. It was redirected to provide GUI administrative interfaces for the Tandem NonStop Kernel operating system. I worked on this effort for a short time before I left Tandem (actually Compaq, which had purchased Tandem) to go to work for Sun Microsystems.
Of course, for the NT project, I used an NT workstation. I was by then running NT at home as well, and I could use a product called PC Anywhere that allowed me to see and interact with my work computer from home. There were no lab machines; my office computer was where I tested my code.
I should say a few words about Jimmy Treybig, the founder of Tandem and CEO for most of the time I was there. I first ran into Jimmy at the new-employee orientation. He was scheduled to give a welcome talk and was fiddling with and cursing a slide projector. I figured he was some AV guy. His Texas accent and word usage didn’t match my expectations of a company president. But, as an article in the San Jose Mercury once said of him, “He comes across like a country bumpkin, but he’s smarter than you’ll ever be.” He was smart enough to train himself well for starting his own company: He worked for Hewlett Packard for several years, where he learned how a tech company was run. He also learned the “HP way,” a management style that respects and takes care of its employees. He then went to work for Kleiner & Perkins, a venture-capital company, to learn how money was doled out to start-up companies.
The Tandem hardware had obvious similarities to HP computers of the time. The TAL programming language was based on an HP language, SPL. (The file system, however, was based on IBM’s VSAM.) Tandem’s original funding came from the very VC company that Jimmy had worked for. But most important, Jimmy brought with him the notion that a company should trust and take care of its employees. For that reason, more than any other, I loved working for Tandem.
The best example I can give of Jimmy’s personal caring for his employees is this: at one point a Tandem employee was diagnosed with bone cancer. The only thing that would save him was a bone-marrow transplant, but finding a matching donor was very unlikely. Jimmy had Tandem pay for the expensive screening blood test for any employee who wanted to take it. I volunteered.
Tandem was famous for its “beer busts,” parties every Friday afternoon where employees could have a few beers, eat some chips, and socialize.
When times got tough for Tandem and Jimmy resisted laying people off, he was eventually forced out. Tandem was acquired by Compaq, whose management team at the time was marked primarily by paranoia rather than trust. (Fortunately, the Compaq board of directors soon booted them out.) Then Compaq was bought by HP (ironic, considering Tandem’s roots), but by then the HP way was largely dead. (When asked about it in a newspaper interview, the HP CEO basically said “times have changed.” Sad but true.)
The HP acquisition happened after I left. Tandem had over 8,000 employees at its peak. Now the NonStop division of HP, all that’s left of Tandem, has perhaps 800 employees, and when a current project to convert to a new processor is finished, many of them will probably be let go. I mourn the loss of Tandem—it’s the only company I feel that way about.