
– BTI 8000

While BTI did well selling systems on the BTI 5000 to car dealerships across the country, management boldly decided to move into the super-minicomputer market. This was a multi-year effort, and as the revenue from the older systems peaked and started to decline, the pressure for the 8000 to succeed mounted.

BTI 8000 Timeline

(The following timeline was supplied by Ron Crandall)

While the BTI 3000, 4000, and 5000 were keeping BTI growing, a few people in both the hardware and software departments started thinking about a next generation design, one that broke from the HP CPU heritage. In 1974, Ron Crandall, George Lewis (Lew), Bill Cargile, and Bill Quackenbush started preliminary investigations along these lines. The effort was short lived, as BTI 4000 and 5000 work kept everyone too busy to do much else. Nevertheless, certain important decisions were made and the overall architecture for a new generation machine was mapped out. The machine would have a system backplane into which one of four types of modules would be plugged: Memory, CPU, PPU (peripheral processing unit, basically a DMA engine connecting to peripheral controllers), and SSU (system services unit, essentially what was left over, such as operator interface, boot, remote diagnostic, time of day clock, error handling). Bill Cargile designed an asynchronous backplane to interconnect the major parts of the machine.

The plan for the system never completely went dormant, but there were several major obstacles along the way that consumed inordinate amounts of time, a major one being what we 'affectionately' called the octo-bus. By late 1975 some people began full time work on the 8000. A group of people flew to Corvallis, OR in March of 1975 to meet with Jim Meeker and coax him into working for BTI. He set about specifying the very CISC-y BTI 8000 instruction set. Ron Crandall, frantically busy with system software issues on the 5000, used all of his available spare time to architect a robust file system structure, one that would be "crash-proof". Many of the key components came from the design of the 5000 disk structure, but a lot of thought went into the various faults that could occur. Other people started on various components of the design. Roger Fairfield was given the task of making the asynchronous backplane prototype work.

Roger's prototype was up and running in a few months, but there were persistent failures that no one could track down. The effort continued for most of a year, from mid 1975 to 1976 IIRC. When Bill Quackenbush was finally freed up from the nasty, unworkable octo-bus, he relatively quickly demonstrated that the problem lay in the synchronizers that were a crucial part of the asynchronous priority resolution protocol. Bill built a test rig that ran the two interacting devices off of the same clock but with a variable phase between them. He might have just adjusted two clocks to be as close as possible and then just relied on the drift to vary the phase, but in any event, by hooking up a 'scope to the synchronizers, you could see the output 'fence sit' for way longer than the advertised propagation delay. Not too surprising, since the parts weren't designed to be synchronizers and we were violating the setup and hold times. Bill was able to relatively quickly redesign the bus to be synchronous and another task (system clock) was added to the function of the SSU.

The overall hardware effort was plagued with a series of problems. Realizing early that the software development could in no way wait for actual hardware, an instruction set simulator was built using BTI 5000 hardware. The original simulator just simulated the instruction set and was used to develop software products like the assembler/linker and much of the early operating system kernel. A few I/O devices were added so that OS projects could be completed. At some time, various components of the machine became available. The emulator was modified to connect to a bus interface board and the emulator played the role of CPU while the memory, and later, even the PPU did their job installed in an actual card cage.

In February 1977, Lew decided that the burden of supporting the BTI 5000 was excessively interfering with progress on the 8000. So a pact with the devil was signed and the 5000 development was passed off to another (and incompetent) group. This had repercussions later for the 5000 product line. We also went on a concerted hiring effort and many more joined the 8000 effort.

As the magnitude of the project increased, Lew became overwhelmed and the continuity and logistics of the project started to suffer. Unfortunately, some of these issues resulted in problems that plagued the 8000 throughout its life. The most grievous of these issues involved the schedule for completion. Even as late as February 1980, Lew was insisting that a completed machine would ship by the computer conference in May. Just a walk down the row of offices and labs easily put paid to that idea. Some of the more optimistic schedules for such vital items as a disk controller were September. Unfortunately, marketing geared up for a major marketing push based on Lew's hopelessly unrealistic schedule. Consequently, interest in the machine peaked well before it could be shipped. Also, Lew had stockpiled about a million dollars in inventory for the initial production run of these machines resulting in needless expenses for the company. This is the particular bit of mismanagement that got Lew removed from the management position. He chose not to remain as a contributing staff member.

A few anecdotes about the CPU effort are instructive of the kind of problem that we suffered. Lew had decided that our chief PC board layout guy, John Caris, would do layouts for all of the boards so that we would always be working on close to shippable quality boards. Since John could do a layout of a BTI 5000 board (about 80 square inches and 80 DIPs) in about two weeks, Lew figured he could do a layout of a BTI 8000 board (about 460 square inches and 450 DIPs) in ten weeks (the simple ratio of the parts count). What Lew was smoking to assume such a thing is unknown. But John floundered with the first CPU PC board layout for eight months before Lew would relent and allow a wire wrap prototype. This prototype was brought up in a few months and several more CPUs were then wire-wrapped and debugged so that we could finally get a working, full speed (some devices had to be down-clocked, but this had little effect on the development effort) machine. Even with the new CPUs, the emulator still played several key roles in the system operation. But development could finally go on full bore.

Meanwhile, the PC board layout for the CPU continued. In order to 'help' John Caris, another layout person was brought onto the project and they worked alternating shifts. This worked okay for a while, but then progress slowed to a virtual standstill. It seems that each would have particularly difficult routing issues to resolve. So they would remove some of the other's traces in order to solve their issue. This comedy of errors went on until we finally qualified a vendor who could make four layer boards in the size we required. But another example of a massive expense and, worse, schedule slippage. I should note that when these CPUs came back, nothing worked, even though they were schematically correct versions of the working wire wrap prototypes. It seems that many buses were laid out with parallel traces and the crosstalk was sufficient to induce phantom signals. Since this problem afflicted almost all of the buses, it took a lot of time and effort to fix as well.

Because of Lew's optimistic schedules, BTI prematurely started letting the world know about the 8000 in 1978. They presented papers at technical conferences; the 8000 was mentioned in sales literature; glossy brochures were produced touting its advanced features.

The 8000 didn't really start shipping until June 1981, and even then, the first few systems were moderately unreliable. The bus transfers would suffer from protocol errors whose basic cause was some firmware problems. Making it worse, the remote diagnostic facility (RDF) wasn't in place, making it impossible for in-the-field failures to be diagnosed and repaired in a timely manner (subsequently, we successfully used the RDF on many occasions to restart a crashed system with no loss of user data; their sessions just resumed where they had stopped). In an odd reverse from the usual case, the operating system proved to be fairly reliable, even in these early times. By late 1981, these issues had been worked out.

BTI ramped up staff in anticipation of the need to ship many of the new BTI 8000 systems in 1982 ... but the orders only trickled in. A massive layoff was the result, and two smaller ones followed the same year. Finally, in March, 1986, the decision was made to halt all development on the 8000 and to simply support any existing customers. Slowly these customers dwindled, and BTI kept downsizing.

In the end, there were perhaps 30 paying customers for the system. BTI built around 15 others, some of which were used internally, others were used as demo systems to bait customers into a purchase.

By 1993, BTI was down to about a dozen employees, but surprisingly, there were still 19 systems in the field as of 1995. Phil Deal supported the few remaining customers, but the final systems were retired in 2002, and the US part of BTI was closed down.

Variable Resource Architecture (VRA)

IBM pioneered the idea of a system architecture with the IBM/360 family. The 360 architecture was an abstract model of computation, where many different machines implementing that model could span a couple orders of magnitude in performance while sharing the cost of developing the OS and other tools and preserving the customer's own software investments.

BTI didn't have the resources to develop a family of computers, and took a different approach. They decided to build a multiprocessor, where a low end system contained a single CPU, a single memory controller, and a single I/O controller. Higher end systems were built by adding more resources, instead of having a family of uniprocessors with a range of performance.

BTI developed a model where a single high speed backplane connected together one to many instances of each of a few computing resources, with each type of resource being identical and treated equally. This is known as a symmetric multiprocessor. BTI didn't invent the idea (for example, Burroughs 5000, Tandem T/16), but it wasn't very common either.

BTI called this idea Variable Resource Architecture, or VRA for short.

Here are some key design features of the BTI 8000 VRA:

Fail-Soft Behavior

Because of the bank/accounting/business focus, BTI wanted to assure customers that their data was safe. Although not nearly as fault tolerant as the Tandem line of computers, real effort was put into making the system "fail-soft."

BTI defined this to mean that when hardware failed, the system would not cause harm, and it would be easy to repair. Fail-Soft was engineered into different aspects of the system; it was not any one single piece of technology.

Virtual Machine Multiprocessing (VMM)

It was alluded to above, but one of the concepts of the 8000 was Virtual Machine Multiprocessing, or VMM. This meant that the user programs were entirely unaware how much memory the system had or how many CPUs the system had, and any attempt to manipulate a resource was mediated by the OS.

The virtualization of the user state was and is very common; it is required for protecting processes from each other, due to either malice or errors.

But virtualization was especially important for BTI in that virtualization also meant that a user program couldn't tell if it was running on a single CPU system or one with eight. When a system was reconfigured, either adding or removing resources, user programs didn't need to be modified in any way, and nothing needed to be recompiled.

BTI 8000 Operating System (Monitor)

The BTI 8000 OS was frequently called the monitor, as it monitored and controlled the system activities.

Like user programs, the monitor was distributed and ran on any and all CPUs. The only time there was any asymmetry was at boot time: after boot up diagnostics had finished, the SSU would enable the CPUs, and the CPUs would attempt to lock out all the other CPUs, but only one would succeed. That winning CPU would be responsible for bootstrapping the monitor into memory, and patching various configuration tables based on the resources which were available in the system.

Pervasive Security Model

A well thought out security model was designed to appeal to the business, accounting, and banking markets.

The core security boundary was the account, which was arranged in a hierarchy of four levels of account control. It is described in the next section.

Like most operating systems, there was the concept of user state and privileged state. The user mode programs had no ability to access resources directly, other than the memory pages owned by that user. All other resources were handled by making an "XREQ" (eXecutive REQuest) call into the monitor.

All files stored on removable media, including the primary disk packs as well as tape backups, were encrypted. It took extra work to save unencrypted files, typically when generating a tape to be exported to a different computer system.

Each account had limits on the resources made available by its superior account, including cumulative and per-session CPU time, cumulative wall-clock time, saved file block limit, and scratch file block limit.

Account Model

The operating system had the concept of "accounts," with four levels of account hierarchy. The "system" accounts were at the top of the pyramid. These in turn granted resources and privileges to division level accounts. Division level accounts delegated to project level accounts, and these controlled user-level accounts.

The default state was for all files to be private to an account. An account had an access control list, basically a list of other accounts which were granted various levels of permission to use all files within the account. For example, an account might permit all people in his division to read all files, and grant a specific list of people read/write privileges.

Besides the per-account access list, there was a per-file access list, offering the same types of privileges. In both per-account and per-file access lists, the permissions could also be tied to a password as an extra measure of security.
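
As a rough illustration of how the two access lists might be consulted, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names (allowed, READ, WRITE), the shape of the lists, and in particular the rule that either list can grant access. The text above does not say how the monitor actually combined the per-account and per-file lists.

```python
READ, WRITE = "read", "write"

def allowed(requester, action, account_acl, file_acl=None, password=None):
    """Return True if either access list grants the action to the requester.

    Each list maps an account name to (set of actions, required password),
    where the required password is None when no password is attached.
    Assumption: a grant in either list is sufficient.
    """
    for acl in (file_acl or {}, account_acl):
        entry = acl.get(requester)
        if entry is None:
            continue
        actions, required_password = entry
        # A password-protected grant only counts if the password matches.
        if action in actions and required_password in (None, password):
            return True
    return False

# Hypothetical example data.
account_acl = {
    "alice": ({READ}, None),
    "bob": ({READ, WRITE}, "s3cret"),
}
file_acl = {"carol": ({READ}, None)}

print(allowed("alice", READ, account_acl))
print(allowed("bob", WRITE, account_acl))
print(allowed("bob", WRITE, account_acl, password="s3cret"))
print(allowed("carol", READ, account_acl, file_acl))
```

Files stay private by default in this sketch because an account that appears in neither list is simply refused.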

Although superior accounts were able to do anything an inferior account had permission to do, a superior account could also relinquish some or all of those permissions. Once done, these permissions could be restored by the inferior account.

The security system also recognized that the responsibilities of the system administrators often were different than those of managers. Although system accounts were higher in the security hierarchy, individual administrator accounts were typically set up so they didn't have permission to access private files. Instead, they were in charge of managing print queues, mounting and dismounting disk volumes, and monitoring the process table. There was a MASTER account, though, that had the ability to do anything.

Groups of accounts could be "encapsulated," meaning they were walled off from the rest of the system. An account inside the barrier couldn't share any files with people outside the barrier, and was unable to write to files outside the barrier. This would be useful for ensuring the accounting department files were inaccessible to anyone outside of accounting, even if someone in accounting attempted to defeat security.

File System

The file system was flat for an account, other than the schism between the normal files and library files.

The user process had available 202 virtual I/O channels, known as LUNs (Logical Unit Numbers). By default 1 and 2 were the standard in and out channels, respectively. The first 200 could be assigned by the process to point to a file or other device. LUN 200 was always pointing at the file holding the executable. LUNs 201 and 202 were not assignable, and were always pointing at the user's terminal if in interactive mode, otherwise in a batch process they pointed at the virtual card reader and spooled line printer device.

Besides mapping a LUN to a specific file, it could be mapped to one of a number of logical devices:

.TERM
user terminal
.LP
line printer (spooled)
.MT
magnetic tape drive
.NULL
"write only memory"
.CDR
card-image reader's view of .TASK
.DIR
the directory of an account library
.LOCK
inter-process semaphore
.PATH
inter-process communication link
.CODE
executable program memory image file
.SAF
sequential access file
.RAF
random access file

Note that the CDR card reader was used by batch programs, and what it really pointed at wasn't a card reader but a sequential file containing lines of text emulating a card reader.

Different logical devices had various properties associated with the logical file type. For instance, the .TERM type had information about the width of the terminal, the number of lines per page, baud rate, terminal type, etc.

BTI 8000 Software

This needs to be fleshed out, but in short, major tools were:

In 1985/1986, there was a project to develop a C compiler for the machine. One engineer worked on it, and supposedly got pretty far. It parsed standard C and had multiple passes making both high level and peephole optimizations. It was canceled before it was production ready, and the engineer working on it left to work developing mapping software for a charitable organization. (Does anybody remember? Perhaps Mike Byron?)

BTI 8000 Compute Modules

The BTI 8000 had four different classes of resources. A minimal system had one of each, and more than one of each could be added to a system to increase throughput and capacity. These four were named:

SSU
System Services Unit
MCU
Memory Control Unit
CPU
Computational Processing Unit
PPU
Peripheral Processing Unit

Each board in the system was a 20" x 23" card that plugged into a 16 slot backplane. The CPU was self-contained, but some of the others were connected via ribbon cables to more distant resources; for example, the memory controller was cabled over to another cabinet containing the core memory modules. Every board in the backplane was microcoded to allow self test and intelligent configuration.

At the time the 8000 was introduced, it wasn't practical to build an eight layer 20" x 23" board. Instead, the bus interface logic and power distribution were laid out using the copper on the board, and the rest of the wiring was wire wrapped by machine. A Plexiglas sheet was mounted on the rear of each board to prevent accidentally snagging any wires while adding or removing boards from the system.

SSU (System Services Unit)

The SSU performed some simple management functions. Normally, only one SSU was used in a system, but a second SSU could be installed, in which case the redundant one was used as a hot backup in case the primary SSU failed.

The SSU contained:

  • the master system oscillator
  • a real time clock
  • an interface to front panel switches and a vacuum fluorescent status display
  • a system monitor for abnormal power and temperature conditions
  • a remote diagnostic interface, allowing BTI technicians to log in remotely
  • a permanent, unique ID, which allowed both BTI and 3rd party software to be locked to a specific machine

At system reset, all the resources would perform self-test. The SSU would wait until all tests were complete, then poll all the slots to figure out which cards were present and healthy. Any failures would halt the system and be reported on the monitor panel, a small vacuum fluorescent display. The SSU would then enable all the CPUs to run. One CPU would lock out the other CPUs from running, and would then start the monitor bootstrap process. The monitor would use the system configuration data collected by the SSU to establish OS configuration tables.
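
The reset sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name boot, the slot table, and the use of a non-blocking lock to model "one CPU would lock out the other CPUs" are all made up here and are not the actual SSU or monitor mechanism.

```python
import threading

def boot(slots):
    """Simulate reset. slots maps slot number -> (card type, self-test passed)."""
    # The SSU polls every slot; any failed self-test halts the system.
    failed = sorted(n for n, (_, ok) in slots.items() if not ok)
    if failed:
        return {"halted": True, "failed_slots": failed}

    # Configuration data the monitor would later use to build its tables.
    config = {n: kind for n, (kind, _) in slots.items()}

    boot_lock = threading.Lock()
    winners = []

    def cpu(slot):
        # Every enabled CPU races for the lock; it is never released,
        # so exactly one CPU ends up bootstrapping the monitor.
        if boot_lock.acquire(blocking=False):
            winners.append(slot)

    threads = [threading.Thread(target=cpu, args=(n,))
               for n, kind in config.items() if kind == "CPU"]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return {"halted": False,
            "boot_cpu": winners[0] if winners else None,
            "config": config}

result = boot({0: ("MCU", True), 1: ("CPU", True), 2: ("CPU", True),
               8: ("SSU", True), 9: ("PPU", True)})
print(result["boot_cpu"])
```

Which CPU wins is deliberately left undetermined; the point is only that exactly one does.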

The SSU was usually positioned in one of the middle slots. As the source of the backplane clock, and thus the clock for the entire system, a central position minimized the clock skew between boards.

The SSU used the Signetics 8X300 microcontroller for its intelligence. The 8X300 was one of the earliest microcontrollers, and had a reputation for being an ugly beast to program, and for running quite hot, as it was implemented in bipolar logic.

Ron Crandall adds:

I found it to be just another relatively simple instruction set device. What made this installation interesting is that the instruction word was a full 24 bits... 16 for the 8x300 and 8 more to control various gates on the board. So each instruction was a 16 bit opcode for the 8x300 and 8 more bits that controlled what the 8x300 saw on its buses as it executed the instruction.
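
To make the 16 + 8 split concrete, here is a small Python sketch. The bit positions are an assumption: the quote gives only the field widths, so placing the 8X300 opcode in the upper 16 bits and the gate-control bits in the lower 8 is purely illustrative, as is the function name.

```python
def split_ssu_word(word24):
    """Split a 24-bit SSU instruction word into (opcode16, gate_control8).

    Assumed layout (not documented in the text): opcode in bits 23..8,
    board gate-control bits in bits 7..0.
    """
    if not 0 <= word24 < (1 << 24):
        raise ValueError("instruction word must fit in 24 bits")
    opcode = word24 >> 8
    gate_control = word24 & 0xFF
    return opcode, gate_control

opcode, gates = split_ssu_word(0xABCD12)
print(hex(opcode), hex(gates))
```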

MCU (Memory Control Unit)

Originally, and for most of the life of the 8000, an MCU was simply an interface, and didn't directly control any memory. The MCU ran some quick diagnostics after reset and took care of the backplane bus protocol.

Requests from the bus were sent via ribbon cables to an external box, mounted in a second cabinet, which contained core memory and the actual core memory timing, driver, and sense circuitry.

The MCU had minimal pipelining. It could accept two operations before it started turning away new requests. Even those two requests weren't pipelined, other than the act of transmitting the request across the bus. While the first command was being processed, the second command simply sat in an input buffer, waiting its turn.
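
A toy Python model of that two-deep behavior is shown below. The class and method names are hypothetical, and the model ignores bus timing entirely; it only shows that a third request is refused until the first one completes.

```python
from collections import deque

class McuInput:
    """Toy model: one request in progress plus one waiting in the buffer."""
    DEPTH = 2

    def __init__(self):
        self.pending = deque()

    def offer(self, request):
        """Return True if the request was accepted, False if turned away."""
        if len(self.pending) >= self.DEPTH:
            return False
        self.pending.append(request)
        return True

    def complete_one(self):
        """Finish the oldest request, letting the buffered one start."""
        return self.pending.popleft()

mcu_in = McuInput()
print(mcu_in.offer("read A"), mcu_in.offer("read B"), mcu_in.offer("read C"))
```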

It was a system feature that the MCU directly supported atomic operations. The CPU defined a number of instructions as atomic, meaning a memory location would be read and the MCU would deny all other requests until the card performing the read/modify/write wrote back the updated value.
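
The locking behavior can be illustrated with a small Python model. This is a sketch under assumptions, not the real bus protocol: the class name, the use of slot numbers to identify cards, and the rule that any write by the lock holder releases the lock are simplifications chosen for the example.

```python
class AtomicMcu:
    """Toy MCU that denies other cards during a read/modify/write."""

    def __init__(self):
        self.words = {}
        self.locked_by = None   # slot number of the card holding the lock

    def read(self, slot, addr, atomic=False):
        if self.locked_by not in (None, slot):
            return None         # denied; the requester has to retry
        if atomic:
            self.locked_by = slot
        return self.words.get(addr, 0)

    def write(self, slot, addr, value):
        if self.locked_by not in (None, slot):
            return False        # denied while another card holds the lock
        self.words[addr] = value
        self.locked_by = None   # the write-back ends the atomic sequence
        return True

mcu = AtomicMcu()
value = mcu.read(2, 0x10, atomic=True)    # card in slot 2 starts an atomic op
denied = mcu.read(5, 0x10)                # card in slot 5 is turned away
mcu.write(2, 0x10, value + 1)             # write-back releases the MCU
print(denied, mcu.read(5, 0x10))
```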

Although the backplane bus protocol was largely fair, there was a slight latency advantage to cards in the lower slot numbers. Therefore, it was advantageous to place the memory controllers in the lower numbered slots, since read latency critically affected system performance.

The core-based MCUs could be expanded in increments of 128 KB. A minimal system required at least 256 KB total, although practically all systems had more than this.

In 1985 or 1986, BTI designed a new memory controller that had an array of 64Kb DRAM chips mounted on board. SECDED ECC logic performed error checking and correction; a Z80 performed extensive diagnostics at power up of both the DRAM array and the board's control logic. The Z80 also logged any corrected errors so later this log could be inspected to see if a given chip or column was marginal. The board also used idle cycles to sweep through memory, with the hope that single bit errors could be repaired before they turned into double bit (uncorrectable) errors.

CPU (Computational Processing Unit)

A minimal system would have a single CPU, and high end systems could have up to eight CPUs. It wasn't architecturally specified as such, but none of the CPUs designed by BTI had any cache. Every instruction or data operation went over the backplane to an MCU. Instructions were fetched with double word transfers, which both increased backplane efficiency and acted as a kind of prefetch.

Because of the long latency to memory (around 12 cycles, or 750 ns, in the best case), the CPU would initiate the fetch of the next instruction as soon as it knew the instruction wasn't a branch.

An earlier CPU design used eight 74F181 4b ALUs as the core computing element.

Five different CPU designs were done, although I'm not sure how many of them were shipped. The last one, CPU5, was completed in 1985. John Kinsel designed the hardware; Jeff Libby wrote the microcode. The heart of the CPU used eight 29C03 4b bit slice chips and a 29C11 (?) sequencer. It also used a Z80 supervisory processor to run diagnostics at power up.

CPU5 was very horizontally microcoded, with a 108 bit wide microword (96 functional, 12 parity). The microcode store was 8K words deep, and all in RAM; this allowed upgrading the microcode in the field by swapping in a different daughter card containing a bank of EPROMs.

In addition to the 2903 ALU, the CPU5 datapath logic contained a barrel shifter and some other assorted logic. The microcode assembler was intelligent and could compute the number of cycles it would take to compute a given result; this value was stored in the microword. Thus, different microinstructions took different amounts of time.

Considering the multicycle microinstructions, the lack of a cache, and the long latency to read memory from an MCU, performance of a given CPU wasn't stellar. A single CPU would typically use less than 10 percent of the available backplane bandwidth. Even though it was slow compared to higher end 32b CPUs of the era, the BTI 8000 was still at least three times faster than the then top-end BTI 5000. Note that the CISC instruction set of the 8000 was designed to maximize the amount of useful work done for each instruction word fetched from memory. This meant that in a head to head comparison with a more conventional machine, the 8000 would do the same work while fetching about half as many instructions. This helped mitigate the speed deficiencies of the architecture.

The slow CPU perversely had a system benefit; it made the simple backplane bus practical as a means of increasing system throughput. If a single CPU had been able to saturate the backplane bandwidth, it would have precluded adding more CPUs as a means of increasing performance. As it was, BTI estimated that seven CPUs in a system running a typical mix of operations ran as fast as about five and a half ideal CPUs.
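
The arithmetic behind that estimate is simple enough to spell out. The snippet below uses only the two figures quoted above (seven CPUs, about five and a half ideal CPUs); the variable names are just for illustration.

```python
cpus = 7
ideal_equivalent = 5.5   # BTI's estimate for a typical mix of operations

efficiency = ideal_equivalent / cpus          # fraction of linear scaling
lost_to_contention = cpus - ideal_equivalent  # CPUs' worth of work lost

print(f"{efficiency:.1%} of linear scaling")
print(f"{lost_to_contention} CPUs' worth of throughput lost")
```

In other words, each of the seven CPUs delivered a little under four fifths of what it would have delivered alone.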

PPU (Peripheral Processing Unit)

The PPU was essentially a DMA engine. Each PPU could connect to four I/O controllers over two high speed and two low speed channels. For instance, the disk controller used a high speed channel, and the terminal muxes sat on a low speed channel.

A CPU could set up a DMA channel operation in memory, consisting of a list of registers to poke in a given I/O controller, a transfer of a given size to/from a given memory block; a sequence of these could be chained together. Once the channel program was constructed, the CPU would point the PPU at it, go on to some other process, and the PPU would take care of it.

Because there were multiple CPUs and a process could switch between CPUs frequently, it made no sense for a completed PPU program to interrupt a CPU. Instead, the PPU channel program would be told to write a given word to a particular location in memory. The next time the monitor program was sweeping the suspended process list, looking for work to do, it would find the notice from the PPU that the requested work was done, and the CPU would move the process from the suspended process list (or whatever action was appropriate).
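
A minimal Python sketch of this arrangement follows. The data layout and every name in it (ChannelStep, ChannelProgram, ppu_run, monitor_sweep) are assumptions for illustration; the real channel program format is not described here. For brevity the sketch only models transfers from a controller into memory.

```python
from dataclasses import dataclass

@dataclass
class ChannelStep:
    register_pokes: list   # (controller register, value) pairs written first
    memory_address: int    # start of the destination memory block
    data: list             # words the simulated controller delivers

@dataclass
class ChannelProgram:
    steps: list            # chained steps, executed in order
    done_address: int      # memory word the PPU writes when finished
    done_value: int = 1

def ppu_run(program, memory, controller_regs):
    """Stand-in for the PPU: run each chained step, then post completion."""
    for step in program.steps:
        for reg, value in step.register_pokes:
            controller_regs[reg] = value
        for i, word in enumerate(step.data):
            memory[step.memory_address + i] = word
    memory[program.done_address] = program.done_value

def monitor_sweep(suspended, memory):
    """Return processes whose completion word is set; no interrupt needed."""
    ready = [pid for pid, addr in suspended.items() if memory.get(addr, 0)]
    for pid in ready:
        del suspended[pid]
    return ready

memory, regs = {}, {}
suspended = {"job-a": 0x100}
program = ChannelProgram([ChannelStep([(0, 7)], 0x200, [11, 22, 33])],
                         done_address=0x100)

before = monitor_sweep(suspended, memory)   # nothing is finished yet
ppu_run(program, memory, regs)
after = monitor_sweep(suspended, memory)    # the notice is found on a sweep
print(before, after)
```

Whichever CPU happens to run the sweep picks up the completed work, which is why no particular CPU needs to be interrupted.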

The PPU, acting as a DMA engine, had a byte wide interface to each I/O controller (via ribbon cables), with FIFO decoupling on each channel. The PPU took care of all the byte packing and unpacking and address generation so each I/O controller could be simplified. The high speed channels ran at 10 MHz, and the low speed channels at 5 MHz.

The list of I/O controllers included:

  • disk controller
  • 9-track reel-to-reel tape controller
  • cartridge tape controller (good for backups)
  • printer controller
  • terminal controller (up to 19.2 Kbps)

BMB (Bus Monitor Board)

This board was used only by the developers. In essence, it was a logic analyzer custom made for the BTI 8000 bus protocol. Because only a couple were ever built, it was a wire-wrapped affair.

A Z80 was able to set up a few triggers and capture events meeting some constraint. It was useful for, say, finding all the traffic between the MCU and a given disk controller, or looking for the first read after a certain address was written with a certain value.

It was flexible enough that I was able to write code to do statistical analysis of the mix of reads, double word reads, writes, callbacks, etc. on the bus.

BTI 8000 Instruction Set Architecture

The BTI 8000 was architected in the mid 1970s, when complex instruction sets, as typified by the DEC VAX computer, were state of the art. Memory was a very expensive commodity, and it was thought that highly encoded instruction sets would make the most use of this expensive resource. At the time, core memory was still a viable technology for main memory.

The instruction set was defined by the software architecture group. Many features of the instruction set were chosen for performing OS-centric operations, such as operating on linked lists, performing atomic read/modify/write operations, and automatic subroutine linkage tasks. The focus was on encoding as much information in as few bits as possible, and on making operations on arbitrary sized fields fundamental. While these choices did make efficient use of the limited memory, they greatly complicated the CPU design, and made some of the operations very slow.

Here are a few examples of the complications.

The order of execution of the calling sequence is as follows:

  1. The CALL instruction reads up the instruction word at location S and verifies that it is an ENTR instruction. Note that the CALL-ENTR pair assumes parameters to follow. If no parameters are to be passed, the instructions used are CALLNP-ENTRNP. A mismatch causes a fault. It then uses the operand of the ENTR instruction to save R7. It then places the address of the ENTR instruction plus 1 into R7. The program counter is advanced to the next instruction.
  2. The PAR A1 instruction is executed. It places the address of the operand A1 into R0. Then it swaps the incremented program counter with REG7.
  3. The STP F1 instruction is executed. It verifies that the passed value is an address (not a value). It copies the address of A1 (from R0) into the operand F1. Then it swaps the incremented program counter with REG7.
  4. The PAR A2 instruction is executed. It places the address of the operand A2 into R0. Then it swaps the incremented program counter with REG7.
  5. The STPV F2 instruction is executed. It sees that the caller gave an address, so it dereferences the address and stores the resulting value into F2. Then it swaps the incremented program counter with REG7.
  6. The PAR2 A3 instruction is executed. It places the address of the double operand A3 into R0. Then it swaps the incremented program counter with REG7.
  7. The STP2 F3 instruction is executed. It verifies that the caller gave an address for a double word, so it stores the address into F3. Then it swaps the incremented program counter with REG7.
  8. The PARV A4 instruction is executed. It places the contents of operand A4 into R0. Then it swaps the incremented program counter with REG7.
  9. The STPV F4 instruction is executed. It verifies that the caller gave a value, so it stores the value into F4. Then it swaps the incremented program counter with REG7.
  10. The PARL A5 instruction is executed. It places the address of operand A5 into R0. Then it swaps the incremented program counter with REG7.
  11. The STPL F5 instruction is executed. It verifies that the caller gave an address so it stores the address into F5. It verifies that the caller specified that this was the last parameter. Then it continues to the next instruction in the called routine.
  12. The function runs and the LEAVE instruction places REG7 into the program counter to resume the caller's context and restores REG7 from the indicated operand.
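
The back-and-forth between caller and callee in the steps above is easier to follow as a trace. The Python sketch below is a heavily simplified model, not the real instruction semantics: it keeps only the PAR/STP and PARL/STPL pairs, leaves out the value and double word variants, and models R0 as a (name, is-last) pair because the actual tagging scheme is not given here. The opcode HALT and all Python names are invented for the example.

```python
def run(program, start):
    """Trace a simplified CALL/ENTR handoff.

    program maps an address to an (opcode, operand) pair.
    """
    pc, r7, r0 = start, None, None
    saved_r7 = {}   # ENTR/LEAVE operand -> caller's previous R7
    formals = {}    # formal parameter -> argument it received
    trace = []
    while True:
        op, arg = program[pc]
        trace.append((pc, op))
        if op == "CALL":
            entry_op, save_slot = program[arg]
            if entry_op != "ENTR":
                raise RuntimeError("CALL target is not an ENTR instruction")
            saved_r7[save_slot] = r7
            r7 = arg + 1              # first STP in the callee
            pc += 1                   # first PAR in the caller
        elif op in ("PAR", "PARL"):
            r0 = (arg, op == "PARL")
            pc, r7 = r7, pc + 1       # hand control to the callee
        elif op in ("STP", "STPL"):
            name, is_last = r0
            if is_last != (op == "STPL"):
                raise RuntimeError("parameter count mismatch")
            formals[arg] = name
            if op == "STP":
                pc, r7 = r7, pc + 1   # hand control back to the caller
            else:
                pc += 1               # last parameter: fall into the body
        elif op == "LEAVE":
            pc, r7 = r7, saved_r7[arg]
        elif op == "HALT":
            return trace, formals
        else:
            raise RuntimeError(f"unknown opcode {op}")

program = {
    0: ("CALL", 100), 1: ("PAR", "A1"), 2: ("PARL", "A2"), 3: ("HALT", None),
    100: ("ENTR", "save"), 101: ("STP", "F1"), 102: ("STPL", "F2"),
    103: ("LEAVE", "save"),
}
trace, formals = run(program, 0)
print([pc for pc, _ in trace])
print(formals)
```

The trace alternates between caller and callee addresses for each parameter, and R7 ends up holding the return address by the time the last parameter has been stored.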

Supposedly the person writing the microcode for the first CPU exclaimed, facetiously, that the listing for the CPU microcode was larger than the listing for the OS.

User State

As in most operating systems, there was an explicit model of the user state. BTI called this the virtual machine. By having a clear definition of this state, multiple generations of CPUs could run the same code without modification. It was even normal to have a system containing multiple CPUs of different generations with the processes running on a different type of CPU each time slice (typically 100 ms, or until the process blocked on a resource).

The user's concept of the computer was:

  • one 17b program counter

    17 bits was enough because the virtual address space was 128 Kwords (512 KB).

  • one 15b process status register

    This held various user-accessible mode bits and flag state. For instance, mode bits indicated if memory locations containing uninitialized values were to cause a trap. Other bits contained the results of the most recent comparison.

  • eight 32b registers

    All eight registers were nearly general purpose, although a few specialized instructions were hardwired to use certain registers, such as CMOVE (character move). Other registers were assigned a dedicated function via software convention, such as using R6 as the stack pointer.

  • a 17b current console area register

    This pointed to a 10-word area in the user space where the user state was stored in the event of an interrupt. Ten words were enough to hold the program counter, the process status register, and the eight general purpose registers.

  • 512 KB of memory

    A system could contain more than 512 KB, but any single process was limited to a total of 512 KB virtual address space to hold all the code and data. Although the user saw 512 KB, it was actually organized into 4 KB pages that could be swapped between main memory and disk. The OS also allowed limiting a given process to less than the total 512 KB.
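
As a rough illustration of the user state listed above, the sketch below packs the program counter, the process status register, and the eight registers into a ten-word save area. The class name UserState and the order of the words within the area are assumptions; the source only says that ten words were enough.

```python
from dataclasses import dataclass, field

MASK17 = (1 << 17) - 1   # program counter width
MASK15 = (1 << 15) - 1   # process status register width
MASK32 = (1 << 32) - 1   # general register width


@dataclass
class UserState:
    pc: int = 0
    psr: int = 0
    regs: list = field(default_factory=lambda: [0] * 8)

    def save(self):
        """Return the ten words written to the console area (assumed order)."""
        return [self.pc & MASK17, self.psr & MASK15] + [r & MASK32 for r in self.regs]

    @classmethod
    def restore(cls, words):
        """Rebuild the user state from a ten-word console area."""
        if len(words) != 10:
            raise ValueError("console area is exactly ten words")
        return cls(pc=words[0] & MASK17, psr=words[1] & MASK15, regs=list(words[2:]))
```

Because the whole user-visible state fits in these ten words, a process saved by one CPU could in principle be restored on another, which is the property the text describes.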

Memory Paging

With a limited virtual address space of only 128K words, the paging system was very simple: a single table containing the mapping for 128 pages of 4 KB per page was sufficient. This table lived on the CPU in a small SRAM. The bottom 10 bits of the address were unchanged and indexed a word within the page, and the upper 7 bits of the virtual address indexed the mapping table, producing the physical page address and other status.

The page mapper had 256 entries: 128 for the current user space, and 128 for the monitor. One bit in the monitor status register indicated if the CPU was in user mode or privileged mode, and that selected which half of the mapping table was in use.
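
The address split described above can be sketched in a few lines. The function name translate and the page_map list are hypothetical, and placing the user half of the mapper before the monitor half is an assumption; the source only says that one status bit selected which half was in use.

```python
PAGE_WORDS = 1024                  # 10-bit word offset within a 4 KB page
USER_BASE, MONITOR_BASE = 0, 128   # which half comes first is an assumption


def translate(vaddr, page_map, privileged):
    """Map a 17-bit virtual word address to (mapper entry, word offset)."""
    if not 0 <= vaddr < (1 << 17):
        raise ValueError("virtual address must fit in 17 bits")
    vpage = vaddr >> 10                  # upper 7 bits index the mapping table
    offset = vaddr & (PAGE_WORDS - 1)    # lower 10 bits pass through unchanged
    base = MONITOR_BASE if privileged else USER_BASE
    return page_map[base + vpage], offset
```

Here page_map stands in for the 256-entry SRAM; each entry would be one of the page table entries described next.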

Each page table entry had 20 bits, with various fields.

  • 4 bits indicated which slot to address. Any slot could be specified, not just an MCU
  • 12 bits provided the physical page address (up to 16 MB per slot)
  • 4 control bits, including whether the page was resident, whether it was modified, and whether it had been accessed (useful for LRU aging)
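
A minimal sketch of unpacking such an entry follows. The field order (slot in the high bits, control bits in the low bits) is an assumption, since the source gives only the field widths, and the function names are made up for illustration.

```python
def unpack_pte(entry):
    """Split a 20-bit page table entry into (slot, physical page, control)."""
    if not 0 <= entry < (1 << 20):
        raise ValueError("page table entry must fit in 20 bits")
    slot = (entry >> 16) & 0xF      # 4 bits: which backplane slot
    page = (entry >> 4) & 0xFFF     # 12 bits: physical page within the slot
    control = entry & 0xF           # 4 bits: resident, modified, accessed, ...
    return slot, page, control


def physical_byte_address(page, word_offset):
    """Byte address within a slot for a 1024-word (4 KB) page."""
    return (page * 1024 + word_offset) * 4
```

With 12 bits of page number and 4 KB pages, the highest word in a slot sits at byte address 16 MB minus 4, which agrees with the 16 MB per slot figure.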

Data Types

The BTI 8000 had instructions that operated on a number of data types. Most of them are tersely listed here.

  • 32 bit fixed point
  • 64 bit fixed point
  • 64 bit floating point
  • bit field from 1 to 32 bits long
  • 8 bit character (extra support vs. the generic bit field addressing)
  • 32b pointer
  • linked list primitives
  • pushdown stack primitives
  • miscellaneous

The machine used two's complement arithmetic, but an optional commercial instruction set added extensive operations for supporting variable sized BCD math operations, and things like "FIELD EDIT" opcodes (like a PRINT USING statement in a single instruction).

For integer and floating point values, a unique "uninitialized" value was defined by the instruction set. The uninitialized value was an msb of 1, with 31 or 63 trailing zeros. This corresponds to the most negative value in a two's complement number system. If the uninitialized value checking was enabled, a trap occurred if any operands were seen with that value.
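
The uninitialized-value check can be illustrated as follows. The names UninitializedTrap and check_operand are hypothetical; the real check was performed by the CPU when the corresponding mode bit was set.

```python
class UninitializedTrap(Exception):
    """Stands in for the hardware trap."""


def uninitialized_value(bits=32):
    """An msb of 1 followed by zeros: 31 zeros for 32 bits, 63 for 64 bits."""
    return 1 << (bits - 1)


def to_signed(word, bits=32):
    """Interpret a raw word as a two's complement integer."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word >> (bits - 1) else word


def check_operand(word, bits=32, trap_enabled=True):
    """Return the operand, or trap if it holds the uninitialized pattern."""
    word &= (1 << bits) - 1
    if trap_enabled and word == uninitialized_value(bits):
        raise UninitializedTrap(f"uninitialized {bits}-bit operand")
    return word
```

For 32 bits the pattern is hexadecimal 80000000, which to_signed maps to the most negative 32-bit integer.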

Instruction Formats

Lacking an instruction set reference manual, the following information has been paraphrased from a paper BTI presented in AFIPS Volume 48 National Computer Conference (1979, pp. 513-528).

All instructions in the BTI 8000, without exception, were 32 bits wide and aligned on 32b boundaries. The first ten bits supplied the major opcode, but some instruction formats encoded sub-opcodes in other parts of the instruction word.

Like most computers, the BTI 8000 trapped any illegally encoded instructions. The designers designated a word of all 0s or all 1s to be illegal, as well as any opcode that started with 0x20, ASCII space. These values were deemed the most likely data words, and so making them illegal meant that errant programs would more likely get trapped before doing harm.
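
A sketch of that illegal-word test is shown below. It assumes that "started with 0x20" means the top eight bits of the 32-bit word equal 0x20; the source does not spell out the exact bit positions, and the function name is made up.

```python
def is_illegal_word(word):
    """True for the data-like patterns the designers declared illegal."""
    word &= 0xFFFFFFFF
    if word == 0 or word == 0xFFFFFFFF:
        return True                   # all zeros or all ones
    return (word >> 24) == 0x20       # assumed: leading byte is an ASCII space
```

Under this reading, a word holding four ASCII spaces (hexadecimal 20202020) would trap if the CPU tried to execute it.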

There were a total of about 200 opcodes, and around 30 different addressing modes. Helping keep things sane, just about any address mode that made sense could be used for any opcode.

  1. Immediate

    Format: [10b opcode][5b mode][17b field]

    In this format, bits [16:0] are used to form either an immediate value, or an immediate address. Depending on the size of the operand called out by the opcode, the immediate value may be expanded to 32 bits or 64 bits.

    • the 17b field is right justified and zero filled to form an immediate
    • the 17b field is right justified and ones filled to form an immediate
    • the 17b field is left justified and zero filled to form an immediate
    • the 17b field is the word address of an operand in memory
    • the 17b field is the word address of an indirect pointer in memory
  2. Indexed Memory

    Format: [10b opcode][2b mode][3b idx reg][17b address]

    This either supplies the address of a word in memory, or it supplies a location in memory of a pointer to another location in memory. The index register value is then added to that address to provide the location in memory where the operand resides. Instructions with double word length use an offset of two times the index register value.

    • 17b direct address
    • 17b indirect address
  3. Base Register

    Format: [10b opcode][5b mode][3b base reg][4b submode][10b offset]

    There are six different modes that use this format; their behaviors are complicated and not described here.

    • register to register
    • register indirect
    • word array
    • character array
    • formal parameter
    • stack
  4. Indexed Base Register

    Format: [10b opcode][5b mode][3b base reg][3b idx reg][1b submode][10b offset]

    This format is like the Base Register format, except there is a smaller offset field, and an index register value is added to the effective address that the plain Base Register format would compute.

    • register indirect
    • word array
    • character array
    • formal parameter
  5. Type Conversion

    Format: [10b opcode][5b mode][3b reg][4b submode][3b unused][2b type][5b unused]

    This format is used to convert between 32 bit fixed point, 64 bit fixed point, and 64 bit floating point formats. The fixed point formats can be treated as signed or unsigned, and conversions can be specified to round or not in case of loss of precision.

  6. Byte

    Format: [10b opcode][5b mode][3b base reg][5b bit][5b field len][4b offset]

    The instruction set has no shift or rotate instruction. Instead, this format is used by some instructions. In one mode a register is viewed as a circular list of bits and the instruction specifies an arbitrary field starting at an arbitrary offset. In the other mode the register specifies a word in memory where the bit field exists and again, an arbitrary field can be extracted. Rotates and shifts can be obtained by using the Load-Effective-Address instruction and this addressing mode.

    • register ("circular")
    • array ("zigzag")
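
The field layouts above can be sketched as simple bit slicing. The helper names are hypothetical. Bit 0 is taken as the least significant bit, consistent with the text's reference to bits [16:0]; the numeric encodings of the mode fields are not given in the source, so the immediate styles are selected here by made-up string names. For the circular mode, counting the start position from the most significant bit is an assumption.

```python
def bits(word, hi, lo):
    """Return word[hi:lo], with bit 0 as the least significant bit."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_immediate(word):
    """[10b opcode][5b mode][17b field]"""
    return {"opcode": bits(word, 31, 22), "mode": bits(word, 21, 17),
            "field": bits(word, 16, 0)}


def decode_indexed(word):
    """[10b opcode][2b mode][3b idx reg][17b address]"""
    return {"opcode": bits(word, 31, 22), "mode": bits(word, 21, 20),
            "idx": bits(word, 19, 17), "address": bits(word, 16, 0)}


def expand_immediate(field, style, width=32):
    """Expand the 17-bit field to a full operand in one of three ways."""
    mask = (1 << width) - 1
    if style == "right_zero":            # right justified, zero filled
        return field
    if style == "right_ones":            # right justified, ones filled
        return (mask & ~0x1FFFF) | field
    if style == "left_zero":             # left justified, zero filled
        return field << (width - 17)
    raise ValueError(f"unknown style {style!r}")


def circular_field(reg, start, length):
    """Extract `length` bits starting `start` bits from the msb, wrapping."""
    rotated = ((reg << start) | (reg >> (32 - start))) & 0xFFFFFFFF
    return rotated >> (32 - length)
```

With a field length of 32, circular_field acts as a left rotate by the start position, which is one way to see how rotates fall out of this addressing mode.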

Words of memory which are used as pointers are also encoded:

[2b character][3b bit][5b field len][5b mode][17b address or immediate]

The "mode" field is akin to that of the Immediate format (format 1) above. Which fields were used and how they were interpreted depended on the operation. Note that a pointer could point to not just a word in memory, but an arbitrary 1-32b field in memory. Other wonders were possible. In array mode, the offset value is multiplied by the field size and the appropriate math is carried out so that a packed array of arbitrary (1-32b) values could be directly addressed.
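
The pointer layout and the packed-array arithmetic can be sketched as follows. The function names are hypothetical. The raw 5-bit length field is returned undecoded because the source does not say how lengths of 1 to 32 map onto five bits. The array reader treats the words as one continuous bit string read from the most significant bit, which is an assumption about both bit order and whether fields may straddle word boundaries.

```python
def decode_pointer(word):
    """[2b character][3b bit][5b field len][5b mode][17b address or immediate]"""
    return {
        "character": (word >> 30) & 0x3,
        "bit": (word >> 27) & 0x7,
        "field_len": (word >> 22) & 0x1F,   # raw value, encoding not stated
        "mode": (word >> 17) & 0x1F,
        "address": word & 0x1FFFF,
    }


def packed_element(words, index, width):
    """Read element `index` from a packed array of `width`-bit fields."""
    total = len(words) * 32
    start = index * width               # offset multiplied by the field size
    if index < 0 or start + width > total:
        raise IndexError("element lies outside the array")
    big = 0
    for w in words:                     # concatenate the words, msb first
        big = (big << 32) | (w & 0xFFFFFFFF)
    return (big >> (total - start - width)) & ((1 << width) - 1)
```

For example, with 8-bit fields, element 5 of the two words 12345678 and 9ABCDEF0 (hexadecimal) is the second byte of the second word.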

Instruction Set Summary

This set of instructions was lifted from BTI_8000_Technical_Summary_Sep78.pdf.

APPENDIX A: SUMMARY OF USER-MODE CPU INSTRUCTIONS

A.1 Fixed Point Arithmetic

ADD
operand added to contents of specified register, result stored back in that register
ADDM
("add to memory") as above, but result replaces operand instead of register
ADDB
("add to both") as in ADDM, but result also stored in register
ADD2, ADD2M, ADD2B
double-word analogs of above
SUB
operand subtracted from contents of specified register, result stored back in that register
SUBM, SUB2, SUB2M
see ADD family
RSB
("reverse subtract") contents of specified register subtracted from operand, result stored back in that register
RSBM, RSB2, RSB2M
see SUB family
MUL, MULM, MUL2, MUL2M
multiply family (see ADD, SUB)
DIV, DIVM, DIV2, DIV2M
divide family
RDV, RDVM, RDV2, RDV2M
reverse divide family
LD, LDN (N="negate"), LD2, LDN2
load register family
INCL, INCL2
Increment operand by 1, then load reg. with this new value
ST, ST2
store register (single, double)
STW, STW2, STMW, STMW2
store the value "one" (W) or "minus one" (MW)
STU, STU2
store the value "undefined" (hexadecimal 80000000)
STZ, STZ2
store the value "zero"
EXCH, EXCH2
exchange register, operand
INC, INC2, DEC, DEC2
increment/decrement operand by one
INCP, DECP
increment/decrement pointer. These instructions assume the operand is a pointer. The bit length of the pointed-to entity (carried in the pointer) is added to/subtracted from its bit address, thus moving the pointer forward/backward one entry, no matter what the size of the entry.
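
The naming pattern of the arithmetic families can be made concrete with a small model. Each function returns the pair (new register, new operand). Wrapping results to 32 bits is an assumption, since the summary does not describe overflow handling, and the pointer helpers work on a plain bit address rather than the real pointer word.

```python
MASK32 = 0xFFFFFFFF


def add(reg, operand):
    """ADD: result goes to the register."""
    return (reg + operand) & MASK32, operand


def addm(reg, operand):
    """ADDM: result replaces the operand in memory."""
    return reg, (reg + operand) & MASK32


def addb(reg, operand):
    """ADDB: result goes to both the register and the operand."""
    total = (reg + operand) & MASK32
    return total, total


def sub(reg, operand):
    """SUB: register minus operand, result to the register."""
    return (reg - operand) & MASK32, operand


def rsb(reg, operand):
    """RSB: operand minus register, result to the register."""
    return (operand - reg) & MASK32, operand


def incp(bit_address, bit_length):
    """INCP: step a pointer forward by one entry of `bit_length` bits."""
    return bit_address + bit_length


def decp(bit_address, bit_length):
    """DECP: step a pointer backward by one entry."""
    return bit_address - bit_length
```

The reverse forms matter because subtraction and division are not commutative: rsb lets the operand be the left-hand side without an extra exchange.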

A.2 Floating Point Arithmetic

These instructions deal with 64-bit (double word) floating-point operands, which have 11-bit biased exponents and 52-bit mantissas. Double-precision floating-point operands (128 bits) are generated and manipulated by software.

FAD, FADM, FADB
floating add ("to memory", "to both")
FSB, FSBM, FMU, FMUM, FDV, FDVM
floating subtract, multiply, divide
FRSB, FRSBM, FRDV, FRDVM
floating reverse subtract, reverse divide
FINC, FDEC
floating increment, decrement memory (by one)
FINCL
increment floating-point operand by 1, then load adjacent registers with this new value

A.3 Boolean Arithmetic

AND, ANDM, AND2
similar to fixed-point ADD family
BSUB, BSUBM
result = register AND NOT operand (Boolean subtract)
BRSBM
Boolean reverse subtract to memory
IOR, IORM, IOR2
inclusive OR family
XOR, XORM, XOR2
exclusive OR family
SETT
(set and test) set operand to one after setting condition bits to comparison of register and operand (used for locking of critical regions)
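
The way SETT supports locking can be sketched as below. This is a single-threaded model of the semantics only: on the real machine the compare and the store happened as one instruction, which this Python code does not guarantee. The convention that zero means "free", the condition names, and the function names are assumptions.

```python
def sett(memory, addr, reg):
    """Compare reg with the operand, then set the operand to one."""
    old = memory[addr]
    if reg == old:
        condition = "EQ"
    elif reg < old:
        condition = "LT"
    else:
        condition = "GT"
    memory[addr] = 1          # the operand is always left set
    return condition


def try_acquire(memory, lock_addr):
    """Compare zero against the lock word; EQ means it was free."""
    return sett(memory, lock_addr, 0) == "EQ"


def release(memory, lock_addr):
    """Storing zero (as STZ would) marks the region free again."""
    memory[lock_addr] = 0
```

A second caller that runs try_acquire before the release sees a non-EQ result and knows the critical region is busy.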

A.4 Jumps

Unconditional
JMP (load Program Counter with operand)
Conditioned on PSR condition bits
JCC, JCS (if carry clear/set), JOC, JOS (if overflow clear/set), JEQ, JNE, JLT, JGT, JLE, JGE
Conditioned on comparison of register contents to zero ("Z") or minus one ("MW")
JEQZ, JEQZ2, JNEZ, JNEZ2, JLTZ, JLTZ2, JGTZ, JGTZ2, JLEZ, JLEZ2, JGEZ, JGEZ2, JEQMW, JNEMW
Bit tests
JBT, JBF (if bit in register true/false)
Address tests
JZA, JNZA (if address field of register zero/non-zero)
Register increment/decrement
IRJ, DRJ (inc/dec register, then jump if result not equal to zero); JIR, JDR (if register not equal to zero, inc/dec register and jump)
Linkage jumps, conditioned on zero/non-zero address field fetched through register
LJZA, LJNA (load register with address field of word it points to, then jump if result zero/non-zero); RLJZA, RLJNA (remember, link, and jump -- save register in adjacent register, then proceed as in LJZA, LJNA)
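
The difference between the two register-decrement jumps can be seen by counting loop iterations. The sketch assumes each instruction sits at the bottom of a loop and jumps back to the top; the function names are made up.

```python
def drj_iterations(count):
    """DRJ at loop bottom: decrement, then jump back if the result is not zero."""
    if count < 1:
        raise ValueError("count must be at least 1 in this model")
    reg, iterations = count, 0
    while True:
        iterations += 1       # loop body runs
        reg -= 1              # decrement first
        if reg == 0:          # fall through when the result reaches zero
            return iterations


def jdr_iterations(count):
    """JDR at loop bottom: if the register is not zero, decrement and jump back."""
    if count < 0:
        raise ValueError("count must not be negative in this model")
    reg, iterations = count, 0
    while True:
        iterations += 1       # loop body runs
        if reg == 0:          # test first; fall through without decrementing
            return iterations
        reg -= 1
```

In this model a register starting at n gives n passes with DRJ and n + 1 passes with JDR, so the choice depends on whether the counter holds the number of passes or the number of repeats.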

A.5 Subroutine Linkage

Several instructions are provided for subroutine linkage; they check entrypoints and provide parameter type-checking for the subroutine. The calling sequence and the entry sequence are executed part by part, passing one parameter at a time with the PAR (pass parameter) instructions on the calling side and corresponding STP (store parameter) instructions on the subprogram side. These instructions specify the parameter type (including "2" for doubleword), whether the parameter is being passed by location or value ("V"), and whether this is the last ("L") parameter in the protocol.

CALL, CALLNP ("NP" = no parameters)
begin linkage from calling side
ENTR, ENTRNP, ENTRS ("S" = start, for non-standard parameter passing)
begin subroutine
PAR, PAR2, PARL, PAR2L, PARV, PARV2, PARVL, PARV2L
pass parameter
STP, STP2, STPL, STP2L, STPV, STPV2, STPVL, STPV2L
store parameter
LEAVE
leave subroutine
LDPC, LDPCS
load Program Counter ("S" = also load status bits)
EXPC, EXPCS
exchange Program Counter (and status) with operand
JSR
jump and save return address in register

A.6 Compare Instructions

CKB, CKB2, FPCKB, I2CKB
bounds checking for array indexing
CPR, CPR2, UCPR, UCPR2
signed/unsigned compare register with operand
MCPR
masked compare register with operand (adjacent register selects bits)
CMZ, CMZ2
compare operand ("memory") to zero
STLEQ
store logical one ("1") iff condition bits = "EQ", else store zero
STLNE, STLLT, STLGT, STLLE, STLGE
as above for other conditions

A.7 Character Instructions

These instructions are interruptible, and deal with character strings whose starting address and length are given by register values. The CMOVE instruction loads and stores whole words and thus is quite efficient no matter what the character alignment might be.

CSRCH
search for a specified character in a specified string
CMS
compare strings (can be paired with CSRCH to search for substrings)
CMOVE
move string
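
The pairing of CSRCH with CMS for substring search can be sketched like this. The Python functions only mimic the roles of the two instructions on byte strings; the real instructions took their string addresses and lengths from registers, and their exact result conventions are not given in the summary.

```python
def csrch(string, ch, start=0):
    """Index of the first occurrence of byte `ch` at or after `start`, else -1."""
    for i in range(start, len(string)):
        if string[i] == ch:
            return i
    return -1


def cms(a, b):
    """Three-way string compare: negative, zero, or positive."""
    return (a > b) - (a < b)


def find_substring(haystack, needle):
    """Find `needle` by searching for its first byte, then comparing strings."""
    if not needle:
        return 0
    pos = csrch(haystack, needle[0])
    while pos != -1 and pos + len(needle) <= len(haystack):
        if cms(haystack[pos:pos + len(needle)], needle) == 0:
            return pos
        pos = csrch(haystack, needle[0], pos + 1)
    return -1
```

The search instruction skips quickly to each candidate position, and the compare instruction confirms or rejects it.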

A.8 Miscellaneous Instructions

LDPSR, STPSR
load/store Process Status Register
CLPSR, IORPSR, XORPSR
PSR bit manipulation
HIB, HIB2
find location of leftmost one-bit in operand
LEA, LEA2
load effective address (generate a pointer)
XCT
execute operand as if it were an instruction (one level only)
LSRCH
linked list search. Searches through a linked list of structures for a match between the value in a specified part of each structure and a value in a register (or register pair)
PMUT
(permute) Using a 32-word table, this instruction can permute bits in a register, encrypt data, compute parity, and form block checksums.
NOP
no operation
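
Two of these instructions are easy to model in a few lines. For HIB, numbering bit positions from 0 at the most significant bit and returning None for a zero operand are assumptions. For LSRCH, the node layout (a link word at a fixed offset, with 0 ending the list) and the parameter names are hypothetical.

```python
def hib(operand, bits=32):
    """Position of the leftmost one-bit, counted from the msb, or None."""
    operand &= (1 << bits) - 1
    if operand == 0:
        return None
    return bits - operand.bit_length()


def lsrch(memory, head, field_offset, key, link_offset=0):
    """Walk a linked list; return the first node whose field equals `key`."""
    node = head
    while node != 0:
        if memory[node + field_offset] == key:
            return node
        node = memory[node + link_offset]   # follow the link to the next node
    return 0                                # 0 stands for "not found" here
```

A usage example: with two-word nodes at addresses 10 and 20, where word 0 is the link and word 1 the value, lsrch(memory, 10, 1, key) returns the address of the node whose value matches.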

A.9 Address Modes

In addition to specifying a register, many instructions also specify an operand through an address mode field. Address mode parameters can in turn involve the specification of one or two registers used to arrive at an operand. Indirect addressing proceeds through "pointers", which themselves specify five different methods of addressing. The following summary is by class, with the number in parentheses representing the total number of modes in each major class. The distinction between single-word and double-word addressing (for word-size operands) is not considered in this count, since that distinction is made in the instruction operation-code field.

( 1) DIRECT
( 1) INDEXED
( 3) IMMEDIATE
( 5) INDIRECT
( 2) INDIRECT AND INDEXED (first indirect, then indexed)
( 1) REG1 (register select, with value biased)
( 1) ARWD1 (offset from base register)
( 1) CACH1 (offset to character from base register)
( 5) FPVR1 (offset from base register, then indirect)
( 1) REG2 (as in REG1, but indexed)
( 1) ARWD2 (offset from base register, then indexed)
( 1) CACH2 (offset from base register, then indexed to character)
( 2) FPVR2 (offset from base register, then indirect, then indexed)
( 1) CBM (circular bit-string mode)
( 1) ZBM (zig-zag bit-string mode)
( 1) STK (stack mode)
( 4) TCONV (type conversions: integer/floating-point, etc.)

Totals: 32 address modes through 17 classes
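
As a quick cross-check of the totals, the per-class counts from the list above can be summed. The dictionary below simply transcribes the list; the variable names are made up.

```python
# Number of address modes in each class, transcribed from the summary.
ADDRESS_MODE_CLASSES = {
    "DIRECT": 1, "INDEXED": 1, "IMMEDIATE": 3, "INDIRECT": 5,
    "INDIRECT AND INDEXED": 2,
    "REG1": 1, "ARWD1": 1, "CACH1": 1, "FPVR1": 5,
    "REG2": 1, "ARWD2": 1, "CACH2": 1, "FPVR2": 2,
    "CBM": 1, "ZBM": 1, "STK": 1, "TCONV": 4,
}

total_modes = sum(ADDRESS_MODE_CLASSES.values())
total_classes = len(ADDRESS_MODE_CLASSES)
```

The sums come to 32 modes in 17 classes, matching the stated totals.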

Trivia