Computer History Of The World, Part 9

by Bob Seidel

(It's been a while since I continued with my computer history series, so let's get back on track.)

I think that I have had a bit of a mental blockage in writing about my next few years at IBM. Although the work was technically challenging and rewarding, there were other aspects of the job that were less than pleasant. In fact, they were pretty bad.

The time was the early '80s, and I had relocated to IBM Poughkeepsie (NY) - the home of giant mainframe computer development. This effort required literally tens of thousands of people across three sites in the Mid-Hudson Valley, and of course I was a small cog in that operation. But I did have a fairly important and interesting area - Recovery. Very few people (especially those only used to today's PCs) realize that the mainframe computers of that day (the IBM 308x and 3090 families) were essentially self-diagnosing and self-repairing. Oh, a human being had to actually plug in the replacement part, but that was only after the computer itself diagnosed the problem, decided what needed replacing, and dialed up the support center to request the repair. It worked like this:

There were a number of independent and redundant elements in those mainframe computers - more than one processor, more than one memory bank, more than one I/O (Input/Output) channel, etc. Each element was built with very advanced circuitry that detected errors. When an error was detected, the element would stop (but not the entire computer) and the service processor was notified. The service processor (a separate computer within the mainframe) was able to literally look at the state of all the circuitry in the failing element. A log of the error was recorded, and then, using that state information, the service processor would recover the element from the error. For example, if there was a parity error (parity is a simple check code that detects errors), the service processor could re-write the logic with the corrected parity. When the element was recovered, it was put back on-line.
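
To make the parity idea concrete, here is a minimal Python sketch. It assumes even parity over an 8-bit value, and the function names are purely illustrative; nothing here is taken from the actual IBM hardware, which did this checking in circuitry rather than software.

```python
def parity_bit(value: int) -> int:
    """Return the even-parity bit for an 8-bit value.

    The bit is 1 when the value holds an odd number of 1 bits, so that
    the value plus its parity bit always contain an even number of 1s.
    """
    return bin(value & 0xFF).count("1") % 2


def has_parity_error(value: int, stored_parity: int) -> bool:
    """Report whether the value no longer matches the parity stored with it."""
    return parity_bit(value) != stored_parity


def flip_bit(value: int, position: int) -> int:
    """Simulate a hardware fault by inverting one bit (illustrative helper)."""
    return value ^ (1 << position)


# Store a byte together with its parity bit, then damage one bit.
original = 0b10110010
stored = parity_bit(original)
damaged = flip_bit(original, 3)

print(has_parity_error(original, stored))  # intact data
print(has_parity_error(damaged, stored))   # single-bit fault
```

A single parity bit only reveals that an odd number of bits changed. It cannot say which bit is wrong, and two flipped bits cancel out and go unnoticed.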

Only if an element failed again during recovery, or failed again within a specified period of time, was it considered permanently bad and taken off-line for good. While this recovery was going on, the service processor also ran an analysis of the failure data and recommended whether a part should be replaced. If so, the service processor dialed up a connection to a support center and scheduled a repair. All of this was done without human intervention, and the customer would never notice a thing unless the entire computer had to be removed from service.
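
The retry rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-hour window, the `Element` class, and the `handle_error` function are all hypothetical names and values chosen for the example, not the real service processor logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical value: the column only says "a specified period of time".
RETRY_WINDOW_SECONDS = 3600.0


@dataclass
class Element:
    """One redundant unit, such as a processor or an I/O channel."""
    name: str
    online: bool = True
    permanently_failed: bool = False
    last_failure_time: Optional[float] = None


def handle_error(element: Element, now: float, recovery_succeeded: bool,
                 log: List[Tuple[float, str]]) -> str:
    """Apply the retry rule to one detected error.

    `recovery_succeeded` stands in for the real recovery work, which
    this sketch does not attempt to model.
    """
    element.online = False            # only the failing element stops
    log.append((now, element.name))   # record the error first

    repeat_failure = (
        element.last_failure_time is not None
        and now - element.last_failure_time <= RETRY_WINDOW_SECONDS
    )
    element.last_failure_time = now

    if repeat_failure or not recovery_succeeded:
        element.permanently_failed = True
        return "taken off-line"

    element.online = True             # recovered, so back in service
    return "recovered"


log: List[Tuple[float, str]] = []
cpu = Element("cpu0")
print(handle_error(cpu, 0.0, True, log))    # first error, recovery works
print(handle_error(cpu, 10.0, True, log))   # fails again inside the window
```

The decision to request a replacement part is left out on purpose, since the column describes it as a separate analysis that ran alongside recovery.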

Now, this is pretty techie stuff, and I am sorry about that. For a techie like me it was (or could have been) heaven. But wait.

The head of mainframe development at that time was, quite frankly, a tyrant. Now I really do have ambivalent feelings about him. I don't know of anyone else who could have done the job he did, and he was certainly responsible for a huge amount of revenue to the IBM Corporation. But he ran his shop with an iron hand. We had management meetings all day long. The first meeting was shortly after the start of the workday - to look at problems discovered in testing the prior night (testing ran 24x7) and to prioritize personnel and equipment to look at them. We then had a mid-day meeting to gauge progress, and after normal work hours came the day's final status meeting, with the head himself. There were also informal meetings held during the day, so that we managers could have the right story to tell at the formal meetings. It was not fun, folks. The only thing that kept a lot of us going was the knowledge that what we did was so significant to the corporation.

And this went on for about three years for me. At the end of the project, I took the opportunity to depart from large mainframe development and get into something new - supercomputing.

(Bob Seidel is a local computer consultant in the Southport / Oak Island area. You can visit his website at www.bobseidel.com or e-mail him at bsc@bobseidel.com.)