The Therac-25: The Software Bug That Irradiated Patients to Death
Summary
When Atomic Energy of Canada Limited (AECL) brought the Therac-25 medical linear accelerator to market in the early 1980s, it sold a machine whose safety had been quietly relocated from steel to software, and the gap between that promise and the harm was eventually measured in burned tissue and dead patients. Earlier models, the Therac-6 and Therac-20, had retained electromechanical interlocks: physical hardware that blocked the high-current beam unless the X-ray target and beam-flattening apparatus were correctly in place. The Therac-25 removed those interlocks to cut cost and add flexibility, trusting reused, largely single-author, never independently reviewed control code to keep the two beam modes from being confused: a low-current electron mode and an X-ray (photon) mode that drives the accelerator at roughly 100 times the beam current. They were confused. Between June 1985 and January 1987, six patients received massive overdoses; at least three died.
The lethal mechanism was a race condition. If an operator at the VT-100 terminal entered the prescription, then within roughly eight seconds used the cursor keys to change the beam mode from X-ray to electron and pressed Enter, a fast typist could outrun the software's set-up routine. The machine's internal state and its physical hardware fell out of sync: the console showed a safe electron treatment while the accelerator fired at X-ray-level beam current with no target or flattening filter in the path, a needle of radiation on the order of 15,000 to 25,000 rad against a prescription of roughly 200. A second, independent defect could disable a safety check entirely: a one-byte counter wrapped around to zero on every 256th increment, and if the operator issued a command at that instant, the check it guarded was skipped. Both bugs were dormant most of the time, which is precisely why they were so dangerous.
For nineteen months AECL insisted the machine could not overdose. After the first injuries the company told hospitals the Therac-25 was incapable of the harm being reported, and could not reproduce the fault in its own facility because its engineers did not type the way an experienced therapist did. The reckoning came not from AECL but from a Tyler, Texas medical physicist, Fritz Hager, who painstakingly reproduced the malfunction, and from the U.S. Food and Drug Administration, which on May 2, 1986 declared the Therac-25 defective under the Radiation Control for Health and Safety Act and required corrective action plans before the machines could resume routine use. The case became — through Nancy Leveson and Clark Turner's 1993 IEEE Computer investigation — the founding text of software-safety engineering: the canonical proof that a computer can be a murder weapon when its makers treat code as inherently safer than the hardware it replaced.
Timeline
Safety, Relocated From Steel to Software
The Therac-25's original sin was an architectural decision that looked like an upgrade. Its predecessors carried independent electromechanical interlocks: physical fuses, microswitches, and mechanical stops that would not let the machine fire at high beam current unless the X-ray target and beam-flattening hardware were confirmed in the path. These interlocks were dumb, slow, and effective: they did not depend on the correctness of any program. The Therac-25 removed them to reduce cost and give the software full control over beam shaping and energy. AECL then reused control software from the earlier models, written largely by a single programmer, never independently reviewed, and never subjected to the kind of testing the new safety burden demanded. The company's own documentation reflected confidence that the software was reliable; its hazard analysis effectively assumed software did not fail. In a machine that could now, with no mechanical veto, run at roughly a hundred times the beam current of an electron treatment, that assumption was the precondition for every death that followed.
The Race Condition and Nineteen Months of Denial
The defect that killed was a timing window. A therapist would enter the prescription on the console, notice the beam mode was wrong, and use the cursor keys to change X (photon) to E (electron), pressing Enter to confirm. If this correction was completed within roughly eight seconds, a speed only an experienced operator achieved, the editing routine and the set-up routine raced, and the machine's recorded state diverged from the hardware's physical configuration. The console displayed a normal, safe electron treatment; the accelerator fired at full X-ray-level beam current with no target or flattening filter in place to convert and spread the beam. Patients described an intense electric shock or burning at the instant of exposure; the dosimetry was so far off scale that operators were sometimes told the machine had delivered no dose at all and were tempted to re-treat. AECL could not reproduce the fault because its engineers typed deliberately, and so for nineteen months it told hospitals the reported overdoses were impossible. The signal (burned patients, screaming patients, dead patients) existed in the field long before the manufacturer would admit the mechanism existed in its code.
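The timing window can be illustrated with a deliberately simplified, deterministic model. This is a sketch under stated assumptions, not AECL's code: the names `Machine`, `run_setup`, and `recheck_edits` are hypothetical, and the split in which beam current follows a snapshot taken when set-up begins while the target position follows the latest console entry is a modelling assumption chosen to reproduce the divergence described above.

```python
from dataclasses import dataclass
from typing import Optional

SETUP_SECONDS = 8.0  # approximate length of the set-up window, per the text


@dataclass
class Machine:
    console_mode: str     # what the operator sees: "X" (X-ray) or "E" (electron)
    beam_current: str     # "high" for X-ray production, "low" for electron
    target_in_path: bool  # True when the X-ray target/flattener is in the beam

    @property
    def hazardous(self) -> bool:
        # High current with nothing in the path is the overdose configuration.
        return self.beam_current == "high" and not self.target_in_path


def run_setup(initial_mode: str,
              edit_mode: Optional[str] = None,
              edit_at: float = 0.0,
              recheck_edits: bool = False) -> Machine:
    """Simulate entering a prescription, then optionally editing the mode.

    edit_at is the time in seconds after set-up began at which the operator
    confirmed the edit. recheck_edits models a hypothetical set-up routine
    that keeps re-reading the console for the whole window.
    """
    applied_mode = initial_mode   # snapshot taken when set-up begins
    console_mode = initial_mode

    if edit_mode is not None:
        console_mode = edit_mode  # the screen updates immediately
        edit_seen = recheck_edits or edit_at >= SETUP_SECONDS
        if edit_seen:
            applied_mode = edit_mode

    return Machine(
        console_mode=console_mode,
        beam_current="high" if applied_mode == "X" else "low",
        target_in_path=(console_mode == "X"),
    )


slow = run_setup("X", edit_mode="E", edit_at=12.0)
fast = run_setup("X", edit_mode="E", edit_at=3.0)
fixed = run_setup("X", edit_mode="E", edit_at=3.0, recheck_edits=True)
print(slow.hazardous, fast.hazardous, fixed.hazardous)
```

In this model only the fast edit leaves the machine in the hazardous configuration, while the console still shows "E". That mirrors why deliberate typing in a test lab would never trigger the fault.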
One Physicist, One Agency, and the Recall
The reckoning arrived through individuals, not the manufacturer. At the East Texas Cancer Center, after two patients on the same machine were maimed in three weeks, physicist Fritz Hager refused AECL's assurance that overdose was impossible. Working with the therapist, he reconstructed the exact keystroke sequence and timing that produced the cryptic "Malfunction 54" error, demonstrating a repeatable, lethal fault that AECL's own engineers had been unable to find. The documented reproduction is what forced the issue. On May 2, 1986 the FDA declared the Therac-25 defective under the Radiation Control for Health and Safety Act, ordered AECL to warn every user, and demanded a corrective action plan; the device could not return to routine service until the agency was satisfied. AECL's first plan was rejected as inadequate, and it took multiple revisions — and a sixth overdose at Yakima in January 1987 that exposed a separate latent counter-overflow bug — before the FDA accepted a fix that restored hardware interlocks and eliminated the unsafe editing path. AECL eventually withdrew from the linear-accelerator market entirely.
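The counter-overflow defect behind the Yakima overdose can be sketched in a few lines. The following is an illustration only, assuming a one-byte flag that is incremented on each pass and treated as "skip the check" when it equals zero; the function name `run_checks` and the `fixed_flag` parameter are hypothetical and do not come from the Therac-25 source.

```python
def run_checks(passes: int, fixed_flag: bool = False) -> int:
    """Return how many passes skipped the safety check.

    A nonzero flag means "verify the hardware position"; zero means "skip".
    """
    flag = 0
    skipped = 0
    for _ in range(passes):
        if fixed_flag:
            flag = 1                  # assign a constant nonzero value
        else:
            flag = (flag + 1) & 0xFF  # one-byte increment: 255 + 1 wraps to 0
        if flag == 0:
            skipped += 1              # the check is silently bypassed
    return skipped


print(run_checks(1024))                   # incrementing flag
print(run_checks(1024, fixed_flag=True))  # constant flag
```

With the incrementing flag, one pass in every 256 skips the check; with a constant nonzero value, none do. A fault that appears on well under one percent of passes, and only matters if a command arrives on exactly that pass, is very unlikely to surface in ordinary testing.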
Aftermath
The immediate consequence was a recall and a manufacturer's exit: AECL's corrective action plan ultimately reinstated the hardware interlocks it had removed and closed the unsafe editing path, and the company withdrew from the medical-accelerator business that the Therac-25 had been built to win. The durable ripple was intellectual. Nancy Leveson and Clark Turner's July 1993 investigation in IEEE Computer, twenty-four pages assembled from FDA records, depositions, and AECL correspondence, became the foundational case study of an entire discipline and has been taught widely in software-engineering and engineering-ethics courses ever since. From it came the now-axiomatic principles that safety is a system property and not a software property; that you cannot test reliability into a safety-critical system; that overconfidence in software is itself a hazard; and that user reports of impossible failures must be investigated as if they were possible. What remains is a single proper noun used as a warning. To invoke "the Therac-25" is to name the moment a computer killed people because its makers believed the code was safer than the steel it replaced: the byword for the race condition, the removed interlock, and the manufacturer that would not listen.
Lessons
- Never remove a hardware fail-safe in favor of software without raising the software to a far higher assurance standard; a dumb mechanical interlock that cannot be argued with is worth more than elegant code you merely trust.
- Treat reused or inherited code as unproven in its new context — bugs that were harmless because old hardware masked them become lethal the instant that hardware is gone.
- Assume concurrency and timing faults exist below the resolution of your test scripts; design so that a race cannot place the machine in an unsafe physical state, because you will not reliably reproduce the bug before a user does.
- When the field reports a failure your model says is impossible, the model is the suspect — investigate every "impossible" report as a live hazard rather than reassuring the customer it cannot happen.
- Do not let a regulator's silence read as a safety verdict; absence of pre-market scrutiny is a gap to be filled by your own hazard analysis, not a permission to ship.
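The first and third lessons can be made concrete with a small sketch of a fail-safe permit check. It is an illustration under assumptions, not a substitute for a physical interlock: the names `Turntable` and `permit_beam` are hypothetical, and the point is only that the decision uses the sensed hardware position rather than whatever the console believes.

```python
from enum import Enum


class Turntable(Enum):
    ELECTRON = "electron"  # scanning hardware in the beam path
    XRAY = "xray"          # target and flattening filter in the beam path
    UNKNOWN = "unknown"    # sensor fault or turntable between positions


def permit_beam(requested_current: str, sensed_position: Turntable) -> bool:
    """Allow the beam only when current and sensed position agree.

    Any unrecognised or inconsistent input denies the beam (fail-safe).
    """
    if sensed_position is Turntable.UNKNOWN:
        return False
    if requested_current == "high":
        return sensed_position is Turntable.XRAY
    if requested_current == "low":
        return sensed_position is Turntable.ELECTRON
    return False  # unknown current setting: refuse by default


print(permit_beam("high", Turntable.ELECTRON))  # the overdose configuration
print(permit_beam("low", Turntable.ELECTRON))   # a normal electron treatment
```

Because the function defaults to refusal, a race elsewhere in the software can at worst produce a denied treatment, not an unsafe one. A mechanical interlock provides the same veto without depending on this code being correct.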
References
- "Therac-25", Wikipedia.
- Nancy G. Leveson and Clark S. Turner, "An Investigation of the Therac-25 Accidents", IEEE Computer 26(7):18–41, July 1993.
- "Notorious software bug was killing people 40 years ago: at least three people died after radiation doses 100x too strong from the Therac-25", Tom's Hardware.
- Nancy G. Leveson and Clark S. Turner, "An Investigation of the Therac-25 Accidents" (full text, PDF, Stanford-hosted), IEEE Computer, 1993.
- "Fatal Dose: Radiation Deaths linked to AECL Computer Errors", Canadian Coalition for Nuclear Responsibility.