DV-015 radiation oncology 1986

The Therac-25: The Software Bug That Irradiated Patients to Death

Units installed
11 (U.S. and Canada); 6 known overdose accidents
Failure or harm
≥3 deaths; doses ~15,000–25,000 rad (~100x intended)
In use
~1982–1987 (accidents Jun 1985 – Jan 1987)
Status
Recalled (Class I)

Summary

When Atomic Energy of Canada Limited (AECL) shipped the Therac-25 medical linear accelerator in the early 1980s, it marketed a machine whose safety it had quietly relocated from steel to software, and the gap between that promise and the harm was eventually measured in burned tissue and dead patients. Earlier models, the Therac-6 and Therac-20, had retained electromechanical interlocks: physical hardware that blocked the high-power photon beam unless the beam-spreading and flattening apparatus was correctly in place. The Therac-25 deleted those interlocks to cut cost and add flexibility, trusting reused, single-author, unreviewed control code to keep the two beam modes (low-current electron and roughly 100x-stronger raw photon) from being confused. They were confused. Between June 1985 and January 1987, six patients received massive overdoses; at least three died.

The lethal mechanism was a race condition, not a melodrama. If an operator at the VT-100 terminal entered the prescription and then, within roughly eight seconds, used the cursor to change the beam mode from X-ray to electron and pressed Enter, the edit could outrun the software's set-up routine. The machine's internal state and its physical hardware fell out of sync: the console believed it was delivering a safe electron dose while the accelerator fired an unattenuated photon beam with no spreader in place, a needle of radiation on the order of 15,000 to 25,000 rad against a prescription of roughly 200. A second, independent defect could disable a safety check entirely: a one-byte variable was incremented on each pass instead of being set to a fixed value, so it rolled over to zero on every 256th pass, and an operator command arriving at that instant slipped past the check. Both bugs were dormant most of the time, which is precisely why they were so dangerous.

For nineteen months AECL insisted the machine could not overdose. After the first injuries the company told hospitals the Therac-25 was incapable of the harm being reported, and it could not reproduce the fault in its own facility because its engineers did not type the way an experienced therapist did. The reckoning came not from AECL but from a Tyler, Texas, medical physicist, Fritz Hager, who painstakingly reproduced the malfunction, and from the U.S. Food and Drug Administration, which on May 2, 1986, declared the Therac-25 defective under the Radiation Control for Health and Safety Act and required corrective action plans before the machines could resume routine use. Through Nancy Leveson and Clark Turner's 1993 investigation in IEEE Computer, the case became the founding text of software-safety engineering: the canonical proof that a computer can be a murder weapon when its makers treat code as inherently safer than the hardware it replaced.

Timeline

~1982
The Therac-25 ships
AECL markets a dual-mode (electron/photon) linear accelerator that, unlike the Therac-6 and Therac-20, omits independent electromechanical interlocks and enforces beam safety in software.
Jun 3, 1985
Marietta, Georgia
Katherine ("Katie") Yarbrough receives an estimated 15,000–20,000 rad against a prescribed ~200 rad at Kennestone Regional Oncology Center; she survives with a severe burn. She later sues AECL and the hospital.
Jul 26, 1985
Hamilton, Ontario
A patient at the Ontario Cancer Foundation is overdosed (~13,000–17,000 rad). AECL is notified; the patient dies November 3, 1985, of cancer, though the autopsy finds radiation damage severe enough that a hip replacement would have been needed had she lived.
Dec 1985
Yakima, Washington
A patient at Yakima Valley Memorial Hospital suffers a striated radiation burn; AECL attributes it to other causes and denies an overdose is possible.
Mar 21, 1986
Tyler, Texas — first fatal accident
Voyne Ray Cox receives ~16,500–25,000 rad to the back in under a second at the East Texas Cancer Center; he reports a searing electric shock and dies September 20, 1986.
Apr 11, 1986
Tyler, Texas — second accident
Verdon Kidd, 66, is overdosed during facial treatment on the same machine; he dies May 1, 1986 — the first death from a therapeutic radiation accident in North America.
Apr–May 1986
The physicist reproduces the bug
East Texas physicist Fritz Hager works with the therapist to recreate the editing sequence that triggers the malfunction (Malfunction 54), isolating the race condition AECL had failed to find.
May 2, 1986
FDA declares the device defective
Under the Radiation Control for Health and Safety Act, the FDA requires AECL to notify users and submit a Corrective Action Plan; routine operation is to be suspended pending fixes.
Jun 13, 1986
First CAP submitted
AECL files an initial corrective action plan proposing hardware and software changes; the FDA finds it inadequate, and multiple revisions follow into 1987.
Jan 17, 1987
Yakima, Washington — sixth accident
A second Yakima patient receives ~8,000–10,000 rad against a prescribed ~86 rad, exposing a different latent bug (the counter overflow); he dies in April 1987.
1987
Final corrective action and exit
The FDA approves AECL's revised CAP, which restores hardware interlocks and removes the unsafe editing path; AECL ultimately leaves the medical-accelerator business.
Jul 1993
Leveson & Turner's investigation
IEEE Computer publishes the definitive 24-page analysis, fixing the Therac-25 as the founding case study in software safety.

Safety, Relocated From Steel to Software

The Therac-25's original sin was an architectural decision that looked like an upgrade. Its predecessors carried independent electromechanical interlocks — physical fuses, microswitches, and mechanical stops that would not let the machine fire a high-energy photon beam unless the beam-flattening hardware was confirmed in the path. These interlocks were dumb, slow, and effective: they did not depend on the correctness of any program. The Therac-25 removed them to reduce cost and grant the software full control over beam shaping and energy. AECL then reused control software written largely by a single programmer, never independently reviewed, never subjected to the kind of testing the new safety burden demanded. The company's own documentation reflected confidence that the software was reliable; its hazard analysis effectively assumed software did not fail. In a machine that could now generate, with no mechanical veto, a photon beam roughly a hundred times stronger than the prescribed electron dose, that assumption was the precondition for every death that followed.

The Race Condition and Nineteen Months of Denial

The defect that killed was a timing window. A therapist would enter the prescription on the console, notice the beam mode was wrong, and use the cursor keys to change X (photon) to E (electron), pressing Enter to confirm. If this correction was completed within roughly eight seconds — a speed only an experienced operator achieved — the editing routine and the set-up routine raced, and the machine's recorded state diverged from the hardware's physical configuration. The console displayed a normal, safe electron treatment; the accelerator delivered a raw, unspread photon beam at full power. Patients described an intense electric shock or burning at the instant of exposure; the dosimetry was so far off scale that operators were sometimes told the machine had delivered no dose at all and were tempted to re-treat. AECL could not reproduce the fault because its engineers typed deliberately, and so for nineteen months it told hospitals the overdoses being reported were impossible. The signal — burned patients, screaming patients, dead patients — existed in the field long before the manufacturer would admit the mechanism existed in its code.
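The timing window can be illustrated with a deliberately simplified, deterministic model. This is a sketch and not AECL's code: the `Machine` class, the `run_session` function, the `recheck` mitigation, and the idea that set-up "latches" the mode when it starts are all illustrative assumptions. Only the X-to-E edit and the roughly eight-second window come from the account above.

```python
from dataclasses import dataclass
from typing import Optional

SETUP_SECONDS = 8.0  # approximate set-up window from the account above


@dataclass
class Machine:
    console_mode: str   # mode shown on the operator's screen: "X" or "E"
    hardware_mode: str  # mode the beam hardware is actually configured for


def run_session(edit_at: Optional[float], recheck: bool) -> Machine:
    """Model one treatment entry with an optional X -> E edit.

    edit_at: seconds after set-up starts when the operator finishes the edit
             (None means no edit). Hypothetical parameter.
    recheck: if True, compare console and hardware after set-up and redo
             set-up on a mismatch (one possible mitigation, assumed here).
    """
    # The operator first enters "X"; set-up starts and latches that value.
    machine = Machine(console_mode="X", hardware_mode="")
    latched_mode = machine.console_mode

    if edit_at is not None:
        # The edit always updates what the console displays.
        machine.console_mode = "E"

    if edit_at is not None and edit_at >= SETUP_SECONDS:
        # Slow edit: set-up had already finished, so the edit starts a fresh one.
        machine.hardware_mode = machine.console_mode
    else:
        # No edit, or a fast edit that the running set-up never notices.
        machine.hardware_mode = latched_mode

    if recheck and machine.hardware_mode != machine.console_mode:
        # Mitigation: never proceed on a mismatch; configure again.
        machine.hardware_mode = machine.console_mode

    return machine


for label, seconds in [("fast edit", 3.0), ("slow edit", 12.0)]:
    m = run_session(edit_at=seconds, recheck=False)
    print(label, "console:", m.console_mode, "hardware:", m.hardware_mode)
```

In this model only the fast edit leaves the console and the hardware disagreeing, which mirrors why deliberate typing in AECL's facility never exposed the fault.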

One Physicist, One Agency, and the Recall

The reckoning arrived through individuals, not the manufacturer. At the East Texas Cancer Center, after two patients on the same machine were maimed in three weeks, physicist Fritz Hager refused AECL's assurance that overdose was impossible. Working with the therapist, he reconstructed the exact keystroke sequence and timing that produced the cryptic "Malfunction 54" error, demonstrating a repeatable, lethal fault that AECL's own engineers had been unable to find. The documented reproduction is what forced the issue. On May 2, 1986 the FDA declared the Therac-25 defective under the Radiation Control for Health and Safety Act, ordered AECL to warn every user, and demanded a corrective action plan; the device could not return to routine service until the agency was satisfied. AECL's first plan was rejected as inadequate, and it took multiple revisions — and a sixth overdose at Yakima in January 1987 that exposed a separate latent counter-overflow bug — before the FDA accepted a fix that restored hardware interlocks and eliminated the unsafe editing path. AECL eventually withdrew from the linear-accelerator market entirely.

Contributing Factors

01
Software substituted for hardware interlocks without commensurate rigor
The Therac-25 deleted the independent electromechanical safeguards present in the Therac-6 and Therac-20 and moved beam safety entirely into code, but applied no additional verification to compensate. Removing a dumb, fail-safe mechanical veto in favor of complex software is only acceptable if the software is held to a far higher assurance standard; AECL inverted that logic, trusting the more fragile system more.
02
A single, unreviewed author and reused legacy code
The control software was written largely by one programmer, never subjected to independent code review, and partly inherited from earlier machines whose hardware had masked its latent defects. The same bugs had existed in the Therac-20, where physical interlocks rendered them harmless; ported into an interlock-free machine, dormant faults became fatal.
03
Concurrency defects invisible to ordinary testing
Both lethal bugs, the eight-second editing race and the one-byte counter that overflowed to zero, depended on rare timing coincidences that surfaced only under fast, real-world operator behavior. Slow, deliberate test entry was very unlikely to hit either window, which is why AECL could not reproduce the faults and why end users became the unwitting test population.
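The one-byte counter defect is easy to demonstrate in miniature. The sketch below assumes, as a simplification, that a zero value means "skip the safety check" and that the variable is touched once per pass. The function name and the `increment` switch are hypothetical, and the `& 0xFF` mask stands in for an 8-bit variable because Python integers do not overflow.

```python
from typing import List

BYTE_MASK = 0xFF  # emulate an 8-bit unsigned variable; Python ints never overflow


def passes_that_skip_check(n_passes: int, increment: bool) -> List[int]:
    """Return the pass numbers on which the flag reads zero (check skipped).

    increment=True  models the defect: the flag is incremented every pass.
    increment=False models a fix: the flag is set to a fixed nonzero value.
    """
    flag = 0
    skipped = []
    for pass_number in range(1, n_passes + 1):
        if increment:
            flag = (flag + 1) & BYTE_MASK  # wraps from 255 back to 0
        else:
            flag = 1  # always nonzero, so the check always runs
        if flag == 0:
            skipped.append(pass_number)  # safety check bypassed on this pass
    return skipped


print(passes_that_skip_check(600, increment=True))
print(passes_that_skip_check(600, increment=False))
```

With incrementing, the check is bypassed on one pass in every 256, a window small enough to survive casual testing yet certain to be hit eventually in clinical use.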
04
A nineteen-month refusal to believe the field
AECL responded to early injury reports by asserting overdose was impossible and attributing burns to other causes, rather than treating each report as evidence of an unknown hazard. The institutional failure was not the bug but the disbelief: a manufacturer that dismissed clinicians' and patients' direct testimony for over a year while the same machine kept firing.
05
A reactive, post-harm regulatory posture
No pre-market scrutiny caught the architectural decision to remove interlocks; the FDA acted only after multiple deaths, declaring the device defective in May 1986 under radiation-control authority. Oversight arrived as autopsy rather than prevention, and even then the fix required several rejected corrective plans and one further fatal accident before it was adequate.

Aftermath

The immediate consequence was a recall and a manufacturer's exit: AECL's corrective action plan ultimately reinstated the hardware interlocks it had removed and closed the unsafe editing path, and the company withdrew from the medical-accelerator business that the Therac-25 had been built to win. The durable ripple was intellectual. Nancy Leveson and Clark Turner's July 1993 investigation in IEEE Computer, twenty-four pages assembled from FDA records, depositions, and AECL correspondence, became the foundational case study of an entire discipline, widely taught in software-engineering and engineering-ethics curricula since. From it came the now-axiomatic principles that safety is a system property and not a software property; that you cannot test reliability into a safety-critical system; that overconfidence in software is itself a hazard; and that user reports of impossible failures must be investigated as if they were possible. What remains is a single proper noun used as a warning. To invoke "the Therac-25" is to name the moment a computer killed people because its makers believed the code was safer than the steel it replaced: the byword for the race condition, the removed interlock, and the manufacturer that would not listen.

Lessons

  1. Never remove a hardware fail-safe in favor of software without raising the software to a far higher assurance standard; a dumb mechanical interlock that cannot be argued with is worth more than elegant code you merely trust.
  2. Treat reused or inherited code as unproven in its new context — bugs that were harmless because old hardware masked them become lethal the instant that hardware is gone.
  3. Assume concurrency and timing faults exist below the resolution of your test scripts; design so that a race cannot place the machine in an unsafe physical state, because you will not reliably reproduce the bug before a user does.
  4. When the field reports a failure your model says is impossible, the model is the suspect — investigate every "impossible" report as a live hazard rather than reassuring the customer it cannot happen.
  5. Do not let a regulator's silence read as a safety verdict; absence of pre-market scrutiny is a gap to be filled by your own hazard analysis, not a permission to ship.
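Lesson 3 can be made concrete with a small sketch of a final gate that trusts sensed hardware state over the software's own records. Everything named here (`Prescription`, `SensedState`, `authorize_beam`, `InterlockError`) is hypothetical, and the sketch assumes the sensed values come from independent position and mode sensors read immediately before firing. Per lesson 1, a check like this would complement a hardware interlock and not replace one.

```python
from dataclasses import dataclass


class InterlockError(RuntimeError):
    """Raised when the sensed hardware state does not match the prescription."""


@dataclass(frozen=True)
class Prescription:
    mode: str  # "E" (electron) or "X" (photon)


@dataclass(frozen=True)
class SensedState:
    beam_mode: str          # mode read back from the accelerator itself
    accessory_in_path: str  # beam-shaping hardware read back from sensors


def authorize_beam(rx: Prescription, sensed: SensedState) -> None:
    """Refuse to fire unless every sensed value matches the prescription.

    The comparison uses values read back from the hardware, not the values
    the software believes it commanded, so a stale internal record left by
    a race cannot authorize a beam.
    """
    mismatches = []
    if sensed.beam_mode != rx.mode:
        mismatches.append(
            f"beam mode is {sensed.beam_mode}, prescribed {rx.mode}"
        )
    if sensed.accessory_in_path != rx.mode:
        mismatches.append(
            f"accessory in path is {sensed.accessory_in_path}, prescribed {rx.mode}"
        )
    if mismatches:
        raise InterlockError("; ".join(mismatches))


# Matching state passes silently; a mismatch refuses to fire.
authorize_beam(Prescription("E"), SensedState("E", "E"))
try:
    authorize_beam(Prescription("E"), SensedState("X", "E"))
except InterlockError as err:
    print("beam refused:", err)
```

The design choice is that the gate fails closed: any disagreement raises an error, so whatever ordering a race produces, the outcome is a refused treatment and not an unsafe beam.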

References