Lessons from the Fukushima Nuclear Accident

Posted on March 14, 2011 by Ben Tomhave

Unless you’ve been living under a rock for the past week, you undoubtedly know that Japan was rocked a few days ago by a magnitude 8.9 earthquake (the 3rd largest in the past decade and in the top 10 overall; also check out the NYT’s before & after shots) and by a subsequent tsunami that greatly compounded the damage. Coming out of that incident, one of the most hyped “news” items has been the aftermath at the Fukushima nuclear power generation facility. It turns out (unsurprisingly) that much of this coverage has been faulty, inappropriately throwing around talk of “meltdowns” when, in fact, things are under control.

For a great, detailed description of the entire incident, check out Barry Brook’s post “Fukushima Nuclear Accident – a simple and accurate explanation” on the Brave New Climate blog. It’s an excellent discussion of the accident, and it highlights several pertinent points that apply directly to information security and information risk management. Also see this post, which corrects one inaccuracy in Brook’s post (there is not, in fact, a “core catcher” installed) and provides even greater assurance that things are well in hand.

Specifically, there are five takeaway points to consider:

True Defense in Depth Matters: There have been some knocks against “defense in depth” (DID) within infosec architecture over the past few years, but the fact of the matter is that most DID isn’t really DID. The article referenced above shows what true DID looks like, and it stresses the key objective: survivability. It’s time to re-evaluate enterprise architecture in light of true DID and survivability theory.
Threat Modeling Matters: Threat modeling is an essential part of the planning process. Failure to model worst-case scenarios will lead to unsurvivable conditions. Sure, even then you’ll miss some things, but you can model for that, too. Tools like FAIR and TARA can help analyze scenarios to better understand their reach and impact.
Risk Management Matters: There has been a lot of backlash against “risk management” of late, but mostly because people don’t understand it and/or don’t do it right. Risk management ties together threat modeling, DID, and strategic planning. You need to do what we here at Gemini do: Assess, then Architect, then Apply. Assessing your situation is imperative because it helps you make better-informed decisions. Architecting solutions means looking at alternatives and following sound engineering practices. Finally, you can apply what you’ve learned in the previous two steps and choose a remediation or deployment path that makes good business sense.
Pre-Incident Process Documentation Matters: Imagine the catastrophe that would have occurred if the nuclear engineers in Japan had been forced to make everything up as they went along. You will not make consistently good decisions under duress. It’s far better to have predefined processes in place that speed up the response without losing focus on the primary objective: surviving! Processes designed before an incident allow for cooler, more analytical thinking that keeps the focus on the right outcomes for the business.
Understanding Details of Your Environment Matters: If you don’t know your environment, then you won’t be able to make the best possible decisions. There’s a distinct difference between incomplete information and no information. True, you’ll always know something, but the number of gray areas should be systematically reduced until you reach an acceptable level of ambiguity. Unfortunately, very few organizations today can claim to know their environments well. Even when organizations do make that claim, bad things still happen (e.g., Bradley Manning leaking large amounts of classified data from what should have been a fairly well-secured and well-understood environment). Moreover, good luck writing decent response plans if you don’t know what sort of infrastructure you may be supporting during an incident.
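To make the threat modeling and risk management points more concrete, here is a rough sketch of a FAIR-style scenario analysis. It is a hypothetical illustration. It is not Gemini’s Assess/Architect/Apply process, and it is not a faithful implementation of FAIR. It keeps only FAIR’s top-level decomposition: loss event frequency is threat event frequency times vulnerability, and annual loss is loss event frequency times loss magnitude. For simplicity it uses triangular distributions for the minimum / most likely / maximum estimates. Every class name, scenario, and number is invented for the example.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass


@dataclass
class Estimate:
    """A range estimate (hypothetical helper): minimum, most likely, maximum."""
    low: float
    mode: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # random.Random.triangular takes (low, high, mode) in that order.
        return rng.triangular(self.low, self.high, self.mode)


@dataclass
class Scenario:
    name: str
    threat_event_frequency: Estimate  # threat events (attempts) per year
    vulnerability: Estimate           # probability an attempt becomes a loss event, 0..1
    loss_magnitude: Estimate          # cost of a single loss event


def simulate_annual_loss(s: Scenario, trials: int = 10_000, seed: int = 1) -> list[float]:
    """Monte Carlo estimate of annualized loss for one scenario."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    losses = []
    for _ in range(trials):
        # FAIR-style decomposition: LEF = TEF x vulnerability; loss = LEF x magnitude.
        # Simplification: LEF is treated as an expected rate rather than sampling
        # a discrete number of loss events per year.
        lef = s.threat_event_frequency.sample(rng) * s.vulnerability.sample(rng)
        losses.append(lef * s.loss_magnitude.sample(rng))
    return losses


def summarize(s: Scenario) -> dict:
    losses = simulate_annual_loss(s)
    return {
        "scenario": s.name,
        "mean": statistics.mean(losses),
        "p90": statistics.quantiles(losses, n=10)[-1],  # 90th percentile
        "max_seen": max(losses),
    }


if __name__ == "__main__":
    phishing = Scenario(
        name="credential phishing (hypothetical)",
        threat_event_frequency=Estimate(1, 4, 12),
        vulnerability=Estimate(0.05, 0.2, 0.5),
        loss_magnitude=Estimate(10_000, 50_000, 250_000),
    )
    for key, value in summarize(phishing).items():
        print(f"{key}: {value:,.0f}" if isinstance(value, float) else f"{key}: {value}")
```

One way to act on the advice to model worst-case scenarios is to run a worse variant of the same scenario alongside it, for example with a wider `loss_magnitude` range, and compare the tails (`p90`, `max_seen`) as well as the means.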

All of this points to one fundamental notion: once again, we need to focus on survivability, not on some false zero-sum notion of protecting assets. Sites will be attacked, sites will be compromised, apps will get broken, data will be leaked, and people will make mistakes. We know this and accept this. And yet we still see a mainstream approach that implies we can somehow stop all of these bad things from happening. Instead, our focus should be spread across protection, detection, containment, and correction.

We can prevent problems up to a point, but prevention is merely our first line of defense. Beyond that, we must be able to detect an incident as quickly as possible, which in turn mobilizes response capabilities to isolate and contain it. Containment allows operations to continue, albeit in a degraded state, until the problem can be corrected and operations fully restored. This is, in a nutshell, survivability.
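The protect, detect, contain, and correct sequence can be made concrete with a toy calculation. This sketch is a hypothetical illustration, not a method from this post. It makes two assumptions. First, prevention layers fail independently, so the chance of a breach is the product of each layer’s bypass probability. Second, losses accrue at a high hourly rate until containment and at a lower, degraded rate until correction. All names and numbers are made up.

```python
import math
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass
class Posture:
    """Hypothetical security posture; every field is an illustrative estimate."""
    bypass_probs: Sequence[float]  # chance an attack slips past each prevention layer
    hours_to_detect: float         # compromise -> detection
    hours_to_contain: float        # detection -> containment
    hours_to_correct: float        # containment -> full restoration
    loss_per_hour_open: float      # loss rate while the incident is uncontained
    loss_per_hour_degraded: float  # loss rate while contained but running degraded


def breach_probability(bypass_probs: Sequence[float]) -> float:
    # Assumes layers fail independently; correlated layers (shared credentials,
    # a common software flaw) make the real probability higher than this.
    return math.prod(bypass_probs)


def loss_if_breached(p: Posture) -> float:
    open_loss = (p.hours_to_detect + p.hours_to_contain) * p.loss_per_hour_open
    degraded_loss = p.hours_to_correct * p.loss_per_hour_degraded
    return open_loss + degraded_loss


def expected_loss_per_attack(p: Posture) -> float:
    return breach_probability(p.bypass_probs) * loss_if_breached(p)


if __name__ == "__main__":
    slow = Posture(bypass_probs=[0.5, 0.2], hours_to_detect=10, hours_to_contain=2,
                   hours_to_correct=24, loss_per_hour_open=1000,
                   loss_per_hour_degraded=100)
    # Same prevention layers; only detection gets faster.
    fast = replace(slow, hours_to_detect=1)
    print(f"slow detection: {expected_loss_per_attack(slow):,.0f}")
    print(f"fast detection: {expected_loss_per_attack(fast):,.0f}")
```

With these made-up numbers, cutting detection time from 10 hours to 1 hour reduces expected loss per attack from 1,440 to 540 without touching prevention at all. That is the survivability argument in miniature.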