We all spend a large amount of time on the roads. Often when I’m on my bike I wonder to myself, what are the chances of me getting in an accident? If I get in an accident, how likely am I to survive? What sort of injuries will I suffer? This got me thinking about the distribution of accident severity.

Most car accidents are rather harmless – a mere dent, scratched paint. Fewer collisions cause mild injuries, even fewer result in serious injuries, and only a few send you to the morgue. How is the severity distributed? If, for example, it is a Gaussian distribution, then the number of minor events overshadows the serious events completely. If you get into an accident – chances are you will be alright. It will take many accidents – most of them minor and resulting in only broken bones – before you meet your demise by a stampeding truck.

But why should we believe that the distribution is Gaussian? Gaussian distributions occur naturally when you add up many independent random numbers, and we have the feeling that our chances of getting in an accident are “random”, so it might make sense. But this is just an illusion. Much like stocks, which we also consider to be “random” but are actually moved by many people with individual wills, wants, and other irrationalities – accidents are caused in part by drunkenness, foolishness, recklessness – human traits which are not random. I suspect that accident severity behaves like many other phenomena in the world – according to a power law.

In order to verify this, I ran a check against the data from the exhaustive report on traffic accident statistics by Israel’s Road Safety Authority (RSA).

The first problem we encounter is how to rank accidents by severity. Should I give broken limbs a score of 3, while giving injuries that require surgery a score of 6? How many points should paralyzed victims get? It turns out that there is such a formal ranking, called the Injury Severity Score (ISS). Unfortunately, most of the data did not contain ISS rankings, as the report is usually more interested in “death or serious injury”. Further, to me such a score seems a bit arbitrary, although good enough to work with if it’s available.

However, not all is lost: the last section of the report compares the police logs against the medical records, and these contain interesting data. The purpose of the section is to compare how severe the police thought an accident was when they reported it with how severe the hospital staff thought it was – it turns out there is often a difference between the two opinions.

Let us turn to the data.

The table I used looks at all the cases that the police defined as “suffering light injuries”, but where in fact the victim had to stay at the hospital for over 24 hours. The table provides a partial ISS ranking, but taken over ranges of scores, and not particular points – this makes it rather unfitting for applying a power law without begging the question. Fortunately, the table provides what is, in my opinion, an even better indicator of accident severity than ISS: the number of days the patients spent at the hospital. I think it is rather safe to say that the longer the stay, the more serious the injury – someone who spends 40 days in intensive care is more than just “bruised”. Further, this parameter is easily measurable, non-arbitrary, and can be extended indefinitely – meaning that it differentiates between someone who lies in a coma in the hospital for two weeks, and someone who is in that state for three years (interesting note: if this is how we measure severity, the power law, and not the Gaussian, is sure to appear, because of its long tail. You can almost always find a small but nonzero number of people who were very heavily injured by accidents, i.e., lay for many years in a hospital bed).

Graphing the number of occurrences vs. the number of days in the hospital, and applying a power law regression curve, we get:

The data fits rather well to a power law, with an exponent of -1.72. This is rather nice and reassuring. However, there aren’t a lot of data points – only 5 – so the fit isn’t that miraculous. One thing that would help us is to include the last number in the table – “7 and above”. It’s impossible to add it as just one point to the graph, since it actually represents an infinite number of points.
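A power-law regression of this kind can be sketched as an ordinary least-squares line through the points in log-log space. The snippet below is only an illustration under that assumption: the function name `fit_power_law` is hypothetical, and the counts are synthetic values generated from an exact power law (they are not the report's figures), so the fit simply recovers the exponent that was put in.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = coef * x**exponent by least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the straight line in log-log space is the power-law exponent.
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    exponent = sxy / sxx
    # The intercept of that line is log(coef).
    coef = math.exp(mean_y - exponent * mean_x)
    return coef, exponent


# Synthetic data, NOT the report's numbers: stays of 2..6 days drawn from
# an exact power law with a made-up coefficient and exponent.
days = [2, 3, 4, 5, 6]
counts = [1000.0 * d ** -1.5 for d in days]

coef, exponent = fit_power_law(days, counts)
print(f"coefficient = {coef:.1f}, exponent = {exponent:.2f}")
```

Replacing `counts` with the five values from the table would give a fit comparable to the one discussed above, assuming the original regression was done the same way.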

How do we use the “7 and above” figure in our calculations? The most intuitive action would be to integrate our power law from 7 to infinity and see if the result is equal to the 413 listed in the table. But this is inadequate, as people are discrete and not continuous, and so are our days in the hospital – it will give incorrect results, just as summing a discrete harmonic series gives a different result from its integral. Rather, we will have to extrapolate using the power law, and sum it all up. If the sum coincides with the table, it will support our theory.
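The gap between the discrete sum and the integral is easy to check numerically. In the sketch below, `tail_sum` and `tail_integral` are illustrative names, the coefficient is set to 1 as a placeholder (the fitted coefficient is not quoted here), and the infinite sum is approximated by adding terms up to a cutoff and estimating the remainder with a midpoint integral.

```python
def tail_sum(coef, alpha, start, cutoff=100_000):
    """Approximate the sum of coef * k**(-alpha) for k = start, start+1, ...

    Requires alpha > 1 so that the sum converges.
    """
    head = sum(coef * k ** (-alpha) for k in range(start, cutoff))
    # Remainder from `cutoff` onwards, approximated by the integral of the
    # curve from cutoff - 0.5 to infinity (midpoint rule).
    rest = coef * (cutoff - 0.5) ** (1 - alpha) / (alpha - 1)
    return head + rest


def tail_integral(coef, alpha, start):
    """Exact integral of coef * x**(-alpha) from start to infinity."""
    return coef * start ** (1 - alpha) / (alpha - 1)


coef = 1.0     # placeholder; the real fitted coefficient would go here
alpha = 1.72   # magnitude of the exponent from the fit above

print("discrete sum from 7 :", tail_sum(coef, alpha, 7))
print("integral from 7     :", tail_integral(coef, alpha, 7))
```

For a decreasing curve, the sum starting at 7 is always larger than the integral starting at 7, so integrating would understate the predicted number of long stays.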

What do we get? About 715 people. But the table says only 413. Where did the other 300 go?

This is indeed a certain blow to the model, and it might mean that the distribution is not entirely power-law in form; that’s perfectly ok, but I want to defend the idea for a moment. A plausible explanation that I can find is this: the model only took into consideration cases that the police reported as “mild injuries” but that the hospitals judged otherwise, and these make up only 2003 people, while the total number of heavily injured people was over 3500 (taken from another table). We are missing a lot of people who were ranked by the police as more heavily injured, and who would thus contribute to the tail of the curve. Including them would make our curve fit more nicely (notice that the 6 day point is below the plot line), and would also add to the number of those who were hospitalized for 7 days or more.

Further, the value of the tail sum is strongly dependent on the parameters of the power law itself. We only have a few points to start with, so generally the curve can have a rather large variance in both coefficient and exponent, and still fit without losing much accuracy. This makes the sum change dramatically. The model is sensitive when there is little data, as in this case. Can we get any more data? I did not find anything online, but we can try to estimate how many accidents resulted in only one day of hospitalization – a figure not shown in the table. How can we estimate this number?

For simplicity, let’s assume that only light injuries can result in one day at the doctor’s – anything more severe is bound to have you in there for more time, at least for some tests and rest. There were about 20000 lightly injured folk seeking medical care, according to hospital trauma records. If only about a third – say, 7000 – were to stay for more than a day, the data would look like this:

This drastically changes the power curve, but the fit does not change by an enormous amount. Summing up everything from 7 onwards in this case yields 427 people, much closer to the 413 required. Even if the estimate I have given is not correct (it is rather arbitrary, and lies in the hands of the modeler), this shows that changes in the curve parameters themselves can account for the disparity.
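To get a feel for how strongly the tail depends on the exponent, here is a small sketch. It pins each curve to the same value at day 2 (a hypothetical count of 100, chosen only for illustration) and varies the exponent around the 1.72 of the first fit. The function names and the choice of exponents are assumptions made for this example, not values from the report.

```python
def tail_sum(coef, alpha, start=7, cutoff=100_000):
    """Approximate the sum of coef * k**(-alpha) for k >= start (alpha > 1)."""
    head = sum(coef * k ** (-alpha) for k in range(start, cutoff))
    # Remainder beyond the cutoff, estimated with a midpoint integral.
    rest = coef * (cutoff - 0.5) ** (1 - alpha) / (alpha - 1)
    return head + rest


def tail_for_pinned_curve(count_at_day_2, alpha):
    """Tail (7 days and above) of the power law passing through (2, count_at_day_2)."""
    coef = count_at_day_2 * 2 ** alpha  # chosen so that coef * 2**(-alpha) matches
    return tail_sum(coef, alpha)


# Hypothetical count at day 2; only the relative change between rows matters.
for alpha in (1.6, 1.72, 1.9):
    tail = tail_for_pinned_curve(100.0, alpha)
    print(f"exponent -{alpha}: predicted stays of 7+ days = {tail:.1f}")
```

A shallower curve through the same point leaves far more people in the tail than a steeper one, which is why a modest shift in the fitted exponent moves the predicted count so much.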

Does it follow that traffic accident severity obeys a power law? I myself am convinced that it does, although the power curves given in the graphs above do not represent the true distribution. They only represent the distribution of severity of accidents reported by the police as resulting in “light injuries”. If the severity indeed follows a power law, its coefficient and exponent will be different from the ones obtained here. Still, this is an illustrative case.

If accidents do obey the power law, we should be extra careful on the streets and roads. It means that when accidents happen, there is a not-so-minute chance that they will be either fatal or tremendously damaging. It means that the ratio between major accidents and minor accidents is not as small as we would like. It means that the expected penalty for getting involved in an accident can, in practice, be infinitely high (death is an obvious case; but even if we exclude that from the calculation, we still get nasty things like “indefinitely long coma” or “lifelong paralysis”). The conclusion from all of this? Drive safely.