Book Review Part 2 of 2: Super Crunchers

It’s amazing what an airplane flight can do when it comes to finishing up books that have been lingering. Especially when you pack the “fun” book in your checked baggage so you can’t cheat. Sucks when you arrive at the hotel to find that your “fun” book is, apparently, still sitting in the bedroom at home! Argh!

Part 2 will be a bit less harsh, as Ayres does eventually touch on some of my bigger issues. Sort of. I’m still not satisfied with his treatment, though.

Ayres does, at least once, throw in a pretty critical caveat: “As long as you have a large enough dataset, almost any decision can be crunched.” That’s a HUGE caveat, even in our increasingly data-capturing world. As a matter of fact, I like to talk about the data explosion. But, when I discuss it, it’s as a warning that you can’t just see “getting the raw data” as the biggest challenge in making data-driven decisions. My claim is that the bigger challenge is developing discipline about how you approach that data. While Ayres does periodically acknowledge that real skill and creativity are required to develop the hypotheses that you want to test, he does not put nearly enough emphasis on this point. Nor does he stress that, beyond diligence in developing the hypothesis, considerable rigor is needed in determining how that hypothesis will be tested.

Ayres walks through a pretty fascinating example of super crunching gone awry in the case of John Lott, who did some super crunching that demonstrated that concealed handgun laws brought down the crime rate. According to Ayres, Lott made an error in the data prep for his analysis that, when corrected, did not show this at all. Ayres uses the example more to preach that even data-oriented people can still get caught up emotionally and refuse to face hard facts. While that is undoubtedly true, Ayres also misses the opportunity to speak in any real depth about the amount of data prep work needed to normalize and cleanse data before actually running a regression. He does mention this…but only very briefly.

More good stuff: Ayres devotes a good chunk of a chapter to explaining (and illustrating) just how bad humans are at gauging the quality of their own intuition. Many of the points he makes here echo Daniel Gilbert’s points in Stumbling on Happiness. Points very well taken. But (and I’m sure Ayres would say that my next statements prove his point and that I’m just one more person in denial) things get stretched pretty far at times here. For instance, Ayres claims that the safety procedures that flight attendants follow as a hard script almost all of the time make them more effective than a less structured approach. Seeing as I was sitting on a flight where I had easily tuned out the flight attendant’s script, I had a “Gimme a break!” moment. Everyone who travels at all has occasionally stumbled across a Southwest flight (and it even happened to me on United once) where the flight attendant has a little more fun. Everyone listens! And, they are clearly sticking to the content of the script, if not the specific language.

Oops. Language. I’ll nitpick just a little bit. Ayres uses “digitalize” a lot. Maybe he can’t help it: he’s a lawyer AND an academic, so why go with the shorter, common synonym, digitize, when a longer word will suffice? He also defines CDI as “consumer data integration” in the context of Acxiom Corporation’s services. While Google indicates this is one possible explanation of the acronym, even Acxiom seems to use the breakdown that I’m much more familiar with: customer data integration. Splitting hairs a little bit, I realize. But, it’s an indication that Ayres really isn’t all that familiar with CDI, which is a descriptor of a host of technologies that are trying to actually solve all of the complexity of normalizing data from disparate data sources that Ayres barely acknowledges. Ayres also uses “commodification” a lot, and I was prepared to zing him there, too…but Google shows this as a more common word than its synonym, commoditization. So, I learned something there!

Two more specific beef examples and I’ll wrap up.

Ayres devotes a good chunk of writing to various plans that use super crunching to predict the likelihood of recidivism for inmates who are being paroled. The statement that really got me was that, according to the data, an inmate who received a score of four or higher when his history was plugged into a certain model would have a 55% chance of committing another sex offense within 10 years. Ayres then jumps into an example of one such inmate who received a score of four, was paroled anyway, and promptly disappeared. The 55% is what gets me, though. Ayres ignores that this is far from an overwhelming indication. It may very well be my liberal bias that I don’t think that it’s a slam dunk to keep everyone locked up knowing that 45% of these people would NOT commit another sex offense within ten years. Ayres completely glosses over this question, which seems to be an ethical one worth addressing. I actually made a note, “Minority Report,” when I read this. I was thinking of the Philip K. Dick-inspired Steven Spielberg movie that starred Tom Cruise. In the movie, citizens are arrested based on “foreknowledge”: a vision by one of three “pre-cogs” (it’s a very small leap to turn these specially-endowed humans into a super crunching computer) that the person is going to commit a murder in the future. The movie is chilling in a 1984 kind of way. And, ultimately, it condemns persecuting someone for something they have not yet done. Ayres sounds just a little too much like the film’s main antagonist, Director Lamar Burgess (played by Max von Sydow), for my comfort. Interestingly, later in the book, Ayres refers to the same film…but not at all for the same reasons. Rather, he references the personalized ads that Tom Cruise is bombarded with at all times while walking down a street in the movie (which really isn’t that far of a stretch from today, given the proliferation of RFID technology).

Second specific beef has to do with NASA history. Ayres points to Gus Grissom as one of the astronauts who balked at the idea that the Mercury splashdown capsules would be designed so that they could not be opened from the inside. Ayres then points out that, because the astronauts won out, Gus Grissom was able to panic upon splashdown and blow the hatch open prematurely. Which he did. And almost drowned. First off, Ayres gives no acknowledgement to the fact that there are a lot of reasons to believe (as NASA does) that the hatch blew open due to an equipment malfunction, not because Grissom did anything inappropriate. Second, Gus Grissom was killed in a fire inside a training capsule for the Apollo 1 mission. I honestly don’t know enough of my history here to know if the training capsule could be opened from the inside, or, if not, if it would have made a difference. My guess is that the fire was so fast that it wouldn’t have mattered. But, on both counts, Grissom seems like a spectacularly lousy example of Ayres’s point.

So, I’ll wrap up. The book has left a dirty taste in my mouth. Ayres delivered a book that he hopes will sell with the gee-whiz factor. My guess is that Bantam Books smelled “Jump on the Freakonomics Bandwagon” money, and, as bandwagon books, movies, and TV shows are prone to do, it underdelivers. The book dramatically downplays the challenges involved in super crunching: not just challenges in the “it’s hard” sense, but challenges in the sense that “large, clean, relevant data sets” are not the same as “large amounts of raw data.” He actually includes examples of super crunching where, when the results came out, they were applied on a limited scale because the basic approach to the analyses was called into question. So, how about a chapter on “design of experiments”? He does have good points, and his assertion that companies could benefit from having a designated “devil’s advocate” to question (hard) any analysis is a great idea. There is definitely a shock of wheat here and there in the book. Unfortunately, Ayres spends most of his writing on the chaff.

2 Comments


  1. The biggest problem with Ayres’ book is that he doesn’t try to describe the other side of the debate very accurately on many different points.

    1) “Lott made an error in the data prep for his analysis that” — Ayres is not accurate on this point. First, he is referring to a paper by Plassmann and Whitley that I helped with, but the paper was published by them. Second, the central results by Plassmann and Whitley were not affected by this small error. A letter by Plassmann can be found here (http://johnrlott.tripod.com/link3.html). Some of the various research papers that find that right-to-carry laws reduce crime rates can be found here: http://johnrlott.tripod.com/postsbyday/RTCResearch.html
    Many more papers by economists have found the result that right-to-carry laws reduce violent crime rates.

    2) Ayres ignores that others have been unable to confirm the results of some of his research that he discusses in the book. In part this is because he has been unwilling to share his data with others:

    http://johnrlott.tripod.com/postsbyday/may8.html

    3) Ayres ignores that people found that the research he touts on abortion and crime had serious errors and that when the research was done the way that Ayres’ friends said it should be done, the results for violent crime were either eliminated or reversed. See here:

    http://www.economist.com/finance/PrinterFriendly.cfm?story_id=5246700
