Sunday, 1 June 2014

Advance Book Review: Dataclysm by Christian Rudder

Belle and Sebastian is the least black band in existence. "6'4" and "Truck Driver" are the words least likely to be used by Asians in dating profiles. Do these factoids illustrate some kind of fundamental truth about human nature? Not really. Are they fascinating enough (albeit perhaps a little obvious in the case of our melanin-repellent Glaswegian twee pop friends) to sustain an entire book? I certainly think so. In Dataclysm, Christian Rudder leverages vast reserves of data on more than 185 million people from sites such as Facebook, Twitter, and OkCupid to pose some illuminating and entertaining theories and findings regarding social behavior. Some discoveries, such as the ones I mentioned at the start of this review, are merely compelling factoids. Others offer a bit more practical application and understanding of our society. All make for a very worthwhile read.

While "big data" has been hijacked by businesspeople to basically cover any data set that would cause a performance slowdown in Excel (i.e., anything with more than 10 rows), there really is quite a bit of data that businesses, governments, and pop social science authors can utilize. According to Facebook Power Editor's reach estimator tool, advertisers targeting 24-year-old Spanish-speaking women in the market for new economy cars have approximately 1,380 lucky users to serve their ads to. Sabermetricians can determine Dante Bichette's OPS in the seventh inning of games on Tuesdays on cut fastballs on the inside of the plate in 1998. However, these massive reams of highly specific data are pretty useless unless you are able to pose interesting questions about their contents and possess the statistical acumen to properly answer them. Author Christian Rudder thankfully possesses both qualities, and is an acerbic and skilled writer to boot. He is a co-founder of the popular dating site OkCupid.com and of SparkNotes, and he also maintained the blog OkTrends, which included posts debunking dating profile photo myths and examining what white people truly like (evidently a lot of Tom Clancy and Phish). Rudder is a capable and engaging guide through the data and peppers his analysis with pop-culture references, amusing asides, and even some insightful comments.

If you found either of those blog posts intriguing, I suggest you pick up this book immediately, because Dataclysm is basically a longer collection of such material. The book seems to have grown out of Rudder's blog and reads like a series of extended blog posts on human behavior. While Rudder comes from an online dating site and there is plenty of (e)ink devoted to the topic of romance, he examines much more than just what big data tells us about relationships. Dataclysm is structured into three major divisions: what data tells us about sex and relationships, what data tells us about our broader culture, and finally what data tells us about how individuals identify themselves. This allows him to investigate phenomena such as how word lengths in tweets show that Twitter might be improving society's writing ability and how Facebook likes can accurately predict users' demographics.

Readers do not need any real knowledge of statistics to fully enjoy the book. While the analysis certainly seems well thought out and thorough, Rudder spares his audience any mention of p-values or Spearman correlation tests and instead focuses on the social lessons that result from his number crunching. He presents all of his findings clearly and cogently and is generous with the charts and infographics. Rudder strikes a nice balance between keeping the book moving at a fast pace and fully exploring his topics, and Dataclysm held my interest the entire time. I finished the book in two days, and my only real complaint is that I wish it were longer and explored more topics. I really enjoyed the book and think that Dataclysm narrowly edges out Gabriel Sherman's The Loudest Voice in the Room as my favorite book of 2014 thus far.

In Sum
Any readers interested in the usual pop social science suspects (Levitt, Gladwell, the Heath Brothers, etc.), Nate Silver's data analysis, or Chuck Klosterman's cultural musings should really pick this up. Dataclysm isn't going to provide you with the one secret insight that will guarantee financial and romantic success (which is good, because that magic bullet doesn't actually exist), but it does offer some truly fascinating discoveries about our society and online behavior in a quick and easy read. Highly recommended.

9/10
