The book opens with the story of a company named Farecast, which begins in 2003. Oren Etzioni, a computer scientist at the University of Washington, was on an airplane and decided to ask other passengers how much they had paid for their seats. It turned out that the fares varied from one passenger to the next. This made Oren really upset, because he had booked his ticket long in advance on the assumption that he would pay the lowest price. He started thinking: if only I knew what lies behind airfares. How can I tell whether a price shown on an online travel site is a good one or a bad one? Then came the insight. Because he was a computer scientist, he realised that this is just an information problem. He decided to collect the price records of every commercial flight in the United States, for every route, noting how far ahead of departure each ticket was bought and what price was paid. On that basis he built a system whose main goal was to predict whether a price is likely to rise or fall. It worked pretty well, so he went after more data, and kept going until he had 20 billion flight-price records fuelling his predictions. The system saved customers a lot of money, and eventually Microsoft bought it for roughly $100 million.
The airfare data became the raw material for a new business, a new source of value and a new form of economic activity. Today many companies, including Google, Amazon, Apple, eBay and IBM, are building a new economy on data of this kind. It is the fuel of the information economy.
Now a few words about having more data. We are leaving an environment in which we were always information-starved and never had enough. Today we live in a world where that is no longer the operative constraint. We never have all the information, but the book gives several good examples of why more data can be better than clean data. A striking one, which illustrates the tension between the quality and the quantity of data, concerns Amazon. When its recommendation system was in an embryonic phase, the developers focused on working out the reasons behind customers’ decisions. They had all the data on traffic, searches and so on, and they built recommendations on their own assumptions. Needless to say, these recommendations were inaccurate and did not match customers’ needs. Then they decided to give the computer a free hand and, relying on simple correlations, the whole idea eventually started to work properly. Knowing “what” is sometimes better than knowing “why”.
What we are seeing now is that lots of things are being datafied. Facebook datafies our friendships, Twitter datafies our thoughts and whispers, LinkedIn datafies our professional contacts, and Google datafies our intentions. This leads to another great example: the datafied information held by Google turned out to be better than the figures of a major statistical office in the US. Based on what people type into its search engine, the company has a better predictor of the likelihood of flu outbreaks. Statistical offices rely on reported data, whereas Google relies on observed behaviour, on what people actually do.
The interesting thing is that the value of data lies less in the primary purpose for which it was collected than in its many secondary uses, which big-data techniques now uncover and which are limited only by our imagination. There are going to be winners and losers in this new world, and three things seem to distinguish who is going to do well: the skills, the mindset and the data. The skills are fairly straightforward: they belong to the people with technical knowledge, or to the vendors who sell you systems. The mindset, meaning creativity, may be more important; it is about having the imagination to find new applications for data we already have. As for the data, whoever has access to it is going to be in a critical position, because data is the resource. So, ironically, what seems abundant today is actually the source of scarcity.
Of course, big data raises serious issues. One is privacy. But a much bigger issue is going to be propensity: algorithms will predict how likely we are to do things. It might look a little like “Minority Report” and its idea of pre-crime. We might, for example, be denied a loan because an algorithm rates us as unlikely to repay it. I am not writing here about existing scoring systems; I can imagine predictions fuelled by thousands of variables and an algorithm running to a hundred pages, far beyond our cognitive capabilities.
In conclusion, the book is really interesting because it shows how data might be used. Most of its ideas are not complex, which makes the book easy to follow. I believe it is a useful warning for people who share information about themselves without much reflection. We all have to keep in mind that big data is a resource and a tool. It is meant to inform rather than explain, and it should point us toward understanding. However, it can still lead to misunderstanding, depending on how well or poorly it is processed.
Big Data: A Revolution That Will Transform How We Live, Work, and Think
by Viktor Mayer-Schönberger and Kenneth Cukier
In this article I drew on several interviews with the authors.
Size: 242 pages
Other information and reviews of this book on Goodreads: https://www.goodreads.com/book/show/15815598-big-data