Google's Secret Weapon: MapReduce
By Jeff Stibel
1:01 PM Tuesday December 9, 2008
Google's stock is now down more than 50% year to date, but the Google guys don't seem concerned. Here's why, and why it is critically important to your business:
Most people think the reason is that Google dominates search. But Google is building a new secret weapon that has more to do with the brain than search. The effort is called MapReduce, a simple yet powerful software program that enables Google to use the Internet to think.
MapReduce does what our brains do all the time: It categorizes (Maps) key pieces of information, distributes it across its server farm of PCs, and then eliminates (Reduces) irrelevant data (computers--unlike MapReduce and the brain--soak in everything). Google now uses MapReduce for over 10,000 programs, ranging from processing satellite imagery to language processing to producing reports on popular queries. It now runs roughly 100,000 MapReduce jobs and digests 20 petabytes of data each day.
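The "distributes it across its server farm" step can be made concrete. In the design described in Google's 2004 MapReduce paper, the intermediate key/value pairs emitted by map tasks are split among R reduce tasks by a partitioning function, hash(key) mod R by default, so every value for a given key lands on the same reducer. The single-process Python sketch below is illustrative only: it assumes R = 4, uses `zlib.crc32` as a stable stand-in hash (Python's built-in `hash()` of a string changes between runs), and `partition` and `shuffle` are hypothetical names, not Google's API.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 4  # R: number of reduce tasks (illustrative value)


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Choose a reduce task for a key: hash(key) mod R, with CRC32 as a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers: int = NUM_REDUCERS):
    """Route intermediate (key, value) pairs into one bucket per reduce task.

    Every value for a given key lands in the same bucket, grouped under that key.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


pairs = [("brain", 1), ("search", 1), ("brain", 1), ("cloud", 1)]
for reducer_id, bucket in enumerate(shuffle(pairs)):
    print(reducer_id, dict(bucket))
```

Because the partition depends only on the key, each reducer can work independently on its own bucket, which is what lets the work spread across many machines.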
Does this sound like the perfect computer? Think again. This is not even your typical computer, one that is stable, logical, and failsafe. Instead, it is error-prone, unreliable, and strapped together with Velcro (literally). As one Senior Vice President at Google recently said, "Nobody builds servers as unreliable as we do." Yet this is the same paradox that makes the brain work: the seeming imperfections are what make MapReduce (and the brain) so powerful.
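How can a system built from unreliable machines be dependable? The MapReduce paper's answer is re-execution: the master detects a failed worker and reschedules its tasks elsewhere, which is safe because deterministic map and reduce functions produce the same output when run again. The sketch below simulates this in one process under stated assumptions: a hypothetical 30% failure rate, a fixed random seed, and the invented helpers `flaky_worker` and `run_with_reexecution`. A real master would hand the task to a different machine rather than retry in a loop.

```python
import random


def flaky_worker(task: int, rng: random.Random, failure_rate: float) -> int:
    """Simulated unreliable machine: fails at random, otherwise squares its input."""
    if rng.random() < failure_rate:
        raise RuntimeError(f"worker died on task {task}")
    return task * task


def run_with_reexecution(tasks, failure_rate=0.3, max_attempts=20, seed=0):
    """Rerun each task until it succeeds, as a master reschedules failed work.

    Returns the results keyed by task and the total number of attempts made.
    """
    rng = random.Random(seed)  # fixed seed so the simulated failures are repeatable
    results, attempts = {}, 0
    for task in tasks:
        for _ in range(max_attempts):
            attempts += 1
            try:
                results[task] = flaky_worker(task, rng, failure_rate)
                break  # success: move on to the next task
            except RuntimeError:
                continue  # a real master would hand the task to a different machine
        else:
            raise RuntimeError(f"task {task} failed {max_attempts} times")
    return results, attempts


results, attempts = run_with_reexecution(range(10))
print(results)
print(f"{attempts} attempts for {len(results)} tasks")
```

The final results are the same as on perfectly reliable hardware; only the number of attempts grows with the failure rate.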
As the inventors of MapReduce noted in a recent paper, "It has been used across a wide range of domains within Google including: large-scale machine learning problems; clustering problems...; extracting data to produce reports of popular queries; extracting properties of Web pages for new experiments and products...; processing of satellite imagery data; language model processing for statistical machine translation, and; large-scale graph computation." In other words, the tasks Google performs are similar to the functions performed by the brain: learning, categorization, vision, and language.
If all this sounds a bit more like human thought than computing, it should. We often fail to see just how powerful Google really is and how much it behaves like a brain, if only because we interact with Google through its homepage. As Vint Cerf, Google's Chief Internet Evangelist, points out, "While it presents itself as a web interface to most people, Google could just as well present itself as a programmable interface, which means that you can start writing software that gets information through the eyes, so to speak, of Google. That creates a vocabulary, if you like, that programmable systems can use in order to take advantage of what Google is capable of doing with its gigantic database."
Even Google's harshest critics no longer dismiss the value of MapReduce or the power of the computing cloud. In a recent New York Times article, Bill Gates "acknowledged that MapReduce was a significant technology, but he asserted that Microsoft was building its own parallel processing software..." In a somewhat circuitous compliment, Gates said: "They did MapReduce; but we have this thing, called Dryad, that's better."
What's happening is that MapReduce is opening the door to the analysis of vast amounts of information, from terabytes of data on the voting habits of Americans, to the fluctuations of billions of individual airline fares, to scores of terabytes of health data. This will change the landscape of virtually everything we do. "The biggest challenge of the Petabyte Age won't be storing all the data," Wired magazine noted recently, "but figuring out how to make sense of it." Making sense of it: that is where the brain behind the Internet is now heading.
Jeffrey M. Stibel is an entrepreneur and brain scientist. He studied business and brain science at MIT Sloan and Brown University, where he was a brain and behavior fellow. Stibel has authored numerous academic and business articles on a variety of subjects and is the named inventor on the US patent for search engine interfaces. He is currently President of Web.com (NASDAQ: WWWW) and serves on academic Boards for Tufts and Brown University, as well as the Board of Directors for a number of public and private companies. Stibel is the author of Wired for Thought: How the Brain Is Shaping the Future of the Internet, to be published by Harvard Business Press in September 2009.
Comments
This is the first time I have seen this web page. I am very interested in articles like this one.
- Posted by Gennaro DiMassa
December 11, 2008 10:11 AM
It doesn't sound like you know what you're talking about...
- Posted by Z
December 11, 2008 9:24 PM
Is this story a joke? Do you have any idea about what MapReduce actually is?
MapReduce is simply a framework for cluster computing. Also, you are aware that MapReduce has been in use for years, right? MapReduce is not new: http://labs.google.com/papers/mapreduce.html (from 2004).
- Posted by Joe
December 11, 2008 9:48 PM
Thanks for all of the feedback. Too often, however, technologists mistake technology for products. I fully understand the technology behind MapReduce, and, as the feedback says, it is a “framework for cluster computing.” But, and this is a BIG but, the products that have been built around MapReduce are enabling brainlike intelligence, and that is the point of this post.
On top of that, you must remember that the brain is really just a machine, albeit one of the most complex machines on the planet. So saying that my point is wrong just because MapReduce is old, or is just a clustering system, misses the point entirely. The point is that the brain is just a distributed processing system with MapReduce-like software enabling it.
These are the words of the Google folks, not mine: "[MapReduce] has been used across a wide range of domains within Google including: large-scale machine learning problems; clustering problems...; extracting data to produce reports of popular queries...; language model processing for statistical machine translation, and; large-scale graph computation." Even the most stodgy neuroscientist would have no problem replacing Google, machine, and MapReduce with different functions of the brain in the above sentence.
This type of broad-based cognitive functioning is enabled by distributed computing, combined with the ability to map and reduce large amounts of data. This is what the brain does!
Best,
- Posted by Jeff Stibel
December 12, 2008 9:58 AM
Jeff, the problem is not with the products. Maybe that is the point of this post as you meant it: "the products that have been built around MapReduce are enabling brainlike intelligence". I'm not a brain scientist, so I cannot argue about brainlike intelligence, but if you replaced it with "the products that have been built by Google are enabling brainlike intelligence", there would be fewer questions.
MapReduce doesn't enable any intelligence. It's just a small piece of technology that improves scaling of processing. The real intelligence here is *processing*. In other words, map(data, function) — "function" here is the intelligence, not the "map" itself.
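Dmitry's point can be illustrated with Python's built-in `map`, standing in for his `map(data, function)` notation (Python takes the function first). The mapping machinery stays the same; what gets computed depends entirely on the function passed in. The two functions and the sample lines below are arbitrary examples chosen for illustration.

```python
def tokenize(line: str) -> list[str]:
    """Split a line into lowercase words."""
    return line.lower().split()


def line_length(line: str) -> int:
    """Count the characters in a line."""
    return len(line)


lines = ["Google uses MapReduce", "The brain is a machine"]

# The same map call does different work depending on the function it is given.
words = list(map(tokenize, lines))
lengths = list(map(line_length, lines))
print(words)    # [['google', 'uses', 'mapreduce'], ['the', 'brain', 'is', 'a', 'machine']]
print(lengths)  # [21, 22]
```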
Take this sentence of your article:
"But Google is building a new secret weapon that has more to do with the brain than search. The effort is called MapReduce, a simple yet powerful software program that enables Google to use the Internet to think."
It's just factually wrong.
1) Google is not building the "weapon"; they have already built it.
2) The "weapon" is not secret: everybody knows about it, there are users of this technology other than Google, and you've cited the paper on MapReduce and given a link to it. How's that secret?
3) It doesn't enable Google to use the Internet to think.
Your description of what MapReduce does is wrong:
"MapReduce does what our brains do all the time: It categorizes (Maps) key pieces of information, distributes it across its server farm of PCs, and then eliminates (Reduces) irrelevant data (computers--unlike MapReduce and the brain--soak in everything)."
"Map" here is not for categorization, and "Reduce" is not for eliminating irrelevant data. Yes, the words (map and reduce) may be misleading, but MapReduce doesn't do anything like this by itself.
Here's what it does (from the paper):
"MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper."
The following sentence is wrong:
"computers--unlike MapReduce and the brain--soak in everything"
I don't know how to explain this, but MapReduce soaks in everything as well.
- Posted by Dmitry Chestnykh
December 12, 2008 10:46 AM
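The abstract Dmitry quotes is easiest to see in the word-count example from the same paper: map emits (word, 1) for every word in a document, and reduce sums the counts collected for each word. Below is a single-process Python sketch of that model; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the parallel execution, partitioning, and fault tolerance that Google's implementation provides are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """User map function: emit an intermediate (word, 1) pair for each word.

    doc_name (the input key) is unused, as in the paper's word-count example.
    """
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """User reduce function: merge all values for one intermediate key."""
    return sum(counts)


def run_mapreduce(inputs: Dict[str, str], mapper: Callable, reducer: Callable) -> dict:
    """Run map, group by intermediate key, then reduce, all in one process."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: call the user's map function on every input key/value pair.
    for key, value in inputs.items():
        for ikey, ivalue in mapper(key, value):
            # Grouping (the framework's shuffle): collect values by intermediate key.
            intermediate[ikey].append(ivalue)
    # Reduce phase: one reduce call per distinct intermediate key, in sorted key order.
    return {ikey: reducer(ikey, values) for ikey, values in sorted(intermediate.items())}


docs = {"a.txt": "the brain the web", "b.txt": "the web"}
print(run_mapreduce(docs, map_fn, reduce_fn))  # {'brain': 1, 'the': 3, 'web': 2}
```

Nothing here categorizes or discards data: every word is mapped and every count is reduced. The user-supplied functions decide what the job means, and the framework's contribution is running them at scale.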
Thanks Dmitry for your thoughtful response and I do see where the problem is. I think we are talking about the same thing technology-wise, but you may be missing a piece of the brain science. The elegance behind the brain, it turns out, is really not that different than MapReduce. The brain (really the cerebral cortex) spends a good deal of energy on “scaling and processing.” This has historically been incredibly difficult to replicate. MapReduce’s innovation is that it mimics the brain in this functionality. And the “new secret” I was referring to was not MapReduce itself, but the fact that it was similar in scope to the functioning of the cerebral cortex.
That said, your points are well taken and we will make sure to be clearer next time around!
- Posted by Jeff Stibel
December 12, 2008 3:08 PM
Jeff,
You should also be aware of the derivative software framework called Hadoop. Hadoop is similar to MapReduce (MR) and is in use today at Yahoo, among others.
Generally, organizations with huge clusters of computers and massive parallel data problems use some variant. The parallel between the general algorithm and thinking is interesting, but it's not just Google's secret weapon, as some have pointed out.
best,
::t
- Posted by Todd Drake
December 15, 2008 3:44 PM
I'm not a scientist, nor do I stay up to date on the latest technologies. You can discuss the academics of the underlying technologies, but at the end of the day Google is better than anyone else at presenting the collective data, ideas, and opinions on the internet in a logical form.
In my opinion it's not what you know that matters, but what you do with your knowledge. Applying information technology to improve business processes or add value to people's lives is an entirely different discussion than the academics of any one technology.
Ward
Cusco, Peru
www.lifeinperu.com
- Posted by Ward Welvaert
December 16, 2008 11:33 AM