Essay

It’s Not Information Overload. It’s Filter Failure

September 6, 2010

Clay Shirky has established himself as one of the most influential thinkers on the social and economic impact of the Internet. He has written and lectured extensively on crowdsourcing and on collaboration that happens without traditional organizational structures. In this article, originally presented at the Web 2.0 Expo in September 2008 at the Javits Center in New York, he examines the real challenge behind ever-growing information: the filtering process.

Contributors

This article was originally presented at the Web 2.0 Expo, September 2008, at the Javits Center, New York, NY, and has been transcribed with the permission of the author.

It starts with this chart. You all know this chart. This is IDC’s version of the chart; Hal Varian and Peter Lyman of UC Berkeley have a version of this chart, and Google has a version. This is the chart of how fast the information in the world is growing. No matter who does the chart, it always looks like this: up and to the right, and the rate of increase is always increasing.

We love this chart. This chart makes us feel better: this is why I am not getting anything done; I’m suffering from information overload. This has been the obvious salvation for writing-blocked tech journalists for fifteen years. When we don’t know what to write, we can always go down the hall to our editor and say, “Hey, I want to do a story about information overload.” And the editor, looking up from their overflowing inbox, says, “That’s brilliant!” You always get to do that story. So, for fifteen years we have been reading the same story about information overload.

But if it has been the same story for fifteen years, and you can find stories from ’93 that are the same story that showed up in your RSS feed three seconds ago, then why is it still such a surprise? If this is the normal case, why are we constantly talking and writing about it as if it were a big deal?

Here is why I think this is, and it goes back to the printing press. Gutenberg and the invention of movable type injected, for the first time, information abundance into life outside the university. By the 1500s, the cost of producing a book had become so cheap, and the volume of books being produced so large, that an average literate citizen could have access to more books than they could read in a lifetime. So ye olde information overload is actually a problem of ancient provenance.

The other problem that Gutenberg introduced into life was risk. If you owned a printing press, you could make money if people bought your books, but you could lose money if they didn’t. Since you had to print the books in advance, you took on all the risk that they would sell. This is the problem of publishing. The economic solution was pretty simple: make the publisher responsible for filtering for quality. There is no obvious reason why someone good at operating a printing press should be good at figuring out which books to print, but the economic logic of “print in advance and sell it”—high upfront costs, recouped only when the books reach readers—meant that the word publisher came to mean two things: (1) people who decide what to publish, and (2) people who do the publishing.

There have been many media revolutions between Gutenberg and now; by the middle of the 20th century we had recorded music, movies, and television. The curious thing is that all of those other media types had the same economics. Whether it is a printing press or a TV tower, it costs a lot of money to get started, so you have to filter for quality up front. What the Internet did was introduce, for the first time, post-Gutenberg economics. The cost of producing anything, by anyone, has famously fallen through the floor, and as a result there is no economic logic that says you have to filter for quality before you publish. Proof of this hypothesis I leave to you, but I recommend starting with livejournal.com, or you can start pretty much anywhere and discover that the filter for quality is way downstream from production.

What we are dealing with now isn’t information overload, because we have always been dealing with information overload; the problem is filter failure. An example everyone faces is spam. Everyone has the morning ritual of deleting the spam out of their email: identifying the messages you have to remove, getting rid of them, and getting on with your day. The process is some combination of a mechanical filter plus a user cleaning up the last few bits and pieces. And everyone has had the experience, at some point over the last couple of years, of a day where you say, “Oh my goodness! The volume of spam has doubled. My inbox is full of spam again.” So I set out to measure this and watched my inbox, particularly the messages I had to delete in the morning. What I discovered was that my experience of spam doubling came when the volume of spam I received increased by only 25%. It wasn’t that there was a lot more information; there was just enough additional information to break the systems I had in place. It wasn’t about the increase in volume; it was the collapse of the filters I had.
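That collapse is easy to model. Here is a toy sketch, with invented numbers (not Shirky’s actual counts), assuming the filter’s rules recognize a fixed repertoire of spam patterns, so the absolute number of messages it catches stays constant as volume grows:

```python
# Toy model of filter collapse. Numbers are illustrative, not measured.
# Assumption: the filter's rules catch a fixed number of messages (the
# patterns they were tuned for), so everything beyond that fixed catch
# lands in the inbox.

def residual(volume, caught):
    """Spam that reaches the inbox after the filter has done its work."""
    return max(volume - caught, 0)

old_volume = 1000                 # messages per day before the surge
caught = 750                      # what the tuned rules reliably catch

before = residual(old_volume, caught)
after = residual(int(old_volume * 1.25), caught)   # volume up only 25%

print(before, after)  # 250 500: what reaches the inbox doubles
```

Under this crude fixed-catch assumption, a 25% rise in volume doubles what gets through: the filter did not get worse, the excess simply fell outside what it was tuned for.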

Spam, I think, is a really good indication of the information overload problem generally. It requires multiple kinds of filters, automated and manual, and different solutions for different people. All the solutions are temporary: no matter what you use, you have to retune. There is no “set it and forget it” solution. Finally, you have to take the volume increase for granted; you have to assume you will continue to be targeted, because the economic incentive to send spam is enormously high and the cost enormously low. It is really a filter problem, rather than an information problem.

In the context of spam and traditional information management, I started to think that this is a general system design problem for our era. Not a computer-system problem, but a social-system problem: the institutional and social bargains we have with one another in our daily lives. I am trying to apply this idea of filter failure as a design lens to other types of social systems besides just managing hard drive space. Let me tell you about something that happened to a friend of mine last November that illustrates this problem. A former student, a colleague, and a good friend decided to break off her engagement with her fiancé. In addition to the mix of emotion, horror, and administrative work you go through when you do something like that, she also had to engage in the twenty-first-century ritual of the “changing of the relationship status”: she had to go onto Facebook, grab the button that says “engaged,” flip it to “single,” and press submit. She considers doing this and thinks about what the result will be in her Facebook news feed. Suddenly, she realizes she might as well buy a billboard. Here is her dilemma: she has a lot of friends on Facebook, and she also has a lot of “friends” on Facebook: people she went to high school with, people she knew peripherally two jobs ago. She doesn’t want all of those people suddenly getting deeply personal information about her; she wants only her narrow circle of real friends to know. She especially doesn’t want to tell her fiancé’s friends before he does; she wants to give him the space to do so. She goes onto Facebook to fix this problem. She first finds Facebook’s privacy policy: very clearly thought out and written, carefully descriptive, not hidden or buried, linked to on most pages of the site. In addition to the policy, she finds her own personal settings for managing her privacy. She figures out how she is going to do this.
She checks the appropriate checkboxes, goes to the interface, takes her status from “engaged” to “single,” and presses submit. Two seconds later, every one of the friends in her network gets the message, “Your friend is now single.” All of her fiancé’s friends get that message too, and the email starts to pour in, the AIM messages start to come in, the phone rings off the hook, and everybody knows. A total, disastrous, self-inflicted privacy meltdown.

We look for fault in circumstances like this, so it is tempting to blame my friend. Well, I have known her for a long time, and she did her graduate thesis on a comparative analysis of Friendster, Facebook, and Meetup. This is not an average user. If she doesn’t get the interface, it is a pretty safe bet it is out of reach of most people. So we want to blame Facebook: they had the wrong checkboxes, the wrong descriptions, the wrong privacy setup. But it is hard to blame Facebook when they have made so much effort. James Grimmelmann, who has written extensively about social networks, has said that Facebook has the best expressed and best executed privacy management tools he has seen on any of the networks. The actual problem is that managing your privacy preferences is an unnatural act. It is something no one is good at, either setting up or maintaining. Prior to the present era, the only person any of us could name as having explicit privacy preferences was Greta Garbo. This is not something we are used to. Privacy is a way of managing information flow. What my friend wanted to do was tell four or five of her close friends, who would tell the next circle out, and slowly the information would seep through the network in partial ways, not instantaneously.
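The seepage she wanted, where each circle of friends tells the next, can be contrasted with a feed’s instant broadcast in a small sketch. The graph and names here are invented for illustration:

```python
from collections import deque

# Hypothetical friendship circles; "ana" is the person with news to share.
friends = {
    "ana": ["bea", "cam"],
    "bea": ["ana", "cam", "dee"],
    "cam": ["ana", "bea", "eli"],
    "dee": ["bea", "fay"],
    "eli": ["cam", "fay"],
    "fay": ["dee", "eli"],
}

def word_of_mouth(source):
    """Breadth-first spread: each round, the newly informed tell their friends.

    Returns a map from person to the round in which they heard the news.
    """
    learned = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in friends[person]:
            if friend not in learned:
                learned[friend] = learned[person] + 1
                queue.append(friend)
    return learned

print(word_of_mouth("ana"))  # fay, the most distant contact, hears in round 3
```

A news-feed broadcast collapses every round to zero: all six learn at the same instant. The rounds in between are exactly the inconvenience that used to guarantee privacy.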

That is how it used to work. The big question we are facing around privacy now is that we are not moving from one engineered system to another engineered system with different characteristics; we are moving from an evolved system to an engineered system. We have pushed formal and explicit statements about privacy into our lives for the first time. Prior to the current era, the principal guarantor of privacy wasn’t law or regulation, and it wasn’t hardware or software. It was inconvenience. It was a hassle to spy on people. We lived most of our lives not in the bubble of privacy or the glare of publicness; we had what we used to call a “personal life.” That is a phrase almost no one uses anymore, except to refer to technology. We have a lot of personal technology. We don’t have so much personal life. In a personal life, you can walk down the street talking with a friend, and someone could be listening to you, but they are not. It is not as if every word you say is being recorded for posterity. But now it is like that, a lot like that. For people like my friend, whose social life is lived hammer and tongs in those kinds of environments, it is almost completely like that. This inconvenience and hassle, an inefficiency in information flow, wasn’t a bug; it was a principal feature. As long as we have a world of completely explicit privacy preferences, it isn’t going to be a good fit for the way we live our lives. This is a question of filtering, not of managing information. How do we want to design the filters so that privacy works the way we need it to work?

My friend’s case is a story about outbound information flow, spam is a story about inbound information flow, and both are relatively clear cases. There are some stories where the information is so bound up in institutional design that we can’t even identify which direction the flow that needs filtering should be going.

This next story illustrates that problem. Chris Avenir is eighteen years old, and because he is eighteen, he has grown up in this environment. By the time he was five, the Internet was public; by the time he was fifteen, MySpace, Friendster, and Facebook had all launched. Then he went off to college, and this spring, at Ryerson University in Canada, he enrolled in a chemistry class. Like all students since time immemorial, he said, “This is hard; I am going to work for the test, so I’ll start a study group.” Because he is eighteen, he started the study group on Facebook and called it the “Dungeon” Ryerson chemistry study group. It went pretty well: he got 146 of his classmates to join, and they sat around talking about chemistry on the site. Suddenly he was called up on charges, and the university threatened to expel him. How many charges? 147 of them: one for setting up the Facebook group and one for each fellow student who joined. Ryerson says this is cheating. Here is their point of view: “Our academic misconduct code says if work is to be done individually and students collaborate, that’s cheating, whether it’s by Facebook, fax or mimeograph.” They are saying Facebook is media, they treat this as publishing, and once you are operating in a mediated environment, it is immaterial to them how it works. Here is Avenir’s reaction: “If this kind of help is cheating, then so is tutoring and all the mentoring programs the university runs and the discussions we do in tutorials.” He named the group the “Dungeon” because that is the name of the room on the Ryerson campus where the real study groups meet. He thought Facebook was just an extension of group life, and he was simply extending it into that zone. What had Avenir done to freak Ryerson out so much?

What he had done was crash two different kinds of information flow into one another. Every college has two different messages: an inside message and an outside message. The inside message is, “Welcome to the community of scholars; we are glad you are here; come join us; we are having the best kind of conversation; the best kind of classes are small seminars where you can discuss things with your peers.” It is very much about community, conversation, and joining the group. To the outside world, the college says, “We do quality control of individual minds: we pack them with education, and when they have enough education packed in them, we slap a diploma on them and ship them off.” The thing that keeps these two modes from colliding is just the inconvenience of the real world. It is a hassle to get groups together and to coordinate times to meet. Real-world study stays pretty much bounded by the walls of the campus, and the two messages stay separate.

What Avenir did by moving the study group to Facebook was cause those two messages to collide, and we get a clash of metaphors. Ryerson says Facebook is like media; Avenir says it is just an extension of the real world; and we are caught in an either/or choice, a bit like the public-or-private choice in privacy. The problem is that whichever way you make that choice, you are going to make the wrong choice. You know what Facebook is like, and it is not like a fax machine or a mimeograph, and it is not like a meeting in the basement of Ryerson. Facebook turns out to be a lot like Facebook. There is no metaphor that can be picked up and slapped on it that will tell us what to do about it. Facebook is different from what has gone before it, and if it weren’t, it wouldn’t get any users. Facebook is only worth spending time on because it is different.

There is no simple solution to the problem. Avenir has a point: he had been invited into an environment where group conversation is normal, and he thought he was doing the right thing. For all of Ryerson’s terrible overreaction, they have a point too, because even though there are study groups that meet in the real-world Dungeon in the basement, none of those tables seats 146. If you have a small study group, half a dozen or so, and somebody comes in and says, “You know, I am really just here to mooch off you guys; I just want the answers to the chemistry test, and I am not going to participate,” that person gets kicked out. Small groups defend themselves against free riders; large groups don’t.

The Internet allows large systems that are free-rider tolerant rather than free-rider resistant. If there are 146 people in a Facebook group, then somebody is free riding. There is more than enough information out there; we have known the formula for hydrochloric acid for some time now. We aren’t asking students to figure it out so that we know it; we are asking them to figure it out so that they have experience in figuring things out. This is exemplary of filter failure. The Ryerson/Avenir fight isn’t over information, or access to information; it is a fight over flows, and access to flows. It suddenly becomes clear that what we have to do is not put the filter back at the source, the way we have always done in the past, but rethink the institutional model. You have to have good conversation and individual effort, and you have to design a system that accommodates both. Currently, we are breaking the system we’ve got.

Part of the reason information overload presents such a consistent problem in the current environment is that we don’t have obvious tools to pick up. We use metaphors of current media and of physical space, and each of those illuminates part of the current landscape, but not enough of it.

We are really pitched forward into a new challenge, and I believe this isn’t a design problem. I don’t think anybody can start coding the college of the future tomorrow. This is more of a mental shift: a way of seeing the world that assumes we are to information overload as fish are to water—it is just what we swim in. Isaac Asimov once said, “If you have the same problem over a long time, maybe it is not a problem; it is a fact.” That is information overload. Talking about it as if it explains or excuses anything is actually a distraction. We have had information overload in some form or another since the 1500s. What is changing now is that the filters we have relied on for most of the period since the 1500s are breaking, and designing new filters doesn’t mean simply updating the old ones. They have broken for structural reasons, not for service reasons.

In some situations this will be a simple matter of programming. Certainly the pressure to get this right has led to an enormous number of post-hoc filtering mechanisms, filters applied after publication rather than before. That is why Digg’s voting mechanisms work and why tagging mechanisms work; it is the logic behind all search engines.
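The voting, tagging, and search mechanisms named here all share one shape: everything gets published, and the filter runs afterward. A minimal sketch of the voting flavor, with invented post titles and thresholds:

```python
# Filter-after-publish: nothing is screened up front; readers' votes and
# tags decide what surfaces, instead of an editor deciding what gets printed.

posts = [
    {"title": "hplc tips",     "votes": 42, "tags": {"chemistry", "lab"}},
    {"title": "my lunch",      "votes": 1,  "tags": {"food"}},
    {"title": "filter design", "votes": 17, "tags": {"filters", "design"}},
    {"title": "cat pictures",  "votes": 99, "tags": {"cats"}},
]

def surface(posts, tag=None, min_votes=10):
    """Digg-style filter: keep what the crowd voted up, optionally by tag."""
    kept = [p for p in posts
            if p["votes"] >= min_votes and (tag is None or tag in p["tags"])]
    return sorted(kept, key=lambda p: p["votes"], reverse=True)

print([p["title"] for p in surface(posts)])
# ['cat pictures', 'hplc tips', 'filter design']
```

The design choice is the one the talk describes: the cost of publishing “my lunch” is near zero, so nothing stops it from existing; the filter’s only job is to keep it off the front page.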

Some of it will not be. Some of it is going to be about rethinking social norms. When we feel ourselves getting too much information, the discipline is to ask ourselves not “What happened to the information?” but rather “What filter just broke? What was I relying on before that stopped functioning?” When we start asking that question, we will get some clue as to where to put the design effort.
