(De)centralization of information activities and services

Note: All trademark rights are held by their respective holders.

At the last Open Source Convention (OSCON), a lively debate developed between Eben Moglen, who heads the Software Freedom Law Center, and the publisher Tim O’Reilly, organizer of OSCON. Tim O’Reilly triggered this exchange by declaring that free software licenses are now obsolete, as they are unable to address the new proprietary control challenges raised by the centralized information services of the Web 2.0 era. Since Eben Moglen has spent the best part of his recent life as the orchestra conductor of the revision process for the main free software license, a process that recently gave birth to version 3 of the GNU General Public License, he was bound to react to such a statement.

Eben Moglen went slightly further than just reacting. He delivered what Tim O’Reilly himself described as a tongue-lashing, criticizing O’Reilly for having wasted the last 10 years promoting minor business models under the name of open source instead of focusing on fundamental issues of rights and freedoms. He depicted Web 2.0 as just one more minor fad in this line, hiding the much deeper trend towards decentralization of IT infrastructure and activities. He urged O’Reilly to join the conversation on human rights and freedoms. He argued that the GPLv3, far from being obsolete, has rightly acknowledged that not all issues can be solved by a free software license: it has allowed for experimenting with new licensing models specifically tailored for Web services (the Affero GPL license, still being finalized) and has set the ground for a new process of “diplomacy” to solve the complex conflicts of rights in today’s IT world.

In this piece, I address an underlying issue in this debate. During the few moments when Eben Moglen had to breathe, Tim O’Reilly tried to do what he is a master at: raising issues of architecture. It is my feeling that the discussion on centralization and decentralization of IT services deserves a more in-depth treatment than was possible in the OSCON debate. I share Eben Moglen’s frustration with the open source promoters’ refusal to see beyond software development and business models and to acknowledge the fundamental stakes of the information era. However, this frustration is a poor adviser if it leads us to ignore what may be a valid insight, even when that insight is mixed with debatable arguments or motivations.

Types of information services

A wide variety of information services are mixed together in the Web 2.0 basket. Some are simple Web browser-based versions of applications that preexisted as standalone software with almost the same functionality, though of course their promoters try to add some groupware facilities to make them look different. Other services deal mostly with the collaborative production of metadata used for access to distributed contents. Still others develop functionality that could not be imagined, or never took off, in the standalone or LAN-based client-server environment. With such diverse situations, it is unlikely that one can treat the (de)centralization issue in one general statement. To approach it, we need a classification of information services. The one I propose here is based on simple parameters, so it should be possible for anyone who uses or intends to develop an information service to find out to which categories this service belongs (as with all taxonomies, real-life examples often cut across the boundaries of ideal classes). The parameters are as follows:

  • whether use of the service is for producing information, locating information or distributing it;
  • whether many users contribute to the production of each relevant unit of information or meta-information, or on the contrary each relevant (to usage) piece of information is produced by an individual or a limited group of people[1];
  • whether the information object of the service is separable, that is, there is value in a limited part of it, or on the contrary one needs access to the whole to use it in a relevant manner.

Let’s look at some major classes of information services and examples in each class:

| Class | Use for? | Many or few producers of relevant information? | Separable in usage? |
| --- | --- | --- | --- |
| Web versions of personal and small group applications (Writely/GoogleDocs, NumSum, Netvibes) | production | few | separable |
| Media hosting and distribution (Flickr, YouTube, DailyMotion, …) | distribution | few | separable |
| Open collective granular drafting (Wikipedia) | production | few | separable for certain uses only |
| Web contents/debate cartography (Glinkr) | locating | few / many | separable |
| Public commenting and annotation systems (STET, co-ment, Plosone) | production | many (comments) / few (text) | separable at text level |
| Social networking (MySpace, FaceBook, Meetup, …) | production / locating | many (network) / few (profile and docs) | separable / whole |
| Integration between large-scale referential information and interpretation from persons or groups (GoogleMaps, ENSEMBL, …) | locating / production | many | whole (referential background) |
| Search services (GoogleSearch), metadata aggregation services (Technorati) | locating | many | whole |
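To make the three parameters easier to manipulate, here is a minimal sketch in Python; the names and structure are mine, not part of any standard, and only a few rows of the table are encoded:

```python
from dataclasses import dataclass
from enum import Enum

# The three classification parameters from the list above
# (names are illustrative, not a standard vocabulary).
class Use(Enum):
    PRODUCTION = "production"
    LOCATING = "locating"
    DISTRIBUTION = "distribution"

class Producers(Enum):
    FEW = "few"    # each relevant unit made by an individual or small group
    MANY = "many"  # each relevant unit is a collective product

class Separability(Enum):
    SEPARABLE = "separable"  # a limited part of the information has value on its own
    WHOLE = "whole"          # one needs access to the whole to use it meaningfully

@dataclass
class ServiceClass:
    name: str
    examples: list[str]
    uses: set[Use]
    producers: set[Producers]
    separability: set[Separability]

# A few rows of the table, encoded as data.
classes = [
    ServiceClass("Web versions of personal applications",
                 ["Writely/GoogleDocs", "NumSum", "Netvibes"],
                 {Use.PRODUCTION}, {Producers.FEW}, {Separability.SEPARABLE}),
    ServiceClass("Media hosting and distribution",
                 ["Flickr", "YouTube", "DailyMotion"],
                 {Use.DISTRIBUTION}, {Producers.FEW}, {Separability.SEPARABLE}),
    ServiceClass("Search and metadata aggregation",
                 ["GoogleSearch", "Technorati"],
                 {Use.LOCATING}, {Producers.MANY}, {Separability.WHOLE}),
]

# Services whose data is separable and individually produced are the
# easiest to decentralize: users can leave with their own data.
easy_to_decentralize = [c.name for c in classes
                        if Separability.SEPARABLE in c.separability
                        and Producers.FEW in c.producers]
print(easy_to_decentralize)
```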

Note that for the first class, the dominance of a particular service provider seems somewhat contingent, even arbitrary. If the software basis of the corresponding service is freely available and usable, and users are free to export data, the cost of switching to another provider or of becoming one’s own provider is limited. There are benefits to using the same provider as many other people, particularly when each individual is involved in many groups or when learning to use the service is complex. But if both the software and data exports are free (as in freedom), any serious abuse of user rights by the provider is likely to be sanctioned by desertion. On the contrary, if the software basis is proprietary or if there are practical obstacles to data exports, users may be deeply locked in (how deeply depends on how essential the service, or the data they have created using it, is to them).

By contrast, a service in the last category (search) is intrinsically hard to achieve without centralizing large amounts of information. P2P searching is progressing, but the simple physics of information indicates that it is very unlikely to compete on reasonable terms with searching based on centralized (or at least partially centralized) indexes. Entry costs in service provision are huge. Users are not locked in by their own data, but the offer of services is limited to an oligopoly, and the offer of high-quality search engines that are reasonably fair (to all information sources) is even smaller, and could one day be empty.
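To illustrate this “physics of information” point, here is a deliberately rough back-of-envelope sketch; every number in it is an invented, round assumption meant only to show orders of magnitude, not a measurement:

```python
# Rough, invented numbers: a back-of-envelope comparison of answering a
# query against a local centralized index versus fanning it out to peers.

LOCAL_INDEX_LOOKUP_S = 0.05       # assumed: lookup time in a central index
PEER_ROUND_TRIP_S = 0.15          # assumed: one network round trip to a peer
PEERS_HOLDING_RELEVANT_DATA = 50  # assumed: peers to consult per query
PARALLEL_FANOUT = 10              # assumed: peers that can be queried at once

centralized = LOCAL_INDEX_LOOKUP_S
p2p = (PEERS_HOLDING_RELEVANT_DATA / PARALLEL_FANOUT) * PEER_ROUND_TRIP_S \
      + LOCAL_INDEX_LOOKUP_S      # each peer still does its own lookup

print(f"centralized query: ~{centralized:.2f} s")
print(f"distributed query: ~{p2p:.2f} s")
# Even with generous parallelism, the distributed query comes out roughly an
# order of magnitude slower, which is the point made above: searching the
# planet at query time cannot compete with a (partially) centralized index.
```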

Why is centralization not dead?

So we are faced with some situations where centralization has clear technical and functional benefits, and others where it is limited to effects of scale that matter more for the ability to cash in on advertising (delivering users to businesses) than for delivering value to users. If you are searching the Web, any solution you design that implies searching the whole planet of information at query time will be painfully slow. Of course, there are intermediate situations where indexing is partially centralized (on the model of peering servers in P2P). If you design a service such as co-ment, STET, Plosone or ENSEMBL, where a text (in the wide sense, including a genome) is associated with comments or annotations from users distributed over the planet, it will be very painful to use if at least the metadata associated with one text’s comments is not centralized in some manner. However, it is perfectly feasible to decentralize the service itself, so that different texts live in different instances of a service operated from various locations. If you operate a video distribution service, centralization brings some economies of scale on the connectivity needed, but P2P (BitTorrent-type) distribution enables small services to compete very decently with large ones … except for access to advertising. If you want to share a few texts among a group of friends, together with a few communication facilities, the only gains from centralization result from user network effects (the fact, for instance, that many people already have a Google account) and leverage from one application domain to another. If you operate a service based on referential data such as geographic information, the centralization gain will most of the time result only from proprietary control over some software or over the referential information itself.
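One possible reading of “decentralizing the service itself” for annotation services of the co-ment/STET kind is sketched below; the instance URLs and the routing-by-hash scheme are purely illustrative assumptions, not a description of how any of these services actually works:

```python
# Hypothetical sketch: comments on a given text are kept together on one
# instance, but different texts can live on instances run by different
# operators. Only the text -> instance mapping needs to be shared.

import hashlib

# Independent service instances, possibly run by different organizations
# (URLs are invented for illustration).
INSTANCES = [
    "https://annotations.example.org",
    "https://comments.example.net",
    "https://review.example.edu",
]

def instance_for(text_id: str) -> str:
    """Deterministically pick the instance hosting a given text.

    In practice this could be an explicit directory instead of hashing;
    hashing just shows that the shared mapping can stay tiny while the
    bulky data (texts and comments) stays decentralized.
    """
    digest = hashlib.sha256(text_id.encode("utf-8")).hexdigest()
    return INSTANCES[int(digest, 16) % len(INSTANCES)]

def comment_url(text_id: str) -> str:
    # All comments on one text end up on the same instance, which keeps
    # per-text metadata together without centralizing the whole service.
    return f"{instance_for(text_id)}/texts/{text_id}/comments"

print(comment_url("human-genome-chr21"))
print(comment_url("gplv3-draft-4"))
```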

Interestingly, centralization needs do not result only from the physical distribution of information to which access is needed at a given time. The development of new decentralized activities itself results in new demands for centralized services. A typical example is how the development of metadata and its use in RSS syndication fueled the success of services such as Technorati or Pingomatic.
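The ping mechanism behind that example can be sketched with the conventional weblogUpdates.ping XML-RPC call that blog engines send to aggregators; the endpoint shown is the one commonly configured for Pingomatic but should be verified, and the blog details are of course illustrative:

```python
# Minimal sketch of the "ping" mechanism mentioned above: a weblog tells a
# central aggregator that it has new content, using the weblogUpdates.ping
# XML-RPC call.

import xmlrpc.client

PING_ENDPOINT = "http://rss.pingomatic.com/"   # Pingomatic's XML-RPC endpoint (to verify)

def ping(blog_name: str, blog_url: str) -> dict:
    """Notify the aggregator that blog_url has been updated."""
    server = xmlrpc.client.ServerProxy(PING_ENDPOINT)
    return server.weblogUpdates.ping(blog_name, blog_url)

# Usage (would perform a real network call):
# print(ping("My decentralized weblog", "http://blog.example.org/"))
```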

Why should we pay attention to the diversity in this bestiary of services? Because it means that the defense and promotion of user rights, freedoms and capabilities will have to rely on diverse mechanisms depending on which services we are talking about: software and data licenses, user awareness, what Eben Moglen called “diplomacy” (multi-party negotiation on conflict of rights under the attentive scrutiny of user communities) and this particular form of diplomacy that is called regulation, which we tend to forget these days because we have seen so many bad forms of it.

Capabilities, licenses, diplomacy and regulation

If our aim is to work towards human development, towards freedoms and capabilities, we need to make sure that the potential of information activities is not confiscated for the sole benefit of particular players, powers or forms of business. We need to build an environment where economic success rewards service to human and social development, rather than one where it results from successfully appropriating common resources. What I claim here is that different strategies are necessary depending on which information services give rise to proprietarization issues. These strategies can be illustrated by three examples that capture most of the differences described in the previous section:

  • Personal applications and, more generally, separable services: for these services, the existence of free software alternatives using licenses with strong copyleft clauses for services (such as the coming Affero GPLv3) and the ability to export data are the key enablers of freedom. This does not mean that everyone will desert proprietary software-based services, but the fact that sufficiently many people are able to do so is a powerful incentive for providers not to behave too badly in terms of user rights. For personal or separable services, all we need is the existence of free software-based alternatives and effective interoperability rights. The latter means that it should not be possible to use patents, database property-like rights, or similar restrictions to prevent people from exporting their own data out of a service, or from writing software to help other users do so (a deliberately hypothetical sketch of what such a data export looks like follows this list). We don’t need proprietary software-based services to be “open”; all we need is the possibility to open a door to go out of them. How far proprietary software providers will react by making their services more open, or even free, is up to them; past experience shows that, once helped by the existence of true free software alternatives, they make fast progress, with the rare exception of some incurable cases.
  • Services mixing common referential data with personal or small group data: for these services one needs, in addition to free software alternatives and interoperability rights, a third enabler for user freedoms and capabilities: a commons status for the referential information. The typical example is geographical information, for which the basic referential information is generally produced by public organizations. It has a quasi-commons status in the US, but is often proprietary in Europe. As a result, once the national geographic institutes have turned the information they produce into private property, this status extends to the services developed on its basis, whether through deals with GoogleMaps or when they are operated by the institute itself (as for instance with Geoportail). The fact that these services are provided wholly or partly free of charge is of course irrelevant, or actually worsens the picture: it means that it is more difficult for a new entrant to provide a service that respects user freedoms and capabilities. But how are we going to get this third enabler? Societal production of geographical data (for instance) can act as a last-resort alternative. However, while adding peer-produced data to a geographical background is truly efficient, the production of the referential itself is better done by a specialized organization. This means that only pressure on the organizations producing referential information and/or legislation giving a commons status to information produced by the public sector can secure this enabler of user freedoms.
  • Search engines and other services where widely distributed data needs to be processed in quasi-real-time: this is clearly the most complex situation. The problem is not network effects (which arise when many people use the same software or service); actually, the cost of switching from Google to another search engine is very low. The problem is: is there any such service that behaves decently? Up to now, Google has behaved better than others when delivering services to users (see the demonstration in Yochai Benkler’s Wealth of Networks, pages 285-289), which is why we all use it. How it behaves when providing services about users is much more uncertain. Providing a Web search service requires an extraordinary amount of network communication to be completed at query time or, more plausibly, prior to it. This means that even if you have all the software needed as free software, and possess all the know-how needed to use it, you still require a very significant amount of time and money to enter as a service provider. So is it good to have crawling, indexing and querying software as free software? Sure, if we can. But that is only a small part of the answer to the problem. How does one make sure that there exists at least one provider, for something expensive to set up, that behaves decently? Cory Doctorow has recently given a very good idea of what happens when none does. The traditional answer to that question was to arrange for a democratic government to provide it. However, that hardly seems a guarantee of decent behaviour, especially these days. The short-term answer might be diplomacy, but it is better to have some fallback solutions if diplomacy does not work, for instance because other diplomats, representing financial investors or security agencies, exert a stronger diplomacy. The weight can come from two elements: regulation (for instance privacy regulation, which Google tries to preempt by defining its homemade rules) and governmental organization of collecting the means for a societal alternative. The latter is a useful concept because it addresses the difficulty of collecting large sums from many people, while leaving the choice of who will actually do the work in the hands of the people. It can be done in many ways. The competitive intermediaries proposed by Jamie Love and Tim Hubbard in the field of medical research are an appealing concept for search engines, or will become one when the true effects of advertising and market intelligence business models show their full breadth. For a more general discussion of mutualization schemes, see this piece. Are the open services proposed by Tim O’Reilly of use? I doubt it, because the issues at stake go well beyond openness. They belong to fundamental rights, so fundamental that diplomacy can work, because many people care about fundamental rights. However, if it does not, we can’t afford to be left without a solution.
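As announced in the first item above, here is a deliberately hypothetical sketch of what “opening a door to go out” amounts to in code: the endpoint, token and JSON shape are invented for illustration and imply no real provider’s API, but the pattern is the point — fetch your own data and keep it in an open format you can take elsewhere.

```python
# Hypothetical sketch: what "exporting one's own data" looks like when a
# provider exposes (or cannot legally prevent) a machine-readable listing.
# The endpoint, token and JSON shape below are invented for illustration.

import json
import pathlib
import urllib.request

EXPORT_ENDPOINT = "https://docs.example.com/api/my-documents"  # invented
AUTH_TOKEN = "user-supplied-token"                             # invented

def export_all(destination: str = "my_documents") -> None:
    """Download every document the user owns into local, open-format files."""
    request = urllib.request.Request(
        EXPORT_ENDPOINT, headers={"Authorization": f"Bearer {AUTH_TOKEN}"})
    with urllib.request.urlopen(request) as response:
        documents = json.load(response)   # assumed: a JSON list of documents
    out_dir = pathlib.Path(destination)
    out_dir.mkdir(exist_ok=True)
    for doc in documents:
        # Keep the content in a plain, provider-independent format.
        (out_dir / f"{doc['id']}.txt").write_text(doc["content"], encoding="utf-8")

# export_all()  # would perform real network calls against the invented endpoint
```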

Finally, what about social networking systems? In the table above they appear as mixed creatures. The individual components of the information in these services are produced … by individuals. But for those interested in it, part of the functionality lies in network representation and navigation, which requires access to information produced by many sources. Despite this duality, I believe that a very decent peer-to-peer social networking system can be built without hitting the kind of physical obstacles that stand in the path of distributed search.
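As a rough illustration of why this seems plausible, here is a toy sketch in the spirit of FOAF-style self-hosted profiles: each person publishes a small document listing friends’ profile URLs, and the graph is recovered by traversal. The format and URLs are invented, and the “fetch” is stubbed with a dictionary instead of real HTTP.

```python
# Invented toy format: each user self-hosts a small profile that lists the
# URLs of their friends' profiles. The social graph then lives nowhere in
# particular and can be explored by traversal (in the spirit of FOAF).

from collections import deque

# Stand-in for fetching profiles over HTTP; here, a hard-coded toy network.
PROFILES = {
    "https://alice.example.org/profile": {"name": "Alice",
        "friends": ["https://bob.example.net/profile"]},
    "https://bob.example.net/profile": {"name": "Bob",
        "friends": ["https://alice.example.org/profile",
                    "https://carol.example.edu/profile"]},
    "https://carol.example.edu/profile": {"name": "Carol", "friends": []},
}

def fetch_profile(url: str) -> dict:
    return PROFILES[url]          # real code would do an HTTP GET here

def explore(start_url: str, max_hops: int = 2) -> set[str]:
    """Breadth-first traversal of the friend graph from one profile."""
    seen, queue = {start_url}, deque([(start_url, 0)])
    while queue:
        url, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for friend in fetch_profile(url)["friends"]:
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return seen

print(explore("https://alice.example.org/profile"))
```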

[1] This is not a stable distinction: the existence of Web-based production software for some contents encourages their collective production. However, at any given state of the social practice of creating, say, texts or videos, the distinction is quite clear, and evolution in this respect is a slow process involving changes to deeply rooted social behaviour.
