Don’t be misled by the title of this post: the quotation marks are the most important part. Sunday’s New York Times featured an article by John Markoff in which he introduces the term “Web 3.0” for the first time in a mainstream media publication. Web 3.0 (no quotes this time) and its closely related descriptor, “The Semantic Web,” have been discussed in geek circles and tech blogs for some time, but the NYT article appears to mark a milestone.
Turning away from the terminology debate for a moment, it is worthwhile to look at how this next-generation approach to finding information is already making its way into applications and venture capital portfolios. At its heart, the semantic web is another step in the evolution of content mark-up languages (XML) that enable computers to treat textual data (technically “unstructured” data, versus “structured” data that fits conveniently into rows and columns) in more intelligent ways. The enrichment of content, using free-form tags (“folksonomies”) or structured categorization systems (taxonomies), has its roots in information science going back decades, and even further back to the late 19th century, with the introduction of the Dewey Decimal System in libraries. What is new and exciting about the budding Web 3.0 era is the convergence of text mining and artificial intelligence, which enable computers to glean the meanings of words in context, with new applications that bring human intelligence (acting individually and in groups) to bear on this process. Whether through collaborative applications like wikis, through “voting systems” like digg.com, or through self-defined communities of interest like Boxxet, the new company from my former Biz360 boss You Mon Tsang, this combination of computing power and the inherent “wisdom of crowds” is already having an impact.
The company discussed in the Markoff article is a start-up called Radar Networks, founded by web visionary Nova Spivack, who co-founded EarthWeb and took it public in 1998. While his new company is still operating in stealth mode, the interview hints at the direction it is taking. To get accurate results (i.e., results that would seem plausible to anyone with average common sense) from any mathematical algorithm, you need masses of data. Text mining can “discover” concepts and trends from even a small corpus of data, but the results are often strange or laughable unless the process has been run across hundreds or thousands of documents. The same is true of algorithms based on “crowd” data or behavior; witness how Google rankings can be manipulated or distorted by the intentional actions of a small minority of users. It will be interesting to see how Radar Networks and other companies looking to commercialize the semantic web deal with this problem. Even so, the semantic web will inevitably become reality, enabled by the inexorable growth of computing power as more and more of us participate in the online world. Let’s stay tuned.