Why is the web so weird and witless?

INNOVATION TALK: THE EARLY worldwide web was chiefly used by companies to publish content and advertisements about themselves.

Today, the web has developed into a social network to which all of us can easily contribute content ourselves, through tweets, Facebook posts, online comments and so on. However, I sometimes wonder whether it has become the “worldwide what the heck?”, as inane content is frequently and automatically presented to us as a frustrating distraction.

Too often when we search the web, irrelevant results are returned that bury the real answers we seek. Frequently when using a social network, pointless advertisements are served up for extraneous offers such as animal print clothing, teeth implants and half marathons. Why is the web so weird and witless?

Sir Tim Berners-Lee is the founding father of the web, and built the world’s first web site in 1991. Since 2001, he has been promoting the “semantic web” as an extension of the current web to give well-defined meaning to the information available online, so enabling better co-operation between computers and people. Much of today’s web software does not understand the meaning of web pages: while it may understand that a page should be formatted in a certain way, it may not understand that, for example, the page I am currently reading relates to treatment prescribed by my mother’s doctor, and that I now need to identify a reputable physiotherapist within 10km of her home. Berners-Lee advocates that if a large amount of the data available worldwide on the web could be categorised, sorted and understood by computers, then the web would become immeasurably more valuable as a global resource.

The first step for the semantic web has been to classify information using taxonomies (akin to the Dewey Decimal Classification system widely used in libraries). This can then be augmented by ontologies, which are akin to equipping a computer with concepts: for example, the concept of a “company” is a “legal entity” owned by “shareholders” and located at a set of “places” where “people” come together to offer a “service” or “product” bought by “customers” in conjunction with “partners” and “suppliers” and obeying “regulations” established by “government authorities”. In principle, once data and content are labelled and tagged using these approaches, more intelligent software could be engineered not only to understand how to lay out a web page for a screen, but also to understand what each page is actually describing.
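
As a rough illustration of the “company” concept described above, an ontology can be sketched as a list of subject, relation and object triples. This is only a minimal sketch: the relation names (is_a, owned_by and so on) and the helper functions are invented for this example and do not come from any real semantic web standard or library.

```python
# Hypothetical mini-ontology for the "company" concept, stored as
# (subject, relation, object) triples. All relation names are illustrative.
TRIPLES = [
    ("company", "is_a", "legal entity"),
    ("company", "owned_by", "shareholders"),
    ("company", "located_at", "places"),
    ("company", "offers", "service"),
    ("company", "offers", "product"),
    ("company", "sells_to", "customers"),
    ("company", "works_with", "partners"),
    ("company", "works_with", "suppliers"),
    ("company", "obeys", "regulations"),
    ("regulations", "established_by", "government authorities"),
]


def related(subject, relation):
    """Return every object linked to `subject` by `relation`, in listed order."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def describe(subject):
    """Group all facts about `subject` into a dict of relation -> objects."""
    facts = {}
    for s, r, o in TRIPLES:
        if s == subject:
            facts.setdefault(r, []).append(o)
    return facts


if __name__ == "__main__":
    print(related("company", "offers"))
    print(describe("regulations"))
```

Software holding such triples could answer a question like “what does a company offer?” by lookup rather than by guessing from page layout, which is the kind of understanding the semantic web is aiming for.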

There are some obvious challenges. Some of the content on a specific web page could be vague and incomplete – or even sarcastic, ironic or deceitful. Taxonomies and ontologies should work across all languages. In the absence of a global central authority, independently developed ontologies may be inconsistent: is the concept of a “company” in Ireland entirely correct for, say, an IFSC back-office operation? A less obvious but key technical point is that web pages are currently structured as “trees” (a page contains sections which in turn contain paragraphs which in turn contain sentences), whereas knowledge is structured as “graphs” (for example, people do not contain other people, but rather could be related, and/or friends, and/or share interests, and/or work for the same company).
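
The difference between a “tree” and a “graph” can be made concrete with a small sketch. The page layout, the people's names and the relation labels below are all made up for illustration; the point is only that a tree is walked by containment, while a graph is walked by following links in any direction.

```python
# A page as a tree: each node contains its children, and nothing else.
page = {
    "section 1": {
        "paragraph 1": ["sentence 1", "sentence 2"],
        "paragraph 2": ["sentence 3"],
    },
}


def count_sentences(node):
    """Walk the tree top-down; every sentence is reached exactly once."""
    if isinstance(node, list):
        return len(node)
    return sum(count_sentences(child) for child in node.values())


# Knowledge as a graph: labelled links between people (hypothetical names).
# Nobody "contains" anybody else, and the links form a loop.
edges = [
    ("Ann", "friend_of", "Bob"),
    ("Bob", "works_with", "Cara"),
    ("Cara", "related_to", "Ann"),
]


def neighbours(person):
    """Everyone linked to `person`, whichever way round the link was recorded."""
    found = set()
    for a, _, b in edges:
        if a == person:
            found.add(b)
        elif b == person:
            found.add(a)
    return found


if __name__ == "__main__":
    print(count_sentences(page))
    print(sorted(neighbours("Ann")))
```

Note that the three people form a cycle, something a strict tree cannot represent without duplicating a node.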

The “social semantic web” adds human intervention via social networks. Automatic classification of web content using taxonomies and ontologies can be augmented by collaborative labelling and tagging of data by humans. Some strategists believe that exploiting social networking can lead to better results: for example, if I am seeking a good physiotherapist within 10km of my mother’s home, would asking my social circle of friends and acquaintances lead to a better result?

The quest for higher-quality online advertising – the “right ad at the right time in the right place” – is a strong commercial catalyst for a better and wiser web. Google attempts to solve this by inferring our interests based on our search behaviour. Facebook attempts to solve it by analysing our chit-chat with our friends. The more that an online advertiser can encourage us to directly or indirectly tell it about our interests, the more likely it is to become highly successful – and useful. Despite what many in the print media believe, there is a substantial opportunity for them online, since they could observe which specific articles each of us reads and tailor advertisements accordingly.

However, it would be disappointing if the sole benefit of the semantic web were better targeted advertising. Rather, it should actively assist us, gently intervening when needed, politely bringing things to our attention. Imagine if web software tools were sufficiently powerful to unearth the latent intelligence already in the web. Medical clinicians may discover new links between diseases, deduced from research results already available today but currently lost in the mass of the web. Historians may realise that particular events are related, based on evidence whose importance had been overlooked. Indeed, all of us may discover new relationships between stories, events and data which were latent but hitherto unrecognised in the web.

There have been two decades of the worldwide web, and a decade of the semantic web, but the web still has many “what the heck” moments. There is substantial opportunity for innovation to make the web wise and intelligent.


Chris Horn is an Advisory Board member of the SFI-funded DERI project at NUIG researching the semantic web. He is also chairman of Sophia Search, a Belfast-based company with semantic search and discovery solutions.

Chris Horn

Chris Horn, a contributor to The Irish Times, was the cofounder, chief executive and chairman of Iona Technologies