October 24, 2005

Alternatives to the Semantic Web?

Danny Ayers asks about Alternatives to the Semantic Web?

My rant got pretty long, so I've put it here. (This is also notice that Platform Wars is now going to be my main place for talking about the Semantic Web, since it mainly interests me as a battleground between rival theories.)


[quote]One of the reasons the Semantic Web vision appeals to me is I lack the imagination to think of alternatives[/quote]

Sure you can. Take the defining feature of the semantic web (the URI) and negate it. :-)

[quote]and it also seems to make sense to use URIs as the key identifiers. Er… but that’s the Semantic Web.[/quote]

Agreed (with second part). And that's the crux of the matter.

I'll suggest the alternative to the SemWeb is the SynWeb, a web which doesn't need "key identifiers". A world with lots of online data, marked up with syntactic cues which make it easy to parse (eg. good old-fashioned XML, Markdown or YAML); more powerful tools and libraries for parsing and querying data in these formats; plus lots of programs which scoop up the data and combine it in interesting ways.

The difference is that the knowledge needed to give semantics to the data resides in the programs which do the combining, rather than in a schema which has been prepared earlier.

Why is this "better" (easier, more plausible)?

Because it's much easier to decide what something like an "author name" means at the point where you're producing and consuming it - ie. in the context of an application which actually wants that information - than it is to correctly determine what it means in advance, in general, for all possible producers and consumers[1].

This is the way meaning works everywhere else - eg. in natural language, the meaning of a text depends on the interpretations made by the author and the reader, in the pragmatic context of what they're communicating about. It's not formally fixed as the sum-total of the meanings of all the words.

Could the SynWeb bring all the benefits of the semantic web?

Most of them. In the sense that any particular application you can think of that requires that someone write a specific program (P1) to put data from A together with data from B, can be done in the SynWeb. In that case, the knowledge is going to reside *within* the program P1.[2]
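To make the P1 case concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two tiny datasets, the field names (author, speaker) and the function name p1 are all made up. The only point is that the mapping between A's vocabulary and B's vocabulary lives inside the program.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical output of producer A: a book list in XML.
A_XML = """
<books>
  <book><title>Book One</title><author>Alice Example</author></book>
  <book><title>Book Two</title><author>Bob Example</author></book>
</books>
"""

# Hypothetical output of producer B: talks in JSON, with different field names.
B_JSON = """
[
  {"speaker": "Alice Example", "event": "ExampleConf"},
  {"speaker": "Carol Example", "event": "OtherConf"}
]
"""


def p1(a_xml, b_json):
    """Join A's books to B's talks on the person's name.

    The knowledge that A's <author> and B's "speaker" refer to the same
    kind of thing is written down here, in the program, not in a schema.
    """
    books_by_author = {}
    for book in ET.fromstring(a_xml).findall("book"):
        author = book.findtext("author").strip()
        title = book.findtext("title").strip()
        books_by_author.setdefault(author, []).append(title)

    joined = []
    for talk in json.loads(b_json):
        for title in books_by_author.get(talk["speaker"], []):
            joined.append(
                {"person": talk["speaker"], "book": title, "event": talk["event"]}
            )
    return joined


result = p1(A_XML, B_JSON)
print(result)
```

Neither A nor B had to agree on identifiers in advance. Whoever wrote p1 looked at both formats and decided that the two fields line up.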

The one thing the semweb promises that the non-semweb can't deliver is the "miracle" application: where A and B produce data without any knowledge of, or deliberate co-ordination with, each other, and a user of program P2, which is a generic semweb joiner without any special knowledge of A or B, finds that the two forms of data are such an exact fit that they can usefully be combined.
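For contrast, here is a toy sketch of what a generic P2 might look like. It is not a real RDF library: triples are plain Python tuples, and every URI and the function name p2 are invented for the example. P2 merges whatever it is given and groups it by subject URI, without ever looking at what anything means.

```python
# Each dataset is a set of (subject, predicate, object) triples.
# All URIs below are made up for illustration.
ALICE = "http://example.org/people/alice"
NAME = "http://example.org/vocab/name"
WORKS_FOR = "http://example.org/vocab/worksFor"

A = {(ALICE, NAME, "Alice Example")}

# B happens to use the same URI for Alice as A does.
B = {(ALICE, WORKS_FOR, "Example Corp")}

# B_OTHER describes the same person under a different, local identifier.
B_OTHER = {("http://example.net/staff/17", WORKS_FOR, "Example Corp")}


def p2(*datasets):
    """Generic joiner: merge triples and group them by subject URI.

    It has no knowledge of what any predicate or subject means.
    """
    merged = {}
    for dataset in datasets:
        for subject, predicate, obj in dataset:
            merged.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return merged


same = p2(A, B)             # one subject carrying both properties
different = p2(A, B_OTHER)  # two unrelated subjects, nothing joined
print(len(same), len(different))
```

The join in same only happens because A and B happened to pick the same URI for Alice. In different nothing lines up, and somebody has to supply the missing knowledge by hand.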

I guess the degree to which you believe in the semweb promise is the degree to which you think that such miracle situations will occur in real life. Personally, I think that the hard part is understanding the data from A and B sufficiently well to see if and when they can be combined at all.

Anyone who can do that can probably write a P1, containing that insight. Manipulating the relevant XML, especially with today's XML libraries, isn't so hard. And I think the SynWeb will see yet more powerful syntax processing and querying tools.[3]

The semweb scenario presupposes users who can't write such a quick custom script to combine A and B, but can understand the data (and the schemas) well enough to notice and formulate (in some sort of query language) sensible joins.

I may be wrong, and I'm always open to counter-evidence, but I still can't think of an example where this has actually taken place (ie. two datasets have been usefully joined by a program which didn't explicitly know about these two data-sets.) Any suggestions?[4]

Notes

[1] Sure you can use something like RDF as a representation format of data for a specific application for one set of users. But in this case the URI isn't actually buying you anything over any other sort of locally produced UID. So the differentiating feature of the semweb isn't actually being used.

[2] In the comments: [quote]And writing scrapers is reasonably easy to do. I think this has got a lot of potential. There’s more work to be done on the software, but to me it is the best attempt at doing useful RDF that I have seen so far.[/quote]

Of course it's the best attempt at doing anything useful.

But scrapers are the living embodiment of the SynWeb.

Scrapers are the avatars of the theory that programs, not URIs, are what give meaning to data. They're stocks of rival knowledge about how to interpret it.

They're what the SemWeb wants to dispense with. Or rather, would be dispensing with if things were going its way. Instead, the proliferation of scrapers is a strong hint that it's not working out.

[3] RDBMS analogies with the semweb are wrong. The RDBMS is basically a powerful SynWeb tool. Meaning is relative to the applications. The design of a database is typically internal to a project or organization, and meaning derives from this context. To the extent that SPARQL is just a good query language for graph-shaped databases, it might also be a good SynWeb tool.

[4] I think some people have already mentioned the capability of adding data from other ontologies as a passenger on RSS 1.0 feeds, but unless the feed-consumer is doing something interesting with this data, without knowing about it, it still isn't doing anything that a P1-style program in the SynWeb couldn't do.
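As a rough sketch of that P1-style alternative: a feed consumer that explicitly knows about one passenger element (here dc:creator from Dublin Core) and pulls it out with an ordinary XML library. The feed content and the function name creators_by_link are invented for the example; the namespace URIs are the standard RSS 1.0, RDF and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes this consumer has been told about in advance.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up RSS 1.0 feed with one item carrying a Dublin Core passenger.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/posts/1">
    <title>First post</title>
    <link>http://example.org/posts/1</link>
    <dc:creator>Alice Example</dc:creator>
  </item>
</rdf:RDF>"""


def creators_by_link(feed_xml):
    """Map each item's link to its dc:creator, skipping items without one.

    The decision that dc:creator is worth extracting is made here,
    by the programmer, not by the feed.
    """
    root = ET.fromstring(feed_xml)
    creators = {}
    for item in root.findall("rss:item", NS):
        link = item.findtext("rss:link", namespaces=NS)
        creator = item.findtext("dc:creator", namespaces=NS)
        if creator is not None:
            creators[link] = creator
    return creators


print(creators_by_link(FEED))
```

This consumer does something with the extra data only because someone wrote code that knows what it is, which is exactly the P1 situation.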

3 comments:

Danny said...

Manual trackback:
http://dannyayers.com/archives/2005/10/24/an-alternative-the-synweb/

garrett said...

If I understand your thoughts correctly, I've been thinking along a similar path... I've been specifically considering a world with YAML (or something similar) as an expressive form to communicate structured, loosely typed data. This is semantically marked-up data, but very dependent on the context of execution (reminiscent of the "pragmatic web" notion).

I posted a quick screenshot and discussion of a rails application implementing a form of a parallel, typed 'command line' for a web server. If nothing else, it's been interesting to consider...

http://discoverrails.blogspot.com/2005/10/command-line-on-steroids-with-rails.html

Also, I don't quite understand how you differentiate between the syntactic, the semantic and the pragmatic.

-garrett

Composing said...

Yep, that YAML thing looks very cool. (Though it's kind of annoying to me as a Pythonista that so much cool stuff seems to be happening in Rails these days. ;-)

But I'm definitely a big fan of SmartAscii / Wiki-like markup languages for programming.

Not sure I really understand the semantic / pragmatic distinction either. What I feel is that if the capital S, capital W SemanticWeb means anything, it's a particular theory about what's the right way to infer the meaning of a piece of data. And that theory is that data becomes meaningful when it's associated with a particular URI.

I think the original grand vision of the SemWeb is that this association with the URI is both necessary and sufficient for a piece of data to be considered meaningful, whereas from people like Danny Ayers I get the feeling that they'll accept alternatives, just that URIs are the preferred way.

Now, I personally feel that meaning is widely distributed in the context. Programs are good repositories of meaning, although human practice and programmer folklore are acceptable too.

What I mean by "pragmatic" is just that the true "meaning" of a piece of data is determined by (can be inferred from) this wider context, including the requirements of the users, the behaviour of the programs, the file-format etc.

I don't see URIs as necessary, or even why they should be preferred. I think there are cases when it's pretty hard to create a URI naming scheme. And much easier to create a normal file format and assume that the programs that use it will be able to make the right inferences.