- From: Benja Fallenstein <b.fallenstein@gmx.de>
- Date: Wed, 10 Mar 2004 13:52:00 +0200
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: ext Phil Dawes <pdawes@users.sourceforge.net>, www-rdf-interest@w3.org
Patrick Stickler wrote:
>>> (2) it violates the principle of URI opacity
>>
>> Is this a real-world problem? robots.txt violates the principle of
>> URI opacity, but still adds lots of value to the web.
>
> And it is frequently faulted, and alternatives actively discussed.
>
> In fact, now that you mention it, I see URIQA as an ideal replacement
> for robots.txt in that one can request a description of the root
> web authority base URI, e.g. 'http://example.com', and receive a
> description of that site, which can define crawler policies in
> terms of RDF in a much more effective manner.

That would carry over one of the reasons why we need a replacement for
robots.txt in the first place: its notion of 'web site' is bad. If
somebody maintains a website for some project at
http://someuniversity/~name/projectname/, that site should be able to
have e.g. robot exclusion information without convincing the
university's web server admins or purchasing a ___domain name. See

http://www.tbray.org/ongoing/When/200x/2004/01/08/WebSite36

The above proposes a Website: header containing an RDF URI. With URIQA,
you could do an MGET on a page to discover its site, then do an MGET on
that URI to find out about its robots policy. But doing an MGET on the
root URI of the ___domain would carry over exactly that flaw.

- Benja
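For illustration, here is a minimal sketch of the two-step lookup Benja
describes, assuming a URIQA-style server that answers the MGET method
with an RDF description of the requested resource. The page URI and the
idea that the page's description names its site are taken from the
message; the property linking a page to its site is a hypothetical
placeholder, since the thread specifies no such vocabulary.

    # Sketch of the discovery flow, assuming a URIQA-capable server.
    # "MGET" is URIQA's metadata-retrieval method; the property that
    # links a page to its site is hypothetical, not a published term.
    import http.client
    from urllib.parse import urlsplit

    def mget(uri):
        """Ask the server for an RDF description of `uri` via MGET."""
        parts = urlsplit(uri)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("MGET", parts.path or "/")
        response = conn.getresponse()
        rdf = response.read()
        conn.close()
        return rdf

    # Step 1: describe the page; its description names the site it
    # belongs to (hypothetical page URI).
    page_rdf = mget("http://someuniversity/~name/projectname/page.html")
    # ... parse page_rdf and extract the site URI from it ...

    # Step 2: describe the site; its description carries the robots
    # policy, expressed in RDF.
    # site_rdf = mget(site_uri)

The point of the sketch is that step 1 starts from the page's own URI
rather than from the ___domain root, so a project living under
~name/projectname/ can declare its own site and crawler policy.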
Received on Wednesday, 10 March 2004 06:52:41 UTC