- From: Benja Fallenstein <b.fallenstein@gmx.de>
- Date: Wed, 10 Mar 2004 13:52:00 +0200
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: ext Phil Dawes <pdawes@users.sourceforge.net>, www-rdf-interest@w3.org
Patrick Stickler wrote:
>>> (2) it violates the principle of URI opacity
>>
>> Is this a real-world problem? robots.txt violates the principle of
>> URI opacity, but still adds lots of value to the web.
>
> And it is frequently faulted, and alternatives actively discussed.
>
> In fact, now that you mention it, I see URIQA as an ideal replacement
> for robots.txt in that one can request a description of the root
> web authority base URI, e.g. 'http://example.com', and receive a
> description of that site, which can define crawler policies in
> terms of RDF in a much more effective manner.

That would carry over one of the reasons why we need a replacement for
robots.txt in the first place: its notion of 'web site' is bad. If
somebody maintains a website for some project at
http://someuniversity/~name/projectname/, that site should be able to
have e.g. robot exclusion information without convincing the
university's web server admins or purchasing a ___domain name. See

http://www.tbray.org/ongoing/When/200x/2004/01/08/WebSite36

The above proposes a Website: header containing an RDF URI. With URIQA,
you could do an MGET on a page to discover its site, then do an MGET on
that URI to find out about its robots policy. But doing an MGET on the
root URI of the ___domain would carry over exactly that flaw.

- Benja
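For illustration, here is a minimal sketch of the two-step lookup Benja
describes, assuming a URIQA-style server that answers the MGET method
with an RDF description of the requested resource. The page URI and the
idea that the page's description names its site are taken from the
message; the property linking a page to its site is a hypothetical
placeholder, since the thread specifies no such vocabulary.

    # Sketch of the discovery flow, assuming a URIQA-capable server.
    # "MGET" is URIQA's metadata-retrieval method; the property that
    # links a page to its site is hypothetical, not a published term.
    import http.client
    from urllib.parse import urlsplit

    def mget(uri):
        """Ask the server for an RDF description of `uri` via MGET."""
        parts = urlsplit(uri)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("MGET", parts.path or "/")
        response = conn.getresponse()
        rdf = response.read()
        conn.close()
        return rdf

    # Step 1: describe the page; its description names the site it
    # belongs to (hypothetical page URI).
    page_rdf = mget("http://someuniversity/~name/projectname/page.html")
    # ... parse page_rdf and extract the site URI from it ...

    # Step 2: describe the site; its description carries the robots
    # policy, expressed in RDF.
    # site_rdf = mget(site_uri)

The point of the sketch is that step 1 starts from the page's own URI
rather than from the ___domain root, so a project living under
~name/projectname/ can declare its own site and crawler policy.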
Received on Wednesday, 10 March 2004 06:52:41 UTC