Frank --
Many thanks for your thoughtful comments, below.
I'll try to take up each of them later in this note, but first I should
point out that the approach being advocated is not actually a
controlled vocabulary one (as you assumed). It can be used for that
purpose, but the technology itself is actually open
vocabulary. The design is unusual, and it runs counter to the common
expectation that natural language processing must entail the
painstaking construction of (controlled) dictionaries and grammars of
that moving target, English.
Roughly speaking, the design of the system [1] is "lightweight
natural language plus heavyweight inference". What this means
is that there is a conceptually straightforward two-way mapping from an
English sentence with n place-holders (such as some-name, a-number, ...)
to a unique machine-oriented n-ary predicate. This circumvents,
rather than tries to solve, the "AI-complete natural language
understanding problem". But it has the kind of practical
advantages one would expect if it were possible to tie programming
language comments computationally to the underlying Java or similar
code.
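Roughly, in Python, the coupling might look like this. (An illustrative
sketch only -- the placeholder syntax and function names are my own
simplification, not the actual implementation in [1].)

    import re

    # Placeholders such as "some-person" or "a-number" mark argument positions.
    PLACEHOLDER = re.compile(r"\b(?:some|a|an)-[\w-]+")

    def sentence_to_predicate(sentence):
        # An English sentence with n placeholders maps to a unique n-ary
        # predicate: the template (with slots) names the predicate, and
        # the placeholders become its argument positions.
        args = PLACEHOLDER.findall(sentence)
        template = PLACEHOLDER.sub("{}", sentence)
        return template, args

    def predicate_to_sentence(template, args):
        # The inverse direction: fill the slots to recover English.
        return template.format(*args)

    template, args = sentence_to_predicate("some-person has flown to a-city")
    # template is "{} has flown to {}" -- in effect a 2-ary predicate name
    print(predicate_to_sentence(template, ["Alice", "Paris"]))
    # -> "Alice has flown to Paris"

No parsing of the English between the placeholders is attempted; the
sentence itself simply serves as the predicate's name, which is why no
dictionary or grammar is needed.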
To make this work, the supporting inferencing has to be highly
declarative in nature. It's based on [2].
Of course, if person A writes some lightweight English rules, and person
B computes with them, B may misunderstand what A meant. To mitigate
this, the system can supply step-by-step English explanations of its
inferencing.
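To give the flavor (a toy sketch only -- not the Backchain Iteration
method of [2], which is additionally proved terminating, sound and
complete), one can backchain over the sentence templates above and
render each proof step back into English:

    # Facts and rules are keyed by the sentence templates sketched above.
    # Single capital letters stand for placeholders (variables).
    facts = {
        ("{} is flying to {}", ("Alice", "Paris")),
        ("{} requires a visa", ("Paris",)),
    }
    rules = [
        # "some-person will need a visa for some-country if that person
        #  is flying to that country and that country requires a visa"
        (("{} will need a visa for {}", ("P", "C")),
         [("{} is flying to {}", ("P", "C")),
          ("{} requires a visa", ("C",))]),
    ]

    def is_var(t):
        return len(t) == 1 and t.isupper()

    def walk(t, b):
        while is_var(t) and t in b:
            t = b[t]
        return t

    def unify(xs, ys, b):
        # Extend bindings b so that tuples xs and ys agree, or return None.
        b = dict(b)
        for x, y in zip(xs, ys):
            x, y = walk(x, b), walk(y, b)
            if x == y:
                continue
            if is_var(x):
                b[x] = y
            elif is_var(y):
                b[y] = x
            else:
                return None
        return b

    def prove(goal, b, steps):
        # Naive backward chaining; appends one English sentence per step.
        # (Variables are assumed distinct across rules and queries, for brevity.)
        template, args = goal
        for f_template, f_args in facts:
            b2 = unify(args, f_args, b) if f_template == template else None
            if b2 is not None:
                steps.append(f_template.format(*f_args) + "   [given]")
                yield b2
        for (h_template, h_args), body in rules:
            if h_template != template:
                continue
            b2 = unify(args, h_args, b)
            if b2 is None:
                continue
            def prove_all(goals, b3):
                if not goals:
                    ground = [walk(a, b3) for a in args]
                    steps.append(template.format(*ground) + "   [by the rule]")
                    yield b3
                else:
                    for b4 in prove(goals[0], b3, steps):
                        yield from prove_all(goals[1:], b4)
            yield from prove_all(body, b2)

    steps = []
    for b in prove(("{} will need a visa for {}", ("X", "Y")), {}, steps):
        pass
    print("\n".join(steps))
    # Alice is flying to Paris   [given]
    # Paris requires a visa   [given]
    # Alice will need a visa for Paris   [by the rule]

The printed steps are exactly the kind of step-by-step English
explanation meant above.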
Now, to your most excellent comments. You wrote...
> It seems to me the context of the original discussion (correct me if I'm wrong) was that the business folks could understand fairly straight XML pretty well (even though it was pretty much equivalent to the RDF), but there seemed to be extra concepts in the RDF that created problems.
It seems that both XML and RDF are difficult for business folks as
soon as you add query processing or inferencing for a non-trivial
business task. Think XQuery, SPARQL, OWL, SWRL. Then,
the business folks rightly call on their programmers. The business
folks write specs as diagrams and in English, and hope that the
programmers and the resulting programs assign the expected
meanings. There are strong historical reasons for this approach,
but we can surely do better when business value or personal safety is at
risk.
> It seems to me that the Clyde example you use really illustrates the dangers that can occur using *natural language*, rather than artificial languages. If I'm not misunderstanding the example, the problem occurs because someone uses "is a" (a natural language phrase) as if it meant both "is an instance of" and "is a subClass of", and then reasons as if it meant only one of those things. Your controlled English vocabulary distinguishes between "is a member of the set" and "is a named subset of", but so does RDF (rdf:type and rdfs:subClassOf), and certainly anyone using RDF would be unlikely to assume that they meant the same thing (even if there was confusion about what they *did* mean). In either case, the user needs to understand the difference between these two concepts in order to use them properly. Just because your vocabulary provides distinct "English" for these concepts doesn't necessarily mean that users will know "which English" to use in which cases.
As you point out, the mistake in the example can be made in RDF and
also in English. However, consider the whole system consisting of
business folks, plus programmers, plus the web. In that context,
allowing a programmer to declare a property transitive by writing
rdf:type owl:TransitiveProperty is simply asking for trouble! If, on
the other hand, the person writing the rules does so in executable
(open vocabulary) English, there is a closer coupling between the
business intention and what actually happens.
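Here is the danger in miniature (a sketch; the relation name and the
data are mine, not the actual RDF under discussion):

    # One ambiguous "is a" relation doing double duty for membership
    # (rdf:type) and for class/metaclass statements.
    is_a = {
        ("Clyde", "Elephant"),    # Clyde is an *instance* of Elephant
        ("Elephant", "Species"),  # the *class* Elephant is an instance
                                  # of the metaclass Species
    }

    def transitive_closure(pairs):
        closure = set(pairs)
        changed = True
        while changed:
            changed = False
            for (a, b) in list(closure):
                for (c, d) in list(closure):
                    if b == c and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure

    # Declaring the ambiguous relation transitive -- the analogue of
    # writing  ex:isA rdf:type owl:TransitiveProperty  -- licenses:
    assert ("Clyde", "Species") in transitive_closure(is_a)
    # i.e. "Clyde is a species", although Clyde is one individual elephant.

Keeping membership and subclass distinct, whether in RDF or in English
sentences, is what blocks this step.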
> It's very important to distinguish between true natural language and controlled natural languages (which, I believe, is what you're proposing) [Actually not -- please see above. -- Adrian]. There have been a number of these languages proposed (there's some discussion going on right now on the SUO email list on this topic as well). They can certainly be helpful in some circumstances (I've seen some work that looks reasonable on applying this approach in defining security policies, for example). However, care is needed in the general case. If you know you're talking to a machine (as you are in specifying machine-interpretable semantics), you need to keep in mind how the machine is going to interpret your "natural language" so as to couch it properly. It's the same idea as writing a contract. A contract is something that may wind up being interpreted by a court, and if the contract is at all complicated, you may want to talk to a lawyer, who will couch your wishes in a somewhat different "natural language" that the court will interpret properly (the way you intended). This itself is something that's quite familiar in a business context.
Yes, this goes to the core of the discussion. What is proposed,
and available in [1], is something like a generalized, computed
"contractual coupling" between the surface lightweight English
and the internal machine notations. It's conceptually simple, but
robust in practice, especially when compared to dictionary-grammar based
systems. It's not in any way a contribution to classical natural language
processing research, and the open vocabulary English is a bit stilted,
but it works. And it works differently from controlled vocabulary
systems.
> ...Just as we need to be concerned about how to convey semantics to people, we need to be concerned about how to convey them to machines, if machines are going to be able to do anything with them.
Yes, and the computational coupling between the human and machine
languages addresses this.
> That's why we have artificial languages, and need to go beyond OWL (SWRL is an example). If we're going to use controlled natural language to communicate between people and machines, there needs to be a well-defined translation from the controlled natural language to the artificial ones...
A candidate well-defined translation is outlined above, and sketched in
code earlier in this note. In a bit more detail: Allow place-holders
in open vocabulary English sentences. Write rules using such
sentences. Support this with a bidirectional, generic, computational
coupling between a sentence with n place-holders and an n-ary
predicate.
One can judge how well this works by running the provided examples in the
system [1], and also writing and running one's own examples.
No dictionary or grammar of English is constructed, so there is a
trade-off to be made to get strict English meanings. Roughly, the
trade-off is this. If you mean to write the same thing in two
different places, you must use exactly the same English sentence (e.g.
copy-paste). Alternatively, you must write rules that say that two
sentences mean the same thing. (The system can in many cases warn
an author if she fails to do this.)
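To give the flavor of such a warning (a sketch only; this is not the
actual check in [1]), one can flag sentence templates that are similar
but not identical:

    import difflib

    def near_miss_warnings(templates, threshold=0.8):
        # Two templates that are similar but not identical probably mean
        # the same thing -- but no inference will connect them unless the
        # author writes a rule saying so. Flag such pairs for review.
        warnings = []
        for i, s in enumerate(templates):
            for t in templates[i + 1:]:
                ratio = difflib.SequenceMatcher(None, s, t).ratio()
                if s != t and ratio >= threshold:
                    warnings.append((s, t, round(ratio, 2)))
        return warnings

    templates = ["{} has flown to {}", "{} has flown into {}"]
    for s, t, r in near_miss_warnings(templates):
        print("possible paraphrase (%s): %r vs %r" % (r, s, t))
    # possible paraphrase (0.95): '{} has flown to {}' vs '{} has flown into {}'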
> ...I don't think the translation should be buried inside the code of the controlled natural language interpreter...
Yes, if the translation were conceptually complex, burying it would
indeed be a bad idea. However, the translation to and from machine
language is conceptually simple -- but it has to be supported by a highly
declarative inference method, such as [2].
> ...Just because a controlled natural language looks like natural language doesn't mean that it will always be easy for non-informed users to interpret or use it properly (particularly in complicated examples, and especially when communicating from the human to the machine instead of vice-versa).
As mentioned, the advocated approach is open vocabulary.
Otherwise, agreed, we can misunderstand one another when speaking English
face-to-face, so we can certainly misunderstand a computer system that
produces English results from English rules. But this still looks
less dangerous than exposing non-informed authors and users to raw RDF,
OWL or SWRL. As one of the designers of OWL-ish languages said in a
recent list posting, "no sane person would write directly in
OWL" -- and he was addressing programmers.
Sorry to go on at such length, but these do seem to be important
questions. For folks just joining the discussion, there's some
material in [3,4] that tries to flesh this out a bit.
[1] Internet Business Logic, online and free for experimental use, at www.reengineeringllc.com
[2] Walker, A. Backchain Iteration: Towards a Practical Inference Method that is Simple Enough to be Proved Terminating, Sound and Complete. Journal of Automated Reasoning, 11:1-22, 1993.
[3] http://www.reengineeringllc.com/Internet_Business_Logic_e-Government_Presentation.pdf
[4] http://www.reengineeringllc.com/demo_agents/
Dr. Adrian Walker
Reengineering LLC
PO Box 1412
Bristol
CT 06011-1412 USA
Phone: USA 860 583 9677
Cell: USA 860 830 2085
Fax: USA 860 314 1029