Thanks, Rob, for pushing me in this thread ;-)
I'd be interested to hear your thoughts on how A2ML or other related
options [3] could or could not work in the broader HTML, SVG, WebGL open
stack I've outlined below.
We think that by using an XML format like A2ML, audio could be treated exactly like SVG or X3D in an AR framework.
The similarity between structured interactive audio (A2ML) and structured interactive graphics (SVG) is explained in:
http://svgopen.org/2010/papers/34-Sound_Objects_for_SVG/

Our vision of an AR framework is shown in the attached image.