HTML-Math proposal summary

Introduction

This letter summarizes the current proposal for HTML-Math from Wolfram Research to the W3C HTML-Math Editorial Review Board. The current proposal benefits greatly from prior suggestions from and discussions with the other members of the board, for which we're very thankful.

This letter is by no means a complete, precise description of our proposal -- many details are left out (most importantly, the complete proposed standard character and operator dictionary and the precise set of transformation rules for expanding the standard linear syntax macros). These details will be supplied later if the general direction of this proposal is accepted. I think enough of our proposal is explained to give a good idea of its flavor and to serve as the basis for further discussions.

Some important aspects remain to be discussed further by the group before they are well enough understood to be part of a formal proposal, notably how best to allow author extensions of the built-in character and operator dictionary and transformation rules; these aspects are left to be specified in future amendments to this proposal.

Note that this letter supersedes all prior proposals from Wolfram Research, including the "position papers" (which were in general more precise than I am trying to be in this summary, though they were at a less concrete level). Note also that this letter is not an "official document" but rather is part of our ongoing dialogue with the HTML-Math ERB.

Goals

[description to be added later -- they're essentially the same ones listed on the ERB's home page and shared by the group members]

Overall Architecture

HTML-Math can be embedded in an HTML document within SGML-style MATH elements; i.e., it is preceded by a <MATH> begin tag (possibly with attributes) and followed by a </MATH> end tag. The contents of the MATH element are called "HTML-Math source text".

HTML-Math is designed to be interpretable by code in an HTML browser, by a specialized "browser plugin" program, or by a standalone program via the "foreign notation mechanism" of a general SGML processing program. So that HTML-Math is compatible across all of these implementation modes, the first step in processing HTML-Math source text is always the simultaneous parsing of SGML entities (used to represent extended characters by name) and embedded SGML markup (begin and end tags used to represent hierarchical structure and to provide a place for adding attributes to subelements of a document).

HTML-Math is always processed by the following sequence of steps. (Several of these steps make use of built-in information, consisting of a dictionary of character and operator properties, and a set of transformation rules; in a future amendment to this proposal, this information will be author-extensible for all or part of a document.)

1. Parsing of SGML-style entities (which represent extended characters by name) and markup tags.

2. Tokenization ("lexical analysis") of the non-markup source characters (including those represented by SGML entities parsed in step 1). (Each markup tag is treated as a single token; thus the output from this step is a single linear sequence of tokens.)

3. Operator-precedence-based parsing of the resulting token sequence, to generate an "expression tree". (When this letter needs to give examples of such an expression tree in a way distinct from the source notation, it will use the "display list representation", to be described.)

4. Application of transformation rules to the expression tree, to generate another expression tree, called the final display list.

5. Rendering of the final display list, in the medium and style chosen by the user of the rendering software.

Each of these steps is explained in more detail below. (This letter does not attempt to specify every detail, however; this will be done by subsequent addenda to this letter, if what is described here is accepted.)

But first I will show how the above steps unfold for a simple example, as a general orientation to this proposal.

Example, and overview of processing steps

Here is a piece of text with one piece of embedded math and one display equation:

	The solutions to the general quadratic equation
	<math mode=inline>
	ax^2+bx+c=0
	</math>
	are given by
	<math>
	x = {-b &PlusMinus; &root;{b^2-4ac}} &over; 2a
	</math>

If this example were rendered into ASCII, it might look something like this:

	                                                  2
	The solutions to the general quadratic equation ax  + bx + c = 0
	are given by:
	
		             ________
		            / 2
		    -b +- \/ b  - 4ac
		x = -----------------
		           2a

Here is a brief description of each of the steps in the parsing of this example.

step 0: find MATH elements

Each piece of source text between the begin and end tags of a MATH element is parsed separately by HTML-Math. The first MATH element has the attribute mode=inline, causing it to be displayed in-line within the surrounding text; the second one has no mode attribute, so it uses the default value mode=display and is shown as an unnumbered display equation. The rest of this description concerns only the second MATH element.
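
As a rough illustration of step 0 (not a proposed implementation), the following Python sketch pulls the MATH elements out of a document and applies the default mode. The name find_math_elements, the use of regular expressions in place of a real SGML parser, and the trimming of surrounding whitespace are all assumptions made only for this sketch.

```python
import re

# Assumption: a regular expression stands in for a real SGML parser here.
MATH_RE = re.compile(r"<math\b([^>]*)>(.*?)</math\s*>", re.IGNORECASE | re.DOTALL)
MODE_RE = re.compile(r"""\bmode\s*=\s*["']?(\w+)""", re.IGNORECASE)


def find_math_elements(html):
    """Return a list of (mode, source_text) pairs, one per MATH element.

    A MATH element without a mode attribute gets the default mode "display".
    """
    elements = []
    for attrs, source in MATH_RE.findall(html):
        mode = MODE_RE.search(attrs)
        # Assumption: leading and trailing whitespace is not significant.
        elements.append((mode.group(1).lower() if mode else "display", source.strip()))
    return elements
```

Applied to the example text above, this would give one element with mode "inline" followed by one with mode "display".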

steps 1 and 2: process extended characters (and markup); tokenize

Tokenization of the second MATH element results in the following token sequence; each token is a list of a token-type and a string literal (possibly containing extended characters) made from the token's source characters, and some attributes (not shown):

	(mi "x")
	(mo "=")
	(mb "{")
	(mo "-")
	(mi "b")
	(mo "&PlusMinus;")
	(mo "&root;")
	(mb "{")
	(mi "b")
	(mo "^")
	(mn "2")
	(mo "-")
	(mn "4")
	(mi "a")
	(mi "c")
	(me "}")
	(me "}")
	(mo "&over;")
	(mn "2")
	(mi "a")

The division of source characters into tokens, and the token types, are determined from the dictionary of character and operator properties. Each token may also contain a list of attributes and values which are also defined by the dictionary, such as precedence for operator tokens, but these are not shown above, for the sake of clarity.

[The ways of "escaping" characters which would otherwise affect the tokenizer (like the double quote which delimits string literals (described below)) will be specified later. This can't be done with extended character notation in a straightforward way, since step 1 is free to replace it with the actual characters it represents; this is a necessary feature of an architecture in which an SGML tool might preprocess HTML-Math.]

The full details of tokenization are given below, including a way of representing a multi-character identifier. It is also possible to give any token directly using SGML markup, e.g. <mn>2,3</mn> for a number literal containing a comma; this is useful for representing individual tokens which would not be tokenized in the desired way by the built-in dictionary, or which would not be given the desired attributes.

Briefly, the token types mi, mn, and mo represent tokens which will be parsed as identifiers, numbers, and operators respectively; mb and me represent begin and end tags, in this case "invisible grouping" characters. (At this stage, the mo tokens which will typically be rendered as "linear" operators in a 2-dimensional graphic medium (i.e. shown between their operands in a horizontal row) are not distinguished from the ones which will never be rendered directly since the expressions containing them will be transformed before rendering. The mi and mn tokens will all be rendered directly by default. The desire to support other rendering media, and (eventually) both author- and user- defined transformation rules in addition to the built-in rules, is one of the reasons for not distinguishing these kinds of operators at this stage.)

The SGML entities (e.g. &PlusMinus;) each represent extended characters. They are treated the same as ordinary characters, in that their tokenization and subsequent parsing are determined entirely by their entries in the character and operator dictionary. The ones shown above happen to be single-character operators, but others, e.g. "&alpha;", are letters which would be tokenized as identifiers. (This proposal will be accompanied later by a complete list of extended characters and their properties, comprising the ones in standard character sets like ISOtech and many new ones (and new names for old ones). All old names will be case-sensitive, but all new names consisting of concatenated words (as most will) will be allowed with the contained words capitalized or not. These characters are used not only to represent hundreds of renderable special characters which can appear in typeset mathematics (not all of which are part of Unicode), but also some nonrendering operators and identifiers used to generate certain layout schemas or for "semantic disambiguation". Characters of all of these types are used in various examples throughout this letter.)

step 3: parse according to operator attributes

The token sequence is then parsed according to the attributes of the operator tokens (precedence, associativity, whether an operator acts as a left or right bracket, whether it can embellish other operators). { and } are grouping characters whose only effect is to ensure that their contents are grouped into one subexpression by the parser.

The parser decides as it forms each subexpression whether it is a term or an operator (and thus how it is used during further parsing). In most cases it is a term, but when operators are "embellished" (e.g. subscripted) the resulting expressions remain operators, and retain the precedence and other attributes of the base operator. (This doesn't occur in the present example.)

The parser also introduces new tokens where necessary to represent "missing terms" and "missing infix operators", and decides whether missing operators should be parsed as "multiplication" (as in the above example) or "named function application". (It's unfortunate that this decision can't be deferred until the transformation rule stage, but these two invisible operators have different precedences. Authors are free to insert them explicitly instead of letting the parser choose one. Once the present proposal is amended to allow author extensions, authors will also be free to add transformation rules which further transform expressions containing these invisible operators, or even entirely new invisible operators.)

The token inserted in place of a missing term is (mi "&MissingTerm;"). The token inserted in place of a missing infix operator is either (mo "&InvisibleTimes;") or (mo "&FunctionApplication;"), depending on how the parser interprets this invisible operator. The rule for deciding which one is inserted is precisely this: an invisible function application operator is inserted if and only if its left operand would be an identifier or a scripted identifier, and the token to its right is a left bracket operator (such as a left parenthesis). By a scripted identifier is meant any "left-nesting" of any number and type of scripting schemas (subscripts, superscripts, prescripts, under or overscripts) and non-directly-rendering schemas (e.g. font changes) around an identifier token -- that is, the parser descends into the base (first) argument of any such scripts (zero or more), then checks whether it has reached an identifier.
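
The rule just stated can be sketched in Python as follows. Expressions are written as nested tuples in the style of the display list representation. The schema names in SCRIPT_SCHEMAS and NON_RENDERING_SCHEMAS (other than mscripts) and the contents of LEFT_BRACKETS are hypothetical stand-ins, since the full list of schemas and the operator dictionary are not given in this letter.

```python
# Hypothetical stand-ins for information that would come from the dictionary.
SCRIPT_SCHEMAS = {"mscripts", "munderover", "mprescripts"}
NON_RENDERING_SCHEMAS = {"mfont"}
LEFT_BRACKETS = {"(", "["}


def is_scripted_identifier(expr):
    """Descend into the base (first) argument of zero or more scripting or
    non-directly-rendering schemas, then check for an identifier token."""
    while expr[0] in SCRIPT_SCHEMAS | NON_RENDERING_SCHEMAS and len(expr) > 1:
        expr = expr[1]
    return expr[0] == "mi"


def invisible_operator(left_operand, right_token):
    """Choose the token to insert for a missing infix operator."""
    is_left_bracket = right_token[0] == "mo" and right_token[1] in LEFT_BRACKETS
    if is_scripted_identifier(left_operand) and is_left_bracket:
        return ("mo", "&FunctionApplication;")
    return ("mo", "&InvisibleTimes;")
```

Under these assumptions, an identifier followed by a left parenthesis gets function application, while a number followed by an identifier (as in 2a) gets multiplication.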

These inserted tokens will typically render invisibly; the reasons they are inserted explicitly by the parser are to allow them to be inserted instead by the author with identical effects, and to simplify the later use of transformation rules.

The expression tree generated by the parser can be represented in "display list format" as follows (though it won't be suitable for display until some transformation rules are applied); the "leaf nodes" are tokens as described above, and the subexpressions grouped by the parser are lists headed by "mterm" (as in this example) or "moperator":

(mterm
	(mi "x")
	(mo "=")
	(mterm
		(mterm
			(mterm
				(mo "-")
				(mi "b")
			)
			(mo "&PlusMinus;")
			(mterm
				(mo "&root;")
				(mterm
					(mterm
						(mi "b")
						(mo "^")
						(mn "2")
					)
					(mo "-")
					(mterm
						(mn "4")
						(mo "&InvisibleTimes;")
						(mi "a")
						(mo "&InvisibleTimes;")
						(mi "c")
					)
				)
			)
		)
		(mo "&over;")
		(mterm
			(mn "2")
			(mo "&InvisibleTimes;")
			(mi "a")
		)
	)
)

step 4: transform to display list

The next stage is applying transformation rules to the parse tree to generate the "display list". These rules expand the "linear syntax" abbreviations for layout schemas (only some of which are used in this example) into a more general form (which can also be given directly using SGML markup). In a future version of this proposal, there will also be provisions for author-defined rules for use in expanding abbreviations or new constructs (perhaps with semantic connotations), and users (of renderers) will be able to add or override rules for expanding new constructs.

The display list is a representation of a single "displayable (or renderable) object", which typically contains other displayable objects as components. Each sublist is headed by the name of a "layout schema", which can be thought of (in the terminology of object-oriented programming) as a "class" of displayable objects. The layout schemas include the token types which can be rendered directly, as well as a small list of compound forms corresponding to the "expression constructors" used in most present typeset mathematics.

The complete list of layout schemas is given below in a separate section, including for each one the transformation rules used to interpret its linear syntax form, and an SGML markup form in which it can be given in full generality and with attributes. (Any HTML-Math expression can be given in full SGML markup form, so that every subexpression is a separate SGML element; or these forms can be mixed with the ordinary linear syntax forms used in this example.)

The present example is transformed by the built-in rules to give the following display list:

(mrow
	(mi "x")
	(mo "=")
	(mfraction
		(mrow
			(mrow
				(mo "-")
				(mi "b")
			)
			(mo "&PlusMinus;")
			(mroot
				(mrow
					(mscripts
						(mi "b")
						(mrow)
						(mn "2")
					)
					(mo "-")
					(mrow
						(mn "4")
						(mo "&InvisibleTimes;")
						(mi "a")
						(mo "&InvisibleTimes;")
						(mi "c")
					)
				)
			)
		)
		(mrow
			(mn "2")
			(mo "&InvisibleTimes;")
			(mi "a")
		)
	)
)

At the risk of excessive repetition: each list in a display list like the above comes in one of the forms

	(layout-schema-name
		argument-1
		argument-2
		...
	)

where the layout-schema-name (e.g. mfraction, mrow, mroot) is one of the short fixed list of layout schemas (given below), or

	(token-type-name "token-character-string")

where the token-type-name (e.g. mi, mn, mo) is one of the short fixed list of token types (given below).
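
To make the two forms concrete, here is a small Python sketch that reads display list text into nested lists and tells tokens from compound schemas. The function names are hypothetical, and the test for a token type relies only on the naming convention described later in this letter (token type names have just one letter after the "m"). Malformed input is not handled.

```python
import re

# A parenthesis, a double-quoted string, or a bare name.
ITEM_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_display_list(text):
    """Parse display list text into nested lists.

    The head of each list stays a plain name; quoted strings lose their quotes.
    """
    items = ITEM_RE.findall(text)
    pos = 0

    def read():
        nonlocal pos
        item = items[pos]
        pos += 1
        if item == "(":
            node = []
            while items[pos] != ")":
                node.append(read())
            pos += 1  # skip the closing parenthesis
            return node
        return item[1:-1] if item.startswith('"') else item

    return read()


def is_token(node):
    """True for (token-type-name "string") forms such as mi, mn, mo."""
    return len(node[0]) == 2 and node[0].startswith("m")
```

For instance, reading (mfraction (mi "a") (mn "2")) would give a list headed by "mfraction" whose two arguments are both tokens.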

The process by which parsing and transformation generates the above display list is not given for this example, but should be clear from the descriptions below of the general rules for each step and from the specific descriptions of the layout schemas involved.

step 5: rendering

The display list is suitable for more or less direct rendering, since it is made of layout schemas, which are conceived as constructors for expression renderings. For each layout schema in the complete list given below, typical conventional renderings for 2-dimensional graphic media are described.

However, HTML-Math does not specify or require any particular rendering behavior. This is because it is intended to represent expressions in a way that allows them to be rendered to various quite different media (including, for example, interactive speech), and even within one medium, to be rendered according to the style preferences of an individual user, and in a way which suitably fits the context provided by the surrounding document.

On the other hand, HTML-Math does specify some contextual information which must be available for the rendering of any subexpression. This information includes certain attributes from surrounding HTML-Math or HTML elements (or, in the future, attributes specified by author- or user- specified rendering rules), and also certain attributes inherited from the location of a MATH element in a surrounding document (such as the text font, fontsize, and baseline position), which may ultimately be determined by non-math browser code either from the document itself or from something about its display environment. If an HTML browser supports HTML-Math embedded in an HTML document by means of an external program (e.g. a "plugin" or "helper application"), it must supply these attributes to that external program in order to allow rendered expressions to reasonably fit with their surroundings.

(A complete list of rendering attributes is not given in this letter. The mode attribute (which can be display or inline) has been mentioned already; among the attributes not mentioned so far are whether subscripts and superscripts should be positioned as is conventional for math or for chemical formulas. These attributes can be given on any HTML-Math element (when it's expressed as SGML markup) and apply to it and to all enclosed elements.)

HTML-Math also specifies a few semantic conventions which the layout primitives are intended to convey, when this might be necessary for correct rendering; for example, 2-dimensional renderers may render fractions with horizontal fraction bars or infix slashes according to the width of the fraction elements and the available width of the display, but this would not be correct for "columns" or "vertical vectors" as opposed to fractions.

copy commands, rendering, and computer algebra systems

When HTML-Math expressions are rendered into a potentially interactive medium (e.g. a window on a computer screen, which is also a potential acceptor of gestures with a mouse), a renderer may provide various "copy commands" which can be used to copy entire expressions or subexpressions, in various formats, into other documents or programs. There may be separate commands for copying either the HTML-Math source text associated with a subexpression, or the rendered output in various formats.

Note that there are two distinct ways in which source text for an expression might be copied -- either with or without the document-supplied contextual information which modifies its interpretation or rendering in the present environment. (Even entire expressions may be affected by contextual information in larger parts of the surrounding HTML document.) It is suggested that both kinds of commands be provided. The present standard is intended to make it clear exactly which such information needs to be copied and how it can be represented in the copied source text.

Some computer algebra systems should be able to accept HTML-Math input directly. For the sake of others, some renderers may provide copy commands which translate HTML-Math into the native input form for those systems. Such commands can be considered to be doing rendering into a special medium, which is intended to be displayed to a program rather than to a human being. When renderers allow their users to specify additional transformation rules for rendering into various media and/or in various styles, it is suggested that the list of supported media and styles (as well as the rendering rules for each one) be user-extensible, and that "copy rendered form" commands be provided for each medium and style for which the user has provided any rendering rules. This will allow the creators of computer algebra systems to publish lists of suggested "rendering rules" for translating HTML-Math expressions into the input formats of their systems, which can be easily installed by users for use in their renderers, and further modified or extended by users when desired.

rendering of syntax errors

It's possible for HTML-Math source text to have "syntax errors", though the language is specified so as to make this rare. (In the case of mismatched begin and end tags this may or may not prevent HTML-Math-specific code from ever seeing the incorrect source text, depending on the characteristics of the browser code for HTML as a whole.) In all cases in which HTML-Math-specific code detects a syntax error in HTML-Math source text, HTML-Math specifies only that the renderer should (1) not crash or otherwise generate any runtime error, (2) render correct subexpressions in the standard way (to the extent that parsing can separate them from the erroneous ones), and (3) render incorrect subexpressions in a way which makes it obvious to a human viewer that they are incorrect (e.g. as a "visual error message" of some kind). HTML-Math encourages renderers to make it as easy as possible for human viewers to learn the nature of an error from the rendering of an erroneous subexpression (even when they have minimal knowledge of HTML-Math), but there are no formal specifications about how this should be done.

The purpose of the requirement that errors be obvious in the rendering is to ensure that authors can use any HTML-Math browser for "testing" their HTML-Math source text, and can be sure that if it "appears to work right" in their test browser, it is correct standard HTML-Math and therefore can be expected to "work right" in other browsers.

upward-compatible extensions

There is one exception to the above rule about the rendering of erroneous expressions, since it would otherwise disallow upward-compatible extensions added by creators of specific browsers to the source language accepted by HTML-Math (when these would be syntax errors in the standard source language), such as adding new extended characters or layout schemas: such extensions to a renderer are allowed, provided that the renderer can be run in a mode which disables all nonstandard extensions and treats them as syntax errors as described above. Renderers with nonstandard extensions should make it easy for users to discover the existence of this "strict standard mode" and how to use it.

Example 2

As another example, the indefinite integral of dx over x can be represented as

	&int; &dd; x &over; x

where the extended characters used represent (respectively) the integral sign (a large operator with precedence somewhat higher than +), the "differential d" (a high-precedence prefix operator), and an infix operator for forming fractions with horizontal bars (with precedence near that of division).

(The integral sign character can also be called &integral; or &Integral;. The name &int; is provided since it is already part of the ISOtech character set.)

This source text is parsed into the form

(mterm
	(mo "&int;")
	(mterm
		(mterm
			(mo "&dd;")
			(mi "x")
		)
		(mo "&over;")
		(mi "x")
	)
)

and then transformed into the form for rendering

(mrow
	(mo "&int;")
	(mfraction
		(mrow
			(mo "&dd;")
			(mi "x")
		)
		(mi "x")
	)
)

When the result is rendered, the integral sign (being a large operator) is rendered in a larger font size.
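
The transformation in this example can be imitated with a very small rewriting function. This is only a sketch inferred from the two trees shown above; the precise built-in transformation rules are not given in this letter, and the function name transform is hypothetical. It handles only the &over; abbreviation and turns every other mterm into an mrow.

```python
def transform(node):
    """Rewrite a parse tree (nested tuples) into a display list.

    Only the &over; abbreviation is expanded; this is an assumption-laden
    sketch, not the proposal's rule set.
    """
    head, *args = node
    if head != "mterm":
        return node  # tokens (and anything else) pass through unchanged
    if len(args) == 3 and args[1] == ("mo", "&over;"):
        # (mterm a &over; b) becomes (mfraction a b)
        return ("mfraction", transform(args[0]), transform(args[2]))
    return ("mrow", *(transform(arg) for arg in args))


parsed = (
    "mterm",
    ("mo", "&int;"),
    ("mterm",
        ("mterm", ("mo", "&dd;"), ("mi", "x")),
        ("mo", "&over;"),
        ("mi", "x")),
)
display_list = transform(parsed)
```

With the parse tree of this example as input, display_list has the same shape as the form for rendering shown above.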

If a definite integral were desired, this would be represented by embellishing the &int; operator with a subscript and superscript, which could be done using their linear syntax forms by (for example)

	&int;_1%2 &dd; x &over; x

The details of the features introduced with this example are given below.

Layout schemas

Every expression in HTML-Math ultimately specifies the relative sizes and arrangement of a collection of symbols laid out in a "logically 2-dimensional" manner. This structure is specified not as coordinates, but in terms of a small set of "perceptual primitives" or "layout schemas" which are sufficient to describe almost all of the commonly used notations in existing typeset mathematics. This choice of level of representation is both as general and as abstract as possible while still being based on the structure of the notation for an expression, rather than purely on its semantic structure or meaning.

Although the layout schemas and the typical notations rendered with them are described in this letter in 2-dimensional terms since they are most commonly understood that way, they can also be considered as abstract expression constructors, so that HTML-Math notations are not inherently tied to physically 2-dimensional media, but can equally well be rendered into other media such as interactive speech or computer algebra systems.

Primitive token types such as "variable" or "number" are also considered a form of layout schema, though they have no substructure, because each of these token types is conventionally rendered differently. In the terminology of object-oriented programming, the layout schemas can be considered the subclasses of the class of renderable objects.

Each layout schema has a name beginning with "m" (for "math"). This name is used in the "display list representation" (as shown in the example given above) as the head of the display list for an instance of a given schema, and is also the SGML element name for the SGML form of each schema (i.e. the name used in the begin and end tags). (The initial "m" is partly to avoid collisions with other HTML tag names, and to make it easy for a reader to tell which tags are specific to HTML-Math. Note that certain non-math-specific HTML tags may be embedded in HTML-Math expressions, e.g. links, anchors, or font changes.)

The tag names with just one letter after the "m" are token types; the others are layout schemas, or (in the case of mterm and moperator, which are not strictly layout schemas but are included in the following list anyway) part of the expression tree generated by the parser. (The names are not usually related to the linear syntax forms in which some instances of a layout schema can be given.)

The following list of token types and layout schemas includes for each one a description of its intended purpose, conventional rendering (not a formal part of the standard), and semantic connotations (if any). The SGML markup form and the linear syntax form are also given; for the token types, the tokenization rules are discussed.

There are also some more HTML-Math examples showing the processing steps for schemas not covered in the example discussed earlier.

Token type schemas

Here is a short summary of the token type schemas described here:

Name    Represents                    Some examples in HTML-Math source

mi      variable or identifier        a       \sin     <mi>num-trees</mi>
mn      number literal                3.1              <mn>3.1e10</mn>
mt      text string                   "such that"      <mt>such that</mt>
mo      operator (rendered or not)    +                <mo prefix=true>++</mo>
mb      begin tag or {                {       <mterm>  <mfraction>
me      end tag or }                  }       </mterm> </mfraction>

mi: variable or identifier

The mi token type represents "identifiers", named elements typically used as variable names. Characters defined as letter-like by the dictionary are tokenized into individual identifiers (even if they are not separated by whitespace). (Rationale: single-letter variable names are much more common in math than multi-character variable names.)

A backslash followed by one or more letterlike characters or digits is tokenized into a single identifier (even if it starts with a digit); thus \sin and \3d are both single identifiers. The backslash is not part of the identifier name -- thus \x\y and xy are turned into the same pair of identifier tokens.

Note that the sequence of letterlike characters forming a single token after \ can't include whitespace. If it contains extended characters given in SGML entity notation, these must be letterlike, and the entity names should be terminated with ";" rather than with whitespace (except perhaps for the last one).

E.g., to specify a single identifier which looks like "cos" except that the middle letter is a hypothetical extended character, one might use

        \c&ExtendedLowercaseO;s

(followed by some non-letterlike non-digit character).
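
The identifier rules above can be sketched with a regular expression. In this sketch only ASCII letters count as letterlike and extended characters are ignored; both are simplifying assumptions, since the real set comes from the character dictionary. The function name identifiers is hypothetical.

```python
import re

# Either a backslash followed by letters/digits (one identifier, backslash
# dropped), or a single letter (an identifier on its own).
IDENT_RE = re.compile(r"\\([A-Za-z0-9]+)|([A-Za-z])")


def identifiers(source):
    """Return the mi tokens found in source, skipping all other characters."""
    return [("mi", m.group(1) or m.group(2)) for m in IDENT_RE.finditer(source)]
```

Under these assumptions, \x\y and xy give the same pair of tokens, while \sin and \3d each give a single token.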

Any character sequence may be designated as an identifier by enclosing it within <mi>...</mi>.

Identifiers are typically rendered (in a 2-dimensional graphic medium) by displaying the characters of the name in a closely-proportionally-spaced horizontal row, with single-character identifiers rendered in italic (except for certain characters such as double-struck capital letters like the Z often used to represent the set of integers).

In future amendments to this proposal, it will be possible to associate a semantic type and a locality of reference to a given identifier in a given scope of a source document, and to specify an instance of an identifier as a "defining instance" in some scope, but the issues involved are not discussed in this letter.

mn: number literal

The mn token type represents "numeric literals", sequences of digits and decimal points typically used to represent numbers directly. The precise rules for tokenization of number literals are: one number literal token is formed from every maximal-length sequence of one or more digits mixed with zero or more decimal points, with no two decimal points adjacent. (If two such sequences overlap, the first one is used. However, this case is impossible, so this rule is never used.)

(The character dictionary determines which characters count as "digits" and as "decimal points"; in the standard dictionary these are "0123456789" and "." respectively. Note that the standard dictionary also declares "." as an operator, but its use as a decimal point in a legal number literal overrides its use as an operator.)

Neither commas nor minus signs (nor any form of "scientific notation") are automatically treated as parts of number literals. (E.g., -3 is parsed as a unary negation operator applied to 3.)
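
One way to express the number literal rule is as a regular expression over the standard digits and decimal point. This is a sketch only: the name number_literals is hypothetical, and the interaction with backslash identifiers such as \3d is ignored here.

```python
import re

# At least one digit, optional decimal points at either end and between
# digit runs, and never two decimal points in a row.
NUMBER_RE = re.compile(r"\.?[0-9]+(?:\.[0-9]+)*\.?")


def number_literals(source):
    """Return the mn tokens found in source, skipping all other characters."""
    return [("mn", m.group()) for m in NUMBER_RE.finditer(source)]
```

With this pattern, the minus sign in -3, the comma in 2,3 and the "e" in 3.1e10 are all left outside the number literals, as described above.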

However, any character sequence may be designated as a number literal by enclosing it within <mn>...</mn>.

Number literals are typically rendered as a closely spaced row of their constituent characters, not in italics.

mt: text string

The mt token type represents text strings. These can be given as "string literals", i.e. as character sequences between double quotation marks (") (which are not part of the strings), making use of precisely the same character escape sequences [to be specified later] or extended character names as text outside of string literals can use, or they can be given as arbitrary character sequences between <mt>...</mt> tags.

String literals are typically rendered the same way as text which surrounds the MATH element. When exported into a computer algebra system, they should typically be represented as string literals in the format of that system.

mo: operator

The mo token type represents an operator token, whose properties (listed below) will affect the grouping of subexpressions by the parser.

The dictionary of character and operator properties defines certain characters as operator characters, and certain sequences of these characters as operators, with specific values of the properties listed below. The tokenizer turns maximal sequences of operator characters into operator tokens, and gives them attributes corresponding to the properties in the dictionary. When several potential operator tokens overlap, the leftmost one is chosen.
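
Here is a sketch of one possible reading of this rule: scan from left to right, and at each position take the longest operator listed in the dictionary. The OPERATORS set is a made-up fragment, since the standard operator dictionary is not part of this letter, and the function name operator_tokens is hypothetical.

```python
# Hypothetical dictionary fragment, for illustration only.
OPERATORS = {"+", "-", "=", "++", "<", "<=", "^"}
MAX_LEN = max(len(op) for op in OPERATORS)


def operator_tokens(source):
    """Return the mo tokens found in source, leftmost first and longest
    first at each position; all other characters are skipped."""
    tokens, i = [], 0
    while i < len(source):
        for length in range(min(MAX_LEN, len(source) - i), 0, -1):
            candidate = source[i:i + length]
            if candidate in OPERATORS:
                tokens.append(("mo", candidate))
                i += length
                break
        else:
            i += 1  # no operator starts here; other token types are out of scope
    return tokens
```

With this fragment, x=-b yields "=" followed by "-", because "=-" is not in the dictionary, while a<=b yields the single operator "<=".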

(When the dictionary is made author extensible in a future version of this proposal, it may be possible for authors to declare character sequences as operators which would otherwise be tokenized as identifiers, but this will never be done in the standard dictionary. Rationale: authors must be able to write any sequence of letter-like characters without worrying that it will be tokenized as an operator by default in some future versions of HTML-Math.)

Any character sequence can be specified as an operator by enclosing it within <mo>...</mo> tags; the properties given below will have default values (to be specified later along with the full standard operator dictionary) unless specific values are specified using attributes within the begin tag, e.g. <mo prefix=true prec=400>++</mo>.

The properties of any operator token include:

Whether the operator can be prefix, infix, or postfix (more than one form can be allowed). (Prefix operators have no left operand and postfix operators have no right operand.)

A left and right precedence (which can be used to define associativity and bracketing behaviors, as described below). Left precedences are only meaningful when left operands are allowed, so they are optional otherwise; similarly with right precedences. In fact, a separate right precedence can be given for use in the infix and prefix cases (as is done with a minus sign) or in the large-operator case; this is not allowed with the left precedence (since that would make it impossible in some cases to parse embellished operators unambiguously).

Whether the operator can be used to embellish other operators. Embellishing an operator means using it within some compound layout schema (e.g. giving it a subscript) so that the resulting expression has the same parsing properties as the bare operator (and in particular, so that the operator's original precedence is used to parse the terms surrounding the expression for the entire embellished operator). (The parser will copy all attributes of the original operator to the expression it generates to represent the embellished operator, which is headed by "moperator" (see below).)

Whether this operator is always a "large operator" (such as the integral or summation signs). This affects parsing in a way which supersedes some other attributes (e.g., large operators have only right operands). It is also possible to turn ordinary operators into large ones (see the mlargeop layout schema, below).

Whether this operator is "stretchy". This is typically used for brackets. This affects only rendering, and typically means the operator's vertical size depends on that of its operands (for brackets, operands are what is contained between matching pairs).

The attribute names and values corresponding to these properties are:

prefix=true (means this form is allowed, not required)
postfix=true
infix=true

leftprec=number
rightprec=number
rightinfixprec=number
rightprefixprec=number
largeprec=number

embellisher=true (means use to embellish other operators is allowed, not required)

large=true (means always parsed and rendered as a large operator)

stretchy=true

The numbers used as precedences must be integers (positive or negative). Higher numbers mean higher precedences, i.e., stronger binding.

The parser groups a term with the adjacent operator which has the higher precedence (assuming it is being used in a form which takes an operand on that side). If these precedences are equal, it groups the term with both operators; this feature is used to define "bracketing operators" such as parentheses so that the token sequence parsed from

(x)

namely

	(mo "(")
	(mi "x")
	(mo ")")

groups into the single expression tree

	(mterm
		(mo "(")
		(mi "x")
		(mo ")")
	)

In the standard dictionary, all kinds of left brackets are prefix operators with the same right precedence (which happens to be 0), and all right brackets are postfix operators whose left precedence has that same value (that is, also 0), which means that even brackets of different kinds can group together, e.g. in expressions like

	[0,1)

(Individual brackets can be prevented from grouping at all by enclosing them in <mterm>...</mterm> (see below). Note that the invisible grouping characters { and } are not operators at all, and can't be prevented from grouping (nor, of course, do they render as curly braces); extended characters are provided which do parse as regular brackets and render as curly braces.)

The same feature of grouping a term with both adjacent operators is used to allow certain operators to have "flat" or "n-ary" associativity, e.g. + and ⁢. This is what causes the source text "4ac" (in the example given far above) to parse to a single (mterm ...) subexpression containing three subterms (which are mn and mi tokens for 4, a, and c) separated by two (invisible) operator tokens.

Some operators are intended instead to be left or right associative; for example, the superscripting operator ^ (see the mscripts layout schema below) is right associative, meaning that a^b^c parses in the same way as a^{b^c}. This is achieved (in the standard dictionary) by giving ^ a slightly higher left precedence than right precedence. Similarly, left associative operators have a slightly higher right precedence than their left precedence.

Sometimes, more than one operator has the same left and right precedence; this is true, for example, of relational operators, so that sequences of inequalities turn into single subexpressions even when (e.g.) both < and <= (or &LessEqual;) are used in the same sequence.

Note that even infix + and infix - are flat-associative with the same precedences; this means that the source text "a - b + c" parses into a single subexpression of five tokens, which is appropriate for rendering even though it is not the most convenient structure for some other purposes such as evaluation (though it is not in any sense inconsistent with the semantic meaning of the expression).
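
The grouping rule and the associativity conventions above can be sketched in Python as follows. The function name and the numeric precedences are hypothetical (the real values belong to the dictionary, which is to be specified later); only the comparison itself comes from the description above.

```python
def grouping(left_op_rightprec, right_op_leftprec):
    """Decide which adjacent operator a term groups with.

    left_op_rightprec: right precedence of the operator on the term's left.
    right_op_leftprec: left precedence of the operator on the term's right.
    """
    if left_op_rightprec > right_op_leftprec:
        return "left"
    if left_op_rightprec < right_op_leftprec:
        return "right"
    return "both"  # equal precedences: bracketing or flat associativity


# Hypothetical precedence values, chosen only to show the three cases.
CARET = {"leftprec": 701, "rightprec": 700}   # right associative
PLUS = {"leftprec": 300, "rightprec": 300}    # flat (n-ary)
LPAREN = {"rightprec": 0}                     # prefix bracket
RPAREN = {"leftprec": 0}                      # postfix bracket

# The term b in a^b^c sits between two ^ operators.
b_in_power = grouping(CARET["rightprec"], CARET["leftprec"])
# The term b in a+b+c sits between two + operators.
b_in_sum = grouping(PLUS["rightprec"], PLUS["leftprec"])
# The term x in (x) sits between the two brackets.
x_in_parens = grouping(LPAREN["rightprec"], RPAREN["leftprec"])
```

With these values b_in_power is "right" (so a^b^c groups like a^{b^c}), while b_in_sum and x_in_parens are both "both".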

The properties described above are sufficient to generate all possible behaviors of any operator in HTML-Math. For convenience, alternative attributes are provided which set the above properties in typical ways. (These can presently be used only in <mo> tags, but in the future will be most commonly used when authors can add new operators to the dictionary.) These attributes have the default value "unused" so that they will have no effect unless set explicitly. These attributes are:

prec=number (sets both left and right prec to this number, perhaps modified by the assoc attribute)

assoc=right, left, flat (determines slight increment or decrement of a left or right prec set by the prec attribute)

bracket=left, right (sets the appropriate values of prefix, infix (false), postfix, all precs)
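
As a rough sketch of how the prec and assoc shorthands might expand into the underlying properties, consider the following Python. The size of the "slight" adjustment (step) and the choice of which side receives it are assumptions made for illustration; the text above fixes only which of the two precedences ends up higher.

```python
def expand_shorthand(prec, assoc="flat", step=1):
    """Return (leftprec, rightprec) for the prec and assoc shorthands.

    step is a hypothetical value for the "slight" adjustment; which
    precedence is raised is likewise an assumption.
    """
    if assoc == "flat":
        return prec, prec
    if assoc == "right":
        # Right associative: left precedence slightly higher than right.
        return prec + step, prec
    if assoc == "left":
        # Left associative: right precedence slightly higher than left.
        return prec, prec + step
    raise ValueError(f"unknown assoc value: {assoc!r}")


# e.g. <mo infix=true prec=400 assoc=right> under these assumptions
leftprec, rightprec = expand_shorthand(400, "right")
```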

After the parser chooses between alternative forms of an operator token, it generates a modified token with only the appropriate attributes set, for passing to subsequent stages (transformation rules and rendering). This may be important if those stages use the attribute values in some way; e.g., the renderer may wish to add a different amount of spacing to the left of a prefix operator (which has no left operand) or an infix operator (which has one), or to make the spacing depend on the absolute precedence, or a user-specified transformation rule may depend on whether an operator was used in prefix form.

Some operators are normally never rendered directly; instead they are treated as "macros" for expressing other forms (like layout schemas) in an abbreviated way. For example, this happens to all the operators in the linear syntax forms of the layout schemas. This is implemented by built-in transformation rules.

Other operators will be rendered as "themselves". Typically they are rendered as if they were the same text characters (possibly extended characters) used to name them, with surrounding spacing adjusted by the renderer to best convey the structure of the expression.

Large operators are typically rendered specially: in a larger than normal font size, and with any embellishing scripts placed in different positions depending on whether the expression mode is inline or display. Stretchy operators are also typically rendered specially (as described earlier).

mb: begin tag or {
me: end tag or }

The mb and me token types are used by the tokenizer to represent begin and end tags (respectively) of HTML-Math-specific SGML elements which can contain subexpressions (but not elements like mn which contain only character sequences and are turned into single tokens). They are also used to represent the left and right invisible grouping characters, { and }, respectively.

These tokens are never rendered directly; what they each mean is described under the element name. It is an error for these begin and end tokens not to match exactly. (HTML-Math does not allow end tags to be left out or to be given in abbreviated forms.)

The invisible grouping characters are equivalent to the tags <mg> and </mg>. Their behavior is described under the tag name mg.

schemas for internal use

Some schemas are generated by the parser, but intended to be used up by transformation rules rather than being rendered directly -- thus they are not truly layout schemas, so I refer to them only as "schemas". They can be the heads of subexpressions in display lists generated by the parser, but are not normally present in display lists presented to the renderer.

(When authors can modify the built-in transformation rules, it may become possible for these schemas to be presented to the renderer, which the renderer should treat as an error, as described in the earlier section on rendering erroneous expressions.)

These schemas can be given directly in source text using SGML begin and end tags of the same name. In this case, the tokenizer produces mb and me tokens (described above) for the begin and end tags respectively, which are then parsed by the parser as if they were a special kind of brackets (different from ordinary brackets since they must always match, and in some cases don't prevent ordinary brackets from matching "across" them), producing a schema named with the tag name.

The { and } invisible grouping characters are treated by the tokenizer precisely the same as the <mg> and </mg> tags, respectively. (In the main example I showed them as generating the tokens (mb "{") and (me "}"), which was for the clarity of that description, but they act just as if they generated (mb "mg") and (me "mg") respectively. Of course, this fact has no visible effect once the parser has properly matched them; the actual internal representation used is of course not specified by HTML-Math (for these or any other data structures) provided the behavior is as specified here.)

The complete set of such schemas is:

Name         Purpose                           Example use

mg           invisible grouping (aka { })      {1-x} &over; {1+x}
mterm        term-like expression sequence     a+b
moperator    embellished operator              +_2
mlargeop     makes regular operators large     <mlargeop>&Union;</mlargeop>

mg: invisible grouping

The effect of surrounding some HTML-Math source text with { and } or <mg> and </mg> is to force the parser to parse it separately and group it into a single subexpression.

In some cases this forces bracket operators not to match anything outside, but this explicitly does not happen to a bracket operator, or to an embellished bracket operator, which by itself constitutes the entire renderable contents of the {...} form.

In no case does use of invisible grouping, by itself, force an operator to be treated as a term.

mterm: represent or force term-like expression sequence

Most subexpressions generated by the parser should be treated as terms if they are part of larger expressions. The sequence of component expressions which form them is grouped by the parser into an mterm schema. (This will later be transformed into some layout schema for rendering.) For example, a+b is parsed (before transformation rules are applied) as (mterm (mi "a") (mo "+") (mi "b")). (This will then be transformed by standard rules into the layout schema (mrow (mi "a") (mo "+") (mi "b")), since (mo "+") is not used as the linear syntax for some other layout schema.)

Source text enclosed in <mterm>...</mterm> tags is parsed normally, but then explicitly "forced" to be treated as a single term (for the purposes of further parsing) even if it would otherwise not be.

For example, <mterm>+</mterm> is parsed (before transformation rules are applied) as (mterm (mo "+")) and <mterm>+_2</mterm> is parsed (also before transformation rules) as (mterm (moperator (mo "+") (mo "_") (mn "2"))).

The explicitly added mterms will be removed by transformation rules before rendering, so their only effect on rendering is the indirect one of producing layout schemas which have operators in positions that would normally be used for terms; this may, for example, affect the spacing around those expressions, depending on the spacing rules used in the renderer.

moperator: embellished operators

When an operator is directly followed by another (postfix or infix) operator with the attribute embellisher=true, the first operator is "embellished" by the second one (and by its right operand, if any). The parser generates an internal expression headed by moperator (before applying transformation rules), and treats it as if it was itself an operator with the attributes of its "base" (the embellished operator), i.e. uses them for parsing the surrounding source text.

For example, in the expression

a +_2 b

the + is embellished by the _ (the infix operator for subscript) and the 2. Thus the parser generates (before transformation rules are used) (mterm (mi "a") (moperator (mo "+") (mo "_") (mn "2")) (mi "b")).

mlargeop: makes regular operators large

The standard dictionary provides a small number of operators with the large=true attribute (such as the integral and summation signs), but a large number of operators may be used this way in mathematical notation. HTML-Math provides for this by allowing any operator to be surrounded by the <mlargeop>...</mlargeop> tags, thereby turning it into a large operator in this instance. For example, <mlargeop>&Union;</mlargeop> is a large "set union" operator made from &Union; which is an ordinary set union operator. An expression representing the union of all sets in the set S might be represented as

	<mlargeop>&Union;</mlargeop>_{s&Element;S} s

where the extended character &Element; is the set-membership operator (which looks something like a small ε).

The precedence of a large operator generated by mlargeop is determined by the first successful method from among:

the precedence given by attributes in the <mlargeop> begin tag;

the largeprec attribute of the original operator, if it has one;

just higher than the ordinary precedence of the original operator.
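
This three-step fallback can be sketched in Python as below. Several details are assumptions, since the text does not fix them: that the begin-tag attribute is named prec, that "ordinary precedence" means the operator's right precedence, and that "just higher" means adding 1.

```python
def large_operator_precedence(tag_attrs, op_attrs, step=1):
    """Pick the precedence for an operator wrapped in <mlargeop>.

    tag_attrs: attributes given in the <mlargeop> begin tag.
    op_attrs:  dictionary attributes of the original operator.
    The key names and step are hypothetical.
    """
    # 1. A precedence given explicitly in the <mlargeop> begin tag.
    if "prec" in tag_attrs:
        return tag_attrs["prec"]
    # 2. The largeprec attribute of the original operator, if any.
    if "largeprec" in op_attrs:
        return op_attrs["largeprec"]
    # 3. Just higher than the operator's ordinary precedence.
    return op_attrs["rightprec"] + step
```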

"Expression constructor" layout schemas

All renderable expressions with subexpressions are represented by one of the following layout schemas:

Name            Represents              Some examples in HTML-Math source

mrow            horizontal sequence     a+b     [0,1)   &int; &ee;^-x^2 &dd; x
mfraction       fraction                2 &over; 3      {1-x} &over; {1+x}
mroot           radical (nth root)      &root; 2        &root; 2 % n
mscripts        subscript or superscript or aligned pair
                                        a_1     x^2     &Sum;_{x=1}%n
munderscript    underscript             &RightArrow;__"word"
moverscript     overscript              x^^&Cap;
mprescripts     presubscript or presuperscript or aligned pair
                                        F___0   F^^^1   F___0%%%1
mbox            hides all internal structure from renderer
                                        <mbox>x^2</mbox>^2

All of these except mrow are typically rendered in a "2-dimensional" form when rendering into 2-dimensional graphical media.

mrow: horizontal sequence

Most operators are (in 2-dimensional media) rendered in a horizontal row between or next to their operands. The layout schema used in this case is mrow.

An operator like + has no special transformation rule to specify its layout, so a "default" rule is used which turns any (mterm ...) schema which remains after other rules have been tried into an (mrow ...) layout schema with the same arguments. For example, a+b will be parsed into (mterm (mi "a") (mo "+") (mi "b")) and then transformed by this rule into the renderable form (mrow (mi "a") (mo "+") (mi "b")).
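
The default rule can be sketched in Python, writing display lists as nested tuples whose first element is the head. The tuple representation and the function name are assumptions made here for illustration, and the sketch ignores all other transformation rules, which would normally be tried first.

```python
def default_rule(expr):
    """Turn every remaining (mterm ...) into (mrow ...) with the same arguments."""
    if not isinstance(expr, tuple):
        return expr  # token contents such as "a" or "+" are left alone
    head, *args = expr
    new_head = "mrow" if head == "mterm" else head
    return (new_head, *(default_rule(arg) for arg in args))


parsed = ("mterm", ("mi", "a"), ("mo", "+"), ("mi", "b"))
renderable = default_rule(parsed)
```

Here renderable is the tuple form of (mrow (mi "a") (mo "+") (mi "b")).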

Renderers typically use spacing rules within an mrow which are sensitive to whether the constituents are terms or operators (including embellished operators), to the type of operator (e.g. prefix or infix), and sometimes to the relative precedence of nested operators.

An mrow can be specified directly as an SGML element by source text which looks like

	<mrow> arg1 <mc> arg2 <mc> ... <mc> argn </mrow>

where "argi" means the source text for the ith argument, and <mc> ("c" stands for "comma") is a special HTML-Math empty element used only to separate multiple arguments of the SGML forms of schemas. (It can be used in any schema which allows more than one argument.)

This form can be used to specify any mrow with one or more arguments.

A missing argument (e.g. in <mrow></mrow> or between two <mc>s) is replaced by a nested empty mrow (i.e. one with no arguments), which is neither an operator nor a term, and (typically) renders invisibly with zero width. There is no way to specify an empty mrow "by itself" in source text. (Empty mrows are used internally to represent certain other missing constructs, e.g. in the mscripts layout schema. For this use, it is important that an empty mrow is not equivalent to the missing terms or operators sometimes inserted by the parser, and that it can't be represented except as an argument to the SGML form of a schema.)
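
The handling of missing arguments can be sketched in Python as follows. The sketch assumes the contents of the SGML form have already been reduced to a flat list of parsed argument expressions (nested tuples) and <mc> markers; the names MC, EMPTY_MROW and split_arguments are invented here.

```python
MC = "<mc>"             # stand-in for the <mc> separator element
EMPTY_MROW = ("mrow",)  # an mrow with no arguments


def split_arguments(items):
    """Split the contents of an SGML-form schema into its arguments.

    A missing argument (nothing before, after, or between <mc>s)
    is replaced by a nested empty mrow.
    """
    args = []
    current = None
    for item in items:
        if item == MC:
            args.append(EMPTY_MROW if current is None else current)
            current = None
        else:
            if current is not None:
                raise ValueError("expected <mc> between arguments")
            current = item
    args.append(EMPTY_MROW if current is None else current)
    return args
```

For example, the contents of <mscripts> x <mc> <mc> b </mscripts> give three arguments, the middle one an empty mrow, and the contents of <mrow></mrow> give a single empty mrow.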

mfraction

A fraction with a horizontal bar is usually specified using the linear syntax form

	numerator &over; denominator

making use of the extended character &over; which is an infix operator with precedence near that of division (the / infix operator).

It can also be specified in the SGML form

	<mfraction> numerator <mc> denominator </mfraction>

It's an error if there are other than two arguments given in this form (i.e. other than one <mc>).

[In SGML form, certain rendering attributes which will be described later can be added to the begin tag, e.g. to modify the appearance of the horizontal bar.]

This layout schema carries the semantic connotation of a fraction, i.e. something which is semantically equivalent to division. This is important, because it means renderers are allowed to render it as if it was an mrow containing the / operator, e.g. if this is necessary due to the display width being too small (or whenever their user prefers it that way).

mroot: radical (nth root)

A radical sign (typically used to represent square or nth roots) can be specified by using the linear syntax

	&root; x

	&root; x % n

In the first case there is typically no "n" shown (in the place of the "nth root").

The semantic connotations include: n is equivalent to 2 if it's missing, and this expression can be rendered instead as a 1/nth power if necessary or desired.
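
A renderer taking advantage of these connotations might rewrite the root as a power along the following lines. This is a Python sketch over display lists written as nested tuples; the function name and the choice to express 1/n as an mfraction are assumptions, not part of the proposal.

```python
EMPTY_MROW = ("mrow",)  # stands for the unused subscript position


def root_as_power(mroot):
    """Rewrite (mroot x) or (mroot x n) as x raised to the power 1/n."""
    head, x, *rest = mroot
    if head != "mroot" or len(rest) > 1:
        raise ValueError("expected (mroot x) or (mroot x n)")
    n = rest[0] if rest else ("mn", "2")  # a missing n is equivalent to 2
    exponent = ("mfraction", ("mn", "1"), n)
    return ("mscripts", x, EMPTY_MROW, exponent)
```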

The SGML form can be either of

	<mroot> x </mroot>
	<mroot> x <mc> n </mroot>

It's an error if there are other than one or two arguments given in this form (i.e. other than zero or one <mc>).

[In SGML form, certain rendering attributes which will be described later can be added to the begin tag, e.g. to modify the appearance of the horizontal bar above the expression whose root is being extracted.]

mscripts: subscript or superscript, or a vertically aligned pair of these

The mscripts layout schema represents an expression with one or both of a subscript and superscript, which are intended to be "vertically aligned" if both are present. (Note that this "vertical alignment" is logically meaningful even in non-2-dimensional media, e.g. in tensor notations.)

Each of the following linear syntax forms generates the same thing as the SGML form shown beside it (all of which correspond to an expression tree (in display list format) of (mscripts arg1 arg2 arg3) for some arguments):

Form        SGML form                                        Explanation

x_a         <mscripts> x <mc> a <mc>   </mscripts>           subscript
x^b         <mscripts> x <mc>   <mc> b </mscripts>           superscript
x_a%b       <mscripts> x <mc> a <mc> b </mscripts>           aligned pair
x^b%a       <mscripts> x <mc> a <mc> b </mscripts>           aligned pair

It's an error if the mscripts element has other than three arguments (separated by <mc>s).

The _ and ^ operators are each right-associative. [Full details of their precedences, and those of all other linear syntax operators related to scripts of all kinds, including the % used above, will be described later with the full table of operators and precedences.]

How the _ ^ and % operators interact to add multiple "scripts" to one "base" is described in a separate section below.

mscripts used to represent tensors

Additional infix operators %_ and %^ are provided which are left-associative but which also generate subscripts and superscripts. These are for use in representing tensor notations in the following way.

"Left-nested" mscripts layout schemas (i.e. where the first argument of each one is the next one, from outermost to innermost in a chain) are interpreted as representing vertically aligned pairs of tensor indices (from farthest away to closest to the base expression). (This means that the source text will contain the indices from left to right.)

It is acceptable for some indices to be missing, and these will be rendered invisibly.

(This special interpretation of left-nested mscripts schemas also extends through left-nested mprescripts schemas in the same nested chain, and through any schemas which have no effect on rendering (such as font changing schemas), but not through any other schemas (in particular, not through moverscript, munderscript, or mbox schemas). The order of adding a new layer of scripts and a new layer of prescripts doesn't matter.)

A typical rendering algorithm for this case (ignoring the possibility of unusually tall subscripts or superscripts) would determine the horizontal positions of the script arguments of an mscripts layout schema normally (i.e. based on the horizontal position and width of the entire base argument), but to determine their vertical positions would "burrow down" into the base (through any left-nested mscripts, mprescripts, and schemas with no effect on rendering) and depend only on the vertical position and height of whatever it found inside. A more careful algorithm might make use of a general grid or table layout facility to position all the scripts at once. Effectively, all the layout schemas in one of the left-nested chains being considered here form a single renderable object. (The reason they are not represented as a single level in one layout schema object is to make the transformation rules which form them from the linear syntax operators much simpler to express than would be possible that way, especially when scripts are mixed with prescripts.)
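
The "burrowing" step of such an algorithm can be sketched in Python over display lists written as nested tuples. The sketch assumes that the base is the first argument of mprescripts as it is for mscripts (mprescripts is not yet described), and it leaves out font-changing and similar schemas because their names are not given here.

```python
EMPTY_MROW = ("mrow",)

# Schemas the renderer looks through when searching for the base.
# Schemas with no effect on rendering would be added here too.
TRANSPARENT = {"mscripts", "mprescripts"}


def tensor_base(expr):
    """Follow a left-nested chain down to the expression that sets vertical position."""
    while isinstance(expr, tuple) and len(expr) > 1 and expr[0] in TRANSPARENT:
        expr = expr[1]  # first argument is the base (assumed for mprescripts)
    return expr


x = ("mi", "x")
chain = ("mscripts", ("mscripts", x, EMPTY_MROW, ("mi", "a")), ("mi", "c"), ("mi", "b"))
boxed = ("mscripts", ("mbox", ("mscripts", x, EMPTY_MROW, ("mn", "2"))), EMPTY_MROW, ("mn", "2"))
```

For chain the search reaches x itself, whereas for boxed it stops at the mbox, so the outer scripts are positioned relative to the whole boxed expression.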

digression: format of transformation rules

The present proposal makes use of built-in transformation rules to turn linear syntax forms of some operators into internal forms.

Although the present proposal defines no "official" appearance or format for those rules (nor even their properties in general), there is some use for such a format in order to document the built-in rules (and, perhaps, to allow them in practice to be read from a file rather than hardwired into the rendering code, if desired). (Furthermore, it is expected that a future amendment to this proposal will provide a way for authors to add such rules themselves, for which a format will be needed.)

To these ends, here is an example of some built-in rules and a description of their format and operation.

The actions of the infix scripting operators can be described (and are in fact implemented) using the following transformation rules:

$base _ $sub     ->  <mscripts> $base <mc> $sub <mc>        </mscripts>

$base ^ $super   ->  <mscripts> $base <mc>      <mc> $super </mscripts>

$base %_ $sub    ->  $base _ $sub

$base %^ $super  ->  $base ^ $super

<mscripts>   $base <mc>      <mc> $super </mscripts>  %  $sub  ->
  <mscripts> $base <mc> $sub <mc> $super </mscripts>

<mscripts>   $base <mc> $sub <mc>        </mscripts>  %  $super  ->
  <mscripts> $base <mc> $sub <mc> $super </mscripts>

The format of these rules in general makes use of the following two constructs:

Construct                     Purpose

template -> result            infix operator "->" for representing one rule

$name                         formal parameter or "pattern variable"

Such a rule is used by finding a subexpression which matches its template (or pattern) (which generates a necessary set of bindings of the pattern variables to subexpressions of the matched expression), and replacing that subexpression with the "result" after substituting the same pattern variable bindings in the result.

A list of such rules is applied by using the first rule which matches, and repeatedly transforming an expression until no rules match. (But in the above example, the order of the rules doesn't matter.)

A list of rules should actually be repeatedly applied to all the subexpressions of an expression tree, deepest first; and whenever a rule is used, applied recursively and immediately to the result generated (after substitution of bindings for pattern variables). This matters in the present example (but explaining why it matters here is left as an exercise for the reader).
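
To make the operation of such rules concrete, here is a minimal Python sketch of a matcher and rewriter. It is not the proposal's implementation: expressions are written as nested tuples, an infix form such as $base _ $sub is represented by the hypothetical tuple ("_", base, sub), and an empty script position is represented by an empty mrow.

```python
EMPTY = ("mrow",)  # an unused script position

# The scripting rules from above, as (template, result) pairs.
RULES = [
    (("_", "$base", "$sub"), ("mscripts", "$base", "$sub", EMPTY)),
    (("^", "$base", "$super"), ("mscripts", "$base", EMPTY, "$super")),
    (("%_", "$base", "$sub"), ("_", "$base", "$sub")),
    (("%^", "$base", "$super"), ("^", "$base", "$super")),
    (("%", ("mscripts", "$base", EMPTY, "$super"), "$sub"),
     ("mscripts", "$base", "$sub", "$super")),
    (("%", ("mscripts", "$base", "$sub", EMPTY), "$super"),
     ("mscripts", "$base", "$sub", "$super")),
]


def match(template, expr, bindings):
    """Return the extended bindings if expr matches template, else None."""
    if isinstance(template, str) and template.startswith("$"):
        if template in bindings:
            return bindings if bindings[template] == expr else None
        return {**bindings, template: expr}
    if isinstance(template, tuple) and isinstance(expr, tuple):
        if len(template) != len(expr):
            return None
        for t, e in zip(template, expr):
            bindings = match(t, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if template == expr else None


def substitute(result, bindings):
    """Replace pattern variables in result by their bound subexpressions."""
    if isinstance(result, str) and result.startswith("$"):
        return bindings[result]
    if isinstance(result, tuple):
        return tuple(substitute(r, bindings) for r in result)
    return result


def transform(expr, rules):
    """Apply rules deepest first, re-transforming each result immediately."""
    if isinstance(expr, tuple):
        expr = tuple(transform(sub, rules) for sub in expr)
    for template, result in rules:
        bindings = match(template, expr, {})
        if bindings is not None:
            return transform(substitute(result, bindings), rules)
    return expr
```

With this sketch, the tuple form of x_a%b, namely ("%", ("_", x, a), b), is transformed into ("mscripts", x, a, b): the inner _ form is rewritten first, which is what allows the % rule to match.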

How the rules of the example behave specifically (i.e. what they are for) is described in the next section. [End of digression.]

how the infix operators for scripting work to fill in the desired script positions on one base

To repeat: The actions of the infix scripting operators can be described (and are in fact implemented) using the transformation rules given in the above explanation of the transformation rule format.

Here is how these rules actually work: The first set of rules just say that the infix operators _ and ^ each make a new mscripts element from their arguments, in each case leaving the unused script position empty:

$base _ $sub     ->  <mscripts> $base <mc> $sub <mc>        </mscripts>

$base ^ $super   ->  <mscripts> $base <mc>      <mc> $super </mscripts>

The second set of rules simply say that the %_ and %^ operators do precisely the same thing:

$base %_ $sub    ->  $base _ $sub

$base %^ $super  ->  $base ^ $super

(The differences between these operators and _ and ^ are entirely in their precedences and associativities.)

The final set of rules say what the % operator does: it fills in an empty script position remaining in the outermost mscripts element:

<mscripts>   $base <mc>      <mc> $super </mscripts>  %  $sub  ->
  <mscripts> $base <mc> $sub <mc> $super </mscripts>

<mscripts>   $base <mc> $sub <mc>        </mscripts>  %  $super  ->
  <mscripts> $base <mc> $sub <mc> $super </mscripts>

By the use of these operators, a piece of source text can add new subscripts and superscripts to a given base from innermost to outermost, alternating subscripts with superscripts as it pleases, but once adding a script to a farther-right index position than before, can never "go back" to an empty position farther to the left.

A typical pattern for entering a tensor would be: for each pair of vertically aligned index positions, enter them in one of the forms

	$base %_ $sub
	$base %^ $super
	$base %_ $sub % $super

depending on which indices are present. Since the %_ and %^ operators always skip to the next index position whereas % never does (which is all evident from the above rules), this pattern will always work (unless both of a pair of vertically aligned index positions are empty, which is presumably a very rare case!). (Authors who prefer entering superscripts first can use the $base %^ $super % $sub form when both scripts are present.)

For example, the tensor which should render something like

     ab
    x
      cd

(with four indices in three aligned columns) could be entered as either

    x %^ a %^ b % c %_ d

    x %^ a %_ c % b %_ d

(using only left-associative operators).
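
That both source texts produce the same nested structure can be checked with a small Python sketch which applies the operators left to right, as left associativity implies. The helper names and the tuple representation are assumptions made for illustration, and % is assumed here to be left-associative at the same precedence as %_ and %^.

```python
EMPTY = ("mrow",)  # an unused script position


def apply_op(base, op, arg):
    """Apply one scripting operator to an already-built base."""
    if op == "%_":
        return ("mscripts", base, arg, EMPTY)
    if op == "%^":
        return ("mscripts", base, EMPTY, arg)
    if op == "%":
        if isinstance(base, tuple) and base[0] == "mscripts":
            _, inner, sub, sup = base
            if sub == EMPTY:
                return ("mscripts", inner, arg, sup)  # fill the subscript
            if sup == EMPTY:
                return ("mscripts", inner, sub, arg)  # fill the superscript
        raise ValueError("% needs an mscripts with an empty position")
    raise ValueError(f"unknown operator: {op!r}")


def build(base, *steps):
    """Fold (operator, argument) pairs onto base from left to right."""
    for op, arg in steps:
        base = apply_op(base, op, arg)
    return base


# x %^ a %^ b % c %_ d
first = build("x", ("%^", "a"), ("%^", "b"), ("%", "c"), ("%_", "d"))
# x %^ a %_ c % b %_ d
second = build("x", ("%^", "a"), ("%_", "c"), ("%", "b"), ("%_", "d"))
```

Both first and second come out as three left-nested mscripts: the innermost holds only the superscript a, the middle one the aligned pair b over c, and the outermost only the subscript d.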

munderscript

[to be described later]

moverscript

[to be described later]

mprescripts: presubscript or presuperscript or aligned pair of these

[to be described later]

mbox: hides internal structure from renderer

The mbox layout schema takes exactly one argument and has no effect on parsing (it does not force the argument to be a term, etc, any more than { } does). It tells a renderer to ignore any internal structure in the argument expression which might otherwise alter its interpretation of a layout schema -- only the overall "size and shape" of the argument is allowed to affect rendering of any layout schema containing the mbox. (In non-graphical media, "size and shape" refers to whatever parameters must be taken heed of in order to fit a rendered subexpression into a rendered whole expression without undue "gaps" or "overlaps". E.g., in speech, "size and shape" might refer to the time interval occupied by a vocalization.)

Other than that, mbox has no effect on rendering (it's an invisible wrapper around its argument).

Its main use is to separate left-nested mscripts or mprescripts layout schemas from being interpreted as specifying scripts in successive tensor index positions on the same base; i.e. it forces (e.g.) <mbox>x^2</mbox>^2 to look more like

      2
     2
    x

than

     2 2
    x

However, since a renderer is allowed in principle to use arbitrary rules to allow subexpression structure to affect rendering of a whole expression, use of mbox may have other effects as well.

Additional Topics

The following additional topics (besides the ones mentioned near the word "later" in the above letter) will be dealt with later in more detailed versions of this proposal:

how tables or grids or matrices are represented

special markup with optional semantic information (such as information about identifier types and scopes)

equation numbering and related topics

special markup with optional hints to renderers, such as linebreaking hints or hints about good choices of subexpressions which can be collapsed or expanded interactively by the viewer

embedded html elements of various kinds (e.g. links)

relation to CSS

changes to font, bold or italic, font size

how diacritics are represented as overscripts

the "prime" postfix operator, which is really a superscript

SGML rules for terminator character of entities

extended chars included directly in source file also allowed

details of character properties defined in the dictionary

Introduction

Goals

Overall Architecture

Example, and overview of processing steps

step 0: find MATH elements

steps 1 and 2: process extended characters (and markup); tokenize

step 3: parse according to operator attributes

step 4: transform to display list

step 5: rendering

copy commands, rendering, and computer algebra systems

rendering of syntax errors

upward-compatible extensions

Example 2

Layout schemas

Token type schemas

mi: variable or identifier

mn: number literal

mt: text string

mo: operator

mb: begin tag or {
me: end tag or }

schemas for internal use

mg: invisible grouping

mterm: represent or force term-like expression sequence

moperator: embellished operators

mlargeop: makes regular operators large

"Expression constructor" layout schemas

mrow: horizontal sequence

mfraction

mroot: radical (nth root)

mscripts: subscript or superscript, or a vertically aligned pair of these

mscripts used to represent tensors

digression: format of transformation rules

how the infix operators for scripting work to fill in the desired script positions on one base

munderscript

moverscript

mprescripts: presubscript or presuperscript or aligned pair of these

mbox: hides internal structure from renderer

Additional Topics
