Re: Outdated link for character entity list in validator error message (non SGML character number)

2012-07-25 19:56, Josh Hillman wrote:

> When using http://validator.w3.org to validate HTML 4.01 Strict (and
> possibly others) by using direct input, the "non SGML character number" (133
> in this particular case) is reported appropriately when a character outside
> the accepted range is encountered, however the "character entity" link
> referenced in the error description appears to be outdated.  The "character
> entity" link references character entity documentation for HTML 3:
>    http://www.w3.org/MarkUp/html3/latin1.html

You are quite right. The link was wrong from the beginning, since the 
HTML 3 draft should never have been cited except as work in progress, 
and it expired in 1995.

> Shouldn't the link reference character entity documentation for HTML
> 4(.01)?:

That would be better. But the entire error description is outdated and 
really wrong from the beginning. A reference to entities is really 
irrelevant when the issue is plain character data.

> You have used an illegal character in your text. HTML uses the standard
> UNICODE Consortium character repertoire, and it leaves undefined (among
> others) 65 character codes (0 to 31 inclusive and 127 to 159 inclusive) that
> are sometimes used for typographical quote marks and similar in proprietary
> character sets.

That’s not correct at all. Unicode defines those code positions as 
allocated to control characters, not undefined. They are disallowed in 
HTML, but that’s a different issue.
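
A minimal sketch in Python may make the point concrete. It uses the standard unicodedata module to check that every one of the 65 code positions named in the error message has the general category Cc (control). The last line rests on an assumption that is not stated in the report: that the 133 in question came from a byte in a windows-1252 encoded document, where that byte would stand for the horizontal ellipsis.

```python
import unicodedata

def is_control(cp: int) -> bool:
    """True if Unicode assigns the code point to a control character (category Cc)."""
    return unicodedata.category(chr(cp)) == "Cc"

# The 65 code positions the error message describes as "undefined":
# 0 to 31 inclusive and 127 to 159 inclusive.
flagged = list(range(0, 32)) + list(range(127, 160))

# Every one of them is an assigned control character in Unicode.
all_controls = all(is_control(cp) for cp in flagged)

# Assumption: the offending 133 was a windows-1252 byte.
# Decoded that way it is U+2026 HORIZONTAL ELLIPSIS, not a control.
ellipsis = bytes([133]).decode("cp1252")
```

Under that assumption, the real fix is to declare or convert the encoding correctly rather than to substitute an ASCII character.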

> Your best bet is to replace the character with the nearest equivalent ASCII
> character,

That has hardly been good advice for the last ten years or so.

> or to use an appropriate character entity.

“Character entity” is a misnomer.

> For more information
> on Character Encoding on the web, see Alan Flavell's excellent HTML
> Character Set Issues reference.

It was truly excellent in the old days, but I’m sure Alan would prefer 
references to newer resources. Besides, there’s now flavell.org that 
hosts Alan’s material, so pointing to archive.org is outdated.

> This error can also be triggered by formatting characters embedded in
> documents by some word processors. If you use a word processor to edit your
> HTML documents, be sure to use the "Save as ASCII" or similar command to
> save the document without formatting information.

In 2012, such advice is more likely to cause confusion than to help anyone.

Yucca

Received on Sunday, 29 July 2012 19:53:09 UTC