- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 29 Jul 2012 22:52:28 +0300
- To: Josh Hillman <hillman@joshhillman.com>
- CC: www-validator@w3.org
2012-07-25 19:56, Josh Hillman wrote:

> When using http://validator.w3.org to validate HTML 4.01 Strict (and
> possibly others) by using direct input, the "non SGML character number" (133
> in this particular case) is reported appropriately when a character outside
> the accepted range is encountered, however the "character entity" link
> referenced in the error description appears to be outdated. The "character
> entity" link references character entity documentation for HTML 3:
> http://www.w3.org/MarkUp/html3/latin1.html

You are quite right. The link was wrong from the beginning, since the HTML 3 draft should never have been cited except as work in progress, and it expired in 1995.

> Shouldn't the link reference character entity documentation for HTML
> 4(.01)?:

That would be better. But the entire error description is outdated and really wrong from the beginning. A reference to entities is really irrelevant when the issue is plain character data.

> You have used an illegal character in your text. HTML uses the standard
> UNICODE Consortium character repertoire, and it leaves undefined (among
> others) 65 character codes (0 to 31 inclusive and 127 to 159 inclusive) that
> are sometimes used for typographical quote marks and similar in proprietary
> character sets.

That’s not correct at all. Unicode defines those code positions as allocated to control characters, not undefined. They are disallowed in HTML, but that’s a different issue.

> Your best bet is to replace the character with the nearest equivalent ASCII
> character,

That was hardly good advice in the last ten years or so.

> or to use an appropriate character entity.

“Character entity” is a misnomer.

> For more information
> on Character Encoding on the web, see Alan Flavell's excellent HTML
> Character Set Issues reference.

It was truly excellent in the old days, but I’m sure Alan would prefer references to newer resources. Besides, there’s now flavell.org that hosts Alan’s material, so pointing to archive.org is outdated.

> This error can also be triggered by formatting characters embedded in
> documents by some word processors. If you use a word processor to edit your
> HTML documents, be sure to use the "Save as ASCII" or similar command to
> save the document without formatting information.

In 2012, such advice more likely causes confusion than helps anyone.

Yucca
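
The difference between "undefined" and "allocated to control characters" can be checked with a short Python sketch. It rests on one assumption that the message does not state: that the reported character 133 came from text written in windows-1252, where byte 133 is a printable character. The helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Report the Unicode general category of a code point and, for values
    below 256, what the same byte value decodes to in windows-1252."""
    info = {
        "category": unicodedata.category(chr(code_point)),  # "Cc" means control character
        "cp1252": None,
    }
    if code_point < 256:
        try:
            info["cp1252"] = bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # A few byte values (for example 129) have no windows-1252 assignment.
            pass
    return info


# Character number 133 from the validator's error message.
print(describe(133))
```

For 133 this reports the category `Cc` together with the horizontal ellipsis (U+2026) as the windows-1252 reading, which is consistent with the code position being defined in Unicode while the author probably meant a different character.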
Received on Sunday, 29 July 2012 19:53:09 UTC