HTML and XHTML validation and repair


Draft

This article or section is currently under construction

In principle, someone is working on it and there should be a better version in a not so distant future.
If you want to modify this page, please discuss it with the person working on it (see the "history")

Introduction

Learning goals
  • Learn why standards are important and why web pages should comply with standards
  • Be able to validate HTML, find broken links and validate CSS
  • Be able to fix broken pages (i.e. understand error messages and use a repair tool)
Prerequisites
Moving on
Level and target population
  • Beginners
Remarks
  • This is a first version ...

Why should you care about valid code?

W3C's "My Web site is standard! And yours?" propaganda page delivers the main arguments (quoted fragments are citations from this piece, written by Karl Dubost):

  • Designing with standards will simplify Web site code maintenance because you will not have multiple versions for different browsers. Your pages will have a longer life and will not be dependent upon vaporous technologies.
  • “Technical constraints exist with any artistic medium, whether you are drawing, sculpting, or designing Web pages. Watercolors or oil paintings have their own constraints, but these techniques do not block creativity, rather they provide structure for creative expression.” Have a look at CSS Zen Garden, which shows off 210 different cool designs that all work with exactly the same XHTML page.
  • “People with disabilities represent 8% to 10% of the total population. It's easier to maintain a Web site that follows accessibility guidelines (and therefore Web standards). Your Web site traffic will increase, and a wider variety of browsers will have access to site content.” Just an example: you may not care about blind people who use speech synthesizers to listen to web contents, but you may want to use such a tool yourself in your car sometime in the near future. You may also have a cell phone and want to be able to look at the same web contents in a more linear way.
  • “Standards have been designed to keep in mind all potential audiences and technologies. You will not be attached to any company or proprietary technology. You can use technologies that are independent of platforms requirements.” E.g. valid HTML does run in the 2007/9 bunch of new browsers like Safari and Chrome.

The same article also provides some extra advice:

  • “Unfortunately, many books do not teach good Web programming. When you are creating a Web site, you should check the correctness of your markup. If you are a Web developer, be careful using books to develop your application and read the particular specifications which you are trying to implement.”
  • “Many authoring tools do not generate valid markup. Some have syntax checkers embedded into them, others do the right thing, and many do not generate valid markup. As an intermediate solution, you have to check your Web page with an HTML validator.”
  • Many CMSs (e.g. through the templates they use or through their generators) produce bad code. There isn't much you can do about this, except complain to the people who produce them.

Basically, what Karl Dubost is saying is that you must care about validity and you must not trust your favorite tools or even published books. You do have to learn how to validate code with independent validating tools.

The toolbox

SGML and XML validation

Since HTML (up to version 4.01) is an SGML application and XHTML is an XML application, standard SGML and XML parsers can validate the syntax of (X)HTML pages, e.g. find misspelled and illegal tags, find misplaced tags (e.g. a "p" within a "ul"), identify missing end tags and missing quotes around attribute values, etc.
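
As a minimal sketch of what an XML parser can report, the following Python snippet uses the standard library's xml.etree.ElementTree module to test whether an XHTML fragment is well-formed. The function name check_well_formed and the sample strings are made up for this illustration. Note the limits: this only checks well-formedness (matching end tags, quoted attribute values). It does not validate against a DTD, so a misplaced but properly closed tag would not be reported.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml_text):
    """Return None if the text parses as XML, otherwise the parser's message.

    Hypothetical helper for illustration: it checks well-formedness only,
    not validity against an (X)HTML DTD.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        # The message produced by the parser includes line and column.
        return str(err)
    return None


# Sample fragments (invented for this example).
good = "<html><body><ul><li>one</li></ul></body></html>"
missing_end_tag = "<html><body><p>no end tag</body></html>"
unquoted_attribute = "<html><body><p class=intro>text</p></body></html>"

for name, text in [("good", good),
                   ("missing end tag", missing_end_tag),
                   ("unquoted attribute", unquoted_attribute)]:
    problem = check_well_formed(text)
    print(name, "->", "well-formed" if problem is None else problem)
```

Real XHTML pages usually carry a DOCTYPE and named entities such as &nbsp;, which this simple parser setup does not resolve, so a dedicated validator remains the better tool for whole pages.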

Such tools can't find mistakes that relate to informally spelled-out specifications, as opposed to rules that are formalized in DTDs (most HTML dialects) or XML Schemas (some more recent standards).

Typically, the complex text editors that programmers use have built-in tools (often via extensions) that validate SGML and XML languages. XML editors can validate XHTML (and other XML languages like SVG), but not HTML.

IMHO, that kind of validation is good enough for most educational sites - Daniel K. Schneider 13:39, 4 September 2009 (UTC).

The tidy program

The tidy program is the best-known validation and repair program. Since tidy is an open source library, it is also embedded in many authoring environments as well as in browser extensions.

The W3C series of online quality assurance tools

  • MarkUp Validator - Also known as the HTML validator, it helps check Web documents in formats like HTML and XHTML, SVG or MathML.
  • Link Checker - Checks anchors (hyperlinks) in an HTML/XHTML document. Useful to find broken links, etc.
  • CSS Validator - Validates CSS stylesheets or documents using CSS stylesheets.

In addition, server administrators could install the Log Validator, i.e. a local crawling engine that will analyse the quality of a website with the help of various processing modules, e.g. the three tools above.
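
To give an idea of what a link checker does, here is a small sketch in Python that uses only the standard library's html.parser module. It is not how the W3C Link Checker is implemented. The class LinkCollector, the function broken_fragment_links and the sample page are invented for this illustration, and the check is restricted to links that point inside the same page (href="#..."), so it needs no network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link destinations and possible link targets of one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []       # every href found on an <a> element
        self.targets = set()  # every id, plus name attributes of <a>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        if attributes.get("id"):
            self.targets.add(attributes["id"])
        if tag == "a" and attributes.get("name"):
            self.targets.add(attributes["name"])


def broken_fragment_links(html_text):
    """Return the in-page links (#something) that have no matching target."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [href for href in collector.hrefs
            if href.startswith("#") and len(href) > 1
            and href[1:] not in collector.targets]


# Sample page (invented): one good in-page link, one broken, one external.
page = ('<h1 id="top">Title</h1>'
        '<p><a href="#top">back to top</a> '
        '<a href="#missing">broken</a> '
        '<a href="http://example.org/">external</a></p>')

print(broken_fragment_links(page))
```

A real link checker additionally requests every external URL and reports the ones that cannot be retrieved; that part is left out here on purpose.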

Other tools

Some high-end web authoring systems may include other good tools, but we shall not discuss these here.

Fixing code with Tidy

  • ...

Past and future of web standards

Standard vs. implementation

The development of web standards can be described as fairly chaotic at times. If we interpret "standard" the way many (misguided) decision makers and web developers do, "standard" becomes a synonym of "implemented" and (worse) of "practice".

A very typical example was the summer 2009 discussion about IE 8, which fixed bad CSS implementation mistakes of IE 7. Some said that IE 8 wasn't compliant with the "IE 7 standard". Of course it's compliant with standards that IE 7 did render. IE 8 just renders code as intended, i.e. web developers don't have to write deliberately wrong, IE-specific CSS anymore ...

A related issue concerns the introduction of proprietary extensions, for which both Microsoft and Netscape were famous. The list of wild extensions is fairly long:

  • Early tags like the infamous blink
  • DHTML (both Netscape and Microsoft had their own). DHTML is now standardized as a combination of CSS, DOM and JavaScript
  • New languages that were pushed too hard before being standardized, e.g. Microsoft's

So please, when you talk about standards, make a distinction between standard and implementation. Never use browser-specific extensions or hacks (unless you have the intention and the money to repair your code once a new browser version or product hits the market).

On the other hand, it is perfectly rational and ethical to produce "niche content" that is standardized or at least "industry" accepted and that only works with a given set of browsers, for example:

  • SVG, a standard for vector graphics (Firefox or Opera)
  • SMIL, a standard for multimedia animation and synchronisation (IE)
  • Flash, a proprietary vector graphics format and associated scripting language for multimedia animation and interaction, and an RIA platform (most browsers)
  • Java, a proprietary (but open source) computer language specification for RIAs (i.e. applets)
  • etc...

But think (!) before you do this.