Set up your own local W3C XML/HTML file validator
You can use the public W3C Markup Validation Service to validate your HTML and XML files. However, there are times when you need to set up your own local validator. For example, your web site may be internal and inaccessible from W3C's server, or you may be concerned about uploading internal files to a public server. Since the W3C validator is open source, you can download and install it yourself. Although installation documentation exists for the W3C Markup Validator, here we describe in detail how to install the validator as a non-root user in a custom directory.
Prerequisites: LAMP development environment and supporting Perl packages
As a prerequisite, we suggest that you set up a basic LAMP (Linux, Apache, MySQL, PHP/Perl) development environment and install the Perl packages useful for web site development.
Get the source files and install them to the right place
You can download the latest tarballs for the W3C markup validator and the Document Type Definitions (DTDs). In the following we assume that you work with version 0.8.3.
Unpack the two tarballs in your build directory; both unpack into the same directory, validator-0.8.3.
Browse the directory and get familiar with the file
layout of the source distribution.
For maximum flexibility, we put all the files in a separate directory under the web server's document root, e.g., $HTDOCS/validator. If you followed our earlier article on setting up a LAMP environment, then $HTDOCS is /opt/dev/apache/htdocs.
Now you can create the
$HTDOCS/validator
directory and copy all the files under
validator-0.8.3/htdocs to it.
Basically the validator-0.8.3/htdocs directory
contains all the files that will be read and served by
the Apache server.
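The copy step above can be sketched as a small shell function. This is only an illustration: the function name install_validator_htdocs is hypothetical, and the paths in the usage line assume the layout used in this article.

```shell
#!/bin/sh
# Sketch: copy the validator's static files into the document root.
# install_validator_htdocs is a hypothetical helper name, not part of
# the validator distribution.
install_validator_htdocs() {
    src="$1"     # e.g. validator-0.8.3/htdocs
    dest="$2"    # e.g. $HTDOCS/validator

    if [ ! -d "$src" ]; then
        echo "missing source directory: $src" >&2
        return 1
    fi

    # "src/." copies the directory's contents (including dot files)
    # rather than nesting an extra htdocs/ level inside dest.
    mkdir -p "$dest" && cp -R "$src"/. "$dest"/
}
```

With the layout assumed here, you would call it as install_validator_htdocs validator-0.8.3/htdocs /opt/dev/apache/htdocs/validator.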
The workhorse for markup validation is a single CGI script, check, in the validator-0.8.3/httpd/cgi-bin/ directory. Copy it to Apache's actual cgi-bin/ directory ($HTDOCS/../cgi-bin/ in our case). At a minimum, modify the first line of the script to point to your local Perl executable (/opt/dev/perl/bin/perl in our case). Later on we will describe some additional changes that this script requires. You can also copy the other script, sendfeedback.pl, to your cgi-bin/ directory.
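Copying a script and repointing its first line can be done in one pass. The following is a minimal sketch, assuming the script starts with a #! line; the function name install_cgi_script is hypothetical.

```shell
#!/bin/sh
# Sketch: copy a CGI script into cgi-bin/ and point its first line at
# the local Perl. install_cgi_script is a hypothetical helper name.
install_cgi_script() {
    script="$1"      # e.g. validator-0.8.3/httpd/cgi-bin/check
    cgi_dir="$2"     # e.g. /opt/dev/apache/cgi-bin
    perl_bin="$3"    # e.g. /opt/dev/perl/bin/perl
    name=$(basename "$script")

    mkdir -p "$cgi_dir" || return 1
    # Rewrite only the interpreter path on line 1; any flags that
    # follow the path on that line are left untouched.
    sed "1s|^#![^ ]*|#!$perl_bin|" "$script" > "$cgi_dir/$name" || return 1
    chmod 755 "$cgi_dir/$name"
}
```

The same call works for sendfeedback.pl; only the first argument changes.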
Next, rename the validator's additional Apache configuration file, validator-0.8.3/httpd/conf/httpd.conf, to validator.conf and copy it to Apache's configuration directory ($HTDOCS/../conf in our case). Then include it by adding the following line to your main Apache configuration file, httpd.conf:
Include conf/validator.conf
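If you script this step, it helps to make it safe to run more than once. A minimal sketch, assuming httpd.conf lives in the same configuration directory; the function name enable_validator_conf is hypothetical.

```shell
#!/bin/sh
# Sketch: install validator.conf and include it from httpd.conf.
# enable_validator_conf is a hypothetical helper name.
enable_validator_conf() {
    src_conf="$1"    # e.g. validator-0.8.3/httpd/conf/httpd.conf
    conf_dir="$2"    # e.g. /opt/dev/apache/conf
    include_line='Include conf/validator.conf'

    cp "$src_conf" "$conf_dir/validator.conf" || return 1

    # Append the Include line only if that exact line is not present
    # yet, so re-running the function does not duplicate it.
    grep -qxF "$include_line" "$conf_dir/httpd.conf" ||
        echo "$include_line" >> "$conf_dir/httpd.conf"
}
```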
Now modify the validator.conf file to provide the right locations for all the supporting files. For example, change the AliasMatch directives to something like
AliasMatch ^/+w3c-validator/+check(/+referer)?$ /opt/dev/apache/cgi-bin/check
AliasMatch ^/+w3c-validator/+feedback(\.html)?$ /opt/dev/apache/cgi-bin/sendfeedback.pl
to match the locations of the CGI scripts.
You also need to modify the
Alias
and the Directory
directives to point to your
$HTDOCS/validator
directory:
Alias /w3c-validator/ /opt/dev/apache/htdocs/validator/
<Directory /opt/dev/apache/htdocs/validator/>
...
If you use
mod_perl
version 2.0 and it has been loaded before the
validator.conf file is loaded, then
you also need to comment out the block that contains
<IfDefine MODPERL2>
You may need to comment out the block that contains the
Proxy
directive as well if you don't configure Apache as a proxy
server.
Install additional Perl packages
Now you can restart the Apache server and resolve any startup
issues you may have. Once it restarts successfully, you can
point your browser to
http://<host>:<port_number>/w3c-validator/
and try it out. Most likely you will get a 500 Internal Server Error. In that case, check the Apache error log ($HTDOCS/../logs/error_log in our case) to find out why it fails.
To save you some time, we describe the additional changes
you have to make.
First you need to install a series of additional Perl modules, because the CGI script relies on them to do the heavy lifting. These are the additional Perl packages you need to install:
Config::General
Encode::HanExtra
Encode::JIS2K
HTML::Encoding
HTML::Template
SGML::Parser::OpenSP
XML::LibXML
Net::IP
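Before installing anything, you can check which of these modules your Perl already provides. The sketch below tries to load each module and prints the ones that fail; the function name missing_modules is hypothetical, and the Perl path in the usage line is the one assumed throughout this article.

```shell
#!/bin/sh
# Sketch: report which Perl modules cannot be loaded by a given perl.
# missing_modules is a hypothetical helper name.
missing_modules() {
    perl_bin="$1"    # e.g. /opt/dev/perl/bin/perl
    shift
    for module in "$@"; do
        # "perl -MName -e 1" exits non-zero when the module fails to load.
        "$perl_bin" -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Usage (commented out so that sourcing this file has no side effects):
# missing_modules /opt/dev/perl/bin/perl \
#     Config::General Encode::HanExtra Encode::JIS2K HTML::Encoding \
#     HTML::Template SGML::Parser::OpenSP XML::LibXML Net::IP
```

An empty result means every listed module loaded successfully.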
Most of the above packages are easy to install. Here we only
provide some more details on the
SGML::Parser::OpenSP module. If you don't have
OpenSP installed, you first need to download and install
it from
the
OpenJade distribution site.
The OpenSP 1.5.1 source distribution seems to have some minor problems, and we had to make the following two changes to compile the code successfully.
First, in the include/RangeMap.cxx file, we had to add the following line
#include "constant.h"
otherwise the compiler complains that wideCharMax is not defined. It is in fact defined in the constant.h file.
Second we have to modify the
include/InternalInputSource.h
file and change this line
InternalInputSource *InternalInputSource::asInternalInputSource();
to
InternalInputSource* asInternalInputSource();
Once OpenSP is installed, download the SGML-Parser-OpenSP package manually. Unpack the tarball and edit the Makefile.PL file: update $options{LIBS} to add the library path to OpenSP (-L/opt/dev/lib in our case) and INC to add the include path to OpenSP (-I/opt/dev/include in our case).
Then in the OpenSP.xs file,
you also need to comment out the following two lines
if (_hv_fetch_SvTRUE(hv, "show_error_numbers", 18))
pk.setOption(ParserEventGeneratorKit::showErrorNumbers);
because OpenSP's interface has changed slightly.
Now you can run
perl Makefile.PL
gmake
gmake install
to install the SGML-Parser-OpenSP package.
After you install all the required Perl packages, you still have a few more hurdles to overcome. :) Check your Apache error log for the details. Here we outline the changes to save you time.
Update configuration files for the chosen layout
First you need to copy the template directory
validator-0.8.3/share to
your $HTDOCS/validator directory so Apache can
access the templates to generate the resulting HTML files that
make up the validator frontend pages.
Second you need to modify
the
$HTDOCS/validator/config/validator.conf
configuration file to make sure that the right paths are set.
Please note that this is a different configuration file from
the one used by Apache ($HTDOCS/../conf/validator.conf).
To be specific, you need to modify the
Base,
Templates,
and Library settings in
the Paths section.
In our case, the right settings are
Base = /opt/dev/apache/htdocs/validator
Templates = $Base/share/templates
Library = $Base/sgml-lib
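A quick way to confirm that these settings match what is on disk is to check that each path exists under Base. A minimal sketch, assuming the layout described in this article; the function name check_validator_layout is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that the paths named in the Paths section exist.
# check_validator_layout is a hypothetical helper name.
check_validator_layout() {
    base="$1"    # e.g. /opt/dev/apache/htdocs/validator
    result=0
    for path in \
        "$base/config/validator.conf" \
        "$base/share/templates" \
        "$base/sgml-lib"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            result=1
        fi
    done
    return "$result"
}
```

It returns non-zero and names each missing path on standard error, which is usually faster than finding the same problem in the Apache error log.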
It has been a long journey, but you have reached the end. Now restart Apache, open your favorite browser and point it to
http://<host>:<port_number>/w3c-validator/.
Voila, it works, and you can start validating your markup files with your own server.
Automate site validation
Once you have verified that your local W3C markup validator works, you should consider automating validation for all the pages you own. Again you can use the powerful WWW::Mechanize Perl package to do the heavy lifting. You can also combine it with the Test-Simple Perl package to write test scripts that can become part of your site's test plans.
To begin with, you first need to compile the URI list of all
of your markup pages which you can get easily from your sitemap
file. If you haven't created a sitemap file yet, we suggest
you do so and you can find more details from our article on
submitting your web site to the popular search engines.
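As a rough illustration of pulling the URI list out of a sitemap, the sketch below prints the content of every loc element, one per line. It is a plain text filter rather than an XML parser, so it assumes each URI sits on one line with no surrounding whitespace and it does not decode entities such as &amp;. The function name sitemap_uris is hypothetical.

```shell
#!/bin/sh
# Sketch: list the URIs found in <loc> elements of a sitemap file.
# sitemap_uris is a hypothetical helper name.
sitemap_uris() {
    # Break the file at every "<" so each tag starts a new line, then
    # keep only the lines that begin with "loc>" and strip that prefix.
    tr '<' '\n' < "$1" | sed -n 's/^loc>//p'
}
```

For example, sitemap_uris sitemap.xml > uris.txt produces a file you can feed to a validation loop.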
Then in your script you can iterate through all the URIs and use WWW::Mechanize to submit each one to your local W3C markup validator.
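The loop itself can also be sketched in shell. Note that this sketch uses curl rather than WWW::Mechanize. It relies on two assumptions you should verify against your installation: that the validator accepts the page address in a uri query parameter, and that it reports its verdict in an X-W3C-Validator-Status response header. The default VALIDATOR address and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: submit a list of URIs to a local validator and report results.
# Assumption: the check script answers with an X-W3C-Validator-Status
# header; the host and port below are placeholders.
VALIDATOR=${VALIDATOR:-http://localhost:8080/w3c-validator/check}

# Print the validator's verdict (e.g. Valid or Invalid) for one URI.
validation_status() {
    curl -s -o /dev/null -D - -G --data-urlencode "uri=$1" "$VALIDATOR" \
        < /dev/null |
        tr -d '\r' |
        awk -F': ' 'tolower($1) == "x-w3c-validator-status" { print $2 }'
}

# Read URIs from standard input, one per line; print "<verdict> <uri>"
# for each and return non-zero if any page is not reported as Valid.
validate_uri_list() {
    failures=0
    while IFS= read -r uri; do
        [ -n "$uri" ] || continue
        verdict=$(validation_status "$uri")
        echo "${verdict:-Unknown} $uri"
        [ "$verdict" = "Valid" ] || failures=$((failures + 1))
    done
    [ "$failures" -eq 0 ]
}
```

Running validate_uri_list < uris.txt from a test job gives a pass or fail exit status, which plays a similar role to the Test-Simple scripts mentioned above.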
You can download the sample script that uses WWW::Mechanize to automate the validation of yuonlamp.com's markup pages and adapt it to your local environment.