Encoded Ampersands (&) and Web Crawlers

I recently ran into a problem where IBM’s Omnifind Web Search Engine would not crawl internal links rendered by the PersPective wiki.

The crawler could only follow links written like this:

http://someserver/somepage?param1=foo&param2=foo2

As opposed to this:

http://someserver/somepage?param1=foo&amp;param2=foo2

The second version complies with the XHTML 1.0 recommendation, which requires an ampersand in an attribute value to be written as &amp;. PersPective, like many XSLT-based web renderers, is unable to render an unencoded ampersand.

This is not good news.
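
For reference, a conforming parser decodes the entity back to a plain ampersand before following the link, so the encoded form should resolve to exactly the first URL. A minimal sketch in Python (chosen only for illustration; nothing here comes from PersPective or Omnifind) using the standard library's html.unescape:

```python
import html

# The encoded form, as it appears in XHTML-compliant markup.
encoded = "http://someserver/somepage?param1=foo&amp;param2=foo2"

# A conforming parser decodes the entity before requesting the URL.
decoded = html.unescape(encoded)
print(decoded)
```

A crawler that skips this decoding step ends up requesting the URL with the literal entity in it, which is consistent with the behaviour described above.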

My solution was to create an ASP.NET-based HttpModule that filters the output and replaces the encoded ampersands with plain ones.

So a link such as:

<a href="http://foo/wiki/perspective.aspx?action=view&amp;pagename=system:Welcome">Click here</a>

Will be changed to:

<a href="http://foo/wiki/perspective.aspx?action=view&pagename=system:Welcome">Click here</a>
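
The rewrite itself is a small string transformation. The sketch below shows it in Python rather than the VB.NET of the actual module, and it is not the module's code. The function name decode_href_ampersands and the decision to touch only double-quoted href values are assumptions made for illustration.

```python
import re

# Assumption: links use double-quoted href attributes, as in the example above.
HREF_RE = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def decode_href_ampersands(page: str) -> str:
    """Replace &amp; with & inside href values, leaving other text alone."""

    def fix(match: re.Match) -> str:
        prefix, url, closing_quote = match.groups()
        return prefix + url.replace("&amp;", "&") + closing_quote

    return HREF_RE.sub(fix, page)


sample = (
    '<a href="http://foo/wiki/perspective.aspx'
    '?action=view&amp;pagename=system:Welcome">Click here</a>'
)
print(decode_href_ampersands(sample))
```

Restricting the replacement to href values leaves encoded ampersands in body text untouched. A blanket replace over the whole page would be simpler, but it would also alter text such as Q&amp;A.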

 

The solution works well. My code is in VB.NET, although the article I based it on is in C#. Take a look at “Producing Xhtml Compliant Pages With Response Filters” for more information.

The project is in Visual Studio 2003 format (.NET 1.1) to match the current version of PersPective, although I’m sure it only needs a quick recompile to work with .NET 2.0.

I’ve uploaded a zip file containing the DLL and source here. Enjoy!
