Discuss: A More Useful 404
by Dean Frickey
- Editorial Comments
22 Spiders
If a spider follows a link to a page that doesn’t exist, then the e-mail message from that will allow me to correct the link and the 404 goes away.
Except that Slurp goes around deliberately making up URLs that it expects not to exist, so that it can check the site is correctly sending a 404 for non-existent pages (so it knows it can assume that a 200 page really is A-OK). You don’t want to be notified of every instance of this. I’m sure there is something in the user-agent string that you could look for, and some sort of trickery you could use to filter those requests out.
posted at 03:37 pm on November 24, 2008 by Stephen Down
23 spiders/bots
A couple of readers have written and are concerned that spiders and bots could result in a large number of e-mails being sent. But spiders and bots that are guessing at URLs will not generate e-mails, because they are not following bad links and therefore will (probably) not have an HTTP_REFERER. As I mentioned, though, HTTP_REFERER can be faked, so I’m not going to say for certain that this is always the case with all spiders or bots. However, I have been using the ideas presented here for the past few years and have yet to experience any problems with spiders or bots accessing the site.
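A minimal Python sketch of that filtering idea follows. It is not the article's Perl script. The `should_notify` function and the `KNOWN_BOT_MARKERS` list are hypothetical names used for illustration, and the user-agent check is an optional extra that assumes the crawler identifies itself honestly.

```python
# Hypothetical list of user-agent substrings to ignore; "slurp" is the
# crawler named earlier in this thread. Extend it to suit your own logs.
KNOWN_BOT_MARKERS = ("slurp",)


def should_notify(environ):
    """Return True when a 404 looks like it came from a followed link.

    `environ` is a CGI-style mapping such as os.environ.
    """
    referer = environ.get("HTTP_REFERER", "").strip()
    if not referer:
        # No referer: a typed URL, a bookmark, or a crawler guessing at URLs.
        return False
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    # Optional second filter, because the referer header can be faked.
    return not any(marker in user_agent for marker in KNOWN_BOT_MARKERS)
```

Note the trade-off: filtering on the user-agent also suppresses the e-mail when such a crawler follows a genuinely bad link, so the list is best kept short.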
posted at 09:09 pm on November 24, 2008 by Dean Frickey
24 Untitled
I have taken this stuff into consideration when doing redesigns for a number of sites. Instead of sending emails, I created a database to capture that data and then allowed the client to provide a correct URL for common issue pages and turn it into a redirect page. It also allows the user to see a list grouped by common pages and know how often it comes up.
Much more useful than an email every time something comes up. Also more scalable than the method in the article.
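For illustration only, here is a rough Python sketch of such a store using the standard `sqlite3` module. The table layout and the function names (`open_store`, `record_miss`, `set_redirect`, `lookup_redirect`, `report`) are assumptions made for this example, not the schema I actually used.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical 404 store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS misses ("
        " url TEXT NOT NULL, referer TEXT NOT NULL, hits INTEGER NOT NULL,"
        " PRIMARY KEY (url, referer))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects ("
        " url TEXT PRIMARY KEY, target TEXT NOT NULL)"
    )
    return conn


def record_miss(conn, url, referer=""):
    """Count one 404 for this (url, referer) pair."""
    updated = conn.execute(
        "UPDATE misses SET hits = hits + 1 WHERE url = ? AND referer = ?",
        (url, referer),
    ).rowcount
    if updated == 0:
        conn.execute(
            "INSERT INTO misses (url, referer, hits) VALUES (?, ?, 1)",
            (url, referer),
        )
    conn.commit()


def set_redirect(conn, url, target):
    """Let the client map a commonly missed URL to the correct one."""
    conn.execute(
        "INSERT OR REPLACE INTO redirects (url, target) VALUES (?, ?)",
        (url, target),
    )
    conn.commit()


def lookup_redirect(conn, url):
    """Return the redirect target for a missed URL, or None."""
    row = conn.execute(
        "SELECT target FROM redirects WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None


def report(conn):
    """List missed URLs grouped by page, most frequent first."""
    return conn.execute(
        "SELECT url, SUM(hits) AS total FROM misses"
        " GROUP BY url ORDER BY total DESC, url"
    ).fetchall()
```

The 404 handler would call `lookup_redirect` first and fall back to `record_miss` when no redirect has been defined.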
posted at 10:02 pm on November 24, 2008 by Matthew Darnell
25 Untitled
Just one thing.
If a user is reading a certain page on your site, and manually types a new URL but gets a 404 error, wouldn’t that count as “a bad link on your site”?
For example, someone that’s on www.domain.com and types in the address bar www.domain.com/contact.
Wouldn’t that send an HTTP_REFERER with your domain? Then you would get an email saying there’s a bad link on the index page when there really isn’t.
Maybe I’m getting confused here, but I wanted to ask to make sure.
posted at 06:43 am on November 26, 2008 by Kevin Selles
26 Untitled
I was looking for a place where I could get complete info about 404s. Thanks to the blog owner for writing such a good post.
posted at 12:23 pm on November 26, 2008 by think flick
27 why not call the 404.pl directly?
Apache is more than happy to use a CGI as your ErrorDocument:
ErrorDocument 404 /cgi-bin/404.pl
If you don’t want that cgi-bin in the URL, just go for
Alias /404-not-found /cgi-bin/404.pl
ErrorDocument 404 /404-not-found
posted at 01:03 pm on November 26, 2008 by Dick Davies
28 HTTP_REFERER
Kevin Selles: It’s a good question, so thanks for asking it. The referer header is only sent by the browser when a link is clicked so, no, manually entering an incorrect URL will not generate an e-mail, regardless of the page you’re currently viewing.
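To make the three cases concrete, here is a small Python sketch (not the article's Perl script) that classifies a 404 by its referer. The function name `classify_referer` and the labels it returns are hypothetical, and the host comparison is deliberately simple: it treats `domain.com` and `www.domain.com` as different hosts.

```python
from urllib.parse import urlsplit


def classify_referer(referer, site_host):
    """Label a 404 by where the visitor came from.

    "none"     -> typed URL or bookmark: no header sent, so no e-mail
    "internal" -> a link on our own site was followed: worth an e-mail
    "external" -> a link on someone else's site was followed
    """
    if not referer:
        return "none"
    host = (urlsplit(referer).hostname or "").lower()
    if host == site_host.lower():
        return "internal"
    return "external"
```

In Kevin’s example, typing www.domain.com/contact into the address bar sends no referer, so the request is classified as "none".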
posted at 05:14 pm on November 26, 2008 by Dean Frickey
29 Calling 404.pl directly
Dick Davies: You are correct in that Apache could be configured to call the Perl script directly. But when doing this the Perl script would be responsible for building the complete 404 page with all of the elements and styles necessary to have the look and feel of the website. And it will be more difficult to access the styles and shared elements which would be located somewhere under document root.
By executing the Perl script from within the .shtml page, the design of the 404 page (i.e. headers, footers, navigation, etc.) is easy. If you have a template for your site, the 404 page is simply a template file with the line,
<!--#include virtual="/cgi-bin/404.pl" -->
inserted at the point where the content needs to appear.
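As a rough illustration of what the included script has to produce, here is a Python sketch rather than the actual 404.pl. The script emits only a fragment, and the template supplies everything around it. The name `render_fragment`, the wording of the message, and the choice of the `REQUEST_URI` variable are assumptions made for this example.

```python
import html
import os
import sys


def render_fragment(requested_url):
    """Return only the content block; the .shtml template adds the rest."""
    # Escape the URL so a crafted request cannot inject markup into the page.
    safe_url = html.escape(requested_url, quote=True)
    return "<p>Sorry, the page <code>%s</code> could not be found.</p>" % safe_url


if __name__ == "__main__":
    # A CGI still prints a header block before its body.
    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    # Assumption: the server exposes the originally requested path here.
    sys.stdout.write(render_fragment(os.environ.get("REQUEST_URI", "")))
```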
posted at 05:34 pm on November 26, 2008 by Dean Frickey
30 Untitled
Dean Frickey: Many thanks for the explanation.
posted at 01:07 am on November 27, 2008 by Kevin Selles
21 Many Thanks
Fine, I’m not a professional; I’m not even old enough to be classed as one. But personally, and from my point of view, can I congratulate the author on another great ALA article. It appeals to my more practical mind, and even made me update my 404 page.
In response to everyone else’s comments, I would like to add my own thoughts. Firstly, I rewrote this in PHP, which I think reduces the security concerns. (If someone more experienced would like to comment on this, please do; I love being proved wrong.) Secondly, I think that automated emails do have their advantages: having received four about one link motivated me to do something about it. Logs are very good for statistics but don’t give me the imperative to do something (stress on the “me” there).
And if James is following these comments, it would be interesting to know what harm emails cause.
posted at 10:03 pm on November 22, 2008 by Harry Burt