Discuss: A More Useful 404
by Dean Frickey
- Editorial Comments
2 You can do something about wrong links in search r
Hi Dean,
Of course you can do something about wrong links in search results. The best solution would be to redirect the user to the correct page, assuming that just the link has changed.
Otherwise, you can use Google Webmaster Tools, Yahoo! Site Explorer, or another search engine's webmaster tools to remove that page from the index, which gives users a better experience on the web.
Regards,
Olaf
—
Olaf Offick
http://www.learn-skills.org
posted at 03:44 pm on November 18, 2008 by Olaf Offick
3 Google 404
Has anyone used the Google 404 code from Webmaster Tools?
It's a JavaScript snippet you put in the body of the page, and it tries to suggest other pages on your site (that are in the Google index) that match the bad URL the user typed.
Just wondering how well it works.
posted at 03:45 pm on November 18, 2008 by Jeremy Flint
4 Untitled
Sometimes it is also a good idea to redirect a 404 to the index page. If you have completely changed your site structure and can’t redirect each old URL to the new one, this might be an option. Otherwise you may lose a lot of link power.
posted at 03:48 pm on November 18, 2008 by Stefan Gebinder
5 Not so great
I’m sorry, but I just don’t subscribe to this notion at all.
For one, the execution could easily result in lots of emails from any number of badly configured web spiders. I mean, we’ve all seen the number of 404s our sites get.
It’s certainly not unique to get reports on the location of outdated links. But the article is just a taster of what is possible. It would be much more worthwhile to see an article on how to utilize something similar as part of a 500, with a complete debug/trace going to the developer. Perhaps that’s the coder in me speaking out, and it has little place on ALA.
posted at 03:51 pm on November 18, 2008 by Peter Brown
6 Easy with Ruby on Rails
I’m currently using Ruby on Rails as my default web application framework, and it makes it incredibly easy to handle these missing requests. Simply create a “catch-all” controller that logs what the user requested and the number of times it has been requested, and put in the logic for directing them to a proper page (say you’ve analyzed where multiple users are going, and you know what they are trying to get to).
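The catch-all idea can be sketched outside Rails as well. The following is a minimal illustration in Python, not the Rails controller itself: the names `missing_counts`, `KNOWN_TARGETS`, and `handle_missing`, and the example paths, are all assumptions made up for this sketch.

```python
from collections import Counter

# How many times each missing path has been requested.
missing_counts = Counter()

# Hypothetical mapping filled in after reviewing the counts:
# missing path -> the page users were most likely trying to reach.
KNOWN_TARGETS = {"/prodcuts": "/products"}


def handle_missing(path):
    """Record the miss and return (status, location).

    Returns a 301 with the intended page when one is known,
    otherwise a plain 404 with no location.
    """
    missing_counts[path] += 1
    if path in KNOWN_TARGETS:
        return 301, KNOWN_TARGETS[path]
    return 404, None
```

Reviewing `missing_counts.most_common()` from time to time would show which bad URLs are worth adding to the mapping.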
posted at 03:56 pm on November 18, 2008 by Brandon Martinez
7 Web Security Issues
This is helpful since handling errors is often forgotten about during the rush to go live.
If you are scripting this type of thing, it may be useful to log all 404 errors per session/IP address/IP range and choose some sort of threshold to terminate a session or temporarily ban the IP address. The threshold level will depend on the sensitivity of data on the site and on, say, whether a user is logged in. If there are many ‘not founds’ in a short period of time, this can be an indicator of someone scanning the site. But if you have opted in to something that scans in this way (perhaps a remote vulnerability assessment tool), you’ll need to exclude that from any filtering. 404 logging should also be correlated with server error logging (as Peter alludes to above).
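A rough sketch of such a threshold, in Python, might look like the following. The window length, the limit, the allowlisted address, and the function name `record_404` are all assumptions chosen for illustration; real values would depend on the site.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60            # assumed: how far back to look
MAX_NOT_FOUND = 20             # assumed: 404s tolerated within the window
ALLOWLIST = {"203.0.113.10"}   # assumed: e.g. an authorised scanning service

# Timestamps of recent 404s, kept per IP address.
_hits = defaultdict(deque)


def record_404(ip, now=None):
    """Record a 404 for this IP; return True if it is over the threshold."""
    if ip in ALLOWLIST:
        return False
    now = time.time() if now is None else now
    hits = _hits[ip]
    hits.append(now)
    # Drop timestamps that have fallen out of the window.
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    return len(hits) > MAX_NOT_FOUND
```

What happens when the function returns True (ending the session, a temporary ban, or just an alert) is left to the calling code.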
When using any data that can be modified by a user, such as HTTP_REFERER or the REQUEST_URI, be very careful about using it in scripts, writing it to your database, including it in an email or displaying it back on screen. If you are not careful, these could lead to added vulnerabilities in the web site.
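As one hedged illustration of that caution, the Python sketch below cleans a user-supplied value before it is shown in a page or placed in an email header. The helper names and the length cap are assumptions for this example; database writes would additionally need parameterised queries, which are not shown here.

```python
import html

MAX_LEN = 500  # assumed cap on how much of the value is kept


def safe_for_html(value):
    """Drop non-printable characters, truncate, then HTML-escape."""
    cleaned = "".join(ch for ch in value if ch.isprintable())
    return html.escape(cleaned[:MAX_LEN], quote=True)


def safe_for_header(value):
    """Remove CR/LF so the value cannot inject extra email headers."""
    return value.replace("\r", "").replace("\n", "")[:MAX_LEN]
```

The two helpers are separate because the right treatment depends on where the value ends up: escaping suits HTML output, while stripping line breaks is what matters for headers.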
The 404 page/script should also return a 404 ‘Not found’ HTTP status code. Interestingly on ALA, the link to your (Dean Frickey’s) details:
http://www.alistapart.com/authors/f/deanfrickey
returns a ‘not found’ type of page, but the status code is ‘200 OK’, like other ‘not found’ errors on ALA. Reference:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
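To make the status-code point concrete, here is a minimal Python WSGI sketch that serves a friendly page while still sending a real 404. The function name `not_found_app` and the page text are assumptions for illustration; ALA's own setup is not known from this thread.

```python
def not_found_app(environ, start_response):
    """WSGI app: a human-friendly page sent with a genuine 404 status."""
    body = b"<h1>Page not found</h1><p>Sorry, we could not find that page.</p>"
    start_response("404 Not Found", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local check, the standard library's `wsgiref.simple_server.make_server("", 8000, not_found_app)` can serve it, and a browser's network panel should then show 404 rather than 200.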
posted at 04:20 pm on November 18, 2008 by Clerkendweller (London)
8
Interestingly on ALA, the link to your (Dean Frickey’s) details:
Temporary CMS hiccup. Sorry about that. Dean’s bio is of course online and the link works.
The status code is ‘200 OK’ like other ‘not found’ errors on ALA
Thanks for alerting us to the issue.
posted at 04:53 pm on November 18, 2008 by Jeffrey Zeldman
9 Use the logs, Luke!
This solution is a duplication of effort, more complex than it needs to be, and opens up a potential attack vector that could otherwise be closed.
Web servers log the HTTP referer for every request, in addition to the user agent string, originating IP, etc. The same thing could be done with a script to pull out all 404s from the access log and analyze them the same way. If you want the script to e-mail you, it can be run in a cron job.
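A log-based approach along these lines could be sketched in Python as follows. It assumes the server writes the common "combined" log format; the regular expression and the function name `count_404s` are illustrative, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Combined log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_404s(lines):
    """Count (path, referer) pairs for every request that returned 404."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") == "404":
            counts[(match.group("path"), match.group("referer"))] += 1
    return counts
```

Passing an open log file to `count_404s` and printing `most_common()` gives a report of broken links and where they came from; the output of a cron job running such a script can be mailed to the developer.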
Using the logs means not needing to run extra (interpreted) code for every 404 request to pull the same information from the environment that’s already available in the log. In addition, you can turn off server-side includes, which removes a potential exploit vector for your server.
The goal of informing the developer when users are seeing 404s is laudable; the method proposed here is inelegant.
posted at 05:04 pm on November 18, 2008 by Kevin Bullock
10 At least send headers
I think it’s at least important to send 404 headers.
posted at 05:57 pm on November 18, 2008 by Jupiter Florida
1 Very good info!
It’s true, a 404 page isn’t often something that is in the forefront of the designer/developer’s mind when building a site but it really is important. One error or bad link is often enough to make the user leave and never look back but if your 404 page soothes them and makes them feel like you’re sorry for any inconvenience and you want to help them find what they’re looking for then you stand a good chance of keeping the visitor.
All good stuff!
posted at 03:42 pm on November 18, 2008 by Daniel Schonhaar