Discuss: Community Creators, Secure Your Code!
by Niklas Bivald
- Editorial Comments
12 Re: Specifications
This can be partially mitigated by using a proper DTD for your documents, but you’re right. I suppose the idea about giving users a limited toolset would help prevent malformed code. However… if the parser makes the code valid as well, it shouldn’t be a problem.
posted at 07:38 pm on April 18, 2006 by Edward Yang
13 Re: Actual HTML Filtering
Using real HTML, CSS and URI parsers seems like the most secure solution, and it only has to be done when processing the input, not every time it’s displayed.
In Java, there’s TagSoup for parsing just about any input HTML (which is a good idea anyway), a few CSS parsers, and the built-in URI parser; presumably other languages have equivalents.
If you include only the elements, attributes, CSS rules and URI methods you do understand, and correctly escape the output with the right character encoding, I don’t see how anything can slip through.
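A minimal sketch of that whitelist idea, in Python rather than Java and using only the standard library. The tag, attribute and scheme lists are hypothetical placeholders, `sanitize` is an illustrative name, and CSS is not handled at all; treat it as an illustration under those assumptions, not a complete filter.

```python
import re
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical whitelist: element name -> attributes allowed on it.
ALLOWED_TAGS = {"a": {"href"}, "b": set(), "em": set(), "p": set()}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
# Whitespace and control characters are removed before the scheme is checked.
_IGNORABLE = re.compile(r"[\x00-\x20\x7f]")


def is_safe_url(value):
    """Accept only absolute URLs whose scheme is on the whitelist."""
    try:
        scheme = urlsplit(_IGNORABLE.sub("", value)).scheme
    except ValueError:  # malformed URL
        return False
    return scheme.lower() in ALLOWED_SCHEMES


class WhitelistFilter(HTMLParser):
    """Re-emits whitelisted markup only; all text is escaped on output."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown element: drop the tag itself
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_TAGS[tag]:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(fragment):
    """Run once when the input is processed, not on every display."""
    parser = WhitelistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Anything not explicitly listed is dropped, so a new browser quirk in an unlisted attribute has nothing to attach to. This sketch does not balance or nest tags; a real parser-based filter would also need to do that.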
posted at 08:13 pm on April 18, 2006 by Carey Evans
14 The word JavaScript
Seems to me that if I was to summarise this article in a few words, capturing all the important information that is not already second-nature to most web developers, it would be: “The word ‘javascript’ can have line breaks (and spaces, and other separating chars?) in it”.
All the stuff about escaping special characters, white-listing HTML elements, and being careful about CSS input has been well-known for years. It’s just that this new ‘ja-vas-cript’ IE trick has come into the limelight recently, because of the MySpace exploit that the author mentions.
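To make the trick concrete, here is a small Python sketch comparing a naive check with one that first removes the characters a lenient browser might skip. The function names and the exact character range are assumptions chosen for illustration, and a blacklist like this is shown only to explain the failure, not as a recommended defence.

```python
import re

# Assumed set of characters a lenient browser may ignore inside the scheme:
# ASCII control characters, space, and DEL.
_IGNORABLE = re.compile(r"[\x00-\x20\x7f]")


def naive_is_script_url(value):
    """Looks for the literal word only, so a split word slips past."""
    return value.lower().startswith("javascript:")


def normalized_is_script_url(value):
    """Removes ignorable characters first, then looks for the word."""
    cleaned = _IGNORABLE.sub("", value).lower()
    return cleaned.startswith("javascript:")


payload = "java\nscript:alert(1)"  # hypothetical attacker input
print(naive_is_script_url(payload))       # False
print(normalized_is_script_url(payload))  # True
```

The naive check misses the payload because the line break sits in the middle of the word it is searching for.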
posted at 12:44 am on April 19, 2006 by Jeremy Epstein
15 Try HTML::Scrubber
For the web app I develop in Perl, I’ve found HTML::Scrubber to be a good way to help in cleaning up input from untrusted sources – nicely customizable in a very perlish way. Check it out –
http://search.cpan.org/~podmaster/HTML-Scrubber-0.08/Scrubber.pm
It does well with JavaScript, although I don’t know how it handles CSS (you can strip out any tag you’d like) or JS in CSS.
posted at 03:55 am on April 19, 2006 by Justin Simoni
16 This just outlines how exciting the whole emerging
It is fascinating to see this issue discussed. We work on a number of community-based sites and will now work on a solution for this. Up to now we have been using the HTML Tidy variations for .NET at…
http://tidy.sourceforge.net/
I’m sure this will resolve most issues. We have found it fantastic! Also, for editing HTML, try using
http://tinymce.moxiecode.com/
We find this brilliant, and it has a range of options for narrowing the HTML tags allowed.
Hope this helps
posted at 10:06 am on April 19, 2006 by Trevor Spink
17 A broad definition of XSS
Partially in response to Brian LePore (comment #8), I’d like to help underscore the threat of XSS. Any time user input is accepted (even if that input comes directly from a GET or POST variable), it needs to be properly escaped on output or you are at risk of an attack.
Here’s an example that brings it home for me. Imagine a site that stores a cookie for authorization. As you may know, cookie contents are accessible through JavaScript. Now imagine that this site’s login page displays error messages taken from the URL, like login.php?error=Incorrect password.
Seems innocent and common enough, but if I am an attacker, and I IM one of the site’s users with a URL like login.php?error=Incorrect password [removed]alert([removed])[removed]
(contrived example), you can see how I’d be able to manipulate the cookies and send their login cookie to my site (through a Javascript redirect). Similarly this can be used for phishing by sending the user to a fake login page via a link on the real site.
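For illustration, a hedged Python sketch of escaping that kind of `error` parameter on output. The function name, the markup around the message and the sample payload are hypothetical; the comment's login.php is only the inspiration.

```python
from html import escape
from urllib.parse import parse_qs


def render_login_error(query_string):
    """Build the error paragraph from a raw query string such as 'error=...'."""
    params = parse_qs(query_string)
    message = params.get("error", [""])[0]
    # Escaping on output turns <, >, & and quotes into entities,
    # so injected markup is displayed as text instead of being executed.
    return f'<p class="error">{escape(message, quote=True)}</p>'


print(render_login_error("error=Incorrect+password"))
print(render_login_error("error=%3Cscript%3Ealert(1)%3C%2Fscript%3E"))
```

The second call prints the script tag as harmless entity-encoded text.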
With XMLHttpRequest, I’d even be able to force the user to perform actions (via HTTP POST/GET) on the site, such as the voting example given in this article.
To make matters worse, many of the new community sites that are springing up encourage the user to enter HTML, and correctly differentiating valid HTML from invalid HTML is a difficult process. This means that you can’t really use stuff like PHP’s strip_tags() function.
Hope this helps!
posted at 12:19 pm on April 19, 2006 by thomas lackner
18 The way I see it
At the moment we’ve got a half-empty glass here. I can’t judge the contents just yet, because if I really want to get to the taste I need the whole thing.
I’m sorry to say it, but I find this half of the article useless. It might start making more sense when part two is out and about, but until then this seems like a very lengthy introduction.
posted at 02:12 am on April 20, 2006 by Matthew J Matthiesen
19 Re: A broad definition of XSS
thomas:
I’m sorry that my previous post did not state this, but I am aware of the idea of validating input to protect users.
That said, I have never really understood why many sites tend to pass message text in GET data, as in your example, rather than sticking to error code numbers. I know, it is quite annoying to have to look up the different numbers when you want to use something, but it saves the worry of someone injecting HTML into your site. In my opinion, the security benefit outweighs the simplicity in development.
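A short Python sketch of the error-code approach, assuming a hypothetical table of codes and messages. The point is that only text written by the developer ever reaches the page.

```python
from urllib.parse import parse_qs

# Hypothetical code table; the numbers and wording are placeholders.
ERROR_MESSAGES = {
    "1": "Incorrect password.",
    "2": "Unknown user name.",
}


def login_error_message(query_string):
    """Map ?error=<code> to a fixed message; unknown codes yield nothing."""
    code = parse_qs(query_string).get("error", [""])[0]
    return ERROR_MESSAGES.get(code, "")


print(login_error_message("error=1"))
print(repr(login_error_message("error=%3Cscript%3E")))
```

Because the request value is used only as a dictionary key, whatever an attacker puts in the URL is never echoed back.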
I like the check_tags function in the first link that ban jax posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.
posted at 03:58 am on April 20, 2006 by Brian LePore
20 Re: A broad definition of XSS
“I like the check_tags function in the first link that ban jax posted. It looks like a beefed up version of strip_tags that fits the needs of most developers that need to allow HTML.”
The Iamcal code is quite interesting, but it doesn’t guarantee XHTML 1.0 valid code, since it doesn’t check the children of the elements (a tag within a tag). Also, it’s not easily extensible to environments that need a broader tag base: there’s a lot more to XSS in attributes than a few protocols. Especially true if you decide to allow the style attribute (which, as the article points out, can execute JavaScript too! Fun.)
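As a rough illustration of what checking the children of elements could involve, here is a Python sketch with a deliberately tiny, hypothetical content model. It is not the XHTML 1.0 DTD and says nothing about how the Iamcal code works.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified content model: parent -> allowed children.
# None stands for the top level of the submitted fragment.
ALLOWED_CHILDREN = {
    None: {"p", "ul"},
    "p": {"a", "em", "strong"},
    "ul": {"li"},
    "li": {"a", "em", "strong"},
    "a": {"em", "strong"},
    "em": set(),
    "strong": set(),
}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and records nesting violations."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if tag not in ALLOWED_CHILDREN.get(parent, set()):
            where = f"<{parent}>" if parent else "the top level"
            self.errors.append(f"<{tag}> not allowed inside {where}")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def nesting_errors(fragment):
    """Return a list of problems; an empty list means the nesting passed."""
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors + [f"<{tag}> never closed" for tag in checker.stack]


print(nesting_errors("<p>Hello <em>world</em></p>"))
print(nesting_errors("<p><ul><li>x</li></ul></p>"))
```

A validating filter would need the full content model plus attribute checks; this only shows the tag-within-a-tag part.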
Shoot me, but I’m not sure why anyone would need image tags for most applications either.
posted at 06:29 am on April 20, 2006 by Edward Yang
Discussion Closed
New comments are not being accepted, but you are welcome to explore what people said before we closed the door.
11 Specifications
I think the problem there is that the browser (specifically IE in this case) doesn’t adhere to the specifications; or at least, that it goes beyond the spec by parsing – and executing – sloppy and/or malformed code.
posted at 06:46 pm on April 18, 2006 by Phil Stewart-Jones