DeXSS -- Java program for removing JavaScript from HTML

Dynamic web sites that allow users to enter text content containing HTML are at risk of so-called cross-site scripting (XSS) attacks (Wikipedia, Securitydocs).

A common way to mitigate this risk is to allow some HTML content but block content that is potentially harmful. The problem with a straightforward blocking approach is that browsers parse HTML far more leniently than the standard describes. Attackers can exploit these differences to disguise harmful content so that it slips past the filter but is still executed by the browser.
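The class below is a hypothetical naive blocker (`NaiveBlockerDemo` and `naiveStrip` are illustrative names, not part of DeXSS). It shows how easily a filter that does not parse HTML the way a browser does can be evaded. The blocker deletes the exact strings `<script>` and `</script>`. Browsers, however, treat tag names case-insensitively and tolerate whitespace before the closing `>`. Deleting one tag can also splice the surrounding text into a new one.

```java
public class NaiveBlockerDemo {
    // Hypothetical naive blocker: deletes only the exact strings "<script>" and "</script>".
    static String naiveStrip(String html) {
        return html.replace("<script>", "").replace("</script>", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<SCRIPT>alert(1)</SCRIPT>",                  // browsers ignore tag-name case
            "<script >alert(1)</script >",                // whitespace before '>' is tolerated
            "<scr<script>ipt>alert(1)</scr</script>ipt>"  // deletion splices a new tag together
        };
        for (String input : inputs) {
            // Every line printed still contains a script element a browser would run.
            System.out.println(naiveStrip(input));
        }
    }
}
```

The third input is the most instructive case: removing the inner `<script>` and `</script>` leaves exactly `<script>alert(1)</script>`.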

DeXSS uses TagSoup, an open-source HTML parser that attempts to mimic how web browsers parse HTML. TagSoup reads HTML as it is found in the wild, however malformed, and generates SAX2 events. DeXSS invokes TagSoup and passes its events through a pipeline of SAX2 filters. These filters remove potentially harmful HTML elements, such as script, as well as attribute values that contain scripts.
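The following is a minimal sketch of the general shape of such a SAX2 filter pipeline. It is not DeXSS's own code. All class names in it (`FilterPipelineSketch`, `ScriptElementFilter`, `ScriptAttributeFilter`, `Serializer`) are hypothetical. To keep the example self-contained, the JDK's built-in XML parser stands in for TagSoup. As a result, the input must be well-formed XML, which is exactly the limitation TagSoup removes. The two filters are also far cruder than DeXSS's: they do no CSS inspection and no detection of obfuscated URLs. The example shows only how `XMLFilterImpl` stages chain together between a parser and a final `ContentHandler`.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterPipelineSketch {
    // Hypothetical filter: swallows <script> elements and everything nested inside them.
    static class ScriptElementFilter extends XMLFilterImpl {
        private int skipDepth = 0; // > 0 while inside a <script> element
        ScriptElementFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || "script".equalsIgnoreCase(localName)) { skipDepth++; return; }
            super.startElement(uri, localName, qName, atts);
        }
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (skipDepth > 0) { skipDepth--; return; }
            super.endElement(uri, localName, qName);
        }
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) super.characters(ch, start, length);
        }
    }

    // Hypothetical filter: drops on* event-handler attributes and javascript: URLs.
    static class ScriptAttributeFilter extends XMLFilterImpl {
        ScriptAttributeFilter(XMLReader parent) { super(parent); }
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl kept = new AttributesImpl();
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getLocalName(i).toLowerCase(Locale.ROOT);
                String value = atts.getValue(i).trim().toLowerCase(Locale.ROOT);
                if (name.startsWith("on") || value.startsWith("javascript:")) continue;
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
            super.startElement(uri, localName, qName, kept);
        }
    }

    // End of the pipeline: writes the surviving SAX events back out as markup.
    static class Serializer extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(localName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getLocalName(i))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            out.append('>');
        }
        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(localName).append('>');
        }
        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(escape(new String(ch, start, length)));
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wires up: parser -> ScriptElementFilter -> ScriptAttributeFilter -> Serializer.
    static String sanitize(String markup) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // ensures localName is reported
            XMLReader parser = factory.newSAXParser().getXMLReader(); // stand-in for TagSoup
            XMLReader pipeline = new ScriptAttributeFilter(new ScriptElementFilter(parser));
            Serializer serializer = new Serializer();
            pipeline.setContentHandler(serializer);
            pipeline.parse(new InputSource(new StringReader(markup)));
            return serializer.out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<div onclick=\"steal()\"><p>Hello</p><script>alert(1)</script></div>"));
    }
}
```

For example, `sanitize("<div onclick=\"steal()\"><p>Hello</p><script>alert(1)</script><a href=\"javascript:alert(2)\">link</a></div>")` returns `<div><p>Hello</p><a>link</a></div>`. Each filter sees only the events the previous stage let through, so new checks can be added as further stages without touching the parser. TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `XMLReader`, so it could presumably replace the JDK parser here to accept malformed HTML. That substitution is untested in this sketch.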

Status

DeXSS 1.2 is an Alpha release. You should be aware of the following issues:

If you have an interest in working on these issues, please consider contributing to the project.

DeXSS API

DeXSS includes the following classes for direct use:

Documentation

Download

Current Version

How to build

  1. Type ant dist -emacs

Dependencies

How to test

  1. Test for false positives:
     java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/benign/*.txt

     or, on Windows:
     java -classpath lib\tagsoup-1.2.1.jar;lib\osbcp-css-parser-1.4.jar;dist\lib\dexss-1.2.jar org.dexss.Test tests/benign/*.txt

  2. Test for false negatives:
     java -classpath lib/tagsoup-1.2.1.jar:lib/osbcp-css-parser-1.4.jar:dist/lib/dexss-1.2.jar org.dexss.Test tests/xss/*.txt

     or, on Windows:
     java -classpath lib\tagsoup-1.2.1.jar;lib\osbcp-css-parser-1.4.jar;dist\lib\dexss-1.2.jar org.dexss.Test tests/xss/*.txt

Other Similar Projects

If DeXSS does not meet your needs, see freecode.com for a list of similar libraries in other languages such as PHP and Perl.

Todo

Warranty

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Copyright and License