XSS and CRLF injection prevention with HST-2
1. Introduction
Preventing cross-site scripting (XSS) vulnerabilities is one of a web developer's most important responsibilities. If you are unfamiliar with XSS, plenty of material is available online. The following two pages are a good starting point:
- http://en.wikipedia.org/wiki/Cross-site_scripting [1]
- http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet [2]
The second page in particular provides simple rules for preventing XSS vulnerabilities.
CRLF injection is less well known, and some containers (such as Tomcat) prevent it by default. Because HST supports different containers, it also protects against CRLF injection by default. From http://www.veracode.com/security/crlf-injection :
"CRLF refers to the special character elements "Carriage Return" and "Line Feed". These elements are embedded in HTTP headers and other software code to signify an End of Line (EOL) marker. Many internet protocols, including MIME (e-mail), NNTP (newsgroups) and more importantly HTTP use CRLF sequences to split text streams into discrete elements. Web application developers split HTTP and other headers based on where CRLF is located. Exploits occur when an attacker is able to inject a CRLF sequence into an HTTP stream. By introducing this unexpected CRLF injection, the attacker is able to maliciously exploit CRLF vulnerabilities in order to manipulate the web application's functions."
HST has been implemented and tested with these kinds of vulnerabilities in mind. For example, any URL generated with the HST link or url tags is always safe from XSS. HST also provides a servlet filter to make your web application more secure.
However, even though HST has been implemented carefully and provides some useful safeguards, you as a web developer should always check that you are following the rules suggested in [2].
For example, if your pages generate HTML markup from untrusted data (visitor input such as a search field), you must always encode that data properly. We generally use the following techniques:
- When you use JSTL core tags, you can output a value either with ${expression} directly or with <c:out value="${expression}" />. The former does not encode the value; the latter does. So, if the value must be encoded to prevent a possible XSS vulnerability, use <c:out value="${expression}" />.
- If you use JSTL functions, use the fn:escapeXml function to encode the value. For example, ${fn:escapeXml(fn:join(document:tags, ', '))}.
- When you add DOM elements directly from client-side scripts, you may want to use the encodeURIComponent(value) function so that the string value cannot introduce unexpected DOM elements.
- If you use FreeMarker templates, use ${expression?html} to encode the data.
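To make "encoding" concrete, the following is a minimal sketch of the kind of escaping that <c:out> and fn:escapeXml perform: the five characters that are significant in HTML and XML are replaced with character references. The class HtmlEscapeSketch and its escape method are hypothetical names used only for illustration; they are not part of HST or JSTL, and in real templates you should rely on the tags and functions listed above.

```java
/**
 * Hypothetical helper for illustration only; not part of HST or JSTL.
 * Replaces the characters &, <, >, " and ' with character references.
 */
final class HtmlEscapeSketch {

    private HtmlEscapeSketch() {
    }

    static String escape(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&#034;");
                    break;
                case '\'':
                    out.append("&#039;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Untrusted input, for example the value of a search field.
        String input = "<script>alert('attack')</script>";
        // Prints: &lt;script&gt;alert(&#039;attack&#039;)&lt;/script&gt;
        System.out.println(escape(input));
    }
}
```

Once escaped, the browser renders the input as visible text instead of interpreting it as markup.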
2. XSSUrlFilter to enforce XSS and CRLF injection Prevention
By default, the archetype ships the created web application with a configured XSSUrlFilter to enforce XSS and CRLF injection prevention.
You can put this filter first in the filter chain so that malicious URLs are checked before any other processing.
  <filter>
    <filter-name>XSSUrlFilter</filter-name>
    <filter-class>org.hippoecm.hst.container.XSSUrlFilter</filter-class>
  </filter>
  <!-- SNIP -->
  <filter-mapping>
    <filter-name>XSSUrlFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping> 
The filter works very simply. If it finds any of the strings <, >, %3C, %3c, %3E or %3e in the request URI or query string, it returns an HTTP 400 (Bad Request) error.
For example, if it detects a malicious request URL such as http://www.example.com/news/search?p=<script>alert('attack')</script>, the filter returns an HTTP 400 error without any further processing.
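The URL check described above can be sketched in a few lines of plain Java. This is an illustration of the technique under stated assumptions, not the actual source of org.hippoecm.hst.container.XSSUrlFilter: the class UrlCheckSketch and its methods are hypothetical names, and the servlet plumbing is left out so that the example stays self-contained.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the URL check; not the real XSSUrlFilter code.
 */
final class UrlCheckSketch {

    // The markers named in the text. Input is lower-cased before matching,
    // so %3C / %3c and %3E / %3e are each covered by a single entry.
    private static final String[] FORBIDDEN = {"<", ">", "%3c", "%3e"};

    private UrlCheckSketch() {
    }

    static boolean containsMarker(String value) {
        if (value == null) {
            return false;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        for (String marker : FORBIDDEN) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Returns true when the request URI or the query string contains a marker. */
    static boolean isSuspicious(String requestUri, String queryString) {
        return containsMarker(requestUri) || containsMarker(queryString);
    }

    public static void main(String[] args) {
        // In a servlet filter, a true result would typically lead to
        // response.sendError(HttpServletResponse.SC_BAD_REQUEST).
        System.out.println(isSuspicious("/news/search", "p=<script>alert('attack')</script>")); // true
        System.out.println(isSuspicious("/news/search", "p=%3Cscript%3E")); // true
        System.out.println(isSuspicious("/news/search", "p=hippo")); // false
    }
}
```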
The filter is a helpful complement if you are not 100% sure that all your web pages properly encode their output against XSS. The filter also wraps the HTTP servlet response to inspect the headers written to it, in order to prevent CRLF injection. If a header contains CR or LF characters, the visitor gets a server error.
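The header inspection can be illustrated in a similar way. The sketch below only shows the core check: a header name or value containing a carriage return or line feed is rejected. HeaderCheckSketch and requireNoCrlf are hypothetical names, and throwing an IllegalArgumentException is an assumption made for this example; how the real filter turns the condition into a server error is not shown here.

```java
/**
 * Hypothetical sketch of a CRLF check on response headers;
 * not the real XSSUrlFilter code.
 */
final class HeaderCheckSketch {

    private HeaderCheckSketch() {
    }

    static boolean containsCrOrLf(String text) {
        return text != null && (text.indexOf('\r') >= 0 || text.indexOf('\n') >= 0);
    }

    /**
     * Returns the value unchanged when it is safe to write. The exception
     * type is an assumption of this sketch. In a servlet environment a check
     * like this could be called from the setHeader and addHeader methods of
     * an HttpServletResponseWrapper subclass.
     */
    static String requireNoCrlf(String name, String value) {
        if (containsCrOrLf(name) || containsCrOrLf(value)) {
            throw new IllegalArgumentException("CR or LF found in header: rejected");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(requireNoCrlf("Location", "/news/search?p=hippo"));
        try {
            // An injected line break would start a second, attacker-controlled header.
            requireNoCrlf("Location", "/news\r\nSet-Cookie: injected=1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```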
3. Utility to encode untrusted data which may contain malicious scripts
In some cases your application lets users enter data that is stored in the repository, and then you need to be even more careful.
Take comments on a news article as an example. What if someone enters malicious script code in a comment? Every user who visits the article page would then be exposed to the XSS attack, just because somebody posted a comment containing malicious script code.
To prevent this risk, you may want to always strip script tags from user input, or keep only the text content without any markup, before storing it in the repository or any other backend database.
Many libraries, such as HtmlCleaner (http://htmlcleaner.sourceforge.net/), can strip out script tags. Use one of them if you need to allow a limited set of markup in user input.
Or, if you only want plain text without any markup, as in the simple commenting example, you can use a utility that HST provides: org.hippoecm.hst.utils.SimpleHtmlExtractor#getText(String html)
This static method extracts only the text content from an HTML markup string.
Here is an example from the demosuite site application of HST (org.hippoecm.hst.demo.components.Detail.java):
import org.hippoecm.hst.utils.SimpleHtmlExtractor;
//...
// Raw, untrusted values submitted by the visitor
String title = request.getParameter("title");
String comment = request.getParameter("comment");
// ...
// Keep only the text content; any markup in the input is dropped
commentBean.setTitle(SimpleHtmlExtractor.getText(title));
commentBean.setHtml(SimpleHtmlExtractor.getText(comment));
// ...
// update now
wpm.update(commentBean);
4. Summary
HST has been implemented with possible XSS vulnerabilities in mind and complies with the rules suggested in [2].
However, developers must still pay careful attention to their own applications, because these can generate content from untrusted user input or data. They should always encode untrusted data properly, following best practices.
HST also provides the XSSUrlFilter as a safety net for unsafe application code that may remain in an HST-based application. If the filter detects a risky URL, it returns an HTTP 400 (Bad Request) error.
Developers should also be careful when storing untrusted data in the repository or any other backend database. They can use a library to clean user input that contains markup, or, if only text content is needed, use the static utility method that HST provides.