Ask the Expert – Jim Manico

Jim Manico is the VP of Security Architecture for WhiteHat Security, a web security firm. Jim is a participant in and project manager of the OWASP Developer Cheatsheet series. He is also the producer and host of the OWASP Podcast Series.

1. Although SQL Injection continues to be one of the most commonly exploited security vulnerabilities in the wild, Cross Site Scripting (XSS) is still the most common security problem in web applications. Why is this still the case? What makes XSS so difficult for developers to understand and to protect themselves from?

Mitigating SQL Injection, from a developer's point of view, is very straightforward: parameterize your queries and bind your variables!
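
A minimal sketch of what that looks like in practice, using Python's sqlite3 module (the table and values are made up for the example):

```python
import sqlite3

# Parameterized query: the "?" placeholder binds the variable. The input
# is passed as data, never spliced into the SQL string, so it cannot
# change the structure of the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "admin"))

# Even a classic injection payload is treated as an ordinary string.
hostile = "alice' OR '1'='1"
rows = conn.execute("SELECT role FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # [] -- no match; the injection attempt is inert
```

The same pattern applies in any language: the query text and the data travel to the database separately.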

Unfortunately, mitigating XSS can be very difficult. You need to do contextual output encoding in five or more contexts as you dynamically create HTML documents on the server. You also need to validate untrusted HTML that is submitted from widgets like TinyMCE, and to parse JSON using safe APIs such as JSON.parse. And then you need to deal with the very challenging issue of DOM-based XSS, a problem that even tools have trouble discovering. And it is getting worse in the era of rich internet application development.
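
To make the idea of contexts concrete, here is a small sketch of two of them using Python's standard library (the payload is an arbitrary example):

```python
import html
from urllib.parse import quote

# Each output context needs its own encoder; using the wrong one
# (or none) is what lets script break out.
untrusted = '"><script>alert(1)</script>'

# HTML body / attribute context: entity-encode the dangerous characters
# (html.escape also encodes quotes by default, for attribute values).
print(html.escape(untrusted))
# &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;

# URL parameter context: percent-encode before placing in an href/query.
print(quote(untrusted, safe=""))
```

JavaScript, CSS, and nested contexts each need their own rules again, which is why XSS defense is so much harder than SQL injection defense.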

2. What’s the real risk of XSS – what can attackers do if they find an XSS vulnerability? How seriously should developers take XSS?

Attackers can use XSS to set up keyloggers, deface a website, steal session cookies or other sensitive data, redirect the user to an untrusted website, and circumvent CSRF protections … just to get started. If developers want to build secure web applications they NEED to take XSS defense seriously. And it can be quite difficult to accomplish this – especially for modern RIA/AJAX applications.

3. Where can developers and testers and security analysts go to understand XSS? What tools can people use to prevent XSS today and where can they find them?

One of the best XSS prevention guides is the OWASP XSS Prevention Cheat Sheet. Over 500,000 hits and counting.

For advanced practitioners there is the OWASP DOM XSS Prevention Cheatsheet as well.

4. When is XSS going to be solved for good, or will we have to keep on living with the risk of XSS exploits for a long time?

If developers are forced to manually output encode every variable, I feel XSS will always be with us.

But there is hope in standards.

Content Security Policy 1.1 is a W3C draft that promises to make XSS defense a great deal easier on developers. There is only mixed browser support for CSP today, but in two to three years, when all browsers fully support the CSP standard, a highly effective, browser-based anti-XSS mechanism will be available to all.
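
As a rough sketch of what a server might send, here is an example policy header (the exact policy string is an illustrative assumption, not a recommendation):

```python
# A restrictive example CSP: resources may only load from the page's own
# origin, plugins are disabled, and -- because no 'unsafe-inline' keyword
# appears -- inline <script> blocks are refused by the browser, which
# blunts most reflected XSS.
csp = "default-src 'self'; script-src 'self'; object-src 'none'"
headers = {"Content-Security-Policy": csp}
print(headers["Content-Security-Policy"])
```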

I’m also fond of the HTML5 iframe sandboxing mechanism for XSS defense.

Top 25 Series – Summary and Links

As requested here are the links to all the posts on the Top 25 Most Dangerous Programming Errors. Please let us know if you have any suggestions or comments.

1 – Cross-Site Scripting (XSS)
2 – SQL Injection
3 – Classic Buffer Overflow
4 – Cross-Site Request Forgery (CSRF)
5 – Improper Access Control (Authorization)
6 – Reliance on Untrusted Inputs in a Security Decision
7 – Path Traversal
8 – Unrestricted Upload of Dangerous File Type
9 – OS Command Injection
10 – Missing Encryption of Sensitive Data
11 – Hardcoded Credentials
12 – Buffer Access with Incorrect Length Value
13 – PHP File Inclusion
14 – Improper Validation of Array Index
15 – Improper Check for Unusual or Exceptional Conditions
16 – Information Exposure Through an Error Message
17 – Integer Overflow Or Wraparound
18 – Incorrect Calculation of Buffer Size
19 – Missing Authentication for Critical Function
20 – Download of Code Without Integrity Check
21 – Incorrect Permission Assignment for Critical Resource
22 – Allocation of Resources Without Limits or Throttling
23 – Open Redirect
24 – Use of a Broken or Risky Cryptographic Algorithm
25 – Race Conditions

Top 25 Series – Rank 20 – Download of Code Without Integrity Check

While teaching SEC503, our intrusion detection class, last week, we yet again wrote a signature for a CVS exploit from a few years back. Sure, it is old news by now, but I think it is very timely if you are concerned about the integrity of your software. If you are not familiar with it: CVS is software used to manage source code repositories. A compromise of your CVS server means that you can no longer trust the software maintained by that server. Hardly anybody installs software from a CD anymore; most software today is downloaded and installed. But how do you know whether the download has been tampered with?

There are two main methods to verify file (and with that, software) integrity: cryptographic hashes and digital signatures. Cryptographic hashes are essentially fancy checksums. Unlike with a simple checksum, it is very hard to find two documents with the same hash. An attacker will not be able to, for example, add a backdoor to your code and then add comments or dead code to make the hash fit. This may no longer be fully true for MD5, but even MD5 is still “pretty good,” and exploits at this point only work if certain conditions are met.
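
The fingerprint property is easy to demonstrate with Python's hashlib (the file contents are made up for the example):

```python
import hashlib

# Any change to the content -- even a single added line -- produces a
# completely different SHA-256 digest.
original = b"#!/bin/sh\necho installing...\n"
tampered = b"#!/bin/sh\necho installing...\n# backdoor goes here\n"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(tampered).hexdigest()
print(h1 == h2)  # False
print(len(h1))   # 64 hex characters, i.e. 256 bits
```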

But there is a fundamental weakness in using a hash to ensure file integrity: how do you make sure that the hash itself didn't get changed? In many cases, the hash is stored in the same directory, with the same permissions, as the original code. An attacker could just replace the code and the hash using the same attack.

This problem is solved in part by using digital signatures. A digital signature starts out with a cryptographic hash of the software, but this hash is then signed using a private key. Nobody else needs to know this private key, and it should be kept tucked away and out of reach. An attacker will no longer be able to modify the hash undetected. However, in order to verify the signature, a user needs a copy of the public key. How do we make sure the public key is correct? This is easy if you already have the public key, or are able to validate it out of band. But in many cases, you obtain the public key at the same time you obtain the signature and the code.

One solution to this problem is the hierarchical system implemented by SSL code-signing certificates. In this case, the code is signed by a key which has itself been signed by a trusted entity. Your operating system will usually trust a number of these certificate authorities by default, and then trust every key signed by one of them. This works very well, as long as the certificate authorities are careful in how they hand out signed certificates. Sadly, bad certificates have been handed out in the past.

So we got our software, verified that the signature is correct, and installed it. We are not in the clear yet. The next thing you will want to do is download additional components or updates. Updates in particular are typically verified by the application itself. There are a number of pitfalls that can cause problems:

– the application validates the signature, but does not ensure that the signature was created using a valid certificate.
– the application allows downgrades. An attacker can offer an older version (for which the attacker, of course, has a valid signature) and have the application downgrade itself so that an old vulnerability can be exploited.
– a badly implemented update mechanism can lead to a DoS issue if the upgrade fails halfway and does not provide for a simple “undo”.

So in short, here is a quick checklist of what to look for:

– if you use regular hashes, store them on a different system than the original software, and secure them well
– try not to use MD5; SHA-256 is probably the best algorithm to use at this point. Offer multiple hashes if you can.
– if at all possible, use proper code signing certificates.

Top 25 Series – Rank 25 – Race Conditions

Flying a lot, it happens once in a while that I arrive at the airport early enough to be offered check-in on an earlier flight. Usually the check-in kiosk offers the option and lists the flight. Last year, I tried to take advantage of this offer, only to be told that the flight was no longer available after I selected it.

Well, not a big deal. I went to the gate, and waited for the later flight. As I tried to board it, my boarding pass in hand, I was told that there was no record of my reservation for this flight.

So what happened? In this case, I can only speculate, but likely a race condition occurred. Someone else was added to the earlier flight just as I was. But before the system could check that my seat was actually available, I had already been removed from the later flight. When the change then failed, the system “forgot” to place me back on the later flight.

Race conditions are very common in shared applications like web applications, where several users may try to order a limited resource. But race conditions are not uncommon in other applications either. In accounting, for example, an account may be overdrawn by deducting money multiple times before the balance has been adjusted.

Detecting and fixing race conditions is difficult. Race conditions usually escape traditional testing because they frequently require multiple users to issue requests at just the wrong time, or one user to send carefully timed requests, sometimes multiple requests at exactly the same moment.

This issue is usually best solved during the architecture and design phase. Critical functions that may be subject to race conditions need to be identified. Once identified, these functions can be isolated, and by using techniques like proper database transactions, it is possible to isolate the operations sufficiently and roll them back completely if they fail at any point.
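
As a sketch of the transactional approach, assuming a made-up seats table in SQLite: the availability check and the decrement are collapsed into one atomic statement, so two bookings cannot both see the last seat as free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (flight TEXT PRIMARY KEY, available INTEGER)")
conn.execute("INSERT INTO seats VALUES ('UA100', 1)")  # one seat left

def book(conn, flight):
    # The WHERE clause makes "check" and "decrement" a single atomic
    # step in the database, instead of a racy check-then-act sequence.
    cur = conn.execute(
        "UPDATE seats SET available = available - 1 "
        "WHERE flight = ? AND available > 0", (flight,))
    conn.commit()
    return cur.rowcount == 1  # True only if a seat was actually taken

print(book(conn, "UA100"))  # True  -- the booking succeeds
print(book(conn, "UA100"))  # False -- no seat left, cleanly refused
```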

Top 25 Series – Rank 24 – Use of a Broken or Risky Cryptographic Algorithm

There are a few rules every developer should follow when applying encryption:

– don’t invent your own algorithm
Cryptography is a difficult topic, best left to the experts. Implementing encryption algorithms is hard, and there are many traps waiting. You can often get away with a broken custom algorithm, but only because nobody has challenged the implementation yet. If you are happy coding unimportant websites nobody needs, then your time is probably cheap enough that you don't mind wasting a few hours implementing your own broken algorithm.

It is best to stick with standard algorithms. Currently, AES (the Advanced Encryption Standard) is the standard encryption algorithm. The advantage of using a standard like AES is that you will find support for it in various programming languages, and that future support is likely as well.

– use the strongest algorithm you can find
Cryptography is a constant battle against the ever increasing abilities to break encryption. Not only do researchers use better and better hardware to brute force encryption keys, but they also come up with more efficient algorithms to search for the key.

It is important to “over-design” encryption. The goal should not be to find a “sufficient” algorithm, but the best you can find and afford. It is very hard to predict how cryptanalysis techniques will evolve over the next 5 or 10 years.

– RTFM
Even if you use a strong encryption algorithm, understanding the details of implementing that particular algorithm is important. Whenever encryption is implemented, it is important to read any instructions that accompany the related libraries.

Top 25 Series – Rank 23 – Open Redirect

Open redirect (CWE-601) makes phishing attacks more effective. Redirection is commonly used within web applications for various purposes. From the login page, it is common practice to redirect the user to another page once the user logs in. Sometimes the user goes directly to a content page and is redirected to a login page; in order to bounce the user back to the right content page, a redirection link is used.

Internal URL redirection is sometimes used throughout the site to get the user to the right place on the site. For example, the user can type in the name of the file in a field, and the web script can direct the user to a download page such as http://www.sans.org/download?=http://sans.org/files/[userinput]

Search engines are commonly abused as open redirects, simply because search engines want to keep track of where the user went: the user clicks on a link within the search results and is then redirected to the site they want to visit. Google was such an open redirect in its early days.

The problem with the redirect is a URL such as

http://www.sans.org/redirect?=http://phishingevilsite

This URL may look like it is pointing the user to sans.org, but in fact it redirects the user to phishingevilsite. This makes the phishing link look more legitimate.

For mitigation, if link redirection is necessary, add a hash to the URL query string. The hash should be based on a secret key and the URL itself. Before redirecting, validate the hash to make sure the redirection is legitimate.
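
A minimal sketch of that mitigation using Python's hmac module (the secret and the URLs are made-up values):

```python
import hmac
import hashlib

# Server-side secret; the attacker cannot forge a valid hash without it.
SECRET = b"server-side secret key"

def sign(url: str) -> str:
    # HMAC over the URL, keyed with the secret, appended to the query string
    return hmac.new(SECRET, url.encode(), hashlib.sha256).hexdigest()

def safe_redirect(url: str, sig: str) -> bool:
    # compare_digest avoids timing side channels in the comparison
    return hmac.compare_digest(sign(url), sig)

good = "http://www.sans.org/welcome"
token = sign(good)
print(safe_redirect(good, token))                        # True
print(safe_redirect("http://phishingevilsite/", token))  # False
```

The attacker can still submit any URL, but without a matching signature the redirect is simply refused.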

The redirection can then happen via an HTTP header. Before sending out the final redirection header, check that the Referer header comes from an internal source, and is neither blank nor from somewhere else on the Internet. Robots.txt can be used to exclude the redirect script from being indexed by search engines, which attracts less attention to the redirection scripts.

If at all possible, avoid full URL redirection; allowing only part of the path to be controlled by the user cuts down on some of the risk. To be more secure against open redirect, use a number or character code as a substitute for the URL if possible.

Top 25 Series – Rank 21 – Incorrect Permission Assignment for Critical Resource

Incorrect Permission Assignment for Critical Resource (CWE-732) is a complicated name for a problem that is easy to understand: if you don't go out of your way to secure your resources, they are probably not secured by default. Often enough, the responsibility for securing resources and infrastructure components falls to the developer. If developers aren't told explicitly to secure resources and set permissions correctly, they probably won't do it. In many programming languages, when a file is created from code, its default permissions are rather loose; the developer has to write extra code to protect those files. This leads to resources with improper or insecure permission settings.
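
For example, in Python the permissions can be set explicitly at file creation time rather than left to the defaults (the filename and contents are throwaway examples):

```python
import os
import stat
import tempfile

# Create the file with owner read/write only (0o600) instead of relying
# on whatever the process umask happens to allow.
path = os.path.join(tempfile.mkdtemp(), "secrets.conf")
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(fd, b"db_password=example\n")
os.close(fd)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode & 0o077))  # 0o0 -- group and world have no access
```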

To ensure the security settings are actually set, a security policy must exist that specifies the requirements, along with standards and procedures for changing them the right way. In the development lifecycle, such requirements should be observed and made part of the project requirements. Use of standard APIs and processes to perform these functions is encouraged; this takes the responsibility away from developers who are busy enough trying to write code. With careful planning and regular audits, this is a vulnerability that can be addressed.

Top 25 Series – Rank 22 – Allocation of Resources Without Limits or Throttling

A number of years ago I was conducting a black box test of a fairly large web application. As part of this testing I used an automated script to send malicious inputs to a number of forms on the site in question. I sent a lot of requests. It turned out that, under the covers, the form would send an email to a customer service representative every time it was submitted. The poor CSR got to work in the morning to find thousands of emails in his inbox. Fortunately, my testing didn't DoS the site (although it probably did DoS the CSR), but it's this type of situation that is covered by CWE-770.

If you have functionality in your application that can lead to some form of resource exhaustion you should define requirements that set limits on the number of resources that can be used. This can be implemented in the application itself or potentially with a WAF. Has anyone used a WAF for such a purpose?
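
In the application itself, the limit can be as simple as a token bucket per client; here is a rough sketch (the capacity and refill rate are arbitrary example values):

```python
import time

class TokenBucket:
    """Each client gets `capacity` tokens that refill at `rate` per
    second; a request is only served while a token is available."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # refill based on elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)
print([bucket.allow() for _ in range(5)])  # first 3 allowed, rest refused
```

The same shape of limit (requests, emails, file handles, memory) applies to any resource the application can exhaust.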

Top 25 Series – Rank 18 – Incorrect Calculation of Buffer Size

Incorrect Calculation of Buffer Size (CWE-131) is another shameful member of the buffer overflow family. A buffer overflow is generally caused by copying or moving a piece of data into a smaller memory location, overwriting important data in memory and corrupting the execution path of the program. The most basic case is not checking the buffer length before copying data. But even if the developer writes code to check the length, there is still plenty of room for error; this is exactly where incorrect calculation of buffer size fits in.

When a developer writes a routine to check the length of the buffer to be moved or copied, sometimes the arithmetic is not exactly correct, and this leads to an incorrect calculation of the size, which in turn leads to a buffer overflow. Most occurrences are due to human error. A very common and well-known flaw, the off-by-one error, is usually caused by the developer forgetting about the NULL terminator at the end of a string, or the fact that array indices start at 0 rather than 1.

I will borrow an example from MITRE (example 4 on the page).

int *id_sequence;

/* Allocate space for an array of three ids. */
id_sequence = (int*) malloc(3);
if (id_sequence == NULL) exit(1);

/* Populate the id array. */
id_sequence[0] = 13579;
id_sequence[1] = 24680;
id_sequence[2] = 97531;

In this example, the developer's intention is to create space for three integers (int), but a coding mistake leads to only 3 bytes being reserved in memory. Each integer is (typically) 4 bytes long, so the three integers add up to 12 bytes. Now we have twelve bytes being written into a 3-byte allocation, which causes an overflow. The fix is to allocate 3 * sizeof(int) bytes instead.

The solution to this problem is developer education and a review process. Peer review and code scanners can help tremendously. Using more modern languages also tends to significantly reduce the possibility of these vulnerabilities.

Top 25 Series – Rank 17 – Integer Overflow Or Wraparound

At first sight, an integer overflow doesn't look all that serious. Any system has a maximum integer it is able to represent. For example, this would be 255 for an 8-bit system, right? (Keeping it simple with 8 bits for now.) Not always: if the number is a signed integer, the maximum representable value is 127, and anything “larger” becomes a negative number. The description for CWE-190 [1] has a number of nice examples, and I don't just want to repeat them here. Let me instead point to a less common but similar issue I ran into while coding in PHP, to illustrate the problem from a different angle.
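
The 8-bit behavior is easy to demonstrate from Python using ctypes fixed-width integers:

```python
import ctypes

# A signed 8-bit integer holds -128..127; storing 128 wraps around to a
# negative number. An unsigned 8-bit integer holds 0..255 and then wraps
# back to zero.
print(ctypes.c_int8(127).value)   # 127
print(ctypes.c_int8(128).value)   # -128
print(ctypes.c_uint8(255).value)  # 255
print(ctypes.c_uint8(256).value)  # 0
```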

Like most languages, PHP implements a function to retrieve random numbers. In PHP's case, the function is called “rand” and it takes two arguments: one sets the lower end and the other the upper end of the output range. For example, rand(100,999) will create three-digit random numbers.

At some point, I needed some decent random numbers for an ill-conceived session implementation. I knew enough at the time to ask PHP for “large” random numbers, so I wrote code that looked a bit like this sample:

<?php
$min=1000000000000;
$max=9999999999999;
for ($b=1;$b<10;$b++) {
    print rand($min,$max)."\n";
}
?>

If you run this code, you will get something like this (yes… this is the output I got from PHP 5.1.6 just now on my Mac):

841607426
957487601
848768972
958968875
-102310784
-489293383
-337233534
-472732979
105331880

So what happened? These are 9-digit numbers, and they are negative as well as positive!

When I filed a bug report back then, I was essentially told: RTFM! I swallowed my ego and broke down to find the manual and read it [2]. It turns out that the arguments only work as advertised if they are smaller than “getrandmax”. If you don't stick within these limits, you essentially end up with an integer wraparound. Depending on the exact parameters you choose, you may actually get numbers that “look ok,” but where only the last couple of digits change.

Lesson learned: RTFM and be aware of the limitations of your platform!

[1] http://cwe.mitre.org/data/definitions/190.html
[2] http://us2.php.net/rand