Web Requests and Data Leakage

Even if your site is browsed over HTTPS, it can be insecure if any assets (images, scripts, styles) are transferred over an HTTP connection. This will trigger a "mixed content" warning in the browser that many will brush off as unimportant. The warning can be a major issue for some sites, though, and I want to explain why.

When I wrote earlier on securing images and solving mixed content issues on websites, my friend Mike Schinkel asked if I could explain why it mattered.  I briefly covered information leakage in the comments, but wanted to go over things in a bit more detail for anyone interested.

The Tools

I use a variety of network tools to monitor requests going back and forth from the browser to the server.  Seeing exactly what the browser sends and exactly what the server returns helps debug interactions, detect potential optimization routes, and gain a better sense of what’s going on under the hood.

On my Windows machine, I use Fiddler as a proxy so I can see every piece of data passing over the wire.  On both my Windows and Mac machines, I use Wireshark to monitor web requests and, well, everything else exchanged over the network.

Wireshark, in particular, is a fun tool to run when connected to any network as it lets me see not just the traffic from my machine but any machine on the network.  If you want proof that people are still using insecure network connections to send sensitive data, spend 5 minutes reading a Wireshark feed.  It’s a sobering experience.

The Experiment

To demonstrate how information could leak with images, I did a quick experiment on my own site.  I made a web request to [cci]http://eamann.com/biz/does-the-fold-still-matter/?username=testing&password=1234[/cci].  This is to demonstrate two things:

  1. The relative insecurity of GET parameters (variables passed in a URL)
  2. How data passed in a Referer header can be intercepted by an eavesdropper
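
As a quick illustration of the first point, here is a minimal Python sketch showing how little effort it takes to pull credentials out of a URL once someone has it. The function name `query_params` is my own, made up for this example; the URL is the one from the experiment.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(url: str) -> dict[str, list[str]]:
    """Return the GET parameters embedded in a URL (hypothetical helper)."""
    return parse_qs(urlsplit(url).query)


# The URL requested in the experiment.
url = "http://eamann.com/biz/does-the-fold-still-matter/?username=testing&password=1234"

params = query_params(url)
for name, values in params.items():
    # Each parameter is readable by anyone who can see the URL.
    print(name, "=", values[0])
```

Nothing here is clever, which is the point: GET parameters are part of the URL, so anything that records or forwards the URL carries them along.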

Once upon a time I worked on a stateless REST-based web application that still authenticated users on each request.  The user authentication required the username and password of the current user to be passed as GET parameters in every request.  Every. Request.  This experiment recreates that environment as an illustration.

Assume, though, that we’re requesting an HTTPS URL.[ref]I don’t have an SSL certificate set up for my site because I’m cheap. Also, none of the data on this site is of a sensitive nature, so encryption isn’t a big deal.  If, however, I ever decide to host sensitive data, the first thing I’ll do is lock everything down over SSL.[/ref]  The document will be passed over an encrypted connection and eavesdroppers won’t grab anything.  The images, however, could be passed over HTTP and can be viewed by anyone intercepting network traffic.  If the browser is passing a Referer header (the misspelling is the header’s actual name), eavesdroppers will now know exactly what page you’re viewing, can see any cookies the browser sends along with the image request, and can see your GET parameters: your username and password.[ref]Good browsers are now locking down on this and beginning to omit the Referer header when making an HTTP asset request on an HTTPS page. But there’s no guarantee here, and you’ll never know what browser your visitors are using.  If the page needs to be secure, make sure your assets are secure as well or you’ll be leaking information to would-be attackers![/ref]

The Results

After making the request and taking a look in Wireshark, it was fairly easy to find an image request on the page.[ref]Remember, we’re taking the example of an SSL site that contains a vanilla HTTP image in its body.[/ref]

(Screenshot: the image request as captured in Wireshark)

If this were truly an SSL site, the other requests in this image wouldn’t appear (I’m filtering on HTTP requests only), but the GET for the image would still be present and (in most browsers) would still contain the same headers.  The image itself is an innocuous request.  It doesn’t expose anything vital.

The headers are a different story:

(Screenshot: the image request’s headers, with the Referer header highlighted)

I’ve highlighted the Referer header in the example above.  In plain text, it would be [cci]Referer: http://eamann.com/biz/does-the-fold-still-matter/?username=testing&password=1234[/cci].  Here we have, exposed for all the world to see, the original URL and its plaintext GET parameters.[ref]I want to point out once more this will still be exposed on most SSL-secured websites if the image is requested over HTTP.  I requested the page over HTTP because my site doesn’t currently have an SSL certificate.  However, I wanted to present a real demonstration from a real website with data captured live over an open WiFi network.  This is a real request, captured remotely, using readily available tools and eavesdropping over a real network.  I used my own site as an example so no one can accuse me of hacking their data.[/ref]
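
To make the eavesdropper’s side concrete, here is a rough Python sketch of what happens after the capture. It assumes the attacker already has the raw text of the plain-HTTP image request. The image path and Host value in `RAW_REQUEST` are invented for illustration; only the Referer line comes from the experiment. The helper names are hypothetical as well.

```python
from urllib.parse import urlsplit, parse_qs

# A plain-HTTP image request as it might appear on the wire.
# ASSUMPTION: the request line and Host header are made up for this sketch;
# the Referer value is the one from the experiment.
RAW_REQUEST = (
    "GET /images/example.png HTTP/1.1\r\n"
    "Host: eamann.com\r\n"
    "Referer: http://eamann.com/biz/does-the-fold-still-matter/"
    "?username=testing&password=1234\r\n"
    "\r\n"
)


def parse_headers(raw: str) -> dict[str, str]:
    """Collect the header lines of a raw HTTP request, keyed by lowercase name."""
    headers = {}
    for line in raw.split("\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def leaked_params(raw: str) -> dict[str, list[str]]:
    """Return any GET parameters exposed through the Referer header."""
    referer = parse_headers(raw).get("referer", "")
    return parse_qs(urlsplit(referer).query)


leaked = leaked_params(RAW_REQUEST)
print(leaked)
```

The image request itself says nothing interesting, but its Referer header hands over the full URL of the page that embedded it, query string included.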

The Implications

Honestly, few of us are ever going to send a username and password pair via GET parameters.  But there are situations where this could still cause an issue.

I use a plugin on my WordPress site called “Share a Draft.”  This plugin allows me to very easily draft a post, and share a private link for that post with peers to review content before publishing anything.  This private link contains a hash, affixed to the URL via a GET parameter, that tells WordPress to give a visitor access to a not-yet-published post.

Again, my site doesn’t contain any sensitive information, but what if it did?  What if, instead of blogging about personal topics, I was writing news articles about major international affairs for a news media organization?  What if I sent a Share a Draft link to a colleague to proofread a post on not-yet-public international policy decisions?  What if my colleague read the post in a Starbucks over open WiFi?

Even if the site itself is encrypted and transferred over SSL, any non-SSL images in the post body can be intercepted by anyone else on that network, potentially revealing the referrer URL, exposing the GET parameter, and allowing an eavesdropper access to sensitive news data before it’s available to the public.
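
If you want to check a post for this before publishing, something like the following Python sketch can flag assets that would be fetched over plain HTTP. This is only an illustration, not a complete mixed content audit: the class and function names are my own, the sample markup and example.com URLs are invented, and it only looks at a handful of tag/attribute pairs (it ignores CSS `url()` references, for instance, and treats every `link` href as an asset).

```python
from html.parser import HTMLParser

# Tag/attribute pairs treated as asset references in this sketch.
# ASSUMPTION: this list is illustrative, not exhaustive.
ASSET_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
    ("iframe", "src"),
}


class MixedContentFinder(HTMLParser):
    """Collect assets in an HTML document that are referenced over http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in ASSET_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_assets(html: str) -> list[tuple[str, str]]:
    """Return (tag, url) pairs for assets referenced over plain HTTP."""
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


# Hypothetical post body with one secure and two insecure assets.
sample = """
<p>Draft post</p>
<img src="https://example.com/safe.jpg">
<img src="http://example.com/photo.jpg" />
<script src="http://example.com/app.js"></script>
<a href="http://example.com/elsewhere">an ordinary link, not an asset</a>
"""

insecure = find_insecure_assets(sample)
for tag, url in insecure:
    print(tag, url)
```

Ordinary links are deliberately left alone here, since the browser only requests them when someone clicks; the concern in this post is assets that load automatically with the page.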

Is this a major security flaw?  No.  But it’s a flaw nonetheless and something security-minded developers and publishers should keep in mind.  I’ll be among the first to admit that thinking about security in this way is extremely paranoid.  I’d also like to point out, though, that professional paranoia is the mark of a good security engineer.

Have I convinced you that “mixed content” warnings are a valid security concern yet?