Tuesday, 25 November 2014

How to design a high performance web site

Recently, I have been optimising login.pixelpin to try and make it perform as well as possible. I have learned a few things along the way, including the fact that it is not always obvious how to make things better!

There are basically three things you are aiming for when making a high performance web site:
  1. Reduce the overall content of your pages.
  2. Reduce the network latency of your page impressions - related to, but not the same as, 1.
  3. Reduce the load on the web server so that it can perform better for the same number of page impressions.

Reduce Overall Content

This might seem obvious but if you look at a poor example, like amazon.co.uk, the page weight is a terrible 3MB! That's right, to see everything on the front page, you have to download 3MB of data, much of it contained in the 89 images it displays!

There are various ways to reduce the overall content and clearly, some of this happens early on in the design phase where designers should know about keeping their designs simple, not putting too much on each page and using images properly. Consider something like an e-commerce site with a section for "related products". If you make the related product content too heavy, every time someone views an item, they might have to wait another few seconds for the related products to load before they can see what they clicked on. On a fast connection this might not be too bad but on mobile networks or areas with poor connection speeds, you might quickly annoy your customers.

There are examples of content-heavy sites in various places, but on the Halifax banking site I reviewed in my last post, clicking on an account gives you transactions, direct debits, standing orders and statement-search functionality all on the same page. This is inefficient, even if the additional content is not massive.

Reduce Network Latency

Once you have reduced the content in the design of your pages, reducing network latency means 3 things: Removing redundant content, compression and caching.

Removing redundant content means ensuring that commented-out code is not kept in pages and that unused scripts are removed, plus one that can be quite hard: removing unused CSS rules, which can account for quite a lot of CSS. If you use a framework this is not usually very easy, and something you don't use now might be used on another page or might be needed in the future, so be careful here!

Compression should be automatic on your web server. Text-heavy content, such as scripts and CSS files, can often be gzip compressed by as much as 75%. The additional time to compress and decompress is usually small compared to the network time saved, so this is always a good idea for resources that compress well. Formats that are already compressed, such as audio, video and images, will not usually compress much further, so don't waste time on those. NOTE that on CDNs you do not usually get automatic compression, and you might have to do something dynamic in your pages to check whether the client supports gzip before rewriting the link to point at a .gz version of the resource. Do not assume that all clients support gzip - they don't!
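As an example, a minimal sketch of that kind of check in ASP.NET (the CDN URL and file names here are invented, not the real ones):

using System;
using System.Web;

// Minimal sketch: pick the pre-gzipped copy of a CDN resource only when the
// client has advertised gzip support. URL and file names are placeholders.
public static class CdnHelper
{
    public static string ScriptUrl(HttpRequest request)
    {
        string acceptEncoding = request.Headers["Accept-Encoding"] ?? string.Empty;
        bool supportsGzip = acceptEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0;

        // Fall back to the uncompressed copy rather than assuming gzip support.
        return supportsGzip
            ? "https://mycdn.example.net/scripts/site.min.js.gz"
            : "https://mycdn.example.net/scripts/site.min.js";
    }
}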

Another form of compression is to minify the CSS and JavaScript. Minifying means removing whitespace and replacing long variable names (which are easier to read when developing) with things like single-character names, e.g. var myLongVariableName; might be minified to var a;. This can be done with a dynamic plugin that minifies on the fly (and should use the web server cache to store the result), or it can be done manually and saved (usually as filename.min.js, for example) and then referenced in the script or CSS links.

Caching is also really important and ideally you should use long expiration times plus some kind of cache-busting for when resources need to be updated. This is usually achieved by adding a querystring to the link (even if it is not used); when the querystring changes, the client assumes it needs to re-download the resource. Caching time should usually be no longer than 365 days, since some clients get confused if it is longer!

Cache headers for HTTP responses are slightly confusing, but usually the Cache-Control header is preferred; it allows you to specify the type of caching (public or private) and the max-age of the resource in seconds. The older Expires header had to use a date and time in the GMT timezone (and anything malformed was ignored!). You can also set an ETag, which the client can send back to the server to ask "does this resource still have this ETag?". If the server checks and the resource is still up-to-date, it should send a 304 (Not Modified) response, which avoids sending the entire resource back across the network. This check still involves the server but uses far less network traffic, so it is still much faster than simply re-downloading everything.

What is REALLY important with caching is to test it from a browser or a tool like Fiddler, which has a caching tab and can report whether the caching responses match what you expect. You should also test what happens when the cache expires and the client re-requests the resource - does it send an If-Modified-Since or If-None-Match header and get a 304 response?
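To make that concrete, here is a minimal sketch (not the code from login.pixelpin) of an ASP.NET handler that sets Cache-Control, sends a quoted ETag and answers conditional requests with a 304; the handler name and ETag value are invented:

using System;
using System.Web;

// Sketch only: long-lived public caching with an ETag and a 304 for repeat requests.
public class CachedResourceHandler : IHttpHandler
{
    private const string CurrentETag = "\"v42\""; // note the quotes - the ETag value should be quoted

    public void ProcessRequest(HttpContext context)
    {
        HttpResponse response = context.Response;

        // Emits Cache-Control: public, max-age=31536000 (365 days) plus an ETag header.
        response.Cache.SetCacheability(HttpCacheability.Public);
        response.Cache.SetMaxAge(TimeSpan.FromDays(365));
        response.Cache.SetETag(CurrentETag);

        // If the client sent the same ETag back, it already has the current version.
        string ifNoneMatch = context.Request.Headers["If-None-Match"];
        if (ifNoneMatch == CurrentETag)
        {
            response.StatusCode = 304; // Not Modified - no body goes back over the network
            response.SuppressContent = true;
            return;
        }

        response.ContentType = "text/css";
        response.Write("/* ...the actual resource content would be written here... */");
    }

    public bool IsReusable { get { return true; } }
}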

Reduce the load on the web server

Clearly, the more work the web server does, the slower it will be. Some of this is simply related to the number of customers/connections hitting the server but other slowness is due to poor design or designing a site that needs to make too many connections to the web server for each page.

As an example, login.pixelpin.co.uk references something like 10 scripts, 5 css files and even the front page references about 6 images. That means, including the page itself, the client has to make 22 connections to the web server to request all these items. There is usually a limit on the number of connections that can be made to the same domain at the same time so the client might even have to wait for some responses before it can continue requesting more resources - even if it already knows it needs them.

There are several ways we can make things faster for the web server and client.

Firstly, in our site, I was creating a web service connection as a local variable when processing requests from a page, which meant it was created and then disposed pretty much every time a user hit the site. Since this was slow, I realised it should be created once, in a static variable, and accessed via a property, so that it is only constructed the first time the code is called.
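A minimal sketch of that change, with an invented client type standing in for the real web service proxy:

using System;

// Sketch: create the expensive client once, lazily and thread-safely, then reuse it.
public static class ServiceClientHolder
{
    private static readonly Lazy<MyWebServiceClient> _client =
        new Lazy<MyWebServiceClient>(() => new MyWebServiceClient());

    public static MyWebServiceClient Client
    {
        get { return _client.Value; }
    }
}

// Stand-in for the real (generated) web service proxy.
public class MyWebServiceClient
{
    public MyWebServiceClient()
    {
        // Imagine expensive setup here: channel creation, reading configuration, etc.
    }
}

In real code you may also need to handle a client that has faulted or been closed, but the principle - construct once, not per request - is the same.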

Secondly, we can reduce the number of connections by combining (bundling) scripts and/or css into single files. For instance, rather than having 6 different bootstrap scripts, I simply copied and pasted them into a single script - the same amount of network traffic but 1 connection instead of 6.
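If you would rather not combine files by hand, the ASP.NET Web Optimization framework (the Microsoft.AspNet.Web.Optimization package) will bundle and minify scripts and CSS for you; a sketch with invented file names:

using System.Web.Optimization;

// Sketch: bundle several scripts/stylesheets into one minified response each.
public static class BundleConfig
{
    public static void RegisterBundles(BundleCollection bundles)
    {
        // One HTTP request instead of several, minified as well.
        bundles.Add(new ScriptBundle("~/bundles/site").Include(
            "~/Scripts/jquery-1.11.1.js",
            "~/Scripts/bootstrap.js",
            "~/Scripts/site.js"));

        bundles.Add(new StyleBundle("~/Content/css").Include(
            "~/Content/bootstrap.css",
            "~/Content/site.css"));

        // Bundle/minify even when <compilation debug="true"> is set.
        BundleTable.EnableOptimizations = true;
    }
}

// Call BundleConfig.RegisterBundles(BundleTable.Bundles) from Application_Start and render
// with Scripts.Render("~/bundles/site") / Styles.Render("~/Content/css") in the markup.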

Another common way of reducing server load is to use a content delivery network (CDN). Effectively, it just means serving content from another web server to keep connections away from the main one, but these networks generally have a few features that help. Firstly, they are usually geo-cached, which means the content is copied to various places around the world; people who are in, say, Australia get the content from a location in or near Australia even if your main web server is in London. Secondly, CDNs will be on another domain to your web server and won't set cookies. This is a minor thing but it means that unnecessary cookie data is not sent with each request for these resources. CDNs can usually be attached to your web server content so that they automatically pull new content as it is published, or they can be attached to a storage account and run completely separately from the web server. As mentioned before, you have to be careful with compression, which is not usually applied automatically as it would be with most web servers. There is also some concern that an additional SSL handshake to another domain might add some delay to the page load, but I haven't seen this on my site.

Monday, 24 November 2014

Where Halifax have failed on their web site

Most of us assume that banks are very slow at doing things, so a couple of years ago when Halifax released their new web site (and disabled some of the old functionality a few months before it was available on the new site!) I expected a shiny, fresh, up-to-date site using the best of the web etc.

What I saw was not really very impressive. After having a go at Argos the other day, I only thought it fair to point out where other people have also badly failed!

Definitely the most noticeable (and largest) fail is the navigation across the site. It is ALL OVER THE PLACE. There are buttons at the left, the right, the top, the middle. There are tabs, links, text and adverts everywhere. The first rule of web design is to keep the main purpose easy and obvious, not to overwhelm the user with a thousand possibilities.


This is actually two halves of the same screen! How many options?

Let us take a step back though. There are other examples of poor design, some are just typical crappy management issues where designs go through far too many people and far too much time to be consistent. Others are just plain wrong.

1) Welcome screen

I've just come from halifax.co.uk and been bombarded with cross-sales - the bane of the financial world. It's OK, though, I am going to login to my personal account now and what do I see? Accounts/home page? Nope, another attempt to sell me something I don't want - but of course, I will see it every time I enter the site!


There is a line to draw with cross-selling. Sure, mention something in passing - "would you like a loan as well?" - but don't rub it in my face. Honestly, this is the kind of stuff that doesn't increase sales but makes your customers fed up enough to leave!

2) Security markings

I HATE the way that sites like Halifax (and Argos) have taken it upon themselves to use fake security icons to tell their users that the site is secure:




Having some text saying, "securely signed in" does not make you securely signed in. Security is a number of things, including TLS (commonly called SSL) but also data being secure in transit internally at the bank and also in storage - both on hard disks and any backups. This text means precisely nothing but worse than that is that it misleads people into thinking they are secure based on the promise of the site itself. It's a bit like a shop calling itself "Quality Produce" i.e. a conflict of interest. The ONLY time these types of badges are acceptable is when they are independently audited and checkable - and there are several available - so that a customer does not have to take your word for it.

3) Memorable Information

All secure sites have worries about logins. Passwords are generally not acceptable by themselves and, whereas some banks have issued hardware tokens to their customers, Halifax has decided to use "Memorable Information" - basically another password, of which you only enter some of the characters.


At best it is a minimal increase in security, just another 1st factor but the real rub is that it is so hard to use quickly. Scrolling through dropdown lists is not easy when they are small and you are using a mouse. Why not have boxes instead of dropdowns so I can type-tab-type-tab-type? What these are like for people with motor or sight problems I don't know.

4) The bottom of the account page


Check out this content at the bottom of the account page. How many links? How much functionality? Most of this could be pulled into other pages with simple buttons like "Search Statements" or "Export" so that the content of each can be clear and focussed. Instead, this nightmare must give people who are not confident on the web nervous shakes!

5) Sign out

This is by far the WORST part of the site from a security perspective. When you click sign out (bearing in mind that people often don't anyway), there is only one thing that should happen: wipe all the session data, delete the cookies and sign out - full stop. It shouldn't even check whether you're currently logged in, because if it does, that check could fall over and cause an error - just delete all the session data and ideally go to a home page.
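For what it's worth, a sign-out handler along these lines (a sketch, assuming ASP.NET forms authentication; the cookie names are the framework defaults) does exactly that and nothing else:

using System;
using System.Web;
using System.Web.Security;
using System.Web.SessionState;

// Sketch: sign out means wipe the session, expire the cookies and go home - no questions asked.
public class SignOutHandler : IHttpHandler, IRequiresSessionState
{
    public void ProcessRequest(HttpContext context)
    {
        FormsAuthentication.SignOut();
        context.Session.Abandon();

        // Expire the auth and session cookies on the client as well.
        ExpireCookie(context, FormsAuthentication.FormsCookieName);
        ExpireCookie(context, "ASP.NET_SessionId");

        context.Response.Redirect("~/", false);
    }

    private static void ExpireCookie(HttpContext context, string name)
    {
        var expired = new HttpCookie(name, string.Empty) { Expires = DateTime.UtcNow.AddYears(-1) };
        context.Response.Cookies.Add(expired);
    }

    public bool IsReusable { get { return true; } }
}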

What have Halifax's Directors decided? We need to sell more stuff. What happens when I press sign out? I see this (complete with advert underneath)


So I have to press Sign out again or wait for the system to sign me out in 30 seconds. Long enough to read the advert and, interestingly, plenty of time for the person behind me in the internet cafe to jump onto my computer, click to go back into my account (with no re-check of the password!) and do what they want. Granted, there are other checks against things like transferring money, but it is still very sloppy and a classic example of why I want to see more legislation about what is and isn't acceptable when balancing security, ease of use and, in this case, making money as a business.

Conclusion

Not as bad as Argos, who had loads of rendering and performance problems, but still some major fails for such a big brand. At least the language on the security information page sounds like it was written by someone who knows what they are talking about (unlike Argos!)

6 out of 10

You can't use an absolute script url in a .Net script tag when runat=server

I found something annoying but probably unsurprising about using a CDN, they are not configured to serve content in GZip format automatically, as most web servers are (or should be) configured to do.

Using a CDN meant I got rid of some YSlow warnings while picking up some new ones: "Some components are not being compressed".

It sounded like the best workaround was to upload both normal and gzipped versions of the resources to the CDN and then, on the server, dynamically alter the src attributes of the various scripts to use either the normal one or the gzipped one, depending on whether the client supports gzip (although browsers pretty much all do, some other types of clients, including some anti-virus software, do not).

This sounded easy enough but simply adding a runat=server to the script tag and referencing it in code compiles OK but at run time fails with an error: 'url' is not a valid virtual path. Of course it isn't, it's absolute because it is referencing a CDN but somewhere in the guts of System.Web.dll there is a check for a virtual path (like "~/Scripts/myscript.js") and it goes bang!

It seems like an over-zealous error check but there was a workaround to this bug:

1) Use an asp:Literal control (which renders nothing other than its text content) and set the default content between the opening and closing tags to the normal script tag pointing at the non-gzip version (or you could default it the other way - it doesn't make any difference)
2) In the Page_Load of the code behind of your page (or in my case, the Master page), check that the client supports gzip and if it does, replace the literal control text with the relevant version of the script like this:
aspnetScript.Text = "<script src='https://[your CDN]/Scripts/myscript.min.js.gz' type='text/javascript'></script>";

The literal doesn't do anything clever like escaping HTML characters so the correct effect is achieved in the page rendering.
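Putting the whole workaround together, here is a sketch (the CDN host and file names are placeholders, and the aspnetScript field would normally come from the designer file):

using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// Markup in the master page - the Literal holds the non-gzip tag by default:
//
//   <asp:Literal ID="aspnetScript" runat="server">
//     <script src="https://mycdn.example.net/Scripts/myscript.min.js" type="text/javascript"></script>
//   </asp:Literal>
//
public partial class SiteMaster : MasterPage
{
    protected Literal aspnetScript; // normally declared in the generated designer file

    protected void Page_Load(object sender, EventArgs e)
    {
        string acceptEncoding = Request.Headers["Accept-Encoding"] ?? string.Empty;

        // Swap in the pre-gzipped copy only for clients that advertise gzip support.
        if (acceptEncoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            aspnetScript.Text =
                "<script src=\"https://mycdn.example.net/Scripts/myscript.min.js.gz\" type=\"text/javascript\"></script>";
        }
    }
}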

Thursday, 20 November 2014

Azure CDN and Storage Not Suitable for Production

I've been spending some time trying to optimize loading of the pixelpin web site. There is all the usual nonsense with the "magic" MS scripts which appear automatically (although I found a handler to help this) but then I looked at using a CDN to make things better.

What is a CDN?

A Content Delivery Network really has two benefits over serving content directly from the web server. Firstly, it reduces the number of requests that hit the web server, which improves performance there; it also allows the browser to make more connections, since the CDN will exist on another domain to the web server. Secondly, the CDN caches objects at multiple locations around the world, and a call to a single URL uses geo-location information to retrieve the object from the physically closest CDN server.

As an example, imagine that static content makes up 50% of your page weight and your web server is based in London - what happens when someone in Sydney, Australia, accesses your site? All of the content has to traverse the globe and, at best, it will be slow. Now imagine you use a CDN that has a location in Australia: only the dynamic content needs to come from London, the other 50% can arrive "locally" - all good.

Why I was considering it

Over a third of our page weight is in the form of 4 font files and these are obviously not going to be changed any time soon. It makes sense to move these to a CDN and I tried a single image as well for comparison. Since we have stuff hosted on Azure, it seemed to make sense to use Azure CDN (although it doesn't really matter and I might try Amazon for comparison).

How did I do it?

Azure allows you to create a CDN endpoint linked either to your storage account or to a folder under your site. I tried the site one first but I can't remember now why it wouldn't work. I then created an endpoint that pointed to my storage account.

One thing that is not obvious is that there are two settings which are not exposed on creation and are not obviously present on the CDN page - but you can click on the endpoint to go into a settings page. The first is that https is available but disabled by default; I needed this because my page is https. The other option allows query strings, which lets you cache-bust items that you need to reload - again, this is disabled by default.

Once this is done, you have to upload items to the public container you are using. Bear in mind that any public container in the storage account will be exposed, so I created a single "asset" container, made it public and uploaded all my web fonts and the single png image. Note that you can't upload in the portal (for some reason), so I used Azure Storage Explorer to upload the items. The version I used didn't correctly set the mimetype, so I had to edit the blobs afterwards and set them correctly.

I waited a while and tried the CDN endpoint with the image and all seemed to be good.

What went wrong?

I changed my site to link to the fonts on the CDN and deployed it to Azure. It didn't work. The first thing I noticed was that Firefox had downloaded two types of font (woff and ttf) but had not applied either. I didn't twig at first, but a quick question to the office and I learned that by default Firefox, IE and others will not load fonts from a different domain - you have to specify the allowed domain in the font response to say, in effect, "I allow this font to be used on this domain". This uses something called CORS and a response header, "Access-Control-Allow-Origin", which specifies the allowed origin. OK, that sounded fine and it looked like Azure supports this, so I ran some code in the project (it is not possible via the portal, which is pretty rubbish) that would supposedly add the header to all responses from blob storage.
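For reference, setting a CORS rule on blob storage with the storage client library looks roughly like this (a sketch; the connection string is a placeholder and, as described below, it didn't actually fix things for me):

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Shared.Protocol;

// Sketch: add a CORS rule to the blob service so font responses should carry
// Access-Control-Allow-Origin for the given origin.
public static class BlobCorsSetup
{
    public static void AllowFontOrigin()
    {
        var account = CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...");
        var blobClient = account.CreateCloudBlobClient();

        ServiceProperties properties = blobClient.GetServiceProperties();
        properties.Cors.CorsRules.Clear();
        properties.Cors.CorsRules.Add(new CorsRule
        {
            AllowedOrigins = new List<string> { "https://login.pixelpin.co.uk" },
            AllowedMethods = CorsHttpMethods.Get,
            AllowedHeaders = new List<string> { "*" },
            ExposedHeaders = new List<string> { "*" },
            MaxAgeInSeconds = 3600
        });

        blobClient.SetServiceProperties(properties);
    }
}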

It didn't work. The code ran but the response headers did not change - even after deleting and re-uploading the files. I read that the CDN strips out these headers but as it turns out, even accessing the item directly in blob storage still doesn't return the header.

OK, so that was fonts out the window but I noticed something else that wasn't working correctly. The caching.

The images seemed to cache correctly and refreshing from the CDN would generate a 304 (not modified), which was great. Trying this with fonts however never worked, the response was always 200 and the body returned. Since the fonts were the largest of the items I needed to cache, if it didn't work on that, it was another show-stopper.

What I noticed was the strangeness of the ETag. This header is a unique key that identifies a version of a resource, which the client can send back when checking whether it has changed. If the server knows that the item previously sent to the client and the current one have the same ETag, it can send back a 304. The problem is that the ETag MUST be quoted according to the HTTP spec. The reason escapes me, and some browsers don't care, but others do. Why is this a problem? Because it is virtually impossible to tell whether it is quoted or not:

  1. In the case of Azure blob storage, the ETag is added automatically.
  2. You cannot modify it.
  3. In Azure Explorer, the ETag is displayed in the list view in quotes, but get the blob properties and it shows as unquoted (and can't be changed).
  4. Edit the blob properties in the Azure portal and it shows as quoted (but can't be changed).
  5. Look at it in Firebug for Firefox and it shows quoted - but so do all the other response header values - not helpful!
  6. Look at the request in Fiddler and it is NOT quoted, and the caching tab complains about it.
So despite it looking correct in the config (mostly), it is not returned with the quotes. Is that anything to do with why the caching isn't working? Not sure, but it doesn't help trying to debug stuff.

Conclusion - tldr

CDN - possibly stripping out CORS headers - breaks fonts
CORS headers don't seem to work anyway
Caching not working consistently
Not a production ready setup and I won't use it for that reason.

Thursday, 13 November 2014

Some of my useful Github Projects

I don't use Github massively often but sometimes I have decided to upload something that can hopefully help people out a bit. A couple I want to mention are:

https://github.com/lukos/COM.PixelPin.Utilities.SChannel - A project to configure Windows SChannel (for SSL/TLS) using Powershell either from code or from the command-line.

https://github.com/lukos/CspPolicyBuilder - A .Net class that enables you to easily deploy a Content Security Policy to your .Net web site. It's pretty basic really but just makes creating the policy a little less manual than usual.

https://github.com/lukos/SQLDatabaseBackup - A useful utility for backing up SQL Azure databases to blobs and then downloading them locally to keep in cheap storage that you probably have.

Wednesday, 12 November 2014

Where Argos have failed on their 'new' web site

I can't remember when the new Argos site was released - I think it has been six months or so - but for various reasons it was slated and, sadly, there are still many parts of it that are severely lacking. Really, the main issue is that people need to trust the web sites they use. Poorly rendered pages, misleading information and wording that sounds like Argos don't really understand what they are doing give the whole site a fairly amateur feel and don't leave me feeling particularly safe on their site.

First, let's get some of the basics out of the way. Last night, I was given a "sorry" page since it said, "there were an excessive number of people visiting the site" and therefore some rate-limiting was in place. Unless this was some kind of Denial of Service attack, there is no excuse for a large online retailer to ever have to do this. It's like locking your shop door for no good reason. The system should be able to scale to very large numbers of customers and if it is designed properly, that is actually very easy to do.

Even today, I went to order something and various bugs occurred, meaning I needed to start again. Not the end of the world but pretty poor.

Secondly, design and layout are incredibly important, again to make people trust the quality of your brand but also to make me trust that the people who made the site knew what they were doing. The overall design is so bland that there should be no room for mistakes - it is hardly at the cutting edge of complicated design. Some of the images below show various parts of the site that have not rendered properly. The fact these bugs still exist is testimony either to a lack of testing or to the lack of any system for getting them fixed.

Let us now turn our attention to the search and/or the population of the "Popular" lists in the menu. I chose (randomly) Sport and Leisure -> Popular -> Car Covers and guess what I saw? This:

How very trendy! A page that tries to be cool by saying, "Hmmm" but even worse, it's telling me that my search term was not correct. There should be nothing in the menu that returns no results and if that is not possible to fix, the page should be less "Look what you did wrong" and more "Looks like there isn't anything here any more..."

So then, I turned my attention to the "Secured by Argos" badge. I feel REALLY strongly about companies doing this (and Argos aren't the only ones). The point of a badge is to represent something official, something audited or awarded, not just the fact that Argos want their site to be secure. This badge means nothing at best, but at worst it misleads customers into believing that it does mean something. Things like "Verified by Visa" are OK in that they are external organisations that can describe exactly what they do to protect you, although they are fairly weak in terms of real protection for a web site. Argos should really have something much more meaningful, like one of the Safe Site seals you can buy from various security companies.

But it continues. Let us click through and see the (poorly rendered) Secure Online Shopping page. Interestingly, Argos don't seem to use page titles correctly, which makes their site difficult for Google and other search engines to index properly. If you click on Nike, for instance, it's actually a search and the title of the page you go to is "Results for NIKE" - not terrible, but sloppy. Anyway...

"A padlock will be visible in the bottom right of your web browser, this means you are in secure mode – this should always be when entering personal or payment details."

NO, NO, NO. A padlock that the web site has drawn in the bottom right of the screen means exactly nothing! It is misleading and it is wrong. The icon for security is the padlock in the browser URL bar, which means the TLS connection is in place and verified. You might as well say that a picture of a pink banana means you are in secure mode. It is a non-standard, non-transferrable piece of information that does not increase web security. For goodness sake, teach people about the actual technology, not just your intention to be secure.

"Being in secure mode means all your details are encrypted for increased security (256 bit)"

Not if your TLS is not correctly set up it doesn't. It also only refers to details in transit, not the data stored in your databases, which may or may not be encrypted. Again, it sounds like you are trying to be user friendly but are being misleading in the process.

"We use advanced encryption so that your card details cannot be seen"

Similar to above, this is a half-truth.

"We use the Secure Socket Layer (SSL) for all orders placed – this prevents you from inadvertently revealing any personal information"

You actually use TLS, not SSL. TLS does not prevent you revealing any personal information at all. There are lots of ways that can occur and only one of these (transmission) is protected by TLS.

"We use your card details to process your orders and we will ask for them for every order placed"

This is not clear. Are you saying that you do not STORE them, which is why you ask for them each time? If so, say that, otherwise it is unclear why this is a good thing.

"The three digit card signature code will be required to place an order – this ensures that your card cannot be misused"

Really? How? Again, it sounds like you are implying something that is not clear in the statement. It also does not prevent, for instance, your child from using your card to order something, in which case it doesn't ensure that your card cannot be misused.

The recommendation block is about the only thing that is almost true.

Moving down to the browser recommendations, I'm not sure which version of Firefox you are talking about because under my security tab I don't have any section for warning messages. Also, your definition of "old browser" should really read "ancient". There is no reason for anyone to be using a browser older than IE8, which is available on Windows XP. IE5? That was superseded in 2001, and even IE6 is woefully insecure for modern browsing. This whole thing smells of a quick and dirty attempt to put a security page together rather than an informed, well-thought-out page. Sadly, that is par for the course on most web sites.

It is unclear why you are only offering a link to Internet Explorer, especially since not all operating systems can download and use it. Why not offer a browser choice link instead, so people can choose the browser they want to use? A site like http://www.browserchoice.eu/browserchoice/browserchoice_en.htm

Conclusion?

If this site was built by a small company with limited expertise, most of the issues wouldn't surprise me too much, although I would definitely expect the usability issues to have been fixed by now. The performance problems are only excusable if you don't have the money to scale up the web server farm. For a large company like Argos, however, most of these mistakes should never have been made in the first place.

Misleading language and non-standard security guidance is unforgivable. The fact that your presumed expertise has not been able to resolve the technical and layout problems can only lead me to deduce that either the Argos management have had too many fingers in the design process and have ended up producing a very poor product (this is very common in the software world), or the whole thing has simply been managed so badly that the result is exactly what you would expect: a site that is just about usable but which does not fill the customer with trust.

5 out of 10

Monday, 10 November 2014

Why shouldn't you use SELECT * FROM TABLE

Software development, like database development, is a strange discipline. They usually start teaching you simple things like SELECT * and then spend the next 10 years telling you not to use it. Likewise, in software engineering you get taught quick-and-dirty coding and then get told to code properly!

I'm sure this is to try and get people productive more quickly but it doesn't serve the industry well. A lot of people don't even know why they shouldn't use SELECT *.

Why you WOULD use it: The only reason you would use SELECT * FROM TABLE is because it is quick (to write) and easy. (You might use it if your database table is dynamically modified but this is not common)

There are two main reasons why you SHOULD NOT use SELECT * FROM TABLE.

ONE reason, which might or might not be significant, is that pulling the data from all the columns of a table when you are not using them wastes network and disk bandwidth. This doesn't just mean the 'phone line', which might easily support the bandwidth: on a larger system there might be several (or more) web servers all using a single database server via a single network card and one or more network switches. This card is probably also transmitting lots of other network data, so hitting it with unnecessary data is bad in principle and does not scale.

The SECOND, more important reason is related to the way that databases index data to improve performance. Imagine you have a table with 50 columns and 10 million rows and you use SELECT * FROM ... WHERE even though you only need 5 of those columns in your result set. The entire table has to be scanned (possibly in blocks), a load of row or table locks might be required while this is happening, and only after the whole table has been scanned can the result set be returned. Some tables are several GB in size - this is not a good option. By adding an index that contains only the 5 columns you are using (including the column(s) in the WHERE clause), only the index needs to be loaded into memory, which is probably considerably smaller and therefore faster; and if the index is ordered by the column(s) in the WHERE clause, the database might be able to seek directly into the index rather than scanning the whole thing.

The differences here can literally be hundreds or thousands of times quicker than simply table scanning but this optimisation cannot happen if you use SELECT *.
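As a small illustration (the table, columns and connection string are invented), selecting named columns from code so that a narrow index can do the work, and aliasing them so the calling code reads clearly:

using System.Data.SqlClient;

// Sketch: ask for exactly the columns you need, by name, and alias them sensibly.
public class CustomerSummary
{
    public int Id;
    public string Name;
    public string Email;
}

public static class CustomerQueries
{
    public static CustomerSummary GetSummary(string connectionString, int customerId)
    {
        const string sql =
            "SELECT Id, FullName AS Name, EmailAddress AS Email " +
            "FROM Customers WHERE Id = @id";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@id", customerId);
            connection.Open();

            using (SqlDataReader reader = command.ExecuteReader())
            {
                if (!reader.Read()) return null;

                // A narrow index on Id that includes FullName and EmailAddress can satisfy
                // this query without touching the rest of the table.
                return new CustomerSummary
                {
                    Id = reader.GetInt32(reader.GetOrdinal("Id")),
                    Name = reader.GetString(reader.GetOrdinal("Name")),
                    Email = reader.GetString(reader.GetOrdinal("Email"))
                };
            }
        }
    }
}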

A fair question is, "what if my table is really small, why bother?". Firstly, it is not a lot of extra work to select only the columns you need and, secondly, it is a good habit to get into. There is no obvious line between when it will improve performance and when it won't, but selecting specific columns will always be the same speed or quicker than SELECT *.

"What if I am selecting the same number of columns as my table has - I might as well use SELECT *, right?". The answer is the same as above, but there are other advantages to using named columns even when you are selecting all of them. Firstly, if you use SELECT * and the table later has columns added, your statement will slow down; unless you test every statement after the modification (I'll assume you don't), you won't know what you might have made unusable by modifying the table. Another advantage is much clearer visibility of what you are selecting, plus the ability to rename the columns you are selecting so their names are more meaningful to the code that calls the SELECT statement. If you are joining two tables that both have a rowid, SELECT * gives you duplicate column names anyway, but what if one is called rowid and the other row_id? That is hidden inside SELECT * and could lead to coding errors if the wrong one is accidentally used.

Another question is, "Why not use indexes for everything then?". There is a trade off if the number of indexes starts getting high. What needs to happen is that when you insert or update, some or all of the indexes will need to be updated as well as the table itself. This performance hit will obviously vary depending on the size of the indexes, how much they are fragmented and how many of them there are as well as which columns are indexed and where the new data is added/updated. Generally speaking, data that is massively read-only and only occasionally modified suits indexing very well. The more updates/inserts you do, the more impact will be noted. You might easily have 10 or more indexes on a table with no real performance problems.

To be honest, if you get to the point where your indexes are starting to slow your system down, you are probably in the area where you need to employ some experienced DBAs to look at your specific requirements and suggest either a design change or potentially even some advanced techniques like sharding and replication.

You don't need an index per select statement. If you have two selects, one that selects columns A, B and C and another that selects A, B, C and D, then you could simply have a single index on all four columns, which both select statements would automatically use.

Another gotcha is two select statements where one selects A, B, C and D and the other selects C, D, E and F. One index or two? It depends on the size of the unwanted columns in the index as to whether these could sensibly become a single index. If E and F are large VARCHAR or TEXT columns, then there is not necessarily much performance benefit to the first SELECT in sharing an index. If, however, all six columns are fairly small types, then a single index might be acceptable.

One thing I've learned about databases is that they don't always behave as expected so test it! There are various tools that can create dummy data for a database table to give you, say, 10 million rows, which you can then experiment with. The closer to the real usage you can get, the better, but any reasonable testing is better than nothing.

One example of an optimisation I managed on a database was decreasing the time to run a "Lookup address" stored procedure from 25-30 seconds down to less than 1/4 of a second! (clearly it should never have taken 30 seconds, if it was written properly but hey!)

Tuesday, 4 November 2014

Tightening up your Azure Cloud Service SSL Settings

The Background

A lot has happened in the world of SSL in the last 12 months and even though our site started with a nice A grade on the wonderful SSL Labs test site, it dipped recently to a paltry C!

Firstly, when Heartbleed happened, we Microsofties breathed a sigh of relief that it didn't affect Windows' schannel.dll, but two other attacks were of concern: BEAST and the more recent POODLE.

The BEAST attack exploited a vulnerability in the CBC mode used by a lot of SSL ciphers, and even though RC4 didn't have that vulnerability, there were concerns that other weaknesses in RC4 might mean it is still better to use CBC. Fixes were then applied to various components by the likes of MS and Apple, and although TLS 1.1 and 1.2 are not prone to BEAST, they are also less widely adopted by browsers!

Anyway, BEAST has largely died away, but POODLE is an attack against SSL v3 and is a reasonably cheap attack to make (256 requests = 1 byte of information revealed). Although most browsers support TLS 1.0 (the next version up from SSLv3), since many servers still support SSLv3 from the olden days, not only is the weakness still present for users of older browsers (IE8 on Windows XP, for example), but it also allows a downgrade attack, where a man-in-the-middle could negotiate the use of SSLv3 between a newer browser and the server and then (potentially) record and decrypt traffic. The only way to avoid this is to completely disable SSLv3, and that is what SSL Labs is recommending and what Google is planning to do in newer versions of Chrome.

The bottom line is that people with old browsers will need to upgrade their browser and possibly their OS to continue using these secured sites but also, if there are old servers that still support SSLv3 and older ciphers, it is about time they updated and upgraded.

The Plan

So the plan is simple. Disable all versions of SSL in my Azure Cloud Service, disable older, weaker ciphers such as DES, 3DES, RC2 and RC4, and disable the soon-to-be-dead MD5 hashing algorithm. This leaves me with AES encryption (128 and 256 bit), the CBC and GCM modes and three sizes of the SHA hashing algorithm. According to https://wiki.mozilla.org/Security/Server_Side_TLS this should be enough to support a medium range of browsers - i.e. not just the latest and greatest, but browsers back to IE8 on Windows 7, for instance. I have since tested the setup on SSL Labs and it seems to be happy.

The Solution

How to do it? Well, you can run a startup script on your Azure cloud service but the problem is that if you reboot at the end (which you need to do for the settings to take effect), then when the role restarts, it will run the script again and reboot, which means you need some kind of flag to control how it reboots (only do it once). Fortunately, I found a very helpful script on the MS site that basically does this (although not everything I needed to do). For reference, the original is here.

Their trick is that the script checks registry values before setting them and only reboots if they have changed. This way, when the role restarts and runs the script again, nothing will have changed and the role won't reboot. The problem is that their script only disables SSLv3; it does nothing about removing weak ciphers or hashes. Still, I had the bones of what I needed in their script.

I tried simply adding some more registry entries for ciphers that need to be disabled (like "RC4 40/128"), which is when I found out why this was not very easy to do! The existing script uses the generic new-item function to create the registry keys but Powershell, being very helpful, changes the forward slash in the name to a path character, effectively attempting to create RC4 40 and then a sub key of 128 - it fails!

What I had to do was rework it to use actual RegistryKey objects so I could call things like "CreateSubKey()" which don't change the slash to a path character.

The script is below but some notes are in order!
  1. The script is largely based on the MS one but with some additions
  2. I have not used the original suggested cipher suites (although in the original script, this wasn't used anyway) but have used the Mozilla wiki to stick to ECDHE where possible.
  3. I have included some cipher suites which are not enabled by default (the ones with the curve number on the end like P256). However, I don't know how to enable these so they don't appear in the list of supported ciphers on the SSL labs test, if anyone knows, let me know.
  4. By all means modify the script if you want to keep some of the ciphers I have removed
  5. If you save the script as ciphers.ps1 file, make sure its properties say to copy it to the output and you can call it from the startup.cmd file just with PowerShell -ExecutionPolicy Unrestricted .\ciphers.ps1 >> "%TEMP%\StartupLog.txt" 2>&1
# from http://azure.microsoft.com/blog/2014/10/19/how-to-disable-ssl-3-0-in-azure-websites-roles-and-virtual-machines/


$nl = [Environment]::NewLine
$regkeys = @(
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.1",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.1\Client",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.1\Server",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Client",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Server",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 3.0",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 3.0\Client",
"HKLM:\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 3.0\Server",
"HKLM:\SOFTWARE\Policies\Microsoft\Cryptography\Configuration\SSL\00010002"
)
# Cipher order as per the original script
# $cipherorder = "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256,"
# $cipherorder += "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P384,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P256,"
# $cipherorder += "TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,"
# $cipherorder += "TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,"
# $cipherorder += "TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_RC4_128_MD5"

# Cipher order as per Mozilla: https://wiki.mozilla.org/Security/Server_Side_TLS (Intermediate set - as mapped to Windows names)
$cipherorder = "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256_P521,"
$cipherorder += "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384_P521,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P521,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P384,"
$cipherorder += "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256_P521,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P521,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_P521,"
$cipherorder += "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P521,"
$cipherorder += "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384_P521,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P521,"
$cipherorder += "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA_P521,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,"
$cipherorder += "TLS_RSA_WITH_AES_256_CBC_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA"


# If any settings are changed, this will change to $True and the server will reboot
$reboot = $False

Function Set-CryptoSetting {
  param (
    $keyindex,
    $value,
    $valuedata,
    $valuetype,
    $restart
  )

  # Check for existence of registry key, and create if it does not exist
  If (!(Test-Path -Path $regkeys[$keyindex])) {
 Write-Host "Creating key: " + $regkeys[$keyindex] + "$nl"
    New-Item $regkeys[$keyindex] | Out-Null
  }

  # Get data of registry value, or null if it does not exist
  $val = (Get-ItemProperty -Path $regkeys[$keyindex] -Name $value -ErrorAction SilentlyContinue).$value

  If ($val -eq $null) {
    # Value does not exist - create and set to desired value
 Write-Host "Value " + $regkeys[$keyindex] + "\$value does not exist, creating...$nl"
    New-ItemProperty -Path $regkeys[$keyindex] -Name $value -Value $valuedata -PropertyType $valuetype | Out-Null
    $restart = $True
  } Else {
    # Value does exist - if not equal to desired value, change it
    If ($val -ne $valuedata) {
  Write-Host "Value " + $regkeys[$keyindex] + "\$value not correct, setting it$nl"
  Set-ItemProperty -Path $regkeys[$keyindex] -Name $value -Value $valuedata
  $restart = $True
    }
 Else
 {
  Write-Host "Value " + $regkeys[$keyindex] + "\$value already set correctly$nl"
 }
  }
  return $restart
}

# Special function that can handle keys that have a forward slash in them. Powershell changes the forward slash
# to a backslash in any function that takes a path.
Function Set-CryptoKey {
 param (
  $parent,
  $childkey,
  $value,
  $valuedata,
  $valuetype,
  $restart
 )

 $child = $parent.OpenSubKey($childkey, $true);

 If ($child -eq $null) {
  # Need to create child key
  $child = $parent.CreateSubKey($childkey);
 }

 # Get data of registry value, or null if it does not exist
   $val = $child.GetValue($value);

   If ($val -eq $null) {
  # Value does not exist - create and set to desired value
  Write-Host "Value $child\$value does not exist, creating...$nl"
  $child.SetValue($value, $valuedata, $valuetype);
  $restart = $True
   } Else {
  # Value does exist - if not equal to desired value, change it
  If ($val -ne $valuedata) {
   Write-Host "Value $child\$value not correct, setting it$nl"
   $child.SetValue($value, $valuedata, $valuetype);
   $restart = $True
  }
  Else
  {
   Write-Host "Value $child\$value already set correctly$nl"
  }
   }

 return $restart
}

# Check for existence of parent registry keys (SSL 2.0 and SSL 3.0), and create if they do not exist
For ($i = 9; $i -le 12; $i = $i + 3) {
  If (!(Test-Path -Path $regkeys[$i])) {
    New-Item $regkeys[$i] | Out-Null
  }
}

# Ensure SSL 2.0 disabled for client
$reboot = Set-CryptoSetting 10 DisabledByDefault 1 DWord $reboot

# Ensure SSL 2.0 disabled for server
$reboot = Set-CryptoSetting 11 Enabled 0 DWord $reboot

# Ensure SSL 3.0 disabled for client
$reboot = Set-CryptoSetting 13 DisabledByDefault 1 DWord $reboot

# Ensure SSL 3.0 disabled for server
$reboot = Set-CryptoSetting 14 Enabled 0 DWord $reboot

# Set cipher priority
$reboot = Set-CryptoSetting 15 Functions $cipherorder String $reboot

# We have to do something special with these keys if they contain a forward-slash since
# Powershell converts the forward slash to a backslash and it screws up the creation of the key!
#
# Just create these parent level keys first
$cipherskey = (get-item HKLM:\).OpenSubKey("SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers",$true)
If ($cipherskey -eq $null) {
 $cipherskey = (get-item HKLM:\).CreateSubKey("SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers")
}

$hasheskey = (get-item HKLM:\).OpenSubKey("SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Hashes",$true)
 If ($hasheskey -eq $null) {
 $hasheskey = (get-item HKLM:\).CreateSubKey("SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Hashes")
}

# Then add sub keys using a different function
# Disable RC4, DES, EXPORT, eNULL, aNULL, PSK and aECDH
$reboot = Set-CryptoKey $cipherskey "RC4 128/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "Triple DES 168" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC2 128/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC4 64/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC4 56/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC2 56/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "DES 56" Enabled 0 DWord $reboot  # It's not clear whether the key is DES 56 or DES 56/56
$reboot = Set-CryptoKey $cipherskey "DES 56/56" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC4 40/128" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $cipherskey "RC2 40/128" Enabled 0 DWord $reboot

# Disable MD5, enable SHA (which should be by default)
$reboot = Set-CryptoKey $hasheskey "MD5" Enabled 0 DWord $reboot
$reboot = Set-CryptoKey $hasheskey "SHA" Enabled 0xFFFFFFFF DWord $reboot

$cipherskey.Close();
$hasheskey.Close();


# If any settings were changed, reboot
If ($reboot) {
  Write-Host "Rebooting now..."
  shutdown.exe /r /t 5 /c "Crypto settings changed" /f /d p:2:4
}

Monday, 3 November 2014

Newbie Web Programmers - what you HAVE to know about security

Clearly, programming is a journey of experience. Having lots of experience does not mean you stop making mistakes but, hopefully, they become fewer and inject less risk. The problem for new programmers, and for the general public who consume the output of these programmers, is that there is no level, no bar, that has to be reached before a programmer can publish stuff on the web. So even for web sites used by a good number of people, it is quite possible that the data on those sites is completely insecure, even against an entry-level hacker.

What do we do? I would like to hope that most programming courses would cover security, although whether they all do it properly is anyone's guess. Also, many people either despise the idea of spending money to improve their own skills or perhaps they don't have any access to structured training - and even if this training exists, how do we find out about it? How do we know if it is any good? We only know once we have learned enough to look back on it.

I think, however, there are a handful of basic principles that a new programmer needs to understand in order to produce something that should be reasonably secure from the beginning. This is my attempt at setting those principles:
  1. A web application is hosted on a server. Servers can have vulnerabilities in various ways, such as services running on other ports, weak web server configuration and vulnerabilities in software modules like the one we saw in OpenSSL. If you are not experienced in setting up servers securely, do NOT do it. Use a hosting/cloud company who can manage all of that for you and allow you just to upload your web application. Sure, it costs some money, but it is reasonably cheap and for any site you create, you will still need to pay for the relevant DNS/IP addresses etc. anyway.
  2. Use a modern well-featured framework. Do not think that writing applications in raw HTML  or raw PHP etc. is a useful learning tool. If you want to play around with stuff, by all means write files that you test locally but always use a framework for published apps. There are several for many languages and they provide a lot of security by default. Learn how to do things properly in the frameworks to ensure you keep this security.
  3. As soon as you need to deal with any user information, you MUST learn about data security, hashing and encryption choices. These services are available in most frameworks, but, you must use them correctly to ensure you do not compromise the data security. NEVER invent your own security protocols/encryption/hashing mechanisms unless you are working somewhere that does that kind of thing. If you can think of it and someone hasn't done it yet, it's probably because it wouldn't work.
  4. Database injection is a very common weakness in poorly written web applications. You absolutely must NOT trust any data received from the client (browser) and you must NOT rely on browser validation of this data. All input MUST be validated on the server in some way and in some cases, this data can arrive in multiple ways (GET, POST, query string etc.) so sometimes the good data can be overwritten with bad data. Validate input using whitelists where possible (field x can only be 1-9 or field y must be an email address). Use regular expressions to validate complex data and where you accept free-form data, you MUST encode the data before storing it (to avoid injection attacks) and ensure it is suitably encoded in the browser for the same reasons.
  5. If you need to use passwords, you instantly create a honey pot for an attacker, who will assume those same passwords are used on other sites. Best practices include giving the user an indication of password strength, having some basic minimum requirements, checking the password against known common/bad passwords (like Password123), using slow hash algorithms such as PBKDF2 or bcrypt, using a per-user salt to slow down attackers and considering symmetric encryption of other personal data (there is a hashing sketch after this list). Imagine someone stole your users table from the database - would you be comfortable with the data they could see or easily crack?
  6. I highly recommend spending some time learning about web application security. If you won't pay money for formal training, then you will need to read lots of web articles (make sure they are newer ~ last 3-4 years) before you can make sense of things. owasp.org is a great resource for all things web-security related although the site can seem a little unfriendly. Read and learn about the owasp Top 10, that is a great place to start to get the feel for what is important.
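To make point 5 concrete, here is a minimal sketch of PBKDF2 hashing with a per-user salt using the framework's Rfc2898DeriveBytes (the iteration count and storage format are illustrative):

using System;
using System.Security.Cryptography;

// Sketch: hash passwords with PBKDF2 and a random per-user salt; never store them in plain text.
public static class PasswordHasher
{
    private const int SaltSize = 16;      // bytes
    private const int HashSize = 32;      // bytes
    private const int Iterations = 10000; // tune for your hardware

    public static string Hash(string password)
    {
        using (var pbkdf2 = new Rfc2898DeriveBytes(password, SaltSize, Iterations))
        {
            byte[] salt = pbkdf2.Salt; // randomly generated per user
            byte[] hash = pbkdf2.GetBytes(HashSize);
            return Convert.ToBase64String(salt) + ":" + Convert.ToBase64String(hash);
        }
    }

    public static bool Verify(string password, string stored)
    {
        string[] parts = stored.Split(':');
        byte[] salt = Convert.FromBase64String(parts[0]);
        byte[] expected = Convert.FromBase64String(parts[1]);

        using (var pbkdf2 = new Rfc2898DeriveBytes(password, salt, Iterations))
        {
            byte[] actual = pbkdf2.GetBytes(HashSize);
            // A constant-time comparison is preferable in real code.
            return Convert.ToBase64String(actual) == Convert.ToBase64String(expected);
        }
    }
}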
Never write things like I saw on Stack Overflow the other day:

var query = "SELECT * FROM CATEGORIES WHERE ID=" + catField.Text + ";";

If you ever do that, you deserve to go to prison!
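For contrast, the same lookup done safely with validation and a parameterised query (a sketch; the NAME column and connection string are placeholders):

using System.Data.SqlClient;

// Sketch: validate the input, then pass it as a parameter - user data never becomes SQL.
public static class CategoryQueries
{
    public static object GetCategoryName(string connectionString, string catFieldText)
    {
        int categoryId;
        if (!int.TryParse(catFieldText, out categoryId))
            return null; // whitelist validation: the id must be numeric

        const string sql = "SELECT NAME FROM CATEGORIES WHERE ID = @id";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@id", categoryId);
            connection.Open();
            return command.ExecuteScalar();
        }
    }
}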