Monday, 28 April 2014

What Developers could learn from the Airlines

I have been getting really into the Air Crash Investigation programs on TV. I like the technical detail and the suspense of an investigation, followed by a conclusion which often points at a chain of errors, many of which are not serious by themselves but which, when added together, equal a crash and possibly many deaths.

Most of us wouldn't stop to question the importance of airline safety. Being several thousand metres in the air is not a natural position for humans, and anything that can make a 100-tonne airplane lose control is probably going to end badly. How do airlines maintain this safety? With two things that I think could save much egg on Developer faces: 1) checklists and 2) a continuous learning experience.

Checklists are the bane of most Developers' lives. However experienced we are, we are almost pathologically opposed to paperwork and process. "I don't need a checklist, I know how to code". This is because we misunderstand the point of process and are often not taught its place, especially if we are self-taught, in which case we probably wouldn't even have considered it.

Looking again at a pilot's experience, checklists are used for all manner of things: pre-start checklist, start checklist, post-start checklist, taxi checklist, before takeoff, after takeoff, climb etc. Now, it is very often the case that a pilot is a pilot for their entire career. In other words, many of the people flying your planes are very experienced, possibly accruing more than 20,000 hours of flying (that would be 833 continuous days!), but they still use checklists. A Captain doesn't argue with their boss that their experience negates the need for checklists. Checklists are different for each aircraft and they are updated as issues or risks appear, so it is in everyone's interest that they are followed to the letter.

Perhaps we just don't take our craft seriously enough. So a web site breaks, who cares? We can fix it quickly, worst-case is a little inconvenience, a minor loss of face. It is unlikely that anyone is going to die. The problem with that view, and it appears to be very common from the stories I read online, is that it pervades the whole discipline of development. If I don't care that much, I won't care that much about security or usability or data protection or the fact that people might be relying on my service and it should work properly!

A checklist is a very simple beast that mitigates the fact that people forget, have a bad day or are tired. It also mitigates the fact that information changes and best practices update. It seems rather foolish to wait until we personally make a mistake before we make allowances for it, rather than allowing the industry to experience a problem in one place and letting everyone else learn from it.

This brings me onto the second point. The airlines have a very established continuous learning methodology. The air transportation safety departments around the world have an established (and thankfully unchallenged) idea that an issue should only have to happen once for changes to be made. "What went wrong, why, and how can we avoid it in the future?" The industry can often make quite large demands on aircraft manufacturers or operators to retro-fit a new device or update a part to take into account something that went wrong and caused a crash or even a near-miss. In Software, I would suggest our corporate ability to do the same is lamentably bad. How many sites have lost data due to a poorly set username/password combo? 100s? 1000s? How many sites have used a very poor hash to store passwords and effectively given them to an attacker for free? 100s? 1000s? More? One of the problems with the world of software engineering is that there is no coherence. No mandatory training, no worldwide certification that says, "I can write software correctly". No laws anywhere, as far as I know, even require a software process. How can you learn from someone else's mistake when you have no process to modify accordingly?

The answer is not straightforward. Developers are funny beasts. Some are extremely quiet, shy and conformist, others are wild crazy hippies who can't do what they're told. How do you find agreement across an industry that takes 10 years even to approve HTML5 (and it still didn't really end up nailed down)? What would the ultimate intention even be? Software, more than any other commodity, is cross-border and will only adhere to local laws, if there are any. If you get a high price from a Western country with lots of regulation, you can easily buy the same software from a country with less regulation and therefore fewer overheads. Naturally, this would be like buying cheap knock-off aircraft parts from eBay, but again, the risks of poor software engineering are still not really appreciated, so a customer will see a company that looks good on the web and order something, even if it's poorly written and insecure.

Schemes like ISO27001 are designed to demonstrate a quality management process for information security but, like many ISO documents, they aim to be all things to all people and so become the lowest common denominator. In document terms that means very woolly, abstract documents which can be ticked off with a process whether or not the process is actually much good at producing quality. The same was true of the older ISO9001, which I saw several companies achieve by box ticking (and spending lots of money) without the culture of the company being very quality-oriented. Sadly, they set out to achieve something that is of limited use at best, and in a way that is all but out of reach for many smaller companies.

What we really need is something specifically geared towards software development. An ISO that can be recognised worldwide but is more rooted in specifics. Perhaps there would be different documents for web applications than desktop applications but the intention would be the same. "You will have a development process", "Your process is reviewed regularly and updated in line with mistakes, both internal and external", "You have a code review process that ensures at minimum the following issues, where relevant to your code, are checked". Since I believe in continuous iteration, we could start with a very minimal implementation and increase it to include more factors as people use it and get used to it. To make it usable for smaller companies, you could self-certify compliance, which would demonstrate willing and, with a suitable insurance policy, could help customers know that a supplier is worth using. If you are self-certified, perhaps a larger company buying your product could pay to have an auditor sign off on your certification.

As with any idea like this, though, who starts it? Who would people listen to? Which big-name, famous, respected person could say to everyone, "Let's do this and make it work"? The answer is, I don't know. Sigh.

Tuesday, 22 April 2014

Stray carriage return/line feed causes hotmail to mangle hyperlink

Headline: Don't put carriage return/linefeeds/newlines in your href attributes!

The Details


Our system sends out HTML emails to users with a few images and some links. Imagine my surprise when one of these emails had a link that appeared to be broken by Hotmail, with no obvious Google search results to explain what had happened.

The test link in the email was sent as href='https://127.0.0.1:446/something' and this was also the display text. When the email arrived, the link looked normal in the message body.

But when hovering over the link, the status bar showed: https://%20%20https://127.0.0.1:446/something which naturally was a broken link. The %20 is an encoded space character, which was strange, especially as the display text looked correct. On closer examination of the email source, I noticed a lot of =0D=0A sequences, which are encoded carriage return/line feed pairs. I expected a few of these since the HTML is built up in code using a StringBuilder and various calls to Append and AppendLine. It then became apparent that we were using AppendLine while generating each part of this email link, which produced something like:

<a href='
https://127.0.0.1:446/something
?querystring
'>
https://127.0.0.1:446/something
?querystring
</a>

This is possibly invalid HTML and certainly unusual. Since CR/LF are ignored in display text in HTML, the text was displayed correctly, but the whitespace in the href attribute obviously confused something and caused the link to be mangled.

The moral? Generate your links properly!
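
As a rough illustration of generating the link "properly", here is a minimal sketch in Python rather than the C# StringBuilder code the system actually uses. The function name build_link is hypothetical; the idea is simply to strip stray CR/LF from each URL part and emit the whole anchor on one line. In StringBuilder terms, the equivalent would be using Append rather than AppendLine for the pieces of the link.

```python
from html import escape


def build_link(base_url: str, query: str) -> str:
    """Return an <a> element whose href and display text contain no newlines."""
    # strip() removes leading/trailing whitespace, including CR and LF,
    # which is what AppendLine-style building leaves around each part.
    url = base_url.strip() + query.strip()
    # Escape characters that are special in HTML attributes and text.
    safe = escape(url, quote=True)
    return f"<a href='{safe}'>{safe}</a>"


# The parts deliberately carry the stray CR/LF pairs described above.
link = build_link("https://127.0.0.1:446/something\r\n", "?querystring\r\n")
print(link)
```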

Tuesday, 15 April 2014

Dynamic IP Security (dynamicIpSecurity) on Azure

Introduction

When trying to make a site resilient against denial-of-service attacks, you are limited in what you can do, but one useful method is to block IP addresses temporarily if they are deemed to be making too many requests to your site, either within a certain time frame or by attempting too many concurrent requests against the same server. This might be malicious or accidental, but either way it is useful to protect against it.

Downloading/Installing

Enter the IIS Dynamic IP Security module from Microsoft, included by default from IIS8 and available for IIS7 and 7.5 as a downloadable extension.

What Does It Do?

The theory is easy. Configure how many requests can be made within a given time frame. Any more made in this time can return one of a number of error codes or simply drop the connection. You can also set up a maximum number of concurrent requests, which if exceeded returns the same error. You can choose one or the other or both, and they live in the configuration under system.webServer/security/dynamicIpSecurity with some other options set under system.webServer/security/ipSecurity.
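
To make the request-rate idea concrete, here is a minimal sketch of a per-IP sliding-window limiter in Python. It is only an illustration of the concept and is not how IIS implements the module; the class name, method name and the choice not to count denied requests towards the limit are all assumptions.

```python
from collections import defaultdict, deque


class RequestRateLimiter:
    """Allow at most max_requests per IP within any interval_ms window."""

    def __init__(self, max_requests: int, interval_ms: int):
        self.max_requests = max_requests
        self.interval_ms = interval_ms
        # Maps each IP address to the timestamps of its accepted requests.
        self._history = defaultdict(deque)

    def allow(self, ip: str, now_ms: int) -> bool:
        window = self._history[ip]
        # Forget requests that have fallen outside the time window.
        while window and now_ms - window[0] >= self.interval_ms:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # the caller would respond with e.g. a 403
        window.append(now_ms)
        return True


limiter = RequestRateLimiter(max_requests=3, interval_ms=5000)
results = [limiter.allow("203.0.113.7", t) for t in (0, 100, 200, 300)]
print(results)
```

The fourth request is refused because three requests from the same address are already inside the 5 second window; a different address is tracked separately.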

Azure vs IIS

The problem with most of the help online is that it assumes you are directly accessing IIS on a server rather than deploying Azure web roles, where much of this is automated and to do it manually each time would be a pain. The steps required for Azure are:
  • Create a startup file in your role project (not in the cloud project), called whatever you like but startup.cmd is pretty standard. Click the properties for this file and ensure that "Copy to Output Directory" is set to Copy always or copy if newer.
  • Modify your ServiceDefinition.csdef file in your cloud project and add a Startup element under the WebRole for the role you are trying to set up. It can look like this:
  • <Startup priority="-2">
        <Task commandLine="startup.cmd" executionContext="elevated" taskType="simple"></Task>
    </Startup>
  • It is important that the executionContext is set to elevated, otherwise the script will not be permitted to change the applicationHost.config file on the cloud server, which is where these settings end up.
  • Modify your startup.cmd and add in the calls (below) to command-line executables that will make the updates for you to the IIS configuration. We will use two commands, one will ensure the dynamic ip restrictions module is installed into IIS and the other (AppCmd) can be used to update the configuration files with the relevant sections.
  • NOTE: This instruction is for IIS 8, which runs on Server 2012 and which includes a PowerShell cmdlet to install the IP restrictions module in IIS. I don't know if this is included in previous versions of Windows Server, and I've seen comments that a different script is required to install the (optional) module in IIS 7 and 7.5. The first command to add to startup.cmd is: PowerShell Install-WindowsFeature -Name Web-IP-Security
  • This feature includes both the static IP restriction functionality and the dynamic features. 
  • These next 3 stages are optional but sound useful to me. They instruct IIS to do a little more work when working out the source IP address: rather than looking only at the connecting address, which might belong to a proxy, it also considers the address passed along in the X-Forwarded-For request header. This is handled in the ipSecurity section, which must first be unlocked (it might already be, but we do this just in case): %windir%\system32\inetsrv\AppCmd.exe unlock config /section:system.webServer/security/ipSecurity
  • Secondly, we ensure that an ipSecurity element is created (this will not overwrite the element if it is already there): %windir%\system32\inetsrv\appcmd.exe set config -section:system.webServer/security/ipSecurity /~ /commit:apphost
  • Thirdly, we add the child attributes we are interested in: %windir%\system32\inetsrv\appcmd.exe set config -section:system.webServer/security/ipSecurity /allowUnlisted:"true" /enableProxyMode:"true" /commit:apphost
  • The next stages enable the dynamic IP security following a similar idea, but this example is done in a slightly different way (all merged into one). Firstly, the section unlock: %windir%\system32\inetsrv\AppCmd.exe unlock config /section:system.webServer/security/dynamicIpSecurity
  • And then add the element and its children in one go: %windir%\system32\inetsrv\appcmd.exe set config -section:system.webServer/security/dynamicIpSecurity /denyByRequestRate.enabled:"True" /denyByRequestRate.maxRequests:"40" /denyByRequestRate.requestIntervalInMilliseconds:"5000" /denyByConcurrentRequests.enabled:"True" /denyByConcurrentRequests.maxConcurrentRequests:"10" /commit:apphost
  • Although you might prefer one method or the other, I've shown both because these are examples I found that work so I don't want to break it.
  • You can now deploy ready to test.

Testing

The obvious way to do testing is to set your limits quite low so they can easily be exceeded. I started with max requests of 10 every 5 seconds and max concurrent requests at 5. Once this is done, all you have to do is run a proxy tool like Fiddler to make the response codes obvious, then go to your site and start hitting lots of ctrl-F5s (or whatever does a force refresh of a page). You should see the first requests return normally and then a bunch of 403s (the default code for blocked IP addresses). After your timeout, it should start returning pages again and then block them again once the limit is exceeded. Note that you can increase the timeout and multiply your requests (100 in 100 seconds instead of 10 in 10), which might be good in terms of making it harder for attackers to blast your site (it blocks them for longer), but this will also affect people who accidentally go over the limit and will increase the memory required for IIS to keep track of these requests, so go carefully. Usually something is better than nothing - it doesn't have to be perfect.
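
If you would rather not wear out your F5 key, the same test can be scripted. The following is a minimal Python sketch using only the standard library; the function names are hypothetical and the URL in the usage comment is a placeholder for your own site. It sends requests one after another (so it exercises the request-rate limit, not the concurrent limit) and tallies the status codes. If you configured the module to drop the connection instead of returning an error code, urlopen will raise a URLError, which this sketch does not handle.

```python
from collections import Counter
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a single GET request."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return err.code


def hammer(url: str, count: int, fetch=fetch_status) -> Counter:
    """Request url count times in a row and tally the status codes."""
    return Counter(fetch(url) for _ in range(count))


# Example usage (replace the placeholder with your deployed site):
# print(hammer("https://your-site.example/", 30))
```

With limits of 10 requests every 5 seconds you would expect a tally with roughly ten 200s followed by 403s for the remainder.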

Troubleshooting

This has become a mandatory heading for Azure articles!

Remote Desktop is kind of essential for debugging these types of issues. The questions are:
1) Is the module installed?
2) Is the config correctly configured?

Is the module installed?

This should have been taken care of by the first entry in your startup.cmd, but before you try to install it via remote desktop, you can probably work out whether your startup.cmd has run correctly. Firstly, open a PowerShell command prompt and enter Get-WindowsFeature, which will take a while and then display the list of available and installed Windows features. Scroll up to near the top and ensure that the module called IP and Domain Restrictions is shown as installed under Web Server (IIS) -> Web Server -> Security. If it is NOT, then either your startup.cmd did not run or it ran but your command was not correct. If it is installed, move on to the next section, "Is the Config Correctly Configured?".

Copy the text that installs the module from your startup.cmd, paste it into the PowerShell terminal and press enter. If this succeeds, then either your startup.cmd is failing before it gets to this command (try pasting in the previous commands in order, one at a time) or your startup.cmd is simply not running at all. It is also possible that you have confused the server that is being deployed to with the server you are testing. If pasting this into PowerShell does NOT work, then either you will get an error due to something you need to fix on the box (unlikely), or your command is not typed correctly and you need to fix it in startup.cmd.

NOTE: If for some reason your startup.cmd didn't run at all, or failed part-way through, you should re-deploy your project after fixing it, otherwise the AppCmd entries will not have been run and the installation will not be complete.

Is the Config Correctly Configured?

If you didn't change the example above, the configuration will be written to d:\windows\system32\inetsrv\ApplicationHost.config. Open that file and search for the ipSecurity and dynamicIpSecurity elements (in my case, the dynamic element was right at the bottom and the ipSecurity element was near the top). Did one or both get written, and do they appear to be OK? If one or both are missing, then the section might not have been unlocked and/or the command to add the elements might be wrong or might have failed. Copy your startup.cmd commands that use appcmd.exe one at a time into PowerShell, replacing %windir% with d:\windows. At each point, check that you don't get an error and that the command runs correctly.
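
If Python happens to be available on the box (an assumption; it is not installed by default), the search can be scripted. This is a minimal sketch with a hypothetical function name that reports whether each of the two elements exists anywhere in the file and, if so, prints what was written.

```python
import xml.etree.ElementTree as ET

# The two elements the startup.cmd commands are expected to create.
SECTIONS = ("ipSecurity", "dynamicIpSecurity")


def find_sections(source) -> dict:
    """Map each section name to its XML text, or None if it is missing.

    source is a file path or an open file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    found = {}
    for name in SECTIONS:
        # iter() walks the whole tree, so the nesting depth doesn't matter.
        element = next(root.iter(name), None)
        if element is None:
            found[name] = None
        else:
            found[name] = ET.tostring(element, encoding="unicode").strip()
    return found


# Example usage on the role instance:
# for name, xml_text in find_sections(
#         r"d:\windows\system32\inetsrv\config\applicationHost.config").items():
#     print(name, "->", xml_text or "MISSING")
```

The path in the usage comment is the usual IIS location and may differ from the one on your instance, so adjust it to wherever you found the file.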

It is very unlikely that your startup.cmd ran OK and the commands completed without error, because in that case they would have added the configuration. The most likely problem if you don't get errors is that the startup.cmd failed mid-way through and didn't run your AppCmd entries in the first place.

If the entries were added but they look incorrect, then you have probably just mis-typed something in the command. Take a close look and make sure they look correct. If you are not sure what they should look like, use IIS on your server to set up the values that you want to use and then open applicationHost.config to see what IIS wrote to file.