Wednesday, 17 September 2014

Building a Scalable Application in .Net on Windows Azure

Obviously there are many ways to build web apps and many patterns to follow, but the thing with scaling is that you have to design your app to scale before you try to scale it; otherwise the chances are you will have to change a lot of code or even re-design your application from scratch.

Hopefully the following simple steps will spell out the basics of using Windows Azure Cloud Services (Platform as a Service) and .Net to design your app architecture to scale.

Step 1 - Databases are HARD to scale. Sure, you can add more and more data to a database, but the more throughput you need, the more memory is used, the more network bandwidth is required and, generally, the slower everything runs. Databases are relatively expensive, and scaling them across servers with replication etc. is difficult, expensive, error-prone and basically best avoided. So what? Keep your database as a database. Do not perform computationally expensive operations in the database. Sure, you can crunch data in a stored procedure, but you're just eating up valuable CPU cycles; get the work out of the database and do it in a web service or web application instead.
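To make that concrete, here is a minimal sketch (my own illustration, with made-up table and column names) of pulling the raw rows with a cheap query and doing the arithmetic in the web tier rather than in a stored procedure:

```csharp
// Sketch only: instead of asking SQL Server to crunch an order total in a
// stored procedure, fetch the raw rows cheaply and compute on the web tier,
// which is the part you can scale out. "OrderLines" etc. are invented names.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public static class OrderTotals
{
    public static decimal GetOrderTotal(string connectionString, int orderId)
    {
        var lines = new List<Tuple<decimal, int>>();

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT UnitPrice, Quantity FROM OrderLines WHERE OrderId = @orderId", conn))
        {
            cmd.Parameters.AddWithValue("@orderId", orderId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    lines.Add(Tuple.Create(reader.GetDecimal(0), reader.GetInt32(1)));
                }
            }
        }

        // The computation happens here, on the cheap-to-multiply web tier,
        // rather than inside the expensive, hard-to-scale database.
        return lines.Sum(l => l.Item1 * l.Item2);
    }
}
```

The database does what it is good at (storing and returning rows) and the CPU-heavy part runs on the tier you can cheaply multiply.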

Step 2 - Share your session. Windows Azure does a lot of the heavy lifting when it comes to scaling. You don't need to worry at all about load balancing: you simply increase the number of instances of your web application and Azure load balances them for you. This does mean that the same user accessing the same application might not always hit the same physical web server, and that means you cannot keep session in memory - it will not be present on the other machines. You could use a database, but that tends to be slower (and see step 1). Azure provides a mechanism to create cache roles (you can have as many of these as you need) and I have taken their suggestion to use at least 3, which allows one to die while another is rebooting and the data is still safe. These are configured in the Azure cloud project settings. You access session in your .Net code in exactly the same way, but you configure your session state to use a class called DistributedCacheSessionStateStoreProvider (from Microsoft), which is designed to work invisibly with your cache roles to store your session data. This session has a timeout like any other session and it pretty much just works.
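For reference, the web.config wiring looks roughly like the snippet below. This is only a sketch based on the Azure Caching samples of the time: the provider name, cache name and assembly reference are example values of my own, so check them against whichever version of the caching NuGet package you actually install.

```xml
<!-- Sketch only: provider name, cacheName and assembly are example values,
     not a definitive configuration - verify against your installed package. -->
<sessionState mode="Custom" customProvider="DistributedSessionProvider">
  <providers>
    <add name="DistributedSessionProvider"
         type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
         cacheName="default" />
  </providers>
</sessionState>
```

Your page code then carries on using Session["..."] exactly as before; only the provider underneath changes.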

Step 3 - Because your web application is shared across multiple web servers, you should consider whether this is a problem for your workflow. All of your data should be in the (shared) database or the session, so you should not get any other problems as long as you avoid doing crazy stuff in code to store local copies of things. In most instances, you do not have to write your application code any differently just because it is load balanced. You should set a common machine key in your web.config, though, so that every instance uses the same keys for encrypting and validating viewstate and the like.
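The machine key is just a web.config entry, along these lines (the keys below are placeholders - generate your own, for example with the Machine Key feature in IIS Manager, and use identical values on every instance):

```xml
<!-- Placeholder keys only - generate real ones and share them across all instances. -->
<system.web>
  <machineKey validationKey="REPLACE-WITH-GENERATED-VALIDATION-KEY"
              decryptionKey="REPLACE-WITH-GENERATED-DECRYPTION-KEY"
              validation="HMACSHA256"
              decryption="AES" />
</system.web>
```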

Step 4 - Do anything resource intensive in the web application. These are the parts that are multiplied when you scale out, so it makes sense to do the hard work here, not in the database or any other shared service.

Step 5 - Use shared storage. Whereas you might normally store files or objects in the local file system, this is not usually possible in a scaled environment, since each instance of the web application has its own local file system which the other instances cannot see. Most (all?) cloud providers, including Azure, provide storage services that are accessed via URL. You can upload/retrieve/delete these objects in the usual way and they can be accessed either from code, using one of the Azure helper dlls, or directly via URL (for instance, your web pages can embed links to images directly, so that the request can go via a CDN or straight to the storage service without needing to touch your web application on the way). These services are usually pretty cheap, and the CDN functionality that many provide can also improve the performance of your web application by serving content from locations close to the end user.
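As a rough sketch of the "from code" route, using the Windows Azure storage client library of the time (Microsoft.WindowsAzure.Storage) - the container name and the little wrapper method are my own illustration, not a prescribed pattern:

```csharp
// Sketch: upload a file to Azure blob storage and get back a URL that pages
// (or a CDN) can link to directly. "images" and the parameter names are
// illustrative.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobUploader
{
    public static string UploadImage(string connectionString, string localPath, string blobName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();

        // Create the container on first use and allow public read access to
        // its blobs so web pages can embed their URLs directly.
        CloudBlobContainer container = client.GetContainerReference("images");
        container.CreateIfNotExists();
        container.SetPermissions(new BlobContainerPermissions
        {
            PublicAccess = BlobContainerPublicAccessType.Blob
        });

        // Upload the local file and hand back the public URL.
        CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
        using (var stream = System.IO.File.OpenRead(localPath))
        {
            blob.UploadFromStream(stream);
        }

        return blob.Uri.ToString();
    }
}
```

The returned URL is what you would embed in your pages or hand to a CDN, so image requests never have to touch your web roles at all.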

Step 6 - Start simple and add functionality. Some bugs are hard to track down and fix so be sensible! Start with a small basic app and add the features, one at a time, and make sure they work before adding another one. This way, you can sort out issues as they arise rather than peeling through several layers of unknowns to find out what's wrong.

Be warned that there are different versions of Azure floating around, especially in online examples. Something that used to work 3 years ago might not work any more because it has been deprecated in a newer library. Some class names were re-used for different things, just to add to the confusion. Stack Overflow should help you find the answers you need!

Monday, 15 September 2014

Learning from our mistakes? Not a chance!

There is a very famous saying that we "learn from our mistakes". What a cute, pithy saying, which is of course completely untrue. How many times have we got drunk after promising we would not do it again? How many times have we stopped exercising even though we have already learned that exercise is good? How many times have I copied and pasted code, including the bits that I should have changed in the process but didn't? Loads of times!

Every few weeks, we read another story in the paper about some kind of abuse scandal or corruption probe, or another case where a government official/council worker/teacher has made some monumental cock-up costing time, money and embarrassment. One thing that is clear amid the claims of "we are investigating how we can avoid this happening again" is that human beings are fairly useless at learning from mistakes - certainly from others' mistakes, but even from our own.

Why is this? Well, there are lots of reasons; often the right choice is outweighed by some more powerful force like laziness, tiredness or greed. Often, in my opinion, there are just an awful lot of mediocre people in jobs who either don't really care or don't have the ability to see where improvements could be made (before something bad happens) or to do their best in their job. But there is another reason I want to mention, without which this sounds like a fairly downbeat post about a mostly un-winnable situation: we are also terribly bad at information sharing.

Sharing information sounds fairly straightforward, but it isn't. One of the reasons is that the truth is not always known objectively - there might be 2 or more opinions on the correct way to do something. Another big issue is that it is hard to present knowledge in any format other than pure information, which is the least effective at inspiring people to follow and learn from it.

Let's take software development. Some of the most effective ways of ensuring quality software development are code reviews and code-release checklists. Have you spell-checked messages, have you added translations into the translations database, have you signed off x and y and informed the test team about the release, etc.? Guess what? Most developers can't stand this stuff. We enjoy coding, we don't enjoy "paper work", so this process-oriented approach is great in theory but doesn't work in practice.

Another problem is how you distill the vast and abstract world of knowledge, even a specific subset like computer science, into a form which is searchable and useful. Currently, most programming knowledge (I would suggest) is obtained by searching for specific phrases on Google, such as "how to access MySql from C#", and then copying and pasting the answer from somewhere like Stack Overflow without necessarily understanding all the meta-data. Is it up to date? Is it the best (or one of the best) ways to carry out the task? Are there settings in the code that I need to change for my implementation? Are there security considerations? Is there any way to verify the expertise of the author? In fact, the scoring mechanism on Stack Overflow is extremely poor. You can ask one question that a lot of people up-vote (for whatever reason) and achieve a multi-1000 reputation without having any knowledge; you could likewise answer 1000 questions and not even get 1000 reputation from that.

There is also an awkwardness about the idea of "community-driven" - the idea that by adding lots of opinions into the mix, the right answer will pop out. This is, of course, not true, in the same way that the most popular vote in a General Election does not necessarily equate to the best choice of government. It also does not distinguish between the quiet but clever person and the loud but stupid person who thinks that the way to do something is whatever "worked for them". It is good that people are able to review and question information and 'facts', but it is not a good idea, in my opinion, to air all of this in public in comments that just confuse the uninitiated into not knowing what is what.

So what would be the solution? I would like to see a site where the information is reasonably structured into smallish chunks, each of which could be split into separate headings such as "legal requirements", "security considerations", "platform considerations" etc., and which perhaps is produced and published by a single individual who then becomes responsible for curating and updating the content based on people's feedback. Useful feedback could be rewarded with reputation, trolls could be down-voted or banned, and curators could themselves be reported if a reviewer thinks their work is unfair or misleading. Perhaps people could apply to curate certain sections that they are more qualified to curate based on job/qualification etc.; there could then be either a contest, or the incumbent could plead no contest and hand over to the more qualified person. Rate limiting would prevent people from just spouting nonsense like you see on Yahoo Answers all the time - perhaps each person only gets to comment once per day, and could even win another vote if their feedback was considered useful.

It seems that we should be able to spare thousands of developers around the world the problems associated with making all the same mistakes that have probably already been made by someone else. Even with topics as high-level as Defense-in-depth or "keep it short and simple", there is no reason why these shouldn't be published, debated and updated. Old information could be recorded so that people can easily see why something they saw on a web forum might no longer be relevant (it was required before a library was updated or a new release of a framework was made).

I'm not sure if this is something that is doable or whether, as I have learned from software, the first 90% is easy and the last 10% is impossible! Who knows!?

How governments spend ages trying to fix the wrong problem

I had a strange experience with Google the other day. While trying to pay for some music in the Play Store, an error occurred which, long story short, was because my wallet had been suspended.

Apparently, Google wanted me to verify my account before use, something they claim (probably correctly) is required by "law" to identify the owner of an account.

Now, since this is something I have not seen before on other providers, I am guessing that Google have become subject to these laws because their wallet is now designed to be much more than just a payment engine, like Google Checkout used to be, and more like a bank account. Bank accounts certainly incur requirements for identification (apparently to avoid money laundering?).

Anyway, Google expect me to upload a form of government ID (driving licence, passport etc.) and a proof of address. Haha. As if I would upload copies of these documents to an anonymous web server in the cloud owned by a US company. It is one thing to show them to someone at the bank, who at worst photocopies them and keeps them on file, but am I really going to upload them electronically to the US? Nope, I am not!

This requires some root cause analysis, which leads to these anti-money-laundering laws. The Money Laundering Regulations 2007 were brought in to address, fundamentally, the worry that money for terrorism was moving through UK banks, and that identifying or preventing this would help reduce terrorism. The government also justified it on the grounds that it helps tackle organised crime!

I am not an expert on organised crime and terrorism, but I am guessing that the criminals are not heavy users of banks and are probably not too worried by the regulations. The regulations also suppose that the majority of crime is run by a few organised bosses - again, something that sounds plausible, except that there is still organised crime in this country (and elsewhere) despite them. What the government have done, again, is try to look like they were doing something useful, which has hit honest people disproportionately more than the criminals it was intended to stop. It presumably means that loads of your financial information is secretly under scrutiny every day by people looking into various crimes, and God forbid you are a business person who deals with erratic amounts of money and raises red flags all over the place.

The other big problem is that, I would suggest, most crime is still based on cash. Whether this is tradesmen not paying their tax, drug dealers or theft/robbery, we are talking about a large number of small amounts, which add up to a vast amount of money but which are still largely undetectable and which no government is really doing anything about. In Germany, if you cannot justify where you obtained a large amount of cash, it will be seized and you will be investigated. In the UK, you can have £5,000 in small bills hidden in parts of your car and, if the police can't prove that it was illegally obtained, you get it back! (I saw this on a TV show!)

The government and police should want people to use electronic systems to move money, because it is easier to trace the amounts moving from source to destination. They should also push for an international rule that anyone who wants to receive money from UK banks must provide law enforcement here with details of where that money has gone, if requested.

Stop fixing the wrong problem and adding all this extra red tape that does nothing to stop crime; it just shifts it somewhere else and leaves the rest of us to pick up the pieces!

Tuesday, 9 September 2014

FileZilla - Quickconnect works but Site Manager doesn't!

I don't use FTP much but was trying something simple - connecting to a new site.

I started with Quickconnect and it worked fine, but then I realised I was supposed to be using FTP over TLS, which is only an option in the Site Manager. So I created a new site, set up the options etc., and every time I tried to connect with the Site Manager, I got a 500 error response from the server.

Quickconnect mostly worked, but only if I cleared the console first (although that might be a red herring). I tried switching off TLS in the options, but Site Manager would always fail and Quickconnect would mostly work.

I found a similar problem reported a few years ago on the FileZilla forum, where they suggested another route: connect using Quickconnect and then choose File -> Copy current connection to Site Manager.

When I did this, it worked fine. I then added the TLS option back in and it continued working. There was NOTHING different between the connection I created and the one that FileZilla created when it copied the Quickconnect connection, so it must be a bug!

Time to head to FileZilla to report it.