Friday, 31 July 2015

mod_deflate in Apache is breaking caching

I was planning a video of caching in Yii framework and had a quick look at one of my sites to see what happens by default.

I was surprised to see that although all the files had ETags and although they had no cache-control by default (which is what I expected), when the page was refreshed, all the items were re-requested and SOME of them got a 304 response, while others got 200.

The short answer is that mod_deflate breaks it (and there is a workaround below) but first, a bit more background if you don't understand.

Caching is good. Rather than send you the same files every time you visit my site, I make you keep copies of them in your browser. When you request the page, if resources are already present locally and haven't "expired", then you get the local copies and save BOTH the request/response to the web server and also the network bandwidth to send them from server to browser. This strength is also its weakness. How long should I let my resources live for before they expire? Too short and I lose the advantage of caching. Too long and I might change something but the browser keeps using the old file, which hasn't expired.

Because the right cache duration depends on the actual site, most web servers do very little or no caching by default. The 304, however, is not a bad solution for uncached objects. When my browser visits the page, if it already has copies of the objects, it still goes back to the server and basically says either "give me this object if it has been modified since X" or otherwise "give me this object if the ETag I have doesn't match the ETag of the latest version". The ETag is more flexible since it allows you to change an object and retain its ETag if it is functionally the same as the previous version, and you could also replace a newer one with an older one and still get the same functionality. The ETag is just a quoted string and could be a hash or date or anything the server wants to use.

If the server looks at these requests and decides that the object hasn't changed, instead of sending the whole thing back, it just sends a 304 "Not modified" and the browser can use its cached version. This all works well but why is it not working consistently on Apache?
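As a rough sketch of that decision (not how Apache actually implements it), the server-side check could look something like the following. The function name `respond` and the fixed body are made up for illustration, and weak validators (the `W/` prefix) are ignored to keep it short.

```python
def respond(current_etag, if_none_match=None):
    """Return (status, body) for a GET, honouring an If-None-Match header.

    current_etag  -- the quoted ETag the server holds for the object
    if_none_match -- the raw If-None-Match header value, or None if absent
    """
    body = b"file contents"  # placeholder for the real object
    if if_none_match is not None:
        # The header may carry several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            # Nothing changed: send no body, the browser reuses its copy.
            return 304, b""
    # First visit, or the ETag no longer matches: send the whole object.
    return 200, body


first_visit = respond('"123456"')              # no cached copy yet
revisit = respond('"123456"', '"123456"')      # browser sends the ETag it has
```

Here `first_visit` gets the full object with a 200, while `revisit` gets an empty 304 because the ETag sent by the browser matches the one on the server.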

mod_deflate!

mod_deflate uses (usually) gzip to reduce the bandwidth required for the object being sent to the browser. What mod_deflate also does is to modify the ETag by adding "-gzip" to the end of it. Why? I can't quite understand but I think it is to follow the HTTP specification (an IETF document rather than a W3C one), which treats the gzipped and non-gzipped responses as different representations that should not share the same ETag. I don't think this is correct (whatever the spec says) because the underlying object is the same so if the browser has a version that worked previously, all it is asking is whether the underlying object has changed - the transport is irrelevant.

So what is actually breaking is that the original object is, say, ETag 123456 and mod_deflate makes this 123456-gzip. The browser revisits the page and says, "Can I have this object if it doesn't match 123456-gzip" and the server looks and thinks, "The object being requested is 123456 so it DOESN'T match, so I'll send the correct one". The whole thing then repeats each time.

Some people suggest disabling ETags as a workaround but another suggested workaround is a bit more specific and involves rewriting the ETag in the "If-None-Match" request header and removing the gzip bit. When the server then checks, the match will succeed and everything is happy!

I am unsure about the second line. The original code, commented out in my example, seems to add "-gzip" to every ETag that doesn't have it already. With that line commented out, none of the ETags have gzip in them, even the ones that were gzipped. I think I prefer the second line commented out. I put this code into .htaccess in the web root, which requires AllowOverride to be set to FileInfo in the Apache config. You also need to make sure mod_headers is enabled: "sudo a2enmod headers" should report that it is enabled or already enabled!

<IfModule mod_headers.c>
    RequestHeader  edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""
#   Header  edit "ETag" "^\"(.*)(?<!gzip)\"$" "\"$1-gzip\""
</IfModule>
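To see what the RequestHeader rule does to an incoming header, here is a small sketch of the same regular expression in Python. The helper names are hypothetical, and the comparison is a plain string match rather than anything Apache-specific.

```python
import re

# Same pattern as the RequestHeader rule: a quoted ETag ending in -gzip.
GZIP_SUFFIX = re.compile(r'^"(.*)-gzip"$')


def strip_gzip_suffix(if_none_match):
    """Rewrite "abc-gzip" to "abc"; leave any other value untouched."""
    return GZIP_SUFFIX.sub(r'"\1"', if_none_match)


def is_not_modified(stored_etag, if_none_match):
    """Simplified check: does the browser's ETag match the stored one?"""
    return stored_etag == if_none_match


stored = '"123456"'                  # ETag of the underlying object
sent_by_browser = '"123456-gzip"'    # what mod_deflate handed out earlier

without_fix = is_not_modified(stored, sent_by_browser)
with_fix = is_not_modified(stored, strip_gzip_suffix(sent_by_browser))
```

Without the rewrite the comparison fails and the object is sent again; with it, the suffix is removed before the comparison and the match succeeds.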



Friday, 17 July 2015

Companies still get Customer Feedback sooooooo wrong

Most things in life are on a spectrum from the very good to the very bad but customer feedback seems to fall at the bad end more often than the good. This is not just a web issue, in fact it crosses into Call Centres as well as in-person but some companies are paying a fortune because they get it so wrong!

Why do we want or need feedback? Well the first point, and one which many get wrong, is that we want feedback for different reasons. We might want feedback to improve services, we might want feedback because something is broken, because someone might have a really good idea or even because someone needs to complain. We might also need feedback for everyday things like cancellations, address changes or other Customer Service issues.

If you try and funnel all your feedback down the same channel, you will fail unless your company is tiny. But companies like Air Canada do exactly that. Want to contact them by "email"? Well, click the link on their site and you get the mother of feedback forms. It would be OK if all of the fields were optional apart from perhaps email, so they can contact you back, but even that should not be required. If I want to report that a link is not working, my email address is irrelevant and if you are not going to confirm the email address, I can type a fake one in anyway.

But no! You have to fill in your address and even passenger details for the flight. What flight? The flight you went on. But I haven't been on a flight, I'm having trouble booking. Oh.

You need to trust people a bit. Why make the passenger details mandatory? Why not say, "If your contact is about a particular flight, please enter the passenger details so we can find the records"? Why not have a drop-down list for your reason for contacting us? If you did that, you could filter the other fields and not show 50 pointless fields to a person who wants to "report a website issue".

This is one example and a fairly typical one but there are plenty of other ways in which we get feedback wrong.

Lots of big companies use automated telephone systems to "direct your call to the correct department" but for some reason, in EVERY example that I have used, the options are abstract or esoteric and do not obviously relate to everything I might want to do. "Press 1 if you need to change an address; 2 if you need to adjust a Direct Debit amount..." What if I want to do both? What is the problem with saying very clearly, "If your call is about your customer information like name and address, press 1; if it requires access to your financial records, please press 2; If you want to complain or provide feedback in general, please press 3"?

If you get this all wrong (which most people seem to) you generate a massive overhead in Customer Services. How many people are calling back because they got confused? Do you know? Are you employing twice as many people as you need because you are not doing it well?

Most importantly, do you ever think about things from a Customer point of view because I'm certain that many companies quite simply don't!

Let's use Air Canada again. I would bet money on the fact that most people visiting their site are economy passengers looking to get the best price on a ticket with perhaps the desire to have the quickest flight or at least a direct flight. Can you do this? Nope. You type in the dates and say that you're flexible (fine so far) then you see a grid of prices - hmmmm OK. I can choose the dates purely on the basis of price in this grid and then when I select the dates, I go to a list of flights. Oh - none of these are direct and take at least 4 hours longer than a direct flight so I need another day. I'll change my search results and choose another day, that's better, I can get a direct flight and click next. Oh, the return flight is also not direct so now I have to change my search results again. Now I've done that, the price grid now says that the cheapest flight I can get is double the original price! So the two direct flights cost double the flights that are longer, further and have a stopover. Why? No idea.

I tweeted Air Canada, usually the best way to get a quick response and they asked me to contact their reservation lines on the phone. So despite them paying however many millions for their site, I now have to use up some person's time on the phone to do what I should have been able to do on the web site! Waste, waste, waste.

I wanted to feedback this issue but the contact form was so poorly designed, I didn't bother. This is the real danger in getting it wrong, you will push people away until eventually you will not even know why people stopped using your service and you will go bust - just because the few of us who want to help with feedback couldn't do so easily enough!

I sometimes wonder whether this is another skill that falls between the cracks. Customer Service Manager's job to sort the website? Probably not. Technical department's job to make the user journey effective? Probably not.

Thursday, 16 July 2015

Why Web Design is such a pig

Introduction

I remember writing an article a while back about the argos.co.uk website and how very poorly it was designed. It kind of looks OK on paper, the colours are fairly consistent and the layouts are varying shades of average but trying to actually interact with it was painful (it's slightly better now but still has some problems).

How can someone like Argos get it so wrong? Let's be honest, the site probably cost several hundred thousand pounds to produce, either in contractor or Argos time and salaries so it pains me that something is already so poor at time of release (as opposed to a site that starts to feel old over time).

The reason is mostly because design is a pig. There are sometimes technical issues but in my experience, most of the difficult things in the technical area are related to design decisions. There are various reasons why design is a pig, some of them easier to fix than others but they all add up to the mess that most of us experience.

1. Terminology

Terminology is really important in so many fields, including software development. We still use this idea of Design/Build, something that has probably been around since the stone-age, despite the fact that software is much more subtle than that.

Software is actually a combination of several different but inter-related practices. The look and feel of the software, the functionality of it, the user journey design and the technical decisions all add up to that single entity that exists in many an amateur's mind: "the product" or the "web site".

Why is this important? Because most web design companies do not have people who specifically look after each of those areas. Sure, they will have "designers" and they will have "developers" but whose job is the functional design? Who is supposed to ensure the user journey works or that we are not asking developers to do things that will cost far more in time than they are worth in usability?

Even the phrase "Designer" irks me in this world of software. What is a Designer? They design stuff? Like what? They definitely do the obvious stuff like choosing Lime Green and Sunbeam Yellow to form a colour palette but since these designer jobs tend to attract artists, you end up with very nice graphical work but still a vast void between that and the code that needs to make that design happen. We should not be allowed to use phrases like Web Designer but should be more specific. Graphic Designer for Web or User Experience Designer or something.

2. Most design tools are not web friendly

What do most "designers" use for web designs? Photoshop! Of course, the de-facto leader of the pack, the only package that is cool enough to be allowed in Web Design but apart from some small add-ons for web, this package is totally not designed for web application design. It is a glorified photo-editor. What do you get sent from these tools? A load of HTML that can be the start of a web site? Nope. A load of Photoshop or PNG format drawings that a developer has to chop up into individual images.

Back in the day, this was not ideal but was OK because sites were sites were sites but now it is not acceptable to not consider designing for mobile. How do "designers" design for mobile? Many don't - it's the developer's job to work it out and of course because the developer decides, once it is finished, a load of people will disagree with those decisions and make them change it - each of these a potential bug, another load of time and money that either has to be coughed up by the customer or swallowed by the developer. Those that do will often send a second set of drawings showing mobile layouts but again, this can be unhelpful. The two layouts are actually the same site in most cases and why should the developer waste time trying to work out how to make the site respond in the correct way for two drawings that took an hour to draw on Photoshop?

Adobe do actually produce a tool called Edge Reflow, which looks interesting because it allows designers to consider many of these things earlier so if they don't work, it can be designed out now rather than after the developer has either hacked something together or made it work using loads of duplicate code. However, I have never seen one of these designs or what they look like as a design source for the developer, I would like to though!

3. Customers don't know anything

If you went to a doctor for some surgery, would you tell her which way she should cut into you or which part of the liver to remove? No, because she is a doctor and you are not. You might know some stuff about it but you still trust her decision and you know that she doesn't really care about your opinions.

One of the most annoying things in Web Design is customer input - that is "design" decisions given by the person paying you to build the site. Of course, they are the customer so they get to choose right? Actually, not really. If you let a customer walk all over you, then you probably have an issue with assurance. If they don't believe that you will do something well, they are likely to try and direct it too much. If they do trust you, then you can be nice and clear with them that they get top-level input into the general style of the web site but otherwise you will decide what looks good and what is usable because you are the experts.

We often get this wrong and it can be hard but you should be good enough at your job that if somebody is really insistent on controlling everything, you turn down their work. I have seen some shocking web sites and they have a link to some company "web design by..." and I think to myself that there is no way I would put my name on a site so badly implemented. I would not use that company purely on the basis of one terrible web site so be warned, that site that you take on because you think you need the work might also be the reason why you don't get any more!

4. Functional design is missing

This is pretty common on smaller web sites but who is designing, documenting and SIGNING OFF the functionality of the site? This relates to issue 3 because part of the problem is often that the customer does not actually know what should happen where. Why take on the job if that can't be agreed up front and signed off? Do you think it will come out in the wash because it won't! For any site other than the most basic shop stuff, it is madness to design a site with no functional design.

Employing a Business Analyst is not high on the list of priorities for a web design company but why not? For their wages, which can be half that of a developer, you can employ somebody who is really good at simplification, spotting inconsistencies, nailing customers down to make decisions and producing something that is MUCH easier to work from for a Developer than some half-arsed drawings made on post-it notes. If you are a small company, this person could do other jobs if there isn't enough work but actually functional design is a pretty involved job that reduces the cost of design changes because they are made before the technology has been developed and becomes a risky place to make changes.

5. No-one owns User Experience

I am shocked by how many web sites of all different qualities fail in the most fundamental way: A user needs to do whatever they need to do on your web site as easily as possible!

Your site can look amazing and achieve some slick functional goals but yet the user cannot find out how to journey through the site. This might be because of major bugs but is more likely because the customer and web company did not actually think about things from a user's point of view. Your users may have a wide range of technical abilities, ages, races or whatever might affect their ability to use your site. Just because the Developer could get round it doesn't mean that someone's Grandma will be able to.

User Experience and Design are linked because some things are potentially related to both, like the colour and size of buttons, but other things exist above the level of design. How do I lead my users through this web site? How will they know what to press? Where do they get taken to after certain things happen like adding items to the shopping basket? Where does it make sense for them to go?

Some of this is, of course, subjective but there are still things that are just considered good practice - period. Navigation must be obvious and not too many levels deep. Button colours need to reflect the seriousness of the action - deleting an account should probably be red or orange as a warning. Update my details should be green because it is saying, "yes, you've finished". You can also carry out user tests if you are unsure; there are companies who provide this or you can organise them yourself with local schools or old people's homes or whatever.

At the end of the day, you need someone with influence who can fight for the user experience and make sure that no other decisions impact this badly.

6. Design is not always documented

If your company is heavily biased towards Developers then it is likely that you don't like documentation. Graphic Designers are also very keen on making pretty drawings but not documenting their decision process.

A few years ago when we first started designing our product at PixelPin, I wanted design decisions to be documented so that if a change was needed, we would know whether we were making things better or regressing. For example, we might have moved a button because user testing showed that the only way for users to press it reliably was to move it away from other buttons. If we know that and someone suddenly wonders about moving the buttons, we can take a view on whether the original decision is still valid. We didn't do this and have paid the price in certain ways.

Let's be honest, designers are known for creativity, not for technical detail and documenting these things is probably not what graphic designers want to spend their time doing but actually, it is part of the process and an important part too. Branding agencies are better because they often have to justify their time and choices more clearly to their customers but as with all jobs, a critical part of being good at your job is knowing what you are doing well or not and changing something to make you do better.

Conclusion

Although we might not think we have the money to have everything the way we want it, there are two types of companies: those who strive towards the way things should be or those that don't bother trying. Guess which ones do the best?

These are just some of the issues in moving from decision to product in the world of web design and there are others. But if you actually work out in your own company who should be doing these and educate your customers that these are not just nice-to-haves or excuses to charge money, you end up with a process that is robust, decisions that are transparent and agreed, and a site that is attractive, usable and consistent.

What will you do?

ServerTooBusyException on Azure

A weird one with an obvious cause (once I worked it out). I had a staged and a live instance of a web service with two URLs - one pointing to each. I wanted to make the staged web service live and did a swap in Azure. After doing this, the live web application stopped working and I got the exception above as well as:

The HTTP service located at https://live.mycompany.co.uk/WebService2.svc/standard is unavailable.  This could be because the service is too busy or because no endpoint was found listening at the specified address. Please ensure that the address is correct and try accessing the service again later.

The weird thing was that when I swapped them back, the test web application talking to this same web service worked fine! The server was definitely not too busy. I tried again just in case it was temporary but got the same error.

I panicked but realised I could run the live web app on my local machine in debug mode and see what was going on. I pointed the live web app to the test web service using stage.mycompany.co.uk.... and it worked fine. I then had an idea. What if I pointed it to live.mycompany.co.uk BUT used a hosts entry to point that URL to the test web service? That should be the most accurate reproduction of the live problem.

Hey presto: I got the same error but this time I could drill down into the inner exception, which actually was a 503 error from IIS - Service Unavailable. The additional detail showed that the problem was the web app was looking for a web service with a specific host header (live.mycompany...) but since this was the test web service, it only had a host header for stage.mycompany... so there was no web site to serve the content. Like many errors in IIS and .Net, there is some ambiguity which is not helpful but at least I found out the problem:

Solution

Edit the csdef file to set the correct hostHeader on the binding element for the live site. I also edited a publish profile, which still had the test values in it, although I don't know whether that was relevant.

Re-publish and voila. Relief!
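
To picture why the swapped deployment answered with a 503, here is a toy model of host header matching. It is only a sketch of the behaviour described above, not IIS itself; the `route` function and the bindings dictionary are invented for illustration.

```python
# Hypothetical bindings of the staged deployment: host header -> site.
STAGING_BINDINGS = {"stage.mycompany.co.uk": "WebService2"}


def route(host_header, bindings):
    """Pick a site by host header; no matching binding means a 503."""
    site = bindings.get(host_header.lower())
    if site is None:
        return 503, "Service Unavailable"
    return 200, site


# After the swap, live traffic reaches a deployment that only knows "stage".
after_swap = route("live.mycompany.co.uk", STAGING_BINDINGS)

# With the hostHeader corrected, the same request finds a site.
FIXED_BINDINGS = {"live.mycompany.co.uk": "WebService2"}
after_fix = route("live.mycompany.co.uk", FIXED_BINDINGS)
```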

Friday, 3 July 2015

The weird world of .Net Cookies and Forms Authentication

If you've ever had to do anything other than a default implementation of forms authentication in .Net, you might well have also come across some confusing bugs in your application. You might find the site still being logged in after logging out, seeming to be logged out when you just logged in and seeing auth cookies still present when you are not expecting them!

Actually, the system is quite simple but there are some things you need to understand, which will help you to debug your application and what you are doing wrong if it isn't working.

Cookies for Authentication

Firstly, you might have worked out that authentication is not really covered by HTTP. Although you can restrict access to a single resource using the auth header, it is not designed to keep state between requests, which means you have to track people. Why? Because HTTP was designed to be stateless and largely unrestricted. If I want a document, I just ask for it... the end. Of course, nowadays, we have applications that function more like desktop applications and totally have to be able to track people across page requests so we know who they are, what they have put in their shopping cart, what their preferences are for the site etc.

The answer is a simple HTTP concept called a cookie - a small piece of text which can be sent to a client from a server and which will be automatically sent back with each request to the same domain until either the cookie expires or, if it is a session cookie, the browser is closed.

In most cases, an authentication mechanism will authenticate the user and then put some kind of encrypted identifier into a cookie. Each time the user comes back to the site, this cookie is sent back, the site can decrypt the contents and see if the user is authenticated. If so, continue, if not, they are sent to the login page.

In fact, in .Net, what is actually put into the auth cookie is an encrypted and encoded packet which describes a System.Web.Security.FormsAuthenticationTicket. This ticket contains a number of fields including the user name, an issue and expiration date and the path used to store the cookie.

When a user logs in, this data is collected, put into an auth cookie and sent to the client. When the client goes to another page, the data needs to be decrypted and verified, and then the expiry date checked to ensure the auth is still valid. If it IS, then the username is used to log in the current user and potentially to load any other data you want to load.

All good so far.

Expiry Dates

There is a problem that surfaces very easily and that relates to COOKIE expiry dates. If you set your forms authentication element in web.config to use, say, 20 days expiry for the cookie but then you create a cookie that only has a 10 day expiry then guess what? The cookie will expire in 10 days and it will look like the auth hasn't correctly remembered the auth duration. If you use the built-in forms authentication, then you shouldn't see this but IF you write your own auth cookies, then you MUST make sure that you set the cookie expiry to the same as the auth expiry in the FormsAuthenticationTicket to ensure they expire at the same time.
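The arithmetic is trivial but worth spelling out. The sketch below is not .Net code and the function name is made up; it just shows that the login effectively ends at whichever of the two expiry dates comes first.

```python
from datetime import datetime, timedelta


def effective_expiry(ticket_expires, cookie_expires):
    """The login ends when either the ticket or the cookie expires,
    whichever happens first."""
    return min(ticket_expires, cookie_expires)


issued = datetime(2015, 7, 3)
ticket_expires = issued + timedelta(days=20)   # auth expiry from config
cookie_expires = issued + timedelta(days=10)   # expiry set on a hand-made cookie

logged_in_until = effective_expiry(ticket_expires, cookie_expires)
```

With these values `logged_in_until` is 13 July 2015, ten days after issue, even though the ticket itself would have been valid for twenty.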

If you do not set an expiry date on the cookie then another problem can occur: Session cookies.

Session Cookies

A session cookie (not to be confused with the cookie that stores the session ID) is a cookie whose expiry is not explicitly set. What happens if you do this? The cookie will live for as long as the browser is open. This is useful for security reasons since it will help avoid accidental issues with other people logging into your account at a later date. There are two problems with this. Firstly, with multi-tabbed browsers, you have no control over if or when the user will close the browser. They might not close it for weeks, in which case, the auth ticket is still present and can be used - to avoid this, you should set the auth expiry to be a sensible value in the FormsAuthenticationTicket so that even if the browser isn't closed, the auth will still expire after a suitable time.

The second problem is related to deleting cookies.

Deleting Cookies

Lots of people have problems deleting cookies, including the auth cookie. You might have called FormsAuthentication.SignOut() but for some reason, the cookie still seems to be there and when you go back to the site, you are still logged in. This problem is not specific to .Net but is a very easy problem to cause if you use session cookies for auth (either always or as an option).

You can't actually delete a cookie from the server-end. Why? I'm not sure, but the suggested mechanism is to send back another cookie with an expiry date in the past so that the browser should (and usually does) delete the cookie.

The problem you can get is that a browser can hold multiple cookies for the same site which differ on domain or type (session or persistent). What happens if you have a session cookie for auth and then you sign out? .Net will send back a cookie with a negative expiry but guess what? If you have a cookie expiry date set (even an old one), the cookie is automatically a persistent cookie and will NOT replace the session cookie you are trying to delete. What will actually happen is the browser will think that it is a different cookie, will immediately expire the new cookie because of its expiry date and then it looks to you like the auth cookie has not been deleted. In fact, it looks like nothing has worked correctly and you are right!

How do you return an expired cookie for sessions then? You can't! All you can do is to send back an empty cookie with no expiry date set so that it overwrites the session cookie. You will then need to ensure that your system correctly handles a blank auth cookie.

If you allow session OR persistent cookies (perhaps a remember me check box) then you need to handle both cases. One will be taken care of for you with FormsAuthentication.SignOut(), the other you need to code.
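
The two responses described above differ only in whether an expiry is present. As a language-neutral sketch, the Set-Cookie headers could be built like this. The helper is hypothetical, and ".ASPXAUTH" is assumed here because it is the default forms authentication cookie name; use whatever your site is configured with. Whichever form you send, the name, domain and path have to match the cookie you are trying to replace.

```python
def set_cookie_header(name, value="", expires=None, path="/",
                      secure=True, http_only=True):
    """Build a Set-Cookie header value; with no expires it is a session cookie."""
    parts = [f"{name}={value}", f"Path={path}"]
    if expires is not None:
        parts.append(f"Expires={expires}")
    if secure:
        parts.append("Secure")
    if http_only:
        parts.append("HttpOnly")
    return "; ".join(parts)


# Persistent case: an expiry in the past asks the browser to drop the cookie.
expire_persistent = set_cookie_header(
    ".ASPXAUTH", expires="Thu, 01 Jan 1970 00:00:00 GMT")

# Session case: a blank value and no expiry, to overwrite the session cookie.
blank_session = set_cookie_header(".ASPXAUTH")
```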

Some Code

The following code is a sign out function that calls SignOut() to effectively delete any persistent auth cookie and also sends back a blank session cookie to overwrite any session cookies. In this case, it also deletes the session cookie, although this is less important because in my case, the session can be abandoned in code and would effectively be deleted anyway.

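// Needs: using System; using System.Web; using System.Web.Security; using System.Web.UI;
internal static void SignOut(Page source)
{
    // Sends back an expired auth cookie, which handles the persistent case.
    FormsAuthentication.SignOut();
    // Drop the server-side session data.
    source.Session.Abandon();

    // Blank auth cookie with NO expiry, to overwrite a session auth cookie.
    // Domain, HttpOnly and Secure are set to match the original auth cookie.
    HttpCookie cookie1 = new HttpCookie(FormsAuthentication.FormsCookieName, "");
    cookie1.Domain = FormsAuthentication.CookieDomain;
    cookie1.HttpOnly = true;
    cookie1.Secure = true;
    source.Response.Cookies.Add(cookie1);

    // Expire the session ID cookie as well.
    HttpCookie cookie2 = new HttpCookie("ASP.NET_SessionId", "");
    cookie2.Expires = DateTime.Now.AddYears(-1);
    source.Response.Cookies.Add(cookie2);

    // Adds the current page to the URL as the return URL.
    FormsAuthentication.RedirectToLoginPage();
}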

Note, my version is a static method shared between all pages and also, the RedirectToLoginPage will put the current page into the url as the "return url", which you might or might not want. Otherwise, just use source.Response.Redirect(..etc..). Also note that you must set up the secure and httponly flags to match your auth cookie so that it matches and overwrites the one that is already there.

Cookies for Session

Something else that people find confusing is the overlap between session and authentication. They are not the same and in most cases cannot be treated as consistent with each other. You can and probably would use session very early on in an application, before you even know who the user is. They might have selected a language, even added items into a basket, whatever, and they haven't authenticated yet. Session is tracked in a similar way to authentication using a cookie (ideally a session identifier which maps onto data stored internally on the server somewhere) but most of the time, a user can have session whether or not they are logged in.

So should these two systems ever interact? Possibly but not definitely. In some articles on security, they suggest that if the user is still logged in as indicated by a valid auth cookie BUT the session is empty, implying the session has expired, then you should log out the user and force them to log back in. This is a mechanism to avoid the very easy problem of session and auth expiries that don't match. Why not make them match? Because session expiry extends every time you use it whereas auth doesn't by default. Is that always useful? No. If you allow the user to stay logged in for, say, 4 weeks, then their session will very likely be empty when they come back and you don't want them to log in again. What you would need to do is to ensure that any important session data is repopulated by the system when the user returns.

For security reasons, though, it is recommended that when a user clicks "log out" you abandon the session as well as calling SignOut(). This ensures that no-one can hijack the data that might have been put into the session while the user was logged in, which they could do by going back to the same site on a shared computer, with the system potentially using the session to populate a load of functionality that the second person shouldn't have access to.

If possible, do not store security or auth data in the session but if you do, keep it to a minimum to reduce the outcome of a session hijack attempt.