Friday, 28 March 2014

Friday Opinion: Why I don't like "learn to code" blitz courses

These have become popular recently: courses offering tidy, canned, straightforward code teaching so that within one, two or three months you can write your own web sites. There are various free online sites, and others that you pay for, which promise a lot and which, in the simplest terms, probably do what they say - except that, in my opinion, software is rarely, if ever, "in simple terms".

The difficult parts of designing/writing/deploying and maintaining software are not found in for loops or "what is a class". They are found in the subtle art/magic of requirements capture; in determining how the whole system should work before you start coding, so you don't spend your entire time re-writing the site's functionality; in the tools that you use, sometimes being forced to use a tool you don't normally use and learning all its key shortcuts, its bugs and even how to do the most simple things; in working around annoying file permission and deployment problems; and in those bugs that people annoy you with that you can't re-create because you didn't add enough logging, or you don't have command-line access to your hosting server, or the knowledge of how to tail or grep error logs.

There are many things which make software development a pain and which are the accumulated learning of many years of experience - things that someone new to the business simply cannot know or be taught. Why? Because there are so many potential problems/errors/software applications that it would be impossible to cover everything.

What I am saying is that software engineering really needs to be considered a professional craft, not some hobby that people with fast typing skills can use to quickly hack the Pentagon or work out which type of badger left an imprint at a crime scene in an episode of CSI. I think it should be considered more like medicine!

You would not dream of offering courses to become a doctor in 3 months (and would probably not be allowed to) - and why not? For EXACTLY the same reasons that you can't learn a decent amount about software in 3 months. There are so many variations/conditions/drugs/habits/budget constraints etc. that even knowing a good amount about the human body simply cannot prepare you for. Imagine that you knew exactly what all the parts of the heart were called, what they did and how they join together - would that be enough to perform a heart bypass? Think about it and there are so many things that anatomy alone could not tell you. What staff do I need for the operation? What tools do I need and what is available? What kinds of things can go wrong and therefore need mitigating? Do I decide the anesthetic myself or do I have to agree it with the anesthetist?

I think this is a good analogy because the two really are similar. The basic mechanics of the job are reasonably straightforward and potentially very quick to learn, but the peripheral skills can take years to learn and then master. During this period, you need to experience many scenarios, have the support of many differently skilled professionals, and learn when to ask for help and when you are expected already to know what to do. Sure, the outcome of a failed medical procedure is potentially much worse than poor software, but poor software still has a cost. One example was a trading system where a poorly designed deployment cost the company around $70 million inside the space of a few minutes!

So, what does this mean for how we should train people in software? Well, in my opinion, we can again use the example of medicine. We should have high-school level basic computer skills, equivalent to learning biology or physics. This gives a nice groundwork and, most importantly, allows people who might be interested to find out about the subject at school, where they are attending anyway (i.e. they don't have to use their own time just to find out whether they are interested, without the help and experience of a teacher and their class). You could then have an advanced course perhaps, or a vocational course, covering, let's say, entry-level computing - suitable for getting a basic job in the industry but not something that suggests you should be allowed to single-handedly produce anything other than a framework-based site. This could be a bit like the pre-med studies carried out for medicine: they wouldn't allow you to become a doctor, but they would allow access to medical-related jobs or act as a basis for a similar and related career.

If someone was still into it and wanted to be the sort of person who signs off a complete site - like someone who could be in charge of a medical operation - they should need a degree-level qualification (however that would be defined/judged). There is no reason why these couldn't be treated in the same way as other professional qualifications - law, architecture, medicine - where the relevant institutions have to certify that a course meets certain criteria.

I know many people don't like the idea of academia as proof of competence because, if we're honest, we know that a degree does not prove competence, and we can all name unqualified people who can carry out a certain job better than a qualified person. However, the important thing is to consider the alternative: we have no qualification requirement and just have to hope that people know what they are doing. That sounds much worse to me - it's like saying that just because some qualified doctors make mistakes, the whole value of their training and qualification is undermined - which is nonsense.

This all, of course, does not preclude people without qualifications from doing anything. In the same way, trainee doctors can practise under supervision, trainee architects can design buildings but cannot sign them off, and trainee lawyers can produce certain documentation that often has to be checked by a qualified lawyer.

If we continue to be too blasé about software, we will just keep making mistakes and some of these will be very costly (for someone or other!) - and even if the problem seems small, it might be you who has their identity stolen or their life's collection of photos deleted by an attacker. Mistakes are easy to make in any profession, but mistakes made for reasons that are easily addressed with structured training are pretty poor in my opinion. Why can't we accept that software is not a hacker's craft and that it needs to be taken seriously?

Monday, 24 March 2014

How does Google Analytics work (for non-techies)?


Caveat: I don't work for Google and this is only a surface treatment of the subject. There are likely to be details that are not 100% accurate, but there should be enough here to get you up and running.

Introduction


If you have seen the various reports you can produce in Google Analytics, you might be amazed at how all of this information can be obtained. It is surely mysterious and magical, perhaps powered by unicorns and wizards. Actually, it isn't, but it does rely on a number of tricks to work out the relevant data - most of which rely simply on how widely used Google Analytics is. What needs to be understood is why and when the information in Google Analytics is not correct.

What is crucial is that you do not expect GA to be 100% accurate - it cannot be, and in this post I will explain why. For instance, you set up GA on your site, visit it, go and look at the demographics report expecting to see one visitor from your age group and gender, and the information is either missing or incorrect (ignoring the fact that you usually have to wait a while for it to appear anyway).

Also, it is crucial that you do not expect data taken from a small number of points to be accurate. GA is definitely designed for large numbers of visitors, where an error of a few percent is acceptable.

Web Requests Don't Contain Much Data

When you request a web page, there is a range of data contained in the request. Some of it, like the source and destination addresses, is needed for the mechanics to work correctly; other information is added for the convenience of web servers, so they know a little something about the "agent" that is requesting the page (the browser, program, hacking tool etc.). This allows the web server to send back slightly different content to certain browsers that might not support certain functionality. There are also fields that the browser can fill in to tell the web server whether, for instance, it can compress the data being sent back, which reduces the load on the network. The only other thing of consequence that is sent to the web server is cookies (see later). What is NOT sent to the web server, by default, is anything that specifically identifies you, the user. The reasons are simple: 1) there is no unified definition of what a web user is - no standard "identity" - and 2) the web server does not need to know!
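
For illustration, a typical page request might look roughly like this (the path and values are made up, but these are all standard HTTP headers):

GET /some-page HTTP/1.1
Host: www.amazon.co.uk
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.google.co.uk/
Cookie: session-id=abc123

Notice that nothing in there says who you are - just which software asked for the page, what it can handle and (via the cookie) whatever the site itself previously chose to store on your machine.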

For instance, when you go to amazon.co.uk, unless you have an account and sign in, Amazon does not know who you are, how old you are or where else you have been. If it does know, it is via the same trickery that GA uses to track people and work out who they are.

Cookies

The reason cookies are so important is that they allow web servers to store information on each machine that visits the site. Note that cookies are just small text files stored on disk, so if you log in from another machine, that cookie will not exist there unless the site you visit sets it again on that machine.

A cookie can contain anything the web server decides to store there - including usernames and passwords, if the developers are stupid - but usually a cookie is used to store your "session" when you are logged in. When you change between pages, the cookie is sent in each time and the site knows who you are; otherwise you would be lost between pages, since your identity is not sent in the web page request.

The trick here is sharing cookies between sites, so that a whole group of sites can follow you and potentially work out who you are (you might have seen virus checkers complaining about tracking cookies). Now, you can't actually share cookies across sites normally; they are only sent to sites that match the "origin" (domain) of the cookie. So if a cookie has an origin domain of google.co.uk, it will only be sent with requests for pages at google.co.uk. Amazon, for instance, cannot demand that your browser sends it a Google cookie. Note also that the browser sends the cookie(s) automatically to any site with the correct domain.
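
As a made-up example of that scoping, a response from google.co.uk might set a cookie like this:

Set-Cookie: PREF=abc123; Domain=.google.co.uk; Path=/

From then on, the browser automatically attaches "PREF=abc123" to any request whose domain matches google.co.uk - and to nothing else.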

What actually happens, though, is that a site - let's say Amazon - can get permission from Google to embed a small piece of Google code into its own page. When that Google code runs and sends a request to, say, google.co.uk, the Google cookie is sent with it, even though you visited Amazon. Google then knows that you visited Amazon and, more usefully, Amazon can know that you visited Amazon (naturally, if you have an account at Amazon, they would already know that).

The Clever Part

The question, though, is how much can Google really determine from this? Well, of course, it depends on whether you have a Google account or not. If I have a Google account and am logged in, a local Google cookie will contain a user identifier (as it would for any site you log into - not just Google). If I then visit Amazon and this cookie is sent to Google, Google doesn't just know that someone visited Amazon; they know that I, Luke Briner, thirty-something, living in the UK etc. have visited Amazon. This information is much richer than the basic information and is both interesting and, from Google's point of view, commercially very valuable to a large company that sells to millions of people, like Amazon. It can allow Amazon to ask what age ranges they attract, what genders, what nationalities. Perhaps it shows that French people don't stay very long on the site, so they might consider building a French-language web site.

Of course, this can be extended across other sites. Google owns YouTube, Google+ etc so any of these sites that contain membership information can be used by GA to identify the user.

What Else Can They Determine?

Earlier on, we mentioned that the web request contains information about the browser. It looks something like this: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 - and this tells Google what operating system and browser you are using. It can also determine whether you are on a mobile phone browser and, to a reasonably accurate degree, even what handset you are using. There are thousands of these unique strings, which Google can easily translate into real-world data. (They look really weird, but there is a funny history about how they evolved to be so weird.)

GA can have a good guess at your location and language, either by seeing what language the browser has sent in the request and/or by using your machine's IP address to get a rough geographic position (although often this is the location of your service provider rather than your actual machine).

One of the other useful features is the referrer information, which tells us which page the visitor came from. This is helpful because we might compare how useful Google is compared to Bing or Yahoo, but it can also help us decide whether our internal links are working in the way we expect - are people clicking the "buy it" link or simply clicking through from Google Shopping?

GA can determine which pages people visit with reasonable accuracy by using an identifier in a cookie. It won't tell us who it is specifically, but it will say that user X went to pages A, B and C and stayed on B for 10 minutes. It will then say where the user left, and it will add timings for all of these visits.

What Can We Add to GA?

Google knows that GA cannot determine everything, so it allows us to add custom information to our reports. For instance, we can add events that record specific things happening on our site, or we can add e-commerce values to certain events so that GA can track how much money certain pages are making (and therefore which might need the most attention).

Why Isn't It Perfect?

GA isn't perfect for several reasons.

The main reason is that a lot of the data is inferred rather than directly known. For instance, the claim that a user has been on page B for, say, 10 minutes is inferred from the fact that they went to page A, then page B and then page C, and the gap between the arrival times of pages B and C is 10 minutes. Of course, this might not be accurate. Perhaps a user clicked on page B but then went to another browser tab to do something else - they might have got through page B very quickly once they went back to it. GA cannot know this - and this is why we need lots of data, so that these odd events come out in the averages.

The second way in which it suffers is due to general web latency and other factors outside the control of Google. For instance, you might visit a page that sends information to Google, but this packet might get lost, or it might take a long time to reach Google, which may or may not affect its timings; people might be using various security tools that block expected behaviour or override browser settings. All of these would affect individual readings but, again, over time and with increasing numbers, these values will fall out in the wash.

Another problem is that it relies on cookies, and cookies can be disabled, blocked or deleted, which can have a whole other effect on data for low numbers of users. Again, GA works on the basis that most people do not play around with cookie settings.

What Can We Do to Improve It?

Firstly, by understanding how GA works, we learn not to expect too much of it. We cannot expect any specific piece of data to be 100% accurate.

Secondly, we can learn how to interpret the data and decide between what is actually a problem and what might just look like a problem because there is not enough data.

Thirdly, if we really need split-second timing information, we need to code this into the web site in question so we have complete visibility of what is happening and when, with much greater accuracy. Note that we can still suffer the same problems in that we don't know whether a specific user is struggling with signup or has gone to make a cup of tea, although we could mitigate this by having some kind of timeout with a message the user can click, which tells the system that the user is still there.
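
As a rough sketch of that idea (the names here are made up, not from any real system), you could record your own timestamped events server-side at the points you care about:

// Hypothetical helper: record our own timestamped events so we are not relying
// on GA's inferred timings. "StepLog" is an imaginary logging class/table.
public static void RecordStep(string sessionId, string stepName)
{
    StepLog.Write(sessionId, stepName, DateTime.UtcNow);
}

Call it at the start and end of, say, each step of the signup process and you can measure exactly how long each step took, for every user.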

Fourthly, if we want to track incoming links from pages we have control over, rather than hoping that GA works it out (which it might), we can add some custom parameters to the URL, which allow us to tell GA exactly what we want it to know. This is especially useful if we have a campaign on social media and want to know how many people came to our site from that specific link. We do that using the Google URL builder. Usually, it's then easiest to use a URL-shortener service like bit.ly to make the URL short enough to post on Twitter etc.
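
For example, a campaign link built with the URL builder might end up looking like this (the values are simply whatever you choose to type in):

http://www.example.com/landing-page?utm_source=twitter&utm_medium=social&utm_campaign=spring_sale

GA then reports that visit under the source "twitter", the medium "social" and the campaign "spring_sale", regardless of what it could or couldn't have inferred from the referrer.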

As with any management, we should allow GA to be what it is (it is free, after all!) and not coerce it too much into doing what it is not designed to do. Probably more importantly, we need to ask a very important question that is often lost in all the discussion: "what do we actually need to know?". It is easy to say, "I want to know what device the user is using", but that is not actually a need. The reality might be that we want to know if users are visiting from Android so we could write an Android app, in which case the question might actually be, "are users visiting from Android on the app or are they using the Android web browser?". Unless you can answer these questions before you start on your data crusade, you'll end up like those managers or politicians who are addicted to stats for the sake of it and who spend large amounts of time and money on things that are not actually business critical.

Wednesday, 12 March 2014

RESOLVED: Button postback not firing from GridView or DataGrid

I am porting an older site to a newer one - replacing old code with new code, Bootstrap 2 with Bootstrap 3 etc. - and have moved a DataGrid into a new location, where it needs to do the same thing as before. I had an edit and a delete button in a TemplateColumn and, although it all looked fine and dandy, it just wouldn't post back. The generated .NET JavaScript looked correct but nothing happened. I could get a JavaScript confirmation but nothing on the server side.

I then noticed the answer in one of the hundreds of forum posts I tend to search through when something doesn't work.

CausesValidation=False

This has caught me out before. By default, postbacks will (correctly) call the validation functions on all validators on the page. This would normally be obvious, since those validators would then complain about whatever you got wrong. In my case, I had a hidden form with loads of validators so, of course, nothing appeared to happen.

Setting CausesValidation=False on the link button in question means it passes a boolean false into the postback JavaScript function, and this allows it to post back without validation. Amazing.
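
For reference, the markup ends up looking something like this (the ID, text and command name here are just examples):

<asp:TemplateColumn>
    <ItemTemplate>
        <asp:LinkButton ID="DeleteButton" runat="server" Text="Delete"
            CommandName="Delete" CausesValidation="False" />
    </ItemTemplate>
</asp:TemplateColumn>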

(Other potential problems)
1. If you have built your site from scratch, you might have forgotten to include the MS WebForms JavaScript files, which contain the relevant JavaScript for the postback functionality.
2. LinkButtons possibly won't always work, since they no longer have a "name" attribute. You could add this manually during Page_Init if you wanted to (see the sketch below).
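
For point 2, a minimal sketch of what that might look like in the code-behind (DeleteButton is just an example ID - use whichever LinkButton is misbehaving):

// Give the link button a "name" attribute so the posted-back form data can
// identify it. Runs during Page_Init, before the control tree is rendered.
protected void Page_Init(object sender, EventArgs e)
{
    DeleteButton.Attributes.Add("name", DeleteButton.UniqueID);
}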

Learning about error handling the hard way!

Let's be honest, error handling is not fun. We like to spend time on the functionality, right? The cool stuff that people see or use and then remark on how good your software is! Error handling is like sweeping the roads or emptying the bins - necessary but very dull.

So two things have happened recently that have taught me the hard way about trying to foresee potential errors and handle them properly - one of them more serious than the other. You should know that I do have error handling: for instance, all of my database functions will catch exceptions, log them and then return some kind of error code. But this is not enough!

The big, bad experience I had was with a worker task that is designed to delete unused images from our cloud storage. It works pretty simply: every 4 hours it queries the database for images that are referenced, then queries the storage for images that are stored, and any in storage that are not referenced in the database are deleted. Easy, right? How many errors could you possibly see, what effect could they have, and how could I improve it? Well, there were no sanity checks anywhere, so when anything failed it just threw an exception, pretty much ignored it and then waited another 4 hours. However, one weekend the database connection failed temporarily and the query returned 0 entries. The code then decided that "if no images are referenced in the database, I need to delete everything from the image storage", which it did! Every user's uploaded profile image was deleted - just like that.

It wasn't the end of the world: I had to edit the database and set everyone's profile picture to a placeholder that included text describing what had happened and how they needed to re-upload an image.

The second issue was a raft of errors that were logged to my error logs (and emailed to me!) showing that database connections were being forcibly closed - something SQL Server is known to do when it feels like it. I don't know exactly how this works on SQL Azure, which is essentially VM-style databases on a shared host, but what I do know is that I don't want people to see generic error messages which mean nothing and which make the problem look like my fault in their eyes.

I realised I needed more. Firstly, I needed more checking for conditions external to my system: did the query succeed? How can I tell if it succeeded? How many rows were returned? I would expect a few thousand - certainly not zero. What should I do if this happened? Log the error? Disable the worker? Instead of deleting the images, perhaps I could move them to another folder so they could be restored if required (for, perhaps, a month).
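
As a minimal sketch (not the real worker code - the helper names like Log and MoveToRecycleFolder are made up), the kind of check that would have saved me looks something like this:

// Clean up images that exist in cloud storage but are no longer referenced
// in the database. Illustrative only.
static void CleanUpImages(ISet<string> referencedInDb, IEnumerable<string> storedInCloud)
{
    // Sanity check: an empty result almost certainly means the query failed,
    // not that every single image really is unreferenced.
    if (referencedInDb == null || referencedInDb.Count == 0)
    {
        Log("Image cleanup skipped: no referenced images returned - treating this as a database error.");
        return;
    }

    foreach (var image in storedInCloud)
    {
        if (!referencedInDb.Contains(image))
        {
            // Move to a holding area rather than deleting outright, so images
            // can still be restored for, say, a month.
            MoveToRecycleFolder(image);
        }
    }
}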

The second thing I needed was for the users to know that when a database connection fails, it is not my fault and they can try again soon. I can also automatically retry calls to the database if they fail, for a reasonable amount of time before the user would get annoyed, and then either display a "sorry, it's not my fault" page or even just a "please wait, the system is attempting to re-connect to the database" message.
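
A rough sketch of the retry idea (again, illustrative rather than the site's actual code):

// using System; using System.Data.SqlClient; using System.Threading;
// Retry a database call a few times before giving up, to ride out transient
// connection drops (e.g. SQL Azure forcibly closing the connection).
static T WithRetry<T>(Func<T> databaseCall, int attempts = 3)
{
    for (var i = 1; ; i++)
    {
        try
        {
            return databaseCall();
        }
        catch (SqlException)
        {
            if (i >= attempts)
            {
                throw;   // give up - time to show the "sorry, it's not my fault" page
            }
            Thread.Sleep(TimeSpan.FromSeconds(2 * i));   // brief, growing delay before retrying
        }
    }
}

Only once the retries are exhausted does the user see an apologetic page; in the meantime, a "please wait, re-connecting" message can cover the short delays.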

In conclusion, I get annoyed as a user when something on a site crashes - even when it shows a nice message to say it has gone wrong. I therefore need to realise that my users will also get annoyed if I don't correctly prevent and/or recover from errors in the system - whether code bugs or network issues. Logging is also essential so that these errors can be quickly tracked down and resolved.