Thursday, 28 May 2015

Windows Firewall not allowing DNS queries through.

Arrrrggggghhhhhhh

tl;dr Restart the DNS service

I just spent 2 hours trying to find out what was wrong with my Firewall setup. There was nothing wrong with it.

I have a Windows Server 2012 Domain Controller which also runs DNS (but only for the Windows network, it lives inside a Linux network!). I had successfully connected one machine to the domain but when I tried to do the same with the second, it couldn't find the domain.

The whole management around AD and Domain is horrifically complicated, not something for the faint-hearted but I could work out a few things.

Firstly, I knew the DNS was working, including forwarders, because it worked on the domain controller itself. I also knew that because the first machine had joined the domain, the domain was basically set up correctly.

I tried to compare network setup between the machine that worked and the one that didn't and there was nothing obvious. I started trying ping and nslookup (they get their results from different places so they can come out with different answers!) and to make sure I wasn't getting led up the garden path, I disabled the second network card onto the Linux network leaving only one physical route via the host-only network and through the DC to the outside world.

I eventually worked out that with the firewall OFF on the DC, the DNS lookup worked correctly but if it was ON, it didn't. Easy right? Some rule problem? I went through it all loads of times, double-checked rule settings, ports, Googled the correct rules and all sorts but it still didn't work. I even tried switching logging on in GP editor to see what rule was being hit and it didn't work at all - it logged precisely nothing.
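In hindsight, one check I could have run from the second machine is to send a raw DNS query straight at the DC, which takes the local resolver cache and any other name sources out of the picture. Below is a minimal Python sketch of that idea, not something I ran at the time. The function names are my own, and it only tells you whether anything answered on UDP port 53. It cannot tell you whether silence means the firewall dropped the packet or the DNS service itself was not responding.

```python
import socket
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = standard query, recursion desired),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (internet).
    question = qname + struct.pack(">HH", 1, 1)
    return header + question


def dns_server_answers(server_ip: str, hostname: str, timeout: float = 3.0) -> bool:
    """Return True if server_ip sends back a reply to our query over UDP 53."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeout, host unreachable and similar: nothing came back.
            return False
    # A genuine reply echoes our query ID in its first two bytes.
    return reply[:2] == query[:2]
```

You would call it with the DC's address and the domain name, for example `dns_server_answers("192.168.56.10", "mydomain.local")` (both values are made up for illustration), once with the firewall on and once with it off.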

For some reason, after being confused, I decided to try and restart the DNS service. Guess what? It all started to work as expected and I could join the second machine to the domain!

WTF?

I have no idea what I had done to the DC to make its firewall basically block everything even though it wasn't supposed to. I also have no idea why restarting the DNS service would make it work but I have decided that I dislike this whole area. Look through the firewall and there are loads of services, many of which I have never heard of and which might be required for something useful or not. Other services require a plethora of weird and unrelated ports to be opened and through all of that, all of the DC setup is carried out through a load of tree-view windows using old-fashioned languages and millions of dialogs. There is no distinction between the settings you are likely to be interested in and the ones you aren't, and most of them have no way to restore defaults for those moments when after trying 100 things, you eventually fix it and want to reset everything you tried in the process!

Friday, 22 May 2015

Cyber Security. It's really hard but it's also not that hard!

On December 20th 1995, American Airlines flight 965 crashed into a mountain in Buga, Colombia while en route from Miami to Cali airport. The aircraft was fully functional and the crew both experienced and conscious but a series of errors led to the deaths of 159 people - only 4 passengers survived.

The flight departed 2 hours late from Miami due to both waiting for some connecting passengers and then, missing their slot, a host of other flights had to depart before another space was open. In some ways, this was the foundation for everything that followed. Being in a rush is rarely a good thing in aviation since it makes you skip things that you would normally do and more importantly, it reduces the thinking time you have when unexpected conditions arise in the air. It is also serious because of FAA rules on working hours and breaks, which although well-meaning can have the effect of causing people to rush even more to avoid an embarrassing situation, such as Flight 965, where such a delay would have caused a knock-on delay in the subsequent departure from Cali in the understandably but not ideal short turnaround times that airlines plan their schedules around.

The flight was largely uneventful until it arrived close to Cali and was planning its final approach. Cali, being under civil conflict at the time, had lost its radar system so it had no visibility of approaching aircraft, relying instead on the pilots to inform the tower of their position. The approach should have been largely textbook. It involved approaching a beacon called TULUA, after which another called ROZA close to the airport would be tracked, after which the plane would have passed the airport, turned and landed from the south. The approach here, as in other airports, is critical because the airport is in a valley surrounded by 4000m high mountains and it was night time. (I think sometimes, we developers feel like we are always flying at night!)

A second factor now comes into play. The air traffic controller was Colombian and spoke Spanish as his first language. There was no suggestion that his English was not understood but some confusion as to the language used caused the Captain to make the first of a series of errors. The flight was told that they were "cleared" to ROZA and to report TULUA. The intention here was that although they were still supposed to fly the approach as planned, they were clear all the way to ROZA (they didn't need any more permissions) but when they flew over the TULUA beacon en route, they would inform the tower so he knew where they were. The Captain heard, "ignore TULUA and go straight to ROZA", causing him fatefully and incorrectly to delete the TULUA beacon from his flight plan.

The tower also informs them that because the wind has died down, they can land directly from the north if they want to. Of course they want to, they are 2 hours late and that kind of thing is helpful to save some minutes and potentially avert any further delays. They are, however, too high to make this a simple procedure and against the background of being rushed, they deploy the speed brakes to enable the flight to descend more quickly and continue their approach.

A series of more confusion leads the two pilots to decide to put ROZA back into their Flight Management System (FMS) and go there as part of the ROZA 1 standard approach.

Entering a new waypoint is usually done before the engines have even started, from the comfort of the ground and without anything else distracting you. You can think more clearly about each decision and the FMS can warn you about any discontinuities - that is, waypoints that don't appear to logically connect and which therefore are not permitted. Do the same thing in a rush in the air and you are presented with a list of all the waypoints listed under the letter R (in this case for ROZA) and you would normally assume that the top one is the closest to the current position and therefore the correct one in this situation. It is not. It turns out that for unknown reasons, ROZA cannot be found under R in the FMS on the 757 aircraft - it would be found only by its full name. The Captain doesn't know this and blindly selects this new waypoint, executes it (without checking with his First Officer as process requires) and the plane starts a strong left bank to point to a waypoint that happens to be 100 miles away in the wrong direction near Bogota.

Remember, this is nighttime and it is not always obvious when you are turning. They are also still descending and they are also still confused about what is going on.

They reach a point a couple of minutes later where they realise that something is not right with their heading and they can't seem to agree how to reset their bearings and bring them back to the normal approach. The Captain tries TULUA but can't find it on his radio so instead plots to ROZA on his NAV radio instead of the FMS - which gives him a much more unambiguous heading. What they haven't done is kept an eye on their flight, which has now descended so far that they are on the other side of a high mountain from Cali but without visual references, they don't know this until an alarm suddenly blasts the cockpit with a continuous "Terrain, terrain, pull up". They quickly throttle up and pull back as they have been trained to do but it is too little and too late. The flight crashes into the side of the mountain. In a scary twist, investigators find that the speed brakes are still deployed from earlier and that if they hadn't been, the pilots would probably have cleared the mountain!

Why have I told this story? Firstly, I find flight crash investigations incredibly interesting but also there are clear parallels with the software industry and particularly Cyber Security.

What if I said to you that the incident above was unavoidable? What if I said that there was no practical way that the plane could have avoided crashing? I hope you would disagree. Of course it could have been avoided. In fact, 99.99999% of flights avoid this every day by following procedures, learning from other people's mistakes, working out where weaknesses lie and doing something to mitigate the risks that they carry.

How is this very different from Cyber Security? A breach is rarely caused by a single thing but by a series of events which, when added together, create the opportunity for an attacker to take advantage and for your site to crash.

Sure, in some ways Cyber is very difficult because there are many different attack vectors, also many different types of attack vectors. One person rarely has enough expertise to understand all of them (except in Hollywood movies) and it seems every day there is some new exploit or malware or weakness. Also, there are sometimes advanced, persistent threats. These are not easy because they occur over periods of time and exploit human factors as well as systems but they are still systems that can be quantified, risk assessed and mitigated. You can still create processes that help and don't hinder your security.

So what can we learn from flight 965 and how can Cyber be easier instead of harder?

You need knowledge. With the best systems in the world, if you do not have one person who understands the basic attack surface of each type of system that you expose to the web (hardware, software, remote desktop, web application etc.) you will not be able to have any security assurance. Even though external contractors can help you, you still must have someone in-house that understands broadly what's going on - a domain expert. In flight 965, this was the flight crew. You cannot get a non-pilot and expect him to fly a plane safely, even with everything that is known about the subject being available to that person.

Secondly, you need a good map. The pilots would not have been able to even attempt a night landing without their maps, including the beacons that specify waypoints to safely navigate the valley. Yet in our companies, most of us have, at best, a general idea of our systems and how they are connected and exposed (or not) to the outside and inside world. This is partly a software issue since the only programs I am aware of that present this sort of data tend to be expensive networking tools from people like Cisco and HP, not the kinds of software that most people can afford. There is also, sometimes, an assumption that things like network monitoring tools are non-productive and are therefore a luxury. Why pay someone to maintain something that if done properly is never an issue? You might as well employ someone to keep an eye on your office carpets. Of course, this logic is the same for many things that only betray their value if something goes wrong and most of us are fortunate enough to either never get attacked or not to find out that we are.

Thirdly, you need processes and procedures. Is someone allowed to spin up an FTP server on one of your web servers without any oversight or approval or risk management? Could you imagine if the pilots of airlines were allowed to fly in their own way depending on what worked for them? "I always fly faster than planned to give myself some slack if I hit a delay", "I never follow that route exactly because I think it goes too close to those mountains". How are your networks wired? Do you have anything that ensures that things only get connected because they have to, not just because it is easier than buying another firewall or another web server? Do you have any code development checklists and approvals to ensure that someone - who might have all the right skills - hasn't forgotten something and opened up a hole?

The fact is, sometimes just one of these measures will be enough to stop a hack in the same way as any one of the various links could have prevented flight 965 from crashing. Of course, it is better to have several measures - defence in depth - so that we can afford for one to break under some certain conditions and rely on the others to help us. We do, however, need to know when the individual checks break so that we can see whether those measures are fit for purpose. If flight 965 had retracted the spoilers and got over the mountain, an investigation might have decided that the processes for verifying flight plan changes or even the language used between Air Traffic Control and aircraft needed tightening up for this specific scenario.

Whatever happens, do something. Your worst processes are probably 100 times better than nothing at all but if you have an improvement mentality, you start where you are, you learn from yourselves or others and you improve things over time.

So Cyber Security...it's not that hard!

Thursday, 21 May 2015

Sell like Steve Jobs

It is common in the startup world to compare yourself to others. Why did they succeed when we didn't? Why did they sell the company for £10M and we can't even sell something for £50?

We try and evaluate it and ask, is our team good enough? Are we in the correct location? Are we listening to customers and meeting their needs? Are we targeting the correct market?

I think lots of these refer to my previous post about mistakes we make when we look too closely rather than taking a step back. Firstly, there is not necessarily a repeatable reason why some companies succeed when others don't. Some companies produce garbage but seem to be in business, others have a great idea and it doesn't bite. You might call it serendipity, which is a posh way of saying luck.

Maybe you were in the right place to meet that key person, maybe you met the first large customer by chance in a conference, maybe you have some good money men friends who made your company look good, not because it was but because that's what money men do.

So let's not worry about the whys and look at it in a different way. We need to sell like Steve Jobs. Imagine your product was made by Apple. There are certain things that you know would be true.

  1. You would be ruthless in your design. "Alright" wouldn't be an option and although you might not get everything in that you wanted, everything that was in there would be optimum at least as far as any of your customers are aware. This might require a ruthless system of employment that rewards success and fires failures - something that most of us don't like.
  2. Your user experience would be familiar and consistent across any other products that you have. People need to feel that they are buying into a bigger family, even if they are getting the cheap version of something. Remember the iPod? It was like a small iPhone.
  3. You would make sure that you know what people will buy. You might do this by various research or you might have a visionary who can just see it but once you have decided it, you will not deviate and you will never, ever, doubt the value of your product. Steve Jobs wouldn't be on a stage even hinting that the new MacBook might have been a mistake. You have put too much effort in at this stage to second-guess yourself.
  4. When you launch, you will not even think about suggesting that your product is sub-par or has features that will come later because everything that is in it has been done to perfection.
  5. You will not care about your customers' opinions on whether they would have made it differently because you are the expert and what you have produced is quality. If they don't think they need it then they are either stupid (don't tell them that) or they haven't caught up yet with the next big thing. You will find plenty of doubters and critics but who cares? Don't pretend you can ever avoid that.

We all know that Steve Jobs was known as an arrogant man but is there a difference between arrogance and extreme self-belief? He didn't need to please people because he knew what he was doing and did it well.

Interestingly, Apple have taken a few knocks in the past which is weird for a company who most of us assume cannot put a foot wrong but that's just the luck again.

So if you want success, you can do much, much worse than sell like Steve Jobs.

Why can't everybody back off?

I am not talking about people leaving me alone, I am talking about people being able to quantify a problem from the right distance. This affects software development but also affects many other decisions we make in life.

If you had a leak in a water pipe and someone offered you a bucket, you would use it temporarily but you would know (hopefully) that the only sensible long-term plan is to fix the leak. That is an example of looking from the right distance - the actual problem is the leaking pipe and not the water dripping onto the carpet.

When we look at other areas of life, however, we note very quickly that people seem unable to look from the correct distance. We get too close to something and then we either miss the bigger issue or otherwise we get too stuck to our particular view and therefore become less and less able to make good long-term decisions.

In the UK, it is NOT unlawful to park on the pavements (sidewalks) unless you are causing an obstruction, a definition that is rarely invoked because it is too abstract. As a result, we have many pavements that are broken from vehicles parking on them - particularly heavy vehicles. So what do the Councils do? They look at the symptom of broken pavements and decide, quite rightly, that it is too expensive to try and keep them maintained. What do they do? Either nothing or they make some token repairs knowing full well that the pavement might be broken again within weeks. They argue - from the wrong distance - that it is an unwinnable position.

They are wrong.

When you look from the correct distance, you either decide a) cars should be allowed to park on the pavements, in which case, they need to be designed and built to withstand the weight involved or b) cars should not be allowed on pavements, therefore the law should be changed. (You might also decide to mix a and b in different areas). The idea that pavements are not strong enough but cars are allowed to park on them is neither sensible nor logical but is the outcome of an evaluation that is too close to the symptom.

In the software world, many organisations have had, and continue to have, massive cost and time overruns on projects (not quite software but the F-35 project is an obscene example of this). Why does it happen? Because the people who order these systems foolishly believe that a) they need the system and b) the contractor is competent, therefore it will all be fine. Why is this foolish? Because we all know many projects that have failed miserably and many of them were probably run by people much more competent than you and me, and yet they failed. The mistake is that the issue is not viewed pragmatically, from a distance that says, "why did these projects actually fail?" I don't think that it is a hard question to answer. If you ask most people, they would tell you what went wrong: unclear requirements, changing requirements due to long-duration projects, lack of expertise from people in charge of design or requirements, inventing "new tech" that is unknown - sometimes you don't even know if it might work - and pricing based on gross estimates.

It happens time and time again and the question is, "How do we act differently so that the outcome is different?" I don't know who said that madness is doing the same thing in the same way multiple times and expecting a different outcome.

Maybe the answer is that projects should never last more than 12 months. Maybe if something is larger, it needs to be developed in stages, each of which is a deliverable in its own right. Why wait for the F-35 to build a new super helmet? Design and build one. If the F-35 dies, we use it on the next plane. New engines? Same. Maybe the answer is that the whole way a project team works needs to be reduced and simplified. Maybe a domain expert needs to be involved in every decision making process rather than assuming they are only needed when we get to design stage.

All of these decisions can be taken if we are able to recognise we are too close and take a step backwards and the same is true in Software Development and other singular job roles. Is what I am producing of a suitable quality? If not, I definitely need to recognise that and I then need to ask what is going on. Am I trying to do too much? Am I lacking process or independent review? If I do it again, will it be better from what I've learned? One of the problems is that we don't teach people this, it is something that some people know instinctively, something that others have learned the hard way and something that some people don't even get - but they still write code!

I don't know what it is about humans that we seem to always do it wrong. Are we too egotistical? Perhaps we care too much and want to make something work at all costs - even if it is taking too long and costing too much.

I just wish we would learn how to back off.

Wednesday, 13 May 2015

Encryption - so necessary but so dangerous?

The public are an alarmist lot. Despite the fact that most people spend their time online sending pointless messages to each other on Facebook or reading rude jokes, as soon as there is the possibility that GCHQ or the NSA can read your data, everyone gets up in arms. How dare they read about my visit to a Birmingham shopping centre or my latest status update involving a large glass of beer?

So we end up with quite a wide deployment of SSL/TLS. Not such a bad idea. For most of us, especially those in business, the cost of an SSL cert, although much greater than it needs to be, is fairly cheap in the scheme of things. The server overhead is minimal and everyone's happy, right?

No. Of course not.

Enter the Systems Engineers.

You see, TLS is all OK and everything but what happens after the TLS is terminated at some server somewhere? How is the data transmitted around the data centre or stored on disk? Your password is usually sent to the server to log in so even though it isn't stored in a readable format, it could still be intercepted reasonably easily when you are in the process of logging in. Even if you have it set up wonderfully well, I'm pretty sure the NSA can sign their own certificates and could presumably man-in-the-middle most sites without most people spotting it so what can we do?

Apparently we should encrypt everything at rest and encrypt our comms end-to-end from the browser itself right through to some trusted other end.

So now we're good?

No. You see theoretically, if someone has a computer the size of the moon and 50 years, they could potentially crack your Facebook traffic because it uses TLS 1.0. TLS 1.0? You still use that ancient, weak, crackable scheme? You might as well just send stuff in plain text. Or so some of the Systems Engineers would have us believe. The reality, of course, is that most of these theoretical weaknesses are so hard to achieve they are only of interest to people who have things of real value to crack. No-one is going to spend a year trying to access my Facebook account - although they will spend that time trying to access Lockheed Martin or Boeing.
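For what it's worth, if you do the risk assessment and decide that old protocol versions matter for your system, refusing them on the client side is not a big job. Here is a minimal Python sketch using the standard `ssl` module. The helper name `make_client_context` is my own invention, and picking TLS 1.2 as the floor is an assumption for illustration, not a recommendation for every system.

```python
import ssl


def make_client_context(
    min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2,
) -> ssl.SSLContext:
    """Return a client-side TLS context that refuses protocol versions below min_version."""
    # create_default_context() turns on certificate and hostname checking.
    ctx = ssl.create_default_context()
    # Anything older than this (e.g. TLS 1.0) will fail the handshake.
    ctx.minimum_version = min_version
    return ctx
```

You would then pass the context to whatever opens the connection, for example `ctx.wrap_socket(sock, server_hostname=host)`. Whether it is worth the effort is exactly the kind of thing you should risk assess.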

We are running to stand still, the perfectionists are dictating good-practice and we all get sucked up into it. Including me.

I needed to re-install OS X on my MacBook (although according to one blogger, I should never have to do that because it has never been necessary for him to do it!) and I was REALLY careful to back up everything and to double-check that the backups at least appeared to be on the backup disk, some Western Digital NAS box. It all seemed good and after taking a deep breath, I took the plunge and reinstalled Yosemite.

And then I accessed the Time Machine backup. Which I had encrypted, of course. I'm not a privacy junkie but I have work code on my home laptop and I would be more comfortable if I had the NAS stolen to know that it's encrypted.

But I couldn't remember the password. I tried all the usual ones and nothing. I can't even get the "hint" because that's stored in Keychain and the MacBook is reinstalled. Apparently there is no recovery process because these systems are designed to be perfect.

And that brings me to my point. Why do we insist on perfect security? Our houses are not perfectly secure by a long shot and they are much more likely to be attacked than any of my software systems. My front door key can be easily copied, the lock could be bumped or snapped fairly easily, the windows could be shattered and an intruder could get in so easily but somehow I live with that risk. In the computer world though, we are told that this risk is not acceptable. We are not taught, certainly by the vocal engineers, to risk assess what we are doing. Things like encryption key rotation and such like are all very well but are they really necessary or do they just increase the chance that they will cause us to be stuck with a lost key and an inaccessible system?

Wednesday, 6 May 2015

Why has Google lost the plot?

I am not a designer but like most software developers, I know there are certain simple rules that you should adhere to when creating web applications. Many of them make sense visually, such as using a few complementary colours and having consistent font sizes, others are just practical, such as consistency with other web sites, obvious navigation and not overloading the user with too much information.

I would expect a company like Google to be top of their game at this, especially with their continuous release of new "Beta" applications, whether that is Google Docs or the new Contacts interface.

I am wrong. Google seem to have lost the plot. Let me give you some examples. I opened my GMail contacts to add a mobile number for someone. I didn't know if they were in my contacts or not, so let's start by typing into the search box:



Has this worked? It looks like it's still waiting but this is multi-billion dollar Google's attempt at a "no results found" page.

OK, so he doesn't exist, I delete the name and go back to the main contacts page.



I've blacked out the details but you get the idea. I need to add someone new. Remember what I said about obvious navigation? Where am I drawn to? Top left where the menu is? Nope. Top right? The Google shared toolbar? Nope. Tabs in the middle, near the search box? Oh - it's that lone button in the bottom-right by itself! The button which is not near any other navigation! Great. Let's click that.


Now you get this extremely bland form, which is mostly distracting since, in this case, I have only entered a name, which you can barely see amongst the noise. I also have a mobile number. Now where do I put that? At first, I thought it wasn't on the form, but it is. Underneath really useful things like nickname and job title. So you enter it and then what?


Where's the done button or the X to close the dialog? Nope. You have to press the "Back" arrow. Of course, because back is not confusing at all. It couldn't possibly mean go back to the edit.

Now I'm more than happy that certain trend-setters push boundaries and maybe this is Google's attempt at Material design which makes it all nice and cross-platform but honestly? This is crap. It beggars belief that, with the number of people who work at Google, they cannot get something much more consistent and instinctive after spending what must be a vast amount on development and testing of this new Contacts form. It honestly looks like something a design or developer student would produce and then be told by their teachers to go and make it good!

Why Google? Are you really the new Microsoft where you exist in your own version of reality where you don't care what is good, just whatever you feel like doing and people have to use it because you have them hooked on GMail?