Happy Christmas to my blog readers (圣诞快乐 – Merry Christmas)

It's hard to believe that we're back to Christmas time. I just wanted to take a moment to thank all those who've read my blog during the year and look forward to great interactions again next year.

It's been a big year for us. I've moved to a new blog, moved to a new website, and moved to a new house. All have been "entertaining" but I'm happy with the outcome in each case.

I hope you all have a happy Christmas period, no matter what your belief system is. For most it's a holiday period regardless.

If you are travelling, take care and travel safely. We had another tragedy in Melbourne yesterday, and it just shows how fleeting life can be.

圣诞快乐 (Merry Christmas) to my Chinese readers too. I've made quite a bit of progress with my Chinese this year. The most interesting (yet sad) thing I did was to MC a memorial service for a friend's father, in both English and Mandarin. That was quite a challenge, as many of the things I had to say were in very formal language, but I think I got through it OK. I'd never spoken to a crowd in Mandarin before, and particularly not a crowd of native Chinese speakers. I just hope I didn't mess it up too much.

Thank you all for your support.

Opinion: Designing Databases to Minimize Damage During Application Intrusions

Intrusions into computer systems are happening all the time now. We need to address this as an industry, but it's important to understand that the way we design databases plays a big role in how much impact an intrusion has.

If you don’t accept that you could have an intrusion, you are living in La La Land. (See https://en.wikipedia.org/wiki/Fantasy_prone_personality)

A bug in any one of the frameworks you use, the code you write, the protocols you rely on, or the operating system or hosting services you run on can potentially expose you to an intrusion.

So do we just give up?

No, what you need to ensure is that when an intrusion occurs, the damage or impact is minimized. We do this in all other industries. For example, people working in high locations don’t expect to fall but they (generally) make sure that if they do, while something nasty might happen, it won’t be disastrous.

I routinely see web applications and middleware that can access any part of a database they want. Developers love this as it's easy to do, but it exposes you to major risks. If the application is trampled on, you've opened up everything.

I always want to put mitigation in place and to limit the damage.

If your plan is to have your application connect to the database as a single user, and you make that user a database owner (dbo), a combination of db_datareader and db_datawriter, or (worse) a system administrator, then you don't have a plan.

A better plan is this:

  • Create a schema for the application – let's call it WebApp
  • In the WebApp schema, create only the views and procedures that define what you want the application to be able to do (i.e. it's basically a contract between the database and the application)
  • Create a new user (from a SQL login or, better still, a domain service account) for the application to connect through.
  • Grant that user EXECUTE and SELECT permission on the WebApp schema (and nothing else)
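As a minimal T-SQL sketch of those steps (the schema, object, user, and login names here are just placeholders, not from any real system):

```sql
-- Create the schema that forms the contract between database and application
CREATE SCHEMA WebApp AUTHORIZATION dbo;
GO

-- Expose only the operations the application is allowed to perform
CREATE VIEW WebApp.CustomerList
AS
SELECT CustomerID, CustomerName FROM dbo.Customers;
GO

CREATE PROCEDURE WebApp.UpdateCustomerName
    @CustomerID int, @CustomerName nvarchar(100)
AS
BEGIN
    UPDATE dbo.Customers
    SET CustomerName = @CustomerName
    WHERE CustomerID = @CustomerID;
END;
GO

-- Create a user for the application, with rights to that schema and nothing else
CREATE USER WebAppUser FOR LOGIN WebAppLogin;  -- hypothetical login
GRANT EXECUTE, SELECT ON SCHEMA::WebApp TO WebAppUser;
```

Because the grant is at the schema level, any view or procedure later added to WebApp is automatically part of the contract, and nothing outside it ever is.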

Then if the application is trampled on, the most that it can do is the list of things that you’ve defined in that schema and nothing else.

We need to start building systems more defensively, and this is reason #82938429 for why I just don't like most ORMs: they tend to encourage entirely the wrong behavior in this area. (Some begrudgingly let you do it better.)

Opinion: Why ask accountants and lawyers for IT advice?

If I want accounting advice, it's unlikely that I'd ask my dentist for that advice.

Many years ago, I created applications for food wholesalers. When the owners of these businesses decided to get a new or better computing system, invariably they'd speak to their accountants. I understand why that might seem logical to them at first, but what I saw when these clients did this is that they invariably ended up with the wrong systems.

Why?

If you talk to the accountants, their recommendations would often be based on how good the general ledger was. They wanted to make sure that the figures that came to them from the business were already in a good state.

But to someone selling meat, fish, or smallgoods, that's not the issue. It's far more important for the system to understand how they sell and price food, and how to track both quantity and weight rather than just one value. It's critical to have a system that lets them manage their warehouses properly.
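To make the quantity-and-weight point concrete, here's a hypothetical order-line table of the kind such a system needs (all names are illustrative): produce is often ordered by the carton but priced by the kilogram, so a single quantity column can't capture the sale.

```sql
-- Hypothetical wholesaler order line: both a count and a weight are captured,
-- because stock moves in cartons but is priced per kilogram
CREATE TABLE dbo.OrderLines
(
    OrderLineID int IDENTITY(1,1) PRIMARY KEY,
    OrderID int NOT NULL,
    ProductID int NOT NULL,
    CartonQuantity int NOT NULL,         -- how many cartons were shipped
    WeightInKg decimal(10,3) NOT NULL,   -- actual delivered weight
    PricePerKg decimal(10,2) NOT NULL    -- extended price = WeightInKg * PricePerKg
);
```

A general-ledger-centric package typically has no way to represent this, which is exactly why the accountant-recommended systems fell short.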

Very few of the systems recommended by the accountants did that. We often gained new clients who had made an initial misstep by purchasing what their accountant recommended. (And I'll ignore the situations where the accountant was also being paid a commission by the software vendor).

So why am I raising this today?

I spend a lot of time working in large financial organizations, and security is a big issue for them. However, what I see time and again is that they hire large accounting firms or legal firms to perform pen-testing (penetration testing), security audits of applications and systems, and so on.

It's hard to imagine why anyone would expect their accountants or legal advisers to be at the cutting edge of computer security. And as someone who's involved in training people from those types of firms, I know that they might try hard but I can assure you that they aren't anywhere near the current state of the art.

Perhaps they think these firms are large enough that they'd be a good litigation target if something goes wrong (even though you can be sure their terms and conditions would prevent that), or that it somehow "looks good to the market" to use a big name accounting or legal firm.

If I really needed to secure or test the security of a system though, I'd be looking to use a boutique consultancy that specializes in that type of work. There are many consultants who are outstanding at this type of work.

They are good at what they do, and I'll bet they don't offer dental advice either.

SQL: Database Design -> What's in a Name?

Just after I was born, my mother and father called me Gregory. Ever since then, everyone has called me Greg. And that included my parents. To this day, my mother calls me Greg and so did my dad while he was alive (miss you dad).

However, every time I need to fill in an official form, I have to write Gregory. I could change it to Greg legally, but I'm not going to do that. People who have changed their names will tell you that it can add even more complexity.

But I have to say that every time I get a letter from a bank, a utility company, etc., or every time I'm addressed by someone in a hospital or government office, they address me as Gregory. Each time they do, I momentarily think "who?".

Then it's obvious to me that as much as this person is trying to sound friendly, they haven't managed to do so. It immediately puts a barrier between us. Clearly they don't actually know me.

You might think "well what can I do about that?" or "how's that my problem?" or "what's this got to do with SQL?"

And I'll tell you.

Every time you build a computer system or database that has no option for a customer/member/client/patient/etc. to record what they'd like to be called, you add to the problem.

Please consider always having a PreferredName column or something similar in every design you create.
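As a sketch of what that looks like (table and column names here are illustrative only):

```sql
CREATE TABLE dbo.Patients
(
    PatientID int IDENTITY(1,1) PRIMARY KEY,
    GivenName nvarchar(100) NOT NULL,   -- legal name, e.g. 'Gregory'
    FamilyName nvarchar(100) NOT NULL,
    PreferredName nvarchar(100) NULL    -- what they'd like to be called, e.g. 'Greg'
);

-- Fall back to the legal given name only when no preferred name was recorded
SELECT COALESCE(PreferredName, GivenName) AS SalutationName
FROM dbo.Patients;
```

One nullable column and a COALESCE in the letter-generation query is all it takes to stop addressing people by a name they never use.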

Opinion: Don't Design Databases for One Version of One App

I've pointed out in previous blog posts that I'm not a fan of ORMs. What I'm even less of a fan of is code-first design, particularly in combination with an ORM like the Entity Framework.

It might seem cool and shiny, and if you're just whipping up a small proof of concept or test app, please feel free. But the minute you think of doing it for enterprise systems, just don't. And if you have colleagues wanting to do this, please just say no.

For most organizations, the data they own is one of the most valuable assets (if not the most valuable asset) the company has. The data will generally outlive generations of applications and just be morphed from shape to shape over time.

The data will often be accessed by many different applications, often created with different technology stacks. You might think you'll be able to corral all access to the data via your app, and again you'll be wrong.

So designing the data storage to suit the needs of a single version of a single application at a single point in time is what we baseball umpires know as "a big call".

Umpires know to make calls like this emphatically and confidently.

But this is not a call that you should be making. It's the quickest way to start building disconnected silos of information that don't represent the business or how the information in the business is inter-related.


Opinion: Don't just hire clones of yourself

Many years back, I was invited to chair a course accreditation panel for a local TAFE (Technical and Further Education) course. They had started to offer a computing-related 3 year diploma, and the hope was that it wasn’t too far below the 3 year degrees offered at local universities. One part of that accreditation process involved me discussing the course with the staff members who were teaching it.

After talking to almost all the staff, what struck me was how similar they all were. In the requirements for the course, there was a standard that each staff member needed to meet, but there was also a requirement for the group of staff to be diverse enough to have broad knowledge of the industry. There was no individual staff member that you could identify as not being at the appropriate standard, but almost all of them had exactly the same background, career progression, etc.

The manager had basically hired clones of himself. It's an easy mistake to make. If you feel you are the right person for a particular type of job, then hiring more people like yourself must help, right?

A similar problem happens in areas like medical research. Taking a whole bunch of people with the same background and experience isn’t going to let you cut through tricky problems that need someone to think outside the box. Adding someone like a civil engineer into the mix might seem odd but can have surprising outcomes. At the very least, they might ask a question that leads someone else in the team to think differently.

I’m remembering this story because I see the same issue in application development groups.

I’ve done some work at a company that has over 400 developers. Data is almost all they do, yet for most of the time the company has existed, they’ve had no one focused on data. Everyone involved in development has a similar development background. They had many intractable data-related problems, and more and more of the same type of people wasn’t going to solve them.

Hiring a team of people who think and work like you do might seem like a good idea but it’s not. You need a mixture of people if you want to be really effective. (And that also means having gender and cultural diversity too).

Opinion: Mature consultants don't always want to tear down the house

I work with data. I understand that for most organizations, the data they hold is the most valuable asset they own.

One thing I've learned from working with data is that unlike application software, data generally outlives generations of applications, is often used by many different applications, and typically just morphs from shape to shape over time. It almost never gets totally refreshed.

This is a good thing.

I've been in the industry long enough to see many types of consultants. The type I have the least time for is the one that always wants to tear down or replace whatever an organization already has. It's far easier to say "let's replace it all" than to work out what to do.

Many of these consultants don't really understand what's already in place, but because it looks different to what they are used to, it must be wrong, and it must be replaced.

A mature consultant might not like what they see but they take the time to consider what's already there.

The mature consultant is skilled enough to work out how to take the organization from a place that they don't want to be, to a place which is better than where they are now.

That's the real skill.

Opinion: Sticking with a plan even if you don’t like it

Something I really struggle with in this industry is when newcomers to a system want to change standards within existing systems because they think something else is better. It’s a sign of immaturity, yet it often applies to people who should be senior. Many system architects fall into this category.

For example, a vendor system that I’ve been working with has single column primary keys in all tables, and all the primary key columns are named PKey_ID (I’ve changed it a bit to protect the guilty). Now I can’t say I like that naming at all, but that’s not the point. There are a large number of tables that already have that naming scheme.

Enter the new architect with a purist view, who wants to name the columns ID instead. Again, I really dislike that naming, but I don’t care what the vendor does, apart from wanting them to be consistent.

Yet what the architect did was change things so that a handful of tables now use ID. That is a really poor outcome. If he really wanted to change them, he should have come up with a plan to change all of them.


I see the same thing in SQL Server.

We already had:

  • tinyint
  • smallint
  • int
  • bigint

And we already had:

  • smalldatetime
  • datetime

So when a larger precision version needed to be added, it wouldn’t have taken Einstein to come up with bigdatetime.


The answer certainly wouldn’t have been datetime2 which is what we got.

Oh, you say, but datetime2 is really a combination of the new date and time data types, and we already had a datetime data type, so it couldn’t be that.

Yes, but what then happened with datetimeoffset? It’s a datetime2 with an offset, so why isn’t it datetime2offset?

What’s the answer? I’m not 100% sure but I wish these teams had more cranky old dudes who look at proposed designs and say “nah, have another go at it”.


Opinion: ORMs: Are they a good idea when developing for SQL Server?

Many people know that I have an issue with ORMs (object relational mappers). It’s worth spending a blog post to describe why.

Unfortunately, I spend much of my working life on the back end of the messes involved. The following are the key issues that I see:

Potentially horrid performance


I’m on the back end of this all the time, and there are several reasons for it. One is that the frameworks generate horrid code to start with; a second is that they are typically quite resistant to improvement; a third is that they tend to encourage processing with far too much data movement.

I keep hearing that “the databases are getting so fast now that this loss of performance doesn’t matter”. I can tell you that I don’t live in a world where that’s true.

In the world that I live in, I regularly end up in software houses with major issues that they don’t know how to solve.

I was in a large financial organization in Sydney a while back. They were in the middle of removing their chosen ORM from their app because, try as they might, they couldn’t get anywhere near the required performance numbers.

Why had they called me in?

Because before they wrote off 8 months’ work for 240 developers, the management wanted another opinion. They were making the right decision but they wanted someone else to agree with them.

As another example, I was at a start-up software house last year, where a team of 10 developers had been building an application for the last four years. The business owner said that if it would support 1,000 concurrent users they would have a viable business; 5,000 would make a good business; at 500 they might survive. They had their first serious performance test two weeks before they had to show the investors. It fell over with 9 concurrent users. The management (and in this case the devs too) were panic-stricken.

For yet another recent example, I was in a software house that had to deliver an app to a government department. They were already 4 weeks overdue and couldn’t get it out of UAT. They wanted a silver bullet. That’s not the time to be discussing architectural decisions, yet the architectural decisions they’d made were the issue.

I’ve also seen the background processing jobs in a large UK-based financial services organisation. On a system with 48 processors, 1.2 TB of memory, and seven 20 TB flash drive arrays costing a million UK pounds each, a single task issued 550 million SQL batches, and almost nothing else would work well on the machine while it ran. The replacement job that we wrote in T-SQL issued 40,000 SQL batches and ran in just over 2 minutes. I think I can get that down to a tenth of that time with further work. Guess which version is the right answer? What had caused this? Re-use of entity-level code through a custom ORM.

Minimal savings yet long term pain


Many of the ORMs give you an initial boost to “getting something done”. But at what cost? At best, on most projects that I see, it might save 10% of the original development time, on the first project. But as David Connor pointed out in his excellent TechEd talk with Joel Pobar (and as I’ve seen in so many other places), the initial development cost of a project is usually only around 10% of its overall development cost.

So what are we talking about? Perhaps 1% of the whole project? Putting yourself into a long-term restrictive straitjacket for the sake of a 1% saving is a big, big call. The problem is that the decision is being made by someone who isn’t looking at the lifetime cost, and often 90% of that lifetime cost comes out of someone else’s bucket.

Getting stuck in how it works


For years, code generated by tools like Linq to SQL was very poor. And this was a tool designed fully for SQL Server. Now imagine what the code that’s generated by a tool that doesn’t even know which backend database it’s using looks like.

That’s where Entity Framework (EF) started. Very poor choices are often made in the design of these tools. The whole reason that “optimize for ad hoc workloads” was added to SQL Server was to deal with the mess from the plan cache pollution caused by these tools.
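For reference, that setting is an instance-level configuration option. A sketch of turning it on (it's an advanced option, so it must be made visible first); with it enabled, SQL Server caches only a small plan stub for a batch until the batch is seen a second time, which limits the damage from single-use plans:

```sql
-- Expose advanced options, then enable the ad hoc workload optimization
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```

It's a mitigation, though, not a fix: the pollution still happens, it just costs less memory.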

A simple example is that on the SqlCommand object, they called AddWithValue() to add parameters to the parameters collection. When I used to teach ADO.NET, I’d explain why using that method was a really bad idea. It provides the name of the parameter and the value, but no data type, so the data type has to be derived from the data. SQL Server would quickly end up with a separate query plan for every combination of every length of string for every parameter. This is plan cache pollution.
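You can see why this pollutes the cache from the SQL Server side. The declared parameter list is part of the cached plan's identity, so two calls that differ only in the derived string length get two separate plans (table and column names here are hypothetical):

```sql
-- 'Smith' is 5 characters, so the derived type is nvarchar(5):
EXEC sp_executesql
    N'SELECT CustomerID FROM dbo.Customers WHERE LastName = @p0',
    N'@p0 nvarchar(5)', @p0 = N'Smith';

-- 'Fitzgerald' is 10 characters, so this caches a second, separate plan:
EXEC sp_executesql
    N'SELECT CustomerID FROM dbo.Customers WHERE LastName = @p0',
    N'@p0 nvarchar(10)', @p0 = N'Fitzgerald';

-- Declaring the parameter with the column's actual type, e.g.
-- N'@p0 nvarchar(50)', gives every call one reusable plan.
```

One plan per distinct string length, per parameter, per query adds up very quickly across a busy application.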

And what could the developers do to fix it?


Nothing. Nada. Nil. NULL.

Because it was baked into how the framework worked. The framework eventually got changed a bit to have more generic sizes but still never addressed the actual issues.

Getting stuck with old code


SQL Server 2008 added a bunch of new data types. Spatial was one of my favourites. When did that get added to EF? Many, many, many years later. What could the developers do about it? Almost nothing except very hacky workarounds.

When you use a framework like this, you are totally at the mercy of what its developers feel is important, and that’s if they haven’t already lost interest in it yet.

When you code with a tool that targets cross-DB portability, you usually end up with a very poor choice of data types. You get the lowest common denominator of everything, or you end up with something that doesn’t fit well with the framework.

Many of the authors of the frameworks quickly lose interest. Do you want to be stuck with it?

Boilerplate code


“Oh I don’t want to write all that boilerplate code” I hear people say.

Sure, I get that, but are you incapable of code generation? At least give yourself control over how the code works. It’s mostly a one-time cost: you get to use it again, project after project. And at least if you don’t like it, you can fix it.

Summary

Look, I could go on for days about this stuff and the astonishing messes that I’ve seen.

You need to decide what you’re building. But at least consider where you are taking your advice from.


If you’re taking architectural advice from someone who hasn’t built real systems at scale, at least get further opinions.

If you are building toy apps (i.e. a replacement for CardFile), none of this matters. But I live in a world where it really, really matters, and so do the vast majority of software houses that I spend time in.

Future

It’s also worth noting that if you want real performance from upcoming DB engines, you might want to rethink the idea of just using the database as an object repository. Even NoSQL options like CosmosDB (formerly DocumentDB) have a concept of a stored procedure within the engine. You write them in JavaScript, but there’s a reason they are there.

In SQL Server 2016, this is even more stark. I have simple examples on SQL Server 2016 where a batch of code sent from an app executes in 26 seconds. If I move the table to in-memory, it runs in 25.5 seconds. If I move the code to a natively-compiled stored procedure, it runs in 60 milliseconds.
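A minimal sketch of that pattern (the names are illustrative, and a database with a MEMORY_OPTIMIZED_DATA filegroup is assumed; this is not the actual code from those tests):

```sql
-- Memory-optimized table
CREATE TABLE dbo.Readings
(
    ReadingID int IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
    ReadingValue decimal(18,4) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Natively-compiled procedure: the T-SQL is compiled to machine code,
-- avoiding per-statement interpretation and round-trip overhead
CREATE PROCEDURE dbo.InsertReadings @NumberOfRows int
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    DECLARE @Counter int = 0;
    WHILE @Counter < @NumberOfRows
    BEGIN
        INSERT dbo.Readings (ReadingValue)
        VALUES (CAST(@Counter AS decimal(18,4)));
        SET @Counter += 1;
    END;
END;
```

The dramatic speed-up in my examples comes from the last step: the in-memory table alone changes little, because the cost is in shipping and interpreting thousands of individual batches, not in the storage engine.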

Rett Syndrome Research needs help

Very proud of my daughters this week. As some of you will know, one of my daughters has Rett syndrome. It's a rare chromosomal disorder that basically only impacts girls. (Boys can get it but they usually do not survive long after birth). Girls develop fairly normally up to about 12 to 18 months of age, then regress markedly. As it affects only about 1 in 10,000 girls, it's not the sort of thing that gets much research funding.

So my other two daughters have decided to raise some funding to help with a local Rett research program. If anyone is interested, you can make a donation directly here: https://www.mycause.com.au/…/fundraiserforrettsyndromeresea…

Alternatively, my youngest daughter is organising a staged musical for early next year (in Brisbane), with all profits again going directly to the Rett research team. I’ll post more about it closer to when tickets for the musical are available.