Opinion: What's with the lack of coding standards in Data Science?

I've been spending a lot of time over the last few years working through data science and AI topics. One thing that's struck me consistently is the total lack of reasonable coding standards in almost all the sample code that I see.

I was doing an AI lab in eDX recently, and one of the questions got me to open some sample Python code for a virtual environment, and asked me to work out how the virtual world that it created operated.

After working on it for quite a while, I realized that the #1 reason I was finding it hard, was not because the concepts were crazy difficult, it was because the person writing the sample thought it was reasonable to have variables, arrays, etc. with names like r, x, np, d, and so on.

What's with that?

Suddenly it felt like I was reading code written by a self-taught programmer in 1970, at their first attempt at using Basic. There is absolutely no need for anyone to be doing this.

Please don't.

I was left wondering who on earth would write this and interestingly enough, found that the person who translated the environments was in fact self-taught. I admire his efforts in teaching himself but this is not acceptable code to be sharing with anyone else.

There is no reason for data science or AI code in Python, R, or whatever language to be written like this. (Yet I see it all the time)

SQL Server Q&A: Does SQL Server have a boolean data type? Is that a bit?

Simple answer: No

SQL Server has a concept of boolean data type but it doesn't have an actual boolean data type. I can't declare a column with a boolean data type, and I can't use it that way. However, the concept of boolean data is present.

For example, if I write the following:

SELECT *
FROM Sometable
WHERE Somecolumn = Someothercolumn

the outcome of that comparison is TRUE, FALSE, or (don't forget) NULL. And remember that NULL isn't a value; it's the lack of a value. WHERE clauses are satisfied when the value of the predicate is TRUE.

So, there is a boolean concept in the language.

However, note that I cannot write this:

SELECT Somecolumn = Someothercolumn AS TheSameValue
FROM Sometable

I can't just return a TRUE or FALSE value.

If I write it without the alias, it will work:

SELECT Somecolumn = Someothercolumn AS TheSameValue
FROM Sometable

but in that case, I'm returning the value of Someothercolumn aliased (named as) Somecolumn. I'm not returning the comparison.

But isn't that just the same as a bit?

No. I can't use a bit value like a boolean. For example, I can't just write this:

IF ISNUMERIC('somestring')
BEGIN

END

Instead, I have to write this:

IF ISNUMERIC('somestring') <> 0
BEGIN

END

Note that I could have written "= 1" but I prefer with bit values to always compare them to zero across all languages, given some languages use -1 for TRUE and others use 1 for TRUE.

 

Learning Mandarin: Tones are a significant challenge

Before learning a language like Mandarin, people might have heard that Chinese is a tonal language. But what does that mean exactly?

In Mandarin (as in many other languages), the way that a syllable is pronounced determines its meaning. Mandarin is typically regarded as having four tones, plus an additional neutral tone. The tones are numbered from 1 to 4. The first tone is a flat tone. The second tone is a rising. The third tone is a fall and a rise. The fourth tone is a fairly sharp drop.

I mentioned previously about keyboard input and showed what happened when I typed a syllable. Let's take a look at "ma" for example. When I type it, I get the following pop-up:

These (and others) are all "ma", but they have very different meanings.

Some just have different tones. This is similar to "lead" (the metal) and "lead" (what leaders do) in English.

Others have the same tone but different meanings. This is like homonyms in English. For example, lead (the metal) and led (what leaders did).

The pop-up doesn't show the tones but it shows the characters. The first character is similar to a question mark; the second is a horse; the third is a mother; and so on.

These are really quite unrelated words but notice that each of these three has the horse character within it. These are called radicals.

So the challenge with a tonal language is that you really need to get the tones right, and from pretty early on.

In English, if someone said "it contains lead" but pronounced lead like leed, you might work out what they meant to say, even though you might stumble a bit.

But my experience with Mandarin is that if you say the right thing, but get the tones wrong, people stare blankly at you. It's important to understand that if you say ma (3rd tone) instead of ma (2nd tone), the listener won't hear "ma" and mentally work out how you've screwed it up. You thought you said "mother" and they heard "horse". To them, they sound like completely different words.

Here's a great example from Wikipedia, showing the four main tones and the word "ma":

Learning Mandarin

I'll write more soon on the best methods for learning. If you want to get a taste for it in the meantime though, my current favorite is Tutor Ming. If you decide to try it, click here and it's a bit cheaper for you, and for me.

Book Review: The Rosie Effect – by Graeme Simsion

I mentioned in a previous review about how much I enjoyed The Rosie Project by Graeme Simsion. I couldn't wait for the sequel to come out and so I eagerly read The Rosie Effect: A Novel as soon as it was released.

The first book introduced us to Don Tillman, working on genetics at a local university and trying to help Rosie find her father. All along, he's trying to find a wife using scientific principles.

In this book, we move ahead to where Don is now a dad. Nothing goes quite to plan. And of course that's the way it always happens with children.

I clearly remember a friend of mine whose wife was a project manager, and she had planned out the entire pregnancy and afterwards using Microsoft Project. My friend was allocated various tracked tasks to complete.

I won't say more than it ended in tears.

So it's no surprise that there are a few bumps ahead for Don in this book.

I didn't find this book as compelling as the first (I think that's a syndrome involving sequels) but it's still a way better book than most.

I really liked it.

Greg's rating: 9 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway 🙂

Shortcut: Connecting to Azure Storage and other services in SSMS

SQL Server Management Studio (SSMS) is a great tool for working with SQL Server relational databases but it can do much more than that.

In Object Explorer, note that you can easily connect to other types of services:

For a long time, it has been able to connect to Analysis Services to manage SSAS databases, both tabular and multi-dimensional. It can connect to Integration Services but that's to the older style interface for SSIS. Nowadays, you should use the SSIS Catalog instead. There are a few items that you can configure via the Reporting Services connection as well.

One option that is often quite unexpected though, is that you can connect to Azure Storage. Here's an example. After I click Azure Storage, I'm prompted for the storage account and the key. This can be the primary or secondary account key. (Note that it can't just be a shared access signature connection).

From there, once the connection is made, you can drill into the containers and their contents:

Generally, I would use Azure Storage Explorer to manipulate these storage accounts, but this option can be useful if you are using BACKUP TO URL and you need to just check the backup files that you are creating.

SDU Tools: Convert a hexadecimal character string to an integer in T-SQL

I mentioned in a previous post about how I sometimes need to work with binary strings in SQL Server using T-SQL.  The literal values are hexadecimal strings. T-SQL doesn't have a simple function to just convert one of these character-pair strings to an integer.

In our free SDU Tools for developers and DBAs, as well as the HexCharStringToChar function, we added a function HexCharStringToInt to do just this.

You can see the outcome in the main image above.

You can see it in action here:

 

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

Opinion: Passwords are a completely broken concept

For a long time, passwords have troubled me conceptually. I now believe that, as a concept, they are beyond broken. As an industry, we need to do better.

I might well need to do with more identities, passwords, multi-factor authentication options, etc. than the average consumer but I know it's beyond me to get this right, at least in the way that vendors, financial institutions, and source providers expect me to.

Let's look at at the "simple" requirements that we now ask people to comply with:

  • Use a complex password
  • Don't write it down or record it anywhere
  • Change it regularly
  • Use different passwords for every site that you deal with

Is that even humanly possible?

Is requiring someone to do something that almost no human could do, even legal?

I'd love to see it tested.

Then let's compound this with completely different complexity rules for almost every site. You can't have any sort of mental pattern of how to do this either. Some sites won't allow special characters, some won't allow more than a small number of characters, some want only alphabetic characters, some want numbers only, some want alphabetic characters and numbers mixed, some want upper and lower case combinations, etc. etc. etc.

Password Managers

And yes, I hear some say, everyone just needs to use a password manager. But while they can help, are they all really safe? What do you know about who wrote them? Is putting all your credentials into a single spot really a brilliant idea?

Password Rules

Many of the ridiculous rules that we confront users with on a daily basis are justified on the basis of "security", but how does a user challenge the validity of the requirements?

For example, forced password expiry has been shown time and again (by detailed research) to actual reduce security overall, yet how many organizations still force people to do this.

We need to do better

As an industry, we should be ashamed of what we've created.

SQL: Implementing Optimistic Concurrency in SQL Server with RowVersion

It's common to need to have a way to read a row of data from a table, to be able to modify it, and to then update it back into the table, but only if it hasn't been changed in the meantime. But we don't want to lock it in the meantime. That's the basis of optimistic concurrency. By contrast, pessimistic concurrency would hold locks the whole time. In SQL Server, you should use the rowversion data type to implement optimistic concurrency.

Early versions of SQL Server and Sybase had a timestamp data type. That was a really poorly named data type because it had nothing to do with time. It was an incrementing binary value that could be used to implement optimistic concurrency. Since SQL Server 2005, it has been renamed to the rowversion data type. That's good because it more accurately describes what it actually is.

Let's take a look:

First I'll create a table that uses a column that has a rowversion data type.

I've called the column Concurrency but it could be called any standard column name. It would be good though, if you named these columns consistently across your tables where you use them. Now let's insert some data.

Note that I didn't include the Concurrency column in my INSERT statement. You can't insert directly into that column. Let's see what got stored.

You can see that a different value was stored in each row. The values are actually quite predictable and the binary value just keeps increasing. Now let's look what happens when we update a row.

The value in the updated row has increased. Any time the row is modified, the value changes. But what if you make a change but leave the value the same?

Notice that the value changed again. So the issue isn't whether or not any column value changed; it's just whether or not an update was performed on the row.

This is perfect for an optimistic concurrency system. All I need to do is to read the value of the Concurrency column when I read other columns from the table, and when I update the row, I include a WHERE clause that checks the value is still the same. Then if no rows are matched for my UPDATE, I know that someone else modified the row in the meantime.

Having this support in the back end of the database avoids all the potential race conditions that might come from trying to implement this in code yourself.

As an interesting note, the last value used for the rowversion column is calculated at the database level, not the table level. If I create another table, the same set of values will just continue on. At times, we've used these to find all changes in a database. You can find the last value for a database by reading the @@DBTS system variable.

Learning Mandarin: What do the Chinese Lunar Rover and Mid-Autumn Festival have in common?

中秋节 (or Zhōng Qiū Jié) is the mid-autumn festival and it's this weekend. (Mid-autumn in China -> Northern Hemisphere).

It's also called the Moon Festival. In fact, it's celebrated on the 15th day of the 8th lunar month. Similar to the way that the date for Easter moves around, the mid-autumn festival will normally end up some time in September or October each year.

For many Chinese, this is a time for family reunions. The round shape of the full moon symbolizes this.

Chinese have long held the moon in great esteem. The legend of Chang ‘E is based around the wife of a tyrannical ruler who has lived on the moon since being flung out a window by her nasty husband. He was the king at the time. She is said to have lived there with her friend, a giant jade rabbit.

This legend is so embedded in the culture that the country named its lunar orbiters after it: Chang'e I, Chang'e II, and Chang'e III. In fact, their lunar rover was named Jade Rabbit (玉兔 or yù tù).

The traditional food for this festival is Moon Cake (月饼 or yuèbǐng). When I first tried this, I can't say I loved it but it has now grown on me somewhat. It's basically a very sweet and salty desert pastry filled with lotus seed paste or red bean paste. It might have a center of salted egg yolk (looking a bit like the moon). Because it's not to everyone's taste, many other flavors of moon cake are now made.

Anyway, this year the festival is this weekend so 中秋节快乐 (happy mid-autumn festival) to my Chinese friends !

Book Review: The Case Against Sugar – Gary Taubes

You would have to be hiding under a rock to have missed the current low-carb ways of eating, and particularly the ketogenic thinking about food. Central to much of this thinking though is cutting carbs in general. The hard part about this for most people, is cutting the primary source of carbs and that's sugar.

If ever there was a leader of the fight against sugar, it's Gary Taubes. His book The Case Against Sugar is very significant.

I remember him shocking people years ago by pretty much pronouncing sugar as toxic. When I look back over my own life to date, I think he's right. If there's anything that I wish I could go back and change, it would have been my various forms of addiction to sugar, in all its forms, including innocuous ones like bread and rice.

In this book, Gary presents a case directly against sugar and implicates it primarily as the the number one thing that's led the world into a Type 2 Diabetes epidemic. I'm one of those people, and when I make the best progress, it's when I've basically eliminated sugar and carbs.

The verdict isn't totally in, as you can imagine, but I think you'd be pretty hard-pressed to read or listen to this book, and not come to the same conclusion. And every single day, I now see the results of this in various forums that I'm part of.

As a book, it's compelling but I do think he belabors a number of points. I'm sure he's doing that to simply drive the main message home. But there are some sections where I think he made the same point again, and again, and again. That could have been less.

I really suspect that in the future, when the dust settles on all of this, Gary's thinking will be seen to have been seminal in starting to fix our current issues.

It's interesting that the sugar industry is now seeing the writing on the wall. Sugar was one of Australia's key exports in the past. Fortunately that's in decline. But I note with interest the pop-up stalls in shopping malls now where sugar is being defended with near religious zeal. I'm reminded of how cigarettes were defended when I was young.

Bottom line: If you have any doubts about the role that sugar plays in our lives, just read it or listen to it. You won't think the same way again.

Greg's rating: 10 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway 🙂