Fix: Incorrect status output in the SSIS Data Quality transform

I hear comments from people all the time that say it's not worth posting details on the Connect website as they don't get actioned anyway. While it can seem a frustrating process, and often take quite a while, improvements do come from there.

At a SQL Saturday event in Brisbane, I was speaking to an attendee (Ian Roberts) who was complaining that DQS was returning what he perceived as the wrong status. More importantly, the results from using the DQS client differed from using the SSIS transform. He described it this way:

He set up a domain for Australian States and one for Australian suburbs, based on files that he had downloaded from the Australia Post website. He then created a Data Quality Project to test this. In that tooling, the results were as expected. Hobarrt got transformed to Hobart; Victoria got transformed to VIC; Melburn, New York and NWS all got tagged as New as they couldn't be found. With the Record Status, New seemed to override Correct, and Corrected seems to override them all.

However, in the DQS Client Transform, since Correct overrides New, unless every field was in error, the record would go to the Corrected output.

The problem with this is that in the SSIS conditional split transform, you should just be able to test the Record_Status to determine what to do with the record, rather than look at the status for each field. Looking at the status for each individual field, it would make the conditional split transform unwieldy.

The other real issue is that the two tools should provide the same outcome. You should be able to look at the Record_Status, direct correct and incorrect records to their appropriate destination, then deal with the incorrect ones after the fact. Having the field status in the output would make application error reporting more meaningful as you could nominate the actual field in error. The main issue is that Correct gazumps New in SSIS, but New gazumps Correct in the DQS Client. That's not appropriate.

In one of my podcasts for SQL Down Under, I had interviewed Gadi Peleg from the DQS team and Gadi agreed that it looked incorrect. He agreed that the output should be:

•    If 1 of the values is Invalid – the record is invalid
•    If 1 of the values is corrected – the record is corrected
•    If 1 of the values if New – record is new
•    If all values are Correct – only then record is correct

Gadi encouraged me to log it on the Connect website.

Today I got an email to say it's been fixed. That's a great outcome, although the update doesn't indicate in which version of service pack the change will apply:

Leave a Reply

Your email address will not be published. Required fields are marked *