Azure Data Factory (ADF) – Issues with parsing strings in CSVs

It's scary how much of the world's data lives in CSVs. Yet, as a standard, it's problematic. So many implementations just don't work as expected. Today, I had a client asking about why Azure Data Factory wouldn't read a CSV that, while odd, was in a valid format.

The simplest equivalent of the file that wouldn't load, would be one like this:

There are meant to be four columns. The source system wrapped strings in quotes only when the string contained a comma, as that was the delimiter for the file.

But the fourth line in that file would not load. ADF would split it across multiple columns. It was not interpreting the double-double-quotes.

I was puzzled at first, as the file seemed ok. If I upload that file an any of the common online CSV file validators or linters, they all said it was ok.

The client had configured the dataset as follows:

The options seemed ok to them.

Turns out the issue was with the escape character. ADF defaulted to using a backslash. For the majority of CSV files that I see, it would need to be a double-quote:

And sure enough, with that change in place, it loaded files similar to the one above without issue.

I don't know why that's not the default for CSV files in ADF but I do hope it helps someone else.

2 thoughts on “Azure Data Factory (ADF) – Issues with parsing strings in CSVs”

  1. Thanks, Greg. This really helped me today to get data into SQL today. With html body in one of the columns backslash really messed things up when used as an escape charater. 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *