Azure Data Factory (ADF) - Issues with parsing strings in CSVs

It’s scary how much of the world’s data lives in CSVs. Yet, as a standard, it’s problematic. So many implementations just don’t work as expected. Today, I had a client asking about why Azure Data Factory wouldn’t read a CSV that, while odd, was in a valid format.
The simplest equivalent of the file that wouldn’t load, would be one like this:
First Column,Second Column,Third Column,Fourth Column
12,Terry Johnson,Paul Johnson,031-23423
13,Mary Johnson,"Paul,Johnson",031-23423
14,Mia Johnson,"Paul ""the beast"", Johnson",031-23423
16,Cherry Johnson,Paul Johnson,031-23423
There are meant to be four columns. The source system wrapped strings in quotes only when the string contained a comma, as that was the delimiter for the file.
But the fourth line in that file would not load. ADF would split it across multiple columns. It was not interpreting the double-double-quotes.
I was puzzled at first, as the file seemed ok. If I upload that file an any of the common online CSV file validators or linters, they all said it was ok.
The client had configured the dataset as follows:
The options seemed ok to them.
Turns out the issue was with the escape character. ADF defaulted to using a backslash. For the majority of CSV files that I see, it would need to be a double-quote:
And sure enough, with that change in place, it loaded files similar to the one above without issue.
I don’t know why that’s not the default for CSV files in ADF but I do hope it helps someone else.
2023-09-12