ADF: Passing a file attachment via a POST to a REST service using Azure Data Factory

I was recently doing work for a client where I needed to replace some Python code that was being used to upload a file attachment to a REST based service using a POST command. I was trying to do it using a Web Activity in Azure Data Factory. What was surprising was how little information I could find on how to do that.

While I can't post all the code, I did want to post the core parts here in case it helps others, and for my own future reference. The basic steps involve:

  • Put the file into a storage account
  • Read the file into a variable
  • Construct the body of the POST call, including the attachment
  • Send the POST call using a Web Activity
  • Test the REST-based call

Put the file into a storage account

Ideally, the Web Activity would have a way to just point to a file in an Azure Storage Account and get it to attach that file to calls. It doesn't have that option.

What it does have is an option to include a dataset in the body of the call.

Add dataset UI

I was immediately hopeful and tried that, and while it was interesting, it didn't include the data as an attachment. It just appended it into the body of the call.

So let's load up a file into a variable and use the variable to provide part of the body of the web call.

First, I've created a resource group called rest_post_test using the Azure portal, then created a storage account called greglowfileupload. I created a text file called cities.csv with the following contents:

List of cities

And I uploaded it to a container called uploadfiles.

Cities file in a storage container

Read the file into a variable

I created an Azure pipeline called RestUploadTest, and added the following parameters:

Pipeline parameters

The APIKey was provided by the REST service and is used to log on, instead of using other forms of authentication.

The SiteURL is the address that the POST is being sent to. I've used a test site that I'll talk about in a moment.

The ContainerName, FolderName, and FileName are the location of the file in the storage account. (It is not in a subfolder)

All of these values would normally be read from a database, but for simplicity, I've made them parameters to the pipeline.

I then added the following variables:

Pipeline variables

Next I added a linked service StorageFile to connect to the storage account.

Linked service definition

Note that I would normally use a Key Vault for holding key details for the storage account, but I've gone for simplicity in this demo.

To read the file from storage, we can use a Lookup Activity, but it requires a Dataset, so we create the dataset first. I called it StorageTextFile.  I like to make datasets reusable, so I added parameters to it:

Dataset parameters

Then I configured those parameters as dynamic values in the Connection tab:

Dataset connection tab

Dataset connection tab 2

Note that because I used a Delimited Text File as the dataset type, I had to configure a column and row delimiter. I chose values that will not occur in my file. I used the preview option to make sure I could read the data.

In the pipeline, I added a Lookup Activity called Get File Contents, configured the timeout as 20 minutes (not the default of 12 hours !) and set the details as follows:

Get file contents activity

The dataset properties were populated dynamically from the pipeline parameters.

Finally, I added a Set Variable Activity called Set AttachmentFileContents.

Set attachment file contents

The dynamic code for setting the variable was:

I used the output of the Get file contents activity, and got the first row from it. Finally, because I didn't use the first row as a header row, it allocated its own headers. In this case, I have one column called Prop_0.

And then tested the code up to this point. I clicked on the output of the Set AttachmentFileContents Activity and can see the values. (Encoded in JSON in this viewer).

First test ok

All good to here. We have the file loaded into a variable.

Construct the body of the POST call, including the attachment

The next two things I need to configure are the variables that are holding the content type and body for the web request. I put them in variables separately to make it easier to debug.

First, I configured the web request content type variable:

Web request content type

The value was actually:

This is multipart form data, and because the body can contain many values, you need a boundary that's used to separate them.

Then the body values:

Web request body

The value was comprised of:

Note: the positioning of the carriage return/linefeeds in this is critical. You can't just concat them by using '\r' or '\n'. I've found that doesn't work. The line breaks need to be in the dynamic values.

Also note: you'll see the boundary value in here, but the first time I used it, I added a double-dash prefix, and at the end, I used a double-dash prefix and suffix.

Now we have the pieces ready for the Web Activity.

Note on Line Endings

I found this works for most web servers, but some web servers are fussy about how new lines are formed. Some are happy to just get a line feed (LF) character, others want a carriage return and line feed (CRLF).

If you only supply a LF and it wants a CRLF, the error returned is normally one that says you've reached the end of the stream unexpectedly (or similar words). In that case, you'll need to adjust the body to have consistent line endings with CRLF.

The replace() function in ADF works fine, but you can't just specify '\n' or '\r', etc. I've found you need to use the base64 encodings to get what's needed. (Suggestion from fellow MVP Simon Sabin)

They are as follows:

  • Instead of '\r' write base64ToString('DQ==')
  • Instead of '\n' write base64ToString('Cg==')
  • Instead of '\r\n' write base64ToString('DQo=')

Send the POST call using a Web Activity

I then added a Web Activity called Upload the file attachment:

Web activity

Note the Authorization header contained:

That supplies the API key as the bearer.

Note also: you shouldn't need the Accept header but I've found some services complain if it isn't there. The other reason for having it set to application/json is so that when an error occurs, you can read it without having to decode it as you would with gzip compression, etc.

Uploading Larger Files

If you are uploading larger files, there are two things to consider:

  • I'm not sure what the current maximum size is for an ADF variable, but the method that I've used assumes that the whole file will fit in a variable. It's pretty large though.
  • In the advanced settings for the Web Activity, there is a setting for an HTTP Request Timeout. That defaults to 1 minute. If the upload will take longer than that, you'll need to increase it. It has a 10 minute maximum value.

HTTP request timeout

Test the REST-based call

Now we get to test it. What we need is something to catch the requests while we're testing. The easiest site I've found for this is webhook.site. You can set up a site that listens on a URL for free. It's awesome. When you start it, it gives you a temporary URL to use:

That's the URL that I used earlier as a pipeline parameter. Let's test the POST by debugging the pipeline.

First, we make sure the pipeline ran as expected:

When we check the website, we see the request received, and with the file attachment!

Post received

Lower-level debugging

While this site is great when debugging most things, at first I struggled with it. The problem was that when I had malformed requests, I wished that it would just show me the raw data that it was receiving. I needed to compare my requests to other well-formed requests. I asked the authors and it can't do that. (At least not right now)

What did help, is that I found a GitHub project from Janne Mattila, called Echo. You'll find it here: https://github.com/JanneMattila/Echo. It looks like Janne also has a hosted version of it available here.

I decided to build that project and deploy it to an Azure Website, in case I wanted to modify it. And it was so very helpful. It just shows you the detail of everything it receives, rather than formatting the request:

Echo output

Thanks Janne !

I know this has been a longer blog post than normal but I hope it helps someone work this out.

 

25 thoughts on “ADF: Passing a file attachment via a POST to a REST service using Azure Data Factory”

  1. Hi Dr Greg Low,

    It's a very good topic to discuss and thanks for writing this blog.
    and we too have similar requirement to handle the multipart-form-data with boundary. we tried the way you explained above, but somehow, it's not working, and I doubt because of new line(\n) feed in body part. it would be great if you can help us how to pass the new line char in Body dynamically as you mentioned and sample body part in Web activity Input.
    we tried as below;
    @concat('–d2f98be7a8485c3315a1ad3624ff2a09
    Content-Disposition: form-data; name="primaryKeys"
    ',variables('Primary Key'),'
    –d2f98be7a8485c3315a1ad3624ff2a09–
    ')

    web Activity Input the body going as below:
    "body": "–d2f98be7a8485c3315a1ad3624ff2a09\nContent-Disposition: form-data; name=\"primaryKeys\"\nid\n–d2f98be7a8485c3315a1ad3624ff2a09–\n",

    Thanks,
    Phani Kumar

    1. Hi Phani, try having \r\n as the line separator. At present in most places, you have \n only. Many web servers won't accept that.

      1. Hi Dr Greg Low,

        Apologies for the late reply as I was down with sickness.

        I have tried as below but still same issue. Only "variables ('Primary Key')" is dynamic, rest all i hardcoded.
        Body as :
        @concat('–d2f98be7a8485c3315a1ad3624ff2a09\r\nContent-Disposition: form-data; name="primaryKeys"\r\n',variables('Primary Key'),'\r\n–d2f98be7a8485c3315a1ad3624ff2a09–\r\n')

        which results in ADF Activity input as below :
        "body": "–d2f98be7a8485c3315a1ad3624ff2a09\\r\\nContent-Disposition: form-data; name=\"primaryKeys\"\\r\\nid\\r\\n–d2f98be7a8485c3315a1ad3624ff2a09–\\r\\n",

        seems I am doing something wrong. please help.
        Thanks,
        Phani

        1. Hope you're feeling better.

          Unfortunately, embedding the \r\n like that won't work. What I did in my code was to insert a new line in the editor. That then added a \n to the code. I then replaced all \n with \r\n but using the base64 functions that I mentioned. And yes, it's very confusing that it doesn't work like you expected. But there is something weird about how it handles \r and \n in those editors and expressions. Wish it wasn't so.

    1. Great question. And yes, there might be a way to do this. When I first did this, I used a Lookup to read the file, and I wasn't all that happy with it. But I realise now that I could have just used another Web Activity to read the file. If I get a chance, I'll create another example that does that. The next challenge is to then see if we can pass a binary file that way. I'll have to check out how they're meant to be encoded.

  2. Hi Dr Greg Low.
    This is very good and helped me a lot.
    But I am having issue at the lookup step. Why are you check "First row only"?
    Its output is:
    {
    "firstRow": {
    "External_Account_Number__c": "100000001",
    "Name": "Test",
    "Zip": 12345
    }
    }
    Then the set variable for AttachmentFileContents will only give the value of the column I specified ( I do not understand why you specified "Prop_0"
    @activity('Get file contents').output.firstRow.Prop_0

    1. It's because the whole file comes in as a single column in a single row. When you do this, it doesn't have a column name, and it calls it Prop_0.

      Based on what you posted, it looks like you're reading the rows, row by row, and parsing the file, not reading the whole file at once. Check the values you have for row and column delimiters. They should be values that do not occur in the file.

      1. Thanks SO MUCH for response.
        I think you are right, my existing dataset defined for DelimitedText is using "Row delimiter" as "Default (\r,\n, or \r\n)" and Column delimiter as "Comma (,)" instead of yours Row delimiter" as "\u0007" and Column delimiter as "Start of heading (\u0001)", which will read the whole file as string, right?

      2. Using a dataset defined like yours, then it works exactly like you have posted, thanks so much. I will continue next couple steps to POST it to Salesforce bulk API.

  3. Dr Greg Low,
    Thanks so much again, this post has saved me.
    In my case, my web endpoint is Salesforce BULK API.
    So Content-Type "multipart/form-data" does not work for me as I need to pass in text/csv. So I got errors from API endpoint about wrong file type and later about line feed "CRLF".
    I skipped both "SetWebRequestType" and "SetWebRequestBody" steps. And my Web Activity Content-Type just as simple as "text/csv" and Accept is "*/*".
    And the Body is raw string of @variables('AttachmentFileContents'); Later I skipped this variable step, from "Get file content" lookup directly to Web activity, using "@activity('Get_full_center_csv_file').output.firstRow.Prop_0".

    1. Glad to hear it's helped.

      You need to decide what the API wants. You need to use multipart/form-data if it wants an attachment. If it just wants the text/csv data in the body of the request, than you can do that by using the option to attach a dataset.

  4. Hi Dr Low, just wanted to leave a note of thanks for your blog post – it has been immensely helpful for a very particular requirement we've had to interact with a legacy API. Without your tip on the base64ToString encoding for newlines we would have been stuck. What a legend 🙂

    Just to be more specific on that point, our API didn't like Value1\nValue2, so we had to replace \n with \r\n using the following: @replace(variables('WebRequestBody'), base64ToString('Cg=='), base64ToString('DQo='))

    1. You are most welcome! I can't take credit for the base64ToString bit. That was Simon Sabin brilliance.

  5. Hi Dr Greg,
    We have tried the same example which you have created. We didn't give Api key as a parameter.
    We need the file attachment in the Api.
    In the webrequestcontenttype activity we have given the value as multipart/form-data;boundary =d2f98be7a8485c3315a1ad3624ff2a09
    The pipeline is running fine. But the file is not generating.

    Could you please suggest Me the solution.

    1. You need to use something like the echo tool that I mentioned, to make sure that what the API is receiving is in fact what you think you're sending. And use a tool like the other site I mentioned to check that you are actually sending an attachment. The main thing I've seen go wrong if the call looks right, is the line endings. (Also, is it definitely a text file that you're uploading? A binary file would need different work).

  6. Hi Greg,

    Thank you for your article as there is almost no info on the net how to parse a file with ADF

    One problem though, MS has a 4MB limit on an ADF – The length of execution output is over limit (around 4MB currently)

    So I am not sure if your way works for bigger csv files

    1. Glad to hear that helped.

      Sorry, can't answer on the larger files though. It's an area I haven't tested out ideas on, at least as yet.

      1. Hi thanks for responding I am glad ,
        I want to pass the content file and folder name dynamically in the pipeline variable, how can I make those pipeline paramters dynamic.

      2. I have 100 folder and each folder has 5000 csv files, how to pass those values in the pipeline parameters.

        1. I already have those values as parameters to the pipeline. You just need to pass the values to the pipeline when it's called.

          What you could do is create a pipeline that reads the list of folders and cycles through each, reading the list of files, and for each one, calling this pipeline. However, I don't think that running the pipeline 500000 times is a good idea. It would probably cost too much.

          Instead, I think you should look to using a python notebook, and making loops that upload from there, rather than doing this directly with web activities in ADF

Leave a Reply

Your email address will not be published. Required fields are marked *