Duplicates, duplicated blob triggered Azure Function invocation

I wrote the post How to reprocess or retrigger a blob triggered Azure Function, and it enlightened me enough to write this one.  There may be other causes, so this is just one way in which you might see a blob triggered Azure Function being triggered multiple times for the same blob.  In short, this scenario has nothing to do with the Azure Function or the platform; it has to do with how you are sending the blob to the storage container.  Here are the steps I took to prove this happens.

  1. I uploaded a blob into my Azure Storage container, the first time
  2. I checked the Azure Storage container to see that the blob was indeed uploaded
  3. I watched the Azure Function get triggered
  4. I checked the $logs container for the Azure Storage container PUT operation; this is in the same Azure Storage account as in step 2
  5. I checked the blobreceipts directory in the azure-webjobs-hosts Azure Storage container; this container is in the storage account referenced by the AzureWebJobsStorage application setting
  6. I uploaded a new blob with the same name as the one in step 1, and performed steps 2 – 5 again

Here are some images which explain what happened.

In Figure 1, I uploaded a blob named helloworld0.txt using the Azure Function Consumer.

image

Figure 1, Upload a blob into my Azure Storage container, the first time

I checked the blobs container in my Azure Storage account, Figure 2.  Notice the timestamp.

image

Figure 2, I checked the Azure Storage container to see that the blob was indeed uploaded

The Azure Function, which I had configured to be triggered when a blob is uploaded into the container, triggered as expected for that specific file.  See Figure 3.  Also, look at Figure 4, which shows the function.json.

image

Figure 3, I watched the Azure Function get triggered

image

Figure 4, blob triggered Azure Function, function.json example

I checked the $logs container for the Azure Storage log for the blob, Figure 5.

image

Figure 5, I checked the $logs container for the Azure Storage container PUT operation, this is the same Azure Storage account as in step 2

Here is a sample of the PUT operation within that log for the helloworld0.txt blob.

1.0;2021-04-8T12:56:19.5259961Z;PutBlob;Success;201;7;7;authenticated;
csharpguitarreprocessor;csharpguitarreprocessor;blob;
"https://csharpguitarreprocessor.blob.core.windows.net:443/blobs/helloworld0.txt";
"/csharpguitarreprocessor/blobs/helloworld0.txt";c42299fd-d01e-00e3-1e2d-3cbd8a000000;0;
:20164;2018-03-28;457;15;249;0;15;"bunQcZqeFqf6LKqdyKv6Wg==";"bunQcZqeFqf6LKqdyKv6Wg==";
""0x8D90A4504911CEB"";Wednesday, 28-Apr-21 12:56:19 GMT;;
"Azure-Storage/9.3.2 (.NET Core)";;"6c1c17f9-3646-4239-88e9-a535fa0221bc"
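If you need to pull the interesting fields out of an entry like this, a small script can do it.  The following is only a rough Python sketch: the function name parse_log_entry and the FIELD_INDEX table are my own illustrative names, and the field positions are simply counted from the sample entry above (version 1.0 of the log format), so verify them against your own logs before relying on them.

```python
# 0-based positions of a few fields in a version 1.0 log entry,
# counted from the sample entry above (an assumption; check your logs).
FIELD_INDEX = {
    "request_start_time": 1,
    "operation_type": 2,
    "request_status": 3,
    "request_url": 11,
    "etag": 24,
    "last_modified": 25,
}


def parse_log_entry(line):
    """Return the fields named in FIELD_INDEX from one log entry.

    Simplification: splits on every ';', so a quoted field that itself
    contains ';' would be split in the wrong place.
    """
    parts = [part.strip().strip('"') for part in line.split(";")]
    return {name: parts[index] for name, index in FIELD_INDEX.items()}


# The PutBlob entry shown above, joined back into one line.
SAMPLE = (
    '1.0;2021-04-8T12:56:19.5259961Z;PutBlob;Success;201;7;7;authenticated;'
    'csharpguitarreprocessor;csharpguitarreprocessor;blob;'
    '"https://csharpguitarreprocessor.blob.core.windows.net:443/blobs/helloworld0.txt";'
    '"/csharpguitarreprocessor/blobs/helloworld0.txt";c42299fd-d01e-00e3-1e2d-3cbd8a000000;0;'
    ':20164;2018-03-28;457;15;249;0;15;"bunQcZqeFqf6LKqdyKv6Wg==";"bunQcZqeFqf6LKqdyKv6Wg==";'
    '""0x8D90A4504911CEB"";Wednesday, 28-Apr-21 12:56:19 GMT;;'
    '"Azure-Storage/9.3.2 (.NET Core)";;"6c1c17f9-3646-4239-88e9-a535fa0221bc"'
)

entry = parse_log_entry(SAMPLE)
print(entry["operation_type"], entry["etag"])  # PutBlob 0x8D90A4504911CEB
```

The ETag printed here is the value to look for in the blobreceipts directory.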

Then I checked the content in the azure-webjobs-hosts container, Figure 6.  Notice the timestamp closely matches the one in Figure 2.  Also notice that the ETag in the $logs file (0x8D90A4504911CEB), which contains the PUT log for the specific blob, matches the directory name (purple rectangle) in the AzureWebJobsStorage azure-webjobs-hosts container.

image

Figure 6, I checked the blobreceipts directory in the azure-webjobs-hosts Azure Storage container

Then I uploaded a blob with the same name to the same Azure Storage container.  Here is what happened.

  1. The time stamp of the helloworld0.txt file changed, which means the one which was there is overwritten
  2. The Azure Function was triggered again
  3. Another log was written to the $logs container
  4. Another entry was added to the AzureWebJobsStorage azure-webjobs-hosts container

If you look at the Azure Storage container which contains your uploaded blobs, you will only see the file once, because it gets overwritten.  However, the $logs and AzureWebJobsStorage azure-webjobs-hosts containers show that some client uploaded the file again with the same name.  That upload, correctly in my opinion, triggered the Azure Function and processed the blob again.  It looks like a duplicate invocation, and in one sense it is, but each invocation corresponds to a separate upload.
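To make the behavior concrete, here is a toy Python model of what I observed.  It assumes a receipt is recorded per blob name plus ETag, which is what the entries in the blobreceipts directory suggest; it is not the actual implementation used by the Azure Functions runtime, and the ReceiptStore class is a made-up name for illustration.

```python
class ReceiptStore:
    """Toy stand-in for the blobreceipts directory.

    Assumption: one receipt per (blob name, ETag) pair.
    """

    def __init__(self):
        self._receipts = set()

    def should_process(self, blob_name, etag):
        """Return True, and record a receipt, the first time a pair is seen."""
        key = (blob_name, etag)
        if key in self._receipts:
            return False
        self._receipts.add(key)
        return True


store = ReceiptStore()

# First upload of helloworld0.txt.
first = store.should_process("blobs/helloworld0.txt", "0x8D90A4504911CEB")
# Seeing the same blob again with the same ETag: already has a receipt.
unchanged = store.should_process("blobs/helloworld0.txt", "0x8D90A4504911CEB")
# Second upload with the same name: the overwrite produced a new ETag.
overwritten = store.should_process("blobs/helloworld0.txt", "0x8D90A4A20E3610E")

print(first, unchanged, overwritten)  # True False True
```

Under that assumption, the same name with a new ETag looks like a new blob, so it is processed again.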

image

Figure 7, The time stamp of the helloworld0.txt file changed, which means the one which was there is overwritten

I watched the Azure Function get triggered again.

image

Figure 8, The Azure Function was triggered again

Here is the content from the log file in the $logs container.  Notice again that the ETag in the log file (0x8D90A4A20E3610E) matches the one in Figure 9.

1.0;2021-04-8T13:32:54.5222725Z;PutBlob;Success;201;8;8;authenticated;
csharpguitarreprocessor;csharpguitarreprocessor;blob;
"https://csharpguitarreprocessor.blob.core.windows.net:443/blobs/helloworld0.txt";
"/csharpguitarreprocessor/blobs/helloworld0.txt";00d9c585-a01e-0032-0432-3cdf00000000;
0;167.220.197.37:48970;2018-03-28;457;15;249;0;15;"bunQcZqeFqf6LKqdyKv6Wg==";
"bunQcZqeFqf6LKqdyKv6Wg==";""0x8D90A4A20E3610E"";
Wednesday, 28-Apr-21 13:32:54 GMT;;"Azure-Storage/9.3.2 (.NET Core)";
;"373e8bef-565f-43d8-a0cf-287bfec4766a"

A second entry was added to the AzureWebJobsStorage azure-webjobs-hosts container, which contains the same file name but has a different ETag.

image

Figure 9, Another entry was added to the AzureWebJobsStorage azure-webjobs-hosts container

This scenario is kind of an easy one because I created these Azure products all from scratch and the experiment had only 2 blobs.  It gets a little harder to troubleshoot when you have a large or massive amount of data.  I did however find that you can search for the ETag via Storage Explorer, Figure 10.  I wasn’t able to use Storage Explorer to search the $logs container, however.

image

Figure 10, Azure Storage Explorer to find duplicate Azure Function invocations
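Another option for larger data sets is to download the log files from the $logs container and scan them with a script.  The Python sketch below groups successful PutBlob entries by request URL and reports any blob that was written more than once.  It makes a few assumptions: the field positions are counted from the sample entries in this post, only the PutBlob operation is considered, and make_entry is a hypothetical helper that builds shortened test entries (the helloworld1.txt entry and its ETag are made up).

```python
from collections import defaultdict


def find_rewritten_blobs(lines):
    """Map request URL -> [(request time, ETag), ...] for every blob
    that has more than one successful PutBlob entry."""
    writes = defaultdict(list)
    for line in lines:
        parts = [part.strip().strip('"') for part in line.split(";")]
        # Skip short lines and anything that is not a successful PutBlob.
        if len(parts) < 25 or parts[2] != "PutBlob" or parts[3] != "Success":
            continue
        writes[parts[11]].append((parts[1], parts[24]))
    return {url: seen for url, seen in writes.items() if len(seen) > 1}


def make_entry(time, operation, url, etag):
    """Hypothetical helper: build a 30-field entry with only the
    fields this sketch reads filled in."""
    fields = [""] * 30
    fields[0] = "1.0"
    fields[1], fields[2], fields[3] = time, operation, "Success"
    fields[11] = f'"{url}"'
    fields[24] = f'""{etag}""'
    return ";".join(fields)


BASE = "https://csharpguitarreprocessor.blob.core.windows.net:443/blobs/"
log_lines = [
    make_entry("2021-04-8T12:56:19.5259961Z", "PutBlob",
               BASE + "helloworld0.txt", "0x8D90A4504911CEB"),
    make_entry("2021-04-8T13:32:54.5222725Z", "PutBlob",
               BASE + "helloworld0.txt", "0x8D90A4A20E3610E"),
    # Made-up entry for a blob that was uploaded only once.
    make_entry("2021-04-8T13:40:00.0000000Z", "PutBlob",
               BASE + "helloworld1.txt", "0x0000000000000001"),
]

rewritten = find_rewritten_blobs(log_lines)
for url, seen in rewritten.items():
    print(url, [etag for _, etag in seen])
```

Each ETag reported for a blob can then be searched for in the azure-webjobs-hosts container to confirm that it has its own receipt.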

You may be able to get stronger search capabilities if you enable the new Diagnostic settings, which are currently in preview.

image

This will let you send the logs to a Log Analytics workspace, which has greater querying capabilities.  Read more about this here:

Create diagnostic settings to send platform logs and metrics to different destinations