PR.

AzureAzure FunctionsAzure Form RecognizerAzure SQLAzure Key VaultC#

Converting PayStub logic to an Azure Function

published 02-20-2023

  1. Step 1: Initial Functionality
  2. Step 2: Convert To Solution and an Azure Function
  3. Step 3: Setup Azure environment and deploy Azure Function

In the previous article on this topic I created a console application that can be used to process a Workday Paystub (because thats what my company uses) PDF file and convert it to SQL. In this article I we will migrate the working console application to an Azure Function.

While the console application was mainly to test the logic, the Azure Function will be the first version of this feature in a more formal sense, like all applications it will need to be observed and optimized once it is in production. If you find any issues with the code the source code is available on GitHub

NOTE: Something you will notice is the console app was written in dotnet 7.0 and the Azure Function is written in dotnet 6.0 (LTS). This is because Azure Functions use the dotnet 6.0 runtime, that being said this code is still compatible with dotnet 7.0 and can be upgraded when needed.

Prerequisites

You will need to install the Azure Functions Core Tools in order to follow along with the code.

Getting Started

Just to outline the steps we will be taking:

  1. Move core logic into a Class Library.
  2. Create an Azure Function and implement the core functionality.
  3. Test the Azure Function locally.

STEP 1: Move core logic into a Class Library

We begin by creating a dotnet solution with a class library, we will do all of this with the dotnet cli; the process is easier to follow along that way. Run the following to create a new solution and class library:

dotnet new sln --output PaystubService # Create Solution
cd PaystubService # Change to Solution Directory
dotnet new classlib --framework net6.0 --output PaystubLibrary # Create Class Library
dotnet sln add PaystubLibrary # Add Class Library to Solution so it can be referenced

Now we have a dotnet solution with a class library. We will move the core logic from the console application into the class library. If you recall from the console application the core logic was broken down into the following steps:

  1. Get the Paystub file from the request.
  2. Get KeyVault Values for Form Recognizer.
  3. Call Form Recognizer.
  4. Convert the Form Recognizer Table result to SQL.
  5. Connect to SQL and run the SQL commands.

We will skip the first step here as it will be handled in the actual function, but the rest of the steps will be moved into the class library. The class library will then be referenced by the Azure Function.

Before we begin adding the code to the class library, we need to install our NuGet packages.

NOTE: The first four packages are the same ones used in the console application. The last package had to be updated to ensure compatibility with dotnet 6.0.

dotnet add package Azure.Identity
dotnet add package Azure.AI.FormRecognizer
dotnet add package Azure.Security.KeyVault.Secrets
dotnet add package Microsoft.Data.SqlClient
dotnet add package AzFormRecognizer.Table.ToSQL

Now we can begin adding the code to the class library. The first step is to create a new file called PaystubProcessor.cs and add the following code:

using Azure;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;
using Azure.AI.FormRecognizer.DocumentAnalysis;
using AzFormRecognizer.Table;
using AzFormRecognizer.Table.ToSQL;
using Microsoft.Data.SqlClient;

public class PaystubProcessor
{
    public async Task<(DocumentDetails, IEnumerable<string>)> ProcessPaystubAsync(Stream fileStream, string fileName)
    {
        throw new NotImplementedException();
    }

    private async Task<(string Endpoint, string ApiKey)> GetKeyVaultSecretsAsync(string keyVaultName)
    {
        throw new NotImplementedException();
    }

    private async Task<AnalyzeResult> AnalyzeWithFormRecognizerAsync(Stream fileStream, string endpoint, string apiKey)
    {
        throw new NotImplementedException();
    }

    private IEnumerable<string> ConvertToSQL(PaystubFormat format, AnalyzeResult result, DocumentDetails details)
    {
        throw new NotImplementedException();
    }

    public async Task UploadToSQLAsync(string serverName, string databaseName, IEnumerable<string> commands, bool createTables = false)
    {
        throw new NotImplementedException();
    }
}

We will now begin filling in the code for each of the methods. The first method ProcessPaystubAsync and the last method UploadToSQLAsync will be called by the Azure Function. The middle three methods are helper methods that will be used by the first and last methods, we will fill those out first.

GetKeyVaultSecretsAsync

The first method we will fill out is GetKeyVaultSecretsAsync. This method will be used to get the Form Recognizer endpoint and API key from KeyVault.

public async Task<(string Endpoint, string ApiKey)> GetKeyVaultSecretsAsync(string keyVaultName)
{
    var keyVaultUri = new Uri($"https://{keyVaultName}.vault.azure.net");
    var secretClient = new SecretClient(keyVaultUri, new DefaultAzureCredential());

    var formRecognizerEndpoint = await secretClient.GetSecretAsync("FormRecognizerEndpoint");
    var formRecognizerKey = await secretClient.GetSecretAsync("FormRecognizerKey");

    return (formRecognizerEndpoint.Value.Value, formRecognizerKey.Value.Value);
}

AnalyzeWithFormRecognizerAsync

The next method we will fill out is AnalyzeWithFormRecognizerAsync. This method will be used to call Form Recognizer and get the results.

public async Task<AnalyzeResult> AnalyzeWithFormRecognizerAsync(Stream fileStream, string endpoint, string apiKey)
{
    string modelName = "prebuilt-layout";

    AzureKeyCredential credential = new AzureKeyCredential(apiKey);
    var documentClient = new DocumentAnalysisClient(new Uri(endpoint), credential);

    AnalyzeDocumentOperation operation = await documentClient.AnalyzeDocumentAsync(WaitUntil.Completed, modelName, fileStream);
    return operation.Value;
}

ConvertToSQL

The next method we will fill out is ConvertToSQL. This method will be used to convert the Form Recognizer results into SQL commands. Notice the Workday.FormatPaystub method, this is a method that is used to assign relationships between the tables and to add names to any tables that don't have a name. Similar to the previous console application, you should pass in a method that will format the results for your specific file.

public IEnumerable<string> ConvertToSQL(PaystubFormat format, AnalyzeResult result, DocumentDetails details)
{
    switch (format)
    {
        case PaystubFormat.Workday:
            var listOfSQLCommands = result.Tables.ToSQL(details, Workday.FormatPaystub);
            return listOfSQLCommands;
        default:
            throw new Exception("Paystub format not supported.");
    }
}

ProcessPaystubAsync

The next method we will fill out is ProcessPaystubAsync. This method will be used to call the other methods and return the results.

public async Task<(DocumentDetails, IEnumerable<string>)> ProcessPaystubAsync(Stream fileStream, string fileName, string keyVaultName)
{
    var details = new DocumentDetails()
    {
        Title = fileName,
        Id = Guid.NewGuid().ToString()
    };

    var (endpoint, apiKey) = await GetKeyVaultSecretsAsync(keyVaultName);
    var result = await AnalyzeWithFormRecognizerAsync(fileStream, endpoint, apiKey);

    var commands = ConvertToSQL(PaystubFormat.Workday, result, details);

    return (details, commands);
}

UploadToSQLAsync

The last method we will fill out is UploadToSQLAsync. This method will be used to connect to SQL and run the SQL commands.

public async Task UploadToSQLAsync(string serverName, string databaseName, IEnumerable<string> commands, bool createTables = false)
{
    string ConnectionString = $"Server={serverName}.database.windows.net; Authentication=Active Directory Default; Encrypt=True; Database={databaseName}";

    using (SqlConnection connection = new SqlConnection(ConnectionString))
    {
        await connection.OpenAsync();

        if (createTables)
        {
            var createTableCmds = commands.Where(cmd => cmd.Contains("CREATE TABLE")).ToList();
            foreach (var sqlTableStr in createTableCmds)
            {
                using (var command = new SqlCommand(sqlTableStr, connection))
                {
                    command.ExecuteNonQuery();
                }
            }
        }

        var inserValuesCmds = commands.Where(cmd => !cmd.Contains("CREATE TABLE")).ToList();
        foreach (var sqlTableStr in inserValuesCmds)
        {
            using (var command = new SqlCommand(sqlTableStr, connection))
            {
                command.ExecuteNonQuery();
            }
        }

        await connection.CloseAsync();
    }
}

Now that we have filled out all the methods, we have our core functioanlity. Notice we are not retrieving any environment variables or configuration settings. We will be grabbing all our environment variables from the Azure Function App Settings.

STEP 2: Create Azure Function and reference code

First step is to create the Azure Function app, to do this we will need the Azure Function Core Tools. Once you have installed the Azure Function Core Tools, you can create a new Azure Function with the following commands, just be sure you are in the same directory as the PaystubService solution.

  1. Create PaystubFunction directory and change directory to it
mkdir PaystubFunction && cd PaystubFunction
  1. Create Azure Function project
func init --language "C#" --worker-runtime dotnet --name PaystubFunction
  1. Create Azure Function
func new --template "HttpTrigger" --name "Processor" --authlevel "function"
  1. Add Azure Function to solution
cd ../ && dotnet sln add PaystubFunction
  1. Return to the Azure Function directory and add the PaystubLibrary project reference
cd PaystubFunction && dotnet add reference ../PaystubLibrary

You can go ahead and remove the default code in the Processor.cs Run method and replace it with the following:

[FunctionName("Processor")]
public static async Task<IActionResult> Run([HttpTrigger(AuthorizationLevel.Function, "post", Route = null)] HttpRequest req, ILogger log)
{
    var formCollection = await req.ReadFormAsync();
    if (formCollection.Files.Count == 0)
    {
        return new BadRequestObjectResult("No files were uploaded.");
    }
    log.LogInformation($"$Processing ${formCollection.Files.Count} files.");

    string keyVaultName = Environment.GetEnvironmentVariable("KEY_VAULT_NAME");
    if (string.IsNullOrEmpty(keyVaultName))
    {
        throw new Exception("Please set the KEY_VAULT_NAME environment variable.");
    }

    string serverName = Environment.GetEnvironmentVariable("SQL_SERVER_NAME");
    if (string.IsNullOrEmpty(serverName))
    {
        throw new Exception("Please set the SQL_SERVER_NAME environment variable.");
    }

    string databaseName = Environment.GetEnvironmentVariable("SQL_DATABASE_NAME");
    if (string.IsNullOrEmpty(databaseName))
    {
        throw new Exception("Please set the SQL_DATABASE_NAME environment variable.");
    }

    var processor = new PaystubProcessor();
    List<string> filesProcessed = new List<string>();
    List<string> commands = new List<string>();

    foreach (var file in formCollection.Files)
    {
        log.LogInformation($"Processing file: {file.FileName}");
        var fileStream = file.OpenReadStream();

        var (details, sqlCommands) = await processor.ProcessPaystubAsync(fileStream, file.FileName, keyVaultName);
        commands.AddRange(sqlCommands);
        filesProcessed.Add(details.Id);
    }

    try {
        await processor.UploadToSQLAsync(serverName, databaseName, commands, createTables: true);
    } catch (Exception ex) {
        log.LogError(ex, "Error uploading to SQL");
        return new BadRequestObjectResult(ex.Message);
    }

    return new OkObjectResult(filesProcessed);
}

STEP 3: Test the Function Locally

Inorder to test the function locally you will need to populate the Azure Function App Settings with your environment variables. Look for the local.settings.json file in the PaystubFunction directory. You can add the following to the Values section:

{
    "Values": {
        "KEY_VAULT_NAME": "YOUR_KEY_VAULT_NAME",
        "SQL_SERVER_NAME": "YOUR_SQL_SERVER_NAME",
        "SQL_DATABASE_NAME": "YOUR_SQL_DATABASE_NAME"
    }
}

Once you have added the environment variables, navigate to the PaystubFunction directory and run the following command to start the Azure Function:

func start

Assuming there are no build errors and the Azure Function starts successfully, you can test the function by sending a POST request to the Azure Function endpoint. You can use the following PowerShell script to test the function:

NOTE: You will need to update the payslip-1.pdf file path to match your local file path.

curl http://localhost:7071/api/Processor -i -X POST -H "Content-Type: multipart/form-data" -F "Files=@payslip-1.pdf" 

You will notice the request takes some time to complete, this is an indicator that the Azure Function is working as expected, but is in need of some optimization. We will be optimizing the Azure Function in the next few articles. In the meantime, I hope you found this article helpful. If you have any questions or comments, please feel free to reach out to me.