Automate Azure API Management: Backup and Restore

Once you set up the service in Azure and add all your APIs, you should make sure you have a good backup strategy in place, in case things go wrong.

If you back up the service regularly, you can also use those backups for a couple of other purposes:

for example, you can deploy more than one instance and put Azure Traffic Manager in front of them to handle geographical distribution.

You can also restore a specific backup to a different subscription to create a separate environment, following the DTAP street (Development, Testing, Acceptance, Production).

Azure API Management provides its own REST API for automation; unfortunately, while it covers most features and activities, it does not cover backing up and restoring the whole service.

For this you need the good "old" ARM (Azure Resource Manager), which also provides a REST API with specific API Management operations.
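As a rough sketch of what those ARM operations look like, the helper below only builds the request URI and JSON body for the backup and restore POST calls; it sends nothing. The URI layout and body parameters follow the ARM API Management reference as I understand it, but the api-version value and the helper's names (ApimArmRequests, BuildOperationUri, BuildBody) are assumptions, so check the current REST reference before relying on them. The actual request would carry an Azure AD bearer token in the Authorization header.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: builds (but does not send) the ARM backup/restore requests.
public static class ApimArmRequests
{
    // Assumption: pick an api-version supported by your subscription.
    public const string ApiVersion = "2019-12-01";

    public static Uri BuildOperationUri(string subscriptionId, string resourceGroup,
                                        string serviceName, string operation)
    {
        if (operation != "backup" && operation != "restore")
            throw new ArgumentException("operation must be 'backup' or 'restore'", nameof(operation));

        return new Uri(
            "https://management.azure.com" +
            $"/subscriptions/{subscriptionId}" +
            $"/resourceGroups/{resourceGroup}" +
            $"/providers/Microsoft.ApiManagement/service/{serviceName}" +
            $"/{operation}?api-version={ApiVersion}");
    }

    // Same body shape for backup and restore (assumption): where the backup blob
    // lives and what it is called.
    public static string BuildBody(string storageAccount, string accessKey,
                                   string containerName, string backupName)
    {
        var body = new { storageAccount, accessKey, containerName, backupName };
        return JsonSerializer.Serialize(body);
    }
}
```

Keeping URI and body construction separate from sending makes it easy to reuse the same code for both directions, since only the last path segment differs.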

Reading the documentation, you might think that this is as easy as getting a token, backing up, and restoring the service...

Well, not if you want an idempotent process; if you do, you are in the right place, so keep reading.

Backing up the service is fairly straightforward, as long as you have a good retry strategy (I suggest Polly). Some API Management operations can take several minutes (up to 45 minutes), so when you send a backup request you get a 202 Accepted HTTP response with a Location URL, which you poll periodically to check whether the backup operation has completed.
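To show what the 202 polling pattern boils down to without Polly, here is a dependency-free sketch. The GET on the Location URL is injected as a delegate (an assumption made so the loop can be tried offline), and the fixed interval and overall timeout are illustrative values, not something the ARM API prescribes.

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class LocationPoller
{
    // Calls getLocationStatus until it stops returning 202 Accepted,
    // waiting `interval` between calls and giving up after `timeout`.
    public static async Task<HttpStatusCode> PollUntilDoneAsync(
        Func<CancellationToken, Task<HttpStatusCode>> getLocationStatus,
        TimeSpan interval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            var status = await getLocationStatus(cancellationToken).ConfigureAwait(false);

            // In this sketch anything other than 202 ends polling:
            // 200 OK on success, an error status otherwise.
            if (status != HttpStatusCode.Accepted)
                return status;

            if (DateTime.UtcNow + interval > deadline)
                throw new TimeoutException("Operation still returned 202 Accepted when the timeout expired.");

            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

In a real client the delegate would send a GET to the saved Location URL with the bearer token and return the response status code.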

The same is true for restoring, as well as for creating the service in the first place (and keep in mind that the service instance you restore into must already exist before you start restoring).

And what if something goes wrong while you wait for completion? Most likely you will lose the Location URL, making it harder to know when the process is done (short of checking the Azure Portal manually, which defeats the purpose of automation).

So, in true DevOps fashion, here is the fully automated process for backing up and restoring an API Management service, shown in the diagram.


As you can see in the picture above, there are several steps whose only purpose is to make sure things won't break in different scenarios.

There is also an Azure Storage Table used for auditing (that is, for saving the backup and restore status each time).
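The post does not show the table's schema, so the entity below is a hypothetical shape for one audit row (PartitionKey/RowKey mirror Azure Table conventions; every other column is an assumption). Storing the Location URL together with an "InProgress" status is what makes a crash recoverable: a rerun can look for an unfinished entry and resume polling instead of starting a second backup. An in-memory list stands in for the table query here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical audit row; not the post's actual schema.
public sealed class ApimOperationEntry
{
    public string PartitionKey { get; set; } = "";   // e.g. the APIM service name
    public string RowKey { get; set; } = "";         // e.g. a unique operation id
    public string Operation { get; set; } = "";      // "backup" or "restore"
    public string LocationUrl { get; set; } = "";    // Location header of the 202 response
    public string Status { get; set; } = "";         // "InProgress", "Succeeded", "Failed"
    public DateTime StartedUtc { get; set; }
}

public static class ApimOperationAudit
{
    // Returns the Location URL of the most recent unfinished operation of this
    // kind for the service, or null if there is nothing to resume.
    public static string FindResumableLocation(
        IEnumerable<ApimOperationEntry> entries, string serviceName, string operation)
    {
        return entries
            .Where(e => e.PartitionKey == serviceName
                        && e.Operation == operation
                        && e.Status == "InProgress"
                        && !string.IsNullOrEmpty(e.LocationUrl))
            .OrderByDescending(e => e.StartedUtc)
            .Select(e => e.LocationUrl)
            .FirstOrDefault();
    }
}
```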

So here is the pseudo-code for the Backup operation:

ExecuteRestRequestReturnLocation()
AddBackupEntryToStorageTable()
CheckForOperationCompletion()
UpdateBackupEntryToStorageTable()
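A minimal sketch of how the four pseudo-code steps fit together, assuming the REST call and the completion check are passed in as delegates and replacing the Azure Storage Table with an in-memory dictionary; the class and member names are hypothetical. The point it illustrates is the ordering: the Location URL is recorded before the long wait starts, and the final status only after the check has actually finished.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class BackupOrchestrator
{
    private readonly Func<Task<string>> _executeRestRequestReturnLocation;
    private readonly Func<string, Task<bool>> _checkForOperationCompletion;

    // Stand-in for the Azure Storage Table: operation id -> (status, Location URL).
    public Dictionary<string, (string Status, string LocationUrl)> AuditTable { get; } =
        new Dictionary<string, (string Status, string LocationUrl)>();

    public BackupOrchestrator(Func<Task<string>> executeRestRequestReturnLocation,
                              Func<string, Task<bool>> checkForOperationCompletion)
    {
        _executeRestRequestReturnLocation = executeRestRequestReturnLocation;
        _checkForOperationCompletion = checkForOperationCompletion;
    }

    public async Task<bool> RunAsync(string operationId)
    {
        // 1) Start the backup and keep the Location URL from the 202 response.
        string locationUrl = await _executeRestRequestReturnLocation();

        // 2) Persist it before waiting, so a crash does not lose it.
        AuditTable[operationId] = ("InProgress", locationUrl);

        // 3) Wait (awaited, not fire-and-forget) until the operation finishes.
        bool succeeded = await _checkForOperationCompletion(locationUrl);

        // 4) Record the final outcome.
        AuditTable[operationId] = (succeeded ? "Succeeded" : "Failed", locationUrl);
        return succeeded;
    }
}
```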


The most interesting part is CheckForOperationCompletion(), which uses Polly and a RetryPolicy, as shown below:

private async Task<bool> CheckForOperationCompletion(string locationUrl, string token)
        {
            // async Task (not async void) so that callers can await completion
            // before updating the Storage Table, and exceptions are not lost
            try
            {
                // While the backup/restore is running, the Location URL keeps returning 202 Accepted
                HttpStatusCode[] httpStatusCodesWorthRetrying = {
                   HttpStatusCode.Accepted, // 202
                };

                // Retry forever, waiting _retryAfterMinutes^retryAttempt minutes between polls
                var policy = Policy
                  .HandleResult<HttpResponseMessage>(r => httpStatusCodesWorthRetrying.Contains(r.StatusCode))
                  .WaitAndRetryForeverAsync(retryAttempt => TimeSpan.FromMinutes(Math.Pow(_retryAfterMinutes, retryAttempt)));

                HttpResponseMessage result = await RestHelper.ExecuteRestRequestRetryResponseMessage(locationUrl, "Bearer", token, new RetryPolicy<HttpResponseMessage>[] { policy });
                if (result.StatusCode != HttpStatusCode.OK)
                {
                    Log.Error("Exiting CheckForOperationCompletion because of StatusCode: {StatusCode}", result.StatusCode);
                    return false;
                }
                return true;
            }
            catch (Exception ex)
            {
                Log.Error(ex, "CheckForOperationCompletion failed");
                return false;
            }
        }

Now, that was quite straightforward.

The Restore part itself is nearly identical to the Backup pseudo-code above; however, in the bigger picture of fully restoring an existing API Management backup, more logic is involved (as you can already see in the diagram above).
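One detail worth checking in that policy is the delay: before retry n it waits _retryAfterMinutes^n minutes (Polly passes retryAttempt starting at 1). With _retryAfterMinutes = 2 that is 2, 4, 8, 16, 32 minutes and so on, so a single wait soon becomes longer than the operation itself, while with _retryAfterMinutes = 1 it is a flat 1 minute. The helper below is a hypothetical sketch that reproduces the post's formula and adds a capped variant; the cap value is up to you.

```csharp
using System;

// Delay helpers for a Polly sleepDurationProvider (retryAttempt starts at 1).
public static class RetryDelays
{
    // The post's formula: baseMinutes ^ retryAttempt minutes.
    public static TimeSpan Exponential(double baseMinutes, int retryAttempt) =>
        TimeSpan.FromMinutes(Math.Pow(baseMinutes, retryAttempt));

    // Same growth, but never longer than maxDelay. The comparison is done on the
    // double first, so large attempt numbers cannot overflow TimeSpan.
    public static TimeSpan ExponentialCapped(double baseMinutes, int retryAttempt, TimeSpan maxDelay)
    {
        double minutes = Math.Pow(baseMinutes, retryAttempt);
        return minutes >= maxDelay.TotalMinutes ? maxDelay : TimeSpan.FromMinutes(minutes);
    }
}
```

Plugged into the existing policy, that would be `.WaitAndRetryForeverAsync(retryAttempt => RetryDelays.ExponentialCapped(_retryAfterMinutes, retryAttempt, TimeSpan.FromMinutes(5)))`, which matters especially with WaitAndRetryForever, where the attempt count is unbounded.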

private static bool CreateNewApimService(string svcName, string destinationSubscriptionId, string destinationResourceGroupName, string appName)
        {
            //TODO: make async!!
            //1) check if svc already exists
            if (!armClient.VerifyApimServiceExists(svcName).Result)
            {
                //2) if not, check name availability
                if (armClient.CheckApimServiceNameAvailability(svcName, destinationSubscriptionId).Result)
                {
                    //3) then create svc
                    if (!armClient.CreateApimService(svcName, destinationSubscriptionId, destinationResourceGroupName, "West Europe", "Developer").Result)
                    {
                        Log.Error("The Service {0} was not created, something went wrong.", svcName);
                        return false;
                    }
                    
                    //4) Create App Registration (DevPortal) - AzureGraphClient
                    var res = azGraphClient.CreateAppRegistration("new-aad-app-registration", appName, svcName).Result;
                    if (String.IsNullOrEmpty(res) || res.StartsWith("Error"))
                    {
                        Log.Error("Cannot create App Registration for {0}", svcName);
                    }
                    else
                    {
                        var appId = res;
                        Log.Information($"API Management service {svcName} was successfully created along with its AAD App Registration (AppId {appId}).");
                        Log.Information($"The App Registration needs to be authorized before it can be used. An Admin needs to Grant Permissions to it from the Azure portal.");
                    }
                }
                else
                {
                    Log.Error("The name {0} is not available for API Management.", svcName);
                    return false;
                }
            }
            return true;
        }

As you can see, there is quite a lot of logic involved in creating a new API Management service programmatically:

1) Check whether the service already exists,
2) Check service name availability,
3) Create the APIM service,
4) Create an App Registration for the APIM Developer Portal.
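The TODO in CreateNewApimService is about making it async. A minimal async sketch of the four steps above could look like this, assuming the ARM and Graph calls are exposed as awaitable delegates; ApimProvisioningCalls, ApimProvisioner and ProvisioningOutcome are hypothetical names, not the post's armClient/azGraphClient. Returning an outcome instead of a bool lets the caller tell "already there" apart from "just created".

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical awaitable stand-ins for the post's armClient / azGraphClient calls.
public sealed class ApimProvisioningCalls
{
    public Func<string, Task<bool>> ServiceExists { get; set; }                                  // (svcName)
    public Func<string, string, Task<bool>> IsNameAvailable { get; set; }                        // (svcName, subscriptionId)
    public Func<string, string, string, string, string, Task<bool>> CreateService { get; set; }  // (svcName, subscriptionId, resourceGroup, location, sku)
    public Func<string, string, Task<string>> CreateAppRegistration { get; set; }               // (appName, svcName) -> AppId
}

public enum ProvisioningOutcome { AlreadyExisted, Created, CreatedWithoutAppRegistration, NameUnavailable, CreationFailed }

public static class ApimProvisioner
{
    public static async Task<ProvisioningOutcome> EnsureServiceAsync(
        ApimProvisioningCalls calls, string svcName, string subscriptionId,
        string resourceGroup, string appName,
        string location = "West Europe", string sku = "Developer")
    {
        // 1) Nothing to do if the service already exists, so reruns are safe.
        if (await calls.ServiceExists(svcName))
            return ProvisioningOutcome.AlreadyExisted;

        // 2) The name becomes part of the *.azure-api.net host name,
        //    so it has to be globally available.
        if (!await calls.IsNameAvailable(svcName, subscriptionId))
            return ProvisioningOutcome.NameUnavailable;

        // 3) Create the service; this is the long-running step.
        if (!await calls.CreateService(svcName, subscriptionId, resourceGroup, location, sku))
            return ProvisioningOutcome.CreationFailed;

        // 4) App Registration for the Developer Portal. As in the original,
        //    a failure here does not undo the service creation.
        string appId = await calls.CreateAppRegistration(appName, svcName);
        return string.IsNullOrEmpty(appId) || appId.StartsWith("Error", StringComparison.Ordinal)
            ? ProvisioningOutcome.CreatedWithoutAppRegistration
            : ProvisioningOutcome.Created;
    }
}
```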

In particular, here I use Polly to retry after the API Management service creation request. Unlike the Backup and Restore operations (which return 202 - Accepted until they complete and then 200 - OK), polling the service always returns 200 - OK; what changes is a JSON field called provisioningState in the response body, which is initially "Created" and only becomes "Succeeded" after several minutes (up to 45).
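The retry loop below keeps going as long as the state is not "Succeeded". Since ARM resources can also end in a terminal "Failed" or "Canceled" state (a general ARM convention, not something the post covers), a sketch that separates "keep waiting" from "stop" could look like this. It uses System.Text.Json instead of Newtonsoft's JObject, and the CreationState and ProvisioningStateReader names are hypothetical.

```csharp
using System;
using System.Text.Json;

public enum CreationState { InProgress, Succeeded, Failed }

public static class ProvisioningStateReader
{
    // Reads properties.provisioningState from the body of a GET on the service.
    public static CreationState Classify(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        if (!doc.RootElement.TryGetProperty("properties", out var props) ||
            !props.TryGetProperty("provisioningState", out var stateElement))
            return CreationState.InProgress; // nothing to decide on yet

        string state = stateElement.GetString() ?? "";
        if (string.Equals(state, "Succeeded", StringComparison.OrdinalIgnoreCase))
            return CreationState.Succeeded;
        if (string.Equals(state, "Failed", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(state, "Canceled", StringComparison.OrdinalIgnoreCase))
            return CreationState.Failed; // terminal: retrying forever would never end
        return CreationState.InProgress; // e.g. "Created" or "Activating"
    }
}
```

With a three-way result, the retry policy can handle only InProgress and let Failed end the wait with an error instead of polling indefinitely.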


Inside the CreateApimService method, the following Polly retry policy waits for provisioning to complete:
public async Task<bool> CheckForCreationCompletion(string url)
        {
            var response = await WaitForCreationCompletion(() => RestHelper.ExecuteRestRequest(url, "Bearer", _accessToken, HttpMethod.Get, ""));
            if (!response)
            {
                Log.Error("CheckForCreationCompletion failed");
            }
            return response;
        }

        private Task<bool> WaitForCreationCompletion(Func<Task<HttpResponseMessage>> requester)
        {
            // Retry (forever, with exponential backoff) while the service is not yet "Succeeded",
            // and also when a single poll fails with an HttpRequestException (e.g. a timeout)
            var policy = Policy
                    .HandleResult<bool>(false)
                    .Or<HttpRequestException>()
                    .WaitAndRetryForeverAsync(retryAttempt => TimeSpan.FromMinutes(Math.Pow(_retryAfterMinutes, retryAttempt)));

            // To log retry attempts, use the WaitAndRetryForeverAsync overload that takes an onRetry delegate
            return policy.ExecuteAsync(async () =>
            {
                HttpResponseMessage response;
                try
                {
                    response = await requester().ConfigureAwait(false);
                }
                catch (TaskCanceledException e) // HttpClient throws this on timeout
                {
                    // Convert it, so that the .Or<HttpRequestException>() clause above
                    // retries it like any other failed request
                    throw new HttpRequestException("Request timed out", e);
                }
                // true only for a successful status code with provisioningState "Succeeded"
                return response.CheckForSucceededState();
            });
        }

To check the provisioning state returned in the response body, I use a C# extension method:
public static bool CheckForSucceededState(this HttpResponseMessage r)
        {
            if (r == null || !r.IsSuccessStatusCode) return false;
            // Blocking read, because the caller is synchronous
            var readResult = r.Content.ReadAsStringAsync().Result;
            JObject o = JObject.Parse(readResult);
            var provisioningState = (string)o["properties"]?["provisioningState"]; // Created, then Succeeded
            // o["properties"]?["targetProvisioningState"] holds e.g. "Activating" meanwhile
            return provisioningState == "Succeeded";
        }

And this is the ResilientHttpClient used for the Polly retry in CheckForOperationCompletion:
public class ResilientHttpClient<T> : HttpClient where T : class
    {
        // Wrapped client that actually sends the requests
        private HttpClient _client;
        // Only the first policy passed to the constructor is applied
        private RetryPolicy<T> _policy;

        public ResilientHttpClient(RetryPolicy<T>[] policies)
        {
            _policy = policies[0];
            _client = new HttpClient();
        }

        // Runs the request through the retry policy, e.g. re-polling while the response is 202 Accepted
        private Task<T> HttpInvoker(Func<Task<T>> action)
        {
            return _policy.ExecuteAsync(() => action());
        }

        public Task<T> SendAsync(string uri,
            HttpMethod method,
            string authorizationToken = null,
            string authorizationMethod = "Bearer")
        {
            return HttpInvoker(async () =>
            {
                // New request message per attempt: HttpClient refuses to send the same message twice
                var requestMessage = new HttpRequestMessage(method, uri);
                requestMessage.Headers.Authorization = new AuthenticationHeaderValue(authorizationMethod, authorizationToken);
                var response = await _client.SendAsync(requestMessage);
                // T is expected to be HttpResponseMessage; any other T yields null here
                return response as T;
            });
        }
    }

CheckForCreationCompletion, on the other hand, uses a standard HttpClient.
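ResilientHttpClient derives from HttpClient and at the same time wraps a second HttpClient instance. Another way to give a plain HttpClient the same "retry while 202" behaviour is a DelegatingHandler in its pipeline. The sketch below does that without Polly; the handler name, the fixed delay and the attempt limit are assumptions, and StubHandler only exists to try it offline. (Polly users can get a comparable setup through the Microsoft.Extensions.Http.Polly package.)

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Re-sends a request while the response is 202 Accepted, up to maxAttempts calls.
// Meant for polling a Location URL with GET (no request body to re-send).
public sealed class AcceptedRetryHandler : DelegatingHandler
{
    private readonly TimeSpan _delay;
    private readonly int _maxAttempts;

    public AcceptedRetryHandler(HttpMessageHandler innerHandler, TimeSpan delay, int maxAttempts)
        : base(innerHandler)
    {
        _delay = delay;
        _maxAttempts = maxAttempts;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (int attempt = 1; ; attempt++)
        {
            HttpResponseMessage response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
            if (response.StatusCode != HttpStatusCode.Accepted || attempt >= _maxAttempts)
                return response; // finished, or out of attempts (caller sees the last 202)

            response.Dispose(); // drop the intermediate 202 before waiting
            await Task.Delay(_delay, cancellationToken).ConfigureAwait(false);
        }
    }
}

// Offline stand-in for the network, used only to try the handler out.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly Func<HttpRequestMessage, HttpResponseMessage> _respond;

    public StubHandler(Func<HttpRequestMessage, HttpResponseMessage> respond) => _respond = respond;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(_respond(request));
}
```

Note that HttpClient.Timeout (100 seconds by default) covers the whole SendAsync call, retries and delays included, so for waits measured in minutes you would either raise it or keep the polling loop outside the client, as the original code does.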

As you might have noticed, I added a TODO comment about making this code async (right now this is a prototype console app).

Also, ideally an email should be sent to the user who starts these long-running processes (such as creating an API Management service, or backing it up and restoring it).

And this code should probably reside inside a WebJob (or service), so it can run on its own, no matter which client calls it.

But all that is out of scope for this prototype and blog post, so that's all for now!
