Monday, 31 July 2017

Transient Fault Handling in Xamarin.Forms using Polly

Previously I wrote about transient fault handling in Xamarin.Forms, and discussed an implementation of the retry pattern that uses exponential backoff. The advantage of the implementation was that the retry pattern was implemented without requiring any library code, for those sensitive to bloating their application package size.

There are, however, transient fault handling libraries available, and the go-to library for .NET is Polly, which includes fluent support for the retry pattern, circuit breaker pattern, bulkhead isolation, and more. In Polly, these patterns are implemented via fault handling policies, which handle specific exceptions thrown by, or results returned by, the delegates that are executed through the policy.

This blog post will discuss using Polly’s RetryPolicy, which implements the retry pattern.

Implementation

The sample application, which can be found on GitHub, is similar to the sample application from my previous blog post, with the custom implementation of the retry pattern replaced with Polly’s.

Initialization

The App class in the sample application initializes the classes that are responsible for communicating with the REST service:

    TodoManager = new TodoItemManager(new RestService(new ResilientRequestProvider()));

The RestService class provides data to the TodoItemManager class, with the RestService class making REST calls using the ResilientRequestProvider class, which uses Polly to implement the retry pattern, using exponential backoff.

ResilientRequestProvider

The following code example shows the GetAsync method from the ResilientRequestProvider class, which makes GET requests to a specified URI:

    async Task<HttpResponseMessage> HttpInvoker(Func<Task<HttpResponseMessage>> operation)
    {
        return await retryPolicy.ExecuteAsync(operation);
    }

    public async Task<TResult> GetAsync<TResult>(string uri)
    {
        string serialized = null;
        var httpResponse = await HttpInvoker(async () =>
        {
            var response = await client.GetAsync(uri);
            response.EnsureSuccessStatusCode();
            serialized = await response.Content.ReadAsStringAsync();
            return response;
        });
        return JsonConvert.DeserializeObject<TResult>(serialized);
    }

The GetAsync method code is identical to that in my previous blog post. The lambda expression is passed to the HttpInvoker method, which in turn passes it to the ExecuteAsync method of the RetryPolicy instance. Therefore, the code in the lambda expression is what will be retried if the GET request fails.

The RetryPolicy<T> type is a Polly type, which represents a retry policy that can be applied to delegates that return a value of type T. The following code example shows the RetryPolicy<T> declaration from the sample application:

RetryPolicy<HttpResponseMessage> retryPolicy;

This declares a RetryPolicy that can be applied to delegates that return an HttpResponseMessage.

There are 3 steps to using a fault handling policy, including the RetryPolicy<T> type, in Polly:

  1. Specify the exceptions you want the policy to handle.
  2. Optionally specify the returned results you want the policy to handle.
  3. Specify how the policy should handle any faults.

The following code example shows all three steps for defining the operation of the RetryPolicy<T> instance:

    HttpStatusCode[] httpStatusCodesToRetry =
    {
        HttpStatusCode.RequestTimeout,      // 408
        HttpStatusCode.InternalServerError, // 500
        HttpStatusCode.BadGateway,          // 502
        HttpStatusCode.ServiceUnavailable,  // 503
        HttpStatusCode.GatewayTimeout       // 504
    };

    retryPolicy = Policy
        .Handle<TimeoutException>()
        .Or<HttpRequestException>()
        .OrResult<HttpResponseMessage>(r => httpStatusCodesToRetry.Contains(r.StatusCode))
        .WaitAndRetryAsync(
            3,
            retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
            (response, delay, retryCount, context) =>
            {
                // Exception is null when the retry was triggered by a handled result rather than an exception.
                var reason = response.Exception?.Message ?? $"status code {(int)response.Result.StatusCode}";
                Debug.WriteLine($"Retry {retryCount} after {delay.TotalSeconds} seconds delay due to {reason}");
            });

The Policy.Handle and Or methods specify the exceptions the policy should handle, and the OrResult method specifies the results it should handle. Here the delegate is retried if a TimeoutException or HttpRequestException occurs, or if the resulting HttpResponseMessage has any of the HTTP status codes contained in the httpStatusCodesToRetry array. Therefore, the RetryPolicy<T> instance handles both exceptions and return values in a single policy.

After specifying the exceptions and results you want the policy to handle, you must specify how the policy should handle any faults. Several of Polly’s methods can be used here, including Retry, RetryForever, WaitAndRetry, or WaitAndRetryForever (along with their async variants). I chose to use one of the WaitAndRetryAsync overloads, for which three arguments must be specified:

  1. The maximum number of retries to make. Note that the overall number of attempts that will be made is one plus the number of retries configured. Therefore, four attempts can be made with this code: the initial attempt, plus up to three retries.
  2. A delegate, expressed here as a lambda expression, that calculates the duration to wait between retries based on the current retry attempt.
  3. An action to be called on each retry, which receives the outcome that triggered the retry (the exception or the handled result), the duration, the retry count, and the context.

The advantage of using this overload is that it allows an exponential backoff strategy to be specified through calculation, and to output messages as retries are attempted.
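To make the timings concrete, the following sketch evaluates the same sleep duration calculation outside Polly. The PollyBackoffSchedule class and its method names are hypothetical and exist only for this illustration; they are not part of Polly or the sample application.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of Polly or the sample application.
public static class PollyBackoffSchedule
{
    // Same calculation as the sleep duration lambda passed to WaitAndRetryAsync.
    public static TimeSpan DelayFor(int retryAttempt) =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt));

    // Delays for retry attempts 1..retryCount, in order.
    public static TimeSpan[] Schedule(int retryCount) =>
        Enumerable.Range(1, retryCount).Select(attempt => DelayFor(attempt)).ToArray();

    // Total time spent waiting if every attempt fails.
    public static TimeSpan WorstCaseWait(int retryCount) =>
        TimeSpan.FromTicks(Schedule(retryCount).Sum(delay => delay.Ticks));
}
```

With three retries, Schedule(3) yields delays of 2, 4, and 8 seconds, so a request that never succeeds spends 14 seconds waiting across its four attempts.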

Executing an Action through the Policy

The overall operation is that the GetAsync method in the ResilientRequestProvider class invokes the HttpInvoker method, passing an action that represents the GET request. The HttpInvoker method invokes Polly’s RetryPolicy.ExecuteAsync method, passing the received action as an argument.

The retry policy then attempts the action passed in via the ExecuteAsync method. If the action executes successfully, the return value is returned and the policy exits. If the action throws an unhandled exception, it’s rethrown and the policy exits – no further retries are made. However, if the action throws a handled exception, or returns a handled result, the policy performs the following operations:

  • Counts the fault.
  • Checks whether another retry is permitted:
    • If not, the exception is rethrown (or the handled result is returned) and the policy terminates.
    • If another retry is permitted, the policy calculates the duration to wait from the supplied sleep duration configuration, waits for the calculated duration, and returns to the beginning of the cycle to execute the action again.
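The control flow described above can be approximated with a small hand-written loop. This is a simplified sketch and not Polly's source code: the RetrySketch class is hypothetical, it handles a single exception type, and it leaves out handled results and the onRetry callback.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of the retry cycle; not Polly's implementation.
public static class RetrySketch
{
    public static async Task<TResult> ExecuteAsync<THandled, TResult>(
        Func<Task<TResult>> action,
        int retryCount,
        Func<int, TimeSpan> sleepDurationProvider)
        where THandled : Exception
    {
        int retriesMade = 0;
        while (true)
        {
            try
            {
                // Success: return the value and leave the loop.
                return await action();
            }
            catch (THandled) when (retriesMade < retryCount)
            {
                // Handled exception with retries remaining: count it,
                // then wait for the calculated duration before the next attempt.
                retriesMade++;
                await Task.Delay(sleepDurationProvider(retriesMade));
            }
            // Unhandled exceptions, and handled ones once the retries are
            // used up, are not caught here and propagate to the caller.
        }
    }
}
```

The exception filter is what separates the two exit paths: once retriesMade reaches retryCount the filter no longer matches, so the last handled exception surfaces to the caller just as an unhandled one would.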

Running the Sample Application

The sample application, which can be found on GitHub, connects to a read-only REST service hosted by Xamarin, and it’s most likely that when running the sample the GET operation will succeed on first attempt. To observe the retry pattern in operation, change the RestUrl property in the Constants class to an address that doesn’t exist – this can be accomplished by adding a random character to the end of the existing string. Then run the application and observe the output window in Visual Studio. You should see something like:

    Retry 1 after 2 seconds delay due to 404 (Not Found)
    Thread started: <Thread Pool> #10
    Thread started: <Thread Pool> #11
    Retry 2 after 4 seconds delay due to 404 (Not Found)
    Retry 3 after 8 seconds delay due to 404 (Not Found)
    ERROR: 404 (Not Found)

This shows the GET operation being retried 3 times, after an exponentially increasing delay. Remember that the number of retries and the backoff strategy are specified through the arguments to Polly’s WaitAndRetryAsync method. This allows the RetryPolicy to be customized to fit individual application requirements.

The advantage of using Polly over implementing your own retry pattern is that Polly includes multiple transient fault handling patterns that can easily be combined for additional resilience when handling transient faults.

Summary

Polly is a .NET transient fault handling library, which includes fluent support for the retry pattern. In Polly, the retry pattern is implemented by the RetryPolicy type, which handles specific exceptions thrown by, or results returned by, the delegates that are executed through the policy.

The RetryPolicy type is highly configurable, allowing you to specify the exceptions to be handled, the return results to be handled, and how the policy should handle any faults.

The advantage of using Polly over implementing your own retry pattern is that Polly includes multiple transient fault handling patterns that can easily be combined for additional resilience when handling transient faults.

Wednesday, 26 July 2017

Transient Fault Handling in Xamarin.Forms

All applications that communicate with remote services and resources must be sensitive to transient faults. Transient faults include the momentary loss of network connectivity to services, the temporary unavailability of a service, or timeouts that arise when a service is busy. These faults are often self-correcting, and if the remote access request is repeated after a suitable delay it’s likely to succeed.

Transient faults can have a huge impact on the perceived quality of an application, even if it has been thoroughly tested under all foreseeable circumstances. To ensure that an application that communicates with remote services operates reliably, it must be able to:

  1. Detect faults when they occur, and determine if the faults are likely to be transient.
  2. Retry the operation if it’s determined that the fault is likely to be transient, and keep track of the number of times the operation is retried.
  3. Use an appropriate retry strategy, which specifies the number of retries, the delay between each attempt, and the actions to take after a failed attempt.

This transient fault handling can be achieved by wrapping all attempts to access a remote service in code that implements the retry pattern.

Retry Pattern

If an application detects a failure when it tries to send a request to a remote service, it can handle the failure by:

  • Retrying the operation – the application could retry the failing request immediately.
  • Retrying the operation after a delay – the application could wait for a suitable amount of time before retrying the request.
  • Cancelling the operation – the application should cancel the operation and report an exception.

The retry strategy should be tuned to match the business requirements of the application. For example, it’s important to optimize the retry count and retry interval to the operation being attempted. If the operation is part of a user interaction, the retry interval should be short and only a few retries attempted to avoid making users wait for a response. If the operation is part of a long running workflow, where cancelling or restarting the workflow is expensive or time-consuming, it’s appropriate to wait longer between attempts and retry more times.

If a request still fails after a number of retries, it’s better for the app to prevent further requests going to the same resource and to report a failure. Then, after a set period, the application can make one or more requests to the resource to see if they’re successful. I’ll return to this topic in a future blog post.

Retrying after an Exponential Delay

If transient faults are occurring because a remote service is overloaded, or being throttled at the service end, the service could reject new requests. While this scenario can be handled by the retry pattern, it’s possible that retry requests could add to the overloading of the service, which means that the service could take longer to recover from its overloaded state.

Exponential backoff attempts to deal with this problem by exponentially increasing the delay between retries, rather than retrying after a fixed delay. The purpose of this approach is to give the service time to recover, in case the transient fault is due to a service overload. For example, when the initial request fails it can be retried after 1 second. If it fails for a second time, wait 2 seconds before the next retry. Then if the second retry fails, wait for 4 seconds before the next retry.
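The doubling in that example can be written as a single expression. The following is a minimal sketch in which the DoublingBackoff class and its method are hypothetical names, and the retry number is assumed to be 1-based.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class DoublingBackoff
{
    // Delay before retry number 'retry' (1-based): initialDelay * 2^(retry - 1).
    public static TimeSpan DelayBeforeRetry(int retry, TimeSpan initialDelay) =>
        TimeSpan.FromMilliseconds(initialDelay.TotalMilliseconds * Math.Pow(2, retry - 1));
}
```

With an initial delay of 1 second, retries 1, 2, and 3 wait 1, 2, and 4 seconds respectively, whereas a fixed-delay strategy would wait the same time before every retry.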

Implementing the Retry Pattern

In this blog post I’ll explain how I implemented the retry pattern, with exponential backoff. The advantage of the approach presented here is that the retry pattern is implemented without requiring any library code, for those sensitive to bloating their application package size.

My implementation of the retry pattern adds to Xamarin’s TodoREST sample. This sample demonstrates a Todo list application where the data is stored and accessed from a RESTful web service, hosted by Xamarin. However, I’ve modified the original implementation so that the RestService class moves some of its responsibilities to the RequestProvider class, which handles all REST requests. This ensures that all REST requests are made by a single class, which has a single responsibility. The following code example shows the GetAsync method from the RequestProvider class, which makes GET requests to a specified URI:

    public async Task<TResult> GetAsync<TResult>(string uri)
    {
        var response = await client.GetAsync(uri);
        response.EnsureSuccessStatusCode();
        string serialized = await response.Content.ReadAsStringAsync();
        return JsonConvert.DeserializeObject<TResult>(serialized);
    }

Note, however, that the sample application, which can be found on GitHub, doesn’t use the RequestProvider class. It’s included purely for comparison with the ResilientRequestProvider class, which the application uses, and which implements the retry pattern.

Initialization

The App class in the sample application initializes the classes that are responsible for communicating with the REST service:

    TodoManager = new TodoItemManager(new RestService(new ResilientRequestProvider(new RetryWithExponentialBackoff())));

The RestService class provides data to the TodoItemManager class, with the RestService class making REST calls using the ResilientRequestProvider class, which uses the RetryWithExponentialBackoff class to implement the retry pattern, using exponential backoff.

ResilientRequestProvider

The following code example shows the GetAsync method from the ResilientRequestProvider class, which makes GET requests to a specified URI:

    async Task<HttpResponseMessage> HttpInvoker(Func<Task<HttpResponseMessage>> operation)
    {
        return await retry.RetryOnExceptionAsync<HttpRequestException>(operation);
    }

    public async Task<TResult> GetAsync<TResult>(string uri)
    {
        string serialized = null;
        var httpResponse = await HttpInvoker(async () =>
        {
            var response = await client.GetAsync(uri);
            response.EnsureSuccessStatusCode();
            serialized = await response.Content.ReadAsStringAsync();
            return response;
        });
        return JsonConvert.DeserializeObject<TResult>(serialized);
    }

Notice that the code from the GetAsync method in the RequestProvider class is still present, but is now specified as a lambda expression. This lambda expression is passed to the HttpInvoker method, which in turn passes it to the RetryOnExceptionAsync method of the RetryWithExponentialBackoff class. Therefore, the code in the lambda expression is what will be retried if the GET request fails.

RetryWithExponentialBackoff

The RetryWithExponentialBackoff class has a constructor with three arguments, as shown in the following code example:

    public RetryWithExponentialBackoff(int retries = 10, int delay = 200, int maxDelay = 2000)
    {
        maxRetries = retries;
        delayMilliseconds = delay;
        maxDelayMilliseconds = maxDelay;
    }

The constructor arguments specify the maximum number of retries, the initial delay between retries (in milliseconds), and the maximum delay between retries (in milliseconds). All three arguments have default values, which allows the constructor to be called without arguments when the class is instantiated in the App class.
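As a sketch of how those defaults can be used or overridden, the following uses a stand-in class with the same constructor signature. The public properties and the ConstructorExamples class are assumptions added for illustration; the sample's class stores the values in private fields instead.

```csharp
// Stand-in with the same constructor signature as the sample's class.
// The public properties are an assumption so the values can be inspected.
public class RetryWithExponentialBackoff
{
    public int MaxRetries { get; }
    public int DelayMilliseconds { get; }
    public int MaxDelayMilliseconds { get; }

    public RetryWithExponentialBackoff(int retries = 10, int delay = 200, int maxDelay = 2000)
    {
        MaxRetries = retries;
        DelayMilliseconds = delay;
        MaxDelayMilliseconds = maxDelay;
    }
}

// Hypothetical usage examples.
public static class ConstructorExamples
{
    // All three defaults apply: 10 retries, 200 ms initial delay, 2000 ms maximum delay.
    public static RetryWithExponentialBackoff Defaults() =>
        new RetryWithExponentialBackoff();

    // Named arguments override only what is needed; delay keeps its default of 200 ms.
    public static RetryWithExponentialBackoff ForUserInteraction() =>
        new RetryWithExponentialBackoff(retries: 3, maxDelay: 1000);
}
```

Named arguments make it possible to shorten the retry count and maximum delay for an operation that is part of a user interaction, while leaving the initial delay at its default.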

The RetryOnExceptionAsync method in the RetryWithExponentialBackoff class is shown in the following code example:

    public async Task<HttpResponseMessage> RetryOnExceptionAsync<TException>(Func<Task<HttpResponseMessage>> operation)
        where TException : Exception
    {
        HttpResponseMessage response;
        var backoff = new ExponentialBackoff(maxRetries, delayMilliseconds, maxDelayMilliseconds);
        while (true)
        {
            try
            {
                response = await operation();
                break;
            }
            catch (Exception ex) when (ex is TimeoutException || ex is TException)
            {
                Debug.WriteLine("Exception: " + ex.Message);
                await backoff.Delay();
            }
        }
        return response;
    }

This method is responsible for implementing the retry pattern - retrying failed operations with an exponential backoff up to a maximum number of retries. It uses an infinite loop to execute the operation that was specified as a lambda expression in the GetAsync method in the ResilientRequestProvider class. If the operation succeeds, the loop exits and the response received from the web service is returned. However, if the operation fails due to a transient fault, that is a TimeoutException or an HttpRequestException, the operation is retried after a delay controlled by the ExponentialBackoff struct.

Obviously, the implementation of the RetryWithExponentialBackoff class could be tidied up so that it takes a dependency on an IBackoff type, which the ExponentialBackoff struct would implement. This would allow different backoff strategy implementations to easily be swapped in and out. However, the current implementation adequately demonstrates what I’m trying to show – retrying requests that failed due to transient faults.
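A minimal sketch of that refactoring might look like the following. None of it exists in the sample application: the IBackoff interface, the FixedBackoff class, and the RetryWithBackoff class are hypothetical names. A factory delegate is assumed so that each call starts with a fresh retry count, as the original method does by creating a new ExponentialBackoff.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction; the sample application does not define it.
public interface IBackoff
{
    // Waits before the next retry, or throws when no retries remain.
    Task Delay();
}

// A second strategy, to show that implementations can be swapped.
public class FixedBackoff : IBackoff
{
    readonly int maxRetries;
    readonly int delayMilliseconds;
    int retries;

    public FixedBackoff(int maxRetries, int delayMilliseconds)
    {
        this.maxRetries = maxRetries;
        this.delayMilliseconds = delayMilliseconds;
    }

    public Task Delay()
    {
        if (retries >= maxRetries)
        {
            throw new TimeoutException($"{maxRetries} retry attempts made. Retries failed.");
        }
        retries++;
        return Task.Delay(delayMilliseconds);
    }
}

public class RetryWithBackoff
{
    readonly Func<IBackoff> backoffFactory;

    public RetryWithBackoff(Func<IBackoff> backoffFactory)
    {
        this.backoffFactory = backoffFactory;
    }

    public async Task<T> RetryOnExceptionAsync<TException, T>(Func<Task<T>> operation)
        where TException : Exception
    {
        var backoff = backoffFactory(); // fresh retry counter for each call
        while (true)
        {
            try
            {
                return await operation();
            }
            catch (TException)
            {
                // Throws TimeoutException once the strategy runs out of retries.
                await backoff.Delay();
            }
        }
    }
}
```

Under these assumptions, switching strategy is a one-line change where the RetryWithBackoff instance is constructed, for example new RetryWithBackoff(() => new FixedBackoff(3, 500)).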

ExponentialBackoff

The ExponentialBackoff struct has a constructor requiring three arguments:

    public ExponentialBackoff(int noOfRetries, int delay, int maxDelay)
    {
        maxRetries = noOfRetries;
        delayMilliseconds = delay;
        maxDelayMilliseconds = maxDelay;
        retries = 0;
        pow = 1;
    }

All three arguments must be specified when creating an instance of the struct, and they should be identical to the values of the arguments in the RetryWithExponentialBackoff class constructor.

The RetryOnExceptionAsync method in the RetryWithExponentialBackoff class invokes the Delay method in the ExponentialBackoff struct if a transient fault has occurred. The Delay method is shown in the following code example:

    public Task Delay()
    {
        if (retries < maxRetries)
        {
            retries++;
            pow = pow << 1;
        }
        else
        {
            throw new TimeoutException($"{maxRetries} retry attempts made. Retries failed.");
        }
        int delay = Math.Min(delayMilliseconds * (pow - 1) / 2, maxDelayMilliseconds);
        Debug.WriteLine($"Retry {retries} after {delay} milliseconds delay. Maximum delay is {maxDelayMilliseconds} milliseconds.");
        return Task.Delay(delay);
    }

This method implements a roughly exponential delay, up to a maximum number of milliseconds specified by the maxDelayMilliseconds variable, while ensuring that the maximum number of retry attempts isn’t exceeded by throwing a TimeoutException when the number of actual retries is equal to the maximum number of retries allowed.
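To see why the delay is only roughly exponential, the arithmetic from the Delay method can be run without actually waiting. The BackoffSchedule class below is a hypothetical helper written for this illustration; its default argument values mirror those of the RetryWithExponentialBackoff constructor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that reproduces the Delay method's arithmetic.
public static class BackoffSchedule
{
    public static List<int> Delays(int maxRetries = 10, int delayMilliseconds = 200, int maxDelayMilliseconds = 2000)
    {
        var delays = new List<int>();
        int pow = 1;
        for (int retry = 1; retry <= maxRetries; retry++)
        {
            pow <<= 1; // 2, 4, 8, ...
            // (pow - 1) / 2 grows as 0.5, 1.5, 3.5, 7.5, ... until the cap applies.
            delays.Add(Math.Min(delayMilliseconds * (pow - 1) / 2, maxDelayMilliseconds));
        }
        return delays;
    }
}
```

With the default values this produces 100, 300, 700, and 1500 milliseconds for the first four retries, and the 2000 millisecond cap for retries five to ten.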

Running the Sample Application

The sample application, which can be found on GitHub, connects to a read-only REST service hosted by Xamarin, and it’s most likely that when running the sample the GET operation will succeed on first attempt. To observe the retry pattern in operation, change the RestUrl property in the Constants class to an address that doesn’t exist – this can be accomplished by adding a random character to the end of the existing string. Then run the application and observe the output in the output window in Visual Studio. You should see something like:

    Exception: 404 (Not Found)
    Retry 1 after 100 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 2 after 300 milliseconds delay. Maximum delay is 2000 milliseconds.
    Thread started: <Thread Pool> #10
    Exception: 404 (Not Found)
    Retry 3 after 700 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 4 after 1500 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 5 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 6 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 7 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 8 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Thread started: <Thread Pool> #11
    Thread started: <Thread Pool> #12
    Exception: 404 (Not Found)
    Retry 9 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    Retry 10 after 2000 milliseconds delay. Maximum delay is 2000 milliseconds.
    Exception: 404 (Not Found)
    ERROR: 10 retry attempts made. Retries failed.

This shows the GET operation being retried 10 times, after a roughly exponentially increasing delay, up to a maximum of 2000 milliseconds. Remember that the number of retries, initial delay (in milliseconds), and maximum delay (in milliseconds) can be specified when creating the RetryWithExponentialBackoff instance. This allows the retry pattern to be customized to fit individual application requirements.

Summary

The retry pattern allows applications to retry a failing request to a remote service, after a suitable delay. Remote access requests that are repeated after a suitable delay are likely to succeed, if the fault in the remote service is transient.

This blog post has explained how to implement the retry pattern, with exponential backoff. The number of retries, initial delay, and maximum delay can all be specified, allowing the retry pattern to be customized to fit individual application requirements. The advantage of the approach presented here is that the retry pattern is implemented without requiring any library code, for those sensitive to bloating their application package size.

In my next blog post I’ll show how to re-implement the retry pattern using Polly.