A quick guide to Resilience4j with Spring Boot.

Using Resilience4j with Spring Boot is simple: most patterns need only an annotation and a few lines of configuration.

Resilience4j is a fault tolerance library inspired by Netflix Hystrix that offers implementations of many microservice stability and fault tolerance patterns. As per its documentation, it is lightweight and easy to use. With Spring Boot, it is very easy to define these patterns and incorporate them in our apps using annotations.

With Spring Boot, Resilience4j is preferred over Hystrix for implementing fault tolerance patterns such as circuit breakers, bulkheads, timeouts, rate limiters and retries.

Use cases for Resilience4j:
  1. Circuit Breaker
  2. Rate Limiter
  3. Bulkhead
  4. Time Limiter
  5. Retry

Circuit Breaker

At home a circuit breaker protects the home and electrical appliances by breaking the circuit and stopping the flow of electricity when there is excess current. The same pattern can be applied in software to protect the system and individual microservices from huge failures.

How it works?

A circuit breaker has 3 states:

  1. CLOSED – This is the normal state in which all requests flow through the circuit without any restriction.
  2. OPEN – If, over the last n requests or the last n seconds, the rate of failures or slow responses is equal to or greater than a configurable threshold, the circuit opens. In this state all calls are rejected with CallNotPermittedException.
  3. HALF_OPEN – After a configurable wait time in the OPEN state, the circuit breaker allows a small, configurable number of requests to pass through. In the HALF_OPEN state:
    • If the rate of failures or slow responses is at or above the threshold, it moves back to the OPEN state and waits again.
    • If the rate of failures or slow responses is below the threshold, it moves to the CLOSED state and functions normally.

The advantage of this pattern is that a system suffering failures or slow responses does not get overwhelmed with more calls. The wait time in the OPEN state gives it time to heal. With the circuit open, we can send a meaningful response back to users right away instead of making them wait for a timeout.
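
The state transitions described above can be sketched in plain Java. This is only an illustration of the pattern, not Resilience4j's implementation: the class SimpleCircuitBreaker, its fields and its count-based window are all assumptions made for this example, and it is meant for single-threaded, sequential use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative, single-threaded circuit breaker. Not Resilience4j's implementation. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;               // calls evaluated while CLOSED
    private final double failureRateThreshold;  // percentage, e.g. 50.0
    private final long waitInOpenMillis;        // how long to stay OPEN
    private final int halfOpenCalls;            // trial calls evaluated in HALF_OPEN

    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long waitInOpenMillis, int halfOpenCalls) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitInOpenMillis = waitInOpenMillis;
        this.halfOpenCalls = halfOpenCalls;
    }

    State state() {
        return state;
    }

    /** Returns true if a call is allowed at the given time. */
    boolean tryAcquire(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= waitInOpenMillis) {
            state = State.HALF_OPEN;   // wait time elapsed: allow trial calls
            outcomes.clear();
        }
        return state != State.OPEN;
    }

    /** Records the outcome of an allowed call and updates the state. */
    void record(boolean failed, long nowMillis) {
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst();    // keep only the most recent calls
        }
        int required = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        if (outcomes.size() < required) {
            return;                    // not enough calls to judge yet
        }
        long failures = outcomes.stream().filter(f -> f).count();
        double failureRate = 100.0 * failures / outcomes.size();
        if (failureRate >= failureRateThreshold) {
            state = State.OPEN;        // too many failures: reject calls for a while
            openedAtMillis = nowMillis;
            outcomes.clear();
        } else if (state == State.HALF_OPEN) {
            state = State.CLOSED;      // trial calls were healthy: resume normal flow
            outcomes.clear();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker cb = new SimpleCircuitBreaker(4, 50.0, 5_000, 2);
        cb.record(true, 0);
        cb.record(true, 0);
        cb.record(false, 0);
        cb.record(false, 0);
        System.out.println(cb.state());            // OPEN: 2 of 4 calls failed
        System.out.println(cb.tryAcquire(1_000));  // false: still inside the wait time
        System.out.println(cb.tryAcquire(5_000));  // true: the breaker is now HALF_OPEN
        cb.record(false, 5_000);
        cb.record(false, 5_000);
        System.out.println(cb.state());            // CLOSED: both trial calls succeeded
    }
}
```

Time is passed in explicitly so the transitions can be followed without real waiting.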

How to configure a circuit breaker in Spring Boot:

Maven pom.xml: Add the 3 dependencies below.

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot2</artifactId>
   <version>1.7.0</version>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Then we need to define the configs for a circuit breaker in application.yml. With the configs below:

  • The circuit OPENs if 50% of the calls failed (failureRateThreshold: 50) or 100% of the calls (slowCallRateThreshold: 100) took longer than 2 seconds (slowCallDurationThreshold: 2000) in a 60 second window (slidingWindowSize: 60 and slidingWindowType: TIME_BASED).
  • Once the circuit OPENs, it rejects requests with CallNotPermittedException for 5 seconds (waitDurationInOpenState: 5s) and then moves to the HALF_OPEN state.
  • It permits 3 calls in the HALF_OPEN state (permittedNumberOfCallsInHalfOpenState: 3). If the failure or slow call rate is still at or above the threshold, it moves to the OPEN state again.
  • minimumNumberOfCalls: 10 sets the minimum number of calls needed before the failure or slow call rate is calculated and the circuit breaker can move from CLOSED to OPEN. It is not applicable when the circuit breaker is in the HALF_OPEN state. For example, if there are only 9 calls in a 60 second window, the circuit will not OPEN even if all of them failed.
  • ignoreExceptions specifies a list of exceptions that are ignored when counting failures.

resilience4j.circuitbreaker:
  instances:
    greetingClientCB:
      registerHealthIndicator: true
      slidingWindowSize: 60
      slidingWindowType: TIME_BASED
      permittedNumberOfCallsInHalfOpenState: 3
      minimumNumberOfCalls: 10
      waitDurationInOpenState: 5s
      slowCallRateThreshold: 100
      slowCallDurationThreshold: 2000
      failureRateThreshold: 50
      ignoreExceptions:
        - com.jsession4d.feignclient.BusinessException

Note: The default slidingWindowType is COUNT_BASED. If it is COUNT_BASED, then slidingWindowSize represents the number of requests instead of the number of seconds. For a complete set of configs, refer to the Resilience4j docs.
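
To make the threshold arithmetic concrete, here is a small plain-Java sketch of the CLOSED to OPEN decision under the settings above. The class OpenDecision and its method are hypothetical helpers written for this article, not part of Resilience4j; the sketch only mirrors the rules described in the bullets.

```java
/** Illustrative check of when the circuit would open. Not Resilience4j code. */
class OpenDecision {

    /**
     * Decides whether the circuit should move from CLOSED to OPEN, given the
     * calls counted in the current sliding window.
     */
    static boolean shouldOpen(int totalCalls, int failedCalls, int slowCalls,
                              int minimumNumberOfCalls,
                              double failureRateThreshold,
                              double slowCallRateThreshold) {
        if (totalCalls < minimumNumberOfCalls) {
            return false; // too few calls to calculate a meaningful rate
        }
        double failureRate = 100.0 * failedCalls / totalCalls;
        double slowCallRate = 100.0 * slowCalls / totalCalls;
        return failureRate >= failureRateThreshold
                || slowCallRate >= slowCallRateThreshold;
    }

    public static void main(String[] args) {
        // Values taken from the application.yml above: minimum 10 calls, 50% failures, 100% slow calls.
        System.out.println(shouldOpen(9, 9, 0, 10, 50, 100));   // false: only 9 calls
        System.out.println(shouldOpen(10, 5, 0, 10, 50, 100));  // true: 50% of calls failed
        System.out.println(shouldOpen(20, 2, 19, 10, 50, 100)); // false: 10% failed, 95% slow
        System.out.println(shouldOpen(20, 0, 20, 10, 50, 100)); // true: 100% of calls were slow
    }
}
```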

Then we just need to add the @CircuitBreaker(name="greetingClientCB") annotation to the method that calls the external service. In the sample below we have annotated a FeignClient method.

@FeignClient(name="greetingClient", url="${serviceUrl}")
public interface SampleFeignClient {

    // Calls through this method are guarded by the greetingClientCB circuit breaker.
    @CircuitBreaker(name="greetingClientCB")
    @RequestMapping(method=RequestMethod.GET, value="/greeting/{greetingId}")
    String getGreeting(@PathVariable("greetingId") int id);
}

At a bare minimum that’s all we need to configure circuit breaker.

Circuit Breaker details and statuses in Spring Boot Actuator
Endpoint: /actuator/health

We can see circuit breaker statuses in the /actuator/health endpoint. This is disabled by default. Add the configs below to enable it.

management:
  endpoint:
    health:
      show-details: always         #To show all details in /health endpoint.

management.health.circuitbreakers.enabled: true #To show Circuit Breaker status

resilience4j.circuitbreaker:
  instances:
    greetingClientCB:                         #An identifier for our Circuitbreaker
      registerHealthIndicator: true           #Needed to show the greetingClientCB status
                                              #in /actuator/health endpoint

This will show the Circuit breaker statuses in /actuator/health endpoint.

Endpoint: /actuator/metrics

As metrics endpoint is not exposed by default, add the below config to expose all the actuator endpoints.

management:
  endpoints:
    web:
      exposure:
        include: "*"

Then we can see all the available metrics under the endpoint.

More details on these metrics can be viewed by appending the metric name to /actuator/metrics url as shown below.

  • /actuator/metrics/{requiredMetricName}
  • /actuator/metrics/resilience4j.circuitbreaker.calls
  • /actuator/metrics/resilience4j.circuitbreaker.failure.rate
  • /actuator/metrics/resilience4j.circuitbreaker.slow.call.rate
  • /actuator/metrics/resilience4j.circuitbreaker.state

Endpoint: /actuator/circuitbreakerevents

The emitted circuit breaker events are stored in a circular event consumer buffer. The size of the buffer (eventConsumerBufferSize) can be configured in application.yml.

resilience4j.circuitbreaker:
  instances:
    greetingClientCB:
      eventConsumerBufferSize: 10
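
For readers unfamiliar with circular buffers, the idea is that once the buffer is full, each new event overwrites the oldest one. A minimal plain-Java sketch follows. EventRingBuffer is a hypothetical class written for this article and says nothing about how Resilience4j implements its own buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative fixed-size circular buffer that keeps only the newest events. */
class EventRingBuffer {
    private final String[] slots;
    private int next;   // index where the next event will be written
    private int count;  // number of events currently stored

    EventRingBuffer(int capacity) {
        this.slots = new String[capacity];
    }

    void add(String event) {
        slots[next] = event;                 // overwrites the oldest event once full
        next = (next + 1) % slots.length;
        if (count < slots.length) {
            count++;
        }
    }

    /** Returns the stored events, oldest first. */
    List<String> snapshot() {
        List<String> result = new ArrayList<>(count);
        int start = (next - count + slots.length) % slots.length;
        for (int i = 0; i < count; i++) {
            result.add(slots[(start + i) % slots.length]);
        }
        return result;
    }

    public static void main(String[] args) {
        EventRingBuffer buffer = new EventRingBuffer(10); // same size as eventConsumerBufferSize above
        for (int i = 1; i <= 12; i++) {
            buffer.add("event-" + i);
        }
        // event-1 and event-2 have been overwritten; event-3 to event-12 remain.
        System.out.println(buffer.snapshot());
    }
}
```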

All the below endpoints are available.

http://localhost:8080/actuator/circuitbreakerevents
http://localhost:8080/actuator/circuitbreakerevents/{name}/{eventType}
http://localhost:8080/actuator/circuitbreakerevents/{name}
http://localhost:8080/actuator/circuitbreakers
http://localhost:8080/actuator/circuitbreakers/{name}

Rate Limiter

This is another very useful pattern in microservices.

Say you are running a batch job that calls a microservice. If that microservice can handle only 10 transactions per second (TPS), requests are rejected with an HTTP 429 (Too Many Requests) error once the consumer crosses the 10 TPS limit. Unless we have some kind of fault tolerance pattern, the batch will fail.

In this scenario, we can use a rate limiter to limit the TPS on the consumer side and avoid HTTP 429s. If the consumer reaches the rate limit, subsequent calls wait until the rate of calls decreases or a configurable timeout occurs, whichever happens first.

Spring Boot Configs for ratelimiter.

Maven pom.xml: The same 3 dependencies that we added for the circuit breaker. resilience4j-spring-boot2 has the implementations for all the fault tolerance patterns provided by Resilience4j.

Note: The actuator and starter-aop dependencies are needed to support Resilience4j.

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot2</artifactId>
   <version>1.7.0</version>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Here we are limiting the rate of calling a method to 5 TPS, using rate limiter configs in application.yml. Excess requests will wait for up to 3 seconds, as configured below (timeoutDuration: 3s).

resilience4j.ratelimiter:
  instances:
    my5TPSRateLimiter:
      limitRefreshPeriod: 1s
      limitForPeriod: 5
      timeoutDuration: 3s
      registerHealthIndicator: true    #To register ratelimiter details to /actuator/health endpoint.
      eventConsumerBufferSize: 100     #Buffer size to store the ratelimiter events - viewable via /actuator/ratelimiterevents

limitRefreshPeriod – specifies the time window in which requests are counted. Here it is 1 second.
limitForPeriod – specifies how many requests or method invocations are allowed in each limitRefreshPeriod.
timeoutDuration – if the thread does not get permission to invoke the method within this time, the call is rejected with an io.github.resilience4j.ratelimiter.RequestNotPermitted exception.
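
How these three settings interact can be worked out with simple arithmetic. The plain-Java sketch below estimates how long each request in a burst would wait. It is a simplified model, not Resilience4j's algorithm: the class RateLimitPlan is hypothetical, and it assumes that all requests arrive together at the start of a period, that permissions are handed out in arrival order, and that a wait exactly equal to timeoutDuration is still accepted.

```java
import java.time.Duration;
import java.util.Optional;

/** Illustrative wait-time arithmetic for a burst of requests. Not Resilience4j code. */
class RateLimitPlan {

    /**
     * Returns the wait for the request at the given 0-based position in the burst,
     * or an empty Optional if it would have to wait longer than the timeout.
     */
    static Optional<Duration> waitFor(int position, int limitForPeriod,
                                      Duration limitRefreshPeriod,
                                      Duration timeoutDuration) {
        long period = position / limitForPeriod;            // which refresh period serves it
        Duration wait = limitRefreshPeriod.multipliedBy(period);
        return wait.compareTo(timeoutDuration) > 0 ? Optional.empty() : Optional.of(wait);
    }

    public static void main(String[] args) {
        Duration refresh = Duration.ofSeconds(1);
        Duration timeout = Duration.ofSeconds(3);
        // With limitForPeriod = 5: positions 0-4 start at once, 5-9 wait 1s, and so on.
        for (int position : new int[] {0, 5, 10, 19, 20}) {
            System.out.println(position + " -> " + waitFor(position, 5, refresh, timeout));
        }
    }
}
```

Under these assumptions, a burst of 20 requests is fully served within the 3 second timeout, while a 21st request would be rejected.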

Now all we need to do is to annotate the method with @RateLimiter(name="my5TPSRateLimiter") as shown below.

@RateLimiter(name="my5TPSRateLimiter")
public void callService() {
    try {
        // Log the thread name and the time at which the call was permitted.
        System.out.println(Thread.currentThread().getName() + "...running " +
                LocalTime.now().format(DateTimeFormatter.ofPattern("HH:mm:ss")));
        sampleFeignClient.getGreeting(5000);
    } catch (Exception e) {
        System.out.println(e.getLocalizedMessage());
    }
}

Here is the output. The method was called with 20 threads. The first 5 threads invoked the method at 16:06:39 (HH:mm:ss), the second 5 at 16:06:40, the third 5 at 16:06:41 and the fourth 5 at 16:06:42.

But note that the rate limiter does not lock the method until the first 5 threads complete execution and exit the method. It only limits the rate of method invocation. For example, if the method takes 5 seconds to complete, then all 20 threads will be inside the method at the 4th second (at 16:06:42). To avoid that, we should use another well-known pattern called the “Bulkhead” pattern.

Thread20...running 16:06:39
Thread18...running 16:06:39
Thread3...running 16:06:39
Thread15...running 16:06:39
Thread17...running 16:06:39
Thread16...running 16:06:40
Thread12...running 16:06:40
Thread13...running 16:06:40
Thread1...running 16:06:40
Thread2...running 16:06:40
Thread11...running 16:06:41
Thread14...running 16:06:41
Thread4...running 16:06:41
Thread7...running 16:06:41
Thread6...running 16:06:41
Thread8...running 16:06:42
Thread10...running 16:06:42
Thread9...running 16:06:42
Thread5...running 16:06:42
Thread19...running 16:06:42

Rate Limiter details in Spring Boot actuator.

Actuator configs are similar to that of Circuit Breaker.

management:
  endpoints:
    web:
      exposure:
        include: "*"              #To expose all endpoints
  endpoint:
    health:
      show-details: always         # To show all details in /health endpoint.

management.health.ratelimiters.enabled: true

resilience4j.ratelimiter:
  instances:
    my5TPSRateLimiter:
      registerHealthIndicator: true

Screen print of /actuator/health endpoint.

Like the circuit breaker, the rate limiter also has the actuator endpoints below.

/actuator/metrics/{requiredMetricName}
/actuator/ratelimiters
/actuator/ratelimiterevents
/actuator/ratelimiterevents/{name}
/actuator/ratelimiterevents/{name}/{eventType}

Bulkhead

Bulkhead is another interesting and important fault tolerance pattern made famous by Michael T. Nygard in his book “Release It!”. In a ship, bulkheads are partitions that create separate watertight compartments when sealed. If there is damage or a water leak in one part of the ship, bulkheads prevent water from reaching other compartments, limiting the damage to the ship and preventing it from sinking.

The same isolation that a bulkhead provides can be applied to software. In this pattern the dependencies are isolated so that resource constraints in one dependency do not affect the others or the whole system. That isolation can be achieved by assigning a thread pool to each dependency.

For example, suppose we assign a pool of 10 threads to call Service1, 5 threads to call Service2, and 4 threads to call Service3. This offers two important benefits.

  1. The main request thread can walk away if the dependency thread takes too long to respond. Otherwise we would run out of request threads.
  2. We can control how many concurrent threads call a service, thereby protecting both the consumer and the provider of the service by limiting the requests.

Resilience4j also offers a way to control the number of concurrent calls using semaphores. So Resilience4j offers the bulkhead pattern with both thread pools and semaphores.

Note 1: Semaphore-based bulkheads use the same user request thread and do not create new threads, whereas thread pool bulkheads create new threads for processing.

Note 2: The thread pool bulkhead is only applicable for methods returning CompletableFuture.

Note 3: Semaphore Bulkhead is the default.
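
The semaphore approach can be illustrated with java.util.concurrent.Semaphore. The sketch below is a plain-Java illustration of the idea, not Resilience4j's Bulkhead class: SemaphoreBulkhead and its constructor parameters are hypothetical names, loosely modelled on the maxConcurrentCalls and maxWaitDuration settings.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative semaphore-based bulkhead. Not Resilience4j's implementation. */
class SemaphoreBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the call on the caller's own thread if a permit becomes available in time. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the bulkhead", e);
        }
        if (!acquired) {
            throw new IllegalStateException("Bulkhead is full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always give the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SemaphoreBulkhead bulkhead = new SemaphoreBulkhead(2, 0);
        // Two nested calls hold two permits at the same time, which is allowed.
        String ok = bulkhead.execute(() -> bulkhead.execute(() -> "ok"));
        System.out.println(ok);
        try {
            // A third nested call would need a third permit and is rejected.
            String tooMany = bulkhead.execute(
                    () -> bulkhead.execute(() -> bulkhead.execute(() -> "too many")));
            System.out.println(tooMany);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Nested calls are used in the demo only because they hold several permits at once on a single thread; in a real service the concurrent holders would be different request threads.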

Spring Boot Configs for bulkhead.

Maven pom.xml: The same 3 dependencies that we added for the circuit breaker and rate limiter. resilience4j-spring-boot2 has the implementations for all the fault tolerance patterns provided by Resilience4j.

Note: The actuator and starter-aop dependencies are needed to support Resilience4j.

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot2</artifactId>
   <version>1.7.0</version>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Configs in application.yml. Most of the config names are self-explanatory. keepAliveDuration (default: 20ms): when the number of threads is greater than the core size, this is the maximum time that excess idle threads will wait for new tasks before terminating. The default works well in most cases.

#Threadpool bulkhead
resilience4j.thread-pool-bulkhead:
  instances:
    my-service1:
      maxThreadPoolSize: 3
      coreThreadPoolSize: 2
      queueCapacity: 1
      keepAliveDuration: 20ms 


#Semaphore bulkhead
resilience4j.bulkhead:
  instances:
    my-service2:
      maxConcurrentCalls: 2  #Max Amount of parallel execution allowed by bulkhead.
      maxWaitDuration: 5s
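
To see what the thread pool numbers above mean in practice, the sketch below feeds blocking tasks into a plain java.util.concurrent.ThreadPoolExecutor with 2 core threads, at most 3 threads and a queue of 1. This is an illustration under the assumption that the bulkhead's pool behaves like a standard bounded ThreadPoolExecutor; the class ThreadPoolSizingDemo is hypothetical and does not use Resilience4j.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrates how core size, max size and queue capacity bound concurrent work. */
class ThreadPoolSizingDemo {

    /** Submits the given number of blocking tasks and returns how many were rejected. */
    static int countRejected(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 3, 20, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();   // simulate a slow dependency
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                // pool and queue are both full
                }
            }
        } finally {
            release.countDown();               // let the blocked tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 core threads + 1 queued task + 1 extra thread = 4 tasks accepted at once.
        System.out.println(countRejected(4)); // 0
        System.out.println(countRejected(5)); // 1
    }
}
```

With a standard executor, the extra (third) thread is only created once the queue is full, so the order is: core threads first, then the queue, then threads up to the maximum.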

Next, annotate the service calls with @Bulkhead. The default is the semaphore bulkhead, so to specify a thread pool bulkhead we add the type parameter to the annotation: @Bulkhead(name="my-service1", type=Bulkhead.Type.THREADPOOL). The thread pool bulkhead is only applicable for methods returning CompletableFuture, so you cannot add a thread pool bulkhead to the callService2 method, as it does not return a CompletableFuture.

@Bulkhead(name="my-service1", type=Bulkhead.Type.THREADPOOL)
public CompletableFuture<String> callService() {
    return CompletableFuture.completedFuture(sampleFeignClient.callHelpdesk());
}

@Bulkhead(name="my-service2")
public String callService2() {
    return sampleFeignClient.callHelpdesk();
}

Bulkhead details in Spring Boot actuator.

Like the circuit breaker, the bulkhead also has the following actuator endpoints.

/actuator/metrics – We can view the available metrics in metrics endpoint.
/actuator/metrics/{requiredMetricName}
/actuator/bulkheads
/actuator/bulkheadevents
/actuator/bulkheadevents/{name}
/actuator/bulkheadevents/{name}/{eventType}

The only config needed is to expose all the endpoints.

spring:
  jackson:
    serialization:
      INDENT_OUTPUT: true      #To indent the JSON in actuator.

management:
  endpoints:
    web:
      exposure:
        include: "*"              #To expose all endpoints

Note: /actuator/health will not have any details about bulkhead.

Time Limiter

Using the time limiter, you can set a limit on the time a service call running in a separate thread may take. If the call takes longer than the configured limit, it fails with a java.util.concurrent.TimeoutException carrying the message: TimeLimiter 'service1-tl' recorded a timeout exception.
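
The same idea can be tried out with the JDK alone. The sketch below uses CompletableFuture.orTimeout (available since Java 9) to fail a slow asynchronous call with a TimeoutException. It illustrates the pattern only: TimeLimitDemo and callWithLimit are hypothetical names, and this is not how the Resilience4j TimeLimiter is implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrates limiting the duration of an asynchronous call with the JDK only. */
class TimeLimitDemo {

    /** Runs a task that takes workMillis and gives up waiting after limitMillis. */
    static String callWithLimit(long workMillis, long limitMillis) {
        CompletableFuture<String> future = CompletableFuture
                .supplyAsync(() -> {
                    try {
                        Thread.sleep(workMillis);   // simulate a slow service call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return "done";
                })
                .orTimeout(limitMillis, TimeUnit.MILLISECONDS);
        try {
            return future.get();
        } catch (ExecutionException e) {
            // The TimeoutException arrives wrapped in an ExecutionException.
            return e.getCause() instanceof TimeoutException ? "timed out" : "failed";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithLimit(10, 2_000)); // finishes well within the limit
        System.out.println(callWithLimit(1_000, 50)); // exceeds the limit
    }
}
```

Note that the caller sees an ExecutionException whose cause is the TimeoutException, the same wrapping that appears when calling get() on a CompletableFuture.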

Spring Boot Configs for timelimiter.

Maven pom.xml: The same 3 dependencies that we added for the circuit breaker, rate limiter and bulkhead. resilience4j-spring-boot2 has the implementations for all the fault tolerance patterns provided by Resilience4j.

Note: The actuator and starter-aop dependencies are needed to support Resilience4j.

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot2</artifactId>
   <version>1.7.0</version>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

application.yml configs for timelimiter.

resilience4j.timelimiter:
  instances:
    service1-tl:
      timeoutDuration: 2s
      cancelRunningFuture: true

The TimeLimiter aspect is only applicable for reactive methods or methods returning CompletableFuture. Methods returning CompletableFuture should also run in a thread pool, so for the TimeLimiter aspect to work we also need a thread pool bulkhead.

@TimeLimiter(name="service1-tl")
@Bulkhead(name="my-service1", type=Bulkhead.Type.THREADPOOL)
public CompletableFuture<String> callService() {
    return CompletableFuture.completedFuture(sampleFeignClient.callHelpdesk());
}

Here if the service call takes more than 2 seconds, then the below exception will be thrown.

java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: TimeLimiter 'service1-tl' recorded a timeout exception.
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)

TimeLimiter details in Spring Boot actuator.

Like bulkhead, timelimiter also has the following actuator endpoints.

/actuator/metrics – We can view the available metrics in metrics endpoint.
/actuator/metrics/{requiredMetricName}
/actuator/timelimiters
/actuator/timelimiterevents
/actuator/timelimiterevents/{name}
/actuator/timelimiterevents/{name}/{eventType}

The only config needed is to expose all the endpoints.

spring:
  jackson:
    serialization:
      INDENT_OUTPUT: true      #To indent the JSON in actuator.

management:
  endpoints:
    web:
      exposure:
        include: "*"              #To expose all endpoints

Retry

Retry is a must-have pattern for service calls that can fail transiently.

Spring Boot Configs for Retry

Maven pom.xml: The same 3 dependencies that we added for the circuit breaker, rate limiter, bulkhead and time limiter. resilience4j-spring-boot2 has the implementations for all the fault tolerance patterns provided by Resilience4j.

Note: The actuator and starter-aop dependencies are needed to support Resilience4j.

<dependency>
   <groupId>io.github.resilience4j</groupId>
   <artifactId>resilience4j-spring-boot2</artifactId>
   <version>1.7.0</version>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

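application.yml configs for Retry. Here we make up to 3 attempts in total (maxAttempts counts the first call), starting with a wait of 1 second between attempts. Because enableExponentialBackoff is true with a multiplier of 2, the wait doubles after each failed attempt. We can also specify the list of exceptions to retry on and an ignore list of exceptions that should not be retried.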

resilience4j.retry:
  instances:
    my-service-retry:
      maxAttempts: 3
      waitDuration: 1s
      enableExponentialBackoff: true
      exponentialBackoffMultiplier: 2
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
        - feign.RetryableException
      ignoreExceptions:
        - com.jsession4d.samplerest.BusinessException
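
The wait times implied by this config can be worked out with a few lines of plain Java. The sketch below assumes that maxAttempts includes the first call and that each wait is the previous one multiplied by exponentialBackoffMultiplier. BackoffSchedule is a hypothetical helper written for this article, not a Resilience4j class.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Illustrative exponential backoff arithmetic. Not Resilience4j code. */
class BackoffSchedule {

    /**
     * Returns the waits between attempts. With maxAttempts attempts in total
     * (the first call included), there are maxAttempts - 1 waits.
     */
    static List<Duration> waits(int maxAttempts, Duration waitDuration, double multiplier) {
        List<Duration> result = new ArrayList<>();
        double millis = waitDuration.toMillis();
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            result.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier;   // the next wait grows by the multiplier
        }
        return result;
    }

    public static void main(String[] args) {
        // Values from the application.yml above: 3 attempts, 1s initial wait, multiplier 2.
        System.out.println(waits(3, Duration.ofSeconds(1), 2.0)); // [PT1S, PT2S]
        System.out.println(waits(5, Duration.ofSeconds(1), 2.0)); // [PT1S, PT2S, PT4S, PT8S]
    }
}
```

So under these assumptions, a call that keeps failing is tried at most 3 times, with waits of 1 second and then 2 seconds in between.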

Then we just need to annotate the method that needs retry.

@Retry(name="my-service-retry")
public String callService2() {
    return sampleFeignClient.callHelpdesk();
}

Retry details in Spring Boot actuator.

Like bulkhead, retry also has the following actuator endpoints.

/actuator/metrics – We can view the available metrics in metrics endpoint.
/actuator/metrics/{requiredMetricName}
/actuator/retries
/actuator/retryevents
/actuator/retryevents/{name}
/actuator/retryevents/{name}/{eventType}

The only config needed is to expose all the endpoints.

spring:
  jackson:
    serialization:
      INDENT_OUTPUT: true      #To indent the JSON in actuator.

management:
  endpoints:
    web:
      exposure:
        include: "*"              #To expose all endpoints

Resilience4j Aspect Order.

The default Aspect order of resilience4j is:

Retry ( CircuitBreaker ( RateLimiter ( TimeLimiter ( Bulkhead ( Function ) ) ) ) )

First Bulkhead creates a threadpool. Then TimeLimiter can limit the time of the threads. RateLimiter limits the number of calls on that function for a configurable time window. Any exceptions thrown by TimeLimiter or RateLimiter will be recorded by CircuitBreaker. Then retry will be executed.
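
The nesting in the expression above reads like ordinary function composition: the innermost wrapper is closest to the function, and the outermost one (Retry) sees the result last. The plain-Java sketch below mimics that with simple logging wrappers. The class AspectOrderDemo and its wrap method are hypothetical and only illustrate the nesting; no Resilience4j aspects are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrates how nested wrappers are entered and left. Not Resilience4j code. */
class AspectOrderDemo {

    /** Wraps a supplier so that it logs its name on entry and on exit. */
    static <T> Supplier<T> wrap(String name, List<String> log, Supplier<T> inner) {
        return () -> {
            log.add("enter " + name);
            try {
                return inner.get();
            } finally {
                log.add("exit " + name);
            }
        };
    }

    /** Builds Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(Function))))) and runs it. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Supplier<String> call = () -> {
            log.add("Function");
            return "hello";
        };
        // Wrap from the inside out, so the last name in the list ends up outermost.
        for (String name : List.of("Bulkhead", "TimeLimiter", "RateLimiter", "CircuitBreaker", "Retry")) {
            call = wrap(name, log, call);
        }
        call.get();
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The log starts with "enter Retry" and ends with "exit Retry", with "Function" in the middle, which is why Retry is able to re-run everything nested inside it.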

This default makes sense. But if for any reason we want to change the order of the aspects, we can specify that in application.yml as shown below. A higher number means higher priority. As per the config below, Retry finishes its work before the Circuit Breaker is applied.

resilience4j:
  circuitbreaker:
    circuitBreakerAspectOrder: 1
  retry:
    retryAspectOrder: 2

Always refer to: https://resilience4j.readme.io/docs for up to date information.
