resilience4j

Resilience4j is a fault tolerance library for Java™

Resilience4j is a lightweight fault tolerance library inspired by Netflix Hystrix, but designed for functional programming. It is lightweight because it depends only on Vavr, which has no other external library dependencies. Netflix Hystrix, in contrast, has a compile dependency on Archaius, which has many more external library dependencies such as Guava and Apache Commons Configuration.

Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference with a Circuit Breaker, Rate Limiter, Retry or Bulkhead. You can stack more than one decorator on any functional interface, lambda expression or method reference. The advantage is that you have the choice to select the decorators you need and nothing else.

Supplier<String> supplier = () -> backendService.doSomething(param1, param2);

Supplier<String> decoratedSupplier = Decorators.ofSupplier(supplier)
  .withRetry(Retry.ofDefaults("name"))
  .withCircuitBreaker(CircuitBreaker.ofDefaults("name"))
  .withBulkhead(Bulkhead.ofDefaults("name"));  

String result = Try.ofSupplier(decoratedSupplier)
  .recover(throwable -> "Hello from Recovery").get();

With Resilience4j you don’t have to go all-in; you can pick what you need.

CircuitBreaker

Getting started with resilience4j-circuitbreaker

Introduction

The CircuitBreaker is implemented via a finite state machine with three normal states: CLOSED, OPEN and HALF_OPEN and two special states DISABLED and FORCED_OPEN.

The CircuitBreaker uses a sliding window to store and aggregate the outcome of calls. You can choose between a count-based sliding window and a time-based sliding window. The count-based sliding window aggregates the outcome of the last N calls. The time-based sliding window aggregates the outcome of the calls of the last N seconds.

Count-based sliding window

The count-based sliding window is implemented with a circular array of N measurements.
If the count window size is 10, the circular array always has 10 measurements.
The sliding window incrementally updates a total aggregation. The total aggregation is updated when a new call outcome is recorded. When the oldest measurement is evicted, the measurement is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)

The time to retrieve a Snapshot is constant, O(1), since the Snapshot is pre-aggregated and is independent of the window size.
The space requirement (memory consumption) of this implementation should be O(n).
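The subtract-on-evict idea can be sketched in a few lines of plain Java. This is a minimal illustration under stated assumptions, not the Resilience4j source: the class name CountBasedWindowSketch and its methods are hypothetical, and only failures are tracked (no slow calls or durations).

```java
// Illustrative sketch only; not the Resilience4j source.
// All names here are hypothetical.
final class CountBasedWindowSketch {
    private final boolean[] failed; // true if the call stored in this slot failed
    private final boolean[] used;   // true once the slot holds a measurement
    private int head = 0;           // slot that will be overwritten next
    private int totalCalls = 0;     // pre-aggregated totals (the "total aggregation")
    private int totalFailed = 0;

    CountBasedWindowSketch(int windowSize) {
        failed = new boolean[windowSize];
        used = new boolean[windowSize];
    }

    // Records one call outcome in O(1).
    synchronized void record(boolean callFailed) {
        if (used[head]) {
            // Subtract-on-Evict: remove the oldest measurement from the totals.
            totalCalls--;
            if (failed[head]) {
                totalFailed--;
            }
        }
        failed[head] = callFailed;
        used[head] = true;
        totalCalls++;
        if (callFailed) {
            totalFailed++;
        }
        head = (head + 1) % failed.length;
    }

    // Reads the pre-aggregated totals in O(1), independent of the window size.
    synchronized float failureRate() {
        return totalCalls == 0 ? 0f : 100f * totalFailed / totalCalls;
    }

    synchronized int totalCalls() {
        return totalCalls;
    }

    public static void main(String[] args) {
        CountBasedWindowSketch window = new CountBasedWindowSketch(3);
        window.record(true);
        window.record(true);
        window.record(false);
        window.record(false); // evicts the first failure
        System.out.println(window.totalCalls() + " calls in window, failure rate "
            + window.failureRate() + "%");
    }
}
```

Because the totals are adjusted on every write, a read never has to walk the array, which is where the constant snapshot time comes from.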

Time-based sliding window

The time-based sliding window is implemented with a circular array of N partial aggregations (buckets).
If the time window size is 10 seconds, the circular array always has 10 partial aggregations (buckets). Every bucket aggregates the outcome of all calls which happen in a certain epoch second (partial aggregation). The head bucket of the circular array stores the call outcomes of the current epoch second. The other partial aggregations store the call outcomes of the previous seconds.
The sliding window does not store call outcomes (tuples) individually, but incrementally updates partial aggregations (buckets) and a total aggregation.
The total aggregation is updated incrementally when a new call outcome is recorded. When the oldest bucket is evicted, the partial total aggregation of that bucket is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)

The time to retrieve a Snapshot is constant, O(1), since the Snapshot is pre-aggregated and is independent of the time window size.
The space requirement (memory consumption) of this implementation should be nearly constant, O(n) in the number of buckets, since the call outcomes (tuples) are not stored individually. Only N partial aggregations and 1 total aggregation are created.

A partial aggregation consists of 3 integers, which count the number of failed calls, the number of slow calls and the total number of calls, and one long, which stores the total duration of all calls.
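A time-based window with per-second buckets might look roughly like the following sketch. It is an illustration under assumptions, not the library's implementation: TimeBasedWindowSketch and Bucket are hypothetical names, the caller passes the epoch second explicitly so the example is deterministic, and eviction scans all N buckets for simplicity where a production implementation would advance a head index instead.

```java
// Illustrative sketch only; not the Resilience4j source.
// All names here are hypothetical.
final class TimeBasedWindowSketch {

    // Partial aggregation for one epoch second: three ints and one long.
    private static final class Bucket {
        long epochSecond = -1; // -1 marks an empty bucket
        int failedCalls;
        int slowCalls;
        int totalCalls;
        long totalDurationMillis;

        void add(long durationMillis, boolean failed, boolean slow) {
            totalCalls++;
            if (failed) failedCalls++;
            if (slow) slowCalls++;
            totalDurationMillis += durationMillis;
        }
    }

    private final Bucket[] buckets;            // N partial aggregations
    private final Bucket total = new Bucket(); // 1 total aggregation

    TimeBasedWindowSketch(int windowSizeSeconds) {
        buckets = new Bucket[windowSizeSeconds];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new Bucket();
        }
    }

    // Assumes epochSecond is non-negative and never decreases between calls.
    synchronized void record(long epochSecond, long durationMillis, boolean failed, boolean slow) {
        evictExpired(epochSecond);
        Bucket bucket = buckets[(int) (epochSecond % buckets.length)];
        bucket.epochSecond = epochSecond;
        bucket.add(durationMillis, failed, slow);
        total.add(durationMillis, failed, slow);
    }

    synchronized int totalCalls(long epochSecond) {
        evictExpired(epochSecond);
        return total.totalCalls;
    }

    synchronized int failedCalls(long epochSecond) {
        evictExpired(epochSecond);
        return total.failedCalls;
    }

    synchronized int slowCalls(long epochSecond) {
        evictExpired(epochSecond);
        return total.slowCalls;
    }

    // Subtract-on-Evict: buckets older than the window leave the total and are reset.
    private void evictExpired(long nowEpochSecond) {
        for (Bucket b : buckets) {
            if (b.epochSecond >= 0 && b.epochSecond <= nowEpochSecond - buckets.length) {
                total.failedCalls -= b.failedCalls;
                total.slowCalls -= b.slowCalls;
                total.totalCalls -= b.totalCalls;
                total.totalDurationMillis -= b.totalDurationMillis;
                b.epochSecond = -1;
                b.failedCalls = 0;
                b.slowCalls = 0;
                b.totalCalls = 0;
                b.totalDurationMillis = 0;
            }
        }
    }
}
```

Memory use depends only on the number of buckets, not on how many calls fall into each second.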

Failure rate and slow call rate thresholds

The state of the CircuitBreaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls have failed.
By default all exceptions count as a failure. You can define a list of exceptions which should count as a failure. All other exceptions are then counted as a success, unless they are ignored. Exceptions can also be ignored so that they neither count as a failure nor success.
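The precedence between ignored and recorded exceptions can be made concrete with a small classifier. This is a sketch of the rules as described in the text, not the library's code; ExceptionClassifierSketch and Outcome are hypothetical names.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; not the Resilience4j source.
// All names here are hypothetical.
final class ExceptionClassifierSketch {
    enum Outcome { FAILURE, SUCCESS, IGNORED }

    private final List<Class<? extends Throwable>> recordExceptions;
    private final List<Class<? extends Throwable>> ignoreExceptions;

    ExceptionClassifierSketch(List<Class<? extends Throwable>> recordExceptions,
                              List<Class<? extends Throwable>> ignoreExceptions) {
        this.recordExceptions = recordExceptions;
        this.ignoreExceptions = ignoreExceptions;
    }

    Outcome classify(Throwable throwable) {
        // Ignored exceptions win, even if they also match the record list.
        if (matches(ignoreExceptions, throwable)) {
            return Outcome.IGNORED;
        }
        // An empty record list means "every exception is a failure".
        if (recordExceptions.isEmpty() || matches(recordExceptions, throwable)) {
            return Outcome.FAILURE;
        }
        // A record list was given and this exception is not on it.
        return Outcome.SUCCESS;
    }

    // True if the throwable is an instance of any listed type or a subclass of one.
    private static boolean matches(List<Class<? extends Throwable>> types, Throwable throwable) {
        for (Class<? extends Throwable> type : types) {
            if (type.isInstance(throwable)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExceptionClassifierSketch classifier = new ExceptionClassifierSketch(
            List.of(IOException.class, TimeoutException.class),
            List.of(IllegalArgumentException.class));
        System.out.println(classifier.classify(new IOException("backend down")));
        System.out.println(classifier.classify(new IllegalStateException()));
        System.out.println(classifier.classify(new IllegalArgumentException()));
    }
}
```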

The CircuitBreaker also changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls took longer than 5 seconds. This helps to reduce the load on an external system before it is actually unresponsive.

The failure rate and slow call rate can only be calculated if a minimum number of calls has been recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been recorded, the CircuitBreaker will not trip open even if all 9 calls have failed.
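The interaction between the minimum number of calls and the threshold can be worked through with a short sketch. The names are hypothetical, and returning -1 for "not enough data yet" is a convention chosen for this example.

```java
// Illustrative sketch only; not the Resilience4j source.
// All names here are hypothetical.
final class ThresholdCheckSketch {

    // Returns the failure rate in percent, or -1 while too few calls were recorded.
    static float failureRate(int failedCalls, int totalCalls, int minimumNumberOfCalls) {
        if (totalCalls < minimumNumberOfCalls) {
            return -1f;
        }
        return 100f * failedCalls / totalCalls;
    }

    // True if the breaker should trip: the rate is known and has reached the threshold.
    static boolean shouldOpen(int failedCalls, int totalCalls,
                              int minimumNumberOfCalls, float failureRateThreshold) {
        float rate = failureRate(failedCalls, totalCalls, minimumNumberOfCalls);
        return rate >= 0f && rate >= failureRateThreshold;
    }

    public static void main(String[] args) {
        // 9 of 9 calls failed, but 10 calls are required: stays closed.
        System.out.println(shouldOpen(9, 9, 10, 50f));
        // 5 of 10 calls failed, threshold 50: the rate equals the threshold, so it trips.
        System.out.println(shouldOpen(5, 10, 10, 50f));
    }
}
```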

The CircuitBreaker rejects calls with a CallNotPermittedException when it is OPEN. After a wait time duration has elapsed, the CircuitBreaker state changes from OPEN to HALF_OPEN and permits a configurable number of calls to see if the backend is still unavailable or has become available again. Further calls are rejected with a CallNotPermittedException, until all permitted calls have completed.
If the failure rate or slow call rate is then equal to or greater than the configured threshold, the state changes back to OPEN. If the failure rate and slow call rate are below the threshold, the state changes back to CLOSED.

The CircuitBreaker supports two more special states, DISABLED (always allow access) and FORCED_OPEN (always deny access). In these two states no CircuitBreaker events (apart from the state transition) are generated, and no metrics are recorded. The only way to exit from those states is to trigger a state transition or to reset the CircuitBreaker.

The CircuitBreaker is thread-safe as follows:

  • The state of a CircuitBreaker is stored in an AtomicReference.
  • The CircuitBreaker uses atomic operations to update the state with side-effect-free functions.
  • Recording calls and reading snapshots from the Sliding Window is synchronized.

That means atomicity should be guaranteed and only one thread is able to update the state or the Sliding Window at a point in time.
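To illustrate the first two points, the sketch below holds a state in an AtomicReference and changes it with compareAndSet, so that only one of several racing threads performs a given transition. It is a simplified assumption-based example with hypothetical names (StateHolderSketch, transition), not the library's state machine.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; not the Resilience4j source.
// All names here are hypothetical.
final class StateHolderSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);

    // Atomically moves from one state to another.
    // Returns true only for the single caller whose transition took effect.
    boolean transition(State from, State to) {
        return state.compareAndSet(from, to);
    }

    State current() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StateHolderSketch holder = new StateHolderSketch();
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (holder.transition(State.CLOSED, State.OPEN)) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Exactly one thread wins the CLOSED -> OPEN transition.
        System.out.println("winning transitions: " + winners.get());
    }
}
```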

But the CircuitBreaker does not synchronize the function call. That means the function call itself is not part of the critical section. Otherwise a CircuitBreaker would introduce a huge performance penalty and bottleneck. A slow function call would have a huge negative impact on the overall performance and throughput.

If 20 concurrent threads ask for permission to execute a function and the state of the CircuitBreaker is closed, all threads are allowed to invoke the function, even if the sliding window size is 15. The size of the sliding window does not mean that only 15 calls are allowed to run concurrently. If you want to restrict the number of concurrent threads, please use a Bulkhead. You can combine a Bulkhead and a CircuitBreaker.
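To show what limiting concurrency means, as opposed to recording outcomes, here is a minimal semaphore-based sketch of the bulkhead idea. It is not the Resilience4j Bulkhead API: BulkheadSketch and execute are hypothetical names, and a rejected call is signalled with a plain IllegalStateException.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Illustrative sketch only; not the Resilience4j Bulkhead.
// All names here are hypothetical.
final class BulkheadSketch {
    private final Semaphore permits;

    BulkheadSketch(int maxConcurrentCalls) {
        permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call only if a permit is free; otherwise rejects immediately.
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        BulkheadSketch bulkhead = new BulkheadSketch(1);
        // While the outer call holds the only permit, a second call is rejected.
        String result = bulkhead.execute(() -> {
            try {
                return bulkhead.execute(() -> "inner call ran");
            } catch (IllegalStateException e) {
                return "inner call rejected";
            }
        });
        System.out.println(result);
    }
}
```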

[Figure: example with 1 thread]

[Figure: example with 3 threads]

Create a CircuitBreakerRegistry

Resilience4j comes with an in-memory CircuitBreakerRegistry based on a ConcurrentHashMap which provides thread safety and atomicity guarantees. You can use the CircuitBreakerRegistry to manage (create and retrieve) CircuitBreaker instances. You can create a CircuitBreakerRegistry with a global default CircuitBreakerConfig for all of your CircuitBreaker instances as follows.

CircuitBreakerRegistry circuitBreakerRegistry = 
  CircuitBreakerRegistry.ofDefaults();

Create and configure a CircuitBreaker

You can provide your own custom global CircuitBreakerConfig. In order to create a custom global CircuitBreakerConfig, you can use the CircuitBreakerConfig builder. You can use the builder to configure the following properties.

| Config property | Default value | Description |
|---|---|---|
| failureRateThreshold | 50 | Configures the failure rate threshold in percentage. When the failure rate is equal to or greater than the threshold, the CircuitBreaker transitions to open and starts short-circuiting calls. |
| slowCallRateThreshold | 100 | Configures the slow call rate threshold in percentage. The CircuitBreaker considers a call slow when the call duration is greater than slowCallDurationThreshold. When the percentage of slow calls is equal to or greater than the threshold, the CircuitBreaker transitions to open and starts short-circuiting calls. |
| slowCallDurationThreshold | 60 [s] | Configures the duration threshold above which calls are considered slow and increase the rate of slow calls. |
| permittedNumberOfCallsInHalfOpenState | 10 | Configures the number of permitted calls when the CircuitBreaker is half open. |
| slidingWindowType | COUNT_BASED | Configures the type of the sliding window which is used to record the outcome of calls when the CircuitBreaker is closed. The sliding window can be either count-based or time-based. If the sliding window is COUNT_BASED, the last slidingWindowSize calls are recorded and aggregated. If the sliding window is TIME_BASED, the calls of the last slidingWindowSize seconds are recorded and aggregated. |
| slidingWindowSize | 100 | Configures the size of the sliding window which is used to record the outcome of calls when the CircuitBreaker is closed. |
| minimumNumberOfCalls | 10 | Configures the minimum number of calls which are required (per sliding window period) before the CircuitBreaker can calculate the error rate. For example, if minimumNumberOfCalls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been recorded, the CircuitBreaker will not transition to open even if all 9 calls have failed. |
| waitDurationInOpenState | 60 [s] | The time that the CircuitBreaker should wait before transitioning from open to half-open. |
| automaticTransitionFromOpenToHalfOpenEnabled | false | If set to true, the CircuitBreaker automatically transitions from open to half-open state and no call is needed to trigger the transition. |
| recordExceptions | empty | A list of exceptions that are recorded as a failure and thus increase the failure rate. Any exception matching or inheriting from one in the list counts as a failure, unless explicitly ignored via ignoreExceptions. If you specify a list of exceptions, all other exceptions count as a success, unless they are explicitly ignored by ignoreExceptions. |
| ignoreExceptions | empty | A list of exceptions that are ignored and count neither as a failure nor as a success. Any exception matching or inheriting from one in the list counts neither as a failure nor as a success, even if the exception is part of recordExceptions. |
| recordException | throwable -> true | By default all exceptions are recorded as failures. A custom Predicate which evaluates if an exception should be recorded as a failure. The Predicate must return true if the exception should count as a failure. The Predicate must return false if the exception should count as a success, unless the exception is explicitly ignored by ignoreExceptions. |
| ignoreException | throwable -> false | By default no exception is ignored. A custom Predicate which evaluates if an exception should be ignored and count neither as a failure nor as a success. The Predicate must return true if the exception should be ignored. The Predicate must return false if the exception should count as a failure. |

// Create a custom configuration for a CircuitBreaker
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .failureRateThreshold(50)
  .slowCallRateThreshold(50)
  .waitDurationInOpenState(Duration.ofMillis(1000))
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .permittedNumberOfCallsInHalfOpenState(3)
  .minimumNumberOfCalls(10)
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .slidingWindowSize(5)
  .recordException(e -> INTERNAL_SERVER_ERROR
                 .equals(getResponse().getStatus()))
  .recordExceptions(IOException.class, TimeoutException.class)
  .ignoreExceptions(BusinessException.class, OtherBusinessException.class)
  .build();

// Create a CircuitBreakerRegistry with a custom global configuration
CircuitBreakerRegistry circuitBreakerRegistry =
  CircuitBreakerRegistry.of(circuitBreakerConfig);

// Get or create a CircuitBreaker from the CircuitBreakerRegistry 
// with the global default configuration
CircuitBreaker circuitBreakerWithDefaultConfig = 
  circuitBreakerRegistry.circuitBreaker("name1");

// Get or create a CircuitBreaker from the CircuitBreakerRegistry 
// with a custom configuration
CircuitBreaker circuitBreakerWithCustomConfig = circuitBreakerRegistry
  .circuitBreaker("name2", circuitBreakerConfig);

You can add configurations which can be shared by multiple CircuitBreaker instances.

CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .failureRateThreshold(70)
  .build();

circuitBreakerRegistry.addConfiguration("someSharedConfig", circuitBreakerConfig);

CircuitBreaker circuitBreaker = circuitBreakerRegistry
  .circuitBreaker("name", "someSharedConfig");

You can overwrite configurations.

CircuitBreakerConfig defaultConfig = circuitBreakerRegistry
  .getDefaultConfig();

CircuitBreakerConfig overwrittenConfig = CircuitBreakerConfig
  .from(defaultConfig)
  .waitDurationInOpenState(Duration.ofSeconds(20))
  .build();

If you don’t want to use the CircuitBreakerRegistry to manage CircuitBreaker instances, you can also create instances directly.

// Create a custom configuration for a CircuitBreaker
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .recordExceptions(IOException.class, TimeoutException.class)
  .ignoreExceptions(BusinessException.class, OtherBusinessException.class)
  .build();

CircuitBreaker customCircuitBreaker = CircuitBreaker
  .of("testName", circuitBreakerConfig);

Decorate and execute a functional interface

You can decorate any Callable, Supplier, Runnable, Consumer, CheckedRunnable, CheckedSupplier, CheckedConsumer or CompletionStage with a CircuitBreaker.
You can invoke the decorated function with Try.of(…​) or Try.run(…​) from Vavr. This lets you chain further functions with map, flatMap, filter, recover or andThen. The chained functions are only invoked if the CircuitBreaker is CLOSED or HALF_OPEN.
In the following example, Try.of(…​) returns a Success<String> Monad, if the invocation of the function is successful. If the function throws an exception, a Failure<Throwable> Monad is returned and map is not invoked.

// Given
CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("testName");

// When I decorate my function
CheckedFunction0<String> decoratedSupplier = CircuitBreaker
        .decorateCheckedSupplier(circuitBreaker, () -> "This can be any method which returns: 'Hello");

// and chain another function with map
Try<String> result = Try.of(decoratedSupplier)
                .map(value -> value + " world'");

// Then the Try Monad returns a Success<String>, if all functions ran successfully.
assertThat(result.isSuccess()).isTrue();
assertThat(result.get()).isEqualTo("This can be any method which returns: 'Hello world'");

Consume emitted RegistryEvents

You can register an event consumer on a CircuitBreakerRegistry and take action whenever a CircuitBreaker is created, replaced or deleted.

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults();
circuitBreakerRegistry.getEventPublisher()
  .onEntryAdded(entryAddedEvent -> {
    CircuitBreaker addedCircuitBreaker = entryAddedEvent.getAddedEntry();
    LOG.info("CircuitBreaker {} added", addedCircuitBreaker.getName());
  })
  .onEntryRemoved(entryRemovedEvent -> {
    CircuitBreaker removedCircuitBreaker = entryRemovedEvent.getRemovedEntry();
    LOG.info("CircuitBreaker {} removed", removedCircuitBreaker.getName());
  });

Consume emitted CircuitBreakerEvents

A CircuitBreakerEvent can be a state transition, a circuit breaker reset, a successful call, a recorded error or an ignored error. All events contain additional information like the event creation time and the processing duration of the call. If you want to consume events, you have to register an event consumer.

circuitBreaker.getEventPublisher()
    .onSuccess(event -> logger.info(...))
    .onError(event -> logger.info(...))
    .onIgnoredError(event -> logger.info(...))
    .onReset(event -> logger.info(...))
    .onStateTransition(event -> logger.info(...));
// Or if you want to register a consumer listening
// to all events, you can do:
circuitBreaker.getEventPublisher()
    .onEvent(event -> logger.info(...));

You could use the CircularEventConsumer to store events in a circular buffer with a fixed capacity.

CircularEventConsumer<CircuitBreakerEvent> ringBuffer = 
  new CircularEventConsumer<>(10);
circuitBreaker.getEventPublisher().onEvent(ringBuffer);
List<CircuitBreakerEvent> bufferedEvents = ringBuffer.getBufferedEvents();
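The behaviour of a fixed-capacity circular buffer can be sketched independently of the library. The example below is an assumption-based illustration of the concept with hypothetical names (CircularBufferSketch, add, snapshot); it is not how CircularEventConsumer is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Resilience4j CircularEventConsumer.
// All names here are hypothetical.
final class CircularBufferSketch<T> {
    private final int capacity;
    private final ArrayDeque<T> events = new ArrayDeque<>();

    CircularBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    // Adds an event; once the buffer is full, the oldest event is dropped.
    synchronized void add(T event) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(event);
    }

    // Returns a copy of the buffered events, oldest first.
    synchronized List<T> snapshot() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        CircularBufferSketch<String> buffer = new CircularBufferSketch<>(3);
        for (int i = 1; i <= 5; i++) {
            buffer.add("event-" + i);
        }
        // Only the three most recent events remain.
        System.out.println(buffer.snapshot()); // prints [event-3, event-4, event-5]
    }
}
```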

You can use RxJava or RxJava2 Adapters to convert the EventPublisher into a Reactive Stream.

