CircuitBreaker

Introduction

The CircuitBreaker is implemented via a finite state machine with three normal states (CLOSED, OPEN and HALF_OPEN) and two special states (DISABLED and FORCED_OPEN).

The CircuitBreaker uses a sliding window to store and aggregate the outcome of calls. You can choose between a count-based sliding window and a time-based sliding window. The count-based sliding window aggregates the outcome of the last N calls. The time-based sliding window aggregates the outcome of the calls of the last N seconds.

Count-based sliding window

The count-based sliding window is implemented with a circular array of N measurements.
If the count window size is 10, the circular array always has 10 measurements.
The sliding window incrementally updates a total aggregation. The total aggregation is updated when a new call outcome is recorded. When the oldest measurement is evicted, the measurement is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)

The time to retrieve a Snapshot is constant O(1), since the Snapshot is pre-aggregated and is independent of the window size.
The space requirement (memory consumption) of this implementation should be O(n).
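
The Subtract-on-Evict idea described above can be sketched in a few lines of plain Java. This is a minimal illustration, not the library's implementation: the class name `CountWindowSketch` and its methods are hypothetical, and only the failure count is tracked.

```java
/** Illustrative count-based sliding window; not the Resilience4j implementation. */
final class CountWindowSketch {
    private final boolean[] recorded;  // true if the slot already holds a measurement
    private final boolean[] failed;    // outcome stored in each slot
    private int head = 0;              // slot that will be overwritten next (the oldest)
    private int totalCalls = 0;        // total aggregation, updated incrementally
    private int totalFailures = 0;

    CountWindowSketch(int size) {
        recorded = new boolean[size];
        failed = new boolean[size];
    }

    /** Records one call outcome, evicting the oldest measurement if the window is full. */
    synchronized void record(boolean callFailed) {
        if (recorded[head]) {
            // Subtract-on-Evict: remove the oldest measurement from the totals.
            totalCalls--;
            if (failed[head]) {
                totalFailures--;
            }
        }
        recorded[head] = true;
        failed[head] = callFailed;
        totalCalls++;
        if (callFailed) {
            totalFailures++;
        }
        head = (head + 1) % recorded.length;
    }

    /** O(1): reads the pre-aggregated totals instead of scanning the array. */
    synchronized float failureRatePercent() {
        return totalCalls == 0 ? 0f : 100f * totalFailures / totalCalls;
    }

    synchronized int totalCalls() {
        return totalCalls;
    }
}
```

With a window of size 3, recording two failures and one success gives a failure rate of about 66.7%. Two further successes evict both failures, and the rate drops to 0% while the number of aggregated calls stays at 3.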

Time-based sliding window

The time-based sliding window is implemented with a circular array of N partial aggregations (buckets).
If the time window size is 10 seconds, the circular array always has 10 partial aggregations (buckets). Every bucket aggregates the outcome of all calls which happen in a certain epoch second (partial aggregation). The head bucket of the circular array stores the call outcomes of the current epoch second. The other partial aggregations store the call outcomes of the previous seconds.
The sliding window does not store call outcomes (tuples) individually, but incrementally updates partial aggregations (buckets) and a total aggregation.
The total aggregation is updated incrementally when a new call outcome is recorded. When the oldest bucket is evicted, the partial aggregation of that bucket is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)

The time to retrieve a Snapshot is constant O(1), since the Snapshot is pre-aggregated and is independent of the time window size.
The space requirement (memory consumption) of this implementation should be O(n) in the number of buckets and independent of the number of recorded calls, since the call outcomes (tuples) are not stored individually. Only N partial aggregations and 1 total aggregation are created.

A partial aggregation consists of 3 integers, which count the number of failed calls, the number of slow calls and the total number of calls, and one long, which stores the total duration of all calls.
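
The bucket layout and the eviction step can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `TimeWindowSketch` and its methods are hypothetical names, and the current epoch second is passed in explicitly so that the behaviour is easy to follow and test.

```java
/** Illustrative time-based sliding window; not the Resilience4j implementation. */
final class TimeWindowSketch {
    /** Partial aggregation for one epoch second: three ints and one long. */
    private static final class Bucket {
        int failedCalls;
        int slowCalls;
        int totalCalls;
        long totalDurationMillis;

        void add(long durationMillis, boolean failed, boolean slow) {
            totalCalls++;
            totalDurationMillis += durationMillis;
            if (failed) failedCalls++;
            if (slow) slowCalls++;
        }

        void subtract(Bucket other) {
            failedCalls -= other.failedCalls;
            slowCalls -= other.slowCalls;
            totalCalls -= other.totalCalls;
            totalDurationMillis -= other.totalDurationMillis;
        }

        void reset() {
            failedCalls = 0;
            slowCalls = 0;
            totalCalls = 0;
            totalDurationMillis = 0;
        }
    }

    private final Bucket[] buckets;             // N partial aggregations
    private final Bucket total = new Bucket();  // 1 total aggregation
    private int headIndex = 0;                  // bucket of the current epoch second
    private long headEpochSecond;

    TimeWindowSketch(int windowSeconds, long startEpochSecond) {
        buckets = new Bucket[windowSeconds];
        for (int i = 0; i < windowSeconds; i++) {
            buckets[i] = new Bucket();
        }
        headEpochSecond = startEpochSecond;
    }

    synchronized void record(long nowEpochSecond, long durationMillis, boolean failed, boolean slow) {
        advanceTo(nowEpochSecond);
        buckets[headIndex].add(durationMillis, failed, slow);
        total.add(durationMillis, failed, slow);
    }

    synchronized int totalCalls(long nowEpochSecond) {
        advanceTo(nowEpochSecond);
        return total.totalCalls;
    }

    synchronized int failedCalls(long nowEpochSecond) {
        advanceTo(nowEpochSecond);
        return total.failedCalls;
    }

    /** Moves the head forward one bucket per elapsed second, evicting the oldest bucket each time. */
    private void advanceTo(long nowEpochSecond) {
        long steps = Math.min(nowEpochSecond - headEpochSecond, buckets.length);
        for (long i = 0; i < steps; i++) {
            headIndex = (headIndex + 1) % buckets.length;
            Bucket oldest = buckets[headIndex];
            total.subtract(oldest);  // Subtract-on-Evict
            oldest.reset();
        }
        headEpochSecond = Math.max(headEpochSecond, nowEpochSecond);
    }
}
```

In an application you would pass something like `Instant.now().getEpochSecond()` as the current second. Memory use depends only on the window size, because each second is represented by one bucket regardless of how many calls it saw.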

Failure rate and slow call rate thresholds

The state of the CircuitBreaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls have failed.
By default all exceptions count as a failure. You can define a list of exceptions which should count as a failure. All other exceptions are then counted as a success, unless they are ignored. Exceptions can also be ignored so that they neither count as a failure nor success.
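
The classification rules above can be written down as a small decision function. The sketch below is only an illustration of those rules in plain Java; `OutcomeClassifierSketch` and `Outcome` are hypothetical names and not part of the library.

```java
import java.util.List;

/** Illustrative classification of exceptions; not the Resilience4j implementation. */
final class OutcomeClassifierSketch {
    enum Outcome { FAILURE, SUCCESS, IGNORED }

    private final List<Class<? extends Throwable>> recordExceptions;
    private final List<Class<? extends Throwable>> ignoreExceptions;

    OutcomeClassifierSketch(List<Class<? extends Throwable>> recordExceptions,
                            List<Class<? extends Throwable>> ignoreExceptions) {
        this.recordExceptions = recordExceptions;
        this.ignoreExceptions = ignoreExceptions;
    }

    Outcome classify(Throwable throwable) {
        // Ignored exceptions count neither as a failure nor as a success.
        if (matches(ignoreExceptions, throwable)) {
            return Outcome.IGNORED;
        }
        // With no explicit list, every exception counts as a failure.
        if (recordExceptions.isEmpty() || matches(recordExceptions, throwable)) {
            return Outcome.FAILURE;
        }
        // A list was given and this exception is not on it.
        return Outcome.SUCCESS;
    }

    /** True if the throwable is an instance of (or inherits from) one of the listed types. */
    private static boolean matches(List<Class<? extends Throwable>> types, Throwable throwable) {
        for (Class<? extends Throwable> type : types) {
            if (type.isInstance(throwable)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with `IOException` in the record list and `FileNotFoundException` in the ignore list, a `FileNotFoundException` is ignored even though it is also an `IOException`, while an `IllegalStateException` counts as a success.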

The CircuitBreaker also changes from CLOSED to OPEN when the percentage of slow calls is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls took longer than 5 seconds. This helps to reduce the load on an external system before it actually becomes unresponsive.

The failure rate and slow call rate can only be calculated if a minimum number of calls has been recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been recorded, the CircuitBreaker will not trip open even if all 9 calls have failed.
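
The interaction between the minimum number of calls and the threshold can be sketched as below. The helper names are hypothetical, and returning -1 for "not enough calls yet" is a convention chosen for this sketch.

```java
/** Illustrative threshold check; not the Resilience4j implementation. */
final class ThresholdCheckSketch {
    /** Returns the failure rate in percent, or -1 if too few calls have been recorded. */
    static float failureRate(int failedCalls, int totalCalls, int minimumNumberOfCalls) {
        if (totalCalls == 0 || totalCalls < minimumNumberOfCalls) {
            return -1f;
        }
        return 100f * failedCalls / totalCalls;
    }

    /** True if the rate is known and equal to or greater than the threshold. */
    static boolean shouldOpen(int failedCalls, int totalCalls,
                              int minimumNumberOfCalls, float thresholdPercent) {
        float rate = failureRate(failedCalls, totalCalls, minimumNumberOfCalls);
        return rate >= 0f && rate >= thresholdPercent;
    }
}
```

With a minimum of 10 calls and a 50% threshold, 9 failures out of 9 calls do not trip the breaker, while 5 failures out of 10 calls do.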

The CircuitBreaker rejects calls with a CallNotPermittedException when it is OPEN. After a wait time duration has elapsed, the CircuitBreaker state changes from OPEN to HALF_OPEN and permits a configurable number of calls to see if the backend is still unavailable or has become available again. Further calls are rejected with a CallNotPermittedException, until all permitted calls have completed.
If the failure rate or slow call rate is then equal to or greater than the configured threshold, the state changes back to OPEN. If both the failure rate and the slow call rate are below their thresholds, the state changes back to CLOSED.
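
The OPEN to HALF_OPEN to CLOSED (or back to OPEN) cycle can be sketched as a small state machine. This is a simplified model under several assumptions: all names are hypothetical, time is passed in explicitly, only the failure rate is evaluated, a `false` return value stands in for a CallNotPermittedException, and the sliding window used in the CLOSED state is left out (`tripOpen` stands in for it).

```java
/** Illustrative state transitions; not the Resilience4j implementation. */
final class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long waitMillisInOpenState;
    private final int permittedCallsInHalfOpen;
    private final float failureRateThreshold;

    private State state = State.CLOSED;
    private long openedAtMillis;
    private int permitsLeft;
    private int completed;
    private int failed;

    BreakerStateSketch(long waitMillisInOpenState, int permittedCallsInHalfOpen,
                       float failureRateThreshold) {
        this.waitMillisInOpenState = waitMillisInOpenState;
        this.permittedCallsInHalfOpen = permittedCallsInHalfOpen;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Stand-in for the CLOSED-state threshold check tripping the breaker. */
    synchronized void tripOpen(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
    }

    /** Returns true if the call may proceed, false if it would be rejected. */
    synchronized boolean tryAcquirePermission(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= waitMillisInOpenState) {
            // The wait duration has elapsed: permit a limited number of trial calls.
            state = State.HALF_OPEN;
            permitsLeft = permittedCallsInHalfOpen;
            completed = 0;
            failed = 0;
        }
        switch (state) {
            case CLOSED:
                return true;
            case HALF_OPEN:
                if (permitsLeft == 0) {
                    return false;  // rejected until all permitted calls have completed
                }
                permitsLeft--;
                return true;
            default:
                return false;      // OPEN
        }
    }

    /** Records the result of a permitted call and decides once all trial calls are done. */
    synchronized void onResult(long nowMillis, boolean callFailed) {
        if (state != State.HALF_OPEN) {
            return;
        }
        completed++;
        if (callFailed) {
            failed++;
        }
        if (completed == permittedCallsInHalfOpen) {
            float rate = 100f * failed / completed;
            if (rate >= failureRateThreshold) {
                tripOpen(nowMillis);
            } else {
                state = State.CLOSED;
            }
        }
    }

    synchronized State state() {
        return state;
    }
}
```

With a wait duration of 1000 ms, 2 permitted calls and a 50% threshold, a breaker opened at time 0 rejects a call at 999 ms, permits exactly two calls from 1000 ms on, and returns to OPEN if one of those two calls fails.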

The CircuitBreaker supports two more special states, DISABLED (always allow access) and FORCED_OPEN (always deny access). In these two states no CircuitBreaker events (apart from the state transition) are generated, and no metrics are recorded. The only way to exit these states is to trigger a state transition or to reset the CircuitBreaker.

The CircuitBreaker is thread-safe as follows:

The state of a CircuitBreaker is stored in an AtomicReference.
The CircuitBreaker uses atomic operations to update the state with side-effect-free functions.
Recording calls and reading snapshots from the sliding window is synchronized.

That means atomicity should be guaranteed and only one thread is able to update the state or the Sliding Window at a point in time.

However, the CircuitBreaker does not synchronize the function call itself, so the function call is not part of the critical section. Otherwise a CircuitBreaker would introduce a huge performance penalty and become a bottleneck: a slow function call would have a huge negative impact on overall performance and throughput.
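
The split between atomic state updates, synchronized bookkeeping and an unsynchronized function call can be sketched as follows. This is a deliberately reduced model, not the library's code: `AtomicStateSketch` is a hypothetical name, a plain failure counter stands in for the sliding window, and an IllegalStateException stands in for CallNotPermittedException.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Illustrative thread-safety model; not the Resilience4j implementation. */
final class AtomicStateSketch {
    enum State { CLOSED, OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final int maxFailures;  // hypothetical trip condition for this sketch
    private int failures;           // guarded by "this", stands in for the sliding window

    AtomicStateSketch(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    <T> T execute(Supplier<T> call) {
        if (state.get() == State.OPEN) {
            throw new IllegalStateException("call not permitted");
        }
        try {
            // The function call runs outside any lock, so slow calls
            // do not block other threads.
            return call.get();
        } catch (RuntimeException e) {
            recordFailure();
            throw e;
        }
    }

    /** Only the bookkeeping is synchronized; the state change itself is atomic. */
    private synchronized void recordFailure() {
        failures++;
        if (failures >= maxFailures) {
            state.compareAndSet(State.CLOSED, State.OPEN);
        }
    }

    State state() {
        return state.get();
    }
}
```

Because `execute` holds no lock while the supplier runs, any number of threads can be inside the function at the same time. Only the short update in `recordFailure` is serialized.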

If 20 concurrent threads ask for permission to execute a function and the state of the CircuitBreaker is CLOSED, all threads are allowed to invoke the function, even if the sliding window size is 15. The sliding window size does not limit the number of calls that may run concurrently. If you want to restrict the number of concurrent threads, use a Bulkhead. You can combine a Bulkhead and a CircuitBreaker.

[Figure: example with 1 thread]

[Figure: example with 3 threads]

Create a CircuitBreakerRegistry

Resilience4j comes with an in-memory CircuitBreakerRegistry based on a ConcurrentHashMap which provides thread safety and atomicity guarantees. You can use the CircuitBreakerRegistry to manage (create and retrieve) CircuitBreaker instances. You can create a CircuitBreakerRegistry with a global default CircuitBreakerConfig for all of your CircuitBreaker instances as follows.

CircuitBreakerRegistry circuitBreakerRegistry = 
  CircuitBreakerRegistry.ofDefaults();

Create and configure a CircuitBreaker

You can provide your own custom global CircuitBreakerConfig. In order to create a custom global CircuitBreakerConfig, you can use the CircuitBreakerConfig builder. You can use the builder to configure the following properties.

Config property	Default Value	Description
failureRateThreshold	50	Configures the failure rate threshold in percentage. When the failure rate is equal to or greater than the threshold, the CircuitBreaker transitions to open and starts short-circuiting calls.
slowCallRateThreshold	100	Configures the slow call rate threshold in percentage. The CircuitBreaker considers a call as slow when the call duration is greater than `slowCallDurationThreshold`. When the percentage of slow calls is equal to or greater than the threshold, the CircuitBreaker transitions to open and starts short-circuiting calls.
slowCallDurationThreshold	60000 [ms]	Configures the duration threshold above which calls are considered as slow and increase the rate of slow calls.
permittedNumberOfCallsInHalfOpenState	10	Configures the number of permitted calls when the CircuitBreaker is half open.
maxWaitDurationInHalfOpenState	0 [ms]	Configures the maximum wait duration, which controls the longest time a CircuitBreaker can stay in the half-open state before it switches to open. A value of 0 means the CircuitBreaker waits indefinitely in the half-open state until all permitted calls have completed.
slidingWindowType	COUNT_BASED	Configures the type of the sliding window which is used to record the outcome of calls when the CircuitBreaker is closed. The sliding window can either be count-based or time-based. If the sliding window is COUNT_BASED, the last `slidingWindowSize` calls are recorded and aggregated. If the sliding window is TIME_BASED, the calls of the last `slidingWindowSize` seconds are recorded and aggregated.
slidingWindowSize	100	Configures the size of the sliding window which is used to record the outcome of calls when the CircuitBreaker is closed.
minimumNumberOfCalls	100	Configures the minimum number of calls which are required (per sliding window period) before the CircuitBreaker can calculate the error rate or slow call rate. For example, if minimumNumberOfCalls is 10, then at least 10 calls must be recorded, before the failure rate can be calculated. If only 9 calls have been recorded the CircuitBreaker will not transition to open even if all 9 calls have failed.
waitDurationInOpenState	60000 [ms]	The time that the CircuitBreaker should wait before transitioning from open to half-open.
automaticTransitionFromOpenToHalfOpenEnabled	false	If set to true, the CircuitBreaker automatically transitions from open to half-open and no call is needed to trigger the transition. A thread is created to monitor all CircuitBreaker instances and transition them to HALF_OPEN once waitDurationInOpenState has passed. If set to false, the transition to HALF_OPEN only happens when a call is made, even after waitDurationInOpenState has passed. The advantage of false is that no thread monitors the state of all CircuitBreakers.
recordExceptions	empty	A list of exceptions that are recorded as a failure and thus increase the failure rate. Any exception matching or inheriting from one of the list counts as a failure, unless explicitly ignored via `ignoreExceptions`. If you specify a list of exceptions, all other exceptions count as a success, unless they are explicitly ignored by `ignoreExceptions`.
ignoreExceptions	empty	A list of exceptions that are ignored and count neither as a failure nor as a success. Any exception matching or inheriting from one of the list will count neither as a failure nor as a success, even if the exception is part of `recordExceptions`.
recordFailurePredicate	throwable -> true By default all exceptions are recorded as failures.	A custom Predicate which evaluates if an exception should be recorded as a failure. The Predicate must return true if the exception should count as a failure. The Predicate must return false if the exception should count as a success, unless the exception is explicitly ignored by `ignoreExceptions`.
ignoreExceptionPredicate	throwable -> false By default no exception is ignored.	A custom Predicate which evaluates if an exception should be ignored and neither count as a failure nor success. The Predicate must return true if the exception should be ignored. The Predicate must return false, if the exception should count as a failure.

// Create a custom configuration for a CircuitBreaker
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .failureRateThreshold(50)
  .slowCallRateThreshold(50)
  .waitDurationInOpenState(Duration.ofMillis(1000))
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .permittedNumberOfCallsInHalfOpenState(3)
  .minimumNumberOfCalls(10)
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .slidingWindowSize(5)
  .recordException(e -> INTERNAL_SERVER_ERROR
                 .equals(getResponse().getStatus()))
  .recordExceptions(IOException.class, TimeoutException.class)
  .ignoreExceptions(BusinessException.class, OtherBusinessException.class)
  .build();

// Create a CircuitBreakerRegistry with a custom global configuration
CircuitBreakerRegistry circuitBreakerRegistry = 
  CircuitBreakerRegistry.of(circuitBreakerConfig);

// Get or create a CircuitBreaker from the CircuitBreakerRegistry 
// with the global default configuration
CircuitBreaker circuitBreakerWithDefaultConfig = 
  circuitBreakerRegistry.circuitBreaker("name1");

// Get or create a CircuitBreaker from the CircuitBreakerRegistry 
// with a custom configuration
CircuitBreaker circuitBreakerWithCustomConfig = circuitBreakerRegistry
  .circuitBreaker("name2", circuitBreakerConfig);

You can add configurations which can be shared by multiple CircuitBreaker instances.

CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .failureRateThreshold(70)
  .build();

circuitBreakerRegistry.addConfiguration("someSharedConfig", circuitBreakerConfig);

CircuitBreaker circuitBreaker = circuitBreakerRegistry
  .circuitBreaker("name", "someSharedConfig");

You can overwrite configurations.

CircuitBreakerConfig defaultConfig = circuitBreakerRegistry
   .getDefaultConfig();

CircuitBreakerConfig overwrittenConfig = CircuitBreakerConfig
  .from(defaultConfig)
  .waitDurationInOpenState(Duration.ofSeconds(20))
  .build();

If you don’t want to use the CircuitBreakerRegistry to manage CircuitBreaker instances, you can also create instances directly.

// Create a custom configuration for a CircuitBreaker
CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
  .recordExceptions(IOException.class, TimeoutException.class)
  .ignoreExceptions(BusinessException.class, OtherBusinessException.class)
  .build();

CircuitBreaker customCircuitBreaker = CircuitBreaker
  .of("testName", circuitBreakerConfig);

Alternatively, you can create CircuitBreakerRegistry using its builder methods.

Map<String, String> circuitBreakerTags = Map.of("key1", "value1", "key2", "value2");

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.custom()
    .withCircuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
    .addRegistryEventConsumer(new RegistryEventConsumer<CircuitBreaker>() {
        @Override
        public void onEntryAddedEvent(EntryAddedEvent<CircuitBreaker> entryAddedEvent) {
            // implementation
        }
        @Override
        public void onEntryRemovedEvent(EntryRemovedEvent<CircuitBreaker> entryRemovedEvent) {
            // implementation
        }
        @Override
        public void onEntryReplacedEvent(EntryReplacedEvent<CircuitBreaker> entryReplacedEvent) {
            // implementation
        }
    })
    .withTags(circuitBreakerTags)
    .build();

CircuitBreaker circuitBreaker = circuitBreakerRegistry.circuitBreaker("testName");

If you want to plug in your own Registry implementation, you can provide a custom implementation of the RegistryStore interface and plug it in using the builder method.

CircuitBreakerRegistry registry = CircuitBreakerRegistry.custom()
    .withRegistryStore(new YourRegistryStoreImplementation())
    .withCircuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
    .build();

Decorate and execute a functional interface

You can decorate any Callable, Supplier, Runnable, Consumer, CheckedRunnable, CheckedSupplier, CheckedConsumer or CompletionStage with a CircuitBreaker.
You can invoke the decorated function with Try.of(…) or Try.run(…) from Vavr. This allows you to chain further functions with map, flatMap, filter, recover or andThen. The chained functions are only invoked if the CircuitBreaker is CLOSED or HALF_OPEN.
In the following example, Try.of(…) returns a Success<String> Monad, if the invocation of the function is successful. If the function throws an exception, a Failure<Throwable> Monad is returned and map is not invoked.

// Given
CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("testName");

// When I decorate my function
CheckedFunction0<String> decoratedSupplier = CircuitBreaker
        .decorateCheckedSupplier(circuitBreaker, () -> "This can be any method which returns: 'Hello");

// and chain another function with map
Try<String> result = Try.of(decoratedSupplier)
                .map(value -> value + " world'");

// Then the Try Monad returns a Success<String>, if all functions ran successfully.
assertThat(result.isSuccess()).isTrue();
assertThat(result.get()).isEqualTo("This can be any method which returns: 'Hello world'");

Consume emitted RegistryEvents

You can register an event consumer on a CircuitBreakerRegistry and take actions whenever a CircuitBreaker is created, replaced or deleted.

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.ofDefaults();
circuitBreakerRegistry.getEventPublisher()
  .onEntryAdded(entryAddedEvent -> {
    CircuitBreaker addedCircuitBreaker = entryAddedEvent.getAddedEntry();
    LOG.info("CircuitBreaker {} added", addedCircuitBreaker.getName());
  })
  .onEntryRemoved(entryRemovedEvent -> {
    CircuitBreaker removedCircuitBreaker = entryRemovedEvent.getRemovedEntry();
    LOG.info("CircuitBreaker {} removed", removedCircuitBreaker.getName());
  });

Consume emitted CircuitBreakerEvents

A CircuitBreakerEvent can be a state transition, a circuit breaker reset, a successful call, a recorded error or an ignored error. All events contain additional information, such as the event creation time and the processing duration of the call. If you want to consume events, you have to register an event consumer.

circuitBreaker.getEventPublisher()
    .onSuccess(event -> logger.info(...))
    .onError(event -> logger.info(...))
    .onIgnoredError(event -> logger.info(...))
    .onReset(event -> logger.info(...))
    .onStateTransition(event -> logger.info(...));
// Or if you want to register a consumer listening
// to all events, you can do:
circuitBreaker.getEventPublisher()
    .onEvent(event -> logger.info(...));

You could use the CircularEventConsumer to store events in a circular buffer with a fixed capacity.

CircularEventConsumer<CircuitBreakerEvent> ringBuffer = 
  new CircularEventConsumer<>(10);
circuitBreaker.getEventPublisher().onEvent(ringBuffer);
List<CircuitBreakerEvent> bufferedEvents = ringBuffer.getBufferedEvents();

You can use RxJava or RxJava2 Adapters to convert the EventPublisher into a Reactive Stream.

Override the RegistryStore

You can replace the in-memory RegistryStore with a custom implementation, for example if you want to use a cache which removes unused instances after a certain period of time.

CircuitBreakerRegistry circuitBreakerRegistry = CircuitBreakerRegistry.custom()
  .withRegistryStore(new CacheCircuitBreakerRegistryStore())
  .build();