Architecture #14870


Use ZIO for effect management in Rudder

Added by François ARMAND almost 2 years ago. Updated over 1 year ago.

Architecture - Refactoring


Context: Error return type with LiftWeb Box

Historically, we used the liftweb `Box` type to handle results which may fail in Rudder.

This was a good, fairly cutting-edge choice 10 years ago, and `Box` brings two major features:

- a) Box clearly splits apart the nominal path (Full[A]) from errors (Failure). In that regard, Box is the same as Either or any modern IO monad (but remember that at the time, Scala's Either wasn't right-biased...)
- b) and more importantly, errors come with the possibility to iteratively build up an explanation, giving more context depending on where the error happens and chaining user-oriented messages.

On that second point, it means that you can "chain" explanations, so that each layer can set the relevant context for the error, typically very technical for lower layers and more user-oriented in higher ones. This stack of messages can then be formatted and displayed depending on the targeted audience.

For example, imagine that your UI makes an AJAX request to get some details on something, and the database is down at that moment.
The user-oriented message can then read: "There was a problem with data access, please retry or contact your administrator", while the log displays:

[.....] There was a problem with data access, please retry or contact your administrator <- JSON request to URL .... returned an error <- Impossible to get details for configuration with ID xxxxx <- Connection to database error: (technical details of why)

This is a very good and extremely important feature for a project like Rudder, so we use it pervasively in Rudder code (> 120k usages in Rudder 5.0).

Box limitations

Nonetheless, Box has three major problems:

- 1/ it's a lift dependency, and lift is web-framework oriented and not much used outside that use case. We would prefer to have a dedicated effect/error library (and bridge it with `Box` in the web part). This one is not a deal-breaker, though.

- 2/ Box is tri-stated: Full[A], Empty, and Failure, and 10 years of usage lead us to believe that even with the strictest discipline, the semantics of Empty are impossible to maintain over time. We never know if it's an unexplained error or an expected empty case. The semantics of flatMap suggest the first interpretation, but actually we never want errors without context. So that state is more of a burden than a help (we still have to deal with it even if we banned it), and it contributes to massive puzzlement for newcomers.

- 3/ and more specifically, `Box` is not at all a principled effect management library with verified applicative/monad laws and helpful combinators.
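
The ambiguity of the Empty state can be illustrated with a small stand-in type. This is a simplified sketch, not lift's actual `Box` code: `Tri`, `findUser` and `greeting` are hypothetical names, and only the short-circuiting behaviour of `flatMap` described above is modelled.

```scala
object TriStateSketch {
  // Simplified stand-in for a tri-state result type (NOT lift's real Box).
  sealed trait Tri[+A] {
    def flatMap[B](f: A => Tri[B]): Tri[B] = this match {
      case Full(a)    => f(a)
      case Empty      => Empty // short-circuits like an error, but explains nothing
      case e: Failure => e
    }
    def map[B](f: A => B): Tri[B] = flatMap(a => Full(f(a)))
  }
  final case class Full[+A](value: A) extends Tri[A]
  case object Empty extends Tri[Nothing]
  final case class Failure(msg: String) extends Tri[Nothing]

  // Hypothetical lookup: does Empty mean "no such user" or "something broke"?
  def findUser(id: Int): Tri[String] =
    if (id == 1) Full("alice") else Empty

  // The caller only sees Empty propagate, with no context to report.
  def greeting(id: Int): Tri[String] =
    for (name <- findUser(id)) yield s"Hello, $name"
}
```

With this model, `greeting(2)` evaluates to `Empty`: the computation stopped as if it had failed, yet there is no message to show to the user or to log.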

Introducing ZIO for error management

ZIO addresses the three drawbacks of `Box`: it is a principled effect library, built on 10 years of R&D on the topic, and it even pushes the state of the art.

It goes way further than what Box provided, bringing a whole new world of async and concurrent programming, Schedule, Software Transactional Memory, efficient purely functional queues, etc. We were forced to use other libraries for these topics (like `monix`), which multiplies concepts and dependencies.

But ZIO doesn't directly help with the contextualisation of errors. At least, it helps the developer a lot, with an extremely powerful tracing framework:

But (obviously) it does not provide what a `Result` is in the context of Rudder. So I built one with the RudderError, PureResult and IOResult hierarchies.

Provided in that evolution: our new error hierarchy and effect management

The PR corresponding to that evolution introduces 3 things:

  • 1/ the concept of RudderError which is the main type of errors in Rudder.

It provides common error cases and combinators. Notably, the default errors are:

- SystemError, which encapsulates an exception,
- Unexpected, for values that should not happen, for example when building a config,
- Unconsistant, for things that should not happen from a business perspective, like "that entity should really be in the base because I just checked",
- a Chained error case, which provides a ".chainError" combinator to (yes, suspense) chain an error with a new contextualised message,
- and an Accumulated error for the Applicative accumulation of errors.

All Rudder errors implement a .msg method that allows for user rendering of the error, with the correct logic for each of them (particularly Chained / Accumulated ones).

These basic cases come with combinators like .notOptional, .chainError, .accumulate...
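
As a rough illustration of the error cases listed above, here is a minimal sketch using only the Scala 2.13 standard library. The case names come from the description, but the fields, the rendering format of `msg`, and the signatures of `chainError` and `accumulate` are assumptions made for the example, not the code from the PR.

```scala
object RudderErrorSketch {
  sealed trait RudderError { def msg: String }

  // Field names and message formats below are assumptions for illustration.
  final case class SystemError(hint: String, cause: Throwable) extends RudderError {
    def msg: String = s"$hint; cause was: ${cause.getMessage}"
  }
  final case class Unexpected(msg: String) extends RudderError
  final case class Unconsistant(msg: String) extends RudderError
  final case class Chained(hint: String, cause: RudderError) extends RudderError {
    // Higher-level context first, then the underlying error.
    def msg: String = s"$hint <- ${cause.msg}"
  }
  final case class Accumulated(all: List[RudderError]) extends RudderError {
    def msg: String = all.map(_.msg).mkString("; ")
  }

  implicit class ChainOps[A](res: Either[RudderError, A]) {
    // Wrap an existing error with a new contextualised message.
    def chainError(hint: String): Either[RudderError, A] =
      res.left.map(e => Chained(hint, e))
  }

  // Apply f to every input and keep ALL errors instead of stopping at the first.
  def accumulate[A, B](in: List[A])(f: A => Either[RudderError, B]): Either[RudderError, List[B]] = {
    val (errors, values) = in.map(f).partitionMap(identity)
    if (errors.isEmpty) Right(values) else Left(Accumulated(errors))
  }
}
```

In this sketch, chaining keeps the most recent context on the left, in the same spirit as the log line shown in the context section, while accumulation reports every failing input at once.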

  • 2/ the concept of `Result`, with a pure and an effect variant. The pure variant is an alias to Either[RudderError, A] and is dedicated to non-effectful code (i.e. no exception, no I/O, no random, no "System time millis", etc.) that may still produce a business error. The effect variant is (TADAAAA) dedicated to effectful computations (Java bridges that can throw, I/O, etc.) and is an alias to ZIO[Any, RudderError, A]: we don't use the ZIO environment (context) for now.
  • 3/ and of course, ZIO as our effect library.
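
The two `Result` variants of point 2/ can be sketched as follows, assuming ZIO 1.x and Scala 2.13 on the classpath. The two type aliases follow the description above; the reduced error hierarchy and the helpers `parsePort` and `readText` are hypothetical and only serve the illustration.

```scala
import zio._

object ResultSketch {
  // Reduced, hypothetical error hierarchy, just enough for the example.
  sealed trait RudderError { def msg: String }
  final case class Unexpected(msg: String) extends RudderError
  final case class SystemError(hint: String, cause: Throwable) extends RudderError {
    def msg: String = s"$hint; cause was: ${cause.getMessage}"
  }

  type PureResult[A] = Either[RudderError, A]   // pure code, business errors only
  type IOResult[A]   = ZIO[Any, RudderError, A] // effectful code, no environment

  // Pure: no I/O and no exception, only a possible business error.
  def parsePort(s: String): PureResult[Int] =
    s.toIntOption
      .filter(p => p > 0 && p < 65536)
      .toRight(Unexpected(s"'$s' is not a valid port"))

  // Effectful: reading a file may throw, so the exception is captured by
  // ZIO.effect and translated into a SystemError.
  def readText(path: String): IOResult[String] =
    ZIO
      .effect(new String(java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(path)), "UTF-8"))
      .mapError(ex => SystemError(s"Cannot read '$path'", ex))
}
```

Nothing happens when `readText` is called: the returned value only describes the read, and it is executed when handed to a runtime, for example `Runtime.default.unsafeRun(readText(path).either)`.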

It also provides combinators to convert between Box and Result, and ports a substantial part of Rudder's lower layers to ZIO.

The main set-up is available in:

That basic framework also allows specialized domain errors for a dedicated part of the application. I did it for the LDAP connection module (also: technique parsing, entity mapping), as we want to manage LDAP errors with more precise semantics than the base errors described above provide.

This can be seen here:

Example of usage with combinator and contextualisation

Here comes a very short example of what it looks like in the wild:

1. def get(id: RuleCategoryId): IOResult[RuleCategory] = {
2.    for {
3.      con      <- ldap
4.      entry    <- getCategoryEntry(con, id).notOptional(s"Entry with ID '${id.value}' was not found")
5.      category <- mapper.entry2RuleCategory(entry).toIO.chainError(s"Error when transforming LDAP entry ${entry} into a server group category")
6.    } yield {
7.      category
8.    }
9. }

We have a business, middle-layer method which tries to get a category based on an ID.
The result (line 1) is an IOResult because data are stored in an LDAP database.

Line 3, we obtain an LDAP connection (which is managed with Bracket in the backend). That connection returns LDAPIOResult, which carries the specific error type used by the LDAP modules. But we see that it is seamlessly integrated with IOResult and doesn't need special error mapping.

Line 4, we have a method getCategoryEntry that returns an IOResult[Option[LDAPEntry]]. This is the logical thing to do from that method's point of view: it allows us to distinguish between an LDAP error and the absence of a corresponding entry. But for the domain, at that point, it is a failure not to have one, so we use the .notOptional(error message) combinator to say so.

Line 5, we map the LDAP entry into our business object. This is a pure computation with a PureResult, so we translate it to IOResult with .toIO. And we give more domain context to let people understand what triggered the error ("ok, mandatory attribute 'xxxx' is missing, but why did I need it?").

And of course, everything is composed in a classic for-comprehension.
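
To make the walkthrough concrete, here is one possible way the three combinators could be written as extension methods, assuming ZIO 1.x. This is a sketch, not the implementation from the PR: `Entry`, `LdapError`, the in-memory `ldap` value and the `msg` rendering are all hypothetical. The LDAP-specific error is modelled as a subtype of `RudderError`, which is one way the "no special error mapping" property of line 3 can be obtained, since ZIO's error channel is covariant.

```scala
import zio._

object CombinatorSketch {
  sealed trait RudderError { def msg: String }
  final case class Unexpected(msg: String) extends RudderError
  final case class Chained(hint: String, cause: RudderError) extends RudderError {
    def msg: String = s"$hint <- ${cause.msg}"
  }
  final case class LdapError(msg: String) extends RudderError // hypothetical

  type PureResult[A] = Either[RudderError, A]
  type IOResult[A]   = ZIO[Any, RudderError, A]

  implicit class OptionalOps[A](res: IOResult[Option[A]]) {
    // An absent value becomes a failure with the given message.
    def notOptional(error: => String): IOResult[A] = res.flatMap {
      case Some(a) => ZIO.succeed(a)
      case None    => ZIO.fail(Unexpected(error))
    }
  }
  implicit class ChainOps[A](res: IOResult[A]) {
    def chainError(hint: => String): IOResult[A] = res.mapError(e => Chained(hint, e))
  }
  implicit class PureOps[A](res: PureResult[A]) {
    def toIO: IOResult[A] = ZIO.fromEither(res)
  }

  // Hypothetical in-memory stand-ins for the LDAP connection and entries.
  final case class Entry(id: String, attrs: Map[String, String])
  val ldap: ZIO[Any, LdapError, Map[String, Entry]] =
    ZIO.succeed(Map("c1" -> Entry("c1", Map("cn" -> "Security"))))

  def getEntry(con: Map[String, Entry], id: String): IOResult[Option[Entry]] =
    ZIO.succeed(con.get(id))
  def toCategoryName(e: Entry): PureResult[String] =
    e.attrs.get("cn").toRight(Unexpected("mandatory attribute 'cn' is missing"))

  def get(id: String): IOResult[String] =
    for {
      con   <- ldap // LdapError widens to RudderError without any mapping
      entry <- getEntry(con, id).notOptional(s"Entry with ID '$id' was not found")
      name  <- toCategoryName(entry).toIO.chainError(s"Error when transforming LDAP entry '${entry.id}'")
    } yield name
}
```

Running `get("nope")` in this sketch fails with an `Unexpected` error carrying the "was not found" message, while `get("c1")` succeeds with the category name.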

Interesting little things

We have a pure version of slf4j loggers:

Example of bracket, as it seems to come again and again:
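
A generic illustration of the bracket pattern, assuming ZIO 1.x; it is not taken from the Rudder code base. `Connection` and `withConnection` are hypothetical names, and a `ListBuffer` records the order of operations so that the behaviour is observable.

```scala
import zio._
import scala.collection.mutable.ListBuffer

object BracketSketch {
  // Hypothetical resource, standing in for something like an LDAP connection.
  final class Connection(log: ListBuffer[String]) {
    def query(q: String): String = { log += s"query:$q"; s"result of $q" }
    def close(): Unit = { log += "closed"; () }
  }

  // Acquire the resource, use it, and always release it afterwards,
  // whether `use` succeeds, fails or is interrupted.
  def withConnection[A](log: ListBuffer[String])(use: Connection => Task[A]): Task[A] = {
    val acquire: Task[Connection] = ZIO.effect { log += "opened"; new Connection(log) }
    acquire.bracket((c: Connection) => ZIO.effectTotal(c.close()))(use)
  }
}
```

The release function returns an effect that cannot fail, which is why it is built with `ZIO.effectTotal` here; a real close operation that may throw would need its error handled or logged inside the release step.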

Subtasks 1 (0 open, 1 closed)

Architecture #15102: Port NuProcess and RunHooks to ZIO - Released - François ARMAND
