I can see several workflows for saving to the LDAP, event log and git storages, and none of them seems clearly superior to the others or provides the expected behavior.
For now, this is what happens when we modify an object (be aware that an error at any step stops the whole process and results in an error):
- 1. get the context (current state, parents, etc.) and run sanity/prerequisite checks (name availability, etc.);
- 2. update the object and get the diff: this step is atomic, and it is the only atomic part;
- 3. log the modification event;
- 4. if auto-archive is requested, archive the change (which produces a git commit).
Steps 2 and 3 follow each other closely because we really want to log as soon as the modification is done, to minimize the inconsistency window. Ideally, these two steps would be atomic, but technically we can't make them so.
The problem is that we want what happened at step 4 (the commit ID) to be available, later on, from step 3 (the event log).
So there are basically three ways to achieve that:
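The current workflow can be sketched as follows. This is a minimal Scala illustration assuming hypothetical `Diff`, `EventLogEntry` and `CommitId` types and hypothetical step functions (none of them are Rudder's actual classes). The for-comprehension over `Either` stops at the first failing step, as described above, and the returned event log entry holds no reference to the commit ID, which only exists after step 4.

```scala
// Hypothetical types for illustration only, not Rudder's real classes.
final case class Diff(objectId: String, changes: Map[String, (String, String)])
final case class EventLogEntry(id: Long, diff: Diff)
final case class CommitId(value: String)

object CurrentWorkflow {
  type Result[A] = Either[String, A]

  // Step 1: context and prerequisite checks (here, only name availability).
  def check(name: String, usedNames: Set[String]): Result[Unit] =
    if (usedNames.contains(name)) Left(s"name '$name' is already used") else Right(())

  // Step 2: update the object and compute the diff (the only atomic step).
  def update(objectId: String, before: Map[String, String], after: Map[String, String]): Result[Diff] = {
    val changed = (before.keySet ++ after.keySet).filter(k => before.get(k) != after.get(k))
    val changes = changed.map(k => (k, (before.getOrElse(k, ""), after.getOrElse(k, "")))).toMap
    Right(Diff(objectId, changes))
  }

  // Step 3: log the modification event; no commit ID exists yet at this point.
  def logEvent(nextId: Long, diff: Diff): Result[EventLogEntry] =
    Right(EventLogEntry(nextId, diff))

  // Step 4: optional auto-archive; the commit ID is only known here.
  def archive(diff: Diff, autoArchive: Boolean): Result[Option[CommitId]] =
    Right(if (autoArchive) Some(CommitId(s"commit-${diff.objectId}")) else None)

  // The for-comprehension short-circuits on the first Left, like the real workflow.
  def save(
      name: String,
      usedNames: Set[String],
      before: Map[String, String],
      after: Map[String, String],
      autoArchive: Boolean
  ): Result[(EventLogEntry, Option[CommitId])] =
    for {
      _      <- check(name, usedNames)
      diff   <- update(name, before, after)
      entry  <- logEvent(1L, diff)
      commit <- archive(diff, autoArchive)
    } yield (entry, commit)
}
```

Nothing in `EventLogEntry` points to the `CommitId`: the two results only sit side by side in the returned tuple, which is exactly the missing link discussed next.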
- 1. move step 3 after step 4, so that the commit ID is available in the event log;
- 2. add a step 5 that updates some database to record "commit ID from step 4 is related to event log ID from step 3". That database can be:
- the event log table (a new column);
- another database or table, dedicated to storing that relationship.
- 3. have a common, pre-existing piece of information shared by steps 3 and 4, so that we know both steps are consequences of that information.
Options 1 and 2 above share fundamental problems:
- the solution is not extensible. If tomorrow we want to add another post-modification action, we will (again) have to order steps for technical reasons, or (again) add a column to the event log table;
- they introduce false (technical) dependencies, so that we no longer know which constraints are functional requirements and which are technical ones: git storage does not really depend on the event log, nor the other way around. What we want to store is the fact that both are related to the acceptance of the same modification.
So it seems that we want to be able to uniquely identify each modification (for example with a UUID), and then map that identifier to the consumers of the modification.
Consumers may be the LDAP store, the event log repository, the git repository, etc.
These UUIDs would be specific to a given Rudder installation, and are meaningless outside of it. Typically, such a UUID has no meaning in an API request to modify a Rule (but the UUID can be returned by a successful request, and used to look up the modification it identifies).
Moreover, it seems to be the natural solution given the direction we are taking with modification workflows.
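Option 3, with one modification UUID shared by all consumers, could look like the sketch below. `ModificationId`, `ConsumerRef` and `ModificationRegistry` are hypothetical names used for illustration only, and the registry is an in-memory stand-in for whatever storage would actually hold the links.

```scala
import java.util.UUID

// Hypothetical identifier for one accepted modification, local to one installation.
final case class ModificationId(value: UUID)

// What each consumer did for a given modification (hypothetical cases).
sealed trait ConsumerRef
final case class LdapChange(dn: String) extends ConsumerRef
final case class EventLogRef(eventId: Long) extends ConsumerRef
final case class GitCommit(commitId: String) extends ConsumerRef

// In-memory stand-in for the link storage; not thread-safe, illustration only.
final class ModificationRegistry {
  private var links = Map.empty[ModificationId, Vector[ConsumerRef]]

  // Generated once, e.g. at the same time as the diff.
  def newModification(): ModificationId = ModificationId(UUID.randomUUID())

  // Each consumer records its own reference without knowing about the others.
  def record(modId: ModificationId, ref: ConsumerRef): Unit =
    links = links.updated(modId, links.getOrElse(modId, Vector.empty) :+ ref)

  // Everything that happened because of one modification.
  def consumersOf(modId: ModificationId): Vector[ConsumerRef] =
    links.getOrElse(modId, Vector.empty)
}
```

With this shape, adding a new post-modification action only means adding a new `ConsumerRef` case: neither the event log nor the git archiver has to know about it, and no step has to be reordered.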
Technical details
Having a "modification UUID" means that:
- we need to generate that UUID at some step. Today, it seems to make sense to generate it when the modification is committed, at the same time as the diff. Tomorrow, we will certainly want modifications (diffs) to exist before any actual change in the repository;
- we have to store that UUID in the event log table, either in a dedicated column or as part of the modification details. Storing it in the details seems sensible, as it is part of the modification itself, but there are also major cons:
- performance-wise, lookups on it will be terrible, which makes finding all the logs related to a given modification slow (and that seems to be a legitimate use case);
- it also makes sense to consider the details (the XML part) as a common, generic part shared by a modification request coming from the outside and Rudder's internal representation of such a request, whereas the UUID only exists in the internal representation;
- we need to store the link between a commit ID and the modification UUID.
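The storage trade-off can be illustrated with in-memory stand-ins for the event log table and a commit-link table. `EventLogRow`, `CommitLink` and `EventStore` are hypothetical; the `groupBy` index plays the role of a database index on a dedicated column, while the scanning variant assumes the UUID is only available inside the XML details. Both variants share one row type only so they can be compared.

```scala
import java.util.UUID

// Hypothetical rows: a dedicated modificationId column plus the XML details.
final case class EventLogRow(id: Long, modificationId: UUID, detailsXml: String)
// Hypothetical table linking a git commit to a modification UUID.
final case class CommitLink(commitId: String, modificationId: UUID)

final class EventStore(rows: Seq[EventLogRow], commits: Seq[CommitLink]) {
  // Dedicated column: built once, like an index, then a direct lookup.
  private val byModification: Map[UUID, Seq[EventLogRow]] = rows.groupBy(_.modificationId)

  def eventsFor(modId: UUID): Seq[EventLogRow] =
    byModification.getOrElse(modId, Seq.empty)

  // UUID only in the details: every row's XML has to be inspected.
  def eventsForByScanningDetails(modId: UUID): Seq[EventLogRow] =
    rows.filter(_.detailsXml.contains(s"<modificationId>$modId</modificationId>"))

  // Commit IDs linked to a modification UUID.
  def commitsFor(modId: UUID): Seq[String] =
    commits.filter(_.modificationId == modId).map(_.commitId)
}
```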