Bug #3211
Closed
The git process conflicts when several operations happen at the same time
Description
Sometimes two git processes are launched at the same time. This happens when:
- Two modifications are saved at the same time
- During the update to 2.5.0rc1, Directives are updated both by the cfengine variables migration and by the technique library update
When this happens:
- modifications from both processes are included in a single commit
- the second commit fails with an exception, because the first one is still in progress
Rudder does not stop on that error and continues to work.
Here are the logs produced just after an update to 2.5.0rc1:
12:57:04.838 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.UpdatePiOnActiveTechniqueEvent - Executing archivage of PIs for UPT 'ActiveTechnique(ActiveTechniqueId(5ab6132e-cda0-4c1f-9332-0691f42cdabd),rpmPackageInstallation,Map(1.0 -> 2012-07-05T12:26:08.972+02:00, 2.0 -> 2012-07-05T12:26:08.972+02:00, 2.1 -> 2012-07-05T12:26:08.972+02:00, 2.2 -> 2013-01-23T12:57:04.026+01:00),List(DirectiveId(ad57e582-0c50-49bd-bbea-cb98d2e6ce0d), DirectiveId(e1c34f2a-0d18-49cc-a9b6-b52ddc5780be)),true,false)'
12:57:05.213 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/userlib_applications/rpmPackageInstallation/ad57e582-0c50-49bd-bbea-cb98d2e6ce0d.xml
12:57:05.222 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/userlib_applications/rpmPackageInstallation/e1c34f2a-0d18-49cc-a9b6-b52ddc5780be.xml
12:57:06.488 [main] DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/Rudder Internal/common/common-root.xml
12:57:07.208 [main] ERROR migration - Can not finish the migration process due to an error <- Exception caught during execution of commit command
12:57:07.267 [main] ERROR migration - Root exception was: org.eclipse.jgit.errors.LockFailedException: Cannot lock /var/rudder/configuration-repository/.git/index
    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:224) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:301) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
    at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:267) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
    at org.eclipse.jgit.lib.Repository.lockDirCache(Repository.java:1023) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
    at org.eclipse.jgit.api.CommitCommand.call(CommitCommand.java:191) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
    at com.normation.rudder.repository.xml.GitArchiverUtils$$anonfun$commitAddFile$1.apply(GitArchiverUtils.scala:97) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.xml.GitArchiverUtils$$anonfun$commitAddFile$1.apply(GitArchiverUtils.scala:89) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at net.liftweb.util.ControlHelpers$class.tryo(ControlHelpers.scala:46) ~[lift-util_2.9.1-2.4.jar:2.4]
    at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) ~[lift-util_2.9.1-2.4.jar:2.4]
    at net.liftweb.util.ControlHelpers$class.tryo(ControlHelpers.scala:84) ~[lift-util_2.9.1-2.4.jar:2.4]
    at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) ~[lift-util_2.9.1-2.4.jar:2.4]
    at com.normation.rudder.repository.xml.GitArchiverUtils$class.commitAddFile(GitArchiverUtils.scala:89) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl.commitAddFile(GitArchiverImpl.scala:474) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2$$anonfun$apply$26.apply(GitArchiverImpl.scala:516) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2$$anonfun$apply$26.apply(GitArchiverImpl.scala:510) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2.apply(GitArchiverImpl.scala:510) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2.apply(GitArchiverImpl.scala:508) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
    at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl.archiveDirective(GitArchiverImpl.scala:508) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29$$anonfun$apply$30.apply(LDAPDirectiveRepository.scala:250) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29$$anonfun$apply$30.apply(LDAPDirectiveRepository.scala:249) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
    at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29.apply(LDAPDirectiveRepository.scala:249) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29.apply(LDAPDirectiveRepository.scala:248) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
    at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
    at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28.apply(LDAPDirectiveRepository.scala:248) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
The git commands executed look like:
[Process1, t] git add modif1
[Process2, t] git add modif2
[Process1, t+1] git commit -m "message1" -> works but includes modif2
[Process2, t+1] git commit -m "message2" -> fails: no commit is done, and an error is raised because the commit happens at the same time
It should be:
[Process1, t] git add modif1
[Process1, t+1] git commit -m "message1"
[Process2, t] git add modif2
[Process2, t+1] git commit -m "message2"
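As a minimal sketch of the expected ordering, the add and the commit can be treated as one critical section so that no other process can stage anything in between. Everything here is hypothetical and for illustration only: `SerializedCommits` is not a Rudder class, and the two lists merely stand in for the git index and the commit history.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a git repository: a staging area and a commit history.
final class SerializedCommits {
    private final List<String> index = new ArrayList<>();
    private final List<List<String>> commits = new ArrayList<>();
    private final ReentrantLock lock = new ReentrantLock();

    // "git add" followed by "git commit", executed as one indivisible step.
    void addAndCommit(String file) {
        lock.lock();
        try {
            index.add(file);                      // git add <file>
            commits.add(new ArrayList<>(index));  // git commit: snapshot of what is staged
            index.clear();
        } finally {
            lock.unlock();
        }
    }

    // Returns a copy of the commit history (one list of files per commit).
    List<List<String>> commits() {
        lock.lock();
        try {
            return new ArrayList<>(commits);
        } finally {
            lock.unlock();
        }
    }
}
```

With the lock held across both steps, each commit contains only the modification staged by the caller that created it, whatever the number of concurrent callers.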
Updated by Vincent MEMBRÉ almost 12 years ago
A solution would be to have a thread dedicated to handling git operations and ordering the commits, which would prevent conflicts.
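A rough sketch of that idea, assuming a single-threaded executor plays the role of the dedicated thread. `GitWorker` and its methods are hypothetical names, and the in-memory lists only simulate the git index and history; a real version would call the git library inside the submitted job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical worker: every git job runs on one thread, in submission order.
final class GitWorker implements AutoCloseable {
    private final ExecutorService gitThread = Executors.newSingleThreadExecutor();
    // Both lists are only ever touched from gitThread, so they need no locking.
    private final List<String> staged = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    // Queues "add then commit" as a single job; callers never run git themselves.
    Future<String> addAndCommit(String file, String message) {
        Callable<String> job = () -> {
            staged.add(file);                                       // git add <file>
            String commit = message + ":" + String.join(",", staged); // git commit -m <message>
            log.add(commit);
            staged.clear();
            return commit;
        };
        return gitThread.submit(job);
    }

    // Reads the history on the git thread, after all previously queued jobs.
    List<String> history() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(log);
        try {
            return gitThread.submit(snapshot).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        gitThread.shutdown();
    }
}
```

Callers from any thread only enqueue work, so two commits can never overlap; the cost is that all git operations are processed one at a time.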
Updated by Jonathan CLARKE almost 12 years ago
Another approach would be to use a lock file on disk. That way automation and packaging scripts could respect it too, without having to communicate with a Java thread (too much overhead for simple scripts).
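For illustration, a lock file of this kind could look like the sketch below. It relies on `Files.createFile`, which checks for existence and creates the file as one atomic operation; `RepoLockFile` and the lock path are hypothetical, and a real implementation would also need a policy for stale locks left behind by a crashed process.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical advisory lock: whoever manages to create the file holds the lock.
final class RepoLockFile {
    private final Path lockFile;

    RepoLockFile(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Tries to take the lock; returns false if another process already holds it.
    boolean tryAcquire() {
        try {
            Files.createFile(lockFile); // atomic: fails if the file already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock by deleting the file.
    void release() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The lock would have to be held across the whole add and commit sequence, and every script touching the repository would have to follow the same convention for it to be effective.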
Updated by François ARMAND almost 12 years ago
Well, the lock file already exists: it is managed by Git, but that won't prevent the problem here. What we are missing is something to run otherwise parallelized tasks in sequence.
But the problem goes deeper than that, because we actually have several non-transactional stores that participate in a composed action, and we must ensure the consistency of the whole.
To keep it simple, here is an example for a Rule with LDAP and Git: when we modify a Rule (which we want to give the semantics of a transaction: the modification happens or it does not, but never partially), we have to write into LDAP and (potentially in parallel, or not) write the serialized Rule file to the file system, then add it and commit it into Git.
We don't want to modify LDAP with a second Rule modification until the first transaction is validated, and we don't want to write the two serialized Rule files before an add/commit happens.
For now, we have tried to manage that in an ad hoc manner, setting synchronization points around LDAP writes. That clearly doesn't scale, is complex to manage, is not extensible, is not clean, and is not the way it should be handled.
So, what do we have to do?
First, we have to decide whether we want to split reads and writes. Reads don't conflict with each other, only writes do. But read/write sequences have problems, too. That implies that we will have some read/write inconsistencies, but we already have them.
So, either we go for a fully synchronized process, which comes at the price of lower performance, or we must specify which inconsistencies are acceptable, and check that we have not made false assumptions about these inconsistencies.
Next, we have to find all the components participating in our business transaction. That goes in two directions: calls to the outside world (I/Os), and business entities.
Business entities will help us find what the business wants to group together, and so what we have to be able to transact around.
I/Os will give us all the technical elements that will have to share the transaction logic in the code.
Of course, the fewer of both we have, or more precisely the smaller the group of things we are able to do, the better we will be able to parallelize things.
Next, we have to build a common sequencing logic for all the calls on business entities which trigger an I/O (write), and build a transactional logic around them.
Finally, we will be able to think about optimizations, like "as of today, with that action on business entity type A, we can safely process in parallel that action on entity type B, even if they should be grouped in the general case".
Updated by François ARMAND almost 12 years ago
Now, concretely: there is some work to do, but what seems to give the main gain (with the least work to do) is:
(from now on, configuration objects are Groups, Rules, ActiveTechniques and Directives).
- split all configuration repositories into a read-only one and a read-write one;
- group together all the write configuration repositories and all other services writing I/O on the same components behind one big actor. That means defining a bunch of Messages (roughly one for each existing method on the protected services/repos), and calling those in place of the methods all over our code.
Good point: we will have to write this bunch of Messages for Workflows in any case, and that will buy us a nice API (from that, it is TRIVIAL to build a REST API, a console, etc).
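To make the two points above more concrete, here is a small sketch under stated assumptions: a read-only view for queries, and a single actor-like thread that consumes write Messages from a mailbox. All names (`RuleReader`, `WriteMessage`, `SaveRule`, `DeleteRule`, `ConfigWriterActor`) are hypothetical, and the in-memory map stands in for the real LDAP, file system and Git writes.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Read-only side: queries never go through the actor.
interface RuleReader {
    String get(String id);
}

// One Message type per former write method; `done` completes once it is applied.
abstract class WriteMessage {
    final CompletableFuture<Void> done = new CompletableFuture<>();
}

final class SaveRule extends WriteMessage {
    final String id;
    final String content;
    SaveRule(String id, String content) { this.id = id; this.content = content; }
}

final class DeleteRule extends WriteMessage {
    final String id;
    DeleteRule(String id) { this.id = id; }
}

// Write side: a single thread applies the Messages one at a time, in arrival order.
final class ConfigWriterActor implements RuleReader {
    private final Map<String, String> rules = new ConcurrentHashMap<>();
    private final BlockingQueue<WriteMessage> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::loop, "config-writer");

    ConfigWriterActor() {
        worker.setDaemon(true); // do not keep the JVM alive for this sketch
        worker.start();
    }

    @Override
    public String get(String id) {
        return rules.get(id);
    }

    // Enqueues a write and returns a future completed when it has been applied.
    CompletableFuture<Void> send(WriteMessage msg) {
        mailbox.add(msg);
        return msg.done;
    }

    private void loop() {
        try {
            while (true) {
                WriteMessage msg = mailbox.take();
                // A real version would do the LDAP write, the file write and the
                // git add/commit here, as one unit of work per Message.
                if (msg instanceof SaveRule) {
                    SaveRule save = (SaveRule) msg;
                    rules.put(save.id, save.content);
                } else if (msg instanceof DeleteRule) {
                    rules.remove(((DeleteRule) msg).id);
                }
                msg.done.complete(null);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would write `actor.send(new SaveRule("rule-1", "<rule/>")).join()` instead of calling a repository method directly, which is also why the same set of Messages can later back a REST API or a console.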
Updated by Nicolas PERRON almost 12 years ago
- Target version changed from 2.5.0 to 2.6.0~beta1
Updated by François ARMAND almost 12 years ago
The pattern we are trying to implement here is something like the "CQS pattern": http://en.wikipedia.org/wiki/Command-query_separation
Updated by François ARMAND almost 12 years ago
- Status changed from New to In progress
- Assignee changed from Vincent MEMBRÉ to François ARMAND
Updated by François ARMAND almost 12 years ago
- Status changed from In progress to 8
Updated by Matthieu CERDA almost 12 years ago
- Subject changed from git process conflicts when several happen at the same time to The git process conflicts when several operations happen at the same time
Updated by Jonathan CLARKE almost 12 years ago
- Category changed from 11 to Web - Config management
Updated by Nicolas PERRON over 11 years ago
- Target version changed from 2.6.0~beta1 to 2.6.0~rc1
Updated by Nicolas PERRON over 11 years ago
- Status changed from 8 to Pending technical review
- Target version changed from 2.6.0~rc1 to 2.6.0~beta1
Updated by Nicolas PERRON over 11 years ago
- Status changed from Pending technical review to Pending release
Updated by Jonathan CLARKE over 11 years ago
- Status changed from Pending release to Released
This ticket has been addressed in version 2.6.0~beta1 of Rudder, which has just been released. Please see the changelog here: https://www.rudder-project.org/foswiki/System/Documentation:ChangeLog26.