Bug #3211

closed

The git process conflicts when several operations happen at the same time

Added by Vincent MEMBRÉ almost 12 years ago. Updated over 11 years ago.

Status: Released
Priority: 2
Category: Web - Config management

Description

Sometimes two git processes are launched at the same time. This happens when:

  • Two modifications are saved at the same time
  • During the update to 2.5.0rc1, Directives are updated by both the cfengine variables migration and the technique library update
There are two problems:
  • modifications from both processes are included in a single commit
  • the second commit fails with an exception, because the first one is still in progress

Rudder does not stop on that error and continues to work.

Here are the logs produced just after an update to 2.5.0rc1:

12:57:04.838 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.UpdatePiOnActiveTechniqueEvent - Executing archivage of PIs for UPT 'ActiveTechnique(ActiveTechniqueId(5ab6132e-cda0-4c1f-9332-0691f42cdabd),rpmPackageInstallation,Map(1.0 -> 2012-07-05T12:26:08.972+02:00, 2.0 -> 2012-07-05T12:26:08.972+02:00, 2.1 -> 2012-07-05T12:26:08.972+02:00, 2.2 -> 2013-01-23T12:57:04.026+01:00),List(DirectiveId(ad57e582-0c50-49bd-bbea-cb98d2e6ce0d), DirectiveId(e1c34f2a-0d18-49cc-a9b6-b52ddc5780be)),true,false)'
12:57:05.213 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/userlib_applications/rpmPackageInstallation/ad57e582-0c50-49bd-bbea-cb98d2e6ce0d.xml
12:57:05.222 [pool-3-thread-4] DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/userlib_applications/rpmPackageInstallation/e1c34f2a-0d18-49cc-a9b6-b52ddc5780be.xml
12:57:06.488 [main]            DEBUG com.normation.rudder.repository.xml.GitDirectiveArchiverImpl - Archived directive: /var/rudder/configuration-repository/directives/Rudder Internal/common/common-root.xml
12:57:07.208 [main]            ERROR migration - Can not finish the migration process due to an error <- Exception caught during execution of commit command
12:57:07.267 [main]            ERROR migration - Root exception was:
org.eclipse.jgit.errors.LockFailedException: Cannot lock /var/rudder/configuration-repository/.git/index
        at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:224) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
        at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:301) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
        at org.eclipse.jgit.dircache.DirCache.lock(DirCache.java:267) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
        at org.eclipse.jgit.lib.Repository.lockDirCache(Repository.java:1023) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
        at org.eclipse.jgit.api.CommitCommand.call(CommitCommand.java:191) ~[org.eclipse.jgit-2.1.0.201209190230-r.jar:2.1.0.201209190230-r]
        at com.normation.rudder.repository.xml.GitArchiverUtils$$anonfun$commitAddFile$1.apply(GitArchiverUtils.scala:97) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.xml.GitArchiverUtils$$anonfun$commitAddFile$1.apply(GitArchiverUtils.scala:89) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at net.liftweb.util.ControlHelpers$class.tryo(ControlHelpers.scala:46) ~[lift-util_2.9.1-2.4.jar:2.4]
        at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) ~[lift-util_2.9.1-2.4.jar:2.4]
        at net.liftweb.util.ControlHelpers$class.tryo(ControlHelpers.scala:84) ~[lift-util_2.9.1-2.4.jar:2.4]
        at net.liftweb.util.Helpers$.tryo(Helpers.scala:34) ~[lift-util_2.9.1-2.4.jar:2.4]
        at com.normation.rudder.repository.xml.GitArchiverUtils$class.commitAddFile(GitArchiverUtils.scala:89) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl.commitAddFile(GitArchiverImpl.scala:474) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2$$anonfun$apply$26.apply(GitArchiverImpl.scala:516) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2$$anonfun$apply$26.apply(GitArchiverImpl.scala:510) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2.apply(GitArchiverImpl.scala:510) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl$$anonfun$archiveDirective$2.apply(GitArchiverImpl.scala:508) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
        at com.normation.rudder.repository.xml.GitDirectiveArchiverImpl.archiveDirective(GitArchiverImpl.scala:508) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29$$anonfun$apply$30.apply(LDAPDirectiveRepository.scala:250) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29$$anonfun$apply$30.apply(LDAPDirectiveRepository.scala:249) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
        at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29.apply(LDAPDirectiveRepository.scala:249) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28$$anonfun$apply$29.apply(LDAPDirectiveRepository.scala:248) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]
        at net.liftweb.common.Full.flatMap(Box.scala:493) [lift-common_2.9.1-2.4.jar:2.4]
        at com.normation.rudder.repository.ldap.LDAPDirectiveRepository$$anonfun$saveDirective$1$$anonfun$apply$17$$anonfun$apply$19$$anonfun$apply$21$$anonfun$apply$23$$anonfun$apply$25$$anonfun$apply$26$$anonfun$apply$27$$anonfun$apply$28.apply(LDAPDirectiveRepository.scala:248) ~[rudder-core-2.5.0-SNAPSHOT.jar:na]

The git commands executed look like this:

[Process1, t] git add modif1
[Process2, t] git add modif2
[Process1, t+1] git commit -m "message1" -> works, but includes modif2
[Process2, t+1] git commit -m "message2" -> fails: no commit is made, and an error is raised because the commit happens at the same time

It should be:

[Process1, t] git add modif1
[Process1, t+1] git commit -m "message1"
[Process2, t+2] git add modif2
[Process2, t+3] git commit -m "message2"
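
To make the two orderings concrete, here is a minimal sketch in Python (Rudder itself is Scala on the JVM, so this is only a language-neutral illustration). `ToyRepo` and `run` are hypothetical names, and the model is deliberately crude: it only captures the fact that the staging index is shared by every caller. In this toy the second commit fails because nothing is left to commit, whereas in the log above it fails earlier, on the index lock.

```python
class ToyRepo:
    """Toy stand-in for a git repository: one staging index shared by all callers."""

    def __init__(self):
        self.index = []     # paths staged by `git add`, whoever ran it
        self.commits = []   # (message, committed paths), in commit order

    def add(self, path):
        self.index.append(path)

    def commit(self, message):
        if not self.index:
            raise RuntimeError("nothing staged for: " + message)
        # A commit takes everything in the index, not just the caller's files.
        self.commits.append((message, list(self.index)))
        self.index.clear()


def run(steps):
    """Apply (operation, argument) steps in order; return commits and failed messages."""
    repo, failed = ToyRepo(), []
    for operation, argument in steps:
        try:
            getattr(repo, operation)(argument)
        except RuntimeError:
            failed.append(argument)
    return repo.commits, failed


# Ordering observed in the bug: both adds happen before either commit.
interleaved = [("add", "modif1"), ("add", "modif2"),
               ("commit", "message1"), ("commit", "message2")]
# Expected ordering: each add is immediately followed by its own commit.
sequential = [("add", "modif1"), ("commit", "message1"),
              ("add", "modif2"), ("commit", "message2")]

print(run(interleaved))
print(run(sequential))
```

With the interleaved ordering, the first commit carries both files and the second one fails; with the sequential ordering, each commit carries exactly its own file.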

Subtasks 3 (0 open, 3 closed)

Bug #3250: Split LDAPConnection in Read Only / RW (Released, Nicolas CHARLES, 2013-02-11)
Bug #3251: Merge configuration repository by entity type, split them in ro/rw (Released, Nicolas CHARLES, 2013-02-11)
Bug #3252: Use RO/RW LDAPConnection in LDAPInventory (Released, Nicolas CHARLES, 2013-02-11)

Related issues 1 (0 open, 1 closed)

Blocked by Rudder - Architecture #3230: Transactionnal behavior accros several, non-transactionnal backend (Released, 2013-01-31)
Actions #1

Updated by Vincent MEMBRÉ almost 12 years ago

One solution would be to have a thread dedicated to handling git operations, which would order the commits and so prevent conflicts.
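
A minimal sketch of that idea, in Python for brevity (not Rudder's Scala code): every add/commit pair is submitted to a single worker thread, so two pairs can never interleave. `SerializedGit` and `archive` are hypothetical names, and the git commands are only recorded in a list instead of being executed.

```python
from concurrent.futures import ThreadPoolExecutor


class SerializedGit:
    """Runs every add+commit pair on one dedicated worker thread."""

    def __init__(self):
        # A pool with a single worker executes submitted jobs one at a time,
        # in submission order.
        self._worker = ThreadPoolExecutor(max_workers=1)
        self.log = []   # stand-in for the commands a real implementation would run

    def _add_and_commit(self, path, message):
        # Only the worker thread ever runs this, so the pair is never split.
        self.log.append(f"git add {path}")
        self.log.append(f'git commit -m "{message}"')
        return message

    def archive(self, path, message):
        """Callable from any thread; returns a Future for the commit."""
        return self._worker.submit(self._add_and_commit, path, message)

    def close(self):
        self._worker.shutdown(wait=True)
```

Callers on any thread would use `git.archive("modif1", "message1").result()` to wait for their own commit. The cost is that all git writes are processed strictly one after another.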

Actions #2

Updated by Jonathan CLARKE almost 12 years ago

Another approach would be to use a lock file on disk. That way automation and packaging scripts could respect it too, without having to communicate with a Java thread (too much overhead for simple scripts).
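
As a rough sketch of what such a lock file could look like (Python here, though a shell script could follow the same protocol): the lock is taken by creating a file atomically and released by deleting it. `repo_lock`, the timeout values and the lock path are all assumptions for illustration. To help with the bug above, the lock would have to be held across the whole add-then-commit sequence, and a crashed holder leaves a stale file that this sketch does not clean up.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def repo_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an exclusive on-disk lock for the duration of the `with` block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process can create it: creation is the lock acquisition.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)
    try:
        # Record the owner's PID to help diagnose stale locks by hand.
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        yield
    finally:
        os.unlink(lock_path)
```

Usage would be `with repo_lock("/path/to/some.lock"): ...` around the add and the commit, where the path is whatever all participants agree on.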

Actions #3

Updated by François ARMAND almost 12 years ago

Well, the lock file already exists: it is managed by Git, but that won't prevent the problem here. What we are missing is something to serialize tasks that would otherwise run in parallel.

But the problem is deeper than that, because we actually have several non-transactional stores that participate in composed actions, and we must ensure the consistency of the whole.

To keep it simple, here is an example for a Rule with LDAP and Git: when we modify a Rule (we want to give it the semantics of a transaction: the modification happens or it doesn't, but never partially), we have to write into LDAP, and (potentially in parallel, or not) write the serialized Rule file to the file system, then add it and commit it into Git.
We don't want to modify LDAP with a second Rule modification until the first transaction is validated, and we don't want to write both serialized Rule files before an add/commit happens.
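
The Rule example can be sketched as follows, in Python and with in-memory stand-ins for every backend. `RuleStore`, `save_rule`, the `rules/<id>.xml` path layout and the `fail_next_commit` hook are all hypothetical. Restoring the previous values on failure is an assumption made for the sketch: undoing a real LDAP write or a written file is harder than restoring a dictionary entry.

```python
import threading

_MISSING = object()   # marks "there was no previous value"


class RuleStore:
    """In-memory stand-ins for LDAP, the file system and Git (all hypothetical)."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self.ldap = {}
        self.files = {}
        self.commits = []
        self.fail_next_commit = False   # hook used to simulate a Git failure

    def _git_add_and_commit(self, path, message):
        if self.fail_next_commit:
            self.fail_next_commit = False
            raise IOError("cannot lock index")
        self.commits.append((message, [path]))

    def save_rule(self, rule_id, content, message):
        path = f"rules/{rule_id}.xml"
        # One lock around the whole composed action: a second modification
        # cannot touch LDAP or the files until this one has been committed.
        with self._write_lock:
            old_ldap = self.ldap.get(rule_id, _MISSING)
            old_file = self.files.get(path, _MISSING)
            try:
                self.ldap[rule_id] = content
                self.files[path] = content
                self._git_add_and_commit(path, message)
            except Exception:
                # All or nothing: put both stores back as they were.
                self._restore(self.ldap, rule_id, old_ldap)
                self._restore(self.files, path, old_file)
                raise

    @staticmethod
    def _restore(store, key, old_value):
        if old_value is _MISSING:
            store.pop(key, None)
        else:
            store[key] = old_value
```

If the commit step raises, the caller sees the exception and both stores still hold the previous version of the Rule.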

For now, we have tried to manage that in an ad hoc manner, setting synchronization points around LDAP writes. That clearly doesn't scale, is complex to manage, is not extensible, is not clean, and is not the way it should be handled.

So, what do we have to do?

First, we have to decide whether we want to split reads and writes. Reads don't conflict with each other; only writes do. But read/write sequences have problems too. That implies that we will have some read/write inconsistencies, but we already have them.
So, either we go for a fully synchronized process, at the price of lower performance, or we must specify which inconsistencies are acceptable, and check that we haven't made false assumptions about these inconsistencies.

Next, we have to find all the components participating in our business transaction. That goes in two directions: calls to the outside world (I/O), and business entities.
Business entities will help us find what the business wants to group together, and so what we have to be able to transact around.
I/O will give us all the technical elements that will have to share the transaction logic in the code.

Of course, the fewer of both we have, or more precisely the smaller the group of things we are able to do, the better we will be able to parallelize things.

Next, we have to build a common sequencing logic for all the calls on business entities that trigger an I/O (write), and build a transactional logic around them.

Finally, we will be able to think about optimizations, like "as of today, with that action on business entity type A, we can safely process in parallel that action on entity type B, even if they should be grouped in the general case".

Actions #4

Updated by François ARMAND almost 12 years ago

Now, concretely: there is some work to do, but what seems to be the main gain (for the least work) is:

(from now on, configuration objects are Groups, Rules, ActiveTechniques and Directives).

- split all configuration repositories into a read-only one and a read-write one;
- group together all the write configuration repositories, and all other services writing I/O on the same components, behind one big actor. That means defining a bunch of Messages (roughly one for each existing method on the protected services/repos), and calling those in place of methods all over our code.

Good point: we will have to write that bunch of Messages for Workflows in any case, and that will buy us a nice API (from that, it is TRIVIAL to build a REST API, a console, etc.).
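
The two points above could look roughly like this, sketched in Python with a plain thread and queue standing in for an actor library. Every name here (`SaveDirective`, `DeleteDirective`, `ReadOnlyDirectiveRepo`, `WriteActor`, `tell`) is hypothetical, and a dictionary stands in for the real backends.

```python
import queue
import threading
from concurrent.futures import Future
from dataclasses import dataclass


# One message type per write method that used to be called directly.
@dataclass(frozen=True)
class SaveDirective:
    directive_id: str
    content: str


@dataclass(frozen=True)
class DeleteDirective:
    directive_id: str


class ReadOnlyDirectiveRepo:
    """Read side: exposes no write method at all."""

    def __init__(self, store):
        self._store = store

    def get(self, directive_id):
        return self._store.get(directive_id)


class WriteActor:
    """Write side: a single thread drains the mailbox, one message at a time."""

    def __init__(self, store):
        self._store = store
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Enqueue a message; the returned Future completes once it is handled."""
        done = Future()
        self._mailbox.put((message, done))
        return done

    def _run(self):
        while True:
            message, done = self._mailbox.get()
            if message is None:          # stop signal sent by stop()
                done.set_result(None)
                return
            try:
                done.set_result(self._handle(message))
            except Exception as error:
                done.set_exception(error)

    def _handle(self, message):
        if isinstance(message, SaveDirective):
            self._store[message.directive_id] = message.content
            return "saved"
        if isinstance(message, DeleteDirective):
            self._store.pop(message.directive_id, None)
            return "deleted"
        raise TypeError(f"unknown message: {message!r}")

    def stop(self):
        self.tell(None).result()
```

Code that used to call a write method would instead send a message, for example `actor.tell(SaveDirective("d1", "<xml/>")).result()`, while reads keep going through the read-only repository. Reads are not synchronized with the actor here, so the read/write inconsistencies discussed earlier still apply.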

Actions #5

Updated by Nicolas PERRON almost 12 years ago

  • Target version changed from 2.5.0 to 2.6.0~beta1
Actions #6

Updated by François ARMAND almost 12 years ago

The pattern we are trying to implement here is something like the "CQS pattern": http://en.wikipedia.org/wiki/Command-query_separation

Actions #7

Updated by François ARMAND almost 12 years ago

  • Status changed from New to In progress
  • Assignee changed from Vincent MEMBRÉ to François ARMAND
Actions #8

Updated by François ARMAND almost 12 years ago

  • Status changed from In progress to 8
Actions #9

Updated by Matthieu CERDA almost 12 years ago

  • Subject changed from git process conflicts when several happen at the same time to The git process conflicts when several operations happen at the same time
Actions #10

Updated by Jonathan CLARKE over 11 years ago

  • Category changed from 11 to Web - Config management
Actions #11

Updated by Nicolas PERRON over 11 years ago

  • Target version changed from 2.6.0~beta1 to 2.6.0~rc1
Actions #12

Updated by Nicolas PERRON over 11 years ago

  • Status changed from 8 to Pending technical review
  • Target version changed from 2.6.0~rc1 to 2.6.0~beta1
Actions #13

Updated by Nicolas PERRON over 11 years ago

  • Status changed from Pending technical review to Pending release
Actions #14

Updated by Jonathan CLARKE over 11 years ago

  • Status changed from Pending release to Released

This ticket has been addressed in version 2.6.0~beta1 of Rudder, which has just been released. Please see the changelog here: https://www.rudder-project.org/foswiki/System/Documentation:ChangeLog26.
