Project

General

Profile

Actions

Bug #10165

closed

Incomplete inventory for one node - race condition?

Added by Alexis Mousset about 7 years ago. Updated about 7 years ago.

Status:
Rejected
Priority:
N/A
Category:
-
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority:
Name check:
Fix check:
Regression:

Description

Some fields were missing in the displayed inventory of a node (machine type, manufacturer, while they were in the inventory file, and the other identical machines). It fixed itself after a second inventory.

Log excerpt:

# grep -R 10289d03 /var/log/rudder/webapp/
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:41] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs'
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:42] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 1 second and 583 milliseconds ms, now checking signature
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:43] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 395 milliseconds ms, now saving
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.ldap.sdk.RwLDAPConnection - Exception ignored (by configuration) when trying to add entry 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,cn=rudder-configuration'.  Reported exception was: entry already exists
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de. Rollbacking. <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Error when updating the container value when moving nodes '10289d03-99b3-483e-a7de-0160101927de' <- Can not apply modifiction on 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] INFO  com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Unit acceptor error for node NodeId(10289d03-99b3-483e-a7de-0160101927de) <- Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.web.snippet.node.AcceptNode - Something bad happened while searching for node 10289d03-99b3-483e-a7de-0160101927de to log the acceptation, search 10289d03-99b3-483e-a7de-0160101927de
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:15] INFO  com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:15] INFO  com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager - Dynamic group hasPolicyServer-root: added node with id: [ 10289d03-99b3-483e-a7de-0160101927de ], removed: nothing
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:16] INFO  com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [10289d03-99b3-483e-a7de-0160101927de, root]
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:35] WARN  explain_compliance.10289d03-99b3-483e-a7de-0160101927de - Can not get compliance for node with ID '10289d03-99b3-483e-a7de-0160101927de' because it has no configuration id initialised nor sent reports (node just added ?)
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:46:31] INFO  com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [10289d03-99b3-483e-a7de-0160101927de]
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:52:45] INFO  com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager - Dynamic group hasPolicyServer-10289d03-99b3-483e-a7de-0160101927de: added node with id: [ 3e9baed9-21d6-4090-9830-c51874dc2eef ], removed: nothing
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:52:59] INFO  com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [3e9baed9-21d6-4090-9830-c51874dc2eef, 10289d03-99b3-483e-a7de-0160101927de, root]
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs'
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 453 milliseconds ms, now checking signature
/var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 223 milliseconds ms, now saving

Acceptation:

[2017-02-09 13:50:41] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs'
[2017-02-09 13:50:42] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 1 second and 583 milliseconds ms, now checking signature
[2017-02-09 13:50:43] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 395 milliseconds ms, now saving
[2017-02-09 13:50:43] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'server-root.ocs'
[2017-02-09 13:50:44] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'server-root.ocs' parsed in 1 second and 261 milliseconds ms, now checking signature
[2017-02-09 13:50:45] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'server-root.ocs' signature checked in 508 milliseconds ms, now saving
[2017-02-09 13:50:50] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Report file processed in 6 seconds and 540 milliseconds ms
[2017-02-09 13:50:51] INFO  com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Report file processed in 1 second and 460 milliseconds ms
[2017-02-09 14:35:30] ERROR com.normation.ldap.sdk.RwLDAPConnection - Exception ignored (by configuration) when trying to add entry 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,cn=rudder-configuration'.  Reported exception was: entry already exists
com.unboundid.ldap.sdk.LDAPException: entry already exists
>-at com.unboundid.ldap.sdk.LDAPConnection.add(LDAPConnection.java:1959)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$addAction$3(LDAPConnection.scala:524)
>-at scala.util.control.Exception$Catch.apply(Exception.scala:224)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$addAction$1(LDAPConnection.scala:524)
>-at com.normation.ldap.sdk.RwLDAPConnection.applyMod(LDAPConnection.scala:489)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyAdd$2(LDAPConnection.scala:535)
>-at com.normation.ldap.sdk.RwLDAPConnection.save(LDAPConnection.scala:598)
>-at com.normation.rudder.services.servers.AcceptFullInventoryInNodeOu.$anonfun$acceptOne$9(NewNodeManager.scala:731)
>-at com.normation.ldap.sdk.LDAPConnectionProvider.$anonfun$flatMap$1(LDAPConnectionProvider.scala:117)
>-at com.normation.ldap.sdk.LDAPConnectionProvider.withCon(LDAPConnectionProvider.scala:154)
>-at com.normation.ldap.sdk.LDAPConnectionProvider.withCon$(LDAPConnectionProvider.scala:143)
>-at com.normation.ldap.sdk.RWPooledSimpleAuthConnectionProvider.withCon(LDAPConnectionProvider.scala:385)
>-at com.normation.ldap.sdk.LDAPConnectionProvider.flatMap(LDAPConnectionProvider.scala:116)
>-at com.normation.ldap.sdk.LDAPConnectionProvider.flatMap$(LDAPConnectionProvider.scala:115)
>-at com.normation.ldap.sdk.RWPooledSimpleAuthConnectionProvider.flatMap(LDAPConnectionProvider.scala:385)
>-at com.normation.rudder.services.servers.AcceptFullInventoryInNodeOu.acceptOne(NewNodeManager.scala:730)
>-at com.normation.rudder.services.servers.ComposedNewNodeManager.$anonfun$acceptOne$1(NewNodeManager.scala:425)
>-at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:54)
>-at scala.collection.immutable.List.foreach(List.scala:378)
>-at com.normation.utils.Control$.sequence(Control.scala:54)
>-at com.normation.rudder.services.servers.ComposedNewNodeManager.acceptOne(NewNodeManager.scala:423)
>-at com.normation.rudder.services.servers.ComposedNewNodeManager.accept(NewNodeManager.scala:475)
>-at com.normation.rudder.services.servers.ComposedNewNodeManager.accept$(NewNodeManager.scala:441)
>-at com.normation.rudder.services.servers.NewNodeManagerImpl.accept(NewNodeManager.scala:155)
>-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$addNodes$1(AcceptNode.scala:163)
>-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$addNodes$1$adapted(AcceptNode.scala:161)
>-at scala.collection.immutable.List.foreach(List.scala:378)
>-at com.normation.rudder.web.snippet.node.AcceptNode.addNodes(AcceptNode.scala:161)
>-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$listNode$6(AcceptNode.scala:312)
>-at net.liftweb.http.S$NFuncHolder.$anonfun$apply$3(S.scala:185)
>-at scala.collection.immutable.List.map(List.scala:272)
>-at net.liftweb.http.S$NFuncHolder.apply(S.scala:185)
>-at net.liftweb.http.S$ProxyFuncHolder.apply(S.scala:114)
....
>-at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>-at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>-at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>-at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>-at java.lang.Thread.run(Thread.java:745)
[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de. Rollbacking. <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
[2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Error when updating the container value when moving nodes '10289d03-99b3-483e-a7de-0160101927de' <- Can not apply modifiction on 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
[2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Exception was:
com.unboundid.ldap.sdk.LDAPException: no such object
>-at com.unboundid.ldap.sdk.LDAPConnection.modify(LDAPConnection.java:2754)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$modifyAction$3(LDAPConnection.scala:541)
>-at scala.util.control.Exception$Catch.apply(Exception.scala:224)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$modifyAction$1(LDAPConnection.scala:541)
>-at com.normation.ldap.sdk.RwLDAPConnection.applyMod(LDAPConnection.scala:489)
>-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyModify$2(LDAPConnection.scala:552)
>-at com.normation.ldap.sdk.RwLDAPConnection.modify(LDAPConnection.scala:564)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$updateNodes$2(FullInventoryRepositoryImpl.scala:148)
>-at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:111)
>-at scala.collection.immutable.List.foreach(List.scala:378)
>-at com.normation.utils.Control$.bestEffort(Control.scala:111)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.updateNodes(FullInventoryRepositoryImpl.scala:147)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$9(FullInventoryRepositoryImpl.scala:216)
>-at net.liftweb.common.Full.flatMap(Box.scala:808)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$4(FullInventoryRepositoryImpl.scala:187)
>-at net.liftweb.common.Full.flatMap(Box.scala:808)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.move(FullInventoryRepositoryImpl.scala:181)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$15(FullInventoryRepositoryImpl.scala:408)
>-at net.liftweb.common.Full.flatMap(Box.scala:808)
>-at net.liftweb.common.Box$WithFilter.flatMap(Box.scala:475)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$13(FullInventoryRepositoryImpl.scala:407)
>-at net.liftweb.common.Full.map(Box.scala:806)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$12(FullInventoryRepositoryImpl.scala:400)
>-at net.liftweb.common.Full.flatMap(Box.scala:808)
>-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.move(FullInventoryRepositoryImpl.scala:397)
>-at com.normation.rudder.services.servers.AcceptInventory.acceptOne(NewNodeManager.scala:657)
>-at com.normation.rudder.services.servers.ComposedNewNodeManager.$anonfun$acceptOne$1(NewNodeManager.scala:425)
>-at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:54)
>-at scala.collection.immutable.List.foreach(List.scala:378)
...
>-at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:464)
>-at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:924)
>-at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:985)
>-at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
>-at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236)
>-at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>-at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
>-at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
>-at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>-at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>-at java.lang.Thread.run(Thread.java:745)
[2017-02-09 14:35:30] INFO  com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de
[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Unit acceptor error for node NodeId(10289d03-99b3-483e-a7de-0160101927de) <- Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object
[2017-02-09 14:35:30] ERROR com.normation.rudder.web.snippet.node.AcceptNode - Something bad happened while searching for node 10289d03-99b3-483e-a7de-0160101927de to log the acceptation, search 10289d03-99b3-483e-a7de-0160101927de
[2017-02-09 14:40:23] INFO  com.normation.cfclerk.services.impl.TechniqueRepositoryImpl - Reloading technique library, found modified technique(s): ['ffrfrezf': updated (1.0: VersionAdded)]
[2017-02-09 14:40:23] INFO  com.normation.rudder.services.policies.TechniqueAcceptationUpdater - Automatically adding technique 'ffrfrezf' in category 'User Techniques (4a63c5a3-347c-4a75-9786-e59b534f2944)' of active techniques library

Actions #1

Updated by Alexis Mousset about 7 years ago

  • Subject changed from Incomplete inventory for one node to Incomplete inventory for one node - race condition?
Actions #2

Updated by Alexis Mousset about 7 years ago

The problem is very likely running rudder-node-to-relay on a pending node.

Actions #3

Updated by François ARMAND about 7 years ago

It is exactly that: in "rudder-node-to-relay", we are creating the entry "nodeId=XXXX, ou=Nodes, cn=rudder-configuration" if it doesn't already exists. And in node acceptation, we also try to create it - and fail to do so.

So, the correct behaviour is to node fail if the entry exists in node acceptation, but converge toward the state "the entry exists".

Actions #4

Updated by François ARMAND about 7 years ago

  • Target version changed from 4.1.0~beta3 to 3.1.18

And that problem exists since the begining of times.

Actions #5

Updated by François ARMAND about 7 years ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND
Actions #6

Updated by François ARMAND about 7 years ago

  • Target version changed from 3.1.18 to 4.1.0~beta3

In fact, we are already doing the correct thing:

  override def save(entry : LDAPEntry, removeMissingAttributes:Boolean=false, forceKeepMissingAttributes:Seq[String] = Seq()) : Box[LDIFChangeRecord] = {
    synchronized {
      get(entry.dn) match { //TODO if remocoveMissing is false, only get attribute in entry (we don't care of others)
        case f@Failure(_,_,_) => f
        case Empty =>
          try {
            applyAdd(new AddRequest(entry.backed))
          } catch {...         }
        case Full(existing) =>
          val mods = LDAPEntry.merge(existing,entry, false, removeMissingAttributes, forceKeepMissingAttributes)
          if(!mods.isEmpty) {
            try {
              applyModify(new ModifyRequest(entry.dn.toString,mods))
            } catch { .... }
          } else Full(LDIFNoopChangeRecord(entry.dn))
      }
    }
  }

So it would me that the node addition from relay server happened just between the get(entry.dn) and applyAdd - highly improbable. If we are able to reproduce it just an other time, the blame is to find elsewhere.

Actions #7

Updated by François ARMAND about 7 years ago

  • Status changed from In progress to Rejected

So. I tested several case, try to meet the race condition spot, and never succeeded in doing so. Each time, everything is fine. It means that either Alexis was very unlucky, or we are missing the problem completelly.

I'm going to close it as "can not reproduce". If it happens again, we will know that the problem is not in a race condition.

(because you know, there is that story of the statician who met a colleague who asks him why he never takes the plane. And the statician say "I did the calculus, and the likelyhood that a terrorist with a bomb is in the same plane are too hight for my confort. Then, some weeks latter, the same colleague meet our statician in a plane. He asks "and so, finally, you have decided to be ok with the likelyhood of a bomber in the plane?" And the statician answers "of course not, but I did the stat, and the likelyhood that there is TWO independant bombers in a plane are ok fir my confort. So you know, I'm just taking the plane with my own bomb".)

Actions

Also available in: Atom PDF