Bug #10165
closedIncomplete inventory for one node - race condition?
Description
Some fields were missing in the displayed inventory of a node (machine type, manufacturer, while they were in the inventory file, and the other identical machines). It fixed itself after a second inventory.
Log excerpt:
# grep -R 10289d03 /var/log/rudder/webapp/ /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:41] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:42] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 1 second and 583 milliseconds ms, now checking signature /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 13:50:43] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 395 milliseconds ms, now saving /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.ldap.sdk.RwLDAPConnection - Exception ignored (by configuration) when trying to add entry 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,cn=rudder-configuration'. Reported exception was: entry already exists /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de. Rollbacking. <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Error when updating the container value when moving nodes '10289d03-99b3-483e-a7de-0160101927de' <- Can not apply modifiction on 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] INFO com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Unit acceptor error for node NodeId(10289d03-99b3-483e-a7de-0160101927de) <- Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:35:30] ERROR com.normation.rudder.web.snippet.node.AcceptNode - Something bad happened while searching for node 10289d03-99b3-483e-a7de-0160101927de to log the acceptation, search 10289d03-99b3-483e-a7de-0160101927de /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:15] INFO com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:15] INFO com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager - Dynamic group hasPolicyServer-root: added node with id: [ 10289d03-99b3-483e-a7de-0160101927de ], removed: nothing /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:16] INFO com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [10289d03-99b3-483e-a7de-0160101927de, root] /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:45:35] WARN explain_compliance.10289d03-99b3-483e-a7de-0160101927de - Can not get compliance for node with ID '10289d03-99b3-483e-a7de-0160101927de' because it has no configuration id initialised nor sent reports (node just added ?) /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:46:31] INFO com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [10289d03-99b3-483e-a7de-0160101927de] /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:52:45] INFO com.normation.rudder.batch.UpdateDynamicGroups$LAUpdateDyngroupManager - Dynamic group hasPolicyServer-10289d03-99b3-483e-a7de-0160101927de: added node with id: [ 3e9baed9-21d6-4090-9830-c51874dc2eef ], removed: nothing /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 14:52:59] INFO com.normation.rudder.services.policies.nodeconfig.NodeConfigurationServiceImpl - Configuration of following nodes were updated, their promises are going to be written: [3e9baed9-21d6-4090-9830-c51874dc2eef, 10289d03-99b3-483e-a7de-0160101927de, root] /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 453 milliseconds ms, now checking signature /var/log/rudder/webapp/2017_02_09.stderrout.log:[2017-02-09 15:18:07] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 223 milliseconds ms, now saving
Acceptation:
[2017-02-09 13:50:41] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' [2017-02-09 13:50:42] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' parsed in 1 second and 583 milliseconds ms, now checking signature [2017-02-09 13:50:43] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'relay-10289d03-99b3-483e-a7de-0160101927de.ocs' signature checked in 395 milliseconds ms, now saving [2017-02-09 13:50:43] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - New input inventory: 'server-root.ocs' [2017-02-09 13:50:44] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'server-root.ocs' parsed in 1 second and 261 milliseconds ms, now checking signature [2017-02-09 13:50:45] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Inventory 'server-root.ocs' signature checked in 508 milliseconds ms, now saving [2017-02-09 13:50:50] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Report file processed in 6 seconds and 540 milliseconds ms [2017-02-09 13:50:51] INFO com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Report file processed in 1 second and 460 milliseconds ms [2017-02-09 14:35:30] ERROR com.normation.ldap.sdk.RwLDAPConnection - Exception ignored (by configuration) when trying to add entry 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,cn=rudder-configuration'. Reported exception was: entry already exists com.unboundid.ldap.sdk.LDAPException: entry already exists >-at com.unboundid.ldap.sdk.LDAPConnection.add(LDAPConnection.java:1959) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$addAction$3(LDAPConnection.scala:524) >-at scala.util.control.Exception$Catch.apply(Exception.scala:224) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$addAction$1(LDAPConnection.scala:524) >-at com.normation.ldap.sdk.RwLDAPConnection.applyMod(LDAPConnection.scala:489) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyAdd$2(LDAPConnection.scala:535) >-at com.normation.ldap.sdk.RwLDAPConnection.save(LDAPConnection.scala:598) >-at com.normation.rudder.services.servers.AcceptFullInventoryInNodeOu.$anonfun$acceptOne$9(NewNodeManager.scala:731) >-at com.normation.ldap.sdk.LDAPConnectionProvider.$anonfun$flatMap$1(LDAPConnectionProvider.scala:117) >-at com.normation.ldap.sdk.LDAPConnectionProvider.withCon(LDAPConnectionProvider.scala:154) >-at com.normation.ldap.sdk.LDAPConnectionProvider.withCon$(LDAPConnectionProvider.scala:143) >-at com.normation.ldap.sdk.RWPooledSimpleAuthConnectionProvider.withCon(LDAPConnectionProvider.scala:385) >-at com.normation.ldap.sdk.LDAPConnectionProvider.flatMap(LDAPConnectionProvider.scala:116) >-at com.normation.ldap.sdk.LDAPConnectionProvider.flatMap$(LDAPConnectionProvider.scala:115) >-at com.normation.ldap.sdk.RWPooledSimpleAuthConnectionProvider.flatMap(LDAPConnectionProvider.scala:385) >-at com.normation.rudder.services.servers.AcceptFullInventoryInNodeOu.acceptOne(NewNodeManager.scala:730) >-at com.normation.rudder.services.servers.ComposedNewNodeManager.$anonfun$acceptOne$1(NewNodeManager.scala:425) >-at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:54) >-at scala.collection.immutable.List.foreach(List.scala:378) >-at com.normation.utils.Control$.sequence(Control.scala:54) >-at com.normation.rudder.services.servers.ComposedNewNodeManager.acceptOne(NewNodeManager.scala:423) >-at com.normation.rudder.services.servers.ComposedNewNodeManager.accept(NewNodeManager.scala:475) >-at com.normation.rudder.services.servers.ComposedNewNodeManager.accept$(NewNodeManager.scala:441) >-at com.normation.rudder.services.servers.NewNodeManagerImpl.accept(NewNodeManager.scala:155) >-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$addNodes$1(AcceptNode.scala:163) >-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$addNodes$1$adapted(AcceptNode.scala:161) >-at scala.collection.immutable.List.foreach(List.scala:378) >-at com.normation.rudder.web.snippet.node.AcceptNode.addNodes(AcceptNode.scala:161) >-at com.normation.rudder.web.snippet.node.AcceptNode.$anonfun$listNode$6(AcceptNode.scala:312) >-at net.liftweb.http.S$NFuncHolder.$anonfun$apply$3(S.scala:185) >-at scala.collection.immutable.List.map(List.scala:272) >-at net.liftweb.http.S$NFuncHolder.apply(S.scala:185) >-at net.liftweb.http.S$ProxyFuncHolder.apply(S.scala:114) .... >-at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) >-at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) >-at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) >-at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) >-at java.lang.Thread.run(Thread.java:745) [2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de. Rollbacking. <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object [2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Error when updating the container value when moving nodes '10289d03-99b3-483e-a7de-0160101927de' <- Can not apply modifiction on 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object [2017-02-09 14:35:30] ERROR com.normation.inventory.ldap.core.FullInventoryRepositoryImpl - Exception was: com.unboundid.ldap.sdk.LDAPException: no such object >-at com.unboundid.ldap.sdk.LDAPConnection.modify(LDAPConnection.java:2754) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$modifyAction$3(LDAPConnection.scala:541) >-at scala.util.control.Exception$Catch.apply(Exception.scala:224) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$modifyAction$1(LDAPConnection.scala:541) >-at com.normation.ldap.sdk.RwLDAPConnection.applyMod(LDAPConnection.scala:489) >-at com.normation.ldap.sdk.RwLDAPConnection.$anonfun$applyModify$2(LDAPConnection.scala:552) >-at com.normation.ldap.sdk.RwLDAPConnection.modify(LDAPConnection.scala:564) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$updateNodes$2(FullInventoryRepositoryImpl.scala:148) >-at com.normation.utils.Control$.$anonfun$bestEffort$1(Control.scala:111) >-at scala.collection.immutable.List.foreach(List.scala:378) >-at com.normation.utils.Control$.bestEffort(Control.scala:111) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.updateNodes(FullInventoryRepositoryImpl.scala:147) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$9(FullInventoryRepositoryImpl.scala:216) >-at net.liftweb.common.Full.flatMap(Box.scala:808) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$4(FullInventoryRepositoryImpl.scala:187) >-at net.liftweb.common.Full.flatMap(Box.scala:808) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.move(FullInventoryRepositoryImpl.scala:181) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$15(FullInventoryRepositoryImpl.scala:408) >-at net.liftweb.common.Full.flatMap(Box.scala:808) >-at net.liftweb.common.Box$WithFilter.flatMap(Box.scala:475) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$13(FullInventoryRepositoryImpl.scala:407) >-at net.liftweb.common.Full.map(Box.scala:806) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.$anonfun$move$12(FullInventoryRepositoryImpl.scala:400) >-at net.liftweb.common.Full.flatMap(Box.scala:808) >-at com.normation.inventory.ldap.core.FullInventoryRepositoryImpl.move(FullInventoryRepositoryImpl.scala:397) >-at com.normation.rudder.services.servers.AcceptInventory.acceptOne(NewNodeManager.scala:657) >-at com.normation.rudder.services.servers.ComposedNewNodeManager.$anonfun$acceptOne$1(NewNodeManager.scala:425) >-at com.normation.utils.Control$.$anonfun$sequence$1(Control.scala:54) >-at scala.collection.immutable.List.foreach(List.scala:378) ... >-at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:464) >-at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:924) >-at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:985) >-at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) >-at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236) >-at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) >-at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) >-at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) >-at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) >-at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) >-at java.lang.Thread.run(Thread.java:745) [2017-02-09 14:35:30] INFO com.normation.rudder.services.servers.NewNodeManagerImpl - New node accepted and managed by Rudder: 10289d03-99b3-483e-a7de-0160101927de [2017-02-09 14:35:30] ERROR com.normation.rudder.services.servers.NewNodeManagerImpl - Unit acceptor error for node NodeId(10289d03-99b3-483e-a7de-0160101927de) <- Error when trying to accept node 10289d03-99b3-483e-a7de-0160101927de <- Error when executing accept node process named accept_new_server:inventory <- Can not move 'nodeId=10289d03-99b3-483e-a7de-0160101927de,ou=Nodes,ou=Pending Inventories,ou=Inventories,cn=rudder-configuration' to new parent 'ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-configuration': no such object [2017-02-09 14:35:30] ERROR com.normation.rudder.web.snippet.node.AcceptNode - Something bad happened while searching for node 10289d03-99b3-483e-a7de-0160101927de to log the acceptation, search 10289d03-99b3-483e-a7de-0160101927de [2017-02-09 14:40:23] INFO com.normation.cfclerk.services.impl.TechniqueRepositoryImpl - Reloading technique library, found modified technique(s): ['ffrfrezf': updated (1.0: VersionAdded)] [2017-02-09 14:40:23] INFO com.normation.rudder.services.policies.TechniqueAcceptationUpdater - Automatically adding technique 'ffrfrezf' in category 'User Techniques (4a63c5a3-347c-4a75-9786-e59b534f2944)' of active techniques library
Updated by Alexis Mousset almost 8 years ago
- Subject changed from Incomplete inventory for one node to Incomplete inventory for one node - race condition?
Updated by Alexis Mousset almost 8 years ago
The problem is very likely running rudder-node-to-relay on a pending node.
Updated by François ARMAND almost 8 years ago
It is exactly that: in "rudder-node-to-relay", we are creating the entry "nodeId=XXXX, ou=Nodes, cn=rudder-configuration" if it doesn't already exists. And in node acceptation, we also try to create it - and fail to do so.
So, the correct behaviour is to node fail if the entry exists in node acceptation, but converge toward the state "the entry exists".
Updated by François ARMAND almost 8 years ago
- Target version changed from 4.1.0~beta3 to 3.1.18
And that problem exists since the begining of times.
Updated by François ARMAND almost 8 years ago
- Status changed from New to In progress
- Assignee set to François ARMAND
Updated by François ARMAND almost 8 years ago
- Target version changed from 3.1.18 to 4.1.0~beta3
In fact, we are already doing the correct thing:
override def save(entry : LDAPEntry, removeMissingAttributes:Boolean=false, forceKeepMissingAttributes:Seq[String] = Seq()) : Box[LDIFChangeRecord] = { synchronized { get(entry.dn) match { //TODO if remocoveMissing is false, only get attribute in entry (we don't care of others) case f@Failure(_,_,_) => f case Empty => try { applyAdd(new AddRequest(entry.backed)) } catch {... } case Full(existing) => val mods = LDAPEntry.merge(existing,entry, false, removeMissingAttributes, forceKeepMissingAttributes) if(!mods.isEmpty) { try { applyModify(new ModifyRequest(entry.dn.toString,mods)) } catch { .... } } else Full(LDIFNoopChangeRecord(entry.dn)) } } }
So it would me that the node addition from relay server happened just between the get(entry.dn) and applyAdd - highly improbable. If we are able to reproduce it just an other time, the blame is to find elsewhere.
Updated by François ARMAND almost 8 years ago
- Status changed from In progress to Rejected
So. I tested several case, try to meet the race condition spot, and never succeeded in doing so. Each time, everything is fine. It means that either Alexis was very unlucky, or we are missing the problem completelly.
I'm going to close it as "can not reproduce". If it happens again, we will know that the problem is not in a race condition.
(because you know, there is that story of the statician who met a colleague who asks him why he never takes the plane. And the statician say "I did the calculus, and the likelyhood that a terrorist with a bomb is in the same plane are too hight for my confort. Then, some weeks latter, the same colleague meet our statician in a plane. He asks "and so, finally, you have decided to be ok with the likelyhood of a bomber in the plane?" And the statician answers "of course not, but I did the stat, and the likelyhood that there is TWO independant bombers in a plane are ok fir my confort. So you know, I'm just taking the plane with my own bomb".)