Bug #12474

Root node disappeared while upgrading from 4.1 to 4.3 on Debian 9

Added by Nicolas CHARLES 8 months ago. Updated 5 months ago.

Status: Released
Priority: N/A
Category: Web - Nodes & inventories
Target version:
Severity:
User visibility:
Effort required:
Priority: 0

Description

WORKAROUND: we are not sure of the cause of this ticket (it is hard to reproduce, but it DID happen at least twice). In all cases, you can correct the problem by running the following command on the root server:

rudder agent inventory

Description:

After upgrading, the root server disappeared from the node list.
However, it looks like it is still there in LDAP:

dn: nodeId=root,ou=Nodes,cn=rudder-configuration
objectClass: rudderPolicyServer
objectClass: rudderNode
objectClass: top
cn: root
nodeId: root
description: the policy server
isSystem: TRUE
isBroken: FALSE
structuralObjectClass: rudderPolicyServer
entryUUID: 33359356-d6a4-1037-9ee3-1d253a7788ee
creatorsName: cn=manager,cn=rudder-configuration
createTimestamp: 20180417160031Z
entryCSN: 20180417160031.621579Z#000000#000#000000
modifiersName: cn=manager,cn=rudder-configuration
modifyTimestamp: 20180417160031Z

dn: nodeId=root,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-co
 nfiguration
objectClass: top
objectClass: node
objectClass: unixNode
objectClass: linuxNode
nodeId: root
osKernelVersion: 1.0-dummy-version
osName: Linux
osVersion: Linux
localAccountName: root
cn: root
localAdministratorAccountName: root
nodeHostname: server.rudder.local
policyServerId: root
inventoryDate: 19700101000000+0200
receiveDate: 19700101000000+0200
ipHostNumber: 127.0.0.1
agentName: Community
rudderServerRole: rudder-web
structuralObjectClass: linuxNode
entryUUID: 3335b732-d6a4-1037-9ee4-1d253a7788ee
creatorsName: cn=manager,cn=rudder-configuration
createTimestamp: 20180417160031Z
entryCSN: 20180417160031.622498Z#000000#000#000000
modifiersName: cn=manager,cn=rudder-configuration
modifyTimestamp: 20180417160031Z

Chaos follows :/


Subtasks

Bug #12980: Add missing test for serialisation of node info (Released, François ARMAND)
Bug #12983: Add an inventory acceptation check for presence of security token (Released, Vincent MEMBRÉ)

Related issues

Related to Rudder - Bug #12475: Technique Inventory is deleted when upgrading from 4.1 to 4.3 (Rejected)
Related to Rudder - Bug #12606: Restricted java security policy breaks Rudder (class configured for Cipher(provider: BC)cannot be found) (Released)
Related to Rudder - Bug #12988: NodeInfoCache is precise to the second but we need it to be precise to the millisecond (Released)

Associated revisions

Revision c4a3f6e5 (diff)
Added by François ARMAND 5 months ago

Fixes #12474: root node disapeared while upgrading from 4.1 to 4.3 on debian 9

History

#1 Updated by Nicolas CHARLES 8 months ago

It seems that I did not have a correct inventory for this machine, so when upgrading, the root node disappeared and took everything down with it.
Putting root back via an inventory solved EVERYTHING.

#2 Updated by Nicolas CHARLES 8 months ago

  • Related to Bug #12475: Technique Inventory is deleted when upgrading from 4.1 to 4.3 added

#3 Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 4.3.1 to 4.3.2

#4 Updated by Vincent MEMBRÉ 7 months ago

  • Target version changed from 4.3.2 to 410

#5 Updated by Benoît PECCATTE 7 months ago

We need to try to reproduce this

#6 Updated by Benoît PECCATTE 6 months ago

  • Target version changed from 410 to 4.3.2

#7 Updated by Vincent MEMBRÉ 6 months ago

  • Target version changed from 4.3.2 to 4.3.3

#8 Updated by Benoît PECCATTE 6 months ago

  • Status changed from New to Rejected

Cannot reproduce, feel free to reopen if needed

#9 Updated by Alexis MOUSSET 5 months ago

  • Status changed from Rejected to New

Reproduced between two 4.3 nightlies, reopening.

#10 Updated by Alexis MOUSSET 5 months ago

Actually, the inventory after the 4.1 -> 4.3 upgrade was not accepted in my case:

[2018-07-12 08:55:25] ERROR com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Error when trying to check inventory signature <- class configured for Signature (provider: BC) cannot be found.

And after a 4.3 -> 4.3 upgrade, I got:

[2018-07-16 05:07:59] ERROR com.normation.rudder.services.nodes.NodeInfoServiceCachedImpl - An error occured while updating node cache: can not unserialize node with id 'root', it will be ignored <- Error when mapping '{"agentType":"Community","version":"4.1.7.release-1.SLES.11"}' to an agent info. We are expecting either an agentType with allowed values in cfengine-nova, cfengine-community, dsc or a json like {'agentType': type, 'version': opt_version, 'securityToken': ...} but we get: Error when parsing JSON information about the agent type. <- Invalid value for security token: no value define for security token, and no public key were stored <- Wrong type of value for the agent '{"agentType":"Community","version":"4.1.7.release-1.SLES.11"}'
...
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'Rudder system policy: daily inventory' (inventory-all) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'distributePolicy' (root-DP) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'Rudder system policy: basic setup (common)' (hasPolicyServer-root) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
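
The NodeInfoServiceCachedImpl error above spells out the accepted shapes for stored agent information. A minimal Python sketch of that mapping logic shows why the 4.1-style value fails when no public key is stored alongside it. This is an illustration only, not Rudder's actual implementation: the names `parse_agent_info`, `ALLOWED_AGENT_TYPES` and `stored_public_key` are hypothetical, and applying the public key fallback to the plain-string form is an assumption.

```python
import json

# Plain-string values listed in the error message above.
ALLOWED_AGENT_TYPES = {"cfengine-nova", "cfengine-community", "dsc"}


def parse_agent_info(raw, stored_public_key=None):
    """Map a stored agent value to a dict, or raise ValueError.

    Hypothetical helper: names and return shape are assumptions.
    """
    if raw in ALLOWED_AGENT_TYPES:
        info = {"agentType": raw, "version": None, "securityToken": None}
    else:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"neither an allowed agent type nor valid JSON: {raw!r}") from exc
        if not isinstance(data, dict) or "agentType" not in data:
            raise ValueError(f"missing agentType in {raw!r}")
        info = {
            "agentType": data["agentType"],
            "version": data.get("version"),
            "securityToken": data.get("securityToken"),
        }
    # Fall back to a separately stored public key, as the message suggests.
    if info["securityToken"] is None:
        if not stored_public_key:
            raise ValueError(
                "no value defined for security token, and no public key was stored"
            )
        info["securityToken"] = stored_public_key
    return info
```

Calling `parse_agent_info` with the JSON value from the log and no stored key raises the security token error; passing any non-empty `stored_public_key` makes the same value parse.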

#11 Updated by François ARMAND 5 months ago

  • Related to Bug #12606: Restricted java security policy breaks Rudder (class configured for Cipher(provider: BC)cannot be found) added

#12 Updated by François ARMAND 5 months ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND

#13 Updated by François ARMAND 5 months ago

OK, so the problem is that inventories from 4.1 are stored with:

agentName: {"agentType":"Community","version":"4.1.14~rc1~git201807130352-stretch0"}
publicKey: ....

But we consider that an error. If we parse an inventory from 4.1, the parsing is done correctly (the problem is really in the deserialisation of an already stored inventory during a migration).

#14 Updated by François ARMAND 5 months ago

My previous comment was erroneous, cf the added tests in #12474.

So the problem is that, for some reason, we get an inventory for root without a public key in it. That inventory is accepted (but it should not be: a key or certificate is mandatory in 4.3), and so afterwards the node can't be read back (because it does not meet the requirement of having a key).

So we need to refuse inventories without either a public key or a certificate in 4.3.
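
A minimal sketch of such an acceptance check, written in Python for illustration and not taken from Rudder's code. The function name and the `publicKey` / `certificate` field names are assumptions about how an inventory might be represented:

```python
def check_security_token(inventory):
    """Return an error message if the inventory has no security token, else None.

    `inventory` is a plain dict here; the field names are hypothetical.
    """
    for field in ("publicKey", "certificate"):
        value = inventory.get(field)
        # An empty or whitespace-only value counts as missing.
        if isinstance(value, str) and value.strip():
            return None
    node_id = inventory.get("nodeId", "<unknown>")
    return (
        f"inventory for node '{node_id}' refused: "
        "neither a public key nor a certificate is present"
    )
```

Refusing at acceptance time means a token-less inventory never reaches storage, so it cannot later fail to be read back.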

#15 Updated by François ARMAND 5 months ago

  • Description updated (diff)

I'm really not sure that accepting an inventory without a security token was the root cause. We don't have such inventories since... 2.8?

I will add the check. If the bug occurs again, we will need to think about other problems that could happen. At least we will know that it can't come from the inventory part, and that more likely something breaks root's stored inventory information in LDAP (replay of init data? Deletion of some attributes?).

#16 Updated by François ARMAND 5 months ago

So, more data!

Because of an error in the cache, we reach a situation where:

- there is an inventory for Rudder 4.3 in LDAP,
- but there was an eviction problem with the cache of node info for the node, so we removed the faulty cache (ok, why not) but we never added back the fresh cache info.

So the cache thinks it is up-to-date, but without the node info.

The next time the node is modified (for example with a new inventory, or by clicking the "clear cache" button in Rudder settings), everything goes back to normal.
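
The failure mode described in this comment can be modelled with a toy cache. This is an illustrative Python sketch, not the actual NodeInfoServiceCachedImpl code: the class, its method names and the `load_node` callback (standing in for a read from LDAP) are all hypothetical.

```python
class NodeInfoCache:
    """Toy cache keyed by node id; `load_node` stands in for an LDAP read."""

    def __init__(self, load_node):
        self._load_node = load_node
        self._entries = {}
        self._last_update = 0  # timestamp of the newest change already seen

    def get(self, node_id):
        return self._entries.get(node_id)

    def is_stale(self, modified_at):
        return modified_at > self._last_update

    def refresh_buggy(self, node_id, modified_at):
        # Evict the faulty entry, then mark the cache as up to date
        # without reloading: the node silently vanishes.
        self._entries.pop(node_id, None)
        self._last_update = max(self._last_update, modified_at)

    def refresh_fixed(self, node_id, modified_at):
        # Evict, then reload the fresh entry before recording the update.
        self._entries.pop(node_id, None)
        fresh = self._load_node(node_id)
        if fresh is not None:
            self._entries[node_id] = fresh
        self._last_update = max(self._last_update, modified_at)
```

After `refresh_buggy`, `get("root")` returns nothing while `is_stale` reports the cache as current, so nothing triggers a reload until a later modification arrives. That matches the observation that a new inventory or a cache clear restores the node.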

#17 Updated by François ARMAND 5 months ago

  • Related to Bug #12988: NodeInfoCache is precise to the second but we need it to be precise to the millisecond added
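
Why second-level precision can matter for a cache that compares modification times is illustrated below. This is a hedged Python sketch with a hypothetical `needs_refresh` function; it assumes the cache decides whether to reload by comparing an entry's modification time with the last time it has seen, which may differ from the real implementation.

```python
from datetime import datetime, timedelta


def needs_refresh(last_seen, modified, truncate_to_second):
    """Return True if `modified` is strictly newer than `last_seen`."""
    if truncate_to_second:
        # Simulate timestamps that only carry second-level precision.
        last_seen = last_seen.replace(microsecond=0)
        modified = modified.replace(microsecond=0)
    return modified > last_seen


# Two changes 400 ms apart, falling within the same second.
first = datetime(2018, 7, 16, 5, 7, 59, 100_000)
second = first + timedelta(milliseconds=400)
```

With full precision, `needs_refresh(first, second, False)` detects the second change; with truncation to the second, both timestamps compare equal and the change is missed.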

#18 Updated by François ARMAND 5 months ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder/pull/1988

#19 Updated by François ARMAND 5 months ago

  • Status changed from Pending technical review to Pending release

#20 Updated by Vincent MEMBRÉ 5 months ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 4.3.3 which was released today.
