Bug #12474

closed

Root node disappeared while upgrading from 4.1 to 4.3 on Debian 9

Added by Nicolas CHARLES almost 6 years ago. Updated over 5 years ago.

Status: Released
Priority: N/A
Category: Web - Nodes & inventories
Target version:
Severity:
UX impact:
User visibility:
Effort required:
Priority: 0
Name check:
Fix check:
Regression:

Description

WORKAROUND: we are not sure of the cause of this ticket (it is hard to reproduce, but it DID happen at least twice). In all cases, you can correct the problem by running the following command on the root server:

rudder agent inventory

Description:

After upgrading, the root server disappeared from the node list.
However, it still appears to be present in LDAP:

dn: nodeId=root,ou=Nodes,cn=rudder-configuration
objectClass: rudderPolicyServer
objectClass: rudderNode
objectClass: top
cn: root
nodeId: root
description: the policy server
isSystem: TRUE
isBroken: FALSE
structuralObjectClass: rudderPolicyServer
entryUUID: 33359356-d6a4-1037-9ee3-1d253a7788ee
creatorsName: cn=manager,cn=rudder-configuration
createTimestamp: 20180417160031Z
entryCSN: 20180417160031.621579Z#000000#000#000000
modifiersName: cn=manager,cn=rudder-configuration
modifyTimestamp: 20180417160031Z

dn: nodeId=root,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-co
 nfiguration
objectClass: top
objectClass: node
objectClass: unixNode
objectClass: linuxNode
nodeId: root
osKernelVersion: 1.0-dummy-version
osName: Linux
osVersion: Linux
localAccountName: root
cn: root
localAdministratorAccountName: root
nodeHostname: server.rudder.local
policyServerId: root
inventoryDate: 19700101000000+0200
receiveDate: 19700101000000+0200
ipHostNumber: 127.0.0.1
agentName: Community
rudderServerRole: rudder-web
structuralObjectClass: linuxNode
entryUUID: 3335b732-d6a4-1037-9ee4-1d253a7788ee
creatorsName: cn=manager,cn=rudder-configuration
createTimestamp: 20180417160031Z
entryCSN: 20180417160031.622498Z#000000#000#000000
modifiersName: cn=manager,cn=rudder-configuration
modifyTimestamp: 20180417160031Z

Chaos follows :/
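To confirm from an LDAP dump that root is present in both branches, remember that LDIF folds long lines: a line starting with a single space continues the previous one, as in the second dn above. The following is only a minimal sketch of that check. The helper names `unfold_ldif` and `entry_dns` are hypothetical, and the sample is a shortened copy of the dump above.

```python
def unfold_ldif(text):
    """Join LDIF continuation lines (a leading space continues the previous line)."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]  # drop the single folding space
        else:
            lines.append(raw)
    return lines


def entry_dns(text):
    """Return the DN of every entry found in an LDIF dump."""
    prefix = "dn: "
    return [line[len(prefix):] for line in unfold_ldif(text) if line.startswith(prefix)]


# Shortened copy of the dump from this ticket (attributes trimmed).
SAMPLE = """dn: nodeId=root,ou=Nodes,cn=rudder-configuration
cn: root

dn: nodeId=root,ou=Nodes,ou=Accepted Inventories,ou=Inventories,cn=rudder-co
 nfiguration
cn: root
"""

dns = entry_dns(SAMPLE)
for dn in dns:
    print(dn)
```

Both DNs being listed only shows that the entries exist in LDAP. It says nothing about whether the web application can read them back.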


Subtasks 2 (0 open, 2 closed)

Bug #12980: Add missing test for serialisation of node info [Released, François ARMAND]
Bug #12983: Add an inventory acceptation check for presence of security token [Released, Vincent MEMBRÉ]

Related issues 3 (0 open, 3 closed)

Related to Rudder - Bug #12475: Technique Inventory is deleted when upgrading from 4.1 to 4.3 [Rejected]
Related to Rudder - Bug #12606: Restricted java security policy breaks Rudder (class configured for Cipher(provider: BC)cannot be found) [Released, Alexis Mousset]
Related to Rudder - Bug #12988: NodeInfoCache is precise to the second but we need it to be precise to the millisecond [Released, Vincent MEMBRÉ]
Actions #1

Updated by Nicolas CHARLES almost 6 years ago

It seems that I did not have a correct inventory for this machine, so when upgrading, the root node disappeared and took everything with it.
Putting root back via an inventory solved EVERYTHING.

Actions #2

Updated by Nicolas CHARLES almost 6 years ago

  • Related to Bug #12475: Technique Inventory is deleted when upgrading from 4.1 to 4.3 added
Actions #3

Updated by Vincent MEMBRÉ almost 6 years ago

  • Target version changed from 4.3.1 to 4.3.2
Actions #4

Updated by Vincent MEMBRÉ almost 6 years ago

  • Target version changed from 4.3.2 to 410
Actions #5

Updated by Benoît PECCATTE almost 6 years ago

We need to try to reproduce this

Actions #6

Updated by Benoît PECCATTE almost 6 years ago

  • Target version changed from 410 to 4.3.2
Actions #7

Updated by Vincent MEMBRÉ almost 6 years ago

  • Target version changed from 4.3.2 to 4.3.3
Actions #8

Updated by Benoît PECCATTE almost 6 years ago

  • Status changed from New to Rejected

Cannot reproduce, feel free to reopen if needed

Actions #9

Updated by Alexis Mousset over 5 years ago

  • Status changed from Rejected to New

Reproduced between two 4.3 nightlies, reopening.

Actions #10

Updated by Alexis Mousset over 5 years ago

Actually, the inventory sent after the 4.1 -> 4.3 upgrade was not accepted in my case:

[2018-07-12 08:55:25] ERROR com.normation.inventory.provisioning.endpoint.FusionReportEndpoint - Error when trying to check inventory signature <- class configured for Signature (provider: BC) cannot be found.

And after 4.3 -> 4.3 upgrade, I got:

[2018-07-16 05:07:59] ERROR com.normation.rudder.services.nodes.NodeInfoServiceCachedImpl - An error occured while updating node cache: can not unserialize node with id 'root', it will be ignored <- Error when mapping '{"agentType":"Community","version":"4.1.7.release-1.SLES.11"}' to an agent info. We are expecting either an agentType with allowed values in cfengine-nova, cfengine-community, dsc or a json like {'agentType': type, 'version': opt_version, 'securityToken': ...} but we get: Error when parsing JSON information about the agent type. <- Invalid value for security token: no value define for security token, and no public key were stored <- Wrong type of value for the agent '{"agentType":"Community","version":"4.1.7.release-1.SLES.11"}'
...
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'Rudder system policy: daily inventory' (inventory-all) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'distributePolicy' (root-DP) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
[2018-07-16 05:07:59] ERROR com.normation.rudder.services.policies.RuleValServiceImpl - Some nodes are in the target of rule 'Rudder system policy: basic setup (common)' (hasPolicyServer-root) but are not present in the system. It looks like an inconsistency error. Ignored nodes: root
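The second error can be read as a read-back rule: the stored agent value is either a bare agent type or a JSON object, and a node with neither a security token nor a stored public key is ignored. The sketch below only illustrates that rule as the log message describes it. It is not Rudder's actual code, and `parse_agent_info`, `AgentInfoError`, and the returned dictionary shape are hypothetical.

```python
import json


class AgentInfoError(ValueError):
    """Raised when stored agent information cannot be read back."""


def parse_agent_info(agent_name, stored_public_key=None):
    """Read a stored agent value, falling back to a separately stored public key."""
    try:
        data = json.loads(agent_name)
    except json.JSONDecodeError:
        data = None

    if isinstance(data, dict):
        agent_type = data.get("agentType")
        version = data.get("version")
        token = data.get("securityToken")
    else:
        # Legacy form: the value is just the agent type.
        agent_type, version, token = agent_name, None, None

    if token is None:
        token = stored_public_key  # assumption: the key is stored next to the agent value
    if not token:
        raise AgentInfoError(
            "no value defined for security token, and no public key was stored"
        )
    return {"agentType": agent_type, "version": version, "securityToken": token}


# The value from the log above, read back without any stored public key.
stored = '{"agentType":"Community","version":"4.1.7.release-1.SLES.11"}'
try:
    parse_agent_info(stored)
    outcome = "read"
except AgentInfoError:
    outcome = "ignored"

# The same value is readable once a public key is available (placeholder value).
with_key = parse_agent_info(stored, stored_public_key="<public key>")
print(outcome, with_key["version"])
```

Under these assumptions, the root entry is dropped from the node list purely because the key is missing, which matches the "Ignored nodes: root" errors that follow.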
Actions #11

Updated by François ARMAND over 5 years ago

  • Related to Bug #12606: Restricted java security policy breaks Rudder (class configured for Cipher(provider: BC)cannot be found) added
Actions #12

Updated by François ARMAND over 5 years ago

  • Status changed from New to In progress
  • Assignee set to François ARMAND
Actions #13

Updated by François ARMAND over 5 years ago

OK, so the problem is that:

- inventories from 4.1 are stored with:

agentName: {"agentType":"Community","version":"4.1.14~rc1~git201807130352-stretch0"}
publicKey: ....

But we consider that to be an error. If we parse a 4.1 inventory, the parsing is done correctly: the problem is really in the unserialisation of an already stored inventory during a migration.

Actions #14

Updated by François ARMAND over 5 years ago

My previous comment was erroneous, cf the added tests in #12474.

So the problem is that, for some reason, we get an inventory for root without a public key in it. That inventory is accepted (but it should not be: a key or certificate is mandatory in 4.3), and so afterward the node can't be read back, because it does not meet the requirement of having a key.

So we need to refuse inventories without either a public key or a certificate in 4.3.
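As an illustration of the proposed check, here is a minimal sketch that refuses an inventory before it is stored when it carries neither a key nor a certificate. The field names (`publicKey`, `certificate`, `nodeId`), the `InventoryRefused` exception, and the dictionary used as a store are all assumptions made for the example, not Rudder's real data model.

```python
class InventoryRefused(Exception):
    """Raised when an inventory must not be accepted."""


def check_security_token(inventory):
    """Refuse an inventory that has neither a public key nor a certificate."""
    key = (inventory.get("publicKey") or "").strip()
    certificate = (inventory.get("certificate") or "").strip()
    if not key and not certificate:
        raise InventoryRefused(
            f"inventory for node {inventory.get('nodeId')!r} has neither "
            "a public key nor a certificate"
        )


def accept(inventory, store):
    """Run the check first, so an unreadable node is never stored."""
    check_security_token(inventory)
    store[inventory["nodeId"]] = inventory


store = {}
try:
    accept({"nodeId": "root"}, store)
    refused = False
except InventoryRefused:
    refused = True

accept({"nodeId": "root", "publicKey": "<public key>"}, store)
print(refused, sorted(store))
```

Checking at acceptance time means a bad inventory fails loudly on arrival instead of silently hiding the node later at read time.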

Actions #15

Updated by François ARMAND over 5 years ago

  • Description updated (diff)

I'm really not sure that accepting an inventory without a security token was the root cause. We have not had such inventories since... 2.8?

I will add the check. If the bug occurs again, we will need to look for other possible causes. At least we will know that it can't come from the inventory part, and that more likely something breaks the stored inventory information for root in LDAP (replay of init data? Deletion of some attributes?)

Actions #16

Updated by François ARMAND over 5 years ago

So, more data!

Because of an error in the cache, we reach a situation where:

- there is an inventory for Rudder 4.3 in LDAP,
- but there was an eviction problem with the cache of node info for the node, so we removed the faulty cache entry (ok, why not) but we never added back the fresh cache info.

So the cache thinks it is up-to-date, but without the node info.

The next time the node is modified (for example with a new inventory or by clicking the "clear cache" button in Rudder settings), everything goes back to normal.
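The stale state described above can be reproduced with a toy cache that only reloads entries modified since its last update. This is a simplified model under stated assumptions, not the NodeInfoServiceCachedImpl code: the class, its methods, and the integer timestamps are hypothetical.

```python
class NodeInfoCache:
    """Toy cache that reloads only entries modified since the last refresh."""

    def __init__(self, backend):
        self.backend = backend  # node_id -> (modified_at, info), stands in for LDAP
        self.entries = {}
        self.last_update = -1

    def refresh(self):
        for node_id, (modified_at, info) in self.backend.items():
            if modified_at > self.last_update:
                self.entries[node_id] = info
        stamps = [modified_at for modified_at, _ in self.backend.values()]
        self.last_update = max([self.last_update] + stamps)

    def evict(self, node_id):
        """Behaviour described in the ticket: drop the entry, never reload it."""
        self.entries.pop(node_id, None)

    def evict_and_reload(self, node_id):
        """One possible fix: reload the entry right after dropping it."""
        self.entries.pop(node_id, None)
        if node_id in self.backend:
            self.entries[node_id] = self.backend[node_id][1]

    def clear(self):
        """Equivalent of a 'clear cache' action: forget everything."""
        self.entries.clear()
        self.last_update = -1


backend = {"root": (100, {"hostname": "server.rudder.local"})}
cache = NodeInfoCache(backend)
cache.refresh()
present_at_start = "root" in cache.entries

cache.evict("root")
cache.refresh()  # nothing is newer than last_update, so root stays missing
present_after_evict = "root" in cache.entries

backend["root"] = (200, {"hostname": "server.rudder.local"})  # e.g. a new inventory
cache.refresh()
present_after_change = "root" in cache.entries
print(present_at_start, present_after_evict, present_after_change)
```

In this model the entry only comes back when the backend timestamp moves forward or the cache is cleared, which matches the two recovery paths mentioned above.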

Actions #17

Updated by François ARMAND over 5 years ago

  • Related to Bug #12988: NodeInfoCache is precise to the second but we need it to be precise to the millisecond added
Actions #18

Updated by François ARMAND over 5 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from François ARMAND to Vincent MEMBRÉ
  • Pull Request set to https://github.com/Normation/rudder/pull/1988
Actions #19

Updated by François ARMAND over 5 years ago

  • Status changed from Pending technical review to Pending release
Actions #20

Updated by Vincent MEMBRÉ over 5 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 4.3.3 which was released today.
