Bug #7091

The uuid in the promises and the uuid in /opt/rudder/etc/uuid.hive may be out of sync, and chaos and sadness follows

Added by Nicolas CHARLES almost 5 years ago. Updated almost 4 years ago.

Status: Released
Priority: N/A
Category: System techniques

Description

In the promises, we hardcode the uuid used to fetch the promises, but still use /opt/rudder/etc/uuid.hive for everything else (reports, inventories and all).
This leads to very "funny" moments, where uuid.hive matches the uuid in the web interface, but the node is not able to fetch its promises, because it looks for them in the wrong place.

We should have one and only one reference, be it the file or the promises, but not random out-of-sync data.


Subtasks

Bug #8455: Broken failsage syntax in after merge error (Released, 2016-06-03, Benoît PECCATTE)

Related issues

Has duplicate Rudder - Bug #8391: The failsafe doesn't abort is there is no uuid (Rejected, 2016-05-26)
#1

Updated by François ARMAND almost 5 years ago

Just to be sure that the bug is correctly understood:

- we have a file, /opt/rudder/etc/uuid.hive, with the node ID.
- when a node is accepted, it starts getting ITS promises. These promises contain the node ID in several places (the URL from which the node should get its future promises, the identification of reports, etc.).

So, if somebody changes the content of /opt/rudder/etc/uuid.hive on an accepted node, the two sets of files will be out of sync. At the next inventory, Rudder will receive the new node ID and won't know what to do with it, and the node won't be able to fetch its promises because the authorization check fails.

So, the problem seems to be that changing a node ID is NOT a trivial action. It's an important decision that has consequences. You are changing the ID of the node in Rudder. It's not the same node anymore. Most likely, it will get OTHER promises. It can even be managed by entirely different people, for an entirely different role.

So we should fail immediately if we find that the node ID in /opt/rudder/etc/uuid.hive is not consistent with the node ID for which the promises were produced.

And to make the UUID update easier (for example, when cloning VMs), we should explain what to do to actually get a new node in Rudder, with new, dedicated promises, once accepted:
- add the script corresponding to "rudder agent reinit" in earlier versions of Rudder,
- add an error message explaining how to change the UUID when an inconsistency is found between the promises and the config file,
- in the /opt/rudder/etc/uuid.hive config file, add comments explaining how to update the UUID (NOT by editing the file directly).

What do you think?

#2

Updated by Janos Mattyasovszky almost 5 years ago

+1

We have hit the same issue, and even made a Nagios monitor that checks whether both UUIDs are in sync.
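
The monitor itself is not shown in this thread. As a rough idea of what such a check could look like, here is a minimal Python sketch. It assumes the UUID used by the promises has already been extracted to a one-line file whose path is passed in; that file, and the names `read_uuid`, `check_uuid_sync` and `main`, are hypothetical. Only /opt/rudder/etc/uuid.hive and the conventional Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) are taken as given.

```python
from pathlib import Path

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

HIVE_PATH = "/opt/rudder/etc/uuid.hive"


def read_uuid(path):
    """Return the stripped content of a UUID file, or None if unreadable or empty."""
    try:
        text = Path(path).read_text().strip()
    except (OSError, UnicodeDecodeError):
        return None
    return text or None


def check_uuid_sync(hive_uuid, promises_uuid):
    """Compare both UUIDs and return an (exit_code, message) pair."""
    if hive_uuid is None or promises_uuid is None:
        return UNKNOWN, "UNKNOWN - could not read one of the two UUIDs"
    if hive_uuid != promises_uuid:
        return CRITICAL, (
            f"CRITICAL - uuid.hive has {hive_uuid} "
            f"but the promises were generated for {promises_uuid}"
        )
    return OK, f"OK - node UUID {hive_uuid} is in sync"


def main(promises_uuid_path):
    """Print the status line and return the exit code for the monitoring system.

    promises_uuid_path is a hypothetical file holding the UUID found in the promises.
    """
    code, message = check_uuid_sync(read_uuid(HIVE_PATH), read_uuid(promises_uuid_path))
    print(message)
    return code
```

A wrapper script would finish with `sys.exit(main(path))`, so that the monitoring system sees the exit code. Keeping the comparison in `check_uuid_sync` separate from the file access makes it easy to test without a Rudder node.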

#3

Updated by Vincent MEMBRÉ almost 5 years ago

  • Target version changed from 2.10.16 to 2.10.17
#4

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.10.17 to 2.10.18
#5

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.10.18 to 2.10.19
#6

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.10.19 to 2.10.20
#7

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.10.20 to 277
#8

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 277 to 2.11.18
#9

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.11.18 to 2.11.19
#10

Updated by Vincent MEMBRÉ over 4 years ago

  • Target version changed from 2.11.19 to 2.11.20
#11

Updated by Vincent MEMBRÉ about 4 years ago

  • Target version changed from 2.11.20 to 2.11.21
#12

Updated by Vincent MEMBRÉ about 4 years ago

  • Target version changed from 2.11.21 to 2.11.22
#13

Updated by Benoît PECCATTE about 4 years ago

  • Status changed from New to In progress
  • Assignee set to Benoît PECCATTE
#14

Updated by Jonathan CLARKE about 4 years ago

After some thought, we think that we should:
  • Always use the UUID from uuid.hive on the node for all operations, logs, reports, etc.
  • Keep the UUID from the server in a generated promises file for the sole purpose of running a check that will display an error message explaining what happened and how to work around it, then abort the agent.
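
To make the two rules above concrete, here is a minimal Python sketch of the logic. It is not the actual fix, which lives in the techniques (see the pull request below); the names `NodeIdMismatch` and `resolve_node_id` are hypothetical, and treating an empty uuid.hive as an abort condition is an assumption of this sketch.

```python
class NodeIdMismatch(Exception):
    """Raised when the agent should abort because the node ID is inconsistent."""


def resolve_node_id(hive_uuid, generated_uuid):
    """Return the node ID to use for all operations, or raise to abort the run.

    hive_uuid:      content of /opt/rudder/etc/uuid.hive on the node
    generated_uuid: UUID the server wrote into the generated promises
    """
    hive_uuid = (hive_uuid or "").strip()
    generated_uuid = (generated_uuid or "").strip()

    # Assumption of this sketch: a missing or empty uuid.hive also aborts.
    if not hive_uuid:
        raise NodeIdMismatch("No node ID found in /opt/rudder/etc/uuid.hive, aborting.")

    # The generated UUID is used only for this check, never as the node's identity.
    if hive_uuid != generated_uuid:
        raise NodeIdMismatch(
            f"The promises on this node were generated for node {generated_uuid}, "
            f"but /opt/rudder/etc/uuid.hive contains {hive_uuid}. "
            "A node with a different ID is a different node for Rudder: it must be "
            "accepted again and receive its own promises. Aborting."
        )

    # Rule 1: the UUID from uuid.hive is the one used for logs, reports, etc.
    return hive_uuid
```

The function only ever returns the value read from uuid.hive, so the server-side UUID cannot leak into reports or inventories, and any disagreement stops the run before anything else happens.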
#15

Updated by Benoît PECCATTE about 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Benoît PECCATTE to Jonathan CLARKE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/952
#16

Updated by Benoît PECCATTE about 4 years ago

  • Has duplicate Bug #8391: The failsafe doesn't abort is there is no uuid added
#17

Updated by Benoît PECCATTE about 4 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
#18

Updated by Vincent MEMBRÉ almost 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.11.22, 3.0.17, 3.1.11 and 3.2.4 which were released today.
