Bug #7091

closed

The uuid in the promises and the uuid in /opt/rudder/etc/uuid.hive may be out of sync, and chaos and sadness follows

Added by Nicolas CHARLES over 8 years ago. Updated almost 8 years ago.

Status: Released
Priority: N/A
Category: System techniques

Description

In the promises, we hardcode the uuid to fetch the promises from, but still use the /opt/rudder/etc/uuid.hive for everything else (reports, inventories and all).
This leads to very "funny" moments, where uuid.hive matches the uuid in the web interface, but the node is not able to fetch its promises, because it looks in a funny place.

We should have one and only one reference, be it the file or the promises, not two copies that can drift out of sync.


Subtasks 1 (0 open, 1 closed)

Bug #8455: Broken failsage syntax in after merge error (Released, Benoît PECCATTE, 2016-06-03)

Related issues 1 (0 open, 1 closed)

Has duplicate: Rudder - Bug #8391: The failsafe doesn't abort is there is no uuid (Rejected, 2016-05-26)
Actions #1

Updated by François ARMAND over 8 years ago

Just to be sure that the bug is correctly understood:

- we have a file, /opt/rudder/etc/uuid.hive, with the node ID.
- when a node is accepted, it starts getting ITS promises. These promises contain the node ID at several points (for the URL where the node should get its future promises, the identification of reports, etc.).

So, if somebody changes the content of /opt/rudder/etc/uuid.hive on an accepted node, the two sets of files become unsynchronized. At the next inventory, Rudder receives the new node ID and does not know what to do with it, and the node can no longer fetch its promises because authorization fails.

So, the problem seems to be that changing a node ID is NOT a trivial action. It's an important decision that bears consequences. You are changing the ID of the node in Rudder. It's not the same node anymore. Most likely, it will get OTHER promises. It can even be managed by entirely different people, for an entirely different role.

So we should fail immediately if we find that the node ID in /opt/rudder/etc/uuid.hive is not consistent with the node ID for which the promises were produced.

And to make the UUID update easier (for example, when cloning VMs), we should explain what to do to actually get a new node in Rudder, with new, dedicated promises once accepted:
- add the script corresponding to "rudder agent reinit" in earlier versions of Rudder,
- add an error message explaining how to change the UUID when an inconsistency is found between the promises and the config file,
- in the /opt/rudder/etc/uuid.hive config file, add comments explaining how to update the UUID (NOT by editing the file directly).

What do you think?

Actions #2

Updated by Janos Mattyasovszky over 8 years ago

+1

We have hit the same issue, and even made a Nagios monitor that checks whether both UUIDs are in sync.
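
For readers who want something similar, here is a minimal sketch of such a check written as a Nagios-style plugin in Python. It is not the monitor mentioned above: the function name and the way the two UUIDs are passed in are assumptions made for illustration, and how to extract the UUID from the generated promises is left out because this ticket does not say where it is stored. Only the exit codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN) follow the usual Nagios plugin convention.

```python
import sys

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def uuid_sync_status(hive_uuid, promises_uuid):
    """Map the two UUIDs to an (exit_code, status_line) pair.

    hive_uuid: value read from /opt/rudder/etc/uuid.hive
    promises_uuid: value found in the generated promises (how it is
    extracted is an assumption left to the caller)
    """
    hive = (hive_uuid or "").strip()
    promises = (promises_uuid or "").strip()
    if not hive or not promises:
        return UNKNOWN, "UNKNOWN - could not read one of the UUIDs"
    if hive != promises:
        return CRITICAL, f"CRITICAL - uuid.hive={hive} promises={promises}"
    return OK, f"OK - UUID {hive} is in sync"


if __name__ == "__main__":
    # Hypothetical usage: check_uuid_sync.py <uuid from uuid.hive> <uuid from promises>
    if len(sys.argv) != 3:
        print("UNKNOWN - expected two UUID arguments")
        sys.exit(UNKNOWN)
    code, line = uuid_sync_status(sys.argv[1], sys.argv[2])
    print(line)
    sys.exit(code)
```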

Actions #3

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.10.16 to 2.10.17
Actions #4

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.10.17 to 2.10.18
Actions #5

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.10.18 to 2.10.19
Actions #6

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.10.19 to 2.10.20
Actions #7

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 2.10.20 to 277
Actions #8

Updated by Vincent MEMBRÉ over 8 years ago

  • Target version changed from 277 to 2.11.18
Actions #9

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 2.11.18 to 2.11.19
Actions #10

Updated by Vincent MEMBRÉ about 8 years ago

  • Target version changed from 2.11.19 to 2.11.20
Actions #11

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 2.11.20 to 2.11.21
Actions #12

Updated by Vincent MEMBRÉ almost 8 years ago

  • Target version changed from 2.11.21 to 2.11.22
Actions #13

Updated by Benoît PECCATTE almost 8 years ago

  • Status changed from New to In progress
  • Assignee set to Benoît PECCATTE
Actions #14

Updated by Jonathan CLARKE almost 8 years ago

After some thought, we think that we should:
  • Always use the UUID from uuid.hive on the node for all operations, logs, reports, etc.
  • Keep the UUID from the server in a generated promises file for the sole purpose of running a check that will display an error message explaining what happened and how to work around it, then abort the agent.
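
The logic of the two points above can be sketched in Python as follows. This only illustrates the decision, not the actual fix (which lives in the techniques, see the pull request below). The uuid.hive path comes from this ticket; the helper names, the wording of the messages, and the assumption that lines starting with "#" in uuid.hive are comments are all hypothetical.

```python
import sys
from pathlib import Path

# Path taken from this ticket; everything else below is hypothetical.
UUID_HIVE = Path("/opt/rudder/etc/uuid.hive")


def read_uuid(path):
    """Return the first non-empty, non-comment line of the file, or None.

    Treating '#' lines as comments is an assumption for this sketch.
    """
    try:
        for line in Path(path).read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                return line
    except OSError:
        return None
    return None


def check_uuid(local_uuid, promises_uuid):
    """Return (ok, message); the agent should abort when ok is False."""
    if not local_uuid:
        return False, "No UUID found in uuid.hive: aborting."
    if local_uuid != promises_uuid:
        return False, (
            f"UUID mismatch: uuid.hive has {local_uuid} but the promises "
            f"were generated for {promises_uuid}. Do not edit uuid.hive "
            "by hand; reinitialize the agent instead. Aborting."
        )
    return True, "UUID is consistent."


if __name__ == "__main__":
    # Hypothetical usage: pass the UUID embedded in the generated promises.
    if len(sys.argv) != 2:
        sys.exit("usage: check_uuid.py <uuid from generated promises>")
    ok, message = check_uuid(read_uuid(UUID_HIVE), sys.argv[1])
    print(message)
    sys.exit(0 if ok else 1)
```

The local UUID stays the single value used for everything else; the server-side copy only feeds the comparison, and a missing local UUID is treated as a reason to abort as well.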
Actions #15

Updated by Benoît PECCATTE almost 8 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Benoît PECCATTE to Jonathan CLARKE
  • Pull Request set to https://github.com/Normation/rudder-techniques/pull/952
Actions #16

Updated by Benoît PECCATTE almost 8 years ago

  • Has duplicate Bug #8391: The failsafe doesn't abort is there is no uuid added
Actions #17

Updated by Benoît PECCATTE almost 8 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
Actions #18

Updated by Vincent MEMBRÉ almost 8 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 2.11.22, 3.0.17, 3.1.11 and 3.2.4 which were released today.
