Bug #4490 (Closed)
Non-unique machine UUID messes up collected inventory
Description
I ran into an issue today where adding a new machine into Rudder unexpectedly modified another, already present machine.
To make a long story short, I've tracked it down to the MACHINEID being the same for two hosts.
I ran the command below to see if I had any more problems:
# grep MACHINEID * | sed 's@^.*<MACHINEID>\(.*\)</MACHINEID>@\1@' | sort | uniq -c
      1 00000000-0000-0000-0000-0025906CFD30
      2 00020003-0004-0005-0006-000700080009
      1 080020FF-FFFF-FFFF-FFFF-466D3D282100
      1 080020FF-FFFF-FFFF-FFFF-AA7D8D4F1400
      1 080020FF-FFFF-FFFF-FFFF-BE82A5282100
      1 17079865-5D68-3894-B0C4-019B018772E0
      1 34303732-3334-5553-4D37-313830344D54
      1 34313431-3039-5553-4537-34314E35475A
      1 34313830-3434-4D58-3237-323730303155
      1 44454C4C-3200-1038-8044-C6C04F594631
      1 44454C4C-3600-1044-8030-B8C04F383731
      1 44454C4C-3800-1058-8059-B8C04F533731
      1 44454C4C-4800-1046-8043-C6C04F594631
      1 44454C4C-4A00-1052-8034-C8C04F594631
      1 44454C4C-4A00-1052-8034-CAC04F594631
      1 44454C4C-4B00-1052-8034-B1C04F594631
      1 44454C4C-4B00-1052-8034-B2C04F594631
      1 44454C4C-4B00-1052-8034-B3C04F594631
      1 44454C4C-4B00-1052-8034-B4C04F594631
      1 44454C4C-5200-105A-8036-B6C04F594631
      1 48384441-3852-0030-4860-003048605394
      1 4C4C4544-0032-3810-8048-B1C04F585131
      1 4C4C4544-0032-4210-8044-B1C04F585131
      1 4C4C4544-0052-4810-8032-B4C04F514C31
      1 4EBA5B8C-271D-3308-B60B-D45681B13811
      1 53D19F64-D663-A017-8922-0030487E057E
      1 53D19F64-D663-A017-8922-0030487E0582
      1 53D19F64-D663-A017-8922-0030487E0AF0
      1 53D19F64-D663-A017-8922-0030487E0AF2
      1 53D19F64-D663-A017-8922-0030487E0AF4
      1 53D19F64-D663-A017-8922-0030487E0AF6
      1 53D19F64-D663-A017-8922-0030487E22C4
      1 53D19F64-D663-A017-8922-0030487E22C6
      1 53D19F64-D663-A017-8922-0030487E22CA
      1 564D637E-F530-2F47-F1CC-57816DE0977E
      2 /dev/mem: Operation not permitted
      1 Not Present
      1 Not Settable
As you can see, the UUID 00020003-0004-0005-0006-000700080009 is found twice. And indeed, dmidecode -s system-uuid generates the same value on the two hosts (both Supermicro servers). This problem is not new; see:
- https://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg06484.html
- http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006250
- https://pthree.org/2012/06/15/libvirt-tyan-motherboards-and-uuid/
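For reference, here is a variant of the pipeline above that prints only the duplicated IDs; like the original command, it assumes it is run from the directory holding the inventory files:

# Print only the MACHINEIDs that occur more than once
grep -h '<MACHINEID>' * \
  | sed 's@^.*<MACHINEID>\(.*\)</MACHINEID>@\1@' \
  | sort | uniq -c | awk '$1 > 1'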
I currently have 40 hosts loaded and 39 entries in LDAP - so for the three non-UUID outliers at the end of the list, Rudder was able to automatically generate a new UUID when the UUID was genuinely missing.
I think the UUID should be checked for uniqueness at the time a new inventory is about to be accepted, and the user should be warned, or given the choice to either reuse the old entry (though I can't imagine a reason for this) or generate a new random one (or, if there is a way to ensure that this is indeed a new host, do it automatically).
My understanding is that the new value should then somehow persist with the host data, so that processing inventory updates would find the proper entry automatically.
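A minimal sketch of such a pre-acceptance check, as a shell fragment; the inventory file name and the accepted-inventories directory below are assumptions for illustration, not actual Rudder paths:

# Warn if the incoming inventory's machine UUID is already known.
# NEW and ACCEPTED_DIR are illustrative assumptions, not real Rudder paths.
NEW=incoming-inventory.ocs
ACCEPTED_DIR=/var/rudder/inventories/accepted
ID=$(sed -n 's@^.*<MACHINEID>\(.*\)</MACHINEID>.*$@\1@p' "$NEW" | head -n 1)
if grep -rq "<MACHINEID>$ID</MACHINEID>" "$ACCEPTED_DIR"; then
    echo "WARNING: machine UUID $ID already present; generate a new one or ask the user" >&2
fi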
Updated by Alex Tkachenko almost 11 years ago
Actually, what is the point of separating node and machine data in LDAP? The Rudder node ID appears to be unique enough...
Another way might be to let the sysadmin augment the system UUID by means of some config file, initially filled with the data from dmidecode (or whatever process is applicable on the given system). If a clash is discovered, the sysadmin could then change the UUID manually.
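A rough sketch of that idea (the file path here is made up for illustration; Rudder has no such mechanism today):

# Seed a sysadmin-editable UUID file from dmidecode; the path is
# hypothetical, this is not an existing Rudder mechanism.
UUID_FILE=/etc/rudder-machine-uuid
if [ ! -f "$UUID_FILE" ]; then
    # Filled once from the hardware; the sysadmin can edit it on a clash.
    dmidecode -s system-uuid > "$UUID_FILE"
fi
cat "$UUID_FILE"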
However, in theory, a malicious user could try to use such a file to override the node data and cause a different set of policies to apply to a given server (maybe relaxing the security controls - you get the idea).
Updated by Alex Tkachenko almost 11 years ago
Maybe this should be a separate bug - please feel free to move it if necessary.
In an attempt to rectify the duplicate UUID situation, I tried to delete both hosts and add them back again. Needless to say, the outcome was the same. Then, as I wanted an additional 3ware RAID illustration for #4395, I decided to load just one host. I deleted both again and then forced inventory collection on one of them. To my surprise, the resulting data in the web interface still listed two BIOS entries (and the rest of the information was still messed up).
I concluded that instead of creating a completely new machine entry, the app tried to reuse the entry with the same machine ID from the removed inventories. When I deleted the node again and also deleted all the entries under removed inventories, the data loaded as expected.
This may be of lesser importance once the uniqueness of the machineID is ensured in the first place, but in the current state it is a nuisance.
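In case it helps anyone hitting the same thing, this is roughly how the leftover entry can be located; the bind DN, branch and attribute names below follow my understanding of the Rudder LDAP layout and may need adjusting for a given installation:

# Look for a leftover machine entry under Removed Inventories.
# All DNs and the machineId attribute are assumptions based on the
# usual Rudder LDAP tree; adjust to your setup.
ldapsearch -x -H ldap://localhost \
  -D "cn=Manager,cn=rudder-configuration" -W \
  -b "ou=Removed Inventories,ou=Inventories,cn=rudder-configuration" \
  "(machineId=00020003-0004-0005-0006-000700080009)" dn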
Updated by François ARMAND almost 11 years ago
- Project changed from 34 to Rudder
- Category set to 26
- Status changed from New to 8
To answer your question and explain the situation: at the beginning of Rudder, we thought we wanted to support these use cases:
- a machine used with an OS, then the OS reinstalled or changed, where we wanted to be able to know it was the same machine;
- a machine used as a container host (LXC for example), with the machine entry common to all the containers;
- an OS migrated to new hardware.
We were young and didn't really know what was useful.
Now, we think that a node is both the OS and the hardware, as a single, unique set. The LXC use case is better handled with a virtual host / virtual machine logic (where the host knows which VMs it hosts). And migrating to new hardware is just a modification of the node.
So all of your analysis is right, and you have correctly identified the problem. Perhaps a first workaround would be to modify the inventory post-processing so that the machine UUID is never filled in: Rudder will then always generate a machine UUID from the node ID (and so you will have a unique link between a node and its hardware).
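For example, a one-line post-processing step along those lines (the report file name is only illustrative):

# Blank out the machine UUID in an inventory report so that the server
# falls back to generating one itself; the file name is illustrative.
sed -i 's@<MACHINEID>[^<]*</MACHINEID>@<MACHINEID></MACHINEID>@' report.ocs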
Updated by François ARMAND almost 11 years ago
Oh, and you seem to have found another bug, because the BIOS should have been correctly updated (replaced), not added.
Updated by François ARMAND about 10 years ago
The solution for this one is to mitigate the problem for old nodes and solve it for new ones by hard-linking a node to its machine (see the sketch after this list):
- for new nodes (i.e. node UUID not yet known to Rudder), DO NOT TAKE INTO ACCOUNT the machine UUID; use a hash of the node UUID to build the machine UUID;
- for existing nodes (inventory update), look at the machine ID:
  - if there is none, use the same scheme as for new nodes;
  - if there is one in a different pending/accepted status than that of the node, use the same scheme as for new nodes;
  - else, leave it as it is.
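To illustrate the hashing scheme (the real implementation lives in ldap-inventory, in Scala; this shell transcription only shows the idea, and the actual hash and formatting may differ):

# Derive a deterministic, UUID-shaped machine ID from the node UUID.
NODE_UUID="2b0e53b2-6d0e-4c3a-9a7e-5d2f8c91a0b3"   # example node UUID
HASH=$(printf '%s' "$NODE_UUID" | md5sum | cut -d' ' -f1)
MACHINE_UUID="${HASH:0:8}-${HASH:8:4}-${HASH:12:4}-${HASH:16:4}-${HASH:20:12}"
echo "$MACHINE_UUID"

Since the result is a pure function of the node UUID, re-processing an inventory for the same node always maps to the same machine entry.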
That won't solve the problem of the deletion of existing nodes (#5372), but it will make the issue disappear for new nodes.
Note that with this, we can still have a machine that belongs to several nodes. Deleting one of them and accepting it again will break the link.
Updated by Vincent MEMBRÉ about 10 years ago
- Status changed from 8 to Pending technical review
- Assignee set to Nicolas CHARLES
- Priority changed from N/A to 2
- Target version set to 2.6.19
- Pull Request set to https://github.com/Normation/ldap-inventory/pull/50
Updated by Vincent MEMBRÉ about 10 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset ldap-inventory:commit:043bb04bbf02df4b61165c48efeb7e51a13e1162.
Updated by Nicolas CHARLES about 10 years ago
Applied in changeset ldap-inventory:commit:be2312d5925e971100ed3d19b7a7d4ea7379216a.
Updated by Vincent MEMBRÉ about 10 years ago
- Subject changed from Non-unique UUID mess up collected inventory to Non-unique machine UUID mess up collected inventory
Updated by Vincent MEMBRÉ about 10 years ago
- Status changed from Pending release to Released
Updated by Benoît PECCATTE almost 10 years ago
- Category changed from 26 to Web - Nodes & inventories