Bug #15492
closed
rudder agent on Virtuozzo/openvz hypervisors uses broken vzps
Description
Hello!
This is a reopening of bug #4498 (https://issues.rudder.io/issues/4498), but I could not find how to reopen it, sorry.
I use rudder agent 5.0.12, and part of the problem is still here.
More precisely, the part about calling vzps with the option "thcount" instead of "nlwp" (see this comment: https://issues.rudder.io/issues/4498#note-4).
I have three openvz hypervisors. They are bare-metal openvz and not Proxmox; as Alexis noticed, Proxmox has used LXC since v4.
On these hypervisors, all reports are systematically missing, and rudder agent check always returns:
WARNING: No disable file detected and no agent executor process either. Restarting agent service...ok: stop service rudder-agent succeeded
ok: start service rudder-agent succeeded
Done
ok: Rudder agent check ran without errors.
Notice the "WARNING" part: it is always present. On the rudder server, the node is seen at the exact time the rudder agent check ran, but without any report (100% missing reports).
It seems the rudder service starts, but does nothing after that :-/
After reading bug #4498, I tried launching cf-agent -Kv directly.
It's very verbose, but by searching for vzps in the output, I did find lines like:
rudder verbose: P: END methods promise (any)
rudder verbose: Using the default body: processes_action
rudder verbose: Observe process table with /bin/vzps -E 0 -o user,pid,ppid,pgid,pcpu,pmem,vsz,ni,rss,thcount,stime,time,args
rudder verbose: A: ...................................................
rudder verbose: A: Bundle Accounting Summary for 'check_cron_daemon' in namespace default
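For anyone repeating this search, here is a minimal sketch that pulls the process-table command out of the verbose output. It assumes the "Observe process table with" wording shown in the excerpt above; the function name process_table_commands is hypothetical and not part of rudder or CFEngine.

```python
import re
from typing import Iterable, List

# Matches the verbose line shown above and captures the command that follows.
# Assumption: the wording "Observe process table with" is stable.
PATTERN = re.compile(r"Observe process table with (?P<cmd>.+)$")


def process_table_commands(lines: Iterable[str]) -> List[str]:
    """Return the distinct process-listing commands found, in order of appearance."""
    found: List[str] = []
    for line in lines:
        match = PATTERN.search(line.rstrip("\n"))
        if match:
            cmd = match.group("cmd").strip()
            if cmd not in found:
                found.append(cmd)
    return found
```

The lines can come from a saved log or from the agent's output piped into a script that passes sys.stdin to this function.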
There is no error around the vzps line, but when I launched the same command from the command line:
/bin/vzps -E 0 -o user,pid,ppid,pgid,pcpu,pmem,vsz,ni,rss,thcount,stime,time,args
error: unknown user-defined format specifier "thcount"
Usage: vzps [options]
Try 'vzps --help <simple|list|output|threads|misc|all>'
or 'vzps --help <s|l|o|t|m|a>'
for additional help text.
For more details see ps(1).
So it seems the thcount option is still the problem and, despite the error, cf-agent reports that everything is ok.
If I replace thcount with nlwp, the vzps command line works and returns all processes.
I did not find any other logs, but it seems that vzps failed when cf-agent ran and the failure was treated as a success.
So somehow no reports are generated, but the agent thinks everything is ok :(
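To illustrate the two points above (check the exit status, and fall back from thcount to nlwp), here is a minimal sketch. It is not how cf-agent works. It assumes that vzps exits non-zero when it rejects a format specifier; the names pick_thread_column and run_quiet are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Column list copied from the cf-agent verbose output above; "{threads}" marks
# the thread-count column whose accepted name differs between vzps builds.
COLUMNS = "user,pid,ppid,pgid,pcpu,pmem,vsz,ni,rss,{threads},stime,time,args"


def run_quiet(argv: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit status."""
    try:
        return subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except OSError:
        return 127  # binary missing or not executable


def pick_thread_column(
    binary: str = "/bin/vzps",
    candidates: Sequence[str] = ("thcount", "nlwp"),
    runner: Callable[[Sequence[str]], int] = run_quiet,
) -> Optional[str]:
    """Return the first thread-count specifier the binary accepts, else None."""
    for name in candidates:
        argv = [binary, "-E", "0", "-o", COLUMNS.format(threads=name)]
        # Assumption: a rejected specifier produces a non-zero exit status.
        if runner(argv) == 0:
            return name
    return None
```

On an affected hypervisor, pick_thread_column() would be expected to return "nlwp". A None result means neither specifier worked, which the caller should treat as an error and not as an empty process list.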
If I run rudder agent run manually, it works: reports are generated and sent to the server!
I do not know why rudder agent check triggers the problem and rudder agent run does not.
For the moment I have a (dirty) workaround: a custom cron job launching rudder agent run (thanks rudder for this deployment ^^)
Please, if you have any idea how to patch this, it would be much appreciated, as openvz hypervisors are not manageable by rudder at the moment because of it.
Thank you!
Updated by Alexis Mousset over 5 years ago
- Related to Bug #15488: Virtuozzo Virtual machine reported as "Unknown type" added
Updated by Alexis Mousset over 5 years ago
- Related to Bug #15487: Openvz/Virtuozzo virtual machine detected as Physical added
Updated by Alexis Mousset over 5 years ago
- Category set to Agent
- Target version set to 5.0.13
We actually used nlwp, and changed it to thcount afterwards in https://tracker.mender.io/browse/CFE-1822 (https://github.com/cfengine/core/commit/38d4018c83a377055197c0e4c8c8099f1bc142e3), apparently for Proxmox support. Do you know why the options differ between openVZ systems, and what the correct way of listing processes is now? (I fear reverting to nlwp would break things too.)
Updated by Victor Héry over 5 years ago
Ok, this is interesting indeed.
Apparently something has diverged between Proxmox and openvz itself.
As stated at the end of bug #4498, Proxmox does not use openvz anymore (it now uses LXC), and vzps is not present on recent Proxmox:
# vzps
zsh: command not found: vzps
So I guess older versions of Proxmox had a different version of vzps than virtuozzo, with different options :-/
Actually, the correct option for the latest vzps version is nlwp.
However, the ps command supports both, even though vzps should be a wrapper around ps.
I'll look into it from the virtuozzo side; perhaps they have an explanation. I'll keep you informed :-)
Thanks!
Updated by Victor Héry over 5 years ago
- Subject changed from rudder agent on Virtuozzo hypervisors uses broken vzps to rudder agent on Virtuozzo/openvz hypervisors uses broken vzps
Okay, I was not able to find out why this command changed its options.
The documentation states that both options should be supported, but that is not the case. I found no documented limitation or bug report about it.
I have opened a ticket in the openvz bugtracker (OVZ-7124) to see if there is any news or workaround.
In the meantime, it is certain that vzps no longer exists in recent Proxmox versions (which use LXC) and that openvz now supports nlwp in vzps, so using nlwp instead of thcount should do the job, but it's up to you to choose I guess :-)
I'll keep you informed if I have news from openvz!
Regards,
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.13 to 5.0.14
Updated by Victor Héry over 5 years ago
Hello,
Good news from openvz, they found a bug about this:
This is a bug we inherited from upstream - tgid and thcount options were misordered in the options array, which lead to incorrect result from binary search - option was present but couldn't be found - hence the error. Next version of a package will contain the fix. I don't know when it will be released for a stable repository, but it should appear on factory repo soon enough. Fix version should be vzprocps-3.3.10-4.vl7.9.x86_64
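The failure mode described in that reply can be reproduced in a few lines. This is an illustrative sketch only, not the vzprocps code: the table below is a hypothetical excerpt, and Python's bisect stands in for whatever binary search the real tool uses.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def lookup(table: Sequence[str], name: str) -> Optional[int]:
    """Binary-search table for name; only reliable when table is sorted."""
    i = bisect_left(table, name)
    if i < len(table) and table[i] == name:
        return i
    return None


# Hypothetical excerpt of a format-specifier table, in sorted order.
SORTED = ["stime", "tgid", "thcount", "time", "tname"]
# Same entries with "tgid" and "thcount" swapped, as in the bug described above.
MISORDERED = ["stime", "thcount", "tgid", "time", "tname"]

print(lookup(SORTED, "thcount"))      # found at index 2
print(lookup(MISORDERED, "thcount"))  # None: present in the table, but missed
```

A cheap guard against this class of bug is to assert at startup that the table equals its sorted copy.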
This should be fixed in the next vzps version. If that is indeed the case, it should also solve this "bug" from rudder's point of view 🎉
I'll test the fix once it's released and let you know!
Regards,
Updated by Alexis Mousset over 5 years ago
- Status changed from New to Resolved
Good news! Thanks for the update.
Setting this to "resolved", please reopen if vzps fixes are not enough.
Updated by Vincent MEMBRÉ over 5 years ago
- Target version changed from 5.0.14 to 5.0.13