Bug #10758
closed
No report on Debian 8
Added by François ARMAND over 7 years ago. Updated over 7 years ago.
Description
It was reported that on a fresh node install on Debian, there was NO REPORT sent at all until a restart of rsyslog was done by hand on the node.
I was not able to reproduce it, but it could happen if rsyslog was not started at all, as reported in #8168.
Files
first_run.log (696 KB) | François ARMAND, 2017-05-23 18:43
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.1.2 to 4.1.3
Updated by Florian Heigl over 7 years ago
i’ll give you a cloud-config file. if you install a packet.net server of the smallest size with debian 8, you should reproducibly end up with no reports being received
https://gist.github.com/FlorianHeigl/e9d7c5ef3561494a85204b01d67bd3ce
just adjust the rudder master. i don’t know why it works for you in a test (assuming you really started from a freshly created/cloned debian). since coredumb has seen the very same issue, it’s likely not something we do that causes it, but something in how you test that avoids it
Also I don't like the classification - this doesn't just affect the first use (it does) but it also affects prod use.
Updated by Florian Heigl over 7 years ago
nevermind, i missed a line there. classification is good.
Updated by François ARMAND over 7 years ago
- Related to Bug #8168: If syslog service is stopped, it is not restarted automatically by rudder-agent, so agent doesn't report anything added
Updated by François ARMAND over 7 years ago
- Priority changed from 0 to 54
Yes, the installation was a fresh one. I did it with our test automation tool (rtf), on a Debian 8.1. We will try again without it, on the assumption that our tool somehow works around the problem.
Updated by François ARMAND over 7 years ago
@Florian: about the classification, just to clarify things: it means "that problem can be encountered as early as" [here the user visibility]. So "getting started - demo | etc" is a higher visibility than "operational", and all tickets in "getting started" are also in "operational" (which is only a subset of the former).
Updated by François ARMAND over 7 years ago
State of progress so far: I wanted to test in a local virtual image, to be able to easily snapshot/share it once the bug is reproduced. No success, so the next step is testing on your cloud provider.
For trace, what I did:
- went to debian.org, downloaded the current net-install (version is debian-8.8.0-amd64-netinst.iso, image size is 247MB)
- created an empty VirtualBox image, booted with the iso
- made a base install (without X, etc)
- installed wget, build-essential, module-assistant, vbox guest
- snapshot & restart
- followed the first steps of the gist:
  - echo "deb http://www.rudder-project.org/apt-4.1/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/rudder.list
  - apt-get update
  - DEBIAN_FRONTEND=noninteractive apt-get -y install rudder-agent --force-yes
  - echo "bla bla ip" > /var/rudder/cfengine-community/policy_server.dat
Here, I encountered the problem described in https://www.rudder-project.org/redmine/issues/10774. So I had to do a "rudder agent inventory" to get an inventory and accept the node.
After accepting the node, I didn't do anything else on it, and reporting is working fine.
So I need to test:
- the remaining steps
- and then, on your cloud provider.
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.1.3 to 4.1.4
Updated by François ARMAND over 7 years ago
Tested the last steps (unlink, install keyring, dist-upgrade, restart rsyslog) and the compliance reports are correctly sent to the root server.
Updated by François ARMAND over 7 years ago
I can reproduce it on the cloud provider. So now we can try to understand it.
Updated by François ARMAND over 7 years ago
- File first_run.log first_run.log added
Here is a capture of the first "rudder agent run -v", which shows an error about the rsyslog restart.
Updated by François ARMAND over 7 years ago
So, the problem is that the cloud provider ships a fake initctl with the following content:
root@agent3:~# cat /sbin/initctl
#!/bin/sh
# For most Docker users, "apt-get install" only happens during "docker build",
# where starting services doesn't work and often fails in humorous ways. This
# prevents those failures by stopping the services from attempting to start.
exit 0
So of course the restart does nothing: rsyslog is not restarted, but the agent believes it was, and so reporting does not work.
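For anyone who wants to check a node for this kind of placeholder, here is a minimal sketch. It is only an illustration: the function name is_script_stub is hypothetical, and the check assumes that a genuine Upstart initctl is a compiled binary, so a file starting with "#!" is treated as a stub.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rudder): returns 0 when the given file
# starts with "#!", i.e. it is a script and not a compiled initctl binary.
is_script_stub() {
    file_path="$1"
    # A missing file is not a stub.
    [ -f "$file_path" ] || return 1
    # Read only the first two bytes of the file.
    magic=$(head -c 2 "$file_path")
    [ "$magic" = '#!' ]
}

if is_script_stub /sbin/initctl; then
    echo "/sbin/initctl looks like a placeholder script"
fi
```

On a node where initctl is absent or is a real binary, the script prints nothing.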
Updated by Alexis Mousset over 7 years ago
- Related to Bug #10781: Upstart service detection may fail on some cloud providers added
Updated by François ARMAND over 7 years ago
And the interesting part in the looooong verbose log is here:
rudder verbose: returnszero ran '/sbin/initctl status rsyslog 2>&1 | /bin/grep 'Unknown job' > /dev/null' successfully and it did not return zero
rudder verbose: Caching result for function 'returnszero("/sbin/initctl status ${service} 2>&1 | ${paths.path[grep]} 'Unknown job' > /dev/null","useshell")'
rudder verbose: C: + Private class: is_upstart_service
rudder verbose: C: + Private class: is_init_service
rudder verbose: C: + Private class: pass1
rudder verbose: Observe process table with /bin/ps -eo user,pid,ppid,pgid,pcpu,pmem,vsz,ni,rss:9,nlwp,stime,etime,time,args
Which led us to think that the problem was around initctl.
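To illustrate why that check can misfire, the sketch below runs the same pipeline against a placeholder initctl. It is a reconstruction under assumptions: the temporary stub only stands in for the provider's /sbin/initctl, and the reading that a non-zero result gets interpreted as "rsyslog is an Upstart job" is inferred from the log excerpt, not taken from the policy source.

```shell
#!/bin/sh
# Build a stand-in for the placeholder initctl: prints nothing, exits 0.
stub_dir=$(mktemp -d)
cat > "$stub_dir/initctl" <<'EOF'
#!/bin/sh
exit 0
EOF
chmod +x "$stub_dir/initctl"

# Same pipeline as in the verbose log, pointed at the stub.
# grep exits non-zero when 'Unknown job' is absent from its input.
"$stub_dir/initctl" status rsyslog 2>&1 | grep 'Unknown job' > /dev/null
check_status=$?

if [ "$check_status" -ne 0 ]; then
    echo "no 'Unknown job' message: rsyslog would be treated as an Upstart job"
fi
rm -rf "$stub_dir"
```

Since the stub never prints 'Unknown job', the pipeline gives the same answer for any service name, whether or not Upstart is actually there.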
Updated by Florian Heigl over 7 years ago
keep in mind you got a "systemd" class available from cfengine.
I don't know how it determines that systemd is really really active, but I've seen it generates that class.
also, I think most distros that have systemd and compatibility wrappers for other inits use those wrappers only as a fallback.
i know none that do it vice-versa (have systemd around but not using it) - iirc even on debian you remove it if you switch back to "unix mode"?
(didn't test)
to me it seems the safest bet is to change the order of detection
rhel:
  systemd
  init
  upstart (because centos 6's cut-down upstart)
debian:
  systemd
  init
ubuntu:
  systemd
  upstart
  init
others, likely just:
  systemd, if active
  init / rc.d
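A rough sketch of what a "systemd first" order could look like, following the ubuntu ordering above (systemd, upstart, init). Everything here is hypothetical: detect_init and its optional root-prefix argument are made up for illustration, and this is not how the Rudder techniques implement detection. It relies on two things I believe hold: /run/systemd/system exists only when systemd is the running init, and a real Upstart initctl mentions "upstart" in its "initctl version" output.

```shell
#!/bin/sh
# Hypothetical sketch of a "systemd first" detection order.
# $1 is an optional root prefix, so the function can be tried on a test tree.
detect_init() {
    root="${1:-}"
    # This directory exists only when systemd is the running init.
    if [ -d "$root/run/systemd/system" ]; then
        echo systemd
    # Ask initctl for its version instead of trusting its exit code:
    # a placeholder script that prints nothing fails this test.
    elif [ -x "$root/sbin/initctl" ] &&
         "$root/sbin/initctl" version 2>/dev/null | grep -q upstart; then
        echo upstart
    else
        # Fall back to plain init scripts.
        echo init
    fi
}

detect_init
```

With this order a placeholder initctl falls through to "init", because the check looks at what initctl prints and not only at whether it ran.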
Updated by Florian Heigl over 7 years ago
i have also found a similar issue with sles11 & sles12 but don't have sufficient data. I'll put that in a different issue once understood.
Updated by François ARMAND over 7 years ago
- Related to Bug #10810: rudder agent start fails on sles12 added
Updated by François ARMAND over 7 years ago
- Related to Bug #10475: service rudder restart does not work the first time on Debian 8 added
Updated by Vincent MEMBRÉ over 7 years ago
- Target version changed from 4.1.4 to 4.1.5
Updated by François ARMAND over 7 years ago
I can confirm that the upcoming Rudder 4.1.4 works and that this ticket is fixed on the Debian 8 image from app.packet.net.
Updated by Alexis Mousset over 7 years ago
- Target version changed from 4.1.5 to 4.1.6