Bug #8980 (Closed)
Update 3.0->3.1 on SLES commits and rebuilds vanilla system techniques
Description
Hi Folks,
We experienced an issue during the update from 3.0 -> 3.1.
Constraints
A few facts contributed to this issue:
- We customized the system techniques (changed TCP to UDP in 3.0, since it was not configurable back then) and committed the change to git:
diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 1397a83..ec94f63 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -510,7 +510,7 @@ bundle agent check_log_system
     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf"
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes => kept_if_else("rsyslog_kept", "rsyslog_repaired" , "rsyslog_failed");
- We deploy Rudder using zypper patterns. Each pattern is versioned and pins the exact version of every package. Example of the Rudder pattern (from the yum repository's pattern):
<pattern xmlns="http://novell.com/package/metadata/suse/pattern" xmlns:rpm="http://linux.duke.edu/metadata/rpm" xmlns:custom="http://fake.custom.ns/metadata/rpm">
  <name><![CDATA[RUDDER_MASTER]]></name>
  <arch>x86_64</arch>
  <version epoch="0" ver="11.3" rel="3.01.11.01"/>
  <summary><![CDATA[Rudder Root Server 3.1.11 rel 1 SLES11SP3 incl direct dependencies]]></summary>
  <description><![CDATA[Pattern RUDDER_MASTER]]></description>
  <uservisible/>
  <custom:env><![CDATA[RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1]]></custom:env>
  <rpm:requires>
    <rpm:entry name="ncf" flags="EQ" epoch="1398866025" ver="0.201606040106-1.SLES.11" />
    <rpm:entry name="ncf-api-virtualenv" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-agent" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-inventory-ldap" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-reports" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-server-root" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-jetty" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-techniques" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-inventory-endpoint" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-webapp" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <!-- some other items are left out, but these are the most important ones -->
  </rpm:requires>
</pattern>
- We use the RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1 setting (as merged via #7222).
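For this setting to help, every script that may commit system techniques has to check it. A minimal sketch of such a check, assuming (as this report implies) that the value 1 disables the commit and that the variable reaches the script through the environment, for example exported before calling zypper. The function name is hypothetical and is not taken from #7222:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the upgrade may commit system techniques,
# fails when RUDDER_NO_TECHNIQUE_AUTOCOMMIT is set to 1 (unset means "commit").
may_autocommit_techniques() {
  [ "${RUDDER_NO_TECHNIQUE_AUTOCOMMIT:-0}" != "1" ]
}

if may_autocommit_techniques; then
  echo "INFO: committing system techniques"
else
  echo "INFO: RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1, leaving system techniques untouched"
fi
```

The whole problem described below is that an older copy of the upgrade script, which has no such check, is the one that gets executed.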
The Problem
... is pretty complex; I will try to describe it in chronological order.
The update is triggered by updating the RUDDER_MASTER pattern via zypper.
zypper resolves the required packages and performs the update:
[09:28:31] + export RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1
[09:28:31] + zypper -v -n --force-resolution -t pattern --repo Pattern --repo ThirdParty RUDDER_MASTER=11.3-3.01.11.01
[...]
[09:28:31] The following packages are going to be upgraded:
[...]
[09:28:31] ncf
[09:28:31] 1398866025:0.201601211502-1.SLES.11 -> 1398866025:0.201606040106-1.SLES.11
[09:28:31] ncf-api-virtualenv
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-endpoint
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-ldap
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-jetty
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-reports
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-server-root
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-techniques
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-webapp
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[...]
[09:29:14] Installing: ncf-1398866025:0.201606040106-1.SLES.11 [.....done]
[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:35] Installing: rudder-jetty-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:36] Installing: rudder-reports-1398866025:3.1.11.release-1.SLES.11 [...done]
[09:29:38] Installing: rudder-techniques-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:43] Installing: ncf-api-virtualenv-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:47] Installing: rudder-inventory-endpoint-1398866025:3.1.11.release-1.SLES.11 [.................done]
[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]
[09:30:00] Installing: rudder-server-root-1398866025:3.1.11.release-1.SLES.11 [.done]
[...]
The installation order is important, because it determines which version of each script and technique is on disk when each package's scriptlets run.
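To check whether a given upgrade hit this ordering, one can look at zypper's output and see whether rudder-inventory-ldap was installed before rudder-webapp. A minimal sketch, assuming one "Installing:" entry per line as zypper prints it; the helper name installed_before is hypothetical:

```shell
#!/bin/sh
# Succeed if package $1 has an "Installing:" line before package $2 in the
# zypper output read on stdin. The "-[0-9]" after the name keeps "ncf" from
# also matching "ncf-api-virtualenv" (versions here start with epoch digits).
installed_before() {
  awk -v first="$1" -v second="$2" '
    $0 ~ ("Installing: " first "-[0-9]")  && !a { a = NR }
    $0 ~ ("Installing: " second "-[0-9]") && !b { b = NR }
    END { exit !(a && b && a < b) }
  '
}

# Example with two lines from the log above:
printf '%s\n' \
  '[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]' \
  '[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]' |
  installed_before rudder-inventory-ldap rudder-webapp && echo "rudder-inventory-ldap was installed before rudder-webapp"
```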
The rudder-inventory-ldap package is updated
The %postinstall script of rudder-inventory-ldap runs right after that package has been updated from 3.0.13 to 3.1.11, and it runs the rudder-upgrade script:
[...]
if [ -x /opt/rudder/bin/rudder-upgrade ]
then
  echo "INFO: Running the Rudder upgrade script /opt/rudder/bin/rudder-upgrade"
fi
[...]
However, there is a chain of problems with this:
- The called script /opt/rudder/bin/rudder-upgrade is provided by rudder-webapp.
- rudder-webapp has not been updated yet (see timing above), so the old script is executed, and the old script DOES NOT HAVE the no-autocommit feature.
- The rudder-upgrade script always calls the function upgrade_system_techniques, which invokes update_rudder_repository_from_system_directory.
- This commits the system techniques from /opt/rudder/share/techniques/system/, which is provided by the package rudder-techniques.
- The package rudder-techniques has not been updated yet either (see timing above), so basically what happens is this:
commit 955b4a32a15d0cb295ceac7d18959056cc18321e

    Upgrade system Techniques from /opt/rudder/share/techniques/system/ - automatically done by rudder-upgrade script

diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 8211ae4..9fa628b 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -514,7 +514,7 @@ bundle agent check_log_system
     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf"
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes => rudder_common_classes("rsyslog");

This means:
- The old vanilla techniques, which lack any customizations, are committed back.
- A technique reload is triggered.
- Since Jetty is still running, the default system policies are rebuilt for all nodes.
- If the relays have not been properly disabled, the rebuilt policies are also distributed to the nodes in the wild.
- Every node then reverts to TCP logging, flooding the relays with TCP connections, which in turn can DDoS the policy server.
- This makes syslog hang because no Rudder messages get through, causing I/O waits and high load, which stresses both systems and ops people.
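In rsyslog's legacy forwarding syntax, "@host:port" forwards over UDP and "@@host:port" over TCP, which is the entire difference between the customized and the vanilla technique. A minimal sketch to tell which transport a rule line uses; the helper name and the relay:514 target are made up for illustration:

```shell
#!/bin/sh
# Classify the rsyslog legacy forwarding action in a rule line:
# "@@host:port" forwards over TCP, "@host:port" over UDP.
forward_transport() {
  case "$1" in
    *' then @@'*) echo tcp ;;   # tested first, since "@@" also contains "@"
    *' then @'*)  echo udp ;;
    *)            echo none ;;
  esac
}

forward_transport "if \$programname startswith 'rudder' then @@relay:514"   # prints tcp
forward_transport "if \$programname startswith 'rudder' then @relay:514"    # prints udp
```

On a node, passing it the output of grep ' then @' /etc/rsyslog.d/rudder-agent.conf would show whether that node has reverted to TCP.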
Updated by Nicolas CHARLES about 8 years ago
Wow, thank you for the detailed explanation.
I'm pretty sure this has been the cause of weird bugs popping in and out of existence.
Off the top of my head, I'm pretty sure we should not commit techniques in the rudder-ldap post-install. There should probably be a parameter for it, like "do only the LDAP part, pretty please".
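Short of removing the call, one hypothetical mitigation is for the rudder-inventory-ldap %post to run rudder-upgrade only when rudder-webapp, which ships that script, is already at the same version. A minimal sketch, assuming rpm -q can be queried from inside the scriptlet (worth verifying on the rpm version in use); THIS_VERSION and upgrade_script_is_current are illustrative names, not part of the Rudder packaging:

```shell
#!/bin/sh
# Hypothetical %post guard for rudder-inventory-ldap.
THIS_VERSION="3.1.11.release"   # in a real spec file this could come from %{version}

# Succeed only if rudder-webapp (which ships rudder-upgrade) is already at the
# version being installed. $1: installed webapp version, $2: expected version.
upgrade_script_is_current() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Assumption: rpm -q works from within the scriptlet. Empty if rpm is missing
# or rudder-webapp is not installed.
webapp_version=$(rpm -q --qf '%{VERSION}' rudder-webapp 2>/dev/null) || webapp_version=""

if upgrade_script_is_current "$webapp_version" "$THIS_VERSION" \
   && [ -x /opt/rudder/bin/rudder-upgrade ]; then
  /opt/rudder/bin/rudder-upgrade
else
  echo "INFO: rudder-webapp is not at ${THIS_VERSION} yet, skipping rudder-upgrade"
fi
```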
Updated by Janos Mattyasovszky about 8 years ago
Well, actually I would suggest either running the upgrades on demand when the service starts.
This would make the first service start after an upgrade take a lot longer.
This would also go in the direction of enabling Docker ;-)
Or leave it to the admin?
Dunno. The only thing I know is that doing it in the RPM is not very good. Otherwise, make sure you declare dependencies between the RPMs, so that dependency resolution changes the installation order to webapp->techniques->ldap.
Either way, an LDAP package should not call the technique update logic.
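The order proposed here is a topological sort of "must be installed before" constraints; rpm orders a transaction so that, where it can, a package is installed after the packages it requires. Purely as an illustration (this is not what rpm or zypper run internally), coreutils tsort computes such an order from the pairs suggested in this comment:

```shell
#!/bin/sh
# Each input line "A B" declares that A must be installed before B;
# tsort prints the packages in an order that respects every constraint.
install_order() {
  printf '%s\n' \
    'rudder-webapp rudder-techniques' \
    'rudder-techniques rudder-inventory-ldap' | tsort
}

install_order
# rudder-webapp
# rudder-techniques
# rudder-inventory-ldap
```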
Updated by Nicolas CHARLES about 8 years ago
- Priority changed from N/A to 1 (highest)
- Target version set to 3.1.14
I'm raising the priority of this one, as its impact is significant.
Updated by François ARMAND about 8 years ago
- Assignee set to Benoît PECCATTE
So, there are two steps to solve this:
1/ In the short term, rudder-ldap MUST NOT TRY to launch rudder-upgrade. In the worst case, we end up in the situation reported here; in a distributed installation, it just won't work.
2/ In the longer term, we need to split rudder-upgrade per package, with a dedicated rudder-ldap-upgrade.
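A minimal sketch of the per-package split in 2/, which also matches the "do only the LDAP part" parameter suggested earlier: the caller names a component and only that component's steps run. Everything here except the upgrade_system_techniques name is hypothetical, including the component names and the placeholder steps, which only print what a real step would do; it does not describe the actual rudder-ldap-upgrade (see #9103):

```shell
#!/bin/sh
# Placeholder steps: they only print what a real step would do.
# upgrade_system_techniques is the function named in this report;
# upgrade_ldap_data is hypothetical.
upgrade_ldap_data()         { echo "ldap: upgrading LDAP data"; }
upgrade_system_techniques() { echo "techniques: committing system techniques"; }

# Run only the upgrade steps owned by the given component.
rudder_upgrade_component() {
  case "$1" in
    ldap)       upgrade_ldap_data ;;
    techniques) upgrade_system_techniques ;;
    *)          echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# The rudder-inventory-ldap %post would call only its own part:
rudder_upgrade_component ldap   # prints "ldap: upgrading LDAP data"
```

With this shape, the LDAP package can never reach the technique commit, whatever version of the other packages is on disk.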
Updated by Benoît PECCATTE about 8 years ago
- Status changed from New to In progress
Updated by Benoît PECCATTE about 8 years ago
- Related to Architecture #9103: Separate LDAP components from rudder-upgrade added
Updated by Benoît PECCATTE about 8 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1078
Updated by Benoît PECCATTE about 8 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100
Applied in changeset rudder-packages|21ee112a303e9a83d461ef0cb5ff2f3d50f29f06.
Updated by Vincent MEMBRÉ about 8 years ago
- Status changed from Pending release to Released
This bug has been fixed in Rudder 3.1.15/14 and 3.2.8/7 which were released today.
- 3.1: Announce Changelog
- 3.2: Announce Changelog
- Download: https://www.rudder-project.org/site/get-rudder/downloads/