Bug #8980
Status: closed
Update 3.0->3.1 on SLES commits and rebuilds vanilla system techniques
Description
Hi Folks,
We experienced an issue during the update from 3.0 -> 3.1.
Constraints
We have to state some facts that caused this issue:
- We have customized the system techniques (changed TCP to UDP in 3.0, since it was not customizable back then), and committed the change to git:
diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 1397a83..ec94f63 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -510,7 +510,7 @@ bundle agent check_log_system
     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf"
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes => kept_if_else("rsyslog_kept", "rsyslog_repaired" , "rsyslog_failed");
- We deploy Rudder using zypper patterns. Each pattern is versioned and pins the exact version of every package it requires. Example of the Rudder pattern (the yum repository's pattern metadata):
<pattern xmlns="http://novell.com/package/metadata/suse/pattern"
         xmlns:rpm="http://linux.duke.edu/metadata/rpm"
         xmlns:custom="http://fake.custom.ns/metadata/rpm">
  <name><![CDATA[RUDDER_MASTER]]></name>
  <arch>x86_64</arch>
  <version epoch="0" ver="11.3" rel="3.01.11.01"/>
  <summary><![CDATA[Rudder Root Server 3.1.11 rel 1 SLES11SP3 incl direct dependencies]]></summary>
  <description><![CDATA[Pattern RUDDER_MASTER]]></description>
  <uservisible/>
  <custom:env><![CDATA[RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1]]></custom:env>
  <rpm:requires>
    <rpm:entry name="ncf" flags="EQ" epoch="1398866025" ver="0.201606040106-1.SLES.11" />
    <rpm:entry name="ncf-api-virtualenv" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-agent" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-inventory-ldap" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-reports" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-server-root" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-jetty" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-techniques" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-inventory-endpoint" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <rpm:entry name="rudder-webapp" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
    <!-- some other items are left out, but these are the most important ones -->
  </rpm:requires>
</pattern>
- We use the feature of setting RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1 (as merged via #7222).
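To make the intent of that variable concrete, here is a minimal sketch of how an upgrade step could honor it. Only the variable name RUDDER_NO_TECHNIQUE_AUTOCOMMIT comes from this report; the function names `autocommit_allowed` and `commit_system_techniques` are hypothetical, and treating exactly `1` as the opt-out value is an assumption. This is not the actual rudder-upgrade code.

```shell
# Hypothetical helpers, not taken from rudder-upgrade.
# Assumption: the value "1" means "do not auto-commit"; anything else
# (including an unset variable) allows the commit.
autocommit_allowed() {
  [ "${RUDDER_NO_TECHNIQUE_AUTOCOMMIT:-0}" != "1" ]
}

commit_system_techniques() {
  if autocommit_allowed; then
    # A real script would run its git add / git commit here.
    echo "commit"
  else
    echo "skip"
  fi
}
```

The point of the sketch is that the check only protects you if the script that runs is a version that contains it, which is exactly what goes wrong below.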
The Problem
... is pretty complex, so I will try to lay it out in chronological order:
The update is triggered by updating the given RUDDER_MASTER pattern via zypper.
This resolves the necessary packages, and does the update:
[09:28:31] + export RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1
[09:28:31] + zypper -v -n --force-resolution -t pattern --repo Pattern --repo ThirdParty RUDDER_MASTER=11.3-3.01.11.01
[...]
[09:28:31] The following packages are going to be upgraded:
[...]
[09:28:31] ncf
[09:28:31] 1398866025:0.201601211502-1.SLES.11 -> 1398866025:0.201606040106-1.SLES.11
[09:28:31] ncf-api-virtualenv
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-endpoint
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-ldap
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-jetty
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-reports
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-server-root
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-techniques
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-webapp
[09:28:31] 1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[...]
[09:29:14] Installing: ncf-1398866025:0.201606040106-1.SLES.11 [.....done]
[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:35] Installing: rudder-jetty-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:36] Installing: rudder-reports-1398866025:3.1.11.release-1.SLES.11 [...done]
[09:29:38] Installing: rudder-techniques-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:43] Installing: ncf-api-virtualenv-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:47] Installing: rudder-inventory-endpoint-1398866025:3.1.11.release-1.SLES.11 [.................done]
[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]
[09:30:00] Installing: rudder-server-root-1398866025:3.1.11.release-1.SLES.11 [.done]
[...]
The order of the installation is important, because it determines which versions of the files and techniques are in place when each package's scripts run.
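To check this ordering on another system, a small helper along these lines could be run against a saved zypper log. It is only a sketch: the function name `installed_before` is hypothetical, and it assumes the log contains "Installing: <package>-<epoch>:<version>" lines in the format shown above.

```shell
# installed_before LOGFILE FIRST SECOND
# Succeeds if the "Installing:" line for package FIRST appears before
# the one for package SECOND. Hypothetical helper; assumes the log
# format shown above ("Installing: <name>-<epoch>:<version> ...").
installed_before() {
  _log=$1
  _first=$2
  _second=$3
  # "-[0-9]" anchors on the epoch so that "ncf" does not match
  # "ncf-api-virtualenv".
  _n1=$(grep -n "Installing: ${_first}-[0-9]" "$_log" | head -n 1 | cut -d: -f1)
  _n2=$(grep -n "Installing: ${_second}-[0-9]" "$_log" | head -n 1 | cut -d: -f1)
  [ -n "$_n1" ] && [ -n "$_n2" ] && [ "$_n1" -lt "$_n2" ]
}
```

For the log above, `installed_before zypper.log rudder-inventory-ldap rudder-webapp` would succeed, which is the ordering that leads to the problem described next (`zypper.log` being whatever file the output was saved to).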
The rudder-inventory-ldap package is updated.
The %postinstall script is called after the 3.0.13 -> 3.1.11 update of this package has been done, and it runs the rudder-upgrade script:
[...]
if [ -x /opt/rudder/bin/rudder-upgrade ]
then
  echo "INFO: Running the Rudder upgrade script"
  /opt/rudder/bin/rudder-upgrade
fi
[...]

However, there is a chain of problems with this:
- The called script /opt/rudder/bin/rudder-upgrade is provided by rudder-webapp.
- rudder-webapp has not been updated yet (see timing above), so the old script is executed.
- The old script DOES NOT HAVE this no-autocommit feature.
- The rudder-upgrade script always calls the function upgrade_system_techniques, which invokes update_rudder_repository_from_system_directory.
- This commits the system techniques from /opt/rudder/share/techniques/system/, which is provided by the rudder-techniques package.
- The rudder-techniques package has also not been updated yet (see timing above), so basically what happens is this:
commit 955b4a32a15d0cb295ceac7d18959056cc18321e

    Upgrade system Techniques from /opt/rudder/share/techniques/system/ - automatically done by rudder-upgrade script

diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 8211ae4..9fa628b 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -514,7 +514,7 @@ bundle agent check_log_system
     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf"
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes => rudder_common_classes("rsyslog");

This means:
- You commit back the old vanilla techniques, which lack any customizations.
- You trigger a technique reload.
- Since jetty is still running, the default system policy is rebuilt for all nodes.
- If you mess up and fail to properly disable the relays, it is also distributed into the wild.
- This causes every node to revert to TCP logging, flooding the relays with TCP connections, which can then DDoS the policy server.
- This causes syslog to hang because no Rudder message gets through, leading to I/O waits and high load, and putting both the systems and the ops people under stress.
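As a guard against this, one could check which transport the committed promises.st uses before allowing a policy regeneration. The following is a sketch only: the function name `syslog_transport` is hypothetical, and it relies solely on the two forwarding strings visible in the diffs above (`@@` for TCP, a single `@` for UDP in rsyslog forwarding syntax).

```shell
# syslog_transport FILE
# Prints "tcp", "udp" or "unknown" depending on the rsyslog forwarding
# target found in FILE. Hypothetical helper; it matches the literal
# strings from the diffs in this report (grep -F, no regex).
syslog_transport() {
  if grep -qF 'then @@${server_info.cfserved}' "$1"; then
    echo tcp
  elif grep -qF 'then @${server_info.cfserved}' "$1"; then
    echo udp
  else
    echo unknown
  fi
}
```

Run against the promises.st in the techniques git repository after the package update, a result of `tcp` would indicate that the customization has been overwritten by the vanilla technique.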
Updated by Nicolas CHARLES over 8 years ago
- Priority changed from N/A to 1 (highest)
- Target version set to 3.1.14
Updated by Benoît PECCATTE over 8 years ago
- Related to Architecture #9103: Separate LDAP components from rudder-upgrade added
Updated by Benoît PECCATTE over 8 years ago
- Status changed from In progress to Pending technical review
- Assignee changed from Benoît PECCATTE to Nicolas CHARLES
- Pull Request set to https://github.com/Normation/rudder-packages/pull/1078
Updated by Benoît PECCATTE over 8 years ago
- Status changed from Pending technical review to Pending release
- % Done changed from 0 to 100