Bug #8980

closed

Update 3.0->3.1 on SLES commits and rebuilds vanilla system techniques

Added by Janos Mattyasovszky over 7 years ago. Updated over 7 years ago.

Status: Released
Priority: 1
Category: Packaging

Description

Hi Folks,

We experienced an issue during the update from 3.0 -> 3.1.

Constraints

A few facts about our setup contributed to this issue:
  • We have customized the system techniques (changed TCP to UDP in 3.0, since it was not configurable back then), and the change is committed to git.
    diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
    index 1397a83..ec94f63 100644
    --- a/techniques/system/common/1.0/promises.st
    +++ b/techniques/system/common/1.0/promises.st
    @@ -510,7 +510,7 @@ bundle agent check_log_system
    
         !windows.rsyslogd.!policy_server::
           "/etc/rsyslog.d/rudder-agent.conf" 
    -        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
    +        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
             create => "true",
             edit_defaults => empty_backup,
             classes => kept_if_else("rsyslog_kept", "rsyslog_repaired" , "rsyslog_failed");
    
  • We deploy Rudder using zypper patterns, in which the pattern is versioned, and the pattern contains the exact version of the packages. Example of the Rudder-Pattern (yum repository's pattern):
      <pattern xmlns="http://novell.com/package/metadata/suse/pattern" 
                                    xmlns:rpm="http://linux.duke.edu/metadata/rpm" 
                                    xmlns:custom="http://fake.custom.ns/metadata/rpm">
      <name><![CDATA[RUDDER_MASTER]]></name>
      <arch>x86_64</arch>
      <version epoch="0" ver="11.3" rel="3.01.11.01"/>
      <summary><![CDATA[Rudder Root Server 3.1.11 rel 1 SLES11SP3 incl direct dependencies]]></summary>
      <description><![CDATA[Pattern RUDDER_MASTER]]></description>
      <uservisible/>
      <custom:env><![CDATA[RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1]]></custom:env>
      <rpm:requires>
        <rpm:entry name="ncf" flags="EQ" epoch="1398866025" ver="0.201606040106-1.SLES.11" />
        <rpm:entry name="ncf-api-virtualenv" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-agent" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-inventory-ldap" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-reports" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-server-root" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-jetty" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-techniques" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-inventory-endpoint" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-webapp" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <!-- some other items are left out, but these are the most important ones -->
      </rpm:requires>
     </pattern>
    
  • We use the feature of setting RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1 (as merged via #7222).
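For readers unfamiliar with this opt-out, here is a minimal sketch of how an environment flag such as RUDDER_NO_TECHNIQUE_AUTOCOMMIT can gate a commit step. The function name autocommit_allowed and the rule that only the exact value "1" disables the commit are assumptions made for illustration; this is not the actual rudder-upgrade code.

```shell
# Hypothetical sketch, NOT the real rudder-upgrade implementation.
# Succeeds (exit 0) when the automatic commit of system techniques may run.
autocommit_allowed() {
    # Assumption: only the exact value "1" opts out; unset or any other value allows the commit.
    [ "${RUDDER_NO_TECHNIQUE_AUTOCOMMIT:-0}" != "1" ]
}

if autocommit_allowed; then
    echo "would commit system techniques"
else
    echo "autocommit disabled, leaving the git repository untouched"
fi
```

A script that predates the flag contains no such check, so exporting the variable has no effect on it.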

The Problem

... is fairly complex, so I will try to lay it out in chronological order:

The update is being triggered by updating the given RUDDER_MASTER pattern via zypper.

This resolves the necessary packages, and does the update:

[09:28:31] + export RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1
[09:28:31] + zypper -v -n --force-resolution -t pattern --repo Pattern --repo ThirdParty RUDDER_MASTER=11.3-3.01.11.01
[...]
[09:28:31] The following packages are going to be upgraded:
[...]
[09:28:31] ncf
[09:28:31]   1398866025:0.201601211502-1.SLES.11 -> 1398866025:0.201606040106-1.SLES.11
[09:28:31] ncf-api-virtualenv
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-endpoint
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-ldap
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-jetty
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-reports
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-server-root
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-techniques
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-webapp
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[...]
[09:29:14] Installing: ncf-1398866025:0.201606040106-1.SLES.11 [.....done]
[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:35] Installing: rudder-jetty-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:36] Installing: rudder-reports-1398866025:3.1.11.release-1.SLES.11 [...done]
[09:29:38] Installing: rudder-techniques-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:43] Installing: ncf-api-virtualenv-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:47] Installing: rudder-inventory-endpoint-1398866025:3.1.11.release-1.SLES.11 [.................done]
[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]
[09:30:00] Installing: rudder-server-root-1398866025:3.1.11.release-1.SLES.11 [.done]
[...]

The order of the installation is important, because it determines which files and techniques are in use at each step.
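The install order can be checked from a saved copy of the zypper output. The sketch below assumes the log was written to a file and that each install is reported as "Installing: <package>-<epoch>:<version>", as in the excerpt above; the function names install_line and installed_before are made up for this example.

```shell
# Print the line number of the first "Installing: <pkg>-<epoch>:" entry in a zypper log.
# $1 = log file, $2 = package name
install_line() {
    # The trailing "-[0-9]" keeps "ncf" from matching "ncf-api-virtualenv".
    grep -n "Installing: $2-[0-9]" "$1" | head -n 1 | cut -d: -f1
}

# Succeed if package $2 was installed before package $3 according to log file $1.
installed_before() {
    first=$(install_line "$1" "$2")
    second=$(install_line "$1" "$3")
    [ -n "$first" ] && [ -n "$second" ] && [ "$first" -lt "$second" ]
}
```

For example, `installed_before zypper.log rudder-inventory-ldap rudder-webapp` succeeds for the log shown above (zypper.log being a hypothetical file name), which is exactly the ordering that matters in the next steps.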

The rudder-inventory-ldap package is updated.

Its %postinstall script is called once the 3.0.13 -> 3.1.11 update of this package is done, and it invokes the rudder-upgrade script:

    [...]
    if [ -x /opt/rudder/bin/rudder-upgrade ]
    then
            echo "INFO: Running the Rudder upgrade script"

            /opt/rudder/bin/rudder-upgrade
    fi
    [...]
However, there is a chain of problems with this:
  • The script that is called, /opt/rudder/bin/rudder-upgrade, is provided by rudder-webapp
  • rudder-webapp has not been updated yet (see timing above) => the old script is executed
  • The old script DOES NOT HAVE the no-autocommit feature.
  • The rudder-upgrade script always calls the function upgrade_system_techniques, which invokes update_rudder_repository_from_system_directory
  • This commits the system techniques from /opt/rudder/share/techniques/system/, which is provided by the rudder-techniques package
  • The rudder-techniques package has not been updated yet either (see timing above), so basically what happens is this:
  commit 955b4a32a15d0cb295ceac7d18959056cc18321e

    Upgrade system Techniques from /opt/rudder/share/techniques/system/ - automatically done by rudder-upgrade script

diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 8211ae4..9fa628b 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -514,7 +514,7 @@ bundle agent check_log_system

     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf" 
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes       => rudder_common_classes("rsyslog");

This means:
  • The old vanilla techniques, which lack any customizations, are committed back
  • A technique reload is triggered
  • Since Jetty is still running, the default system policy is rebuilt for all nodes
  • If you have failed to properly disable the relays, it is also distributed into the wild
  • This causes every node to revert to TCP logging, flooding the relays with TCP connections, which can then DDoS the policy server
  • This causes syslog to hang because no Rudder messages get through, which leads to IO waits and high load, and puts both the systems and the ops people under stress.
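The only difference between the customized and the vanilla technique is the forwarding action: in rsyslog syntax, "@@host:port" forwards over TCP and "@host:port" over UDP. To check which variant a node ended up with, a small classifier like the following can be used. This is a sketch; the function name forwarding_protocol is made up for this example.

```shell
# Classify one rsyslog rule line by its forwarding action.
# Prints "tcp" for "@@host", "udp" for "@host", "none" for anything else.
forwarding_protocol() {
    case "$1" in
        *" then @@"*) echo tcp ;;   # must be tested first, since "@@" also contains "@"
        *" then @"*)  echo udp ;;
        *)            echo none ;;  # comments, discard rules ("then ~"), other lines
    esac
}

# Possible usage on a node, reading the file written by the technique:
# grep ' then @' /etc/rsyslog.d/rudder-agent.conf | while read -r line; do
#     forwarding_protocol "$line"
# done
```

If this prints tcp on a node that is supposed to log over UDP, the vanilla technique has been deployed to it.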

Related issues 1 (0 open, 1 closed)

Related to Rudder - Architecture #9103: Separate LDAP components from rudder-upgrade (Rejected)