Bug #8980

closed

Update 3.0->3.1 on SLES commits and rebuilds vanilla system techniques

Added by Janos Mattyasovszky almost 5 years ago. Updated over 4 years ago.

Status: Released
Priority: 1
Category: Packaging
Target version:
Severity:
User visibility:
Effort required:

Description

Hi Folks,

We experienced an issue during the update from 3.0 -> 3.1.

Constraints

We have to state some facts that led to this issue:
  • We have customized the system techniques (changed TCP to UDP in 3.0, since it was not customizable back then) and committed the change to git.
    diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
    index 1397a83..ec94f63 100644
    --- a/techniques/system/common/1.0/promises.st
    +++ b/techniques/system/common/1.0/promises.st
    @@ -510,7 +510,7 @@ bundle agent check_log_system
    
         !windows.rsyslogd.!policy_server::
           "/etc/rsyslog.d/rudder-agent.conf" 
    -        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
    +        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
             create => "true",
             edit_defaults => empty_backup,
             classes => kept_if_else("rsyslog_kept", "rsyslog_repaired" , "rsyslog_failed");
    
  • We deploy Rudder using zypper patterns, in which the pattern is versioned, and the pattern contains the exact version of the packages. Example of the Rudder-Pattern (yum repository's pattern):
      <pattern xmlns="http://novell.com/package/metadata/suse/pattern" 
                                    xmlns:rpm="http://linux.duke.edu/metadata/rpm" 
                                    xmlns:custom="http://fake.custom.ns/metadata/rpm">
      <name><![CDATA[RUDDER_MASTER]]></name>
      <arch>x86_64</arch>
      <version epoch="0" ver="11.3" rel="3.01.11.01"/>
      <summary><![CDATA[Rudder Root Server 3.1.11 rel 1 SLES11SP3 incl direct dependencies]]></summary>
      <description><![CDATA[Pattern RUDDER_MASTER]]></description>
      <uservisible/>
      <custom:env><![CDATA[RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1]]></custom:env>
      <rpm:requires>
        <rpm:entry name="ncf" flags="EQ" epoch="1398866025" ver="0.201606040106-1.SLES.11" />
        <rpm:entry name="ncf-api-virtualenv" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-agent" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-inventory-ldap" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-reports" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-server-root" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-jetty" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-techniques" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-inventory-endpoint" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <rpm:entry name="rudder-webapp" flags="EQ" epoch="1398866025" ver="3.1.11.release-1.SLES.11" />
        <!-- some other items are left out, but these are the most important ones -->
      </rpm:requires>
     </pattern>
    
  • We use the feature of setting RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1 (as merged via #7222).
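
For illustration only, here is a minimal sketch of how an upgrade script could honour this variable. The function names autocommit_enabled and commit_system_techniques are hypothetical and are not taken from the real rudder-upgrade script; the sketch also assumes that only the value 1 disables the commit.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when automatic commits are allowed.
# Assumption: an unset variable or any value other than "1" keeps autocommit on.
autocommit_enabled() {
    [ "${RUDDER_NO_TECHNIQUE_AUTOCOMMIT:-0}" != "1" ]
}

# Hypothetical stand-in for the step that commits system techniques to git.
commit_system_techniques() {
    if autocommit_enabled; then
        echo "committing system techniques"
    else
        echo "autocommit disabled, leaving the git repository untouched"
    fi
}

commit_system_techniques
```

A script without such a check commits unconditionally, which is what matters in the scenario described below.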

The Problem

... is pretty complex, so I will try to lay it out in chronological order:

The update is triggered by updating the RUDDER_MASTER pattern via zypper.

This resolves the necessary packages and performs the update:

[09:28:31] + export RUDDER_NO_TECHNIQUE_AUTOCOMMIT=1
[09:28:31] + zypper -v -n --force-resolution -t pattern --repo Pattern --repo ThirdParty RUDDER_MASTER=11.3-3.01.11.01
[...]
[09:28:31] The following packages are going to be upgraded:
[...]
[09:28:31] ncf
[09:28:31]   1398866025:0.201601211502-1.SLES.11 -> 1398866025:0.201606040106-1.SLES.11
[09:28:31] ncf-api-virtualenv
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-endpoint
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-inventory-ldap
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-jetty
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-reports
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-server-root
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-techniques
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[09:28:31] rudder-webapp
[09:28:31]   1398866025:3.0.13.release-1.SLES.11 -> 1398866025:3.1.11.release-1.SLES.11
[...]
[09:29:14] Installing: ncf-1398866025:0.201606040106-1.SLES.11 [.....done]
[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:35] Installing: rudder-jetty-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:36] Installing: rudder-reports-1398866025:3.1.11.release-1.SLES.11 [...done]
[09:29:38] Installing: rudder-techniques-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:43] Installing: ncf-api-virtualenv-1398866025:3.1.11.release-1.SLES.11 [......done]
[09:29:47] Installing: rudder-inventory-endpoint-1398866025:3.1.11.release-1.SLES.11 [.................done]
[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]
[09:30:00] Installing: rudder-server-root-1398866025:3.1.11.release-1.SLES.11 [.done]
[...]

The order of the installation is important, because it determines which files and techniques are in use at each step.
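
One way to check the installation order from a saved zypper log is sketched below. The helpers install_order and installed_before are hypothetical names, and the sketch assumes log lines shaped like the "Installing:" lines in the excerpt above.

```shell
#!/bin/sh
# Print package names in the order zypper installed them (log read from stdin).
# Assumes lines of the form "... Installing: <name>-<epoch>:<version> [...done]".
install_order() {
    sed -n 's/.*Installing: \(.*\)-[0-9][0-9]*:.*/\1/p'
}

# Succeed only if package $1 was installed before package $2 (log on stdin).
installed_before() {
    install_order | awk -v a="$1" -v b="$2" '
        $0 == a && !seen_a { seen_a = NR }
        $0 == b && !seen_b { seen_b = NR }
        END { exit !(seen_a && seen_b && seen_a < seen_b) }'
}

# Two lines copied from the log above, used as sample input.
installed_before rudder-webapp rudder-inventory-ldap <<'EOF' || echo "rudder-webapp was not installed before rudder-inventory-ldap"
[09:29:34] Installing: rudder-inventory-ldap-1398866025:3.1.11.release-1.SLES.11 [.........done]
[09:29:59] Installing: rudder-webapp-1398866025:3.1.11.release-1.SLES.11 [...................................done]
EOF
```

With the two sample lines, the check fails and the message is printed, matching the timing shown in the log.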

The rudder-inventory-ldap package is updated.

Its %postinstall script is called once the 3.0.13 -> 3.1.11 update of that package is done, and it invokes the rudder-upgrade script:

    [...]
    if [ -x /opt/rudder/bin/rudder-upgrade ]
    then
            echo "INFO: Running the Rudder upgrade script"

            /opt/rudder/bin/rudder-upgrade
    fi
    [...]

However, there is a chain of problems with this:
  • The called script /opt/rudder/bin/rudder-upgrade is provided by rudder-webapp
  • rudder-webapp has not been updated yet (see timing above) => the old script is executed
  • The old script DOES NOT HAVE the no-autocommit feature.
  • The rudder-upgrade script always calls the function upgrade_system_techniques, which invokes update_rudder_repository_from_system_directory
  • This commits the system techniques from /opt/rudder/share/techniques/system/, which is provided by the rudder-techniques package
  • The rudder-techniques package has not been updated yet either (see timing above), so basically what happens is this:
  commit 955b4a32a15d0cb295ceac7d18959056cc18321e

    Upgrade system Techniques from /opt/rudder/share/techniques/system/ - automatically done by rudder-upgrade script

diff --git a/techniques/system/common/1.0/promises.st b/techniques/system/common/1.0/promises.st
index 8211ae4..9fa628b 100644
--- a/techniques/system/common/1.0/promises.st
+++ b/techniques/system/common/1.0/promises.st
@@ -514,7 +514,7 @@ bundle agent check_log_system

     !windows.rsyslogd.!policy_server::
       "/etc/rsyslog.d/rudder-agent.conf" 
-        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
+        edit_line => append_if_no_lines("#Rudder log system${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then @@${server_info.cfserved}:&SYSLOGPORT&${const.n}if $syslogfacility-text == 'local6' and $programname startswith 'rudder' then ~"),
         create => "true",
         edit_defaults => empty_backup,
         classes       => rudder_common_classes("rsyslog");

This means:
  • You commit back the old vanilla techniques, which lack any customizations
  • You trigger a technique reload
  • Since Jetty is still running, the default system policy is rebuilt for all nodes
  • If you mess up and fail to properly disable the relays, it is also distributed into the wild
  • This causes every node to revert to TCP logging, flooding the relays with TCP connections, which can then DDoS the policy server
  • Syslog then hangs because no Rudder message gets through, causing IO waits and high load, and stressing both the systems and the ops people.
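
To make the ordering problem concrete, here is a minimal sketch of a guard that a post-install scriptlet could apply before calling a script shipped by another package. It is an illustration under assumptions, not necessarily the fix adopted for this bug; upgrade_script_is_current is a hypothetical name, and the version strings are copied from the log above.

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the package that ships the upgrade
# script is already at the version currently being installed.
# $1 = version of the package providing the script, $2 = version being installed.
upgrade_script_is_current() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# In a real scriptlet the first value could come from a query such as
#   rpm -q --queryformat '%{VERSION}-%{RELEASE}' rudder-webapp
# Here both values are hard-coded from the log for illustration.
webapp_version="3.0.13.release-1.SLES.11"
target_version="3.1.11.release-1.SLES.11"

if upgrade_script_is_current "$webapp_version" "$target_version"; then
    echo "run rudder-upgrade"
else
    echo "skip rudder-upgrade: script is from ${webapp_version}"
fi
```

With these values the guard takes the skip branch, because rudder-webapp is still at 3.0.13 when the rudder-inventory-ldap scriptlet runs.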

Related issues

Related to Rudder - Architecture #9103: Separate LDAP components from rudder-upgrade (status: New)
Actions #1

Updated by Nicolas CHARLES almost 5 years ago

Wow, thank you for the detailed explanation.
I'm pretty sure this has been the cause of weird bugs popping in and out of existence.

Off the top of my head, I'm pretty sure we should not commit techniques in the rudder-ldap post-inst. There should probably be a parameter for it, like "do only the ldap part, pretty please"

Actions #2

Updated by Janos Mattyasovszky almost 5 years ago

Well, actually I would suggest either running the upgrades on demand when the service starts.
This would make the first service start take a lot longer after an upgrade.
This would also go in the direction of enabling docker ;-)

Or leave it to the admin?

Dunno. The only thing I know is that it's not very good in the rpm. Otherwise, make sure you create dependencies between the rpms, so that dependency resolution changes the installation order to webapp->techniques->ldap.

Well, it is still true that an ldap package should not call the technique update logic.

Actions #3

Updated by Nicolas CHARLES almost 5 years ago

  • Priority changed from N/A to 1
  • Target version set to 3.1.14

I'm raising the priority of this one, as it has quite an impact.

Actions #4

Updated by François ARMAND almost 5 years ago

  • Assignee set to Benoît PECCATTE

So, there are two steps to solve that:

1/ In the short term, rudder-ldap MUST NOT TRY to launch rudder-upgrade. In the worst case, we end up in the reported case. In the case of a distributed installation, it just won't work.

2/ In the longer term, we need to split rudder-upgrade per package, with a dedicated rudder-ldap-upgrade.
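
A rough sketch of what such a split could look like, assuming a single entry point that takes the component as an argument. The names run_upgrade, upgrade_ldap and upgrade_techniques are hypothetical, and the echoed steps are placeholders, since the internals of rudder-upgrade are not shown in this ticket.

```shell
#!/bin/sh
# Hypothetical per-component steps (placeholders, not the real upgrade logic).
upgrade_ldap()       { echo "ldap: migrate LDAP data"; }
upgrade_techniques() { echo "techniques: update system techniques"; }

# Run only the steps that belong to the named component, so that the
# post-install of one package never touches another package's data.
run_upgrade() {
    case "$1" in
        ldap)       upgrade_ldap ;;
        techniques) upgrade_techniques ;;
        *)          echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# What an LDAP package scriptlet would call under this assumption.
run_upgrade ldap
```

The point of the design is that the LDAP package can only ever reach the LDAP steps, whatever the installation order is.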

Actions #5

Updated by Benoît PECCATTE over 4 years ago

  • Status changed from New to In progress
Actions #6

Updated by Benoît PECCATTE over 4 years ago

Actions #7

Updated by Benoît PECCATTE over 4 years ago

  • Status changed from In progress to Pending technical review
  • Assignee changed from Benoît PECCATTE to Nicolas CHARLES
  • Pull Request set to https://github.com/Normation/rudder-packages/pull/1078
Actions #8

Updated by Benoît PECCATTE over 4 years ago

  • Status changed from Pending technical review to Pending release
  • % Done changed from 0 to 100
Actions #9

Updated by Vincent MEMBRÉ over 4 years ago

  • Status changed from Pending release to Released

This bug has been fixed in Rudder 3.1.15/14 and 3.2.8/7 which were released today.
