User story #2988

closed

Log agent's report about failure in a dedicated logfile

Added by Vincent MEMBRÉ over 11 years ago. Updated about 9 years ago.

Status: Released
Priority: 3
Category: Packaging

Description

We want a log file that receives every agent report about a failed execution.

The log must be in a syslog-like format (date, etc.), human-readable (no UUID without the corresponding human-readable name), and machine-parsable (so that it can be processed by log-centralizing tools like Logstash or Graylog).

This log file should not contain success reports (we only want to know what is wrong).

Implementation ticket: #3018 => pull request: https://github.com/Normation/rudder/pull/34
Integration ticket: #3024 => pull requests: https://github.com/Normation/rudder-techniques/pull/14 and https://github.com/Normation/rudder-packages/pull/7
Documentation ticket: not decided yet


Subtasks 2 (0 open, 2 closed)

User story #3024: [Integration] Log agent's report about failure in a dedicated logfile (Released, Jonathan CLARKE, 2012-12-12)
User story #3073: add logrotate configuration for "non compliant reports" logfile (Released, Jonathan CLARKE, 2012-12-12)

Related issues 1 (0 open, 1 closed)

Related to Rudder - User story #3195: When upgrading Rudder from 2.4 to 2.5~beta1, the logback.xml files is not updated (Released, Matthieu CERDA, 2013-01-15)
#1

Updated by Vincent MEMBRÉ over 11 years ago

First question:

Do we need a single file for report logging (containing every kind of report), or several files (one per report kind)?

A single file is harder to read, but it centralizes the history, which is very important.

#2

Updated by Vincent MEMBRÉ over 11 years ago

Second question:

Which kinds of reports should go in the log file?

In Rudder, there are 6 kinds of reports: "Success", "Repaired", "Unknown", "Error", "Pending", "NoAnswer".

Success and Pending should not be in the log file.

Error and Repaired should definitely be in it.

Unknown and NoAnswer should be in it too.

So the question is mainly about those last two types.

We could also use the raw report types: "log_warn", "log_repaired", "result_error", "result_repaired". But I think they are less interesting than the Rudder types.
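The filtering rule discussed above could be sketched as follows. This is a minimal Python sketch, not Rudder's code: it assumes report kinds arrive as the plain strings listed above, and since Unknown and NoAnswer are still an open question, they sit behind a hypothetical `include_undecided` flag.

```python
# Rudder report kinds, as listed in the ticket.
ALL_KINDS = {"Success", "Repaired", "Unknown", "Error", "Pending", "NoAnswer"}

# Never logged: we only want to know what is wrong.
NEVER_LOGGED = {"Success", "Pending"}

# Still under discussion in the ticket.
UNDECIDED = {"Unknown", "NoAnswer"}


def should_log(kind: str, include_undecided: bool = True) -> bool:
    """Return True if a report of this kind belongs in the failure log."""
    if kind not in ALL_KINDS:
        raise ValueError(f"unknown report kind: {kind!r}")
    if kind in NEVER_LOGGED:
        return False
    if kind in UNDECIDED:
        # Hypothetical switch for the open question (not a Rudder setting).
        return include_undecided
    # Error and Repaired are always logged.
    return True


if __name__ == "__main__":
    for kind in sorted(ALL_KINDS):
        print(f"{kind}: {should_log(kind)}")
```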

#3

Updated by Vincent MEMBRÉ over 11 years ago

Last question:

There are different ways to implement this system:

First solution: rsyslog rule

  • Write the log file directly with rsyslog on the Rudder server (using a regexp that matches the expected reports).

This is the easiest way (just add an rsyslog rule), but we would only get the raw report data (i.e. log_warn, result_error, UUIDs only).
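For illustration, here is the kind of matching such an rsyslog rule would perform, written as a Python sketch rather than as rsyslog configuration. It assumes (hypothetically) that the raw report type appears in the syslog line as a field delimited by `@@`; the real wire format should be checked before reusing this pattern, and the sample lines are made up.

```python
import re
from typing import Optional

# Raw report types that should end up in the failure log.
RAW_FAILURE_TYPES = ("log_warn", "log_repaired", "result_error", "result_repaired")

# Assumption: the report type is an "@@"-delimited field of the raw line.
FAILURE_RE = re.compile(r"@@(" + "|".join(RAW_FAILURE_TYPES) + r")@@")


def raw_report_type(line: str) -> Optional[str]:
    """Return the raw failure type found in a syslog line, or None to skip it."""
    match = FAILURE_RE.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical raw lines, for illustration only.
    print(raw_report_type("R: @@apache@@result_error@@rule-1@@dir-1@@Could not restart"))
    print(raw_report_type("R: @@apache@@result_success@@rule-1@@dir-1@@Apache is running"))
```

A match only yields the raw type and the UUIDs; resolving rule, directive and node names would need a separate lookup, which is the limitation noted above.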

Second solution: select on existing table

  • Select reports from the "RudderSysEvents" table (the table in the database containing all reports).
    • actually, we need to select only the reports added since the last select, so we must keep track somewhere of what has already been processed
  • Process each report line, filling in what is missing (names, etc.)

Main advantage: it's really easy to do.
Main problem: the report database is huge and already a contention point, so adding big selects on it may not be the wisest thing to do (although we have no figures to confirm this).

We could get every kind of Rudder report, and they could be very readable for users (rule and directive names, node hostname).
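The "keep track of what has already been processed" step could be a high-water mark on a monotonically increasing id. The sketch below uses an in-memory SQLite table as a stand-in for RudderSysEvents; the column names and the `fetch_new_reports` helper are illustrative assumptions, and the mark is only kept in memory, whereas a real implementation would need to persist it.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the reports table (columns are illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ruddersysevents ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " nodeid TEXT, eventtype TEXT, msg TEXT)"
    )
    return conn


def fetch_new_reports(conn: sqlite3.Connection, last_id: int):
    """Return the reports inserted after last_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, nodeid, eventtype, msg FROM ruddersysevents"
        " WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    # Remember the highest id seen so the next run skips these rows.
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id
```

On PostgreSQL, ids taken from a sequence can become visible out of order under concurrent inserts, so a strict `id > last_id` filter could miss late-committed rows; this sketch ignores that.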

Third solution: create a new "failed report" table in database

  • Create a new "report logging" table in the database,
  • Filter reports from syslog and insert only those matching an error or a repair (log_warn, log_repaired, result_error, result_repaired),
  • Read that table every 5 minutes to get the latest reports, process them, and append the result to the log file,
  • Remove the reports from the database (so that disk usage stays low, even without a VACUUM FULL)
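A minimal sketch of this third solution's lifecycle (filter on insert, periodic processing, then deletion), again using an in-memory SQLite table as a stand-in. The table name `report_logging`, the helper functions and the output line shape are hypothetical; the 5-minute scheduling is left out, since `flush_buffer` would simply be called by a timer.

```python
import sqlite3
from typing import Callable

# Only error/repair reports are copied into the buffer table.
FAILURE_TYPES = {"log_warn", "log_repaired", "result_error", "result_repaired"}


def make_buffer_db() -> sqlite3.Connection:
    """Hypothetical "report logging" table, in memory for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report_logging ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " nodeid TEXT, eventtype TEXT, msg TEXT)"
    )
    return conn


def buffer_report(conn: sqlite3.Connection, nodeid: str, eventtype: str, msg: str) -> None:
    """Filter step: insert (and commit) the report only if it is an error or a repair."""
    if eventtype in FAILURE_TYPES:
        with conn:
            conn.execute(
                "INSERT INTO report_logging (nodeid, eventtype, msg) VALUES (?, ?, ?)",
                (nodeid, eventtype, msg),
            )


def flush_buffer(conn: sqlite3.Connection, write_line: Callable[[str], None]) -> int:
    """Periodic step: write buffered reports to the log, then delete them."""
    with conn:  # commits the DELETE on success; if write_line raises, nothing is deleted
        rows = conn.execute(
            "SELECT id, nodeid, eventtype, msg FROM report_logging ORDER BY id"
        ).fetchall()
        for _id, nodeid, eventtype, msg in rows:
            write_line(f"N:{nodeid} S:{eventtype} {msg}")
        if rows:
            # Delete only the rows that were read, so newer rows are kept.
            conn.execute("DELETE FROM report_logging WHERE id <= ?", (rows[-1][0],))
    return len(rows)
```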

Advantages:

  • resource usage would be lower than with the second solution,
  • reports would be very readable for users (rule and directive names, node hostname)

Disadvantages:

  • it adds a new data table, with many report lines cloned from syslog
  • we would only have "error" and "repaired" statuses (so no way to compute "no answer" lines)

Fourth solution: fully duplicate report table in database

  • Insert all reports (even Success) into a new "report logging" table in the database, so that this table fully duplicates what is added to RudderSysEvents,
  • Read that table every 5 minutes to get the latest reports, and process them,
  • then append them to the log file and remove them from the table.

Resource usage would be lower than with the second solution, and every kind of report would be available (so we could compute "no answer" if we want).
Reports would be very readable for users (rule and directive names, node hostname).
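The fourth solution differs mainly in where filtering happens: every report is buffered, and the choice of what to write is made at processing time, which also makes per-status counts available. A hypothetical sketch, with the same in-memory SQLite stand-in and illustrative names (`duplicate_report`, `flush_full_buffer` are not Rudder functions):

```python
import sqlite3
from collections import Counter
from typing import Callable

# Statuses written to the log file; all other statuses are only counted.
LOGGED_TYPES = {"log_warn", "log_repaired", "result_error", "result_repaired"}


def make_full_buffer_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report_logging ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " nodeid TEXT, eventtype TEXT, msg TEXT)"
    )
    return conn


def duplicate_report(conn: sqlite3.Connection, nodeid: str, eventtype: str, msg: str) -> None:
    """Every report, success included, is copied into the buffer table."""
    with conn:
        conn.execute(
            "INSERT INTO report_logging (nodeid, eventtype, msg) VALUES (?, ?, ?)",
            (nodeid, eventtype, msg),
        )


def flush_full_buffer(conn: sqlite3.Connection, write_line: Callable[[str], None]) -> Counter:
    """Log non-success reports, count every (node, status) pair, then purge."""
    counts: Counter = Counter()
    with conn:
        rows = conn.execute(
            "SELECT id, nodeid, eventtype, msg FROM report_logging ORDER BY id"
        ).fetchall()
        for _id, nodeid, eventtype, msg in rows:
            counts[(nodeid, eventtype)] += 1
            if eventtype in LOGGED_TYPES:
                write_line(f"N:{nodeid} S:{eventtype} {msg}")
        if rows:
            conn.execute("DELETE FROM report_logging WHERE id <= ?", (rows[-1][0],))
    return counts
```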

I would go for the fourth solution.
It may be more complicated, but it offers the best result (every kind of report, readable reports) without too much impact on the system (it avoids handling the huuuuuge "RudderSysEvents" table).

So which one do you consider best?

#4

Updated by Vincent MEMBRÉ over 11 years ago

  • Status changed from New to Discussion
  • Assignee changed from Vincent MEMBRÉ to Jonathan CLARKE
#5

Updated by François ARMAND over 11 years ago

  • Subject changed from Add a "non success report" logging system to Log agent report about failure in a dedicated file
  • Description updated (diff)

Some more remarks from Vincent:

Before beginning the implementation, I have some questions; they will be in a later update of the ticket.
I'm not sure about the ticket category; I think "IT infrastructure and tools" fits best.
#6

Updated by François ARMAND over 11 years ago

  • Subject changed from Log agent report about failure in a dedicated file to Log agent's report about failure in a dedicated logfile

Better title

#7

Updated by Vincent MEMBRÉ over 11 years ago

There are also some questions about the format we want for these logs.

For the moment, my logger format is:

LOG_DATE LOGGER_HOSTNAME rudder-reports[LOGGER_NAME]: [LOG_LEVEL] Report sent by node 'NODE_HOSTNAME' executed on EXECUTION_DATE for Rule 'RULE_NAME' is in status 'REPORT_STATUS' on directive 'DIRECTIVE_NAME', component 'COMPONENT_NAME', value 'COMPONENT_KEY' with message : 'REPORT_MESSAGE'

nov. 16 15:56:22 mutex rudder-reports[repaired]: [WARN]  Report sent by node 'server.rudder.local' executed on 2012-11-16 15:55:30 for Rule 'distributePolicy' is in status 'result_repaired' on directive 'Distribute Policy', component 'Check WebDAV credentials', value 'None' with message : 'The Rudder WebDAV user and password were updated'

LOGGER_NAME, REPORT_STATUS and LOG_LEVEL are linked. For the moment, they are mapped as follows (logger name / report status / log level):

  • error / result_error / ERROR
  • repaired / result_repaired / WARN
  • log / log_warn/repair / INFO

Maybe some logs should be treated as DEBUG or TRACE level.
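The mapping above can be expressed as a lookup table feeding the formatter. This Python sketch is illustrative only: the `Report` dataclass and `format_report` are hypothetical, and mapping `log_repaired` to the `log` logger at INFO level is an assumption about what "log_warn/repair" covers.

```python
from dataclasses import dataclass
from datetime import datetime

# From the ticket: report status -> (logger name, log level).
# Assumption: "log_warn/repair" covers both log_warn and log_repaired.
STATUS_TO_LOGGER = {
    "result_error": ("error", "ERROR"),
    "result_repaired": ("repaired", "WARN"),
    "log_warn": ("log", "INFO"),
    "log_repaired": ("log", "INFO"),
}


@dataclass
class Report:
    """Hypothetical in-memory report, with names already resolved."""
    node_hostname: str
    execution_date: datetime
    rule_name: str
    status: str
    directive_name: str
    component: str
    key: str
    message: str


def format_report(report: Report, log_date: datetime, logger_hostname: str) -> str:
    """Render one report in the proposed syslog-like logger format."""
    logger_name, level = STATUS_TO_LOGGER[report.status]
    return (
        f"{log_date:%b %d %H:%M:%S} {logger_hostname} rudder-reports[{logger_name}]: "
        f"[{level}] Report sent by node '{report.node_hostname}' "
        f"executed on {report.execution_date:%Y-%m-%d %H:%M:%S} "
        f"for Rule '{report.rule_name}' is in status '{report.status}' "
        f"on directive '{report.directive_name}', component '{report.component}', "
        f"value '{report.key}' with message : '{report.message}'"
    )
```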

#8

Updated by Vincent MEMBRÉ over 11 years ago

For this version, the solution kept is:

  • one log file
  • raw report statuses: result_error, result_repaired, log_repaired, log_warn
  • no new table: the RudderSysEvents table is used as the source of logged reports
  • reports will not be analysed

The log format is: [Execution date] N:nodeId(Node hostname) S:report_status R:ruleId(rule name) D:directiveId(directive name) C:component name V:keyValue message

Here is an example log:

[20/nov./2012 09:41 +0100] N:root(server.rudder.local) S:result_repaired R:hasPolicyServer-root(Rudder system policy: basic setup (common)) D:common-root(Common) C:Security parameters V:None Some internal security parameters were adjusted
[20/nov./2012 09:41 +0100] N:root(server.rudder.local) S:result_repaired R:root-DP(distributePolicy) D:root-distributePolicy(Distribute Policy) C:Check WebDAV credentials V:None The Rudder WebDAV user and password were updated
[20/nov./2012 09:45 +0100] N:root(server.rudder.local) S:result_repaired R:hasPolicyServer-root(Rudder system policy: basic setup (common)) D:common-root(Common) C:Security parameters V:None Some internal security parameters were adjusted
[20/nov./2012 09:45 +0100] N:root(server.rudder.local) S:result_repaired R:root-DP(distributePolicy) D:root-distributePolicy(Distribute Policy) C:Check WebDAV credentials V:None The Rudder WebDAV user and password were updated
[20/nov./2012 09:46 +0100] N:8fee32c0-00c8-4ae8-8597-5a54a9b6a523(node1.rudder.local) S:result_error R:d0c4a57e-4715-467b-a036-a907c0d9deff(Rule42) D:87097897-95cc-4868-ab39-690b6d11076d(apache) C:apacheServer V:None Could not restart Apache HTTPD
[20/nov./2012 09:46 +0100] N:8fee32c0-00c8-4ae8-8597-5a54a9b6a523(node1.rudder.local) S:result_error R:d0c4a57e-4715-467b-a036-a907c0d9deff(Rule42) D:87097897-95cc-4868-ab39-690b6d11076d(apache) C:apacheServer V:None Apache binary is not present. Something is wrong (installation failure ?)
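Since the log must be machine-parsable, here is a sketch of a parser (and the matching formatter) for this final format, using a Python regular expression. It is an illustration, not Rudder's implementation, and it relies on two assumptions: names may contain parentheses (as in "(common)" above) because each name is delimited by the following field marker, but the key value must not contain spaces, otherwise `V:` and the message cannot be told apart.

```python
import re
from typing import Dict

# Each name is captured lazily up to the next field marker, so names that
# contain parentheses, e.g. "basic setup (common)", are still parsed.
# Assumption: the key value (V:) contains no spaces.
LINE_RE = re.compile(
    r"^\[(?P<date>[^\]]+)\] "
    r"N:(?P<node_id>[^(]+)\((?P<node>[^)]*)\) "
    r"S:(?P<status>\S+) "
    r"R:(?P<rule_id>[^(]+)\((?P<rule>.*?)\) "
    r"D:(?P<directive_id>[^(]+)\((?P<directive>.*?)\) "
    r"C:(?P<component>.*?) "
    r"V:(?P<key>\S+) "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Dict[str, str]:
    """Split one log line into its named fields."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a report log line: {line!r}")
    return match.groupdict()


def format_line(f: Dict[str, str]) -> str:
    """Inverse of parse_line: rebuild the log line from its fields."""
    return (
        f"[{f['date']}] N:{f['node_id']}({f['node']}) S:{f['status']} "
        f"R:{f['rule_id']}({f['rule']}) D:{f['directive_id']}({f['directive']}) "
        f"C:{f['component']} V:{f['key']} {f['message']}"
    )
```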

#9

Updated by Vincent MEMBRÉ over 11 years ago

  • Description updated (diff)
#10

Updated by Vincent MEMBRÉ over 11 years ago

  • Status changed from Discussion to In progress
  • Assignee changed from Jonathan CLARKE to Vincent MEMBRÉ
#11

Updated by Vincent MEMBRÉ over 11 years ago

  • Description updated (diff)
#12

Updated by Jonathan CLARKE over 11 years ago

  • Category changed from 13 to 11

François ARMAND wrote:

Some more remarks from Vincent:

I'm not sure about the ticket category; I think "IT infrastructure and tools" fits best.

No, "IT infrastructure and tools" is for resources that support the Rudder project (this Redmine, the servers it runs on, GitHub, etc.). This is clearly a system integration task.

#13

Updated by Jonathan CLARKE over 11 years ago

  • Target version set to 2.5.0~beta1
#14

Updated by François ARMAND over 11 years ago

  • Status changed from In progress to 13
#15

Updated by Vincent MEMBRÉ over 11 years ago

  • Description updated (diff)
#16

Updated by Vincent MEMBRÉ over 11 years ago

  • Description updated (diff)
#17

Updated by Jonathan CLARKE over 11 years ago

This is now awaiting the Technical Review on #3018 (development). Integration is all done.

#18

Updated by Nicolas CHARLES about 11 years ago

  • Status changed from 13 to 10

All subtickets are pending release; I'm correcting the status of the meta ticket.

#19

Updated by Nicolas CHARLES about 11 years ago

  • Status changed from 10 to 12
#20

Updated by Nicolas CHARLES about 11 years ago

  • Status changed from 12 to Pending release
#21

Updated by Matthieu CERDA about 11 years ago

  • Status changed from Pending release to Released
#22

Updated by Nicolas PERRON about 11 years ago

  • Project changed from Rudder to 34
  • Category deleted (11)
#23

Updated by Benoît PECCATTE about 9 years ago

  • Project changed from 34 to Rudder
  • Category set to Packaging