Bug #18203

Missing report with directive Scheduled Job

Added by P C 8 months ago. Updated about 20 hours ago.

Status: New
Priority: N/A
Assignee: -
Category: Agent
Target version:
Severity: Major - prevents use of part of Rudder | no simple workaround
User visibility:
Effort required:
Priority: 0

Description

I have a general setup period of 6 hours with a 5-hour max delay, and a Scheduled Job directive configured like this (for all my 20 nodes):
Lowest time the command should be run at: 1
Highest time the command should be run at: 4
Consider the job failed after (minutes): 120
Return codes considered as a success: 0
Return codes considered as a repair: 1
Return codes considered as an error: 2

Regularly, some nodes (not always the same ones; usually only 1, but sometimes more) show the compliance message 'missing report'.


Related issues

Related to Rudder - Bug #18732: backport fix on background command execution on agent (Released, Alexis MOUSSET)
Actions #1

Updated by P C 7 months ago

  • Subject changed from Missing report with directive Scheduled Job when Opening time shorter then period to Missing report with directive Scheduled Job
  • Severity changed from Minor - inconvenience | misleading | easy workaround to Major - prevents use of part of Rudder | no simple workaround

I have a general setup period of 2 hours with a 1-hour max delay, and 2 Scheduled Job directives configured like this (for all my 20 nodes):
Job1
Lowest time the command should be run at: 1
Highest time the command should be run at: 4
Consider the job failed after (minutes): 240
Return codes considered as a success: 0
Return codes considered as a repair: 1
Return codes considered as an error: 2
Job2
Lowest time the command should be run at: 0
Highest time the command should be run at: 23
Consider the job failed after (minutes): 240
Return codes considered as a success: 0
Return codes considered as a repair: 1
Return codes considered as an error: 2

Regularly, some nodes (not always the same ones; usually only 1, but sometimes more) show the compliance message 'missing report', sometimes for one job, sometimes for both.

Actions #2

Updated by Nicolas CHARLES 7 months ago

Some more details:
On a node with the agent scheduled every 2 hours, the command was executed at:

2020-09-30T00:50:10+00:00
2020-10-01T00:50:33+00:00
2020-10-02T00:50:03+00:00
2020-10-03T00:51:06+00:00
2020-10-05T00:50:29+00:00

So there was no run on the 4th or the 6th (confirmed by a file touched at the beginning of the action).
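
For anyone repeating this check on another node, here is a minimal Python sketch that lists the days without a run. The timestamps are the ones above; the function name `missing_days` and the observation window (2020-09-30 through 2020-10-06) are my assumptions for illustration.

```python
from datetime import date, datetime, timedelta

# Execution timestamps copied from the comment above.
RUNS = [
    "2020-09-30T00:50:10+00:00",
    "2020-10-01T00:50:33+00:00",
    "2020-10-02T00:50:03+00:00",
    "2020-10-03T00:51:06+00:00",
    "2020-10-05T00:50:29+00:00",
]


def missing_days(timestamps, first_day, last_day):
    """Return every date in [first_day, last_day] that has no recorded run."""
    seen = {datetime.fromisoformat(ts).date() for ts in timestamps}
    span = (last_day - first_day).days
    candidates = (first_day + timedelta(days=i) for i in range(span + 1))
    return [day for day in candidates if day not in seen]


if __name__ == "__main__":
    # Assumed observation window: 2020-09-30 to 2020-10-06 inclusive.
    for day in missing_days(RUNS, date(2020, 9, 30), date(2020, 10, 6)):
        print(day.isoformat())
```

With the window assumed here, this prints 2020-10-04 and 2020-10-06.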

When a job scheduler run was expected, the log only says:

2020-10-05T00:07:54+00:00 R: @@jobScheduler@@log_info@@xxxxxxxxx-xxxxxxxx@@yyyy-yyyyy@@0@@None@@job_to_run_zzzz_zzzzz@@2020-10-05 00:07:46+00:00##nodeId@#Scheduling job1_to_run_zzzz_zzzzz was correct
2020-10-05T00:07:54+00:00 R: @@jobScheduler@@log_info@@xxxxxxxxx-xxxxxxxx@@yyyy-yyyyy@@0@@None@@job_to_run_zzzz_zzzzz@@2020-10-05 00:07:46+00:00##nodeId@#Scheduling Scheduling job2_to_run_zzzz_zzzzz was correct
2020-10-05T00:07:54+00:00 R: @@jobScheduler@@log_info@@xxxxxxxxx-xxxxxxxx@@yyyy-yyyyy@@0@@Job@@command1@@2020-10-05 00:07:46+00:00##nodeId@#The command will be run at a random time after 00:00 on this node
2020-10-05T00:07:54+00:00 R: @@jobScheduler@@log_info@@xxxxxxxxx-xxxxxxxx@@yyyy-yyyyy@@0@@Job@@command2@@2020-10-05 00:07:46+00:00##nodeId@#The command will be run at a random time after 00:00 on this node
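
To filter lines like these out of a larger agent log, a small parser can help. This is only a sketch: the field names (`technique`, `report_type`, `rule_id`, `directive_id`, `version`, `component`, `key`, `executed_at`, `node_id`, `message`) are my reading of the `@@`-separated layout seen above, not names taken from this thread, and `parse_report` is a hypothetical helper.

```python
import re

# Layout inferred from the sample lines above (field names are assumptions):
# R: @@technique@@type@@rule@@directive@@version@@component@@key@@time##node@#message
REPORT_RE = re.compile(
    r"R: @@(?P<technique>[^@]*)@@(?P<report_type>[^@]*)@@(?P<rule_id>[^@]*)"
    r"@@(?P<directive_id>[^@]*)@@(?P<version>[^@]*)@@(?P<component>[^@]*)"
    r"@@(?P<key>[^@]*)@@(?P<executed_at>[^#]*)##(?P<node_id>[^@]*)@#(?P<message>.*)$"
)


def parse_report(line):
    """Return the report fields as a dict, or None if the line does not match."""
    match = REPORT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2020-10-05T00:07:54+00:00 R: @@jobScheduler@@log_info@@xxxxxxxxx-xxxxxxxx"
        "@@yyyy-yyyyy@@0@@Job@@command1@@2020-10-05 00:07:46+00:00##nodeId"
        "@#The command will be run at a random time after 00:00 on this node"
    )
    report = parse_report(sample)
    print(report["report_type"], report["component"], report["key"])
```

All four lines above parse with type `log_info`. If compliance needs a `result_*` report for each component (my understanding of Rudder, not something stated in this thread), that would be consistent with the 'missing report' status.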

There are 2 jobs; this might be related.

Actions #3

Updated by Nicolas CHARLES 5 months ago

  • Target version set to 6.1.7

Nothing looks weird on the code side, but it could be related to https://github.com/cfengine/core/pull/4257

Actions #4

Updated by Nicolas CHARLES 5 months ago

  • Related to Bug #18732: backport fix on background command execution on agent added
Actions #5

Updated by Vincent MEMBRÉ 5 months ago

  • Target version changed from 6.1.7 to 6.1.8
Actions #6

Updated by Vincent MEMBRÉ 4 months ago

  • Target version changed from 6.1.8 to 6.1.9
Actions #7

Updated by Vincent MEMBRÉ 4 months ago

  • Target version changed from 6.1.9 to 6.1.10
Actions #8

Updated by Nicolas CHARLES 2 months ago

The lock condition is invalid; it should be !job_scheduler_lock_${iterator}_&RudderUniqueID&
I don't think it causes the problem here, but it is wrong.

Actions #9

Updated by Vincent MEMBRÉ 2 months ago

  • Target version changed from 6.1.10 to 6.1.11
Actions #10

Updated by Vincent MEMBRÉ about 2 months ago

  • Target version changed from 6.1.11 to 6.1.12
Actions #11

Updated by Vincent MEMBRÉ about 1 month ago

  • Target version changed from 6.1.12 to 6.1.13
Actions #12

Updated by Vincent MEMBRÉ about 20 hours ago

  • Target version changed from 6.1.13 to 6.1.14