Bug #8611 (closed)

Service ensure running keeps staying repaired

Added by Florian Heigl almost 8 years ago. Updated almost 2 years ago.

Status: Rejected
Priority: N/A
Assignee: -
Category: Generic methods
Target version:
Severity: Minor - inconvenience | misleading | easy workaround
UX impact:
User visibility: Operational - other Techniques | Technique editor | Rudder settings
Effort required:
Priority: 0
Name check:
Fix check:
Regression:

Description

This looks like something related to systemd-shim.

root@ref-one-master:~# ps -ef | grep sunstone
root 1011 30971 0 19:45 pts/1 00:00:00 grep sunstone
oneadmin 19214 1 0 15:50 ? 00:00:00 python /usr/share/one/websockify/websocketproxy.py --target-config=/var/lib/one/sunstone_vnc_tokens 29876
oneadmin 19236 1 0 15:50 ? 00:00:03 ruby /usr/lib/one/sunstone/sunstone-server.rb
root@ref-one-master:~# systemctl -a | grep sunstone
opennebula-sunstone.service loaded active exited LSB: Sunstone init script
root@ref-one-master:~# rudder agent run
Rudder agent 3.2.4-jessie0 (CFEngine Core 3.7.1)
Node uuid: 85f6affd-37a7-4fac-85fa-6f7005a2fde0
Start execution with config [279964146]

Result Technique Component Key Message
repaired bugfix_sunstone Service ensure running opennebula-sunsto| Ensure that service opennebula-sunstone is running was repaired
error config_ONE_sunstone_webs| Service check running opennebula-sunsto| Not applicable could not be repaired
error install_opennebula_client Package install nfs-client Install or update package nfs-client in version latest could not be repaired

# Summary #####################################################################
repaired: 1
error: 2
execution time: 19.69s
################################################################################
root@ref-one-master:~# rudder agent run
Rudder agent 3.2.4-jessie0 (CFEngine Core 3.7.1)
Node uuid: 85f6affd-37a7-4fac-85fa-6f7005a2fde0
Start execution with config [279964146]

Result Technique Component Key Message
repaired bugfix_sunstone Service ensure running opennebula-sunsto| Ensure that service opennebula-sunstone is running was repaired
error config_ONE_sunstone_webs| Service check running opennebula-sunsto| Not applicable could not be repaired
error install_opennebula_client Package install nfs-client Install or update package nfs-client in version latest could not be repaired

# Summary #####################################################################
repaired: 1
error: 2
execution time: 19.90s

It tries to restart the service on every run, even though systemd reports it as active (and it is running).
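The `systemctl -a` row in the transcript follows the usual UNIT, LOAD, ACTIVE, SUB and DESCRIPTION columns. Here is a minimal Python sketch that splits such a row. It assumes whitespace-separated columns with a free-text description in the last position. `parse_unit_line` and `UnitState` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class UnitState(NamedTuple):
    unit: str
    load: str
    active: str
    sub: str
    description: str


def parse_unit_line(line: str) -> UnitState:
    """Split one `systemctl -a` row into its columns (hypothetical helper)."""
    # maxsplit=4 keeps the free-text description (which contains spaces) whole
    unit, load, active, sub, description = line.split(None, 4)
    return UnitState(unit, load, active, sub, description)


row = "opennebula-sunstone.service loaded active exited LSB: Sunstone init script"
state = parse_unit_line(row)
print(state.active, state.sub)  # active exited
```

The unit is `active` with sub-state `exited`. In other words, systemd considers the LSB wrapper started but is not tracking a running main process for it. Meanwhile, the Sunstone processes shown by `ps` are running with parent PID 1.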

root@ref-one-master:~# cat /etc/debian_version
8.4

root@ref-one-master:~# rudder agent info
Hostname: ref-one-master
UUID: 85f6affd-37a7-4fac-85fa-6f7005a2fde0
Key Hash: sdflksdjflskjslfkjlksdjfkljsd983b5b8ae
Policy server: sljsdlfjsdlfjsdlfj
Roles: rudder-agent
Agent is enabled
Policy updated: 2016-06-27 19:31:10
Inventory sent: 2016-06-27 15:18:49
Version: Rudder agent 3.2.4-jessie0 (CFEngine Core 3.7.1)


Files

service_ensure_running.txt (25.7 KB), Ferenc Ulrich, 2017-01-24 10:34

Related issues: 2 (0 open, 2 closed)

Related to Rudder - Bug #7247: service detection based on ps may not work for start/restart method (Rejected)
Related to Rudder - Architecture #7192: Rewrite the service_* methods (Released, Nicolas CHARLES)

#1

Updated by Alexis Mousset over 7 years ago

  • Category set to Generic methods
#2

Updated by Ferenc Ulrich about 7 years ago

The method keeps reporting repaired even though the service has been running fine for months.
It should report _kept if no action was taken, and _repaired if the service had to be restarted.
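The expected behaviour can be expressed as "check first, act only if needed." Below is a minimal Python sketch of that outcome logic. It is not the ncf implementation. `ensure_running` is a hypothetical name, and the check and start actions are passed in as callables.

```python
from typing import Callable


def ensure_running(is_running: Callable[[], bool],
                   start: Callable[[], None]) -> str:
    """Return 'kept', 'repaired' or 'error' for a 'service is running' promise."""
    if is_running():
        return "kept"        # already running: no action taken
    start()                  # attempt the repair
    # Re-check so that 'repaired' is only reported if the start really worked
    return "repaired" if is_running() else "error"


# A service that is already up must report "kept", not "repaired"
print(ensure_running(lambda: True, lambda: None))  # kept
```

With this structure, a `repaired` result always means the check failed first. So the reported status can only be as reliable as `is_running`.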

#3

Updated by Alexis Mousset about 7 years ago

  • Category changed from Generic methods to Generic methods - Service Management
#4

Updated by Florian Heigl about 7 years ago

As you can see in the logs, it doesn't just report incorrectly: it tries to start the service on every agent run.
It also seems to use ps/grep instead of asking systemd. Is that intentional?
Knowing whether a process is running is the one thing systemd doesn't get wrong.
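The `ps` output in the description shows why a process-list check can disagree with systemd: none of the command lines contain the unit name `opennebula-sunstone`. The sketch below assumes the check runs a regex search over the process list. The actual pattern used by the generic method is not shown in this ticket, and `matching_processes` is a hypothetical helper.

```python
import re

# Command lines copied from the `ps -ef` output in the description
PS_COMMANDS = [
    "python /usr/share/one/websockify/websocketproxy.py "
    "--target-config=/var/lib/one/sunstone_vnc_tokens 29876",
    "ruby /usr/lib/one/sunstone/sunstone-server.rb",
]


def matching_processes(commands: list[str], pattern: str) -> list[str]:
    """Return the command lines a grep-style regex search would match."""
    regex = re.compile(pattern)
    return [cmd for cmd in commands if regex.search(cmd)]


print(len(matching_processes(PS_COMMANDS, "opennebula-sunstone")))  # 0
print(len(matching_processes(PS_COMMANDS, "sunstone")))             # 2
```

A pattern based on the service name misses the service entirely. A looser pattern matches both processes, including the VNC proxy. Multiple matches are also what you would get with a daemon running twice under different users.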

#5

Updated by Florian Heigl about 7 years ago

I'm not sure exactly how the query goes wrong, but I can see that both services showing this issue have systemd unit files, and systemctl status works on them.
So I'm fairly sure the right long-term fix is to ask systemd.
Our lldp example runs twice under different users, which might be part of the problem.
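Asking systemd directly could look like the following minimal Python sketch. It relies on `systemctl is-active --quiet UNIT`, which prints nothing and exits with status 0 when the unit is active. `unit_is_active` and its `systemctl` parameter (the binary to invoke) are hypothetical names chosen for illustration. Note that, according to the transcript in the description, an LSB unit can be `active` while its sub-state is `exited`.

```python
import subprocess


def unit_is_active(unit: str, systemctl: str = "systemctl") -> bool:
    """Ask systemd whether UNIT is active instead of scanning the process list."""
    # --quiet suppresses output; the exit status alone carries the answer
    result = subprocess.run([systemctl, "is-active", "--quiet", unit],
                            check=False)
    return result.returncode == 0
```

On the node from the description, `unit_is_active("opennebula-sunstone.service")` would be expected to return True, because `systemctl -a` lists that unit as active.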

The other process (ossec-hids) was actually broken, so that seems to be a different issue.

#6

Updated by Alexis Mousset about 7 years ago

  • Assignee set to Alexis Mousset
#7

Updated by Alexis Mousset about 7 years ago

  • Target version set to 0.x
#8

Updated by Alexis Mousset about 7 years ago

  • Target version changed from 0.x to 1.0
#9

Updated by Alexis Mousset about 7 years ago

  • Status changed from New to In progress
#10

Updated by Alexis Mousset about 7 years ago

  • Status changed from In progress to New
  • Target version changed from 1.0 to 0.x

This happens because the running-process check uses the process list rather than systemd. We will backport (at least) the systemctl check to v0.x.
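A backport could choose the check at runtime: use systemd when it is the running init system, and fall back to the process list otherwise. The sketch below uses the same test as systemd's `sd_booted()`, which checks whether the directory `/run/systemd/system` exists. `choose_status_backend` is a hypothetical name, and the returned labels are placeholders.

```python
import os


def systemd_booted(run_dir: str = "/run/systemd/system") -> bool:
    """Same heuristic as sd_booted(): systemd creates this directory at boot."""
    return os.path.isdir(run_dir)


def choose_status_backend(run_dir: str = "/run/systemd/system") -> str:
    """Pick 'systemctl' on systemd hosts, else fall back to 'process-list'."""
    return "systemctl" if systemd_booted(run_dir) else "process-list"
```

Keeping the process-list path as a fallback would preserve behaviour on nodes that do not boot with systemd.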

#11

Updated by Alexis Mousset about 7 years ago

  • Related to Bug #7247: service detection based on ps may not work for start/restart method added
#12

Updated by Alexis Mousset about 7 years ago

This was fixed in #7192 on the master branch; we can try to backport a fix (systemd-based service status checks) to 0.x.

#13

Updated by Alexis Mousset about 7 years ago

#14

Updated by Benoît PECCATTE about 7 years ago

  • Severity set to Minor - inconvenience | misleading | easy workaround
  • User visibility set to Operational - other Techniques | Technique editor | Rudder settings
  • Priority set to 15
#15

Updated by Jonathan CLARKE almost 7 years ago

  • Assignee deleted (Alexis Mousset)
  • Priority changed from 15 to 14
#16

Updated by Alexis Mousset almost 7 years ago

  • Status changed from New to Rejected

This bug is fixed in 4.1, and the required changes are considered too large and too disruptive to backport.

I'm closing this ticket, users should upgrade to 4.1 when possible.

#17

Updated by Alexis Mousset almost 2 years ago

  • Target version changed from 0.x to ncf-0.x
  • Priority changed from 14 to 0
#18

Updated by Alexis Mousset almost 2 years ago

  • Project changed from 41 to Rudder
  • Category changed from Generic methods - Service Management to Generic methods