Bug #2446

closed

We have a server on which cf-agent was stuck for 4 days

Added by Nicolas PERRON almost 12 years ago. Updated about 9 years ago.

Status: Rejected
Priority: 2
Category: Web - Config management

Description

We are using Rudder 2.3, which uses cf-agent 3.2.0, and cf-agent stayed stuck on one of our servers for 4 days. We may have hit a CFEngine bug.

Apr 12 06:30:08 server /USR/SBIN/CRON[28201]: (root) CMD (if [ `ps -efww | grep cf-execd | grep "/var/rudder/cfengine-community/bin/cf-execd" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-execd; fi)
Apr 12 06:30:08 server /USR/SBIN/CRON[28202]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 12 06:33:51 server kernel: [11903872.036045] BUG: soft lockup - CPU#0 stuck for 69s! [cf-agent:13980]
Apr 12 06:33:51 server kernel: [11903872.036876] Modules linked in: xt_tcpudp xt_state ipt_LOG iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables tun loop snd_pcm snd_timer snd soundcore i2c_piix4 snd_page_alloc i2c_core psmouse pcspkr serio_raw evdev virtio_balloon processor button ext3 jbd mbcache sg sr_mod cdrom ata_generic ata_piix virtio_blk virtio_net uhci_hcd ehci_hcd thermal libata floppy thermal_sys virtio_pci virtio_ring virtio usbcore nls_base scsi_mod [last unloaded: scsi_wait_scan]
Apr 12 06:33:51 server kernel: [11903872.036876] CPU 0:
Apr 12 06:33:51 server kernel: [11903872.036876] Modules linked in: xt_tcpudp xt_state ipt_LOG iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables tun loop snd_pcm snd_timer snd soundcore i2c_piix4 snd_page_alloc i2c_core psmouse pcspkr serio_raw evdev virtio_balloon processor button ext3 jbd mbcache sg sr_mod cdrom ata_generic ata_piix virtio_blk virtio_net uhci_hcd ehci_hcd thermal libata floppy thermal_sys virtio_pci virtio_ring virtio usbcore nls_base scsi_mod [last unloaded: scsi_wait_scan]
Apr 12 06:33:51 server kernel: [11903872.036876] Pid: 13980, comm: cf-agent Not tainted 2.6.32-5-amd64 #1 Bochs
Apr 12 06:33:51 server kernel: [11903872.036876] RIP: 0033:[<00007f0ffa46f200>]  [<00007f0ffa46f200>] 0x7f0ffa46f200
Apr 12 06:33:51 server kernel: [11903872.036876] RSP: 002b:00007fffa3c8a300  EFLAGS: 00000246
Apr 12 06:33:51 server kernel: [11903872.036876] RAX: 0000000001fa6420 RBX: 0000000001fbaa50 RCX: 0000000001fa4a00
Apr 12 06:33:51 server kernel: [11903872.036876] RDX: 000000000000005f RSI: 0000000000000003 RDI: 0000000001fa6420
Apr 12 06:33:51 server kernel: [11903872.036876] RBP: ffffffff8101166e R08: 0000000000000000 R09: 00007fffa3c8a490
Apr 12 06:33:51 server kernel: [11903872.036876] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
Apr 12 06:33:51 server kernel: [11903872.036876] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Apr 12 06:33:51 server kernel: [11903872.036876] FS:  00007f0ffb3e1700(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
Apr 12 06:33:52 server kernel: [11903872.036876] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 12 06:33:52 server kernel: [11903872.036876] CR2: 0000000000418760 CR3: 0000000001764000 CR4: 00000000000006f0
Apr 12 06:33:52 server kernel: [11903872.036876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 12 06:33:52 server kernel: [11903872.036876] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 12 06:33:52 server kernel: [11903872.036876] Call Trace:
Apr 12 06:35:09 server /USR/SBIN/CRON[28261]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 12 06:35:28 server /USR/SBIN/CRON[28263]: (root) CMD (if [ `ps -efww | grep cf-execd | grep "/var/rudder/cfengine-community/bin/cf-execd" | grep -v grep | wc -l` -eq 0 ]; then /var/rudder/cfengine-community/bin/cf-execd; fi)
Apr 12 06:37:45 server rudder[28272]:  R: @@Common@@result_success@@hasPolicyServer-root@@common-root@@19@@Update@@None@@$(g.execRun)##fcd702a9-d3d3-4267-b326-211a5608332b@#Policy and dependencies already up to date. No action required.
Actions #1

Updated by Jonathan CLARKE almost 12 years ago

  • Category set to 14
  • Target version changed from 2.3.7 to 2.3.8

According to a Google search, the message "BUG: soft lockup - CPU#0 stuck for" seems to be related to virtualized environments. Maybe CFEngine handles it badly and remains stuck.

I guess we should report this bug directly to CFEngine.

Actions #2

Updated by Jonathan CLARKE over 11 years ago

  • Target version changed from 2.3.8 to 2.3.9
Actions #3

Updated by Nicolas PERRON over 11 years ago

  • Target version changed from 2.3.9 to 2.3.10
Actions #4

Updated by François ARMAND about 11 years ago

  • Status changed from New to Discussion
  • Assignee set to Nicolas PERRON

We never managed to reproduce it, nor have we encountered it again.

Perhaps we could report it to CFEngine nonetheless, and close this one?

Actions #5

Updated by Matthieu CERDA about 11 years ago

  • Target version changed from 2.3.10 to 2.3.11
Actions #6

Updated by Matthieu CERDA almost 11 years ago

  • Target version changed from 2.3.11 to 2.3.12
Actions #7

Updated by Matthieu CERDA almost 11 years ago

  • Target version changed from 2.3.12 to 2.3.13
Actions #8

Updated by Nicolas PERRON almost 11 years ago

François ARMAND wrote:

We never managed to reproduce it, nor have we encountered it again.

Perhaps we could report it to CFEngine nonetheless, and close this one?

Their bug tracker (https://cfengine.com) only lists CFEngine versions from 3.3.0 to 3.6.x, so I suppose CFEngine 3.2.0 is no longer supported.

Actions #9

Updated by Nicolas PERRON almost 11 years ago

  • Target version changed from 2.3.13 to 84
Actions #10

Updated by Nicolas PERRON almost 11 years ago

  • Target version changed from 84 to 2.4.7
Actions #11

Updated by Nicolas PERRON over 10 years ago

  • Assignee changed from Nicolas PERRON to Jonathan CLARKE

What should we do with this bug? CFEngine 3.2.x no longer seems to be supported.

Actions #12

Updated by Jonathan CLARKE over 10 years ago

  • Status changed from Discussion to Rejected

Nicolas PERRON wrote:

What should we do with this bug? CFEngine 3.2.x no longer seems to be supported.

We have put in place automatic detection for many of the cases where CFEngine gets stuck. Hopefully this has fixed the issue, since no one has seen it again since we first opened this ticket 15 months ago.

Therefore, let's close it. We can always re-open if it happens again.
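
For readers wondering what detecting a stuck agent could look like, here is a minimal sketch. It is not the mechanism Rudder actually ships; it only illustrates the idea of flagging a cf-agent process that has been running for far too long. The function names `find_stuck` and `list_processes`, the one-hour threshold, and the reliance on the `etimes` column of procps `ps` (which older systems may not provide) are all assumptions made for this example.

```python
import subprocess


def find_stuck(ps_output, name="cf-agent", max_age_s=3600):
    """Return (pid, elapsed_seconds) for each process called `name`
    that has been running longer than `max_age_s` seconds.

    `ps_output` is expected to hold one process per line in the form
    "PID ELAPSED_SECONDS COMMAND" (hypothetical helper, not Rudder code).
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, elapsed, comm = fields
        # Non-numeric columns mean this is the header (or garbage): skip it.
        if not (pid.isdigit() and elapsed.isdigit()):
            continue
        if comm.strip() == name and int(elapsed) > max_age_s:
            stuck.append((int(pid), int(elapsed)))
    return stuck


def list_processes():
    """Ask ps for pid, elapsed time in seconds and command name.

    Assumes a procps version of ps that supports the `etimes` column.
    """
    result = subprocess.run(
        ["ps", "-eo", "pid,etimes,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A periodic job could call `find_stuck(list_processes())` and log or kill whatever it returns. Whether killing is safe depends on what the agent was doing at the time, so the sketch only reports.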

Actions #14

Updated by Benoît PECCATTE about 9 years ago

  • Category changed from 14 to Web - Config management