
Bug #4717

closed

Document how to solve the huge IO wait problem that leads to random "NoAnswer" states

Added by Dennis Cabooter about 10 years ago. Updated over 8 years ago.

Status:
Rejected
Priority:
N/A
Assignee:
-
Category:
Documentation

Description

There was a huge problem on Ubuntu nodes: at random, one or more nodes were in the NoAnswer state. Removing all tcdb files and manually running cf-agent -KI solved the problem temporarily. We also had huge IO wait on our storage, which affected the storage and all our VMs. Eventually I found out that CFEngine (Tokyo Cabinet) was the cause of all our IO problems, everywhere in our network.

Some machines were doing nothing (yet) and still had huge IO waits. The iotop command showed that cf-agent was the only process writing to the file system. After Kegeruneku pointed me to http://blog.normation.com/en/2013/09/09/speed-up-your-cfengine-by-using-a-ram-disk/ and I implemented that through Rudder, all the problems seem to be gone.
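For readers without iotop, here is a minimal Python sketch of the same diagnosis. It is not the method used in this report, which relied on iotop. It reads the cumulative write_bytes counter that Linux exposes in /proc/<pid>/io, takes two snapshots, and ranks processes by the bytes written in between. The function names are hypothetical. Seeing other users' processes requires root.

```python
import os
from typing import Dict, List, Tuple


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value.strip())
    return fields


def snapshot_write_bytes(proc_root: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Map pid -> (command name, cumulative write_bytes) for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            with open(os.path.join(proc_root, entry, "io")) as f:
                written = parse_proc_io(f.read()).get("write_bytes", 0)
        except (OSError, ValueError):
            # The process exited, or we lack permission (run as root to see all).
            continue
        result[int(entry)] = (name, written)
    return result


def top_writers(before: Dict[int, Tuple[str, int]],
                after: Dict[int, Tuple[str, int]],
                limit: int = 5) -> List[Tuple[str, int, int]]:
    """Rank (name, pid, bytes written) between two snapshots, largest first."""
    deltas = []
    for pid, (name, written) in after.items():
        if pid in before:
            deltas.append((name, pid, written - before[pid][1]))
    deltas.sort(key=lambda item: item[2], reverse=True)
    return deltas[:limit]
```

To use it, take a snapshot with snapshot_write_bytes(), wait a few seconds, take a second one, and print top_writers(first, second). On nodes like the ones described above, cf-agent would be expected near the top of the list.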

Please advise everyone to add the following to /etc/fstab, especially on Ubuntu (12.04 LTS - Precise). This should be added to the Rudder documentation, in bold. Note, however, that it only applies to CFEngine versions that use Tokyo Cabinet.

# Tmpfs for the CFEngine state backend storage directory
tmpfs /var/rudder/cfengine-community/state tmpfs size=128M,nr_inodes=2k,mode=0755,noexec,nosuid,noatime,nodiratime 0 0
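Once the entry is in /etc/fstab, the tmpfs is mounted at the next boot, or immediately with `mount /var/rudder/cfengine-community/state`. Keep in mind that mounting over the directory hides any files already in it. The following minimal Python sketch checks that the state directory really ended up on tmpfs. It parses the /proc/mounts format: device, mount point, type, options, dump, pass. The helper names are hypothetical. Mount points containing spaces, which the kernel escapes as `\040`, are not handled, and the Rudder path contains none.

```python
from typing import Optional

STATE_DIR = "/var/rudder/cfengine-community/state"


def mount_fstype(mountpoint: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted at mountpoint, or None if nothing is.

    If the mount point appears several times (stacked mounts), the last,
    topmost entry wins, matching what processes actually see.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]
    return fstype


def state_dir_on_tmpfs(mounts_path: str = "/proc/mounts") -> bool:
    """True if the CFEngine state directory is currently backed by tmpfs."""
    with open(mounts_path) as f:
        return mount_fstype(STATE_DIR, f.read()) == "tmpfs"
```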