The case of the missing GID
Cfengine crashed with a bus error on my now-rebuilt MacBook Pro. The exact same binaries used to work under Tiger and continue to work on the other Leopard Macs. Something was awry.
I ran cfagent under gdb and got a backtrace of the crash. The problem appeared to be in MakeGidList(), which was called from a copy rule with the parameters:
owner=root group=root mode=444
Stepping through the function, I found that the GID returned from getgrnam("root") was not 0, as you would expect, but some garbled number, which caused the crash when it was processed further on.
This was a bit of a head-scratcher until I remembered another rule that cfagent runs:
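The lookup that went wrong can be reproduced outside cfagent. cfagent itself is C and calls getgrnam() directly; the sketch below instead uses Python's standard grp module, which wraps the same lookup. The helper names resolve_gid and describe_group are my own, chosen for illustration, and are not part of cfengine.

```python
import grp


def resolve_gid(group_name):
    """Return the numeric GID for group_name, or None if no such group exists."""
    try:
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        # grp.getgrnam raises KeyError when the group cannot be found.
        return None


def describe_group(group_name, expected_gid):
    """Compare the GID the system reports with the one the rules assume."""
    gid = resolve_gid(group_name)
    if gid is None:
        return f"{group_name}: no such group"
    if gid != expected_gid:
        return f"{group_name}: GID {gid}, expected {expected_gid}"
    return f"{group_name}: GID {gid} as expected"


if __name__ == "__main__":
    # The copy rule assumes that group "root" maps to GID 0.
    print(describe_group("root", 0))
```

Running a check like this on the broken machine would have been a quicker route to the culprit than stepping through gdb, assuming the lookup misbehaves the same way from Python as it did from C.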
control:
   ActionSequence = ( shellcommands )
   dscl = ( "/usr/bin/dscl localhost" )
   Leopard::
      dscl_local = ( "/Local/Default" )
   !Leopard::
      dscl_local = ( "/NetInfo" )

classes:
   HasRootGroup = ( ReturnsZeroShell(${dscl} -list ${dscl_local}/Groups/root &>/dev/null) )

shellcommands:
   !HasRootGroup::
      "$(dscl) -create $(dscl_local)/Groups/root" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root PrimaryGroupID 0" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root GroupMembership root" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root Password \"*\"" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root RealName \"System Group\"" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root SMBSID S-1-5-21-100" useshell=false
On OS X the group with GID 0 is wheel. My cfengine rules assume that the group root exists with GID 0 and the above snippet will create it if it can’t be found. This then allows my copy rules to say group=root and have it work on multiple operating systems.
My problem was a typo in one of the lines. What I’d actually got was:
"$(dscl) -create $(dscl_local)/Groups/root PrimaryGroupId 0" useshell=false
PrimaryGroupId is not a valid attribute in the eyes of DirectoryService; PrimaryGroupID is. That one typo had led to the root group being created with an undefined GID. Hence, when cfagent tried to determine which GID to use for group root, it got horribly confused and died.
cfagent is now working properly after I deleted the group and ran the corrected rule to replace it.
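A small check can confirm that a group record really carries a numeric PrimaryGroupID. What follows is a rough sketch, not something I run from cfengine: it assumes that dscl -read prints one "Attribute: value" pair per line, and the function names and sample text are made up for illustration.

```python
import subprocess


def parse_dscl_record(text):
    """Parse 'Attribute: value' lines into a dict (assumed dscl -read layout)."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and lines without a colon.
        if sep and key and not key[0].isspace():
            attrs[key.strip()] = value.strip()
    return attrs


def gid_from_record(text):
    """Return PrimaryGroupID as an int, or None if it is absent or not numeric."""
    value = parse_dscl_record(text).get("PrimaryGroupID", "")
    return int(value) if value.isdigit() else None


def read_group_record(group, node="/Local/Default", dscl="/usr/bin/dscl"):
    """Run 'dscl localhost -read <node>/Groups/<group>' (OS X only)."""
    result = subprocess.run(
        [dscl, "localhost", "-read", f"{node}/Groups/{group}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Illustrative text only, not captured from a real machine.
    sample = "GroupMembership: root\nPrimaryGroupID: 0\nRecordName: root\n"
    print(gid_from_record(sample))
```

The attribute name is matched exactly, so a record written with the PrimaryGroupId spelling would still come back as None here, which is the condition worth flagging.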