2007-11-22

The case of the missing GID

Filed under: Geekiness — iain @ 08:53:42

Cfengine crashed with a bus error on my now-rebuilt MacBook Pro. The exact same binaries used to work under Tiger and continue to work on the other Leopard Macs. Something was awry.

I ran cfagent under gdb and got a backtrace of the crash. The problem appeared to be in MakeGidList(), which was called from a copy rule with the parameters:

    owner=root group=root mode=444

Stepping through the function, I found that the GID returned from getgrnam("root") was not 0, as you would expect, but some garbled number, which caused the crash when it was processed further on.

This was a bit of a headscratcher until I remembered another rule which cfagent runs.

    control:
      ActionSequence = ( shellcommands )

      dscl = ( "/usr/bin/dscl localhost" )
    Leopard::
      dscl_local = ( "/Local/Default" )
    !Leopard::
      dscl_local = ( "/NetInfo" )

    classes:
      HasRootGroup = ( ReturnsZeroShell(${dscl} -list ${dscl_local}/Groups/root &>/dev/null) )

    shellcommands:
    !HasRootGroup::
      "$(dscl) -create $(dscl_local)/Groups/root" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root PrimaryGroupID 0" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root GroupMembership root" useshell=false   
      "$(dscl) -create $(dscl_local)/Groups/root Password \"*\"" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root RealName \"System Group\"" useshell=false
      "$(dscl) -create $(dscl_local)/Groups/root SMBSID S-1-5-21-100" useshell=false

On OS X the group with GID 0 is wheel. My cfengine rules assume that the group root exists with GID 0 and the above snippet will create it if it can’t be found. This then allows my copy rules to say group=root and have it work on multiple operating systems.

My problem was a typo in one of the lines. What I’d actually got was:

      "$(dscl) -create $(dscl_local)/Groups/root PrimaryGroupId 0" useshell=false

PrimaryGroupId is not a valid attribute in the eyes of DirectoryService. PrimaryGroupID is, but that one typo had led to the root group being created with an undefined GID. Hence, when cfagent tried to determine which GID to use for group root, it got horribly confused and died.
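Since dscl evidently accepted the miscased name without complaint, a check on the rule text itself is one way to catch this kind of typo. What follows is only a rough sketch in Python, not something cfengine or dscl provides: the function name miscased_attribute is made up for illustration, and the attribute set is just the names that appear in the dscl commands in these posts, not a complete list of what DirectoryService understands.

```python
import shlex

# Attribute names copied from the dscl commands in these posts. This is an
# illustrative set only, not a full list of valid DirectoryService attributes.
KNOWN_ATTRIBUTES = {
    "PrimaryGroupID",
    "GroupMembership",
    "Password",
    "RealName",
    "SMBSID",
    "NFSHomeDirectory",
    "UniqueID",
    "UserShell",
}


def miscased_attribute(command):
    """Return the correctly cased attribute name if the attribute in a
    'dscl ... -create PATH ATTR [VALUE]' command differs from a known
    name only by letter case. Return None otherwise."""
    words = shlex.split(command)
    if "-create" not in words:
        return None
    rest = words[words.index("-create") + 1:]
    if len(rest) < 2:
        # Only a record path was given, so there is no attribute to check.
        return None
    attribute = rest[1]
    if attribute in KNOWN_ATTRIBUTES:
        return None
    for known in KNOWN_ATTRIBUTES:
        if known.lower() == attribute.lower():
            return known
    return None
```

Fed the faulty line, miscased_attribute("/usr/bin/dscl localhost -create /Local/Default/Groups/root PrimaryGroupId 0") returns "PrimaryGroupID", which is the spelling that should have been used. An attribute that is not in the set at all is deliberately left alone, because the set is too small to say whether it is wrong.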

cfagent is now working properly after I deleted the group and ran the corrected rule to replace it.
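The HasRootGroup class only asks whether the group exists, so a group with a broken GID passes it. A stricter check would look the group up and compare its GID with the expected value. Here is a minimal sketch using Python's standard grp module; check_group_gid is a hypothetical helper name, and how grp.getgrnam would have reported the half-created group on that Leopard machine is an assumption that has not been tested.

```python
import grp


def check_group_gid(name, expected_gid, lookup=grp.getgrnam):
    """Return (ok, message) for whether group `name` resolves to `expected_gid`.

    `lookup` defaults to grp.getgrnam, which raises KeyError for an unknown
    group. It is a parameter only so the check can be tried without touching
    the real group database.
    """
    try:
        entry = lookup(name)
    except KeyError:
        return False, "group %r does not exist" % name
    if entry.gr_gid != expected_gid:
        return False, "group %r has GID %r, expected %r" % (
            name, entry.gr_gid, expected_gid)
    return True, "group %r has GID %r" % (name, expected_gid)


if __name__ == "__main__":
    # Assumes the intent described above: a group called root with GID 0.
    ok, message = check_group_gid("root", 0)
    print(message)
```

Run after the dscl rules, something like this would separate "group missing" from "group present but with the wrong GID", which is the case the existence test alone could not see.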

2007-11-21

Well, arse

Filed under: Geekiness — iain @ 18:15:09

So the reason I couldn’t reinstall Tiger on my laptop was not because the failed Leopard upgrade had broken my filesystem but because I was using the DVD which came with my iMac and not the one from my MacBook Pro.

Which means I erased the filesystem and lost files for nothing.

How kind of Apple to explain why the installation couldn’t proceed. How nice of them to say "Sorry but this DVD can only be used to install OS X on an iMac."

Oh, wait. They didn’t say that.

They didn’t say anything at all. Just "bugger off; OS X not yours."

I am very, very upset about this.

Leopard installed

Filed under: Geekiness — iain @ 08:35:33

Rebecca’s Mini now has Leopard. The install went flawlessly, unlike that of my MacBook Pro (end result: dead laptop) and my iMac (two lockups and a bunch of trashed settings).

After the bad experience with LDAP on my iMac I half expected the Mini to refuse to allow me to log in.

I should have been fully expecting it.

Luckily I was able to ssh to the machine as I’d put a DSA key in /var/root/.ssh/authorized_keys. I was then able to create a local admin user so I could log on at the console and repair the LDAP settings with the GUI. Once again "repair" means delete and add again as there was nothing wrong with them.

If you’re wondering why I didn’t have a local account it’s simply because I had renamed it to maling some time ago when I was playing with NetInfo (now obsolete, of course) and then deleted it when the machine became LDAPped.

    dscl localhost -create /Local/Default/Users/admin
    dscl localhost -create /Local/Default/Users/admin PrimaryGroupID 20
    dscl localhost -create /Local/Default/Users/admin NFSHomeDirectory /Users/admin
    dscl localhost -create /Local/Default/Users/admin UniqueID 501
    dscl localhost -create /Local/Default/Users/admin UserShell /bin/bash
    dscl localhost -append /Local/Default/Groups/admin GroupMembership admin

2007-11-20

More Leopard woes

Filed under: Geekiness — iain @ 23:23:21

The lustre is starting to wear off Apple for me.

In the past I used to absolutely despise Macs. This was in the pre-OS X days. I despised them for one simple reason. They were rubbish. Then with OS X they were suddenly pretty good. They look pretty, they’re stable and they’re easy to use. Hurrah for Macs.

Until they go wrong and you get stuck. Or until an OS upgrade breaks things in fundamental ways.

Mark arrived back from the US with the Mac Mini I’d asked him to bring me back. I wanted it for two reasons. One: to replace my Linux workstation which is a Pentium 4 with RAMBUS (cost price, adjusted for inflation, about £6.02×10²³ in today’s money) and is way too hot and noisy for my front room. Two: to get the Leopard workstation CD as I only have the server install.

Installing Linux on the new Mini could wait. My first goal was to upgrade my MacBook Pro and Rebecca’s Mini to Leopard. The laptop was first.

I popped the Leopard upgrade DVD into the drive and booted the machine. Yes I want to do an upgrade install. Off we go.

"The install failed."

That’s what it said. Nothing at all in any way helpful about why the install failed or what could be done. It failed. That’s that. Have a nice day.

So I rebooted to try again. Then the installer said it couldn’t find OS X 10.4 and hence couldn’t start the upgrade.

Say what? It was there ten minutes ago. What did you do?

I rebooted again with the 10.4 installer.

"The software cannot be installed on this computer."

WHY … NOT?

I erased the OS X partition. I know for a fact that there were files on it that don’t exist on the network because of the way profile syncing works. They’re gone now. And still the thing "cannot be installed" for no reason it would care to divulge.

I haven’t had this much fun installing an OS since NetBSD trashed my partition table and I spent an hour or two with a shell script for-loop guessing at fdisk commands to restore it.

At least I can still boot into Windows. Leopard is currently installing – quite happily – on Rebecca’s Mini. I’ll decide what to do with the MacBook Pro later.

All this comes after my iMac locked up … again … for no reason … again and, after rebooting, decided that it would ignore my LDAP bindings … again. It also wouldn’t mount my home directory, though that turned out to be the NFS server’s fault, and then it wiped out my Terminal preferences plist, which wasn’t.

Annoying.

2007-11-19

The Holy Grail (1/3): Correlation does not imply causation

Filed under: Geekiness — iain @ 22:20:01

Don’t be in a hurry to read parts 2 and 3. This series is called the Holy Grail for a reason. I’ve been struggling with three Slightly Annoying Samba problems for what seems like forever; going on for a year in the case of this particular issue. The bugs were these:

  1. My Windows roaming profile won’t sync when I log out.

  2. Windows tells me my password has expired (which it hasn’t) when I log in.

  3. I can’t mount a Windows share from a UNIX machine when the Windows server is part of a domain. I can mount a non-domain Windows share, I can mount a Samba share and I can connect to a domain share with smbclient but not with mount.cifs, smbmount or mount_smbfs.

Finally, today, I got to the bottom of the roaming profile sync.

(more…)
