2007-11-19

The Holy Grail (1/3): Correlation does not imply causation

Filed under: Geekiness — iain @ 22:20:01

Don’t be in a hurry to read parts 2 and 3. This series is called the Holy Grail for a reason. I’ve been struggling with three Slightly Annoying Samba problems for what seems like forever; going on for a year in the case of the first of them. The bugs were these:

  1. My Windows roaming profile won’t sync when I log out.

  2. Windows tells me my password has expired (which it hasn’t) when I log in.

  3. I can’t mount a Windows share from a UNIX machine when the Windows server is part of a domain. I can mount a non-domain Windows share, I can mount a Samba share and I can connect to a domain share with smbclient but not with mount.cifs, smbmount or mount_smbfs.

Finally, today, I got to the bottom of the roaming profile sync.

I was absolutely convinced that this was a Samba bug. I tried recompiling Samba. I tried upgrading Samba. I tried downgrading it. I tried trashing my LDAP tree and recreating it. None of these things worked. When I logged out of my workstation Windows didn’t even attempt to sync my profile. It just said "Logging out" and that was that.

Then I convinced myself there was an issue with my profile. I tried creating another user on the Samba server and copying all my files over to it. That user account could sync its profile but I still couldn’t. Theorising that my profile was too large, I tried creating a brand new profile and logging in and out multiple times, each time adding another 1 GB file to My Documents. The test user still worked even when its profile was larger than my own. I still couldn’t sync.

The problem turned out to be nothing to do with Samba at all, though I found it through a debugging technique listed in the Samba documentation. I set this registry key:

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon]
    "UserEnvDebugLevel"=dword:00030002

This made Windows write a log file, %SYSTEMROOT%\System32\Debug\Usermode\userenv.log, which described in great detail what was happening.

In particular it said this:

    USERENV(1dc.1e0) 20:50:46:812 UnloadUserProfileP: Wait succeeded.  In critical section.
    USERENV(1dc.1e0) 20:50:46:812 UnloadUserProfileP:  Didn't unload user profile, Ref Count is 2

And as is so often the case, once the problem is exposed you start seeing references to it everywhere. Google revealed loads of people with the same symptoms. The nVidia Control Panel holds open a file in the user profile and thus doesn’t allow it to be freed on logout.
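
If you would rather not read the whole of userenv.log by eye, the tell-tale line can be picked out with a few lines of script. What follows is only a rough sketch in Python: the pattern is inferred from the two log lines quoted above, not from any documented log format, and the function name find_stuck_unloads is my own invention.

```python
import re

# Pattern inferred from the single log line quoted in this post; the real
# format may vary between Windows versions, so treat this as an assumption.
STUCK_PROFILE = re.compile(
    r"^USERENV\([0-9a-fA-F.]+\)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}:\d{3})\s+"
    r"UnloadUserProfileP:\s+Didn't unload user profile, "
    r"Ref Count is (?P<count>\d+)"
)


def find_stuck_unloads(lines):
    """Return (timestamp, ref_count) for every failed profile unload."""
    hits = []
    for line in lines:
        match = STUCK_PROFILE.match(line.strip())
        if match:
            hits.append((match.group("time"), int(match.group("count"))))
    return hits


# The two lines from my own log, used here as sample input.
sample = [
    "USERENV(1dc.1e0) 20:50:46:812 UnloadUserProfileP: Wait succeeded.  In critical section.",
    "USERENV(1dc.1e0) 20:50:46:812 UnloadUserProfileP:  Didn't unload user profile, Ref Count is 2",
]

print(find_stuck_unloads(sample))  # [('20:50:46:812', 2)]
```

To run it against a real log, pass the open file object instead of the sample list. Any hit means something still held a reference to the profile at logout, which is exactly the symptom described here.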

As I read through other people’s experiences I saw a whole list of debugging steps that would have clarified things had I tried them myself, but which, for one reason or another, I hadn’t.

One guy tested removing the nVidia drivers and logging in and out. Of course it worked. But I absolutely never log in as myself when installing new video drivers in case the forced 640×480 resolution messes up my desktop. I always install new drivers as the local administrator. So I never noticed – never got the chance to notice – that my weird profile issue didn’t happen while a driver install was in progress.

Another guy logged on to a different PC and saw things working. I didn’t do that because my roaming profile is in reality little more than a backed-up profile. I only ever log on to my own workstation as me. Anything done on the few Windows virtual machines is done as local administrator again.

There I was, convinced that this problem dated back to the day I upgraded my fileserver from 32-bit to 64-bit, or perhaps as far as the day I upgraded Samba, or maybe the day I upgraded OpenLDAP. And it turned out it was the day I bought a new graphics card.

Correlation does not imply causation.

What’s that you say? The fix? Somewhat ugly but it works. Remove %SYSTEMROOT%\System32\nvcpl.dll and %SYSTEMROOT%\System32\nvcplui.exe. Simply removing references to them in the registry won’t suffice because even if they don’t start at login they will be launched when you do any 3D stuff. So they have to go.
