In the "Adminfoo is baaaaack" article, I promised a weekly article based on my experiences maintaining a few of the systems I watch out for: this is the first in that series.
Over the years I have developed a sort of informal inspection/maintenance checklist. And I do mean informal. I haven't really bothered to write it out; in my view the important thing is that you actually log on to each of your systems every so often. Check the logs. Browse the filesystem. Apply needed patches. Get an idea of how the system responds - just go and, y'know, get personal with the systems you're responsible for. Perhaps the effort of writing this series will finally spur me to write that checklist ... we'll see.
In the meantime, this series will (hopefully) get a new episode every week. The whole thing could become boring as hell, or it could turn out to be a valuable regular touchstone for other administrators out there - obviously I hope the latter is the case. To start with, I'll be journalling my experiences with the following systems (listed in no particular order):
- Windows 2003R2 server (domain controller)
- CentOS 4 (application server)
- Windows Small Business Server
- Windows XP (various systems actually)
- Ubuntu 6.06LTS running as a desktop system
- Windows Vista (my own daily driver)
- Windows 2000 (application server)
Ready? Let's go! This week I'll primarily focus on patching all the systems.
Windows 2003R2 server (domain controller)
Patches needed:
- Windows Server 2003 Service Pack 2 (32-bit x86) (critical)
- Remote Desktop Connection (Terminal Services Client 6.0) for Windows Server 2003 (KB925876)
- Microsoft .NET Framework 3.0: x86 (KB928416)
- Microsoft .NET Framework 2.0: x86 (KB829019)
- Microsoft Base Smart Card Cryptographic Service Provider Package: x86 (KB909520)
I'm using the WU website for this system, because even though it was set to run Automatic Updates weekly on Saturdays, SP2 has been out for 10 days and the system has neither applied it nor notified me. Also, be aware that Automatic Updates only installs the critical stuff; you still need to connect to WU for noncritical updates. I used to be more minimalist, but nowadays, except for special-need servers, I just install all the updates WU recommends. Keeping track of why I didn't install some specific set of patches is a pain and disk space is cheap, so this simplifies life a bit.
Only SP2 is marked critical; the others are marked optional. The entire update process took 46 minutes (of which 11 minutes were spent downloading). SP2 sensibly installed last. There were no prompts during that time, so I was free to do other stuff. This is a good thing, by the way; older Service Packs used to prompt at least twice during installation. The reboot took another 3 minutes; afterwards I examined the System, File Replication Service, Directory Service, DNS Server, and Application eventlogs, going back 7 days from today. No problems were noted. I visited WU one more time after the reboot, and no more updates were called for.
CentOS 4 (application server)
The system had not been updated since Nov 7 of last year, so it needed quite a few updates! With 43 packages to update and the kernel needing an upgrade, I won't list them all out. I will say YUM makes it easy (download and update took less than 10 minutes) - but the kernel updates mean a reboot is called for. And because the running VMware images needed time to shut down properly, the reboot took about 12 minutes.
Also, since the system is host to VMware Server, I had to run
/usr/bin/vmware-config.pl
after the reboot, so that VMware could recompile for the new kernel. I was able to answer with all the defaults. Until I ran that script, none of the VMware images set up for auto-start on this system would run. Once I ran the script, the images started up and ran without any further action on my part.
I edited /etc/motd to indicate that the system is fully patched as of today. In case you were wondering how I knew the date of last system update - that's how. On *nix systems it's my practice to note updates or other significant events in that file.
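The motd habit described above can be sketched in a few lines of Python. This is only an illustration: the wording of the note (`NOTE_PREFIX`) and both function names are assumptions of mine, since the actual text kept in the file isn't shown here. The functions work on the file's text rather than on /etc/motd directly, so they can be tried out without root.

```python
import datetime
import re

# Hypothetical wording for the note; the real text in /etc/motd may differ.
NOTE_PREFIX = "Fully patched as of "
NOTE_PATTERN = re.compile(re.escape(NOTE_PREFIX) + r"(\d{4}-\d{2}-\d{2})")


def add_patch_note(motd_text, patch_date):
    """Return motd_text with a dated 'fully patched' line appended."""
    if motd_text and not motd_text.endswith("\n"):
        motd_text += "\n"
    return motd_text + NOTE_PREFIX + patch_date.isoformat() + "\n"


def last_patch_date(motd_text):
    """Return the most recent patch date noted in motd_text, or None."""
    dates = [datetime.date.fromisoformat(d) for d in NOTE_PATTERN.findall(motd_text)]
    return max(dates) if dates else None
```

In practice you would read the current contents of the file, pass them through add_patch_note() with today's date, and write the result back; last_patch_date() answers the "when did I last update this box?" question at the next checkup.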
I'll defer other maintenance actions for this system to next week.
Windows Small Business Server
No updates were needed. Log analysis did turn up a couple of issues though:
- Backup jobs for 3/19-3/23 had failed because disk E: was full. It still held old files from the prior backup schedule. (On this particular system, I had changed the backup-to-disk strategy a week before.)
- During the week the system logged this dmio error about 5 times: "dmio: Harddisk0 write error at block 488397167: status 0xc0000015". It's not a big deal, but if we start seeing a lot of these, we may have a problem.
Other than that, all event logs looked fine for the prior week. I cleared the System, DNS, Application, Directory and NTFRS logs; getting this system up to snuff has been a little project for about a month, and the old logs showing lots of (now corrected) errors were becoming distracting.
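To keep an eye on whether the dmio write errors noted above are becoming more frequent, something like the following sketch could tally them week by week. It assumes the event messages have already been exported as plain strings (how you export them is up to you), and the function name and regular expression are my own, modelled only on the single message quoted above.

```python
import re
from collections import Counter

# Pattern modelled on the one dmio message quoted in this report.
DMIO_PATTERN = re.compile(
    r"dmio: (?P<disk>Harddisk\d+) write error at block (?P<block>\d+): "
    r"status (?P<status>0x[0-9a-fA-F]+)"
)


def summarize_dmio_errors(messages):
    """Count dmio write errors per (disk, block) pair."""
    counts = Counter()
    for message in messages:
        match = DMIO_PATTERN.search(message)
        if match:
            counts[(match.group("disk"), int(match.group("block")))] += 1
    return counts
```

If the same block keeps showing up, or the weekly total starts climbing, that is the cue to start taking the errors seriously.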
I cleared out old items from the Symantec AntiVirus quarantine area. The newest was more than a year old; since SAV sometimes asks if it can try to 're-cure' old quarantined items with new sigs, this is just another distraction.
Finally I took a quick look at what accounts were members of the 'Domain Admins' group, and forwarded a question to the client about some of the entries I saw there.
Windows XP
Today's XP system up for examination is actually a test system on my network. Like most sysadmins, I've got lots of XP systems I interact with either directly or indirectly, and in coming editions of this report, I'll pick any system which presented an interesting issue that week.
Anyway - this system purposely sits outside my firewall, without antivirus running. It does run the Windows Firewall and a few security customizations I'll probably detail in a later post. Because the system sits outside the firewall, I take extra care to do a couple of filtered views of the Security log:
- Event Source: Security, Category: Logon/Logoff, Username: Administrator, Event Types: Information and Success Audit. Here I am specifically filtering the log down to those times when Administrator (the only member of the Administrators group) has logged on. I can quickly see by the dates and times that only I have logged on to the system administratively. If someone had managed to hack the Administrator password, I'd see their logons here. Now, if I check the Failure Audit event type as well, I can see that the script kiddies have been banging away at this system day and night for months. They still haven't guessed the password.
- Event Source: Security, Category: Detailed Tracking, Event Types: Information, Success Audit, and Failure Audit. Here I can see new processes launched on the system. There have been several hundred in the past week, so I open the first one and start rapidly clicking the down arrow. I can quickly get a sense that they are all processes I ran while using the computer - this would be much harder to evaluate intelligently on a system used by someone else.
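The two filtered views above are done by hand in Event Viewer, but the same idea can be sketched in Python for events that have been exported to some structured form. Everything about the record layout here is an assumption: the dictionary keys ("category", "type", "user") and the function name are hypothetical, not an actual Windows export format.

```python
def filter_events(events, category, event_types, username=None):
    """Return events matching a category, a set of event types, and optionally a user."""
    wanted_types = set(event_types)
    return [
        event
        for event in events
        if event["category"] == category
        and event["type"] in wanted_types
        and (username is None or event["user"] == username)
    ]
```

The first view would then be filter_events(events, "Logon/Logoff", ["Information", "Success Audit"], "Administrator"), and adding "Failure Audit" to the list of types brings in the failed password guesses as well.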
I've also been experimenting with running checksums against the files on the system as a method of host integrity checking, but that's outside the scope of this article. More later. Suffice it to say: the system is still unbreached.
There's plenty of available disk, and the eventlogs don't hold any errors that bother me. The system is looking good for this week's checkup!
Ubuntu 6.06LTS (desktop system)
Updates needed: dvd+rw-tools file language-pack-gnome-en libmagic1 libmysqlclient15off libwpd8c2a mysql-common (via apt-get upgrade). Most of these were security updates.
...and how did I know that? Well, thanks to the good folks in Freenode's #ubuntu channel, I found out that with Aptitude (invoked in a console session), I can look at the list of proposed upgrades and press "C" on any one of them to see the change log for the package. These are usually verbose enough to give you at least some idea of what's going on. That was something I hadn't realized before, as I normally do updates relatively blindly via apt-get ... which has no such option as far as I can tell. I looked for a similar option in Synaptic but did not find one.
In any case the updates took seconds to download and apply; no reboot was needed and no issues resulted. Simple, easy, quick!
I took a quick walk through the messages, user, and daemon logs for the system. Messages was showing a lot of "VFS: busy inodes on changed media." And no surprise: the Vista system running the antivirus scan is a guest on this Ubuntu VMware host, and a virus scan tends to pound a disk pretty hard.
Windows Vista (my own daily-use system)
No patches were needed. So I spent a few minutes looking over the eventlogs - nothing really to note except the multiple occurrences of event ID 50 from source Time-Service: "The time service detected a time difference of greater than 5000 milliseconds for 900 seconds." I'm not really worried about these, because this system runs in VMware, which has chronic issues keeping clocks within a minute or so of correct time. It hasn't harmed things.
I also took a look at the new Vista Reliability Monitor (just type Reliability in the Start menu), which is a nice widget, seemingly tailor-made for these weekly checks. It tells me that IE7 failed twice in the past week, which I was painfully aware of. I suspect a JavaScript/ActiveX function on ??some web page?? put IE7 into some kind of looping condition, but I can't be sure. I haven't gotten to the bottom of it yet, and possibly never will.
I don't run antivirus on this system. I'm pretty sure I won't really need it, since I run with an account that's only a member of the local Users group, UAC is enabled, and I don't install a lot of software anyway. But I do think it's a good idea to scan for malware from time to time, so I visited the onecare.live.com site for a scan (I chose Full Service, but there are several options). The Full Service checkup checks for malware, disk fragmentation, unused files, and open ports, and runs a registry cleaner scan.
The site warned me that the Vista edition of the scanner was still in beta, but I decided to give it a shot. There was a little bit of fussiness getting the scanner to install and launch; choosing to always allow popups from this site solved that. UAC prompted for privilege escalation twice, and the tool started to run. I went off to watch a movie (The Departed, if you're curious), but I do know the initial download took about 4 minutes. When I came back, five hours later ... the scan was still only 9% completed!
Looking closer at the scan in progress, I saw it was scanning all network shares currently mapped to the system (which is part of the domain I run at home). Hmm! I didn't really need it to scan over 300GB of data on my fileserver, so I cancelled the scan, returned to the onecare.live.com site, and tried again. This time I noticed the Customize option, which allowed me to choose which drives it would scan. It's not exactly prominent, and I'm not sure it's a good idea for the scanner to scan network shares by default, without explicit warning.
That said, once I chose a scan limited to local drives only, the scan still took a long time - after 90 minutes it had still only scanned 10% of the 26GB filesystem (again, I went off to do other stuff while the scan ran). When I came back several hours later, IE7 had restarted, and no trace of my scan was visible. The eventlog showed that IE7 had crashed, faulting module Flash9b.ocx. I dunno if that is relevant to the OneCare online scanner, or to some other tab I had open. I found that I couldn't re-run the scan: after selecting options, it would go immediately to the 'done' screen without citing any error.
Well, that's what I get for trying beta functionality. So I tried TrendMicro's Housecall. It failed too: still 'preparing' 60 minutes after I initiated the scan. Hmm. I'll try this again next week!
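A rough way to judge whether a slow scan like the OneCare one is worth waiting for is to extrapolate from its progress so far. The sketch below assumes the scan proceeds at a constant rate, which real scanners rarely do, and the function name is just for illustration.

```python
def estimate_scan_minutes(elapsed_minutes, percent_done):
    """Extrapolate (total, remaining) scan time in minutes, assuming a constant rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be greater than 0 and at most 100")
    total = elapsed_minutes * 100 / percent_done
    return total, total - elapsed_minutes
```

Plugging in the numbers from this week: 10% after 90 minutes works out to about 900 minutes in total (15 hours, with 810 minutes still to go), and the first attempt, 9% after five hours, works out to roughly 3,333 minutes, or about 55 hours. Either figure would have told me early on not to wait around.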
Windows 2000 (application server)
Updates needed:
- Root Certificates Update
- Microsoft .NET Framework 2.0: x86 (KB829019)
- Microsoft Base Smart Card Cryptographic Service Provider Package: x86 (KB909520)
No reboot was needed for the above updates. But I ran the WU webpage again, because I knew that once .NET 2.0 was installed, I'd be offered some more updates for it. And indeed I was:
- Security Update for Microsoft .NET Framework, Version 2.0 (KB922770)
- Security Update for Microsoft .NET Framework, Version 2.0 (KB917283)
This brings up a tactic I've been using for some time - run the update manager (for any OS) again and again until it reports that it has no more updates to install. Sometimes one update triggers the need for another. Anyhow, all five of the updates listed above completed in about 10 minutes. And the second round of updates didn't need a reboot either.
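The "run it again until it comes back clean" tactic is easy to express as a loop. In this sketch, list_pending and install are hypothetical callables standing in for whatever update manager you use (WU, yum, apt-get and so on); nothing here talks to a real updater, and max_rounds is a safety limit I've added so a misbehaving updater can't loop forever.

```python
def update_until_clean(list_pending, install, max_rounds=10):
    """Install pending updates round by round until none are reported.

    Returns a list with one entry per round, each entry being the list of
    updates installed in that round.
    """
    installed_rounds = []
    for _ in range(max_rounds):
        pending = list_pending()
        if not pending:
            return installed_rounds
        install(pending)
        installed_rounds.append(list(pending))
    raise RuntimeError("updates still pending after %d rounds" % max_rounds)
```

For the Windows 2000 box above, this would have returned two rounds: the first with three updates, the second with the two .NET 2.0 security updates that only appeared once .NET 2.0 itself was installed.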
It's interesting to note that .NET Framework 3.0 is not offered to this W2000 system! That's because it's not supported for Windows 2000...
And, what do you know, this was a system that missed getting the DST updates. As we recall, Windows 2000 systems don't get a free DST update, so we have to apply a slightly more manual one. I used the process defined at KB914387, since I had already created the TZupdate.reg textfile and the refreshTZinfo.vbs script detailed there. Within 3 minutes, the clock updated.
I'll defer other maintenance activities for this system to next week.
-------
That concludes this week's debut episode of the Weekly Systems Maintenance Report. As you can see, the series is kind of 'a day in the life' of a sysadmin, with various tips and observations thrown in along the way. Please add a comment to this entry - let me know if this sort of thing has value for you!