I am oncall right now. Christmas is the very best time of the year to be oncall here, because absolutely nothing happens. Oncall only lasts 3.5 days, and I don't expect to talk to anyone until midday Tuesday at the very earliest, if at all. No one on campus = no one breaking things.
I have a server that appears to be working its way towards a hardware problem of some sort. I got an email from the hardware diagnostics this afternoon that said there was a "serious" level error, because it couldn't log a chassis code. I'm not sure whether the error it was trying to log was the serious one, or whether the failure to log it was. The rest of the email just complains about the communications failure, not about the error that caused the write event in the first place. I have looked at the machine and everything looks okay right now, so I'm not sure what it's complaining about. Maybe it was just lonely.
Why not just reboot the machine, you say? I don't play that game. I have a machine with 638 days of uptime, and it would have been double that right now if it hadn't been for a complete power outage in the data center 2 years ago... (Now of course, by saying that, the rest of my night will be filled with calls about machines going down, fire in the data center, cats and dogs falling from the sky, etc.)