The Signs I Ignored
I was out of town for work for a few weeks, and when I came back I noticed an occasional loud click coming from my server. It wasn’t constant and I figured it was just the fan being annoying, so I didn’t think much of it.
The Moment It Happened
I decided I should make sure all the applications I had installed from ports on FreeBSD were up to date, so I found a program that would do it for me. It took a while, but I had it running remotely most of the day while I was at work. Some time in the afternoon my session was disconnected from the server. I tried to reconnect and couldn’t reach it. When I got home, I turned on the monitor and found a single blinking cursor in the upper left of the screen. I don’t normally have a keyboard attached to the server because I just SSH into it, so I decided to cut the power and restart it. It booted up to a point and then I got a message similar to this:
/mnt/usr: bad dir ino 16392 AT OFFSET 512: MANGLED ENTRY
panic: ufs_dirbad: bad dir
After this message the server would just reboot, so I decided I was screwed.
The Search and Solution
What I Tried
When the server rebooted, S.M.A.R.T. reported that my hard drive was bad. Fortunately it’s not the drive with my data on it (which is a RAID array anyway), just the drive with the system and programs installed, the one that was being used during the upgrade process. I decided to boot from my FreeBSD install CD and go into FixIt mode, which presents a command prompt with some basic tools. I ran fsck as suggested on that ever helpful blog and found it actually repaired a few things on the drive. After rebooting again I was finally able to log in, but SAMBA wasn’t starting and some files still seemed to be missing or corrupt. Well, I wasn’t going to bother trying to repair the installation on a bad hard disk anyway, so I thought, why not take an image of it!
A little more research brought me to some simple instructions on using dd to image a drive. I booted to FixIt again and ran dd to copy an image to my external USB hard drive. It took quite a while but in the morning it was finished! I checked to make sure the file was there, shut down, went out and got another hard drive, installed it, and then tried to mount the USB hard drive again. That’s when it decided not to come up. My only guess is that I didn’t unmount it and the data on the drive was corrupted when I shut down the computer. Normally everything is unmounted on shutdown, but maybe that doesn’t happen when running in FixIt. I’m not really sure.
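For anyone curious what that imaging step boils down to, here is a minimal Python sketch of a dd-style copy. It is not the command I ran, just an illustration: the function names `image_device` and `file_sha256` and the paths in the usage note are made up for the example. It reads the source in fixed-size blocks, writes them to an image file, forces the data out to the disk before closing (the step my guess above says I effectively skipped by shutting down without unmounting), and returns a checksum so the image can be verified afterwards.

```python
import hashlib
import os


def image_device(source_path, image_path, block_size=1024 * 1024):
    """Copy source_path to image_path block by block and return the SHA-256 of the data read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of the source
            dst.write(block)
            digest.update(block)
        # Flush Python's buffer and ask the OS to write the data to the disk,
        # so a shutdown right after the copy does not lose the tail of the image.
        dst.flush()
        os.fsync(dst.fileno())
    return digest.hexdigest()


def file_sha256(path, block_size=1024 * 1024):
    """Return the SHA-256 of a file, read back in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

A call might look like `image_device("/dev/ad0", "/mnt/usb/system.img")`, where both paths are assumptions about how the disks show up. Comparing the returned checksum with `file_sha256("/mnt/usb/system.img")` confirms the image reads back the same way it was written. This sketch simply stops with an error if the failing drive returns a read error, and the external drive still has to be unmounted before powering off.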
My next option was to just reinstall FreeBSD. I did it a few months ago, so I might as well refresh my memory! Installation wasn’t too bad. At least I knew what programs I needed or didn’t need this time. So I installed the programs and then followed my own tutorial on installing the drivers for my RAID card, which I had posted just a few days before on OpenSourceCommunity.org. I got the RAID array loaded up and then figured out how to configure SAMBA without the web GUI I used last time. So everything is back up and running now! Tonight I’ll run dd again to get an image of my working install so I’ll be prepared next time.
I just love the fact that for almost every problem you have, someone else has had it too, and you can at least find some clues to the solution online. Thanks to the people who wrote these articles, I was able to get my server up and running again in a couple of days and learn a little more about FreeBSD as well.