VMware Cloud Community
virtualDoom
Contributor

Rebooted ESX now has superblock problem

Hi,

One server in a two-node HA cluster had licensing problems (caused by our license server), which left it showing as disconnected. Restarting the VMware services on the host and the VC services didn't help, so I rebooted the server. That has made matters worse, as I now get the following error:

WARNING: Your /etc/fstab does not contain the fsck passno field. I will kludge around things for you but you should fix your /etc/fstab file as soon as you can.

/:

The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:

e2fsck -b 8193 <device>

fsck.ext2: Is a directory while trying to open /

After that I can get onto /dev/sda2, and running df shows it as full (although it is mounted read-only).

Any tips would be most useful.

Thanks in advance

reorx
Enthusiast

VirtualDoom... cool name,

I believe you fat-fingered your fstab. Each line needs all six fields, ending with the dump flag and the fsck pass number (the 1 or 0 at the end of the line). For example:

/dev/cdrom /mnt/cdrom iso9660 noauto,users 0 0

If I exclude the last 0 then I am hosed (and that very error occurs). Make sure you have spaces and/or tabs between your columns, but most importantly make sure all columns are there. Good luck.
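For anyone wanting to check this quickly: a standard fstab entry has six whitespace-separated fields (device, mount point, filesystem type, options, dump, fsck pass number). Below is a minimal sketch of a check that flags any entry with a different field count. The function name check_fstab_fields is made up for illustration, and it only counts fields; it does not validate what is in them.

```shell
#!/bin/sh
# Print every fstab entry that does not have exactly six fields.
# Blank lines and comment lines are skipped.
# Exit status is 0 when all entries look complete, 1 otherwise.
check_fstab_fields() {
    awk '
        /^[ \t]*(#|$)/ { next }    # ignore comments and blank lines
        NF != 6 {
            printf "line %d has %d fields: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Run it as `check_fstab_fields /etc/fstab`. From a rescue boot, point it at the mounted copy instead, for example `/mnt/etc/fstab` if the root partition is mounted on `/mnt`.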

Jen

reorx
Enthusiast

VirtualDoom,

You will probably have to boot from a rescue disk and mount the root drive so that you can successfully edit your fstab. Otherwise I think you are in read only mode (if I remember correctly).

Good luck,

Jen

virtualDoom
Contributor

Hi reorx,

The fstab is 0 bytes, which I think is the problem. Would df show a full file system, though, if it has been mounted read-only?

Thanks vDoom

reorx
Enthusiast

VM,

Yup, for some reason it does. It is possible that you have a corrupt file system and won't be able to recover; I have seen that happen. Your best bet, though, is to restore /etc/fstab and then try booting again. I hope all goes well.
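A quick way to confirm this from the shell: df gets its block counts from the filesystem itself, so the figure is the same whether the mount is read-only or read-write. A small sketch, assuming a POSIX df; the helper name fs_use_percent is hypothetical.

```shell
#!/bin/sh
# Print the "capacity" column (for example 100%) for the filesystem
# that holds the given path.
fs_use_percent() {
    # -P forces the POSIX format: one header line, then one line per filesystem
    df -P "$1" | awk 'NR == 2 { print $5 }'
}
```

`fs_use_percent /` on the broken host would show how full the root partition really is, regardless of how it is mounted.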

Jen

virtualDoom
Contributor

Hi, can you recommend a floppy image that I can use with a BladeCenter web front end? I've tried Ultimate Boot and Tomsrtbt (which requires me to be in DOS to write the image to a floppy before I can image the floppy and pass it to the server). All I need is a Linux floppy image that I can mount in the web environment.

Thanks

Seb

reorx
Enthusiast

http://www.linuxmigration.com/quickref/install/media.html#createfloppy

There is a floppy image on this page (bootdisk.img). Hope that is what you need.

Jen

virtualDoom
Contributor

Thanks, I have managed to get hold of one now. The page you linked has broken links for the bootdisk image.

I'll let you know how I get on.

Seb

reorx
Enthusiast

Yo Seb, whatever happened? Did you get recovered? Was it a missing file or total mayhem? --Jen

virtualDoom
Contributor

Jen,

Well, I can't mount the filesystem. It's SCSI so I should be able to use the following command (after which I can use chroot):

mount /dev/sda2 /mnt

sda2 is the partition that contains the file I need to look at (fstab), but I get a message saying that the device is not configured. Is there any reason that the above command wouldn't work? I was using the Ultimate Boot CD, booted into a skinny Linux distro. I tried to use the filesystem tools on the CD but they don't recognise the filesystem.

Seb

virtualDoom
Contributor

Also, when I attempt the same mount command on BasicLinux (provided with UBCD), I get the error:

mount: Mounting /dev/sda2 on /mnt failed: Block device required

Does a SCSI driver need to be loaded manually before it can see the disks?
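One way to narrow this down is to check whether the device node exists, whether it really is a block device, and whether the running kernel lists the partition at all. The sketch below assumes a Linux rescue environment with /proc mounted; the function name check_blockdev and its messages are made up for illustration. If the kernel does not list the partition, a missing controller driver is a likely suspect, but that is a guess rather than a diagnosis.

```shell
#!/bin/sh
# Report why a device such as /dev/sda2 might not be mountable.
check_blockdev() {
    dev=$1
    name=${dev##*/}    # strip the directory part, e.g. sda2

    if [ ! -e "$dev" ]; then
        echo "$dev: no such device node"
        return 1
    fi
    if [ ! -b "$dev" ]; then
        echo "$dev: exists but is not a block device"
        return 1
    fi
    # Column 4 of /proc/partitions holds the names the kernel has detected.
    if ! awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' \
            /proc/partitions 2>/dev/null; then
        echo "$dev: node exists but the kernel lists no such partition"
        return 1
    fi
    echo "$dev: ok"
}
```

Usage would be `check_blockdev /dev/sda2` from the rescue shell before retrying the mount.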

virtualDoom
Contributor

Ok, I've got some more news.

I ran e2fsck -p on all the partitions on the sda disk. It found them all clean except the VMFS partition (as expected) and the swap partition, for which it came back with the following error:

e2fsck: Bad magic number in super-block while trying to open /dev/sda3

/dev/sda3:

The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 <device>

So, reading the message, is it saying that swap partitions will also show a bad superblock because they do not have an ext2 filesystem on them? If so, e2fsck is showing that all the ext2 filesystems are fine, which leads me to believe that only the fstab file is at fault, although I can't get to it on a read/write filesystem at the moment!
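As far as I know that reading is right: a swap partition carries its own signature rather than an ext2 superblock, so e2fsck reports a bad magic number on it, and the same goes for VMFS. The Id column of `fdisk -l /dev/sda` shows which partitions are worth checking. Here is a rough sketch that maps the type IDs commonly seen on an ESX host to a verdict. The function name describe_part_id is hypothetical, and the ID is only a label in the partition table, so it does not prove what is actually on the partition.

```shell
#!/bin/sh
# Map a partition type ID (as printed by fdisk -l) to a short verdict
# on whether e2fsck applies to it.
describe_part_id() {
    case "$1" in
        83)    echo "Linux: ext2/ext3 expected, e2fsck applies" ;;
        82)    echo "Linux swap: no ext2 superblock, skip e2fsck" ;;
        fb|FB) echo "VMware VMFS: not ext2, skip e2fsck" ;;
        fc|FC) echo "VMware vmkcore: not ext2, skip e2fsck" ;;
        *)     echo "unknown type $1: check before running e2fsck" ;;
    esac
}
```

For example, `describe_part_id 82` explains why /dev/sda3 failed the check if fdisk lists it with Id 82.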

virtualDoom
Contributor

Hi Jen,

OK, here's the latest. I discovered what the issue was. You know I said that df showed the file system as full? Well, it was! Virtugo's VirtualSuite had dumped a 1.7 GB sensor.log file in the root partition. I finally managed to get on there with Knoppix (can't believe I forgot that I had that gem!), deleted the file, created a new fstab and rebooted. Now the server comes up, but without any network cards; I have a feeling that some other files have been affected by this and may need re-creating. At least I don't have to rebuild! Phew...
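For anyone hunting a file like that sensor.log, here is a minimal sketch that lists the largest files on a single filesystem. The -xdev option stops find from wandering into other mounted filesystems, so only the partition in question is searched. The function name largest_files and the default of ten results are my own choices, not anything from the tools used above.

```shell
#!/bin/sh
# List the biggest regular files under a directory, largest first.
# Sizes are in kilobytes as reported by du -k.
largest_files() {
    dir=${1:-/}
    count=${2:-10}
    # "\;" runs du once per file; slower than "+", but older versions
    # of find do not accept "+".
    find "$dir" -xdev -type f -exec du -k {} \; 2>/dev/null |
        sort -rn |
        head -n "$count"
}
```

From a live CD with the root partition mounted on /mnt, `largest_files /mnt 5` would show the five biggest files there.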

Thanks for all your help

Seb

reorx
Enthusiast

Hi,

Good news! I think that most of this job is simply not giving up: staying at it and digging through the minutiae. Good work, guy! Yup, it sounds like you have some other corruption. I would stick with basic config issues; hopefully that is all it is. Haven't heard of Knoppix but I will check it out now.

Jen

virtualDoom
Contributor

Hi Jen,

I agree, perseverance is the key!

Yeah looks like some other small issues:

1) It tells me to look in /var/log/vmware/esxcfg-boot.log, which I did, and all it says is: ERROR: No VMKernel modules found.

2) On the main screen after booting (where it says to press Alt-F1 to access the console) I get two of the following messages:

cpu2: 1033)Util: 815: Status 0xbad0001 trying to get a valid VMKernel MAC Address

Have you got any tips?

Knoppix is just a Linux LiveCD, but with a GUI (I did all the work using CLI but sometimes it's handy to have a GUI as well because then you can have multiple terminal windows open to compare).

Cheers

Seb

reorx
Enthusiast

Hi Seb,

At this point I would retrieve everything that cannot be restored and rebuild that puppy. Hate to say it, but at least you got it back this far. It looks like the kernel is toast, or at least enough of it to complain. I wouldn't take my chances; hopefully you are able to get most of your data off of there first. As for the VMs, you should be able to just save their vmdk files and, when you recreate a virtual machine with the same name, plug those in. Right? Gosh, it looks like you have your work cut out for you!

Alternatively, you could try running the ESX CD in fix mode, or reinstall on top of the existing install (after you save what you can, in case that blows up on you). The ESX CD may be able to reload the missing kernel modules.

Jen

virtualDoom
Contributor

Hi Jen,

I was thinking that last night, but then I thought: if the fstab could be reduced to 0 bytes, which other config files might have suffered the same fate when the root volume filled up? I had a look and have now fixed the /etc/sysconfig/network file (fortunately this is a clustered server and the other node is identical, so I can compare the two). The service console now shows an IP address, although I cannot ping it because there are no network cards available. Next I looked at the network-scripts, which all looked fine. Then I looked at the startup scripts, and it turns out that esx.conf is corrupt, so I am going to copy one over from the other cluster node (editing the settings that should be unique to this server) and see if I can bring it back with that. Fingers crossed!
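A quick way to look for other casualties is to list every zero-byte file under /etc and then compare the suspects against the healthy node. A minimal sketch follows; the function name list_empty_files is made up, and bear in mind that some files are legitimately empty, so the output is a list of candidates rather than a list of damage.

```shell
#!/bin/sh
# List regular files of size zero under a directory (default /etc),
# without crossing into other mounted filesystems.
list_empty_files() {
    find "${1:-/etc}" -xdev -type f -size 0
}
```

Each hit can then be checked against a copy fetched from the other cluster node, for example `diff /etc/sysconfig/network /tmp/peer/network`, where /tmp/peer is a hypothetical directory holding the peer's files.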

I'll let you know what happens!

Cheers

Seb

virtualDoom
Contributor

OK, I rebuilt esx.conf and the server is now up and running and connected to VC. I have also reconfigured the VMotion network device (as I couldn't get any details for that). I now have a message which says:

HA Agent on server in cluster cluster_name in site has an error

Highly descriptive! How do I find out what the error is? Is it recorded in an HA log file, for instance?

Seb

virtualDoom
Contributor

It seems as though Virtugo had dumped a similarly large file on the other server, so I've stopped the Virtugo Sensor service on both servers for the time being. Both servers are now up and running happily with no errors. I presume the virtual machines will wander back across to the previously broken server as and when they feel like it.

Thanks for your assistance.

Seb

Man-Roy
Contributor

Seb,

A friend of mine alerted me to this thread. I am the engineering manager at Virtugo, and I want to make sure I understand the problem you were having.

What version of Virtugo were you running at the time you ran into this problem? Was it one of the beta versions of 6.0, or one of the older releases, like 5.2?

Any information you can provide will be greatly appreciated.

Sincerely,

Gary Klimowicz
