VMware Cloud Community
tomuxi
Enthusiast

VMware Converter Standalone halts at 97-98% at mkinitrd (dracut) for RHEL/CentOS 7.6 kernels

Hi,

It seems that Converter's dracut job, at one of the final steps of an online Linux conversion, goes into some sort of busy loop while running dracut -v -f /boot/initramfs-<kernel version>.img <kernel version>. To get past it, I have to log in manually to the resulting helper VM, kill the two dracut processes, wait for the VM to shut down, boot the VM from a rescue ISO, and run the same dracut command manually in a chroot.
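For readers who have not done this before, the manual recovery could look roughly like the sketch below. It only prints the commands (a dry run) so they can be reviewed first. The mount point /mnt/sysimage, the function name, and the example kernel version string are assumptions for illustration; /run is included in the bind mounts because of the /run discussion elsewhere in this thread.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for rebuilding the initramfs from a
# rescue environment. Nothing is executed here; review the output first.
# ASSUMPTIONS: the converted root filesystem (including /boot) is already
# mounted at the given directory; /mnt/sysimage is only an example path.
print_rebuild_cmds() {
    root=$1   # mount point of the converted system's root
    kver=$2   # kernel version as listed under <root>/lib/modules
    for fs in dev proc sys run; do
        echo "mount --bind /$fs $root/$fs"
    done
    echo "chroot $root dracut -v -f /boot/initramfs-$kver.img $kver"
    for fs in run sys proc dev; do
        echo "umount $root/$fs"
    done
}

# Example call (the kernel version string is an illustrative value):
print_rebuild_cmds /mnt/sysimage 3.10.0-957.5.1.el7.x86_64
```

If the rescue environment has already bind-mounted some of these directories, the corresponding mount lines can simply be skipped.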

Regards,

-Tom

Accepted Solutions
patanassov
VMware Employee

Hello Tom,

Actually, one of our support engineers found a better-looking workaround. Invoke

yum downgrade lvm2-libs-2.02.180-10.el7_6.2.x86_64 lvm2-2.02.180-10.el7_6.2.x86_64 device-mapper-event-libs-1.02.149-10.el7_6.2.x86_64 device-mapper-1.02.149-10.el7_6.2.x86_64 device-mapper-libs-1.02.149-10.el7_6.2.x86_64 device-mapper-event-1.02.149-10.el7_6.2.x86_64

before conversion. Then invoke

yum update

on the destination after the conversion.
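The same downgrade can be written with one package per line, which makes the versions easier to check. This is only a sketch: the package names and versions are copied verbatim from the command above, while the function name is hypothetical and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: the downgrade from the post, one package per line for readability.
# Package names and versions are copied verbatim from the post.
PKGS="lvm2-libs-2.02.180-10.el7_6.2.x86_64
lvm2-2.02.180-10.el7_6.2.x86_64
device-mapper-event-libs-1.02.149-10.el7_6.2.x86_64
device-mapper-1.02.149-10.el7_6.2.x86_64
device-mapper-libs-1.02.149-10.el7_6.2.x86_64
device-mapper-event-1.02.149-10.el7_6.2.x86_64"

# Print the command instead of running it (dry run). Run the printed line
# as root on the source machine before starting the conversion.
downgrade_cmd() {
    # Unquoted $PKGS is intentional: word splitting joins the lines with spaces.
    echo yum downgrade $PKGS
}

downgrade_cmd
# Afterwards, `rpm -q lvm2 device-mapper` shows which versions are installed.
```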

As for Red Hat, here are some more details. They started using the /run directory (see "Linux distributions to include /run/ directory", The H Open: News and Features), probably since 7.0 (I didn't investigate). As you probably know, chroot-ing and running dracut involves bind-mounting /dev, /sys, and /proc beforehand. Now /run has to be added to that list. The Converter helper uses an older kernel that has no such directory. Why this has become an issue only now is still unclear, but I don't expect Red Hat to stop using /run. The logs from the broken conversion show an attempt to open a file from udev's temporary data, located in /run/udev/data. Formerly that data was located in /dev/.udev/db. In addition, the file names differ from the previous ones, and possibly their content does too (I'm not sure). You could complain to Red Hat that they have broken backward compatibility with this change. Given the overall lack of respect for backward compatibility among Linux vendors, though, I am not very optimistic.
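To see which of the two udev database locations a given system actually has, a small check like the following may help. It is a sketch only: the two paths are the ones named above, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report which udev database layout is present under a given root.
# The two paths are the ones mentioned in the post.
udev_db_layout() {
    root=${1:-}    # empty means the running system ("/")
    if [ -d "$root/run/udev/data" ]; then
        echo "new (/run/udev/data)"
    elif [ -d "$root/dev/.udev/db" ]; then
        echo "old (/dev/.udev/db)"
    else
        echo "none found"
    fi
}

udev_db_layout    # running system, e.g. when run inside the helper VM
```

Passing a mount point, for example the chroot target, checks that tree instead of the running system.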

Regards,

Plamen

17 Replies
patanassov
VMware Employee

Hi

Can you upload the helper log for examination? An easy way is to upload the task bundle.

Regards,

Plamen

tomuxi
Enthusiast

I will get the log file the next time it happens. In any case, it is clear that this happens mostly with the 3.10.0-957.5.1 kernel.

tomuxi
Enthusiast

Hi,

Attached is the dracut part of the helper log file. The dracut process will run indefinitely and has to be killed manually. The rest of the conversion needs to be done manually.

Regards,

-Tom

patanassov
VMware Employee

Hi,

The attached log snippet covers only about 1 minute, and nothing in it suggests that dracut runs indefinitely. There are also no errors. If you have concerns about posting the whole log bundle in public, you can send it to me by email (I'll send you my address in a PM).

This is an interesting issue; I haven't seen that before and would like to have a look. With some luck, it would be possible to spot some potential workaround so that manual reconfig wouldn't be necessary.

Regards,

Plamen

tomuxi
Enthusiast

(For onlookers: Full log file sent privately.)

patanassov
VMware Employee

Hi,

I saw the log. It is not much different from the posted snippet, i.e. it terminates abruptly a minute and a half after logging the dracut command.

I've tried to reproduce the issue by installing a fresh CentOS 7.6. It converted just fine. Before that I ran dracut manually - no problems, it finished in a few minutes (2 or 3, didn't measure). BTW Converter has a timeout of 1h for this command.
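When testing the dracut command by hand, it can be given an upper time limit so that a hang is detected without waiting indefinitely. The sketch below uses GNU coreutils timeout, which exits with status 124 when the limit is reached. This is not how Converter implements its own 1h timeout; the function name and the 10-minute limit are assumptions.

```shell
#!/bin/sh
# Sketch: run a command with an upper time limit and report a timeout.
# GNU coreutils `timeout` exits with status 124 when the limit is reached.
run_bounded() {
    limit=$1; shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after $limit" >&2
    fi
    return "$status"
}

# Example: give the initramfs rebuild 10 minutes (arbitrary limit):
# run_bounded 10m dracut -v -f /boot/initramfs-<kernel version>.img <kernel version>
```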

My guess is you have just been impatient and killed reconfiguration too early.

HTH,

Plamen

tomuxi
Enthusiast

Hi,

Thanks for your efforts. As stated, the log is from a session where I manually killed the dracut process. Without manual intervention, Converter fails with a generic timeout.

The timeout has happened without exception with our servers that run the 3.10.0-957.5.1 kernel, although I tend to kill dracut now manually after waiting for 10 minutes or so (after maybe five times waiting for the 30min/1h timeout in the past).
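Finding the dracut processes that have been running longer than a chosen threshold can be scripted, as in the sketch below. It assumes a ps that supports the etimes (elapsed seconds) column, as procps-ng does; whether the helper VM's ps has it is not confirmed here. The function name and the 600-second threshold are illustrative.

```shell
#!/bin/sh
# Sketch: print the PIDs of processes with a given command name that have
# been running for at least MIN_AGE seconds.
# Input on stdin: one "PID ELAPSED_SECONDS COMMAND" triple per line.
stale_pids() {
    name=$1
    min_age=$2
    awk -v n="$name" -v a="$min_age" '$3 == n && $2 >= a { print $1 }'
}

# Example: list dracut processes older than 10 minutes (600 s):
# ps -eo pid=,etimes=,comm= | stale_pids dracut 600
```

The printed PIDs can then be inspected and, if they really are stuck, passed to kill.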

However, all these servers have been updated from previous versions of CentOS/RHEL. Perhaps something left over from a previous version interferes with the current one and creates problems for dracut. A library, perhaps... Run manually, the same dracut command works fine on the converted machine's disks when booted into a rescue chroot from a recent CentOS/RHEL DVD.

If you are interested in investigating further, you could try to install 7.5, update it to 7.6, and then try to convert that.

Regards,

-Tom

patanassov
VMware Employee

Hi,

FYI: I managed to reproduce the issue with your steps - installing CentOS 7.5 and upgrading to 7.6. I currently have no idea why this is so. I will post if I find out something...

Regards,

Plamen

patanassov
VMware Employee

Hi Tom,

It seems you have logged a service request in VMware support. Is this so? If so, we will keep the communication in the bug.

Meanwhile, a quick update - I tried to issue the dracut command directly in the helper VM and it, somewhat surprisingly, passed (within about 2 and a half minutes). Next step is to figure out why.

Regards,

Plamen

tomuxi
Enthusiast

Plamen,

Thanks for your efforts. Yes, same experience here: run manually, the same command works flawlessly.

I don't have a VMware support account; perhaps someone else has opened a ticket there about this same issue. That's good, but please also let me know if you make a pre-release version of Converter where this is fixed, so I can test it too.

Regards,

-Tom

patanassov
VMware Employee

Hello Tom,

I think I found a workaround. Let me share some new findings:

- What breaks the conversion is not the upgrade to 7.6 but a newer update (to kernel 3.10.0-957.5.1; a standard 7.6 installation has kernel 3.10.0-957). It seems to have been rolled out on 01/29 for CentOS, perhaps earlier for RHEL. I.e. after running 'yum update' on a fresh 7.6 installation, that machine can't be converted either.

- The process that hangs during dracut is the execution of 'lvm vgs'.

- In the setup I use for testing, the root volume is logical and the boot volume is basic. If the conversion is done without preserving LVM, i.e. letting the root volume become basic, the conversion succeeds. I suppose logical volumes other than root wouldn't affect dracut, but I haven't verified that. This means there is a crude workaround: convert with the root volume made basic. This isn't good enough, though.

- It turns out that during the recent '...5.1' kernel update, the lvm binary was updated as well. After replacing /usr/sbin/lvm with the one that comes with a fresh 7.6 installation (with kernel 3.10.0-957), the conversion succeeds. That's the workaround I would suggest.

To complicate things, the new lvm build has exactly the same version as the old one, though they differ in size.
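Since the version string cannot tell the two lvm builds apart, comparing size and checksum is one way to confirm which binary is in place. The following is a minimal sketch; the function names and the second path in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: tell two builds of a binary apart when their version strings match.
describe_file() {
    # Prints "<size in bytes> <sha256>" for the given file.
    size=$(wc -c < "$1" | tr -d ' ')
    sum=$(sha256sum "$1" | cut -d' ' -f1)
    echo "$size $sum"
}

same_build() {
    # Succeeds only if both files have identical size and checksum.
    [ "$(describe_file "$1")" = "$(describe_file "$2")" ]
}

# Example (second path is hypothetical, e.g. a copy from a fresh 7.6 install):
# same_build /usr/sbin/lvm /root/lvm.from-fresh-7.6 || echo "binaries differ"
# `rpm -qf /usr/sbin/lvm` shows which installed package owns the binary.
```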

Regards,

Plamen

tomuxi
Enthusiast

Plamen,

Thanks for your efforts, the issue is becoming clearer. I opened a ticket with Red Hat, since it could be a bug in that version of lvm/kernel.

Regards,

-Tom

tomuxi
Enthusiast

Hi Plamen,

Thanks for your efforts. Downgrading to these specific package versions is a reasonable workaround, as the downgrade is automatically reverted by the next update on the target.

Do you think Converter will later get a more modern kernel, and thus /run as well, so that it could be bind-mounted into the target during conversion?

Regards,

-Tom

patanassov
VMware Employee

This is the reasonable thing to do at some point. However, I have no idea when at this moment.

tomuxi
Enthusiast

The problem no longer appears with more recent LVM/device-mapper packages. No countermeasures are necessary with them.

patanassov
VMware Employee

Thank you for the update.

The irony is we have updated Converter documentation to claim support up to 7.5 only. Will see what to do in this case...
