Hello,
I am planning to add memory to a production ESXi 4.1 host (Dell R910) in a cluster. I am planning to follow these steps:
Change DRS to manual
vMotion all VMs off the host
Put the host in maintenance mode
Shut down the host
Add the memory
Start the host
Check that the additional memory is visible
Change DRS back to automatic
We ran a memory test on the server when it was first put into the cluster.
Is it necessary to run a memory test again when adding additional memory?
Thanks
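For the "check that the additional memory is visible" step, a quick sanity check is to compare the sum of the installed DIMM sizes against what the host reports after boot (e.g. from the host's hardware summary). This is a hypothetical helper, not anything ESXi-specific; the 2 GB tolerance for BIOS/firmware reservation is an assumption, as the actual reserved amount varies by platform:

```python
def memory_visible(dimm_sizes_gb, reported_gb, tolerance_gb=2.0):
    """Return True if the host-reported memory matches the sum of
    installed DIMMs, allowing a small BIOS/firmware reservation.

    dimm_sizes_gb -- sizes of all installed DIMMs, in GB
    reported_gb   -- memory the host reports after boot, in GB
    tolerance_gb  -- allowed shortfall (assumed BIOS reservation)
    """
    expected = sum(dimm_sizes_gb)
    shortfall = expected - reported_gb
    # A negative shortfall (host reports more than installed) is also
    # suspicious, so both bounds are checked.
    return 0 <= shortfall <= tolerance_gb
```

For example, a host upgraded to 16x 16 GB DIMMs reporting ~255 GB would pass, while one reporting only 240 GB would indicate a DIMM that is not seated or not recognized.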
usr345 wrote:
We ran a memory test on the server when it was first put into the cluster.
Is it necessary to run a memory test again when adding additional memory?
Nope. You don't need to take it out of DRS; just put the host in maintenance mode and migrate the VMs off. Then power it off while it is still in maintenance mode. When you add the memory and boot it back up, it isn't taking VMs from the cluster yet, which gives you a chance to check that the new memory is visible. The memory test at BIOS POST is sufficient.
Once you've confirmed the memory is fine, exit maintenance mode.
usr345 wrote:
Is it necessary to run a memory test again when adding additional memory?
It's always best practice to test the memory before placing the servers back in production. Whether you do it or not is another matter 🙂
usr345 wrote:
Change DRS to manual
vMotion all VMs off the host
Put the host in maintenance mode
Is your DRS mode "Fully Automated"? Then you should not change it; use it instead. Just put the host into maintenance mode and DRS will vMotion the VMs away to the most suitable remaining hosts.
gh0stwalker wrote:
It's always best practice to test the memory before placing the servers back in production. Whether you do it or not is another matter 🙂
Only if you buy cheap off-the-shelf non-vendor memory. Besides, server memory has error correction (ECC), or at least it should. Otherwise the testing is not necessary, and it's a CHOICE, not a practice...
When you buy a server, do you test it before putting it into production? If you do, I have to question why you would buy that server from a vendor you can't trust... The good vendors (HP, Dell, IBM) all fully test their memory, machines, and components BEFORE they reach the customer.
We actually do check the memory of every physical server before we put it into production, and I agree that the memory should be checked by the vendor before it's delivered to the end customer, but as they say in the classics, stuff happens. It's not unheard of to receive hardware DOA from any of the big vendors.
My motto: better safe than sorry, and in the scheme of things, one day running memtest is not going to blow the project timelines or budget out the window.
I have no problem with people not performing the check, so I hope you can respect our decision to do it, even if you don't agree with it.
usr345 wrote:
We ran a memory test on the server when it was first put into the cluster.
Is it necessary to run a memory test again when adding additional memory?
You don't have to, but I would strongly recommend running a good memory test tool for at least a few hours.
Even a few hours of testing will be more exhaustive than anything your hardware vendor does just before shipping you the memory.
You should weigh the cost/risks involved in testing against the cost/risks involved in not testing. Your hardware vendors are not gods; they may have tested your 'brand new part' six months ago, but when you order it, they will get it to you ASAP, which means they aren't going to spend five days stress testing it before shipping it to you.
Things happen in 6 months -- things like solar flares. Sometimes things get shaken a bit
in shipping; sometimes technicians make errors installing memory, sometimes a speck of dust
lands in the wrong place and gets lodged in the DIMM slot as memory is being installed.
You cannot be certain the part is just as pristine as when they tested it, when it arrives in your server.
IF it will not compromise your failover capacity, run memtest86+ or a similarly exhaustive memory test for at least 72 hours after changing the configuration. It doesn't hurt to test memory. It WILL hurt later if, a couple of weeks from now, the server crashes during peak hours due to a detected double-bit memory error.
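To put the 72-hour figure in perspective, a test window only matters in terms of complete passes over all of the memory. This is a rough back-of-the-envelope estimate; the throughput figure is an assumption, since real memtest86+ pass times vary widely with DIMM count, memory speed, and which tests are enabled:

```python
def full_passes(memory_gb, test_hours, gb_per_hour=10.0):
    """Rough number of complete memory-test passes in a given window.

    gb_per_hour is an ASSUMED effective rate for one full pass;
    measure a single pass on your own hardware for a real figure.
    """
    hours_per_pass = memory_gb / gb_per_hour
    return int(test_hours // hours_per_pass)
```

At the assumed rate, a 256 GB host gets only about two complete passes out of a 72-hour window, which is why a "few hours" of testing on a large host may not even finish one pass.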
It doesn't matter if your memory came directly from Dell, IBM, HP, or your server vendor.
Test. Test. Test.
I've on occasion gotten parts from a trusted hardware vendor that turned out to be faulty after a burn-in test.
Given all the above, there are really only two reasons not to do additional memory testing:
(1) laziness/apathy (don't tell your boss), or
(2) you can't afford to keep the host down for as long as the test will actually take.
E.g. if putting that host in maintenance mode for 72 hours means HA is lost for your cluster, or proper performance of Tier 1 apps will be at risk, this could be unacceptable.
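That capacity concern in (2) can be sanity-checked before committing to a long test window. The sketch below is a deliberately simplified model: it only sums memory, ignores CPU, reservations, virtualization overhead, and HA admission-control policies, all of which a real check would have to account for:

```python
def can_tolerate_outage(host_memory_gb, vm_demand_gb, hosts_out=1):
    """Check whether a cluster can still hold all VM memory demand with
    `hosts_out` hosts unavailable (one in maintenance for testing, plus
    however many failures you still want HA to absorb).

    Conservative worst case: the LARGEST hosts are assumed unavailable.
    Simplified model -- ignores CPU, reservations, and overhead.
    """
    remaining = sorted(host_memory_gb)[:len(host_memory_gb) - hosts_out]
    return sum(remaining) >= sum(vm_demand_gb)
```

For example, a cluster of four 256 GB hosts carrying 600 GB of VM demand can spare one host for a multi-day memtest run, but taking one down while also preserving one-host HA failover (hosts_out=2) would leave it short.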