
Unfortunately, I'm just unhappy here trying to work around the auto-formatting that the VMware Communities site does for you.

 

I like to have a little more control over it, for better or worse.

 

So, I'm headed over to blogspot.  I will continue to support comments and discussion on posts here up to this point, but all future posts will be made at the new site.

 

http://that1guynick.blogspot.com

 

First couple of posts are in the works and should be up this weekend.

 

Sneak preview:  NetApp and AIX 6.1 living together running legacy apps over 10GbE NFS!

181 Views 0 Comments Permalink

 

My vCenter has been loading slowly for as long as I can remember.   It finally got annoying enough to do something about.  "Certainly, they've made this better by now."

 

 

So I got a fresh drink, dug in, and started searching for blog posts and articles on how to trick out vCenter performance, slow loading, etc.  I stumbled across this KB article from VMware titled:

 

 

Defragmenting VirtualCenter performance data indexes on a MS-SQL database

 

 

Overview of the KB:

 

 

Fragmentation of indexes occurs when the logical order of pages is different than the physical order on the disk. In VirtualCenter fragmentation occurs most noticeably due to the statistics collection and consolidation.    When the indexes are excessively fragmented, performance of queries to the VirtualCenter database is slow.

 

 

SWEET!  This could be it!   So I scrolled down to the bottom of the page for the section on VirtualCenter 4.x on SQL 2005, and ran the queries to check for fragmentation....

 

 

Use VirtualCenter
go
dbcc showcontig (VPX_HIST_STAT1)
dbcc showcontig (VPX_HIST_STAT2)
dbcc showcontig (VPX_HIST_STAT3)
dbcc showcontig (VPX_HIST_STAT4)
go

 

 

What it revealed was SHOCKING...

 

 

 

 

Now, that's a lot of numbers, but look at the ones I've highlighted.  Per the KB article:

 

 

The key pieces of information to determine fragmentation are Scan Density and Logical Scan Fragmentation. In the example, scan density is 27.49%. In an ideal environment the closer this number is to 100% the better the database performs. For Logical Scan Fragmentation, the example shows 38.79%. The lower the percentage the better the system performs.

 

 

Holy crap!  My scan density was in the teens when it's supposed to be closer to 100%, and my fragmentation was 99% when it's supposed to be closer to 0%!!!
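If you'd rather not eyeball the output, here's a rough sketch of pulling those two numbers out of the text that dbcc showcontig prints. The sample text and the cut-off values (80% density, 30% fragmentation) are my own assumptions for illustration; the KB only says closer to 100% and closer to 0% are better.

```python
import re

# Made-up sample in the general shape of DBCC SHOWCONTIG text output.
SAMPLE = """\
DBCC SHOWCONTIG scanning 'VPX_HIST_STAT1' table...
- Scan Density [Best Count:Actual Count].......: 15.20% [100:658]
- Logical Scan Fragmentation ..................: 99.10%
"""


def parse_showcontig(text):
    """Return (scan_density, logical_fragmentation) as percentages."""
    density = re.search(r"Scan Density.*?:\s*([\d.]+)%", text)
    frag = re.search(r"Logical Scan Fragmentation.*?:\s*([\d.]+)%", text)
    if density is None or frag is None:
        raise ValueError("could not find both metrics in the output")
    return float(density.group(1)), float(frag.group(1))


def needs_defrag(density, frag, min_density=80.0, max_frag=30.0):
    """Flag a table using hypothetical thresholds (not from the KB)."""
    return density < min_density or frag > max_frag


density, frag = parse_showcontig(SAMPLE)
print(density, frag, needs_defrag(density, frag))
```

Paste the real output for each table into `parse_showcontig` and tune the thresholds to whatever you consider acceptable.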

 

 

Again, per the KB article, you can run the following optimization routines to "defrag" the indexes:

 

 

dbcc indexdefrag ('<database>', 'VPX_HIST_STAT1', 'PK_VPX_HIST_STAT1')
dbcc indexdefrag ('<database>', 'VPX_HIST_STAT2', 'PK_VPX_HIST_STAT2')
dbcc indexdefrag ('<database>', 'VPX_HIST_STAT3', 'PK_VPX_HIST_STAT3')
dbcc indexdefrag ('<database>', 'VPX_HIST_STAT4', 'PK_VPX_HIST_STAT4')
go
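The `<database>` placeholder has to be swapped for your own database name before these will run. As a small sketch, assuming the database is called VirtualCenter (as in the `Use VirtualCenter` statement earlier) and that each primary key index follows the PK_<table> naming shown above, the four statements can be generated like this:

```python
def indexdefrag_statements(database, tables):
    """Build one 'dbcc indexdefrag' statement per table.

    Assumes each table's primary key index is named PK_<table>,
    which matches the four statements quoted from the KB.
    """
    return [
        f"dbcc indexdefrag ('{database}', '{table}', 'PK_{table}')"
        for table in tables
    ]


HIST_TABLES = [f"VPX_HIST_STAT{n}" for n in range(1, 5)]

for statement in indexdefrag_statements("VirtualCenter", HIST_TABLES):
    print(statement)
print("go")
```

This only prints the text; you would still paste it into your SQL tool of choice and run it there.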

 

 

So off we went...

 

 

 

 

Crazy.  I had no idea this was going on in my database.  Maybe it's one of those set-it-and-forget-it things that typical admins don't pay that much attention to, because just about every mgmt app these days has a SQL DB backend requirement.

 

 

Here's what the numbers looked like after the defrag.  It only took ~5 minutes. Logical fragmentation went from 99% to 18%.

 

 

 

 

While these results aren't as good as I expected to see (notice some of the other tables are still 50% fragmented), I'm convinced that there are more methods to trick out vCenter so that it doesn't load so slowly.  Every time I click on a different item in the column on the left, I want it to pop.  It's essentially a webpage.  It can be done.

 

 

I'm on a mission...

 

 

-Nick

 

 

879 Views 2 Comments Permalink

 

...but not necessarily in that order. In this post, I hope to do a full review of my personal experience with VSC1 and 2, as well as some hopes and projections as to where I WOULD HOPE TO see it go in the future.

 

I wanted to preface this post with the disclosure that I am a long-time NetApp user/customer. I am very experienced with the hardware, core systems, management platforms, and bolt-on software to manage and report on storage-based backup & recovery of tier 1 applications.

 

 

As much as NetApp likes to boast that they are a single platform across all of their hardware, well, for that part they are right. DataONTAP is a wonderful operational platform with loads of customizable features, an easy-to-use UI in FilerView, and a powerful CLI and now API-set that you can do just about anything with.

 

 

What they DON'T tell you is that every little thing you want to do in addition to the base management of systems requires a separate software installation. I'll list a few of them here:

 

 

SnapManager Suite (I personally call it the suite, because there's a different one for the following..)

 

  • SM-SQL - Microsoft SQL Server

  • SME - Microsoft Exchange Server

  • SMO - Oracle on Linux

  • SMO - Oracle on Windows

  • SMVI - VMware

  • SMHV - Hyper-V

 

Others include:

 

  • Operations Manager

  • Protection Manager

  • Provisioning Manager

  • System Manager

 

In all fairness, since v3.8.x, we've seen serious improvement in the integration of OpsMgr, ProtMgr, and ProvMgr. I'm just not sure I understand the need to have 2 or 3 different products. If someone bold-faced told me it was to milk more money out of the customers, I'd completely understand, but the pricing models don't reflect that.

 

I'm sure I'm missing some. Anyway, each of these applications was designed by a completely different team, obviously none of which communicated with each other. Function, Reporting, Scheduling, even basic UIs are completely different in all of them. For a company that spends so much time preaching about a single interface for everything, you'd think they would carry the same motto across to these add-on applications.

 

Now, let's not get our panties in a wad here. I'm not trying to harp on NetApp (too much). They know these areas need improving, and there is talk of something called "Unified View" coming end of next year to consolidate all of these apps into a single pane of glass.

 

 

Enter VSC...

 

 

I remember being so excited when I first heard about this coming out last year. FINALLY someone's getting the integration ball rolling. Some vocabulary for those unfamiliar. VSC stands for Virtual Storage Console, and is NetApp's integration/plugin that becomes an additional tab in your vCenter.

 

 

I remember loading it up and being so excited seeing that I could view all of my storage from within VirtualCenter. COOL! I could auto-set NFS parameters to make my hosts comply to VMware + NetApp Best Practices. COOL!

 

 

But wait, is that it? I mean it's cool and all to see all of this stuff inside of my SPOG vCenter, but ....meh....it's v1.0, I'm sure it'll get better with the next release. This is just laying the framework.

 

 

Over the course of the past year, I'm sure you've all heard more about the Rapid Cloning Utility and SnapManager for Virtual Infrastructure products than the VSC.

 

 

I heard 2.0 was coming, was super-excited about it (again), but never got a chance to play with it until this week when I was upgrading my entire Infrastructure to 4.1. I always thought, "I'll just reload it when I rebuild VC."

 

 

From my initial views, it's not much different, to be honest. It has all the same views as it did before, about the same functionality (minimal) as before, and really the only difference is that they've baked RCU and SMVI into the VSC as one thing. While that's cool and all, a lot of the core functionality of SMVI was not updated/fixed, reporting was not enhanced in any way (BIG, BIG problem), and I still do not regret my decision to move away from SMVI to vRanger. Now, I have to eat up TERABYTES of space to store vRanger repos instead of just having little snapshots. All because you won't change your reporting format, or the way you send out the notifications. It's not so much about that, but that the report lives on the server and dies with the snapshot. So if you expire a snapshot, it DELETES THE HISTORICAL REPORT! NO! I could rant (and have ranted) about this for hours, but that is not the intent of this post. I'll leave it at this: being in the Public Healthcare space, the lack of "reporting" in SMVI is simply unacceptable, and no matter how simple and great you make the functionality of the SMVI product, we're not going to use it until that part is fixed.

 

 

You're also not going to find a lot of customers using RCU outside of the bigger VDI shops, or places that have hundreds and thousands of VMs.

 

 

My hopes for this product are vast. I really hope they see how HUGE this could be for virtual environments running NetApp, and what a huge player/selling point this could be. Because it's not at the moment.

 

 

So, let's get started.

 

 

 

 

  • I want to be able to customize the view somewhat by adding additional columns. Add a column for some of the pertinent stuff such as SIS (dedupe) being enabled on the volume (read: DATASTORE). Show me a column with a percentage of space I've saved with dedupe. Throw it in my face! Don't make me go and hunt for it on the command line with a 'df -s'. Autogrow and Snapreserve percentages, as well as snap space consumed would be awesome here! Stop making me go to System Manager or FilerView to set settings on a Datastore volume. Build that functionality in here. Right click on a Datastore and enable dedupe, set the schedule for dedupe, set Autogrow and Autodelete functionality for snapshots! All within vCenter! Amazing, right?! I know.

  • The views in each section should be different based on what tier you have selected in vCenter. vCenter, Cluster, Host, Resource Pool, VM. Each view should cater to that particular level, so that when I click a host, I only see the datastores mounted to it. When I click a cluster, I see all datastores mounted to ALL hosts in the cluster. Put a little more customization into it so that we can drill down like that.

  • If you select a VM, it removes the Overview section, and takes you straight into the SMVI interface. WHY?!?! You could show me SO much here about my individual VM's, such as how much that VM is consuming within my Datastore. Which brings me to my next point:

  • ALIGNMENT! At the VM view, you could build in 'mbrscan' (instead of making us download it) to tell me whether my VM is aligned or not, and even include the functionality to power down the VM, align it, and boot it back up! As important as alignment has become to storage performance, I cannot believe you're still making us use mbrscan and mbralign manually! How are we supposed to continue to do that with ESXi going forward?! NetApp's answer: "You can run it from a Linux VM." I DON'T WANT TO! :) Other 3rd parties are capitalizing on our follies as admins BIG TIME by SELLING us this sort of product.

  • On a positive note, the ability to set all of the best practices parameters is AMAZING! This is the essence of this product, and you should see this as a stepping stone on how to do other things that we would typically do in FilerView and/or at the CLI.

  • Another positive is that the installation and setup are super-simple! Install it on your vCenter server, register the plugin, and when you open the tab the first time, it autodiscovers all of your storage systems and prompts you for credentials. DONE! Brilliant!

 

You have the potential to make this a REPLACEMENT for Filerview and System Manager for those customers with Virtual Environments.

 

My hope is that you'll start listening to us, instead of telling me: "Sorry, we're not going to change that," as I was told in a communities post.

 

We're out here! We want this to be an amazing product! Let us help!

 

 

-Nick

 

 

3,208 Views 0 Comments Permalink

VMworld 2010 winnings!

Posted by that1guynick Sep 3, 2010

This year was certainly an unprecedented and (albeit unEXPECTedly) lucrative conference for me. Whoever says they don't believe in luck, chance, and coincidence should re-evaluate their belief systems.

 

First:

 

As I mentioned in a prior post, my name was drawn from a hat to win the Ultimate vSphere Reference Library. I met up with @gregwstuart today to conduct an interview and to collect my winnings. I will post the video when Greg sends it to me.

 

 

Here's a photo:

 

 

 

 

(not pictured: Windows 7, vDestination mousepad)

 

 

Big thanks to all of the authors and providers involved, and to @gregwstuart and @vDestination for putting together such a wonderful contest.

 

 

Follow him: @gregwstuart

Blog: vDestination.com

 

 

Second:

 

 

As if that were not enough, I then won an iPad from NetApp simply by wearing their hat all week, and being COINCIDENTALLY LUCKY enough to bump into Keith Aasen, a camera man, and a beautiful woman carrying a microphone. They all turned to me and I thought maybe they just wanted to interview me about the conference. Then Keith asks, "So have you won anything from NetApp yet?" While I was trying to remember exactly what they were giving away, Keith was holding this:

 

 

 

 

OMGOMGOMGOMGOMGOMGOMGOMGOMGOMGOMGOMG......breathe......OMGOMGOMG!

 

 

Youtube video here @ 2:25

 

 

 

 

Third:

 

 

And we're not done yet. For the past two years, Kingston has been bringing a professional Guitar Hero player to their booth. Her name is Annie Leung, and she is known as the "best female Guitar Hero Player in the world." In the booth, you can play against her for a chance to win a 64GB SSD hard drive from Kingston. I tried to keep up on guitar, but just couldn't hang, even though I always play on Expert. So, I thought I would try something unique, and challenge her to a hard Bass track that I had really gotten good at: "2 Minutes to Midnight" by Iron Maiden.

 

 

While she ultimately beat me, I definitely gave her a run for her money. In the end, I received a 64GB SSD hard drive for my efforts.

 

 

 

 

Annie, thanks for an awesome match. Come find me on XBL for a match anytime. XBL name: krackmnky

 

 

You can follow Annie on Twitter and visit her YouTube channel.

 

 

This is all above and beyond a year's worth of tshirts, new backpack, badass PowerCLI reference poster, several cool ballcaps, and a lifetime's worth of new friends.

 

 

What a conference. So rewarding in every way, shape, and form. And I am eternally grateful for having been a part of such a milestone of a conference.

 

 

-Nick

 

 

287 Views 0 Comments Permalink

Vegas, baby, VEGAS!!!

Posted by that1guynick Sep 2, 2010

In case you missed the announcement earlier this week, you'll be excited to know that VMworld 2011 has been moved to Las Vegas.  My only additional comment on this is....."WOOOOOHOOOOO!!!"

 

 

One can only assume that since the labs and datacenter for VMworld 2010 were moved to Moscone West, that we have outgrown what Moscone can handle.

 

 

358 days and counting...

 

 

191 Views 0 Comments Permalink Tags: vmware, vmworld, vsphere

This morning, I sat down at my computer with a cup of coffee, started perusing through my Inbox, and happened across this email from Greg Stuart of vDestination:

 

Congrats Nick, you are the lucky winner of the vDestination.com Ultimate vSphere Reference Library.  I just did a blog post with the video of the drawing I did last night.  Are you going to VMworld?  If not, please supply me with your shipping address so I can ship these great prizes to you.  Some of the prizes I'm picking up from a couple of the authors at VMworld, so I won't be shipping it until I get back from VMworld.  Again, congrats, enjoy these awesome prizes. 

 

See the video of the drawing here:  http://vdestination.com/2010/08/27/we-have-a-winner/

 

I'm honestly floored!  I never win anything (though I said this last year when winning an iPod Touch @ VMworld), but this is truly something useful, and I hope that I can make the best use of it by expanding my knowledge, finally getting my VCP and working towards VCDX, and continuing to be an evangelist for VMware virtualization.

 

In case you're interested, here's my submission to the contest:

 

After virtualizing nearly everything in our environment over the past 4 years, there hasn’t been a lot of time to go back and revisit what we could have done better/differently. By re-educating myself, I hope to improve my environment even more, by optimizing administrative tasks with automation and PowerCLI, building reporting workflows, as well as sharing with my fellow co-workers and bringing them up to speed on virtualization. Everybody has a different way of doing things, and the little bit of Scott’s book that I have read has helped me tremendously. I hope to develop some standards around how we build and manage our environment, based on the extensive knowledge of these gurus!

 

Thanks again to everyone involved, and to all of you who donated something to the contest.  I promise to truly make the best use out of all of it that I can.

 

-Nick

221 Views 1 Comments Permalink Tags: performance, vmotion, virtual, esx, vmware, infrastructure, management, esxi, virtualization, ultimate, library, contest, vsphere, vdestination

 

Are you a NetApp customer?

Do you know which version of DataONTAP your storage system(s) are running?

 

 

As of today, we are currently running on 7.3.1.1P8. This system had 337 days of uptime. I was anxiously awaiting my 1 year milestone, as some of you may have seen me posting about it on Twitter. Some of the NetApp crowd was also involved in sharing/bragging about this. I also did not want to update it prior to being gone nearly the entire month of September for VMworld and Oracle OpenWorld.

 

 

Enter BUG #332110: Driver refresh for X1008, X1010 dual port 10G ethernet card

 

 

***If you're a NetApp customer, you can click here to see the full report

 

 

Driver refresh from vendor to fix known problems in both hardware and software.

Vendor found a problem with the Media Access Control(MAC) in the T3B2(not in T3C) revision of the chip. Only X1107 use T3C in NetApp.

When the MAC is under high load, it could get into a mode that does nothing except transmit pause frame. It would not even forward received traffic up to the host. Only reboot the filer could get the MAC out of that mode. The driver refresh include a work around to detect and reset the MAC portion of the chip at run time.

Internal MAC flowcontrol is enabled in the refreshed driver. A port of that that was ported to the old driver to fix 313558.

The support for X1106/X1107 are added in this driver.

 

 

So begins our story...

 

 

Friday night (technically Saturday morning), I get a call from a DBA around 3AM that says, "Hey, Oracle can't access its NFS mounts..."

 

 

sigh "OK, let me check..."

 

 

Dig a little deeper, call our Network Engineer, and then see that ALL VMs are disconnected. Oh dear.

 

 

Dig a little deeper, and see that NO 10GbE traffic is passing to the NetApp. Oh dear.

 

 

We had hit the bug. And the really bad part? It didn't actually take the filer or the interfaces down/offline, so cluster failover didn't take place.

 

 

What did this do for us? OPENED OUR EYES! We need to keep up with patches/upgrades better. We need a physical Domain Controller in place, because when the filer tried to come back up, it couldn't find our virtual domain controllers that were offline, and therefore, the authentication part of the "giveback" process failed.

 

 

Sidenote: "Dear NetApp, please give us some flexibility on the Active Directory integration. I've got dozens of Domain Controllers all around the country, but the installation only looks in the local site/subnet where the filer resides. Had it traversed outside, it would have found MANY online DC's."

 

 

We need to configure more granular things to monitor latency to our storage systems, because technically, it never went down.

 

 

Lastly, I had made some configuration changes to our VIF configuration over the course of 337 days of uptime. What I didn't realize is that those changes were never written to the RC file, which is what initializes and creates all of your VIFs on boot. So, we came back up on a single interface. Essentially, our entire company is running on a single 10GbE interface now. A decision was made to get the company back up and online rather than to try and troubleshoot/reconfigure everything, and that we would deal with this in a later planned downtime.
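Had I been checking for this kind of drift, it would have been caught long before the reboot. Here's a minimal sketch of the idea: keep a list of the interface commands you've run by hand, and compare it against the text of the RC file. The command strings, VIF names, and interface names below are hypothetical samples, not a dump from my filer, and how you collect the list of applied commands is left up to you.

```python
def normalize(line):
    """Collapse runs of whitespace so cosmetic differences are ignored."""
    return " ".join(line.split())


def missing_from_rc(applied_commands, rc_text):
    """Return the applied commands that do not appear in the RC file text."""
    persisted = {
        normalize(line)
        for line in rc_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [
        command
        for command in map(normalize, applied_commands)
        if command not in persisted
    ]


# Hypothetical sample data: one VIF was persisted, one was only typed at the CLI.
RC_TEXT = """\
# sample rc file contents
hostname filer1
vif create multi vif0 e0a e0b
"""
APPLIED = [
    "vif create multi  vif0 e0a e0b",
    "vif create lacp vif1 -b ip e1a e1b",
]

for command in missing_from_rc(APPLIED, RC_TEXT):
    print("NOT in rc file:", command)
```

Anything this prints is a change that will silently disappear the next time the controller boots.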

 

 

Tomorrow night, we now have to have an additional 2-4 hour planned emergency outage to correct all of these things. Why? Because I didn't stay on top of my upgrades, and didn't stay on top of my configs.

 

 

Lesson learned? You bet.

 

 

If you're using NetApp storage with any of the bleeding edge stuff like 10GbE/FCoE expansion cards, keep your stuff up to date. It is bound to have bugs; all software does.

 

 

-Nick

 

 

444 Views 0 Comments Permalink

VMworld 2010!

Posted by that1guynick Aug 23, 2010

I thought I would sit down and make some notes for myself with regards to what I was looking forward to most at VMworld 2010, and then revisit them AFTER the show to see how rewarding it was, what I gained above and beyond expectation, and if all of my hopes and concerns were addressed.

 

Most of all, I'm looking forward to the Technology Exchange.  Granted, all of us hang out on Twitter all day every day, but it's different when you put hundreds of vendors in the same big hall that all just want to geek-out about virtualization with you.  I can't express enough to vendors how rewarding that is to us end-users and engineers out in the field.

 

I'm looking forward to the keynote addresses of the major players in the industry, including Paul Maritz and his vision of VMware going forward.  These are also exciting times where new things get revealed, and then we get to run to the Tech Exchange and geek out AGAIN about the next best thing that's coming!  

 

It's a whirlwind of a good time.  From the informative sessions, tweetups & social media, fun and instructional labs, and wonderful after-parties hosted by our favorite vendors and resellers...not to mention the fact that we're in one of the coolest cities in the world...It's the best money that a company can spend on their engineers.

 

What am I looking forward to, content-wise:

 

*New books at the bookstore.  Need some new reading material, and there's lots of new books out!

*NEW BACKPACK!  I can finally retire the neon green for some gunmetal grey hotness!

*Optimization of Storage & Backups & DR.  The never-dying topic of DR...

*NetApp NTAP 8.1 and the future of virtualized storage

*Cool new plugins and features for doing all of the things INSIDE of vCenter

*Last year, I met a bunch of wonderful people and hope to have more conversations with vendors and give direct feedback to problems I have faced.

*Meeting more of my tweeps and getting some face-to-face introductions!

*Promote #NAS4LIFE with proliferation of NFS-based storage!

*Maybe a free iPad or two from contests!

 

Looking forward to seeing you all in a week!

 

-Nick

138 Views 0 Comments Permalink Tags: virtual, esx, vmware, vmworld, esxi, virtualization, vsphere

We might be what you would call an early adopter of 10GbE at my company. We've now been on it a little over a year. We were quick to abandon any Fibre Channel or block-level sort of storage connectivity because....who needs the headache?!

 

10GbE NFS is here, and it is here to stay. And with folks like Oracle literally BUILDING-IN a client in their database software to completely bypass the kernel...that's staying power. And it ain't goin' nowhere.

 

I wanted to take a minute to summarize how 10GbE has "changed our life," and how we hope to take advantage of it in the future.

 

 

First and foremost....Oracle. We recently migrated from an 18U Sun refrigerator to a 2U HP server. Now, most of that is just advancement in x86 hardware, but what we were testing alongside all of that was remarkable. We were running VMware VMs with Oracle Enterprise Linux, and piggybacking VMXNET3 vNICs over the same 10GbE connection the hosts use to connect their datastores, to mount storage directly to the VM.

 

 

I want you to think about that for a minute, and how frickin' cool that is, and where we were as systems/network admins even as recently as five years ago. Pretty remarkable, isn't it?

 

 

Now, I've heard some naysayers coming across the line, specifically about pv-vmxnet3 adapters/software/whatever, and most of that is because it doesn't support FT. Well, that's not a hard club to be in, considering almost everything and the kitchen sink doesn't support FT.

 

 

What I'm here to say is, we drank the kool-aid.

 

 

  • All of my hosts connect all of their datastores with 10GbE NFS. (2 clusters, 7 hosts, 89 VMs, including PRODUCTION SQL, Oracle, & Exchange)

  • Many of my VMs are running 10GbE NFS directly mounted to the VM

  • My Exchange Mailbox servers are running 10GbE iSCSI (created 1500 MB's in :58 seconds)

  • Our PRODUCTION Oracle 10g Servers are running 10GbE NFS, and running circles around the v880s and Fibre Channel gear we used to have.

 

 

Call me an evangelist, but I love me some 10GbE.

 

 

So go find yourself some Nexus switches, sit 'em at the top of your racks and hold on to your hats.

 

 

Nothing I've ever architected runs as fast as my VMware virtual environment on 10GbE. And I'm looking forward to future deployments where more arguments for 10GbE NFS can be made.

 

 

#NAS4LIFE

 

 

-Nick

 

 

495 Views 0 Comments Permalink Tags: performance, storage, vmotion, virtual, esx, vmware, vi3, infrastructure, oracle, nfs, management, netapp, esxi, virtualization, cisco, 10gbe, nexus, vsphere, fas

Snapshot. Snapvault. Snapmanager. SnapLock. SnapDrive. SnapMirror.

 

Are you screaming and confused yet?

 

 

 

Don't worry. Most of their new customers are, as was I, until about a couple of years or so ago when I really took the time to dive in with some SE's and try to understand the logic around it all.

 

 

 

Since I still hear from new and/or potential customers frequently, asking me to help them understand what all of this "Snap" stuff is about, I thought I would take the time to write it all out here.

 

 

 

What I’m not going to do is regurgitate a bunch of the circles and ABC diagrams that you can find on the NOW site or in a typical google search, because that can often times compound the confusion.  We’re going to strictly talk logic, vocabulary, and understanding.  So you other NetApp guys out there, please understand this is more personal theory than “inode level 2” type stuff.

 

 

 

So sit back, grab a coffee, open your mind, and let's get going!

 

 

 

Snapshot.

 

 

 

NetApp defines a Snapshot as:  "a read-only, space-efficient, point-in-time image of the data in a volume or aggregate. It is only a picture of the file system and does not contain the actual contents of data files."

 

 

 

Wait. What do you mean it doesn't contain data?!  What's the point otherwise!?

 

 

 

OK, here's my version:  Think of a snapshot as a differential backup from the last time you took a full backup.  What happens during a differential backup? Only the parts/files that have changed since your last backup get backed up.  Similar concept, only we're talking blocks instead of files. Your live volume will be the equivalent of your full backup, and every time you take a snapshot, it’s another differential copy.  This isn’t a 1-to-1 comparison of the two concepts, but it’s an easy way for new folks to understand.
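For the folks who think better in code, here's a toy model of that idea. It is purely my own illustration of "a picture of the file system that doesn't copy the data" and has nothing to do with how Data ONTAP is actually implemented; the class and method names are made up.

```python
class Volume:
    """Toy volume: a map of block numbers to block contents."""

    def __init__(self):
        self.blocks = {}     # live view: block number -> data
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_no, data):
        # A write replaces the live reference; snapshots keep the old one.
        self.blocks[block_no] = data

    def snapshot(self, name):
        # Copy only the map of references, not the data it points at.
        self.snapshots[name] = dict(self.blocks)

    def read_snapshot(self, name, block_no):
        return self.snapshots[name][block_no]

    def changed_since(self, name):
        """Block numbers that differ between the live volume and a snapshot."""
        snap = self.snapshots[name]
        return sorted(
            block_no
            for block_no in self.blocks.keys() | snap.keys()
            if self.blocks.get(block_no) != snap.get(block_no)
        )


vol = Volume()
vol.write(0, "payroll-v1")
vol.write(1, "email-v1")
vol.snapshot("hourly.0")
vol.write(1, "email-v2")   # change an existing block
vol.write(2, "new-file")   # add a new block
print(vol.changed_since("hourly.0"))
print(vol.read_snapshot("hourly.0", 1))
```

The snapshot still reads the old contents of block 1, and only the blocks written after it count as "changed", which is the differential part of the analogy.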

 

 

 

All those words at the top of this post that start with Snap are all based on the Snapshot, and the concepts around it.  Keep that in mind as we talk about the other big Snap words.

 

 

 

So let’s continue on with what is likely the most commonly used…

 

 

 

Snapmirror.

 

 

 

Snapmirror is exactly what you think it would be, just as it sounds.  It is a mirror of a dataset between two different NetApp filers.  While there are different levels and utilizations of Snapmirror, I don’t want to go into it here, because that’s not what we’re talking about.

 

 

 

How are Snapshots and Snapmirror related? That’s what I want to cover here, because everything is based around the Snapshot.

 

 

 

When you first build a Snapmirror relationship of a volume between two filers, this relationship must be “initialized.”  Initializing the Snapmirror is a fancy way of saying, “I need to copy all the data from the source to the destination, before I can actually start syncing the data.” How does it do this effectively?  You guessed it.  Snapshot.  A snapshot is taken at the time of initialization, and this is called your “baseline.” Contents of this snapshot are completely sync’ed between the two filers, including any and all other snapshots you’ve taken on the source.  Since snapshots of a volume are stored within said volume, and you’re making a 1:1 mirror copy of said volume, all the other snapshots from the original source volume are coming along too.  So when you go to snap list <vol name> or Filer > Volume > Snapshots > Manage in FilerView, you will see all of your scheduled snapshots, as well as a new one (“baseline”), and all of these will exist for that volume on both filers hosting the Snapmirror relationship.
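Here's the initialization step as a tiny sketch, again a toy of my own and not NetApp's implementation. A volume is modeled as a plain dictionary with made-up keys, and the snapshot name "nightly.0" is just an example.

```python
import copy


def initialize_mirror(source_volume):
    """Toy 'initialize': take a baseline snapshot, then copy everything."""
    # The baseline snapshot is taken on the source at initialization time.
    source_volume["snapshots"]["baseline"] = dict(source_volume["blocks"])
    # The destination receives the live blocks plus every snapshot,
    # including the baseline that was just created.
    return copy.deepcopy(source_volume)


source = {
    "blocks": {0: "a", 1: "b"},
    "snapshots": {"nightly.0": {0: "a"}},
}
destination = initialize_mirror(source)

print(sorted(source["snapshots"]))
print(sorted(destination["snapshots"]))
```

Both lists come out the same, which is the point: the scheduled snapshots and the new baseline exist on both sides of the relationship.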

 

 

 

Make sense?  If not, ask away in comments!

 

 

 

"But Nick, I don’t want to keep a constant 1:1 full copy sync’ed of all of my volumes on two controllers! I just want a backup mechanism to be able to store MORE snapshots for longer periods of time!"

 

 

 

Ah-HA!  Enter SnapVault.

 

 

 

SnapVault is the disk-to-disk “backup” version of SnapMirror.  Where SnapMirror is the 1:1 scheduled sync’ing of a volume, SnapVault is the scheduled “backup” of a volume, only copying changed blocks in…you guessed it…Snapshots (noticing a trend here?)…from the source to the destination.

 

 

 

SnapVault systems are made up of two major components:

 

 

 

SnapVault Primaries:  These are your source systems containing volumes/datasets that you want to backup.

 

 

 

SnapVault Secondaries:  These are the destination system(s) where you long-term archive as many iterations of a dataset as you have available disk to store them on.

 

 

 

To make it easier to understand, think of it as a hub-and-spoke topology.  In the simplest of configurations, your SV_Secondary (destination) filer is the hub of the wheel, and each SV_Primary (source) represents a single spoke, connecting back to the hub.  You can SnapVault many primary systems to a single secondary system, again, for as much disk as you have available on the secondary to store data. For every controller (2 on clustered systems) you must have a SV_Primary license if you intend to use SnapVault to backup volumes on that controller.  However, you will only need the single SV_Secondary license, unless you begin to expand your SnapVault configuration out even more.
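The license math from that paragraph, written out as a quick sketch. This only encodes the rule as I've described it above (one SV_Primary per source controller, one SV_Secondary for the hub); the system names and the function are hypothetical, so check with your NetApp rep before ordering anything.

```python
def snapvault_licenses(primaries, secondaries=1):
    """Count licenses for a simple hub-and-spoke SnapVault layout.

    primaries maps each source system name to its number of controllers
    (2 for a clustered system).
    """
    return {
        "sv_primary": sum(primaries.values()),
        "sv_secondary": secondaries,
    }


# Hypothetical spokes: one clustered pair and one single-controller filer.
spokes = {"site-a-cluster": 2, "site-b-filer": 1}
print(snapvault_licenses(spokes))
```

Adding another spoke only grows the SV_Primary count; the hub stays at one license until you add more secondaries.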

 

 

 

SnapVault also gives you the ability to centrally control all snapshot schedules among all of your filers across your entire infrastructure, to maintain naming conventions, standardize schedules, etc.  This gets even more centralized with NetApp’s other software product, known as Protection Manager.  But, again, that’s a little out-of-scope for this writing.

 

 

 

Before we get to SnapManager products and SnapDrive, I wanted to cover one that is most-often used in Legal, Hospital/Medical, and government agencies, which require extremely long-term archiving of WORM data.  In case you didn’t know, WORM stands for Write-Once-Read-Many.  It is a way of locking down change access to data completely (even from admins) but still allowing it to be accessed in a read-only fashion for reporting, auditing, compliance, and legal purposes.
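If WORM is a new concept for you, this little sketch shows the behavior in miniature. It is a generic illustration of write-once-read-many semantics with a made-up class, not a model of any vendor's product.

```python
class WormStore:
    """Toy write-once-read-many store."""

    def __init__(self):
        self._records = {}

    def write(self, key, data):
        # The first write succeeds; any later write to the same key is refused.
        if key in self._records:
            raise PermissionError(f"{key!r} is already committed and locked")
        self._records[key] = data

    def read(self, key):
        # Reads are always allowed, as many times as needed.
        return self._records[key]

    def delete(self, key):
        # Nobody, admins included, gets to remove a committed record.
        raise PermissionError(f"{key!r} cannot be deleted from WORM storage")


archive = WormStore()
archive.write("scan-0001", b"patient record")
print(archive.read("scan-0001"))

try:
    archive.write("scan-0001", b"tampered")
except PermissionError as err:
    print("refused:", err)
```

Real WORM products also deal with retention periods, which this toy leaves out entirely.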

 

 

 

SnapLock.

 

 

 

SnapLock is NetApp’s way of locking down a dataset in this fashion.  It is almost always used in conjunction with a 3rd party application that is used to manage archived data.  A comparable product is EMC’s DiskXtender, which uses UDO laser discs as WORM media.  But that is still cumbersome, as it requires you to manage laser discs, and protect them if they ever have to be moved, removed, or destroyed.

 

 

 

SnapLock is still used in the same fashion, only you’re locking this data down onto your Unified shared storage, and this can be mirrored, snapped, or vaulted to other controllers, and will still maintain its SnapLock status.

 

 

 

Honestly, this is the one I have the least experience with, but being a medical company, we are looking at ways to improve our current WORM solution, and SnapLock is definitely on the radar for us in the next couple of years.

 

 

 

SnapDrive and SnapManagers

 

 

 

These two are not directly related to Snapshots, but they affect the ways and processes that Snapshots are taken.  SnapDrive and SnapManager products are middle-men.  They are communication applications that allow applications, operating systems, and NetApp storage systems to all speak the same language and accomplish all of your goals for you.

 

 

 

NetApp has a SnapManager app out there for just about every tier1 application on the marketplace.

 

 

 

Oracle (SMO)

Exchange (SME)

SQL (SM-SQL)

VMware (SMVI)

 

 

 

…and more.

 

 

 

I’m going to pick the one I’m the most familiar with (Oracle) to give some direct steps and examples of how these two coordinate among the App, OS, and Storage system to give you app-level-consistent snapshots of your apps and databases.

 

 

 

SnapDrive is the “agent” of sorts that gets installed locally on the host of whatever application/OS/DB you’re wanting to take snapshots of.  He will give the OS and local host disks the correct instructions when the NetApp controller makes calls to him.

 

 

 

SnapManager application software also gets installed on your host, but it is geared more as an agent for the application/db running on said host.  He will give the application itself the correct instructions when the NetApp controller makes calls to him.

 

 

 

These two have to work in conjunction with each other, as well as with the OS/application/DB/NetApp, in order to give you consistent backups.

 

 

 

So here we go:  The layman’s language of how SnapManager takes a snapshot of a production Oracle database.

 

 

 

Let’s assume you want to snapshot an Oracle database once per day at midnight.  Oracle databases need to be put into “hot backup mode” in order for the snapshot to be worth anything.  You schedule SMO to take this snapshot of this database named ORACLE on a host named SERVER1 every night at midnight.  The data files for this db are stored on FILER1.

 

 

 

It goes down a little something like this:

 

 

 

SMO> Hey, FILER1, my schedule says it’s time to snapshot ORACLE database on SERVER1.

 

 

 

FILER1> Hmm, ok.  Let me check with SERVER1 and make sure SnapDrive isn’t busy, because he’ll have to pause the mount points to the OS while we do this thing.

 

 

 

SnapDrive> Hey! SERVER1 said you needed me. Yea, I’m ready when you guys are!

 

 

 

SMO> OK, hold on, let me put the database in hot backup mode. (alter database begin backup) OK, done!

 

 

 

FILER1> OK, SnapDrive, I’m gonna snapshot, you ready on those mount points?

 

 

 

SnapDrive> Ready when you are!

 

 

 

SMO> Go go!

 

 

 

(roughly 60 seconds per 100GB is typically what I see here)

 

 

 

FILER1>  OK, snapshots completed, SMO.  Name them whatever you like and take the database out of hot backup mode when you’re ready!

 

 

 

SMO> Done and done. (alter database end backup)  We’re done!  See ya tomorrow night!
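
For the script-minded, here is a rough Python sketch of the ordering in that conversation.  It is not how SMO or SnapDrive are actually implemented, and every function in it is a made-up stand-in that just records what would happen.  The two SQL statements are Oracle’s standard ones for entering and leaving hot backup mode (the database stays open the whole time); whether SMO issues exactly these is an assumption on my part.

```python
from contextlib import contextmanager

log = []  # ordered record of who did what


def run_sql(statement):
    # Hypothetical stand-in for a database call; it only records the statement.
    log.append(("SMO", statement))


@contextmanager
def hot_backup_mode():
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        yield
    finally:
        # Always leave backup mode, even if the snapshot step fails.
        run_sql("ALTER DATABASE END BACKUP")


@contextmanager
def mounts_paused(host):
    # Hypothetical stand-in for SnapDrive holding the mount points steady.
    log.append(("SnapDrive", f"pause mounts on {host}"))
    try:
        yield
    finally:
        log.append(("SnapDrive", f"resume mounts on {host}"))


def take_snapshot(filer, name):
    # Hypothetical stand-in for the filer-side snapshot.
    log.append((filer, f"snapshot {name}"))


def nightly_snapshot(filer, host, database):
    with hot_backup_mode():
        with mounts_paused(host):
            take_snapshot(filer, f"{database}_nightly")


nightly_snapshot("FILER1", "SERVER1", "ORACLE")
for who, what in log:
    print(f"{who}> {what}")
```

The nesting is the important part: the database goes into backup mode first and comes out last, with the snapshot in the middle.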

 

 

 

So, while SnapManager and SnapDrive don’t really have anything to do with the snapshots themselves (technically…let’s not split hairs here), they are pivotally important to the communications between all systems involved.  And while I didn’t understand the product(s) initially, I have certainly fallen in love with them, and would love to be involved in the “how to make them better” side of things.

 

 

 

Each of the tier1 app SnapManager products must be run and configured independently of the others, but there are unconfirmed rumors abounding about a “unified view” version coming soon, where you will get an MMC-style console that encapsulates all SnapManager products in one screen, giving you a single pane of glass to manage all of your snapshotting routines for all of your tier1 apps.

 

 

 

I hope this will serve as a good primer for you newer NetApp customers out there, and help clear up some of the confusion about what each of the Snap products does and how they interact with each other.

 

 

 

At the end of the day, spend most of your time understanding what a Snapshot is, but more importantly what it isn’t.  Once you have that down pat, the rest just comes along naturally.

 

 

753 Views 1 Comments Permalink Tags: storage, snapshot, backup, infrastructure, management, netapp, virtualization, restore, snap, snapmanager, snapdrive, snapmirror, fas, snapvault, snaplock

So just a quick note here.  We are staging our NetApp + Nexus 10GbE upgrade/migration, and I was going through the motions, moving VMs off in order to add the 10GbE NICs, when I started running into Host CPU incompatibility errors.  What I wanted to do here was basically post everything we discovered and tried (unsuccessfully) to get live vMotion to work.

 

Symptoms:  live vMotion throws an error about Host CPU incompatibility at 'ecx'.  Cold migration, on the other hand, goes without a hitch.  That was unacceptable, though, because we promised NDUs to the new NetApp.

 

The digging begins.  First things first, I KNOW that Intel-VT is enabled in the BIOS, because we would not be able to host x64 VMs were it not.  And we have several, and they have lived on every host.  But you know what?  I'll check anyway.  I was right.  Intel-VT is enabled.

 

At the suggestion of Rick Scherer, I tried to create a brand new EVC-enabled cluster.  Apparently the X7350 3GHz Xeons don't support this, as I was unable to add the first host to this new cluster.

 

Back to the drawing board.  OK, let's start looking into the NX-bit bypass, and dig through the vmx files.

 

I powered down a non-critical VM, changed the CPUID Mask to "Hide the NX/XD flag from guest," powered it back up, and tried to vMotion again....failed.

 

Now I'm really getting frustrated, and digging into the vmx files with a fine-tooth comb.  Then I stumbled across something......odd....

 

tools.syncTime = "FALSE"

uuid.location = "56 4d 65 df 6b c0 4c 44-7f ee b8 89 f7 af 4d 66"

sched.mem.max = "1024"

sched.swap.derivedName = "/vmfs/volumes/cd9c6686-f1131816/Web4-01/Web4-01-99e2ad9d.vswp"

hostCPUID.0 = "0000000a756e65476c65746e49656e69"

guestCPUID.0 = "0000000a756e65476c65746e49656e69"

userCPUID.0 = "0000000a756e65476c65746e49656e69"

hostCPUID.1 = "000006fb000408000004e3bdbfebfbff"

guestCPUID.1 = "000006fb00010800800022010febbbff"

userCPUID.1 = "000006fb000408000004e3bdbfebfbff"

hostCPUID.80000001 = "00000000000000000000000120000800"

guestCPUID.80000001 = "00000000000000000000000120000800"

userCPUID.80000001 = "00000000000000000000000120000800"

evcCompatibilityMode = "FALSE"

 

I actually have no idea what this truly means, or where it comes from, but guestCPUID.1 here does not match hostCPUID.1 and userCPUID.1, while on just about every other 1vCPU VM I inspect, I see matching values across the board.  I did make a copy and try changing it to match the other values, and the migration still failed.
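
If you want to see exactly where the two values disagree, here is a small Python sketch that splits each 32-hex-digit string into four 32-bit registers and XORs them.  The eax/ebx/ecx/edx ordering is my assumption, but it lines up with the ebx, ecxFeat, and edxFeat values that /proc/vmware/cpuinfo reports for these hosts.  It only shows which bits differ; it makes no claim about which CPU feature each bit represents.

```python
REGISTERS = ("eax", "ebx", "ecx", "edx")  # assumed order of the four 8-digit groups


def split_cpuid(value):
    """Split a 32-hex-digit CPUID string into four 32-bit register values."""
    if len(value) != 32:
        raise ValueError("expected 32 hex digits")
    return {reg: int(value[i * 8:(i + 1) * 8], 16) for i, reg in enumerate(REGISTERS)}


def diff_cpuid(a, b):
    """Return {register: xor_mask} for every register where a and b differ."""
    ra, rb = split_cpuid(a), split_cpuid(b)
    return {reg: ra[reg] ^ rb[reg] for reg in REGISTERS if ra[reg] != rb[reg]}


# Values copied from the vmx file above.
host_cpuid_1 = "000006fb000408000004e3bdbfebfbff"
guest_cpuid_1 = "000006fb00010800800022010febbbff"

differences = diff_cpuid(host_cpuid_1, guest_cpuid_1)
for reg, mask in differences.items():
    print(f"{reg}: differing bits 0x{mask:08x}")
```

Run against the values above, eax matches while ebx, ecx, and edx all differ, ecx being the register the vMotion error complains about.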

 

The tech I worked with last night was supposedly assigning this to Bug ID 380853 (which I can't access...partners?!) and submitting a set of exported logs to the engineering team.

 

I don't have time to sit around and wait, so I'm proactively moving things off.  If they need to be quickly powered-down to be moved, I'm clearing it with the responsible party, and plowing through it.  Certainly not how I would like to go about it, but hey, bigger fish to fry.

 

A couple of things to note.  There are (3) HP DL580 G5 servers in this cluster.  They're identical.  Same box, same (4) X7350's per host (OR ARE THEY?!), same 128GB RAM per host.   Each of them had different BIOS revisions, so that is being brought up to the latest (May 2009).   Also, I decided to check out the  /proc/vmware/cpuinfo on each host. 

 

Dammit, Intel.  This is annoying.  Please stop doing this.  See below:

 

ESX host 1.   This is the oldest one, and set the stage for the other two with the X7350s.  We were specific about getting identical hardware so we didn't have to deal with performance degradations and "workarounds" such as EVC.

 

# cat /proc/vmware/cpuinfo

 

              ESX1          ESX2          ESX3
pcpu          00            00            00
family        06            06            06
model         15            15            15
type          00            00            00
stepping      11            11            11
tscKhz        2933332       2933434       2933434
processorKhz  2933332       2933434       2933434
busKhz        266666        266675        266675
name          GenuineIntel  GenuineIntel  GenuineIntel
ebx           0x00040800    0x00040800    0x00040800
ecxFeat       0x0004e3bd    0x0004e3bd    0x0004e3bd
edxFeat       0xbfebfbff    0xbfebfbff    0xbfebfbff
initApic      0x00000000    0x00000000    0x00000000
apicID        0x00000000    0x00000000    0x00000000
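
Eyeballing three columns of hex is error-prone, so here is a quick Python sketch that reports only the fields where the hosts disagree.  The values are transcribed by hand from the cpuinfo output above, and the function name is just something I made up.

```python
HOSTS = ("ESX1", "ESX2", "ESX3")

# (field, ESX1, ESX2, ESX3), transcribed from /proc/vmware/cpuinfo on each host.
ROWS = [
    ("pcpu", "00", "00", "00"),
    ("family", "06", "06", "06"),
    ("model", "15", "15", "15"),
    ("type", "00", "00", "00"),
    ("stepping", "11", "11", "11"),
    ("tscKhz", "2933332", "2933434", "2933434"),
    ("processorKhz", "2933332", "2933434", "2933434"),
    ("busKhz", "266666", "266675", "266675"),
    ("name", "GenuineIntel", "GenuineIntel", "GenuineIntel"),
    ("ebx", "0x00040800", "0x00040800", "0x00040800"),
    ("ecxFeat", "0x0004e3bd", "0x0004e3bd", "0x0004e3bd"),
    ("edxFeat", "0xbfebfbff", "0xbfebfbff", "0xbfebfbff"),
    ("initApic", "0x00000000", "0x00000000", "0x00000000"),
    ("apicID", "0x00000000", "0x00000000", "0x00000000"),
]


def differing_rows(rows):
    """Return {field: {host: value}} for every field whose values are not all equal."""
    result = {}
    for field, *values in rows:
        if len(set(values)) > 1:
            result[field] = dict(zip(HOSTS, values))
    return result


mismatches = differing_rows(ROWS)
for field, per_host in mismatches.items():
    print(field, per_host)
```

Only the three clock fields come back, with ESX1 as the odd one out; the feature registers (ebx, ecxFeat, edxFeat) are identical on all three hosts.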

 

So I sent a big long email draft to my local Intel field engineer and we'll see what the deal is.   For now, I'm putting a smile back on my face and heading to the datacenter to turn up some 10GbE!

 

-Nick

231 Views 0 Comments Permalink Tags: vmotion, virtual, esx, vmware, vi3, infrastructure, management, esxi, virtualization, vsphere

Is it just me, or is there still a lot of taboo and misinformation in our industry about VMware, and what it brings to the table, and what it's capable of?  As big of keywords as 'virtualization' and 'consolidation' have become, I still get the eerie doubt from a lot of peers about just that.  The minute you say, "Hey, why don't we just VM it!" there's a hush that falls over the room as everyone starts throwing random reasons out why they think it wouldn't work.  And it really comes down to a lack of education (adoption might be a better word..."the norm" so to speak) throughout the industry.  You sales folks have GOT to stop feeding us high-level powerpoint slides with these monotone readings over the top of them.  Put the slides away and just TALK SHOP WITH US!

 

As much as VMware and other big storage players have done for the industry, especially in the last few years, the sales force is still the same as it was 10-20 years ago. You're still just using keywords to get our attention, but not really telling us how to use your product to our advantage.  Even when we've had vendors and resellers come INTO our office to give 'demonstrations,' it's still nothing more than them plugging their laptop into our big tv, and showing us the same powerpoint they've shown to 100 other potentials.  Eventually, the techies get to take over, and the lonely PSE they let out of the closet for a day to come along finally gets to converse with the company's IT team and we get down to business, if we're not nodding off from watching the slides click by while the droning of the sales rep reading them verbatim seems more like a lullaby.

 

But that's not what I'm writing about here.  This isn't a rant.  This was a precursor into a battle I didn't know I was getting myself into.

 

A little backstory...

 

Our current environment involves a couple of Sun 18U refrigerators with a fiber-channel Hitachi brick. Pretty cookie-cutter for a high-end, mission-critical Oracle database. Layer on a bunch of Oracle database, application, and DR software as well as all the manual processes that go along with them, and you've got a rough idea of what we're starting with.

 

Before this upgrade process started, we had finally gone down the consolidated storage path, joined the NetApp country club (as my boss likes to call it), and were using it for mainly shared network drives, user data, and VMware.  The NetApp came into play because we also wanted to consolidate the Oracle storage onto the NetApp and off of the standalone brick.

 

There were a few scenarios that were thrown out in the beginning.  There were the new Niagara-based Sun T2000's, which were actually RECOMMENDED to us, and I'm still convinced to this day that this is 100% our fault for not doing the homework we should have done pre-purchase.   In case you're not familiar, the Niagara chipset was never really designed to run databases.  They turned out to be a huge flop during our performance testing until the right engineer/Sun-guru came along and told us about the whole Niagara story.

 

It was also during this phase that we started parting the seas of IT between the DBAs and the Operations crowd.  I had been doing some serious reading into NFS.  I have had a lot of success using NFS + VMware for my datastores.  We actually started out using iSCSI + VMware, but I added a new large datastore via NFS and the performance was actually better.  And the allure of resizing entire datastores on the fly was enough to make me jump in with both feet.    The DBAs, however, would never conceive of running anything but fiber-channel to high-end storage.  So our initial failed phase of testing with the T2000s was largely attributed to using NFS, because it was the likely scapegoat.  It wasn't until we hooked the T2000s up via fiber-channel to the same NetApp configuration that we noticed the performance was still terribad, and only marginally better than our current 5+ year old Sun refrigerators.

 

After some serious digging, NetApp and Sun engineer involvement, and theorycrafting (more like "ok now what the hell are we supposed to do?"), I started planting the VMware bug.

 

"No way."

"There's no way VMware can handle the workload."

"Are you kidding me?"

etc.

 

So, I sat quietly as we went on to the next Sun solution.  This time, we did some serious homework, in looking at the Sun M4000.  Different architecture, built to run high-end databases, etc etc etc.

 

At the same time, in another corner of the datacenter, I proactively built a Linux VM on my own.  Working with one of the DBAs, we installed a copy of 11gR1, and migrated a copy of our production database to it.  We divvied up some volumes on the NetApp to host the data, and mounted those via NFS directly to the VM.   We also configured Oracle's new D-NFS client, to make a direct connection from the database to the storage, bypassing the kernel layer.

 

The results....well, they were nothing short of shocking.  The linux VM running 11g completely outworked the M4000 with a 1:1 hardware configuration.  Mind you, this was only on a 1GbE connection, single path, to a NetApp filer that was already heavily taxed hosting tons of CIFS shares, and NFS ops to VMware.  (Testing was conducted using SwingBench and Real Application Testing)

 

So, where do we go from here?   We had a meeting.  Everyone was excited/shocked/appalled by the results ("How the hell did a little VM run circles around the boxes that launch the space shuttle?!").  We all threw our hands in the air, and said "GO VMWARE!" and off we went, with a purpose in mind to virtualize Oracle.

 

We are currently in the process of building out a production-level environment, and once I have been cleared to discuss it more, look for a subsequent post related to the final architecture.

 

We are definitely excited.  Not so much for the consolidation VMware brings to the table, but for the agility and easy resilience that things like vMotion, HA, and eventually FT bring, especially for big tier 1 apps such as Oracle OLTP databases.

 

Stay tuned.

 

-Nick

1,543 Views 5 Comments Permalink Tags: virtual, esx, vmware, vi3, infrastructure, oracle, management, esxi, virtualization, vsphere

The day had arrived.

 

After many marketing documents, press releases, and speculative forum posts, we were all waiting with bated breath to see just how VMware could make one of the most revolutionary products to ever hit the market even better.

 

I attended the vSphere 4 launch tour.  I watched the powerpoint slides scroll through with all of the piles of new features.  I was sold.  I researched a bit, and couldn't find any articles negating the ability to upgrade.  Asked all my questions beforehand...

 

"Can ESX 3.x and 4 hosts co-exist in the same cluster?"

"Is there anything retroactive about running a non-upgraded-VM on a ESX4 host?"

"Can I still join ESX4 hosts to VirtualCenter 2.5?"

 

This accurately defined my upgrade path, and this is basically a synopsis of how that process went down.  There were some surprises, but none of them were bad.

 

Step 1.  Upgrade VirtualCenter 2.5.x to vCenter 4.0

 

"Wait...what?"  This shocked me a bit at first, because the answers I got from the launch tour told me that I could join ESX4 hosts to VirtualCenter 2.5!  Why do I need to upgrade VC to vCenter4 first? 

 

And then the golden nugget was dropped on me.

 

It just so happened that we had coincidentally just purchased an additional host to add to the cluster.  I had waited to do the install so that I could truly test the whole "adding an ESX4 host to VC 2.5" thing.

 

Clean install went flawlessly.  Things to note:  You can now preload custom drivers DURING the install.  Got a weird NIC or HBA?  Oddball SCSI card?  Here's your chance.  No need to go in post-install and do your configuring there.  You can do it before the installer launches into loading packages.  COOL!  Other than that, it was just like installing any of the other versions, aside from a few graphics changes.

 

First host install complete.  Once it's loaded, I connect to it directly via the VIC, to do some simple configuring.

 

ROADBLOCK!

 

New client install time!   The first time you connect to ESX4 or vCenter4, you're prompted to install the new client.  Schnazzy new dashboard, and I see the potential here for lots of vCenter plugins.  (more on that in another post)

 

Client updated, onward!  Connect to the host, configure my storage and networking to be identical to the other hosts in the cluster.  Easy-peasy, no problems there.  Right-click on the cluster, add host, go through the motions, and in a matter of minutes, I now have an HA/DRS-activated ESX4 host in my VC 2.5/ESX3.5.x cluster.   Flawless.

 

WOW!  This is great!  So I kick one of the 3.5 hosts in Maintenance Mode, go to lunch while the VM's migrate off, come back and pop in the ESX cd.  Boot up....

 

"Existing ESX hosts can no longer be upgraded via this method.  Please upgrade through vCenter."

 

Hmm...a bit of quick digging made me realize that there was not a choice in upgrade paths anymore.  So I read back over the upgrade documentation once again.  AHHHHH!  So THIS is why they want you to upgrade VC to vCenter4 first!

 

Apparently, one of the big changes in vCenter4 was making Update Manager NOT USELESS!  "Alright....let's give this thing a shot."

 

So, ditching the idea of upgrading hosts for the moment, I switched gears over to VC.  Downloaded the .zip pack and copied it to my existing VC server (which is also a VM).  Ran the executable, it found my existing VC installation.  Great!  Then the prompts about the database come up.  So I put in my db user/pass, and the name of the database, ERROR!  Uh oh.  It prompts me that the vc user must have full 'sa' rights to the MSDB database.  I was confused for a bit because I didn't read the fine print closely enough.  I thought it was referring to my actual VC database.  So, I double-checked that the vc user had owner privs to the VC database, went back and tried again.  ERROR!  It took me a couple of tries to realize I wasn't paying enough attention, and that it said 'MSDB' database, and not my actual 'VC' database.

 

Problem solved, moving on.  Upgraders, just make sure you grant those rights on the MSDB database when you upgrade to vCenter, not just on your actual VC database.

 

Licked that, install finished fine, and IIRC, it prompted me to remove the old model licensing server.  Note, if you're upgrading all of your vCenter and hosts to v4, there's no need to keep it, but it will not interfere with anything.  v4 doesn't even reference it, and it's good to have around in case you need to do something in 3.5 (i.e. test/dev environments, perhaps?)

 

Restarted the VC VM (sorry, "vCenter" now.  Did Steve Jobs start working for VMware under the table? vCenter/vSphere...) and connected to it.  Nice dashboard!  I like it!  Love the new licensing model as well.  I went straight to the licensing site (licensing.vmware.com) and consolidated all of my individual licenses into ONE KEY!  This was amazing!  Gone are the days of managing dozens of license keys!  And it actually WORKS now!  Also, little did I know, but VMware had also already upgraded my license keys to be vSphere4 keys.

 

So, VC is upgraded to vCenter 4.  Licenses have been properly allocated across all hosts with ONE key, and we're ready to rock.

 

In my next writeup, I'll go over upgrading your individual hosts.

 

 

 

-Nick

 

 

1,344 Views 1 Comments Permalink Tags: virtual, esx, vmware, vi3, infrastructure, management, esxi, virtualization, vsphere

Introductions

Posted by that1guynick Jun 29, 2009

 

In 1999, a friend of mine approached me asking me if I wanted to help him run some cables in an office building for $10/hr.  I said sure.  Being a techy-guy, I started asking questions about what cables were what, where they plugged in, and why they plugged in there...

 

It was all downhill from there.

 

Less than 2 years later, I was an MCSE-2000.  For the past 10 years, I've worked in various systems administration positions, primarily focusing on Active Directory & Exchange domains.  We did lots of Exchange migrations and AD 2000-to-2003 migrations earlier in the decade, because well, that's what admins did in those days.

 

But it wasn't until my current employer and I embraced the idea of virtualization that I truly came into my own. I finally found something I wanted to "specialize" in.

 

Since then, everything we have done has always had the "should we virtualize it?" tagline added to it, and more recently, it has been, "is there any reason why we can't virtualize it?"

 

My virtual career path started with ESX 3.0, and NetApp shared storage.  Most of what I discuss here will be our trials and tribulations with migrating a company in the Healthcare sector (non-hospital) to the virtual world of consolidating servers and storage.

 

I welcome feedback and any questions anyone and everyone might have, and hope that this not only serves as a record of my progress, but as a way for me to help fellow admins with situations I may have already encountered.

 

Nick Howell

Datacenter Engineer

 

 

62 Views 0 Comments Permalink

