SQL Server Performance issues as a VM - Page 2

TomR_CAW · ‎08-08-2008

Since converting an MSSQL server to a virtual machine we have seen severe performance drops and an array of unusual errors. Here's the scenario:

3 Dell PowerEdge 2950 servers each configured as follows:

2 PROC, Quad core Intel E5430 2.66Ghz

8GB RAM

2 onboard Broadcom GB NICs (used for service console and admin network only)

1 Intel GB 4 port NIC (2 for VMotion, 2 for Production VMs)

2 QLogic QLA2432 4GB HBAs

EMC CX300 SAN

Admin, VMotion and Production networks are physically separate, each with it's own GB switch.

We are running ESX 3.5 on all hosts with VC 2.5 setup as a VM. HA, DRS, VMotion all enabled. There are six VMs running currently arranged like so:

Host1

WebServer VM (4 vCPU, 3GB RAM, Win2K)

Host2

WebServer2 VM (1 vCPU, 1GB RAM, Win2K3), secondary sites, minimal traffic

WebServer3 VM (1 vCPU, 1GB RAM, Win2K3), not in production yet, no traffic

VirtualCenter VM (1 vCPU, 2GB RAM, Win2K3)

DataImport VM (1 vCPU, 1GB RAM, Win2K), processes data files to load into SQLServer

Host3

SQLServer VM (4 VCPU, 3GB RAM, Win2K, MSSQL2000 sp4) primary backend database

The WebServer, DataImport and SQLServer VMs were all converted using Converter Enterprise.

All VMs are stored on the SAN in LUN A, except SQLServer in LUN B.

All SQLServer databases (.mdf files) are in two RDMs to LUNs C & D.

All SQLServer log files are in the VM.

We operate approximately 200 SQL databases ranging in size from 20MB to 20GB. Most are around 750MB. There is a lot of SQL activity. We receive daily update files from clients to load into their databases. These files are stored in a separate physical server and processed by the DataImport VM which does most of the insert/update/delete functions. The web portals themselves are mostly just queries. Very few inserts or updates. The past couple weeks since converting everything to VMware we have seen a lot of performance issues. Web Portal requests routinely return ASP timeouts while waiting for responses from SQL. We have also seen a lot of other intermittent errors since the conversion like these:

"The RPC Server is unavailable" when accessing some pages.

and

"Microsoft OLE DB Provider for SQL Server error '80004005'

DBNETLIB ConnectionOpen (PreLoginHandshake()). General network error. Check your network documentation."

Both of these are intermittent and will often go away after refreshing a page. We've also seen these creep up on our web server: "Out of process application '/LM/W3SVC/1/ROOT/xxxxx' terminated unexpectedly."

After some monitoring with esxtop, VI client Performance, perfmon, and our switches the network, SAN fabric and memory do not appear to be dragging. The only thing that is jumping out at me is the SQLServer CPU usage. Since converting the VI client performance tab and esxtop is showing consistently above 90% CPU usage when our DataImport programs are running. If I shut them down the importers, usage drops to 30% or below. Also perfmon and taskman in the guest are showing barely 25% usage at the same time esxtop shows 100%.

I'm at a loss of where to look next. SQL Server was able to handle all site and importer requests perfectly prior to the conversion and that was with lesser hardware. Any help would be greatly appreciated.

Thanks.

Tom

jhanekom · ‎08-12-2008

Long post, but I really think these are all worth investigating...

My gut-feel, "short answer" would also be to investigate the possibily of a fresh install of the DB server in order to try and resolve the matter. P2V operations, while better than in the past, sometimes has a tendency to pull through legacy stuff that doesn't really belong in the VM. It could be that there is an extra device driver or hardware management agent that's not playing well with ESX. (Just double-checking: can you confirm that in Device Manager, Computer, your system is set to use a Multiprocessor HAL?)

Longer answer would be to try and locate the bottleneck, as with troubleshooting any performance issue. Fortunately, there are some excellent tools available to help with this.

- VMware Tools - simple to miss, potentially huge impact. If you don't have the Tools installed in the VM, there could potentially be a massive CPU overhead in the network stack.

- Antivirus - just a sanity check. Disable temporarily if installed and see if anything changes.

- W2K SP4 Update Release 1. This is required for Windows 2000 to be supported on VMware ESX, so make sure it's installed. Note sure if it could explain your symptoms, but worth a look.

- CPU - esxtop is the tool to use for real-time troubleshooting; the most important value you want to look out for in most CPU-bound cases is %READY. If this is higher than 10% in esxtop, something is preventing the VM from being scheduled to run as frequently as it should. I know you're saying it's the only VM on the system, but look at the possibility of contention somewhere. Also, make sure you're not allocating more than half the available number of cores to a single VM; see the performance tuning best practices guide for more details: http://www.vmware.com/pdf/vi_performance_tuning.pdf

- Disk - IOmeter with a decent profile can help compare storage figures before/after - I'm not sure if your SQL data was on the Clariion previously as well; even if you can't compare the figures to anything on your side, if you post them here I can compare them with what I've got (IOmeter: http://www.iometer.org/; use the 8K OLTP profile from this definition file with queue depths of 4, 16, 64 and 256: http://arethusa.tweakers.net/~femme/iometer/workloads.icf)

- Networking - use iperf to verify that you've got reliable, high-speed connections between the various virtual machines. Run "iperf -s -w 256k" on one node, and "iperf -c -w 256k" on the other node. (http://dast.nlanr.net/Projects/Iperf/#download)

One last thing - can you please check your vmkwarning files (located in /var/log/) to see if there are any errors/warnings popping up on any of the ESX hosts?

TomR_CAW · ‎08-12-2008

I'm considering the fresh install of the DB now. It's not a pleasant thought at this point.

We are running a Multiprocessor HAL on all multiproc VMs and Singleprocessor HAL on the single proc VMs.

I removed all specialty and unused device drivers from Windows and uninstalled any unused software (EMC PowerPath, QLogic SANSurfer and such) last week.

VMware Tools is installed on all VMs.

Antivirus was removed from all VMs and was to be reevaluated once everything was converted and running smoothly.

All Windows VMs are fully patched.

esxtop %RDYnever went above 3% when running under full load. While %USED on each vmm was above 95%.

I've never used IOmeter or iperf. I'll look into those right away.

I checked the vmkwarning files on the host and the following is all I found:

# cat /var/log/vmkwarning.3

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.009 cpu6:1040)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.009 cpu6:1040)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.009 cpu6:1040)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.205 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.205 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.205 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.207 cpu6:1040)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.207 cpu6:1040)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.207 cpu6:1040)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.480 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.480 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.480 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.498 cpu6:1040)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.498 cpu6:1040)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.498 cpu6:1040)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.507 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.507 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:01 host3 vmkernel: 35:00:13:14.507 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.843 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.843 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.843 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.860 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.860 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.860 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.868 cpu5:1041)WARNING: SCSI: 279: SCSI device type 0xd is not supported. Cannot create target vmhba0:288:0

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.868 cpu5:1041)WARNING: SCSI: 1249: LegacyMP Plugin could not claim path: vmhba0:288:0. Not supported

Jul 23 09:40:02 host3 vmkernel: 35:00:13:14.868 cpu5:1041)WARNING: ScsiPath: 3180: Plugin 'legacyMP' had an error (Not supported) while claiming path 'vmhba0:C0:T288:L0'.Skipping the path.

Jul 23 10:50:36 host3 vmkernel: 35:01:23:49.001 cpu6:1058)WARNING: SCSI: 4526: Manual switchover to path vmhba2:0:2 begins.

Jul 23 10:50:36 host3 vmkernel: 35:01:23:49.002 cpu6:1058)WARNING: SCSI: 4575: Manual switchover to vmhba2:0:2 completed unsuccessfully.

Jul 23 10:50:36 host3 vmkernel: 35:01:23:49.017 cpu3:1057)WARNING: SCSI: 4526: Manual switchover to path vmhba2:1:2 begins.

Jul 23 10:50:37 host3 vmkernel: 35:01:23:50.063 cpu1:1057)WARNING: SCSI: 4567: Manual switchover to vmhba2:1:2 completed successfully.

Jul 23 10:50:40 host3 vmkernel: 35:01:23:53.206 cpu4:1055)WARNING: SCSI: 4526: Manual switchover to path vmhba2:0:3 begins.

Jul 23 10:50:40 host3 vmkernel: 35:01:23:53.206 cpu4:1055)WARNING: SCSI: 4575: Manual switchover to vmhba2:0:3 completed unsuccessfully.

Jul 23 10:50:40 host3 vmkernel: 35:01:23:53.210 cpu0:1057)WARNING: SCSI: 4526: Manual switchover to path vmhba2:1:3 begins.

Jul 23 10:50:40 host3 vmkernel: 35:01:23:53.583 cpu0:1057)WARNING: SCSI: 4567: Manual switchover to vmhba2:1:3 completed successfully.

Jul 23 10:59:31 host3 vmkernel: 35:01:32:43.905 cpu4:1055)WARNING: SCSI: 4526: Manual switchover to path vmhba2:1:1 begins.

Jul 23 10:59:31 host3 vmkernel: 35:01:32:43.905 cpu4:1055)WARNING: SCSI: 4575: Manual switchover to vmhba2:1:1 completed unsuccessfully.

Jul 23 10:59:31 host3 vmkernel: 35:01:32:43.989 cpu6:1056)WARNING: SCSI: 4526: Manual switchover to path vmhba1:0:1 begins.

Jul 23 10:59:33 host3 vmkernel: 35:01:32:46.350 cpu4:1055)WARNING: SCSI: 4526: Manual switchover to path vmhba1:1:2 begins.

Jul 23 10:59:33 host3 vmkernel: 35:01:32:46.389 cpu4:1055)WARNING: SCSI: 4575: Manual switchover to vmhba1:1:2 completed unsuccessfully.

Jul 23 10:59:33 host3 vmkernel: 35:01:32:46.451 cpu0:1057)WARNING: SCSI: 4526: Manual switchover to path vmhba1:0:2 begins.

Jul 23 10:59:35 host3 vmkernel: 35:01:32:47.860 cpu0:1057)WARNING: SCSI: 4567: Manual switchover to vmhba1:0:2 completed successfully.

Jul 23 10:59:35 host3 vmkernel: 35:01:32:48.019 cpu6:1056)WARNING: SCSI: 4567: Manual switchover to vmhba1:0:1 completed successfully.

Jul 23 10:59:35 host3 vmkernel: 35:01:32:48.039 cpu4:1058)WARNING: SCSI: 4526: Manual switchover to path vmhba1:0:0 begins.

Jul 23 10:59:35 host3 vmkernel: 35:01:32:48.041 cpu4:1058)WARNING: SCSI: 4567: Manual switchover to vmhba1:0:0 completed successfully.

Jul 23 10:59:39 host3 vmkernel: 35:01:32:51.991 cpu0:1057)WARNING: SCSI: 4526: Manual switchover to path vmhba1:1:3 begins.

Jul 23 10:59:39 host3 vmkernel: 35:01:32:51.994 cpu3:1057)WARNING: SCSI: 4575: Manual switchover to vmhba1:1:3 completed unsuccessfully.

Jul 23 10:59:39 host3 vmkernel: 35:01:32:51.997 cpu6:1056)WARNING: SCSI: 4526: Manual switchover to path vmhba1:0:3 begins.

Jul 23 10:59:39 host3 vmkernel: 35:01:32:52.029 cpu6:1056)WARNING: SCSI: 4567: Manual switchover to vmhba1:0:3 completed successfully.

#

TomR_CAW · ‎08-13-2008

I hate trying to troubleshoot problems with inconsistent symptoms...

So in order to keep my websites running as smoothly as possible for my clients I've started shutting down our data importers during the day and only running them at night. Last night I started the importers and approximately 35 were processing files at once. This is a large number to do at once and usually drags the system down, even in the old physical setup. So I jump on a few of the sites to see if the errors are still showing up like before. I had ZERO problems logging in, executing large reports, and general site navigation. I also ran a good number of queries from Query Analyzer with very little delay. Most returned results in under 5 seconds, even on large reports. The longest was about 15 seconds which was expected with the particular query. This is all while the 35 data imports were running. CPU usage in the VI client was pegged at 100%. In the guest it ran at about 40-50%. I let the importers run overnight and checked them first thing this morning. About 15 were running when I looked and I did have some problems with sites this morning. Several of them returned timeouts and one returned the RPC error. The data importers are currently set to slice out about 1/3 of our clients per instance. So we have three instances of the program running with about 60 clients per. I shut down one of these and the sites all started behaving again. Even with the other two still running and about 6 clients still processing files. So I let those two run. About an hour later I started the third instance again which brought another 5 clients processing files to the game, totaling about 10. CPU in VI client still averaging high 90s. Guest averaging 30-35%. Site usage appears normal at the moment. Right now I only have two files processing and the CPU in VI client is STILL averaging 90% usage! Guest is averaging 20%. I understand that there will be some discrepancy but a 70% difference!? So far today things are running smooth. I don't believe anything is solved, but for some reason it's working well enough today.

I will probably spend the day today looking into IOmeter and iperf.

TomR_CAW · ‎08-13-2008

IOmeter results.

TomR_CAW · ‎08-13-2008

More IOmeter results.

TomR_CAW · ‎08-13-2008

iperf test results at 256k TCP window size:

Source Destination Bandwidth

-

WebServer1 SQLServer1 913 Mbits/sec

Importer1 SQLServer1 810 Mbits/sec

WebServer2 SQLServer1 883 Mbits/sec

WebServer3 SQLServer1 841 Mbits/sec

SQLServer1 WebServer1 919 Mbits/sec

Importer1 WebServer1 1.54 Gbits/sec *running on same host

WebServer2 WebServer1 928 Mbits/sec

WebServer3 WebServer1 939 Mbits/sec

WebServer1 Importer1 1.48 Gbits/sec *running on same host

SQLServer1 Importer1 839 Mbits/sec

WebServer2 Importer1 920 Mbits/sec

WebServer3 Importer1 933 Mbits/sec

WebServer1 WebServer2 928 Mbits/sec

Importer1 WebServer2 924 Mbits/sec

SQLServer1 WebServer2 919 Mbits/sec

WebServer3 WebServer2 1.40 Gbits/sec *running on same host

WebServer1 WebServer3 930 Mbits/sec

Importer1 WebServer3 875 Mbits/sec

SQLServer1 WebServer3 920 Mbits/sec

WebServer2 WebServer3 1.58 GBits/sec *running on same host

Argyle · ‎08-13-2008

Take a look if you are hitting the max TCP port limit in Windows on your SQL Server. It can lead to strange errors and general network error messages. It's common if you have a high traffic web site or batch jobs that incorrectly open too many SQL connections in a loop. By default max number of open TCP ports in Windows is 5000. You can increase it to 65000 with the MaxUserPort registry key. The key TcpTimedWaitDelay also plays a role here. By default its 4 minutes before a TCP connection will close.You can set it as low as 30 seconds.

On your SQL Server run the following from a command prompt when you encounter problems:

netstat -nao

If you see client port numbers close to 5000 for the SQL port (1433) you know you have this problem. The TCP connections are not closing fast enough. If you have local batch jobs running change these to Named Pipes / Shared Memory connections instead of TCP/IP connections. Let the web servers and other external apps use TCP/IP. You can find more info about it here:

http://support.microsoft.com/kb/196271

http://support.microsoft.com/kb/328476

It's possible that you are hitting the 5000 port limit on your web servers as well.

TomR_CAW · ‎08-13-2008

Thanks Argyle. netstat is indeed showing TCP ports near 5000 even now when the system is mostly idle. I'll check into this as soon as possible. I'll also see about changing our data importers to connect using named pipes instead of TCP/IP. Unfortunately I won't be able to do anything with this until tomorrow night since I will be away from the office tomorrow and don't want to make changes just before I leave town.

Dave_Mishchenko · ‎08-13-2008

Your post has been moved to the Performance forum

Dave Mishchenko

VMware Communities User Moderator

jhanekom · ‎08-14-2008

Those network performance tests look very good - nothing wrong there.

Looking at the disk performance tests, I'm a bit concerned - the figures do not compare well at all to what I've seen on other setups. To give you some sort of indication, I once tested 5x 36GB 10k SCSI drives on an HP ProLiant DL380 G3 system (old) with the built in RAID controller and no write cache; at a queue depth of 64, I saw 500 Transactions per Second with the 8K OLTP test. On an HP EVA 4400 with 16 spindles and a vRAID1 LUN, I got over 5000 transactions per second.

Having said that, your tests are a bit difficult to compare, since you ran the test on all disks simultaneously, but your combined transaction rate is around 300. This can be explained, though, if the storage subsystem was very busy at the time.

For the sake of making sure, would it be possible to run the tests on only one disk at a time? You don't have to run it on all your data disks - perhaps just one data disk and one log disk.

TomR_CAW · ‎08-14-2008

Apparently the support forums here will autopost any autoresponder I have setup...

Ken_Cline · ‎08-14-2008

Tom,

When you get back tomorrow, try setting a CPU reservation on your SQL VM. Since it's the only thing on the host, try setting it to 400% (100% * 4 vCPU). I don't know why this makes a difference in this scenario, but I have seen instances where setting the reservation (even with no contention) makes a difference. The other thing I would recommend is the clean install in a new VM. There have been reports of performance issues with P2V'ed VMs - not frequent, but it does happen.

Ken Cline

Technical Director, Virtualization

Wells Landers[/url]

VMware Communities User Moderator

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/

Argyle · ‎08-15-2008

Looking at the symtoms I'm quite sure it has nothing to do with either VMware, SQL or hardware and that it's the max TCP port limit in Windows you are hitting. I'd be interested to hear what the results are after changing the registry keys MaxUserPort to 60000 and the key TcpTimedWaitDelay to 30 seconds as mentiond in a previous post (on the SQL server but sometimes also needed on the web servers). And look for any code that open SQL connections in a loop :D. I'd also limit the max RAM SQL server can use to total RAM in system minus 512 to 1024 MB to leave some dedicated RAM for the OS that handles network processes. Otherwise SQL Server has a tendancy to take it all and doesn't give it back in a timely manner when the OS needs it.

TomR_CAW · ‎08-15-2008

I just had a report of the problem resurfacing and was able to duplicate the errors. Timeout errors from the web server. Come and gone, but I'm sure they will be back.

Ken, I changed the CPU reservation last night to approximately 400% of the available mhz.

Argyle, I haven't had a chance to modify the registry entries yet. According to the articles you linked I would need to change the max TCP port limit on my client (WebServer1) not the SQL Server. Also, I've checked my programming and we are using pooled connections. Using perfmon I can pull up the current user connections to SQL and it's currently at 268 with a maximum of 300. netstat also gets nowhere near 5000 entries even though the port number it lists occassionally has one in the 4000's. The RAM on the VM is set to 3GB and SQL is using up 1.7GB, which I believe the max RAM available to SQL Server 2000 standard is 2.0GB so it's has plenty of room overall to hit it's cap and leave enough for the OS.

I will still attempt to change the registry entries soon though.

TomR_CAW · ‎08-15-2008

jhanekom, I finally got the updated iometer results. These look better.

1100 average IOs on the log file drive (RAID 5 on the VMs LUN)

3000 average IOs on the data file drive (RAID 10 RDM from the SQL VM) <-- Enough acronyms in that?

Anyway, here are the result files.

TomR_CAW · ‎08-18-2008

I made the registry modifications to MaxUserPort and TCPTimedWaitDelay with values of 60000 and 30 respectively. I was notified of problems with my websites at approximately 7:00am this morning. Netstat on the SQL Server showed roughly 300 open TCP connections. Netstat on the web server showed approximately 200 TCP connections as either ESTABLISHED or TIME_WAIT and about 2400 entries like "UDP 0.0.0.0:29835 :" The site issues did not go away until I shut down one of my data importers. Even then only a few sites were operating smooth.

Argyle · ‎08-18-2008

Ok. It sounded really similar to a problem we had with tcp ports. Especially the "RPC Server Unavaliable" and "PreLoginHandshake()). General network error" messages. Are you running any reindexing or update statistics on the involved databases?

TomR_CAW · ‎08-18-2008

You did have me hopeful for a while, but I guess it just wasn't it. We don't have any reindexing other than the usual automatic stuff done by SQL Server. We do backup and clear the transaction logs every hour but that only lasts about 5 minutes and the issues are constant.

Argyle · ‎08-18-2008

When it comes to reindexing that part isn't automatic but it can help a lot to run it weekly for performance. You should at least run it once on each database when you move server or change hardware or disks. If you do not want to run an entire reindex right now (can take time and log can grow depending on table sizes) you could run a "sp_updatestats" on each database to begin with via Query Analyzer to see if it improves performance. A sql query plan can turn really bad with timeouts as a result if the index and statistics data is old.

Sometimes when you have many (or big) dataload operations it can help to run a index rebuild straight after but again it depends on the database size if its possible.

TomR_CAW · ‎08-18-2008

I'm sorry, I should have been more clear. Reindexing occurs on a weekly schedule on Saturday at 10:00am. It is not done during the week. The only scheduled changes during the week are the transaction log backups/truncations that occur every hour.