We are running ESX 3.0.1 on a Dell PowerEdge 2950 with two dual-core Xeon 5160 processors (3.06 GHz). There are only four other virtual servers on this host at the moment, all doing next to nothing.
We have a Windows 2000 file server which is running at 100% CPU, predominantly Services.exe (~85%) and System (~15%).
It is configured with a large VMFS hard drive (1 TB).
Everything else is fine with the system; RAM usage is only about 400 MB.
There are 500 clients attached. I have set the CPU shares to high, and I have made sure its affinity is not pinned to CPU 0.
It seems very odd. I can throw another processor at it, but would rather not if I can help it. Has anybody else come across this? Would you consider this normal? This is a clean install, fully patched and up to date, with VMware Tools installed.
I have checked for a virus and come up negative.
The only other variable is that this came from a template which was dual-processor and was converted to uniprocessor. The driver has been updated to uniprocessor to reflect this.
This is still causing us trouble. I have added an additional processor, but this just maxes out as well. It is mainly the Services process and a little System. The I/O rate is very high on the Services process: 140,000 IO/sec. We have about 500 users connecting to this server, and it is solely a file server.
The data is in a 1 TB partition on its own LUN on a NetApp filer.
We had absolutely no problems in the physical environment, and the server would tick over all day at 10%-15% on an old P3 CPU!
This is now a completely fresh build with one CPU, installed from scratch. The box is a Dell 2950, and all other servers are operating "normally".
Has anybody had any issues along these lines? I don't want to go back to physical, but we may have to.
Any suggestions would be greatly appreciated.
What does Process Explorer say? It should be useful in seeing exactly which services / background processes are using up the CPU.
Did you migrate the VM from a physical server? Compare the CPU usage between esxtop and the Windows Task Manager. Is it different? If so, check the HAL driver in Device Manager. It can cause trouble when you use one vCPU and the HAL driver is set to multiprocessor.
Hi Nocturne, yes, that's a good call. We have had that problem in the past, but this server was built from scratch on ESX and has the correct HAL.
Hi decker0, I'll have a closer look at that, but the processes are Server services and System.
Try shutting down as many services as possible. If the load disappears, you may be able to identify which one causes it by starting them again one by one.
What is the server's role? File and print, SQL, mail, web? Is it patched up to the hilt? Server services and System should not be taking 100% of resources; it is a symptom of something else.
Not wanting to teach anybody's grannie to suck eggs, but have you run an AV scan on the server, or a spyware/malware scanner?
I've got no problem with egg-sucking lessons; most often it is something simple that's been overlooked. It is only a file server, a fresh build and patched to the very latest. We've run AV and adware/malware scans, but without results.
Does your AV scan in real time, or does it only scan at set points in time? When are your AV updates set to install, and do they then initiate a full scan of all drives?
We have looked at AV, and it does need tweaking; for example, I need to exclude reads, and it does scan in real time. But this is not the cause of our troubles.
I have had a bit of a breakthrough, and I am just about there now.
I ran Filemon, which showed shedloads of reads from the services.exe process. The file path was d:\System Volume Information\tracking.log.
After a Google session, we narrowed this down to the "Distributed Link Tracking Client" service. When I stopped this service, my CPU dropped back to 20%, about where I would expect it.
This is what TechNet says about the service:
Distributed Link Tracking (DLT) Client - maintains links between the NTFS file system files within a computer or across computers in a network domain. The DLT Client service ensures that shortcuts and Object Linking and Embedding (OLE) links continue to work after the target file is renamed or moved. When you create a shortcut to a file on an NTFS v5 volume, distributed link tracking stamps a unique object identifier (ID) into the target file, known as the link source. Information about the object ID is also stored within the referring file, known as the link client.
Distributed link tracking can use this object ID to locate the link source file in any combination of the following scenarios that occur within a Windows 2000 domain:
The link source file is renamed.
The link source file is moved to another folder on the same volume or to a different volume on the same computer.
The link source file is moved from one NTFS volume to another within the same domain. (The NTFS volumes must be on computers running Windows 2000. The NTFS volumes cannot be on removable media.)
The computer containing the link source file is renamed.
The shared network folder containing the link source file is renamed.
The volume containing the link source file is moved to another computer within the same domain.
Distributed link tracking also attempts to maintain links even when they do not occur within a domain, such as cross-domain, within a workgroup, or on a single computer that is not connected to a network. Links can always be maintained in these scenarios when a link source is moved within a computer, or when the network shared folder on the link source computer is changed. Typically, links can be maintained when the link source is moved to another computer, though this form of tracking is less reliable over time.
Distributed link tracking uses different services for client and server:
The DLT Client service runs on all Windows 2000-based computers. In non-networked computers, the Client service performs all activities related to link tracking.
The DLT Server service runs on Windows 2000 Server domain controllers. The server service maintains information relating to the movement of link source files. Because of this service and the information it maintains, links within a domain are more reliable than those outside a domain. For computers that run in a domain, the DLT Client service takes advantage of this information by communicating with the DLT Server service.
Note: The DLT Client service monitors activity on NTFS volumes and stores maintenance information in a file called Tracking.log, which is located in a hidden folder called System Volume Information at the root of each volume. This folder is protected by permissions that allow only the system to have access to it. The folder is also used by other Windows services, such as Indexing Service.
If the DLT Client service is disabled, you won't be able to track links. Likewise, users on other computers won't be able to track links for documents on your computer. See also DLT Server.
Trouble is, this is a file server, so I would guess I need this enabled! This data disk was just reattached to the newly created OS, so I am thinking it is something to do with that. I may see if I can delete the System Volume Information folder on the data drive and recreate it.
I just had a similar problem on two guests, each with a different fix. For one server, reinstalling VMware Tools solved the problem. For the second server, I just cold-migrated it to a different host; for some reason, that worked. The hosts are identical. Maybe the migration did a reconfiguration or something to the .vmx file.
I am so glad to get this finished, thanks everyone for your help.
It seems that the tracking.log file was 400 KB (normally 20 KB). I think it must have become corrupted at some stage, probably when the data drive was reconnected to the OS disk in VMware.
I stopped the "distributed link tracking client" service.
I gave myself write permissions to d:\System Volume Information and renamed tracking.log to tracking-old.log.
when I restarted the service the tracking.log file was recreated.
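For anyone hitting the same symptom, the steps above can be sketched as a Windows 2000 command-prompt session. This is a sketch only: it assumes the data volume is d:, that you are logged on as a local administrator, and that the affected service is the Distributed Link Tracking Client described in this thread.

```shell
rem Stop the Distributed Link Tracking Client service
net stop "Distributed Link Tracking Client"

rem Grant Administrators access to the hidden, system-only folder
cacls "d:\System Volume Information" /E /G Administrators:F

rem Rename the suspect log out of the way so the service rebuilds it
ren "d:\System Volume Information\tracking.log" tracking-old.log

rem Restart the service; a fresh tracking.log is created on the volume
net start "Distributed Link Tracking Client"
```

Once the CPU has settled and the new tracking.log looks healthy, the renamed tracking-old.log can be deleted.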
Well done, an excellent piece of detective work.
Thank you for coming back to the forum with your fix.