I did some research on this to better understand how VC uses it database. Below is the results of this if anyone is interested:
What data is stored in the VirtualCenter database?[/b] As of version 2.0.1 of VirtualCenter the database consists of the following tables. The database mainly consists of alarm/event data, HA/DRS data, ESX host information, task/scheduled tasks and VM information. All ESX server and VM configuration data is stored on each ESX server and is simply read and displayed by VC. VC is just a central management front end to set configuration information and displays information that is read from all ESX hosts instead of having to manage each server individually. You can also use the VIC to connect directly to the ESX servers without VirtualCenter and modify the same configuration data. Once you add a ESX host back into VC it reads all the configuration info from that host.
The database is not critical to the operation of ESX servers or their virtual machines, they would continue to function normally if VC or its database were unavailable (Except for DRS and vMotion which would not work, HA would still work). If the database were to crash and a new one created you could add your ESX servers back in and it would repopulate the configuration information. The only data unique to the database is performance statistics, alarms, events, tasks, resource pools and custom attributes. This is not official documentation and is based on my browsing the database schema and data with a SQL client. Using a SQL client like Toad (Quest Software) or WinSQL Lite (Free) you can browse the data contained in these tables, you can also write SQL code to query information contained in the tables. Below is some sample SQL code to query information on VMs and hosts. For information on the views defined in VirtualCenter see this white paper: http://www.vmware.com/pdf/vc_dbviews_2x.pdf
VPX_ALARM_RUNTIME[/b] Used to store alarm events for each host or VM that is covered by a defined alarm, this table has 7 columns and as many rows as hosts/VMs that have alarms configured. For example if you had 10 VMs and each had 3 alarms assigned to them you would have 30 rows.
VPX_COMPUTE_RESOURCE_DAS_VM[/b] Used to store resource information for HA (priority, # of failure, retry period, power off/on isolation). This table has of 6 columns and as many rows as you have VMs that are part of HA.
VPX_DATASTORE[/b] Used to store VMFS/NAS datastore information (storage URL, capacity, free space, type). This table has 9 columns and as many rows as you have defined data stores (RDMs will not show up here).
VPX_EVENT[/b] Used to store all events as a result of tasks or alarms in VC (event type, date/time, VM name, username, category, hostname), this table is typically large and has 15 columns and usually a large amount of rows but is generally small in megabytes, 50,000 rows will equal approximately 9MB.
VPX_EVENT_ARG[/b] This corresponds to the VPX_EVENT table and contains event ids, argument types & data and miscellaneous IDs. This table is usually pretty large and contains the text of the events from the VPX_EVENT table. This table has 14 columns and usually has more records then the VPX_EVENT table, 150,000 records will generally use about 20MB of space.
VPX_FIELD_VAL[/b] Used to store custom attribute values that are displayed in the VI client. This table has 3 columns, field id (corresponds to VPX_FIELD_DEF table), entity id (corresponds to VPX_ENTITY table) and value. It will have as many rows as you have values for custom attributes.
VPX_GUEST_DISK[/b] Used to store disk space information for VMs. This table has 4 columns, vm id (corresponds to VPX_VM table), path (drive letter), disk capacity and free space. It will have a row for each drive partition that a VM has configured.
VPX_GUEST_IP_ADDRESS[/b] Used to store IP address information for VMs. This table has 3 columns, vm id (corresponds to VPX_VM table), device id (usually 4000, increments if you have more then one IP address), and ip address. It will have a row for each ip address that a VM has configured.
VPX_GUEST_NET_ADAPTER[/b] Used to store the VM network name for each network adapter, these correspond to the network names in each vswitch configuration. This table has 5 columns vm id (corresponds to VPX_VM table), device id (usually 4000, increments if you have more then one IP address), mac address, is connected and network name. It will have a row for each network adapter that a VM has configured.
VPX_HIST_STAT[/b] Used to store historical performance statistics that are collected by VirtualCenter. This is the biggest table and only has 5 columns (sample id, stat id, entity id, device id and stat value) but can have millions of rows in it.
VPX_HOST_CPU[/b] Used to store ESX server host CPU information. This table has 6 columns (host id, cpu index, hertz, bus hertz, cpu description and cpu vendor) and will have a row for each CPU (not core) that a ESX server has in it.
VPX_HOST_CPUID_FEATURE[/b] Used to store ESX server host CPU Identification masks. This table has 7 columns (host id, feature level, feature vendor, EAX mask, EBX mask, ECX mask, EDX mask) and usually has 5 rows for each ESX server that you have regardless of the number of CPUs in each host.
VPX_HOST_CPU_CPUID_FEATURE[/b] Similar to the previous table. This table has 8 columns (same as above plus a cpu index column) and usually has 10 rows per ESX server that you have regardless of the number of CPUs in each host.
VPX_HOST_CPU_THREAD[/b] Used to store ESX server host CPU thread information. This table has 3 columns (host id, cpu index, thread id) and a row for each CPU core in each ESX server. If you had a dual-core server there would be 4 rows with thread IDs of 0,1,2 and 3.
VPX_HOST_NODE[/b] Used to store ESX server host memory information. This table has 4 columns (host id, numa id, mem range begin, mem range length) and a row for each CPU (not core) that a ESX server has in it.
VPX_HOST_PCI_DEVICE[/b] Used to store ESX server host hardware information. This table has 12 columns (host id, pci id, class id, bus, slot, pci function, vendor id, sub vendor id, vendor name, device id, sub device id, device name) and a row for each hardware device in each ESX server (approx. 32). This includes NIC, FC cards, processors, video cards, USB, etc.
VPX_HOST_VM_CONFIG_OPTION[/b] Unsure what this is used for, as the name implies it has something to do with VM configs. This table has 4 columns (host id, config option ver, data, array index) and 2 rows for each ESX server.
VPX_LICENSE[/b] Used to store ESX server license information. This table has 3 columns (serial number, serial key, type) amd appears to not be used if you use a License Server in your environment. Possible used if you use host based licenses instead of a license server.
VPX_NETWORK[/b] Used to store ESX server host network name configurations, these correspond to the network names in each vswitch configuration. This table has 3 columns (id, name, data center id) and a row for each unique network name configuration.
VPX_PRIV_ROLE[/b] Used to store all VirtualCenter privileges (ie. VirtualMachine.Interact.PowerOn, ScheduledTask.Run) that can be assigned to users and groups. This table has 2 columns (privilege name, role id) and as many rows as there are unique privileges (approximately 293).
VPX_RESOURCE_POOL[/b] Used to store Resource Pool information. This table has 13 columns (id, config spec, allocated cpu, allocated vm cpu, allocated mem, allocated vm mem, available pool mem, available vm mem, current cpu, current mem, overall status) and as many rows as you have resource pools defined.
VPX_ROLE[/b] Used to store all VirtualCenter Role information (ie. VirtualMachineAdministrator, VirtualMachineUser). This table has 2 columns (id, name) and will have as many rows as roles that are defined.
VPX_SAMPLE[/b] Used to store historical statistic sample times and intervals, corresponds with VPX_HIST_STAT table. This table has 3 columns (id, sample time, sample interval) and usually a large amount of rows (although now where near as large as VPX_HIST_STAT).
VPX_SCHED_ACTION[/b] Used to store scheduled task action information in VirtualCenter. This table has 3 columns (scheduled task id, action type, action data) and will have as many rows as scheduled tasks that are defined.
VPX_SNAPSHOT[/b] Used to store snapshot information for VMs. This table has 11 columns (id, host snapshot id, vm id, snapshot name, snapshot desc, create time, power state, is quiesced, parent snapshot id, is current snapshot, config) and will have a row for as many snapshots that exist for your VMs.
VPX_STAT_CONFIG[/b] Used to store statistic collection intervals in VirtualCenter. This table has 3 columns (length, name, sample interval) and as many rows as you have collection intervals defined (usually 4).
VPX_ TASK[/b] Used to store all task information (ie. VM Power On, VM Re-configure, Alarm Create) that occurs in VirtualCenter. This table has 27 columns and as many rows as tasks that have occurred in VirtualCenter (can be thousands).
VPX_VM[/b] Used to store virtual machine configuration information (ie. vm id, path to vmx file, uuid, os, state, memory, resource group, ip address, vmware tools version, etc.) read from the ESX server is is hosted on. This table has 41 columns and as many rows as VMs that are managed by VirtualCenter.
Sample SQL code to query Disk Space info from all VMs (in gigabytes):
o select b.name, path, round(capacity/1073741824,2) "Total", round(free_space/1073741824 ,2) "Free" from vpx_guest_disk a, vpx_entity b where a.vm_id = b.id order by b.name
Sample SQL code to display Up Time info from all VMs:
o select b.name, (to_char(sysdate, 'J') - to_char(boot_time, 'J')) "Up Days" from vpx_vm a, vpx_entity b where a.id = b.id order by b.name
The size of the Statistics database (VPX_HIST_STAT) will vary based on the number of hosts & VMs managed, frequency of performance data collection, collection level and type of database. Each stat sample collected is about 60 bytes for SQL, 100 bytes for Oracle, and each event stored is 1600 bytes for SQL, 600 bytes for Oracle.
Using default settings, the statistical data for 25 hosts running 8-16 VMs per host will plateau around 40-60 MB in a year (80-140 MB if set to full). Each month, the average number of events generated will also consume about 190 MB in SQL, and 70 MB in Oracle. Total DB size after a year is expected to be around 2.20 Gb in SQL, and 1.0 Gb in Oracle.
Using default settings, the statistical data for 75 hosts running 8-16 VMs per host will plateau around 90-150 MB in a year (200-330 MB if set to full). Each month, the average number of events generated will also consume about 190 MB in SQL, and 70 MB in Oracle. Total DB size after a year is expected to be around 2.40 Gb in SQL, and 1.2 Gb in Oracle.
The statistic sample frequency and retention is specified in the VI client. Select Administration, VirtualCenter Management Server Configuration from the top menu then select Statistics. The default settings are:
o Past Day 5 minutes per sample, 288 total samples
o Past Week 15 minutes per sample, 672 total samples
o Past Month - 60 minutes per sample, 720 total samples
o Past Year 1440 minutes per sample, 365 total samples
For example if you wanted to collect less statistical data you could change the settings to something like below:
o Past Day 15 minutes per sample, 96 total samples
o Past Week 60 minutes per sample, 168 total samples
o Past Month - 120 minutes per sample, 360 total samples
o Past Year 1440 minutes per sample, 365 total samples
Or if you wanted to collect more statistical data you could change the settings to something like below (caution your database will be very large):
o Past Day 1 minutes per sample, 1440 total samples
o Past Week 5 minutes per sample, 2016 total samples
o Past Month - 30 minutes per sample, 1440 total samples
o Past Year 60 minutes per sample, 8760 total samples
You can also delete any ones you do not want, for example if you only want a weeks worth of data then delete the Past Month and Past Year. Once the total samples number has been reached the older samples are automatically purged from the database. If you modify these values you must follow these rules.
o The new sampling period must be a multiple of a previous one. In the defaults youll notice that week (15) is a multiple of day (5), month (60) is a multiple of week(15) and year (1440) is a multiple of month (60).
o The new sampling length must be longer than a previous one. Week (15) is longer then day (5), etc.
o The user-specified name of the historical interval must be unique. (In case you add new ones)
o Note: Existing data is reset (lost) when you change the interval configuration. However, only the data for that interval is reset. For example, if you change only the weekly time interval, the daily and monthly data are retained.
You can also specify the amount of statistical detail that is captured by setting the Statistic Collection Level. The default for this setting is Level 1 which includes basic metrics, it can be changed anywhere up to Level 4 which provides the most statistical detail but can significantly increase the size of the database and can cause additional performance overhead. You can also change the number of threads that are used for collecting performance statistics from managed hosts. Use multiple threads to increase the number of managed hosts and take advantage of the additional processor.
VirtualCenter does not support doing this directly, the only way you can do this by deleting directly from the database tables. It is strongly recommend to have a functional backup of the VC database before clearing these tables in case of an unforeseen problem. Also the the VC service must be stopped in order to avoid data corruption. This is not a supported procedure by Vmware but should be in my opinion, these tables can get pretty large in big environments as every single task and event is preserved in the database.
The tables where the data is stored are VPX_TASK (Tasks) and VPX_EVENT, VPX_EVENT_ARG (Events). The VPX_EVENT_ARG will usually have more rows then the VPX_EVENT table. This is because the VPX_EVENT table only has one row per event but the VPX_EVENT_ARG table can have multiple rows per event (The link between the 2 tables is the EVENT_ID column).
o Shutdown the VirtualCenter service
o Connect to the database server that is hosting the VC database with a SQL browser/client. You can use a free tool like WinSQL Lite (http://www.snapfiles.com/get/winsql.html) for this.
o To delete all data in the tables type: delete from VPX_TASK then delete from VPX_EVENT and then delete from VPX_EVENT_ARG and finally Commit
o Optionally you can do truncate table VPX_TASK then truncate table VPX_EVENT and then truncate table VPX_EVENT_ARG (Truncate is faster and does not use as much undo space as delete)
o To selectively delete data older then 30 days: delete from VPX_TASK where complete_time < sysdate - 30)
o To selectively delete data older then 30 days from VPX_EVENT and VPX_EVENT_ARG is trickier because VPX_EVENT_ARG does not have a date field and it tied to the events in VPX_EVENT by the Event_id field.
o First delete from the child table (VPX_EVENT_ARG): delete from vpx_event_arg where event_id in (select a.event_id from vpx_event_arg a, vpx_event b where a.event_id = b.event_id and create_time < sysdate - 30)
o Next commit the delete by typing Commit
o Then delete from the parent table (VPX_EVENT): delete from vpx_event where create_time < sysdate 30
o Finally commit again by typing Commit
o Start the VirtualCenter service
The Disk Space Info query gave me some errors. Keep in mind I am not a SQL guy...
The "o" at the beginning gave a Could not find stored procedure error.
Also did not like the rounding functions. Gave a The conversion of the varchar value '6432133120' overflowed an int column. Maximum integer value exceeded error.
I ended up modifying the query to:
select b.name, path, capacity "Total", free_space "Free" from vpx_guest_disk a, vpx_entity b where a.vm_id = b.id order by b.name
which worked but returned the raw size values. No big deal.
Also, it only seemed to return information about running VMs.
Thanks anyway, it still took me a long way.
For capacity planing purposes I want to populate an Access database with some of the info in the VC database. This is to allow reservations for planned VM's to be made in our ESX environment without touching the VC database.
A possilble enhancment for a future releace of VC would be to include some capacity management functionality, taking feedback from ESX performance stats and allowing possible.planned VM's to be 'allocated' - not created but 'reserved'. Does anyone else have ideas on this topic?
What I'm trying to do is write sql to query our VC Oracle database
For each VM i want to list the VM hostname, esx host , each vmkd filename and it's lun location and size.
I can't seem to find the table that stores a VM's vmdk filename and size - perhaps this is not stored in the VC database (being stored in the VM's .vmx file).
This Sql code may be useful for anyone else attempting to do a similar task. This is my sql so far - seems to work for Windows VM's , but the VM_Capacity and VM_Free_Space figures for Linux VM's are not accurate.
SELECT vpxv_vms.vmid, vpxv_vms.NAME, vpxv_vms.hostid,
vpxv_vm_datastore.ds_id, vpx_datastore.NAME, vpx_datastore.capacity / 1024000000 as Lun_Capacity,
vpx_datastore.free_space / 1024000000 as Lun_Free_Space, vpxv_hosts.NAME, vpx_guest_disk.PATH,
vpx_guest_disk.capacity / 1024000000 as VM_Capacity, vpx_guest_disk.free_space / 1024000000 as VM_Free_Space
WHERE ( (vpxv_vms.vmid = vpxv_vm_datastore.vm_id)
AND (vpxv_vm_datastore.ds_id = vpx_datastore.ID)
AND (vpxv_hosts.hostid = vpxv_vms.hostid)
AND (vpx_guest_disk.vm_id = vpxv_vms.vmid)
ORDER BY vpxv_vms.NAME ASC , path ASC
How can I get vmdk size assigned? the propose of that is for charge back. A VM can be allocated for 20GB vmdk, but only 16GB for c$.
vpx_guest_disk looks like the information from guest OS level.
if I try to execute the command:
delete from vpx_event_arg where event_id in (select a.event_id from vpx_event_arg a, vpx_event b where a.event_id = b.event_id and create_time < sysdate - 30)
I allways get:
Invalid column name 'sysdate'.
What's the problem?
You need to talk to a SQL admin, you don't need Access.
SQL can handle millions of records from hundreds of thousands of concurrent users, and the VC database is nothing more than a table in a database. A SQL admin can help you copy the table (VC database) to another table and you can manipulate the info on the same SQL database, no need to make a copy. If they know what they are doing, they can even link it (or help) to connect FROM access so you can get this info anytime you want.
There isn't a problem using Access, but it would make things easier if you can have live links to the data you want to copy, just don't change the original table and you are fine.
This is great information. I'm working on a Web interface so my manager and his manager won't have to ask me everytime they want some information about our VMware environment. The pages are in asp. I have most of it completed, the part on the statistical history will be very useful.
Great post. We have an issue where our Oracle database is growing at a rapid rate. We've had Vcenter installed and configured for a little over a month now. Our database has already exceeded 11Gb which is way out of line with the estimates given in VCenter statistics. I'm thinking some of this may be due to us gaving verbose logging turned on in the beginning when we were having some troubles getting it configured. I'm having our Oracle DBA look at the tables and purge any event and task data to see if that helps.
I'm still concerned there is something else going on as we have changed the logging options yet our tablespace is still growing. We have 4 ESX hosts and 35 VMs (small environment).
Does anybody have any ideas as to what might be causing our problem?
Thanks, in advance.
Does anybody have similar information specific to VC 2.5? It seems that there have been some major changes with regard to VM performance metrics between 2.0.x and 2.5.
Thanks in advance!
Even though it's been 3 years since this was posted I thought I'd add something in case it helps out anyone. It turns out that "sysdate" is an Oracle function, so if you're trying to do this in SQL it won't work. The SQL function is "getdate()" So, the command would turn out to look like this: delete from VPX_TASK where complete_time < (getdate() - 30)
Also, I came upon this page because my vCenter databased reached the max 4GB size for SQL 2005 Express. After finding this article, http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102591..., and going through the process to run the SQL script that is supposed to delete old data as well as shrinking the database, my database size remained at 4GB. I think after running that script and shrinking the database I freed up 20mb or something. Even after I set up the script to delete all old data that was more than a day old it didn't shrink the database size from 4GB.
So, I started looking at the size of the tables and noticed that the VPX_TASK table was 2GB. Then I found this article and ended up running the above command that I posted and then shrinking the database. Now my database is around 2GB and functioning. Thanks again!!
I know it's been like 3 years since this post, but for the overflow issue, you can just cast to a different type. I just used float cause it was the first thing that came to mind:
Thanks for this useful post.
I'm not a vmware admin, I'm responsible for reporting and now I have to figure out a few things
My first question is to get a list of all vms, running at this environment.
I have resolved it with this query
VPXV_HOSTS on vpxv_vms.hostid = vpxv_hosts.hostid
VPXV_VMS.IS_TEMPLATE = 0
My second question is a little bit more tricky, because I didn't found an attribute which can tell me,
which host or vm belongs to a specific cluster.
I found the cluster name in the name attributein the view vpxv_entity_vmop_count, but I didn't found a connector to the vm or the host!?
Do you have a clue how I can connect the vms or host to the cluster ?