I did some research on this to better understand how VC uses it database. Below is the results of this if anyone is interested:
What data is stored in the VirtualCenter database?[/b] As of version 2.0.1 of VirtualCenter the database consists of the following tables. The database mainly consists of alarm/event data, HA/DRS data, ESX host information, task/scheduled tasks and VM information. All ESX server and VM configuration data is stored on each ESX server and is simply read and displayed by VC. VC is just a central management front end to set configuration information and displays information that is read from all ESX hosts instead of having to manage each server individually. You can also use the VIC to connect directly to the ESX servers without VirtualCenter and modify the same configuration data. Once you add a ESX host back into VC it reads all the configuration info from that host.
The database is not critical to the operation of ESX servers or their virtual machines, they would continue to function normally if VC or its database were unavailable (Except for DRS and vMotion which would not work, HA would still work). If the database were to crash and a new one created you could add your ESX servers back in and it would repopulate the configuration information. The only data unique to the database is performance statistics, alarms, events, tasks, resource pools and custom attributes. This is not official documentation and is based on my browsing the database schema and data with a SQL client. Using a SQL client like Toad (Quest Software) or WinSQL Lite (Free) you can browse the data contained in these tables, you can also write SQL code to query information contained in the tables. Below is some sample SQL code to query information on VMs and hosts. For information on the views defined in VirtualCenter see this white paper: http://www.vmware.com/pdf/vc_dbviews_2x.pdf
VPX_ALARM_RUNTIME[/b] Used to store alarm events for each host or VM that is covered by a defined alarm, this table has 7 columns and as many rows as hosts/VMs that have alarms configured. For example if you had 10 VMs and each had 3 alarms assigned to them you would have 30 rows.
VPX_COMPUTE_RESOURCE_DAS_VM[/b] Used to store resource information for HA (priority, # of failure, retry period, power off/on isolation). This table has of 6 columns and as many rows as you have VMs that are part of HA.
VPX_DATASTORE[/b] Used to store VMFS/NAS datastore information (storage URL, capacity, free space, type). This table has 9 columns and as many rows as you have defined data stores (RDMs will not show up here).
VPX_EVENT[/b] Used to store all events as a result of tasks or alarms in VC (event type, date/time, VM name, username, category, hostname), this table is typically large and has 15 columns and usually a large amount of rows but is generally small in megabytes, 50,000 rows will equal approximately 9MB.
VPX_EVENT_ARG[/b] This corresponds to the VPX_EVENT table and contains event ids, argument types & data and miscellaneous IDs. This table is usually pretty large and contains the text of the events from the VPX_EVENT table. This table has 14 columns and usually has more records then the VPX_EVENT table, 150,000 records will generally use about 20MB of space.
VPX_FIELD_VAL[/b] Used to store custom attribute values that are displayed in the VI client. This table has 3 columns, field id (corresponds to VPX_FIELD_DEF table), entity id (corresponds to VPX_ENTITY table) and value. It will have as many rows as you have values for custom attributes.
VPX_GUEST_DISK[/b] Used to store disk space information for VMs. This table has 4 columns, vm id (corresponds to VPX_VM table), path (drive letter), disk capacity and free space. It will have a row for each drive partition that a VM has configured.
VPX_GUEST_IP_ADDRESS[/b] Used to store IP address information for VMs. This table has 3 columns, vm id (corresponds to VPX_VM table), device id (usually 4000, increments if you have more then one IP address), and ip address. It will have a row for each ip address that a VM has configured.
VPX_GUEST_NET_ADAPTER[/b] Used to store the VM network name for each network adapter, these correspond to the network names in each vswitch configuration. This table has 5 columns vm id (corresponds to VPX_VM table), device id (usually 4000, increments if you have more then one IP address), mac address, is connected and network name. It will have a row for each network adapter that a VM has configured.
VPX_HIST_STAT[/b] Used to store historical performance statistics that are collected by VirtualCenter. This is the biggest table and only has 5 columns (sample id, stat id, entity id, device id and stat value) but can have millions of rows in it.
VPX_HOST_CPU[/b] Used to store ESX server host CPU information. This table has 6 columns (host id, cpu index, hertz, bus hertz, cpu description and cpu vendor) and will have a row for each CPU (not core) that a ESX server has in it.
VPX_HOST_CPUID_FEATURE[/b] Used to store ESX server host CPU Identification masks. This table has 7 columns (host id, feature level, feature vendor, EAX mask, EBX mask, ECX mask, EDX mask) and usually has 5 rows for each ESX server that you have regardless of the number of CPUs in each host.
VPX_HOST_CPU_CPUID_FEATURE[/b] Similar to the previous table. This table has 8 columns (same as above plus a cpu index column) and usually has 10 rows per ESX server that you have regardless of the number of CPUs in each host.
VPX_HOST_CPU_THREAD[/b] Used to store ESX server host CPU thread information. This table has 3 columns (host id, cpu index, thread id) and a row for each CPU core in each ESX server. If you had a dual-core server there would be 4 rows with thread IDs of 0,1,2 and 3.
VPX_HOST_NODE[/b] Used to store ESX server host memory information. This table has 4 columns (host id, numa id, mem range begin, mem range length) and a row for each CPU (not core) that a ESX server has in it.
VPX_HOST_PCI_DEVICE[/b] Used to store ESX server host hardware information. This table has 12 columns (host id, pci id, class id, bus, slot, pci function, vendor id, sub vendor id, vendor name, device id, sub device id, device name) and a row for each hardware device in each ESX server (approx. 32). This includes NIC, FC cards, processors, video cards, USB, etc.
VPX_HOST_VM_CONFIG_OPTION[/b] Unsure what this is used for, as the name implies it has something to do with VM configs. This table has 4 columns (host id, config option ver, data, array index) and 2 rows for each ESX server.
VPX_LICENSE[/b] Used to store ESX server license information. This table has 3 columns (serial number, serial key, type) amd appears to not be used if you use a License Server in your environment. Possible used if you use host based licenses instead of a license server.
VPX_NETWORK[/b] Used to store ESX server host network name configurations, these correspond to the network names in each vswitch configuration. This table has 3 columns (id, name, data center id) and a row for each unique network name configuration.
VPX_PRIV_ROLE[/b] Used to store all VirtualCenter privileges (ie. VirtualMachine.Interact.PowerOn, ScheduledTask.Run) that can be assigned to users and groups. This table has 2 columns (privilege name, role id) and as many rows as there are unique privileges (approximately 293).
VPX_RESOURCE_POOL[/b] Used to store Resource Pool information. This table has 13 columns (id, config spec, allocated cpu, allocated vm cpu, allocated mem, allocated vm mem, available pool mem, available vm mem, current cpu, current mem, overall status) and as many rows as you have resource pools defined.
VPX_ROLE[/b] Used to store all VirtualCenter Role information (ie. VirtualMachineAdministrator, VirtualMachineUser). This table has 2 columns (id, name) and will have as many rows as roles that are defined.
VPX_SAMPLE[/b] Used to store historical statistic sample times and intervals, corresponds with VPX_HIST_STAT table. This table has 3 columns (id, sample time, sample interval) and usually a large amount of rows (although now where near as large as VPX_HIST_STAT).
VPX_SCHED_ACTION[/b] Used to store scheduled task action information in VirtualCenter. This table has 3 columns (scheduled task id, action type, action data) and will have as many rows as scheduled tasks that are defined.
VPX_SNAPSHOT[/b] Used to store snapshot information for VMs. This table has 11 columns (id, host snapshot id, vm id, snapshot name, snapshot desc, create time, power state, is quiesced, parent snapshot id, is current snapshot, config) and will have a row for as many snapshots that exist for your VMs.
VPX_STAT_CONFIG[/b] Used to store statistic collection intervals in VirtualCenter. This table has 3 columns (length, name, sample interval) and as many rows as you have collection intervals defined (usually 4).
VPX_ TASK[/b] Used to store all task information (ie. VM Power On, VM Re-configure, Alarm Create) that occurs in VirtualCenter. This table has 27 columns and as many rows as tasks that have occurred in VirtualCenter (can be thousands).
VPX_VM[/b] Used to store virtual machine configuration information (ie. vm id, path to vmx file, uuid, os, state, memory, resource group, ip address, vmware tools version, etc.) read from the ESX server is is hosted on. This table has 41 columns and as many rows as VMs that are managed by VirtualCenter.
Sample SQL code to query Disk Space info from all VMs (in gigabytes):
o select b.name, path, round(capacity/1073741824,2) "Total", round(free_space/1073741824 ,2) "Free" from vpx_guest_disk a, vpx_entity b where a.vm_id = b.id order by b.name
Sample SQL code to display Up Time info from all VMs:
o select b.name, (to_char(sysdate, 'J') - to_char(boot_time, 'J')) "Up Days" from vpx_vm a, vpx_entity b where a.id = b.id order by b.name
The size of the Statistics database (VPX_HIST_STAT) will vary based on the number of hosts & VMs managed, frequency of performance data collection, collection level and type of database. Each stat sample collected is about 60 bytes for SQL, 100 bytes for Oracle, and each event stored is 1600 bytes for SQL, 600 bytes for Oracle.
Using default settings, the statistical data for 25 hosts running 8-16 VMs per host will plateau around 40-60 MB in a year (80-140 MB if set to full). Each month, the average number of events generated will also consume about 190 MB in SQL, and 70 MB in Oracle. Total DB size after a year is expected to be around 2.20 Gb in SQL, and 1.0 Gb in Oracle.
Using default settings, the statistical data for 75 hosts running 8-16 VMs per host will plateau around 90-150 MB in a year (200-330 MB if set to full). Each month, the average number of events generated will also consume about 190 MB in SQL, and 70 MB in Oracle. Total DB size after a year is expected to be around 2.40 Gb in SQL, and 1.2 Gb in Oracle.
The statistic sample frequency and retention is specified in the VI client. Select Administration, VirtualCenter Management Server Configuration from the top menu then select Statistics. The default settings are:
o Past Day 5 minutes per sample, 288 total samples
o Past Week 15 minutes per sample, 672 total samples
o Past Month - 60 minutes per sample, 720 total samples
o Past Year 1440 minutes per sample, 365 total samples
For example if you wanted to collect less statistical data you could change the settings to something like below:
o Past Day 15 minutes per sample, 96 total samples
o Past Week 60 minutes per sample, 168 total samples
o Past Month - 120 minutes per sample, 360 total samples
o Past Year 1440 minutes per sample, 365 total samples
Or if you wanted to collect more statistical data you could change the settings to something like below (caution your database will be very large):
o Past Day 1 minutes per sample, 1440 total samples
o Past Week 5 minutes per sample, 2016 total samples
o Past Month - 30 minutes per sample, 1440 total samples
o Past Year 60 minutes per sample, 8760 total samples
You can also delete any ones you do not want, for example if you only want a weeks worth of data then delete the Past Month and Past Year. Once the total samples number has been reached the older samples are automatically purged from the database. If you modify these values you must follow these rules.
o The new sampling period must be a multiple of a previous one. In the defaults youll notice that week (15) is a multiple of day (5), month (60) is a multiple of week(15) and year (1440) is a multiple of month (60).
o The new sampling length must be longer than a previous one. Week (15) is longer then day (5), etc.
o The user-specified name of the historical interval must be unique. (In case you add new ones)
o Note: Existing data is reset (lost) when you change the interval configuration. However, only the data for that interval is reset. For example, if you change only the weekly time interval, the daily and monthly data are retained.
You can also specify the amount of statistical detail that is captured by setting the Statistic Collection Level. The default for this setting is Level 1 which includes basic metrics, it can be changed anywhere up to Level 4 which provides the most statistical detail but can significantly increase the size of the database and can cause additional performance overhead. You can also change the number of threads that are used for collecting performance statistics from managed hosts. Use multiple threads to increase the number of managed hosts and take advantage of the additional processor.
VirtualCenter does not support doing this directly, the only way you can do this by deleting directly from the database tables. It is strongly recommend to have a functional backup of the VC database before clearing these tables in case of an unforeseen problem. Also the the VC service must be stopped in order to avoid data corruption. This is not a supported procedure by Vmware but should be in my opinion, these tables can get pretty large in big environments as every single task and event is preserved in the database.
The tables where the data is stored are VPX_TASK (Tasks) and VPX_EVENT, VPX_EVENT_ARG (Events). The VPX_EVENT_ARG will usually have more rows then the VPX_EVENT table. This is because the VPX_EVENT table only has one row per event but the VPX_EVENT_ARG table can have multiple rows per event (The link between the 2 tables is the EVENT_ID column).
o Shutdown the VirtualCenter service
o Connect to the database server that is hosting the VC database with a SQL browser/client. You can use a free tool like WinSQL Lite (http://www.snapfiles.com/get/winsql.html) for this.
o To delete all data in the tables type: delete from VPX_TASK then delete from VPX_EVENT and then delete from VPX_EVENT_ARG and finally Commit
o Optionally you can do truncate table VPX_TASK then truncate table VPX_EVENT and then truncate table VPX_EVENT_ARG (Truncate is faster and does not use as much undo space as delete)
o To selectively delete data older then 30 days: delete from VPX_TASK where complete_time < sysdate - 30)
o To selectively delete data older then 30 days from VPX_EVENT and VPX_EVENT_ARG is trickier because VPX_EVENT_ARG does not have a date field and it tied to the events in VPX_EVENT by the Event_id field.
o First delete from the child table (VPX_EVENT_ARG): delete from vpx_event_arg where event_id in (select a.event_id from vpx_event_arg a, vpx_event b where a.event_id = b.event_id and create_time < sysdate - 30)
o Next commit the delete by typing Commit
o Then delete from the parent table (VPX_EVENT): delete from vpx_event where create_time < sysdate 30
o Finally commit again by typing Commit
o Start the VirtualCenter service