VMware Cloud Community
TopHatProductio
Hot Shot

New Server Project

Hello! It's been a while since I last posted here with my own topic. I now have a dedicated ESXi server in the works. The server project is meant to replace (and exceed) my previous workstation - a Dell Precision T7500. Here are the specs for the hardware:

 

HPE ProLiant DL580 G7

 

 

    OS   :: VMware ESXi 6.5u3 Enterprise Plus
    CPU  :: 4x Intel Xeon E7-8870s (10c/20t each; 40c/80t total)
    RAM  :: 256GB (64x4GB) PC3-10600R DDR3-1333 ECC
    PCIe :: 1x HP 512843-001/591196-001 System I/O board + 
                1x HP 588137-B21; 591205-001/591204-001 PCIe Riser board
    GPU  :: 1x NVIDIA GeForce GTX Titan Xp +
                1x AMD FirePro S9300 x2 (2x "AMD Radeon Fury X's")
    SFX  :: 1x Creative Sound Blaster Audigy Rx
    NIC  :: 1x HPE NC524SFP (489892-B21) +
                2x Silicom PE310G4SPI9L-XR-CX3s
    STR  :: 1x HP Smart Array P410i Controller (integrated) +
                1x HGST HUSMM8040ASS200 MLC 400GB SSD (ESXi, vCenter Appliance, ISOs) + 
                4x HP 507127-B21 300GB HDDs (ESXi guest datastores) +
                1x Western Digital WD Blue 3D NAND 500GB SSD + 
                1x Intel 320 Series SSDSA2CW600G3 600GB SSD +
                1x Seagate Video ST500VT003 500GB HDD
    STR  :: 1x LSI SAS 9201-16e HBA SAS card +
                1x Mini-SAS SFF-8088 cable + 
                        1x Dell EMC KTN-STL3 (15x 3.5in HDD enclosure) + 
                                4x HITACHI Ultrastar HUH728080AL4205 8TB HDDs +
                                4x IBM Storewise XIV v7000 98Y3241 4TB HDDs
    I/O  :: 1x Inateck KU8212 (USB 3.2) +
                1x Logitech K845 (Cherry MX Blue) +
                1x Dell MS819 Wired Mouse
            1x Sonnet Allegro USB3-PRO-4P10-E (USB 3.X) +
                1x LG WH16NS40 BD-RE ODD
    PRP  :: 1x Samsung ViewFinity S70A UHD 32" (S32A700)
            1x Sony Optiarc BluRay drive
    PSU  :: 4x HP 1200W PSUs (441830-001/438203-001)

 

 


The details for the ProLiant DL380 Gen9 will appear here once data migration is complete. VMware Horizon (VDI) will have to wait for a future phase (if implemented at all). The current state of self-hosted VDI is Windows-centric, with second-class support for Linux and no proper support for macOS.

The planned software/VM configurations have been moved back to the LTT post, and will be changing often for the foreseeable future.

Product links and details can be found here.

 

ESXi itself is usually run from a USB thumb drive, but I have a drive dedicated to it. No harm done. A small amount of thin provisioning/overbooking (RAM only) won't hurt. macOS and Linux would have gotten a Radeon/FirePro (e.g., an RX Vega 64) for best compatibility and stability, but market forces originally prevented this. Windows 10 gets the Audigy Rx and a Titan Xp. The macOS and Linux VMs get whatever audio the FirePro S9300 x2 can provide. The whole purpose of Nextcloud is to phase out the use of Google Drive/Photos, iCloud, Box.com, and other externally-hosted cloud services (Mega can stay, though).

 

There are three other mirrors for this project, in case you're interested in following individual conversations from the other sites (in addition to this thread).

 

P.S. Out of all the sites that I've ever used, this forum has one of the best WYSIWYG editors I've come across :)

Kudos to the devs!

259 Replies
TopHatProductio
Hot Shot

I had a small slip-up ~2 days ago (10/07/2023). The certificates for vCenter expired before I got a chance to renew them. I was distracted with other IRL tasks, and am now unable to use vCenter until this gets resolved. This is gonna delay pretty much everything else that I'm working on.

TopHatProductio
Hot Shot

vCenter troubleshooting under way...

TopHatProductio
Hot Shot

Finished troubleshooting one issue, only to be hit with another. After updating Keycloak (Docker Compose stack), my realm settings got nuked back to defaults. That undid the AD integration config I'd toiled over for days to figure out. While it would be nice to have local MFA and IAM, I can't justify using a solution if my settings aren't safe. The only Docker volume options I saw in the official documentation were for development purposes (not for use in production). I have other things to do, and this would only delay me further. I can't afford that. Very close to just kicking Keycloak from the server project...
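For reference, the usual way to keep realm data safe across container image updates is to back Keycloak with an external database on its own volume, rather than the embedded dev-mode storage. A hypothetical Compose sketch (image tags and credentials are placeholders, not what I actually ran):

```yaml
# Sketch: Keycloak backed by an external Postgres whose data volume
# persists across Keycloak image updates. Tags/credentials are placeholders.
services:
  postgres:
    image: docker.io/library/postgres:16
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: changeme
    volumes:
      - keycloak-db:/var/lib/postgresql/data
  keycloak:
    image: quay.io/keycloak/keycloak:22.0
    command: start
    environment:
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
      KC_DB_USERNAME: keycloak
      KC_DB_PASSWORD: changeme
    depends_on:
      - postgres
volumes:
  keycloak-db:
```

With the realm state in Postgres instead of the container filesystem, an image update shouldn't wipe realm settings.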

TopHatProductio
Hot Shot

I've kicked Keycloak from the project indefinitely, and am replacing RustDesk with MeshCentral. The included AD integration is a plus.
TopHatProductio
Hot Shot

Currently working on a new OpenRC script for the Wazuh Agent. Meant to be used with Artix:
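A minimal sketch of what such a runscript could look like, assuming a default Wazuh install under /var/ossec and wrapping its wazuh-control tool (untested, paths may differ per install):

```sh
#!/sbin/openrc-run
# Hypothetical OpenRC runscript for the Wazuh agent on Artix.
# Assumes the agent was installed to the default /var/ossec prefix.

description="Wazuh agent"

depend() {
    # The agent needs networking to reach the Wazuh manager.
    need net
}

start() {
    ebegin "Starting Wazuh agent"
    /var/ossec/bin/wazuh-control start
    eend $?
}

stop() {
    ebegin "Stopping Wazuh agent"
    /var/ossec/bin/wazuh-control stop
    eend $?
}
```

Dropping a script like this into /etc/init.d and running `rc-update add wazuh-agent default` would then start it at boot.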

TopHatProductio
Hot Shot

I will be disabling SSH on the Artix VMs, since I never use it. One less attack surface to worry about. From what I can remember, the Radeon Pro v340 requires Above 4G Decoding. That would prevent it from working on the DL580 G7, but not the DL980 and higher/newer. I will probably have to put off using that card until I move out. Managing Windows (especially on baremetal) is a pain, and I can't wait to get a Linux laptop...

TopHatProductio
Hot Shot

I'm considering moving the host OS target to ESXi 7.X. However, that would also require me to upgrade the vCSA. May do that in summer 2024, preemptively...

Kinnison
Commander

Hello,


Just out of curiosity, which version of vSphere do you use? If you wait a little longer, assuming things don't change, vSphere 7.x should go into "End of General Support" status in April 2025, now that all the previous versions are already past the "End of Technical Guidance" stage. 😁


Have a good weekend,
Ferdinando

TopHatProductio
Hot Shot

So, the current Sonnet Allegro USB card appears to be defective (causing PSODs when passed through to VMs). I've purchased a USB3-PRO-4P10-E to replace the current USB card, hoping that this one will last a bit longer. However, the GPU situation is a strange one. I've wanted to use the FirePro X2 so I can share it with multiple VMs (or even for VDI). However, I haven't had the free time and/or end users to make good on that. I don't think the Radeon Pro v340 will work in the DL580 G7 (Above 4G Decode). I'm tempted to just use a Vega 64 and call it even, since I'm the only one actually using the server on a regular basis. I may just save MxGPU and SR-IOV for the DL580 Gen9, since no one has asked for VDI/remote desktop...
TopHatProductio
Hot Shot

On the current server (DL580 G7), I'm running ESXi 6.5u3 and vCenter 6.7. The DL580 Gen9 is new enough to support ESXi/vCenter 6.7, 7.0, iirc. Hoping I will be able to afford Enterprise Plus licensing when the time comes, if my current license(s) can't be transferred to the new server.

rezi_jam10
Contributor

It's been a while since I last posted here with my own topic. I'm currently in the process of setting up my dedicated ESXi server. The project aims to replace and exceed my previous workstation, a Dell Precision T7500. The HPE ProLiant DL580 G7 is at the core of this endeavor, and I'm eager to leverage its capabilities for optimal performance.

ESXi is typically run from a USB thumb drive, but I've dedicated a drive for this purpose. I believe it's a solid choice that won't impact the overall performance. The thin provisioning/overbooking strategy, particularly for RAM, is also something I'm considering to optimize resource utilization.

For VM configurations, I'm exploring various options, and the hardware provides a robust foundation for this experimentation. I'm especially interested in the compatibility and stability considerations, particularly for macOS and Linux VMs, and I'm contemplating the use of Radeon/FirePro for the best performance.

As I move forward with the project, I'll be sure to share updates and insights. If you have any experience or recommendations with hardware or ESXi setups, feel free to share. I'm excited to see how this dedicated server will elevate my computing experience.

TopHatProductio
Hot Shot

Careful about the macOS thing. Can't post about macOS VMs here unless you're running them on Apple hardware.
TopHatProductio
Hot Shot

Due to limitations of the DL580 G7, no other VMs or containers requiring/preferring AVX and/or GPU passthrough will be added during this project phase. That includes (but is not limited to):

  • waydroid
  • redroid
  • blissos
  • f@h
  • self-hosted LLMs

These are all in the same boat as VMware Horizon (VDI) now, and will be re-evaluated once the DL580 Gen9 is the VM host. The last major hardware upgrade/change for this phase will be the GPU to (potentially) replace the FirePro S9300 x2.

TopHatProductio
Hot Shot

At the beginning of the month (12/01), I started researching how to configure MDM in ManageEngine Endpoint Central. I was also considering adding Overseerr as a Docker/Podman container. However, this month was not a calm one...

Changelog:

  • [12/06] I updated the graphics drivers for the Windows 10 VM (Titan Xp).
  • [12/07] The Windows 10 VM had its first BSOD in almost half a year. Haven't run outside of stock values lately due to previous BSODs, but uninstalled MSI Afterburner to be safe.
  • [12/08] The Windows 10 VM ran drive error repairs -- probably caused by the recent BSODs.
  • [12/09] The primary display is now likely to switch to the VMware display adapter if I open it, so I now must avoid opening it. I (re-)learned [Win] + [Ctrl] + [Shift] + [B].
  • [12/10] Updated Nextcloud to 26.0.9, and planning a move to 27 later this month (delayed due to BSOD troubleshooting).
  • [12/11] The Windows 10 VM had another BSOD. I couldn't run the DISM command. After checking Event Viewer and Device Manager logs, I decided to DDU and rollback graphics drivers.
  • [12/12] I observed the Windows 10 VM, to make sure the BSODs stopped. The 2nd Radeon Pro v340 arrived by this time.
  • [12/13] Internet connectivity for the entire server rack dropped (~01:20 EST) -- but there've been no network configuration changes since May. Had to re-configure the MikroTik Audience.
  • [12/14] Attended an installation demo/evaluation for ManageEngine Log360, to compare to Wazuh XDR. Installation is incomplete, and will be continued next week.


Still need to evaluate whether to move to Windows 11 on the Windows 10 VM, whether to add Overseerr, and apply configuration changes suggested in Wazuh XDR...

TopHatProductio
Hot Shot

OSRM can run just as well in an application container as it would in a system container:

I can leave that in a Podman container now, and not be concerned about potential performance penalties.
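For anyone wanting to reproduce the application-container setup, this is roughly the preprocess-then-serve sequence the official osrm/osrm-backend image documents, translated to Podman (region file, host path, and port are placeholders):

```sh
# Sketch: preprocessing and serving an OpenStreetMap extract with the
# official OSRM backend image under Podman. /srv/osrm and the region
# filename are placeholders for wherever the .osm.pbf extract lives.
podman run --rm -v /srv/osrm:/data docker.io/osrm/osrm-backend \
    osrm-extract -p /opt/car.lua /data/region-latest.osm.pbf
podman run --rm -v /srv/osrm:/data docker.io/osrm/osrm-backend \
    osrm-partition /data/region-latest.osrm
podman run --rm -v /srv/osrm:/data docker.io/osrm/osrm-backend \
    osrm-customize /data/region-latest.osrm

# Serve routing queries on port 5000 using the MLD algorithm.
podman run -d --name osrm -p 5000:5000 -v /srv/osrm:/data \
    docker.io/osrm/osrm-backend \
    osrm-routed --algorithm mld /data/region-latest.osrm
```

Since osrm-routed is a single long-running process, it maps cleanly onto an application container with no init system needed.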

 

I also encountered a thread yesterday, mentioning this repo:

AD CS can be made compatible with ACME clients, to allow for easier certificate renewal automation.
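Assuming the repo provides an ACME-compatible facade in front of AD CS, renewal from the client side then looks like any standard ACME flow. A hypothetical certbot invocation (the server URL, hostname, and e-mail are placeholders for whatever the bridge exposes):

```sh
# Hypothetical renewal against an internal ACME endpoint fronting AD CS.
# Every value below is a placeholder; only the flags are standard certbot.
certbot certonly \
    --standalone \
    --server https://acme.internal.example/acme/directory \
    -d host01.internal.example \
    --agree-tos -m admin@internal.example
```

Once issuance works this way, a cron job or systemd/OpenRC timer running `certbot renew` would take care of the automation side.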

 

The vSphere version target for the DL580 Gen9 has been moved, from 6.7 to 7.0.

TopHatProductio
Hot Shot

Still looking into solutions for using newer cards in the DL580 G7, until I can move to the DL580 Gen9. From what I've seen in documentation, I could try disabling unneeded PCIe devices to free up resources for other PCIe devices:

However, I'm not sure which ones to disable yet. I may have to open a support ticket with HPE:

That will take a while to investigate. Still need to get vBIOS for the FirePro S9300 X2, to re-test the VMX parameters.
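Assuming the VMX parameters in question are the large-BAR passthrough settings VMware documents for big GPUs, the entries to re-test would look like this (the size value is a placeholder to be tuned per card, and on a host without Above 4G Decoding they may not help at all):

```
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"
```

These go in the VM's .vmx file (or Advanced Configuration Parameters) and tell ESXi to map the GPU's BARs above the 4GB boundary.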

hMailServer is no longer actively maintained, so I'll be attempting a migration to Stalwart this year. But I need a way to either migrate, or archive and access, the e-mails handled and generated with the previous mail server. Currently looking into MailStore for that.

On a side note, I'm taking another shot at RADIUS with ClearBox Enterprise RADIUS server. As usual, the MikroTik Chateau isn't playing nice. Same results as last time, with TekRADIUS OD. I'm starting to wonder if I should just ditch the idea of having LTE failover in the future...

TopHatProductio
Hot Shot

I finished installing and configuring MailStore Server, in preparation for the move from hMailServer to Stalwart. Evaluation of ClearBox Enterprise RADIUS server has been delayed indefinitely (best candidate tested so far).

Project:ArcZ has changed a bit more, swapping LightDM for ly. Working on releasing an ISO for a small group of testers. The ISO repair for the Windows 10 VM appears to have been successful -- no issues since completion in mid-February.

Swapped the current PDU for one with more outlets, since I was running out of usable ones. Too many appliances have chunky rectangular plugs that block adjacent outlets on the PDU. The next version of the server project has moved on from 400GB SAS SSDs to 800GB ones.

It appears that running TrueNAS as a VM, in production, is no longer discouraged:

If such is the case, I may no longer need the DL380 Gen9. If I had known (late last year) that such a change-up was coming, I would not have gotten a dedicated file server. But, it's here now...

TopHatProductio
Hot Shot

The month of March has been very eventful. At first, I was looking into whether I should split the Windows Server VM into 2-3 different VMs instead:

During this brief period, I was also reviewing some security policy changes/software patches that were suggested in ManageEngine Endpoint Central. One of the software patches was for MariaDB, which would require me to check version compatibility with each app/service accessing it. Knowing my luck, things were bound to get complicated on day 5.

I then found multiple pages from iXsystems, stating that it's safe to virtualise TrueNAS Scale. I'd already spent money on the DL380 Gen9 for that, but I guess there's no use getting peeved about it. This simply means that I can get away with one less physical server in my rack (and less power draw), so there is a plus side to it. Most of the monetary loss is still there, but I can at least use the SSDs (and the discrete HBA) planned for it elsewhere.

On that same day, the VM for Project:ArcZ also threw warnings related to deprecated options/hooks in its image build config file (initcpio). The older Artix OpenRC VM did not give the same warning. I got help from a contact on Discord to correct the deprecated config parameters. Two days later, I was installing a service pack for Endpoint Central.

The next day, I was testing the Nextcloud Social app, and found out that I finally had to configure .well-known/webfinger (CardDAV/CalDAV related) for the instance. I started looking into how to edit the Nextcloud container's config for it. Attempts at this concluded on the 21st. I committed changes to the .htaccess file in Nextcloud itself, and to the subdomain > custom location(s) in NGINX Proxy Manager (reverse proxy). Neither method worked, leaving me with no clear path forward. I'll have to leave self-hosting federated services for later. Five days later, I was reviewing FreePBX extension configs when I decided to buy more DID numbers to use with FreePBX.
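For reference, these are the .well-known redirects Nextcloud's admin documentation recommends, sketched as NGINX location blocks of the kind that go into a reverse proxy's advanced/custom config (they cover CardDAV/CalDAV plus the generic webfinger/index.php case):

```nginx
# Redirect service-discovery lookups to Nextcloud's DAV endpoint,
# per the Nextcloud admin manual's reverse-proxy guidance.
location = /.well-known/carddav {
    return 301 $scheme://$host/remote.php/dav;
}
location = /.well-known/caldav {
    return 301 $scheme://$host/remote.php/dav;
}

# Everything else under /.well-known (webfinger, nodeinfo, ...) is
# handled by Nextcloud itself via index.php.
location ^~ /.well-known {
    return 301 $scheme://$host/index.php$uri;
}
```

Whether NGINX Proxy Manager's custom-location UI passes these through verbatim is another matter; pasting them into the advanced config tab is the usual workaround.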

Four days later, I was advised to move /boot/efi to its own dedicated partition (/efi) while updating GRUB on Project:ArcZ. I spent the next 2 days working on it, with help from the same Discord contact. At this point, if you couldn't tell, they're pretty amazing! Still need to write a pacman hook for auto-generating GRUB configuration whenever GRUB gets updated. I then started work on a dedicated VoIP VLAN for FreePBX the next day. Work for this concluded on the 22nd.
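That hook is straightforward in pacman's alpm-hooks format; a sketch (the file name is hypothetical, and the grub.cfg path assumes GRUB's default layout):

```ini
# /etc/pacman.d/hooks/grub-mkconfig.hook (hypothetical name)
# Regenerate grub.cfg whenever the grub package is installed or upgraded.
[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = grub

[Action]
Description = Regenerating grub.cfg...
When = PostTransaction
Exec = /usr/bin/grub-mkconfig -o /boot/grub/grub.cfg
```

With /efi as a separate mount, the Exec line would also need to ensure that partition is mounted before regenerating.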

After that, I was applying and testing more security policy changes through Endpoint Central. On the 25th, I decided to remove the * (wildcard) user from SoftEther VPN, due to the rapid increase in reported software vulnerabilities. Now, each VPN user has to be explicitly defined with an AD-linked account. On the 26th, I started clearing out TimeShift backups on the Artix OpenRC VM (backup partition ran out of space for new backups).

This morning, the Windows Server VM reported an unexpected shutdown from the previous night -- even though I had issued the last shutdown command myself. I checked the Event Logs, and found multiple warning/error events from yesterday and today. Investigation and remediation for it is ongoing...

TopHatProductio
Hot Shot

After seeing a notification in Server Manager (it mentioned an unexpected power event), I had the Windows Server VM perform a check-disk on next power-on and checked Event Viewer. That's where I started seeing errors and event IDs that I hadn't encountered before. I ended up doing the same on the Windows 10 VM. Here are some (not all) of the things I had to review, mask, and/or remediate in the last 24 hours:

Still more for me to take on in the coming months. Some of these started popping up after taking actions suggested in ManageEngine Endpoint Central, as security policy/configurations (like the RPC-related one). While most of the heavy-lifting in Endpoint Central is done, I now have to start doing the same in Wazuh XDR. The work never stops.

 

On a side note, I also need a Redis replacement for the Nextcloud instance...
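Since drop-in replacements such as Valkey or KeyDB speak the Redis protocol, the Nextcloud side of that swap should just be pointing the existing cache settings at the new service. A config.php excerpt as a sketch (host/port are placeholders):

```php
// Sketch: Nextcloud caching/locking config is unchanged when the backend
// is a Redis-protocol-compatible drop-in (e.g. Valkey or KeyDB).
// Host and port below are placeholders.
'memcache.local' => '\OC\Memcache\APCu',
'memcache.distributed' => '\OC\Memcache\Redis',
'memcache.locking' => '\OC\Memcache\Redis',
'redis' => [
    'host' => 'cache.internal.example',
    'port' => 6379,
],
```

In other words, the migration cost is on the service side (new container, data dir, firewall rules), not in Nextcloud itself.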
