Could someone please help with a DHCP/PXE configuration issue? I have the same problem, with the PXE client not receiving the DHCP Offer from the DHCP server. We ran a sniffer on the network and determined that the DHCP server receives the request and replies with an Offer, but the packets never get back to the PXE client.
I am trying to install ESX 3.0.1 using a PXE boot to a MS DHCP/PXE server. The ESX server (Dell 1955 blade) is connected to the network via a switch port on a 6509. The DHCP/PXE server is a Windows 2003 server running in a VM on another ESX host on the same subnet and vlan (Native) as the PXE client.
The Cisco port configuration for the ESX server hosting the DHCP server VM is as follows:
interface GigabitEthernet3/39
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 322
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast
spanning-tree bpduguard enable
spanning-tree guard root
end
The Cisco port configuration for the PXE client machine is as follows:
interface GigabitEthernet3/42
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 322
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast
spanning-tree bpduguard enable
spanning-tree guard root
end
The Virtual Switch on the ESX host that the DHCP VM runs on is set up with a pxe vlan defined as 322 (the vlan defined as native) to allow the Virtual Switch to see the native vlan, and a mgmt vlan (vlan 310) to allow tagged vlan traffic to go through to our subnet.
We have 2 vNICs defined in the DHCP VM, one using the pxe vlan and one using the mgmt vlan, to allow RDP and other network connections to work across the tagged vlans.
I have tried disconnecting the vNIC connected to the mgmt vlan to make sure there was no conflict there. (Both NICs have IP addresses on the same subnet. I KNOW, I KNOW. If someone can tell me how to add a second vlan to the one vNIC I would appreciate that as well.) DHCP is set up to only service the vNIC using the pxe vlan. With this setup the RDP traffic fails, and the DHCP Offer is still sent outbound but not received by the PXE client.
If anyone can point me to documentation with the complete configurations needed to set up the Cisco ports, the vSwitches and the vNICs, or can help me with this directly, I would greatly appreciate it.
A few people have reported problems like this. Sounds like a bug.
See this link
As a test are you able to put a hub between your ESX server and the 6509 ? And retry the build?
Hiya,
I saw your posts on a couple of other threads.
It sounds like your problem is happening well before the second dhcp request in the kickstart referenced in the bug check above?
Is that right?
I'm running ESX servers in two sites - one with Cisco servers running IOS and one with Cisco servers running Cat OS - not sure which hardware off hand.
I am in the irritating position where the on site IOS ones work OK and the remote site Cat OS ones fail - almost certainly with the problem in the bugcheck (thanks Mooihoek) - whereby I get to the PXE server fine - pick a build option - then kickstart fails to get an IP address and bombs out with "unable to find the relevant .img file".
Are you using the UDA app (where one of your other posts was) - or are you actually running a PXE server on your DHCP server?
I am using UDA, via a MS DHCP server with scope options 66 and 67 set to locate the UDA server.
I did initially have a problem accessing the UDA server as I was only using tagged packets and the PXE agent didn't recognise them. When I added the native vlan it worked OK.
However I have not tried to actually bind the Native VLAN to an ESX portgroup.
There is a whitepaper on it - not sure if you've seen it:
http://www.vmware.com/pdf/esx3_vlan_wp.pdf
There are a couple of commands mentioned at the bottom which might make a difference?
"vlan dot1q tag native" etc...
If you can it might be worth temporarily setting up a standalone server to host DHCP/PXE - then you will be able to pinpoint your problem a little better.
If that works OK - then it would point to issues with passing the native vlan on to the ESX port group - if that fails - then it is presumably more to do with your actual Cisco environment?
Dinny
Sorry for posting more than once. I realized I had posted on a thread that had already been answered, and then realized I was somehow logged in under someone else's ID (Darrius). I don't know how that happened.
Anyway, I have an MS DHCP server running as a VM on an ESX host, and it is also the PXE server. I plan on using the MS Deployment server and BDD services to provision VMs, as well as provisioning and patching my ESX hosts.
The PXE boot works inside the vSwitch. I found that out today. I created a VM on the same ESX host that the DHCP server resides on and let it PXE boot. It found the server, and PXELinux booted the PXE menu I had set up. Yes, the problem is before the bug with kickstart. The PXE client never receives the Offer response from the DHCP server, as my first post stated. (This was proven with a network sniff from both the Cisco and the vSwitch sides.) The DHCP Offer broadcast just disappears somewhere between the DHCP server's vNIC and the physical Cisco switch port. I believe it is because the vSwitch doesn't know that the native VLAN 322 traffic is untagged and is tagging it. My knowledge of VLAN tagging is limited, but this appears to be the issue. At first I thought maybe the ESX host firewall was blocking the packets, but then I got my mind right and realized that the firewall only looks at packets destined for the ESX server itself.
I have read the white paper that you mentioned. The problem is that I was not able to get the DHCP server to even see the DHCP request before I set up the native VLAN. Now at least the request gets processed by the DHCP server. As for tagging native VLAN packets, wouldn't that just cause the same problem as I had in the first place?
Dinny, could you send me a sample of the Cisco port configs that you are using? I don't believe this is a complex problem, just one where the solution is elusive. You mention that you had the problem with PXE and tagged packets. I set up the untagged native VLAN and bound it to the vSwitch as a means to try to fix the problem. Since you have had success at least to that point in doing this, I think your configuration could help me.
Thanks for your response. If anyone else has anything that might help, I am open to anything you can come up with.
I also plan on trying a standalone DHCP server. But if you could send me the configs before I start adding services to another server, I could at least rule out that issue.
Hello,
I also PXEboot/install VMs and hosts from my systems. I have a standalone DHCP server, well it is more than DHCP but it is external to the ESX server.
I would try this combination without Vlan Tagging and see if that solves the problem. If this works then I would introduce one change at a time to see where the break is. It could be the physical switch vlan or even vlan tagging in the vSwitch. I think this may be your only way of telling if it is one or the other.
I would also check the DHCP Server for DHCP related messages, it could be that the DHCP server is not responding correctly.
Just a thought. We considered having the DHCP server inside ESX, but dismissed that as everything at the site uses DHCP except for PDUs and the ESX servers themselves.
Best regards,
Edward
This is going to be my next course of action. I already tested DHCP within the vSwitch on the same ESX host. I created another VM on the same ESX host as the DHCP server and booted it via PXE with a test PXELinux loader.
I plan on setting up the second adapter on the Vcenter server within the same native VLAN and installing DHCP there. This will probably be our permanent solution for DHCP in both of our ESX subnets. (We have half of each blade chassis set up as internal and half serving our DMZ VMs.) This solution is intended to provision both. We will probably be adding 5 more chassis in the next 2 years, so an automated provisioning solution is needed to keep up with the growing environment.
Our environment is the opposite: I am using DHCP to provision the servers only. The only other DHCP in our environment is in the user subnet where the developers' workstations reside. This is a very "server" heavy environment, in that we have more than 3 times as many servers as workstations.
Hiya,
Not sure if it will help much - as I don't ever use the Native VLAN zz on the VM portgroups at all - but here is how the (working) Cisco IOS port is set up.
I use this port to PXE boot via the native VLAN zz to actually build the ESX server, once the ESX server is built I use it as my primary SC connection on vlan xx, and my backup vmotion connection on vlan yy.
i.e. I have two port groups set up on vSwitch0 - one on vlan xx for the SC and one on vlan yy for vmotion.
The zz native vlan is purely used by the PXE agent on the NIC at the initial ESX server build time. After build it is not used at all - ESX knows nothing about it.
interface GigabitEthernet pp/qq
description blah de blah.....
no ip address
no snmp trap link-status
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan zz
switchport trunk allowed vlan xx,zz,yy
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
Dinny
TY Dinny
I have one question though. If your ESX host (the one that the UD toaster is on) doesn't have the native vlan defined in a vSwitch, how is the UD toaster (sorry, bad humour) VM seeing the native vlan?
Maybe this is the part that I'm missing. I assumed that the vSwitch would drop the untagged packets since it wouldn't know what vlan to send them out on, because by default vSwitches don't participate in vlan 0. I took that to mean they would not send untagged packets at all.
Am I incorrect in my understanding of that?
Sorry, my ESX experience is limited, so I am flying by the seat of my pants here.
Hiya,
My UDA app is a VM - but that is on a completely different vlan to the native vlan that I PXE boot on, and is different again to the vlan that the SC is on.
When the PXE agent on the ESX server NIC tries to PXE boot on vlan zz - the Cisco VLAN is set up to have IP forwarders to my DHCP servers - let's say the DHCP servers are on vlan ss.
I then have a DHCP scope set up (on my DHCP servers sitting on vlan ss) for vlan zz (that I am PXE booting on).
I then use the DHCP scope option 66 to give the IP address of my UDA app.
Let's say the UDA app is on VLAN tt.
(Which happens to be a portgroup for my VMs on my ESX server - but it could just as well be a standalone server)
The process is then as follows:
The unbuilt ESX server boots on native vlan zz
An ip forwarder statement on cisco passes the dhcp request from vlan zz to one of my (standalone) DHCP servers on vlan ss
One of the DHCP servers replies with an IP address on vlan zz
Once a DHCP lease has been negotiated, the DHCP scope option 66 tells my unbuilt ESX server to go to my UDA app (on vlan tt) to get its PXE info.
Option 67 in the dhcp scope tells it what PXE file to use.
The build then commences, using a DHCP address on vlan zz, and getting the relevant kickstart data from the UDA server on vlan tt, using http or NFS, in the usual manner.
The irritating part is that when the ESX server is built I actually want the service console to be on VLAN xx and vmotion yy.
So I then have to script setting up the relevant port groups and IP addresses in the rc.local using esxcfg and vimsh commands.
Finally I reboot - and I'm happily using the ESX server on vlan xx for the SC and vlan yy for vmotion.
I only ever use the native vlan zz on that cisco port again, if I ever need to rebuild my ESX server.
Hope that makes sense?
Must be worth a few points if it does
Dinny
Hello,
You are using DHCP only during the provisioning of an ESX Host and not during the running of an ESX host? If the first is the case, we do the same, all our ESX Servers are provisioned using DHCP and then we finally give them a static address. We then download and run a configuration script.
If it is the second, where you are using DHCP for an ESX server service console, that is not necessarily a great idea. Every time I have tried to do that I ended up with very interesting logfiles on the DHCP server; specifically, the MAC address of the ESX Server never stayed the same twice.
How is your DHCP server configured? Does it only allow specific MAC addresses, or a range of open ports?
Best regards,
Edward
Did you install ESX via PXE with the SAN connected?
http://www.vmware.com/community/thread.jspa?threadID=86894&tstart=10
OK, the picture is much clearer.
The answer is the DHCP/bootp forwarder. The forwarder is acting like a vlan traffic cop. It takes the broadcast traffic and points it to a specific DHCP server on another vlan. It then catches the reply from that server on that vlan and rebroadcasts it on the native vlan to the PXE client. Am I correct in this interpretation?
My DHCP server is set up very similarly to what you describe here, the difference being that my DHCP server is also my PXE server. The (66) Bootserver host name is pointing to its own IP address. This config is working within the vSwitch environment on the host that it resides on. The VMs can PXE boot to it and receive the boot loader file from the DHCP/PXE server, so I know that part is working.
You wrote "When the PXE agent on the ESX server NIC tries to PXE boot on vlan zz - the Cisco VLAN is set up to have IP forwarders to my DHCP servers"
I assume you are referring to an "ip helper-address" on the interface of your native vlan. That seems to be the most likely place to put the forwarder. But my vlan interface doesn't have an IP address to respond to the request, so the DHCP server has no way to respond.
If it is possible, can you and I talk via IM or take this private? I know I'm asking a lot here. My email address is moberle@NRTwebservices.com.
You've been more than accommodating. TYVM. I appreciate the help you've already given me. And yes, bunches of points will be awarded!!! LOL
I am using DHCP only while building the ESX server. It will have its own static address after the build is complete.
Hi Moberle,
Dropped you an email a few hours ago...
Did it arrive?
Off home now....
Dinny
Yes it did. I was already gone from work
I have sent a reply.
Regards,
Michael
Dinny
Hope things are good.
I am posting this to the forum as well.
I have finally had time to get back to this.
Have had partial success.
Set VLAN 322 as a Layer three VLAN with an IP address and IP Helper Addresses. (This of course is complicated by the fact that we have 2 6509s working as redundant routers, with GLBP set and working. And we have 1 NIC on each server attached to each Router.)
interface Vlan322 (on Router 0)
ip address 10.234.20.2 255.255.255.240
ip helper-address 10.158.121.94
glbp 5 ip 10.234.20.1
glbp 5 load-balancing host-dependent
interface Vlan322 (on Router 1)
ip address 10.234.20.3 255.255.255.240
ip helper-address 10.158.121.94
glbp 5 ip 10.234.20.1
glbp 5 load-balancing host-dependent
Removed "no ip forward-protocol udp tftp" to enable TFTP forwarding.
Added "ip forward-protocol udp bootps" to enable DHCP forwarding.
Added "ip forward-protocol udp bootpc" to enable PXE boot forwarding.
PXE Interface Cisco Port
interface GigabitEthernet3/42
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast trunk
spanning-tree bpduguard enable
spanning-tree guard root
ESX Host Port that the MS DHCP VM resides on.
interface GigabitEthernet3/39
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 322
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast
spanning-tree bpduguard enable
spanning-tree guard root
And voilà.
The NIC is Broadcasting a DHCP Discover Packet.
The VLAN Interface (IP Helper) is relaying DHCP Discover to the DHCP server.
The DHCP Server is responding with a DHCP OFFER back to the IP Helper.
The IP Helper is broadcasting the OFFER back to the NIC.
The NIC is seeing the OFFER and responding with a DHCP REQUEST.
The IP Helper is Relaying the DHCP REQUEST to the DHCP Server.
The DHCP server is responding with a DHCPACK to the IP helper.
The IP Helper is broadcasting the DHCPACK to the NIC.
And we now have an IP address and BOOTP with PXE options on the NIC.
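The relay steps above depend on the gateway address (giaddr) that the IP helper writes into the relayed request: a DHCP server normally uses that address to decide which scope to answer from. Here is a minimal Python sketch of that lookup, assuming the Vlan322 addressing shown earlier in this post. The SCOPES table and select_scope function are hypothetical names for illustration, not anything from MS DHCP or IOS.

```python
import ipaddress

# Hypothetical scope table. Only 10.234.20.0/28 is derived from the Vlan322
# interface config in this post; the second entry is purely illustrative.
SCOPES = [
    ipaddress.ip_network("10.234.20.0/28"),  # PXE VLAN 322
    ipaddress.ip_network("192.0.2.0/24"),    # made-up second scope
]


def select_scope(giaddr):
    """Return the scope that contains the relay's gateway address, or None."""
    addr = ipaddress.ip_address(giaddr)
    for scope in SCOPES:
        if addr in scope:
            return scope
    return None


if __name__ == "__main__":
    # Both routers' Vlan322 addresses should map to the same scope.
    for relay in ("10.234.20.2", "10.234.20.3"):
        print(relay, "->", select_scope(relay))
```

This is also why the VLAN interface needs an IP address before the helper can do anything useful: without one there is no giaddr for the server to match against a scope.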
And all is well. Well, almost.
The PXE boot now attempts to TFTP the PXE boot file (pxelinux.0) from the PXE server. (This works when another VM PXE boots.)
It broadcasts an ARP request for the MAC address of the IP helper. (Not sure why this is happening.)
The IP helper responds with an ARP Reply, but the reply is never seen, resulting in a PXE-E11 ARP Timeout error.
I have done some Googling on this and found two possible causes that seem to apply.
DHCP option 60 on an MS DHCP server with a DHCP proxy running. (Applies here. I removed the option with no effect.)
Cisco routers not forwarding ARP requests. (The request is received and replied to, and both interfaces are in the same subnet and VLAN, so this doesn't seem to be the issue.)
Does anyone have any guidance on this? Points will be awarded to all (of course someone will have to tell me how to do that too LOL).
Hi Michael,
You no longer seem to have the native VLAN statement set on port 3/42?
It was on in your first post?
Mind you I don't see how all the other bits that now work would work without that?
What is option 60 on MS DHCP?
My DHCP server doesn't seem to recognise that?
To award points.
Click on "helpful" when you reply to award six points
You can do this twice per thread.
Click on "correct" if it is answered to award 10 points
Dinny
OK, here's the deal. I finally figured it out, thanks to a lot of help from you guys.
The only way that I got that far was to remove the native VLAN from that interface. The reason it worked (and the reason it didn't) is that the 1955 uses Broadcom NetXtreme II network cards, which allow VLAN processing to be set for PXE boot. I had set the card to PXE boot using VLAN 322 in the PXE settings. This made the DHCP part work, but the ARP broadcast failed because it didn't understand the tagged frame.
To fix this I disabled the VLAN settings in the PXE settings on the card and re-added the native VLAN setting to the interface. And damn if it didn't work.
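For anyone unsure what "tagged" means on the wire here, the following is a minimal Python sketch of an Ethernet frame with and without an 802.1Q tag. The source MAC address and the helper names (build_frame, vlan_id) are made up for illustration; only VLAN 322 comes from this thread.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def build_frame(dst, src, ethertype, payload, vlan=None):
    """Build an Ethernet II frame, inserting a 4-byte 802.1Q tag if vlan is given."""
    header = dst + src
    if vlan is not None:
        # TPID followed by the tag control field (priority 0, 12-bit VLAN ID).
        header += struct.pack("!HH", TPID_8021Q, vlan & 0x0FFF)
    return header + struct.pack("!H", ethertype) + payload


def vlan_id(frame):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


if __name__ == "__main__":
    broadcast = b"\xff" * 6
    nic = bytes.fromhex("020000000001")  # hypothetical source MAC
    arp = 0x0806                          # EtherType for ARP
    print(vlan_id(build_frame(broadcast, nic, arp, b"", vlan=322)))
    print(vlan_id(build_frame(broadcast, nic, arp, b"")))
```

With the native VLAN set on the trunk, the switch expects frames like the second one (no tag) for VLAN 322, which matches what the NIC sends once VLAN processing is disabled in its PXE settings.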
DHCP option 60 is used to designate the DHCP server as the PXE server in an environment that doesn't use DHCP relays. It has to be added using the NETSH command. MS doesn't officially support this option. It did work before I added the Layer 3 VLAN IP helpers.
Now to actually get ESX to install from the network!!!!!
Thank you all for your help and suggestions.
and especially to Dinny for his patience.
Here is a synopsis of how to get PXE to work in an MS environment running on a VM.
Set VLAN xxx (in my case we used VLAN 322 for this purpose) as a Layer 3 VLAN with an IP address and IP helper addresses. (This of course is complicated by the fact that we have 2 6509s working as redundant routers, with GLBP set and working, and we have 1 NIC on each server attached to each router.)
interface Vlan322 (on Router 0)
ip address 10.234.20.2 255.255.255.240
ip helper-address 10.158.121.94
glbp 5 ip 10.234.20.1
glbp 5 load-balancing host-dependent
interface Vlan322 (on Router 1)
ip address 10.234.20.3 255.255.255.240
ip helper-address 10.158.121.94
glbp 5 ip 10.234.20.1
glbp 5 load-balancing host-dependent
Removed "no ip forward-protocol udp tftp" to enable TFTP forwarding.
Added "ip forward-protocol udp bootps" to enable DHCP forwarding.
PXE Interface Cisco Port
interface GigabitEthernet3/42
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 322
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast trunk
spanning-tree bpduguard enable
spanning-tree guard root
Cisco Port of the ESX Host that the MS DHCP VM resides on.
interface GigabitEthernet3/39
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 322
switchport trunk allowed vlan 2-4094
switchport mode trunk
switchport nonegotiate
no ip address
no cdp enable
spanning-tree portfast
spanning-tree bpduguard enable
spanning-tree guard root
Set DHCP options 66 and 67 on the DHCP server.
Option 66 is the TFTP server name. (Use the IP address; DNS is not supported.)
Option 67 is the PXE boot file name, in my case PXELinux.0.
Install MS WDS services on the DHCP server or another server. (The IP address must correspond to the IP given in option 66 above.)
Copy the PXELinux.0 file to the root of the TFTP folder and create the PXELinux.cfg folder. (This is documented on the PXELinux website.)
Alternatively, you can change the TFTP root folder with a Registry edit. (I found this documented on the PXELinux website as well.)
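As a rough illustration of what options 66 and 67 look like inside the DHCP reply, here is a minimal Python sketch of the code/length/value encoding that DHCP options use. The function names are made up for this example, and using 10.158.121.94 as the option 66 value is an assumption based on the helper address above (the DHCP server in this setup is also the PXE server).

```python
def encode_option(code, value):
    """Encode one DHCP option as code byte, length byte, then the value bytes."""
    if not 0 < code < 255:
        raise ValueError("option code must be between 1 and 254")
    if len(value) > 255:
        raise ValueError("option value is too long for a single option")
    return bytes([code, len(value)]) + value


def decode_options(data):
    """Decode an options field into {code: value}, stopping at the end marker (255)."""
    options = {}
    i = 0
    while i < len(data) and data[i] != 255:
        code = data[i]
        if code == 0:  # pad byte, carries no length or value
            i += 1
            continue
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options


if __name__ == "__main__":
    blob = (
        encode_option(66, b"10.158.121.94")  # TFTP server (assumed address)
        + encode_option(67, b"PXELinux.0")   # boot file name from this post
        + b"\xff"                            # end-of-options marker
    )
    print(decode_options(blob))
```

Looking for these two options in a packet capture of the DHCP Offer is a quick way to confirm the scope options are actually reaching the PXE client.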
I have awarded points to all who helped. TY again.