A month ago, Ron and Tom set up an Active Directory domain to demonstrate the capabilities of WDS. A few weeks ago, Kristian and I added Server 2008 clustering capabilities to the AD environment. We'll elaborate on this environment in a future post.
Yesterday, Tom and I wanted to move the DHCP service from the WDS server to the cluster in order to provide highly available DHCP. So we had two servers: one running WDS + DHCP (hereafter referred to as the WDS Server) and another running DHCP (hereafter referred to as the DHCP Server). The goal was to split DHCP and WDS, so we copied the DHCP options from the WDS Server in the picture below to the fresh new DHCP server.
Working Options from WDS Server
Options on the DHCP Server
We rebooted a workstation whose operating system had been deployed from our WDS Server prior to our WDS & DHCP split. The workstation churned along at the PXE screen and then displayed the following PXE error message:
PXE-E55 Proxy DHCP Service did not reply to request on port 4011
Uh oh. We called it a day.
Today, Tom and I revisited the problem by attaching some hubs to our imaging infrastructure and playing the packet capture game. The WDS server is 10.150.150.1 and the DHCP server is 10.150.150.23. The following DHCP scope options were configured when the issue was occurring.
Options on the DHCP Server
The packet capture above shows the problem. The workstation going through the PXE process grabs an IP from the DHCP server and then sends a DHCP discover to port 4011 of the DHCP server. (Note that the error we receive on the workstation mentions port 4011.) Then, the DHCP server replies with an ICMP port unreachable message -- an active rejection of the packet.
So we started searching Google some more and came across this Microsoft page. Microsoft tells us, "
Then we remembered that we have a working pxelinux environment. The pxelinux configuration files are served up by Microsoft's TFTPD, and DHCP is offered by Microsoft's DHCP 2003 service. Further, the DHCP and TFTP servers are separate! (Oh, and IT WORKS.)
We decided to set up another capture session, this time monitoring our working pxelinux environment.
Capture of the working pxelinux DHCP+TFTP transaction
Then, Tom expanded the DHCP ACK and noticed DHCP option 43 was used!
So, Tom updated the DHCP server settings in our WDS environment accordingly.
Capture of working target setup
So, it appears from our experiment and our working pxelinux environment that, in the presence of DHCP option 43 with a value of 010400000000FF, a PXE client immediately sends a TFTP get to the server named in DHCP option 66 for the file listed in DHCP option 67.
We wanted to make sure, so we changed DHCP option 66 to a non-existent IP address, and the workstation failed with the message: PXE-E11 ARP Timeout. A capture of this event showed that the workstation received an address and then sent ARP requests for the non-existent IP address. This strengthened our belief in our claim about DHCP option 43.
Re-inspection of the expanded DHCP option 43 in Wireshark shows the sub-option PXE mtftp IP setting with no value. We're somewhat confused about what this sub-option means, although we've already hypothesized and tested what it accomplishes in the PXE environment. A simple Google search for PXE Specification finds a document that might explain what this stuff means.
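To make the option 43 value less opaque, here is a minimal Python sketch that splits it into PXE sub-options, assuming the usual code/length/value layout for vendor-specific options. The function name and the small name table are ours for illustration; the names for sub-options 1 and 255 follow the PXE specification as we understand it. Under that reading, 010400000000FF is a single sub-option 1 (the multicast TFTP address) set to 0.0.0.0, followed by the end marker, which would match the "no value" Wireshark showed us.

```python
import ipaddress

# Sub-option names as we understand them from the PXE specification.
# Only the two codes that appear in our value are listed here.
PXE_SUBOPTION_NAMES = {1: "PXE_MTFTP_IP", 255: "PXE_END"}


def parse_pxe_vendor_options(hex_value):
    """Split a DHCP option 43 hex string into (code, value) sub-options.

    Assumes each sub-option is: 1 byte code, 1 byte length, then the value.
    Code 255 marks the end and carries no length or value.
    """
    data = bytes.fromhex(hex_value)
    suboptions = []
    i = 0
    while i < len(data):
        code = data[i]
        if code == 255:
            suboptions.append((255, b""))
            break
        if i + 1 >= len(data):
            raise ValueError("sub-option %d has no length byte" % code)
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("sub-option %d is truncated" % code)
        suboptions.append((code, value))
        i += 2 + length
    return suboptions


if __name__ == "__main__":
    for code, value in parse_pxe_vendor_options("010400000000FF"):
        name = PXE_SUBOPTION_NAMES.get(code, "unknown")
        if code == 1 and len(value) == 4:
            # Four bytes are read as an IPv4 address.
            print(code, name, ipaddress.IPv4Address(value))
        else:
            print(code, name, value.hex())
```

This only decodes the bytes; it says nothing about how a particular PXE ROM reacts to them, which is what our captures were for.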
So, we tried to actually boot into PE 2, but it failed with the message:
WDSClient: There is a problem initializing WDS mode
Suck. The clear difference between our target environment and the working WDS environment is that a second DHCP request/ACK doesn't occur. The ACK in this communication contains DHCP option 252, Proxy Autodiscovery. A few more captures of the working WDS environment proved that this value changes per DORA/RA scenario.
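The comparison we did by eye can also be sketched in a few lines of Python. The two message lists below are not exported from our captures; they are a simplified restatement of what we describe above (a full DORA followed by a second Request/ACK in the working WDS setup, and only the DORA in the split setup). The function name is ours.

```python
from difflib import SequenceMatcher

# Simplified DHCP message sequences, as described in the text above.
working_wds = ["Discover", "Offer", "Request", "ACK", "Request", "ACK"]
split_target = ["Discover", "Offer", "Request", "ACK"]


def missing_messages(reference, candidate):
    """Return (index, messages) runs that are in reference but not in candidate."""
    matcher = SequenceMatcher(a=reference, b=candidate, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean part of the reference is absent.
        if tag in ("delete", "replace"):
            missing.append((i1, reference[i1:i2]))
    return missing


if __name__ == "__main__":
    for index, messages in missing_messages(working_wds, split_target):
        print("missing at position", index, ":", ", ".join(messages))
```

With real captures you would first reduce each packet to a short label (message type, plus perhaps the options present) and then diff those labels the same way.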
It looks like we'll have to do some more digging into how WDS dynamically creates BCD files, etc. Expect another post regarding our end environment in the future.
Remaining thoughts:
- Do we lose any functionality by removing DHCP from the WDS server and implementing it elsewhere?
  - Are there automatic changes to Option 67 by the WDS server?
  - Are there other lost functions we don't know about or can't think of now?
    - Probably
- The target ending architecture includes WDS outside of the high availability cluster.
  - Can we distribute WDS across the cluster nodes, and use network load balancing to make TFTP via WDS highly available in a similar sense as clustered high availability?
    - We shall see...
    - Could this solve our dynamic BCD creation issues?
- Differential analysis -- the comparison of system states -- is a strong and effective way to solve problems. Not only can it be used in cryptanalysis or other math-oriented problem solving situations, it can be used in system administration. Thankfully, RIT's ANSA degree program taught us how to read packet captures.
- Sitebooks are great!
  - We had documentation about this DHCP option 43 for our pxelinux environment, but we didn't look at it. In the old documentation, we should have sought to understand what the option accomplished for our pxelinux environment.
  - This post is a sitebook!
- To detach DHCP from your WDS server, you need the following options defined in the new DHCP service:
  - Predefined Option 43 - 010400000000FF
  - Custom-made Option 60 - String - PXEClient
  - Predefined Option 66 - IP or Hostname of the WDS Server
  - Predefined Option 67 - filename in WDS for architecture (in our case it was boot\x86\pxeboot.com)
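For anyone comparing these settings against a capture, here is a rough Python sketch of how the four options would look on the wire, assuming the standard DHCP option layout of one code byte, one length byte, and the value. The helper name and the `include_option_60` flag are ours; the values are the ones from this post, with our WDS server's address standing in for option 66. Some DHCP servers may add a trailing null to string options, which this sketch does not do.

```python
def encode_option(code, value):
    """Encode one DHCP option as: code byte, length byte, value bytes."""
    if not 0 < code < 255:
        raise ValueError("option code must be between 1 and 254")
    if len(value) > 255:
        raise ValueError("option value is longer than 255 bytes")
    return bytes([code, len(value)]) + value


def build_pxe_options(wds_server, boot_file, include_option_60=True):
    """Return the options from this post as a {code: value} dict.

    include_option_60 is a switch we added so the set can be built
    with or without the PXEClient class identifier.
    """
    options = {
        43: bytes.fromhex("010400000000FF"),
        66: wds_server.encode("ascii"),
        67: boot_file.encode("ascii"),
    }
    if include_option_60:
        options[60] = b"PXEClient"
    return options


if __name__ == "__main__":
    # 10.150.150.1 is our WDS server; the boot file is the one we used.
    opts = build_pxe_options("10.150.150.1", r"boot\x86\pxeboot.com")
    for code in sorted(opts):
        print(code, encode_option(code, opts[code]).hex())
```

The hex strings this prints can be compared byte for byte with the option fields Wireshark shows in the DHCP ACK.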
5 comments:
hey guys!
u missed some documentation:
u need options 60 only if u use DHCP+WDS on same server, but if they are on different servers u need to use only 66 and 67 options, leaving 60 option unset
that's working in my invirement
cheers, c0re
option 60 means check local dhcp server for tftp...don't use if ris/wds and dhcp are on different servers
hey guys!
Just want to say:
THANK YOU!!!!!
wish you all the best
Edi Pfisterer/Austria
PS: for me, its working fine WITH option 60 (PXE-Client)
I had this same problem. It was fixed by removing option 60 from the DHCP server, rather than adding additional configurations.
спасибо огромное товарищи)
thank you very much, comrades)