Tuesday, October 7, 2008

RegRipper, regview, and Bluetooth Registry Settings

During DFRWS '08, I thought it might be worthwhile to have an easy way to make RegRipper plugins. In fact, I asked Harlan if he had a plugin generator; back then, he said he just had some templates that provided a basis for the plugins. Since then, I've been busy.

Recently, Harlan posted that the v0.40 update for Parse::Win32Registry came with a GTK-perl registry hive viewer. Naturally, I became very interested in modifying James' work to make the RegRipper plugin generator that had been on my mind in August.

To set up shop, I had to get Parse::Win32Registry and check out this script. Quickly, I realized that there were quite a few dependencies for Gtk2-perl. Once all those were resolved (in a clean XP VMware WS 6.5 virtual machine; it didn't seem to like my host OS instance), I launched the viewer. First, let me say that it's marvelous. James did a really slick job of programming this UI and his entire Perl registry package. Until now, I hadn't looked too closely at its internals, but it's truly great code.

Default regview.pl 0.40 opening SYSTEM hive

Once I got it running, I thought that I only had one thing to do: hack up the UI to enable plugin generation. Well, at first, it appeared that plugin generation was too complicated to use a single template (which was my original idea). I needed a good understanding of the current RR plugin features as well as the other programming libraries in use (Gtk2-perl, Parse::Win32Registry, Perl). I started writing a small document in which I articulated the processes of some RR plugins. A lot of the plugins that Harlan wrote were pretty unique, but all followed similar strategies: from a starting registry key, traverse some subkeys, maybe select a few of those subkeys, and find some (or maybe all) values of the subkeys.

So, I thought, why not look at a RR plugin as something that traverses this registry tree with a filter? So, I wrote a special RR plugin that accepts a starting point, depth, a key filter per depth and a value filter per depth (plus one). This special RR plugin was further modularized by extracting plugin specific information to Harlan's %config hash in each RR plugin.
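To illustrate the traversal-with-filters idea, here is a rough Python sketch of the concept. This is purely illustrative: the real plugins are Perl and walk Parse::Win32Registry key objects, and the names and toy data below are made up.

```python
import re

# Toy model of a registry subtree: {"keys": {...}, "values": {...}}.
# The real plugin walks Parse::Win32Registry key objects instead.
def rip(key, key_filters, value_filters, depth=0, path=""):
    """Collect (path, value_name, data) tuples. key_filters[d] selects which
    subkeys to descend into at depth d; value_filters[d] selects which values
    to report from keys at depth d (hence one more value filter than depth)."""
    found = []
    vf = value_filters[depth] if depth < len(value_filters) else None
    if vf is not None:
        for name, data in sorted(key.get("values", {}).items()):
            if re.match(vf, name):
                found.append((path, name, data))
    kf = key_filters[depth] if depth < len(key_filters) else None
    if kf is not None:
        for name, sub in sorted(key.get("keys", {}).items()):
            if re.match(kf, name):
                found.extend(rip(sub, key_filters, value_filters,
                                 depth + 1, path + "\\" + name))
    return found

# Hypothetical Widcomm-style subtree and filters: take subkeys starting with
# Devices or LinkKeys, traverse everything below them, report word-like values.
widcomm = {"keys": {
    "Devices": {"keys": {"00112233": {"values": {"Name": "headset"}}}},
    "LinkKeys": {"values": {"AA11": "key-data"}},
    "Unrelated": {"values": {"Skip": "me"}},
}}
hits = rip(widcomm, [r"(Devices|LinkKeys)", r"\w+"], [None, r"\w+", r"\w+"])
```

Because re.match anchors at the start of the string, a filter like `(Devices|LinkKeys)` naturally means "starts with," which is exactly the behavior described above.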

So, now I've got a bare-bones template plugin. From this, I hacked up James' UI (forgive me; at some point in the next few weeks I want to remove my code from regview and make my own module). I've added the ability to create a new RR plugin based on a selected key or value. Below is a screenshot of me creating the Widcomm Bluetooth plugin.




Hacked Up regview UI

OK. So, in this screenshot, I've selected a key (Widcomm) and then, from the Generate menu, clicked "from selected key." Next, the Plugin Detail Specification dialog appears. Here, I select an output folder, depth, name and description. Once depth is chosen, we can click on the Modify Filters button to bring up the Plugin Filter Specification dialog. This dialog has a dynamic number of input boxes, depending on the user-specified depth.

The filter dialog is where the magic happens. From these boxes, a set of Perl regular expressions is built into the new plugin. In this example, I selected the key Widcomm, and would like to get subkeys of Widcomm that start with Devices and LinkKeys. From there, I want to traverse all subkeys. I've specified for all values (other than those from a fourth-level subkey) to match any word characters (Perl's \w). The fourth-level subkey value must start with ServiceNameUTF8. A few OKs and:

RR Plugin Created!

What, you don't believe me? :)


btwindcomm RR plugin output

Right now, I've committed the code to my code page. A tag exists in the source tree called tuesday_morning that contains the script and template that you need to get going. There are bugs and I haven't coded proper error handling -- so you can break it easily. This Bluetooth demo is the last thing that I've tested, so it is probably broken in other areas. I'll bugcheck Tuesday afternoon and tag a new version on Wednesday.

With regard to Harlan's Bluetooth post, I still have further investigation to perform. Apparently, there are different Bluetooth stacks, which undoubtedly means different registry keys. I'll play with MS-XP and MS-Vista stuff sometime this week.

In my small investigations of RR plugins, I noticed that the applets plugin could also rip Wordpad's recently opened files, so I updated applets.pl.

That is all for now.

Wednesday, October 1, 2008

Hyper-V Server


Today, Microsoft announced the availability of the free Hyper-V Server.

I quickly downloaded the 1.09GB ISO and began installation. Installation took about 15 minutes in a VM with ~1.5GB memory.

Here are some Hyper-V Server screenshots:

Choose your language


Hyper-V Server


Initial Configuration Script


Available & Installed Packages


The majority of the installation was exactly the same as a Server 2008 install, except there is no version/type selection (std, ent, dc/full, core). The text-based initial script enables quick configuration without memorizing the netdom and netsh commands. Further, from the last screenshot, we can see the small number of packages that come with Hyper-V and that only one is installed. This is considerably different from a basic Hyper-V Server Core installation, illustrated below.


Server Core with Hyper-V

Microsoft did a good job of stripping unnecessary components as well as making an easy-to-install and easy-to-initialize Hyper-V solution. Thanks!

Tuesday, September 30, 2008

Hyper-V and recursive virtualization

After I got ESXi to run inside of Workstation 6.5, I asked myself, "Can Hyper-V run within Workstation 6.5?"

Well, the role installs, and I can create Hyper-V virtual machines, but I can't start them:




Note that I have two VMs running. The main VM in the screenshot is my Server 2008 Enterprise Full Domain Controller, while the other is the Server 2008 x64 Enterprise Core Hyper-V server. I made the same modification to the Core VMX as I did for the ESXi-3 VM from my earlier post. The Hyper-V role installed and updated fine. As you can see from the screenshot, I was able to manage Hyper-V remotely and create a VM. When I connect to the VM to start it, it displays an error message that it cannot create the partition because of an unspecified error, 0x80004005. There aren't many search results about this error with Hyper-V other than two forum posts (in one of which I posted an updated error message). There are some results about the error code.

I attempted the same thing in a Full installation, and the Hyper-V installation was prevented by ServerManager. Is it possible that ocsetup, the utility which installs Hyper-V in Core, does not properly detect VT and DEP capabilities and thus allows installation on hardware without Hyper-V support? I ran ocsetup on the Full installation, and the role installs just fine....


An error occurs upon attempting to start the VMbus:


John Howard mentions that this could happen if resources are scarce. In my instance, I confirmed that it is unsupported hardware with no driver:

Looks like Hyper-V isn't actually running. Apparently ocsetup doesn't prevent installation of Hyper-V as ServerManager does...

Graphical ImageX

While using WDS, the imagex binary is at the forefront of WIM creation. Apparently, there is a graphical front end, GImageX! This is fun and exciting. An intuitive tabbed interface immediately lets you home in on the imaging process you're about to undertake.


While I was playing with ImageX this summer at RIT, I was injecting large VMDKs into a mounted, writable WIM. When I unmounted and committed changes, it would take an incredibly long time to unmount. So much so that, at times, it appeared like the utility 'froze' because there was no significant resource usage displayed in the Windows Resource Monitor or Task Manager. So, I launched Process Explorer and filtered events related to imagex. Sure enough, there are a ton of events occurring that relate directly to imagex; therefore, it's not 'frozen.' It was definitely annoying that there is no progress bar or other good indication that the utility is still running successfully.


One of the first things I wanted to discover about GImageX was how the utility handled these unmount-commit scenarios. Well, the good news is, there was a significant amount of disk usage directly from gimagex.exe throughout the few-minute unmount. Further, there is a little cursor inside the dialog showing that the program is indeed still working. Since this is a GUI, you can also tell that the application is still responding because you can move its window.


First impressions of this utility are very high. I'm happy that I bumped into it from Ulli's latest post about the ESX Bandit.

ESX inside of Workstation




I stumbled upon a post about running ESX inside of VMware Workstation.  While I'm not sure of the practicality behind recursive virtualization, I want to see this working.  This post is a walkthrough of my ESXinsideWS process.

For background, I've been running the WS6.5 beta since the beta program started.  Recently, WS6.5 was publicly released.  So, now, I'm running the first public release of WS6.5.  Further, my computer has an MSI P6NSLI Platinum motherboard with an Intel E6550 Core 2 Duo @ 2.33GHz and 4GB of DDR2 800 G.Skill memory.

I set out to create a WS6.5 VM from just Eric's post.  This first VM is based on the Other Linux 2.6 Kernel 64-bit profile with 2 processors and an IDE disk.  I followed the recommendation from Ulli in Eric's post and added the following lines to the VMX prior to startup.  I also made sure the network adapters were presented as e1000 devices.

monitor.virtual_exec = "hardware" 
monitor_control.restrict_backdoor = "true" 

PSOD when I launch the installer:


The second VM I created was based on the Other Linux 2.6 Kernel profile with one CPU and IDE disks.  Note that this time, I chose not to use either 64-bit or two CPUs.  The first time I started this second VM, a dialog appeared stating that my CPU had been disabled!  Oh no!

Then, the following error reveals the cause.  I configured it to operate with 620MB of memory.  Apparently this is insufficient.



OK, so I increased the RAM to 1536 and the installer got farther, and displayed a new error.



After getting this error, the following questions were bouncing around in my mind:
  • Eric and Ulli talk about getting ESX running in WS, not ESXi, which is what I've been trying -- will ESXi work?
  • What types of disks did Eric and Ulli use? I thought Eric's post recommended using IDE disks.
So, I did some quick searches and found a post on petri.  That video is based on this paper.  Following the video, I recreated my VM based on RHEL4 64-bit, with one processor, a SCSI disk and 1GB of memory.  This VM gets further:




Excellent.  I'll post more information about this little recursive virtualization environment of mine :)

Saturday, September 27, 2008

Master's Thesis

Well, I'm all done with my MS in Computer Security and Information Assurance from RIT. The thesis, titled Differential Virtualization for Large-Scale System Modeling, is posted here. Some of the stuff previously posted on the blog is incorporated into the thesis including WDS/DHCP and multicasting files.

Tuesday, September 16, 2008

Multicast File Transmission in WDS

In the environment detailed in my MS thesis and a recently accepted paper to SIGITE '08, we describe an environment that uses a standard set of virtual machine templates. This set of virtual machine templates is distributed (and kept consistent) across a set of workstations. Then, users can create differential virtual machines (VMware's linked clones) based on the templates. Users store the linked clones on a file server, and can achieve virtual machine mobility between workstations. At RIT NSSA, this environment is semi-operational as I write this post.

Since RIT NSSA teaches many different operating system technologies, there are many virtual machine templates, reaching a summed size of 100GB. One of the issues that we discussed in our paper and that I present in my thesis is the notion of updating template virtual machines across all workstations. This is a difficult subject because as the number of workstations increases, the copies of this template repository increase. Right now, RIT NSSA has 80 workstations in the pilot lab -- that means, when they want to update or add a template to each machine, they have to inject the files into an image and re-deploy the OS and data on all 80 workstations. Another way they can achieve an update is through some differential robocopy script that copies the templates from the file server to the workstations -- this can be done in series or parallel (I've found that robocopies in series seem to work much better with the storage devices in that file server -- an Adaptec 2820SA with 5 SATAII drives in RAID5). However, all of these approaches are inefficient because they either copy superfluous data once, as in deploying an install image, or they copy the same data 80 times, as in differential robocopy. There has to be a better way!
Enter WDSMCAST.exe from the Server 2008 AIK. WDSMCAST enables multicast transmissions of custom data stores. So, I can create a directory and make a custom WIM with the directory's contents. Once I have a custom WIM, I can create a custom namespace on my WDS Transport Server using wdsutil /new-namespace with the /configstring parameter specified as the location of the custom WIM. Microsoft's documentation states that the custom WIM can be stored in any directory. This, however, caused a divide-by-zero in my test runs with WDSMCAST.exe:

So, I moved the custom WIM inside the RemoteInstall directory and then multicast transfer of the image works just fine.

WDSMCAST runs just fine inside Vista:



Therefore, we could create a differential version of the repository, generate a new WIM, create a multicast session based on the WIM, instruct each workstation to join the multicast session, and then have each workstation extract the contents of the WIM on top of the repository at the workstation.
While this is a nice solution, it is possible that a workstation requires twice the size of the update in free disk space. For example, if we wanted to add 20GB of templates to all workstations, the workstations need at least 40GB of free space because 20GB is required for the WIM and 20GB is required for the extracted templates. In an environment where this is realistic, it would be neat to issue these differential updates in a multicast fashion with WDSMCAST.
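Conceptually, computing the differential set of templates is straightforward; here's a rough hash-based sketch in Python. This is purely illustrative (the real mechanism is WIMs plus WDSMCAST, and the manifest paths are made up).

```python
import hashlib

def manifest_digest(data):
    """SHA-256 of a file's bytes; used as the change indicator."""
    return hashlib.sha256(data).hexdigest()

def differential(server_manifest, workstation_manifest):
    """Return the template files that must be shipped: present on the server
    but missing or changed on the workstation (deletions ignored here)."""
    return sorted(
        path for path, digest in server_manifest.items()
        if workstation_manifest.get(path) != digest
    )

# Hypothetical manifests mapping relative paths to content hashes.
server = {"xp/base.vmdk": manifest_digest(b"v2"),
          "bsd/base.vmdk": manifest_digest(b"v1")}
workstation = {"xp/base.vmdk": manifest_digest(b"v1"),
               "bsd/base.vmdk": manifest_digest(b"v1")}
to_ship = differential(server, workstation)
```

Only the changed template would be packed into the differential WIM, which keeps the multicast payload (and the 2x free-space requirement) proportional to the update rather than the whole repository.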

Microsoft says, "You can create a custom content provider for cases where the default provider is not sufficient (for example when using Transport Server to deploy an operating system from inside a .vhd image). See the Windows Server 2008 SDK for guidelines and samples for authoring and registering the provider." I'm going to investigate custom content providers for the purposes of transmitting a template repository version and talk about custom content providers in a future post.

Wednesday, July 23, 2008

Follow Up: Differential Analysis - WDS & DHCP

So I was doing some more reading about the WDS & DHCP service split Jason and I talked about in these two posts when I found a technet article that had some information in it that could have saved us some time. The section titled Known issues with configuring Windows Deployment Services says "If DHCP is installed on a server that is located in a different subnet, you will need to do one of the following ... Add DHCP options 66 and 67. Option 66 should be set to the Windows Deployment Services server, and option 67 should be set to boot\x86\wdsnbp.com."

The article also has a link here to another technet article with more detailed information about network boot programs. After doing some further reading, it turns out that the wdsnbp.com image has the following purposes:
1. Architecture detection
2. Pending computer scenarios. When the Auto-Add policy is enabled, it is sent to pending computers to pause the PXE boot and report back the client computer's architecture to the server.
3. PXE referral cases (including use of Dynamic Host Configuration Protocol (DHCP) options 66 and 67)




So I was able to set up a split WDS/DHCP environment in production; all of the packets were being passed from client to server based on my packet captures. The PCs that I am attempting to deploy to have an x64 architecture, so based on Microsoft's documentation ("In addition, x64-based computers can run x86-based or x64-based boot images. Therefore, for each of these tasks, you could have two boot images—one for x86 and one for x64. The boot menu on x86-based computers will only display x86 boot images (because x86-based computers cannot run x64 boot images).") I should be fine using an x86 boot.wim to boot.

But when I go to boot the client into the default boot.wim boot image (taken from a Server 2008 DVD), it gets the following error:
WdsClient: An error occurred while communicating with the Windows Deployment Services server. Please check to ensure that the server is operational and that the necessary ports are open on the server's firewall. Server name [name], Server IP address [ip].

By hitting Shift+F10, I get a command shell, where I confirmed that I had a valid IP address.
Then I checked the detailed log file of the boot process in: x:\Windows\Panther\Setupact.log.

The very bottom of the log file has the following error messages:
Info "InitializeLogging: RPC_S_SERVER_UNAVAILABLE - Retrying server request for initializing logging."
Error "CreateClientSession: Failed to initialize Client -> Server logging. Error code [0x800706BA].[gle=0x000006ba]"
Error "CreateClientSession: Failed to create client session. Error code [0x800706BA].[gle=0x000006ba]"
Error "CallBack_WdsClient_DetectWdsMode: Failed to create client session or initialize WDS unattend. Error [0x800706BA].[gle=0x000006ba]"


Now the weird thing is that I can boot to the capture.wim image (still x86) with no problems. So I did some more research and found out that this data was being blocked at the network...

Looking at some more documentation from Microsoft, I see that the following ports must be open for WDS to work (the error message above occurred because port 5040, which WDS needs to create an RPC connection, was blocked):
  • UDP - 67, 68, 69, 4011
  • TCP - 135, 137, 138, 139, 5040
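As a quick sanity check from a client, you can probe whether the TCP ports above are reachable; a small helper sketched in Python (TCP only -- the UDP ports have no handshake to test this way, and the function name is my own):

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.
    A refused or timed-out connection suggests a firewall or stopped service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_port_open("wds-server", 5040)` returning False from the client subnet would have pointed straight at the RPC error described above.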

After changing the firewall rules everything started working again.

Great!



I hit another snag in the deployment. Now I have an image (52GB Vista Business) which I created overnight (I estimate it took about 5 or 6 hours to capture). I saved the initial WIM file to an external hard drive due to its size; overnight, the connection to the server was lost, so the image was not moved.

No big deal; I just plugged the drive into the server, went to WDS, and imported the new image into the new Image Group that I created.

After this finished, I tried to pull this image, but when I boot the client up into the WDS PE boot environment, I do not see any images (I should see two at this point).

Back to the server, where I enabled trace logging on all the components regarding WDS; these are located in the registry under:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Tracing\

Just look for the keys that start with WDS.

These log files will turn up in %windir%\Tracing\

I found the following errors in the log:

[WdsImgSrv] Error in enumerating images. Error [13].

So I disabled all the images on the server, copied the new image from the USB drive onto the local hard drive, and then imported it into WDS under my original Image Group.

This worked: the image pushed down in an hour and a half and everything was fine.

Next up is documentation about MDT, AIK and Unattended installation of Vista and Server 2008.

Monday, July 14, 2008

Moving from VMware Workstation to ESX

At RIT NSSA, we've been using a VMware Workstation implementation, dubbed the Remote Laboratory Emulation System (RLES), for well over two years as a platform to teach NSSA applied curriculum. This project has been faculty-designed, student-built and student-maintained for these years. This past winter, with the help of Information Technology Services, our department started to construct an ESX cluster that would more effectively support the RLES concept. So far, one course, Security Audits of Web Servers and Applications, has been offered on the ESX version of RLES. Another course, Computer Viruses and Malware, is being offered right now. Luckily, the labs for both of these courses were designed with the intent to succeed in a VMware Workstation environment. This made migrating from Workstation to ESX somewhat simpler because we were able to convert or directly import some of the virtual machines. Note the qualifier some. Other courses that will be moving to a virtualization platform aren't as lucky. The following post describes, briefly, the issues we’ve experienced thus far migrating from virtualization to virtualization.

This past week, Kristian Stokes and I attempted to import the DMZ auditing lab for Security Audits of Web Servers and Applications. This lab involves four systems inside a virtual DMZ as well as the firewall/router virtual machine that routes to the DMZ. The four DMZ systems include vulnerable instances of FreeBSD 5, Fedora 6, Windows Server 2000 and Windows 2003, all running vulnerable services with horribly exposing misconfigurations. There are two major issues with making this lab succeed with the ESX setup: (1) ESX requires all virtual machines to use SCSI virtual disks; (2) we’re running Lab Manager 2.5, which only supports one network adapter. Oh, and an even bigger issue: the DMZ systems LACK SUFFICIENT DOCUMENTATION! An aside: this DMZ lab was created by a graduate student two years ago. The cost required to learn and re-implement these systems is very high. Theoretically, since this is part of an auditing course, any student who fully audited the DMZ systems should be able to recreate them (pfft, ya right).
So, Kristian is formally documenting the trials and tribulations of this migration, but below are the migration paths for the DMZ virtual machines.

  • FreeBSD
    1. the original machine had an IDE virtual drive, so we have to convert it to SCSI.
    2. we tried a straight vmdk conversion using a tutorial. This failed to create a virtual disk that was even recognizable by another FreeBSD system.
    3. we tried using clonezilla (without reading the clonezilla support docs) to duplicate the data from the IDE disk to a fresh SCSI disk. Instantly we noticed that clonezilla dropped to a normal dd operation and figured that the FreeBSD file system wasn’t supported by the clonezilla suite. Clonezilla doesn’t support UFS, which was the file system type of our virtual machine. The dd from clonezilla made a drive, and it appeared to have data… just not in the UFS slices that needed to be there.
    4. we tried using CloneHDD to duplicate from IDE to SCSI. CloneHDD is a utility for duplicating FreeBSD installations. Once we got the script to run, it would pause after copying one slice.

      This is where I went home, because we’d already spent 5 hours working on one image.
    5. Kristian spent some time manually trying to duplicate the partitions from the IDE disk to the blank SCSI disk, with some moderate success. While manually using dump on /var, he got a message stating that a filename was too long, above the 1044-character limit. He then found a directory with many subdirectories, all with really long names like XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX etc. Assuming this was the problem, he deleted the directory structure.
    6. Kristian used CloneHDD to successfully copy the slices. The CloneHDD script fixed /etc/fstab on the SCSI disk to mount da0 (SCSI0:0) rather than ad0 (IDE0:0). He removed the IDE reference in the .vmx and booted the virtual machine with VMware Workstation on his desktop.
    7. Kristian imported this virtual machine into Lab Manager.
    8. When trying to deploy the virtual machine, it boots to bootrom> and the keyboard doesn’t work.

  • Fedora – this machine also has an IDE drive.
    1. we tried to use clonezilla to convert the IDE drive to SCSI; however, it failed because of RedHat’s default use of LVM.
    2. we’re still waiting to work on this one more; a straight dd should work with modifications to the grub boot configuration file. (we’ll update this post when we finish this migration)

  • Windows Server 2000
    1. This machine had a SCSI disk, but when we attempted to import it into Lab Manager, LM spit back this:

      • Error executing lm-vmkimport: Failed to open '/pathto/labmanager/mnt/mysmbshare:Classes_1349690868/path/to/vmdk/Windows 2000 Server-000002.vmdk': The parent of this virtual disk could not be opened (23). . The originating server for this exception is: esxnode1.local

    2. We speculate that a piece of the vmdk (the "parent" disk) isn't in /path/to/vmdk/
    3. This machine is still waiting to be imported

  • Windows 2003
    1. This machine had a SCSI disk and properly imported without fuss! Yay!

  • Router
    1. this machine had a SCSI disk but two NICs (because it's simulating a corporate firewall/router). This machine is still waiting to be imported – with any luck, the upgrade to LM3.0 will allow us to have multiple network adapters.



Another student's progress on converting Computer Viruses and Malware this summer is going well, I think. He had some hiccups trying to import the old virtual machines from the lab, so he just created new ones. The virtual machines for these labs involve some XP instances with Sysinternals TCPView, FileMon, RegMon, etc., IDA, bagle, sasser, and some trivial SANS-ish malware examples, as well as a honeyd/snort box. The re-implementation of these virtual machines in the ESX environment is much more manageable than the DMZ lab. When this student coughs up some documentation, I’ll post it here.


Anyways, as you can read, there is a lot going on with simply moving from Workstation to ESX. Beyond all the nitty-gritty technical work, we’re also looking at the bigger picture -- like how all of these changes affect student productivity, curricular benefits, etc. Expect some more information regarding our setup rather soon.

Thursday, July 10, 2008

Identification Woes

Last year, Tom and I worked on a project called Obfuscator for our forensics course. The project was to demonstrate to our class that changing file signatures was as easy as changing file extensions, and therefore the thoroughness of file signature analysis tools is questionable. When Harlan blogged that anti-forensics "techniques don't defeat tools...they defeat examiners," I quickly replied, alluding to our (Tom's and my) conclusion about fully understanding the capabilities of our forensic tools and how file identification (just like people authentication) is HARD.


A while back, Harlan asked a question about a script from Didier Stevens that embeds an executable inside a VBScript.

"What would you look for if you were analyzing a system and trying to determine if something like this had been used?"

Well, no one posted a reply to you, Harlan... and after thinking about this question since July 2nd, my response is still: I don't know.

Static file analysis could search for binary execution methods ... like Run for wscript ... but that would be impractical, I think. As with identifying a file, identifying a malicious script isn't as easy as it looks.

So rather than really answering Harlan's question, I'll ask one:
Is writing and executing an executable a common scenario in scripting?

Registry Analysis #1

Summary:
  1. HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PagingFiles
  2. I need to read Hobocopy documentation to make it work in Vista x64
  3. If you can't copy a file from a mounted VMDK, try mounting and copying as administrator
  4. Yay -- now I can play with RR

After Harlan posted about an interesting registry entry this morning, I thought of the systeminfo utility. I thought, "I wonder if the systeminfo tool queries the registry for similar information?" So I fired up Process Monitor, set filters for the Registry event class and executed systeminfo. Once systeminfo finished, I stopped the capture and searched for systeminfo within Process Monitor. It appears that the systeminfo binary directly queries some registry values and also utilizes WMI. Cool. So I posted a reply to Harlan saying there's some interesting material there. But I didn't say which entries seemed interesting. Harlan asked me what was interesting, so I went back to look. I found TimeZoneInformation and some network adapter information (both of which were covered by RR plugins). So I tried to find something that wasn't in RR, and I think I did:

Paging File Location
(HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PagingFiles)
Information: here ... this value is a REG_MULTI_SZ in Vista rather than the REG_BINARY listed in the MS article.
Significance to RA: If the page file is not in the default location \pagefile.sys, you'll want to know where it is.
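For reference, REG_MULTI_SZ data is a UTF-16LE buffer of NUL-separated strings terminated by an empty string. A quick Python sketch of decoding one (the sample data below is made up, not pulled from a real hive):

```python
def parse_multi_sz(raw):
    """Decode a raw REG_MULTI_SZ buffer: UTF-16LE strings separated by NULs
    and terminated by an extra NUL (i.e., an empty final string)."""
    return [s for s in raw.decode("utf-16-le").split("\x00") if s]

# Made-up PagingFiles-style data with a single entry.
sample = "C:\\pagefile.sys 2048 4096\x00\x00".encode("utf-16-le")
entries = parse_multi_sz(sample)
```

A system with page files on multiple volumes would simply have multiple entries in the list, which is exactly what makes this value interesting for registry analysis.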

The systeminfo utility also reports tidbits like patch levels; however, I'm not sure (yet) if this is listed in the registry.







The remainder of this post shows some of my experiences in registry hive acquisition (summary items 2-4)

Until now, I did not think that I had an easy way to get access to registry hives. Before tonight, I tried mounting a vmdk on my desktop with VMware Workstation's drive mapping feature so that I could simply copy the hives, but that failed:




I figured the file was just locked or something -- and a while ago, I stumbled upon a post that mentioned using VSS to copy a file that is in use, but I shrugged it off because they didn't have Vista binaries. Well, I just searched and there's an open source project! It's called HoboCopy (insert chuckle about my scripting issue).

So, I downloaded hobocopy for vista x64 and executed it:


I missed the Visual Studio 2008 library dependencies (vcredist_x64.exe)... Once those were installed, hobocopy still wouldn't run under my Jason user... so I opened an elevated shell and hobocopy ran fine.

When I tried to change to the virtual drive of the VMDK within my elevated shell, the shell explained:

The system cannot find the drive specified.

PowerShell also explained:


I confirmed that the drive was still accessible in my Jason shell (the PowerShell window in the background). It appears I have a user-specific drive letter? Weird...

I really wanted to get hobocopy to copy the SYSTEM hive, so I went to mount the vmdk with vmware-mount in the elevated shell, but it didn't exist in my Workstation folder! I'm assuming this is because I'm using VMware WS 6.5b2.

So I elevated VMware Workstation, mounted the vmdk and tried to copy the file with hobocopy -- it failed. But I was able to copy the hive file just fine with copy!


Monday, June 30, 2008

Scripting Mistake

This evening, a scripting mistake led me to learn about a feature in Sysinternals' Process Explorer that I had not previously known about. I subscribe to Mark's blog (listed in the System Administration blogs) and he writes a series titled "The Case of..." where he details his troubleshooting process for a specific issue. In his post, The Case of the System Process CPU Spikes, he demonstrated pretty cool uses of Process Explorer -- since I read that post in April, I've been running Process Explorer in my tray just like he described. Occasionally I find uses for it, like closing stale processes -- but I've never REALLY used it until tonight.

Earlier, I wanted to utilize robocopy to synchronize my local caches of the Sysinternals tools from file://live.sysinternals.com/. I maintain two copies: one on my OS drive and one on my thumb drive. My purpose for this is that I want these tools immediately available when I want to run them -- and while the live share is nice, it takes a few seconds to load.

So I wrote a simple batch script and made a rudimentary scripting mistake: I titled the batch script robocopy.bat and attempted to call the robocopy executable without specifying its full path. Here is my original script:



So, this mistake led to some Windows Command Processor crashes:





Interesting, right? The infinite loop caused CMD.exe to crash (I guess it's time to install PowerShell and actually read those PowerShell books on my bookshelf). When CMD crashed, I went to edit the batch script but wasn't able to save modifications:


OK. I figured that cmd.exe still held an open handle to the file -- and I remembered that OpenedFilesView wouldn't work on Vista x64, I didn't feel like downloading another utility, and I certainly didn't feel like logging out.

Somehow, I ended up searching for the file handle with Process Explorer's handle or DLL search feature.




After closing the handle, I was able to save modifications to the batch script.

Finally, I fixed my script to call the robocopy executable by its absolute path:

C:\Windows\System32\Robocopy.exe
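The same trap can be reproduced on any system: a wrapper script that shares its target's name and invokes it by bare name resolves to itself. A minimal bash sketch of the recursion (the tool name `mytool` is hypothetical, and a depth guard stands in for the actual crash):

```shell
#!/usr/bin/env bash
# Recreate the robocopy.bat mistake with a hypothetical tool name.
# The wrapper's directory is placed ahead of everything else in PATH,
# mirroring cmd.exe finding robocopy.bat in the current directory
# before C:\Windows\System32\Robocopy.exe.
tmp=$(mktemp -d)
cat > "$tmp/mytool" <<'EOF'
#!/bin/sh
depth=${DEPTH:-0}
if [ "$depth" -ge 3 ]; then
    # A depth guard instead of the infinite loop that crashed CMD.exe.
    echo "recursion depth $depth"
    exit 0
fi
# Bare-name lookup finds this very script again -- the bug.
DEPTH=$((depth + 1)) mytool "$@"
EOF
chmod +x "$tmp/mytool"
out=$(PATH="$tmp:$PATH" mytool)
echo "$out"
```

Calling the target by its absolute path, as in the fixed script above, sidesteps name lookup entirely.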


I'm glad I learned about the file handle search feature in Process Explorer -- this utility is incredible. Thanks Sysinternals.

Friday, June 27, 2008

Revisit: Differential Analysis - WDS & DHCP

Tom and I wanted to make a couple of statements and clarifications about our post from earlier this week. First, we got the environment to function. Read on to find out how.

Purpose
The reason we undertook this project was twofold: primarily to see if it would work, but also because we believe an established DHCP infrastructure should welcome a separate WDS server -- WDS should be able to function within the same domain but on a different server than the DHCP service. It should be as simple as specifying which DHCP services work with WDS.


Semi-Functional?
This morning, we were able to get a deployment of Vista running beyond the standstill we hit a few days ago. The procedures we described in our previous post were wrong. We had defined DHCP option 67 as boot\x86\pxeboot.com because that's what was on the original WDS+DHCP server.

Wrong DHCP Option 67

This caused the client to load and execute the files in the following order:

Capture of failing boot session

So, after some more searches on Google, we came across this article. Apparently, we had skipped steps 1 and 2 of the Deployment Process (load WDSNBP, validate the DHCP packet, and download pxeboot.com). The change we made today (changing DHCP option 67 from boot\x86\pxeboot.com to boot\x86\wdsnbp.com) caused the PXEClient to load and execute wdsnbp.com prior to loading and executing pxeboot.com. In fact, we're pretty sure that WDSNBP sends the DHCP request that causes the server to send the ACK with boot\x86\pxeboot.com as DHCP option 67.


Capture of working boot session.

So here are the DHCP settings to define in order to run WDS with a pre-existing DHCP infrastructure.

DHCP Settings to deploy x86 architecture:
  • Predefined Option 43 - 010400000000FF
  • Custom-made Option 60 - String - PXEClient
  • Predefined Option 66 - IP or Hostname of the WDS Server (in our case 10.150.150.1)
  • Predefined Option 67 - boot\x86\wdsnbp.com

Limitations
We're pretty sure that we're losing functionality by doing this. Our setup launches TFTP right after the first DHCP transaction, skipping a request/ACK (this is what DHCP option 43 accomplishes). It appears that the DHCP request/ACK we're skipping might be the packet that tells the WDS server which architecture the client is running (x86, x64, ia64).

Capture of original WDS+DHCP transactions



Knowledge
We don't know much about WDS and how its innards function -- we're learning while experimenting. This, most definitely, leads to some dead ends with respect to progress. Further, when trying to learn about wdsnbp.com, we found Network Bootstrap Program (NBP). However, there isn't a lot of information on exactly what it accomplishes. We assume it's some enhanced PXE kernel.

Attention to detail
Because we're barreling through this process, we missed some obvious signs that the change in the filename was the solution. First, the loading dialog on the workstation, after it receives an address, states it's executing WDSNBP.

Conversely, we assumed that the boot file name (DHCP Option 67) meant something in the first DHCP transaction when WDS and DHCP existed on the same server; in fact, this value means nothing in the first DHCP transaction in that configuration. When we added DHCP Option 43 (the PXE sub-option for mtftp), we instructed our client to immediately download and execute whichever file we specified in DHCP option 67. Apparently this needs to be the architecture-specific wdsnbp.com.


Quote from the PXE Specification
"Redirection by the Boot Service to a TFTP service on a remote server should not be done as it is not reasonably possible for the redirecting server to know for certain that the TFTP server being redirected to is truly available."

Quote from MS about DHCP & WDS
"Microsoft does not support the use of these options on a DHCP server to redirect PXE clients."

While the PXE developers might not recommend it and Microsoft says they don't support it, we accomplished it (although it's architecture-specific). Yay.

Thursday, June 26, 2008

Thinclient and Network Booting

A few months ago, an acquaintance pitched an idea about an authenticated network boot environment and pointed me to emBoot. I downloaded their trial winBoot software and never used it. I revisited the site to read about recent updates and I stumbled upon two interesting utilities that I missed a few months ago: SimplyRDP and sanFly.

SimplyRDP - this utility uses PXE to boot into a small OS that just runs an RDP/TS client.

netBoot/winBoot - these utilities enable PXE clients to boot from an iSCSI target (netBoot works for Windows 2000, XP and Linux; winBoot works with Windows Vista, 2003 and 2008).

sanFly - enables the creation and management of iSCSI targets on Windows XP, Vista, 2003 and 2008. sanFly is available for download at no cost, but additional functionality can be unlocked by purchasing a license key (emBoot).

I'll report back when I've had a chance to play with the utilities or expand the idea of secure network booting.

Wednesday, June 25, 2008

Differential Analysis - WDS & DHCP Separation

This post outlines the issues and resolution that Tom and I uncovered while removing DHCP from a Windows Deployment Services (WDS) system and moving it to a separate system. The post is rather lengthy, so if you're seeking a solution to this problem, we haven't found one. There is a bulleted list of our take-aways and thoughts so far at the end of this post. The title of this post includes differential analysis because Tom and I compared the functional states of two environments with the non-functional state of our broken system to try to determine a solution.

A month ago, Ron and Tom set up an Active Directory domain to demonstrate the capabilities of WDS. A few weeks ago, Kristian and I added Server 2008 clustering capabilities to the AD environment. We'll elaborate on this environment in a future post.

Yesterday, Tom and I wanted to move the DHCP service from the WDS server to the cluster so as to provide highly available DHCP. So we had two servers: one running WDS + DHCP (hereafter referred to as the WDS Server) and another running DHCP (hereafter referred to as the DHCP Server). The goal was to split DHCP and WDS, so we copied the DHCP options from the WDS Server pictured below to the fresh new DHCP Server.

Working Options from WDS Server



Options on the DHCP Server

We rebooted a workstation whose operating system had been deployed from our WDS Server prior to our WDS & DHCP split. The workstation churned along at the PXE screen and then displayed the following PXE error message:

PXE-E55 Proxy DHCP Service did not reply to request on port 4011


Uh oh. We called it a day.


Today, Tom and I revisited the problem by attaching some hubs to our imaging infrastructure and playing the packet capture game. The WDS server is 10.150.150.1 and the DHCP server is 10.150.150.23 -- the following DHCP scope options were configured when the issue was occurring.

Capture of the problem

The packet capture above shows the problem. The workstation going through the PXE process grabs an IP from the DHCP server and then sends a DHCP discover to port 4011 of the DHCP server. (Note that the error we receive on the workstation mentions port 4011.) Then, the DHCP server replies with an ICMP port unreachable message -- an active rejection of the packet.

So, when we noticed this, we knew the problem was going to be getting the workstation to send that second DHCP discover to the WDS server on port 4011 rather than back to the DHCP server. We captured the traffic for a working DHCP + WDS transaction, thinking we could compare the working setup with our target setup.


Capture of the working DHCP+WDS transaction

So, we tried mucking with some settings on both the DHCP and WDS servers based on the differences in the DHCP ACKs between the working capture (packet #31, capture of the working DHCP+WDS transaction) and the non-working capture (packet #23, capture of the problem) -- but no combination of configuration changes led to a different error or a success. Some of the settings we messed with include DHCP Option 54 Server Identifier, Do not listen on port 67, and changing DHCP Option 66 to a non-existent IP address in the working environment to see if the change would break the system.

So we started searching Google some more and came across this Microsoft page. Microsoft tells us, "Important: Microsoft does not support the use of these options on a DHCP server to redirect PXE clients." Well, thanks, but no thanks.

Then we remembered that we have a working pxelinux environment. The pxelinux configuration files are served up by Microsoft's TFTPD and DHCP is offered by Microsoft's DHCP 2003 service. Further, the DHCP and TFTP servers are separate! (Oh, and IT WORKS.)

We decided to setup another capture session, this time monitoring our working pxelinux environment.


Capture of the working pxelinux DHCP+TFTP transaction

Then, Tom expanded the DHCP ACK and noticed DHCP option 43 was used!

DHCP option 43


So, Tom updated the DHCP server settings in our WDS environment accordingly.

Updated DHCP options (working!)

And, voila! The workstation in the WDS environment now directs TFTP GETs to the WDS server right after the DHCP transaction. Cool.


Capture of working target setup

So, from our experiment and our working pxelinux environment, it appears that when DHCP option 43 is present with a value of 010400000000FF, a PXEClient immediately sends a TFTP GET to the server named in DHCP option 66 for the file listed in DHCP option 67.

We wanted to make sure, so we changed DHCP option 66 to a non-existent IP address, and the workstation failed with the message: PXE-E11 ARP Timeout. A capture of this event showed that the workstation received an address and then sent ARP requests for the non-existent IP address. This led us to further believe our claim about DHCP option 43.

Capture of ARP Timeout

Re-inspection of the expanded DHCP option 43 in Wireshark shows the sub-option PXE mtftp IP with no value. We're somewhat confused about what this sub-option means, although we've already hypothesized and proven what it accomplishes in the PXE environment. A simple Google search for "PXE Specification" finds a document that might explain what this stuff means.
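As a sanity check on the Wireshark dissection, the 010400000000FF blob can be decoded by hand: DHCP option 43 carries PXE sub-options in TLV form (one byte of sub-option code, one byte of length, then the value bytes), terminated by 0xFF. A quick bash sketch:

```shell
# Decode the DHCP option 43 value used above: 010400000000FF.
# Layout: 1-byte sub-option code, 1-byte length, value bytes, 0xFF terminator.
blob="010400000000FF"
code=$((16#${blob:0:2}))            # 1 -> the "PXE mtftp IP" sub-option
len=$((16#${blob:2:2}))             # 4 -> four value bytes follow
value=${blob:4:$((len * 2))}        # 00000000
term=${blob:$((4 + len * 2)):2}     # FF -> end of the vendor options

# Render the value bytes as a dotted quad -- 0.0.0.0, i.e. no mtftp IP,
# matching the "no value" that Wireshark shows for the sub-option.
ip=""
for i in 0 2 4 6; do ip="$ip$((16#${value:$i:2}))."; done
ip=${ip%.}
echo "sub-option=$code length=$len mtftp-ip=$ip terminator=$term"
```

This lines up with the observed behavior: the blob names only the mtftp sub-option (empty) and then ends, so the client falls straight through to a plain TFTP GET.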

So, we tried to actually boot into PE 2, but it failed with the message:

WDSClient: There is a problem initializing WDS mode

Suck. The clear difference between our target environment and the working WDS environment is that a second DHCP request/ACK doesn't occur. The ACK in that exchange contains DHCP option 252, Proxy Autodiscovery. A few more captures of the working WDS environment proved that this value changes per DORA/RA scenario.


DHCP Option 252

It looks like we'll have to do some more digging into how WDS dynamically creates BCD files, etc. Expect another post regarding our end environment in the future.

Remaining thoughts:
  • Do we lose any functionality by removing DHCP from the WDS server and implementing it elsewhere?
    • Are there automatic changes to Option 67 by the WDS server?
    • Are there other lost functions we don't know about or can't think of now?
      • Probably
  • The target ending architecture includes WDS outside of the high availability cluster.
    • Can we distribute WDS across the cluster nodes, and use network load balancing to make TFTP via WDS highly available in a similar sense as clustered high availability?
      • We shall see...
      • Could this solve our dynamic BCD creation issues?
Lessons of the day:
  • Differential analysis -- the comparison of system states -- is a strong and effective way to solve problems. Not only can it be used in cryptanalysis and other math-oriented problem solving, it can be used in system administration. Thankfully, RIT's ANSA degree program taught us how to read packet captures.
  • Sitebooks are great!
    • We had documentation about this DHCP option 43 for our pxelinux environment, but we didn't look at it. In the old documentation, we should have sought to understand what the option accomplished for our pxelinux environment.
    • This post is a sitebook!
Procedures:
  • To detach DHCP from your WDS server, you need the following DHCP options defined in the new DHCP service:
    • Predefined Option 43 - 010400000000FF
    • Custom-made Option 60 - String - PXEClient
    • Predefined Option 66 - IP or Hostname of the WDS Server
    • Predefined Option 67 - filename in WDS for the client architecture (in our case it was boot\x86\pxeboot.com)