Working with Domain Member Virtual Machines and Snapshots
One of the benefits of using a virtualization product that
allows you to create snapshots is the ability to create a "point in
time" to which you can always revert your virtual machines. By reverting
to this snapshot, you return your VM to the state in which it was saved, and are
able to perform various tasks such as testing software, doing QA, creating labs
and so on.
However, one of the nasty issues of working with snapshots
appears when one or more of your virtual machines are members of an Active
Directory domain. When you create snapshots of such machines and restore them,
you might occasionally find that all authentication involving the VM seems to
fail. You may be unable to log on to the virtual machines, or
unable to access files and shares across the network. You might even
get errors like this one:
Windows cannot connect to the domain, either because the
domain controller is down or otherwise unavailable, or because your computer
account was not found. Please try again later. If this message continues to
appear, contact your system administrator for assistance.
If you log on locally (not using a domain account) to
the computer (in this example it's a Windows XP Pro client), you'll see the
following events in the Event Viewer.
NETLOGON 3210
This computer could not authenticate with
\\WIN2003-SRV1.petrilabs.local, a Windows domain controller for domain
PETRILABS, and therefore this computer might deny logon requests. This
inability to authenticate might be caused by another computer on the same
network using the same name or the password for this computer account is not
recognized. If this message appears again, contact your system administrator.
LSASRV 40961
The Security System could not establish a secured connection
with the server cifs/WIN2003-SRV1.petrilabs.local. No authentication
protocol was available.
W32Time 18
The time provider NtpClient failed to establish a trust
relationship between this computer and the petrilabs.local domain in order to
securely synchronize time. NtpClient will try again in 15 minutes. The error
was: The trust relationship between this workstation and the primary domain
failed. (0x800706FD)
And possibly others.
This is nasty. However, if you remember the days
when Ghost was the only way to image a computer, you might also remember that
it was always good practice not to "ghost" a machine that was a
member of a domain. If you ignored that advice, you ended up with a
computer that was "ghosted" back from an image and that, in some
cases, could not log on to the domain it was a member of. So this is not a new
situation; only the "ghosting" tools we're using are new.
The reason for this is that there is a computer account
password mismatch. The Windows-based domain member VM thinks that its machine
account password is something X, while the domain controller believes it to be
something Y. Because of this, the VM cannot authenticate itself to the domain
controller(s).
So how does this work? Just like a user account password, the
computer account password is a "secret" that is set up for the
computer account, and that is used when a Windows-based domain member computer
authenticates itself to the domain controller and establishes a secure channel.
When the computer is started, a service called NetLogon uses
the machine account password and tries to establish a secure session with the
domain controller. The usual CTRL+ALT+DEL Winlogon process also relies on this
authenticated secure channel to send user credentials to the domain controller
for verification and log them into the computer. Other services running on this
machine that work with the LocalSystem or NetworkService credentials also
require this authenticated secure channel to get access to domain resources.
So without the correct password there cannot be a secure
channel, and everything that depends on that channel fails. That is the cause
of the issues described above.
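To make the mismatch concrete, here is a minimal sketch of a shared-secret
challenge/response check. It is not the actual Netlogon protocol; the function
names and the use of HMAC-SHA256 are assumptions chosen only to show why a
member holding password X cannot prove itself to a domain controller that
holds password Y.

```python
import hashlib
import hmac
import os


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Prove knowledge of the shared secret without sending the secret itself."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def dc_accepts(dc_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """The DC recomputes the response with its own copy of the secret."""
    expected = respond(dc_secret, challenge)
    return hmac.compare_digest(expected, response)


# Hypothetical random challenge issued by the domain controller.
challenge = os.urandom(16)

# Both sides hold the same machine account password: the check passes.
in_sync = dc_accepts(b"password-Y", challenge, respond(b"password-Y", challenge))

# The VM still holds X while the DC holds Y: the check fails.
mismatch = dc_accepts(b"password-Y", challenge, respond(b"password-X", challenge))

print(in_sync, mismatch)  # prints: True False
```

Whatever the real protocol details are, the principle is the same: nothing
that depends on the secure channel can work until both sides agree on the
secret again.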
The password is first created when the computer is joined to
a domain. It is shared by the domain controller and the computer.
So what happens during regular operations? Well, to explain
this, we need to think of three scenarios:
1. Regular operation. The client computer works
"regularly" and is never offline for extended periods of time. Each
Windows-based computer maintains a machine account password history containing
the current and previous passwords used for the account. The
computer account password change is initiated by the Netlogon service on the
client computer, every 30 days by default. Since Windows 2000, all versions of
Windows have used the same value. After this change, both the domain controller and
the computer use the new password for authentication.
When a client determines that the machine account password
needs to be changed, it tries to contact a domain controller for the domain
of which it is a member and change the password on the domain controller. Only if
this operation succeeds does it update the machine account password locally.
When two computers attempt to authenticate with each other
and one of them has not yet received a change to the current password, Windows
relies on the previous password. If the sequence of password changes exceeds two
changes, the computers involved may be unable to communicate, and you may
receive error messages.
2. Not-so-regular operation, but still possible. A
client computer is taken offline for an extended period of time: 30, 60, 90
days or more. In this scenario, even if a computer is turned off for three months,
nothing expires. When the computer starts up, it will notice that its password
is older than 30 days, and the Netlogon service on the client computer will
initiate a change. Because the client itself initiates every change, being
turned off for a long time does not by itself cause a mismatch.
3. Snapshots. Either a "Ghost"-type image or
(related to this article) a VM snapshot is taken, and then the computer resumes
regular operation (as in scenario #1). Then suddenly, after working for 30, 60,
90 days or more, the snapshot is brought back. When the
domain member is restored to an older snapshot, it loses track of any password
changes made after the snapshot was taken and tries to use an older password.
Hence it fails to authenticate itself.
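The three scenarios above can be walked through with a small toy model. This
is a sketch under stated assumptions, not a description of how Windows stores
passwords: the Secrets class, the helper names and the rule that the domain
controller accepts either its current or its previous password are all
hypothetical simplifications, and the exact tolerance of a real domain may
differ.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Secrets:
    """Current and previous machine account password held by one side."""
    current: str
    previous: Optional[str] = None

    def rotate(self, new_password: str) -> None:
        self.previous = self.current
        self.current = new_password


def change_password(member: Secrets, dc: Secrets, new_password: str) -> None:
    """Scheduled change: update the DC first, then the member's local copy."""
    dc.rotate(new_password)
    member.rotate(new_password)


def can_authenticate(member: Secrets, dc: Secrets) -> bool:
    """Assumed rule: the DC accepts its current or its previous password."""
    return member.current in (dc.current, dc.previous)


dc = Secrets("P0")
vm = Secrets("P0")
snapshot = copy.deepcopy(vm)        # snapshot taken while both sides agree

# Scenario 2: the machine is simply switched off. Nothing changes on either
# side, so it can still authenticate when it comes back.
offline_ok = can_authenticate(vm, dc)

# Scenario 1: the running VM changes its password on schedule.
change_password(vm, dc, "P1")
revert_after_one = can_authenticate(copy.deepcopy(snapshot), dc)

change_password(vm, dc, "P2")
running_ok = can_authenticate(vm, dc)

# Scenario 3: the old snapshot is restored after two changes.
revert_after_two = can_authenticate(copy.deepcopy(snapshot), dc)

print(offline_ok, running_ok, revert_after_one, revert_after_two)
# prints: True True True False
```

In this model the restored snapshot survives a single password change because
the previous password is still remembered, and breaks once the domain
controller has moved two changes ahead of it.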
So how do you fix this? Well, first of all, if you've
already gotten to the point where the error occurred and you cannot log in, you
will need to read my Fixing "Windows cannot connect to the domain, either
because the domain controller is down or otherwise unavailable, or because your
computer account was not found" Errors article for a solution.
However, if you wish to prevent this from happening AND
you're using virtualization software and snapshots, you may want to do one of
the following:
Option #1
Increase the computer account password age, or disable
password changes altogether. Both of these can reduce the likelihood of the problem,
but may reduce the level of security in the domain. On the other hand, since
this is probably a test, a QA or a demo environment, you may consider it a
valid option. These settings are available on the domain member (and not on
the domain controller), and as such, you can change them on your computer
before you create a snapshot of it.
Warning!
This document contains instructions for editing the
registry. If you make any error while editing the registry, you can potentially
cause Windows to fail or be unable to boot, requiring you to reinstall Windows.
Edit the registry at your own risk. Always back up the registry before making any
changes. If you do not feel comfortable editing the registry, do not attempt
these instructions. Instead, seek the help of a trained computer specialist.
As noted above, these settings are configured on the domain
member, and are controlled by the Netlogon service. Settings are found in the
following Registry key:
HKLM\SYSTEM\CurrentControlSet\Services\NetLogon\Parameters
DisablePasswordChange
(default off) prevents the client computer from changing its computer account
password. To disable password changes, give it a value of 1.
MaximumPasswordAge
(default 30 days) determines when the computer password needs to be changed.
Change it to whatever number of days you think may be enough. For example, if
you use snapshots that are less than 100 days old, then you can set this value
to 100 or similar.
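If you prepare many lab machines, you may prefer to generate a .reg file
rather than edit the values by hand. The following is a minimal sketch: the
function name build_reg_file and its parameters are hypothetical, both values
are assumed to be REG_DWORD, and the output should be reviewed before you
import it on the domain member.

```python
NETLOGON_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NetLogon\Parameters"
)


def build_reg_file(max_password_age_days=None, disable_password_change=False):
    """Return the text of a .reg file for the Netlogon settings above."""
    if max_password_age_days is not None and max_password_age_days < 1:
        raise ValueError("max_password_age_days must be at least 1")

    lines = ["Windows Registry Editor Version 5.00", "", f"[{NETLOGON_KEY}]"]
    if disable_password_change:
        # 1 = the member never initiates a password change
        lines.append('"DisablePasswordChange"=dword:00000001')
    if max_password_age_days is not None:
        # .reg files store DWORD values as eight hexadecimal digits
        lines.append(f'"MaximumPasswordAge"=dword:{max_password_age_days:08x}')
    return "\r\n".join(lines) + "\r\n"


# Example: snapshots are never older than 100 days.
reg_text = build_reg_file(max_password_age_days=100)
print(reg_text)
```

Save the printed text to a file with a .reg extension and import it on the
client before taking the snapshot. The registry warning above applies to this
method as well.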
Settings can also be configured by using Group Policy
(either domain-based GPO or local):
Computer Configuration\Windows Settings\Security Settings\Local Policies\Security Options
Domain member: Disable machine account password changes
Domain member: Maximum machine account password age
After making the changes, reboot the client computer(s), and
then create a snapshot, if you need one.
Option #2
Live with it, know it's an issue, and fix it every time. It's
time-consuming, sure, but it's probably more secure than option #1. Read my
Fixing "Windows cannot connect to the domain" Errors article.
Option #3
If these VMs are used for testing, QA, demos etc., you could
consider creating a "closed" environment, where not only the client
computer has a snapshot, but the domain controller(s) have one as well. When you
revert to a snapshot, you also revert to the same snapshot level on the DCs, all
of them, at the same time. For some environments this may actually be a nice
setup. However, if you cannot create such a setup, then you'll probably have to
go with either option #1 or #2.