Exchange
DAG Node Failure – Force Switchover With Queues
I had this issue with a client last week. The system was Exchange 2010 with a
two-node Database Availability Group (DAG). One of the Exchange nodes had gone
offline, and this would be permanent as the failure was catastrophic. I checked
whether the second node had kicked into action, but it had not. The mailbox
database was down, and when I checked the replication status of the database
copy on the second node, the copy queue length was 9223372036854775766.
Because of this, when I tried to force a failover I was greeted with the
following error.
An Active Manager
operation failed. Error The database action failed. Error: An error occurred
while trying to validate the specified database copy for possible activation.
Error: Database copy ‘Database1’ on server ‘dagnode2.domain.com’ has a copy
queue length of 9223372036854725486 logs, which is too high to enable automatic
recovery. You can use the Move-ActiveMailboxDatabase cmdlet with the
-SkipLagChecks and -MountDialOverride parameters to move the database with
loss. If the database isn’t mounted after successfully running
Move-ActiveMailboxDatabase, use the Mount-Database cmdlet to mount the
database.
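For reference, the copy status described above can be pulled from the Exchange Management Shell along these lines. This is only a minimal sketch, assuming the database and server names from the error message (Database1 and dagnode2); swap in your own. The last line is plain arithmetic showing that the queue length I saw sits only 41 below the largest signed 64-bit integer, which suggests a placeholder value rather than a real backlog of logs.

```powershell
# Show the state of the passive copy on the surviving node.
# "Database1\dagnode2" is taken from the error text above; substitute your own names.
Get-MailboxDatabaseCopyStatus -Identity "Database1\dagnode2" |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState

# Compare the reported copy queue length with the Int64 maximum.
[long]::MaxValue - 9223372036854775766   # 41
```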
I was pretty confident that no mail would be lost, as all my clients are in
cached mode, so upon reconnecting to the CAS server they would sync mail back
up to the second mailbox server.
Upon running the command mentioned in the error, I was again greeted with red
errors stating that it could not start the Microsoft Exchange Search Indexing
service on the failed node… that’s because it does not exist anymore, great.
To get around this we need to add a few extra
flags to the command above. They are as below.
· -SkipActiveCopyChecks: The SkipActiveCopyChecks parameter specifies whether to
skip checking the current active copy to see if it’s currently a seeding source
for any passive databases. Be aware that when using this parameter, you can
move a database that’s currently a seeding source, which cancels the seed
operation.
· -SkipClientExperienceChecks: The SkipClientExperienceChecks parameter specifies
whether to skip the search catalog (content index) state check to see if the
search catalog is healthy and up to date. If the search catalog for the
database copy you’re activating is in an unhealthy or unusable state and you
use this parameter to skip the search catalog health check and activate the
database copy, you will need to either re-crawl or reseed the search catalog.
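Once the copy has been activated, the search catalog state can be checked and, if needed, re-crawled roughly as follows. Treat this as a sketch rather than a tested procedure: it assumes the database1 and dagnode2 names used in this post, and that the ResetSearchIndex.ps1 script that ships in the Exchange 2010 scripts folder is reachable through the $exscripts variable in the Exchange Management Shell. With the other node gone for good there is no healthy copy to reseed the catalog from, so a re-crawl is the option shown here.

```powershell
# Check the content index state of the activated copy.
# Names are the ones used in this post; substitute your own.
Get-MailboxDatabaseCopyStatus -Identity "database1\dagnode2" |
    Format-List Name, ContentIndexState

# Assumption: $exscripts points at the Exchange 2010 Scripts folder.
# Force a full re-crawl of the search catalog for this database.
cd $exscripts
.\ResetSearchIndex.ps1 -Force database1
```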
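With this in mind, we run the following command to force our node back into
life even though the mailbox database copy is not fully synced. Note that it
also includes -SkipHealthChecks, which bypasses the health checks on the
passive copy.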
Move-ActiveMailboxDatabase database1 -ActivateOnServer dagnode2 -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT
Ex: Move-ActiveMailboxDatabase "Mailbox Database 1984723136" -ActivateOnServer SRVEX2 -SkipHealthChecks -SkipActiveCopyChecks -SkipClientExperienceChecks -SkipLagChecks -MountDialOverride:BESTEFFORT
Once this has run, your database will mount and clients will be able to
connect. As mentioned, this works well for situations where you have a
two-node DAG with one node down and the copy queue length does not allow
automatic failover.
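To confirm the result, something like the following should do. It is a small sketch, assuming the database1 name from the command above; the Mount-Database step is only needed if the database did not mount on its own, as the original error message hints.

```powershell
# Confirm where the database is mounted (-Status is needed to populate Mounted).
Get-MailboxDatabase -Identity database1 -Status |
    Format-List Name, Mounted, MountedOnServer

# If Mounted is False after the move, mount it manually as the error message suggests.
Mount-Database -Identity database1
```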