Archive: ‘Troubleshooting’ Category

FRS and the A-record and CNAME

No comments May 29th, 2010

Case:

‘DC01test’ has a modified object that should be replicated to its partner ‘DC4test’:

1. ‘DC01test’ queries AD for a configured replication partner (default defined by the KCC service)

2. ‘DC01test’ knows the name (‘DC4test’) of his replication partner, but needs to find the GUID of ‘DC4test’.

3. ‘DC01test’ compare all CNAME record in the “_msdcs zone” and finds the GUID that match the name ‘DC4test’

4. Next step ‘DC01test’ needs to find is the IP of ‘DC4test’ so it can connect to ‘DC4test’.

5. ‘DC01test’ sends a recursive DNS query to its primary configured DNS server asking for a CNAME (the alias of the GUID).

Query: guid._msdcs.spurs.local
DNS server respond with: ‘DC4test.spurs.local’

6. ‘DC01test’ ask his DNS for the A-record for ‘DC4test.spurs.local’
DNS server returns the IP: 10.1.88.50

7. ‘DC01test’ connects to ‘DC4test’ and flags that “I have a change you need to get from me”.

8. Since FRS is based on PULL (not push), ‘DC4test’ will pull the changes on the object from ‘DC01test’.

If the A-record or the CNAME is missing or not correct, this process will fail. As a result, the replication will fail.
A handy tool that will test that all records are registered on all authoritative DNS servers is “dnslint”. It will create a HTM-report and highlight errors/warnings.

ie. dnslint /ad /s 10.1.88.150 /v

If a CNAME is missing:

Ref:
DNSLint usage: http://support.microsoft.com/kb/321045
Troubleshooting with DNSLint: http://support.microsoft.com/kb/321046

Diagnozing

No comments May 5th, 2010

Have you ever felt that sometimes your girlfried is crumpy but still says everything is fine? You feel a tension in the air.
You: Something wrong?

Her: No (*gosh* she thinks. Why can’t he read my mind that senseless bastard)

You: Cool!
(but you isn’t 100 per cents comfortable with the answer. You feel that there is something in the air, but you can’t tell what it is)

Four days goes by. You have just got home from a football game (Tottenham vs Liverpool: 2-1). Happy as you can be, but you notice your girlfriend is on fire!!

Her (shouting): Why did you say no to visiting my parents two weeks ago? You and your brainless soccer.

You (thinking): it’s called “football” not “soccer”, but wisely you keep your mouth shut.

Her: You spend more time with your Tottenham compared to me and bla,bla,bla…

You (thinking): ahhh.. that’s what was in the air a week ago…

Everything in the OSI model below layer 7 is straight forward and well documented. It’s “layer 8” that is the most complex layer and hardest to understand.
In Active Directory this is not a case, unless you’re not dealing with a “Slow logon problem” (which can be a layer 8 problem).
If you feel there is something wrong in AD, you’ll get a straight forward answer by asking your domain what’s the problem. You just need the tools and syntax to do the questions for you.

Here are the tools and syntaxes I use most of the time to get the answers:

The MS Support Tools package. This is a “must have” package as long as you have a Domain Controller (<= 2003). Both for maintaining and troubleshooting.

1. Event log
– Look for Warnings and Errors (System, DS, DNS and FRS)

2. dcdiag /v /e /c /f:dcdiag.txt
– My favorite. This will diagnose all DC’s and write the result to a single log file (here: dcdiag.txt). Be aware that this will generate some network traffic if you have many DC’s in various sites.

3. netdiag /v
– diagnose network related issues

4. nltest /dclist:spurs.local
– list all domain controllers in the spurs.local domain and what site they are located (handsome to get a quick overview in a new domain)

5. netdom query fsmo
– list the FSMO holders in the domain/forest

6. netdom query dc
list all domain controllers in spurs.local. It can’t list RODCs.

7. dsquery server -isgc
– list all the Global Catalogs

8. repadmin /showrepl and repadmin /replsum
– show the last replication cycle

9. repadmin /showbackup *
– show when the last backup was taken

10. dcdiag /test:dns /f:dnstest.txt /v
– to test DNS issues. Look at the end of the file for the summary.

11. dnslint /ad /s <ip-address of DNS server> /v
Verifies registration and records and create a htm file for presentation.

Other useful tools I like:

Account lockout and management tools:
http://www.microsoft.com/downloads/details.aspx?FamilyId=7AF2E69C-91F3-4E63-8629-B999ADDE0B9E&displaylang=en

Group Policy Management Consol (must have):
http://www.microsoft.com/downloads/details.aspx?familyid=0a6d4c24-8cbd-4b35-9272-dd3cbfc81887&displaylang=en

Oldcmp (for cleanup):
http://www.joeware.net/freetools/tools/oldcmp/index.htm

Wireshark (for network troubleshooting):
http://www.wireshark.org/

Policy Reporter (for parsing Userenv logs):
http://www.sysprosoft.com/policyreporter.shtml

How nice would it be to have a toolkit for females where you could easily debug them and get straight forward answers? Maybe someday in the future….

USN-rollback

1 comment April 26th, 2010

1. You discover that the netlogon service is in a pause state
2. Event ID 2095 / 2103 is logged in the directory service event log
3. Inbound and outbound replication is disabled

You’re in a now in an USN-Rollback state!

(This situation will never happen in a single DC forest)

The most common reasons for this state is that an unsupported restore method has occurred.

Like:

– You clone a DC with ie. VMWare Converter
– You revert to a snapshot of a DC and start the DC in normal mode without setting the “Database restored from backup” registry key [1][2]

If your HW is set to do buffered disc writes and an unexpected power failure occur, may also put you in an USN-Rollback state

The way to get out of this state is to remove the bad DC out of your domain:

1. on the bad DC: dcpromo /forceremoval
2. on a healthy DC: Run a Metadata cleanup [3]
3. on a healthy DC: seize the FSMO if necessary [4]
4. rebuild the “bad” DC and promote it back

I’m living in a freezing country (bah!) and we use to say “pee in your pants to get warm (for about 5 seconds…)”.

I have seen some recommendation at some other blogs and forums that provides a “quick” solution to recover from a USN-Rollback state.

Here is a “pee in your pants” solution:

– Delete the “DSA Not Writable” registry key
– Enable replication with repadmin
– Reboot
– Fixed? Nah, not really!

Here is what Microsoft says about this in an extention to KB 875495 [5] that for some reason is hidden for the general public:

Deleting or manually changing the Dsa Not Writable registry entry value puts the rollback domain controller in a permanently unsupported state. Therefore, such changes are not supported. Specifically, modifying the value removes the quarantine behavior added by the USN rollback detection code. The Active Directory partitions on the rollback domain controller will be permanently inconsistent with direct and transitive replication partners in the same Active Directory forest.”

References:

[1] DC’s and VM’s – Avoiding the Do-over
[2] Backup and Restore Considerations for Virtualized Domain Controllers
[3] Metadata cleanup
[4] Seizing the FSMO’s
[5] KB 875495 – How to detect and recover from a USN rollback