Archive: Posts Tagged ‘Diag’

RpcSs expected value WIN32_SHARE_PROCESS

2 comments February 12th, 2011

If you have a domain with a mixture of Win2003 and Win2008 domain controllers, you might get some “false-positive” errors when running DCDIAG.exe.

Starting test: Services
      Invalid service type: RpcSs on DC-Win2003, current value
      WIN32_OWN_PROCESS, expected value WIN32_SHARE_PROCESS
......................... DC-Win2003 failed test Services

If you run DCDIAG from a Win2003 DC it will not report any errors, but if you run it from a Win2008 DC it will report this error.

For example, from a Win2008 DC:

Dcdiag /e (testing all DCs in the domain)

or

Dcdiag /s:DC-Win2003 /test:services (run test only against DC-Win2003).

If you look at the service on a Win2003 DC, its Type is 0x10 (own), while on a Win2008 DC it’s 0x20 (shared).

HKLM\System\CurrentControlSet\Services\RpcSs\Type

So when you run DCDIAG from a Win2008 DC it assumes the Type should be 0x20 on all DCs it runs a diagnostic on. The DCDIAG version on Win2008 will not check if it’s testing against a Win2003 DC.
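To make the two Type values easier to read, here is a minimal Python sketch that maps them to the names DCDIAG prints. The function names are my own and hypothetical; this only illustrates the comparison and is not how DCDIAG is implemented.

```python
# Registry "Type" values for a service, as seen in the DCDIAG message above.
SERVICE_TYPES = {
    0x10: "WIN32_OWN_PROCESS",    # service runs in its own process
    0x20: "WIN32_SHARE_PROCESS",  # service can share a process (svchost)
}


def describe_service_type(value):
    """Return the symbolic name for a service Type value."""
    return SERVICE_TYPES.get(value, "unknown (0x{:X})".format(value))


def explain_mismatch(actual, expected=0x20):
    """Hypothetical helper: mimic the wording of the DCDIAG message.

    The default of 0x20 reflects what a Win2008 DCDIAG expects.
    """
    if actual == expected:
        return "match"
    return "current value {}, expected value {}".format(
        describe_service_type(actual), describe_service_type(expected)
    )


print(explain_mismatch(0x10))
```

A Win2003 DC reporting 0x10 therefore produces exactly the “current value / expected value” pair shown in the DCDIAG output, even though nothing is wrong with the DC.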

If you change how this service runs on a Win2003 DC with “sc config rpcss type= share”, the Type becomes 0x20 and a DCDIAG (/e) will come out clean.

I had to ask the MS DS team about this, since there was no KB about it, and they made one for this issue. If you google it you will get various recommendations to change the RpcSs service to run as shared. The DS team said this is expected behavior from DCDIAG. You should NOT change the way this service runs on a Win2003 DC. Leave it as it is: even if you change it to shared, it will not share the memory space of its svchost instance with anyone, because nobody is requesting to share that space.

Reference: http://blogs.technet.com/b/askds/archive/2011/02/11/friday-mail-sack-the-year-3000-edition.aspx

Checking AD Replication

2 comments November 29th, 2010

When you have multiple domain controllers, they need to replicate since they are multi-master. DC1 should hold the same data as DC2 and vice versa, and changes can be made on whichever DC suits you (in theory).

If you want a quick check of whether replication in your forest is OK, you can use a powerful command line tool called “repadmin”.

Open cmd and run: repadmin /replsum

If “largest delta” is less than 1 hour (intrasite) and “fails” = 0, your AD replication (not testing FRS replication) between all DCs in the forest is good.

If fails > 0 you need to investigate further.
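The rule of thumb above can be written down as a small check. This is a sketch in Python under a few assumptions: the delta is given as text in the “47m:15s” style used in this post, the hour and day units (“h”, “d”) are my guess at how longer deltas are written, and the function names are hypothetical. It does not call or parse repadmin itself.

```python
import re
from datetime import timedelta

# Unit letters mapped to timedelta keyword arguments.
# "m" and "s" appear in this post; "h" and "d" are assumptions.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_delta(text):
    """Turn a string such as '47m:15s' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        raise ValueError("unrecognised delta: {!r}".format(text))
    return timedelta(**{_UNITS[unit]: int(number) for number, unit in parts})


def replication_looks_healthy(largest_delta, fails, limit=timedelta(hours=1)):
    """Apply the rule: largest delta under the limit and no failures."""
    return fails == 0 and parse_delta(largest_delta) < limit


print(replication_looks_healthy("47m:15s", 0))
```

The one hour limit is the intrasite figure from the text; for intersite links you would pass a different `limit` that matches your replication schedule.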

Replication is based on pull, so you should focus on “Destination DSA” and “Inbound Neighbors”.

If DC01Test had some failures, I would run “repadmin /showrepl dc01test” to see which DC(s) it can’t pull changes from, and whether it has problems replicating a single Naming Context or all NCs. Replication is 100% dependent on DNS, so DNS is a common cause of replication problems.

REPADMIN /REPLSUM:

The five dots say I have 2 domain controllers in the forest. The first three dots are “processing dots”, while each of the remaining dots represents a DC: 5 – 3 = 2 domain controllers.

Largest Delta: longest replication gap amongst all replication links for a particular DC

A. DC01Test Largest Delta: 47m:15s
B. Last attempt: 19:57:13 (from showrepl, where DC01test pulled schema changes from DC4test)

A + B = Rep. Summary Start Time: 20:44:28
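The A + B sum can be checked with a few lines of Python. The two input values are the ones from my lab output above; the variable names are just for illustration.

```python
from datetime import datetime, timedelta

# B: time of the last replication attempt (from showrepl).
last_attempt = datetime.strptime("19:57:13", "%H:%M:%S")

# A: largest delta reported for DC01Test (from replsum).
largest_delta = timedelta(minutes=47, seconds=15)

# A + B should equal the replication summary start time.
summary_start = last_attempt + largest_delta
print(summary_start.strftime("%H:%M:%S"))  # 20:44:28
```

In other words, the largest delta is simply the time elapsed between the oldest last attempt and the moment the summary was started.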

REPADMIN /SHOWREPL <source DC>

Inbound Neighbors: Shows the DCs that <source DC> is pulling from and the 4 NCs (5 links).

DSA Object GUID: The GUID of the source or destination. A CNAME named GUID located in the _msdcs domain zone must be present and have a value of the hostname of the correct DC.
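Since a missing or wrong GUID CNAME is a common DNS culprit, here is a small Python sketch that builds the record name to look for and tries to resolve it. The all-zero GUID is a placeholder, spurs.local is my lab domain, and the function names are hypothetical. The lookup uses whatever resolver the machine is configured with, so it only tells you what that resolver sees.

```python
import socket


def dsa_cname(dsa_object_guid, forest_root_domain):
    """Build the CNAME a DC registers: <GUID>._msdcs.<forest root domain>."""
    guid = dsa_object_guid.strip().lower()
    domain = forest_root_domain.strip().strip(".").lower()
    return "{}._msdcs.{}".format(guid, domain)


def resolve_dsa(dsa_object_guid, forest_root_domain):
    """Return (canonical host name, addresses), or None if lookup fails."""
    name = dsa_cname(dsa_object_guid, forest_root_domain)
    try:
        canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:
        return None
    return canonical, addresses


# Placeholder GUID; use the DSA Object GUID shown by repadmin /showrepl.
print(dsa_cname("00000000-0000-0000-0000-000000000000", "spurs.local"))
```

If `resolve_dsa` returns None, or a canonical name that is not the DC you expected, that is a good reason to look at the _msdcs zone before anything else.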

Last attempt @: last time DC01Test pulled from DC4Test and if it was successful.

If you want to read more about what repadmin can do, you can download the whitepaper:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c6054092-ee1e-4b57-b175-5aabde591c5f&displayLang=en

Troubleshooting slow logon

3 comments June 24th, 2010

If a light bulb has stopped glowing, you wouldn’t start by tearing down the wall to check the cables. You’d probably check whether pressing the light switch gets the light back. If that didn’t work, you’d move on to replacing the light bulb. Still no luck? You’d move on to check the fuse.

If the fuse is ok, is it just this room that is affected? You might look out the window to see if your neighbours have some lights on.

If you can’t figure it out, you might call an electrician. You’d tell him all the things (steps) you have checked. Maybe the electrician has some additional steps he/she asks you to check…

Why would you follow these steps?

A wise man once said “if you know how a system works, then you’d be able to troubleshoot the system” (at least I think he said it:)

This rule applies to almost all efficient troubleshooting. If you’re going to troubleshoot slow logon issues, then it’s not as easy as the light bulb example. The cause can be hundreds of things. So where should I start looking?

Instead of re-inventing the wheel, here are my four favorite MS team blog posts on the issue. If you take your time and read them, you will have a very good chance of finding the culprit.

1. Ask the Perf team

2. Ask the DS team part#1

3. Ask the DS team part#2

4. Troubleshooting AD by Instan

Journal Wrap

8 comments June 6th, 2010

Environment:

– Windows 2000/2003/2008 domain controllers using FRS (not DFSR).
– More than one Domain Controller
– At least one DC with a healthy SYSVOL

Why do Journal Wraps occur?

Instan at the AD Troubleshooting blog made an excellent blog entry about:

What happens in a Journal Wrap?

You should give it a read to understand what is going on under the hood.

Symptoms that might occur:

  • Event ID 13568 is logged in the NtFrs event log
  • A generic Event ID 1058 may be logged
  • You make changes to a logon script but not all users get the change
  • A changed or newly created GPO is not applied to all users or computers
  • Missing SYSVOL share
  • RSoP or gpresult reports that data or a policy object is missing or corrupt

If you take a look at the 13568 event you’ll see that there is a “solution” to this problem:

Set the “Enable Journal Wrap Automatic Restore” registry parameter to 1 under:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters

Restart the NtFrs service.

This is not a good solution on Windows 2000 SP3 and later.
I don’t know why Microsoft still has this “how-to-fix” in event 13568, but they say in KB 290762:

Important: Microsoft does not recommend that you use this registry setting, and it should not be used post-Windows 2000 SP3. Appropriate options to reduce journal wrap errors include…

Update: I had to ask around about this since it was nagging me:

The event was never changed because the product group didn’t want to pay for the localization cost, nor admit that this registry setting caused more problems than it fixed. It actually came down to ego – the developer of FRS was a real piece of work. So instead the public docs were updated to state not to use that autorecovery registry setting.


Instead you should go for the Burflags method. This will get your SYSVOL up and running again. Most often a “non-authoritative” (D2) approach will fix you up.

The “D2” value can be set in two places in the registry:

Global re-initialization:

HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

or

Replica set specific re-initialization:

HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID

If you’re using DFS replica sets that hold a large amount of healthy data, go for the “Replica set specific re-initialization”. If you set the Global Burflags, FRS will re-initialize all replica sets, including the DFS namespace the member holds. If they hold a large amount of data… that might take some time.

To find the GUID of SYSVOL, look for the “Replica Set Name” named “Domain System Volume (SYSVOL SHARE)” under the subkey “HKLM\..\..\Replica Sets”:

This screenshot has only one GUID since I don’t use DFS in my lab.

Change the value of Burflags to D2 (hex).
If you don’t use DFS you could just set the Global Burflags to D2. It will not make any difference under which subkey you set it. This will re-initialize all replica sets the member holds (in this case the SYSVOL).

After you have set the Burflags key to D2, you have to restart the NTFRS service on the affected DC.
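If you prefer the command line to regedit, the two steps (set the Global Burflags, then restart the service) could look like the sketch below. It is written in Python and only builds the command strings, it does not run them. I’m assuming the value is named BurFlags and is a REG_DWORD, and that “reg add” and “net stop/start” are available on the DC; the helper name is hypothetical. Note that D2 hex is 210 decimal, which is what the sketch passes to reg add.

```python
# Global re-initialization key, as given earlier in this post.
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)


def burflags_commands(flag_hex="D2"):
    """Return the commands for setting BurFlags and restarting NtFrs.

    flag_hex is the flag as typed in regedit's hex field
    (D2 = non-authoritative, D4 = authoritative).
    """
    value = int(flag_hex, 16)  # D2 -> 210, D4 -> 212
    return [
        'reg add "{}" /v BurFlags /t REG_DWORD /d {} /f'.format(BURFLAGS_KEY, value),
        "net stop ntfrs",
        "net start ntfrs",
    ]


for command in burflags_commands("D2"):
    print(command)
```

Review the printed commands before pasting them on the affected DC, and only there.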

Overview of what happens:

1. The Burflags value is reset to 0
2. Event ID 13565 is logged: the non-authoritative restore has started
3. The content of SYSVOL is moved to the pre-existing folder
4. Event ID 13520 is logged
5. The local FRS database is rebuilt
6. The DC re-joins (vvjoin) the replica set
7. The “bad DC” will compare all files (file ID and MD5 sum) it has in the Pre-existing folder with the files from an upstream partner.
8. If a match is found, it will copy the file from the Pre-Existing folder to the original location. If they don’t match, it will pull the file from the upstream partner.
9. Event ID 13553 is logged
10. FRS notifies (SysvolReady reg.key = 1) the Netlogon service that SYSVOL is ready and can be shared.
11. The Netlogon service will share SYSVOL and Netlogon.
12. Event ID 13516 is logged (finished)
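Steps 7 and 8 are the interesting part, so here is a rough Python sketch of the decision they describe. It is an illustration only, not FRS code: I model each side as a dictionary of relative path to (file ID, MD5 sum), and the file names, IDs and function name are made up.

```python
import hashlib


def plan_restore(pre_existing, upstream):
    """Decide per file where the restored copy comes from.

    Both arguments map a relative path to a (file_id, md5_hex) tuple.
    A file is reused locally only when both file ID and MD5 sum match.
    """
    plan = {}
    for rel_path, identity in upstream.items():
        if pre_existing.get(rel_path) == identity:
            plan[rel_path] = "copy from pre-existing"
        else:
            plan[rel_path] = "pull from upstream"
    return plan


# Made-up example data: one unchanged script, one that differs, one missing.
md5 = lambda data: hashlib.md5(data).hexdigest()
local = {
    "scripts/logon.cmd": ("id-1", md5(b"net use h:")),
    "scripts/map.cmd": ("id-2", md5(b"old content")),
}
partner = {
    "scripts/logon.cmd": ("id-1", md5(b"net use h:")),
    "scripts/map.cmd": ("id-2", md5(b"new content")),
    "scripts/new.cmd": ("id-3", md5(b"echo hi")),
}
for path, action in sorted(plan_restore(local, partner).items()):
    print(path, "->", action)
```

This is why a D2 on a DC with a mostly healthy SYSVOL is quick: matching files are moved back locally and only the differences cross the wire.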

 

When you have verified that SYSVOL is shared and in sync, you can delete the content in the Pre-Existing folder to free up space.


Authoritative restore (D4):

If your SYSVOL is all messed up on every DC, you might have to do an “authoritative restore” using both the D4 and D2 values.

By the way, you should never, ever use the D4 flag on more than one DC, as you will get a lot of collisions and morphed folders. As Microsoft says, the D4 flag should only be set as a last resort.

Quick overview:

1. Stop the NtFrs service on every DC
2. Set the D4 flag on one DC that will be authoritative for the replica set(s). The SYSVOL content will not be moved to the pre-existing folder on the authoritative member.
3. Set the D2 flag on the other DCs (non-authoritative)
4. Start the NtFrs service on the “D4” DC.
5. Check that Event IDs 13553 and 13516 are logged.
6. If step 5 is ok, start NtFrs on the “D2” DCs.

For detailed steps, see “How to rebuild the SYSVOL tree and its content in a domain”


References:

FRS event codes: http://support.microsoft.com/kb/308406

What happens in a Journal Wrap?
http://blogs.technet.com/instan/archive/2009/07/14/what-happens-in-a-journal-wrap.aspx

How to rebuild the SYSVOL tree and its content in a domain
http://support.microsoft.com/kb/315457

Using the BurFlags registry key to reinitialize File Replication Service replica sets
http://support.microsoft.com/kb/290762

Backing Up and Restoring an FRS-Replicated SYSVOL Folder
http://msdn.microsoft.com/en-us/library/cc507518(VS.85).aspx

Diagnosing

No comments May 5th, 2010

Have you ever felt that sometimes your girlfriend is grumpy but still says everything is fine? You feel a tension in the air.
You: Something wrong?

Her: No (*gosh* she thinks. Why can’t he read my mind that senseless bastard)

You: Cool!
(but you aren’t 100 percent comfortable with the answer. You feel that there is something in the air, but you can’t tell what it is)

Four days go by. You have just got home from a football game (Tottenham vs Liverpool: 2-1). Happy as you can be, but you notice your girlfriend is on fire!!

Her (shouting): Why did you say no to visiting my parents two weeks ago? You and your brainless soccer.

You (thinking): it’s called “football” not “soccer”, but wisely you keep your mouth shut.

Her: You spend more time with your Tottenham compared to me and bla,bla,bla…

You (thinking): ahhh.. that’s what was in the air a week ago…

Everything in the OSI model below layer 7 is straightforward and well documented. It’s “layer 8” that is the most complex layer and the hardest to understand.
In Active Directory this is not the case, unless you’re dealing with a “Slow logon problem” (which can be a layer 8 problem).
If you feel there is something wrong in AD, you’ll get a straightforward answer by asking your domain what the problem is. You just need the tools and syntax to ask the questions for you.

Here are the tools and syntaxes I use most of the time to get the answers:

The MS Support Tools package. This is a “must have” package as long as you have a Domain Controller (<= 2003). Both for maintaining and troubleshooting.

1. Event log
– Look for Warnings and Errors (System, DS, DNS and FRS)

2. dcdiag /v /e /c /f:dcdiag.txt
– My favorite. This will diagnose all DC’s and write the result to a single log file (here: dcdiag.txt). Be aware that this will generate some network traffic if you have many DC’s in various sites.

3. netdiag /v
– diagnose network related issues

4. nltest /dclist:spurs.local
– list all domain controllers in the spurs.local domain and which site they are located in (handy for getting a quick overview of a new domain)

5. netdom query fsmo
– list the FSMO holders in the domain/forest

6. netdom query dc
– list all domain controllers in spurs.local. It can’t list RODCs.

7. dsquery server -isgc
– list all the Global Catalogs

8. repadmin /showrepl and repadmin /replsum
– show the last replication cycle

9. repadmin /showbackup *
– show when the last backup was taken

10. dcdiag /test:dns /f:dnstest.txt /v
– to test DNS issues. Look at the end of the file for the summary.

11. dnslint /ad /s <ip-address of DNS server> /v
– verify registration and records, and create an .htm file for presentation.
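If you run the same set of commands in every new domain, you could collect them in a small script. Here is a sketch in Python, assuming Python is available on the admin box (that is my assumption, the tools themselves don’t need it). The command lines are the ones from the list above; dnslint is left out because it needs the IP address of your DNS server. The function name is hypothetical.

```python
import shutil
import subprocess

DOMAIN = "spurs.local"  # my lab domain; replace with your own

# Commands 2 to 10 from the list above, one argument list per command.
COMMANDS = [
    ["dcdiag", "/v", "/e", "/c", "/f:dcdiag.txt"],
    ["netdiag", "/v"],
    ["nltest", "/dclist:" + DOMAIN],
    ["netdom", "query", "fsmo"],
    ["netdom", "query", "dc"],
    ["dsquery", "server", "-isgc"],
    ["repadmin", "/showrepl"],
    ["repadmin", "/replsum"],
    ["repadmin", "/showbackup", "*"],
    ["dcdiag", "/test:dns", "/f:dnstest.txt", "/v"],
]


def run_all(commands=COMMANDS):
    """Run each command; return {command line: exit code or None}.

    None means the tool was not found on this machine.
    """
    results = {}
    for command in commands:
        label = " ".join(command)
        if shutil.which(command[0]) is None:
            results[label] = None  # tool is not installed here
            continue
        completed = subprocess.run(command, capture_output=True, text=True)
        results[label] = completed.returncode
    return results
```

Calling `run_all()` on a machine with the Support Tools installed gives you a quick overview of which tools ran and which are missing. Remember that the dcdiag /e run generates network traffic, so don’t schedule this too eagerly.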

Other useful tools I like:

Account lockout and management tools:
http://www.microsoft.com/downloads/details.aspx?FamilyId=7AF2E69C-91F3-4E63-8629-B999ADDE0B9E&displaylang=en

Group Policy Management Console (must have):
http://www.microsoft.com/downloads/details.aspx?familyid=0a6d4c24-8cbd-4b35-9272-dd3cbfc81887&displaylang=en

Oldcmp (for cleanup):
http://www.joeware.net/freetools/tools/oldcmp/index.htm

Wireshark (for network troubleshooting):
http://www.wireshark.org/

Policy Reporter (for parsing Userenv logs):
http://www.sysprosoft.com/policyreporter.shtml

How nice would it be to have a toolkit for females where you could easily debug them and get straight forward answers? Maybe someday in the future….