Archive: Posts Tagged ‘Troubleshooting’

Troubleshooting slow logon/boot on Win7

No comments August 17th, 2012

As I mentioned in an earlier post, slow logons can have many causes, and troubleshooting them is often time-consuming. I recommend downloading the ADK tools for Win8 (for the moment they are only available for the Consumer Preview) and using the “Windows Performance Recorder” and “Windows Performance Analyzer” to help you find the culprit.

To get an overview of the tool and some examples, take a look at this excellent TechEd session: How many coffees can you drink while Windows 7 boots?

 

 

Reset the Secure Channel

No comments October 25th, 2011

When a computer joins a domain, a computer account is created in AD. The computer account gets its own password, which expires after 30 days by default. When the password expires, the computer itself initiates a password change with a DC in its domain.
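
As a rough illustration of the 30-day default, the sketch below checks how old a machine password is. It assumes you have already read the `pwdLastSet` attribute of the computer account (the post does not cover how to read it), which AD stores as a count of 100-nanosecond intervals since 1601-01-01 UTC. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# AD timestamps such as pwdLastSet count 100-ns ticks since 1601-01-01 UTC.
AD_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
DEFAULT_MAX_AGE = timedelta(days=30)  # default machine password age from the text


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a raw AD/Windows FILETIME value to an aware UTC datetime."""
    return AD_EPOCH + timedelta(microseconds=ticks // 10)


def password_overdue(pwd_last_set: int, now: datetime,
                     max_age: timedelta = DEFAULT_MAX_AGE) -> bool:
    """True if the password is older than max_age at time `now`."""
    return now - filetime_to_datetime(pwd_last_set) > max_age
```

An old password is only a hint, not proof of a problem: a machine that has been switched off for a while will simply change its password the next time it reaches a DC.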

When the computer starts up, it uses this password to create a secure channel (SC) with a DC. The computer will request to sign all traffic that passes the SC. If a DC says “go ahead”, all traffic that is signed passes through this channel.

Traffic like NTLM pass-through authentication is typically signed traffic.

So what happens if the computer account password stored on the computer no longer matches the one stored in AD? The computer tries to authenticate, but the DC says this is not the correct password.

The SC is down.

Tools like “netdom” could be used to reset the password, but this only worked to reset the SC between two DCs. It was not possible to reset the SC on a domain member. The computer had to rejoin the domain.

Syntax:

netdom resetpwd /server:<Name of a DC> /userd:domain\administrator /passwordd:admin_password

Netdom was written back in the NT4 days, and a new tool has taken over, not just for Netdom but also for tools like Nltest: Windows PowerShell.

To reset the SC between a computer and a DC:

Open PowerShell on the computer and run the *cmdlet:

Test-ComputerSecureChannel -repair

*The cmdlet requires PowerShell 2.0, which is pre-installed on Win7/2008R2.
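
If you want to run the check from a script, a minimal sketch could look like this. It assumes `powershell.exe` is on the path of a domain-joined Windows machine and relies on the cmdlet printing True or False. The helper names are hypothetical, and `-Repair` typically needs an elevated prompt.

```python
import subprocess


def build_command(repair: bool = False) -> list[str]:
    """Build the PowerShell command line for the secure channel test."""
    cmdlet = "Test-ComputerSecureChannel"
    if repair:
        cmdlet += " -Repair"
    return ["powershell.exe", "-NoProfile", "-Command", cmdlet]


def parse_result(stdout: str) -> bool:
    """The cmdlet prints True or False; anything else is treated as failure."""
    return stdout.strip().lower() == "true"


def secure_channel_ok(repair: bool = False) -> bool:
    """Run the cmdlet on the local (Windows, domain-joined) machine."""
    done = subprocess.run(build_command(repair), capture_output=True, text=True)
    return done.returncode == 0 and parse_result(done.stdout)
```

Calling `secure_channel_ok()` first and `secure_channel_ok(repair=True)` only when it returns False keeps the repair as an explicit second step.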

In Win8 there are thousands of new cmdlets, so if you have not begun to look at PS, now is a good time.

 

References:

PowerShell 2.0 for XP, 2003, Vista, 2008: http://support.microsoft.com/kb/968929

Symptoms of a broken SC: http://blogs.technet.com/b/asiasupp/archive/2007/01/18/typical-symptoms-when-secure-channel-is-broken.aspx

Test-ComputerSecureChannel cmdlet: http://technet.microsoft.com/en-us/library/dd367893.aspx

 

DFSR migration

No comments August 29th, 2011

If you only have 2008 DCs and you are still replicating SYSVOL with FRS, you could/would/should migrate to DFS Replication.

As with any major change you make to your domain, you should run dcdiag before you do anything.

I just saw a case where an old reference was still alive and stalled the migration. The DC (a SYSVOL member) had been cleaned out long ago, but the cleanup apparently failed to remove all traces. The solution was to delete the reference manually with adsiedit.

 

dcdiag:

Starting test: VerifyEnterpriseReferences
The following problems were found while verifying various important DN
references.  Note, that  these problems can be reported because of
latency in replication.  So follow up to resolve the following
problems, only if the same problem is reported on all DCs for a given
domain or if  the problem persists after replication has had
reasonable time to replicate changes.
 
[1] Problem: Missing Expected Value
 
Base Object: CN=Win2008-DC01,OU=Domain Controllers,DC=spurs,DC=local
Base Object Description: "DC Account Object"
Value Object Attribute Name: msDFSR-ComputerReferenceBL
Value Object Description: "SYSVOL FRS Member Object"
Recommended Action: See Knowledge Base Article: Q312862
 
[2] Problem: Missing Expected Value
 
Base Object: CN=Win2000-DC1, OU=Domain Controllers,DC=spurs,DC=local
Base Object Description: "DC Account Object"
Value Object Attribute Name: serverReferenceBL
Value Object Description: "Server Object"
Recommended Action: Check if this server is deleted, and if so
clean up this DCs Account Object.
 

Beware that “VerifyEnterpriseReferences”, when run from a Win2008 DC, will report a “Missing Expected Value” for msDFSR-ComputerReferenceBL. This is expected, since the 2008 version of dcdiag doesn’t know that SYSVOL is still replicated with FRS.

So, don’t touch DFSR references.
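
To separate the expected DFSR entry from the one that really needs cleaning up, a small parser can help. This is only a sketch that assumes the dcdiag text looks like the output pasted above. The function names are invented for the example.

```python
import re

# Attribute that the 2008 dcdiag reports as missing while SYSVOL still uses FRS.
EXPECTED_WHILE_ON_FRS = {"msDFSR-ComputerReferenceBL"}


def parse_problems(text: str) -> list[dict]:
    """Split VerifyEnterpriseReferences output into one dict per '[n] Problem'."""
    problems = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"\[(\d+)\] Problem: (.+)", line)
        if header:
            current = {"Problem": header.group(2)}
            problems.append(current)
            last_key = None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value.strip('"')
            last_key = key
        elif current is not None and line and last_key:
            # Continuation of a wrapped value, e.g. the Recommended Action.
            current[last_key] += " " + line
    return problems


def needs_attention(problem: dict) -> bool:
    """False for the entry that is expected while FRS still replicates SYSVOL."""
    return problem.get("Value Object Attribute Name") not in EXPECTED_WHILE_ON_FRS
```

Run against the output above, this would leave only problem [2] (the stale Win2000-DC1 reference) flagged for follow-up.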

Migrating step-by-step:

http://blogs.technet.com/b/filecab/archive/2008/02/08/sysvol-migration-series-part-1-introduction-to-the-sysvol-migration-process.aspx

 

 

Folder Redirection + Microsoft Dynamics CRM 2011 = false

8 comments June 22nd, 2011

Consider the following environment:

3 x Win2008R2 SP1 RDS (terminal servers with load balancing)
1 x Win2008R2 SP1 Microsoft Dynamics CRM 2011 (Rollup pack 2 at the moment)
CRM for Outlook installed on the RDS servers.

Since you don’t want users to save documents, pictures, etc. on the RDS servers, and you want the users’ environment to be the same no matter which RDS server they happen to be routed to, you configure Folder Redirection and Roaming Profiles.

Doing this will leave your MS CRM installation in an unsupported state as MS CRM 4 and CRM 2011 don’t support Folder Redirection.

Problems I experienced:

If you open up a window from CRM and then you close it, you’ll get: An error occurred. Send Report to Microsoft?

If you open CRM for Outlook as a normal user and try to track an email, you’ll get an error stating that it didn’t work. If you look in the Event log on the RDS server, you’ll see:

EventID 5972 Source MSCRMAddin

I opened a support case with Microsoft and got in contact with the MS CRM team. They told me that Folder Redirection (FR) is unsupported in MS CRM, so I would have to remove FR before they could investigate any further.

That would be a huge drawback, since we use load balancing between the RDS servers, and the users would be saving documents directly on the RDS servers. Ouch!


Solution: Remove Folder Redirection completely

Solution (unsupported):

There are two files (caches) that have to be local on the RDS for CRM to work. “EmailCache.sdf” and “OutlookSyncCache.sdf”.

They are located in the “%userprofile%\AppData\Roaming\Microsoft\MSCRM” folder. If you redirect “Appdata(Roaming)” those two files will be on a file share. That will cause problems for the CRM client and present you some weird errors.

So if you have to use FR, you can’t redirect “AppData”. That folder has to be local. The rest of the folders didn’t seem to cause any problems redirecting.
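
A quick way to sanity-check a profile is to see whether the two cache files would end up on a share. The sketch below only looks at the path string: it treats UNC paths as redirected and cannot tell that a mapped drive letter points to a file server. The function names are made up for the example.

```python
import ntpath

CRM_CACHE_FILES = ("EmailCache.sdf", "OutlookSyncCache.sdf")


def is_unc(path: str) -> bool:
    """True if the path is a UNC path, i.e. it lives on a network share."""
    drive, _ = ntpath.splitdrive(path)
    return drive.startswith("\\\\")


def crm_cache_paths(roaming_appdata: str) -> list[str]:
    """Where the two CRM caches end up for a given AppData(Roaming) folder."""
    base = ntpath.join(roaming_appdata, "Microsoft", "MSCRM")
    return [ntpath.join(base, name) for name in CRM_CACHE_FILES]


def caches_are_local(roaming_appdata: str) -> bool:
    """The unsupported workaround needs both caches on a local disk."""
    return not any(is_unc(p) for p in crm_cache_paths(roaming_appdata))
```

On an RDS server you would pass in the user's actual AppData(Roaming) location, for example the value of `%APPDATA%` in that user's session.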

There are no official KBs stating that Folder Redirection is unsupported in CRM 4 and CRM 2011, but it is. The CRM support team told me the product team was working on it, and that a resolution might come in an upcoming version or rollup.

COYS!

A good friend…

No comments March 15th, 2011

In Star Wars, “R2-D2” was Luke Skywalker’s good friend. If you’re running a domain with FRS, D2 is your good friend, even though (2008) R2 (and DFSR) should be your buddy.

So when should you call your D2 buddy and give him a run?

You experience:

  • One of your DCs is in Journal Wrap
  • The local FRS jet database has become corrupt
  • Assertions in the FRS service
  • Missing FRS junction points
  • Missing FRS attributes/objects
  • Missing SYSVOL/NETLOGON share
  • Corrupt/missing NTFS journal
  • You are bored… (meaning the list is long)

Set the backup/restore flag, a.k.a. “Burflags”, to D2 and restart the NTFRS service, and things start moving.

HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

The bad DC will move all its SYSVOL data, if it holds any, into the “NtFrs_PreExisting_See_EventLog” folder. It will then compare these files with the ones on an upstream partner, checking the file IDs and MD5 checksums from the upstream partner against the local ones. If a file matches, it is copied from the Pre-Existing folder back into its original location. If it doesn’t match, the file is copied from the partner.
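
The match-or-pull decision described above can be modelled in a few lines. This is a simplified illustration, not how FRS is actually implemented: files are keyed by name here, and the file ID is treated as an opaque string.

```python
import hashlib
from typing import NamedTuple


class FileInfo(NamedTuple):
    file_id: str   # identifier FRS tracks for the file (opaque here)
    md5: str       # hex MD5 checksum of the file content


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of some file content."""
    return hashlib.md5(data).hexdigest()


def decide(pre_existing: dict[str, FileInfo],
           upstream: dict[str, FileInfo]) -> dict[str, str]:
    """For each upstream file, say where the rebuilt SYSVOL copy comes from."""
    plan = {}
    for name, remote in upstream.items():
        local = pre_existing.get(name)
        if local is not None and local == remote:
            plan[name] = "reuse from pre-existing"
        else:
            plan[name] = "copy from upstream partner"
    return plan
```

The practical point is that unchanged files never cross the wire, which is why a D2 on a DC with mostly intact data finishes much faster than a full copy.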

When the replication has finished (Event ID 13516 is logged), you can delete the content in the Pre-Existing folder to free up space.

RpcSs expected value WIN32_SHARE_PROCESS

2 comments February 12th, 2011

If you have a domain with a mixture of Win2003 and Win2008 domain controllers, you might get some “false-positive” errors running DCDIAG.exe.

Starting test: Services
      Invalid service type: RpcSs on DC-Win2003, current value
      WIN32_OWN_PROCESS, expected value WIN32_SHARE_PROCESS
......................... DC-Win2003 failed test Services

If you run DCDIAG from a Win2003 DC it will not report any errors, but if you run it from a Win2008 DC it will report this error.

E.g. from a Win2008 DC:

Dcdiag /e (testing all DCs in the domain)

or

Dcdiag /s:DC-Win2003 /test:services (run test only against DC-Win2003).

If you look at the service on a Win2003 DC, its Type is 0x10 (own), while on a Win2008 DC it’s 0x20 (shared).

HKLM\System\CCS\Services\RpcSs\Type

So when you run DCDIAG from a Win2008 DC it assumes the Type should be 0x20 on all DCs it runs a diagnostic on. The DCDIAG version on Win2008 will not check if it’s testing against a Win2003 DC.

If you try to change how this service runs on a Win2003 DC with “sc config rpcss type= share”, it will change the Type to 0x20 and a DCDIAG (/e) will be clean.

I had to ask the MS DS team about this, since there was no KB on the issue, and they published an answer (see the reference below). If you google it, you will find various recommendations to change the RpcSs service to run as shared. The DS team said this is expected behavior from DCDIAG. You should NOT change the way this service runs on a Win2003 DC. Leave it as it is: even if you change it to shared, it will not share the memory space of its svchost instance with anyone (nobody is requesting to share the space).
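
If you script your dcdiag follow-up, the rule above fits in a small lookup. The sketch uses the two Type values from the text. The `dc_os` labels ("2003", "2008") are a convention invented for this example.

```python
# Service type values as seen under HKLM\System\CCS\Services\RpcSs\Type.
SERVICE_TYPES = {0x10: "WIN32_OWN_PROCESS", 0x20: "WIN32_SHARE_PROCESS"}


def describe_type(value: int) -> str:
    """Translate the numeric Type value into the name dcdiag prints."""
    return SERVICE_TYPES.get(value, f"unknown (0x{value:02X})")


def rpcss_finding(dc_os: str, type_value: int) -> str:
    """Classify what a Win2008 dcdiag would say about RpcSs on a DC.

    dc_os is a hypothetical label such as "2003" or "2008".
    """
    if type_value == 0x20:
        return "ok"
    if dc_os == "2003" and type_value == 0x10:
        return "expected on Win2003, leave the service as it is"
    return "investigate"
```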

Reference: http://blogs.technet.com/b/askds/archive/2011/02/11/friday-mail-sack-the-year-3000-edition.aspx

Published application launching full screen

No comments January 26th, 2011

We had a problem with some 2008 R2 terminal servers (RDS) with XenApp 6.0 installed on top. When a new user started a published application for the first time, the user got launched into a full screen session. The “funny” thing was that this didn’t happen to all new users. It was just a random issue.

Nothing was logged in the event log, so we tried almost anything to figure out what caused this.

We were about to move many users over from 2003 terminal servers, so this was going to be a huge problem. We had read about a bug where, if a roaming profile folder already existed and you tried to change e.g. “Start the following program at startup”, the change was not applied.

For lack of better ideas, we installed the hotfix for that bug on all the domain controllers (2008 R1) and the RDS servers. Guess what: that fixed the random problem of some users being launched into full screen.

I can’t wait for the release of SP1 for 2008 R2!

You can request the hotfix here.

Checking AD Replication

2 comments November 29th, 2010

When you have multiple domain controllers they need to replicate since they are multi-masters. DC1 should hold the same data as DC2 and vice versa, and changes can be done on the DC that suits you (in theory).

If you want to have a quick look if the replication in your forest is ok, you can use a powerful command line tool called “repadmin”.

Open cmd and run: repadmin /replsum

If “largest delta” is less than 1 hour (intrasite) and “fails” = 0, your AD replication (not testing FRS replication) between all DCs in the forest is good.

If fails > 0 you need to investigate further.
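
The rule of thumb above is easy to apply in a script once you have the “largest delta” and “fails” columns. The sketch assumes deltas are written like `47m:15s`, as in the example further down, and that hour and day parts follow the same pattern (an assumption). Column extraction from the raw replsum output is left out.

```python
import re
from datetime import timedelta

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_delta(text: str) -> timedelta:
    """Parse a delta such as '47m:15s' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        raise ValueError(f"unrecognised delta: {text!r}")
    return timedelta(seconds=sum(int(n) * UNIT_SECONDS[u] for n, u in parts))


def replication_healthy(largest_delta: str, fails: int,
                        limit: timedelta = timedelta(hours=1)) -> bool:
    """Apply the rule of thumb: delta under the limit and no failures."""
    return fails == 0 and parse_delta(largest_delta) < limit
```

The one-hour limit is the intrasite figure from the text. For intersite links you would pass a `limit` that matches your replication schedule.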

Replication is based on pull, so you should focus on “Destination DSA” and “Inbound Neighbors”.

If DC01Test had some failures, I would run: “repadmin /showrepl dc01test” to see which DC(s) it can’t pull changes from, and whether it has problems replicating a single Naming Context or all NCs. Replication is 100% dependent on DNS, so DNS is a common cause of replication problems.

 REPADMIN /REPLSUM:

The five dots say I have 2 domain controllers in the forest. The first three dots are “processing dots”, while each of the remaining dots represents a DC: 5 – 3 = 2 domain controllers.

Largest Delta: longest replication gap amongst all replication links for a particular DC

A. DC01Test Largest Delta: 47m:15s
B. Last attempt: 19:57:13 (from showrepl, where DC01test pulled schema changes from DC4test)

A + B = Rep. Summary Start Time: 20:44:28
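
Both bits of arithmetic above (the dots and A + B) can be written out as a small worked example. The function names are made up, and the sketch ignores the date, so it only makes sense when everything happens on the same day.

```python
from datetime import datetime, timedelta


def dc_count(progress_dots: str, processing_dots: int = 3) -> int:
    """Dots printed by replsum minus the leading 'processing' dots."""
    return progress_dots.count(".") - processing_dots


def summary_start(last_attempt: str, largest_delta: timedelta) -> str:
    """Last attempt (HH:MM:SS) plus largest delta gives the summary start time."""
    attempt = datetime.strptime(last_attempt, "%H:%M:%S")
    return (attempt + largest_delta).strftime("%H:%M:%S")


print(dc_count("....."))                                            # 2
print(summary_start("19:57:13", timedelta(minutes=47, seconds=15)))  # 20:44:28
```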

REPADMIN /SHOWREPL <source DC>

Inbound Neighbors: Shows the DCs that <source DC> is pulling from and the 4 NCs (5 links).

DSA Object GUID: The GUID of the source or destination. A CNAME named GUID located in the _msdcs domain zone must be present and have a value of the hostname of the correct DC.
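
To check that record from a script, you can build the name `<DSA object GUID>._msdcs.<forest root domain>` and try to resolve it. This is a sketch with invented helper names. The GUID and domain you pass in come from your own showrepl output.

```python
import socket
import uuid


def dsa_cname(dsa_guid: str, forest_root: str) -> str:
    """Build <DSA object GUID>._msdcs.<forest root domain>."""
    guid = str(uuid.UUID(dsa_guid))  # validates and normalises the GUID
    return f"{guid}._msdcs.{forest_root.strip('.').lower()}"


def resolves(name: str) -> bool:
    """True if the name resolves from this machine (follows the CNAME)."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False
```

Run `resolves(dsa_cname(guid, domain))` on the destination DC, using the DNS servers that DC is configured with, since that is the machine that has to find its partner.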

Last attempt @: last time DC01Test pulled from DC4Test and if it was successful.

If you want to read more about what repadmin can do, you can download the whitepaper:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c6054092-ee1e-4b57-b175-5aabde591c5f&displayLang=en

Troubleshooting slow logon

3 comments June 24th, 2010

If a light bulb has stopped glowing, you wouldn’t start with tearing down the wall to check the cables. You’d probably check if pressing the light switch will help you get the light back. If that didn’t work, you’ll move on to replace the light bulb. Still no luck? You’ll move on to check the fuse.

If the fuse is ok, is it just this room that is affected? You might look out the window to see if your neighbours have some lights on.

If you can’t figure it out, you might call an electrician. You’d tell him all the things (steps) you have checked. Maybe the electrician has some additional steps he/she asks you to check…

Why would you follow these steps?

A wise man once said “if you know how a system works, then you’d be able to troubleshoot the system” (at least I think he said it:)

This rule applies to almost all efficient troubleshooting. If you’re going to troubleshoot slow logon issues, then it’s not as easy as the light bulb example. The cause can be hundreds of things. So where should I start looking?

Instead of re-inventing the wheel, here are my four favorite MS team blogs on the issue. If you take the time to read them, you will have a very good chance of finding the culprit.

1. Ask the Perf team

2. Ask the DS team part#1

3. Ask the DS team part#2

4. Troubleshooting AD by Instan

Journal Wrap

6 comments June 6th, 2010

Environment:

– Windows 2000/2003/2008 domain controllers using FRS (not DFSR).
– More than one Domain Controller
– At least one DC with a healthy SYSVOL

Why do Journal Wraps occur?

Instan at the AD Troubleshooting blog made an excellent blog entry about:

What happens in a Journal Wrap?

You should give it a read to understand what is going on under the hood.

Symptoms that might occur:

  • Event ID 13568 is logged in the NtFrs event log
  • A generic Event ID 1058 may be logged
  • You make changes to a logon script but not all users get the change
  • A changed or newly created GPO is not applied to all users or computers
  • Missing SYSVOL share
  • RSoP or gpresult reports that data or a policy object is missing or corrupt

If you take a look at the 13568 event you’ll see that there is a “solution” to this problem:

Set the “Enable Journal Wrap Automatic Restore” registry parameter to 1

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters

Restart ntfrs service.

This is not a good solution after Windows 2000 SP3.
I don’t know why Microsoft still has this “how-to-fix” in event 13568, but they say in KB 290762:

Important: Microsoft does not recommend that you use this registry setting, and it should not be used post-Windows 2000 SP3. Appropriate options to reduce journal wrap errors include…

Update: I had to ask around about this since it was nagging me:

The event was never changed because the product group didn’t want to pay for the localization cost, nor admit that this registry setting caused more problems than it fixed. It actually came down to ego – the developer of FRS was a real piece of work. So instead the public docs were updated to state not to use that autorecovery registry setting.


Instead you should go for the Burflags method. This will get your SYSVOL up and running again. Most often a “non-authoritative” (D2) approach will fix you up.

The “D2” value can be set in two places in the registry:

Global re-initialization:

HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

or

Replica set specific re-initialization:

HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID

If you’re using DFS replica sets that hold a large amount of healthy data, go for the “Replica set specific re-initialization”. If you set the Global Burflags, FRS will re-initialize all replica sets, including the DFS namespace the member holds. If they hold a large amount of data… that might take some time.

To find the GUID of SYSVOL, look for the “Replica Set Name” named “Domain System Volume (SYSVOL SHARE)” under the subkey “HKLM\..\..\Replica Sets”:

This screenshot has only one GUID since I don’t use DFS in my lab.

Change the value of Burflags to D2 (hex).
If you don’t use DFS you could just set the Global Burflags to D2. It will not make any difference under which subkey you set it. This will re-initialize all replica sets the member holds (in this case the SYSVOL).

After you have set the Burflags key to D2, you have to restart the NTFRS service on the affected DC.
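
Setting the value and restarting the service can also be done from the command line. The sketch below only builds the command lines (using `reg add` and `net stop`/`net start`) so you can review them before running anything. The value name `BurFlags` is the one from KB 290762, and the replica set GUID you pass in is whatever you found under “Replica Sets”.

```python
from typing import Optional

NTFRS = r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
GLOBAL_KEY = NTFRS + r"\Backup/Restore\Process at Startup"
REPLICA_KEY = NTFRS + r"\Cumulative Replica Sets"
D2 = 0xD2  # non-authoritative restore


def burflags_commands(replica_guid: Optional[str] = None,
                      flag: int = D2) -> list[list[str]]:
    """Commands to set BurFlags and restart NTFRS; global key unless a GUID is given."""
    key = f"{REPLICA_KEY}\\{replica_guid}" if replica_guid else GLOBAL_KEY
    return [
        # reg add takes the DWORD data as a decimal number: 0xD2 == 210.
        ["reg", "add", key, "/v", "BurFlags", "/t", "REG_DWORD", "/d", str(flag), "/f"],
        ["net", "stop", "ntfrs"],
        ["net", "start", "ntfrs"],
    ]
```

Each inner list can be handed to `subprocess.run` on the affected DC, or joined into a line and pasted into an elevated cmd window.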

Overview of what happens:

1. The Burflags value is reset to 0
2. Event ID 13565 is logged: the non-authoritative restore has started
3. The content of SYSVOL is moved to the pre-existing folder
4. Event ID 13520 is logged
5. The local FRS database is rebuilt
6. The DC re-joins (vvjoin) the replica set
7. The “bad DC” compares all files (file ID and MD5 sum) it has in the Pre-existing folder with the files from an upstream partner.
8. If a match is found, it will copy the file from the Pre-Existing folder to the original location. If they don’t match, it will pull the file from the upstream partner.
9. Event ID 13553 is logged
10. FRS notifies (SysvolReady reg.key = 1) the Netlogon service that SYSVOL is ready and can be shared.
11. The Netlogon service will share SYSVOL and Netlogon.
12. Event ID 13516 is logged (finished)
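
While you wait, the four event IDs in the list above make a handy progress indicator. The sketch assumes you have exported the NtFrs event IDs, oldest first, into a list. How you export them is up to you.

```python
# Event IDs in the order the overview above lists them.
D2_EVENT_ORDER = [13565, 13520, 13553, 13516]


def restore_progress(logged_ids: list[int]) -> str:
    """Report how far a D2 restore got, given NtFrs event IDs oldest first."""
    position = 0
    for event_id in logged_ids:
        if position < len(D2_EVENT_ORDER) and event_id == D2_EVENT_ORDER[position]:
            position += 1
    if position == len(D2_EVENT_ORDER):
        return "finished"
    if position == 0:
        return "not started"
    return f"waiting for event {D2_EVENT_ORDER[position]}"
```

Unrelated events in between (such as the original 13568) are simply skipped, so the full log can be passed in as it is.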

 

When you have verified that SYSVOL is shared and in sync, you can delete the content in the Pre-Existing folder to free up space.


Authoritative restore (D4):

If your SYSVOL is all messed up on every DC, you might have to do an “authoritative restore” using both the D4 and D2 values.

By the way, you should never, ever use the D4 flag on more than one DC, as you will get a lot of collisions and morphed folders. As Microsoft says, the D4 flag should only be set as a last resort.

Quick overview:

1. Stop the NtFrs service on every DC
2. Set the D4 flag on one DC that will be authoritative for the replica set(s). The SYSVOL content will not be moved to the pre-existing folder on the authoritative member.
3. Set the D2 flag on the other DC’s (non-authoritative)
4. Start the NtFrs service on the “D4” DC.
5. Check that Event IDs 13553 and 13516 are logged.
6. If step 5 is ok, start NtFrs on the “D2” DC’s.

For detailed steps, see “How to rebuild the SYSVOL tree and its content in a domain”


References:

FRS event codes: http://support.microsoft.com/kb/308406

What happens in a Journal Wrap?
http://blogs.technet.com/instan/archive/2009/07/14/what-happens-in-a-journal-wrap.aspx

How to rebuild the SYSVOL tree and its content in a domain
http://support.microsoft.com/kb/315457

Using the BurFlags registry key to reinitialize File Replication Service replica sets
http://support.microsoft.com/kb/290762

Backing Up and Restoring an FRS-Replicated SYSVOL Folder
http://msdn.microsoft.com/en-us/library/cc507518(VS.85).aspx