Monday, December 28, 2009

Active Directory database corruption

Below are a few common causes of NTDS database corruption:
================================================
- Hard disk failure (Bad sectors).
- ‘Disk write caching’ enabled on the disk.
- Dirty (unexpected) shutdown of the server.
- Real-time antivirus scanning of the NTDS database and transaction log files.
- Large fragmented database.
- Drive containing the NTDS database is compressed.
- Any other activity that prevents transactional changes from being written to the local copy of the Active Directory database.
- The NTFS file system permissions on the NTDS folder or the root drive are too restrictive.

Below are the events reported in the event log that indicate an NTDS database issue. You can feed these into your monitoring software to detect NTDS database corruption.

Event Type: Error
Event Source: NTDS Replication
Event Category: Replication
Event ID: 2108
Description: Active Directory could not update the following object with changes received from the following source domain controller. This is because an error occurred during the application of the changes to Active Directory on the domain controller.
8409 A database error has occurred.

Event Type: Error
Event Source: NTDS Replication
Event Category: Replication
Event ID: 1084
Description: This message indicates a specific issue with the consistency of the Active Directory database on this replication destination. A database error occurred while applying replicated changes to the following object.

Event Type: Error
Event Source: NTDS ISAM
Event Category: Database Corruption
Event ID: 467
Description: NTDS (540) NTDSA: Index INDEX_0009028F of table datatable is corrupted (0). (The index named may instead be INDEX_0009039a.)
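
As a rough illustration of how a monitoring script might use the event sources and IDs listed above, here is a minimal Python sketch. The record format (a list of dicts with "source" and "event_id" keys) is an assumption made for this example, not the format of any particular monitoring product, and the second sample record is made up.

```python
# Event signatures taken from the entries listed above: (event source, event ID).
NTDS_CORRUPTION_EVENTS = {
    ("NTDS Replication", 2108),
    ("NTDS Replication", 1084),
    ("NTDS ISAM", 467),
}


def find_corruption_events(records):
    """Return the records whose (source, event_id) pair matches a known signature.

    `records` is a hypothetical list of dicts, for example rows exported
    from the Directory Service event log by some other tool.
    """
    return [
        record
        for record in records
        if (record.get("source"), record.get("event_id")) in NTDS_CORRUPTION_EVENTS
    ]


if __name__ == "__main__":
    # Made-up sample data: one matching event and one unrelated event.
    sample = [
        {"source": "NTDS ISAM", "event_id": 467},
        {"source": "Example Source", "event_id": 1},
    ]
    for hit in find_corruption_events(sample):
        print(f"Possible NTDS database issue: {hit['source']} event {hit['event_id']}")
```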

Basic troubleshooting:
================


1. In DSRM mode, check the integrity of the Active Directory database. To do this, type "ntdsutil files integrity" at the command prompt.

If the integrity check indicates no errors, restart the domain controller in normal mode. If the integrity check does not finish without errors, continue to the following steps.

2. In DSRM mode, perform a semantic database analysis. To do this, type the following command at the command prompt, including the quotation marks:
'ntdsutil "sem d a" go'

3. If the semantic database analysis indicates no errors, continue to the following steps. If the analysis reports any errors, type the following command at the command prompt, including the quotation marks:
'ntdsutil "sem d a" "go f"'

4. Perform an offline defragmentation of the Active Directory database, KB 232122 (http://support.microsoft.com/kb/232122/ ).

Reboot into normal mode again and check whether the database is healthy. If the errors continue, or the server does not let you log in in normal mode, do one of the following.

5. Restore from the most recent system state backup that was taken before the NTDS corruption errors appeared.

6. Demote and repromote the domain controller.
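
The order of the steps above can be summarized as a small decision function. This is only a sketch of the flow as described in this post; the function name and its three boolean inputs are made up for illustration, and the strings are the commands and actions quoted in the steps, not something the script runs.

```python
def plan_recovery(integrity_ok, semantic_ok, healthy_after_defrag):
    """Return the ordered list of actions for the given check results.

    All three arguments are hypothetical inputs: the outcome of the
    integrity check, of the semantic analysis, and of the final re-check
    in normal mode after the offline defragmentation.
    """
    steps = ["ntdsutil files integrity"]
    if integrity_ok:
        # Step 1 passed: nothing more to do in DSRM.
        steps.append("restart in normal mode")
        return steps

    steps.append('ntdsutil "sem d a" go')
    if not semantic_ok:
        # Semantic analysis reported errors: run it again with fixup.
        steps.append('ntdsutil "sem d a" "go f"')

    steps.append("offline defragmentation (KB 232122)")
    steps.append("restart in normal mode and re-check")
    if not healthy_after_defrag:
        steps.append("restore system state backup, or demote and repromote")
    return steps


if __name__ == "__main__":
    for action in plan_recovery(False, False, False):
        print(action)
```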

References:
========
"Directory Services cannot start" error message when you start your Windows-based or SBS-based domain controller
http://support.microsoft.com/?id=258062

Issues with Jet Databases on Compressed Drives
http://support.microsoft.com/?id=318116

Event ID 2108 and Event ID 1084 occur during inbound replication of Active Directory in Windows 2000 Server and in Windows Server 2003
http://support.microsoft.com/?id=837932

An "Event ID 467" database corruption error may be intermittently logged in the Directory Services event log on a Windows Server 2003-based domain controller
http://support.microsoft.com/?id=902396

Event ID 1539: Database Integrity
http://technet.microsoft.com/en-us/library/dd941847(WS.10).aspx


- Aby

Saturday, August 1, 2009

Microsoft IT Environment Health Scanner - AD, DNS and Exchange

A new tool released by Microsoft that health-checks your domain by running around 100 tests.

Microsoft IT Environment Health Scanner - Overview
The Microsoft IT Environment Health Scanner is a diagnostic tool that is designed for administrators of small or medium-sized networks (recommended up to 20 servers and up to 500 client computers) who want to assess the overall health of their network infrastructure. The tool identifies common problems that can prevent your network environment from functioning properly as well as problems that can interfere with infrastructure upgrades, deployments, and migration.


Microsoft IT Environment Health Scanner – download
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=dd7a00df-1a5b-4fb6-a8a6-657a7968bd11

Saturday, July 25, 2009

Troubleshooting Error: 1753 ( There are no more endpoints available from the endpoint mapper )



You may get Error: 1753 ( There are no more endpoints available from the endpoint mapper ) when communicating with a Windows server. The error may be reported when a DC is replicating Active Directory with its partner DC, or when two servers are replicating files using FRS, DFSR, and so on.

So what should you interpret from this error?

This error means that the server may not have any ports available for communication.
Ports are used by services on the server that listen on them. In addition, whenever the server establishes a session with a DC or another server, it uses a dynamic source port for that session. When all the available ports on the server are in use and the server cannot allocate another port for communication, it returns the above error.

How-to Troubleshoot:

Connect to the server and collect the output of ‘netstat -ano > netstat.txt’. This output summarizes the network sessions on the server and shows the PID of the process that owns each session.

===================================================
Active Connections

Proto Local Address Foreign Address State PID (PID of the process owning the session)
TCP 0.0.0.0:80 0.0.0.0:0 LISTENING 4
TCP 0.0.0.0:135 0.0.0.0:0 LISTENING 924
TCP 0.0.0.0:445 0.0.0.0:0 LISTENING 4
TCP 0.0.0.0:623 0.0.0.0:0 LISTENING 2460
TCP 0.0.0.0:1311 0.0.0.0:0 LISTENING 2412
TCP 0.0.0.0:3389 0.0.0.0:0 LISTENING 1060
TCP 0.0.0.0:49155 0.0.0.0:0 LISTENING 328
TCP 0.0.0.0:49502 0.0.0.0:0 LISTENING 320
TCP 0.0.0.0:49510 0.0.0.0:0 LISTENING 656
TCP 10.100.0.35:139 0.0.0.0:0 LISTENING 4
TCP 10.100.0.35:139 10.100.0.1:1152 ESTABLISHED 4
TCP 10.100.0.35:445 10.100.0.49:63298 ESTABLISHED 4
TCP 10.100.0.35:445 10.101.0.60:4779 ESTABLISHED 4
TCP 10.100.0.35:445 10.101.0.162:4681 ESTABLISHED 4-> PID of the SYSTEM process
TCP 10.100.0.35:445 10.101.0.164:1467 ESTABLISHED 4
TCP 10.100.0.35:445 10.101.0.170:1193 ESTABLISHED 4
TCP 10.100.0.35:445 10.101.2.92:3153 ESTABLISHED 4
TCP 10.100.0.35:3389 10.101.8.12:2575 ESTABLISHED 1060
TCP 10.100.0.35:49493 10.100.0.113:1025 ESTABLISHED 3424
TCP 10.100.0.35:49497 10.100.0.112:389 ESTABLISHED 320
TCP 10.100.0.35:49499 10.100.0.112:1025 ESTABLISHED 320
TCP 10.100.0.35:49502 10.200.15.26:62352 ESTABLISHED 320
TCP 10.100.0.35:49502 172.27.2.136:55187 ESTABLISHED 320
TCP 10.100.0.35:50673 10.100.0.112:389 CLOSE_WAIT 4428
TCP 10.100.0.35:50902 10.100.0.112:389 CLOSE_WAIT 4428
TCP 10.100.0.35:50914 10.100.0.112:389 CLOSE_WAIT 4428
TCP 10.100.0.35:51498 10.100.0.112:389 CLOSE_WAIT 320
TCP 10.100.0.35:51762 10.100.0.112:1025 TIME_WAIT 0
TCP 10.100.0.35:51774 10.100.0.1:8014 ESTABLISHED 504
TCP 10.100.0.35:51775 10.100.0.111:1025 TIME_WAIT 0
TCP 10.100.0.35:51790 10.100.0.56:139 TIME_WAIT 0
TCP 10.100.0.35:51792 10.100.0.5:135 ESTABLISHED 924
TCP 10.100.0.35:51793 10.100.0.112:445 TIME_WAIT 0
TCP 10.100.0.35:51794 10.100.0.5:135 TIME_WAIT 0
TCP 10.100.0.35:51796 10.100.0.5:49153 TIME_WAIT 0 --> session is closed.
TCP 10.100.0.35:51797 10.100.0.5:49152 TIME_WAIT 0
==========================================================

From the Netstat output, check how many active sessions there are on the server. If a lot of ports are in use, check which process owns most of them.

If one process owns most of the ports, it may indicate a problem where the process may not be releasing ports, or there might be a lot of incoming and outgoing connections to that process.

--> Check whether killing/ending the process temporarily resolves the above error.
--> If the port usage by this process is unusual, it needs to be investigated further.
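
Counting sessions per PID by hand is tedious on a busy server. The following is a minimal Python sketch that tallies the TCP rows of a saved ‘netstat -ano’ output by owning PID. It assumes the plain five-column row layout shown above (protocol, local address, foreign address, state, PID); the function name and the sample text are made up for the example.

```python
from collections import Counter


def count_sessions_by_pid(netstat_text):
    """Count TCP rows per owning PID in the text of `netstat -ano` output."""
    per_pid = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # A TCP row has five columns: proto, local, foreign, state, PID.
        # Header lines, UDP rows and anything else are skipped.
        if len(fields) >= 5 and fields[0] == "TCP" and fields[4].isdigit():
            per_pid[int(fields[4])] += 1
    return per_pid


if __name__ == "__main__":
    # A few rows in the same shape as the output above.
    sample = """
    Proto Local Address Foreign Address State PID
    TCP 0.0.0.0:135 0.0.0.0:0 LISTENING 924
    TCP 10.100.0.35:445 10.100.0.49:63298 ESTABLISHED 4
    TCP 10.100.0.35:445 10.101.0.60:4779 ESTABLISHED 4
    TCP 10.100.0.35:50673 10.100.0.112:389 CLOSE_WAIT 4428
    """
    for pid, sessions in count_sessions_by_pid(sample).most_common():
        print(f"PID {pid}: {sessions} session(s)")
```

To run it against a real capture, read netstat.txt into a string and pass it to the function; the PID at the top of the list is the first process to look at.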

It may be that the port usage is normal for the role of the server but too few ports are available. By default, in Windows 2000/2003, ephemeral ports are allocated from port number 1024 through port number 5000 unless the range is increased using the ‘MaxUserPort’ registry value. The ‘MaxUserPort’ value specifies the highest port number that TCP can assign when an application requests an available user port from the system. ‘MaxUserPort’ accepts values between 5000 and 65534.

The default dynamic port range for TCP/IP has changed in Windows Vista and in Windows Server 2008
http://support.microsoft.com/?id=929851

--> Try increasing the available port range using ‘MaxUserPort’, then reboot the server.
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Create a new REG_DWORD value ‘MaxUserPort’ with a decimal value of 65534.
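
To get a feel for how much headroom this change gives, here is a small Python sketch of the arithmetic. It uses only the figures quoted above (ephemeral ports start at 1024, and ‘MaxUserPort’ is between 5000 and 65534); the function name is made up for the example.

```python
FIRST_EPHEMERAL_PORT = 1024  # lower bound quoted above for Windows 2000/2003


def ephemeral_port_count(max_user_port=5000):
    """Return how many ephemeral ports exist for a given MaxUserPort value."""
    if not 5000 <= max_user_port <= 65534:
        raise ValueError("MaxUserPort must be between 5000 and 65534")
    # Both ends of the range are usable, hence the +1.
    return max_user_port - FIRST_EPHEMERAL_PORT + 1


if __name__ == "__main__":
    print(ephemeral_port_count())        # default range: 3977 ports
    print(ephemeral_port_count(65534))   # after the change: 64511 ports
```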

At times, you will see the above error even though the available port range is not exhausted. In that case, check whether the configuration below is in place.

HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc\Internet
Ports = a restricted range of ports, for example 1024-1124

How to configure RPC dynamic port allocation to work with firewalls

http://support.microsoft.com/kb/154596

--> If the above registry value is set, then try to increase the range of the ports specified by another 100 ports and reboot to check if it helps.
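
As a small illustration of the "another 100 ports" suggestion, the Python sketch below widens a range string of the form shown above. It assumes the value holds a single "low-high" range; if yours lists several ranges or single ports, adjust it by hand. The function name is made up for the example, and the sketch only computes the new string, it does not touch the registry.

```python
def widen_port_range(ports_value, extra=100):
    """Return a 'low-high' range string with the upper bound raised by `extra`."""
    low_text, high_text = ports_value.strip().split("-")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError("range start is above range end")
    # Never go past the highest valid TCP port number.
    new_high = min(high + extra, 65535)
    return f"{low}-{new_high}"


if __name__ == "__main__":
    print(widen_port_range("1024-1124"))  # prints 1024-1224
```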


Hope the above troubleshooting steps help :-)

Script for Metadata Cleanup

A VBS script that performs a metadata cleanup (removes references from AD) for forcefully demoted/removed domain controllers.

Run the script and type in the name of the DC from the list displayed.
· It removes DC references from AD.
· It removes DC references from the AD Sites and Services snap-in.
· It removes DC references from DNS.

Awesome stuff!!

Supported on all Windows OS

Remove Active Directory Domain Controller Metadata
http://www.microsoft.com/technet/scriptcenter/scripts/ad/domains/addmvb04.mspx?mfr=true

The same cleanup can be done manually using the NTDSUtil tool on a healthy writable DC.

Clean up server metadata
http://technet.microsoft.com/en-us/library/cc736378(WS.10).aspx

How to remove data in Active Directory after an unsuccessful domain controller demotion
http://support.microsoft.com/kb/216498