Designing Infrastructure High Availability

IT people, for some reason, seem to have an affinity for designing solutions that use "cool" features, even when those features aren't really necessary. This tendency sometimes leads to good solutions, but more often it creates solutions that fall short of requirements or leave IT infrastructure with significant shortcomings. Other times, "cool" features result in over-designed, unnecessarily expensive infrastructure.

The "cool" factor is probably most obvious in the realm of High Availability design. And yes, I realize that as the cloud becomes more common and prevalent in IT, there is less need to understand the key architectural decisions that go into designing HA; but there are still plenty of companies that refuse to use the cloud, and for good reason. Cloud solutions are not one-size-fits-all solutions. They are one-size-fits-most solutions.

High Availability (also called "HA") is a complex subject with a lot of variables involved. The complexity comes from the fact that there are multiple levels of HA that can be implemented, from light-touch failover to globally replicated, multi-redundant, always-on solutions.

High Availability Defined

HA is, put simply, any solution that allows an IT resource (files, applications, etc.) to remain accessible at all times, regardless of hardware failure. In an HA-designed infrastructure, your files are still available even if the server that normally stores them breaks for any reason.

HA has also become much more common and inexpensive in recent years, so more people are demanding it. A decade ago, any level of HA involved costs that far exceeded those of a normal, single-server solution. Today, HA can be had for as little as an extra half of a single server's cost (though, more often, the total is essentially double the single-server cost).

Because of the cost reduction, many companies have started demanding HA, and because of the cool factor, a lot of those companies have been spending far too much on it. Part of why this happens is the history of HA in IT.

HA History Lesson

Prior to the development of virtualization (the technology that allows multiple "virtual" servers to run on a single physical server), HA was prohibitively expensive and required massive storage arrays, large numbers of servers, and a whole lot of configuration. Then VMware implemented a feature called vMotion that allowed a virtual server to be moved between physical hosts at the touch of a button, which opened the door to VM-level High Availability. This signaled a kind of renaissance in High Availability, because it allowed servers to survive a hardware failure for a fraction of the cost normally associated with HA. There is a lot more involved in this shift than just vMotion (SANs, cheaper high-speed internet, and similar advancements played a big part), but the shift began about the time vMotion was introduced.

Once companies started realizing they could have servers that were always running, regardless of hardware failures, an unexpected market for high-availability solutions popped up, and software developers started building better HA techniques into their products. Why would they care? Because there are a lot of situations where a server solution stops working properly that aren't related to hardware failure, and vMotion was only capable of handling HA in the event of hardware failures.

VM HA vs Software HA

The most common mistake I see people making in their HA designs is accepting the assumption that VM-level High Availability is enough. It most definitely is not. Take Exchange Server as an example. There are a number of problems that can occur in Exchange that will prevent users from accessing their email: log drives fill up, forcing databases to dismount; IIS can fail, preventing users from accessing their mailboxes; databases can become corrupted, resulting in a complete shutdown of Exchange until the database can be repaired or restored from backup. VM HA does nothing to help when these situations come up.

This is where the Exchange Database Availability Group (DAG) comes into play. A DAG involves constantly replicating changes to Mailbox Databases to additional Exchange servers (as many of them as you want, but 2 or 3 is most common). With a DAG in place, any issue that would cause a database to dismount on a single Exchange server instead results in a failover, where the database dismounts on one server and mounts on another almost instantly (within a few seconds or less).
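
For reference, standing up a basic two-copy DAG takes only a few Exchange Management Shell commands. This is a minimal sketch rather than a deployment guide; the server names (EX1, EX2), witness server (FS1), and database name (DB1) are placeholders for your own environment:

# Create the DAG and specify a file share witness (all names are examples)
New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer FS1 -WitnessDirectory C:\DAG1
# Add both Mailbox servers to the DAG
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX1
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EX2
# Seed a second copy of the database onto the other server
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EX2 -ActivationPreference 2

Once the second copy is healthy, a dismount on one server triggers an automatic failover to the copy on the other.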

The DAG alone, however, doesn't provide full HA for Exchange, because IIS failures will still cause problems, and if there is a hardware failure, you have to change DNS records to point clients to the surviving server. This is why a load balancer is a necessary part of a true HA solution.

Load Balancing

A load balancer is a network device that allows users to access two (or more) servers through a single IP address. Instead of having to choose which server to talk to, you just talk to the load balancer, and it decides which server to direct you to automatically. The server chosen depends on a number of factors. Among them is, of course, how many people are already on each server, since the primary purpose of a load balancer is to balance the load between servers more or less equally.

More importantly, though, most load balancers are capable of performing health checks to make sure the servers are responding properly. If a server fails a health check for any reason (for instance, if it stops responding to HTTP requests), the load balancer stops directing users to that server, ensuring that whatever failure occurred on the first server doesn't leave users unable to access their data.
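
To make the health check concept concrete, here is a rough PowerShell sketch of the kind of HTTP probe a load balancer runs against each server. Real load balancers do this natively; the server names, probe URL, and five-second timeout are illustrative assumptions only:

# Probe each server's OWA health page; a failed request marks the node down
$servers = "ex1.domain.com", "ex2.domain.com"
foreach ($server in $servers) {
    try {
        # Throws on timeouts and on non-success status codes alike
        Invoke-WebRequest -Uri "https://$server/owa/healthcheck.htm" -UseBasicParsing -TimeoutSec 5 | Out-Null
        "$server : healthy"
    } catch {
        "$server : unhealthy, stop sending users here"
    }
}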

Costs vs. Benefits

Adding a load balancer to the mix, of course, increases the cost of a solution, but that cost is generally justified by the benefit such a solution provides. Unfortunately, many IT solution designs fail to take this trade-off into account.

If an HA solution requires any kind of manual intervention, the time required to notify IT staff and complete the switch varies heavily and can be anywhere from 5 minutes to several hours. From an availability perspective, even this small amount of time can have a huge impact, depending on how much money is assumed to be "lost" because of a failure. Here comes some math (and not just the trigonometry involved in this slight tangent).

Math!

The easiest way to determine whether a specific HA solution is worth implementing involves a few simple calculations. First, though, we have to make a couple of assumptions. Neither is going to be completely accurate, but they're meant to help determine whether an investment like HA is worth making (managers and CEOs, take note):

  1. A critical system that experiences downtime results in the company being completely unable to make money for the period of time that system is down.
  2. The amount of money lost during downtime equals the percentage of the year the system is down multiplied by the organization's expected annual revenue.

For instance, if a company's revenue is $1,000,000 annually, it makes an average of $2 per minute (rounded up from $1.90), so you can assume that 5 minutes of downtime costs that company about $10 in gross revenue. The cheapest load balancers cost about $2,000 and last about 5 years, so the load balancer pays for itself once it has prevented about 1,000 minutes of downtime over its lifetime. That's less time than most organizations spend taking a single server down for updates over the same period. With software HA in place, updates don't cause downtime if done properly, so the cost of a load balancer is covered just by keeping Exchange running during updates (this isn't possible with VM HA alone). That doesn't cover the cost of the second server, of course (Exchange runs well on a low-end server, so about $5,000 for server and licenses is what it would cost). Now imagine the company makes $10,000,000 in revenue, or think about a company with revenue of several billion dollars a year. By these calculations, HA very quickly becomes a necessity.
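
If you want to run these numbers for your own environment, the arithmetic is simple enough to script. A minimal sketch using the example figures from above (swap in your own revenue and hardware costs):

# Break-even downtime calculation (all figures are the examples from the text)
$annualRevenue    = 1000000
$solutionCost     = 2000                                # the load balancer
$revenuePerMinute = $annualRevenue / (365 * 24 * 60)    # about $1.90 per minute
$breakEvenMinutes = $solutionCost / $revenuePerMinute   # about 1,050 minutes
"Revenue per minute: {0:C2}" -f $revenuePerMinute
"Downtime prevented to recoup the cost: {0:N0} minutes" -f $breakEvenMinutes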

VM HA vs Software HA Cost/Benefit

Realistically, the cost difference between VM HA and software HA is extremely low for most applications. Nearly everything Microsoft sells has HA capability baked in that can be implemented at very low cost, now that the clustering features are included in Windows Server 2012 Standard. So the cost of implementing software HA instead of VM HA is almost always justifiable, and VM HA is rarely the correct solution. Mixing the two isn't a good idea either. Why? Because it requires twice the storage and network traffic, and it provides absolutely no additional benefit, other than the fact that VM replication is kinda cool. Software HA requires two copies of the server to function, and each copy should run on a separate VM host to protect against the hardware failure of one host (separate hosts are required for VM HA as well, so the only increased cost is the OS licensing).

Know When to Use VM HA

Please note, though, that I am not saying you should never use VM HA; I'm saying you shouldn't use VM HA when software HA is available. You just need to know when to use it and when not to. If software HA isn't possible (there are plenty of solutions out there with no HA capabilities of their own), VM HA is necessary and provides the highest level of availability available for those products. Otherwise, use the software's own HA capabilities, and you'll save yourself a lot of headaches.


Do I need Anonymous Relay?

Problems

If you have managed an Exchange server in the past, you've probably been asked to set things up to allow printers, applications, and other devices to send email through the Exchange server. Most often, the solution to this request is to configure an Anonymous Open Relay connector. The first article I ever wrote on this blog was on that very subject: http://wp.me/pUCB5-b. If you need to know what a relay is, go read that post.

What people don't always do, though, is consider whether they actually need an anonymous relay in Exchange. I didn't really cover that subject in my first article, so I'll cover it here.

When you Need an Open Relay

There are three factors that determine whether an organization needs an open relay. Anonymous relay is only required if you meet all three; any other combination can be worked around without anonymous relaying. I'll explain how later, but for now, here are the three factors you need to meet:

  1. Printers, Scanners, and Applications don’t support changes to the SMTP port used.
  2. Printers, Scanners, and Applications don’t support SMTP Authentication.
  3. Your system needs to send mail to email addresses that don’t exist in your mail environment (That is to say, your system sends mail to email addresses that you don’t manage with your own mail server).

At this point, I feel it's important to point out that anonymous relays are inherently insecure. You can make them more secure by limiting access, but using an anonymous relay always places a technical solution in the environment that is designed specifically to circumvent normal security measures. In other words, do so at your own informed risk, and only when it's absolutely required.

The First Factor

If the system you want to send SMTP messages from doesn't allow you to send email over a port other than 25, you will need an open relay if the messages the system sends are addressed to email addresses outside your environment. That distinction is important. The SMTP protocol defines port 25 as the default port for mail exchange, and it's the port every email server uses to receive email from all other systems. Based on modern security concerns, that means mail sent to port 25 is only accepted if the recipient exists on the receiving mail server. So if you are using the abc.com mail server to send messages to bob@xyz.com, you will need to use a relay server to do it, or the mail will be rejected, because relay is (hopefully) not allowed.

The Second Factor

If your system doesn't allow you to specify a username and password in its SMTP configuration, then it will have to send messages anonymously. For our purposes, an "anonymous" user is a user that hasn't logged in with a username and password. SMTP servers usually talk to one another anonymously, so anonymous SMTP access is common and is actually necessary for mail exchange to function. But SMTP servers will, by default, only accept anonymous messages destined for email addresses that they manage. So if the abc.com server receives a message destined for bob@abc.com, it will accept it; however, it will reject messages to jim@xyz.com, *unless* the SMTP session is authenticated. In other words, if bob@abc.com wants to send jim@xyz.com a message, he can open an SMTP session with the abc.com mail server, enter his username and password, and send the message. The abc.com server will then accept the message, contact the xyz.com mail server, and deliver it. It doesn't need a username and password to do this, because the xyz.com mail server knows who jim@xyz.com is; it just accepts the message and delivers it to the correct mailbox. So if you are able to set a username and password in the system you need to send mail with, you don't need anonymous relay.

The Third Factor

Most of the time, applications and devices only need to send messages to people who have mailboxes in your environment, but there are plenty of occasions where an application or device needs to send mail to people *outside* the environment. If you don't need to send to "external recipients," as these users are called, you can use the Direct Send method outlined in the solutions below.

Solutions

As promised, here are the solutions you can use *other* than anonymous relay to meet the needs of your application if it doesn’t meet *all three* of the deciding factors.

Authenticated Relay (Factor #3 applies)

In Exchange, there is a default receive connector that accepts all messages sent by authenticated users on port 587, so if your system allows you to set a username and password and to change the port, you don't need anonymous relaying. Just configure the system to use your Exchange Hub Transport server (or CAS in 2013) on port 587, and it should work fine, even if your requirements include the last deciding factor, sending mail to external recipients.
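
Before touching the device itself, it can help to confirm that authenticated submission on port 587 works at all. Here's a quick sketch you can run from PowerShell; the server name and addresses are placeholders for your own:

# Prompt for the credentials the device or application will use
$cred = Get-Credential
# Send a test message through the client submission connector on port 587
Send-MailMessage -SmtpServer "mail.domain.com" -Port 587 -UseSsl -Credential $cred -From "device@domain.com" -To "someone@outside.com" -Subject "Relay test" -Body "Authenticated relay on port 587 works."

If that message reaches the external address, any system that can authenticate and use port 587 can relay the same way.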

Direct Send (Factor #2 applies and/or #3 doesn’t apply)

If your system only needs to send messages to abc.com users via the abc.com mail server, you don't need to relay or authenticate at all. Just configure the system to send mail directly to the mail server. The direct send method uses SMTP as if it were one mail server talking to another, so it works without extra configuration. Just note that if you have a spam filter that enforces SPF, or one that blocks messages sent from addresses in your environment to addresses in your environment, these messages will likely get blocked, so make allowances as needed.
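
Direct send is just as easy to test. Note that there are no credentials involved, and the recipient must be an address the server manages; the names below are placeholders:

# Anonymous delivery on port 25 to a recipient the server owns
Send-MailMessage -SmtpServer "mail.abc.com" -Port 25 -From "scanner@abc.com" -To "bob@abc.com" -Subject "Direct send test" -Body "No relay or authentication required."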

Authenticated Mail on Port 25 (Only factor #1 applies)

If the system doesn't allow you to change the port number it uses, but does allow you to authenticate, you can make a small change to Exchange to get it working. Open the default receive connector (the Default Frontend receive connector on Exchange 2013 and later) and add Exchange Users to the permission settings on the Security tab, as marked with the red X below:

[Screenshot: the Security tab of the Default Frontend receive connector, with the Exchange users permission group marked]

Once this setting is changed, restart the Transport service on the server and you can then perform authenticated relaying on port 25.
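
If you prefer the shell to the GUI, the same change can be made with Set-ReceiveConnector. A minimal sketch, assuming a server named EX1 with the default connector name; note that -PermissionGroups overwrites the existing list rather than appending, so include the groups you already have:

# Check the current permission groups before changing anything
Get-ReceiveConnector "EX1\Default Frontend EX1" | Format-List PermissionGroups
# Add ExchangeUsers alongside the existing groups (adjust the list to match your output)
Set-ReceiveConnector "EX1\Default Frontend EX1" -PermissionGroups AnonymousUsers,ExchangeServers,ExchangeLegacyServers,ExchangeUsers
# Restart transport so the change takes effect
Restart-Service MSExchangeTransport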

Conclusion

If you do find you need an anonymous relay, by all means set one up with careful consideration, but always be conscious of the fact that it isn't always necessary. As always, comments and questions on this article and others are welcome, and I'll do my best to answer as soon as possible.

Exchange Autodiscover – The Active Directory SCP

In a previous post, I explained how you can use an SRV record to resolve certificate issues with Autodiscover when your internal domain isn't the same as your email domain. This time, I'm going to explain how to fix things by making changes to Exchange and Active Directory that allow Autodiscover to function normally without using an SRV record, or any DNS records at all for that matter. This only works if the computers that access Exchange are members of your domain and you configure Outlook using user@domain.local, because that is how Exchange hands out Autodiscover configuration URLs by default when no DNS or SRV records are involved. However, if you have a private domain name in your AD environment (which you should try to avoid when building new environments these days), you will always get a certificate error in Outlook, because third-party CAs will no longer include private domains on SAN certificates. To fix this little problem, I'll first give you some background on a lesser-known feature of Active Directory called the Service Connection Point (SCP).

Service Connection Points

SCPs play an important role in Active Directory. They are basically entries in the Active Directory Configuration partition that define how domain-based users and computers can connect to various services on the domain; hence the name Service Connection Point. They typically show up in an Active Directory tool that a lot of people overlook, but that has been *extremely* important to Exchange since 2007 was released: Active Directory Sites and Services (ADSS). ADSS is typically used to define replication boundaries and paths for Active Directory domain controllers, and Exchange uses the information in ADSS to direct users to the appropriate Exchange server in large environments with multiple AD sites. But you can also use it to view and change the SCPs that are set up in your AD environment. You do this with a feature that is overlooked even more than ADSS itself: the Services node. This can be exposed by right-clicking the Active Directory Sites and Services object with ADSS open, selecting View, then clicking "Show Services Node," like this:

[Screenshot: exposing the Services node by right-clicking Active Directory Sites and Services and selecting View > Show Services Node]

Once you open the Services node, you can see a lot of the stuff AD uses on the back end to make things work in the domain. Our focus here, however, is Exchange, so go into the Microsoft Exchange node. You'll see your Exchange organization's name there, and you can expand it to view all of the Service Connection Points related to Exchange. I wouldn't recommend making any changes in here unless you really know what you're doing, since this view is very similar to ADSIEdit in that it exposes things that can very rapidly break Active Directory.

Changing the Exchange Autodiscover SCP

If we look into the Microsoft Exchange services tree, the first thing we see is the organization name. Expand it, then navigate to the Administrative Group section. In any Exchange version that supports Autodiscover, this shows up as First Administrative Group (FYDIBOHF23SPDLT). If the long string of letters confuses you, don't worry about it; it's just a joke the developers of Exchange 2007 put into the system. It's a +1 Caesar cipher that decodes to EXCHANGE12ROCKS. Programmers don't get much humor in life, so we'll just have to forgive them for that and move on. Once you expand the Administrative Group node, you'll see most of the configuration options for Exchange that are stored in AD. Most of these shouldn't be touched. For now, expand the Servers node. This is the section that defines all of your Exchange servers and how client systems can connect to them. Mostly you'll just see folders, but if you right-click any of them and select Properties, you should see an Attribute Editor tab (in Windows 2008+, at least; prior to that you have to use ADSIEdit to expose the attributes behind the Services node). There are lots of cool things you can do in here, like change the maximum size of your transaction log files, implement strict limits on the number of databases per server, or change how much the database grows when there isn't enough free space to commit a transaction. What we're focusing on here is Autodiscover, though, so expand the Protocols tree, then go to Autodiscover, as seen below.

[Screenshot: the Autodiscover node under Protocols in the Services tree, showing the CAS server]

Now that we're here, we can see each of the Exchange CAS servers in the environment. Mine is called Exchange2013, because I am an incredibly creative individual (except when naming servers). Again, you can right-click the server name, select Properties, then go to the Attribute Editor tab to view everything you can control about Autodiscover here. It looks like a lot of stuff, right? Well, you really only need to know the purpose of two attributes, and you'll only ever want to change one of them. The rest are defined and used by Exchange to do… Exchangey stuff (technical term). The two attributes you should know are "keywords" and "serviceBindingInformation".

  • keywords: This attribute, as you may have noticed, defines the Active Directory site the CAS server is located in. It is filled in automatically by the Exchange subsystem in AD based on the IP address of the server. If you haven't created subnets in ADSS and assigned them to the appropriate sites, this value will always be the default site. If you change this attribute manually, it will be overwritten in short order, and you'll likely break client access until the rewrite occurs. The *purpose* of this value is to allow the Autodiscover service to assign a CAS server based on AD site: if you have two Exchange servers, one in site A and another in site B, this value ensures that clients in site A are configured to use the CAS server in their own site rather than crossing a replication boundary to reach site B.
  • serviceBindingInformation: Here's the value we're most concerned with in this post! This value defines where domain-joined computers go for Autodiscover information when you configure Outlook with username@domain.local in an environment with a private AD domain name. By default, this value is the full FQDN of the server as it appears in the AD domain's DNS forward lookup zone. So when domain-joined computers configure Outlook using user@domain.local, they look this information up automatically, regardless of any Autodiscover, SRV, or other records that exist in the internal DNS zone. Note: if your email domain is different from your AD domain, you may need to use the AD domain as the email domain when configuring Outlook for the SCP lookup to occur. If you don't want to use the AD domain to configure users, make sure there is an Autodiscover DNS record in the DNS zone you use for your email domain.

Now, since we know that the serviceBindingInformation value sets the URL Outlook will use for Autodiscover, we can change it directly through ADSS or ADSIEdit by replacing what's there with https://servername.domain.com/Autodiscover/Autodiscover.xml. Once you do this, internal clients on the domain that use user@domain.local to configure Outlook will be directed to a name that is on the certificate, and Outlook can be configured without certificate errors.

Now, if you're a little nervous about making changes this way, you can instead change the value of the serviceBindingInformation attribute using the Exchange Management Shell, by running the following command:

Get-ClientAccessServer | Set-ClientAccessServer -AutoDiscoverServiceInternalUri "https://servername.domain.com/Autodiscover/Autodiscover.xml"

This directly modifies the Exchange SCP in AD and allows your clients to use Autodiscover without getting certificate errors. Not too difficult, and you don't have to worry about split DNS or SRV records. Note, though, that as with the SRV record, you will be forcing your internal clients to go out of your network to the Internet to access your Exchange server. To keep that from happening, you will have to maintain an internal version of your external DNS zone with internal IPs assigned in all the A records. There's just no way around that with private domain names any longer.
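
To verify the change (or to see what your SCP currently points at before you touch anything), read the value back:

# Display the Autodiscover URL published in the SCP for each CAS
Get-ClientAccessServer | Format-List Name,AutoDiscoverServiceInternalUri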

Final Note

Depending on your Outlook version and how your client machines connect, some additional configuration may be needed to fully resolve certificate errors. Specifically, you may need to modify some of the Exchange virtual directory URLs to make sure they return the correct information to Autodiscover.

Avoiding Issues with Certificates in Exchange 2007+

For reference, modern Active Directory best practices can help you avoid certificate errors in Exchange; go here for information about modern AD domain naming best practices. If you follow that best practice when creating your AD environment, you won't have to worry much about certificate errors in Exchange, as long as the certificate you use lists the Exchange server names. However, if you can't build a new environment and aren't already planning to migrate to a new AD environment in the near future, it isn't worth the effort just for this, since small configuration changes like the one above can fix certificate errors.

Resolving DirectAccess Connectivity Issues (The easy solution)

DirectAccess is a relatively new approach to remote connectivity for domain-joined devices. It is basically an always-on VPN that uses IPsec tunneling to give external client machines access to the internal network. There is no need to deploy VPN profiles or handle RADIUS authentication and other such complexities, but the system does use PKI (Public Key Infrastructure) to secure the VPN tunnel. DirectAccess is also always available to external clients, meaning you don't have to open a VPN session manually, and it starts *prior* to logon, which makes the annoying issue of remote user password resets easier to handle. However, there are situations where DirectAccess can fail, leaving you without DNS functionality and with a lot of headaches.

The NRPT

DirectAccess utilizes a feature called the Name Resolution Policy Table (NRPT), which controls how DirectAccess handles name resolution for specific domains. Entries in the NRPT control where client machines look for name resolution for specific domains and allow finer control over what happens when client machines are using DirectAccess for connectivity. For instance, you can use the NRPT to force client machines to query external DNS servers for certain hostnames or zones while using internal DNS for everything else, and vice versa. There are really only two ways to modify the NRPT: through the registry (which I don't recommend) and through Group Policy using the Name Resolution Policy node. TechNet has information on managing the NRPT here: http://technet.microsoft.com/en-us/library/ee649207%28v=ws.10%29.aspx
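
On Windows 8/Server 2012 and later clients, you can also inspect the NRPT from PowerShell before touching anything, which makes a good first diagnostic step:

# Show the effective NRPT policy applied to this client
Get-DnsClientNrptPolicy
# List the configured NRPT rules themselves
Get-DnsClientNrptRule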

The Problem with DirectAccess Failures

Usually when DirectAccess stops communicating, it's because the NRPT isn't configured properly. If this happens, you may run into a situation where systems are unable to ping domain controllers or other systems by NetBIOS name or FQDN. This can be a huge problem, because once DirectAccess fails, systems typically can no longer communicate with the domain to retrieve corrected NRPT information, since that information is deployed via GPO.

Fixing the Communication Issue

If something corrupts the DirectAccess configuration on a client machine, or if DirectAccess isn't properly configured in the first place, it may be necessary to reset the NRPT on the client machine to fix the problem. The only way to modify the NRPT on a client machine is through the registry. If you're experienced enough with DirectAccess, you may be able to resolve the issue directly in the registry, but it is usually easier to remove the existing NRPT entries on the client machine entirely. This has to be done in the registry at the following location: HKLM\Software\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig (pictured below)

[Screenshot: the DnsPolicyConfig registry key containing two DA-{GUID} subkeys]

Note the two entries there. Both are named DA-{GUID}; the DA stands for DirectAccess. Remove any entries with the DA- prefix and reboot. Once this is done, the system will communicate without DirectAccess and will be able to connect to the domain to retrieve new NRPT information if any is available.
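
Here's a minimal PowerShell sketch of that cleanup, run from an elevated prompt on the affected client. It removes only the subkeys with the DA- prefix, but as with any registry surgery, look at what's there before you delete:

# The key where GPO-deployed NRPT entries live on the client
$nrptPath = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig'
# Remove only the DirectAccess (DA-) entries, then reboot
Get-ChildItem $nrptPath | Where-Object { $_.PSChildName -like 'DA-*' } | Remove-Item -Recurse
Restart-Computer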

Active Directory Domain Naming in the Modern Age

One of the subjects that doesn't get a whole lot of coverage in IT is how to name an Active Directory domain. There's a lot of confusion around how and why to name a domain, primarily because the best practices for doing so have changed several times over the past decade or so. A short discussion I got into on my last post prompted me to go into more depth on the subject, since there's a lot of misunderstanding about it.

Current Strategies

There are currently two basic strategies that are in common practice when IT administrators and systems engineers decide on their domain names.

  1. Use an Internal private domain name
  2. Use an External Public domain name

An internal private domain name is something like domain.local or company.internal. A private domain is essentially a domain that is not resolvable on the public Internet. This is because .internal and .local are not Top Level Domains (TLDs) recognized by the Internet Corporation for Assigned Names and Numbers (ICANN), the organization that regulates domain names and IP addresses on the Internet. Because ICANN doesn't recognize those TLDs, no public DNS servers host zones that include them, so there is no way a domain like domain.local or company.internal can be resolved to an IP address on the public Internet.

An External Public domain name is something like domain.com or company.net. These are domain names that use TLDs recognized by ICANN and that can be resolved by public DNS servers.

What most people don't know, however, is that Microsoft no longer really recommends either of these strategies for domain naming. Before I explain the new recommendations, I'll cover the traditional pros and cons of each strategy, along with a little of their history, in case you end up in an environment that uses them. My main purpose with this blog is to help people understand best practices and why they're best practices, rather than just telling people what they are.


Public Domains in Active Directory

Using a public domain for your Active Directory domain name may seem like a great idea if you own a public domain (which most businesses now do). However, there are some distinct disadvantages to doing so, and Microsoft used to recommend that businesses avoid using their public domain name for their internal domain for a number of reasons.

The big reason to avoid using the same domain name internally and externally is a phenomenon known as split DNS. Split DNS is where two completely separate DNS servers or server groups manage the exact same DNS forward lookup zone. Split DNS isn't necessarily a bad thing, but it greatly increases the administrative burden of managing DNS, because every time you add a server that needs to be reachable both publicly and privately, you have to create a record on both DNS servers. Active Directory also throws an additional wrench into this, because domain.com is always reserved as a host name for discovering domain controllers. If your external DNS has an A record that points domain.com at a web server, that web server will never be directly accessible by that name from inside your network.

Even more annoying, though, is the possibility of a host name that points to a completely different server externally than it does internally. In this situation, users who go to http://www.company.com from outside the company network are directed to one web server (say, a public web page for the general public), while users who go there from inside the network reach a different server (an intranet publication site, for instance). A user who connects to the company network by VPN can end up with the public web page stuck in their DNS cache and be unable to access the intranet site without clearing that cache. I've seen this happen in a couple of different environments, and it results in far more helpdesk calls than necessary, which in turn costs the company money.

Private Domains in Active Directory

Because of the limitations inherent in using public domains for Active Directory, Microsoft recommended using private domains for Active Directory domain names from the release of Windows 2000 until about 2007. With a private internal domain, there is no split DNS to manage: users hit external DNS servers when accessing the public Internet and internal DNS servers when accessing internal resources, with no need to maintain the same zone in two places. It was also thought that a private domain added a level of security, since no one outside the company would know the internal domain name. That, of course, is just security through obscurity, and doesn't really provide much security at all (if any). Because of the decreased administrative burden and presumed added security, administrators used private domain names for Active Directory for well over a decade.

Big Changes, the Future is Now

But, as things so often do, technology changed significantly, and Microsoft made a number of changes to its products that necessitated changes to its best practices. One of the biggest changes to Microsoft's solution base was Exchange 2007.

Exchange 2007 represented a massive paradigm shift in Microsoft's email platform. It was vastly different from every Exchange version before it, primarily because the needs of the corporate world changed drastically between the releases of Exchange 2003 and 2007. Email needed to be more secure, and Microsoft was working toward the now-burgeoning world of IT automation. One of the most important automation features Exchange 2007 introduced was Autodiscover.

Autodiscover allowed users to configure their mail clients without having to enter server names, user names, and all the other stuff that got in the way of users and their email and created loads of extra work for IT support personnel. All it required was an entry in DNS that let the mail client reach an XML file containing all the necessary settings, which the client could then use to set itself up without user interaction. However, the mechanism behind it conflicted with the existing best practice for Active Directory domain naming. To work, Autodiscover required SSL certificate validation, and Exchange automatically configured itself to hand out the Active Directory domain name as the server name for clients on the internal network. With the then-current best practice of private domain names, that meant you had to reconfigure Exchange to point clients at whatever name was actually on the server's SSL certificate. If you didn't, clients got certificate error messages every time they used a mail client (for more details, read my blog post here: https://acbrownit.wordpress.com/2012/12/20/internal-dns-and-exchange-autodiscover/).

The problems with Autodiscover were actually pretty simple to solve with certificates that support Subject Alternative Names (SAN). These certificates are valid for any server name listed on the certificate, so most companies simply added their internal domain name to their SAN certificates and stopped worrying about the Autodiscover issues.

However, times change, and the major public third-party Certificate Authorities (the people who sell SSL certificates) are now refusing to issue SSL certificates that include .local and other common private TLDs. Why? Because a few years ago, ICANN announced that, for a fee, companies and individuals could register almost any TLD they could think of for distribution to public DNS servers around the Internet. This meant that .local and .internal could potentially resolve on the public Internet! And part of the third-party CA chain of trust requires CAs to ensure that whoever purchases an SSL certificate owns all the domain names used on it. This causes a huge problem for Autodiscover and similar SSL-dependent features in Active Directory environments with private domain names: the private domain name can now *never* be added to the SSL certificate (at least until .local or whatever TLD you used becomes publicly resolvable), so any attempt to connect to a .local server by name will generate a certificate error.

The New Best Practice

I call the current best practice new despite the fact that Microsoft has recommended it since 2007. I do that because it wasn't necessary to follow it until the changes to SSL certificate issuance policies; you could just add .local to the certificate and be done with it. You can't anymore, so we have to use a different approach to naming our domains.

So, the new best practice is this: for your Active Directory domain name, use a subdomain of your public domain. What does that mean? Well, if your company's public domain is company.com, set your Active Directory domain name to something like internal.company.com or private.company.com. You can use just about anything you want for the subdomain, as long as the parent domain is a public domain name that you own (ownership is extremely important; never pick a public domain name that you don't own).

Why is this the Best Practice?

The answer here is relatively simple: it gives you an AD domain name for which a public Certificate Authority will issue SSL certificates, and you don't have to manage a split DNS zone. It's basically the best of both domain naming strategies. There is, of course, a small drawback to this strategy, which is part of why it hasn't been widely adopted yet (aside from lack of publicity): you have to type more when entering FQDNs on your internal network. This is a minuscule issue, but one that seems to bother IT admins around the world. If it bothers you, I'd suggest using a short subdomain like ad.company.com, or even a single letter, like i.company.com.

Avoiding Issues with Certificates in Exchange 2007+

For reference, modern Active Directory best practices can help you avoid certificate errors in Exchange; I wrote a post on the best practice that helps you avoid Exchange certificate issues here. If you follow it when creating your AD environment, you won't have to worry about certificate errors in Exchange. However, if you can't build a new environment and aren't already planning to migrate to a new AD environment in the near future, it isn't worth the effort just for this, since small configuration changes like the one above can fix certificate errors.


Internal DNS and Exchange Autodiscover


The Issue

By now, anyone who has managed, deployed, or worked with an Exchange 2007 or later environment should be familiar with Autodiscover. If you aren't yet, I'll give a short explanation of what it is and how it works.

Autodiscover is a feature that allows any mail client that supports it to configure the appropriate server settings for communication automatically, so you don't have to input everything manually. It's very handy. Unfortunately, you can end up with a lot of headaches related to Autodiscover once you start having to deal with certificates. The issues you may run into are specifically limited to Exchange organizations whose domain name uses a non-public TLD like domain.local, or a public domain name they don't actually own and therefore can't use externally. Incidentally, this is one of the reasons Microsoft has started recommending the use of public domain names for Active Directory domains.

If the domain in your Exchange AD environment isn't publicly usable, you will run into certificate errors when mail clients use Autodiscover. This becomes particularly problematic when you use Exchange 2013 and try to use HTTPS for Outlook Anywhere, because Microsoft now enforces certificate validity in Exchange 2013's Autodiscover features (note, though, that Outlook Anywhere will be configured to use HTTP only when your Exchange server certificate is determined to be invalid in Exchange 2013). With Exchange 2007 and 2010, you will get a certificate error every time you open Outlook, generally stating that the name on the certificate is not valid.

The Cause

To solve the issue with certificates, you need to configure your environment so Autodiscover takes the appropriate action. By default, Autodiscover attempts to communicate with a series of URLs based on the client's email address (for external users) or domain name (for internal users). It follows this pattern when checking for Autodiscover services:

1. Autodiscover first attempts to find the Autodiscover configuration XML file at the domain name of the SMTP address used in configuration (because domain-joined computers configure themselves automatically by default, this matches the internal domain). For example, the first place Autodiscover looks is https://domain.com/Autodiscover/autodiscover.xml for external addresses; substitute domain.local for domain.com to see what it looks for on internal clients.

2. If the Autodiscover file is not found at domain.com (or domain.local), the client attempts to connect to https://autodiscover.domain.com/Autodiscover/Autodiscover.xml (again, replace domain.com with domain.local for internal clients). This is where the typical recommendation to have an A record for autodiscover in your DNS pointing to the mail server comes from. For this step to validate, autodiscover.domain.com also needs to be a SAN on the SSL certificate installed on the Exchange server.

3. If Autodiscover information cannot be found in either of the first two steps, the client attempts to use a Service Locator (SRV) record in DNS to determine the appropriate location of the configuration file. This record points the Autodiscover lookup to a specific host for getting the configuration it needs.
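
You can walk this lookup chain yourself from a client with Resolve-DnsName (Windows 8/Server 2012 and later); domain.local and the host names below are the placeholders from the steps above:

# Steps 1 and 2: do the A records exist?
Resolve-DnsName -Name domain.local -Type A
Resolve-DnsName -Name autodiscover.domain.local -Type A
# Step 3: is there a Service Locator record for Autodiscover?
Resolve-DnsName -Name _autodiscover._tcp.domain.local -Type SRV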

Because of the way this works, some configuration is necessary to get Autodiscover working correctly. This usually involves adding Subject Alternative Names to the SSL certificate on your Exchange server, so that the many host names used can be validated against the certificate.

The problem lately, though, is that many third-party Certificate Authorities are beginning to deny requests for Subject Alternative Names that aren't publicly resolvable (there are valid security reasons for this that I won't go into in this post, but maybe later). As a result, you won't be able to get a valid SSL certificate that lists domain.local as a SAN. This means the automated steps Exchange uses for Autodiscover configuration will always fail on an internal domain whose name is not publicly accessible or not owned.

The Solution

There are actually two ways to solve the certificate issues here. The first is to prevent Outlook from automatically entering a user's information when the profile is created. That results in more work for you and your users, so I don't recommend it. The other is to leverage the last step of the Autodiscover search to force it to look at a host name that is listed on the certificate. This is actually fairly simple to do. Follow these steps to configure the Service Locator record in your internal domain:

  1. Open the DNS manager on one of your Domain Controllers.
  2. Expand out the management tree until you can see your Internal Domain’s Forward Lookup Zone. Click on it, and make sure there are no A records for autodiscover.domain.local in the zone.
  3. Once no autodiscover A records exist, right click the Zone name and select Other New Records.
  4. Select Service Location (SRV) from the list.
  5. Enter the settings for the record. For Autodiscover these are: Service: _autodiscover, Protocol: _tcp, Priority: 0, Weight: 0, Port number: 443, and Host offering this service: the public name of your mail server (mail.domain.com, for example).
  6. Hit OK to finish adding the record (or create it from PowerShell, as sketched below).
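
If you'd rather script the record than click through the GUI, the DnsServer module on Server 2012 and later can create it. A minimal sketch; the zone name and mail host are placeholders for your own values:

# Create the _autodiscover._tcp SRV record pointing clients at a name that is on the certificate
Add-DnsServerResourceRecord -Srv -ZoneName "domain.local" -Name "_autodiscover._tcp" -DomainName "mail.domain.com" -Priority 0 -Weight 0 -Port 443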

Once the SRV record is added to the internal DNS zone, Outlook and other Autodiscover clients that attempt to configure themselves with a domain.local SMTP address will work properly, without certificate errors, on all versions of Exchange.

Other Nifty Stuff

There are some additional benefits to using a Service Locator record for Autodiscover rather than an autodiscover A record, even in your public domain. With an SRV record, you can point public clients at mail.domain.com or outlook.domain.com, or whatever you have configured your external server name to be. This means you can get away with a single host name on your SSL certificate, since you no longer need autodiscover.domain.com for Autodiscover to work. Since most third-party CAs charge a good bit more for SAN certificates than for single-name SSL certificates, you can save a bit of money (for this to work, though, you may need to change your internal and external web services URLs in Exchange to match the name you have configured).

Another Problem the SRV record Fixes

There are also some other issues that are easily fixed by adding an SRV record. One of the most common involves the use of multiple email domains in a single Exchange environment. If some users have a primary or secondary SMTP address that doesn't match the domain name listed on your SSL certificate, you'll discover that those users and the rest of your users will not be able to share calendar data between their mailboxes. You can fix this by adding an Autodiscover SRV record to the DNS zone that manages the additional mail domain. For example, say you have domain1.com and domain2.com on the same Exchange server, and user@domain1.com can't see user@domain2.com's calendar. The fix is to add the SRV record to the domain2.com DNS zone and point it at the public host name of domain1.com's mail server. Once that's done, the services that handle the calendar sharing functions will be properly configured, and both users will be able to share calendars.