Designing Infrastructure High Availability

IT people, for some reason, seem to have an affinity towards designing solutions that use “cool” features, even when those features aren’t really necessary. This tendency sometimes leads to good solutions, but a lot of times it ends up creating solutions that fall short of requirements or leave IT infrastructure with significant short-comings in any number of areas. Other times, “cool” features result in over-designed, unnecessarily expensive infrastructure designs.

The “cool” factor is probably most obvious in the realm of High Availability design. And yes, I do realize that with the cloud becoming more common and prevalent in IT there is less need to understand the key architectural decisions needed when designing HA, but there are still plenty of companies that refuse to use the cloud, and for good reason. Cloud solutions are not meant to be one size fits all solutions. They are one size fits most solutions.

High Availability (Also called “HA”) is a complex subject with a lot of variables involved. The complexity is due to the fact that there are multiple levels of HA that can be implemented, from light touch failover to globally replicated, multi-redundant, always on solutions.

High Availability Defined

HA is, put simply, any solution that allows an IT resource (Files, applications, etc) to be accessible at all times, regardless of hardware failure. In an HA designed infrastructure, your files are always available even if the server that normally stores those files breaks for any reason.

HA has also become much more common and inexpensive in recent years, so more people are demanding it. A decade ago, any level of HA involved costs that exponentially exceeded a normal, single server solution. Today, HA is possible for as little as half the cost of a single server (Though, more often, the cost is essentially double the single server cost).

Because of the cost reduction, many companies have started demanding it, and because of the cool factor, a lot of those companies have been spending way too much. Part of why this happens is due to the history of HA in IT.

HA History Lesson

Prior to the development of Virtualization (the technology that allows multiple “Virtual” servers to run on a single physical server), HA was prohibitively expensive and required massive storage arrays, large numbers of servers, and a whole lot of configuration. Then, VMWare implemented a solution called “VMotion” that allowed a Virtual Server to be moved between server hardware immediately at the touch of a button (Called VM High Availability). This signaled a kind of renaissance in High Availability because it allowed servers to survive a hardware failure for a fraction of the cost normally associated with HA. There is a lot more involved in this shift that just VMotion (SANs, cheaper high-speed internet, and similar advancements played a big part), but the shift began about the time VMotion was introduced.

Once companies started realizing they could have servers that were always running, regardless of hardware failures, an unexpected market for high-availability solutions popped up, and software developers started developing better techniques for HA in their products. Why would they care? Because there are a lot of situations where a server solution can stop working properly that aren’t related to hardware failures, and VMotion was only capable of handling HA in the event of hardware failures.

VM HA vs Software HA

The most common mistake I see people making in their HA designs is accepting the assumption that VM-level High Availability is enough. It is most definitely not. Take Exchange server as an example. There are a number of problems that can occur in Exchange that will prevent users from accessing their email. Log drives fill up, forcing database dismount. IIS can fail to function, preventing users from accessing their mailbox. Databases can become corrupted, resulting in a complete shutdown of Exchange until the database can be repaired or restored from backup. VM HA does nothing to help when these situations come up.

This is where the Exchange Database Availability Group (DAG) comes in to play. A DAG involves constantly replication changes to Mailbox Databases to additional Exchange servers (as many of them as you want, but 2-3 is most common). With a DAG in place, any issue that would cause a database to dismount in a single Exchange server will instead result in a Failover, where the database dismounts on one server and mounts on the other server immediately (within a few seconds or less).

The DAG solution alone, however, doesn’t provide full HA for Exchange, because IIS failures will still cause problems, and if there is a hardware failure, you have to change DNS records to point them to the correct server. This is why a Load Balancer is a necessary part of true HA solutions.

Load Balancing

A Load Balancer is a network device that allows users to access two servers with a single IP address. Instead of having to choose which server you talk to, you just talk to the load balancer and it decides which server to direct you to automatically. The server that is chosen depends on a number of factors. Among those is, of course, how many people are already on each server, since the primary purpose of a load balancer is to balance the load between servers more or less equally.

More importantly, though, most load balancers are capable of performing health checks to make sure the servers are responding properly. If a server fails a health check for any reason (for instance, if one server’s not responding to HTTP requests), the load balancer will stop letting users talk to that server, effectively ensuring that whatever failure occurs on the first server doesn’t result in users being unable to access their data.

Costs vs. Benefits

Adding a load balancer to the mix, of course, increases the cost of a solution, but that cost is generally justified by the benefit such a solution provides. Unfortunately, many IT solutions fail to take this fact into account.

If an HA solution requires any kind of manual intervention to fix, the time required for notifying IT staff and getting the switch completed varies heavily, and can be anywhere from 5 minutes to several hours. From an availability perspective, even this small amount of time can have a huge impact, depending on how much money is assumed as “lost” because of a failure. Here comes some math (And not just the Trigonometry involved in this slight tangent).

Math!

The easiest way to determine whether a specific HA solution is worth implementing involves a few simple calculations. First, though, we have to make a couple assumptions, none of which are going to be completely accurate, but are meant to help determine whether an investment like HA is worth making (Managers and CEOs take note)

  1. A critical system that experiences downtime results in the company being completely unable to make money for the period of time that system is down.
  2. The amount of money lost during downtime is equal to whatever percentage of a year the system is down times the amount of annual revenue the organization expects to make in a year.

For instance, if a company’s revenue is $1,000,000 annually, they will make an average of $2 per minute (Rounded up from $1.90), so you can assume that 5 minutes of downtime costs that company about $10 in gross revenue. The cheapest of Load balancers cost about $2,000 and will last about 5 years, so you recoup the cost of the load balancer by saving yourself 200 minutes of downtime. That’s actually less than the amount of time most organizations spend updating a single server. With Software HA in place, updates don’t cause downtime if done properly, so the cost of a load balancer is covered in just being able to keep Exchange running during updates (This isn’t possible with just VM HA). But, of course, that doesn’t cover the cost of the second server (Exchange runs well on a low-end server, so $5000 for server and licenses is about what it would cost). Now imagine if the company makes $10,000,000 in revenue, or think about a company that has revenue of several billion dollars a year. HA becomes a necessity according to these calculations very quickly.

VM HA vs Software HA Cost/Benefit

Realistically, the cost difference between VM HA and Software HA is extremely low for most applications. Everything MS sells has HA capability baked in that can be done for very low costs, now that the Clustering features are included in Windows 2012 Standard. So the costs associated with implementing Software HA vs VM HA are almost always justifiable. Thus, VM HA is rarely the correct solution. And mixing the two is not a good idea. Why? Because it requires twice the storage and network traffic to accomplish, and provides absolutely no additional benefit, other than the fact that VM Replication is kinda cool. Software HA requires 2 copies of the Server to function, and each copy should use a separate server (Separate servers are required for VM HA as well, so only the OS licensing  is an increased cost) to protect against hardware failure of one VM host server.

Know When to Use VM HA

Please note, though, that I am not saying you should never use VM HA. I am saying you shouldn’t use VM HA if software HA is available. You just need to know when to use it and when not to. If software HA isn’t possible (There are plenty of solutions out there with no High Availability capabilities), VM HA is necessary and provides the highest level of high availability for those products. Otherwise, use the software’s own HA capabilities, and you’ll save yourself from lots of headaches.

Advertisements

Do I need Anonymous Relay?

Problems

If you have managed an Exchange server in the past, you’ve probably been required to set things up to allow printers, applications, and other devices the ability to send email through the Exchange server. Most often, the solution to this request is to configure an Anonymous Open Relay connector. The first article I ever wrote on this blog was on that very subject: http://wp.me/pUCB5-b .¬† If you need to know what a Relay is, go read that blog.

What people don’t always do, though, is consider the question of whether or not they need an anonymous relay in Exchange. I didn’t really cover that subject in my first article, so I’ll cover it here.

When you Need an Open Relay

There are three factors that determine whether an organization needs an Open Relay. Anonymous relay is only required if you meet all three of the factors. Any other combination can be worked around without using anonymous relaying. I’ll explain how later, but for now, here are the three factors you need to meet:

  1. Printers, Scanners, and Applications don’t support changes to the SMTP port used.
  2. Printers, Scanners, and Applications don’t support SMTP Authentication.
  3. Your system needs to send mail to email addresses that don’t exist in your mail environment (That is to say, your system sends mail to email addresses that you don’t manage with your own mail server).

At this point, I feel it important to point out that Anonymous relays are inherently insecure. You can make them more secure by limiting access, but using an anonymous relay will always place a technical solution in the environment that is designed specifically to circumvent normal security measures. In other words, do so at your own informed risk, and only when it’s absolutely required.

The First Factor

If the system you want to send SMTP messages doesn’t allow you to send email over a port other than 25, you will need to have an open relay if the messages the system sends are addressed to email addresses outside your environment. The bold stuff there is an important distinction. The SMTP protocol defines port 25 as the “default” port for mail exchange, and that’s the port that every email server uses to receive email from all other systems, which means that, based on modern security concerns, sending mail to port 25 is only allowed if the recipient of the email you send exists on the mail server. So if you are using the abc.com mail server to send messages to bob@xyz.com, you will need to use a relay server to do it, or the mail will be rejected because relay is (hopefully) not allowed.

The Second Factor

If your system doesn’t allow you to specify a username and password in the SMTP configuration it has, then you will have to send messages Anonymously. For our purposes, an “anonymous” user is a user that hasn’t logged in with a username and password. SMTP servers usually talk to one another Anonymously, so it’s actually common for anonymous SMTP access to be valid and is actually necessary for mail exchange to function, but SMTP servers will, by default, only accept messages that are destined for email addresses that they manage. So if abc.com receives a message destined for bob@abc.com, it will accept it. However, abc.com will reject messages to jim@xyz.com, *unless* the SMTP session is Authenticated. In other words, if bob@abc.com wants to send jim @xyz.com a message, he can open an SMTP session with the abc.com mail server, enter his username and password, and send the message. If he does that, the SMTP server will accept the message, then contact the xyz.com mail server and deliver it. The abc.com mail server doesn’t need to have a username and password to do this, because the xyz.com mail server knows who jim@xyz.com is, so it just accepts the message and delivers it to the correct mailbox. So if you are able to set a username and password with the system you need to send mail with, you don’t need anonymous relay.

The Third Factor

Most of the time, applications and devices will only need to send messages to people who have mailboxes in your environment, but there are plenty of occasions where applications or devices that send email out need to be able to send mail to people *outside* the environment. If you don’t need to send to “external recipients” as these users are called, you can use the Direct Send method outlined in the solutions below.

Solutions

As promised, here are the solutions you can use *other* than anonymous relay to meet the needs of your application if it doesn’t meet *all three* of the deciding factors.

Authenticated Relay (Factor #3 applies)

In Exchange server, there is a default “Receive Connector” that accepts all messages sent by Authenticated users on port 587, so if your system allows you to set a username and password and change the port, you don’t need anonymous relaying. Just configure the system to use your Exchange Hub Transport server (or CAS in 2013) on port 587, and it should work fine, even if your requirements meet the last deciding factor of sending mail to external recipients.

Direct Send (Factor #2 applies and/or #3 doesn’t apply)

If your system needs to send messages to abc.com users using the abc.com mail server, you don’t need to relay or authenticate. Just configure your system to send mail directly to the mail server. The “direct send” method uses SMTP as if it were a mail server talking to another mail server, so it works without additional work. Just note that if you have a spam filter that enforces SPF or blocks messages from addresses in your environment to addresses in your environment, it’s likely these messages will get blocked, so make allowances as needed.

Authenticated Mail on Port 25 (Only factor #1 applies)

If the system doesn’t allow you to change the port number your system uses, but does allow you to authenticate, you can make a small change to Exchange to allow the system to work. This is done by opening the Default Receive connector (AKA – the Default Front End receive connector on Exchange 2013 and later) and adding Exchange Users to the Permission settings on the Security tab as shown with the red X below:

default-front-end-enabled

Once this setting is changed, restart the Transport service on the server and you can then perform authenticated relaying on port 25.

Conclusion

If you do find you need to use an anonymous relay, by all means, do so with careful consideration, but always be conscious of the fact that it isn’t always necessary. As always, comments questions on this article and others are always welcome and I’ll do my best to answer as soon as possible.

Configuring Exchange Autodiscover

As of the release of Outlook 2016, Microsoft has chosen to begin requiring the use of Autodiscover for setting up Outlook clients to communicate with the server. This means that, moving forward, Autodiscover will need to be properly configured.

This page contains some information and some links to other posts I’ve written on the subject of Autodiscover. This page is currently under construction as I write additional posts to assist in configuring and troubleshooting Autodiscover.

Initial Configuration

The initial configuration of Autodiscover requires that you have a Digital Certificate properly installed on your Exchange Server. If you use a Multi-Role configuration (No longer recommended by MS for Exchange versions after 2010), the Certificate should be installed on the CAS server.

Certificate Requirements

The certificate should have a Common Name that matches the name your users will be using to access Exchange. If you want users to use mail.domain.com to access the Exchange server, make sure that is the Common Name when creating the certificate.

The optimal configuration for Exchange also requires that you include autodiscover.domain.com as a Subject Alternate Name (SAN). You should also make sure that there is also an A or CNAME record in DNS to point users to autodiscover.domain.com. SAN certificates can cost significantly more money than a normal certificate, but there are ways to bypass the need for a SAN certificate (See the next section below for more info).

A Wildcard certificate is usable with Exchange, and can serve as a less expensive way to provide support for a large number of URLs. A Wildcard can also be used on other servers that use the same DNS domain as the Exchange server. However, wildcards are technically not as secure as a SAN cert, since they can be used with any URL in the domain. In addition, they do not support Sub-domains.

The certificate you install on Exchange should also be obtained from a reputable Third Party Certificate Authority. The following Certificate Authorities can generate Certificates that are trusted by the majority of web browsers and operating systems:

Comodo PositiveSSL
DigiCert
Entrust
Godaddy
Network Solutions

Also note, when generating your Certificate Signing Request (CSR), you should generate the CSR with a sufficient bit length. Currently, the recommended minimum for CSR generation is 2048 bits. 1024 and lower bit lengths may not be supported by Certificate Authorities.

Exchange Server Configuration

Autodiscover will determine the settings to apply to client machines by reading the Exchange Server configuration. This means the Exchange Service URLs must be properly configured. If they are not configured to use a name that exists on the Certificate in use, Outlook will generate a Certificate Error.

I will write a post on this subject in the future. For now, you can get this information easily from a Google Search.

DNS configuration

There are 2 different URLs Autodiscover will use when searching for configuration information. These URLs are based on the user’s Email Domain (The portion of the email address after the @). For bob@acbrownit.com, the Email Domain is acbrownit.com. The URLs checked automatically are:

domain.com
autodiscover.domain.com

As long as one of the above URLs exists on the Certificate and has an A record or CNAME record in DNS pointing to a CAS server, Autodiscover will work properly. The instructions for this can vary depending on the DNS provider you use.

Other Configurations

There are some situations that may cause autodiscover to fail if the above requirements are all met. The following situations require additional setup and configuration.

Domain Joined Computers

Computers that are part of the same Active Directory Domain as the Exchange server will attempt to reach the Active Directory Service Connection Point (SCP) for Autodiscover before attempting to find autodiscover at the normal URLs listed above. In this situation, you will typically need to configure the SCP to point to one of the URLs on your certificate.

Go to this post to find instructions for configuring the SCP:

Exchange Autodiscover Part 2 – The Active Directory SCP

Single Name Certificates

If you do not want to spend the additional money required to obtain a SAN or Wildcard certificate for Exchange, you can use a Service Locator (SRV) Record in DNS to define the location of autodiscover. A Service Locator Record allows you to define any URL you want for the Autodiscover service, so you can create one to bypass the need for having a SAN or Wildcard certificate.

Go to this post to find instructions for configuring a SRV record:

Internal DNS and Exchange Autodiscover