Data Encryption – How it Works (Part 1)

I’ve decided to start a short series of posts on data encryption, which is becoming an increasingly important subject in IT as government regulations and privacy concerns demand ever-increasing levels of security.

In this series, I’ll try to cover the more confusing concepts in encryption, including the three main types of encryption systems used today: Private Key encryption, Public Key encryption, and SSL/TLS encryption. I will cover how those types of encryption function and how they differ from one another. I will also cover one of the most confusing topics in IT security: Public Key Infrastructure. If you haven’t already read my article on Digital Certificates, I would highly recommend doing so before going on to part two of this series, since digital certificates underpin the vast majority of encryption standards today.

What is Encryption?

The goal of encryption is to make any message or information impossible to understand or read without permission. Perfect encryption is (currently) impossible. What I mean by that is there is no way to encrypt data so that it can’t *possibly* be read by someone who isn’t authorized to do so. There are an unlimited number of ways to encrypt data, but some methods are significantly more effective at preventing unauthorized disclosure of data than others.

Encryption Parts

Every encryption system, however, has a few things in common. First, there’s the data. If you don’t have something you want to keep private or secret, there’s no need for encryption. But since we live in a world where secrecy and privacy are occasionally necessary and desirable, we are going to have stuff we want to encrypt. Credit card numbers, social security numbers, birth dates, and things like that, for instance, need to be encrypted to prevent people from misusing them. We call this data “Clear-text” because it’s clear what the text says.

The next part is the “encryption algorithm”. Encryption is based very heavily on math, so we have to borrow some mathematical terminology here. In math, an algorithm is all the steps required to reach a conclusion. The algorithm for 1+1 is identified by the + sign, which tells us the step we need to take to get the correct answer to the problem, which is to add the values together. Encryption algorithms can be as simple as adding numbers or so complicated that they require a library of books to explain. The more complicated the algorithm, the more difficult it is (in theory) to “crack” the encryption and expose the original clear-text.

Encryption algorithms also require an extra value, fed in alongside the clear-text, to generate encrypted data. That extra value is called an encryption “key”. The encryption key has two purposes. First, it allows the encryption algorithm to produce a (theoretically) unique value from the clear-text. Second, it allows people who have permission to read the encrypted data to do so, since knowing the key will allow us to decrypt, or reveal, the clear-text (more on this in a bit).

These three pieces put together are used to create a unique “Cipher-text” that will appear to be just gobbledygook on casual inspection. The cipher-text can be given to anyone, and whatever it represents will remain unknown until the data is “decrypted”. The process we go through to do this is fairly simple. We take the clear-text and the key, enter them as input to the encryption algorithm, and once the whole algorithm is completed with those values, we get a cipher-text. The image below shows this:

Encryption

Every encryption algorithm requires the ability to “reverse” or “decrypt” the data, so each one has a corresponding decryption algorithm. For instance, in order to get back to the original value of 1 after adding 1 to it to get 2, you would have to reverse that process by subtracting 1. In this case, we know what input (1) and algorithm (adding) were used to reach the value, so reversing it is easy. We just subtract whatever number we need to get back to the original value (1 in this case). In general, decryption algorithms take the key and cipher-text as input. Once everything in the algorithm is done, it should result in the original clear-text, as shown below:

Decryption
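Here’s a toy sketch of that add/subtract relationship in PowerShell. To be clear, this is purely illustrative (the function names are mine, and addition is a terrible encryption algorithm); it just shows the encrypt/decrypt round trip:

function Encrypt ([int]$ClearText, [int]$Key) { return $ClearText + $Key }   # the "algorithm" is addition
function Decrypt ([int]$CipherText, [int]$Key) { return $CipherText - $Key } # the reverse: subtraction

$cipher = Encrypt -ClearText 1 -Key 1   # cipher-text: 2
Decrypt -CipherText $cipher -Key 1      # clear-text again: 1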

Simple Examples

Two early examples of encryption come to us from Greek and Roman history. The Skytale was a fairly ingenious encryption tool that used a wooden block of varying size and shape as its key. The clear-text was written (or burned) on a strip of leather wrapped around the key, along a single side of the block, which was usually hexagonal. The person who was supposed to receive the message had a key of similar shape and size, and wrapping the leather strip around that key would allow the recipient to read the message. Using the above terminology, the clear-text is the message, the key is the block of wood, and the encryption algorithm is wrapping a strip of leather around the key and writing your message along one side with some fake gobbledygook on all the other sides. Unwrapping the leather from the block gives a cipher-text. Decryption is just wrapping the strip around a similarly shaped and sized block, then looking at all the sides to see which one makes sense.

One of the more famous encryption algorithms is called the “Caesar Cipher” because it was developed by Julius Caesar during his military conquests to keep his enemies from intercepting his plans. You’ve probably used this algorithm before without knowing it if you ever enjoyed passing notes to friends in school and wanted to keep the other kids (or the teacher) from knowing what the message said if they “intercepted” it.

The Caesar Cipher is fairly simple, but works well for quick, easy encryption. All you do is pick a number between 1 and 26 (or the number of letters in whatever language you’re using). When writing the message, you replace each letter with the letter that sits that many places above or below it in the alphabet. For instance, “acbrownit” becomes “bdcspxmju” in a +1 Caesar Cipher. Decrypting the message is a simple matter of reversing that. For a Caesar Cipher, the key is whatever number you pick, the clear-text is the message you want to send, and the algorithm is to add the key to each of the clear-text’s letters, outputting cipher-text.
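If you want to play with this, here’s a short PowerShell sketch of the Caesar Cipher as described above. The function name is my own invention, and to keep things simple it only shifts lowercase letters:

function Invoke-CaesarCipher ([string]$Text, [int]$Key) {
    # shift each lowercase letter by $Key places, wrapping around the 26-letter alphabet
    -join ($Text.ToCharArray() | ForEach-Object {
        if ($_ -cmatch '[a-z]') {
            [char]((([int]$_ - [int][char]'a' + $Key) % 26 + 26) % 26 + [int][char]'a')
        } else { $_ }
    })
}

Invoke-CaesarCipher -Text 'acbrownit' -Key 1    # encrypt: bdcspxmju
Invoke-CaesarCipher -Text 'bdcspxmju' -Key -1   # decrypt: acbrownit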

Key Exchange

For any encryption algorithm to function properly as a way to send messages, you must have a way to ensure that the recipient of the message has the correct key to decrypt it. Without a key, the recipient will be forced to “crack” the encryption to read the message. So you need to be able to provide that key to the recipient. The process of ensuring that both the sender and recipient have the keys to encrypt and decrypt the message (respectively) is called a “key exchange”. This is often as simple as telling your friend what number to use with your Caesar Cipher.

But what do you do if you need to exchange keys in a public place, surrounded by prying eyes (like, for instance, the Internet)? It becomes much more difficult to exchange keys when needed if there is significant distance between the sender and recipient, which means that the biggest weakness in any encryption standard is making sure that the recipient has the key they need to decrypt the message. If the key can be intercepted easily, the encryption system will fail.

The exchange method used will usually depend on the type of key required for decryption. For instance, in World War II, the German military developed a mechanical encryption device called “Enigma” that was essentially a typewriter, but it changed the letters used when typing out a message with a mechanical series of gears and levers. If you pushed the I button on the keyboard, depending on the key used it would type a J or a P (or whatever). The keys were written down in a large notebook that was given directly to military commanders before they departed on their missions, and the index location of the key assigned to the message was set on the machine itself to encrypt and decrypt messages. The process of creating that key book and handing it to the commander was a key exchange. It was kept secure by ensuring that the only people who had the notebook of keys were people who were allowed to have them. Commanders were ordered to destroy their Enigma machines and accompanying notebooks if capture was likely. The Allies were eventually able to capture some of these machines, which gave a lot of incredibly smart people a chance to examine them and learn the algorithm used to encrypt data, which ultimately rendered the Enigma machines useless.

Modern key exchange still occurs, and there are specific techniques used to do so, even across the Internet. For example, Public Key Cryptography allows us to encrypt a message with a key that is publicly available to anyone who wants it, but the only people who can read the message are the people who have an accompanying “Private Key”. I’ll cover this subject more in a later post.

Conclusion

In this part of the series, I covered what encryption is and how it works. I explained the different things needed to encrypt a message and gave some examples to illustrate that. I also explained a little bit about how key exchange works. In the next part of this series, I’ll cover some techniques used to “Crack” encryption.

 


Designing Infrastructure High Availability

IT people, for some reason, seem to have an affinity for designing solutions that use “cool” features, even when those features aren’t really necessary. This tendency sometimes leads to good solutions, but often it creates solutions that fall short of requirements or leave IT infrastructure with significant shortcomings in any number of areas. Other times, “cool” features result in over-designed, unnecessarily expensive infrastructure.

The “cool” factor is probably most obvious in the realm of High Availability design. And yes, I do realize that with the cloud becoming more prevalent in IT there is less need to understand the key architectural decisions involved in designing HA, but there are still plenty of companies that refuse to use the cloud, and for good reason. Cloud solutions are not one-size-fits-all solutions. They are one-size-fits-most solutions.

High Availability (also called “HA”) is a complex subject with a lot of variables involved. The complexity is due to the fact that there are multiple levels of HA that can be implemented, from light-touch failover to globally replicated, multi-redundant, always-on solutions.

High Availability Defined

HA is, put simply, any solution that allows an IT resource (files, applications, etc.) to remain accessible at all times, regardless of hardware failure. In an HA-designed infrastructure, your files are always available even if the server that normally stores those files breaks for any reason.

HA has also become much more common and inexpensive in recent years, so more people are demanding it. A decade ago, any level of HA involved costs that vastly exceeded a normal, single-server solution. Today, HA is possible for as little as half the cost of a single server (though, more often, the cost is essentially double the single-server cost).

Because of the cost reduction, many companies have started demanding HA, and because of the cool factor, a lot of those companies have been spending way too much. Part of why this happens is the history of HA in IT.

HA History Lesson

Prior to the development of virtualization (the technology that allows multiple “virtual” servers to run on a single physical server), HA was prohibitively expensive and required massive storage arrays, large numbers of servers, and a whole lot of configuration. Then VMware introduced a feature called “vMotion” that allowed a virtual server to be moved between server hardware immediately at the touch of a button (the basis of VM High Availability). This signaled a kind of renaissance in High Availability because it allowed servers to survive a hardware failure for a fraction of the cost normally associated with HA. There is a lot more involved in this shift than just vMotion (SANs, cheaper high-speed internet, and similar advancements played a big part), but the shift began about the time vMotion was introduced.

Once companies started realizing they could have servers that were always running, regardless of hardware failures, an unexpected market for high-availability solutions popped up, and software developers started developing better techniques for HA in their products. Why would they care? Because there are a lot of situations where a server solution can stop working properly that aren’t related to hardware failures, and vMotion was only capable of handling HA in the event of hardware failures.

VM HA vs Software HA

The most common mistake I see people making in their HA designs is accepting the assumption that VM-level High Availability is enough. It is most definitely not. Take Exchange server as an example. There are a number of problems that can occur in Exchange that will prevent users from accessing their email. Log drives fill up, forcing database dismount. IIS can fail to function, preventing users from accessing their mailbox. Databases can become corrupted, resulting in a complete shutdown of Exchange until the database can be repaired or restored from backup. VM HA does nothing to help when these situations come up.

This is where the Exchange Database Availability Group (DAG) comes into play. A DAG involves constantly replicating changes to Mailbox Databases to additional Exchange servers (as many of them as you want, but 2-3 is most common). With a DAG in place, any issue that would cause a database to dismount on a single Exchange server will instead result in a failover, where the database dismounts on one server and mounts on another server immediately (within a few seconds or less).
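As an aside, if you have a DAG and want to watch this replication in action, the Exchange Management Shell will show you the state of each database copy. A quick sketch (the server name EX01 is a placeholder for one of your own DAG members):

# run from the Exchange Management Shell on a DAG member; EX01 is a placeholder name
Get-MailboxDatabaseCopyStatus -Server EX01 | Format-Table Name, Status, CopyQueueLength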

The DAG solution alone, however, doesn’t provide full HA for Exchange, because IIS failures will still cause problems, and if there is a hardware failure, you have to change DNS records to point clients to the correct server. This is why a Load Balancer is a necessary part of a true HA solution.

Load Balancing

A Load Balancer is a network device that allows users to access two or more servers through a single IP address. Instead of having to choose which server you talk to, you just talk to the load balancer and it decides which server to direct you to automatically. The server that is chosen depends on a number of factors. Among those is, of course, how many people are already on each server, since the primary purpose of a load balancer is to balance the load between servers more or less equally.

More importantly, though, most load balancers are capable of performing health checks to make sure the servers are responding properly. If a server fails a health check for any reason (for instance, if one server’s not responding to HTTP requests), the load balancer will stop letting users talk to that server, effectively ensuring that whatever failure occurs on the first server doesn’t result in users being unable to access their data.
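Conceptually, the probe is nothing exotic. Here’s a rough PowerShell sketch of the kind of check a load balancer performs against Exchange (the server names are made up; the healthcheck.htm page is what Exchange 2013 and later publish for exactly this purpose):

# toy health probe: flag a server unhealthy if it doesn't answer HTTP 200 within 5 seconds
$servers = 'mail01.whatever.com', 'mail02.whatever.com'   # placeholder names
foreach ($server in $servers) {
    try {
        $response = Invoke-WebRequest -Uri "https://$server/owa/healthcheck.htm" -UseBasicParsing -TimeoutSec 5
        if ($response.StatusCode -eq 200) { "$server is healthy" } else { "$server failed its health check" }
    }
    catch { "$server failed its health check" }
}

A real load balancer runs checks like this every few seconds and pulls failing servers out of the rotation automatically.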

Costs vs. Benefits

Adding a load balancer to the mix, of course, increases the cost of a solution, but that cost is generally justified by the benefit such a solution provides. Unfortunately, many IT designs fail to take this fact into account.

If an HA solution requires any kind of manual intervention to fix, the time required for notifying IT staff and getting the switch completed varies heavily, and can be anywhere from 5 minutes to several hours. From an availability perspective, even this small amount of time can have a huge impact, depending on how much money is assumed as “lost” because of a failure. Here comes some math (And not just the Trigonometry involved in this slight tangent).

Math!

The easiest way to determine whether a specific HA solution is worth implementing involves a few simple calculations. First, though, we have to make a couple of assumptions, neither of which will be completely accurate, but both of which help determine whether an investment like HA is worth making (managers and CEOs take note):

  1. A critical system that experiences downtime results in the company being completely unable to make money for the period of time that system is down.
  2. The amount of money lost during downtime is equal to the percentage of the year the system is down times the annual revenue the organization expects to make.

For instance, if a company’s revenue is $1,000,000 annually, they make an average of $2 per minute (rounded up from $1.90), so you can assume that 5 minutes of downtime costs that company about $10 in gross revenue. The cheapest load balancers cost about $2,000 and last about 5 years, so you recoup the cost of the load balancer by preventing about 200 minutes of downtime per year over its lifespan (1,000 minutes total). That’s actually less than the amount of time most organizations spend updating a single server each year. With Software HA in place, updates don’t cause downtime if done properly, so the cost of a load balancer is covered just by being able to keep Exchange running during updates (this isn’t possible with VM HA alone). That doesn’t cover the cost of the second server, of course (Exchange runs well on a low-end server, so $5,000 for server and licenses is about what it would cost). Now imagine if the company makes $10,000,000 in revenue, or think about a company that has revenue of several billion dollars a year. By these calculations, HA very quickly becomes a necessity.
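If you want to run those numbers for your own organization, the arithmetic is simple enough to script (the figures below are the ones from the example above):

$annualRevenue    = 1000000
$perMinute        = $annualRevenue / (365 * 24 * 60)   # about $1.90 per minute
$loadBalancerCost = 2000
$lifespanYears    = 5
# minutes of prevented downtime per year that pay for the load balancer
($loadBalancerCost / $lifespanYears) / $perMinute      # about 210 (200 at the rounded $2/minute)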

VM HA vs Software HA Cost/Benefit

Realistically, the cost difference between VM HA and Software HA is extremely low for most applications. Nearly everything Microsoft sells has HA capability baked in that can be implemented at very low cost, now that the clustering features are included in Windows Server 2012 Standard. So the costs associated with implementing Software HA over VM HA are almost always justifiable, and VM HA is rarely the correct solution. And mixing the two is not a good idea. Why? Because it requires twice the storage and network traffic to accomplish, and provides absolutely no additional benefit, other than the fact that VM replication is kinda cool. Software HA requires two copies of the server to function, and each copy should run on a separate VM host (separate hosts are required for VM HA as well, so the only increased cost is OS licensing) to protect against hardware failure of a single VM host server.

Know When to Use VM HA

Please note, though, that I am not saying you should never use VM HA. I am saying you shouldn’t use VM HA if Software HA is available. You just need to know when to use it and when not to. If Software HA isn’t possible (there are plenty of solutions out there with no High Availability capabilities), VM HA is necessary and provides the highest level of availability for those products. Otherwise, use the software’s own HA capabilities and you’ll save yourself from lots of headaches.

If You Have a Cisco Firewall, Disable this Feature NOW!!!

I don’t often have an opportunity to post a rant in an IT blog (and even less opportunity to create a click-bait headline), but here goes nothing! Cisco’s method of doing ESMTP packet inspection is INCREDIBLY STUPID and you should disable it immediately. Why do I say that? Because when Cisco ASAs/whatever they call them these days are configured to perform packet inspection on ESMTP traffic, the preferred way of doing so is to block the STARTTLS verb entirely.*

In other words, Cisco firewalls are designed to completely disable email encryption in order to inspect email traffic. This is such a stupid method of allowing packet inspection that I can barely find words to explain it. But find them I shall.

You might think that you want your firewall to inspect your email traffic in order to block malicious email or prevent unauthorized access, or what have you. And in that context, I agree. It’s a useful thing. But knowing that the firewall is not only inspecting the traffic but also preventing any kind of built-in email encryption from running is rant food for me.

I can just imagine the people at Cisco one day sitting around coming up with ideas on how to implement ESMTP packet inspection. I can imagine some guy saying, “I know, we can design our firewall to function as a Smart Host, so it can receive encrypted emails from our customers’ email servers, decrypt them, inspect them, then communicate with the destination servers and attempt to encrypt the messages from there.” I can then imagine that guy being ignored by the rest of his coworkers once the lazy dork in the room says, “How about we just block the STARTTLS verb?”

Thank you, Cisco engineers, for using the absolute laziest possible method you could find to ensure that all email traffic gets inspected, thereby making certain that your packet inspection needs are met while preventing your clients from using TLS encryption over SMTP.

So, if you have a Cisco firewall and want to have the ability to, you know, encrypt email, make sure you disable ESMTP packet inspection. If that feature is turned on, all your email is completely unencrypted. Barracuda provides a lovely guide on disabling ESMTP inspection. https://www.barracuda.com/support/knowledgebase/50160000000IyefAAC

Cisco tells people to just disable the rule that blocks STARTTLS in email, but that wouldn’t really help their packet inspection much, since everything past the STARTTLS verb is encrypted. If it’s encrypted, it can’t be inspected, other than looking at the traffic and going, “Yep. That’s all gobbledygook. Must be encrypted.” So that’s just a dumb recommendation that doesn’t do anything useful (it also requires a trip to the Cisco CLI, which is great fun). This is why I say disable ESMTP packet inspection on your Cisco firewall, because it’s making you less secure.

*For the uninitiated, ESMTP stands for Extended Simple Mail Transfer Protocol, and it’s what every mail server on the Internet today uses to exchange emails with each other. The STARTTLS verb is a command that initiates an encrypted email session, so blocking it prevents encrypted email exchanges entirely. This is a bad thing.

 

Protect Yourself from the WannaCry(pt) Ransomware

Well, this has been an exciting weekend for IT guys around the world. Two IT security folks can say that they saved the world, and a lot of people in IT had no weekend. The attack was shut down before it encrypted the world, but there’s a good chance the attack will just be modified and started over. So what can you do to keep your system and data from being compromised by this most recent cyber attack? If you’ve patched everything up already, or don’t know whether you’re patched or vulnerable to this attack (or you just don’t want to deal with Windows updates right now), and you want to be absolutely positive that your computer won’t be affected, disable SMBv1! Like, seriously. You don’t need it. Unless you’re a Luddite.

There are some environments that may still need it (anyone still using Windows XP and Server 2003, antiquated management software, or PoS NAS devices), so if you have a Windows Server environment, run

Set-SmbServerConfiguration -AuditSmb1Access $true

in PowerShell for a bit and watch the SMBServer audit logs for failures.

To disable SMBv1 Server capabilities on your devices, do the following:

Server 2012 and Later

  1. Open PowerShell (click Start and type PowerShell in the search bar to open it if you don’t know how to get to it)
  2. Type in this and hit Enter: Remove-WindowsFeature FS-SMB1
  3. Wait a bit for the uninstall process to finish.
  4. Voila! WannaCry can’t spread to this system anymore.

Windows 7, Server 2008/2008R2

  1. Open PowerShell (click Start and type PowerShell in the search bar to open it if you don’t know how to get to it)
  2. Type in this (everything on one line) and hit Enter: Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" -Name SMB1 -Type DWORD -Value 0 -Force
  3. Wait a bit for the command to complete.
  4. Voila! WannaCry can’t spread to this system anymore.

Windows 8.1/10

  1. Open PowerShell (click Start and type PowerShell in the search bar to open it if you don’t know how to get to it)
  2. Type in this and hit Enter: Disable-WindowsOptionalFeature -Online -FeatureName smb1protocol
  3. Wait a bit for the uninstall process to finish.
  4. Voila! WannaCry can’t spread to this system anymore.

If you’re using Windows Vista…I am so so sorry…But the Windows 7/8 instructions should still work for you.

If you still use Windows XP…stop it. And you’re just going to have to get the patch that MS released for this vulnerability.

An additional step you may want to take is to disable SMBv1’s *client* capabilities on your systems. Running the two commands below (one per line) will do this for you. This isn’t completely necessary, since the client can’t connect to other systems unless they support SMBv1, so if the SMBv1 server component is disabled as above, the SMBv1 client can’t do anything. But if you want to disable the client piece as well, enter the following commands:

sc.exe config lanmanworkstation depend= bowser/mrxsmb20/nsi
sc.exe config mrxsmb10 start= disabled
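Once you’re done, you can verify where things stand on Windows 8/Server 2012 and later with one more PowerShell command:

# EnableSMB1Protocol should read False once the server component is disabled
Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol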

Resolving the Internal/External DNS zone Dilemma with Pinpoint DNS

Here’s an interesting trick that might help you resolve some of your DNS management woes, particularly if you have different Public and Private DNS zones in your environment. For instance, you have a domain name of whatever.com externally, but use whatever.local internally. When your DNS is set up like that, all attempts to access systems using the whatever.com domain name will default to using the external, Public IP addresses assigned in that DNS zone. If you want to have internal, Private IP addresses assigned to those systems instead (which is common), you normally have to create an entire zone for whatever.com on your Internal DNS servers and populate it with A records for all the systems that exist in the public DNS zone. This technique, known as Split Horizon DNS or just Split DNS, results in additional administrative burden, since changes to the external DNS zone have to be replicated internally as well, and you have to spend time recreating all the DNS records that are already there. Luckily, there’s a little DNS trick you can use to get past this limitation: Pinpoint DNS.

Pinpoint DNS – What is it?

Put simply, Pinpoint DNS is a technique that uses standard DNS features to let you create a record for a single host name without hosting the rest of its zone internally. For instance, instead of creating an entire Primary zone in your internal DNS for whatever.com, you can create a Pinpoint DNS record for just really.whatever.com.

Make it So!

To implement Pinpoint DNS, all you have to do is create a new Primary DNS zone in DNS. Instead of naming the zone whatever.com, name the zone really.whatever.com. Once the zone is created, you can then assign an IP address to the root of that new zone (in Windows, this shows up as the record being “Same as Parent”). Attempts to connect to really.whatever.com will resolve to the root zone’s IP address, and you will be connected to whatever you set that IP to. So instead of maintaining an entire internal DNS zone full of A records that you have to fill out, even if you only want an internal IP on one of them, you can have a DNS zone for the single internal IP record.

Downsides?

There really aren’t a lot of downsides to this, other than it could confuse people who aren’t familiar with the technique. It does look a little odd to see a lot of Forward Lookup Zones in DNS with only a single record in them, but that’s just aesthetic.

Functionally, as long as the DNS zones you create for Pinpoint records are AD integrated, there aren’t any technical downsides to this technique, but if you have a large, distributed DNS infrastructure that *isn’t* AD integrated, this technique will greatly increase administrative burden, since you have to create replication configurations for each Pinpoint record. If you run a DNS environment that isn’t part of Active Directory, Pinpoint DNS isn’t a good solution, because it increases the burden more than managing split horizon DNS.

DNS is a very lightweight protocol (having been designed in the early 80s), so the replication traffic increase caused by having multiple Forward Lookup Zones is generally not an issue here.

Windows How To

To implement this, do the following:

  1. Open DNS Management (preferably a Domain Controller)
  2. Expand the DNS server that’s listed
  3. Right Click the Forward Lookup Zones entry and select New Zone to open the new zone wizard. Hit Next when the wizard opens.
  4. Make sure Primary DNS Zone is selected, and that the AD Integration option is checked. Click Next.
  5. Select the option to replicate to all DCs in the Forest (particularly if you are in a multi-domain Forest. It’s not necessary for single-domain forests, but it’s a good idea to set this anyway, in case that ever changes). Click Next.
  6. Enter the name of the zone. This will be the host name you’re assigning an IP to, so really.whatever.com for the previous example. Click Next.
  7. Select the option to only allow secure updates (It’s the default, anyway). Click Next, then Finish to finalize the wizard and create the zone.
  8. Expand your Forward Lookup Zones and you’ll see the zone there, like below: PinpointDNSZone
  9. Right Click the new Zone, select New Host (A or AAAA).
  10. In the wizard that appears, *leave the host name blank*. This is important, since it is the key part of Pinpoint DNS. An empty host name assigns the A record to the root domain.
  11. Enter the IP address you want to point to in the IP address field, then click Add Host. Your record should look like the one below: PinpointDNSRecord
  12. Verify the new record appears in the really.whatever.com zone, and shows as (Same as Parent).

Once that’s done, the next time you ping really.whatever.com (after running “ipconfig /flushdns” to clear your DNS cache, of course), you’ll receive the Internal IP address you assigned to the Pinpoint zone, and the rest of your external DNS records will remain managed by external DNS servers.
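If you’d rather script the whole thing, the equivalent PowerShell on a DC running the DNS role should look something like this (zone name from the example above; the IP address is a placeholder, and a Name of "@" targets the zone root, the same as leaving the host name blank in the GUI):

# create the pinpoint zone, replicated forest-wide, then put an A record at its root
Add-DnsServerPrimaryZone -Name 'really.whatever.com' -ReplicationScope Forest
Add-DnsServerResourceRecordA -ZoneName 'really.whatever.com' -Name '@' -IPv4Address '10.0.0.50'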

ADFS or Password Sync: Which one do you use?

I’ve run into a number of people who get confused about this subject when trying to determine how to get their on-prem accounts and Office 365 synced and working properly. Most often, someone makes a comment somewhere that says, “Just use Password Sync, it’s just as good and doesn’t require a server,” or something similar. While I wish this were true, it most absolutely is not. While both options fulfill a similar requirement (“I want my AD usernames and passwords to work with Office 365”), each does so in a completely different manner, and the difference can have a major impact on security, workflow, and administration of services.

Single Sign-On vs Same Sign-On

To see the difference here, you have to understand the terminology involved. The primary goal of synchronizing user accounts between Office 365 and Active Directory is to give users the ability to sign in to O365 with the same username and password they use when logging in to their computer. There are two terms used to describe this relationship. Single Sign-On refers to technology that allows users to access numerous applications while only logging in once. You’ve probably used Facebook or Google’s version of this to access applications, games, or other software. Same Sign-On, however, allows a user to access multiple applications with the same username and password. If you have two bank accounts and use the same username and password to access them, you’re using a simplified version of Same Sign-On. Most Same Sign-On solutions in IT involve an application that reads username and password data used by one system and copies it to another system.

The biggest difference between the two technologies is that Single Sign-On allows you to authenticate one time and access all the applications that are tied to that sign-on system. Same Sign-On requires you to log in to all applications regardless of which or how many applications you’ve already logged into using that username and password.

Single Sign-on and Same Sign-on have a lot of similarities as well. They both allow you to use the same username and password and both simplify account management (theoretically). Most importantly, for Office 365 at least, they allow you to manage usernames and passwords in a single environment, rather than having to change passwords in multiple locations every time something needs to change. The way changes are accomplished is where the decision to use ADFS or Password Sync faces its biggest test.

ADFS is Single Sign-On, Password Sync is Same Sign-On

For the purposes of Office 365, which is what this article focuses on, ADFS is considered a Single Sign-On solution, while Password Sync is Same Sign-On. What does this mean for you, the IT administrator, when you are deciding how to set up your environment? It means you need to consider the following realities of each solution:

ADFS Issues

  1. ADFS requires more administrative overhead to function:
    1. ADFS is not a perfect solution and it does fail sometimes.
    2. Troubleshooting ADFS can be a daunting task. The error messages provided by ADFS are really poorly worded and generic, so a lot of digging in logs is required to really figure out where a problem is coming from.
    3. ADFS requires a trust between your environment and Office 365. Maintaining the trust takes some effort. ADFS relies on Digital Certificates that have expiration dates, so you have to make sure the certificates are updated before they expire or ADFS won’t work.
  2. ADFS is tricky to configure sometimes. The Office 365 setup for it has been streamlined, but there are occasional setup issues that can be difficult to resolve or confusing.
  3. If your ADFS server goes down for any reason, Office 365 can’t be accessed. This means that a High Availability ADFS cluster is very beneficial. It’s also expensive.
  4. In short, ADFS has a significantly higher cost to use than password sync, but it is also more secure.

Password Sync

  1. Password sync copies the “hash” for the AD password to Office 365. This means that if Office 365 gets taken over by hackers (very very unlikely, but still a potential concern), they also get to take over your network because they have all your password hashes. This doesn’t happen with ADFS.
  2. The synchronization between Office 365 and AD occurs on a scheduled basis, every 30 minutes at a minimum, so if you change someone’s password in AD, you may have to wait up to 30 minutes for the password to change in Office 365. This can be very confusing for users and result in a lot of time-consuming support calls, particularly if you enable account lockout in Office 365. You can force syncs to occur (see the command after this list), but that does add a good bit of administrative time to the password change process.
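For reference, forcing a sync is a single command run on the Azure AD Connect server (assuming a reasonably recent build that ships the ADSync module):

# kick off an immediate delta sync instead of waiting for the schedule
Import-Module ADSync
Start-ADSyncSyncCycle -PolicyType Delta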

Issue Mitigation

There are some ways to get around the issues involved with each solution. For instance, Microsoft is currently working on a cloud-based version of ADFS that will allow you to have ADFS-level security without the added infrastructure and administrative costs of an ADFS server/cluster. They also provide an “upgraded” version of Azure AD (the back-end system for account management in Office 365) called Azure AD Premium. AAD Premium costs about 4 dollars per user per month, but it allows you to provide your users with self-service password reset features and adds attribute “write-back” capabilities to ADConnect, which aren’t available otherwise. That means you can change distribution group membership, user passwords, and other attributes in Office 365 and have those changes written back to your AD environment.

Decisions

In the end, the decision between ADFS and Password Sync is entirely up to you. If you have major regulatory governance requirements or are very concerned about security, ADFS is a very capable system that will greatly improve system security for Office 365. However, if you work for a small organization with little to no major security concerns, Password sync will provide you with a lot of benefit.

 

What is a DNS SRV record?

If you’ve had to work with Active Directory or Exchange, there’s a good chance you’ve come across a feature of DNS called a SRV record. SRV records are an extremely important part of Active Directory (they are, in fact, the foundation of AD) and an optional part of Exchange Autodiscover. There are a lot of other applications that use SRV records to some degree or another (Lync/Skype for Business relies heavily on them, for instance). The question, though, is why are SRV records so important, and what exactly do they do?

What does a SRV record do?

The purpose of a SRV record is found in its longer, more jargon-filled name: Service Locator Record. It’s basically a DNS record that is meant to allow applications to find a server that provides a service the application needs to function. SRV records provide a centralized method of configuration and control that results in less work configuring the client side of a client/server application.

For example, let’s say you’re an application designer and you are creating an application that needs to talk to a server for some reason. Prior to the existence of SRV records in DNS, you had two choices:

  1. Program the application so it only ever talked to a server if it had a specific name or IP address
  2. Include some configuration settings in the application that would let end users put in the DNS name of the server.

Neither option is great for usability. Hard-coding IP addresses or host names for the server makes setup difficult and very strict in its requirements. Making end users enter the server information usually creates more work for IT staff, who typically end up doing it for all the users.

SRV records were first added to the DNS protocol’s specifications around the year 2000 to give programmers another option for designing client/server software. With SRV records, the application can be designed to look for a SRV record and get server information without having to be directly configured by end users or IT staff. This is similar to the first option above, but allows greater flexibility because the server can have any name or IP address you want and the application can still find it. Some of the advanced features of SRV records also allow failover capabilities and a lot of other cool stuff.

How do SRV Records Work?

Since Active Directory relies so heavily on SRV records, let’s use it as an example to explain how they work. First, let’s take a look at a typical AD DNS zone. Below, you can see a picture that shows the fully expanded _msdcs zone for my test lab: srv-records-for-sysinteg

This shows the _Kerberos and _ldap SRV records created by a Domain Controller (Megaserver). Here’s basically what those records are for:

  1. Windows Login requires a Domain-Joined client to connect to a Domain Controller
  2. The login system is programmed to find a Domain Controller by looking for a SRV record at _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.sysinteg.ad
  3. The SRV record listed above has a value that returns megaserver.sysinteg.ad as the location of the server providing the _ldap service.
  4. The computer’s programming takes the host name returned by that lookup (megaserver.sysinteg.ad) and plugs it in wherever a Domain Controller’s address is needed.
  5. The computer then talks to megaserver.sysinteg.ad exclusively for all functions that require it to use LDAP (which is the underlying protocol AD uses for most of what it does).

If SRV records didn’t exist, we would be required to manually configure every computer on the domain to use megaserver.sysinteg.ad for anything related to AD. Now, that’s certainly not an unfeasible solution, but it does give us a lot more work to do.
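Incidentally, you can run the same kind of lookup the login system performs yourself with Resolve-DnsName (the domain here is my test lab’s; swap in your own AD domain):

# ask DNS which server(s) provide LDAP for the domain, just like a domain-joined client does
Resolve-DnsName -Name '_ldap._tcp.dc._msdcs.sysinteg.ad' -Type SRV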

What Makes up a SRV record?

A SRV record has a number of settings that are required for it to function. To see all the settings, look at the image below:

autodiscover

That shows an Exchange Autodiscover SRV record. I’ll explain what each setting here does:

Domain: This is an unchangeable value. It shows the DNS Domain the SRV record belongs to.
Service: This is the “service” the SRV record will be used to define. In the image, that service is Autodiscover. Note that all SRV record service names start with an underscore, so the service value is _autodiscover. The underscore prevents issues where there might be a regular A record with the same name as a SRV record.
Protocol: This is the protocol used by the service. Functionally, this can be almost anything, since the protocol in a SRV record is mostly used to organize SRV records, but it’s best to use the protocols allowed by RFC 2782 to ensure compatibility (_tcp and _udp are universally accepted). Unless you are designing software that uses SRV records, you’ll never be in a situation where you’ll have to decide what to put as the protocol. If you’re trying to configure a SRV record for some application you’re setting up, just follow the instructions when creating it.
Priority: In a situation where multiple servers are providing the same service, the Priority value determines which server should be contacted first. The server chosen will always be the one with the lowest number value here.
Weight: In a situation where you have multiple SRV records with the same Service value and Priority value, the Weight is used to determine which server should be used. When the application is designed according to RFC 2782, the Weight values of all matching SRV records are added together to determine the total weight. Whatever portion of that total a single SRV record is assigned determines how often its server will be used by the application. For instance, if you have 2 SRV records with the same Service and Priority where Server 1 has a weight of 50 and Server 2 has a weight of 25, Server 1 will be chosen by the application as its service provider two-thirds of the time, because its weight of 50 is two-thirds of the total weight of 75. Server 2 will be chosen the remaining one-third of the time. If there’s only one server hosting the service, set this value to 0 to avoid confusion.
Port Number: This setting provides Port data for the application to use when contacting the server. If, for instance, your server is providing this service on port 5000, you would put 5000 in as the Port number. The setting here is defined by how the server is configured. For Autodiscover, as shown above, the value is 443, which is the default port designated by the HTTPS protocol. The Autodiscover Website in my environment is being hosted on the default HTTPS port, so I put in port 443. If I wanted to change my server to use port 5000, I could do so, but I would need to update my SRV record to match (As an aside, if I wanted to change the port Autodiscover was published on, I would be required to use a SRV record for Autodiscover to work, as opposed to any other method).
Host Offering this Service: This is, put simply, the host name of the server we want our clients to communicate with. You can use an IP address or a Host name here, but it’s generally best to use the Host name, since IPs can and do change over time.

Using SRV Records to Enable High Availability

If you managed to read through all the descriptions of those settings up there, you may have noticed my explanation of the Priority and Weight settings. Well, those two settings allow for one of the best features of SRV records: High Availability.

Prior to the existence of SRV records, the only way you could use DNS to enable high availability was to use a feature called Round Robin. Round Robin DNS is where you have multiple IP addresses assigned to one host name (or A record). When this is set up, the DNS server will alternate between all the IPs assigned to that A record, giving the first IP out to the first client, the second IP to the second client, the third IP to the third client, and the first IP again to the fourth client (assuming 3 IPs for one A record).

With a SRV record, though, we can configure much more advanced and capable High Availability features by having multiple SRV records that have the same Service Name, but different combinations of Priority and Weight.

When we use SRV records, we have two options for high availability: Failover and Load Balancing. We can also combine the two if we wish. To do this, we manipulate the values of Priority and Weight.

If we want failover capabilities for our application, we would have two servers hosting the service and configure one with a lower Priority value than the other. When the application performs a SRV record lookup, it will retrieve all the SRV records and attempt to contact the servers in order of Priority until it gets a response. A lower Priority value gets contacted first.

If we want to have load balancing for the application (all servers can be used at any time), we have multiple SRV records with the same service name, like with the Failover solution, and the same Priority value. We then determine how much of the load we want each server to take. If we have two servers providing the same service and want them to share the load equally, we pick any even number between 2 and 65534 (65535 is the highest possible Weight value) then divide that number by 2. The resulting value is entered for the Weight on both servers. When a client queries the SRV record, it will receive all values that match the SRV record, calculate the total weight, and then pick a random number between 1 and whatever the total weight value of all SRV records is to determine which server to talk to.

For instance, if you had Server 1 and Server 2 both with a Weight of 50 in their SRV records, the client would assign half of the total weight value, 100, to Server 1 and half to Server 2. Let’s say it assigns 1-50 to Server 1 and 51-100 to Server 2. The client would then pick a number between 1 and 100. If it picked a number between 1 and 50, the client would communicate with Server 1. Otherwise, it would talk to Server 2. Note: because this functions using a random number, you will not always end up with results that match the calculated expectations. Also note: the exact scheme used to pick a server based on the Weight value is determined by the application’s developer. This is just a simple example of how it can work. Some developers may choose a scheme that always results in an exact load distribution.
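Here’s a small PowerShell sketch of that weighted pick, using the 50/25 example from the Weight description earlier. This mirrors the simple scheme described above, not any particular client’s implementation, and the server names are made up:

# two SRV records with equal Priority; weights of 50 and 25 mean a two-thirds/one-third split
$records = @(
    [pscustomobject]@{ Target = 'server1.whatever.com'; Weight = 50 }
    [pscustomobject]@{ Target = 'server2.whatever.com'; Weight = 25 }
)
$total = ($records | Measure-Object -Property Weight -Sum).Sum   # 75
$pick  = Get-Random -Minimum 1 -Maximum ($total + 1)             # a random number from 1 to 75
foreach ($record in $records) {
    $pick -= $record.Weight
    if ($pick -le 0) { "Talk to $($record.Target)"; break }
}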

The Weight value can be used with as many servers as you want (up to 65534 servers), and with any percentage split you want to define for your load-balancing scheme. You can have 4 servers where three share the load roughly equally and the fourth only gets chosen when the others are down, by setting the weight on three of the SRV records to 33 and the fourth to 0. Note that a value of 0 means the server is only chosen when all others are unavailable. You should not create multiple copies of the same SRV record with weights of 0.

Lastly, you can combine Priority and Weight to have multiple load-balanced groups of servers. This isn’t a very common solution, but it is possible to have Servers 1 and 2 using Priority 1 and Weight 50, with Servers 3 and 4 using Priority 2 and Weight 50. In this situation, Servers 1 and 2 would each carry half the load, but if both stopped working, Servers 3 and 4 would take over, splitting the load between themselves.

Tinkering with AD

If you want to see how SRV records can be used to handle high availability and get a good example of a system that uses SRV records to their fullest capabilities, try tinkering with some of your AD SRV records. By manipulating Priority and Weight, you can force clients to always use a specific DC, or configure them to use one DC more often than others.

Try modifying the Weight and Priority of the various SRV records to see what happens. For instance, if you want one specific DC in your environment to handle Kerberos authentication and another one to handle LDAP lookups, change the priorities of those records so one server has a 0 in Kerberos and 100 in LDAP, while the other has 100 in Kerberos and 0 in LDAP. You can also tinker with the Weight to give a DC with more resources priority over smaller, backup DCs. Give your monster DC a weight of 90 and a tiny, possibly older DC a weight of 10. By default, clients in AD pick a DC at random.

The easiest way to see this in action is to set one DC with a Priority of 10 and another with a Priority of 20 on all SRV records in the _msdcs zone. Then make sure the DNS data is replicated between the DCs (either wait or do a manual replication). Run ipconfig /flushdns on a client machine and log out, then back in. Run SET LOGONSERVER in CMD to see which DC the computer is using. Now switch the priorities of the SRV records in DNS, wait for replication, run ipconfig /flushdns, then log out and back in again. Run SET LOGONSERVER again and you should see that the second DC is now chosen.

Final Thoughts

As I mentioned, much of a SRV record’s configuration is determined by software developers, since they define how their applications function. To be specific, as an IT administrator or engineer, you’ll never be able to decide what the Service Name and Protocol will be. Those are always determined by software developers. You’ll also never be in control of whether or not an application uses SRV records; software developers have to design their applications to make use of them. But if you take some time to understand how a SRV record works, you can greatly improve functionality and security for any and all applications that support configuration using SRV records.

If you’re a Software Developer, I have to point out the incredible usefulness of SRV records and the power they give to you. Instead of having to hard-code server configurations or develop UIs that allow your end users to put in server information, you can utilize SRV records to partially automate your applications and make life easier for the IT people who make your software work. SRV records have been available for almost 2 decades now. It’s about time we started using them more and cut down the workload of the world’s IT guys.