ADFS or Password Sync: Which one do you use?

I’ve run into a number of people who get confused about this subject when trying to determine how to get their On-Prem accounts and Office 365 synced and working properly. Most often, someone comments, “Just use Password Sync, it’s just as good and doesn’t require a server,” or something similar. While I wish this were true, it absolutely is not. While both options fulfill a similar requirement (“I want my AD usernames and passwords to work with Office 365”), they do so in completely different ways, and the difference can have a major impact on security, workflow, and administration of services.

Single Sign-On vs Same Sign-On

To see the difference here, you have to understand the terminology involved. The primary goal of synchronizing user accounts between Office 365 and Active Directory is to give users the ability to sign in to O365 with the same username and password they use when logging in to their computer. There are two terms used to describe this relationship. Single Sign-On refers to technology that allows users to access numerous applications while only logging in once. You’ve probably used Facebook or Google’s version of this to access applications, games, or other software. Same Sign-On, however, allows a user to access multiple applications with the same username and password, entered separately for each one. If you have two bank accounts and use the same username and password for both, you’re using a simplified version of Same Sign-On. Most Same Sign-On solutions in IT involve an application that reads username and password data used by one system and copies it to another system.

The biggest difference between the two technologies is that Single Sign-On allows you to authenticate one time and access all the applications that are tied to that sign-on system. Same Sign-On requires you to log in to all applications regardless of which or how many applications you’ve already logged into using that username and password.

Single Sign-on and Same Sign-on have a lot of similarities as well. They both allow you to use the same username and password and both simplify account management (theoretically). Most importantly, for Office 365 at least, they allow you to manage usernames and passwords in a single environment, rather than having to change passwords in multiple locations every time something needs to change. The way changes are accomplished is where the decision to use ADFS or Password Sync faces its biggest test.

ADFS is Single Sign-On, Password Sync is Same Sign-On

For the purposes of Office 365, which is what this article focuses on, ADFS is considered a Single Sign-On solution, while Password Sync is Same Sign-On. What does this mean for you, the IT administrator, when you are deciding how to set up your environment? It means you need to consider the following realities of each solution:

ADFS Issues

  1. ADFS requires more administrative overhead to function:
    1. ADFS is not a perfect solution and it does fail sometimes.
    2. Troubleshooting ADFS can be a daunting task. The error messages provided by ADFS are really poorly worded and generic, so a lot of digging in logs is required to really figure out where a problem is coming from.
    3. ADFS requires a trust between your environment and Office 365. Maintaining the trust takes some effort. ADFS relies on Digital Certificates that have expiration dates, so you have to make sure the certificates are updated before they expire or ADFS won’t work.
  2. ADFS is tricky to configure sometimes. The Office 365 setup for it has been streamlined, but there are occasional setup issues that can be difficult to resolve or confusing.
  3. If your ADFS server goes down for any reason, Office 365 can’t be accessed. This means that a High Availability ADFS cluster is very beneficial. It’s also expensive.
  4. In short, ADFS has a significantly higher cost to use than password sync, but it is also more secure.

Password Sync Issues

  1. Password Sync copies a hash derived from the AD password hash to Office 365 (the synced value is itself re-hashed before upload, so it isn’t the raw AD hash, but it is still derived from your passwords). This means that if Office 365 were ever compromised by hackers (very unlikely, but still a potential concern), they would have material for offline attacks against your users’ passwords, and by extension your network. This doesn’t happen with ADFS, because no password data leaves your environment.
  2. The synchronization between Office 365 and AD occurs on a scheduled basis, every 30 minutes at minimum, so if you change someone’s password in AD, you may have to wait up to 30 minutes for the password to change in Office 365. This can be very confusing for users and result in a lot of time-consuming support calls, particularly if you enable account lockout in Office 365. You can force syncs to occur, but this does add a good bit of administrative time to the password change process.

Issue Mitigation

There are some ways to get around the issues involved with each solution. For instance, Microsoft is currently working on a cloud-based version of ADFS that will allow you to have ADFS-level security without the added infrastructure and administrative costs of an ADFS server/cluster. They also provide an “upgraded” version of Azure AD (the back-end system for account management in Office 365) called Azure AD Premium. AAD Premium costs about 4 dollars a month, but it allows you to provide your users with self-service password reset features and adds attribute “write-back” capabilities that let you manage users in the cloud when using AD Connect, which isn’t possible otherwise. That means you can change distribution group membership, user passwords, and other attributes in Office 365, and those changes will be written back to your AD environment.


In the end, the decision between ADFS and Password Sync is entirely up to you. If you have major regulatory governance requirements or are very concerned about security, ADFS is a very capable system that will greatly improve system security for Office 365. However, if you work for a small organization with little to no major security concerns, Password sync will provide you with a lot of benefit.

Update – 10/30/2017

It’s been a while since I wrote this post, but a number of changes to ADFS and the addition of Pass-through Authentication in AD Connect mean that I need to update some of the conclusions here, which may well change the solution you choose.

  1. Password Sync has a specific limitation for environments that restrict logon hours in Active Directory. Because the logon-hours attributes are not properly synced through Azure AD Connect, logon hour restrictions will not function in Office 365 when using Password Sync. ADFS authenticates against AD directly, so it will not allow users to log in if AD says they are outside their allowed logon hours window(s).
  2. Pass-through Authentication in Azure AD Connect *greatly* improves authentication in Office 365 by passing credentials to AD through Azure AD Connect, rather than storing password hashes in the cloud. This significantly reduces the security risks associated with using Password Sync.
  3. ADFS in Server 2012 R2 and later includes a pretty awesome feature that I wasn’t aware of until just now: a self-service password reset portal tied to the ADFS portal.



What is a DNS SRV record?

If you’ve had to work with Active Directory or Exchange, there’s a good chance you’ve come across a feature of DNS called a SRV record. SRV records are an extremely important part of Active Directory (they are, in fact, the foundation of AD) and an optional part of Exchange Autodiscover. There are a lot of other applications that use SRV records to some degree or another (Lync/Skype for Business relies heavily on them, for instance). The question, though, is why SRV records are so important and what exactly do they do?

What does a SRV record do?

The purpose of a SRV record is found in its longer, more jargon-filled name: Service Locator Record. It’s basically a DNS record that allows applications to find a server providing a service they need to function. SRV records provide a centralized method of configuring and controlling applications, which results in less work configuring the client side of a client/server application.

For example, let’s say you’re an application designer and you are creating an application that needs to talk to a server for some reason. Prior to the existence of SRV records in DNS, you had two choices:

  1. Program the application so it only ever talked to a server if it had a specific name or IP address
  2. Include some configuration settings in the application that would let end users put in the DNS name of the server.

Neither of these options is great for usability. Hard-coding IP addresses or host names for the server makes setup difficult and very rigid in its requirements. Making end users enter the server information usually creates a lot more work for IT staff, since they usually end up doing it for all the users.

SRV records were added to the DNS specifications (formalized in RFC 2782 in the year 2000) to give programmers another option for designing client/server software. With SRV records, the application can be designed to look for a SRV record and get server information without having to be directly configured by end users or IT staff. This is similar to the first option above, but allows greater flexibility because the server can have any name or IP address you want and the application can still find it. Some of the advanced features of SRV records also allow failover capabilities and a lot of other cool stuff.

How do SRV Records Work?

Since Active Directory relies so heavily on SRV records, let’s use it as an example to explain how they work. First, let’s take a look at a typical AD DNS zone. Below, you can see a picture that shows the fully expanded _MSDCS zone for my test lab:

This shows the _Kerberos and _ldap SRV records created by a Domain Controller (Megaserver). Here’s basically what those records are for:

  1. Windows login requires a domain-joined client to connect to a Domain Controller.
  2. The login system is programmed to find a Domain Controller by looking up the _ldap SRV record in the _msdcs zone (for example, _ldap._tcp.dc._msdcs.&lt;domain&gt;).
  3. That SRV record returns the name of the server providing the _ldap service (in my lab, Megaserver).
  4. The computer’s programming fills in the returned server name wherever it needs the location of the _ldap service.
  5. The computer then talks to that server exclusively for all functions that require LDAP (which is the underlying protocol used by AD for what it does).

If SRV records didn’t exist, we would have to manually configure every computer on the domain with the Domain Controller’s location for anything related to AD. Now, that’s certainly not an unfeasible solution, but it does give us a lot more work to do.

What Makes up a SRV record?

A SRV record has a number of settings that are required for it to function. To see all the settings, look at the image below:


That shows an Exchange Autodiscover SRV record. I’ll explain what each setting here does:

Domain: This is an un-changeable value. It shows the DNS Domain the SRV record belongs to.
Service: This is the “service” the SRV record will be used to define. In the image, that service is Autodiscover. Note that all SRV records should have an Underscore at the start, so the service value is _autodiscover. The underscore prevents issues where there might be a regular A record with the same name as a SRV record.
Protocol: This is the protocol used by the service. Functionally, this can be anything, since the protocol in a SRV record is usually only meant to organize SRV records, but it’s best to use the protocols allowed by RFC 2782 to ensure compatibility (_tcp and _udp are universally accepted). Unless you are designing software that uses SRV records, you’ll never be in a situation where you’ll have to make a decision about what to put as the Protocol. If you’re configuring a SRV record for some application that you are setting up, just follow the instructions when creating it.
Priority: In a situation where multiple servers are providing the same service, the Priority value determines which server should be contacted first. The server chosen will always be the one with the lowest number value here.
Weight: In a situation where you have multiple SRV records with the same Service value and Priority value, the Weight is used to determine which server should be used. When the application is designed according to RFC 2782, the Weight values of all matching SRV records are added together to determine the total weight. Whatever portion of that total a single SRV record is assigned determines how often its server will be used by the application. For instance, if you have 2 SRV records with the same Service and Priority where Server 1 has a weight of 50 and Server 2 has a weight of 25, Server 1 will be chosen by the application as its service provider two-thirds of the time, because its weight of 50 is two-thirds of the total weight of 75. Server 2 will be chosen the remaining one-third of the time. If there’s only one server hosting the service, set this value to 0 to avoid confusion.
Port Number: This setting provides Port data for the application to use when contacting the server. If, for instance, your server is providing this service on port 5000, you would put 5000 in as the Port number. The setting here is defined by how the server is configured. For Autodiscover, as shown above, the value is 443, which is the default port designated by the HTTPS protocol. The Autodiscover Website in my environment is being hosted on the default HTTPS port, so I put in port 443. If I wanted to change my server to use port 5000, I could do so, but I would need to update my SRV record to match (As an aside, if I wanted to change the port Autodiscover was published on, I would be required to use a SRV record for Autodiscover to work, as opposed to any other method).
Host Offering this Service: This is, put simply, the host name of the server we want our clients to communicate with. You can use an IP address or a Host name here, but it’s generally best to use the Host name, since IPs can and do change over time.
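Those fields map naturally onto a small data structure. Here is a sketch in Python; the record values are hypothetical, modeled loosely on the Autodiscover example above (the domain and host names are made up):

```python
from dataclasses import dataclass

@dataclass
class SrvRecord:
    service: str   # e.g. "_autodiscover" (leading underscore per RFC 2782)
    protocol: str  # e.g. "_tcp"
    domain: str    # DNS zone the record belongs to
    priority: int  # lower value = contacted first
    weight: int    # relative share of load within the same priority
    port: int      # port the service listens on
    target: str    # host name offering the service

    @property
    def name(self) -> str:
        """Full DNS name a client would query, e.g. _autodiscover._tcp.example.com"""
        return f"{self.service}.{self.protocol}.{self.domain}"

# a hypothetical Autodiscover record like the one described above
record = SrvRecord("_autodiscover", "_tcp", "example.com",
                   priority=0, weight=0, port=443, target="mail.example.com")
```

Note the weight of 0: as described above, that’s the conventional value when only one server hosts the service.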

Using SRV Records to Enable High Availability

If you managed to read through all the descriptions of those settings up there, you may have noticed my explanation of the Priority and Weight settings. Well, those two settings allow for one of the best features of SRV records: High Availability.

Prior to the existence of SRV records, the only way you could use DNS to enable high availability was to use a feature called Round Robin. Round Robin DNS is where you have multiple IP addresses assigned to one host name (or A record). When this is set up, the DNS server will alternate between all the IPs assigned to that A record, giving the first IP out to the first client, the second IP to the second client, the third IP to the third client, and the first IP again to the fourth client (assuming 3 IPs for one A record).
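The rotation described above can be simulated in a few lines. This is only a sketch of the behavior, using example addresses from the documentation range:

```python
from itertools import cycle

# three IPs assigned to one A record (example addresses)
ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
rotation = cycle(ips)

# the DNS server hands the next IP to each successive client;
# with 3 IPs, the fourth client receives the first IP again
answers = [next(rotation) for _ in range(4)]
```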

With a SRV record, though, we can configure much more advanced and capable High Availability features by having multiple SRV records that have the same Service Name, but different combinations of Priority and Weight.

When we use SRV records, we have two options for high availability: Failover and Load Balancing. We can also combine the two if we wish. To do this, we manipulate the values of Priority and Weight.

If we want failover capabilities for our application, we would have two servers hosting the service and configure one server with a lower Priority value than the second. When the application performs a SRV record lookup, it will retrieve all the SRV records and attempt to contact all servers until it gets a response, using the Priority value to determine the order. A lower Priority value will be contacted first.

If we want to have load balancing for the application (all servers can be used at any time), we have multiple SRV records with the same service name, like with the Failover solution, and the same Priority value. We then determine how much of the load we want each server to take. If we have two servers providing the same service and want them to share the load equally, we pick any even number between 2 and 65534 (65535 is the highest possible Weight value) then divide that number by 2. The resulting value is entered for the Weight on both servers. When a client queries the SRV record, it will receive all values that match the SRV record, calculate the total weight, and then pick a random number between 1 and whatever the total weight value of all SRV records is to determine which server to talk to.

For instance, if you had Server 1 and Server 2 both with a Weight of 50 in their SRV records, the client would assign half of the total weight value, 100, to Server 1 and half to Server 2. Let’s say it assigns 1-50 to Server 1 and 51-100 to Server 2. The client would then pick a number between 1 and 100. If it picked a number between 1 and 50, the client would communicate with Server 1. Otherwise, it would talk to Server 2. Note: Because this functions using a random number, you will not always end up with results that match the calculated expectations. Also note: The exact scheme used to pick a server based on the Weight value is determined by the application’s developer. This is just a simple example of how it can work. Some developers may choose a scheme that always results in an exact load distribution.

The Weight value can be used with as many servers as you want (up to 65534 servers), and with any percentage amount you want to define your load balancing scheme. You can have 4 Servers, with only three providing service 33% of the time, while the fourth server only gets chosen when all others are down by setting the weight for three SRV records to 33 and the fourth to 0. Note that a value of 0 means that the server is only chosen when all others are unavailable. You should not set multiple copies of the same SRV record with weights of 0.
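The priority-then-weight selection described above can be sketched in a few lines of Python. This is an illustrative implementation of the RFC 2782 scheme, not code from any particular resolver, and the server names are made up:

```python
import random

def select_target(records, rng=random):
    """Pick one SRV target: only the lowest-priority group is considered,
    and within it a target is chosen weighted-randomly.
    `records` is a list of (priority, weight, target) tuples."""
    lowest = min(priority for priority, _, _ in records)
    group = [(w, t) for p, w, t in records if p == lowest]
    total = sum(w for w, _ in group)
    if total == 0:
        # all weights are zero: any target is equally acceptable
        return rng.choice(group)[1]
    pick = rng.randint(1, total)       # random number between 1 and the total weight
    running = 0
    for weight, target in group:
        running += weight
        if pick <= running:
            return target

# Server 1 (weight 50) should be chosen about twice as often as Server 2 (weight 25);
# Server 3 sits at a higher (worse) priority, so it's only a fallback and is never picked here.
records = [(0, 50, "server1"), (0, 25, "server2"), (1, 100, "server3")]
rng = random.Random(42)
counts = {"server1": 0, "server2": 0, "server3": 0}
for _ in range(3000):
    counts[select_target(records, rng)] += 1
```

Running this, `server1` lands near two-thirds of the 3000 picks, matching the Weight arithmetic above, though (as noted) any single run will deviate a little from the exact ratio.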

Lastly, you can combine Priority and Weight to have multiple load balanced groups of servers. This isn’t a very common solution, but it is possible to have Server 1 and 2 using priority 1 and weight 50, with Server 3 and 4 using priority 2 with weight 50. In this situation, Servers 1 and 2 would provide 50 percent of the system load, but if both Server 1 and 2 stopped working, Server 3 and 4 would then be used, while distributing the load between themselves.

Tinkering with AD

If you want to see how SRV records can be used to handle high availability and get a good example of a system that uses SRV records to their fullest capabilities, try tinkering with some of your AD SRV records. By manipulating Priority and Weight, you can force clients to always use a specific DC, or configure them to use one DC more often than others.

Try modifying the Weight and Priority of the various SRV records to see what happens. For instance, if you want one specific DC in your environment to handle Kerberos authentication and another one to handle LDAP lookups, change the priorities of those records so one server has a 0 in Kerberos and 100 in LDAP, while the other has 100 in Kerberos and 0 in LDAP. You can also tinker with the Weight to give a DC with more resources priority over smaller, backup DCs. Give your monster DC a weight of 90 and a tiny, possibly older DC a weight of 10. By default, clients in AD will pick a DC at random.

The easiest way to see this in action is to set one DC with a Priority of 10 and another with a Priority of 20 on all SRV records in the _msdcs zone. Then make sure the DNS data is replicated between the DCs (either wait or do a manual replication). Run ipconfig /flushdns on a client machine and log out, then back in. Run SET LOGONSERVER in CMD to see which DC the computer is using. Now, switch the priorities of the SRV records in DNS, wait for replication, run ipconfig /flushdns, then log out and back in again. Run SET LOGONSERVER again and you should see that the second DC is now chosen.

Final Thoughts

As I mentioned, much of a SRV record’s configuration is determined by Software Developers, since they define how their applications function. To be specific, as an IT administrator or engineer, you’ll never be able to decide what the Service Name and Protocol will be; those are always determined by software developers. You’ll also never be in control of whether an application uses SRV records, since developers have to design their applications to make use of them. But if you take some time to understand how a SRV record works, you can greatly improve functionality and security for any and all applications that support configuration using SRV records.

If you’re a Software Developer, I have to point out the incredible usefulness of SRV records and the power they give to you. Instead of having to hard-code server configurations or develop UIs that allow your end users to put in server information, you can utilize SRV records to partially automate your applications and make life easier for the IT people who make your software work. SRV records have been available for almost 2 decades now. It’s about time we started using them more and cut down the workload of the world’s IT guys.



A Treatise on Information Security

One famous misquote of American Founding Father Ben Franklin goes like this: “Anyone who would sacrifice freedom for security deserves neither.” At first glance, this statement speaks to the heart of people who have spent hours waiting in line at the airport, waiting for a TSA agent to finish groping a 90-year-old lady in a wheelchair so they can take off their shoes and be guided into a glass tube to be bombarded with the emissions of a full body scanner. But the reality of any kind of security, and Information Security in particular, is that any increase in security requires sacrificing freedom. The question we all have to ask, as IT professionals tasked with improving or developing proper security controls and practices, is whether the cost of lost freedom is worth the amount of increased security.

The Balancing Act

If you were to dig a little, like I have, you would find that Mr. Franklin actually said, “Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety.” This version of the quote demonstrates very eloquently one of the principal struggles of developing security policies in IT. After all, there is a famous axiom in the industry (it’s quote day here at ACBrown’s IT World): “The most secure computer is unplugged.” Or something like that. I’m probably misquoting.

In a humorous demonstration of that axiom, I present a short story. When I was a contractor performing DIACAP (go look it up) audits on US military bases, we were instructed to use a tool called the “Gold Disc.” The Gold Disc was developed by personnel in the military to scan through a workstation or server and check for configuration settings that violated the DISA (that’s the Defense Information Systems Agency) STIG (that’s Security Technical Implementation Guide, not the guy that drives cars for that one TV show). The Gold Disc was a handy tool, but the final screen that gave you the results of the scan had a little button on it that we were expressly forbidden from ever pushing. That button said, simply, “Remediate All.” Anyone who pushed that button would find that they were instantly locked out of the network, unable to communicate with anything. Pushing the button on an important server would result in mass hysteria, panic, and sudden loss of employment for the person who pushed the button. You see, the Remediate All button caused the tool to change every configuration setting to comply exactly with the DISA STIG recommendations. If you’re not laughing yet, here’s the punchline…Perfectly implementing the DISA STIG puts computers in a state that makes it impossible for them to communicate with one another properly. <Insert follow up joke regarding Government and the problems it causes here>.

On the other hand, computers that blatantly failed to comply with the DISA STIG recommendations would (theoretically) be removed from the network (after 6 or 7 months of bureaucratic nonsense). In the end, there was a point in the middle where we wanted the systems to be. That balancing point was the point where computers were secure enough to prevent the majority of attacks from succeeding, but not so secure that they significantly inhibited the ability of people to do their jobs effectively and in a timely matter. As IT Security professionals, we have a duty to find the right balance of security and freedom for the environments we are responsible for.

The Costs of Security

Everything in IT has a cost. The cost can’t always be easily quantified, but there is always a cost associated. For instance, something as simple as password expiration in Active Directory has a very noticeable cost. How much time do system administrators spend unlocking accounts for people who forgot their password right after it was reset? Multiply the number of hours spent unlocking accounts and helping people reset their passwords by the average system administrator’s hourly rate and you get the cost of that level of security in dollars. But that is only the direct cost.

Implementing password expiration and account lockout policies also reduces the level of freedom your employees have in controlling their user accounts. That lost freedom also translates into lost revenue as employees are forced to spend their time calling tech support to get their passwords reset. Then there is the lost productivity from people wasting time trying to remember the password they set earlier that morning.

With some estimates showing that nearly 30 percent of all help-desk work hours are devoted to password resets, the cost of enabling password expiration climbs pretty high.
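To make the arithmetic above concrete, here is a quick back-of-the-envelope sketch. Every number below is an assumption for illustration, not a measurement from any real help desk:

```python
# assumed figures for illustration only
resets_per_month = 120     # password-reset tickets per month
minutes_per_reset = 10     # help-desk time spent per ticket
admin_hourly_rate = 30.0   # fully loaded cost of a sysadmin, in dollars/hour

# 120 tickets * 10 minutes = 20 hours of help-desk time per month
hours_per_month = resets_per_month * minutes_per_reset / 60

# direct yearly cost of the password-expiration policy
direct_cost_per_year = hours_per_month * admin_hourly_rate * 12
```

Under these assumptions the direct cost alone is $7,200 a year, before counting any of the indirect productivity losses described above.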

The Cost of Freedom

On the other hand, every day an individual goes without resetting their password increases the likelihood of that password being discovered. Furthermore, every day a discovered password is left unchanged increases the likelihood of that password being used by an unauthorized individual. If the individual who lost the password is highly privileged (a CEO, for example), the cost to the business that employs that individual can be astronomical. There are numerous cases of companies suffering major losses, even bankruptcy, after intrusions linked to exposed passwords.

So while it may cost a lot to implement a password expiration policy, it can cost infinitely more not to. In comparison, the cost of implementing a password expiration policy is almost always justified. This is particularly true when working for organizations that fall under the purview of Regulatory Compliance laws (Queue the dramatic music).

Regulatory Compliance

One of the unfortunate realities of the IT world is that some organizations have outright failed to consider the costs of *not* having a good security policy and just plain failed to have good security. Those organizations got hit hard and either lost data that cost the business huge amounts of money, or worse, data that put their customers at risk of identity theft. So, because the kids couldn’t play safe without supervision, most Governments around the world have developed laws that tell businesses in key industries things that they must do when developing their IT infrastructure.

For instance, the Healthcare industry in the US must follow the HITECH addition to HIPAA (so many acronyms), which mandates the use of IT infrastructure that prevents the unauthorized disclosure of certain types of patient information. Publicly owned corporations in the US are required to follow the rules outlined in the Sarbanes-Oxley Act, which requires companies to maintain adequate records of business dealings for a significant period of time. The aforementioned DIACAP audits are performed to verify whether military installations are complying with the long list of instructions and requirements developed by the DoD (if you ever have trouble sleeping…).

Organizations that fall under the umbrella of one or more Regulatory Compliance laws are compelled to ensure their IT infrastructure meets the defined requirements. Failing to do so is often punishable with significant fines. Failing to do so and getting attacked in a way that exploits security holes the regulations were meant to plug is a huge problem (not just for the organization itself). For such organizations, the costs associated with violating regulations must always be considered when developing a security policy. This is mostly a good thing, though the costs of actually meeting the regulations are occasionally extremely high.

Mitigating Costs – Not Always Worth It

There are actually a lot of technical solutions in the IT industry that exist entirely to reduce the costs associated with implementing security technologies. For instance, utilizing a Self-Service Password Reset (SSPR, cause that’s a lot of typing) solution can significantly reduce the number of man-hours required by help-desk staff to reset passwords and unlock accounts. But such solutions also have costs associated with them. Aside from the purchase cost, many of these solutions significantly reduce security in an organization.  SSPRs, again, increase user freedom and control of their user account, which makes things less secure again. However, depending on the SSPR in use, how much security is reduced depends on how users interact with the software. An SSPR that only requires someone to enter their username and current password is likely to reduce security significantly more than an SSPR that requires users to answer 3 “security questions,” which will, in turn, reduce security much more than an SSPR that requires people to provide their Social Security Number, submit a urine sample, and authenticate with a retina scan while sacrificing a chicken from Uruguay with a special ceremonial dagger. But, again, the time spent by employees resetting their own password (not to mention the cost of importing chickens from Uruguay) increases the cost of such solutions. The key to determining which solutions and technologies to use is a matter of finding the right balance of freedom and security in the environment.

When Security Costs Too Much Freedom

There are times when the financial costs and the cost of freedom associated with a security measure are obviously too high (I’m looking at you, TSA). Implementing longer passwords may have many technical security advantages, but doing so includes a risk that the loss of freedom is too great for people to handle. For instance, implementing a 20 character minimum password policy that includes password complexity requirements might cause some employees with bad memories to write their password down and put it in a place that is easy for them to remember. Like on a post-it note stuck to their monitor. Suddenly, that very secure password policy is defeated by a low-tech solution. Now you have a password accessible to anyone walking around in the office (like Janitor Bob) that can be used to access critical information and sell it to the highest bidder (AKA, your competitor). This is a prime example of the unconsidered costs of security being too high. Specifically, the security requirement costs so much freedom and negatively impacts employees so much that they end up bypassing security entirely.

Balancing Act

In the end, IT security is a massive balancing act. To properly balance security and freedom in IT, you need to ask questions and learn as much about the environment as possible. That investigative phase is among the most important parts of developing any security policy. Organizations looking to increase security need balance in their security implementations, and decisions on IT security must always be thoughtful ones.

Disabling Direct Access Forced Tunneling

So you’re trying to get Direct Access (DA) running in your environment and you suddenly realize that your test machine can no longer access…anything. This may be due to the “accidental” enabling of “Forced Tunneling” in your DA configuration. How do you fix it? You can easily reconfigure DA to disable Forced Tunneling, but unless your test machine is directly connected to your AD environment, it will never receive the Group Policy updates. Now, you *should* be connected if you’re doing this, but there are some situations where that’s just not possible (Remote workers unite!).

Disabling Forced Tunneling on Client Machines

I’ll give the way to fix this first, and explain why this happens second:

  1. Open Regedit
  2. Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\TCPIP\v6Transition
  3. Set all visible entries to Disabled
  4. Delete all subkeys.
  5. Reboot
  6. Rejoice
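For what it’s worth, steps 2–4 can be scripted. This is a hypothetical Python sketch using the standard winreg module; `V6_TRANSITION_KEY` and `disable_v6_transition` are names I made up, and you’d run it elevated on the affected client, then reboot:

```python
# Sketch of steps 2-4 above. Windows-only, run as administrator.
# This is the key Regedit shows as
# HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\TCPIP\v6Transition
V6_TRANSITION_KEY = r"SOFTWARE\Policies\Microsoft\Windows\TCPIP\v6Transition"

def disable_v6_transition():
    """Set every value under the v6Transition key to 'Disabled' and
    delete its subkeys. winreg is imported lazily so this file still
    loads on non-Windows machines."""
    import winreg
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, V6_TRANSITION_KEY,
                        0, winreg.KEY_ALL_ACCESS) as key:
        n_subkeys, n_values, _ = winreg.QueryInfoKey(key)
        # Step 3: set all visible entries to Disabled.
        names = [winreg.EnumValue(key, i)[0] for i in range(n_values)]
        for name in names:
            winreg.SetValueEx(key, name, 0, winreg.REG_SZ, "Disabled")
        # Step 4: delete all subkeys (always index 0, since the list shrinks).
        for _ in range(n_subkeys):
            winreg.DeleteKey(key, winreg.EnumKey(key, 0))
```

Again, a reboot afterwards (step 5) is still required for the change to take effect.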

Why Does This Happen?

Well, if your DA environment is not configured perfectly, you can’t initialize a DA session. If, for instance, your Computer Client Certificate fails to enroll properly before you disconnect, and your machine has already obtained the DA settings from Group Policy, you’re stuck with all the settings required to connect to DA but no way to actually do so. With Forced Tunneling enabled, all DA client systems are forced through DA for *any* internet connectivity. And if your DA DNS settings also point to an internal IP for DNS lookups when connected, congratulations…you can’t reach a dang thing. Disabling Forced Tunneling in the registry is about your only option here. Just make sure you’ve also disabled Forced Tunneling in your DA config before you disconnect from the VPN again, or you’ll have to do this all over again. (Oops)

Final Note

Don’t use Forced Tunneling with Direct Access. It provides no additional security and is a huge pain in the butt if DA doesn’t connect properly for *any* reason.

Anatomy of a Certificate Error

The most important step in diagnosing a specific security error involves determining what the error is telling you. There are a few things that can cause certificate errors, and what you do depends entirely on what is causing the error to begin with. Once you know what the error is telling you, it becomes much easier to figure out what you need to do next.

Getting the Message

One of the more concise and effective Certificate Errors is the one delivered by Outlook. An image of it is below.



Note the numbers 1, 2, and 3. They don’t normally show up on the error; I put them there for reference, in case you’re comparing against your own error. At any rate, the numbers sit next to three possible kinds of errors you can get with a certificate.

For this particular error, you’ll note that there is a red X next to number 3. That X points out that one of the validity checks run against the certificate failed. Specifically, the name I used to access the server doesn’t match either the Common Name on the certificate or any of the Subject Alternative Names. This is probably the most common certificate error you’ll see.

The Four Checks

Every time you access a website that is secured with SSL, there are four checks the computer you use runs to verify that the certificate is valid. The reason for these checks is explained in my article on Digital Certificates. The four checks are as follows, and match the numbering in the image above.

  1. Was the Certificate issued by a known and trusted Certificate Authority?
  2. Is the current date within the period of time the Certificate is valid?
  3. Does the host name used to access the server match any of the host names defined by the certificate?
  4. Has the Certificate been Revoked? (Wait, there’s no 4 on the image! Don’t worry, I’ll explain later.)
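To make checks 2 and 3 concrete, here’s a minimal Python sketch that runs them against the certificate dictionary returned by `ssl.SSLSocket.getpeercert()`. Checks 1 and 4 need the trust store and the CRL, so they’re out of scope here, and fnmatch is looser than real browser wildcard matching:

```python
import fnmatch
import ssl
import time

def check_certificate(cert, hostname, now=None):
    """Run checks 2 and 3 against a cert dict shaped like the one
    ssl.SSLSocket.getpeercert() returns. Returns a list of failures
    (empty list = both checks passed)."""
    now = time.time() if now is None else now
    failures = []
    # Check 2: is the current date inside the validity window?
    if not (ssl.cert_time_to_seconds(cert["notBefore"]) <= now
            <= ssl.cert_time_to_seconds(cert["notAfter"])):
        failures.append("date outside validity period")
    # Check 3: does the hostname match a SAN? (fnmatch handles '*' but is
    # more permissive than the real single-label wildcard rule.)
    sans = [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]
    if not any(fnmatch.fnmatch(hostname, san) for san in sans):
        failures.append("hostname does not match certificate")
    return failures
```

In real life you’d let the TLS library do this for you; the sketch just shows what the library is deciding on your behalf.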

If any of these checks fails, you’ll get a certificate error. Note that this *does not mean* that the data you’re trying to encrypt won’t be encrypted. Any time you use SSL or TLS, your data will be encrypted whether the certificate is valid or not. However, if any of the checks fail, it is much more likely that someone could decrypt the data you encrypt. Here’s why, based on each of the possible certificate errors.

Was the Certificate issued by a known and trusted Certificate Authority?

Certificate Authorities are servers designed specifically to generate digital certificates. Anyone on the planet can create a Certificate Authority server if they want to (and know how to). If you have your own Certificate Authority, you can create a certificate that matches any Common Name you want and use it to insert yourself into any secure transmission and read the data without anyone knowing, but only if the client computer *trusts* your Certificate Authority.

Most computer Operating Systems and Web Browsers ship with a list of CAs that are trusted right out of the box, including CAs owned by companies like GoDaddy, Entrust, and Network Solutions. So unless you happen to gain control of one of the Certificate Authorities whose Root CA certificate is installed on practically every computer in the world, your CA is probably not going to be trusted without a lot of extra work.

If you see a certificate error warning that the Certificate Authority isn’t trusted, it means the certificate was issued by a *private* CA. You can instruct your computer to trust that CA if you want, but if a site that normally has no certificate error suddenly shows this one, there’s a good chance your data is being intercepted and redirected.

As an IT Professional, if you see this error when accessing a system under your control, there are two solutions.

  1. Request a new certificate from a trusted, Third Party Root CA provider.
  2. Install the Root CA certificate as a Trusted Third Party Root CA in the OS.

#1 requires significantly less effort to accomplish because it means you don’t have to actually install the certificate on your users’ computers, phones, or other devices.

Is the current date within the period of time the Certificate is valid?

Certificates are only valid for a set period of time. Most certificates are valid for 1–3 years from the time they are generated, depending on the options used during certificate generation. Validity periods are meant to ensure that only a limited amount of time is available for a certificate’s Private Key to be discovered.

The odds of a brute force attack successfully discovering the Private Key in use are astronomically small, and the time to run a full brute force attack against a modern certificate is measured in millions of years. But as technology progresses, the time required drops exponentially. If a key had been generated in, say, 1991 using an encryption algorithm of that era like DES, it would have taken thousands of years to crack with the computing resources of the day. Today, it would take less than an hour.

Having a certificate validity period ensures that technology doesn’t outpace the security of the certificate. Having a validity period between 1 and 3 years is the general recommendation for certificates these days. If you run across a certificate that has an expiration date that is more than 2-3 years in the past, I highly recommend not using the site that uses that Certificate.

If a server you control has this error, you need to generate and install a new certificate on the server. This is the only possible solution to this error.

Does the host name used to access the server match any of the host names defined by the certificate?

This error is always caused by accessing a server with a URL whose host name isn’t included on the certificate. For instance, say a web server has a certificate that defines the host name as mail.example.com. If you attempt to reach that server using webmail.example.com, you’ll get a certificate error.

This check is meant to ensure that the server we are communicating with is the one we *want* to communicate with. If this check passes and the server is using a valid third-party certificate, we can be significantly more certain that we’re talking to the right server and that no one is spying on the data we send. If it fails, it’s important to check the information listed on the certificate to verify that we’re talking to the right server.

For IT Professionals, there are two definite solutions for this error.

  1. Generate a certificate that matches the host name you want people to use to access your server. If you need multiple names, get a SAN certificate that includes all of the host names, or a Wildcard certificate that is valid for any host name at a specific domain. (Wildcard certificates are generated with a Common Name like *.example.com and are considered valid for any single value in place of the *. This is slightly less secure, since one certificate can be used on any number of servers, but the difference is minimal. Be certain to verify that the web server you are using fully supports wildcard certificates before obtaining one. IIS supports them, as do the vast majority of Microsoft solutions, though some may require additional setup.)
  2. Create a DNS record for a host name that matches the certificate and point it to the web server.
Note: some applications that use HTTPS have specific host name requirements and may require multiple host names to function properly (Exchange Autodiscover, for example). This type of certificate error will always occur unless your certificate matches *all* the necessary host names or you have made enough configuration changes for things to work with a single host name.
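For the curious, the single-label wildcard rule mentioned above can be sketched in a few lines (example.com is a placeholder; real validators follow RFC 6125 and have more corner cases):

```python
def wildcard_matches(pattern, hostname):
    """Single-label wildcard matching sketch: '*' covers exactly one
    DNS label, so '*.example.com' matches 'mail.example.com' but not
    'a.b.example.com' and not the bare 'example.com'."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # '*' never spans multiple labels
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False  # leftmost label must match exactly or be '*'
    return p_labels[1:] == h_labels[1:]
```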

Has the Certificate been Revoked?

This is an unusual error that you will not see often. I don’t have a picture of one to show you, since it takes a good bit of effort to force the error to occur. Certificate Revocation is not particularly common; it was developed to combat the possibility of a certificate being compromised, i.e., an unauthorized entity obtaining a copy of the certificate’s Private Key.

If this happens, or if the certificate is reissued for any reason (for instance, to change the Common Name, modify the list of Subject Alternative Names, or make any other changes), the certificate is listed in a Certificate Revocation List (CRL) published by the server that originally generated it. A CRL is just a list, published at a known URL, that a web browser or other application checks to determine whether a certificate is still valid. If the certificate appears in the CRL, many applications (web browsers in particular) are designed to refuse any further communication with the server using it. Servers using revoked certificates are always considered compromised, and it is always a good idea to avoid them. Basically, if you see this error, *DO NOT CONTINUE!*

Brown’s Anatomy

So that’s it for certificate errors. The four checks are designed primarily to keep your data safe, so make sure you know what you’re walking into when you see these errors. As a regular joe, non-IT person, you’re pretty likely to run into them, and knowing what they mean will help you decide whether it’s a good idea to keep going. For IT people, you are going to see these errors a lot, no matter what, and knowing what they mean will help you fix them.



Welcome to AC Brown’s IT World!

My name is Adam Brown. I’m an IT Guy. I’ve worked most of my life fixing computers and tinkering with technology. I’ve spent thousands of hours studying and learning, and I’ve gotten to the point where I want to share what I’ve learned with others in a way that will hopefully save time digging through useless details.

My goal with this blog is to build a center of knowledge for IT professionals around the world. Over the past few years I’ve built a few good posts that have helped a lot of people out, but I feel like there’s a huge gap in the Internet’s ability to teach people how to work well and effectively in IT.

Most IT Blogs focus on the “how to” part of IT work (The Practice), with very little information about the “why” (the Theory). The problem with this approach is that it results in a lot of people just following instructions without thinking about what they’re doing or why the solution they have is recommended. I’ve had to fix a whole load of problems caused by people who just did stuff because someone told them to, and I’m sure there are loads of IT environments out there suffering from the same issues.

In this blog, I’m striving to not only explain *how* to do things, but *why* it should be done that way. This is much more difficult to do and will require additional reading from anyone here, so I am separating all of my posts into Theory and Practice, so you can get the answer you want quickly, but also have an easy way to learn more about what you’re working with when you have time to do so. This will likely be a massive undertaking for me, and I appreciate everyone’s patience while I build things up.

Below is a Table of Contents of sorts, outlining the starting point for some of my posts.

IT Security

Theory: Understanding Digital Certificates
Theory: Email Encryption for the Common Man
Theory: Passwords (Part 1)


How will the cloud affect my career as an IT Professional?

Office 365

ADFS and SQL with Office 365

Exchange Server

Practice: Resolving Public Folder Permissions with PFDavAdmin

Active Directory

Theory: Active Directory Domain Naming

Exchange Transaction Logs – Reducing the Confusion

Exchange Transaction Logs are, in my opinion, one of the most horribly documented parts of Exchange server. There’s a lot of misinformation out there as well as a lot of misunderstanding. If you look for an answer to questions that most people have about them, you’ll run across poorly written documentation that barely explains what they are, let alone how they work. In this post, I’ll be going over the basics of Transaction Logs and explaining what they are, how they work, and, more importantly, what they are for.

What are Transaction Logs?

Transaction logs are usually kept for any type of database, so knowing what a database is helps. To put a database in perspective, just think about something we’ve all had to work with at some point in time, a spreadsheet. If you’ve ever had to compile a list of numbers and figures in Excel, you’ve used a spreadsheet. Well, databases are basically collections of spreadsheets that are inter-related, extremely large, extremely complex (in some cases), and accessible to numerous users at the same time.

In order for a database to function with lots of users at the same time who may be making changes to the same data at the same time, database systems will typically write changes to data in a transaction log, and then apply the change to the database. This keeps the data in the database from being corrupted and ensures that changes are applied in the order they are made. In a database that has two people changing the same data at the same time, the database will compare the entries and accept the most recent change if they are different. So that’s essentially what a transaction log is. It’s a record of every single operation performed that changes the state of any data in the database. Adding a new item, deleting an old item, modifying an existing item, all these functions are recorded in a transaction log before being applied to the database itself. At the very least, this is more or less a simplified explanation of how SQL handles transaction logs. For database systems like SQL, transaction logs are *extremely* important.
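The write-ahead pattern described above can be sketched in a few lines of Python (a toy, not how SQL or Exchange actually stores anything):

```python
class TinyDatabase:
    """Toy write-ahead log: every change is appended to the log before
    being applied to the data, so the log alone can rebuild the data."""

    def __init__(self):
        self.log = []    # the transaction log: ordered record of every change
        self.data = {}   # the "database" itself

    def apply(self, op, key, value=None):
        self.log.append((op, key, value))   # 1. record the transaction first
        if op == "set":                     # 2. then change the database
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

    @staticmethod
    def replay(log):
        """Rebuild the database state from a log alone (recovery)."""
        db = TinyDatabase()
        for op, key, value in log:
            db.apply(op, key, value)
        return db.data
```

Because every change hits the log before the data, replaying the log from scratch always reproduces the current state — which is exactly the property recovery depends on.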

Exchange, on the other hand, doesn’t have the same flexibility of a highly customizable database solution like SQL. Exchange Databases are designed to handle a limited set of functions. So, much of the work in Exchange is very simple to manage. Data is automatically segregated in individual Mailboxes and those are not usually accessed by numerous users at the same time, and not much of the data stored in an Exchange database is modified regularly. Once an email is stored on an Exchange server, it doesn’t change. If an item does change in the database, it is usually recreated as a completely new object and the old version removed, rather than there being a direct modification to the stored data for that item. As a result, Exchange is not nearly as dependent on transaction logs as SQL.

How Does Exchange Use Transaction Logs?

Every time an email is delivered, sent, deleted, or forwarded, Exchange will write the information about that transaction directly to the transaction logs, then immediately to the database. The time difference between transaction log and database writes is measurable in milliseconds.

Exchange writes transaction logs for a single purpose: database recovery. If the database that holds all your mailbox information fails, let’s say someone drops a giant anvil on your Mailbox server, because you never know when Wile E Coyote will strike out in anger (this is a major concern for the IT department at ACME Inc), you will need to go back to your most recent backup to do a restore. The problem is that when you restore a backup of a database, you usually end up restoring a copy that isn’t up to date with the most recent transactions. So if the last full backup you ran was on a Sunday and the live database fails on Friday, the database you restore from that full backup will be missing all the email sent and received between Sunday and Friday. This is where transaction logs come in. The entire purpose of transaction logs in Exchange is to record the transactions that have occurred since the last complete backup of your Exchange environment.

How Transaction Logs Work with the Database

One of the first things you do when configuring Exchange is define where the Database and Log files are stored. This is a lot more important than you might think. If you go to the location where your Exchange Transaction Logs are stored, the first thing you’ll notice is that there are a lot of log files there. Transaction Log files max out at a set size to reduce the risk of Transaction Log corruption. If all the transactions were stored in a single file and that file was corrupted somehow, you’d lose entire days of email. With multiple files, one file can be corrupted and you’d lose the ability to restore maybe an hour or two of email, which isn’t nearly as big a deal.

Each transaction log file has a name that starts with the letter E and a string of numbers, followed by the .log extension. You will also see a similarly named file with a .chk extension and a bunch of files named Eres<numbers>.jrs. The .jrs files are used by Exchange to make sure things don’t explode if the drive fills up for some reason. The .log files are the actual Transaction Logs, and the .chk file records the name of the most recent transaction log as well as which transaction logs belong to which database. The names matter because they represent the order in which the logged transactions occurred: transactions in E00123.log occurred before those in E00124.log, and so on. Each time a log file reaches its size limit, a new file is created with an incremented number and the .chk file is updated. One more thing to remember: the name of the transaction log containing the most recently applied transactions is also written as a property of the database file itself.
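As an aside, the incrementing number in the file name is a hexadecimal generation counter on modern Exchange versions (real names look more like E0000000123.log than E00123.log). A hypothetical helper that computes the next name in such a sequence:

```python
def next_log_name(name):
    """Given a log file name like 'E0000000123.log' (prefix 'E' plus a
    two-digit stream id, then a fixed-width hex generation number),
    return the next name in the sequence. Purely illustrative."""
    stem = name[:-len(".log")]
    prefix, counter = stem[:3], stem[3:]   # e.g. 'E00' and '00000123'
    bumped = format(int(counter, 16) + 1, "0{}x".format(len(counter)))
    return prefix + bumped + ".log"
```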

Now we get to the part where the transaction logs are important. When you mount any Exchange database, the Exchange server will do the following:

  1. Read the last transaction log property on the database (Assuming the database was properly shut down).
  2. Examine the .chk file in the Log Files directory to determine what the last log file that *should* be applied to the database is named.
  3. Examine the names of the Transaction Log files in the transaction log directory assigned to the database in Exchange.
  4. If the .chk file says that the last transaction log has a higher number than what is recorded by the database, the Exchange system will begin “replaying” the log files in the directory, applying every single transaction that occurred between what the Database you mount last saw and what the .chk file says should be the last log file. This is the step that completes the restore process.
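The replay steps above can be sketched in Python. This toy models the database as a dictionary and each log generation as a list of transactions; all the names are made up for illustration:

```python
def replay_missing_logs(db_state, db_last_log, chk_last_log, log_files):
    """Sketch of mount-time replay: the restored database knows the last
    log generation it has seen (db_last_log, step 1); the .chk file says
    which generation *should* be last (chk_last_log, step 2). Replay every
    generation in between (step 4). log_files maps generation number to a
    list of transactions like ('set', key, value)."""
    for gen in range(db_last_log + 1, chk_last_log + 1):
        for op, key, value in log_files.get(gen, []):
            if op == "set":
                db_state[key] = value
            elif op == "delete":
                db_state.pop(key, None)
    return db_state
```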

When all of the available logs have been replayed, the database will be back in the exact state it was in when the last log file was written — that is, the state the original database was in before failing. Note that this process can only occur if the database is mounted in a Recovery Storage Group (Exchange 2003/2007) or a Recovery Database (Exchange 2010/2013), or if the active database is flagged to allow overwrites.

So basically, the only real reason the transaction logs exist is to perform database restoration. This is why the Microsoft Best Practices state that the Transaction Logs should be on a completely different physical drive than the Database files they are associated with. If the drive that holds the database fails for some reason, you can always use the transaction log files to bring a restored database to a state that has the most recent data. And because all transactions are written to the logs *and* the database files as soon as they happen, losing your log file drive will not cause you to lose any data either. If your logs drive fails, though, you may need to run a little bit of maintenance on the database files with ESEUTIL to put them into a clean state before they will mount properly. The logs are designed to provide “Point In Time” database recovery.

Point In Time Recovery

Point in Time Recovery is a function that allows you to restore a database to the state it was in at an exact point in time. For instance, let’s say someone asks you to restore a mailbox that was deleted at 2:14PM on the Wednesday two weeks ago, and assume you run full backups every Sunday and incremental backups every day. If you restore the mailbox from a backup taken before that Wednesday, you may be missing some mail. If you restore the database from Wednesday night’s backup, you won’t get the mailbox at all. So what do you do? A Point in Time Recovery. You restore the database from the last full backup run before the point in time you want, then restore all the log files between then and Wednesday night’s incremental backup. Once the logs and database are in a good location, you create an RSG or Recovery Database pointing to that location, and then look in the folder you saved the logs to. Each log will have a timestamp that should carry over from the backup, which lets you pinpoint the log file written right before the mailbox was deleted. Once you find it, delete every log that came after it, then mount the database in Exchange. The database will try to replay up to the log the restored .chk file points at, but it will stop at the last log file actually present. So if the last available log file is the one written at 2:13PM on Wednesday, when the database finishes replaying, it will be in the exact state it was in when that log was written. And there you go: you have a database with as much mail as possible in the deleted mailbox, which you can then restore normally.
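The log-filtering step of a Point in Time Recovery (keep only logs written at or before the cutoff, delete the rest) can be sketched as:

```python
def logs_for_point_in_time(log_files, cutoff):
    """log_files maps generation number -> timestamp written (seconds).
    Return the generations to keep for a restore to `cutoff`; everything
    else would be deleted before mounting the recovery database."""
    return sorted(gen for gen, ts in log_files.items() if ts <= cutoff)
```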

Log Growth

One of the big problems that impacts Exchange servers is out of control Log growth. Logs are written constantly and there are only two ways they can be deleted. The proper way to delete log files is to perform a Full, Exchange Aware backup. If the backup software you use is not designed to perform Exchange Database backups, your logs will never ever get cleared and you will run out of drive space, which will force all databases with log files on the full drive to dismount and the Exchange server to explode (not really. It’ll just stop working). When you run a full backup that is Exchange Aware, the backup software instructs the Exchange system to “truncate” the logs. In older database systems, truncating the logs meant that the changes in the logs were written to the database and the files removed. These days changes to the database are written directly to the database, so when the system Truncates the logs, it basically just deletes them, but it does so in a way that allows the Database to stay operational.

The other option, deleting the log files manually, doesn’t work if the database the logs belong to is mounted. You should avoid deleting log files manually unless it’s an extreme emergency, and by extreme emergency I mean you haven’t run a full backup in a long time and have a completely full log drive with about 300GB of logs. In that situation, you pretty much *have* to delete the log files manually: a full backup over that many logs can take several days to complete, because the truncation process goes through each log file to make sure its changes were applied to the database. If the database is dismounted, it is acceptable to delete log files, but only with the understanding that you will not be able to perform a Point in Time restore covering the span from the last backup to the point the logs were deleted. (Point in Time recovery requests are fairly rare, in my experience, but they do happen, especially in larger companies with a lot of legal requirements.)

Circular Logging

Now, if you are okay with not having the ability to do a Point In Time restore, you can configure Exchange to use a feature called Circular Logging. Circular Logging causes the Exchange server to retain only the latest 6 or 7 log files. Log files past that are automatically deleted, so you never have to deal with out of control log growth, and you also never have to run a full Exchange aware backup to clear log files. You would use this option if your backup solution doesn’t include support for Exchange server, if you don’t have a lot of space for logs, or if you just don’t care about dealing with logs for Point in Time restores. Another situation where you would use Circular Logging is if you have a Database Availability Group with at least three copies of each database. If you configure one copy to be Lagged (A lagged database copy waits a certain amount of time before writing transactions to the database), you can run Exchange in a No Backup mode. I’ll go into more detail on this feature in a later post, but for now, just understand that if you have enough database copies and at least one Lagged copy, you already have enough functionality to do Point in Time restores going back at most 14 days, and you are pretty well protected from Database failures.
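A toy sketch of what circular logging does to the log sequence (keep the newest few generations, delete the rest):

```python
def enforce_circular_logging(log_generations, keep=7):
    """Return the log generations that circular logging would delete,
    keeping only the newest `keep` (Exchange keeps roughly the latest
    6 or 7, as noted above)."""
    ordered = sorted(log_generations)
    return ordered[:-keep] if len(ordered) > keep else []
```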

Common Misconceptions

So now that I’ve explained how the logs work and what they do, let’s go over some common misconceptions about Transaction Logs:

  1. Transactions are only written to the logs and then the logs are written to the database – This misconception is due in part to how databases functioned in the early days. Nowadays, transactions are written to memory, disk, and logs at almost the same time. There is a little lag between writes to the log files and to the database itself, but it is so minuscule that it doesn’t really matter (fractions of a second).
  2. If I do a full backup every night, I can use circular logging – This is one of those sorta kinda maybe close to accurate things, but it’s mostly wrong because it ignores the primary purpose of log files, which is to bring a restored database up to the most recent possible state it was in when the original copy was destroyed. If you run full backups every night, you still need to make sure you’re keeping all the logs from that backup time to the next backup time, otherwise when you restore your backup you will be missing up to 24 hours worth of mail. If you’re okay with that limitation, then sure, use circular logging if you run daily full backups. Otherwise, keep circular logging off.
  3. Deleting the logs manually will corrupt the database – No, it won’t. As I mentioned, deleting the logs manually is sometimes necessary, and can be done at any time in more recent versions of Exchange. The danger in manual log purges is data loss. You never want to delete logs that haven’t been backed up (whether by a full backup or an incremental/differential backup). If you’ve cleared all your logs manually and the database dies, there is no way to recover the transactions from the deleted logs. A full, Exchange-aware backup will “truncate” the logs, which is geek speak for deleting the log files once they have been backed up. This is simply to free up space, because the transaction logs are no longer needed after they’ve been backed up.