What is a DNS SRV record?

If you've had to work with Active Directory or Exchange, there's a good chance you've come across a feature of DNS called a SRV record. SRV records are an extremely important part of Active Directory (they are, in fact, the foundation of AD) and an optional part of Exchange Autodiscover. There are a lot of other applications that use SRV records to some degree or another (Lync/Skype for Business relies heavily on them, for instance). The question, though, is why are SRV records so important, and what exactly do they do?

What does a SRV record do?

The purpose of a SRV record is found in its longer, more jargon-filled name: Service Locator Record. It's basically a DNS record that allows an application to find a server that is providing a service the application needs to function. SRV records provide a centralized method of configuring and controlling applications, which results in less work setting up the client side of a client/server application.

For example, let’s say you’re an application designer and you are creating an application that needs to talk to a server for some reason. Prior to the existence of SRV records in DNS, you had two choices:

  1. Program the application so it only ever talks to a server with a specific name or IP address
  2. Include some configuration settings in the application that would let end users put in the DNS name of the server.

Neither of these options is great for usability. Hard-coding IP addresses or host names for the server makes setup difficult and very rigid in its requirements. Making end users enter the server information usually creates a lot more work for IT staff, since they generally end up doing it for every user anyway.

SRV records were first added to the DNS protocol's specifications around the year 2000 (RFC 2782) to give programmers another option for designing client/server software. With SRV records, the application can be designed to look up a SRV record and get server information without having to be directly configured by end users or IT staff. This is similar to the first option above, but allows greater flexibility because the server can have any name or IP address you want and the application can still find it. Some of the advanced features of SRV records also allow failover capabilities and a lot of other cool stuff.

How do SRV Records Work?

Since Active Directory relies so heavily on SRV records, let's use it as an example to explain how they work. First, let's take a look at a typical AD DNS zone. Below, you can see a picture that shows the fully expanded _MSDCS zone for my test lab:

[Image: srv-records-for-sysinteg - the expanded _msdcs zone]

This shows the _Kerberos and _ldap SRV records created by a Domain Controller (Megaserver). Here’s basically what those records are for:

  1. Windows Login requires a Domain-Joined client to connect to a Domain Controller
  2. The login system is programmed to find a Domain Controller by looking up the SRV record at _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.sysinteg.ad (you can run this lookup yourself; see the example after this list)
  3. The SRV record listed above returns megaserver.sysinteg.ad as the location of the server providing the _ldap service.
  4. The client plugs whatever value the _ldap lookup returns (in this case megaserver.sysinteg.ad) into the blank its login code left for the server's location.
  5. The computer then talks to megaserver.sysinteg.ad exclusively for all functions that require it to use LDAP (which is the underlying protocol AD uses for most of what it does).
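
Here's what that lookup looks like if you run it by hand from PowerShell (the domain and site names below are the ones from my lab, so substitute your own):

    # The SRV record the Windows logon process looks for (site-specific version)
    Resolve-DnsName -Name '_ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.sysinteg.ad' -Type SRV

    # The site-agnostic record works too, and plain old nslookup can do the same job
    nslookup -type=SRV _ldap._tcp.dc._msdcs.sysinteg.ad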

If SRV records didn't exist, we would be required to manually configure every computer on the domain to use megaserver.sysinteg.ad for anything related to AD. Now, that's certainly feasible, but it gives us a lot more work to do.

What Makes up a SRV record?

A SRV record has a number of settings that are required for it to function. To see all the settings, look at the image below:

[Image: autodiscover - an Exchange Autodiscover SRV record and its settings]

That shows an Exchange Autodiscover SRV record. I’ll explain what each setting here does:

Domain: This is an unchangeable value. It shows the DNS domain the SRV record belongs to.
Service: This is the "service" the SRV record will be used to define. In the image, that service is Autodiscover. Note that all SRV record service names should start with an underscore, so the service value is _autodiscover. The underscore prevents conflicts with a regular A record that might have the same name as the SRV record.
Protocol: This is the protocol used by the service. Functionally it can be almost anything, since the protocol in a SRV record is mostly used to organize SRV records, but it's best to stick to the protocol labels described in RFC 2782 (_tcp and _udp are universally accepted) to ensure compatibility. Unless you are designing software that uses SRV records, you'll never have to decide what to put as the Protocol; if you're configuring a SRV record for some application you're setting up, just follow that application's instructions.
Priority: In a situation where multiple servers are providing the same service, the Priority value determines which server should be contacted first. The server chosen will always be the one with the lowest number value here.
Weight: In a situation where you have multiple SRV records with the same Service value and Priority value, the Weight is used to determine which server should be used. When the application is designed according to RFC 2782, the Weight values of all matching SRV records are added together to determine the total weight. Whatever portion of that total a single SRV record is assigned determines how often its server will be used by the application. For instance, if you have two SRV records with the same Service and Priority, where Server 1 has a weight of 50 and Server 2 has a weight of 25, Server 1 will be chosen by the application as its service provider two-thirds of the time, because its weight of 50 is two-thirds of the total weight of 75. Server 2 will be chosen the remaining one-third of the time. If there's only one server hosting the service, set this value to 0 to avoid confusion.
Port Number: This setting provides Port data for the application to use when contacting the server. If, for instance, your server is providing this service on port 5000, you would put 5000 in as the Port number. The setting here is defined by how the server is configured. For Autodiscover, as shown above, the value is 443, which is the default port designated by the HTTPS protocol. The Autodiscover Website in my environment is being hosted on the default HTTPS port, so I put in port 443. If I wanted to change my server to use port 5000, I could do so, but I would need to update my SRV record to match (As an aside, if I wanted to change the port Autodiscover was published on, I would be required to use a SRV record for Autodiscover to work, as opposed to any other method).
Host Offering this Service: This is, put simply, the host name of the server we want our clients to communicate with. Strictly speaking, this should be a host name that has its own A record (some DNS consoles will let you type an IP address here, but the RFC expects a name), and a host name is the better choice anyway, since IPs can and do change over time.
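
To make those settings concrete, here's a rough sketch of creating the Autodiscover SRV record shown above using the DnsServer PowerShell module (the zone name and target host are examples from my lab, and you'd run this on, or point it at, your own DNS server):

    # Creates _autodiscover._tcp.sysinteg.ad, pointing clients at mail.sysinteg.ad on port 443
    Add-DnsServerResourceRecord -Srv `
        -ZoneName   'sysinteg.ad' `
        -Name       '_autodiscover._tcp' `
        -DomainName 'mail.sysinteg.ad' `
        -Priority   0 `
        -Weight     0 `
        -Port       443

Each parameter maps directly onto one of the settings described above.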

Using SRV Records to Enable High Availability

If you managed to read through all the descriptions of those settings up there, you may have noticed my explanation of the Priority and Weight settings. Well, those two settings allow for one of the best features of SRV records: High Availability.

Prior to the existence of SRV records, the only way you could use DNS to enable high availability was to use a feature called Round Robin. Round Robin DNS is where you have multiple IP addresses assigned to one host name (or A record). When this is set up, the DNS server will alternate between all the IPs assigned to that A record, giving the first IP out to the first client, the second IP to the second client, the third IP to the third client, and the first IP again to the fourth client (assuming 3 IPs for one A record).

With a SRV record, though, we can configure much more advanced and capable High Availability features by having multiple SRV records that have the same Service Name, but different combinations of Priority and Weight.

When we use SRV records, we have two options for high availability: Failover and Load Balancing. We can also combine the two if we wish. To do this, we manipulate the values of Priority and Weight.

If we want failover capabilities for our application, we would have two servers hosting the service and configure one server's SRV record with a lower Priority value than the other's. When the application performs a SRV lookup, it retrieves all the matching records and attempts to contact the servers in order until it gets a response, using the Priority value to determine that order. The server with the lowest Priority value is contacted first.

If we want to have load balancing for the application (all servers can be used at any time), we have multiple SRV records with the same service name, like with the failover solution, and the same Priority value. We then determine how much of the load we want each server to take. If we have two servers providing the same service and want them to share the load equally, we simply give both records the same Weight value (anything from 1 to 65535, the highest possible Weight, will do). When a client queries the SRV record, it will receive all the matching records, calculate the total weight, and then pick a random number between 1 and that total to determine which server to talk to.

For instance, if you had Server 1 and Server 2 both with a Weight of 50 in their SRV records, the client would assign half of the total weight value, 100, to Server 1 and half to Server 2. Let's say it assigns 1-50 to Server 1 and 51-100 to Server 2. The client would then pick a number between 1 and 100. If it picked a number between 1 and 50, the client would communicate with Server 1. Otherwise, it would talk to Server 2. Note: because this relies on a random number, the results will not always match the calculated expectations. Also note: exactly how the Weight value is turned into a server choice is up to the application's developer; this is just a simple example of how it can work, and some developers may choose a scheme that always results in an exact load distribution.
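
If you like seeing the math written down, here's a minimal PowerShell sketch of that RFC 2782-style weighted pick, using the 50/25 weights from the Weight description earlier (a real client does this internally; the server names are made up):

    # Two SRV records with the same Service and Priority but different Weights
    $records = @(
        [pscustomobject]@{ Target = 'server1.sysinteg.ad'; Weight = 50 },
        [pscustomobject]@{ Target = 'server2.sysinteg.ad'; Weight = 25 }
    )

    # Add the weights together, roll a number from 1 to that total,
    # then walk the list until the running total covers the roll
    $total   = ($records | Measure-Object -Property Weight -Sum).Sum
    $roll    = Get-Random -Minimum 1 -Maximum ($total + 1)   # -Maximum is exclusive
    $running = 0
    foreach ($rec in $records) {
        $running += $rec.Weight
        if ($roll -le $running) { "Talk to $($rec.Target)"; break }
    }

Run it a few thousand times and server1 comes out on top about two-thirds of the time, which matches the 50-out-of-75 share described above.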

The Weight value can be used with as many servers as you want, and with whatever percentages you want, to define your load balancing scheme. You can have 4 servers, with three of them sharing the load roughly a third of the time each and the fourth acting as a last resort, by setting the weight on three SRV records to 33 and on the fourth to 0. Note that per RFC 2782, a weight of 0 means the server should have only a very small chance of being chosen while servers with higher weights are available. You should avoid giving multiple copies of the same SRV record a weight of 0 when other records carry real weights.

Lastly, you can combine Priority and Weight to have multiple load-balanced groups of servers. This isn't a very common solution, but it is possible to have Servers 1 and 2 using priority 1 and weight 50, with Servers 3 and 4 using priority 2 and weight 50. In this situation, Servers 1 and 2 would each handle half of the load, but if both Server 1 and Server 2 stopped working, Servers 3 and 4 would take over, again splitting the load between themselves.

Tinkering with AD

If you want to see how SRV records can be used to handle high availability and get a good example of a system that uses SRV records to their fullest capabilities, try tinkering with some of your AD SRV records. By manipulating Priority and Weight, you can force clients to always use a specific DC, or configure them to use one DC more often than others.

Try modifying the Weight and Priority of the various SRV records to see what happens. For instance, if you want one specific DC in your environment to handle Kerberos authentication and another one to handle LDAP lookups, change the priorities of those records so one server has a 0 on its Kerberos records and 100 on its LDAP records, while the other has 100 for Kerberos and 0 for LDAP. You can also tinker with the Weight to send more traffic to a DC with more resources than to smaller, backup DCs. Give your monster DC a weight of 90 and a tiny, possibly older DC a weight of 10. By default, all DCs register their records with the same Priority and Weight, so clients pick a DC more or less at random.

The easiest way to see this in action is to set one DC with a Priority of 10 and another with a Priority of 20 on all SRV records in the _msdcs zone. Then make sure the DNS data is replicated between the DCs (either wait or do a manual replication). Run ipconfig /flushdns on a client machine and log out, then back in. Run SET LOGONSERVER in CMD to see which DC the computer is using. Now, switch the priorities of the SRV records in DNS, wait for replication, run ipconfig /flushdns, then log out and back in again. Run SET LOGONSERVER again and you should see that the second DC is now chosen.
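
A few commands make the experiment easier to watch (run them from PowerShell on the client; the domain name is an example):

    ipconfig /flushdns                  # clear any cached DNS answers
    $env:LOGONSERVER                    # the DC used for the current logon session (same as SET LOGONSERVER)
    nltest /dsgetdc:sysinteg.ad         # ask the DC locator which DC it would pick right now
    nltest /dsgetdc:sysinteg.ad /force  # same, but ignore the locator's cached answer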

Final Thoughts

As I mentioned, much of a SRV record's configuration is determined by Software Developers, since they define how their application functions. To be specific, as an IT administrator or engineer, you'll never be able to decide what the Service Name and Protocol will be. Those are always determined by software developers. You'll also never be in control of whether or not an application will use SRV records. Software Developers have to design their applications to make use of SRV records. But if you take some time to understand how a SRV record works, you can greatly improve functionality and security for any and all applications that support configuration using SRV records.

If you’re a Software Developer, I have to point out the incredible usefulness of SRV records and the power they give to you. Instead of having to hard-code server configurations or develop UIs that allow your end users to put in server information, you can utilize SRV records to partially automate your applications and make life easier for the IT people who make your software work. SRV records have been available for almost 2 decades now. It’s about time we started using them more and cut down the workload of the world’s IT guys.

 

 


A Treatise on Information Security

One famous misquote of American Founding Father Ben Franklin goes like this, “Anyone who would sacrifice freedom for security deserves neither.” At first glance, this statement speaks to the heart of people who have spent hours waiting in line at the airport, waiting for a TSA agent to finish groping a 90 year old lady in a wheel chair so they can take off their shoes and be guided into a glass tube to be bombarded with the emissions of a full body scanner. But the reality of any kind of security, and Information Security in particular, is that any increase of security requires sacrificing freedom. The question we all have to ask, as IT professionals tasked with improving or developing proper security controls and practices, is whether or not the cost of lost freedom is worth the amount of increased security.

The Balancing Act

If you were to dig a little, like I have, you would find that Mr. Franklin actually said, "Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." This version of the quote demonstrates very eloquently one of the principal struggles of developing security policies in IT. After all, there is a famous axiom in the Industry (it's quote day here at ACBrown's IT World), "The most secure computer is unplugged." Or something like that. I'm probably misquoting.

In a humorous demonstration of that axiom, I present a short story. When I was a contractor performing DIACAP (Go look it up) audits on US military bases, we were instructed to use a tool called the "Gold Disc." The Gold Disc was developed by personnel in the military to scan through a workstation or server and check for configuration settings that violated the DISA (That's the Defense Information Systems Agency) STIG (That's Security Technical Implementation Guide. Not the guy that drives cars for that one TV show). The Gold Disc was a handy tool, but the final screen that gave you the results of the scan had a little button on it that we were expressly forbidden from ever pushing. That button said, simply, "Remediate All." Anyone who pushed that button would find that they were instantly locked out of the network, unable to communicate with anything. Pushing the button on an important server would result in mass hysteria, panic, and sudden loss of employment for the person who pushed the button. You see, the Remediate All button caused the tool to change every configuration setting to comply exactly with the DISA STIG recommendations. If you're not laughing yet, here's the punchline…Perfectly implementing the DISA STIG puts computers in a state that makes it impossible for them to communicate with one another properly. <Insert follow-up joke regarding Government and the problems it causes here>.

On the other hand, computers that blatantly failed to comply with the DISA STIG recommendations would (theoretically) be removed from the network (after 6 or 7 months of bureaucratic nonsense). In the end, there was a point in the middle where we wanted the systems to be. That balancing point was the point where computers were secure enough to prevent the majority of attacks from succeeding, but not so secure that they significantly inhibited the ability of people to do their jobs effectively and in a timely manner. As IT Security professionals, we have a duty to find the right balance of security and freedom for the environments we are responsible for.

The Costs of Security

Everything in IT has a cost. The cost can’t always be easily quantified, but there is always a cost associated. For instance, something as simple as password expiration in Active Directory has a very noticeable cost. How much time do system administrators spend unlocking accounts for people who forgot their password after it just reset? Multiply the number of hours spent unlocking accounts and helping people reset their passwords by the amount of money the average system administrator makes and you get the cost of that level of security in dollars. But that is only the direct cost.

Implementing password expiration and account lockout policies also reduces the level of freedom your employees have in controlling their user accounts. That lost freedom translates into lost revenue as employees are forced to spend their time calling tech support to get their passwords reset. Then there's the lost productivity from people wasting time trying to remember the password they set earlier that morning.

With some estimates showing that nearly 30 percent of all help-desk work hours are devoted to password resets, the cost of enabling password expiration climbs pretty high.

The Cost of Freedom

On the other hand, every day an individual goes without resetting their password increases the likelihood of that password being discovered. Furthermore, every day a discovered password is left unchanged increases the likelihood of that password being used by an unauthorized individual. If the individual who lost the password is highly privileged (a CEO, for example), the cost to the business that employs that individual can be astronomical. There are numerous cases of companies going bankrupt after major intrusions linked to exposed passwords.

So while it may cost a lot to implement a password expiration policy, it can cost infinitely more not to. In comparison, the cost of implementing a password expiration policy is almost always justified. This is particularly true when working for organizations that fall under the purview of Regulatory Compliance laws (Queue the dramatic music).

Regulatory Compliance

One of the unfortunate realities of the IT world is that some organizations have outright failed to consider the costs of *not* having a good security policy and just plain failed to have good security. Those organizations got hit hard and either lost data that cost the business huge amounts of money, or worse, data that put their customers at risk of identity theft. So, because the kids couldn’t play safe without supervision, most Governments around the world have developed laws that tell businesses in key industries things that they must do when developing their IT infrastructure.

For instance, the Healthcare industry in the US must follow the HITECH addition to HIPAA (so many acronyms), which mandates the use of IT infrastructure that prevents the unauthorized disclosure of certain types of patient information. Publicly owned corporations in the US are required to follow the rules outlined in the Sarbanes-Oxley Act, which requires companies to maintain adequate records of business dealings for a significant period of time. The aforementioned DIACAP audits are performed to verify whether military installations are complying with the long list of instructions and requirements developed by the DoD (if you ever have trouble sleeping…).

Organizations that fall under the umbrella of one or more Regulatory Compliance laws are compelled to ensure their IT infrastructure meets the defined requirements. Failing to do so is often punishable with significant fines. Failing to do so and getting attacked in a way that exploits the security holes the regulations were meant to plug is a huge problem (and not just for the organization itself). For organizations subject to regulatory compliance, the costs associated with violating regulations must always be considered when developing a security policy. This is mostly a good thing, since the costs of actually meeting the regulations are occasionally extremely high.

Mitigating Costs – Not Always Worth It

There are actually a lot of technical solutions in the IT industry that exist entirely to reduce the costs associated with implementing security technologies. For instance, utilizing a Self-Service Password Reset (SSPR, cause that's a lot of typing) solution can significantly reduce the number of man-hours required by help-desk staff to reset passwords and unlock accounts. But such solutions also have costs associated with them. Aside from the purchase cost, many of these solutions significantly reduce security in an organization. SSPRs, again, increase user freedom and control of their user account, which makes things less secure again. However, how much security is reduced depends on the SSPR in use and how users interact with the software. An SSPR that only requires someone to enter their username and current password is likely to reduce security significantly more than an SSPR that requires users to answer 3 "security questions," which will, in turn, reduce security much more than an SSPR that requires people to provide their Social Security Number, submit a urine sample, and authenticate with a retina scan while sacrificing a chicken from Uruguay with a special ceremonial dagger. But, again, the time spent by employees resetting their own password (not to mention the cost of importing chickens from Uruguay) increases the cost of such solutions. The key to determining which solutions and technologies to use is a matter of finding the right balance of freedom and security in the environment.

When Security Costs Too Much Freedom

There are times when the financial costs and the cost of freedom associated with a security measure are obviously too high (I'm looking at you, TSA). Implementing longer passwords may have many technical security advantages, but doing so includes a risk that the loss of freedom is too great for people to handle. For instance, implementing a 20 character minimum password policy that includes password complexity requirements might cause some employees with bad memories to write their password down and put it in a place that's easy for them to remember. Like on a post-it note stuck to their monitor. Suddenly, that very secure password policy is defeated by a low-tech solution. Now you have a password accessible to anyone walking around in the office (like Janitor Bob) that can be used to access critical information and sell it to the highest bidder (AKA, your competitor). This is a prime example of the unconsidered costs of security being too high. Specifically, the security requirement costs so much freedom and negatively impacts employees so much that they end up bypassing security entirely.

Balancing Act

In the end, IT security is a massive balancing act. To properly balance security and freedom in IT, it is necessary to ask questions and obtain as much knowledge about the environment as possible. The investigative part is among the most important phases in any security policy. Organizations looking to increase security need to have balance in their security implementations. Decisions on IT security must always be thoughtful ones.

Disabling Direct Access Forced Tunneling

So you're trying to get Direct Access (DA) running in your environment and you suddenly realize that your test machine can no longer access…anything. Well, this may be due to the "accidental" enabling of "Forced Tunneling" in your DA configuration. How do you fix it? You can pretty easily reconfigure DA to disable Forced Tunneling, but unless your test machine is directly connected to your AD environment, it will never receive the updated Group Policy settings. Now, you *should* be directly connected if you're doing this kind of testing, but there are situations where that just isn't possible (remote workers unite!).

Disabling Forced Tunneling on Client Machines

I'll give the fix first (there's a scripted version right after the steps) and explain why this happens second:

  1. Open Regedit
  2. Navigate to HKLM:\Software\Policies\Microsoft\Windows\TCPIP\v6Transition
  3. Set all visible entries to Disabled
  4. Delete all subkeys.
  5. Reboot
  6. Rejoice
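
If you'd rather script it than click through Regedit, here's a rough PowerShell equivalent of those steps (run it elevated, and only on a DA client that's already cut off; the key path comes straight from step 2):

    $key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\TCPIP\v6Transition'

    # Step 3: set every value directly under the key to 'Disabled'
    foreach ($name in (Get-Item $key).Property) {
        Set-ItemProperty -Path $key -Name $name -Value 'Disabled'
    }

    # Step 4: delete the subkeys (the per-technology transition settings)
    Get-ChildItem $key | Remove-Item -Recurse -Force

    # Steps 5 and 6: reboot, rejoice
    Restart-Computer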

Why Does This Happen?

Well, if DA is not configured perfectly, you can't initialize a DA session. So if, for instance, your computer's client certificate fails to enroll properly before you disconnect, and your machine has already obtained the DA settings from Group Policy, you're stuck with all the settings required to connect to DA but no way to actually do so. With Forced Tunneling enabled, you are forcing all DA client systems to go through DA for *any* internet connectivity. So if your DA DNS settings also point clients to an internal IP for DNS lookups when connected, congratulations…you can't reach a dang thing. Disabling Forced Tunneling in the registry is about your only option here. Just make sure you've also disabled Forced Tunneling in your DA config before you disconnect from the VPN again or you'll have to do this stuff all over again. (Oops)

Final Note

Don’t use Forced Tunneling with Direct Access. It provides no additional security and is a huge pain in the butt if DA doesn’t connect properly for *any* reason.

Anatomy of a Certificate Error

The most important step in diagnosing a specific security error involves determining what the error is telling you. There are a few things that can cause certificate errors, and what you do depends entirely on what is causing the error to begin with. Once you know what the error is telling you, it becomes much easier to figure out what you need to do next.

Getting the Message

One of the more concise and effective Certificate Errors is the one delivered by Outlook. An image of it is below.

 

[Image: ssl-name - the certificate error dialog from Outlook]

Note the numbers 1, 2, and 3. These don't normally show up on the error; I put them there for reference, in case you're comparing it against an error of your own. At any rate, the numbers are sitting next to three possible kinds of errors you can get with a certificate.

For this particular error, you’ll note that there is a red X next to number 3. That X points out that one of the Validity checks run against the certificate failed. Specifically, the name I used to access the server doesn’t match either the Common Name on the certificate or any of the Subject Alternate Names. This is probably the most common certificate error you’ll see.

The Four Checks

Every time you access a website that is secured with SSL, there are four checks the computer you use runs to verify that the certificate is valid. The reason for these checks is explained in my article on Digital Certificates. The four checks are as follows, and match the numbering in the image above.

  1. Was the Certificate issued by a known and trusted Certificate Authority?
  2. Is the current date within the period of time the Certificate is valid?
  3. Does the host name used to access the server match any of the host names defined by the certificate?
  4. Has the Certificate been Revoked? (Wait, there’s no 4 on the image! Don’t worry, I’ll explain later.)

If any of these checks fails, you'll get a certificate error. Note that this *does not mean* that the data you're trying to encrypt isn't going to be encrypted. Any time you use SSL or TLS, your data will be encrypted whether the certificate is valid or not. However, if any of the checks fail, it is much more likely that someone could decrypt the data you send. Here's why, based on each of the possible certificate errors.
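
If you want to poke at those four checks yourself, here's a rough PowerShell sketch (PowerShell 5 or later) that grabs a server's certificate and prints the properties each check looks at. The host name is an example, and the validation callback deliberately accepts everything so you can inspect even a failing certificate:

    $targetHost = 'mail.sysinteg.ad'    # example name - use the host you're actually testing
    $tcp  = [Net.Sockets.TcpClient]::new($targetHost, 443)
    $ssl  = [Net.Security.SslStream]::new($tcp.GetStream(), $false, [Net.Security.RemoteCertificateValidationCallback]{ $true })
    $ssl.AuthenticateAsClient($targetHost)
    $cert = [Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)

    $cert.Issuer      # check 1: which CA issued it, and do you trust that CA?
    $cert.NotBefore   # check 2: is today inside the validity window
    $cert.NotAfter    #          defined by these two dates?
    $cert.Subject     # check 3: does the name you used match the Common Name...
    $san = $cert.Extensions | Where-Object { $_.Oid.FriendlyName -eq 'Subject Alternative Name' }
    if ($san) { $san.Format($true) }   # ...or one of the Subject Alternate Names?
    # check 4 (revocation) is run against the CRL/OCSP locations listed inside the certificate

    $ssl.Dispose(); $tcp.Close()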

Was the Certificate issued by a known and trusted Certificate Authority?

Certificate Authorities are servers that are designed specifically to generate digital certificates. Anyone on the planet can create a Certificate Authority server if they want to (and know how to). If you have your own Certificate Authority, you can create a certificate that matches any Common Name you want and use that certificate to interject yourself into any secure transmission and read the data without anyone knowing, but only if the client computer *trusts* your Certificate Authority.

Normally, most computer Operating Systems and Web Browsers have a list of CAs that are trusted right out of the box. These include CAs owned by companies like Godaddy, Entrust, and Network Solutions. So unless you happen to gain control of the Certificate Authority owned by these companies and defined by the Root CA Certificate installed on the OSes of every computer in the world, your CA is probably not going to be trusted without a lot of extra work.

If you see a certificate error that warns the Certificate Authority isn't trusted, it means the Certificate was issued by a *private* CA. You can instruct your computer to trust the CA if you want, but if you are using a site that normally has no certificate error and this error suddenly shows up one day, there's a good chance your data is being intercepted and redirected.

As an IT Professional, if you see this error when accessing a system under your control, there are two solutions.

  1. Request a new certificate from a trusted, Third Party Root CA provider.
  2. Install the Root CA certificate as a Trusted Third Party Root CA in the OS.

#1 requires significantly less effort to accomplish because it means you don’t have to actually install the certificate on your users’ computers, phones, or other devices.
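
For what it's worth, option #2 is only a one-liner per machine; it's the "every computer, phone, and other device" part that makes it painful. A hedged example, assuming you have the private root CA's .cer file in hand:

    # Put the private CA's certificate into the machine's Trusted Root store (run as admin)
    Import-Certificate -FilePath '.\MyPrivateRootCA.cer' -CertStoreLocation 'Cert:\LocalMachine\Root'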

Is the current date within the period of time the Certificate is valid?

Certificates are only valid for a set period of time. Most certificates are valid from 1 – 3 years from the time they are first generated, depending on options used during certificate generation. Certificate Validity periods are meant to ensure that only a limited amount of time is available for a certificate’s Private Key to be discovered.

The possibility of a brute force attack successfully discovering the Private Key in use is astronomically small, and the time to run a full brute force attack against a modern Certificate is measured in millions of years. But as we progress technologically, the time required shrinks dramatically. If a certificate was generated in, say, 1991 using the DES algorithm, it would have taken thousands of years to crack it with the computing resources of the day. Today, it would take less than an hour.

Having a certificate validity period ensures that technology doesn’t outpace the security of the certificate. Having a validity period between 1 and 3 years is the general recommendation for certificates these days. If you run across a certificate that has an expiration date that is more than 2-3 years in the past, I highly recommend not using the site that uses that Certificate.

If a server you control has this error, you need to generate and install a new certificate on the server. This is the only possible solution to this error.

Does the host name used to access the server match any of the host names defined by the certificate?

This error is always caused by attempts to access a server using a URL that uses a host name not included on the certificate. For instance, let's say a web server has a certificate that defines the host name as acbrownit.com. If you attempt to reach that server using https://www.acbrownit.com, you'll get a certificate error.

This check is meant to ensure that the server we are communicating with is the one we *want* to communicate with. If the server we want to talk to is using a valid third-party certificate, we can be significantly more certain that the server we’re talking to is the one we want to talk to and that no one is attempting to spy on the data we send if this check comes back okay. If not, it’s important to check the information listed on the certificate to verify that we’re talking to the right server.

For IT Professionals, there are two definite solutions for this error.

  1. Generate a certificate that matches the Host Name you want people to use to access your server. If there is a need for multiple names, get a SAN cert that includes all host names or a Wildcard cert that is valid for any host name at a specific domain (Wildcard certificates are generated with a common name of *.domain.com and will be considered valid for any value that replaces the *. This is slightly less secure since it can be used on any number of servers, but the security difference is minimal. Be certain to verify that the web server you are using fully supports wildcard certificates before obtaining one. IIS supports them, as do the vast majority of Microsoft solutions, though some may require additional setup.)
  2. Create a DNS record for a host name that matches the certificate and point it to the web server.
  3. Note: There are some applications that use HTTPS that may have specific host name requirements, and may require multiple host names to function properly (Exchange Autodiscover for example). Be aware that this type of certificate error will always occur unless you have a certificate that matches *all* the necessary host names or have made sufficient configuration changes to allow things to work properly with a single host name.

Has the Certificate been Revoked?

This is actually a very unusual error that you will not see often. I don’t have a picture of one of these to show you, since it takes a good bit of effort to force the error to occur. Certificate Revocation is not particularly common, and was developed to combat the possibility of a certificate being compromised. A certificate is considered compromised when an unauthorized entity obtains a copy of the certificate’s Private Key.

If this happens, or if the certificate is reissued for any reason (for instance, if you want to change the common name, modify the list of Subject Alternate Names, or make any other changes), the certificate is listed in a Certificate Revocation List (CRL) published by the Certificate Authority that originally issued the certificate. A CRL is just a signed list, published at a URL, that a web browser or other application checking certificate validity can download to determine whether the certificate is still valid. If the certificate is listed in the CRL, many applications are designed to absolutely refuse further communication with the server using that certificate (web browsers specifically). Servers using revoked certificates are always considered to be compromised, and it is always a good idea to avoid using servers with revoked certificates. Basically, if you see this error, *DO NOT CONTINUE!*
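
If you're curious about a particular certificate, certutil (built into Windows) will chase down the CRL and OCSP locations listed in it and report the revocation status; the file name here is just an example:

    # Downloads the revocation information referenced by the cert and reports its status
    certutil -urlfetch -verify '.\server-cert.cer'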

Brown’s Anatomy

So that’s it for Certificate errors. The 4 checks are designed primarily to keep your data safe, so make sure you are aware of what you’re walking into when you see these errors. As a regular joe, non-IT person, you’re pretty likely to run into these errors, and knowing what they mean will help you determine if it’s a good idea to keep going or not. For IT people, you are going to see these errors a lot, no matter what, and knowing what they mean will help you fix them.

 

Welcome!

Welcome to AC Brown’s IT World!

My name is Adam Brown. I’m an IT Guy. I’ve worked most of my life fixing computers and tinkering with technology. I’ve spent thousands of hours studying and learning, and I’ve gotten to the point where I want to share what I’ve learned with others in a way that will hopefully save time digging through useless details.

My goal with this blog is to build a center of knowledge for IT professionals around the world. Over the past few years I’ve built a few good posts that have helped a lot of people out, but I feel like there’s a huge gap in the Internet’s ability to teach people how to work well and effectively in IT.

Most IT Blogs focus on the “how to” part of IT work (The Practice), with very little information about the “why” (the Theory). The problem with this approach is that it results in a lot of people just following instructions without thinking about what they’re doing or why the solution they have is recommended. I’ve had to fix a whole load of problems caused by people who just did stuff because someone told them to, and I’m sure there are loads of IT environments out there suffering from the same issues.

In this blog, I’m striving to not only explain *how* to do things, but *why* it should be done that way. This is much more difficult to do and will require additional reading from anyone here, so I am separating all of my posts into Theory and Practice, so you can get the answer you want quickly, but also have an easy way to learn more about what you’re working with when you have time to do so. This will likely be a massive undertaking for me, and I appreciate everyone’s patience while I build things up.

Below is a Table of Contents of sorts, outlining the starting point for some of my posts.

IT Security

Theory: Understanding Digital Certificates
Theory: Email Encryption for the Common Man
Theory: Passwords (Part 1)

Editorial

How will the cloud affect my career as an IT Professional?

Office 365

ADFS and SQL with Office 365

Exchange Server

Practice: Resolving Public Folder Permissions with PFDavAdmin

Active Directory

Theory: Active Directory Domain Naming

Exchange Transaction Logs – Reducing the Confusion

Exchange Transaction Logs are, in my opinion, one of the most horribly documented parts of Exchange server. There’s a lot of misinformation out there as well as a lot of misunderstanding. If you look for an answer to questions that most people have about them, you’ll run across poorly written documentation that barely explains what they are, let alone how they work. In this post, I’ll be going over the basics of Transaction Logs and explaining what they are, how they work, and, more importantly, what they are for.

What are Transaction Logs?

Transaction logs are usually kept for any type of database, so knowing what a database is helps. To put a database in perspective, just think about something we’ve all had to work with at some point in time, a spreadsheet. If you’ve ever had to compile a list of numbers and figures in Excel, you’ve used a spreadsheet. Well, databases are basically collections of spreadsheets that are inter-related, extremely large, extremely complex (in some cases), and accessible to numerous users at the same time.

In order for a database to function with lots of users at the same time who may be making changes to the same data at the same time, database systems will typically write changes to data in a transaction log, and then apply the change to the database. This keeps the data in the database from being corrupted and ensures that changes are applied in the order they are made. In a database that has two people changing the same data at the same time, the database will compare the entries and accept the most recent change if they are different. So that’s essentially what a transaction log is. It’s a record of every single operation performed that changes the state of any data in the database. Adding a new item, deleting an old item, modifying an existing item, all these functions are recorded in a transaction log before being applied to the database itself. At the very least, this is more or less a simplified explanation of how SQL handles transaction logs. For database systems like SQL, transaction logs are *extremely* important.

Exchange, on the other hand, doesn’t have the same flexibility of a highly customizable database solution like SQL. Exchange Databases are designed to handle a limited set of functions. So, much of the work in Exchange is very simple to manage. Data is automatically segregated in individual Mailboxes and those are not usually accessed by numerous users at the same time, and not much of the data stored in an Exchange database is modified regularly. Once an email is stored on an Exchange server, it doesn’t change. If an item does change in the database, it is usually recreated as a completely new object and the old version removed, rather than there being a direct modification to the stored data for that item. As a result, Exchange is not nearly as dependent on transaction logs as SQL.

How Does Exchange Use Transaction Logs?

Every time an email is delivered, sent, deleted, or forwarded, Exchange will write the information about that transaction directly to the transaction logs, then immediately to the database. The time difference between transaction log and database writes is measurable in milliseconds.

Exchange writes transaction logs for a single purpose: database recovery. Say the database that holds all your mailbox information fails for some reason; let's say someone drops a giant anvil on your Mailbox server, because you never know when Wile E Coyote will strike out in anger (this is a major concern for the IT department at ACME Inc). Anyway, if your database ever gives up the metaphorical ghost, you will need to go back to your most recent backup to do a restore. The problem in that situation is that when you restore a backup of a database, you will usually end up restoring a copy that isn't up to date with the most recent transactions. So if the last full backup you ran was on a Sunday and the live database fails on Friday, the database you restore from that full backup will be missing all the email that was sent and received between Sunday and Friday. This is where transaction logs come in. The entire purpose of transaction logs in Exchange is to provide a record of the transactions that have occurred since the last time you ran a complete backup of your Exchange environment.

How Transaction Logs Work with the Database

One of the first things you do when configuring Exchange is define where the database and log files are stored. This is actually a lot more important than you might think. If you were to go to the location where your Exchange transaction logs are stored, the first thing you'd notice is that there are a lot of log files there. Transaction log files max out at a set size to keep down the risk of transaction log corruption. If all the transactions were stored in a single file and that file was corrupted somehow, you'd lose entire days of email. With multiple files, one file can be corrupted and you'd lose the ability to restore maybe an hour or two of email, which isn't nearly as big a deal.

At any rate, each transaction log file has a name that starts with the letter E and then a string of numbers, followed by the .log extension. You will also see a similarly named file with a .chk extension and a bunch of files named Eres<numbers>.jrs. The .jrs files are reserve logs that Exchange uses to make sure things don't explode if the drive fills up for some reason. The .log files are the actual transaction logs, and the .chk file tracks which transaction logs belong to which database and which log was most recently written. The names of these files matter because they represent the order in which the logged transactions occurred: transactions in E00123.log occurred before those in E00124.log, and so on. Each time a log file reaches its size limit, a new file is created with an incremented number and the .chk file is updated.

Another thing to remember is that the name of the last transaction log containing the most recently applied transactions is written as a property of the actual database file that Exchange uses.
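
You can actually see that header property for yourself with ESEUTIL, a utility that ships with Exchange; a quick sketch (the path is made up, and you'd normally run it against a dismounted database or a copy):

    # Dump the database header; the log generation / 'Log Required' fields are what Exchange
    # compares against the .chk file when the database is mounted
    eseutil /mh 'D:\ExchangeDatabases\DB01\DB01.edb'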

Now we get to the part where the transaction logs are important. When you mount any Exchange database, the Exchange server will do the following:

  1. Read the last transaction log property on the database (Assuming the database was properly shut down).
  2. Examine the .chk file in the Log Files directory to determine what the last log file that *should* be applied to the database is named.
  3. Examine the names of the Transaction Log files in the transaction log directory assigned to the database in Exchange.
  4. If the .chk file says that the last transaction log has a higher number than what is recorded by the database, the Exchange system will begin “replaying” the log files in the directory, applying every single transaction that occurred between what the Database you mount last saw and what the .chk file says should be the last log file. This is the step that completes the restore process.

When all of the available logs finish being replayed to the database, your database will have returned to the exact state it was in when that last log file was written. The end result is a restored database that is in the exact state the original database was in before failing. Note that this process can only occur if the database is mounted in a Recovery Storage Group (For Exchange 2003/2007), or as a Recovery Database (Exchange 2010/2013), or if the active database is flagged as allowing over-write.

So basically, the only real reason the transaction logs exist is to perform database restoration. This is why the Microsoft Best Practices state that the Transaction Logs should be on a completely different physical drive than the Database files they are associated with. If the drive that holds the database fails for some reason, you can always use the transaction log files to bring a restored database to a state that has the most recent data. And because all transactions are written to the logs *and* the database files as soon as they happen, losing your log file drive will not cause you to lose any data either. If your logs drive fails, though, you may need to run a little bit of maintenance on the database files with ESEUTIL to put them into a clean state before they will mount properly. The logs are designed to provide “Point In Time” database recovery.
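
That "little bit of maintenance" is ESEUTIL's soft recovery mode, which is also the manual version of the log replay described above; a hedged sketch, assuming E00 is the log prefix and the paths are wherever you restored or copied things to:

    # Soft recovery: replay the available E00*.log files into the database
    eseutil /r E00 /l 'D:\RestoredLogs' /d 'D:\RestoredDB'

    # Afterwards, eseutil /mh (shown earlier) should report the database in a Clean Shutdown state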

Point In Time Recovery

Point in Time Recovery is a function that allows you to restore a database to the state it was in at an exact point in time. For instance, let's say someone requests that you restore a mailbox that was deleted on Wednesday of two weeks ago at 2:14PM. For this situation, let's assume you run full backups every Sunday and incremental backups every day. If you restore the mailbox from the backup taken before that Wednesday, you may be missing some mail. If you restore the database from the backup Wednesday night, you won't get the mailbox at all. So what do you do? You do a Point in Time Recovery. You restore the database from the last full backup that was run before the point in time you want to restore to, then you restore all the log files between then and Wednesday night's incremental backup. Once you have the logs and database in a good location, you create an RSG or Recovery Database that points to that location, and then look in the folder you saved the logs to. Each log will have a timestamp that should carry over from the backup, which lets you pinpoint the log file written right before the mailbox was deleted. Once you find that, you delete every log that came after it, then mount the database in Exchange. The replay will try to go as far as the restored .chk file says it should, but it can only go as far as the last log file that is actually available. So if the last log file available is the one written at 2:13PM on Wednesday, when the database finishes replaying the available logs, it will be in the exact state it was in when that last log file was written. And there you go: you have a database with as much mail as possible in the deleted mailbox, which you can then restore normally.
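
In Exchange 2010 and later, the Recovery Database portion of that walkthrough looks roughly like this (a sketch; the server, database, path, and mailbox names are all examples):

    # Create a Recovery Database that points at the restored .edb file and log folder
    New-MailboxDatabase -Recovery -Name 'RDB01' -Server 'MEGASERVER' `
        -EdbFilePath 'D:\RestoredDB\DB01.edb' -LogFolderPath 'D:\RestoredLogs'

    # Mount it (see the replay/ESEUTIL notes above if it won't mount cleanly)
    Mount-Database 'RDB01'

    # Then pull the deleted mailbox's content back into the live mailbox
    New-MailboxRestoreRequest -SourceDatabase 'RDB01' -SourceStoreMailbox 'Deleted User' -TargetMailbox 'Deleted.User'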

Log Growth

One of the big problems that impacts Exchange servers is out of control Log growth. Logs are written constantly and there are only two ways they can be deleted. The proper way to delete log files is to perform a Full, Exchange Aware backup. If the backup software you use is not designed to perform Exchange Database backups, your logs will never ever get cleared and you will run out of drive space, which will force all databases with log files on the full drive to dismount and the Exchange server to explode (not really. It’ll just stop working). When you run a full backup that is Exchange Aware, the backup software instructs the Exchange system to “truncate” the logs. In older database systems, truncating the logs meant that the changes in the logs were written to the database and the files removed. These days changes to the database are written directly to the database, so when the system Truncates the logs, it basically just deletes them, but it does so in a way that allows the Database to stay operational.
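
An easy way to check whether your backups are actually Exchange-aware (and therefore truncating logs) is to look at the backup timestamps Exchange records for each database, from the Exchange Management Shell:

    # These fields are only stamped by Exchange-aware backups; if they're blank or ancient,
    # your logs probably aren't being truncated
    Get-MailboxDatabase -Status | Format-Table Name, LastFullBackup, LastIncrementalBackup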

The other option, deleting the log files manually, doesn't work if the database the logs belong to is mounted. So you should always try to avoid deleting log files manually unless it's an extreme emergency. And by Extreme Emergency I mean you haven't run a full backup in a long time and have a completely full log file drive with about 300GB of logs. If you run into that situation, you pretty much *have* to delete the log files manually, because running a full backup against that many log files can take several days to complete, since the truncation process goes through each log file to make sure its changes were applied to the database. If the Database is dismounted, it is acceptable to delete log files, but you should only do so with the understanding that you will not be able to perform a Point in Time restore from the last backup to the point in time where the logs were deleted. (Point in Time recovery requests are fairly rare, from my experience, but they do happen, especially in larger companies with a lot of legal requirements).

Circular Logging

Now, if you are okay with not having the ability to do a Point In Time restore, you can configure Exchange to use a feature called Circular Logging. Circular Logging causes the Exchange server to retain only the latest 6 or 7 log files. Log files past that are automatically deleted, so you never have to deal with out of control log growth, and you also never have to run a full Exchange aware backup to clear log files. You would use this option if your backup solution doesn’t include support for Exchange server, if you don’t have a lot of space for logs, or if you just don’t care about dealing with logs for Point in Time restores. Another situation where you would use Circular Logging is if you have a Database Availability Group with at least three copies of each database. If you configure one copy to be Lagged (A lagged database copy waits a certain amount of time before writing transactions to the database), you can run Exchange in a No Backup mode. I’ll go into more detail on this feature in a later post, but for now, just understand that if you have enough database copies and at least one Lagged copy, you already have enough functionality to do Point in Time restores going back at most 14 days, and you are pretty well protected from Database failures.
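
Flipping Circular Logging on or off is a one-liner from the Exchange Management Shell (the database name is an example; a standalone database needs a dismount and remount for the change to take effect):

    # Enable circular logging on one database
    Set-MailboxDatabase 'DB01' -CircularLoggingEnabled $true

    # Turn it back off if you decide you want Point in Time restores after all
    Set-MailboxDatabase 'DB01' -CircularLoggingEnabled $false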

Common Misconceptions

So now that I’ve explained how the logs work and what they do, let’s go over some common misconceptions about Transaction Logs:

  1. Transactions are only written to the logs and then the logs are written to the database – This misconception is due in part to how databases functioned in the early days. Nowadays, transactions are written to memory, disk, and logs at almost the exact same time. There is a little bit of lag time between them being written to log files and the database itself, but this lag time is so miniscule that it doesn’t really matter (fractions of a second).
  2. If I do a full backup every night, I can use circular logging – This is one of those sorta kinda maybe close to accurate things, but it’s mostly wrong because it ignores the primary purpose of log files, which is to bring a restored database up to the most recent possible state it was in when the original copy was destroyed. If you run full backups every night, you still need to make sure you’re keeping all the logs from that backup time to the next backup time, otherwise when you restore your backup you will be missing up to 24 hours worth of mail. If you’re okay with that limitation, then sure, use circular logging if you run daily full backups. Otherwise, keep circular logging off.
  3. Deleting the logs manually will corrupt the database – No, it won't. As I mentioned, deleting the logs manually is sometimes necessary, and can be done at any time in more recent versions of Exchange. The danger in doing manual log purges is data loss. You never want to delete logs that haven't been backed up (either by a full backup or by an incremental/differential backup). If you've cleared all your logs manually and the database dies, there is no way to recover the transactions from the deleted logs if the files themselves were never backed up. A full, Exchange-aware backup will "truncate" the logs, which is geek speak for deleting the log files once they have been backed up. This is simply to free up space, because the transaction logs are no longer needed once they have been backed up.

Should I Switch to Office 365? A Frank Examination

The Cloud – An Explanation

As time moves on, technology moves with it, and times they are a'changin'! There have been many drastic changes in the world of IT over the years, but the most recent change, the move toward cloud computing, is probably one of the most drastic and industry-redefining changes to occur since the release of HTTP in the early 1990s.

Cloud computing is, put simply, placing your IT infrastructure into the hands of a third party, and it’s becoming big news for companies like Microsoft, Amazon, and Apple, who are working hard to push the IT world into the Cloud so they can take advantage of the recurring income model that their Cloud systems are built around.

A more complex explanation of Cloud Computing almost always requires a metaphor or analogy of some kind, so here’s mine; Cake. I love cake. Everyone loves cake. If you don’t love cake, you’re crazy and you should give your cake to me. But there is a problem with cake. If you live alone, you can never really have cake, because in order to have cake you usually have to buy or make a whole cake, and if you have a whole cake to yourself, you will very quickly regret having purchased a whole cake for yourself as you roll yourself out of bed and out the door each morning on your gigantic rolls of fat. Then you’ll have a heart attack. Cake is meant to be shared. One cake is enough to allow 12 people (or more, or less, depending on how much they love cake) to have a comfortable amount of cake. This is good for everyone because they all get cake and aren’t fat because of it. So, how is the Cloud like cake? Well, now that computing technology has advanced to the point where almost every computer system provides more power than is necessary for most tasks in the corporate environment, buying and building a complex IT infrastructure that meets all the needs of a specific company can be extremely expensive, and it’s highly likely that much of that infrastructure is going to be wasted because it is more than is needed.

Virtualization was the first step toward addressing the problem of excess computing power. It allowed IT departments to combine multiple server roles, securely segregated, on a single physical server. Prior to the advent of virtualization, secure segregation of server roles required numerous physical servers, which in turn required a lot of space, power, and resources to maintain. Virtualization started shrinking the corporate datacenters of the world, and the concept of cloud computing seeks not only to shrink the corporate datacenter further, but to centralize it.

Cloud computing, like a giant cake, allows multiple corporate environments to share a single, gigantic infrastructure. Rather than each company having its own segregated and wholly owned infrastructure that is managed, configured, and maintained separately, cloud solutions like Office 365 seek to provide all the services of a highly integrated and functional environment without requiring a fully dedicated infrastructure for each company that needs or wants that type of functionality. This is accomplished by taking products that are normally sold to individual corporations as a one-time investment and repackaging highly customized versions of them under a much smaller recurring fee. The cloud provider builds, maintains, and supports the infrastructure, and the cloud user makes use of the system as if they owned it. In the case of Office 365, every individual or company that uses the solution is effectively lumped into the same infrastructure and uses the portion of that system that they need, rather than using a portion of their own system and wasting the portion they don’t use. This is handy because one of the worst things in the world is stale cake. I mean unused IT resources.

Getting to it – What Does Office 365 Offer

Office 365, being Microsoft’s cloud solutions environment, provides most of the IT services that companies depend on Microsoft for: email and calendaring (through Exchange), collaboration (through SharePoint and Exchange), instant messaging (through Lync), and centralized file management and storage through OneDrive (formerly SkyDrive, formerly something else; Microsoft changes the names of stuff every week, it seems, so we’ll just call this cloud storage). If you wanted to build the infrastructure required to match what Office 365 provides for its monthly per-user fee (or à la carte, if you don’t want all the services together), you would need the following things in your environment:

Exchange Online:

  • At minimum 3 Exchange 2013 servers configured in a DAG
  • A hardware load balancer
  • A high-speed Fibre Channel SAN with about 55GB of storage per user (for normal mailboxes) and an additional 5GB for each resource mailbox (rooms, equipment, shared calendars, etc.)
  • An infinitely expanding low-speed SAN (for archive mailboxes, for E3 licenses and above)
  • A secure email delivery solution to provide email stubbing (for E3 licenses and above)
  • Spam filtering services or a spam appliance

SharePoint Online:

  • At minimum 2 SharePoint 2013 servers
  • Additional high-speed Fibre Channel space, up to 50GB per user (again… this is space for OneDrive/SkyDrive/whatever they call it next)
  • More Load Balancing

Lync Online:

  • At minimum 3 Lync 2013 servers

Generic Software Infrastructure

  • Several Windows Server 2008/2012 licenses
  • Active Directory (2 DCs minimum)
  • Multi-Tiered SAN infrastructure (with multi-site geo-replication capabilities)
  • A Load Balancer
  • Highly secured Firewalls

Software

  • Office Professional Plus license for each user

Physical Requirements

  • Multiple physical locations spread across numerous geographic regions
  • Each physical location should have concrete security walls, entry barriers, full-time security staff, man-traps, and multi-factor authentication before admittance

And that’s just for basic service. You would also need several employees dedicated to maintaining the infrastructure and supporting the environment, since it is very difficult to find individual IT personnel who have the skill set or mental constitution necessary to manage such an environment.

In other words, if you were to build the infrastructure necessary to provide the same functionality and level of service available with Office 365, you would need to spend several hundred thousand dollars on hardware, software, and manpower. That also does not take into account the architecture and development costs of setting up the environment, which is typically done by third-party companies or contractors.

Normally, individual companies would need to spend this kind of money every time they upgrade their infrastructure to keep up with new features and changes in technology. But with Office 365, upgrades, patching and new services/features are released regularly, with no need to manage a patching system. All in all, using Office 365 can represent a significant cost savings for companies that need high availability, accessibility, scalability, and security in their IT infrastructure.

Why Wouldn’t I Use Office 365

If it costs a whole lot less and provides a lot of great features, why would you not want to use Office 365? The answer is that cloud solutions are designed to meet the needs of the average environment, not every environment. As the cloud matures (it’s really just an infant right now, so don’t be surprised if it tosses Cheerios across the room every now and then) it will become much more customizable, but for now, there is very little customization that can happen with Office 365. For instance, any line-of-business application in your environment that must be installed directly on an Exchange server is not usable. Many software providers that require this kind of integration are shifting their focus to cloud-based solutions, but an on-premises Blackberry Enterprise Server, for example, will not work with Office 365. Research in Motion has instead teamed with Microsoft to build BES functionality into Office 365, though that support has to be activated. Still, there are several things that simply can’t be done with Office 365. I’ll outline some of the limitations here; for a more detailed explanation of what Office 365 *can* do, check out the official service descriptions available from Microsoft.


  1. Microsoft limits email restorations to 14 days. If an email is deleted from a mailbox, removed from the Deleted Items folder, and then purged from the hidden Recoverable Items folder, you will only have the ability to recover that email for 14 days after it is fully purged. Within that 14-day window, a support request to Microsoft is required to recover the email; outside it, there is no way to recover the message at all. I should mention that this is a technical limitation of the Exchange Online service. Exchange Online uses a DAG configuration with three or more copies, including a copy lagged by 14 days. This configuration allows Microsoft to use circular logging on their databases to reduce resource usage while still providing an extremely high level of availability, but the maximum amount of time any DAG member can be lagged is 14 days, so once a purge is committed to the lagged database copy, there is no way to recover it. (The first sketch after this list shows how to check the related Recoverable Items retention window.)
  2. Office 365 does not allow unauthenticated email relaying. In order to send email through Office 365, you *must* authenticate with a licensed user account. If you have line-of-business applications or devices that can’t perform authenticated SMTP, consider upgrading them to versions that support authenticated relaying. If that isn’t possible, you will need an SMTP server in your own network that accepts unauthenticated relay, such as Postfix or the IIS-based SMTP server included with Windows Server, and have it forward the mail on. (The second sketch after this list shows what authenticated submission looks like.)
  3. Exchange Online has strict limits on the amount of email each mailbox can send. This is to prevent spamming from Office 365’s mail servers and to reduce resource overhead. Each mailbox is allowed to send to at most 10,000 recipients per day (a recipient is a single email address listed in the To:, CC:, or BCC: field of an email, and a distribution list counts as a single recipient). In addition, each message can be addressed to at most 500 recipients, and each mailbox can send at most 30 messages per minute. Microsoft will not increase these limits even if you ask. If you have a business need to exceed them, consider using a cloud-based mass-emailing service like MailChimp.
  4. Many of the administrative capabilities and controls that are available with an on-premises Exchange, SharePoint, or Lync environment are not exposed to Office 365 tenant administrators. If there is a setting you would like to enable or use and you can’t find it in the Admin Portal, you may be able to reach it through the remote PowerShell sessions provided by Microsoft (the last sketch after this list shows a typical connection). Even with PowerShell, there is only so much you can do. There are four different modules for managing Office 365 in PowerShell: Exchange Online (with some additional odds and ends), Lync Online, Windows Azure AD, and SharePoint Online. As a general rule, though, administrative settings that control server-level or organization-level functions will not be available to you. If you have a business need to change a high-level configuration, you must prepare and submit a support request to Microsoft through the Office 365 Admin Portal. That is a fairly simple process, but support requests can take a significant amount of time to complete, so be prepared to wait for your changes to apply.
  5. If it breaks, in most situations there’s nothing you can do to fix it. Unless you are using ADFS and DirSync in your environment, bringing Office 365 services back online after a failure is completely out of your hands. Even if you do use ADFS or DirSync, the only things you can really troubleshoot and fix are Active Directory object syncing and login issues. Everything else falls under Microsoft’s SLAs and is their responsibility to fix. Microsoft guarantees a minimum service up-time of 99.9% (about 43 minutes of acceptable downtime per month). The SLA documents provide exact details on the service credits granted for falling below that level, but if your IT management has decided that a higher level of up-time is required, Office 365 may not be a good solution for your environment.
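A quick PowerShell aside for item 1: once you’re connected to Exchange Online (the connection itself is the last sketch below), you can at least see how long purged items linger in the Recoverable Items folder before that post-purge 14-day clock becomes your only hope. This is just a sketch; the mailbox address is a placeholder, and the service enforces a maximum on how high the retention window can be set.

```powershell
# Check the Recoverable Items retention window and single item recovery setting
# for one mailbox ("user@contoso.com" is a placeholder address).
Get-Mailbox -Identity "user@contoso.com" |
    Select-Object DisplayName, RetainDeletedItemsFor, SingleItemRecoveryEnabled

# Adjust the window (the service caps how high this value can go).
Set-Mailbox -Identity "user@contoso.com" -RetainDeletedItemsFor 14.00:00:00
```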
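For item 2, this is roughly what authenticated submission looks like from a script or scheduled task. Treat it as a sketch: the addresses and credential are placeholders, and the account used must be a licensed Office 365 mailbox.

```powershell
# Authenticated SMTP submission to Exchange Online (all addresses are placeholders).
$cred = Get-Credential "app-mailer@contoso.com"

Send-MailMessage -From "app-mailer@contoso.com" `
                 -To "alerts@contoso.com" `
                 -Subject "Nightly job finished" `
                 -Body "The nightly export completed successfully." `
                 -SmtpServer "smtp.office365.com" -Port 587 -UseSsl `
                 -Credential $cred
```

Devices and applications that can’t do this are exactly the ones that need a local relay like Postfix or the IIS SMTP service sitting in front of Office 365.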
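And for item 4, here’s the sort of remote PowerShell connection Microsoft provides for Exchange Online; the admin account shown is a placeholder. The Lync Online, Windows Azure AD, and SharePoint Online modules each have their own connection commands, but the pattern is the same: authenticate as a tenant admin and work with whatever cmdlets Microsoft chooses to expose.

```powershell
# Connect to Exchange Online remote PowerShell (tenant admin credentials required).
$cred = Get-Credential "admin@contoso.onmicrosoft.com"

$session = New-PSSession -ConfigurationName Microsoft.Exchange `
    -ConnectionUri "https://outlook.office365.com/powershell-liveid/" `
    -Credential $cred -Authentication Basic -AllowRedirection

# Pull the Exchange Online cmdlets into the local session and try one out.
Import-PSSession $session
Get-Mailbox -ResultSize 25 | Format-Table DisplayName, PrimarySmtpAddress

# Clean up when finished.
Remove-PSSession $session
```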

There are a number of other limitations to Office 365’s service, so many, in fact, that it would take a small book to outline them all. Because of that, Microsoft (and I) typically recommend a short Proof of Concept (PoC) period before migrating to the service. A PoC will highlight anything in your current configuration that would hurt interoperability with Office 365, and it will confirm whether your environment and business needs can be completely met by the service.

I Work in IT – How Will This Impact My Job

One of the greatest sources of push-back against moving to the cloud is IT staff. Most IT people are justifiably twitchy when it comes to keeping their jobs. There are a lot of people competing for IT jobs, and one of the major selling points of cloud services is decreased employment costs. Add the fact that the IT department is often the first group targeted for layoffs during a recession, and the question of how the cloud will impact IT workers becomes entirely valid. The truth is that while moves to the cloud will decrease the need for dedicated IT staff in most companies, they also increase the need for IT staff at datacenters and consulting firms. Skilled IT people are in short supply, so keeping up with technology trends is very important. Learning about the cloud and understanding it will keep you from being unemployed for extended periods (I speak from experience on this, I promise).

That said, large environments that maintain IT staff will still need to keep a significant portion of their IT workers even if they move to the cloud. Microsoft and other cloud providers do not provide end-user support, so there will always be a need for help desk and on-site support staff. In addition, companies that migrate to the cloud will continue to need IT staff who can interface with the cloud provider’s support teams; a single support request with Microsoft should convince most people of how much work goes into managing support requests and maintaining lines of communication during outages and system failures. There will also still be a need for systems and network administrators (in fact, with cloud services, network administrators may be in even higher demand as Internet connection up-time becomes more important). In all reality, on-site IT staff will still be very necessary, but the nature of the job will begin to change as more cloud services become available. Instead of fighting fires and panicking about system failures and inefficiencies, IT staff can focus on developing processes and making non-cloud-based services work better. Cloud services make the typical IT employee’s job easier, not less necessary.

What is Available in the Cloud Besides Office 365

The primary principle of cloud computing is that the equipment that runs your infrastructure no longer exists in your physical locations. There are a lot of ways this is accomplished, and terms like Private Cloud are bandied about by marketing teams with wild abandon and without a really concrete definition. Ask three different salesmen what a Private Cloud is and you’ll probably get three completely different answers (or some really blank stares. Salesmen. *eyeroll* Am I right?). But essentially, you’re in the cloud if you have to use a public Internet connection to reach your resources. If you have a dedicated MPLS-like connection to a datacenter and someone else manages, maintains, and updates those systems for you, you are not operating in the cloud. Some people will call this a Private Cloud, but the real term for this kind of relationship is Managed Services (since that’s what it was called well before the term Private Cloud was coined).

At any rate, cloud services range from solutions like Office 365 and Google Apps for Business to things like Dropbox and Imgur. For IT purposes, some additional services that may be useful include Microsoft’s Infrastructure as a Service (IaaS) offering, Azure, which lets you create entirely cloud-based virtual machines in Microsoft’s datacenters that can be accessed from anywhere. Amazon’s AWS provides a number of services that let web-based businesses perform necessary functions, and Google’s Apps for Business provides functionality similar to Office 365, but with (in my opinion) less polish.

Other Considerations

Migrating to the cloud is a difficult decision to make. There are pros and cons, just like any other business decision. Take some time to understand what is involved in moving to the cloud, and make sure to plan for life after the move if you decide to go ahead. One of the best recommendations I can make for people considering a move like this is to contact a company that specializes in cloud services (like Business and Decision North America, the company I work for. How’s that for a shameless plug?). Aside from being able to explain things in greater detail and help plan the move, using a Microsoft Partner will open up options for quick escalations and better communication with Microsoft’s support teams if problems come up. Companies that do not have a partner to assist them must deal with Microsoft’s Office 365 support teams on their own, and it can take up to 2 days to receive a first response, depending on workload and the day of the week (never open a support case at 4PM on a Friday. Just a friendly tip). If you work with a Microsoft Partner, that response time can shrink to the amount of time it takes for someone in Redmond to get their butt kicked (metaphorically speaking). All in all, the world is just now beginning to move into the cloud, so now is the time to start preparing for the inevitable.