In this post (after the dauntingly long intro) I’ll demonstrate some of the ways we’ve implemented security baseline monitoring (= security posture management) for our customers. The tools are familiar to most readers: Azure Policy and Defender for Cloud. I’m not going to explain the tools themselves deeply, as there are plenty of blog posts about that already. What I’m exploring here is the integration between them, and some undocumented features that I’ve run into.
Conducting Azure security assessments is part of our team’s daily work at Nixu. Usually this involves using a combination of tools and interviews to understand the current state of the customer’s Azure environment, reporting the findings and giving recommendations to remediate the issues found. A colleague, Joosua Santasalo, has written a number of great blog posts around assessments, including a series of posts around Secure DevOps Kit for Azure and how to automate the report creation with NodeJS and PS. Also check out this really cool tool Joosua has been developing for assessment data gathering.
Understanding the current state of an environment is the first step towards improving its security posture. After we present our recommendations to our customers, the next step is to evaluate and implement them (typically there are some gaping holes that need to be fixed ASAP, while some recommendations are more long-term or just less critical). One of the things we constantly get asked (and which we also recommend) is this: How do you ensure that the security posture will not start deteriorating soon after the fixes have been made? It is often the case that the development teams (or the service provider) have full control of their subscriptions/resources. Mistakes happen, people don’t know what they’re doing, and the pressure to get stuff working is high. As a result, the configurations start to drift.
The cure is quite simple (but easier said than done). First, we need to have a security baseline defined. Then we need to continuously enforce that baseline. The enforcement can be done using detective controls (create alerts when non-compliance is detected, and then follow a process to fix the issues) or preventive controls (prevent non-compliant configurations from being made in the first place) or it could be something in between (e.g., detect + auto-remediate). In the good old Windows + Active Directory times this would normally be achieved using Group Policies, SCCM or PowerShell DSC. So how do we do this with Azure?
What is my baseline?
Before going into details about the implementation, it’s good to have a few words about the baseline itself. There are different sources of information that you can use to come up with a set of controls that define your Azure security baseline:
- Existing corporate security controls: Especially large organizations typically have some sort of catalog of security controls that apply to all systems within the organization. Often many of these controls are not technical but administrative. But typically there are some high-level technical controls that can be used as guiding principles.
- Azure Security Benchmark (ASB) consolidates Microsoft security best practices in Azure. It’s a great resource for design decisions, and the controls are mapped to industry standards. Microsoft has also created a policy set in Azure which can be used for monitoring resource compliance against the baseline. If you don’t have anything else, use this! However, not all controls are suitable for all organizations.
- CIS Azure Foundations Benchmark is also a useful resource for security best practices. Microsoft has also created a policy set which can be used for monitoring resource compliance against the CIS benchmark. However, the policy sets often lag a bit behind (right now the latest CIS benchmark is version 1.4.0, but the latest policy set is for version 1.3.0).
- Secure DevOps Kit for Azure (AzSK): as noted many times in our blog posts, AzSK is a great tool for assessing Azure environments. However, AzSK is currently in sunset mode. The team has been developing a new solution (AzTS), but personally I prefer using the native tools (described in this blog post) for continuous monitoring. The main reason is that setting up AzSK and AzTS for continuous monitoring is more complex, and they lack the possibility to exclude/exempt individual resources from individual controls.
Implementing the baseline (Azure Policy)
Azure Policy is a free offering in Azure that has been there for a long time. Every customer knows that it exists but surprisingly few actively leverage it. What I’ve learned while working with my colleagues who’ve implemented security baseline monitoring for other cloud platforms (specifically AWS and Aliyun) is that other cloud platforms have not productized this sort of baseline monitoring as neatly as Azure has, and as a result the implementations tend to be more complex (and have at least some cost involved). I’m not saying that Azure is better as a platform, just that for this specific use case it has a pretty unique offering, and customers should leverage it.
To understand how the baseline is created and assigned, it’s important to understand the following concepts that are central to Azure Policy and this baseline:
- Policy Definition represents a technical control. It contains the policy statement and the type of control (detection / prevention / detection+remediate). Example: check whether storage accounts have network restrictions enabled, if not: mark non-compliant.
- Policy Set Definition is a collection of policy definitions. Azure portal uses the term “initiative” instead of policy set, but I prefer policy set as it’s more descriptive (and it’s the term used in all the APIs). You can have multiple policy sets, but in the simplest case you could just use one for the entire security baseline. Policy sets also support the use of policy groups, which provide the ability to map policies to higher-level controls.
- Assignment can be done to policies or policy sets. Assignment makes the policies effective in the sense that the policy engine starts evaluating the resources against the policies. You can assign policies on management group, subscription or resource group level. We’ll discuss this later in the post.
- Exemptions provide a way to exempt resources from specific policies or policy sets. This is a fairly new feature that is absolutely crucial, because you want to keep your dashboards green (exempt the resources that are not compliant because of some good reason).
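To make the first concept a bit more concrete, here is a minimal sketch of what a policy definition looks like, written as a Python dict so it can be dumped to JSON. It follows the storage account example mentioned in the list. The display name and the choice of the audit effect are my assumptions for illustration, and the alias should be verified against the real built-in definition before you rely on it.

```python
import json

# Illustrative audit-only definition: flag storage accounts whose default
# network action is not "Deny". Display name and effect are assumptions.
policy_definition = {
    "properties": {
        "displayName": "Storage accounts should restrict network access (example)",
        "policyType": "Custom",
        "mode": "Indexed",
        "metadata": {"category": "Storage"},
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
                    {
                        # Assumed alias; check it against the built-in policy.
                        "field": "Microsoft.Storage/storageAccounts/networkAcls.defaultAction",
                        "notEquals": "Deny",
                    },
                ]
            },
            # "audit" is a detective control; "deny" would make it preventive.
            "then": {"effect": "audit"},
        },
    }
}

print(json.dumps(policy_definition, indent=2))
```

The "if" block is the policy statement and the "then" block decides the type of control, which is the split described in the Policy Definition bullet.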
Built-in or custom?
You can have two kinds of policies (and policy sets) in Azure: built-in and custom. Azure has a number of built-in policies (1081 the last time I checked):
There are also a number of built-in policy sets that consolidate individual policies:
You can use the built-in policies and policy sets directly, or you can create custom ones. The great thing about the built-in policies and policy sets is that Microsoft is constantly maintaining them so they are being kept up-to-date. However, there are some drawbacks too:
- What if there is no built-in policy for the control you want to implement?
- What if you want to add company-specific metadata to the policies?
- What if you want to control changes to the policies in your version control?
Mainly due to these three questions, I’ve been recommending the use of custom policies (and policy sets) when building your security baseline. Note! Even if you use custom policies, it does not mean that you need to start from scratch. As you’ll see in the example later in the post, we will actually just create copies of existing built-in policies and add some metadata to them. Make as few customizations as possible.
Defining the controls (creating policy definitions)
Now that the basics are covered, we can start creating the policy definitions. At this point we assume you have a high-level security baseline. The next steps are:
- Map the built-in policies against your security baseline and pick the relevant built-in policy definitions.
- Create copies of the relevant policy definitions, add any metadata you find useful and tailor the policy definition to your needs. Note! One very useful piece of metadata you should add is the securityCenter metadata property, which allows you to instruct Defender for Cloud what kind of information to show about the policy. The severity information is especially useful, as policy definitions do not have that kind of property (more about the relationship between Azure Policy and Defender for Cloud, or MDC, in a later section).
- If there are still gaps that are not covered, analyze if it’s feasible to implement them using custom policies. If not, you need to find some other way of implementing the control (it can also be an administrative control).
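Step 2 above can be sketched roughly like this. The function name and the sample definition are hypothetical, and the key names inside securityCenter (Severity, RemediationDescription) are how I understand the property from Microsoft’s documentation on custom security policies, so double-check them against the current docs.

```python
import copy

ALLOWED_SEVERITIES = {"Low", "Medium", "High"}

def to_custom_definition(builtin, severity, remediation):
    """Return a custom copy of a built-in definition with securityCenter metadata."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {severity}")
    custom = copy.deepcopy(builtin)  # never mutate the definition we pulled
    props = custom["properties"]
    props["policyType"] = "Custom"
    metadata = props.setdefault("metadata", {})
    # Assumed key names; they tell MDC what to show for the recommendation.
    metadata["securityCenter"] = {
        "Severity": severity,
        "RemediationDescription": remediation,
    }
    return custom

# Made-up stand-in for a definition pulled from Azure.
builtin = {
    "name": "example-builtin",
    "properties": {
        "displayName": "Example built-in policy",
        "policyType": "BuiltIn",
        "metadata": {"category": "Storage"},
    },
}
custom = to_custom_definition(builtin, "High", "Restrict network access to the storage account.")
print(custom["properties"]["metadata"]["securityCenter"])
```

Working on a deep copy keeps the original built-in JSON intact, which makes it easier to diff your copy against the source later.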
Next, we will look into steps 1 and 2 above (I won’t spend time on the details of creating custom policies, but there are plenty of good resources available to guide you there).
There are several good ways to analyze the built-in policies, including:
- Use the Microsoft documentation listing the built-in policies. The nice thing about this page is that the policies are grouped based on resource type (and you can easily search via browser). It also has links directly to the policy definition in Azure. However, this approach is very manual.
- Use AzAdvertizer, which is a site maintained by Julian Hayward. This is a great site that analyzes changes in built-in policy definitions. Even if you don’t use it in the initial baseline creation, it’s great for checking what has changed since specific dates.
- Use Azure Portal. Obviously you can directly look into the policy definitions in Azure Portal, and even create copies of policy definitions. But this approach doesn’t scale very well (and we want to maintain our baseline in version control).
- Use PowerShell (or your favorite scripting language) to pull the definitions into CSV or some other format and go from there. I prefer this option. The great thing about this option is that you can then pull the policy definitions into JSON files, add metadata, etc.
Once you have the policy definitions created (JSON files), you should store them in version control (Azure DevOps, GitLab, whatever you are using) to keep track of the changes. You can also then use CI/CD pipelines for deploying and updating the policies. Here’s a great resource describing how you can do that with Azure DevOps. Regardless of your deployment method, as a result you will have your custom definitions (which are based on built-in definitions) in Azure:
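As a rough illustration of the scripting option, the sketch below flattens already-downloaded policy definition objects into CSV for the mapping exercise. It deliberately skips the download itself (use whatever tooling you prefer for that). The column selection is my own choice, and the sample record is made up.

```python
import csv
import io

COLUMNS = ["name", "displayName", "category", "version"]

def definitions_to_csv(definitions):
    """Flatten policy definition objects into CSV text, one row per definition."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for definition in definitions:
        props = definition.get("properties", {})
        metadata = props.get("metadata", {})
        writer.writerow({
            "name": definition.get("name", ""),
            "displayName": props.get("displayName", ""),
            "category": metadata.get("category", ""),
            "version": metadata.get("version", ""),
        })
    return buffer.getvalue()

# Made-up sample; real input would be the JSON pulled from Azure.
sample = [
    {
        "name": "example-builtin-1",
        "properties": {
            "displayName": "Storage accounts should restrict network access",
            "metadata": {"category": "Storage", "version": "1.0.0"},
        },
    }
]
print(definitions_to_csv(sample))
```

The resulting file can be opened in a spreadsheet and mapped row by row against your baseline controls.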
In the example I have just a few policies. In reality, a security baseline can easily contain 200 policies.
Tip: if you create custom policies by copying built-in policies, insert the original policy definition id and version into the metadata. This way, you can track changes to the built-in policy and compare it to your copy (this can easily be automated).
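The automation mentioned in the tip could look something like the sketch below. The metadata keys sourcePolicyId and sourcePolicyVersion are names I made up for this example (use whatever you stored in your copies), and the sample ids and versions are invented.

```python
def find_outdated_copies(custom_definitions, builtin_definitions):
    """List custom copies whose recorded source version differs from the built-in."""
    current = {
        b["id"].lower(): b["properties"].get("metadata", {}).get("version")
        for b in builtin_definitions
    }
    outdated = []
    for custom in custom_definitions:
        metadata = custom["properties"].get("metadata", {})
        source_id = metadata.get("sourcePolicyId")  # hypothetical key
        if source_id is None:
            continue  # written from scratch, nothing to compare against
        copied = metadata.get("sourcePolicyVersion")  # hypothetical key
        latest = current.get(source_id.lower())  # None if the built-in is gone
        if latest != copied:
            outdated.append((custom["name"], copied, latest))
    return outdated

# Invented sample data.
builtins = [{
    "id": "/providers/Microsoft.Authorization/policyDefinitions/example-builtin",
    "properties": {"metadata": {"version": "1.1.0"}},
}]
customs = [{
    "name": "baseline-storage-network",
    "properties": {"metadata": {
        "sourcePolicyId": "/providers/Microsoft.Authorization/policyDefinitions/example-builtin",
        "sourcePolicyVersion": "1.0.0",
    }},
}]
print(find_outdated_copies(customs, builtins))
```

Running a check like this in a scheduled pipeline gives you a to-do list of copies to review whenever Microsoft updates a built-in policy.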
Defining the baseline (creating policy set)
Once you have all the policies created, the next step is to create the policy set (= initiative). The policy set can also be defined in a JSON file, so you can store that in your version control as well, and the deployment should be done using the same method as with the policies themselves.
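Here is a small sketch of a policy set with one policy group, plus a sanity check you could run in a pipeline before deploying. The group name, reference id and management group name are placeholders I invented, and the validation rules are just the two I would start with, not an exhaustive list.

```python
policy_set = {
    "properties": {
        "displayName": "Security baseline (example)",
        "policyType": "Custom",
        "metadata": {"category": "Security"},
        # Groups map individual policies to higher-level controls.
        "policyDefinitionGroups": [
            {"name": "NET-1", "displayName": "Restrict network access", "category": "Network"},
        ],
        "policyDefinitions": [
            {
                "policyDefinitionReferenceId": "storageNetworkAccess",
                # Placeholder id pointing at a custom definition in a management group.
                "policyDefinitionId": "/providers/Microsoft.Management/managementGroups/example-mg"
                "/providers/Microsoft.Authorization/policyDefinitions/baseline-storage-network",
                "groupNames": ["NET-1"],
            },
        ],
    }
}

def validate_policy_set(policy_set):
    """Return a list of problems: duplicate reference ids or undefined groups."""
    props = policy_set["properties"]
    known_groups = {g["name"] for g in props.get("policyDefinitionGroups", [])}
    problems = []
    seen = set()
    for ref in props.get("policyDefinitions", []):
        ref_id = ref["policyDefinitionReferenceId"]
        if ref_id in seen:
            problems.append(f"duplicate reference id: {ref_id}")
        seen.add(ref_id)
        for group in ref.get("groupNames", []):
            if group not in known_groups:
                problems.append(f"{ref_id}: unknown group {group}")
    return problems

print(validate_policy_set(policy_set))
```

With a couple of hundred policies in one set, a cheap check like this catches copy-paste mistakes before the deployment fails halfway.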
Integration with Defender for Cloud
This is the part where there’s always confusion. What’s the relationship between the recommendations in Defender for Cloud (MDC) and Azure Policy? Now, I haven’t really seen any Microsoft documentation that would describe how this works under the hood, so most of the following claims are just based on my own reasoning, and I could be wrong. Just FYI :). But here it goes.
Every recommendation in MDC is based on a policy definition
If you see a recommendation in MDC, there is always a policy definition behind it. This is quite simple, but it’s very important. By recommendation I mean what you can see in the Recommendations or Regulatory compliance view. The Cloud Security Posture Management (CSPM) capability of MDC is based on Azure Policy. Open any recommendation, and you will see an option to see the policy definition behind it:
Integrate through regulatory compliance
How do you get your custom policy definitions visible in MDC? You add them through the security policy. You need to have the non-free MDC plan to do this, i.e. Azure Security Center Standard, as it was known 10 years ago. I haven’t seen any documentation with regard to which MDC plans you need to enable. Technically, you can just have one plan (even if you don’t have any resources of that type, which makes it effectively free of charge). But I honestly do not know whether there are any licensing requirements in terms of which plans need to be enabled.
Btw. When you do this integration, what actually happens is your policy set is updated with a metadata property ”ASC”: ”true” as shown below. This means that you can do this integration programmatically when you create the policy set (using your CI/CD pipeline or script or whatever the method). Just add the metadata during creation, and it will be automatically integrated with MDC. Again, haven’t seen any documentation about this, but that’s how it works.
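In a deployment script, that could be as simple as the helper below. The function name is hypothetical, and the whole thing rests on the undocumented behavior described above, so treat it as a sketch that may stop working if Microsoft changes how the integration is wired.

```python
def enable_mdc_integration(policy_set):
    """Add the metadata flag that makes the policy set visible in MDC."""
    metadata = policy_set.setdefault("properties", {}).setdefault("metadata", {})
    # String value "true", mirroring the observation above (not a JSON boolean).
    metadata["ASC"] = "true"
    return policy_set

# Made-up policy set skeleton; existing metadata is preserved.
baseline = {"properties": {"displayName": "Security baseline (example)",
                           "metadata": {"category": "Security"}}}
enable_mdc_integration(baseline)
print(baseline["properties"]["metadata"])
```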
It’s the same data, but not exactly
The recommendations you see in MDC are based on Azure Policies, but it’s not the same data. I created a picture that illustrates the relationship (later in this blog post). But a few observations first:
- Policy states are updated once a day (seems to happen somewhere around 7-9 AM EET). Let’s say you change a configuration on a resource that makes it non-compliant. Azure Policy will show it as non-compliant after the daily evaluation takes place (you can also trigger it with Start-AzPolicyComplianceScan). Note that the policies themselves are evaluated during the change, i.e. when you push the update to Azure Resource Manager, it will do the evaluation (how would the deny policies otherwise work?). But it won’t be visible in the compliance view until the daily evaluation. This is my observation; I haven’t seen any documentation about it.
- MDC creates the assessment resources based on the policy states. This also happens daily. It also means that Azure Policy and MDC are not always in sync. Using the previous example: you make the change, the next daily policy evaluation takes place and you will see the resource become non-compliant in Azure Policy, but it may still show up in MDC as compliant. Only after MDC has completed its own daily cycle will it update the state (based on the Azure Policy state).
- MDC recommendations may have a severity classification (not all do). I mentioned this before, but Azure Policy does not have a way to classify policy definitions (e.g., low/medium/high), but MDC does (as you can see in the picture above with the Medium severity). The way it works is that you can add a securityCenter metadata property to your policy definition, which will instruct MDC to add the severity information to the recommendation. This is super useful when creating your baseline, because not all findings are equally critical to fix (e.g., RDP port open to the internet vs. TLS 1.1 allowed on a web app).
Sometimes it goes the other way around
I noted before that all MDC recommendations are based on policy definitions. This is true, but in some cases the evaluation is actually done by MDC, not Azure Policy. I’ll give you an example.
Below is a typical policy definition, which uses the Azure Policy engine to evaluate the compliance. It uses policy aliases (such as Microsoft.Web/sites/httpsOnly) to access the configuration information. It evaluates the compliance, and MDC then picks up the information and shows it as a recommendation in MDC.
Below is a different example, where Azure Policy actually uses an MDC assessment resource to evaluate the compliance. It’s a weird two-way integration, because MDC performs the assessment first and creates an assessment resource, which is then used by Azure Policy to determine compliance. MDC will then use the policy compliance (which is based on the assessment done by MDC) to show the result in MDC (if it’s a custom policy). Why? Well, the reason is that the control below could never be assessed using policy aliases, because this is not information that is available through the Azure control plane. So MDC does its own magic behind the scenes to do the assessment. In this example I believe it actually analyzes activity logs to find admins that did not use MFA (it’s not looking into Azure AD and conditional access policies).
Finally, to show how the data in Azure Policy and MDC correlates, I’ve created a couple of pictures.
The first one is simple. It shows the following relationships:
- Policy definition states are shown as recommendations in MDC (in this case I’m using the Regulatory compliance view, but the same policies are visible also in the Recommendations view).
- Policy set (= initiative) is shown as a regulatory compliance standard in MDC.
- Policy groups are shown as regulatory compliance controls in MDC.
The underlying data structure is shown in the (horrible) picture below. I’m not a big fan of the way the regulatory compliance controls are structured. For whatever reason they are identified with an incrementing number (which you can also see in the above portal view).
The other thing that annoys me is that while each policy group can have a category (misleadingly called subgroup in the portal), this is not represented in the regulatory compliance view in any way. But hopefully in the future!
Side note: Azure Resource Graph actually consolidates the assessment and assessmentMetadata objects as shown in the picture below. But if you’re using the APIs directly to get the information, you need to do this consolidation yourself.
Another side note (and another annoyance): when you’re in the policy compliance view, it shows the policy groups as links. If you click the link, it doesn’t take you anywhere. You know why? It’s because the link actually tries to find a policy metadata object related to that group. Unfortunately, these metadata objects are read-only, so you cannot create your own metadata object for that particular group.
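If you do pull the two object types from the APIs yourself, the consolidation is essentially a join on the assessment name. The sketch below assumes that both lists are already downloaded, that the metadata object’s name matches the last segment of the assessment id, and that the property names (status.code, displayName, severity) look the way I remember them, so verify those against real API output. The sample objects are invented.

```python
ASSESSMENT_MARKER = "/providers/Microsoft.Security/assessments/"

def consolidate(assessments, metadata_items):
    """Join assessment objects with their assessmentMetadata by assessment name."""
    metadata_by_name = {m["name"]: m.get("properties", {}) for m in metadata_items}
    merged = []
    for assessment in assessments:
        # The part of the id before the marker is the assessed resource.
        resource_id, _, assessment_name = assessment["id"].partition(ASSESSMENT_MARKER)
        meta = metadata_by_name.get(assessment_name, {})
        status = assessment.get("properties", {}).get("status", {})
        merged.append({
            "resourceId": resource_id,
            "assessmentName": assessment_name,
            "status": status.get("code"),
            "displayName": meta.get("displayName"),
            "severity": meta.get("severity"),
        })
    return merged

# Invented sample objects.
assessments = [{
    "id": "/subscriptions/sub-1/resourceGroups/rg/providers/Microsoft.Web/sites/app1"
          + ASSESSMENT_MARKER + "example-assessment",
    "properties": {"status": {"code": "Unhealthy"}},
}]
metadata_items = [{
    "name": "example-assessment",
    "properties": {"displayName": "Web app should use HTTPS only (example)", "severity": "Medium"},
}]
print(consolidate(assessments, metadata_items))
```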
If you open one of the built-in policy sets and click the group name, it takes you to the metadata describing that particular control (below Azure Security Benchmark). It would be great if you could create your own metadata objects describing these high-level controls. The description could be copy-paste from your organization’s control descriptions (or whatever framework you are using). Not possible, but maybe in the future!
This is the difficult part. Once you have the baseline defined, whether it’s one of the built-in policy sets or one that you’ve created, what do you do with it? A few observations:
- Especially in larger organizations, management of the subscriptions and resources is scattered between different teams and outsourcing partners. Fixing a non-compliant configuration (or creating an exemption) is not that simple. You need to have the ones responsible for managing the resources to monitor their compliance against the baseline, and to fix the issues (instead of having someone from the security team to do that for them).
- You can easily integrate your compliance monitoring into external systems using a simple function app (pull the results from the APIs described above, and push somewhere). We’ve used this approach to create tickets in ServiceNow, and even publish results into AWS Security Hub (although, in many cases a better approach is to do it the other way around as described in Sami’s blog).
- If you create your own baseline, you need to maintain it! Microsoft publishes new built-in policies all the time, so you should have a continuous development process in place to add new policies and remove deprecated ones. This is also a very good reason to store your policy definitions in version control, and use CI/CD pipelines to deploy the changes.
- Use Azure Workbooks to provide different kinds of dashboards and views to your baseline. You can deploy the same workbook to Defender for Cloud, Azure Monitor and Microsoft Sentinel (if in use), which can be beneficial, as different teams may be using different tools.
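To give an idea of the function app approach from the list above, here is a sketch of the transformation step only: turning policy state records into one ticket per subscription. Pulling the states and calling the ticketing system are left out, the ticket fields are hypothetical, and I’m assuming the state records carry complianceState, subscriptionId, resourceId and policyDefinitionReferenceId.

```python
from collections import defaultdict

def build_tickets(policy_states):
    """Group non-compliant policy states into one ticket payload per subscription."""
    findings = defaultdict(list)
    for state in policy_states:
        if state.get("complianceState") != "NonCompliant":
            continue
        findings[state["subscriptionId"]].append(
            (state["resourceId"], state["policyDefinitionReferenceId"])
        )
    tickets = []
    for subscription_id in sorted(findings):
        items = sorted(findings[subscription_id])
        lines = [f"- {resource} violates {control}" for resource, control in items]
        tickets.append({
            # Hypothetical ticket shape; adapt to your ticketing system.
            "title": f"{len(items)} baseline finding(s) in subscription {subscription_id}",
            "body": "\n".join(lines),
        })
    return tickets

# Invented sample states.
states = [
    {"complianceState": "NonCompliant", "subscriptionId": "sub-1",
     "resourceId": "/subscriptions/sub-1/storage1", "policyDefinitionReferenceId": "storageNetworkAccess"},
    {"complianceState": "Compliant", "subscriptionId": "sub-1",
     "resourceId": "/subscriptions/sub-1/storage2", "policyDefinitionReferenceId": "storageNetworkAccess"},
]
print(build_tickets(states))
```

Grouping per subscription is one way to route findings to the team that actually owns the resources, which is the point made in the first bullet.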
I’ve been focusing on Azure here, but many customers have other cloud platforms in use. A colleague of mine, Sami Lamppu, has written an excellent series of blogs about the multi-cloud scenario. It’s a great series that covers not only posture management, but security monitoring as well.
Summary / final remarks
Some final remarks here to summarize this rambling blog post (also some notes that I may have forgotten to mention earlier):
- Use Azure Policy! It’s free, and it’s a great tool for ensuring basic security hygiene of your environment. If creating your own baseline seems too heavy (you need to maintain it!), just use the built-in Azure Security Benchmark.
- Use exemptions. Any dashboard that is always ”a little bit of red” is useless. So create exemptions for resources/policies that are not compliant (for a reason).
- If you create custom policy definitions, deploy them into the root management group. This way, the policy definitions are available in every subscription in your environment, and you only need to maintain them in one place. If necessary, you can create the policy assignment on a lower level. But especially if you’re just using audit-mode policies, I would also do the assignment at root level.
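Going back to the exemptions remark in the list above: if you manage exemptions as code too, the request body could be built along these lines. The property names are how I recall the policy exemption resource (policyAssignmentId, policyDefinitionReferenceIds, exemptionCategory, expiresOn), so check them against the API reference. The 90-day default expiry is purely my assumption, and the ids in the example are placeholders.

```python
from datetime import datetime, timedelta, timezone

def build_exemption(assignment_id, reference_ids, reason,
                    category="Waiver", days_valid=90, now=None):
    """Build an exemption body that expires, so exemptions get re-reviewed."""
    if category not in ("Waiver", "Mitigated"):
        raise ValueError(f"unexpected category: {category}")
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(days=days_valid)
    return {
        "properties": {
            "policyAssignmentId": assignment_id,
            # Limit the exemption to specific policies inside the policy set.
            "policyDefinitionReferenceIds": list(reference_ids),
            "exemptionCategory": category,
            "description": reason,
            "expiresOn": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }

# Placeholder assignment id and reference id.
exemption = build_exemption(
    "/providers/Microsoft.Management/managementGroups/example-mg"
    "/providers/Microsoft.Authorization/policyAssignments/baseline",
    ["storageNetworkAccess"],
    "Public endpoint required by a partner integration.",
)
print(exemption["properties"]["expiresOn"])
```

Giving every exemption an expiry date and a written reason keeps the dashboard green without letting exceptions pile up unnoticed.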