Keys to the Kingdom, Part Two: Building the Vault the Right Way

Dark Perimeter: True Cybersecurity Stories

Every major cyberattack has a story behind it. A vulnerability no one patched. A phishing email someone clicked. A nation-state with a motive. Dark Perimeter goes beyond the headlines to explore the true stories of the hacks, breaches, and cyber operations that shaped history - told in narrative form for security professionals and curious minds alike. No guests, no panels, no filler. Just the story.

May 23, 2026

0:00 | 23:06

Part Two of the Azure Key Vault series goes into the architecture: the specific decisions, the configuration that separates a deployment that is genuinely secure from one that looks right on a diagram but has gaps. Cole Drayden covers provisioning and naming conventions that matter, why Azure RBAC at the secret scope is the right access model and how to implement it, managed identity wiring in implementation detail, soft delete and purge protection, diagnostic logging configuration and the queries that prove it works, network access control and the case for private endpoints, infrastructure as code as a security control, secret rotation in practice, and the six most common Key Vault misconfigurations. The episode closes with the complete sign-off checklist for CISO acceptance, and the case for why AI service credentials are a new category of high-value secret that deserve the same rigor as your most sensitive organizational credentials. Dark Perimeter: Security, AI, and the Edge of What's Coming.


SPEAKER_00 0:00

In part one, we covered the CISO oversight frame: the eight questions to get answered and the concepts to have in your head before you look at what your team has built. If you have not listened to part one, go back and do that first. This episode assumes you understand the management layer and are ready to go deeper. Today we are getting into the architecture: the specific decisions and the configuration that separate a Key Vault deployment that is genuinely secure from one that looks right on a diagram but has gaps that will matter when something goes wrong. We are going to walk through this in the order you would actually build it, and I am going to tell you why each decision is the right one, not just what the decision is. Welcome back to Dark Perimeter. I'm Cole Drayden.

Let's start at the beginning: provisioning the vault itself. Your Key Vault lives in an Azure resource group. The resource group is a logical container for related Azure resources, and your Key Vault should be in a resource group that reflects its organizational role. A common pattern is a dedicated resource group for security infrastructure, separate from the resource groups that hold your application workloads. This matters for access control. The people who administer your Key Vault infrastructure are not necessarily the same people who deploy your applications, and keeping them in separate resource groups lets you apply different RBAC assignments at the resource group level. Naming conventions matter more than you might think, because Key Vault names are globally unique across all of Azure, they appear in the vault's public URL, and they cannot be changed after creation. The pattern that works is a prefix indicating the organization or project, a descriptor indicating this is a key vault, the environment, and a numeric suffix, separated by dashes. You might end up with something like orgname-kv-prod-01.
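The naming rules behind that convention can be checked before anyone clicks Create. This is a minimal Python sketch, assuming Azure's published vault-name constraints (3 to 24 characters; letters, digits, and hyphens only; must start with a letter, end with a letter or digit, and contain no consecutive hyphens). The helper names `build_vault_name` and `is_valid_vault_name` are hypothetical, and the org-kv-env-NN layout is the episode's convention, not a platform requirement.

```python
import re

# Key Vault name rules: 3-24 chars; letters, digits, hyphens; starts with a
# letter; ends with a letter or digit; no consecutive hyphens anywhere.
_VAULT_NAME = re.compile(r"[A-Za-z](?!.*--)[A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_vault_name(name: str) -> bool:
    """True if `name` satisfies the platform naming rules above."""
    return _VAULT_NAME.fullmatch(name) is not None


def build_vault_name(org: str, env: str, index: int) -> str:
    """Compose <org>-kv-<env>-<NN>, the convention suggested in the episode."""
    name = f"{org}-kv-{env}-{index:02d}".lower()
    if not is_valid_vault_name(name):
        raise ValueError(f"{name!r} breaks Key Vault naming rules")
    return name
```

For example, `build_vault_name("orgname", "prod", 1)` returns `"orgname-kv-prod-01"`. A local check like this cannot confirm global uniqueness; that is still decided by Azure at creation time.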
The reason to be deliberate about this now is that you are going to have multiple vaults as you grow. Development and production should be separate vaults. Separating them lets you apply different network policies, different access controls, and different retention policies appropriate to each environment. When you provision, you select the region. Choose the same region as your primary application workloads to minimize latency and keep data residency consistent. Enable geo-redundancy if your organization's requirements demand it, and select the Premium tier, which you have already decided on.

Now the access control model. I want to spend real time here, because this is where the architecture either works or it doesn't. When you create the Key Vault, you will be asked to configure the permission model. Select Azure role-based access control, not vault access policy. This setting cannot easily be changed after the fact without disrupting existing access assignments, so getting it right at creation is important. With Azure RBAC selected, you now assign roles. The relevant built-in roles are these. Key Vault Administrator gives full control of vault operations and is for the people or service principals managing the vault itself. Key Vault Secrets Officer allows create, read, update, delete, and list operations on secrets. Key Vault Secrets User allows read operations on secrets only, and it is the role your applications will use. Key Vault Reader allows reading vault metadata, but not secret values. Your human administrators who configure the vault get Key Vault Administrator. Your applications get Key Vault Secrets User, assigned at the secret level rather than the vault level wherever possible. No application should have Key Vault Secrets Officer in production. That role is for the automated processes or humans that create and rotate secrets, not for the applications that consume them.
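That role guidance is easy to state and easy to drift from, so it helps to lint assignments periodically. This is a hypothetical review sketch, not an Azure API: it assumes you have already exported role assignments into simple records (for example from `az role assignment list`), and the `Assignment` fields, including the `kind` label that separates humans and automation from applications, are illustrative assumptions. Only the role names are the real Azure built-in names.

```python
from dataclasses import dataclass
from typing import List

ADMIN = "Key Vault Administrator"
SECRETS_OFFICER = "Key Vault Secrets Officer"
SECRETS_USER = "Key Vault Secrets User"


@dataclass(frozen=True)
class Assignment:
    principal: str    # display name or object ID
    kind: str         # hypothetical label: "human", "automation", or "application"
    role: str         # built-in role name, e.g. SECRETS_USER
    environment: str  # e.g. "prod" or "dev"
    scope: str        # ARM resource ID the role is assigned at


def is_secret_scope(scope: str) -> bool:
    """True when the scope ends in .../secrets/<name> rather than at the vault."""
    parts = scope.rstrip("/").split("/")
    return len(parts) >= 2 and parts[-2].lower() == "secrets"


def review(assignments: List[Assignment]) -> List[str]:
    """Findings for application assignments that contradict the role guidance."""
    findings = []
    for a in assignments:
        if a.kind != "application":
            continue  # this sketch only checks application identities
        if a.role == ADMIN:
            findings.append(f"{a.principal}: applications should not hold {ADMIN}")
        elif a.role == SECRETS_OFFICER and a.environment == "prod":
            findings.append(f"{a.principal}: {SECRETS_OFFICER} on an application in production")
        elif a.role == SECRETS_USER and not is_secret_scope(a.scope):
            findings.append(f"{a.principal}: {SECRETS_USER} assigned at vault scope")
    return findings
```

The vault-scope check is there because the guidance is not just "which role" but "which role, at which scope"; a correct role at the wrong scope still exposes every secret in the vault.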
The assignment at the secret level is worth explaining, because it is the control that gives you real least privilege. When you assign Key Vault Secrets User to a managed identity at the scope of a specific secret rather than at the scope of the vault, that managed identity can read only that secret. If that application is compromised, the attacker can read one secret, not the entire vault. This is the architecture that makes the access model actually mean something. In the Azure portal, you do this by navigating to the specific secret, going to Access control (IAM) on that secret, and adding a role assignment there. In infrastructure as code, which I will get to, you do this with a role assignment resource scoped to the secret's resource ID.

Let's talk about managed identities in implementation detail, because the concept is one thing and the wiring is another. Your application runs on some Azure compute resource: an App Service app, a container in Azure Container Apps, a virtual machine, a function. Whatever it is, navigate to that resource in the portal, find the Identity section, and enable the system-assigned managed identity. When you do this, Azure creates an identity in Entra ID for that resource and manages its lifecycle automatically. When the resource is deleted, the identity is deleted. You never create credentials for it and you never rotate credentials for it. Now you have an identity. You assign it the Key Vault Secrets User role on the specific secrets it needs. In your application code, you retrieve the secret using one of the Azure SDK patterns for managed identity authentication. In Python, you would use the DefaultAzureCredential class from the azure-identity library combined with the SecretClient class from the azure-keyvault-secrets library. DefaultAzureCredential checks multiple authentication methods in sequence, and when running in Azure, it finds the managed identity automatically. You reference the Key Vault URI, which is in the format https.
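Here is one way the two pieces fit together in Python: building the resource ID used as a secret-level role-assignment scope, and reading that secret through DefaultAzureCredential and SecretClient. It is a sketch, assuming the Azure public cloud (vault URLs of the form https://<vault-name>.vault.azure.net/) and that the azure-identity and azure-keyvault-secrets packages are installed wherever `make_secret_client` runs; they are imported lazily so the pure helpers work without them. The function names are hypothetical.

```python
"""Secret-scoped access plus managed-identity retrieval (sketch)."""


def secret_scope(subscription_id: str, resource_group: str, vault: str, secret: str) -> str:
    """ARM resource ID of one secret; use it as the role-assignment scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.KeyVault/vaults/{vault}/secrets/{secret}"
    )


def vault_url(vault: str) -> str:
    """Data-plane URL for a vault in the Azure public cloud."""
    return f"https://{vault}.vault.azure.net/"


def make_secret_client(vault: str):
    """SecretClient authenticated by DefaultAzureCredential.

    Inside Azure with a managed identity enabled, DefaultAzureCredential
    finds that identity, so no credential lives in code or config.
    """
    # Imported here so the helpers above work without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url(vault), credential=DefaultAzureCredential())


def read_secret(client, name: str) -> str:
    """Latest version of a secret, by name; keep the value in memory only."""
    return client.get_secret(name).value
```

Assuming a vault named orgname-kv-prod-01 and a secret named claude-api-key-prod (example names only), the scope string would be passed to `az role assignment create --role "Key Vault Secrets User" --assignee-object-id <managed-identity-principal-id> --assignee-principal-type ServicePrincipal --scope <scope>`, and the application would call `read_secret(make_secret_client("orgname-kv-prod-01"), "claude-api-key-prod")`. Because `get_secret` is called without a version, it returns the latest version.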
The SDK handles the token acquisition, the API call, and the secret retrieval. Your application code never sees a credential. It sees a function call that returns the secret value. For your Claude API key specifically, you store the key as a secret in Key Vault with a descriptive name like claude-api-key-prod. Your application, when it starts up or when it needs to make a call to Claude, retrieves the secret via the SDK. The key is held in memory for the duration of the operation and is never written to disk or to logs. If you are in a language other than Python, the pattern is the same: Azure SDK, managed identity authentication, secret retrieval by name.

Now let's talk about the things that make this deployment resilient rather than just functional. Soft delete is enabled by default on new key vaults in current Azure, but verify this. In the portal, under Properties, confirm soft delete is enabled and the retention period is set to ninety days. Then enable purge protection. Purge protection is a separate setting and it is not enabled by default. Find it in the same Properties section and turn it on. Once purge protection is enabled, it cannot be disabled. That is intentional. It is a one-way door that guarantees your secrets and your vault cannot be permanently destroyed during the retention period, regardless of who tries. Secret versioning is automatic in Key Vault. Every time you update a secret, a new version is created and the previous version is retained. Applications that reference a secret by name without specifying a version always get the latest version. This means secret rotation is simple from the application's perspective. You write the new value, Key Vault creates a new version, and applications automatically get the new value on their next retrieval. The old version remains accessible by version ID if you need to retrieve it during a transition period.
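Versioning is what makes a zero-downtime rotation workable: publish a new value, let consumers pick it up on their next read, and keep the old value available for rollback. This minimal sketch is written against the same `get_secret`/`set_secret` calls that SecretClient exposes. `InMemoryVault` is a hypothetical stand-in that mimics Key Vault versioning for testing, and `create_key` and `verify` are hypothetical hooks for whatever your AI provider offers; no provider API is shown or implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Secret:
    value: str
    version: str


@dataclass
class InMemoryVault:
    """Hypothetical test stand-in mimicking SecretClient's versioning behavior."""
    history: Dict[str, List[str]] = field(default_factory=dict)

    def set_secret(self, name: str, value: str) -> Secret:
        versions = self.history.setdefault(name, [])
        versions.append(value)  # every write creates a new version
        return Secret(value, str(len(versions)))

    def get_secret(self, name: str, version: Optional[str] = None) -> Secret:
        versions = self.history[name]
        if version is None:  # no version requested: latest wins
            return Secret(versions[-1], str(len(versions)))
        return Secret(versions[int(version) - 1], version)


def rotate(vault, name: str, create_key: Callable[[], str],
           verify: Callable[[str], bool]) -> Tuple[str, str]:
    """Publish a new key while the old one stays valid; returns (old, new)."""
    old = vault.get_secret(name).value
    new = create_key()             # provider issues a second, concurrently valid key
    if not verify(new):            # e.g. one cheap authenticated call to the provider
        raise RuntimeError("new key failed verification; vault left unchanged")
    vault.set_secret(name, new)    # consumers reading by name get it on next retrieval
    return old, new


def rollback(vault, name: str, old: str) -> None:
    """Re-publish the previous value as the newest version."""
    vault.set_secret(name, old)
```

Retiring the old key at the provider is deliberately not part of `rotate`: running instances may still hold it in memory, so revocation belongs in a separate step after a grace period. This only works for providers that allow two live keys at once.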
For your AI credentials, document the rotation plan for Claude API keys, ElevenLabs keys, and any other AI service credentials. What is the trigger for rotation? Who performs the rotation? What is the process? And what is the rollback if the new key does not work? For some services, you can generate a new key before revoking the old one, which allows zero-downtime rotation. Understand whether each of your AI service providers supports overlapping keys, and design your rotation process accordingly. Test this process before you need it under pressure.

Diagnostic logging is not optional. Let me tell you exactly what to configure. In the Key Vault portal, navigate to Diagnostic settings and add a diagnostic setting. You want to capture the AuditEvent log category, which is where every access operation is recorded. Send it to a Log Analytics workspace. If you have a Log Analytics workspace that feeds your SIEM, use that one. If you do not have one, create one in the same subscription and document its location. Once logs are flowing, verify with a query. In Log Analytics, the table is called AzureDiagnostics, and Key Vault entries have a ResourceType of VAULTS. A basic query to verify logging is working looks like this: AzureDiagnostics | where ResourceType == "VAULTS" | sort by TimeGenerated desc. You should see entries appearing within a few minutes of any Key Vault operations. The queries that matter for CISO oversight are who accessed a specific secret in a date range, which identities have accessed the vault, and which access attempts were denied. Build those queries, save them, and run them on a schedule. If you are in Microsoft Sentinel, create an analytics rule that alerts on access to high-sensitivity secrets from unexpected identities. That is your real-time detection story for Key Vault. Log Analytics retention defaults are short, typically thirty days for a pay-as-you-go workspace. For a security workload, set retention to at least one year.
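The three oversight questions map onto simple filters over audit records. This Python sketch keeps the verification query as KQL text and answers the questions over records you have already exported. The `AuditEvent` field names are simplified, hypothetical stand-ins; map them from the AzureDiagnostics columns your workspace actually returns. Treating HTTP 403 as "denied" is an assumption to confirm against your own logs; Key Vault clients commonly receive a 401 as part of the normal authentication challenge, so 401s alone are not treated as denials here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set

# The verification query from the episode, as KQL for Log Analytics.
VERIFY_QUERY = (
    "AzureDiagnostics\n"
    '| where ResourceType == "VAULTS"\n'
    "| sort by TimeGenerated desc"
)


@dataclass(frozen=True)
class AuditEvent:
    """Simplified audit record; hypothetical fields mapped from exported rows."""
    time: datetime
    identity: str   # caller object ID or application ID
    operation: str  # e.g. "SecretGet"
    secret: str     # secret name, or "" for vault-level operations
    status: int     # HTTP status code returned to the caller


def _succeeded(e: AuditEvent) -> bool:
    return 200 <= e.status < 300


def who_read(events: Iterable[AuditEvent], secret: str,
             start: datetime, end: datetime) -> Set[str]:
    """Identities that successfully read `secret` between start and end (inclusive)."""
    return {
        e.identity for e in events
        if e.operation == "SecretGet" and e.secret == secret
        and _succeeded(e) and start <= e.time <= end
    }


def identities(events: Iterable[AuditEvent]) -> Set[str]:
    """Every identity that touched the vault, successfully or not."""
    return {e.identity for e in events}


def denied(events: Iterable[AuditEvent]) -> List[AuditEvent]:
    """Requests refused for lack of authorization (403 assumed to mean denied)."""
    return [e for e in events if e.status == 403]


def unexpected_readers(events: Iterable[AuditEvent], secret: str,
                       allowed: Set[str]) -> Set[str]:
    """Successful readers of `secret` outside an allow-list, Sentinel-rule style."""
    return {
        e.identity for e in events
        if e.operation == "SecretGet" and e.secret == secret
        and _succeeded(e) and e.identity not in allowed
    }
```

`unexpected_readers` is the offline analogue of the Sentinel analytics rule described above: an allow-list per high-sensitivity secret, and an alert for any successful read from outside it.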
Compliance requirements may dictate longer.

Network access control. Here is the architecture decision and why it matters. By default, Key Vault accepts traffic from any IP address on the public internet, with every request still authenticated through Entra ID. For most use cases, this is acceptable. For a security organization building AI infrastructure on top of secrets management, you should go further. The right architecture for production is a private endpoint. Creating a private endpoint for Key Vault provisions a private IP address within your virtual network that maps to the vault. You then configure the Key Vault firewall to deny all public access and allow only traffic from trusted virtual networks. All traffic between your applications and Key Vault travels over the Microsoft backbone network, never touching the public internet. The operational implication is that any system that needs to access Key Vault must be within your virtual network or connected to it via VPN or ExpressRoute. For your AI application infrastructure, which presumably runs on Azure compute resources within your network, this is straightforward. For development access from developer workstations, you either put those workstations on a VPN that connects to the network, or you maintain a separate development vault with less restrictive network access and keep production locked down. If a private endpoint is not currently in scope for this deployment, at minimum configure the Key Vault firewall to restrict access to known IP ranges. That is better than fully open, but plan the migration to a private endpoint, because it is the right long-term architecture and it is easier to implement before you have 40 applications depending on the vault than after.

Infrastructure as code. I want to raise this because the difference between a Key Vault deployment that is maintainable and auditable and one that is not is often whether the deployment is defined in code or clicked together in the portal. Bicep or Terraform, either works.
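Several of the settings covered so far (permission model, soft delete, purge protection, network exposure) are visible in the vault's own ARM properties, so they can be checked in the same pipeline that deploys the Bicep or Terraform. A minimal sketch, assuming the JSON shape returned by `az keyvault show --name <vault>`: a `properties` object with `enableRbacAuthorization`, `accessPolicies`, `enableSoftDelete`, `softDeleteRetentionInDays`, `enablePurgeProtection`, `publicNetworkAccess`, and `networkAcls.defaultAction`. The function name and finding messages are hypothetical.

```python
from typing import Any, Dict, List


def posture_findings(vault: Dict[str, Any]) -> List[str]:
    """Compare `az keyvault show` JSON with parts of the episode's guidance."""
    props = vault.get("properties") or {}
    findings: List[str] = []

    # Permission model: RBAC only, with no leftover access policies.
    if props.get("enableRbacAuthorization") is not True:
        findings.append("permission model is not Azure RBAC")
    elif props.get("accessPolicies"):
        findings.append("RBAC is on but legacy access policies remain")

    # Recoverability: soft delete at 90 days, plus purge protection.
    if props.get("enableSoftDelete") is False:
        findings.append("soft delete is disabled")
    if props.get("softDeleteRetentionInDays") != 90:
        findings.append("soft-delete retention is not 90 days")
    if props.get("enablePurgeProtection") is not True:
        findings.append("purge protection is not enabled")

    # Network: private-endpoint-only, or a firewall that denies by default.
    public_off = props.get("publicNetworkAccess") == "Disabled"
    acl_deny = (props.get("networkAcls") or {}).get("defaultAction") == "Deny"
    if not (public_off or acl_deny):
        findings.append("vault accepts traffic from the public internet")

    return findings
```

In a pipeline, the parsed output of `az keyvault show --name orgname-kv-prod-01 -o json` would be passed to `posture_findings`, and a non-empty list would fail the run. Either a disabled public endpoint or a default-deny firewall passes the network check, matching the private-endpoint-or-firewall minimum described above.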
The value is that your Key Vault configuration is in version control, it can be peer reviewed, it can be deployed consistently across environments, and changes to it are tracked. If someone changes the network access policy, that change is a pull request. If someone adds a new RBAC assignment, that is a commit. The infrastructure is auditable the same way your application code is auditable. For a security infrastructure component like Key Vault, the CISO should ask: Is this defined in infrastructure as code? Is that code in version control, and who approves changes to it? If the answer is that it was clicked together in the portal and there is no code representation, that is technical debt worth addressing before the deployment grows.

Now, let's talk about expanding beyond AI credentials, because you said this is where you are going. The organizational secrets that benefit most from Key Vault are database connection strings and passwords, service account credentials for integrations with SaaS platforms and on-premises systems, application registration secrets for Entra ID app registrations, and encryption keys for data at rest. For database credentials, the pattern is the same as for AI credentials. The application retrieves the connection string from Key Vault at startup. The database credential is never in a config file. You rotate it by updating the secret in Key Vault and restarting the application or triggering a configuration refresh. For encryption keys, this is where Key Vault Premium earns its cost more directly. You create RSA or EC keys in Key Vault and use them for envelope encryption. Generate a data encryption key for each dataset or storage resource, encrypt that data encryption key with the Key Vault key, and store the encrypted data encryption key alongside the data. The Key Vault key never leaves the vault. Decryption requires calling Key Vault to unwrap the data encryption key, which is logged and access controlled.
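The envelope pattern is easiest to see with the two key layers separated. This Python sketch assumes a Key Vault RSA key used through the azure-keyvault-keys `CryptographyClient` (`wrap_key` and `unwrap_key` with RSA-OAEP-256) and AES-256-GCM from the `cryptography` package for the data itself; both are imported lazily. The names `seal`, `open_envelope`, `KeyVaultWrapper`, and `AesGcmCipher` are hypothetical, and the wrapper and cipher are passed in so the flow can be exercised without Azure access.

```python
import os
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Envelope:
    wrapped_dek: bytes  # data key encrypted under the Key Vault key; stored with the data
    nonce: bytes        # per-message nonce for the data cipher
    ciphertext: bytes


class KeyWrapper(Protocol):
    def wrap(self, dek: bytes) -> bytes: ...
    def unwrap(self, wrapped: bytes) -> bytes: ...


class DataCipher(Protocol):
    def encrypt(self, key: bytes, nonce: bytes, plaintext: bytes) -> bytes: ...
    def decrypt(self, key: bytes, nonce: bytes, ciphertext: bytes) -> bytes: ...


def seal(plaintext: bytes, wrapper: KeyWrapper, cipher: DataCipher) -> Envelope:
    """Encrypt with a fresh data key, then keep only the wrapped copy of that key."""
    dek = os.urandom(32)    # new 256-bit data encryption key
    nonce = os.urandom(12)  # 96-bit nonce, the usual size for AES-GCM
    ciphertext = cipher.encrypt(dek, nonce, plaintext)
    return Envelope(wrapper.wrap(dek), nonce, ciphertext)


def open_envelope(env: Envelope, wrapper: KeyWrapper, cipher: DataCipher) -> bytes:
    """Unwrap the data key (a Key Vault call), then decrypt locally."""
    dek = wrapper.unwrap(env.wrapped_dek)
    return cipher.decrypt(dek, env.nonce, env.ciphertext)


class KeyVaultWrapper:
    """Wraps data keys with a Key Vault RSA key; the private key never leaves the vault."""

    def __init__(self, key_id: str):
        from azure.identity import DefaultAzureCredential
        from azure.keyvault.keys.crypto import CryptographyClient, KeyWrapAlgorithm

        self._client = CryptographyClient(key_id, credential=DefaultAzureCredential())
        self._alg = KeyWrapAlgorithm.rsa_oaep_256

    def wrap(self, dek: bytes) -> bytes:
        return self._client.wrap_key(self._alg, dek).encrypted_key

    def unwrap(self, wrapped: bytes) -> bytes:
        return self._client.unwrap_key(self._alg, wrapped).key


class AesGcmCipher:
    """AES-256-GCM from the `cryptography` package."""

    def encrypt(self, key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        return AESGCM(key).encrypt(nonce, plaintext, None)

    def decrypt(self, key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM
        return AESGCM(key).decrypt(nonce, ciphertext, None)
```

In use, this would look something like `seal(data, KeyVaultWrapper(key_id), AesGcmCipher())`, where `key_id` is the Key Vault key's identifier URL. Only the wrapped key, nonce, and ciphertext are persisted, so recovering the data always requires an unwrap call to Key Vault, which is what puts decryption behind RBAC and the audit log.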
Azure Storage, Azure SQL, and most Azure data services support customer-managed keys using Key Vault directly, which means the encryption key management is handled at the platform level with Key Vault as the key store. For those services, enabling customer-managed keys is a configuration change rather than custom code. For certificate management, Key Vault can manage the full certificate lifecycle: generation, renewal, and storage. If your certificates are currently renewed manually and stored in files on servers, migrating them to Key Vault gives you centralized visibility, automatic renewal through certificate authority integrations, and consistent access control. Azure services that consume certificates, like App Service and Application Gateway, have native Key Vault integration.

Let me close with the common misconfigurations. These are the things that show up in Key Vault deployments that have been done quickly. The first is vault access policies remaining as the permission model after a partial migration to RBAC. Sometimes a deployment starts with access policies, someone starts migrating to RBAC, and the vault ends up in a hybrid state with both models partially active. This is confusing and hard to audit. Commit to one model. It should be RBAC. The second is overly broad RBAC assignments: Key Vault Secrets User assigned at the vault scope rather than the secret scope. This gives applications access to all secrets rather than the specific secrets they need. Assigning at the secret scope is harder to configure initially, but much better from a least privilege standpoint. The third is diagnostic logs configured but not verified. The setting exists, something is shipping somewhere, but nobody has run a query to confirm the logs are arriving and queryable. Verify the logs, run the query, and confirm the audit trail exists before you rely on it. The fourth is no rotation process.
Secrets are loaded, applications run, and nobody thinks about rotation until a security incident or a key compromise makes it urgent. Document the rotation process for each secret in the vault before you need to use it under pressure. The fifth is leaving the vault open to public internet traffic when a private endpoint is achievable. This is not a critical misconfiguration if authentication is properly configured, but it is an unnecessary exposure for a security infrastructure component. The sixth is not reviewing the owner and contributor list. Key Vault administrators with owner-level access can change access policies, export keys, and modify firewall rules. This list should be small, reviewed quarterly, and tied to named individuals rather than shared service accounts.

Here is the sign-off checklist for part two. This is what done looks like. The Key Vault is in a dedicated security infrastructure resource group. Azure RBAC is the permission model. RBAC assignments for applications are scoped to specific secrets, not to the vault. Applications authenticate using managed identities. Purge protection is enabled. Diagnostic logs are shipping to Log Analytics and have been verified with a query. Network access is restricted via firewall rules or a private endpoint. The deployment is defined in infrastructure as code, and the code is in version control. Secret rotation procedures are documented and tested for every secret in the vault. The owner and contributor list is reviewed and contains only named, justified individuals. If you are handing this off to someone to build, that checklist is your acceptance criteria. If you are building it yourself, that checklist is your definition of done.

The reason this matters, and I want to close on this, is that AI service credentials are a new category of high-value secret, and the industry has not yet developed mature practices around them.
The API key to your Claude deployment, your ElevenLabs production voice account, your AI infrastructure in general: these are credentials that control capabilities that are increasingly central to how organizations operate. Compromising them is not just a financial cost; it is access to the intelligence layer of your operation. Treating them with the same rigor you would apply to database administrator credentials or signing certificates is not overcautious. It is appropriate to the value of what they protect. You are building this now, before it is urgent, which is the right time to build it. The architecture we have covered today will serve you well as the footprint grows. Start with the AI credentials, expand methodically, and make sure the eight questions from part one are answered in writing before you call this deployment complete. This is Dark Perimeter. I'm Cole Drayden. Build it right.