Dark Perimeter: True Cybersecurity Stories

Keys to the Kingdom, Part One: The CISO's Guide to Managing Your Azure Key Vault Deployment

Most organizations building on AI infrastructure right now are handling their API keys badly — not because their people are careless, but because the default patterns of software development are not secure patterns. In Part One of this two-part series, Cole Drayden breaks down Azure Key Vault Premium from the CISO management perspective: what it is and what problem it solves, why the access control model matters more than most deployments get right, what managed identities actually mean in practice, and the eight oversight questions every security leader should get answered before signing off on this deployment. Whether you have a team deploying this or you are the one deploying it, this is the frame that separates a Key Vault deployment that is genuinely secure from one that looks right on a diagram. Dark Perimeter: Security, AI, and the Edge of What's Coming.

SPEAKER_00

You have a guy deploying Azure Key Vault, or maybe you have a team, or maybe you are the person doing it and you report to someone who needs to understand what you're building. Either way, you are the CISO or the security manager or the person responsible for signing off on this thing when it's done. And the question you should be asking right now, before a single resource gets provisioned, is not "How does this work technically?" The question is "What does done right look like, and how will I know if I'm looking at it?" That is what this episode is about.

Welcome to Dark Perimeter. I'm Cole Drayden. Today we are talking about Azure Key Vault Premium, secrets management, and the specific challenge of securing AI service credentials in an organization that is starting to build on top of AI infrastructure. We are going to do this in two parts. This first episode is the CISO frame: the management perspective, the oversight questions, the things you need to understand to know whether your deployment is secure without having to be the one doing the deployment. Part two goes deeper into the architecture, the specific configuration decisions, and how to expand this beyond AI credentials to secure the broader organization.

Let's start with the problem you're actually solving. Most organizations that are building on top of AI services right now are handling their API keys badly, not because their people are careless, but because the default patterns of software development are not secure patterns and nobody stops to change them until something goes wrong.

Here is what bad looks like. A developer is connecting your application to Claude or to ElevenLabs or to whatever AI service is powering your workflow. They need an API key. They put the key in a configuration file. Or they put it in a .env file that lives in the project directory. Or they hard-code it in the application, which is worse. Or they store it in a shared spreadsheet, which is worse still. The key is functional.
The application works. Everyone moves on.

The problem is that the key is now in multiple places, none of them controlled, none of them audited, and none of them rotatable without touching every system that uses it. If a developer leaves the organization, the key goes with them in their memory and possibly in their local files. If the repository gets pushed to GitHub without the right exclusions, the key is public. If the configuration file is on a server that gets compromised, the key is compromised. And when you ask the question "Who has access to this key right now?", the honest answer is "We don't know."

Azure Key Vault solves this problem. It is a centralized, access-controlled, audited secret store. You put your secrets in the vault. Applications retrieve them at runtime using managed identities, which means no human ever holds the secret, no secret ever lives in a config file, and every access is logged. When you need to rotate the key, you rotate it in one place, and every application that uses it picks up the new value the next time it retrieves the secret. That is the promise. The question is whether your deployment is actually delivering it.

Let me give you the conceptual model you need to have in your head as a CISO before you start asking questions about your deployment. Azure Key Vault stores three types of things: secrets, which are arbitrary strings like API keys, connection strings, and passwords; keys, which are cryptographic keys used for encryption and signing operations; and certificates, which are X.509 certificates managed through their full lifecycle. For the AI credentials use case you are starting with, you are primarily dealing with secrets. As you expand to broader organizational security, you will add keys for encryption and potentially certificates for application authentication.

You chose the Premium tier, which is the right choice for a security-conscious organization.
The difference between Standard and Premium is that Premium supports hardware security modules, which are physical devices that provide an additional layer of protection for cryptographic key material. In Premium, you can flag certain keys as HSM-protected, which means the key material never leaves the hardware boundary in unencrypted form. For your AI credential use case, the secrets themselves are not cryptographic keys, so HSM protection is not directly relevant to the API keys. But Premium gives you the option as you add encryption key management to your scope, and it is better to be on Premium now than to have to migrate later.

Now, let's talk about access control, because this is where most deployments go wrong. Azure Key Vault has two access control models. The old model is called vault access policies. The new model is called Azure role-based access control, which everyone calls RBAC. If your implementer is deploying with vault access policies as the primary access model, that is a flag worth raising.

Here is why. Vault access policies are vault-scoped. You assign permissions at the level of the entire vault, not at the level of individual secrets. This means if you grant an application permission to read secrets from the vault, it can read all secrets from the vault. There is no native way to say "this application can read this specific secret and nothing else." This is a problem as your vault grows and contains secrets belonging to different systems, different teams, and different sensitivity levels.

Azure RBAC solves this. With RBAC, you can assign roles at the vault level or at the level of an individual secret, key, or certificate. You can give an application read access to exactly the secrets it needs and nothing else. You can give a developer read access to development secrets without giving them access to production secrets. You can give your security operations team read access to audit logs without giving them access to secret values.
The model is granular, consistent with how access control works everywhere else in Azure, and auditable through the same tooling you use for everything else. Your question for your implementer: are we using Azure RBAC for the Key Vault access control model? If the answer is no, ask why and push for a good reason.

The second major concept you need to understand is managed identities. This is the thing that makes secrets management actually work at scale. In the old world, you would give an application access to the key vault by creating a service principal, generating a client ID and client secret for that service principal, and storing those credentials somewhere the application could read them. But now you have a new secret to protect: the credentials the application uses to authenticate to the vault that holds your secrets. This is the authentication bootstrapping problem, and it is a real problem that defeats the purpose of secrets management if you don't solve it.

Managed identities solve it. An Azure managed identity is an identity automatically managed by Azure Active Directory (now Microsoft Entra ID) and associated with an Azure resource. You enable a managed identity on your application's hosting resource, whether that is an app service, a virtual machine, a container, a function, or whatever you are running on. You then grant that managed identity access to Key Vault using RBAC. When the application needs to retrieve a secret, it authenticates to Key Vault using its managed identity, which is handled automatically by the Azure platform with no credentials that any human needs to manage or store.

The result is that no human ever handles the credentials the application uses to authenticate to the vault. The application gets its secrets from the vault at runtime. The secrets are never in config files. The authentication credentials are never in config files. The entire chain is managed identities and Key Vault, end to end.
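As a rough illustration of the two ideas above (secret-level RBAC scope and managed identity retrieval), here is a minimal Python sketch. It is not taken from the episode. It assumes the `azure-identity` and `azure-keyvault-secrets` packages for the live call, and the vault name, secret name, subscription ID, and resource group are hypothetical placeholders. The helper names `secret_scope`, `build_client`, and `read_secret` are made up for this example.

```python
def secret_scope(subscription_id: str, resource_group: str, vault: str, secret: str) -> str:
    """Build the Azure resource ID of ONE secret.

    A role such as the built-in "Key Vault Secrets User" can be assigned at
    this scope so an identity can read this secret and nothing else.
    """
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.KeyVault/vaults/{vault}"
        f"/secrets/{secret}"
    )


def build_client(vault_name: str):
    """Create a Key Vault client that authenticates with the host's managed identity."""
    # Imported lazily so the rest of this sketch runs without the Azure SDK installed.
    from azure.identity import ManagedIdentityCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(
        vault_url=f"https://{vault_name}.vault.azure.net",
        credential=ManagedIdentityCredential(),  # no client secret stored anywhere
    )


def read_secret(client, name: str) -> str:
    """Fetch the current value of a secret at runtime."""
    return client.get_secret(name).value


if __name__ == "__main__":
    # Hypothetical names; this only works on an Azure resource with a managed identity.
    client = build_client("example-ai-vault")
    api_key = read_secret(client, "claude-api-key")
    print("retrieved a key of length", len(api_key))
```

Nothing in this sketch is a stored credential: the only inputs are names. That is the property to look for when reviewing an implementer's code.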
Your question for your implementer: are our applications authenticating to Key Vault using managed identities, or are we using service principals with stored credentials? If the answer is service principals with stored credentials, ask where those credentials are stored and how they are rotated. That answer will tell you a lot.

Let's talk about what you should be able to see, because a secrets management deployment that you cannot audit is not actually a security control. It is theater. Azure Key Vault has comprehensive diagnostic logging. Every secret read, every secret write, every authentication attempt, and every access denial is logged. The logs include the identity that made the request, the operation they performed, the time, the result, and the specific secret they accessed. This is the audit trail that lets you answer the question: who accessed this secret, when, and from where?

For that logging to be useful, it has to go somewhere. The logs need to be shipped to a destination where you can query them and where they are retained for the period your compliance posture requires. In Azure, that destination is typically Log Analytics, which feeds your Microsoft Sentinel SIEM if you have one, or your SIEM of choice via the appropriate connector. If you are not on Sentinel, you need to understand where the Key Vault diagnostic logs are going and verify that they are actually arriving.

Your question for your implementer: where are Key Vault diagnostic logs shipping, and can you show me a sample query that demonstrates we can answer the question of who accessed a specific secret in the last thirty days? If they cannot answer that question with a working query, the logging is not configured correctly or is not going anywhere useful, and that is a gap.

Two more things you need to understand as CISO before I give you the oversight checklist. First, soft delete and purge protection. By default in current Azure deployments, soft delete is enabled on Key Vault.
This means that when a secret or a vault is deleted, it is not immediately gone. It is moved to a soft-deleted state and retained for a configurable period, typically ninety days, during which it can be recovered. Purge protection goes one step further. It prevents anyone, including vault owners, from permanently deleting a secret or vault during the soft delete retention period. Even someone with owner-level permissions cannot bypass this.

For a CISO, purge protection is important because it protects you against a specific attack scenario: an attacker who gains elevated access and attempts to destroy evidence or sabotage operations by deleting secrets. It also protects against insider threat and against accidents. Your implementer should have purge protection enabled.

Second, network access. By default, Azure Key Vault is accessible over the public internet with authentication. For most organizations, that is fine because authentication is the control. But for a security-conscious organization that is building AI infrastructure, you should be thinking about whether Key Vault should be restricted to private network access only. Azure private endpoints allow you to put a private IP address for Key Vault inside your virtual network. Traffic between your applications and Key Vault then travels over the private Microsoft backbone network rather than the public internet. You can also configure the Key Vault firewall to deny all public internet access and allow only traffic from specific virtual networks or IP ranges.

This is not strictly required, but it is defense in depth. If Key Vault is only accessible from within your private network, an attacker who compromises a credential that is authorized on Key Vault still needs network access to your private network to use it. That is an additional layer of protection worth having for a production security deployment.

Let me give you the CISO oversight checklist.
These are the eight questions you should get answered before you sign off on this deployment.

One, is Azure RBAC enabled as the Key Vault permission model? Not vault access policies. Azure RBAC.

Two, are applications authenticating to Key Vault using managed identities, with no service principal credentials stored anywhere?

Three, is the principle of least privilege applied to every RBAC assignment? Each application has access only to the specific secrets it needs, not to the entire vault.

Four, is soft delete enabled with a retention period of at least ninety days, and is purge protection on?

Five, are diagnostic logs shipping to a destination you can query, and can the implementer demonstrate a working query that shows who accessed a specific secret in the last thirty days?

Six, is there a network access control policy, whether that is a private endpoint, firewall rules, or a documented decision to use public access with authentication only?

Seven, are secrets versioned, and is there a documented rotation plan? For AI service credentials specifically, what is the process when a key needs to be rotated? How long does it take, and is it tested?

Eight, who are the Key Vault contributors and owners in Azure RBAC? This list should be small, documented, and reviewed. The principle of least privilege applies to the humans who manage the vault as much as to the applications that consume it.

Get written answers to those eight questions. If any of them produce vague or incomplete answers, that is where to focus your follow-up.

One last thing before we close part one, and this is the thing I want you to sit with. The reason you are building this is not just to secure your AI credentials. It is to build the habit and the infrastructure of secrets management across your organization. AI credentials are the trigger because AI is new, the API keys are powerful, and nobody has a mature process for them yet. But the vault you are building today is the vault you will use for database connection strings, for service account passwords, for encryption keys, for certificate management. You are building the foundation.
That means the decisions your implementer makes today about access control models, about logging destinations, about network access, and about naming and organizational structure are decisions that will be harder to change later, when the vault is full and a dozen applications depend on it. The right time to get the architecture right is before you have built everything on top of it.

Part two of this series goes into that architecture in detail. We will cover the specific configuration decisions, the naming conventions that matter, the RBAC role assignments you need, how to structure secrets for the AI credential use case and then expand it, what secret rotation looks like in practice, and the common misconfigurations that show up in Key Vault deployments that have been done quickly rather than carefully. If you're the implementer listening to this, part two is for you. If you're the CISO, part two will make you a more informed reviewer of the work your team brings you.

This is Dark Perimeter. I'm Cole Drayden. Stay sharp.
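A few of the eight checklist questions (one, four, and six) can be spot-checked from the vault's own configuration. The sketch below is an illustration and not something described in the episode. It assumes you have the vault's `properties` object as a Python dict, for example parsed from the JSON that `az keyvault show` returns, and the function name `review_vault` is hypothetical. The other questions (managed identities, least privilege, logging, rotation, ownership) still need written answers from the implementer.

```python
def review_vault(props: dict) -> list[str]:
    """Return checklist findings for one vault's `properties` dict.

    An empty list means nothing was flagged for questions 1, 4, and 6.
    """
    findings = []

    # Q1: Azure RBAC rather than vault access policies.
    if props.get("enableRbacAuthorization") is not True:
        findings.append("Q1: permission model is not Azure RBAC")

    # Q4: soft delete on, retention of at least 90 days, purge protection on.
    if props.get("enableSoftDelete") is False:
        findings.append("Q4: soft delete is disabled")
    retention = props.get("softDeleteRetentionInDays") or 90  # unset means the 90-day default
    if retention < 90:
        findings.append("Q4: soft delete retention is under 90 days")
    if props.get("enablePurgeProtection") is not True:
        findings.append("Q4: purge protection is off")

    # Q6: public access with no firewall default-deny needs a documented decision.
    acls = props.get("networkAcls") or {}
    public_enabled = props.get("publicNetworkAccess", "Enabled") == "Enabled"
    default_allow = acls.get("defaultAction", "Allow") == "Allow"
    if public_enabled and default_allow:
        findings.append("Q6: vault is reachable from the public internet; confirm this is a documented decision")

    return findings


if __name__ == "__main__":
    example = {"enableRbacAuthorization": False, "softDeleteRetentionInDays": 7}
    for finding in review_vault(example):
        print(finding)
```

A clean result here is not a sign-off. It only confirms that the settings match what the implementer says they configured.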