Cloud security moved fast in 2024. In 2025 it is moving faster, driven by pervasive SaaS, platform engineering, AI adoption, and an increasingly capable threat landscape. The cost of getting it wrong has never been higher. According to the IBM Cost of a Data Breach 2024 report, the global average cost of a data breach rose 10% year on year to USD 4.88 million, underscoring why cloud controls must be measurable, automated, and continuously verified. This field-tested checklist is designed for CTOs, CISOs, heads of platform, and security engineering teams who want a pragmatic way to harden cloud environments without slowing delivery.

How to use this checklist
- Prioritise risk. Start with high-value data, internet-facing workloads, and production environments.
- Automate evidence. Bake verification into CI/CD and platform guardrails so controls stay enforced.
- Measure outcomes. Track a small set of security SLOs, for example mean time to detect and restore, patch latency, and percentage of resources covered by critical controls.
1) Governance and shared responsibility
Cloud providers secure the cloud; your team secures what runs in it. Establish a clear operating model and guardrails that teams cannot accidentally bypass. Treat your landing zone, accounts or subscriptions, policies, and identity boundary as first-class platform products.
Minimum checks for 2025:
- Operate a multi-account or multi-subscription model with environment separation and least-privilege access between them.
- Enforce policy as code, for example AWS SCPs, Azure Policy, or GCP Org Policy, version controlled and tested.
- Standardise tagging, budgets, and cost alerts to align FinOps and security from day one; see our guidance on cost optimisation with governance.
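
As a rough illustration of the tagging check above expressed as code, the sketch below flags resources that lack required tags. The tag names, the `Resource` shape, and the function names are assumptions made for this example, not any provider's API; in practice the same rule would usually live in AWS SCPs, Azure Policy, or GCP Org Policy.

```python
from dataclasses import dataclass, field

# Hypothetical tagging standard; replace with your organisation's own keys.
REQUIRED_TAGS = frozenset({"owner", "environment", "cost-centre", "data-classification"})


@dataclass
class Resource:
    resource_id: str
    tags: dict[str, str] = field(default_factory=dict)


def missing_tags(resource: Resource, required: frozenset = REQUIRED_TAGS) -> frozenset:
    """Return required tag keys that are absent or blank on the resource."""
    # Tag keys are compared case-insensitively; blank values count as missing.
    present = {key.lower() for key, value in resource.tags.items() if value.strip()}
    return required - present


def evaluate(resources: list[Resource]) -> dict[str, frozenset]:
    """Map each non-compliant resource ID to the tags it is missing."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource.resource_id] = gaps
    return report
```

Running a check like this in CI against planned infrastructure changes keeps the rule version controlled and testable alongside the rest of the platform code.
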
2) Identity and access management, the heart of zero trust
Stolen or leaked credentials are still the fastest path to compromise. Centralise identity, remove long-lived secrets, and make privilege time-bound and auditable. Adopt device and risk-aware access where possible.
Minimum checks for 2025:
- Enforce SSO with MFA for all humans and strong workload identity for services, and eliminate access keys at rest where possible.
- Implement least privilege with role-based and, where feasible, attribute-based access control, and enable just-in-time elevation.
- Review and remove dormant identities and permissions continuously; align with NCSC guidance on Zero Trust.
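
The least-privilege and dormant-identity checks above can be sketched as two small functions. This is a minimal example, assuming policies follow the AWS-style JSON shape (`Statement`, `Effect`, `Action`, `Resource`) and that you have already exported a map of key IDs to last-used timestamps; the 90-day idle threshold and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _as_list(value):
    """Policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def admin_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant every action on every resource."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(statement)
    return flagged


def dormant_keys(
    last_used: dict[str, Optional[datetime]],
    now: datetime,
    max_idle_days: int = 90,  # assumed threshold, tune to your policy
) -> list[str]:
    """Return key IDs that were never used or have been idle too long."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        key_id for key_id, used in last_used.items() if used is None or used < cutoff
    )
```

The wildcard check only catches the bluntest case; service-level wildcards such as `s3:*` need a CIEM tool or a more thorough policy analyser.
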
3) Network security and connectivity
Modern architectures reduce implicit trust. Prefer private-by-default networks, explicit egress controls, and managed ingress protections. Avoid exposing management planes to the internet.
Minimum checks for 2025:
- Use private subnets, service endpoints or PrivateLink equivalents, and restrict egress with egress gateways and DNS policies.
- Protect internet-facing services with a WAF, rate limiting, bot defences, and DDoS protections.
- Replace bastion hosts with cloud-native session managers and audit every administrative session.
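
To make the "no management planes on the internet" rule testable, a check along these lines can run over exported firewall or security-group rules. The `IngressRule` shape and the list of management ports are assumptions for illustration; real rule exports differ by provider.

```python
import ipaddress
from dataclasses import dataclass

# Assumed management ports: SSH, RDP, and the default Kubernetes API port.
MANAGEMENT_PORTS = {22, 3389, 6443}


@dataclass(frozen=True)
class IngressRule:
    cidr: str
    from_port: int
    to_port: int


def is_world_open(cidr: str) -> bool:
    """True when the CIDR covers the whole IPv4 or IPv6 address space."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def exposed_management_ports(rules: list[IngressRule]) -> set[int]:
    """Return management ports reachable from anywhere on the internet."""
    exposed = set()
    for rule in rules:
        if is_world_open(rule.cidr):
            exposed |= {
                port for port in MANAGEMENT_PORTS
                if rule.from_port <= port <= rule.to_port
            }
    return exposed
```
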
4) Data protection and key management
Data classification, encryption, and robust key management underpin compliance and trust. Aim for encryption everywhere, managed by a strict key lifecycle.
Minimum checks for 2025:
- Encrypt all data at rest and in transit, including backups and logs, using cloud KMS or HSM-backed keys with rotation policies.
- Separate duties for key administrators and data owners, and consider BYOK or HYOK for the most sensitive data.
- Begin post-quantum readiness by inventorying cryptography, prioritising critical paths, and planning hybrid key exchange transitions.
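
The rotation-policy item above lends itself to a simple age check. The sketch below assumes you can export each key's last rotation timestamp from your KMS; the 365-day limit and the function name are illustrative assumptions rather than a recommendation for every key type.

```python
from datetime import datetime, timedelta, timezone


def keys_overdue_for_rotation(
    last_rotated: dict[str, datetime],
    now: datetime,
    max_age_days: int = 365,  # assumed policy limit
) -> dict[str, int]:
    """Map each overdue key ID to its age in whole days."""
    overdue = {}
    for key_id, rotated_at in last_rotated.items():
        age_days = (now - rotated_at).days
        if age_days > max_age_days:
            overdue[key_id] = age_days
    return overdue
```
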
5) Secure build and software supply chain
Supply-chain attacks target your pipelines and dependencies. Secure the developer experience without reducing velocity, and make provenance inspectable.
Minimum checks for 2025:
- Enforce branch protection, mandatory code review, and signed commits and artefacts.
- Generate SBOMs for builds, scan dependencies and containers pre-deploy, and quarantine vulnerable artefacts.
- Isolate build runners and secrets, and store artefacts in private registries with immutable tags and retention policies.
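
One small building block for artefact integrity is digest pinning: record the SHA-256 of what the build produced and refuse to deploy anything that does not match. The sketch below shows only that step. It is not signature verification, which additionally proves who produced the artefact and would normally be handled by dedicated signing tooling; the function names are assumptions for this example.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the digest in the common 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_artefact(data: bytes, expected_digest: str) -> bool:
    """Compare the computed digest with the pinned one in constant time."""
    return hmac.compare_digest(sha256_digest(data), expected_digest)
```

A deploy step would call `verify_artefact` with the digest recorded at build time and fail closed when it returns `False`.
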
6) Workload hardening, containers and Kubernetes
Harden base images and nodes, constrain runtime privileges, and lock down cluster control planes. Policy as code is your friend.
Minimum checks for 2025:
- Apply CIS Benchmarks to operating systems, Kubernetes, and managed services.
- Run containers as non-root with read-only filesystems and drop unnecessary capabilities; build from minimal images (see our tips to ship secure Docker images).
- Enforce Kubernetes NetworkPolicies, strong RBAC, admission controls for image provenance, and runtime threat detection.
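
The non-root, read-only, and dropped-capabilities checks can be expressed against a parsed Pod manifest. This is a simplified sketch: it assumes the manifest is already loaded into a dict, looks only at `spec.containers` (not init or ephemeral containers), and treats anything other than dropping `ALL` capabilities as a finding, which is stricter than some teams require.

```python
def container_findings(pod: dict) -> dict[str, list[str]]:
    """Map each container name to the hardening issues found in its spec."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    findings = {}
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        issues = []
        # A container-level runAsNonRoot overrides the pod-level value when set.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            issues.append("may run as root")
        if not ctx.get("readOnlyRootFilesystem", False):
            issues.append("root filesystem is writable")
        dropped = (ctx.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            issues.append("capabilities not dropped")
        if issues:
            findings[container["name"]] = issues
    return findings
```

In a cluster, the same rules are better enforced at admission time; a script like this is mainly useful for auditing what is already running.
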
7) Secrets management and configuration
Secrets in code or environment variables are still common findings. Centralise secrets, rotate them automatically, and adopt short-lived credentials.
Minimum checks for 2025:
- Store secrets in a dedicated secrets manager and encrypt application configuration; never commit secrets to Git.
- Integrate secret scanning into CI and pre-commit checks, and revoke leaked credentials immediately.
- Use external secrets in Kubernetes and dynamic credentials for databases and queues.
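
A pre-commit secret scan can be as simple as a handful of regular expressions run over staged files. The patterns below are a small illustrative set (the well-known `AKIA` prefix for AWS access key IDs, PEM private key headers, and quoted password assignments); dedicated scanners ship far larger rule sets and entropy checks, so treat this as a sketch of the idea only.

```python
import re

# Illustrative patterns; a real scanner needs many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspected secret."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits
```

A hook would exit non-zero whenever `scan_text` returns any hits, blocking the commit until the secret is removed and rotated.
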
8) Observability, logging and threat detection
You cannot defend what you cannot see. Centralise telemetry, correlate signals, and route actionable alerts to on-call.
Minimum checks for 2025:
- Ship audit logs, platform logs, metrics, traces, and security findings to a central plane with long-term retention.
- Enable managed threat detection services and integrate with your SIEM and SOAR.
- Maintain service-level and security runbooks and test alerting paths; see our multi-level monitoring approach.
9) Vulnerability and misconfiguration management
Attackers chain minor misconfigurations into major incidents. Treat vulnerability management as continuous hygiene across hosts, containers, serverless, and PaaS.
Minimum checks for 2025:
- Continuously scan for CVEs in images and hosts and apply patch SLAs based on severity and exploitability.
- Run CSPM and CIEM to detect misconfigurations and privilege issues across accounts and regions.
- Validate internet exposure, public data access, and overly permissive network or identity policies.
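
Patch SLAs based on severity and exploitability can be encoded directly, which makes breaches of the SLA easy to report. The SLA values, the two-day cap for known-exploited issues, and the `Finding` shape are all assumptions for this sketch; the identifiers in the usage are placeholders, not real CVEs.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical patch SLAs in days; set these from your own risk appetite.
PATCH_SLA_DAYS = {"critical": 7, "high": 14, "medium": 30, "low": 90}
EXPLOITED_SLA_DAYS = 2  # assumed cap when exploitation is known in the wild


@dataclass
class Finding:
    cve_id: str
    severity: str
    detected_on: date
    known_exploited: bool = False


def sla_days(finding: Finding) -> int:
    """Severity sets the baseline; known exploitation tightens it."""
    baseline = PATCH_SLA_DAYS[finding.severity]
    return min(baseline, EXPLOITED_SLA_DAYS) if finding.known_exploited else baseline


def overdue(findings: list[Finding], today: date) -> list[str]:
    """Return IDs of findings that have been open longer than their SLA."""
    return [
        f.cve_id for f in findings
        if (today - f.detected_on).days > sla_days(f)
    ]
```
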
10) Resilience, backup and disaster recovery
Security includes availability. Design for failure, protect backups from tampering, and rehearse recovery so you can restore under pressure.
Minimum checks for 2025:
- Backups are immutable, encrypted, cross-account and cross-region, with regular automated restore tests.
- Define RTO and RPO per service and align architecture patterns accordingly, for example warm standby versus active-active.
- Run incident game days and chaos experiments; for AWS patterns and trade-offs see our guide to resilience and disaster recovery on AWS.
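
Restore tests are most useful when their results are compared with the declared targets. The sketch below computes how far an observed recovery overran its RTO and RPO; the dataclass shapes and field names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTarget:
    service: str
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


@dataclass
class RestoreTest:
    service: str
    time_to_restore: timedelta
    data_loss_window: timedelta


def recovery_drift(target: RecoveryTarget, test: RestoreTest) -> dict[str, timedelta]:
    """Return how far the observed recovery overran each target (zero if met)."""
    zero = timedelta(0)
    return {
        "rto_overrun": max(test.time_to_restore - target.rto, zero),
        "rpo_overrun": max(test.data_loss_window - target.rpo, zero),
    }
```
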
11) AI and SaaS security controls
Generative AI and pervasive SaaS amplify productivity and risk. Treat prompts and training data as sensitive inputs, and control what leaves your boundary.
Minimum checks for 2025:
- Classify data flowing into SaaS and AI platforms, disable training on your tenant by default, and implement data loss prevention where feasible.
- Apply prompt and output filtering, protect retrieval pipelines, and red-team applications using the OWASP Top 10 for LLM Applications.
- If you are scaling AI adoption, consider specialist support for safe-by-design adoption and productivity gains via AI audits and enablement.
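
For the data loss prevention and prompt filtering items above, a minimal pre-send filter might redact obvious identifiers before a prompt leaves your boundary. The two patterns and placeholder labels here are assumptions chosen for illustration; production DLP needs broader detection and should not rely on regular expressions alone.

```python
import re

# Illustrative redaction rules: email addresses and AWS access key IDs.
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[AWS_KEY_ID]"),
]


def redact(prompt: str) -> tuple[str, int]:
    """Return the redacted prompt and the number of substitutions made."""
    total = 0
    for pattern, placeholder in REDACTIONS:
        prompt, count = pattern.subn(placeholder, prompt)
        total += count
    return prompt, total
```

Logging the substitution count, rather than the matched text, gives a usable signal about sensitive data flows without storing the data itself.
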
12) Compliance automation and continuous assurance
Manual evidence gathering does not scale. Map controls to frameworks, automate evidence collection, and keep an audit-ready trail all year round.
Minimum checks for 2025:
- Map platform guardrails to ISO 27001, SOC 2, GDPR and sector frameworks using the Cloud Security Alliance Cloud Controls Matrix.
- Archive audit logs for a minimum of one year, protect integrity, and restrict access to compliance evidence stores.
- Document exceptions, time-box risk acceptance, and record compensating controls; see our journey to ISO 27001 certification.
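
One way to make tampering with archived audit evidence detectable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows the idea with an in-memory list; it is an illustration of the technique under that assumption, not a replacement for provider features such as object locking or log file validation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list[dict], record: dict) -> None:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev_hash, "hash": _entry_hash(prev_hash, record)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Truncating entries from the end of the chain is not detected by `verify` alone, so the latest hash should also be stored somewhere the log writer cannot modify.
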
Quick verification guide
| Domain | What good looks like | How to verify |
|---|---|---|
| Identity | SSO with MFA, no long-lived access keys, least privilege and JIT elevation | Sample 50 users and 50 roles, check MFA enforcement and last-used keys, scan for inline policies and admin wildcards |
| Network | Private-by-default, controlled egress, WAF on every public endpoint | Enumerate public IPs and open ports, validate egress policies and WAF coverage |
| Data | KMS-backed encryption, key rotation, separated duties | List storage without encryption, review key rotation ages, check key admin vs data owner separation |
| Workloads | Hardened images, non-root containers, NetworkPolicies | Inspect running pods for root and capabilities, check CIS benchmark findings |
| Supply chain | SBOM, signed artefacts, dependency and image scanning | Confirm signed releases, review pipeline scan results and blocking policies |
| Secrets | Central secrets manager, short-lived credentials, secret scanning | Search repos for secrets, verify rotation policies and external secrets usage |
| Detection | Centralised logs, managed threat detection, alert runbooks | Trace an incident ID across logs, check alert routing and recent game-day outcomes |
| Resilience | Immutable cross-account backups, tested restores, defined RTO/RPO | Review restore test evidence, cross-region backup configuration, and RTO/RPO metrics |
A pragmatic 90-day plan
Start with visibility, then enforce guardrails, then rehearse.
- Days 0 to 30, baseline and risk: inventory internet exposure, privileged identities, unencrypted data stores, and backup posture. Enable central logging and managed detections if missing.
- Days 31 to 60, enforce controls: roll out SSO and MFA everywhere, block public storage by policy, enforce image signing and pre-deploy scans, and implement immutable cross-account backups.
- Days 61 to 90, prove it works: run an access key compromise drill, a backup-restore exercise, and a critical patch race. Capture metrics and adjust SLAs and runbooks.
Evidence that controls are working
Focus on a short set of leading indicators that both engineers and executives can trust.
- Identity: percentage of human and machine identities with MFA or short-lived credentials, number of high-privilege roles with JIT enabled.
- Data: percentage of storage encrypted with customer-managed keys, mean key age and rotation cadence, unauthorised public data exposure incidents.
- Vulnerabilities: median patch latency by severity, percentage of images failing admission due to known CVEs.
- Detection and response: mean time to detect, mean time to contain, successful alert acknowledgement within SLO.
- Resilience: verified restore success rate, average time to restore, drift between declared RTO/RPO and observed recovery.
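
Several of these indicators reduce to simple arithmetic over data you probably already collect. The sketch below computes a coverage percentage, mean time to detect, and median patch latency by severity; the input shapes and function names are assumptions for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, median


def coverage_pct(covered: int, total: int) -> float:
    """Percentage of resources covered by a control, rounded to one decimal place."""
    return round(100 * covered / total, 1) if total else 0.0


def mean_time_to_detect(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average gap between when each incident started and when it was detected."""
    gaps = [(detected - started).total_seconds() for started, detected in incidents]
    return timedelta(seconds=mean(gaps))


def median_patch_latency_days(latencies: dict[str, list[int]]) -> dict[str, float]:
    """Median days-to-patch per severity, skipping severities with no data."""
    return {severity: median(days) for severity, days in latencies.items() if days}
```

Medians are used for patch latency because a single long-lived finding can distort a mean; report the outliers separately instead of averaging them away.
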
What’s different about 2025
- Zero trust is moving from slideware to policy as code. Identity-centric controls, device posture, and continuous authorisation are becoming default expectations.
- CNAPP is consolidating CSPM, CIEM and CWPP into unified risk views. Use the convergence to simplify processes, not to rubber-stamp exceptions.
- AI and LLM usage requires explicit guardrails. Build a register of AI use cases, sensitive data flows, and controls before scale-out.
- Post-quantum readiness is now a roadmap item. No panic, just inventory, prioritise, and plan.
Where to go next
If you want a deeper dive on specific domains, these resources may help:
- Cloud resilience patterns and trade-offs: Designing Resilient Cloud Infrastructure on AWS
- Monitoring and alerting strategy: Tasrie IT Services: Ensuring Peak Performance with Multi-Level Monitoring
- Governance with cost and security alignment: AWS Cloud Cost Optimisation: A Practical Guide
Tasrie IT Services helps teams ship faster and safer with cloud-native security, DevSecOps, Kubernetes hardening, CI/CD automation, observability, and compliance enablement. If you want an expert-led assessment mapped to this checklist, measurable outcomes, and implementation support that respects your delivery cadence, we are ready to help.