The Complete Guide to Cloud Cost Optimization

A comprehensive, step-by-step guide to optimizing your cloud spending across AWS, GCP, Azure, and OCI while maintaining performance and reliability.

Sarah Chen
Feb 1, 2026 · 20 min read

Chapter 1: Understanding Your Cloud Bill

Cloud cost optimization begins with understanding what you are paying for. Most organizations find that their cloud bills are far more complex than expected -- a single AWS invoice can contain thousands of line items across dozens of services, each with its own pricing dimensions including compute hours, data transfer, storage volume, API calls, and provisioned capacity.

Before you can optimize, you need to answer three fundamental questions: Where is the money going? Who is responsible for the spending? And is the spending delivering proportional business value?

Billing Data Structure

Each cloud provider structures billing data differently, which makes multi-cloud cost analysis particularly challenging:

  • AWS provides Cost and Usage Reports (CUR) with 100+ columns per line item, exported to S3. Key fields include lineItem/UsageType, lineItem/BlendedCost, and product/region.
  • GCP exports billing data to BigQuery with fields like cost, usage.amount, service.description, and project.id. Credits and discounts appear inline, as a nested, repeated credits field on each usage row, rather than as separate line items.
  • Azure provides cost data through the Cost Management API or exported CSV/Parquet files. Fields include CostInBillingCurrency, MeterCategory, and ResourceGroup.
  • OCI provides usage reports with cost/computedAmount, product/service, and usage/consumedQuantity.

These structural differences mean that raw billing data from different providers cannot be compared side by side without first mapping each provider's columns to a common schema. This is where standardization becomes essential.

The FOCUS 1.3 Standard

The FinOps Open Cost and Usage Specification (FOCUS) 1.3 provides a vendor-neutral schema for cloud billing data. By converting all provider billing data into FOCUS format, you can analyze multi-cloud costs using a single set of dimensions and metrics.

Key FOCUS 1.3 columns used in cost analysis:

| FOCUS Column | Description | Example |
|---|---|---|
| BilledCost | Amount charged by the provider (the basis for invoicing) | $1,234.56 |
| EffectiveCost | Amortized cost after discounts, including the applicable share of upfront commitment purchases | $987.65 |
| ListCost | Cost at public list (on-demand) prices, before discounts | $1,500.00 |
| Provider | Cloud provider name | AWS, GCP, Azure |
| ServiceName | Service or product | Amazon EC2, Cloud Storage |
| ServiceCategory | High-level category | Compute, Storage, Network |
| Region | Deployment region | us-east-1, europe-west1 |
| ResourceType | Resource category | Virtual Machine, Object Storage |
| ChargeCategory | Type of charge | Usage, Purchase, Tax |
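
Once every provider's data uses the same columns, a single query covers all of them. The shell sketch below sums any FOCUS cost column, grouped by any FOCUS dimension. It is a minimal illustration under several assumptions: the export is a simple comma-separated file with a header row, and it has no quoted fields (real exports are often Parquet or quoted CSV). The function name `focus_sum_by` is hypothetical.

```shell
# Sum a FOCUS cost column (e.g. BilledCost) grouped by a FOCUS dimension
# (e.g. ServiceCategory) from a CSV export with a header row.
# Assumption: no quoted fields or embedded commas.
focus_sum_by() {
  file=$1 group_col=$2 cost_col=$3
  awk -F, -v g="$group_col" -v c="$cost_col" '
    NR == 1 {
      for (i = 1; i <= NF; i++) idx[$i] = i          # header name -> column number
      if (!(g in idx) || !(c in idx)) { bad = 1; exit 1 }
      next
    }
    { total[$idx[g]] += $idx[c] }
    END { if (!bad) for (k in total) printf "%s,%.2f\n", k, total[k] }
  ' "$file" | LC_ALL=C sort
}
```

For example, `focus_sum_by focus_export.csv ServiceCategory EffectiveCost` would return one amortized total per category, whichever providers the rows came from. Here `focus_export.csv` is a placeholder file name. If the requested column is missing from the header, the function prints nothing.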

CloudAct.ai Approach: CloudAct.ai automatically converts raw billing data from all supported providers into FOCUS 1.3 format through its pipeline service. This normalization happens during ingestion, so every query against the Semantic Data Layer returns standardized data regardless of the source provider.

Common Billing Pitfalls

Watch out for these frequently overlooked cost drivers:

  1. Data transfer charges: Cross-region and cross-AZ data transfer can account for 10-15% of total cloud spend. Many teams overlook these because they focus on compute and storage.
  2. Orphaned resources: Load balancers without targets, unattached EBS volumes, idle NAT gateways, and unused Elastic IPs silently accumulate charges.
  3. Over-provisioned databases: RDS and Cloud SQL instances are frequently provisioned for peak load and left at that size permanently.
  4. Logging and monitoring costs: CloudWatch, Cloud Logging, and Azure Monitor charges grow with application scale and are rarely reviewed.
  5. Snapshot accumulation: EBS snapshots on AWS and disk snapshots on GCP and Azure pile up over time. A single snapshot costs pennies, but thousands of forgotten snapshots can add up to hundreds of dollars a month.
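
To see how pennies turn into hundreds, multiply the snapshot count by the average stored size and a per-GB-month price. The sketch below uses a flat $0.05 per GB-month purely as an illustrative assumption. Actual snapshot pricing varies by provider, region, and storage tier. Incremental snapshots store only changed blocks, so use the billed size rather than the volume size.

```shell
# Rough monthly cost of retained snapshots: count x average billed GB x price.
# The price argument is an assumption; look up your provider's current rate.
snapshot_monthly_cost() {
  count=$1 avg_gb=$2 price_per_gb_month=$3
  awk -v n="$count" -v gb="$avg_gb" -v p="$price_per_gb_month" \
    'BEGIN { printf "%.2f\n", n * gb * p }'
}

snapshot_monthly_cost 2000 4 0.05   # 2,000 snapshots x 4 GB x $0.05 -> 400.00
```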

Chapter 2: Right-Sizing and Resource Optimization

Right-sizing is the process of matching resource allocations to actual workload requirements. It is often the highest-impact optimization available -- many organizations can reduce compute costs by 20-40% through right-sizing alone, because default instance selections tend to be significantly larger than what workloads actually need.

Identifying Idle Resources

Start with the lowest-hanging fruit: resources that are running but not being used at all. Common candidates include:

  • Development and staging environments left running 24/7 when they are only used during business hours (roughly 65% savings on compute by running them 12 hours a day on weekdays only)
  • Forgotten proof-of-concept resources from experiments that concluded months ago
  • Load balancers with no healthy targets behind them
  • Databases with zero active connections over the past 30 days
  • Kubernetes nodes with minimal pod scheduling (node utilization below 10%)
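
The scheduling figure comes from simple hour counting. A week has 168 hours, and an environment that runs 12 hours a day on 5 weekdays is up for only 60 of them. This shell sketch computes the share of instance-hours avoided. It covers hourly compute charges only, because attached block storage such as EBS volumes keeps billing while an instance is stopped. The function name is hypothetical.

```shell
# Percentage of always-on hours avoided by running only on a schedule.
# A week has 168 hours; the schedule keeps hours_per_day x days_per_week of them.
schedule_savings_pct() {
  hours_per_day=$1 days_per_week=$2
  awk -v h="$hours_per_day" -v d="$days_per_week" \
    'BEGIN { printf "%.1f\n", (1 - (h * d) / 168) * 100 }'
}

schedule_savings_pct 12 5   # 60 of 168 hours -> 64.3
schedule_savings_pct 10 5   # 50 of 168 hours -> 70.2
```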

Rule of thumb: If a compute resource averages less than 5% CPU utilization over 14 days with no significant memory or network usage, it is a strong candidate for termination or consolidation.
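
The CPU part of this rule of thumb can be encoded as a simple check. In this minimal shell sketch, the function name is hypothetical, and the average CPU value is assumed to come from your monitoring system. Memory and network checks are deliberately left to the caller.

```shell
# Succeeds when a resource meets the CPU part of the idle rule of thumb:
# average CPU under 5% over an observation window of at least 14 days.
# Memory and network usage must still be checked separately.
is_idle_candidate() {
  avg_cpu_pct=$1 window_days=$2
  awk -v cpu="$avg_cpu_pct" -v days="$window_days" \
    'BEGIN { exit !(days >= 14 && cpu < 5) }'
}

if is_idle_candidate 2.3 14; then
  echo "review for termination or consolidation"
fi
```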

Right-Sizing Compute Instances

For resources that are in use but over-provisioned, right-sizing recommendations follow a systematic approach:

  1. Collect utilization metrics: Gather at least 14 days of CPU, memory, network, and disk I/O metrics. Shorter windows miss weekly patterns. On EC2, memory metrics are not collected by default and require the CloudWatch agent.
  2. Identify peak utilization: Look at the P95 (95th percentile) utilization -- this represents the near-peak load your instance needs to handle.
  3. Target 60-70% peak utilization: Select an instance size where your P95 utilization falls in the 60-70% range. This provides headroom for traffic spikes while avoiding significant over-provisioning.
  4. Consider instance families: Sometimes the right move is not just a smaller size but a different family. For CPU-bound workloads, compute-optimized instances (such as AWS C-series) cost less than general-purpose instances (M-series) with the same vCPU count, because they pair each vCPU with less memory.
  5. Implement gradually: Right-size in stages. Drop one instance size at a time and monitor for 48 hours before making further changes.
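
```shell
# AWS example: hourly CPU utilization statistics for right-sizing analysis.
# GetMetricStatistics takes either --statistics or --extended-statistics per
# call, not both, so the P95 percentile is requested in a second call.
for stats in "--statistics Average Maximum" "--extended-statistics p95"; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time 2026-01-01T00:00:00Z \
    --end-time 2026-02-01T00:00:00Z \
    --period 3600 \
    $stats   # unquoted on purpose: splits into the option and its values
done
```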
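
Steps 2 and 3 can be sketched in shell. First, compute P95 with the nearest-rank method over exported samples. Then count how many one-size reductions keep projected P95 at or below the 70% ceiling. The sketch assumes that each size step down halves capacity and doubles utilization, which roughly holds within a family (for example, m5.2xlarge to m5.xlarge). A result of 0 means no smaller size stays under the ceiling. Because step 5 recommends dropping one size at a time, treat the count as an upper bound, not a single change. The function names are hypothetical.

```shell
# P95 by the nearest-rank method: sort the samples and take the value at
# rank ceil(0.95 * N). Reads one utilization sample (percent) per line.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int((95 * NR + 99) / 100)   # integer ceil(0.95 * N)
      print v[r]
    }'
}

# Count how many halving steps keep projected P95 at or below 70%, assuming
# utilization doubles each time capacity is halved (a simplification).
downsize_steps() {
  awk -v u="$1" 'BEGIN {
    s = 0
    while (u > 0 && u * 2 <= 70) { u *= 2; s++ }
    print s
  }'
}
```

For example, a P95 of 15% yields 2 steps, with a projected 60% at the smaller size. Check memory headroom the same way before acting.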

Storage Optimization

Storage costs are often overlooked because individual objects are cheap, but aggregate storage costs at scale can rival compute spending:

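  • Lifecycle policies: Automatically transition objects from standard to infrequent-access to archive tiers as they age. For unpredictable access patterns, S3 Intelligent-Tiering instead moves objects based on observed access. A well-tuned lifecycle policy can reduce storage costs by 60-80% for rarely read data.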
  • Compression: Compress data before storage. For analytics data, columnar formats such as Parquet and ORC are typically 3-5x smaller than CSV or JSON equivalents.
  • Deduplication: Identify and eliminate duplicate data across storage buckets. This is especially common in data lake architectures where pipelines may produce redundant copies.
  • Snapshot management: Implement automated snapshot expiry. Keep daily snapshots for 7 days, weekly for 4 weeks, and monthly for 12 months -- then delete everything older.
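
The retention rule in the last bullet reduces to an age check for each snapshot tier. In this shell sketch, the tier labels (daily, weekly, monthly), the snapshot IDs, and the input format are hypothetical. How snapshots get labeled is left to your backup tooling, and 12 months is approximated as 365 days.

```shell
# Succeeds if a snapshot should be kept under a 7-day daily, 4-week weekly,
# and 12-month monthly retention policy. Ages are whole days.
keep_snapshot() {
  tier=$1 age_days=$2
  case "$tier" in
    daily)   [ "$age_days" -le 7 ] ;;
    weekly)  [ "$age_days" -le 28 ] ;;
    monthly) [ "$age_days" -le 365 ] ;;
    *)       return 1 ;;   # unknown tier: not kept, so it surfaces as a candidate
  esac
}

# Example input: "snapshot-id tier age_days" per line (hypothetical IDs).
while read -r id tier age; do
  keep_snapshot "$tier" "$age" || echo "delete candidate: $id"
done <<'EOF'
snap-aaa daily 3
snap-bbb daily 12
snap-ccc monthly 200
EOF
```

With this input, only `snap-bbb` is reported, because it is a daily snapshot older than 7 days.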

Chapter 3: Commitment-Based Discounts

After right-sizing your resources, the next major optimization lever is commitment-based discounts. By committing to a certain level of usage over 1-3 years, you can save up to 72% compared to on-demand pricing. The trade-off is reduced flexibility -- you are paying for capacity whether you use it or not.
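
The trade-off can be made concrete. A commitment bills every hour whether or not the capacity is used, so it only beats on-demand when utilization exceeds 100% minus the discount. This shell sketch compares one month of a single commitment against paying on-demand for only the hours actually used. It assumes 730 hours per month and a flat discount. The $0.10 hourly rate in the example is illustrative, and the function name is hypothetical.

```shell
# Compare a commitment with on-demand pricing for the same hourly capacity.
# Args: on-demand hourly rate, discount %, utilization % of committed hours,
# and optionally hours in the period (730 approximates one month).
commitment_comparison() {
  od_rate=$1 discount_pct=$2 utilization_pct=$3 hours=${4:-730}
  awk -v r="$od_rate" -v d="$discount_pct" -v u="$utilization_pct" -v h="$hours" 'BEGIN {
    committed = r * (1 - d / 100) * h   # billed for every hour, used or not
    on_demand = r * (u / 100) * h       # billed only for hours actually used
    printf "committed=%.2f on_demand=%.2f breakeven_utilization=%.0f%%\n", committed, on_demand, 100 - d
  }'
}

commitment_comparison 0.10 40 50   # 50% use: on-demand is cheaper
commitment_comparison 0.10 40 80   # 80% use: the commitment wins
```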

Reserved Instances and Savings Plans

AWS offers two main commitment vehicles:

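  • Savings Plans: Commit to a consistent dollar amount of compute usage per hour. Compute Savings Plans are flexible across instance families, sizes, OS, and regions and offer up to 66% savings. EC2 Instance Savings Plans are limited to one instance family in one region but offer up to 72%.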
  • Reserved Instances: Commit to specific instance types in specific regions. Less flexible but can offer slightly deeper discounts. Standard RIs offer up to 72% savings on 3-year all-upfront terms.

GCP provides Committed Use Discounts (CUDs) -- commit to minimum resource levels for 1 or 3 years for 28-55% savings. Resource-based CUDs apply to specific machine types; spend-based CUDs are more flexible.

Azure offers Reserved Instances across VMs, databases, storage, and more. Azure Reservations provide up to 72% savings with 3-year terms, and Savings Plans for compute provide flexibility similar to AWS.

Committed Use Discounts (GCP)

GCP's CUD model is worth special attention because of its simplicity. With resource-based CUDs, you commit to a minimum number of vCPUs and amount of memory for a machine series in a region, and matching usage in that region automatically receives the discount. There is no instance-level mapping required.

Best practices for GCP CUDs:

  1. Analyze 90 days of usage to identify your baseline (minimum consistent usage)
  2. Commit only to 70-80% of your baseline to account for potential optimization or reduction
  3. Start with 1-year commitments to limit risk, then graduate to 3-year for proven workloads
  4. Use resource-based CUDs for stable workloads and spend-based for variable ones
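
Steps 1 and 2 translate into a short calculation. Take the minimum daily usage over the analysis window as the baseline, then commit to a fraction of it. This shell sketch makes several assumptions:

- The input has one daily vCPU figure per line; apply the same calculation separately to memory in GB.
- The fraction defaults to 75%, the midpoint of the 70-80% range.
- The result is rounded down to whole vCPUs.

The function name is hypothetical.

```shell
# Baseline = minimum daily usage in the window; commitment = a percentage of it.
# Reads one daily vCPU figure per line on stdin; rounds the commitment down.
cud_commitment() {
  fraction_pct=${1:-75}
  awk -v f="$fraction_pct" '
    NR == 1 || $1 < min { min = $1 }
    END {
      if (NR == 0) exit 1
      printf "baseline=%g commit=%d\n", min, int(min * f / 100)
    }'
}

printf '120\n96\n110\n100\n' | cud_commitment 75   # baseline=96 commit=72
```

In practice, feed it 90 days of figures, as step 1 suggests.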

Spot and Preemptible Instances

For fault-tolerant, stateless workloads, spot instances (AWS), preemptible VMs (GCP), and spot VMs (Azure) offer 60-90% savings over on-demand pricing. The catch: the provider can reclaim these instances with little notice (two minutes on AWS, about 30 seconds on GCP).

Good candidates for spot/preemptible instances:

  • Batch processing and data pipeline workloads
  • CI/CD build runners
  • Stateless web servers behind auto-scaling groups
  • Machine learning training jobs with checkpointing
  • Development and testing environments

Warning: Never run stateful databases, single-instance critical services, or workloads without checkpointing on spot instances. The savings are not worth the operational risk.
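
Checkpointing is what makes an interruption survivable. The job records its progress durably, and a replacement instance resumes from the last completed item instead of starting over. The shell sketch below shows the pattern with a local file. On spot capacity, the checkpoint would need to live somewhere that outlives the instance, such as object storage. The function name and work items are hypothetical.

```shell
# Process items in order, recording the count of completed items after each one.
# On restart, items already counted in the checkpoint file are skipped.
process_items() {
  checkpoint=$1; shift
  done_count=0 i=0
  [ -f "$checkpoint" ] && done_count=$(cat "$checkpoint")
  for item in "$@"; do
    i=$((i + 1))
    [ "$i" -le "$done_count" ] && continue   # finished before the interruption
    echo "processing $item"                  # placeholder for the real work
    # Write to a temp file, then rename, so a crash never leaves a partial checkpoint.
    echo "$i" > "$checkpoint.tmp" && mv "$checkpoint.tmp" "$checkpoint"
  done
}
```

If an instance is reclaimed after two of three items, rerunning `process_items` with the same checkpoint path processes only the third.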

Chapter 4: Tagging and Cost Allocation

Tagging is the foundation of cost accountability. Without a consistent tagging strategy, you cannot attribute costs to teams, projects, or applications -- and if you cannot attribute costs, you cannot hold anyone accountable for optimization.

Designing a Tagging Strategy

An effective tagging strategy needs to be simple enough for developers to follow consistently, yet rich enough to support meaningful cost analysis. We recommend starting with these essential tags:

| Tag Key | Description | Example Values | Required |
|---|---|---|---|
| environment | Deployment stage | production, staging, development | Yes |
| team | Owning team | platform, data-eng, ml-ops | Yes |
| application | Application name | api-service, pipeline-worker | Yes |
| cost-center | Business unit | engineering, marketing, sales | Yes |
| project | Project or initiative | q1-migration, cost-optimizer | No |
| managed-by | Provisioning method | terraform, manual, helm | No |

Enforce tagging through infrastructure-as-code policies. Both Terraform and Pulumi support required tag validation. In AWS Organizations, tag policies standardize tag keys and values, and service control policies (SCPs) with aws:RequestTag conditions can deny the creation of resources that lack required tags. GCP uses organization policies with label constraints.
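
A pre-deployment check for the required tags in the table above is easy to script. This shell sketch is a hypothetical local check, for example in a CI step that extracts tags from a plan. It is not a substitute for provider-side enforcement. It treats a tag as present only when the tag has a non-empty value. The function and variable names are assumptions.

```shell
# Required keys from the tagging table, as a space-separated list.
REQUIRED_TAGS="environment team application cost-center"

# Args: tags as key=value pairs. Returns 1 and lists missing keys on stderr.
check_required_tags() {
  missing=""
  for key in $REQUIRED_TAGS; do   # unquoted on purpose: one key per word
    found=0
    for tag in "$@"; do
      case "$tag" in "$key"=?*) found=1; break ;; esac   # key with non-empty value
    done
    [ "$found" -eq 1 ] || missing="$missing $key"
  done
  if [ -n "$missing" ]; then
    echo "missing required tags:$missing" >&2
    return 1
  fi
}

check_required_tags environment=production team=platform \
  application=api-service cost-center=engineering && echo "tags ok"
```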

Cost Allocation Hierarchy

Tags alone are not enough -- you need a hierarchy that maps cloud resources to business structure. CloudAct.ai uses a four-level hierarchy model:

Organization
  +-- Department (C-Suite / DEPT-*)
        +-- Business Unit (PROJ-*)
              +-- Function / Team (TEAM-*)

This hierarchy enables cost roll-ups at every level: you can see total spend for the entire organization, drill into a department's costs, examine a specific business unit, or zoom into an individual team's resource consumption. The hierarchy is maintained in CloudAct.ai and automatically applied to all cost data during analysis.

When resources are tagged with team identifiers that map to this hierarchy, every dollar of cloud spend can be attributed to a responsible business owner. Untagged resources are flagged in a separate "unallocated" bucket, creating natural pressure to improve tagging compliance.
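
Roll-ups and the unallocated bucket can be illustrated with two small CSV inputs. The first maps team IDs to their business unit and department. The second lists costs keyed by team tag. The IDs, file layout, and function name below are hypothetical, and this is not how CloudAct.ai implements its hierarchy.

```shell
# Roll team-tagged costs up to business unit and department totals.
# mapping rows: team_id,business_unit_id,department_id
# cost rows:    team_tag,cost   (empty or unmapped team -> "unallocated")
rollup_costs() {
  mapping=$1 costs=$2
  awk -F, '
    FILENAME == ARGV[1] { bu[$1] = $2; dept[$1] = $3; next }
    {
      if ($1 != "" && ($1 in bu)) {
        total[dept[$1]] += $2                          # department level
        total[dept[$1] "/" bu[$1]] += $2               # business unit level
        total[dept[$1] "/" bu[$1] "/" $1] += $2        # team level
      } else {
        total["unallocated"] += $2
      }
    }
    END { for (k in total) printf "%s,%.2f\n", k, total[k] }
  ' "$mapping" "$costs" | LC_ALL=C sort
}
```

Given teams `TEAM-api` and `TEAM-data` under `PROJ-checkout` in `DEPT-eng`, the output lists a total for the department, for the business unit, and for each team. It also lists one `unallocated` line for untagged or unknown teams.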

Chapter 5: Continuous Optimization with CloudAct.ai

Cost optimization is not a project with an end date -- it is a continuous practice. Cloud environments are dynamic: teams spin up new resources daily, pricing changes quarterly, and business requirements evolve constantly. Without ongoing governance, cost optimizations decay within 3-6 months as new waste accumulates.

Unified Multi-Cloud Visibility

CloudAct.ai provides a single pane of glass for all your cloud, GenAI, and SaaS costs. By ingesting billing data from AWS, GCP, Azure, OCI, and GenAI providers (OpenAI, Anthropic, Google, DeepSeek, and more), CloudAct.ai normalizes everything into FOCUS 1.3 format and presents it through its Semantic Data Layer.

Key visibility features:

  • Multi-cloud dashboard: See total spend across all providers with drill-down by service, region, team, and time period
  • Cost trends: Track spending over time with configurable granularity (daily, weekly, monthly)
  • Provider comparison: Compare costs for equivalent services across providers
  • Currency normalization: View all costs in your organization's preferred currency using daily exchange rates (20 currencies supported)
  • Anomaly detection: Automatic identification of unusual spending patterns with configurable alert thresholds
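
To illustrate the idea behind threshold-based anomaly alerts, the shell sketch below flags any day whose cost exceeds the mean of the preceding window by more than k standard deviations. This simple z-score rule is a stand-in chosen for illustration, not CloudAct.ai's detection method. The window length and k play the role of configurable thresholds. The function name is hypothetical.

```shell
# Flag days whose cost exceeds the trailing-window mean by more than k
# standard deviations. Input: "date,cost" lines in chronological order.
flag_anomalies() {
  window=${1:-7} k=${2:-3}
  awk -F, -v w="$window" -v k="$k" '
    {
      if (NR > w) {
        sum = 0; sq = 0
        for (i = NR - w; i < NR; i++) { sum += c[i]; sq += c[i] * c[i] }
        mean = sum / w
        var = sq / w - mean * mean
        sd = (var > 0) ? sqrt(var) : 0   # guard against tiny negative rounding
        if ($2 > mean + k * sd) print $1 "," $2
      }
      c[NR] = $2
    }'
}
```

With seven days near $100 followed by a $250 day, only the $250 day is printed.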

Automated Recommendations

CloudAct.ai's AI assistant, ELSA, analyzes your cost data and provides actionable optimization recommendations. ELSA can identify:

  • Resources that are candidates for right-sizing based on utilization patterns
  • Opportunities for commitment-based discounts based on stable usage baselines
  • Idle resources that can be terminated or scheduled
  • Tagging gaps that prevent accurate cost allocation
  • GenAI model substitution opportunities (using cheaper models for simple tasks)

ELSA operates within strict multi-tenant isolation boundaries -- it can only access and analyze data for the organization you are logged into, enforced at the query level through parameterized org_slug binding.

Budget Governance

Setting budgets and alerts creates accountability and prevents surprise bills. CloudAct.ai supports budget management at every level of the hierarchy:

  1. Create budgets for departments, business units, or teams with monthly or quarterly periods
  2. Configure alert thresholds at 50%, 70%, 90%, and 100% of budget
  3. Route notifications to email, Slack, or webhook endpoints
  4. Track burn rate to predict whether you will exceed budget before the period ends
  5. Review historical performance to identify teams that consistently over- or under-spend
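
Steps 2 and 4 combine naturally. Compare actual spend with each alert threshold, then project period-end spend from the burn rate so far. This shell sketch assumes spending continues at a constant daily rate, which is a simplification. The function name and figures are illustrative.

```shell
# Report which alert thresholds (50/70/90/100% of budget) actual spend has
# crossed, and project period-end spend assuming a constant daily burn rate.
# Args: budget, spend to date, current day of period (>= 1), days in period.
budget_status() {
  budget=$1 spend=$2 day=$3 days_in_period=$4
  awk -v b="$budget" -v s="$spend" -v d="$day" -v n="$days_in_period" 'BEGIN {
    if (d <= 0) { print "day must be >= 1" > "/dev/stderr"; exit 1 }
    forecast = s / d * n                  # linear burn-rate projection
    n_t = split("50 70 90 100", t, " ")   # alert thresholds in percent
    crossed = ""
    for (i = 1; i <= n_t; i++)
      if (s >= b * t[i] / 100) crossed = crossed (crossed == "" ? "" : ",") t[i] "%"
    if (crossed == "") crossed = "none"
    printf "forecast=%.2f crossed=%s over_budget_forecast=%s\n", forecast, crossed, (forecast > b ? "yes" : "no")
  }'
}

budget_status 10000 7500 15 30   # halfway through, already 75% spent
```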

Getting started: Sign up for CloudAct.ai, connect your cloud billing accounts, and within minutes you will have unified visibility across all your providers. The platform runs cost pipelines automatically on a daily schedule, keeping your dashboards current with minimal setup effort. Start with visibility, add budgets, and build from there.

About the Author

Sarah Chen

VP of Engineering at CloudAct.ai

Sarah leads the engineering team at CloudAct.ai, specializing in cloud cost optimization and FinOps. With 15 years of experience building data platforms at scale, she brings deep expertise in multi-cloud architectures and cost governance.
