Last-Minute Revision for the SAP-C02 exam — Data Storage (Post 4/7)

– AWS Storage, Databases, and Migration notes

Welcome to the 4th post in our seven-part revision series for the AWS Certified Solutions Architect – Professional (SAP-C02) exam.


This post focuses on the backbone of data-heavy architectures: AWS storage, databases, analytics, and migration services. These topics are rarely tested in isolation. In SAP-C02, they usually appear inside cost, durability, performance, and recovery scenarios where small service limitations make or break the correct answer. The goal here is to reinforce how AWS data services behave under real architectural constraints — so you can recognise the right design quickly, not just the familiar service name.


Static Storage & Object Services

Amazon S3 — Core Behaviour & Limits

  • You can’t use a bucket policy to prevent deletions or transitions by an S3 Lifecycle rule. For example, even if your bucket policy denies all actions for all principals, your S3 Lifecycle configuration still functions as normal.
  • After you create an Amazon S3 bucket, up to 24 hours can pass before the bucket name propagates across all AWS Regions.
  • There are no S3 data transfer charges when data is transferred into AWS from the internet.
  • With S3 Transfer Acceleration, you pay only for transfers that are actually accelerated; if a transfer doesn’t use the accelerator, the end user pays no acceleration charge for the upload.
  • An application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
  • You can configure Object Lock only when you first create a bucket.
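The per-prefix request rates above scale linearly with the number of prefixes, which is why spreading keys across prefixes is the standard answer to S3 throughput questions. A minimal sketch of the arithmetic (constants and function name are mine):

```python
# Per-prefix S3 request-rate baselines, as stated in the bullet above.
WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500    # GET/HEAD

def aggregate_rps(num_prefixes: int) -> tuple[int, int]:
    """Spreading keys evenly over N prefixes multiplies the
    per-prefix baselines by N: returns (writes/s, reads/s)."""
    return (num_prefixes * WRITE_RPS_PER_PREFIX,
            num_prefixes * READ_RPS_PER_PREFIX)

# e.g. 10 prefixes supports roughly 35,000 writes/s and 55,000 reads/s
```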

S3 Networking & Connectivity

  • You must use a public virtual interface (VIF) from AWS Direct Connect to access an S3 bucket. Private VIFs only work with resources using private IPs in a VPC.
  • Gateway attachments are used with Transit Gateways, not S3 endpoints.
  • Gateway endpoints support IPv4 only.
  • You can’t use the aws:SourceIp condition for requests that traverse an S3 VPC endpoint. Use aws:VpcSourceIp instead, or control access via route tables.
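A hedged sketch of what that condition looks like in a bucket-policy statement, expressed here as a Python dict; the bucket name and CIDR are placeholders:

```python
# Deny access to the (placeholder) bucket unless the request arrives from
# the expected private CIDR. aws:SourceIp would not match for traffic
# through a gateway VPC endpoint, because the request carries a private
# source address; aws:VpcSourceIp is the condition key that does match.
statement = {
    "Sid": "DenyOutsideVpcCidr",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": "arn:aws:s3:::example-bucket/*",
    "Condition": {
        "NotIpAddress": {"aws:VpcSourceIp": "10.0.0.0/16"}
    },
}
```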

S3 Access Control Patterns

  • Cross-account IAM roles can be used for programmatic and console access to S3 objects.

Preventing Uploads of Unencrypted Objects

To encrypt an object at upload time, include the x-amz-server-side-encryption header in the request.

Supported values:

  • AES256 — SSE-S3 (S3-managed keys)
  • aws:kms — SSE-KMS (KMS-managed keys)

Example:

PUT /example-object HTTP/1.1
Host: myBucket.s3.amazonaws.com
Date: Wed, 8 Jun 2016 17:50:00 GMT
Authorization: authorization string
Content-Type: text/plain
Content-Length: 11434
x-amz-meta-author: Janet
Expect: 100-continue
x-amz-server-side-encryption: AES256

[11434 bytes of object data]

To enforce encryption, use a bucket policy that denies PutObject requests without this header.
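A sketch of such a deny policy as a Python dict; the bucket name is a placeholder, and the canonical version in the AWS blog post below also adds a Null condition to catch requests that omit the header entirely:

```python
import json

# Deny PutObject when the encryption header is present but not an
# allowed value (SSE-S3 or SSE-KMS).
deny_unencrypted = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyIncorrectEncryptionHeader",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "StringNotEquals": {
                "s3:x-amz-server-side-encryption": ["AES256", "aws:kms"]
            }
        },
    }],
}
print(json.dumps(deny_unencrypted, indent=2))
```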

Reference:
https://aws.amazon.com/blogs/security/how-to-prevent-uploads-of-unencrypted-objects-to-amazon-s3/

CloudFront with S3 Website Endpoints

To avoid Access Denied errors when CloudFront uses an S3 website endpoint:

  • Objects must be publicly accessible.
  • Objects cannot be encrypted with AWS KMS.
  • Bucket policy must allow s3:GetObject.
  • Bucket owner must also own the objects.
  • S3 Block Public Access must be disabled.
  • If Requester Pays is enabled, include request-payer.
  • Validate any Referer header restrictions.
  • Objects must exist in the bucket.

S3 Intelligent-Tiering

Access tiers:

  1. Frequent Access (automatic)
  2. Infrequent Access — no access for 30 consecutive days
  3. Archive Instant Access — no access for 90 consecutive days
  4. Archive Access (optional)
     • Same performance as Glacier Flexible Retrieval
     • 3–5 hour retrieval
     • Restore starts within minutes via Batch Operations
  5. Deep Archive Access (optional)
     • No access for 180 consecutive days
     • Same performance as Glacier Deep Archive

Notes:

  • Objects smaller than 128 KB are not monitored and are never auto-tiered.
  • The days-without-access thresholds for the optional Archive Access and Deep Archive Access tiers are configurable (up to 730 days).
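To land objects in Intelligent-Tiering automatically, a lifecycle rule can transition them at creation. A minimal sketch of the rule shape that boto3's `put_bucket_lifecycle_configuration` accepts (the rule ID is mine):

```python
# Lifecycle configuration moving every object in the bucket into
# INTELLIGENT_TIERING as soon as the rule is evaluated.
lifecycle_configuration = {
    "Rules": [{
        "ID": "all-objects-to-intelligent-tiering",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix = whole bucket
        "Transitions": [{
            "Days": 0,
            "StorageClass": "INTELLIGENT_TIERING",
        }],
    }]
}
```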

S3 Access Points

Key features:

  1. Dedicated hostname, ARN, and IAM policy
  2. Block Public Access is enabled by default
  3. Scoped to an account and a region
  4. Custom IAM permissions per application or user
  5. Prefix-level access control
  6. Can restrict access to a specific VPC


IAM Access Analyzer for S3

IAM Access Analyzer for S3 flags buckets that grant access outside your account or organisation, for example through bucket policies, bucket ACLs, or access points.

Storage Gateway

  • Storage Gateway does not auto-refresh cache when objects are uploaded directly to S3.
  • You must run RefreshCache to see changes.

Glacier

  • A Glacier vault is a container for archives.
  • Vault Lock policies enable compliance controls like WORM and can be locked permanently.
  • Glacier Select cannot be used on compressed data.

File Storage Services

Amazon EFS

  • Fully managed, elastic NFS file system
  • Scales to petabytes automatically
  • Low, consistent latency with throughput exceeding 10 GB/s
  • Regional service

Connectivity:

  • EC2 across AZs, Regions, and VPCs
  • Inter-Region VPC peering
  • On-premises via Direct Connect or VPN

Notes:

  • General Purpose mode prioritises latency over throughput

Amazon FSx

Capacity expansion: you can increase the storage capacity of an existing file system; the new capacity is available within minutes, after which FSx rebalances data in the background.

FSx for Lustre with S3

  • S3 objects appear as files
  • Imports existing and new S3 objects
  • Supports lazy loading
  • Writes data back to S3
  • Data repository tasks simplify sync

Recommended pattern:

Use S3 Intelligent-Tiering for long-term storage, FSx for Lustre for temporary shared high-performance workloads, and delete FSx when the job completes.


Block Storage

Amazon EBS

  • Max IOPS: 256K (io2 Block Express)
  • Baseline IOPS (gp2): 3 IOPS per GiB
| gp2 performance | Details |
| --- | --- |
| Latency | Single-digit ms |
| Burst | Up to 3,000 IOPS |
| Max baseline | 16,000 IOPS |

Scaling strategy: increase volume size to increase baseline IOPS.
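The baseline, burst, and cap relationship can be sketched as follows (function names are mine):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100, capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))

def gp2_can_burst(size_gib: int) -> bool:
    """Volumes with a baseline below 3,000 IOPS can burst up to 3,000."""
    return gp2_baseline_iops(size_gib) < 3_000

# A 1,000 GiB volume's baseline already equals the 3,000 IOPS burst level,
# and around 5,334 GiB the baseline hits the 16,000 IOPS maximum.
```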

Instance Store

  • SSD-based instance store supports >1M IOPS for random reads
  • Higher performance than EBS, but ephemeral

Databases

Amazon RDS

  • Oracle RAC is not supported in RDS
  • RMAN, Multitenant, Unified Auditing, Database Vault not supported
  • RAC must run on EC2

Replication:

  • Multi-AZ — synchronous
  • Read Replicas — asynchronous (AZ, cross-AZ, cross-Region)

Encryption:

  • In transit — SSL / native encryption
  • At rest — must be enabled at creation time
  • TDE is supported for SQL Server Enterprise and Oracle Enterprise
  • MySQL requires application-managed encryption

Aurora

  • Six-way synchronous storage replication across AZs
  • Aurora Replicas can be failover targets
  • Engine upgrades are applied to the writer and all replicas in a cluster at the same time
  • Global Database replication lag is typically under 1 second; a secondary Region can be promoted in under 1 minute

RDS Proxy

  • Handles connection management during failover

DynamoDB

  • Max item size: 400 KB
  • Supports Web Identity Federation and fine-grained access
  • TTL enables automatic item expiration
  • TTL deletes items without consuming write capacity

Use cases:

  • Expiring user or sensor data
  • Archiving via Streams + Lambda
  • Regulatory retention

Analytics & Warehousing

Amazon Redshift

  • Automated snapshots every 8 hours or 5 GB/node
  • Default retention: 1 day
  • Cross-region backup via snapshot copy (not replication)
  • Snapshots stored in S3 using SSL

Key points:

  • OLAP, not OLTP
  • Requires KMS snapshot copy grants for encrypted cross-region copies



Redshift Spectrum

  • Runs SQL queries directly against data in S3 using external tables, without loading it into the cluster
  • Table definitions live in an external data catalog such as AWS Glue

In-Memory Databases

Amazon MemoryDB

  • Durable, in-memory Redis-compatible database
  • Multi-AZ transaction log
  • Microsecond read latency
  • More durable than ElastiCache

Time Series

Amazon Timestream

  • Purpose-built for time series analytics
  • Memory tier for recent data
  • Cost-optimised storage for historical data
  • Serverless and auto-scaling

Integrations:

  • IoT Core, Kinesis, MSK
  • QuickSight, Grafana
  • SageMaker

Data Transfer & Migration

Snow Family

Snowmobile

  • Up to 100 PB per device
  • Best for datasets >10 PB
  • Can import directly into Glacier

Snowball

  • Up to 80 TB usable storage
  • Data copied to S3, then lifecycle to Glacier
  • Cannot copy directly into Glacier

Performance tips (best to worst):

  1. Latest client
  2. Batch small files
  3. Parallel copies
  4. Multiple workstations
  5. Copy directories
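Tip 2 matters because per-file overhead dominates when transferring many small objects; batching them into an archive first keeps the copy streaming. A toy illustration using only the standard library (all paths are temporary):

```python
import pathlib
import tarfile
import tempfile

def batch_into_tar(src_dir: pathlib.Path, archive: pathlib.Path) -> int:
    """Pack every file in src_dir into one tar archive; return the count."""
    count = 0
    with tarfile.open(archive, "w") as tar:
        for f in sorted(src_dir.iterdir()):
            if f.is_file():
                tar.add(f, arcname=f.name)
                count += 1
    return count

# Demo: five tiny files become a single archive to copy to the device.
root = pathlib.Path(tempfile.mkdtemp())
src = root / "small-files"
src.mkdir()
for i in range(5):
    (src / f"file_{i}.txt").write_text("payload")
batched = batch_into_tar(src, root / "batch.tar")
```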

Migration Strategies (6 Rs)

  1. Rehost
  2. Replatform
  3. Repurchase
  4. Refactor
  5. Retire
  6. Retain

Application Discovery Service

  • Collects server configuration, usage, dependencies
  • Exports CSV for TCO analysis
  • Integrated with Migration Hub

Migration Hub

  • Central migration tracking
  • Home Region required
  • Data exportable to Athena and QuickSight

AWS DataSync

  • Accelerated online data transfer
  • Up to 10× faster than CLI tools
  • Single agent can saturate 10 Gbps

AWS DMS

  • Data validation compares source and target
  • Supports full load and CDC
  • Validation consumes additional resources


What’s Next

In the next post, we shift away from data and into the plumbing that makes everything work. Post 5 focuses on hybrid connectivity, VPC networking, and traffic inspection services — the layers that sit between identity and applications. These topics show up relentlessly in SAP-C02 scenarios, often buried inside long questions where understanding traffic flow, trust boundaries, and service limits matters more than memorising features. If you can reason about how data actually moves through AWS, you’ll spot the correct architecture under exam pressure far faster.

Catch up on the series:

  1. Last-Minute Revision for the Solutions Architect – Professional exam – Introduction
  2. AWS Organisations
  3. Policies and Encryptions
  4. Data Storage – this post
  5. Networking
  6. Serverless
  7. Final

I would like to hear your thoughts!