Last-Minute Revision for the SAP-C02 exam — Data Storage (Post 4/7)

– AWS Storage, Databases, and Migration notes

Welcome to the 4th post in our seven-part revision series for the AWS Certified Solutions Architect – Professional (SAP-C02) exam.


This post focuses on the backbone of data-heavy architectures: AWS storage, databases, analytics, and migration services. These topics are rarely tested in isolation. In SAP-C02, they usually appear inside cost, durability, performance, and recovery scenarios where small service limitations make or break the correct answer. The goal here is to reinforce how AWS data services behave under real architectural constraints — so you can recognise the right design quickly, not just the familiar service name.


Static Storage & Object Services

Amazon S3 — Core Behaviour & Limits

  • You can’t use a bucket policy to prevent deletions or transitions by an S3 Lifecycle rule. For example, even if your bucket policy denies all actions for all principals, your S3 Lifecycle configuration still functions as normal.
  • After you create an Amazon S3 bucket, up to 24 hours can pass before the bucket name propagates across all AWS Regions.
  • There are no S3 data transfer charges when data is transferred into AWS from the internet.
  • With S3 Transfer Acceleration, you pay only for transfers that are actually accelerated; if a transfer doesn’t use the accelerator, the end user pays no acceleration charge for the upload.
  • An application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
  • You can configure Object Lock only when you first create a bucket.
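The per-prefix request rates above scale linearly with the number of prefixes, which is why spreading keys across prefixes is the standard answer to S3 throughput questions. A minimal sketch of the arithmetic (constants and function name are mine):

```python
# Per-prefix S3 request-rate baselines, as stated in the bullet above.
WRITE_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500    # GET/HEAD

def aggregate_rps(num_prefixes: int) -> tuple[int, int]:
    """Spreading keys evenly over N prefixes multiplies the
    per-prefix baselines by N: returns (writes/s, reads/s)."""
    return (num_prefixes * WRITE_RPS_PER_PREFIX,
            num_prefixes * READ_RPS_PER_PREFIX)

# e.g. 10 prefixes supports roughly 35,000 writes/s and 55,000 reads/s
```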

S3 Networking & Connectivity

  • You must use a public virtual interface (VIF) from AWS Direct Connect to access an S3 bucket. Private VIFs only work with resources using private IPs in a VPC.
  • Gateway attachments are used with Transit Gateways, not S3 endpoints.
  • Gateway endpoints support IPv4 only.
  • You can’t use the aws:SourceIp condition for requests that traverse an S3 VPC endpoint. Use aws:VpcSourceIp instead, or control access via route tables.
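A hedged sketch of what that condition looks like in a bucket-policy statement, expressed here as a Python dict; the bucket name and CIDR are placeholders:

```python
# Deny access to the (placeholder) bucket unless the request arrives from
# the expected private CIDR. aws:SourceIp would not match for traffic
# through a gateway VPC endpoint, because the request carries a private
# source address; aws:VpcSourceIp is the condition key that does match.
statement = {
    "Sid": "DenyOutsideVpcCidr",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": "arn:aws:s3:::example-bucket/*",
    "Condition": {
        "NotIpAddress": {"aws:VpcSourceIp": "10.0.0.0/16"}
    },
}
```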

S3 Access Control Patterns

  • Cross-account IAM roles can be used for programmatic and console access to S3 objects.

Preventing Uploads of Unencrypted Objects

To encrypt an object at upload time, include the x-amz-server-side-encryption header in the request.

Supported values:

  • AES256 — SSE-S3 (S3-managed keys)
  • aws:kms — SSE-KMS (KMS-managed keys)

Example:

PUT /example-object HTTP/1.1
Host: myBucket.s3.amazonaws.com
Date: Wed, 8 Jun 2016 17:50:00 GMT
Authorization: authorization string
Content-Type: text/plain
Content-Length: 11434
x-amz-meta-author: Janet
Expect: 100-continue
x-amz-server-side-encryption: AES256

[11434 bytes of object data]

To enforce encryption, use a bucket policy that denies PutObject requests without this header.
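A sketch of such a deny policy as a Python dict; the bucket name is a placeholder, and the canonical version in the AWS blog post below also adds a Null condition to catch requests that omit the header entirely:

```python
import json

# Deny PutObject when the encryption header is present but not an
# allowed value (SSE-S3 or SSE-KMS).
deny_unencrypted = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyIncorrectEncryptionHeader",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "StringNotEquals": {
                "s3:x-amz-server-side-encryption": ["AES256", "aws:kms"]
            }
        },
    }],
}
print(json.dumps(deny_unencrypted, indent=2))
```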

Reference:
https://aws.amazon.com/blogs/security/how-to-prevent-uploads-of-unencrypted-objects-to-amazon-s3/

CloudFront with S3 Website Endpoints

To avoid Access Denied errors when CloudFront uses an S3 website endpoint:

  • Objects must be publicly accessible.
  • Objects cannot be encrypted with AWS KMS.
  • Bucket policy must allow s3:GetObject.
  • Bucket owner must also own the objects.
  • S3 Block Public Access must be disabled.
  • If Requester Pays is enabled, include request-payer.
  • Validate any Referer header restrictions.
  • Objects must exist in the bucket.

S3 Intelligent-Tiering

Access tiers:

  1. Frequent Access (automatic)
  2. Infrequent Access — no access for 30 consecutive days
  3. Archive Instant Access — no access for 90 consecutive days
  4. Archive Access (optional)
     • Same performance as Glacier Flexible Retrieval
     • 3–5 hour retrieval
     • Restore starts within minutes via Batch Operations
  5. Deep Archive Access (optional)
     • No access for 180 consecutive days
     • Same performance as Glacier Deep Archive

Notes:

  • Objects smaller than 128 KB are not monitored and are never auto-tiered.
  • The days-without-access thresholds for the optional Archive Access and Deep Archive Access tiers are configurable (up to 730 days).
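To land objects in Intelligent-Tiering automatically, a lifecycle rule can transition them at creation. A minimal sketch of the rule shape that boto3's `put_bucket_lifecycle_configuration` accepts (the rule ID is mine):

```python
# Lifecycle configuration moving every object in the bucket into
# INTELLIGENT_TIERING as soon as the rule is evaluated.
lifecycle_configuration = {
    "Rules": [{
        "ID": "all-objects-to-intelligent-tiering",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix = whole bucket
        "Transitions": [{
            "Days": 0,
            "StorageClass": "INTELLIGENT_TIERING",
        }],
    }]
}
```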

S3 Access Points

Key features:

  1. Dedicated hostname, ARN, and IAM policy
  2. Block Public Access is enabled by default
  3. Scoped to an account and a region
  4. Custom IAM permissions per application or user
  5. Prefix-level access control
  6. Can restrict access to a specific VPC


IAM Access Analyzer for S3

IAM Access Analyzer for S3 flags buckets that grant access outside your account or organisation, for example through bucket policies, bucket ACLs, or access points.

Storage Gateway

  • Storage Gateway does not auto-refresh cache when objects are uploaded directly to S3.
  • You must run RefreshCache to see changes.

Glacier

  • A Glacier vault is a container for archives.
  • Vault Lock policies enable compliance controls like WORM and can be locked permanently.
  • Glacier Select cannot be used on compressed data.

File Storage Services

Amazon EFS

  • Fully managed, elastic NFS file system
  • Scales to petabytes automatically
  • Low, consistent latency with throughput exceeding 10 GB/s
  • Regional service

Connectivity:

  • EC2 across AZs, Regions, and VPCs
  • Inter-Region VPC peering
  • On-premises via Direct Connect or VPN

Notes:

  • General Purpose mode prioritises latency over throughput

Amazon FSx

Capacity expansion: you can increase the storage capacity of an existing file system; the new capacity is available within minutes, after which FSx rebalances data in the background.

FSx for Lustre with S3

  • S3 objects appear as files
  • Imports existing and new S3 objects
  • Supports lazy loading
  • Writes data back to S3
  • Data repository tasks simplify sync

Recommended pattern:

Use S3 Intelligent-Tiering for long-term storage, FSx for Lustre for temporary shared high-performance workloads, and delete FSx when the job completes.


Block Storage

Amazon EBS

  • Max IOPS: 256K (io2 Block Express)
  • Baseline IOPS (gp2): 3 IOPS per GiB
| gp2 performance | Details |
| --- | --- |
| Latency | Single-digit ms |
| Burst | Up to 3,000 IOPS |
| Max baseline | 16,000 IOPS |

Scaling strategy: increase volume size to increase baseline IOPS.
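The baseline, burst, and cap relationship can be sketched as follows (function names are mine):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100, capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))

def gp2_can_burst(size_gib: int) -> bool:
    """Volumes with a baseline below 3,000 IOPS can burst up to 3,000."""
    return gp2_baseline_iops(size_gib) < 3_000

# A 1,000 GiB volume's baseline already equals the 3,000 IOPS burst level,
# and around 5,334 GiB the baseline hits the 16,000 IOPS maximum.
```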

Instance Store

  • SSD-based instance store supports >1M IOPS for random reads
  • Higher performance than EBS, but ephemeral

Databases

Amazon RDS

  • Oracle RAC is not supported in RDS
  • RMAN, Multitenant, Unified Auditing, Database Vault not supported
  • RAC must run on EC2

Replication:

  • Multi-AZ — synchronous
  • Read Replicas — asynchronous (AZ, cross-AZ, cross-Region)

Encryption:

  • In transit — SSL / native encryption
  • At rest — must be enabled at creation time
  • TDE is supported for SQL Server Enterprise and Oracle Enterprise
  • MySQL requires application-managed encryption

Aurora

  • Six-way synchronous storage replication across AZs
  • Aurora Replicas can be failover targets
  • Engine upgrades are applied to the writer and all replicas in a cluster at the same time
  • Global Database replication lag is typically under 1 second; a secondary Region can be promoted in under 1 minute

RDS Proxy

  • Handles connection management during failover

DynamoDB

  • Max item size: 400 KB
  • Supports Web Identity Federation and fine-grained access
  • TTL enables automatic item expiration
  • TTL deletes items without consuming write capacity

Use cases:

  • Expiring user or sensor data
  • Archiving via Streams + Lambda
  • Regulatory retention

Analytics & Warehousing

Amazon Redshift

  • Automated snapshots every 8 hours or 5 GB/node
  • Default retention: 1 day
  • Cross-region backup via snapshot copy (not replication)
  • Snapshots stored in S3 using SSL

Key points:

  • OLAP, not OLTP
  • Requires KMS snapshot copy grants for encrypted cross-region copies



Redshift Spectrum

  • Runs SQL queries directly against data in S3 using external tables, without loading it into the cluster
  • Table definitions live in an external data catalog such as AWS Glue

In-Memory Databases

Amazon MemoryDB

  • Durable, in-memory Redis-compatible database
  • Multi-AZ transaction log
  • Microsecond read latency
  • More durable than ElastiCache

Time Series

Amazon Timestream

  • Purpose-built for time series analytics
  • Memory tier for recent data
  • Cost-optimised storage for historical data
  • Serverless and auto-scaling

Integrations:

  • IoT Core, Kinesis, MSK
  • QuickSight, Grafana
  • SageMaker

Data Transfer & Migration

Snow Family

Snowmobile

  • Up to 100 PB per device
  • Best for datasets >10 PB
  • Can import directly into Glacier

Snowball

  • Up to 80 TB usable storage
  • Data copied to S3, then lifecycle to Glacier
  • Cannot copy directly into Glacier

Performance tips (best to worst):

  1. Latest client
  2. Batch small files
  3. Parallel copies
  4. Multiple workstations
  5. Copy directories
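Tip 2 matters because per-file overhead dominates when transferring many small objects; batching them into an archive first keeps the copy streaming. A toy illustration using only the standard library (all paths are temporary):

```python
import pathlib
import tarfile
import tempfile

def batch_into_tar(src_dir: pathlib.Path, archive: pathlib.Path) -> int:
    """Pack every file in src_dir into one tar archive; return the count."""
    count = 0
    with tarfile.open(archive, "w") as tar:
        for f in sorted(src_dir.iterdir()):
            if f.is_file():
                tar.add(f, arcname=f.name)
                count += 1
    return count

# Demo: five tiny files become a single archive to copy to the device.
root = pathlib.Path(tempfile.mkdtemp())
src = root / "small-files"
src.mkdir()
for i in range(5):
    (src / f"file_{i}.txt").write_text("payload")
batched = batch_into_tar(src, root / "batch.tar")
```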

Migration Strategies (6 Rs)

  1. Rehost
  2. Replatform
  3. Repurchase
  4. Refactor
  5. Retire
  6. Retain

Application Discovery Service

  • Collects server configuration, usage, dependencies
  • Exports CSV for TCO analysis
  • Integrated with Migration Hub

Migration Hub

  • Central migration tracking
  • Home Region required
  • Data exportable to Athena and QuickSight

AWS DataSync

  • Accelerated online data transfer
  • Up to 10× faster than CLI tools
  • Single agent can saturate 10 Gbps

AWS DMS

  • Data validation compares source and target
  • Supports full load and CDC
  • Validation consumes additional resources


What’s Next

In the next post, we shift away from data and into the plumbing that makes everything work. Post 5 focuses on hybrid connectivity, VPC networking, and traffic inspection services — the layers that sit between identity and applications. These topics show up relentlessly in SAP-C02 scenarios, often buried inside long questions where understanding traffic flow, trust boundaries, and service limits matters more than memorising features. If you can reason about how data actually moves through AWS, you’ll spot the correct architecture under exam pressure far faster.

Catch up on the series:

  1. Last-Minute Revision for the Solutions Architect – Professional exam – Introduction
  2. AWS Organisations
  3. Policies and Encryptions
  4. Data Storage – this post
  5. Networking
  6. Serverless
  7. Final

I would like to hear your thoughts!