AWS Certified Data Engineer Associate


High-priority topics

S3, Glue, Athena, Kinesis, Redshift, security.

Target readiness


Ready when two full practice exams score ≥80% and all weak topics are closed.


Fastest realistic strategy

1. Do not watch passively.
Watch the Udemy lessons at 1.25–1.5x, then immediately answer 10–20 questions on the topic.

2. Build reflexes.
For every scenario, identify: source → latency → transformation → target → governance → monitoring.

3. Error log is mandatory.
Every wrong answer becomes a rule. Revisit the rule 24h later and during final review.

Exam focus by official weight

Domain	Weight	Fast-track implication
D1 Data Ingestion & Transformation	34%	Highest priority: Kinesis, Firehose, Glue, Lambda, EventBridge, Step Functions, DMS, formats.
D2 Data Store Management	26%	S3, Redshift, DynamoDB, Glue Catalog, partitions, lifecycle, schema evolution, Iceberg.
D3 Data Operations & Support	22%	Monitoring, troubleshooting, data quality, Athena/Redshift SQL, CloudWatch, CloudTrail, Glue failures.
D4 Data Security & Governance	18%	IAM, KMS, Lake Formation, Macie, Secrets Manager, audit logs, data sharing, sovereignty.

Master Tracker

Complete in this order. “High” means exam-heavy or frequently confusing; “Medium” means know service selection and key trade-offs; “Low” means skim unless practice exams expose weakness.

Done	Priority	Exam domain	Udemy course block / lessons	What to know for the exam	Confidence	Notes

Weekly / Daily Schedule

Aggressive 3-week plan. If a day is too heavy, move its review block to the weekend, but keep practice questions from Day 5 onwards.

AWS exam guide ↔ Udemy course mapping

AWS domain / task	Udemy sections to complete	Fast-track treatment
D1.1 Ingestion: streaming/batch, APIs, schedulers, events, Lambda from Kinesis, throttling, fan-in/out, replayability.	Analytics: Kinesis Data Streams, Firehose, Managed Service for Apache Flink, MSK; Migration: DMS/DataSync; App Integration: EventBridge, SQS/SNS; Storage: S3 event notifications.	Deep: memorize service-selection rules and failure modes.
D1.2 Transform/process: EMR, Glue, Lambda, Redshift, format conversion, cost/performance.	Analytics: Glue, EMR, Athena Spark; Compute: Lambda; Containers: ECS/EKS basics.	Deep: Glue vs Lambda vs EMR vs Redshift decisions.
D1.3 Orchestration: MWAA, Step Functions, Glue workflows, EventBridge, notifications.	Application Integration: Step Functions, EventBridge, MWAA, SQS/SNS; Analytics: Glue Workflows/Bookmarks.	Deep: triggers, retries, DLQs, state machines, DAGs.
D1.4 Programming concepts: Lambda tuning, SAM, IaC, CI/CD, Git, distributed computing.	Fundamentals: Git/SQL; Compute: Lambda & SAM; Developer Tools: CloudFormation/CDK/CodePipeline if present.	Selective: concepts only; no language syntax.
D2.1 Choose data store: Redshift, RDS, DynamoDB, EMR, Lake Formation, vectors, Iceberg.	Storage; Database; Analytics; GenAI/vector lessons if present.	Deep: data-store selection table.
D2.2 Cataloging: Glue Catalog, crawlers, Hive metastore, partitions, connections.	Analytics: Glue, Hive, Glue Catalog; Storage: S3 Tables/Iceberg.	Deep: catalog + partition sync + schema discovery.
D2.3 Lifecycle: S3 lifecycle, versioning, DynamoDB TTL, Redshift COPY/UNLOAD.	Storage: S3 lifecycle/versioning; Database: DynamoDB TTL; Database/Analytics: Redshift COPY/UNLOAD.	Deep: cost + retention + legal deletion patterns.
D2.4 Models/schema: Redshift, DynamoDB, Lake Formation, DMS/SCT, lineage, partitions/compression.	Fundamentals: modeling/schema evolution; Database: DynamoDB/Redshift; Migration: DMS/SCT; Analytics: Lake Formation.	Deep: star schema vs key-value vs lakehouse.
D3 Operations: automate, analyze, monitor, troubleshoot, data quality.	Analytics: Glue Data Quality/DataBrew, Athena, Redshift; Management: CloudWatch/CloudTrail/Config; App Integration.	Practice-heavy: learn via wrong answers and scenarios.
D4 Security/Governance: IAM, KMS, Lake Formation, Macie, Secrets Manager, CloudTrail Lake, privacy.	Security, Identity & Compliance; Analytics: Lake Formation; Storage: S3 encryption/access points; Redshift security.	Deep: least privilege + fine-grained governance + audit.

Practice Exam Tracker

Exam	Date	Score %	Pass?	Top weak domain	Action

Error Log

Do not just write the correct answer. Write the rule you missed.

Topic	Wrong assumption	Correct rule to memorize	Closed?

Cheat Sheet – Domain 1: Data Ingestion & Transformation (34%)

Core decision tree

Streaming, replay, multiple consumers: Kinesis Data Streams or MSK.
Streaming to S3/Redshift/OpenSearch with low ops: Amazon Data Firehose (formerly Kinesis Data Firehose).
CDC from databases: AWS DMS.
SaaS → S3/Redshift: AppFlow.
Files on-prem → S3/EFS/FSx: DataSync (bulk or scheduled copies) or Transfer Family (SFTP/FTPS/FTP endpoints into S3/EFS).
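The decision tree above can be encoded as a small lookup function, which is a handy way to drill it. This is a minimal sketch: the function name and the `source` labels ("database_cdc", "saas", "on_prem_files") are hypothetical, and the rules are only the five listed here, not a complete service-selection guide.

```python
def choose_ingestion(source: str, streaming: bool = False,
                     needs_replay: bool = False,
                     multiple_consumers: bool = False) -> str:
    """Map a scenario to an ingestion service using the cheat-sheet rules.

    `source` is a hypothetical label: "database_cdc", "saas",
    "on_prem_files", or anything else (e.g. "app_events").
    """
    if source == "database_cdc":
        return "DMS"  # change data capture from databases
    if source == "saas":
        return "AppFlow"
    if source == "on_prem_files":
        return "DataSync or Transfer Family"
    if streaming:
        # Replay or several independent readers point to a stream, not a delivery service.
        if needs_replay or multiple_consumers:
            return "Kinesis Data Streams or MSK"
        return "Firehose"  # low-ops delivery to S3/Redshift/OpenSearch
    return "not covered by these rules"


print(choose_ingestion("app_events", streaming=True, multiple_consumers=True))
print(choose_ingestion("app_events", streaming=True))
```

Ordering matters: the source-specific rules are checked before the generic streaming rules, which mirrors how exam scenarios usually name the source first.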

Transform choice

Small/event transform: Lambda.
Serverless Spark ETL: Glue.
Big Spark/Hadoop control: EMR.
SQL transform in warehouse: Redshift.
Streaming analytics/windowing: Managed Service for Apache Flink.
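The transform rules can be sketched the same way. The one hard number worth memorizing is that a Lambda invocation can run for at most 15 minutes, so "small/event transform" also means "finishes well inside that limit". The function and flag names below are hypothetical; the default branch simply reflects the cheat-sheet preference for Glue as serverless Spark ETL.

```python
LAMBDA_MAX_TIMEOUT_MINUTES = 15  # Lambda's maximum timeout per invocation


def choose_transform(est_minutes: float, *, small_event: bool = False,
                     streaming_windows: bool = False,
                     data_in_warehouse: bool = False,
                     needs_cluster_control: bool = False) -> str:
    """Pick a transform engine from the cheat-sheet rules (illustrative only)."""
    if streaming_windows:
        return "Managed Service for Apache Flink"
    if data_in_warehouse:
        return "Redshift SQL"
    if needs_cluster_control:
        return "EMR"
    # Lambda only fits small, event-sized work that finishes inside its timeout.
    if small_event and est_minutes < LAMBDA_MAX_TIMEOUT_MINUTES:
        return "Lambda"
    return "Glue"  # default: serverless Spark ETL


print(choose_transform(0.5, small_event=True))
print(choose_transform(45, small_event=True))
```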

Kinesis must-know

Partition key decides shard; bad keys create hot shards.
Enhanced fan-out gives each consumer dedicated read throughput (2 MB/s per shard).
Producer retries can write duplicates; consumers must be idempotent.
Data Streams = replayable within the retention period (24 hours by default, up to 365 days). Firehose = delivery service, no replay and less consumer control.
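To see why a bad partition key creates a hot shard: Kinesis Data Streams hashes each partition key with MD5 to a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch below assumes the digest is read as a big-endian integer and that shards own evenly split ranges; a real stream uses each shard's actual HashKeyRange. The tenant and order keys are made up.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Shard index for a key, assuming the hash key space is split evenly."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


def shard_load(keys: Iterable[str], shard_count: int) -> Counter:
    """Count how many records land on each shard."""
    return Counter(shard_for_key(k, shard_count) for k in keys)


# 90% of traffic shares one key, so a single shard absorbs it (hot shard).
skewed = ["tenant-1"] * 900 + [f"tenant-{i}" for i in range(2, 102)]
# A high-cardinality key spreads records across all shards.
spread = [f"order-{i}" for i in range(1000)]

print("skewed:", dict(shard_load(skewed, 4)))
print("spread:", dict(shard_load(spread, 4)))
```

Adding shards does not help the skewed case: the same key always hashes to the same range, which is why the fix is a better key, not more capacity.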

Orchestration rules

Event-driven: EventBridge.
Visual state machine, retries, audit trail: Step Functions.
Airflow DAGs: MWAA.
Glue-native ETL chains: Glue Workflows + bookmarks.
Failure isolation: SQS DLQ / SNS alerts.
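Retries and DLQs work together: retry transient failures with growing waits, then park what still fails instead of blocking the pipeline. The sketch below is a local simulation, not a Step Functions client; its parameters borrow the meaning of the Step Functions Retry fields (IntervalSeconds, MaxAttempts as the number of retries after the first try, BackoffRate), and the Python list stands in for an SQS dead-letter queue.

```python
from typing import Any, Callable, Dict, List, Optional


def retry_delays(interval_seconds: float, max_attempts: int,
                 backoff_rate: float) -> List[float]:
    """Wait before each retry: interval * backoff_rate ** (retry_number - 1)."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def run_with_retries(task: Callable[[Any], Any], payload: Any,
                     dead_letter: List[Dict[str, Any]],
                     interval_seconds: float = 1.0, max_attempts: int = 3,
                     backoff_rate: float = 2.0,
                     sleep: Callable[[float], None] = lambda s: None) -> Optional[Any]:
    """Try once, retry up to max_attempts times, then park the payload in a DLQ.

    `sleep` defaults to a no-op so the demo runs instantly; pass time.sleep
    in a real worker.
    """
    delays = retry_delays(interval_seconds, max_attempts, backoff_rate)
    for attempt in range(max_attempts + 1):
        try:
            return task(payload)
        except Exception as exc:  # demo: treat every error as retryable
            if attempt == max_attempts:
                dead_letter.append({"payload": payload, "error": repr(exc)})
                return None
            sleep(delays[attempt])
    return None  # not reached; keeps the return type explicit


def always_fails(_: Any) -> Any:
    raise RuntimeError("source unavailable")


dlq: List[Dict[str, Any]] = []
print(retry_delays(1.0, 3, 2.0))  # [1.0, 2.0, 4.0]
run_with_retries(always_fails, {"file": "batch-001.csv"}, dlq)
print(dlq)
```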

Exam traps

Do not choose Lambda for long-running heavy ETL (15-minute maximum timeout).
Do not choose Firehose when you need multiple independent consumers/replay semantics.
Do not choose EMR when the question says least operational overhead and Glue fits.
Format conversion to Parquet is often part of the best answer.

One-line memory hooks

Low ops streaming delivery → Firehose.
Replay + fan-out → Kinesis Data Streams.
Batch Spark ETL → Glue.
Complex DAG → MWAA; simple serverless workflow → Step Functions.

Cheat Sheet – Domain 2: Data Store Management (26%)

Store selection

S3: durable data lake/object storage.
Athena: serverless SQL on S3.
Redshift: high-performance warehouse/BI.
DynamoDB: low-latency key-value/document at scale.
RDS/Aurora: relational transactions.
OpenSearch: search/log analytics.

S3 lake optimization

Use Parquet/ORC for analytics.
Partition by high-value filters, not by ultra-high-cardinality fields.
Compress to reduce scan cost.
Lifecycle policies for cost and retention.
Versioning protects against accidental delete/overwrite.
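Partitioning in an S3 lake usually means Hive-style `key=value` folders, and the partition count is the product of each partition column's cardinality. This sketch shows both; the bucket `example-lake` and the `customer_id` cardinality are hypothetical.

```python
from math import prod


def partition_prefix(table_root: str, **partitions: str) -> str:
    """Build a Hive-style prefix (key=value folders) that Athena and Glue understand."""
    folders = "/".join(f"{name}={value}" for name, value in partitions.items())
    return f"{table_root.rstrip('/')}/{folders}/"


def partition_count(*cardinalities: int) -> int:
    """Partitions created when every combination of partition values occurs."""
    return prod(cardinalities)


print(partition_prefix("s3://example-lake/orders", year="2024", month="01", day="15"))
# 3 years of daily partitions stays manageable...
print(partition_count(3 * 365))
# ...adding a hypothetical 1,000,000-value customer_id column does not.
print(partition_count(3 * 365, 1_000_000))
```

The multiplication is the reason behind "not by ultra-high-cardinality fields": each extra column multiplies the number of tiny folders the catalog and every query planner must handle.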

Glue Data Catalog

Crawlers infer schema and populate tables.
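New partitions must be registered (crawler, MSCK REPAIR TABLE, ALTER TABLE ADD PARTITION) or covered by Athena partition projection; otherwise queries miss the new data.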
Glue Catalog can act as Hive metastore.
Catalog + Lake Formation enables governed discovery/access.
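One way to register partitions without a crawler is an Athena `ALTER TABLE ... ADD PARTITION` statement. The helper below only builds the DDL text; the table name, S3 root, and partition values are hypothetical, and running it (console, CLI, or SDK) is left out. For Hive-style layouts, `MSCK REPAIR TABLE` is the bulk alternative.

```python
from typing import Dict, List


def add_partitions_ddl(table: str, table_root: str,
                       partitions: List[Dict[str, str]]) -> str:
    """Athena DDL registering Hive-style partitions in the Glue Data Catalog."""
    root = table_root.rstrip("/")
    clauses = []
    for spec in partitions:
        columns = ", ".join(f"{k} = '{v}'" for k, v in spec.items())
        path = "/".join(f"{k}={v}" for k, v in spec.items())
        clauses.append(f"  PARTITION ({columns}) LOCATION '{root}/{path}/'")
    # IF NOT EXISTS makes re-running the statement harmless.
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


print(add_partitions_ddl("orders", "s3://example-lake/orders",
                         [{"year": "2024", "month": "01"},
                          {"year": "2024", "month": "02"}]))
```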

Redshift must-know

COPY loads from S3; UNLOAD exports to S3.
Spectrum queries S3 external tables.
Materialized views speed repeated queries.
Federated query accesses live RDS/Aurora data.
RA3 decouples compute/storage.
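COPY and UNLOAD are mirror images: COPY loads S3 files into a table, UNLOAD writes a query result back to S3. This sketch only builds the SQL text for the Parquet case; the table, S3 prefixes, and role ARN are placeholders. The one gotcha it encodes: UNLOAD wraps the query in single quotes, so literal quotes inside the query must be doubled.

```python
def copy_parquet(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """COPY: bulk-load Parquet files from S3 into a Redshift table."""
    return (f"COPY {table}\n"
            f"FROM '{s3_prefix}'\n"
            f"IAM_ROLE '{iam_role_arn}'\n"
            "FORMAT AS PARQUET;")


def unload_parquet(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """UNLOAD: export a query result from Redshift to S3 as Parquet."""
    # The query is wrapped in single quotes, so literal quotes inside it are doubled.
    quoted = query.replace("'", "''")
    return (f"UNLOAD ('{quoted}')\n"
            f"TO '{s3_prefix}'\n"
            f"IAM_ROLE '{iam_role_arn}'\n"
            "FORMAT AS PARQUET;")


ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder ARN
print(copy_parquet("sales", "s3://example-lake/curated/sales/", ROLE))
print(unload_parquet("SELECT * FROM venue WHERE venuestate = 'NV'",
                     "s3://example-lake/exports/venue_", ROLE))
```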

Schema/model traps

Redshift: star/snowflake, sort/dist choices.
DynamoDB: design by access patterns; avoid scans/joins.
Schema evolution: Glue Catalog table versions plus evolution-friendly formats (Avro, Parquet, Iceberg) help.
Iceberg/open table formats matter for lakehouse tables.

Lifecycle hooks

DynamoDB TTL removes old items.
S3 lifecycle transitions/expiration control cost/compliance.
Legal deletion requirement means explicit deletion/expiry policy, not just cheaper storage.
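DynamoDB TTL reads a Number attribute holding Unix epoch time in seconds; items whose value is in the past become eligible for deletion, which happens asynchronously rather than at the exact second. A minimal sketch, assuming a hypothetical session table with key `pk` and TTL enabled on a hypothetical `expires_at` attribute:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def ttl_epoch_seconds(created_at: datetime, retention_days: int) -> int:
    """Unix epoch time in seconds, the format DynamoDB TTL expects."""
    if created_at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int((created_at + timedelta(days=retention_days)).timestamp())


def session_item(session_id: str, created_at: datetime,
                 retention_days: int = 30) -> Dict[str, Dict[str, str]]:
    """Item in DynamoDB's low-level attribute-value format (Number values are strings)."""
    return {
        "pk": {"S": f"session#{session_id}"},
        # "expires_at" is a hypothetical attribute name; TTL must be enabled on it.
        "expires_at": {"N": str(ttl_epoch_seconds(created_at, retention_days))},
    }


print(session_item("123", datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

A common bug is writing milliseconds (13 digits) instead of seconds; DynamoDB then treats the item as expiring far in the future.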

Cheat Sheet – Domain 3: Data Operations & Support (22%)

Monitoring stack

CloudWatch Logs: application/service logs.
CloudWatch Metrics/Alarms: operational alerting.
CloudTrail: API activity and audit.
CloudTrail Lake: centralized queryable audit events.
Config: configuration history/compliance.

Glue troubleshooting

Check IAM role, network/VPC endpoints, source credentials.
Bookmark issues cause duplicate/missing processing.
Partition/schema mismatch breaks Athena/Glue queries.
Skew causes slow Spark jobs; repartition/salt/change key.
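Salting splits one hot key into several sub-keys so its rows spread across tasks, then a second aggregation removes the salt. The sketch below shows the two-stage idea in plain Python on a single machine; in Spark the same pattern would be a salted groupBy followed by a regroup on the original key. The customer keys and bucket count are made up.

```python
import random
from collections import Counter
from typing import Iterable, Set, Tuple


def salted_key(key: str, hot_keys: Set[str], buckets: int,
               rng: random.Random) -> Tuple[str, int]:
    """Attach a random salt to hot keys so their rows spread over several tasks."""
    salt = rng.randrange(buckets) if key in hot_keys else 0
    return (key, salt)


def two_stage_count(keys: Iterable[str], hot_keys: Set[str],
                    buckets: int = 8, seed: int = 0) -> Tuple[Counter, Counter]:
    rng = random.Random(seed)
    # Stage 1: partial aggregation on (key, salt); the hot key is split into buckets.
    partial = Counter(salted_key(k, hot_keys, buckets, rng) for k in keys)
    # Stage 2: drop the salt and merge the partial results.
    final: Counter = Counter()
    for (key, _salt), n in partial.items():
        final[key] += n
    return partial, final


keys = ["customer-42"] * 1000 + ["customer-1", "customer-2"]
partial, final = two_stage_count(keys, hot_keys={"customer-42"})
print(max(partial.values()), final["customer-42"])
```

The largest partial group is far smaller than 1000, while the final count is unchanged; that is the whole point of salting.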

Kinesis troubleshooting

Hot shard → bad partition key.
Producer throttling → batch records, retry with backoff, add shards or switch to on-demand capacity mode.
Consumer lag → scale consumers/enhanced fan-out.
Duplicates → idempotent consumer/dedup key.
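An idempotent consumer records which events it has already applied and skips repeats. A retried put gets a new sequence number, so the dedup key should be an ID the producer embeds in the record. In this sketch `event_id`, `account`, and `amount` are hypothetical fields, and the in-memory set stands in for durable storage such as a conditional write to a table.

```python
from typing import Dict, Iterable, Set


class IdempotentConsumer:
    """Applies each event at most once, keyed by a producer-assigned event ID."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()  # durable storage in a real consumer
        self.balances: Dict[str, int] = {}

    def process(self, records: Iterable[Dict]) -> None:
        for record in records:
            event_id = record["event_id"]  # hypothetical field set by the producer
            if event_id in self.seen:
                continue  # duplicate from a producer retry or a replay: skip
            self.seen.add(event_id)
            account = record["account"]
            self.balances[account] = self.balances.get(account, 0) + record["amount"]


batch = [
    {"event_id": "e1", "account": "A", "amount": 10},
    {"event_id": "e1", "account": "A", "amount": 10},  # retried duplicate
    {"event_id": "e2", "account": "A", "amount": 5},
]
consumer = IdempotentConsumer()
consumer.process(batch)
consumer.process(batch)  # replaying the whole batch changes nothing
print(consumer.balances)  # {'A': 15}
```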

Data quality

Completeness, consistency, validity, accuracy, uniqueness.
Glue Data Quality rules can fail job or publish metrics.
DataBrew for visual profiling/cleaning.
Sampling: random or stratified; with skewed data, stratify so small groups are still represented.
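Two of these dimensions reduce to simple ratios, and stratified sampling is just "sample each group separately". Glue Data Quality expresses such checks as DQDL rules (for example `IsComplete` and `IsUnique`); the plain-Python sketch below only illustrates the arithmetic and is not the Glue API. The order rows are made up.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def completeness(rows: List[Row], column: str) -> float:
    """Share of rows where the column is present and not None."""
    if not rows:
        return 0.0
    return sum(row.get(column) is not None for row in rows) / len(rows)


def uniqueness(rows: List[Row], column: str) -> float:
    """Distinct non-null values divided by non-null values (1.0 = no duplicates)."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(set(values)) / len(values) if values else 0.0


def stratified_sample(rows: List[Row], column: str, fraction: float,
                      seed: int = 0) -> List[Row]:
    """Sample each group separately so rare groups keep at least one row.

    Assumes 0 < fraction <= 1.
    """
    rng = random.Random(seed)
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row.get(column)].append(row)
    sample: List[Row] = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


orders = [{"id": 1, "country": "DE"}, {"id": 2, "country": "DE"},
          {"id": 2, "country": None}, {"id": None, "country": "FR"}]
print(completeness(orders, "id"))  # 0.75
print(uniqueness(orders, "id"))    # 2 distinct out of 3 non-null ids
```

A plain 10% random sample of 100 "DE" rows and 3 "LU" rows could easily contain no "LU" row at all; the stratified version always keeps one.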

Analysis rules

Athena for ad hoc S3 SQL.
Redshift for repeated, high-performance analytics.
QuickSight for visualization.
Provisioned vs serverless is usually cost predictability/control vs low ops/elasticity.

Exam traps

Audit question → CloudTrail, not CloudWatch.
Application logs → CloudWatch Logs.
Configuration drift → AWS Config.
Data quality during ETL → Glue Data Quality/DataBrew, not only Athena.

Cheat Sheet – Domain 4: Data Security & Governance (18%)

Access control

Least privilege always wins.
IAM roles for AWS services; avoid long-lived credentials.
Resource policies for S3/cross-account access.
S3 Access Points simplify access at scale.
PrivateLink/VPC endpoints for private service access.
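Least privilege in practice means naming the exact actions and resources. The sketch builds an IAM policy document that lets a role list and read only one S3 prefix; the bucket `example-lake` and prefix `curated/sales` are hypothetical, and attaching the policy is out of scope.

```python
import json
from typing import Any, Dict


def read_only_prefix_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """IAM policy allowing list + read on one prefix of one bucket, nothing more."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN; the condition limits it to the prefix.
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # GetObject applies to object ARNs under the prefix.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


print(json.dumps(read_only_prefix_policy("example-lake", "curated/sales"), indent=2))
```

No wildcards in `Action` and no `s3:PutObject`: if a scenario only needs to read, that is the answer shape to look for.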

Lake Formation

Fine-grained data lake permissions.
Works with S3, Athena, Redshift, EMR.
Database/table/column-level governance, plus row- and cell-level data filters.
Use when IAM/S3 bucket policy is too coarse.

Encryption

KMS for managed key control and audit.
SSE-S3: simple default S3 encryption.
SSE-KMS: key-policy control and CloudTrail audit of key use; cross-account access also needs permission in the KMS key policy.
TLS/HTTPS for in transit.
Secrets Manager for credential storage/rotation.

Privacy & governance

Macie identifies sensitive data/PII in S3.
Mask/anonymize where compliance requires.
SCP/IAM/S3 policies can restrict disallowed regions.
Data sovereignty = region/location control + audit.
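A region restriction is typically a Deny SCP conditioned on `aws:RequestedRegion`, with global services exempted through `NotAction` so that they keep working. The sketch builds such a policy document; the allowed regions are examples, and the exemption list is illustrative and NOT complete, so verify it against the AWS Organizations examples before use.

```python
import json
from typing import Any, Dict, List

# Global services are not bound to your chosen regions, so region-deny SCPs
# usually exempt them. This list is illustrative and NOT complete.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "organizations:*", "sts:*", "support:*",
                          "cloudfront:*", "route53:*"]


def region_restriction_scp(allowed_regions: List[str]) -> Dict[str, Any]:
    """SCP that denies regional API calls outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            "NotAction": GLOBAL_SERVICE_ACTIONS,
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
            },
        }],
    }


print(json.dumps(region_restriction_scp(["eu-central-1", "eu-west-1"]), indent=2))
```

An SCP only limits what accounts in the organization can do; the audit half of sovereignty still comes from CloudTrail.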

Audit

CloudTrail tracks API calls.
CloudWatch Logs stores app/service logs.
CloudTrail Lake enables SQL-like audit queries.
Athena/OpenSearch can analyze large log datasets.

Exam traps

Column-level data lake permissions → Lake Formation.
Find PII in S3 → Macie.
Rotate DB password → Secrets Manager.
Who changed config? → Config/CloudTrail depending on wording.

Backup, Import / Export & Profiles

Autosave is enabled.
Every checkbox, score, note and error-log entry is saved automatically in this browser under the selected profile.

Active profile

Use profiles if you want separate progress tracks, e.g. Fast Track, Retake, or another certification.

JSON backup

Export creates a timestamped JSON file with your current profile progress. Import restores values into the selected profile.


Local versioned snapshots

Create named snapshots before major changes or practice exams. Snapshots stay in this browser and can also be downloaded as JSON.


Resources & exam-day checklist

Resource	Use
Official AWS certification page	Exam logistics: 130 minutes, 65 questions, exam cost, languages, official preparation steps.
Official AWS DEA-C01 exam guide	Source of truth for domains, weights, task statements, and in-scope services.
Udemy course	Main learning path; use at 1.25–1.5x and skip low-yield sections after you pass practice thresholds.
Practice exams	Minimum: 3 full exams. Ready when consistently ≥80–85% and no repeated weak domain.

Book the exam when: you finish all high-priority rows, score ≥80% on two full practice exams, and can explain every error-log entry without notes.
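The booking rule is mechanical enough to check in code. This is a hypothetical sketch, not the tracker's own logic: the `Progress` fields are invented, and "can explain every error-log entry without notes" is approximated as "no open error-log entries".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Progress:
    high_priority_done: int
    high_priority_total: int
    practice_scores: List[float]  # percent per full practice exam
    open_error_log_entries: int   # approximation of "can explain every entry"


def ready_to_book(p: Progress, threshold: float = 80.0, exams_needed: int = 2) -> bool:
    """All high-priority rows done, enough passing practice exams, no open errors."""
    passing = sum(1 for score in p.practice_scores if score >= threshold)
    return (p.high_priority_done >= p.high_priority_total
            and passing >= exams_needed
            and p.open_error_log_entries == 0)


print(ready_to_book(Progress(10, 10, [78, 82, 85], 0)))
print(ready_to_book(Progress(10, 10, [82, 79], 0)))
```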

Use the Backup / Profiles page for visible import/export controls, profiles, and versioned snapshots.