ETL Challenges

1. **Problem Statement:** FreshMart faces ETL challenges such as inconsistent product codes, duplicated customer names, and missing timestamps in sales records that arrive daily from multiple data sources.

2. **ETL Challenges:**
   - Inconsistent product codes across branches make data integration difficult, because the same product can appear under different codes.
   - Duplicate customer names lead to inaccurate customer identification: one customer may be counted twice, or two different customers who share a name may be merged.
   - Missing timestamps in sales records reduce data completeness and break time-based analysis.
   - Multiple data sources arriving daily make data consolidation more complex.

3. **Transformation Rules to Resolve Data Quality Problems:**
   - Standardize product codes by mapping branch-specific codes to a unified code system using a lookup table.
   - Deduplicate customer records by matching on unique identifiers (such as a customer ID) where available, and otherwise by fuzzy matching on the name together with another attribute, since a shared name alone does not prove two records are the same customer.
   - Impute or flag missing timestamps using default values or interpolation, or request a correction from the source system.

4. **Incremental Loading vs Full Loading:**
   - Incremental loading processes only new or changed data, reducing processing time and resource usage.
   - Full loading reloads the entire dataset on every run, which becomes increasingly wasteful for daily arrivals because only a small fraction of the data is new each day.
   - Incremental loading shortens load windows, minimizing downtime and supporting near real-time data availability.

Final answer: Incremental loading is more suitable because it handles daily data updates efficiently, reduces system load, and improves data freshness compared to full loading.
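The first transformation rule (mapping branch-specific product codes to a unified code through a lookup table) can be sketched as below. This is a minimal illustration, not FreshMart's actual pipeline: the table name `PRODUCT_CODE_MAP`, the branch names, the codes, and the field names `branch` and `product_code` are all hypothetical. Rows whose code has no mapping are set aside for review rather than loaded with an unknown code.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: (branch, branch-specific code) -> unified code.
# In a real warehouse this would be a maintained reference table.
PRODUCT_CODE_MAP: Dict[Tuple[str, str], str] = {
    ("north", "MLK-1L"): "P0001",
    ("south", "milk_1l"): "P0001",
    ("north", "BRD-WHT"): "P0002",
    ("south", "bread_white"): "P0002",
}


def standardize_product_codes(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Replace each row's branch-specific product code with the unified code.

    Returns (mapped_rows, unmapped_rows); unmapped rows are kept aside
    so someone can extend the lookup table instead of loading bad codes.
    """
    mapped: List[dict] = []
    unmapped: List[dict] = []
    for row in rows:
        # Strip stray whitespace, a common source of spurious mismatches.
        key = (row["branch"], row["product_code"].strip())
        unified = PRODUCT_CODE_MAP.get(key)
        if unified is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "product_code": unified})
    return mapped, unmapped
```

For example, a row `{"branch": "south", "product_code": "milk_1l"}` comes back with `product_code` set to `"P0001"`, the same code a `("north", "MLK-1L")` row receives, so sales of that product from both branches can be aggregated together.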
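The customer deduplication rule can be sketched as follows, assuming each record may carry a `customer_id` and a `phone` field (both hypothetical field names). When both records have an ID, the ID decides. Otherwise two records are treated as the same customer only if a corroborating attribute (here the phone number) matches *and* the normalized names are similar. The similarity score uses Python's standard `difflib.SequenceMatcher`, and the 0.9 threshold is an arbitrary illustrative choice that would need tuning on real data.

```python
from difflib import SequenceMatcher
from typing import List


def normalize_name(name: str) -> str:
    # Lowercase and collapse whitespace so "Ana  Lopez " and "ana lopez" compare equal.
    return " ".join(name.lower().split())


def is_same_customer(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # Prefer an exact match on the unique identifier when both records have one.
    if a.get("customer_id") and b.get("customer_id"):
        return a["customer_id"] == b["customer_id"]
    # Without IDs, require a matching phone number first: a shared name
    # alone is not enough evidence that two records are one customer.
    if not a.get("phone") or a.get("phone") != b.get("phone"):
        return False
    ratio = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return ratio >= threshold


def deduplicate_customers(records: List[dict]) -> List[dict]:
    # Keep the first record of each matched group. This pairwise scan is
    # O(n^2), acceptable for a sketch but not for millions of customers.
    kept: List[dict] = []
    for rec in records:
        if not any(is_same_customer(rec, k) for k in kept):
            kept.append(rec)
    return kept
```

Note that two records named "Ana Lopez" with different customer IDs both survive, which is the point of matching on identifiers rather than on names.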
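The rule for missing timestamps combines the options listed above: interpolate where possible, fall back to a default otherwise, and always record how each value was obtained so the imputed rows can be flagged for correction. The sketch below assumes rows arrive in the order they were recorded (so linear interpolation between the nearest known timestamps is meaningful), and the field names `ts` and `ts_source` and the `batch_default` parameter are hypothetical.

```python
from datetime import datetime
from typing import List


def fill_missing_timestamps(rows: List[dict], batch_default: datetime) -> List[dict]:
    """Impute missing 'ts' values and record how each one was filled.

    ts_source is 'original', 'interpolated', or 'default', so downstream
    analysis can exclude or audit imputed values.
    """
    known = [i for i, r in enumerate(rows) if r.get("ts") is not None]
    out: List[dict] = []
    for i, row in enumerate(rows):
        if row.get("ts") is not None:
            out.append({**row, "ts_source": "original"})
            continue
        prev_i = max((k for k in known if k < i), default=None)
        next_i = min((k for k in known if k > i), default=None)
        if prev_i is not None and next_i is not None:
            # Linear interpolation by row position between the bracketing timestamps.
            t0, t1 = rows[prev_i]["ts"], rows[next_i]["ts"]
            frac = (i - prev_i) / (next_i - prev_i)
            out.append({**row, "ts": t0 + (t1 - t0) * frac, "ts_source": "interpolated"})
        else:
            # No timestamp on one side: use the batch default and flag it.
            out.append({**row, "ts": batch_default, "ts_source": "default"})
    return out
```

Interpolation is only a guess, so whether it is acceptable depends on the analysis; for revenue-by-hour reports, routing `default` and `interpolated` rows back to the source branch for correction may be the safer choice.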
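The difference between full and incremental loading can be made concrete with a watermark (high-water mark) approach, one common way to implement incremental loads; the source does not specify a mechanism, so this is an assumption. The sketch assumes every source row carries a reliable `updated_at` change timestamp and a `sale_id` key (hypothetical names), which is one more reason the missing-timestamp rule must run first. A full load rebuilds the whole target; an incremental load upserts only rows changed since the last watermark and then advances it.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def full_load(source: List[dict]) -> Dict[str, dict]:
    # Rebuild the whole target from scratch: cost grows with total history.
    return {row["sale_id"]: row for row in source}


def incremental_load(
    source: List[dict], target: Dict[str, dict], watermark: datetime
) -> Tuple[int, datetime]:
    """Upsert only rows changed after the watermark.

    Returns (rows_processed, new_watermark); cost grows with the day's changes only.
    """
    changed = [r for r in source if r["updated_at"] > watermark]
    for row in changed:
        # Insert new sales and overwrite corrected ones (upsert by key).
        target[row["sale_id"]] = row
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return len(changed), new_watermark
```

A strict `>` comparison can miss rows that arrive late with a timestamp at or before the stored watermark; production pipelines often re-read a small overlap window or use change-data-capture instead.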