1. **Problem Statement:** FreshMart receives sales data from multiple sources every day, and its ETL process must handle inconsistent product codes, duplicate customer names, and missing timestamps in those records.
2. **ETL Challenges:**
- Inconsistent product codes across branches cause difficulty in data integration.
- Duplicate customer names lead to inaccurate customer identification.
- Missing timestamps in sales records affect data completeness and temporal analysis.
- Multiple data sources arriving daily increase complexity in data consolidation.
3. **Transformation Rules to Resolve Data Quality Problems:**
- Standardize product codes by mapping branch-specific codes to a unified code system using a lookup table.
- Deduplicate customer records by matching on a unique identifier where one exists, falling back to fuzzy name matching where it does not.
- Handle missing timestamps by imputing a value (a default or an interpolated estimate), flagging the record as incomplete, or requesting a correction from the source.
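
The three rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FreshMart's actual pipeline: the lookup table contents, the field names (`customer_id`, `sold_at`, `timestamp_imputed`), and the choice of the load date as the default timestamp are all hypothetical. Fuzzy matching is not shown; the deduplication here assumes a unique identifier is available.

```python
from datetime import date

# Hypothetical lookup table: (branch, branch-specific code) -> unified code.
PRODUCT_CODE_MAP = {
    ("north", "MLK-01"): "P1001",
    ("south", "milk_1l"): "P1001",
}

def standardize_code(branch, local_code):
    """Return the unified product code, or None if the pair is unmapped."""
    return PRODUCT_CODE_MAP.get((branch, local_code))

def dedupe_customers(customers):
    """Keep the first record seen for each customer_id (assumed unique key)."""
    seen = set()
    unique = []
    for customer in customers:
        if customer["customer_id"] not in seen:
            seen.add(customer["customer_id"])
            unique.append(customer)
    return unique

def flag_missing_timestamp(record, load_date):
    """Fill a missing sold_at with the load date and mark the record as imputed."""
    cleaned = dict(record)  # copy so the source record is left untouched
    if not cleaned.get("sold_at"):
        cleaned["sold_at"] = load_date.isoformat()
        cleaned["timestamp_imputed"] = True
    else:
        cleaned["timestamp_imputed"] = False
    return cleaned
```

Keeping the imputation flag on the record lets later temporal analysis exclude or down-weight rows whose timestamps were not observed directly.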
4. **Incremental Loading vs Full Loading:**
- Incremental loading processes only new or changed data, reducing processing time and resource usage.
- Full loading reloads the entire dataset on every run, which repeats work on unchanged records and is inefficient when data arrives daily.
- Incremental loading minimizes downtime and supports near real-time data availability.
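
The difference between the two strategies can be sketched as follows. This is an illustrative example only: it assumes each source row carries an `updated_at` value in ISO 8601 format (so string comparison matches chronological order) and a `sale_id` key, and it uses a plain dictionary as a stand-in for the warehouse table. None of these names come from the problem statement.

```python
def incremental_load(source_rows, warehouse, last_watermark):
    """Upsert only rows changed after last_watermark; return the new watermark."""
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:
            warehouse[row["sale_id"]] = row  # insert new row or overwrite changed row
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

def full_load(source_rows, warehouse):
    """Discard the existing contents and reload every row from the source."""
    warehouse.clear()
    for row in source_rows:
        warehouse[row["sale_id"]] = row
```

The watermark returned by one incremental run is passed to the next, so each daily run touches only the rows that arrived or changed since the previous one.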
Final answer: Incremental loading is more suitable because it efficiently handles daily data updates, reduces system load, and improves data freshness compared to full loading.