Handling Schema Evolution in DataStori
Schema evolution in ELT pipelines is the process of handling structural changes to your data (added columns, dropped columns, renamed fields, and data type changes) without breaking downstream workflows. DataStori handles most of these changes automatically, so your pipelines keep running when source systems change. Data type changes are the exception: they stop the pipeline for review.
Why Does Schema Evolution Matter?
Source systems and business requirements are dynamic. Two common scenarios trigger schema changes:
- Source system updates: The application adds or removes a field from its API. This should not cause your pipelines to stall.
- Business requirements: Your team wants to report on new KPIs and requests additional columns in the data pipeline.
Both scenarios occur routinely over the lifetime of a data pipeline, so handling schema updates gracefully is necessary to keep pipelines running and data consistent.
How Does DataStori Handle Adding New Columns?
DataStori adds new columns to your dataset automatically. For existing rows that predate the new column, the value is set to null.
| order_id | customer_id | order_date | customer_email | total_amount |
|---|---|---|---|---|
| 1 | 101 | 2024-07-01 | null | 100.00 |
| 2 | 102 | 2024-07-02 | null | 150.00 |
| 3 | 103 | 2024-07-03 | unknown@example.com | 200.00 |
Rows 1 and 2 existed before the customer_email column was added, so they show null. Row 3 includes the new column with data.
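The backfill behavior can be sketched in a few lines of Python. This is a minimal illustration, not DataStori's implementation. It assumes rows are plain dictionaries and that `None` stands in for null. The function name `append_batch` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def append_batch(columns: list[str], rows: list[Row], batch: list[Row]) -> list[str]:
    """Append `batch` to `rows`, adding any columns not seen before.

    `rows` is modified in place. Rows that predate a new column get None
    (null) for it. Returns the updated column list.
    """
    updated = list(columns)
    for record in batch:
        for name in record:
            if name not in updated:
                updated.append(name)

    # Backfill null for rows that existed before the new columns appeared.
    for row in rows:
        for name in updated[len(columns):]:
            row[name] = None

    # Store each incoming record with the full, updated set of columns.
    for record in batch:
        rows.append({name: record.get(name) for name in updated})
    return updated


columns = ["order_id", "customer_id", "order_date", "total_amount"]
rows: list[Row] = [
    {"order_id": 1, "customer_id": 101, "order_date": "2024-07-01", "total_amount": 100.00},
    {"order_id": 2, "customer_id": 102, "order_date": "2024-07-02", "total_amount": 150.00},
]
columns = append_batch(columns, rows, [
    {"order_id": 3, "customer_id": 103, "order_date": "2024-07-03",
     "customer_email": "unknown@example.com", "total_amount": 200.00},
])
print([row["customer_email"] for row in rows])  # [None, None, 'unknown@example.com']
```

In this sketch, new columns are appended to the end of the column list. Where a column appears in the stored table is a presentation detail, and it may differ in DataStori.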
How Does DataStori Handle Dropping Columns?
DataStori does not drop columns from your dataset. If the source stops sending a column, the value is set to null for new rows, but the column and its historical data are preserved.
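The "never drop" rule comes down to an append-only column list. The sketch below is only an illustration under stated assumptions: source records are dictionaries, `None` represents null, and the helper names `evolve_columns` and `conform_record` are hypothetical.

```python
from typing import Any


def evolve_columns(existing: list[str], incoming: list[str]) -> list[str]:
    """Return the dataset's columns after a sync.

    Columns are only ever appended, never removed, so a column the source
    stops sending stays in the dataset along with its historical values.
    """
    return existing + [name for name in incoming if name not in existing]


def conform_record(columns: list[str], record: dict[str, Any]) -> dict[str, Any]:
    """Shape a source record to the dataset's columns; absent fields become None."""
    return {name: record.get(name) for name in columns}


columns = ["order_id", "customer_email", "total_amount"]
# The source no longer sends customer_email.
incoming = {"order_id": 4, "total_amount": 250.00}
columns = evolve_columns(columns, list(incoming))
print(columns)                            # ['order_id', 'customer_email', 'total_amount']
print(conform_record(columns, incoming))  # {'order_id': 4, 'customer_email': None, 'total_amount': 250.0}
```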
How Does DataStori Handle Renaming Columns?
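When a column is renamed in the source, DataStori treats the rename as two changes: the new name is added as a new column, and the old column is dropped. Both columns are preserved in the dataset. The old column shows null for new rows, and the new column shows null for old rows. In the example below, total_amount was renamed to sales_amount after order 2.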
| order_id | customer_id | total_amount | sales_amount |
|---|---|---|---|
| 1 | 101 | 100.00 | null |
| 2 | 102 | 150.00 | null |
| 3 | 103 | null | 200.00 |
| 4 | 104 | null | 250.00 |
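The following Python sketch shows why a rename surfaces as an add plus a drop when columns are matched by name. It is illustrative only. The helper `diff_columns` is hypothetical, and the sketch assumes DataStori has no rename mapping, which is consistent with the behavior described above.

```python
def diff_columns(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Compare two source column lists by name.

    Matching is by name only, so a rename shows up as one added column
    and one dropped column.
    """
    added = [name for name in new if name not in old]
    dropped = [name for name in old if name not in new]
    return added, dropped


before = ["order_id", "customer_id", "total_amount"]
after = ["order_id", "customer_id", "sales_amount"]
added, dropped = diff_columns(before, after)
print(added, dropped)  # ['sales_amount'] ['total_amount']

# The dataset keeps both columns: old rows lack sales_amount, new rows lack total_amount.
dataset_columns = before + added
record = {"order_id": 3, "customer_id": 103, "sales_amount": 200.00}
print({name: record.get(name) for name in dataset_columns})
# {'order_id': 3, 'customer_id': 103, 'total_amount': None, 'sales_amount': 200.0}
```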
What Happens When Data Types Change?
DataStori raises an error and stops the pipeline when it detects a data type change. A type change (for example, a numeric column that starts arriving as text) can break downstream transformations and reports. Manual intervention is therefore required to resolve the conflict before the pipeline is restarted.
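A type check of this kind might look like the Python sketch below. It assumes each schema is a mapping from column name to a type label. The exception class and the function `check_types` are hypothetical names, not part of DataStori. New columns pass the check, and an existing column whose type differs raises an error that halts the run.

```python
class SchemaTypeChangeError(Exception):
    """Raised when an existing column arrives with a different data type."""


def check_types(recorded: dict[str, str], incoming: dict[str, str]) -> None:
    """Stop the sync if any existing column changed type.

    New columns (absent from `recorded`) are allowed; only columns that
    already exist with a different type trigger the error.
    """
    changed = [
        f"{name}: {recorded[name]} -> {new_type}"
        for name, new_type in incoming.items()
        if name in recorded and recorded[name] != new_type
    ]
    if changed:
        raise SchemaTypeChangeError(
            "data type change detected (" + ", ".join(changed) + "); manual review required"
        )


recorded = {"order_id": "integer", "total_amount": "decimal"}
# A new column is fine; no error is raised.
check_types(recorded, {"order_id": "integer", "total_amount": "decimal", "customer_email": "string"})
try:
    check_types(recorded, {"order_id": "integer", "total_amount": "string"})
except SchemaTypeChangeError as err:
    print(err)  # data type change detected (total_amount: decimal -> string); manual review required
```

Raising an exception is a deliberate choice here. The run fails loudly instead of coercing values, which mirrors the decision above to stop the pipeline rather than guess.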
How Does DataStori Document Schema Changes?
- Version-controlled schema: DataStori automatically documents each schema update, so you can view how the schema has evolved over time.
- Rollback: If you need to revert to a previous version of the data (before a schema change), DataStori enables rollback to a prior restore point.
- Monitoring: DataStori alerts you whenever a schema change is detected, so you can review and take action if needed.
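A version-controlled schema with change alerts can be approximated as follows. This Python sketch covers only the history and alerting ideas, and it tracks the schema the source sends. Data rollback to restore points is outside its scope. The class `SchemaHistory` and its methods are hypothetical. In this sketch, an alert is simply a returned message.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaHistory:
    """Append-only record of schema versions (column name -> data type)."""

    versions: list[dict[str, str]] = field(default_factory=list)

    def record(self, schema: dict[str, str]) -> list[str]:
        """Save `schema` as a new version if it differs; return change messages for an alert."""
        if not self.versions:
            self.versions.append(dict(schema))  # initial schema: nothing to report
            return []
        previous = self.versions[-1]
        if schema == previous:
            return []
        changes = [f"added column {name}" for name in schema if name not in previous]
        changes += [f"dropped column {name}" for name in previous if name not in schema]
        changes += [
            f"changed type of {name}: {previous[name]} -> {schema[name]}"
            for name in schema
            if name in previous and previous[name] != schema[name]
        ]
        self.versions.append(dict(schema))
        return changes

    def version(self, index: int) -> dict[str, str]:
        """Return a copy of an earlier schema version, e.g. to compare against it."""
        return dict(self.versions[index])


history = SchemaHistory()
history.record({"order_id": "integer", "total_amount": "decimal"})
for message in history.record({"order_id": "integer", "sales_amount": "decimal"}):
    print("ALERT:", message)
# ALERT: added column sales_amount
# ALERT: dropped column total_amount
print(len(history.versions))  # 2
```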
Frequently Asked Questions
Does schema evolution require manual intervention in DataStori?
Only for data type changes. DataStori handles added, dropped, and renamed columns automatically. A data type change stops the pipeline to prevent downstream errors and must be resolved manually before the pipeline is restarted.
Can I roll back to a schema version before a change?
Yes. DataStori maintains version-controlled schema history and allows rollback to any prior restore point.
How does DataStori notify me about schema changes?
DataStori sends an alert whenever it detects a schema update. You can review the change log to see exactly which columns were added, dropped, or renamed.