Key Takeaway
Integrated dispersed data workflows into Airflow DAGs to strengthen the analytics environment
Restructured BigQuery-based analytical queries and Databricks workflows into Airflow DAGs, and improved execution efficiency, reusability, and maintainability through code refactoring.
Fashion e-commerce (M Company)
Client: Fashion e-commerce (M Company)
Industry: Retail / Software
Service Area: Data & AI
Applied Solution: AIR
1. Overview (Project Background)
This project was initiated to migrate data analysis workloads running on BigQuery to the Databricks platform,
and to consolidate dispersed data processing workflows into a single Airflow-based operational system.
Previously, the system ran on a mix of BigQuery Scheduled Queries and Airflow,
while the Databricks workflows were built around sequential execution or a single notebook,
which limited both scalability and maintainability.
In particular, the workflows contained complex logic in which the reference date for data processing varies with a classification value,
so workflow readability and reusability needed to improve.
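The classification-dependent reference date mentioned above can be pictured with a minimal Python sketch. The classification values ("daily", "weekly", "monthly") and their rules are hypothetical and are not taken from the project. The point is only that isolating this branching in one function makes it easier to read and reuse.

```python
from datetime import date, timedelta


def reference_date(classification: str, run_date: date) -> date:
    """Return the processing reference date for one classification value.

    The classification values and rules below are illustrative assumptions.
    """
    if classification == "daily":
        # Process the previous day's data.
        return run_date - timedelta(days=1)
    if classification == "weekly":
        # Monday of the week that contains run_date.
        return run_date - timedelta(days=run_date.weekday())
    if classification == "monthly":
        # First day of the month that contains run_date.
        return run_date.replace(day=1)
    raise ValueError(f"unknown classification: {classification!r}")
```

Each processing step can then call reference_date(...) instead of repeating the branching inline.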
2. Solution (Resolution Approach)
We built the solution around two key validation tasks.
Validation Task 1
We converted the existing BigQuery SQL to Databricks SQL,
and restructured some of the repetitive logic as Databricks UDFs to improve execution efficiency and manageability.
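As a rough illustration of moving a repeated expression into a UDF, the Python sketch below renders a Databricks SQL scalar function from a small rule table. The function name, parameter names, and classification rules are hypothetical. The comments note the BigQuery spelling of each expression to show the kind of dialect difference the conversion has to handle.

```python
def render_reference_date_udf(rules: dict[str, str], name: str = "reference_date") -> str:
    """Render a Databricks SQL scalar UDF wrapping a repeated CASE expression."""
    branches = "\n".join(
        f"    WHEN '{value}' THEN {expr}" for value, expr in rules.items()
    )
    return (
        f"CREATE OR REPLACE FUNCTION {name}(classification STRING, run_date DATE)\n"
        "RETURNS DATE\n"
        "RETURN CASE classification\n"
        f"{branches}\n"
        "  END"
    )


# Hypothetical rules, written in Databricks SQL.
RULES = {
    # BigQuery: DATE_SUB(run_date, INTERVAL 1 DAY)
    "daily": "date_sub(run_date, 1)",
    # BigQuery: DATE_TRUNC(run_date, MONTH)
    "monthly": "trunc(run_date, 'MM')",
}

UDF_SQL = render_reference_date_udf(RULES)
print(UDF_SQL)
```

Once such a function is registered, queries can call reference_date(classification, run_date) rather than repeating the CASE expression in each statement.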
Validation Task 2
After analyzing the workflows executed in the Databricks environment,
we redesigned them into a target (To-Be) Airflow DAG structure to standardize workflow execution and operations.
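The shift from a sequential, notebook-centric flow to a DAG can be sketched with Python's standard library alone. The task names and dependencies here are hypothetical, and graphlib stands in for Airflow so the example runs without extra packages. In Airflow, each entry would become a task and each dependency an edge declared with the >> operator.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key depends on the tasks in its set.
DEPENDENCIES = {
    "load_orders": set(),
    "load_products": set(),
    "build_sales_mart": {"load_orders", "load_products"},
    "publish_report": {"build_sales_mart"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # All tasks whose prerequisites are already done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = execution_waves(DEPENDENCIES)
print(WAVES)
# [['load_orders', 'load_products'], ['build_sales_mart'], ['publish_report']]
```

Making the dependencies explicit shows which steps can run side by side (the two load tasks here), which a strictly sequential notebook cannot express.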
3. Result (Achievements)
Starting from the existing processing logic, we restructured each step into an Airflow task,
and separately modularized the logic that required refactoring.
Function-level modularization improved code reusability and maintainability,
and analyzing the logic before implementing it in Airflow minimized risk during the workflow transition.
Additionally, for existing workflows composed of multiple SQL queries and individual function logic,
we analyzed their structure and execution flow and organized them in a form suited to future expansion and operations.
Expected Benefits
By converting the Databricks workflows to Airflow DAGs and refactoring the code in parallel,
we established a foundation for reducing overall workflow execution time and eliminating unnecessary computation.
Furthermore, query structure optimization is expected to improve data processing efficiency and operational stability.






