
Fashion e-commerce (M Company)


Key Takeaway

Integrated dispersed data workflows into an Airflow DAG for an enhanced analytics environment

Restructured BigQuery-based analytical queries and Databricks workflows into an Airflow DAG, and improved execution efficiency, reusability, and maintainability through code refactoring.


Client: Fashion e-commerce (M Company)

Industry: Retail / Software

Service Area: Data & AI

Applied Solution: AIR

1. Overview (Project Background)

This project was initiated to migrate data analysis workloads running on BigQuery to the Databricks platform and to consolidate dispersed data processing workflows into a single Airflow-based operating environment.

Previously, the system ran on a mix of BigQuery scheduled queries and Airflow, while workflows in the Databricks environment were built around sequential execution or single-notebook designs. This structure limited scalability and maintainability.

In particular, the workflows included complex logic in which the reference date for data processing varies by classification value, and improving workflow readability and reusability was identified as a need.
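The case study does not describe the actual date rules. As a hedged illustration only, the Python sketch below shows one way to make classification-dependent reference dates explicit: a lookup table maps each classification value to a date rule, instead of branching logic repeated across queries. The classification names (`daily_sales`, `weekly_inventory`, `monthly_settlement`) and their rules are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable


# Hypothetical date rules; the real project's rules are not published.
def _previous_day(run_date: date) -> date:
    return run_date - timedelta(days=1)


def _previous_week(run_date: date) -> date:
    return run_date - timedelta(days=7)


def _first_of_previous_month(run_date: date) -> date:
    # Step back from the 1st of the current month to land in the previous month.
    last_of_previous_month = run_date.replace(day=1) - timedelta(days=1)
    return last_of_previous_month.replace(day=1)


# Hypothetical classification values mapped to the rule that applies to them.
REFERENCE_DATE_RULES: dict[str, Callable[[date], date]] = {
    "daily_sales": _previous_day,
    "weekly_inventory": _previous_week,
    "monthly_settlement": _first_of_previous_month,
}


def reference_date(classification: str, run_date: date) -> date:
    """Return the processing reference date for a classification value."""
    rule = REFERENCE_DATE_RULES.get(classification)
    if rule is None:
        raise ValueError(f"unknown classification: {classification!r}")
    return rule(run_date)
```

Keeping the rules in one table means a new classification is a one-line addition, and an unknown value fails loudly instead of silently falling back to the wrong date.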


2. Solution (Resolution Approach)

The solution was organized around two key validation tasks.

Validation Task 1
We converted the existing BigQuery SQL into Databricks SQL suited to the Databricks environment, and restructured some repetitive logic as Databricks UDFs to improve execution efficiency and ease of management.
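The case study does not show the UDFs that were created. The sketch below illustrates the general pattern using Python's built-in `sqlite3` module as a runnable stand-in for Databricks (where the counterpart would be a SQL UDF created with `CREATE FUNCTION` or a registered Python UDF): a repeated cleanup expression is registered once as a UDF and then reused by several queries. The `orders` table, its rows, and `normalize_category` are hypothetical.

```python
import sqlite3
from typing import Optional


def normalize_category(raw: Optional[str]) -> str:
    """Hypothetical repeated cleanup logic: trim, lowercase, default to 'unknown'."""
    if raw is None or not raw.strip():
        return "unknown"
    return raw.strip().lower()


def build_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Register the function once; every query on this connection can call it by name.
    conn.create_function("normalize_category", 1, normalize_category)
    conn.execute("CREATE TABLE orders (order_id INTEGER, category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " Tops ", 30.0), (2, "TOPS", 20.0), (3, None, 5.0), (4, "Shoes", 50.0)],
    )
    return conn


conn = build_connection()

# Two different queries reuse the same UDF instead of duplicating the cleanup expression.
totals = conn.execute(
    "SELECT normalize_category(category) AS cat, SUM(amount) "
    "FROM orders GROUP BY cat ORDER BY cat"
).fetchall()
counts = conn.execute(
    "SELECT normalize_category(category) AS cat, COUNT(*) "
    "FROM orders GROUP BY cat ORDER BY cat"
).fetchall()
```

Because the cleanup rule lives in one function, changing it updates every query that calls it, which is the maintainability benefit the UDF restructuring aims at.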

Validation Task 2
After analyzing the workflows running in the Databricks environment, we redesigned them into a target (To-Be) Airflow DAG structure to standardize workflow execution and operations.
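A minimal sketch of the idea behind a DAG structure, assuming hypothetical task names: each processing step becomes a task with explicitly declared upstream dependencies, and tasks run only after their upstreams have finished. It uses the standard library's `graphlib.TopologicalSorter` as a stand-in for Airflow's scheduler so it runs without an Airflow installation; in Airflow itself the same dependencies would be declared between task objects, for example with the `>>` operator.

```python
from graphlib import TopologicalSorter
from typing import Callable


def make_task(name: str, log: list[str]) -> Callable[[], None]:
    # Placeholder task body; in a real DAG this would run a Databricks SQL query or notebook.
    def run() -> None:
        log.append(name)
    return run


# Hypothetical tasks mapped to the set of upstream tasks each depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "extract_orders": set(),
    "compute_reference_dates": set(),
    "build_sales_mart": {"extract_orders", "compute_reference_dates"},
    "build_inventory_mart": {"extract_orders", "compute_reference_dates"},
    "publish_report": {"build_sales_mart", "build_inventory_mart"},
}


def run_dag(dependencies: dict[str, set[str]]) -> list[str]:
    """Execute tasks in an order that respects every upstream dependency."""
    executed: list[str] = []
    tasks = {name: make_task(name, executed) for name in dependencies}
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    while sorter.is_active():
        # Every task in this batch has all upstreams done, so they could run in parallel.
        for name in sorted(sorter.get_ready()):
            tasks[name]()
            sorter.done(name)
    return executed


order = run_dag(DEPENDENCIES)
```

Tasks returned together by `get_ready()` do not depend on each other, which is what allows a scheduler such as Airflow to run them concurrently rather than in the strictly sequential order of a single notebook.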


3. Result (Achievements)

Starting from the existing processing logic, each step was restructured as an Airflow task, and logic that needed refactoring was modularized separately to improve the overall structure.

Function-level modularization improved code reusability and maintainability, and analyzing the logic before implementing it in Airflow minimized risk during the workflow transition.

For existing workflows made up of multiple SQL queries and standalone function logic, we also analyzed their structure and execution flow, which allowed us to reorganize them into a form that supports future expansion and operations.

Expected Benefits

By converting the Databricks workflows into an Airflow DAG and refactoring the code in parallel, we laid the groundwork for reducing overall workflow execution time and eliminating unnecessary computation.
Query structure optimization is also expected to improve data processing efficiency and operational stability.

