
Fashion e-commerce (M Company)


Key Takeaway

Integrated dispersed data workflows into Airflow DAGs for an enhanced analytics environment

Restructured BigQuery-based analytical queries and Databricks workflows into Airflow DAGs, and improved execution efficiency, reusability, and maintainability through code refactoring.


Client: Fashion e-commerce (M Company)

Industry: Retail / Software

Service Area: Data & AI

Applied Solution: AIR

1. Overview (Project Background)

This project was initiated to migrate data analysis workloads running on BigQuery to the Databricks platform,
and to consolidate dispersed data processing workflows into a single Airflow-based operational system.

Previously, the system was operated as a mixed structure of BigQuery scheduled queries and Airflow,
and workflows in the Databricks environment were built around sequential execution or single notebooks,
which imposed structural limitations on scalability and maintainability.

In particular, the pipelines included complex logic in which the data processing reference date varies by classification value,
raising the need to improve workflow readability and reusability.
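
For illustration only, this kind of classification-dependent reference-date logic can be expressed in PySpark roughly as in the sketch below; the table, column, and classification names are assumptions, not the client's actual schema.

```python
# Illustrative only: a hypothetical PySpark expression of reference dates that
# vary by classification value. All names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("analytics.orders")  # hypothetical source table

# Each classification value determines how far back the processing window reaches.
with_reference = orders.withColumn(
    "reference_date",
    F.when(F.col("classification") == "DAILY", F.date_sub(F.current_date(), 1))
     .when(F.col("classification") == "WEEKLY", F.date_sub(F.current_date(), 7))
     .otherwise(F.date_sub(F.current_date(), 30)),
)

# Only rows on or after their classification-specific reference date are processed.
in_scope = with_reference.filter(F.col("order_date") >= F.col("reference_date"))
```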


2. Solution (Resolution Approach)

In this project, we structured the solution around two key validation tasks.

Validation Task 1
We converted the existing BigQuery SQL into Databricks SQL suited to the Databricks environment,
and restructured repetitive logic as Databricks UDFs to improve execution efficiency and ease of management.
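
As a minimal sketch of that pattern (the function, table, and column names are hypothetical, not the project's actual logic), a repetitive transformation can be registered once as a Databricks UDF and reused across the converted queries:

```python
# Minimal sketch: repetitive cleanup logic factored out of individual queries
# and registered as a reusable UDF. All names here are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def normalize_category(raw):
    """Shared cleanup rule previously duplicated across several SQL statements."""
    if raw is None:
        return "UNKNOWN"
    return raw.strip().upper()

# Register once; converted Databricks SQL queries can then call it by name.
spark.udf.register("normalize_category", normalize_category, StringType())

spark.sql("""
    SELECT normalize_category(category) AS category, COUNT(*) AS order_count
    FROM analytics.orders
    GROUP BY normalize_category(category)
""").show()
```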

Validation Task 2
After analyzing the workflows executed in the Databricks environment,
we redesigned them into a To-Be Airflow DAG structure to standardize workflow execution and operations.
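
For illustration, a simplified To-Be DAG could resemble the sketch below; the DAG id, schedule, cluster settings, and notebook paths are our assumptions, not the configuration actually delivered.

```python
# Simplified sketch of a To-Be Airflow DAG orchestrating Databricks notebook
# steps as explicit tasks. Identifiers and settings below are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

CLUSTER_SPEC = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

with DAG(
    dag_id="m_company_analytics_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",          # assumed daily schedule
    catchup=False,
) as dag:
    # Each formerly notebook-centric step becomes an explicit Airflow task.
    extract = DatabricksSubmitRunOperator(
        task_id="extract_source_data",
        databricks_conn_id="databricks_default",
        json={
            "new_cluster": CLUSTER_SPEC,
            "notebook_task": {"notebook_path": "/Pipelines/extract_source_data"},
        },
    )

    aggregate = DatabricksSubmitRunOperator(
        task_id="transform_and_aggregate",
        databricks_conn_id="databricks_default",
        json={
            "new_cluster": CLUSTER_SPEC,
            "notebook_task": {"notebook_path": "/Pipelines/transform_and_aggregate"},
        },
    )

    # What used to run sequentially inside one notebook is now an explicit dependency.
    extract >> aggregate
```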


3. Result (Achievements)

Based on the existing processing logic, each step was restructured into Airflow task units,
and logic requiring refactoring was modularized separately to improve the overall structure.

Function-level modularization enhanced code reusability and maintainability,
and analyzing the logic before the Airflow implementation minimized risks during the workflow transition.
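
As a hypothetical illustration of that modularization (helper, task, and DAG names are ours, not the project's), a shared function can be reused by multiple Airflow tasks via the TaskFlow API:

```python
# Hypothetical sketch: shared logic lives in an importable function and is
# wrapped as Airflow TaskFlow tasks, making each step reusable and testable.
from datetime import date, datetime, timedelta

from airflow.decorators import dag, task


def resolve_reference_date(classification):
    """Reusable helper (hypothetical) shared across tasks and DAGs."""
    lookback_days = {"DAILY": 1, "WEEKLY": 7}.get(classification, 30)
    return (date.today() - timedelta(days=lookback_days)).isoformat()


@dag(
    dag_id="m_company_refactored_steps",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
)
def refactored_steps():
    @task
    def build_daily_scope():
        return resolve_reference_date("DAILY")

    @task
    def process_scope(reference_date):
        # Placeholder for one modularized processing step.
        print(f"processing records since {reference_date}")

    process_scope(build_daily_scope())


refactored_steps()
```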

Additionally, for the existing workflows composed of multiple SQL queries and individual function logic,
we analyzed their structure and execution flow, organizing them in a form conducive to future expansion and operations.

Expected Benefits

By converting the Databricks workflows to Airflow DAGs and refactoring the code in parallel,
we have established a foundation for reducing overall workflow execution time and eliminating unnecessary computation.
Furthermore, query structure optimization is expected to improve data processing efficiency and operational stability.

Related Case Stories

Yanolja
Consolidate dispersed SaaS into one, manage costs and risks simultaneously

HANATOUR
Travel service with 432% user growth through hyper-personalized AI consultation

Doalltech
Doalltech revolutionized both cost and operational efficiency through container-based SaaS transformation

Vueron Technology
Building a scalable cloud architecture for GPU-intensive LiDAR AI SaaS

hy (Korea Yakult)
Improved hy product search accuracy through generative AI and hybrid search, adding natural-language recommendations for customers

Hansol Paper
Achieved 95% answer accuracy through a prompt tuning process tailored to data characteristics and established a corporate knowledge utilization system
