
Key Takeaway

Building a secure RAG-based in-house LLM utilization environment

Using a RAG architecture built on AIR Studio and AWS OpenSearch, we established a chatbot environment that safely uses in-house documents, and verified a security-focused LLM system that automatically answers with RAG or LLM Only depending on whether relevant materials are available.


Client: Automotive (D Company)

Industry: Automotive / Manufacturing

Service Area: Data & AI

1. Overview (Project Background)

This project was initiated to establish a secure LLM usage environment that minimizes the risks of technical information leakage and unintended use of company data for model training, both of which can arise as generative AI use spreads within the company.

As employees began using public LLMs such as ChatGPT, concerns were raised that internal corporate data could be leaked externally or used in model training, and a security-focused approach to generative AI was needed to address these concerns.

Additionally, going beyond simple question-and-answer interactions, we aimed to implement a RAG (Retrieval-Augmented Generation) chatbot based on in-house documents and embedding data, with a structure that automatically switches the response method depending on whether relevant materials are available.

When internal documents exist → RAG-based response
When internal documents do not exist → LLM Only response
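The switching rule above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `Hit` type, the `answer` function, the `retrieve` and `generate` callables, and the `min_score` cutoff are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    """One retrieved passage and its similarity score (higher is better)."""
    text: str
    score: float


def answer(
    query: str,
    retrieve: Callable[[str], List[Hit]],
    generate: Callable[[str], str],
    min_score: float = 0.5,  # assumed cutoff; would need tuning on real data
) -> Dict[str, object]:
    """Use RAG when relevant internal passages exist, otherwise LLM only."""
    hits = [h for h in retrieve(query) if h.score >= min_score]
    if hits:
        context = "\n\n".join(h.text for h in hits)
        prompt = (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return {
            "mode": "rag",
            "answer": generate(prompt),
            "sources": [h.text for h in hits],
        }
    # No sufficiently relevant internal document: fall back to the bare model.
    return {"mode": "llm_only", "answer": generate(query), "sources": []}
```

Returning the mode and sources alongside the answer makes it possible to show users which path produced a response.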

2. Solution (Solution Approach)

Objective Definition

Verification of data leakage prevention structure based on security solutions
Benchmarking of the performance and response quality of AWS-based LLMs against GPT-4o

Key Verification Tasks

Verification of architecture to ensure internal data is not used for external training
Verification of response quality and accuracy using AWS LLM models

3. Result (Achievements)

Building RAG-based Data Processing Pipeline

Establishment of preprocessing process to convert various types of documents into RAG-suitable structures
Ensuring search accuracy by vector indexing preprocessed data in AWS OpenSearch
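As an illustration of vector indexing in OpenSearch, the sketch below builds the request bodies for a k-NN index and a nearest-neighbour query. The field names (`text`, `source`, `embedding`), the helper function names, and the embedding dimension are assumptions made for this example; the project's actual schema is not described in this case study.

```python
from typing import Any, Dict, List


def knn_index_body(dimension: int) -> Dict[str, Any]:
    """Index definition with a text field and a k-NN vector field."""
    return {
        "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},        # the preprocessed chunk
                "source": {"type": "keyword"},   # hypothetical: originating document ID
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def knn_query_body(vector: List[float], k: int = 5) -> Dict[str, Any]:
    """Approximate nearest-neighbour query against the `embedding` field."""
    return {
        "size": k,
        "_source": ["text", "source"],
        "query": {"knn": {"embedding": {"vector": vector, "k": k}}},
    }
```

With the `opensearch-py` client, bodies like these would typically be passed to `client.indices.create(index=..., body=...)` and `client.search(index=..., body=...)`.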

Document Parsing and Indexing Enhancement

Document content parsing using LLM-based OCR
Loading of parsed documents into the vector DB (OpenSearch) in a RAG-usable structure

Chat API Business Logic Implementation

Intent classification performed upon user query input (In-house regulations / ESG / Others)
Automatic selection of RAG pipeline or LLM Only response path based on classification results
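The routing step can be sketched as below. The three intent labels come from the text; everything else is an assumption for illustration: the `classify` callable stands in for whatever classifier the project uses, and the mapping of "In-house regulations" and "ESG" to the RAG path (with "Others" going to LLM Only) is a plausible guess, not something this case study states.

```python
from typing import Callable, Dict

INTENTS = ("in_house_regulations", "esg", "others")
# Assumption: only the two document-backed intents go through retrieval.
RAG_INTENTS = {"in_house_regulations", "esg"}


def normalize_intent(raw_label: str) -> str:
    """Map a free-form classifier label onto a known intent, defaulting to 'others'."""
    label = raw_label.strip().lower().replace("-", "_").replace(" ", "_")
    return label if label in INTENTS else "others"


def route(query: str, classify: Callable[[str], str]) -> Dict[str, str]:
    """Classify the query, then pick the response path for that intent."""
    intent = normalize_intent(classify(query))
    path = "rag" if intent in RAG_INTENTS else "llm_only"
    return {"intent": intent, "path": path}
```

Normalizing unknown labels to `others` keeps the API on the LLM Only path when the classifier returns something unexpected.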

Document Correction Function Verification

Implementation of typo and expression error correction pipeline using LLM
Completed verification that the pipeline can improve document quality
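A correction pipeline of this kind might look like the following sketch. It assumes paragraph-level processing and a `correct` callable that wraps the LLM call; both are hypothetical choices made here, as are the function names. The diff report uses Python's standard `difflib` so that reviewers can see what changed.

```python
import difflib
from typing import Callable, List


def split_paragraphs(document: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def correct_document(document: str, correct: Callable[[str], str]) -> str:
    """Apply the corrector (an LLM call in practice) to each paragraph."""
    return "\n\n".join(correct(p) for p in split_paragraphs(document))


def change_report(original: str, corrected: str) -> List[str]:
    """Return unified-diff lines showing what the corrector changed."""
    diff = difflib.unified_diff(
        original.splitlines(),
        corrected.splitlines(),
        fromfile="original",
        tofile="corrected",
        lineterm="",
    )
    return list(diff)
```

Processing paragraph by paragraph keeps each prompt short, at the cost of losing cross-paragraph context.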

Expected Effects

RAG-based Chatbot Utilization

Provision of in-house document RAG chatbot and Web RAG chatbot through AIR Studio
Support for document management and configuration management functions by repository
Establishment of chatbot verification system based on expected question-answer sets
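One simple way to check a chatbot against expected question-answer sets is sketched below. The keyword-coverage metric, the `threshold` value, and the case format (`question` plus `keywords`) are assumptions chosen for brevity; the case study does not say how answers are actually scored.

```python
from typing import Callable, Dict, List


def keyword_coverage(answer: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    text = answer.lower()
    found = sum(1 for kw in expected_keywords if kw.lower() in text)
    return found / len(expected_keywords)


def evaluate(
    chatbot: Callable[[str], str],
    cases: List[Dict[str, object]],
    threshold: float = 0.8,  # assumed pass mark
) -> Dict[str, object]:
    """Run every case through the chatbot and summarize the pass rate."""
    results = []
    for case in cases:
        score = keyword_coverage(chatbot(case["question"]), case["keywords"])
        results.append(
            {"question": case["question"], "score": score, "passed": score >= threshold}
        )
    passed = sum(1 for r in results if r["passed"])
    pass_rate = passed / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}
```

A fixed case set like this can be re-run after each change to documents or prompts to catch regressions.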

Document Correction Automation

Streamlit-based UI provision
Automatic inspection of the entire document upon upload, with corrected output
