Key Takeaway
Building a secure RAG-based in-house LLM utilization environment
Using a RAG architecture built on AIR Studio and AWS OpenSearch, we established a chatbot environment that safely uses in-house documents, and verified a security-focused LLM system that automatically answers with RAG or LLM Only depending on whether relevant internal materials are available.
Client: Automotive (D Company)
Industry: Automotive / Manufacturing
Service Area: Data & AI
1. Overview (Project Background)
This project was initiated to establish a secure LLM usage environment that minimizes the risks of technical information leakage and of internal data being used for model training, both of which can arise as generative AI use spreads within the company.
As employees began using public LLMs such as ChatGPT, concerns were raised that internal corporate data could be leaked externally or used in model training, so a security-focused approach to generative AI was needed.
Additionally, beyond simple question answering, we aimed to implement a RAG (Retrieval-Augmented Generation) chatbot over in-house documents and their embeddings, structured so that the response method switches automatically depending on whether relevant materials are available:
When internal documents exist → RAG-based response
When internal documents do not exist → LLM Only response
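The switching rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Hit` type, the `retrieve` and `generate` callables, and the `min_score` cutoff are all assumptions introduced here, and "internal documents exist" is interpreted as "at least one retrieved passage scores above the cutoff".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    """One retrieved passage with its similarity score (higher is better)."""
    text: str
    score: float


def answer(
    question: str,
    retrieve: Callable[[str], List[Hit]],  # hypothetical vector search wrapper
    generate: Callable[[str], str],        # hypothetical LLM call wrapper
    min_score: float = 0.5,                # assumed cutoff; tune on real data
) -> dict:
    """Answer with RAG when relevant internal passages exist, else LLM Only."""
    hits = [h for h in retrieve(question) if h.score >= min_score]
    if hits:
        context = "\n\n".join(h.text for h in hits)
        prompt = (
            f"Answer using only this context:\n{context}\n\n"
            f"Question: {question}"
        )
        return {"mode": "rag", "answer": generate(prompt), "sources": len(hits)}
    # No usable internal material: fall back to the model alone.
    return {"mode": "llm_only", "answer": generate(question), "sources": 0}
```

Returning the chosen mode alongside the answer makes it possible to show users whether a response was grounded in internal documents.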
2. Solution (Solution Approach)
Objective Definition
Verification of a data leakage prevention structure based on security solutions
Performance and quality benchmarking of AWS-based LLMs against GPT-4o
Key Verification Tasks
Verification that the architecture prevents internal data from being used for external model training
Verification of response quality and accuracy using AWS LLM models
3. Result (Achievements)
Building RAG-based Data Processing Pipeline
Establishment of a preprocessing process that converts various document types into RAG-suitable structures
Ensuring search accuracy by vector-indexing the preprocessed data in AWS OpenSearch
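As a rough sketch of what such a pipeline might look like, the code below splits text into overlapping chunks and builds bulk-indexing actions for an OpenSearch k-NN index. The field names, chunk sizes, `EMBED_DIM`, and the `embed` callable are assumptions for illustration; the source does not describe the actual chunking strategy or index schema.

```python
from typing import Callable, Dict, Iterator, List

EMBED_DIM = 4  # assumed; must match the real embedding model's output size

# Index definition using the OpenSearch k-NN plugin's knn_vector field type.
INDEX_BODY = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "source": {"type": "keyword"},
            "embedding": {"type": "knn_vector", "dimension": EMBED_DIM},
        }
    },
}


def chunk(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def bulk_actions(
    index: str,
    source: str,
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical embedding function
) -> Iterator[Dict]:
    """Yield one bulk-index action per chunk of the document."""
    for n, piece in enumerate(chunk(text)):
        yield {
            "_index": index,
            "_id": f"{source}-{n}",
            "_source": {"text": piece, "source": source, "embedding": embed(piece)},
        }
```

With the `opensearch-py` client, the index could presumably be created with `client.indices.create(index=..., body=INDEX_BODY)` and the actions passed to `opensearchpy.helpers.bulk(client, actions)`; both steps are left out here so the sketch runs without a cluster.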
Document Parsing and Indexing Enhancement
Document content parsing using LLM-based OCR
Loading of parsed documents into the vector database (OpenSearch) in a RAG-usable structure
Chat API Business Logic Implementation
Intent classification performed upon user query input (In-house regulations / ESG / Others)
Automatic selection of RAG pipeline or LLM Only response path based on classification results
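The classify-then-route logic might look roughly like the sketch below. The three intent labels come from the text above; everything else is an assumption: the mapping of intents to paths (regulations and ESG to RAG, others to LLM Only), the keyword lists, and the keyword-based `classify` function, which merely stands in for whatever classifier the project actually used.

```python
from enum import Enum
from typing import Callable


class Intent(Enum):
    REGULATIONS = "in_house_regulations"
    ESG = "esg"
    OTHERS = "others"


# Assumed mapping: intents backed by indexed documents go through RAG.
ROUTES = {Intent.REGULATIONS: "rag", Intent.ESG: "rag", Intent.OTHERS: "llm_only"}

# Hypothetical keyword lists; a real classifier could be an LLM prompt instead.
KEYWORDS = {
    Intent.REGULATIONS: ("regulation", "policy", "leave", "expense"),
    Intent.ESG: ("esg", "carbon", "sustainability", "governance"),
}


def classify(query: str) -> Intent:
    """Return the first intent whose keywords appear in the query."""
    q = query.lower()
    for intent, words in KEYWORDS.items():
        if any(w in q for w in words):
            return intent
    return Intent.OTHERS


def route(query: str, classifier: Callable[[str], Intent] = classify) -> str:
    """Pick the response path for a query from its classified intent."""
    return ROUTES[classifier(query)]
```

Keeping the classifier injectable means the keyword stub can be swapped for a model-based classifier without touching the routing table.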
Document Correction Function Verification
Implementation of an LLM-based pipeline that corrects typos and wording errors
Completed verification of the potential for document quality improvement
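One way such a correction pipeline could be structured is shown below, as a sketch under stated assumptions: the document is corrected paragraph by paragraph, `correct_fn` is a hypothetical wrapper around the LLM call, and a word-level diff (via the standard library's `difflib.SequenceMatcher`) reports what changed so a reviewer can check each edit. None of these choices is confirmed by the source.

```python
import difflib
from typing import Callable, Dict, List, Tuple


def word_changes(before: str, after: str) -> List[Tuple[str, str]]:
    """List (old, new) word spans that differ between two strings."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def correct_document(
    text: str,
    correct_fn: Callable[[str], str],  # hypothetical LLM correction call
) -> Tuple[str, List[Dict]]:
    """Correct each paragraph and return the new text plus a change report."""
    corrected, report = [], []
    for idx, para in enumerate(text.split("\n\n")):
        # Skip blank paragraphs so no model call is spent on them.
        fixed = correct_fn(para) if para.strip() else para
        corrected.append(fixed)
        for before, after in word_changes(para, fixed):
            report.append({"paragraph": idx, "before": before, "after": after})
    return "\n\n".join(corrected), report
```

The change report is what makes the output reviewable: a corrected document alone gives no indication of where the model intervened.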
Expected Effects
RAG-based Chatbot Utilization
Provision of in-house document RAG chatbot and Web RAG chatbot through AIR Studio
Support for document management and configuration management functions per repository
Establishment of chatbot verification system based on expected question-answer sets
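A verification harness over expected question-answer sets could be sketched as follows. The token-overlap F1 metric, the `threshold` value, and the `question`/`expected` record layout are assumptions chosen for illustration; the source does not say how answers were scored.

```python
import re
from collections import Counter
from typing import Callable, Dict, List


def tokens(s: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", s.lower())


def token_f1(expected: str, actual: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    e, a = Counter(tokens(expected)), Counter(tokens(actual))
    overlap = sum((e & a).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(a.values())
    recall = overlap / sum(e.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(
    qa_set: List[Dict[str, str]],
    chatbot: Callable[[str], str],  # hypothetical chatbot call
    threshold: float = 0.6,         # assumed pass mark
) -> Dict:
    """Score the chatbot on every expected question-answer pair."""
    results = []
    for item in qa_set:
        score = token_f1(item["expected"], chatbot(item["question"]))
        results.append({
            "question": item["question"],
            "score": round(score, 3),
            "passed": score >= threshold,
        })
    passed = sum(r["passed"] for r in results)
    return {"pass_rate": passed / len(results) if results else 0.0, "results": results}
```

Token overlap is a crude proxy for answer quality; it is used here only because it needs no model or external service to run.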
Document Correction Automation
Streamlit-based UI provision
Automatic inspection of the entire document upon upload, with corrected output






