Amazon Transcribe now supports multi-channel transcriptions

Published: 2018-09-11

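Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. We are excited to announce the launch of a new feature called Channel Identification, which lets users process multi-channel audio files and retrieve a single transcript annotated with the label of each channel.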

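Contact centers in particular stand to benefit from channel identification. A phone conversation between a caller and an agent is typically recorded on separate channels and then merged into a single audio file. With this feature, a contact center can hand that single file to Amazon Transcribe. Amazon Transcribe transcribes the speech recorded on each channel and then produces a final transcript with channel labels. Because the output for each channel carries word-level timestamps, the contact center can reconstruct a coherent exchange between caller and agent.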

You can run a channel identification job from either the Amazon Transcribe console or the AWS CLI. In the console, simply enable the Channel identification parameter.

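From the CLI, call StartTranscriptionJob and set the ChannelIdentification parameter to true. The sample Python script below does the same through boto3: it starts a transcription job with channel identification enabled, waits for it to finish, and prints the transcript of each channel.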

import json
import time

import boto3
import requests


def get_transcript_text(items):
    """Given an array of items (JSON output from Amazon Transcribe), return the transcript as a string."""
    transcript_text = ""
    for item in items:
        if item["type"] == "pronunciation":
            # Spoken words are separated by a space.
            transcript_text += " {0}".format(item["alternatives"][0]["content"])
        else:
            # Punctuation attaches directly to the preceding word.
            transcript_text += "{0}".format(item["alternatives"][0]["content"])
    return transcript_text


transcribe = boto3.client("transcribe")
job_name = "demo-qldh-106"
job_uri = "s3://mast-mast-3/private/labs/transcribe-speech/aus-qldh/0b3debff-fedc-4bee-91e5-3faa3727ee80.wav"

transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={"MediaFileUri": job_uri},
    MediaFormat="wav",
    LanguageCode="en-US",
    Settings={"ChannelIdentification": True},
)
print("{0}: Started job {1} ...".format(time.ctime(), job_name))

# Poll every 30 seconds until the job reaches a terminal state.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED", "FAILED"]:
        break
    print("{0}: Still going ({1}) ...".format(time.ctime(), status["TranscriptionJob"]["TranscriptionJobStatus"]))
    time.sleep(30)

print("{0}: It's done ({1}).".format(time.ctime(), status["TranscriptionJob"]["TranscriptionJobStatus"]))

# If the job completed successfully, get the full text of the transcript.
if status["TranscriptionJob"]["TranscriptionJobStatus"] == "COMPLETED":
    transcript_uri = status["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
    # Download and parse the transcript JSON.
    transcript = json.loads(requests.get(transcript_uri).text)
    print("Transcript:")
    # We could check number_of_channels to be sure, but we'll just assume two channels for this example.
    # Get the left channel transcript as a blob of text.
    print(get_transcript_text(transcript["results"]["channel_labels"]["channels"][0]["items"]))
    # Get the right channel transcript as a blob of text.
    print(get_transcript_text(transcript["results"]["channel_labels"]["channels"][1]["items"]))
else:
    print("Job status is {0}".format(status["TranscriptionJob"]["TranscriptionJobStatus"]))

The output shows two blocks of text, one per channel. In the Amazon Transcribe console, you can select the Channel identification tab to see a brief preview.

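Here is a sample output transcript in JSON format. The merged transcript appears in the transcripts section, and the individual items for each channel are listed under the channels array.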

{
  "jobName": "job id",
  "accountId": "account id",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try … It seems to …"
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
          ]
        },
        {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
          ]
        }
      ]
    }
  }
}
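Because every spoken word carries a start_time, the per-channel items can be interleaved to approximate the back-and-forth of a call. The following is a minimal sketch of that idea, not part of the original post: the function names merge_channels and to_turns and the miniature sample data are hypothetical, and the code assumes that only items of type "pronunciation" carry timestamps, so punctuation is skipped.

```python
def merge_channels(transcript):
    """Return (start_time, channel_label, content) tuples for spoken words, in time order."""
    words = []
    for channel in transcript["results"]["channel_labels"]["channels"]:
        for item in channel["items"]:
            # Assumption: only "pronunciation" items carry start_time; skip the rest.
            if item["type"] == "pronunciation":
                words.append((
                    float(item["start_time"]),
                    channel["channel_label"],
                    item["alternatives"][0]["content"],
                ))
    return sorted(words, key=lambda word: word[0])


def to_turns(words):
    """Group consecutive words from the same channel into (channel_label, text) turns."""
    turns = []
    for _, label, content in words:
        if turns and turns[-1][0] == label:
            turns[-1][1].append(content)
        else:
            turns.append((label, [content]))
    return [(label, " ".join(parts)) for label, parts in turns]


def word(start, content):
    """Build a hypothetical pronunciation item with the fields used above."""
    return {"start_time": start, "type": "pronunciation",
            "alternatives": [{"content": content}]}


# Hypothetical two-channel transcript, shaped like the sample output above.
sample = {"results": {"channel_labels": {"channels": [
    {"channel_label": "ch_0",
     "items": [word("0.10", "Hello"), word("0.50", "there"), word("2.00", "Thanks")]},
    {"channel_label": "ch_1",
     "items": [word("1.20", "Hi")]},
]}}}

for label, text in to_turns(merge_channels(sample)):
    print("{0}: {1}".format(label, text))
```

When both parties speak at the same time, as the overlapping timestamps in the sample output suggest can happen, this word-by-word grouping produces many short turns, so a real application would likely need a smarter segmentation rule.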

By default, the feature is enabled for two channels. If you need more, you can request support for additional channels (up to 5). For more information, see the documentation.
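Since a job may cover more than two channels, code that reads the transcript is safer if it does not hard-code channel indexes. The sketch below is one possible way to do that, assuming the channel_labels structure shown in the sample output; the helper name transcripts_by_channel and the sample data are hypothetical.

```python
def get_transcript_text(items):
    """Join Amazon Transcribe items into a single string."""
    text = ""
    for item in items:
        content = item["alternatives"][0]["content"]
        # Words get a leading space; punctuation attaches to the previous word.
        text += (" " + content) if item["type"] == "pronunciation" else content
    return text.strip()


def transcripts_by_channel(transcript):
    """Return {channel_label: text} for every channel present in the transcript JSON."""
    channels = transcript["results"]["channel_labels"]["channels"]
    return {ch["channel_label"]: get_transcript_text(ch["items"]) for ch in channels}


# Hypothetical miniature transcript with the same shape as the sample output.
sample = {"results": {"channel_labels": {"channels": [
    {"channel_label": "ch_0", "items": [
        {"type": "pronunciation", "alternatives": [{"content": "When"}]},
        {"type": "pronunciation", "alternatives": [{"content": "you"}]},
        {"type": "punctuation", "alternatives": [{"content": "?"}]},
    ]},
    {"channel_label": "ch_1", "items": [
        {"type": "pronunciation", "alternatives": [{"content": "It"}]},
        {"type": "pronunciation", "alternatives": [{"content": "seems"}]},
    ]},
]}}}

for label, text in transcripts_by_channel(sample).items():
    print("{0}: {1}".format(label, text))
```

Keying the result by channel_label rather than by list position means the same loop works unchanged whether the file has two channels or more.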

Channel identification is available at no additional charge in all AWS Regions where Amazon Transcribe is offered. For more information, see the documentation page.

Original URL: https://aws.amazon.com/ko/blogs/machine-learning/amazon-transcribe-now-supports-multi-channel-transcriptions/

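** Megazone Cloud TechBlog regularly publishes translations of English-language AWS Blog posts, prioritizing information and content useful to Korean users, after review by in-house engineers. If there is a post you would like to see translated and published, email the administrator or leave a comment on our SNS page, and we will prioritize translating it.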
