Search | Korea Science

Loss-adjusted Regularization based on Prediction for Improving Robustness in Less Reliable FAQ Datasets (신뢰성이 부족한 FAQ 데이터셋에서의 강건성 개선을 위한 모델의 예측 강도 기반 손실 조정 정규화)

Park, Yewon;Yang, Dongil;Kim, Soofeel;Lee, Kangwook
- Annual Conference on Human and Language Technology / 2019.10a / pp.18-22 / 2019
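FAQ classification categorizes frequently asked questions and infers the class most similar to a user's query. Because FAQ datasets contain many classes, the classes exhibit inclusion and association relationships, and a given example can belong to several classes at once. Recent FAQ classification work, however, has stopped at applying multi-class classification methods, and little research has reflected these dataset characteristics in the model. Because current classification methods do not account for these characteristics, predictions that could reasonably be interpreted as correct are sometimes treated as errors. This paper introduces a regularization technique that adjusts the loss function so that classification works well even on less reliable FAQ datasets. To reflect inclusion and association relationships between classes, the technique reduces the loss in proportion to the prediction strength even when the model predicts a wrong answer. The rationale is that the higher the probability with which a wrong answer is predicted, the more likely the data is unreliable, so the model should not be trained strongly on it. The experiments use BERT, the model currently showing the best performance in multi-class classification, and adopt the commonly used label smoothing as the regularization baseline for comparison. The results confirm that the proposed method improves performance over existing methods and trains more stably, and that it classifies effectively when the data is less reliable.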
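The abstract does not give the exact loss formula, so the following is only a minimal sketch of the general idea: when the model predicts a wrong class, the cross-entropy is scaled down in proportion to the probability assigned to that wrong class. The function names and the `alpha` parameter are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adjusted_loss(logits, label, alpha=0.5):
    """Cross-entropy that is reduced when a wrong class is predicted.

    alpha is a hypothetical strength parameter (assumption, not from the paper):
    alpha=0 gives plain cross-entropy.
    """
    probs = softmax(logits)
    ce = -math.log(probs[label])
    predicted = max(range(len(probs)), key=probs.__getitem__)
    if predicted == label:
        return ce
    # Wrong prediction: the more confident the model is in the wrong class,
    # the more the loss is shrunk, so a possibly unreliable label is trained on less.
    return ce * (1.0 - alpha * probs[predicted])
```

With `alpha=0.0` the function falls back to ordinary cross-entropy, which makes it easy to compare against an unregularized baseline.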

Jointly learning class coincidence classification for FAQ classification (FAQ 분류 성능 향상을 위한 클래스 일치 여부 결합 학습 모델)

Yang, Dongil;Ham, Jina;Lee, Kangwook;Lee, Jiyeon
- Annual Conference on Human and Language Technology / 2019.10a / pp.12-17 / 2019
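An FAQ (Frequently Asked Questions) question answering system defines frequently asked questions and their answers, then infers and returns the most suitable of the defined answers for a user's query. If each defined representative question and its corresponding answer is called a class, FAQ question answering can be treated as a classification problem. Conventional FAQ classification learns from the common features that appear in paraphrases within the same class. This approach cannot separate sentences that share similar wording but differ in meaning because of one or two words, and it is especially prone to classification errors when training sentences with similar meanings exist in different classes. To address this problem, this paper proposes jointly learning a class coincidence classification task so that the model can tell that training sentences from different classes belong to different classes. Random pairs of training sentences from the same class are generated so that the model learns that each pair shares a class, and at the same time random pairs of training sentences from different classes are generated so that the model also learns to recognize that those pairs belong to different classes. The experiments use the text classification model of BERT, which was recently released and shows the best performance in natural language processing, and compare it with the model using the proposed technique on Korean FAQ data. The results confirm that, compared with the basic BERT model trained on classification alone, the proposed jointly learned class coincidence model distinguishes between similar sentences and yields a meaningful performance improvement.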
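The pair generation described in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `make_pairs`, the input layout (a dict from class name to sentences), and the one-to-one ratio of positive to negative pairs are all hypothetical.

```python
import random

def make_pairs(data, n_pairs, seed=0):
    """Build random sentence pairs labelled 1 (same class) or 0 (different class).

    data: dict mapping class name -> list of training sentences.
    Assumes at least two non-empty classes and at least one class
    with two or more sentences.
    """
    rng = random.Random(seed)
    classes = [c for c, sents in data.items() if sents]
    multi = [c for c in classes if len(data[c]) >= 2]
    pairs = []
    for _ in range(n_pairs):
        # Positive pair: two distinct sentences drawn from one class.
        c = rng.choice(multi)
        a, b = rng.sample(data[c], 2)
        pairs.append((a, b, 1))
        # Negative pair: one sentence from each of two different classes.
        c1, c2 = rng.sample(classes, 2)
        pairs.append((rng.choice(data[c1]), rng.choice(data[c2]), 0))
    return pairs
```

In a joint setup, pairs like these would feed an auxiliary same-class/different-class objective trained alongside the main classifier; how the two losses are combined is not stated in the abstract.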

Comparison of Document Features Extraction Methods for Automatic Classification of Real World FAQ Mails (실세계의 FAQ 메일 자동분류를 위한 문서 특징추출 방법의 성능 비교)

홍진혁;류중원;조성배
- Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.271-273 / 2001
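The importance of automatic document classification has recently become widely recognized, and a variety of studies are under way. This paper implements various feature extraction methods for effective automatic classification of Korean documents and presents an efficient feature extraction method for real query mails. The experiments used seven feature extraction methods, including document frequency, information gain, mutual information, and $\chi^2$. Applied to 463 real test query mails, the $\chi^2$ method achieved the best performance with a recognition rate of 74.7%. In contrast, information gain, one of the most frequently used methods along with $\chi^2$, reached a recognition rate of at most 40.6%.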
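For reference, the $\chi^2$ score of a term with respect to a class is commonly computed from a 2x2 contingency table of document counts. The sketch below uses that standard textbook form; the abstract does not state which variant the authors used, and the function and argument names are chosen here for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom
```

A term that appears only in documents of one class scores high, while a term spread evenly across classes scores zero, which is why terms are typically ranked by this value and only the top-scoring ones are kept as features.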

Automatic Response and Conceptual Browsing of Internet FAQs Using Self-Organizing Maps (자기구성 지도를 이용한 인터넷 FAQ의 자동응답 및 개념적 브라우징)

Ahn, Joon-Hyun;Ryu, Jung-Won;Cho, Sung-Bae
- Journal of the Korean Institute of Intelligent Systems / v.12 no.5 / pp.432-441 / 2002
Although many services offer useful information on the Internet, many computer users are not familiar enough with these services and need an assistant system to use them easily. On web sites, for example, operators answer users' e-mail questions, but the growing number of users makes it hard to answer them efficiently. In this paper, we propose an assistant system that responds to users' questions automatically and helps them browse the Hanmail Net FAQ (Frequently Asked Questions) conceptually. The system uses a two-level self-organizing map (SOM): a keyword clustering SOM and a document classification SOM. The keyword clustering SOM reduces a variable-length question to a normalized vector, and the document classification SOM classifies the question into an answer class. Experiments on 2,206 e-mail questions collected over one month from Hanmail Net show that the system finds the correct answers with a recognition rate of 95% and that map-based browsing is conceptual and efficient.
https://doi.org/10.5391/JKIIS.2002.12.5.432
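As background for the entry above, a single training step of a generic self-organizing map can be sketched as follows. This is a simplified one-dimensional map with a Gaussian neighbourhood, not the two-level architecture of the paper; the function names and the `lr` and `radius` parameters are assumptions made for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )

def som_step(weights, x, lr=0.5, radius=1.0):
    """One SOM update on a 1-D map: pull the best matching unit and its neighbours toward x.

    weights is a list of weight vectors, updated in place.
    Returns the index of the best matching unit.
    """
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        # Gaussian neighbourhood: nodes near the BMU on the map move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu
```

After training, the index returned by `best_matching_unit` for a new input vector can serve as its cluster or class, which is the role a SOM plays in both levels described in the abstract.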

A Study on the Influencing Factors of Continuous Usage Intention for a Scenario based FAQ Service regarding on Private Information Protection (개인정보보호에 관한 시나리오 기반 질의응답서비스 품질이 이용의도에 미치는 요인에 관한 연구)

Kang, Sang-Ug;Lee, Dae-Chul
- Journal of Digital Convergence / v.12 no.2 / pp.223-236 / 2014
This paper studies the factors that influence continuous usage intention for a scenario-based cognitive FAQ service on private information protection. The results show that three major factors have a significant positive effect on continuous usage intention for the service. First, ease of search is an essential factor, and it can be improved through sophisticated categorization. Second, a scenario-based FAQ service is effective for understanding and resolving the questioner's situation. Third, related information is helpful for problem solving. The research shows that this new approach to private information protection can lead to a more acceptable and reasonable problem-solving tool.
https://doi.org/10.14400/JDC.2014.12.2.223

A New Similarity Measure for Improving Ranking in QA Systems (질의응답시스템 응답순위 개선을 위한 새로운 유사도 계산방법)

Kim Myung-Gwan;Park Young-Tack
- Journal of KIISE:Computing Practices and Letters / v.10 no.6 / pp.529-536 / 2004
The main idea of this paper is to combine position information within sentences with query type classification to improve the ranking of documents for a query. First, the use of conceptual graphs to represent document contents in information retrieval is discussed. The method is based on well-known text comparison strategies, such as the Dice coefficient, with position-based term weighting. Second, we introduce a method for learning query type classification that improves the ability of a question answering system to retrieve answers to questions. The proposed methods employ naive Bayes classification from the machine learning field. For training, we used a collection of approximately 30,000 question-answer pairs obtained from Frequently Asked Questions (FAQ) files on various subjects. An evaluation on a set of queries from the international TREC-9 question answering track shows that the method with machine learning outperforms the other systems in TREC-9 (0.29 for mean reciprocal rank and 55.1% for precision).
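Two quantities named in this abstract have standard definitions that can be sketched briefly: the Dice coefficient between two token sets, and mean reciprocal rank (MRR) over a set of queries. The code below shows the plain, unweighted Dice coefficient; the position-based term weighting used in the paper is not specified in the abstract and is not reproduced here. Function names are illustrative.

```python
def dice(a_tokens, b_tokens):
    """Unweighted Dice coefficient between two token collections (treated as sets)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def mean_reciprocal_rank(ranks):
    """MRR over queries.

    ranks: for each query, the 1-based rank of the first correct answer,
    or None when no correct answer was returned (contributes 0).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)
```

For example, an MRR of 0.29 as reported above means that, averaged over queries, the reciprocal of the first correct answer's rank is 0.29.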

Classification of Query E-Mail Using Neural Network (신경망을 이용한 사용자 질의 전자 메일 분류)

변영철;홍영보
- Journal of Korea Multimedia Society / v.7 no.3 / pp.438-449 / 2004
As Internet use grows, more and more users are sending query e-mails. Site operators want users to check the FAQ and Q&A contents before sending a query e-mail to the operator. However, users try to solve a problem the easy way by simply sending a query e-mail. The growth in query e-mail is therefore inevitable, and site operators suffer from heavy workloads and spend too much time and money replying to it. In this paper, we propose an efficient method for automatically classifying users' query e-mails with a neural network. To verify the validity of our work, we used query e-mails for KORNET, actually collected at KT, as test data. A total of 210 training examples and 280 test examples were used to test the performance of the proposed approach. The experiments gave encouraging results from the viewpoint of real-life application. The proposed approach met the needs of users who wanted a rapid response to their query e-mails.

An Interactive Approach to Categorize Questions on the Internet BBSs (인터넷 게시판 질문 분류를 위한 인터랙티브 접근방법에 관한 연구)

Jae-Kwang Lee;Seong-Ho Noh;Ok-Hyun Ryou
- The Journal of Society for e-Business Studies / v.8 no.3 / pp.177-195 / 2003
In a traditional customer support environment, call centers or service centers are mainly responsible for receiving inquiries from customers by telephone. With the rapid growth of the Internet and its widespread acceptance and accessibility, the traditional means of communicating with customers, such as telephone calls, letters, and in-person visits, have steadily been replaced by e-mail and Internet bulletin board systems (BBSs). BBSs are basically question and answer systems, and they require some lead time before the administrator answers. To reduce lead time, BBSs let remote customers or users log on and tap into a knowledge database, generally organized as Frequently Asked Questions (FAQs), that provides answers and solutions to common problems. In addition, many different types of questions are mixed together on a BBS, which is a burden on the administrator. A tool that categorizes questions is therefore helpful for building FAQs and supporting the BBS administrator. In this research, we suggest an interactive question categorizing methodology that consists of representing questions with keywords, identifying the affinity between keywords, computing similarity among questions, and clustering questions. This methodology lets users interact iteratively to clarify ambiguous questions. We also developed a prototype system, IQC (interactive question categorizer), and evaluated its performance in comparison experiments with other systems. IQC is not a general-purpose system, but it produces good results in a given specific domain.

A Usability Test of E-mail Automatic Response and Browsing System Using Self-Organizing Map (자기 구성 지도를 이용한 전자메일 자동응답 및 브라우징 시스템의 사용성 평가)

노영주;조성배
- Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.220-222 / 2001
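As the number of computer users grows and Internet access spreads rapidly, a great deal of information is being produced, and services that deliver this information to users more efficiently have also multiplied. However, users who are not familiar with computers cannot easily use these services, so a system that assists them is needed. At Internet service providers, administrators answer users' questions directly, and the growing number of users is increasing the question answering workload. To address this, this paper examines a system that automatically classifies and answers user queries and lets users browse the FAQ conceptually. To demonstrate the usefulness of the system, we statistically analyzed its applicability and the results of its use by general users.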

Question Similarity Analysis in dialogs with Automatic Feature Extraction (자동 추출 자질을 이용한 대화 속 질의 문장 유사성 분석)

Oh, KyoJoong;Lee, DongKun;Lim, Chae-Gyun;Choi, Ho-Jin
- Annual Conference on Human and Language Technology / 2018.10a / pp.347-351 / 2018
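This paper describes a method for analyzing sentence similarity using features automatically extracted by deep learning models, in order to understand queries in a dialog system. As automatically extracted features for analyzing similarity between sentences, it uses sentence vectors generated with an RNN, which reflects the sequential information of expressions within a sentence, and sentence vectors generated with a CNN, which learns a language model regardless of word order. These automatically extracted sentence embedding features are used to classify input sentences or to analyze similarity between sentences in financial service dialogs. The similarity analysis results are used to find FAQ sentences related to a query sentence or to find answer knowledge.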
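Once sentences are represented as fixed-length vectors, finding the FAQ sentence closest to a query reduces to a nearest-neighbour search. The abstract does not say which similarity measure the authors used; the sketch below assumes cosine similarity, a common choice, and the function names are hypothetical. The vectors would come from an encoder such as the RNN or CNN mentioned above, which is outside the scope of this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_vec, faq_vecs):
    """Index of the FAQ sentence vector most similar to the query vector."""
    return max(
        range(len(faq_vecs)),
        key=lambda i: cosine_similarity(query_vec, faq_vecs[i]),
    )
```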
