• Title/Summary/Keyword: Cross-Lingual

Cross-Lingual Text Retrieval Based on a Knowledge Base (지식베이스에 기반한 다언어 문서 검색)

  • Choi, Myeong-Bok;Jo, Jun
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.10 no.1, pp.21-32, 2010
  • How a user formulates a query strongly affects retrieval effectiveness when documents are retrieved from a general domain such as the web. This paper proposes an intelligent information retrieval method based on a cross-lingual knowledge base for effective cross-lingual text retrieval from the web. Knowledge inferred from the cross-lingual knowledge base supports the user's word association, making it easier to compose an accurate query. The paper develops a user query reformulation algorithm and evaluates it on the Korean and English web. Experimental results show that the algorithm with the proposed knowledge base is much more effective than cross-lingual text retrieval without a knowledge base.

English-Korean Cross-lingual Link Discovery Using Link Probability and Named Entity Recognition (링크확률과 개체명 인식을 이용한 영-한 교차언어 링크 탐색)

  • Kang, Shin-Jae
    • Journal of the Korean Institute of Intelligent Systems, v.23 no.3, pp.191-195, 2013
  • This paper proposes an automatic method for discovering cross-lingual links from English Wikipedia documents to Korean ones in order to increase connectivity among vast web resources. Whereas existing methods roughly estimate the link probability of phrases, the proposed method selects candidate anchors from English documents using several sources of information: title lists and link probabilities extracted from Wikipedia dumps, and the results of named-entity recognition. The anchors are then translated into Korean words, and the most suitable Korean documents containing those words are selected as cross-lingual links. In experiments, the method achieved a mean average precision (MAP) of 0.375.
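
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean average precision is commonly computed over ranked result lists. The function names and the toy document identifiers are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over all (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical rankings: each source anchor has a ranked list of candidate targets.
    toy_runs = [
        (["a", "b", "c"], {"a", "c"}),
        (["x", "y"], {"y"}),
    ]
    print(mean_average_precision(toy_runs))
```

Under this definition a score of 0.375 means that, averaged over queries, relevant targets tend to appear some way down the ranking rather than consistently at the top.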

Effective Cross-Lingual Text Retrieval using a Fuzzy Knowledge Base (퍼지 지식베이스를 이용한 효과적인 다언어 문서 검색)

  • Choi, Myeong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.8 no.1, pp.53-62, 2008
  • Cross-lingual text retrieval (CLTR) is information retrieval in which a user searches a set of documents written in one language with a query in another language. This paper proposes a CLTR system based on a fuzzy multilingual thesaurus to handle partial matching between terms of two different languages. The proposed system uses a fuzzy term matrix, defined in the paper, to perform retrieval effectively. In this matrix, all relation degrees between terms are inferred with a transitive closure algorithm so that every implicit link between terms is reflected in retrieval. With this framework, the proposed CLTR system improves retrieval effectiveness because it can emulate a human expert's decision making in CLTR.

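The idea of inferring implicit term relations through a transitive closure can be sketched as follows. This is an illustration only: it assumes max-min composition, a common choice for fuzzy relations, whereas the abstract does not say which composition the authors use. The terms and relation degrees are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: the strength of a two-step path is its weakest link."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def fuzzy_transitive_closure(relation: Matrix) -> Matrix:
    """Repeat R := max(R, R o R) until no relation degree changes."""
    closure = [row[:] for row in relation]
    n = len(closure)
    while True:
        composed = max_min_compose(closure, closure)
        updated = [
            [max(closure[i][j], composed[i][j]) for j in range(n)]
            for i in range(n)
        ]
        if updated == closure:
            return closure
        closure = updated


if __name__ == "__main__":
    # Hypothetical fuzzy term matrix over three terms (degrees are made up):
    # index 0 = "car" (English), 1 = "자동차" (Korean), 2 = "vehicle" (English)
    terms = ["car", "자동차", "vehicle"]
    relation = [
        [1.0, 0.9, 0.0],
        [0.9, 1.0, 0.7],
        [0.0, 0.7, 1.0],
    ]
    closure = fuzzy_transitive_closure(relation)
    # "car" and "vehicle" had no direct link; the closure infers one via "자동차".
    print(terms[0], terms[2], closure[0][2])
```

Because max and min only select existing values, the loop works on a finite set of degrees and is guaranteed to stop.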

Llama2 Cross-lingual Korean with instruction and translation datasets (지시문 및 번역 데이터셋을 활용한 Llama2 Cross-lingual 한국어 확장)

  • Gyu-sik Jang;Seung-Hoon Na;Joon-Ho Lim;Tae-Hyeong Kim;Hwi-Jung Ryu;Du-Seong Chang
    • Annual Conference on Human and Language Technology, 2023.10a, pp.627-632, 2023
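  • Large language models, backed by high compute and vast amounts of data, show outstanding performance and are attracting attention in natural language processing. These models can now process text in a variety of languages and domains, but the share of Korean in their overall training data remains small. As a result, they understand and process Korean relatively poorly compared with major languages such as English. This paper addresses this problem and proposes a method for improving the Korean processing ability of large language models. In particular, we explore using cross-lingual transfer learning to transfer the model's knowledge of other languages to Korean. With this approach, the model substantially improved its Korean processing ability while minimizing loss on the languages it already handled. In experiments, the model trained with this technique more than doubled the baseline model's performance on the NSMC dataset, with especially large gains in understanding complex Korean structure and context. We expect this work to contribute to better application of large language models to Korean.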

A Method of Chinese and Thai Cross-Lingual Query Expansion Based on Comparable Corpus

  • Tang, Peili;Zhao, Jing;Yu, Zhengtao;Wang, Zhuo;Xian, Yantuan
    • Journal of Information Processing Systems, v.13 no.4, pp.805-817, 2017
  • Cross-lingual query expansion is usually based on the relationships among monolingual words. A bilingual comparable corpus, however, contains relationships among bilingual words. This paper therefore proposes a query expansion method based on these relationships. First, word vectors that characterize the bilingual words are trained on a Chinese and Thai bilingual comparable corpus. Then, the correlation between Chinese query words and Thai words is computed from these word vectors, and Thai candidate expansion terms are selected by their correlation values. Next, multiple groups of Thai query expansion sentences are built from the Thai candidate expansion words based on the Chinese query sentence. Finally, the optimal sentence is obtained with the Chinese and Thai query expansion method and used to perform the Thai query expansion. Experimental results show that the proposed cross-lingual query expansion method can effectively improve the accuracy of Chinese and Thai cross-language information retrieval.
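
The candidate selection step described above can be sketched as a nearest-neighbour search in a shared bilingual vector space. The sketch assumes cosine similarity as the correlation measure, which the abstract does not specify, and uses made-up two-dimensional vectors with placeholder word labels rather than real trained embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expansion_candidates(
    query_vec: Sequence[float],
    target_vectors: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank target-language words by similarity to the query word vector."""
    scored = [(word, cosine(query_vec, vec)) for word, vec in target_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical vectors: one Chinese query word and three Thai words
    # assumed to live in the same embedding space.
    chinese_query_vec = [1.0, 0.0]
    thai_vectors = {
        "thai_word_a": [0.9, 0.1],
        "thai_word_b": [0.0, 1.0],
        "thai_word_c": [0.5, 0.5],
    }
    for word, score in expansion_candidates(chinese_query_vec, thai_vectors):
        print(word, round(score, 3))
```

In the paper's pipeline these candidates would then be assembled into expansion sentences and ranked; that later stage is not sketched here.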

Risk of lingual nerve injuries in removal of mandibular third molars: a retrospective case-control study

  • Tojyo, Itaru;Nakanishi, Takashi;Shintani, Yukari;Okamoto, Kenjiro;Hiraishi, Yukihiro;Fujita, Shigeyuki
    • Maxillofacial Plastic and Reconstructive Surgery, v.41, pp.40.1-40.7, 2019
  • Background: Through the analysis of clinical data, we attempted to investigate the etiology and determine the risk of severe iatrogenic lingual nerve injuries in the removal of the mandibular third molar. Methods: A retrospective chart review was performed for patients who had undergone microsurgical repair of lingual nerve injuries. The following data were collected and analyzed: patient sex, age, nerve injury side, and type of impaction (Winter's classification, Pell and Gregory's classification). Ratios for the lingual nerve injury group were compared with the corresponding ratios for the control group, which consisted of data collected from the literature. The control group data included previous patients who encountered various complications during the removal of the mandibular third molar. Results: The lingual nerve injury group consisted of 24 males and 58 females. The rate of female patients with iatrogenic lingual nerve injuries was significantly higher than in the control groups. Ages ranged from 15 to 67 years, with a mean age of 36.5 years. With respect to age, the lingual nerve injury group differed significantly from the control groups. The lingual nerve injury was on the right side in 46 patients and on the left side in 36 patients. There was no significant difference for the injury side. The distoangular and horizontal ratios were the highest in our lingual nerve injury group. The distoangular impaction rate in our lingual nerve injury group was significantly higher than the rate for the control groups. Conclusion: Distoangular impaction of the mandibular third molar in female patients in their 30s, 40s, and 50s may carry a higher risk of severe lingual nerve injury in the removal of mandibular third molars.

Contour of lingual surface in lower complete denture formed by polished surface impression

  • Heo, Yu-Ri;Kim, Hee-Jung;Son, Mee-Kyoung;Chung, Chae-Heon
    • The Journal of Advanced Prosthodontics, v.8 no.6, pp.472-478, 2016
  • PURPOSE. The aim of this study was to analyze the shapes of lingual polished surfaces in lower complete dentures formed by polished surface impressions and to provide reference data for use when manufacturing edentulous trays and lower complete dentures. MATERIALS AND METHODS. Twenty-six patients with mandibular edentulism were studied. After lower wax dentures were fabricated, wax was removed from the lingual side of the wax denture and a lingual polished surface impression was obtained with tissue conditioner. The definitive denture was scanned with a three-dimensional scanner, and scanned images were obtained. At the cross-sections of the lingual frenum, lateral incisors, first premolars, first molars, and anterior border of the retromolar pads, three points were marked and eight measurements were taken. The Kruskal-Wallis test and a post hoc analysis with the Mann-Whitney test were performed. RESULTS. Each patient showed similar values for the same areas on the left and right sides, without a statistically significant difference. The height of the contour of the lingual polished surface at the lingual frenum was halfway between the occlusal plane and the lingual border, and it moved gradually downward. The angle from the occlusal plane to the height of the contour of the lingual polished surface increased from the lingual frenum toward the retromolar pads. CONCLUSION. The shape of the mandibular lingual polished surface was convex at the lingual frenum and lateral incisors and gradually flattened toward the first molars and retromolar pads.

Reference line for computed tomogram of the mandible (하악골 전산화단층사진촬영시 기준선에 관한 연구)

  • You Choong-Hyun;Kim Jae-Duk
    • Imaging Science in Dentistry, v.32 no.3, pp.153-157, 2002
  • Purpose: This study was performed to determine the proper reference line for taking axial computed tomograms from which good cross-sectional views can be reformatted by multiplanar reconstruction. Methods: Three dry mandibles with gutta percha cones implanted in the extraction sockets were scanned axially according to 6 reference lines in 2 mandibular positions with a Hitachi W550 computed tomography scanner. The accuracy of the measured lengths of the implanted gutta percha cones in each cross-sectional view reformatted from the axial computed tomograms by multiplanar reconstruction was evaluated. Results: The difference between the measurements and the real length of the implant was smallest in the bucco-lingual views reformatted from the axial views scanned according to the reference line of group V-a. The smaller the angle between the reference line and the occlusal line, the smaller the difference between the measurements in the bucco-lingual views reformatted from axial views and the real length of the implant. The majority of measured widths of implants in the bucco-lingually reformatted views were larger than the actual values. Conclusions: When the mandible is inclined within the limits of the gantry angle and scanned with the reference line coincident with the occlusal plane, the bucco-lingual view can be reformatted from the axially scanned images without image deformation.

Korean language model construction and comparative analysis with Cross-lingual Post-Training (XPT) (Cross-lingual Post-Training (XPT)을 통한 한국어 언어모델 구축 및 비교 실험)

  • Suhyune Son;Chanjun Park;Jungseob Lee;Midan Shim;Sunghyun Lee;JinWoo Lee;Aram So;Heuiseok Lim
    • Annual Conference on Human and Language Technology, 2022.10a, pp.295-299, 2022
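  • In low-resource language settings, there are limits to building the large corpora needed to train pretrained language models. This paper applies the Cross-lingual Post-Training (XPT) methodology, which can overcome these limits, and analyzes its efficiency for Korean, a comparatively low-resource language. Using only small Korean corpora of 400K and 4M, we compare and analyze overall performance against various Korean pretrained models (KLUE-BERT, KLUE-RoBERTa, Albert-kor) and mBERT. We evaluate Korean downstream tasks on the KLUE benchmark, a representative Korean benchmark dataset; on 5 of the 7 tasks, the XPT-4M model achieves the best or second-best performance in comparison with existing Korean language models. This shows that XPT performs comparably to Korean language models trained on far more data and that its training process is very efficient.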

Study on Zero-shot based Quality Estimation (Zero-Shot 기반 기계번역 품질 예측 연구)

  • Eo, Sugyeong;Park, Chanjun;Seo, Jaehyung;Moon, Hyeonseok;Lim, Heuiseok
    • Journal of the Korea Convergence Society, v.12 no.11, pp.35-43, 2021
  • Recently, there has been growing interest in zero-shot cross-lingual transfer, which leverages cross-lingual language models (CLLMs) to perform downstream tasks in languages for which no task-specific training was done. In this paper, we point out the limitations of the data-centric aspect of quality estimation (QE) and perform zero-shot cross-lingual transfer even in settings where it is difficult to construct QE data. Few studies have dealt with zero-shot transfer in QE; after fine-tuning on the English-German QE dataset, we perform zero-shot transfer leveraging CLLMs. We conduct a comparative analysis of various CLLMs. We also perform zero-shot transfer on language pairs with differently sized resources and analyze the results based on the linguistic characteristics of each language. Experimental results showed the highest performance for multilingual BART and multilingual BERT, and we enabled QE to be performed even when no QE training was done for a specific language pair.