• Title/Abstract/Keywords: Text Similarity

Search results: 281 items (processing time: 0.028 seconds)

Extended Semantic Web Services Retrieval Model for the Intelligent Web Services (지능형 웹 서비스를 위한 확장된 시맨틱 웹서비스 검색 모델)

  • Choi, Ok-Kyung;Han, Sang-Yong;Lee, Zoon-Ky
    • The KIPS Transactions: Part D / Vol. 13D, No. 5 / pp. 725-730 / 2006
  • Web services have recently become a key technology that is indispensable for e-business. Because they can provide the desired information or service regardless of time and place, the integration of existing application systems within a single business or across multiple businesses is being realized with standardized technologies over open networks and the Internet. However, current Web services retrieval systems are based on text-oriented search and cannot provide reliable results, because they do not recognize the similarity or interrelation between terms. No existing Web services retrieval model offers such semantic web functions. This research addresses these problems by designing and implementing an extended Semantic Web Services Retrieval Model that can search general web documents, UDDI, and semantic web documents. The paper presents the execution results and uses them to verify the model's efficiency and accuracy.

An Automatic LOINC Mapping Framework for Standardization of Laboratory Codes in Medical Informatics (의료 정보 검사코드 표준화를 위한 LOINC 자동 매핑 프레임웍)

  • Ahn, Hoo-Young;Park, Young-Ho
    • Journal of Korea Multimedia Society / Vol. 12, No. 8 / pp. 1172-1181 / 2009
  • An electronic medical record (EMR) is a medical system in which all tests are recorded as text data. However, domestic EMR systems use various forms of medical records. Much related work has sought to standardize laboratory codes as LOINC (Logical Observation Identifiers Names and Codes). However, the existing studies resolve the problem manually, and a manual process does not work when the volume of data is enormous. They also use a file system, which is not suitable for enormous medical data. This paper proposes a novel automatic LOINC mapping algorithm that uses indexing techniques and semantic similarity analysis of medical information. Whereas the existing studies only proposed algorithms, we designed and implemented a mapping algorithm for standardizing laboratory codes in medical informatics. The approach makes the automatic creation of search words possible. Moreover, the paper implements a medical search framework based on a database system designed with large-scale medical data in mind.

A Study on the Analysis of Intellectual Structure of Korean Veterinary Sciences (국내 수의과학 분야의 지적 구조 분석에 관한 연구)

  • Cho, Hyun-Yang
    • Journal of Information Management / Vol. 43, No. 2 / pp. 43-66 / 2012
  • The purpose of this study is to examine the intellectual structure of veterinary sciences in Korea using author profiling analysis (APA), a bibliometric approach. Three journals were selected on the basis of citation data as those exchanging the most citations with Korean Journal of Veterinary. Then the 50 authors who published the most articles in the selected journals during the given period were chosen. Similarity and dissimilarity among authors were analyzed by comparing co-word appearance patterns in article titles, abstracts, and keywords. The authors can be grouped into 11 minor clusters under 4 major clusters, depending on their interests within veterinary sciences in Korea. The subject of each cluster was determined by matching the keywords that represent each author's research interests. As a result, it is possible to identify current research trends and the researcher network in the field of veterinary sciences.
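
The comparison of co-word appearance patterns described in this abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so cosine similarity over word counts is an assumption here, and the function names and toy author data are hypothetical.

```python
import math
from collections import Counter


def word_profile(texts):
    """Count word occurrences across one author's titles, abstracts, and keywords."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(p, q):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical toy data, not taken from the study.
author_a = word_profile(["canine parvovirus vaccine", "canine vaccine response"])
author_b = word_profile(["canine vaccine trial"])
author_c = word_profile(["bovine mastitis milk"])

print(cosine_similarity(author_a, author_b))  # shared vocabulary, higher score
print(cosine_similarity(author_a, author_c))  # no shared words, 0.0
```

A pairwise matrix of such scores is the kind of input a clustering step could use to group authors by research interest.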

The Analysis of Chosun Dynasty Poetry Using 3D Data Visualization (3D 시각화를 이용한 조선시대 시문 분석)

  • Min, Kyoung-Ju;Lee, Byoung-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 25, No. 7 / pp. 861-868 / 2021
  • With the development of big-data visualization technology, tasks such as intuitively analyzing large volumes of data, detecting errors, and deriving meaning are actively progressing. In this paper, we describe the design and implementation of a 3D analysis system that collects the Chinese-character writings provided by the Korean Classical Database of the Korean Classics Translation Institute, stores and processes the data, and visualizes the writing information as a 3D network diagram. The system addresses the problems that arise when a large amount of data is expressed in 2D, and it supports intuitive analysis, error detection, extraction of meaningful data such as characteristics, similarities, and differences, and user convenience. In doing so, we improve on the 2D visualization used in previous studies to analyze Chosun dynasty poetry in Chinese characters.

Multi-Document Summarization Method of Reviews Using Word Embedding Clustering (워드 임베딩 클러스터링을 활용한 리뷰 다중문서 요약기법)

  • Lee, Pil Won;Hwang, Yun Young;Choi, Jong Seok;Shin, Young Tae
    • KIPS Transactions on Software and Data Engineering / Vol. 10, No. 11 / pp. 535-540 / 2021
  • Multi-document refers to a document consisting of various topics rather than a single topic; a typical example is online reviews. There have been several attempts to summarize online reviews because of the vast amount of information they contain. However, summarizing reviews collectively with existing summary models loses the various topics that make up the reviews. Therefore, in this paper, we present a method that summarizes reviews with minimal loss of topics. The proposed method classifies reviews through preprocessing, importance evaluation, embedding substitution using BERT, and embedding clustering. The classified sentences are then passed to a trained Transformer summary model to generate the final summary. In the performance evaluation, the proposed model was compared with the existing summary model and the seq2seq model using ROUGE scores and cosine similarity, and it achieved higher summarization performance than the existing summary model.
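
The embedding clustering step can be sketched as follows. The abstract does not say which clustering algorithm the authors used, so the greedy threshold grouping below is a stand-in chosen for brevity, not the paper's method. The short vectors stand in for BERT sentence embeddings, and the threshold value is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is similar enough.

    Returns a list of clusters, each a list of indices into `vectors`.
    This is an illustrative stand-in, not the clustering used in the paper.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            seed = vectors[members[0]]
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found, start a new one
    return clusters


# Hypothetical 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(greedy_cluster(sentence_vectors))
```

Each resulting group of sentences would then be summarized separately, which is how topic loss is limited in the approach the abstract describes.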

A Study on Smallpox and Measles by BYUN Gwangwon - Based on a formation Yosandangsinjipuibangkeumnangjibo and The Bojeoksinbang - (변광원(卞光源)의 두진(痘疹)과 마진(麻疹)에 대한 연구 - 『요산당신집의방금낭지보(樂山堂新集醫方錦囊至寶)』와 『보적신방(保赤新方)』의 편제를 중심으로 -)

  • SONG, Jichung
    • Journal of Korean Medical classics / Vol. 35, No. 3 / pp. 59-69 / 2022
  • Objectives : The existence of specialized medical texts on a certain disease reflects its prevalence at the time. Smallpox and measles were major pediatric diseases. Previous studies have examined the outbreak of measles in late Joseon, the relationships among various specialized texts, and how records of the two diseases in the general medical literature changed over time. However, the two diseases as recorded in different texts written by the same author have not been studied before. Methods : The organization of the smallpox and measles parts in the Yosandangsinjipuibangkeumnangjibo and the Bojeoksinbang was examined, followed by comparative analysis. Results : While the two texts are very similar in their general content on smallpox and measles, they differ in the way they were written. In the Yosandangsinjipuibangkeumnangjibo the author lists the referenced literature, while in the Bojeoksinbang he does not. Also, compared with the Yosandangsinjipuibangkeumnangjibo, the Bojeoksinbang has detailed titles for the contents of both the introduction and the detailed parts, contains material that could not be found in the Yosandangsinjipuibangkeumnangjibo, and offers more pattern differentiation. Conclusions : The Yosandangsinjipuibangkeumnangjibo, published in May 1806, is a general medical text in which the part on pediatrics is positioned in the first two of its 12 volumes, indicating the author's emphasis on pediatric disease. The Bojeoksinbang, published in December 1806, discusses in depth the theories on smallpox and measles among pediatric diseases, offering a glimpse of pediatrics as a specialized field in the late Joseon period.

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / Vol. 8, No. 3 / pp. 469-475 / 2022
  • Training a deep learning-based single-speaker TTS model requires a speech DB of tens of hours and a great deal of training time. This makes training multi-speaker or personalized TTS models inefficient in terms of time and cost. The voice cloning method uses a speaker encoder model to build the TTS model of a new speaker. The trained speaker encoder creates a speaker embedding vector representing the timbre of the new speaker from a small amount of that speaker's speech data that was not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, a synthesizer, and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. The experimental results from the MOS and timbre similarity listening tests show that the performance of the proposed TTS system is excellent.

Development of a Ranking System for Tourist Destination Using BERT-based Semantic Search (BERT 기반 의미론적 검색을 활용한 관광지 순위 시스템 개발)

  • KangWoo Lee;MyeongSeon Kim;Soon Goo Hong;SuGyeong Roh
    • Journal of Korea Society of Industrial Information Systems / Vol. 29, No. 4 / pp. 91-103 / 2024
  • A tourist destination ranking system was designed that employs semantic search to extract information with reasonable accuracy. To this end, the process involves collecting data, preprocessing text reviews of tourist spots, and embedding the corpus and queries with SBERT. We calculate the similarity between data points, filter out those below a specified threshold, and then rank the remaining tourist destinations using a count-based algorithm to align them semantically with the query. To assess the efficacy of the ranking algorithm, experiments were conducted with four queries. Furthermore, 58,175 sentences were directly labeled to ascertain their semantic relevance to the third query, 'crowdedness'. Notably, human-labeled data for crowdedness showed similar results. Despite challenges including threshold optimization and imbalanced data, this study shows that semantic search is a powerful method for understanding user intent and recommending tourist destinations with less time and cost.
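
The filter-then-count ranking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: cosine similarity is assumed as the similarity measure, the short vectors stand in for SBERT embeddings, and the function names, threshold value, and destination labels are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_destinations(query_vec, review_vecs, threshold=0.5):
    """Rank destinations by how many of their review sentences match the query.

    `review_vecs` is a list of (destination, embedding) pairs. Sentences whose
    similarity to the query falls below `threshold` are discarded; the rest are
    counted per destination. The threshold value is an assumption.
    """
    counts = Counter()
    for destination, vec in review_vecs:
        if cosine(query_vec, vec) >= threshold:
            counts[destination] += 1
    return counts.most_common()


# Hypothetical toy embeddings, not data from the study.
query = [1.0, 0.0]
reviews = [
    ("A", [1.0, 0.0]),
    ("A", [0.9, 0.1]),
    ("B", [0.8, 0.2]),
    ("B", [0.0, 1.0]),
    ("C", [0.0, 1.0]),
]
print(rank_destinations(query, reviews))
```

Destinations with no sentence above the threshold drop out of the ranking entirely, which is one reason the choice of threshold matters, as the abstract notes.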

A Design and Implementation of a Content_Based Image Retrieval System using Color Space and Keywords (칼라공간과 키워드를 이용한 내용기반 화상검색 시스템 설계 및 구현)

  • Kim, Cheol-Ueon;Choi, Ki-Ho
    • The Transactions of the Korea Information Processing Society / Vol. 4, No. 6 / pp. 1418-1432 / 1997
  • Most general content-based image retrieval techniques use color and texture as retrieval indices. Among color techniques, retrieval based on color histograms and color pairs suffers from a lack of spatial information and text. This paper describes the design and implementation of a content-based image retrieval system using color space and keywords. The preprocessor for image retrieval uses the existing HSI (Hue, Saturation, Intensity) coordinate system and splits each image into a chromatic region and an achromatic region. The image size must be normalized to 200×N or N×200, and true colors must be converted to 256 colors. Two color histograms, one for the background and one for the object, are used to decide on color selection in the color space. Spatial information is obtained using maximum entropy discretization. The class, color, shape, location, and size of an image can be chosen by keyword. Input colors are limited to 15 keywords for the chromatic and achromatic colors of the Korea Industrial Standards. The image retrieval method uses similarity as the key retrieval property. The weight values of color space $\alpha$ (%) and keyword $\beta$ (%) can be chosen by the user when entering the query words and adjusted according to the properties of the image contents. In the test, retrieval results using extracted features such as color space and keyword for the query image were lower than those obtained with the weight values. With the weight values, the averages of the measured parameters were approximately Precision 0.858, Recall 0.936, RT 1, and MT 0. These results show a higher retrieval effect than content-based image retrieval using color space or keywords alone.
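
The user-chosen weights and the reported Precision and Recall figures can be illustrated with a minimal sketch. The abstract does not give the combination formula, so a weighted average of a color-space similarity and a keyword similarity is an assumption, as are the function names and the toy inputs. Precision and recall follow their standard definitions.

```python
def combined_score(color_sim, keyword_sim, alpha=50.0, beta=50.0):
    """Weighted average of two similarities, with weights given in percent.

    Assumed formula: (alpha * color_sim + beta * keyword_sim) / (alpha + beta).
    """
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (alpha * color_sim + beta * keyword_sim) / (alpha + beta)


def precision_recall(retrieved, relevant):
    """Standard precision and recall for a set of retrieved image IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical values: a query weighted 70% on color space and 30% on keywords.
print(combined_score(color_sim=1.0, keyword_sim=0.0, alpha=70, beta=30))
print(precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 5]))
```

Dividing by the sum of the weights keeps the score in the same range as the inputs even if the two percentages do not add up to 100.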

SEARCH FOR EXOPLANETS AROUND NORTHERN CIRCUMPOLAR STARS III. LONG-PERIOD RADIAL VELOCITY VARIATIONS IN HD 18438 AND HD 158996

  • Bang, Tae-Yang;Lee, Byeong-Cheol;Jeong, Gwang-Hui;Han, Inwoo;Park, Myeong-Gu
    • Journal of The Korean Astronomical Society / Vol. 51, No. 1 / pp. 17-25 / 2018
  • Detecting exoplanets around giant stars sheds light on the later-stage evolution of planetary systems. We observed the M giant HD 18438 and the K giant HD 158996 as part of a Search for Exoplanets around Northern circumpolar Stars (SENS) and obtained 38 and 24 spectra, respectively, from 2010 to 2017 using the high-resolution Bohyunsan Observatory Echelle Spectrograph (BOES) at the 1.8 m telescope of Bohyunsan Optical Astronomy Observatory in Korea. We obtained precise radial velocity (RV) measurements from the spectra and found long-period RV variations with periods of 719.0 days for HD 18438 and 820.2 days for HD 158996. To identify the origin of the observed RV variations, we checked the chromospheric activity using the Ca II H and H$\alpha$ lines, HIPPARCOS photometry, and line bisectors. In the case of HD 18438, we conclude that the observed RV variations with a period of 719.0 days are likely caused by pulsations, because the periods of the HIPPARCOS photometric and H$\alpha$ EW variations are similar to that of the RV variations in the Lomb-Scargle periodogram, and there are no correlations between bisectors and RV measurements. In the case of HD 158996, on the other hand, we did not find any similarity in the respective periodograms nor any correlation between RV variations and line bisector variations. In addition, the probability that the real rotational period can be at least as long as the RV period for HD 158996 is only about 4.3%. Thus we conclude that the observed RV variations of HD 158996, with a period of 820.2 days, are caused by a planetary companion with a minimum mass of 14.0 $M_{\rm Jup}$, a semi-major axis of 2.1 AU, and an eccentricity of 0.13, assuming a stellar mass of $1.8\,M_{\odot}$. HD 158996 is so far one of the brightest and largest stars to harbor an exoplanet candidate.
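
The semi-major axis quoted in this abstract can be checked against the period and stellar mass with Kepler's third law. The sketch below assumes the companion's mass is much smaller than the stellar mass and uses the form $a^3 = M P^2$, with $a$ in AU, $P$ in years, and $M$ in solar masses. The function name is hypothetical and the calculation is not taken from the paper.

```python
def semi_major_axis_au(period_days, stellar_mass_msun):
    """Semi-major axis in AU from Kepler's third law, a^3 = M * P^2.

    Assumes the companion mass is negligible relative to the stellar mass.
    Period is converted from days to Julian years (365.25 days).
    """
    period_years = period_days / 365.25
    return (stellar_mass_msun * period_years ** 2) ** (1.0 / 3.0)


# Values quoted in the abstract for HD 158996: P = 820.2 days, M = 1.8 solar masses.
a = semi_major_axis_au(820.2, 1.8)
print(f"{a:.2f} AU")  # close to the 2.1 AU quoted in the abstract
```

The result agrees with the quoted semi-major axis to within rounding, which is a quick consistency check on the reported orbital parameters.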