• Title/Abstract/Keywords: search similarity

Search results: 537 items (processing time: 0.022 s)

Hybrid Simulated Annealing for Data Clustering

  • 김성수;백준영;강범수
    • 산업경영시스템학회지
    • /
    • Vol. 40, No. 2
    • /
    • pp.92-98
    • /
    • 2017
  • Data clustering determines groups of patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, which is popular and efficient, is sensitive to initialization and can get stuck in a local optimum because it is a hill-climbing clustering method. This method is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum rather than a local optimum, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method, which combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method can balance intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated by experiment and analysis on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository, in comparison with previous studies. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. Our method significantly improves the accuracy and efficiency of finding the global optimal data clustering solution for complex, real-time, and costly data mining processes.
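
The SA-then-K-means idea in this abstract can be illustrated with a minimal sketch. It is not the authors' HSA: the neighborhood move (Gaussian perturbation of one center), the geometric cooling schedule, and all parameter values and function names are assumptions chosen for illustration.

```python
import math
import random


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            labels.append(dists.index(min(dists)))
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sa_then_kmeans(points, k, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Diversify with simulated annealing, then intensify with K-means."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    cur = cost(points, centers)
    best, best_cost = centers, cur
    t = t0
    for _ in range(steps):
        # Assumed neighborhood move: jitter one randomly chosen center.
        i = rng.randrange(k)
        cand = list(centers)
        cand[i] = tuple(x + rng.gauss(0, 0.5) for x in centers[i])
        c = cost(points, cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if c < cur or rng.random() < math.exp((cur - c) / t):
            centers, cur = cand, c
            if cur < best_cost:
                best, best_cost = centers, cur
        t *= cooling
    refined = kmeans(points, list(best))
    return best, refined


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    sa_centers, final_centers = sa_then_kmeans(pts, k=2)
    print(cost(pts, sa_centers), cost(pts, final_centers))
```

Because each Lloyd iteration cannot increase the clustering cost, the K-means pass can only keep or improve the solution that the annealing phase found.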

Development of a Regulatory Q&A System for KAERI Utilizing Document Search Algorithms and Large Language Model

  • 김홍비;유용균
    • 한국산업정보학회논문지
    • /
    • Vol. 28, No. 5
    • /
    • pp.31-39
    • /
    • 2023
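  • Recent advances in natural language processing (NLP), particularly large language models (LLMs) such as ChatGPT, have spurred active research and development of question-answering (QA) systems for specialized domain knowledge. This paper describes the operating principles of a system that uses a large language model and document search algorithms to understand various documents, including the regulations of the Korea Atomic Energy Research Institute (KAERI), and to answer user questions. First, a large number of documents are preprocessed to facilitate search and analysis, and their contents are divided into paragraphs short enough for the language model to process. Each paragraph is converted into a vector with an embedding model and stored in a database; these vectors are compared with the vector extracted from the user's question to retrieve the passages most relevant to the question. The retrieved paragraphs and the question are then used as input to a language generation model to produce an answer. Tests with a variety of questions on internal regulations confirmed that the system understands the intent of questions about complex regulations and provides users with fast, accurate answers.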
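
The chunk, embed, retrieve, and prompt steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the sample passages are invented, and no language model is actually called.

```python
import math
from collections import Counter


def chunk(text, max_words=40):
    """Split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): lower-cased word counts instead of a neural model."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=2):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:top_k]


def build_prompt(question, passages):
    """Assemble the text that would be passed to a generation model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the passages below.\n{context}\nQuestion: {question}"


if __name__ == "__main__":
    # Invented example passages, not taken from any real regulation.
    passages = [
        "Annual leave requests must be approved by the department head.",
        "Laboratory access requires a radiation safety badge.",
        "Travel expenses are reimbursed within thirty days.",
    ]
    question = "Who approves annual leave requests?"
    print(build_prompt(question, retrieve(question, passages, top_k=1)))
```

In a real deployment the passage vectors would be computed once and stored in a vector database rather than re-embedded for every query, as the abstract describes.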

Development of a Ranking System for Tourist Destination Using BERT-based Semantic Search

  • 이강우;김명선;홍순구;노수경
    • 한국산업정보학회논문지
    • /
    • Vol. 29, No. 4
    • /
    • pp.91-103
    • /
    • 2024
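  • The purpose of this study is to design a tourist destination ranking system with reasonable accuracy based on user queries, using semantic search techniques. To this end, text review data on tourist destinations were collected, preprocessed, and embedded using SBERT. Similarity was then measured, data meeting a threshold were filtered, and a count-based ranking algorithm was applied to rank tourist destinations in order of semantic similarity to the query. To evaluate the proposed ranking algorithm, experiments were conducted with four queries, yielding the top five most relevant tourist destinations. For comparison, 58,175 sentences were manually labeled to check whether they were semantically related to the third query, congestion. The two sets of results were similar, verifying the efficiency of the ranking algorithm presented in this study. Despite issues such as threshold optimization and data imbalance, this study shows that semantic search techniques make it possible to identify user intent and recommend tourist destinations at low cost and in little time.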
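
The threshold filter followed by count-based ranking can be sketched in a few lines. The sketch assumes the query-to-sentence similarity scores have already been computed (for example with SBERT embeddings); the function name, the threshold value, and the sample data are hypothetical.

```python
from collections import Counter


def rank_destinations(scored_sentences, threshold=0.5, top_n=5):
    """Rank destinations by how many review sentences pass the threshold.

    scored_sentences: iterable of (destination, similarity_to_query) pairs,
    one pair per review sentence.
    """
    counts = Counter(dest for dest, sim in scored_sentences if sim >= threshold)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical scores for review sentences about three destinations.
    scores = [("A", 0.8), ("B", 0.6), ("A", 0.7), ("C", 0.2), ("B", 0.4), ("A", 0.55)]
    print(rank_destinations(scores))
```

A count-based score favors destinations with many reviews, which is one way the data imbalance mentioned in the abstract can affect the ranking.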

Accelerating Self-Similarity-Based Image Super-Resolution Using OpenCL

  • Jun, Jae-Hee;Choi, Ji-Hoon;Lee, Dae-Yeol;Jeong, Seyoon;Cho, Suk-Hee;Kim, Hui-Yong;Kim, Jong-Ok
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 4, No. 1
    • /
    • pp.10-15
    • /
    • 2015
  • This paper proposes a parallel implementation of a self-similarity-based image SR (super-resolution) algorithm using OpenCL. The SR algorithm requires tremendous computation to search for similar patches. This becomes a bottleneck for real-time conversion from an FHD image to UHD. Therefore, it is imperative to accelerate the processing speed of SR algorithms. For parallelization, the SR process is divided into several kernels, and memory optimization is performed. In addition, two GPUs are used for further acceleration. The experimental results show that a GPGPU implementation can achieve a speedup of over 140 times compared to a single-core CPU. Furthermore, it was confirmed experimentally that utilizing two GPUs speeds up execution proportionally, up to 277 times.
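
To show why the similar-patch search dominates the run time, here is a brute-force sketch in plain Python. It is a simplification and not the paper's OpenCL kernels: it searches a single scale of a small grayscale image and uses the sum of squared differences (SSD) as the assumed similarity measure.

```python
def ssd(image, r1, c1, r2, c2, size):
    """Sum of squared differences between two size x size patches."""
    return sum(
        (image[r1 + i][c1 + j] - image[r2 + i][c2 + j]) ** 2
        for i in range(size)
        for j in range(size)
    )


def best_match(image, row, col, size):
    """Exhaustively find the patch most similar to the one at (row, col).

    Returns (ssd_value, match_row, match_col). Every candidate position is
    independent of the others, which is what makes the search easy to
    parallelize across GPU work-items.
    """
    height, width = len(image), len(image[0])
    best = None
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            if (r, c) == (row, col):
                continue  # skip the query patch itself
            d = ssd(image, row, col, r, c, size)
            if best is None or d < best[0]:
                best = (d, r, c)
    return best


if __name__ == "__main__":
    img = [
        [1, 2, 9, 9],
        [3, 4, 9, 9],
        [9, 9, 1, 2],
        [9, 9, 3, 4],
    ]
    print(best_match(img, 0, 0, 2))
```

The cost per query patch grows with the image area times the patch area, and the search is repeated for every patch in the image.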

Retrieving English Words with a Spoken Work Transliteration

  • 김지승;김광현;이준호
    • 한국문헌정보학회지
    • /
    • Vol. 39, No. 3
    • /
    • pp.93-103
    • /
    • 2005
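  • Users of English dictionary search services sometimes cannot recall the exact spelling of the English word they want and remember only its pronunciation. To help such users, this study proposes a method for effectively retrieving English words using spoken-word transliteration, that is, the Hangul transcription of an English word's pronunciation. To this end, the KONIX code is developed, and spoken-word transliterations and English words are converted into KONIX codes. The phonetic similarity between the converted KONIX codes is then computed using the edit distance method and the 2-gram method. Experiments demonstrate that the proposed method is highly effective for retrieving English words from spoken-word transliterations.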
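
The two similarity measures named in this abstract can be sketched on plain strings. The KONIX encoding itself is not reproduced here, so the inputs are ordinary placeholder strings; using the Dice coefficient for the 2-gram measure is also an assumption, since the abstract does not give the exact formula.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def bigrams(s):
    """All overlapping two-character substrings of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_similarity(a, b):
    """Dice coefficient over 2-gram multisets (assumed variant of the 2-gram method)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((ca & cb).values())
    return 2 * shared / total


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(bigram_similarity("night", "nacht"))
```

In a retrieval setting, the code for the query would be compared against the code for every dictionary entry, and entries would be ranked by ascending edit distance or descending 2-gram similarity.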

A Study on the Types of Design Problem Solving by Analogical Thinking - Focused on the Analysis of Associated Words and Sketch -

  • 최은희;최윤아
    • 한국실내디자인학회논문집
    • /
    • Vol. 16, No. 2
    • /
    • pp.63-70
    • /
    • 2007
  • Analogy in problem solving is similarity-based reasoning facilitated by verbal and visual operations. This similarity-based reasoning generally supports the initial phase of idea search. Therefore, this study intends to infer the types of problem solving by tracing the use of analogy in verbal and visual representation through an experimental study. According to the results, the types of problem solving by analogy are classified into the 'evolving', 'divergent', and 'poor conversion' types. First, the 'evolving type' is divided into the 'combination type', in which different contents are associated to develop a new design, and the 'transformation type', in which similar words and sketches are associated and continuously revised and developed. In these types, structural analogy rather than surface analogy is usually used. Second, in the 'divergent type', associated words or sketches are represented individually, and one design solution is selected from among them. In this type, surface analogy is usually used. Third, in the 'poor conversion type', the interaction between verbal and visual representation does not proceed smoothly, and idea generation is poor. Here, surface analogy is mostly used. These findings could form the basis for developing idea generation and conversion skills in design education.

A Study on the Comparative Method of Prescription Using Herb Weight Ratio

  • 박대식;이부균;이병욱
    • 대한한의학방제학회지
    • /
    • Vol. 21, No. 2
    • /
    • pp.121-132
    • /
    • 2013
  • Objectives: The objective of this study is to establish a database for finding herbal formulas similar to a particular herbal formula by comparing the composition ratios of their constituent herbs. The study also analyzes differences between prescriptions and finds similar prescriptions by utilizing the galenical mass ratio, which is directly related to the effectiveness of a galenical. Methods: The study was carried out using Access 2007 on Windows 7 (Microsoft), and 2,787 prescriptions from Donguibogam whose herbal composition could be expressed in weight units were analyzed. All units of the prescriptions were standardized, and the mass ratio data were entered together with the galenical data. Results: The degree of similarity between a particular prescription and the compared prescriptions could be confirmed from the sum of differences in herb weight ratio and the similarity ratio. Conclusions: The most similar herbal formula can be found by comparing the herbal composition of multiple prescriptions against multiple prescriptions in the established herbal formula database, in which the herb weight ratios of prescriptions are entered.
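
Comparing prescriptions by the sum of differences in herb weight ratio can be sketched as follows. The study itself used an Access database; this Python version, its function names, and the placeholder herb names and weights are assumptions for illustration only.

```python
def weight_ratios(prescription):
    """Convert herb weights (in one standardized unit) to fractions of the total."""
    total = sum(prescription.values())
    return {herb: weight / total for herb, weight in prescription.items()}


def ratio_difference(p, q):
    """Sum of absolute differences in herb weight ratio.

    0.0 means identical proportions; 2.0 means no herb in common.
    """
    rp, rq = weight_ratios(p), weight_ratios(q)
    herbs = set(rp) | set(rq)
    return sum(abs(rp.get(h, 0.0) - rq.get(h, 0.0)) for h in herbs)


def most_similar(target, database):
    """Name of the database prescription whose ratios are closest to target."""
    return min(database, key=lambda name: ratio_difference(target, database[name]))


if __name__ == "__main__":
    # Placeholder herbs and weights, not data from Donguibogam.
    target = {"herb_a": 2.0, "herb_b": 2.0}
    database = {
        "formula_1": {"herb_a": 4.0, "herb_b": 4.0},
        "formula_2": {"herb_a": 1.0, "herb_c": 3.0},
        "formula_3": {"herb_c": 1.0},
    }
    print(most_similar(target, database))
```

Working with ratios rather than raw weights means that a prescription scaled up or down in total dose still matches its original exactly.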

Analysis of Hierarchical Competition Structure and Pricing Strategy in the Hotel Industry

  • BAEK, Unji;SIM, Youngseok;LEE, Seul-Ki
    • The Journal of Asian Finance, Economics and Business
    • /
    • Vol. 6, No. 4
    • /
    • pp.179-187
    • /
    • 2019
  • This study aims to investigate the effects of market commonality and resource similarity on price competition and the recursive consequences in the Korean lodging market. Price comparison among hotels in the same geographic market has been facilitated by the development of information technology, leaving consumers with little search cost. While the literature implies heterogeneous price attack and response among hotels, only a limited number of empirical studies focus on the asymmetric and recursive pattern in the competitive dynamics. This study empirically examines the price interactions in the Korean lodging market based on the theoretical framework of competitive price interactions and countervailing power. The spatial error model proves superior to the spatial lag model and ordinary least squares in estimation, and its results suggest that hotels with a longer operational history have an asymmetric impact on the prices of newer hotels. The asymmetry is also found for chain hotels over independent hotels, further implying the possibility of predatory pricing. The findings of this study provide evidence of a hierarchical structure in price competition, with countervailing power that differs according to the resources of the hotels. Theoretical and managerial implications are discussed, with suggestions for future study.

Comparison of Fine-Tuned Convolutional Neural Networks for Clipart Style Classification

  • Lee, Seungbin;Kim, Hyungon;Seok, Hyekyoung;Nang, Jongho
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 9, No. 4
    • /
    • pp.1-7
    • /
    • 2017
  • Clipart is artificial visual content created using various tools such as Illustrator to highlight some information. Here, the style of the clipart plays a critical role in determining how it looks. However, previous studies on clipart have focused only on object recognition [16], segmentation, and retrieval of clipart images using hand-crafted image features. Recently, some clipart classification studies based on style similarity using CNNs have been proposed; however, they used different CNN models and experimented with different benchmark datasets, so it is very hard to compare their performance. This paper presents an experimental analysis of clipart classification based on style similarity with two well-known CNN models (Inception Resnet V2 [13] and VGG-16 [14]) and transfer learning on the same benchmark dataset (Microsoft Style Dataset 3.6K). From this experiment, we find that the accuracy of Inception Resnet V2 is better than that of VGG for clipart style classification because of its deep nature and its parallel convolution maps of various sizes. We also find that end-to-end training can improve accuracy by more than 20% in both CNN models.

Molecular Docking Studies of Wolbachia Endosymbiont of Brugia Malayi's Carbonic Anhydrase Using Coumarin-chromene Derivatives Towards Designing Anti-filarial Agents

  • Malathy, P.;Jagadeesan, G.;Gunasekaran, K.;Aravindhan, S.
    • 통합자연과학논문집
    • /
    • Vol. 9, No. 4
    • /
    • pp.268-274
    • /
    • 2016
  • The filariasis-causing nematode Brugia malayi has been shown to harbor Wolbachia bacteria as symbionts. The sequenced genome of the Wolbachia endosymbiont of B. malayi (wBm) offers an unprecedented opportunity to identify new Wolbachia drug targets. Hence the enzyme carbonic anhydrase from the Wolbachia endosymbiont of Brugia malayi (wBm), which is responsible for the reversible interconversion of carbon dioxide and water to bicarbonate and protons (or vice versa), is chosen as the drug target for filariasis. This enzyme is thought to play critical roles in bacteria by taking part in various steps of their life cycle that are important for survival. The 3D structure of wBm carbonic anhydrase is predicted by selecting a suitable template using the similarity search tool BLAST. The BLAST results show a hexapeptide transferase family protein from Anaplasma phagocytophilum (PDB ID: 3IXC) having 77% similarity and 54% identity with wBm carbonic anhydrase. Hence this protein is chosen as the template, and the 3D structure of carbonic anhydrase is predicted with the tool Modeller9v7. Since the three-dimensional structure of carbonic anhydrase from the Wolbachia endosymbiont of Brugia malayi has not yet been solved, attempts were made to predict the structure of this protein. The predicted structure is validated, and molecular docking studies are carried out with suitable inhibitors that have been solved experimentally.