• Title/Summary/Keyword: Context-based Similarity

A New Semantic Distance Measurement Method using TF-IDF in Linked Open Data (링크드 오픈 데이터에서 TF-IDF를 이용한 새로운 시맨틱 거리 측정 기법)

  • Cho, Jung-Gil
    • Journal of the Korea Convergence Society / v.11 no.10 / pp.89-96 / 2020
  • Linked Data allows structured data to be published in a standard way so that datasets from various domains can be interlinked. With the rapid evolution of Linked Open Data (LOD), researchers are exploiting it to solve particular problems such as semantic similarity assessment. In this paper, building on the basic concept of Linked Data Semantic Distance (LDSD), we propose a method for calculating the semantic distance between Linked Data resources that can be used in LOD-based recommender systems. The proposed semantic distance measurement model is based on a similarity measure that combines the LOD-based semantic distance with a new link weight computed using TF-IDF, a weighting scheme well known in information retrieval. To verify the effectiveness of the approach, performance was evaluated in an LOD-based recommender system using combined DBpedia and MovieLens data. Experimental results show that the proposed method achieves higher accuracy than similar methods. In addition, by expanding the range of semantic distance calculation, it improved the accuracy of the recommender system.
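A minimal Python sketch of the idea, assuming a toy triple list in place of DBpedia: each (resource, predicate) link gets a TF-IDF weight, and the distance follows LDSD's 1/(1 + ...) shape, summing weighted direct links and weighted shared outgoing links. The smoothed IDF, the restriction to direct and shared-object links, and the names `link_weights` and `semantic_distance` are assumptions for illustration, not the paper's exact formula.

```python
import math
from collections import defaultdict


def link_weights(triples):
    """Assumed TF-IDF weight for every (resource, predicate) pair.

    TF  = share of the resource's links that use the predicate.
    IDF = log(1 + N / df), where df counts resources touching the predicate
          (smoothed so that ubiquitous predicates keep a small positive weight).
    """
    resources = {s for s, _, _ in triples} | {o for _, _, o in triples}
    users = defaultdict(set)                        # predicate -> resources using it
    counts = defaultdict(lambda: defaultdict(int))  # resource -> predicate -> count
    for s, p, o in triples:
        users[p].update((s, o))
        counts[s][p] += 1
        counts[o][p] += 1
    weights = {}
    for r, per_pred in counts.items():
        total = sum(per_pred.values())
        for p, c in per_pred.items():
            idf = math.log(1 + len(resources) / len(users[p]))
            weights[(r, p)] = (c / total) * idf
    return weights


def semantic_distance(a, b, triples, weights):
    """LDSD-style distance in (0, 1]; 1.0 means no shared links were found."""
    # Direct links a -> b or b -> a, weighted by the subject's link weight.
    direct = sum(weights[(s, p)] for s, p, o in triples if (s, o) in ((a, b), (b, a)))
    # Indirect links: a and b point to the same object via the same predicate.
    out = defaultdict(set)
    for s, p, o in triples:
        out[s].add((p, o))
    indirect = sum((weights[(a, p)] + weights[(b, p)]) / 2 for p, _ in out[a] & out[b])
    return 1.0 / (1.0 + direct + indirect)


# Toy data (hypothetical resources).
triples = [
    ("FilmA", "director", "DirectorX"),
    ("FilmB", "director", "DirectorX"),
    ("FilmC", "director", "DirectorY"),
    ("FilmA", "genre", "SciFi"),
    ("FilmB", "genre", "SciFi"),
]
w = link_weights(triples)
print(semantic_distance("FilmA", "FilmB", triples, w))  # below 1.0: shared director and genre
print(semantic_distance("FilmA", "FilmC", triples, w))  # 1.0: nothing shared
```

Rarer predicates (here `genre`) receive a larger IDF, so sharing them pulls two resources closer than sharing a common predicate would.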

Enhancing Existing Products and Services Through the Discovery of Applicable Technology: Use of Patents and Trademarks (제품 및 서비스 개선을 위한 기술기회 발굴: 특허와 상표 데이터 활용)

  • Seoin Park;Jiho Lee;Seunghyun Lee;Janghyeok Yoon;Changho Son
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.1-14 / 2023
  • As markets and industries continue to evolve rapidly, technology opportunity discovery (TOD) has become critical to a firm's survival. There is a common consensus that TOD based on a firm's capabilities is valuable for small and medium-sized enterprises (SMEs) and reduces the risk of failure in technology development, and such studies have been actively conducted. However, previous studies mainly focused on a firm's technological capabilities and rarely on its business capabilities. Since discovered technologies create market value only when they are utilized in a firm's business, a firm's current business capabilities should be considered when discovering technology opportunities. This study therefore proposes a TOD method that considers both a firm's business and technological capabilities. To this end, it uses patent data, which represent the firm's technological capabilities, and trademark data, which represent its business capabilities. The proposed method comprises four steps: 1) constructing firm technology and business capability matrices using patent classification codes and trademark similarity group codes; 2) transforming the capability matrices into preference matrices using a fuzzy function; 3) identifying a target firm's candidate technology opportunities using a collaborative filtering algorithm; and 4) recommending technology opportunities using a portfolio map constructed from technology similarity and applicability indices. A case study on a security firm is conducted to validate the proposed method. The method can assist SMEs that face resource constraints in identifying technology opportunities. Further, because it uncovers technology opportunities based on business capabilities, it can also be used by firms that do not possess patents.
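Steps 2 and 3 can be sketched in Python as follows, assuming a toy firm-by-code count matrix whose columns concatenate patent classification codes and trademark similarity group codes. The linear membership function with a `saturation` parameter, cosine similarity, and top-`k` neighbor averaging are assumptions for illustration; the paper's exact fuzzy function and filtering settings may differ, and step 4 (the portfolio map) is not shown.

```python
import math


def fuzzy_preference(counts, saturation=5):
    """Assumed linear membership: 0 occurrences -> 0.0, >= saturation -> 1.0."""
    return [[min(c / saturation, 1.0) for c in row] for row in counts]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def candidate_opportunities(pref, target, k=2):
    """User-based CF: score codes the target firm lacks using its k most similar firms."""
    sims = sorted(
        ((cosine(pref[target], pref[f]), f) for f in range(len(pref)) if f != target),
        reverse=True,
    )[:k]
    den = sum(abs(s) for s, _ in sims)
    scores = {}
    for j, own in enumerate(pref[target]):
        if own > 0 or den == 0:
            continue  # the firm already has this capability, or no usable neighbors
        num = sum(s * pref[f][j] for s, f in sims)
        if num > 0:
            scores[j] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Rows are firms; columns are hypothetical codes:
# [patent class A, patent class B, trademark group C, trademark group D].
counts = [
    [3, 0, 2, 0],  # target firm
    [4, 5, 1, 0],
    [2, 4, 3, 0],
    [0, 0, 0, 6],
]
pref = fuzzy_preference(counts)
print(candidate_opportunities(pref, target=0))  # column 1 (patent class B) is the only candidate
```

Because trademark columns sit in the same vector as patent columns, a firm with no patents still gets neighbors through its trademark profile, which matches the abstract's claim about patent-less firms.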

A Dissimilarity with Dice-Jaro-Winkler Test Case Prioritization Approach for Model-Based Testing in Software Product Line

  • Sulaiman, R. Aduni;Jawawi, Dayang N.A.;Halim, Shahliza Abdul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.3 / pp.932-951 / 2021
  • Effective testing in Model-based Testing (MBT) for Software Product Lines (SPL) can be achieved by considering fault detection in test cases. Without such consideration, test cases in a test suite are listed in random order. Test Case Prioritization (TCP) is a regression testing technique that adaptively detects faults as early as possible by reordering test cases according to their fault detection rate. However, few studies have measured faults in MBT for SPL. This paper proposes a TCP approach based on dissimilarity and string-based distance, called Last Minimal for Local Maximal Distance (LM-LMD) with Dice-Jaro-Winkler Dissimilarity. It adopts Local Maximum Distance as the prioritization algorithm and the Dice-Jaro-Winkler similarity measure to evaluate the distance between test cases. The work is based on test cases generated from statecharts in the SPL domain. The results are promising: LM-LMD with Dice-Jaro-Winkler Dissimilarity outperformed the original Local Maximum Distance, Global Maximum Distance, and Enhanced All-yes Configuration algorithms in terms of Average Percentage of Faults Detected (APFD) and average prioritization time.
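A hedged Python sketch of the ingredients, assuming each test case is encoded as a string of transition labels. The dissimilarity is taken as 1 minus the equal-weight average of bigram Dice and Jaro-Winkler similarity (the weighting is an assumption), and `prioritize` is a generic greedy farthest-first ordering used as a stand-in; it is not the LM-LMD algorithm itself. `apfd` uses the standard formula APFD = 1 - (TF1 + ... + TFm)/(n·m) + 1/(2n).

```python
from collections import Counter


def jaro(s, t):
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, h in zip(s, s_hit) if h]
    t_seq = [c for c, h in zip(t, t_hit) if h]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t) + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):  # common prefix, capped at 4 characters
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def dice(s, t):
    """Dice coefficient on character bigrams (multiset overlap)."""
    a = Counter(s[i:i + 2] for i in range(len(s) - 1))
    b = Counter(t[i:i + 2] for i in range(len(t) - 1))
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0 if s == t else 0.0
    return 2 * sum((a & b).values()) / total


def dissimilarity(s, t):
    # Assumption: equal-weight average of Dice and Jaro-Winkler similarity.
    return 1.0 - (dice(s, t) + jaro_winkler(s, t)) / 2


def prioritize(tests):
    """Greedy farthest-first ordering (a generic stand-in, not LM-LMD itself)."""
    n = len(tests)
    if n < 2:
        return list(range(n))
    d = [[dissimilarity(a, b) for b in tests] for a in tests]
    first = max(((i, j) for i in range(n) for j in range(i + 1, n)), key=lambda p: d[p[0]][p[1]])
    order = list(first)
    remaining = [k for k in range(n) if k not in order]
    while remaining:
        # Pick the test case whose nearest already-ordered test case is farthest away.
        nxt = max(remaining, key=lambda k: min(d[k][o] for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def apfd(order, faults):
    """APFD = 1 - sum(TF_i) / (n * m) + 1 / (2n); faults maps fault -> detecting test ids."""
    n, m = len(order), len(faults)
    position = {t: i + 1 for i, t in enumerate(order)}
    tf_sum = sum(min(position[t] for t in detectors) for detectors in faults.values())
    return 1 - tf_sum / (n * m) + 1 / (2 * n)


print(prioritize(["abcd", "abce", "wxyz"]))  # [0, 2, 1]
```

The most dissimilar pair is scheduled first, so near-duplicate test cases ("abcd", "abce") are pushed apart in the order.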

Method of Extracting the Topic Sentence Considering Sentence Importance based on ELMo Embedding (ELMo 임베딩 기반 문장 중요도를 고려한 중심 문장 추출 방법)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal / v.10 no.1 / pp.39-46 / 2021
  • This study presents a method for extracting a summary from a news article that considers the importance of each sentence in the article. We propose calculating sentence importance from features that affect it: the probability of being a topic sentence, similarity to the article title and to other sentences, and sentence position. We hypothesize that topic sentences have characteristics distinct from general sentences, and we train a deep learning-based classification model to obtain a topic sentence probability for each input sentence. In addition, using the pre-trained ELMo language model, we compute the similarity between sentences from sentence vectors that reflect context information and use it as a sentence feature. In topic sentence classification, the LSTM and BERT models achieved 93% accuracy, 96.22% recall, and 89.5% precision. When sentence importance was calculated by combining the extracted features, topic sentence extraction performance improved by about 10% over the existing TextRank algorithm.
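A minimal sketch of the feature combination, assuming the classifier's topic-sentence probabilities are already available and using small plain vectors as stand-ins for ELMo sentence embeddings. The four weights and the linear position score `1 - i/n` are hypothetical choices, not values from the paper.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def sentence_importance(topic_probs, vectors, title_vec, weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of four features per sentence (weights are hypothetical)."""
    w_topic, w_title, w_centrality, w_position = weights
    n = len(vectors)
    scores = []
    for i, v in enumerate(vectors):
        # Average similarity to every other sentence in the article.
        others = [cosine(v, u) for j, u in enumerate(vectors) if j != i]
        centrality = sum(others) / len(others) if others else 0.0
        position = 1 - i / n  # assumption: earlier sentences score higher
        scores.append(
            w_topic * topic_probs[i]
            + w_title * cosine(v, title_vec)
            + w_centrality * centrality
            + w_position * position
        )
    return scores


# Toy 2-D vectors standing in for ELMo sentence embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
scores = sentence_importance([0.9, 0.1, 0.5], vectors, title_vec=[1.0, 0.0])
print(max(range(len(scores)), key=scores.__getitem__))  # 0
```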

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.163-170 / 2023
  • User-based collaborative filtering, one way of implementing a recommender system, identifies neighboring users with similar rating histories and recommends the items they prefer. However, it suffers from a fundamental data sparsity problem: recommendation quality drops significantly when users share little common rating history. To address this, many existing studies have proposed combining the Jaccard index with a similarity measure. In this study, we introduce a time-aware concept into the Jaccard index and propose weighting common items differently depending on their rating time. Experiments using various performance metrics and time intervals confirm that the proposed method outperformed the original Jaccard index on most metrics, and that the optimal time interval depends on the performance metric.
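The abstract does not give the exact weighting rule, so the following is only one plausible sketch: each common item contributes 1 / (1 + number of elapsed `interval`s between the two users' ratings) instead of 1, and the sum is divided by the size of the item union as in the plain Jaccard index. The decay shape and the function name are assumptions.

```python
def time_aware_jaccard(ratings_u, ratings_v, interval):
    """Jaccard variant: each common item counts 1 / (1 + elapsed intervals between the two ratings).

    ratings_u, ratings_v: dict item -> rating timestamp (e.g. days).
    With identical timestamps this reduces to the plain Jaccard index.
    """
    union = ratings_u.keys() | ratings_v.keys()
    if not union:
        return 0.0
    common = ratings_u.keys() & ratings_v.keys()
    weighted = sum(1.0 / (1 + abs(ratings_u[i] - ratings_v[i]) // interval) for i in common)
    return weighted / len(union)


u = {"a": 0, "b": 0, "c": 0}
v_close = {"a": 0, "b": 0, "d": 0}
v_far = {"a": 0, "b": 100, "d": 0}
print(time_aware_jaccard(u, v_close, interval=30))  # 0.5 (same as plain Jaccard: 2 / 4)
print(time_aware_jaccard(u, v_far, interval=30))    # 0.3125
```

Changing `interval` changes how quickly older co-ratings are discounted, which is the knob the abstract reports as metric-dependent.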

Development and Evaluation of D-Attention Unet Model Using 3D and Continuous Visual Context for Needle Detection in Continuous Ultrasound Images (연속 초음파영상에서의 바늘 검출을 위한 3D와 연속 영상문맥을 활용한 D-Attention Unet 모델 개발 및 평가)

  • Lee, So Hee;Kim, Jong Un;Lee, Su Yeol;Ryu, Jeong Won;Choi, Dong Hyuk;Tae, Ki Sik
    • Journal of Biomedical Engineering Research / v.41 no.5 / pp.195-202 / 2020
  • Needle detection in ultrasound images is sometimes difficult because fat tissue obstructs the needle. Accurate needle detection in continuous ultrasound (CUS) images is a vital stage of treatment planning for tissue biopsy and brachytherapy. The study has two main goals. First, a new detection model, D-Attention Unet, is developed by combining the context information of 3D medical data with CUS images. Second, the D-Attention Unet model is compared with other models to verify its usefulness for needle detection in CUS images. To build the evaluation dataset, continuous needle images acquired with ultrasound were converted into still images, which were used for training and testing. The proposed D-Attention Unet outperformed three other models (Unet, D-Unet, and Attention Unet), with a Dice Similarity Coefficient (DSC), recall, and precision of 71.9%, 70.6%, and 73.7%, respectively. In conclusion, the D-Attention Unet model provides accurate needle detection for US-guided biopsy or brachytherapy, facilitating the clinical workflow. Research on combining image processing techniques with learning techniques is particularly active, and applying the proposed method in this way is expected to yield a more effective technique than previous ones.
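The three reported metrics follow the standard pixel-wise definitions DSC = 2TP / (2TP + FP + FN), recall = TP / (TP + FN), and precision = TP / (TP + FP). A small sketch for binary masks given as nested lists; returning 1.0 when a denominator is zero (both masks empty) is a convention assumed here.

```python
def segmentation_scores(pred, truth):
    """Pixel-wise DSC, recall and precision for binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # needle pixel found
            elif p:
                fp += 1  # predicted needle where there is none
            elif t:
                fn += 1  # needle pixel missed
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return dsc, recall, precision


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print([round(x, 3) for x in segmentation_scores(pred, truth)])  # [0.667, 0.667, 0.667]
```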

Texture Image Database Retrieval Using JPEG-2000 Partial Entropy Decoding (JPEG-2000 부분 엔트로피 복호화에 의한 질감 영상 데이터베이스 검색)

  • Park, Ha-Joong;Jung, Ho-Youl
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.5C / pp.496-512 / 2007
  • In this paper, we propose a novel JPEG-2000 compressed image retrieval system that uses feature vectors extracted through partial entropy decoding. The main idea of the proposed method is to utilize the context information generated during entropy encoding/decoding. In the JPEG-2000 framework, the context of the current coefficient is determined by the pattern of the significance and/or sign of its neighbors in three bit-plane coding passes and four coding modes. The contexts provide a model for estimating the probability of each symbol to be coded. Because they represent local image properties, they can also efficiently describe texture images with different patterns. In addition, our system searches images directly in the JPEG-2000 compressed domain without full decompression, which accelerates image retrieval. For simulation, we created various distortion and similarity image databases from the MIT VisTex texture images and evaluated the proposed algorithm against previous ones. The simulations demonstrate that our method achieves good performance in terms of both retrieval accuracy and computational complexity.
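A minimal sketch of context-based retrieval, assuming the partial entropy decoder has already produced, for each image, the sequence of context labels (0 to 18, the 19 contexts of the JPEG 2000 bit-plane coder). The decoder itself is not shown; the normalized-histogram feature, the L1 distance, and the label sequences below are hypothetical choices for illustration.

```python
NUM_CONTEXTS = 19  # context labels 0..18 used by the JPEG 2000 bit-plane coder


def context_histogram(labels):
    """Normalized histogram of context labels from (partially) entropy-decoded code-blocks."""
    hist = [0.0] * NUM_CONTEXTS
    for c in labels:
        hist[c] += 1
    return [h / len(labels) for h in hist] if labels else hist


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query_labels, database):
    """Rank database images (name -> context labels) by L1 distance to the query histogram."""
    q = context_histogram(query_labels)
    return sorted(database, key=lambda name: l1_distance(q, context_histogram(database[name])))


# Hypothetical label sequences; real ones would come from a JPEG 2000 decoder.
database = {"bark": [5, 6, 7, 7, 12], "brick": [0, 0, 1, 1, 9]}
print(retrieve([0, 1, 1, 0, 9], database))  # ['brick', 'bark']
```

Because only context labels are needed, the inverse wavelet transform and full decompression are skipped, which is where the speed-up described above comes from.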

A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel (시맨틱 구문 트리 커널을 이용한 생명공학 분야 전문용어간 관계 식별 및 분류 연구)

  • Choi, Sung-Pil;Jeong, Chang-Hoo;Chun, Hong-Woo;Cho, Hyun-Yang
    • Journal of the Korean Society for Library and Information Science / v.45 no.2 / pp.251-275 / 2011
  • In this paper, we propose a novel kernel, called the semantic parse tree kernel, that extends the parse tree kernel previously studied for extracting protein-protein interactions (PPIs), where it showed prominent results. One drawback of the existing parse tree kernel is that it can degrade overall PPI extraction performance: because its comparison mechanism handles only the superficial forms of the constituent words, the kernel function may assign two sentences a lower value than their actual similarity warrants. The new kernel computes the lexical semantic similarity as well as the syntactic similarity between the parse trees of two target sentences. To calculate lexical semantic similarity, it incorporates context-based word sense disambiguation that outputs WordNet synsets, which can in turn be generalized into broader ones. In the experiments, in addition to the conventional SVM regularization factor, we introduced two new parameters, the tree kernel decay factor and the degree of lexical concept abstraction, which can accelerate the optimization of PPI extraction performance. These multi-strategy experiments confirmed the pivotal role of the newly applied parameters. The experimental results also showed that the semantic parse tree kernel is superior to conventional kernels, especially in PPI classification tasks.
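A hedged sketch of the core idea: a Collins-Duffy style subtree kernel with decay factor `decay` in which preterminal nodes are compared by a soft lexical similarity instead of exact string equality. Trees are nested tuples `(label, *children)`, and the tiny `SYNSET` table is a hypothetical stand-in for WordNet-based word sense disambiguation and generalization; the 0.8 partial-match score is likewise an assumption, not the paper's formulation.

```python
def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def production(node):
    return (node[0],) + tuple(child[0] for child in node[1:])


def subtrees(node):
    yield node
    if not is_preterminal(node):
        for child in node[1:]:
            yield from subtrees(child)


def make_tree_kernel(lexical_sim, decay=0.5):
    """Collins-Duffy style subtree kernel whose leaf match is a soft lexical similarity."""

    def common(n1, n2):
        if is_preterminal(n1) and is_preterminal(n2):
            # Same POS tag required; words are compared softly instead of by string equality.
            return decay * lexical_sim(n1[1], n2[1]) if n1[0] == n2[0] else 0.0
        if is_preterminal(n1) or is_preterminal(n2) or production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            score *= 1.0 + common(c1, c2)
        return score

    return lambda t1, t2: sum(common(a, b) for a in subtrees(t1) for b in subtrees(t2))


# Hypothetical stand-in for WordNet synset generalization.
SYNSET = {"binds": "bind.v", "attaches": "bind.v"}


def exact_sim(w1, w2):
    return 1.0 if w1 == w2 else 0.0


def synset_sim(w1, w2):
    if w1 == w2:
        return 1.0
    return 0.8 if SYNSET.get(w1) is not None and SYNSET.get(w1) == SYNSET.get(w2) else 0.0


t1 = ("S", ("NP", ("NN", "protein")), ("VP", ("VB", "binds"), ("NP", ("NN", "receptor"))))
t2 = ("S", ("NP", ("NN", "protein")), ("VP", ("VB", "attaches"), ("NP", ("NN", "receptor"))))
print(make_tree_kernel(exact_sim)(t1, t2) < make_tree_kernel(synset_sim)(t1, t2))  # True
```

With exact matching, "binds" and "attaches" contribute nothing; the soft match lets the whole VP subtree count, which is the underestimation problem the abstract describes.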

Roommate assignment for effective character education within a Residential College system (Residential college에서 효과적인 인성 교육을 위한 룸메이트 배정 문제)

  • Choi, Hyebong;Nam, J. Sophia;Kim, Woo-sung
    • Journal of the Korea Convergence Society / v.8 no.9 / pp.319-330 / 2017
  • Recently, various universities in Korea have started to strengthen their liberal arts and character education through the residential college (RC) system, running various community programs for this purpose. However, because most programs are based on student-to-student relationships, problems can often arise in the communal living environment. This paper proposes a roommate assignment algorithm for a residential college that aims to achieve character education goals effectively. The proposed clustering algorithm is based on the similarity hypothesis. Roommates assigned by the algorithm were significantly more similar to each other (measured by Euclidean distance) than randomly assigned roommates. The algorithm was applied to data on students living on the international campus of H University.
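A simple Python sketch of similarity-based pairing, assuming each student is described by a numeric survey profile. It uses greedy closest-pair matching on Euclidean distance as a stand-in for the paper's clustering algorithm, and `mean_distance` gives the kind of roommate-distance figure that would be compared against a random assignment. Requires Python 3.8+ for `math.dist`.

```python
import math
from itertools import combinations


def greedy_roommates(profiles):
    """Pair the closest available students first (Euclidean distance on survey profiles)."""
    unassigned = set(range(len(profiles)))
    pairs = []
    candidates = sorted(
        combinations(range(len(profiles)), 2),
        key=lambda p: math.dist(profiles[p[0]], profiles[p[1]]),
    )
    for i, j in candidates:
        if i in unassigned and j in unassigned:
            pairs.append((i, j))
            unassigned -= {i, j}
    return pairs  # with an odd head count, one student is left for manual placement


def mean_distance(pairs, profiles):
    return sum(math.dist(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)


profiles = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.0, 10.2]]  # toy 2-item survey scores
pairs = greedy_roommates(profiles)
print(pairs)                           # [(0, 2), (1, 3)]
print(mean_distance(pairs, profiles))  # about 0.15
```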

Community Model for Smart TV over the Top Services

  • Pandey, Suman;Won, Young Joon;Choi, Mi-Jung;Gil, Joon-Min
    • Journal of Information Processing Systems / v.12 no.4 / pp.577-590 / 2016
  • We studied the current state of the art in Smart TV, including its challenges and drawbacks, focusing mainly on the lack of an end-to-end solution. We then illustrated the differences between Smart TV and IPTV from a network service provider's point of view. Unlike IPTV viewers, viewers of Smart TV over-the-top (OTT) services can be global, such as foreign nationals in a country or viewers with special viewing preferences, and they are sparsely distributed. Existing TV service deployment models over the Internet are not suitable for such viewers because they are based on content popularity; hence, we propose a community-based service deployment methodology with proactive content caching at rendezvous points (RPs). In our proposal, RPs are intermediate nodes responsible for caching, routing, and decision making. Viewer communities are formed based on geographical location and similarity of interests. Using context information for proactive caching is not itself new, but we combine it with the in-network caching mechanism of the content-centric network (CCN) architecture. We measure the performance improvement achieved by the community model. The results show that, for the same total number of requests, our model performs significantly better, especially for sparsely distributed communities.
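A rough sketch of community formation and proactive caching, assuming viewers are described by a region and a set of interest tags. Grouping by region plus a Jaccard interest-overlap `threshold`, and prefetching the `k` most requested items per community at its RP, are hypothetical choices for illustration; CCN in-network caching and routing are not modeled.

```python
from collections import Counter


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def form_communities(viewers, threshold=0.5):
    """Group viewers by region, then by interest overlap with an existing community profile.

    viewers: dict viewer -> (region, set of interest tags)
    """
    communities = []
    for name, (region, interests) in viewers.items():
        for c in communities:
            if c["region"] == region and jaccard(c["interests"], interests) >= threshold:
                c["members"].append(name)
                c["interests"] |= interests
                break
        else:  # no matching community: start a new one
            communities.append({"region": region, "interests": set(interests), "members": [name]})
    return communities


def proactive_cache(request_log, members, k):
    """Items an RP would prefetch: the k most requested items within the community."""
    counts = Counter(item for viewer, item in request_log if viewer in members)
    return [item for item, _ in counts.most_common(k)]


viewers = {
    "a": ("Seoul", {"kdrama", "news"}),
    "b": ("Seoul", {"kdrama", "news", "sports"}),
    "c": ("Busan", {"kdrama"}),
}
communities = form_communities(viewers)
log = [("a", "ep1"), ("b", "ep1"), ("b", "ep2"), ("c", "ep3")]
print([c["members"] for c in communities])                     # [['a', 'b'], ['c']]
print(proactive_cache(log, set(communities[0]["members"]), 1))  # ['ep1']
```

Caching per community rather than by global popularity is what lets a small, sparse audience (here the Seoul pair) still get its niche content prefetched.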