• Title/Summary/Keyword: search similarity

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning (트랜잭션 기반 머신러닝에서 특성 추출 자동화를 위한 딥러닝 응용)

  • Woo, Deock-Chae;Moon, Hyun Sil;Kwon, Suhnbeom;Cho, Yoonho
    • Journal of Information Technology Services / v.18 no.2 / pp.143-159 / 2019
  • Machine learning (ML) is a method of fitting given data to a mathematical model in order to derive insights or make predictions. In the age of big data, where the amount of available data grows exponentially with the development of information technology and smart devices, ML achieves high prediction performance because it detects patterns without bias. Feature engineering, which generates the features that explain the problem to be solved, has a great influence on ML performance, and its importance is continuously emphasized. Nevertheless, it is still considered a difficult task because it requires a thorough understanding of the domain characteristics and the source data, as well as an iterative procedure. Therefore, we propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. The most commonly cited reason for the superior performance of deep learning on complex unstructured data is that it can extract features from the source data itself. To bring this advantage to business problems, we propose deep learning based methods that automatically extract features from transaction data or directly predict and classify target variables. In particular, based on the structural similarity between transaction data and text data, we applied techniques that perform well in text processing. We also verified the suitability of each method according to the characteristics of the transaction data. Our study not only explores the possibility of automated feature extraction but also yields a benchmark model that reaches a certain level of performance before a human performs the feature extraction task. In addition, it is expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.

First Report of Leptosphaerulina saccharicola Isolated from Persimmon (Diospyros kaki) Tree Bark in Korea

  • Fulbert, Okouma Nguia;Ayim, Benjamin Yaw;Das, Kallol;Lim, Yang-Sook;Lee, Seung-Yeol;Jung, Hee-Young
    • The Korean Journal of Mycology / v.47 no.1 / pp.13-18 / 2019
  • A fungal strain, designated PTT-2, was isolated from the bark of the trunk of a persimmon (Diospyros kaki) tree in Cheongdo, Korea. The isolate showed morphological similarities with Leptosphaerulina saccharicola. Strain PTT-2 grew more rapidly on potato dextrose agar (PDA) medium than on oatmeal agar, malt extract agar, and synthetic nutrient poor agar media, with colony sizes of 53.8 mm, 49.8 mm, 48.4 mm, and 28.1 mm, respectively, after 7 days at 25°C. Strain PTT-2 produced ascospores with irregular wavy edges, an oblong to ellipsoidal shape, a hyaline appearance, and a size of 23.6 × 10 µm. Black ascomata developed on PDA medium, and asci were recorded. A BLAST search of the internal transcribed spacer (ITS) region, TEF1-α, and RPB2 gene sequences revealed that strain PTT-2 showed more than 99% nucleotide similarity with a strain of Leptosphaerulina saccharicola previously reported from Thailand. A neighbor-joining phylogenetic tree constructed by concatenating the above-mentioned sequences showed that strain PTT-2 clustered in the same clade as L. saccharicola. Based on these findings, this is the first record of Leptosphaerulina saccharicola occurring in Korea.

A Design of Similar Video Recommendation System using Extracted Words in Big Data Cluster (빅데이터 클러스터에서의 추출된 형태소를 이용한 유사 동영상 추천 시스템 설계)

  • Lee, Hyun-Sup;Kim, Jindeog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.172-178 / 2020
  • To recommend content, companies generally use collaborative filtering, which takes into account both user preferences and video (item) similarities. Such services are primarily intended to improve user convenience by leveraging personal preferences such as search keywords and viewing time. Videos are also ranked by the keywords specified for them. However, analyzing video similarity from a limited set of keywords has its limits, and the problem becomes serious when the specified keywords do not properly reflect the item. In this paper, we propose a system that identifies the characteristics of a video automatically, without human intervention, and then analyzes similarities between videos and makes recommendations. The proposed system analyzes similarity by taking into account all words (keywords) that carry distinct meanings, extracted from the training videos; because of the large scale of the data and operations, the processing is handled on a big data cluster.

A Client-Side App Model for Classifying and Storing Documents

  • Elhussein, Bahaeldein;Karrar, Abdelrahman Elsharif;Khalifa, Mahmoud;Alsharani, Mohammed Mujib
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.225-233 / 2022
  • People hold many important documents and are asked for them from time to time to complete essential official procedures, so these documents need to be arranged and organized in a practical way. Many people struggle to arrange official documents so that they can be displayed when necessary, which costs a lot of time and effort. Also, no mobile apps specialize in professionally preserving essential electronic records and displaying them when needed. A dataset of 10,841 rows and 13 columns, obtained from Google Play, was analyzed using Anaconda, Python, and the new data science tool Mito. The research was conducted using the quantitative descriptive approach. The presented solution is a model specialized in saving essential documents, categorizing them according to the user's wishes, and displaying them when needed. Documents can be submitted as an image or a PDF file. Aside from identifying file kinds such as PDFs and pictures, the model also looks for and verifies specific file extensions. Before a file is shared or saved, its extension and properties are checked by applying the Levenshtein similarity algorithm. Our method effectively and efficiently facilitated the search process, saving the user time and effort. In conclusion, no comparable application is currently available that lets people classify documents effectively and display them quickly and easily for printing or for submission to official procedures; such an application can save people considerable time, effort, and money.
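
The abstract above names the Levenshtein algorithm for checking file extensions but gives no implementation details. The following is a minimal Python sketch of that idea, not the authors' code: the whitelist `ALLOWED_EXTENSIONS`, the helper `closest_extension`, and the tolerance `max_distance` are all hypothetical names chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical whitelist; the abstract only mentions PDFs and pictures.
ALLOWED_EXTENSIONS = ("pdf", "jpg", "jpeg", "png")


def closest_extension(filename: str, max_distance: int = 1):
    """Return the nearest allowed extension, or None if none is close enough."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    best = min(ALLOWED_EXTENSIONS, key=lambda known: levenshtein(ext, known))
    return best if levenshtein(ext, best) <= max_distance else None
```

With these assumptions, `closest_extension("scan.pdff")` maps a mistyped extension to `"pdf"`, while an unrelated extension such as `"notes.docx"` is rejected with `None`.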

A Modeling Methodology for Analysis of Dynamic Systems Using Heuristic Search and Design of Interface for CRM (휴리스틱 탐색을 통한 동적시스템 분석을 위한 모델링 방법과 CRM 위한 인터페이스 설계)

  • Jeon, Jin-Ho;Lee, Gye-Sung
    • Journal of the Korea Society of Computer and Information / v.14 no.4 / pp.179-187 / 2009
  • Most real-world systems contain a series of dynamic and complex phenomena. One common way to understand such a system is to build a model and analyze its behavior. A two-step methodology, consisting of clustering followed by model creation, is proposed for the analysis of time series data. An interface is designed for CRM (Customer Relationship Management) that provides users with 1:1 customized information using system modeling. Experiments confirmed that a model-based approach yields better clustering than a similarity-based one. Clustering is followed by model creation over the clustered groups, by which the future direction of time series data movement can be predicted. The effectiveness of the method was validated by checking how closely the values predicted by the models move together with real data such as stock prices.

Audio Fingerprint Extraction Method Using Multi-Level Quantization Scheme (다중 레벨 양자화 기법을 적용한 오디오 핑거프린트 추출 방법)

  • Song Won-Sik;Park Man-Soo;Kim Hoi-Rin
    • The Journal of the Acoustical Society of Korea / v.25 no.4 / pp.151-158 / 2006
  • In this paper, we propose a new audio fingerprint extraction method, based on Philips' music retrieval algorithm, which uses the energy difference between neighboring filter-banks and the probabilistic characteristics of music. Since the Philips method uses too many filter-banks in a limited frequency band, it may cause audio fingerprints to be highly sensitive to additive noise and to have too high a correlation between neighboring bands. The proposed method improves robustness to noise by reducing the number of filter-banks, while it maintains discriminative power by representing the energy difference between bands with 2 bits, where the quantization levels are determined by probabilistic characteristics. The correlation that exists among the 4 different levels in 2 bits is utilized not only in similarity measurement but also in efficient reduction of the search area. Experiments show that the proposed method is not only more robust to various environmental noises (street, department store, car, office, and restaurant) but also takes less time for database search than the Philips method when the music is highly degraded.
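
The 2-bit quantization of band energy differences described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not say how the quantization levels are derived, so quartiles of training data are assumed here, and the level-aware distance in `level_distance` is likewise a hypothetical way to exploit the ordering of the four levels.

```python
from bisect import bisect_right
from statistics import quantiles


def band_differences(frame_energies):
    """Energy difference between each pair of neighboring bands in one frame."""
    return [frame_energies[k] - frame_energies[k + 1]
            for k in range(len(frame_energies) - 1)]


def fit_thresholds(training_differences):
    """Three cut points splitting training differences into four equally
    likely levels (assumed rule; the paper's exact rule is not given)."""
    return quantiles(training_differences, n=4)


def quantize(differences, thresholds):
    """Map each difference to a level 0..3, i.e. a 2-bit symbol."""
    return [bisect_right(thresholds, d) for d in differences]


def level_distance(fp_a, fp_b):
    """Hypothetical similarity measure: adjacent levels count as closer
    than distant ones, unlike a plain bit-error count."""
    return sum(abs(x - y) for x, y in zip(fp_a, fp_b))
```

For example, with thresholds `[-1.0, 0.0, 1.0]`, the differences `[-2, -0.5, 0.5, 2]` quantize to the levels `[0, 1, 2, 3]`.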

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.137-148 / 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving this need. Many past studies on recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who have no evaluations or preference information, CF therefore cannot come up with recommendations (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. In more detail, the algorithm is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate those nodes whose degree centrality is lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to give the final performance metric. To test the performance improvement of this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used. Past studies to improve CF performance typically used additional information other than users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens a door for future studies that apply SNA to CF to analyze dataset characteristics. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
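
Steps 1, 2, and 4 of the algorithm described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the data layout (a dict mapping each user to a dict of item ratings), the linking rule `min_common`, and the reading of 'popular item' as "most frequently rated" are assumptions, and the threshold is passed in directly instead of being tuned by simulation. Step 3 (ordinary CF on the remaining users) is omitted.

```python
from collections import Counter
from itertools import combinations


def user_network(ratings, min_common=1):
    """Step 1: link two users when they rated at least `min_common`
    items in common (assumed linking rule)."""
    links = {user: set() for user in ratings}
    for u, v in combinations(ratings, 2):
        if len(ratings[u].keys() & ratings[v].keys()) >= min_common:
            links[u].add(v)
            links[v].add(u)
    return links


def degree_centrality(links):
    """Step 2a: direct links per user, normalized by the n - 1 possible ones."""
    n = len(links)
    return {u: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for u, nbrs in links.items()}


def split_gray_sheep(ratings, threshold, min_common=1):
    """Step 2b: users below the centrality threshold are treated as gray sheep."""
    centrality = degree_centrality(user_network(ratings, min_common))
    gray = {u for u, c in centrality.items() if c < threshold}
    return gray, set(ratings) - gray


def popular_items(ratings, top_n=3):
    """Step 4: fall back to the most frequently rated items (assumed meaning
    of the 'popular item' method)."""
    counts = Counter(item for items in ratings.values() for item in items)
    return [item for item, _ in counts.most_common(top_n)]


# Toy data, invented for illustration only.
ratings = {
    "ann": {"m1": 5, "m2": 4},
    "bob": {"m1": 4, "m2": 5, "m3": 3},
    "cat": {"m2": 2, "m3": 4},
    "dan": {"m9": 5},
}
gray, others = split_gray_sheep(ratings, threshold=0.5)
```

In this toy data, "dan" shares no items with anyone, so his degree centrality is 0 and he is the only user placed in the gray sheep group.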

    Overlapping Region of p53/Wrap53 Transcripts: Mutational Analysis and Sequence Similarity with microRNA-4732-5p

    • Pouladi, Nasser;Kouhsari, Shideh Montasser;Feizi, Mohammadali Hosseinpour;Gavgani, Reyhaneh Ravanbakhsh;Azarfam, Parvin
      • Asian Pacific Journal of Cancer Prevention / v.14 no.6 / pp.3503-3507 / 2013
    • Background: Although the majority of investigations concerned with TP53 and its protein have focused on coding regions, recently a set of studies highlighted significant roles of regulatory elements located in p53 mRNA, especially 5'UTR. The wrap53α transcript is one of those that acts as a natural antisense agent, forming RNA-RNA hybrids with p53 mRNA and protecting it from degradation. Materials and Methods: In this study, we focused on the mutation status of exon 1α of the WRAP53 gene (according to exon 1 of p53) in 160 breast tumor tissue samples and conducted a bioinformatics search for probable miRNA binding site in the p53/wrap53 overlapping region. Mutations were detected, using single stranded conformation polymorphism (SSCP) and sequencing. We applied the miRBase database for prediction of miRNAs which target overlapping region of p53/wrap53 transcripts. Results: Our results showed all samples to have wild type alleles in exon 1 of TP53 gene. We could detect a novel and unreported intronic mutation (IVS1+56, G>C) outside overlapping regions of p53/wrap53 genes in breast cancer tissues and also predict the presence of a binding site for miR-4732-5p in the 5'UTR of Wrap53 mRNA. Conclusions: From our findings we propose designing further studies focused on overexpression of miRNA-4732-5p and introducing different mutations in the overlapping region of wrap53 and p53 genes in order to study their effects on p53 and its ΔN isoform (Δ40p53) expression. The results may provide new pieces in the p53 targeting puzzle for cancer therapy.

    Utilization of Demographic Analysis with IMDB User Ratings on the Recommendation of Movies (IMDB 사용자평점에 대한 인구통계학적 분석의 활용)

    • Bae, Sung Moon;Lee, Sang Chun;Park, Jong Hun
      • The Journal of Society for e-Business Studies / v.19 no.3 / pp.125-141 / 2014
    • Nowadays, the flood of data produced every second on the internet makes it difficult for people to find useful information. For this reason, people have invented and developed tools that help them obtain relevant information. This paper uses one such tool, the recommender system, which uses demographic information to predict new items of interest and thereby helps users obtain the information they want. The demographic recommender system in this paper computes user similarity from demographic information, namely age and gender. We performed a demographic analysis of movie ratings on the Internet Movie Database (IMDB) web site, where movies are rated by thousands of people and users submit a rating after watching a recent popular film. Among the various determinants of box office performance, user ratings are an essential factor in research on movie recommendation. This paper analyzes the average movie ratings given directly by film viewers, categorizes them into groups by sex and age, investigates the entire group, and finds the representative group using the F-test and t-test. The result is used to promote and recommend movies to the target group only. This study is therefore significant in that it presents a use for the movie business and shows how to analyze demographic information in movie ratings on the web.

    Social Network : A Novel Approach to New Customer Recommendations (사회연결망 : 신규고객 추천문제의 새로운 접근법)

    • Park, Jong-Hak;Cho, Yoon-Ho;Kim, Jae-Kyeong
      • Journal of Intelligence and Information Systems / v.15 no.1 / pp.123-140 / 2009
    • Collaborative filtering recommends products using customers' preferences, so it cannot recommend products to a new customer who has no preference information. This paper proposes a novel approach to new customer recommendations using social network analysis, which is used to explore relationships among entities in networks such as genetic networks, traffic networks, and organization networks. The proposed recommendation method uses the centrality theory of social network analysis to identify the customers most likely to be neighbors of the new customer and recommends products that those customers have liked in the past. The procedure of our method is divided into four phases: purchase similarity analysis, social network construction, centrality-based neighborhood formation, and recommendation generation. To evaluate the effectiveness of our approach, we conducted several experiments using a data set from a department store in Korea. Our method was compared with the best-seller-based method, which uses the best-seller list to generate recommendations for the new customer. The experimental results show that our approach significantly outperforms the best-seller-based method as measured by the F1-measure.
