• Title/Summary/Keyword: search similarity

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.33-49 / 2018
  • The success of a TV drama has a large effect on TV ratings and on the effectiveness of channel promotion, and the Korean Wave has demonstrated its cultural and business impact. Early prediction of a blockbuster drama is therefore strategically important for the media industry. Previous studies have tried to predict audience ratings and drama success with various methods, but most made simple predictions from intuitive factors such as the lead actor and the time slot, which limits their predictive power. In this study, we propose a model that predicts the popularity of a drama by analyzing viewers' viewing patterns, drawing on various theories. The contribution is both theoretical and practical, since the model can be used by actual broadcasting companies. We collected data on 280 TV mini-series dramas broadcast on terrestrial channels over the 10 years from 2003 to 2012. From these data, we selected the most highly ranked and the least highly ranked 45 TV dramas and analyzed their viewing patterns in 11 steps. The assumptions and conditions for modeling are based on existing studies, on the opinions of actual broadcasters, and on data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) with the Euclidean and correlation methods; in our study, the sum of these distances is termed similarity. Using this similarity measure, we predicted the success of dramas from viewers' initial viewing-time pattern distribution over episodes 1 to 5. To check whether the model is sensitive to the measurement method, we applied various distance measures and tested the model's robustness. Once the model was established, we improved its predictive performance with a grid search.
Furthermore, we classified viewers who had watched more than 70% of a drama's total airtime as "passionate viewers" when a new drama is broadcast. We then compared the percentage of passionate viewers between the most highly ranked and the least highly ranked dramas in order to assess the likelihood of a blockbuster TV mini-series. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas. With the initial viewing-time pattern analysis, our model correctly classified blockbuster dramas with 75.47% accuracy. This paper shows a high prediction rate while suggesting an audience rating method different from existing ones. Currently, broadcasters rely heavily on a few famous actors, the so-called star system, and they face more severe competition than ever because of rising production costs, a long-term recession, and aggressive investment by comprehensive programming channels and large corporations. All of them are in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and advertising is executed with audience ratings as the basic index. Demand in the drama market is difficult to forecast because of the nature of the commodity, while dramas contribute heavily to the financial success of a broadcasting company's content. Therefore, to minimize the risk of failure, analyzing the distribution of initial viewing time can give practical help to the companies concerned in establishing a response strategy (organization, marketing, story changes, etc.). We also found that audience behavior is crucial to the success of a program. In this paper, we measure TV viewing by how enthusiastically a program is watched.
By calculating the loyalty of passionate viewers, we can successfully predict the success of a program. This way of calculating loyalty can also be applied to various platforms, and it can be used for marketing programs such as highlights, script previews, making-of films, characters, games, and other marketing projects.
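The distance-based similarity described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the function names, the nearest-reference decision rule, and the idea of comparing a new drama's 11-step viewing-time distribution against one "hit" and one "flop" reference pattern are all assumptions made for the example.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length viewing-time patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation_distance(p, q):
    """1 - Pearson correlation; 0 means the two patterns have the same shape."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = math.sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = math.sqrt(sum((b - mean_q) ** 2 for b in q))
    if sd_p == 0 or sd_q == 0:
        return 1.0  # correlation undefined for a flat pattern; treat as uncorrelated
    return 1.0 - cov / (sd_p * sd_q)

def classify(pattern, hit_ref, flop_ref, dist=euclidean_distance):
    """Hypothetical rule: label a drama by whichever reference pattern is closer."""
    return "hit" if dist(pattern, hit_ref) < dist(pattern, flop_ref) else "flop"
```

Swapping `dist` between the two functions is one simple way to see whether a decision depends on the distance measure, which is the kind of robustness check the abstract mentions.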

Isolation and Characterization of a Novel Agar-Degrading Marine Bacterium, Gayadomonas joobiniege gen. nov., sp. nov., from the Southern Sea, Korea

  • Chi, Won-Jae;Park, Jae-Seon;Kwak, Min-Jung;Kim, Jihyun F.;Chang, Yong-Keun;Hong, Soon-Kwang
    • Journal of Microbiology and Biotechnology / v.23 no.11 / pp.1509-1518 / 2013
  • An agar-degrading bacterium, designated as strain $G7^T$, was isolated from a coastal seawater sample from Gaya Island (Gayado in Korean), Republic of Korea. The isolated strain $G7^T$ is gram-negative, rod shaped, aerobic, non-motile, and non-pigmented. A similarity search based on its 16S rRNA gene sequence revealed that it shares 95.5%, 90.6%, and 90.0% similarity with the 16S rRNA gene sequences of Catenovulum agarivorans $YM01^T$, Algicola sagamiensis, and Bowmanella pacifica W3-$3A^T$, respectively. Phylogenetic analyses demonstrated that strain $G7^T$ formed a distinct monophyletic clade closely related to species of the family Alteromonadaceae in the Alteromonas-like Gammaproteobacteria. The G+C content of strain $G7^T$ was 41.12 mol%. The DNA-DNA hybridization value between strain $G7^T$ and the phylogenetically closest strain $YM01^T$ was 19.63%. The genomes of $G7^T$ and $YM01^T$ had an average ANIb value of 70.00%. The predominant isoprenoid quinone of this particular strain was ubiquinone-8, whereas that of C. agarivorans $YM01^T$ was menaquinone-7. The major fatty acids of strain $G7^T$ were Iso-$C_{15:0}$ (41.47%), Anteiso-$C_{15:0}$ (22.99%), and $C_{16:1}{\omega}7c/iso-C_{15:0}2-OH$ (8.85%), which were quite different from those of $YM01^T$. Comparison of the phenotypic characteristics related to carbon utilization, enzyme production, and susceptibility to antibiotics also demonstrated that strain $G7^T$ is distinct from C. agarivorans $YM01^T$. Based on its phenotypic, chemotaxonomic, and phylogenetic distinctiveness, strain $G7^T$ was considered a novel genus and species in the Gammaproteobacteria, for which the name Gayadomonas joobiniege gen. nov. sp. nov. (ATCC BAA-2321 = $DSM25250^T=KCTC23721^T$) is proposed.

A Study on the Visual Representation of TREC Text Documents in the Construction of Digital Library (디지털도서관 구축과정에서 TREC 텍스트 문서의 시각적 표현에 관한 연구)

  • Jeong, Ki-Tai;Park, Il-Jong
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.1-14 / 2004
  • Visualization of documents helps users when they search for similar documents. All research in information retrieval addresses the problem of a user with an information need facing a data source that contains an acceptable solution to that need. In various contexts, adequate solutions to this problem have included alphabetized cubbyholes housing papyrus rolls, microfilm registers, card catalogs, and inverted files coded onto discs. Many information retrieval systems rely on the use of a document surrogate. Though they might be surprised to discover it, nearly every information seeker uses an array of document surrogates. Summaries, tables of contents, abstracts, reviews, and MARC records are all document surrogates. That is, they stand in for a document, allowing a user to make some decision regarding it: whether to retrieve a book from the stacks, whether to read an entire article, and so on. In this paper, another type of document surrogate is investigated using a method that groups term lists. Using the Multidimensional Scaling (MDS) method, those surrogates are visualized on a two-dimensional graph. The distances between dots on the two-dimensional graph represent the similarity of the documents: the closer the dots, the more similar the documents.
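MDS takes a matrix of pairwise dissimilarities as its input, so one way to picture the pipeline is to build that matrix from term-list surrogates first. The sketch below does only that step. The Jaccard distance is an assumed choice (the abstract does not say which measure was used), and the function names are hypothetical.

```python
def jaccard_distance(terms_a, terms_b):
    """1 - |A ∩ B| / |A ∪ B| over two term lists; 0 means identical term sets."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def distance_matrix(docs):
    """Return (sorted document ids, square dissimilarity matrix) for MDS input."""
    names = sorted(docs)
    matrix = [[jaccard_distance(docs[x], docs[y]) for y in names] for x in names]
    return names, matrix
```

The resulting matrix could then be handed to any MDS implementation to obtain two-dimensional coordinates, where a smaller distance between two dots indicates more similar term lists.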

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus MapReduce, a (Key, Value)-based distributed framework, is increasingly used together with Locality Sensitive Hashing, which is efficient for high-dimensional and sparse data. Following a two-stage strategy, we use locality sensitive hashing to divide users into small subsets and then calculate the similarity between pairs within each subset using a brute-force method on MapReduce. The candidate group generation stage is especially important because brute-force calculation is performed in the following step; however, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
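The two-stage strategy (hash users into candidate groups, then compare pairs by brute force only inside each group) can be sketched on a single machine as below. This is an illustration under assumptions: MinHash with banding stands in for the LSH family, the hash parameters are arbitrary, Jaccard similarity is an assumed measure, and neither the MapReduce distribution nor the paper's regrouping step is reproduced.

```python
from collections import defaultdict
from itertools import combinations
import heapq

PRIME = 2_147_483_647
HASH_PARAMS = [(3, 7), (5, 11), (17, 23), (31, 41)]  # hypothetical (a, b) pairs

def minhash_signature(items, params=HASH_PARAMS):
    """One MinHash value per hash function h(x) = (a*x + b) mod PRIME."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in params)

def jaccard(s, t):
    return len(s & t) / len(s | t)

def knn_graph(users, k=2, band_size=2):
    """users maps user id -> non-empty set of integer item ids."""
    # Stage 1: users whose signatures agree on a whole band share a candidate group.
    buckets = defaultdict(set)
    for uid, items in users.items():
        sig = minhash_signature(items)
        for start in range(0, len(sig), band_size):
            buckets[(start, sig[start:start + band_size])].add(uid)
    # Stage 2: brute-force similarity only within each candidate group.
    scores = defaultdict(dict)
    for members in buckets.values():
        for u, v in combinations(sorted(members), 2):
            sim = jaccard(users[u], users[v])
            scores[u][v] = sim
            scores[v][u] = sim
    return {u: heapq.nlargest(k, scores[u].items(), key=lambda kv: kv[1])
            for u in users}
```

The cost of stage 2 grows quadratically with group size, which is why oversized candidate groups are the bottleneck the abstract targets.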

Isolation of an Rx homolog from C. annuum and the evolution of Rx genes in the Solanaceae family

  • Shi, Jinxia;Yeom, Seon-In;Kang, Won-Hee;Park, Min-Kyu;Choi, Do-Il;Kwon, Jin-Kyung;Han, Jung-Heon;Lee, Heung-Ryul;Kim, Byung-Dong;Kang, Byoung-Cheorl
    • Plant Biotechnology Reports / v.5 no.4 / pp.331-344 / 2011
  • The well-conserved NBS domain of resistance (R) genes cloned from many plants allows the use of a PCR-based approach to isolate resistance gene analogs (RGAs). In this study, we isolated an RGA (CapRGC) from Capsicum annuum "CM334" using a PCR-based approach. This sequence encodes a protein with very high similarity to Rx genes, the Potato Virus X (PVX) R genes from potato. An evolutionary analysis of the CapRGC gene and its homologs retrieved by an extensive search of a Solanaceae database provided evidence that Rx-like genes (eight ESTs or genes that show very high similarity to Rx) appear to have diverged from R1 [an NBS-LRR R gene against late blight (Phytophthora infestans) from potato]-like genes. Structural comparison of the NBS domains of all the homologs in Solanaceae revealed that one novel motif, 14, is specific to the Rx-like genes, and also indicated that several other novel motifs are characteristic of the R1-like genes. Our results suggest that Rx-like genes are ancient but conserved. Furthermore, the novel conserved motifs can provide a basis for biochemical structure-function analysis and be used for degenerate primer design for the isolation of Rx-like sequences in other plant species. A comparative mapping study revealed that the position of CapRGC is syntenic to the locations of Rx and its homolog genes in potato and tomato, but cosegregation analysis showed that CapRGC may not be the R gene against PVX in pepper. Our results confirm previous observations that the specificity of R genes is not conserved, while the structure and function of R genes are conserved. It appears that CapRGC may function as a resistance gene to another pathogen, such as the nematode to which the structure of CapRGC is most similar.

Analysis of Differentially Expressed Genes in Kiwifruit Actinidia chinensis var. 'Hongyang' (참다래 '홍양' 품종의 차등발현유전자 분석)

  • Bae, Kyung-Mi;Kwack, Yong-Bum;Shin, Il-Sheob;Kim, Se-Hee;Kim, Jeong-Hee;Cho, Kang-Hee
    • Korean Journal of Breeding Science / v.43 no.5 / pp.448-456 / 2011
  • We used suppression subtractive hybridization (SSH) combined with the mirror orientation selection (MOS) method to screen differentially expressed genes from the red-fleshed kiwifruit 'Hongyang'. As a result, 288 clones were obtained by subcloning the PCR product, and the 192 clones that were positive in colony PCR analysis were selected. All the positive clones were sequenced. Comparison with the NCBI/GenBank database using a BLAST search revealed that 30 clones showed sequence similarity to genes from other organisms, and 10 clones showed significant sequence similarity to known genes. Among these, 3 clones (AcF21, AcF42 and AcF106) had sequence homology to 1-aminocyclopropane-1-carboxylic acid (ACC) oxidase (ACO), which is known to be related to fruit ripening. To validate the SSH data, the expression patterns of the differentially expressed genes were further investigated by reverse transcription PCR (RT-PCR) and quantitative real-time PCR (qReal-time PCR) analysis. All the data from the qReal-time PCR analysis coincided with the results obtained from the RT-PCR analysis. Three clones were expressed at higher levels in 'Hongyang' than in 'Hayward'. AcF21 was expressed more highly than the other genes at 120 days after full bloom (DAFB) and 160 DAFB in 'Hongyang'.

A DB Pruning Method in a Large Corpus-Based TTS with Multiple Candidate Speech Segments (대용량 복수후보 TTS 방식에서 합성용 DB의 감량 방법)

  • Lee, Jung-Chul;Kang, Tae-Ho
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.572-577 / 2009
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. To prune redundant speech segments in a large speech segment DB, we can use the decision-tree based triphone clustering algorithm widely used in speech recognition. However, conventional methods have problems in representing the acoustic transitional characteristics of phones and in applying context questions with hierarchical priority. In this paper, we propose a new clustering algorithm to downsize the speech DB. First, three 13th-order MFCC vectors from the first, medial, and final frames of a phone are combined into a 39-dimensional vector that represents the transitional characteristics of the phone. Then three hierarchically grouped question sets are used to construct the triphone trees. For the performance test, we used the DTW algorithm to calculate the acoustic similarity between the target triphone and the triphone returned by the tree search. Experimental results show that the proposed method can reduce the size of the speech DB by 23% and select better phones with higher acoustic similarity. The proposed method can therefore be applied to build a small-sized TTS system.
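Two of the steps in this abstract lend themselves to a short sketch: stacking three frame vectors into one 39-dimensional phone vector, and scoring acoustic similarity with dynamic time warping (DTW). The code below is a textbook DTW with Euclidean local cost, not the authors' implementation. It assumes each frame carries 13 MFCC coefficients, and the function names are made up for the example.

```python
import math

def phone_vector(frames):
    """Concatenate the first, medial, and final frame vectors of one phone."""
    first, medial, final = frames[0], frames[len(frames) // 2], frames[-1]
    return list(first) + list(medial) + list(final)

def dtw_distance(seq_a, seq_b):
    """Classic DTW over two sequences of feature vectors (lower = more similar)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame cost
            cost[i][j] = local + min(cost[i - 1][j],      # advance in seq_a only
                                     cost[i][j - 1],      # advance in seq_b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

Because DTW allows frames to be repeated, two segments that differ only in duration get a distance of zero, which suits comparing triphones of unequal length.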

LDA Topic Modeling and Recommendation of Similar Patent Document Using Word2vec (LDA 토픽 모델링과 Word2vec을 활용한 유사 특허문서 추천연구)

  • Apgil Lee;Keunho Choi;Gunwoo Kim
    • Information Systems Review / v.22 no.1 / pp.17-31 / 2020
  • With the start of the fourth industrial revolution, technologies from various fields are merging and new types of technologies and products are being developed. In addition, registering intellectual property rights and patents to gain market dominance is becoming more important both overseas and domestically. Accordingly, the number of patents to be processed per examiner is increasing every year, and so are the time and cost of prior art research. A number of studies have therefore been carried out to reduce the examination time and cost for patent-pending technology. This paper proposes a method that calculates the degree of similarity among patent documents of the same priority claim when a plurality of patent priority claims are filed, and provides the results to the examiner and the patent applicant. To this end, we preprocessed the data of existing irregular patent documents, used Word2vec to obtain the similarity between patent documents, and then proposed a recommendation model that recommends similar patent documents in descending order of score. This makes it possible for the examiner to promptly refer to the examination history of patent documents judged to be similar at the time of examination, reducing the workload, and enables efficient search in the applicant's prior art research. We expect the model to contribute to more efficient patent examination and prior art search.
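The ranking step (score each patent document against a query and list the results in descending order) might look like the sketch below. It assumes that a document vector is the average of its word vectors and that similarity is the cosine between document vectors; the abstract states neither. The `embeddings` dictionary is a stand-in for vectors that would come from a trained Word2vec model, and all names are hypothetical.

```python
import math

def doc_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have an embedding."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no token of this document has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(query_tokens, corpus, embeddings, top_n=3):
    """Return up to top_n (doc_id, score) pairs in descending order of score."""
    query_vec = doc_vector(query_tokens, embeddings)
    scored = [(doc_id, cosine(query_vec, doc_vector(tokens, embeddings)))
              for doc_id, tokens in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```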

Network Analysis of Prescriptions for Inflammatory Bowel Disease - Preliminary Exploration of Prescriptions Using the K-HERB Database - (염증성 장질환 처방에 대한 네트워크 분석 - K-HERB 데이터베이스를 활용한 예비적 처방 탐색 -)

  • Jae-Yeon Lee;Yu-Gyeong Lee;Yeon-Hwa Lee;Seojung Ha;Bo-In Kwon
    • Journal of Society of Preventive Korean Medicine / v.28 no.2 / pp.131-150 / 2024
  • Objectives : The aim of this study was to perform network analysis of prescriptions for inflammatory bowel disease (IBD) together with analysis using the K-HERB database, to verify the similarity between the derived networks and existing prescriptions, and to make a preliminary exploration of the possibility of developing new IBD prescriptions. Methods : We conducted a comprehensive literature search on July 6, 2024, utilizing databases such as ScienceON, RISS, and OASIS. Clinical studies assessing the efficacy of herbal medicine in treating Crohn's disease and ulcerative colitis were identified and compiled into a structured database. This dataset, which included related prescriptions and herbal formulations, was subsequently analyzed using NetMiner 4 for centrality and Louvain clustering analyses. We then compared the networks derived from the K-HERB database with existing therapeutic prescriptions to assess their similarity. Results : A total of 24 prescriptions and 66 herbs were identified across the surveyed studies on IBD. Paeoniae Radix Alba (白芍藥) emerged as the most frequently utilized herb for both Crohn's disease and ulcerative colitis. Prominent herb combinations included Paeoniae Radix Alba-Angelicae Sinensis Radix (白芍藥-當歸), Angelicae Sinensis Radix-Coptidis Rhizoma (當歸-黃連), and Coptidis Rhizoma-Scutellariae Radix (黃連-黃芩) for ulcerative colitis. Centrality analysis revealed that Poria cocos (茯苓) and Paeoniae Radix Alba (白芍藥) had high centrality in the Crohn's disease network, while Angelicae Sinensis Radix (當歸) and Paeoniae Radix Alba (白芍藥) had high centrality in the ulcerative colitis network, indicating their prominent roles within the networks. Cohesion analysis resulted in 7 networks for Crohn's disease and 16 networks for ulcerative colitis. After excluding networks with a single herb, three networks related to Crohn's disease and two related to ulcerative colitis were examined using the K-HERB database. 
Among the 14 derived prescriptions for Crohn's disease and seven for ulcerative colitis, all except Oryeong-san (五苓散) were non-traditional in the context of IBD treatment. Conclusion : This preliminary study may provide a basis for the understanding and application of herbal prescriptions for IBD based on network analysis and the K-HERB database.
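The centrality step can be illustrated with a small pure-Python stand-in. The study itself used NetMiner 4, and the abstract does not say which centrality measure was reported, so degree centrality on a herb co-occurrence network is an assumption here, as are the function names. Herbs are linked when they appear in the same prescription.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(prescriptions):
    """Link every pair of herbs that appear together in a prescription."""
    adjacency = defaultdict(set)
    for herbs in prescriptions:
        for a, b in combinations(sorted(set(herbs)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def degree_centrality(adjacency):
    """Fraction of the other herbs in the network that each herb is linked to."""
    n = len(adjacency)
    if n <= 1:
        return {herb: 0.0 for herb in adjacency}
    return {herb: len(neighbors) / (n - 1)
            for herb, neighbors in adjacency.items()}
```

Note that a prescription containing a single herb produces no pair and therefore no node in this sketch.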

Partial Denoising Boundary Image Matching Based on Time-Series Data (시계열 데이터 기반의 부분 노이즈 제거 윤곽선 이미지 매칭)

  • Kim, Bum-Soo;Lee, Sanghoon;Moon, Yang-Sae
    • Journal of KIISE / v.41 no.11 / pp.943-957 / 2014
  • Removing noise, called denoising, is essential for more intuitive and more accurate results in boundary image matching. This paper deals with a partial denoising problem that allows a limited amount of partial noise embedded in boundary images. To solve this problem, we first define partial denoising time-series, which can be generated from an original image time-series by removing a variety of partial noises, and propose an efficient mechanism that quickly obtains those partial denoising time-series in the time-series domain rather than the image domain. We next present the partial denoising distance, which is the minimum distance from a query time-series to all possible partial denoising time-series generated from a data time-series, and we use this distance as a similarity measure in boundary image matching. Using the partial denoising distance, however, incurs a severe computational overhead since a large number of partial denoising time-series must be considered. To address this overhead, we derive a tight lower bound for the partial denoising distance and formally prove its correctness. We also propose range and k-NN search algorithms that exploit the partial denoising distance in boundary image matching. Through extensive experiments, we finally show that our lower bound-based approach improves search performance by up to an order of magnitude in partial denoising-based boundary image matching.
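The reason a lower bound speeds up search can be shown with a generic filter-and-refine range search. The sketch does not implement the partial denoising distance or the paper's bound. As a stand-in, the expensive distance is plain Euclidean distance and the cheap bound is the Euclidean distance over a prefix of the series, which can never exceed the full distance because the dropped squared terms are non-negative. All names are hypothetical.

```python
import math

def exact_distance(query, series):
    """Stand-in for the expensive distance (here: full Euclidean distance)."""
    return math.dist(query, series)

def lower_bound(query, series, prefix=2):
    """Cheap bound: Euclidean distance over the first `prefix` points only."""
    return math.dist(query[:prefix], series[:prefix])

def range_search(query, database, eps):
    """Return ids within eps of the query and the number of exact computations."""
    hits, exact_calls = [], 0
    for series_id, series in database.items():
        if lower_bound(query, series) > eps:
            continue  # pruned safely: the exact distance is at least the bound
        exact_calls += 1
        if exact_distance(query, series) <= eps:
            hits.append(series_id)
    return hits, exact_calls
```

Pruning with a true lower bound never discards a correct answer, and the tighter the bound, the fewer exact computations remain.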