• Title/Summary/Keyword: text matching (텍스트 매칭)


An Effective Algorithm for Checking Subsumption Relation on String Data Containing Wildcard Characters (와일드카드 문자를 포함하는 스트링 데이터 사이의 포함관계 확인을 위한 효율적인 알고리즘)

  • Kim, Do-Han;Park, Hee-Jin;Paek, Eun-Ok
    • Journal of KIISE:Computer Systems and Theory / v.32 no.9 / pp.475-482 / 2005
  • String data containing wildcard characters can represent patterns in text. A subsumption relation between two patterns can be defined as a subset relation between the sets of strings that match those patterns. Checking subsumption is therefore important for determining whether each pattern represents a set of strings that does not overlap with any other pattern. In this paper, we propose an effective algorithm for determining the subsumption relation between strings with wildcard characters. We first consider a simple extension of the suffix tree algorithm that allows it to include wildcard characters, and then propose another method that checks the subsumption relation by dividing the suffix tree structure at each position of the string data.
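A minimal sketch of the subset definition of subsumption, for illustration only: it is a naive pairwise check, not the suffix-tree method proposed in the paper. It assumes that `?` is the only wildcard (matching exactly one character) and that the alphabet has at least two symbols. Under those assumptions, a pattern subsumes another exactly when both have the same length and each of its positions is either `?` or the same literal as the other pattern; two patterns overlap when, at every position, at least one is `?` or their literals agree. The names `subsumes` and `overlaps` are hypothetical.

```python
WILDCARD = "?"  # assumption: '?' matches exactly one arbitrary character


def subsumes(general: str, specific: str) -> bool:
    """Return True if every string matched by `specific` is also matched by `general`.

    Assumes '?' is the only wildcard and the alphabet has at least two symbols,
    so a literal in `general` can never cover a '?' in `specific`.
    """
    if len(general) != len(specific):
        return False  # a '?'-only pattern matches strings of its own length only
    return all(g == WILDCARD or g == s for g, s in zip(general, specific))


def overlaps(p: str, q: str) -> bool:
    """Return True if at least one concrete string is matched by both patterns."""
    if len(p) != len(q):
        return False
    return all(a == WILDCARD or b == WILDCARD or a == b for a, b in zip(p, q))


if __name__ == "__main__":
    print(subsumes("a?c", "abc"))  # True: "abc" is one of the strings "a?c" matches
    print(subsumes("abc", "a?c"))  # False: "a?c" also matches "axc"
    print(overlaps("a?c", "ab?"))  # True: both match "abc"
```

Patterns with a variable-length wildcard such as `*` need a different treatment; this sketch does not cover them.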

Design and Implementation of the Feature Information Parsing System for Video Image (동영상 이미지의 특징정보 분석 시스템 설계 및 구현)

  • 최내원;지정규
    • Journal of the Korea Society of Computer and Information / v.7 no.3 / pp.1-8 / 2002
  • Due to rapid advances in computer application technologies, video is now used more widely than ever in many areas. However, current information analysis systems are basically built to process text-based data. As a result, they have problems when they must correctly represent the ambiguity of a video, when they must process a large number of comments, or when they lack the objectivity that the task requires. We propose a method capable of analyzing a large amount of video efficiently. To extract color information, we convert colors from RGB to HSI and use the information that matches representative colors. To extract shape information, we use improved moment invariants (IMI), which address many of the problems of histogram intersection.

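The color step can be sketched with the common geometric RGB-to-HSI conversion (intensity as the mean of the components, saturation from the minimum component, hue from an arccos of the component differences), followed by a nearest-hue lookup. The palette in `REPRESENTATIVE_HUES`, the achromatic thresholds, and the function names are illustrative assumptions; the abstract does not give the paper's representative colors.

```python
import math

# Hypothetical representative hues in degrees; the paper's palette is not given here.
REPRESENTATIVE_HUES = {"red": 0.0, "yellow": 60.0, "green": 120.0,
                       "cyan": 180.0, "blue": 240.0, "magenta": 300.0}


def rgb_to_hsi(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert RGB in [0, 1] to (hue in degrees, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        return 0.0, s, i  # gray (r == g == b): hue is undefined
    # Clamp to guard against rounding slightly outside acos's domain.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i


def nearest_representative(h: float, s: float, i: float,
                           min_saturation: float = 0.2) -> str:
    """Map an HSI color to a representative color name (thresholds are assumptions)."""
    if s < min_saturation:  # too little saturation for hue to be meaningful
        if i < 0.2:
            return "black"
        return "white" if i > 0.8 else "gray"

    def hue_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)  # hue is circular: 350 and 10 are 20 apart

    return min(REPRESENTATIVE_HUES,
               key=lambda name: hue_distance(h, REPRESENTATIVE_HUES[name]))


if __name__ == "__main__":
    print(nearest_representative(*rgb_to_hsi(0.1, 0.2, 0.9)))  # blue
```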

The Prediction of Cryptocurrency on Using Text Mining and Deep Learning Techniques : Comparison of Korean and USA Market (텍스트 마이닝과 딥러닝을 활용한 암호화폐 가격 예측 : 한국과 미국시장 비교)

  • Won, Jonggwan;Hong, Taeho
    • Knowledge Management Research / v.22 no.2 / pp.1-17 / 2021
  • In this study, we predicted bitcoin prices on Bithumb and Coinbase, leading exchanges in Korea and the USA respectively, using ARIMA and recurrent neural networks (RNNs). We also used news articles from each country to propose a separate RNN model. The proposed model identifies datasets based on changes in the price trend in the training data, and then applies a time series prediction technique (RNNs) to create multiple models. We then used daily news data to create a term-based dictionary for each trend change point. We explored trend change points in the test data using the daily news keywords of the test set and the term-based dictionaries, and applied a matching model to produce prediction results. With this approach, we obtained higher accuracy than a model that predicts prices with a time series prediction technique alone. This study shows that the limitations of time series prediction techniques can be overcome by exploring trend change points with news data, and suggests that future research could improve model performance by combining various time series prediction techniques with text mining.
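One way to picture the dictionary-matching step is sketched below. It is a hypothetical reconstruction, not the authors' model: each trend change type (here `"rise"` and `"fall"`) gets a dictionary of the most frequent news terms from the training days around such changes, and a test day is assigned to the dictionary with the highest Jaccard similarity to its keywords. The labels, `top_k`, `min_score`, and the choice of Jaccard similarity are all assumptions.

```python
from collections import Counter
from typing import Optional


def build_term_dictionary(days: list[list[str]], top_k: int = 50) -> set[str]:
    """Keep the top_k most frequent news terms from days around one kind of trend change."""
    counts = Counter(term for day in days for term in day)
    return {term for term, _ in counts.most_common(top_k)}


def match_trend(day_keywords: list[str], dictionaries: dict[str, set[str]],
                min_score: float = 0.1) -> Optional[str]:
    """Return the trend label whose dictionary best matches a day's keywords.

    The score is Jaccard similarity (an assumption). None means no dictionary
    reached min_score, i.e. this sketch signals no trend change for that day.
    """
    words = set(day_keywords)
    best_label, best_score = None, 0.0
    for label, terms in dictionaries.items():
        union = words | terms
        score = len(words & terms) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


if __name__ == "__main__":
    dictionaries = {  # hypothetical training-period keywords
        "rise": build_term_dictionary([["etf", "approval", "rally"], ["etf", "inflow"]]),
        "fall": build_term_dictionary([["hack", "ban"], ["ban", "crash"]]),
    }
    print(match_trend(["etf", "approval", "price"], dictionaries))  # rise
```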

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun;Lee, Hanjoo;Yong, Wooseok;Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.5 / pp.7-18 / 2017
  • Recently, the number of SNS users has increased rapidly with the development of the smart device industry, and the amount of generated data is growing exponentially. On Twitter, user-generated text data is a key research subject because it covers events, accidents, product reputations, and brand images. Twitter has become a channel through which users receive and exchange information, and an important characteristic of Twitter is its real-time nature. Among the various events, earthquakes, floods, and suicides must be analyzed rapidly so that the results can be applied to them immediately. Analyzing an event requires collecting the tweets related to it, but it is difficult to find all such tweets using ordinary keywords. To solve this problem, this paper proposes a method for generating and matching normal and transient dictionaries for real-time topic detection. Normal dictionaries consist of general keywords related to an event (e.g., for the event 'suicide': death-loop, death, die, hang oneself). Transient dictionaries, in contrast, consist of transient keywords related to an event (e.g., for the event 'suicide': the names of and information about celebrities, and information about social issues). Experimental results show that the matching method using the two dictionaries finds more event-related tweets than a simple keyword search.
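The two-dictionary idea can be sketched as a union of matches: a tweet is collected if it contains any normal keyword or any transient keyword. The keyword sets and the name `Celebrity X` below are hypothetical, and plain case-insensitive substring matching is a simplification (for example, `die` also matches `diet`); a real system would more likely match on analyzed tokens.

```python
def matches_any(text: str, terms: set[str]) -> bool:
    """Case-insensitive substring match; naive, e.g. 'die' also matches 'diet'."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def collect_event_tweets(tweets: list[str], normal_terms: set[str],
                         transient_terms: set[str]) -> list[str]:
    """Keep tweets that match the normal OR the transient dictionary."""
    return [t for t in tweets
            if matches_any(t, normal_terms) or matches_any(t, transient_terms)]


if __name__ == "__main__":
    normal = {"death", "die", "hang oneself"}  # hypothetical normal dictionary
    transient = {"Celebrity X"}                # hypothetical transient dictionary
    tweets = ["So sad about the death of a singer",
              "Celebrity X is trending right now",
              "Nice weather today"]
    baseline = [t for t in tweets if matches_any(t, {"suicide"})]  # plain keyword search
    print(len(baseline), len(collect_event_tweets(tweets, normal, transient)))  # 0 2
```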

The Development of Characters with Artificial Emotion through Analyzing Drama characters - With a Korean Drama titled 'The Sons of Sol Pharmacy House' (드라마 대본 분석을 통한 등장인물의 성격이 반영된 인공정서 캐릭터 개발 - '솔약국집 아들들'을 중심으로)

  • Ham, Jun-Seok;Rhee, Shin-Young;Bang, Green;Ko, Il-Ju
    • Science of Emotion and Sensibility / v.15 no.2 / pp.239-248 / 2012
  • This paper extracts personality traits from drama characters in a drama script and applies them to characters that have artificial emotions. A character's personality is applied from a drama script as follows. First, we split the drama script into pieces by character. Next, we extract emotion-related terms by matching the results of morphological analysis against an emotion term database. Next, we determine the dominant emotion from the extracted emotion terms. Finally, we apply the dominant emotion to an equation for artificial emotion. To verify that an artificial emotion character carries the personality of a drama character, we conducted a user evaluation using blind testing. We applied the personalities of three drama characters to artificial emotion characters with the same appearance, and users had to match the three artificial emotion characters to the drama characters according to personality. The users achieved a high percentage of correct answers, confirming the efficacy of our method of applying a personality using information from a drama script.

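The first three steps (split the script by character, match terms against an emotion term database, pick the dominant emotion) can be sketched as follows. The `NAME: dialogue` script format, the miniature `EMOTION_TERMS` table, and the use of lower-cased whitespace tokens in place of a morphological analyzer are assumptions; the final artificial-emotion equation is not given in the abstract and is not reproduced.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical miniature emotion-term database: term -> emotion category.
EMOTION_TERMS = {"happy": "joy", "glad": "joy", "angry": "anger",
                 "furious": "anger", "sad": "sadness", "cry": "sadness"}


def split_script(lines: list[str]) -> dict[str, list[str]]:
    """Group dialogue by speaker, assuming each line looks like 'NAME: dialogue'."""
    by_character = defaultdict(list)
    for line in lines:
        speaker, sep, dialogue = line.partition(":")
        if sep:  # lines without a colon (e.g. stage directions) are skipped
            by_character[speaker.strip()].append(dialogue.strip())
    return dict(by_character)


def dominant_emotion(utterances: list[str]) -> Optional[str]:
    """Return the most frequent emotion category in a character's lines, or None.

    Lower-cased, punctuation-stripped whitespace tokens stand in for the output
    of a morphological analyzer.
    """
    counts = Counter()
    for text in utterances:
        for token in text.lower().split():
            token = token.strip(".,!?\"'")
            if token in EMOTION_TERMS:
                counts[EMOTION_TERMS[token]] += 1
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    script = split_script([
        "MOTHER: I am so angry and furious today.",
        "SON: I am glad, really happy!",
        "(The door closes.)",
        "MOTHER: Why am I so sad?",
    ])
    print({name: dominant_emotion(lines) for name, lines in script.items()})
```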

Study on the EDA based Statistics Attributes Discovery and Utilization for the Maritime Safety Statistics Items Diversification (해상안전 통계 항목 다양화를 위한 EDA 기반 통계 속성 도출 및 활용에 관한 연구)

  • Kang, Seong Kyung;Lee, Young Jai
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.7 / pp.798-809 / 2020
  • Evidence-based policymaking and assessments for scientific administration have increased the importance of using statistics (data). Statistics can explain specific phenomena by providing numerical values and are a public resource for national decision making. Because of these inherent attributes, statistics are used as baseline data for government policy decisions and for the analysis of various phenomena. Despite this importance, however, the role of statistics is limited: statistics are often used as simple summaries, produced mainly from the suppliers' perspective rather than to create value for consumers. To address this problem, this study explores statistical data and other attributes that can be used for policies or research. The baseline statistical data in this study come from the Maritime Distress Accident Statistical Yearbook published by the South Korean Coast Guard, and the additional attributes come from text analyses of vessel casualty situation reports from the South Korean Maritime Police. Collecting 56 attributes from the text analysis and performing an EDA yielded 88 attribute unions: 18 had a satisfactory significance probability (p-value < .05) and a strong correlation coefficient above 0.7, and 70 had a moderate correlation (above 0.4 and below 0.7). Additionally, to use the extra attributes discovered by the EDA for policy purposes, a keyword analysis was performed for each detailed strategy of the Disaster Preparation Basic Plan, the availability of the attributes for use was determined by matching keywords, and the EDA-derived attributes were examined.

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research effort is being directed toward analyzing unstructured data such as text files and log files with commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined pending national issues by analyzing large volumes of text on various social issues. However, it is difficult to provide information services that successfully identify R&D documents on specific pending national issues. Users may specify keywords relating to such issues, but they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the terms actually used in R&D documents. An intermediate logic is therefore needed to overcome these discrepancies and to identify and package appropriate R&D information on specific pending national issues. To address this requirement, this study proposes three methodologies: a hybrid methodology for extracting and integrating keywords pertaining to pending national issues, a methodology for packaging R&D information that corresponds to those issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are used to establish these methodologies. In the experiments, the keyword enhancement rate of the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between pending national issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to yield tangible results in the future.
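Association rules between issue keywords and R&D keywords are commonly scored by support (the share of documents containing both keywords) and confidence (the share of documents with the issue keyword that also contain the R&D keyword). A minimal sketch, treating each document as a set of keywords; the keywords, the function names, and the `min_support`/`min_confidence` thresholds are hypothetical, and the paper's actual rule mining settings are not stated in the abstract.

```python
def rule_metrics(documents: list[set[str]], antecedent: set[str],
                 consequent: set[str]) -> tuple[float, float]:
    """Support and confidence of the association rule antecedent -> consequent."""
    with_a = [doc for doc in documents if antecedent <= doc]
    with_both = [doc for doc in with_a if consequent <= doc]
    support = len(with_both) / len(documents) if documents else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def mine_issue_rules(documents: list[set[str]], issue_keywords: list[str],
                     rnd_keywords: list[str], min_support: float = 0.2,
                     min_confidence: float = 0.5) -> list[tuple[str, str, float, float]]:
    """List single-keyword rules issue -> R&D keyword that pass both thresholds."""
    rules = []
    for issue in issue_keywords:
        for rnd in rnd_keywords:
            support, confidence = rule_metrics(documents, {issue}, {rnd})
            if support >= min_support and confidence >= min_confidence:
                rules.append((issue, rnd, support, confidence))
    return rules
```

With the documents `{"fine dust", "air filter"}`, `{"fine dust", "air filter", "sensor"}`, `{"fine dust", "vaccine"}`, and `{"sensor"}`, the rule `fine dust -> air filter` has support 0.5 and confidence 2/3, so it passes the default thresholds, while `fine dust -> vaccine` does not.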

Natural Language based Video Retrieval System with Event Analysis of Multi-camera Image Sequence in Office Environment (사무실 환경 내 다중카메라 영상의 이벤트분석을 통한 자연어 기반 동영상 검색시스템)

  • Lim, Soo-Jung;Hong, Jin-Hyuk;Cho, Sung-Bae
    • 한국HCI학회:학술대회논문집 (HCI Society of Korea: Conference Proceedings) / 2008.02a / pp.384-389 / 2008
  • Recently, the need for systems that effectively store and retrieve video data has increased. Conventional video retrieval systems retrieve data using menus or text-based keywords. Because such queries carry little information, many video clips are retrieved at once, and the user must have a certain level of knowledge to use the system. In this paper, we propose a natural language based conversational video retrieval system that reflects users' intentions and captures more information than keyword-based queries. The system can retrieve anything from events and people to their movements. First, an event database is constructed from meta-data generated by domain analysis of video collected in an office environment. Then, a script database is constructed based on query pre-processing and analysis. On this basis, we propose a method that retrieves video by matching natural language queries to answers, and validate it through performance and process evaluations with 10 users. The natural language based retrieval system showed better performance efficiency and user satisfaction than a menu-based retrieval system.


A Similarity Computation Algorithm Based on the Pitch and Rhythm of Music Melody (선율의 음높이와 리듬 정보를 이용한 음악의 유사도 계산 알고리즘)

  • Mo, Jong-Sik;Kim, So-Young;Ku, Kyong-I;Han, Chang-Ho;Kim, Yoo-Sung
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3762-3774 / 2000
  • Advances in computer hardware and information processing technologies have raised the need for multimedia information retrieval systems. To date, multimedia information systems have been developed for text and image information; nowadays, multimedia information systems for video and audio information, especially musical information, are increasingly common. Recent music information retrieval systems support not only retrieval based on meta-information such as composer and title but also content-based retrieval. Content-based retrieval in music information retrieval systems uses the similarity value between the user query and the music information stored in the music database. In this paper, we therefore developed a similarity computation algorithm that uses the pitch and length of each corresponding pair of notes as the fundamental factors for computing similarity between pieces of music. We also conducted an experiment to validate the appropriateness of the proposed algorithm. The experimental results show that the proposed algorithm can correctly determine whether two music files are similar based on their melodies.

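A simple positional version of a pitch-and-length similarity can be sketched as below. It is not the paper's formula: notes are assumed to be `(MIDI pitch, duration in beats)` pairs compared position by position, pitch similarity is assumed to fall linearly to zero at one octave, length similarity is the ratio of the shorter to the longer duration, and the two are weighted equally. The names `note_similarity` and `melody_similarity` are hypothetical.

```python
Note = tuple[int, float]  # (MIDI pitch number, duration in beats); durations must be > 0


def note_similarity(a: Note, b: Note, pitch_weight: float = 0.5,
                    max_interval: int = 12) -> float:
    """Similarity in [0, 1] of two corresponding notes, from pitch and length."""
    (pitch_a, dur_a), (pitch_b, dur_b) = a, b
    # Pitch: linear falloff, reaching 0 at max_interval semitones (one octave) apart.
    pitch_sim = max(0.0, 1.0 - abs(pitch_a - pitch_b) / max_interval)
    # Length: ratio of the shorter duration to the longer one.
    dur_sim = min(dur_a, dur_b) / max(dur_a, dur_b)
    return pitch_weight * pitch_sim + (1.0 - pitch_weight) * dur_sim


def melody_similarity(m1: list[Note], m2: list[Note]) -> float:
    """Average note similarity over positional pairs; unmatched notes score 0."""
    longest = max(len(m1), len(m2))
    if longest == 0:
        return 0.0
    return sum(note_similarity(a, b) for a, b in zip(m1, m2)) / longest
```

Dividing by the longer melody's length means extra trailing notes lower the score; a melody transposed by a full octave keeps only the length component and scores 0.5.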

A Study on Feature Information Parsing of Video Image Using Improved Moment Invariant (향상된 불변모멘트를 이용한 동영상 이미지의 특징정보 분석에 관한 연구)

  • Lee, Chang-Soo;Jun, Moon-Seog
    • Journal of Korea Multimedia Society / v.8 no.4 / pp.450-460 / 2005
  • Today, with the rapid development of computer and communication technology, multimedia information is used on the internet and in various areas of society, and its usage is growing dramatically. However, multimedia information analysis systems are basically text-based, so they face difficult problems such as expressing the ambiguity of multimedia information, the excessive workload of appending annotations, and a lack of objectivity. In this study, we propose a method that uses the color and shape information of multimedia image partitions to efficiently analyze a large amount of multimedia information. Partitioning uses a region growing and merging method. To extract color information, we convert RGB (Red Green Blue) to HSI (Hue Saturation Intensity) and use distinctive information that matches a representative color. For shape information, we use IMI (Improved Moment Invariants), which are computed only over the outline pixels of an object.

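The shape step (moments computed over outline pixels only) can be sketched with a 4-neighbour outline extractor and the first two Hu-style invariants. The normalization `eta_pq = mu_pq / mu_00**(p + q + 1)` is an assumption chosen because, for an outline, the pixel count `mu_00` grows linearly with scale, so this exponent cancels the scale factor; the paper's exact IMI formulas are not given in the abstract, and the function names are hypothetical.

```python
def outline_pixels(mask: list[list[int]]) -> list[tuple[int, int]]:
    """Foreground pixels with at least one 4-neighbour in the background or off the image."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


def outline_invariants(points: list[tuple[int, int]]) -> tuple[float, float]:
    """First two Hu-style invariants computed over outline pixels only.

    Assumes the curve normalization eta_pq = mu_pq / mu_00**(p+q+1): for an
    outline, mu_00 (the pixel count) grows linearly with scale, so this exponent
    cancels scaling, whereas Hu's region exponent (p+q)/2 + 1 would not.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    def eta(p: int, q: int) -> float:
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in points)  # central moment
        return mu / n ** (p + q + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return phi1, phi2
```

Because central moments are taken about the centroid and phi1 and phi2 are symmetric in the two axes, the values do not change when the object is shifted or rotated by 90 degrees.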