• Title/Summary/Keyword: Effective information retrieval

Search Result 411, Processing Time 0.03 seconds

Efficient Management of Statistical Information of Keywords on E-Catalogs (전자 카탈로그에 대한 효율적인 색인어 통계 정보 관리 방법)

  • Lee, Dong-Joo;Hwang, In-Beom;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies
    • /
    • v.14 no.4
    • /
    • pp.1-17
    • /
    • 2009
  • E-Catalogs which describe products or services are one of the most important data for the electronic commerce. E-Catalogs are created, updated, and removed in order to keep up-to-date information in e-Catalog database. However, when the number of catalogs increases, information integrity is violated by the several reasons like catalog duplication and abnormal classification. Catalog search, duplication checking, and automatic classification are important functions to utilize e-Catalogs and keep the integrity of e-Catalog database. To implement these functions, probabilistic models that use statistics of index words extracted from e-Catalogs had been suggested and the feasibility of the methods had been shown in several papers. However, even though these functions are used together in the e-Catalog management system, there has not been enough consideration about how to share common data used for each function and how to effectively manage statistics of index words. In this paper, we suggest a method to implement these three functions by using simple SQL supported by relational database management system. In addition, we use materialized views to reduce the load for implementing an application that manages statistics of index words. This brings the efficiency of managing statistics of index words by putting database management systems optimize statistics updating. We showed that our method is feasible to implement three functions and effective to manage statistics of index words with empirical evaluation.

  • PDF

A News Video Mining based on Multi-modal Approach and Text Mining (멀티모달 방법론과 텍스트 마이닝 기반의 뉴스 비디오 마이닝)

  • Lee, Han-Sung;Im, Young-Hee;Yu, Jae-Hak;Oh, Seung-Geun;Park, Dai-Hee
    • Journal of KIISE:Databases
    • /
    • v.37 no.3
    • /
    • pp.127-136
    • /
    • 2010
  • With rapid growth of information and computer communication technologies, the numbers of digital documents including multimedia data have been recently exploded. In particular, news video database and news video mining have became the subject of extensive research, to develop effective and efficient tools for manipulation and analysis of news videos, because of their information richness. However, many research focus on browsing, retrieval and summarization of news videos. Up to date, it is a relatively early state to discover and to analyse the plentiful latent semantic knowledge from news videos. In this paper, we propose the news video mining system based on multi-modal approach and text mining, which uses the visual-textual information of news video clips and their scripts. The proposed system systematically constructs a taxonomy of news video stories in automatic manner with hierarchical clustering algorithm which is one of text mining methods. Then, it multilaterally analyzes the topics of news video stories by means of time-cluster trend graph, weighted cluster growth index, and network analysis. To clarify the validity of our approach, we analyzed the news videos on "The Second Summit of South and North Korea in 2007".

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.183-187
    • /
    • 2017
  • Smart phone users prefer fast reading and texting. Hence, users frequently use abbreviated sequences of words and phrases. Nowadays, abbreviations are widely used from chat terms to technical terms. Therefore, gathering abbreviations would be helpful to many services, including information retrieval, recommendation system, and so on. However, manually gathering abbreviations needs to much effort and cost. This is because new abbreviations are continuously generated whenever a new material such as a TV program or a phenomenon is made. Thus it is required to generate of abbreviations automatically. To generate Korean abbreviations, the existing methods use the rule-based approach. The rule-based approach has limitations, in that it is unable to generate irregular abbreviations. Another problem is to decide the correct abbreviation among candidate abbreviations generated rules. To address the limitations, we propose a method of generating Korean abbreviations automatically using sequence-to-sequence learning in this paper. The sequence-to-sequence learning can generate irregular abbreviation and does not lead to the problem of deciding correct abbreviation among candidate abbreviations. Accordingly, it is suitable for generating Korean abbreviations. To evaluate the proposed method, we use dataset of two type. As experimental results, we prove that our method is effective for irregular abbreviations.

Effective Index and Backup Techniques for HLR System in Mobile Networks (이동통신 HLR 시스템에서의 효과적인 색인 및 백업 기법)

  • 김장환;이충세
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.1
    • /
    • pp.33-46
    • /
    • 2003
  • A Home Location Register(HLR) database system manages each subscriber's location information, which continuously changes in a cellular network. For this purpose, the HLR database system provides table management, index management, and backup management facilities. In this thesis, we propose using a two-level index method for the mobile directory number(MDN) as a suitable method and a chained bucket hashing method for the electronic serial number(ESN). Both the MDN and the ESN are used as keys in the HLR database system. We also propose an efficient backup method that takes into account the characteristics of HLR database transactions. The retrieval speed and the memory usage of the two-level index method are better than those of the R-tree index method. The insertion and deletion overhead of the chained bucket hashing method is less than that of the modified linear hashing method. In the proposed backup method, we use two kinds of dirty flags in order to solve the performance degradation problem caused by frequent registration-location operations. For a million subscribers, proposed techniques support reduction of memory size(more than 62%), directory operations (2500,000 times), and backup operations(more than 80%) compared with current techniques.

Establishment of Risk Database and Development of Risk Classification System for NATM Tunnel (NATM 터널 공정리스크 데이터베이스 구축 및 리스크 분류체계 개발)

  • Kim, Hyunbee;Karunarathne, Batagalle Vinuri;Kim, ByungSoo
    • Korean Journal of Construction Engineering and Management
    • /
    • v.25 no.1
    • /
    • pp.32-41
    • /
    • 2024
  • In the construction industry, not only safety accidents, but also various complex risks such as construction delays, cost increases, and environmental pollution occur, and management technologies are needed to solve them. Among them, process risk management, which directly affects the project, lacks related information compared to its importance. This study tried to develop a MATM tunnel process risk classification system to solve the difficulty of risk information retrieval due to the use of different classification systems for each project. Risk collection used existing literature review and experience mining techniques, and DB construction utilized the concept of natural language processing. For the structure of the classification system, the existing WBS structure was adopted in consideration of compatibility of data, and an RBS linked to the work species of the WBS was established. As a result of the research, a risk classification system was completed that easily identifies risks by work type and intuitively reveals risk characteristics and risk factors linked to risks. As a result of verifying the usability of the established classification system, it was found that the classification system was effective as risks and risk factors for each work type were easily identified by user input of keywords. Through this study, it is expected to contribute to preventing an increase in cost and construction period by identifying risks according to work types in advance when planning and designing NATM tunnels and establishing countermeasures suitable for those factors.

A Study on Elemental Technology Identification of Sound Data for Audio Forensics (오디오 포렌식을 위한 소리 데이터의 요소 기술 식별 연구)

  • Hyejin Ryu;Ah-hyun Park;Sungkyun Jung;Doowon Jeong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.1
    • /
    • pp.115-127
    • /
    • 2024
  • The recent increase in digital audio media has greatly expanded the size and diversity of sound data, which has increased the importance of sound data analysis in the digital forensics process. However, the lack of standardized procedures and guidelines for sound data analysis has caused problems with the consistency and reliability of analysis results. The digital environment includes a wide variety of audio formats and recording conditions, but current audio forensic methodologies do not adequately reflect this diversity. Therefore, this study identifies Life-Cycle-based sound data elemental technologies and provides overall guidelines for sound data analysis so that effective analysis can be performed in all situations. Furthermore, the identified elemental technologies were analyzed for use in the development of digital forensic techniques for sound data. To demonstrate the effectiveness of the life-cycle-based sound data elemental technology identification system presented in this study, a case study on the process of developing an emergency retrieval technology based on sound data is presented. Through this case study, we confirmed that the elemental technologies identified based on the Life-Cycle in the process of developing digital forensic technology for sound data ensure the quality and consistency of data analysis and enable efficient sound data analysis.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technologies that are essential to the business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalance distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. Conditional vector y identify distributions for minority classes and generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.

Effective menu retrieval for electronic information system (전자정보 시스템의 효율적 메뉴검색)

  • Shin, Dong-Wook;Nam, Se-Jin;Bae, Jeon-Gil;Park, Sang-Kyu;Jang, Myeong-Wook;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 1994.11a
    • /
    • pp.409-415
    • /
    • 1994
  • Hitel 과 같은 전자정보 시스템은 사용자가 원하는 정보를 체계적으로 얻을 수 있도록 하기 위하여 메뉴들을 적당히 계층적으로 구성하여 제공하고 있다. 그러나, 보통 이 메뉴들의 계층이 정확한 분류법에 기초하여 만들어지지 않았을 뿐 아니라 그 양도 엄청나게 방대하여, 이 메뉴 계층을 이용하여 사용자가 원하는 정보를 얻기가 쉽지 않다. 실험적으로 보통 Hitel을 자주 이용하는 사람들도 자신이 주로 이용하는 메뉴들의 구성만 이해하고 있을뿐, 사용하지 않는 부분의 메뉴들의 구성은 잘 알지 못하는 것이 일반적이었다. 따라서 Hitel을 자주 이용하는 사용자도 자신이 이용해 보지 않은 정보를 얻기 쉽지 않으며, 더더욱 초보자에게는 이 메뉴계층을 이용하여 원하는 정보를 얻기가 쉽지 않은 실정이다. 본 연구에서는 정보검색 기술을 이용하여 Hitel과 같은 전자정보 시스템에서 사용자가 쉽게 자신이 원하는 정보를 얻을 수 있는 보조 시스템을 개발하고자 한다. 본 시스템은 사용자가 메뉴계층을 이용하기 전에 간략한 자연어로 입력을 주면, 여기에 적합한 메뉴나 실제 정보를 검색해 낸다. 따라서 사용자는 이 메뉴정보를 이용하여 메뉴계층을 쉽게 따라갈 수 있을 뿐 아니라, 경우에 따라서는 원하는 실제 정보를 검색하기 때문에 메뉴계층을 탐색할 필요가 없다. 본 연구에서는 자연어 입력을 최장 일치 방법으로 의미있는 명사들을 추출하여 불리한 질의어로 만든 후, 명사들 사이의 관계가 표현된 시소러스를 이용하여 이 질의어를 확장시킨다. 다음에 이 질의어들을 메뉴들과 부분/정확부합을 통하여 관련된 메뉴들을 찾아낸 후, 이들의 계층과제를 고려하여 최종 메뉴들을 검색한다. 본 시스템은 현재 C언어로 만들어져 구동중이며, 정확한 실험은 아직 하지 않았지만 높은 검색율을 보이고 있다. industrialized, was improved by introducing pressure in cooling procedure for both carbon and iron thermistors.er>$CHCl_3$>Hexane층 순으로 높은 활성을 나타냈다. 5. 아질산염소거능은 끝순, 들깨잎, 콩나물이 우수하였고 그중 들깨잎이 저해율 72%로 가장 높았으며, 용매분획 중에는 BuOH과 water추출물의 활성이 가장 높았다. 6. ACE 저해 효과는 고구마 부위별로는 끝순이 괴근에 비하여 1.5배 높았고, 들깨잎, 콩나물, 시금치보다 $1.9{\sim}3.7$배 높았다. 용매분획별로는 EtOAc, BuOH, water 추출물이 높은 활성을 보였다. 7. 이상을 종합하여 볼 때 고구마 끝순에는 페놀화합물이 다량 함유되어 있어 높은 항산화 활성을 가지며, 아질산염소거능 및 ACE저해활성과 같은 생리적 효과도 높아 기능성 채소로 이용하기에 충분한 가치가 있다고 판단된다.등의 관련 질환의 예방, 치료용 의약품 개발과 기능성 식품에 효과적으로 이용될 수 있음을 시사한다.tall fescue 23%, Kentucky bluegrass 6%, perennial ryegrass 8%) 및 white clover 23%를 유지하였다. 이상의 결과를 종합할 때, 초종과 파종비율에 따른 혼파초지의 건물수량과 사료가치의 차이를 확인할 수 있었으며, 레드 클로버 + 혼파 초지가 건물수량과 사료가치를 높이는데 효과적이었다.\ell}$ 이었으며 , yeast extract 첨가(添加)하여 배양시(培養時)는 yeast extract 농도(濃度)가 증가(增加)함에 따라 단백질(蛋白質) 함량(含量)도 증가(增加)하였다. 7. CHS-

  • PDF

Optimizing Similarity Threshold and Coverage of CBR (사례기반추론의 유사 임계치 및 커버리지 최적화)

  • Ahn, Hyunchul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.8
    • /
    • pp.535-542
    • /
    • 2013
  • Since case-based reasoning(CBR) has many advantages, it has been used for supporting decision making in various areas including medical checkup, production planning, customer classification, and so on. However, there are several factors to be set by heuristics when designing effective CBR systems. Among these factors, this study addresses the issue of selecting appropriate neighbors in case retrieval step. As the criterion for selecting appropriate neighbors, conventional studies have used the preset number of neighbors to combine(i.e. k of k-nearest neighbor), or the relative portion of the maximum similarity. However, this study proposes to use the absolute similarity threshold varying from 0 to 1, as the criterion for selecting appropriate neighbors to combine. In this case, too small similarity threshold value may make the model rarely produce the solution. To avoid this, we propose to adopt the coverage, which implies the ratio of the cases in which solutions are produced over the total number of the training cases, and to set it as the constraint when optimizing the similarity threshold. To validate the usefulness of the proposed model, we applied it to a real-world target marketing case of an online shopping mall in Korea. As a result, we found that the proposed model might significantly improve the performance of CBR.

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.143-159
    • /
    • 2015
  • Predicting IT trends has been a long and important subject for information systems research. IT trend prediction makes it possible to acknowledge emerging eras of innovation and allocate budgets to prepare against rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner Predicts 10 top IT trend during the next year, and these predictions affect IT and industry leaders and organization's basic assumptions about technology and the future of IT, but the accuracy of these reports are difficult to verify. Social media data can be useful tool to verify the accuracy. As social media services have gained in popularity, it is used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In the recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and communicate with colleague and friends their opinions and thoughts. In particular, Twitter is currently the major micro blog service, it has an important function named 'tweets' which is to report their current thoughts and actions, comments on news and engage in discussions. For an analysis on IT trends, we chose Tweet data because not only it produces massive unstructured textual data in real time but also it serves as an influential channel for opinion leading on technology. Previous studies found that the tweet data provides useful information and detects the trend of society effectively, these studies also identifies that Twitter can track the issue faster than the other media, newspapers. Therefore, this study investigates how frequently the predicted IT trends for the following year announced by public organizations are mentioned on social network services like Twitter. IT trend predictions for 2013, announced near the end of 2012 from two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as a basis for this research. The present study analyzes the Twitter data generated from Seoul (Korea) compared with the predictions of the two organizations to analyze the differences. Thus, Twitter data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio) developed by SAS to capture the trend in real-time processing big stream datasets of Twitter. The system offers a framework for crawling, normalizing, analyzing, indexing and searching tweet data. As a result, we have crawled the entire Twitter sphere in Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by the people in Seoul. The results shows that most IT trend predicted by NIPA and NIA were all frequently mentioned in Twitter except some topics such as 'new types of security threat', 'green IT', 'next generation semiconductor' since these topics non generalized compound words so they can be mentioned in Twitter with other words. To answer whether the IT trend tweets from Korea is related to the following year's IT trends in real world, we compared Twitter's trending topics with those in Nara Market, Korea's online e-Procurement system which is a nationwide web-based procurement system, dealing with whole procurement process of all public organizations in Korea. The correlation analysis show that Tweet frequencies on IT trending topics predicted by NIPA and NIA are significantly correlated with frequencies on IT topics mentioned in project announcements by Nara market in 2012 and 2013. The main contribution of our research can be found in the following aspects: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline to IT professionals and researchers in Korea who are looking for verified IT topic trends in the following topic, ii) researchers can use Twitter to get some useful ideas to detect and predict dynamic trends of technological and social issues.