• Title/Summary/Keyword: Text Retrieval System

Search Result 177, Processing Time 0.028 seconds

Development and Operation of Marine Environmental Portal Service System (해양환경 포탈서비스시스템 구축과 운영)

  • 최현우;권순철
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2003.05a
    • /
    • pp.338-341
    • /
    • 2003
  • According to a long-term master plan for the implementing of MOMAF's marine environmental informatization, we have developed marine environment portal web site which consists of 7 main-menu and 39 sub-menu including various types of contents (text, image and multimedia) based on RDBMS. This portal site was opened in Oct., 2002 (http://www.meps.info). Also, for the national institutions' distributed DB which is archived and managed respectively the marine chemical data and biological data, the integrated retrieval system was developed. This system is meaningful for the making collaborative use of real data and could be applied for data mining, marine research, marine environmental GIS and making-decisions.

  • PDF

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of the Web 2.0 that shapes the change of a user's information behavior by allowing users to produce their own contents without any expert skills. In particular, as a new communication medium, it has a profound impact on the social change by enabling users to communicate with the masses and acquaintances their opinions and thoughts. Social media data plays a significant role in an emerging Big Data arena. A variety of research areas such as social network analysis, opinion mining, and so on, therefore, have paid attention to discover meaningful information from vast amounts of data buried in social media. Social media has recently become main foci to the field of Information Retrieval and Text Mining because not only it produces massive unstructured textual data in real-time but also it serves as an influential channel for opinion leading. But most of the previous studies have adopted broad-brush and limited approaches. These approaches have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system to capture the trend in real-time processing big stream datasets of Twitter. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes of topical trend, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' name and election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects the trend of society effectively. The system also retrieves the list of terms co-occurred by given query terms. We compare the results of term co-occurrence retrieval by giving influential candidates' name, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms which are related to presidential election such as 'Presidential Election', 'Proclamation in Support', Public opinion poll' appear frequently. Also the results show specific terms that differentiate each candidate's feature such as 'Park Jung Hee' and 'Yuk Young Su' from the query 'Guen Hae Park', 'a single candidacy agreement' and 'Time of voting extension' from the query 'Jae In Moon' and 'a single candidacy agreement' and 'down contract' from the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns-Rising tendency and Falling tendencydepending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compare topic trends with related news articles. We are able to identify that Twitter can track the issue faster than the other media, newspapers. The user network in Twitter is different from those of other social media because of distinctive characteristics of making relationships in Twitter. Twitter users can make their relationships by exchanging mentions. We visualize and analyze mention based networks of 136,754 users. We put three candidates' name as query terms-Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn'. The results show that Twitter users mention all candidates' name regardless of their political tendencies. This case study discloses that Twitter could be an effective tool to detect and predict dynamic changes of social issues, and mention-based user networks could show different aspects of user behavior as a unique network that is uniquely found in Twitter.

A Mobile Landmarks Guide : Outdoor Augmented Reality based on LOD and Contextual Device (모바일 랜드마크 가이드 : LOD와 문맥적 장치 기반의 실외 증강현실)

  • Zhao, Bi-Cheng;Rosli, Ahmad Nurzid;Jang, Chol-Hee;Lee, Kee-Sung;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.1-21
    • /
    • 2012
  • In recent years, mobile phone has experienced an extremely fast evolution. It is equipped with high-quality color displays, high resolution cameras, and real-time accelerated 3D graphics. In addition, some other features are includes GPS sensor and Digital Compass, etc. This evolution advent significantly helps the application developers to use the power of smart-phones, to create a rich environment that offers a wide range of services and exciting possibilities. To date mobile AR in outdoor research there are many popular location-based AR services, such Layar and Wikitude. These systems have big limitation the AR contents hardly overlaid on the real target. Another research is context-based AR services using image recognition and tracking. The AR contents are precisely overlaid on the real target. But the real-time performance is restricted by the retrieval time and hardly implement in large scale area. In our work, we exploit to combine advantages of location-based AR with context-based AR. The system can easily find out surrounding landmarks first and then do the recognition and tracking with them. The proposed system mainly consists of two major parts-landmark browsing module and annotation module. In landmark browsing module, user can view an augmented virtual information (information media), such as text, picture and video on their smart-phone viewfinder, when they pointing out their smart-phone to a certain building or landmark. For this, landmark recognition technique is applied in this work. SURF point-based features are used in the matching process due to their robustness. To ensure the image retrieval and matching processes is fast enough for real time tracking, we exploit the contextual device (GPS and digital compass) information. This is necessary to select the nearest and pointed orientation landmarks from the database. The queried image is only matched with this selected data. Therefore, the speed for matching will be significantly increased. Secondly is the annotation module. Instead of viewing only the augmented information media, user can create virtual annotation based on linked data. Having to know a full knowledge about the landmark, are not necessary required. They can simply look for the appropriate topic by searching it with a keyword in linked data. With this, it helps the system to find out target URI in order to generate correct AR contents. On the other hand, in order to recognize target landmarks, images of selected building or landmark are captured from different angle and distance. This procedure looks like a similar processing of building a connection between the real building and the virtual information existed in the Linked Open Data. In our experiments, search range in the database is reduced by clustering images into groups according to their coordinates. A Grid-base clustering method and user location information are used to restrict the retrieval range. Comparing the existed research using cluster and GPS information the retrieval time is around 70~80ms. Experiment results show our approach the retrieval time reduces to around 18~20ms in average. Therefore the totally processing time is reduced from 490~540ms to 438~480ms. The performance improvement will be more obvious when the database growing. It demonstrates the proposed system is efficient and robust in many cases.

The Online Game Coined Profanity Filtering System by using Semi-Global Alignment (반 전역 정렬을 이용한 온라인 게임 변형 욕설 필터링 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.12
    • /
    • pp.113-120
    • /
    • 2009
  • Currently the verbal abuse in text message over on-line game is so serious. However we do not have any effective policy or technical tools yet. Till now in order to cope with this problem, the online game service providers have accumulated a set of forbidden words and applied this list on the textual word used in on-line game, which is called 'Swear filter'. But young on-line game players easily avoid this filtering method by coining another words which is not kept in the list. Especially Korean is very easy to make new variations of a vulgar word. In this paper, we propose one smart filtering algorithm to identify newly coined profanities. Important features of our method include the canonical form transformation of coined profanities, semi-global alignment between in the level of consonant and vowel units. For experiment, we have collected more than 1000 newly coined vulgar words in on-line gaming sites and tested these word against our methods. where our system have successfully filtered more than 90% of those newly coined vulgar words.

A Study on the CD ROM Network(LAN) (CD-ROM 네트워크(LAN)에 관한 소고(小考))

  • Kil, Hyung-Do
    • Journal of Information Management
    • /
    • v.21 no.2
    • /
    • pp.9-23
    • /
    • 1990
  • CD-ROM technique, not more than 10 years after development, goes through rapid growth, has been taken advantage of several practical application parts. Needless to say about bibliographic data, numeric value, the phonetics, an image and a picture data that are recorded as abstract or full text, and offered and applied to industry, information service including library, it can be used for library staffs, information retrieval. Escape from the need of one disc drive and one computer to access one disc, now we organize an ideal system that can be retrieved several CD-ROM used only one drive, several users can access several information, so networking is possible through LAN. In this article, we studied the function and type, characteristics, system, structure, data block, production procedure, standardization of CD-ROM LAN.

  • PDF

The Application of Geography Markup Language(GML) to the Maritime Information

  • Oh, Se-Woong;Park, Jong-Min;Suh, Sang-Hyun
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • v.1
    • /
    • pp.519-524
    • /
    • 2006
  • This paper describes an application of information presentation based geographic map for maritime information, including navigation information. The work is motivated by the need to prepare maritime information representation and distribution for future generation Web network technology. This works consist of map generation using GML and application to maritime information. GML 3.0 became an adopted specification of the Open Geospatial Consortium(OGC) in January 2003, and is rapidly emerging as the world standard for the encoding, transport and storage of all forms of geographic information. This paper looks at the application of GML to one of the more challenging areas of maritime information. Specific features of GML of interest to maritime information provider are discussed and then illustrated through a series of maritime information case studies. The first phase of the work consists of the construction of GML application schema for using as a base map of maritime information. Maritime information is acquired from multiple sources, including standards documents, database schemas, lexicons, collections of symbol definition. The sources of GML ontological knowledge and the contribution of each source to the overall ontology are described in this paper. In the second phase, the prepared GML is used to create a prototype of the mixed maritime information as a base map - for tagging documents within the maritime domain. An overview of this prototype is included. One application area for these information elements described here is the integrated retrieval of maritime information from diverse sources, ranging from Web sites to nautical chart databases and text documents.

  • PDF

A New Similarity Measure for Improving Ranking in QA Systems (질의응답시스템 응답순위 개선을 위한 새로운 유사도 계산방법)

  • Kim Myung-Gwan;Park Young-Tack
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.6
    • /
    • pp.529-536
    • /
    • 2004
  • The main idea of this paper is to combine position information in sentence and query type classification to make the documents ranking to query more accessible. First, the use of conceptual graphs for the representation of document contents In information retrieval is discussed. The method is based on well-known strategies of text comparison, such as Dice Coefficient, with position-based weighted term. Second, we introduce a method for learning query type classification that improves the ability to retrieve answers to questions from Question Answering system. Proposed methods employ naive bayes classification in machine learning fields. And, we used a collection of approximately 30,000 question-answer pairs for training, obtained from Frequently Asked Question(FAQ) files on various subjects. The evaluation on a set of queries from international TREC-9 question answering track shows that the method with machine learning outperforms the underline other systems in TREC-9 (0.29 for mean reciprocal rank and 55.1% for precision).

Text Mining-Based Emerging Trend Analysis for the Aviation Industry (항공산업 미래유망분야 선정을 위한 텍스트 마이닝 기반의 트렌드 분석)

  • Kim, Hyun-Jung;Jo, Nam-Ok;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.65-82
    • /
    • 2015
  • Recently, there has been a surge of interest in finding core issues and analyzing emerging trends for the future. This represents efforts to devise national strategies and policies based on the selection of promising areas that can create economic and social added value. The existing studies, including those dedicated to the discovery of future promising fields, have mostly been dependent on qualitative research methods such as literature review and expert judgement. Deriving results from large amounts of information under this approach is both costly and time consuming. Efforts have been made to make up for the weaknesses of the conventional qualitative analysis approach designed to select key promising areas through discovery of future core issues and emerging trend analysis in various areas of academic research. There needs to be a paradigm shift in toward implementing qualitative research methods along with quantitative research methods like text mining in a mutually complementary manner. The change is to ensure objective and practical emerging trend analysis results based on large amounts of data. However, even such studies have had shortcoming related to their dependence on simple keywords for analysis, which makes it difficult to derive meaning from data. Besides, no study has been carried out so far to develop core issues and analyze emerging trends in special domains like the aviation industry. The change used to implement recent studies is being witnessed in various areas such as the steel industry, the information and communications technology industry, the construction industry in architectural engineering and so on. This study focused on retrieving aviation-related core issues and emerging trends from overall research papers pertaining to aviation through text mining, which is one of the big data analysis techniques. In this manner, the promising future areas for the air transport industry are selected based on objective data from aviation-related research papers. In order to compensate for the difficulties in grasping the meaning of single words in emerging trend analysis at keyword levels, this study will adopt topic analysis, which is a technique used to find out general themes latent in text document sets. The analysis will lead to the extraction of topics, which represent keyword sets, thereby discovering core issues and conducting emerging trend analysis. Based on the issues, it identified aviation-related research trends and selected the promising areas for the future. Research on core issue retrieval and emerging trend analysis for the aviation industry based on big data analysis is still in its incipient stages. So, the analysis targets for this study are restricted to data from aviation-related research papers. However, it has significance in that it prepared a quantitative analysis model for continuously monitoring the derived core issues and presenting directions regarding the areas with good prospects for the future. In the future, the scope is slated to expand to cover relevant domestic or international news articles and bidding information as well, thus increasing the reliability of analysis results. On the basis of the topic analysis results, core issues for the aviation industry will be determined. Then, emerging trend analysis for the issues will be implemented by year in order to identify the changes they undergo in time series. Through these procedures, this study aims to prepare a system for developing key promising areas for the future aviation industry as well as for ensuring rapid response. Additionally, the promising areas selected based on the aforementioned results and the analysis of pertinent policy research reports will be compared with the areas in which the actual government investments are made. The results from this comparative analysis are expected to make useful reference materials for future policy development and budget establishment.

A Study of Intelligent Recommendation System based on Naive Bayes Text Classification and Collaborative Filtering (나이브베이즈 분류모델과 협업필터링 기반 지능형 학술논문 추천시스템 연구)

  • Lee, Sang-Gi;Lee, Byeong-Seop;Bak, Byeong-Yong;Hwang, Hye-Kyong
    • Journal of Information Management
    • /
    • v.41 no.4
    • /
    • pp.227-249
    • /
    • 2010
  • Scholarly information has increased tremendously according to the development of IT, especially the Internet. However, simultaneously, people have to spend more time and exert more effort because of information overload. There have been many research efforts in the field of expert systems, data mining, and information retrieval, concerning a system that recommends user-expected information items through presumption. Recently, the hybrid system combining a content-based recommendation system and collaborative filtering or combining recommendation systems in other domains has been developed. In this paper we resolved the problem of the current recommendation system and suggested a new system combining collaborative filtering and Naive Bayes Classification. In this way, we resolved the over-specialization problem through collaborative filtering and lack of assessment information or recommendation of new contents through Naive Bayes Classification. For verification, we applied the new model in NDSL's paper service of KISTI, especially papers from journals about Sitology and Electronics, and witnessed high satisfaction from 4 experimental participants.

Reconstructing Web Broadcasting Information based on User Retrieval Pattern (무선 환경에서 사용자 검색 성향을 반영한 웹 방송 정보 재구성 기법)

  • Kim, Won-Cheol;Lee, Soo-Cheol;Hwang, Een-Jun;Byeon, Kwang-Jun
    • The KIPS Transactions:PartD
    • /
    • v.11D no.5
    • /
    • pp.1149-1158
    • /
    • 2004
  • Today the fastest growing communities of web users are mobile visitors who browse web page with wireless PDAs and cellular phones. However, most web pages are optimiaed exclusively for desktop clients on the broadband network and are inconvenient to users with small screen mobile devices. They display only a few lines of text and cannot run client-side programs or scripts due to lack of system resource. Even worse, their connections are usually slow to support most of the data-intensive applications. In this paper, we propose a pageslet scheme that makes it feasible to browse ordinary web pages on small screen mobile devices. It extracts broadcasting sections of user preference from broadcasting web pages and automatically reorganizes the extracted sections for convenient browsing on mobile devices.