• Title/Summary/Keyword: synonyms

Ontology-based User Customized Search Service Considering User Intention (온톨로지 기반의 사용자 의도를 고려한 맞춤형 검색 서비스)

  • Kim, Sukyoung; Kim, Gunwoo
    • Journal of Intelligence and Information Systems, v.18 no.4, pp.129-143, 2012
  • Recently, the rapid progress of standardized web technologies and the worldwide proliferation of web users have brought an explosive increase in the production and consumption of information documents on the web. In addition, most companies produce, share, and manage a huge number of information documents needed to perform their business. They also collect, store, and manage, at their own discretion, many web documents published on the web for their business. Along with this increase in the information documents that companies must manage, the need for a solution that locates information documents more accurately among a huge number of information sources has increased. To satisfy this need for accurate search, the market for search engine solutions is steadily expanding. The most important of the many functions provided by a search engine is locating accurate information documents among huge information sources. The major metric for evaluating the accuracy of a search engine is relevance, which consists of two measures, precision and recall. Precision is a measure of exactness, that is, what percentage of the results returned as answers are actually correct, whereas recall is a measure of completeness, that is, what percentage of the true answers are actually retrieved. These two measures can be weighted differently according to the application domain. If we need to search exhaustively, as with patent documents and research papers, it is better to increase recall. On the other hand, when the amount of information is small, it is better to increase precision. Most existing web search engines use a keyword search method that returns web documents containing keywords that match the search words entered by a user. This method has the virtue of locating all matching web documents quickly, even when many search words are entered.
However, this method has a fundamental limitation: it does not consider the search intention of the user and therefore retrieves irrelevant results as well as relevant ones. Thus, it takes additional time and effort to sort the relevant results out of all the results returned by a search engine. That is, the keyword search method can increase recall, but it makes it difficult to locate the web documents that a user actually wants to find, because it provides no means of understanding the user's intention and reflecting it in the search process. Thus, this research suggests a new method that combines an ontology-based search solution with the core search functionalities provided by existing search engine solutions. The method enables a search engine to provide optimal search results by inferring the search intention of a user. To that end, we build an ontology that contains the concepts of a specific domain and the relationships among them. The ontology is used to infer synonyms of the search keywords entered by a user, so that the user's search intention is reflected in the search process more actively than in existing search engines. Based on the proposed method, we implement a prototype search system and test it in the patent domain, where we experiment with searching for relevant documents associated with a patent. The experiment shows that our system increases both recall and precision and improves search productivity through an improved user interface that enables a user to interact with the search system effectively. In future research, we will study a means of validating the performance of our prototype system by comparing it with other search engine solutions and will extend the application to other information search domains such as portals.
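The idea of expanding a keyword query with synonyms and then measuring precision and recall can be sketched as follows. This is a minimal illustration, not the authors' system: the `SYNONYMS` table is a hypothetical stand-in for a domain ontology, and the documents, relevance labels, and function names are all invented for the example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy synonym table standing in for a domain ontology.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "engine": {"motor"},
}


def expand_query(keywords: Iterable[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the keywords plus every synonym listed for them."""
    expanded: Set[str] = set()
    for word in keywords:
        key = word.lower()
        expanded.add(key)
        expanded |= synonyms.get(key, set())
    return expanded


def search(docs: Dict[str, str], terms: Set[str]) -> Set[str]:
    """Return ids of documents that contain at least one query term."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if terms & set(text.lower().split())
    }


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented documents and relevance judgments, for illustration only.
DOCS = {
    "d1": "car engine patent",
    "d2": "automobile motor design",
    "d3": "bicycle frame patent",
}
RELEVANT = {"d1", "d2"}

plain_hits = search(DOCS, {"car"})
expanded_hits = search(DOCS, expand_query(["car"], SYNONYMS))

print(precision_recall(plain_hits, RELEVANT))
print(precision_recall(expanded_hits, RELEVANT))
```

In this toy data the plain keyword query misses the document that uses "automobile", while the expanded query retrieves it, so recall rises without retrieving the irrelevant document.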

A Reconsideration of the List of National Endemic Plants (appendix 4-1) Under the Creation and Furtherance of Arboretums Act Proposed by Korea Forest Service (산림청 수목원 조성 및 진흥에 관한 법률의 특산식물 목록의 재고)

  • Park, Soo Kyung; Gil, Hee-Young; Kim, Hui; Chang, Chin-Sung
    • Journal of Korean Society of Forest Science, v.102 no.1, pp.38-58, 2013
  • The existence of endemism in many parts of the world is an important factor for conservationists. Conservation can only be carried out under national legislation, and national endemics, which have very limited ranges, fully depend on the effort and success of conservation. A total of 523 vascular plant taxa were listed in the latest national checklist by the Ministry of Environment in 2005, while the 'Creation and Furtherance of Arboretums Act', including a national endemic list (appendix 4-1), was established by the Korea Forest Service and enacted into law in late 2011. This legislation by the Korea Forest Service on the endemism of Korean vascular plants requires close attention because of discrepancies in nomenclature, taxonomic bias, and inflation. Comparing the lists proposed by the Ministry of Environment and the Korea Forest Service, of the total of 360 legislated taxa, around 286 taxa, or about 80%, are common to the list of the Ministry of Environment. Around 67 taxa (18.7%) are typographic errors, and 14 taxa (3.9%) are recorded as illegitimate and invalidly published names. Through this analysis, 12 taxa (3.4%) were found in China as well as in Korea and are therefore thought not to be Korean endemic taxa. Taken together, about one quarter (24.9%) of the legislated list is in error. Only 59 taxa (16.5%) are identified as national endemic species. The remainder are either unresolved candidates (73 taxa, 20.4%) or synonyms (196 taxa, 54.7%). It must be noted that the concept of endemism depends heavily on knowledge of the species concept, taxonomic bias, and the geographical range of a species. Also, the major nomenclatural problems would tend to become more stable if the Korea Plant Name database, which is managed by the Korea National Arboretum, were kept well updated from year to year.
These exaggerated numbers underscore the urgency for regional conservation planning and implementing effective strategies to preserve these real endemic taxa into the future.

Story-based Information Retrieval (스토리 기반의 정보 검색 연구)

  • You, Eun-Soon; Park, Seung-Bo
    • Journal of Intelligence and Information Systems, v.19 no.4, pp.81-96, 2013
  • Video information retrieval has become a very important issue because of the explosive increase in video data from Web content development. Meanwhile, content-based video analysis using visual features has been the main source for video information retrieval and browsing. Content in video can be represented with content-based analysis techniques, which can extract various features from audio-visual data such as frames, shots, colors, texture, or shape. Moreover, similarity between videos can be measured through content-based analysis. However, a movie, which is one of the typical types of video data, is organized by story as well as by audio-visual data. When content-based video analysis using only low-level audio-visual data is applied to movie information retrieval, this causes a semantic gap between the significant information recognized by people and the information resulting from content-based analysis. The reason for this semantic gap is that the story line of a movie is high-level information, with relationships in the content that change as the movie progresses. Information retrieval related to the story line of a movie cannot be carried out by content-based analysis techniques alone. A formal model that can determine relationships among movie contents, or track changes in meaning, is needed in order to accurately retrieve story information. Recently, story-based video analysis techniques have emerged that use a social network concept for story information retrieval. These approaches represent a story by using the relationships between characters in a movie, but they have problems. First, they do not express dynamic changes in the relationships between characters as the story develops. Second, they miss profound information, such as emotions indicating the identities and psychological states of the characters. Emotion is essential to understanding a character's motivation, conflict, and resolution.
Third, they do not take account of the events and background that contribute to the story. Accordingly, this paper reviews the importance and weaknesses of previous video analysis methods, ranging from content-based approaches to story analysis based on social networks. We also suggest necessary elements, such as character, background, and events, based on narrative structures introduced in the literature. First, we extract characters' emotional words from the script of the movie Pretty Woman by using the hierarchical attribute of WordNet, which is an extensive English thesaurus. WordNet offers relationships between words (e.g., synonyms, hypernyms, hyponyms, antonyms). We present a method to visualize the emotional pattern of a character over time. Second, a character's inner nature must be predetermined in order to model a character arc that can depict the character's growth and development. To this end, we analyze the amount of the character's dialogue in the script and track the character's inner nature using social network concepts, such as in-degree (incoming links) and out-degree (outgoing links). Additionally, we propose a method that can track a character's inner nature by tracing indices such as degree, in-degree, and out-degree of the character network as the movie progresses. Finally, the spatial background where characters meet and where events take place is an important element in the story. We take advantage of the movie script to extract significant spatial background and suggest a scene map describing spatial arrangements and distances in the movie. Important places where the main characters first meet or where they stay for long periods of time can be extracted through this scene map. In view of the aforementioned three elements (character, event, background), we extract a variety of information related to the story and evaluate the performance of the proposed method.
We can track story information extracted over time and detect a change in the character's emotion or inner nature, spatial movement, and conflicts and resolutions in the story.
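Tracing in-degree, out-degree, and degree of a character network through a movie's progression can be sketched roughly as below. This is only an illustration under stated assumptions: each utterance is modeled as a hypothetical `(speaker, listener)` pair, degrees are counted per utterance (a weighted degree), and the character names and scenes are invented rather than taken from any script.

```python
from collections import Counter
from typing import Dict, List, Tuple

Utterance = Tuple[str, str]  # (speaker, listener); a hypothetical representation


def track_degrees(scenes: List[List[Utterance]]) -> List[Dict[str, Dict[str, int]]]:
    """Return cumulative in/out/total degree per character after each scene."""
    out_degree: Counter = Counter()
    in_degree: Counter = Counter()
    history: List[Dict[str, Dict[str, int]]] = []
    for scene in scenes:
        for speaker, listener in scene:
            out_degree[speaker] += 1  # outgoing link: the character speaks
            in_degree[listener] += 1  # incoming link: the character is addressed
        names = set(out_degree) | set(in_degree)
        history.append({
            name: {
                "in": in_degree[name],
                "out": out_degree[name],
                "degree": in_degree[name] + out_degree[name],
            }
            for name in names
        })
    return history


# Invented scenes for illustration only.
SCENES = [
    [("Alice", "Bob"), ("Bob", "Alice"), ("Alice", "Bob")],
    [("Alice", "Carol"), ("Carol", "Alice")],
]

history = track_degrees(SCENES)
for index, snapshot in enumerate(history, start=1):
    print(index, sorted(snapshot.items()))
```

Comparing successive snapshots shows how a character's share of outgoing and incoming dialogue shifts over the course of the story.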

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul; Kim, Hea-Jin
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.183-203, 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has brought various kinds of online news media, the volume of news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users look at many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and kept only meaningful words and merged synonyms through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, find the peaks of the topic distribution, and detect events. A total of 32 topics were extracted by the topic modeling, and the point of occurrence of each event was deduced by looking at the point at which each topic distribution surged. As a result, a total of 85 events were detected, and after filtering with the Gaussian smoothing technique, 16 final events were presented. We also calculated the relevance score between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between the events and connected them to construct the event network. Finally, we set up the event network by assigning each event to a vertex and the relevance score between events to the edge connecting the corresponding vertices.
The event network constructed with our method helped us sort out, in chronological order, the major political and social events that occurred in Korea over the past year and at the same time identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect with existing event detection. In the text preprocessing, we applied various text mining techniques and the Word2Vec technique to improve the accuracy of extracting proper nouns and compound nouns, which have been difficult to handle in existing analyses of Korean text. The event detection and network construction techniques of this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract topics, topic words, and their distribution from a huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic as a time series. Second, by calculating relevance scores and constructing an event network from the co-occurrence of topics, we can present the connections between events, which are difficult to grasp with existing event detection, in a summarized form. This can be seen from the fact that the relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify, through the event network, what happened as the starting point of a series of events. A limitation of this study is that, owing to the characteristics of LDA topic modeling, results differ according to the initial parameters and the number of topics, and the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher.
Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
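The pipeline described in this abstract (smooth a topic's daily distribution, take surges as events, then link events by cosine similarity) can be sketched with the standard library alone. This is a hedged illustration, not the authors' implementation: the smoothing kernel width, the peak rule, the `threshold` parameter, the event vectors, and all function names are assumptions introduced for the example, and the LDA step that would produce the daily topic shares is omitted.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def gaussian_smooth(series: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Smooth a series with a truncated, edge-normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    smoothed = []
    for i in range(len(series)):
        total, norm = 0.0, 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < len(series):
                weight = math.exp(-(k * k) / (2 * sigma * sigma))
                total += weight * series[j]
                norm += weight
        smoothed.append(total / norm)
    return smoothed


def find_peaks(series: Sequence[float]) -> List[int]:
    """Indices that are strictly higher than both neighbors."""
    return [
        i for i in range(1, len(series) - 1)
        if series[i] > series[i - 1] and series[i] > series[i + 1]
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_event_network(
    events: Dict[str, Sequence[float]], threshold: float
) -> Dict[Tuple[str, str], float]:
    """Events are vertices; an edge carries the relevance score of a pair."""
    edges = {}
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(events.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            edges[(name_a, name_b)] = score
    return edges


# Invented daily share of one topic; the surge on day 3 is the candidate event.
TOPIC_SHARE = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
peaks = find_peaks(gaussian_smooth(TOPIC_SHARE, sigma=1.0))

# Invented event vectors (for example, co-occurrence profiles).
EVENTS = {"e1": [1.0, 0.0], "e2": [1.0, 1.0], "e3": [0.0, 1.0]}
network = build_event_network(EVENTS, threshold=0.5)

print(peaks)
print(sorted(network))
```

Here `e1` and `e3` share no components, so no edge connects them, while both are linked to `e2`.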

Review of the Korean Indigenous Species Investigation Project (2006-2020) by the National Institute of Biological Resources under the Ministry of Environment, Republic of Korea (한반도 자생생물 조사·발굴 연구사업 고찰(2006~2020))

  • Bae, Yeon Jae; Cho, Kijong; Min, Gi-Sik; Kim, Byung-Jik; Hyun, Jin-Oh; Lee, Jin Hwan; Lee, Hyang Burm; Yoon, Jung-Hoon; Hwang, Jeong Mi; Yum, Jin Hwa
    • Korean Journal of Environmental Biology, v.39 no.1, pp.119-135, 2021
  • Korea has stepped up efforts to investigate and catalog its flora and fauna to conserve the biodiversity of the Korean Peninsula and secure biological resources since the ratification of the Convention on Biological Diversity (CBD) in 1992 and the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits (ABS) in 2010. Thus, after its establishment in 2007, the National Institute of Biological Resources (NIBR) of the Ministry of Environment of Korea initiated a project called the Korean Indigenous Species Investigation Project to investigate indigenous species on the Korean Peninsula. For 15 years since its beginning in 2006, this project has been carried out in five phases: Phase 1 from 2006-2008, Phase 2 from 2009-2011, Phase 3 from 2012-2014, Phase 4 from 2015-2017, and Phase 5 from 2018-2020. Before this project, in 2006, the number of indigenous species surveyed was 29,916. The figure was cumulatively aggregated at the end of each phase as 33,253 species for Phase 1 (2008), 38,011 species for Phase 2 (2011), 42,756 species for Phase 3 (2014), 49,027 species for Phase 4 (2017), and 54,428 species for Phase 5 (2020). The number of indigenous species surveyed grew rapidly, showing an approximately 1.8-fold increase as the project progressed. These statistics showed an annual average of 2,320 newly recorded species during the project period. Among the recorded species, a total of 5,242 new species were reported in scientific publications, a great scientific achievement.
During this project period, newly recorded species on the Korean Peninsula were identified using the recent taxonomic classifications as follows: 4,440 insect species (including 988 new species), 4,333 invertebrate species except for insects (including 1,492 new species), 98 vertebrate species (fish) (including nine new species), 309 plant species (including 176 vascular plant species, 133 bryophyte species, and 39 new species), 1,916 algae species (including 178 new species), 1,716 fungi and lichen species (including 309 new species), and 4,812 prokaryotic species (including 2,226 new species). The number of collected biological specimens in each phase was aggregated as follows: 247,226 for Phase 1 (2008), 207,827 for Phase 2 (2011), 287,133 for Phase 3 (2014), 244,920 for Phase 4 (2017), and 144,333 for Phase 5 (2020). A total of 1,131,439 specimens were obtained, with an annual average of 75,429. More specifically, 281,054 insect specimens, 194,667 invertebrate specimens (except for insects), 40,100 fish specimens, 378,251 plant specimens, 140,490 algae specimens, 61,695 fungi specimens, and 35,182 prokaryotic specimens were collected. The cumulative number of researchers involved in this project, who were nearly all professional taxonomists and graduate students majoring in taxonomy across the country, was around 5,000, with an annual average of 395. The number of researchers/assistant researchers (mainly graduate students) participating was 597/268 in Phase 1; 522/191 in Phase 2; 939/292 in Phase 3; 575/852 in Phase 4; and 601/1,097 in Phase 5. During this project period, 3,488 papers were published in major scientific journals. Of these, 2,320 papers were published in domestic journals and 1,168 papers were published in Science Citation Index (SCI) journals.
During the project period, a total of 83.3 billion won (annual average of 5.5 billion won) or approximately US $75 million (annual average of US $5 million) was invested in investigating indigenous species and collecting specimens. This project was a large-scale research study led by the Korean government. It is considered to be a successful example of Korea's compressed development as it attracted almost all of the taxonomists in Korea and made remarkable achievements with a massive budget in a short time. The results from this project led to the National List of Species of Korea, where all species were organized by taxonomic classification. Information regarding the National List of Species of Korea is available to experts, students, and the general public (https://species.nibr.go.kr/index.do). The information, including descriptions, DNA sequences, habitats, distributions, ecological aspects, images, and multimedia, has been digitized, making contributions to scientific advancement in research fields such as phylogenetics and evolution. The species information also serves as a basis for projects aimed at species distribution and biological monitoring such as climate-sensitive biological indicator species. Moreover, the species information helps bio-industries search for useful biological resources. The most meaningful achievement of this project can be in providing support for nurturing young taxonomists like graduate students. This project has continued for the past 15 years and is still ongoing. Efforts to address issues, including species misidentification and invalid synonyms, still have to be made to enhance taxonomic research. Research needs to be conducted to investigate another 50,000 species out of the estimated 100,000 indigenous species on the Korean Peninsula.