• Title/Summary/Keyword: multi-keyword

Exploration of Hydrogen Research Trends through Social Network Analysis (연구 논문 네트워크 분석을 이용한 수소 연구 동향)

  • KIM, HYEA-KYEONG;CHOI, ILYOUNG
    • Journal of Hydrogen and New Energy / v.33 no.4 / pp.318-329 / 2022
  • This study analyzed keyword networks and author affiliation networks of hydrogen-related papers published in Korea Citation Index (KCI) journals from 2016 to 2020. It investigated co-occurrence patterns of institutions over time to examine collaboration trends among hydrogen scholars. It also conducted frequency analysis of keyword networks to identify key topics and visualized the networks to explore topic trends. The results showed that collaborative research between institutions has not yet expanded extensively, although collaboration was much more pronounced with local universities. Keyword network analysis showed continuing diversification of topics in Korean hydrogen research. In addition, centrality analysis found that hydrogen research mostly deals with multi-disciplinary and complex aspects such as hydrogen production, transportation, and public policy.
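
The frequency and centrality analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy keyword lists, the function names `build_cooccurrence` and `degree_centrality`, and the choice of normalized degree centrality are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(papers):
    """Count keyword frequencies and keyword-pair co-occurrences."""
    freq = Counter()
    edges = Counter()
    for keywords in papers:
        unique = sorted(set(keywords))  # one count per paper, stable pair order
        freq.update(unique)
        edges.update(combinations(unique, 2))
    return freq, edges


def degree_centrality(edges):
    """Normalized degree centrality: number of neighbors / (n - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}


# Hypothetical toy data: one keyword list per paper.
papers = [
    ["hydrogen production", "electrolysis", "policy"],
    ["hydrogen production", "transportation"],
    ["transportation", "policy", "hydrogen production"],
]
freq, edges = build_cooccurrence(papers)
centrality = degree_centrality(edges)
```

Keywords that co-occur with many distinct keywords receive a high centrality score, which is one simple way to surface topics that bridge several research areas.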

Design of Multi-Purpose Preprocessor for Keyword Spotting and Continuous Language Support in Korean (한국어 핵심어 추출 및 연속 음성 인식을 위한 다목적 전처리 프로세서 설계)

  • Kim, Dong-Heon;Lee, Sang-Joon
    • Journal of Digital Convergence / v.11 no.1 / pp.225-236 / 2013
  • Voice recognition technology has advanced continuously and can now support natural language beyond the recognition of isolated words. Interest in voice recognition surged after Siri, the iPhone-based voice recognition software, was presented in 2010. Some voice-enabled services have been implemented using Korean voice recognition software, but they are not accurate enough because of background noise and a lack of control over voice-related features. In this paper, we propose a multi-purpose preprocessor to improve this situation. It supports keyword spotting in continuous speech in addition to noise filtering. It is designed to be independent of any particular voice recognition software, and it can extend its functionality to support continuous speech by additionally identifying the pre-predicate and the post-predicate relative to the spotted keyword. We validate the noise filter effectiveness, keyword recognition rate, and continuous speech recognition rate through experiments.

Exploratory Study of Developing a Synchronization-Based Approach for Multi-step Discovery of Knowledge Structures

  • Yu, So Young
    • Journal of Information Science Theory and Practice / v.2 no.2 / pp.16-32 / 2014
  • As topic modeling has been applied in an increasing variety of domains, the difficulty of naming and characterizing topics has also become more widely recognized. This study therefore explores a multi-step approach that combines text mining with network analysis. The concept of synchronization was applied to re-assign the top author keywords in more than one topic category, in order to improve the visibility of the topic-author keyword network and to increase the topical cohesion of each topic. The suggested approach was applied to 16,548 articles with 2,881 unique author keywords in construction and building engineering indexed by KSCI. The results showed that the combined approach could improve both the visibility of the topic-author keyword map and the topical cohesion in most of the detected topic categories. The approach should be applied in more domains to generalize and advance it. More sophisticated evaluation methods are also needed to develop the suggested approach further.

Dynamic Management of Equi-Join Results for Multi-Keyword Searches (다중 키워드 검색에 적합한 동등조인 연산 결과의 동적 관리 기법)

  • Lim, Sung-Chae
    • The KIPS Transactions:PartA / v.17A no.5 / pp.229-236 / 2010
  • As the number of documents on the Internet and in enterprises increases, it becomes crucial to support users' queries on those documents efficiently. Full-text search is generally accepted for this purpose because it can answer uncontrolled ad-hoc queries by automatically indexing all the keywords found in the documents. The index files built for full-text search grow with the number of indexed documents, so the disk cost of processing multi-keyword queries against the enlarged index files may become too high. To solve this problem, we propose both an index file structure and a management scheme suited to processing multi-keyword queries against a large volume of index files. We adopt inverted files, which are widely used in multi-keyword search, as the basic index structure and modify them into a hierarchical structure for the join and ranking operations performed during query processing. To save disk cost with this index structure, we dynamically store in main memory the results of join operations between two keywords that are highly likely to be entered together in users' queries. We also compare performance using a disk cost model to show the advantage of the proposed scheme.
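
The idea of keeping join results for frequently paired keywords in memory can be sketched as follows. This is only an illustration under simplifying assumptions: the whole index lives in memory, the hierarchical structure and ranking from the paper are omitted, and the class name `CachedInvertedIndex` and the `hot_pairs` parameter are hypothetical.

```python
from collections import defaultdict


class CachedInvertedIndex:
    """Inverted index that keeps join (intersection) results for hot keyword pairs."""

    def __init__(self, hot_pairs=()):
        self.postings = defaultdict(set)  # keyword -> set of document ids
        # Pairs of distinct keywords expected to appear together in queries.
        self.hot_pairs = {frozenset(w.lower() for w in p) for p in hot_pairs}
        self.join_cache = {}  # frozenset({k1, k2}) -> set of document ids

    def add_document(self, doc_id, text):
        words = set(text.lower().split())
        for word in words:
            self.postings[word].add(doc_id)
        # Keep cached joins current for every hot pair this document contains.
        for pair in self.hot_pairs:
            if pair <= words:
                self.join_cache.setdefault(pair, set()).add(doc_id)

    def search(self, *keywords):
        terms = {k.lower() for k in keywords}
        if not terms:
            return set()
        result = None
        remaining = set(terms)
        # Start from a cached join when the query contains a hot pair.
        for pair in self.hot_pairs:
            if pair <= terms:
                result = set(self.join_cache.get(pair, set()))
                remaining -= pair
                break
        # Intersect the remaining posting lists, shortest first.
        for term in sorted(remaining, key=lambda t: len(self.postings.get(t, ()))):
            docs = self.postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result


index = CachedInvertedIndex(hot_pairs=[("keyword", "search")])
index.add_document(1, "multi keyword search index")
index.add_document(2, "keyword index")
index.add_document(3, "full text search with keyword ranking")
hits = index.search("keyword", "search", "index")
```

A query containing a hot pair skips one posting-list intersection, which in a disk-based setting would correspond to avoiding reads of two long posting lists.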

KCI vs. WoS: Comparative Analysis of Korean and International Journal Publications in Library and Information Science

  • Yang, Kiduk;Lee, Hyekyung;Kim, Seonwook;Lee, Jongwook;Oh, Dong-Geun
    • Journal of Information Science Theory and Practice / v.9 no.3 / pp.76-106 / 2021
  • The study analyzed bibliometric data of papers published in Korea Citation Index (KCI) and Web of Science (WoS) journals from 2002 to 2021. After examining size differences between the KCI and WoS domains in the number of authors, institutions, and journals to put publication and citation counts in perspective, the study investigated co-authorship patterns over time to compare collaboration trends of Korean and international scholars, and analyzed the data at the author, institution, and journal levels to explore how the influences of authors, institutions, and journals on research output differ across domains. The study also conducted frequency-based analysis of keywords to identify key topics and visualized keyword clusters to examine topic trends. The results showed Korean LIS authors to be twice as productive as international authors but much less impactful, and Korean institutions to be at comparable levels of productivity and impact, in contrast to the concentration of much productivity and impact in top international institutions. Citations to journals exhibited an initially increasing pattern followed by a decreasing trend, though WoS journals showed far more variance than KCI journals. Co-authorship trends were much more pronounced among international publications, where larger collaboration groups suggested the multi-disciplinary and complex nature of international LIS research. Keyword analysis found continuing diversification of topics in international research compared to a relatively static topic trend in Korea. Keyword visualization showed WoS keyword clusters to be much denser and more diverse than KCI clusters. In addition, the key keyword clusters of WoS were quite different from each other, unlike the KCI clusters, which were similar.

A Development of Unicode-based Multi-lingual Namecard Recognizer (Unicode 기반 다국어 명함인식기 개발)

  • Jang, Dong-Hyeub;Lee, Jae-Hong
    • The KIPS Transactions:PartB / v.16B no.2 / pp.117-122 / 2009
  • We developed a multi-lingual namecard recognizer for building a global client management system. First, we created a Unicode-based character image database for character recognition and learning in multiple languages, and applied a range of color image processing techniques to obtain more accurate data from namecard images acquired by various input devices. By applying a multi-layer perceptron neural network, character recognition tailored to each language type, and post-processing that uses keyword databases built for individual languages, we increased the recognition rate for multi-lingual namecards.

Cost-Effective Replication Schemes for Query Load Balancing in DHT-Based Peer-to-Peer File Searches

  • Cao, Qi;Fujita, Satoshi
    • Journal of Information Processing Systems / v.10 no.4 / pp.628-645 / 2014
  • In the past few years, distributed hash table (DHT)-based P2P systems have proven to be a promising way to manage decentralized index information and provide efficient lookup services. However, the skewness of users' preferences regarding the keywords contained in a multi-keyword query causes a query load imbalance that combines both routing and response load. This imbalance leads to long file retrieval latency, which negatively influences overall system performance. Although index replication has great potential for alleviating this problem, existing schemes either do not address it explicitly or incur a high cost. To overcome this issue, we propose an integrated solution consisting of three replication schemes that alleviate query load imbalance while minimizing cost. The first scheme is an active index replication that decreases routing load in the system and distributes the response load of an index among the peers that store its replicas. The second scheme is a proactive pointer replication that places the location information of each index on a predetermined number of peers to reduce the maintenance cost between the index and its replicas. The third scheme is a passive index replication that guarantees the maximum query load of peers. Simulation results indicate that the proposed schemes can help alleviate the query load imbalance of peers. Moreover, a comparison showed that our schemes place replicas more cost-effectively than PCache and EAD.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • With the recent development of deep learning technology, research on unstructured data analysis is being actively conducted and is showing remarkable results in fields such as classification, summarization, and generation. Among the various text analysis fields, text classification is the most widely used in academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Multi-label classification in particular requires a different training method from binary and multi-class classification because each instance has multiple labels. In addition, because the number of labels to be predicted grows as the number of labels and classes increases, prediction becomes harder and performance is difficult to improve. To overcome these limitations, research on label embedding is being actively conducted. Label embedding (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed label, and (iii) restores the predicted label to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they have difficulty capturing non-linear relationships between labels and therefore cannot create a latent label space that sufficiently contains the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. A representative example is label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding loses a large amount of information when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This can be traced to the gradient loss problem that occurs during backpropagation. The skip connection was devised to solve this problem: by adding the input of a layer to its output, it prevents gradient loss during backpropagation, so efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies that use them in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators were far superior for multi-label classification based on the proposed methodology compared to traditional multi-label classification methods.
This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance across domain characteristics and across the number of dimensions of the latent label space.
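
The skip connection mentioned in this abstract, adding a layer's input to its output, can be shown with a small forward-pass sketch. It is not the authors' model: the helper names `dense`, `relu`, and `skip_block`, the ReLU activation, and the assumption that the layer keeps its input dimension are all choices made for illustration.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def skip_block(x, weights, bias):
    """Layer with a skip connection: output = x + relu(W x + b).

    Assumes the layer preserves dimension so the input can be added back.
    """
    transformed = relu(dense(x, weights, bias))
    return [xi + ti for xi, ti in zip(x, transformed)]


x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
identity_w = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

passthrough = skip_block(x, zero_w, bias)      # layer contributes nothing
combined = skip_block(x, identity_w, bias)     # input plus transformed input
```

With all-zero weights the block returns its input unchanged, which shows why the additive path gives the signal, and in training the gradient, a direct route around the layer.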

Relational Database Structure for Preserving Multi-role Topics in Topic Map (토픽맵의 다중역할 토픽 보존을 위한 관계형 데이터베이스 구조)

  • Jung, Yoonsoo;Y., Choon;Kim, Namgyu
    • The Journal of Information Systems / v.18 no.3 / pp.327-349 / 2009
  • Traditional keyword-based searching methods suffer from low accuracy and high complexity due to the rapid growth in the amount of information. Accordingly, many researchers attempt to implement a so-called semantic search which is based on the semantics of the user's query. Semantic information can be described using a semantic modeling language, such as Topic Map. In this paper, we propose a new method to map a topic map to a traditional Relational Database (RDB) without any information loss. Although there have been a few attempts to map topic maps to RDB, they have paid scant attention to handling multi-role topics. In this paper, we propose a new storage structure to map multi-role topics to traditional RDB. The proposed structure consists of a mapping table, role tables, and content tables. Additionally, we devise a query translator to convert a user's query to one appropriate to the proposed structure.

Visualization for Integrated Analysis of Multi-Omics Data by Harmful Substances Exposed to Human (인체 유래 환경유해물질 노출에 따른 멀티 오믹스 데이터 통합 분석 가시화 시스템)

  • Shin, Ga-Hee;Hong, Ji-Man;Park, Seo-Woo;Kang, Byeong-Chul;Lee, Bong-Mun
    • Journal of Korea Multimedia Society / v.25 no.2 / pp.363-373 / 2022
  • Multi-omics data are difficult to interpret because of heterogeneity arising from the volume of data, the complexity of each data type, and the diversity of omics platforms. There is not yet a system for visualizing and interpreting research data on environmental diseases related to environmentally harmful substances. We provide MEE, a web-based visualization tool, to comprehensively explore the complexity that arises from the interconnected characteristics of high-dimensional data sets according to exposure to various environmentally harmful substances. MEE visualizes correlations among omics data, subjects, and samples through keyword searches over metadata, multi-omics data, and harmful substances. The versatility of MEE is demonstrated with two examples. First, we confirmed the correlation between smoking and asthma with RNA-seq and Methylation-Chip data and visualized genes (PHACTR3, PXDN, QZMB, SOCS3, etc.) significantly related to autoimmune or inflammatory diseases. Second, to visualize the correlation between atopic dermatitis and heavy metals, we selected 32 genes related to immune response through integrated analysis of multi-omics data; however, the results did not show a significant correlation between mercury in blood and atopic dermatitis. In the future, by continuously collecting an appropriate level of multi-omics data in the MEE system, we expect to obtain data for analyzing environmental substances and diseases.