Discovery Layer in Library Retrieval: VuFind as an Open Source Service for Academic Libraries in Developing Countries

  • Roy, Bijan Kumar;Mukhopadhyay, Parthasarathi;Biswas, Anirban
    • Journal of Information Science Theory and Practice / v.10 no.4 / pp.3-22 / 2022
  • This paper provides an overview of the emergence of resource discovery systems and services, along with their advantages, best practices, and current landscapes. It outlines some of the key services and functionalities of a comprehensive discovery model suitable for academic libraries in developing countries. The proposed model (VuFind as a discovery tool) performs like other existing web-scale resource discovery systems, both commercial and open-source, and is capable of providing information resources from different sources in a single-window search interface. The objective of the paper is to provide seamless access to globally distributed subscribed as well as open access resources through its discovery interface, based on a unified index. This model uses Koha, DSpace, and Greenstone as back-ends and VuFind as a discovery layer in the front-end and has also integrated many enhanced search features like Bento-box search, Geodetic search, and full-text search (using Apache Tika). The goal of this paper is to provide the academic community with a one-stop shop for better utilising and integrating heterogeneous bibliographic data sources with VuFind (https://vufind.org/vufind).

Domain Adaptive Fruit Detection Method based on a Vision-Language Model for Harvest Automation (작물 수확 자동화를 위한 시각 언어 모델 기반의 환경적응형 과수 검출 기술)

  • Changwoo Nam;Jimin Song;Yongsik Jin;Sang Jun Lee
    • IEMEK Journal of Embedded Systems and Applications / v.19 no.2 / pp.73-81 / 2024
  • Recently, mobile manipulators have been used in the agricultural industry for weed removal and harvest automation. This paper proposes a domain adaptive fruit detection method for harvest automation that uses the OWL-ViT model, an open-vocabulary object detection model. The vision-language model detects objects based on text prompts and can therefore be extended to detect objects of undefined categories. In the development of deep learning models for real-world problems, constructing a large-scale labeled dataset is time-consuming and relies heavily on human effort. To reduce this labor-intensive workload, we used a large-scale public dataset as source domain data and employed a domain adaptation method. Adversarial learning was conducted between a domain discriminator and a feature extractor to reduce the gap between the distributions of feature vectors from the source domain and our target domain data. We collected a target domain dataset in a real-like environment and conducted experiments to demonstrate the effectiveness of the proposed method. In the experiments, the domain adaptation method improved the AP50 metric from 38.88% to 78.59% for detecting objects within a range of 2 m, and we achieved a manipulation success rate of 81.7%.

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many media simultaneously, depending on their individual purposes and tastes. When collecting information about a particular theme, they usually draw on various media such as social networking services, Internet news, and blogs. However, each document circulated through these media is placed in a different category according to each source's policy and standards, which hinders any attempt to conduct research on a specific category across different kinds of sources. For example, documents about "Application for a foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to each source's own standard. Likewise, because sources differ in their definitions and levels of specification, similar categories can be named and structured differently from source to source. To overcome these limitations, this study proposes a method for mapping categories between different sources across various media while leaving each medium's existing category system as it is. Specifically, by re-classifying individual documents from the viewpoint of diverse sources and storing the results as extra attributes, this study proposes a logical layer through which users can search for a specific document across multiple heterogeneous sources with different category names as if the documents belonged to the same source. In addition, using 6,000 news articles collected from two Internet news portals, experiments were conducted to compare accuracy among sources, between supervised and semi-supervised learning, and between homogeneous and heterogeneous learning data. It is particularly interesting that, in some categories, the classification accuracy of semi-supervised learning using heterogeneous learning data proved higher than that of supervised learning and semi-supervised learning using homogeneous learning data. This study is significant in the following respects. First, it proposes a logical plan for establishing a system that integrates and manages heterogeneous media with different classification systems while leaving the existing physical classification system as it is. The results also show that classification accuracy varies greatly with the heterogeneity of the learning data; this is expected to spur further studies that enhance the performance of the proposed methodology by analyzing the characteristics of each category. In addition, with growing demand for searching, collecting, and analyzing documents from diverse media, Internet search is no longer restricted to one medium. However, since each medium has a different categorical structure and naming, it is very difficult to search for a specific category across heterogeneous media. The proposed methodology is also significant in that, when users select a desired site, it lets them query all documents according to that site's category classification standards while leaving each existing site's characteristics and structure as they are. The proposed methodology needs to be further complemented in the following aspects. First, since only an indirect comparison and evaluation of its performance was made, future studies need to test its accuracy more directly. That is, after documents from the target source are re-classified on the basis of the categorical system of the existing source, the accuracy of the classification needs to be verified through evaluation by actual users. In addition, classification accuracy needs to be increased by making the methodology more sophisticated. Furthermore, understanding the characteristics of the categories in which heterogeneous semi-supervised learning showed higher classification accuracy than supervised learning may help in obtaining heterogeneous documents from diverse media and in finding ways to use them to enhance document classification accuracy.

Formative Principles of Modernist Architectural Modes and their Application to Modern Fashion Design (모더니즘 건축양식의 조형원리와 현대패션디자인에의 적용)

  • Lee, Shin-Young;Suh, Seung-Hee
    • Journal of the Korean Society of Costume / v.60 no.1 / pp.117-134 / 2010
  • Multilateral attempts to apply schools of artistic thought and expression to the present are making modern fashion a venue for diverse artistic expression. In fashion, the historical artistic ideas that dominated a period are not merely sources of inspiration that emerge, culminate, and disappear; they are formative principles that can re-emerge as the needs of the time require. In other words, past styles are the source of inspiration for new trends of the present and will serve as the text that gives birth to yet another trend. In this study, I examined the formative principles of Modernist architecture in terms of pure visibility and formative characteristics, and I analyzed cases of modern fashion that apply architectural formative principles from the viewpoint of formal construction. The results show that the formative principles of Modernist architecture are characterized by diagrammatic linearity, geometric planarity, and exclusive closure. When applied to fashion, these principles tend toward a lack of historical types, abstract plane partition, and reductionist purity. The significance of this study is as follows: the characteristics of a school of artistic thought are given concrete shape by applying the architectural formative principles of a historical school of thought to modern fashion and analyzing them from the viewpoint of formal construction, and the potential for applying historical schools of artistic thought as a source of inspiration for fashion design is extended.

A Study on the Meaning of "Pi(脾) is the source of the phlegm and lung is the container of the phlegm." ("비위생담지원(脾爲生痰之源), 폐위저담지기(肺爲貯痰之器)."의 의미에 대한 고찰)

  • Yun, Ki-ryoung;Baik, Yousang;Jang, Woo-chang;Jeong, Chang-hyun
    • Journal of Korean Medical classics / v.31 no.3 / pp.109-122 / 2018
  • Objectives: The teaching "Pi is the source of the phlegm and lung is the container of the phlegm" is regarded as resting on an understanding of the production and storage of phlegm grounded in the physiology and pathology of the viscera and bowels. However, the authors suspected that this sentence had not been researched enough for its meaning to be truly understood, which led to this further study. Methods: A medical book database was searched and historical medical books were reviewed in order to understand the true meaning of this sentence. First, the meaning of the sentence was considered based on how it was introduced in the original text, and the relations between the two parts of the sentence were closely analyzed in order to obtain a clear meaning. Results: The source of this sentence is Bencaogangmu, and it describes the phenomenon of cough in the phlegm appearing more than that from pi and lungs. Later, some disagreements on this sentence developed, claiming that kidney is the source of phlegm whereas stomach is the container. Pi deficiency derives from abnormality in the transportation and transformation of pi, and it originates from kidney deficiency. Thus, kidney can be understood as the origin of phlegm. When phlegm is dispersed all around the body, it is difficult to see the stomach as a container of the phlegm. Conclusions: The pathology of the production and storage of phlegm is that deficiency in kidney qi leads to the malfunction of the transportation and transformation of pi, which causes the bodily fluid to become stagnant, producing pathological products such as dampness, phlegm, and retained fluid. This can be expressed as "Kidney is the origin of the phlegm, and pi is the source of the phlegm." Here, phlegm is created and stored either when phlegm enters the lungs in the process of pi dissipating into the lungs, or when pi affects the lungs, which inhibits the pi movement in the lungs. This is the true meaning of "lung is the container of the phlegm."

Implementation of Git's Commit Message Classification Model Using GPT-Linked Source Change Data

  • Ji-Hoon Choi;Jae-Woong Kim;Seong-Hyun Park
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.123-132 / 2023
  • Git's commit messages record the history of source changes during a project's development or operation. Using this historical data, project risks and project status can be identified, thereby reducing costs and improving time efficiency. Much related research is in progress, including studies that classify commit messages by type of software maintenance. Among published studies, the maximum classification accuracy is reported to be 95%. In this paper, we set out to use a commit classification model in practical solutions, and conducted research to remove the limitation that the most accurate model among existing studies can only be applied to programs written in Java. To this end, we designed and implemented an additional step that standardizes source change data into natural language using GPT. This paper describes the process of extracting commit messages and source change data from Git, standardizing the source change data with GPT, and training with the DistilBERT model. In verification, an accuracy of 91% was measured. The proposed model was implemented and verified to be accurate and able to classify without depending on a specific programming language. In the future, we plan to study a classification model using Bard and a project management tool model based on the proposed classification model.

Global Flood Alert System (GFAS)

  • Umeda, Kazuo
    • Proceedings of the Korea Water Resources Association Conference / 2006.05a / pp.28-35 / 2006
  • Global Flood Alert System (GFAS) is an attempt to make the best use of satellite rainfall data in flood forecasting. The GFAS project is promoted by both the Ministry of Land, Infrastructure and Transport-Japan (MLIT) and the Japan Aerospace Exploration Agency (JAXA), under which the Infrastructure Development Institute-Japan (IDI) has been developing an Internet-based information system and launched a trial run of GFAS in April 2006 on the International Flood Network (IFNet) website. The function of GFAS is to connect space agencies with the hydrological services and river authorities in charge of flood forecasting and warning by providing global rainfall information in maps, text data e-mails, and so on, produced from binary global rainfall data downloaded from the National Aeronautics and Space Administration (NASA) website. Although the effectiveness of satellite rainfall data in flood forecasting and warning has yet to be verified, satellite rainfall is expected to play an important role in strengthening existing flood forecasting systems by diversifying hydrological data sources.

Classifying Temporal Topics with Similar Patterns on Twitter

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.9 no.3 / pp.295-300 / 2011
  • Twitter is a popular microblogging service that enables users to send and read short text messages. These messages are becoming a source for analyzing topic trends and identifying relations among temporal topics. In this paper, we propose a method that treats the classification of temporal topics on Twitter as a problem of grouping similar patterns. To provide a starting point for classification under the same topics, we identify a content word weighting scheme based on Latent Dirichlet Allocation (LDA). We then formulate how the temporal topics in a time window can be classified as peaky topics, constant topics, or periodic topics. We present several real case studies that show the validity of the proposed method. Evaluations show that the proposed method is useful as a classification model in the analysis of temporal topics.

A Topic Modeling Approach to Marketing Strategies for Smartphone Companies (소셜미디어 토픽모델링을 통한 스마트폰 마케팅 전략 수립 지원)

  • Cha, Yoon-Jeong;Lee, Jee-Hye;Choi, Jee-Eun;Kim, Hee-Woong
    • Knowledge Management Research / v.16 no.4 / pp.69-87 / 2015
  • Given the huge amount of data produced by its users, SNS is a great source of customer insights. Since viral trends in SNS reflect customers' direct feedback, companies can draw highly meaningful business insights when such data is effectively analyzed and managed. However, while the importance of understanding SNS big data keeps growing, methods for analyzing atypical data such as SNS postings for business insights about products have not been well studied. This study aims to demonstrate how topic modeling can be used to support marketing strategy generation and thereby leverage business processes. First, we conducted a topic modeling analysis of Twitter data on Apple and Samsung smartphones. Then we comparatively examined the analysis results to draw meaningful market insights about each smartphone product. Finally, we drew up a strategic marketing recommendation for each smartphone brand based on the findings.

Multilingual Automatic Translation Based on UNL: A Case Study for the Vietnamese Language

  • Thuyen, Phan Thi Le;Hung, Vo Trung
    • IEIE Transactions on Smart Processing and Computing / v.5 no.2 / pp.77-84 / 2016
  • In the field of natural language processing, Universal Networking Language (UNL) has been used by various researchers as an inter-lingual approach to automatic machine translation. The UNL system consists of two main components: EnConverter, which converts text from a source language to UNL, and DeConverter, which converts from UNL to a target language. Currently, many projects are researching how to apply UNL to different languages. In this paper, we introduce the tools that are UNL's applications and discuss how to reuse them to encode a Vietnamese sentence into UNL expressions and decode UNL expressions into a Vietnamese sentence. Testing was done with about 1,000 Vietnamese sentences (using a dictionary that includes 4,573 entries and 3,161 rules). In addition, we compare the proportion of sentences translated by a direct method (Google Translator) with that of a method based on UNL.