• Title/Summary/Keyword: Text data

XSLT document editing for XML document conversion in WYSIWYG environment (XML 문서 변환을 위한 XSLT 문서편집 시스템에 관한 연구)

  • 김창수;박주상;이용준;김진수;정희경
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2004.05b / pp.212-216 / 2004
  • XML (eXtensible Markup Language), the W3C (World Wide Web Consortium) standard used as a core technology for data exchange on today's Internet, is a platform-independent data format. In particular, it allows data to be processed quickly by integrating the heterogeneous data formats that used to be exchanged between the applications and systems built within an enterprise. However, because XML documents carry only logical structure information, the W3C announced XSLT (eXtensible Stylesheet Language Transformations), a document transformation standard, for describing presentation information for XML documents. XSLT is designed for XML, which was developed for data exchange on the Internet, and it is intended to process XML documents and convert them into other formats so that they can be presented to users. This paper designs and implements an XSLT document editing system that transforms XML into HTML by applying XSLT to XML documents. With the system, XSLT documents describing the presentation information of XML documents can be edited in a WYSIWYG environment.
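
To make the XSLT-to-HTML transformation concrete, here is a minimal sketch in Python using lxml; the XML and XSLT snippets are illustrative placeholders, not the documents or editor produced by the paper's system.

```python
# A minimal sketch (not the paper's system): applying an XSLT stylesheet to an
# XML document to produce HTML, using the lxml library.
from lxml import etree

xml_doc = etree.fromstring("""<books>
  <book><title>XML Basics</title><year>2004</year></book>
</books>""")

xslt_doc = etree.fromstring("""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body>
      <ul>
        <xsl:for-each select="books/book">
          <li><xsl:value-of select="title"/> (<xsl:value-of select="year"/>)</li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt_doc)      # compile the stylesheet
html_result = transform(xml_doc)      # apply it to the XML document
print(etree.tostring(html_result, pretty_print=True).decode())
```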

A study on the User Experience at Unmanned Checkout Counter Using Big Data Analysis (빅데이터를 활용한 편의점 간편식에 대한 의미 분석)

  • Kim, Ae-sook;Ryu, Gi-hwan;Jung, Ju-hee;Kim, Hee-young
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.375-380 / 2022
  • The purpose of this study is to find out consumers' perception of convenience store convenience food and the meanings they attach to it by using big data. For this study, news, knowledge-Q&A posts, blogs, cafes, tips, and web documents from NAVER and Daum were analyzed, and 'convenience store convenience food' was used as the keyword for data search. The data analysis period was set to the three years from January 1, 2019 to December 31, 2021. For data collection and analysis, frequency and matrix data were extracted using TEXTOM, and network analysis and visualization analysis were conducted using the NetDraw function of the UCINET 6 program. As a result, convenience store convenience foods were clustered into health, diversity, convenience, and economy according to consumers' selection attributes. The findings are expected to serve as a basis for developing new convenience food menus that pursue convenience, reflecting the meanings consumers attach to convenience store convenience foods such as appropriate prices, discount coupons, and events.
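
As an illustration of the keyword co-occurrence network step, the sketch below builds a small network with Python and networkx rather than the TEXTOM and UCINET 6 (NetDraw) tools used in the study; the documents and keywords are placeholders.

```python
# A minimal sketch of keyword co-occurrence network analysis with networkx.
from collections import Counter
from itertools import combinations
import networkx as nx

docs = [
    ["convenience store", "convenience food", "price", "discount"],
    ["convenience food", "health", "diversity"],
    ["convenience store", "event", "discount", "convenience food"],
]

# Count keyword pairs that appear together in the same document.
pair_counts = Counter()
for keywords in docs:
    for a, b in combinations(sorted(set(keywords)), 2):
        pair_counts[(a, b)] += 1

# Build a weighted co-occurrence network.
G = nx.Graph()
for (a, b), w in pair_counts.items():
    G.add_edge(a, b, weight=w)

# Degree centrality hints at which keywords anchor the clusters
# (health, diversity, convenience, economy in the paper's results).
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1]))
```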

Understanding the Categories and Characteristics of Depressive Moods in Chatbot Data (챗봇 데이터에 나타난 우울 담론의 범주와 특성의 이해)

  • Chin, HyoJin;Jung, Chani;Baek, Gumhee;Cha, Chiyoung;Choi, Jeonghoi;Cha, Meeyoung
    • KIPS Transactions on Software and Data Engineering / v.11 no.9 / pp.381-390 / 2022
  • Influenced by a culture that prefers non-face-to-face activity during the COVID-19 pandemic, chatbot usage is accelerating. Chatbots have been used for various purposes, not only for customer service in businesses and social conversations for fun but also for mental health. Chatbots are a platform where users can easily talk about their depressed moods because anonymity is guaranteed. However, most relevant research has focused on social media data, especially Twitter data, and few studies have analyzed data from commercially deployed chatbots. In this study, we identified the characteristics of depressive discourse in user-chatbot interaction data by analyzing chats containing the word 'depress' with a topic modeling algorithm and text-mining techniques. Moreover, we compared these characteristics with those of the depressive moods in Twitter data. Finally, we derive several design guidelines and suggest avenues for future research based on the study findings.
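
A minimal sketch of the topic-modeling step, assuming scikit-learn's LDA and a handful of placeholder chat messages containing the word 'depress'; it is not the authors' pipeline.

```python
# Filter chat messages containing 'depress' and fit LDA to discover topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

chats = [
    "I feel so depressed after work today",
    "being depressed makes it hard to sleep",
    "my depression gets worse when I am alone",
    "talked to the chatbot because I felt depressed again",
]
depress_chats = [c for c in chats if "depress" in c.lower()]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(depress_chats)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words of each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```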

A Study on Comparison of Response Time using Open API of Daishin Securities Co. and eBest Investment and Securities Co.

  • Ryu, Gui Yeol
    • International journal of advanced smart convergence / v.11 no.1 / pp.11-18 / 2022
  • Securities and investment services generate and use large volumes of data, and investors have started to invest based on their own analysis methods. There are 22 major securities and investment companies in Korea, and only 6 of them support an open API. Python is effective for requesting, receiving, and analyzing text data from open APIs. Daishin Securities Co. provides the only open API that officially supports Python, and eBest Investment & Securities Co. supports Python unofficially. There are two important differences between CYBOS Plus of Daishin Securities Co. and xingAPI of eBest Investment & Securities Co. First, users must log in to CYBOS Plus to access the Daishin Securities server, so the Python program itself does not require a logon; to receive data through xingAPI, however, users must log on within the individual Python program. Second, CYBOS Plus receives data in a request/reply manner, whereas xingAPI receives data through events. These differences can be expected to show up in response time, which matters to users of open APIs. Data were measured from August 5, 2021 to February 3, 2022. For each measurement session, 15 repeated measurements were taken, yielding 420 measurements in total. To increase the accuracy of the study, both APIs were measured alternately under the same conditions. A paired t-test was performed for the null hypothesis that there is no difference in means. The p-value is 0.2961, so we do not reject the null hypothesis and conclude that there is no significant difference between the means. The boxplot shows that the distribution of eBest response times is more spread out than that of CYBOS, with a slightly lower center. CYBOS Plus places no restrictions on Python programming, but xingAPI has some limits because it supports Python only indirectly; for example, there is a limit on receiving more than one current price.
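
The paired t-test on response times can be reproduced in outline with scipy; the arrays below are simulated placeholders rather than the measured data (the study reports 420 paired measurements and p = 0.2961).

```python
# A minimal sketch of the paired t-test on response times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cybos_ms = rng.normal(loc=120, scale=15, size=420)  # CYBOS Plus response times (ms), simulated
xing_ms = rng.normal(loc=122, scale=25, size=420)   # xingAPI response times (ms), simulated

t_stat, p_value = stats.ttest_rel(cybos_ms, xing_ms)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value > 0.05:
    print("Do not reject H0: no significant difference between mean response times")
```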

Development of System for Enhancing the Quality of Power Generation Facilities Failure History Data Based on Explainable AI (XAI) (XAI 기반 발전설비 고장 기록 데이터 품질 향상 시스템 개발)

  • Kim Yu Rim;Park Jeong In;Park Dong Hyun;Kang Sung Woo
    • Journal of Korean Society for Quality Management / v.52 no.3 / pp.479-493 / 2024
  • Purpose: The quality of failure history data deteriorates because workers at power plants interpret failures differently and record them inconsistently, which negatively impacts the efficient operation of power plants. The purpose of this study is to propose a system that consistently classifies power generation facility failures based on the failure history text data created by the workers. Methods: This study uses data collected from three coal unloaders operated by Korea Midland Power Co., LTD from 2012 to 2023. Failures are classified based on the results of soft voting, which combines the prediction probabilities obtained by applying the predict_proba technique to four machine learning models (Random Forest, Logistic Regression, XGBoost, and SVM) with scores obtained by constructing word dictionaries for each failure type using LIME, one of the XAI (Explainable Artificial Intelligence) methods. On this basis, a failure classification system is proposed to improve the quality of power generation facility failure history data. Results: The results of this study are as follows. When the failure classification system was applied to the failure history data of the Continuous Ship Unloader, XGBoost showed the best performance with a Macro_F1 Score of 93%. With the proposed system applied, the Macro_F1 Score of Logistic Regression increased by up to 0.17 compared to applying the model alone. All four models, when the system was applied, showed Accuracy and Macro_F1 Scores equal to or higher than those of the single models. Conclusion: This study proposes a failure classification system for power generation facilities to improve the quality of failure history data. This will contribute to cost reduction and to the stability of power generation facilities, as well as to further improvement of power plant operation efficiency and stability.
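
A minimal sketch of the soft-voting step, assuming scikit-learn and XGBoost implementations of the four classifiers; the failure texts and labels are hypothetical, and the LIME-based word-dictionary scores are only indicated as a placeholder.

```python
# Average predict_proba outputs from four classifiers (soft voting).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hypothetical failure-history texts and failure-type labels.
texts = [
    "belt conveyor stopped suddenly", "conveyor belt torn near pulley",
    "belt slip detected on conveyor", "conveyor roller bearing noise",
    "belt misalignment on unloader conveyor",
    "motor overheated during operation", "motor vibration alarm triggered",
    "motor winding insulation fault", "motor bearing temperature high",
    "motor tripped on overcurrent",
]
labels = [0] * 5 + [1] * 5        # 0: conveyor failure, 1: motor failure

X = TfidfVectorizer().fit_transform(texts)

models = [
    RandomForestClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
    XGBClassifier(eval_metric="logloss"),
    SVC(probability=True),        # probability=True enables predict_proba
]
probas = [m.fit(X, labels).predict_proba(X) for m in models]

soft_vote = np.mean(probas, axis=0)   # average the class probabilities
# The paper additionally adds LIME-based word-dictionary scores at this point.
pred = soft_vote.argmax(axis=1)
print(pred)
```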

Automatic Clustering of Same-Name Authors Using Full-text of Articles (논문 원문을 이용한 동명 저자 자동 군집화)

  • Kang, In-Su;Jung, Han-Min;Lee, Seung-Woo;Kim, Pyung;Goo, Hee-Kwan;Lee, Mi-Kyung;Goo, Nam-Ang;Sung, Won-Kyung
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.652-656 / 2006
  • Bibliographic information retrieval systems require bibliographic data such as authors, organizations, and sources of publication to be uniquely identified with keys. In particular, when authors are represented simply by their names, users bear the burden of manually discriminating between different authors with the same name. Previous approaches to the same-name author problem rely on bibliographic data such as co-author information, titles of articles, etc. However, these methods cannot handle single-author articles or articles whose titles share no common terms. To complement the previous methods, this study introduces a classification-based approach that uses similarity between the full text of articles. Experiments on recent domestic proceedings showed that the proposed method has the potential to supplement the previous metadata-based approaches.
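
As a rough illustration of full-text similarity for same-name author disambiguation (the paper itself takes a classification-based approach), the sketch below clusters placeholder article texts with TF-IDF vectors and cosine-distance agglomerative clustering.

```python
# Cluster articles attributed to one author name by full-text similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

articles_by_kim = [
    "information retrieval query expansion relevance feedback",
    "semantic web ontology reasoning description logic",
    "document ranking retrieval evaluation test collection",
    "ontology alignment linked data knowledge graph",
]

X = TfidfVectorizer().fit_transform(articles_by_kim).toarray()

clustering = AgglomerativeClustering(
    n_clusters=2, metric="cosine", linkage="average"
)
labels = clustering.fit_predict(X)
print(labels)  # articles with the same label are attributed to the same person
```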

The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.12-20 / 1999
  • In this paper, we compared the speaker verification performance of speech data collected in a clean environment and in a telephone channel environment. To improve speaker verification performance on channel speech, we studied feature parameters and preprocessing methods that are effective in the channel environment. The speech database for the experiments consists of pairs of Korean digits, designed with a text-prompted system in mind. Speech features including LPCC (Linear Predictive Cepstral Coefficient), MFCC (Mel Frequency Cepstral Coefficient), PLP (Perceptual Linear Prediction), and LSP (Line Spectrum Pair) are analyzed. Filtering in the preprocessing stage to remove channel noise is also studied. To remove or compensate for the channel effect on the extracted features, cepstral weighting, CMS (Cepstral Mean Subtraction), and RASTA (RelAtive SpecTrAl) processing are applied. By presenting the speech recognition performance for each feature and processing method, we compared speech recognition performance with speaker verification performance. HTK (HMM Tool Kit) 2.0 is used to evaluate the speech features and processing methods. Applying different thresholds for male and female speakers, we compare the EER (Equal Error Rate) on clean speech data and channel data. Our simulation results show that removing low-band and high-band channel noise by applying a band-pass filter (150-3800 Hz) in the preprocessing stage and extracting MFCCs from the filtered speech achieves the best speaker verification performance in terms of EER.
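
A minimal sketch of the front end described above, band-pass filtering followed by MFCC extraction and CMS, using scipy and librosa instead of HTK 2.0; the input file name is a placeholder.

```python
# Band-pass filter, extract MFCCs, and apply cepstral mean subtraction.
import numpy as np
import librosa
from scipy.signal import butter, sosfilt

y, sr = librosa.load("speech.wav", sr=8000)     # telephone-band sampling rate

# Band-pass filter roughly matching the 150-3800 Hz range used in the paper.
sos = butter(4, [150, 3800], btype="bandpass", fs=sr, output="sos")
y_filt = sosfilt(sos, y)

# MFCC features (frames along the second axis in librosa).
mfcc = librosa.feature.mfcc(y=y_filt, sr=sr, n_mfcc=13)

# Cepstral mean subtraction: remove the per-utterance mean of each coefficient,
# which compensates for convolutional (channel) distortion.
mfcc_cms = mfcc - mfcc.mean(axis=1, keepdims=True)
```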

Citizen Sentiment Analysis of the Social Disaster by Using Opinion Mining (오피니언 마이닝 기법을 이용한 사회적 재난의 시민 감성도 분석)

  • Seo, Min Song;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.37-46 / 2017
  • Recently, disasters caused by social factors have been occurring frequently in Korea. It is difficult to predict what crisis might happen, which raises citizens' concern. In this study, we developed a program that acquires tweet data on social disasters such as 'non-specific motive crimes' and 'Oxy' products using the Python-based Tweepy library. After natural language processing, these data were used to evaluate the psychological trauma and anxiety of citizens through text clustering analysis and opinion mining analysis in the R Studio program. In the analysis of the 'Oxy' case, the Sewol ferry accident and the continued sale of Oxy products showed the highest similarity; in the 'non-specific motive crime' case, the government's coping measures against unexpected incidents such as the screen-door accident, the Sewol ferry accident, and the non-specific motive crime in Busan driven by misogyny showed the highest similarity. In addition, the average citizen sentiment score for non-specific motive crimes was 11.61%p more negative than for the Oxy case. The findings are therefore expected to be used to predict the mental health of citizens and help prevent future incidents.
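
A minimal sketch of dictionary-based sentiment scoring and per-topic averaging in Python; the lexicon and tweets are placeholders, and the study itself performed the analysis in R Studio on Tweepy-collected data.

```python
# Score tweets against a small sentiment lexicon and average per topic.
POSITIVE = {"safe", "relief", "hope", "support"}
NEGATIVE = {"fear", "anxious", "angry", "tragedy", "unsafe"}

def sentiment_score(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets_by_topic = {
    "oxy": ["angry about the continued sale", "hope for support of victims"],
    "nonspecific_motive_crime": ["so anxious in public places", "fear after the incident"],
}

for topic, tweets in tweets_by_topic.items():
    avg = sum(sentiment_score(t) for t in tweets) / len(tweets)
    print(topic, avg)  # more negative averages indicate greater citizen anxiety
```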

Hybrid Method using Frame Selection and Weighting Model Rank to improve Performance of Real-time Text-Independent Speaker Recognition System based on GMM (GMM 기반 실시간 문맥독립화자식별시스템의 성능향상을 위한 프레임선택 및 가중치를 이용한 Hybrid 방법)

  • 김민정;석수영;김광수;정호열;정현열
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.512-522 / 2002
  • In this paper, we propose a hybrid method that combines frame selection with a weighting-model-rank method, based on GMMs (Gaussian mixture models), for a real-time text-independent speaker recognition system. In the system, maximum likelihood estimation is used for GMM parameter optimization, and maximum likelihood is used for recognition. The proposed hybrid method has two steps. First, the likelihood score of each speaker model is calculated for the test data at the frame level, and the difference between the largest and the second-largest likelihood values is computed; a frame is selected if this difference is larger than a threshold. Second, instead of the calculated likelihood, a rank-based weighting value is used to compute the total score over the selected frames. Cepstral coefficients and regression coefficients were used as feature parameters, and the database for testing and training consists of data collected at different times, with the test data selected randomly. In the experiments, we applied each method to the baseline system and evaluated it. In the speaker recognition experiments, the proposed hybrid method achieved an average of 4% higher recognition accuracy than the frame selection method and 1% higher than the weighting (W) method, demonstrating its effectiveness.
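
A minimal sketch of the hybrid idea, assuming scikit-learn GMMs and synthetic features: per-frame log-likelihoods, frame selection by the top-1/top-2 gap, and rank-based weighting over the selected frames; the threshold and model sizes are illustrative, not the paper's settings.

```python
# Frame selection and weighting-model-rank scoring over per-speaker GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Pretend training features (e.g., cepstral + regression coefficients) per speaker.
train = {s: rng.normal(loc=s, scale=1.0, size=(200, 12)) for s in range(3)}
gmms = {s: GaussianMixture(n_components=4, random_state=0).fit(X) for s, X in train.items()}

test = rng.normal(loc=1, scale=1.0, size=(50, 12))   # frames of one test utterance

# Per-frame log-likelihood for each speaker model: shape (n_speakers, n_frames).
ll = np.stack([gmms[s].score_samples(test) for s in sorted(gmms)])

# Frame selection: keep frames where the gap between the best and the
# second-best model log-likelihood exceeds a threshold.
sorted_ll = np.sort(ll, axis=0)
gap = sorted_ll[-1] - sorted_ll[-2]
selected = gap > 0.5                                 # hypothetical threshold

# Weighting model rank: on selected frames, accumulate rank-based weights
# instead of the raw likelihoods (the best model gets the largest weight).
ranks = ll.argsort(axis=0).argsort(axis=0)           # 0 = worst, n-1 = best
scores = ranks[:, selected].sum(axis=1)
print("identified speaker:", scores.argmax())
```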

Hypermedia, Multimedia and Hypertext: Definitions and Overview (하이퍼미디어.멀티미디어.하이퍼텍스트: 정의(定義)와 개관(槪觀))

  • Kim, Ji-Hee
    • Journal of Information Management / v.25 no.1 / pp.24-46 / 1994
  • In this paper I will discuss definitions of hypermedia, multimedia and hypertext. Hypertext is the grouping of relevant information in the form of nodes. These nodes are then connected together through links. In the case of hypertext the nodes contain text or graphics. Multimedia is the combining of different media types, for example sound, animation, text, graphics and video, for the presentation of information by making use of computers. Hypermedia can be viewed as an extension of hypertext and multimedia. It is based on the concept of hypertext, which uses nodes and links to structure information in the system; in this case the nodes consist of the different data types mentioned in the multimedia definition above. The 'node-and-link' concept is used in the organisation of information in hypermedia systems, and the 'book' metaphor is an example of the way these systems are implemented. This concept is explained, and a few advantages and disadvantages of making use of hypermedia systems are discussed. A new approach to the development of hypermedia systems, namely the knowledge-based approach, is then examined. Joel Peing-Ling Loo proposed this approach because he considered it the most effective way of handling this kind of technology. In this approach a semantic-based hypermedia model is developed to formulate solutions for the restrictions in information presentation, authoring, maintenance and retrieval. The knowledge-based presentation of information includes the use of conventional data structures. These data structures make use of frames (objects), slots and the inheritance theory that is also used in expert systems. Relations develop between the different objects as they are included in the database, and relations can also exist between frames by means of attributes that belong to the frames.
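
A minimal sketch of the 'node-and-link' concept in Python; the Node class and its fields are illustrative, not a model taken from the paper.

```python
# Nodes hold content of some media type; links connect them into a hypertext.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    media_type: str          # e.g. "text", "graphics", "sound", "video"
    content: str
    links: list = field(default_factory=list)   # outgoing links to other nodes

    def link_to(self, other: "Node", label: str) -> None:
        self.links.append((label, other))

intro = Node("Introduction", "text", "Hypertext groups information into nodes...")
diagram = Node("Node-link diagram", "graphics", "diagram.png")
intro.link_to(diagram, "see figure")

# Following a link, as a hypertext browser would.
label, target = intro.links[0]
print(f"{intro.name} --[{label}]--> {target.name}")
```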
