• Title/Summary/Keyword: Intelligent Data Analysis

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.141-166 / 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, called history mining, reveals influential relationships between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism and collected their writings and related articles accessible on the internet. The performance of the classification algorithms was measured by recall, precision, F-score, and elapsed time. DNN, Random Forest, and Ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.
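
    As an illustration of the classification-and-evaluation step this abstract describes, here is a minimal Python sketch. The scikit-learn pipeline, TF-IDF features, and the toy two-sentence corpus are assumptions for illustration; the paper does not publish its code.

    ```python
    # Minimal sketch: classify philosopher writings as rationalism vs. empiricism
    # and report Recall, Precision, F-Score, and Elapsed Time, as in the paper.
    # Library and feature choices are illustrative, not the authors' pipeline.
    import time
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_recall_fscore_support

    # Stand-in corpus; the paper used writings and articles collected from the web.
    texts = [
        "Knowledge derives from sensory experience and observation.",   # empiricism
        "Innate ideas and pure reason ground all certain knowledge.",   # rationalism
    ] * 50
    labels = ["empiricism", "rationalism"] * 50

    X = TfidfVectorizer(max_features=5000).fit_transform(texts)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=42)

    clf = RandomForestClassifier(n_estimators=300, random_state=42)
    start = time.time()
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    elapsed = time.time() - start

    precision, recall, f_score, _ = precision_recall_fscore_support(
        y_te, pred, average="binary", pos_label="rationalism")
    print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f} elapsed={elapsed:.2f}s")
    ```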

A Study of Influencing Factors Upon Using C4I Systems: The Perspective of Mediating Variables in a Structured Model (C4I 시스템 사용의 영향 요인에 관한 연구: 구조모형의 매개변수의 관점에서)

  • Kim, Chong-Man; Kim, In-Jai
    • Asia Pacific Journal of Information Systems / v.19 no.2 / pp.73-94 / 2009
  • The general outlook for future warfare shows that the concept of firepower- and maneuver-centric warfare is being replaced by that of information- and knowledge-centric warfare. Accordingly, some developed countries are now establishing information systems to conduct intelligent warfare and innovate defense operations. C4I (Command, Control, Communication, Computers and Intelligence for the Warrior) systems make modern, systematic war operations possible. The basic idea of this study is to investigate how well TAM (Technology Acceptance Model) can explain acceptance behavior in military organizations. Because TAM is inadequate for explaining acceptance processes for complex technologies and strict organizations, a revised research model based on TAM was developed to assess usage of the C4I system. The purpose of this study is to investigate factors affecting C4I usage in the Korean Army. The research model extends TAM with a belief construct, self-efficacy, as one of the mediating variables; self-efficacy has been used as a mediating variable for technology acceptance in prior work and was therefore included. The external variables were selected on the basis of previous research and can be classified, following the TOE (Technology-Organization-Environment) framework, into 1) technological, 2) organizational, and 3) environmental factors. The technological factor includes information quality and task-technology fit; the organizational factor includes the influence of senior colleagues; the environmental factor includes education/training. External variables are considered very important for explaining usage patterns of information technology and systems. A structured questionnaire was developed and administered to users of the C4I system, and a total of 329 responses were used for the statistical analyses. Confirmatory factor analysis and structural equation modeling were the main statistical methods, and model fit indexes for the measurement and structural models were verified before all 18 hypotheses were tested. This study shows that perceived usefulness and self-efficacy played larger roles than perceived ease of use in TAM. In military organizations, perceived usefulness mediated between the external variables and the dependent variable, but perceived ease of use did not. These results imply that perceived usefulness explains the acceptance process better than perceived ease of use in the army. Self-efficacy, one of the three mediating variables, also showed mediating effects, indicating that it can be selected as a belief construct in TAM. Perceived usefulness was influenced by senior colleagues, information quality, and task-technology fit; self-efficacy was affected by education/training and task-technology fit. Actual C4I usage was influenced not by perceived ease of use but by perceived usefulness and self-efficacy. This study suggests the following: (1) an extended TAM can be applied to organizations as strict as the army; (2) three mediating variables were included in the research model and tested in a real setting; and (3) several other implications are discussed.
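
    As a rough illustration of the extended-TAM path structure in this abstract, the sketch below specifies the mediation paths with the semopy package on simulated data. The package choice, the composite (single-score) treatment of each construct, the variable names, and the simulated values are all assumptions; the actual study used 329 questionnaire responses with latent constructs and 18 hypotheses.

    ```python
    # Path-analysis sketch of the extended TAM: perceived usefulness (PU) and
    # self-efficacy (SE) mediate between external factors and C4I usage (USE).
    # IQ = information quality, TTF = task-technology fit, SENIOR = senior
    # colleagues, EDU = education/training -- names are illustrative.
    import numpy as np
    import pandas as pd
    import semopy

    rng = np.random.default_rng(0)
    n = 329  # sample size reported in the abstract
    data = pd.DataFrame(rng.normal(size=(n, 6)),
                        columns=["IQ", "TTF", "SENIOR", "EDU", "PU", "SE"])
    data["USE"] = 0.5 * data["PU"] + 0.3 * data["SE"] + rng.normal(size=n)

    # Mediation paths as described in the abstract (perceived ease of use
    # showed no significant effect in the study, so it is omitted here).
    model_desc = """
    PU ~ IQ + TTF + SENIOR
    SE ~ EDU + TTF
    USE ~ PU + SE
    """
    model = semopy.Model(model_desc)
    model.fit(data)
    print(model.inspect())           # path coefficients and p-values
    print(semopy.calc_stats(model))  # model fit indexes
    ```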

Investigating the Impact of Corporate Social Responsibility on Firm's Short- and Long-Term Performance with Online Text Analytics (온라인 텍스트 분석을 통해 추정한 기업의 사회적책임 성과가 기업의 단기적 장기적 성과에 미치는 영향 분석)

  • Lee, Heesung; Jin, Yunseon; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.13-31 / 2016
  • Despite expectations of short- or long-term positive effects of corporate social responsibility (CSR) on firm performance, the results of existing research into this relationship are inconsistent, partly due to a lack of clarity about subordinate CSR concepts. In this study, keywords related to CSR concepts are extracted from unstructured sources, such as newspapers, using text mining techniques to examine the relationship between CSR and firm performance. The analysis is based on data from the New York Times, a major news publication, and Google Scholar. We used text analytics to process unstructured data collected from open online documents and explored the effects of CSR on short- and long-term firm performance. The results suggest that the CSR index computed from online media with the proposed text analytics predicts long-term performance much better than short-term performance, without requiring any internal firm reports or CSR institute reports. Our study demonstrates that text analytics is useful for evaluating CSR performance in terms of convenience and cost effectiveness.
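
    As a minimal illustration of a keyword-based CSR index of the kind this abstract describes, the sketch below counts CSR-related terms in a firm's news coverage. The keyword lists, the counting scheme, and the normalization are illustrative assumptions, not the paper's actual dictionary or formula.

    ```python
    # Toy CSR index: share of CSR-related terms among all tokens in a firm's
    # news articles. The per-firm index would then be related to short- and
    # long-term performance measures, as in the study.
    import re
    from collections import Counter

    CSR_KEYWORDS = {  # illustrative seed words per CSR concept
        "environment": ["emission", "renewable", "sustainability"],
        "social": ["donation", "diversity", "community"],
        "governance": ["transparency", "audit", "ethics"],
    }

    def csr_index(articles: list[str]) -> float:
        """Share of CSR-related terms among all tokens in the articles."""
        tokens = [t for doc in articles for t in re.findall(r"[a-z]+", doc.lower())]
        counts = Counter(tokens)
        hits = sum(counts[w] for kws in CSR_KEYWORDS.values() for w in kws)
        return hits / max(len(tokens), 1)

    print(csr_index(["The firm expanded its renewable energy and community programs."]))
    ```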

The Improvement Plan for Indicator System of Personal Information Management Level Diagnosis in the Era of the 4th Industrial Revolution: Focusing on Application of Personal Information Protection Standards linked to specific IT technologies (제4차 산업시대의 개인정보 관리수준 진단지표체계 개선방안: 특정 IT기술연계 개인정보보호기준 적용을 중심으로)

  • Shin, Young-Jin
    • Journal of Convergence for Information Technology / v.11 no.12 / pp.1-13 / 2021
  • This study suggests ways to improve the indicator system used to diagnose the level of personal information management, in order to strengthen personal information protection. For this purpose, the components of the indicator system were derived from domestic and foreign literature, and the main diagnostic indicators were selected through FGI/Delphi analysis with personal information protection experts and a survey of personal information protection officers at public institutions. In this way, the study derived inspection standards that can be reflected as a separate index system for personal information protection, classified by the specific IT technologies of the 4th industrial revolution: big data, cloud, Internet of Things (IoT), and artificial intelligence (AI). As a result, two common indicators were selected: check items for applying the PbD (Privacy by Design) principle from the planning and design stage of each technology, and items for pseudonymous information processing and de-identification measures. The technology-specific checklists consisted of 2 items for big data, 5 items for cloud services, 5 items for IoT, and 4 items for AI. Accordingly, this study is expected to serve as an institutional device for responding to new technological changes and for the continuous development of the personal information management level diagnosis system.

The Improvement Plan for Personal Information Protection for Artificial Intelligence(AI) Service in South Korea (우리나라의 인공지능(AI)서비스를 위한 개인정보보호 개선방안)

  • Shin, Young-Jin
    • Journal of Convergence for Information Technology / v.11 no.3 / pp.20-33 / 2021
  • This study suggests improvements to personal information protection in South Korea, responding to the requirement for safe processing and protection of personal information. Based on data collection and analysis through literature research, the study derived the personal information issues and suitable standards for major artificial intelligence services. In addition, case studies were reviewed, focusing on legal compliance and processing compliance for personal information protection in major countries, and improvement plans applicable to South Korea were suggested. The results show that legal compliance requires reorganizing related laws, assigning responsibility and compliance obligations for developing and providing AI, and operating risk management under personal information protection laws for AI services. In terms of processing compliance, first, in pre-processing and refining, it is necessary to standardize data set reference models, control data set quality, and voluntarily label AI applications; second, in the development and use of algorithms, it is necessary to establish and apply clear regulations for algorithms. As such, South Korea should apply these improvement tasks to protect personal information in safe AI services.

An Empirical Study on the Cryptocurrency Investment Methodology Combining Deep Learning and Short-term Trading Strategies (딥러닝과 단기매매전략을 결합한 암호화폐 투자 방법론 실증 연구)

  • Yumin Lee; Minhyuk Lee
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.377-396 / 2023
  • As the cryptocurrency market continues to grow, it has developed into a new financial market, and the need for research on cryptocurrency investment strategies is emerging. This study conducts an empirical analysis of a cryptocurrency investment methodology that combines short-term trading strategies and deep learning. Daily price data for Ethereum were collected through the API of Upbit, a Korean cryptocurrency exchange, and the investment performance of each experimental model was analyzed after finding optimal parameters on past data. The experimental models are a volatility breakout strategy (VBS), a Long Short-Term Memory (LSTM) model, a moving average cross strategy, and a combined model. VBS is a short-term trading strategy that buys when intraday volatility rises significantly and sells at the closing price of the same day. LSTM is well suited to time series data among deep learning models; the closing price predicted by the model was applied to a simple trading rule. The moving average cross strategy decides whether to buy or sell when the moving averages cross. The combined model is a trading rule built from the derived variables of the VBS and the LSTM model, joined with AND/OR logic in the buy conditions. The results show that the combined model delivers better investment performance than any single model. This study has academic significance in that it goes beyond simple deep learning-based cryptocurrency price prediction and improves investment performance by combining deep learning with short-term trading strategies, and practical significance in that it demonstrates applicability to actual investment.
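
    As an illustration of the trading rules and their AND/OR combination, here is a minimal Python sketch over daily OHLC data. The parameter values (k, moving-average windows), the column layout, and the exact combination logic are assumptions; the paper tuned optimal parameters on past Upbit data, and the LSTM's predicted close is stubbed as a precomputed column here.

    ```python
    # Sketch of the buy signals: volatility breakout (VBS), moving average
    # golden cross, an LSTM-predicted rise, and an AND/OR combination.
    import pandas as pd

    def buy_signals(df: pd.DataFrame, k: float = 0.5,
                    short: int = 5, long: int = 20) -> pd.DataFrame:
        """df needs columns: open, high, low, close, pred_close (LSTM output)."""
        prev_range = (df["high"] - df["low"]).shift(1)       # previous day's range
        vbs = df["high"] >= df["open"] + k * prev_range      # breakout above open + k*range
        ma_s = df["close"].rolling(short).mean()
        ma_l = df["close"].rolling(long).mean()
        ma_cross = (ma_s > ma_l) & (ma_s.shift(1) <= ma_l.shift(1))  # golden cross
        lstm = df["pred_close"] > df["close"]                # model predicts a rise
        combined = vbs & (ma_cross | lstm)                   # one possible AND/OR rule
        return pd.DataFrame({"vbs": vbs, "ma": ma_cross,
                             "lstm": lstm, "combined": combined})

    # Under VBS, a position opened on a breakout day is closed at that day's
    # closing price, so each signal maps to a one-day holding period.
    ```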

A Study on the Relationship between Social Media ESG Sentiment and Firm Performance (소셜미디어의 ESG 감성과 기업성과에 관한 연구)

  • Sujin Park; Sang-Yong Tom Lee
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.317-340 / 2023
  • In a business context, ESG is defined as the use of environmental, social, and governance factors to assess a firm's progress in terms of sustainability. Social media has enabled the public to actively share firms' good and bad deeds, increasing public interest in ESG management. This study therefore investigated the association of firm performance with sentiment towards each of the environmental, social, and governance activities, as well as with comprehensive ESG sentiment encompassing all three. We used panel regression models to examine the relationship between social media ESG sentiment and the Return on Assets (ROA) and Return on Equity (ROE) of 143 companies listed on the KOSPI 200. We collected data from 2018 to 2021, including sentiment data from a variety of social media channels, such as online communities, Instagram, blogs, Twitter, and news. The results indicated that firm performance is significantly related to both the individual ESG sentiments and comprehensive ESG sentiment. This study has several implications. By using data from various social media channels, it presents an unbiased view of public ESG sentiment rather than relying on ESG ratings, which may be influenced by rating agencies. Furthermore, the findings can help firms determine the direction of their ESG management. The study thus provides theoretical and practical insights for researchers and firms interested in ESG management.
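
    As an illustration of the panel regression this abstract describes, the sketch below fits a firm-fixed-effects model with the linearmodels package on simulated data. The package choice, variable names, and simulated values are assumptions; only the panel dimensions (143 firms, 2018-2021) come from the abstract.

    ```python
    # Panel regression sketch: ROA on comprehensive ESG sentiment with firm
    # fixed effects. Real inputs would be sentiment scores from social media
    # and accounting data; here both are simulated for a runnable example.
    import numpy as np
    import pandas as pd
    from linearmodels.panel import PanelOLS

    rng = np.random.default_rng(1)
    idx = pd.MultiIndex.from_product(
        [[f"firm{i}" for i in range(143)], range(2018, 2022)],
        names=["firm", "year"])  # 143 KOSPI 200 firms, 2018-2021
    df = pd.DataFrame({"esg_sent": rng.normal(size=len(idx))}, index=idx)
    df["roa"] = 0.2 * df["esg_sent"] + rng.normal(size=len(idx))

    res = PanelOLS.from_formula("roa ~ 1 + esg_sent + EntityEffects", data=df).fit()
    print(res.summary)  # an analogous model would be fitted for ROE
    ```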

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung; Kim, Mintae; Kim, Wooju; Shin, Dongwook; Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps: 1) collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for "subject-predicate" separated queries and classify the proper documents; 2) determine whether each sentence is suitable for extracting information and derive a confidence score; 3) based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the SK Telecom artificial intelligence speaker; compared with the baseline model, the proposed system shows higher performance. The contribution of this study is a sequence tagging model based on a bi-directional LSTM-CRF that uses the predicate feature of the query, yielding a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types; the proposed methodology proved to extract information effectively from various document types compared with the baseline model, whereas previous research suffers from poor performance when extracting information from document types that differ from the training data. In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer, providing a method by which precision can be maintained even in a real web environment. Because the information extraction problem for knowledge base expansion targets unstructured documents on the real web, there is no guarantee that a given document contains the correct answer; when question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents with no correct answer. The policy of predicting document and sentence suitability is meaningful in that it helps maintain extraction performance in this setting. The limitations of this study and future research directions are as follows. First, data preprocessing: the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and extraction can be performed improperly when morphological analysis fails, so an improved morphological analyzer is needed. Second, entity ambiguity: the information extraction system cannot distinguish different entities that share the same name; if several people with the same name appear in the news, the system may not extract information about the intended query, and future research needs measures to disambiguate persons with the same name. Third, evaluation query data: we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker and built an evaluation data set from 800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether a correct answer is included. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, which is a costly manual activity; future research should evaluate the system on more queries and develop a Korean benchmark data set for information extraction from multi-source web documents so that results can be assessed more objectively.
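
    As an illustration of the sequence tagging model named in this abstract, here is a minimal BiLSTM-CRF sketch in PyTorch with the pytorch-crf package, appending a per-token predicate feature to the word embeddings. The library choice and all dimensions are assumptions; the abstract specifies the architecture, not an implementation.

    ```python
    # BiLSTM-CRF tagger sketch: token embeddings are concatenated with features
    # derived from the query's predicate, encoded by a BiLSTM, and decoded by
    # a CRF layer into answer-span tags.
    import torch
    import torch.nn as nn
    from torchcrf import CRF  # pip install pytorch-crf

    class BiLSTMCRF(nn.Module):
        def __init__(self, vocab=10000, tags=5, emb=100, pred_feat=10, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb + pred_feat, hidden // 2,
                                bidirectional=True, batch_first=True)
            self.proj = nn.Linear(hidden, tags)
            self.crf = CRF(tags, batch_first=True)

        def _emissions(self, words, pred_feats):
            # pred_feats: per-token features derived from the query predicate
            x = torch.cat([self.emb(words), pred_feats], dim=-1)
            return self.proj(self.lstm(x)[0])

        def loss(self, words, pred_feats, labels, mask):
            # negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(self._emissions(words, pred_feats), labels, mask=mask)

        def decode(self, words, pred_feats, mask):
            # best-scoring tag sequence per sentence (Viterbi decoding)
            return self.crf.decode(self._emissions(words, pred_feats), mask=mask)
    ```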

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min; On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and accurately figure out customer preferences. In existing data-based survey methods, a sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the frequently used words in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. The existing approach also automatically finds important sentences (or phrases) carrying positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want a summary note of the positive and negative points in the 'car design' aspect, and likewise for other aspects such as 'car quality', 'car performance', and 'car service'. Such information enables customers to make good choices when purchasing new vehicles, and automobile makers can figure out the preferences and positive/negative points of new models on the market; the weak points of the models can then be improved based on the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications: (1) The main aspects of a product (e.g., car design, quality, performance, and service for a Hyundai Sonata) are not considered; as a result, the summary note reported to customers and car makers contains only the overall positive and negative ratios and the top-k sentences (or phrases) with the highest sentiment scores over the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) Since the same word has different meanings across domains, a sentiment lexicon proper to each domain needs to be constructed, and an efficient way to construct the lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product per aspect; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly, and by reinforcing topic semantics we can improve the accuracy of product reputation mining well beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
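
    As an illustration of the hidden-topic step of the proposed algorithm, the sketch below fits LDA to a toy review corpus and prints each topic's top words as candidate aspects. The scikit-learn implementation and the toy corpus are assumptions; the aspect mapping and sentiment scoring steps of the actual algorithm are omitted here.

    ```python
    # Latent topic extraction sketch: top words per LDA topic suggest aspect
    # labels such as design, performance, service, and quality.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    reviews = [
        "the design of the car looks sleek and modern",
        "engine performance is strong on the highway",
        "service center staff were slow and unhelpful",
        "interior quality feels cheap for the price",
    ] * 25  # stand-in corpus; the paper used reviews of the K5, SM5, and Avante

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(reviews)
    lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        top = [terms[i] for i in comp.argsort()[-5:][::-1]]
        print(f"topic {k}: {top}")  # top words serve as candidate aspect labels
    ```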

Evaluation of Incident Detection Algorithms focused on APID, DES, DELOS and McMaster (돌발상황 검지알고리즘의 실증적 평가 (APID, DES, DELOS, McMaster를 중심으로))

  • Nam, Doo-Hee; Baek, Seung-Kirl; Kim, Sang-Gu
    • Journal of Korean Society of Transportation / v.22 no.7 s.78 / pp.119-129 / 2004
  • This paper reports the results of the development and validation procedures for the Freeway Incident Management System (FIMS) prototype, developed as part of the Intelligent Transportation Systems research and development program. The core of the FIMS is the integration of its component parts into a modular yet integrated system for freeway management. The whole approach has been component-oriented, with a secondary emphasis on the traffic characteristics at the sites. The first action taken during development was the selection of the required data for each component within the existing infrastructure of the Korean freeway system. After thorough review and analysis of the vehicle detection data, the pilot site led to the use of different technologies matched to the specific needs and character of the implementation. This meant that the existing system was tested in different configurations at different sections of freeway, increasing the validity and scope of the overall findings. The incident detection module was evaluated according to predefined system validation specifications, which identified two data collection and analysis patterns: the on-line and off-line testing procedural frameworks. Off-line testing used asynchronous analysis, commonly in conjunction with simulation of device input data, to take full advantage of the opportunity to test and calibrate the incident detection algorithms, focusing on APID, DES, DELOS, and McMaster. The simulation was done with synchronous analysis, thereby providing a means for testing the incident detection module.
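
    As a rough illustration of comparative incident detection logic in the spirit of APID, the sketch below flags an incident when downstream occupancy drops sharply relative to upstream occupancy. The thresholds and single-test logic are simplified assumptions; the actual APID, DES, DELOS, and McMaster algorithms use richer, calibrated multi-test logic.

    ```python
    # Simplified comparative-occupancy check: an incident between two detector
    # stations typically raises upstream occupancy and starves the downstream
    # station, producing a large absolute and relative occupancy difference.
    def incident_flag(occ_up: float, occ_down: float,
                      t_abs: float = 0.25, t_rel: float = 0.5) -> bool:
        """occ_up/occ_down: station occupancies in [0, 1] for one time interval."""
        occdf = occ_up - occ_down                       # spatial occupancy difference
        occrdf = occdf / occ_up if occ_up > 0 else 0.0  # relative difference
        return occdf >= t_abs and occrdf >= t_rel

    # Off-line testing as described above would replay archived or simulated
    # detector data through such a check interval by interval.
    print(incident_flag(0.55, 0.10))  # queue upstream, light traffic downstream -> True
    ```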