• Title/Summary/Keyword: management information system


LIM Implementation Method for Planning Biotope Area Ratio in Apartment Complex - Focused on Terrain and Pavement Modeling - (공동주택단지의 생태면적률 계획을 위한 LIM 활용방법 - 지형 및 포장재 모델링을 중심으로 -)

  • Kim, Bok-Young;Son, Yong-Hoon;Lee, Soon-Ji
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.46 no.3
    • /
    • pp.14-26
    • /
    • 2018
  • The Biotope Area Ratio (BAR) is a quantitative pre-planning index for sustainable development and an integrated indicator for the balanced development of buildings and outdoor spaces. However, problems have been pointed out in its operation and management: errors in area calculation, inadequate underground soil condition and depth, reduction in biotope area after construction, and functional failure as a pre-planning index. To address these problems, this study proposes implementing LIM (Landscape Information Modeling). Since the weights of the BAR are mainly decided by the underground soil condition and depth together with land cover types, the study focused on the terrain and pavements. The model should conform to BIM guidelines and standards provided by government agencies and professional organizations. Thus, the scope and Level Of Detail (LOD) of the model were defined, and a method to build the model with BIM software was developed. An apartment complex on sloping ground was selected as a case study: a 3D terrain was modeled, paving libraries were created with property information on the BAR, and a LIM model was completed for the site. Then the BAR was calculated and construction documents were created with the BAR table and pavement details. The results showed that the application of the BAR criteria and the calculation became accurate, and that LIM improved the efficiency of design tasks. It also enabled evidence-based design of the terrain and underground structures. To adopt LIM, it is necessary to create and distribute LIM library manuals or templates, and to build library content that complies with KBIMS standards. Government policy should also require practitioners to submit BIM models in the certification system. Since the criteria on planting types in the BAR are expected to be expanded, further research is needed to build and utilize an information model for planting materials.
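
The BAR calculation referred to in this abstract is, in essence, a weighted area ratio: each surface type's area is multiplied by a weight, and the weighted sum is divided by the total site area. The following is a minimal Python sketch of that arithmetic. The surface names, areas, and weights are illustrative assumptions, not the values used in the paper or in the Korean certification criteria.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    name: str        # hypothetical surface type label
    area_m2: float   # plan area of the surface in square metres
    weight: float    # hypothetical BAR weight between 0 and 1


def biotope_area_ratio(surfaces, site_area_m2):
    """Return the BAR in percent: 100 * sum(area * weight) / site area."""
    if site_area_m2 <= 0:
        raise ValueError("site area must be positive")
    weighted_area = sum(s.area_m2 * s.weight for s in surfaces)
    return 100.0 * weighted_area / site_area_m2


# Illustrative site of 1,000 m2 (all numbers are made up for the example).
surfaces = [
    Surface("planting on natural ground", 400.0, 1.0),
    Surface("planting over underground structure", 300.0, 0.5),
    Surface("permeable pavement", 200.0, 0.25),
    Surface("impermeable pavement", 100.0, 0.0),
]
bar = biotope_area_ratio(surfaces, 1000.0)
print(f"BAR = {bar:.1f}%")  # 400 + 150 + 50 + 0 = 600 weighted m2, so 60.0%
```

In a model-based workflow such as the one the abstract describes, the areas and weights would presumably come from the properties stored in the terrain and paving objects rather than from a hand-written list.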

A Study on Significance Testing of Driver's Visual Behavior due to the VMS Message Display Forms on the Road (도로상 VMS 표출방식별 운전자 유의성 검증에 관한 연구)

  • Kum, Ki-Jung;Son, Young-Tae;Bae, Deok-Mo;Son, Seung-Neo
    • International Journal of Highway Engineering
    • /
    • v.7 no.4 s.26
    • /
    • pp.151-162
    • /
    • 2005
  • Variable Message Signs (VMS), which give drivers direct information about traffic congestion and help prevent accidents, are the most effective means of providing information in an Advanced Transportation Management System. VMS are currently installed and operated on the basis of the Guidelines on the Use of Variable Message Signs issued by the Ministry of Construction & Transportation in November 1999. Among general installation standards, these guidelines deal mainly with physical aspects such as message variables (font, character size, line spacing, and word spacing) and sign position. However, display forms are in practice used without verification of their effect, and the stationary form predominates. To support more practical and effective information delivery from the driver's viewpoint, this paper carries out significance tests to verify the effect of VMS display forms. For this purpose, a 3D simulation was developed, characteristics of drivers' visual cognition behavior (conspicuity, legibility, and comprehensibility) were selected, and each was evaluated under different conditions (day or night, 80 km/h or 100 km/h). In particular, an Eye Marker Recorder was used to measure reading time (legibility), which ensured objectivity and reduced observational error. The results showed that conspicuity ranked Flashing > Stationary > Scroll. Legibility did not differ between the flashing and stationary forms. Comprehensibility also ranked Flashing > Stationary > Scroll.

A Study of community diagnosis activity by Community Health Nurse Working in Health Centers (보건소 보건간호사의 지역사회 진단활동에 관한 조사연구)

  • Cho Won-Jung;Kim Young-Ran
    • Journal of Korean Public Health Nursing
    • /
    • v.6 no.1
    • /
    • pp.32-45
    • /
    • 1992
  • An important role of community health nurses in health centers is to solve community health problems found through data collection methodology which has been used to identify the health needs of the community, diagnose the health problems and to plan health programs suitable for the health problems. Also community health nurses must be prepared to know the community health needs and to participate in the planning process. Since 1956 when the health center law was established, community health nurses have really implemented only the services which the government has asked them to do. This has kept them busy enough. But these days as society is in rapid change, community health nurses should have the flexibility to deal with the social change and demands that are unique to their community each which has different health needs and demands. So community health nurses need to identify what community health problems exist in their particular communities. The purposes of this study were as follows. 1) To explore the suitability of the health programs which the government has asked the community health nurses to do for their own communities and if these programs are not suitable, to explore the reasons why. 2) To explore the degree to which the community health nurses have the ability to identify health problems in their own communities and activate the community diagnostic process. 3) To identify the degree that the community health nurses have the ability to implement plans related to community diagnosis. 4) To find out how much data related to community health problems, the community health nurses have and how they are utilizing it. 5) To measure the community health nurses self-confidence concerning diagnostic activities for community health. The study subjects were 454 Community Health Nurses working in Health Centers in Seoul, Korea. The period of data collection was 6 days(Nov. 9th 1991-Nov. 15th 1991). 
A questionnaire used for data collection was composed of three different items: general characteristics, community health diagnostic activities, and self-confidence in performing diagnostic activities. The results of the study are as follows. First, over one third of the respondents replied that the government-required activities for their communities are not appropriate. Of these, the most frequent reply (51.2%) indicated that many of the activities in the community were inappropriate to the actual situation. Further, 25% of the replies indicated that many activities were only administratively oriented and as such not appropriate. Second, 49.8% of the respondents replied that they had done general assessments and had a general idea of the health problems of their community. According to 41.5% of the respondents, effective solutions to health problems could be found with an increase in health personnel and management ability. Third, to the question as to whether they had ever independently implemented a plan towards solving community diagnosed problems, 52% of nurses replied 'never' and 40% 'occasionally', but only 7.5% replied that they did it frequently. In fact, very little was done even in the basic work of collecting the necessary data. Fourth, when asked how much basic information they had collected that might be used in community diagnosis activity, of 26 items in 5 areas, there was hardly one for which complete data had been collected. Fifteen percent did have data on the geographical aspects of their area, housing distribution, and types of housing, while 17.8% knew the frequency with which the health center was used. Concerning community resources, even with a list of community resources, only 12.3% had data on any of these resources, and this data was incomplete. 
Further, information about social work institutions and facilities was also incomplete: only 14.2% of the respondents had any data, and even that was incomplete; that is, in general, the nurses did not have this information. Fifth, concerning the confidence of the community health nurses in their ability to carry out community diagnosis activities, 60% replied that they were very or at least nominally confident, indicating that although they were not doing community diagnostic activities they felt they could do so, as they were carrying out home visits and program planning as part of their official duties. The following recommendations are made based on the results of this study. First, since the community health nurses have a high perception of the need for community diagnostic activities and high confidence in their ability to carry out this activity, and since a high percentage of respondents replied that with a little training they could do this even better, it is recommended that community diagnostic activity training be included in the continuing education program for community health nurses. Second, in order to successfully solve the health problems of their respective communities, the community health nurses reported a need to increase the number of health personnel and to improve the facilities and the system of managing their work. Considering this, it is recommended that ways be sought to remedy these deficits.

A Study on contents related to geography in "Myriad Things" (萬物門) of Miscellaneous Explanations of Seongho (星湖僿說) (성호사설 '만물문(萬物門)'의 지리 관련내용 고찰)

  • Sohn, Yong-Taek
    • Journal of the Korean Geographical Society
    • /
    • v.47 no.1
    • /
    • pp.60-78
    • /
    • 2012
  • "Myriad Things" (萬物門), which is part of Seongho's representative work entitled Miscellaneous Explanations of Seongho (星湖僿說), is examined in this paper in order to understand Seongho's "thinking on geography". To do so, contents related to geography were selected, and these were discussed and interpreted in terms of the classification system of today's geographical knowledge. The results of this research are as follows. First, "Myriad Things" mentions, in relation to geography, information on astronomical geography and natural geography such as uplift, tornado, structure of soil, and the yut board, as well as human-geographical topics such as wild ginseng, cigarettes, hot pepper, traditional fruits and nuts (chestnuts, jujubes, and persimmons), Goryeo paper (Korean paper), mulberry trees, cotton plants, natural dye, policy about horses, magnetic compass needles, and farming implements for rice transplantation. Second, the depth of information described varies from topic to topic, but the topics of tornado, magnetic compass needles, horses, wild ginseng, traditional fruits and nuts, and the yut board are described in depth and in detail. Third, the contents on these topics are "true" insofar as bibliographical information and citations are provided for support. Fourth, these topics reflect the interests and circumstances related to the "economic improvement of common people's livelihood" in those days, such as agriculture, crops, and transportation of goods. Fifth, the bibliography and citations explaining all instances reveal that China (Qing) was regarded as a great civilization of the advanced world and that the scholarship of Joseon relied on and accepted it. Sixth, except for horse raising and management, farming implements for rice transplantation, sericulture, and natural dyeing of cloth, most of the topics are useful even today. 
In short, there is a profound aspect to the content that makes it possible to estimate Seongho's "geographical thinking". In general, the focus of the content of this book is directly linked to the practical agricultural economy of the common people.

Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model (하이브리드 인공신경망 모형을 이용한 부도 유형 예측)

  • Jo, Nam-ok;Kim, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.79-99
    • /
    • 2015
  • The prediction of bankruptcy has been extensively studied in the accounting and finance field. It can have an important impact on lending decisions and the profitability of financial institutions in terms of risk management. Many researchers have focused on constructing a more robust bankruptcy prediction model. Early studies primarily used statistical techniques such as multiple discriminant analysis (MDA) and logit analysis for bankruptcy prediction. However, many studies have demonstrated that artificial intelligence (AI) approaches, such as artificial neural networks (ANN), decision trees, case-based reasoning (CBR), and support vector machine (SVM), have been outperforming statistical techniques since 1990s for business classification problems because statistical methods have some rigid assumptions in their application. In previous studies on corporate bankruptcy, many researchers have focused on developing a bankruptcy prediction model using financial ratios. However, there are few studies that suggest the specific types of bankruptcy. Previous bankruptcy prediction models have generally been interested in predicting whether or not firms will become bankrupt. Most of the studies on bankruptcy types have focused on reviewing the previous literature or performing a case study. Thus, this study develops a model using data mining techniques for predicting the specific types of bankruptcy as well as the occurrence of bankruptcy in Korean small- and medium-sized construction firms in terms of profitability, stability, and activity index. Thus, firms will be able to prevent it from occurring in advance. We propose a hybrid approach using two artificial neural networks (ANNs) for the prediction of bankruptcy types. The first is a back-propagation neural network (BPN) model using supervised learning for bankruptcy prediction and the second is a self-organizing map (SOM) model using unsupervised learning to classify bankruptcy data into several types. 
Based on the constructed model, we predict the bankruptcy of companies by applying the BPN model to a validation set that was not utilized in the development of the model. This allows for identifying the specific types of bankruptcy by using bankruptcy data predicted by the BPN model. We calculated the average of selected input variables through statistical test for each cluster to interpret characteristics of the derived clusters in the SOM model. Each cluster represents bankruptcy type classified through data of bankruptcy firms, and input variables indicate financial ratios in interpreting the meaning of each cluster. The experimental result shows that each of five bankruptcy types has different characteristics according to financial ratios. Type 1 (severe bankruptcy) has inferior financial statements except for EBITDA (earnings before interest, taxes, depreciation, and amortization) to sales based on the clustering results. Type 2 (lack of stability) has a low quick ratio, low stockholder's equity to total assets, and high total borrowings to total assets. Type 3 (lack of activity) has a slightly low total asset turnover and fixed asset turnover. Type 4 (lack of profitability) has low retained earnings to total assets and EBITDA to sales which represent the indices of profitability. Type 5 (recoverable bankruptcy) includes firms that have a relatively good financial condition as compared to other bankruptcy types even though they are bankrupt. Based on the findings, researchers and practitioners engaged in the credit evaluation field can obtain more useful information about the types of corporate bankruptcy. In this paper, we utilized the financial ratios of firms to classify bankruptcy types. It is important to select the input variables that correctly predict bankruptcy and meaningfully classify the type of bankruptcy. In a further study, we will include non-financial factors such as size, industry, and age of the firms. 
Thus, we can obtain realistic clustering results for bankruptcy types by combining qualitative factors and reflecting the domain knowledge of experts.

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (연관규칙 마이닝에서의 동시성 기준 확장에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu;Ahn, Jae-Hyeon
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.23-38
    • /
    • 2012
  • There is a large difference between purchasing patterns in an online shopping mall and in an offline market. This difference may be caused mainly by the difference in accessibility of online and offline markets. It means that an interval between the initial purchasing decision and its realization appears to be relatively short in an online shopping mall, because a customer can make an order immediately. Because of the short interval between a purchasing decision and its realization, an online shopping mall transaction usually contains fewer items than that of an offline market. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. On the contrary, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules can survive with an acceptable level of Support because of too many Null Transactions. Most market basket analyses on online shopping mall transactions, therefore, have been performed by expanding the co-occurrence criteria of traditional association rule mining. While the traditional co-occurrence criteria defines items purchased in one transaction as concurrently purchased items, the expanded co-occurrence criteria regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criteria has been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds of adopting a certain co-occurrence criteria degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies. 
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of association rules discovered under various co-occurrence criteria, in order to provide a guideline for selecting the criterion that corresponds to the purpose of the analysis. Additionally, we perform similar experiments with several groups of customers segmented by each customer's average duration between orders, in order to discover the relationship between the optimal co-occurrence criterion and the customer's average duration between orders. Finally, through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems. Our experiments use a real dataset acquired from one of the largest internet shopping malls in Korea. We use 66,278 transactions of 3,847 customers conducted during the last two years. Overall results show that the accuracy of association rules for frequent shoppers (whose average duration between orders is relatively short) is higher than that for casual shoppers. In addition, we discover that for frequent shoppers the accuracy of association rules is very high when the co-occurrence criterion of the training set corresponds to that of the validation set (i.e., target set). This implies that the co-occurrence criterion for frequent shoppers should be set according to the application purpose period. For example, an analyst should use a day as the co-occurrence criterion if he/she wants to offer potential customers a coupon valid only for a day. In contrast, an analyst should use a month as the co-occurrence criterion if he/she wants to publish a coupon book that can be used for a month. 
In the case of casual shoppers, the accuracy of association rules does not appear to be affected by the application purpose period. The accuracy of casual shoppers' association rules becomes higher as a longer co-occurrence criterion is adopted. This implies that an analyst should set the co-occurrence criterion as long as possible, regardless of the application purpose period.
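
The expanded co-occurrence criterion described in this abstract can be illustrated with a small sketch: instead of treating each order as a basket, all items a customer buys within a predefined period are merged into one basket before Support is computed. The Python code below is only an assumed illustration; the fixed-length windows counted from an `origin` date, the function names, and the toy orders are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def build_baskets(orders, origin, period_days):
    """Merge each customer's purchases that fall in the same fixed window.

    orders: iterable of (customer_id, purchase_date, item) tuples.
    period_days=1 approximates a daily criterion; larger values widen it.
    """
    baskets = defaultdict(set)
    for customer, day, item in orders:
        window = (day - origin).days // period_days
        baskets[(customer, window)].add(item)
    return list(baskets.values())


def pair_support(baskets):
    """Support of each item pair = share of baskets containing both items."""
    counts = defaultdict(int)
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Toy data: customer A buys milk and bread two days apart.
origin = date(2024, 1, 1)
orders = [
    ("A", date(2024, 1, 1), "milk"),
    ("A", date(2024, 1, 3), "bread"),
    ("B", date(2024, 1, 2), "milk"),
    ("B", date(2024, 1, 2), "bread"),
    ("C", date(2024, 1, 5), "eggs"),
]
daily = pair_support(build_baskets(orders, origin, 1))
weekly = pair_support(build_baskets(orders, origin, 7))
print(daily[("bread", "milk")])   # 1 of 4 daily baskets
print(weekly[("bread", "milk")])  # 2 of 3 weekly baskets
```

With the daily criterion, customer A's two single-item purchases never co-occur; with the weekly criterion they form one basket, which raises the Support of the pair. This is the effect that makes the choice of period matter for the discovered rules.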

Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation (웹툰 콘텐츠 추천을 위한 소비자 감성 패턴 맵 개발)

  • Lee, Junsik;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.67-88
    • /
    • 2019
  • Webtoon is a Korean-style digital comics platform that distributes comics content produced using the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon content recommendation measures is growing. Webtoons are digital content products that combine pictorial, literary and digital elements. Therefore, webtoons stimulate consumer sentiment by making readers have fun and engaging and empathizing with the situations in which webtoons are produced. In this context, it can be expected that the sentiment that webtoons evoke to consumers will serve as an important criterion for consumers' choice of webtoons. However, there is a lack of research to improve webtoons' recommendation performance by utilizing consumer sentiment. This study is aimed at developing consumer sentiment pattern maps that can support effective recommendations of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. Metadata and consumer sentiments data were collected for 200 works serviced on the Korean webtoon platform 'Naver Webtoon' to conduct this study. 488 sentiment terms were collected for 127 works, excluding those that did not meet the purpose of the analysis. Next, similar or duplicate terms were combined or abstracted in accordance with the bottom-up approach. As a result, we have built webtoons specialized sentiment-index, which are reduced to a total of 63 emotive adjectives. By performing exploratory factor analysis on the constructed sentiment-index, we have derived three important dimensions for classifying webtoon types. The exploratory factor analysis was performed through the Principal Component Analysis (PCA) using varimax factor rotation. The three dimensions were named 'Immersion', 'Touch' and 'Irritant' respectively. 
Based on this, K-Means clustering was performed and the entire webtoons were classified into four types. Each type was named 'Snack', 'Drama', 'Irritant', and 'Romance'. For each type of webtoon, we wrote webtoon-sentiment 2-Mode network graphs and looked at the characteristics of the sentiment pattern appearing for each type. In addition, through profiling analysis, we were able to derive meaningful strategic implications for each type of webtoon. First, The 'Snack' cluster is a collection of webtoons that are fast-paced and highly entertaining. Many consumers are interested in these webtoons, but they don't rate them well. Also, consumers mostly use simple expressions of sentiment when talking about these webtoons. Webtoons belonging to 'Snack' are expected to appeal to modern people who want to consume content easily and quickly during short travel time, such as commuting time. Secondly, webtoons belonging to 'Drama' are expected to evoke realistic and everyday sentiments rather than exaggerated and light comic ones. When consumers talk about webtoons belonging to a 'Drama' cluster in online, they are found to express a variety of sentiments. It is appropriate to establish an OSMU(One source multi-use) strategy to extend these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of 'Irritant' shows the sentiments that discourage customer interest by stimulating discomfort. Webtoons that evoke these sentiments are hard to get public attention. Artists should pay attention to these sentiments that cause inconvenience to consumers in creating webtoons. Finally, Webtoons belonging to 'Romance' do not evoke a variety of consumer sentiments, but they are interpreted as touching consumers. They are expected to be consumed as 'healing content' targeted at consumers with high levels of stress or mental fatigue in their lives. 
The results of this study are meaningful in that it identifies the applicability of consumer sentiment in the areas of recommendation and classification of webtoons, and provides guidelines to help members of webtoons' ecosystem better understand consumers and formulate strategies.

Impact of impulsiveness on mobile banking usage: Moderating effect of credit card use and mediating effect of SNS addiction (충동성이 모바일뱅킹 사용률에 미치는 영향: 신용카드 사용 여부의 조절효과와 SNS 중독의 매개효과)

  • Lee, Youmi;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.113-137
    • /
    • 2021
  • Given the clear growth potential of mobile banking, many related studies are being conducted, but in Korea they concentrate on technical factors or on consumers' intentions, behaviors, and satisfaction. In addition, although mobile banking has a strong customer base among people in their 20s, few studies have focused specifically on this customer group. For mobile banking to take a leap forward, a strategy that secures diverse perspectives is needed, through research not only on mobile banking itself but also on the external factors affecting it. Therefore, this study analyzes impulsiveness, credit card use, and SNS addiction among the various external factors that can significantly affect mobile banking use by people in their 20s. This study examines whether the relationship between impulsiveness and mobile banking usage depends on whether a credit card is used, and checks whether a customer's impulsiveness can be inferred from credit card use. Based on this, it is possible to establish new standards for classifying marketing target groups for mobile banking. After identifying the positive or negative relationship between credit card use and impulsiveness, we aim to predict a customer's impulsiveness indirectly from whether he or she uses a credit card. The study also tests the mediating effect of SNS addiction on the relationship between impulsiveness and mobile banking usage. The collected data were analyzed according to the research questions using SPSS Statistics 25. The findings are as follows. First, positive urgency has a significant positive effect on mobile banking usage. Second, credit card use moderates the relationship between negative urgency and mobile banking usage. 
Third, all subfactors of impulsiveness have significant positive relationships with the subfactors of SNS addiction. Fourth, the relationship among positive urgency, SNS addiction, and mobile banking usage shows both a total effect and a direct effect. The first result means that mobile banking usage may be high if positive urgency is relatively high, even if the overall score on the multi-dimensional impulsiveness scale is low. The second result indicates that mobile banking usage was not affected by negative urgency as an independent variable on its own, but had a significant positive relationship with negative urgency among credit card users. The third result means that SNS, which provides instant enjoyment and satisfaction as a mobile-based service, is likely to become addictive for those with a high lack of premeditation or lack of perseverance. It also means that SNS can serve as an avoidance space for those with high negative urgency and as a space for emotional expression for those with high positive urgency.

The effect of Big-data investment on the Market value of Firm (기업의 빅데이터 투자가 기업가치에 미치는 영향 연구)

  • Kwon, Young jin;Jung, Woo-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.99-122
    • /
    • 2019
  • According to the recent IDC (International Data Corporation) report, as from 2025, the total volume of data is estimated to reach ten times higher than that of 2016, corresponding to 163 zettabytes. then the main body of generating information is moving more toward corporations than consumers. So-called "the wave of Big-data" is arriving, and the following aftermath affects entire industries and firms, respectively and collectively. Therefore, effective management of vast amounts of data is more important than ever in terms of the firm. However, there have been no previous studies that measure the effects of big data investment, even though there are number of previous studies that quantitatively the effects of IT investment. Therefore, we quantitatively analyze the Big-data investment effects, which assists firm's investment decision making. This study applied the Event Study Methodology, which is based on the efficient market hypothesis as the theoretical basis, to measure the effect of the big data investment of firms on the response of market investors. In addition, five sub-variables were set to analyze this effect in more depth: the contents are firm size classification, industry classification (finance and ICT), investment completion classification, and vendor existence classification. To measure the impact of Big data investment announcements, Data from 91 announcements from 2010 to 2017 were used as data, and the effect of investment was more empirically observed by observing changes in corporate value immediately after the disclosure. This study collected data on Big Data Investment related to Naver 's' News' category, the largest portal site in Korea. In addition, when selecting the target companies, we extracted the disclosures of listed companies in the KOSPI and KOSDAQ market. 
During the collection process, the search keywords were searched through the keywords 'Big data construction', 'Big data introduction', 'Big data investment', 'Big data order', and 'Big data development'. The results of the empirically proved analysis are as follows. First, we found that the market value of 91 publicly listed firms, who announced Big-data investment, increased by 0.92%. In particular, we can see that the market value of finance firms, non-ICT firms, small-cap firms are significantly increased. This result can be interpreted as the market investors perceive positively the big data investment of the enterprise, allowing market investors to better understand the company's big data investment. Second, statistical demonstration that the market value of financial firms and non - ICT firms increases after Big data investment announcement is proved statistically. Third, this study measured the effect of big data investment by dividing by company size and classified it into the top 30% and the bottom 30% of company size standard (market capitalization) without measuring the median value. To maximize the difference. The analysis showed that the investment effect of small sample companies was greater, and the difference between the two groups was also clear. Fourth, one of the most significant features of this study is that the Big Data Investment announcements are classified and structured according to vendor status. We have shown that the investment effect of a group with vendor involvement (with or without a vendor) is very large, indicating that market investors are very positive about the involvement of big data specialist vendors. Lastly but not least, it is also interesting that market investors are evaluating investment more positively at the time of the Big data Investment announcement, which is scheduled to be built rather than completed. 
Applied to industry, this suggests that, in terms of increasing market value, it would be effective for a company to make a disclosure at the time it decides to invest in big data. This study has an academic implication in that no prior research has examined the impact of big data investment. It also has a practical implication in that it can serve as reference material for business decision makers considering big data investment.
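
The abstract names the event study methodology but does not say which return model was used. As a rough illustration only, the sketch below assumes the common market model: an ordinary least squares fit of stock returns on market returns over an estimation window, followed by abnormal returns and a cumulative abnormal return (CAR) over the event window. All function names and the sample return series are hypothetical and are not taken from the paper.

```python
from statistics import mean

def fit_market_model(stock, market):
    """OLS fit of stock = alpha + beta * market over the estimation window."""
    m_bar, s_bar = mean(market), mean(stock)
    cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock))
    var = sum((m - m_bar) ** 2 for m in market)
    beta = cov / var
    alpha = s_bar - beta * m_bar
    return alpha, beta

def abnormal_returns(stock, market, alpha, beta):
    """Actual return minus the return predicted by the market model."""
    return [s - (alpha + beta * m) for s, m in zip(stock, market)]

def cumulative_abnormal_return(stock, market, alpha, beta):
    """Sum of abnormal returns over the event window."""
    return sum(abnormal_returns(stock, market, alpha, beta))

# Hypothetical daily returns (made-up numbers, for illustration only).
est_market = [0.01, -0.005, 0.02, 0.0, -0.01]
est_stock = [0.001 + 1.5 * m for m in est_market]   # estimation window
evt_market = [0.01, -0.02]
evt_stock = [0.036, -0.029]                         # window around the announcement

alpha, beta = fit_market_model(est_stock, est_market)
car = cumulative_abnormal_return(evt_stock, evt_market, alpha, beta)
print(f"alpha={alpha:.4f} beta={beta:.2f} CAR={car:.4f}")
```

In a study of this kind, the CAR would be averaged across all announcements and tested for statistical significance; that step is omitted here.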

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively pursued not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies have focused on applications of the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when text data are represented as vectors. Unlike structured data, which can be directly subjected to a variety of operations and traditional analysis techniques, unstructured text must first go through a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties, for the purpose of structuring text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, traditional document embedding methods, represented by doc2Vec, generate a vector for each document using all the words included in the document. 
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector. It is therefore difficult to represent a complex document with multiple subjects accurately as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since keyword extraction is not the core subject of the proposed method, we describe the process of applying the method to documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Next, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for that document. Clustering is then conducted on the keyword set of each document to identify the multiple subjects it contains. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. 
With the proposed multi-vector-based method, we confirmed that complex documents can be vectorized more accurately by eliminating the interference among subjects.
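
Steps (3) to (5) above can be sketched roughly as follows. The abstract does not specify the clustering algorithm or how each cluster is reduced to a vector, so this sketch makes two assumptions: a simple greedy grouping by cosine similarity stands in for the clustering step, and each cluster is summarized by the mean of its keyword vectors. The toy word vectors, the `threshold` parameter, and all function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors; a real pipeline would learn these
# in the word embedding step (for example with word2vec).
TOY_VECTORS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.0],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def multi_vector(keywords, word_vectors, threshold=0.7):
    """Return one vector per keyword cluster (assumed: mean of the cluster)."""
    clusters = []  # each cluster is a list of keyword vectors
    for kw in keywords:
        vec = word_vectors.get(kw)
        if vec is None:
            continue  # keyword has no embedding; skip it
        for cluster in clusters:
            # Greedy rule: join the first cluster that is similar enough.
            if cosine(vec, centroid(cluster)) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])  # start a new subject cluster
    return [centroid(c) for c in clusters]

# A document whose keywords span two subjects yields two vectors.
doc_vectors = multi_vector(["neural", "stock", "network", "market"], TOY_VECTORS)
print(len(doc_vectors), doc_vectors)
```

Because the greedy rule depends on keyword order and on the chosen threshold, a standard clustering algorithm would likely be preferable in practice; the point here is only that a document with several subjects ends up with several vectors instead of one.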