• Title/Summary/Keyword: Dataset Management

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.157-176 / 2018
  • The purpose of this research is to develop a detection model, based on financial data, for companies designated as administrative issues in the KOSDAQ market. Administrative issue designation identifies companies with a high potential for delisting and gives them time to overcome the grounds for delisting under certain restrictions of the Korean stock market. It acts as an alarm that informs investors and market participants which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on models for predicting administrative issue designation compared with the many studies on bankruptcy prediction models. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using the financial data of KOSDAQ companies. Logistic regression and a decision tree are proposed as the data mining models for detecting administrative issues. According to the results of the analysis, the logistic regression model predicted designation using three variables: ROE (earnings before tax), cash flows/shareholders' equity, and the asset turnover ratio. Its overall accuracy was 86% on the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using cash flows/total assets and ROA (net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (earnings before tax) in the logistic detection model reflects the profit or loss of continuing operations, excluding the revenue and expenses of discontinued operations. Therefore, a decline in this variable means that the competitiveness of the core business is weakening.
If a large part of profits comes from one-off gains, the deterioration of business management is very likely to intensify further. As the ROE of a KOSDAQ company decreases significantly, the company becomes highly likely to be delisted. Second, cash flows to shareholders' equity represents the firm's ability to generate cash flow when the financial condition of its subsidiaries is excluded. In other words, a weakening in the management capacity of the parent company itself, apart from the competence of its subsidiaries, can be a main reason for an increased likelihood of administrative issue designation. Third, a low asset turnover ratio means that the corporation uses its current and non-current assets ineffectively or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, its corporate activities should be examined in detail from various perspectives, such as weakening sales or rising or falling inventories. Cash flows/total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash position and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, a negative value of this variable indicates that the company may have serious problems in its business activities. If a company's cash flow from operating activities is smaller than its net profit, the net profit has not been converted into cash, indicating a serious problem in managing the company's trade receivables and inventories.
Therefore, as cash flows/total assets decrease, the probabilities of administrative issue designation and of delisting can be understood to increase. In summary, the logistic regression-based detection model in this study was found to be driven by the company's financial activities, including ROE (earnings before tax), whereas the decision tree-based detection model predicts designation based on the company's cash flows.
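
The five ratios named above can be computed directly from financial-statement items. The Python sketch below assumes conventional textbook definitions (the abstract gives no formulas), treats "cash flows" as operating cash flow, and uses a hypothetical `Financials` record with made-up figures. It computes the model inputs only; the fitted logistic regression coefficients and CART split thresholds are not reported here, so none are hard-coded.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Financials:
    """Hypothetical financial-statement line items for one firm-year (KRW millions)."""
    earnings_before_tax: float
    net_income: float
    operating_cash_flow: float  # assumption: "cash flows" means cash flow from operating activities
    sales: float
    total_assets: float
    shareholders_equity: float


def detection_ratios(f: Financials) -> Dict[str, float]:
    """Conventional ratio definitions (assumed; the abstract gives no formulas)."""
    return {
        # Variables of the logistic regression model
        "roe_ebt": f.earnings_before_tax / f.shareholders_equity,
        "cash_flow_to_equity": f.operating_cash_flow / f.shareholders_equity,
        "asset_turnover": f.sales / f.total_assets,
        # Variables of the decision tree (CART) model
        "cash_flow_to_assets": f.operating_cash_flow / f.total_assets,
        "roa_net_income": f.net_income / f.total_assets,
    }


def earnings_not_cashed(f: Financials) -> bool:
    """Warning sign from the text: operating cash flow smaller than net income."""
    return f.operating_cash_flow < f.net_income


# Made-up firm: profitable on paper, but operating cash flow is negative.
firm = Financials(
    earnings_before_tax=180.0,
    net_income=150.0,
    operating_cash_flow=-50.0,
    sales=800.0,
    total_assets=1000.0,
    shareholders_equity=400.0,
)
for name, value in detection_ratios(firm).items():
    print(f"{name}: {value:.3f}")
print("cash flow below net income:", earnings_not_cashed(firm))
```

A negative `cash_flow_to_assets` combined with positive net income, as in this toy firm, is the pattern the abstract links to a higher risk of designation.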

Value of Information Technology Outsourcing: An Empirical Analysis of Korean Industries (IT 아웃소싱의 가치에 관한 연구: 한국 산업에 대한 실증분석)

  • Han, Kun-Soo;Lee, Kang-Bae
    • Asia pacific journal of information systems / v.20 no.3 / pp.115-137 / 2010
  • Information technology (IT) outsourcing, the use of a third-party vendor to provide IT services, started in Korea in the late 1980s and early 1990s and has increased rapidly since 2000. Recently, firms have increased their efforts to capture greater value from IT outsourcing. To date, there have been a large number of studies on IT outsourcing. Most prior studies have focused on outsourcing practices and decisions, and little attention has been paid to objectively measuring the value of IT outsourcing. In addition, studies that examined the performance of IT outsourcing have mainly relied on anecdotal evidence or practitioners' perceptions. Our study examines the contribution of IT outsourcing to economic growth in Korean industries over the 1990 to 2007 period, using a production function framework and a panel data set for 54 industries constructed from input-output tables, fixed-capital formation tables, and employment tables. Based on the framework and estimation procedures that Han, Kauffman and Nault (2010) used to examine the economic impact of IT outsourcing in U.S. industries, we evaluate the impact of IT outsourcing on output and productivity in Korean industries. Because IT outsourcing started to grow significantly more rapidly in 2000, we compare its impact in the pre- and post-2000 periods. Our industry-level panel data cover a large proportion of the Korean economy: 54 out of 58 Korean industries. This gives us a greater opportunity to assess the impacts of IT outsourcing on objective performance measures such as output and productivity. Using IT outsourcing and IT capital as our primary independent variables, we employ an extended Cobb-Douglas production function in which both variables are treated as factor inputs. We also derive and estimate a labor productivity equation to assess the impact of our IT variables on labor productivity.
We use data from seven years (1990, 1993, 2000, 2003, 2005, 2006, and 2007) for which both input-output tables and fixed-capital formation tables are available. Combining the input-output tables and fixed-capital formation tables resulted in 54 industries. IT outsourcing is measured as the value of computer-related services purchased by each industry in a given year. All variables have been converted to constant 2000 Korean won using GDP deflators. To calculate labor hours, we use the average work hours for each sector provided by the OECD. To control effectively for the heteroskedasticity and autocorrelation present in our dataset, we use feasible generalized least squares (FGLS) procedures. Because the AR1 process may be industry-specific (i.e., panel-specific), we consider both common AR1 and panel-specific AR1 (PSAR1) processes in our estimations. We also include year dummies to control for year-specific effects common across industries, and sector dummies (as defined in the GDP deflator) to control for time-invariant sector-specific effects. Based on the full sample of 378 observations, we find that a 1% increase in IT outsourcing is associated with a 0.012~0.014% increase in gross output, and a 1% increase in IT capital is associated with a 0.024~0.027% increase in gross output. To compare the contribution of IT outsourcing with that of IT capital, we examine gross marginal product (GMP). The average GMP of IT outsourcing was 6.423, substantially greater than that of IT capital at 2.093. This indicates that, on average, if an industry invests an additional KRW 1 million in IT outsourcing, it can increase its output by KRW 6.4 million. In terms of the contribution to labor productivity, we find that a 1% increase in IT outsourcing is associated with a 0.009~0.01% increase in labor productivity, while a 1% increase in IT capital is associated with a 0.024~0.025% increase in labor productivity.
Overall, our results indicate that IT outsourcing made positive and economically meaningful contributions to output and productivity in Korean industries over the 1990 to 2007 period. The average GMP of IT outsourcing we report for Korean industries is 1.44 times that reported for U.S. industries in Han et al. (2010). Further, we find that the contribution of IT outsourcing was significantly greater in the 2000~2007 period, during which the growth of IT outsourcing accelerated. Our study provides implications for policymakers and managers. First, our results suggest that Korean industries can capture further benefits by increasing investments in IT outsourcing. Second, our analyses and results provide a basis for managers to assess the impact of investments in IT outsourcing and IT capital objectively and quantitatively. Building on our study, future research should examine the impact of IT outsourcing at more detailed industry levels and at the firm level.
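
Because the production function is estimated in logs, each coefficient is an output elasticity, and a gross marginal product can be recovered as elasticity times output divided by input. The sketch below assumes that standard Cobb-Douglas relationship rather than reproducing the authors' FGLS estimation with AR1 corrections and dummies. The input names, input levels, and all elasticities except the IT-outsourcing value (picked from the reported 0.012~0.014 range) are hypothetical.

```python
import math
from typing import Dict


def log_output(ln_a: float, elasticities: Dict[str, float], inputs: Dict[str, float]) -> float:
    """Extended Cobb-Douglas in logs: ln Y = ln A + sum_j beta_j * ln X_j."""
    return ln_a + sum(beta * math.log(inputs[name]) for name, beta in elasticities.items())


def gross_marginal_product(elasticity: float, output: float, input_value: float) -> float:
    """GMP = dY/dX = beta * Y / X, i.e. extra output per extra unit of one input."""
    return elasticity * output / input_value


# Hypothetical elasticities and input levels; only the IT-outsourcing elasticity
# (0.013) is chosen from the 0.012~0.014 range reported in the text.
betas = {"it_outsourcing": 0.013, "it_capital": 0.025, "non_it_capital": 0.30, "labor": 0.60}
base = {"it_outsourcing": 10.0, "it_capital": 40.0, "non_it_capital": 2000.0, "labor": 500.0}

# A 1% rise in IT outsourcing raises output by roughly beta percent (about 0.013%).
bumped = dict(base, it_outsourcing=base["it_outsourcing"] * 1.01)
pct_change = 100 * (math.exp(log_output(0.0, betas, bumped) - log_output(0.0, betas, base)) - 1)
print(f"output change: {pct_change:.4f}%")

# With a made-up output/IT-outsourcing ratio of 500, GMP = 0.013 * 500 = 6.5.
print("GMP:", gross_marginal_product(0.013, output=5000.0, input_value=10.0))
```

The point of the GMP conversion is that a small elasticity can still imply a large return per won when the input is small relative to output, which is how an elasticity near 0.013 can coexist with an average GMP above 6.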

A Comparative Analysis on Multiple Authorship Counting for Author Co-citation Analysis (저자동시인용분석을 위한 복수저자 기여도 산정 방식의 비교 분석)

  • Lee, Jae Yun;Chung, EunKyung
    • Journal of the Korean Society for information Management / v.31 no.2 / pp.57-77 / 2014
  • As co-authorship has become prevalent in scientific communities, allocating credit to co-authors appropriately is an important consideration, particularly when identifying the knowledge structure of a field through author-based analysis. The purpose of this study is to compare the characteristics of co-author credit counting methods using correlations, multidimensional scaling, and pathfinder networks. To this end, the study analyzed a dataset of 2,014 journal articles and 3,892 cited authors from the Journal of the Architectural Institute of Korea: Planning & Design, published from 2003 to 2008, in the field of architecture in Korea. Six methods of crediting co-authors were selected for comparative analysis: first-author counting (m1), straight full counting (m2), fractional counting (m3), proportional counting with a total score of 1 (m4), proportional counting with a total score between 1 and 2 (m5), and first-author-weighted fractional counting (m6). The data analysis shows that m1 and m2 are extreme opposites, since m1 counts only first authors, whereas m2 gives every co-author an equal credit score of 1. In the correlation and multidimensional scaling analyses of the five counting methods from m2 to m6, m3, m4, and m5 form a relatively similar group. When the knowledge structure is visualized with pathfinder networks, the networks produced by different counting methods differ because of differences in individual link connections. In addition, the internal validity analysis shows that first-author-weighted fractional counting (m6) might be considered a better method for author clustering. The findings demonstrate that different co-author counting methods influence the resulting knowledge structure networks, and they identify a better counting method for author clustering.
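
The first four counting schemes can be written as small credit functions that are summed over papers. In the sketch below, m1 to m3 follow the descriptions in the text; the m4 formula (the i-th of n authors receives 2(n + 1 - i) / (n(n + 1)), so credits sum to 1) is an assumed arithmetic proportional scheme, not necessarily the one used in the study. m5 and m6 are omitted because the abstract does not give their formulas, and the author names are hypothetical.

```python
from typing import Callable, Dict, List

CreditFn = Callable[[List[str]], Dict[str, float]]


def first_author(authors: List[str]) -> Dict[str, float]:
    """m1: only the first author receives credit 1."""
    return {authors[0]: 1.0}


def straight_full(authors: List[str]) -> Dict[str, float]:
    """m2: every co-author receives full credit 1."""
    return {a: 1.0 for a in authors}


def fractional(authors: List[str]) -> Dict[str, float]:
    """m3: each of n co-authors receives 1/n."""
    n = len(authors)
    return {a: 1.0 / n for a in authors}


def proportional(authors: List[str]) -> Dict[str, float]:
    """Assumed form of m4: credit declines linearly with byline position and sums to 1."""
    n = len(authors)
    return {a: 2.0 * (n + 1 - i) / (n * (n + 1)) for i, a in enumerate(authors, start=1)}


def author_scores(papers: List[List[str]], method: CreditFn) -> Dict[str, float]:
    """Total credit per author over all papers under one counting method."""
    totals: Dict[str, float] = {}
    for authors in papers:
        for author, credit in method(authors).items():
            totals[author] = totals.get(author, 0.0) + credit
    return totals


papers = [["Kim", "Lee"], ["Lee", "Park", "Choi"], ["Kim"]]
for label, method in [("m1", first_author), ("m2", straight_full),
                      ("m3", fractional), ("m4 (assumed)", proportional)]:
    print(label, author_scores(papers, method))
```

Under m1, Park and Choi receive no credit at all, while under m2 Lee's two papers count fully; this is the "extreme opposites" contrast the study reports.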

Estimation of Reference Crop Evapotranspiration Using Backpropagation Neural Network Model (역전파 신경망 모델을 이용한 기준 작물 증발산량 산정)

  • Kim, Minyoung;Choi, Yonghun;O'Shaughnessy, Susan;Colaizzi, Paul;Kim, Youngjin;Jeon, Jonggil;Lee, Sangbong
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.6 / pp.111-121 / 2019
  • Evapotranspiration (ET) of vegetation is one of the major components of the hydrologic cycle, and its accurate estimation is important for hydrologic water balance, irrigation management, crop yield simulation, and water resources planning and management. For agricultural crops, ET is often calculated relative to a short or tall reference crop, such as well-watered, clipped grass (reference crop evapotranspiration, $ET_o$). The Penman-Monteith equation recommended by the FAO (FAO 56-PM) has been accepted by researchers and practitioners as the sole standard $ET_o$ method. However, its accuracy depends on high-quality measurements of four meteorological variables, and its use has been limited by incomplete and/or inaccurate input data. Therefore, this study evaluated the applicability of a backpropagation neural network (BPNN) model for estimating $ET_o$ from fewer meteorological data than the FAO 56-PM requires. Six meteorological inputs (minimum temperature, average temperature, maximum temperature, relative humidity, wind speed, and solar radiation) were divided into a series of input groups (combinations of one, two, three, four, five, and six variables), and each combination was evaluated for its accuracy in estimating $ET_o$. The overall findings indicate that $ET_o$ can be reasonably estimated with a BPNN using fewer than all six meteorological variables. In addition, the proper choice of neural network architecture was shown not only to minimize the computational error but also to maximize the relationship between dependent and independent variables. These findings should be useful where data availability and/or accuracy are limited.
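
If every combination of the six inputs is enumerated (an assumption; the abstract says only that the inputs were grouped into combinations of one to six variables), there are 63 candidate input sets. The sketch below lists them; the short variable names are our own labels, and each set would then serve as the input layer of a separately trained BPNN, which is not shown.

```python
from itertools import combinations
from typing import Iterator, Tuple

# The six inputs listed in the text (short names are our own labels).
VARIABLES = ("Tmin", "Tavg", "Tmax", "RH", "wind_speed", "solar_radiation")


def input_groups(variables: Tuple[str, ...] = VARIABLES) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of the inputs, from single variables up to all six."""
    for k in range(1, len(variables) + 1):
        yield from combinations(variables, k)


groups = list(input_groups())
by_size = {k: sum(1 for g in groups if len(g) == k) for k in range(1, 7)}
print(len(groups), by_size)  # 63 {1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 6: 1}
```

Comparing each subset's BPNN estimates against FAO 56-PM values computed from complete data is one way to rank which reduced input sets remain acceptable.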

Establishment of A WebGIS-based Information System for Continuous Observation during Ocean Research Vessel Operation (WebGIS 기반 해양 연구선 상시관측 정보 체계 구축)

  • HAN, Hyeon-Gyeong;LEE, Cholyoung;KIM, Tae-Hoon;HAN, Jae-Rim;CHOI, Hyun-Woo
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.1 / pp.40-53 / 2021
  • Research vessels (R/Vs) used for ocean research move to the planned research area and perform observations suited to the research purpose. The five research vessels of the Korea Institute of Ocean Science & Technology (KIOST) are equipped with global positioning system (GPS), water depth, weather, and sea surface temperature and salinity measurement equipment that can observe continuously throughout a cruise. An information platform is required to systematically manage and use the data produced by such continuous observation equipment. Therefore, the data flow was defined through a series of business analyses covering everything from the research vessel operation plan to observation during operation, data collection, data processing, data storage, display, and service. After a functional design was created for each stage of the business process, the KIOST Underway Meteorological & Oceanographic Information System (KUMOS), a Web-Geographic Information System (Web-GIS) based information platform, was built. Because the data produced during R/V cruises vary in time and space, a quality management system was developed that takes this variability into account. For systematic management and service of the data, the KUMOS integrated database (DB) was established, and functions such as R/V tracking, data display, search, and provision were implemented. The dataset provided by KUMOS consists of the cruise report, raw data, quality control (QC) flagged data, filtered data, cruise track line data, and a data report for each R/V cruise. The KUMOS business processing procedures and system developed for each function in this study are expected to serve as a benchmark for domestic ocean-related institutions and universities that operate research vessels capable of continuous observation during cruises.
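
The QC-flagged and filtered products mentioned above can be illustrated with two simple checks on an underway series. This is a hypothetical sketch, not the KUMOS quality management system: the flag codes (QARTOD-style, 1 pass, 3 suspect, 4 fail), the thresholds, and the sample values are all assumptions, and only the temporal side of the variability is covered.

```python
from typing import List

# Hypothetical QARTOD-style flag codes.
PASS, SUSPECT, FAIL = 1, 3, 4


def gross_range_flags(values: List[float], lo: float, hi: float) -> List[int]:
    """Fail any value outside a physically plausible range."""
    return [PASS if lo <= v <= hi else FAIL for v in values]


def rate_of_change_flags(values: List[float], max_step: float) -> List[int]:
    """Mark a value suspect if it differs from the previous sample by more than max_step."""
    if not values:
        return []
    flags = [PASS]
    for prev, cur in zip(values, values[1:]):
        flags.append(SUSPECT if abs(cur - prev) > max_step else PASS)
    return flags


def combine(*flag_lists: List[int]) -> List[int]:
    """Keep the worst (highest) flag per sample."""
    return [max(fs) for fs in zip(*flag_lists)]


# Made-up sea surface temperatures (deg C) along a track.
sst = [18.2, 18.3, 25.9, 18.4, 45.0]
flags = combine(gross_range_flags(sst, lo=-2.0, hi=35.0),
                rate_of_change_flags(sst, max_step=1.0))
filtered = [v for v, f in zip(sst, flags) if f == PASS]
print(flags)     # [1, 1, 3, 3, 4]
print(filtered)  # [18.2, 18.3]
```

This naive step test also marks the sample after a spike as suspect; a production system would use a dedicated spike test and, for spatial variability, thresholds that depend on the vessel's position.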

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used to identify emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields, such as computer science and business management, can take advantage of it to analyze customer-generated opinions. Previous studies regarded the star rating of a review as equivalent to the sentiment embedded in its text. However, the star rating does not always correspond to the sentiment polarity, and because of this assumption, previous studies are limited in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and a Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier, and statistical correlations between the reviews and box-office success are analyzed. Movie reviews are collected along with their star ratings. The dataset used in this study consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms a Naive Bayes (NB) classifier, with an accuracy about 6% higher. Furthermore, the results indicate positive correlations between the star rating and audience size, which can be regarded as a movie's box-office success. The study also shows a mild positive correlation between the sentiment scores estimated by the classifier and audience size. To verify the applicability of the sentiment scores, an independent-samples t-test was conducted.
For this, the movies were divided into two groups using the average sentiment score. The two groups differ significantly in their star ratings.
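
The grouping test can be sketched as follows. The split at the average sentiment score follows the text; the pooled-variance Student's t statistic, the rule that a movie exactly at the mean joins the high group, and the toy per-movie values are assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def split_by_mean_sentiment(movies: Sequence[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Split (sentiment_score, star_rating) pairs at the mean sentiment score.

    Assumption: movies exactly at the mean go to the high-sentiment group.
    """
    cutoff = mean(s for s, _ in movies)
    high = [star for s, star in movies if s >= cutoff]
    low = [star for s, star in movies if s < cutoff]
    return high, low


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy per-movie averages (sentiment score, star rating); not data from the study.
movies = [(0.2, 4.0), (0.3, 5.0), (0.1, 6.0), (0.8, 7.0), (0.9, 8.0), (0.7, 9.0)]
high, low = split_by_mean_sentiment(movies)
t, df = pooled_t(high, low)
print(f"t = {t:.3f}, df = {df}")
```

A p-value additionally needs the t distribution; with SciPy available, `scipy.stats.ttest_ind(high, low, equal_var=True)` returns both the statistic and the p-value.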

Nutrients and Chlorophyll Dynamics Along the Longitudinal Gradients of Daechung Reservoir (대청호에서 종적구배에 따른 영양염류 및 엽록소의 역동성)

  • Bae, Dae-Yeul;Yang, Eun-Chan;Jung, Seung-Hyun;Lee, Jae-Hoon;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.40 no.2 / pp.285-293 / 2007
  • This study aimed to determine the zonal characteristics of nutrients and chlorophyll and to evaluate their trophic relations in Daechung Reservoir. We compared long-term water quality data and trophic state among three zones using a 1993 to 2002 dataset obtained from the Ministry of Environment, Korea. Total phosphorus (TP), Secchi depth (SD), and chlorophyll (CHL) showed typical longitudinal declines from the riverine to the lacustrine zone, but no such pattern was evident for total nitrogen (TN). The largest seasonal variations in TP and CHL occurred during the summer monsoon from July to August. In the reservoir, ambient TN averaged 1.67 mg $L^{-1}$ and the TN:TP ratio averaged 88.04, indicating that nitrogen was unlikely to be limiting, whereas phosphorus limitation was evident. The Trophic State Index (TSI), based on CHL, TP, and SD, varied by zone and season. Mean TSI (TP) in the riverine zone was 62 during the monsoon, indicating a hypertrophic condition, whereas the mean was 40 in the lacustrine zone, indicating nearly oligotrophic conditions. TSI (CHL) peaked in the transition zone during the monsoon. The deviation analysis of TSI showed that about 65% of the TSI (CHL)-TSI (TP) and TSI (CHL)-TSI (SD) values were less than zero, with a minimum of -42, indicating an effect of inorganic turbidity on algal growth in the reservoir. Correlation analysis of CHL vs. SD showed a greater correlation coefficient in the transition zone ($r = -0.47$, $p < 0.001$) than in the other two zones ($r \leq -0.40$, $p < 0.001$). The TP-CHL correlation was greatest in the lacustrine zone, where TP was also lowest, indicating the lowest algal biomass yield in the lacustrine zone. Overall, the data suggest that the chlorophyll yield per unit of nutrient clearly differs along the longitudinal gradient, so management strategies such as cross-sectional modelling should be developed for each zone.
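
A trophic state index puts each variable on a common logarithmic scale so that indices can be compared and subtracted. The sketch below assumes Carlson's (1977) formulas (the abstract does not state which TSI formulation was used), with SD in metres and CHL and TP in µg/L; the input values are made up. A negative deviation means chlorophyll is lower than phosphorus or transparency alone would predict, which the study attributes to inorganic turbidity.

```python
import math
from typing import Tuple


def tsi_sd(sd_m: float) -> float:
    """Carlson (1977) TSI from Secchi depth in metres."""
    return 60.0 - 14.41 * math.log(sd_m)


def tsi_chl(chl_ug_l: float) -> float:
    """Carlson (1977) TSI from chlorophyll-a in ug/L."""
    return 9.81 * math.log(chl_ug_l) + 30.6


def tsi_tp(tp_ug_l: float) -> float:
    """Carlson (1977) TSI from total phosphorus in ug/L."""
    return 14.42 * math.log(tp_ug_l) + 4.15


def deviations(sd_m: float, chl_ug_l: float, tp_ug_l: float) -> Tuple[float, float]:
    """Return (TSI(CHL) - TSI(TP), TSI(CHL) - TSI(SD)).

    Negative values mean algal biomass falls short of what TP or transparency predicts.
    """
    c = tsi_chl(chl_ug_l)
    return c - tsi_tp(tp_ug_l), c - tsi_sd(sd_m)


# Made-up turbid-zone means: Secchi depth 0.5 m, CHL 5 ug/L, TP 100 ug/L.
tsi = {"SD": tsi_sd(0.5), "CHL": tsi_chl(5.0), "TP": tsi_tp(100.0)}
dev_tp, dev_sd = deviations(0.5, 5.0, 100.0)
print({k: round(v, 1) for k, v in tsi.items()}, round(dev_tp, 1), round(dev_sd, 1))
```

In this toy case both deviations are strongly negative: high phosphorus and a shallow Secchi depth with modest chlorophyll, the signature of non-algal turbidity.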

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.107-120 / 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has limitations, including scalability and sparsity problems. The scalability problem arises when the numbers of users and items become very large: as they grow on real-world e-commerce sites, CF cannot scale up because finding neighbors in the user-item matrix requires a large amount of computation time. Sparsity is a common problem in most recommender systems because users generally evaluate only a small portion of all items. The cold-start problem is a special case of the sparsity problem that occurs when users or items are newly added to the system with no ratings at all. When users' preference evaluation data are sparse, two users or items are unlikely to have common ratings, so CF ends up predicting ratings from a very limited number of similar users. Moreover, it may produce biased recommendations because similarity weights may be estimated from only a small portion of the rating data. In this study, we identify a novel limitation of conventional CF: it does not consider qualitative and emotional information about users in the recommendation process, because it uses only users' preference scores from the user-item matrix. To address this limitation, this study proposes a cluster-indexing CF model with structural hole analysis for recommendations. In general, a structural hole is a position that connects two separate actors without any redundant connections in the network. The actor who occupies a structural hole can easily access non-redundant, diverse, and fresh information.
Therefore, the actor who occupies a structural hole may be an important person in the focal network and may be the representative person of the focal subgroup in the network. Thus, his or her characteristics may represent the general characteristics of the users in that subgroup. In this sense, structural hole analysis can distinguish friends from strangers of the focal user. This study uses structural hole analysis to select the structural holes in subgroups as initial seeds for a cluster analysis. First, we gather data on users' preference ratings for items and their social network information, using a data collection system developed for this purpose. Then, we perform structural hole analysis to find the structural holes of the social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, this study makes recommendations using CF within each user's cluster and compares the recommendation performance of the comparative models. To evaluate the proposed model, we combine the results of two experiments. The first experiment is the structural hole analysis, for which this study employs UCINET version 6, a software package for the analysis of social network data. The second performs the modified clustering and CF using the result of the cluster analysis; for it, we develop an experimental system using VBA (Visual Basic for Applications) in Microsoft Excel 2007. For the modified clustering experiment, clustering is based on a similarity measure, the Pearson correlation between user preference rating vectors. In addition, this study uses the 'all-but-one' approach for the CF experiment. To validate the effectiveness of the proposed model, we apply three comparative types of CF models to the same dataset.
The experimental results show that the proposed model outperforms the other comparative models. In particular, statistical significance tests show that the proposed model performs significantly better than the two comparative models that use cluster analysis. However, the difference between the proposed model and the naive model is not statistically significant.
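
The cluster-indexing step can be sketched in three functions: Pearson similarity between rating vectors, assignment of each user to the most similar seed, and mean-centred CF restricted to the user's cluster. This is an illustrative sketch under assumptions: the seed users stand in for structural holes found beforehand (the study used UCINET for that step), clustering is a single nearest-seed assignment rather than the authors' modified clustering, the prediction formula is the standard mean-centred weighted average, and the ratings are toy data.

```python
from math import sqrt
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = [i for i in u if i in v]
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = sqrt(sum((u[i] - mu) ** 2 for i in common) * sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def assign_clusters(ratings: Ratings, seeds: List[str]) -> Dict[str, str]:
    """Assign each user to the seed (a structural-hole user) it correlates with most."""
    return {
        user: user if user in seeds
        else max(seeds, key=lambda s: pearson(ratings[user], ratings[s]))
        for user in ratings
    }


def predict(ratings: Ratings, clusters: Dict[str, str], user: str, item: str) -> float:
    """Mean-centred, similarity-weighted CF using only neighbours in the user's cluster."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, items in ratings.items():
        if other == user or clusters[other] != clusters[user] or item not in items:
            continue
        w = pearson(ratings[user], items)
        mean_o = sum(items.values()) / len(items)
        num += w * (items[item] - mean_o)
        den += abs(w)
    return mean_u + num / den if den else mean_u


ratings = {
    "s1": {"a": 5, "b": 4, "c": 1, "d": 2},
    "s2": {"a": 1, "b": 2, "c": 5, "d": 4},
    "u1": {"a": 4, "b": 5, "c": 1},
    "u2": {"a": 2, "b": 1, "c": 4},
}
clusters = assign_clusters(ratings, seeds=["s1", "s2"])
print(clusters)
print(predict(ratings, clusters, "u1", "d"))
```

For an all-but-one evaluation, each known rating would be hidden in turn and compared with the value `predict` returns for it.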

Design of Deep Learning-based Tourism Recommendation System Based on Perceived Value and Behavior in Intelligent Cloud Environment (지능형 클라우드 환경에서 지각된 가치 및 행동의도를 적용한 딥러닝 기반의 관광추천시스템 설계)

  • Moon, Seok-Jae;Yoo, Kyoung-Mi
    • Journal of the Korean Applied Science and Technology / v.37 no.3 / pp.473-483 / 2020
  • This paper proposes a tourism recommendation system for an intelligent cloud environment that uses information on tourist behavior combined with perceived value. The proposed system applies tourist information, together with empirical analysis information reflecting tourists' perceived value in their behavior, to a tourism recommendation system built with wide and deep learning technology. To this end, the various tourist information that can be collected was collected and analyzed, together with the values that tourists usually perceive and their behavioral intentions. The system provides empirical information by analyzing the associations among tourism information, perceived value, and behavior and mapping them onto tourism platforms used in various fields. In addition, the paper presents the pipeline operation of the recommendation system, whose wide and deep learning model achieves both memorization and generalization in a single model by jointly training linear model components and neural network components. With the wide and deep learning model applied, the app subscription rate on the visiting page of a tourism-related app store increased by 3.9% compared with the control group. Another 1% group was served a model that used the same variables but only the deep side of the neural network structure, and the wide and deep model achieved a 1% higher subscription rate than this deep-only model. In addition, the offline area under the receiver operating characteristic curve (AUC) measured on the dataset was somewhat higher for the wide and deep model, although its impact was greater on online traffic.
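
The wide and deep idea can be shown with a single forward pass: a linear term over sparse (wide) features is added to the output of a small neural network over dense (deep) features, and the sum goes through a sigmoid. The sketch below is a hypothetical illustration with one hidden layer and hand-set weights; it is not the paper's model, and it omits the joint training described in the text.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def wide_and_deep_proba(
    wide_x: Sequence[float],
    deep_x: Sequence[float],
    w_wide: Sequence[float],
    hidden_w: Sequence[Sequence[float]],
    hidden_b: Sequence[float],
    w_out: Sequence[float],
    bias: float,
) -> float:
    """P(y = 1) = sigmoid(w_wide . x_wide + w_out . relu(W x_deep + b) + bias).

    wide_x: sparse or cross-product features (the linear "wide" part, for memorization).
    deep_x: dense or embedded features fed to one hidden layer (the "deep" part, for generalization).
    """
    hidden = relu([dot(row, deep_x) + b for row, b in zip(hidden_w, hidden_b)])
    logit = dot(w_wide, wide_x) + dot(w_out, hidden) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hand-set toy weights; in a real system both parts are trained jointly on logged outcomes.
params = dict(
    w_wide=[0.5, 0.2],
    hidden_w=[[0.1, 0.2], [-1.0, 0.0]],
    hidden_b=[0.0, 0.0],
    w_out=[2.0, 3.0],
)
p = wide_and_deep_proba([1.0, 0.0], [1.0, 2.0], bias=-1.5, **params)
print(f"predicted subscription probability: {p:.3f}")
```

Because both parts feed one logit, a gradient step on the shared loss updates the wide weights and the network weights together, which is what distinguishes joint training from ensembling two separately trained models.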

Participation Level in Online Knowledge Sharing: Behavioral Approach on Wikipedia (온라인 지식공유의 참여정도: 위키피디아에 대한 행태적 접근)

  • Park, Hyun Jung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.97-121 / 2013
  • With the growing importance of knowledge for sustainable competitive advantage and innovation in a volatile environment, much research on knowledge sharing has been conducted. However, previous research has mostly relied on questionnaire surveys, which are subject to respondents' inherent perceptual errors. The current research examines the relationships between primary participant behaviors and the participation level in knowledge sharing, based on online user behavior on Wikipedia, a representative community for online knowledge collaboration. Without users' participation in knowledge sharing, knowledge collaboration for creating knowledge cannot succeed. Meanwhile, the editing patterns of Wikipedia users are diverse, resulting in different revisiting periods for the same number of edits and thus in varying amounts of shared knowledge. Therefore, we examined the participation level of knowledge sharing from two angles: the number of edits and the revisiting period. The behavioral dimensions affecting the level of participation in knowledge sharing include article talk for public discussion, user talk for private messaging, and community registration, all of which are observable on the wiki platform. Public discussion takes place on article talk pages arranged for exchanging ideas about each article's topic. An article talk page is often divided into several sections, each mainly addressing a specific type of issue raised during article development. Everything from opinions on relatively trivial matters, such as what text, links, or images should be added or removed and how they should be restructured, to profound professional insights is shared, negotiated, and improved over the course of discussion. Wikipedia also provides personal user talk pages as a private messaging tool.
On these pages, diverse personal messages such as casual greetings, stories about activities on Wikipedia, and ordinary affairs of life are exchanged. Anyone who wants to communicate with another person visits that person's user talk page and leaves a message. Wikipedia articles are assessed according to seven quality grades, of which the featured article level is the highest. The dataset includes participants' behavioral data related to 2,978 articles that have reached the featured article level, with the articles' editing histories, their article talk histories, and the user talk histories extracted from user talk pages for each article. The analysis period runs from the initiation of each article until its promotion to the featured article level. The number of edits represents the total number of times a participant took part in editing an article, and the revisiting period is the time difference between the first and last edits. First, the participation levels of each user category, classified according to the behavioral dimensions, were analyzed and compared. Then, robust regressions were conducted on the relationships between independent variables reflecting the degree of behavioral characteristics and the dependent variable representing the participation level. In particular, by adopting a motivational theory suited to the online environment in setting up the research hypotheses, this work suggests a theoretical framework for the participation level of online knowledge sharing. Beyond some theoretical implications, this work reached the following practical behavioral results. First, both public discussion and private messaging positively affect the participation level in knowledge sharing. Second, public discussion exerts greater influence than private messaging on the participation level.
Third, a synergy effect of public discussion and private messaging on the number of edits was found, whereas a rather weak negative interaction effect on the revisiting period was observed. Fourth, community registration has a significant impact on the revisiting period but not on the number of edits. Fifth, for relations formed through private messaging, the frequency or depth of a relation was shown to be more critical to the participation level than its scope.
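
The two dependent variables can be computed directly from an article's edit history. The sketch below assumes the measures are taken per participant per article and that edits after promotion to featured status are dropped to match the analysis window; the input format (user name and ISO-8601 timestamp pairs) and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def participation_levels(
    edits: Iterable[Tuple[str, str]], promoted_at: Optional[str] = None
) -> Dict[str, Tuple[int, float]]:
    """Map each user to (number of edits, revisiting period in days) for one article.

    edits: (user, ISO-8601 timestamp) pairs from the article's editing history.
    promoted_at: optional featured-article promotion time; later edits are ignored.
    """
    cutoff = datetime.fromisoformat(promoted_at) if promoted_at else None
    times: Dict[str, List[datetime]] = defaultdict(list)
    for user, ts in edits:
        t = datetime.fromisoformat(ts)
        if cutoff is None or t <= cutoff:
            times[user].append(t)
    # Revisiting period = time between a user's first and last edit.
    return {
        user: (len(stamps), (max(stamps) - min(stamps)).total_seconds() / 86400.0)
        for user, stamps in times.items()
    }


history = [
    ("alice", "2006-01-01T00:00:00"),
    ("bob", "2006-01-02T12:00:00"),
    ("alice", "2006-01-11T00:00:00"),
    ("alice", "2006-01-05T00:00:00"),
]
print(participation_levels(history))
print(participation_levels(history, promoted_at="2006-01-06T00:00:00"))
```

Two users with the same edit count can differ sharply in revisiting period, which is why the study treats the two measures separately.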