• Title/Summary/Keyword: machine-learning

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong; Jun, Seung-Pyo; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized, and the importance of information classification is also increasing for the efficient management of the exponentially growing volume of digital information. In this study, we sought to automatically classify and provide tailored information that can help companies decide whether to pursue technology commercialization. To this end, we propose a method for classifying information based on the Korean Standard Industrial Classification (KSIC), which reflects the business characteristics of enterprises. The classification of information or documents has largely relied on machine learning, but there is not enough training data categorized by KSIC. This study therefore applied a method of calculating similarity between documents. Specifically, we propose a method and a model for presenting the most appropriate KSIC code by collecting the explanatory text of each KSIC code and calculating its similarity to the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparison with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. The verification showed that the highest agreement was obtained when the LT method, a variant of the TF-IDF weighting scheme, was applied. In this case, the first-ranked KSIC code agreed with the concordance table in 53% of cases, and the cumulative agreement within the top five ranks was 76%. This confirms that the technology, industry, and market information that SMEs need can be classified by KSIC in a more quantitative and objective way. In addition, the methods and results of this study can serve as basic data to support the qualitative judgment of experts when building concordance tables between heterogeneous classification systems.
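
The similarity step described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `ksic_descriptions` texts are invented placeholders, and `sublinear_tf` log weighting is used as a rough stand-in for the LT variant of TF-IDF mentioned in the abstract, not as the authors' exact formula.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical explanatory texts for a few KSIC codes (placeholders, not the official KSIC wording).
ksic_descriptions = {
    "C26": "manufacture of electronic components, computers, visual and communication equipment",
    "J62": "computer programming, system integration and management services",
    "M70": "research and development in natural sciences and engineering",
}

def rank_ksic_codes(document: str, top_k: int = 5):
    """Rank candidate KSIC codes by TF-IDF cosine similarity to the input document."""
    codes = list(ksic_descriptions)
    corpus = [ksic_descriptions[c] for c in codes] + [document]
    # sublinear_tf=True applies 1 + log(tf), a logarithmic term weighting in the TF-IDF family.
    tfidf = TfidfVectorizer(sublinear_tf=True).fit_transform(corpus)
    doc_vec, code_vecs = tfidf[len(codes)], tfidf[:len(codes)]
    sims = cosine_similarity(doc_vec, code_vecs).ravel()
    return sorted(zip(codes, sims), key=lambda x: x[1], reverse=True)[:top_k]

print(rank_ksic_codes("machine learning software development for embedded communication devices"))
```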

Usefulness of Data Mining in Criminal Investigation (데이터 마이닝의 범죄수사 적용 가능성)

  • Kim, Joon-Woo; Sohn, Joong-Kweon; Lee, Sang-Han
    • Journal of forensic and investigative science / v.1 no.2 / pp.5-19 / 2006
  • Data mining is an information extraction activity that discovers hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Law enforcement agencies deal with massive amounts of data when investigating crime, and the volume is increasing as computerized data processing develops. We now face the new challenge of discovering knowledge in those data. Data mining can be applied in criminal investigation to find offenders by analyzing complex, relational data structures and free text such as criminal records or statements. This study aimed to evaluate the possible applications and limitations of data mining in practical criminal investigation. Clustering of criminal cases is feasible for habitual crimes such as fraud and burglary, where data mining can identify crime patterns. Neural network modeling, one of the tools of data mining, can be applied to matching a suspect's photograph or handwriting against those of convicts, or to criminal profiling. A case study of insurance fraud in practice showed that data mining is also useful for organized crime such as gang activity, terrorism, and money laundering. However, the products of data mining in criminal investigation should be evaluated with caution, because data mining offers clues rather than conclusions. Legal regulation is needed to control abuse by law enforcement agencies and to protect personal privacy and human rights.
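
As a concrete illustration of the clustering idea mentioned above, the sketch below groups burglary cases by simple modus operandi features. It is a minimal, hypothetical example (the feature names and data are invented), not a description of any system actually used by investigators.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical case features: [hour of offense, forced-entry flag, items stolen, distance from known area (km)]
cases = np.array([
    [2, 1, 3, 1.2],
    [3, 1, 4, 0.8],
    [14, 0, 1, 6.5],
    [15, 0, 2, 7.0],
    [1, 1, 5, 1.0],
])

X = StandardScaler().fit_transform(cases)                      # put features on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cases with the same label share a similar pattern and may warrant linked investigation
```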

Design Evaluation Model Based on Consumer Values: Three-step Approach from Product Attributes, Perceived Attributes, to Consumer Values (소비자 가치기반 디자인 평가 모형: 제품 속성, 인지 속성, 소비자 가치의 3단계 접근)

  • Kim, Keon-Woo; Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.57-76 / 2017
  • Recently, consumer needs have been diversifying as information technologies evolve rapidly. Many IT devices such as smartphones and tablet PCs are being launched in line with this trend. While IT devices competed on technical advances and improvements a few years ago, the situation has changed: there is little difference in functional aspects, so companies are trying to differentiate IT devices through their appearance and design. Consumers likewise consider design a more important factor when deciding on a smartphone. Smartphones have become fashion items that reveal consumers' own characteristics and personality. As the design and appearance of smartphones become more important, it is necessary to examine the consumer values derived from the design and appearance of IT devices. Furthermore, it is crucial to clarify the mechanism of consumers' design evaluation and to develop a design evaluation model based on that mechanism. As the influence of design has continued to grow, many design-related studies have been carried out, and they can be classified into three main streams. The first focuses on the role of design from the perspective of marketing and communication. The second seeks effective and appealing designs from the perspective of industrial design. The last examines the consumer values created by a product design, that is, consumers' perceptions or feelings when they look at and handle it. These studies have dealt with consumer values to some extent, but they either do not include product attributes or do not cover the whole process and mechanism from product attributes to consumer values. In this study, we develop a holistic design evaluation model based on consumer values, using a three-step approach from product attributes through perceived attributes to consumer values. Product attributes are the real, physical characteristics of each smartphone: bezel, length, width, thickness, weight, and curvature. Perceived attributes are derived from consumers' perception of product attributes. We consider perceived size of device, perceived size of display, perceived thickness, perceived weight, perceived bezel (top-bottom / left-right side), perceived curvature of edge, perceived curvature of back side, gap of each part, perceived gloss, and perceived screen ratio. These are factorized into six clusters named 'Size,' 'Slimness,' 'No-Frame,' 'Roundness,' 'Screen Ratio,' and 'Looseness.' We conducted qualitative research to identify consumer values, which are categorized into two groups: look values and feel values. We identified the look values 'Silhouette,' 'Neatness,' 'Attractiveness,' 'Polishing,' 'Innovativeness,' 'Professionalism,' 'Intellectualness,' 'Individuality,' and 'Distinctiveness,' and the feel values 'Stability,' 'Comfortableness,' 'Grip,' 'Solidity,' 'Non-fragility,' and 'Smoothness.' These are factorized into five key values: 'Sleek Value,' 'Professional Value,' 'Unique Value,' 'Comfortable Value,' and 'Solid Value.' Finally, we developed the holistic design evaluation model by analyzing the relationships among product attributes, perceived attributes, and consumer values. This study has several theoretical and practical contributions. First, we identified consumer values relevant to design evaluation and an implicit chain relationship from objective, physical characteristics to subjective, mental evaluation. That is, the model explains the mechanism of design evaluation in consumers' minds. Second, we suggest a general design evaluation process from product attributes through perceived attributes to consumer values, a methodology adaptable not only to smartphones but also to other IT products. Practically, this model can support decision-making when companies initiate new product development and can help product designers focus their limited resources. Moreover, if the model is combined with machine learning on consumers' purchasing data, most-preferred values, sales data, and so on, it could evolve into an intelligent design decision support system.
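
To make the three-step structure concrete, the following sketch factor-analyzes perceived-attribute ratings and then links product attributes to perceived factors and perceived factors to consumer values with two regression stages. The data and variable names are entirely synthetic assumptions; this is only one plausible way to estimate such a chain, not the authors' procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Step 1: physical product attributes for 200 hypothetical devices
# columns = [width_mm, thickness_mm, weight_g, bezel_mm, edge_curvature]
product_attrs = rng.normal(size=(200, 5))

# Step 2: survey ratings of 10 perceived attributes (perceived size, slimness, roundness, ...)
perceived_ratings = rng.normal(size=(200, 10))

# Step 3: survey ratings of 5 consumer values (sleek, professional, unique, comfortable, solid)
value_ratings = rng.normal(size=(200, 5))

# Reduce perceived attributes to a small number of factors ("Size", "Slimness", ...)
perceived_factors = FactorAnalysis(n_components=6, random_state=0).fit_transform(perceived_ratings)

# Link product attributes -> perceived factors, then perceived factors -> consumer values
stage1 = LinearRegression().fit(product_attrs, perceived_factors)
stage2 = LinearRegression().fit(perceived_factors, value_ratings)

print(stage1.coef_.shape, stage2.coef_.shape)  # (6, 5) and (5, 6): the two links in the chain
```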

A Study on the Stereotype of ICT SMEs' R&D: Empirical Evidence from Korea (ICT 중소기업 R&D의 스테레오타입에 대한 연구 : 한국의 사례를 중심으로)

  • Jun, Seung-pyo; Choi, San; Jung, JaeOong
    • Journal of Korea Technology Innovation Society / v.20 no.2 / pp.334-367 / 2017
  • The ICT industry has been the main driver of Korea's economy, with international competitiveness, and is expected to be the growth engine that will revitalize the currently depressed economy. A broad range of perspectives and opinions on the industry exist in Korea and overseas. Some of these are stereotypes, not all of which are based on objective evidence. Stereotypes refer to widely held fixed opinions about a specific group and do not necessarily have negative connotations. However, they should not be taken lightly, because they can substantially affect the decision-making process. In this regard, this study sought to review the stereotypes of the ICT industry and identify objective and relative stereotypes. A decision-tree analysis was conducted on survey results from 3,300 small and medium-sized enterprises (SMEs) in order to identify the characteristics that distinguish Korean ICT companies from other technology companies. The decision-tree analysis, a data mining process based on machine learning, took a total of 291 variables into account across 10 subject areas, such as corporate business in general, technology development activities, and organization and people in technology development. Having identified the variables that distinguish ICT companies from other technology companies, the study then compiled a list of objective stereotypes of ICT companies. The findings on the stereotypes of Korean ICT companies are as follows. First, the companies need technology policies that help with R&D planning and market penetration. Second, policies must better support companies working to sell new products or explore new business. Third, the companies need policies that support secure protection of development outcomes and proper management of IP rights. Fourth, the administrative procedures related to governmental support for ICT companies' R&D projects must be simplified. It is hoped that the outcome of this study will provide meaningful guidance in the establishment, implementation, and evaluation of technology policies for ICT SMEs, particularly to policymakers and researchers in relevant government agencies who determine R&D policies for ICT SMEs.
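
The decision-tree step described above can be illustrated with a small sketch. It uses synthetic data and invented variable names (the real study used 291 survey variables), so it is an assumption about the general workflow rather than a reproduction of the analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# Synthetic survey: 3,300 SMEs, a handful of placeholder variables instead of the real 291.
n = 3300
X = rng.normal(size=(n, 4))                      # e.g., R&D intensity, export ratio, R&D staff, IP count
feature_names = ["rnd_intensity", "export_ratio", "rnd_staff", "ip_count"]
is_ict = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n)) > 0   # 1 = ICT SME, 0 = other tech SME

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0).fit(X, is_ict)

# The split rules show which variables separate ICT SMEs from other technology SMEs.
print(export_text(tree, feature_names=feature_names))
```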

A Comparative Study on the Possibility of Land Cover Classification of the Mosaic Images on the Korean Peninsula (한반도 모자이크 영상의 토지피복분류 활용 가능성 탐색을 위한 비교 연구)

  • Moon, Jiyoon; Lee, Kwang Jae
    • Korean Journal of Remote Sensing / v.35 no.6_4 / pp.1319-1326 / 2019
  • KARI (Korea Aerospace Research Institute) operates the government satellite information application consultation to cope with the ever-increasing demand for satellite images in the public sector, and carries out various support projects, including the annual generation and provision of mosaic images of the Korean Peninsula, to enhance user convenience and promote the use of satellite images. In particular, the government has sought to increase the utilization of the Korean Peninsula mosaic images and to classify and update them so that users can easily apply them in their work. However, it is necessary to test and verify whether classification results derived from the mosaic images can be used in the field, since the original spectral information is distorted during pan-sharpening and color balancing and only the R, G, and B bands are provided. Therefore, in this study, the reliability of the classification result of the mosaic image was compared with that of a KOMPSAT-3 image. The study found that the accuracy of the KOMPSAT-3 classification was 81-86% (overall accuracy about 85%), while the accuracy of the mosaic image classification was 69-72% (overall accuracy about 72%). This is interpreted as resulting not only from the distortion of the original spectral information in the pan-sharpening and mosaicking processes, but also from the fact that NDVI and NDWI information could be extracted from the KOMPSAT-3 image but not from the mosaic image, which provides only the three color bands (R, G, B). Although it is deemed inadequate to distribute classification results extracted from mosaic images at present, it will be necessary to explore ways to minimize the distortion of spectral information when producing mosaic images, to develop classification techniques suitable for mosaic images, and to provide NIR band information. In addition, the utilization of images with limited spectral information could increase in the future if related research continues, such as comparative analysis of classification results by geomorphological characteristics and the development of machine learning methods for classifying objects of interest in imagery.
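
The NDVI and NDWI indices mentioned above are simple band ratios that require an NIR band, which is why they could not be derived from the RGB-only mosaic. A minimal sketch of their computation is shown below; the arrays are random placeholders standing in for red, green, and NIR reflectance bands.

```python
import numpy as np

# Placeholder reflectance bands; in practice these come from the red, green, and NIR channels of the image.
red   = np.random.rand(512, 512).astype(np.float32)
green = np.random.rand(512, 512).astype(np.float32)
nir   = np.random.rand(512, 512).astype(np.float32)

eps = 1e-6  # avoid division by zero

ndvi = (nir - red) / (nir + red + eps)      # vegetation index: high over dense vegetation
ndwi = (green - nir) / (green + nir + eps)  # water index: high over open water

# These layers can be stacked with the spectral bands as extra features for land cover classification.
features = np.dstack([red, green, nir, ndvi, ndwi])
print(features.shape)  # (512, 512, 5)
```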

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong; Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to users, and thus recommender systems in online shopping stores are known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences cause disappointment and wasted time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting users' preferences more precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions; 3,167 transactions remained after deleting null records. We transformed the categorical variables into dummy variables and excluded outliers. The proposed model consists of two steps. The first step predicts customers who have a high likelihood of purchasing products in the online shopping store. In this step, we use logistic regression, decision trees, and artificial neural networks to predict such customers for each product group, applying these data mining techniques with SAS E-Miner. We partition the data into modeling and validation sets for the logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network; the validation set is the same for all experiments. We then combine the results of the individual predictors using multi-model ensemble techniques such as bagging and bumping. Bagging is short for "bootstrap aggregation"; it combines outputs from several machine learning models to raise the performance and stability of prediction or classification, and is a special form of averaging. Bumping is short for "bootstrap umbrella of model parameters"; it keeps only the model with the lowest error. The results show that bumping outperforms bagging and the individual predictors except for the "Poster" product group, for which the artificial neural network performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extracted thirty-one association rules based on lift, support, and confidence, setting the minimum transaction frequency (support) to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We also excluded rules with a lift value below 1. After removing duplicates, fifteen association rules remained: eleven involve products within the "Office Supplies" group, one links "Office Supplies" with "Fashion," and the other three link "Office Supplies" with "Home Decoration." Finally, the proposed recommender system provides a recommendation list to the appropriate customers. We tested the usability of the proposed system with a prototype and real-world transaction and profile data, building the prototype with ASP, JavaScript, and Microsoft Access. We also surveyed user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The survey participants were 173 people who use MSN Messenger, Daum Café, and P2P services. User satisfaction was measured on a five-point Likert scale, and a paired-sample t-test was performed on the survey results. The results show that the proposed model outperforms random selection at the 1% significance level, meaning that users were significantly more satisfied with the recommended product list, and that the proposed system may be useful in a real-world online shopping store.
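
Bumping, as described above, fits candidate models on bootstrap resamples and keeps only the single model with the lowest error on the original data. The sketch below is a generic, hypothetical illustration of that idea with a decision tree as the base learner and synthetic stand-in data; it does not reproduce the SAS E-Miner workflow used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in purchase data

def bumping(X, y, n_boot=25):
    """Fit one model per bootstrap sample; keep the one with the lowest error on the original data."""
    best_model, best_err = None, np.inf
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap resample
        model = DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx])
        err = 1.0 - model.score(X, y)                         # error on the original (unresampled) data
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err

model, err = bumping(X, y)
print(f"selected model error on original data: {err:.3f}")
```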

Assessment of climate change impact on aquatic ecology health indices in Han river basin using SWAT and random forest (SWAT 및 random forest를 이용한 기후변화에 따른 한강유역의 수생태계 건강성 지수 영향 평가)

  • Woo, So Young; Jung, Chung Gil; Kim, Jin Uk; Kim, Seong Joon
    • Journal of Korea Water Resources Association / v.51 no.10 / pp.863-874 / 2018
  • The purpose of this study is to evaluate the impact of future climate change on the stream aquatic ecology health of the Han River watershed (34,148 km²) using SWAT (Soil and Water Assessment Tool) and random forest. Eight years (2008-2015) of spring (April to June) Aquatic ecology Health Indices (AHI), namely the Trophic Diatom Index (TDI), Benthic Macroinvertebrate Index (BMI), and Fish Assessment Index (FAI), scored (0-100) and graded (A-E) by the National Institute of Environmental Research (NIER), were used. Comparing the eight years of NIER indices with water quality (T-N, NH4, NO3, T-P, PO4) showed that the deviation of the AHI scores is large when concentrations are low, and that the AHI scores are negatively correlated with water quality when concentrations are high. Using random forest, a machine learning technique for classification analysis, the grades of the three indices were classified with precision, recall, and F1-score all above 0.81. The future SWAT hydrology and water quality results under the Korea Meteorological Administration (KMA) HadGEM3-RA RCP 4.5 and 8.5 scenarios showed that the watershed-average nitrogen-related water quality increases by up to 43.2% due to the increase in baseflow, while the phosphorus-related water quality decreases by up to 18.9% due to the decrease in surface runoff. The future FAI and BMI show slightly better index grades, while the future TDI shows a slightly worse grade. We can infer that the future TDI is more sensitive to nitrogen-related water quality, whereas the future FAI and BMI respond to phosphorus-related water quality.
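
The grade-classification step described above can be sketched as follows. The example uses synthetic water quality features and placeholder grade labels (the actual study used SWAT outputs and NIER grades), so the variable names and data are assumptions, not the study's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Synthetic stand-ins for water quality predictors: T-N, NH4, NO3, T-P, PO4
X = rng.normal(size=(800, 5))
grades = np.where(X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=800) > 0,
                  "A-B", "C-E")                               # placeholder AHI grade labels

X_tr, X_te, y_tr, y_te = train_test_split(X, grades, test_size=0.25, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Precision, recall, and F1-score per grade class, analogous to the metrics reported in the study
print(classification_report(y_te, rf.predict(X_te)))
```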

Overview and Prospective of Satellite Chlorophyll-a Concentration Retrieval Algorithms Suitable for Coastal Turbid Sea Waters (연안 혼탁 해수에 적합한 위성 클로로필-a 농도 산출 알고리즘 개관과 전망)

  • Park, Ji-Eun; Park, Kyung-Ae; Lee, Ji-Hyun
    • Journal of the Korean earth science society / v.42 no.3 / pp.247-263 / 2021
  • Climate change has been accelerating in coastal waters recently; therefore, the importance of coastal environmental monitoring is also increasing. Chlorophyll-a concentration in the surface layer of the global ocean, an important marine variable, has been retrieved for decades through various ocean color satellites and utilized in many research fields. However, the commonly used chlorophyll-a concentration algorithm is suitable only for clear water and cannot be applied to turbid waters, because significant errors are caused by differences in their constituents and optical properties. In addition, designing a standard algorithm for coastal waters is difficult because optical characteristics differ from one coastal area to another. To overcome this problem, various algorithms have been developed that account for the constituents and the variations in the optical properties of highly turbid coastal waters. Chlorophyll-a retrieval algorithms can be categorized into empirical algorithms, semi-analytic algorithms, and machine learning algorithms. These algorithms mainly use the blue-green band ratio, based on the reflectance spectrum of sea water, as their basic form. In contrast, algorithms developed for turbid water utilize the green-red band ratio, the red-near-infrared band ratio, and inherent optical properties to compensate for the effects of dissolved organic matter and suspended sediments in coastal areas. Reliable retrieval of satellite chlorophyll-a concentration in turbid waters is essential for monitoring the coastal environment and understanding changes in the marine ecosystem. Therefore, this study summarizes the pre-existing algorithms that have been used for monitoring turbid Case 2 waters and presents the problems associated with monitoring and studying the seas around the Korean Peninsula. We also summarize the prospects for future ocean color satellites, which, with the development of multispectral and hyperspectral sensors, can yield more accurate and diverse results regarding the ecological environment.
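
The empirical blue-green band-ratio form mentioned above is typically a low-order polynomial in the logarithm of the band ratio. The sketch below shows that generic form only; the coefficients are illustrative placeholders, not the tuned values of any operational algorithm, and the function name is invented.

```python
import numpy as np

def chl_band_ratio(rrs_blue: np.ndarray, rrs_green: np.ndarray,
                   coeffs=(0.3, -2.9, 1.8, 0.6, -1.4)) -> np.ndarray:
    """
    Generic empirical band-ratio algorithm:
        log10(chl) = a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4,  where X = log10(Rrs_blue / Rrs_green).
    The coefficients here are placeholders for illustration only.
    """
    x = np.log10(rrs_blue / rrs_green)
    log_chl = sum(a * x**i for i, a in enumerate(coeffs))
    return 10.0 ** log_chl  # chlorophyll-a concentration in mg m^-3

# Example: a clear-water pixel (blue reflectance > green) versus a greener coastal pixel
print(chl_band_ratio(np.array([0.010, 0.004]), np.array([0.003, 0.004])))
```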

Artificial Intelligence In Wheelchair: From Technology for Autonomy to Technology for Interdependence and Care (휠체어 탄 인공지능: 자율적 기술에서 상호의존과 돌봄의 기술로)

  • HA, Dae-Cheong
    • Journal of Science and Technology Studies / v.19 no.2 / pp.169-206 / 2019
  • This article seeks to explore new relationships and ethics of humans and technology by analyzing the cultural imaginary produced by artificial intelligence. Drawing on theoretical reflections in feminist Science and Technology Studies that understand science and technology as matters of care (Puig de la Bellacasa, 2011), this paper focuses on the fact that artificial intelligence and robots materialize a cultural imaginary of autonomy. This autonomy, defined as the capacity to adapt to a new environment through self-learning, is accepted as a way to conceptualize an authentic human or an ideal subject. However, this article argues that artificial intelligence is mediated by and dependent on invisible human labor and complex material devices, suggesting that such autonomy is closer to fiction. The recent growth of so-called 'assistant technology' shows that it differentially renders visible the care work of machines and humans: technology and its cultural imaginary hide the care work of human workers while actively displaying that of the machine. They also make autonomy and agency the ideal of humanness, leaving disabled bodies and dependency devalued. Artificial intelligence and its cultural imaginary negate the value of disabled bodies while idealizing able bodies, and end up erasing the real relationship between humans and technology as mutually dependent beings. In conclusion, the author argues that the technology we need is not one that excludes non-typical bodies and the care work of others, but one that includes them as they are; such technology empathizes responsibly with marginalized beings and encourages solidarity among fragile beings. Inspired by an art performance by the artist Sue Austin, the author finally suggests 'artificial intelligence in a wheelchair' as an alternative figuration to the currently dominant 'autonomous artificial intelligence'.

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon; Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been conducted actively. This research can be classified into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually adopted technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methodologies that can extract meaning by quantifying text, an unstructured data type that accounts for a large share of available information, have developed quickly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast a target company's stock price from news about that company. However, according to previous research, not only does news about a target company affect its stock price, but news about related companies can also affect it. Finding highly relevant companies is not easy, though, because of market-wide effects and random signals. Thus, existing studies have identified relevant companies primarily on the basis of pre-determined international industry classification standards. However, recent research shows that the Global Industry Classification Standard has varying homogeneity within sectors, so forecasting stock prices by grouping all companies in a sector together, without restricting attention to genuinely relevant companies, can hurt predictive performance. To overcome this limitation, we first apply random matrix theory together with text mining for stock prediction. When the dimension of the data is large, classical limit theorems are no longer suitable because statistical efficiency is reduced; a simple correlation analysis in financial markets therefore does not reveal the true correlation. To address this, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and to find the true correlations between companies. With the true correlations, we perform cluster analysis to find relevant companies. Based on the clustering results, we use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously: each kernel is assigned features from the financial news of either the target firm or its relevant firms. The results of this study are as follows. (1) Consistent with existing research, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlations obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, it shows that random matrix theory, used mainly in econophysics, can be combined with artificial intelligence to produce a sound methodology; this suggests that it is important not only to develop AI algorithms but also to adopt physical theory, extending existing research that integrated artificial intelligence with complex-system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue, which suggests that it is important not only to study artificial intelligence algorithms but also to construct the input values on a sound theoretical basis. Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance to one another, and suggested that relevance should be defined theoretically rather than simply taken from the GICS.
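
The random matrix theory step described above can be sketched as follows: eigenvalues of the empirical correlation matrix that fall inside the Marchenko-Pastur noise bulk are discarded, and the largest eigenvalue is treated as the market-wide mode and removed, leaving a filtered correlation matrix for clustering. The code below is a generic, hypothetical illustration with simulated returns, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 50                                  # trading days, number of firms (synthetic returns)

# Synthetic daily returns: a market factor, 5 sector factors, and idiosyncratic noise
market = rng.normal(size=(T, 1))
sector = np.repeat(rng.normal(size=(T, 5)), N // 5, axis=1)
returns = 0.4 * market + 0.4 * sector + rng.normal(size=(T, N))

C = np.corrcoef(returns, rowvar=False)           # empirical correlation matrix (N x N)
eigval, eigvec = np.linalg.eigh(C)

# Marchenko-Pastur upper edge for a pure-noise correlation matrix with aspect ratio q = T/N
q = T / N
lambda_max = (1 + 1 / np.sqrt(q)) ** 2

# Keep eigenmodes above the noise edge, excluding the largest one (treated as the market-wide mode)
signal = [i for i, lam in enumerate(eigval) if lam > lambda_max and i != int(np.argmax(eigval))]

C_filtered = np.zeros_like(C)
for i in signal:
    C_filtered += eigval[i] * np.outer(eigvec[:, i], eigvec[:, i])
np.fill_diagonal(C_filtered, 1.0)                # filtered correlations, ready for cluster analysis

print(f"{len(signal)} signal eigenmodes retained out of {N}")
```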