• Title/Summary/Keyword: Intelligent Data Analysis

Search Result 1,456, Processing Time 0.033 seconds

Analysis of Munitions Contract Work Using Process Mining (프로세스 마이닝을 이용한 군수품 계약업무 분석 : 공군 군수사 계약업무를 중심으로)

  • Joo, Yong Seon;Kim, Su Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.41-59
    • /
    • 2022
  • The timely procurement of military supplies is essential to maintain the military's operational capabilities, and contract work is the first step toward timely procurement. In addition, rapid signing of a contract enables consumers to set a leisurely delivery date and increases the possibility of budget execution, so it is essential to improve the contract process to prevent early execution of the budget and transfer or disuse. Recently, research using big data has been actively conducted in various fields, and process analysis using big data and process mining, an improvement technique, are also widely used in the private sector. However, the analysis of contract work in the military is limited to the level of individual analysis such as identifying the cause of each problem case of budget transfer and disuse contracts using the experience and fragmentary information of the person in charge. In order to improve the contract process, this study analyzed using the process mining technique with data on a total of 560 contract tasks directly contracted by the Department of Finance of the Air Force Logistics Command for about one year from November 2019. Process maps were derived by synthesizing distributed data, and process flow, execution time analysis, bottleneck analysis, and additional detailed analysis were conducted. As a result of the analysis, it was found that review/modification occurred repeatedly after request in a number of contracts. Repeated reviews/modifications have a significant impact on the delay in the number of days to complete the cost calculation, which has also been clearly revealed through bottleneck visualization. Review/modification occurs in more than 60% of the top 5 departments with many contract requests, and it usually occurs in the first half of the year when requests are concentrated, which means that a thorough review is required before requesting contracts from the required departments. In addition, the contract work of the Department of Finance was carried out in accordance with the procedures according to laws and regulations, but it was found that it was necessary to adjust the order of some tasks. This study is the first case of using process mining for the analysis of contract work in the military. Based on this, if further research is conducted to apply process mining to various tasks in the military, it is expected that the efficiency of various tasks can be derived.

Construction and Application of Intelligent Decision Support System through Defense Ontology - Application example of Air Force Logistics Situation Management System (국방 온톨로지를 통한 지능형 의사결정지원시스템 구축 및 활용 - 공군 군수상황관리체계 적용 사례)

  • Jo, Wongi;Kim, Hak-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.77-97
    • /
    • 2019
  • The large amount of data that emerges from the initial connection environment of the Fourth Industrial Revolution is a major factor that distinguishes the Fourth Industrial Revolution from the existing production environment. This environment has two-sided features that allow it to produce data while using it. And the data produced so produces another value. Due to the massive scale of data, future information systems need to process more data in terms of quantities than existing information systems. In addition, in terms of quality, only a large amount of data, Ability is required. In a small-scale information system, it is possible for a person to accurately understand the system and obtain the necessary information, but in a variety of complex systems where it is difficult to understand the system accurately, it becomes increasingly difficult to acquire the desired information. In other words, more accurate processing of large amounts of data has become a basic condition for future information systems. This problem related to the efficient performance of the information system can be solved by building a semantic web which enables various information processing by expressing the collected data as an ontology that can be understood by not only people but also computers. For example, as in most other organizations, IT has been introduced in the military, and most of the work has been done through information systems. Currently, most of the work is done through information systems. As existing systems contain increasingly large amounts of data, efforts are needed to make the system easier to use through its data utilization. An ontology-based system has a large data semantic network through connection with other systems, and has a wide range of databases that can be utilized, and has the advantage of searching more precisely and quickly through relationships between predefined concepts. In this paper, we propose a defense ontology as a method for effective data management and decision support. In order to judge the applicability and effectiveness of the actual system, we reconstructed the existing air force munitions situation management system as an ontology based system. It is a system constructed to strengthen management and control of logistics situation of commanders and practitioners by providing real - time information on maintenance and distribution situation as it becomes difficult to use complicated logistics information system with large amount of data. Although it is a method to take pre-specified necessary information from the existing logistics system and display it as a web page, it is also difficult to confirm this system except for a few specified items in advance, and it is also time-consuming to extend the additional function if necessary And it is a system composed of category type without search function. Therefore, it has a disadvantage that it can be easily utilized only when the system is well known as in the existing system. The ontology-based logistics situation management system is designed to provide the intuitive visualization of the complex information of the existing logistics information system through the ontology. In order to construct the logistics situation management system through the ontology, And the useful functions such as performance - based logistics support contract management and component dictionary are further identified and included in the ontology. In order to confirm whether the constructed ontology can be used for decision support, it is necessary to implement a meaningful analysis function such as calculation of the utilization rate of the aircraft, inquiry about performance-based military contract. Especially, in contrast to building ontology database in ontology study in the past, in this study, time series data which change value according to time such as the state of aircraft by date are constructed by ontology, and through the constructed ontology, It is confirmed that it is possible to calculate the utilization rate based on various criteria as well as the computable utilization rate. In addition, the data related to performance-based logistics contracts introduced as a new maintenance method of aircraft and other munitions can be inquired into various contents, and it is easy to calculate performance indexes used in performance-based logistics contract through reasoning and functions. Of course, we propose a new performance index that complements the limitations of the currently applied performance indicators, and calculate it through the ontology, confirming the possibility of using the constructed ontology. Finally, it is possible to calculate the failure rate or reliability of each component, including MTBF data of the selected fault-tolerant item based on the actual part consumption performance. The reliability of the mission and the reliability of the system are calculated. In order to confirm the usability of the constructed ontology-based logistics situation management system, the proposed system through the Technology Acceptance Model (TAM), which is a representative model for measuring the acceptability of the technology, is more useful and convenient than the existing system.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.

Rough Set Analysis for Stock Market Timing (러프집합분석을 이용한 매매시점 결정)

  • Huh, Jin-Nyung;Kim, Kyoung-Jae;Han, In-Goo
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.77-97
    • /
    • 2010
  • Market timing is an investment strategy which is used for obtaining excessive return from financial market. In general, detection of market timing means determining when to buy and sell to get excess return from trading. In many market timing systems, trading rules have been used as an engine to generate signals for trade. On the other hand, some researchers proposed the rough set analysis as a proper tool for market timing because it does not generate a signal for trade when the pattern of the market is uncertain by using the control function. The data for the rough set analysis should be discretized of numeric value because the rough set only accepts categorical data for analysis. Discretization searches for proper "cuts" for numeric data that determine intervals. All values that lie within each interval are transformed into same value. In general, there are four methods for data discretization in rough set analysis including equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and na$\ddot{i}$ve and Boolean reasoning-based discretization. Equal frequency scaling fixes a number of intervals and examines the histogram of each variable, then determines cuts so that approximately the same number of samples fall into each of the intervals. Expert's knowledge-based discretization determines cuts according to knowledge of domain experts through literature review or interview with experts. Minimum entropy scaling implements the algorithm based on recursively partitioning the value set of each variable so that a local measure of entropy is optimized. Na$\ddot{i}$ve and Booleanreasoning-based discretization searches categorical values by using Na$\ddot{i}$ve scaling the data, then finds the optimized dicretization thresholds through Boolean reasoning. Although the rough set analysis is promising for market timing, there is little research on the impact of the various data discretization methods on performance from trading using the rough set analysis. In this study, we compare stock market timing models using rough set analysis with various data discretization methods. The research data used in this study are the KOSPI 200 from May 1996 to October 1998. KOSPI 200 is the underlying index of the KOSPI 200 futures which is the first derivative instrument in the Korean stock market. The KOSPI 200 is a market value weighted index which consists of 200 stocks selected by criteria on liquidity and their status in corresponding industry including manufacturing, construction, communication, electricity and gas, distribution and services, and financing. The total number of samples is 660 trading days. In addition, this study uses popular technical indicators as independent variables. The experimental results show that the most profitable method for the training sample is the na$\ddot{i}$ve and Boolean reasoning but the expert's knowledge-based discretization is the most profitable method for the validation sample. In addition, the expert's knowledge-based discretization produced robust performance for both of training and validation sample. We also compared rough set analysis and decision tree. This study experimented C4.5 for the comparison purpose. The results show that rough set analysis with expert's knowledge-based discretization produced more profitable rules than C4.5.

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • After emergence of Internet, social media with highly interactive Web 2.0 applications has provided very user friendly means for consumers and companies to communicate with each other. Users have routinely published contents involving their opinions and interests in social media such as blogs, forums, chatting rooms, and discussion boards, and the contents are released real-time in the Internet. For that reason, many researchers and marketers regard social media contents as the source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from Social media content. In particular, opinion mining and sentiment analysis, as a technique to extract, classify, understand, and assess the opinions implicit in text contents, are frequently applied into social media content analysis because it emphasizes determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques and tools have been presented by these researchers. However, we have found some weaknesses from their methods which are often technically complicated and are not sufficiently user-friendly for helping business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conduct opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using Social media content from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose target social media. Each target media requires different ways for analysts to gain access. There are open-API, searching tools, DB2DB interface, purchasing contents, and so son. Second phase is pre-processing to generate useful materials for meaningful analysis. If we do not remove garbage data, results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase where the cleansed social media content set is to be analyzed. The qualified data set includes not only user-generated contents but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trends analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of analysis results. The major focus and purpose of this phase are to explain results of analysis and help users to comprehend its meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% of market share; the firm has kept No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of contents including blogs, forum contents and news articles. After collecting social media content data, we generated instant noodle business specific language resources for data manipulation and analysis using natural language processing. In addition, we tried to classify contents in more detail categories such as marketing features, environment, reputation, etc. In those phase, we used free ware software programs such as TM, KoNLP, ggplot2 and plyr packages in R project. As the result, we presented several useful visualization outputs like domain specific lexicons, volume and sentiment graphs, topic word cloud, heat maps, valence tree map, and other visualized images to provide vivid, full-colored examples using open library software packages of the R project. Business actors can quickly detect areas by a swift glance that are weak, strong, positive, negative, quiet or loud. Heat map is able to explain movement of sentiment or volume in categories and time matrix which shows density of color on time periods. Valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation with a hierarchical structure since tree-map can present buzz volume and sentiment with a visualized result in a certain period. This case study offers real-world business insights from market sensing which would demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to on-going changes in the market. We believe our approach can provide practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in food industry but in other industries as well.

Exploring a Balanced Share of Slow Charging Options by Places Based on Heterogeneous Travel and Charging Behavior of Electric Vehicle Users (장소별 완속충전기 적정 보급 비율에 관한 연구 : 전기차 이용자의 통행 및 충전행태에 따른 이질성을 중심으로)

  • Jae Hyun Lee;Seo Youn Yoon;Hyeonmi Kim
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.6
    • /
    • pp.21-35
    • /
    • 2022
  • With the support of local and central governments, various incentive policies for "green" cars have been established, and the number of electric vehicle users has been rapidly increasing in recent years. As a result, much attention is being given to establishing a user-centered charging infrastructure. A standard for the number of electric vehicle chargers to be supplied is being prepared based on building characteristics, but there is quite limited research on the appropriate ratio of slow and fast chargers based on the characteristics of each place. Therefore, this study derived an appropriate penetration ratio based on data about the distribution ratio of common slow chargers. These data were collected using a survey of actual electric vehicle users. Next, an analysis was done on how to categorize the needs of charging environments and to determine what criteria or characteristics to use for categorization. Based on the results of the survey analysis, three types of places were derived. Type-1 places require 10% of chargers to be slow chargers, Type-2 places require 40-60% of chargers to be slow chargers (i.e., around equal distribution of slow and fast chargers), and Type-3 places require more than 80% of chargers to be slow chargers. The required levels of slow chargers were classified by place type and by individual using latent class cluster analysis, which made it possible to categorize them into five clusters related to socioeconomic variables, vehicle characteristics, traffic, and charging behaviors. It was found that there was a high correlation between charging behavior, weekend travel behavior, gender, and income. The results and insights from this study could be used to establish charging infrastructure policies in the future and to prepare standards for supplying charging infrastructure according to changes in the electric vehicle market.

A Study on the Intelligent Online Judging System Using User-Based Collaborative Filtering

  • Hyun Woo Kim;Hye Jin Yun;Kwihoon Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.273-285
    • /
    • 2024
  • With the active utilization of Online Judge (OJ) systems in the field of education, various studies utilizing learner data have emerged. This research proposes a problem recommendation based on a user-based collaborative filtering approach with learner data to support learners in their problem selection. Assistance in learners' problem selection within the OJ system is crucial for enhancing the effectiveness of education as it impacts the learning path. To achieve this, this system identifies learners with similar problem-solving tendencies and utilizes their problem-solving history. The proposed technique has been implemented on an OJ site in the fields of algorithms and programming, operated by the Chungbuk Education Research and Information Institute. The technique's service utility and usability were assessed through expert reviews using the Delphi technique. Additionally, it was piloted with site users, and an analysis of the ratio of correctness revealed approximately a 16% higher submission rate for recommended problems compared to the overall submissions. A survey targeting users who used the recommended problems yielded a 78% response rate, with the majority indicating that the feature was helpful. However, low selection rates of recommended problems and low response rates within the subset of users who used recommended problems highlight the need for future research focusing on improving accessibility, enhancing user feedback collection, and diversifying learner data analysis.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining-related studies have been focused on the application of the second steps, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have actively been studied to improve the quality of analysis results by preserving the meaning of words and documents in the process of representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, Unstructured text should be preceded by a structuring task that transforms the original document into a form that the computer can understand before analysis. It is called "Embedding" that arbitrary objects are mapped to a specific dimension space while maintaining algebraic properties for structuring the text data. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents in various aspects. Particularly, with the demand for analysis of document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec which extends word2Vec and embeds each document into one vector is most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. This causes a limit that the document vector is affected by not only core words but also miscellaneous words. Additionally, the traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to represent a complex document with multiple subjects into a single vector accurately using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. In the case of a document without keywords, this method can be applied after extract keywords through various analysis methods. However, since this is not the core subject of the proposed method, we introduce the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. all text in a document is tokenized and each token is represented as a vector having N-dimensional real value through word embedding. After that, to overcome the limitations of the traditional document embedding method that is affected by not only the core word but also the miscellaneous words, vectors corresponding to the keywords of each document are extracted and make up sets of keyword vector for each document. Next, clustering is conducted on a set of keywords for each document to identify multiple subjects included in the document. Finally, a Multi-vector is generated from vectors of keywords constituting each cluster. The experiments for 3.147 academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.

GIS-based Market Analysis and Sales Management System : The Case of a Telecommunication Company (시장분석 및 영업관리 역량 강화를 위한 통신사의 GIS 적용 사례)

  • Chang, Nam-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.2
    • /
    • pp.61-75
    • /
    • 2011
  • A Geographic Information System(GIS) is a system that captures, stores, analyzes, manages and presents data with reference to geographic location data. In the later 1990s and earlier 2000s it was limitedly used in government sectors such as public utility management, urban planning, landscape architecture, and environmental contamination control. However, a growing number of open-source packages running on a range of operating systems enabled many private enterprises to explore the concept of viewing GIS-based sales and customer data over their own computer monitors. K telecommunication company has dominated the Korean telecommunication market by providing diverse services, such as high-speed internet, PSTN(Public Switched Telephone Network), VOLP (Voice Over Internet Protocol), and IPTV(Internet Protocol Television). Even though the telecommunication market in Korea is huge, the competition between major services providers is growing more fierce than ever before. Service providers struggled to acquire as many new customers as possible, attempted to cross sell more products to their regular customers, and made more efforts on retaining the best customers by offering unprecedented benefits. Most service providers including K telecommunication company tried to adopt the concept of customer relationship management(CRM), and analyze customer's demographic and transactional data statistically in order to understand their customer's behavior. However, managing customer information has still remained at the basic level, and the quality and the quantity of customer data were not enough not only to understand the customers but also to design a strategy for marketing and sales. For example, the currently used 3,074 legal regional divisions, which are originally defined by the government, were too broad to calculate sub-regional customer's service subscription and cancellation ratio. Additional external data such as house size, house price, and household demographics are also needed to measure sales potential. Furthermore, making tables and reports were time consuming and they were insufficient to make a clear judgment about the market situation. In 2009, this company needed a dramatic shift in the way marketing and sales activities, and finally developed a dedicated GIS_based market analysis and sales management system. This system made huge improvement in the efficiency with which the company was able to manage and organize all customer and sales related information, and access to those information easily and visually. After the GIS information system was developed, and applied to marketing and sales activities at the corporate level, the company was reported to increase sales and market share substantially. This was due to the fact that by analyzing past market and sales initiatives, creating sales potential, and targeting key markets, the system could make suggestions and enable the company to focus its resources on the demographics most likely to respond to the promotion. This paper reviews subjective and unclear marketing and sales activities that K telecommunication company operated, and introduces the whole process of developing the GIS information system. The process consists of the following 5 modules : (1) Customer profile cleansing and standardization, (2) Internal/External DB enrichment, (3) Segmentation of 3,074 legal regions into 46,590 sub_regions called blocks, (4) GIS data mart design, and (5) GIS system construction. The objective of this case study is to emphasize the need of GIS system and how it works in the private enterprises by reviewing the development process of the K company's market analysis and sales management system. We hope that this paper suggest valuable guideline to companies that consider introducing or constructing a GIS information system.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspection at the customs clearance stage. However, a data-based safety management plan for imported food is needed due to time, cost, and limited resources. In this study, we tried to increase the efficiency of the on-site inspection by preparing a machine learning prediction model that pre-selects the companies that are expected to fail before the on-site inspection. Basic information of 303,272 foreign food facilities and processing businesses collected in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information data collected from 2019 to April 2022 were collected. After preprocessing the data of foreign food facilities, only the data subject to on-site inspection were extracted using the foreign food facility_code. As a result, it consisted of a total of 1,689 data and 103 variables. For 103 variables, variables that were '0' were removed based on the Theil-U index, and after reducing by applying Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models are evaluated. The research purpose of selecting companies subject to on-site inspection is to maximize the recall, which is the probability of judging nonconforming companies as nonconforming. As a result of applying various algorithms of machine learning, the Random Forest model with the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy was evaluated as the best model. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the selection reason for nonconforming facilities of individual instances, and discuss applicability to the on-site inspection facility selection system. Based on the results of this study, it is expected that it will contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system through a data-based scientific risk management model.