Title/Summary/Keyword: algorithms

Search Results: 16,353

A Study on People Counting in Public Metro Service using Hybrid CNN-LSTM Algorithm (Hybrid CNN-LSTM 알고리즘을 활용한 도시철도 내 피플 카운팅 연구)

  • Choi, Ji-Hye;Kim, Min-Seung;Lee, Chan-Ho;Choi, Jung-Hwan;Lee, Jeong-Hee;Sung, Tae-Eung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.131-145
    • /
    • 2020
  • In line with the trend of industrial innovation, IoT technology, which is utilized in a variety of fields, is emerging as a key element in the creation of new business models and the provision of user-friendly services through its combination with big data. The data accumulated from Internet-of-Things (IoT) devices are being used in many ways to build convenience-based smart systems, as they enable customized intelligent services through analysis of user environments and patterns. Recently, IoT has been applied to innovation in the public domain, for example in smart cities and smart transportation, such as solving traffic and crime problems with CCTV. In particular, when planning underground services or establishing a passenger-flow control information system to enhance the convenience of citizens and commuters under the congestion of public transportation such as subways and urban railways, it is necessary to comprehensively consider both the ease of securing real-time service data and the stability of security. However, previous studies that utilize image data suffer degraded object-detection performance under privacy constraints and abnormal conditions. The IoT device-based sensor data used in this study are free from privacy issues because they do not require the identification of individuals, and can therefore be effectively utilized to build intelligent public services for unspecified people. We utilize IoT-based infrared sensor devices for an intelligent pedestrian tracking system in a metro service that many people use on a daily basis, with the temperature data measured by the sensors transmitted in real time. The experimental environment for collecting real-time sensor data was established at the equally spaced midpoints of a 4×4 grid in the ceiling of subway entrances where the actual passenger flow is high, and the temperature change was measured for objects entering and leaving the detection spots. The measured data went through a preprocessing step in which reference values for the 16 areas were set and the differences between the temperatures in the 16 areas and their reference values per unit of time were calculated; this methodology maximizes the visibility of movement within the detection area. In addition, the values were scaled up by a factor of 10 in order to reflect the temperature differences between areas more sensitively: for example, if the temperature collected from a sensor at a given time was 28.5℃, the value was changed to 285 for analysis. The data collected from the sensors thus have the characteristics of both time series data and image data with 4×4 resolution. Reflecting these characteristics of the measured, preprocessed data, we propose a hybrid algorithm, CNN-LSTM (Convolutional Neural Network-Long Short Term Memory), that combines a CNN, which excels at image classification, with an LSTM, which is especially suitable for analyzing time series data. In this study, the CNN-LSTM algorithm is used to predict the number of persons passing through one of the 4×4 detection areas. We verified the validity of the proposed model through a performance comparison with other artificial intelligence algorithms: Multi-Layer Perceptron (MLP), Long Short Term Memory (LSTM), and RNN-LSTM (Recurrent Neural Network-Long Short Term Memory). The experiments showed that the proposed CNN-LSTM hybrid model had the best predictive performance among the compared models. By utilizing the proposed devices and models, various metro services that raise no legal issues regarding personal information are expected to be provided, such as real-time monitoring of public transport facilities and congestion-based emergency response services. However, the data were collected from only one side of the entrances, and data collected over a short period were applied to the prediction, so verification in other environments remains to be carried out. In the future, the proposed model is expected to become more reliable if experimental data are collected in a sufficient variety of environments or if the training data are augmented with measurements from other sensors.
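
As an illustration of the hybrid architecture described above, the following is a minimal Python/Keras sketch: a TimeDistributed CNN extracts spatial features from each 4×4 temperature-difference frame and an LSTM models their temporal dynamics before a regression head outputs the passenger count. The window length T, layer sizes, and synthetic data are assumptions for illustration, not the paper's actual configuration.

```python
# Hedged sketch of a CNN-LSTM over windows of 4x4 temperature-difference frames.
import numpy as np
import tensorflow as tf

T = 30  # assumed window length (time steps per sample)

model = tf.keras.Sequential([
    # CNN part: extract spatial features from each 4x4 frame, per time step.
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(16, (2, 2), activation="relu", padding="same"),
        input_shape=(T, 4, 4, 1)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
    # LSTM part: model the temporal dynamics of the per-frame features.
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),  # predicted number of passing persons
])
model.compile(optimizer="adam", loss="mse")

# Toy data mimicking the preprocessing: per-cell differences against reference
# values, scaled by 10 (e.g., 28.5 C recorded as 285 before differencing).
X = np.random.randn(64, T, 4, 4, 1).astype("float32")
y = np.random.randint(0, 5, size=(64, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```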

A Study on the Landscape Philosophy of Hageohwon Garden (별업 하거원(何去園) 원림에 투영된 조영사상 연구)

  • Shin, Sang-Sup;Kim, Hyun-Wuk;Kang, Hyun-Min
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.30 no.1
    • /
    • pp.46-56
    • /
    • 2012
  • The results of tracing the landscape philosophy of the Hageowon garden(何去園) of Youhwadang Kwon Ijin(權以鎭, 1668-1734) in Musu-dong, Daejeon, are as follows. 1) The ideological background of the protagonist reflected in Hageowon comprises the Hyoje ideology(filial piety and brotherly love, 孝弟) of Sinjongchuwon(painstakingly caring for one's ancestors), the Musil ideology(pursuing ethical diligence and a truthful mind, 務實) based on the scholar-official tradition and ethical rationalism, the Confucian Eunil ideology(ideology of seclusion, 隱逸) of Cheonghanjiyeon(quiet relaxation, 淸閒之燕), and the Pungryu ideology(appreciation of the arts, 風流) in the Taoist style. By substituting these ideological values into the space called Hageowon, the Byulup gardens(別業) such as the symbolic garden(象徵園), meaning garden(意園), and miniascape garden(縮景園) could be constructed. 2) The spatial organization of Hageowon is generally classified into three phases according to hierarchy. The first territory is a transitional space with residential features, the area reaching the peach-tree road(the Taoist world, 桃徑) from Youhwadang(有懷堂). The second territory is a monumental memorial space where Yocheondae(繞千臺), Jangwoodam(丈藕潭), Hwagae(花階), and the ancestral graves are located, centering on the yards of Sumanheon(收漫軒); the third territory is the secluded space in the eastern outer garden, where the mountain stream flows from north to south along the vein of the left-hand blue dragon(靑龍) of the guardian mountain of Hageowon. 3) Symbolically, the first phase presents the space as a meaningful scenery by overlapping the Confucian places of Youhwadang - Gosudae(孤秀臺) - Odeokdae(五德臺) with the mystic world of Jukcheondang(竹遷堂) and the peach-tree road(桃徑). In the second phase, the space of Sumanheon(收漫軒), Yocheondae, and Jangwoodam, the symbolic value of Sinjongchuwon(愼終追遠) and the remembrance of and longing for one's parents are reflected. The third phase, the eastern outer garden of Hageowon where the mountain stream flows from north to south, is composed of the east valley(東溪) - Hwalsudam(活水潭) - Sumi Waterfall(修眉瀑布). More specifically, (1) Mongjeong symbolizes the life of gaining knowledge through study to realize one's foolishness, (2) Hwalsudam symbolizes a transcendent attitude in life that refuses to pursue wealth and fame, and (3) Jangwoodam symbolizes the gateway to the fairyland, the entrance to the world of mystic gods. 4) The rationale behind Hageowon is that the two systems of Confucianism and Taoist theory appear repeatedly and in an overlapping way. Napoji(納汚池) and Hwalsudam, which pertain to the prelude of the spatial development, symbolize Susimyangseong(修心養成, meditating one's mind and improving one's nature), which is based on ethical rationalism. Moreover, while the Mongjeong sphere in the eastern outer garden of Hageowon takes the Confucian value system as its theme, including moral training, studying, and researching, Jangwoodam, Sumi Waterfall, and Unwa can be understood as a taste of Cheokbyeon(滌煩, eliminating troubles) for the arts, where the mystic world is substituted as a meaningful scenery. 5) The miniascape technique of the artificial mountain was applied to Hageowon to construct a mystic world like the twelve peaks of Mt. Mu(巫山). Borrowing the symbolic meanings expressed in old poems, the peaks were named 'Habang(1/何放), Hwabong(2, 3/和峯), Chulgun(4, 5, 6/出群), Sinwan(7/神浣), Chwigyu(8, 9, 10/聚糾), Cheomyo(11/處杳), and Giyung(12/氣融).' The representative poets recited in connection with the artificial mountain were Wangeui(汪醫), Nosamgang(魯三江), Dubo(杜甫), Hanyou(韓愈), Jeonheaseong(錢希聖), and Beomseokho(范石湖). The builders related themselves to this literature across time and space and attempted to sing of the richness of the mental world by placing the mystic world and the culture of appreciating the arts they pursued in the villa called Hageowon.

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels such as social media and SNS create enormous amounts of data, and among these the portion of unstructured text data has increased geometrically. Since it is difficult to examine all of this text, it is important to access the data rapidly and grasp the key points of a text. Driven by this need for efficient understanding, many studies on text summarization for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summaries objectively and effectively, an approach called "automatic summarization". However, most text summarization methods proposed to date construct the summary based on the frequency of contents in the original documents. Such summaries have a limitation in covering low-weight subjects that are mentioned less in the original text. If a summary includes only the contents of major subjects, bias occurs and information is lost, so it is hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a balance between the topics a document contains so that all subjects can be ascertained, but an unbalanced distribution across those subjects still remains. To retain a balance of subjects in the summary, it is necessary to consider the proportion of every subject the documents originally have and to allocate the portions of subjects equally, so that even sentences of minor subjects are included in the summary sufficiently. In this study, we propose a "subject-balanced" text summarization method that procures a balance between all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation metrics, "completeness" and "succinctness". Completeness is the property that a summary should fully include the contents of the original documents, and succinctness means the summary has minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, highly related terms for every topic can be identified, and the subjects of documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". However, these terms are too few to explain each subject fully, so a sufficient number of terms similar to the seed terms is needed for a well-constructed subject dictionary. Word2Vec is used for word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity: the higher the cosine similarity between two terms, the stronger the relationship between them. Terms with high similarity to the seed terms of each subject are selected, and after filtering these expanded terms, the subject dictionary is finally constructed. The next phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, a frequency analysis is first conducted with the specific terms that compose the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, which indicates how much each sentence explains each subject. However, TF-IDF weights can grow without bound, so by normalizing the TF-IDF weights of every subject per sentence, all values are mapped to the range 0 to 1. Then, by allocating to each sentence the subject with the maximum TF-IDF weight among all subjects, sentence groups are finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences, forming a similarity matrix. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents while minimizing duplication within the summary itself. For the evaluation of the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between the proposed method's summary and a frequency-based summary was also performed, and as a result it was verified that the summary from the proposed method better retains the balance of all subjects the documents originally have.
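
The subject-dictionary construction step described above can be sketched with gensim's Word2Vec; the toy corpus, seed terms, and similarity threshold below are illustrative assumptions, not the study's actual data or settings.

```python
# Minimal sketch of seed-term expansion via Word2Vec cosine similarity.
from gensim.models import Word2Vec

# Toy tokenized sentences standing in for the TripAdvisor review corpus.
sentences = [["room", "clean", "spacious"],
             ["staff", "friendly", "helpful"],
             ["breakfast", "tasty", "cheap"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

# Hypothetical seed terms for two subjects.
seed_terms = {"service": ["staff"], "food": ["breakfast"]}

subject_dict = {}
for subject, seeds in seed_terms.items():
    expanded = set(seeds)
    for seed in seeds:
        # most_similar ranks vocabulary terms by cosine similarity to the seed.
        expanded.update(w for w, sim in model.wv.most_similar(seed, topn=10)
                        if sim > 0.5)  # assumed filtering threshold
    subject_dict[subject] = expanded
```

Each sentence would then be assigned to the subject whose normalized TF-IDF weight over its dictionary terms is highest, as the second phase describes.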

An Empirical Study on Statistical Optimization Model for the Portfolio Construction of Sponsored Search Advertising(SSA) (키워드검색광고 포트폴리오 구성을 위한 통계적 최적화 모델에 대한 실증분석)

  • Yang, Hongkyu;Hong, Juneseok;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.167-194
    • /
    • 2019
  • This research starts from the four basic concepts that confront decision making in keyword bidding: incentive incompatibility, limited information, myopia, and the decision variable. To make these concepts concrete, four framework approaches are designed: a strategic approach for incentive incompatibility, a statistical approach for limited information, alternative optimization for myopia, and a new model approach for the decision variable. The purpose of this research is to propose, through empirical tests, a statistical optimization model for constructing the portfolio of Sponsored Search Advertising (SSA) from the sponsor's perspective, which can be used in portfolio decision making. Previous research to date formulates the CTR estimation model using CPC, Rank, Impression, CVR, etc., individually or collectively, as the independent variables. However, many of these variables are not controllable in keyword bidding; only CPC and Rank can be used as decision variables in the bidding system. The classical SSA model is designed on the basic assumption that CPC is the decision variable and CTR is the response variable. However, this classical model faces many hurdles in the estimation of CTR. The main problem is the uncertainty between CPC and Rank: in keyword bidding, CPC fluctuates continuously even at the same Rank. This uncertainty raises questions about the credibility of the CTR, along with practical management problems. Sponsors make keyword-bid decisions under limited information, and a strategic portfolio approach based on statistical models is necessary. To solve the problem in the classical SSA model, the new SSA model frame is designed on the basic assumption that Rank is the decision variable. Rank has been proposed as the best decision variable for predicting CTR in many papers, and most search engine platforms provide options and algorithms that make it possible to bid with Rank, so sponsors can participate in keyword bidding with Rank. Therefore, this paper tests the validity of this new SSA model and its applicability to constructing the optimal portfolio in keyword bidding. The research process is as follows: in order to perform the optimization analysis for constructing the keyword portfolio under the new SSA model, this study proposes criteria for categorizing keywords, selects representative keywords for each category, shows the non-linear relationships, screens the scenarios for CTR and CPC estimation, selects the best-fit model through a Goodness-of-Fit (GOF) test, formulates the optimization models, confirms the spillover effects, and suggests a modified optimization model reflecting spillover along with some strategic recommendations. Tests of the optimization models using these CTR/CPC estimation models are empirically performed with the objective functions of (1) maximizing CTR (the CTR optimization model) and (2) maximizing the expected profit reflecting CVR (the CVR optimization model). Both the CTR and CVR optimization test results show significant improvements, confirming that the suggested SSA model is valid for constructing the keyword portfolio with the CTR/CPC estimation models proposed in this study. However, one critical problem is found in the CVR optimization model: important keywords are excluded from the keyword portfolio because of the myopia of their immediately low present profit. To solve this problem, a Markov Chain analysis is carried out and the concepts of the Core Transit Keyword (CTK) and Expected Opportunity Profit (EOP) are introduced. The revised CVR optimization model is proposed, tested, and shown to be valid for constructing the portfolio. The strategic guidelines and insights are as follows: brand keywords are usually dominant in almost every aspect of CTR, CVR, expected profit, etc.; however, the generic keywords are found to be the CTK and have spillover potential that may increase consumers' awareness and lead them to the brand keywords, which is why the generic keywords should be the focus of keyword bidding. The contributions of this thesis are to propose a novel SSA model based on Rank as the decision variable, to propose managing the keyword portfolio by categories according to the characteristics of keywords, to propose statistical modeling and management based on Rank in constructing the keyword portfolio, and, through empirical tests, to propose new strategic guidelines focusing on the CTK and a modified CVR optimization objective function reflecting the spillover effect instead of the previous expected profit models.
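
A stylized toy of the Rank-as-decision-variable idea above: choose one rank per keyword to maximize expected clicks under a budget, given per-rank CTR/CPC estimates. The keywords, curves, impressions, and budget below are invented stand-ins for the paper's fitted estimation models.

```python
# Brute-force rank assignment over a tiny keyword portfolio.
from itertools import product

# Hypothetical per-rank CTR/CPC estimates (the paper fits these statistically).
keywords = {
    "brand_kw":   {"ctr": {1: 0.12, 2: 0.09, 3: 0.06}, "cpc": {1: 900, 2: 700, 3: 500}},
    "generic_kw": {"ctr": {1: 0.05, 2: 0.04, 3: 0.03}, "cpc": {1: 600, 2: 450, 3: 300}},
}
impressions = {"brand_kw": 10_000, "generic_kw": 30_000}
budget = 1_500_000  # assumed daily budget

best_plan, best_clicks = None, -1.0
for ranks in product([1, 2, 3], repeat=len(keywords)):
    plan = dict(zip(keywords, ranks))
    clicks = sum(impressions[k] * keywords[k]["ctr"][r] for k, r in plan.items())
    cost = sum(impressions[k] * keywords[k]["ctr"][r] * keywords[k]["cpc"][r]
               for k, r in plan.items())
    if cost <= budget and clicks > best_clicks:  # keep the best feasible plan
        best_plan, best_clicks = plan, clicks

print(best_plan, best_clicks)
```

The CVR variant would simply swap the objective to expected profit (clicks × CVR × margin), and the revised model described above would further add the EOP of core transit keywords.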

Animal Infectious Diseases Prevention through Big Data and Deep Learning (빅데이터와 딥러닝을 활용한 동물 감염병 확산 차단)

  • Kim, Sung Hyun;Choi, Joon Ki;Kim, Jae Seok;Jang, Ah Reum;Lee, Jae Ho;Cha, Kyung Jin;Lee, Sang Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.137-154
    • /
    • 2018
  • Animal infectious diseases, such as avian influenza and foot-and-mouth disease, occur almost every year and cause huge economic and social damage to the country. To prevent this, the quarantine authorities have made various human and material efforts, but the infectious diseases have continued to occur. Avian influenza was first reported in 1878 and rose to a national issue due to its high lethality. Foot-and-mouth disease is considered the most critical animal infectious disease internationally; in nations where the disease has not spread, it is recognized as an economic or political disease because it restricts international trade by complicating the import of processed and non-processed livestock, and because quarantine is costly. In a society where the whole nation is connected in one zone of life, there is no way to fully prevent the spread of infectious disease. Hence, there is a need to become aware of an outbreak and to take action before it spreads. For both human and animal infectious diseases, an epidemiological investigation of confirmed cases is carried out as soon as a definite diagnosis is made, and measures are taken to prevent the spread of the disease according to the investigation results. The foundation of an epidemiological investigation is figuring out where a subject has been and whom he or she has met. From a data perspective, this can be defined as collecting and analyzing geographic data and relational data to predict the cause of the disease outbreak, the outbreak location, and future infections. Recently, attempts have been made to develop infectious disease prediction models using Big Data and deep learning technology, but there is little active research on model building and few case reports. KT and the Ministry of Science and ICT have been carrying out big data projects since 2014, as part of national R&D projects, to analyze and predict the routes of livestock-related vehicles. To prevent animal infectious diseases, the researchers first developed a prediction model based on regression analysis using vehicle movement data. After that, more accurate prediction models were constructed using machine learning algorithms such as Logistic Regression, Lasso, Support Vector Machine, and Random Forest. In particular, the prediction model for 2017 added the risk of diffusion to facilities, and the performance of the model was improved by tuning the hyper-parameters of the modeling in various ways. The Confusion Matrix and ROC Curve show that the model constructed in 2017 is superior to the earlier machine learning models. The difference between the 2016 model and the 2017 model is that the later model uses visiting information on facilities such as feed factories and slaughterhouses, and information on poultry expanded from chicken and duck to goose and quail. In addition, in 2017 an explanation of the results was added to help the authorities make decisions and to establish a basis for persuading stakeholders. This study reports an animal infectious disease prevention system constructed on the basis of hazardous vehicle movement, farm, and environmental Big Data. The significance of this study is that it describes the evolution of a prediction model using Big Data as used in the field, and the model is expected to become more complete if the form of the viruses is taken into consideration. This will contribute to data utilization and analysis model development in related fields. In addition, we expect that the system constructed in this study will enable more proactive and effective disease prevention.
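
A minimal sketch of the model-comparison step described above, using scikit-learn versions of the four algorithms named (with L1-penalized logistic regression standing in for Lasso in a classification setting); the imbalanced synthetic data is an assumption, since the vehicle-movement features are not public.

```python
# Compare the four classifier families by cross-validated ROC AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced data: outbreaks are rare relative to non-outbreaks.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=0)

models = {
    "logistic":      LogisticRegression(max_iter=1000),
    "lasso":         LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    "svm":           make_pipeline(StandardScaler(), SVC(probability=True)),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: ROC AUC = {auc:.3f}")
```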

Deriving adoption strategies of deep learning open source framework through case studies (딥러닝 오픈소스 프레임워크의 사례연구를 통한 도입 전략 도출)

  • Choi, Eunjoo;Lee, Junyeong;Han, Ingoo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.27-65
    • /
    • 2020
  • Many information and communication technology companies release their own AI technology to the public, for example Google's TensorFlow, Facebook's PyTorch, and Microsoft's CNTK. By releasing deep learning open source software to the public, the relationship with the developer community and the artificial intelligence (AI) ecosystem can be strengthened, and users can experiment with, implement, and improve it. Accordingly, the field of machine learning is growing rapidly, and developers are using and reproducing various learning algorithms in each field. Although various analyses of open source software have been made, there is a lack of studies that help industry develop or use deep learning open source software. This study therefore attempts to derive a strategy for adopting such a framework through case studies of deep learning open source frameworks. Based on the technology-organization-environment (TOE) framework and a literature review on the adoption of open source software, we employed a case study framework that includes, as technological factors, perceived relative advantage, perceived compatibility, perceived complexity, and perceived trialability; as organizational factors, management support and knowledge & expertise; and as environmental factors, availability of technology skills and services, and platform long-term viability. We conducted a case study analysis of three companies' adoption cases (two successes and one failure) and revealed that seven of the eight TOE factors, together with several factors regarding the company, team, and resources, are significant for the adoption of a deep learning open source framework. By organizing the case study results, we provide five important success factors for adopting a deep learning framework: the knowledge and expertise of the developers in the team, the hardware (GPU) environment, a data enterprise cooperation system, a deep learning framework platform, and a deep learning framework work tool service. For an organization to successfully adopt a deep learning open source framework, at the stage of using the framework, first, the hardware (GPU) environment for the AI R&D group must support the knowledge and expertise of the developers in the team. Second, it is necessary to support research developers' use of deep learning frameworks by collecting and managing data inside and outside the company with a data enterprise cooperation system. Third, deep learning research expertise must be supplemented through cooperation with researchers from academic institutions such as universities and research institutes. By satisfying these three procedures at the usage stage, companies will increase the number of deep learning research developers, the ability to use deep learning frameworks, and the supply of GPU resources. In the proliferation stage of the deep learning framework, fourth, the company builds a deep learning framework platform that improves the research efficiency and effectiveness of the developers, for example by automatically optimizing the hardware (GPU) environment. Fifth, a deep learning framework tool service team complements the developers' expertise by sharing information from the external deep learning open source framework community with the in-house community and by organizing developer retraining and seminars. To implement the five identified success factors, a step-by-step enterprise procedure for adoption of a deep learning framework is proposed: defining the project problem, confirming whether the deep learning methodology is the right method, confirming whether the deep learning framework is the right tool, using the deep learning framework in the enterprise, and spreading the framework across the enterprise. The first three steps are pre-considerations for adopting a deep learning open source framework; once these are clear, the next two steps can proceed. In the fourth step, the knowledge and expertise of the developers in the team are important, in addition to the hardware (GPU) environment and the data enterprise cooperation system. In the final step, the five important factors are realized for a successful adoption of the deep learning open source framework. This study provides strategic implications for companies adopting or using a deep learning framework according to the needs of each industry and business.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.173-198
    • /
    • 2020
  • Many studies have long been conducted in academia on predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded in various ways due to the rapid revitalization of online business, companies are carrying out campaigns of various types at a scale that cannot be compared to the past. However, customers increasingly perceive campaigns as spam as fatigue from duplicate exposure grows, and from a corporate standpoint the effectiveness of the campaign itself is decreasing, with rising investment costs leading to low actual campaign success rates. Accordingly, various studies are ongoing to improve the effectiveness of campaigns in practice. A campaign system has the ultimate purpose of increasing the success rate of campaigns by collecting and analyzing various customer-related data and using them for the campaigns. In particular, recent attempts have been made to predict various aspects of campaign response using machine learning. Selecting appropriate features is very important because of the many features in campaign data. If all input data are used when classifying a large amount of data, learning time grows as the number of classes expands, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained using too many features, prediction accuracy may degrade due to overfitting or correlations between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary process for analyzing a high-dimensional data set. Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques, but when there are many features they suffer from poor classification-prediction performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for the feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are first derived, features that have a negative effect are removed, and then the sequential method is applied to increase search efficiency and to enable generalized prediction with the improved algorithm. It was confirmed that the proposed model showed better search and prediction performance than the traditional greedy algorithm: compared with the original data set, the greedy algorithm, the genetic algorithm (GA), and recursive feature elimination (RFE), campaign success prediction was higher. In addition, when performing campaign success prediction, the improved feature selection algorithm was found to help in analyzing and interpreting the prediction results by providing the importance of the derived features. These include important features such as age, customer rating, and sales, which were already known to matter statistically. Unlike the criteria previous campaign planners typically used, features such as the combined product name, the average three-month data consumption rate, and the last three months' wireless data usage, which planners rarely used to select campaign targets, were unexpectedly selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign, which makes it possible to analyze and understand the important characteristics of each campaign type.
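
A hedged sketch of the two-stage idea described above: rank features by a statistical score, drop those close to noise, then run a sequential forward search on the survivors. The mutual-information score, threshold, and synthetic data are illustrative assumptions, not the proposed algorithm's exact statistics.

```python
# Statistical pre-filter followed by sequential forward selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           random_state=0)

# Stage 1: keep features whose mutual information with the campaign response
# exceeds a small threshold, removing features close to noise up front.
mi = mutual_info_classif(X, y, random_state=0)
keep = np.where(mi > 0.01)[0]

# Stage 2: sequential forward search over the pre-filtered subset only
# (assumes at least five features survive the filter).
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, direction="forward", cv=5)
sfs.fit(X[:, keep], y)
selected = keep[sfs.get_support()]
print("selected feature indices:", selected)
```

The pre-filter shrinks the search space, which is where the speed-up over plain SFS/SFFS-style search comes from.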

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarities for new customers or products, because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques were designed that use content-based filtering together with it. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks, using a method of indirectly calculating similarities through the similar customers placed between two customers. This means creating a customer network based on purchasing data and calculating the similarity between two customers based on the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether a target customer will accept a recommendation, and the centrality metrics of networks can be utilized for the calculation of these similarities. Different centrality metrics are important in that they may have different effects on recommendation performance; furthermore, in this study, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also across all customers and products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of a user's acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. As classification models fit the purpose of this binary problem of whether a link is engaged or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The data for the performance evaluation were order data collected from an online shopping mall over four years and two months. The first three years and eight months of data were organized into the social network for the experiment, and the records of the next four months were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed the four most commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks moderately across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model: it ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance, but records very low rankings with low performance in support vector machine and k-nearest neighbors. As the experimental results reveal, network centrality metrics over a subnetwork that connects two nodes can, in a classification model, effectively predict the connectivity between those nodes in a social network, and each metric performs differently depending on the classification model type. This result implies that choosing appropriate metrics for each algorithm can achieve higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance for certain models.
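
One way to sketch the centrality-as-similarity idea above in Python: compute the four centrality metrics with networkx on a small example graph and feed node-pair centralities to a classifier that predicts whether a link exists. The graph, pairing scheme, and classifier choice are illustrative assumptions rather than the paper's exact design.

```python
# Centrality features for link prediction on a toy network.
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small example graph standing in for the purchase-based customer network.
G = nx.karate_club_graph()

centralities = {
    "degree":      nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness":   nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G),
}

# Feature vector for a candidate pair (u, v): both nodes' four centralities.
def pair_features(u, v):
    return [centralities[m][u] for m in centralities] + \
           [centralities[m][v] for m in centralities]

pairs = [(u, v) for u in G for v in G if u < v]
X = np.array([pair_features(u, v) for u, v in pairs])
y = np.array([G.has_edge(u, v) for u, v in pairs])  # 1 if the link exists

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Swapping the four metrics in and out of the feature vector, and the classifier for the other four model families, would reproduce the kind of metric-by-algorithm comparison the study reports.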

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the coronavirus disease caused by SARS-CoV-2, a severe respiratory syndrome) were published. The rapid increase in the number of papers related to COVID-19 is placing time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text data of an extensive literature using the LDA and Word2vec algorithms. Papers related to the keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used were the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research method is divided into two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of COVID-19-related publications by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers: a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set of papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method using PCA dimension reduction was applied, visualizing groups of papers with similar themes with the t-SNE algorithm. A noteworthy point in the results of this study is that topics which were not derived in the topic modeling of all COVID-19 papers were derived in the topic modeling of the individual research topics. For example, as a result of topic modeling for the papers related to 'vaccine', a new topic, Topic 05 'neutralizing antibodies', was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body and is said to play an important role in the production of therapeutic agents and in vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic, Topic 05 'cytokine', was discovered. A cytokine storm occurs when the body's immune cells, rather than defending against an attack, attack normal cells. Hidden topics that could not be found over the entire corpus were thus classified according to keywords, and topic modeling was performed to find the detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram variant of Word2vec, which predicts the surrounding words from a central word. The combination of the LDA model and the Word2vec model aims at better performance by identifying the relationships between documents and LDA topics together with the word relationships captured by Word2vec. In addition, as a clustering method using PCA dimension reduction, a method for intuitively classifying documents was presented, using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of COVID-19-related academic papers, we hope this will save the precious time and effort of healthcare professionals and policy makers and help them rapidly gain new insights. It is also expected to serve as basic data for researchers exploring new research directions.
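
A brief sketch of the pipeline described above with scikit-learn: LDA topic distributions over toy abstracts, followed by PCA and a t-SNE embedding for visual grouping. The four miniature abstracts stand in for the CORD-19 corpus, and the component counts are assumptions.

```python
# LDA topics -> PCA -> t-SNE embedding for clustering/visualization.
from sklearn.decomposition import LatentDirichletAllocation, PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

abstracts = [
    "vaccine trial neutralizing antibodies immune response",
    "antiviral treatment cytokine storm patients",
    "vaccine efficacy antibodies booster dose",
    "treatment outcomes cytokine inflammation therapy",
]

counts = CountVectorizer().fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic distributions

# PCA then t-SNE, mirroring the dimension-reduction-then-visualization step.
reduced = PCA(n_components=2).fit_transform(doc_topics)
embedding = TSNE(n_components=2, perplexity=2,
                 random_state=0).fit_transform(reduced)
print(embedding)  # 2-D coordinates; similar-theme papers land close together
```

The word-expansion side would use a skip-gram Word2vec model (`sg=1` in gensim) to find terms similar to each topic's top words, as the abstract describes.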

Basic Research on the Possibility of Developing a Landscape Perceptual Response Prediction Model Using Artificial Intelligence - Focusing on Machine Learning Techniques - (인공지능을 활용한 경관 지각반응 예측모델 개발 가능성 기초연구 - 머신러닝 기법을 중심으로 -)

  • Kim, Jin-Pyo;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.51 no.3
    • /
    • pp.70-82
    • /
    • 2023
  • The recent surge of IT and data acquisition is shifting the paradigm in all aspects of life, and these advances are also affecting academic fields. Research topics and methods are being improved through academic exchange and connections. In particular, data-based research methods are employed in various academic fields, including landscape architecture, where continuous research is needed. Therefore, this study aims to investigate the possibility of developing a landscape preference evaluation and prediction model using machine learning, a branch of artificial intelligence, reflecting the current situation. To achieve this goal, machine learning techniques were applied to the landscape field to build a landscape preference evaluation and prediction model and to verify the simulation accuracy of the model. For this purpose, wind power facility landscape images, which have recently been attracting attention in connection with renewable energy, were selected as the research objects. For the analysis, images of wind power facility landscapes were collected using web crawling techniques, and an analysis dataset was built. Orange version 3.33, a program from the University of Ljubljana, was used for the machine learning analysis to derive a prediction model with excellent performance. A model that integrates the evaluation criteria and a separate model structure for each evaluation criterion were used to generate models using the kNN, SVM, Random Forest, Logistic Regression, and Neural Network algorithms, which are suitable for machine learning classification tasks. A performance evaluation of the generated models was conducted to derive the most suitable prediction model. The prediction model derived in this study separately evaluates three criteria, classification by type of landscape, classification by distance between landscape and target, and classification by preference, and then synthesizes and predicts the results. As a result of the study, a prediction model was developed with a high accuracy of 0.986 for the evaluation criterion according to the type of landscape, 0.973 for the criterion according to distance, and 0.952 for the criterion according to preference, and the verification through the evaluation of the data prediction results exceeds the required performance value of the model. As an experimental attempt to investigate the possibility of developing a prediction model using machine learning in landscape-related research, this study confirmed the possibility of creating a high-performance prediction model by building a data set through the collection and refinement of image data and subsequently utilizing it in landscape-related research fields. Based on the results, implications, and limitations of this study, it is believed possible to develop various types of landscape prediction models, including for wind power facility, natural, and cultural landscapes. Machine learning techniques can become more useful and valuable in the field of landscape architecture by exploring and applying research methods appropriate to the topic, for example by reducing the time for data classification through models that classify images according to landscape type, or by analyzing the importance of landscape planning factors through the analysis of landscape prediction factors using machine learning.
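
A small sketch reproducing, with scikit-learn, the kind of five-algorithm comparison the study ran in Orange; cross-validated accuracy is reported per model. The random features stand in for the wind power landscape image data, which are not reproduced here.

```python
# Compare the five classifier families named above by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic three-class data standing in for one evaluation criterion
# (e.g., classification by type of landscape).
X, y = make_classification(n_samples=600, n_features=50, n_classes=3,
                           n_informative=10, random_state=0)

models = {
    "kNN":                 KNeighborsClassifier(),
    "SVM":                 SVC(),
    "Random Forest":       RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Neural Network":      MLPClassifier(max_iter=2000, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```

In the study's design, one such comparison would be run per evaluation criterion (type, distance, preference), and the per-criterion predictions synthesized afterwards.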

