• Title/Summary/Keyword: Intelligent Data Analysis

Clock Glitch-based Fault Injection Attack on Deep Neural Network (Deep Neural Network에 대한 클럭 글리치 기반 오류 주입 공격)

  • Hyoju Kang;Seongwoo Hong;Youngju Lee;Jeacheol Ha
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.5
    • /
    • pp.855-863
    • /
    • 2024
  • The use of deep neural networks (DNNs) is gradually increasing in various fields due to their high efficiency in data analysis and prediction. However, as deep neural networks are used more frequently, the security threats associated with them are also increasing. In particular, a fault in the forward propagation process or the activation function, both of which directly affect a network's prediction, can severely damage the model's prediction accuracy. In this paper, we performed fault injection attacks on the forward propagation process of each layer except the input layer and on the Softmax function used in the output layer, and analyzed the experimental results. By injecting clock-glitch faults into a network classifying the MNIST dataset, we confirmed that fault injection into the iteration statements can cause deterministic misclassification depending on the network parameters.
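
The effect of a fault in an iteration statement can be illustrated in software. The following is a minimal sketch, not the paper's experimental setup: it assumes a fault model in which a glitch makes the inner multiply-accumulate loop of one fully connected layer skip a single iteration. The function names, the `skip_index` parameter, and the toy weights are hypothetical.

```python
import math

def dense(weights, bias, x, skip_index=None):
    """Forward pass of one fully connected layer, written as explicit loops.

    skip_index models a glitch that skips one multiply-accumulate step
    (an assumed fault model chosen for illustration).
    """
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for j, (w, xj) in enumerate(zip(row, x)):
            if j == skip_index:
                continue  # faulted iteration: this term is never accumulated
            acc += w * xj
        out.append(acc)
    return out

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, bias, x, skip_index=None):
    """Return the index of the most probable class."""
    probs = softmax(dense(weights, bias, x, skip_index))
    return probs.index(max(probs))

# Toy two-class layer (hypothetical parameters).
W = [[2.0, 0.1], [0.5, 0.6]]
b = [0.0, 0.0]
x = [1.0, 1.0]

clean = predict(W, b, x)                  # logits [2.1, 1.1]
faulty = predict(W, b, x, skip_index=0)   # logits [0.1, 0.6]
print(clean, faulty)
```

With these weights the same skipped iteration always flips the prediction, which matches the abstract's point that the misclassification is deterministic and depends on the network parameters.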

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.25-41
    • /
    • 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed in order to identify topics and themes from a large amount of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on the additional period. Second, the issue is not only generated and terminated constantly, but also one issue can sometimes be distributed into several issues or multiple issues can be integrated into one single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connection and effect relationship between these issues. 
The purpose of this study is to overcome these two limitations of existing issue tracking: the limitation of the analysis method and the lack of consideration of how issues change. Suppose topic analysis is performed separately for each of multiple periods. It is then essential to map the issues of different periods to one another in order to trace issue trends. However, it is not easy to discover connections between issues of different periods, because the issues derived for each period are heterogeneous. In this study, to overcome these limitations without analyzing the documents of the entire period simultaneously, the analysis is performed independently for each period. In addition, we performed issue mapping to link the issues identified in each period. The per-period analyses were then integrated, and the issue flow over the entire integrated period was depicted. Because the entire issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, the changeability of the issues could be analyzed. The proposed methodology is efficient in terms of time and cost while still accounting for the changeability of the issues. Further, the results of this study can be used to adapt the methodology to practical situations. The potential practical applications of the methodology were examined by applying it to actual Internet news. The results showed that the methodology could extend the period of analysis and follow the progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.
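
One way to picture the inter-period mapping is to compare the keyword weights of topics from adjacent periods. The sketch below is an illustration under stated assumptions, not the authors' procedure: topics are represented as keyword-weight dictionaries, two topics are linked when their cosine similarity reaches a hypothetical threshold of 0.3, and life-cycle events are read off the link counts. All names and data are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two keyword-weight dictionaries."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_issues(prev, curr, threshold=0.3):
    """Link every pair of topics from adjacent periods that is similar enough."""
    return [(p, c) for p in prev for c in curr
            if cosine(prev[p], curr[c]) >= threshold]

def life_cycle(prev, curr, links):
    """Label topics as created, merged, split, or terminated from the links."""
    events = {}
    for c in curr:
        parents = [p for p, c2 in links if c2 == c]
        if not parents:
            events[c] = "created"
        elif len(parents) > 1:
            events[c] = "merged"
    for p in prev:
        children = [c for p2, c in links if p2 == p]
        if not children:
            events[p] = "terminated"
        elif len(children) > 1:
            events[p] = "split"
    return events

# Topics found independently in two consecutive periods (toy data).
period_1 = {"A1": {"election": 3, "vote": 2}, "A2": {"storm": 4, "flood": 1}}
period_2 = {"B1": {"election": 2, "vote": 2, "poll": 1},
            "B2": {"vaccine": 3, "clinic": 1}}

links = map_issues(period_1, period_2)
events = life_cycle(period_1, period_2, links)
print(links, events)
```

Because each period is analyzed on its own, adding a new period only requires one more call to `map_issues` against the previous period rather than a re-analysis of all documents.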

Social Network : A Novel Approach to New Customer Recommendations (사회연결망 : 신규고객 추천문제의 새로운 접근법)

  • Park, Jong-Hak;Cho, Yoon-Ho;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.1
    • /
    • pp.123-140
    • /
    • 2009
  • Collaborative filtering recommends products using customers' preferences, so it cannot recommend products to the new customer who has no preference information. This paper proposes a novel approach to new customer recommendations using the social network analysis which is used to search relationships among social entities such as genetics network, traffic network, organization network, etc. The proposed recommendation method identifies customers most likely to be neighbors to the new customer using the centrality theory in social network analysis and recommends products those customers have liked in the past. The procedure of our method is divided into four phases : purchase similarity analysis, social network construction, centrality-based neighborhood formation, and recommendation generation. To evaluate the effectiveness of our approach, we have conducted several experiments using a data set from a department store in Korea. Our method was compared with the best-seller-based method that uses the best-seller list to generate recommendations for the new customer. The experimental results show that our approach significantly outperforms the best-seller-based method as measured by F1-measure.
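
The four phases can be sketched with a small example. This is a simplified illustration under assumptions the abstract does not specify: purchase similarity is measured with Jaccard similarity, an edge is drawn when similarity reaches a hypothetical threshold, and degree centrality stands in for the centrality measure. Customer IDs and products are invented.

```python
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    """Purchase similarity between two sets of products."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_network(purchases, threshold):
    """Connect customers whose purchase similarity reaches the threshold."""
    edges = {c: set() for c in purchases}
    for u, v in combinations(purchases, 2):
        if jaccard(purchases[u], purchases[v]) >= threshold:
            edges[u].add(v)
            edges[v].add(u)
    return edges

def recommend_for_new_customer(purchases, k=2, n=2, threshold=0.5):
    """Pick the k most central customers and recommend their top n products."""
    edges = build_network(purchases, threshold)
    # Degree centrality: number of neighbors; ties broken by customer id.
    central = sorted(edges, key=lambda c: (-len(edges[c]), c))[:k]
    counts = Counter(p for c in central for p in purchases[c])
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return central, ranked[:n]

purchases = {
    "c1": {"tea", "mug", "kettle"},
    "c2": {"tea", "mug"},
    "c3": {"tea", "kettle"},
    "c4": {"lamp"},
}
central, recs = recommend_for_new_customer(purchases)
print(central, recs)
```

No preference data for the new customer is needed: the neighborhood is chosen purely from the structure of the existing customer network.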

Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence
    • /
    • v.19 no.3
    • /
    • pp.203-209
    • /
    • 2021
  • In this study, a prototype design model was proposed for developing an online recruitment system that uses multi-dimensional data crawling and social media analysis to validate the text information and video interviews in the job application process. The study includes a comparative analysis process based on text mining to verify the authenticity of job application paperwork and to hire and allocate workers effectively based on their potential job capability. Using the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized based on the design specifications and prototype development results derived from this study, the system is expected to serve as intelligent online recruitment technology for the public and private recruitment markets.

Time Series and Deep Learning Prediction Study Using Container Throughput at Busan Port (부산항 컨테이너 물동량을 이용한 시계열 및 딥러닝 예측연구)

  • Seung-Pil Lee;Hwan-Seong Kim
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.06a
    • /
    • pp.391-393
    • /
    • 2022
  • In recent years, demand forecasting technologies based on deep learning and big data have accelerated the adoption of smart technology in e-commerce, logistics, and distribution. In particular, ports, which are the center of global transportation networks and modern intelligent logistics, are rapidly responding to changes in the global economy and port environment caused by the 4th industrial revolution. Port traffic forecasting has an important impact on areas such as new port construction, port expansion, and terminal operation. Therefore, the purpose of this study is to compare time series analysis and deep learning analysis, which are often used for port traffic prediction, and to derive a prediction model suitable for forecasting future container volume at Busan Port. In addition, external variables related to changes in trade volume were selected through correlation analysis and applied to a multivariate deep learning prediction model. As a result, the LSTM error was found to be low in the single-variable prediction model using only Busan Port container freight volume, and the LSTM error was also low in the multivariate prediction model using external variables.

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.45-69
    • /
    • 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment about themselves from open online information sources. This research analyzes public sentiment toward the Korean Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing studies that measure public sentiment with a big data approach classify sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis for various applications because it is academically proven and has the clear advantage of capturing the degree of sentiment and the interrelation of each dimension. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training in order to feel. They do not express their feelings along dimensions such as positive/negative or active/passive; rather, they express them as categorical sentiments such as sadness, rage, and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research suggests a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. The multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may involve some subjectivity.
In this research, nine categories ('Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain') are used as the multi-categorical sentiment structure. To capture public sentiment toward the Korean Top-10 companies, Internet news data about the companies were collected over the past 25 months from a representative Korean portal site. Based on sentiment words extracted from previous studies, we created a sentiment lexicon and analyzed the frequency of those words in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words in order to rank the distributions. Sentiment comparisons among the top-4 companies, 'Samsung', 'Hyundai', 'SK', and 'LG', were visualized separately. As a next step, the research tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon, namely how effectively categorical sentiment can be used as a relative comparison index in cross-sectional and time series analysis. To test the effectiveness of the sentiment lexicon as a cross-sectional comparison index, pair-wise t-tests and a Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin', and 'SK' and 'Hanjin', were chosen to examine with pair-wise t-tests whether each categorical sentiment differs significantly. Since the category 'Sadness' has the largest vocabulary, it was chosen to determine with the Duncan test whether the subgroups of companies differ significantly. The results showed that five sentiment categories differ significantly between Samsung and Hanjin, and four between SK and Hanjin. In the category 'Sadness', six significantly different subgroups were found. To test the effectiveness of the sentiment lexicon as a time series comparison index, the 'nut rage' incident at Hanjin was selected as an example case.
The term frequency of sentiment words in the month when the incident happened was compared with the term frequency in the month before the event. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the variation in the word list for the sentiment 'Rage' was shown in more concrete detail. As a result, there was a large before-and-after difference in the sentiment that ordinary people felt toward the company. Both hypotheses turned out to be statistically significant, which supports the use of multi-categorical sentiment lexicons for sentiment analysis in business. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
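
The category-ratio calculation described above can be sketched briefly. This is a toy illustration only: the lexicon entries and the sample sentence are English placeholders invented for the example, whereas the study used a Korean lexicon built from prior research.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping sentiment words to categories.
LEXICON = {
    "sad": "Sadness",
    "grief": "Sadness",
    "furious": "Anger",
    "glad": "Happiness",
}

def category_ratios(tokens, lexicon=LEXICON):
    """Share of each sentiment category among all sentiment words found."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

def rank_categories(ratios):
    """Categories ordered from most to least frequent (ties by name)."""
    return sorted(ratios, key=lambda cat: (-ratios[cat], cat))

tokens = "investors were sad and furious then furious again but glad".split()
ratios = category_ratios(tokens)
print(ratios)
print(rank_categories(ratios))
```

Using ratios rather than raw counts makes companies with different amounts of news coverage comparable, which is what the cross-sectional comparison relies on.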

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data for the extracted products is summed to estimate the market size of the product groups. As experimental data, the text of product names from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training.
We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to the KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently, according to the purpose of the information, by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another algorithm, such as Jaccard similarity, with Word2Vec.
Also, the product group clustering method could be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
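
The grouping and summation steps can be illustrated with a short sketch. It assumes the embedding vectors are already available; in the study they come from a Word2Vec model with 300 dimensions, whereas the two-dimensional vectors, product names, sales figures, and thresholds below are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def estimate_market_size(index_vec, product_vecs, sales, threshold=0.8):
    """Group products similar to an index word and sum their sales."""
    members = [name for name, vec in product_vecs.items()
               if cosine(index_vec, vec) >= threshold]
    return members, sum(sales[name] for name in members)

# Toy embeddings standing in for trained Word2Vec vectors.
index_vec = [1.0, 0.0]  # hypothetical vector of a classification index word
product_vecs = {
    "steel pipe": [0.9, 0.1],
    "copper pipe": [0.8, 0.3],
    "apple juice": [0.1, 0.9],
}
sales = {"steel pipe": 120, "copper pipe": 80, "apple juice": 50}

broad_group, broad_size = estimate_market_size(index_vec, product_vecs, sales, 0.8)
narrow_group, narrow_size = estimate_market_size(index_vec, product_vecs, sales, 0.95)
print(broad_group, broad_size)
print(narrow_group, narrow_size)
```

Raising the threshold narrows the product group and lowers the estimated market size, which is how the category level can be tuned to the purpose of the analysis.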

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.93-110
    • /
    • 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Based on automated information filtering technology, various recommender systems have been developed. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in a number of different domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors have liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When customer ratings are scarce, the neighborhood is formed inaccurately, resulting in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, which is created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of information at large scale. Many companies therefore recognize the importance of utilizing big data, because it allows them to improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows: First, multimodal user profiles are created from personal big data in order to grasp the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text.
Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are identified from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products most preferred by the users in the neighborhood are recommended to the target users. For the experiments, we used the demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate the recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross validation was also conducted to enhance the reliability of our experiments. The F1 metric, a widely used combined metric that gives equal weight to recall and precision, was employed for our evaluation. In the evaluation, the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the rate of improvement in F1 was 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
However, our methodology requires further study before real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms and then identify which combination shows the best performance.
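
Two of the ensemble rules and the F1 metric can be written down compactly. The sketch below is illustrative only: the per-profile similarity values and the weights are invented, and the abstract does not state how the weights were chosen.

```python
def average_similarity(sims):
    """Plain average of the per-profile similarities between two users."""
    return sum(sims.values()) / len(sims)

def weighted_similarity(sims, weights):
    """Weighted average of the per-profile similarities."""
    total = sum(weights[p] for p in sims)
    return sum(weights[p] * s for p, s in sims.items()) / total

def f1(recommended, relevant):
    """F1: harmonic mean of precision and recall over item lists."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Hypothetical similarities between two users on three of the profiles.
sims = {"rating": 0.8, "site": 0.4, "demographic": 0.6}
weights = {"rating": 2.0, "site": 1.0, "demographic": 1.0}  # assumed weights

avg = average_similarity(sims)
wavg = weighted_similarity(sims, weights)
score = f1(["a", "b", "c", "d"], ["a", "b", "e"])
print(avg, wavg, score)
```

The weighted variant lets a more informative profile, such as ratings in this toy case, pull the combined similarity toward its own value, which is one plausible reason it performed best in the reported experiments.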

A Study of Traffic Incident Flow Characteristics on Korean Highway Using Multi-Regime (Multi-Regime에 의한 돌발상황 시 교통류 분석)

  • Lee Seon-Ha;Kang Hee-Chan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.4 no.1 s.6
    • /
    • pp.43-56
    • /
    • 2005
  • This research examined a time series analysis (TSA) of hourly traffic information such as occupancy, traffic flow, and speed; a statistical model of surveyed data on the traffic fundamental diagram; and how traffic jams spread across multiple regimes of the traffic flow. According to detector data from traffic accidents on the Cheonan-Nonsan highway, events in which the road volume decreases dramatically, such as traffic accidents, can be estimated from the change in occupancy right after the event. For events such as traffic jams, the changes in occupancy and mean speed are gradual, so detection by time series analysis of simple traffic indices is weak in both speed and accuracy. In stable flow, the relationship between occupancy and flow is linear and highly reliable. In contrast, platoon formation, which arises from the wide variation in drivers' desired speeds, is difficult to express with a statistical model of the relationship between speed and occupancy; in this case the speed drops sharply at 6-8% occupancy. For unstable flow, it is difficult to adopt a statistical model, because the formation and clearance process of a traffic jam has to be analyzed regime by regime. When the formation and clearance process is divided into two regimes, flow affected by an accident shifts to stopped flow and the occupancy increases dramatically. According to the multi-regime time series analysis, when the flow recovers from stopped flow to free flow, the occupancy that had risen dramatically decreases gradually and the traffic flow then increases. In a traffic jam, the flow shifts from impeded free flow to congested flow and then to jammed flow, which is more complicated than in the accident case, and the traffic volume at the same occupancy differs greatly between traffic conditions.
This research shows the need for multi-regime division when analyzing traffic flow; future work requires quantitative criteria for the division and a model for each traffic regime.

Mitigation of Insufficient Capacity Problems of Central Bus Stops by Controlling Effective Green Time (유효녹색시간 조정을 활용한 중앙버스정류장 용량 부족 완화 방안 연구)

  • Koo, Kyo Min;Lee, Jae Duk;Ahn, Se Young;Chang, Iljoon
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.1
    • /
    • pp.35-50
    • /
    • 2022
  • After the introduction of the central bus lane system, bus traffic was prioritized, which improved bus users' trust in the service. However, the low capacity of central bus stops reduces traffic speed and punctuality. In addition, physical constraints are inevitable because the construction of central bus lanes and bus stops has to take the city's road geometry into account. Therefore, this study attempted to optimize the effective green time of the traffic signal system at the entrance and exit of the central bus stop to remedy its insufficient operational capacity. The Transit Capacity and Quality of Service Manual and the Korea Highway Capacity Manual were used as the analysis methodologies. The number of stop areas for central bus stops to be built was determined by excluding variable physical factors, and field survey data collected from nine randomly selected central bus stops currently installed in Seoul were used. A scenario analysis was conducted on the central bus stops with insufficient capacity by adjusting the effective green time, with the capacity of the central bus stop as the dependent variable. According to the results, the capacity shortfall could be resolved at 26.7 percent of the central bus stops with insufficient capacity. Therefore, improvement at the operational level can be effective even when the number of central bus stops calculated by engineering cannot be guaranteed during the planning stage of the central bus stop. As the number of central bus stops is expected to keep increasing, improving them will become necessary. It is therefore hoped that the results presented in this study will be used as basic data for improvement plans at the operational level before physical improvement plans are introduced.
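
The link between effective green time and stop capacity can be sketched with the loading-area capacity relation commonly quoted from the Transit Capacity and Quality of Service Manual, B = 3600 (g/C) / (t_c + t_d (g/C) + Z c_v t_d). This is a simplified form written from that general relation and should be checked against the manual; the paper's exact model may include further adjustments, and every parameter value below is an illustrative assumption.

```python
def loading_area_capacity(g_over_c, clearance_s, dwell_s, z=1.28, cv=0.6):
    """Buses per hour that one loading area can serve.

    g_over_c    : effective green time divided by cycle length, in (0, 1]
    clearance_s : clearance time between successive buses, in seconds
    dwell_s     : average dwell time, in seconds
    z           : standard normal value for the accepted failure rate
                  (1.28 corresponds to roughly a 10% failure rate)
    cv          : coefficient of variation of dwell times (assumed value)
    """
    if not 0 < g_over_c <= 1:
        raise ValueError("g/C must be in (0, 1]")
    operating_margin = z * cv * dwell_s
    return 3600 * g_over_c / (clearance_s + dwell_s * g_over_c + operating_margin)

def stop_capacity(n_effective_areas, **kwargs):
    """Stop capacity: effective number of loading areas times per-area capacity."""
    return n_effective_areas * loading_area_capacity(**kwargs)

# Scenario comparison with assumed inputs: lengthen the effective green share.
base = stop_capacity(2.0, g_over_c=0.5, clearance_s=10, dwell_s=30)
longer_green = stop_capacity(2.0, g_over_c=0.6, clearance_s=10, dwell_s=30)
print(round(base, 1), round(longer_green, 1))
```

Under this relation capacity rises with the green ratio but less than proportionally, because a longer green also lets more of the dwell time count against the stop.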