• Title/Summary/Keyword: Smart System

Search Result 8,418, Processing Time 0.04 seconds

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. Especially, the reproduction rate was 0.9828 for CMAE, which can be confirmed to detect almost most of the abnormalities. The accuracy of the model was also improved and 87.12%, and the F1-score was 0.8883, which is considered to be suitable for anomaly detection. In practical aspect, the proposed model has an additional advantage in addition to performance improvement. The use of techniques such as time series decomposition and sliding windows has the disadvantage of managing unnecessary procedures; and their dimensional increase can cause a decrease in the computational speed in inference.The proposed model has characteristics that are easy to apply to practical tasks such as inference speed and model management.

A Methodology of Multimodal Public Transportation Network Building and Path Searching Using Transportation Card Data (교통카드 기반자료를 활용한 복합대중교통망 구축 및 경로탐색 방안 연구)

  • Cheon, Seung-Hoon;Shin, Seong-Il;Lee, Young-Ihn;Lee, Chang-Ju
    • Journal of Korean Society of Transportation
    • /
    • v.26 no.3
    • /
    • pp.233-243
    • /
    • 2008
  • Recognition for the importance and roles of public transportation is increasing because of traffic problems in many cities. In spite of this paradigm change, previous researches related with public transportation trip assignment have limits in some aspects. Especially, in case of multimodal public transportation networks, many characters should be considered such as transfers. operational time schedules, waiting time and travel cost. After metropolitan integrated transfer discount system was carried out, transfer trips are increasing among traffic modes and this takes the variation of users' route choices. Moreover, the advent of high-technology public transportation card called smart card, public transportation users' travel information can be recorded automatically and this gives many researchers new analytical methodology for multimodal public transportation networks. In this paper, it is suggested that the methodology for establishment of brand new multimodal public transportation networks based on computer programming methods using transportation card data. First, we propose the building method of integrated transportation networks based on bus and urban railroad stations in order to make full use of travel information from transportation card data. Second, it is offered how to connect the broken transfer links by computer-based programming techniques. This is very helpful to solve the transfer problems that existing transportation networks have. Lastly, we give the methodology for users' paths finding and network establishment among multi-modes in multimodal public transportation networks. By using proposed methodology in this research, it becomes easy to build multimodal public transportation networks with existing bus and urban railroad station coordinates. Also, without extra works including transfer links connection, it is possible to make large-scaled multimodal public transportation networks. In the end, this study can contribute to solve users' paths finding problem among multi-modes which is regarded as an unsolved issue in existing transportation networks.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.25-41
    • /
    • 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed in order to identify topics and themes from a large amount of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on the additional period. Second, the issue is not only generated and terminated constantly, but also one issue can sometimes be distributed into several issues or multiple issues can be integrated into one single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connection and effect relationship between these issues. The purpose of this study is to overcome the two limitations of the existing issue tracking method, one being the limitation regarding the analysis method and the other being the limitation involving the lack of consideration of the changeability of the issues. Let us assume that we perform multiple topic analysis for each multiple period. Then it is essential to map issues of different periods in order to trace trend of issues. However, it is not easy to discover connection between issues of different periods because the issues derived for each period mutually contain heterogeneity. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis can be performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach on each details period was presented, and the issue flow of the entire integrated period was depicted in this study. Thus, as the entire process of the issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, the changeability of the issues was analyzed in this study. The proposed methodology is highly efficient in terms of time and cost, as it sufficiently considered the changeability of the issues. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, the potential practical applications of the proposed methodology are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and it could follow the course of progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.

Implementation of Radiotherapy Educational Contents Using Virtual Reality (가상현실 기술을 활용한 방사선치료 교육 콘텐츠 제작 구현)

  • Kwon, Soon-Mu;Shim, Jae-Goo;Chon, Kwon-Su
    • Journal of the Korean Society of Radiology
    • /
    • v.12 no.3
    • /
    • pp.409-415
    • /
    • 2018
  • The development of smart devices has brought about significant changes in daily life and one of the most significant changes is the virtual reality zone. Virtual reality is a technology that creates the illusion that a 3D high-resolution image has already been created using a display device just like it does in itself. Unrealized subjects are forced to rely on audiovisual materials, resulting in a decline in the concentration of practices and the quality of classes. It used virtual reality to develop effective teaching materials for radiology students. In order to produce a video clip bridge using virtual reality, a radiology clinic was selected to conduct two exposures from July to September 2017. The video was produced taking into account the radiology and work flow chart and filming was carried out in two separate locations : in the computerized tomography unit and in the LINAC room. Prior to filming the scenario and the filming route were checked in advance to facilitate editing of the video. Modeling and mapping was performed in a PC environment using the Window XP operating system. Using two leading virtual reality camera Gopro Hero, CC pixels were produced using a 4K UHD, Adobe, followed by an 8 megapixel resolution of $3,840{\times}2,160/4,096{\times}2,160$. Total regeneration time was performed in about 5 minutes during the production of using virtual reality to prevent vomiting and dizziness. Currently developed virtual reality radiation and educational contents are being used to secure the market and extend the promotion process to be used by various institutions. The researchers will investigate the satisfaction level of radiation and educational contents using virtual reality and carry out supplementary tasks depending on the results.

T-Cache: a Fast Cache Manager for Pipeline Time-Series Data (T-Cache: 시계열 배관 데이타를 위한 고성능 캐시 관리자)

  • Shin, Je-Yong;Lee, Jin-Soo;Kim, Won-Sik;Kim, Seon-Hyo;Yoon, Min-A;Han, Wook-Shin;Jung, Soon-Ki;Park, Se-Young
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.5
    • /
    • pp.293-299
    • /
    • 2007
  • Intelligent pipeline inspection gauges (PIGs) are inspection vehicles that move along within a (gas or oil) pipeline and acquire signals (also called sensor data) from their surrounding rings of sensors. By analyzing the signals captured in intelligent PIGs, we can detect pipeline defects, such as holes and curvatures and other potential causes of gas explosions. There are two major data access patterns apparent when an analyzer accesses the pipeline signal data. The first is a sequential pattern where an analyst reads the sensor data one time only in a sequential fashion. The second is the repetitive pattern where an analyzer repeatedly reads the signal data within a fixed range; this is the dominant pattern in analyzing the signal data. The existing PIG software reads signal data directly from the server at every user#s request, requiring network transfer and disk access cost. It works well only for the sequential pattern, but not for the more dominant repetitive pattern. This problem becomes very serious in a client/server environment where several analysts analyze the signal data concurrently. To tackle this problem, we devise a fast in-memory cache manager, called T-Cache, by considering pipeline sensor data as multiple time-series data and by efficiently caching the time-series data at T-Cache. To the best of the authors# knowledge, this is the first research on caching pipeline signals on the client-side. We propose a new concept of the signal cache line as a caching unit, which is a set of time-series signal data for a fixed distance. We also provide the various data structures including smart cursors and algorithms used in T-Cache. Experimental results show that T-Cache performs much better for the repetitive pattern in terms of disk I/Os and the elapsed time. Even with the sequential pattern, T-Cache shows almost the same performance as a system that does not use any caching, indicating the caching overhead in T-Cache is negligible.

The Relationships among Perceived Value, Use-Diffusion, Loyalty of Mobile Instant Messaging Service (모바일 메신저 서비스의 지각된 가치, 사용-확산 그리고 충성도 간의 관계에 대한 연구)

  • Jo, Dong-Hyuk;Park, Jong-Woo;Chun, Hyun-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.193-212
    • /
    • 2011
  • Mobile instant messaging service is surfacing to an important keyword in the mobile market together with popularization of Smart phones. Mobile instant messaging service in Korea has become popular to the degree of 87.9% usages from total Smartphone holders, and it is expected that using populations will be more enlarged afterwards if considering a fact that its populations of Smartphone is continuously being increased after exceeding 10 million persons (Trend Monitor, June 2011). In the instant messaging market where competitions have been deepened day by day, raising customer's royalties will be the key for company's business survivals and goals of corporate marketing strategies. It could be said that understanding on which factors affect to customer retentions and royalties is very important. Specially, as changing status is being progressed very quickly in case of innovative mobile services like the instant messaging service, research necessities on how many do consumers use the services after accepting them, how much do consumers use them variously, and whether does it connect to long-term relations have been increased, but studies on such matters are in insufficient situations actually. Therefore, this study examined on which effects were affected to use-diffusion and loyalty factors from perceived customer vales' factors having been occurred after accepting the mobile instant messaging service, namely 'functional value', 'monetary value', 'emotional value', and 'social value'. Also, the study looked into what kind of roles do the service usage and using variety play to service's continued using intents as a loyalty index, recommending intents to others, and brand switching intents. And then the study laid the main purpose in trying to provide implications for enhancing customer securities and royalties on the mobile instant messaging service through research's results. The research hypotheses are as follows; H1: Perceived values will affect influences to royalties. H2: Use-Diffusion will affect influences to loyalty. H3: Perceived value will affect influences to loyalty. H4: The use-diffusion will play intermediating roles between perceived values and loyalty. Total 276 cases among collected 284 ones were used for the statistical analysis by SPSS ver. 15 package. Reliability, Factor analysis, regression were done. As the result of research, 'monetary value' and 'emotional value' affected to 'usage' among perceived value factors, and 'emotional value' was appeared as affecting the largest influence. Besides, the usage affected to constant-using intents and recommending intents to others, and using varieties were displayed as affecting to recommending intents to others. On the other hand, 'Using' and 'Using diversity' were appeared as not affecting to 'brand switching intentions'. Meanwhile, as the result of recognizing about effects of perceived values on the loyalty, it was appeared such like 'continued using intents' affected to'functional value', 'monetary value', and 'social value' first, and also 'monetary value', 'emotional value', and 'social value' affected to 'recommending intents to others'. On the other hand, it was shown such like only 'social value' affected influences to 'brand switching intents', and thus contrary results with the factor 'constant-using intents' were displayed. So, it seems that there are many applications to service provides who are worrying about marketing strategies for making consumer retains (constant-using) and new consumer's inductions (brand-switching intents). Finally, as a result of looking into intermediating roles of the use-diffusion factor in relations between conceived values and royalties at hypothesis 4, 'using' and 'using diversity' were displayed as affecting significant influences all together. Regarding to research result's implications, for expanding and promoting continued uses of the mobile instant messaging service by service providers: First, encouraging recognitions on the perceived value connected to users' service usage are necessary. Second, setting up user's use-diffusion strategies are required so as to enhance the loyalty after understanding a fact that use-diffusion patterns affecting to the service's loyalty are different. Finally, methods of raising customer loyalties and making constant relationships have to be grouped by analyzing on what are the customer value's factors that can satisfy users in competitive alterations.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. Those text data were produced and distributed through various media platforms such as World Wide Web, Internet news feeds, microblog, and social media. However, this enormous amount of easily obtained information is lack of organization. Therefore, this problem has raised the interest of many researchers in order to manage this huge amount of information. Further, this problem also required professionals that are capable of classifying relevant information and hence text classification is introduced. Text classification is a challenging task in modern data analysis, which it needs to assign a text document into one or more predefined categories or classes. In text classification field, there are different kinds of techniques available such as K-Nearest Neighbor, Naïve Bayes Algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, while dealing with huge amount of text data, model performance and accuracy becomes a challenge. According to the type of words used in the corpus and type of features created for classification, the performance of a text classification model can be varied. Most of the attempts are been made based on proposing a new algorithm or modifying an existing algorithm. This kind of research can be said already reached their certain limitations for further improvements. In this study, aside from proposing a new algorithm or modifying the algorithm, we focus on searching a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of training data upon which this classifier is built. The real world datasets in most of the time contain noise, or in other words noisy data, these can actually affect the decision made by the classifiers built from these data. In this study, we consider that the data from different domains, which is heterogeneous data might have the characteristics of noise which can be utilized in the classification process. In order to build the classifier, machine learning algorithm is performed based on the assumption that the characteristics of training data and target data are the same or very similar to each other. However, in the case of unstructured data such as text, the features are determined according to the vocabularies included in the document. If the viewpoints of the learning data and target data are different, the features may be appearing different between these two data. In this study, we attempt to improve the classification accuracy by strengthening the robustness of the document classifier through artificially injecting the noise into the process of constructing the document classifier. With data coming from various kind of sources, these data are likely formatted differently. These cause difficulties for traditional machine learning algorithms because they are not developed to recognize different type of data representation at one time and to put them together in same generalization. Therefore, in order to utilize heterogeneous data in the learning process of document classifier, we apply semi-supervised learning in our study. However, unlabeled data might have the possibility to degrade the performance of the document classifier. Therefore, we further proposed a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules will be selected and applied for the final decision making. In this paper, three different types of real-world data sources were used, which are news, twitter and blogs.

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.199-219
    • /
    • 2018
  • With the everyday use of the Internet and the spread of various smart devices, users have been able to communicate in real time and the existing communication style has changed. Due to the change of the information subject by the Internet, data became more massive and caused the very large information called big data. These Big Data are seen as a new opportunity to understand social issues. In particular, text mining explores patterns using unstructured text data to find meaningful information. Since text data exists in various places such as newspaper, book, and web, the amount of data is very diverse and large, so it is suitable for understanding social reality. In recent years, there has been an increasing number of attempts to analyze texts from web such as SNS and blogs where the public can communicate freely. It is recognized as a useful method to grasp public opinion immediately so it can be used for political, social and cultural issue research. Text mining has received much attention in order to investigate the public's reputation for candidates, and to predict the voting rate instead of the polling. This is because many people question the credibility of the survey. Also, People tend to refuse or reveal their real intention when they are asked to respond to the poll. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, which includes the prohibition period of public opinion polls just prior to the presidential election day. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. By frequency analysis, we identified the words that are the most important issues per day. Particularly, according to the result of the presidential debate, it was seen that the candidate who became an issue was located at the top of the frequency analysis. By the analysis of associative emotional words, we were able to identify issues most relevant to each candidate. The topic emotion analysis was used to identify each candidate's topic and to express the emotions of the public on the topics. Finally, we estimated the voting rate by combining the volume of comments and sentiment score. By doing above, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments is an effective tool for tracking the issue of presidential candidates and for predicting the voting rate. Particularly, this study showed issues per day and quantitative index for sentiment. Also it predicted voting rate for each candidate and precisely matched the ranking of the top five candidates. Each candidate will be able to objectively grasp public opinion and reflect it to the election strategy. Candidates can use positive issues more actively on election strategies, and try to correct negative issues. Particularly, candidates should be aware that they can get severe damage to their reputation if they face a moral problem. Voters can objectively look at issues and public opinion about each candidate and make more informed decisions when voting. If they refer to the results of this study before voting, they will be able to see the opinions of the public from the Big Data, and vote for a candidate with a more objective perspective. If the candidates have a campaign with reference to Big Data Analysis, the public will be more active on the web, recognizing that their wants are being reflected. The way of expressing their political views can be done in various web places. This can contribute to the act of political participation by the people.

An Embedding /Extracting Method of Audio Watermark Information for High Quality Stereo Music (고품질 스테레오 음악을 위한 오디오 워터마크 정보 삽입/추출 기술)

  • Bae, Kyungyul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.21-35
    • /
    • 2018
  • Since the introduction of MP3 players, CD recordings have gradually been vanishing, and the music consuming environment of music users is shifting to mobile devices. The introduction of smart devices has increased the utilization of music through music playback, mass storage, and search functions that are integrated into smartphones and tablets. At the time of initial MP3 player supply, the bitrate of the compressed music contents generally was 128 Kbps. However, as increasing of the demand for high quality music, sound quality of 384 Kbps appeared. Recently, music content of FLAC (Free License Audio Codec) format using lossless compression method is becoming popular. The download service of many music sites in Korea has classified by unlimited download with technical protection and limited download without technical protection. Digital Rights Management (DRM) technology is used as a technical protection measure for unlimited download, but it can only be used with authenticated devices that have DRM installed. Even if music purchased by the user, it cannot be used by other devices. On the contrary, in the case of music that is limited in quantity but not technically protected, there is no way to enforce anyone who distributes it, and in the case of high quality music such as FLAC, the loss is greater. In this paper, the author proposes an audio watermarking technology for copyright protection of high quality stereo music. Two kinds of information, "Copyright" and "Copy_free", are generated by using the turbo code. The two watermarks are composed of 9 bytes (72 bits). If turbo code is applied for error correction, the amount of information to be inserted as 222 bits increases. The 222-bit watermark was expanded to 1024 bits to be robust against additional errors and finally used as a watermark to insert into stereo music. Turbo code is a way to recover raw data if the damaged amount is less than 15% even if part of the code is damaged due to attack of watermarked content. It can be extended to 1024 bits or it can find 222 bits from some damaged contents by increasing the probability, the watermark itself has made it more resistant to attack. The proposed algorithm uses quantization in DCT so that watermark can be detected efficiently and SNR can be improved when stereo music is converted into mono. As a result, on average SNR exceeded 40dB, resulting in sound quality improvements of over 10dB over traditional quantization methods. This is a very significant result because it means relatively 10 times improvement in sound quality. In addition, the sample length required for extracting the watermark can be extracted sufficiently if the length is shorter than 1 second, and the watermark can be completely extracted from music samples of less than one second in all of the MP3 compression having a bit rate of 128 Kbps. The conventional quantization method can extract the watermark with a length of only 1/10 compared to the case where the sampling of the 10-second length largely fails to extract the watermark. In this study, since the length of the watermark embedded into music is 72 bits, it provides sufficient capacity to embed necessary information for music. It is enough bits to identify the music distributed all over the world. 272 can identify $4*10^{21}$, so it can be used as an identifier and it can be used for copyright protection of high quality music service. The proposed algorithm can be used not only for high quality audio but also for development of watermarking algorithm in multimedia such as UHD (Ultra High Definition) TV and high-resolution image. In addition, with the development of digital devices, users are demanding high quality music in the music industry, and artificial intelligence assistant is coming along with high quality music and streaming service. The results of this study can be used to protect the rights of copyright holders in these industries.