• Title/Summary/Keyword: Public Big data

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems, v.19 no.3, pp.141-156, 2013
  • Social media is a representative form of Web 2.0 that changes users' information behavior by allowing them to produce their own content without expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to share their opinions and thoughts with both acquaintances and the general public. Social media data plays a significant role in the emerging Big Data arena. Research areas such as social network analysis and opinion mining have therefore sought to discover meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which has made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing large streams of Twitter data in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets mentioning the candidates' names and the election from Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows how topics change over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track issues faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter can be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, a network type unique to Twitter, can reveal different aspects of user behavior.
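
The term co-occurrence retrieval described in this abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and counts co-occurrence at the level of a single tweet; the function name `cooccurring_terms` and the sample tweets are hypothetical and are not taken from the authors' system, whose implementation the abstract does not describe.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Count terms that appear in the same tweet as the query term."""
    query = query.lower()
    counts = Counter()
    for tweet in tweets:
        # Assumption: simple whitespace tokenization, one count per tweet.
        tokens = set(tweet.lower().split())
        if query in tokens:
            counts.update(tokens - {query})
    return counts.most_common(top_n)


# Hypothetical toy tweets, not data from the study.
sample = [
    "moon election poll",
    "moon candidacy agreement",
    "park election poll",
    "ahn candidacy agreement",
]
print(cooccurring_terms(sample, "election", top_n=1))
```

A real system for Korean tweets would likely need a morphological analyzer instead of whitespace splitting; that choice is left open here.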

Data-Driven Approach to Identify Research Topics for Science and Technology Diplomacy (과학외교를 위한 데이터기반의 연구주제선정 방법)

  • Yeo, Woon-Dong;Kim, Seonho;Lee, BangRae;Noh, Kyung-Ran
    • The Journal of the Korea Contents Association, v.20 no.11, pp.216-227, 2020
  • In science and technology diplomacy, major countries actively use their science and technology capabilities for public diplomacy, especially to promote diplomatic relations with politically sensitive regions and countries. Recently, as the influence of science and technology on national development has grown, interest in science and technology diplomacy has increased. So far, science and technology diplomacy has relied on experts to find research topics of common interest to both countries. However, this method has various problems, such as bias arising from experts' subjective judgment, the halo effect attached to famous researchers, and the use of different criteria by different experts. This paper presents an objective, data-based approach that identifies and recommends research topics to support science and technology diplomacy without relying on experts. The proposed approach is based on big data analysis that uses deep-learning techniques and bibliometric methods, with the Scopus database used to find suitable topics for collaborative research between two countries. The approach has been used to support science and technology diplomacy between Korea and Hungary and has raised expectations among policy makers. Finally, the paper discusses aspects that should be addressed to improve the system in the future.

A Methodology for Estimating Large Scale Dynamic O/D of Commuter Working Trip (대규모 동적 O/D 생성을 위한 추정 방법론 연구: 첨두 출근통행을 기준으로)

  • HAN, He;HONG, Kiman;KIM, Taegyun;WHANG, Junmun;HONG, Young Suk;CHO, Joong Rae
    • Journal of Korean Society of Transportation, v.36 no.3, pp.203-215, 2018
  • This study proposes a method for constructing a large-scale dynamic O/D that reflects how passengers' travel patterns change according to the land use patterns of the destination. Existing research on dynamic O/D estimation has limitations: data are difficult to collect, so the methods can be applied only to small areas, or they are restricted to a specific transportation network such as highway or public transportation networks. In this paper, we propose a method for estimating dynamic O/D without limiting the analysis area, based on transportation resources that can be easily collected and used in the big data era. Clustering analysis was used to calculate the departure-time trip distribution ratio based on arrival time, and a departure-time trip distribution function was estimated for each cluster. In a comparison test against survey data, the estimated distribution function was statistically significant.

Arrival Time Estimation for Bus Information System Using Hidden Markov Model (은닉 마르코프 모델을 이용한 버스 정보 시스템의 도착 시간 예측)

  • Park, Chul Young;Kim, Hong Geun;Shin, Chang Sun;Cho, Yong Yun;Park, Jang Woo
    • KIPS Transactions on Computer and Communication Systems, v.6 no.4, pp.189-196, 2017
  • A BIS (Bus Information System) provides various bus-related information, including predicted arrival times at stations. BISs have been deployed in almost all cities in Korea and have played an active role in improving the convenience of public transportation. Moving average filters, Kalman filters, and regression models have been the representative methods for forecasting bus arrival times in current BISs. Prediction accuracy depends largely on the forecasting algorithm and on the traffic conditions considered when forecasting. Current BISs use simple prediction algorithms that consider only the passage times and distances between stations. Arrival forecasts, however, are influenced by traffic conditions such as traffic signals, traffic accidents, and pedestrians, as well as by missing data. Building models that account for these factors to improve the accuracy of arrival estimates is very difficult. Hidden Markov Models (HMMs) are effective algorithms under these constraints, so we built HMM forecasting models for bus arrival times in the current BIS. The models were built using data collected in Suncheon City in 2015; Suncheon has about 2,298 stations and 217 routes. Separate models were developed for weekdays and weekends and were then validated with data from different districts and times. We find that our HMM models provide more accurate forecasts than existing methods such as moving average filters, Kalman filters, and regression models. Two different sections were used to find the patterns, and the results were verified using a bootstrap process.
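
As a rough illustration of how a discrete Hidden Markov Model scores a sequence of observations, the following Python sketch implements the standard forward algorithm. The hidden states (`free`, `congested`), the observation symbols (`fast`, `slow`), and all probabilities are hypothetical values chosen for the example; the abstract does not give the authors' state definitions or parameters, so this is not their model.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under a discrete HMM using the forward algorithm."""
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Recursion: propagate probabilities one observation at a time.
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    # Termination: sum over all possible final states.
    return sum(alpha.values())


# Hypothetical parameters: hidden traffic state, observed link travel speed.
states = ("free", "congested")
start_p = {"free": 0.6, "congested": 0.4}
trans_p = {
    "free": {"free": 0.7, "congested": 0.3},
    "congested": {"free": 0.4, "congested": 0.6},
}
emit_p = {
    "free": {"fast": 0.9, "slow": 0.1},
    "congested": {"fast": 0.2, "slow": 0.8},
}

likelihood = forward(["fast", "slow"], states, start_p, trans_p, emit_p)
print(likelihood)
```

In a forecasting setting, such likelihoods would typically be combined with parameter estimation and state decoding; only the scoring step is sketched here.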

A Study on the Improvement of RIMGIS for an Efficient River Information Service (효율적인 하천정보 서비스를 위한 RIMGIS 개선방안 연구)

  • Shin, Hyung-Jin;Chae, Hyo-Sok;Hwang, Eui-Ho;Lim, Kwang-Suop
    • Journal of the Korean Association of Geographic Information Studies, v.16 no.1, pp.15-25, 2013
  • The RIMGIS (River Information Management GIS) has been under development since 2000 for public service and practical use in related work, following the standardization of national river data such as the river facility register report, river survey maps, and attached maps. The RIMGIS has been improved to respond proactively to changes in the information environment. Recently, Smart River-based river information services and related data have grown so large that improvements in big data management have become necessary. This study proposes a plan both to respond to these changes and to provide a future Smart River-based river information service by assessing the current state of RIMGIS, improving RIMGIS itself, redesigning the database, developing distribution, and integrating river information systems. To this end, the primary and foreign keys, which distinguish attribute information and entity linkages, were redefined to increase the usability of RIMGIS. The attribute information database and the entity relationship diagram were also redefined to redesign the linkages among tables from the perspective of a river standard database. In addition, this study aims to expand the current supplier-oriented operating system into a demand-oriented one by establishing efficient management of river-related information and a utilization system that can adapt to changes in the river management paradigm.

An investigation on the Improvement of the Working Environment Measurement Reporting Policy (작업환경측정 보고제도 개선 방안 도출을 위한 조사 연구)

  • Lim, Dae Sung;Kim, Chi-Nyon;Lee, Seung kil;Park, Jung-Keun;Kim, Ki-Youn
    • Journal of Korean Society of Occupational and Environmental Hygiene, v.32 no.2, pp.172-181, 2022
  • Objectives: To reduce the burden on employers and increase the reliability of measurement results, improvements are planned to the provisions governing the work environment measurement reporting system, such as the current Occupational Safety and Health Act and its Enforcement Rules. This study aimed to suggest improvements to the work environment measurement reporting system through a survey and a Delphi investigation. Method: The survey covered workplaces (health managers), national institutions (the Ministry of Employment and Labor) that use the results of the work environment measurement reporting system for policy and supervision purposes, and work environment measurement institutions that enter the results. In addition to the survey, we sought to derive results through meetings with stakeholders and expert advisory meetings. Results: It is difficult to abolish or partially improve the reporting system under the Enforcement Regulations of the Occupational Safety and Health Act at this point because workplaces, supervisory agencies, and measuring agencies hold different views on its intended purpose and use. For high-exposure harmful factors (over 50% on the basis of exposure) noted in the "comprehensive opinion" section of the work environment measurement results table, it is necessary to record the unit of work with the exposed harmful factors, the exposure factors, and the current conditions in checklists or tables so that they can be reflected in government policies. For workplaces where high exposure to substances subject to measurement is a concern, it seems desirable to revise the system so that industrial health instructors registered with the Korea Safety and Health Agency or local labor offices can provide technical guidance. To increase the reliability of the data and its use as big data, the input method for processes and jobs needs to be improved.
Conclusion: The laws and regulations of the work environment measurement reporting system are difficult to revise due to a lack of consensus among current stakeholders, but improvements can be achieved by revising the Ministry of Employment and Labor's notifications and through other means. In addition, to use the data from the K2B system effectively, the input method for processes and jobs needs to be improved.

Impacts of Social Distancing for COVID-19 on Urban Space Use in Seoul (COVID-19 사회적 거리두기가 도시공간이용에 미치는 영향)

  • Park, Hong Il;Lee, Sangkyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.39 no.6, pp.457-467, 2021
  • This paper analyzes changes in urban space use caused by COVID-19 social distancing measures, using daytime de facto population data for Seoul, which the Seoul Metropolitan Government and the telecommunications company KT estimate from public big data and LTE signal data. Kernel density estimation and spatial autocorrelation analysis show that the distribution patterns of the de facto population in 2019 and 2020 were generally similar. This indicates that the government's social distancing measures allowed a certain level of normal activity while suppressing the spread of COVID-19. However, analyzing the difference in de facto population between the two years (2020 minus 2019) showed different results at the micro level: the de facto population decreased in commercial areas but increased in residential areas. This means that COVID-19 social distancing measures had spatially uneven effects. Spatial regression analysis of the effects of regional, land use, economic, educational, and accessibility characteristics on changes in the de facto population yielded the following results. The de facto population decreased more where the density of commercial facilities was higher and where there were more businesses subject to regulations and more schools and universities requiring non-face-to-face classes. Conversely, the de facto population increased in areas with many houses and parks, owing to telecommuting.
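
Spatial autocorrelation of a year-over-year difference can be measured in several ways. The Python sketch below computes global Moran's I, one commonly used statistic; the abstract does not say which statistic the authors used, so this is an assumption. The four-cell layout, the binary adjacency weights, and the population change values are all hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values on n cells with an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical example: four cells in a row, neighbors share an edge.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical change in de facto population (2020 minus 2019) per cell.
change = [1.0, 2.0, 3.0, 4.0]
print(morans_i(change, adjacency))
```

Values near 1 indicate that similar changes cluster together, while values near -1 indicate that neighboring cells tend to change in opposite directions.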

Smart Store in Smart City: The Development of Smart Trade Area Analysis System Based on Consumer Sentiments (Smart Store in Smart City: 소비자 감성기반 상권분석 시스템 개발)

  • Yoo, In-Jin;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.25-52, 2018
  • This study performs social network analysis based on consumer sentiment related to locations in Seoul, using data that reflect consumers' web search activities and emotional evaluations associated with commerce. The study focuses on large commercial districts in Seoul. In addition, to consider their various aspects, social network indexes were combined with the trading areas' public data to verify the factors affecting each area's sales. Judging by the change in R-squared, the model has a fairly high R-squared value even when it includes only the districts' public data, which are static. However, the R-squared improved considerably further when the model was combined with the network indexes derived from the social network analysis. A regression analysis of the trading areas' public data showed that, among twenty-two variables, five factors had a significant influence on the sales of a trading area: 'number of market district,' 'residential area per person,' 'satisfaction of residential environment,' 'rate of change of trade,' and 'survival rate over 3 years.' Of these, 'residential area per person' has the highest standardized beta value and therefore the strongest influence on commercial sales. In addition, 'residential area per person,' 'number of market district,' and 'survival rate over 3 years' were found to have positive effects on the sales of all trading areas. Thus, sales increase as the number of market districts in the trading area, the residential area per person, and the three-year survival rate of stores in the trading area increase. On the other hand, 'satisfaction of residential environment' and 'rate of change of trade' were found to have a negative effect on sales. In the case of 'satisfaction of residential environment,' sales increase when the satisfaction level is low.
Therefore, as consumer dissatisfaction with the residential environment increases, sales increase. The 'rate of change of trade' shows that sales increase with the decreasing acceleration of transaction frequency. According to the social network analysis, of the 25 regional trading areas in Seoul, Yangcheon-gu has the highest degree of connection. In other words, it has common sentiments with many other trading areas. On the other hand, Nowon-gu and Jungrang-gu have the lowest degree of connection. In other words, they have relatively distinct sentiments from other trading areas. The social network indexes used in the combination model are 'density of ego network,' 'degree centrality,' 'closeness centrality,' 'betweenness centrality,' and 'eigenvector centrality.' The combined model analysis confirmed that the degree centrality and eigenvector centrality of the social network index have a significant influence on sales and the highest influence in the model. 'Degree centrality' has a negative effect on the sales of the districts. This implies that sales decrease when holding various sentiments of other trading area, which conflicts with general social myths. However, this result can be interpreted to mean that if a trading area has low 'degree centrality,' it delivers unique and special sentiments to consumers. The findings of this study can also be interpreted to mean that sales can be increased if the trading area increases consumer recognition by forming a unique sentiment and city atmosphere that distinguish it from other trading areas. On the other hand, 'eigenvector centrality' has the greatest effect on sales in the combined model. In addition, the results confirmed a positive effect on sales. This finding shows that sales increase when a trading area is connected to others with stronger centrality than when it has common sentiments with others. 
This study can be used as an empirical basis for establishing and implementing city and trading area strategies that take consumers' desired sentiments into account. In addition, we expect it to inform entrepreneurs and potential entrepreneurs entering a trading area about the sentiments associated with that area and to offer guidance on entering it in light of the district-sentiment structure.
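
Of the network indexes named in this abstract, degree centrality is the simplest to compute. The Python sketch below uses the usual normalized definition, degree divided by (n - 1), on an undirected network. The node labels `A` to `D` and the edges are hypothetical placeholders, not the Seoul trading areas or the sentiment links from the study.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    # Each node can connect to at most n - 1 other nodes.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical network: an edge means two areas share a common sentiment.
shared_sentiment = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
for area, score in sorted(degree_centrality(shared_sentiment).items()):
    print(area, round(score, 3))
```

In this toy network, `A` is linked to every other node and so has the maximum score, while `D` has a single link and the lowest score.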

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems, v.21 no.2, pp.49-67, 2015
  • Recently, the emergence of big data and social media has ushered in a data 'big bang'. Social networking services are widely used around the world and have become a major communication tool for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have focused on advanced social media analysis for their marketing strategies. Beyond social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has drawn researchers' attention to natural language processing techniques such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information and has been applied in various research and practical fields. However, the Korean language (Hangul) poses obstacles for natural language processing because it is an agglutinative language with rich morphology. As a result, Korean natural language processing resources such as sentiment lexicons are lacking, which significantly limits researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon using collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database of over 517,178 words and classified them into sentiment and non-sentiment words.
To classify them, we first identified stop words, which are likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, and so on, as they generally carry no sentimental expression. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was applied in the taxonomy process to support collective intelligence. To make up for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three options (positive, negative, and neutral), and the voting was conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep the voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it enables us to track temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make the lexicon easier to use for researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon serves as an important resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we expect it to give a new direction and a new start to the development of Korean natural language processing.
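
The majority-rule voting described in this abstract can be sketched in Python as follows. The three labels match the voting options named in the abstract, but the tie handling (leaving a word unlabeled), the function names, and the ballots are assumptions made for illustration; the abstract does not describe how the authors resolve ties or compute scores.

```python
from collections import Counter


def majority_label(votes):
    """Return the label with the most votes, or None on a tie or no votes."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # Assumption: a tie leaves the word unlabeled.
    return ranked[0][0]


def build_lexicon(ballots):
    """Map each word to its majority label, skipping undecided words."""
    lexicon = {}
    for word, votes in ballots.items():
        label = majority_label(votes)
        if label is not None:
            lexicon[word] = label
    return lexicon


# Hypothetical ballots, not votes collected in the study.
ballots = {
    "good": ["positive", "positive", "neutral"],
    "bad": ["negative", "negative", "negative"],
    "so-so": ["neutral", "positive"],
}
print(build_lexicon(ballots))
```

Because the voting stays open, such a lexicon could be rebuilt periodically from the accumulated ballots, which would also expose changes in a word's label over time.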

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems, v.27 no.1, pp.103-128, 2021
  • Recently, investors' interest and the influence of stock-related information dissemination have come to be regarded as significant factors that explain stock returns and trading volume. In addition, for companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence, it is difficult to accurately predict future stock returns and volatility because of macro-environmental and market uncertainty. Market uncertainty is recognized as an obstacle to the activation and spread of artificial intelligence technology, so research is needed to mitigate it. Hence, the purpose of this study is to propose a machine learning model that predicts the volatility of a company's stock price by using the internet search volume of artificial intelligence-related technology keywords as a measure of investor interest. To predict the stock market, we use VAR (Vector Auto Regression) and the deep neural network LSTM (Long Short-Term Memory), and we compare stock price prediction performance based on keyword search volume across the technology's social acceptance stages. We also analyze sub-technologies of artificial intelligence to examine how the search volume of detailed technology keywords changes with the technology acceptance stage and how interest in a specific technology affects stock market forecasts. To this end, the terms 'artificial intelligence', 'deep learning', and 'machine learning' were selected as keywords. Next, we investigated how often each keyword appeared in online documents each week over the five years from January 1, 2015, to December 31, 2019. Stock price and transaction volume data for KOSDAQ-listed companies were also collected and used for analysis. As a result, we found that the keyword search volume for artificial intelligence technology increased as the social acceptance of artificial intelligence technology increased.
In particular, starting from the AlphaGo Shock, the keyword search volume for artificial intelligence itself and for detailed technologies such as machine learning and deep learning increased. The keyword search volume for artificial intelligence technology also increases as the social acceptance stage progresses. The predictions showed high accuracy, and the acceptance stage with the best prediction performance differed for each keyword. When stock prices were predicted from keyword search volume for each social acceptance stage of the artificial intelligence technologies classified in this study, prediction accuracy was highest in the awareness stage. Prediction accuracy also differed according to the keywords used in the stock price prediction model for each social acceptance stage. Therefore, when constructing a stock price prediction model using technology keywords, it is necessary to consider the social acceptance of the technology and the sub-technology classification. The results of this study have the following implications. First, to predict the return on investment for companies based on innovative technology, it is most important to capture the recognition stage, in which public interest in the technology rapidly increases. Second, the fact that the change in keyword search volume and the accuracy of the prediction model vary according to the social acceptance of the technology should be considered when developing a decision support system for investment, such as the big data-based robo-advisors recently introduced by the financial sector.