• Title/Summary/Keyword: big data analysis techniques (빅데이터 분석 기법)

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has given consumers and companies a user-friendly means of communicating with each other. Users routinely publish content expressing their opinions and interests on social media such as blogs, forums, chat rooms, and discussion boards, and this content is released on the Internet in real time. For that reason, many researchers and marketers regard social media content as a source of information for business analytics, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, techniques for extracting, classifying, understanding, and assessing the opinions implicit in text, are frequently applied to social media content because they emphasize determining sentiment polarity and extracting authors' opinions. Researchers have presented a number of frameworks, methods, techniques, and tools. However, we found that these methods are often technically complicated and not sufficiently user-friendly to support business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation. Our proposed approach consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts choose the target social media. Each target medium requires a different means of access, such as open APIs, search tools, DB2DB interfaces, or purchased content. The second phase is pre-processing, which generates useful material for meaningful analysis. 
If garbage data are not removed, the results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, in which the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user ID, content ID, hit counts, reviews or replies, favorites, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used for reputation analysis. There are also various other applications, such as stock prediction, product recommendation, and sales forecasting. The last phase is visualization and presentation of the analysis results. The major purpose of this phase is to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be simple, clear, and easy to understand rather than complex and flashy. To illustrate our approach, we conducted a case study of a leading Korean instant noodle company. We targeted the leading company, NS Food, which has a 66.5% market share and has held the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content, including blogs, forum posts, and news articles. After collecting the social media content, we generated language resources specific to the instant noodle business for data manipulation and analysis using natural language processing. In addition, we classified the content into more detailed categories such as marketing features, environment, reputation, etc. 
In these phases, we used freeware such as the tm, KoNLP, ggplot2, and plyr packages of the R project. As a result, we presented several useful visualization outputs, including domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other full-color visualizations, all produced with open library packages of the R project. Business actors can see at a glance which areas are weak, strong, positive, negative, quiet, or loud. A heat map shows the movement of sentiment or volume in a matrix of categories and time periods, using color density. The valence tree map, one of the most comprehensive and holistic visualization models, should help analysts and decision makers quickly grasp the "big picture" of the business situation in a hierarchical structure, since a tree map can present buzz volume and sentiment for a given period in a single view. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how they can use such results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
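
The category-by-period heat map described in this abstract can be illustrated with a small sketch. It is written in Python rather than the R packages the authors used, and the word lists, category labels, and posts are all hypothetical; it only shows how buzz volume and mean sentiment per cell might be aggregated before plotting.

```python
from collections import defaultdict

# Hypothetical mini-lexicon; the study built its own domain-specific lexicon.
POSITIVE = {"tasty", "good", "love"}
NEGATIVE = {"salty", "bad", "expensive"}

def polarity(text):
    """Count positive words minus negative words in a whitespace-split text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def heatmap_cells(posts):
    """Aggregate (period, category, text) records into heat map cells.

    Returns {(category, period): (volume, mean_polarity)}.
    """
    acc = defaultdict(lambda: [0, 0])
    for period, category, text in posts:
        cell = acc[(category, period)]
        cell[0] += 1               # buzz volume
        cell[1] += polarity(text)  # summed sentiment
    return {key: (n, total / n) for key, (n, total) in acc.items()}

# Hypothetical posts, for illustration only.
posts = [
    ("2014-01", "taste", "tasty and good"),
    ("2014-01", "taste", "too salty"),
    ("2014-02", "price", "expensive but tasty"),
]
cells = heatmap_cells(posts)
```

Each cell then maps directly to one tile of a heat map: the volume or the mean polarity sets the color density for that category and period.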

Concrete Crack Detection Inside Finishing Materials Using Lock-in Thermography (위상 잠금 열화상 기법을 이용한 콘크리트 마감재 내부 균열 검출)

  • Myung-Hun Lee;Ukyong Woo;Hajin Choi;Jong-Chan Kim
    • Journal of the Korea Institute for Structural Maintenance and Inspection / v.27 no.6 / pp.30-38 / 2023
  • As the number of old buildings subject to safety inspection increases, the burden on the designated institutions and management entities responsible for safety management is growing. Accordingly, appropriate inspection standards and technology are essential when selecting buildings for safety inspection. The current safety inspection standards for old buildings assign low scores when finishing materials make it difficult to confirm damage such as cracks in structural members. As a result, a structure is rated poorly regardless of its actual safety status, which increases the number of aging buildings subject to safety inspection. This study therefore proposes a thermal imaging technique, a non-destructive and non-contact inspection method, to detect cracks beneath finishing materials. A concrete specimen was produced so that cracks beneath the finishing material could be observed with a thermal imaging camera, and thermal image data were measured while applying heat excitation to the concrete surface and the cracked area. The measurements confirmed that cracks with widths of 0.3 mm, 0.5 mm, and 0.7 mm beneath the finishing material could be observed, but the cracks were difficult to identify because surface peeling and lifting of the wallpaper produced an uneven temperature distribution. When the amplitude and phase difference of the thermal image data were then derived and analyzed, the 0.5 mm and 0.7 mm cracks could be measured clearly. Building on this study, we hope to increase the efficiency of field application and analysis by developing big data-based deep learning technology for diagnosing crack damage beneath finishing materials.
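
As a rough illustration of the amplitude and phase analysis mentioned in this abstract, the sketch below applies a single-frequency (lock-in) correlation to one pixel's temperature history. It is not the authors' processing chain: the 0.5 Hz excitation frequency, 10 Hz frame rate, and synthetic signals are assumptions chosen only so the example runs.

```python
import math

def lockin_amplitude_phase(series, f_lockin, f_sample):
    """Amplitude and phase of the component of `series` at f_lockin.

    Assumes the record spans a whole number of excitation periods,
    so the constant offset and other harmonics cancel out.
    """
    n = len(series)
    s = sum(v * math.sin(2 * math.pi * f_lockin * k / f_sample)
            for k, v in enumerate(series))
    c = sum(v * math.cos(2 * math.pi * f_lockin * k / f_sample)
            for k, v in enumerate(series))
    amplitude = 2.0 * math.hypot(s, c) / n
    phase = math.atan2(c, s)  # phase of A*sin(w*t + phase)
    return amplitude, phase

def synthetic_pixel(amplitude, phase, n=200, f_lockin=0.5, f_sample=10.0, offset=20.0):
    """Hypothetical pixel temperature history: offset plus a sinusoidal response."""
    return [offset + amplitude * math.sin(2 * math.pi * f_lockin * k / f_sample + phase)
            for k in range(n)]

# Hypothetical responses of a sound pixel and a pixel over a crack.
sound = synthetic_pixel(1.5, 0.40)
crack = synthetic_pixel(1.2, 0.65)
amp_s, ph_s = lockin_amplitude_phase(sound, 0.5, 10.0)
amp_c, ph_c = lockin_amplitude_phase(crack, 0.5, 10.0)
phase_difference = ph_c - ph_s
```

Applying the same function to every pixel would give an amplitude image and a phase image. The phase depends on the timing of the response rather than its magnitude, which is why it is less sensitive to uneven surface heating than the raw temperature.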

Radar rainfall forecasting evaluation using consecutive advection characteristics of rainfall fields (강우장의 연속 이류특성을 활용한 레이더 강수량 예측성 평가)

  • Kim, Tae-Jeong;Kim, Jang-Gyeong;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.39-39 / 2021
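  • To minimize meteorological disasters, the scale and behavior of the weather phenomena that cause them must be clearly monitored and analyzed so that reliable forecast information can be provided. As hazardous weather has recently become more frequent, research on using high-quality radar information to improve the accuracy of very-short-range and hazardous-weather forecasts has been actively pursued. Radar is an instrument that uses electromagnetic waves to observe the amount, distribution, and movement of rainfall, and Korea is building a state-of-the-art dual-polarization radar observation network to improve its capacity to respond to hazardous weather on very short time scales. Weather radars in Korea are installed and operated separately by each agency for weather forecasting (Korea Meteorological Administration), flood forecasting (Ministry of Environment), and meteorological support for military operations (Ministry of National Defense). In this study, advection characteristics were derived from the correlation between rainfall fields, using composite fields from the radars operated by these agencies. To derive accurate advection characteristics, a temporal resolution of 10 minutes was used, and the correlation analysis of the rainfall fields was performed with Gaussian filtering applied. Advection patterns of rainfall fields were extracted for heavy-rain and typhoon events, and the applicability of a rainfall forecasting technique that accounts for the direction and speed of rainfall-field movement was evaluated. The results provide gridded rainfall forecast information that can be used, for example, as input for AI-based flood forecasting and as initial conditions for numerical weather prediction models, and they are expected to serve as a core technology in climate change response strategies for ensuring public safety under climate variability. In addition, building an integrated hydrometeorological big data platform in line with the Fourth Industrial Revolution is expected to enable high-resolution flood response technology and real-time climate disaster forecasts and warnings linked to GIS and mobile systems.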
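
The idea of deriving advection from the correlation between consecutive rainfall fields can be sketched as a brute-force search for the grid shift that maximizes the raw cross-correlation of two fields. This is a simplified stand-in, not the authors' procedure: it omits the Gaussian filtering step, and the 8x8 fields, the 1 km grid spacing, and the search window are assumptions (only the 10-minute interval comes from the abstract).

```python
import math

def best_shift(prev, curr, max_shift=3):
    """Return the (dy, dx) shift of `prev` that best matches `curr`.

    The score is the sum of products over overlapping cells,
    i.e. an unnormalized cross-correlation.
    """
    rows, cols = len(prev), len(prev[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for r in range(rows):
                for c in range(cols):
                    r2, c2 = r + dy, c + dx
                    if 0 <= r2 < rows and 0 <= c2 < cols:
                        score += prev[r][c] * curr[r2][c2]
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

def blob_field(rows, cols, top, left, value=10.0):
    """Hypothetical rainfall field: a 2x2 rain cell on a dry background."""
    field = [[0.0] * cols for _ in range(rows)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            field[r][c] = value
    return field

prev_field = blob_field(8, 8, 2, 2)   # field at time t
curr_field = blob_field(8, 8, 3, 4)   # field at time t + 10 min
shift = best_shift(prev_field, curr_field)

GRID_KM = 1.0      # assumed grid spacing
STEP_MIN = 10.0    # temporal resolution stated in the abstract
speed_kmh = math.hypot(*shift) * GRID_KM / (STEP_MIN / 60.0)
```

Extrapolating the current field along the estimated shift gives a simple advection-based forecast for the next time step.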

Development of Mapping Method for Liquefaction Hazard in Moderate Seismic Region Considering the Uncertainty of Big Site Investigation Data (빅데이터 지반정보의 불확실성을 고려한 중진지역에서의 액상화 위험도 작성기법 개발)

  • Kwak, Minjung;Ku, Taijin;Choi, Jaesoon
    • Journal of the Korean GEO-environmental Society / v.16 no.1 / pp.17-27 / 2015
  • Recently, the Korean government has been working to set up an earthquake hazard prevention system. As part of this system, several nationwide geotechnical hazard maps, including liquefaction and landslide hazard maps, have been drawn to reflect domestic seismic characteristics. To draw a macro liquefaction hazard map, the big data from site investigations in metropolitan and provincial areas must be verified before it is applied. In this research, we carried out site response analyses using 522 borehole site investigation data in S city during a desirable earthquake. The soil classification was separately compared to shear wave velocity, considering the uncertainty of the site investigation data. Probability distributions and statistical analysis of the site response results were applied to the feasibility study. Finally, we suggest a new site amplification coefficient, which yields liquefaction hazard mapping results similar to those obtained with the liquefaction potential index calculated from the site response analyses. This study is expected to support follow-up research and the drawing of liquefaction hazard maps in moderate seismic regions.

Development of a Gangwon Province Forest Fire Prediction Model using Machine Learning and Sampling (머신러닝과 샘플링을 이용한 강원도 지역 산불발생예측모형 개발)

  • Chae, Kyoung-jae;Lee, Yu-Ri;Cho, Yong-ju;Park, Ji-Hyun
    • The Journal of Bigdata / v.3 no.2 / pp.71-78 / 2018
  • This study applies machine learning techniques to increase the accuracy of a forest fire prediction model. It used 14 years of data, from 2003 to 2016, from Gangwon-do, where forest fires were most frequent. To reduce weather data errors, Gangwon-do was divided into nine areas and weather data from each area were used. However, dividing the forecast model into nine zones creates a large difference between the number of days with fires and the number of days without, and this imbalance can degrade model performance. To address this, several sampling methods were applied. To increase the accuracy of the model, five indices from the Canadian Forest Fire Weather Index (FWI) system were used as derived variables. The models were logistic regression as a statistical method, and random forest and XGBoost as machine learning methods. The final model for each zone was selected in consideration of accuracy, sensitivity, and specificity; across the nine zones, the models correctly predicted 80 of the 104 fires that occurred and 7,426 of the 9,758 non-fires. Overall accuracy was 76.1%.
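
The reported counts are enough to recompute the evaluation metrics named in this abstract. The short sketch below does so from the confusion-matrix cells; the function name is illustrative, and the false-negative and false-positive counts are derived by subtraction from the totals given above.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # share of actual fires predicted
        "specificity": tn / (tn + fp),            # share of non-fires predicted
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts from the abstract: 80 of 104 fires and 7,426 of 9,758 non-fires predicted.
metrics = confusion_metrics(tp=80, fn=104 - 80, tn=7426, fp=9758 - 7426)
```

With these counts, sensitivity is about 76.9%, specificity about 76.1%, and accuracy about 76.1%, matching the overall figure in the abstract. Because non-fires outnumber fires by roughly 94 to 1, accuracy is dominated by specificity, which is why the abstract also considers sensitivity when selecting models.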

SAAnnot-C3Pap: Ground Truth Collection Technique of Playing Posture Using Semi Automatic Annotation Method (SAAnnot-C3Pap: 반자동 주석화 방법을 적용한 연주 자세의 그라운드 트루스 수집 기법)

  • Park, So-Hyun;Kim, Seo-Yeon;Park, Young-Ho
    • KIPS Transactions on Software and Data Engineering / v.11 no.10 / pp.409-418 / 2022
  • In this paper, we propose SAAnnot-C3Pap, a semi-automatic annotation method for obtaining ground truth of a player's posture. To obtain ground truth for two-dimensional joint positions in the music domain, previous work either used openpose, a two-dimensional posture estimation method, or labeled the data manually. However, automatic annotation methods such as openpose are fast but produce inaccurate results. This paper therefore proposes SAAnnot-C3Pap, a semi-automatic annotation method that is a compromise between the two. The proposed approach consists of three main steps: extracting postures using openpose, correcting the erroneous parts of the extracted postures using supervisely, and then synchronizing the results of openpose and supervisely. With the proposed method, it was possible to correct the incorrect 2D joint positions detected by openpose, solve the problem of detecting two or more people, and obtain ground truth for the playing posture. In the experiment, we compare and analyze the results of the automatic annotation method openpose and the SAAnnot-C3Pap method proposed in this paper. The comparison showed that the proposed method improved the posture information that openpose had collected incorrectly.

Analyzing Perceptions of Unused Facilities in Rural Areas Using Big Data Techniques - Focusing on the Utilization of Closed Schools as a Youth Start-up Space - (빅데이터 분석 기법을 활용한 농촌지역 유휴공간 인식 분석 - 청년창업 공간으로써 폐교 활용성을 중심으로 -)

  • Jee Yoon Do;Suyeon Kim
    • Journal of Environmental Impact Assessment / v.32 no.6 / pp.556-576 / 2023
  • This study sought ways to utilize idle spaces in rural areas as a response to rural extinction. Based on the keywords "startup," "youth start-up," "youth start-up+rural," and "start-up+rural," the study sought to identify perceptions of idle facilities in rural areas through the keywords "idle facilities" and "closed schools." The study presents basic data for setting policy direction and exploring plans by reviewing frequency analysis, major keyword analysis, network analysis, sentiment analysis, and domestic and foreign cases. The analysis found, first, that idle facilities and closed schools play an important role as factors in regional regeneration. Second, for youth startups in rural areas, not only agricultural education but also housing problems need to be addressed together. Third, young people need support for digital utilization in agriculture, since they actively start businesses using digital technology. Finally, to attract young people and revitalize the region through best practices at home and abroad, policy measures should be presented, in connection with local residents, that can serve as platforms for culture and education as well as startups. These results are significant in that they present implications for youth start-ups in rural areas by reviewing perceptions of start-ups for the influx of young people as one alternative for the use of idle facilities and regional regeneration. If additional solutions are presented through field surveys, the results can be used to set policy goals that fit reality.

Spatial Hedonic Modeling using Geographically Weighted LASSO Model (GWL을 적용한 공간 헤도닉 모델링)

  • Jin, Chanwoo;Lee, Gunhak
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.917-934 / 2014
  • The geographically weighted regression (GWR) model has been widely used to estimate spatially heterogeneous real estate prices. The GWR model, however, has limitations: it cannot select different price determinants over space, and the number of observations available for local estimation is restricted. As an alternative, the geographically weighted LASSO (GWL) model has recently been introduced and has received growing interest. In this paper, we use the GWL to explore various local determinants of real estate prices and examine its applicability to forecasting those prices. To do this, we developed three hedonic models (OLS, GWR, and GWL) for the sales price of apartments in Seoul and compared them in terms of model fit, prediction, and multicollinearity. As a result, the local models performed better than the global OLS model on the whole, and the GWL in particular showed greater explanatory and predictive power than the other models. Moreover, the GWL provided spatially different sets of price determinants among which no multicollinearity exists. The GWL helps select significant sets of independent variables from a high-dimensional dataset, and hence will be a useful technique for large and complex spatial big data.

An Investigation of a Role of Affective factors in Users' Coping with Privacy Risk from Location-based Services (위치기반 서비스(Location-based Service)의 프라이버시 위험 대응에 있어 사용자 감정(Affect)의 역할)

  • Park, Jonghwa;Jung, Yoonhyuk
    • The Journal of Bigdata / v.5 no.2 / pp.201-213 / 2020
  • Despite empirical research showing that human responses to risk are significantly influenced by affective factors, the role of affective factors has been unexplored in information privacy research. This study aims to explore the privacy behaviors of location-based service (LBS) users from an affective point of view. Specifically, the study explored the relationship between three types of privacy threats (collection, hacking, secondary use), two affects (worry, anger), and a coping behavior (continuous use intention). A structured survey was conducted with 552 users. To analyze how combinations of particular perceived privacy threats and particular affects influence the intention of continuous use, association rules, a data mining technique, were employed. As a result, the intention to use differed according to the combination of risk perception and affective response, and the most significant influence on the intention occurred when the secondary use of personal information was combined with anger. This study makes a significant theoretical contribution in that it includes affective factors in information privacy research, complementing the biases of existing cognition-oriented approaches and providing a comprehensive understanding of privacy response behavior.
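
For readers unfamiliar with association rules, the sketch below computes the three standard rule measures (support, confidence, lift) for one rule over a handful of records. The records, item labels, and the direction of the example rule are entirely hypothetical and do not reproduce the survey data or the study's findings.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # records containing antecedent
    n_c = sum(1 for t in transactions if c <= t)          # records containing consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical respondents: perceived threat, affect, and use intention.
records = [
    {"secondary_use", "anger", "low_intention"},
    {"secondary_use", "anger", "low_intention"},
    {"secondary_use", "worry", "high_intention"},
    {"collection", "worry", "high_intention"},
    {"hacking", "anger", "low_intention"},
    {"collection", "anger", "high_intention"},
]
support, confidence, lift = rule_metrics(
    records, {"secondary_use", "anger"}, {"low_intention"}
)
```

A lift above 1 indicates that the threat and affect combination co-occurs with the outcome more often than it would if they were independent, which is the kind of evidence an association-rule analysis uses to rank combinations.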

A Methodology for Estimating Large Scale Dynamic O/D of Commuter Working Trip (대규모 동적 O/D 생성을 위한 추정 방법론 연구: 첨두 출근통행을 기준으로)

  • HAN, He;HONG, Kiman;KIM, Taegyun;WHANG, Junmun;HONG, Young Suk;CHO, Joong Rae
    • Journal of Korean Society of Transportation / v.36 no.3 / pp.203-215 / 2018
  • This study suggests a method for constructing a large-scale dynamic O/D that reflects how passengers' travel patterns change with the land use patterns of the destination. Existing dynamic O/D estimation methods have limitations: data are difficult to collect, so the methods can be applied only to small areas, or they are restricted to a specific transportation network such as a highway or public transportation network. In this paper, we propose a method for estimating dynamic O/D without limiting the analysis area, based on transportation data that can be easily collected and used in the big data era. Clustering analysis was used to calculate the departure-time trip distribution ratio based on arrival time, and a departure-time trip distribution function was estimated for each cluster. A comparison test against survey data showed that the estimated distribution function was statistically significant.