• Title/Summary/Keyword: vector data

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.67-83
    • /
    • 2018
  • The Korean film industry grew significantly every year and finally exceeded 200 million cumulative admissions in 2013. However, from 2015 the industry entered a period of low growth, and in 2016 it recorded negative growth. To overcome this difficulty, stakeholders such as production companies, distributors, and multiplex chains have tried to maximize market returns by predicting market changes and responding to them immediately. Because a film is an experiential product, it is not easy to predict its box office record and initial audience size before release. The audience size also fluctuates with a variety of factors after release. Production and distribution companies therefore try to secure a guaranteed number of screens from multiplex chains at the opening of a newly released film. However, multiplex chains tend to open the screening schedule for only one week at a time and then determine the number of screenings for the following week based on the box office record and audience evaluations. Many previous studies have addressed the prediction of box office records. Early studies attempted to identify factors affecting the box office record. More recent studies have applied various analytic techniques to the previously identified factors in order to improve prediction accuracy and to explain the effect of each factor, rather than identifying new factors. However, most previous studies are limited in that they used the total audience size from opening to the end of the run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand.
Therefore, the purpose of this study is to predict the weekly audience size of a newly released film so that stakeholders can respond flexibly to changes in the film's audience size. To that end, we considered the factors affecting box office that were used in previous studies and developed new factors not used before, such as the opening order of movies and the dynamics of sales. Using these factors together, we applied machine learning methods, namely Random Forest, Multilayer Perceptron, Support Vector Machine, and Naive Bayes, to predict the cumulative number of visitors from the first week after release to the third week. At the first and second weeks, we predicted the cumulative number of visitors in the following week. At the third week, we predicted the total number of visitors of the film. In addition, we also predicted the total number of visitors at the first and second weeks using the same factors. As a result, we found that, in all three weeks, the accuracy of predicting the number of visitors in the following week was higher than that of predicting the total number, and that Random Forest was the most accurate of the machine learning methods we used. This study has implications in that it 1) comprehensively considered various factors that affect the box office record and were rarely addressed by previous research, such as the weekly audience rating after release, the weekly rank of the film after release, and the weekly sales share after release, and 2) sought to predict and respond to dynamically changing market demand by proposing models that predict the weekly audience size of newly released films, so that stakeholders can respond flexibly to changes in audience size.

Development of Beauty Experience Pattern Map Based on Consumer Emotions: Focusing on Cosmetics (소비자 감성 기반 뷰티 경험 패턴 맵 개발: 화장품을 중심으로)

  • Seo, Bong-Goon;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.179-196
    • /
    • 2019
  • Recently, the "Smart Consumer" has been emerging. He or she is increasingly inclined to search for and purchase products by taking into account personal judgment or expert reviews rather than by relying on information delivered through manufacturers' advertising. This is especially true when purchasing cosmetics. Because cosmetics act directly on the skin, consumers respond seriously to dangerous chemical elements they contain or to skin problems they may cause. Above all, cosmetics should fit well with the purchaser's skin type. In addition, changes in global cosmetics consumer trends make it necessary to study this field. The desire to find one's own individualized cosmetics is evident among consumers around the world and is known as "Finding the Holy Grail." Many consumers show a deep interest in customized cosmetics with the cultural boom known as "K-Beauty" (an aspect of "Han-Ryu"), the growth of personal grooming, and the emergence of "self-culture" that includes "self-beauty" and "self-interior." These trends have led to the explosive popularity of cosmetics made in Korea in the Chinese and Southeast Asian markets. In order to meet the customized cosmetics needs of consumers, cosmetics manufacturers and related companies are responding by concentrating on delivering premium services through the convergence of ICT (Information and Communication Technology). Despite the evolution of companies' responses regarding market trends toward customized cosmetics, there is no "Intelligent Data Platform" that deals holistically with consumers' skin condition experience and thus attaches emotions to products and services. To find the Holy Grail of customized cosmetics, it is important to acquire and analyze consumer data on what they want in order to address their experiences and emotions.
The emotions consumers express when purchasing cosmetics vary by their age, sex, skin type, and specific skin issues, and influence what price is considered reasonable. Therefore, it is necessary to classify emotions regarding cosmetics by individual consumer. Because of its importance, consumer emotion analysis has been used for both services and products. Given the trends identified above, we judge that consumer emotion analysis can be used in our study. Therefore, we collected and indexed data on consumers' emotions regarding their cosmetics experiences, focusing on consumers' language. We crawled the cosmetics emotion data from SNS (blogs and Twitter) according to sales ranking (1st to 99th), focusing on the ampoule/serum category. A total of 357 emotional adjectives were collected, and we combined and abstracted similar or duplicate emotional adjectives. We conducted a "Consumer Sentiment Journey" workshop to build a "Consumer Sentiment Dictionary," and this resulted in a total of 76 emotional adjectives regarding cosmetics consumer experience. Using these 76 emotional adjectives, we performed clustering with the Self-Organizing Map (SOM) method. As a result of the analysis, we derived eight final clusters of cosmetics consumer sentiments. Using the vector values of each node for each cluster, the characteristics of each cluster were derived based on the top ten most frequently appearing consumer sentiments. Different characteristics were found in consumer sentiments in each cluster. We also developed a cosmetics experience pattern map. The study results confirmed that recommendation and classification systems that consider consumer emotions and sentiments are needed because each consumer differs in what he or she pursues and prefers.
Furthermore, this study reaffirms that the application of emotion and sentiment analysis can be extended to various fields other than cosmetics, and it implies that consumer insights can be derived using these methods. They can be used not only to build a specialized sentiment dictionary using scientific processes and "Design Thinking Methodology," but we also expect that these methods can help us to understand consumers' psychological reactions and cognitive behaviors. If this study is further developed, we believe that it will be able to provide solutions based on consumer experience, and therefore that it can be developed as an aspect of marketing intelligence.

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • In various application areas of data mining and bioinformatics, it is essential to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, consider a network event management system that records the types and timestamps of events that occur in a specific network component (e.g., a router). A typical query to find temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP that are subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds, and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that answers such queries efficiently. Unlike previous methods that rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which is proven to be efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The 'dimensionality curse' problem may arise when n is large. Therefore, we use dimension selection or event type grouping to avoid this problem. The experimental results reveal that the proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
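
The window-to-vector mapping described in this abstract can be illustrated with a minimal Python sketch. The function name `window_to_vector`, the sample timestamps, and the use of `math.inf` for event types that do not occur in the window are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def window_to_vector(window, event_types):
    """Map one sliding window to an n-dimensional vector.

    `window` is a time-ordered list of (event_type, timestamp) pairs.
    Element i of the result is the interval between the first event in the
    window and the first occurrence of event_types[i].
    Event types absent from the window map to math.inf, which is an
    assumption of this sketch, not something stated in the abstract.
    """
    if not window:
        return [math.inf] * len(event_types)
    start = window[0][1]
    first_offset = {}
    for event_type, timestamp in window:
        # setdefault keeps only the first occurrence of each type
        first_offset.setdefault(event_type, timestamp - start)
    return [first_offset.get(t, math.inf) for t in event_types]


# Hypothetical data: timestamps are in seconds.
EVENT_TYPES = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]
window = [
    ("CiscoDCDLinkUp", 100),
    ("MLMStatusUP", 112),
    ("TCPConnectionClose", 135),
    ("MLMStatusUP", 140),
]
vector = window_to_vector(window, EVENT_TYPES)

# The example query's constraints (at most 20 s and 40 s after the first
# event) become simple range checks on the vector elements.
matches = vector[0] == 0 and vector[1] <= 20 and vector[2] <= 40
```

With vectors of this form, the time constraints of a query reduce to a range condition per dimension, which is the kind of lookup a multi-dimensional spatial index is designed to answer.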

Accuracy Analysis of ADCP Stationary Discharge Measurement for Unmeasured Regions (ADCP 정지법 측정 시 미계측 영역의 유량 산정 정확도 분석)

  • Kim, Jongmin;Kim, Seojun;Son, Geunsoo;Kim, Dongsu
    • Journal of Korea Water Resources Association
    • /
    • v.48 no.7
    • /
    • pp.553-566
    • /
    • 2015
  • Acoustic Doppler Current Profilers (ADCPs) can capture three-dimensional velocity vectors and bathymetry concurrently, efficiently, and rapidly, which enables them to document hydrodynamic and morphologic data at higher spatial and temporal resolution than other contemporary instruments. However, ADCPs are limited by the inevitable unmeasured regions near the bottom, surface, and edges of a given cross-section. The velocities in those unmeasured regions are usually extrapolated or assumed when calculating flow discharge, which affects the accuracy of the discharge assessment. This study scrutinized a conventional extrapolation method (i.e., the 1/6 power law) for estimating the unmeasured regions in order to assess the accuracy of ADCP discharge measurements. For the comparative analysis, we collected spatially dense velocity data using an ADV as well as a stationary ADCP in a real-scale straight river channel, and tested the applicability of the 1/6 power law alongside the logarithmic law, another representative velocity law. The results showed that the logarithmic law fitted the actual velocity measurements better than the 1/6 power law. In particular, the 1/6 power law tended to underestimate the velocity in the near-surface region and overestimate it in the near-bottom region. This finding indicates that the 1/6 power law may not follow the actual flow regime satisfactorily, so the resulting discharge estimates in the unmeasured top and bottom regions can introduce discharge bias. Therefore, the logarithmic law should be considered as an alternative, especially for stationary ADCP discharge measurement. In addition, it was found that the ADCP should be operated in water at least 0.6 m deep at the left and right edges for better estimates of edge discharge.
In the future, similar comparative analysis might be required for the moving boat ADCP discharge measurement method, which has been more widely used in the field.
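
The two velocity laws compared in this abstract can be written as short functions. The sketch below assumes the commonly used forms u(z) = a * z^(1/6) and u(z) = (u_star / kappa) * ln(z / z0), where z is the height above the bed. The coefficient names, the von Karman constant of 0.41, and all sample numbers are illustrative assumptions, not values from the paper.

```python
import math


def power_law_velocity(z, a, exponent=1.0 / 6.0):
    """Velocity at height z above the bed under a power law u = a * z**exponent."""
    return a * z ** exponent


def log_law_velocity(z, u_star, z0, kappa=0.41):
    """Velocity at height z under the logarithmic law.

    u_star is the shear velocity and z0 the roughness height; kappa = 0.41
    is the usual von Karman constant (an assumed value here).
    """
    return (u_star / kappa) * math.log(z / z0)


def fit_power_coefficient(heights, velocities, exponent=1.0 / 6.0):
    """Least-squares estimate of a in u = a * z**exponent with a fixed exponent."""
    numerator = sum(u * z ** exponent for z, u in zip(heights, velocities))
    denominator = sum(z ** (2 * exponent) for z in heights)
    return numerator / denominator


# Synthetic "measured" points (hypothetical, generated from a = 0.8).
heights = [0.5, 1.0, 1.5]
velocities = [power_law_velocity(z, 0.8) for z in heights]

# Fit the coefficient on the measured region, then extrapolate into an
# unmeasured region near the surface (z = 2.0 m in this made-up profile).
a_fit = fit_power_coefficient(heights, velocities)
u_near_surface = power_law_velocity(2.0, a_fit)
```

Fitting on the measured cells and evaluating the fitted law in the unmeasured layers is the general shape of the extrapolation the abstract discusses; the same pattern applies if `log_law_velocity` is used instead.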

Rietveld Structure Refinement of Biotite Using Neutron Powder Diffraction (중성자분말회절법을 이용한 흑운모의 Rietveld Structure Refinement)

  • 전철민;김신애;문희수
    • Economic and Environmental Geology
    • /
    • v.34 no.1
    • /
    • pp.1-12
    • /
    • 2001
  • The crystal structure of biotite-1M from Bancroft, Ontario, was determined by the Rietveld refinement method using high-resolution neutron powder diffraction data at $-26.3^{\circ}$C, $20^{\circ}$C, $300^{\circ}$C, $600^{\circ}$C, and $900^{\circ}$C. The crystal structure was refined to an $R_B$ of 5.06%-11.9% and an S (goodness of fit) of 2.97-3.94. The a, b, and c unit cell dimensions expand linearly with temperature up to $600^{\circ}$C. The expansivity of the c dimension is $1.61{\times}10^{-4}\,^{\circ}$C$^{-1}$, while those of the a and b dimensions are $2.73{\times}10^{-5}\,^{\circ}$C$^{-1}$ and $5.71{\times}10^{-5}\,^{\circ}$C$^{-1}$, respectively. Thus, the volume increase of the unit cell with increasing temperature is dominated by expansion of the c axis. In contrast to this trend, the expansivity of the dimensions decreases at $900^{\circ}$C. This may be attributed to a change in cation size caused by dehydroxylation-oxidation of $Fe^{2+}$ to $Fe^{3+}$ under vacuum at such a high temperature. The position of the H-proton was determined by refinement of the diffraction pattern at low temperature ($-2.63^{\circ}$C). The position is 0.9103 ${\AA}$ from the O$_4$ location, at atomic coordinates (x/a=0.138, y/b=0.5, z/c=0.305), with the OH vector almost normal to the (001) plane. As the temperature increases, $\alpha$* (tetrahedral rotation angle), $t_{oct}$ (octahedral sheet thickness), and the mean distance increase, except in the $900^{\circ}$C data. However, the trend is less clear than that of the unit cell dimension expansion because the expansion occurs predominantly in the interlayer. Also, ${\Psi}$ (octahedral flattening angle) shows no trend with increasing temperature, possibly because Mg and Fe substitute in the octahedra (M1, M2).

On Method for LBS Multi-media Services using GML 3.0 (GML 3.0을 이용한 LBS 멀티미디어 서비스에 관한 연구)

  • Jung, Kee-Joong;Lee, Jun-Woo;Kim, Nam-Gyun;Hong, Seong-Hak;Choi, Beyung-Nam
    • 한국공간정보시스템학회:학술대회논문집
    • /
    • 2004.12a
    • /
    • pp.169-181
    • /
    • 2004
  • SK Telecom constructed the GIMS system in 2002 as the common base framework of its LBS/GIS service system, based on the international standards of the OGC (OpenGIS Consortium), for the first mobile vector map service. However, as service content has become more complex and requirements have increased, renovation has been needed to achieve multi-purpose, multi-function operation and maximum efficiency. This research prepares a GML3-based platform to upgrade services from the GML2-based GIMS system. With this platform, a variety of application services will be able to provide location and geographic data easily and freely. From GML 3.0, animation, event handling, resources for style mapping, and topology specification for 3D and telematics services were selected for the mobile LBS multimedia service. The schema and transfer protocol were developed and organized to optimize data transfer to the MS (Mobile Station). The upgrade to the GML 3.0-based GIMS system has provided an innovative framework in terms of both construction and service, which has been implemented and applied to previous research and systems. In addition, a GIMS channel interface has been implemented to simplify access to the GIMS system, and the functions of the internal GIMS service components, WFS and WMS, have been enhanced and expanded.

Measurement of Backscattering Coefficients of Rice Canopy Using a Ground Polarimetric Scatterometer System (지상관측 레이다 산란계를 이용한 벼 군락의 후방산란계수 측정)

  • Hong, Jin-Young;Kim, Yi-Hyun;Oh, Yi-Sok;Hong, Suk-Young
    • Korean Journal of Remote Sensing
    • /
    • v.23 no.2
    • /
    • pp.145-152
    • /
    • 2007
  • The polarimetric backscattering coefficients of a wetland rice field, an experimental plot belonging to the National Institute of Agricultural Science and Technology in Suwon, were measured using ground-based polarimetric scatterometers at 1.8 and 5.3 GHz throughout a growing season, from the transplanting period to the harvest period (May to October 2006). The polarimetric scatterometers consist of a vector network analyzer with a time-gating function and a polarimetric antenna set, and are calibrated to obtain VV-, HV-, VH-, and HH-polarized backscattering coefficients from the measurements, based on a single-target calibration technique using a trihedral corner reflector. The polarimetric backscattering coefficients were measured at incidence angles of $30^{\circ}$, $40^{\circ}$, $50^{\circ}$, and $60^{\circ}$, with 30 independent samples for each incidence angle at each frequency. During the measurement periods, ground truth data including fresh and dry biomass, plant height, stem density, leaf area, specific leaf area, and moisture content were also collected for each measurement. The temporal variations of the measured backscattering coefficients, as well as of the measured plant height, LAI (leaf area index), and biomass, were analyzed. The measured polarimetric backscattering coefficients were then compared with the rice growth parameters. The measured plant height increased monotonically, while the measured LAI increased only until the ripening period and decreased afterwards. The measured backscattering coefficients were fitted with polynomial expressions as functions of growth age, plant LAI, and plant height for each polarization, frequency, and incidence angle. At larger incidence angles, the L-band signatures correlated more strongly with rice growth than the C-band signatures. The HH-polarized backscattering coefficients were found to be more sensitive than the VV-polarized backscattering coefficients to growth age and the other input parameters.
To derive functions for estimating rice growth, it is necessary to divide the data according to growth periods that show qualitative changes in growth, such as panicle initiation, flowering, or heading.

Factor Analysis Affecting on Changes in Handysize Freight Index and Spot Trip Charterage (핸디사이즈 운임지수 및 스팟용선료 변화에 영향을 미치는 요인 분석)

  • Lee, Choong-Ho;Kim, Tae-Woo;Park, Keun-Sik
    • Journal of Korea Port Economic Association
    • /
    • v.37 no.2
    • /
    • pp.73-89
    • /
    • 2021
  • Handysize bulk carriers can transport a variety of cargoes that cannot be carried by mid- and large-size ships. The handysize market has an active spot chartering market, is independent of the mid- and large-size market, and is riskier because of market conditions and charterage variability. In this study, the Granger causality test, the Impulse Response Function (IRF), and Forecast Error Variance Decomposition (FEVD) were performed using monthly time series data. The Granger causality test showed that the coal price for coke making, the Japan steel plate commodity price, the hot rolled steel sheet price, fleet volume, and the bunker price Granger-cause the Baltic Handysize Index (BHSI) and charterage. After confirming the appropriate lag and the stability of the Vector Autoregressive (VAR) model, the IRF and FEVD were analyzed. In the IRF analysis, three variables (the coal price for coke making, the hot rolled steel sheet price, and the bunker price) were found to be significant at both the upper and lower limits of the confidence interval. Among them, the impulse of the hot rolled steel sheet price had the most significant effect. In the FEVD analysis, the explanatory power for BHSI and charterage followed the same order: hot rolled steel sheet price, coal price for coke making, bunker price, Japan steel plate price, and fleet volume. This explanatory power gradually increased, reaching 30% for BHSI and 26% for charterage. To differentiate this work from previous studies and to identify the effect of short-term lags, the analysis used monthly price data for the major cargoes of Handysize bulk carriers, and it produced meaningful results that can help predict monthly market conditions. This study can help shipping companies that operate Handysize bulk carriers, and other parties in the handysize chartering market, predict short-term market conditions.

Analysis of Pinewood Nematode Damage Expansion in Gyeonggi Province Based on Monitoring Data from 2008 to 2015 (경기도의 소나무재선충병 피해 확산 양상 분석: 2008 ~ 2015년 예찰 데이터를 기반으로)

  • Park, Wan-Hyeok;Ko, Dongwook W.;Kwon, Tae-Sung;Nam, Youngwoo;Kwon, Young Dae
    • Journal of Korean Society of Forest Science
    • /
    • v.107 no.4
    • /
    • pp.486-496
    • /
    • 2018
  • Pine wilt disease (PWD) in Gyeonggi province was first detected in Gwangju in 2007 and has caused extensive damage ever since. The insect vector and host tree in Gyeonggi province are Monochamus saltuarius and Pinus koraiensis, respectively, which differ from those of the southern region, Monochamus alternatus and Pinus densiflora. Consequently, spread and mortality characteristics may differ, but our understanding is limited. In this research, we used spatial data on newly infected trees in Gyeonggi province from 2008 to 2015 to analyze how infection is related to various environmental and human factors, such as elevation, forest type, and road network. We also analyzed the minimum distance from each newly infected tree to the closest tree infected in the previous year to examine dispersal characteristics based on new outbreak locations. The annual number of newly infected trees increased rapidly from 2008 to 2013 and then stabilized. The number of administrative districts with infected trees was 5 in 2012, 11 in 2013, and 15 in 2014. Most of the infected trees were Pinus koraiensis, whose proportion was close to 90% throughout the survey period. The mean distance to newly infected trees decreased dramatically over time, from 4,111 m between 2012 and 2013 to approximately 600 m between 2013 and 2014 and between 2014 and 2015. New infections occurred at increasingly higher elevations over time. The distance from newly infected trees to roads increased continuously, suggesting that natural diffusion is increasingly occurring relative to human-influenced dispersal.
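
The minimum-distance measure described in this abstract can be sketched in a few lines of Python. The sketch assumes tree locations are given as planar (x, y) coordinates in metres in a projected coordinate system and uses a brute-force search; the function name and the sample coordinates are hypothetical and do not come from the study.

```python
import math


def min_distances_to_previous(new_trees, previous_trees):
    """For each newly infected tree, return the distance to the closest
    tree infected in the previous year.

    Both arguments are lists of (x, y) coordinates in metres (assumed to be
    in a projected coordinate system, so Euclidean distance is meaningful).
    """
    if not previous_trees:
        raise ValueError("previous_trees must not be empty")
    return [
        min(math.dist(new, old) for old in previous_trees)
        for new in new_trees
    ]


# Hypothetical coordinates, not data from the study.
infected_last_year = [(0.0, 0.0), (1000.0, 0.0)]
infected_this_year = [(300.0, 400.0), (1000.0, 600.0)]

distances = min_distances_to_previous(infected_this_year, infected_last_year)
mean_distance = sum(distances) / len(distances)
```

For large monitoring datasets a spatial index would normally replace the brute-force inner loop, but the quantity being computed stays the same.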

A Comparative Study of Teachers' and Students' Preference of Socio-Scientific Issues Topics (교사와 학생의 사회적-과학적 쟁점(Socio-Scientific Issues) 주제 선호도 분석)

  • Hyun Ju Park
    • Journal of Science Education
    • /
    • v.47 no.2
    • /
    • pp.180-191
    • /
    • 2023
  • The purpose of this study was to investigate the preferred SSI topics of students and teachers in elementary, middle, and high schools. It analyzed the similarity of students' and teachers' preferred SSI topics by school level using the cosine similarity measure. A total of 566 students and 327 teachers from elementary, middle, and high schools participated in the study. Sixty topics were identified and listed in the areas of environment, science and technology, health and medicine, and other social issues based on the literature and SSI programs. Students and teachers were asked to select five of their favorite topics. The data was collected online using SurveyMonkey. The collected data was divided into six groups of students and teachers, and the frequency of topic selection was analyzed within each group. The topic preference similarity was analyzed by calculating vector values based on the frequency of the selected topics and measuring the cosine similarity between students, teachers, and teachers and students by school level. The results are as follows: First, the cosine similarity of SSI Preferred Topics between students' school-level cohorts was higher between middle and high school students (0.982) than between elementary and middle school students (0.651) or between elementary and high school students (0.662). Second, the cosine similarity of SSI Preferred Topics between teachers' school-level cohorts was similar for all comparison groups between elementary, middle, and high school. Third, the SSI topic preference similarity between students and teachers by school level had a higher cosine similarity between the elementary student and teacher cohorts (0.974) than the other school level comparisons, middle school (0.621) or high school (0.645). Access to topics of interest to students in SSI education is strongly associated with motivation and persistence in learning, as well as an enjoyable learning experience and positive attitudes toward learning. 
Therefore, when designing SSI lessons, it is important to examine topics from the perspective of student interest, especially if the teacher has selected SSI topics that are different from students' preferences. Careful instructional design will be needed to overcome the gap.
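
The cosine similarity measure used in this study can be illustrated with a minimal Python sketch. Each group's topic selections are represented as a frequency vector with one element per topic; the vectors below are made-up numbers for four topics, not the study's data, and the function name is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical selection counts per topic for two groups.
student_counts = [12, 5, 0, 8]
teacher_counts = [10, 6, 1, 7]

similarity = cosine_similarity(student_counts, teacher_counts)
```

Because the measure depends only on the direction of the vectors, groups of different sizes (such as 566 students and 327 teachers) can be compared without first normalizing the counts.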