• Title/Summary/Keyword: frequency-based method

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and prone to false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see which pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, variety of websites visited, text information from web pages visited, etc. The demographic attributes to predict also vary from paper to paper, and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Drawing on the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles which include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem. We utilize three approaches, which are based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when using decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used to predict their demographics and can thereby be applied to digital marketing.

A Study on Problem Drinking and Spending Leisure by CAGE and AUDIT in a Rural Area (일부 농촌지역에서의 CAGE와 AUDIT를 이용한 문제음주 및 여가활용에 관한 연구)

  • Kim, Yeal;Yu, Ji-Young;Jung, Sun-Im;Han, Ji-Yun;Pak, Jong-Hyuk;Kim, Han-Suk;Choi, Young-Sun;Kim, Min-Jung;Cho, Byung-Hee;Jung, Mun-Ho
    • Journal of agricultural medicine and community health
    • /
    • v.29 no.1
    • /
    • pp.147-161
    • /
    • 2004
  • Objectives: Habitual drinking is common in rural areas. A key point of drinking control policy in rural communities is therefore to understand drinking behavior in leisure time and to have an appropriate screening method for problem drinking. CAGE and AUDIT are well-known screening tools for problem drinking and alcoholism. Although some studies have validated the Korean translations of CAGE and AUDIT, they were conducted with hospital-based patients rather than community-based populations. In this study we assessed the usefulness of CAGE and AUDIT for identifying problem drinking in a rural population, and compared problem drinkers with a normal group in terms of how they spend leisure time. Methods: The study subjects were 120 residents over 20 years old who lived in 3 districts in Dong-San Myun near Chun-chon city. Questionnaires were completed by interview from Feb. 13 to 19, 2004. Results: The mean age of the study population was 66.01 .26 years old. Defining problem drinking as a score of more than 12 on AUDIT and more than 2 on CAGE, the proportion of problem drinkers was 30.6% and 28.9%, respectively. These proportions were higher than those of other nationwide studies. There were significant differences in drinking frequency per week and amount per episode between problem drinkers and the normal group. Experiences of driving, accidents, injury, disturbance at work, and quarrels after drinking were also significantly different. Problem drinkers were more tolerant of the negative social culture around drinking (e.g., forced drinking, bad drunken habits, overdrinking, drinking relays) than the normal group. Watching TV and socializing with neighbors were the most frequent ways of spending leisure time in this study population, and the normal male group exercised more frequently in leisure time than problem drinkers. Conclusions: CAGE and AUDIT scores may be useful for screening problem drinking in rural communities.
Appropriate use of leisure time may be important for controlling problem drinking in rural areas.

The Relationship Between Social Security Network and Security Life Satisfaction in Community Residents: Scale Development and Application of Social Security Network (사회안전망과 지역사회주민의 안전생활만족의 관계: 사회안전망 척도개발과 적용)

  • Kim, Chan-Sun
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.6
    • /
    • pp.108-118
    • /
    • 2014
  • The purpose of this study is to develop a measurement scale for the social security network, verify its validity and reliability, and apply it to investigate its relationship with security life satisfaction. The study population was general residents of Seoul in 2013, and a total of 203 samples drawn by the stratified cluster random sampling method were analyzed. The measurement scale for the social security network was developed through document research, conceptual definition and drafting of the survey, an experts' conference, a preliminary survey and main survey, and verification of the validity and reliability of the survey. An experts' conference took place to verify the validity of the survey, and 6 factors were extracted through exploratory factor analysis: crime prevention design, street CCTV facilities, volunteer neighborhood patrol, local government security education, police public peace service, and private security service. The collected data were analyzed in line with the aim of this study using SPSSWIN 18.0, applying frequency analysis, F test, factor analysis, reliability analysis, correlation analysis, and multiple regression analysis. The conclusions are as follows. First, the validity of the social security network measurement is very high. The factors constituting the social security network were found to be crime prevention design, street CCTV facilities, volunteer neighborhood patrol, local government security education, police public peace services, and private security services, and the crime prevention design factor was found to be the most explanatory. Second, the reliability of the social security network measurement is very high. The correlations between the questions and the sector, and between the questions and the social security network, were very high, and the internal consistency showed a Cronbach's $\alpha$ value of over 0.865. Third, the establishment of a social security network had the biggest effect on people in their forties.
Thus, when crime prevention design, street CCTV facilities, local government security education, and police public peace services are systematically established, the social anxiety of citizens is reduced.
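
The internal-consistency figure reported above is a Cronbach's alpha. As a rough illustration of how such a value is computed, the sketch below uses the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy scores are hypothetical and are not taken from the study, which used SPSSWIN 18.0.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list per questionnaire item, each holding the
    scores of the same respondents in the same order (hypothetical layout).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if any(len(col) != n for col in item_scores):
        raise ValueError("every item needs one score per respondent")

    # Total score of each respondent across all items.
    totals = [sum(vals) for vals in zip(*item_scores)]
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Toy data: three items answered by four respondents (illustrative only).
scores = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
print(cronbach_alpha(scores))
```

With perfectly consistent items, as in the toy data, alpha reaches its maximum of 1; real survey data would give a lower value.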

Evaluation of Grassland Grade by Grassland Vegetation Ratio (초지식생비율에 의한 초지등급평가 연구)

  • Lee, Bae Hun;Kim, Ji Yung;Park, Hyung Soo;Sung, Kyung Il;Kim, Byong Wan
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.40 no.1
    • /
    • pp.29-36
    • /
    • 2020
  • This study was conducted to suggest a new grassland grade system for evaluating grassland status. Grassland status has so far been evaluated by municipal authorities based on forage yield (good, fair, and poor). Under the current system, the grades of 41 grassland farms from 6 provinces were 19 good, 11 fair, and 11 poor. This evaluation differed greatly from actual measurements of forage yield, which rated all of the farms as poor. The large difference resulted from a failure to reflect various characteristics, such as differences in seasonal growth and harvest frequency. Furthermore, the lack of a consistent examination date and method added to the inaccuracy of the current grassland grade system. The new grassland grade system based on the grassland vegetation ratio (grass, weed, and bare soil) was initially designed as a 6-grade system (1st: 100~80%, 2nd: 79~60%, 3rd: 59~40%, 4th: 39~20%, 5th: 19~1%, and 6th: 0%, on the basis of grass proportion), but was later changed into a 4-grade system (1st, 2nd, 3rd, and 4th grades are 70% or more, 50% or more, 50% or less, and 0% of forage proportion, respectively) after reflecting the opinions of grassland farms and municipal authorities. Re-evaluation of grassland status using the 4-grade system showed that 80% of the grasslands fell into the 2nd, 3rd, or 4th grade, which means most grasslands need partial reseeding or rehabilitation of the entire grassland. Pictures and schematic diagrams depicting the 4-grade system were presented to improve the objectivity of evaluation. The optimal time for assessing grassland status is fall, when plant height is 20~30 cm. In conclusion, the 4-grade system is an efficient method for all non-professionals, including grassland farms and municipal authorities, to assess grassland status. To apply this system in the field, institutional arrangements such as amendment of the grassland act should take place in advance.
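
The 4-grade rule described in the abstract can be written as a small lookup. This is only a sketch: the function name `grassland_grade` is hypothetical, and because the abstract gives "50% or more" for the 2nd grade and "50% or less" for the 3rd, the code assumes that exactly 50% counts as 2nd grade.

```python
def grassland_grade(forage_percent: float) -> int:
    """Return the grade (1-4) for a forage proportion given in percent.

    Thresholds follow the abstract: 70% or more -> 1st, 50% or more -> 2nd,
    below 50% but above 0% -> 3rd, 0% -> 4th.
    Assumption: a value of exactly 50% is treated as 2nd grade.
    """
    if not 0 <= forage_percent <= 100:
        raise ValueError("forage_percent must be between 0 and 100")
    if forage_percent >= 70:
        return 1
    if forage_percent >= 50:
        return 2
    if forage_percent > 0:
        return 3
    return 4


# Illustrative values only, not data from the study.
for pct in (85, 60, 30, 0):
    print(pct, grassland_grade(pct))
```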

The Development of Vulnerable Elements and Assessment of Vulnerability of Maeul-soop Ecosystem in Korea (한국 마을숲 생태계 취약요소 발굴 및 취약성 평가)

  • Lim, Jeong-Cheol;Ryu, Tae-Bok;Ahn, Kyeong-Hwan;Choi, Byoung-Ki
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.34 no.4
    • /
    • pp.57-65
    • /
    • 2016
  • Maeul-soop (village forest) is a key element of the traditional Korean village landscape, both historically and culturally. However, a number of Maeul-soops have been lost or have declined due to various influences since the modern age. Because Maeul-soops have a variety of conservation values, including historical, cultural, and ecological ones, attention and effort toward their systematic conservation and restoration are needed. The purpose of the present study is to provide information on the ecological restoration and the sustainable use and management of Maeul-soops, based on the component plant species, habitat, and location characteristics of 499 Maeul-soops spread throughout Korea. Six major categories of threat factors to the Maeul-soop ecosystem were identified, and the influence of each factor was evaluated. To weight the threat factors by their influence on the vulnerability of the Maeul-soop ecosystem, a more multifaceted analysis was conducted using the Analytic Hierarchy Process (AHP). In the AHP evaluation, reduction of area was identified, among the six categories, as the biggest threat to the existence of Maeul-soops. Next, changes in topography and soil environment were considered a threat factor for qualitative changes in the Maeul-soop ecosystem. The influence of vegetation structure and its qualitative changes on the loss or decline of Maeul-soops was evaluated to be lower than that of changes in habitat. Based on the weight of each factor, the figures were converted to a scale with 100 points as the highest score, and the vulnerability of Maeul-soops was evaluated using the converted figures. In the vulnerability evaluation, grade III showed the highest frequency, and the grades formed a normal distribution from low to high. 38 Maeul-soops were evaluated as grade I, showing high naturalness, and 10 Maeul-soops were evaluated as grade V because their maintenance was threatened.
In addition, the vulnerability evaluation of each Maeul-soop showed that restoring the Maeul-soop's own area is the top priority for guaranteeing the sustainability of Maeul-soops. The study confirmed the need to prepare a national-level ecological response strategy for each vulnerability factor of Maeul-soops, which are important national ecological resources.

Biological Yielding Potential of Rice in Association with Climatic Factors in Yeongnam Region (영남지역 기상과 수도의 한계생산력 해석)

  • Kim, Soon-Chul;Lee, Soo-Kwan;Chung, Geun-Sik
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.30 no.3
    • /
    • pp.259-270
    • /
    • 1985
  • Year-to-year meteorological variations for the rice crop from 1973 to 1984 were compared using air temperature and sunshine hours for the nursery period, the cooling index for the reproductive stage, and the meteorological yield productivity index for the ripening period. The optimum transplanting and heading dates for crop yield, as estimated from the actual transplanting date-grain yield and heading date-grain yield relationships, from the meteorological yield productivity index, and from actual results, showed good agreement with each other. The optimum dates were around May 26 for transplanting and August 10 for heading in Indica/Japonica hybrid cultivars, and about June 8 and August 23, respectively, in Japonica cultivars. On the other hand, the theoretical latest heading dates for safe ripening were August 20 for Indica/Japonica hybrid cultivars and August 30 for Japonica cultivars under both methods: the cumulative temperature method during ripening with 80% believable frequency, and the meteorological yield productivity index method with a yielding potential of 1000 kg/10a. Based on the yield forecast trial, the highest values of photosynthetic efficiency, 2.5%, and crop growth rate, 23g/㎡/day, were recorded during the 30 days before rice heading. Considering the photosynthetic efficiency and solar radiation, the potential crop growth rate was approximately 30g/㎡/day, and the biological grain yielding potential under existing cultural practices was approximately 900-1000 kg/10a under Milyang weather conditions. To further increase yielding potential, photosynthetic efficiency, harvest index, or both should be improved by manipulating canopy architecture, plant spacing, fertilizer, chemicals, etc.

Analysis on dynamic numerical model of subsea railway tunnel considering various ground and seismic conditions (다양한 지반 및 지진하중 조건을 고려한 해저철도 터널의 동적 수치모델 분석)

  • Changwon Kwak;Jeongjun Park;Mintaek Yoo
    • Journal of Korean Tunnelling and Underground Space Association
    • /
    • v.25 no.6
    • /
    • pp.583-603
    • /
    • 2023
  • Recently, the advancement of mechanical tunnel boring machine (TBM) technology and the characteristics of subsea railway tunnels subjected to hydrostatic pressure have led to the widespread application of shield TBM methods in the design and construction of subsea railway tunnels. Subsea railway tunnels are exposed to constant pore water pressure and are influenced by the amplification of seismic waves during earthquakes. In particular, seismic loads acting on subsea railway tunnels under various ground conditions, such as soft ground, soft soil-rock composite ground, and fractured zones, can cause significant changes in tunnel displacement and stress, thereby affecting tunnel safety. Additionally, the dynamic response of the ground and tunnel varies with seismic load parameters such as frequency characteristics, seismic waveform, and peak acceleration, adding complexity to the behavior of the ground-tunnel structure system. In this study, a finite difference method is employed to model the entire ground-tunnel structure system, considering hydrostatic pressure, in order to investigate the dynamic behavior of a subsea railway tunnel during an earthquake. Since the key factors influencing the dynamic behavior during seismic events are ground conditions and seismic waves, six analysis cases are established based on virtual ground conditions: Case-1 with weathered soil, Case-2 with hard rock, Case-3 with a composite ground of soil and hard rock in the tunnel longitudinal direction, Case-4 with the tunnel passing through a narrow fault zone, Case-5 with a composite ground of soft soil and hard rock in the tunnel longitudinal direction, and Case-6 with the tunnel passing through a wide fractured zone. As a result, horizontal displacements due to earthquakes tend to increase with an increase in ground stiffness; however, the displacements tend to be restrained by the confining effects of the ground and the rigid shield segments.
In contrast, the peak compressive stress of the segments increases significantly with weaker ground stiffness, and the displacement restraint contributes to the increase in the peak compressive stress of the segments.

Performance Analysis of Frequent Pattern Mining with Multiple Minimum Supports (다중 최소 임계치 기반 빈발 패턴 마이닝의 성능분석)

  • Ryang, Heungmo;Yun, Unil
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.1-8
    • /
    • 2013
  • Data mining techniques are used to find important and meaningful information in huge databases, and pattern mining is one of the significant data mining techniques. Pattern mining is a method of discovering useful patterns in huge databases. Frequent pattern mining, one type of pattern mining, extracts patterns whose frequencies are higher than a minimum support threshold, and these patterns are called frequent patterns. Traditional frequent pattern mining uses a single minimum support threshold for the whole database. This single support model implicitly supposes that all of the items in the database have the same nature. In real-world applications, however, each item in a database can have its own characteristics, and thus an appropriate pattern mining technique that reflects those characteristics is required. In the traditional framework of frequent pattern mining, where the natures of items are not considered, the single minimum support threshold must be set very low in order to mine patterns containing rare items. This, however, produces too many patterns, including ones with meaningless items. In contrast, no patterns containing rare items can be mined if the threshold is set too high. This dilemma is called the rare item problem. To solve this problem, early studies proposed approximate approaches which split data into several groups according to item frequencies or which group related rare items. However, because they are based on approximate techniques, these methods cannot find all of the frequent patterns, including rare frequent patterns. Hence, a pattern mining model with multiple minimum supports was proposed to solve the rare item problem. In the model, each item has a corresponding minimum support threshold, called MIS (Minimum Item Support), which is calculated based on item frequencies in the database.
By applying the MIS, the multiple minimum supports model finds all of the rare frequent patterns without generating meaningless patterns or losing significant patterns. Candidate patterns are extracted during the process of mining frequent patterns, and in the single minimum support model only the single minimum support is compared with the frequencies of the candidate patterns. Therefore, the characteristics of the items that make up the candidate patterns are not reflected, and the rare item problem occurs in that model. To address this issue, the multiple minimum supports model uses the minimum MIS value among the items in a candidate pattern as the minimum support threshold for that candidate pattern, so that its characteristics are considered. To mine frequent patterns, including rare frequent patterns, efficiently under this concept, tree-based algorithms of the multiple minimum supports model sort items in a tree in MIS descending order, in contrast to those of the single minimum support model, where the items are ordered in frequency descending order. In this paper, we study the characteristics of frequent pattern mining based on multiple minimum supports and evaluate its performance against a general frequent pattern mining algorithm in terms of runtime, memory usage, and scalability. Experimental results show that the multiple minimum supports based algorithm outperforms the single minimum support based one but demands more memory for MIS information. Moreover, both of the compared algorithms show good scalability.
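
The thresholding rule described above (each candidate pattern is checked against the smallest MIS among its items) can be illustrated with a brute-force sketch. This is not the tree-based algorithm evaluated in the paper. The MIS assignment `max(beta * support, lowest)` is one common formulation from the multiple-minimum-supports literature and is an assumption here, since the abstract only says MIS is derived from item frequencies. The function names, parameters, and toy transactions are all hypothetical.

```python
from itertools import combinations


def item_supports(transactions):
    """Relative support (fraction of transactions) of every single item."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n for item, c in counts.items()}


def assign_mis(supports, beta, lowest):
    """Assumed MIS rule: scale each item's support by beta, floored at `lowest`."""
    return {item: max(beta * s, lowest) for item, s in supports.items()}


def frequent_patterns(transactions, mis, max_len=3):
    """Enumerate itemsets up to max_len and keep those whose support
    reaches the minimum MIS among their own items (brute force)."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    items = sorted(mis)
    result = {}
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in tx if t.issuperset(cand)) / n
            threshold = min(mis[i] for i in cand)
            if support >= threshold:
                result[cand] = support
    return result


# Toy data: "caviar" is a rare item that a single threshold of 0.4 would drop.
transactions = (
    [["bread", "milk"]] * 5
    + [["bread", "caviar"]]
    + [["bread"]] * 2
    + [["milk"]] * 2
)
mis = assign_mis(item_supports(transactions), beta=0.5, lowest=0.1)
patterns = frequent_patterns(transactions, mis)
print(sorted(patterns))
```

In this toy run the rare pattern ("bread", "caviar") is kept because its threshold is the MIS of "caviar", whereas a single global threshold high enough to filter noise among common items would discard it. Exhaustive enumeration is used only for clarity and would not scale to real databases.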

Changes and Improvements of the Standardized Eddy Covariance Data Processing in KoFlux (표준화된 KoFlux 에디 공분산 자료 처리 방법의 변화와 개선)

  • Kang, Minseok;Kim, Joon;Lee, Seung-Hoon;Kim, Jongho;Chun, Jung-Hwa;Cho, Sungsik
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.20 no.1
    • /
    • pp.5-17
    • /
    • 2018
  • The standardized eddy covariance flux data processing in KoFlux has been updated, and its database has been amended accordingly. KoFlux data users have not been informed properly regarding these changes and the likely impacts on their analyses. In this paper, we have documented how the current structure of data processing in KoFlux has been established through the changes and improvements to ensure transparency, reliability and usability of the KoFlux database. Due to increasing diversity and complexity of flux site instrumentation and organization, we have re-implemented the previously ignored or simplified procedures in data processing (e.g., frequency response correction, stationarity test), and added new methods for $CH_4$ flux gap-filling and $CO_2$ flux correction and partitioning. To evaluate the effects of the changes, we processed the data measured at a flat and homogeneous paddy field (i.e., HPK) and a deciduous forest in complex and heterogeneous topography (i.e., GDK), and quantified the differences. Based on the results from our overall assessment, it is confirmed that (1) the frequency response correction (HPK: 11~18% of biases for annually integrated values, GDK: 6~10%) and the stationarity test (HPK: 4~19% of biases for annually integrated values, GDK: 9~23%) are important for quality control and (2) the minimization of the missing data and the choice of the appropriate driver (rather than the choice of the gap-filling method) are important to reduce the uncertainty in gap-filled fluxes. These results suggest the future directions for the data processing technology development to ensure the continuity of the long-term KoFlux database.

Analysis of the consumption pattern of delivery food according to food-related lifestyle (식생활라이프스타일에 따른 배달음식의 소비성향 분석)

  • Heo, So-Jeong;Bae, Hyun-Joo
    • Journal of Nutrition and Health
    • /
    • v.53 no.5
    • /
    • pp.547-561
    • /
    • 2020
  • Purpose: This study was conducted to segment the delivery food market and to develop customized products and services. Methods: This study analyzed 636 responses collected from customers who ordered delivery food. Statistical analyses were conducted using the SPSS program (ver. 25.0) for frequency analysis, χ² test, one-way analysis of variance, factor analysis, and cluster analysis. Results: Four factors were extracted by exploratory factor analysis (safety-orientation, convenience-orientation, taste-orientation, and economy-orientation) to explain the consumers' food-related lifestyles. The results of cluster analysis indicated that the 'low-interest group', 'convenience and economy-oriented group', and 'gourmet and economy-oriented group' should be regarded as the target segments. Characteristic analysis of each cluster showed that the low-interest group had higher rates of married respondents (67.1%) and respondents living with family (85.4%) than other clusters. The convenience and economy-oriented group had a higher rate of living alone (28.9%) than the others. The gourmet and economy-oriented group had a higher percentage of unmarried respondents (62.0%) than the others. In addition, the average ages of the convenience and economy-oriented group (32.3 years) and the gourmet and economy-oriented group (32.5 years) were significantly lower than that of the safety seeker (40.0 years) (p < 0.001). Difference analysis of consumption practices by cluster revealed significant differences in order frequency (p < 0.001), main day to order (p < 0.05), source of information about delivery food (p < 0.001), order method (p < 0.001), and co-consumer (p < 0.01). In addition, the convenience and economy-oriented group had significantly higher overall satisfaction than the others (p < 0.001). Conclusion: These findings suggest that customer segmentation based on a food-related lifestyle can be used to build a successful marketing strategy.
Therefore, restaurant managers and delivery platform operators should consider developing products and services according to the segmentation to maximize customer satisfaction.