• Title/Summary/Keyword: frequency-based method


Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach (온라인 리뷰 분석을 통한 상품 평가 기준 추출: LDA 및 k-최근접 이웃 접근법을 활용하여)

  • Lee, Ji Hyeon;Jung, Sang Hyung;Kim, Jun Ho;Min, Eun Joo;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.97-117
    • /
    • 2020
  • Product evaluation criteria are indicators describing the attributes or values of products, which enable users or manufacturers to measure and understand them. When companies analyze their products or compare them with competitors', appropriate criteria must be selected for objective evaluation. The criteria should reflect the product features that consumers considered when they purchased, used, and evaluated the products. However, current evaluation criteria do not reflect how consumers' opinions differ from product to product. Previous studies tried to use online reviews from e-commerce sites, which reflect consumer opinions, to extract product features and topics and use them as evaluation criteria. However, these studies still produce criteria irrelevant to the products, because the extracted or improper words are not refined. To overcome this limitation, this research suggests an LDA-k-NN model that extracts candidate criteria words from online reviews using LDA and refines them with a k-nearest neighbor classifier. The proposed approach starts with a preparation phase consisting of six steps. First, review data are collected from e-commerce websites. Most e-commerce websites classify their items into high-level, middle-level, and low-level categories. Review data for the preparation phase are gathered from each middle-level category and later merged to represent a single high-level category. Next, nouns, adjectives, adverbs, and verbs are extracted from the reviews using part-of-speech information from a morpheme analysis module. After preprocessing, the words of each review topic are obtained with LDA, and only the nouns among the topic words are chosen as candidate criteria words. The words are then tagged according to whether they can serve as criteria for each middle-level category. Next, every tagged word is vectorized with a pre-trained word embedding model. Finally, a k-nearest neighbor case-based approach is used to classify each word by its tag. After the preparation phase, the criteria extraction phase is conducted on low-level categories. This phase starts by crawling reviews in the corresponding low-level category. The same preprocessing as in the preparation phase is conducted using the morpheme analysis module and LDA. Candidate criteria words are extracted by taking the nouns and vectorizing them with the pre-trained word embedding model. Finally, evaluation criteria are extracted by refining the candidate words with the k-nearest neighbor approach and the reference proportion of each word in the word set. To evaluate the performance of the proposed model, an experiment was conducted with reviews on 11st, one of the biggest e-commerce companies in Korea. The review data came from the 'Electronics/Digital' section, one of the high-level categories on 11st. For performance evaluation, three other models were compared with the suggested model: the actual criteria of 11st, a model that extracts nouns with the morpheme analysis module and refines them by word frequency, and a model that extracts nouns from LDA topics and refines them by word frequency. The evaluation task was to predict the evaluation criteria of 10 low-level categories with the suggested model and the three models above. The criteria words extracted from each model were combined into a single word set, which was used for survey questionnaires. In the survey, respondents chose every item they considered an appropriate criterion for each category, and each model received a score when its extracted words were chosen. The suggested model had higher scores than the other models in 8 out of 10 low-level categories. Paired t-tests on the scores of each model confirmed that the suggested model performs better in 26 of 30 tests. In addition, the suggested model was the best model in terms of accuracy. This research proposes an evaluation criteria extraction method that combines topic extraction using LDA with refinement using a k-nearest neighbor approach, overcoming the limits of previous dictionary-based models and frequency-based refinement models. This study can contribute to improving review analysis for deriving business insights in the e-commerce market.
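
The two-stage pipeline lends itself to a compact sketch. The Python code below is a minimal illustration, not the authors' implementation: it uses scikit-learn's LDA to pull candidate words from a toy corpus and a k-NN classifier over placeholder word vectors to refine them. The reviews, the random embedding, and the tagged training words are all hypothetical stand-ins for the crawled 11st reviews, the pre-trained embedding model, and the human-tagged preparation data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import KNeighborsClassifier

# Toy review corpus standing in for crawled reviews of one category.
reviews = [
    "battery life is long and charging is fast",
    "screen resolution is sharp and color is vivid",
    "delivery was quick but packaging was poor",
    "battery drains fast when screen brightness is high",
]

# Step 1: fit LDA on term counts; the top words of each topic become
# candidate criteria words (the paper keeps only nouns at this step).
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
candidates = {terms[i] for topic in lda.components_
              for i in topic.argsort()[-5:]}

# Step 2: refine candidates with k-NN over word vectors. The random
# `embedding` and the tagged words below are placeholders for the
# pre-trained embedding model and the tagged preparation-phase data.
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=50) for w in terms}
tagged_words = ["battery", "screen", "delivery", "fast", "poor", "high"]
tags = [1, 1, 1, 0, 0, 0]  # 1 = usable as an evaluation criterion
knn = KNeighborsClassifier(n_neighbors=3).fit(
    [embedding[w] for w in tagged_words], tags)

criteria = [w for w in sorted(candidates)
            if knn.predict([embedding[w]])[0] == 1]
print(criteria)  # refined evaluation-criteria words
```

With a real embedding model and tagged vocabulary, only the data-loading lines change; the LDA-then-k-NN refinement step is the same.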

Rough Set Analysis for Stock Market Timing (러프집합분석을 이용한 매매시점 결정)

  • Huh, Jin-Nyung;Kim, Kyoung-Jae;Han, In-Goo
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.77-97
    • /
    • 2010
  • Market timing is an investment strategy used for obtaining excess return from financial markets. In general, detecting market timing means determining when to buy and sell so as to earn excess return from trading. In many market timing systems, trading rules have been used as an engine to generate trade signals. On the other hand, some researchers have proposed rough set analysis as a proper tool for market timing, because its control function keeps it from generating a trade signal when the pattern of the market is uncertain. Numeric values in the data must be discretized for rough set analysis, because the rough set only accepts categorical data. Discretization searches for proper "cuts" in numeric data that determine intervals; all values that lie within an interval are transformed into the same value. In general, there are four methods for data discretization in rough set analysis: equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and naïve and Boolean reasoning-based discretization. Equal frequency scaling fixes a number of intervals, examines the histogram of each variable, and then determines cuts so that approximately the same number of samples fall into each interval. Expert's knowledge-based discretization determines cuts according to the knowledge of domain experts, gathered through literature review or interviews. Minimum entropy scaling recursively partitions the value set of each variable so that a local measure of entropy is optimized. Naïve and Boolean reasoning-based discretization finds categorical values by naïve scaling of the data, then finds the optimized discretization thresholds through Boolean reasoning. Although rough set analysis is promising for market timing, there is little research on how the various data discretization methods affect trading performance under rough set analysis. In this study, we compare stock market timing models using rough set analysis with the various data discretization methods. The research data are the KOSPI 200 from May 1996 to October 1998. The KOSPI 200 is the underlying index of the KOSPI 200 futures, the first derivative instrument in the Korean stock market; it is a market value weighted index consisting of 200 stocks selected by criteria on liquidity and status in their industries, including manufacturing, construction, communication, electricity and gas, distribution and services, and financing. The total number of samples is 660 trading days. In addition, this study uses popular technical indicators as independent variables. The experimental results show that the most profitable method for the training sample is naïve and Boolean reasoning-based discretization, but expert's knowledge-based discretization is the most profitable method for the validation sample. In addition, expert's knowledge-based discretization produced robust performance for both the training and validation samples. We also compared rough set analysis with a decision tree, using C4.5 for the comparison. The results show that rough set analysis with expert's knowledge-based discretization produced more profitable rules than C4.5.
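
Of the four discretization methods, equal frequency scaling is the easiest to make concrete. The sketch below is an illustration under assumed data, not the study's code: it places cuts at the quantiles of a variable so each interval holds roughly the same number of samples, using a synthetic technical indicator over 660 trading days to mirror the paper's setting.

```python
import numpy as np

def equal_frequency_cuts(values, n_intervals):
    """Return cut points that split `values` into equally populated bins."""
    qs = np.linspace(0, 1, n_intervals + 1)[1:-1]  # interior quantiles
    return np.quantile(values, qs)

def discretize(values, cuts):
    """Map each numeric value to a categorical interval index."""
    return np.searchsorted(cuts, values)  # labels 0 .. n_intervals-1

# Example: a (random, placeholder) technical indicator over 660 trading
# days, binned into 4 categories for input to rough set analysis.
rng = np.random.default_rng(0)
indicator = rng.normal(size=660)
cuts = equal_frequency_cuts(indicator, 4)
labels = discretize(indicator, cuts)
print(np.bincount(labels))  # roughly 165 samples per interval
```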

Strategy for Store Management Using SOM Based on RFM (RFM 기반 SOM을 이용한 매장관리 전략 도출)

  • Jeong, Yoon Jeong;Choi, Il Young;Kim, Jae Kyeong;Choi, Ju Choel
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.93-112
    • /
    • 2015
  • Following changes in consumers' consumption patterns, existing retail shops have evolved into hypermarkets and convenience stores offering mostly grocery and daily products. It is therefore important to maintain proper inventory levels and product configuration in order to effectively utilize the limited space in a retail store and to increase sales. Accordingly, this study proposes a product configuration and inventory level strategy based on the RFM (Recency, Frequency, Monetary) model and SOM (self-organizing map) for managing a retail shop effectively. The RFM model is an analytic model for analyzing customer behavior based on past buying activities, and it can distinguish important customers within large data sets using three variables. R represents recency, which refers to the last purchase of commodities; a customer who purchased most recently has the largest R. F represents frequency, the number of transactions in a particular period, and M represents monetary value, the amount of money spent in a particular period. The RFM method is thus known to be a very effective model for customer segmentation. In this study, SOM cluster analysis was performed using normalized values of the RFM variables. The SOM is regarded as one of the most distinguished artificial neural network models among unsupervised learning tools. It is a popular tool for clustering and visualizing high-dimensional data in such a way that similar items are grouped spatially close to one another, and it has been successfully applied in various technical fields for finding patterns. Our procedure tries to find sales patterns by analyzing product sales records with their recency, frequency, and monetary values, and to suggest a business strategy we build a decision tree on the SOM results, as sketched in the code following this abstract. To validate the proposed procedure, we adopted M-mart data collected between 2014.01.01 and 2014.12.31. Each product received R, F, and M values, and the products were grouped into 9 clusters using the SOM. We also performed three tests, using weekday data, weekend data, and the whole data set, in order to analyze changes in sales patterns. To propose a strategy for each cluster, we examined the criteria of the product clustering; the clusters obtained through the SOM can be explained by their characteristics in the decision trees. As a result, we can suggest an inventory management strategy for each of the 9 clusters through the suggested procedure. Products in the cluster with the highest values of all three variables (R, F, M) need a high inventory level and should be displayed where customer traffic can be increased. In contrast, products in the cluster with the lowest values of all three variables need a low inventory level and can be placed where visibility is low. Products in the cluster with the highest R value are usually newly released products and should be placed at the front of the store. Managers should gradually decrease inventory levels for products in the cluster with the highest F value, purchased heavily in the past, because we assume that this cluster has lower R and M values than average; it can be deduced that these products have sold poorly recently and that total sales will be lower than the purchase frequency suggests. The procedure presented in this study is expected to contribute to raising the profitability of retail stores. The paper is organized as follows: the second chapter briefly reviews the related literature, the third chapter presents the proposed procedure, the fourth chapter applies the procedure to actual product sales data, and the fifth chapter concludes and discusses further research.
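
A minimal sketch of the RFM-then-SOM step, under assumed data (not the authors' code or the M-mart data): compute R, F, and M per product from a toy transaction log, min-max normalize, and map each product onto a 3x3 SOM grid, giving the 9 clusters described above. It uses the MiniSom library for the self-organizing map.

```python
import pandas as pd
from minisom import MiniSom  # pip install minisom

# Hypothetical transaction log standing in for the M-mart data.
tx = pd.DataFrame({
    "product": ["A", "A", "B", "C", "C", "C"],
    "date": pd.to_datetime(["2014-01-05", "2014-11-20", "2014-06-01",
                            "2014-03-15", "2014-07-07", "2014-12-30"]),
    "amount": [10.0, 12.0, 5.0, 20.0, 22.0, 18.0],
})
now = pd.Timestamp("2014-12-31")

rfm = tx.groupby("product").agg(
    R=("date", lambda d: (now - d.max()).days),  # days since last sale
    F=("date", "count"),                          # number of transactions
    M=("amount", "sum"),                          # total sales amount
)
rfm["R"] = -rfm["R"]                              # larger = more recent
norm = (rfm - rfm.min()) / (rfm.max() - rfm.min())  # min-max normalization

# 3x3 SOM grid -> 9 clusters of products by normalized (R, F, M).
som = MiniSom(3, 3, 3, sigma=1.0, learning_rate=0.5, random_seed=0)
data = norm.to_numpy()
som.train_random(data, 500)
clusters = {p: som.winner(v) for p, v in zip(norm.index, data)}
print(clusters)  # each product -> one of the 9 grid cells
```

In the study's procedure, a decision tree is then fit on the cluster assignments to explain each cluster's characteristics; any standard tree learner applied to (R, F, M, cluster) rows would fill that role.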

Scaling up of single fracture using a spectral analysis and computation of its permeability coefficient (스펙트럼 분석을 응용한 단일 균열 규모확장과 투수계수 산정)

  • 채병곤
    • The Journal of Engineering Geology
    • /
    • v.14 no.1
    • /
    • pp.29-46
    • /
    • 2004
  • It is important to identify the geometry of fractures that act as conduits of fluid flow when characterizing groundwater flow in fractured rock, because fracture geometry controls the hydraulic conductivity and streamlines in a rock mass. However, it is difficult to acquire complete geometric data for fractures at the field scale, because outcrops are discontinuously distributed and subsurface data cannot be collected continuously. A method is therefore needed to describe the whole geometry of a target fracture. This study suggests a new approach, based on the Fourier transform, for characterizing the whole geometry of a target fracture. After sampling specimens along a target fracture from borehole cores, the effective frequencies among the roughness components of each specimen were selected by the Fourier transform. The selected effective frequencies were then averaged frequency by frequency. Because the averaged spectrum includes the frequency profiles of every specimen, it shows the representative components of the roughness of the target fracture. The inverse Fourier transform is conducted after low-pass filtering to reconstruct an averaged whole-roughness profile. The reconstructed roughness profile represents the roughness of the target subsurface fracture while retaining the geometric characteristics of each specimen; in effect, it scales up the roughness feature of a single fracture. To identify the permeability characteristics along the target fracture, fracture models were constructed based on the reconstructed roughness profile. The permeability coefficient was computed by homogenization analysis, which can calculate accurate permeability coefficients with full consideration of fracture geometry. The results show a range between 10^-4 and 10^-3 cm/sec, which are reasonable values of the permeability coefficient along a large fracture. This approach can be effectively applied to the analysis of permeability characteristics along a large fracture as well as to the identification of the whole geometry of a fracture at the field scale.
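
The spectral scale-up procedure can be illustrated in a few lines. The sketch below is illustrative only, with synthetic profiles standing in for the borehole-core roughness measurements: Fourier-transform each profile, average the spectra frequency by frequency, low-pass filter, and inverse-transform to obtain a representative roughness profile.

```python
import numpy as np

n, n_specimens = 256, 5
x = np.linspace(0, 1, n, endpoint=False)
rng = np.random.default_rng(0)

# Hypothetical roughness profiles sampled along the target fracture.
profiles = [np.sin(8 * np.pi * x) + 0.3 * rng.normal(size=n)
            for _ in range(n_specimens)]

# Average the specimens' spectra frequency by frequency.
mean_spectrum = np.mean([np.fft.rfft(p) for p in profiles], axis=0)

# Low-pass filter: keep only the effective low-frequency components.
cutoff = 20
mean_spectrum[cutoff:] = 0

# Inverse transform reconstructs the representative (scaled-up)
# roughness profile used to build the fracture models.
representative = np.fft.irfft(mean_spectrum, n=n)
print(representative[:5])
```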

A Study on The Actual Condition and Demand Assessment of First Aid Education on Higher Grade Students in Elementary School (초등학교 고학년생의 응급처치 교육실태 및 교육 요구도)

  • Cho, Keun-Ja;Choi, Eun-Sook;Lee, Hyeun-Ju
    • The Korean Journal of Emergency Medical Services
    • /
    • v.11 no.3
    • /
    • pp.175-189
    • /
    • 2007
  • Background and Purpose: Higher grade students in elementary schools are highly suitable subjects for first aid training. The purpose of this study was to assess the first aid education and needs of higher grade students in elementary schools. Method: The subjects of this study were 183 higher grade students from 8 elementary schools. Data were collected by questionnaire from March 19 to April 13, 2007, and were analyzed using frequency, Cronbach's α, independent two-sample t-tests, and one-way ANOVA in SPSS Win 12.0. Result: 1. 78.1% (143 persons) of subjects answered that they had learned first aid; 65% learned it at school, 61.2% were taught by health teachers, and 36.7% (67 persons) were educated through practice with demonstration including lecture. The contents learned were action in an emergency (50.8%), CPR (36.6%), and splinting (33.9%). 2. 90.2% (165 persons) of subjects answered that first aid and CPR education are necessary, and 74.9% (137 persons) answered that they would take first aid and CPR education if given the opportunity. 53.3% (73 persons) wanted a teaching method using practice with demonstration including lecture. 3. The total mean of first aid education needs was 2.29 ± 0.48 on a 3-point Likert scale. The highest-ranked needs were the Heimlich maneuver (2.41 ± 0.65) and splinting and bandaging (2.38 ± 0.59). The priorities for intensive training were patient assessment (38.0%) and CPR (19.7%) in first place, and splinting and bandaging (22.6%), CPR (21.9%), and the Heimlich maneuver (21.9%) in second place. 4. The needs assessment of first aid education showed statistically significant differences according to teaching method (F = 2.563, p = .025), perceived necessity of education (F = 2.474, p = .015), and willingness to attend future education (F = 2.253, p = .026). Conclusion: These results suggest that first aid education for higher grade students in elementary schools must consist of the most adequate contents and methods, based on the current education conditions and needs assessment.


Direct Time Domain Method for Nonlinear Earthquake Response Analysis of Dam-Reservoir Systems (댐-호소계 비선형 지진응답의 직접시간영역 해석기법)

  • Lee, Jin-Ho;Kim, Jae-Kwan
    • Journal of the Earthquake Engineering Society of Korea
    • /
    • v.14 no.3
    • /
    • pp.11-22
    • /
    • 2010
  • An analysis method is proposed for the transient linear or nonlinear analysis of dynamic interactions between a flexible dam body and a reservoir impounding compressible water under earthquake loadings. The coupled dam-reservoir system consists of three substructures: (1) a dam body with linear or nonlinear behavior; (2) a semi-infinite fluid region of constant depth; and (3) an irregular fluid region between the dam body and the far field. The dam body is modeled with linear and/or nonlinear finite elements. The far field is formulated as a displacement-based transmitting boundary in the frequency domain that can radiate energy into infinity. The transmitting boundary is then transformed for direct coupling in the time domain. The near-field region is modeled as compressible fluid contained between the two other substructures. The developed method is verified and applied to various earthquake response analyses of dam-reservoir systems, including a nonlinear analysis of a concrete gravity dam. The results show the location and severity of damage, demonstrating the method's applicability to the seismic evaluation of existing and new dams.
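
The key step, recasting a frequency-domain boundary for direct time-domain coupling, can be illustrated in miniature. The sketch below is a conceptual stand-in, not the authors' displacement-based transmitting-boundary formulation: it inverse-FFTs an assumed far-field frequency response function into a time-domain kernel and evaluates the response as a discrete convolution over the loading history.

```python
import numpy as np

n, dt = 2048, 0.005
freqs = np.fft.rfftfreq(n, d=dt)
omega = 2 * np.pi * freqs

# Hypothetical far-field frequency response (compliance) H(w) of a
# damped mode, standing in for the transmitting-boundary relation.
m, c, k = 1.0, 4.0, 400.0
H = 1.0 / (k - m * omega**2 + 1j * c * omega)

# Transform to a time-domain impulse response (the convolution kernel):
# h(t_k) ~= irfft(H) / dt for sampled frequency data.
h = np.fft.irfft(H, n=n) / dt

# Direct time-domain coupling: the response is the discrete convolution
# of the kernel with the loading history (here a short pulse).
f = np.zeros(n)
f[0:10] = 1.0
u = np.convolve(f, h)[:n] * dt
print(u[:5])
```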

A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.755-768
    • /
    • 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically in round-robin fashion, and the second allocates buckets adaptively according to a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results of a join are sent on to the next join before the whole join completes, without being staged on disk. Unless the pipelined execution of multiple hash joins includes some load balancing mechanism, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. Our simulation over a wide range of parameters shows that join selectivities and relation sizes degrade system performance more severely as the degree of data skew grows, but the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness against a wide range of skew conditions.
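
The two bucket-allocation policies can be contrasted directly. The sketch below is illustrative only (a toy single-process model, not the paper's simulator): static round-robin assignment of hash buckets to nodes versus a frequency-based assignment that greedily places the heaviest buckets on the least-loaded node. On skewed keys, the second policy balances node loads far better.

```python
from collections import Counter
from zlib import crc32

N_BUCKETS, N_NODES = 16, 4
tuples = ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + ["d"] * 5  # skewed join keys

def bucket(key):
    return crc32(key.encode()) % N_BUCKETS

# Policy 1: static round-robin -- bucket i goes to node i mod N_NODES.
round_robin = {b: b % N_NODES for b in range(N_BUCKETS)}

# Policy 2: frequency-based -- assign the heaviest buckets first, each
# to the currently least-loaded node (a greedy balancing heuristic).
counts = Counter(bucket(k) for k in tuples)
load = [0] * N_NODES
freq_based = {}
for b, cnt in counts.most_common():
    node = load.index(min(load))
    freq_based[b] = node
    load[node] += cnt

def node_loads(assign):
    loads = [0] * N_NODES
    for k in tuples:
        loads[assign.get(bucket(k), bucket(k) % N_NODES)] += 1
    return loads

print("round-robin loads:    ", node_loads(round_robin))
print("frequency-based loads:", node_loads(freq_based))
```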

An Improved Method of Developing Safety-Related Application Conditions for Safety Design of Railway Signalling Systems (철도신호시스템의 안전 설계를 위한 개선된 안전성 적용 조건 도출 방법)

  • Baek, Young-Goo;Lee, Jae-Chon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.11
    • /
    • pp.31-45
    • /
    • 2017
  • According to the railway accident statistics in recent years, the frequency of accidents has been significantly reduced, due to the advance of related technologies and the establishment of safety information management systems. Nonetheless, accidents due to errors in the operation and maintenance phase and faults in safety design continue to occur. Therefore, to prevent accidents, guidelines for the safety design and manufacture of railway vehicles were established, and a request for the independent safety evaluation of safety designs was made. To respond to this, rail system developers must prepare safety cases as a safety activity product. One of the main items of these safety cases is the safety-related application conditions (SRAC) and, thus, the question of how to develop these SRAC is an important one. The SRAC studies reported so far focused only on the simplicity of the derivation procedure and the specific safety activities in the design phase. This method seems to have the advantage of quickly deriving SRAC items. However, there is a risk that some important safety-related items may be missing. As such, this paper proposes an improved method of developing the SRAC based on the idea of performing both the safety design and safety evaluation activities throughout the whole system lifecycle. In this way, it is possible to develop and manage the SRAC more systematically. Especially, considering the SRAC from the initial stage of the design can allow the safety requirements to be reflected to a greater extent. Also, an application case study on railway signaling systems shows that the method presented herein can prevent the omission of important safety-related items, due to the consideration of the SRAC throughout the system lifecycle.

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in round-robin fashion, and the second allocates buckets dynamically according to a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results of a join are sent on to the next join before the whole join completes, without being staged on disk. The shared-nothing multiprocessor architecture is known to be more scalable for supporting very large databases; however, this hardware structure is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. Our simulation over a wide range of parameters shows that join selectivities and relation sizes degrade system performance more severely as the degree of data skew grows, but the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness against a wide range of skew conditions.

Precise Estimation of Nonlinear Parameter in Pulse-Like Ultrasonic Signal (펄스형 초음파 신호에서 비선형 파라미터의 정밀 추정)

  • Ha, Job;Jhang, Kyung-Young;Sasaki, Kimio;Tanaka, Hiroaki
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.26 no.2
    • /
    • pp.77-83
    • /
    • 2006
  • Ultrasonic nonlinearity has been considered as a solution for detecting microcracks or interfacial delamination in layered structures. The distinguishing phenomenon in nonlinear ultrasonics is the generation of higher-order harmonic waves during propagation. Therefore, to quantify the nonlinearity, the conventional method measures a parameter defined as the amplitude ratio of the second-order harmonic component to the fundamental frequency component in the propagated ultrasonic signal. However, its application in field inspection is not yet easy, because no standard methodology has been established for accurately estimating this parameter. The aim of this paper is thus to propose an advanced signal processing technique for the precise estimation of the nonlinear ultrasonic parameter, based on power spectral and bispectral analysis. In particular, this study considers how to estimate the power spectrum and bispectrum of the pulse-like ultrasonic signals used in commercial SAM (scanning acoustic microscopy) equipment. The usefulness of the proposed method is confirmed by experiments on a Newton's ring with a continuous air gap between two glass plates and on a real semiconductor sample with local delaminations. The results show that the nonlinear parameter obtained by the proposed method correlated well with the delamination.
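
The conventional parameter the paper improves on is easy to sketch. The code below is illustrative only, with a synthetic pulse standing in for the measured signal, and shows the spectral-ratio estimate rather than the paper's bispectral method: take the amplitudes of the fundamental and second harmonic from the spectrum and form their ratio (the relative form A2/A1^2 is a common proxy; the absolute parameter also involves the wavenumber and propagation distance).

```python
import numpy as np

fs, f0, n = 100e6, 5e6, 4096  # sampling rate, fundamental freq, samples
t = np.arange(n) / fs

# Synthetic pulse-like signal: a windowed fundamental plus a weak
# second harmonic, standing in for the received SAM waveform.
window = np.exp(-((t - t[n // 2]) / 5e-6) ** 2)
signal = window * (np.sin(2 * np.pi * f0 * t)
                   + 0.05 * np.sin(2 * np.pi * 2 * f0 * t))

# Amplitudes at the fundamental and second-harmonic bins.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1 / fs)
a1 = spectrum[np.argmin(np.abs(freqs - f0))]
a2 = spectrum[np.argmin(np.abs(freqs - 2 * f0))]

beta_rel = a2 / a1**2  # relative nonlinear parameter ~ A2 / A1^2
print(f"A2/A1 = {a2 / a1:.4f}, relative beta = {beta_rel:.3e}")
```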