• Title/Summary/Keyword: second-order accuracy


A Topic Modeling-based Recommender System Considering Changes in User Preferences (고객 선호 변화를 고려한 토픽 모델링 기반 추천 시스템)

  • Kang, So Young;Kim, Jae Kyeong;Choi, Il Young;Kang, Chang Dong
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.43-56 / 2020
  • Recommender systems help users choose the best option among many alternatives. They play an especially important role on internet sites, where digital information is generated in enormous volumes every second. Many studies on recommender systems have focused on recommendation accuracy. However, several problems must be overcome before a recommender system can be commercially successful. First, recommender systems lack transparency: users cannot know why products are recommended. Second, recommender systems cannot immediately reflect changes in user preferences: although a user's product preferences change over time, the system must rebuild its model to reflect them. To solve these problems, this study proposes a recommendation methodology that applies topic modeling and sequential association rule mining to review data. Product reviews provide useful information for recommendation because they contain not only product ratings but also content such as user experiences and emotional states, and therefore imply the user's preference for the product. Topic modeling is thus useful for explaining why items are recommended to users, and sequential association rule mining is useful for identifying changes in user preferences. The proposed methodology is divided into two phases. The first phase creates a user profile based on topic modeling: topics are extracted from users' product reviews, and a profile over those topics is created for each user. The second phase recommends products using sequential rules that appear in users' buying behaviors over time. The buying behaviors are derived from changes in each user's topics.
A collaborative filtering-based recommender system was developed as a benchmark, and its performance was compared with that of the proposed methodology on Amazon's review dataset. Accuracy, recall, precision, and F1 were used as evaluation metrics. Topic modeling was performed with collapsed Gibbs sampling, and 15 topics were extracted. Among the main topics, topics 1, 3, 4, 7, 9, 13, and 14 relate to "comedy shows", "high-teen drama series", "crime investigation drama", "horror theme", "British drama", "medical drama", and "science fiction drama", respectively. In the comparative analysis, the proposed methodology outperformed the collaborative filtering-based recommender system. The results show that the period just prior to the recommendation is very important for inferring changes in user preference. The proposed methodology can therefore both secure the transparency of the recommender system and reflect user preferences that change over time. However, it has some limitations. It cannot make fine-grained product recommendations if a topic contains a large number of products. In addition, the number of sequential patterns is small because the number of topics is small. Future research needs to address these limitations.
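
The second phase of this abstract (recommending from sequential rules over topic changes) can be illustrated with a minimal sketch. It assumes each user's history has already been reduced to a time-ordered list of dominant topic labels, and it mines only rules between consecutive topics. The function names, the `min_support` and `min_confidence` thresholds, and the sample sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mine_sequential_rules(topic_sequences, min_support=2, min_confidence=0.5):
    """Mine rules 'previous topic -> next topic' from consecutive topic pairs.

    Returns {antecedent: [(consequent, confidence), ...]} sorted by confidence.
    """
    pair_counts = Counter()
    antecedent_counts = Counter()
    for sequence in topic_sequences:
        for previous, following in zip(sequence, sequence[1:]):
            pair_counts[(previous, following)] += 1
            antecedent_counts[previous] += 1

    rules = {}
    for (previous, following), count in pair_counts.items():
        confidence = count / antecedent_counts[previous]
        if count >= min_support and confidence >= min_confidence:
            rules.setdefault(previous, []).append((following, confidence))
    for consequents in rules.values():
        consequents.sort(key=lambda item: item[1], reverse=True)
    return rules


def recommend_next_topic(rules, most_recent_topic):
    """Pick the highest-confidence consequent for the user's most recent topic."""
    candidates = rules.get(most_recent_topic, [])
    return candidates[0][0] if candidates else None


# Hypothetical per-user topic sequences, ordered by review time.
sequences = [
    ["comedy", "crime", "sci-fi"],
    ["comedy", "crime", "horror"],
    ["medical", "crime", "sci-fi"],
    ["comedy", "medical"],
]
rules = mine_sequential_rules(sequences)
next_topic = recommend_next_topic(rules, "crime")
```

Because each rule is keyed on the user's most recent topic, the sketch reflects the abstract's observation that the period just before the recommendation carries the most weight.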

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, a text mining technique, extracts subjective content embedded in text documents. Sentiment analysis methods have recently been widely used in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify the reputation of a target product. The basic approach to sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words differs across domains. For example, the sentiment word 'sad' is negative in many fields but not for movies. To perform accurate sentiment analysis, a sentiment dictionary must be built for the given domain. However, building such a sentiment lexicon is time-consuming, and much sentiment vocabulary is missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed domain-specific sentiment lexicons based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, its sentiment vocabulary is constructed by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, and negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, so the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.

Extracting Beginning Boundaries for Efficient Management of Movie Storytelling Contents (스토리텔링 콘텐츠의 효과적인 관리를 위한 영화 스토리 발단부의 자동 경계 추출)

  • Park, Seung-Bo;You, Eun-Soon;Jung, Jason J.
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.279-292 / 2011
  • Movies are a representative medium for conveying stories to audiences. A story is basically told through the characters in the movie. Unlike other, simpler videos, movies use narrative structures to depict conflicts and collaborations between characters. These narrative structures consist of three main acts: the beginning, the middle, and the ending. The beginning act includes 1) the introduction of the main characters and backgrounds, and 2) the implication of conflicts and clues to incidents. The middle act describes events driven by both internal and external factors, and the dramatic tension of the story heightens. Finally, in the ending act, the events that have developed are resolved, and the theme of the story and the writer's message are conveyed. When story information is extracted from a movie, it must be weighted differently according to the narrative structure. That is, extracted information influences the story's development differently depending on whether it is located in the beginning, middle, or ending act. The beginning act sets up the story by exposing the audience to information such as the character settings and the depiction of backgrounds. It is therefore necessary to extract many kinds of information from the beginning act in order to summarize a movie or retrieve character information. This paper accordingly proposes a novel method for extracting the boundary of the beginning act. The method detects the boundary scene between the beginning act and the middle act using the accumulation graph of characters. The beginning act consists of the scenes that introduce important characters, imply the conflicts between them, and suggest clues to resolving the troubles. First, to find the scene at which the introduction of important characters is complete, the scene after which no new important characters appear must be detected.
Important characters are the major and minor characters, which are treated as important because they lead the progression of the story. Extras must be excluded from the accumulation graph of characters in order to find the scene at which the introduction of important characters is complete. Extras are characters that appear in only a few scenes. Second, the inflection point is detected in the accumulation graph of characters. It is the point at which the increasing line changes to a horizontal line. That is, when the slope of the line stays at zero over a long run of scenes, the starting point of this zero-slope segment becomes the inflection point. The inflection point is detected in the accumulation graph of characters with extras excluded. Third, several further scenes are counted as additional story progression, such as the implication of conflicts and the suggestion of clues. In practice, the story reaches the scene located between the beginning and middle acts once several additional scenes have elapsed after the introduction of important characters. To detect this scene, we determine by experiment the ratio of additional scenes to total scenes; the experiment gives a ratio of 7.67%. Adding this ratio to the inflection point of the graph gives the story inflection point at which the beginning act changes to the middle act. The proposed method consists of these three steps. We selected 10 movies of various genres for experiment and evaluation. By measuring the accuracy of boundary detection, we have shown that the proposed method is more efficient.
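
The three steps in this abstract can be sketched roughly as follows. The sketch assumes each scene is given as a list of character names. The thresholds `extra_max_scenes` (how few scenes mark a character as an extra) and `flat_run` (how many scenes the graph must stay flat) are hypothetical parameters, since the abstract does not state them; only the 7.67% ratio comes from the abstract.

```python
from collections import Counter


def beginning_boundary(scene_characters, extra_max_scenes=2, flat_run=5,
                       extra_ratio=0.0767):
    """Return (inflection_scene, boundary_scene) as 0-based scene indices.

    Returns None if the accumulation graph never stays flat for `flat_run` scenes.
    """
    total = len(scene_characters)

    # Step 1: drop extras, i.e. characters that appear in only a few scenes.
    appearances = Counter(c for scene in scene_characters for c in set(scene))
    important = {c for c, n in appearances.items() if n > extra_max_scenes}

    # Accumulation graph: number of distinct important characters seen so far.
    seen = set()
    cumulative = []
    for scene in scene_characters:
        seen |= set(scene) & important
        cumulative.append(len(seen))

    # Step 2: first scene where the graph stays flat for `flat_run` scenes.
    # The graph never decreases, so equal endpoints mean a flat window.
    inflection = None
    for i in range(total - flat_run + 1):
        if cumulative[i] == cumulative[i + flat_run - 1]:
            inflection = i
            break
    if inflection is None:
        return None

    # Step 3: add the experimentally determined share of additional scenes.
    boundary = min(inflection + round(extra_ratio * total), total - 1)
    return inflection, boundary


# Hypothetical 20-scene movie: A-D are important, x1 is an extra.
scenes = [["A"], ["A", "B"], ["B", "x1"], ["A", "C"], ["C", "D"],
          ["A", "D"], ["B", "C"]] + [["A", "B", "C", "D"]] * 13
result = beginning_boundary(scenes)
```

In this toy input the last important character is introduced in scene 4, so the graph flattens there, and the 7.67% offset over 20 scenes moves the boundary two scenes later.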

A Study on Touchless Finger Vein Recognition Robust to the Alignment and Rotation of Finger (손가락 정렬과 회전에 강인한 비 접촉식 손가락 정맥 인식 연구)

  • Park, Kang-Ryoung;Jang, Young-Kyoon;Kang, Byung-Jun
    • The KIPS Transactions:PartB / v.15B no.4 / pp.275-284 / 2008
  • With recent increases in security requirements, biometric technologies such as fingerprint, face, and iris recognition have been widely used in many applications, including door access control, personal authentication for computers, internet banking, automatic teller machines, and border-crossing controls. Finger vein recognition uses the unique patterns of finger veins to identify individuals with a high level of accuracy. This paper proposes a new device and methods for touchless finger vein recognition. This research offers the following five advantages over previous work. First, by using a minimal guiding structure for the fingertip, the side, and the back of the finger, we were able to obtain touchless finger vein images without causing much inconvenience to the user. Second, by using a hot mirror slanted at an angle of 45 degrees in front of the camera, we were able to reduce the depth of the capturing device. Consequently, the device could be used in many applications with size limitations, such as mobile phones. Third, we used the holistic texture information of the finger veins based on an LBP (Local Binary Pattern) without needing to extract accurate finger vein regions. With this method, we were able to reduce the effect of non-uniform illumination, including shaded and highly saturated areas. Fourth, we enhanced recognition performance by excluding non-finger vein regions. Fifth, when matching the extracted finger vein code with the enrolled one, we used bit-shifting in both the horizontal and vertical directions to reduce the authentic variations caused by translation and rotation of the finger. Experimental results showed that the EER (Equal Error Rate) was 0.07423% and the total processing time was 91.4 ms.
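
The fifth point (bit-shift matching) can be illustrated with a small sketch. It assumes the vein code is a 2D grid of bits and that the match score is the smallest normalized Hamming distance over all shifts within `max_shift` positions. The code layout, the function names, and `max_shift` are assumptions for illustration, not the paper's actual format.

```python
def shifted_hamming(code_a, code_b, dx, dy):
    """Normalized Hamming distance between code_a and code_b shifted by (dx, dy).

    Only positions where the two grids overlap after shifting are compared.
    """
    rows, cols = len(code_a), len(code_a[0])
    differing = compared = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                differing += code_a[r][c] != code_b[r2][c2]
                compared += 1
    return differing / compared if compared else 1.0


def min_shifted_distance(enrolled, probe, max_shift=2):
    """Best (smallest) distance over horizontal and vertical shifts."""
    return min(
        shifted_hamming(enrolled, probe, dx, dy)
        for dx in range(-max_shift, max_shift + 1)
        for dy in range(-max_shift, max_shift + 1)
    )


# Hypothetical enrolled code, and a probe that is the same code moved one bit right.
enrolled = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
]
probe = [[0] + row[:-1] for row in enrolled]

unshifted = shifted_hamming(enrolled, probe, 0, 0)
best = min_shifted_distance(enrolled, probe, max_shift=1)
```

Searching over shifts lets a slightly displaced finger still match its enrolled code, at the cost of comparing the codes once per candidate shift.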

Improvement of Tentative Korean Standard Differentiation of the Symptoms and Signs for Stroke for Clinical Application (중풍변증표준안 진료기록부 임상적용을 위한 증례기록부와 표준작업지침서의 개선과정)

  • Lee, Min-Goo;Kang, Byeong-Kab;Kim, Bo-Young;Ko, Ho-Yeon;Choi, Sun-Mi;Seol, In-Chan;Jo, Hyun-Kyung;Yun, Jong-Min;Moon, Byung-Soon;Lee, In
    • Journal of Physiology & Pathology in Korean Medicine / v.21 no.1 / pp.347-351 / 2007
  • This study reports the improvement of the second case report form (CRF) and standard operating procedure (SOP) of the Tentative Korean Standard Differentiation of the Symptoms and Signs for Stroke. We were in charge of developing the CRF and educating the investigators. In the course of this project, we needed to develop an SOP for the CRF. We therefore prepared the Tentative Korean Standard Differentiation of the Symptoms and Signs for Stroke and tried clinical application at the Departments of Oriental Internal Medicine of Wonkwang University and Daejeon University in 2005. This pilot study revealed some problems and the need for improvement. We strengthened the inclusion and exclusion criteria of the CRF. We removed the chief complaints entry for efficiency. We reflected the decisions of the Stroke standard committee. We reduced the differentiation indices of the CRF to promote efficiency and accuracy. We rearranged the order of the differentiation indices to promote rationality and practicality. We regulated the detailed items belonging to each differentiation index. We used colloquial wording in the questions. We inserted a flow chart in the SOP. We inserted a picture of the diagnostic index.

Evaluating Suitable Analysis Methods Using Digital Terrain in Viewshed Analysis (수치지형도를 활용한 가시권 분석의 적정 분석방법에 관한 연구)

  • Yeo, Chang-Hwan;Jang, Young-Jin
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.1 / pp.40-48 / 2011
  • The purpose of this study is to help improve the accuracy of viewshed analysis by explaining suitable analysis methods for viewshed analysis using GIS. According to previous studies, the visible area obtained from digital terrain data in viewshed analysis depends on the visible interest area, the scale of the terrain, the spatial resolution, and the surface data. In this study, we used trend analysis and RMSE analysis to determine the effect of the visible interest area, the scale of the terrain, and other factors on viewshed analysis. The results are as follows. First, as in previous studies, the result of viewshed analysis depends on the visible interest area, the scale of the terrain, the spatial resolution, and the surface data. Second, in terms of the visible interest area, results for forest areas are more reliable than those for flat areas. Third, in terms of input surface data, results based on raster grid data are more stable than those based on TIN (triangulated irregular network) data. Fourth, according to the trend and RMSE analyses, the spatial resolution used for analysis should differ with the scale of the digital terrain map. Specifically, it is desirable to set the spatial resolution to less than 10 m for a 1/1,000 digital terrain map, 20 m for a 1/5,000 map, and 30 m for a 1/25,000 map.

A Case Study on Crime Prediction using Time Series Models (시계열 모형을 이용한 범죄예측 사례연구)

  • Joo, Il-Yeob
    • Korean Security Journal / no.30 / pp.139-169 / 2012
  • The purpose of this study is to contribute to establishing scientific policing policies by deriving time series models that can forecast the occurrence of major crimes such as murder, robbery, burglary, rape, and violence, and by identifying the occurrence of major crimes using the models. To achieve this purpose, statistical procedures were performed on the monthly incidence of major crimes from 2002 to 2010 using IBM PASW (SPSS) 19.0: Generation of Time Series Model(C) to identify the forecasting models, and Generation of Time Series Model(C) and Sequential Chart of Time Series(N) to assess the accuracy of the forecasting models. The results of the study are as follows. First, the time series forecasting models for murder, robbery, rape, theft, and violence are Simple Season, Winters Multiplicative, ARIMA(0,1,1)(0,1,1), ARIMA(1,1,0)(0,1,1), and Simple Season, respectively. Second, it is possible to forecast the short-term occurrence of major crimes such as murder, robbery, burglary, rape, and violence using the time series forecasting models. Based on these results, various time series forecasting models should continue to be proposed, and attention should be given to long-term forecasting models based on the quarterly and yearly incidence of major crimes.
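
For readers unfamiliar with the seasonal ARIMA notation used in this abstract, the two ARIMA models can be written out as below. This assumes a seasonal period of 12, which is inferred from the monthly data and not stated in the abstract, and uses the Box-Jenkins sign convention. Here $B$ is the backshift operator ($B y_t = y_{t-1}$), $\varepsilon_t$ is white noise, and $\theta$, $\Theta$, $\phi$ are the fitted coefficients, whose values the abstract does not report.

```latex
% ARIMA(0,1,1)(0,1,1)_{12}: one regular and one seasonal difference,
% with one regular and one seasonal moving-average term
(1 - B)(1 - B^{12})\, y_t = (1 - \theta B)(1 - \Theta B^{12})\, \varepsilon_t

% ARIMA(1,1,0)(0,1,1)_{12}: the same differencing, with one regular
% autoregressive term and one seasonal moving-average term
(1 - \phi B)(1 - B)(1 - B^{12})\, y_t = (1 - \Theta B^{12})\, \varepsilon_t
```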


Development of a Method for Calculating the Allowable Storage Capacity of Rivers by Using Drone Images (드론 영상을 이용한 하천의 구간별 허용 저수량 산정 방법 개발)

  • Kim, Han-Gyeol;Kim, Jae-In;Yoon, Sung-Joo;Kim, Taejung
    • Korean Journal of Remote Sensing / v.34 no.2_1 / pp.203-211 / 2018
  • Dam discharge is carried out to manage rivers and the areas around them during the rainy season or drought. Dam discharge should be based on an accurate understanding of the flow rate that the river can accommodate. Understanding the allowable storage capacity of a river is therefore an important factor in managing the environment around the river. However, the methods currently used to determine the allowable flow rate of rivers, which rely on water level meters and images, are limited in accuracy and efficiency. To solve these problems, this paper proposes a method that automatically calculates the allowable storage capacity of a river from images taken by a drone. In the first step, we create a 3D model of the river from the drone images. This generation process consists of tiepoint extraction, image orientation, and image matching. In the second step, the allowable storage capacity is calculated by cross-section analysis of the river using the generated 3D river model and the road and river layers of the target area. In this step, we determine the maximum water level of the river, extract the cross-sectional profile along the river, and use the 3D model to calculate the allowable storage capacity for the area. To validate the method, we applied it to data from the Bukhan River, and the allowable storage volume was extracted automatically. The proposed method is expected to be useful for real-time management of rivers and surrounding areas and for 3D models generated using drones.
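
The cross-section step can be sketched as follows. The sketch assumes each cross section is a list of (distance across the river, bed elevation) points in metres sampled from the 3D model, that sections are evenly spaced along the river, and that volume is estimated with the average end-area method. These choices, the function names, and the sample numbers are illustrative assumptions; the abstract does not describe the paper's actual computation.

```python
def wetted_area(profile, water_level):
    """Cross-sectional area (m^2) between the bed profile and the water level."""
    area = 0.0
    for (x0, z0), (x1, z1) in zip(profile, profile[1:]):
        d0, d1 = water_level - z0, water_level - z1   # depths at both ends
        width = x1 - x0
        if d0 <= 0 and d1 <= 0:
            continue                                  # segment is entirely dry
        if d0 >= 0 and d1 >= 0:
            area += 0.5 * (d0 + d1) * width           # fully submerged trapezoid
        else:
            # One end is dry: the wet part is a triangle up to the waterline.
            wet, dry = max(d0, d1), -min(d0, d1)
            area += 0.5 * wet * width * wet / (wet + dry)
    return area


def storage_volume(profiles, max_water_level, spacing):
    """Volume (m^3) below max_water_level, averaging adjacent section areas."""
    areas = [wetted_area(p, max_water_level) for p in profiles]
    return sum(0.5 * (a0 + a1) * spacing for a0, a1 in zip(areas, areas[1:]))


# Hypothetical trapezoidal channel: banks at 10 m, bed at 5 m elevation.
section = [(0.0, 10.0), (10.0, 5.0), (20.0, 5.0), (30.0, 10.0)]
full_area = wetted_area(section, 10.0)
partial_area = wetted_area(section, 7.0)
volume = storage_volume([section, section], max_water_level=10.0, spacing=100.0)
```

Interpolating the waterline on partly dry segments keeps the area from being overestimated near the banks.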

A Study on Fault Detection Monitoring and Diagnosis System of CNG Stations based on Principal Component Analysis(PCA) (주성분분석(PCA) 기법에 기반한 CNG 충전소의 이상감지 모니터링 및 진단 시스템 연구)

  • Lee, Kijun;Lee, Bong Woo;Choi, Dong-Hwang;Kim, Tae-Ok;Shin, Dongil
    • Journal of the Korean Institute of Gas / v.18 no.3 / pp.53-59 / 2014
  • In this study, we propose a system that builds a monitoring model for compressed natural gas (CNG) stations, which operate only in non-stationary modes, and performs real-time monitoring and abnormality diagnosis using principal component analysis (PCA), a multivariate statistical analysis method suitable for processing large amounts of multi-dimensional data. We build the model by calculating new characteristic variables, called principal components, which capture the factors representing the trend of process operation as combinations of variables from 7 pressure sensors and 5 temperature sensors, with data collected from a CNG station every second. Real-time monitoring is performed by comparing process operation data measured in real time against the built model. In monitoring tests conducted to improve and verify the accuracy of the system, all data from normal operation were classified as normal. When an abnormality was successfully detected, its cause could be narrowed down by tracking the variables that deviate in the score plot.
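
A PCA monitoring model of this kind can be sketched with NumPy as below. The Hotelling T2 and squared prediction error (SPE) statistics and the per-sensor residual contributions are common choices in PCA-based process monitoring and are assumed here; the abstract does not say which statistics the authors used. The data are synthetic stand-ins for the 12 sensors, and all names are hypothetical.

```python
import numpy as np


def fit_pca_monitor(normal_data, n_components=2):
    """Fit a PCA model on normal-operation data (rows = samples, columns = sensors)."""
    mean = normal_data.mean(axis=0)
    std = normal_data.std(axis=0, ddof=1)
    z = (normal_data - mean) / std                    # standardize each sensor
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T                    # shape (n_sensors, n_components)
    score_var = singular_values[:n_components] ** 2 / (len(z) - 1)
    return mean, std, loadings, score_var


def monitor_sample(model, sample):
    """Return (T2, SPE, per-sensor residual contributions) for one new sample."""
    mean, std, loadings, score_var = model
    z = (sample - mean) / std
    scores = z @ loadings                             # position in the score plot
    t2 = float(np.sum(scores ** 2 / score_var))
    residual = z - scores @ loadings.T                # part the model cannot explain
    return t2, float(residual @ residual), residual ** 2


# Synthetic stand-in: 12 sensors driven by 2 latent operating factors plus noise.
rng = np.random.default_rng(0)
mixing = np.vstack([np.linspace(0.5, 1.5, 12), np.linspace(1.5, 0.5, 12)])
normal = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 12))
model = fit_pca_monitor(normal)

faulty = normal[0].copy()
faulty[3] += 5 * model[1][3]                          # offset sensor 3 by 5 standard deviations
_, spe_normal, _ = monitor_sample(model, normal[0])
_, spe_faulty, contributions = monitor_sample(model, faulty)
suspect_sensor = int(np.argmax(contributions))
```

In practice an alarm would be raised when T2 or SPE exceeds a limit estimated from normal-operation data, and the sensor with the largest contribution gives a first hint at the cause.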

Effect of System Parameters on Target Parameters in Extrusion Cooking of Corn Grit by Twin-Screw Extruder (옥분 압출가공시 이축압출성형기의 System Parameters에 따른 압출물의 특성변화)

  • Kim, Ji-Yong;Kim, Chong-Tai;Kim, Chul-Jin
    • Korean Journal of Food Science and Technology / v.23 no.1 / pp.88-92 / 1991
  • To analyze the effects of the system parameters on the target parameters, which include the amount of water evaporation, the water solubility index (WSI), and the water absorption index (WAI), test trials following a fractional factorial design of three process variables at three levels were carried out for corn grit with a laboratory twin-screw extruder using three different screw configurations. The system parameters collected from the trials, namely extrusion temperature, specific mechanical energy input (SME), and mean residence time (RT), ranged over $129{\sim}182^{\circ}C$, $67{\sim}163\;kWh/ton$, and $12{\sim}34\;sec$, respectively. Within these ranges of the system parameters, the target parameters could be quantified using multiple regression equations. Correlating the results with the system parameters blocked by screw configuration as dependent variables yielded correlation coefficients above 0.90, and correlating with the system parameters obtained from the whole experimental system as the dependent variables yielded correlation coefficients around 0.80. The functional relationship, which can be quantified to the necessary degree of accuracy by a second-order polynomial regression equation with only two system parameters, can be graphed as three-dimensional response surface and contour diagrams.
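
A second-order polynomial regression in two system parameters, as mentioned at the end of this abstract, can be sketched with NumPy least squares. The sketch assumes the two parameters are expressed in coded units (-1, 0, 1), and the response values are synthetic, generated from made-up coefficients. None of the numbers come from the paper.

```python
import numpy as np


def fit_quadratic_surface(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    design = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, x1, x2):
    """Evaluate the fitted second-order response surface."""
    b0, b1, b2, b11, b22, b12 = coeffs
    return b0 + b1 * x1 + b2 * x2 + b11 * x1**2 + b22 * x2**2 + b12 * x1 * x2


# Synthetic 3 x 3 grid of two system parameters in coded units.
levels = [-1.0, 0.0, 1.0]
x1 = np.array([a for a in levels for b in levels])
x2 = np.array([b for a in levels for b in levels])
true_coeffs = np.array([2.0, 0.5, -0.3, 0.2, 0.1, -0.4])   # made-up values
y = predict(true_coeffs, x1, x2)

coeffs = fit_quadratic_surface(x1, x2, y)
```

Evaluating `predict` on a fine grid of `x1` and `x2` gives the values needed to draw the response surface and contour diagrams the abstract refers to.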
