• Title/Summary/Keyword: 온라인 분석 처리

Search Result 466, Processing Time 0.022 seconds

Curvature stroke modeling for the recognition of on-line cursive korean characters (온라인 흘림체 한글 인식을 위한 곡률획 모델링 기법)

  • 전병환;김무영;김창수;박강령;김재희
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.11
    • /
    • pp.140-149
    • /
    • 1996
  • Cursive characters are written on an economical principle to reduce the motion of a pen in the limit of distinction between characters. That is, the pen is not lifted up to move for writing a next stroke, the pen is not moved at all, or connected two strokes chance their shapes to a similar and simple shape which is easy to be written. For these reasons, strokes and korean alphabets are not only easy to be changed, but also difficult to be splitted. In this paper, we propose a curvature stroke modeling method for splitting and matching by using a structural primitive. A curvature stroke is defined as a substroke which does not change its curvanture. Input strokes handwritten in a cursive style are splitted into a sequence of curvature strokes by segmenting the points which change the direction of rotation, which occur a sudden change of direction, and which occur an excessive rotation Each reference of korean alphabets is handwritten in a printed style and is saved as a sequence of curvature strikes which is generated by splitting process. And merging process is used to generate various sequences of curvature strikes for matching. Here, it is also considered that imaginary strokes can be written or omitted. By using a curvature stroke as a unit of recognition, redundant splitting points in input characters are effectively reduced and exact matching is possible by generating a reference curvature stroke, which consists of the parts of adjacent two korean alphasbets, even when the connecting points between korean alphabets are not splitted. The results showed 83.6% as recognition rate of the first candidate and 0.99sec./character (CPU clock:66MHz) as processing time.

  • PDF

Monitoring of Pesticide Residues on Herbs and Spices in the Incheon Metropolitan Area (인천 지역에 유통 중인 향신식물 및 향신료가공품 잔류농약 안전성 조사)

  • Yeo, Eun-young;Jung, Seung-Hye;Jang, Jin-Seob;Kwon, Sung-Hee;Park, Byung-Kyu;Lee, Soo-Yeon;Park, Jeong-Eun;Seo, Soon-Jae;Kim, Jung-Im;Kim, Meyong-Hee;Joo, Kwang-Sig;Hur, Myung-Je
    • Journal of Food Hygiene and Safety
    • /
    • v.36 no.1
    • /
    • pp.60-67
    • /
    • 2021
  • In this study we investigated pesticide residues on herbs and spices disrtibuted in the Incheon Metropolitan area. A total of 112 samples were purchased from off-line and on-line markets from January to October 2020. In accordance with the implementation of the Positive List System (PLS), the proper usage of pesticides is now being enforced. It is assumed that unregistered pesticides are being used on herbs and spices due to the low number of registered pesticides in the agricultural industry. Pesticide residue levels were not detected in 99 samples but 11 kinds of pesticides in 6 samples (13 times) exceeded the MRLs. The pesticides that were used in accordance with the PLS were Diazinon, Diethofencarb and Pyridalyl. However, unregistered pesticides were on the herbs and spices. Therefore, it is necessary to educate producers of herbs and spices on the appropriate use of pesticides. It is also necessary to establish MRLs on herbs and spices.

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining the events occurring at home and abroad. Especially, as the development of information and communication technology has brought various kinds of online news media, the news about the events occurring in society has increased greatly. So automatically summarizing key events from massive amounts of news data will help users to look at many of the events at a glance. In addition, if we build and provide an event network based on the relevance of events, it will be able to greatly help the reader in understanding the current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and integrated the synonyms by leaving only meaningful words through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the subject distribution by date and to find the peak of the subject distribution and to detect the event. A total of 32 topics were extracted from the topic modeling, and the point of occurrence of the event was deduced by looking at the point at which each subject distribution surged. As a result, a total of 85 events were detected, but the final 16 events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between events detected to construct the event network. Using the cosine coefficient between the co-occurred events, we calculated the relevance between the events and connected the events to construct the event network. Finally, we set up the event network by setting each event to each vertex and the relevance score between events to the vertices connecting the vertices. The event network constructed in our methods helped us to sort out major events in the political and social fields in Korea that occurred in the last one year in chronological order and at the same time identify which events are related to certain events. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect in existing event detection. We applied various text mining techniques and Word2vec technique in the text preprocessing to improve the accuracy of the extraction of proper nouns and synthetic nouns, which have been difficult in analyzing existing Korean texts, can be found. In this study, the detection and network configuration techniques of the event have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily analyze subject and topic words and distribution from huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic in a time series. Second, we can find out the connection of events in the form of present and summarized form by calculating relevance score and constructing event network by using simultaneous occurrence of topics that are difficult to grasp in existing event detection. It can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify what happened as a starting point for a series of events through the event network. The limitation of this study is that the characteristics of LDA topic modeling have different results according to the initial parameters and the number of subjects, and the subject and event name of the analysis result should be given by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, it does not take into account the relevance between themes. Subsequent studies need to calculate the relevance between events that are not covered in this study or those that belong to the same subject.

A Hybrid Recommender System based on Collaborative Filtering with Selective Use of Overall and Multicriteria Ratings (종합 평점과 다기준 평점을 선택적으로 활용하는 협업필터링 기반 하이브리드 추천 시스템)

  • Ku, Min Jung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.85-109
    • /
    • 2018
  • Recommender system recommends the items expected to be purchased by a customer in the future according to his or her previous purchase behaviors. It has been served as a tool for realizing one-to-one personalization for an e-commerce service company. Traditional recommender systems, especially the recommender systems based on collaborative filtering (CF), which is the most popular recommendation algorithm in both academy and industry, are designed to generate the items list for recommendation by using 'overall rating' - a single criterion. However, it has critical limitations in understanding the customers' preferences in detail. Recently, to mitigate these limitations, some leading e-commerce companies have begun to get feedback from their customers in a form of 'multicritera ratings'. Multicriteria ratings enable the companies to understand their customers' preferences from the multidimensional viewpoints. Moreover, it is easy to handle and analyze the multidimensional ratings because they are quantitative. But, the recommendation using multicritera ratings also has limitation that it may omit detail information on a user's preference because it only considers three-to-five predetermined criteria in most cases. Under this background, this study proposes a novel hybrid recommendation system, which selectively uses the results from 'traditional CF' and 'CF using multicriteria ratings'. Our proposed system is based on the premise that some people have holistic preference scheme, whereas others have composite preference scheme. Thus, our system is designed to use traditional CF using overall rating for the users with holistic preference, and to use CF using multicriteria ratings for the users with composite preference. To validate the usefulness of the proposed system, we applied it to a real-world dataset regarding the recommendation for POI (point-of-interests). Providing personalized POI recommendation is getting more attentions as the popularity of the location-based services such as Yelp and Foursquare increases. The dataset was collected from university students via a Web-based online survey system. Using the survey system, we collected the overall ratings as well as the ratings for each criterion for 48 POIs that are located near K university in Seoul, South Korea. The criteria include 'food or taste', 'price' and 'service or mood'. As a result, we obtain 2,878 valid ratings from 112 users. Among 48 items, 38 items (80%) are used as training dataset, and the remaining 10 items (20%) are used as validation dataset. To examine the effectiveness of the proposed system (i.e. hybrid selective model), we compared its performance to the performances of two comparison models - the traditional CF and the CF with multicriteria ratings. The performances of recommender systems were evaluated by using two metrics - average MAE(mean absolute error) and precision-in-top-N. Precision-in-top-N represents the percentage of truly high overall ratings among those that the model predicted would be the N most relevant items for each user. The experimental system was developed using Microsoft Visual Basic for Applications (VBA). The experimental results showed that our proposed system (avg. MAE = 0.584) outperformed traditional CF (avg. MAE = 0.591) as well as multicriteria CF (avg. AVE = 0.608). We also found that multicriteria CF showed worse performance compared to traditional CF in our data set, which is contradictory to the results in the most previous studies. This result supports the premise of our study that people have two different types of preference schemes - holistic and composite. Besides MAE, the proposed system outperformed all the comparison models in precision-in-top-3, precision-in-top-5, and precision-in-top-7. The results from the paired samples t-test presented that our proposed system outperformed traditional CF with 10% statistical significance level, and multicriteria CF with 1% statistical significance level from the perspective of average MAE. The proposed system sheds light on how to understand and utilize user's preference schemes in recommender systems domain.

Participant Characteristic and Educational Effects for Cyber Agricultural Technology Training Courses (사이버농업기술교육 참가자의 특성과 교육효과)

  • Kang, Dae-Koo
    • Journal of Agricultural Extension & Community Development
    • /
    • v.21 no.1
    • /
    • pp.35-82
    • /
    • 2014
  • It was main objectives to find the learners characteristics and educational effects of cyber agricultural technology courses in RDA. For the research, it was followed by literature reviews and internet based survey methods. In internet based survey, two staged stratified sampling method was adopted from cyber training members database in RDA along with some key word as open course or certificate course, and enrollment years. Instrument was composed through literature reviews about cyber education effects and educational effect factors. And learner characteristics items were added in survey documents. It was sent to sampled persons by e-mail and 316 data was returned via google survey systems. Through the data cleaning, 303 data were analysed by chi-square, t-test and F-test. It's significance level was .05. The results of the research were as followed; First, the respondent was composed of mainly man(77.9%), and monthly income group was mainly 2,000,000 or 3,000,000 won(24%), bachelor degree(48%), fifty or forty age group was shared to 75%, and their job was changed after learning(12.2%). So major respondents' job was not changed. Their major was not mainly agriculture. Learners' learning style were composed of two or more types as concrete-sequential, mixing, abstract-random, so e-learning course should be developed for the students' type. Second, it was attended at 3.2 days a week, 53.53 minutes a class, totally 172.63 minutes a week. They were very eager or generally eager to study, and attended two or more subjects. The cyber education motives was for farming knowledge, personal competency development, job performance enlarging. They selected subjects along with their interest. A subject person couldn't choose more subjects for little time, others, non interesting subject, but more subject persons were for job performance benefits and previous subjects effectiveness. Most learner was finished their subject, but a fourth was not finished for busy (26.7%). And their entrying behavior was not enough to learn e-course and computer or internet using ability was middle level as software using. And they thought RDA cyber course was comfort in non time or space limit, knowledge acquisition, and personal competency development. Cyber learning group was composed of open course only (12.5%), certificate only(25.7%), both(36.3%). Third, satisfaction and academic achievement of e-learning learners were good, and educational service offering for doing job in learning application category was good, but effect of cyber education was not good, especially, agricultural income increasing was not good because major learner group was not farmer, so they couldn't apply their knowledge to farming. And content structure and design, content comprehension, content amount were good. The more learning subject group responded to good in effects, and both open course and certificate course group satisfied more than open course only group. Based on the results, recommendation was offered as cyber course specialization before main course in RDA training system, support staff and faculty enlargement, building blended learning system with local RDA office, introducing cyber tutor system.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.173-198
    • /
    • 2020
  • For a long time, many studies have been conducted on predicting the success of campaigns for customers in academia, and prediction models applying various techniques are still being studied. Recently, as campaign channels have been expanded in various ways due to the rapid revitalization of online, various types of campaigns are being carried out by companies at a level that cannot be compared to the past. However, customers tend to perceive it as spam as the fatigue of campaigns due to duplicate exposure increases. Also, from a corporate standpoint, there is a problem that the effectiveness of the campaign itself is decreasing, such as increasing the cost of investing in the campaign, which leads to the low actual campaign success rate. Accordingly, various studies are ongoing to improve the effectiveness of the campaign in practice. This campaign system has the ultimate purpose to increase the success rate of various campaigns by collecting and analyzing various data related to customers and using them for campaigns. In particular, recent attempts to make various predictions related to the response of campaigns using machine learning have been made. It is very important to select appropriate features due to the various features of campaign data. If all of the input data are used in the process of classifying a large amount of data, it takes a lot of learning time as the classification class expands, so the minimum input data set must be extracted and used from the entire data. In addition, when a trained model is generated by using too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, in order to improve accuracy, a feature selection technique that removes features close to noise should be applied, and feature selection is a necessary process in order to analyze a high-dimensional data set. Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. It is also true that if there are many risks and many features, there is a limitation in that the performance for classification prediction is poor and it takes a lot of learning time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of the existing campaign. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for feature subsets that are the basis for improving machine learning model performance using statistical characteristics of the data to be processed in the campaign system. Through this, features that have a lot of influence on performance are first derived, features that have a negative effect are removed, and then the sequential method is applied to increase the efficiency for search performance and to apply an improved algorithm to enable generalized prediction. Through this, it was confirmed that the proposed model showed better search and prediction performance than the traditional greed algorithm. Compared with the original data set, greed algorithm, genetic algorithm (GA), and recursive feature elimination (RFE), the campaign success prediction was higher. In addition, when performing campaign success prediction, the improved feature selection algorithm was found to be helpful in analyzing and interpreting the prediction results by providing the importance of the derived features. This is important features such as age, customer rating, and sales, which were previously known statistically. Unlike the previous campaign planners, features such as the combined product name, average 3-month data consumption rate, and the last 3-month wireless data usage were unexpectedly selected as important features for the campaign response, which they rarely used to select campaign targets. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.