• Title/Summary/Keyword: Information Criterion

Establishing meteorological drought severity considering the level of emergency water supply (비상급수의 규모를 고려한 기상학적 가뭄 강도 수립)

  • Lee, Seungmin;Wang, Wonjoon;Kim, Donghyun;Han, Heechan;Kim, Soojun;Kim, Hung Soo
    • Journal of Korea Water Resources Association / v.56 no.10 / pp.619-629 / 2023
  • Recent intensification of climate change has led to an increase in damage caused by droughts. Currently, in Korea, the Standardized Precipitation Index (SPI) is used as the criterion for classifying drought intensity. Based on the accumulated precipitation over the past six months (SPI-6), meteorological drought intensities are classified into four categories: concern, caution, alert, and severe. However, classifying drought intensity solely on the basis of precipitation has limitations. To overcome the limitations of the SPI-based meteorological drought warning criteria, this study collected emergency water supply damage data from the National Drought Information Portal (NDIP) to classify drought intensity. Factors of SPI, such as precipitation, and factors used to calculate evapotranspiration, such as temperature and humidity, were indexed using min-max normalization. Coefficients for each factor were determined by a genetic algorithm (GA). Using the drought intensity based on emergency water supply as the dependent variable and the GA-determined coefficients for each meteorological factor, a new Drought Severity Classification Index (DSCI) was derived. After deriving the DSCI, cumulative distribution functions were used to set the boundaries between intensity stages. It is anticipated that the proposed DSCI will allow more accurate drought intensity classification than the traditional SPI, supporting decision-making by disaster management personnel.
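
The abstract above describes a weighted-sum index built from min-max normalized meteorological factors, with weights fitted by a genetic algorithm and stage boundaries taken from cumulative distribution functions. The sketch below illustrates only the index construction and quantile-based boundaries; the factor series and weights are illustrative placeholders, not the paper's GA-fitted values.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D series of meteorological observations to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def dsci(factors, weights):
    """Weighted sum of normalized factors (weights would come from a GA)."""
    normed = np.column_stack([min_max_normalize(f) for f in factors])
    return normed @ np.asarray(weights, dtype=float)

# hypothetical monthly series: precipitation, temperature, humidity
precip = [120.0, 85.0, 10.0, 4.0, 60.0]
temp = [18.2, 21.0, 25.5, 27.1, 22.3]
humid = [71.0, 65.0, 48.0, 40.0, 62.0]
scores = dsci([precip, temp, humid], weights=[0.5, 0.3, 0.2])

# stage boundaries read off the empirical distribution of index values
boundaries = np.quantile(scores, [0.25, 0.5, 0.75])
print(scores, boundaries)
```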

A Hybrid Recommender System based on Collaborative Filtering with Selective Use of Overall and Multicriteria Ratings (종합 평점과 다기준 평점을 선택적으로 활용하는 협업필터링 기반 하이브리드 추천 시스템)

  • Ku, Min Jung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.85-109 / 2018
  • A recommender system recommends the items a customer is expected to purchase in the future, based on his or her previous purchase behavior. It has served as a tool for realizing one-to-one personalization for e-commerce companies. Traditional recommender systems, especially those based on collaborative filtering (CF), the most popular recommendation algorithm in both academia and industry, are designed to generate recommendation lists using a single criterion: the 'overall rating'. However, this has critical limitations in understanding customers' preferences in detail. Recently, to mitigate these limitations, some leading e-commerce companies have begun to collect feedback from their customers in the form of 'multicriteria ratings'. Multicriteria ratings enable companies to understand their customers' preferences from multidimensional viewpoints. Moreover, multidimensional ratings are easy to handle and analyze because they are quantitative. However, recommendation using multicriteria ratings also has a limitation: it may omit detailed information on a user's preference because, in most cases, it considers only three to five predetermined criteria. Against this background, this study proposes a novel hybrid recommender system that selectively uses the results from traditional CF and from CF using multicriteria ratings. Our proposed system is based on the premise that some people have a holistic preference scheme, whereas others have a composite preference scheme. Thus, the system is designed to use traditional CF with the overall rating for users with holistic preferences, and CF with multicriteria ratings for users with composite preferences. To validate the usefulness of the proposed system, we applied it to a real-world dataset on the recommendation of POIs (points of interest). Personalized POI recommendation is attracting more attention as the popularity of location-based services such as Yelp and Foursquare increases. The dataset was collected from university students via a Web-based online survey system. Using the survey system, we collected overall ratings as well as ratings for each criterion for 48 POIs located near K university in Seoul, South Korea. The criteria were 'food or taste', 'price', and 'service or mood'. As a result, we obtained 2,878 valid ratings from 112 users. Among the 48 items, 38 items (80%) were used as the training dataset, and the remaining 10 items (20%) as the validation dataset. To examine the effectiveness of the proposed system (i.e., the hybrid selective model), we compared its performance to that of two comparison models: traditional CF and CF with multicriteria ratings. The performance of the recommender systems was evaluated using two metrics: average MAE (mean absolute error) and precision-in-top-N. Precision-in-top-N represents the percentage of truly high overall ratings among the N items that the model predicted would be most relevant for each user. The experimental system was developed using Microsoft Visual Basic for Applications (VBA). The experimental results showed that our proposed system (avg. MAE = 0.584) outperformed traditional CF (avg. MAE = 0.591) as well as multicriteria CF (avg. MAE = 0.608). We also found that multicriteria CF performed worse than traditional CF on our dataset, which contradicts the results of most previous studies. This result supports the premise of our study that people have two different types of preference schemes: holistic and composite. Besides MAE, the proposed system outperformed both comparison models in precision-in-top-3, precision-in-top-5, and precision-in-top-7. Paired-samples t-tests showed that, in terms of average MAE, our proposed system outperformed traditional CF at the 10% significance level and multicriteria CF at the 1% significance level. The proposed system sheds light on how to understand and utilize users' preference schemes in the recommender systems domain.
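
The selective hybrid described above routes each user to one of two CF variants and scores both with average MAE. A minimal sketch of that routing and the MAE metric follows; the predictor functions and the holistic/composite test are hypothetical stand-ins for the paper's CF models.

```python
import numpy as np

def mae(actual, predicted):
    """Average MAE over (user, item) rating predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted).mean()

def hybrid_predict(user, item, overall_cf, multi_cf, is_holistic):
    """Use overall-rating CF for holistic users, multicriteria CF otherwise."""
    model = overall_cf if is_holistic(user) else multi_cf
    return model(user, item)

# usage with toy stand-ins for the two CF models
overall_cf = lambda u, i: 3.5
multi_cf = lambda u, i: 4.0
is_holistic = lambda u: u % 2 == 0  # placeholder preference-scheme test
print(hybrid_predict(4, 17, overall_cf, multi_cf, is_holistic))
```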

Development and Evaluation of Validity of Dish Frequency Questionnaire (DFQ) and Short DFQ Using Na Index for Estimation of Habitual Sodium Intake (나트륨 섭취량 추정을 위한 음식섭취빈도조사지와 Na Index를 이용한 간이음식섭취빈도조사지의 개발 및 타당성 검증에 관한 연구)

  • Son, Sook-Mee;Huh, Gwui-Yeop;Lee, Hong-Sup
    • Korean Journal of Community Nutrition / v.10 no.5 / pp.677-692 / 2005
  • The assessment of sodium intake is complex because of the variety and nature of dietary sodium. This study intended to develop a dish frequency questionnaire (DFQ) for estimating habitual sodium intake and a short DFQ for screening subjects with high or low sodium intake. For DFQ112, one hundred and twelve dish items were selected based on the sodium content of one serving and consumption frequency. Frequency of consumption was determined through nine categories, ranging from more than 3 times a day to almost never, indicating how often the specified amount of each food item was consumed during the past 6 months. One hundred seventy-one adults (male: 78, female: 93) who visited a hypertension or health examination clinic participated in the validation study. DFQ55 was developed from DFQ112 by omitting food items not frequently consumed and selecting the dish items with higher sodium content per portion and higher consumption frequency. To develop short DFQs for classifying subjects with low or high sodium intake, a weighted score according to the sodium content of one portion was given to each dish item of DFQ25 or DFQ14 and multiplied by the consumption frequency score. The sum over all dish items forms an index called the sodium index (Na index). For the validation study, the DFQ112, a 2-day diet record, and one 24-hour urine collection were analyzed to estimate sodium intake. The sodium intakes estimated with DFQ112 and 24-h urine analysis showed 65% agreement in classification into the same quartile and a significant correlation (r = 0.563, p < 0.05). However, the sodium intake estimated with DFQ112 (male: 6221.9 mg, female: 6127.6 mg) differed substantially from that of the 24-h urine analysis (male: 4556.9 mg, female: 5107.4 mg). The sodium intake estimated with DFQ55 (male: 4848.5 mg, female: 4884.3 mg) showed a smaller difference from the 24-h urine estimate, a higher proportion classified into the same quartile, and higher correlations with the 24-h urine estimate and systolic blood pressure. It seems DFQ55 can be used as a tool for quantitative estimation of sodium intake. Na index25 and Na index14 showed 39-50% agreement in classification into the same quartile, substantial correlations with the sodium intake estimated with DFQ55, and significant correlations with the sodium intake estimated with 24-h urine analysis. When point 119 of Na index25 was used as the criterion for low sodium intake, sensitivity, specificity, and positive predictive value were 62.5%, 81.8%, and 53.2%, respectively. When point 102 of Na index14 was used as the criterion for high sodium intake, sensitivity, specificity, and positive predictive value were 73.8%, 84.0%, and 62.0%, respectively. It seems the short DFQs using Na index14 or Na index25 are simple, easy, and proper instruments for classifying the low or high sodium intake groups.
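
As the abstract explains, the Na index is a sum over dish items of a sodium-content weight multiplied by a consumption-frequency score, compared against a cutoff point (e.g., 102 for Na index14 high intake). A small sketch under those assumptions; the item weights, frequency scores, and the direction of the cutoff comparison are invented for illustration.

```python
def na_index(frequency_scores, weight_scores):
    """Sum of (sodium weight x consumption frequency) over dish items."""
    return sum(w * f for w, f in zip(weight_scores, frequency_scores))

def screen_high_intake(index, cutoff=102):
    """Flag high sodium intake, assuming scores at or above the cutoff qualify."""
    return index >= cutoff

# hypothetical 14-item questionnaire responses
weights = [3, 2, 2, 1, 3, 1, 2, 3, 1, 2, 2, 1, 3, 2]
freqs = [4, 2, 5, 1, 3, 0, 2, 4, 1, 3, 2, 0, 5, 2]
idx = na_index(freqs, weights)
print(idx, screen_high_intake(idx))
```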

A Qualitative Study on Facilitating Factors of User-Created Contents: Based on Theories of Folklore (사용자 제작 콘텐츠의 활성화 요인에 대한 정성적 연구: 구비문학 이론을 중심으로)

  • Jung, Seung-Ki;Lee, Ki-Ho;Lee, In-Seong;Kim, Jin-Woo
    • Asia pacific journal of information systems / v.19 no.2 / pp.43-72 / 2009
  • Recently, user-created content (UCC) has emerged as a popular medium of online participation. The Internet environment has been constantly evolving, attracting active participation and information sharing among ordinary users. This tendency is a significant departure from earlier Internet use as a one-way information channel through which users passively received information or content from content providers. Thanks to UCC, online users can now generate and exchange content more freely; therefore, identifying the critical factors that affect content-generating activities has become an increasingly important issue. This paper proposes a set of critical factors for stimulating content generation and sharing activities by Internet users. These factors were derived from theories of folklore, such as tales and songs. Based on shared traits of folklore and UCC, we identified four critical elements that should be heeded in constructing UCC: context of culture, context of situation, skill of the generator, and response of the audience. In addition, we selected three major UCC websites: a specialized contents portal, a general Internet portal, and an official contents service site. They have different use environments, user interfaces, and service policies. To identify critical factors for generating, sharing, and transferring UCC, we traced user activities, interactions, and flows of content on the three websites. Moreover, we conducted extensive interviews with users, operators, and policy makers at each site. Based on qualitative and quantitative analyses of the data, this research identifies nine critical factors that facilitate content generation and sharing activities among users. In the context of culture, we suggest voluntary community norms, proactive use of copyrights, strong user relationships, and a fair monetary reward system as critical elements in facilitating content generation and sharing. Norms established by users themselves regulate user behavior and influence content format. Strong user relationships stimulate content generation by enhancing collaboration. In particular, users generate content through collaboration with others, based on their enhanced relationships and specialized skills. They send and receive content by leaving messages on websites or blogs, or by using instant messengers or SMS. This is an interesting and important phenomenon, because the quality of content can be continuously improved and revised depending on the specialized abilities of those engaged in a particular piece of content. In this process, the reward system is an essential driving factor; yet monetary rewards should be considered only after a fair criterion is established. In terms of the context of situation, the quality of the content uploading system was found to strongly influence content-generating activities. Other influential factors are the generators' specialized skills and the involvement of users. In addition, the audience response, especially the effective development of shared interests as well as feedback, was found to significantly influence content generation. Content generators usually reflect the shared interests of others. Shared interest is a distinct characteristic of UCC and was observed in all three websites, where common interest forms around the 'threads' embedded with content. Through such threads of information and content, users discuss and share ideas while continuously extending and updating shared content. Evidently, UCC is a new paradigm representing the next generation of the Internet. To fully utilize this innovative paradigm, we need to understand how users take advantage of this medium in generating content and what affects their content generation activities. Based on these findings, UCC service providers should design their websites as a common playground where users freely interact and share their common interests. As such, this paper makes an important first step toward a better understanding of this new communication paradigm created by UCC.

Underpricing of Initial Offerings and the Efficiency of Investments (신주(新株)의 저가상장현상(低價上場現象)과 투자(投資)의 효율성(效率成)에 대한 연구(硏究))

  • Nam, Il-chong
    • KDI Journal of Economic Policy / v.12 no.2 / pp.95-120 / 1990
  • The underpricing of new shares of a firm that are offered to the public for the first time (initial offerings) is well known and has long puzzled financial economists, since it seems at odds with the optimal behavior of the owners of issuing firms. Past attempts by financial economists to explain this phenomenon have not been successful, in the sense that the explanations they give are either inconsistent with equilibrium theory or implausible. Approaches by authors such as Welch or Allen and Faulhaber are no exceptions. In this paper, we develop a signalling model of capital investment to explain the underpricing phenomenon and also analyze the efficiency of investment. The model focuses on the information asymmetry between the owners of issuing firms and general investors. We consider a firm that has been owned and operated by a single owner and that has a profitable project but no capital to develop it. The profit from the project depends on the capital invested in the project as well as on a profitability parameter. The model also assumes that the financial market is represented by a single investor who maximizes expected wealth. The owner has information superior to investors' as to the value of the firm, in the sense that the owner knows the true value of the parameter while investors have only a probability distribution over it. The owner offers the representative investor a fraction of the ownership of the firm in return for a certain amount of investment in the firm. This offer condition is equivalent to the usual offer condition consisting of the number of shares to sell and the unit price of a share. Thus, the model is a signalling game. Using Kreps' criterion as the solution concept, we obtain an essentially unique separating equilibrium offer condition. Analysis of this separating equilibrium shows that the owner of a firm with high profitability chooses an offer condition that raises an amount of capital short of the amount that maximizes the potential profit from the project. It also reveals that the fraction of ownership that the representative investor receives from the owner of the highly profitable firm in return for its investment has a value that exceeds the investment. In other words, the initial offering in the model is underpriced when the profitability of the firm is high. The source of underpricing and underinvestment is the signalling activity of the owner of the highly profitable firm, who attempts to convince investors that his firm has a highly profitable project by choosing an offer condition that cannot be imitated by the owner of a firm with low profitability. Thus, we obtain two main results. First, underpricing results from a signalling activity by the owner of a firm with high profitability when there is information asymmetry between the owner of the issuing firm and investors. Second, such information asymmetry also leads to underinvestment in a highly profitable project. These results clearly show that underpricing entails underinvestment and that information asymmetry leads to a social cost as well as a private cost. The above results are quite general in the sense that they are based on a neoclassical profit function and full rationality of economic agents. We believe that the results of this paper can be used as a basis for further research on the capital investment process.
For instance, one can view the results of this paper as a subgame equilibrium in a larger game in which a firm chooses among diverse ways to raise capital. In addition, the method used in this paper can be used in analyzing a wide range of problems arising from information asymmetry that the Korean financial market faces.
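
The signalling structure the abstract outlines can be written compactly. The block below is a hedged reconstruction under standard assumptions (owner of type θ offers a fraction α of the firm for investment I); the notation is mine, not the paper's.

```latex
% Hedged reconstruction of the signalling problem (notation mine, not the paper's):
% an owner of type $\theta$ offers a fraction $\alpha$ of the firm for investment $I$.
\[
  \max_{\alpha,\,I}\; (1-\alpha)\,\pi(I,\theta)
  \quad\text{s.t.}\quad
  \alpha\,\pi\bigl(I,\hat{\theta}(\alpha,I)\bigr) \;\ge\; I ,
\]
% where $\hat{\theta}(\alpha,I)$ is the investor's belief after observing the offer.
% In the separating equilibrium, the high type $\theta_H$ chooses
% $I < \arg\max_I \bigl[\pi(I,\theta_H) - I\bigr]$ (underinvestment) while
% $\alpha\,\pi(I,\theta_H) > I$ (underpricing), so the low type cannot imitate.
```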

Managing Duplicate Memberships of Websites : An Approach of Social Network Analysis (웹사이트 중복회원 관리 : 소셜 네트워크 분석 접근)

  • Kang, Eun-Young;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.153-169 / 2011
  • Today, using the Internet is considered essential for establishing a corporate marketing strategy. Companies have promoted their products and services through various on-line marketing activities, such as providing gifts and points to customers in exchange for participating in events, based on customers' membership data. Since companies can use these membership data to enhance their marketing efforts through various data analyses, appropriate website membership management may play an important role in increasing the effectiveness of on-line marketing campaigns. Despite the growing interest in proper membership management, however, it has been difficult to identify inappropriate members who can weaken on-line marketing effectiveness. In the on-line environment, customers tend not to reveal themselves as clearly as in the off-line market. Customers with malicious intent are able to create duplicate IDs by illegally using others' names or by faking login information when signing up. Since duplicate members are likely to intercept gifts and points that should go to the customers who deserve them, this can result in ineffective marketing efforts. Considering that the number of website members and the related marketing costs are increasing significantly, companies need efficient ways to screen out duplicate members. With this motivation, this study proposes an approach for managing duplicate memberships based on social network analysis and verifies its effectiveness using membership data gathered from real websites. A social network is a social structure made up of actors called nodes, which are tied by one or more specific types of interdependency. Social networks represent the relationships between nodes and show the direction and strength of those relationships. Various analytical techniques have been proposed based on these social relationships, such as centrality analysis, structural holes analysis, and structural equivalence analysis. Component analysis, one of the social network analysis techniques, deals with the sub-networks that form meaningful information in the group connection. We propose a method for managing duplicate memberships using component analysis. The procedure is as follows. The first step is to identify membership attributes that will be used for analyzing relationship patterns among memberships; these include ID, telephone number, address, posting time, IP address, and so on. The second step is to compose social matrices based on the identified membership attributes and aggregate the values of each social matrix into a combined social matrix, which represents how strongly pairs of nodes are connected. When a pair of nodes is strongly connected, those nodes are likely to be duplicate memberships. The combined social matrix is transformed into a binary matrix with cell values of '0' or '1' using a relationship criterion that determines whether a membership is duplicate or not. The third step is to conduct a component analysis of the combined social matrix in order to identify component nodes and isolated nodes. Fourth, the number of real memberships is identified and the reliability of website membership is calculated based on the component analysis results. The proposed procedure was applied to three real websites operated by a pharmaceutical company.
The empirical results showed that the proposed method was superior to the traditional database approach using simple address comparison. In conclusion, this study is expected to shed some light on how social network analysis can enhance a reliable on-line marketing performance by efficiently and effectively identifying duplicate memberships of websites.
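
A minimal sketch of the combined-matrix and component steps described above, using networkx for the component analysis; the attribute matrices, weights, and threshold are assumptions for illustration, not the study's values.

```python
import numpy as np
import networkx as nx

def combine(matrices, weights=None):
    """Aggregate per-attribute social matrices (same phone, same IP, ...) into one."""
    weights = weights or [1.0] * len(matrices)
    return sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, matrices))

def duplicate_components(combined, threshold):
    """Binarize with the relationship criterion, then find connected components."""
    binary = (combined >= threshold).astype(int)
    np.fill_diagonal(binary, 0)
    g = nx.from_numpy_array(binary)
    components = [c for c in nx.connected_components(g) if len(c) > 1]
    isolated = [n for n in g.nodes if g.degree[n] == 0]
    return components, isolated

# each component is one real member behind several duplicate IDs;
# real memberships = isolated nodes + one representative per component
same_phone = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
same_ip = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
comps, isolated = duplicate_components(combine([same_phone, same_ip]), threshold=2)
print(comps, isolated)
```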

Applying Meta-model Formalization of Part-Whole Relationship to UML: Experiment on Classification of Aggregation and Composition (UML의 부분-전체 관계에 대한 메타모델 형식화 이론의 적용: 집합연관 및 복합연관 판별 실험)

  • Kim, Taekyung
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.99-118 / 2015
  • Object-oriented programming languages have been widely adopted for developing modern information systems. The use of object-oriented (OO) programming concepts has reduced the effort of reusing pre-existing code, and these concepts have proved useful in interpreting system requirements. In line with this, modern conceptual modeling approaches support features of object-oriented programming. The Unified Modeling Language (UML) has become a de-facto standard for information system designers, since it provides a set of visual diagrams, comprehensive frameworks, and flexible expressions. In a modeling process, UML users need to consider relationships between classes. Based on an explicit and clear representation of classes, the conceptual model from UML necessarily garners attributes and methods for guiding software engineers. In particular, identifying an association between a part class and a whole class is included in the standard grammar of UML. The representation of part-whole relationships is natural in real-world domains, since many physical objects are perceived in terms of part-whole relationships. Even abstract concepts, such as roles, are easily identified by part-whole perception. Thus, a representation of part-whole in UML is reasonable and useful. However, it should be admitted that the use of UML is limited by the lack of practical guidelines on how to identify a part-whole relationship and how to classify it as an aggregate or a composite association. Research on developing such procedural knowledge is meaningful and timely, in that misleading perceptions of part-whole relationships are hard to filter out in initial conceptual modeling, resulting in deterioration of system usability. The current method of identifying and classifying part-whole relationships relies mainly on linguistic expression. This simple approach is rooted in the idea that a has-a phrase constructs a part-whole perception between objects: if the relationship is strong, the association is classified as a composite association; otherwise, it is an aggregate association. Admittedly, linguistic expressions contain clues to part-whole relationships; therefore, the approach is reasonable and cost-effective in general. Nevertheless, it does not address concerns about accuracy and theoretical legitimacy. Research efforts on developing guidelines for part-whole identification and classification have not yet accumulated sufficient results to solve this issue. The purpose of this study is to provide step-by-step guidelines for identifying and classifying part-whole relationships in the context of UML use. Based on the theoretical work on Meta-model Formalization, self-check forms that help conceptual modelers work on part-whole classes were developed. To evaluate the suggested idea, an experimental approach was adopted. The findings show that UML users obtain better results with the guidelines based on Meta-model Formalization than with the natural-language classification scheme conventionally recommended by UML theorists. This study contributes to the stream of research on part-whole relationships by extending the applicability of Meta-model Formalization. Compared to traditional approaches that aim to establish criteria for evaluating the result of conceptual modeling, this study expands the scope to the process of modeling. Traditional theories on the evaluation of part-whole relationships in conceptual modeling aim to rule out incomplete or wrong representations. Such qualification is still important, but the lack of a practical alternative may reduce the appropriateness of posterior inspection for modelers who want to reduce errors or misperceptions about part-whole identification and classification. The findings of this study can be further developed by introducing more comprehensive variables and real-world settings. In addition, it is highly recommended to replicate and extend the suggested idea of utilizing Meta-model Formalization by creating different forms of guidelines, including plugins for integrated development environments.

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.147-161 / 2010
  • As broadcasting and communication have recently converged, communication has been joined to TV, and TV viewing has undergone many changes. IPTV (Internet Protocol Television) provides information services, movie contents, and broadcasts over the Internet, combining live programs with VOD (video on demand). Delivered over communication networks, it has become a new business issue. In addition, new technical issues have arisen, such as imaging technology for the service, networking technology without video cuts, and security technologies to protect copyright. Through the IPTV network, users can watch their desired programs whenever they want. However, IPTV has difficulties with search access, menu access, and finding programs: the menu approach takes a long time to reach a desired program, the search approach fails when the title, genre, or actors' names are not known, and entering letters through a remote control is cumbersome. A bigger problem is that users are often unaware of the services they use. Thus, to resolve the difficulties of selecting VOD services in IPTV, a personalized recommendation service is proposed, which enhances user satisfaction and saves time. This paper provides programs fit to individuals, addressing IPTV's shortcomings through filtering and a recommendation system. The proposed recommendation system collects TV program information, the user's preferred program genres and detailed genres, channels, watched programs, and viewing time, based on individual IPTV viewing records. To find such similarities, an ontology for TV programs is used, because the distance between programs can be measured by similarity comparison. The TV program ontology used here is extracted from TV-Anytime metadata, which represents semantic information, and the ontology expresses contents and features numerically. Vocabulary similarity is determined through WordNet: all words describing the programs are expanded into upper and lower classes for word-similarity decisions, and the average over the descriptive keywords is measured. Using the calculated distances as the criterion, similar programs are grouped by the k-medoids partitioning method, a clustering technique that divides items into groups with similar characteristics. The k-medoids method sets k representative objects (medoids); distances from the representative objects define provisional clusters, and after the initial n objects are divided into k groups, the optimal medoids are found through repeated trials, so that similar programs are clustered together. After selecting programs through this cluster analysis, weights are applied to the recommendation as follows. When each group recommends programs, the programs near the representative objects are recommended to users; the distance formula is the same as the similarity distance measure and provides the basic figure that determines the ranking of recommended programs. A weight is also derived from the number of programs in each watching list: the more programs a cluster contains, the higher its weight, defined as the cluster weight. Through this, the sub-TV programs representative of the groups are selected and the final program ranks are determined. However, the group-representative TV programs can include errors; therefore, weights for TV program viewing preference are added to determine the final ranks, so that the proposed method recommends the contents customers prefer. Based on the proposed method, an experiment was carried out in a controlled environment, and it shows the superiority of the proposed method compared to existing approaches.
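
A naive PAM-style k-medoids sketch over a precomputed distance matrix, matching the partitioning step described above; the ontology-based distance computation from TV-Anytime metadata is not reproduced here, so the distance matrix is an assumed input.

```python
import numpy as np

def k_medoids(dist, k, iters=100, seed=0):
    """Partition items around k medoids given a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)  # nearest medoid per item
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # the member minimizing total within-cluster distance becomes medoid
                new[j] = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(dist[:, medoids], axis=1)

# items nearest their cluster's medoid would be ranked first for recommendation,
# then re-weighted by cluster size (the paper's "cluster weight")
```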

Detection Ability of Occlusion Object in Deep Learning Algorithm depending on Image Qualities (영상품질별 학습기반 알고리즘 폐색영역 객체 검출 능력 분석)

  • LEE, Jeong-Min;HAM, Geon-Woo;BAE, Kyoung-Ho;PARK, Hong-Ki
    • Journal of the Korean Association of Geographic Information Studies / v.22 no.3 / pp.82-98 / 2019
  • The importance of spatial information is rapidly rising. In particular, 3D spatial information construction and modeling of real-world objects, as in smart cities and digital twins, has become an important core technology. The constructed 3D spatial information is used in various fields such as land management, landscape analysis, environment, and welfare services. Three-dimensional modeling with images achieves high visibility and realism of objects by generating texturing. However, some textures inevitably contain occlusion areas caused by physical obstructions such as roadside trees, adjacent objects, vehicles, and banners at the time of image acquisition. Such occlusion areas are a major cause of deterioration in the realism and accuracy of the constructed 3D model. Various studies have been conducted to resolve occlusion areas, and recently deep learning algorithms have been investigated for detecting and resolving them. Deep learning requires sufficient training data, and the quality of the collected training data directly affects the performance and results of the deep learning. Therefore, this study analyzed the ability to detect occlusion areas in images of various qualities, in order to verify the performance and results of deep learning according to the quality of the training data. An image containing an object that causes occlusion was generated for each artificially quantified image quality and applied to the implemented deep learning algorithm. The study found that, under brightness adjustment, the detection ratio dropped to 0.56 for brighter images, and that under pixel-size and artificial-noise adjustment, detection decreased rapidly once images were degraded beyond the middle level relative to the original. In the F-measure evaluation, the change for noise-adjusted image resolution was the largest, at 0.53 points. The occlusion detection ability by image quality can serve as a valuable criterion for the practical application of deep learning in the future, and providing a baseline level for image acquisition is expected to contribute substantially to its practical application.
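
The F-measure used in the evaluation above is the harmonic mean of precision and recall over detected occlusion regions. A minimal helper, assuming counts of true positives, false positives, and false negatives are already available from the detector:

```python
def f_measure(tp, fp, fn):
    """F1 score from detection counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(tp=42, fp=11, fn=9))  # hypothetical counts
```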

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of available information has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict whether a document contains a positive or negative opinion. If we focus on the classification process, the analysis can be regarded as a traditional text mining case; however, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification, and a precise definition of each is needed in order to distinguish between them. In this paper, we found that it is very difficult to distinguish the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion distinguishing text mining from opinion mining is whether the analysis utilizes a sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain, and the classification results can be clearly explained using the lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining and opinion mining for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
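
The abstract's distinguishing criterion is the use of a sentiment lexicon: polarity weights assigned to terms once, then reused across similar domains. A toy sketch of lexicon-based opinion classification; the lexicon entries and weights are invented for illustration.

```python
def classify_opinion(tokens, lexicon):
    """Sum lexicon polarity weights; the sign of the total gives the class."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    return ("positive" if score >= 0 else "negative"), score

# invented movie-review lexicon; a real one would be curated once and reused
lexicon = {"masterpiece": 2.0, "great": 1.0, "boring": -1.5, "waste": -2.0}
print(classify_opinion("a great film but a boring ending".split(), lexicon))
```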