• Title/Summary/Keyword: data extraction (데이터 추출)

Search Result 6,310, Processing Time 0.038 seconds

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on both the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization itself. Most existing studies on the quality evaluation of summarization manually summarized documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost, and it has the limitation that the evaluation result may differ depending on the subjectivity of the summarizer. Therefore, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of these previous studies on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, we extracted 29,671 sentences from TripAdvisor's hotel reviews, summarized the reviews for each hotel, and present the results of experiments evaluating the quality of the summaries according to the proposed methodology. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing the optimal summarization by changing the threshold of the sentence similarity.
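
The completeness, succinctness, and F-score ideas described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the token-set Jaccard similarity, the coverage-based definitions, the harmonic-mean F-score, and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the paper's sentence similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def completeness(source, summary, threshold):
    """Share of source sentences matched by at least one summary sentence."""
    if not source:
        return 0.0
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary, threshold):
    """Share of summary sentences that do not duplicate an earlier summary sentence."""
    if not summary:
        return 0.0
    unique = sum(
        1
        for i, t in enumerate(summary)
        if not any(jaccard(t, u) >= threshold for u in summary[:i])
    )
    return unique / len(summary)


def f_score(c, s):
    """Harmonic mean of completeness and succinctness (assumed combination rule)."""
    return 2 * c * s / (c + s) if (c + s) else 0.0


# Toy data, not taken from the paper.
source = [
    "the room was clean",
    "the room was very clean",
    "breakfast was great",
    "staff were rude",
]
summary = ["the room was clean", "breakfast was great"]

c = completeness(source, summary, threshold=0.5)
s = succinctness(summary, threshold=0.5)
print(c, s, f_score(c, s))
```

Raising or lowering `threshold` changes which sentences count as matching, which is one way to explore the trade-off between the two measures that the abstract mentions.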

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.185-202 / 2012
  • Since the value of information has been realized in the information society, the usage and collection of information have become important. A facial expression, like an artistic painting, contains a wealth of information that could be described in thousands of words. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions through facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies to commercial business. In the academic area, a number of conventional methods such as Multiple Regression Analysis (MRA) or Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized because of its low prediction accuracy. This is inevitable since MRA can only explain the linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies such as Jung and Kim (2012) have used ANN as an alternative, and they reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized due to overfitting and the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) in order to increase the prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold $\varepsilon$) to the model prediction.
Using SVR, we tried to build a model that can measure the level of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Next, preprocessing steps were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the '$\varepsilon$-insensitive loss function' and the 'grid search' technique to find the optimal values of parameters such as C, d, $\sigma^2$, and $\varepsilon$. In the case of ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We performed the experiments repeatedly by varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for ANN was set to 50,000 learning events. We used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy for the hold-out data set compared to MRA and ANN. Regardless of the target variable (the level of arousal, or the level of positive/negative valence), SVR showed the best performance for the hold-out data set. ANN also outperformed MRA; however, it showed considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to researchers or practitioners who want to build models for recognizing human emotions.
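
The $\varepsilon$-insensitive loss, the MAE measure, and the grid search mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the toy targets, and the toy scoring function are hypothetical and are not the authors' setup. In practice the score would be a validation MAE from a trained model (for example, scikit-learn's `sklearn.svm.SVR`).

```python
from itertools import product


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean loss that ignores residuals smaller than epsilon."""
    return sum(
        max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """MAE, the comparison measure named in the abstract."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return (best_params, best_score), lower is better."""
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Toy targets and predictions, not taken from the paper.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
print(mean_absolute_error(y_true, y_pred))


# Hypothetical score function standing in for "train a model, return validation MAE".
def toy_score(params):
    return (params["C"] - 10) ** 2 + (params["epsilon"] - 0.1) ** 2


print(grid_search({"C": [1, 10, 100], "epsilon": [0.01, 0.1, 1.0]}, toy_score))
```

The first residual (0.05) falls inside the $\varepsilon$ tube and contributes nothing to the loss, which is why the fitted model depends only on a subset of the training data.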

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. For example, data-driven surveys are based on analyzing the subjectivity of text data posted by users, and market research is conducted by analyzing users' review posts to quantify the reputation of a target product among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' has a negative meaning in many fields but not in the movie domain. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, building a sentiment lexicon in this way is time-consuming, and many sentiment vocabularies are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have been carried out to construct a sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced, and SentiWordNet does not work well because of language differences in the process of converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it constructs sentiment vocabularies by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either a positive or a negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabularies, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features for deep learning models to improve accuracy.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
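
As a rough illustration of how a lexicon of 1-grams and 2-grams could supply features to a downstream model, the following Python sketch counts positive and negative lexicon hits in a text. It is not the KNU-KSL format or the authors' pipeline: the function name, the dictionary layout, whitespace tokenization, and the negative entry `disappointed` are assumptions made for this example. Only 'thank you', 'worthy', and 'impressed' come from the abstract.

```python
def lexicon_features(text, lexicon, max_n=2):
    """Count positive and negative lexicon hits over 1-grams up to max_n-grams."""
    tokens = text.lower().split()
    counts = {"positive": 0, "negative": 0}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            polarity = lexicon.get(" ".join(tokens[i:i + n]))
            if polarity in counts:
                counts[polarity] += 1
    return counts


# Hypothetical miniature lexicon; "disappointed" is an illustrative addition.
lexicon = {
    "thank you": "positive",
    "worthy": "positive",
    "impressed": "positive",
    "disappointed": "negative",
}

print(lexicon_features("Thank you we were impressed", lexicon))
```

Counts like these could be appended to a model's input features. Korean text would need a morphological analyzer rather than whitespace splitting.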

A Study on the Impact of Artificial Intelligence on Decision Making : Focusing on Human-AI Collaboration and Decision-Maker's Personality Trait (인공지능이 의사결정에 미치는 영향에 관한 연구 : 인간과 인공지능의 협업 및 의사결정자의 성격 특성을 중심으로)

  • Lee, JeongSeon;Suh, Bomil;Kwon, YoungOk
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.231-252 / 2021
  • Artificial intelligence (AI) is a key technology that will change the future the most. It affects industry as a whole and daily life in various ways. As data availability increases, artificial intelligence finds optimal solutions and infers or predicts through self-learning. Research and investment related to automation that discovers and solves problems on its own are ongoing. Automation through artificial intelligence has benefits such as cost reduction and minimization of human intervention and of differences in human capability. However, there are side effects, such as limits on the artificial intelligence's autonomy and erroneous results due to algorithmic bias. In the labor market, it raises the fear of job replacement. Prior studies on the utilization of artificial intelligence have shown that individuals do not necessarily use the information (or advice) it provides. People are more sensitive to algorithm errors than to human errors, so they avoid algorithms after seeing them err, which is called "algorithm aversion." Recently, artificial intelligence has begun to be understood from the perspective of the augmentation of human intelligence, and interest has shifted toward human-AI collaboration rather than AI alone. A study of 1500 companies in various industries found that human-AI collaboration outperformed AI alone. In medicine, pathologist-deep learning collaboration reduced the pathologists' cancer diagnosis error rate by 85%. Leading AI companies, such as IBM and Microsoft, are starting to adopt the direction of AI as augmented intelligence. Human-AI collaboration is emphasized in the decision-making process because artificial intelligence is superior in information-based analysis, while intuition is a uniquely human capability, so that human-AI collaboration can produce optimal decisions.
In an environment where change is getting faster and uncertainty is increasing, the need for artificial intelligence in decision-making will increase. In addition, active discussions are expected on approaches that utilize artificial intelligence for rational decision-making. This study investigates the impact of artificial intelligence on decision-making, focusing on human-AI collaboration and the interaction between the decision-maker's personal traits and the advisor type. The advisors were classified into three types: human, artificial intelligence, and human-AI collaboration. We investigated the perceived usefulness of advice and the utilization of advice in decision-making, and whether the decision-maker's personal traits are influencing factors. Three hundred and eleven adult male and female participants performed a task of predicting the age of faces in photos, and the results showed that the advisor type does not directly affect the utilization of advice. Decision-makers utilize advice only when they believe it can improve prediction performance. In the case of human-AI collaboration, decision-makers rated the perceived usefulness of advice higher regardless of their personal traits, and the advice was utilized more actively. When the advisor was artificial intelligence alone, decision-makers who scored high in conscientiousness, high in extroversion, or low in neuroticism rated the perceived usefulness of the advice highly and therefore utilized it actively. This study has academic significance in that it focuses on human-AI collaboration, reflecting the recent growing interest in the roles of artificial intelligence. It expands the relevant research area by considering the role of artificial intelligence as an advisor in decision-making and judgment research and, in terms of practical significance, suggests points that companies should consider in order to enhance AI capability.
To improve the effectiveness of AI-based systems, companies must not only introduce high-performance systems but also need employees who properly understand the digital information presented by AI and can add non-digital information to make decisions. Moreover, to increase the utilization of AI-based systems, task-oriented competencies, such as analytical skills and information technology capabilities, are important. In addition, greater performance is expected if employees' personal traits are considered.

The Study of Land Surface Change Detection Using Long-Term SPOT/VEGETATION (장기간 SPOT/VEGETATION 정규화 식생지수를 이용한 지면 변화 탐지 개선에 관한 연구)

  • Yeom, Jong-Min;Han, Kyung-Soo;Kim, In-Hwan
    • Journal of the Korean Association of Geographic Information Studies / v.13 no.4 / pp.111-124 / 2010
  • Monitoring land surface change is considered an important research field because it is related to land use, climate change, meteorological study, agriculture modulation, surface energy balance, and the surface environment system. For change detection, many different methods have been presented to provide more detailed information, with tools ranging from ground-based measurement to satellite multi-spectral sensors. Recently, using high-resolution satellite data has been considered the most efficient way to monitor extensive land environmental systems, especially at higher spatial and temporal resolution. In this study, we use satellites with two different spatial resolutions. One is SPOT/VEGETATION, with 1 km spatial resolution, used to detect area change at coarse resolution and to determine an objective threshold. The other is Landsat, whose high resolution is used to identify detailed land environmental change. Depending on their spatial resolution, they show different observation characteristics such as repeat cycle and global coverage. By correlating the two kinds of satellite data, we can detect land surface change from mid resolution to high resolution. The K-means clustering algorithm is applied to detect changed areas between two images from different times. When using solar spectral bands, complicated surface reflectance scattering characteristics make surface change detection difficult. This effect can cause serious problems when interpreting surface characteristics. For example, even when the intrinsic surface reflectance is constant, the observed value can change according to the relative positions of the sun and the sensor.
To reduce those effects, this study uses the long-term Normalized Difference Vegetation Index (NDVI), computed from SPOT/VEGETATION solar spectral channels after atmospheric and bi-directional correction, to provide an objective threshold value for detecting land surface change, since NDVI is less sensitive to solar geometry than the solar channels. Surface change detection based on long-term NDVI shows improved results compared with using Landsat alone.
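
For readers unfamiliar with NDVI, the following Python sketch shows the standard definition, NDVI = (NIR - Red) / (NIR + Red), and one way a long-term NDVI series could yield a change threshold. The thresholding rule (k standard deviations of the long-term series), the function names, and the sample values are assumptions for illustration. The abstract does not state how the authors derive their threshold.

```python
from statistics import pstdev


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def change_threshold(ndvi_series, k=2.0):
    """Hypothetical rule: k population standard deviations of the long-term series."""
    return k * pstdev(ndvi_series)


def changed(ndvi_before, ndvi_after, threshold):
    """Flag a pixel as changed when its NDVI difference exceeds the threshold."""
    return abs(ndvi_after - ndvi_before) > threshold


# Toy values, not taken from the study.
long_term = [0.60, 0.62, 0.58, 0.61, 0.59]
threshold = change_threshold(long_term)
print(ndvi(0.5, 0.1))
print(changed(0.60, 0.20, threshold))  # large drop in vegetation signal
print(changed(0.60, 0.61, threshold))  # within normal variability
```

Because NDVI is a ratio of band differences, illumination factors that scale both bands similarly tend to cancel, which is consistent with the abstract's remark that NDVI is less sensitive to solar geometry than the individual channels.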

Encounters and Acceptable Number of Encounters at the Seoseokdae Trail Section of Mudeungsan National Park (무등산국립공원 서석대 구간의 탐방객 조우수와 허용가능 조우수)

  • Kim, Sang-Mi;Kim, Sang-Oh
    • Korean Journal of Environment and Ecology / v.34 no.5 / pp.454-465 / 2020
  • This study measured the present number of encounters and established an evaluation criterion for the allowable number of encounters in the Seoseokdae summit area (SSA) of Mudeungsan National Park in order to examine the management of visitor numbers on the Seoseokdae trail section (STS). Data were obtained from a questionnaire survey of 263 visitors to STS selected through convenience sampling during June 2019. The average number of encounters in SSA was 18.7. Most of the respondents (95.4%) encountered fewer than 30 other visitors. The average maximum number of simultaneous users (AMNSU, measured at 15-minute intervals) in SSA was 13.4 persons (range: 3 to 31 persons). The hourly AMNSU was highest at 13-14 (21.0 persons), followed by 11-12 (19.8 persons), 14-15 (15.5 persons), 12-13 (15.3 persons), 10-11 (12.3 persons), and 8-9 (10.8 persons). The acceptable encounter number (AEN) derived from the long-question format (LQF) was 59.2 persons, and that from the short-question format (SQF) was 55.1 persons. The AEN of respondents who preferred a "near-nature experience" (27.5 persons) was lower than that of those who preferred a "resort/tourism area like experience" (46.6 persons). The present number of encounters and the AMNSU (range: 3 to 31 persons) in SSA were lower than the AENs derived from LQF (59.2 persons) and SQF (55.1 persons). Eighty-three percent of the respondents preferred a "near-nature experience," while only 10.5% preferred a "resort/tourism area like experience." In addition, 78.4% of the respondents did not perceive SSA as crowded. The vast majority of the respondents (92.3%) reported a personal AEN higher than their perceived encounter number (PEN). The gaps between the personal AEN and the PEN were negatively correlated with perceived crowding.

A Structural Relationship of Topography, Developed Areas, and Riparian Vegetation on the Concentration of Total Nitrogen in Streams (지형, 개발지역, 수변림과 하천 내 총질소 농도와의 구조적 관계 분석)

  • Lee, Sang-Woo;Lee, Jong-Won;Park, Se-Rin
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.1 / pp.25-34 / 2020
  • Land use in watersheds has been shown to be a major driving factor in determining the water quality of streams. In this light, scientists have been investigating the role of riparian vegetation in the relationship between land use in watersheds and the associated stream water quality. Numerous studies have reported that riparian vegetation can alleviate the adverse effects of land use in watersheds on stream water quality through various hydrological, biochemical, and ecological mechanisms. However, this concept has been criticized because the true effects of riparian vegetation must be assessed by comprehensive models that mimic real environmental settings. This study aimed to estimate a comprehensive structural equation model integrating topography, land use, and characteristics of riparian vegetation. We used water quality data from the Nakdong River system monitored under the National Aquatic Ecosystem Monitoring Program (NAEMP) of the Korean Ministry of Environment (MOE). Riparian vegetation data and land use data were extracted from the Land Use/Land Cover map (LULC) produced by the MOE. A number of structural equation models (SEMs) were estimated in Amos of IBM SPSS. The results revealed that land use was determined by elevation, and that developed areas within a watershed significantly increased the concentration of Total Nitrogen (TN) in streams and LDI in riparian vegetation. In contrast, developed areas significantly reduced LPI and PLAND. At the same time, PLAND and LDI significantly reduced the concentration of TN in streams. Thus, developed areas in watersheds had both a direct and an indirect impact on the concentration of TN in streams, and the spatial pattern and amount of riparian vegetation could significantly alleviate the negative impacts of developed areas on TN concentration in streams.
To enhance stream water quality, reducing developed areas in a watershed is critical for long-term watershed management plans, while restoration of riparian vegetation patterns could be implemented immediately, since riparian areas were less developed than most other watersheds.

A Study on the Management of Auto-camping in National Parks through Survey of Visitors to Auto-campground (오토캠핑장 이용실태 분석을 통한 국립공원 내 오토캠핑 관리 방안)

  • Cho, Woo;Sung, Chan Yong
    • Korean Journal of Environment and Ecology / v.30 no.3 / pp.406-414 / 2016
  • This study examined the factors by which people choose auto-camping as their primary leisure activity, based on a questionnaire survey of visitors to the Guryong auto-campground in Chiaksan National Park. The majority of the visitors were employed (60%) and in their 30s and 40s (85%), with relatively high education (88% had a bachelor's degree or higher) and income levels (87% had a family income greater than KRW 30 million). Most visitors came in family groups (82%), and for many visitors auto-camping appeared to be their primary leisure activity, as 24% of the respondents said that they visited auto-campgrounds more than 10 times a year. Only 18% of the visitors had auto-camped for longer than 5 years, indicating that auto-camping is a relatively new leisure activity that has become popular in recent times. Factor analysis of 19 items that measured the degree of agreement on the relative advantages of auto-camping extracted four latent factors that affected the selection of auto-camping as a leisure activity: factor 1 (refreshment through contact with nature), factor 2 (novelty and a sense of accomplishment), factor 3 (convenience), and factor 4 (entertainment). Regression analysis of the effects of the four extracted factors on the visitors' level of satisfaction with auto-camping (measured by the number of visits to auto-campgrounds per year) indicated that 'refreshment through contact with nature' was the most critical factor in selecting auto-camping as a leisure activity. 'Novelty and a sense of accomplishment' and 'convenience' were also statistically significant, but to a lesser degree, whereas 'entertainment' did not have a statistically significant effect on the visitors' decision. These results suggest that, in designing and managing auto-campgrounds, it is more important to preserve the surrounding nature than to provide more facilities for campers' convenience and entertainment.

Development of Sauces Made from Gochujang Using the Quality Function Deployment Method: Focused on U.S. and Chinese Markets (품질기능전개(Quality Function Deployment) 방법을 적용한 고추장 소스 콘셉트 개발: 미국과 중국 시장을 중심으로)

  • Lee, Seul Ki;Kim, A Young;Hong, Sang Pil;Lee, Seung Je;Lee, Min A
    • Journal of the Korean Society of Food Science and Nutrition / v.44 no.9 / pp.1388-1398 / 2015
  • Quality Function Deployment (QFD) is the most complete and comprehensive method for translating what customers need from a product. This study utilized QFD to develop sauces made from Gochujang and to determine how to fulfill international customers' requirements. A customer survey and an expert opinion survey were conducted from May 13 to August 22, 2014, targeting 220 consumers and 20 experts in the U.S. and China. In the end, a total of 208 usable responses (190 consumers and 18 experts) were selected. The top three customer requirements for Gochujang sauces were identified as fresh flavor (4.40), making better flavor (3.99), and cooking availability (3.90). Thirty-three engineering characteristics were developed. The calculation of the relative importance of engineering characteristics identified 'cooking availability', 'free sample and food testing', 'unique concept', and 'development of brand' as the highest. The relative importance of engineering characteristics, their correlations, and their technical difficulties are ranked, and this result could contribute to the development of Korean sauces based on customer needs and engineering characteristics.
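
The relative importance of engineering characteristics in QFD is commonly computed as a weighted sum of relationship strengths, normalized across characteristics. The Python sketch below follows that common convention and is not necessarily the authors' exact calculation. The customer-requirement scores (4.40, 3.99, 3.90) and the characteristic names come from the abstract, while the 9/3/1 relationship strengths and the function name are hypothetical values chosen only for illustration.

```python
def relative_importance(weights, relationships):
    """Weighted sum of relationship strengths per characteristic, normalized to sum to 1."""
    raw = {
        characteristic: sum(weights[req] * strength for req, strength in links.items())
        for characteristic, links in relationships.items()
    }
    total = sum(raw.values())
    return {
        name: (score / total if total else 0.0) for name, score in raw.items()
    }


# Customer requirement importance scores reported in the abstract.
weights = {
    "fresh flavor": 4.40,
    "making better flavor": 3.99,
    "cooking availability": 3.90,
}

# Hypothetical relationship strengths on a 9/3/1 scale (not from the paper).
relationships = {
    "unique concept": {
        "fresh flavor": 3,
        "making better flavor": 9,
        "cooking availability": 1,
    },
    "free sample and food testing": {
        "fresh flavor": 1,
        "making better flavor": 3,
        "cooking availability": 3,
    },
}

for name, share in relative_importance(weights, relationships).items():
    print(f"{name}: {share:.3f}")
```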

Characterization of Anti-Advanced Glycation End Products (AGEs) and Radical Scavenging Constituents from Ainsliaea acerifolia (단풍취의 최종당화산물 생성 저해 및 라디칼 소거 물질의 동정)

  • Jeong, Gyeng Han;Kim, Tae Hoon
    • Journal of the Korean Society of Food Science and Nutrition / v.46 no.6 / pp.759-764 / 2017
  • Reactive oxygen species (ROS) and advanced glycation end products (AGEs) are valuable therapeutic targets for the regulation of diabetic complications. Activity-guided isolation of the ethyl acetate (EtOAc)-soluble portion of a 70% ethanolic extract from aerial parts of Ainsliaea acerifolia was performed, followed by an AGE formation inhibition assay, leading to the characterization of four dicaffeoylquinic acid derivatives of previously known structure: methyl 3,5-di-O-caffeoyl-epi-quinate (1), 3,5-di-O-caffeoyl-epi-quinic acid (2), 4,5-di-O-caffeoyl-quinic acid (3), and methyl 4,5-di-O-caffeoyl-quinate (4). The structures of these compounds were confirmed by interpretation of nuclear magnetic resonance (NMR; $^{1}\mathrm{H}$ NMR, $^{13}\mathrm{C}$ NMR, and two-dimensional NMR) and mass spectroscopic data. Among the isolates, the major secondary metabolites, 3,5-di-O-caffeoyl-epi-quinic acid (2) and 4,5-di-O-caffeoyl-quinic acid (3), showed the most potent inhibitory effects against AGE formation, with $\mathrm{IC}_{50}$ values of $0.6 \pm 0.1\ \mu\mathrm{M}$ and $0.4 \pm 0.1\ \mu\mathrm{M}$, respectively. Furthermore, all isolated dicaffeoylquinic acid derivatives were evaluated for their radical scavenging activities using the 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) radical, and compound 3 exhibited the most potent inhibitory effect in a concentration-dependent manner. This result suggests that the caffeoylquinic acid dimers isolated from A. acerifolia might be beneficial for the prevention of diabetic complications and related diseases.