• Title/Summary/Keyword: 모델검증시스템

Search Result 2,770, Processing Time 0.034 seconds

A Study on the Application of Suitable Urban Regeneration Project Types Reflecting the Spatial Characteristics of Urban Declining Areas (도시 쇠퇴지역 공간 특성을 반영한 적합 도시재생 사업유형 적용방안 연구)

  • CHO, Don-Cherl;SHIN, Dong-Bin
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.24 no.4
    • /
    • pp.148-163
    • /
    • 2021
  • The diversification of the New Deal urban regeneration projects, that started in 2017 in accordance with the "Special Act on Urban Regeneration Activation and Support", generated the increased demand for the accuracy of data-driven diagnosis and project type forecast. Thus, this research was conducted to develop an application model able to identify the most appropriate New Deal project type for "eup", "myeon" and "dong" across the country. Data for application model development were collected through Statistical geographic information service(SGIS) and the 'Urban Regeneration Comprehensive Information Open System' of the Urban Regeneration Information System, and data for the analysis model was constructed through data pre-processing. Four models were derived and simulations were performed through polynomial regression analysis and multinomial logistic regression analysis for the application of the appropriate New Deal project type. I verified the applicability and validity of the four models by the comparative analysis of spatial distribution of the previously selected New Deal projects by targeting the sites located in Seoul by each model and the result showed that the DI-54 model had the highest concordance rate.

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.2
    • /
    • pp.219-240
    • /
    • 2023
  • Conversational AI which allows users to interact with satisfaction is a long-standing research topic. To develop conversational AI, it is necessary to build training data that reflects real conversations between people, but current Korean datasets are not in question-answer format or use honorifics, making it difficult for users to feel closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The question-answer pairs were collected from post titles and first comments of love and relationship counsel boards used by men and women. In addition, we removed abuse records through automatic and manual cleansing to build high quality dataset. To verify the validity of KOMUChat, we compared and analyzed the result of generative language model learning KOMUChat and benchmark dataset. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single turn text data presented so far and it has the significance of building a more friendly Korean dataset by reflecting the text styles of the online community.

A study on perceived interactivity of Dance video contents and intention to use: Focused on YouTube (무용영상콘텐츠의 정보서비스 이용에 대한 상호작용성 인식과 이용지속의도에 관한 연구- 유투브를 중심으로)

  • Jung, Sae-Bom;Won, Do-Yeon;Jang, Young-Jin
    • 한국체육학회지인문사회과학편
    • /
    • v.55 no.3
    • /
    • pp.349-363
    • /
    • 2016
  • The purpose of this study was to perceived interactivity about experience of using dance video contents and intention to use. This study was examine an adaptive model of Technology Acceptance Model and focused on YouTube. In order to achieve the purpose of this study, total 350 questionnaires were surveyed and 311 data were finally analyzed. All data were processed through SPSS for Windows 20.0 version and AMOS 18.0. For the analysis of data, frequency analysis, reliability analysis, confirmatory factor analysis, correlation analysis, model evaluation, and structural equation modeling techniques. The results were as follows. first. Perceived Interactivity had not affect on perceived usefulness but two-way communication, user control among the sub-factor in perceived Interactivity had a positive effect on perceived ease of use. second, perceived ease of use had a positive effect on perceived usefulness. Lastly, Perceived usefulness had not affect intention to use, but perceived ease of use had a positive effect on intention use.

Consumers' Attitude toward Complaining: A Cross-Cultural Comparison of its Traits Predictors (소비자 불평토로성향에 대한 성격특성 예측변수: 한·미 비교문화적 접근)

  • Park, Sojin;John C. Mowen
    • Asia Marketing Journal
    • /
    • v.11 no.1
    • /
    • pp.1-27
    • /
    • 2009
  • The research compared the motivational network of traits predictive of complaint attitudes across consumers in the U.S. and South Korean cultures. Overall, the results revealed a similar pattern of traits predictive of complaint attitudes in the two cultures. The traits of value consciousness, general self-efficacy, emotional instability, and the need for material resources were positively related to attitudes toward complaining. In contrast, conscientiousness was negatively related to complaint attitudes. The only trait predictor of complaining attitude that was significantly different between the Korean and U.S. samples was shopping enjoyment. It was negatively related to complaining attitude in the U.S. sample but unrelated to complaining attitude in the Korean sample. Understanding the personality traits predictive of complaint attitudes has the potential to help marketers develop messages that will encourage the low complaint prone to voice their dissatisfaction. This is important, because when a consumer complains about and unsatisfactory purchase, it gives the firm a chance to take actions to avoid losing a customer.

  • PDF

Design of Digital Phase-locked Loop based on Two-layer Frobenius norm Finite Impulse Response Filter (2계층 Frobenius norm 유한 임펄스 응답 필터 기반 디지털 위상 고정 루프 설계)

  • Sin Kim;Sung Shin;Sung-Hyun You;Hyun-Duck Choi
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.1
    • /
    • pp.31-38
    • /
    • 2024
  • The digital phase-locked loop(DPLL) is one of the circuits composed of a digital detector, digital loop filter, voltage-controlled oscillator, and divider as a fundamental circuit, widely used in many fields such as electrical and circuit fields. A state estimator using various mathematical algorithms is used to improve the performance of a digital phase-locked loop. Traditional state estimators have utilized Kalman filters of infinite impulse response state estimators, and digital phase-locked loops based on infinite impulse response state estimators can cause rapid performance degradation in unexpected situations such as inaccuracies in initial values, model errors, and various disturbances. In this paper, we propose a two-layer Frobenius norm-based finite impulse state estimator to design a new digital phase-locked loop. The proposed state estimator uses the estimated state of the first layer to estimate the state of the first layer with the accumulated measurement value. To verify the robust performance of the new finite impulse response state estimator-based digital phase locked-loop, simulations were performed by comparing it with the infinite impulse response state estimator in situations where noise covariance information was inaccurate.

Application of Geo-Segment Anything Model (SAM) Scheme to Water Body Segmentation: An Experiment Study Using CAS500-1 Images (수체 추출을 위한 Geo-SAM 기법의 응용: 국토위성영상 적용 실험)

  • Hayoung Lee;Kwangseob Kim;Kiwon Lee
    • Korean Journal of Remote Sensing
    • /
    • v.40 no.4
    • /
    • pp.343-350
    • /
    • 2024
  • Since the release of Meta's Segment Anything Model (SAM), a large-scale vision transformer generation model with rapid image segmentation capabilities, several studies have been conducted to apply this technology in various fields. In this study, we aimed to investigate the applicability of SAM for water bodies detection and extraction using the QGIS Geo-SAM plugin, which enables the use of SAM with satellite imagery. The experimental data consisted of Compact Advanced Satellite 500 (CAS500)-1 images. The results obtained by applying SAM to these data were compared with manually digitized water objects, Open Street Map (OSM), and water body data from the National Geographic Information Institute (NGII)-based hydrological digital map. The mean Intersection over Union (mIoU) calculated for all features extracted using SAM and these three-comparison data were 0.7490, 0.5905, and 0.4921, respectively. For features commonly appeared or extracted in all datasets, the results were 0.9189, 0.8779, and 0.7715, respectively. Based on analysis of the spatial consistency between SAM results and other comparison data, SAM showed limitations in detecting small-scale or poorly defined streams but provided meaningful segmentation results for water body classification.

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.155-175
    • /
    • 2017
  • As social data become into the spotlight, mainstream web search engines provide data indicate how many people searched specific keyword: Web Search Traffic data. Web search traffic information is collection of each crowd that search for specific keyword. In a various area, web search traffic can be used as one of useful variables that represent the attention of common users on specific interests. A lot of studies uses web search traffic data to nowcast or forecast social phenomenon such as epidemic prediction, consumer pattern analysis, product life cycle, financial invest modeling and so on. Also web search traffic data have begun to be applied to predict tourist inbound. Proper demand prediction is needed because tourism is high value-added industry as increasing employment and foreign exchange. Among those tourists, especially Chinese tourists: Youke is continuously growing nowadays, Youke has been largest tourist inbound of Korea tourism for many years and tourism profits per one Youke as well. It is important that research into proper demand prediction approaches of Youke in both public and private sector. Accurate tourism demands prediction is important to efficient decision making in a limited resource. This study suggests improved model that reflects latest issue of society by presented the attention from group of individual. Trip abroad is generally high-involvement activity so that potential tourists likely deep into searching for information about their own trip. Web search traffic data presents tourists' attention in the process of preparation their journey instantaneous and dynamic way. So that this study attempted select key words that potential Chinese tourists likely searched out internet. Baidu-Chinese biggest web search engine that share over 80%- provides users with accessing to web search traffic data. Qualitative interview with potential tourists helps us to understand the information search behavior before a trip and identify the keywords for this study. Selected key words of web search traffic are categorized by how much directly related to "Korean Tourism" in a three levels. Classifying categories helps to find out which keyword can explain Youke inbound demands from close one to far one as distance of category. Web search traffic data of each key words gathered by web crawler developed to crawling web search data onto Baidu Index. Using automatically gathered variable data, linear model is designed by multiple regression analysis for suitable for operational application of decision and policy making because of easiness to explanation about variables' effective relationship. After regression linear models have composed, comparing with model composed traditional variables and model additional input web search traffic data variables to traditional model has conducted by significance and R squared. after comparing performance of models, final model is composed. Final regression model has improved explanation and advantage of real-time immediacy and convenience than traditional model. Furthermore, this study demonstrates system intuitively visualized to general use -Youke Mining solution has several functions of tourist decision making including embed final regression model. Youke Mining solution has algorithm based on data science and well-designed simple interface. In the end this research suggests three significant meanings on theoretical, practical and political aspects. Theoretically, Youke Mining system and the model in this research are the first step on the Youke inbound prediction using interactive and instant variable: web search traffic information represents tourists' attention while prepare their trip. Baidu web search traffic data has more than 80% of web search engine market. Practically, Baidu data could represent attention of the potential tourists who prepare their own tour as real-time. Finally, in political way, designed Chinese tourist demands prediction model based on web search traffic can be used to tourism decision making for efficient managing of resource and optimizing opportunity for successful policy.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.147-168
    • /
    • 2017
  • There have been many studies on accurate stock market forecasting in academia for a long time, and now there are also various forecasting models using various techniques. Recently, many attempts have been made to predict the stock index using various machine learning methods including Deep Learning. Although the fundamental analysis and the technical analysis method are used for the analysis of the traditional stock investment transaction, the technical analysis method is more useful for the application of the short-term transaction prediction or statistical and mathematical techniques. Most of the studies that have been conducted using these technical indicators have studied the model of predicting stock prices by binary classification - rising or falling - of stock market fluctuations in the future market (usually next trading day). However, it is also true that this binary classification has many unfavorable aspects in predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we try to predict the stock index by expanding the stock index trend (upward trend, boxed, downward trend) to the multiple classification system in the existing binary index method. In order to solve this multi-classification problem, a technique such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA) or Artificial Neural Networks (ANN) we propose an optimization model using Genetic Algorithm as a wrapper for improving the performance of this model using Multi-classification Support Vector Machines (MSVM), which has proved to be superior in prediction performance. In particular, the proposed model named GA-MSVM is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM, but also the optimal selection of input variables (feature selection) as well as instance selection. In order to verify the performance of the proposed model, we applied the proposed method to the real data. The results show that the proposed method is more effective than the conventional multivariate SVM, which has been known to show the best prediction performance up to now, as well as existing artificial intelligence / data mining techniques such as MDA, MLOGIT, CBR, and it is confirmed that the prediction performance is better than this. Especially, it has been confirmed that the 'instance selection' plays a very important role in predicting the stock index trend, and it is confirmed that the improvement effect of the model is more important than other factors. To verify the usefulness of GA-MSVM, we applied it to Korea's real KOSPI200 stock index trend forecast. Our research is primarily aimed at predicting trend segments to capture signal acquisition or short-term trend transition points. The experimental data set includes technical indicators such as the price and volatility index (2004 ~ 2017) and macroeconomic data (interest rate, exchange rate, S&P 500, etc.) of KOSPI200 stock index in Korea. Using a variety of statistical methods including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, was classified into three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). 70% of the total data for each class was used for training and the remaining 30% was used for verifying. To verify the performance of the proposed model, several comparative model experiments such as MDA, MLOGIT, CBR, ANN and MSVM were conducted. MSVM has adopted the One-Against-One (OAO) approach, which is known as the most accurate approach among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest reason for using a deep learning model in image classification is that it is possible to consider the relationship between each region by extracting each region's features from the overall information of the image. However, the CNN model may not be suitable for emotional image data without the image's regional features. To solve the difficulty of classifying emotion images, many researchers each year propose a CNN-based architecture suitable for emotion images. Studies on the relationship between color and human emotion were also conducted, and results were derived that different emotions are induced according to color. In studies using deep learning, there have been studies that apply color information to image subtraction classification. The case where the image's color information is additionally used than the case where the classification model is trained with only the image improves the accuracy of classifying image emotions. This study proposes two ways to increase the accuracy by incorporating the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics using the color of the picture. When performing the test by finding the two-color combinations most distributed for all training data, the two-color combinations most distributed for each test data image were found. The result values were corrected according to the color combination distribution. This method weights the result value obtained after the model classifies an image's emotion by creating an expression based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto classified into eight categories were used for the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and the performance evaluation was compared before and after applying the two-stage learning to the CNN model. Inspired by color psychology, which deals with the relationship between colors and emotions, when creating a model that classifies an image's sentiment, we studied how to improve accuracy by modifying the result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. It has meaning. Using Scikit-learn's Clustering, the seven colors that are primarily distributed in the image are checked. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors presented in the above data. That is, it was converted to the closest color. Suppose three or more color combinations are selected. In that case, too many color combinations occur, resulting in a problem in which the distribution is scattered, so a situation fewer influences the result value. Therefore, to solve this problem, two-color combinations were found and weighted to the model. Before training, the most distributed color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary format to be used during testing. During the test, the two-color combinations that are most distributed for each test data image are found. After that, we checked how the color combinations were distributed in the training data and corrected the result. We devised several equations to weight the result value from the model based on the extracted color as described above. The data set was randomly divided by 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five divisions to perform 5-fold cross-validation, the model was trained five times using different verification datasets. Finally, the performance was checked using the test dataset that was previously separated. Adam was used as the activation function, and the learning rate was set to 0.01. The training was performed as much as 20 epochs, and if the validation loss value did not decrease during five epochs of learning, the experiment was stopped. Early tapping was set to load the model with the best validation loss value. The classification accuracy was better when the extracted information using color properties was used together than the case using only the CNN architecture.