• Title/Summary/Keyword: Research Information Systems

Search Result 12,210, Processing Time 0.045 seconds

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.105-129
    • /
    • 2020
  • This study uses corporate data from 2012 to 2018 when K-IFRS was applied in earnest to predict default risks. The data used in the analysis totaled 10,545 rows, consisting of 160 columns including 38 in the statement of financial position, 26 in the statement of comprehensive income, 11 in the statement of cash flows, and 76 in the index of financial ratios. Unlike most previous prior studies used the default event as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. Through this, it was able to solve the problem of data imbalance due to the scarcity of default events, which had been pointed out as the limitation of the existing methodology, and the problem of reflecting the difference in default risk that exists within ordinary companies. Because learning was conducted only by using corporate information available to unlisted companies, default risks of unlisted companies without stock price information can be appropriately derived. Through this, it can provide stable default risk assessment services to unlisted companies that are difficult to determine proper default risk with traditional credit rating models such as small and medium-sized companies and startups. Although there has been an active study of predicting corporate default risks using machine learning recently, model bias issues exist because most studies are making predictions based on a single model. Stable and reliable valuation methodology is required for the calculation of default risk, given that the entity's default risk information is very widely utilized in the market and the sensitivity to the difference in default risk is high. Also, Strict standards are also required for methods of calculation. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of the adequacy of evaluation methods, in consideration of past statistical data and experiences on credit ratings and changes in future market conditions. This study allowed the reduction of individual models' bias by utilizing stacking ensemble techniques that synthesize various machine learning models. This allows us to capture complex nonlinear relationships between default risk and various corporate information and maximize the advantages of machine learning-based default risk prediction models that take less time to calculate. To calculate forecasts by sub model to be used as input data for the Stacking Ensemble model, training data were divided into seven pieces, and sub-models were trained in a divided set to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained with full training data, then the predictive power of each model was verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance on a single model. Next, to check for statistically significant differences between the Stacking Ensemble model and the forecasts for each individual model, the Pair between the Stacking Ensemble model and each individual model was constructed. Because the results of the Shapiro-wilk normality test also showed that all Pair did not follow normality, Using the nonparametric method wilcoxon rank sum test, we checked whether the two model forecasts that make up the Pair showed statistically significant differences. The analysis showed that the forecasts of the Staging Ensemble model showed statistically significant differences from those of the MLP model and CNN model. In addition, this study can provide a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction methodologies, given that traditional credit rating models can also be reflected as sub-models to calculate the final default probability. Also, the Stacking Ensemble techniques proposed in this study can help design to meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.57-78
    • /
    • 2020
  • With the development of the Internet, consumers have had an opportunity to check product information easily through E-Commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to engage as producers of information as well as refer to information. This can be a way to increase the efficiency of purchasing decisions from the perspective of consumers, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort to understand the overall assessment and assessment dimensions of the products that I think are important in reading the vast amount of product reviews offered by E-Commerce for the products consumers want to compare. This is because product reviews are unstructured information and it is difficult to read sentiment of reviews and assessment dimension immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparative products at each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we would like to propose a method to automatically generate multi-dimensional product assessment scores in product reviews that we would like to compare. The methods presented in this study consist largely of two phases. One is the pre-preparation phase and the second is the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created based on a review of the large category product group review. By combining word embedding and association analysis, the dimensioned classification model complements the limitation that word embedding methods for finding relevance between dimensions and words in existing studies see only the distance of words in sentences. Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. This will be an attractive analysis model in terms of not only enabling more effective service deployment during the evolving E-Commerce market and fierce competition, but also satisfying both customers.

A Disk Group Commit Protocol for Main-Memory Database Systems (주기억 장치 데이타베이스 시스템을 위한 디스크 그룹 완료 프로토콜)

  • 이인선;염헌영
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.516-526
    • /
    • 2004
  • Main-Memory DataBase(MMDB) system where all the data reside on the main memory shows tremendous performance boost since it does not need any disk access during the transaction processing. Since MMDB still needs disk logging for transaction commit, it has become another bottleneck for the transaction throughput and the commit protocol should be examined carefully. There have been several attempts to reduce the logging overhead. The pre-commit and group commit are two well known techniques which do not require additional hardware. However, there has not been any research to analyze their effect on MMDB system. In this paper, we identify the possibility of deadlock resulting from the group commit and propose the disk group commit protocol which can be readily deployed. Using extensive simulation, we have shown that the group commit is effective on improving the MMDB transaction performance and the proposed disk group commit almost always outperform carefully tuned group commit. Also, we note that the pre-commit does not have any effect when used alone but shows some improvement if used in conjunction with the group commit.

Development of Local Animal BLAST Search System Using Bioinformatics Tools (생물정보시스템을 이용한 Local Animal BLAST Search System 구축)

  • Kim, Byeong-Woo;Lee, Geun-Woo;Kim, Hyo-Seon;No, Seung-Hui;Lee, Yun-Ho;Kim, Si-Dong;Jeon, Jin-Tae;Lee, Ji-Ung;Jo, Yong-Min;Jeong, Il-Jeong;Lee, Jeong-Gyu
    • Bioinformatics and Biosystems
    • /
    • v.1 no.2
    • /
    • pp.99-102
    • /
    • 2006
  • The Basic Local Alignment Search Tool (BLAST) is one of the most established software in bioinformatics research and it compares a query sequence against the libraries of known sequences in order to investigate sequence similarity. Expressed Sequence Tags (ESTs) are single-pass sequence reads from mRNA (or cDNA) and represent the expression for a given cDNA library and the snapshot of genes expressed in a given tissue and/or at a given developmental stage. Therefore, ESTs can be very valuable information for functional genomics and bioinformatics researches. Although major bio database (DB) websites including NCBI are providing BLAST services and EST data, local DB and search system is demanding for better performance and security issue. Here we present animal EST DBs and local BLAST search system. The animal ESTs DB in NCBI Genbank were divided by animal species using the Perl script we developed. and we also built the new extended DB search systems fur the new data (Local Animal BLAST Search System: http://bioinfo.kohost.net), which was constructed on the high-capacity PC Cluster system fur the best performance. The new local DB contains 650,046 sequences for Bos taurus(cattle), 368,120 sequences for Sus scrofa (pig), 693,005 sequences for Gallus gallus (fowl), respectively.

  • PDF

Path Analysis of Factors Limiting Crop Yield in Rice Paddy and Upland Corn Fields (벼와 옥수수 재배 포장에서 경로분석을 이용한 작물 수확량 제한요인 분석)

  • Chung S. O.;Sudduth K. A.;Chang Y. C.
    • Journal of Biosystems Engineering
    • /
    • v.30 no.1 s.108
    • /
    • pp.45-55
    • /
    • 2005
  • Knowledge of the relationship between crop yield and yield-limiting factors is essential for precision farming. However, developing this knowledge is not easy because these yield-limiting factors are interrelated and affect crop yield in different ways. In this study, data for grain yield and yield-limiting factors, including crop chlorophyll content, soil chemical properties, and topography were collected for a small (0.3 ha) rice paddy field in Korea and a large (36 ha) upland corn field in the USA, and relationships were investigated with path analysis. Using this approach, the effects of limiting factors on crop yield could be separated into direct effects and indirect effects acting through other factors. Path analysis provided more insight into these complex relationships than did simple correlation or multiple linear regression analysis. Results of correlation analysis for the rice paddy field showed that EC, Ca, and $SiO_2$ had significant (P<0.1) correlations with rice yield, while pH, Ca, Mg, Na, $SiO_2,\;and\;P_2O_5$ had significant correlations with the SPAD chlorophyll reading. Path analysis provided additional information about the importance and contribution paths of soil variables to rice yield and growth. Ca had the highest direct effect (0.52) and indirect effect via Mg (-0.37) on rice yield. The indirect effect of Mg through Ca (0.51) was higher than the direct effect (-0.38). Path analysis also enabled more appropriate selection of important factors limiting crop yield by considering cause-and-effect relationships among predictor and response variables. For example, although pH showed a positive correlation (r=0.35) with SPAD readings, the correlation was mainly due to the indirect positive effects acting through Mg and $SiO_2$, while pH not only showed negative direct effects, but also negatively impacted indirect effects of other variables on SPAD readings. For the large upland Missouri corn field, two topographic factors, elevation and slope, had significant (P<0.1) direct effects on yield and highly significant (P<0.01) correlations with other limiting factors. Based on the correlation analysis alone, P and K were determined to be nutrients that would increase corn yield for this field. With the help of path analysis, however, increases in Mg could also be expected to increase corn yield in this case. In general, path analysis results were consistent with published optimum ranges of nutrients for rice and com production. We conclude that path analysis can be a useful tool to investigate interrelationships between crop yield and yield limiting factors on a site-specific basis.

A Design-phase Quality Model for Ubiquitous Service Ontology (유비쿼터스 서비스 온톨로지를 위한 설계 품질 모델)

  • Lee, Mee-Yeon;Park, Seung-Soo;Lee, Jung-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.6
    • /
    • pp.430-445
    • /
    • 2010
  • Effective service description and modeling methodologies are essential for dynamic service composition to provide autonomous services for users in ubiquitous computing environments. In our previous research, we proposed a 'u-Service' as an abstract and structured concept for operations of devices in ubiquitous environments. In addition, we established the mechanism to structure u-Services as an ontology and the description specification to represent attributes of u-Services. However, it did not provide enough methods or standards to analyze and evaluate the effectiveness of the u-Service ontology in the design time. Since existing quality models for software products or computing systems cannot consider characteristics of ubiquitous services, they are not suitable for ubiquitous service ontology. Therefore, in this paper, we propose a quality evaluation model to design and modeling a good ubiquitous service ontology, based on our u-Service ontology building process. We extract modeling goals and evaluation indicators according to characteristics of ubiquitous service ontology, and establish quality metrics to quantify each quality sub-characteristics. The experiment result of the proposed quality evaluation model for u-Service ontologies which are constructed for our previous works shows that we can analyze the design of ubiquitous service ontology from various angles, and indicate recommendations for improvement.

Using the fusion of spatial and temporal features for malicious video classification (공간과 시간적 특징 융합 기반 유해 비디오 분류에 관한 연구)

  • Jeon, Jae-Hyun;Kim, Se-Min;Han, Seung-Wan;Ro, Yong-Man
    • The KIPS Transactions:PartB
    • /
    • v.18B no.6
    • /
    • pp.365-374
    • /
    • 2011
  • Recently, malicious video classification and filtering techniques are of practical interest as ones can easily access to malicious multimedia contents through the Internet, IPTV, online social network, and etc. Considerable research efforts have been made to developing malicious video classification and filtering systems. However, the malicious video classification and filtering is not still being from mature in terms of reliable classification/filtering performance. In particular, the most of conventional approaches have been limited to using only the spatial features (such as a ratio of skin regions and bag of visual words) for the purpose of malicious image classification. Hence, previous approaches have been restricted to achieving acceptable classification and filtering performance. In order to overcome the aforementioned limitation, we propose new malicious video classification framework that takes advantage of using both the spatial and temporal features that are readily extracted from a sequence of video frames. In particular, we develop the effective temporal features based on the motion periodicity feature and temporal correlation. In addition, to exploit the best data fusion approach aiming to combine the spatial and temporal features, the representative data fusion approaches are applied to the proposed framework. To demonstrate the effectiveness of our method, we collect 200 sexual intercourse videos and 200 non-sexual intercourse videos. Experimental results show that the proposed method increases 3.75% (from 92.25% to 96%) for classification of sexual intercourse video in terms of accuracy. Further, based on our experimental results, feature-level fusion approach (for fusing spatial and temporal features) is found to achieve the best classification accuracy.

Prediction of the Following BCI Performance by Means of Spectral EEG Characteristics in the Prior Resting State (뇌신호 주파수 특성을 이용한 CNN 기반 BCI 성능 예측)

  • Kang, Jae-Hwan;Kim, Sung-Hee;Youn, Joosang;Kim, Junsuk
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.11
    • /
    • pp.265-272
    • /
    • 2020
  • In the research of brain computer interface (BCI) technology, one of the big problems encountered is how to deal with some people as called the BCI-illiteracy group who could not control the BCI system. To approach this problem efficiently, we investigated a kind of spectral EEG characteristics in the prior resting state in association with BCI performance in the following BCI tasks. First, spectral powers of EEG signals in the resting state with both eyes-open and eyes-closed conditions were respectively extracted. Second, a convolution neural network (CNN) based binary classifier discriminated the binary motor imagery intention in the BCI task. Both the linear correlation and binary prediction methods confirmed that the spectral EEG characteristics in the prior resting state were highly related to the BCI performance in the following BCI task. Linear regression analysis demonstrated that the relative ratio of the 13 Hz below and above the spectral power in the resting state with only eyes-open, not eyes-closed condition, were significantly correlated with the quantified metrics of the BCI performance (r=0.544). A binary classifier based on the linear regression with L1 regularization method was able to discriminate the high-performance group and low-performance group in the following BCI task by using the spectral-based EEG features in the precedent resting state (AUC=0.817). These results strongly support that the spectral EEG characteristics in the frontal regions during the resting state with eyes-open condition should be used as a good predictor of the following BCI task performance.

A Study on Economy Effects of ICT Industry on Transportation Industry -For Convergence of ICT and Transportation- (정보통신산업이 운송산업에 미치는 경제적 효과에 관한 연구 -정보통신과 운송의 융합을 위한-)

  • Shin, Yong-Jae;Choi, Sung-Wook
    • Journal of Digital Convergence
    • /
    • v.13 no.8
    • /
    • pp.321-329
    • /
    • 2015
  • This study investigates effects of hardware and telecommunication and software service divided by ICT service on each 5 transportations to explore convergence of ICT and Transportation. Research models are production inducing effects, Added Value inducing effects of Demand-Driven model and Shortage cost effects of Supply-Driven model by using data for 2010~2012 of Input-Output Table. Results are that network and software service effects are more impact than hardware effects on transportations. Especially, hardware is impacted heavily on production inducing effect, telecommunications and software services has had a significant impact on the production inducing effect and Shortage cost effects. In addition, by each detail the transportation industries, packages and other transport and road transport is influenced greatly from ICT. On the other hand, rail and water transport are relatively lower impact by ICT, However, the effects of rail and water transport by ICT is grater than investment ratio of ICT. As a result, increasing investment in the ICT services could contribute to development of rail and water transport development.