• Title/Summary/Keyword: Software classification

Search Result 899, Processing Time 0.03 seconds

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, the sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users and market researches are conducted by analyzing users' review posts to quantify users' reputation on a target product. The basic method of sentiment analysis is to use sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to be different across domains. For example, a sentiment word, 'sad' indicates negative meaning in many fields but a movie. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, such a method of building the sentiment lexicon is time-consuming and various sentiment vocabularies are not included without the use of general-purpose sentiment lexicon. In order to address this problem, several studies have been carried out to construct the sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced and SentiWordNet does not work well because of language difference in the process of converting Korean word into English word. There are restrictions on the use of such general-purpose sentiment lexicons as seed data for building the sentiment lexicon for a specific domain. In this article, we construct 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built to quickly construct the sentiment dictionary for a target domain. Especially, it constructs sentiment vocabularies by analyzing the glosses contained in Standard Korean Language Dictionary (SKLD) by the following procedures: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each of glosses to either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive meaning, while negative words and phrases are extracted from the glosses classified as negative meaning. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is more extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabularies, each of which is one of 1-grams, 2-grams, phrases, and sentence patterns. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend on sentiment analysis is to use deep learning technique without sentiment dictionaries. The importance of developing sentiment dictionaries is declined gradually. However, one of recent studies shows that the words in the sentiment dictionary can be used as features of deep learning models, resulting in the sentiment analysis performed with higher accuracy (Teng, Z., 2016). This result indicates that the sentiment dictionary is used not only for sentiment analysis but also as features of deep learning models for improving accuracy. The proposed dictionary can be used as a basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful to automatically and quickly build large training sets for deep learning models.

An Implementation of OTB Extension to Produce TOA and TOC Reflectance of LANDSAT-8 OLI Images and Its Product Verification Using RadCalNet RVUS Data (Landsat-8 OLI 영상정보의 대기 및 지표반사도 산출을 위한 OTB Extension 구현과 RadCalNet RVUS 자료를 이용한 성과검증)

  • Kim, Kwangseob;Lee, Kiwon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.3
    • /
    • pp.449-461
    • /
    • 2021
  • Analysis Ready Data (ARD) for optical satellite images represents a pre-processed product by applying spectral characteristics and viewing parameters for each sensor. The atmospheric correction is one of the fundamental and complicated topics, which helps to produce Top-of-Atmosphere (TOA) and Top-of-Canopy (TOC) reflectance from multi-spectral image sets. Most remote sensing software provides algorithms or processing schemes dedicated to those corrections of the Landsat-8 OLI sensors. Furthermore, Google Earth Engine (GEE), provides direct access to Landsat reflectance products, USGS-based ARD (USGS-ARD), on the cloud environment. We implemented the Orfeo ToolBox (OTB) atmospheric correction extension, an open-source remote sensing software for manipulating and analyzing high-resolution satellite images. This is the first tool because OTB has not provided calibration modules for any Landsat sensors. Using this extension software, we conducted the absolute atmospheric correction on the Landsat-8 OLI images of Railroad Valley, United States (RVUS) to validate their reflectance products using reflectance data sets of RVUS in the RadCalNet portal. The results showed that the reflectance products using the OTB extension for Landsat revealed a difference by less than 5% compared to RadCalNet RVUS data. In addition, we performed a comparative analysis with reflectance products obtained from other open-source tools such as a QGIS semi-automatic classification plugin and SAGA, besides USGS-ARD products. The reflectance products by the OTB extension showed a high consistency to those of USGS-ARD within the acceptable level in the measurement data range of the RadCalNet RVUS, compared to those of the other two open-source tools. In this study, the verification of the atmospheric calibration processor in OTB extension was carried out, and it proved the application possibility for other satellite sensors in the Compact Advanced Satellite (CAS)-500 or new optical satellites.

Applying Meta-model Formalization of Part-Whole Relationship to UML: Experiment on Classification of Aggregation and Composition (UML의 부분-전체 관계에 대한 메타모델 형식화 이론의 적용: 집합연관 및 복합연관 판별 실험)

  • Kim, Taekyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.99-118
    • /
    • 2015
  • Object-oriented programming languages have been widely selected for developing modern information systems. The use of concepts relating to object-oriented (OO, in short) programming has reduced efforts of reusing pre-existing codes, and the OO concepts have been proved to be a useful in interpreting system requirements. In line with this, we have witnessed that a modern conceptual modeling approach supports features of object-oriented programming. Unified Modeling Language or UML becomes one of de-facto standards for information system designers since the language provides a set of visual diagrams, comprehensive frameworks and flexible expressions. In a modeling process, UML users need to consider relationships between classes. Based on an explicit and clear representation of classes, the conceptual model from UML garners necessarily attributes and methods for guiding software engineers. Especially, identifying an association between a class of part and a class of whole is included in the standard grammar of UML. The representation of part-whole relationship is natural in a real world domain since many physical objects are perceived as part-whole relationship. In addition, even abstract concepts such as roles are easily identified by part-whole perception. It seems that a representation of part-whole in UML is reasonable and useful. However, it should be admitted that the use of UML is limited due to the lack of practical guidelines on how to identify a part-whole relationship and how to classify it into an aggregate- or a composite-association. Research efforts on developing the procedure knowledge is meaningful and timely in that misleading perception to part-whole relationship is hard to be filtered out in an initial conceptual modeling thus resulting in deterioration of system usability. The current method on identifying and classifying part-whole relationships is mainly counting on linguistic expression. This simple approach is rooted in the idea that a phrase of representing has-a constructs a par-whole perception between objects. If the relationship is strong, the association is classified as a composite association of part-whole relationship. In other cases, the relationship is an aggregate association. Admittedly, linguistic expressions contain clues for part-whole relationships; therefore, the approach is reasonable and cost-effective in general. Nevertheless, it does not cover concerns on accuracy and theoretical legitimacy. Research efforts on developing guidelines for part-whole identification and classification has not been accumulated sufficient achievements to solve this issue. The purpose of this study is to provide step-by-step guidelines for identifying and classifying part-whole relationships in the context of UML use. Based on the theoretical work on Meta-model Formalization, self-check forms that help conceptual modelers work on part-whole classes are developed. To evaluate the performance of suggested idea, an experiment approach was adopted. The findings show that UML users obtain better results with the guidelines based on Meta-model Formalization compared to a natural language classification scheme conventionally recommended by UML theorists. This study contributed to the stream of research effort about part-whole relationships by extending applicability of Meta-model Formalization. Compared to traditional approaches that target to establish criterion for evaluating a result of conceptual modeling, this study expands the scope to a process of modeling. Traditional theories on evaluation of part-whole relationship in the context of conceptual modeling aim to rule out incomplete or wrong representations. It is posed that qualification is still important; but, the lack of consideration on providing a practical alternative may reduce appropriateness of posterior inspection for modelers who want to reduce errors or misperceptions about part-whole identification and classification. The findings of this study can be further developed by introducing more comprehensive variables and real-world settings. In addition, it is highly recommended to replicate and extend the suggested idea of utilizing Meta-model formalization by creating different alternative forms of guidelines including plugins for integrated development environments.

Research on ITB Contract Terms Classification Model for Risk Management in EPC Projects: Deep Learning-Based PLM Ensemble Techniques (EPC 프로젝트의 위험 관리를 위한 ITB 문서 조항 분류 모델 연구: 딥러닝 기반 PLM 앙상블 기법 활용)

  • Hyunsang Lee;Wonseok Lee;Bogeun Jo;Heejun Lee;Sangjin Oh;Sangwoo You;Maru Nam;Hyunsik Lee
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.471-480
    • /
    • 2023
  • The Korean construction order volume in South Korea grew significantly from 91.3 trillion won in public orders in 2013 to a total of 212 trillion won in 2021, particularly in the private sector. As the size of the domestic and overseas markets grew, the scale and complexity of EPC (Engineering, Procurement, Construction) projects increased, and risk management of project management and ITB (Invitation to Bid) documents became a critical issue. The time granted to actual construction companies in the bidding process following the EPC project award is not only limited, but also extremely challenging to review all the risk terms in the ITB document due to manpower and cost issues. Previous research attempted to categorize the risk terms in EPC contract documents and detect them based on AI, but there were limitations to practical use due to problems related to data, such as the limit of labeled data utilization and class imbalance. Therefore, this study aims to develop an AI model that can categorize the contract terms based on the FIDIC Yellow 2017(Federation Internationale Des Ingenieurs-Conseils Contract terms) standard in detail, rather than defining and classifying risk terms like previous research. A multi-text classification function is necessary because the contract terms that need to be reviewed in detail may vary depending on the scale and type of the project. To enhance the performance of the multi-text classification model, we developed the ELECTRA PLM (Pre-trained Language Model) capable of efficiently learning the context of text data from the pre-training stage, and conducted a four-step experiment to validate the performance of the model. As a result, the ensemble version of the self-developed ITB-ELECTRA model and Legal-BERT achieved the best performance with a weighted average F1-Score of 76% in the classification of 57 contract terms.

Improved Sentence Boundary Detection Method for Web Documents (웹 문서를 위한 개선된 문장경계인식 방법)

  • Lee, Chung-Hee;Jang, Myung-Gil;Seo, Young-Hoon
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.6
    • /
    • pp.455-463
    • /
    • 2010
  • In this paper, we present an approach to sentence boundary detection for web documents that builds on statistical-based methods and uses rule-based correction. The proposed system uses the classification model learned offline using a training set of human-labeled web documents. The web documents have many word-spacing errors and frequently no punctuation mark that indicates the end of sentence boundary. As sentence boundary candidates, the proposed method considers every Ending Eomis as well as punctuation marks. We optimize engine performance by selecting the best feature, the best training data, and the best classification algorithm. For evaluation, we made two test sets; Set1 consisting of articles and blog documents and Set2 of web community documents. We use F-measure to compare results on a large variety of tasks, Detecting only periods as sentence boundary, our basis engine showed 96.5% in Set1 and 56.7% in Set2. We improved our basis engine by adapting features and the boundary search algorithm. For the final evaluation, we compared our adaptation engine with our basis engine in Set2. As a result, the adaptation engine obtained improvements over the basis engine by 39.6%. We proved the effectiveness of the proposed method in sentence boundary detection.

Analysis of Fire Intensity According to the Zones Classification in Traditional Market Stores (전통재래시장 상가간의 구역 구분에 따른 화재강도 분석)

  • Kim, Tae Kwon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.6
    • /
    • pp.154-160
    • /
    • 2020
  • This study analyzed the fire intensity according to the zones classification between traditional market stores using FDS software. Modeling was conducted for the Seomoon traditional market district 4 at Daegu, which places combustibles, such as textiles and clothing near the passageway. The first ignition point assumed a short circuit fire situation at the fourth store combustible. The analysis was conducted under similar conditions as the fire situation in 2016. When there was no section wall, the fire spread rapidly through radiation in all directions from the fire-origin point. After 600 seconds, the mall was burnt to the ground. When section walls were present, however, the fire could be restricted inside the compartment. The first intensity of the two analysis conditions was predicted from the total heat energy from 200 seconds (X1) to 600 seconds (X2), where the heat generation rate began to increase rapidly. As a result of installing section walls near the fire point, heat energy generation of approximately 11.12 MW (55.68 %) was delayed. Further analysis of smoke control, according to the section wall arrangement and re-installation facilities, will be needed to study the characteristics of fire in traditional markets comprehensively.

Service Identification of Component-Based For Extending Service-Oriented Computing System (서비스지향 컴퓨팅 시스템으로의 확장을 위한 컴포넌트 기반의 서비스 식별)

  • Choi, Mi-Sook;Lee, Seo-Jeong;Lee, Jong-Suk;Yang, Seung-Won
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.5
    • /
    • pp.710-727
    • /
    • 2008
  • Service-oriented computing systems have been issued by their properties of reducing software development time and effort by reusing functional service units. The reusability of services can effectively promote through loose coupling between services. But strong associations of object-oriented systems such as inheritance and aggregation create a rather tight coupling between objects. The component-based systems without inheritance and aggregation create a loose coupling between components. Thus components provide service realization at runtime using the functionality provided by their interfaces. Therefore legacy component-based systems need to have service-oriented computing concept in order to support functional service units efficiently. Also, conventional methods for service-oriented computing system have not suggested the clear classification of service layers, the clear service identification guideline introducing service layers and a service mapping method between serviceces of each layer. Therefore we suggest the service classification and the identification guideline of business view and implementation view introducing layers and propose a mapping between two views. That is, we research service layers, service identification, diversified service sizes and a service mapping method between services of each layer. This can be applied to legacy component-based system to extend to the service-oriented computing system.

  • PDF

Digital Motion Capture for Types and Shapes of 3D Character Animation (디지털 모션 캡쳐(Motion Capture)를 위한 3D캐릭터 애니메이션의 종류별, 형태별 모델 분류)

  • Yun, Hwang-Rok;Ryu, Seuc-Ho;Lee, Dong-Lyeor
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.8
    • /
    • pp.102-108
    • /
    • 2007
  • Among culture industry that greet digital generation and is observed 21th century the most representative game industry latest is caught what and more interest degree is rising. 2D and 3D animation accomplish continuous growth and development depending action expression along with development of computer technology, and 2D and 3D animation practical use extent are trend that is widening the area in TV, movie, GAME industry etc. through computer hardware and fast change of software technology. The trend of latest game graphic is trend that the weight is changing from 2D to 3D by 3D game and activation of 3D game character that raise player's immersion stuff and Control in 2D's simplicity manufacturing game balance for one side. This treatise that is reality of 3D game character to classify kind of (Motion Capture) and 3D character animation, form model the sense put. Recognize that is overview and reality of 3D game character first for this about example, and is considered to efficiency is high game industry and digital contents industry hereafter by proposing kind model classification of 3D game character animation, form model classification data and character animation manufacture process that application is possible at fast time and effect in 3D character animation application are big.

A Method of Detecting the Aggressive Driving of Elderly Driver (노인 운전자의 공격적인 운전 상태 검출 기법)

  • Koh, Dong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.11
    • /
    • pp.537-542
    • /
    • 2017
  • Aggressive driving is a major cause of car accidents. Previous studies have mainly analyzed young driver's aggressive driving tendency, yet they were only done through pure clustering or classification technique of machine learning. However, since elderly people have different driving habits due to their fragile physical conditions, it is necessary to develop a new method such as enhancing the characteristics of driving data to properly analyze aggressive driving of elderly drivers. In this study, acceleration data collected from a smartphone of a driving vehicle is analyzed by a newly proposed ECA(Enhanced Clustering method for Acceleration data) technique, coupled with a conventional clustering technique (K-means Clustering, Expectation-maximization algorithm). ECA selects high-intensity data among the data of the cluster group detected through K-means and EM in all of the subjects' data and models the characteristic data through the scaled value. Using this method, the aggressive driving data of all youth and elderly experiment participants were collected, unlike the pure clustering method. We further found that the K-means clustering has higher detection efficiency than EM method. Also, the results of K-means clustering demonstrate that a young driver has a driving strength 1.29 times higher than that of an elderly driver. In conclusion, the proposed method of our research is able to detect aggressive driving maneuvers from data of the elderly having low operating intensity. The proposed method is able to construct a customized safe driving system for the elderly driver. In the future, it will be possible to detect abnormal driving conditions and to use the collected data for early warning to drivers.

Machine-Learning Based Biomedical Term Recognition (기계학습에 기반한 생의학분야 전문용어의 자동인식)

  • Oh Jong-Hoon;Choi Key-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.8
    • /
    • pp.718-729
    • /
    • 2006
  • There has been increasing interest in automatic term recognition (ATR), which recognizes technical terms for given domain specific texts. ATR is composed of 'term extraction', which extracts candidates of technical terms and 'term selection' which decides whether terms in a term list derived from 'term extraction' are technical terms or not. 'term selection' is a process to rank a term list depending on features of technical term and to find the boundary between technical term and general term. The previous works just use statistical features of terms for 'term selection'. However, there are limitations on effectively selecting technical terms among a term list using the statistical feature. The objective of this paper is to find effective features for 'term selection' by considering various aspects of technical terms. In order to solve the ranking problem, we derive various features of technical terms and combine the features using machine-learning algorithms. For solving the boundary finding problem, we define it as a binary classification problem which classifies a term in a term list into technical term and general term. Experiments show that our method records 78-86% precision and 87%-90% recall in boundary finding, and 89%-92% 11-point precision in ranking. Moreover, our method shows higher performance than the previous work's about 26% in maximum.