• Title/Summary/Keyword: Semantic Technique

Search Result 295, Processing Time 0.022 seconds

Improving Hypertext Classification Systems through WordNet-based Feature Abstraction (워드넷 기반 특징 추상화를 통한 웹문서 자동분류시스템의 성능향상)

  • Roh, Jun-Ho; Kim, Han-Joon; Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.18 no.2 / pp.95-110 / 2013
  • This paper presents a novel feature engineering technique that can improve conventional machine learning-based text classification systems. The proposed method extends the initial feature set by exploiting hyperlink relationships in order to categorize hypertext web documents effectively. Web documents are connected to each other through hyperlinks, and in many cases hyperlinks exist among highly related documents. Such hyperlink relationships can be used to enhance the quality of the features that constitute classification models. The basic idea of the proposed method is to generate a kind of abstracted concept feature composed of a few raw feature words; to do so, the method computes the semantic similarity between a target document and its neighbor documents by utilizing hierarchical relationships in the WordNet ontology. In developing classification models, the abstracted concept features are treated on a par with the raw features, and they can play a significant role in building more accurate classification models. Through extensive experiments with the Web-KB test collection, we show that the proposed methods outperform conventional ones.
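The hierarchy-based similarity step can be illustrated with a minimal sketch. The toy is-a taxonomy and the choice of Wu-Palmer similarity below are illustrative assumptions; the paper's method operates on the actual WordNet hierarchy.

```python
# Sketch of hierarchical semantic similarity in the spirit of the paper's
# WordNet step. The toy taxonomy and Wu-Palmer similarity are illustrative
# assumptions, not the authors' exact setup.

# Toy is-a hierarchy: child -> parent
PARENTS = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "car": "vehicle", "vehicle": "artifact",
    "animal": "entity", "artifact": "entity",
}

def path_to_root(term):
    """Return the list of nodes from term up to the root."""
    path = [term]
    while term in PARENTS:
        term = PARENTS[term]
        path.append(term)
    return path

def depth(term):
    return len(path_to_root(term))

def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor of a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):   # walk b upward; the first hit is deepest
        if node in ancestors_a:
            return node
    return None

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b))."""
    common = lcs(a, b)
    if common is None:
        return 0.0
    return 2.0 * depth(common) / (depth(a) + depth(b))

print(wu_palmer("dog", "wolf"))  # siblings under "canine": high similarity
print(wu_palmer("dog", "car"))   # related only via the root: low similarity
```

Words whose similarity exceeds a threshold could then be merged into one abstracted concept feature.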

Landscape Value Analysis of Hallyǒ Haesang Sea National Park (한려해상국립공원(閑麗海上國立公園)의 경관자원(景觀資源) 가치분석(價値分析))

  • Kim, Sei-Cheon
    • Journal of Korean Society of Forest Science / v.89 no.2 / pp.145-160 / 2000
  • This study focuses on Hallyǒ Haesang Sea National Park, a representative marine national park of Korea, and its visual resources; survey courses were examined through hypotheses and tests in order to provide objective guidance for visual resource management together with qualitative baseline data. The physical and spatial structure was measured alongside visual quantities derived from the mesh analysis method, visual preference was measured with the Scenic Beauty Estimation (S.B.E.) method, and the value of the visual resources was analyzed with Iverson's method. The spatial image structure, measured on a Semantic Differential (S.D.) scale, was examined through factor analysis to quantify psychological responses, and the decisive factors and their relative importance were derived by applying measurements of visual quality. As a national park, visual elements in which the natural landscape harmonizes with forests, sky, water surfaces, unusual stones and rocks, and temples should have their value reinforced so that the scenery remains attractive, and artificial structures require more deliberate planning and systematic placement as part of visual resource management. Regarding improvements to the national park and its visual resource management, a reasonable level of development is needed: excessive human interference damages vegetation and landforms and lowers the quality of the natural landscape, so management goals should be set clearly and tangibly, and access should be limited according to an appropriate visitor capacity. Management measures should nevertheless be applied actively at every level within permissible limits.
The park's main tasks are to prevent damage caused by visitors' recreational use, to prevent damage to the ground vegetation layer, and to restore degraded areas as part of visual resource management.


Functional MRI Study on Perceiving Orthographic Structure and Simplified Semantic Pictures (의미론적인 단순화된 그림 및 표의문자를 인지하는 과정에 대한 fMRI 연구)

  • Kim Kyung Hwan; Lee Sung Ki; Song Myung Sung; Kwon Min Jung; Chung Jun Young; Park Hyun Wook; Yoon Hyo Woon
    • Investigative Magnetic Resonance Imaging / v.7 no.2 / pp.93-99 / 2003
  • The different patterns of perceiving pictures, alphabetic words, and Chinese characters have been widely investigated psychophysically. More precise localization of the corresponding brain activity has recently become possible using functional imaging techniques such as PET and fMRI. Until now, however, there has been no fMRI study making a direct comparison between the perception of single Chinese characters and simplified pictures (pictographs). We made a direct comparison of these two stimulus types using modern magnetic resonance techniques. We could not confirm right-hemispheric dominance for the perception of single Chinese characters and pictographs; these two kinds of perception may rely on different underlying mechanisms.


Deep learning-based Multilingual Sentimental Analysis using English Review Data (영어 리뷰데이터를 이용한 딥러닝 기반 다국어 감성분석)

  • Sung, Jae-Kyung; Kim, Yung Bok; Kim, Yong-Guk
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.9-15 / 2019
  • Large global online shopping malls such as Amazon offer their services in English or in the local language of the country where their products are sold. Since many customers purchase products based on product reviews, the shopping malls actively use sentiment analysis to judge the preference for each product from the large amount of review data that customers have written, and the results of such analysis can be used in marketing to reach potential shoppers. However, it is difficult to apply such an English-based sentiment analysis system to the other languages used around the world. In this study, more than 500,000 reviews from the Amazon Fine Food Reviews dataset were used to train a deep learning-based system. First, sentiment analysis experiments were carried out on English test data with three models. Second, the same data were translated into seven languages (Korean, Japanese, Chinese, Vietnamese, French, German, and English) and similar experiments were performed. Although the average accuracy over the seven languages (91.59%) was 2.77% lower than that for English (94.35%), the results suggest that the approach can be used in practical applications.
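The review-classification step can be sketched minimally. The paper trains a deep model on Amazon Fine Food Reviews; as a simplifying assumption, a tiny Naive Bayes bag-of-words classifier stands in for it here, and the example reviews are invented.

```python
# Minimal sentiment-classification sketch. A Naive Bayes bag-of-words model
# substitutes for the paper's deep network; the reviews are made up.
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        best_label, best_score = None, float("-inf")
        n_docs = sum(self.label_counts.values())
        for label in self.label_counts:
            # log prior + log likelihood with add-one smoothing
            score = math.log(self.label_counts[label] / n_docs)
            total = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

reviews = ["great taste would buy again", "awful stale and bland",
           "really great product", "bland and awful packaging"]
labels = ["pos", "neg", "pos", "neg"]
model = NaiveBayes().fit(reviews, labels)
print(model.predict("great product"))
```

For the multilingual setting of the paper, the same pipeline would simply be re-run on the translated review texts.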

Metamorphic Malware Detection using Subgraph Matching (행위 그래프 기반의 변종 악성코드 탐지)

  • Kwon, Jong-Hoon; Lee, Je-Hyun; Jeong, Hyun-Cheol; Lee, Hee-Jo
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.2 / pp.37-47 / 2011
  • In recent years, malicious codes, known as malware, have shown a significant increase due to code obfuscation used to evade detection mechanisms. When code obfuscation is applied, malware can change its instruction sequence and even its signature. Such variants, which have the same functionality but a different appearance, can evade signature-based AV products, so AV vendors pay a large cost to analyze and classify malware in order to generate new signatures. In this paper, we propose a novel approach for detecting metamorphic malware. The proposed mechanism first converts a malware's API call sequence into a call graph through dynamic analysis. The call graph is then converted into a semantic signature using 128 abstract nodes. Finally, we extract all subgraphs and analyze how similar the behaviors of two malware samples are through subgraph similarity. To validate the proposed mechanism, we use 273 real-world malware samples, including obfuscated ones, and analyze 10,100 comparison results. In the evaluation, all metamorphic malware samples are classified correctly, and similar module behaviors among different malware samples are also discovered.
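The graph-comparison idea can be sketched in miniature. The paper abstracts API call graphs into semantic nodes and compares subgraphs; as a simplifying assumption, each behavior graph below is reduced to its set of directed edges and compared with Jaccard similarity, and the node labels are hypothetical.

```python
# Sketch of behavior-graph comparison: reduce each call graph to its directed
# edge set and measure overlap with Jaccard similarity. This is a simplified
# stand-in for the paper's subgraph matching; node labels are hypothetical.

def edges(graph):
    """Flatten an adjacency dict into a set of directed (src, dst) edges."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}

def jaccard_similarity(g1, g2):
    e1, e2 = edges(g1), edges(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)

# Two variants sharing most behaviors (abstracted API-call nodes)
malware_a = {"CreateFile": ["WriteFile"], "WriteFile": ["CloseHandle"],
             "RegOpenKey": ["RegSetValue"]}
malware_b = {"CreateFile": ["WriteFile"], "WriteFile": ["CloseHandle"],
             "Connect": ["Send"]}

print(jaccard_similarity(malware_a, malware_b))
```

A pair whose similarity exceeds a chosen threshold would be flagged as likely variants of the same family.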

Spatial Replicability Assessment of Land Cover Classification Using Unmanned Aerial Vehicle and Artificial Intelligence in Urban Area (무인항공기 및 인공지능을 활용한 도시지역 토지피복 분류 기법의 공간적 재현성 평가)

  • Geon-Ung, PARK; Bong-Geun, SONG; Kyung-Hun, PARK; Hung-Kyu, LEE
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.4 / pp.63-80 / 2022
  • As technologies for analyzing and predicting urban issues by reconstructing real space in virtual space have developed, acquiring precise spatial information for complex cities has become increasingly important. In this study, images of an urban area with a complex landscape were acquired using an unmanned aerial vehicle, and land cover classification was performed with object-based image analysis (OBIA) and semantic segmentation, two image classification techniques suited to high-resolution imagery. In addition, based on imagery collected at the same time, the replicability of each artificial intelligence (AI) model's land cover classification was examined for areas the model had not learned. When trained on the training site, the models achieved land cover classification accuracies of 89.3% for OBIA-RF, 85.0% for OBIA-DNN, and 95.3% for U-Net. When applied to the replicability assessment site, the accuracy of OBIA-RF decreased by 7%, OBIA-DNN by 2.1%, and U-Net by 2.3%. U-Net, which considers both morphological and spectral characteristics, performed well in both classification accuracy and the replicability evaluation. As precise spatial information becomes more important, the results of this study are expected to contribute to urban environment research as a method for generating basic data.
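The replicability check amounts to comparing a model's accuracy on the training site with its accuracy on an unseen site. A minimal sketch, with invented per-pixel labels standing in for the study's land cover maps:

```python
# Sketch of the replicability evaluation: compare pixel accuracy on the
# training site with accuracy on an unseen assessment site. The label arrays
# are invented; in the study they would be classified land cover maps.

def pixel_accuracy(predicted, reference):
    """Fraction of pixels whose predicted class matches the reference map."""
    assert len(predicted) == len(reference)
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Hypothetical per-pixel class labels (e.g., 0=building, 1=road, 2=vegetation)
train_pred  = [0, 1, 2, 2, 0, 1, 1, 2, 0, 2]
train_ref   = [0, 1, 2, 2, 0, 1, 1, 2, 1, 2]
assess_pred = [0, 0, 2, 1, 0, 1, 2, 2, 0, 2]
assess_ref  = [0, 1, 2, 2, 0, 1, 1, 2, 0, 2]

acc_train = pixel_accuracy(train_pred, train_ref)
acc_assess = pixel_accuracy(assess_pred, assess_ref)
print(f"training site: {acc_train:.1%}, assessment site: {acc_assess:.1%}")
print(f"accuracy drop: {acc_train - acc_assess:.1%}")
```

The per-model accuracy drops quoted in the abstract (7%, 2.1%, 2.3%) are differences of exactly this kind.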

A Study on Research Trends in Metaverse Platform Using Big Data Analysis (빅데이터 분석을 활용한 메타버스 플랫폼 연구 동향 분석)

  • Hong, Jin-Wook; Han, Jung-Wan
    • Journal of Digital Convergence / v.20 no.5 / pp.627-635 / 2022
  • As the non-face-to-face situation caused by COVID-19 continues, the underlying technologies of the 4th industrial revolution, such as IoT, AR, VR, and big data, are affecting the metaverse platform overall. Such changes in the external environment, including society and culture, can affect the development of academic fields, so it is very important to systematically organize existing achievements in preparation for change. Data containing 'metaverse platform' as a keyword were collected from the Korea Educational Research Information Service (RISS) and analyzed with text mining, one of the big data analysis techniques. The collected data were subjected to word cloud frequency analysis, keyword connection strength analysis, and semantic network analysis to examine the trends of metaverse platform research. In the word cloud analysis, keywords appeared in the order 'use', 'digital', 'technology', and 'education'. In the keyword connection strength (N-gram) analysis, 'Edu→Tech' showed the highest connection strength, and a total of three word-chain clusters were derived. The detailed research areas were classified into five areas, including 'digital technology'. Considering the results comprehensively, it seems necessary to discover and discuss more active research topics from the long-term perspective of developing the metaverse platform.
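The keyword connection strength (N-gram) step can be sketched as counting adjacent keyword pairs across documents. The keyword lists below are invented stand-ins for the study's extracted terms.

```python
# Sketch of keyword connection strength via bigram counting. The keyword
# lists are hypothetical; the study extracts them from RISS records.
from collections import Counter

docs = [
    ["metaverse", "education", "technology", "use"],
    ["education", "technology", "digital", "use"],
    ["metaverse", "digital", "technology", "use"],
]

def bigram_strengths(documents):
    """Count each adjacent keyword pair (bigram) across all documents."""
    counts = Counter()
    for keywords in documents:
        counts.update(zip(keywords, keywords[1:]))
    return counts

strengths = bigram_strengths(docs)
for (a, b), n in strengths.most_common(3):
    print(f"{a} -> {b}: {n}")
```

The pair with the highest count plays the role of the study's strongest link (there, 'Edu→Tech').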

An Exploratory Study of Generative AI Service Quality using LDA Topic Modeling and Comparison with Existing Dimensions (LDA토픽 모델링을 활용한 생성형 AI 챗봇의 탐색적 연구 : 기존 AI 챗봇 서비스 품질 요인과의 비교)

  • YaeEun Ahn; Jungsuk Oh
    • Journal of Service Research and Studies / v.13 no.4 / pp.191-205 / 2023
  • Artificial intelligence (AI), especially in the domain of text-generative services, has seen a significant surge, with forecasts indicating that the AI-as-a-Service (AIaaS) market will reach a valuation of $55.0 billion by 2028. This research set out to explore the quality dimensions characterizing synthetic text media software, focusing on four key players in the industry: ChatGPT, Writesonic, Jasper, and Anyword. Drawing on a dataset of over 4,000 reviews sourced from a software evaluation platform, the study employed Latent Dirichlet Allocation (LDA) topic modeling with the Gensim library, which organized the data into 11 distinct topics. Subsequent analysis compared these topics against established AI service quality dimensions, specifically AICSQ and AISAQUAL. Notably, the reviews predominantly emphasized dimensions such as availability and efficiency, while others underscored in prior literature, such as anthropomorphism, were absent. This observation is attributed to the nature of the AI services examined, which lean more toward semantic understanding than direct user interaction. The study acknowledges inherent limitations, mainly potential biases stemming from the single review source and the specific reviewer demographic. Possible future research includes gauging the real-world implications of these quality dimensions for user satisfaction and examining more deeply how individual dimensions might affect overall ratings.
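The LDA step can be sketched in pure Python. The study uses Gensim's LDA on 4,000+ real reviews; the collapsed Gibbs sampler and the mini-corpus below are only illustrative assumptions showing how review terms get grouped into topics.

```python
# Toy LDA topic-modeling sketch via collapsed Gibbs sampling, a pure-Python
# stand-in for Gensim's LdaModel. The mini-corpus of "reviews" is invented.
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    v = len({w for d in docs for w in d})          # vocabulary size
    # z[d][i]: topic of word i in doc d; plus the sampler's count tables
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                        # remove current assignment
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # resample proportional to P(topic | doc) * P(word | topic)
                weights = [(doc_topic[d][k] + alpha) *
                           (topic_word[k][w] + beta) / (topic_total[k] + v * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights)[0]
                z[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word

reviews = [["fast", "accurate", "output"], ["fast", "reliable", "output"],
           ["pricing", "expensive", "plan"], ["pricing", "plan", "cheap"]]
topics = lda_gibbs(reviews, n_topics=2)
for k, words in enumerate(topics):
    top = sorted(words, key=words.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

In the study, each of the 11 derived topics would then be matched by hand against the AICSQ/AISAQUAL quality dimensions.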

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook; Kim, Youngtae; Ra, Dongyul; Lim, Soojong; Kim, Hyunki
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.131-150 / 2015
  • The omission of noun phrases for obligatory cases is a common phenomenon in Korean and Japanese sentences that is not observed in English. In encyclopedia texts, when an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is omitted especially easily. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent applications such as information retrieval and question answering systems, but the omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem is closely related to zero anaphora resolution, one of the important problems in natural language processing. A noun phrase in the text that can be used for restoration is called an antecedent; an antecedent must be co-referential with the zero anaphor. While in zero anaphora resolution the candidate antecedents are only noun phrases in the same text, in our problem the title is also a candidate. In our system, the first stage detects the zero anaphor. In the second stage, antecedent search is carried out over the candidates. If antecedent search fails, an attempt is made, in the third stage, to use the title as the antecedent. The main characteristic of our system is the use of a structural SVM for finding the antecedent. The noun phrases that appear in the text before the position of the zero anaphor comprise the search space. The main technique used in previous research is to perform binary classification on all the noun phrases in the search space and select as the antecedent the noun phrase classified with the highest confidence.
In this paper, however, we view antecedent search as the problem of assigning antecedent-indicator labels to a sequence of noun phrases; in other words, sequence labeling is employed for antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we use a structural SVM that receives a sequence of noun phrases as input and returns a sequence of labels as output. An output label takes one of two values, indicating whether or not the corresponding noun phrase is the antecedent. The structural SVM we used is based on the modified Pegasos algorithm, which exploits a subgradient descent methodology for optimization problems. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus providing gold-standard answers such as zero anaphors and their possible antecedents. Training examples prepared from the annotated corpus were used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and omitted subject or object cases are identified; the performance of our system therefore depends on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor, based on binary classification using a regular SVM. The experiments showed that our system achieves F1 = 68.58%, which means that a state-of-the-art system can be developed with our technique. Future work that enables the system to utilize semantic information is expected to lead to a significant performance improvement.
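The Pegasos-style optimization underlying the classifier can be sketched for the plain binary-SVM case: stochastic subgradient descent on the regularized hinge loss. The 2-D toy data are invented; the actual system applies a modified Pegasos within a structural SVM over noun-phrase feature sequences.

```python
# Sketch of Pegasos training for a linear binary SVM: at each step, pick a
# random example, shrink the weights, and take a hinge-loss subgradient step
# with decreasing step size. Toy data; the real features describe noun phrases.
import random

def pegasos_train(samples, labels, lam=0.1, n_iters=5000, seed=0):
    """samples: feature vectors; labels: +1/-1. Returns the weight vector."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)                      # decreasing step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1 - eta * lam) * wj for wj in w]     # regularization shrink
        if margin < 1:                             # hinge-loss subgradient
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

# Linearly separable toy data
xs = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
ys = [1, 1, -1, -1]
w = pegasos_train(xs, ys)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in xs]
print(preds)
```

The structural variant generalizes this update from single labels to whole label sequences over the candidate noun phrases.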

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, the increasing demand for big data analysis has driven the vigorous development of related technologies and tools, while advances in IT and the rising penetration rate of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, attempts to acquire insights through data analysis continue to increase, and big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each party requesting the analysis. However, growing interest in big data analysis has stimulated programming education and the development of many analysis tools; the entry barriers to big data analysis are gradually lowering, the technology is spreading, and analysis is increasingly expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data, and especially text data, is continually increasing. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are utilized in many fields. Text mining is a concept that embraces various theories and techniques for text analysis; among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large set of documents, identifies the documents corresponding to each issue, and provides the identified documents as a cluster. It is evaluated as a very useful technique in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document collection, so the entire collection must be analyzed at once to identify the topic of each document. This causes long analysis times when topic modeling is applied to many documents, and it creates a scalability problem: processing time increases exponentially with the number of analysis objects. The problem is particularly noticeable when documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method enables topic modeling over a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, since documents can be analyzed in each location without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire collection is unclear: local topics can be identified in each unit, but global topics cannot. Second, a method for measuring the accuracy of the proposed methodology must be established; that is, assuming the global topics are the ideal answer, the deviation of the local topics from the global topics needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other topic modeling research. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirm that the proposed methodology can provide results similar to topic modeling over the entire collection, and we also propose a reasonable method for comparing the results of the two approaches.
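The local-to-global mapping idea can be sketched as follows: represent each topic by its term-weight distribution and match every local topic to the global topic whose distribution is most similar, here via cosine similarity. The topic vectors below are invented for illustration.

```python
# Sketch of mapping local topics onto global topics by cosine similarity of
# their term-weight distributions. The topic vectors are hypothetical.
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_topics(local_topics, global_topics):
    """For each local topic, return the name of the most similar global topic."""
    return {name: max(global_topics, key=lambda g: cosine(dist, global_topics[g]))
            for name, dist in local_topics.items()}

global_topics = {"economy": {"market": 0.5, "stock": 0.3, "bank": 0.2},
                 "sports":  {"game": 0.6, "team": 0.3, "score": 0.1}}
local_topics = {"local_0": {"stock": 0.4, "market": 0.4, "trade": 0.2},
                "local_1": {"team": 0.5, "game": 0.4, "coach": 0.1}}
print(map_topics(local_topics, global_topics))
```

Checking whether a document falls under a local topic and its mapped global topic simultaneously then gives the accuracy measure the paper proposes.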