Search | Korea Science

Research on Mining Technology for Explainable Decision Making (설명가능한 의사결정을 위한 마이닝 기술)

Kyungyong Chung
- Journal of the Institute of Convergence Signal Processing / v.24 no.4 / pp.186-191 / 2023
Data processing techniques, including the handling of missing and outlier data and prediction and recommendation models, play a critical role in decision-making. This requires a clear explanation of the validity, reliability, and accuracy of all processes and results. In addition, data problems need to be solved through explainable models using decision trees, inference, and similar methods, and models need to be made lightweight by considering various types of learning. The multi-layer mining classification method that applies the sixth principle discovers multidimensional relationships between variables and attributes that occur frequently in transactions after data preprocessing. The paper explains how to discover significant relationships by mining transactions and how to model the data through regression analysis. It develops scalable models and logistic regression models and proposes mining techniques that generate class labels through data cleansing, relevance analysis, data transformation, and data augmentation to support explainable decisions.
https://doi.org/10.23087/jkicsp.2023.24.4.002

An Efficient Bitmap Indexing Method for Multimedia Data Reflecting the Characteristics of MPEG-7 Visual Descriptors (MPEG-7 시각 정보 기술자의 특성을 반영한 효율적인 멀티미디어 데이타 비트맵 인덱싱 방법)

Jeong Jinguk;Nang Jongho
- Journal of KIISE:Computer Systems and Theory / v.32 no.1 / pp.9-20 / 2005
Recently, MPEG-7, a multimedia content description standard, has been widely used in content-based image/video retrieval systems. However, since the descriptors standardized in MPEG-7 are usually multidimensional and suffer from the so-called 'curse of dimensionality', previously proposed indexing methods (for example, multidimensional indexing methods, dimensionality reduction methods, and filtering methods) could not be used to effectively index a multimedia database represented in MPEG-7. This paper proposes an efficient multimedia data indexing mechanism reflecting the characteristics of MPEG-7 visual descriptors. In the proposed indexing mechanism, the descriptor is transformed into a histogram of some attributes. By representing the value of each bin as a binary number, the histogram itself, which is the visual descriptor of an object in the multimedia database, can be represented as a bit string. The bit strings of all objects in the multimedia database are collected to form an index file, the bitmap index. By XORing them with the descriptor of the query object, candidate solutions for the similarity search can be computed easily, and they are then checked again against the query object to compute the similarity precisely with an exact metric such as the L1-norm. These indexing and searching mechanisms are efficient because the filtering process is performed by simple bit operations and reduces the search space dramatically. In experiments with more than 100,000 real images, the proposed indexing and searching mechanisms were about 15 times faster than sequential search with more than 90% accuracy.
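The filter-then-refine idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how bin values are turned into bits, so the 3-bit thermometer code, the `LEVELS` constant, the `max_bit_diff` threshold, and the sample histograms are all assumptions made for illustration.

```python
from typing import List, Sequence

LEVELS = 3  # assumption: each bin is quantized to 0..3 and stored as 3 thermometer bits


def encode(hist: Sequence[float], max_value: float = 1.0) -> int:
    """Pack a histogram into one integer that serves as its bit string."""
    bits = 0
    for value in hist:
        level = min(LEVELS, int(value / max_value * (LEVELS + 1)))
        bits = (bits << LEVELS) | ((1 << level) - 1)  # e.g. level 2 -> 0b011
    return bits


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact L1 distance used in the refinement step."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search(query, database, index, max_bit_diff, top_k=1) -> List[int]:
    """Filter with XOR + bit count, then rank the survivors by exact L1 distance."""
    q_bits = encode(query)
    candidates = [i for i, bits in enumerate(index)
                  if bin(bits ^ q_bits).count("1") <= max_bit_diff]
    candidates.sort(key=lambda i: l1(query, database[i]))
    return candidates[:top_k]


# Hypothetical 4-bin descriptors, for illustration only.
database = [
    [0.1, 0.9, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.85, 0.05, 0.0],
]
index = [encode(h) for h in database]          # the "bitmap index"
query = [0.1, 0.85, 0.05, 0.0]
best = search(query, database, index, max_bit_diff=2)
```

With a thermometer code, the number of set bits in the XOR equals the L1 distance between the quantized bins, which is why a cheap bit count can discard most objects before the exact metric is evaluated.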

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
- The KIPS Transactions:PartA / v.16A no.6 / pp.453-462 / 2009
Studies on software plagiarism detection, prevention, and judgement have become widespread due to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes by using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and we transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
https://doi.org/10.3745/KIPSTA.2009.16A.6.453
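To make the role of the Gumbel distribution concrete, the following Python sketch fits a Gumbel model to a list of similarity scores and reports how unlikely a high score is under that model. The abstract does not describe how pdist is computed or how the PDG is converted into the GDG, so this is only an illustration: the method-of-moments estimator is a standard textbook technique swapped in here, and the `scores` values are made up.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimate of the Gumbel location (mu) and scale (beta)."""
    beta = stdev(scores) * math.sqrt(6) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def tail_probability(x, mu, beta):
    """P(X > x) for X ~ Gumbel(mu, beta); small values flag unusually high scores."""
    z = (x - mu) / beta
    return -math.expm1(-math.exp(-z))  # 1 - exp(-exp(-z)), computed stably


# Hypothetical pairwise similarity scores, for illustration only.
scores = [10, 12, 11, 13, 12, 11, 10, 12, 14, 11]
mu, beta = fit_gumbel(scores)
suspicious = tail_probability(40, mu, beta) < 0.001
```

A pair whose score has a very small tail probability is far out in the right tail of the fitted distribution, which is one plausible way to separate suspicious pairs from ordinary similarity.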

Submarket Identification in Property Markets: Focusing on a Hedonic Price Model Improvement (부동산 하부시장 구획: 헤도닉 모형의 개선을 중심으로)

Lee, Chang Ro;Eum, Young Seob;Park, Key Ho
- Journal of the Korean Geographical Society / v.49 no.3 / pp.405-422 / 2014
Two important issues in hedonic modeling are specifying an accurate model and delineating submarkets. While the former has seen much improvement over recent decades, the latter has received relatively little attention. However, the accuracy of estimates from a hedonic model will necessarily be reduced when the analysis does not adequately address market segmentation, which can capture the spatial scale of the price formation process in real estate. Placing emphasis on improving the performance of the hedonic model, this paper segments the real estate markets in Gangnam-gu and Jungrang-gu, which are respectively the most heterogeneous and the most homogeneous of the 25 autonomous districts of Seoul. First, we calculated variable coefficients from a mixed geographically weighted regression model (mixed GWR model) as input for clustering, since the coefficients of a hedonic model can be interpreted as shadow prices of the attributes constituting real estate. After that, we developed a spatially constrained, data-driven methodology that preserves spatial contiguity by utilizing the SKATER algorithm, which is based on a minimum spanning tree. Finally, the performance of this method was verified by applying a multi-level model. We concluded that no submarkets exist in Jungrang-gu and that five submarkets centered on arterial roads would be reasonable in Gangnam-gu. Urban infrastructure such as arterial roads has not been considered an important factor for delineating submarkets until now, but it was found empirically to play a key role in market segmentation.

Location Generalization Method of Moving Object using $R^*$-Tree and Grid ($R^*$-Tree와 Grid를 이용한 이동 객체의 위치 일반화 기법)

Ko, Hyun;Kim, Kwang-Jong;Lee, Yon-Sik
- Journal of the Korea Society of Computer and Information / v.12 no.2 s.46 / pp.231-242 / 2007
The existing pattern mining methods[1,2,3,4,5,6,11,12,13] do not apply location generalization to the location history data of moving objects; they simply extract frequent patterns that have no spatio-temporal constraints from moving patterns in a specific space. Therefore, it is difficult to apply those methods to frequent pattern mining with spatio-temporal constraints, such as finding optimal moving or scheduling paths among specific points. Those methods also require more memory, because they keep a pattern tree in memory to reduce repeated database scans. A more effective pattern mining technique is therefore required to solve these problems. In this paper, we propose a new location generalization method that converts detailed-level data into meaningful spatial information, in order to reduce the processing time of pattern mining over a massive history data set of moving objects and to save space. The proposed method enables efficient spatial moving pattern mining by creating moving sequences: in the preprocessing stage of pattern mining, it generalizes the location attributes of moving objects into 2D spatial areas based on the $R^*$-Tree and the Area Grid Hash Table (AGHT).
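The generalization step can be pictured with a short Python sketch that maps raw coordinates to grid cells and collapses them into a moving sequence. It is a simplification under stated assumptions: a uniform grid stands in for the $R^*$-Tree and AGHT lookup described in the abstract, and `cell_of`, `generalize`, the cell size, and the sample track are hypothetical names and values.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a raw coordinate to the grid cell (2D area) that contains it."""
    return (int(x // cell_size), int(y // cell_size))


def generalize(track: List[Tuple[float, float]], cell_size: float) -> List[Cell]:
    """Turn a detailed location history into a sequence of visited areas.

    Consecutive points in the same cell collapse into one entry, so the
    sequence is shorter than the raw history.
    """
    sequence: List[Cell] = []
    for x, y in track:
        cell = cell_of(x, y, cell_size)
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence


# Hypothetical location history of one moving object.
track = [(1.0, 1.0), (2.5, 3.0), (12.0, 4.0), (13.0, 15.0), (14.0, 16.0)]
moving_sequence = generalize(track, cell_size=10.0)
```

Mining frequent patterns over such area sequences, rather than over raw coordinates, is what reduces the data volume before the pattern mining stage.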

Implementation of an Efficient Microbial Medical Image Retrieval System Applying Knowledge Databases (지식 데이타베이스를 적용한 효율적인 세균 의료영상 검색 시스템의 구현)

Shin Yong Won;Koo Bong Oh
- Journal of the Korea Society of Computer and Information / v.10 no.1 s.33 / pp.93-100 / 2005
This study designs and implements an efficient microbial medical image retrieval system based on knowledge and image content, which can support more accurate decisions on colonies as well as efficient education for new technicians. To this end, we first address overall inference that sets up a flexible search path using a rule base, in order to reduce the time required for microbial identification by searching for the fastest path through the microbial identification phases based on heuristic knowledge. Next, we propose a color feature extraction method that extracts color feature vectors of the visual content of a microbial image, in particular a bacteria image, using the HSV color model. In addition, for better retrieval performance on large microbial databases, we present an integrated indexing technique that combines a B+-tree for indexing simple attributes, an inverted file structure for the list of medical text keywords, and a scan-based filtering method for high-dimensional color feature vectors. Finally, the implemented system shows that complex microbial images can be managed and retrieved effectively using knowledge and the visual content itself. We expect the proposed system to rapidly decrease the learning time of novice technicians by organizing knowledge of clinical fields well.

A Study on the 'fragmentation' trend of modern film montage (현대영화 몽타주의 '파편화(fragmentation)' 경향 연구)

LEE, Jiyoung
- Trans- / v.3 / pp.29-53 / 2017
In his book The Aesthetics of Montage, the film scholar Vincent Amiel divides montage into three types: montage narratif, montage discursif, and montage de correspondances. These three categories encompass the aesthetic classes to which most movies belong. Early films pursued the essential and basic functions of editing, which over time tended to be modified in the direction of serving the director's goals. In this way, "Expressive Montage" is one of the most important concepts of montage, not as a 'methodology' that combines narrative but as a 'purpose'. In the montage stage, expressive montage work is done through three decision steps: 'selection', which selects only the necessary parts of the rush film; 'combining', which arranges the selected footage in a certain order; and 'connection', which determines how scenes are connected considering the duration of each shot. Connection is the final stage of montage. There are exceptions, of course. When fiction films with classical narratives use close-ups, or when they use models or objects of neutered animals, the film tends toward "montage de correspondances" rather than "montage narratif" or "montage discursif". This study analyzes, as the 'fragmentation tendency of modern films', the montage tendency of works with 'uncertain connection' created through the 'collage' of close-ups and montage de correspondances. The fragmentation of montage in contemporary film breaks the continuous and structural nature of the film and confuses the narrative structure that is visible on its surface. The tendency toward fragmented montage, which started from the close-up, seems to offer an answer regarding the extensibility of the modern image.

3D Building Modeling Using Aerial LiDAR Data (항공 LiDAR 데이터를 이용한 3차원 건물모델링)

Cho, Hong-Beom;Cho, Woo-Sug;Park, Jun-Ku;Song, Nak-Hyun
- Korean Journal of Remote Sensing / v.24 no.2 / pp.141-152 / 2008
3D building modeling is one of the crucial components in constructing 3D geospatial information. Existing methods for 3D building modeling depend mainly on manual photogrammetric processes, which take a great amount of time and effort. In recent years, many studies on 3D building modeling using aerial LiDAR data have been actively performed to overcome the limitations of existing 3D building modeling methods. Most existing studies on 3D building modeling with aerial LiDAR data have investigated either techniques based on interpolated grid data or data fusion with digital maps and images. This paper proposes a method of 3D building modeling with LiDAR data only. First, octree-based segmentation is applied recursively to the LiDAR data classified as buildings in 3D space until there are no more LiDAR points to be segmented. Once octree-based segmentation is completed, the segmented patches are merged based on their geometric spatial characteristics. Second, building model components are created from the merged patches. Finally, a 3D building model is generated and composed from the building model components. Experimental results with real LiDAR data showed that the proposed method was capable of modeling various types of 3D buildings.
https://doi.org/10.7780/kjrs.2008.24.2.141

A Study on Updated Object Detection and Extraction of Underground Information (지하정보 변화객체 탐지 및 추출 연구)

Kim, Kwangsoo;Lee, Heyung-Sub;Kim, Juwan
- Journal of Software Assessment and Valuation / v.16 no.2 / pp.99-107 / 2020
An underground integrated map is being built for underground safety management and is updated periodically. The map update proceeds by deleting all previously stored objects and saving the newly entered objects. However, even unchanged objects are repeatedly stored, deleted, and stored again, which delays the update. In this study, in order to shorten the update time of the integrated map, updated and unupdated objects are separated and only the updated objects are reflected in the underground integrated map; a system implementing this technology is described. To detect updated objects, an object comparison method using the center point of each object is used, and a quad tree is used to improve the search speed. Updated objects are classified into additions and deletions using the shape of the object, and changes using its attributes. The proposed system consists of update object detection, extraction, conversion, storage, and history management modules. On the data used in the experiment, the system updates the integrated map about four times faster than the existing method, and it can be applied to both ground and underground facilities.
https://doi.org/10.29056/jsav.2020.12.11
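The detection step described in this abstract can be sketched in Python as a center-point lookup backed by a small quad tree. This is an illustrative sketch, not the system from the paper: the `QuadTree` class, the `detect_updates` function, the dictionary layout of an object, the tolerance, and the assumption that all centers fall inside the square region of the tree are hypothetical, and the center point alone stands in for the shape comparison.

```python
class QuadTree:
    """Minimal point quadtree over a square region centered at (cx, cy)."""

    def __init__(self, cx, cy, half, capacity=4):
        self.cx, self.cy, self.half, self.capacity = cx, cy, half, capacity
        self.items = []        # (point, payload) pairs stored at this node
        self.children = None   # four sub-quadrants once the node overflows

    def insert(self, p, payload):
        if abs(p[0] - self.cx) > self.half or abs(p[1] - self.cy) > self.half:
            return False       # point lies outside this node's square
        if self.children is None:
            if len(self.items) < self.capacity:
                self.items.append((p, payload))
                return True
            h = self.half / 2
            self.children = [QuadTree(self.cx + dx * h, self.cy + dy * h, h, self.capacity)
                             for dx in (-1, 1) for dy in (-1, 1)]
        return any(child.insert(p, payload) for child in self.children)

    def query(self, p, tol):
        """Payloads whose stored point is within tol of p on both axes."""
        if abs(p[0] - self.cx) > self.half + tol or abs(p[1] - self.cy) > self.half + tol:
            return []          # prune quadrants that cannot contain a match
        found = [pl for q, pl in self.items
                 if abs(q[0] - p[0]) <= tol and abs(q[1] - p[1]) <= tol]
        for child in self.children or []:
            found.extend(child.query(p, tol))
        return found


def detect_updates(old, new, half=1000.0, tol=1e-6):
    """Split new objects into added/changed and report old objects that vanished."""
    tree = QuadTree(0.0, 0.0, half)
    for i, obj in enumerate(old):
        tree.insert(obj["center"], i)
    matched, added, changed = set(), [], []
    for obj in new:
        hits = [i for i in tree.query(obj["center"], tol) if i not in matched]
        if not hits:
            added.append(obj)                      # no old object at this center
            continue
        matched.add(hits[0])
        if old[hits[0]]["attrs"] != obj["attrs"]:
            changed.append(obj)                    # same place, different attributes
    deleted = [o for i, o in enumerate(old) if i not in matched]
    return added, deleted, changed


# Hypothetical objects, for illustration only.
old_objects = [{"center": (1.0, 1.0), "attrs": {"d": 100}},
               {"center": (5.0, 5.0), "attrs": {"d": 200}},
               {"center": (-3.0, 2.0), "attrs": {"d": 50}}]
new_objects = [{"center": (1.0, 1.0), "attrs": {"d": 100}},
               {"center": (5.0, 5.0), "attrs": {"d": 250}},
               {"center": (7.0, -7.0), "attrs": {"d": 80}}]
added, deleted, changed = detect_updates(old_objects, new_objects)
```

Only the objects in `added`, `deleted`, and `changed` would need to be written to the map, which is the source of the time saving the abstract reports over delete-and-reload.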

A Hierarchical Cluster Tree Based Fast Searching Algorithm for Raman Spectroscopic Identification (계층 클러스터 트리 기반 라만 스펙트럼 식별 고속 검색 알고리즘)

Kim, Sun-Keum;Ko, Dae-Young;Park, Jun-Kyu;Park, Aa-Ron;Baek, Sung-June
- Journal of the Korea Academia-Industrial cooperation Society / v.20 no.3 / pp.562-569 / 2019
Raman spectroscopy has been receiving increased attention as a standoff explosive detection technique. In addition, there is a growing need for a fast search method that can identify the Raman spectrum of a measured chemical substance by comparing it with known Raman spectra in a large database. By far the simplest and most widely used method is to calculate and compare the Euclidean distance between the given spectrum and the spectra in the database. However, this is a non-trivial problem because of the inherently high dimensionality of the data. One of the most serious problems is the high computational complexity of searching for the closest spectra. To overcome this problem, we presented the MPS Sort with Sorted Variance+PDS method as a fast algorithm for searching for the closest spectra in our previous paper. That algorithm uses two significant features of a vector, its mean and variance, to reject many unlikely spectra and save a great deal of computation time. In this paper, we present two new methods for fast search for the closest spectra. The PCA+PDS algorithm reduces the amount of computation by reducing the dimension of the data through PCA transformation while giving the same result as the distance calculation using the whole data. The Hierarchical Cluster Tree algorithm builds a binary hierarchical tree from the PCA-transformed spectra. It then starts searching from the clusters closest to the input spectrum and skips the distance calculation for many spectra that cannot be candidates, which saves a great deal of computation time. In the experiments, PCA+PDS showed about a 60.06% performance improvement over MPS Sort with Sorted Variance+PDS, and the Hierarchical Cluster Tree showed about a 17.74% performance improvement over PCA+PDS. The results confirm the effectiveness of the proposed algorithms.
https://doi.org/10.5762/KAIS.2019.20.3.562
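The PDS part of PCA+PDS, commonly read as partial distance search, can be shown in a few lines of Python. This sketch covers only the early-termination idea and assumes the spectra have already been transformed (for example by PCA, which is not implemented here); the function name `nearest_pds` and the sample vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def nearest_pds(query: Sequence[float], spectra: List[Sequence[float]]) -> Tuple[int, float]:
    """Nearest neighbor by squared Euclidean distance with partial distance search.

    The running sum for a candidate is abandoned as soon as it reaches the best
    distance found so far, so most candidates are rejected after a few terms.
    """
    best_index, best_dist = -1, float("inf")
    for i, spectrum in enumerate(spectra):
        dist = 0.0
        for q, s in zip(query, spectrum):
            dist += (q - s) ** 2
            if dist >= best_dist:
                break                      # cannot beat the current best; stop early
        else:
            best_index, best_dist = i, dist  # loop finished without a break
    return best_index, best_dist


# Hypothetical 3-component spectra, for illustration only.
library = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
index, squared_distance = nearest_pds([0.9, 1.2, 1.0], library)
```

The result is identical to a full distance scan because a candidate is dropped only when its partial sum already rules it out. Ordering components so that large differences come first, as PCA tends to do, makes the early exit trigger sooner.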

