• Title/Summary/Keyword: 대용량 분류 (large-capacity classification)

Search Results: 243

Improved Hot Data Verification Considering the Continuity and Frequency of Data Update Requests (데이터 갱신요청의 연속성과 빈도를 고려한 개선된 핫 데이터 검증기법)

  • Lee, Seungwoo
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.5
    • /
    • pp.33-39
    • /
    • 2022
  • A storage device used in mobile computing should offer low power consumption, light weight, and durability, and should be able to store and manage the large volumes of data generated by users effectively. NAND flash memory is the storage device most commonly used in mobile computing. Because of its structural characteristics, NAND flash memory cannot be overwritten in place when a data update is requested; this can be addressed by accurately separating frequently updated data from rarely updated data and storing and managing each in separate blocks. Classifying data update requests in this way is called hot data identification, and it has been studied extensively. For more accurate hot data verification, this paper continuously records the occurrence of data update requests using a counting filter and additionally verifies hot data by considering how frequently the requested updates occur within a specific time window.
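
  For intuition, here is a minimal Python sketch of counting-filter-based hot data identification. The filter size, hash count, hot threshold, and the periodic counter decay that models recency are all illustrative assumptions, not the paper's exact scheme.

```python
# Minimal sketch of counting-filter-based hot data identification
# (illustrative only; all parameters below are assumptions).
import hashlib

FILTER_SIZE = 4096      # number of counter buckets
NUM_HASHES = 2          # independent hash positions per address
HOT_THRESHOLD = 4       # minimum count at every position to call data "hot"
DECAY_PERIOD = 1000     # requests between halving all counters (aging)

counters = [0] * FILTER_SIZE
requests_seen = 0

def _positions(lba: int):
    # Derive NUM_HASHES bucket indices from the logical block address.
    digest = hashlib.md5(str(lba).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_SIZE
            for i in range(NUM_HASHES)]

def record_update(lba: int):
    """Record one update request; periodically decay counters so that
    only recent, frequent updates keep an address hot."""
    global requests_seen
    for pos in _positions(lba):
        counters[pos] += 1
    requests_seen += 1
    if requests_seen % DECAY_PERIOD == 0:
        for i in range(FILTER_SIZE):
            counters[i] >>= 1   # exponential decay captures recency

def is_hot(lba: int) -> bool:
    # An address is hot only if all of its counter positions are high.
    return all(counters[pos] >= HOT_THRESHOLD for pos in _positions(lba))
```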

Development of Automatic Rule Extraction Method in Data Mining : An Approach based on Hierarchical Clustering Algorithm and Rough Set Theory (데이터마이닝의 자동 데이터 규칙 추출 방법론 개발 : 계층적 클러스터링 알고리듬과 러프 셋 이론을 중심으로)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.6
    • /
    • pp.135-142
    • /
    • 2009
  • Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for the analysis of large data sets. The major techniques used in data mining are association rule mining, classification, and clustering. Since these techniques are typically applied individually, a methodology is needed that extracts rules by integrating them. Rule extraction techniques assist humans in analyzing large data sets and in turning the meaningful information contained in them into successful decision making. This paper proposes an autonomous rule extraction method using clustering and rough set theory. Experiments are carried out on data sets from the UCI KDD archive, and the decision rules produced by the proposed method are presented. These rules can be used successfully for decision making.
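
  To illustrate the rough-set side of the approach, here is a toy Python sketch that computes lower and upper approximations of a cluster with respect to indiscernibility classes on condition attributes; the data and the simplified integration with clustering are hypothetical.

```python
# Toy sketch of rough set approximations over clustered records
# (illustrative; the paper's actual method is more involved).
from collections import defaultdict

# Each record: (condition attribute values, cluster label from clustering)
records = [
    (("low", "yes"), 0),
    (("low", "yes"), 0),
    (("low", "no"),  0),
    (("high", "no"), 1),
    (("low", "no"),  1),
]

# Indiscernibility classes: records identical on condition attributes.
ind = defaultdict(set)
for i, (attrs, _) in enumerate(records):
    ind[attrs].add(i)

target = {i for i, (_, c) in enumerate(records) if c == 0}  # cluster 0

# Lower approximation: classes entirely inside the cluster (certain rules).
lower = set().union(*[b for b in ind.values() if b <= target])
# Upper approximation: classes overlapping the cluster (possible rules).
upper = set().union(*[b for b in ind.values() if b & target])

print("lower approximation:", sorted(lower))   # e.g. {0, 1}
print("upper approximation:", sorted(upper))   # e.g. {0, 1, 2, 4}
```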

Performance Comparison and Error Analysis of Korean Bio-medical Named Entity Recognition (한국어 생의학 개체명 인식 성능 비교와 오류 분석)

  • Jae-Hong Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.4
    • /
    • pp.701-708
    • /
    • 2024
  • The advent of transformer architectures in deep learning has been a major breakthrough in natural language processing research. Named entity recognition is a branch of natural language processing and an important research area for tasks such as information retrieval. It also matters in the biomedical field, but the lack of Korean biomedical corpora for training has limited the development of Korean clinical research using AI. In this study, we built a new corpus for Korean biomedical named entity recognition and selected language models pre-trained on large Korean corpora for transfer learning. We compared the recognition performance of the selected language models by F1-score and the recognition rate by tag, and analyzed the errors. In terms of recognition performance, KlueRoBERTa showed relatively good results. The error analysis of the tagging process shows that recognition of Disease entities is excellent, while Body and Treatment are relatively poor. This stems from over-segmentation and under-segmentation that fail to delimit entity names properly based on context; building a more precise morphological analyzer and a richer lexicon will be necessary to compensate for the incorrect tagging.
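
  As a rough illustration of the transfer-learning setup, the following sketch fine-tunes a KLUE RoBERTa checkpoint for token classification with Hugging Face Transformers; the tag set, example sentence, and hyperparameters are assumptions, not the paper's corpus schema.

```python
# Minimal transfer-learning sketch for Korean biomedical NER
# (tag set and training example are hypothetical).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

TAGS = ["O", "B-Disease", "I-Disease", "B-Body", "I-Body",
        "B-Treatment", "I-Treatment"]

tokenizer = AutoTokenizer.from_pretrained("klue/roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "klue/roberta-base", num_labels=len(TAGS))

# One hypothetical training example (word-level BIO labels).
words = ["당뇨병", "환자의", "혈당", "조절"]
word_labels = [1, 0, 3, 0]   # B-Disease, O, B-Body, O

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens; special tokens get -100
# so the loss ignores them.
aligned = [-100 if wid is None else word_labels[wid]
           for wid in enc.word_ids(0)]
labels = torch.tensor([aligned])

# One optimization step of the usual fine-tuning loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**enc, labels=labels).loss
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```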

Structural Segmentation for 3-D Brain Image by Intensity Coherence Enhancement and Classification (명암도 응집성 강화 및 분류를 통한 3차원 뇌 영상 구조적 분할)

  • Kim, Min-Jeong;Lee, Joung-Min;Kim, Myoung-Hee
    • The KIPS Transactions:PartA
    • /
    • v.13A no.5 s.102
    • /
    • pp.465-472
    • /
    • 2006
  • Recently, many image segmentation methods have been suggested for extracting human organs or disease-affected areas from huge medical image datasets. However, images of regions such as the brain, which contain multiple structures with ambiguous structural borders, impose limits on structural segmentation. To address this problem, clustering techniques that classify voxels into a finite number of clusters are often employed. Clustering, however, has a drawback: susceptibility to noise, caused by its voxel-by-voxel operation. Applying an image enhancement method to minimize the influence of noise and to sharpen image borders therefore allows more robust structural segmentation. This research proposes an efficient structural segmentation method based on filtering and clustering to extract fine structures such as white matter, gray matter, and cerebrospinal fluid from brain MR images. First, coherence-enhancing diffusion filtering is applied to sharpen borders between structures and to reduce noise within them. Fuzzy c-means clustering is then applied to the enhanced images, conducting structural segmentation by assigning the corresponding cluster index to the structure containing each voxel. Compared with existing approaches that combine clustering with Gaussian or general anisotropic diffusion filtering, the suggested method showed improved accuracy, measured by agreement with manual segmentation results. Moreover, by providing fine segmentation of border areas with reproducible results and minimal manual work, it offers efficient diagnostic support for morphological abnormalities in the brain.
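
  The following compact NumPy sketch shows the fuzzy c-means step on voxel intensities; the coherence-enhancing diffusion filtering that precedes it in the paper is omitted, and the intensity values are synthetic.

```python
# Compact fuzzy c-means sketch for voxel intensity clustering
# (illustrative; filtering step and data are assumptions).
import numpy as np

def fuzzy_cmeans(x, n_clusters=3, m=2.0, n_iter=100, eps=1e-5):
    """x: 1-D array of (filtered) voxel intensities."""
    rng = np.random.default_rng(0)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0)                      # memberships: columns sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)   # membership-weighted centers
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u_new = dist ** (-2 / (m - 1))      # standard FCM membership update
        u_new /= u_new.sum(axis=0)
        if np.abs(u_new - u).max() < eps:
            u = u_new
            break
        u = u_new
    return centers, u

# Toy intensities standing in for CSF / gray matter / white matter.
voxels = np.concatenate([np.random.normal(mu, 5, 200) for mu in (30, 90, 150)])
centers, u = fuzzy_cmeans(voxels)
labels = u.argmax(axis=0)   # assign each voxel its strongest cluster
print("cluster centers:", np.sort(centers))
```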

A Study on Element Features and Research Frames of Game Trailers (게임 트레일러의 유형 및 산업적 연구 프레임에 관한 고찰)

  • Kwon, Jae-Woong
    • Cartoon and Animation Studies
    • /
    • s.41
    • /
    • pp.187-222
    • /
    • 2015
  • The quantitative growth and qualitative development of the game industry have led to fierce competition, driving game companies to search for better ways of promoting their games. The game trailer is one of the most important means of publicizing diverse games, since it shows visual images directly. Three reasons explain why the game trailer is in the spotlight these days: the rapid growth of Internet speeds capable of handling large files, the remarkable improvement of visual image quality to the level of digital movies, and the advent of video websites such as YouTube that host huge numbers of videos regardless of type and size. However, research on game trailers remains scarce because their use as a marketing tool is still at an early stage. This research therefore focuses on characterizing game trailers in a way that is usable for practical market analysis. First, it shows that game trailers can be divided by display category, style, and content type. Second, it identifies the component parts of game trailers, divided into content factors such as characters, backgrounds, and events, and promotional factors such as the title, production company name, and distribution company name. Third, it explores research frames needed to analyze marketing strategies, the effects of game trailers, production pipelines, and so on. These categorizations should prove useful for producing game trailers efficiently and utilizing them effectively.

A Study on Digital Agriculture Data Curation Service Plan for Digital Agriculture

  • Lee, Hyunjo;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.2
    • /
    • pp.171-177
    • /
    • 2022
  • In this paper, we propose a service method that can provide insight into multi-source agricultural data, cluster environmental factors to support data analysis over time, and curate crop environmental factors. The proposed curation service consists of four steps: collection, preprocessing, storage, and analysis. First, in the collection step, the service system collects and organizes multi-source agricultural data using an OpenAPI-based web crawler. Second, in the preprocessing step, the system smooths the data to reduce measurement errors, adopting a smoothing method per facility type in consideration of the error rates associated with facility characteristics such as greenhouses and open fields. Third, in the storage step, an agricultural data integration schema and a Hadoop HDFS-based storage structure are proposed for large-scale agricultural data. Finally, in the analysis step, the service system performs DTW-based time series classification in consideration of the characteristics of agricultural digital data; by reflecting the characteristics of time series data without loss, the DTW-based classification improves the accuracy of prediction results. As future work, we plan to implement the proposed service method and apply it to smart farm greenhouses for testing and verification.
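
  As a sketch of the analysis step, the following Python code implements the classic DTW distance and a 1-nearest-neighbor classifier over time series; the greenhouse-temperature profiles are hypothetical.

```python
# DTW-based time-series classification sketch (1-nearest neighbor);
# data below is synthetic, not from the proposed service.
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(query, references):
    """references: list of (label, series); returns nearest label by DTW."""
    return min(references, key=lambda r: dtw_distance(query, r[1]))[0]

# Hypothetical daily greenhouse-temperature profiles (24 hourly readings).
normal = np.sin(np.linspace(0, 2 * np.pi, 24)) * 5 + 22
faulty = np.full(24, 22.0)                      # flat profile: stuck sensor
query = normal + np.random.normal(0, 0.3, 24)   # noisy but normal-shaped

print(classify(query, [("normal", normal), ("stuck-sensor", faulty)]))
```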

BIM Mesh Optimization Algorithm Using K-Nearest Neighbors for Augmented Reality Visualization (증강현실 시각화를 위해 K-최근접 이웃을 사용한 BIM 메쉬 경량화 알고리즘)

  • Pa, Pa Win Aung;Lee, Donghwan;Park, Jooyoung;Cho, Mingeon;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.42 no.2
    • /
    • pp.249-256
    • /
    • 2022
  • Various studies have shown that real-time visualization technology combining BIM (Building Information Modeling) and AR (Augmented Reality) helps improve decision-making and processing efficiency in construction management. However, when large-capacity BIM data is projected into AR, limitations arise, such as data transmission and connection problems and image cut-off issues. To improve visualization efficiency, a mesh optimization algorithm based on the k-nearest neighbors (KNN) classification framework is proposed for reconstructing BIM data, replacing existing mesh optimization methods that are complicated and cannot adequately handle meshes with the numerous boundaries of 3D models. In the proposed algorithm, the target BIM model is optimized with Unity C# code based on triangle centroids and classified using KNN. As a result, the algorithm can report the number of mesh vertices and triangles before and after optimization for the entire model and for each structure. It reduces the mesh vertices of the original model by approximately 56 % and the triangles by about 42 %. Moreover, the optimized model shows no visual differences from the original in model elements and information, meaning that high-performance visualization can be expected when using AR devices.
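
  The following Python sketch illustrates the triangle-centroid and KNN classification idea; the paper's implementation is in Unity C#, and the geometry, labels, and class count here are hypothetical.

```python
# Sketch of KNN classification on triangle centroids
# (illustrative stand-in for the paper's Unity C# implementation).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Vertices (V, 3) and triangles as vertex-index triples (T, 3).
vertices = np.random.rand(100, 3)
triangles = np.random.randint(0, 100, size=(300, 3))

# Triangle centroid = mean of its three vertex positions.
centroids = vertices[triangles].mean(axis=1)

# Assume a subset of triangles has known structure labels (e.g. from
# BIM element metadata); KNN propagates labels to the rest so that
# simplification can then be tuned per structure.
known_idx = np.arange(50)
known_labels = np.random.randint(0, 4, size=50)   # 4 structure classes

knn = KNeighborsClassifier(n_neighbors=5).fit(centroids[known_idx], known_labels)
all_labels = knn.predict(centroids)
print("triangles per class:", np.bincount(all_labels))
```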

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created while operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system needs to be established to gather, store, categorize, and analyze the log data generated while processing clients' business. However, realizing flexible storage expansion for a massive amount of unstructured log data, together with the many functions needed to categorize and analyze it, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the analysis tools and management systems of existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that keep the system operating after recovery from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data, and their strict schemas make it difficult to expand nodes when rapidly increasing data must be distributed across them. NoSQL does not provide the complex computations that relational databases offer but can easily expand through node dispersion when the amount of data grows rapidly; it is a non-relational database model with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, MongoDB, a representative document-oriented database with a schema-free structure, is used in the proposed system. MongoDB is adopted because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when the amount of data increases rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in parallel-distributed fashion by the Hadoop-based analysis module. A comparative evaluation of log insertion and query performance against a system that uses only MySQL demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a MongoDB log insertion performance evaluation over various chunk sizes.
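
  As a minimal sketch of the log collector's routing idea, the following Python code classifies incoming log records and inserts the non-real-time ones into MongoDB via pymongo; the collection and field names are hypothetical, and the MySQL and Hadoop paths are omitted.

```python
# Minimal routing sketch for the log collector idea
# (collection/field names are hypothetical; requires a running mongod).
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["bank_logs"]

def route_log(record: dict):
    """Real-time logs would go to the MySQL module for immediate graphs;
    this sketch only shows the MongoDB path for aggregated logs."""
    record.setdefault("received_at", datetime.now(timezone.utc))
    if record.get("realtime"):
        pass  # hand off to the MySQL module (omitted here)
    else:
        # Schema-free insert: each log type may carry different fields.
        db[record.get("type", "unknown")].insert_one(record)

route_log({"type": "transaction", "branch": "A01", "amount": 125000})
route_log({"type": "login", "user": "u123", "ip": "10.0.0.7"})
```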

Studies on the Food Consumption Pattern of College Girls in Pusan Area (여대생의 식물섭취(소비) 패턴에 관한 연구)

  • Kim, Sang-Ae
    • Journal of the Korean Society of Food Culture
    • /
    • v.6 no.4
    • /
    • pp.393-401
    • /
    • 1991
  • In order to compare the food consumption patterns of college girls on weekdays and Sundays, the food intake of college girls in a district of Pusan city was analyzed through factor analysis. The principal results are as follows. 1. The amount of food intake on Sunday was generally larger than the weekday average, except for soybean & products, meats, and beverages. 2. As for correlations among the intakes of the food groups: on weekdays, positive correlations were noted among rice, potatoes, soybean & products, seaweeds, fish & shells, and fat & oil, and confectionery & sugar also showed a positive correlation to bread; on Sunday, vegetables, seaweeds, and meats showed a positive correlation to rice, and other cereals and eggs showed a positive correlation to bread & noodles. Rice and bread & noodles were negatively correlated on both weekdays and Sunday. 3. In the factor analysis of weekday food intake through the correlation matrix, soybean & products, fat & oil, seaweeds, rice, and fish & shells showed comparatively large loadings on the first factor, while meats, kimchi, vegetables, soybean & products, fruits, and fish & shells loaded heavily on the second. Here the first factor reflected the Korean dietary pattern and the second a subsidiary food (nutrient-controlling) factor. For Sunday intake, rice, meats, seaweeds, soybean & products, and vegetables loaded heavily on the first factor, and fish & shells, vegetables, fat & oil, fruits, and rice on the second; accordingly, both factors were considered to reflect the Korean dietary pattern. 4. Nutrient intake on both weekdays and Sunday substantially met the RDA, except for energy and iron.
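
  As an illustration of the method, the following sketch runs a two-factor analysis over a synthetic intake matrix with scikit-learn; the food groups and numbers are made up, so the loadings will not match the study's.

```python
# Illustrative factor-analysis sketch over food-group intake data
# (synthetic numbers; the study's actual intake matrix is unavailable).
import numpy as np
from sklearn.decomposition import FactorAnalysis

food_groups = ["rice", "soybean", "seaweed", "fish", "meat", "vegetable"]
rng = np.random.default_rng(1)
# Rows: subjects; columns: standardized intake per food group.
intake = rng.normal(size=(120, len(food_groups)))

fa = FactorAnalysis(n_components=2, random_state=0).fit(intake)
# Loadings indicate which food groups move together, as in the study's
# "Korean dietary pattern" vs. "subsidiary food" factors.
for name, load in zip(food_groups, fa.components_.T):
    print(f"{name:10s} factor1={load[0]:+.2f} factor2={load[1]:+.2f}")
```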


A Hybrid Collaborative Filtering Using a Low-dimensional Linear Model (저차원 선형 모델을 이용한 하이브리드 협력적 여과)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.10
    • /
    • pp.777-785
    • /
    • 2009
  • Collaborative filtering is a technique used to predict whether a particular user will like a particular item. User-based and item-based collaborative techniques have been used extensively in many commercial recommender systems. In this paper, a hybrid collaborative filtering method that combines user-based and item-based methods through a low-dimensional linear model is proposed. The proposed method addresses the problems of sparsity and large databases by using NMF (non-negative matrix factorization) among the low-dimensional linear models. In collaborative filtering systems, NMF-based methods are useful for expressing users in terms of semantic relations. However, they are model-based and computationally complex, so they cannot recommend items dynamically. To compensate for these shortcomings, the proposed method clusters users into groups using NMF and selects group features using TF-IDF. Mutual information is then used to compute similarities between items. The proposed method clusters users into groups and extracts group features offline, and determines the most suitable group for an active user using those features online. Finally, the proposed method reduces the time required to classify an active user into a group and outperforms previous methods by combining user-based and item-based collaborative filtering.
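
  The following sketch illustrates the offline NMF step under stated assumptions: factor a small user-item rating matrix and group users by their dominant latent component. The TF-IDF feature selection and mutual-information similarities of the full method are omitted.

```python
# Sketch of the offline step: NMF on the user-item rating matrix,
# then user grouping by dominant latent component (data is toy).
import numpy as np
from sklearn.decomposition import NMF

ratings = np.array([           # rows: users, cols: items (0 = unrated)
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 4, 5, 3],
    [1, 0, 5, 4, 4],
], dtype=float)

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(ratings)   # user representation in latent space
# Each user joins the group of its strongest latent component.
groups = W.argmax(axis=1)
print("user groups:", groups)    # e.g. [0 0 1 1]
```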