• Title/Summary/Keyword: large data visualization

Search results: 244

Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset (기계 학습 기반 분석을 위한 다변량 정형 데이터 처리 및 시각화 방법: Titanic 데이터셋 적용 사례 연구)

  • Juhyoung Sung;Kiwon Kwon;Kyoungwon Park;Byoungchul Song
    • Journal of Internet Computing and Services / v.25 no.4 / pp.121-130 / 2024
  • As information and communication technology (ICT) improves exponentially, the types and amount of available data also increase. Although data analysis, including statistics, is essential for utilizing this large amount of data, there are inevitable limits to processing diverse and complex data with conventional methods. Meanwhile, machine learning (ML) is being applied in many fields to solve such problems, driven by improvements in computational performance and growing demand for autonomous systems. In particular, processing the data used as model input and designing the model to solve the objective function are critical to model performance. Data processing methods for each data type and property have been presented in many studies, and ML performance varies greatly depending on the method chosen. Nevertheless, it is difficult to decide which data processing method to use because the types and characteristics of data have become more diverse. Specifically, multi-variate data processing is essential for solving non-linear problems with ML. In this paper, we present a multi-variate tabular data processing scheme for ML-aided data analysis using the Titanic dataset from Kaggle, which includes various kinds of data. We present methods such as input variable filtering based on statistical analysis and normalization according to data properties. In addition, we analyze the data structure using visualization. Lastly, we design an ML model, train it with the proposed multi-variate data processing, and analyze the trained model's passenger survival prediction performance. We expect the proposed multi-variate data processing and visualization to extend to various environments for ML-based analysis.
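The filtering-and-normalization pipeline the abstract describes can be sketched in a few lines of pandas/scikit-learn. This is a minimal illustration, not the authors' code: the column names follow the Kaggle Titanic dataset, the toy values and the 0.1 correlation threshold are assumptions.

```python
# Sketch: filter input variables by a simple statistical criterion
# (correlation with the target), then normalize numeric features.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows shaped like the Kaggle Titanic dataset.
df = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 1],
    "Age":      [38.0, 26.0, 35.0, 27.0, 54.0],
    "Fare":     [71.28, 7.92, 8.05, 13.00, 51.86],
    "Survived": [1, 1, 0, 0, 0],
})

# Variable filtering: keep features whose absolute Pearson correlation
# with the target exceeds a threshold (a stand-in for the paper's
# statistical filtering step).
corr = df.corr()["Survived"].drop("Survived")
selected = corr[corr.abs() > 0.1].index.tolist()

# Normalization: zero mean, unit variance per selected feature.
X = StandardScaler().fit_transform(df[selected])
print(selected, X.shape)
```

The filtered, scaled matrix `X` is then what an ML model would be trained on.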

Multidisciplinary CAE Management System Using a Lightweight CAE Format (경량 CAE 포맷을 이용한 다분야 CAE 관리 시스템 개발)

  • Park, Byoung-Keon;Kim, Jay-Jung
    • Korean Journal of Computational Design and Engineering / v.15 no.2 / pp.157-165 / 2010
  • In the manufacturing industries, CAE analysis results are frequently required during the product development process for design verification. However, CAE data, which include all information related to an analysis, are not efficiently shared among engineers because the data are generally very large to deal with. First, we present a proposed lightweight format that can represent all types of CAE analysis results and supports a hierarchical data structure. Since each CAE system has a data structure of its own, a translator that converts to the proposed format is also presented. Unlike the design environment, where a single CAD system is typical, many CAE systems are used in a manufacturing company because many kinds of analysis are usually performed for a product design. Thus, large numbers of CAE results are generated, occupy a huge amount of storage, and are hard to manage or share efficiently. This paper proposes a multi-CAE management system that can share many types of CAE data simultaneously using the lightweight format. Finally, an implementation of the system is introduced.

Improved Multidimensional Scaling Techniques Considering Cluster Analysis: Cluster-oriented Scaling (클러스터링을 고려한 다차원척도법의 개선: 군집 지향 척도법)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.45-70 / 2012
  • Many methods and algorithms have been proposed for multidimensional scaling (MDS) to map the relationships between data objects into low-dimensional space. However, traditional techniques such as PROXSCAL or ALSCAL are not effective for visualizing the proximities between objects and the cluster structure of large data sets with more than 50 objects. The CLUSCAL (CLUster-oriented SCALing) technique introduced in this paper differs from them chiefly in that it uses the cluster structure of the input data set. The CLUSCAL procedure was tested and evaluated on two data sets, one of 50-author co-citation data and the other of 85-word co-occurrence data. The results suggest that the CLUSCAL method is useful, especially for identifying clusters on MDS maps.
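CLUSCAL itself is not a public library, but the baseline workflow it improves on, projecting objects to 2-D with MDS and then reading clusters off the map, can be sketched with scikit-learn. The toy data and cluster count here are assumptions for illustration only.

```python
# Sketch: classical MDS map of 12 toy objects, with k-means clusters
# overlaid on the 2-D coordinates (the kind of map CLUSCAL refines).
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated groups of co-occurrence-style feature vectors.
X = np.vstack([rng.normal(0, 1, (6, 5)), rng.normal(4, 1, (6, 5))])

# Embed into 2-D; distances between rows of X are preserved as well
# as possible in the planar map.
coords = MDS(n_components=2, random_state=0).fit_transform(X)

# Cluster the map positions to label the groups seen on the plot.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(coords.shape, labels)
```

Plotting `coords` colored by `labels` gives the cluster-annotated MDS map discussed in the abstract.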

Data Interpretation Methods for Petroleomics

  • Islam, Annana;Cho, Yun-Ju;Ahmed, Arif;Kim, Sung-Hwan
    • Mass Spectrometry Letters / v.3 no.3 / pp.63-67 / 2012
  • The need for heavy and unconventional crude oil as an energy source is increasing day by day, and so is the importance of petroleomics: the pursuit of detailed knowledge of heavy crude oil. Crude oil requires techniques with ultra-high resolving power to resolve its complex characteristics. Therefore, ultra-high resolution mass spectrometry, represented by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), has been successfully applied to the study of heavy and unconventional crude oils. The analysis of crude oil with high-resolution mass spectrometry has pushed analysis to the limits of instrumental and methodological capabilities. Each high-resolution mass spectrum of crude oil may routinely contain over 50,000 peaks, and visualizing and effectively studying such large data sets is not trivial. Therefore, data processing and visualization methods such as Kendrick mass defect analysis, van Krevelen analysis, and statistical analyses have played an important role. In this regard, it is no overstatement to say that the success of FT-ICR MS in the study of crude oil has depended critically on data processing methods. Therefore, this review offers an introduction to petroleomic data interpretation methods.
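The Kendrick mass defect (KMD) analysis mentioned above rescales measured masses so that a CH2 unit (exact mass 14.01565 Da) has a nominal mass of exactly 14; members of a homologous series then share one KMD value. A minimal sketch (the example masses are made up):

```python
# Kendrick mass defect: rescale IUPAC masses to the CH2 = 14 scale so
# that peaks in the same homologous series line up at one KMD value.
def kendrick_mass(m):
    """Rescale an IUPAC mass so CH2 has a nominal mass of exactly 14."""
    return m * 14.00000 / 14.01565

def kendrick_mass_defect(m):
    """KMD = nominal (rounded) Kendrick mass minus exact Kendrick mass."""
    km = kendrick_mass(m)
    return round(km) - km

# Two peaks exactly one CH2 apart have (numerically) the same KMD.
m1 = 300.20910
m2 = m1 + 14.01565
print(kendrick_mass_defect(m1), kendrick_mass_defect(m2))
```

Plotting KMD against nominal Kendrick mass for all 50,000+ peaks turns an unreadable spectrum into horizontal rows of homologous series, which is why the technique is central to petroleomic data interpretation.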

A Study on Urban Noise Visualization using 3D-GIS (3차원 GIS를 활용한 도시소음 시각화에 관한 연구)

  • Ryu, Keun-Won;Kim, Geun-Han;Kim, Hye-Young;Jun, Chul-Min
    • Journal of Korea Spatial Information System Society / v.9 no.3 / pp.17-24 / 2007
  • Noise is one of the major problems of large cities and is considered an important factor not only in the maintenance but also in the development of cities. Accordingly, noise maps are increasingly used in city planning and design. However, existing two-dimensional noise maps only show the regional, planar distribution of noise. This study presents a method to build a data model for analyzing and visualizing noise levels at a fine scale, considering the vertical distribution of noise within a building. By expanding the 2D topology concept of conventional GIS to 3D, it suggests a 3D GIS data model that enables 3D spatial queries, analyses, and visualization, and applies the proposed approach to building a 3D noise information system. By building and testing the system, the study demonstrates functionalities including 3D spatial queries and 3D visualization of noise levels that vary over time or with the height of sound-proof walls. In each case, the population exposed to noise was computed quantitatively to illustrate the potential for city planning and design.


Harmony Search for Virtual Machine Replacement (화음 탐색법을 활용한 가상머신 재배치 연구)

  • Choi, Jae-Ho;Kim, Jang-Yeop;Seo, Young Jin;Kim, Young-Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.2 / pp.26-35 / 2019
  • Data centers consume a great deal of power in operating servers, storage, and networking devices, as well as in cooling, air conditioning, and emergency power facilities. In the United States, the power consumed by data centers accounted for 1.8% of total power consumption in 2004. The data center industry has grown to a large scale, and the number of large hyperscale data centers is expected to grow in the future. However, examination of data center server utilization reveals that servers are not used effectively: the average occupancy rate is only about 15% to 20%. To solve this problem, we propose a virtual machine reallocation scheme using the virtual machine migration function. In this paper, we use a meta-heuristic for effective virtual machine reallocation. A virtual machine reallocation problem with the goal of maximizing the number of idle servers was designed and solved through experiments. This study aims to reduce the idle rate of data center servers and reduce power consumption simultaneously.
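The reallocation idea, pack VMs onto as few servers as possible so the rest can idle, can be sketched with a toy harmony search. The parameter names (HMS, HMCR, PAR) are standard harmony-search terminology; the VM demands, capacity, and penalty weight are made-up illustration values, not the paper's settings.

```python
# Toy harmony search: assign 7 VMs (CPU demands) to 4 servers of fixed
# capacity, minimizing the number of servers in use.
import random

random.seed(0)
DEMANDS = [4, 3, 3, 2, 2, 1, 1]   # per-VM CPU demand (assumed values)
CAPACITY = 8                       # per-server capacity
N_SERVERS = 4

def cost(assign):
    """Number of servers used, plus a penalty for capacity violations."""
    load = [0] * N_SERVERS
    for vm, srv in enumerate(assign):
        load[srv] += DEMANDS[vm]
    used = sum(1 for l in load if l > 0)
    overflow = sum(max(0, l - CAPACITY) for l in load)
    return used + 10 * overflow

HMS, HMCR, PAR, ITERS = 10, 0.9, 0.3, 2000
# Harmony memory: HMS random assignments.
memory = [[random.randrange(N_SERVERS) for _ in DEMANDS] for _ in range(HMS)]
for _ in range(ITERS):
    new = []
    for i in range(len(DEMANDS)):
        if random.random() < HMCR:            # draw from harmony memory
            v = random.choice(memory)[i]
            if random.random() < PAR:         # pitch adjustment: perturb
                v = random.randrange(N_SERVERS)
        else:                                  # random consideration
            v = random.randrange(N_SERVERS)
        new.append(v)
    worst = max(memory, key=cost)              # replace worst if improved
    if cost(new) < cost(worst):
        memory[memory.index(worst)] = new

best = min(memory, key=cost)
print(cost(best))
```

Total demand here is 16, so a perfect packing uses two servers of capacity 8 and leaves two idle; the search converges to a small, feasible server count.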

Appendiceal Visualization on 2-mSv CT vs. Conventional-Dose CT in Adolescents and Young Adults with Suspected Appendicitis: An Analysis of Large Pragmatic Randomized Trial Data

  • Jungheum Cho;Youngjune Kim;Seungjae Lee;Hooney Daniel Min;Yousun Ko;Choong Guen Chee;Hae Young Kim;Ji Hoon Park;Kyoung Ho Lee;LOCAT Group
    • Korean Journal of Radiology / v.23 no.4 / pp.413-425 / 2022
  • Objective: We compared appendiceal visualization on 2-mSv CT vs. conventional-dose CT (median 7 mSv) in adolescents and young adults and analyzed the undesirable clinical and diagnostic outcomes that followed appendiceal nonvisualization. Materials and Methods: A total of 3074 patients aged 15-44 years (mean ± standard deviation, 28 ± 9 years; 1672 female) from 20 hospitals were randomized to the 2-mSv CT or conventional-dose CT group (1535 vs. 1539) from December 2013 through August 2016. A total of 161 radiologists from 20 institutions prospectively rated appendiceal visualization (grade 0, not identified; grade 1, unsure or partly visualized; and grade 2, clearly and entirely visualized) and the presence of appendicitis in these patients. The final diagnosis was based on CT imaging and surgical, pathologic, and clinical findings. We analyzed undesirable clinical or diagnostic outcomes, such as negative appendectomy, perforated appendicitis, more extensive than simple appendectomy, delay in patient management, or incorrect CT diagnosis, which followed appendiceal nonvisualization (defined as grade 0 or 1) and compared the outcomes between the two groups. Results: In the 2-mSv CT and conventional-dose CT groups, appendiceal visualization was rated as grade 0 in 41 (2.7%) and 18 (1.2%) patients, respectively; grade 1 in 181 (11.8%) and 81 (5.3%) patients, respectively; and grade 2 in 1304 (85.0%) and 1421 (92.3%) patients, respectively (p < 0.001). Overall, undesirable outcomes were rare in both groups. Compared to the conventional-dose CT group, the 2-mSv CT group had slightly higher rates of perforated appendicitis (1.1% [17] vs. 0.5% [7], p = 0.06) and false-negative diagnoses (0.4% [6] vs. 0.0% [0], p = 0.01) following appendiceal nonvisualization. Otherwise, these two groups were comparable. Conclusion: The use of 2-mSv CT instead of conventional-dose CT impairs appendiceal visualization in more patients. However, appendiceal nonvisualization on 2-mSv CT rarely leads to undesirable clinical or diagnostic outcomes.

Quality Analysis of Three-Dimensional Geo-spatial Information Using Digital Photogrammetry (수치사진측량 기법을 이용한 3차원 공간정보의 품질 분석)

  • Lee, Hyun-Jik;Ru, Ji-Ho;Kim, Sang-Youn
    • Journal of Korean Society for Geospatial Information Science / v.18 no.4 / pp.141-149 / 2010
  • Three-dimensional geo-spatial information is important for the efficient use and management of national land and for the three-dimensional expression and analysis of urban projects, such as urban plans devised by local governments and urban management. Thanks to the revitalization of the geo-spatial information service industry, it is now used widely in both the public and private sectors. To create high-quality three-dimensional geo-spatial information, emphasis should be placed not only on the quality of the source image and the three-dimensional geo-spatial model but also on the level of visualization, such as the level of detail and texturing. However, existing three-dimensional geo-spatial information involves a complicated construction process and infrequent data updates, as it is built from ready-made digital maps. In addition, because it uses orthoimages, the images contain relief displacement; as a result, visibility is low and the three-dimensional models of artificial features are simplified to an LoD between 2 and 3, making the images look less realistic. Therefore, this paper analyzed the quality of three-dimensional geo-spatial information created with three-dimensional modeling techniques based on digital photogrammetry, using digital aerial photo images from an existing large-format digital camera and a multi-looking camera. Analysis of the accuracy of the three-dimensional models' visualization information showed that the source image alone, without other visualization information, secured an accuracy of 84% or more, and that establishing three-dimensional spatial information simultaneously with filming made it easier to obtain up-to-date data. Analysis of the location accuracy of the true orthoimages used in the process showed that it was better than the allowable horizontal position accuracy of 1:1,000 digital maps.

BIM Mesh Optimization Algorithm Using K-Nearest Neighbors for Augmented Reality Visualization (증강현실 시각화를 위해 K-최근접 이웃을 사용한 BIM 메쉬 경량화 알고리즘)

  • Pa, Pa Win Aung;Lee, Donghwan;Park, Jooyoung;Cho, Mingeon;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research / v.42 no.2 / pp.249-256 / 2022
  • Various studies have shown that real-time visualization technology combining BIM (Building Information Modeling) and AR (Augmented Reality) helps increase the efficiency of construction management decision-making and processing. However, when large BIM data are projected into AR, there are various limitations, such as data transmission and connection problems and image cut-off. To improve visualization efficiency, a mesh optimization algorithm based on the k-nearest neighbors (KNN) classification framework is proposed to reconstruct BIM data, in place of existing mesh optimization methods that are complicated and cannot adequately handle meshes with the numerous boundaries of 3D models. In the proposed algorithm, the target BIM model is optimized with Unity C# code based on triangle centroid concepts and classified using KNN. As a result, the algorithm can report the number of mesh vertices and triangles before and after optimization for the entire model and for each structure. It reduces the mesh vertices of the original model by approximately 56% and the triangles by about 42%. Moreover, compared to the original model, the optimized model shows no visual differences in the model elements and information, meaning that high-performance visualization can be expected when using AR devices.
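The classification step the algorithm builds on, triangle centroids assigned to structures via KNN, can be illustrated in a few lines. This is a Python sketch, not the authors' Unity C# implementation; the toy mesh, seed points, and labels "wall"/"slab" are assumptions.

```python
# Sketch: compute triangle centroids of a mesh and assign each triangle
# to a labeled structure with k-nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy mesh: vertex coordinates and triangles as vertex-index triples.
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                     [5, 5, 0], [6, 5, 0], [5, 6, 0]], dtype=float)
triangles = np.array([[0, 1, 2], [3, 4, 5]])

# Triangle centroid = mean of its three vertices.
centroids = vertices[triangles].mean(axis=1)

# A few labeled seed points (e.g. known wall vs. slab locations).
seeds = np.array([[0.2, 0.2, 0.0], [5.2, 5.2, 0.0]])
seed_labels = np.array(["wall", "slab"])

knn = KNeighborsClassifier(n_neighbors=1).fit(seeds, seed_labels)
print(knn.predict(centroids))
```

Once every triangle carries a structure label, the mesh can be simplified per structure and the before/after vertex and triangle counts compared, as the abstract describes.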

Building Large-scale CityGML Feature for Digital 3D Infrastructure (디지털 3D 인프라 구축을 위한 대규모 CityGML 객체 생성 방법)

  • Jang, Hanme;Kim, HyunJun;Kang, HyeYoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.3 / pp.187-201 / 2021
  • Recently, demand is increasing for a 3D urban spatial information infrastructure that can store, operate on, and analyze the large volumes of digital data produced in cities. CityGML is a 3D spatial information data standard of the OGC (Open Geospatial Consortium) with strengths in the exchange and attribute expression of city data. Several cities, such as Singapore and New York, have constructed 3D urban spatial data in CityGML format. However, the current ecosystem for creating and editing CityGML data is limited for large-scale construction because it lacks the completeness of commercial 3D modeling programs such as SketchUp or 3ds Max. Therefore, this study proposes a method of constructing CityGML data from commercial 3D mesh data and 2D polygons that are produced rapidly and automatically through aerial LiDAR (Light Detection and Ranging) or RGB (Red Green Blue) cameras. During data construction, the original 3D mesh data were geometrically transformed so that each object could be expressed at various CityGML LoDs (Levels of Detail), and attribute information extracted from the 2D spatial information data was used as a supplement to increase its utility as spatial information. The 3D city features produced in this study are the CityGML building, bridge, cityFurniture, road, and tunnel. Data conversion and attribute construction methods for each feature are presented, and visualization and validation were conducted.
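The shape of the CityGML output described above can be sketched with Python's standard library. The namespace URIs are the published OGC CityGML 2.0 ones, but the geometry is omitted and the building id and height value are placeholders, not data from the study.

```python
# Sketch: emit a minimal CityGML 2.0 document with one Building feature
# carrying an attribute, using only the standard library.
import xml.etree.ElementTree as ET

NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml":  "http://www.opengis.net/gml",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# CityModel > cityObjectMember > Building, per the CityGML core schema.
city_model = ET.Element(f"{{{NS['core']}}}CityModel")
member = ET.SubElement(city_model, f"{{{NS['core']}}}cityObjectMember")
building = ET.SubElement(member, f"{{{NS['bldg']}}}Building",
                         {f"{{{NS['gml']}}}id": "BLDG_0001"})
ET.SubElement(building, f"{{{NS['bldg']}}}measuredHeight",
              {"uom": "m"}).text = "12.5"

xml_str = ET.tostring(city_model, encoding="unicode")
print(xml_str)
```

A real pipeline of the kind proposed would add LoD geometry elements (e.g. lod2Solid) converted from the 3D mesh and fill attributes from the 2D polygons.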