• Title/Summary/Keyword: Generate Data


Semantic Process Retrieval with Similarity Algorithms (유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안)

  • Lee, Hong-Joo;Klein, Mark
    • Asia pacific journal of information systems / v.18 no.1 / pp.79-96 / 2008
  • One of the roles of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Since different organizations design their processes differently, retrieving similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact-matching engine when querying an OWL (Web Ontology Language) version of the MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes, intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we exported it into an OWL-based format: we modeled the Process Handbook meta-model in OWL and exported the processes in the Handbook as instances of the meta-model. Next, we needed a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial datasets composed of randomly generated numbers with no real meaning, and relied on subjective ratings for correct answers and similarity values between processes. To generate a semantics-preserving test data set, we created 20 variants of each target process that are syntactically different but semantically equivalent, using mutation operators; these variants serve as the correct answers for the target process. We devised diverse similarity algorithms based on the values of process attributes and the structures of business processes. We used simple text retrieval similarity measures such as TF-IDF and Levenshtein edit distance, and also a tree edit distance measure, because semantic processes have a graph-like structure. We further designed similarity algorithms that consider the similarity of process structure, such as sub-processes, goals, and exceptions. Since the relationships between a semantic process and its subcomponents can be identified, this information can be used to calculate similarities between processes. Dice's coefficient and the Jaccard similarity measure are used to calculate the degree of overlap between processes in diverse ways. We performed retrieval experiments to compare the performance of the devised similarity algorithms, measuring retrieval performance in terms of precision, recall, and the F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining TF-IDF with Levenshtein edit distance perform better than the other devised methods; these two measures focus on the similarity between process names and descriptions. In addition, we calculated the rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values within the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and their derivatives, show higher coefficients than measures based on the values of process attributes.
However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, shows reasonably good performance in both experiments. For semantic process retrieval, it is therefore better to consider diverse aspects of process similarity, such as process structure and the values of process attributes. We generate semantic process data and a dataset for retrieval experiments from the MIT Process Handbook repository, suggest imprecise query algorithms that expand the retrieval results of an exact-matching engine such as SPARQL, and compare the retrieval performance of the similarity algorithms. As limitations and future work, experiments with datasets from other domains are needed, and since diverse measures produce many similarity values, better ways of identifying relevant processes may be found by applying these values simultaneously.
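Dice's coefficient and the Jaccard measure used above reduce to simple set-overlap computations. A minimal Python sketch, using invented sub-process sets rather than Handbook data:

```python
def jaccard(a, b):
    """Jaccard similarity: |A n B| / |A u B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def dice(a, b):
    """Dice's coefficient: 2|A n B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

# Hypothetical sub-process sets of two business processes
p1 = {"receive order", "check stock", "ship goods"}
p2 = {"receive order", "check stock", "invoice customer"}
print(jaccard(p1, p2))  # 0.5
print(dice(p1, p2))     # 0.666...
```

Since Dice counts the intersection twice, it always scores a pair at least as high as Jaccard does.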

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data. Accordingly, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis are continuously increasing; big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to those who demand the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading, so big data analysis is increasingly expected to be performed by the demanders of analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with particular attention focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large set of documents, identifies the documents that correspond to each issue, and provides the identified documents as clusters. It is considered very useful in that it reflects the semantic elements of documents. Traditional topic modeling is based on the distribution of key terms across the entire document collection; thus, the entire collection must be analyzed at once to identify the topic of each document. This makes the analysis time-consuming when topic modeling is applied to many documents, and it causes a scalability problem: processing time increases exponentially with the number of analysis objects. The problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without first being combined. However, despite many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire collection is unclear: local topics can be identified in each unit, but global topics cannot. Second, a method for measuring the accuracy of the proposed methodology must be established; that is, assuming the global topics are the ideal answer, the deviation of the local topics from the global topics needs to be measured.
Owing to these difficulties, this approach has not been studied sufficiently compared with other topic modeling studies. In this paper, we propose a topic modeling approach that solves the above two problems. First, we divide the entire document cluster (global set) into sub-clusters (local sets), and generate a reduced entire document cluster (RGS, reduced global set) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire collection, and we propose a reasonable method for comparing the results of both approaches.
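To make the local-to-global mapping concrete, here is a minimal sketch with scikit-learn's LDA, in which each local topic is matched to the most similar global topic by cosine similarity of topic-word distributions; the toy corpus and the similarity-based mapping rule are illustrative assumptions, not the paper's exact RGS procedure:

```python
# Sketch: fit LDA on the full corpus ("global") and on one sub-cluster
# ("local"), then map each local topic to its nearest global topic by
# cosine similarity of their topic-word vectors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stocks rally as markets rise", "bank cuts interest rates",
    "markets fall on rate fears", "team wins the championship game",
    "striker scores late goal", "coach praises the team",
]  # hypothetical corpus
X = CountVectorizer().fit_transform(docs)

global_lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
local_lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X[:3])

# mapping[i] = index of the global topic closest to local topic i
sim = cosine_similarity(local_lda.components_, global_lda.components_)
mapping = sim.argmax(axis=1)
print(mapping)
```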

Data issue and Improvement Direction for Marine Spatial Planning (해양공간계획 지원을 위한 정보 현안 및 개선 방향 연구)

  • CHANG, Min-Chol;PARK, Byung-Moon;CHOI, Yun-Soo;CHOI, Hee-Jung;KIM, Tae-Hoon;LEE, Bang-Hee
    • Journal of the Korean Association of Geographic Information Studies / v.21 no.4 / pp.175-190 / 2018
  • Recently, the policies of advanced maritime countries have shifted from first-come-first-served occupation of ocean space toward a plan-first, develop-later approach. In this study, we identify pending issues and suggest improvements derived while constructing a GIS-based database of marine spatial information for the Korean Marine Spatial Planning (KMSP). More than 250 spatial information items for the seas of Korea were processed in the order of data collection, GIS transformation, data analysis and processing, data grouping, and spatial mapping. This process revealed several problems, such as coordinate system errors, the need to digitize missing spatial information, and overlaps in the original marine spatial information. Moreover, solutions are needed both for data processing methods that exclude the personal information arising when spatial data are produced to analyze marine use status, and for minimizing the differences between the GIS-based spatial information and the underlying real-world information. Therefore, the system for collecting and securing missing marine spatial information should be enhanced for marine spatial planning, and the marine fisheries survey system should be linked and expanded. Marine spatial planning also requires marine spatial evaluation indices and detailed marine spatial maps. In addition, it requires standard guidelines and a quality management system: the guidelines should cover the production, processing, analysis, and utilization phases, and the quality management system should be improved to raise the quality of marine spatial information. Finally, we suggest the need for in-depth studies on expanding the open release of marine spatial information and on deriving application models.
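Coordinate system errors of the kind noted above are typically resolved by reprojecting every layer to a single common CRS. A minimal sketch with pyproj, assuming WGS 84 (EPSG:4326) inputs and the Korea 2000 / Unified CS (EPSG:5179) as the common target; the sample point is illustrative:

```python
# Minimal sketch: reproject WGS 84 (EPSG:4326) coordinates into the
# Korea 2000 / Unified CS (EPSG:5179) so that all layers share one CRS.
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:5179", always_xy=True)

lon, lat = 126.9780, 37.5665  # illustrative point (Seoul)
x, y = transformer.transform(lon, lat)
print(f"EPSG:5179 easting={x:.1f}, northing={y:.1f}")
```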

A Study on the Data Driven Neural Network Model for the Prediction of Time Series Data: Application of Water Surface Elevation Forecasting in Hangang River Bridge (시계열 자료의 예측을 위한 자료 기반 신경망 모델에 관한 연구: 한강대교 수위예측 적용)

  • Yoo, Hyungju;Lee, Seung Oh;Choi, Seohye;Park, Moonhyung
    • Journal of Korean Society of Disaster and Security / v.12 no.2 / pp.73-82 / 2019
  • Recently, as the frequency of sudden floods due to climate change has increased, flood damage to riverside social infrastructure has grown, along with the threat of overflow. Therefore, rapid prediction of potential flooding at riverside social infrastructure is necessary for administrators. However, most current flood forecasting models, including hydraulic models, have the limitation that their numerical results are highly accurate but their simulation times are long. To alleviate this limitation, data-driven models using artificial neural networks have been widely used; however, existing models cannot consider time-series parameters. In this study, the water surface elevation at Hangang River Bridge was predicted using a NARX (nonlinear autoregressive exogenous) model that considers time-series parameters, and the results of ANN and RNN models were compared with those of the NARX model to determine its suitability. Of the 10 years of hydrological data from 2009 to 2018, 70% were used for training and 15% each for testing and evaluation. In predicting the water surface elevation at Hangang River Bridge 3 hours ahead for 2018, the RMSE of the ANN, RNN, and NARX models was 0.20 m, 0.11 m, and 0.09 m, the MAE was 0.12 m, 0.06 m, and 0.05 m, and the peak error was 1.56 m, 0.55 m, and 0.10 m, respectively. Analysis of the prediction errors shows that the NARX model, which considers time-series parameters, is the most suitable for predicting water surface elevation: it can learn the trend of the time series and derive accurate predictions even at high water surface elevations by using the hyperbolic tangent and rectified linear unit (ReLU) functions as activation functions. However, the NARX model suffers from vanishing gradients as the sequence length grows. In future work, the accuracy of water surface elevation prediction will be examined using an LSTM model.
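The core of a NARX model is regressing the next target value on lagged values of both the target and the exogenous inputs. A minimal Keras-style sketch under assumed lag counts and layer sizes, using the tanh/ReLU activations mentioned in the abstract; this is an illustration, not the paper's exact architecture:

```python
# Minimal NARX-style sketch: predict the next water level from lagged water
# levels (autoregressive part) and lagged exogenous inputs (e.g., rainfall).
# Lags, layer sizes, and data shapes are illustrative assumptions.
import numpy as np
from tensorflow import keras

LAGS = 6  # number of past time steps fed to the model

def make_supervised(levels, exog, lags=LAGS):
    """Stack lagged target and exogenous values into one feature vector."""
    X, y = [], []
    for t in range(lags, len(levels)):
        X.append(np.concatenate([levels[t - lags:t], exog[t - lags:t]]))
        y.append(levels[t])
    return np.array(X), np.array(y)

model = keras.Sequential([
    keras.layers.Dense(32, activation="tanh", input_shape=(2 * LAGS,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # next-step water surface elevation
])
model.compile(optimizer="adam", loss="mse")

# levels, exog = ...  # hydrological series of equal length
# X, y = make_supervised(levels, exog)
# model.fit(X, y, epochs=50, validation_split=0.15)
```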

Automatic Target Recognition Study using Knowledge Graph and Deep Learning Models for Text and Image data (지식 그래프와 딥러닝 모델 기반 텍스트와 이미지 데이터를 활용한 자동 표적 인식 방법 연구)

  • Kim, Jongmo;Lee, Jeongbin;Jeon, Hocheol;Sohn, Mye
    • Journal of Internet Computing and Services / v.23 no.5 / pp.145-154 / 2022
  • Automatic Target Recognition (ATR) technology is emerging as a core technology of Future Combat Systems (FCS). Conventional ATR is performed based on IMINT (image intelligence) collected from SAR sensors, using various image-based deep learning models. However, although the development of IT and sensing technology has expanded the data and information relevant to ATR to include HUMINT (human intelligence) and SIGINT (signal intelligence), ATR still uses only image-oriented IMINT data. In complex and diversified battlefield situations, it is difficult to guarantee high ATR accuracy and generalization performance with image data alone. Therefore, in this paper we propose a knowledge graph-based ATR method that can utilize image and text data simultaneously. The main idea of the knowledge graph and deep model-based ATR method is to convert ATR images and text into graphs according to the characteristics of each data type, align them to the knowledge graph, and thereby connect the heterogeneous ATR data through the knowledge graph. To convert an ATR image into a graph, an object-tag graph whose nodes are object tags is generated from the image using a pre-trained image object recognition model and the vocabulary of the knowledge graph. For ATR text, a pre-trained language model, TF-IDF, a co-occurrence word graph, and the vocabulary of the knowledge graph are used to generate a word graph whose nodes are key ATR vocabulary. The two generated graphs are connected to the knowledge graph using an entity alignment model to improve ATR performance on images and text. To demonstrate the superiority of the proposed method, 227 web documents and 61,714 RDF triples from DBpedia were collected, and comparative experiments were performed on precision, recall, and F1-score from the perspective of entity alignment.
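A co-occurrence word graph of the kind described can be built in a few lines. A minimal sketch with networkx, where the tokenization, window size, and toy sentences are illustrative assumptions:

```python
# Minimal sketch: build a co-occurrence word graph where nodes are words and
# edges connect words appearing within a sliding window of the same sentence;
# edge weights count co-occurrences.
import networkx as nx

sentences = [
    "radar detects the armored vehicle",
    "the armored vehicle crosses the bridge",
]  # hypothetical ATR-related text

WINDOW = 3
G = nx.Graph()
for sent in sentences:
    words = sent.split()
    for i, a in enumerate(words):
        for b in words[i + 1:i + WINDOW]:
            if a != b:
                w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
                G.add_edge(a, b, weight=w + 1)

# Strongest co-occurrence edges first
print(sorted(G.edges(data=True), key=lambda e: -e[2]["weight"])[:5])
```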

The Experimental Study on the Evaluation of Tidal Power Generation Output Using Water Tank (수조를 이용한 조력발전량산정에 관한 실험적 연구)

  • Jeong, Shin-Taek;Kim, Jeong-Dae;Ko, Dong-Hui;Choi, Woo-Jung;Oh, Nam-Sun
    • Journal of Korean Society of Coastal and Ocean Engineers / v.20 no.2 / pp.232-237 / 2008
  • A method to generate electric power from a small-scale water tank is studied. For this purpose, a tank was manufactured, water level changes were measured over time, and experimental values were compared with theoretical ones. Inner and outer tanks were built to simulate flood and ebb generation. Two sets of pipes connect the tanks, and experiments were performed under varying flow rates. Flow rate coefficients were calculated by comparing measured water level changes with theoretical values. Measured and theoretical water levels are highly correlated, which confirms that the analytical equation simulates real water level changes well. The dependence of flow rate on the presence of a propeller and valve, and on flood versus ebb generation, shows the necessity of experiments in the process of manufacturing an electric power system. Moreover, the total energy calculated from the experimental data agrees well with that of the theoretical equation. Despite the small tidal power output, this generating system with an optimized water tank can be applied anywhere large water level changes occur, and can consequently contribute to producing new and renewable energy.
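For reference, the theoretical energy released by draining a basin is commonly estimated from the potential energy of the stored water, E = ½ρgAR². A small Python sketch under assumed tank dimensions; this is the standard tidal-barrage estimate, not necessarily the paper's exact equation:

```python
# Theoretical potential energy of a tidal basin drained through one half-cycle:
# E = 0.5 * rho * g * A * R**2 (standard tidal-barrage estimate).
# The basin area and tidal range below are illustrative assumptions.
RHO = 1025.0   # seawater density, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2

def basin_energy(area_m2: float, tidal_range_m: float) -> float:
    """Energy (joules) released by draining a basin of the given area and range."""
    return 0.5 * RHO * G * area_m2 * tidal_range_m ** 2

E = basin_energy(area_m2=1.0, tidal_range_m=0.5)  # a 1 m^2 tank, 0.5 m head
print(f"{E:.1f} J per fill-drain cycle")
```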

Generation of Progressively Sampled DTM using Model Key Points Extracted from Contours in Digital Vector Maps (수치지도 등고선의 Model Key Point 추출과 Progressive Sampling에 의한 수치지형모델 생성)

  • Lee, Sun-Geun;Yom, Jae-Hong;Lim, Sae-Bom;Kim, Kye-Lim;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.25 no.6_2 / pp.645-651 / 2007
  • In general, contours in digital vector maps, which represent terrain characteristics and shape, are created by 3D digitizing of equal-height points from aerial photographs on analytical or digital plotters with stereoscopic viewing. This requires much labor and depends on the subjective decisions and experience of the operators. Since the national digital maps do not include digital terrain model (DTM) data, DTMs are generated indirectly from contours. In this study, model key points, which capture the important information about terrain characteristics, were extracted from the contours. Furthermore, a method for determining efficient and flexible grid sizes was proposed to generate optimal DTMs in both quantitative and qualitative terms. For this purpose, a progressive sampling technique was implemented: smaller grid sizes are assigned to mountainous areas with large relief, while larger grid sizes are assigned to relatively flat areas. Consequently, DTMs with multiple grid sizes for different areas could be generated instead of DTMs with a fixed grid size. The multi-grid DTMs reduce the computation needed for data processing and provide fast display.
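The progressive sampling idea, refining the grid only where the terrain demands it, can be sketched as recursive subdivision. A minimal numpy version, where the relief threshold, minimum cell size, and toy elevation surface are illustrative assumptions:

```python
# Minimal sketch of progressive sampling: start with a coarse grid cell and
# subdivide it only where local relief (max - min elevation) exceeds a
# threshold, yielding a multi-grid DTM layout.
import numpy as np

def sample_cell(elev, x0, y0, size, min_size, relief_thresh, cells):
    """Recursively subdivide a square cell while its relief is too large."""
    patch = elev[y0:y0 + size, x0:x0 + size]
    relief = patch.max() - patch.min()
    if relief <= relief_thresh or size <= min_size:
        cells.append((x0, y0, size))  # keep this cell at the current grid size
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            sample_cell(elev, x0 + dx, y0 + dy, half, min_size, relief_thresh, cells)

rng = np.random.default_rng(0)
elev = rng.random((64, 64)).cumsum(axis=0)  # toy elevation surface
cells = []
sample_cell(elev, 0, 0, 64, min_size=4, relief_thresh=5.0, cells=cells)
print(len(cells), "multi-grid cells")
```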

Usefulness of CT based SPECT Fusion Image in the lung Disease : Preliminary Study (폐질환의 SPECT와 CT 융합영상의 유용성: 초기연구)

  • Park, Hoon-Hee;Kim, Tae-Hyung;Shin, Ji-Yun;Lee, Tae-Soo;Lyu, Kwang-Yeul
    • Journal of radiological science and technology / v.35 no.1 / pp.59-64 / 2012
  • Recently, SPECT/CT systems have been applied to many diseases; however, they are not yet widely applied to pulmonary disease. In particular, when pulmonary embolism is suspected on CT images, SPECT is performed, and SPECT/CT is subsequently undertaken for an accurate diagnosis. Without a SPECT/CT system, however, there are limitations in applying this procedure, and even with SPECT/CT most examinations are performed after CT, so the test procedure exposes the patient to unnecessary dual irradiation. In this study, we evaluated the amount of unnecessary irradiation and the usefulness of fusion images for pulmonary disease that were acquired independently from SPECT and CT. Using a NEMA Phantom™ (NU2-2001), SPECT and CT scans were performed for fusion imaging. From September 2010 to June 2011, 10 patients with no personal history other than lung disease were selected (male: 7, female: 3, mean age: 65.3 ± 12.7). In both the clinical patient and phantom data, the fusion images scored higher than the SPECT and CT images. The fusion images, which combine pulmonary vessel images from CT with functional images from SPECT, can increase the possibility of detecting pulmonary embolism in the lung parenchyma. Performing SPECT and CT on an integrated SPECT/CT system is certainly better; however, we believe this protocol can provide more informative data for a more accurate diagnosis in hospitals without an integrated SPECT/CT system.
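Once two independently acquired volumes are registered and resampled to a common grid, a basic fusion overlay is an alpha blend of the two. A minimal numpy sketch with stand-in arrays; the blending weight is an illustrative choice, and real SPECT-CT fusion requires proper registration beforehand:

```python
# Minimal sketch: alpha-blend a (pre-registered, resampled) SPECT slice over a
# CT slice to produce a fusion image. Inputs and the 0.4 weight are
# illustrative; clinical fusion needs proper registration first.
import numpy as np

def fuse(ct_slice: np.ndarray, spect_slice: np.ndarray, alpha: float = 0.4):
    """Normalize both slices to [0, 1] and blend: (1 - alpha)*CT + alpha*SPECT."""
    def norm(img):
        rng = img.max() - img.min()
        return (img - img.min()) / rng if rng > 0 else np.zeros_like(img, float)
    return (1 - alpha) * norm(ct_slice) + alpha * norm(spect_slice)

ct = np.random.rand(128, 128)     # stand-in CT slice
spect = np.random.rand(128, 128)  # stand-in SPECT slice (already aligned)
fused = fuse(ct, spect)
```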

Mapping Technique for Heavy Snowfall Distribution Using Terra MODIS Images and Ground Measured Snowfall Data (Terra MODIS 영상과 지상 적설심 자료를 이용한 적설분포도 구축기법 연구)

  • Kim, Saet-Byul;Shin, Hyung-Jin;Lee, Ji-Wan;Yu, Young-Seok;Kim, Seong-Joon
    • Journal of the Korean Association of Geographic Information Studies / v.14 no.4 / pp.33-43 / 2011
  • This study constructs snowfall distribution maps for four heavy snowfall events (January 2001, March 2004, December 2005, and January 2010) and compares the results of three construction methods. The methods are: (1) generating the map by applying IDW (inverse distance weighting) interpolation to 76 ground-measured snowfall points (snow depth map; SDM); (2) masking the SDM with the MODIS snow cover area (MODIS SCA) from Terra MODIS (MODerate resolution Imaging Spectroradiometer) (SDM+MODIS SCA; SDM_M); and (3) additionally applying the snow depth lapse rate by elevation (digital elevation model; DEM) to the second method (SDM_M+DEM; SDM_MD). Applying the MODIS SCA, the snow cover area of the four events was 62.9%, 44.1%, 52.0%, and 69.0% of the area of South Korea, respectively. In terms of average snow depth, SDM_M decreased by 0.9 cm, 1.9 cm, 0.8 cm, and 1.5 cm compared with SDM, while SDM_MD increased by 1.3 cm, 0.9 cm, 0.4 cm, and 1.2 cm, respectively.
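The IDW interpolation step is straightforward to implement. A minimal numpy sketch, with invented station coordinates and snow depths and the common power p = 2:

```python
# Minimal sketch of IDW (inverse distance weighting) interpolation: the value
# at a grid point is a weighted mean of station values, with weights 1/d**p.
import numpy as np

def idw(stations_xy, values, grid_xy, p=2.0, eps=1e-12):
    """Interpolate station values onto grid points by inverse distance weighting."""
    d = np.linalg.norm(grid_xy[:, None, :] - stations_xy[None, :, :], axis=2)
    w = 1.0 / (d ** p + eps)  # eps avoids division by zero at station locations
    return (w * values).sum(axis=1) / w.sum(axis=1)

stations = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # station coords
depths = np.array([12.0, 30.0, 21.0])                       # snow depth, cm
grid = np.array([[2.0, 1.0], [7.0, 4.0]])                   # query points
print(idw(stations, depths, grid))
```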

Pre/Post processor for structural analysis simulation integration with open source solver (Calculix, Code_Aster) (오픈소스 솔버(Calculix, Code_Aster)를 통합한 구조해석 시뮬레이션 전·후처리기 개발)

  • Seo, Dong-Woo;Kim, Jae-Sung;Kim, Myung-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.9 / pp.425-435 / 2017
  • Structural analysis is used not only by large enterprises but also by small and medium-sized ones, as a necessary procedure for strengthening the certification process for product delivery and for shortening the time from concept design to detailed design. Open-source solvers, which can be used at low cost, differ from commercial solvers: if there is a problem with the input data, such as the grid, errors or failures can occur in the calculation step. In this paper, we propose a pre- and post-processor that can be easily applied to the analysis of mechanical structural problems using existing open-source structural analysis solvers (Calculix, Code_Aster). In particular, we propose algorithms for analyzing the different data types used by the open-source solvers in order to extract and generate accurate information, such as 3D models, grids, and simulation conditions, and we develop and apply this information analysis. In addition, to improve the accuracy of the open-source solvers and to prevent errors, we created grids matched to the solver characteristics and developed an automatic healing function for the grid model. Finally, to verify the accuracy of the system, the verification and utilization results are compared with those of the software in use.
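A pre/post-processor typically drives the solver as an external process and then checks for the result file. A minimal sketch for CalculiX, where the job name is hypothetical and the `ccx <jobname>` invocation (input file name without its `.inp` extension) is the common convention, though builds vary:

```python
# Minimal sketch: run CalculiX (ccx) on an input deck and check the exit code.
# "bracket_job" is a hypothetical job name; ccx conventionally takes the input
# file name without its .inp extension. Paths and flags may vary by build.
import subprocess
from pathlib import Path

job = "bracket_job"                    # expects bracket_job.inp in the cwd
result = subprocess.run(["ccx", job], capture_output=True, text=True)

if result.returncode != 0:
    print("solver failed:\n", result.stderr)
elif not Path(f"{job}.frd").exists():  # .frd holds results for post-processing
    print("solver ran but produced no result file")
else:
    print("results ready for post-processing:", f"{job}.frd")
```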