• Title/Summary/Keyword: Pre-processing Process

Descent Dataset Generation and Landmark Extraction for Terrain Relative Navigation on Mars (화성 지형상대항법을 위한 하강 데이터셋 생성과 랜드마크 추출 방법)

  • Kim, Jae-In
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1015-1023
    • /
    • 2022
  • The Entry-Descent-Landing process of a lander involves many environmental and technical challenges. To solve these problems, terrain relative navigation (TRN) technology has recently become essential for landers. TRN is a technology for estimating the position and attitude of a lander by comparing Inertial Measurement Unit (IMU) data and image data collected from the descending lander with pre-built reference data. In this paper, we present a method for generating a descent dataset and extracting landmarks, which are key elements for developing TRN technologies to be used on Mars. The proposed method generates IMU data of a descending lander using a simulated Mars landing trajectory and generates descent images from a high-resolution ortho-map and digital elevation map through a ray-tracing technique. Landmark extraction is performed by an area-based extraction method because of the low-textured surfaces on Mars. In addition, search-area reduction is carried out to improve matching accuracy and speed. The performance evaluation of the descent dataset generation method showed that the proposed method can generate images that satisfy the imaging geometry. The performance evaluation of the landmark extraction method showed that it achieves positioning accuracy of several meters while maintaining processing speed as fast as that of feature-based methods.
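
As a concrete illustration of the area-based matching with search-area reduction described above, the sketch below matches a descent-image template against the reference ortho-map inside a small window around the IMU-predicted position. This is a minimal rendering of the general technique, not the paper's implementation; the function name, window sizes, and the use of OpenCV's normalized cross-correlation are all assumptions.

```python
import cv2
import numpy as np

def match_landmark(descent_img, reference_map, predicted_xy,
                   template_size=64, search_radius=48):
    """Area-based matching: slide a descent-image template over a small
    search window of the reference map (sizes are illustrative)."""
    px, py = predicted_xy
    half = template_size // 2
    template = descent_img[py - half:py + half, px - half:px + half]

    # Search-area reduction: only look near the IMU-predicted position.
    r = search_radius + half
    x0, y0 = px - r, py - r
    window = reference_map[y0:y0 + 2 * r, x0:x0 + 2 * r]

    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, (mx, my) = cv2.minMaxLoc(scores)
    # Map the best match back to reference-map coordinates (center point).
    return (x0 + mx + half, y0 + my + half), best_score
```

Restricting the correlation to a small window is what lets an area-based method approach the speed of feature-based matching while keeping its robustness on low-textured terrain.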

Automatic Generation of Bibliographic Metadata with Reference Information for Academic Journals (학술논문 내에서 참고문헌 정보가 포함된 서지 메타데이터 자동 생성 연구)

  • Jeong, Seonki;Shin, Hyeonho;Ji, Seon-Yeong;Choi, Sungphil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.56 no.3
    • /
    • pp.241-264
    • /
    • 2022
  • Bibliographic metadata can help researchers effectively utilize the essential publications they need and grasp academic trends in their own fields. However, the manual creation of such metadata is costly and time-consuming, and it is nontrivial to automate metadata construction effectively with rule-based methods because article forms and styles vary widely across publishers and academic societies. Therefore, this study proposes a two-step extraction process based on rules and deep neural networks for generating bibliographic metadata of scientific articles. The extraction target areas in articles were identified using a deep neural network-based model, and the details in those areas were then analyzed and subdivided into the relevant metadata elements. The proposed model also includes a component for generating reference summary information, which separates the end of the body text from the start of the reference section, extracts individual references using an essential rule set, and identifies all the bibliographic items in each reference with a deep neural network. In addition, to confirm the feasibility of a model that generates the bibliographic information of academic papers without pre- and post-processing, we conducted an in-depth comparative experiment with various settings and configurations. As a result, the method proposed in this paper showed higher performance.
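
To make the reference-processing step concrete, the sketch below shows one plausible form of the rule-based splitting of a reference section into individual reference strings. The marker patterns are assumptions for illustration, not the paper's actual rule set; the deep-learning steps (area identification and bibliographic item tagging) are omitted.

```python
import re

# Hypothetical markers: "[3] ..." or "3. ..." at the start of a line.
REF_MARKER = re.compile(r"(?m)^\s*(?:\[\d+\]|\d+\.)\s+")

def split_references(ref_section: str) -> list[str]:
    """Split a reference section on leading numbering markers."""
    starts = [m.start() for m in REF_MARKER.finditer(ref_section)]
    if not starts:
        return [ref_section.strip()]  # fallback: treat as one reference
    starts.append(len(ref_section))
    return [ref_section[a:b].strip() for a, b in zip(starts, starts[1:])]

print(split_references("[1] Kim, J. et al. ...\n[2] Lee, S. ..."))
```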

Relationship between the Alcohol Use Disorders Identification Test and Fractional Anisotropy Values of Diffusion Tensor Images in Brain White Matter Regions (알코올 선별 검사법(Alcohol Use Disorders Identification Test)과 뇌 백질 영역의 확산텐서 비등방도 계측 값의 관련성)

  • Lee, Chi Hyung;Kim, Gyeong Rip;Kwak, Jong Hyeok
    • Journal of the Korean Society of Radiology
    • /
    • v.16 no.5
    • /
    • pp.575-583
    • /
    • 2022
  • Magnetic resonance diffusion tensor imaging (DTI) has revealed disruption of brain white matter microstructure in normal aging and alcoholism that is undetectable with conventional structural MR imaging. We analyzed the FA measurements in the ROIs of hazardous drinkers selected by the Alcohol Use Disorders Identification Test (AUDIT); the Tract-Based Spatial Statistics (TBSS) tool was used to extract FA values in the ROIs from images acquired through the pre-processing process. TBSS has higher sensitivity for FA and MD values in white matter than in gray matter, has the advantage of quantitatively characterizing the integrity of brain nerve fibers, and is more specialized for brain white matter. We analyzed fractional anisotropy (FA) measurements of damage by selecting, as regions of interest (ROIs), the centers of anatomical structures with high anisotropy in the brain white matter regions that are particularly vulnerable to alcohol. In this study, the FA values in various areas, including both choroid plexuses, suggested that alcohol damages the brain white matter microstructure. In particular, for moderate drinkers the mean FA values in the left and right choroid plexus were 0.2831 and 0.2872, whereas for severe drinkers they were 0.1972 and 0.1936. We found that the higher the score on the AUDIT scale, the lower the FA value in the ROIs of the brain white matter. Using the AUDIT scale, a guideline for the FA value of DTI can be presented, and it is possible to select a significant number of potentially severe drinkers. In other words, AUDIT proved to be a useful tool for screening and discriminating severe drinkers with DTI.
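
The ROI-based FA measurement described above reduces, computationally, to averaging an FA map inside a binary mask. The snippet below is a minimal sketch of that step using nibabel; the file names are hypothetical, and the actual study used the TBSS pipeline rather than this direct computation.

```python
import nibabel as nib
import numpy as np

fa_img = nib.load("subject_FA.nii.gz")             # FA map after pre-processing
roi_img = nib.load("roi_choroid_plexus_L.nii.gz")  # binary ROI mask (hypothetical)

fa = fa_img.get_fdata()
mask = roi_img.get_fdata() > 0

mean_fa = float(np.mean(fa[mask]))
print(f"mean FA in ROI: {mean_fa:.4f}")  # abstract reports ~0.28 (moderate) vs ~0.19 (severe)
```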

Designing a Blockchain-based Smart Contract for Seafarer Wage Payment (블록체인 기반 선원 임금지불을 위한 스마트 컨트랙트 설계)

  • Yoo, Sang-Lok;Kim, Kwang-Il;Ahn, Jang-Young
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.27 no.7
    • /
    • pp.1038-1043
    • /
    • 2021
  • Guaranteed seafarer wage payment is essential to ensure a stable supply of seafarers, yet disputes over non-payment of wages to seafarers often occur. In this study, an automatic wage payment system was designed using a blockchain-based smart contract to resolve the problem of seafarers' wage arrears. The designed system consists of an information register, a matching processing unit, a review rating management unit, and wage remittance executed before deploying smart contracts. The matching process was designed to send an automatic notification to seafarers and shipowners if the weighted sum of four variables, namely wages, ship type/fishery, position, and license, exceeds a pre-defined threshold. In addition, a review rating management system based on a combination of the mean and median was presented to serve as a medium for mutually fulfilling normal working conditions. The smart contract automatically fulfills the labor contract between the parties without an intermediary. This system will naturally resolve problems such as fraudulent advance payments to seafarers, embezzlement by unregistered employment agencies, overdue wages, and forgery of seafarers' books. If this system design is commercialized and institutionally activated, it is expected that stable wages will be guaranteed to seafarers and, in turn, that difficulties in the supply of human resources will be resolved. We plan to test the system in a local environment to develop it further.
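
The matching rule described above is essentially a weighted-sum threshold test. The sketch below shows that logic in Python; the weights and threshold are illustrative placeholders, as the paper's actual values are not given here.

```python
# Per-variable weights and notification threshold (illustrative only).
WEIGHTS = {"wage": 0.4, "ship_type_fishery": 0.3, "position": 0.2, "license": 0.1}
THRESHOLD = 0.7

def should_notify(match_scores: dict) -> bool:
    """Notify seafarer and shipowner when the weighted sum of the four
    match scores (each in [0, 1]) exceeds the threshold."""
    total = sum(w * match_scores.get(k, 0.0) for k, w in WEIGHTS.items())
    return total >= THRESHOLD

print(should_notify({"wage": 0.9, "ship_type_fishery": 0.8,
                     "position": 0.7, "license": 1.0}))  # True (0.84 >= 0.7)
```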

Development of SVM-based Construction Project Document Classification Model to Derive Construction Risk (건설 리스크 도출을 위한 SVM 기반의 건설프로젝트 문서 분류 모델 개발)

  • Kang, Donguk;Cho, Mingeon;Cha, Gichun;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.6
    • /
    • pp.841-849
    • /
    • 2023
  • Construction projects carry risks arising from various factors such as construction delays and construction accidents. Given these risks, the construction period of a project is mainly estimated by subjective judgment that relies on supervisor experience. In addition, unreasonably shortening construction to meet schedules delayed by construction delays and disasters causes negative consequences such as poor construction quality, and delayed schedules cause economic losses through the absence of infrastructure. Data-based scientific approaches and statistical analysis are needed to address such construction project risks. Because data collected in actual construction projects is stored as unstructured text, pre-processing it for data-based risk analysis requires considerable manpower and cost, so base data built with a text-mining classification model is required. Therefore, in this study, a document classification model for generating risk management base data was developed, based on an SVM (Support Vector Machine), by collecting construction project documents and applying text mining. Through quantitative analysis in future research, the results are expected to serve as efficient and objective base data for construction project process management, enabling risk management.
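
A minimal sketch of the kind of SVM document classifier described above, using scikit-learn's TF-IDF features; the example documents and risk labels are placeholders, and the paper's actual features and categories may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["crane accident reported at section B ...",        # placeholder documents
        "schedule delayed by two weeks due to rain ...",
        "daily work log: concrete pouring completed ..."]
labels = ["safety_risk", "delay_risk", "no_risk"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(docs, labels)
print(model.predict(["scaffolding collapse near gate 3"]))
```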

Cost Analysis of the Recent Projects for Overseas Vanadium Metallurgical Processing Plants (해외 바나듐 제련 플랜트 관련 사업 비용 분석)

  • Gyuri Kim;Sang-hun Lee
    • Resources Recycling
    • /
    • v.33 no.3
    • /
    • pp.3-11
    • /
    • 2024
  • This study addressed the cost structure of metallurgical plants for vanadium recovery or production that were previously planned or implemented. Vanadium metallurgy consists of several sub-processes, such as pretreatment, roasting, leaching, precipitation, and filtration, to finally produce vanadium pentoxide. Substantial costs must be spent on such plants, and these costs are largely divided into CAPEX (Capital Expenditure) and OPEX (Operational Expenditure). The capacities (feed input rates) and vanadium contents vary across the target projects of this study, whereas the final production rates and grades of vanadium pentoxide showed relatively small differences. In addition, a noticeable correlation is found between capacity and specific operating cost: the cost steadily decreases along a non-linear curve with an exponent of about -0.3. Consequently, for plant capacities below 100,000 tons per year, the specific operating cost decreases rapidly as capacity increases, whereas the cost remains relatively stable for capacities in the range of 0.6 to 1.2 million tons per year. From a technical perspective, effective optimization of the metallurgical process plant can be achieved by improving the vanadium recovery rate in the pre-treatment and/or roasting-leaching processes. Finally, the results of this study should be updated through future research with ongoing field verification and further detailed cost analysis.
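
The reported capacity-cost relation is a power law, which can be recovered by a linear fit in log-log space. The sketch below shows the procedure with fabricated placeholder points chosen only to illustrate the roughly -0.3 exponent; they are not data from the study.

```python
import numpy as np

capacity = np.array([5e4, 1e5, 3e5, 6e5, 1.2e6])      # t/yr (placeholders)
spec_cost = np.array([42.0, 34.0, 25.0, 21.5, 17.8])  # cost per ton (placeholders)

# Fit log(cost) = log(a) + b * log(capacity); b is the power-law exponent.
b, log_a = np.polyfit(np.log(capacity), np.log(spec_cost), 1)
print(f"fitted exponent b = {b:.2f}")  # close to the ~-0.3 reported above
```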

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.127-146
    • /
    • 2022
  • Recently, as word embedding has shown excellent performance in various tasks of deep learning-based natural language processing, research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these directions, cross-language transfer, which enables semantic exchange between different languages, is growing alongside the development of embedding models. Academic interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applied to mapping between specialized domains and generalized domains. In other words, it should become possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model learned from a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between mutually different specialized fields. However, since the linear vector alignment that has mainly been studied in academia assumes statistical linearity, it tends to oversimplify the vector space. It essentially assumes that different vector spaces are geometrically similar, which causes inevitable distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of the data. The proposed methodology consists of the sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embeddings expressed in each space to the general embedding space. Finally, through inference with the two trained models, the specialized vocabulary can be aligned in the general space. To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to existing linear vector alignment.
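
A rough PyTorch sketch of the sequential scheme described above: a skip-connected autoencoder is trained on the specialized embeddings first, and a regression model is then trained to map the autoencoder's output into the general space. Dimensions, layer sizes, and the single training step shown are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SkipAutoencoder(nn.Module):
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x)) + x  # skip connection

ae = SkipAutoencoder()
reg = nn.Sequential(nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, 300))

x_special = torch.randn(8, 300)  # specialized-domain word vectors (random stand-ins)
y_general = torch.randn(8, 300)  # the same words' general-space vectors

# Step 1: train the autoencoder on reconstruction (one step shown).
opt_ae = torch.optim.Adam(ae.parameters())
nn.functional.mse_loss(ae(x_special), x_special).backward()
opt_ae.step()

# Step 2: train the regression model to reach the general space,
# keeping the autoencoder's output fixed (detached).
opt_reg = torch.optim.Adam(reg.parameters())
nn.functional.mse_loss(reg(ae(x_special).detach()), y_general).backward()
opt_reg.step()
```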

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but such a model cannot capture the correlation between input units efficiently, since it is a probabilistic model based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean texts. We constructed the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras based on Theano. After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, each paired with the following 21st character as output. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 ratio. All the simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest training time for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, but its validation loss and perplexity were not significantly improved, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model.
Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for the processing of Korean in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
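
The stacked-LSTM setup described above (20 one-hot phoneme inputs predicting the 21st, vocabulary of 74 characters) can be sketched as follows. The layer width and the modern tf.keras API are assumptions; the original work used Keras on Theano and also compared a 4-layer variant.

```python
from tensorflow.keras import layers, models

VOCAB, SEQ_LEN = 74, 20  # 74 unique characters, 20-character context

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, VOCAB)),       # one-hot encoded input
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256, return_sequences=True),
    layers.LSTM(256),                           # 3 stacked LSTM layers
    layers.Dense(VOCAB, activation="softmax"),  # next-character distribution
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```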

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and we now live in the age of big data. SNS data satisfies the defining conditions of big data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). If one can discover the trend of an issue in SNS big data, this information can be used as an important new source for creating value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS big data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic for the duration of a month; (3) present the importance of a topic through a treemap based on a score system and frequency; (4) visualize the daily time-series graph of keywords retrieved by keyword search. The present study analyzes the big data generated by SNS in real time. SNS big data analysis requires various natural language processing techniques, including the removal of stop words and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale up from single-node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the age of big data, visualization is attractive to the big data community because it helps analysts examine data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to data; the interaction between data is easy and useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, made of pre-configured plug-in style sheets and JavaScript libraries, to build the web system. The TITS graphical user interface (GUI) is designed using these libraries and is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
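
At the core of TITS is a topic model over daily tweet batches. The snippet below is a small LDA sketch with scikit-learn; the tweets are placeholders and the parameters are assumptions, since the paper's exact model configuration is not given here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = ["subway strike downtown again today",      # placeholder tweets
          "new phone release rumor this spring",
          "strike talks resume at city hall"]

X = CountVectorizer(stop_words="english").fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # per-tweet topic mixtures, usable for daily ranking
```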

Current status and future of insect smart factory farm using ICT technology (ICT기술을 활용한 곤충스마트팩토리팜의 현황과 미래)

  • Seok, Young-Seek
    • Food Science and Industry
    • /
    • v.55 no.2
    • /
    • pp.188-202
    • /
    • 2022
  • In the insect industry, as the scope of application of insects expands from pet insects and natural enemies to feed, edible, and medicinal insects, the demand for quality control of insect raw materials is increasing, as is interest in securing the safety of insect products. In the process of expanding the industrial scale, controlling the temperature, humidity, and air quality in the insect breeding room and preventing the spread of pathogens and other pollutants are important success factors, which require a controlled environment under an operating system. European commercial insect breeding facilities have attracted considerable investor interest, and insect companies are building large-scale production facilities, which became possible after the EU approved the use of insect protein as feedstock for fish farming in July 2017. Other fields, such as food and medicine, have also accelerated the application of cutting-edge technology. In the future, the global insect industry is expected to be increasingly subdivided into systems that purchase eggs or small larvae from suppliers and focus on larval fattening, i.e., the production of raw material, until the insects mature; systems that handle the entire production process from egg laying and harvesting to the initial pre-treatment of larvae; and large-scale production systems that cover all stages of insect larvae production plus further processing steps such as milling, fat removal, and protein or fat fractionation. In Korea, the research and development of insect smart factory farms using artificial intelligence and ICT is accelerating, so insects can be used as carbon-free materials in secondary industries, such as natural plastics or natural molding materials, as well as in existing feed and food. A Korean-style customized breeding system for shortening the breeding period or enhancing functionality is expected to be developed soon.