• Title/Summary/Keyword: 정형모델 (formal model)

Search Results: 569, Processing Time: 0.027 seconds

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata
    • /
    • v.6 no.1
    • /
    • pp.63-81
    • /
    • 2021
  • The performance of natural language processing (NLP) is improving rapidly thanks to the recent development and application of machine learning and deep learning technologies, and as a result, its field of application is expanding. In particular, as demand for the analysis of unstructured text data increases, so does interest in NLP. However, the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theory still pose high barriers to the use of NLP. In this paper, to support an overall understanding of NLP, we examine the main fields of NLP currently under active research and the state of the major technologies centered on machine learning and deep learning, providing a foundation for understanding and utilizing NLP more easily. We first trace the changing place of NLP within artificial intelligence (AI) through changes in the taxonomy of AI technology. The main areas of NLP, which consist of language modeling, text classification, text generation, document summarization, question answering, and machine translation, are explained together with state-of-the-art deep learning models. In addition, the major deep learning models used in NLP are explained, and the data sets and evaluation measures for performance evaluation are summarized. We hope that researchers who want to apply NLP in their own fields will be able to understand its overall technical status and main technologies through this paper.
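As a toy illustration of the "language model" task the survey covers, a count-based bigram model can be sketched in a few lines. Modern NLP uses deep neural models instead; the corpus and probabilities here are purely illustrative.

```python
from collections import Counter

# Toy corpus; real language models train on large text collections.
corpus = "natural language processing makes language useful".split()

# Count bigrams (w, next) and the contexts w that have a successor.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(word, nxt):
    """Maximum-likelihood estimate of P(next | word) from bigram counts."""
    return bigrams[(word, nxt)] / unigrams[word]

print(p_next("language", "processing"))  # -> 0.5 ("language" is followed by
                                         #    "processing" once out of twice)
```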

Development of an Efficiency Calibration Model Optimization Method for Improving In-Situ Gamma-Ray Measurement for Non-Standard NORM Residues (비정형 공정부산물 In-Situ 감마선 측정 정확도 향상을 위한 효율교정 모델 최적화 방법 개발)

  • WooCheol Choi;Tae-Hoon Jeon;Jung-Ho Song;KwangPyo Kim
    • Journal of Radiation Industry
    • /
    • v.17 no.4
    • /
    • pp.471-479
    • /
    • 2023
  • In in-situ radioactivity measurement, efficiency calibration relies on predefined models that simulate a sample's geometry and radioactivity distribution. However, simplified efficiency calibration models introduce uncertainty into the efficiency curves, which in turn affects the derived radioactivity concentrations. This study aims to develop an efficiency calibration optimization methodology to improve the accuracy of in-situ gamma-ray measurements of byproducts from industrial facilities. To this end, a drive mechanism for rotational measurement of a byproduct simulator and a sample was constructed. Using ISOCS, an efficiency calibration model of the designed object was generated. A sensitivity analysis of the efficiency calibration model was then performed, and the model's efficiency curve was optimized using the results. Finally, the radioactivity concentration of the simulated subject was estimated and compared against the designed certification value. For the sensitivity assessment of the factors influencing the efficiency calibration model, the ISOCS Uncertainty Estimator (IUE) was applied to the horizontal and vertical size and the density of the measured object. The standard deviation of the measurement efficiency as a function of the longitudinal size and density of the efficiency calibration model decreased with increasing energy. With the optimized efficiency calibration model, the measurement efficiency obtained using IUE was improved over that obtained using ISOCS alone at the 911 keV energy of 228Ac, the nuclide under analysis. Using the ISOCS efficiency calibration method, the difference between the measured radioactivity concentration and the design value, averaged over the measurement directions of the simulated subject, was 4.1% (range 1% to 10%). With the ISOCS IUE efficiency calibration method, the average difference between the estimated radioactivity concentration and the design value was 3.6% (range 1% to 8%), closer to the design value than with ISOCS alone. In other words, the radioactivity concentration estimated using the optimized efficiency curve was similar to the designed radioactivity concentration. The results of this study can serve as a key basis for developing regulatory technologies for the treatment and disposal of waste generated during the operation, maintenance, and facility replacement of domestic byproduct-generating facilities.
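The reported accuracy figures are averages of per-direction relative differences from the design value; a minimal sketch of that comparison (the concentrations below are hypothetical, not the paper's data):

```python
def relative_difference(measured, design):
    """Percent difference between an estimated concentration and its design value."""
    return abs(measured - design) / design * 100.0

# Hypothetical per-direction estimates against a design value of 100 (arbitrary units)
design = 100.0
estimates = [104.0, 99.0, 92.0, 103.0]
diffs = [relative_difference(m, design) for m in estimates]
mean_diff = sum(diffs) / len(diffs)
print(mean_diff, min(diffs), max(diffs))  # mean 4.0, range 1.0 to 8.0
```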

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized for practical applications in various industries. In the field of business intelligence, text mining has been employed to discover new market and technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting a company's business strategy. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making specific and appropriate information difficult to obtain. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, data related to product information are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales figures for the extracted products are summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as the product-name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional sampling-based methods and methods requiring multiple assumptions. In addition, the level of market category can be adjusted easily and efficiently according to the purpose of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining Word2Vec with another measure such as Jaccard similarity. The product group clustering could also be replaced by other unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they will further improve the performance of the basic model proposed conceptually in this study.
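The grouping-and-summation steps of the pipeline can be sketched as follows, assuming the Word2Vec vectors are already trained. The embeddings, product names, and sales figures below are toy stand-ins, and the 0.95 threshold is illustrative rather than the paper's value.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors;
# in the paper these come from training on ~345k product-name records.
embeddings = {
    "smartphone":   [0.90, 0.10, 0.20],
    "mobile phone": [0.85, 0.15, 0.25],
    "rice cooker":  [0.10, 0.90, 0.30],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_group(index_word, threshold=0.95):
    """Collect product names whose vectors are similar to a KSIC-style index word."""
    seed = embeddings[index_word]
    return [w for w, v in embeddings.items() if cosine(seed, v) >= threshold]

def market_size(group, sales):
    """Sum individual companies' sales over the extracted product group."""
    return sum(sales.get(name, 0) for name in group)

sales = {"smartphone": 120, "mobile phone": 80, "rice cooker": 50}
print(market_size(build_group("smartphone"), sales))  # -> 200
```

Raising or lowering the threshold widens or narrows the product category, which is how the paper adjusts the granularity of the estimated market.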

Analysis on the Snow Cover Variations at Mt. Kilimanjaro Using Landsat Satellite Images (Landsat 위성영상을 이용한 킬리만자로 만년설 변화 분석)

  • Park, Sung-Hwan;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.4
    • /
    • pp.409-420
    • /
    • 2012
  • Since the Industrial Revolution, CO2 levels have been increasing along with climate change. In this study, we quantitatively analyze the time-series changes in snow cover and statistically predict its vanishing point using remote sensing. The study area is Mt. Kilimanjaro, Tanzania. Twenty-three Landsat-5 TM and Landsat-7 ETM+ images, spanning the 27 years from June 1984 to July 2011, were acquired. First, atmospheric correction was performed on each image using the COST atmospheric correction model. Second, the snow cover area was extracted using the NDSI (Normalized Difference Snow Index) algorithm. Third, the minimum elevation of the snow cover was determined using the SRTM DEM. Finally, the vanishing point of the snow cover was predicted using a linear trend line. The analysis was performed on both the full set of 23 images and a 17-image dry-season subset. Results show that the snow cover area decreased by approximately 6.47 km², from 9.01 km² to 2.54 km², a 73% reduction. The minimum elevation of snow cover increased by approximately 290 m, from 4,603 m to 4,893 m. The trend lines show that the snow cover area decreased by approximately 0.342 km² per year in the dry season and 0.421 km² per year overall, while the minimum elevation of snow cover increased by approximately 9.848 m per year in the dry season and 11.251 m per year overall. Based on this vanishing-point analysis, there will be no snow cover by 2020 at the 95% confidence level. This study can support the monitoring of global climate change by providing the change in snow cover area and reference data for future research on this or similar areas.
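The NDSI snow test and the trend-line extrapolation can be sketched as follows; the 0.4 threshold and the toy area series are assumptions, not the paper's exact values.

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index (Landsat TM: band 2 green, band 5 SWIR)."""
    return (green - swir) / (green + swir)

def is_snow(green, swir, threshold=0.4):
    # 0.4 is a commonly used NDSI snow threshold; the paper's exact
    # threshold is not stated in the abstract.
    return ndsi(green, swir) > threshold

def vanishing_year(years, areas):
    """Least-squares linear trend of snow-cover area; year where it reaches zero."""
    n = len(years)
    my, ma = sum(years) / n, sum(areas) / n
    slope = (sum((y - my) * (a - ma) for y, a in zip(years, areas))
             / sum((y - my) ** 2 for y in years))
    intercept = ma - slope * my
    return -intercept / slope

# Hypothetical, perfectly linear area series (km^2) for illustration only
years, areas = [1984, 1994, 2004], [7.2, 5.2, 3.2]
print(vanishing_year(years, areas))  # ~2020
```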

An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.63-88
    • /
    • 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis of the image of North Korea represented in South Korean mass media. Based on that data, we analyze the status of text mining research, using text mining techniques to find its topics, methods, and trends. We also investigate the characteristics and analytical methods of the text mining techniques, confirmed by analysis of the data. In this study, the R language, free software for statistical computing and graphics, was used to apply the text mining techniques. Text mining makes it possible to highlight the most frequently used keywords in a body of text, for example as a word cloud (also referred to as a text cloud or tag cloud). This study proposes a procedure to find meaningful tendencies based on a combination of word clouds and co-occurrence networks. It aims to explore more objectively the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea in newspaper big data from November 1, 2016 to May 23, 2019. We divided this span into three periods in light of recent inter-Korean relations. The time before January 1, 2018 was set as the Before Peace-Building Phase. From January 1, 2018 to February 24, 2019, we defined a Peace-Building Phase, during which Kim Jong-un's New Year's message and the PyeongChang Olympics formed an atmosphere of peace on the Korean peninsula. The third period, after the Hanoi summit, was marked by silence in the relationship between North Korea and the United States and was therefore called the Depression Phase of Peace Building. This study analyzes news articles related to North Korea from the Korea Press Foundation database (www.bigkinds.or.kr) through text mining, to investigate the characteristics of the Kim Jong-un regime's South Korea policy and unification discourse. The main results show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. In particular, the study examines changes in international circumstances, domestic conflicts, living conditions in North Korea, the South's aid projects for the North, inter-Korean conflicts, the North Korean nuclear issue, and the North Korean refugee problem through co-occurrence word analysis. It also offers an analysis of South Korean attitudes toward North Korea in terms of semantic prosody. In the Before Peace-Building Phase, the analysis extracted keywords in the order 'Missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', and 'South-North Korean'. The Peace-Building Phase yielded the order 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. The Depression Phase of Peace Building yielded the order 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. Sixteen words were adopted in all three periods, including, in order: 'Missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyzing trends in news content about North Korea in connection with North Korea's provocations, and that future research on North Korean trends will build on these results. We will continue to develop a model for measuring North Korea risk that can anticipate and respond to North Korea's behavior in advance, and we expect that these text mining and scientific data analysis techniques will be applied to North Korea and unification research. Through such academic work, we hope to see many studies that make important contributions to the nation.
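The word cloud and co-occurrence network both rest on simple counts over tokenized articles; a minimal sketch, with hypothetical English tokens standing in for the Korean corpus:

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for tokenized news articles (hypothetical tokens);
# the paper works on Korean articles from the BigKinds database.
articles = [
    ["missile", "north_korea_nuclear", "diplomacy"],
    ["missile", "diplomacy", "unification"],
    ["north_korea_nuclear", "missile", "sanctions"],
]

# Keyword frequencies: the basis of a word cloud.
freq = Counter(tok for doc in articles for tok in doc)

# Document-level co-occurrence counts: the edge weights of a co-occurrence network.
cooc = Counter()
for doc in articles:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

print(freq.most_common(3))                        # top word-cloud keywords
print(cooc[("missile", "north_korea_nuclear")])   # -> 2 (co-occur in two articles)
```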

Observation of Ice Gradient in Cheonji, Baekdu Mountain Using Modified U-Net from Landsat -5/-7/-8 Images (Landsat 위성 영상으로부터 Modified U-Net을 이용한 백두산 천지 얼음변화도 관측)

  • Lee, Eu-Ru;Lee, Ha-Seong;Park, Sun-Cheon;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_2
    • /
    • pp.1691-1707
    • /
    • 2022
  • Cheonji Lake, the caldera lake of Baekdu Mountain on the border between the Korean Peninsula and China, alternates seasonally between melting and freezing. There is a magma chamber beneath Cheonji, and variations in the magma chamber cause volcanic precursors such as changes in the temperature and water pressure of the hot spring water. Consequently, there is an anomalous region in Cheonji where the ice melts more quickly than in other areas, freezes late even during the freezing period, and has a high-temperature water surface. This anomalous area is a discharge region for hot spring water, and its ice gradient may be used to monitor volcanic activity. However, due to geographical, political, and spatial issues, periodic observation of the anomalous region of Cheonji is limited. In this study, the degree of ice change in the anomalous region was quantified using Landsat-5/-7/-8 optical satellite images and a Modified U-Net regression model. From January 22, 1985 to December 8, 2020, the Visible and Near Infrared (VNIR) bands of 83 Landsat images covering the anomalous region were utilized. Using the relative spectral reflectance of water and ice in the VNIR bands, a unique dataset was generated for quantitative monitoring of ice variability. To preserve as much information as possible from the visible and near-infrared bands, the ice gradient was estimated by feeding them into a U-Net with two encoders, achieving good prediction accuracy with a Root Mean Square Error (RMSE) of 140 and a correlation coefficient of 0.9968. Since the ice change value can be estimated with high precision from Landsat images using the Modified U-Net, the method may serve as one way to monitor Baekdu Mountain's volcanic activity in the future, and a more specific volcano monitoring system can be built on it.
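The abstract does not spell out how the water/ice reflectance contrast is turned into an ice-change value, but a simple linear-mixing sketch conveys the idea; the reference reflectances and the mixing form are assumptions, not the paper's method (which uses a two-encoder U-Net regression).

```python
# Hypothetical reference reflectances for open water and ice in one VNIR band;
# ice is much brighter than water in the visible/near-infrared.
R_WATER = 0.05
R_ICE = 0.60

def ice_fraction(reflectance):
    """Linearly interpolate a per-pixel ice fraction between the water
    and ice reference reflectances, clamped to [0, 1]."""
    f = (reflectance - R_WATER) / (R_ICE - R_WATER)
    return min(1.0, max(0.0, f))

def ice_gradient(pixels):
    """Average ice fraction over the anomalous-region pixels: a simple
    scalar 'degree of ice change' per scene."""
    return sum(ice_fraction(r) for r in pixels) / len(pixels)

print(ice_gradient([0.05, 0.60]))  # half water, half ice -> 0.5
```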

A Study on Intelligent Skin Image Identification From Social media big data

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.9
    • /
    • pp.191-203
    • /
    • 2022
  • In this paper, we developed a system that intelligently identifies skin image data in big data collected from the social medium Instagram and extracts standardized skin sample data for skin condition diagnosis and management. The proposed system consists of a big data collection and analysis stage, a skin image analysis stage, a training data preparation stage, an artificial neural network training stage, and a skin image identification stage. In the big data collection and analysis stage, big data are collected from Instagram and image information for skin condition diagnosis and management is stored as the analysis result. In the skin image analysis stage, the evaluation and analysis results of each skin image are obtained using traditional image processing techniques. In the training data preparation stage, training data are prepared by extracting skin sample data from the skin image analysis results. In the artificial neural network training stage, an artificial neural network, AnnSampleSkin, that intelligently predicts the skin image type is built and trained on these data. In the skin image identification stage, skin samples are extracted from images collected from social media, and the type predictions of the trained network AnnSampleSkin are integrated to intelligently identify the final skin image type. The proposed skin image identification method shows high identification accuracy of about 92% or more and can provide standardized skin sample image big data. The extracted skin sample sets are expected to be used as standardized skin image data that are highly efficient and useful for diagnosing and managing skin conditions.
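The final identification stage integrates per-sample predictions into one image-level type. The abstract does not state the exact aggregation rule, so here is a majority-vote sketch with hypothetical labels:

```python
from collections import Counter

def identify_image_type(sample_predictions):
    """Integrate per-sample type predictions into a final image type by
    majority vote; an illustrative stand-in for the paper's unspecified
    integration step."""
    label, _count = Counter(sample_predictions).most_common(1)[0]
    return label

# Hypothetical per-sample outputs of the trained network for one Instagram image
preds = ["dry", "dry", "oily", "dry", "sensitive"]
print(identify_image_type(preds))  # -> "dry"
```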

Estimating Optimal Timber Production for the Economic and Public Functions of the National Forests in South Korea (국유림의 경제적·공익적 기능을 고려한 적정 목재생산량 추정)

  • Yujin Jeong;Younghwan Kim;Yoonseong Chang;Dooahn Kwak;Gihyun Park;Dayoung Kim;Hyungsik Jeong;Hee Han
    • Journal of Korean Society of Forest Science
    • /
    • v.112 no.4
    • /
    • pp.561-573
    • /
    • 2023
  • National forests have an advantage over private forests in terms of higher investment of capital, technology, and labor, allowing for more intensive management. As such, national forests are expected to serve not only as a strategic reserve of forest resources to address the long-term demand for timber but also to perform stably the various essential forest functions demanded by society. However, most forest stands in the current national forests belong to the fourth age class or above, indicating an imminent timber harvesting period amid an imbalanced age-class structure. Therefore, if timber harvesting is not conducted based on systematic management planning, it will become difficult to ensure the continuity of the national forests' diverse functions. This study was conducted to determine the optimal volume of timber production in the national forests to improve the age-class structure while sustainably maintaining their economic and public functions. To achieve this, the study first identified areas within the national forests suitable for timber production. Subsequently, a forest management planning model was developed using multi-objective linear programming, taking into account both the national forests' economic role and their public benefits. The findings suggest that approximately 488,000 hectares within the national forests are suitable for timber production. By focusing management on these areas, it is possible not only to improve the age-class distribution but also to sustainably uphold the forests' public benefits. Furthermore, the potential volume of timber production from the national forests for the next 100 years would be around 2 million m³ per year, constituting about 44% of the annual domestic timber supply.
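A multi-objective linear program of the kind described can be written, for illustration, in a weighted-sum harvest-scheduling form; the symbols, weights, and constraints below are assumptions for exposition, not the paper's exact model.

```latex
% x_{ij}: area of stand i assigned to management prescription j
% v_{ij}: discounted timber revenue per ha;  p_{ij}: public-benefit score per ha
% A_i: area of stand i;  H_t: timber harvested in period t
\begin{aligned}
\max_{x} \quad & w_1 \sum_i \sum_j v_{ij}\, x_{ij} \;+\; w_2 \sum_i \sum_j p_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_j x_{ij} = A_i \quad \forall i
    && \text{(all area of each stand is assigned)} \\
& (1-\varepsilon)\, H_t \le H_{t+1} \le (1+\varepsilon)\, H_t \quad \forall t
    && \text{(even timber flow between periods)} \\
& x_{ij} \ge 0 .
\end{aligned}
```

The weights $w_1, w_2$ trade off the economic and public objectives; the even-flow constraint is one common way to push an imbalanced age-class structure toward a sustained yield.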

Analysis of the Effect of Corner Points and Image Resolution in a Mechanical Test Combining Digital Image Processing and Mesh-free Method (디지털 이미지 처리와 강형식 기반의 무요소법을 융합한 시험법의 모서리 점과 이미지 해상도의 영향 분석)

  • Junwon Park;Yeon-Suk Jeong;Young-Cheol Yoon
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.37 no.1
    • /
    • pp.67-76
    • /
    • 2024
  • In this paper, we present a DIP-MLS testing method that combines digital image processing (DIP) with a strong form-based MLS difference method to measure mechanical variables, and we analyze the impact of target location and image resolution. The method assesses the displacement of targets attached to the specimen through digital image processing and assigns these displacements to the node displacements of the MLS difference method, which employs only nodes to calculate mechanical variables such as the stress and strain of the analyzed object. We propose an effective method of measuring the displacement of a target's center of gravity using digital image processing. Computing mechanical variables through the MLS difference method with image-based target displacements makes it easy to evaluate mechanical variables at arbitrary positions, free of constraints from meshes or grids, by acquiring an accurate displacement history of the test specimen from low-rigidity tracking targets. The developed method was validated by comparing sensor measurements with DIP-MLS results in a three-point bending test of a rubber beam. Numerical results simulated by the MLS difference method alone were also compared, confirming that the developed method accurately reproduces the actual test and agrees well with the numerical results before large deformation occurs. Furthermore, we analyzed the effect of boundary points by applying 46 tracking points, including corner points, to the DIP-MLS testing method and comparing this with using only the target's internal points, and we determined the optimal image resolution for the method. Through this, we demonstrated that the developed method efficiently addresses the limitations of direct experiments and existing mesh-based simulations, and that digitalization of the experiment-simulation process is achievable to a considerable extent.
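The image-processing half of the method reduces to tracking each target's center of gravity between frames. A minimal sketch using intensity-weighted moments follows; the threshold and the toy frames are illustrative, not the paper's detection pipeline.

```python
def target_centroid(image, threshold=0.5):
    """Intensity-weighted centroid (center of gravity) of target pixels in a
    grayscale image given as a list of rows; pixels below the threshold are
    treated as background."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v >= threshold:
                m00 += v          # zeroth moment: total intensity
                m10 += x * v      # first moment in x
                m01 += y * v      # first moment in y
    return m10 / m00, m01 / m00   # (cx, cy)

def displacement(c0, c1):
    """Target displacement between two frames, the quantity assigned to the
    MLS nodes in the DIP-MLS method."""
    return c1[0] - c0[0], c1[1] - c0[1]

frame0 = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 0, 0]]
c0 = target_centroid(frame0)   # (1.5, 1.5)
c1 = target_centroid(frame1)   # (2.5, 1.5)
print(displacement(c0, c1))    # -> (1.0, 0.0): target moved one pixel right
```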