• Title/Summary/Keyword: vectorization

A Study on the Asphalt Road Boundary Extraction Using Shadow Effect Removal (그림자영향 소거를 통한 아스팔트 도로 경계추출에 관한 연구)

  • Yun Kong-Hyun
    • Korean Journal of Remote Sensing
    • /
    • v.22 no.2
    • /
    • pp.123-129
    • /
    • 2006
  • High-resolution aerial color imagery offers great possibilities for deriving geometric and semantic information for spatial data generation. However, shadows cast by buildings and trees in high-density urban areas obscure much of the information in the image, giving rise to potentially inaccurate classification and inexact feature extraction. Although many studies have addressed shadow casting, few have dealt with extracting features hidden by shadows in aerial color images of urban areas. This paper presents an asphalt road boundary extraction technique that combines information from aerial color imagery and LIDAR (LIght Detection And Ranging) data. The following steps are performed to remove shadow effects and extract road boundaries from the image. First, the shadow regions of the aerial color image are precisely located using the LIDAR DSM (Digital Surface Model) and the solar position. Second, shadow regions assumed to be road are corrected by a shadow path reconstruction algorithm. Next, asphalt road boundaries are extracted by segmentation and edge detection. Finally, the asphalt road boundary lines are converted to vector data by a vectorization technique. The experimental results showed that this approach is effective and has great potential.
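
The abstract's first step, locating shadow regions from a DSM and the solar position, can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal ray-marching example in Python, and the parameter names (cell_size, max_dist) and the brute-force per-cell loop are assumptions made purely for illustration.

```python
import numpy as np

def shadow_mask(dsm, cell_size, sun_azimuth_deg, sun_elevation_deg, max_dist=200.0):
    """Flag DSM cells that lie in cast shadow for a given solar position.

    dsm: 2-D array of surface heights (m); cell_size: ground resolution (m).
    A cell is shadowed if, marching from the cell toward the sun, some surface
    point rises above the sun ray leaving that cell. (Brute-force sketch.)
    """
    az = np.radians(sun_azimuth_deg)        # azimuth assumed clockwise from north
    tan_elev = np.tan(np.radians(sun_elevation_deg))
    drow, dcol = -np.cos(az), np.sin(az)    # grid direction toward the sun
    steps = np.arange(1, int(max_dist / cell_size))

    rows, cols = dsm.shape
    shadow = np.zeros_like(dsm, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            rr = np.round(r + steps * drow).astype(int)
            cc = np.round(c + steps * dcol).astype(int)
            ok = (rr >= 0) & (rr < rows) & (cc >= 0) & (cc < cols)
            ray = dsm[r, c] + steps[ok] * cell_size * tan_elev   # sun-ray height
            shadow[r, c] = np.any(dsm[rr[ok], cc[ok]] > ray)
    return shadow
```

Intersecting such a mask with road-colored pixels would give the shadowed road regions that the paper's reconstruction step then corrects.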

An analysis study on the quality of article to improve the performance of hate comments discrimination (악성댓글 판별의 성능 향상을 위한 품사 자질에 대한 분석 연구)

  • Kim, Hyoung Ju;Min, Moon Jong;Kim, Pan Koo
    • Smart Media Journal
    • /
    • v.10 no.4
    • /
    • pp.71-79
    • /
    • 2021
  • One of the social aspects that has changed as Internet use becomes widespread is communication in online spaces. In the past, remote conversation was possible only one-on-one, except when people were physically in the same space, but technology has since developed to enable remote communication with large numbers of people through bulletin boards, online communities, and social network services. The development of such information and communication networks makes life more convenient, but at the same time the damage caused by rapid information exchange is constantly increasing. Recently, cybercrimes such as sending sexual messages or personal attacks to well-known people on the Internet, including not only entertainers but also influencers, have occurred, and some of those exposed to these crimes have committed suicide. In this paper, in order to reduce the damage caused by malicious comments, we study a method for improving the performance of malicious-comment discrimination through feature extraction based on parts of speech.
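
As a rough illustration of POS-based feature extraction for this kind of task (not the authors' actual pipeline; the tagger, the kept POS classes, and the classifier are all assumptions), one could filter tokens by part of speech before vectorizing:

```python
from konlpy.tag import Okt                      # Korean POS tagger (assumed choice)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

okt = Okt()
KEEP = {"Noun", "Verb", "Adjective"}            # assumed POS classes to keep

def pos_tokens(text):
    # Tag the comment and keep only word/POS pairs whose tag is in KEEP.
    return [f"{word}/{tag}" for word, tag in okt.pos(text) if tag in KEEP]

model = make_pipeline(
    TfidfVectorizer(tokenizer=pos_tokens, token_pattern=None, lowercase=False),
    LogisticRegression(max_iter=1000),          # classifier choice is an assumption
)

# comments: list of raw comment strings; labels: 1 = malicious, 0 = normal
# model.fit(comments, labels)
# model.predict(["새로운 댓글 예시"])
```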

Application of Fractal Dimension on Consistent Calculation of Coastline Length - Focused on Jeju Island (일관된 해안선 길이 산출을 위한 프랙탈 차원 적용 방안 연구 - 제주도를 중심으로 -)

  • Woo, Hee Sook;Kwon, Kwang Seok;Kim, Byung Guk;Cho, Seck Hyun
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.24 no.4
    • /
    • pp.83-88
    • /
    • 2016
  • The use of consistent coastlines is an important element for the systematic management of maritime boundaries and for the interests of local governments. The Hydrographic and Oceanographic Agency has conducted preliminary surveys for consistent coastline production since 2001; nevertheless, the reported coastline length has differed from year to year because of the lack of systematic management, the use of incorrect data, and so on. The coastline on the sea chart was also changed for display on digital maps to realize the terrain expression method, but shoreline length still varied with the surveying techniques and shoreline extraction methods used. In this paper, the characteristics of the Jeju-do coastline were analysed using a modified divider method for fractal dimension. The accuracy of the vectorization was determined by converting to actual distance according to the Public Survey Amendment for proper divider use. Fractal dimensions were calculated from the 1:5,000 and 1:25,000 digital maps of Jeju-si and Seogwipo-si: Jeju-si = 1.14 and Seogwipo-si = 1.12 at 1:5,000, and Jeju-si = 1.13 and Seogwipo-si = 1.10 at 1:25,000. The calculated fractal dimensions were correlated with the digital map data, which is considered to reflect the complexity and scale of the coastlines. In the future, coastline length statistics and a minimum ratio of calculated coastline length to original length need to be determined for consistency of coastline length statistics.
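
The modified divider method the abstract refers to is a variant of the classical divider (ruler) method: walk the coastline with a fixed divider length r, count the steps N(r), repeat for several r, and estimate the fractal dimension D from N(r) ∝ r^(-D). A minimal sketch of the classical method (not the paper's modified version) might look like this:

```python
import numpy as np

def divider_count(points, r):
    """Walk a coastline polyline with a fixed divider length r and count steps.

    points: ordered (x, y) vertices in map units; assumes r > 0 and no
    duplicate vertices. The final partial step is ignored in this sketch.
    """
    pts = np.asarray(points, dtype=float)
    pos, i, count = pts[0], 0, 0
    while i < len(pts) - 1:
        j = i
        # advance along the polyline until the next vertex is at least r away
        while j < len(pts) - 1 and np.linalg.norm(pts[j + 1] - pos) < r:
            j += 1
        if j >= len(pts) - 1:
            break                                 # remainder shorter than r
        a, d = pts[j], pts[j + 1] - pts[j]
        f = a - pos
        # solve |f + t*d| = r for t in (0, 1]: next divider point on segment j
        A, B, C = d @ d, 2 * (f @ d), f @ f - r * r
        t = (-B + np.sqrt(B * B - 4 * A * C)) / (2 * A)
        pos, i, count = a + t * d, j, count + 1
    return count

def fractal_dimension(points, radii):
    """Estimate D from N(r) ~ r**(-D); radii must be smaller than the extent."""
    counts = [divider_count(points, r) for r in radii]
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return -slope
```

A straight coast gives D close to 1, while a highly indented coast gives a larger D, which is why the paper can compare the complexity of the Jeju-si and Seogwipo-si coastlines through their dimensions.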

A Study on Stroke Extraction for Handwritten Korean Character Recognition (필기체 한글 문자 인식을 위한 획 추출에 관한 연구)

  • Choi, Young-Kyoo;Rhee, Sang-Burm
    • The KIPS Transactions:PartB
    • /
    • v.9B no.3
    • /
    • pp.375-382
    • /
    • 2002
  • Handwritten character recognition is classified into on-line and off-line handwritten character recognition. On-line recognition has produced remarkable results compared with off-line recognition because it can acquire dynamic writing information, such as the stroke order and stroke positions, through a pen-based electronic input device such as a tablet. In contrast, no dynamic information can be acquired in off-line handwritten character recognition; consonants and vowels overlap heavily and the images between strokes are noisy, so recognition performance depends on the result of preprocessing. This paper proposes a method that effectively extracts strokes, including dynamic information, for off-line handwritten Korean character recognition. First, the input handwritten character image is enhanced and binarized in a preprocessing step using the watershed algorithm. Next, the skeleton is extracted using a modified Lu and Wang thinning algorithm, and segment pixel arrays are extracted from the feature points of the character. The segments are then vectorized with a maximum permissible error method; when several strokes are bound into one segment, the segment pixel array is divided into two or more segment vectors. To reconstruct the extracted segment vectors into complete strokes, the directional component of each vector is modified using a right-hand writing coordinate system, and adjacent segment vectors that can be combined are merged, producing complete strokes suitable for character recognition. Experiments verify that the proposed method is suitable for handwritten Korean character recognition.
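
The "maximum permissible error" vectorization described in the abstract is closely related to recursive polyline simplification. The sketch below uses a Douglas-Peucker-style formulation as a stand-in (the paper's exact criterion may differ): a segment pixel array is replaced by one vector unless some pixel deviates from it by more than the permitted error, in which case the array is split at the worst pixel and each half is vectorized recursively.

```python
def point_segment_error(p, a, b):
    """Perpendicular distance from pixel p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dx * (py - ay) - dy * (px - ax)) / (dx * dx + dy * dy) ** 0.5

def vectorize(pixels, max_error):
    """Approximate an ordered segment pixel array by straight vectors so that
    no pixel deviates from its vector by more than max_error."""
    if len(pixels) < 3:
        return list(pixels)
    errors = [point_segment_error(p, pixels[0], pixels[-1]) for p in pixels[1:-1]]
    worst = max(range(len(errors)), key=lambda i: errors[i])
    if errors[worst] <= max_error:
        return [pixels[0], pixels[-1]]            # one vector is enough
    k = worst + 1                                 # split at the worst pixel
    left = vectorize(pixels[:k + 1], max_error)
    right = vectorize(pixels[k:], max_error)
    return left[:-1] + right                      # drop the duplicated split point
```

For example, vectorize([(0, 0), (1, 0), (2, 1), (3, 0), (4, 0)], 0.5) returns [(0, 0), (2, 1), (4, 0)], splitting the pixel array at the point that violates the error bound.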

IoT data processing techniques based on machine learning optimized for AIoT environments (AIoT 환경에 최적화된 머신러닝 기반의 IoT 데이터 처리 기법)

  • Jeong, Yoon-Su;Kim, Yong-Tae
    • Journal of Industrial Convergence
    • /
    • v.20 no.3
    • /
    • pp.33-40
    • /
    • 2022
  • Recently, IoT-linked services have been used in various environments, and IoT and artificial intelligence technologies are being fused. However, technologies that process IoT data stably are not yet fully supported, so research on this is needed. In this paper, we propose a processing technique that can optimize IoT data after generating embedded vectors for the IoT data based on machine learning. In the proposed technique, for processing efficiency, embedded vectorization is performed based on QR over attributes such as the index of the IoT data, the collection location (binary values of the X- and Y-axis coordinates), the group index, and the type. In addition, data generated by various IoT devices are integrated and managed so that load balancing can be performed in the IoT data collection process and IoT data can be linked asymmetrically. The proposed technique orthogonalizes IoT data based on a hash so that the data can be grouped asymmetrically, and interference between IoT data is minimized because data are periodically generated and grouped according to IoT data types and characteristics. Future research will compare and evaluate the proposed technique in various environments that provide IoT services.
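
The abstract leaves the embedding scheme underspecified; the following is only a rough sketch of one plausible reading, with QR interpreted as a QR factorization and the record field names and hash bucketing invented purely for illustration.

```python
import hashlib
import numpy as np

def embed(records):
    """records: list of dicts with numeric fields index, x, y, group, type
    (field names and encoding are assumptions, not the paper's schema)."""
    M = np.array([[r["index"], r["x"], r["y"], r["group"], r["type"]]
                  for r in records], dtype=float)
    # Reduced QR factorization of the attribute matrix: each row of Q gives the
    # record's coordinates with respect to an orthonormal basis of the
    # attribute (column) space.
    Q, _ = np.linalg.qr(M)
    return Q

def group_key(record, n_buckets=16):
    """Hash-based bucket so records with the same type/location group together."""
    key = f'{record["type"]}:{record["x"]}:{record["y"]}'.encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % n_buckets
```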

Automatic scoring of mathematics descriptive assessment using random forest algorithm (랜덤 포레스트 알고리즘을 활용한 수학 서술형 자동 채점)

  • Inyong Choi;Hwa Kyung Kim;In Woo Chung;Min Ho Song
    • The Mathematical Education
    • /
    • v.63 no.2
    • /
    • pp.165-186
    • /
    • 2024
  • Despite the growing attention on artificial intelligence-based automated scoring technology as a way to support the introduction of descriptive items in school environments and large-scale assessments, there is a noticeable lack of foundational research in mathematics compared to other subjects. This study developed an automated scoring model for two descriptive items in first-year middle school mathematics using the Random Forest algorithm, evaluated its performance, and explored ways to enhance that performance. The accuracy of the final models for the two items was 0.95 to 1.00 and 0.73 to 0.89, respectively, which is relatively high compared to automated scoring models in other subjects. We found that strategically selecting the number of evaluation categories, taking the amount of data into account, is crucial for the effective development and performance of automated scoring models. In addition, text preprocessing by mathematics education experts proved effective in improving both the performance and the interpretability of the automated scoring model. Selecting a vectorization method that matches the characteristics of the items and data was identified as another way to enhance model performance, and we confirmed that oversampling is a useful way to supplement performance when practical limitations prevent balanced data collection. To enhance educational utility, further research is needed on how to use the feature importance derived from the Random Forest-based automated scoring model to generate information useful for teaching and learning, such as feedback. This study is significant as foundational research on the automated scoring of mathematics descriptive assessment, and various follow-up studies through close collaboration between AI experts and mathematics education experts are needed.
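
A minimal sketch of the kind of pipeline the abstract describes, with assumed choices (TF-IDF character n-grams as the vectorizer, imbalanced-learn's RandomOverSampler for oversampling) rather than the authors' exact setup:

```python
from imblearn.over_sampling import RandomOverSampler     # oversampling (assumed tool)
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_scoring_model(responses, scores):
    """responses: preprocessed student answer strings; scores: rubric labels."""
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))   # assumed vectorizer
    X = vec.fit_transform(responses)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, scores, test_size=0.2, stratify=scores, random_state=0)
    # Oversample minority score categories in the training split only.
    X_tr, y_tr = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)
    model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))
    # model.feature_importances_ maps back to vec.get_feature_names_out() and can
    # be inspected as a source of feedback-oriented information.
    return vec, model
```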

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to the application of text-mining techniques to analyze big social media data more rigorously. Even as social media text analysis algorithms have improved, previous approaches still have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies add grammatical factors to the feature sets used to train the classification model. The other adopts a semantic analysis method, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with the more extensive semantic features that were underestimated in existing sentiment analysis. The result of applying the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that the Word2Vec algorithm extracts about three times more words expressing emotion about the keyword than co-occurrence analysis does. The difference comes from Word2Vec's vectorization of semantic features; the Word2Vec algorithm can therefore catch hidden related words that traditional analysis does not find. In addition, Part-Of-Speech (POS) tagging for Korean is used to detect adjectives as "emotion words". The emotion words extracted from the text are converted into word vectors by the Word2Vec algorithm to find related words, and among these related words, nouns are selected because they may have a causal relationship with the emotion word in the sentence. The process of extracting these trigger factors of emotion words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching on three keywords that carry rich public emotion and opinion: professor, prosecutor, and doctor. Advanced data collection was conducted to select secondary keywords for data gathering. The secondary keywords used for each keyword were: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data comprise about 100,000 documents (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225) gathered from news, blogs, and Twitter to reflect various levels of public emotion. Gephi (http://gephi.github.io) was used for visualization, and every program used in text processing and analysis was written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot. Finally, the approach used in this study can be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotion word in a sentence. Future work will clarify the causal relationship between emotion words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, much of the text data used for Emotion Trigger comes from Twitter and has a number of distinctive features that were not dealt with in this study; these features will be considered in further work.
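
The Emotion Trigger steps (train Word2Vec on the corpus, look up words related to an emotion word, keep the nouns as candidate triggers) can be sketched as follows. The authors' programs were written in Java; this re-expression uses gensim and KoNLPy, and the hyperparameters and the topn cut-off are assumptions.

```python
from gensim.models import Word2Vec
from konlpy.tag import Okt

okt = Okt()

def emotion_triggers(documents, emotion_word, topn=30):
    """Find nouns related to an emotion word: candidate 'Emotion Trigger' factors."""
    sentences = [[w for w, _ in okt.pos(doc)] for doc in documents]
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=1)
    related = model.wv.most_similar(emotion_word, topn=topn)   # (word, similarity)
    # Keep only the nouns among the related words, as in the Emotion Trigger step.
    return [(w, sim) for w, sim in related
            if okt.pos(w) and okt.pos(w)[0][1] == "Noun"]
```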