• Title/Summary/Keyword: Korean text classification

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices / v.22 no.6 / pp.290-295 / 2016
  • Recently, folksonomy-based image-sharing sites, where users cooperatively create and use tags to annotate images, have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display the retrieved images as a photo stream. However, the tags are personal and subjective and the images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images that can improve retrieval accuracy in tag-based image retrieval systems. In this scheme, images are classified by semantic similarity using the text and image information generated in the folksonomy. To evaluate the performance of the proposed scheme, we collect folksonomy images and categorize them using text features and image features. We then compare its retrieval accuracy with that of existing systems.

Detecting Rectangular Image Regions in a Window Image for 3D Conversion (3D 변환을 위한 윈도우영상에서 사각 이미지 영역 검출)

  • Gil, Jong In;Lee, Jun Seok;Kim, Manbae
    • Journal of Broadcast Engineering / v.18 no.6 / pp.795-807 / 2013
  • In recent years, 2D-to-3D conversion techniques have attracted much attention. Most conventional methods focus on natural images such as movies and animation. However, it is difficult to apply these techniques to window images that mix text, images, logos, and icons. In addition, differing depth values among text pixels cause distortion, so a proper 3D image cannot be delivered in some situations. To solve this problem, we propose a method that classifies a given image as either a window image or a natural image. For a window image, only rectangular image regions (RIR) are detected and converted to 3D; the remaining text and background are displayed in 2D. The proposed method was tested on more than 10,000 images. In the experiments, the detection ratio for window images reached 97% and the RIR detection ratio was 87%.

Recommendation Method of SNS Following to Category Classification of Image and Text Information (이미지와 텍스트 정보의 카테고리 분류에 의한 SNS 팔로잉 추천 방법)

  • Hong, Taek Eun;Shin, Ju Hyun
    • Smart Media Journal / v.5 no.3 / pp.54-61 / 2016
  • With the development of smart devices, the number of SNS (Social Network Service) users is growing, since these services allow real-time communication and information sharing without limits of distance or space. SNS, originally built around communication and relationships, is now increasingly used for information sharing. In this paper, we propose a following recommendation method that extracts user categories and information providers from SNS posts. In particular, the method classifies the words in the text of each post and measures their frequency, and classifies image words using the Inception-v3 model, a CNN (Convolutional Neural Network) based machine learning technique. By classifying the categories of the words from text and images based on DMOZ, we build an information provider DB. The user categories classified from posts are then compared with the categories in the information provider DB. If a category matches, we measure the similarity to the information providers classified in that category and recommend the account of the most similar information provider.

Proposal for User-Product Attributes to Enhance Chatbot-Based Personalized Fashion Recommendation Service (챗봇 기반의 개인화 패션 추천 서비스 향상을 위한 사용자-제품 속성 제안)

  • Hyosun An;Sunghoon Kim;Yerim Choi
    • Journal of Fashion Business / v.27 no.3 / pp.50-62 / 2023
  • The e-commerce fashion market has experienced remarkable growth, leading to an overwhelming amount of shared information and numerous choices for users. In light of this, chatbots have emerged as a promising technological solution to enhance personalized services in this context. This study aimed to develop user-product attributes for a chatbot-based personalized fashion recommendation service using big data text mining techniques. To accomplish this, over one million consumer reviews from Coupang, an e-commerce platform, were collected and analyzed using frequency analyses to identify the upper-level attributes of users and products. Attribute terms were then assigned to each user-product attribute, including user body shape (body proportion, BMI), user needs (functional, expressive, aesthetic), user TPO (time, place, occasion), product design elements (fit, color, material, detail), product size (label, measurement), and product care (laundry, maintenance). The classification of user-product attributes was found to be applicable to the knowledge graph of the Conversational Path Reasoning model. A testing environment was established to evaluate the usefulness of the attributes based on real e-commerce users and purchased product information. This study is significant in proposing a new research methodology in the field of Fashion Informatics for constructing the knowledge base of a chatbot based on text mining analysis. The proposed research methodology is expected to enhance fashion technology and improve personalized fashion recommendation services and the user experience with chatbots in the e-commerce market.

Speech Emotion Recognition in People at High Risk of Dementia

  • Dongseon Kim;Bongwon Yi;Yugwon Won
    • Dementia and Neurocognitive Disorders / v.23 no.3 / pp.146-160 / 2024
  • Background and Purpose: The emotions of people at various stages of dementia need to be effectively utilized for prevention, early intervention, and care planning. With technology available for understanding and addressing the emotional needs of people, this study aims to develop speech emotion recognition (SER) technology to classify emotions for people at high risk of dementia. Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions via human auditory assessment, the outcomes of which were annotated for a guided deep-learning method. The architecture incorporated a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to develop automated speech emotion recognition. Results: Twenty-seven kinds of emotions were found in the speech of the participants. These emotions were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied using different data sources (voice and text) and varying the number of emotions. Ultimately, a 2-stage algorithm (initial text-based classification followed by voice-based analysis) achieved the highest accuracy, reaching 70%. Conclusions: The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. The fact that the speech of people at high risk of dementia was directed at companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analysis of data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is often required in the era of big data. Thesis summaries, main words, and sub-keywords were crawled from DBPia, a database was created using KoNLP's data dictionary, and words were tokenized through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated analysis dataset. The classification model proposed in this paper can be useful in various fields, such as civil complaint classification and text-related analysis, in addition to thesis classification.
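
The TF-IDF step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the nouns have already been extracted by morpheme analysis; the token lists, the function name `tf_idf`, and the unsmoothed formula (term frequency times log of inverse document frequency) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (hypothetical, pre-extracted nouns).
    Weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical noun lists standing in for tokenized thesis summaries.
docs = [
    ["text", "classification", "model"],
    ["image", "classification", "model"],
    ["text", "mining", "model"],
]
weights = tf_idf(docs)
```

A term that occurs in every document ("model" here) receives weight zero, while a term unique to one document ("mining") receives the largest weight in that document. The resulting vectors are the kind of feature a classifier such as a random forest or SVM could then be trained on.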

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.4 / pp.141-164 / 2018
  • Recently, as the amount of academic literature has increased rapidly and complex research has been actively conducted, researchers have had difficulty analyzing trends in previous research. To solve this problem, it is necessary to classify information at the level of individual academic papers. However, in Korea, there is no academic database that provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical science field written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, in order to construct a learning set capable of multiple classification. The resulting training set comprised 63,915 documents in the Korean technical science field, excluding records with missing metadata. Using this training set, we implemented and trained a deep-learning-based automatic classification engine for academic documents. Experimental results on a manually built test set showed 78.32% accuracy and 72.45% F1 performance for multiple classification.
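
The two reported metrics, accuracy and F1, can be computed as in the following minimal sketch. The abstract does not say how F1 was averaged across classes; macro-averaging is assumed here purely for illustration, and the class labels are hypothetical DDC-style codes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical true and predicted class labels for four documents.
y_true = ["620", "620", "660", "660"]
y_pred = ["620", "660", "660", "660"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Accuracy and F1 can diverge noticeably when classes are imbalanced, which is one possible reason for reporting both.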

Sensitivity Identification Method for New Words of Social Media based on Naive Bayes Classification (나이브 베이즈 기반 소셜 미디어 상의 신조어 감성 판별 기법)

  • Kim, Jeong In;Park, Sang Jin;Kim, Hyoung Ju;Choi, Jun Ho;Kim, Han Il;Kim, Pan Koo
    • Smart Media Journal / v.9 no.1 / pp.51-59 / 2020
  • From the days of PC communication through the development of the Internet, new terms have been coined on social media, and with the spread of smartphones a social media culture has formed in which newly coined words are themselves becoming part of the culture. With social networking sites and smartphones serving as a bridge, the amount of data has been increasing in real time. New words have many advantages, such as allowing short sentences that work around the character limits of various messengers and reduce data. However, new words have no dictionary meaning, which limits and degrades algorithms such as data mining. Therefore, in this paper, we collect data through web crawling, extract the new words contained in the text data, and establish an emotional classification to determine the opinion of a document. The experiment proceeds in three steps. First, new words collected from social media are learned as positive or negative. Next, to derive and verify emotional values using standard-language documents, TF-IDF is used to score the sentiment of nouns, which gives the emotional values of the data. As with the new words, the classified emotional values are applied to verify that emotions are classified in standard-language documents. Finally, the newly coined words and standard emotional values are combined for a comparative analysis of the technique.
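
The naive Bayes classification named in the title can be sketched as follows. This is a generic multinomial naive Bayes with Laplace smoothing, not the authors' system: the class name `NaiveBayes`, the smoothing parameter `alpha`, and the English training tokens are all hypothetical stand-ins for labeled new words.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus summed log likelihoods of the known tokens.
            score = math.log(n / n_docs)
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled token lists standing in for collected new words.
train_docs = [["great", "fun"], ["love", "great"],
              ["awful", "boring"], ["hate", "awful"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
```

With this toy data, `model.predict(["great", "love"])` returns `"positive"` and `model.predict(["awful"])` returns `"negative"`. Smoothing keeps a word seen only in one class from driving the other class's probability to zero.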

Automated Data Extraction from Unstructured Geotechnical Report based on AI and Text-mining Techniques (AI 및 텍스트 마이닝 기법을 활용한 지반조사보고서 데이터 추출 자동화)

  • Park, Jimin;Seo, Wanhyuk;Seo, Dong-Hee;Yun, Tae-Sup
    • Journal of the Korean Geotechnical Society / v.40 no.4 / pp.69-79 / 2024
  • Field geotechnical data are obtained from various field and laboratory tests and are documented in geotechnical investigation reports. For efficient design and construction, digitizing these geotechnical parameters is essential. However, current practices involve manual data entry, which is time-consuming, labor-intensive, and prone to errors. Thus, this study proposes an automatic data extraction method from geotechnical investigation reports using image-based deep learning models and text-mining techniques. A deep-learning-based page classification model and a text-searching algorithm were employed to classify geotechnical investigation report pages with 100% accuracy. Computer vision algorithms were utilized to identify valid data regions within report pages, and text analysis was used to match and extract the corresponding geotechnical data. The proposed model was validated using a dataset of 205 geotechnical investigation reports, achieving an average data extraction accuracy of 93.0%. Finally, a user-interface-based program was developed to enhance the practical application of the extraction model. It allowed users to upload PDF files of geotechnical investigation reports, automatically analyze these reports, and extract and edit data. This approach is expected to improve the efficiency and accuracy of digitizing geotechnical investigation reports and building geotechnical databases.

Informal Quality Data Analysis via Sentimental analysis and Word2vec method (감성분석과 Word2vec을 이용한 비정형 품질 데이터 분석)

  • Lee, Chinuk;Yoo, Kook Hyun;Mun, Byeong Min;Bae, Suk Joo
    • Journal of Korean Society for Quality Management / v.45 no.1 / pp.117-128 / 2017
  • Purpose: This study analyzes automobile quality review data to develop an alternative analytical method for informal data. Existing methods for analyzing informal data are based mainly on frequency; this research instead tries to use the correlation information among the data. Method: After sentiment analysis to acquire user information on automobile products, three classification methods (naïve Bayes, random forest, and support vector machine) were employed to accurately classify informal user opinions with respect to automobile quality. Additionally, Word2vec was applied to discover correlated information in the informal data. Result: Among the three classification methods, random forest showed the most effective results. The Word2vec method managed to discover the data most closely related to automobile components. Conclusion: The proposed method is effective in terms of accuracy and sensitivity in the analysis of informal quality data; however, only two sentiments (positive or negative) can be categorized due to human errors. Further studies are required to derive more sentiments in order to classify informal quality data accurately. The Word2vec method also shows comparative results in discovering the relevance of components precisely.