• Title/Summary/Keyword: 학습 데이터 모델 (training data model)

IPC Multi-label Classification based on Functional Characteristics of Fields in Patent Documents (특허문서 필드의 기능적 특성을 활용한 IPC 다중 레이블 분류)

  • Lim, Sora;Kwon, YongJin
    • Journal of Internet Computing and Services / v.18 no.1 / pp.77-88 / 2017
  • Recently, with the advent of a knowledge-based society in which information and knowledge create value, patents, the representative form of intellectual property, have become important, and the number of patents continues to grow. To use this vast amount of patent information effectively, patents must be classified appropriately according to the technological topic of the invention, and the IPC (International Patent Classification) is widely used for this purpose. Research on automatic IPC classification has applied data mining and machine learning algorithms to improve the current practice of classifying patent documents by hand. However, most previous studies have focused on applying various existing machine learning methods to patent documents rather than on the characteristics of the data or the structure of patent documents. In this paper, therefore, we propose using two structural fields, technical field and background, which are considered to affect patent classification; these two fields were selected based on the characteristics of patent documents and the roles of their structural fields. We also construct a multi-label classification model to reflect the fact that a patent document can have multiple IPCs. Furthermore, we classify patent documents at the IPC subclass level, which comprises 630 categories, to investigate whether the IPC multi-label classification model can be applied in practice. The effect of the structural fields of patent documents is examined using 564,793 patents registered in Korea, and a precision of 87.2% is obtained when using the title, abstract, claims, technical field, and background. These results verify that the technical field and background play an important role in improving the precision of IPC multi-label classification at the IPC subclass level.
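To make the field-selection and multi-label setup concrete, here is a minimal Python sketch, not the authors' pipeline. It assumes each patent is a dict of field texts (the keys such as `technical_field` and `background` are hypothetical names), concatenates the selected fields into one classifier input, and scores predicted IPC subclass sets with example-based precision. The abstract does not say which precision variant was used, so that choice is also an assumption.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical field keys; the five fields are the ones named in the abstract.
SELECTED_FIELDS = ("title", "abstract", "claims", "technical_field", "background")


def build_input(patent: Dict[str, str], fields: Sequence[str] = SELECTED_FIELDS) -> str:
    """Concatenate the selected structural fields, skipping missing or empty ones."""
    return " ".join(patent[f] for f in fields if patent.get(f))


def example_precision(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Example-based precision: mean over documents of |true & pred| / |pred|.

    A document with no predicted labels scores 0 here (an assumption)."""
    if not true_sets:
        return 0.0
    scores = [len(t & p) / len(p) if p else 0.0 for t, p in zip(true_sets, pred_sets)]
    return sum(scores) / len(scores)


# Toy usage with made-up IPC subclasses
truth = [{"H01M", "H02J"}, {"G06F"}]
pred = [{"H01M"}, {"G06F", "G06N"}]
print(example_precision(truth, pred))  # (1/1 + 1/2) / 2 = 0.75
```

Micro-averaged precision (pooling all predicted labels before dividing) is a common alternative and can give a different number on the same predictions.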

Verification the Systems Thinking Factor Structure and Comparison of Systems Thinking Based on Preferred Subjects about Elementary School Students' (초등학생의 시스템 사고 요인 구조 검증과 선호 과목에 따른 시스템 사고 비교)

  • Lee, Hyonyong;Jeon, Jaedon;Lee, Hyundong
    • Journal of The Korean Association For Science Education / v.39 no.2 / pp.161-171 / 2019
  • The purposes of this study are 1) to verify the systems thinking factor structure of elementary school students and 2) to compare systems thinking according to their preferred subjects, in order to derive implications for further research. In the pre-test, data from 732 elementary school students were collected using the STMI (Systems Thinking Measuring Instrument) developed by Lee et al. (2013), and exploratory factor analysis was conducted to identify the factor structure of their responses. Based on the pre-test results, a council of experts revised the STMI so that elementary school students' responses would reflect the 5-factor structure the STMI intended. In the post-test, data from 503 students were collected with the modified STMI and analyzed with exploratory factor analysis. The results are as follows. First, in the pre-test, elementary school students' responses to the STMI yielded two factors (personal internal factors and personal external factors). The total reliability of the instrument was .932, and the reliabilities of the two factors were .857 and .894. Second, responses to the modified STMI yielded four factors: Team Learning, Shared Vision, and Personal Mastery emerged as independent factors, while Mental Models and Systems Analysis emerged as a single factor. The total reliability of the instrument was .886, and the reliabilities of the individual factors ranged from .686 to .864. Finally, a comparison of systems thinking by preferred subject showed a significant difference between students who chose science (engineering) and those who chose the arts (music and physical education). In conclusion, statistically meaningful results could be obtained using an STMI whose terms and sentence structures were modified to suit elementary school students, and further study is needed on the relationship between systems thinking and student variables such as preferred subjects.
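The reliability figures above (.932, .886, and the per-factor values) are internal-consistency coefficients. The abstract does not name the statistic, so assuming it is Cronbach's alpha, the usual choice for such instruments, here is a minimal Python sketch of how it is computed from a respondents-by-items score matrix. The response data are made up.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used for both terms, so the ratio equals the
    sample-variance version.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up 5-point responses: 4 respondents x 3 items
data = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2]]
print(round(cronbach_alpha(data), 3))
```

Per-factor reliabilities are obtained the same way by passing only the columns of the items that load on that factor.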

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.131-154 / 2022
  • Recently, research on unstructured data analysis has been actively conducted along with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics in massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the field matured, studies began to examine how topics change over time. Accordingly, interest is also growing in dynamic topic modeling, which handles changes in the keywords that constitute each topic. Dynamic topic modeling identifies major topics in the data of the initial period and then tracks the change and flow of topics by using the topic information of the previous period to derive the topics of subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret. Traditional dynamic topic modeling results simply reveal changes in keywords and their rankings, which is insufficient to show how the meaning of a topic has changed. Therefore, in this study, we propose a method for visualizing topics by period that reflects the meaning of the keywords in each topic. In addition, we propose a method for intuitively interpreting changes in topics and the relationships among topics. The method for visualizing topics by period is as follows. In the first step, dynamic topic modeling is applied to text data to derive the top keywords of each period and their weights. In the second step, we obtain vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then form a semantic vector for each topic by computing the weighted sum of its keyword vectors, using each keyword's topic weight as the weight. 
In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among the topics based on the visualization. Changes in a topic are interpreted as follows: from the dynamic topic modeling results, we identify the top five rising and top five declining keywords for each period. Many existing topic visualization studies visualize the keywords of each topic, whereas our approach differs in that it visualizes each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the coherence score and used a pre-trained Word2vec word embedding model trained on Wikipedia, the Internet encyclopedia. Using the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of the keywords, visualized and interpreted the topics by period. These experiments confirmed that the rise and decline of a keyword's topic weight can be useful for interpreting the semantic change of the corresponding topic and for grasping the relationships among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. 
In addition, the study makes an academic contribution by laying the foundation for follow-up studies that use various word embedding and dimensionality reduction techniques to improve the performance of the proposed methodology.
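The semantic-vector step (reduce keyword embeddings, then take a weighted sum using each keyword's topic weight) and the rising/declining keyword ranking can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the keyword vectors are taken as already reduced to 2-D (the abstract does not name the reduction method), the weights are normalized to sum to 1, and the toy keywords and vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, float]


def topic_semantic_vector(weights: Dict[str, float], vectors: Dict[str, Vector]) -> Vector:
    """Weighted sum of (already dimension-reduced) keyword vectors.

    Weights are normalized to sum to 1 so topics with different total
    weight remain comparable (an assumption, not stated in the source)."""
    total = sum(weights.values())
    x = sum(w * vectors[k][0] for k, w in weights.items()) / total
    y = sum(w * vectors[k][1] for k, w in weights.items()) / total
    return (x, y)


def rising_and_declining(prev: Dict[str, float], curr: Dict[str, float],
                         n: int = 5) -> Tuple[List[str], List[str]]:
    """Top-n rising and top-n declining keywords by topic-weight change
    between two periods (a keyword absent in a period has weight 0)."""
    keys = set(prev) | set(curr)
    delta = {k: curr.get(k, 0.0) - prev.get(k, 0.0) for k in keys}
    ordered = sorted(keys, key=lambda k: (delta[k], k))
    rising = [k for k in reversed(ordered) if delta[k] > 0][:n]
    declining = [k for k in ordered if delta[k] < 0][:n]
    return rising, declining


# Toy usage with hypothetical keywords and 2-D vectors
weights = {"neural": 0.6, "network": 0.3, "learning": 0.1}
vectors = {"neural": (1.0, 0.2), "network": (0.8, 0.4), "learning": (-0.5, 1.0)}
print(topic_semantic_vector(weights, vectors))
```

Each topic then becomes a single point per period, which is what allows the topic itself, rather than its keyword list, to be plotted.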

A Basic Study for Sustainable Analysis and Evaluation of Energy Environment in Buildings : Focusing on Energy Environment Historical Data of Residential Buildings (빌딩의 지속가능 에너지환경 분석 및 평가를 위한 기초 연구 : 주거용 건물의 에너지환경 실적정보를 중심으로)

  • Lee, Goon-Jae
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.1 / pp.262-268 / 2017
  • Buildings account for approximately 20.5% of total energy consumption, and interest in improving building energy efficiency and reducing building energy consumption is increasing. Several studies have performed energy analysis and evaluation, which are effective when applied in the initial design phase. In that phase, however, energy performance is evaluated using general information such as glazing area and surface area, so the results differ from those of the detailed design stage, which relies on drawings containing detailed information on materials and facilities. Thus far, most studies have reported analysis and evaluation at the detailed design stage, when detailed information about the materials installed in the building becomes available. The accuracy of energy environment analysis could therefore be improved if the energy environment information generated over a building's life cycle were compiled and supplied, through probabilistic/statistical methods, to the analysis at the initial design stage. However, historical data on energy use have not been established in Korea. This study therefore performed an energy environment analysis aimed at constructing energy environment historical data. As a result, it presents an information classification system, an information model, and a service model for acquiring and providing energy environment information that can serve as building life-cycle information, to be used as basic data. The results can be used in a historical data management system to improve the reliability of analysis by supplementing the input information at the initial design stage. 
As historical data accumulate, they can be used as training data for probabilistic/statistical or artificial intelligence methods for energy environment analysis at the initial design stage.

Rainfall image DB construction for rainfall intensity estimation from CCTV videos: focusing on experimental data in a climatic environment chamber (CCTV 영상 기반 강우강도 산정을 위한 실환경 실험 자료 중심 적정 강우 이미지 DB 구축 방법론 개발)

  • Byun, Jongyun;Jun, Changhyun;Kim, Hyeon-Joon;Lee, Jae Joon;Park, Hunil;Lee, Jinwook
    • Journal of Korea Water Resources Association / v.56 no.6 / pp.403-417 / 2023
  • In this research, a methodology was developed for constructing an appropriate rainfall image database for estimating rainfall intensity from CCTV video. The database was constructed in the Large-Scale Climate Environment Chamber of the Korea Conformity Laboratories, where variables that are highly irregular and variable in real environments can be controlled. A total of 1,728 scenarios were designed under five different experimental conditions, from which 36 scenarios comprising 97,200 frames were selected. Rain streaks were extracted with the k-nearest neighbor algorithm by calculating the difference between each image and the background. To prevent overfitting, only data whose pixel values exceeded a set threshold relative to the average pixel value of each image were selected. The area with the maximum pixel variability, found by shifting a window in 10-pixel steps, was set as the representative area (180×180) of the original image. After resizing this area to 120×120 as input data for a convolutional neural network model, image augmentation was performed under unified shooting conditions. For 92% of the data, the absolute PBIAS was within 10%. The final results of this study suggest the potential to enhance the accuracy and efficacy of existing real-world CCTV systems through transfer learning.
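Two quantitative steps above, choosing the most variable 180×180 window with a 10-pixel stride and scoring estimates by PBIAS, can be sketched in plain Python. This is a minimal sketch, not the authors' code: using pixel variance as the "variability" measure and evaluating PBIAS per sample are assumptions, and the PBIAS sign convention varies between sources (it does not affect the ±10% check).

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def pbias(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Percent bias, 100 * sum(estimated - observed) / sum(observed)."""
    return 100.0 * sum(e - o for o, e in zip(observed, estimated)) / sum(observed)


def most_variable_window(img: List[List[float]], size: int = 180,
                         stride: int = 10) -> Tuple[int, int]:
    """Top-left (row, col) of the size x size window with the largest pixel
    variance, scanning the image with the given stride (variance as the
    variability measure is an assumption)."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the window")
    best_var, best_pos = -1.0, (0, 0)
    for r in range(0, h - size + 1, stride):
        for c in range(0, w - size + 1, stride):
            pixels = [v for row in img[r:r + size] for v in row[c:c + size]]
            var = pvariance(pixels)
            if var > best_var:
                best_var, best_pos = var, (r, c)
    return best_pos


# Share of made-up (observed, estimated) intensity pairs with |PBIAS| <= 10%
pairs = [(10.0, 10.5), (20.0, 23.0), (30.0, 29.0)]
share = sum(abs(pbias([o], [e])) <= 10 for o, e in pairs) / len(pairs)
print(share)  # 2 of the 3 pairs fall within ±10%
```

The selected window would then be cropped and resized to 120×120 before being fed to the CNN; resizing is left out to keep the sketch dependency-free.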

A Case Study on Venture and Small-Business Executives' Use of Strategic Intuition in the Decision Making Process (벤처.중소기업가의 전략적 직관에 의한 의사결정 모형에 대한 사례연구)

  • Park, Jong An;Kim, Young Su;Do, Man Seung
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.9 no.1 / pp.15-23 / 2014
  • This paper is a case study on how venture and small-business executives can take advantage of their intuition when the business environment is increasingly uncertain or changing, when novel situations arise with no data to draw on, and when rational decision-making is not possible. The case study is based on a literature review, in-depth interviews with 16 business managers, and an analysis of Klein's (1998) "Generic Mental Simulation Model." The "intuition" discussed in this analysis is classified into two types: Expert Intuition, which is based on one's own experiences, and Strategic Intuition, which is based on the experiences of others. The case study results revealed that managers used expert intuition and strategic intuition differently. More specifically, expert intuition was activated effortlessly, while strategic intuition required more time. Also, expert intuition was used mainly for judgments about events that had already happened, while strategic intuition was used more often for judgments regarding future events. The process of strategic intuition involved (1) strategic concerns, (2) the discovery of a medium, (3) primary mental simulation, (4) the offsetting of key parameters, (5) secondary mental simulation, and (6) the decision-making process. 
These steps were used to develop the "Strategic Intuition Decision-making Model" for venture and small-business executives. The case study results further showed, first, that the success of decision-making was determined in the "secondary mental simulation" stage; second, that managers encountered more difficulty when they relied on expert intuition more than on strategic intuition; and lastly, that strategic intuition can be taught.

Exploratory Study on the Phenomena of Entrepreneurship Education in Food and Agriculture Sectors Based on the Grounded Theory Approach (근거이론접근법에 기반한 농식품분야 창업교육현상에 관한 탐색적 연구)

  • Seol, Byung Moon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.15 no.3 / pp.33-46 / 2020
  • This study analyzes the entrepreneurship education phenomena experienced by agri-food entrepreneurs whose main business is producing agricultural products and selling processed products, using Strauss & Corbin's (1998) grounded theory approach, a qualitative method. From the entrepreneur's point of view, I summarize the phenomena that appear in education and prepare a theoretical basis for explaining them. Entrepreneurship education is emphasized as a way to cultivate the ability to develop and provide products tailored to customers. The founders' awareness of their situation increases the demand for such education, and the quantitative growth of entrepreneurship education in the agri-food sector is a clear trend. This inevitably raises the need for discussion of systematic and effective entrepreneurship education. For the study, interviews were conducted with prospective and current entrepreneurs who had received entrepreneurship education in the agri-food sector. Following Strauss & Corbin's (1998) approach, I analyzed the qualitative data using QSR's NVIVO 12 program. The study found that contextual and systematic entrepreneurship education in the agri-food sector strengthens competitiveness and sales, and that trainees need follow-up management. Strengthening the competitiveness of start-ups rests on training professional manpower through education and on linking regions with cities, while strengthening sales rests on product planning and market development. This study explores entrepreneurship education in the agri-food sector, a topic that has not been actively researched in the past. Exploratory analysis of the experiences of agri-food founders as recipients of education is important for understanding the phenomenon of start-up education.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Because each instance can have multiple labels, multi-label classification requires a different training method from binary and multi-class classification. In addition, as the numbers of labels and classes increase, the number of labels to be predicted also increases, making prediction harder and performance improvement difficult. To overcome these limitations, label embedding has been actively studied: it (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed labels, and (iii) restores the predicted labels to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they have difficulty capturing non-linear relationships between labels and therefore cannot create a latent label space that sufficiently retains the information of the original labels. 
Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding, and label embedding using an autoencoder, a deep learning model effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers from substantial information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation in training. Skip connections were devised to solve this problem: by adding a layer's input to its output, they prevent vanishing gradients during backpropagation and allow efficient training even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. We applied the proposed methodology to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment in which the compressed keyword vector in the latent label space was predicted from each paper abstract, and multi-label classification was evaluated by restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology showed far superior accuracy, precision, recall, and F1 score compared with traditional multi-label classification methods. 
This suggests that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance according to domain characteristics and the number of dimensions of the latent label space.
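The skip-connection idea can be illustrated with a toy, untrained forward pass in pure Python. This is a minimal sketch, not the authors' architecture: the layer sizes, the single residual block in each of the encoder and decoder, the ReLU/sigmoid activations, and the random initialization are all hypothetical choices. Each residual block adds its input back to its output, which is the identity path that helps gradients flow during backpropagation.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def dense(x: Vec, w: Mat, b: Vec) -> Vec:
    """Affine layer y = W x + b, with W stored row-wise (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]


def sigmoid(x: Vec) -> Vec:
    """Numerically stable logistic function, applied element-wise."""
    out = []
    for v in x:
        if v >= 0:
            out.append(1.0 / (1.0 + math.exp(-v)))
        else:
            e = math.exp(v)
            out.append(e / (1.0 + e))
    return out


def residual_block(x: Vec, w: Mat, b: Vec) -> Vec:
    """Skip connection: relu(W x + b) + x, so the input passes through unchanged
    alongside the transformed signal."""
    return [h + xi for h, xi in zip(relu(dense(x, w, b)), x)]


def random_layer(out_dim: int, in_dim: int, rng: random.Random) -> Tuple[Mat, Vec]:
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def autoencode(labels: Vec, hidden: int, latent: int, seed: int = 0) -> Tuple[Vec, Vec]:
    """Untrained forward pass: multi-hot labels -> latent vector -> per-label scores."""
    rng = random.Random(seed)
    d = len(labels)
    enc_in = random_layer(hidden, d, rng)
    enc_res = random_layer(hidden, hidden, rng)
    enc_out = random_layer(latent, hidden, rng)
    dec_in = random_layer(hidden, latent, rng)
    dec_res = random_layer(hidden, hidden, rng)
    dec_out = random_layer(d, hidden, rng)

    h = residual_block(relu(dense(labels, *enc_in)), *enc_res)  # encoder
    z = dense(h, *enc_out)                                      # latent label vector
    g = residual_block(relu(dense(z, *dec_in)), *dec_res)       # decoder
    recon = sigmoid(dense(g, *dec_out))                         # per-label probabilities
    return z, recon


z, recon = autoencode([1.0, 0.0, 1.0, 0.0, 0.0, 1.0], hidden=4, latent=2)
print(len(z), len(recon))  # 2 6
```

A trained version would fit these weights by minimizing a reconstruction loss such as binary cross-entropy over the multi-hot keyword vectors; the sketch only shows the data flow and where the skip connections sit.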

Improvement of Mid-Wave Infrared Image Visibility Using Edge Information of KOMPSAT-3A Panchromatic Image (KOMPSAT-3A 전정색 영상의 윤곽 정보를 이용한 중적외선 영상 시인성 개선)

  • Jinmin Lee;Taeheon Kim;Hanul Kim;Hongtak Lee;Youkyung Han
    • Korean Journal of Remote Sensing / v.39 no.6_1 / pp.1283-1297 / 2023
  • Mid-wave infrared (MWIR) imagery, because it captures the temperature of land cover and objects, serves as a crucial data source in various fields, including environmental monitoring and defense. The KOMPSAT-3A satellite acquires MWIR imagery with higher spatial resolution than other satellites. However, the spatial resolution of MWIR imagery is still limited compared with electro-optical (EO) imagery, which constrains the full use of KOMPSAT-3A data. This study aims to create a highly visible MWIR fusion image by leveraging edge information from the KOMPSAT-3A panchromatic (PAN) image. Preprocessing is first applied to mitigate the relative geometric errors between the PAN and MWIR images. We then employ a pre-trained pixel difference network (PiDiNet), a deep learning-based edge extraction technique, to extract object boundaries from the preprocessed PAN images. The MWIR fusion image is then generated by emphasizing the brightness values corresponding to the edge information of the PAN image. To evaluate the proposed method, MWIR fusion images were generated for three different sites. As a result, the boundaries of terrain and objects were emphasized in the MWIR fusion images, providing detailed thermal information about the areas of interest. In particular, the MWIR fusion images provided thermal information on objects such as airplanes and ships that are hard to detect in the original MWIR images. This study demonstrated that the proposed method can generate a single image combining the visible details of an EO image with the thermal information of an MWIR image, which contributes to increasing the use of MWIR imagery.
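The fusion step (brighten MWIR pixels where the PAN image has strong edges) can be sketched as below. The paper extracts edges with a pre-trained PiDiNet; this sketch swaps in a simple central-difference gradient magnitude as a stand-in. The multiplicative gain `alpha`, the [0, 1] edge normalization, the clipping value, and the assumption that both images are already co-registered on the same pixel grid are all hypothetical.

```python
from typing import List

Image = List[List[float]]


def edge_map(pan: Image) -> Image:
    """Gradient-magnitude edges normalized to [0, 1] (a stand-in for PiDiNet)."""
    h, w = len(pan), len(pan[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Central differences, clamped at the image border
            gx = pan[r][min(c + 1, w - 1)] - pan[r][max(c - 1, 0)]
            gy = pan[min(r + 1, h - 1)][c] - pan[max(r - 1, 0)][c]
            mag[r][c] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in mag)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in mag]


def fuse(mwir: Image, pan: Image, alpha: float = 0.3, max_value: float = 1.0) -> Image:
    """Scale MWIR brightness up by (1 + alpha * edge strength), clipped to max_value."""
    edges = edge_map(pan)
    return [[min(max_value, m * (1.0 + alpha * e)) for m, e in zip(mrow, erow)]
            for mrow, erow in zip(mwir, edges)]


mwir = [[0.5, 0.5, 0.5, 0.5]]
pan = [[0.0, 0.0, 1.0, 1.0]]
print(fuse(mwir, pan))  # the two middle pixels, at the PAN step edge, are brightened
```

A multiplicative gain keeps the relative thermal contrast of the MWIR image and leaves flat (edge-free) regions unchanged; an additive boost would be an equally plausible reading of the abstract.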

Re-validation of the Revised Systems Thinking Measuring Instrument for Vietnamese High School Students and Comparison of Latent Means between Korean and Vietnamese High School Students (베트남 고등학생을 대상으로 한 개정 시스템 사고 검사 도구 재타당화 및 한국과 베트남 고등학생의 잠재 평균 비교)

  • Hyonyong Lee;Nguyen Thi Thuy;Byung-Yeol Park;Jaedon Jeon;Hyundong Lee
    • Journal of the Korean earth science society / v.45 no.2 / pp.157-171 / 2024
  • The purposes of this study were (1) to revalidate the revised Systems Thinking Measuring Instrument (Re_STMI) reported by Lee et al. (2024) among Vietnamese high school students and (2) to investigate the differences in systems thinking abilities between Korean and Vietnamese high school students. To achieve this, we used data from 234 Vietnamese high school students who responded to the translated Re_STMI, consisting of 20 items, and a scale consisting of 20 items. Validity was analyzed through item response analysis (Item Reliability, Item Map, Infit and Outfit MNSQ, and DIF between male and female students) and exploratory factor analysis (principal axis factoring with Promax rotation). Furthermore, structural equation modeling was employed with data from 475 Korean high school students to conduct the latent mean analysis. The results were as follows. First, in the item response analysis of the 20 Re_STMI items translated into Vietnamese, the Item Reliability was .97, and the Infit MNSQ ranged from .67 to 1.38. The results of the Item Map and DIF analyses align with previous findings. In the exploratory factor analysis, all items loaded onto their intended sub-factors, with sub-factor reliabilities ranging from .662 to .833 and a total reliability of .876. Confirmatory factor analysis for the latent mean analysis between Korean and Vietnamese students yielded acceptable model fit indices (χ2/df: 2.830, CFI: .931, TLI: .918, SRMR: .043, RMSEA: .051). Lastly, the latent mean analysis revealed a small effect size for the systems analysis, mental models, team learning, and shared vision factors and a medium effect size for the personal mastery factor, with Vietnamese high school students scoring significantly higher in systems thinking. This study confirmed the reliability and validity of the Re_STMI items. 
Further international comparative studies of students' systems thinking using the Re_STMI translated into Vietnamese, English, and other languages are warranted.
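Infit and Outfit MNSQ, used above to screen items, are mean-square fit statistics from Rasch item response analysis. As a minimal sketch, assuming the dichotomous Rasch model and already-estimated person abilities (`thetas`) and item difficulty (`b`), the two statistics for one item can be computed as follows. If the Re_STMI items are polytomous, a rating-scale model would be needed instead; values near 1.0 indicate responses that fit the model about as well as expected.

```python
import math
from typing import Sequence, Tuple


def rasch_prob(theta: float, b: float) -> float:
    """Probability of endorsing/answering correctly under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: Sequence[int], thetas: Sequence[float], b: float) -> Tuple[float, float]:
    """Return (infit MNSQ, outfit MNSQ) for one item.

    outfit = mean of squared standardized residuals (sensitive to outliers)
    infit  = sum of squared residuals / sum of model variances (information-weighted)
    """
    if not responses:
        raise ValueError("no responses")
    sq_resid, variances, z_sq = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)          # model variance of a 0/1 response
        sq_resid.append((x - p) ** 2)
        variances.append(var)
        z_sq.append((x - p) ** 2 / var)
    infit = sum(sq_resid) / sum(variances)
    outfit = sum(z_sq) / len(z_sq)
    return infit, outfit


# Persons exactly at the item's difficulty: every residual is as expected
print(item_fit([1, 0, 1, 0], [0.0] * 4, 0.0))  # (1.0, 1.0)
```

Because outfit weights every response equally, a few surprising answers from persons far from the item's difficulty inflate it more than infit, which is why both are reported.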