• Title/Abstract/Keywords: Data & Knowledge Engineering

Search results: 1,311 items (processing time 0.039 s)

IPC Multi-label Classification based on Functional Characteristics of Fields in Patent Documents (특허문서 필드의 기능적 특성을 활용한 IPC 다중 레이블 분류)

  • Lim, Sora;Kwon, YongJin
    • Journal of Internet Computing and Services / Vol. 18, No. 1 / pp. 77-88 / 2017
  • Recently, with the advent of a knowledge-based society in which information and knowledge create value, patents, the representative form of intellectual property, have become important, and the number of patents keeps growing. To use this vast amount of patent information effectively, patents need to be classified appropriately according to the technological topic of the invention. The IPC (International Patent Classification) is widely used for this purpose. Automatic IPC classification has been studied using data mining and machine learning algorithms to improve the current task, in which patent documents are categorized by hand. However, most previous research has focused on applying existing machine learning methods to patent documents rather than considering the characteristics of the data or the structure of patent documents. In this paper, therefore, we propose using two structural fields, technical field and background, which are considered to affect patent classification; the two fields are selected based on the characteristics of patent documents and the roles of the structural fields. We also construct a multi-label classification model to reflect the fact that a patent document can have multiple IPCs. Furthermore, we propose a method to classify patent documents at the IPC subclass level, which comprises 630 categories, in order to investigate whether the IPC multi-label classification model can be applied in practice. The effect of the structural fields is examined using 564,793 registered patents in Korea, and 87.2% precision is obtained when using title, abstract, claims, technical field and background. From this result, we verify that the technical field and background play an important role in improving the precision of IPC multi-label classification at the IPC subclass level.
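
The multi-label setup described in this abstract can be pictured with a small sketch. It only shows how the five named fields might be concatenated into one classifier input and how several IPC subclasses might be encoded as a 0/1 target vector. The field keys, the function names `build_input_text` and `multi_hot`, and the three-code label space are hypothetical; the abstract does not describe the authors' actual pipeline or classifier.

```python
# Hypothetical field keys; the abstract names the fields but not their storage format.
SELECTED_FIELDS = ["title", "abstract", "claims", "technical_field", "background"]

# Toy label space for illustration; the study works with 630 IPC subclasses.
LABEL_SPACE = ["A61B", "G06F", "H04L"]


def build_input_text(patent, fields=SELECTED_FIELDS):
    """Concatenate the selected structural fields into one classifier input."""
    return " ".join(patent.get(name, "") for name in fields).strip()


def multi_hot(labels, label_space):
    """Encode a set of IPC subclasses as a 0/1 vector over the label space."""
    wanted = set(labels)
    return [1 if code in wanted else 0 for code in label_space]


# Placeholder document, not taken from the paper's data.
patent = {
    "title": "Example title",
    "abstract": "Example abstract",
    "claims": "Example claims",
    "technical_field": "Example technical field",
    "background": "Example background",
    "ipc": ["G06F", "H04L"],
}

text = build_input_text(patent)
target = multi_hot(patent["ipc"], LABEL_SPACE)
print(text)
print(target)  # [0, 1, 1]
```

Because the target has one slot per subclass, a single document can switch on several slots at once, which is what distinguishes the multi-label setting from ordinary single-label classification.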

A Study on Design Education Re-engineering by Multi-disciplinary Approach (다학제적 접근을 통한 대학디자인 교육혁신 프로그램 연구)

  • Lee, Soon-Jong;Kim, Jong-Won;Chu, Wu-Jin;Chae, Sung-Zin;Yoon, Su-Hyun
    • Archives of design research / Vol. 20, No. 3 / pp. 299-314 / 2007
  • For the past 20 years, the growth and development of university design education institutions have contributed to the industrial development of our country. Due to technological change and shifts in industrial structure in the latter half of the 20th century, enterprises are demanding professionally oriented design manpower. The principle that emerges from cases in advanced nations is to accommodate the demands of social change and apply them to design education programs. In particular, in order to respond promptly to industrial demand, advanced nations adopted "multidisciplinary design education programs" to lead design innovation globally. The objective of this research is therefore to suggest an educational system and program through which designers can acquire the complex knowledge and techniques demanded by industry and enterprise. Nowadays, in order to adapt to a new business environment, designers in particular should have knowledge and techniques in both engineering and business administration. We suggest that the IPDI, a multidisciplinary design educational system and program, be made up of the coordinated operation of major classes, a connection to on-the-job training, an educational system for creating a research base, a renovation design development program for application, and a synthesis of alternative proposals for joint ownership of training facilities, connecting the education of design, business administration and engineering.

The Development for guideline of raw matrials on technical document of Medical Device (의료기기 허가.기술문서 원자재 작성 가이드라인 개발)

  • Park, Ki-Jung;Ryu, Gyu-Ha;Lee, Sung-Hee;Lee, Chang-Hyung;Jung, Jin-Baek;Lee, Jae-Keun;Hur, Chan-Hoi;Kim, Hyung-Bum;Choi, Min-Yong;Kim, Yong-Woo;Hwang, Sang-Yeon;Jung, Jae-Hoon;Koo, Ja-Jung;Hong, Hye-Kyung;Lim, Kyung-Taek;Kang, Se-Ku;Kwak, Young-Ji
    • Journal of Biomedical Engineering Research / Vol. 31, No. 6 / pp. 434-437 / 2010
  • For approval of medical devices that are manufactured or imported, technical documents must be submitted together with the application form. In the application form, the manufacturer (or importer) should properly identify the raw materials of which the product is made and the manufacturing processes it undergoes before shipment. The technical documents should provide scientific data for evaluating the efficacy, safety, and quality of the product described in the application form. Therefore, identifying the raw materials used for the parts of the product and describing their physical and chemical characteristics are essential to ensuring the efficacy, safety, and quality of the product. To describe the physical and chemical characteristics of the raw materials correctly, the applicant needs broad knowledge of scientific fields such as chemical, polymer, metal, and ceramic science and engineering. However, most applicants are not experts in these fields, so the application form often contains wrong or improper descriptions. We therefore developed a guideline that explains the raw materials for medical devices and shows examples of them. The purpose of this guideline is to help applicants properly complete the "Raw materials or constituents and their volumes" part of the application form.

Estimation of Displacement Responses Using the Wavelet Decomposition Signal (웨이블릿 분해신호를 이용한 변위응답의 추정)

  • Jung, Beom-Seok;Kim, Nam-Sik;Kook, Seung-Kyu
    • Journal of the Korea Concrete Institute / Vol. 18, No. 3 / pp. 347-354 / 2006
  • In this paper, we attempt to bring wavelet transform theory to a dynamic response conversion algorithm. The algorithm is proposed for the problem of estimating displacement data by defining the transformed responses. In this algorithm, the displacement response can be obtained from measured acceleration records by integration without requiring knowledge of the initial velocity and displacement. The advantage of the wavelet transform over a purely spectral or temporal decomposition of the signal is that the pertinent signal features can be characterized in the time-frequency plane. In the response conversion procedure using the wavelet decomposition signals, not only can the static component be extracted, but the dynamic displacement component can also be separated by structural mode from the identified displacement response. The applicability of the technique is tested on an example problem using the superstructure of a real bridge under several moving-load cases. If the reliability of the identified responses is ensured, the proposed method is expected to be useful for estimating the impact factor in dynamic tests of bridges. The method can also be useful in practical cases where direct measurement of displacement is difficult, as in dynamic studies of large structures.
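
The idea of separating a slowly varying (quasi-static) part of a response from its faster dynamic part with a wavelet decomposition can be sketched as follows. This is a minimal illustration only: it assumes the Haar wavelet, which the abstract does not name, it operates on a synthetic displacement-like signal, and it does not attempt the acceleration-to-displacement conversion itself. The function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def split_slow_fast(x, levels):
    """Split x into a slow part (details zeroed) and the remaining fast part."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(x)
    detail_lengths = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail_lengths.append(len(detail))
    # Rebuild from the coarsest approximation with every detail band set to zero.
    slow = approx
    for length in reversed(detail_lengths):
        slow = haar_inverse_step(slow, [0.0] * length)
    fast = [xi - si for xi, si in zip(x, slow)]
    return slow, fast


# Synthetic signal: a constant offset of 0.5 plus a vibration with an 8-sample period.
n = 64
signal = [0.5 + 0.1 * math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
slow, fast = split_slow_fast(signal, levels=3)
print(round(slow[0], 6), round(max(abs(v) for v in fast), 6))
```

With three Haar levels, the slow part is the mean over blocks of eight samples, so a vibration whose period is exactly eight samples falls entirely into the fast part. Real measurements would need a wavelet family and decomposition depth chosen for the structure being tested.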

A Research on Development Measures of Information Services for Construction Technology (건설기술 정보서비스 구축 방안에 관한 연구)

  • Ok, Hyun;Kim, Jin-Uk
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 16, No. 8 / pp. 5707-5715 / 2015
  • Recently, the construction industry has won an increasing number of orders for overseas construction projects and has thereby achieved external growth, but its competitiveness is concentrated in the construction execution field. In particular, the plant field accounts for most of the orders, which are concentrated regionally in the Middle East and Asia. In addition, excessive competition frequently leads to low-priced orders. Meanwhile, the overseas market share and technological capacity of the high value-added construction engineering (hereafter, CE) field are very low. The polarization of technological competitiveness between large CE companies and small and medium-sized CE firms, in terms of order amount and other factors, is also deepening. Existing CE information systems mostly just accumulate data such as design and specification standards and provide that information to users, and thus have yet to provide the information essential for CE or to support such efforts. This study sought to prepare a system for sharing outstanding design document information needed by the CE industry, organized by construction category, so as to support the technological enhancement of the CE field. To that end, this study presented measures for constructing a system and services that let ordering agencies and CE companies exchange and share outstanding design document information and know-how by construction category.

Visualized recommender system based on Freebase (Freebase 기반의 추천 시스템 시각화)

  • Hong, Myung-Duk;Ha, Inay;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information / Vol. 18, No. 10 / pp. 23-37 / 2013
  • In this paper, the proposed movie recommender system constructs a trust network, which is similar to a social network, using trust information that users explicitly provide. Items are recommended using the degree of relation between users, and information about recommended items is provided through visualization. We discover hidden relationships via the constructed trust network. To provide visualized recommendation information, we employ Freebase, a large knowledge base that supplies information on movies, music, people and so on in a structured format. We provide three visualization methods: i) visualization based on movie posters, showing the number of movies the user requested; ii) visualization of extra information such as director, actor and genre when the user selects a movie from the recommendation list; iii) visualization based on posters of movies recommended by neighbors whom the user selects from the trust network. The proposed system considers the user's social relations and provides visualization that reflects the user's requirements. Using the visualization methods, users can make the right decisions about items. Furthermore, the proposed system reflects the user's opinion through the recommendation visualization methods and can provide rich information to users through the LOD (Linked Open Data) cloud, including Freebase, LinkedMDB and Wikipedia.
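
One simple way to turn explicit trust statements into recommendations is a trust-weighted average of neighbors' ratings, sketched below. This is an assumed stand-in: the abstract says recommendations use the degree of relation between users but does not give the formula, and the function name `recommend`, the trust weights, and the ratings are all made up for illustration.

```python
def recommend(user, trust, ratings, top_n=3):
    """Rank items the user has not rated by a trust-weighted mean of neighbors' ratings."""
    seen = set(ratings.get(user, {}))
    weighted_sum = {}
    weight_total = {}
    for neighbor, weight in trust.get(user, {}).items():
        for item, rating in ratings.get(neighbor, {}).items():
            if item in seen:
                continue  # skip items the user already rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + weight * rating
            weight_total[item] = weight_total.get(item, 0.0) + weight
    scores = {
        item: weighted_sum[item] / weight_total[item]
        for item in weighted_sum
        if weight_total[item] > 0
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Illustrative trust edges and ratings, not data from the paper.
trust = {"alice": {"bob": 1.0, "carol": 0.5}}
ratings = {
    "alice": {"Movie A": 5},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie C": 2},
    "carol": {"Movie B": 2, "Movie C": 4},
}

result = recommend("alice", trust, ratings)
print(result)
```

Here "Movie B" ranks first for alice because the more trusted neighbor rated it highly, while "Movie A" is excluded because alice has already rated it.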

Review for Assessment Methodology of Disaster Prevention Performance using Scientometric Analysis (계량정보 분석을 활용한 방재성능평가 방법에 대한 고찰)

  • Dong Hyun Kim;Hyung Ju Yoo;Seung Oh Lee
    • Journal of Korean Society of Disaster and Security / Vol. 15, No. 4 / pp. 39-46 / 2022
  • Rainfall characteristics such as heavy rain are changing from those of the past, and uncertainties are increasing greatly due to climate change. In addition, urban development and population concentration are aggravating flood damage. Since the causes of urban inundation are generally complex, it is very important to establish an appropriate flood prevention plan. Thus, the Korean government is establishing disaster prevention performance standards for each local government. Since the concept of the disaster prevention performance target was first presented in 2010, the standards for setting it have changed several times, but the overall technology, methodology, and procedures have been maintained. Therefore, in this study, studies and technologies related to urban disaster prevention performance were reviewed using scientometric analysis, a method of identifying trends in a field and deriving new knowledge and information from data such as papers and literature. Papers related to disaster prevention performance published over the last 30 years, from 1990 to 2021, were collected from the Web of Science. CiteSpace, a scientometric software package, was used to identify authors, research institutes, countries, and research trends, including citation analysis. The analysis identified factors to consider, such as the concept of asset evaluation, when making decisions related to urban disaster prevention performance. In the future, it is expected that prevention performance standards and procedures can be upgraded if the keywords are specified and each technology is reviewed.

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering / Vol. 12, No. 11 / pp. 481-492 / 2023
  • This study explores how to build a Korean dataset for extracting information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to the decision-making process. However, Korean datasets for training are still lacking. To overcome this, this study attempts to extract information through text-based zero-shot learning with a generative large language model in order to build a purposeful Korean dataset. The language model is instructed to output the desired result through prompt engineering in the form of "system"-"instruction"-"source input"-"output format", and the dataset is built by utilizing the in-context learning characteristics of the language model through input sentences. We validate our approach by comparing the generated dataset with the existing benchmark dataset, and achieve 25.47% higher performance than the KLUE-RoBERTa-large model on the relation information extraction task. The results of this study are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be utilized for various fields and purposes, and has potential for building various Korean datasets.
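
The four-part prompt form named in this abstract ("system"-"instruction"-"source input"-"output format") could be assembled along the following lines. The section delimiters, the function name `build_prompt`, and every piece of example wording are assumptions made for illustration; the abstract does not reproduce the authors' actual prompts.

```python
def build_prompt(system, instruction, source_input, output_format):
    """Join the four prompt parts in the order named in the abstract."""
    sections = [
        ("system", system),
        ("instruction", instruction),
        ("source input", source_input),
        ("output format", output_format),
    ]
    # The bracketed section headers are a hypothetical delimiter choice.
    return "\n\n".join(f"[{name}]\n{body}" for name, body in sections)


# All wording below is placeholder text, not taken from the paper.
prompt = build_prompt(
    system="You extract relations between entities from Korean text.",
    instruction="List every (subject, relation, object) triple stated in the sentence.",
    source_input="<Korean sentence goes here>",
    output_format='JSON list of objects with keys "subject", "relation", "object".',
)
print(prompt)
```

Keeping the output format as its own section makes the model's responses easier to parse into dataset rows, which is presumably why the form separates it from the instruction.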

A Comparative Evaluation of Multiple Meteorological Datasets for the Rice Yield Prediction at the County Level in South Korea (우리나라 시군단위 벼 수확량 예측을 위한 다종 기상자료의 비교평가)

  • Cho, Subin;Youn, Youjeong;Kim, Seoyeon;Jeong, Yemin;Kim, Gunah;Kang, Jonggu;Kim, Kwangjin;Cho, Jaeil;Lee, Yangwon
    • Korean Journal of Remote Sensing / Vol. 37, No. 2 / pp. 337-357 / 2021
  • Because the growth of paddy rice is affected by meteorological factors, selecting appropriate meteorological variables is essential for building a rice yield prediction model. This paper examines the suitability of multiple meteorological datasets for rice yield modeling in South Korea, 1996-2019, and conducts a hindcast experiment for rice yield using a machine learning method that accounts for the nonlinear relationships between meteorological variables and rice yield. In addition to the ASOS in-situ observations, we used the CRU-JRA ver. 2.1 and ERA5 reanalyses. From the multiple meteorological datasets, we extracted four common variables (air temperature, relative humidity, solar radiation, and precipitation) and analyzed the characteristics of each dataset and its associations with rice yields. CRU-JRA ver. 2.1 showed overall agreement with the other datasets. While relative humidity had only a weak relationship with rice yields, solar radiation showed a somewhat high correlation with them. Using the air temperature, solar radiation, and precipitation of July, August, and September, we built a random forest model for the hindcast experiments of rice yields. The model with CRU-JRA ver. 2.1 showed the best performance, with a correlation coefficient of 0.772. Solar radiation had the highest importance among the variables in the prediction model, which accords with general agricultural knowledge. This paper has implications for selecting among multiple meteorological datasets for rice yield modeling.
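
The model comparison in this abstract is reported as a correlation coefficient between hindcast and observed yields. Assuming this is the Pearson coefficient (the abstract does not say which one), it can be computed as in the sketch below. The function name `pearson_r` is hypothetical and the numbers are arbitrary placeholders, not rice yields from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Arbitrary placeholder values standing in for observed and predicted yields.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.4, 3.8, 5.1]
print(round(pearson_r(observed, predicted), 3))
```

A value near 1 means the predictions rise and fall with the observations; it does not by itself show that the predicted magnitudes are unbiased.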

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / Vol. 19, No. 4 / pp. 123-132 / 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare applications such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer or classifier because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier that distinguishes ten different activities can be built using data from a single sensor, i.e., the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier called the random forest. A random forest is built by repeatedly generating a decision tree, each time with a different random subset of features and a bootstrap sample. By combining bagging with random feature subset selection, a random forest has the advantage of more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers.
Of the 5,900 ($= 5 \times (60 \times 2 - 2) / 0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they have no time window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. In comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
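
The tree-building step of the END method described in this abstract can be sketched as below. The code only generates random nested dichotomies over the ten activity classes and counts the binary classifiers each tree would need; it does not train anything. The split-sampling scheme (shuffle, then cut at a random position) is a simplifying assumption and is not necessarily the sampling used by the authors, and the function names are hypothetical.

```python
import random

ACTIVITIES = [
    "Sitting", "Standing", "Walking", "Running", "Walking Uphill",
    "Walking Downhill", "Running Uphill", "Running Downhill",
    "Falling", "Hobbling",
]


def build_dichotomy(classes, rng):
    """Recursively split a class list into two random non-empty subsets."""
    if len(classes) == 1:
        return classes[0]  # leaf: a single class
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # guarantees both sides are non-empty
    return (
        build_dichotomy(shuffled[:cut], rng),
        build_dichotomy(shuffled[cut:], rng),
    )


def leaves(tree):
    """Collect the class names stored at the leaves of a dichotomy tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def count_internal(tree):
    """Count internal nodes, i.e. the binary classifiers the tree requires."""
    if isinstance(tree, tuple):
        return 1 + count_internal(tree[0]) + count_internal(tree[1])
    return 0


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ensemble = [build_dichotomy(ACTIVITIES, rng) for _ in range(5)]
for tree in ensemble:
    print(count_internal(tree))  # 9 for every tree: ten leaves need nine binary splits
```

In a full implementation, each internal node would hold a trained binary classifier (a random forest in the paper), and the predictions of the different trees would be combined at classification time.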