• Title/Summary/Keyword: Automatic Generation


Analyzing Korean Math Word Problem Data Classification Difficulty Level Using the KoEPT Model (KoEPT 기반 한국어 수학 문장제 문제 데이터 분류 난도 분석)

  • Rhim, Sangkyu;Ki, Kyung Seo;Kim, Bugeun;Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.315-324 / 2022
  • In this paper, we propose KoEPT, a Transformer-based generative model for automatic math word problem solving. A math word problem is written in natural language and describes an everyday situation in mathematical form; solving it requires an artificial intelligence model to understand the logic implied within the problem. For this reason, the task is studied worldwide as a way to improve the language understanding ability of artificial intelligence. For Korean, studies so far have mainly attempted to solve problems by classifying them into templates, but such techniques are difficult to apply to datasets with high classification difficulty. To address this, this paper uses the KoEPT model, which employs 'expression' tokens and pointer networks. To measure its performance, we first measured the classification difficulty scores of IL, CC, and ALG514, three existing Korean math word problem datasets, and then evaluated KoEPT using 5-fold cross-validation. On the Korean datasets used for evaluation, KoEPT achieved state-of-the-art (SOTA) performance: 99.1% on CC, comparable to the existing SOTA, and 89.3% and 80.5% on IL and ALG514, respectively. Moreover, KoEPT showed relatively improved performance on datasets with high classification difficulty. Through an ablation study, we found that the 'expression' tokens and pointer networks contributed to KoEPT being less affected by classification difficulty while maintaining good performance.
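As a concrete illustration of the evaluation protocol, a minimal 5-fold cross-validation sketch in Python follows; `build_model` stands in for constructing a KoEPT-style solver, and its `fit`/`predict` interface is a hypothetical assumption, since the paper's actual training code is not reproduced here.

```python
# A minimal sketch of the 5-fold evaluation protocol described above.
# `build_model` and its fit/predict interface are hypothetical stand-ins.
from sklearn.model_selection import KFold
import numpy as np

def evaluate_5fold(problems, answers, build_model):
    """Run 5-fold cross-validation and return the mean answer accuracy."""
    problems = np.array(problems, dtype=object)
    answers = np.array(answers, dtype=object)
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(problems):
        model = build_model()                          # fresh model per fold
        model.fit(problems[train_idx], answers[train_idx])
        pred = model.predict(problems[test_idx])
        scores.append(np.mean([p == a for p, a in zip(pred, answers[test_idx])]))
    return float(np.mean(scores))
```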

Development of an Algorithm for Automatic Quantity Take-off of Slab Rebar (슬래브 철근 물량 산출 자동화 알고리즘 개발)

  • Kim, Suhwan;Kim, Sunkuk;Suh, Sangwook;Kim, Sangchul
    • Korean Journal of Construction Engineering and Management / v.24 no.5 / pp.52-62 / 2023
  • The objective of this study is to propose an automated algorithm that computes precise cutting lengths of slab rebar in compliance with regulations on anchorage length, standard hooks, and lapping length. The algorithm aims to improve the traditional manual quantity take-off process, which is typically outsourced to external contractors. By providing accurate rebar quantity data at the BBS (Bar Bending Schedule) level from the bidding phase, uncertainty in quantity take-off can be eliminated and reliance on outsourcing reduced. In addition, the algorithm allows precise quantities to be determined early, enabling construction firms to prepare competitive, optimized bids and to increase profit margins during contract negotiations. The proposed algorithm not only streamlines redundant tasks across estimating, budgeting, and BBS generation, but also offers flexibility in handling post-contract changes to structural drawings. In particular, when combined with BIM, the algorithm can resolve the technical problems of using BIM in the early phases of construction, and its formulas and shape codes, built as Revit-based family files, can save time and manpower.
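To make the cutting-length rule concrete, here is a minimal Python sketch of the kind of calculation the algorithm automates; the 40d anchorage, 12d hook, and 40d lap multipliers are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of a slab-rebar cutting-length rule. The multipliers
# (anchorage = 40d, standard hook = 12d, lap = 40d) are illustrative
# assumptions, not the paper's actual code values.
def slab_bar_cutting_length(clear_span_mm: float, bar_dia_mm: float,
                            hooks: int = 2, laps: int = 0) -> float:
    anchorage = 40 * bar_dia_mm          # anchorage/development length per end
    hook = 12 * bar_dia_mm               # extra length per standard hook
    lap = 40 * bar_dia_mm                # lapping length per splice
    return clear_span_mm + 2 * anchorage + hooks * hook + laps * lap

# e.g. a 6 m span with D13 bars and two standard hooks:
# slab_bar_cutting_length(6000, 13)  ->  7352.0 mm
```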

An Automatic ROI Extraction and Its Mask Generation based on Wavelet of Low DOF Image (피사계 심도가 낮은 이미지에서 웨이블릿 기반의 자동 ROI 추출 및 마스크 생성)

  • Park, Sun-Hwa;Seo, Yeong-Geon;Lee, Bu-Kweon;Kang, Ki-Jun;Kim, Ho-Yong;Kim, Hyung-Jun;Kim, Sang-Bok
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.93-101 / 2009
  • This paper proposes a new algorithm that automatically and rapidly searches for a Region of Interest (ROI) using the edge information of the high-frequency subbands of a wavelet transform. The proposed method runs a four-direction object-boundary search block by block over the edge information to detect ROIs. The whole image is split into 64×64 or 32×32 blocks, and each block is labeled as an ROI block or a background block depending on whether it contains edges. The four searches scan the image from the outside toward the center, exploiting the property that a low-DOF image concentrates its edges near the center. Once all the edges have been found, the method regards the blocks inside the edges as the ROI, generates the ROI masks, and sends them to the server; this makes it a dynamic ROI method. Existing methods suffer from complicated filtering and region merging, problems that this method alleviates considerably. Its block-based processing also makes it applicable to applications that require real-time processing.
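A minimal sketch of the block-level ROI test is shown below, assuming PyWavelets for the transform; the Haar wavelet, energy threshold, and block size are illustrative choices rather than the paper's exact settings.

```python
# A minimal sketch of block-based ROI classification from high-frequency
# wavelet subbands; wavelet, threshold, and block size are assumptions.
import numpy as np
import pywt

def roi_blocks(gray_img: np.ndarray, block: int = 32, thresh: float = 5.0) -> np.ndarray:
    """Mark a block as ROI when its high-frequency wavelet energy is large."""
    _, (cH, cV, cD) = pywt.dwt2(gray_img.astype(float), 'haar')
    energy = np.abs(cH) + np.abs(cV) + np.abs(cD)      # half-resolution edge map
    h, w = energy.shape
    b = block // 2                                     # subbands are half-size
    mask = np.zeros((h // b, w // b), dtype=bool)
    for i in range(h // b):
        for j in range(w // b):
            mask[i, j] = energy[i*b:(i+1)*b, j*b:(j+1)*b].mean() > thresh
    return mask    # True = candidate ROI block, False = background block
```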

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.219-240 / 2023
  • Conversational AI that users can interact with satisfactorily is a long-standing research topic. Developing conversational AI requires training data that reflects real conversations between people, but existing Korean datasets are either not in question-answer format or use honorifics, making it difficult for users to feel a sense of closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The pairs were collected from the post titles and first comments of love and relationship counseling boards used by men and women. In addition, we removed abusive records through automatic and manual cleansing to build a high-quality dataset. To verify the validity of KOMUChat, we compared the output of a generative language model trained on KOMUChat against one trained on a benchmark dataset. The results showed that our dataset outperformed the benchmark in answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single-turn Korean text dataset presented so far, and it is significant in that it builds a more familiar Korean dataset by reflecting the text style of online communities.
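The pairing and cleansing step could look roughly like the following pandas sketch; the column names and the placeholder abuse lexicon are assumptions for illustration, not the authors' actual pipeline.

```python
# A minimal sketch of title/first-comment pairing with simple cleansing.
# Column names and the banned-word list are illustrative assumptions.
import pandas as pd

BANNED = {"욕설1", "욕설2"}   # placeholder abuse lexicon, not the authors' list

def build_pairs(posts: pd.DataFrame) -> pd.DataFrame:
    """posts: columns ['title', 'first_comment'] -> cleaned QA pairs."""
    pairs = posts.rename(columns={"title": "question", "first_comment": "answer"})
    pairs = pairs.dropna().drop_duplicates()
    is_clean = pairs.apply(
        lambda r: not any(w in r["question"] or w in r["answer"] for w in BANNED),
        axis=1,
    )
    return pairs[is_clean].reset_index(drop=True)
```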

Automatic Electronic Medical Record Generation System using Speech Recognition and Natural Language Processing Deep Learning (음성인식과 자연어 처리 딥러닝을 통한 전자의무기록자동 생성 시스템)

  • Hyeon-kon Son;Gi-hwan Ryu
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.731-736 / 2023
  • Recently, the medical field has mandated Electronic Medical Record (EMR) and Electronic Health Record (EHR) systems that computerize and manage medical records, and is distributing them throughout the medical industry so that patients' past records can be used in subsequent care. However, the conversations between medical professionals and patients that occur during general consultations and counseling sessions are not separately recorded or stored, so important additional patient information cannot be used efficiently. We therefore propose an electronic medical record system that uses speech recognition and natural language processing deep learning to store conversations between medical professionals and patients as text, automatically extract and summarize the important consultation information, and generate electronic medical records. The system first acquires text through speech recognition of the consultation between the medical professional and the patient. The acquired text is divided into sentences, and the importance of the keywords contained in each sentence is calculated. Based on these importance scores, the system ranks the sentences and summarizes them to create the final electronic medical record data. Quantitative analysis verifies that the proposed system performs well.
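The sentence-ranking summarization step can be illustrated with a minimal sketch; the term-frequency scoring used here is a simple stand-in for the paper's keyword-importance calculation.

```python
# A minimal sketch of extractive summarization: split a transcript into
# sentences, score each by the frequency of its keywords, keep the top k.
# TF-based scoring is an illustrative stand-in for the paper's method.
from collections import Counter
import re

def summarize(transcript: str, k: int = 3) -> list[str]:
    sentences = [s.strip() for s in re.split(r'(?<=[.?!])\s+', transcript) if s.strip()]
    tf = Counter(re.findall(r'\w+', transcript.lower()))   # keyword importance ~ frequency
    def score(s: str) -> float:
        toks = re.findall(r'\w+', s.lower())
        return sum(tf[t] for t in toks) / max(len(toks), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    return [s for s in sentences if s in top]              # keep original order
```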

Research on Training and Implementation of Deep Learning Models for Web Page Analysis (웹페이지 분석을 위한 딥러닝 모델 학습과 구현에 관한 연구)

  • Jung Hwan Kim;Jae Won Cho;Jin San Kim;Han Jin Lee
    • The Journal of the Convergence on Culture Technology / v.10 no.2 / pp.517-524 / 2024
  • This study trains and implements a deep learning model for the fusion of website creation and artificial intelligence, in the era of the AI revolution that followed the launch of the ChatGPT service. The model was trained on 3,000 collected web page images, processed according to a classification system of components and layouts. The process was divided into three stages. First, prior research on AI models was reviewed to select the algorithm most appropriate for the intended model. Second, suitable web page and paragraph images were collected, categorized, and processed. Third, the deep learning model was trained, and a serving interface was integrated to verify the model's actual outputs. The implemented model is used to detect multiple paragraphs on a web page, analyzing the number of lines, elements, and features in each paragraph and deriving meaningful data based on the classification system. This process is expected to evolve toward more precise analysis of web pages, and the development of such precise analysis techniques should lay the groundwork for research into AI that automatically generates complete web pages.
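One plausible realization of the paragraph-detection stage is fine-tuning an off-the-shelf object detector, as sketched below with torchvision's Faster R-CNN; the two-class (background/paragraph) setup is an assumption, since the abstract does not name the selected algorithm.

```python
# A minimal sketch of adapting a pretrained detector to localize paragraph
# regions on web-page screenshots; the two-class setup is an assumption.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_paragraph_detector(num_classes: int = 2):   # background + paragraph
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

# Inference on one screenshot tensor `img` (C x H x W, values in [0, 1]):
#   model.eval(); [pred] = model([img])
#   pred["boxes"] holds detected paragraph regions, pred["scores"] confidences.
```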

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been creating enormous amounts of data, and the portion of it that is unstructured text has grown geometrically. Because it is impractical to read all of this text, it is important to access the data rapidly and grasp the key points. To meet this need for efficient understanding, many studies on text summarization for handling and using enormous amounts of text data have been proposed. In particular, many recent methods use machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, an approach called "automatic summarization". However, almost all text summarization methods proposed to date build the summary around the most frequent contents of the original documents. Such summaries tend to omit minor subjects that are mentioned less often in the original text. If a summary covers only the major subject, bias occurs and information is lost, so it becomes impossible to ascertain every subject a document contains. This bias can be avoided by summarizing with a balance across the topics of a document so that every subject is represented, but an unbalanced distribution between the subjects still remains. To retain subject balance in a summary, it is necessary to consider the proportion of each subject in the original documents and to allocate the portions equally, so that sentences from even the minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance between all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summarization, we use two summary-evaluation concepts, "completeness" and "succinctness": completeness means the summary should fully cover the contents of the original documents, and succinctness means the summary should contain minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term relates to each topic; from these weights, highly related terms for every topic can be identified, and the subjects of documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well, called "seed terms", are then selected. Because seed terms alone are too few to describe each subject, the dictionary requires sufficiently many terms similar to the seeds. Word2Vec is used for this word expansion: after training, word vectors are obtained, and the similarity between any two terms can be derived via cosine similarity, where a higher cosine similarity indicates a stronger relationship. Terms with high similarity to each subject's seed terms are selected, and after filtering these expanded terms, the subject dictionary is finally constructed. The second phase allocates a subject to every sentence in the original documents. To grasp the content of each sentence, a frequency analysis is first conducted over the terms composing the subject dictionaries. The TF-IDF weight of each subject is then calculated, which indicates how much each sentence says about each subject. Because TF-IDF weights can grow without bound, the weights of every sentence are normalized to values between 0 and 1. Each sentence is then assigned the subject with the maximum TF-IDF weight, which finally yields a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences and form a similarity matrix; by repeatedly selecting sentences, a summary can be generated that fully covers the original documents while minimizing internal duplication. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between the proposed method's summary and a frequency-based summary verified that the proposed summary better retains the balance of all the subjects the documents originally contain.
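The Word2Vec-based dictionary expansion of the first phase can be sketched with gensim as follows; the corpus, seed terms, vector size, and the 0.6 similarity cutoff are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch of subject-dictionary expansion: train Word2Vec on the
# corpus, then grow each subject's seed terms with their nearest neighbours
# by cosine similarity. Seeds and the similarity cutoff are assumptions.
from gensim.models import Word2Vec

def expand_dictionary(tokenized_docs, seed_terms, topn=20, min_sim=0.6):
    """tokenized_docs: list of token lists; seed_terms: {subject: [seeds]}."""
    model = Word2Vec(sentences=tokenized_docs, vector_size=100, window=5,
                     min_count=2, workers=4)
    expanded = {}
    for subject, seeds in seed_terms.items():
        terms = set(seeds)
        for seed in seeds:
            if seed in model.wv:                       # skip out-of-vocabulary seeds
                terms |= {w for w, sim in model.wv.most_similar(seed, topn=topn)
                          if sim >= min_sim}
        expanded[subject] = terms
    return expanded
```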

Three-dimensional Model Generation for Active Shape Model Algorithm (능동모양모델 알고리듬을 위한 삼차원 모델생성 기법)

  • Lim, Seong-Jae;Jeong, Yong-Yeon;Ho, Yo-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.6 s.312 / pp.28-35 / 2006
  • Statistical models of shape variability based on active shape models (ASMs) have been successfully used for segmentation and recognition tasks in two-dimensional (2D) images. Three-dimensional (3D) model-based approaches are more promising than 2D approaches because they impose more realistic shape constraints for recognizing and delineating object boundaries. For 3D model-based approaches, however, building the 3D shape model from a training set of segmented instances of an object is a major challenge and currently remains an open problem. One essential step in building the 3D shape model is generating a point distribution model (PDM), which requires corresponding landmarks to be selected in all training shapes; manual determination of landmark correspondences is very time-consuming, tedious, and error-prone. In this paper, we propose a novel automatic method for generating 3D statistical shape models. Given a set of training 3D shapes, we generate a 3D model by 1) building the mean shape from the distance transform of the training shapes, 2) using a tetrahedron method to automatically select landmarks on the mean shape, and 3) subsequently propagating these landmarks to each training shape via a distance-labeling method. We investigate the accuracy and compactness of the 3D model of the human liver built from 50 segmented individual CT data sets. The proposed method is very general, free of such manual-correspondence assumptions, and can be applied to other data sets.
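Step 1, building the mean shape from distance transforms, can be sketched as follows; averaging signed distance fields and taking the zero level set is one standard realization, assumed here since the paper's exact formulation is not reproduced.

```python
# A minimal sketch of building a mean shape from signed distance transforms
# of binary training segmentations; the zero-level-set averaging is an
# assumed standard realization, not the paper's exact formulation.
import numpy as np
from scipy import ndimage

def mean_shape(binary_volumes):
    """Average signed distance maps and threshold at zero -> mean shape mask."""
    sdfs = []
    for vol in binary_volumes:
        inside = ndimage.distance_transform_edt(vol)        # distance to boundary, inside
        outside = ndimage.distance_transform_edt(1 - vol)   # distance to boundary, outside
        sdfs.append(inside - outside)                       # signed distance field
    return np.mean(sdfs, axis=0) > 0                        # mean zero level set
```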

A Dynamic Management Method for FOAF Using RSS and OLAP cube (RSS와 OLAP 큐브를 이용한 FOAF의 동적 관리 기법)

  • Sohn, Jong-Soo;Chung, In-Jeong
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.39-60 / 2011
  • Since the introduction of Web 2.0 technology, social network services have been recognized as a foundation of important future information technology. The advent of Web 2.0 changed who creates content: in the existing web, content creators were the service providers, whereas in the recent web they are the service users. Users share experiences with other users and improve content quality, which has increased the importance of social networks. As a result, diverse forms of social network service have emerged from the relations and experiences of users. A social network is a network that constructs and expresses social relations among people who share interests and activities. Today's social network services are not confined to showing user interactions; they have developed to the point where content generation and evaluation interact with each other. As the volume of content generated by social network services and the number of connections between users have drastically increased, social network extraction has become more complicated, and the following problems arise. First, the representational power of objects in the social network is insufficient. Second, the diverse connections among users cannot be expressed. Third, it is difficult to reflect dynamic changes in the social network caused by changes in user interests. Lastly, there is no method capable of integrating and processing data efficiently in a heterogeneous distributed computing environment. The first and last problems can be solved with FOAF, a tool for describing ontology-based user profiles for social network construction; solving the second and third problems, however, requires a novel technique that reflects dynamic changes in user interests and relations. In this paper, we propose a method that overcomes the above problems of existing social network extraction by applying FOAF and RSS (a publishing mechanism for web content) to an OLAP system in order to dynamically update and manage FOAF. We exploit data interoperability, an important characteristic of FOAF, and use RSS to reflect changes such as the flow of time and shifts in user interests. RSS provides a standard vocabulary for distributing site and content information in RDF/XML form. We collect users' personal information and relations via FOAF and user content via RSS, and insert the collected data into a database using a star schema. The proposed system generates an OLAP cube from the data in the database, and the "Dynamic FOAF Management Algorithm" processes the generated cube. The algorithm consists of two functions: find_id_interest(), which extracts user interests during the input period, and find_relation(), which extracts users matching those interests. Finally, the system reconstructs FOAF by reflecting the extracted relationships and interests. To justify the suggested idea, we present and analyze an implementation written in C# with an MS-SQL database, using FOAF and RSS data collected from livejournal.com. The results show that users' foaf:interest entries increased by an average of 19 percent over four weeks, and in proportion to this change, users' foaf:knows entries grew by an average of 9 percent over the same period. Because FOAF and RSS are widely supported basic data formats in Web 2.0 and social network services, the method has a clear advantage in utilizing user data distributed across diverse web sites and services, regardless of language or type of computer. Using the method suggested in this paper, better services can be provided that cope with rapid changes in user interests through the automatic updating of FOAF.
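A rough sketch of what find_id_interest() could compute over the star-schema fact table follows, using a pandas pivot table as a stand-in for the OLAP cube; the column names and top-N cutoff are illustrative assumptions (the paper's implementation is C# with MS-SQL).

```python
# A minimal sketch of find_id_interest() over a star-schema fact table,
# with a pandas pivot table standing in for the OLAP cube; column names
# and the top-N cutoff are illustrative assumptions.
import pandas as pd

def find_id_interest(facts: pd.DataFrame, start, end, top_n: int = 5):
    """facts: columns ['user', 'interest', 'date'] -> top interests per user."""
    window = facts[(facts["date"] >= start) & (facts["date"] <= end)]
    cube = window.pivot_table(index="user", columns="interest",
                              aggfunc="size", fill_value=0)   # user x interest counts
    return {u: cube.loc[u].nlargest(top_n).index.tolist() for u in cube.index}
```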

Analysis of the growth environment and fruiting body quality of Pleurotus eryngii cultivated by Smart Farming (큰느타리(새송이)버섯 스마트팜 재배를 통한 생육환경 분석 및 자실체 품질 특성)

  • Kim, Kil-Ja;Kim, Da-Mi;An, Ho-Sub;Choi, Jin-Kyung;Kim, Seon-Gon
    • Journal of Mushroom / v.17 no.4 / pp.211-217 / 2019
  • Currently, cultivation of mushrooms using the Information and Communication Technology (ICT)-based smart farming technique is increasing rapidly. The main environmental factors for mushroom growth are temperature, humidity, carbon dioxide (CO2), and light. Of these, only temperature is currently kept under automatic control; humidity and ventilation are controlled with a timer, based on technical experience. Therefore, in this study, a first-generation smart farm model for Pleurotus eryngii was set up that can automatically control temperature, humidity, and ventilation. After installing the environmental control system and the monitoring device, the environmental conditions of the mushroom cultivation room and the growth of the fruiting bodies were studied, and the data were compared with those from the conventional cultivation method. In farm A, the temperature was about 17℃ during the primordia formation stage and was maintained at approximately 16℃ during the fruiting stage. The humidity was initially maintained at 95%, and the farm was not humidified after the primordia formation stage. There was no sensor for CO2 management, so the system was ventilated as needed by observing the shape of the pileus and the stipe; the CO2 concentration during the growth period was between 700 and 2,500 ppm. The average weight of the mushrooms produced in farm A was 125 g, and their quality lay between the premium and first grades. In farm B, the CO2 sensor was used for measurement only, and the system was likewise ventilated by observing the shape of the pileus and the stipe; during the growth period, the CO2 concentration was between 640 and 4,500 ppm. The average weight of the mushrooms produced in farm B was 102 g. These results indicate that the quality of the king oyster mushroom is determined by the environmental conditions, especially the CO2 concentration. Thus, the data obtained in this study, with the environmental control method of farm A improved to yield better-quality mushrooms, can serve as an optimal smart farm model.
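As an illustration of the kind of rule such a controller can apply, here is a minimal sketch of a CO2-triggered ventilation command with hysteresis; the 1,500 ppm setpoint and 200 ppm band are assumed values for illustration, not settings measured on the farms.

```python
# A minimal sketch of a CO2-triggered ventilation rule with hysteresis;
# the setpoint and band are illustrative assumptions, not measured settings.
def ventilation_command(co2_ppm: float, fan_on: bool,
                        setpoint: float = 1500.0, band: float = 200.0) -> bool:
    """Turn the fan on above setpoint+band, off below setpoint-band."""
    if co2_ppm > setpoint + band:
        return True
    if co2_ppm < setpoint - band:
        return False
    return fan_on          # inside the hysteresis band: keep current state
```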