• Title/Summary/Keyword: Table Data

Test Dataset for validating the meaning of Table Machine Reading Language Model (표 기계독해 언어 모형의 의미 검증을 위한 테스트 데이터셋)

  • YU, Jae-Min; Cho, Sanghyun; Kwon, Hyuk-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.164-167 / 2022
  • In table machine reading comprehension, the knowledge a language model needs and the structural form of the tables change with the domain, causing a larger performance degradation than on plain text data. In this paper, we propose a pre-training data construction method and an adversarial learning method based on selecting meaningful tabular data, in order to build a pre-trained table language model that is robust to such domain changes. To detect tables that are used only for the decoration of web documents and carry no structural information, heuristic rules that identify header data were defined and applied to select table data from the extracted tables. An adversarial learning method between tabular data and infobox data, which carries knowledge about entities, was then applied. Compared with training on the existing unrefined data, training on the refined data improved F1 by 3.45 and EM by 4.14 points on the KorQuAD table data, and F1 by 19.38 and EM by 4.22 points on the Spec table QA data.
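
The abstract does not spell out the heuristic rules, so the following is only a minimal sketch of how decorative layout tables might be filtered from extracted web tables by checking for a plausible header row. All function names and thresholds below are illustrative assumptions, not the paper's actual rules.

```python
# Minimal sketch of heuristic table filtering. A table is assumed to be a
# list of rows, each a list of cell strings; the checks are illustrative.

def looks_like_header(row):
    """Assume a header row consists of short, non-numeric text cells."""
    def is_numeric(cell):
        return cell.replace('.', '', 1).replace(',', '').isdigit()
    cells = [c.strip() for c in row if c.strip()]
    return bool(cells) and not any(is_numeric(c) for c in cells)

def is_meaningful_table(table, min_rows=2, min_cols=2):
    """Reject decoration/layout tables: too small, ragged, or headerless."""
    if len(table) < min_rows:
        return False
    widths = {len(row) for row in table}
    if len(widths) != 1 or widths.pop() < min_cols:
        return False  # ragged tables are usually page layout, not data
    # Accept if either the first row or the first column looks like a head.
    return looks_like_header(table[0]) or looks_like_header([r[0] for r in table])

table = [["Model", "F1", "EM"], ["Base", "80.1", "72.3"]]
print(is_meaningful_table(table))  # True
```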

Korean TableQA: Structured data question answering based on span prediction style with S3-NET

  • Park, Cheoneum; Kim, Myungji; Park, Soyoon; Lim, Seungyoung; Lee, Jooyoul; Lee, Changki
    • ETRI Journal / v.42 no.6 / pp.899-911 / 2020
  • The data in tables are accurate and rich in information, which facilitates the performance of information extraction and question answering (QA) tasks. TableQA, which is based on tables, solves problems by understanding the table structure and searching for answers to questions. In this paper, we introduce both novice and intermediate Korean TableQA tasks that involve deducing the answer to a question from structured tabular data and using it to build a question answering pair. To solve Korean TableQA tasks, we use S3-NET, which has shown a good performance in machine reading comprehension (MRC), and propose a method of converting structured tabular data into a record format suitable for MRC. Our experimental results show that the proposed method outperforms a baseline in both the novice task (exact match (EM) 96.48% and F1 97.06%) and intermediate task (EM 99.30% and F1 99.55%).
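
A minimal sketch of the kind of table-to-record conversion the abstract describes: each data row is rendered as a short pseudo-sentence so that a span-prediction MRC model can locate answers in it. The exact serialization used with S3-NET is not given in the abstract, so the format below is an assumption.

```python
# Sketch: flatten a structured table into record-format text for span MRC.
# Each row becomes "header is value" clauses joined into one record.

def table_to_records(header, rows):
    records = []
    for row in rows:
        clauses = [f"{h} is {v}" for h, v in zip(header, row)]
        records.append(", ".join(clauses) + ".")
    return " ".join(records)

header = ["Country", "Capital"]
rows = [["Korea", "Seoul"], ["Japan", "Tokyo"]]
context = table_to_records(header, rows)
print(context)
# Country is Korea, Capital is Seoul. Country is Japan, Capital is Tokyo.
```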

A Table Integration Technique Using Query Similarity Analysis

  • Choi, Go-Bong; Woo, Yong-Tae
    • Journal of the Korea Society of Computer and Information / v.24 no.3 / pp.105-112 / 2019
  • In this paper, we propose a technique that analyzes the similarity between SQL queries and assists in integrating similar tables. First, table information is extracted from the SQL queries by a query structure analyzer, and the similarity between tables is measured with the Jaccard index. Similar-table clusters are then generated through hierarchical cluster analysis, and the co-occurrence probability of tables used in the same query is calculated. Using the similarity and the co-occurrence probability, the clusters are classified by their potential for integration: clusters that can be integrated, clusters requiring expert review, and clusters with low integration potential. Because this technique analyzes the SQL queries used in practice and assesses table integration independently of the existing business logic, an existing schema can be restructured effectively without interrupting work or incurring additional cost.
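
A minimal sketch of the pipeline described above: extract the tables referenced by each SQL query, build per-table usage profiles, and score table pairs with the Jaccard index. The regex-based extraction and the sample queries are illustrative assumptions, not the paper's query structure analyzer.

```python
# Sketch: Jaccard similarity between tables based on shared query usage.

import re
from itertools import combinations

def tables_in(query):
    """Naively pull table names after FROM/JOIN (illustrative only)."""
    return set(re.findall(r"(?:from|join)\s+(\w+)", query, re.IGNORECASE))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

queries = [
    "SELECT * FROM orders JOIN customers ON orders.cid = customers.id",
    "SELECT * FROM orders JOIN items ON orders.oid = items.oid",
]

# Invert: for each table, the set of queries that use it (usage profile).
usage = {}
for i, q in enumerate(queries):
    for t in tables_in(q):
        usage.setdefault(t, set()).add(i)

# Pairwise Jaccard scores feed the hierarchical clustering step.
for t1, t2 in combinations(sorted(usage), 2):
    print(t1, t2, jaccard(usage[t1], usage[t2]))
```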

New Growth Power, Economic Effect Analysis of Software Industry (신성장 동력, 소프트웨어산업의 경제적 파급효과 분석)

  • Choi, Jinho; Ryu, Jae Hong
    • Journal of Information Technology Applications and Management / v.21 no.4_spc / pp.381-401 / 2014
  • This study derives more accurate economic effects (employment inducement coefficient, hiring inducement coefficient, index of the sensitivity of dispersion, index of the power of dispersion, and ratio of value added) of the Korean software industry by analyzing inter-industry relations with a modified inter-industry table. Previous studies on inter-industry analysis were reviewed and two key problems were identified. First, in the current inter-industry table published by the Bank of Korea, the output of the software industry includes not only the output of the pure software industry (package software and IT services) but also, owing to misclassification, output from non-software industries, which makes the recorded output larger than the actual output of the software industry. Second, rewriting the inter-industry table changes the output figures. The inter-industry table records, in rows and columns, the transactions of goods and services among industries that each industry requires to continue its activities; if only the output of a specific industry is changed, the table's reliability degrades because it is prepared on the basis of relations with the other industries, and the resulting economic effect coefficients may be over- or underestimated. This study corrects these problems to obtain more accurate economic effects of the software industry. First, to obtain the output of the pure software section alone, data from the Korea Electronics Association (KEA) were used in the inter-industry table. Second, to avoid output discrepancies while rewriting the inter-industry table, the difference between the output in the current inter-industry table and the output from the KEA data was identified and defined as the non-software section output for the analysis. The results show that the pure software section's economic effect coefficients are lower than those of the non-software section, which stems from the difference between the Bank of Korea and KEA data. The significance of this study lies in providing accurate economic effects of the Korean software industry.
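
For reference, the dispersion indices mentioned above follow from the standard input-output machinery: with input coefficient matrix A, the Leontief inverse is $L = (I - A)^{-1}$, and a sector's power (sensitivity) of dispersion is its column (row) sum of L divided by the grand mean of those sums. A small numpy sketch with made-up coefficients, not figures from the paper:

```python
# Sketch: Leontief inverse and dispersion indices from an input-output table.

import numpy as np

A = np.array([[0.2, 0.3],   # illustrative input coefficient matrix
              [0.1, 0.4]])
L = np.linalg.inv(np.eye(2) - A)        # Leontief inverse (I - A)^-1

grand_mean = L.sum() / L.shape[0]       # average column (= row) sum of L
power_of_dispersion = L.sum(axis=0) / grand_mean        # backward linkage
sensitivity_of_dispersion = L.sum(axis=1) / grand_mean  # forward linkage

print(L)
print(power_of_dispersion, sensitivity_of_dispersion)
```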

Mapping Design between XML and Table in Relation Database (XML과 관계 데이터베이스 자료 간의 매핑 설계)

  • Kim, Gil-Choon
    • Journal of Digital Contents Society / v.5 no.3 / pp.180-186 / 2004
  • XML provides an essential function for handling standardized documents in all academic and industrial areas as well as in e-commerce. Transforming XML data into relational database tables is also necessary so that the data can be searched with SQL. Such a transformation requires a mapping relation between XML and the tables of a relational database. This article studies the mapping relation between XML and a relational database using a DTD, which enables validity to be checked automatically whenever a document is read, and on that basis presents a mapping design for transforming XML data into relational database tables.
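
A minimal sketch of the element-to-table mapping idea: repeated child elements become rows and their children become columns, after which the data can be queried with SQL. The sample document, table layout, and use of SQLite are illustrative assumptions, not the article's mapping design.

```python
# Sketch: map repeated XML elements into a relational table, then query it.

import sqlite3
import xml.etree.ElementTree as ET

xml_doc = """
<books>
  <book><title>XML Basics</title><year>2004</year></book>
  <book><title>SQL Basics</title><year>2003</year></book>
</books>
"""

root = ET.fromstring(xml_doc)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
for book in root.findall("book"):          # each <book> becomes one row
    conn.execute("INSERT INTO book VALUES (?, ?)",
                 (book.findtext("title"), int(book.findtext("year"))))

print(conn.execute("SELECT * FROM book WHERE year >= 2004").fetchall())
```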

Bayesian pooling for contingency tables from small areas

  • Jo, Aejung; Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.27 no.6 / pp.1621-1629 / 2016
  • This paper studies Bayesian pooling for the analysis of categorical data from small areas. Many surveys consist of categorical data collected on a contingency table in each area. Statistical inference for small areas requires considerable care because the subpopulation sample sizes are usually very small. Typically we use a hierarchical Bayesian model to pool subpopulation data; however, the customary hierarchical Bayesian models may specify more exchangeability than is warranted. We therefore investigate the effects of pooling in hierarchical Bayesian modeling for contingency tables from small areas. Specifically, this paper focuses on methods of direct or indirect pooling of the categorical data collected on each area's contingency table through Dirichlet priors. We compare the pooling effects of hierarchical Bayesian models by fitting simulated data. The analysis is carried out using Markov chain Monte Carlo methods.
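
A minimal sketch of direct pooling through a Dirichlet prior: each area's cell counts receive a conjugate Dirichlet-multinomial update, with the prior centered on the pooled table and a fixed prior strength controlling the degree of pooling. The data and the prior strength are made up, and this closed-form update stands in for the MCMC fitting of the fuller hierarchical models in the paper.

```python
# Sketch: Dirichlet-multinomial pooling for small-area contingency tables.

import numpy as np

areas = np.array([[3, 1, 0, 2],    # 2x2 tables flattened to 4 cells per area
                  [0, 2, 1, 1],
                  [5, 0, 1, 3]])

pooled = areas.sum(axis=0) / areas.sum()   # pooled cell proportions
tau = 8.0                                  # prior strength: degree of pooling

for counts in areas:
    alpha_post = tau * pooled + counts     # conjugate Dirichlet posterior
    post_mean = alpha_post / alpha_post.sum()
    print(np.round(post_mean, 3))          # shrunk toward the pooled table
```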

Review on RAM Data Management to Urban Maglev Transit (자기부상열차 RAM DATA 관리방안)

  • Lee, Chang-Deok; Kang, Chan-Yong
    • Proceedings of the KSR Conference / 2007.11a / pp.191-196 / 2007
  • This paper reviews the RAM (Reliability, Availability, and Maintainability) data table used for RAM data management on the Urban Maglev Transit. As railway systems become more complex, RAM requirements are reinforced to ensure that a design meets reliability, availability, and maintainability criteria, so the RAM data of a railway system must be managed efficiently to meet the RAM targets. In this study, a RAM data management format is suggested to ensure reliability and maintainability, based on experience acquired with overseas rolling stock. The RAM data table and the FMECA (Failure Mode, Effects and Criticality Analysis) table are useful for calculating MTBF (Mean Time Between Failures), MTBSF (Mean Time Between Service Failures), and maintainability. This RAM management table should also help improve the RAM evaluation of the Urban Maglev Transit.
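
A minimal sketch of the quantities such a RAM data table feeds, assuming the usual definitions of MTBF and MTBSF as operating time divided by the number of (service-affecting) failures. The failure log is made-up illustrative data, not the paper's format.

```python
# Sketch: MTBF / MTBSF from a RAM data table of operating hours and failures.

def mtbf(operating_hours, failures):
    return operating_hours / failures if failures else float("inf")

failure_log = [
    {"hours": 12000, "failures": 3, "service_failures": 1},  # subsystem A
    {"hours": 12000, "failures": 5, "service_failures": 2},  # subsystem B
]

total_hours = sum(r["hours"] for r in failure_log)
print("MTBF :", mtbf(total_hours, sum(r["failures"] for r in failure_log)))
print("MTBSF:", mtbf(total_hours, sum(r["service_failures"] for r in failure_log)))
```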

Data Cube Generation Method Using Hash Table in Spatial Data Warehouse (공간 데이터 웨어하우스에서 해쉬 테이블을 이용한 데이터큐브의 생성 기법)

  • Li, Yan; Kim, Hyung-Sun; You, Byeong-Seob; Lee, Jae-Dong; Bae, Hae-Young
    • Journal of Korea Multimedia Society / v.9 no.11 / pp.1381-1394 / 2006
  • Methods for generating data cubes have been studied for many years in data warehouses, which support decision making over the stored data. Two previous approaches are the multi-way array algorithm and the H-cubing algorithm, which is based on a hyper-tree. The multi-way array algorithm stores all aggregation data in arrays, so as the base data grow, so does the memory required. The H-cubing algorithm stores all tuples in one tree, so its construction cost increases. In this paper, we present an efficient hash-based data cube generation method that uses a weight mapping table and a record hash table. Because the proposed method uses a hash table, both the generation cost of the data cube and the memory usage are reduced. Our performance study shows that the proposed method provides faster search operations and makes data cube generation more efficient.
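
A minimal sketch of hash-based cube aggregation in the spirit of the abstract: one scan keys every group-by combination of the dimension values into a hash table, so no tuple tree is built. The paper's weight mapping table and record hash table are not detailed in the abstract, so only the basic hashing idea is shown.

```python
# Sketch: aggregate all group-by combinations of a fact table into a dict.

from itertools import combinations

rows = [("Seoul", "2006", 10), ("Seoul", "2007", 20), ("Busan", "2006", 5)]
dims, cube = ("city", "year"), {}

for *vals, measure in rows:
    # Hash into every subset of dimensions ('*' marks a rolled-up dimension).
    for k in range(len(dims) + 1):
        for keep in combinations(range(len(dims)), k):
            key = tuple(vals[i] if i in keep else "*" for i in range(len(dims)))
            cube[key] = cube.get(key, 0) + measure

print(cube[("Seoul", "*")])  # 30: all-years total for Seoul
print(cube[("*", "*")])      # 35: grand total
```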

Resource Attack Based On Flow Table Limitation in SDN (SDN 플로우 테이블 제한에 따른 리소스 어택)

  • Tri, Hiep T. Nguyen; Kim, Kyungbaek
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.215-217 / 2014
  • In Software Defined Networking (SDN), the data plane and the control plane are decoupled. Dumb switches on the data plane simply forward packets based on the flow entries stored in their flow tables; the flow entries are generated by a centralized controller that acts as the brain of the network. However, the size of the flow table is limited, which can lead to a security issue related to Distributed Denial of Service (DDoS): in particular, a resource attack that exhausts the flow table and consumes controller resources. In this paper, we analyze the impact of the flow table limitation on the controller and then propose an approach, called Flow Table Management, to handle it.
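
A minimal sketch of why the limitation matters: a fixed-capacity flow table under a flood of spoofed headers keeps missing and evicting, and every miss escalates to the controller as a packet-in. The LRU eviction policy here is an assumption for illustration; the paper's Flow Table Management scheme is not detailed in the abstract.

```python
# Sketch: a capacity-limited SDN flow table under a resource attack.

from collections import OrderedDict

class FlowTable:
    def __init__(self, capacity):
        self.capacity, self.entries = capacity, OrderedDict()
        self.misses = 0  # each miss costs a packet-in to the controller

    def match(self, flow_key):
        if flow_key in self.entries:
            self.entries.move_to_end(flow_key)  # refresh LRU position
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        self.entries[flow_key] = "forward"
        return False

table = FlowTable(capacity=100)
for i in range(10_000):                 # spoofed flows flood the table
    table.match(("10.0.0.1", f"attacker-{i}"))
print(table.misses)  # 10000: every packet escalates to the controller
```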

Comprehension and Appropriate Use of a Flood Table on a Gamma Camera (감마 카메라의 Flood Table에 대한 이해와 적절한 이용)

  • Kim, Jae-Il; Im, Jeong-Jin; Kim, Jin-Eui; Kim, Hyun-Joo
    • The Korean Journal of Nuclear Medicine Technology / v.15 no.1 / pp.29-33 / 2011
  • Background and Purpose: Uniformity is one of the important quality-control features of a gamma camera. To maintain adequate uniformity, we must acquire suitable flood table (flood map) data, because the flood table depends on the energy and on the type and dose of the input radiation. In this study, we therefore evaluated the difference in uniformity when the input radiation does not match the flood table data or the collimator type. Subjects and Methods: For input radiation, we prepared 370 MBq each of $^{57}Co$, $^{99m}Tc$, and $^{201}Tl$. Using SKYLight (Philips) and Infinia (GE) gamma cameras, we acquired nine uniformity datasets for the three sources: images corrected by the technetium flood table, corrected by the cobalt flood table, and uncorrected. Additionally, we acquired two uniformity images with a collimator, corrected by the intrinsic and the extrinsic flood table respectively. With these data we evaluated and compared the uniformity values. Results: On the SKYLight gamma camera, the uniformities of the images whose input radiation matched the flood table were better than the unmatched uniformities for both $^{99m}Tc$ and $^{57}Co$ (3.96% vs. 5.69%; 4.9% vs. 5.91%). However, because there was no thallium flood table, the uniformities of the $^{201}Tl$ images were markedly poor (7.49%, 7.03%). The Infinia gamma camera showed the same pattern as the SKYLight (3.7% vs. 4.5%). Moreover, the uniformity of the $^{99m}Tc$ image acquired with a collimator and corrected by an extrinsic flood table was better than that corrected by the intrinsic flood table (3.96% vs. 6.28%). Conclusion: Correcting an image with a suitable flood table achieves better uniformity on a gamma camera. We therefore have to acquire images with suitable uniformity correction and update the flood table periodically. Whenever we acquire a nuclear medicine image, we should always check that the flood table is appropriate to the acquisition conditions.
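
For reference, uniformity percentages like those quoted above are typically integral uniformity in the NEMA sense, $(max - min)/(max + min) \times 100$ over the useful field of view. A small sketch with made-up pixel counts, not the study's images:

```python
# Sketch: NEMA-style integral uniformity of a flood image.

import numpy as np

def integral_uniformity(counts):
    c_max, c_min = counts.max(), counts.min()
    return (c_max - c_min) / (c_max + c_min) * 100.0

flood_image = np.array([[ 980, 1010,  995],
                        [1005, 1020,  990],
                        [1000,  985, 1015]], dtype=float)
print(f"{integral_uniformity(flood_image):.2f}%")  # 2.00%
```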
