• Title/Summary/Keyword: Text data

Search Results: 2,959

A Study on the Consumer Perception of Metaverse Before and After COVID-19 through Big Data Analysis (빅데이터 분석을 통한 코로나 이전과 이후 메타버스에 대한 소비자의 인식에 관한 연구)

  • Park, Sung-Woo; Park, Jun-Ho; Ryu, Ki-Hwan
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.287-294 / 2022
  • The purpose of this study is to examine consumers' perceptions of the "metaverse," a technology newly in the spotlight, through big data analysis, as the non-face-to-face society has continued since the outbreak of COVID-19. The study conducted a big data analysis using text mining to compare consumers' perceptions of the metaverse before and after COVID-19. The top 30 keywords were extracted after keyword refinement, and based on these, visualization was performed through network analysis and CONCOR analysis of the relations between keywords. The analysis confirmed that, as the non-face-to-face society persisted, the metaverse emerged as a trend. Before COVID-19, discussion of the metaverse centered on text data such as SNS posts, as a form of life logging; afterwards, attention shifted to virtual reality spaces, with many platforms being created and the industry expanding. A limitation of this study is that, because the data were collected from portal-site search frequencies, which are anonymous, demographic characteristics could not be reflected in the collected data.
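
The text-mining pipeline this abstract outlines (keyword frequency counts, a co-occurrence network, and CONCOR-style grouping) can be illustrated with a minimal Python sketch. The `posts` list, the whitespace tokenization, and the simple correlation-iteration step are stand-ins for the paper's portal-site data and tooling, not the authors' implementation.

```python
# Minimal sketch: top-keyword extraction, co-occurrence network, CONCOR-style grouping.
from collections import Counter
from itertools import combinations

import numpy as np

posts = [
    "metaverse platform virtual reality growth",
    "covid non-face-to-face metaverse sns life logging",
    "virtual reality platform industry expansion",
]  # placeholder documents, not the study's data

tokens_per_post = [p.split() for p in posts]            # naive tokenization
freq = Counter(t for toks in tokens_per_post for t in toks)
top_keywords = [w for w, _ in freq.most_common(30)]     # "top 30 keywords"
index = {w: i for i, w in enumerate(top_keywords)}

# Co-occurrence matrix for the network analysis step
co = np.zeros((len(top_keywords), len(top_keywords)))
for toks in tokens_per_post:
    present = sorted({index[t] for t in toks if t in index})
    for i, j in combinations(present, 2):
        co[i, j] += 1
        co[j, i] += 1

# CONCOR-style step: iterate correlations of the correlation matrix until the
# values polarize, then threshold to split the keywords into two blocks.
corr = np.corrcoef(co)
for _ in range(20):
    corr = np.nan_to_num(np.corrcoef(corr))
blocks = corr[0] > 0  # keywords grouped with the first keyword
print(top_keywords)
print(blocks)
```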

Big Data Application for Judgment on Consumer's Awareness of the Trademark (상표의 소비자 인식 판단을 위한 빅데이터 활용 방안)

  • You, Hyun-Woo; Lee, Hwan-soo
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.6 no.8 / pp.399-408 / 2016
  • As we enter the big data age, the use of big data is also increasing in the intellectual property sector. The essential purpose of a trademark, which distinguishes the source of goods, is to enable the public to recognize those goods. Big data technologies, which have recently drawn attention, can be used as a tool to judge consumers' awareness of a trademark. Judging trademark awareness has been difficult with traditional methods. Survey methodology received attention as a new approach and was applied in the field of trademark law, but various problems such as cost, time, objectivity, and fairness were observed. To overcome these limitations, this study proposes a new method that utilizes big data analytics to judge consumers' awareness of a trademark. This method will not only contribute to enhancing the objectivity of judging trademark awareness but can also be used to support related legal judgments.
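
As an illustration only, the kind of big-data proxy the paper argues for could be as simple as measuring the share of social posts that mention the mark; the post texts and the mark name `CloudFizz` below are hypothetical and do not come from the study.

```python
# Illustrative sketch: mention share in social posts as a rough awareness indicator.
posts = [
    "bought a CloudFizz soda yesterday, loved it",
    "is CloudFizz a brand or just a generic name for fizzy water?",
    "trying a new sparkling water today",
]
mark = "cloudfizz"  # hypothetical trademark

mentions = sum(mark in p.lower() for p in posts)
awareness_share = mentions / len(posts)
print(f"{mentions} of {len(posts)} posts mention the mark "
      f"({awareness_share:.0%}) - a possible awareness indicator")
```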

Development of Social Data Collection and Loading Engine-based Reliability analysis System Against Infectious Disease Pandemic (감염병 위기 대응을 위한 소셜 데이터 수집 및 적재 엔진 기반 신뢰도 분석 시스템 개발)

  • Doo Young Jung; Sang-Jun Lee; Min Kyung Il; Seogsong Jeong; HyunWook Han
    • The Journal of Bigdata / v.7 no.2 / pp.103-111 / 2022
  • There are many institutions, organizations, and sites involved in responding to infectious diseases, but as pandemic situations such as COVID-19 persist for years, initial and current circumstances change considerably, and policies and response systems evolve accordingly. As a result, regional gaps arise, and various problems emerge around trust in, distrust of, and implementation of policies. Therefore, this study developed a system that collects and analyzes social data, focusing on Twitter, one of the major social media platforms and a channel that often carries inaccurate information from unknown sources, so that facts can be checked in advance. Based on this unstructured social data, an algorithm that can automatically detect infectious disease threats is developed to create an objective basis for responding to infectious disease crises and to strengthen international competitiveness in related fields.
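
A minimal sketch of such screening, assuming tweet-like records with a text field and a source flag; the keyword list, threshold, and labels are illustrative and not the authors' algorithm.

```python
# Toy threat/reliability screening over tweet-like records (illustrative rules only).
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    source_verified: bool  # e.g., verified account / known institution

THREAT_TERMS = {"outbreak", "infection", "cluster", "quarantine", "variant"}

def threat_score(post: Post) -> float:
    """Fraction of threat-related terms present in the post."""
    words = set(post.text.lower().split())
    return len(words & THREAT_TERMS) / len(THREAT_TERMS)

def reliability_flag(post: Post) -> str:
    """Very rough reliability label combining source and content signals."""
    if not post.source_verified and threat_score(post) > 0.2:
        return "check-before-spreading"
    return "low-risk"

feed = [
    Post("new infection cluster reported near the station, quarantine soon?", False),
    Post("official briefing: vaccination schedule updated", True),
]
for p in feed:
    print(threat_score(p), reliability_flag(p), "-", p.text)
```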

Privacy Preserving Techniques for Deep Learning in Multi-Party System (멀티 파티 시스템에서 딥러닝을 위한 프라이버시 보존 기술)

  • Hye-Kyeong Ko
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.647-654 / 2023
  • Deep learning is a useful method for classifying and recognizing complex data such as images and text, and the accuracy of deep learning methods is what makes artificial intelligence-based services on the Internet useful. However, the vast amount of user data used for training in deep learning has led to privacy violation problems, and there is concern that companies that have collected users' personal and sensitive data, such as photographs and voice recordings, retain that data indefinitely; users cannot delete their data and cannot limit the purpose of its use. For example, data owners such as medical institutions that want to apply deep learning technology to patients' medical records cannot share patient data because of privacy and confidentiality issues, making it difficult to benefit from deep learning technology. In this paper, we design a deep learning technique with privacy preservation applied, which allows multiple workers to use a neural network model jointly, without sharing input datasets, in a multi-party system. We propose a method that can selectively share small subsets using an optimization algorithm based on modified stochastic gradient descent, and confirm that it can facilitate training with increased learning accuracy while protecting private information.
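
The selective-sharing idea can be sketched compactly: each party computes gradients on its private data and uploads only a small, high-magnitude subset of coordinates to a shared model. The logistic-regression model, synthetic data, and 10% share rate below are simplifications for illustration, not the paper's exact optimization algorithm.

```python
# Sketch: multi-party training where only a selected subset of gradient
# coordinates leaves each party (data never shared). Synthetic setup.
import numpy as np

rng = np.random.default_rng(0)
dim, parties, rounds, share_rate, lr = 20, 3, 50, 0.1, 0.5

# Each party holds its own private dataset.
true_w = rng.normal(size=dim)
local_data = []
for _ in range(parties):
    X = rng.normal(size=(100, dim))
    y = (X @ true_w + rng.normal(scale=0.1, size=100) > 0).astype(float)
    local_data.append((X, y))

w_global = np.zeros(dim)  # parameters kept by a coordination server

def local_gradient(w, X, y):
    p = 1 / (1 + np.exp(-(X @ w)))          # logistic model probabilities
    return X.T @ (p - y) / len(y)

for _ in range(rounds):
    for X, y in local_data:
        g = local_gradient(w_global, X, y)
        k = max(1, int(share_rate * dim))    # share only the top-k coordinates
        top = np.argsort(np.abs(g))[-k:]
        update = np.zeros(dim)
        update[top] = g[top]                 # the only values that leave the party
        w_global -= lr * update

acc = np.mean([
    (((1 / (1 + np.exp(-(X @ w_global)))) > 0.5) == y).mean() for X, y in local_data
])
print(f"joint model accuracy on local data: {acc:.2f}")
```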

A study on Wikidata linkage methods for utilization of digital archive records of the National Debt Redemption Movement (국채보상운동 디지털 아카이브 기록물의 활용을 위한 위키데이터 연계 방안에 대한 연구)

  • Seulki Do; Heejin Park
    • Journal of Korean Society of Archives and Records Management / v.23 no.2 / pp.95-115 / 2023
  • This study designed a data model linked to Wikidata and examined its applicability in order to increase the utilization of the digital archive records of the National Debt Redemption Movement, which are registered as Memory of the World heritage; implications were derived by analyzing the existing metadata, thesaurus, and semantic network graph. Through analysis of the original texts of the National Debt Redemption Movement records, key data model classes for linking with Wikidata, such as record item, agent, time, place, and event, were derived. In addition, by identifying core properties for linking between classes and applying the designed data model to actual records, the possibility of acquiring abundant related information through property-centered navigation between classes was confirmed. The results show that Wikidata's strengths can be utilized to increase data usage in local archives where the scale of the data and of its management is relatively small, so the approach can be considered for application in small-scale archives similar to the National Debt Redemption Movement digital archive.
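
As a rough illustration of property-centered linkage, the sketch below queries the public Wikidata SPARQL endpoint for an item by its English label and attaches the result to a hypothetical archive record; the label, the chosen property (P580, start time), and the record fields are assumptions, not the paper's data model.

```python
# Sketch: enrich a (hypothetical) archive record with a Wikidata URI and a property value.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?start WHERE {
  ?item rdfs:label "National Debt Redemption Movement"@en.
  OPTIONAL { ?item wdt:P580 ?start. }   # start time (maps to the "time" class)
}
"""

record = {  # hypothetical archive record to be enriched
    "title": "Donation ledger, Daegu, 1907",
    "event": "National Debt Redemption Movement",
}

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "archive-linking-sketch/0.1"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    record["wikidata_event_uri"] = row["item"]["value"]
    record["event_start"] = row.get("start", {}).get("value")
print(record)
```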

Application of Big Data and Machine-learning (ML) Technology to Mitigate Contractor's Design Risks for Engineering, Procurement, and Construction (EPC) Projects

  • Choi, Seong-Jun; Choi, So-Won; Park, Min-Ji; Lee, Eul-Bum
    • International conference on construction engineering and project management / 2022.06a / pp.823-830 / 2022
  • The risk of project execution increases as Engineering, Procurement, and Construction (EPC) plant projects grow larger and more complex. In the era of the fourth industrial revolution, there is an increasing need to utilize the large amount of data generated during project execution. Design is a key element in the success of an EPC plant project: although design accounts for only about 5% of the total EPC project cost, it is a critical process that affects the entire subsequent process, including construction, installation, and operation & maintenance (O&M). This study aims to develop a system using machine-learning (ML) techniques to predict risks and support decision-making based on the big data generated in an EPC project's design and construction stages. Three main modules were developed: (M1) a design cost estimation module, (M2) a design error check module, and (M3) a change order forecasting module. M1 estimates design cost from project data such as contract amount, construction period, total design cost, and man-hours (M/H). M2 and M3 predict the severity of schedule delay and cost overrun due to design errors and change orders, using unstructured text data extracted from engineering documents. A validation test was performed through a case study to verify the model applied to each module. This study is expected to improve the risk response capability of EPC contractors in the design and construction stages.
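
A small sketch of the kind of estimator module M1 describes, trained on synthetic project attributes (contract amount, construction period, man-hours); the numbers and the choice of a random-forest regressor are illustrative, not the authors' model.

```python
# Sketch: design cost estimation from basic project attributes (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 200
contract_amount = rng.uniform(50, 500, n)        # e.g., million USD
construction_period = rng.uniform(12, 60, n)     # months
man_hours = rng.uniform(5_000, 80_000, n)

# Synthetic target: design cost roughly 5% of contract amount plus a labor term
design_cost = 0.05 * contract_amount + 0.0002 * man_hours + rng.normal(0, 1, n)

X = np.column_stack([contract_amount, construction_period, man_hours])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, design_cost)

new_project = np.array([[300.0, 36.0, 40_000.0]])
print(f"estimated design cost: {model.predict(new_project)[0]:.1f}")
```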

Cost Performance Evaluation Framework through Analysis of Unstructured Construction Supervision Documents using Binomial Logistic Regression (비정형 공사감리문서 정보와 이항 로지스틱 회귀분석을 이용한 건축 현장 비용성과 평가 프레임워크 개발)

  • Kim, Chang-Won; Song, Taegeun; Lee, Kiseok; Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.121-131 / 2024
  • This research explores the potential of leveraging unstructured data from construction supervision documents, which contain detailed inspection insights from independent third-party monitors of building construction processes. With the evolution of analytical methodologies, such unstructured data has been recognized as a valuable source of information, offering diverse insights. The study introduces a framework designed to assess cost performance by applying advanced analytical methods to the unstructured data found in final construction supervision reports. Specifically, key phrases were identified using text mining and social network analysis techniques, and these phrases were then analyzed through binomial logistic regression to assess cost performance. The study found that predictions of cost performance based on unstructured data from supervision documents achieved an accuracy rate of approximately 73%. The findings of this research are anticipated to serve as a foundational resource for analyzing various forms of unstructured data generated within the construction sector in future projects.
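
The final classification step can be sketched as a binomial logistic regression over text-derived features; the report snippets and labels below are fabricated, and a plain TF-IDF vectorizer stands in for the paper's key-phrase selection via text mining and social network analysis.

```python
# Sketch: classify cost performance from supervision-report text (fabricated examples).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "rework ordered due to rebar spacing defect and schedule delay",
    "concrete pour inspection passed without remark",
    "waterproofing defect found, corrective work and cost increase expected",
    "finishing work proceeding as planned, no nonconformance",
]
cost_overrun = [1, 0, 1, 0]  # 1 = poor cost performance

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, cost_overrun)
print(clf.predict_proba(["minor defect noted, rework scheduled"])[0][1])
```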

Corpus of Eye Movements in L3 Spanish Reading: A Prediction Model

  • Hui-Chuan Lu; Li-Chi Kao; Zong-Han Li; Wen-Hsiang Lu; An-Chung Cheng
    • Asia Pacific Journal of Corpus Research / v.5 no.1 / pp.23-36 / 2024
  • This research centers on the Taiwan Eye-Movement Corpus of Spanish (TECS), a specially created corpus comprising eye-tracking data from Chinese-speaking learners of Spanish as a third language in Taiwan. Its primary purpose is to explore the broad utility of TECS in understanding language learning processes, particularly the initial stages of language learning. Constructing the corpus involves gathering data on eye-tracking, reading comprehension, and language proficiency to develop a machine-learning model that predicts learner behaviors, which subsequently undergoes a predictability test for validation. The focus is on examining attention in input processing and its relationship to language learning outcomes. The TECS eye-tracking data consist of indicators derived from eye movement recordings made while reading Spanish sentences with temporal references; these indicators are obtained from eye movement experiments focusing on tense verbal inflections and temporal adverbs. Chinese expresses tense using aspect markers, lexical references, and contextual cues, differing significantly from inflectional languages like Spanish, so Chinese-speaking learners of Spanish face particular challenges in learning verbal morphology and tenses. The data from the eye movement experiments were structured into feature vectors, with learner behaviors serving as class labels. After categorizing the collected data, we used two types of machine learning methods for classification and regression: Random Forests and the k-nearest neighbors algorithm (KNN). By leveraging these algorithms, we predicted learner behaviors and conducted performance evaluations to enhance our understanding of the nexus between learner behaviors and the language learning process. Future research may further enrich TECS by gathering data from subsequent eye-movement experiments, specifically targeting various Spanish tenses and temporal lexical references during text reading. These endeavors promise to broaden and refine the corpus, advancing our understanding of language processing.
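
A brief sketch of the classification setup described here, with synthetic eye-movement indicators as feature vectors and Random Forest and KNN compared by cross-validation; the feature names and values are placeholders for the TECS indicators.

```python
# Sketch: predict a learner-behavior label from eye-movement features (synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n = 120
# e.g., first-fixation duration (ms), total reading time (ms), regression count
X = np.column_stack([
    rng.normal(250, 50, n),
    rng.normal(1200, 300, n),
    rng.poisson(2, n),
])
y = (X[:, 1] + 100 * X[:, 2] > 1400).astype(int)  # synthetic "behavior" label

for name, model in [("RandomForest", RandomForestClassifier(random_state=0)),
                    ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean().round(2))
```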

A Digital Library Prototype for Access to Diverse Collections (다양한 장서 접근을 위한 디지털 도서관의 프로토타입 구축)

  • Choi Won-Tae
    • Journal of the Korean Society for Library and Information Science / v.32 no.2 / pp.295-307 / 1998
  • This article is an overview of the digital library project, indicating what roles Korea's diverse digital collections may play. Our digital library prototype has a simple architecture, consisting of digital repositories, filters, indexing and searching, and clients. Digital repositories include various types of materials and databases. The role of the filters is to recognize the format of a document collection and mark the structural components of each of its documents. We use a database management system (ORACLE and ConText) supporting user-defined functions and access methods, which allows us to easily incorporate new object analysis, structuring, and indexing technology into a repository. Clients are browsers or viewers designed for different document data types, such as image, audio, video, SGML, PDF, and KORMARC. The combination of navigational tools supports a variety of approaches to identifying collections and browsing or searching for individual items. The search interface was implemented using HTML forms and the World Wide Web's CGI mechanism.
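
A toy sketch of the filter role described above: detect a collection's format and decide which client handles it and whether it is indexable. The format table and handlers are illustrative; the original prototype used ORACLE/ConText and a CGI search interface.

```python
# Sketch: a "filter" that recognizes document formats and routes them to clients.
from pathlib import Path

VIEWERS = {
    ".sgml": "structure-aware text viewer",
    ".pdf": "page-image viewer",
    ".jpg": "image viewer",
    ".mrc": "KORMARC record display",
}

def filter_document(path: str) -> dict:
    """Detect the collection format and describe how the item is handled."""
    ext = Path(path).suffix.lower()
    return {
        "path": path,
        "format": ext.lstrip("."),
        "client": VIEWERS.get(ext, "generic download"),
        "indexable_text": ext in {".sgml", ".mrc"},
    }

print(filter_document("collections/annual_report_1997.sgml"))
```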

A Rule-based Approach to Identifying Citation Text from Korean Academic Literature (한국어 학술 문헌의 본문 인용문 인식을 위한 규칙 기반 방법)

  • Kang, In-Su
    • Journal of the Korean Society for information Management / v.29 no.4 / pp.43-60 / 2012
  • Identifying citing sentences in article full text is a prerequisite for a variety of future academic information services, such as citation-based automatic summarization, automatic generation of review articles, sentiment analysis of citing statements, and information retrieval based on citation contexts. However, finding citing sentences is not easy because of implicit citing sentences, which have no explicit citation markers. While several methods have been proposed to attack this problem for English, it is difficult to find such automatic methods for Korean academic literature. This article presents a rule-based approach to identifying Korean citing sentences. Experiments show that the proposed method could find 30% of the implicit citing sentences in our test data with nearly 70% precision.
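
A rule-based detector in this spirit can be sketched with a few marker patterns plus one continuation rule for implicit citations; the patterns and the connective list below are simplified examples, not the paper's actual rule set.

```python
# Sketch: rule-based labeling of explicit and (one kind of) implicit citing sentences.
import re

EXPLICIT = [
    re.compile(r"\(\s*[A-Z가-힣][^()]*\d{4}\s*\)"),   # (Kim, 2010) / (김철수, 2010)
    re.compile(r"\[\d+(,\s*\d+)*\]"),                  # [3] or [3, 7]
]
CONTINUATION = ("또한", "그리고", "이러한", "However", "Moreover")

def label_citing_sentences(sentences):
    labels, prev_citing = [], False
    for s in sentences:
        explicit = any(p.search(s) for p in EXPLICIT)
        # implicit rule: a sentence that follows a citing sentence and begins
        # with a connective is treated as continuing the citation
        implicit = prev_citing and s.strip().startswith(CONTINUATION)
        labels.append(explicit or implicit)
        prev_citing = explicit or implicit
    return labels

doc = [
    "이전 연구에서 규칙 기반 방법이 제안되었다 (Kim, 2010).",
    "또한 해당 방법은 영어 문헌에 적용되었다.",
    "본 연구는 새로운 자질을 추가한다.",
]
print(label_citing_sentences(doc))  # expected: [True, True, False]
```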