• Title/Summary/Keyword: Text data


Extraction Analysis for Crossmodal Association Information using Hypernetwork Models (하이퍼네트워크 모델을 이용한 비전-언어 크로스모달 연관정보 추출)

  • Heo, Min-Oh;Ha, Jung-Woo;Zhang, Byoung-Tak
    • Proceedings of the HCI Society of Korea Conference
    • /
    • 2009.02a
    • /
    • pp.278-284
    • /
    • 2009
  • Multimodal data, which combine several modalities such as video, images, sound, and text in a single content item, are increasing. Since this type of data has an ill-defined format, it is not easy to represent its crossmodal information explicitly. We therefore propose a new method to extract and analyze vision-language crossmodal association information using nature documentary video data. We collected pairs of images and captions from three documentary genres (jungle, ocean, and universe) and extracted a set of visual words and a set of text words from them. The analysis showed that the two modalities carry semantic associations in the extracted crossmodal association information.

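The association extraction above relies on a hypernetwork model; as a rough illustration of the underlying idea only, the sketch below scores vision-language word pairs by pointwise mutual information over image-caption pairs. The visual words, text words, and PMI scoring are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter

def pmi_associations(pairs):
    """Score (visual_word, text_word) pairs by pointwise mutual
    information over a corpus of (visual_words, text_words) pairs."""
    vw_count, tw_count, joint = Counter(), Counter(), Counter()
    n = len(pairs)
    for visual_words, text_words in pairs:
        for v in set(visual_words):
            vw_count[v] += 1
        for t in set(text_words):
            tw_count[t] += 1
        for v in set(visual_words):
            for t in set(text_words):
                joint[(v, t)] += 1
    # PMI = log( P(v,t) / (P(v) * P(t)) ), estimated from document counts
    return {
        (v, t): math.log((c * n) / (vw_count[v] * tw_count[t]))
        for (v, t), c in joint.items()
    }

# Hypothetical image-caption pairs from a nature documentary
corpus = [
    (["blue_patch", "wave_texture"], ["ocean", "water"]),
    (["blue_patch"], ["ocean", "fish"]),
    (["green_patch", "leaf_texture"], ["jungle", "tree"]),
]
scores = pmi_associations(corpus)
```

A positive score for a pair such as ("blue_patch", "ocean") indicates the two modal words co-occur more often than chance, which is the kind of semantic association the analysis above looks for.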

Huffman Code Design and PSIP Structure of Hangul Data for Digital Broadcasting (디지털 방송용 한글 허프만 부호 설계 및 PSIP 구조)

  • 황재정;진경식;한학수;최준영;이진환
    • Journal of Broadcast Engineering
    • /
    • v.6 no.1
    • /
    • pp.98-107
    • /
    • 2001
  • In this paper we derive an optimal Huffman code set with escape coding that maximizes coding efficiency for Hangul text data. Hangul can be represented in the standard Wansung or Unicode format, and we generate a set of Huffman codes for both. The current Korean DTV standard does not define a Hangul compression algorithm, which may lead to a serious data-rate problem for the digital data broadcasting system. Generating the optimal Huffman code set solves this data transmission problem. A relevant PSIP structure for the DTV standard is also proposed. As a result, characters with a probability of less than 0.0043 are escape coded, yielding an optimum compression efficiency of 46%.

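The escape-coding idea above can be sketched as follows: symbols whose probability falls below a threshold share one ESC codeword and would be transmitted verbatim after it. This is a minimal illustration (the sample text and threshold are hypothetical; the paper's actual Wansung/Unicode code tables are not reproduced).

```python
import heapq
from collections import Counter

def build_huffman(freqs):
    """Build a Huffman code table {symbol: bitstring} from frequencies."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    if len(heap) == 1:
        return {s: "0" for s in freqs}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

def escape_huffman(text, threshold=0.0043):
    """Huffman-code frequent symbols; rare ones (prob < threshold)
    share a single <ESC> codeword and are sent verbatim after it."""
    n = len(text)
    probs = {ch: c / n for ch, c in Counter(text).items()}
    frequent = {ch: p for ch, p in probs.items() if p >= threshold}
    esc_prob = 1.0 - sum(frequent.values())
    freqs = dict(frequent)
    if esc_prob > 0:
        freqs["<ESC>"] = esc_prob
    return build_huffman(freqs)

codes = escape_huffman("aaaabbc", threshold=0.2)  # 'c' falls below threshold
```

Folding all rare characters into one ESC symbol keeps the code table small, which is the trade-off the 0.0043 cutoff in the abstract optimizes.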

The Auto Regressive Parameter Estimation and Pattern Classification of EKG Signals for Automatic Diagnosis (심전도 신호의 자동분석을 위한 자기회귀모델 변수추정과 패턴분류)

  • 이윤선;윤형로
    • Journal of Biomedical Engineering Research
    • /
    • v.9 no.1
    • /
    • pp.93-100
    • /
    • 1988
  • This paper presents the results of pattern discriminant analysis on a group of AR (autoregressive) model parameters representing HRV (heart rate variability), treated as time-series data. The HRV data were extracted using the correct R-points of the EKG wave, which was A/D converted through both hardware and software functions. The data length (N) and optimal model order (P) used for the analysis were determined using Burg's maximum entropy method and Akaike's information criterion. Representative values were extracted from the distribution of the results and, in turn, used as indices for determining the range of the pattern discriminant analysis. By carrying out the pattern discriminant analysis, the clustering performance was checked, creating the test pattern for which the clustering was optimal. The results showed, first, that the HRV data were sufficient to ensure stationarity, and second, that the pattern discriminant analysis could discriminate between syndromes even though the optimal order of each syndrome differed.

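Burg's maximum-entropy method mentioned above can be sketched as a generic Burg recursion; this is not the paper's EKG pipeline (R-point detection and AIC-based order selection are omitted), and the data are a simulated AR(1) series standing in for an HRV sequence.

```python
import numpy as np

def burg_ar(x, order):
    """Estimate AR coefficients via Burg's maximum-entropy method,
    returning (coef, err) such that x[t] ~ sum_k coef[k] * x[t-1-k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = x.copy()                  # forward prediction errors
    b = x.copy()                  # backward prediction errors
    a = np.zeros(order)           # error-filter coefficients
    e = float(np.dot(x, x)) / n   # prediction error power
    for m in range(order):
        fs = f[m + 1:]
        bs = b[m:n - 1]
        # reflection coefficient minimizing forward+backward error power
        k = -2.0 * np.dot(bs, fs) / (np.dot(fs, fs) + np.dot(bs, bs))
        a_prev = a[:m].copy()
        a[m] = k
        a[:m] = a_prev + k * a_prev[::-1]   # Levinson-type update
        f[m + 1:] = fs + k * bs
        b[m + 1:] = bs + k * fs
        e *= (1.0 - k * k)
    return -a, e

# Simulated AR(1) series (hypothetical stand-in for HRV data)
rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
coef, err = burg_ar(x, order=1)  # coef[0] should recover ~0.8
```

In practice the order P would be chosen by refitting for several orders and minimizing an information criterion such as AIC, as the abstract describes.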

A Study on the Application of Natural Language Processing in Health Care Big Data: Focusing on Word Embedding Methods (보건의료 빅데이터에서의 자연어처리기법 적용방안 연구: 단어임베딩 방법을 중심으로)

  • Kim, Hansang;Chung, Yeojin
    • Health Policy and Management
    • /
    • v.30 no.1
    • /
    • pp.15-25
    • /
    • 2020
  • While healthcare datasets include extensive information about patients, many researchers face limitations in analyzing them due to intrinsic characteristics such as heterogeneity, longitudinal irregularity, and noise. In particular, since the majority of medical history information is recorded as text codes, the use of such information has been limited by the high dimensionality of the explanatory variables. To address this problem, recent studies have applied word embedding techniques originally developed for natural language processing and obtained positive results in terms of dimensionality reduction and prediction-model accuracy. This paper reviews deep learning-based natural language processing techniques (word embedding) and summarizes research cases that have used these techniques in the health care field. Finally, we propose a research framework for applying deep learning-based natural language processing to the analysis of domestic health insurance data.
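As a rough illustration of word embedding applied to medical codes, the sketch below factorizes a positive PMI co-occurrence matrix with SVD, a simple stand-in for Word2Vec-style embeddings; the diagnosis codes and visit sequences are hypothetical.

```python
import numpy as np

def ppmi_svd_embeddings(sequences, dim=2, window=2):
    """Embed tokens (e.g., diagnosis codes per patient visit) via a
    PPMI co-occurrence matrix factorized with truncated SVD."""
    vocab = sorted({tok for seq in sequences for tok in seq})
    idx = {t: i for i, t in enumerate(vocab)}
    co = np.zeros((len(vocab), len(vocab)))
    for seq in sequences:
        for i, tok in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if i != j:
                    co[idx[tok], idx[seq[j]]] += 1
    total = co.sum()
    row = co.sum(axis=1, keepdims=True)
    col = co.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((co * total) / (row * col))
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)  # clip to PPMI
    u, s, _ = np.linalg.svd(ppmi)
    return vocab, u[:, :dim] * s[:dim]   # low-dimensional embeddings

# Hypothetical visit sequences of ICD-style codes
sequences = [["E11", "I10"]] * 5 + [["J45", "J30"]] * 5
vocab, emb = ppmi_svd_embeddings(sequences, dim=2)
```

Codes that co-occur across visits end up with similar embedding vectors, which is exactly the dimensionality reduction the abstract credits word embedding with.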

Online VQ Codebook Generation using a Triangle Inequality (삼각 부등식을 이용한 온라인 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society
    • /
    • v.16 no.3
    • /
    • pp.373-379
    • /
    • 2015
  • In this paper, we propose an online VQ codebook generation method that updates an existing VQ codebook in real time and adds newly created data to existing clusters, covering text data such as newspaper articles, web pages, blogs, and tweets, as well as IoT data from sensors and machines. Without degrading the performance of the batch VQ codebook on the existing data, the method exploits the newly added data: by using a triangle inequality to modify the VQ codebook progressively, it achieves high accuracy and speed. Applied to test data, it showed performance similar to the batch method.
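The triangle-inequality pruning at the heart of such methods can be illustrated with nearest-centroid assignment: if d(x, best) ≤ d(best, c)/2, then by the triangle inequality c cannot beat the current best, so its distance need not be computed. This is a minimal sketch with hypothetical data; the paper's incremental codebook update is not reproduced.

```python
import numpy as np

def assign_with_triangle_inequality(points, centroids):
    """Assign each point to its nearest centroid, using precomputed
    centroid-centroid distances to skip hopeless candidates."""
    cc = np.linalg.norm(centroids[:, None] - centroids[None], axis=2)
    labels = np.empty(len(points), dtype=int)
    skipped = 0
    for i, x in enumerate(points):
        best = 0
        d_best = np.linalg.norm(x - centroids[0])
        for j in range(1, len(centroids)):
            if d_best <= cc[best, j] / 2:   # c_j cannot be closer: prune
                skipped += 1
                continue
            d = np.linalg.norm(x - centroids[j])
            if d < d_best:
                best, d_best = j, d
        labels[i] = best
    return labels, skipped

centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.array([[0.1, 0.2], [9.5, 0.5], [0.2, 9.0], [5.0, 5.0]])
labels, skipped = assign_with_triangle_inequality(points, centroids)
```

The pruning is exact, not approximate: d(x, c) ≥ d(best, c) − d(x, best) ≥ d(x, best) whenever d(x, best) ≤ d(best, c)/2, so the assignments match a brute-force search while computing fewer distances.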

Customized Information Analysis System Using National Defense News Data (국방 기사 데이터를 이용한 맞춤형 정보 분석 시스템)

  • Choi, Jung-Whoan;Lim, Chea-O
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.12
    • /
    • pp.457-465
    • /
    • 2010
  • A customized information analysis system is a software system that helps extract useful information from unstructured natural language data, process the information into a customized form, and provide forecasting and reasoning information. To implement such a system, we need natural language processing technology to analyze natural language, information extraction technology to detect entities and their relationships in text, and data mining technology to discover new, previously unknown information from the extracted data. This paper suggests a virtual customized information analysis system that processes national defense news data and introduces the base technologies for information analysis.
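As a minimal illustration of the information extraction step, the sketch below spots known entities in a news sentence using a hand-made gazetteer; real systems would use trained NER models, and the entity list and sentence here are hypothetical.

```python
import re

def extract_entities(sentence, gazetteer):
    """Dictionary (gazetteer) based entity spotting: find known
    entity names and their types in a news sentence."""
    found = []
    for name, etype in gazetteer.items():
        if re.search(re.escape(name), sentence):
            found.append((name, etype))
    return found

# Hypothetical entity dictionary for defense news
gazetteer = {"Ministry of National Defense": "ORG", "F-35": "WEAPON"}
text = "The Ministry of National Defense announced an F-35 procurement plan."
entities = extract_entities(text, gazetteer)
```

Detected entity pairs would then feed the relationship-extraction and data-mining stages the abstract lists.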

Extracting of Interest Issues Related to Patient Medical Services for Small and Medium Hospital by SNS Big Data Text Mining and Social Networking (중소병원 환자의료서비스에 관한 관심 이슈 도출을 위한 SNS 빅 데이터 텍스트 마이닝과 사회적 연결망 적용)

  • Hwang, Sang Won
    • Korea Journal of Hospital Management
    • /
    • v.23 no.4
    • /
    • pp.26-39
    • /
    • 2018
  • Purposes: The purpose of this study is to analyze issues of interest concerning the patient medical services of small and medium hospitals using big data. Methods: The study was implemented through text mining and social network analysis of SNS big data. Key keywords were extracted and their correlations analyzed using the Textom, Ucinet6, and NetDraw programs. Findings: The frequency, degree centrality, and closeness centrality analyses showed strong public interest in the technology, information, security, safety, and cost of small and medium hospitals and their problems, in coping with infections, and in payment and settlement processes. Keywords related to care fields such as pediatrics, dentistry, obstetrics and gynecology, dementia, nursing, the elderly, and rehabilitation were also extracted. Practical Implications: Future studies will be more useful if they analyze how customer needs for medical services differ between small and medium hospitals in the metropolitan area and in the provinces, together with further classification studies.
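The keyword co-occurrence network underlying such an analysis can be sketched in a few lines: count keyword pairs per post and rank keywords by weighted degree centrality. This is a simplified stand-in for the Textom/Ucinet6 workflow, and the tokenized posts are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(documents, top_k=5):
    """Build a keyword co-occurrence network from tokenized documents
    and rank keywords by weighted degree centrality."""
    edges = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[(a, b)] += 1          # undirected edge weight
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return edges, degree.most_common(top_k)

# Hypothetical tokenized SNS posts about hospital services
posts = [
    ["hospital", "cost", "safety"],
    ["hospital", "infection", "safety"],
    ["hospital", "cost", "nursing"],
]
edges, central = cooccurrence_network(posts)
```

Tools such as Ucinet6 and NetDraw then visualize this edge list and compute closeness centrality on top of the same network.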

A Scraping Method of In-Frame Web Sources Using Python (파이썬을 이용한 프레임내 웹 페이지 스크래핑 기법)

  • Yun, Sujin;Seung, Li;Woo, Young Woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.271-274
    • /
    • 2019
  • In this paper, we proposed a scheme for obtaining the detailed addresses needed to automatically collect data from web pages inside frames, which are difficult to access with general web access methods. Using the Python language and the Beautiful Soup library, which supports the proposed address-resolution technique and HTML selectors, we were able to automatically collect all the bulletin board text data spread across several pages. With the proposed method, large amounts of data can be collected automatically by a Python web scraping program from web pages with any form of address, and we expect it to be useful for big data analysis.

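The in-frame address extraction can be sketched with Python's standard library alone. The paper uses Beautiful Soup; `html.parser` is used here only to keep the example self-contained, and the page markup and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FrameSrcExtractor(HTMLParser):
    """Collect the src URLs of <frame>/<iframe> tags -- the in-frame
    addresses that a plain fetch of the outer page does not reveal."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    # resolve relative src against the outer page URL
                    self.frame_urls.append(urljoin(self.base_url, value))

# Hypothetical outer page wrapping a bulletin board in a frame
html = '<frameset><frame src="/board/list.php?page=1"></frameset>'
parser = FrameSrcExtractor("http://example.com/main.html")
parser.feed(html)
# parser.frame_urls now holds the real in-frame address to fetch next
```

Once the in-frame address is known, the page parameter in the query string can be varied to walk through every page of the board, which is how multi-page collection works in the scheme described above.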

A Study on the General Public's Perceptions of Dental Fear Using Unstructured Big Data

  • Han-A Cho;Bo-Young Park
    • Journal of dental hygiene science
    • /
    • v.23 no.4
    • /
    • pp.255-263
    • /
    • 2023
  • Background: This study used text mining techniques to determine public perceptions of dental fear, extracted keywords related to dental fear, identified the connections between the keywords, and categorized and visualized perceptions related to dental fear. Methods: Keywords were collected from texts posted on Internet portal sites (NAVER and Google) between 1 January 2000 and 31 December 2022. Four stages of analysis were used to explore the keywords: frequency analysis, term frequency-inverse document frequency (TF-IDF), centrality analysis and co-occurrence analysis, and convergent correlations. Results: Among the top ten keywords by frequency, the most frequent was 'treatment,' followed by 'fear,' 'dental implant,' 'conscious sedation,' 'pain,' 'dental fear,' 'comfort,' 'taking medication,' 'experience,' and 'tooth.' In the TF-IDF analysis, the top three keywords were dental implant, conscious sedation, and dental fear. Co-occurrence analysis, which explores keywords that appear together, showed that 'fear and treatment' and 'treatment and pain' appeared most frequently. Conclusion: Texts collected as unstructured big data were analyzed to identify general perceptions related to dental fear, and this study is valuable as source data for understanding public perceptions of dental fear by grouping associated keywords. The results will be helpful for understanding dental fear and can serve as data on factors affecting oral health in the future.
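The TF-IDF stage of such an analysis can be sketched as follows, using tf = count/len(doc) and idf = log(N/df). The tokenized posts are hypothetical, and a real analysis would add the centrality and co-occurrence steps on top.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights for tokenized documents:
    tf = count / len(doc), idf = log(N / df)."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        for term in set(doc):
            df[term] += 1
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            t: (c / len(doc)) * math.log(n / df[t])
            for t, c in counts.items()
        })
    return weights

# Hypothetical tokenized posts about dental visits
docs = [
    ["treatment", "fear", "pain"],
    ["treatment", "implant"],
    ["treatment", "sedation", "fear"],
]
w = tf_idf(docs)
```

A term like "treatment" that appears in every document gets zero weight, while rarer, document-specific terms rise to the top, which is why the TF-IDF ranking above differs from the raw frequency ranking.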

A cross-domain access control mechanism based on model migration and semantic reasoning

  • Ming Tan;Aodi Liu;Xiaohan Wang;Siyuan Shang;Na Wang;Xuehui Du
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.6
    • /
    • pp.1599-1618
    • /
    • 2024
  • Access control has always been one of the effective methods of protecting data security. However, in new computing environments such as big data, data resources are characterized by distributed cross-domain sharing, massive scale, and dynamism, and traditional access control mechanisms can hardly meet the resulting security needs. This paper proposes CACM-MMSR to solve the distributed cross-domain access control problem for massive resources. The method uses blockchain and smart contracts as a link between different security domains. A permission decision model migration method based on access control logs is designed; it migrates historical policies to resolve the heterogeneity of access control among different security domains and to update old and new policies within the same security domain. Meanwhile, a semantic reasoning-based permission decision method for unstructured text data is designed, which achieves flexible permission decisions through similarity thresholding. Experimental results show that the proposed method reduces the decision time cost of distributed access control to less than 28.7% of that of a single node. The permission decision model migration method achieves a high decision accuracy of 97.4%, and the semantic reasoning-based permission decision method outperforms the other reference methods in vectorization and indexing time cost.
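The similarity-thresholding permission decision can be illustrated with a simple cosine-similarity check against previously permitted resource descriptions. This is a sketch of the thresholding idea only, with hypothetical tokens; the actual method applies semantic reasoning over unstructured text rather than raw token overlap.

```python
import math
from collections import Counter

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two token-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def decide_permission(request_tokens, permitted_examples, threshold=0.6):
    """Grant if the request is similar enough to any previously
    permitted resource description (similarity thresholding)."""
    best = max(
        (cosine_similarity(request_tokens, ex) for ex in permitted_examples),
        default=0.0,
    )
    return best >= threshold, best

# Hypothetical descriptions of previously permitted resources
allowed = [["patient", "record", "read"], ["lab", "report", "read"]]
granted, score = decide_permission(["patient", "record", "read", "summary"], allowed)
```

Tuning the threshold trades flexibility for strictness: a lower value grants more near-match requests, a higher one approaches exact-match policy enforcement.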