• Title/Abstract/Keywords: biomedical informatics

Search results: 269 items (processing time: 0.028 s)

Optimizing the maximum reported cluster size for normal-based spatial scan statistics

  • Yoo, Haerin;Jung, Inkyung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 25, No. 4
    • /
    • pp.373-383
    • /
    • 2018
  • The spatial scan statistic is a widely used method for detecting spatial clusters. The method imposes a large number of scanning windows with pre-defined shapes and varying sizes on the entire study region. The likelihood ratio test statistic comparing inside versus outside each window is then calculated, and the window with the maximum value of the test statistic becomes the most likely cluster. The results of cluster detection are sensitive to the shape and the maximum size of the scanning windows. The shape of the scanning window has been extensively studied; however, relatively little attention has been paid to the maximum scanning window size (MSWS) or the maximum reported cluster size (MRCS). The Gini coefficient has recently been proposed by Han et al. (International Journal of Health Geographics, 15, 27, 2016) as a powerful tool to determine the optimal value of MRCS for the Poisson-based spatial scan statistic. In this paper, we apply the Gini coefficient to normal-based spatial scan statistics. Through a simulation study, we evaluate the performance of the proposed method. We illustrate the method using a real data example of female colorectal cancer incidence rates in South Korea for the year 2009.
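
The scanning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a normal model with a common variance inside and outside the window, grows each window by nearest-neighbor count rather than by a true circle radius, caps the window size with a hypothetical `max_fraction` parameter (a stand-in for the MSWS), and omits both Monte Carlo significance testing and the Gini-based choice of MRCS. The function names are illustrative only.

```python
import math


def normal_llr(values, inside):
    """Log-likelihood ratio for a normal model with common variance.

    Null: one mean for all locations. Alternative: separate means inside
    and outside the window. LLR = (N / 2) * ln(var_null / var_alt).
    """
    n = len(values)
    ins = [v for v, flag in zip(values, inside) if flag]
    out = [v for v, flag in zip(values, inside) if not flag]
    if not ins or not out:
        return 0.0
    mean_all = sum(values) / n
    var_null = sum((v - mean_all) ** 2 for v in values) / n
    mean_in = sum(ins) / len(ins)
    mean_out = sum(out) / len(out)
    var_alt = (sum((v - mean_in) ** 2 for v in ins)
               + sum((v - mean_out) ** 2 for v in out)) / n
    if var_null <= 0.0:
        return 0.0          # no variation at all, nothing to detect
    if var_alt <= 0.0:
        return math.inf     # window separates the data perfectly
    return 0.5 * n * math.log(var_null / var_alt)


def most_likely_cluster(coords, values, max_fraction=0.5):
    """Scan nearest-neighbor windows around every location.

    max_fraction is an assumed cap on window size, expressed as a
    fraction of the number of locations.
    """
    n = len(coords)
    max_size = max(1, int(max_fraction * n))
    best_llr, best_members = 0.0, frozenset()
    for cx, cy in coords:
        # Order all locations by distance from the current center.
        order = sorted(
            range(n),
            key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy),
        )
        for size in range(1, max_size + 1):
            members = frozenset(order[:size])
            inside = [i in members for i in range(n)]
            llr = normal_llr(values, inside)
            if llr > best_llr:
                best_llr, best_members = llr, members
    return best_llr, best_members


# Toy 5x5 grid with an elevated 2x2 corner (made-up data).
coords = [(x, y) for x in range(5) for y in range(5)]
values = [float((x + y) % 2) + (10.0 if x < 2 and y < 2 else 0.0)
          for x, y in coords]
llr, members = most_likely_cluster(coords, values)
print(sorted(members), round(llr, 3))
```

Changing `max_fraction` changes which windows are eligible, which is the sensitivity to maximum window size that the abstract points out.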

Decision method for rule-based physical activity status using rough sets

  • 이영동;손창식;정완영;박희준;김윤년
    • 센서학회지
    • /
    • Vol. 18, No. 6
    • /
    • pp.432-440
    • /
    • 2009
  • This paper presents an accelerometer-based physical activity decision system that uses rough sets to recognize three types of physical activity: standing, walking, and running. To collect acceleration data, we developed a body sensor node for physical activity monitoring applications that consists of two custom boards: a wireless sensor node and an accelerometer sensor module. The physical activity decision is based on the acceleration data collected from the body sensor node attached to the user's chest. We propose a method for classifying physical activities using rough sets, which generates rules from the attributes of the preprocessed data and reduces those rules by constructing a new decision table. Our experimental results show that the rule patterns obtained after removing the redundant attribute values perform as well as, or better than, the rule patterns before removal.
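
As a rough illustration of the attribute-reduction idea in this abstract, the sketch below checks which condition attributes of a decision table are dispensable in the rough-set sense: removing them leaves the positive region unchanged. The attribute names (`energy`, `peak`, `tilt`), the discretized values, and the table itself are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict


def positive_region(rows, attrs, decision):
    """Row indices whose indiscernibility class maps to one decision."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    positive = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            positive.update(members)
    return positive


def dispensable_attributes(rows, attrs, decision):
    """Attributes whose individual removal keeps the positive region."""
    full = positive_region(rows, attrs, decision)
    result = []
    for a in attrs:
        reduced = [b for b in attrs if b != a]
        if positive_region(rows, reduced, decision) == full:
            result.append(a)
    return result


# Hypothetical decision table built from discretized acceleration features.
table = [
    {"energy": "low",  "peak": "low",  "tilt": "up",   "activity": "standing"},
    {"energy": "low",  "peak": "low",  "tilt": "down", "activity": "standing"},
    {"energy": "mid",  "peak": "low",  "tilt": "up",   "activity": "walking"},
    {"energy": "mid",  "peak": "high", "tilt": "up",   "activity": "walking"},
    {"energy": "high", "peak": "high", "tilt": "up",   "activity": "running"},
    {"energy": "high", "peak": "high", "tilt": "down", "activity": "running"},
]
print(dispensable_attributes(table, ["energy", "peak", "tilt"], "activity"))
```

Each attribute is tested on its own here; finding a minimal reduct would require removing attributes one after another and rechecking the positive region each time.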

A Study on Bi-LSTM-Based Drug Side Effects Post Detection Model in Social Network Service Data

  • 이충천;이승희;송미화;이수현
    • 한국정보처리학회:학술대회논문집
    • /
    • Korea Information Processing Society 2022 Spring Conference
    • /
    • pp.397-400
    • /
    • 2022
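  • In this study, we propose a recurrent neural network (RNN)-based classification model for extracting posts about drug side effects from social network service (SNS) data. First, for ketoprofen, a frequently prescribed drug for which many posts can be obtained, we collected posts (2005-2020) from Naver Blog and Naver Cafe, the largest social network platform in South Korea, and analyzed a final set of 3,828 posts. As a result, we defined three lexicons for ketoprofen (drug, side effect, and stop word) and, based on these, obtained an accuracy of 87% with the Bi-LSTM classification model. The proposed model is expected to make SNS data a complementary source to existing sources of drug side effect information (electronic medical records, spontaneous adverse drug reaction reporting systems, etc.), and the developed Bi-LSTM classification model is expected to make the extraction of drug side effect posts more convenient.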

AI Performance Based On Learning-Data Labeling Accuracy

  • 이지훈;신지은
    • 산업융합연구
    • /
    • Vol. 22, No. 1
    • /
    • pp.177-183
    • /
    • 2024
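  • This study examines the effect of data quality on artificial intelligence (AI) performance. To this end, we compared and analyzed the effect of the level of labeling error on AI performance through a simulation that takes into account the similarity of the data features and the imbalance of the class composition. The results show that data with high similarity between features were more sensitive to labeling accuracy than data with low similarity between features, and that AI accuracy tended to decrease sharply as class imbalance increased. These findings will serve as basic material for quality evaluation criteria for AI training data and for related research.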
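
The kind of simulation this abstract describes could be set up along the following lines. This is only a sketch under stated assumptions: one Gaussian feature per sample, a nearest-centroid classifier standing in for the AI model, and hypothetical parameter names (`separation` for feature similarity, `minority_fraction` for class imbalance, `error_rate` for the labeling error level). The paper's actual data generator and model are not given in the abstract.

```python
import random


def make_data(n, separation, minority_fraction, rng):
    """Draw (feature, label) pairs; class 1 is centered at `separation`."""
    data = []
    for _ in range(n):
        label = 1 if rng.random() < minority_fraction else 0
        data.append((rng.gauss(separation * label, 1.0), label))
    return data


def flip_labels(data, error_rate, rng):
    """Flip each binary label independently with probability error_rate."""
    return [(x, (1 - y) if rng.random() < error_rate else y) for x, y in data]


def fit_centroids(data):
    """Mean feature value per observed class."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def accuracy(centroids, data):
    """Share of samples whose nearest centroid matches the true label."""
    hits = sum(
        1 for x, y in data
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return hits / len(data)


def run(error_rate, separation=4.0, minority_fraction=0.5, seed=0):
    """Train on noisy labels, evaluate on clean labels."""
    rng = random.Random(seed)
    train = flip_labels(
        make_data(500, separation, minority_fraction, rng), error_rate, rng
    )
    test = make_data(500, separation, minority_fraction, rng)
    return accuracy(fit_centroids(train), test)


for rate in (0.0, 0.2, 0.4):
    print(rate, run(rate), run(rate, separation=1.0))
```

Sweeping `error_rate` against `separation` and `minority_fraction` gives the grid of conditions that such a study compares; a nearest-centroid rule is fairly tolerant of symmetric label noise, so other classifiers may respond differently.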

Understanding Disease Susceptibility through Population Genomics

  • Han, Seonggyun;Lee, Junnam;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • Vol. 10, No. 4
    • /
    • pp.234-238
    • /
    • 2012
  • Genetic epidemiology studies have established that the natural variation of gene expression profiles is heritable and has genetic bases. A number of proximal and remote DNA variations, known as expression quantitative trait loci (eQTLs), that are associated with the expression phenotypes have been identified, first in Epstein-Barr virus-transformed lymphoblastoid cell lines and later expanded to other cell and tissue types. Integration of the eQTL information and the network analysis of transcription modules may lead to a better understanding of gene expression regulation. As these network modules have relevance to biological or disease pathways, these findings may be useful in predicting disease susceptibility.

A biomedically oriented automatically annotated Twitter COVID-19 dataset

  • Hernandez, Luis Alberto Robles;Callahan, Tiffany J.;Banda, Juan M.
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.21.1-21.5
    • /
    • 2021
  • The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time and to study the societal implications of interventions as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

  • Lee, Yuna;Park, Kiejung;Koh, Insong
    • Genomics & Informatics
    • /
    • Vol. 17, No. 4
    • /
    • pp.40.1-40.9
    • /
    • 2019
  • While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are not easy to orchestrate. For the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data have a critical disadvantage due to their relatively high cost, many next generation sequencing data are produced mainly by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped on the reference genome), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased. This eventually reached a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.
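
The notion of an unmapped region (a stretch of the reference on which no reads are mapped) can be sketched as a simple interval sweep. The code below is a toy illustration, not the programs built in the study: it assumes read alignments are already available as half-open `(start, end)` intervals on a single reference sequence, and the function names and the `min_length` threshold are hypothetical.

```python
def unmapped_regions(ref_length, reads, min_length=1):
    """Return maximal half-open intervals of the reference with no reads.

    reads: iterable of (start, end) alignment intervals, end exclusive.
    min_length: smallest gap to report (an assumed filter for "long" gaps).
    """
    regions = []
    cursor = 0  # first reference position not yet known to be covered
    for start, end in sorted(reads):
        if start - cursor >= min_length and start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if ref_length - cursor >= min_length and ref_length > cursor:
        regions.append((cursor, ref_length))
    return regions


def count_overlapping(regions, break_points):
    """Count regions that contain at least one break point position."""
    return sum(
        1 for start, end in regions
        if any(start <= bp < end for bp in break_points)
    )


# Toy reference of length 100 with made-up alignments and break points.
reads = [(0, 10), (5, 20), (50, 60), (58, 90)]
umrs = unmapped_regions(100, reads, min_length=5)
print(umrs, count_overlapping(umrs, [25, 70]))
```

In practice the intervals would come from an alignment file and the comparison would be done per chromosome; both steps are left out here.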

Standard based Deposit Guideline for Distribution of Human Biological Materials in Cancer Patients

  • Seo, Hwa Jeong;Kim, Hye Hyeon;Im, Jeong Soo;Kim, Ju Han
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 15, No. 14
    • /
    • pp.5545-5550
    • /
    • 2014
  • Background: Human biological materials from cancer patients are linked directly with public health issues in medical science research as foundational resources so securing "human biological material" is truly important in bio-industry. However, because South Korea's national R and D project lacks a proper managing system for establishing a national standard for the outputs of certain processes, high-value added human biological material produced by the national R and D project could be lost or neglected. As a result, it is necessary to develop a managing process, which can be started by establishing operating guidelines to handle the output of human biological materials. Materials and Methods: The current law and regulations related to submitting research outcome resources was reviewed, and the process of data 'acquisition' and data 'distribution' from the point of view of big data and health 2.0 was examined in order to arrive at a method for switching paradigms to better utilize human biological materials. Results: For the deposit of biological research resources, the original process was modified and a standard process with relative forms was developed. With deposit forms, research information, researchers, and deposit type are submitted. The checklist's 26 items are provided for publishing. This is a checklist of items that should be addressed in deposit reports. Lastly, XML-based deposit procedure forms were designed and developed to collect data in a structured form, to help researchers distribute their data in an electronic way. Conclusions: Through guidelines included with the plan for profit sharing between depositor and user it is possible to manage the material effectively and safely, so high-quality human biological material can be supplied and utilized by researchers from universities, industry and institutes. Furthermore, this will improve national competitiveness by leading to development in the national bio-science industry.

Single-cell RNA sequencing identifies distinct transcriptomic signatures between PMA/ionomycin- and αCD3/αCD28-activated primary human T cells

  • Jung Ho Lee;Brian H Lee;Soyoung Jeong;Christine Suh-Yun Joh;Hyo Jeong Nam;Hyun Seung Choi;Henry Sserwadda;Ji Won Oh;Chung-Gyu Park;Seon-Pil Jin;Hyun Je Kim
    • Genomics & Informatics
    • /
    • Vol. 21, No. 2
    • /
    • pp.18.1-18.11
    • /
    • 2023
  • Immunologists have activated T cells in vitro using various stimulation methods, including phorbol myristate acetate (PMA)/ionomycin and αCD3/αCD28 agonistic antibodies. PMA stimulates protein kinase C, activating nuclear factor-κB, and ionomycin increases intracellular calcium levels, resulting in activation of nuclear factor of activated T cell. In contrast, αCD3/αCD28 agonistic antibodies activate T cells through ZAP-70, which phosphorylates linker for activation of T cell and SH2-domain-containing leukocyte protein of 76 kD. However, despite the use of these two different in vitro T cell activation methods for decades, the differential effects of chemical-based and antibody-based activation of primary human T cells have not yet been comprehensively described. Using single-cell RNA sequencing (scRNA-seq) technologies to analyze gene expression unbiasedly at the single-cell level, we compared the transcriptomic profiles of the non-physiological and physiological activation methods on human peripheral blood mononuclear cell-derived T cells from four independent donors. Remarkable transcriptomic differences in the expression of cytokines and their respective receptors were identified. We also identified activated CD4 T cell subsets (CD55+) enriched specifically by PMA/ionomycin activation. We believe this activated human T cell transcriptome atlas derived from two different activation methods will enhance our understanding, highlight the optimal use of these two in vitro T cell activation assays, and be applied as a reference standard when analyzing activated specific disease-originated T cells through scRNA-seq.

Development of Sasang Type Diagnostic Test with Neural Network

  • 채한;황상문;엄일규;김병철;김영인;김병주;권영규
    • 동의생리병리학회지
    • /
    • Vol. 23, No. 4
    • /
    • pp.765-771
    • /
    • 2009
  • Medical informatics for clustering Sasang types from collected clinical data is important for personalized medicine, but it has not yet been thoroughly studied. The purpose of this study was to examine the usefulness of a neural network data mining algorithm for traditional Korean medicine. We used a Kohonen neural network, the Self-Organizing Map (SOM), to analyze biomedical information following data pre-processing, and calculated the validity indices as the percentage correctly predicted and the type-specific sensitivity. We extracted 12 data fields out of 30 after data pre-processing with correlation analysis and latent functional relationship analysis. The profile of the Myers-Briggs Type Indicator and Bio-Impedance Analysis data clustered with the SOM was similar to that of the original measurements. The percentage correctly predicted was 56%, and the sensitivities for the So-Yang, Tae-Eum, and So-Eum types were 56%, 48%, and 61%, respectively. This study showed that the neural network algorithm for clustering Sasang types based on clinical data is useful for the Sasang type diagnostic test itself. We discussed the importance of data pre-processing and the clustering algorithm for the validity of medical devices in traditional Korean medicine.
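
The two validity indices named in this abstract, percentage correctly predicted and type-specific sensitivity, can be computed as in the sketch below. The labels are invented toy data, the abbreviations `SY`, `TE`, and `SE` are shorthand introduced here for the So-Yang, Tae-Eum, and So-Eum types, and the SOM clustering step that would produce the predictions is not shown.

```python
from collections import Counter


def validity_indices(true_labels, predicted_labels):
    """Return (fraction correctly predicted, per-type sensitivity).

    Sensitivity for a type is the share of subjects truly of that type
    who were assigned to it.
    """
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    support = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    overall = sum(correct.values()) / len(true_labels)
    sensitivity = {c: correct[c] / support[c] for c in support}
    return overall, sensitivity


# Made-up diagnoses versus cluster-derived predictions for eight subjects.
truth = ["SY", "SY", "TE", "TE", "SE", "SE", "SE", "TE"]
predicted = ["SY", "TE", "TE", "TE", "SE", "SY", "SE", "SY"]
overall, per_type = validity_indices(truth, predicted)
print(overall, per_type)
```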