• Title/Abstract/Keywords: domain-specificity

122 search results; processing time 0.03 s

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences
    • /
    • Vol. 15, No. 3
    • /
    • pp.83-88
    • /
    • 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

Protein-Protein Interaction Prediction using Interaction Significance Matrix

  • 장우혁;정석훈;정휘성;현보라;한동수
    • Journal of KIISE: Software and Applications
    • /
    • Vol. 36, No. 10
    • /
    • pp.851-860
    • /
    • 2009
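  • Among recent computational methods for predicting protein-protein interactions, a variety of domain-based methods have been proposed that focus on the relationships between the domains contained in a protein pair. However, few computational methods precisely reflect how much each of the multiple domain pairs contributes to an interaction. This paper devises an interaction significance matrix that quantifies the influence of domain combination pairs on protein interactions, and implements a protein interaction prediction system based on it. Unlike conventional domain combination methods, the interaction significance matrix uses weighted domain combinations, which account for the probability that domains cooperate in an interaction, and quantifies the probability that a given pair among the many weighted domain combinations is the actual agent of the interaction as the Domain Combination Pair Power (DCPPW). Accuracy validation using S. cerevisiae protein interaction data from DIP and IntAct together with Pfam-A domain information showed an average sensitivity of 63% and specificity of 94%, and prediction accuracy improved steadily as the training set grew. The prediction system and training data are available for download at http://code.google.com/p/prespi.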

Molecular determinants of the host specificity by Xanthomonas spp.

  • Heu, Sunggi;Choi, Min-Seon;Park, Hyoung-Joon;Lee, Seung-Don;Ra, Dong-Soo
    • Korean Society of Plant Pathology: Conference Proceedings
    • /
    • Korean Society of Plant Pathology, 2004, The 2004 KSPP Annual Meeting & International Symposium
    • /
    • pp.65-67
    • /
    • 2004
  • During the initial interaction of bacteria with their host plants, most plants recognize the bacterial infection and repel the pathogen through plant defense mechanisms. The most active plant defense mechanism is the hypersensitive response (HR), a localized induced cell death in the plant at the site of infection by a pathogen. A primary locus induced in gram-negative phytopathogenic bacteria during this initial interaction is the Hrp locus. The Hrp locus is composed of a cluster of genes that encodes the bacterial type III machinery involved in the secretion and translocation of effector proteins to the plant cell. DNA sequence analysis of hrp genes in phytopathogenic bacteria has revealed an Hrp pathogenicity island (PAI) with a tripartite mosaic structure. For many gram-negative pathogenic bacteria, colonization of the host's tissue depends on the type III protein secretion system (TTSS), which secretes and translocates effector proteins into the host cell. Effectors can be divided into several groups, including broad host range effectors, host-specific effectors, disease-specific effectors, and effectors that inhibit host defenses. The role of effectors carrying an LRR domain in plant resistance remains elusive, since most known plant resistance genes also carry an LRR domain. Host-specific effectors such as several avr gene products are involved in determining host specificity. Almost all phytopathogenic Xanthomonas spp. carry avrBs1, avrBs2, and avrBs3 homologs. Some strains of X. oryzae pv. oryzae carry more than 10 copies of avrBs3 homologs. However, the functions of these avr genes in host specificity are not well characterized.

Development of radiotracer for polo-box domain of polo-like kinase 1

  • Ryu, Eun Kyoung
    • Journal of Radiopharmaceuticals and Molecular Probes
    • /
    • Vol. 5, No. 2
    • /
    • pp.152-157
    • /
    • 2019
  • Polo-like kinase 1 (Plk1) is a crucial regulator of cell cycle progression during mitosis. It is known to be highly overexpressed in many different tumor types and has been implicated as a potential antimitotic cancer target. The phosphopeptide Pro-Leu-His-Ser-p-Thr (PLHSpT) was shown to have a high level of affinity and specificity for the polo-box domain (PBD) of Plk1. However, the peptide has limited cell permeability. We designed derivatives to overcome this limitation of PLHSpT using a drug delivery system. In addition, we synthesized and evaluated a radiotracer based on it for tumor diagnosis. This review discusses the derivatives and the radiotracer, which are suitable for tumor treatment and diagnosis targeting the PBD of Plk1.

Prediction of Paroxysmal Atrial Fibrillation using Time-domain Analysis and Random Forest

  • Lee, Seung-Hwan;Kang, Dong-Won;Lee, Kyoung-Joung
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 39, No. 2
    • /
    • pp.69-79
    • /
    • 2018
  • The present study proposes an algorithm that discriminates between normal subjects and paroxysmal atrial fibrillation (PAF) patients using electrocardiograms (ECGs) recorded without PAF events. For this, time-domain features and a random forest classifier are used. Time-domain features are obtained from the Poincaré plot, the Lorenz plot of the $\delta$RR interval, and morphology analysis. Three features in total are then chosen through feature selection, and PAF patients and normal subjects are classified using the random forest. The classification achieved a sensitivity of 81.82% and a specificity of 95.24%, a positive predictive value of 96.43% and a negative predictive value of 76.92%, and an accuracy of 87.04%. The proposed algorithm requires less computation than existing algorithms, which suggests that it is applicable to more efficient prediction of PAF.
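
The five figures reported in this abstract all follow from a single 2x2 confusion matrix, and a short sketch may help readers see how they relate. The abstract does not give the underlying counts; the values below (27 true positives, 6 false negatives, 20 true negatives, 1 false positive) are an assumption, chosen only because they reproduce the reported percentages. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PAF patients classified as PAF
    fn: int  # PAF patients classified as normal
    tn: int  # normal subjects classified as normal
    fp: int  # normal subjects classified as PAF


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard diagnostic metrics derived from a 2x2 confusion matrix."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# ASSUMED counts: not stated in the abstract, only consistent with its percentages.
counts = ConfusionCounts(tp=27, fn=6, tn=20, fp=1)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts the positive predictive value is high because only one normal subject is misclassified, while the lower negative predictive value reflects the six missed PAF patients.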

Comparison of wavelet-based decomposition and empirical mode decomposition of electrohysterogram signals for preterm birth classification

  • Janjarasjitt, Suparerk
    • ETRI Journal
    • /
    • Vol. 44, No. 5
    • /
    • pp.826-836
    • /
    • 2022
  • Signal decomposition is a computational technique that dissects a signal into its constituent components, providing supplementary information. In this study, the capability of two common signal decomposition techniques, wavelet-based decomposition and empirical mode decomposition, for preterm birth classification was investigated. Ten time-domain features were extracted from the constituent components of electrohysterogram (EHG) signals, namely EHG subbands and EHG intrinsic mode functions, and employed for preterm birth classification. Preterm birth classification and anticipation are crucial tasks that can help reduce preterm birth complications. The computational results show that the preterm birth classification obtained using wavelet-based decomposition is superior. This implies that EHG subbands obtained through wavelet-based decomposition provide more applicable information for preterm birth classification. Furthermore, an accuracy of 0.9776 and a specificity of 0.9978, the best performance on preterm birth classification among state-of-the-art signal processing techniques, were obtained using the time-domain features of EHG subbands.

Solution Dynamics Studies for the Lck SH2 Domain Complexed with Peptide and Peptide-Free Forms

  • Yoon, Jeong-Hyeok;Chi, Myung-Whan;Yoon, Chang-No;Park, Jongsei
    • Korean Society of Applied Pharmacology: Conference Proceedings
    • /
    • Korean Society of Applied Pharmacology, 1995 Spring Conference
    • /
    • pp.81-81
    • /
    • 1995
  • The Src homology 2 (SH2) domain is well known to play an important role in many intracellular signal transduction proteins. The domain has about 100 amino acid residues and binds phosphotyrosine-containing peptides with high affinity and specificity. Lck is a Src-like, lymphocyte-specific tyrosine kinase. An 11-residue phosphopeptide derived from the hamster polyoma middle-T antigen, EPQpYEEIPIYL, binds to the Lck SH2 domain with a 1 nM dissociation constant. The phosphotyrosine and isoleucine residues of the peptide are known to be tightly bound by two well-defined pockets on the surface of the Lck SH2 domain. To investigate the conformational changes during complexation of the SH2 domain with the phosphopeptide, we performed molecular dynamics simulations of the Lck SH2 domain in its peptide-bound and peptide-free forms at 300 K in aqueous solution. More than 3000 water molecules were incorporated to solvate the Lck SH2 domain and the peptide, and periodic boundary conditions were applied. Analysis of the simulation results shows that the phosphopeptide makes its primary interaction with the Lck SH2 domain at six central residues. Comparison of the complexed and uncomplexed SH2 domain structures in solution revealed only relatively small changes. However, the hydrophilic and hydrophobic pockets on the protein surface do show conformational changes, despite the small overall structural difference between the complexed and peptide-free forms.

Determining the Specificity of Terms using Compositional and Contextual Information

  • 류법모;배선미;최기선
    • Journal of KIISE: Software and Applications
    • /
    • Vol. 33, No. 7
    • /
    • pp.636-645
    • /
    • 2006
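  • A term is said to have high specificity when it carries a large amount of domain-specific conceptual content. This paper proposes information-theoretic methods for quantitatively computing the specificity of a term from its internal compositional information and its external contextual information. Term specificity can serve as an important necessary condition when establishing hypernym-hyponym relations between terms. The proposed methods fall into three groups: those using the term's compositional information, those using contextual information, and those using both. The compositional methods use the frequency, weight, bigrams, and internal modifier structure of the words that make up a term, while the contextual methods use the distribution of the words that modify the term. The proposed methods can be applied independently of domain and have the advantage of capturing the characteristics of the term-formation process well. When specificity values were computed for disease names in the MeSH tree and compared with those of their hypernyms, the method achieved a precision of 82.0%.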
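
The evaluation described at the end of this abstract, comparing each term's specificity with that of its hypernym, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the exact criterion, so the sketch assumes a pair counts as correct when the child term scores strictly higher than its parent. The hierarchy, the scores, and the function name `hypernym_precision` are all hypothetical toy values, not MeSH data or the paper's results.

```python
def hypernym_precision(parent_of: dict, specificity: dict) -> float:
    """Fraction of (child, parent) pairs in which the child is scored
    as strictly more specific than its parent (assumed criterion)."""
    pairs = [
        (child, parent)
        for child, parent in parent_of.items()
        if child in specificity and parent in specificity
    ]
    if not pairs:
        raise ValueError("no scorable parent-child pairs")
    correct = sum(1 for child, parent in pairs if specificity[child] > specificity[parent])
    return correct / len(pairs)


# Hypothetical toy hierarchy (child -> parent) and made-up specificity scores.
parent_of = {
    "heart diseases": "cardiovascular diseases",
    "arrhythmias": "heart diseases",
    "atrial fibrillation": "arrhythmias",
    "vascular diseases": "cardiovascular diseases",
}
specificity = {
    "cardiovascular diseases": 1.0,
    "heart diseases": 1.6,
    "arrhythmias": 2.4,
    "atrial fibrillation": 2.1,  # deliberately scored below its parent
    "vascular diseases": 1.3,
}

precision = hypernym_precision(parent_of, specificity)
print(f"{precision:.1%}")
```

In this toy data three of the four pairs satisfy the criterion, so the score function would be credited with 75.0% on this measure.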

Investigation of Domain-specificity of Creativity and the 3-year follow-up

  • 한기순
    • Journal of Gifted/Talented Education
    • /
    • Vol. 15, No. 2
    • /
    • pp.1-34
    • /
    • 2005
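  • This study consists of two parts. Study 1 examined the domain specificity versus domain generality of creativity in 109 second-grade elementary school students. Its main aims were, first, to examine the correlations among children's creativity in three different domains and, second, to investigate the relationship between children's general creative thinking ability and their creativity in the three domains. Study 2 was a follow-up with a subset (71) of the students who took part in Study 1. It examined whether expert-rated creative performance assessment shows long-term stability over a three-year period, and looked at method effects on the domain question in creativity, which has recently become a topic of debate; that is, whether findings on the domain question differ depending on whether actual performance assessments or self-report formats are used. The results support the position that children's creativity is somewhat domain-specific rather than domain-general. Rather than showing consistent creative ability across the three domains, the children showed wide differences in creativity between domains, suggesting that creativity is considerably domain-specific. In addition, the creativity tests used to measure children's general creative thinking ability showed very low or non-significant correlations with the creativity the children displayed in the three domains. The study also suggests that expert-rated assessment of children's creative performance can be a method of assessing creativity that is not only adequately reliable but also valid and stable over the long term. Finally, it raises and discusses the point that findings on the domain question in creativity can differ depending on the approach taken.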

Predicting tissue-specific expressions based on sequence characteristics

  • Paik, Hyo-Jung;Ryu, Tae-Woo;Heo, Hyoung-Sam;Seo, Seung-Won;Lee, Do-Heon;Hur, Cheol-Goo
    • BMB Reports
    • /
    • Vol. 44, No. 4
    • /
    • pp.250-255
    • /
    • 2011
  • In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, for example in tissue differentiation. We developed a prediction approach that generates sequence features from patterns overrepresented in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcription factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns, following the biological intuition that TFBSs regulate gene expression and domains reflect the functional specificity of a TS gene. Our proposed approach performed better than previous attempts and was validated using computational and experimental methods.