• Title/Summary/Keyword: Domain-specificity

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences / v.15 no.3 / pp.83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

Protein-Protein Interaction Prediction using Interaction Significance Matrix (상호작용 중요도 행렬을 이용한 단백질-단백질 상호작용 예측)

  • Jang, Woo-Hyuk;Jung, Suk-Hoon;Jung, Hwie-Sung;Hyun, Bo-Ra;Han, Dong-Soo
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.851-860 / 2009
  • Among computational methods for protein-protein interaction prediction, a large number of domain-based methods, which originate from the consideration of domain-domain relations, have recently been developed. However, collaboration among multiple domains is usually ignored because of its computational complexity. In this paper, we implemented a protein interaction prediction system based on the Interaction Significance (IS) matrix, which quantifies the influence of a domain combination pair on a protein interaction. Unlike conventional domain combination methods, the IS matrix contains weighted domain combinations and domain combination pair power, which express the possibility of domain collaboration and of being the main body in a protein interaction. A sensitivity of about 63% and a specificity of about 94% were measured when we used interaction data from DIP and IntAct, with Pfam-A as the domain database. In addition, prediction accuracy gradually increased as the learning set grew. The prediction software and learning data are currently available on the web site.

Molecular determinants of the host specificity by Xanthomonas spp.

  • Heu, Sunggi;Choi, Min-Seon;Park, Hyoung-Joon;Lee, Seung-Don;Ra, Dong-Soo
    • Proceedings of the Korean Society of Plant Pathology Conference / 2004.10a / pp.65-67 / 2004
  • During the initial interactions of bacteria with their host plants, most plants recognize the bacterial infection and repel the pathogen through plant defense mechanisms. The most active plant defense mechanism is the hypersensitive response (HR), which is localized, induced cell death in the plant at the site of infection by a pathogen. A primary locus induced in gram-negative phytopathogenic bacteria during this initial interaction is the Hrp locus. The Hrp locus is composed of a cluster of genes that encodes the bacterial Type III machinery involved in the secretion and translocation of effector proteins to the plant cell. DNA sequence analysis of hrp genes in phytopathogenic bacteria has revealed a Hrp pathogenicity island (PAI) with a tripartite mosaic structure. For many gram-negative pathogenic bacteria, colonization of the host's tissue depends on the type III protein secretion system (TTSS), which secretes and translocates effector proteins into the host cell. Effectors can be divided into several groups, including broad host range effectors, host specific effectors, disease specific effectors, and effectors that inhibit host defenses. The role of effectors carrying an LRR domain in plant resistance is very elusive, since most known plant resistance genes carry an LRR domain. Host specific effectors, such as several avr gene products, are involved in the determination of host specificity. Almost all phytopathogenic Xanthomonas spp. carry avrBs1, avrBs2, and avrBs3 homologs. Some strains of X. oryzae pv. oryzae carry more than 10 copies of avrBs3 homologs. However, the functions of all those avr genes in host specificity are not well characterized.

Development of radiotracer for polo-box domain of polo-like kinase 1

  • Ryu, Eun Kyoung
    • Journal of Radiopharmaceuticals and Molecular Probes / v.5 no.2 / pp.152-157 / 2019
  • Polo-like kinase 1 (Plk1) is a crucial regulator of cell cycle progression during mitosis. It is known to be highly overexpressed in many different tumor types and has been implicated as a potential antimitotic cancer target. The phosphopeptide Pro-Leu-His-Ser-p-Thr (PLHSpT) was shown to have a high level of affinity and specificity for the polo-box domain (PBD) of Plk1. However, the peptide has limited cell permeability. We designed derivatives to overcome this limitation of PLHSpT using a drug delivery system. In addition, we synthesized and evaluated a radiotracer based on it for tumor diagnosis. This review discusses the derivative and radiotracer that are suitable for tumor treatment and diagnosis targeting the PBD of Plk1.

Prediction of Paroxysmal Atrial Fibrillation using Time-domain Analysis and Random Forest

  • Lee, Seung-Hwan;Kang, Dong-Won;Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research / v.39 no.2 / pp.69-79 / 2018
  • The present study proposes an algorithm that discriminates between normal subjects and paroxysmal atrial fibrillation (PAF) patients using electrocardiogram (ECG) recordings without PAF events. For this purpose, time-domain features and a random forest classifier are used. Time-domain features are obtained from the Poincaré plot, the Lorenz plot of the $\delta RR$ interval, and morphology analysis. Three features in total are then chosen through feature selection, and PAF patients and normal subjects are classified using a random forest. In the classification results, sensitivity and specificity were 81.82% and 95.24% respectively, the positive predictive value and negative predictive value were 96.43% and 76.92% respectively, and accuracy was 87.04%. The proposed algorithm requires less computation than the existing algorithm, which suggests its applicability to more efficient prediction of PAF.
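
The five figures reported above are linked through a single confusion matrix, which a short sketch can illustrate. The counts used here (27 true positives, 6 false negatives, 20 true negatives, 1 false positive) are not stated in the abstract; they are one hypothetical set of counts, chosen only because it reproduces the reported percentages. The function name `diagnostic_metrics` is also illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (an assumption, not taken from the paper):
# 33 PAF recordings and 21 normal recordings, 54 in total.
metrics = diagnostic_metrics(tp=27, fn=6, tn=20, fp=1)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these assumed counts, the high specificity and the lower negative predictive value are both explained by only one normal recording being misclassified while six PAF recordings are missed.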

Comparison of wavelet-based decomposition and empirical mode decomposition of electrohysterogram signals for preterm birth classification

  • Janjarasjitt, Suparerk
    • ETRI Journal / v.44 no.5 / pp.826-836 / 2022
  • Signal decomposition is a computational technique that dissects a signal into its constituent components, providing supplementary information. In this study, the capability of two common signal decomposition techniques, including wavelet-based and empirical mode decomposition, on preterm birth classification was investigated. Ten time-domain features were extracted from the constituent components of electrohysterogram (EHG) signals, including EHG subbands and EHG intrinsic mode functions, and employed for preterm birth classification. Preterm birth classification and anticipation are crucial tasks that can help reduce preterm birth complications. The computational results show that the preterm birth classification obtained using wavelet-based decomposition is superior. This, therefore, implies that EHG subbands decomposed through wavelet-based decomposition provide more applicable information for preterm birth classification. Furthermore, an accuracy of 0.9776 and a specificity of 0.9978, the best performance on preterm birth classification among state-of-the-art signal processing techniques, were obtained using the time-domain features of EHG subbands.

Solution Dynamics Studies for the Lck SH2 Domain Complexed with Peptide and Peptide-Free Forms

  • Yoon, Jeong-Hyeok;Chi, Myung-Whan;Yoon, Chang-No;Park, Jongsei
    • Proceedings of the Korean Society of Applied Pharmacology / 1995.04a / pp.81-81 / 1995
  • It is well known that the Src homology 2 (SH2) domain found in many intracellular signal transduction proteins is very important. The domain has about 100 amino acid residues and binds phosphotyrosine-containing peptides with high affinity and specificity. Lck is a Src-like, lymphocyte-specific tyrosine kinase. An 11-residue phosphopeptide derived from the hamster polyoma middle-T antigen, EPQpYEEIPIYL, binds to the Lck SH2 domain with a 1 nM dissociation constant. It is also known that the phosphotyrosine and isoleucine residues of the peptide are tightly bound by two well-defined pockets on the surface of the Lck SH2 domain. To investigate the conformational changes during complexation of the SH2 domain with the phosphopeptide, we performed molecular dynamics simulations of the Lck SH2 domain in its peptide-bound and peptide-free forms at look in aqueous solution. More than 3000 water molecules were incorporated to solvate the Lck SH2 domain and the peptide. Periodic boundary conditions were applied in the molecular dynamics simulation. Analysis of the simulation results shows that the phosphopeptide makes its primary interaction with the Lck SH2 domain at six central residues. Comparison of the complexed and uncomplexed SH2 domain structures in solution revealed only a relatively small change. However, the hydrophilic and hydrophobic pockets on the protein surface show conformational changes in spite of the small structural difference between the complexed and peptide-free forms.

Determining the Specificity of Terms using Compositional and Contextual Information (구성정보와 문맥정보를 이용한 전문용어의 전문성 측정 방법)

  • Ryu Pum-Mo;Bae Sun-Mee;Choi Key-Sun
    • Journal of KIISE:Software and Applications / v.33 no.7 / pp.636-645 / 2006
  • A term with more domain-specific information has a higher level of term specificity. We propose new methods for calculating the specificity of terms, based on information-theoretic measures and using compositional and contextual information. Term specificity is a necessary condition in the term hierarchy construction task. The compositional information includes frequency, $tf \cdot idf$, bigram, and the internal structure of the terms. The contextual information of a term includes the probabilistic distribution of the modifiers of terms. The proposed methods can be applied to other domains without extra procedures. Experiments showed very promising results, with a precision of 82.0% when applied to the terms in the MeSH thesaurus.
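
Of the compositional features listed above, $tf \cdot idf$ is the simplest to illustrate. The sketch below uses the common textbook form, term frequency multiplied by $\log(N / df)$; the abstract does not say which variant the authors used, so this is an assumption, and the toy corpus and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Textbook tf-idf: raw term frequency in `doc` times log(N / df).

    `doc` is a list of tokens and `corpus` is a list of such documents.
    This weighting is an assumed variant, not necessarily the paper's.
    """
    tf = Counter(doc)[term]                    # occurrences in this document
    df = sum(1 for d in corpus if term in d)   # documents containing the term
    if df == 0:
        return 0.0                             # unseen term carries no weight
    return tf * math.log(len(corpus) / df)


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["cell", "protein", "kinase"],
    ["cell", "membrane"],
    ["cell", "kinase", "kinase"],
]

generic = tf_idf("cell", corpus[0], corpus)       # appears in every document
specific = tf_idf("membrane", corpus[1], corpus)  # appears in one document
print(generic, specific)
```

A term that occurs in every document receives a weight of zero, while a term confined to a single document receives a positive weight, which matches the intuition that more domain-specific terms are more informative.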

Investigation of Domain-specificity of Creativity and the 3-year follow-up (창의성 영역문제의 탐색 및 재접근)

  • Han, Ki-Soon
    • Journal of Gifted/Talented Education / v.15 no.2 / pp.1-34 / 2005
  • This study is composed of two parts. Study 1 empirically examined (1) the relationships among children's creative performances measured by three product-based assessments (story-telling, collage-making, and math word problems) in three domains, and (2) the relationships between children's general creative thinking skills, measured by two divergent thinking tests, and children's creative performances. Study 2 is a three-year follow-up of Study 1, covering 71 of the children who participated in Study 1. In Study 2, the long-term stability of the performance-based assessments involving story-telling, collage-making, and math problem making was examined over the three-year period. In addition, Study 2 looked at the method effect in the domain issue of creativity by comparing a self-report scale with performance-based assessment. The findings of this study support the position that creative ability in young children is rather (but not absolutely) domain-specific. The long-term stability of the performance-based assessments compares favorably with stability figures for other creativity tests. Results also indicate that there is some method effect in explaining the domain issue of creativity. Implications of the study for educational practices for gifted children are discussed.

Predicting tissue-specific expressions based on sequence characteristics

  • Paik, Hyo-Jung;Ryu, Tae-Woo;Heo, Hyoung-Sam;Seo, Seung-Won;Lee, Do-Heon;Hur, Cheol-Goo
    • BMB Reports / v.44 no.4 / pp.250-255 / 2011
  • In multicellular organisms, including humans, understanding expression specificity at the tissue level is essential for interpreting protein function, such as tissue differentiation. We developed a prediction approach that uses sequence features generated from overrepresented patterns in housekeeping (HK) and tissue-specific (TS) genes to classify TS expression in humans. Using TS domains and transcription factor binding sites (TFBSs), sequence characteristics were used as indices of expressed tissues in a Random Forest algorithm by scoring exclusive patterns according to biological intuition: TFBSs regulate gene expression, and the domains reflect the functional specificity of a TS gene. Our proposed approach displayed better performance than previous attempts and was validated using computational and experimental methods.