• Title/Summary/Keyword: Search algorithms

Search results: 1,328

Development of Workbench for Analysis and Visualization of Whole Genome Sequence (전유전체(Whole genome) 서열 분석과 가시화를 위한 워크벤치 개발)

  • Choe, Jeong-Hyeon;Jin, Hui-Jeong;Kim, Cheol-Min;Jang, Cheol-Hun;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA
    • /
    • v.9A no.3
    • /
    • pp.387-398
    • /
    • 2002
  • As the whole genome sequences of many organisms have been revealed by genome projects, intensive research on individual genes and their functions has been performed. However, in-memory algorithms are inefficient for analyzing whole genome sequences, since an individual whole genome ranges from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce a workbench system for the analysis and visualization of whole genome sequences using a string B-tree, which is well suited to analyzing huge data. The system consists of two parts: an analysis query part and a visualization part. The query system supports various transactions such as sequence search, k-occurrence, and k-mer analysis. The visualization system helps biologists easily understand the overall structure and specificity of a genome through several kinds of views, such as whole genome sequence, annotation, CGR (Chaos Game Representation), k-mer, and RWP (Random Walk Plot). With this workbench, one can find relations among organisms, predict genes in a genome, and study the function of junk DNA.
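
The k-mer analysis mentioned in this abstract can be illustrated with a minimal in-memory sketch in Python; the paper itself relies on an external-memory string B-tree index, so the function below is purely illustrative and not the workbench's implementation.

```python
from collections import Counter

def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (in-memory toy version)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

if __name__ == "__main__":
    seq = "ATGCGATATGCGAT"
    # the three most frequent 4-mers and their occurrence counts
    print(count_kmers(seq, 4).most_common(3))
```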

A Study on Criminal Investigation Techniques Based on Social Data Analysis and Artificial Intelligence Algorithms (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata
    • /
    • v.4 no.2
    • /
    • pp.23-34
    • /
    • 2019
  • Recently, crime that exploits digital platforms has been increasing continuously: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Traditional investigation techniques therefore face limits in handling such online crimes. The manual online searches and cognitive investigation methods that investigators broadly use today are not enough to cope proactively with rapidly changing crimes, and the fact that content is posted to unspecified users of social media makes investigation even more difficult. Considering the characteristics of the online media where these infringement crimes occur, this study suggests site-based collection and the Open API among web content collection methods. Since illegal content is published and deleted quickly, and new words and variant spellings appear constantly and in many forms, manually registered dictionary-based morphological analysis cannot recognize them quickly. To solve this problem, we add a tokenizing step to the existing dictionary-based morphological analysis using WPM (Word Piece Model), a data preprocessing method for quickly recognizing and responding to illegal posts related to online infringement crimes. For the data analysis, the optimal precision is verified through a vote-based ensemble of supervised classification models for the investigation of illegal content. This study applies the classification model to illegal multi-level (pyramid) business cases in order to proactively recognize crimes that damage the public economy, and presents an empirical study on effectively handling social data collection and content investigation.
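
The vote-based ensemble step described above can be sketched with scikit-learn's hard-voting classifier on toy data; the member models, features, and data here are placeholders, not the models or crime data used in the study.

```python
# Hedged sketch of a hard-voting (majority-vote) ensemble for content classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# toy feature matrix standing in for tokenized (e.g., WPM-preprocessed) post features
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each member votes; the majority label wins
)
ensemble.fit(X_train, y_train)
print("vote-based ensemble accuracy:", ensemble.score(X_test, y_test))
```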


Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.2
    • /
    • pp.83-96
    • /
    • 2018
  • Extracting keywords that represent a document is very important because such keywords can be used for automated services such as document search, classification, and recommendation, as well as for conveying document information quickly. However, when keywords are extracted based on the frequency of words appearing in a web site's documents, or with graph algorithms based on word co-occurrence, two problems arise: the web page structure tends to contain many words unrelated to the topic, and the limited performance of Korean tokenizers makes it hard to extract semantically meaningful keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, inconsistent keywords are removed through a filtering step to obtain the final semantic keywords. Experimental results on real web pages of small businesses show that the proposed method improves performance by 34.52% over a statistical-similarity-based keyword selection technique. This confirms that keyword extraction from documents is improved by considering semantic similarity between words and removing inconsistent keywords.
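
The core idea of scoring candidate keywords by semantic closeness to the document can be sketched as cosine similarity over word embeddings; the toy vectors below merely stand in for Word2Vec embeddings, and the threshold and topic vector are illustrative assumptions, not the paper's method.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_keywords(candidates, vectors, topic_vector, threshold=0.4):
    """Keep candidate words whose embedding is semantically close to the topic."""
    scored = {w: cosine(vectors[w], topic_vector) for w in candidates if w in vectors}
    return sorted((w for w, s in scored.items() if s >= threshold),
                  key=lambda w: scored[w], reverse=True)

# random vectors standing in for trained Word2Vec embeddings
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["coffee", "espresso", "parking", "menu"]}
topic = (vectors["coffee"] + vectors["menu"]) / 2  # illustrative document/topic vector
print(select_keywords(["espresso", "parking", "menu"], vectors, topic))
```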

A Weighted Frequent Graph Pattern Mining Approach considering Length-Decreasing Support Constraints (길이에 따라 감소하는 빈도수 제한조건을 고려한 가중화 그래프 패턴 마이닝 기법)

  • Yun, Unil;Lee, Gangin
    • Journal of Internet Computing and Services
    • /
    • v.15 no.6
    • /
    • pp.125-132
    • /
    • 2014
  • Since frequent pattern mining was proposed to search for hidden, useful pattern information in large-scale databases, various types of mining approaches and applications have been researched. In particular, frequent graph pattern mining was suggested to deal effectively with data that have grown ever more complicated, and a variety of efficient graph mining algorithms have been studied. Graph patterns obtained from graph databases have their own importance and characteristics, which differ according to the elements composing them and their lengths. However, traditional frequent graph pattern mining approaches do not consider these issues: they apply a single minimum support threshold regardless of the length of the extracted graph patterns and do not use any pattern weight factors, so a large number of practically useless graph patterns may be generated. Small graph patterns with a few vertices and edges tend to be interesting when their weighted supports are relatively high, while large ones with many elements can be useful even if their weighted supports are relatively low. For this reason, we propose a weight-based frequent graph pattern mining algorithm that considers length-decreasing support constraints. Comprehensive experimental results in this paper show that the proposed method outperforms a state-of-the-art graph mining algorithm in terms of pattern generation, runtime, and memory usage.
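
A length-decreasing support constraint of the kind described above can be illustrated with a threshold that shrinks as patterns grow; the linear decay and the specific constants below are illustrative assumptions, not the paper's formulation.

```python
def min_support(length: int, s_max=0.5, s_min=0.1, decay=0.05) -> float:
    """Length-decreasing support threshold: longer patterns need less support."""
    return max(s_min, s_max - decay * (length - 1))

def is_interesting(weighted_support: float, pattern_length: int) -> bool:
    """A pattern survives if its weighted support meets the length-dependent threshold."""
    return weighted_support >= min_support(pattern_length)

# small pattern with high weighted support, small pattern with low support,
# and a long pattern whose lower support is still acceptable
for n_edges, wsup in [(2, 0.45), (2, 0.30), (8, 0.15)]:
    print(f"length={n_edges}, weighted support={wsup}: keep={is_interesting(wsup, n_edges)}")
```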

Customizable Global Job Scheduler for Computational Grid (계산 그리드를 위한 커스터마이즈 가능한 글로벌 작업 스케줄러)

  • Hwang Sun-Tae;Heo Dae-Young
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.7
    • /
    • pp.370-379
    • /
    • 2006
  • A computational grid provides an environment that integrates various computing resources. A grid environment is more complex and heterogeneous than a traditional computing environment, consisting of resources on different platforms with different software packages installed. For more efficient use of a computational grid, therefore, some form of integration is required to manage grid resources effectively. In this paper, a global scheduler is suggested that integrates grid resources at the meta level and can apply various scheduling policies. The global scheduler consists of a mechanical part and three policies. The mechanical part mainly searches the user queues and resource queues to select an appropriate job and computing resource; an algorithm for this part is defined and optimized. The three policies are the user-selecting policy, the resource-selecting policy, and the executing policy. Each can be redefined and replaced freely while the operation of the computational grid is temporarily suspended. The user-selecting policy, for example, can be defined to give a certain user higher priority than others; the resource-selecting policy selects the computing resource that best matches the user's requirements; and the executing policy serves to overcome communication overheads in the grid middleware. Finally, various algorithms for the user-selecting policy are defined in terms of user fairness, and their performances are compared.
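
The separation of a fixed "mechanical part" from replaceable policies can be sketched as a scheduling loop that takes policy functions as parameters; the data model, policy names, and FIFO/first-fit choices below are hypothetical illustrations of the design idea, not the paper's scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Job:
    user: str
    cpus: int

@dataclass
class Resource:
    name: str
    free_cpus: int

# Pluggable policies (hypothetical examples; any function with the same shape works)
def fifo_user_policy(queue: List[Job]) -> Optional[Job]:
    return queue[0] if queue else None

def first_fit_resource_policy(job: Job, resources: List[Resource]) -> Optional[Resource]:
    return next((r for r in resources if r.free_cpus >= job.cpus), None)

def schedule(queue, resources, pick_job=fifo_user_policy,
             pick_resource=first_fit_resource_policy):
    """Mechanical part: repeatedly match a job from the user queue with a resource."""
    placements = []
    while (job := pick_job(queue)) is not None:
        res = pick_resource(job, resources)
        if res is None:
            break  # no resource currently satisfies the selected job
        res.free_cpus -= job.cpus
        queue.remove(job)
        placements.append((job.user, res.name))
    return placements

print(schedule([Job("alice", 4), Job("bob", 2)], [Resource("cluster-a", 8)]))
```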

Developing algorithms for providing evacuation and detour route guidance under emergency conditions (재난.재해 시 대피 및 우회차량 경로 제공 알고리즘 개발)

  • Yang, Choong-Heon;Son, Young-Tae;Yang, In-Chul;Kim, Hyun-Myoung
    • International Journal of Highway Engineering
    • /
    • v.11 no.3
    • /
    • pp.129-139
    • /
    • 2009
  • The transportation network is a critical infrastructure in the event of natural and human-caused disasters such as heavy rainfall, snowfall, and terror attacks. In particular, the transportation network of an urban area, where a large population lives, is likely to be severely affected by such events. Therefore, efficient traffic operation plans are required to support rapid evacuation and effective detouring of vehicles on the network as quickly as possible. Recently, ubiquitous communication and sensor network technology has become very useful for improving the collection and linkage of emergency-related data. In this study, we develop an algorithm that provides evacuation routes and detour information for vehicles under emergency situations. The algorithm is based on shortest path search and dynamic traffic assignment. We perform a case study to evaluate the model's performance using hypothetical terror-attack scenarios. The results show that the model successfully produces effective paths for each vehicle under emergency conditions.
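
The shortest path component of such an algorithm can be illustrated with plain Dijkstra search on a small network; the paper combines this kind of search with dynamic traffic assignment, so the sketch below covers only the routing step, with a made-up toy graph.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra search over a graph given as {node: [(neighbor, travel_cost), ...]}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # reconstruct the route by walking predecessors back from the target
    path, node = [], target
    while node in prev:
        path.append(node)
        node = prev[node]
    return [source] + path[::-1] if path or source == target else None

graph = {"A": [("B", 2), ("C", 5)], "B": [("C", 1), ("D", 4)], "C": [("D", 1)]}
print(shortest_path(graph, "A", "D"))  # -> ['A', 'B', 'C', 'D']
```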


Computational Optimization of Bioanalytical Parameters for the Evaluation of the Toxicity of the Phytomarker 1,4-Naphthoquinone and its Metabolite 1,2,4-Trihydroxynaphthalene

  • Gopal, Velmani;AL Rashid, Mohammad Harun;Majumder, Sayani;Maiti, Partha Pratim;Mandal, Subhash C
    • Journal of Pharmacopuncture
    • /
    • v.18 no.2
    • /
    • pp.7-18
    • /
    • 2015
  • Objectives: Lawsone (1,4-naphthoquinone) is a non-redox-cycling compound that can be catalyzed by DT-diaphorase (DTD) into 1,2,4-trihydroxynaphthalene (THN), which can generate reactive oxygen species by auto-oxidation. The purpose of this study was to evaluate the toxicity of the phytomarker 1,4-naphthoquinone and its metabolite THN by using the molecular docking program AutoDock 4. Methods: The 3D structures of ligands such as hydrogen peroxide ($H_2O_2$), nitric oxide synthase (NOS), catalase (CAT), glutathione (GSH), glutathione reductase (GR), glucose 6-phosphate dehydrogenase (G6PDH) and nicotinamide adenine dinucleotide phosphate hydrogen (NADPH) were drawn using HyperChem drawing tools, and the energy of all pdb files was minimized in HyperChem by $MM^+$ followed by a semi-empirical (PM3) method. The docking process was studied with the ligand molecules to identify suitable dockings at the protein binding sites through simulated annealing and genetic algorithms. AutoDock Tools (ADT), released as an extension suite to the Python molecular viewer, was used to prepare the proteins and ligands. Grids centered on the active sites were obtained with dimensions of $54{\times}55{\times}56$ points and a grid spacing of 0.503. Comparisons of global and local search methods in drug docking were used to determine the parameters: a maximum of 250,000 energy evaluations, a maximum of 27,000 generations, and mutation and crossover rates of 0.02 and 0.8, respectively. The number of docking runs was set to 10. Results: Lawsone and THN can be considered to bind efficiently with NOS, CAT, GSH, GR, G6PDH and NADPH, which was confirmed through hydrogen bond affinity with the respective amino acids. Conclusion: Naphthoquinone derivatives of lawsone, which can be metabolized into THN by the catalyst DTD, were examined. Lawsone and THN were found to be equally potent molecules in terms of their affinities for the selected proteins.
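
For readability, the genetic-algorithm docking settings reported in this abstract can be collected into a parameter table; the sketch below is only an arrangement of those reported values, with dictionary keys that mirror common AutoDock 4 parameter names but should be treated as illustrative rather than a generated docking parameter file.

```python
# Hedged sketch: the reported AutoDock 4 GA settings as a Python dictionary.
ga_params = {
    "ga_num_evals": 250_000,      # maximum number of energy evaluations
    "ga_num_generations": 27_000, # maximum number of generations
    "ga_mutation_rate": 0.02,
    "ga_crossover_rate": 0.8,
    "ga_run": 10,                 # number of independent docking runs
    "npts": (54, 55, 56),         # grid points per dimension, centered on the active site
    "spacing": 0.503,             # grid spacing
}

for key, value in ga_params.items():
    print(f"{key} = {value}")
```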

A Protein Structure Comparison System based on PSAML (PSAML을 이용한 단백질 구조 비교 시스템)

  • Kim Jin-Hong;Ahn Geon-Tae;Byun Sang-Hee;Lee Su-Hyun;Lee Myung-Joon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.2
    • /
    • pp.133-148
    • /
    • 2005
  • Since understanding the similarities and differences among protein structures is very important for studying the relationship between structure and function, many protein structure comparison systems have been developed. Unfortunately, however, each of these systems introduces its own protein data derived from the PDB (Protein Data Bank), which its structure comparison algorithms require. In addition, as the size of the PDB increases rapidly, these systems require much more computation to search for common substructures in their databases. In this paper, we introduce a protein structure comparison system named WS4E (A Web-Based Searching Substructures of Secondary Structure Elements) based on a PSAML database that stores PSAML documents in the eXist open-source XML DBMS. PSAML (Protein Structure Abstraction Markup Language) is an XML representation of protein data that describes a protein structure in terms of its secondary structures and their relationships. Using the PSAML database, WS4E provides web services for searching for common substructures among proteins represented in PSAML. In addition, to reduce the number of candidate protein structures to be compared in the PSAML database, we use topology strings, which contain the spatial information of the secondary structures in a protein.
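
The role of topology strings as a cheap prefilter before detailed comparison can be sketched as follows; the encoding, the filter criteria, and the toy database are illustrative assumptions, since the paper's topology strings also encode spatial relationships not modeled here.

```python
def topology_string(secondary_structures) -> str:
    """Encode a protein's secondary-structure sequence, e.g. helices as 'H', strands as 'E'."""
    return "".join(secondary_structures)

def candidate_filter(query_topo: str, database: dict, max_length_diff: int = 2):
    """Cheap prefilter: keep only proteins whose topology string looks compatible
    with the query before running the expensive substructure comparison."""
    return [pid for pid, topo in database.items()
            if abs(len(topo) - len(query_topo)) <= max_length_diff
            and topo.count("H") == query_topo.count("H")]

db = {"1abc": "HEEHH", "2xyz": "HHHHHHEE", "3pqr": "HEEHE"}  # hypothetical entries
print(candidate_filter(topology_string(["H", "E", "E", "H", "H"]), db))
```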

Design and Implementation of Autonomic De-fragmentation for File System Aging (파일 시스템 노화를 해소하기 위한 자동적인 단편화 해결 시스템의 설계와 구현)

  • Lee, Jun-Seok;Park, Hyun-Chan;Yoo, Chuck
    • The KIPS Transactions:PartA
    • /
    • v.16A no.2
    • /
    • pp.101-112
    • /
    • 2009
  • Existing techniques for defragmenting a file system, such as disk defragmentation programs, require intensive disk operations for certain periods at specific times. In this paper, to solve this problem, we design and implement an automatic and continuous defragmentation system that distributes the disk operations over time. We propose an Automatic Layout Scoring (ALS) mechanism for measuring the degree of fragmentation and a Lazy Copy mechanism that copies fragmented data at idle time to spread out the disk operations. We find fragmented files with the Automatic Layout Scoring mechanism and then look for empty space for each such file. After lazily copying the file to the empty space (while protecting it from loss), the algorithm resolves the fragmentation by updating the file's i-node. We implement these algorithms in Linux and evaluate them on small and fragmented files using the layout score. Our system outperforms the Linux EXT2 file system by $2.4%{\sim}10.4%$ in the layout scoring evaluation, and its read and write performance for various file sizes is better than EXT2 by $1%{\sim}8.5%$ for writes and $1.2%{\sim}7.5%$ for reads. We suggest this system as a way to resolve fragmentation automatically, without disturbing I/O tasks and without manual management.
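
The intuition behind a layout score can be illustrated as the fraction of logically adjacent blocks that are also physically contiguous; this metric and its name below are an illustrative stand-in and may differ from the paper's ALS definition.

```python
def layout_score(block_numbers):
    """Toy layout score: fraction of adjacent block pairs that are physically
    contiguous on disk (1.0 means a perfectly contiguous file)."""
    if len(block_numbers) < 2:
        return 1.0
    contiguous = sum(1 for a, b in zip(block_numbers, block_numbers[1:]) if b == a + 1)
    return contiguous / (len(block_numbers) - 1)

print(layout_score([10, 11, 12, 13]))  # 1.0: contiguous, nothing to do
print(layout_score([10, 40, 41, 90]))  # ~0.33: fragmented, a candidate for lazy copy
```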

Finding a Minimum Fare Route in the Distance-Based System (거리비례제 요금부과에 따른 최소요금경로탐색)

  • Lee, Mee-Young;Baik, Nam-Cheol;Nam, Doo-Hee;Shin, Seon-Gil
    • Journal of Korean Society of Transportation
    • /
    • v.22 no.6
    • /
    • pp.101-108
    • /
    • 2004
  • The new transit fare in the Seoul metropolitan area is basically determined by the distance-based fare system (DBFS). The total fare under DBFS consists of three parts: (1) the basic fare, (2) the transfer fare, and (3) the extra fare. A fixed basic fare for each mode is charged when a passenger boards that mode and applies as long as the trip stays within the basic travel distance. A transfer fare may be added when the passenger switches from the present mode to another. The extra fare is imposed if the total travel distance exceeds the basic travel distance; beyond that point, the longer the distance, the more extra fare is charged according to the extra-fare-charging rule. This study proposes an algorithm for finding a minimum fare route under DBFS. It first exploits a link-label-based search method that allows shortest path algorithms to run on inter-modal transit networks without expanding the network at junction nodes. Moreover, a link-expansion technique is adopted so that travel on each mode is treated as duplicated links that share the same start and end nodes but have different link attributes. Some notation associated with modes can therefore be omitted, and the existing link-based shortest path algorithm remains applicable without loss of generality. For the fare calculation, a mathematical formulation is proposed that captures the fare-charging process using the search over two adjacent links from the origin. A shortest fare-path algorithm is then derived by converting the formulation into a recursive form. The implementation of the algorithm is evaluated on a simple test network.
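
The distance-based fare rule described above (basic fare within a basic distance, then an extra charge per additional distance unit, plus a possible transfer fare) can be sketched as a small fare function; all parameter values below are illustrative placeholders, not the actual Seoul fare table or the paper's formulation.

```python
import math

def distance_based_fare(total_distance_km, transfers=0,
                        basic_fare=1000, basic_distance=10,
                        extra_unit=5, extra_fare=100, transfer_fare=0):
    """Distance-based fare: basic fare within the basic distance, then an extra
    charge for every started extra-distance unit beyond it (toy parameters)."""
    fare = basic_fare + transfers * transfer_fare
    if total_distance_km > basic_distance:
        extra_units = math.ceil((total_distance_km - basic_distance) / extra_unit)
        fare += extra_units * extra_fare
    return fare

print(distance_based_fare(8))                 # within the basic distance: basic fare only
print(distance_based_fare(23, transfers=1))   # basic fare + extra charge for 13 km over
```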