• Title/Summary/Keyword: fastText


Time and Space Efficient Search with Suffix Arrays (접미사 배열을 이용한 시간과 공간 효율적인 검색)

  • Choi, Yong-Wook;Sim, Jeong-Seop;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.32 no.5 / pp.260-267 / 2005
  • Suffix trees and suffix arrays are widely used to search a text T of length n efficiently for a pattern P over an alphabet $\Sigma$. For large texts, suffix arrays are preferred because they take less space than suffix trees. Recently, $O(|P| \cdot |\Sigma|)$-time and $O(|P| \cdot \log|\Sigma|)$-time search algorithms on suffix arrays were developed. In this paper we present time- and space-efficient search algorithms on suffix arrays. One algorithm runs in $O(|P|)$ time using $O(n \cdot |\Sigma|)$ bits of space, and the other runs in $O(n \cdot |\Sigma|)$ time using $O(n \log|\Sigma| + |\Sigma| \cdot n \log\log n / \log n)$ bits of space, which is more space efficient and still fast. Experiments show that our algorithms are efficient in both time and space when compared to previous algorithms.
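
As a point of reference for the abstract above, the following is a minimal sketch of the classic binary search over a suffix array. It is not one of the paper's algorithms: it runs in O(|P| log n) time rather than the bounds quoted above, and the construction step is deliberately naive. The function names `build_suffix_array` and `find_occurrences` are chosen here for illustration only.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of `pattern` in `text`.

    Two binary searches locate the contiguous block of suffixes whose
    first len(pattern) characters equal the pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

For example, with `text = "banana"` the suffix array is `[5, 3, 1, 0, 4, 2]`, and searching for `"ana"` returns positions `[1, 3]`.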

Synthesis of Curcumin Glycosides with Enhanced Anticancer Properties Using One-Pot Multienzyme Glycosylation Technique

  • Gurung, Rit Bahadur;Gong, So Youn;Dhakal, Dipesh;Le, Tuoi Thi;Jung, Na Rae;Jung, Hye Jin;Oh, Tae Jin;Sohng, Jae Kyung
    • Journal of Microbiology and Biotechnology / v.27 no.9 / pp.1639-1648 / 2017
  • Curcumin is a natural polyphenolic compound, widely acclaimed for its antioxidant, anti-inflammatory, antibacterial, and anticancer properties. However, its use has been limited by its low aqueous solubility, poor bioavailability, rapid clearance, and low cellular uptake. To assess the effect of glycosylation on the pharmacological properties of curcumin, one-pot multienzyme (OPME) chemoenzymatic glycosylation reactions were carried out with UDP-$\alpha$-D-glucose or UDP-$\alpha$-D-2-deoxyglucose as the donor substrate. The results indicated significant conversion of curcumin to its glycosylated derivatives: curcumin 4'-O-$\beta$-glucoside, curcumin 4',4''-di-O-$\beta$-glucoside, curcumin 4'-O-$\beta$-2-deoxyglucoside, and curcumin 4',4''-di-O-$\beta$-2-deoxyglucoside. The products were characterized by ultra-fast performance liquid chromatography, high-resolution quadrupole time-of-flight electrospray ionization mass spectrometry, and NMR analyses. All the products showed improved water solubility and comparable antibacterial activities. Additionally, curcumin 4'-O-$\beta$-glucoside and curcumin 4'-O-$\beta$-2-deoxyglucoside showed enhanced anticancer activities compared with the parent aglycone and the diglycoside derivatives. This result indicates that glycosylation can be an effective approach for enhancing the pharmaceutical properties of natural products such as curcumin.

A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment (조세심판 문서 검색 효율 향상 모델에 관한 연구)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.41-47 / 2019
  • When a court renders a judgment, it is very important to search for and obtain examples of similar judgments. Existing judgment-document search relies on keywords entered by the user. However, this requires an accurate keyword; if the keyword is unknown, the necessary document cannot be found. In addition, the retrieved documents may differ in content from what was sought. In this paper, we aim to improve the effectiveness of a method that vectorizes documents into a three-dimensional space, calculates cosine similarity, and retrieves nearby documents in order to find accurate judgment examples. To this end, after analyzing the similarity of words used in the judgment examples, we propose extracting the mode and inserting it into the body of the text, thereby improving the cosine similarity of the documents to be retrieved. We expect that the proposed model will give users who are looking for tax-related judgment examples a fast and accurate search.
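
The retrieval step described in the abstract above (vectorize, compute cosine similarity, return close documents) can be sketched as follows. This is an illustration only: the three-dimensional vectors and document names are hypothetical, and the abstract does not say how the vectors are produced.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query: Sequence[float],
                   docs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document names ordered from most to least similar to the query."""
    return sorted(docs,
                  key=lambda name: cosine_similarity(query, docs[name]),
                  reverse=True)


# Hypothetical document vectors, for illustration only.
docs = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [1.0, 1.0, 0.0]}
ranking = rank_documents([1.0, 0.1, 0.0], docs)  # A first, then C, then B
```

Because cosine similarity depends only on direction, inserting additional occurrences of a shared word into a document (as the abstract proposes) shifts its vector toward documents that use the same word.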

Could European Media Freedom Act solve the problems of traditional media's content in the online sphere? (온라인 영역에서 유럽 미디어 자유법의 전통 미디어 콘텐츠 문제 해결 가능성에 관한 연구)

  • Gosztonyi, Gergely;Lendvai, Ferenc Gergely
    • Informatization Policy / v.31 no.1 / pp.72-82 / 2024
  • The presence of traditional media content on online platforms is one of today's critical issues, and Article 17 of the European Media Freedom Act (EMFA) seeks to regulate it. However, the current version of the text is not yet free of flaws: its harmonisation with the Digital Services Regulation, its use of definitions, and the media fast-track mechanism it contains would all require careful legislative scrutiny before the final text is adopted. The article examines whether the self-declaration procedure envisaged by the EMFA would create a loophole for rogue media actors and bring confusion at both the European and horizontal levels, or whether it would fit the original goal of the EMFA, which is to improve the functioning of the internal European media market and to reinforce independent media.

A Study on Analyzing the Impact Factors of Cell Broadcast Service Considering Socially Vulnerable Groups - Focus on Comparative Analysis between the Elderly and the General Population (사회적 취약계층을 고려한 재난방송문자 서비스 영향 요인 분석 - 고령자와 일반인 그룹의 비교분석을 중심으로)

  • Keunoh Park
    • Journal of the Society of Disaster Information / v.19 no.2 / pp.383-394 / 2023
  • Purpose: This study focuses on improving the cell broadcast service (CBS) with consideration of socially vulnerable groups, and identifies the service factors that elderly people in particular need. Method: Multiple regression analysis was applied with overall satisfaction with the cell broadcast service as the dependent variable and each of the service factors as an independent variable. Result: Fast delivery had the greatest effect on overall satisfaction in both groups, followed by the delivery of sufficient content for the elderly group and system quality for the non-elderly group. The elderly group cared more about information content, while the non-elderly group considered functions such as system quality and text message transmission criteria more important. Conclusion: Elderly people consider the delivery of sufficient information important in addition to fast delivery, which suggests that cell broadcast messages should take into account the characteristics of elderly people, as they tend to have weaker understanding and thinking abilities than the non-elderly.

Research on text mining based malware analysis technology using string information (문자열 정보를 활용한 텍스트 마이닝 기반 악성코드 분석 기술 연구)

  • Ha, Ji-hee;Lee, Tae-jin
    • Journal of Internet Computing and Services / v.21 no.1 / pp.45-55 / 2020
  • With the development of information and communication technology, the number of new and variant malware samples is increasing rapidly every year, and various types of malware are spreading along with Internet of Things and cloud computing technology. In this paper, we propose a malware analysis method based on string information, which can be used regardless of the operating system environment and which represents library call information related to malicious behavior. Attackers can easily create malware by reusing existing code or by using automated authoring tools, and the resulting malware operates much like existing malware. Since most of the strings that can be extracted from malware carry information closely related to malicious behavior, they are processed by weighting data features with a text-mining-based method so that they can serve as effective features for malware analysis. Based on the processed data, models are built with various machine learning algorithms to run experiments on malware detection and malicious group classification. The data were compared and verified against all files used on Windows and Linux operating systems. The accuracy of malware detection is about 93.5%, and the accuracy of group classification is about 90%. The proposed technique has a wide range of applications because it is relatively simple, fast, and operating-system independent, and it works as a single model, since there is no need to build a separate model for each group when classifying malicious groups. In addition, because the string information is extracted through static analysis, it can be processed faster than analysis methods that execute the code directly.
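
The abstract above does not name the weighting scheme it uses. As one common text-mining choice, the sketch below applies TF-IDF to the strings extracted from each sample; both the use of TF-IDF and the example strings are assumptions made here for illustration, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(samples: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each extracted string per sample with TF-IDF (an assumed scheme).

    `samples[i]` is the list of strings extracted from file i.
    Strings that occur in every sample receive weight 0.
    """
    n = len(samples)

    # Document frequency: in how many samples does each string occur?
    df: Counter = Counter()
    for strings in samples:
        df.update(set(strings))

    weights = []
    for strings in samples:
        tf = Counter(strings)
        total = len(strings)
        weights.append({
            s: (count / total) * math.log(n / df[s])
            for s, count in tf.items()
        })
    return weights


# Hypothetical extracted strings, for illustration only.
samples = [
    ["CreateRemoteThread", "kernel32.dll"],
    ["kernel32.dll", "printf"],
]
weights = tf_idf(samples)
```

Here `kernel32.dll` appears in both samples and is weighted 0, while `CreateRemoteThread` appears in only one and receives a positive weight. The resulting dictionaries could then be turned into feature vectors for a classifier.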

A Method for Detecting Event-Location based on Similar Keyword Extraction in Tweet Text (트윗 텍스트의 유사 키워드 추출을 통한 이벤트 지역 탐지 기법)

  • Yim, Junyeob;Ha, Hyunsoo;Hwang, Byung-Yeon
    • Spatial Information Research / v.23 no.5 / pp.1-7 / 2015
  • Twitter propagates and diffuses information faster than other SNSs. As a result, much research on detecting real-time events using Twitter is in progress. A Twitter real-time event detection system treats every Twitter user as a sensor and analyzes the tweets they write in order to detect events. Related research has already obtained good results but has run into limits. In particular, many existing studies trace an event's location using GPS coordinates. However, this approach is clearly limited because users are currently reluctant to make their personal location information public. Therefore, this paper proposes a method that traces location information from the text of tweets without using the location information provided by Twitter. Associated words were grouped using keywords extracted from the tweet text. Experiments with this algorithm detected where events occurred and whether they had actually occurred. Furthermore, the experiments demonstrated the need for the proposed method by showing faster detection than other existing media.

Text Classification Using Heterogeneous Knowledge Distillation

  • Yu, Yerin;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.29-41 / 2022
  • Recently, with the development of deep learning technology, a variety of huge models with excellent performance have been devised by pre-training on massive amounts of text data. However, for such a model to be applied to real-life services, inference must be fast and the amount of computation must be low, so model compression technology is attracting attention. Knowledge distillation, a representative model compression technique, is attracting attention because it can be used in a variety of ways to transfer the knowledge already learned by a teacher model to a relatively small student model. However, knowledge distillation has a limitation: it struggles with problems that have low similarity to previously learned data, because the teacher model learns only the knowledge needed for the given problem and distillation to the student model is performed from that same point of view. Therefore, we propose a heterogeneous knowledge distillation method in which the teacher model learns a higher-level concept rather than the knowledge required for the task that the student model needs to solve, and then distills this knowledge to the student model. In addition, through classification experiments on about 18,000 documents, we confirmed that the heterogeneous knowledge distillation method outperformed traditional knowledge distillation in both learning efficiency and accuracy.
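
For readers unfamiliar with the baseline the abstract above compares against, the following is a minimal sketch of the widely used soft-target distillation loss (temperature-scaled KL divergence plus hard-label cross-entropy). It illustrates traditional knowledge distillation only, not the heterogeneous method proposed in the paper; the parameter names `temperature` and `alpha` and their defaults are assumptions made here.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      label: int,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> float:
    """Weighted sum of a soft-target term and a hard-label term.

    soft: KL(teacher || student) on temperature-softened distributions,
          scaled by temperature**2 as is conventional.
    hard: cross-entropy of the student against the true label.
    Assumes logits are moderate enough that no probability underflows to 0.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student_soft) if pt > 0.0)

    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard
```

With `alpha=1.0` the loss depends only on how closely the student's softened distribution matches the teacher's, and it is zero when the two sets of logits are identical.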

Development of Expert Systems using Automatic Knowledge Acquisition and Composite Knowledge Expression Mechanism

  • Kim, Jin-Sung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.447-450 / 2003
  • In this research, we propose an automatic knowledge acquisition and composite knowledge expression mechanism based on machine learning and a relational database. Most traditional approaches to developing the knowledge base and inference engine of an expert system were based separately on IF-THEN rules, AND-OR graphs, semantic networks, or frames. However, these approaches have limitations in automatic knowledge acquisition, complicated knowledge expression, expansibility of the knowledge base, speed of inference, and hierarchies among rules. To overcome these limitations, many researchers have tried to develop automatic knowledge acquisition, composite knowledge expression, and fast inference methods. As a result, the adaptability of expert systems improved rapidly. Nonetheless, these efforts did not offer a hybrid, generalized solution that supports the entire expert system development process. Our proposed mechanism has five empirical advantages. First, it can extract specific domain knowledge from an incomplete database using a machine learning algorithm. Second, it can efficiently reduce the number of rules through the rule extraction mechanism used in machine learning. Third, it can expand the knowledge base without limit by using a relational database. Fourth, the backward inference engine developed in this study can rapidly manipulate the knowledge base stored in the relational database, so inference is faster than with a traditional text-oriented inference mechanism. Fifth, our composite knowledge expression mechanism can represent traditional knowledge expression methods such as IF-THEN rules, AND-OR graphs, and relationship matrices simultaneously. To validate the inference ability of our system, a real data set was adopted from a clinical diagnosis task classifying dermatological diseases.
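
To make the idea of backward inference over IF-THEN rules concrete, here is a minimal backward-chaining sketch. It keeps the rules in an in-memory dictionary rather than the relational database the abstract describes, and the rule and fact names are hypothetical; it is not the paper's engine.

```python
from typing import Dict, List, Optional, Set

# goal -> alternative premise lists (AND within a list, OR across lists),
# which mirrors an AND-OR graph over IF-THEN rules.
Rules = Dict[str, List[List[str]]]


def prove(goal: str, facts: Set[str], rules: Rules,
          visiting: Optional[Set[str]] = None) -> bool:
    """Backward chaining: start from the goal and work back to known facts."""
    if goal in facts:
        return True
    if visiting is None:
        visiting = set()
    if goal in visiting:
        return False  # cyclic dependency: give up on this branch
    visiting = visiting | {goal}

    for premises in rules.get(goal, []):
        if all(prove(p, facts, rules, visiting) for p in premises):
            return True
    return False


# Hypothetical rules and facts, for illustration only.
rules: Rules = {
    "diagnosis_x": [["symptom_a", "finding_b"]],
    "finding_b": [["test_positive"], ["history_present"]],
}
facts = {"symptom_a", "history_present"}
result = prove("diagnosis_x", facts, rules)
```

In this example `diagnosis_x` is provable because `symptom_a` is a known fact and `finding_b` follows from the second of its two alternative rules.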


Reconstitution of Compact Binary trie for the Efficient Retrieval of Hangul UniCODE Text (한글 유니코드 텍스트의 효율적인 탐색을 위한 컴팩트 바이너리 트라이의 재구성)

  • Jung, Kyu Cheol;Lee, Jong Chan;Park, Sang Joon;Kim, Byung Gi
    • Journal of Korea Society of Digital Industry and Information Management / v.5 no.2 / pp.21-28 / 2009
  • This paper proposes the RCBT (Reduced Compact Binary trie) to correct the faults of the CBT (Compact Binary trie). The CBT was the first attempt at a compact structure, but as the amount of input data grew, insertion became very difficult because of the dummy nodes used to balance the trees. The HCBT, by contrast, is realized hierarchically: to keep the map from growing to the right, it creates new trees and links to them once a given depth is reached. This made insertion and search considerably faster, but it had the disadvantage of requiring more storage space because, like the CBT, it uses dummy nodes and also keeps many tree links. The RCBT proposed in this paper improves capacity by about 60% by eliminating dummy nodes entirely.
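
As background for the abstract above, the sketch below shows a plain pointer-based binary trie keyed on the bits of Unicode code points. It is not the CBT, HCBT, or RCBT: it has no compact bit-map encoding, no balancing, and no dummy nodes. The 21-bit fixed width per character is a choice made here so that any Unicode code point fits.

```python
from typing import Iterator, List, Optional

BITS_PER_CHAR = 21  # enough for any Unicode code point (max U+10FFFF)


class Node:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None, None]
        self.terminal = False


def bits_of(word: str) -> Iterator[int]:
    """Yield the bits of each character's code point, most significant first."""
    for ch in word:
        code = ord(ch)
        for shift in range(BITS_PER_CHAR - 1, -1, -1):
            yield (code >> shift) & 1


class BinaryTrie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, word: str) -> None:
        node = self.root
        for bit in bits_of(word):
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.terminal = True

    def contains(self, word: str) -> bool:
        node: Optional[Node] = self.root
        for bit in bits_of(word):
            node = node.children[bit]
            if node is None:
                return False
        return node.terminal


trie = BinaryTrie()
trie.insert("한글")
trie.insert("한국")
```

After these insertions, `trie.contains("한글")` and `trie.contains("한국")` are true, while the shared prefix `"한"` is not reported as a stored key. A compact variant would replace the pointer-based nodes with a bit-level encoding of the tree shape.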