• Title/Summary/Keyword: associated words

Automatic Construction of Alternative Word Candidates to Improve Patent Information Search Quality (특허 정보 검색 품질 향상을 위한 대체어 후보 자동 생성 방법)

  • Baik, Jong-Bum;Kim, Seong-Min;Lee, Soo-Won
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.861-873 / 2009
  • Information retrieval can fail to return appropriate information for many reasons. Allomorphs are one cause of search failure due to keyword mismatch. This research proposes a method for automatically constructing alternative word candidates in order to minimize such failures. Assuming that two words have similar meanings if they have similar co-occurring words, the proposed method uses the concept of concentration, association word sets, cosine similarity between association word sets, and a filtering technique based on confidence. The performance of the proposed method is evaluated against a manually extracted list of alternative words. The evaluation results show that the proposed method outperforms context window overlapping in both precision and recall.
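
The similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: sentence-level co-occurrence, the function names, and the toy sentences are all assumptions, and the paper's concentration measure and confidence-based filtering are not reproduced.

```python
import math
from collections import Counter

def association_set(target, sentences):
    """Count the words that co-occur with `target` in the same sentence (assumed window)."""
    counts = Counter()
    for tokens in sentences:
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts

def cosine(a, b):
    """Cosine similarity between two association word sets stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: "display" and "monitor" share their context words.
sentences = [
    ["display", "panel", "liquid", "crystal"],
    ["monitor", "panel", "liquid", "crystal"],
    ["engine", "piston", "fuel"],
]
sim_close = cosine(association_set("display", sentences), association_set("monitor", sentences))
sim_far = cosine(association_set("display", sentences), association_set("engine", sentences))
```

Under this sketch, word pairs with a high cosine score would be kept as alternative word candidates and passed on to a further filtering step.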

An Exploratory Study of Happiness and Unhappiness Among Koreans based on Text Mining Techniques (텍스트마이닝 기법을 활용한 한국인의 행복과 불행 탐색연구)

  • Park, Sanghyeon;Do, Kanghyuk;Kim, Hakyeong;Park, Gaeun;Yun, Jinhyeok;Kim, Kyungil
    • The Journal of the Korea Contents Association / v.18 no.7 / pp.10-27 / 2018
  • The purpose of this study is to explore the meaning of happiness and unhappiness in Korean society through text mining analysis. Words similar to the keywords (happiness/unhappiness) are extracted from an online news portal using Word2Vec and TF-IDF. We also use the K-LIWC dictionary to perform sentiment analysis of the words associated with happiness and unhappiness. In the TF-IDF analysis, happiness and unhappiness are highly related to social factors and the social issues of each year. In the Word2Vec analysis, 'hope' has been similar to happiness for six years. In the K-LIWC analysis, 'money/financial issues', 'school', and 'communication' are highly related to happiness and unhappiness. In addition, 'physical condition and symptom' is highly related to unhappiness. Implications, limitations, and suggestions for future research are also discussed.
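
As a rough illustration of the TF-IDF step mentioned in this abstract, the sketch below scores words with the plain tf × log(N/df) weighting. The abstract does not state which TF-IDF variant or tokenizer the authors used, so the formula, the function name, and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()})
    return scores

# Hypothetical tokenized news snippets that all mention the keyword.
docs = [
    ["happiness", "family", "hope"],
    ["happiness", "job", "income"],
    ["happiness", "hope", "travel"],
]
scores = tf_idf(docs)
```

A word that appears in every document (here the keyword itself) gets a score of zero, while words specific to fewer documents score higher, which is why TF-IDF tends to surface the issues particular to a given year.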

Some Characteristics of Hanmal and Hangul from the viewpoint of Processing Hangul Information on Computers

  • Kim, Kyong-Sok
    • Proceedings of the KSPS conference / 1996.10a / pp.456-463 / 1996
  • In this paper, we discuss three cases that show the effects of the characteristics of the Hangul writing system. In applications such as computer Hangul shorthand for ordinary people and pushbuttons engraved with Hangul characters, we found that using Hangul offers considerable advantages. In the case of Hangul transliteration, we discuss some problems related to the characteristics of the Hangul writing system. Shorthand uses 3-set keyboards in England, America, and Korea. We show how ordinary people can do computer Hangul shorthand, whereas only experts can do computer shorthand in other countries. Specifically, the facts that 1) Hangul characters are grouped into syllables (syllabic blocks) and that 2) there is already a 3-set Hangul keyboard for ordinary people allow ordinary people to do computer Hangul shorthand without the special training that English shorthand requires. This study was done by the author under the codename 'Sejong 89'. In contrast, a 2-set Hangul keyboard, like QWERTY or DVORAK, cannot be used for shorthand. On English pushbuttons, one digit is associated with only one character. However, by engraving only syllable-initial characters on phone pushbuttons, we can associate one Hangul "syllable" with one digit. Therefore, for a given number of digits, we can associate longer or more meaningful words in Hangul than in English. We also discuss the problems of the Hangul transliteration system proposed by South Korea and suggest solutions where available. 1) The framework of transcription is incorrectly being used for transliteration. To solve this problem, the author suggests that a) all complex characters be included in the transliteration table, and that b) syllable-initial and syllable-final characters be specified separately in the table. 2) The proposed system cannot represent independent characters and incomplete syllables. 3) The proposed system cannot distinguish between syllable-initial and syllable-final characters.

Study on a Creative Fashion Design Development Process through Idea Classification (아이디어 발상 유형화를 통한 창의적 패션 디자인 전개 프로세스 연구)

  • Kim, Yoon-Kyoung;Park, Hye-Won
    • Journal of the Korean Society of Costume / v.60 no.9 / pp.95-105 / 2010
  • The purpose of this study is to support a design development process oriented toward the visual and perceptual aspects of form and structure, using more diverse methods derived from a typology of idea generation. To this end, research in psychology, pedagogy, engineering, and consilience studies, together with related prior research and reference data from architecture, promotion, industrial design, other art fields, and fashion design, was collected and analyzed to identify research trends. Based on this content analysis, idea generation was classified into types in consideration of relevance, usefulness, and suitability for fashion. First, concentrated thinking within a limited space is a method of arriving at an optimal design by focusing on solving the cause of a problem within the space that generates it. Second, planned thinking by section through structural decomposition is a method of breaking design problems down by organization, thinking type, factor, and characteristic into sub-modules in order to reinterpret and reorganize the problems from various angles. Third, associative thinking through interpreting relationships among vocabularies is a method of selecting marginal languages that evoke concrete forms, along with key words related to fashion, and then importing the characteristics and attributes of the marginal languages and exploring the thematic relationship between the two terms to find their relevance. Lastly, free integrated thinking through language extension is a method of seeking integration between other fields and fashion: for theme words that do not evoke concrete forms, the vocabulary is extended by inferring metaphorical expressions grounded in an individual's memories or conceptual knowledge, and the extended terms are then freely combined.

Applying Randomization Tests to Collocation Analyses in Large Corpora (언어의 공기관계 분석을 위한 임의화검증의 응용)

  • Yang Kyung-Sook;Kim HeeYoung
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.583-595 / 2005
  • Contingency tables are used to compare counts of n-grams to determine whether an n-gram is a true collocation, meaning that the words that make up the n-gram are highly associated in the text. Several statistical methods are used to identify collocations, including the Kulczinsky coefficient, Ochiai coefficient, Frager and McGowan coefficient, Yule coefficient, mutual information, and chi-square. The main problem is that these measures are based on the assumption of a normal or approximately normal distribution of the variables being sampled. While this assumption is valid in most instances, it is not valid when comparing the rates of occurrence of rare events, and texts are composed mostly of rare events. In this paper we briefly review some statistics for testing the association of two words. We propose randomization tests to evaluate the significance level when analyzing collocations in large corpora. A related graph can be used to compare different test statistics that can be used to analyze the same contingency table.
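
The general idea of a randomization test for word association can be sketched as follows. This is an illustrative sketch only, not the specific tests or test statistics proposed in the paper: the co-occurrence count as the statistic, the sentence-level presence indicators, the function name, and the toy data are all assumptions.

```python
import random

def cooccurrence_pvalue(has_a, has_b, n_perm=2000, seed=0):
    """One-sided randomization test: is the co-occurrence of A and B higher than chance?

    has_a / has_b are per-sentence presence indicators for the two words.
    Shuffling has_b breaks any association while keeping both marginal counts fixed,
    so no normality assumption is needed.
    """
    rng = random.Random(seed)
    observed = sum(a and b for a, b in zip(has_a, has_b))
    shuffled = list(has_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = sum(a and b for a, b in zip(has_a, shuffled))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical corpus of 40 sentences.
has_a = [i < 10 for i in range(40)]          # word A appears in sentences 0..9
has_b_assoc = [i < 9 for i in range(40)]     # word B almost always appears with A
has_b_indep = [i % 2 == 0 for i in range(40)]  # word B unrelated to A

obs_assoc, p_assoc = cooccurrence_pvalue(has_a, has_b_assoc)
obs_indep, p_indep = cooccurrence_pvalue(has_a, has_b_indep)
```

Because the reference distribution is built from the data itself, the same procedure can be rerun with a different statistic (for example mutual information computed from the shuffled contingency table) to compare how the measures behave on rare events.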

An Exploratory Analysis of Online Discussion of Library and Information Science Professionals in India using Text Mining

  • Garg, Mohit;Kanjilal, Uma
    • Journal of Information Science Theory and Practice / v.10 no.3 / pp.40-56 / 2022
  • This paper aims to implement a topic modeling technique for extracting the topics of online discussions among library professionals in India. Topic modeling is the established text mining technique popularly used for modeling text data from Twitter, Facebook, Yelp, and other social media platforms. The present study modeled the online discussions of Library and Information Science (LIS) professionals posted on Lis Links. The text data of these posts was extracted using a program written in R using the package "rvest." The data was pre-processed to remove blank posts, posts having text in non-English fonts, punctuation, URLs, emails, etc. Topic modeling with the Latent Dirichlet Allocation algorithm was applied to the pre-processed corpus to identify each topic associated with the posts. The frequency analysis of the occurrence of words in the text corpus was calculated. The results found that the most frequent words included: library, information, university, librarian, book, professional, science, research, paper, question, answer, and management. This shows that the LIS professionals actively discussed exams, research, and library operations on the forum of Lis Links. The study categorized the online discussions on Lis Links into ten topics, i.e. "LIS Recruitment," "LIS Issues," "Other Discussion," "LIS Education," "LIS Research," "LIS Exams," "General Information related to Library," "LIS Admission," "Library and Professional Activities," and "Information Communication Technology (ICT)." It was found that the majority of the posts belonged to "LIS Exam," followed by "Other Discussions" and "General Information related to the Library."

Difference in Lung Functions according to Genetic Polymorphism of Tobacco Substance Metabolizing Enzymes of Korean Smokers (한국인 흡연자들의 담배 물질 대사 효소의 유전자 다형성에 따른 폐기능 차이)

  • Kang, Yun-Jung
    • Journal of Convergence for Information Technology / v.10 no.5 / pp.134-142 / 2020
  • This study aimed to determine whether the lung functions of smokers differ according to the presence of genetic polymorphisms in carcinogen-metabolizing enzymes, by comparing lung function results with the polymorphisms of the enzymes that metabolize tobacco substances. To achieve this, 31 smokers with no illness and no psychiatric history were selected (28 males and 3 females); they were aged 20 to 27 years and were physically and mentally healthy students attending K University. Their lung functions were measured, and gene polymorphisms of cytochrome P-450 1A1 (CYP1A1), related to the metabolic activation of tobacco components, and of tumor protein 53 (TP53), related to lung cancer, were analyzed. As a result, the mean lung function values of the TT and Arg/Arg genotypes, which carry no genetic mutation, were the highest, and ANOVA of CYP1A1 and lung functions showed a P-value of 0.049 for FVC, indicating a difference between groups. In other words, in the absence of the mutant cytochrome P-450 1A1 (CYP1A1) gene, which is associated with the metabolic activation of tobacco components, the value of FVC was high.

A Meta-analysis of Related Factors Depression of Korea University Student (한국 대학생의 우울 관련 요인에 대한 메타분석)

  • Jeon, Byoung-Jin;Song, Bo-Kyong;Ko, Koung-Min;Kim, Ji-Yoon;Park, Sang-Eun;Yu, Yi-Seul;Lee, Du-Ri;Choi, Young-Ju
    • The Journal of Korean society of community based occupational therapy / v.5 no.2 / pp.43-55 / 2015
  • Objective: This study is a meta-analysis of previous studies that integrates the factors related to depression among university students in Korea and determines the relative importance of those factors. Methods: Papers published from 2000 to 2014 were collected using the National Science and Technology Information Center (NDSL), Nurimedia (DBpia), the Academic Research Information Service (RISS), Korea Research Information (KISS), and the full-text service of the Library of Congress. The key words 'University Student', 'Depression', and 'Depression Factors' were used. The quality of the selected papers was assessed with the evidence-based checklist developed by Down & Black (1998). Results: The 47 selected studies were divided into five factors (self-esteem, suicidal ideation, positive thinking, stress, and Internet and smartphone addiction). Using meta-analysis, we analyzed effect sizes, statistical heterogeneity, and publication bias. Of the five factors, no heterogeneity was found for self-esteem. Self-esteem and suicidal ideation showed a "large effect size", positive thinking and stress a "medium effect size", and Internet and smartphone addiction a "small effect size". Conclusion: Among the factors associated with depression in university students in Korea, self-esteem and suicidal ideation were found to be the most relevant. This study identified the factors associated with depression in college students, and its results could be utilized as a basis for the prevention of depression.

The Influence of an Aesthetically Appealing Product on the Using Time, Flow, and Recall Memory (제품의 심미성이 제품의 사용시간, 몰입도, 정보 기억도에 미치는 영향)

  • Lee, Jae-Hwa;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility / v.11 no.2 / pp.257-270 / 2008
  • Three experiments were carried out to determine whether, among products offering good usability, users spend more time using, recall more product information about, and experience more flow with an aesthetically appealing product (media player). Fourteen emotional words were employed, consisting of 8 aesthetic and 6 usability words. In a preliminary experiment (N=18), the subjects freely used three media players and rated the emotional words on a 7-point Likert scale, which allowed us to identify one group of products with similar usability values and another group of products that contrasted in aesthetic and usability value. In the main experiment, it was hypothesized that users spend more time, experience more flow, and recall more information with the aesthetically appealing product. Therefore, for the group of products with the same usability value, we measured how much time subjects spent using each product and asked them to estimate the time they had spent. We then examined the time spent and the gap between the actual and estimated time. We also measured the amount of menu information recalled via a questionnaire. In the last experiment (N=18), we selected the group of products that contrasted in aesthetic and usability value and assessed the differences in using time, recall of product information, and flow. The empirical results provide evidence that aesthetically appealing products are associated with greater flow and recall of product information than other products, thus supporting the hypothesis. In addition, a positive correlation was found between the aesthetic appeal of the product and both the flow index and the amount of recalled information.

A Study on Creation and Development of Folksonomy Tags on LibraryThing (폭소노미 태그의 생성과 성장에 관한 연구 - LibraryThing을 중심으로 -)

  • Kim, Dong-Suk;Chung, Yeon-Kyoung
    • Journal of the Korean Society for Library and Information Science / v.44 no.4 / pp.203-230 / 2010
  • This study analyzed the development and growth of folksonomy by examining the tags associated with 40 bestsellers on LibraryThing.com at 6-month intervals. It was found that tag values do not decrease but grow in both quantity and quality. Accordingly, we examined the major significance of the tags and their potential use as an expression of subjects. Our findings were as follows. First, the motivations for tagging can be categorized into personal information for search purposes; self-fulfillment such as a sense of achievement, display of emotion, and sharing of one's experience with others; and an altruistic objective that emphasizes sociality, with a desire that one's actions might provide social benefits. According to our analysis, 74.12% of tags had a social motivation. Second, the total number of tags and the frequency of usage increased with time. Third, the categories that showed a high increase in tag usage were dates of publication and reading, key words, main characters, and book reviews. Tags related to subjects had the highest ratio. Fourth, among Library of Congress Subject Headings (LCSH), multiple genres, key words, and main characters were assigned to books, and specific key words and other properties were added as time progressed. There was also a slight increase in the number of tags consistent with LCSH. Fifth, we found that key tags could serve as a compilation of terms that reflects the knowledge base of the corresponding era. Thus, folksonomy should be continuously monitored for the quantitative and qualitative development of its tags in order to remedy its formative disadvantages and identify its internal semantic significance, and it should be actively utilized in conjunction with taxonomy as a flexible compilation of terms that incorporates the history of a specific era.