• Title/Summary/Keyword: StopWords


Investigating Major Topics Through the Analysis of Depression-related Facebook Group Posts (페이스북 그룹 게시물 분석을 통한 우울증 관련 주제에 대한 고찰)

  • Zhu, Yongjun;Kim, Donghun;Lee, Changho;Lee, Yongjeong
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.4
    • /
    • pp.171-187
    • /
    • 2019
  • The study analyzes the posts of depression-related Facebook groups to understand the major topics discussed by group users. Specifically, it identifies the topics and keywords of the posts to understand what users discuss about depression. Depression is a mental disorder that is a somewhat sensitive subject in online communities, which are characterized by accessibility, openness, and anonymity. The researchers implemented a natural language-based data analysis framework whose components range from Facebook data collection to the automated extraction of topics. Using the framework, we collected and analyzed 885 posts created over the past year in the largest Facebook depression group. To derive more complete and accurate topics, we combined automated and manual (e.g., stop word removal, topic size determination) methods. Results indicate that users discuss a variety of topics, including depression in general, human relations, mood and feeling, depression symptoms, suicide, medical references, and family.

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to assess the applicability of machine learning to classifying books in public libraries based on their titles. Data analysis was performed using Python's scikit-learn library in Jupyter Notebook on the Anaconda platform. The KoNLPy analyzer and its Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC class numbers (300 and 600) extracted from the KORMARC records of public libraries. Analysis of the data with six machine learning models showed that machine learning can be applied to book classification. Among the models used, the neural network model had the highest title classification accuracy. The study suggests the need to improve the accuracy of title classification and the need for research on book titles, tokenization of titles, and stop words.

A Study on Data Cleansing Techniques for Word Cloud Analysis of Text Data (텍스트 데이터 워드클라우드 분석을 위한 데이터 정제기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.745-750
    • /
    • 2021
  • In big data visualization analysis of unstructured text data, the raw data is mostly large in volume, and analysis techniques cannot be applied to it without cleansing. Therefore, unnecessary data is removed from the collected raw data through a first, heuristic cleansing process, and stopwords are removed through a second, machine cleansing process. Then the frequency of each word is calculated and visualized using the word cloud technique, key issues are extracted and turned into information, and the results are analyzed. In this study, we propose a new stopword cleansing technique that uses an external stopword set (DB) with a Python word cloud, and we identify the problems and effectiveness of this technique through a practical case analysis. Based on this verification, we present the practical utility of word cloud analysis with the proposed cleansing technique.
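The second (machine) cleansing step and the frequency count described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names, the sample sentence, and the tiny stopword list are all hypothetical, and a real external stopword set would be read from a file or database.

```python
import re
from collections import Counter


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (e.g. an open text file).

    Hypothetical helper: blank lines are skipped, entries are lower-cased.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def word_frequencies(text, stopwords):
    """Tokenize text into runs of letters, drop stopwords, count the rest."""
    # [^\W\d_]+ matches runs of Unicode letters, so Hangul tokens also work.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


# Assumed stand-in for an external stopword set (DB).
external_stopwords = load_stopwords(["the", "of", "and", "is", "a", ""])

sample = "The cleansing of text data is the first step of word cloud analysis."
freq = word_frequencies(sample, external_stopwords)

# The resulting counts are what a word cloud renderer would take as input.
for word, count in freq.most_common():
    print(word, count)
```

Keeping the stopword set outside the code, as the abstract proposes, means the list can be extended per corpus without touching the counting logic.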

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.143-159
    • /
    • 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions shape the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained in popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblogging service; through its 'tweets', users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leadership on technology. Previous studies found that tweet data provides useful information and detects societal trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services like Twitter.
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares Twitter data generated in Seoul, Korea, with the predictions of the two organizations and analyzes the differences. Twitter data analysis requires various natural language processing techniques, including stop word removal and noun extraction, to process unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets from 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; these topics are non-generalized compound words, so they may be mentioned on Twitter in other words. To answer whether the IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies on the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013.
The main contributions of our research are the following: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
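The abstract does not say which correlation statistic was used. As a rough sketch, assuming a Pearson correlation between per-topic tweet counts and per-topic procurement announcement counts, the comparison could look like the following. All counts below are invented for illustration and are not the study's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-topic frequencies (one entry per predicted IT trend topic).
tweet_counts = [120, 45, 300, 80, 15]
announcement_counts = [30, 10, 70, 25, 5]

r = pearson(tweet_counts, announcement_counts)
print(f"r = {r:.3f}")
```

A significance test (for example a t-test on r with n - 2 degrees of freedom) would be needed to support a claim of significant correlation; it is omitted here to keep the sketch short.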

Image Quality Assessment Model of Natural Scene Based on Normal Distribution Analysis (일반 장면의 정규분포 분석을 기반으로 한 화질 측정 모형)

  • Park, Hyung-Ju;Har, Dong-Hwan
    • Science of Emotion and Sensibility
    • /
    • v.16 no.3
    • /
    • pp.373-386
    • /
    • 2013
  • In this research, we specify the ranges of image quality preferred by image consumers based on objective image quality evaluation factors, and we propose a method for measuring preference for natural image scenes. Following a no-reference approach, we select dynamic range, color, and contrast as the image quality measurement factors. As sample images, we chose 200 preferred landscapes that had received over 30 recommendations from image consumers on an internet photo gallery. From the scores of the three objective image quality factors, the final expected score, which represents image quality preference, is measured out of a total of 100 points. In the main test, the actual image sample shows a dynamic range of 10 stops, LAB mean values of L:54.7, A:2.96, B:-15.84, and an RSC contrast of 376.9. For the 200 image samples, the normal distribution z values are 0.21 for dynamic range, L:0.15, A:0.38, B:0.13 for the LAB mean values, and 0.08 for RSC contrast. Using the standard normal distribution table, we convert the z values to percentages: dynamic range is 8.32%, the LAB mean values are L:5.96%, A:14.8%, B:5.17%, and RSC contrast is 3.19%. We then convert the percentages into scores out of 100: dynamic range is 91.68, the LAB mean value is 91.36, and RSC contrast is 96.81. We therefore conclude that the sample image's total mean score is 94.99 based on the three objective image quality factors. With the proposed image quality assessment model, we can measure the preference value of natural scenes. We can also specify the preferred image quality representation ranges and measure the expected image quality preference.
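The z-to-percentage conversion above is consistent with reading the area between 0 and z from a standard normal table and subtracting that percentage from 100. The sketch below reproduces the per-factor figures under two assumptions that the abstract does not state explicitly: percentages are rounded to two decimals before use (as a table lookup would give), and the LAB score averages the three channel percentages. Function names are hypothetical. It does not attempt the overall score, because the abstract does not say how the three factor scores are combined.

```python
import math


def table_area_percent(z):
    """Area between 0 and |z| under the standard normal curve, in percent.

    Rounded to two decimals, mimicking a printed table lookup.
    """
    area = 0.5 * math.erf(abs(z) / math.sqrt(2.0))
    return round(100.0 * area, 2)


def factor_score(percentages):
    """Score out of 100: subtract the mean percentage from 100 (assumed rule)."""
    mean_percent = sum(percentages) / len(percentages)
    return round(100.0 - mean_percent, 2)


# z values quoted in the abstract, grouped by quality factor.
z_values = {
    "dynamic range": [0.21],
    "LAB mean": [0.15, 0.38, 0.13],
    "RSC contrast": [0.08],
}

for factor, zs in z_values.items():
    percents = [table_area_percent(z) for z in zs]
    print(factor, percents, factor_score(percents))
```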


A Study on development of Innovational Cluster for Knowledge Management in Busan (부산지역 지식경영을 위한 혁신클러스터 모델 구축에 관한 연구)

  • Jeong, Hyung-Il;Bang, Kwuen-Soo;Kim, Jong-Duk
    • Management & Information Systems Review
    • /
    • v.29 no.4
    • /
    • pp.169-186
    • /
    • 2010
  • This study aims to identify ways to sharpen the competitive edge of Korean companies by analyzing the relationship between knowledge management and innovational clusters amid recent environmental changes in Busan. Following the knowledge management approach, methods and directions for strengthening industrial competitiveness were established, and an innovational cluster strategy was suggested as a way of expanding and encouraging knowledge management. The key topics concerning innovational clusters in this research are the framework of cluster theory, the importance of innovational clusters, and the change in the managerial strategy paradigm. This study provides several implications for practitioners of knowledge management and for researchers. Based on theories of knowledge management and industrial clusters, their close relationship was analyzed. As a result, industrial clusters were found to be effective in broadening and deepening knowledge management. In addition, the study suggests guidelines for operating knowledge management efficiently. It indicates that knowledge management and innovational clusters should be operated and handled together in managerial strategy. However, this research has limitations in generalizing its results because it collected data only from local firms in Busan.


The Principles of Fractal Geometry and Its Applications for Pulp & Paper Industry (펄프·제지 산업에서의 프랙탈 기하 원리 및 그 응용)

  • Ko, Young Chan;Park, Jong-Moon;Shin, Soo-Jung
    • Journal of Korea Technical Association of The Pulp and Paper Industry
    • /
    • v.47 no.4
    • /
    • pp.177-186
    • /
    • 2015
  • Until Mandelbrot introduced the concepts of fractal geometry and fractal dimension in the early 1970s, the geometry of nature had generally been considered too complex and irregular to describe analytically or mathematically. Here, a fractal dimension is a non-integer number such as 0.5, 1.5, or 2.5, in contrast to the integers used in traditional Euclidean geometry, i.e., 0 for a point, 1 for a line, 2 for an area, and 3 for a volume. Since his pioneering work on fractal geometry, the geometry of nature has been found to be fractal. For example, fractal geometry has been found in mountains, coastlines, clouds, lightning, earthquakes, turbulence, trees, and plants. Even human organs are found to be fractal. This suggests that fractal geometry is the rule in nature rather than the exception. Fractal geometry has a hierarchical structure consisting of elements that have the same shape but different sizes, from the largest to the smallest. Thus, fractal geometry can be characterized by similarity and hierarchical structure. A process requires driving energy to proceed; otherwise, it would stop. A hierarchical structure is considered ideal for generating such a driving force. This explains why natural processes and phenomena such as lightning, thunderstorms, earthquakes, and turbulence have fractal geometry. It would not be surprising to find that even human organs such as the brain, the lungs, and the circulatory system have fractal geometry. Until now, a normal (Gaussian) frequency distribution has commonly been used to describe the frequencies of an object. However, a log-normal frequency distribution has been found most frequently in natural phenomena and in chemical processes such as corrosion and coagulation. It can be shown mathematically that if an object has a log-normal frequency distribution, it has fractal geometry.
In other words, the two go hand in hand. Lastly, the application of fractal principles is discussed, focusing on the pulp and paper industry. The principles should be applicable to characterizing surface roughness, particle size distributions, and formation. They should also be applicable to wet-end chemistry for ideal mixing, felt and fabric design for the papermaking process, dewatering, drying, creping, and post-converting operations such as laminating, embossing, and printing.
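To make the idea of a non-integer dimension concrete, the sketch below computes the similarity dimension D = log N / log s for a shape built from N copies of itself, each scaled down by a factor s. The abstract does not name this formula or these shapes; they are standard textbook examples chosen here for illustration, and the function name is hypothetical.

```python
import math


def similarity_dimension(copies, scale_factor):
    """Similarity dimension D = log(N) / log(s).

    copies: number N of self-similar pieces the shape splits into.
    scale_factor: factor s (> 1) by which each piece is scaled down.
    """
    if copies < 1 or scale_factor <= 1:
        raise ValueError("need copies >= 1 and scale_factor > 1")
    return math.log(copies) / math.log(scale_factor)


# Euclidean shapes give integer dimensions.
print(similarity_dimension(2, 2))  # line segment: 2 halves
print(similarity_dimension(4, 2))  # square: 4 quarter-size squares

# Classic fractals give non-integer dimensions.
print(similarity_dimension(4, 3))  # Koch curve: 4 pieces at 1/3 scale
print(similarity_dimension(2, 3))  # Cantor set: 2 pieces at 1/3 scale
```

The Koch curve comes out between 1 and 2, matching the intuition that it is more than a line but less than an area.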

Customized Configuration with Template and Options (맞춤구성을 위한 템플릿과 Option 기반의 추론)

  • 이현정;이재규
    • Journal of Intelligence and Information Systems
    • /
    • v.8 no.1
    • /
    • pp.119-139
    • /
    • 2002
  • In electronic catalogs, each item is represented as an independent unit, even though the parts of an item can be composed into a higher level of functionality. Thus, searching this kind of product database is limited to retrieving the most similar standard commodities. However, many industrial products need optional parts to be configured to fulfill the required specifications. Since there are many paths to finding the required specifications, we need to develop a system that searches via the configuration process. In this system, we adopt a two-phase approach. The first phase finds the most similar template, and the second phase adjusts the template specifications toward the required set of specifications using the Constraint and Rule Satisfaction Problem approach. There is no guarantee that the most similar template leads to the most desirable configuration. The search system therefore needs a backtracking capability, so that the search can stop at a satisfactory local optimum. This framework is applied to the configuration of computers and peripherals. Template-based reasoning is basically the same as case-based reasoning. The required set of specifications is represented by a list of criteria and matched against the product specifications to find the closest ones. To measure the distance, we develop a thesaurus of values, which can identify the meaning of numbers, symbols, and words. With this configuration, the performance of the search-by-configuration algorithm is evaluated in terms of feasibility and admissibility.


Port's Successful Global Supply Chain Strategies - Focusing on the case of Dubai port - (항만의 성공적인 글로벌 공급사슬 전략 - 두바이항의 사례를 중심으로 -)

  • Han, Chul-Hwan
    • Journal of Korea Port Economic Association
    • /
    • v.24 no.2
    • /
    • pp.175-192
    • /
    • 2008
  • Today's individual firms no longer compete as solely autonomous entities, but rather as supply chains. As such, the competitive position of a port is determined not only by its internal strengths but also by its links in a global supply chain. In other words, port competitiveness is becoming increasingly dependent on external coordination and control of the whole supply chain. The main purpose of this paper is to examine how a port embeds itself into a supply chain in order to strengthen its competitive position, focusing on the case of Dubai port. The paper finds that Dubai port used three phases (insertion, integration, and dominance) as its strategy for embedding itself successfully into the global supply chain. Dubai's global supply chain strategies offer some implications for the further development of the Port of Gwangyang. First, the Port of Gwangyang should fully utilize its symbiotic relationship with the Gwangyang Free Economic Zone (GYFEZ). Second, integration between the Korea Container Terminal Authority and GYFEZ is recommended for fast decision-making and for providing a one-stop service. Finally, Gwangyang should pursue an aggressive supply chain strategy aimed at dominance in the regional port network through port alliances with small and medium-sized ports in neighboring areas.


Flow Analysis and Experimental Study of Globe Valve for Precision Control (정밀 제어 글로브 밸브의 유동해석 및 실험적 연구)

  • Choi, Ji-Won;Park, Sun-Hyung;Lee, Kwon-Hee
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.7
    • /
    • pp.734-739
    • /
    • 2016
  • The globe valve is a linear motion valve designed primarily to stop, start, and regulate flow. The disk of a globe valve can be removed entirely from the flow path, or it can completely close the flow path. In this study, numerical analysis using ANSYS-CFX was first performed to predict the flow coefficient and to build a prototype model of a globe valve. The flow coefficient is the volume of water (in US gallons) at $15.6^{\circ}C$ that will flow per minute through a valve with a pressure drop of 1 psi across the valve. It is therefore an important factor in determining the size of the valve. From the analysis results, the fluid flux of water and the flow coefficient of the valve were extracted. Based on the numerical results, a prototype of an ultra-fine precision control valve, which can regulate fluid flow in the range of 0 to 0.1 gal/min, was developed. The experimental results were compared with the numerical results using the flow coefficient ($C_v$) graph. The comparison showed that the percentage error in the flow coefficient ($C_v$) between the numerical and experimental results was very low, which is acceptable and supports the validity of the proposed prototype model. In addition, it is possible to predict the flow coefficient using only numerical analysis.
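The definition above corresponds to the commonly used liquid sizing relation $C_v = Q\sqrt{SG/\Delta P}$, with $Q$ in US gal/min, $\Delta P$ in psi, and $SG$ the specific gravity relative to water. A minimal sketch follows, assuming that relation and a simple relative error taken against the experimental value; the abstract gives neither formula, and the sample numbers are hypothetical rather than the study's measurements.

```python
import math


def flow_coefficient(q_gpm, dp_psi, specific_gravity=1.0):
    """Liquid flow coefficient Cv = Q * sqrt(SG / dP).

    q_gpm: volumetric flow rate in US gallons per minute.
    dp_psi: pressure drop across the valve in psi (must be positive).
    specific_gravity: fluid density relative to water (1.0 for water).
    """
    if dp_psi <= 0:
        raise ValueError("pressure drop must be positive")
    return q_gpm * math.sqrt(specific_gravity / dp_psi)


def percent_error(numerical, experimental):
    """Relative error of the numerical value against the experimental one, in %."""
    return abs(numerical - experimental) / abs(experimental) * 100.0


# Hypothetical operating point: 0.1 gal/min of water with a 4 psi drop.
cv_experimental = flow_coefficient(0.1, 4.0)
cv_numerical = 0.052  # assumed CFD prediction, for illustration only

print(f"Cv (experimental) = {cv_experimental:.3f}")
print(f"error = {percent_error(cv_numerical, cv_experimental):.1f} %")
```

With a 1 psi drop and water, the function returns the flow rate itself, which is exactly the definition quoted in the abstract.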