• Title/Summary/Keyword: Text data

Performance analysis of volleyball games using the social network and text mining techniques (사회네트워크분석과 텍스트마이닝을 이용한 배구 경기력 분석)

  • Kang, Byounguk;Huh, Mankyu;Choi, Seungbae
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.619-630 / 2015
  • This study aims to provide basic information for developing a team's future game strategy by identifying the attack and pass patterns of national men's professional volleyball teams and by extracting core keywords related to volleyball game performance, using 'social network analysis' and 'text mining'. Social network analysis of the whole data set partitioned the players into group '0' (6 players) and group '1' (11 players). Judging by the degree centrality and betweenness centrality results, group '1' showed more active game performance than group '0'. The game outcomes (win and loss) obtained by text mining differed significantly between the two groups ('0' and '1') obtained by social network analysis (p-value: 0.001). In the clustering of each network, group '0' tended to score points through set players D and E. In group '1', player K tended to fail when attacking through a 'dig', whereas players C and D performed well through 'set' play.
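Degree centrality, one of the two measures named in the abstract above, can be illustrated with a minimal sketch. The pass records and the `degree_centrality` helper are hypothetical and are not taken from the study; the network is treated as undirected for simplicity, which the study may not have done.

```python
from collections import defaultdict

# Hypothetical pass records (passer, receiver); not data from the study.
passes = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E"), ("E", "A")]

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        # A single node has no possible neighbors, so its centrality is zero.
        return {node: 0.0 for node in neighbors}
    # Divide each degree by the maximum possible degree, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

centrality = degree_centrality(passes)
```

In this toy network player D is linked to every other player and therefore has the highest centrality, which is the kind of signal the abstract uses to compare how active the groups are.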

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people have recently posted about personal interests on social media. The development of the Internet and computer technology has enabled documents to be stored in digital form, which has caused an explosion in the amount of textual data generated; consequently, there is increased demand for technology that creates valuable information from a large number of documents. Text mining techniques are often used because text-based data is mostly unstructured and therefore not suitable for the direct application of statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook data of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing and a term-document matrix was generated. This term dictionary was then used to calculate the annual frequency of each term and to observe the change in relative frequency for frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. In this way we analyzed trends in the Meteorological Yearbook of the KMA, covering weather-related news, weather status, and the work the KMA focused on. This study aims to provide useful information that can help analyze and improve meteorological services and reflect meteorological policy.
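The pipeline described above (annual term frequencies, relative frequencies, then a regression to flag rising and falling terms) can be sketched as follows. The documents, the `relative_frequencies` and `slope` helpers, and the use of a plain least-squares slope are all assumptions for illustration; the abstract does not specify the regression model.

```python
from collections import Counter

# Hypothetical yearly documents, already tokenized; not the KMA yearbook text.
docs_by_year = {
    2010: ["typhoon", "forecast", "rain", "forecast"],
    2011: ["forecast", "climate", "rain", "climate"],
    2012: ["climate", "climate", "forecast", "climate"],
}

def relative_frequencies(docs):
    """Return the sorted years and {term: [relative frequency per year]}."""
    years = sorted(docs)
    counts = {y: Counter(docs[y]) for y in years}
    vocab = sorted(set().union(*(counts[y] for y in years)))
    return years, {t: [counts[y][t] / len(docs[y]) for y in years] for t in vocab}

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

years, freqs = relative_frequencies(docs_by_year)
# A positive slope marks a term whose share of the text grows over the years.
trends = {term: slope(years, series) for term, series in freqs.items()}
```

Regressing on relative rather than raw frequency keeps a longer yearbook from making every term look as if it were trending upward.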

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real time, despite being very accurate. Recently, reasoning about the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-reported wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when the individual is exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
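The idea of contextual polarity, where a word that is neutral in a general dictionary carries sentiment for a specific SWB factor, can be illustrated with a toy sketch. Both lexicons, their scores, and the `polarity` helper are hypothetical; the scores are not SentiWordNet values, and the paper's actual reasoning rules are not reproduced here.

```python
# Hypothetical scores on the -1.0 to 1.0 scale; not taken from SentiWordNet.
general_lexicon = {"happy": 0.8, "tired": -0.4, "deadline": 0.0, "overtime": 0.0}

# Hypothetical context-specific overrides for one SWB factor ("stress").
stress_lexicon = {"deadline": -0.6, "overtime": -0.5}

def polarity(tokens, general, contextual=None):
    """Mean polarity of tokens found in the lexicons; contextual entries take precedence."""
    contextual = contextual or {}
    scores = [contextual.get(t, general.get(t)) for t in tokens]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

tokens = "tired after overtime before the deadline".split()
baseline_score = polarity(tokens, general_lexicon)
stress_score = polarity(tokens, general_lexicon, stress_lexicon)
```

With the general lexicon alone the sentence reads as only mildly negative, because 'overtime' and 'deadline' score zero; with the stress-specific overrides the same sentence scores clearly negative.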

Text-dependent Speaker Verification System Over Telephone Lines (전화망을 위한 어구 종속 화자 확인 시스템)

  • 김유진;정재호
    • Proceedings of the IEEK Conference / 1999.11a / pp.663-667 / 1999
  • In this paper, we review the conventional speaker verification algorithm and present a text-dependent speaker verification system for application over telephone lines, together with experimental results. To train the speaker model effectively with limited enrollment data, we apply a blind-segmentation algorithm, which segments speech into sub-word units without linguistic information. A world model created from the PBW DB is used for score normalization. Experiments on the implemented system, using a database constructed to simulate a field test, show an equal error rate (EER) of 3.3%.
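The EER reported above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from verification scores follows; the scores and the `equal_error_rate` helper are hypothetical, and the threshold sweep is a simple approximation rather than the procedure used in the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds and return the point where FAR and FRR are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical scores (higher means "more likely the claimed speaker").
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.7]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

For these toy scores one genuine trial and one impostor trial fall on the wrong side of the best threshold, so both error rates are 1 in 5.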

Secondary Metabolites Produced by Penicillium sp. JVF17 Isolated from Vitex rotundifolia (순비기나무(Vitex rotundifolia)로부터 유래한 Penicillium sp. JVF17가 생산하는 이차대사산물)

  • Bang, Sunghee;Shim, Sang Hee
    • Korean Journal of Pharmacognosy / v.50 no.2 / pp.81-85 / 2019
  • An endophytic fungus, Penicillium sp. JVF17, was isolated from a leaf of Vitex rotundifolia in a coastal area of Jeju Island. Chemical investigation of this fungal strain resulted in the isolation of four compounds: piceol (1), cyclo(L-Pro-L-Val) (2), isochromophilone VI (3), and dicitrinin A (4). Their chemical structures were elucidated by comparing their spectral data, such as NMR and ESIMS, with reported literature values.

Analysis of Social Network Service Data to Estimate Tourist Interests in Green Tour Activities

  • Rah, HyungChul;Park, Sungho;Kim, Miok;Cho, Youngbeen;Yoo, Kwan-Hee
    • International Journal of Contents / v.14 no.3 / pp.27-31 / 2018
  • Social network service (SNS) data related to green tourism were used to estimate preferred tour sites and users' interests. Keywords related to green tour activities were employed to search the SNS data. SNS data were collected from Korean blogs such as Naver and Daum from June 1st to August 31st between 2015 and 2017 using a text-mining solution. During the study period, 705 posts were analyzed. Associated words that frequently co-occurred with the keywords were classified into different categories depending on their nature. Associated words included swimming pools and camping sites (location); experience and swimming pools (attribute); and water play and culture (culture/leisure). Our data suggest that SNS users with experience of green tourism in Korea were interested in green tourism with swimming pools, camping sites, experience, water play and/or culture rather than in particular popular sites. Based on these findings, it is recommended that preferred facilities such as swimming pools be provided at green tourism sites to meet users' needs and to facilitate green tourism.
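Finding associated words amounts to counting which terms co-occur with a search keyword. The sketch below counts co-occurrence at the post level; the posts, the keyword, and the `associated_words` helper are hypothetical, and the text-mining solution used in the study may define co-occurrence differently.

```python
from collections import Counter

# Hypothetical tokenized blog posts and search keyword; not the study's data.
posts = [
    ["green", "tour", "swimming", "pool", "camping"],
    ["green", "tour", "water", "play", "swimming"],
    ["city", "tour", "museum"],
]
keyword = "green"

def associated_words(documents, keyword):
    """Count, for each word, the number of posts in which it appears with the keyword."""
    counts = Counter()
    for doc in documents:
        tokens = set(doc)  # count each word at most once per post
        if keyword in tokens:
            counts.update(tokens - {keyword})
    return counts

cooccurrence = associated_words(posts, keyword)
```

Calling `cooccurrence.most_common()` then gives the candidate associated words in descending order, ready to be sorted into categories such as location or attribute.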

Effects of the Characteristics of Anomalous Data on Cognitive Conflict and Conceptual Change (변칙 사례의 특성이 인지 갈등과 개념 변화에 미치는 영향)

  • Gang, Seok Jin;Kim, Sun Ju;No, Tae Hui
    • Journal of the Korean Chemical Society / v.45 no.6 / pp.589-594 / 2001
  • In this study, the effects of the number and the presentational type of anomalous data on students' cognitive conflict and conceptual change in studying 'conservation of mass before and after combustion' were investigated. The subjects were 128 eighth graders in a co-ed middle school. A preconception test, a test of response to anomalous data, and a conception test were administered. Four types of anomalous data varying in number (one/two) and presentational type (text/text+figure) were presented. The results indicated that students presented with two anomalous data showed more cognitive conflict than those presented with one. However, no significant differences in the degree of cognitive conflict were found between the presentational types of anomalous data. The ANOVA results indicated that the characteristics of the anomalous data made no significant difference to the conception test scores.

A Comparative Study on Data Input Design of E-business Websites (E-business 웹사이트에서의 데이터 입력디자인에 관한 비교 연구)

  • 정홍인
    • Archives of design research / v.17 no.1 / pp.127-134 / 2004
  • The purpose of this study was to compare data input interfaces used in e-business applications on the web and to find optimal input design characteristics. Basic data entry tools such as a pull-down menu, list, text input box, and radio button were examined by inputting data into a simulated hotel room reservation web site. Experimental results indicated that the text input box was the most efficient for experts or experienced operators when there were more than four menu items, and that the pull-down menu was considered the most satisfactory, simplest, and easiest to use for novices or inexperienced users. A simple list was determined to be the best for the input of binary data in terms of user satisfaction, simplicity, and flexibility, but the radio button was rated best for ease of use. The design guidelines of this study can be applied to build usable interactive web sites and increase economic efficiency.

A Study on FIFA Partner Adidas of 2022 Qatar World Cup Using Big Data Analysis

  • Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.1 / pp.164-170 / 2023
  • The purpose of this study is to analyze big data on the Adidas brand, which participated in the 2022 Qatar World Cup as a FIFA partner, in order to extract useful information, semantic connections, and context from unstructured data. To this end, the study collected big data about Adidas generated during the World Cup from major portal sites. According to the text mining analysis, 'Adidas' was the most frequent keyword, appearing 3,340 times, followed by 'World Cup', 'Qatar World Cup', 'Soccer', 'Lionel Messi', 'Qatar', 'FIFA', 'Korea', and 'Uniform'. In addition, the TF-IDF ranking was 'Qatar World Cup', 'Soccer', 'Lionel Messi', 'World Cup', 'Uniform', 'Qatar', 'FIFA', 'Ronaldo', 'Korea', and 'Nike'. Semantic network analysis and CONCOR analysis yielded four groups. First, Cluster A was named 'Qatar World Cup Sponsor' because it grouped words such as 'Adidas', 'Nike', 'Qatar World Cup', 'Sponsor', 'Sponsor Company', 'Marketing', 'Nation', 'Launch', 'Official', 'Commemoration' and 'National Team'. Second, Cluster B was named 'Group stage' because it grouped words such as 'Qatar', 'Uruguay', 'FIFA' and 'group stage'. Third, Cluster C was named 'Winning' because it grouped words such as 'World Cup Winning', 'Champion', 'France', 'Argentina', 'Lionel Messi', 'Advertising' and 'Photograph'. Fourth, Cluster D was named 'Official Ball' because it grouped words such as 'Official Ball', 'World Cup Official Ball', 'Soccer Ball', 'All Times', 'Al Rihla', 'Public' and 'Technology'.
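The abstract above reports both a raw frequency ranking and a TF-IDF ranking, and the two differ. A minimal sketch of one common TF-IDF variant shows why a term that appears in every document can top the frequency list yet score low on TF-IDF. The posts, the `tf_idf` helper, and the specific weighting (total term frequency times log of N over document frequency) are assumptions; the abstract does not state which variant or tool was used.

```python
import math
from collections import Counter

# Hypothetical tokenized posts; not the data collected in the study.
posts = [
    ["adidas", "world", "cup", "sponsor"],
    ["adidas", "official", "ball"],
    ["adidas", "messi", "world", "cup"],
]

def tf_idf(documents):
    """Corpus-level TF-IDF: total term frequency times log(N / document frequency)."""
    n_docs = len(documents)
    tf = Counter(token for doc in documents for token in doc)
    df = Counter(token for doc in documents for token in set(doc))
    return {term: tf[term] * math.log(n_docs / df[term]) for term in tf}

scores = tf_idf(posts)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus 'adidas' occurs in every post, so its inverse document frequency is zero and it drops to the bottom of the TF-IDF ranking despite being the most frequent word.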

A Big Data Analysis of Yumentingzheng: Weiwenqiju as an Example (어문청정 빅데이터 분석: 위문기거 일례)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.624-626 / 2021
  • Yumentingzheng, which records the content of discussions between the Qing dynasty ruler and his subjects, is an important document comparable to the Annals of Joseon in Korea. This paper describes the method and steps for big data analysis of Yumentingzheng, which is written in the Manchu alphabet. Big data analysis of documents written in Manchu characters raises many problems that need to be solved in advance, and research on these must come first. As a preliminary study for research to be conducted in the future, this paper proposes a method of big data analysis using the R language, applied at the stage where the text written in Manchu characters has been transliterated into Latin characters. In the proposed method, the Apkai method was adopted for the transliteration of Yumentingzheng, and the results of big data analysis were presented using the text of Weiwenqiju.
