• Title/Summary/Keyword: Data Collecting

Research Trends of Health Recommender Systems (HRS): Applying Citation Network Analysis and GraphSAGE (건강추천시스템(HRS) 연구 동향: 인용네트워크 분석과 GraphSAGE를 활용하여)

  • Haryeom Jang;Jeesoo You;Sung-Byung Yang
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.57-84 / 2023
  • With the development of information and communications technology (ICT) and big data technology, anyone can easily obtain and utilize vast amounts of data through the Internet. Therefore, the capability of selecting high-quality data from a large amount of information is becoming more important than the capability of merely collecting it. This trend extends to academia; literature reviews, both systematic and non-systematic, have been conducted in various research fields to construct a healthy knowledge structure by selecting high-quality research from accumulated research materials. Meanwhile, after the COVID-19 pandemic, remote healthcare services, which had not previously been agreed upon, are allowed to a limited extent, and new healthcare services such as health recommender systems (HRS) equipped with artificial intelligence (AI) and big data technologies are in the spotlight. Although, in practice, HRS are considered one of the most important technologies to lead the future healthcare industry, literature reviews on HRS are relatively rare compared to other fields. In addition, although HRS research is a convergence field with a strong interdisciplinary nature, prior literature review studies have mainly applied either systematic or non-systematic review methods; hence, they are limited in analyzing interactions or dynamic relationships with other research fields. Therefore, in this study, the overall network structure of HRS and surrounding research fields was identified using citation network analysis (CNA). Additionally, in this process, the GraphSAGE algorithm was applied to address the problem that the latest papers are underestimated in their citation relationships. As a result, this study identified 'recommender system', 'wireless & IoT', 'computer vision', and 'text mining' as increasingly important research fields related to HRS research, and confirmed that 'personalization' and 'privacy' are emerging issues in HRS research. 
The study findings would provide both academic and practical insights into identifying the structure of the HRS research community, examining related research trends, and designing future HRS research directions.
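
The abstract above does not describe how GraphSAGE was configured, so the following is only a minimal sketch of the general idea behind it: a node's representation is built from its own features plus an aggregate of its neighbors' features, so a recent paper with few incoming citations can still be represented through the papers it is linked to. The toy graph, feature vectors, weight matrix, and function names are all hypothetical and are not taken from the study.

```python
# Minimal GraphSAGE-style mean aggregation (illustrative only; the toy data
# and weights below are hypothetical, not taken from the study).

def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(features, neighbors, weights):
    """One layer: h_v = ReLU(W . concat(x_v, mean of neighbor features))."""
    dim = len(next(iter(features.values())))
    out = {}
    for node, x in features.items():
        agg = mean_vector([features[n] for n in neighbors.get(node, [])], dim)
        concat = x + agg  # list concatenation: own features, then neighbor mean
        out[node] = [max(0.0, sum(w * c for w, c in zip(row, concat)))
                     for row in weights]
    return out


# Toy citation graph: "new" is a recent paper with no incoming citations yet,
# but it is linked to two established papers through its own reference list.
features = {"old_a": [1.0, 0.0], "old_b": [0.0, 1.0], "new": [0.5, 0.5]}
neighbors = {"new": ["old_a", "old_b"], "old_a": ["old_b"], "old_b": ["old_a"]}
weights = [[1.0, 0.0, 1.0, 0.0],  # hypothetical 2x4 weight matrix
           [0.0, 1.0, 0.0, 1.0]]

embeddings = sage_layer(features, neighbors, weights)
print(embeddings["new"])
```

In a real setting the weights would be learned and neighbors would usually be sampled rather than enumerated in full; both are omitted here to keep the sketch short.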

Anura Call Monitoring Data Collection and Quality Management through Citizen Participation (시민참여형 무미목 양서류 음성신호 수집 및 품질관리 방안)

  • Kyeong-Tae Kim;Hyun-Jung Lee;Won-Kyong Song
    • Korean Journal of Environment and Ecology / v.38 no.3 / pp.230-245 / 2024
  • Amphibians, sensitive to external environmental changes, serve as bioindicator species for assessing alterations or disturbances in local ecosystems. It is known that one-third of amphibian species within the order Anura are at risk of extinction due to anthropogenic threats such as habitat destruction and fragmentation caused by urbanization. To develop effective protection and conservation strategies for anuran amphibians, species surveys that account for population characteristics are essential. This study aimed to investigate the potential for citizen participation in ecological monitoring using the mating calls of anuran species. We also proposed suitable quality control measures to mitigate errors and biases, ensuring the extraction of reliable species occurrence data. The citizen science project was carried out nationwide from April 1 to August 31, 2022, targeting 12 species of anuran amphibians in Korea. Citizens voluntarily participated in call monitoring, in which they listened to the mating calls of anuran species and recorded them using a mobile application. Additionally, we established a quality control process to extract reliable species occurrence data, categorizing errors and biases in citizen-collected data into three categories: omission, commission, and incorrect identification. A total of 6,808 observations were collected during the citizen participation in anuran vocalization monitoring. Through the quality control process, errors and biases were identified in 1,944 (28.55%) of the 6,808 observations. The most common type of error was omission, accounting for 922 cases (47.43%), followed by incorrect identification with 540 cases (27.78%), and commission with 482 cases (24.79%). During the citizen science project, we successfully recorded the mating calls of 10 of the 12 anuran amphibian species in Korea; the two exceptions were the Asian toad (Bufo gargarizans Cantor) and the Korean brown frog (Rana coreana). 
The difficulty in collecting these species' mating calls was primarily attributed to low detectability caused by population decline, or to a mismatch between the breeding season of the undetected species and the timing of the citizen science project. This study represents the first investigation in Korea of distribution status and species occurrence data collected through the mating calls of anuran species on the basis of citizen participation. It can serve as a foundation for designing future bioacoustic monitoring that incorporates citizen science and quality control measures for citizen science data.
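
The error shares reported above can be re-derived from the stated counts. The short check below uses only the figures given in the abstract; the variable names are illustrative.

```python
# Re-derive the quality-control percentages from the counts in the abstract.
total_observations = 6808
errors = {"omission": 922, "incorrect identification": 540, "commission": 482}

flagged = sum(errors.values())  # observations with any error or bias
print(f"flagged: {flagged} ({flagged / total_observations:.2%})")
for kind, count in errors.items():
    # Shares are relative to the flagged observations, not to all observations.
    print(f"{kind}: {count} ({count / flagged:.2%})")
```

Note that the three error shares are computed against the 1,944 flagged observations, which is why they sum to 100%.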

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report our observations on an accident prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. In spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of the important duties of officers is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper concerns the application of data mining to identify soldiers' interests and predict accidents using internal and external (SNS) data. We propose a combination of topic analysis and the decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accidents. 
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collecting, topic analysis, and converting the topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After gathering unstructured SNS data, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying the topics, we add these results to the independent variables, which are composed of 15 internal data sets. Then, we build two decision trees. The first tree is built from the internal data only. The second tree is built from the external data (SNS) as well as the internal data. After that, we compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, a gap of 4.3 percentage points. The second model thus predicts accidents with an accuracy of approximately 92%. Finally, we use the McNemar test to check whether the difference between the two models is meaningful. The difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Additionally, various independent variables in the decision tree model are used as categorical variables instead of continuous variables, which causes a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. 
Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.
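
The model comparison above rests on two pieces of arithmetic: the accuracy implied by a misclassification rate, and a McNemar test on the cases where the two models disagree. The sketch below illustrates both. The misclassification rates come from the abstract; the discordant counts `b` and `c` are hypothetical, since the abstract does not report them, so the resulting p-value is not the study's 0.0003. The function name and the choice to omit the continuity correction are also assumptions.

```python
import math

# Rates reported in the abstract.
misclass_internal = 0.121   # tree built from internal data only
misclass_combined = 0.078   # tree built from internal + SNS data

accuracy_combined = 1 - misclass_combined        # about 0.922
gap = misclass_internal - misclass_combined      # about 0.043


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and p-value, 1 d.f.

    b: cases the first model classified correctly and the second did not
    c: cases the second model classified correctly and the first did not
    """
    if b + c == 0:
        raise ValueError("no discordant pairs to compare")
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts, for illustration only.
statistic, p_value = mcnemar(b=15, c=40)
print(round(accuracy_combined, 3), round(gap, 3))
print(statistic, p_value)
```

Only the disagreeing cases enter the test, which is why it suits a comparison of two classifiers evaluated on the same soldiers.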

The Design of Smart-phone Application Design for Intelligent Personalized Service in Exhibition Space (전시 공간에서 지능형 개인화 서비스를 위한 스마트 폰 어플리케이션 설계)

  • Cho, Young-Hee;Choi, Ae-Kwon
    • Journal of Intelligence and Information Systems / v.17 no.2 / pp.109-117 / 2011
  • The exhibition industry, a technology-intensive and eco-friendly industry, contributes to regional and national development and enhances regional and national image when it is combined with the cultural and tourism industries. We therefore need to revitalize the exhibition industry by actively holding exhibition events. However, to attract large audiences, improving the quality of service within the exhibition hall, and thereby raising audience satisfaction and the perceived value of participation, should be prioritized. One way to enhance service quality is to provide personalized service geared toward each visitor. That is, if a service helps visitors avoid the complexity of the exhibition space and manage their time and space effectively, their satisfaction will improve. Such a personalized service first identifies each visitor's preferences from a profile registered online in advance. Based on this information, it provides exhibition-related information suited to the visitor's purpose, such as booths of interest, the shortest path to those booths, and events, via the visitor's smartphone. It then collects the visitor's reaction information, such as booth visits and event participation prompted by the provided information, together with location information such as the flow of movement and the current position, and uses these to revise the visitor's existing profile. After the profile is corrected, the individual's preferences are extracted again, and recommended booth and event information is provided accordingly. In other words, the service provides optimal information for each individual by amending the recommendations built on the basic profile with the reactions to them. It thus provides a personalized service that is dynamic and interactive with the audience. 
This paper designs a smartphone application that provides the most suitable information for each visitor through a circular, interactive structure, and that supports a dynamically updated, interactive personalized service able to supply surrounding information in real time by locating the visitor's position through sensing. The proposed application collects the user's context information and has an information gathering function that collects, via sensing, the user's reactions to searched or provided information. It also has a function that provides the data a user needs in the exhibition hall. In other words, it offers location-based recommendations of booths, provides detailed information about the inside of the exhibition using augmented reality, and provides a map of the whole exhibition. It also includes an SNS service that supports information exchange and builds rapport among users. To provide these services, the application consists of several modules. First, it includes a UNS identity module for sensing, and a sensor information gathering module that handles and collects the information perceived through that module. The sensor information gathered in this way is transmitted to the information gathering server. There is also an exhibition information interface module that interacts with the user; in addition to serving as the interface, this module passes the user's reactions to the interest information collection module. The interest information collection module transmits the collected information to the information gathering server, which brings together the sensing information and the interest information. When valid information from this server is sent to the recommendation server, the recommendation server generates recommendations through inference over the gathered valid information. When the recommendation server transmits them to the exhibition information processing module, that module presents them to the user through the interface. 
Through this system, a dynamic, intelligent personalized service is delivered to the user.

Studies on the Organo-mercury Residus in Rice Grain -3. Studies on the histopathological changes of the chief organ in rabbit influenced by PMA administration and the fate of mercury- (수도(水稻)에 처리(處理)된 유기수은제(有機水銀劑)의 잔류성(殘留性)에 관(關)한 연구(硏究) -제3보(第3報) : 가토(家兎)에 있어서 PMA투여(投與)에 의(依)한 주요장기(主要臟器)의 병리조직학적(病理組織學的) 변화(變化) 및 체내(體內)에서의 동태(動態)에 관(關)한 연구(硏究)-)

  • Lee, Dong-Suk
    • Applied Biological Chemistry / v.8 / pp.101-111 / 1967
  • Daily doses of phenylmercuric acetate of $30\,\gamma$ (group I), $3\,\gamma$ (group II), and $0.3\,\gamma$ (group III) were administered to rabbits for 90 days. The chief histopathological changes in the organs and the analytical data on mercury residues in the excretion and liver were as follows. 1. Kidney: In group I, severe vacuolization and cloudy swelling occurred in the epithelial cells of the proximal convoluted tubules, and severe cloudy swelling and coagulative necrosis were observed in the proximal straight tubules. There were many hyaline casts in the collecting tubules. In group II, moderate vacuolization and cloudy swelling were observed in the epithelial cells of the proximal convoluted tubules, and moderate cloudy swelling and coagulative necrosis were encountered in the proximal straight tubules. A small number of hyaline casts were located in the lumen of the collecting tubules. In group III, a slight degree of cloudy swelling was observed in the epithelial cells of the proximal convoluted and straight tubules. 2. Liver: In group I, cloudy swelling, fatty changes, and coagulative necrosis were observed in the central zone of the hepatic lobules. Dissociation of hepatic cell cords was encountered. Hyperplasia of hepatic cells was remarkable in group II. No pathological changes were observed in group III. 3. Spleen: Deposition of hemosiderin pigment was prominent in group I, and a small amount of the pigment was observed in group II. There were no pathological changes in group III. 4. Adrenal, colon and heart: No pathological changes were detected in any of the 3 groups. 5. On average, about 76.5% of the mercury was excreted in group I, 85.4% in group II, and 79.8% in group III. 6. Mercury content in the liver was 0.0348 g in group I, and 0.00378 g and 0.00066 g in group II and group III, respectively. 
7. In general, as the mercury dose increased, the concentration of mercury accumulated in the liver became higher; however, the accumulated quantity relative to the total mercury dose showed the opposite trend. In other words, the quantity of mercury accumulated did not increase in proportion to the dose.

Analysis of Hepatobiliary Disorders from a Nationwide Survey of Discharge Data in Korean Children and Adolescents (전국 퇴원자료조사를 통한 소아청소년 간담도 질환의 분석)

  • Park, Hyun-Ju;Shin, Chang-Gyun;Moon, Jin-Soo;Lee, Chong-Guk
    • Pediatric Gastroenterology, Hepatology & Nutrition / v.12 no.1 / pp.16-22 / 2009
  • Purpose: To update the epidemiologic information on hepatobiliary diseases in pediatric inpatients using cross-sectional survey data from throughout the Republic of Korea. Methods: A nationwide cross-sectional survey was conducted at 85 residency training hospitals in Korea to gather the final diagnosis at discharge. The survey period was from 2004 to 2006. All reported diagnoses were based on the ICD-10 system. In this study, we focused on hepatobiliary diseases. Results: A total of 826,896 cases with discharge data were collected, of which 4,151 (0.5%) hepatobiliary cases were identified; 2,385 cases (57.4%) of hepatobiliary disease were hepatitis, which was the most common hepatobiliary disease. Other diseases included congenital hepatobiliary diseases (524 cases [12.6%]) and biliary diseases (315 cases [7.6%]). The prevalence of hepatobiliary disease differed according to age. Biliary atresia was the most common hepatobiliary disease in the neonatal period, whereas the prevalence of hepatitis increased in adolescents. The total number of hepatobiliary operations was 416. A comparison of the annual data showed no definite difference in the total number of hepatobiliary cases. The average duration of hospital stay appeared to decrease gradually. Conclusion: In this study, we have summarized the recent epidemiology of hepatobiliary disorders in Korean children based on discharge data. Hepatobiliary disorders in pediatric inpatient units consisted of diverse disorders with a low prevalence, so multi-center approaches should be considered to enhance clinical and public health outcomes. To improve this nationwide survey, a new data collecting system should be developed.

Intelligent VOC Analyzing System Using Opinion Mining (오피니언 마이닝을 이용한 지능형 VOC 분석시스템)

  • Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.113-125 / 2013
  • Every company wants to know its customers' requirements and makes an effort to meet them. For this reason, communication between customer and company has become a core source of business competitiveness, and its importance is continuously increasing. There are several strategies for finding customers' needs, but VOC (voice of the customer) is one of the most powerful communication tools, and gathering VOC through channels such as telephone, post, e-mail, and website is highly meaningful. Consequently, almost every company gathers VOC and operates a VOC system. VOC is important not only to business organizations but also to public organizations such as governments, educational institutes, and medical centers, which must raise public service quality and customer satisfaction. Accordingly, they build VOC gathering and analyzing systems and use them to create and upgrade products and services. In recent years, innovations in the Internet and ICT have created diverse channels, such as SNS, mobile, website, and call center, for collecting VOC data. Although a lot of VOC data is collected through these diverse channels, using it properly is still difficult. This is because VOC data consists of highly emotional content, expressed as voice or informal text, and its volume is very large. Such unstructured big data is difficult for humans to store and analyze. Organizations therefore need a system that automatically collects, stores, classifies, and analyzes unstructured big VOC data. This study proposes an intelligent VOC analyzing system based on opinion mining that classifies unstructured VOC data automatically and determines the polarity as well as the type of VOC. The basis of the VOC opinion analyzing system, a domain-oriented sentiment dictionary, is then created, and the corresponding stages are presented in detail. 
The experiment was conducted with 4,300 VOC items collected from a medical website to measure the effectiveness of the proposed system; the items were used to develop the sentiment dictionary by determining domain-specific sentiment vocabulary and polarity values in the medical domain. The experiment showed that positive terms such as "칭찬 (praise), 친절함 (kindness), 감사 (thanks), 무사히 (safely), 잘해 (doing well), 감동 (being moved), 미소 (smile)" have high positive opinion values, and negative terms such as "퉁명 (curt), 뭡니까 ('what is this?'), 말하더군요 ('so they said'), 무시하는 (dismissive)" have strongly negative opinion values. These terms are in general use, and the experimental results suggest that they indicate opinion polarity with high probability. Furthermore, the accuracy of the proposed VOC classification model was evaluated, and the highest classification accuracy, 77.8%, was confirmed at an opinion classification threshold of -0.50. Through the proposed intelligent VOC analyzing system, real-time opinion classification and the response priority of VOC can be predicted. Ultimately, the system is expected to catch customer complaints at an early stage and handle them quickly with fewer staff operating the VOC system, freeing up human resources and time in the customer service department. Above all, this study is a new attempt to automatically analyze unstructured VOC data using opinion mining, and shows that the system's output could be used as a variable to classify the positive or negative polarity of VOC opinions. It is expected to suggest a practical framework for VOC analysis for diverse uses, and the model could serve as a real VOC analyzing system if it is implemented as a system. Despite these results and expectations, this study has several limitations. First of all, the sample data was collected only from a single hospital website. This means that the sentiment dictionary built from the sample data may be biased toward that hospital and website. 
Therefore, future research should cover additional channels such as call centers and SNS, and other domains such as government, financial companies, and educational institutes.
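
As a rough illustration of dictionary-based polarity classification with a threshold, the sketch below scores a tokenized VOC item against a small sentiment dictionary. The terms are among those listed in the abstract, but every polarity value is hypothetical, and the scoring rule (mean polarity of matched terms, labeled negative at or below the threshold) is an assumption; the abstract reports only that a threshold of -0.50 gave the best accuracy, not how the score was computed.

```python
# Hypothetical domain sentiment dictionary: term -> polarity in [-1, 1].
# The terms appear in the abstract; the numeric values are made up.
SENTIMENT = {
    "감사": 0.9,      # thanks
    "친절함": 0.8,    # kindness
    "미소": 0.6,      # smile
    "퉁명": -0.8,     # curt
    "무시하는": -0.9,  # dismissive
}


def opinion_score(tokens):
    """Mean polarity of the tokens found in the dictionary (0.0 if none)."""
    hits = [SENTIMENT[t] for t in tokens if t in SENTIMENT]
    return sum(hits) / len(hits) if hits else 0.0


def classify(tokens, threshold=-0.50):
    """Label an item negative when its score is at or below the threshold."""
    return "negative" if opinion_score(tokens) <= threshold else "positive"


print(classify(["직원", "퉁명", "무시하는"]))  # two strongly negative terms
print(classify(["감사", "미소"]))              # two positive terms
```

Tokenization of Korean text (for example, morphological analysis) is assumed to have happened before these functions are called.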

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have added grammatical factors to the feature sets used for training the classification model. The other approach applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result of the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that Word2Vec extracts about three times as many related words expressing some emotion about the keyword as co-occurrence analysis does. The difference between the two results comes from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found in traditional analysis. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotion words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. 
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets used in the study were collected by searching with three keywords: professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data contain about 100,000 items (Professor: 25,720, Doctor: 35,110, Prosecutor: 43,225), gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows: First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding the Emotion Trigger can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger include Twitter data, which have a number of distinct features that we did not deal with in this study. 
These features will be considered in further study.
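
The contrast drawn above between co-occurrence analysis and vector-based similarity can be sketched on toy data. Co-occurrence only finds words that appear in the same sentence as the keyword, while similarity in a vector space can surface a related word that never appears alongside it. The sentences and the two-dimensional vectors below are hypothetical; real Word2Vec vectors would be learned from a corpus and have far more dimensions, and the function names are illustrative.

```python
import math
from collections import Counter


def cooccurring(sentences, keyword):
    """Count words that share a sentence with the keyword."""
    counts = Counter()
    for tokens in sentences:
        if keyword in tokens:
            counts.update(t for t in set(tokens) if t != keyword)
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, keyword, top_n=1):
    """Rank other words by cosine similarity to the keyword's vector."""
    target = vectors[keyword]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != keyword]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy corpus and word vectors.
sentences = [["professor", "angry", "scandal"], ["doctor", "furious", "scandal"]]
vectors = {
    "professor": [0.9, 0.1],
    "angry": [0.8, 0.3],
    "furious": [0.7, 0.4],
    "scandal": [0.1, 0.9],
}

print(cooccurring(sentences, "angry"))      # only words from the first sentence
print(most_similar(vectors, "angry", 1))    # nearest word in the vector space
```

Here "furious" never shares a sentence with "angry", so co-occurrence misses it, yet it is the nearest neighbor in the toy vector space.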

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, many academic studies have addressed predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have diversified with the rapid growth of online activity, companies are carrying out various types of campaigns on a scale that cannot be compared to the past. However, customers tend to perceive campaigns as spam as fatigue from duplicate exposure increases. From a corporate standpoint, the effectiveness of campaigns is also decreasing: the cost of investing in campaigns is increasing while the actual campaign success rate is low. Accordingly, various studies are ongoing to improve the effectiveness of campaigns in practice. The ultimate purpose of a campaign system is to increase the success rate of various campaigns by collecting and analyzing various customer-related data and using them for campaigns. In particular, there have been recent attempts to predict campaign responses using machine learning. Because campaign data have many varied features, selecting appropriate features is very important. If all of the input data are used when classifying a large amount of data, training takes a long time as the number of classes grows, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may be degraded by overfitting or by correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. 
Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. However, when there are many risks and many features, these techniques suffer from poor classification performance and long training times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method of searching for the feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that have a large influence on performance are derived first, features that have a negative effect are removed, and then the sequential method is applied, which increases search efficiency and enables generalized prediction. It was confirmed that the proposed model showed better search and prediction performance than the traditional greedy algorithm. Campaign success prediction performance was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, when performing campaign success prediction, the improved feature selection algorithm was found to help in analyzing and interpreting the prediction results by providing the importance of the derived features. These include features such as age, customer rating, and sales, which were already known to be statistically important. 
Unexpectedly, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the last 3-month wireless data usage, were also selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.
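
For readers unfamiliar with the greedy baselines named above, the following is a minimal sketch of plain Sequential Forward Selection (SFS): starting from an empty set, it repeatedly adds the single feature that most improves a score and stops when no addition helps. This is the traditional baseline, not the improved algorithm the study proposes. The scoring function and its per-feature gains are hypothetical stand-ins for a real model evaluation such as cross-validated accuracy.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy SFS: add the best single feature until the score stops improving."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    limit = max_features if max_features is not None else len(remaining)
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical additive gains; a real score would train and evaluate a model.
GAIN = {"age": 0.30, "customer_rating": 0.20, "sales": 0.10, "noise": -0.05}


def toy_score(subset):
    return sum(GAIN[f] for f in subset)


selected, best = sequential_forward_selection(GAIN, toy_score)
print(selected, best)
```

SFS never removes a feature once it is added; the floating variant (SFFS) relaxes this by allowing conditional removal steps after each addition.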

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when the individual is exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.