• Title/Summary/Keyword: 비정형분석 (unstructured analysis)

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.141-166 / 2018
  • Information technology improves the efficiency of humanities research. It can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, called history mining, reveals relationships of influence between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism from a sample of philosophers and collected their related writings and articles accessible on the internet. The performance of the classification algorithms was measured by recall, precision, F-score, and elapsed time. DNN, Random Forest, and Ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes into account each philosopher's years of activity.
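The abstract above evaluates classifiers by recall, precision, and F-score. As a minimal sketch of how those three measures relate for a two-class labeling task, the following uses made-up labels; the function name `precision_recall_f1` and the sample data are illustrative assumptions, not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier outputs for five documents.
y_true = ["rationalism", "empiricism", "rationalism", "empiricism", "rationalism"]
y_pred = ["rationalism", "rationalism", "rationalism", "empiricism", "empiricism"]

if __name__ == "__main__":
    print(precision_recall_f1(y_true, y_pred, positive="rationalism"))
```

F1 here is the harmonic mean of precision and recall; the paper may have used a differently weighted F-score.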

Online Document Mining Approach to Predicting Crowdfunding Success (온라인 문서 마이닝 접근법을 활용한 크라우드펀딩의 성공여부 예측 방법)

  • Nam, Suhyeon;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.45-66 / 2018
  • Crowdfunding has become more popular than angel funding for fundraising by venture companies. Identification of success factors may be useful for fundraisers and investors to make decisions related to crowdfunding projects and predict a priori whether they will be successful or not. Recent studies have suggested several numeric factors, such as project goals and the number of associated SNS, studying how these affect the success of crowdfunding campaigns. However, prediction of the success of crowdfunding campaigns via non-numeric and unstructured data is not yet possible, especially through analysis of structural characteristics of documents introducing projects in need of funding. Analysis of these documents is promising because they are open and inexpensive to obtain. We propose a novel method to predict the success of a crowdfunding project based on the introductory text. To test the performance of the proposed method, in our study, texts related to 1,980 actual crowdfunding projects were collected and empirically analyzed. From the text data set, the following details about the projects were collected: category, number of replies, funding goal, fundraising method, reward, number of SNS followers, number of images and videos, and miscellaneous numeric data. These factors were identified as significant input features to be used in classification algorithms. The results suggest that the proposed method outperforms other recently proposed, non-text-based methods in terms of accuracy, F-score, and elapsed time.

State-of-the-art Node of Freeform Structure (프리폼 구조의 노드 기술 현황 분석)

  • Lee, Kyoung Ju;Oh, Jin Tak;Kim, Sang Dae;Ju, Young Kyu
    • 한국방재학회:학술대회논문집 / 2011.02a / pp.153-153 / 2011
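  • Contemporary architecture is breaking away from box-shaped buildings, which were functional and rational but uniform, and is experimenting with diverse changes in form and space. To realize distinctive buildings, each country's technical capability has advanced rapidly, and interest in ever more unique buildings is leading to growing interest in irregularly shaped buildings. The free-form structure is a structural system suited to such buildings. Double-layered structures were widely used for free-form structures, but single-layered structures are becoming more common as designers pursue transparent, glazed buildings with geometric shapes. Single-layered structures can be classified as axial-force-dominant or moment-dominant, and the node is the most vulnerable and most critical component of a free-form structure. This study analyzes domestic and international technology for nodes of moment-dominant single-layered structures, the free-form type attracting the most interest, and suggests directions for future research. In a double-layered structure, several members must be joined to a single node in three dimensions, so the node is more complex than in other framing systems; meanwhile, single-layered structures are gaining popularity as architectural demand grows for transparent glazed exteriors and for twisted and curved structures. In step with this trend, various node systems are being developed to meet the structural and geometric requirements of buildings, and nodes can be classified as axial-force-dominant or moment-dominant according to the characteristics of the applied load. The representative axial-force-dominant system is the diagrid. Nodes in axial-force-dominant free-form structures must receive and transfer the load of the entire structure as axial force, so they are large, and because the braces span two to four stories, they are heavy. Free-form structures with moment-dominant nodes are mostly roof structures designed to carry only the load of the roof itself. The nodes and the members attached to them are therefore light enough to be lifted by hand, and the small node size gives good constructability and allows mass production. Node forms vary with the flow of forces and the intended use. Nodes of moment-dominant single-layered structures can be classified by joining method into two types: splice node connections and end-face node connections. A splice node connection joins the node and the structural member with a splice piece placed along the longitudinal axis of each member; depending on the connection type, one or two contact surfaces capable of transferring shear force are formed. The splice pieces can be assembled with bolts loaded in shear or joined by welding. Representative nodes include SBP-1, SBP-2, and POLO-1. In an end-face node connection, the contact surface between the end of each connected member and the node is perpendicular to the longitudinal axis, and the joint can be made with bolts loaded in tension or by welding. Representative nodes include SBP-4, WABI-1, MERO-1 (Cylinder), MERO-2 (Block), and MERO-4 (Double Dish). Through this state-of-the-art review, we classified the nodes developed to date and compared and analyzed single-layer moment-dominant nodes, which attract the most interest. Node development, which is essential for realizing free-form structures that reflect recent building trends, is being actively researched abroad, but the technology is not publicly available. In Korea, there are signs of progress, such as a new node applied at Dongdaemun Design Plaza and a moment-dominant node developed at Korea University, but the field is still at an early stage compared with cases abroad. Accordingly, node details should be developed that avoid field welding in favor of shop fabrication and on-site assembly, and that improve structural efficiency and effectively accommodate diverse requirements instead of requiring a different node for each project.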

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data is diverse in format, vast in volume, and generated very quickly, so it requires new management and analysis methods instead of traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language in online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps in understanding which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and analyze which subjects attract interest for a given keyword and which topics are related to which core values.

A Study on Technology Trend of Power Semiconductor Packaging using Topic model (토픽모델을 이용한 전력반도체 패키징 기술 동향 연구)

  • Park, Keunseo;Choi, Gyunghyun
    • Journal of the Microelectronics and Packaging Society / v.27 no.2 / pp.53-58 / 2020
  • This study analyzes power semiconductor packaging technology for electric vehicles. Valid patents were collected, and topic modeling was performed on them using LDA. The patents were classified into 20 topics, and the technology behind each topic was defined from the words extracted for it. To analyze the trend of each topic, hot and cold topics were derived through regression analysis of yearly frequency. Package structure technology according to withstand voltage, input/output-related control technology, and heat dissipation technology were derived as hot topics, and inductance reduction technology was derived as a cold topic.
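The hot/cold topic step described above rests on regressing each topic's yearly frequency on the year. A minimal sketch follows, assuming that a positive least-squares slope marks a hot topic and a negative slope a cold one; the abstract does not state the exact criterion (for example, a significance test), and the function names, threshold, and yearly counts below are hypothetical.

```python
def slope(years, counts):
    """Ordinary least-squares slope of counts regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx


def label_topic(years, counts, threshold=0.0):
    """Label a topic by the sign of its trend (assumed criterion)."""
    b = slope(years, counts)
    if b > threshold:
        return "hot"
    if b < -threshold:
        return "cold"
    return "neutral"


# Hypothetical yearly patent counts per topic.
years = [2015, 2016, 2017, 2018, 2019]
topic_counts = {
    "heat dissipation": [2, 4, 6, 8, 10],
    "inductance reduction": [9, 7, 5, 3, 1],
}

if __name__ == "__main__":
    for name, counts in topic_counts.items():
        print(name, label_topic(years, counts))
```

In practice the slope would usually be paired with a p-value so that only statistically significant trends are labeled.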

For airline preferences of consumers Big Data Convergence Based Marketing Strategy (소비자의 항공사 선호도에 대한 빅데이터 융합 기반 마케팅 전략)

  • Chun, Yong-Ho;Lee, Seung-Joon;Park, Su-Hyeon
    • Journal of Industrial Convergence / v.17 no.3 / pp.17-22 / 2019
  • As the value of big data gains recognition, governments, public institutions, and private businesses can improve decision making by effectively adopting and further developing Java and R programs that can analyze vast amounts of existing and unstructured data. In this study, news data were collected and analyzed with text mining techniques in order to establish marketing strategies based on consumers' airline preferences. The research is meaningful in that it analyzes consumers' airline preferences using advanced big data techniques on data that were difficult to obtain in the past and establishes marketing strategies based on the results.

A Study on the Analysis of Accident Types in Public and Private Construction Using Web Scraping and Text Mining (웹 스크래핑과 텍스트마이닝을 이용한 공공 및 민간공사의 사고유형 분석)

  • Yoon, Younggeun;Oh, Taekeun
    • The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.729-734 / 2022
  • Various studies using accident cases are being conducted to identify the causes of accidents in the construction industry, but few examine the differences between public and private construction. In this study, web scraping and text mining were applied to analyze the causes of accidents by order type. Statistical analysis and word cloud analysis of more than 10,000 collected structured and unstructured data items confirmed that the types and causes of accidents differ between public and private construction. In addition, by identifying the correlations between major accident causes, the study can contribute to the establishment of future safety management measures.

Analytical Research to Identify Issues Using Online Media Related to Festivals (축제 관련 온라인 매체를 활용한 이슈 도출 분석연구)

  • Lee, Jeongwon;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.493-495 / 2021
  • Local festivals, an intangible tourism resource, contribute to the development of the local tourism industry by developing specialized products and tourism products for the region. Because interest in regional festivals is very high, much attention is paid to data analysis of what issues arise and what improvements are needed after a festival. This study targets festivals in the Danyang-gun area, which attract many visitors every year, and collects and analyzes unstructured data from online media, which avoids the difficulty of collecting commercial data, in order to visually identify issues and their positive or negative associations.

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when the individual is exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
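The central idea above is that a word that is sentiment-neutral in a general lexicon can carry polarity in the context of a specific SWB factor. The sketch below illustrates that idea only: the lexicon entries, scores, and the simple averaging rule are hypothetical assumptions, and the paper's actual reasoning rules and SentiWordNet scores are not reproduced here.

```python
# Hypothetical general-purpose lexicon with polarities in [-1.0, 1.0].
GENERAL_LEXICON = {"good": 0.6, "bad": -0.6}

# Hypothetical context-specific polarities, keyed by SWB factor.
# These words are absent from (neutral in) the general lexicon.
CONTEXT_LEXICON = {
    "stress": {"deadline": -0.5, "overtime": -0.4},
}


def polarity(tokens, factor=None):
    """Average polarity of known tokens, optionally using a factor's context words."""
    lexicon = dict(GENERAL_LEXICON)
    if factor is not None:
        # Context entries extend or override the general lexicon.
        lexicon.update(CONTEXT_LEXICON.get(factor, {}))
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


tokens = ["deadline", "overtime", "again"]

if __name__ == "__main__":
    print(polarity(tokens))                   # general lexicon only
    print(polarity(tokens, factor="stress"))  # with contextual polarity
```

With the general lexicon alone the sample sentence has no scored words, which mirrors the abstract's point that small lexicons cause many sentences to be ignored.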

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict repurchase based on the discovered preferences. There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope to compare the accuracy of the traditional purchase-based prediction model with that of the new model presented in the second module. This procedure extracts the purchase history of users. 
The core part of our methodology is the second module. This module extracts users' interests by analyzing the news articles the users have read. It constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. The module then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these two steps, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data of 5,000 panelists from a company that specializes in analyzing the rankings of internet sites. We first extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main contents of the news articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among the 2,615 users, 359 target users had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. In the performance evaluation, our prediction model using both users' interests and purchase history outperformed a prediction model using only users' purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. 
Similarly, our model outperformed the traditional one in beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used although the improvement is very small.
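The second module described above combines an article-user correspondence matrix with a topic-article correspondence matrix to obtain a user-topic matrix. One plausible way to merge them, shown here as a sketch, is a matrix product; the abstract does not specify the merging operation, and the matrices and the helper name `matmul` below are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Rows: users, columns: articles (1 = the user read the article).
user_article = [
    [1, 1, 0],
    [0, 0, 1],
]

# Rows: articles, columns: topics (topic weights from topic modeling).
article_topic = [
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
]

# Rows: users, columns: topics (accumulated interest per topic).
user_topic = matmul(user_article, article_topic)

if __name__ == "__main__":
    for row in user_topic:
        print(row)
```

Each user's row sums the topic weights of the articles that user read; normalizing rows so they sum to one would make heavy and light readers comparable.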