• Title/Summary/Keyword: web crawler


Changes in public recognition of parabens on twitter and the research status of parabens related to toothpaste (트위터(twitter)에서의 파라벤(parabens) 관련 대중의 인식 변화와 치약내 파라벤에 대한 연구 현황)

  • Oh, Hyo-Jung;Jeon, Jae-Gyu
    • Journal of Korean Academy of Oral Health / v.41 no.2 / pp.154-161 / 2017
  • Objectives: The purpose of this study was to investigate changes in public recognition of parabens on Twitter and the research status of parabens related to toothpaste. Methods: Tweet information between 2010 and October 2016 was collected by an automatic web crawler and examined according to tweet frequency, key words (2012-October 2016), and issue tweet detection analyses to reveal changes in public recognition of parabens on Twitter. To investigate the research status of parabens related to toothpaste, queries such as "paraben," "paraben and toxicity," "paraben and (toothpastes or dentifrices)," and "paraben and (toothpastes or dentifrices) and toxicity" were used. Results: The number of tweets concerning parabens sharply increased when parabens in toothpaste emerged as a social issue (October 2014), and decreased from 2015 onward. However, toothpaste and its related terms were continuously included in the core key words extracted from tweets from 2015. They were not included in key words before 2014, indicating that the emergence of parabens in toothpaste as a social issue plays an important role in public recognition of parabens in toothpaste. The issue tweet analysis also confirmed the change in public recognition of parabens in toothpaste. Despite the expansion of public recognition of parabens in toothpaste, there are only seven research articles on the topic in PubMed. Conclusions: The general public clearly recognized parabens in toothpaste after emergence of parabens in toothpaste as a social issue. Nevertheless, the scientific information on parabens in toothpaste is very limited, suggesting that the efforts of dental scientists are required to expand scientific knowledge related to parabens in oral hygiene measures.

A Study for Used Transaction Analysis System using Big Data (빅데이터를 이용한 중고 거래 분석 시스템 연구)

  • Ahn, Byeongtae
    • Journal of Digital Convergence / v.19 no.6 / pp.259-264 / 2021
  • As the number of sites supporting used-goods trading grows, users increasingly want to search a wide range of listings in real time. This shift has enabled a new type of C2C (consumer-to-consumer) transaction in e-commerce. However, because each used-goods trading site has its own characteristics, it is difficult to standardize across them. In this paper, we study a system that collects used-goods transaction data in real time and delivers the desired information to the user quickly. We investigate the crawler system needed to build an integrated used-goods trading system for Internet e-commerce, and use a defined morpheme analyzer to present the information in the web environment the user wants. The resulting design provides the information users want without requiring them to visit the individual used-goods sites.

Mask Wearing Detection System using Deep Learning (딥러닝을 이용한 마스크 착용 여부 검사 시스템)

  • Nam, Chung-hyeon;Nam, Eun-jeong;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.44-49 / 2021
  • Recently, due to COVID-19, many studies have applied neural networks to automatic mask-wearing detection systems. Such systems use either 1-stage or 2-stage detection methods, and when insufficient data are available, pretrained neural network models are adapted with fine-tuning techniques. In this paper, the system uses a 2-stage detection method consisting of an MTCNN model for face recognition and a ResNet model for mask detection. Five ResNet models were tested as the mask detector to improve accuracy and fps in various environments. The training data comprised 17,217 images collected using a web crawler; for inference, we used 1,913 images and two one-minute videos. The experiment showed a high accuracy of 96.39% for images and 92.98% for video, and the inference speed for video was 10.78 fps.

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.1-10 / 2021
  • This study analyzes the use and meaning of words implying consumer personality in the review text of the used-goods trading application Danggeun Market (당근마켓). As of May 2021, reviews from the preceding six months were collected by our web crawler for Seoul and Gyeonggi Province; a total of 1,368 cases were first collected by random sampling, and 570 cases remained after preprocessing. The results are as follows. First, 48.2% of the review texts were related to the personality of consumers, even though the platform is a commercial one for trading products. Second, the review text is mainly positive and forms a text network structure centered on the keyword 'gratitude'. Third, the review text implying consumer character was divided into two groups, 'extrovert personality' and 'introvert personality', and the individuality of the two groups worked together on the platform. In conclusion, we suggest that consumer personality plays an important role in the platform transaction process, that it will play a role in the platform's services in the future, and that it should be studied from various perspectives.

Do Not Just Talk, Show Me in Action: Investigating the Effect of OSSD Activities on Job Change of IT Professional (오픈소스 소프트웨어 개발 플랫폼 활동이 IT 전문직 취업에 미치는 영향)

  • Jang, Moonkyoung;Lee, Saerom;Baek, Hyunmi;Jung, Yoonhyuk
    • The Journal of Society for e-Business Studies / v.26 no.1 / pp.43-65 / 2021
  • With the advancement of information and communications technology, the means of recruiting IT professionals has fundamentally changed. Recruiters now search for candidate information on the Web as well as in traditional sources such as résumés and interviews. In particular, open-source software development (OSSD) platforms have become an opportunity for developers to demonstrate their IT capabilities, and thus a way for recruiters to find the candidates they need. This study therefore investigates the impact of developers' profiles on an OSSD platform on their finding a job. It examines four antecedents in developer information that can accelerate a job search: job-seeking status, personal-information posting, learning activities, and knowledge contribution activities. For the empirical analysis, we developed a Web crawler and gathered a dataset on 4,005 developers from GitHub, a well-known OSSD platform. Proportional hazards regression was used for data analysis because a shorter job-seeking period implies a more successful job change. Our results indicate that developers who explicitly posted their job-seeking status had shorter job-seeking periods than those who did not. The other antecedents (i.e., personal-information posting, learning, and knowledge contribution activities) also contributed to reducing the job-seeking period. These findings demonstrate the value of OSSD platforms for recruiters seeking suitable candidates and for developers seeking a job.

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.155-175 / 2017
  • As social data come into the spotlight, mainstream web search engines have begun to provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic information aggregates the crowd of users who search for a specific keyword. In various areas, web search traffic can serve as a useful variable representing the attention of ordinary users to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena such as epidemics, consumer patterns, product life cycles, and financial investment. Web search traffic data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among tourists, the number of Chinese tourists (Youke) is growing continuously; Youke have been the largest inbound tourist group for Korean tourism for many years, and tourism profit per Youke is high as well. Research into proper demand prediction approaches for Youke is therefore important in both the public and private sectors, since accurate tourism demand prediction supports efficient decision making under limited resources. This study suggests an improved model that reflects the latest social issues by capturing the attention of groups of individuals. Traveling abroad is generally a high-involvement activity, so potential tourists are likely to search deeply for information about their trip. Web search traffic data present tourists' attention during trip preparation in an instantaneous and dynamic way. This study therefore attempted to select keywords that potential Chinese tourists are likely to search for on the internet. Baidu, China's biggest web search engine with a share of over 80%, gives users access to web search traffic data. Qualitative interviews with potential tourists helped us understand information search behavior before a trip and identify the keywords for this study. The selected keywords were categorized into three levels by how directly they relate to "Korean Tourism". Classifying the keywords helps identify which ones can explain Youke inbound demand, from the closest category to the farthest. Web search traffic data for each keyword were gathered by a web crawler developed to crawl web search data from Baidu Index. Using the automatically gathered variables, a linear model was built by multiple regression analysis, which suits operational decision and policy making because the relationships among variables are easy to explain. After the regression models were built, a model composed of traditional variables was compared with a model that adds web search traffic variables to the traditional model, in terms of significance and R squared; the final model was composed after this comparison. The final regression model has better explanatory power than the traditional model, along with the advantages of real-time immediacy and convenience. Furthermore, this study demonstrates a system that visualizes the results intuitively for general use: the Youke Mining solution offers several functions that support tourism decision making, including the embedded final regression model. The Youke Mining solution combines data-science-based algorithms with a well-designed, simple interface. Finally, this research has three significant implications, in theoretical, practical, and policy terms. Theoretically, the Youke Mining system and the model in this research are a first step toward predicting Youke inbound demand using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trip. Baidu holds more than 80% of the web search engine market. Practically, Baidu data can represent the attention of potential tourists preparing their tours in real time. Finally, in policy terms, the Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making to manage resources efficiently and optimize opportunities for successful policy.
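
The model comparison described in this abstract (a regression on traditional variables versus the same regression with a web search traffic variable added, judged by R squared) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy numbers, the variable names `traditional`, `search_index`, and `arrivals`, and the hand-written least-squares solver are not taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(features, y):
    """Fit ordinary least squares with an intercept (normal equations) and return R^2."""
    X = [[1.0] + list(row) for row in features]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: one traditional predictor, one search-traffic index,
# and the inbound arrivals they are meant to explain.
traditional = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
search_index = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
arrivals = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

r2_base = r_squared([[t] for t in traditional], arrivals)
r2_full = r_squared([[t, s] for t, s in zip(traditional, search_index)], arrivals)
print(f"traditional only: {r2_base:.3f}, with search traffic: {r2_full:.3f}")
```

On in-sample data, adding a variable to a least-squares model with an intercept cannot lower R squared, which is presumably why the abstract also mentions comparing significance rather than relying on R squared alone.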

An Interactive Cooking Video Query Service System with Linked Data (링크드 데이터를 이용한 인터랙티브 요리 비디오 질의 서비스 시스템)

  • Park, Woo-Ri;Oh, Kyeong-Jin;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.59-76 / 2014
  • The revolution in smart media such as smartphones, smart TVs, and tablets has made it easy for people to get content and related information anywhere and anytime. The characteristics of smart media have changed how users watch content, from a passive attitude to an active one. Video is a multimedia resource widely used to convey information effectively. People not only watch video content but also search for information related to specific objects that appear in it. However, they have to use extra views or devices to find that information, because existing video content does not provide it directly. The interaction between user and media is therefore becoming a major concern, and demand for direct interaction and instant information is increasing. The digital media environment is no longer expected to be a one-way information service that requires users to search the internet manually for the information they need. To resolve this inconvenience, an interactive service is needed that supports information exchange between people and video content, or among people themselves. Many researchers have recently recognized the importance of interactive services, but only a few services provide interactive video, and those with restricted functionality. This research focuses on the cooking domain for an interactive cooking video query service. Cooking continuously receives a great deal of attention, and users can easily watch cooking videos on smart media devices. Cooking videos provide varied information, such as cooking scenes and explanations for each recipe step, but their one-way nature does not let users interactively obtain more information about particular content. Cooking video has attracted academic research on several cooking-related problems. However, only a few studies have focused on interactive services for cooking video, and they are still insufficient to support interaction with users. In this paper, we propose an interactive cooking video query service system that uses linked data to provide interaction functionality to users. A linked recipe schema is used to handle the linked data. The linked data approach is applied to construct queries in a systematic manner when a user interacts with cooking videos. We add some classes, data properties, and relations to the linked recipe schema because the current version of the schema is not sufficient to support user interaction. A web crawler extracts recipe information from allrecipes.com. All extracted recipe information is transformed into ontology instances by an instance generator we developed. To provide a query function, hundreds of questions from cooking video web sites such as BBC Food, Foodista, and Fine Cooking were investigated and analyzed. After analyzing the questions, we summarized them into four categories by question generalization. For the question generalization, the questions are clustered into eleven questions. The proposed system provides an environment combining UI (User Interface) and UX (User Experience) that allows users to watch cooking videos while obtaining the necessary additional information through an extra information layer. Users can use the proposed system in both PC and mobile environments because responsive web design is applied. In addition, the proposed system enables interaction between user and video on various smart media devices by employing linked data to provide information matching the current context. Two methods are used to evaluate the proposed system. First, through a questionnaire-based method, computer system usability is measured by comparing the proposed system with the existing web site. Second, the answer accuracy for user interaction is measured to inspect the information to be offered. The experimental results show that the proposed system receives a favorable evaluation and provides accurate answers for user interaction.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
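
The network-merging phase in this abstract, where a user-article network and an article-issue network are combined into a user-issue quasi-network, can be sketched as a matrix product of the two incidence matrices. The abstract does not state how the authors performed the merge, so the matrix-product approach, the function name `merge_two_mode`, and the toy matrices below are all assumptions for illustration.

```python
# Hypothetical toy data. Rows are users, columns are articles (1 = user visited the article).
user_article = [
    [1, 1, 0],
    [0, 1, 1],
]
# Rows are articles, columns are issues (1 = article covers the issue).
article_issue = [
    [1, 0],
    [1, 1],
    [0, 1],
]


def merge_two_mode(ua, ai):
    """Multiply a user-article matrix by an article-issue matrix.

    Entry [u][i] of the result counts the articles through which
    user u is connected to issue i.
    """
    n_articles = len(ai)
    n_issues = len(ai[0])
    return [
        [sum(row[k] * ai[k][j] for k in range(n_articles)) for j in range(n_issues)]
        for row in ua
    ]


user_issue = merge_two_mode(user_article, article_issue)
print(user_issue)  # [[2, 1], [1, 2]]
```

The weighted user-issue matrix produced this way could then feed a clustering step such as the PAM algorithm named in the abstract; that step is not shown here.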

A Study on the necessity of Open Source Software Intermediaries in the Software Distribution Channel (소프트웨어 유통에 있어 공개소프트웨어 중개자의필요성에 대한 연구)

  • Lee, Seung-Chang;Suh, Eung-Kyo;Ahn, Sung-Hyuck;Park, Hoon-Sung
    • Journal of Distribution Science / v.11 no.2 / pp.45-55 / 2013
  • Purpose - The development and implementation of OSS (Open Source Software) led to a dramatic change in corporate IT infrastructure, from system server to smart phone, because the performance, reliability, and security functions of OSS are comparable to those of commercial software. Today, OSS has become an indispensable tool to cope with the competitive business environment and the constantly-evolving IT environment. However, the use of OSS is insufficient in small and medium-sized companies and software houses. This study examines the need for OSS Intermediaries in the Software Distribution Channel. It is expected that the role of the OSS Intermediary will be reduced with the improvement of the distribution process. The purpose of this research is to prove that OSS Intermediaries increase the efficiency of the software distribution market. Research design, Data, and Methodology - This study presents the analysis of data gathered online to determine the extent of the impact of the intermediaries on the OSS market. Data was collected using an online survey, conducted by building a personal search robot (web crawler). The survey period lasted 9 days during which a total of 233,021 data points were gathered from sourceforge.net and Apple's App store, the two most popular software intermediaries in the world. The data collected was analyzed using Google's Motion Chart. Results - The study found that, beginning 2006, the production of OSS in the Sourceforge.net increased rapidly across the board, but in the second half of 2009, it dropped sharply. There are many events that can explain this causality; however, we found an appropriate event to explain the effect. It was seen that during the same period of time, the monthly production of OSS in the App store was increasing quickly. The App store showed a contrasting trend to software production. Our follow-up analysis suggests that appropriate intermediaries like App store can enlarge the OSS market. 
The increase was caused by the appearance of B2C software intermediaries like App store. The results imply that OSS intermediaries can accelerate OSS software distribution, while development of a better online market is critical for corporate users. Conclusion - In this study, we analyzed 233,021 data points on the online software marketplace at Sourceforge.net. The analysis indicates that OSS Intermediaries are needed in the software distribution market for its vitality. It is also critical that OSS intermediaries satisfy certain qualifications to play a key role as market makers. This study has several interesting implications. The first is that the OSS intermediary should make an effort to create a complementary relationship between OSS and Proprietary Software. The second is that the OSS intermediary must possess a business model that shares the benefits with all the participants (developer, intermediary, and users). The third is that the intermediary should provide OSS of high quality, like proprietary software with a high level of complexity. Thus, it is worthwhile to examine this study, which proves that the open source software intermediaries are essential in the software distribution channel.


Development of an Intelligent Product Agent to Solve the Supplier Search Problem for Products (상품에 대한 공급자 검색 문제 해결하기 위한 지능형 상품 에이전트 개발)

  • Chae, Sang-Yong;Kim, Gyeong-Pil;Kim, U-Ju;Kim, Chang-Uk
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.11a / pp.475-480 / 2005
  • The countless web pages on the Internet contain all kinds of unstructured information scattered in heterogeneous forms. Finding needed information with current search technology is inefficient, consuming considerable time and cost. In this situation, it is very important to search for, extract, and structure the information users want. Despite the explosive growth of e-commerce, the use and application of e-commerce standards remain inadequate, so consumers cannot easily obtain the product information they want in e-procurement, e-marketplaces, online shopping malls, and the like. As a result, suppliers miss opportunities for more sales, and buyers miss opportunities to source better materials and products at low prices. The intelligent product agent proposed in this study addresses the supplier search problem for a specific product a consumer wants to buy: it not only extends the system's internal information and turns it into knowledge, but also automatically collects, processes, and stores diverse product information from the web. The technologies used are a DB schema reader that can read a database schema; a Meta Search Engine and a Focused Crawler that visit web pages (web documents) and collect the URLs of various pieces of information; and a Wrapper that converts data structures of different forms into a standardized form for a specific purpose. The aim of the research is to link these technologies to extract the necessary information and thereby solve the supplier search problem. Information extraction refers to the task of accurately extracting specific facts or relations from documents relevant to the user's interests.
