Search | Korea Science

Wrapper-based Economy Data Collection System Design And Implementation (래퍼 기반 경제 데이터 수집 시스템 설계 및 구현)

Piao, Zhegao;Gu, Yeong Hyeon;Yoo, Seong Joon
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.227-230 / 2015
Analyzing and predicting economic trends requires collecting specific economic news and stock data. A typical web crawler analyzes page content, collects documents, and extracts URLs automatically. Other crawlers, by contrast, collect only documents on a particular topic. To collect economic news from a particular website, a crawler must analyze the site's structure directly and gather data from it, which calls for a wrapper-based web crawler design. In this paper, we design and implement a crawler wrapper for a big-data-based economic news analysis system and use it to collect data. With the wrapper-based crawler we collect stock data and sales data from the USA auto market since 2000, as well as economic news from the USA and South Korea. The crawler determines how often each site updates its data and collects again periodically. We remove duplicate data and build a structured data set for subsequent analysis, first removing noise such as advertising and public-relations content.
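The cleaning step described in this abstract (drop noise, then remove duplicates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `NOISE_MARKERS` list, the article dictionary layout, and the hash-based duplicate check are all assumptions made for the example.

```python
import hashlib
import re

# Hypothetical noise markers; a real wrapper would rely on site-specific rules.
NOISE_MARKERS = ("advertisement", "sponsored", "press release")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_noise(article: dict) -> bool:
    """Treat an article as noise if its title or body contains a noise marker."""
    text = normalize(article["title"] + " " + article["body"])
    return any(marker in text for marker in NOISE_MARKERS)


def clean(articles: list[dict]) -> list[dict]:
    """Drop noise first, then keep the first copy of each distinct body."""
    seen: set[str] = set()
    kept = []
    for article in articles:
        if is_noise(article):
            continue
        key = hashlib.sha256(normalize(article["body"]).encode("utf-8")).hexdigest()
        if key in seen:
            continue  # same body already collected
        seen.add(key)
        kept.append(article)
    return kept
```

Hashing the normalized body only catches exact or whitespace-level duplicates; near-duplicate detection would need a similarity measure instead.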

A Study on the Construction of the Automatic Summaries - on the basis of Straight News in the Web - (자동요약시스템 구축에 대한 연구 - 웹 상의 보도기사를 중심으로 -)

Lee, Tae-Young
- Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.41-67 / 2006
A writing frame and various rules based on discourse structure and knowledge-based methods were applied to build an automatic Ext/sums (extracts and summaries) system for straight news on the web. The frame contains slots and facets, represented by the roles of paragraphs, sentences, and clauses in a news article, along with the rules that determine the slot type. Candidate sentences were rearranged for the summary through unification, separation, and synthesis while keeping their meaning coherent. This rearrangement used rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, together with context plots defined by the syntactic/semantic signatures of nouns and verbs and the category of verb suffixes. An attempt was also made to insert critic sentences into the summary.
https://doi.org/10.3743/KOSIM.2006.23.4.041

Identifying the Interests of Web Category Visitors Using Topic Analysis (토픽 분석을 활용한 웹 카테고리별 방문자 관심 이슈 식별 방안)

Choi, Seongi;Kim, Namgyu
- Journal of Information Technology Applications and Management / v.21 no.4_spc / pp.415-429 / 2014
With the advent of smart devices, users are able to connect to each other through the Internet without the constraints of time and space. Because the Internet has become increasingly important to users in their everyday lives, reliance on it has grown. As a result, the number of web sites constantly increases and the competition between these sites becomes more intense. Even those sites that operate successfully struggle to establish new strategies for customer retention and customer development in order to survive. Many companies use various customer information in order to establish marketing strategies based on customer group segmentation. A method commonly used to determine the customer groups of individual sites is to infer customer characteristics based on the customers' demographic information. However, such information cannot sufficiently represent the real characteristics of customers. For example, users who have similar demographic characteristics could nonetheless have different interests and, therefore, different buying needs. Hence, in this study, customers' interests are first identified through an analysis of their Internet news inquiry records. This information is then integrated in order to identify each web category. The study then analyzes the possibilities for the practical use of the proposed methodology through its application to actual Internet news inquiry records and web site browsing histories.
https://doi.org/10.21219/jitam.2014.21.4_spc.415

MyNews : Personalized XML Document Transcoding Technique for Mobile Device Users (MyNews : 모바일 환경에서 사용자 관심사를 고려한 XML 문서 트랜스코딩)

Song Teuk-Seob;Lee Jin-Sang;Lee Kyong-Ho;Sohn Won-Sung;Ko Seung-Kyu;Choy Yoon-Chul;Lim Soon-Bum
- The KIPS Transactions:PartB / v.12B no.2 s.98 / pp.181-190 / 2005
With the development of wireless internet services and mobile devices, the mechanisms for accessing web services have diversified. However, the existing web infrastructure and content were designed for desktop computers and are not well suited to other types of access, e.g. PDAs or mobile phones, which have less processing power and memory, small screens, limited input facilities, or limited network bandwidth. Thus, there is a growing need for transcoding techniques that provide the ability to browse the web through mobile devices. Previous research on transcoding existing web content, however, is service-provider-centric and does not accurately reflect users' continuously changing interests. In this paper, we present a transcoding technique that makes existing XML-based news content available on mobile phones through a customized wireless service.
https://doi.org/10.3745/KIPSTB.2005.12B.2.181 KSCI

Fake News Detector using Machine Learning Algorithms

Diaa Salama;Yomna Ibrahim;Radwa Mostafa;Abdelrahman Tolba;Mariam Khaled;John Gerges;Diaa Salama
- International Journal of Computer Science & Network Security / v.24 no.7 / pp.195-201 / 2024
With Covid-19 (coronavirus) spreading around the world, some people are exploiting the situation, and citizens' desperate need for news about this mysterious virus, to spread fake news. Some countries have arrested people who spread fake news about it, and others have fined them. Since social media has become a significant source of news, there is a profound need to detect such fake news. The main aim of this research is to develop a web-based model that uses a combination of machine learning algorithms to detect fake news. The proposed model includes an advanced framework to identify tweets containing fake news using context analysis. We assumed that natural language processing (NLP) alone would not be enough for context analysis, as tweets are usually short and do not follow even the most straightforward syntactic rules, so we used tweet features such as the number of retweets, the number of likes, and tweet length; we also added statistical credibility analysis for Twitter users. The proposed algorithms are tested on four different benchmark datasets. Finally, to get the best accuracy, we combined the two best-performing algorithms: SVM (widely accepted as a baseline classifier, especially for binary classification problems) and Naive Bayes.
https://doi.org/10.22937/IJCSNS.2024.24.7.23

An Analysis of Card News and Deconstructing News Values in Curated News Contents in the Digital Era

Hong, Seong Choul;Pae, Jung Kun
- Journal of Internet Computing and Services / v.18 no.2 / pp.105-111 / 2017
This paper explores the characteristics of curated news content. Through a content analysis of 1020 news clips, the study found that the news values embedded in card news differed from those of traditional news. Specifically, timeliness was not regarded as a key factor in newsworthiness. Rather, information and social impact were highly emphasized. Considering that news consumers depend on traditional news for timely news, curated news content was not a replacement for traditional news but a supplement. By refurbishing photos from previous news reports and googling the web for related information, curated news reiterates social meaning and provides relevant information. Furthermore, the salience of human interest can be explained by the entertaining characteristics of curated news. In story forms, the list technique, which stresses several important points, was used more frequently than the inverted pyramid. Another key finding of this study is that the man on the street was the most quoted main source in the curatorial context.
https://doi.org/10.7472/jksii.2017.18.2.105 KSCI

Issue Analysis on Gas Safety Based on a Distributed Web Crawler Using Amazon Web Services (AWS를 활용한 분산 웹 크롤러 기반 가스 안전 이슈 분석)

Kim, Yong-Young;Kim, Yong-Ki;Kim, Dae-Sik;Kim, Mi-Hye
- Journal of Digital Convergence / v.16 no.12 / pp.317-325 / 2018
With the aim of creating new economic values and strengthening national competitiveness, governments and major private companies around the world are continuing their interest in big data and making bold investments. In order to collect objective data, such as news, securing data integrity and quality should be a prerequisite. For researchers or practitioners who wish to make decisions or trend analyses based on objective and massive data, such as portal news, the problem of using the existing Crawler method is that data collection itself is blocked. In this study, we implemented a method of collecting web data by addressing existing crawler-style problems using the cloud service platform provided by Amazon Web Services (AWS). In addition, we collected 'gas safety' articles and analyzed issues related to gas safety. In order to ensure gas safety, the research confirmed that strategies for gas safety should be established and systematically operated based on five categories: accident/occurrence, prevention, maintenance/management, government/policy and target.
https://doi.org/10.14400/JDC.2018.16.12.317 KSCI

A Personalized Mobile Service Method of RSS News Channel Contents for Ubiquitous Environment (유비쿼터스 환경을 위한 RSS 뉴스 채널 컨텐츠의 개인화 모바일 서비스 기법)

Han, Seung-Hyun;Ryu, Dong-Yeop;Lim, Young-Hwan
- The KIPS Transactions:PartD / v.14D no.4 s.114 / pp.427-434 / 2007
Although wireless devices are the most suitable devices for ubiquitous environments, they are more limited than desktop environments when using internet services. This research therefore proposes a wireless internet service method that uses content-based personalization. Through RSS-linked web content and the personalization method, existing websites can provide easy and prompt access to desired news articles and other data. The proposed method makes wireless internet easier to use while lowering content production costs. Moreover, personalized mobile web news content that satisfies users' preferences can be served.
https://doi.org/10.3745/KIPSTD.2007.14-D.3.427 KSCI

News Abusing Inference Model Using Web Crawling (웹크롤링을 활용한 뉴스 어뷰징 추론 모델)

Chung, Kyoung-Rock;Park, Koo-Rack;Chung, Young-Suk;Nam, Ki-Bok
- Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.175-176 / 2018
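As more people read news online and on mobile devices rather than in traditional newspapers or on TV, competition to gain more exposure than other outlets' articles in the news sections of portal sites has intensified, and news abusing has emerged as a serious social problem. This paper proposes a model for identifying news abusing, which wastes users' time and makes quality information hard to find, among the many news items generated and distributed online. The proposed model uses crawling to retrieve the title and body of news articles and then determines whether an article is abusing through a similarity check based on artificial intelligence techniques, so that quality news information can be provided to users.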
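A similarity check of the kind this abstract describes could look roughly like the sketch below. The abstract does not say which similarity technique the authors use, so this example substitutes plain bag-of-words cosine similarity; the function names and the 0.9 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Count word tokens; \\w is Unicode-aware, so Korean text is tokenized too."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two token-count vectors."""
    va, vb = tokens(a), tokens(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_abusing(articles: list[str], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return index pairs of articles whose similarity meets the threshold."""
    pairs = []
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if cosine_similarity(articles[i], articles[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise loop is quadratic in the number of articles, which is acceptable for a small batch but would need indexing or hashing for portal-scale data.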

Method of Related Document Recommendation with Similarity and Weight of Keyword (키워드의 유사도와 가중치를 적용한 연관 문서 추천 방법)

Lim, Myung Jin;Kim, Jae Hyun;Shin, Ju Hyun
- Journal of Korea Multimedia Society / v.22 no.11 / pp.1313-1323 / 2019
With the development of the Internet and the spread of smartphones, services designed for user convenience are increasing, so that users can check news in real time anytime and anywhere. However, online news is organized by media outlet and category and offers only a few related search terms, making it difficult to find news related to specific keywords. To solve this problem, we propose a method that recommends related documents more accurately by applying Doc2Vec similarity to the specific keywords of news articles and weighting the title and contents of news articles. We collect news articles from the Naver politics category by web crawling in a Java environment, preprocess them, extract topics using LDA modeling, and find similarities using Doc2Vec. To supplement Doc2Vec, we apply TF-IDF to obtain TC (Title Contents) weights for the title and contents of news articles. We then combine the Doc2Vec similarity and the TC weight to generate a TC weight-similarity, and evaluate the similarity between words using the PMI technique to confirm the keyword association.
https://doi.org/10.9717/kmms.2019.22.11.1313 KSCI
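The PMI (pointwise mutual information) measure mentioned in this abstract can be illustrated with a small sketch. It uses the standard definition, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated as document frequencies. The abstract does not describe the authors' estimation details, so the document-level counting and the function name here are assumptions.

```python
import math


def pmi(docs: list[set[str]], x: str, y: str) -> float:
    """PMI of two keywords, estimated from how often they share a document.

    Assumes docs is non-empty. Returns -inf when the words never co-occur.
    """
    n = len(docs)
    p_x = sum(x in d for d in docs) / n
    p_y = sum(y in d for d in docs) / n
    p_xy = sum(x in d and y in d for d in docs) / n
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))
```

A positive value means the two keywords appear together more often than independence would predict; a negative value means less often.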
