• Title/Summary/Keyword: Location Information


Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important source for target marketing and personalized advertisement on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, time-consuming, and prone to false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords; frequency and intensity by time, day, and month; variety of websites visited; and text information from the web pages visited. The demographic attributes to be predicted also vary across papers, covering gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model. The objective of this study is to choose the clickstream attributes most likely to be correlated with demographics based on previous research, and then to identify which data mining method is best suited to predict each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem; we use three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable, using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross validation was conducted to enhance the reliability of the experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable.
For example, age prediction is best performed using decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction. We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used to predict their demographics and can thus be utilized for digital marketing.
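The model comparison described in this abstract can be illustrated with a short sketch. The following uses scikit-learn in place of IBM SPSS Modeler and random placeholder data; the 64 attributes, the binary gender label, and all model settings are illustrative assumptions, not the study's.

```python
# A minimal sketch of comparing alternative models with 5-fold cross validation,
# in the spirit of the study's step 3 and 4. All data here is synthetic.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))      # 64 clickstream attributes per user (placeholder)
y = rng.integers(0, 2, size=500)    # e.g., a binary gender label (placeholder)

models = {
    "SVM (no reduction)": Pipeline([("sc", StandardScaler()), ("m", SVC())]),
    "PCA + neural net": Pipeline([("sc", StandardScaler()),
                                  ("pca", PCA(n_components=10)),
                                  ("m", MLPClassifier(max_iter=500))]),
    "logistic regression": Pipeline([("sc", StandardScaler()),
                                     ("m", LogisticRegression(max_iter=500))]),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross validation
    print(f"{name}: {acc:.3f}")                      # pick the best per variable
```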

The Application of Operations Research to Librarianship : Some Research Directions (운영연구(OR)의 도서관응용 -그 몇가지 잠재적응용분야에 대하여-)

  • Choi Sung Jin
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.4
    • /
    • pp.43-71
    • /
    • 1975
  • Operations research has developed rapidly since its origins in World War II. Practitioners of O.R. have contributed to almost every aspect of government and business. More recently, a number of operations researchers have turned their attention to library and information systems, and the author believes that significant research has resulted. It is the purpose of this essay to introduce the library audience to some of these accomplishments, to present some of the author's hypotheses on the subject of library management to which he believes O.R. has great potential, and to suggest some future research directions. Some problem areas in librarianship where O.R. may play a part are discussed and summarized below. (1) Library location. In location problems it is usually necessary to strike a balance between accessibility and cost. Many mathematical methods are available for identifying the optimal locations once the balance between these two criteria has been decided. The major difficulties lie in relating cost to size and in taking future change into account when discriminating among possible solutions. (2) Planning new facilities. Standard approaches to using mathematical models for simple investment decisions are well established. If the problem is one of choosing the most economical way of achieving a certain objective, one may compare the alternatives by using one of the discounted cash flow techniques (a small numerical sketch follows this list). In other situations it may be necessary to use a cost-benefit approach. (3) Allocating library resources. In order to allocate resources to best advantage, the librarian needs to know how the effectiveness of the services he offers depends on the way he deploys his resources. The O.R. approach to such problems is to construct a model representing effectiveness as a mathematical function of the levels of different inputs (e.g., numbers of people in different jobs, acquisitions of different types, physical resources). (4) Long term planning. Resource allocation problems are generally concerned with up to one and a half years ahead. The longer term offers both greater freedom of action and greater uncertainty, so it is difficult to generalize about long term planning problems. In other fields, however, O.R. has made a significant contribution to long range planning, and it is likely to have one to make in librarianship as well. (5) Public relations. It is generally accepted that actual and potential users are too ignorant both of the range of library services provided and of how to make use of them. How should services be brought to the attention of potential users? The answer seems to lie in obtaining empirical evidence through controlled experiments in which a group of libraries participates. (6) Acquisition policy. In comparing alternative policies for the acquisition of materials, one needs to know, first, the implications of each policy for the services that depend on the stock and, second, the relative importance to be ascribed to each service for each class of user. By settling the first, formal models allow the librarian to concentrate his attention on the value judgements necessary for the second. (7) Loan policy. The approach to choosing between loan policies is much the same as the previous approach. (8) Manpower planning. For large library systems one should consider constructing models which permit comparison of the skills that will be necessary in the future with predictions of the skills that will be available, so as to allow informed decisions.
(9) Management information systems for libraries. A great deal of data is available in libraries as a by-product of routine recording activities. It is particularly tempting, when procedures are computerized, to make summary statistics available as a management information system. The value of information for particular decisions that may have to be taken in the future is best assessed in terms of a model of the relevant problem. (10) Management gaming. One of the most common uses of a management game is as a means of developing staff's ability to take decisions. The value of such exercises depends on the validity of the computerized model. If the model were sufficiently simple to take the form of a mathematical equation, decision-makers would probably be able to learn adequately from a graph; more complex situations require simulation models. (11) Diagnostic tools. Libraries are sufficiently complex systems that it would be useful to have simple means of telling whether performance can be regarded as satisfactory and, if it cannot, of providing pointers to what is wrong. (12) Data banks. It would appear worth considering establishing a bank for certain types of data. If certain items on questionnaires were to take a standard form, a greater pool of data would be available for various analyses. (13) Effectiveness measures. The meaning of a library performance measure is not readily interpreted. Each measure must be assessed in relation to the corresponding measures for earlier periods of time and to a standard, which may be the corresponding measure in another library, the 'norm', the 'best practice', or user expectations.
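As a small numerical illustration of the discounted cash flow comparison mentioned in point (2), the following sketch computes the net present value of two hypothetical facility options; the cash flows and the 8% discount rate are invented for illustration.

```python
# A minimal sketch of comparing two investment options by net present value.
def npv(rate, cash_flows):
    """Net present value of cash flows, one per year starting at year 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Option A: large up-front cost, low running costs; Option B: the reverse.
option_a = [-100_000] + [-5_000] * 10
option_b = [-40_000] + [-14_000] * 10
for name, flows in [("A", option_a), ("B", option_b)]:
    print(name, round(npv(0.08, flows)))  # prefer the less negative NPV
```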


Study on the Consequence Effect Analysis & Process Hazard Review at Gas Release from Hydrogen Fluoride Storage Tank (최근 불산 저장탱크에서의 가스 누출시 공정위험 및 결과영향 분석)

  • Ko, JaeSun
    • Journal of the Society of Disaster Information
    • /
    • v.9 no.4
    • /
    • pp.449-461
    • /
    • 2013
  • As the hydrofluoric acid leak in Gumi-si, Gyeongsangbuk-do and the hydrochloric acid leak in Ulsan demonstrated, chemical accidents are mostly caused by large amounts of volatile toxic substances leaking from damaged storage tanks or transport pipelines. Safety assessment is the most important concern because such toxic-material accidents cause human and material damage to the environment and atmosphere of the surrounding area. Therefore, in this study, hydrofluoric acid leaking from a storage tank was selected as the study example; the diffusion of the leaked substance into the atmosphere was simulated, and the consequences were analyzed through numerical analysis and the diffusion simulation of ALOHA (Areal Location of Hazardous Atmospheres). The results of a qualitative HAZOP (Hazard and Operability) evaluation showed that flange leaks, operation delays due to leakage of valves and hoses, and toxic gas leaks were the main hazard factors. The possibility of fire from temperature, pressure, and corrosion, of nitrogen-supply overpressure, and of toxic leaks from internal corrosion of the tank or pipe joints was also found to be high. The ALOHA results differed somewhat depending on the input data of the dense gas model, but wind direction and speed, rather than atmospheric stability, played the bigger role; higher wind speed promoted the diffusion of the contaminant. In terms of diffusion concentration, both liquid and gas leaks resulted in almost the same $LC_{50}$ and ALOHA AEGL-3 (Acute Exposure Guideline Level) values, and each scenario showed almost identical results in the ALOHA model. Therefore, a buffer distance for toxic gas can be determined by comparing the numerical analysis and the diffusion concentration to the IDLH (Immediately Dangerous to Life and Health) level. Such a study will help perform risk assessments of toxic leaks more efficiently and can be utilized in establishing a proper community emergency response system.
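For readers unfamiliar with dispersion estimates of this kind, here is a minimal Gaussian plume sketch; note that ALOHA's dense gas model is considerably more elaborate, and the Briggs-style dispersion coefficients, release rate, and wind speed below are illustrative assumptions, not the study's inputs.

```python
# A minimal Gaussian plume sketch for a ground-level toxic release. The
# dispersion coefficients are the rough Briggs rural class-D power laws;
# all numbers are illustrative, not from the study.
import math

def ground_conc(q_g_s, u_m_s, x_m, h_m=0.0):
    """Ground-level centerline concentration (g/m^3) at downwind distance x (m)."""
    sigma_y = 0.08 * x_m / math.sqrt(1 + 0.0001 * x_m)  # lateral spread, class D
    sigma_z = 0.06 * x_m / math.sqrt(1 + 0.0015 * x_m)  # vertical spread, class D
    return (q_g_s / (math.pi * sigma_y * sigma_z * u_m_s)
            * math.exp(-h_m ** 2 / (2 * sigma_z ** 2)))

# Example: 500 g/s release, 3 m/s wind; scan downwind distances for where the
# concentration falls below an IDLH-like threshold to estimate a buffer distance.
for x in (100, 500, 1000, 2000):
    print(x, f"{ground_conc(500, 3, x):.2e}")
```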

The Usefulness Assessment of Attenuation Correction and Location Information in SPECT/CT (SPECT/CT에서 감쇠 보정 및 위치 정보의 유용성 평가)

  • Choi, Jong-Sook;Jung, Woo-Young;Shin, Sang-Ki;Cho, Shee-Man
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.12 no.3
    • /
    • pp.214-221
    • /
    • 2008
  • Purpose: We qualitatively analyzed whether fused SPECT/CT can localize lesions anatomically better than SPECT alone, and we examined the effects of CT attenuation correction on SPECT images to show the usefulness of SPECT/CT. Materials and Methods: 1. Evaluation of fusion images: This study comprised patients who underwent $^{131}I$-MIBG, bone, $^{111}In$-octreotide, Meckel's diverticulum, or parathyroid MIBI scans on a Precedence 16 or Symbia T2 from January to August 2008. We compared SPECT/CT images with non-fused images and made a qualitative analysis. 2. Evaluation of attenuation correction: We divided the myocardium of 38 patients who underwent $^{201}Tl$ myocardial exams on a Symbia T2 into 5 sections using the Cedars-Sinai QPS program - Ant, Inf, Lat, Septum, Apex - and expressed each section's perfusion state as a percentage. We compared the differences in each section's perfusion state between CT AC and Non AC as mean${\pm}$standard deviation. Results: 1. Evaluation of fusion images: In high-energy $^{131}I$ cases, it was hard to identify the exact anatomical lesion because of the differences in uptake level between regions and surrounding lesions; after fusion with CT, we could localize the anatomical lesion more exactly. The method also demonstrated its superiority in Meckel's diverticulum cases and in finding lesions around the bowels or organs in $^{111}In$ cases. Bone SPECT/CT images helped distinguish between disk spaces with certainty and gave correct results. 2. Evaluation of attenuation correction: There was no statistically significant difference in the Ant and Lat sections (p>0.05), but there was a significant difference in the Inf, Apex, and Septum sections (p<0.05). Among the 5 sections of the myocardium, the perfusion difference at the inferior wall between the Non AC image ($68.58{\pm}7.55$) and the CT-corrected image ($76.84{\pm}6.52$) was the largest, at $8.26{\pm}4.95$ (p<0.01, t=10.29). Conclusion: Using combined SPECT and CT systems, nuclear medicine physicians can obtain not only molecular images showing the functional activity of lesions but also more accurate anatomical location information, which helps identify abnormalities. Moreover, the combined data sets help separate normal from abnormal regions in complicated body parts, so clinicians can carry out diagnosis and treatment planning with a single test image. In addition, when examining the myocardium in the thorax, where attenuation occurs easily, perfusion in a given region can be trusted more in the SPECT test because CT provides accurate attenuation correction. For these reasons, we consider the use of fusion imaging to be justified.
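The paired comparison reported above (Non AC vs. CT AC perfusion in the same wall, p<0.01, t=10.29) can be sketched with a paired t-test; the arrays below are illustrative stand-ins, not the study's 38-patient data.

```python
# A minimal sketch of the paired comparison of perfusion (%) with and without
# CT attenuation correction. The values are invented for illustration.
import numpy as np
from scipy import stats

non_ac = np.array([68, 70, 65, 72, 66, 69, 71, 64], dtype=float)       # Non AC (%)
ct_ac = non_ac + np.array([9, 7, 8, 10, 8, 7, 9, 8], dtype=float)       # CT AC (%)

diff = ct_ac - non_ac
t, p = stats.ttest_rel(ct_ac, non_ac)  # paired t-test, as in the study
print(f"mean diff = {diff.mean():.2f} +/- {diff.std(ddof=1):.2f}, "
      f"t = {t:.2f}, p = {p:.4f}")
```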


Bioinformatic Analysis of the Canine Genes Related to Phenotypes for the Working Dogs (특수 목적견으로서의 품성 및 능력 관련 유전자들에 관한 생물정보학적 분석)

  • Kwon, Yun-Jeong;Eo, Jungwoo;Choi, Bong-Hwan;Choi, Yuri;Gim, Jeong-An;Kim, Dahee;Kim, Tae-Hun;Seong, Hwan-Hoo;Kim, Heui-Soo
    • Journal of Life Science
    • /
    • v.23 no.11
    • /
    • pp.1325-1335
    • /
    • 2013
  • Working dogs, such as rescue dogs, military watch dogs, guide dogs, and search dogs, are selected by in-training examination of desired traits, including concentration, possessiveness, and boldness. In recent years, genetic information has come to be considered an important factor behind the outstanding abilities of working dogs. To characterize the molecular features of the canine genes related to working-dog phenotypes, we investigated 24 previously reported genes (AR, BDNF, DAT, DBH, DGCR2, DRD4, MAOA, MAOB, SLC6A4, TH, TPH2, IFT88, KCNA3, TBR2, TRKB, ACE, GNB1, MSTN, PLCL1, SLC25A22, WFIKKN2, APOE, GRIN2B, and PIK3CG) categorized into personality, olfactory sense, and athletic/learning ability. We analyzed the chromosomal locations, gene-gene interactions, Gene Ontology terms, and expression patterns of these genes using bioinformatic tools. In addition, variable number of tandem repeats (VNTR) and microsatellite (MS) polymorphisms in the AR, MAOA, MAOB, TH, DAT, DBH, and DRD4 genes were reviewed. Taken together, we suggest that the genetic background of the canine genes associated with various working-dog behaviors and skill-performance attributes could be used for the proper selection of superior working dogs.

A Study on the Possibility of Producing a Floor Plan of 「Donggwoldo(東闕圖)」 through the Use of Rubber Sheeting Transformation - With a Focus on the Surroundings near the Geumcheongyo Bridge in Changdeokgung Palace - (러버쉬팅변환을 통한 「동궐도(東闕圖)」의 평면도 제작 가능성 연구 - 창덕궁 금천교 주변을 중심으로 -)

  • Lee, Jae-Yong;Kim, Young-Mo
    • Korean Journal of Heritage: History & Science
    • /
    • v.50 no.4
    • /
    • pp.104-121
    • /
    • 2017
  • The present study attempted to produce a floor plan of the surroundings of Geumcheongyo Bridge in Changdeokgung Palace of the late Joseon period through rubber sheeting transformation based on the drawing principles of "Donggwoldo(東闕圖)". First, the study compared the actual sizes of the major buildings that have existed since the production of "Donggwoldo(東闕圖)" with the sizes depicted in the painting, revealing that the front elevations of the buildings were drawn at a reduction of approximately 1/200. The same production proportion could not be confirmed for the side elevations: their lengths were depicted at around half the actual proportions, and as the diagonal line angles averaged $39^{\circ}$, the study confirmed they were drawn in a manner similar to cabinet projection. Second, the study created an obliquely projected floor plan by inversely applying the drawing principles of "Donggwoldo(東闕圖)" and produced a floor plan of the surroundings of Geumcheongyo Bridge through rubber sheeting transformation. Projective transformation proved most suitable among the transformations, and with a standard error of 2.1208 m, the relatively high accuracy shows that producing a floor plan from "Donggwoldo(東闕圖)" is meaningful. Furthermore, it implies the possibility of producing floor plans from various documentary paintings drawn with the parallel oblique method besides "Donggwoldo(東闕圖)". Third, the study evaluated the accuracy of the spatial information in the produced floor plan by comparing three items: the location of Geumcheongyo Bridge, the arrangement of Geumcheongyo Bridge and Jinseonmun Gate, and the location of the Geumcheon stone embankment. The results confirmed that the floor plan can serve as a useful tool for understanding the appearance of the surroundings at the time "Donggwoldo(東闕圖)" was produced, because it agrees with the excavation results for Geumcheongyo Bridge and their context. The present study is therefore significant in that it explores the possibility of producing spatial information recorded in "Donggwoldo(東闕圖)" by applying rubber sheeting transformation, and in that it presents a new methodology for understanding the appearance of the East Palace of the late Joseon period.
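Rubber sheeting by projective transformation amounts to fitting a homography to control point pairs and checking the residual error, broadly how a standard error such as the 2.1208 m above would be obtained. A minimal sketch with purely hypothetical control points:

```python
# A minimal sketch of fitting a projective (rubber sheeting) transformation
# between painting coordinates and ground-plan coordinates via the DLT.
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 projective transform mapping src points to dst points."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 3)        # smallest singular vector = solution

def apply_homography(H, pts):
    pts = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out the projective scale

# Hypothetical control points: painting pixels -> ground plan (meters).
src = np.array([[120, 80], [410, 95], [395, 300], [105, 310]], float)
dst = np.array([[0, 0], [30, 0], [30, 22], [0, 22]], float)
H = fit_homography(src, dst)
rmse = np.sqrt(((apply_homography(H, src) - dst) ** 2).sum(axis=1).mean())
print(f"RMSE: {rmse:.4f} m")  # ~0 for 4 exact pairs; more pairs give a real residual
```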

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.1-23
    • /
    • 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, in which Amazon and eBay stand out, is growing explosively. As e-commerce grows and more products are registered at online shopping malls, customers can easily compare products and find what they want to buy. However, a problem has arisen with this growth: with so many products registered, it has become difficult for customers to find what they really need in the flood of products. When customers search with a generalized keyword, too many products come up; conversely, few products are found if customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing the text in images can be a solution. Because the bulk of product details are published in catalogs in image format, most product information cannot be found through the current text-based search systems. If the information in these images can be converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they fail under certain conditions, for example when the text is too small or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, the state of the art in image recognition since the 2010s. The Single Shot Multibox Detector (SSD), a well-regarded model for object detection, can be used with its structure redesigned to account for the differences between text and generic objects. However, because deep learning models are trained by supervised learning, the SSD model needs a large amount of labeled training data. One could label the location and class of each text region in catalogs manually, but manual collection raises many problems: some keywords would be missed through human error; collecting the required volume of data would be too time-consuming, or too costly if many workers were hired to shorten the time; and if specific keywords need to be trained, finding images containing those words would be difficult as well. To solve this data issue, this research developed a program that creates training data automatically. The program composes images containing various keywords and pictures, like a catalog page, and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves: the SSD model recorded a recognition rate of 81.99% with 20,000 training examples created by the program. Moreover, this research tested the SSD model on different variants of the data to analyze which features of the data influence text recognition performance.
As a result, it was found that the number of labeled keywords, the addition of overlapping keyword labels, the existence of unlabeled keywords, the spacing between keywords, and differences in background images are all related to the performance of the SSD model. This analysis can guide performance improvements for the SSD model or other deep learning based text recognizers through higher-quality data. The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to the improvement of search systems in e-commerce: suppliers can spend less time registering keywords for products, and customers can search for products using the details written in the catalog.
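The automatic training-data generator described above can be sketched as follows: render keywords onto a background image and record their bounding boxes as labels. The keyword list, font file, and layout logic below are hypothetical placeholders, not the authors' program.

```python
# A minimal sketch of automatic training-data generation for a text detector:
# draw random keywords on a blank "catalog" image and save the bounding boxes.
import json
import random
from PIL import Image, ImageDraw, ImageFont

KEYWORDS = ["cotton", "waterproof", "slim-fit"]   # hypothetical target keywords
FONT = ImageFont.truetype("NanumGothic.ttf", 24)  # any available font file

def make_sample(idx, size=(600, 800)):
    img = Image.new("RGB", size, "white")         # stand-in for a catalog background
    draw = ImageDraw.Draw(img)
    labels = []
    for word in random.sample(KEYWORDS, k=2):
        x = random.randint(0, size[0] - 200)
        y = random.randint(0, size[1] - 40)
        draw.text((x, y), word, fill="black", font=FONT)
        x0, y0, x1, y1 = draw.textbbox((x, y), word, font=FONT)
        labels.append({"word": word, "bbox": [x0, y0, x1, y1]})
    img.save(f"train_{idx}.png")
    with open(f"train_{idx}.json", "w") as f:
        json.dump(labels, f)                      # location labels for SSD training

for i in range(5):
    make_sample(i)
```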

Usefulness of "Volumetrix Suite" with SPECT/CT (SPECT/CT 영상에서 Volumetrix Suite의 유용성)

  • Cho, Seung-Wook;Shin, Byeong-Ho;Kim, Jong-Pil;Yoon, Seok-Hwan;Kim, Tae-Yeub;Seung, Yong-Joon;Moon, Il-Sang;Woo, Jae-Ryong;Lee, Ho-Young
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.14 no.2
    • /
    • pp.166-171
    • /
    • 2010
  • Purpose: SPECT/CT can resolve the difficulty of discriminating focal regions by integrating functional images with anatomical images. We introduce the usefulness of "Volumetrix Suite", which can render 3D images by fusing SPECT/CT images with reference CT images. Materials and Methods: We applied the Volumetrix Suite programs (Volumetrix IR, Volumetrix 3D) to patients who had undergone bone, venography, parathyroid, or WBC scans as well as diagnostic CT examinations covering the same focal regions at Seoul Metropolitan Government Seoul National University Boramae Medical Center. After acquiring the SPECT/CT images and reference CT images, we fused the two scans using these programs. Because the CT scan of the Infinia Hawkeye 4 provides limited anatomical information, we transferred diagnostic CT images as DICOM files from PACS and converted them from 2D to 3D after image registration on the Xeleris workstation of the Hawkeye 4. Results & Conclusion: Using the Volumetrix Suite programs, we were able to acquire more accurate anatomical information through 3D rendering, which can distinguish both the location and the extent of focal lesions on the Infinia Hawkeye 4. The results indicate that anatomical imaging in nuclear medicine can be improved by the additional diagnostic information these programs provide.
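The step of transferring a diagnostic CT series from PACS and assembling it for registration and 3D rendering can be sketched with pydicom; the directory name and the use of pydicom (rather than the Xeleris workstation tools named above) are assumptions for illustration.

```python
# A minimal sketch of loading a CT series exported from PACS into a 3D volume
# ready for registration and rendering. The ./ct_series path is hypothetical.
import glob
import numpy as np
import pydicom

files = sorted(glob.glob("ct_series/*.dcm"),
               key=lambda f: float(pydicom.dcmread(f, stop_before_pixels=True)
                                   .ImagePositionPatient[2]))  # sort by slice z
slices = [pydicom.dcmread(f) for f in files]
volume = np.stack([s.pixel_array for s in slices]).astype(np.float32)
# Rescale to Hounsfield units using the DICOM rescale tags.
volume = volume * float(slices[0].RescaleSlope) + float(slices[0].RescaleIntercept)
print(volume.shape)  # (num_slices, rows, cols), ready for fusion/3D rendering
```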


Retrieving Volcanic Ash Information Using COMS Satellite (MI) and Landsat-8 (OLI, TIRS) Satellite Imagery: A Case Study of Sakurajima Volcano (천리안 위성영상(MI)과 Landsat-8 위성영상(OLI, TIRS)을 이용한 화산재 정보 산출: 사쿠라지마 화산의 사례연구)

  • Choi, Yoon-Ho;Lee, Won-Jin;Park, Sun-Cheon;Sun, Jongsun;Lee, Duk Kee
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.5_1
    • /
    • pp.587-598
    • /
    • 2017
  • Volcanic ash consists of fine particles smaller than 2 mm in diameter. It falls after a volcanic eruption and causes various damage to transportation, manufacturing, and the respiration of living things, so information on the diffusion of volcanic ash is highly significant for preventing such damage. Satellites are well suited to observing widely diffusing volcanic ash. In this study, ash diffusion information for two eruptions of Mt. Sakurajima was calculated using the geostationary Communication, Ocean and Meteorological Satellite (COMS) Meteorological Imager (MI) and the polar-orbiting Landsat-8 Operational Land Imager (OLI) and Thermal InfraRed Sensor (TIRS). The direction and velocity of ash diffusion were analyzed by extracting the volcanic ash pixels from COMS-MI images, and the height was retrieved by applying the shadow method to Landsat-8 images. Comparing the results of this study with those of the Volcanic Ash Advisory Center (VAAC), the ash tended to diffuse in the same direction in both cases, but the diffusion velocity was about four times slower than the VAAC information. Moreover, VAAC provides only a single ash height, while our study produced a variety of height information across the ash diffusion area. The reason for the different results is the measurement location: VAAC produces approximate ash information around the volcanic crater for rapid response, while we analyzed the whole diffusion area using ash-observed images. Measuring ash diffusion is important when a large-scale eruption occurs around the Korean peninsula, and the approach of this study can be used to produce varied ash information over the diffusion area using satellite images with different characteristics.
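The shadow method mentioned above reduces to simple trigonometry: the plume-top height follows from the length of the shadow the plume casts and the solar elevation at acquisition time. A minimal sketch with illustrative values:

```python
# A minimal sketch of the shadow method for plume-height retrieval.
# The shadow length and sun elevation below are invented for illustration.
import math

def plume_height(shadow_length_m: float, sun_elevation_deg: float) -> float:
    """Height (m) of the feature casting a shadow of the given length."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

# Example: a 4,000 m shadow under a 35 degree sun implies a ~2,800 m plume top.
print(round(plume_height(4000, 35)))  # ~2801
```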

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.107-120
    • /
    • 2014
  • This study proposes a novel recommender system using structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is the most popular recommendation algorithm, it has some limitations, including the scalability and sparsity problems. The scalability problem arises when the number of users and items becomes very large: CF cannot scale up because of the computation time needed to find neighbors in the user-item matrix as users and items accumulate on real-world e-commerce sites. Sparsity is a common problem of most recommender systems because users generally rate only a small portion of all items; the cold-start problem is the special case in which users or items newly added to the system have no ratings at all. When users' preference data is sparse, two users or items are unlikely to have common ratings, so CF ends up predicting ratings from a very limited number of similar users, and it may produce biased recommendations because similarity weights are estimated from only a small portion of the rating data. This study points out a further limitation of conventional CF: it does not consider qualitative and emotional information about users in the recommendation process, because it utilizes only the preference scores of the user-item matrix. To address this limitation, this study proposes a cluster-indexing CF model based on structural hole analysis. In general, a structural hole is a location that connects two separate actors in a network without any redundant connections. The actor who occupies a structural hole can easily access non-redundant, varied, and fresh information; such an actor tends to be important in the focal network and representative of the focal subgroup, so his or her characteristics may represent the general characteristics of the users in that subgroup. In this sense, structural hole analysis lets us distinguish the friends and strangers of a focal user. This study uses structural hole analysis to select structural holes in subgroups as initial seeds for a cluster analysis. First, we gather users' preference ratings for items and their social network information, using a data collection system developed for this research. Then we perform structural hole analysis to find the structural holes of the social network, and use them as cluster centroids for the clustering algorithm. Finally, the study makes recommendations using CF within each user's cluster and compares the recommendation performance against comparative models. The experiments combine the results of two stages: the first is the structural hole analysis, for which this study employs UCINET version 6, a software package for analyzing social network data; the second performs the modified clustering and CF using the clustering result, with an experimental system developed in VBA (Visual Basic for Applications) in Microsoft Excel 2007.
For the modified clustering experiment, the clustering is based on a similarity measure defined as the Pearson correlation between user preference rating vectors, and the CF experiment uses the 'all-but-one' approach. To validate the effectiveness of the proposed model, we apply three comparative types of CF models to the same dataset. The experimental results show that the proposed model outperforms the comparative models; in particular, it performs significantly better than the two comparative models with cluster analysis according to the statistical significance test. However, the difference between the proposed model and the naive model is not statistically significant.
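The CF step inside each cluster can be sketched as user-based prediction with Pearson similarity, the measure named above; the tiny ratings matrix is hypothetical, with 0 marking an unrated item.

```python
# A minimal sketch of user-based CF with Pearson similarity over co-rated items.
import numpy as np

R = np.array([[5, 3, 0, 1],    # hypothetical user-item ratings; 0 = unrated
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def pearson(u, v):
    mask = (u > 0) & (v > 0)   # compare only items both users rated
    if mask.sum() < 2:
        return 0.0
    a, b = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float(a @ b / denom) if denom else 0.0

def predict(user, item):
    """Mean-centered weighted average of neighbors' ratings for the item."""
    mu = R[user][R[user] > 0].mean()
    num = den = 0.0
    for v in range(len(R)):
        if v == user or R[v, item] == 0:
            continue
        w = pearson(R[user], R[v])
        num += w * (R[v, item] - R[v][R[v] > 0].mean())
        den += abs(w)
    return mu + num / den if den else mu

print(round(predict(0, 2), 2))  # predicted rating of user 0 for item 2
```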