• Title/Summary/Keyword: sparse data matrix


3D Modeling and Inversion of Magnetic Anomalies (자력이상 3차원 모델링 및 역산)

  • Cho, In-Ky;Kang, Hye-Jin;Lee, Keun-Soo;Ko, Kwang-Beom;Kim, Jong-Nam;You, Young-June;Han, Kyeong-Soo;Shin, Hong-Jun
    • Geophysics and Geophysical Exploration / v.16 no.3 / pp.119-130 / 2013
  • We developed a method for inverting magnetic data to recover 3D susceptibility models. The major difficulties in the inversion of potential-field data are non-uniqueness and the vast computing time. The number of data is small compared with the number of inversion blocks, which intensifies the non-uniqueness problem, and magnetic data have inherently poor depth resolution. To overcome this non-uniqueness, we propose a resolution-based model constraint that imposes a large penalty on model parameters with good resolution and a small penalty on model parameters with poor resolution. With this constraint, poorly resolved model parameters can be recovered effectively. Moreover, the wavelet transform and parallel solving were introduced to reduce the computing time: through the wavelet transform, the large system matrix is compressed into a sparse matrix and solved with a parallel linear equation solver, which greatly reduces the computing time of 3D magnetic inversion. The developed inversion algorithm was applied to synthetic data for typical magnetic anomaly models and to real airborne data acquired over the Geumsan area of Korea.
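To make the wavelet compression step concrete, here is a minimal sketch, assuming PyWavelets and SciPy; the wavelet choice (Haar) and the relative threshold are illustrative assumptions, not the authors' settings, and the solve step in the wavelet domain is omitted.

```python
# Sketch: compress a dense system (sensitivity) matrix with a 2D wavelet
# transform, zero out small coefficients, and store the result sparsely.
import numpy as np
import pywt
from scipy.sparse import csr_matrix

def wavelet_sparsify(G, wavelet="haar", rel_threshold=1e-3):
    """Return a sparse wavelet-domain version of the dense matrix G."""
    coeffs = pywt.wavedec2(G, wavelet)          # multilevel 2D DWT
    arr, slices = pywt.coeffs_to_array(coeffs)  # flatten coefficients
    arr[np.abs(arr) < rel_threshold * np.abs(arr).max()] = 0.0
    return csr_matrix(arr), slices              # mostly zeros -> sparse

# A smooth toy kernel compresses to a small fraction of nonzeros
G = np.exp(-40 * np.abs(np.subtract.outer(np.linspace(0, 1, 256),
                                          np.linspace(0, 1, 512))))
Gs, _ = wavelet_sparsify(G)
print(f"nonzero density after thresholding: {Gs.nnz / np.prod(Gs.shape):.3%}")
```

In the paper's setting, the sparse system would then be handed to a parallel linear equation solver; the speedup comes from operating on the few retained coefficients rather than the full dense matrix.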

Assessment of Changed Input Modules with SMOKE Model (SMOKE 모델의 입력 모듈 변경에 따른 영향 분석)

  • Kim, Ji-Young;Kim, Jeong-Soo;Hong, Ji-Hyung;Jung, Dong-Il;Ban, Soo-Jin;Lee, Yong-Mi
    • Journal of Korean Society for Atmospheric Environment / v.24 no.3 / pp.284-299 / 2008
  • Emission input modules were developed to produce emission input data and to modify profiles for the Sparse Matrix Operator Kernel Emissions (SMOKE) system, using activity data from the Clean Air Policy Support System (CAPSS) and previous studies. In particular, this study focused on improving the chemical speciation and temporal allocation profiles of SMOKE. First, SCC code mapping was performed: 579 CAPSS SCC codes were matched to EPA codes. Temporal allocation profiles were updated using CAPSS monthly activity data, and chemical speciation profiles were replaced based on the studies of Kang et al. (2000), Lee et al. (2005), and Kim et al. (2005). To analyze the effect of the changed SMOKE input modules, a simulation of the Seoul Metropolitan Area (Seoul, Incheon, Gyeonggi) was performed with the MM5, SMOKE, and CMAQ modeling system. Emission model results produced with the new input modules changed slightly compared with those using EPA's default modules. The SMOKE outputs show that aldehyde emissions decreased by 4.78% after changing the chemical profiles and increased by 0.85% after implementing the new temporal profiles; toluene emissions likewise decreased by 18.56% with the new chemical speciation profiles and increased by 0.67% with the new temporal profiles. Simulated air quality was also slightly elevated with the new input modules. Continued accumulation of domestic data and further studies on input systems for air quality modeling should yield better air quality predictions.
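Temporal allocation in SMOKE amounts to multiplying an annual total by profile fractions; the following sketch shows the idea with made-up monthly factors for a hypothetical SCC (none of these numbers come from the paper).

```python
# Toy monthly temporal allocation: split an annual emission total into
# months using profile fractions that sum to 1. All values are invented.
monthly_factors = {
    "Jan": 0.10, "Feb": 0.09, "Mar": 0.08, "Apr": 0.07, "May": 0.07,
    "Jun": 0.07, "Jul": 0.08, "Aug": 0.08, "Sep": 0.08, "Oct": 0.09,
    "Nov": 0.09, "Dec": 0.10,
}
assert abs(sum(monthly_factors.values()) - 1.0) < 1e-6  # sanity check

annual_voc_tons = 1200.0  # hypothetical annual VOC total for one source
monthly = {m: annual_voc_tons * f for m, f in monthly_factors.items()}
print(monthly["Jan"])  # 120.0 tons allocated to January
```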

Korea Emissions Inventory Processing Using the US EPA's SMOKE System

  • Kim, Soon-Tae;Moon, Nan-Kyoung;Byun, Dae-Won W.
    • Asian Journal of Atmospheric Environment / v.2 no.1 / pp.34-46 / 2008
  • Emissions inputs for use in air quality modeling of Korea were generated from the emissions inventory data of the National Institute of Environmental Research (NIER), maintained under the Clean Air Policy Support System (CAPSS) database. Source Classification Codes (SCC) in the Korea emissions inventory were adapted for use with the U.S. EPA's Sparse Matrix Operator Kernel Emissions (SMOKE) by finding the best-matching SMOKE default SCCs for chemical speciation and temporal allocation. A set of 19 surrogate spatial allocation factors for South Korea was developed utilizing the Multi-scale Integrated Modeling System (MIMS) Spatial Allocator and Korean GIS databases. The mobile and area source emissions data, after temporal allocation, show typical sinusoidal diurnal variations with high peaks during daytime, while point source emissions show weak diurnal variations. The model-ready emissions are speciated for the carbon bond version 4 (CB-4) chemical mechanism. Volatile organic compound (VOC) emissions from painting-related industries in the area source category contribute significantly to TOL (toluene) and XYL (xylene) emissions. ETH (ethylene) emissions come largely from industrial incineration facilities in the point source category and from various mobile sources. On the other hand, a large portion of OLE (olefin) emissions is speciated from mobile sources, in addition to the contribution of the polypropylene industry among point sources. FORM (formaldehyde) is mostly emitted from the petroleum industry and heavy-duty diesel vehicles. Chemical speciation of PM2.5 emissions shows that PEC (primary fine elemental carbon) and POA (primary fine organic aerosol) are the most abundant species from diesel and gasoline vehicles. To reduce the uncertainties introduced by mapping Korean SCCs to U.S. SCCs, it would be practical to develop and use domestic source profiles for the top 10 SCCs for area and point sources and the top 5 SCCs for on-road mobile sources when VOC emissions from those sources exceed 90% of the total.
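Chemical speciation works the same way at the species level: a lumped VOC total is split into mechanism species using profile fractions. The sketch below uses invented CB-4 mass fractions purely for illustration; real profiles also involve mole-based conversions that are omitted here.

```python
# Toy CB-4 speciation: split a VOC total into lumped model species.
# The profile fractions below are invented, not from SMOKE or the paper.
cb4_profile = {"TOL": 0.18, "XYL": 0.12, "OLE": 0.08,
               "ETH": 0.05, "FORM": 0.02, "PAR": 0.55}  # mass fractions
voc_total = 50.0  # hypothetical VOC emission in tons/day
speciated = {sp: voc_total * frac for sp, frac in cb4_profile.items()}
print(speciated["TOL"])  # 9.0 tons/day assigned to toluene (TOL)
```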

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.10 / pp.1919-1926 / 2016
  • Because of their enormous processing power, general-purpose graphics processing units (GPGPU) have been applied to many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been some efforts to apply GPGPU to the SAT algorithm, but the algorithm is difficult to parallelize because of its characteristics. In this paper, I apply GPGPU to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity of the propagation routine to sparse matrix multiplication, a data structure for the SAT problem is constructed and a parallel propagation routine is described. To prevent data loss between parallel threads, atomic operations are exploited. Experimental results on benchmark SAT problems show that the proposed algorithm is superior to a previous GPGPU-based SAT solver.
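The analogy to sparse matrix multiplication can be seen in a CSR-style layout of the clause-literal structure. The following sequential Python sketch shows that layout and one pass of unit propagation; the GPU kernel and the paper's atomic operations are not reproduced here.

```python
# CSR-like clause storage: one flat literal array plus clause offsets,
# mirroring the values/row-pointer arrays of a sparse matrix.
import numpy as np

clauses = [[1, -2], [2, 3], [-1, 3]]  # (x1 v ~x2) & (x2 v x3) & (~x1 v x3)
lits = np.array([l for c in clauses for l in c])      # "values"
offsets = np.cumsum([0] + [len(c) for c in clauses])  # "row pointers"

def propagate(assignment):
    """One sequential pass of unit propagation; returns implied literals."""
    implied = []
    for i in range(len(clauses)):
        body = lits[offsets[i]:offsets[i + 1]]
        unassigned = [l for l in body if assignment.get(abs(l)) is None]
        satisfied = any(assignment.get(abs(l)) == (l > 0) for l in body)
        if not satisfied and len(unassigned) == 1:    # unit clause found
            implied.append(int(unassigned[0]))
    return implied

print(propagate({1: False}))  # x1=False makes (x1 v ~x2) unit -> [-2]
```

On a GPU, each thread would scan one clause in parallel, and the shared implied-literal list would be appended to with atomic operations so that no updates are lost between threads.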

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by traders' expectations, studies have attempted to predict stock price movements by analyzing various sources of text data. Such research has examined not only the relationship between text data and stock price fluctuations, but also trading strategies based on news articles and social media responses. Studies predicting stock price movements typically apply classification algorithms to a term-document matrix, as in other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building the term-document matrix: words with very low frequency or importance are removed, and words are selected according to how much they contribute to correctly classifying a document. The conventional approach collects all documents to be analyzed and selects the influential words from the whole collection. In this study, we instead analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words; we then extract the words surrounding each neutral word and use them to build the term-document matrix. The underlying idea is that stock movements are only weakly related to the presence of a neutral word itself, while the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then fed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, excluding words that also appear in news articles about other stocks. Through an online news portal, we collected four months of news articles (2016/02/01 ~ 2016/05/31) on the top 10 stocks by market capitalization, a period with 80 trading days; the first 60 days were used as the training set and the remaining 20 days as the test set, with each model predicting the next day's stock price movement. SVM, boosting, and random forest were used to build the models. The proposed neutral-word-based algorithm showed better classification performance than word selection based on sparsity. The suggested method differs from conventional word extraction in that it uses not only the news articles about the corresponding stock but also news about other stocks to determine which words to remove: it removes both the words that appear regardless of rises and falls and the words that appear commonly in news about other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy.
A limitation of this study is that stock price prediction was framed as classifying rises and falls, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, investment performance is difficult to demonstrate because price fluctuations and profit rates may differ. Further research should therefore use more stocks and predict yields through trading simulation.
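For readers unfamiliar with the setup, the following sketch shows the generic term-document classification pipeline the paper builds on; the neutral-word context extraction that constitutes the paper's contribution is not reproduced, and the documents and labels are toy data.

```python
# Generic term-document matrix + SVM classification of next-day movement.
# Documents and labels are invented, not the collected news corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

docs = ["earnings beat expectations", "regulator fines the company",
        "new product launch announced", "factory accident reported"]
labels = [1, 0, 1, 0]  # 1 = price rose the next day, 0 = price fell

vectorizer = CountVectorizer()            # builds the term-document matrix
X = vectorizer.fit_transform(docs)        # sparse document-term counts
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(vectorizer.transform(["company beats earnings forecast"])))
```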

Collaborative Filtering using Co-Occurrence and Similarity information (상품 동시 발생 정보와 유사도 정보를 이용한 협업적 필터링)

  • Na, Kwang Tek;Lee, Ju Hong
    • Journal of Internet Computing and Services / v.18 no.3 / pp.19-28 / 2017
  • Collaborative filtering (CF) interprets the relationship between users and products and recommends products to a specific user. The CF model is advantageous in that it can recommend products using only rating data, without additional information such as content. However, users often do not give a rating even after consuming a product, and each user consumes only a small portion of all products. The number of observed ratings is therefore very small and the user rating matrix is very sparse, and this sparsity limits CF performance. In this paper, we concentrate on improving the performance of the latent factor model (specifically SVD). We propose a new model that incorporates product similarity information and co-occurrence information into SVD. The similarity and co-occurrence information obtained from the rating data increased the expressiveness of the latent space: recall increased by 16%, while precision and NDCG increased by 8% and 7%, respectively. We expect the proposed method to outperform existing methods when combined with other recommender systems in the future.
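A minimal latent-factor baseline of the kind the paper extends can be sketched with a truncated SVD on a sparse rating matrix; the similarity and co-occurrence terms that constitute the paper's contribution are not included, and the ratings below are toy data.

```python
# Plain truncated-SVD latent factor model on a sparse rating matrix.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy 4-user x 5-item ratings; zeros mark unobserved entries
R = csr_matrix(np.array([[5, 3, 0, 1, 0],
                         [4, 0, 0, 1, 0],
                         [1, 1, 0, 5, 4],
                         [0, 0, 5, 4, 0]], dtype=float))
U, s, Vt = svds(R, k=2)            # rank-2 factorization of the ratings
R_hat = U @ np.diag(s) @ Vt        # dense matrix of predicted scores
print(round(R_hat[0, 2], 2))       # predicted score of user 0 for item 2
```

The paper's model would add terms derived from item co-occurrence and similarity to this latent space; the sketch only establishes the SVD baseline it starts from.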

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.107-120 / 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is the most popular recommendation algorithm, it has limitations including scalability and sparsity problems. The scalability problem arises when the numbers of users and items become very large: CF cannot scale up because of the large computation time required to find neighbors in the user-item matrix as real-world e-commerce sites grow. Sparsity is a common problem of most recommender systems because users generally evaluate only a small portion of all items; the cold-start problem is the special case in which newly added users or items have no ratings at all. When preference data are sparse, two users or items are unlikely to share common ratings, so CF must predict ratings from a very limited number of similar users, and it may produce biased recommendations because similarity weights are estimated from only a small portion of the rating data. This study also points out a further limitation of conventional CF: it does not consider qualitative and emotional information about users, because it utilizes only the preference scores of the user-item matrix. To address this limitation, we propose a cluster-indexing CF model based on structural hole analysis. In general, a structural hole is a location that connects two separate actors without any redundant connections in the network. An actor occupying a structural hole can easily access non-redundant, varied, and fresh information; such an actor may be an important person in the network and representative of the focal subgroup, so his or her characteristics may represent the general characteristics of the users in that subgroup. In this sense, structural hole analysis lets us distinguish friends and strangers of the focal user. This study uses structural hole analysis to select structural holes in subgroups as initial seeds for a cluster analysis. First, we gather users' preference ratings for items and their social network information using a data collection system developed for this purpose. Then, we perform structural hole analysis to find the structural holes of the social network and use them as cluster centroids for the clustering algorithm. Finally, recommendations are made using CF within each user's cluster, and the recommendation performance is compared with that of comparative models. The experiments consist of two parts. The first is the structural hole analysis, for which this study employs UCINET version 6, a software package for the analysis of social network data. The second performs the modified clustering and CF using the results of the cluster analysis, for which an experimental system was developed in VBA (Visual Basic for Applications) in Microsoft Excel 2007.
The clustering experiment is designed around a similarity measure, the Pearson correlation between user preference rating vectors, and the CF experiment uses the 'all-but-one' approach. To validate the effectiveness of the proposed model, three comparative types of CF models were applied to the same dataset. The experimental results show that the proposed model outperforms the comparative models. In particular, the proposed model performs significantly better than the two comparative models with cluster analysis according to the statistical significance test, although the difference between the proposed model and the naive model is not statistically significant.
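Since UCINET is a GUI package, a rough open-source analogue of the structural-hole step can be sketched with networkx, using Burt's constraint (a low constraint value suggests the node spans a structural hole). This mirrors the idea only; it is not the authors' UCINET procedure, and the karate-club graph is a stand-in network.

```python
# Find likely structural holes via Burt's constraint, then use them as
# initial cluster seeds, approximating the paper's first experiment.
import networkx as nx

G = nx.karate_club_graph()            # stand-in for the collected network
constraint = nx.constraint(G)         # Burt's constraint for each node
seeds = sorted(constraint, key=constraint.get)[:3]  # lowest = hole spanners
print(seeds)  # candidate structural holes -> centroids for clustering
```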

Estimation of Chemical Speciation and Temporal Allocation Factor of VOC and PM2.5 for the Weather-Air Quality Modeling in the Seoul Metropolitan Area (수도권 지역에서 기상-대기질 모델링을 위한 VOC와 PM2.5의 화학종 분류 및 시간분배계수 산정)

  • Moon, Yun Seob
    • Journal of the Korean earth science society / v.36 no.1 / pp.36-50 / 2015
  • The purpose of this study is to assign emission source profiles of volatile organic compounds (VOCs) and particulate matter (PM) for chemical speciation, and to correct the temporal allocation factors and the chemical speciation of source profiles according to the source classification code within the Sparse Matrix Operator Kernel Emissions (SMOKE) system for the Seoul metropolitan area. The chemical speciation from the VOC source profiles, such as gasoline, diesel vapor, coating, dry cleaning, and LPG, comprises 12 and 34 species for the Carbon Bond IV (CBIV) chemical mechanism and the Statewide Air Pollution Research Center 99 (SAPRC99) chemical mechanism, respectively. The chemical speciation of PM2.5 sources, such as soil, road dust, gasoline and diesel vehicles, industrial sources, municipal incinerators, coal-fired power plants, biomass burning, and marine sources, was allocated to five species: fine PM, organic carbon, elemental carbon, $NO_3^-$, and $SO_4^{2-}$. In addition, temporal profiles for point and line sources were obtained using the stack telemetry system (TMS) and hourly traffic flows in the Seoul metropolitan area for 2007. In particular, the temporal allocation factors for ozone modeling at point sources were estimated from the $NO_x$ emission inventories of the stack TMS data.
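A temporal allocation factor of the kind estimated here is essentially a normalized diurnal profile; the sketch below derives 24 hourly factors from toy hourly NOx observations (the numbers are invented, not the 2007 TMS data).

```python
# Derive hourly temporal allocation factors from observed hourly values
# by averaging over days and normalizing so the 24 factors sum to 1.
import numpy as np

# Toy hourly NOx observations (kg/h) over two days, shape (2, 24)
obs = np.array([
    [2, 2, 2, 2, 3, 5, 8, 9, 9, 8, 8, 8, 8, 8, 8, 9, 9, 8, 6, 4, 3, 3, 2, 2],
    [2, 2, 2, 3, 3, 5, 8, 9, 9, 8, 8, 7, 8, 8, 8, 9, 9, 8, 6, 4, 3, 3, 2, 2],
], dtype=float)
diurnal = obs.mean(axis=0)             # mean diurnal profile
factors = diurnal / diurnal.sum()      # hourly allocation factors
print(round(factors.sum(), 6), factors.argmax())  # ~1.0 and the peak hour
```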