1. INTRODUCTION
The spatial resolution of a large-aperture ground-based solar telescope is degraded by atmospheric turbulence. Raw observational images from the telescope suffer from image motion, blurring, and geometrical distortions. Many post-processing reconstruction methods have been developed to enhance the spatial resolution of observational images, such as shift-and-add (SAA) (Li et al. 2014; Sudo et al. 2006), phase diversity (Gonsalves 1982; Paxman et al. 1992), and speckle masking (Lohmann et al. 1983; von der Lühe 1993). Because of its reliable phase reconstruction, speckle masking is the most commonly used method, even though it is computationally expensive. The speckle masking method was applied to solar image reconstruction for the New Vacuum Solar Telescope (NVST) (Li et al. 2014). Near real-time reconstruction performance was achieved for the NVST on a high performance cluster using the Message Passing Interface (MPI). The parallel program we developed reconstructs one image of 2368×1920 pixels from one burst of 100 short-exposure images in about 48 seconds (Li et al. 2015). However, about one-third of the computation time was spent reconstructing the subimages in such a speckle reconstruction. Real-time image reconstruction can not only enhance the efficiency of observational data analysis, but also relieve the pressure on data storage and processing. Therefore, to further improve the acceleration performance, it is necessary to parallelize and speed up subimage reconstruction.
Some aspects of using speckle masking and MPI to reconstruct solar images in near real time have already been studied by Denker et al. (2001), Wöger et al. (2008), and Cao et al. (2010). However, there is little literature on the parallelization of the speckle masking algorithm itself. Only Wöger & Ferayorni (2012) studied the acceleration of the speckle masking algorithm for the reconstruction of one solar subimage on Graphics Processing Units (GPUs). Upon analyzing the speckle masking algorithm in subimage reconstruction, we found that the average bispectrum computation is the most time-consuming step. The algorithm described in this paper takes a different approach and accelerates the average bispectrum computation using Open Multi-Processing (OpenMP). OpenMP is a popular solution for shared-memory parallel programming and has been widely used in real-time astronomical data processing. For example, the Giant Metrewave Radio Telescope (GMRT) used OpenMP to accelerate the flagging and calibration of radio interferometric data (Jayanti & Jayaram 2012). The parallel computation of the nonlinear force-free field extrapolation of the coronal magnetic field, using data obtained by the Hinode satellite, was realized through OpenMP on a multi-CPU machine (He et al. 2011).
In this paper, we report on a parallel implementation of the speckle masking algorithm for solar subimage reconstruction using OpenMP. Our method differs from that of Wöger & Ferayorni (2012) mainly in that it uses OpenMP to accelerate the average bispectrum computation, which significantly improves the computation speed of solar subimage reconstruction. In Section 2, we describe the novel parallel method for the speckle masking algorithm. In Section 3, we present results and analysis. Finally, we present our conclusions and future work in Section 4.
2. THE NOVEL PARALLEL METHOD FOR SPECKLE MASKING ALGORITHM
In general, a high-resolution image is statistically reconstructed from one burst of at least 100 short-exposure images. Because speckle masking is only valid within a small region, the isoplanatic patch, one burst of at least 100 preprocessed images is divided into a number of partially overlapping subimage bursts of 100 × 256² pixels (Li et al. 2015). These subimage bursts are reconstructed separately into subimages using speckle masking, and all reconstructed subimages are then combined to form an entire high-resolution image (Liu et al. 2014). The speckle masking algorithm, based on Fourier phase reconstruction by use of the bispectrum, is defined by
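$$\langle I(\mathbf{u})\, I(\mathbf{v})\, I^{*}(\mathbf{u}+\mathbf{v}) \rangle = O(\mathbf{u})\, O(\mathbf{v})\, O^{*}(\mathbf{u}+\mathbf{v})\, \langle S(\mathbf{u})\, S(\mathbf{v})\, S^{*}(\mathbf{u}+\mathbf{v}) \rangle$$
where u and v are two-dimensional spatial frequencies, I and O are the Fourier transforms of a short-exposure image and of the object, S is the instantaneous optical transfer function, the asterisk denotes complex conjugation, and the angle brackets denote the average over the burst (the notation follows the standard formulation of Lohmann et al. 1983). The left-hand side represents the average speckle masking bispectrum, and the averaged term on the right-hand side denotes the average speckle masking transfer function. Once the average bispectrum has been obtained, the object's phase can be recovered from it. A detailed description of the technical parts of the phase reconstruction method is given by Pehlemann & von der Lühe (1989).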
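The bispectrum product of a single frame and its average over a burst can be sketched in C as follows. This is a minimal serial illustration, not the code used in this work: the function names, the row-major layout of the spectra, and the periodic wrap-around of frequency indices are all assumptions made for the example.

```c
#include <complex.h>
#include <stddef.h>

/* Wrap a (possibly negative) frequency index into [0, n). */
static int wrap_index(int k, int n)
{
    int r = k % n;
    return r < 0 ? r + n : r;
}

/* Value of an n x n row-major spectrum at 2D frequency (kx, ky).
   Periodic indexing is an assumption of this sketch. */
static double complex spectrum_at(const double complex *spec, int n,
                                  int kx, int ky)
{
    return spec[(size_t)wrap_index(ky, n) * n + wrap_index(kx, n)];
}

/* Bispectrum of one frame at (u, v): I(u) I(v) I*(u+v). */
double complex bispectrum_element(const double complex *spec, int n,
                                  int ux, int uy, int vx, int vy)
{
    return spectrum_at(spec, n, ux, uy)
         * spectrum_at(spec, n, vx, vy)
         * conj(spectrum_at(spec, n, ux + vx, uy + vy));
}

/* Average of the bispectrum element over `frames` spectra stored
   back to back in `burst`. */
double complex average_bispectrum_element(const double complex *burst,
                                          int frames, int n,
                                          int ux, int uy, int vx, int vy)
{
    double complex sum = 0.0;
    for (int f = 0; f < frames; ++f)
        sum += bispectrum_element(burst + (size_t)f * n * n, n,
                                  ux, uy, vx, vy);
    return sum / frames;
}
```

Evaluating `average_bispectrum_element` for every retained pair (u, v) yields the four-dimensional average bispectrum, which is why this step dominates the runtime for a burst of 100 frames.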
OpenMP is an open standard for portable high performance computing. With the OpenMP standard, the traditional programming problem of load balancing is easy to solve, because OpenMP handles it automatically and we need not consider which part of the calculation should be done on which CPU core (Chapman et al. 2007). In our work, we propose a novel parallel method for the speckle masking algorithm based on OpenMP. In the OpenMP model, multiple threads accelerate the computationally intensive and time-consuming code, while a single thread continues to execute the serial code. Upon analyzing the speckle masking algorithm, we find that the computation of the average bispectrum is the most time-consuming process. Therefore, the main functions of the algorithm, namely the batched two-dimensional (2D) Fourier spectrum shift, the batched 2D spectrum subimage extraction, and the five-dimensional (5D) bispectrum computation, are implemented in C with OpenMP multi-threading. The flow chart of the novel parallel method for the speckle masking algorithm based on OpenMP is shown in Figure 1. The functions are executed one after another, and inside each parallel region a function is performed by multiple threads simultaneously and independently.
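As an illustration of this model, the batched 2D spectrum shift could be written as a loop over frames that OpenMP distributes among threads. The sketch below is an assumption about how such a function might look, not the implementation used here: the name `batched_spectrum_shift`, the row-major layout, and the restriction to even frame sizes are chosen for the example.

```c
#include <complex.h>
#include <stddef.h>

/* Move the zero frequency of every n x n spectrum in a batch to the
   centre of its frame. Frames are independent of each other, so the
   outer loop is shared among the OpenMP threads.
   Assumptions: n is even, and `in` and `out` do not overlap. */
void batched_spectrum_shift(const double complex *in, double complex *out,
                            int frames, int n)
{
    const int half = n / 2;

    #pragma omp parallel for
    for (int f = 0; f < frames; ++f) {
        const double complex *src = in + (size_t)f * n * n;
        double complex *dst = out + (size_t)f * n * n;
        for (int y = 0; y < n; ++y) {
            const int sy = (y + half) % n;
            for (int x = 0; x < n; ++x) {
                const int sx = (x + half) % n;
                dst[(size_t)sy * n + sx] = src[(size_t)y * n + x];
            }
        }
    }
}
```

The pragma takes effect only when OpenMP is enabled at compile time (for example with `-fopenmp` in GCC); otherwise the compiler ignores it and the same code runs serially, which makes it easy to compare the two variants.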
Figure 1. The flow chart of the OpenMP-based speckle masking implementation on a shared memory machine.
One subimage burst is composed of a batch of 100 2D subimages of 256×256 pixels. The parallel reconstruction of one subimage consists of the following steps, and the pseudo code for subimage reconstruction using speckle masking based on OpenMP is shown in Figure 2. Firstly, a single thread preprocesses the subimage burst and executes the batched 2D FFT operation, which yields a batch of 100 2D spectra. Secondly, the outer FOR loop controls the statistical sample size within one subimage reconstruction. For the batched 2D spectrum shift and the batched 2D spectrum subimage extraction there are no data dependencies between iterations, so the loop iterations can be executed independently and each of these two functions is carried out by multiple threads simultaneously. Thirdly, instead of computing the average four-dimensional (4D) bispectrum directly as in the serial code, a 5D bispectrum is introduced, and multiple threads perform the 5D bispectrum computation and the average 5D bispectrum computation simultaneously. Fourthly, because the loop count is reduced from 100 to 10, the average 4D bispectrum can be obtained quickly from the average 5D bispectrum, and the object's phase is reconstructed from the average 4D bispectrum by a single thread. Finally, we obtain the object's amplitude using the method of Labeyrie (1970), and a single thread reconstructs the final subimage by an inverse Fourier transformation of the phase and the amplitude.
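One way to read the third and fourth steps is that the frames of a burst are split into groups, each group accumulates its own partial bispectrum (the extra dimension), and the partial results are then reduced to the average 4D bispectrum. The following sketch illustrates that reading only. The grouping scheme, the `BispecIndex` type, and the function name are hypothetical and are not taken from Figure 2.

```c
#include <complex.h>
#include <stdlib.h>

/* Hypothetical helper type: one bispectrum sample is identified by the
   flat positions of u, v and u+v within a single spectrum. */
typedef struct { int iu, iv, iuv; } BispecIndex;

/* Average bispectrum over `frames` spectra of `frame_len` values each.
   Frames are split into `groups` blocks; every block accumulates its own
   partial sum, so threads never write to the same memory.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int average_bispectrum_grouped(const double complex *spectra, int frames,
                               int frame_len, const BispecIndex *idx,
                               int n_idx, int groups, double complex *avg)
{
    double complex *partial = calloc((size_t)groups * n_idx, sizeof *partial);
    if (!partial)
        return -1;
    const int per_group = (frames + groups - 1) / groups;

    /* Stage 1: one partial bispectrum per group (the extra dimension). */
    #pragma omp parallel for
    for (int g = 0; g < groups; ++g) {
        const int first = g * per_group;
        const int last = first + per_group < frames ? first + per_group : frames;
        double complex *acc = partial + (size_t)g * n_idx;
        for (int f = first; f < last; ++f) {
            const double complex *s = spectra + (size_t)f * frame_len;
            for (int k = 0; k < n_idx; ++k)
                acc[k] += s[idx[k].iu] * s[idx[k].iv] * conj(s[idx[k].iuv]);
        }
    }

    /* Stage 2: reduce the partial sums; only `groups` terms per sample. */
    #pragma omp parallel for
    for (int k = 0; k < n_idx; ++k) {
        double complex sum = 0.0;
        for (int g = 0; g < groups; ++g)
            sum += partial[(size_t)g * n_idx + k];
        avg[k] = sum / frames;
    }

    free(partial);
    return 0;
}
```

With 100 frames and 10 groups, the final reduction loops over 10 partial sums instead of 100 frames, at the cost of a scratch buffer `groups` times the size of the averaged bispectrum.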
Figure 2. The pseudo code for subimage reconstruction using speckle masking based on OpenMP.
3. RESULTS AND ANALYSIS
To verify the correctness of our code, we used real solar observational data acquired with the NVST. The data consisted of a batch of 100 subimages of 256×256 pixels in the photospheric TiO channel. Figure 3a shows one raw subimage of a subimage burst before reconstruction. Figure 3b shows the subimage reconstructed in parallel using speckle masking with OpenMP; its field of view is slightly smaller than that of the raw image. The spatial resolution of the reconstructed subimage is significantly higher than that of the raw subimage. The reconstruction obtained with the multi-threaded parallel implementation is the same as that obtained with the single-threaded serial implementation. All experiments were conducted under Linux on a shared memory machine equipped with Intel Xeon E5-2620 CPUs at 2.00 GHz (16 CPU cores in total) and 32 GB of random access memory (RAM). For the tests, we used a burst of 100 subimages, and this data set was reconstructed with 166464 average 4D bispectrum values.
Figure 3. (a) One subimage of one subimage burst of 100 × 256² pixels before reconstruction. (b) The parallel reconstruction subimage using speckle masking based on OpenMP.
Table 1 compares the execution time of the various modules of one subimage reconstruction between the single-threaded serial implementation and the 10-thread parallel implementation. As shown in the table, the runtime of reconstructing one entire subimage in parallel is reduced to around 2.7 seconds, outperforming the single-threaded serial implementation by a factor of about 2.5. The most time-consuming module, the average bispectrum computation (comprising the 5D bispectrum computation, the average 5D bispectrum computation, and the average 4D bispectrum computation), is accelerated by a factor of about 4.6. The other two modules run about 8 to 10 times faster than in the single-threaded serial implementation. These comparisons show that the processing speed of solar subimage reconstruction, and of its individual reconstruction steps, is improved significantly.
Table 1. Time performance measurements for one subimage burst of 100 × 256² pixels between various modules with serial implementation and parallel implementation.
Figure 4 shows the execution time and the relative speedup ratio of one solar subimage reconstruction for 1 to 16 threads. As the number of threads increases, the execution time decreases and the relative speedup ratio increases. However, when the number of threads exceeds 10, the execution time saturates at a minimum of around 2.7 seconds (Figure 4a) and the speedup ratio at a maximum of around 5 (Figure 4b). The probable reason is that the parallel implementation still contains some serial code, and that there is some overhead associated with the creation, termination, and synchronization of threads. Therefore, for accelerating one subimage reconstruction in the OpenMP model, 10 threads on each processor may prove to be an optimal choice.
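The effect of the remaining serial code on the achievable speedup is commonly estimated with Amdahl's law, S(p) = 1 / ((1 - f) + f / p), where f is the fraction of the runtime that can be parallelized and p is the number of threads. The sketch below evaluates this law. The function names are chosen for the example, any value of f passed to them is a hypothetical input rather than a measurement from this work, and the law does not model the thread management overhead mentioned above.

```c
/* Ideal speedup predicted by Amdahl's law for a program in which
   `parallel_fraction` (0 <= f < 1) of the serial runtime can be spread
   over `threads` threads. Thread overhead is not modelled. */
double amdahl_speedup(double parallel_fraction, int threads)
{
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}

/* Upper bound of the speedup for an unlimited number of threads. */
double amdahl_limit(double parallel_fraction)
{
    return 1.0 / (1.0 - parallel_fraction);
}
```

For a hypothetical parallel fraction of 0.8, `amdahl_speedup(0.8, 10)` gives about 3.57 while `amdahl_limit(0.8)` is 5, which shows how a modest serial share alone is enough to flatten a speedup curve well before all cores are used.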
Figure 4. The execution time (Figure 4a) and relative speedup ratio (Figure 4b) of one solar subimage reconstruction versus the number of employed threads on each processor.
4. CONCLUSIONS
In this study, we designed and implemented a novel parallel method for the speckle masking reconstruction of solar subimages on a shared memory machine using OpenMP. Tests with real observational data were performed to verify the correctness of our code, and the details of the parallel reconstruction steps were presented. The multi-threaded parallel implementation of the various modules is significantly faster than the single-threaded serial implementation, and a speedup of about 2.5 is achieved for one subimage reconstruction. In addition, the timing results for reconstructing one subimage of 256×256 pixels in parallel show a clear advantage for a greater number of threads. The parallel computing results are reliable and can provide the foundation for reconstructing an entire solar image in real time.
The current parallel method accelerates the reconstruction of one subimage using OpenMP. In the future, the method will be ported to a high performance cluster with more computing nodes, so that all subimages can be accelerated in a hybrid MPI and OpenMP model. In summary, the novel parallel method should be of great value and contribute to the real-time reconstruction of an entire solar image.
REFERENCES
- Cao, W. D., Nicolas, G., Roy, C., et al. 2010, Nasmyth Focus Instrumentation of the New Solar Telescope at Big Bear Solar Observatory, SPIE, 7735, 77355V-1
- Chapman, B., Jost, G., & van der Pas, R. 2007, Using OpenMP: Portable Shared Memory Parallel Programming (Cambridge: MIT Press)
- Denker, C., Yang, G., & Wang, H. 2001, Near Real-Time Image Reconstruction, Solar Physics, 202, 63 https://doi.org/10.1023/A:1011886923189
- Gonsalves, R. A. 1982, Phase Retrieval and Diversity in Adaptive Optics, Optical Engineering, 21, 829 https://doi.org/10.1117/12.7972989
- He, H., Wang, H. N., & Yan, Y. H. 2011, Nonlinear Force-Free Field Extrapolation of the Coronal Magnetic Field Using the Data Obtained by the Hinode Satellite, J. Geophys. Res., 116, A01101-1
- Jayanti, P., & Jayaram, C. 2012, FLAGCAL: a Flagging and Calibration Package for Radio Interferometric Data, Experimental Astronomy, 33, 157 https://doi.org/10.1007/s10686-011-9279-5
- Labeyrie, A. 1970, Attainment of Diffraction Limited Resolution in Large Telescopes by Fourier Analysing Speckle Patterns in Star Images, A&A, 6, 85
- Li, X. B., Wang, F., Xiang, Y. Y., et al. 2014, Parallel Image Reconstruction for New Vacuum Solar Telescope, JKAS, 47, 43
- Li, X. B., Liu, Z., Wang, F., et al. 2015, High-Performance Parallel Image Reconstruction for the New Vacuum Solar Telescope, PASJ, 67, 47 https://doi.org/10.1093/pasj/psv018
- Liu, Z., Xu, J., Gu, B. Z., et al. 2014, New Vacuum Solar Telescope and Observations with High Resolution, RAA, 14, 705
- Lohmann, A. W., Weigelt, G., & Wirnitzer, B. 1983, Speckle Masking in Astronomy - Triple Correlation Theory and Applications, Applied Optics, 22, 4028 https://doi.org/10.1364/AO.22.004028
- Paxman, R. G., Schulz, T. J., & Fienup, J. R. 1992, Joint Estimation of Object and Aberrations by Using Phase Diversity, JOSAA, 9, 1072 https://doi.org/10.1364/JOSAA.9.001072
- Pehlemann, E., & von der Lühe, O. 1989, Technical Aspects of the Speckle Masking Phase Reconstruction Algorithm, A&A, 216, 337
- Sudo, Y., Baba, N., Miura, N., et al. 2006, Application of Self-Deconvolution Method to Shift-and-Add Solar Imaging, Applied Optics, 45, 2707 https://doi.org/10.1364/AO.45.002707
- von der Lühe, O. 1993, Speckle Imaging of Solar Small Scale Structure. I - Methods, A&A, 268, 374
- Wöger, F., & Ferayorni, A. 2012, Accelerated Speckle Imaging with the ATST Visible Broadband Imager, SPIE, 8451, 84511C-1
- Wöger, F., von der Lühe, O., & Reardon, K. 2008, Speckle Interferometry with Adaptive Optics Corrected Solar Data, A&A, 488, 375 https://doi.org/10.1051/0004-6361:200809894