Land Cover Classification Map of Northeast Asia Using GOCI Data

Son, Sanghun;Kim, Jinsoo;

doi:10.7780/kjrs.2019.35.1.6

Korean Journal of Remote Sensing (대한원격탐사학회지)

Volume 35, Issue 1, Pages 83-92, 2019; pISSN 1225-6161; eISSN 2287-9307

Korean Society of Remote Sensing (대한원격탐사학회)

Son, Sanghun (Division of Earth Environmental System Science, Pukyong National University) ;
Kim, Jinsoo (Department of Spatial Information Engineering, Pukyong National University)

Received : 2019.01.31
Accepted : 2019.02.12
Published : 2019.02.28

Abstract

Land cover (LC) is an important factor in socioeconomic and environmental studies. Many LC maps, including global land cover (GLC) datasets, have been produced using polar-orbiting satellite data. Because reference datasets for Northeast Asia are insufficient, several LC maps show discrepancies in that region. In this paper, we assessed the feasibility of LC mapping using Geostationary Ocean Color Imager (GOCI) data over Northeast Asia. The GOCI normalized difference vegetation index (NDVI) was used as the input dataset to produce the LC map, and a level-2 LC map of South Korea was used as the reference dataset to evaluate it. Seven LC types (urban, croplands, forest, grasslands, wetlands, barren, and water) were defined to reflect Northeast Asian LC. The LC map was produced via principal component analysis (PCA) with K-means clustering, followed by a sensitivity analysis. The overall accuracy was calculated to be 77.94%. Furthermore, to assess the accuracy of the LC map not only in South Korea but also across Northeast Asia, 6 GLC datasets (IGBP, UMD, GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30) were used as comparison datasets. The accuracy scores against the 6 GLC datasets were calculated to be 59.41%, 56.82%, 60.97%, 51.71%, 70.24%, and 72.80%, respectively. Therefore, this first attempt to produce an LC map using geostationary satellite data is considered acceptable.

1. Introduction

Information regarding land cover (LC) changes over time is essential for studying the functional and morpho-functional changes occurring in the global ecological, meteorological, and hydrological environments (Chen et al., 2015; Feddema et al., 2005; Son and Kim, 2018).

Remote sensing has long been recognized as an effective tool for broad-scale LC mapping and for generating the LC maps needed to understand human activity and the biogeographical diversity of the land surface (Chen et al., 2015; Zhang and Roy, 2017). As a result, a number of LC products, such as global-scale maps based on remote sensing data, have been developed at broad-scale resolution through the efforts of many scientific communities (Arino et al., 2008; Bartholomé and Belward, 2005; Bontemps et al., 2011; Friedl et al., 2002; Friedl et al., 2010; Hansen et al., 2000; Loveland and Belward, 1997; Loveland et al., 2000).

To date, several global land cover (GLC) datasets have been produced and widely applied in various fields. However, these datasets differ in their inputs, purposes of classification, classification methods, and classification systems (McCallum et al., 2006; Herold et al., 2008). Reported accuracies of GLC datasets range from 66% to more than 80% (Son and Kim, 2018). These GLC datasets nevertheless have some drawbacks in Northeast Asia, especially in South Korea, due to insufficient validation data and misclassification. According to Park and Suh (2014), the Moderate Resolution Imaging Spectroradiometer (MODIS) LC dataset (MOD12Q1 and MCD12Q1), the most widely used GLC dataset in the world, has many misclassifications in Northeast Asia. Relevant input datasets, classification methods, and classification systems are required to produce an LC map of Northeast Asia that appropriately reflects the Northeast Asian LC types.

All GLC datasets are produced using polar-orbiting satellite data. The disadvantage of polar-orbiting satellite imagery is that it is difficult to obtain data on the same region every day. Geostationary satellites, on the other hand, can obtain datasets over the same region every day. The purpose of this study is to assess the feasibility of LC mapping using Geostationary Ocean Color Imager (GOCI) data over Northeast Asia. The primary steps and contributions of this study are summarized as follows: (1) principal component analysis (PCA) was applied to the GOCI normalized difference vegetation index (NDVI) to select principal components (PCs); (2) K-means clustering was conducted using the PCs as input data to produce an unlabeled map; (3) through the sensitivity analysis, unlabeled classes were aggregated by LC type and the LC map was produced; and (4) to analyze the feasibility of LC mapping, accuracy assessments were conducted using the reference dataset.

2. Data and Methodology

1) Study area and data

Fig. 1 shows the study area, which comprises the Korean Peninsula, the Japanese Islands, and the eastern part of China in Northeast Asia (latitude: 24.75–47.25°N, longitude: 113.4–146.6°E). This area coincides with the target area of the GOCI.

Fig. 1. The research area of this study.

The datasets for the feasibility assessment of LC mapping using the GOCI data were divided into two groups: the GOCI data used as the input dataset to produce the LC map, and the reference datasets used to evaluate the LC map.

The first dataset was composed of the GOCI NDVIs, which were calculated from pre-processed bidirectional reflectance distribution function (BRDF) modeling over 16 days' worth of data (Fig. 2). In addition, to minimize null values caused by clouds and snow, a second composite was produced using the maximum value composite (MVC) over 7 days. To further minimize the effects of snow, the study period was set to run from May 15, 2013 to October 15, 2013. The western and southern parts of the study area, which have null values throughout the year, were masked for accurate LC mapping.

The second dataset consisted of a level-2 LC map used to assess the accuracy in South Korea (Fig. 3). In addition, to assess the accuracy of the LC map for LC types in Northeast Asia, we selected 6 GLC datasets (IGBP, UMD, GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30) for Northeast Asia (Fig. 4).

To produce an accurate LC map, LC types that appropriately reflect the LC of Northeast Asia must be defined. The GLC datasets widely used around the world are IGBP, UMD, GLC2000, GlobCover2009, and MCD12Q1. These datasets have different classification systems, with 14 to 22 LC types. In addition, the major classification system of the United States Geological Survey (USGS) defines 9 LC types (urban, croplands, grasslands, forest, water, wetlands, barren, tundra, and permafrost). The numbers of LC types used in the 5 GLC datasets are not appropriate for describing Northeast Asian land cover. Furthermore, among the USGS LC types, tundra and permafrost are not suitable for Northeast Asia. In this study, therefore, LC types were defined as 7 classes (urban, croplands, forest, grasslands, wetlands, barren, and water).
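The maximum value compositing applied to the GOCI NDVI can be illustrated with a minimal sketch. It assumes the standard NDVI definition, (NIR - red) / (NIR + red), and that null observations caused by clouds or snow are stored as `None`. The function names and sample values are hypothetical and are not taken from the GOCI processing chain.

```python
from typing import List, Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalized difference vegetation index; None when it is undefined."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def max_value_composite(series: List[Optional[float]]) -> Optional[float]:
    """Maximum value composite (MVC) for one pixel over a compositing window.

    Null observations (None) are skipped; the result is None only when
    every observation in the window is null.
    """
    valid = [value for value in series if value is not None]
    return max(valid) if valid else None


# Hypothetical 7-day NDVI series for one pixel; None marks cloud or snow.
week = [0.42, None, 0.55, None, 0.61, 0.58, None]
composite = max_value_composite(week)
```

Taking the maximum favors the clearest observation in the window, because cloud and snow contamination generally lowers NDVI.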

Fig. 2. GOCI NDVI used as an input dataset in this study, (a) Jan 16, (b) Feb 16, (c) Mar 16, (d) Apr 16, (e) May 16, (f) Jun 16, (g) Jul 16, (h) Aug 16, (i) Sep 16, (j) Oct 16, (k) Nov 16, (l) Dec 16.

Fig. 3. Level-2 LC map in South Korea with 7 classes used as reference datasets.

Fig. 4. The 6 GLC datasets aggregated into 7 classes and used as comparison datasets: (a) IGBP, (b) UMD, (c) GLC2000, (d) GlobCover2009, (e) MCD12Q1, and (f) GlobeLand30.

2) Methodology

Fig. 5 shows the flow chart used to assess the feasibility of LC mapping using the GOCI data over Northeast Asia. The input datasets for this study were composed of the GOCI NDVIs calculated via BRDF modeling. The BRDF model is given by Eq. (1) (Roujean et al., 1992).

\(\begin{aligned} R\left(\theta_{s}, \theta_{v}, \phi\right)=& K_{0}+K_{1} \cdot f_{1}\left(\theta_{s}, \theta_{v}, \phi\right)+\\ & K_{2} \cdot f_{2}\left(\theta_{s}, \theta_{v}, \phi\right) \end{aligned}\) (1)

Fig. 5. The flow chart in this study.

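f₁ and f₂ denote the geometric kernel (Eq. (2)) and the volumetric kernel (Eq. (3)), respectively, and represent geometric scattering and volumetric scattering at the surface. \(\theta_{s}\) and \(\theta_{v}\) are the solar and view zenith angles, the third argument is the relative azimuth angle between the sun and the sensor, \(K_{0}\), \(K_{1}\), and \(K_{2}\) are the model coefficients, and \(\zeta\) (Eq. (4)) is the phase angle between the sun and view directions.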

\(\begin{aligned} f_{1}\left(\theta_{s}, \theta_{v}, \phi\right)=& \frac{1}{2 \pi}[(\pi-\phi) \cos \phi+\sin \phi] \tan \theta_{s} \tan \theta_{v} \\ &-\frac{1}{\pi}\left(\tan \theta_{s}+\tan \theta_{v}+\sqrt{\tan ^{2} \theta_{s}+\tan ^{2} \theta_{v}-2 \tan \theta_{s} \tan \theta_{v} \cos \phi}\right) \end{aligned}\) (2)

\(\begin{aligned} f_{2}\left(\theta_{s}, \theta_{v}, \phi\right)=& \frac{4}{3 \pi} \cdot \frac{1}{\cos \theta_{s}+\cos \theta_{v}}\left[\left(\frac{\pi}{2}-\zeta\right) \cos \zeta+\sin \zeta\right]-\frac{1}{3} \end{aligned}\) (3)

\(\zeta=\arccos \left[\cos \theta_{v} \cos \theta_{s}+\sin \theta_{v} \sin \theta_{s} \cos \phi\right]\) (4)
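As a rough illustration of how Eqs. (1) to (4) fit together, the sketch below evaluates the two kernels and the modeled reflectance for a single geometry. It follows the kernel forms of Roujean et al. (1992), with all angles in radians and the third argument taken as the relative azimuth in [0, π]. The function names, coefficients, and angles are hypothetical and are not values used in this study.

```python
import math


def phase_angle(theta_s: float, theta_v: float, phi: float) -> float:
    """Eq. (4): phase angle between the sun and view directions (radians)."""
    cos_zeta = (math.cos(theta_v) * math.cos(theta_s)
                + math.sin(theta_v) * math.sin(theta_s) * math.cos(phi))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_zeta)))


def geometric_kernel(theta_s: float, theta_v: float, phi: float) -> float:
    """Eq. (2): geometric kernel f1, with phi taken in [0, pi]."""
    tan_s, tan_v = math.tan(theta_s), math.tan(theta_v)
    under_root = tan_s ** 2 + tan_v ** 2 - 2.0 * tan_s * tan_v * math.cos(phi)
    distance = math.sqrt(max(0.0, under_root))  # guard against tiny negatives
    first = ((math.pi - phi) * math.cos(phi) + math.sin(phi)) * tan_s * tan_v
    return first / (2.0 * math.pi) - (tan_s + tan_v + distance) / math.pi


def volumetric_kernel(theta_s: float, theta_v: float, phi: float) -> float:
    """Eq. (3): volumetric kernel f2."""
    zeta = phase_angle(theta_s, theta_v, phi)
    bracket = (math.pi / 2.0 - zeta) * math.cos(zeta) + math.sin(zeta)
    scale = 4.0 / (3.0 * math.pi) / (math.cos(theta_s) + math.cos(theta_v))
    return scale * bracket - 1.0 / 3.0


def brdf_reflectance(k0: float, k1: float, k2: float,
                     theta_s: float, theta_v: float, phi: float) -> float:
    """Eq. (1): reflectance as a weighted sum of the two kernels."""
    return (k0
            + k1 * geometric_kernel(theta_s, theta_v, phi)
            + k2 * volumetric_kernel(theta_s, theta_v, phi))


# Hypothetical coefficients; sun and sensor both at nadir.
r_nadir = brdf_reflectance(0.10, 0.05, 0.20, 0.0, 0.0, 0.0)
```

Both kernels vanish when the sun and the sensor are at nadir, so the modeled reflectance reduces to K₀ in that geometry. This is a convenient sanity check on any implementation.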

According to Knight et al. (2006), changes in water level over time lead to incorrect reflectance at infrared and red wavelengths at water–land boundaries, and infrared reflectance varies with the turbidity and depth of the water. In addition, it is difficult to spectrally differentiate between urban areas, suburban areas, grass cover, barren soil, and fallow areas (Lee and Lathrop, 2006). Therefore, to produce an accurate LC map, urban areas and water were masked in this study using GlobeLand30 and level-2 LC maps of South Korea. To select valuable data, the input datasets were reduced using PCA; we retained the PCs with an accumulated percentage of 99% or less. These PCs were used as input data for the K-means clustering algorithm. The result of the K-means clustering algorithm is a map of unlabeled classes, which becomes the LC map through the sensitivity analysis. The last step is an accuracy assessment of the LC map using the reference dataset.

PCA is a multivariate technique used to reduce large datasets. The goal of PCA is to extract the valuable data, called PCs, from the dataset. PCA is commonly used as a data reduction technique to determine a new dataset of orthogonal variables with minimum dimensions, ordered by variance (Han et al., 2004; Abdi and Williams, 2010). The results of the PCA consist of eigenvalues, eigen-percentages, and PCs, and the parameter of the PCA is the accumulated eigen-percentage. In this study, to select the most valuable data, the PCs were selected such that the accumulated eigen-percentage did not exceed 99% of the input dataset.
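The selection rule based on the accumulated eigen-percentage can be sketched as follows. The sketch assumes the eigenvalues have already been obtained from a PCA of the NDVI stack. The eigenvalues shown and the function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from typing import List, Sequence


def eigen_percentages(eigenvalues: Sequence[float]) -> List[float]:
    """Share of total variance (in percent) per PC, largest first."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in sorted(eigenvalues, reverse=True)]


def select_component_count(eigenvalues: Sequence[float],
                           limit: float = 99.0) -> int:
    """Number of leading PCs whose accumulated eigen-percentage stays <= limit."""
    accumulated = 0.0
    count = 0
    for percentage in eigen_percentages(eigenvalues):
        if accumulated + percentage > limit:
            break
        accumulated += percentage
        count += 1
    return count


# Hypothetical eigenvalues (not values from this study).
eigenvalues = [60.0, 25.0, 10.0, 3.0, 2.0]
n_components = select_component_count(eigenvalues)
```

With these illustrative eigenvalues, the accumulated percentages are 60, 85, 95, 98, and 100, so the first four PCs are retained and the last one, which would exceed the 99% limit, is dropped.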

K-means clustering is one of the most popular unsupervised classification techniques (Kanungo et al., 2000; Han et al., 2004). It is a data relocation technique that, starting from initial centroids, minimizes the distance between each data point and its cluster centroid, thereby partitioning n data points into k clusters (Kim et al., 2002). Parameters such as the number of classes, the number of iterations, and the threshold must be specified to run the K-means clustering algorithm. The parameters of this study were determined empirically: 40 classes, 100 iterations, a threshold of 0.95, a batch size of 100, and a seed value of 0.
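The relocation idea behind K-means can be sketched with plain Lloyd iterations, shown below. This is not the implementation used in the study: the threshold and batch-size parameters listed above are not modeled, and the sample points are hypothetical PC scores. Only the number of classes, the iteration limit, and the seed appear as arguments.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D PC scores forming two well-separated groups.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(scores, k=2, iterations=100, seed=0)
```

The cluster indices returned here carry no land cover meaning. As in the study, a separate labeling step is needed to turn them into LC types.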

3. Results and Discussion

The result of the K-means clustering algorithm is a map of unlabeled classes (Fig. 6); the initial number of classes was 40. High NDVI values were clustered into the blue areas of the unlabeled map, and low NDVI values into the red areas. The black areas of the unlabeled map represent pixels masked because of null values or the urban and water masks.

Fig. 6. The map of unlabeled classes in this study.

Through the sensitivity analysis, these classes were aggregated into LC types. To aggregate the unlabeled classes, the NDVI time series of the reference datasets were used as comparison data (Fig. 7). The NDVI time series of the reference datasets showed that the NDVI trend of forest areas was the highest, those of croplands and grasslands were similar, and those of urban, barren, and water areas were lower than those of croplands and grasslands. The NDVI trend of wetland areas was the lowest among all LC types.

Fig. 7. The NDVI trends of the reference dataset used as comparison data for the sensitivity analysis.
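One simple way to aggregate unlabeled classes against reference NDVI trends is to assign each cluster to the LC type with the closest NDVI time series. The sketch below uses Euclidean distance as an assumed stand-in. The exact comparison rule of the sensitivity analysis is not specified here, and all profile values and names are hypothetical.

```python
from typing import Dict, Sequence


def profile_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two NDVI time series of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def label_clusters(cluster_profiles: Dict[int, Sequence[float]],
                   reference_profiles: Dict[str, Sequence[float]]) -> Dict[int, str]:
    """Assign each unlabeled cluster to the LC type with the nearest NDVI trend."""
    return {
        cluster_id: min(
            reference_profiles,
            key=lambda lc: profile_distance(profile, reference_profiles[lc]),
        )
        for cluster_id, profile in cluster_profiles.items()
    }


# Hypothetical mean NDVI trends (three composite dates) per LC type.
reference = {
    "forest": [0.60, 0.80, 0.70],
    "croplands": [0.40, 0.60, 0.50],
    "barren": [0.10, 0.15, 0.10],
}
# Hypothetical mean NDVI trends of three unlabeled clusters.
clusters = {
    0: [0.62, 0.78, 0.71],
    1: [0.12, 0.14, 0.11],
    2: [0.38, 0.62, 0.48],
}
cluster_to_lc = label_clusters(clusters, reference)
```

Because several clusters can map to the same LC type, 40 unlabeled classes collapse naturally into a much smaller set of LC classes.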

Following the sensitivity analysis, in which the NDVI time series of the unlabeled map were compared with the NDVI trends of the reference dataset, the unlabeled classes were aggregated to produce the LC map (Fig. 8). Table 1 presents a confusion matrix of the LC map and the reference dataset. Cropland and forest areas were well classified. However, grassland areas were overestimated, and wetlands and barren areas were underestimated. Since urban and water areas were masked, the overall accuracy was calculated based on 5 LC types (croplands, forest, grasslands, wetlands, and barren areas). Because more than 90% of the South Korean LC is classified as croplands and forest, the overall accuracy was calculated as 77.94%.
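The overall accuracy reported from a confusion matrix is the share of pixels on the matrix diagonal. A minimal sketch, using hypothetical pixel labels rather than the values in Table 1, is:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def confusion_matrix(reference: Sequence[str],
                     mapped: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Counts of (reference class, mapped class) pairs over co-located pixels."""
    if len(reference) != len(mapped):
        raise ValueError("reference and mapped labels must have equal length")
    return Counter(zip(reference, mapped))


def overall_accuracy(matrix: Dict[Tuple[str, str], int]) -> float:
    """Percentage of pixels whose mapped class equals the reference class."""
    total = sum(matrix.values())
    correct = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return 100.0 * correct / total


# Hypothetical labels for ten pixels (not data from this study).
reference_labels = ["forest"] * 6 + ["croplands"] * 3 + ["grasslands"]
mapped_labels = ["forest"] * 5 + ["grasslands"] + ["croplands"] * 3 + ["grasslands"]

matrix = confusion_matrix(reference_labels, mapped_labels)
accuracy = overall_accuracy(matrix)
```

Overall accuracy weights every pixel equally, so classes that dominate the reference map, such as croplands and forest in South Korea, largely determine the score.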

Fig. 8. The LC map with 7 classes in Northeast Asia.

Table 1. Confusion matrix of the LC map and reference dataset

To assess the feasibility of the LC map for LC types in Northeast Asia, an accuracy assessment was performed using 6 GLC datasets: IGBP, UMD, GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30. As in the previous accuracy assessment, confusion matrices were used to evaluate the LC map in Northeast Asia. Table 2 shows the results for the 6 GLC datasets compared to the LC map. The overall accuracies compared to IGBP, UMD, GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30 were calculated to be 59.41%, 56.82%, 60.97%, 51.71%, 70.24%, and 72.80%, respectively. The overall accuracies for MCD12Q1, the most widely used dataset in the world, and GlobeLand30, the most recent land cover map with the best spatial resolution, were the highest among the six datasets.

Table 2. The overall accuracies of the 6 GLC maps

4. Conclusions

LC is one of the major factors used to study global biogeochemical, meteorological, and hydrological characteristics. The goal of this paper was to produce an LC map using the GOCI data and to assess the feasibility of LC mapping using geostationary satellite data. To produce the LC map, the GOCI NDVIs were generated through BRDF modeling, and a level-2 LC map of South Korea was used as a reference dataset to assess the LC map. The LC map was produced as follows: (1) PCA was applied to the GOCI NDVIs to select PCs; (2) K-means clustering was conducted using the PCs as input data to produce an unlabeled map; (3) through the sensitivity analysis, unlabeled classes were aggregated by LC type and the LC map was produced; and (4) to analyze the feasibility of LC mapping, accuracy assessments were conducted using the reference dataset. The overall accuracy compared with the reference dataset was calculated to be 77.94%. In addition, the overall accuracies compared to IGBP, UMD, GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30 were calculated to be 36.01%, 73.59%, 67.38%, 57.99%, 75.51%, and 77.59%, respectively. In conclusion, LC mapping using geostationary satellite data over Northeast Asia is considered to be a feasible mapping method.

Acknowledgements

This research was part of the project titled “Development of LC products for GOCI-II (C-D2018-0217)” funded by the Ministry of Oceans and Fisheries, Korea. This work was also supported by the BK21 plus Project of the Graduate School of Earth Environmental Hazard System.

References

Abdi, H. and L.J. Williams, 2010. Principal component analysis, WIREs Computational Statistics, 2(4): 433-459. https://doi.org/10.1002/wics.101
Arino, O., P. Bicheron, F. Achard, F. Latham, R. Witt, and J.L. Weber, 2008. GLOBCOVER-the most detailed portrait of Earth, European Space Agency Bulletin, 136: 25-31.
Bartholomé, E. and A.S. Belward, 2005. GLC2000: a new approach to global land cover mapping from Earth observation data, International Journal of Remote Sensing, 26(9): 1959-1977. https://doi.org/10.1080/01431160412331291297
Bontemps, S., P. Defourny, E.V. Bogaert, O. Arino, V. Kalogirou, and J.R. Perez, 2011. GLOBCOVER 2009 Products Description and Validation Report, European Space Agency, Paris, France.
Chen, J., J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, W. Zhang, X. Tong, and J. Mills, 2015. Global land cover mapping at 30 m resolution: A POK-based operational approach, ISPRS Journal of Photogrammetry and Remote Sensing, 103: 7-27. https://doi.org/10.1016/j.isprsjprs.2014.09.002
Feddema, J.J., K.W. Oleson, G.B. Bonan, L.O. Mearns, L.E. Buja, G.A. Meehl, and W.M. Washington, 2005. The importance of land-cover change in simulating future climates, Science, 310(5754): 1674-1678. https://doi.org/10.1126/science.1118160
Friedl, M.A., D. Sulla-Menashe, B. Tan, A. Schneider, N. Ramankutty, A. Sibley, and X. Huang, 2010. MODIS collection 5 global land cover: algorithm refinements and characterization of new datasets, Remote Sensing of Environment, 114(1): 168-182. https://doi.org/10.1016/j.rse.2009.08.016
Friedl, M.A., D.K. McIver, J.C. Hodges, X. Zhang, D. Muchoney, A.H. Strahler, C.E. Woodcock, S. Gopal, A. Schneider, and A. Cooper, 2002. Global land cover mapping from MODIS: algorithms and early results, Remote Sensing of Environment, 83(1&2): 287-302. https://doi.org/10.1016/S0034-4257(02)00078-0
Han, K.S., J.L. Champeaux, and J.L. Roujean, 2004. A land cover classification product over France at 1 km resolution using SPOT4/VEGETATION data, Remote Sensing of Environment, 92(1): 52-66. https://doi.org/10.1016/j.rse.2004.05.005
Hansen, M.C., R.S. Defries, J.R.G. Townshend, and R. Sohlberg, 2000. Global land cover classification at 1 km spatial resolution using a classification tree approach, International Journal of Remote Sensing, 21(6&7): 1331-1364. https://doi.org/10.1080/014311600210209
Herold, M., P. Mayaux, C.E. Woodcock, A. Baccini, and C. Schmullius, 2008. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets, Remote Sensing of Environment, 112(5): 2538-2556. https://doi.org/10.1016/j.rse.2007.11.013
Kanungo, T., D.M. Mount, N.S. Netanyahu, C. Piatko, R. Silverman, and A.Y. Wu, 2000. The analysis of a simple K-means clustering algorithm, Proc. of the sixteenth annual symposium on Computational geometry, Kowloon, Hong Kong, Jun. 12-14, pp. 100-109.
Kim, N.Y., H.J. Oh, D.U. An, and S.C. Park, 2002. Document clustering analysis based on similarity calculation between cluster centroids, The Institute of Electronics and Information Engineers, 25(2): 119-122 (in Korean with English abstract).
Knight, J.F., R.S. Lunetta, J. Ediriwickrema, and S. Khorram, 2006. Regional scale land cover characterization using MODIS-NDVI 250 m multi-temporal imagery: A phenology-based approach, GIScience and Remote Sensing, 43(1): 1-23. https://doi.org/10.2747/1548-1603.43.1.1
Lee, S. and R.G. Lathrop, 2006. Subpixel analysis of Landsat ETM+ using Self-Organizing Map (SOM) neural networks for urban land cover characterization, IEEE Transactions on Geoscience and Remote Sensing, 44(6): 1642-1654. https://doi.org/10.1109/TGRS.2006.869984
Loveland, T.R. and A.S. Belward, 1997. The IGBP-DIS global 1 km land cover data set, DISCover: first results, International Journal of Remote Sensing, 18(15): 3289-3295. https://doi.org/10.1080/014311697217099
Loveland, T.R., B.C. Reed, J.F. Brown, D.O. Ohlen, L. Yang, and W. Merchant, 2000. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data, International Journal of Remote Sensing, 21(6&7): 1303-1330. https://doi.org/10.1080/014311600210191
McCallum, I., M. Obersteiner, S. Nilsson, and A. Shvidenko, 2006. A spatial comparison of four satellite derived 1 km global land cover datasets, International Journal of Applied Earth Observation and Geoinformation, 8(4): 246-255. https://doi.org/10.1016/j.jag.2005.12.002
Park, J.Y. and M.Y. Suh, 2014. Characteristics of MODIS land-cover data sets over Northeast Asia for the recent 12 years (2001-2012), Korean Journal of Remote Sensing, 30(4): 511-524 (in Korean with English abstract). https://doi.org/10.7780/kjrs.2014.30.4.9
Roujean, J.L., M. Leroy, and P.Y. Deschamps, 1992. A Bidirectional Reflectance Model of the Earth's Surface for the Correction of Remote Sensing Data, Journal of Geophysical Research, 97(D18): 20455-20468. https://doi.org/10.1029/92JD01411
Son, S.H. and J.S. Kim, 2018. Accuracy assessment of global land cover datasets in South Korea, Korean Journal of Remote Sensing, 34(4): 601-610. https://doi.org/10.7780/kjrs.2018.34.4.3
Zhang, H.K. and D.P. Roy, 2017. Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m Landsat land cover classification, Remote Sensing of Environment, 197: 15-34. https://doi.org/10.1016/j.rse.2017.05.024