Image classification (ArcGIS Pro)
- GIS Shortcut
- Feb 15, 2021
- 5 min read
Updated: Feb 18, 2021

In this post, we cover image classification and accuracy (error) assessment.
Hang on, before we get even more remotely lost, let's quickly recap the previous post. For those who forgot, we covered false-colour images. At the end of this blog, I will flippantly include a stunning image of Tongariro National Park. But today I realised: why should we hide that image away? We should do something fun. Let's extract this information from our raster map into categories.
You may be wondering: what is this phenomenon? Well, it is known as 'image classification'. It means assigning the aerial features we see (e.g. grass, buildings, etc.) to a set of landcover categories, known as a schema. Each class in the schema is applied to the small coloured squares (pixel values) that make up the image, which is what makes it useful for classification. In short, classification adds names or values to aerial imagery, which adds more detail to a map.

In the past, this was a very slow and challenging process. Fortunately, ArcGIS Pro can mitigate this through the magical force of the 'Image Classification Wizard', located under the Imagery tab.

Under Configure, select:
- Classification type: pixel-based, as we want to classify each pixel within our area of interest.
- Classification schema: National Land Cover Database (NLCD 2011), which is up to date and supports raster data.
Now we enter the Training Samples Manager page, where we can edit, add, and remove landcover classes from the NLCD 2011 schema.
Excellent, we can now create training samples using the freehand, polygon, or circle tools (near the save icon) to capture pixels; the classification algorithm uses these samples to locate each class of interest on our map. What about adding or removing classes? Simply select the Add button at the top of the page, or right-click on a schema class to delete it. Training samples are stored in a feature class (in the form of a polygon shapefile).
Are you lost? Well, here are my unsupervised image classification settings:
1) Configure page > Classification Method: Unsupervised > Classification Type: Pixel based > Classification Schema: the training samples.
2) Train page > Classifier: ISO Cluster > Maximum Number of Classes: 9 > Maximum Number of Iterations: 20 > Maximum Number of Cluster Merges per Iteration: 5 > Maximum Merge Distance: 0.5 > Minimum Samples Per Cluster: 20 > Skip Factor: 10.
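Under the hood, ISO Cluster is an iterative clustering technique related to k-means (with extra merge and split rules). As a rough illustration only, and definitely not ArcGIS Pro's actual implementation, here is a toy Python sketch that groups single-band pixel values into classes; all names and values are made up:

```python
import random

def kmeans_1d(pixels, k, iterations=20, seed=0):
    """Toy k-means on single-band pixel values. ISO Cluster is a
    related iterative technique with additional merge/split rules."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)          # initial cluster centres
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:                     # assign each pixel to its nearest centre
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        for i, c in enumerate(clusters):     # move each centre to its cluster mean
            if c:
                centers[i] = sum(c) / len(c)
    return sorted(centers)

# Two obvious brightness groups: dark (~10) and bright (~200)
pixels = [8, 10, 12, 11, 198, 200, 202, 199]
print(kmeans_1d(pixels, k=2))  # converges to the dark and bright means: [10.25, 199.75]
```

As in the Wizard, the clusters come back as anonymous numbers; naming them is still up to us.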
Tip: The Help icon lets us find out what these settings mean.
Well, you may now be wondering: how are we going to select our schema classes? Does my previous blog ring a bell? We do it by looking at (a) the spectral or composite bands and (b) the updated Land Cover Database (LCDB4), to produce the latest batch of landcover imagery. This helped me select a schema of urban, ice, water, deciduous forest, and evergreen forest.



Are you still having trouble creating a schema? Is there another way of classifying the image, without manually creating training samples? Well, yes! The unsupervised method classifies via computer algorithms. However, there is a downside: the software labels spectral classes with numbers, so your favourite urban area could be represented as '1'. So it is important that you identify and rename the clusters afterwards.
Tip: Again, make sure you use the help files to get a feel for which setting preferences you would like to use.

Yay, we have completed it! Now, some may realise that applying a classification algorithm to a remotely sensed map can create errors.
Well, don't panic. First, we create random points (a minimum of 30) with the 'Create Accuracy Assessment Points' tool, setting the sampling strategy to Random and the input to our supervised classification. Two fields, Classified and GrndTruth, are added to the accuracy assessment points. The Classified field is a whole-number value allocated to each land cover class (e.g. urban = 5, ice = 7, water = 10). The GrndTruth field is the true land cover class identified from our reference image. Once completed, wherever the Classified and GrndTruth values match, the classification was accurate.
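To picture what the tool is doing, here is a hypothetical Python sketch of the Random sampling strategy: scatter points across the raster extent, with a placeholder ground-truth value to be filled in manually later. The field names follow the text; everything else (function name, extent, the -1 placeholder) is illustrative, not the tool's actual behaviour:

```python
import random

def random_assessment_points(n, xmin, ymin, xmax, ymax, seed=42):
    """Scatter n random accuracy-assessment points across a raster extent.
    In the real tool, Classified is read from the classified raster;
    here we leave it as None, and GrndTruth as a -1 placeholder."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # GrndTruth starts unknown until we type it in from the reference image
        points.append({"x": x, "y": y, "Classified": None, "GrndTruth": -1})
    return points

pts = random_assessment_points(30, 0, 0, 1000, 1000)
print(len(pts), pts[0]["GrndTruth"])  # 30 points, ground truth not yet recorded
```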
You may now be wondering: why manually type the whole thing? Is there another way? Well, no. But don't be discouraged: I typed the values for 180 points within 30 minutes. Let's see if you can break my record.

Yay! After minutes of labour we can now start to trust the system again. Grab a coffee or take a break, because here comes the easy part. Use the domain and field functions to convert the GrndTruth and Classified values to the classified names: urban, forest, etc.
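Conceptually, this domain step is just a coded-value lookup from integer codes to class names. A small Python sketch, using only the illustrative codes mentioned earlier in the post (the dictionary and function names are made up):

```python
# Hypothetical coded-value domain: integer codes -> class names
# (codes are the illustrative ones from the text, not real NLCD codes)
DOMAIN = {5: "urban", 7: "ice", 10: "water"}

def decode(records, domain=DOMAIN):
    """Convert the integer Classified/GrndTruth codes on each record
    to readable class names, like applying a domain in ArcGIS Pro."""
    return [
        {
            "Classified": domain.get(r["Classified"], "unknown"),
            "GrndTruth": domain.get(r["GrndTruth"], "unknown"),
        }
        for r in records
    ]

points = [{"Classified": 5, "GrndTruth": 5}, {"Classified": 7, "GrndTruth": 10}]
print(decode(points))
```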
Note: The domain and field functions are found by right-clicking on the accuracy assessment points layer.

Great, now we can use the 'Compute Confusion Matrix' geoprocessing tool, which finds the accuracy of our classified image. It produces a table displaying the relationship between the reference image and the classification result. True to its name, the confusion matrix can be a bit confusing; the matrix itself, however, is easy to comprehend (unlike the Matrix films).
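In plain terms, the tool cross-tabulates the Classified and GrndTruth labels: each cell counts how many points of one map class fell on each reference class. A minimal Python sketch with toy labels (not my actual results):

```python
from collections import Counter

def confusion_matrix(classified, ground_truth, classes):
    """Cross-tabulate classified vs ground-truth labels: rows are the
    classified (map) classes, columns are the reference classes."""
    counts = Counter(zip(classified, ground_truth))
    return [[counts[(row, col)] for col in classes] for row in classes]

classes = ["urban", "water", "ice"]
classified   = ["urban", "urban", "water", "ice", "ice", "urban"]
ground_truth = ["urban", "water", "water", "ice", "ice", "urban"]
matrix = confusion_matrix(classified, ground_truth, classes)
for cls, row in zip(classes, matrix):
    print(cls, row)
```

The diagonal cells hold the agreements; everything off the diagonal is a misclassification.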


What next? We should analyse it; easier than those hard, nitty-gritty statistics formulas. From the table, the overall accuracy indicates the proportion of reference sites that were mapped correctly. It can be calculated by dividing the number of correctly classified points by the total number of reference points.
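Sketched in Python on a toy 3-class confusion matrix (rows = classified, columns = reference; the numbers are illustrative, not my actual results):

```python
def overall_accuracy(matrix):
    """Overall accuracy = correctly classified points (the matrix
    diagonal) divided by the total number of reference points."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy matrix: rows = classified classes, columns = reference classes
matrix = [[2, 1, 0],
          [0, 1, 0],
          [0, 0, 2]]
print(overall_accuracy(matrix))  # 5 of the 6 points are on the diagonal
```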

Errors of omission occur when reference sites of a class are left out of (omitted from) that class in the classified image. They are measured by reviewing the reference sites that received incorrect classifications. Their counterpart, errors of commission, occur when sites are wrongly included in a class.

Accuracy Metrics include:
The producer's accuracy shows how accurately the map maker (the 'producer') classified the reference sites; it is complementary to the omission error (100% - omission error = producer's accuracy). It tells us the probability that a given land cover type on the ground is correctly classified. It is calculated by dividing the number of correctly classified reference points by the total number of reference points for that class.
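In Python terms, the producer's accuracy for a class is its diagonal count divided by its column (reference) total, and the omission error is whatever is left over. A toy sketch, not my actual results:

```python
def producers_accuracy(matrix, class_index):
    """Producer's accuracy for one class = correct points for that class
    divided by its reference (column) total; omission error is the rest."""
    column_total = sum(row[class_index] for row in matrix)
    return matrix[class_index][class_index] / column_total

# Toy matrix: rows = classified classes, columns = reference classes
matrix = [[2, 1, 0],
          [0, 1, 0],
          [0, 0, 2]]
pa = producers_accuracy(matrix, 1)   # class 1: 1 correct of 2 reference points
print(pa, 1 - pa)                    # accuracy and its omission error
```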

Similarly, the user's accuracy shows the accuracy from the map user's point of view: it tells us how often a class shown on the map is truly present on the ground. It is complementary to the commission error, and is calculated by dividing the number of correctly classified points for a class by the total number of points classified into that class (the row total).
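In Python terms, the user's accuracy for a class is its diagonal count divided by its row (classified) total, with the commission error as the remainder. A toy sketch, not my actual results:

```python
def users_accuracy(matrix, class_index):
    """User's accuracy for one class = correct points for that class
    divided by its classified (row) total; commission error is the rest."""
    row_total = sum(matrix[class_index])
    return matrix[class_index][class_index] / row_total

# Toy matrix: rows = classified classes, columns = reference classes
matrix = [[2, 1, 0],
          [0, 1, 0],
          [0, 0, 2]]
ua = users_accuracy(matrix, 0)   # class 0: 2 correct of 3 points mapped to it
print(ua, 1 - ua)                # accuracy and its commission error
```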

Once generated, let's begin to analyse this table. Because trust me, it gets even simpler.
An interesting thing I found was that the producer's and user's accuracy differed for every class, except for the magnificent 100% ice class. From the examples above, the classes with user's accuracy above 50% were ice (the highest), bare earth, and evergreen forest (the lowest of the three); below 50% were deciduous forest and, lowest of all at 8%, the urban area, with water (2), deciduous forest (19), and evergreen forest (1) points wrongly classified as urban. Likewise, the classes with producer's accuracy of 50% or above were ice and water (the highest, at 100%) and evergreen forest and urban (the lowest, at 50%); below 50% were bare earth and, lowest of all at 16%, deciduous forest, with evergreen forest (2) and urban (19) points wrongly classified as deciduous forest. By interpreting the various error and accuracy metrics, we can effectively analyse our classification results. See, it's straightforward! Now, try to assess yours!
For those who are extra keen on the analysis part, the Kappa coefficient evaluates classification accuracy as a statistic ranging from -1 to 1, by comparing the classification's performance against randomly assigned values. Still confused? Let's use my image as an example: all my per-class kappa values were 0, with an overall kappa of 0.2. A value of 0 means the classification is no better than random, negative values mean it is significantly worse than random, and values close to 1 indicate it is significantly better than random.
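For the very keen, kappa can be computed straight from the confusion matrix: compare the observed accuracy with the accuracy expected by chance, which comes from the row and column totals. A toy Python sketch (illustrative matrix, not my actual results):

```python
def kappa(matrix):
    """Cohen's kappa: observed accuracy compared against the accuracy
    expected if classes were assigned at random (from row/column totals)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)   # row total * column total
        for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Toy matrix: rows = classified classes, columns = reference classes
matrix = [[2, 1, 0],
          [0, 1, 0],
          [0, 0, 2]]
print(round(kappa(matrix), 3))
```

Here the observed accuracy is 5/6 and the chance agreement is 1/3, so kappa lands well above zero: better than random.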
Finally, we are finished! Now I shall wander off to another remotely sensed problem that needs fixing. See you next time, with the latest remotely sensed gossip.