Tag Archives: wasserstein metric opencv python

Earth Mover’s Distance (EMD)

In the previous blogs, we discussed various histogram comparison methods for image retrieval. Most of the methods we discussed were highly sensitive to blurring, local deformation or color shifts. etc. In this blog, we will discuss a more robust method for comparing the distributions known as Wasserstein metric or Earthmover’s distance. So, let’s get started.

In a formal sense, the notion of distance is more suitable to single elements as compared to distributions. For instance, if I say what is the distance from your house to the neighbor’s house? Most of you will come up with a number say x meters but where is this distance came from. Is it the distance between the centers of both the houses or the nearest distance between the two houses or any other form. Thus, the definition of distance becomes less apparent when we are dealing with distributions or sets of elements rather than single elements. So, in this blog, we will discuss the Earthmover’s distance also known as Wasserstein metric which is more suitable for finding distance or similarity between the distributions. This concept was first introduced by Gaspard Monge in 1781, in the context of transportation theory (Wikipedia). Let’s discuss the main concept behind this.

Let’s say we have two distributions A and B whose distance we want to calculate. This assumes one distribution to be a mass of earth or a pile of dirt and the other to be a collection of holes in that same space. The least amount of work done to fill the holes completely gives us the EMD. Filling the holes result in converting one distribution to another. The lesser the distance, the more similar will be the distributions and vice-versa.

Mathematically, we construct a matrix, say M, whose each element denotes the amount of weight transferred or matched between the distributions. For instance, Mij denotes the weight transferred from the ith position in the first distribution to the jth position in the second distribution. The work done will be the weight transferred multiplied by the distance i.e. Mij*dij. Thus the EMD is given by

There is an important terminology used in EMD known as a signature which is nothing but a way of representing any distribution. For instance, in this, we divide any distribution into clusters which are represented by the mean (or any other statistics) and the fraction of the distribution in that cluster. This representation by a set of clusters is called the signature.

Now, let’s see how to implement this. OpenCV provides a builtin function for calculating EMD as shown below.

Here, the signature is a matrix of size equal to the (total number of pixels x number of dimensions + 1). For greyscale, we have 2 dimensions, width, and height while for color image 3. Thus for greyscale, each row of signature represents (pixel value and the coordinates). To calculate the distance, you can use any metric such as L1 and L2 as cv2.DIST_L1 etc. You can pass the cost matrix using the cost argument. lower-bound is the distance between the center of mass of two signatures. This outputs the flow matrix (M discussed above), lower bound and the work.

Now, let’s take an example to understand this. First, you need to change the images to their corresponding signatures as shown below.

and then calculate the EMD as shown below.

Below is the output we got

Hope you enjoy reading.

If you have any doubt/suggestion please feel free to ask and I will do my best to help or improve myself. Good-bye until next time.