maskcompare module

This module contains functions to compare different saliency maps using various metrics.

Metrics implemented:
  • ShapGap Cosine

  • ShapGap L2

  • Earth Mover’s Distance (EMD)

  • Mean Absolute Error (MAE)

  • Sign Agreement Ratio (SAR)

  • Sign Distance

  • Intersection over Union (IoU)

  • Correlation Distance

  • Mean Squared Error (MSE)

  • Peak Signal-to-Noise Ratio (PSNR)

  • Czekanowski Distance

  • Jaccard Index

  • Jaccard Distance

  • Structural Similarity Index Measure (SSIM)

  • KL Divergence (symmetric / Jeffrey divergence)

  • AUC-Judd (distance variant; asymmetric)

saliencytools.maskcompare.auc_judd(prediction, reference)[source]

Compute the AUC-Judd distance between a predicted saliency map and a reference map.

AUC-Judd (Judd et al., 2009) evaluates how well a predicted saliency map recovers the salient regions of a reference map. The reference is binarised at its mean to produce a fixation mask; the prediction is then treated as a continuous classifier of those fixated pixels, and the Area Under the ROC Curve (AUC) is reported.

Convention: auc_judd(prediction, reference) — the second argument always provides the fixation mask. This matches the intended use case: auc_judd(lime_map, shap_map) measures how well the LIME explanation recovers the regions SHAP considers important, not the reverse.

Asymmetry: auc_judd(a, b) != auc_judd(b, a) in general. The metric therefore does not satisfy the symmetry axiom of a metric space. It is listed alongside SSIM and PSNR as a documented exception in the formal validation suite.

In the proxy benchmark, metric_fn(test_image, prototype) places the test image in the prediction role and the prototype in the reference role, which is the natural direction: the prototype is the trusted reference, the test image is the explanation being evaluated.

Reference:

Judd et al. (2009). “Learning to predict where humans look.” ICCV.

Bylinskii et al. (2017). “What do different evaluation metrics tell us about saliency models?” IEEE TPAMI, arXiv:1604.03605.

Parameters:
  • prediction (numpy.ndarray) – Predicted saliency map (the map being evaluated, e.g. from LIME or Integrated Gradients).

  • reference (numpy.ndarray) – Reference saliency map whose above-mean pixels define the fixation mask (e.g. a SHAP explanation or a prototype).

Returns:

AUC-Judd distance in [0, 1]. 0 means perfect recovery of the reference’s salient regions; 0.5 is chance level.

Return type:

float
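The implementation is not reproduced here, but the description above corresponds roughly to the following sketch (`auc_judd_sketch` is a hypothetical helper for illustration, not the library function):

```python
import numpy as np

def auc_judd_sketch(prediction, reference):
    """Binarise the reference at its mean, treat the prediction as a
    continuous classifier of the fixated pixels, and return 1 - AUC."""
    pred = prediction.ravel().astype(float)
    fixated = reference.ravel() > reference.mean()
    # Sweep a threshold over every distinct prediction value to trace the ROC.
    thresholds = np.sort(np.unique(pred))[::-1]
    tpr = [np.mean(pred[fixated] >= t) for t in thresholds]
    fpr = [np.mean(pred[~fixated] >= t) for t in thresholds]
    xs = np.concatenate(([0.0], fpr, [1.0]))
    ys = np.concatenate(([0.0], tpr, [1.0]))
    auc = np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2.0)  # trapezoidal rule
    return 1.0 - auc  # distance: 0 = perfect recovery, 0.5 = chance
```

A map compared against itself yields distance 0, while a constant (uninformative) prediction yields the chance level of 0.5.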

saliencytools.maskcompare.clip_mask(mask)[source]

Clip the mask to the range [-1, 1].

This function ensures that the values in the input saliency map do not exceed the range [-1, 1]. This is useful for preventing outliers or extreme values from affecting downstream computations.

Parameters:

mask (numpy.ndarray) – Input saliency map. This is a 2D or 3D array representing the saliency values of an image.

Returns:

Clipped saliency map with values constrained to [-1, 1].

Return type:

numpy.ndarray

saliencytools.maskcompare.correlation_distance(a, b)[source]

Compute the Correlation Distance between two images.

The Correlation Distance measures the linear relationship between corresponding pixel values in two images. It captures how well the variations in one image track those in the other.

Reference:

Commonly used in statistics and signal processing.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Correlation Distance, the complement of the correlation coefficient (higher values indicate weaker correlation).

Return type:

float
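A common formulation is one minus the Pearson correlation of the flattened pixel values. The sketch below illustrates this (the helper name is hypothetical; the library implementation may differ):

```python
import numpy as np

def correlation_distance_sketch(a, b):
    """One minus the Pearson correlation of the flattened pixel values."""
    r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
    return 1.0 - r  # 0 = perfectly correlated, 2 = perfectly anti-correlated
```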

saliencytools.maskcompare.cosine_distance(a, b)[source]

Compute the cosine distance between two images.

The cosine distance measures the angular difference between two images treated as flattened vectors in a high-dimensional space. It is useful for comparing the orientation of two saliency maps rather than their magnitude.

Reference:

Commonly used in vector similarity and machine learning literature.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Cosine distance, representing the angular difference.

Return type:

float
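The standard definition can be sketched as follows (hypothetical helper for illustration):

```python
import numpy as np

def cosine_distance_sketch(a, b):
    """1 - cos(angle) between the two maps, flattened to vectors."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos  # 0 = same orientation, 1 = orthogonal, 2 = opposite
```

Note that scaling a map by a positive constant leaves the distance unchanged, which is what makes this metric magnitude-insensitive.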

saliencytools.maskcompare.czenakowski_distance(a, b)[source]

Compute the Czekanowski Distance between two images.

The Czekanowski Distance measures the dissimilarity between two non-negative images as one minus twice the sum of element-wise minima divided by the total pixel mass of both images. It is useful for comparing distributions with overlapping regions.

Reference:
  1. Sørensen, T. (1948). “A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons.” Biologiske Skrifter.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Czekanowski Distance, representing the dissimilarity.

Return type:

float
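The Sørensen/Czekanowski dissimilarity for non-negative maps can be sketched as (hypothetical helper, shown for illustration):

```python
import numpy as np

def czekanowski_distance_sketch(a, b):
    """1 - 2·Σ min(a_i, b_i) / Σ (a_i + b_i) for non-negative maps."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return 1.0 - 2.0 * np.minimum(a, b).sum() / (a.sum() + b.sum())
```

Identical maps give 0; maps with disjoint support give 1.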

saliencytools.maskcompare.emd(a, b, bins=256)[source]

Compute the Earth Mover’s Distance (EMD) between two images.

The EMD measures the minimum cost of transforming one distribution into another. It is particularly useful for comparing saliency maps with spatial distributions of importance.

Reference:
  • Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The Earth Mover’s Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2), 99-121. https://doi.org/10.1023/A:1026543900054

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

  • bins (int) – Number of histogram bins used to discretise the maps. Default is 256.

Returns:

Earth Mover’s Distance, representing the cost of transformation.

Return type:

float
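One plausible pipeline, consistent with the `bins` parameter, histograms both maps over a shared range and computes the 1-D Wasserstein distance from the CDF difference (a sketch; the library implementation may differ):

```python
import numpy as np

def emd_sketch(a, b, bins=256):
    """Histogram both maps over a shared range, then compute the 1-D
    Wasserstein distance as the area between the two CDFs."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    hist_a, edges = np.histogram(a, bins=bins, range=(lo, hi))
    hist_b, _ = np.histogram(b, bins=bins, range=(lo, hi))
    cdf_a = np.cumsum(hist_a / hist_a.sum())
    cdf_b = np.cumsum(hist_b / hist_b.sum())
    # For 1-D distributions, EMD equals the area between the two CDFs.
    return np.sum(np.abs(cdf_a - cdf_b)) * (edges[1] - edges[0])
```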

saliencytools.maskcompare.euclidean_distance(a, b)[source]

Compute the Euclidean distance between two images.

The Euclidean distance measures the straight-line distance between corresponding pixels in two images. It captures the overall magnitude of differences between the two images.

Reference:

Commonly used in image processing and computer vision literature; equivalent to the Frobenius norm of the difference between the two images.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Euclidean distance, representing the magnitude of differences.

Return type:

float

saliencytools.maskcompare.information_gain(a, b, baseline=None)[source]

Compute the Information Gain (IG) between a saliency map and ground truth.

Reference:

Kümmerer, M., Wallis, T. S., & Bethge, M. (2015). Information-theoretic framework to overcome the ambiguity of saliency metrics. arXiv preprint arXiv:1509.01556.

Parameters:
  • a (numpy.ndarray) – Prediction saliency map.

  • b (numpy.ndarray) – Ground truth saliency map.

  • baseline (numpy.ndarray, optional) – Baseline saliency map. Defaults to uniform.

Returns:

Information Gain value.

Return type:

float

saliencytools.maskcompare.jaccard_distance(a, b)[source]

Compute the Jaccard Distance between two images.

The Jaccard Distance is the complement of the Jaccard Index and measures the dissimilarity between two images. It is useful for evaluating the differences between binary or thresholded saliency maps.

Reference:

Commonly used in set theory and image segmentation literature.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Jaccard Distance, representing the dissimilarity ratio.

Return type:

float

saliencytools.maskcompare.jaccard_index(a, b)[source]

Compute the Jaccard Index between two images.

The Jaccard Index measures the similarity between two images by comparing the intersection and union of their pixel values. It is commonly used for evaluating binary or thresholded saliency maps.

Reference:

Commonly used in set theory and image segmentation literature.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Jaccard Index, representing the similarity ratio.

Return type:

float
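For binary masks the Jaccard Index is |intersection| / |union|. A common soft generalisation to non-negative continuous maps (the Ruzicka similarity) can be sketched as follows; whether the library uses this exact generalisation is an assumption:

```python
import numpy as np

def jaccard_index_sketch(a, b):
    """Σ min(a_i, b_i) / Σ max(a_i, b_i); reduces to the set-based
    Jaccard Index when both maps are binary."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()
```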

saliencytools.maskcompare.kl_divergence(a, b)[source]

Compute the symmetric KL divergence (Jeffrey divergence) between two images.

The standard KL(P || Q^D) from Bylinskii et al. (2017) is asymmetric: it measures how well a saliency prediction P approximates a ground-truth fixation map Q^D. Since both maps here are peers (neither is ground truth), we use the symmetric variant KL(a||b) + KL(b||a), which penalises false positives and false negatives equally.

Both maps are shifted to be non-negative and normalised to sum to 1 (probability distributions) before computation. A small epsilon (1e-10) is added for numerical stability, following the regularisation strategy of the MIT Saliency Benchmark.

Reference:

Bylinskii et al. (2017). “What do different evaluation metrics tell us about saliency models?” IEEE TPAMI, arXiv:1604.03605.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Symmetric KL divergence (>= 0; 0 iff a == b after normalisation).

Return type:

float
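The normalisation steps described above can be sketched as follows (hypothetical helper; the exact regularisation in the library may differ slightly):

```python
import numpy as np

def symmetric_kl_sketch(a, b, eps=1e-10):
    """Shift to non-negative, normalise to probability distributions,
    regularise with eps, and sum both KL directions."""
    p = a.ravel().astype(float) - min(a.min(), 0.0)
    q = b.ravel().astype(float) - min(b.min(), 0.0)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
```

By construction the result is symmetric in its arguments and zero for identical inputs.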

saliencytools.maskcompare.linear_correlation_coefficient(a, b)[source]

Compute the Linear Correlation Coefficient (CC).

Parameters:
  • a (numpy.ndarray) – First saliency map.

  • b (numpy.ndarray) – Second saliency map.

Returns:

CC value.

Return type:

float

saliencytools.maskcompare.make_histogram(mask: ndarray, bins: int = 256) → ndarray[source]

Convert continuous values to discrete distribution.

This function takes a saliency map and converts it into a histogram representation. The histogram is normalized to ensure that the sum of all bins equals 1, making it suitable for comparing distributions.

Parameters:
  • mask (numpy.ndarray) – Input saliency map. This is a 2D or 3D array representing the saliency values of an image.

  • bins (int) – Number of bins for the histogram. Default is 256.

Returns:

Normalized histogram of the saliency map. The sum of all bins equals 1, representing the distribution of saliency values.

Return type:

numpy.ndarray
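A plausible implementation, matching the description above (shown as a sketch, not the library code):

```python
import numpy as np

def make_histogram_sketch(mask, bins=256):
    """Bin the saliency values and normalise the counts so they sum to 1."""
    hist, _ = np.histogram(mask.ravel(), bins=bins)
    return hist / hist.sum()
```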

saliencytools.maskcompare.mean_absolute_error(a, b)[source]

Compute the Mean Absolute Error (MAE) between two images.

The MAE measures the average absolute difference between corresponding pixels in two images. It captures the overall deviation in pixel values.

Reference:

Commonly used in regression analysis and image processing.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Mean Absolute Error, representing the average deviation.

Return type:

float

saliencytools.maskcompare.mean_squared_error(a, b)[source]

Compute the Mean Squared Error (MSE) between two images.

The MSE measures the average squared difference between corresponding pixels in two images. It emphasizes larger deviations more than the Mean Absolute Error.

Reference:

Commonly used in regression analysis and image processing.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Mean Squared Error, representing the average squared deviation.

Return type:

float

saliencytools.maskcompare.normalize_mask(mask)[source]

Normalize the mask to the range [-1, 1].

This function rescales the input saliency map to the range [-1, 1], ensuring that the values are standardized for further processing. This normalization is particularly useful when working with metrics or models that expect inputs in this range.

Parameters:

mask (numpy.ndarray) – Input saliency map. This is a 2D or 3D array representing the saliency values of an image.

Returns:

Normalized saliency map with values in the range [-1, 1].

Return type:

numpy.ndarray

saliencytools.maskcompare.normalize_mask_0_1(mask)[source]

Normalize the input saliency map to the range [0, 1].

This function ensures that the values in the input saliency map are scaled to lie within the range [0, 1]. This is useful for standardizing the input data for further processing or comparison, especially when working with metrics that require normalized inputs.

Parameters:

mask (numpy.ndarray) – Input saliency map. This is a 2D or 3D array representing the saliency values of an image, where higher values indicate greater importance.

Returns:

Normalized saliency map with values in the range [0, 1].

The output has the same shape as the input.

Return type:

numpy.ndarray
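A plausible min–max rescaling, sketched below; note that a constant mask would divide by zero, which a production implementation would need to guard against:

```python
import numpy as np

def normalize_mask_0_1_sketch(mask):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = mask.min(), mask.max()
    return (mask - lo) / (hi - lo)
```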

saliencytools.maskcompare.nss(a, b)[source]

Compute the Normalized Scanpath Saliency (NSS).

Reference:

Peters, R. J., Iyer, A., Itti, L., & Koch, C. (2005). Components of bottom-up gaze allocation in natural scenes. Vision Research, 45(18), 2397-2416.

Parameters:
  • a (numpy.ndarray) – Prediction saliency map.

  • b (numpy.ndarray) – Ground truth saliency map (fixations).

Returns:

NSS value.

Return type:

float
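The standard NSS recipe z-scores the prediction and averages it over the fixated pixels of the reference. Since b is continuous here, the sketch below assumes fixations are its above-mean pixels (a hypothetical choice; the library may binarise differently):

```python
import numpy as np

def nss_sketch(a, b):
    """Z-score the prediction, then average over the reference's
    above-mean (assumed fixated) pixels."""
    z = (a - a.mean()) / a.std()
    return z[b > b.mean()].mean()
```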

saliencytools.maskcompare.nss_distance(a, b)[source]

NSS converted to a distance for use in KNN benchmarks.

NSS is a similarity measure (higher = more similar), so this wrapper returns -nss(a, b), letting argmin pick the most similar prototype. The function is asymmetric by design: a is treated as the prediction (z-scored), b as the reference whose high-valued regions are queried.

Parameters:
  • a (numpy.ndarray) – Prediction saliency map.

  • b (numpy.ndarray) – Reference saliency map.

Returns:

-nss(a, b); lower (more negative) values indicate greater similarity.

Return type:

float

saliencytools.maskcompare.psnr(a, b)[source]

Compute the Peak Signal-to-Noise Ratio (PSNR) between two images.

The PSNR measures the ratio between the maximum possible pixel value and the mean squared error. It is commonly used to evaluate the quality of reconstructed images.

Reference:

Huynh-Thu, Q., & Ghanbari, M. (2008). “Scope of validity of PSNR in image/video quality assessment.” Electronics Letters, 44(13), 800-801.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

PSNR value, representing the signal-to-noise ratio.

Return type:

float
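The textbook definition is 10·log10(MAX² / MSE). The sketch below assumes maps normalised to [0, 1] (so MAX = 1.0); the library's choice of peak value is not stated here:

```python
import numpy as np

def psnr_sketch(a, b, max_val=1.0):
    """10·log10(MAX² / MSE); identical images give infinite PSNR."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Unlike the other metrics in this module, higher PSNR means greater similarity.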

saliencytools.maskcompare.sign_agreement_ratio(a, b)[source]

Compute the Sign Agreement Ratio (SAR) between two images.

The SAR measures the proportion of pixels where the signs of the values in two images agree. It captures the consistency in the direction of importance between two saliency maps.

Reference:
    1. Nevill, A. M., & Atkinson, G. (1997). “Assessing agreement between measurements recorded on a ratio scale in sports medicine and sports science.” British Journal of Sports Medicine.

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

Sign Agreement Ratio, representing the proportion of agreement.

Return type:

float
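The computation reduces to the fraction of pixels whose signs match, which can be sketched as (hypothetical helper for illustration):

```python
import numpy as np

def sign_agreement_ratio_sketch(a, b):
    """Fraction of pixels where sign(a) equals sign(b); zero-valued
    pixels agree only with other zeros."""
    return np.mean(np.sign(a) == np.sign(b))
```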

saliencytools.maskcompare.ssim(a, b)[source]

Compute the Structural Similarity Index Measure (SSIM) between two images.

The SSIM evaluates the perceptual similarity between two images by considering luminance, contrast, and structure. It is widely used for assessing image quality and similarity.

Reference:
  • Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612. https://doi.org/10.1109/TIP.2003.819861

Parameters:
  • a (numpy.ndarray) – First image.

  • b (numpy.ndarray) – Second image.

Returns:

SSIM value, representing the perceptual similarity.

Return type:

float
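For intuition, the single-window (global) form of SSIM from Wang et al. (2004) is sketched below. Library implementations, including the one likely used here, typically aggregate a sliding Gaussian window instead, so treat this as an illustrative simplification:

```python
import numpy as np

def ssim_global_sketch(a, b, data_range=1.0):
    """Global SSIM over the whole image: combines luminance, contrast,
    and structure terms with the standard stabilising constants."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den
```

Identical images give SSIM = 1; structurally dissimilar images score lower.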