Open Access

Object segmentation by saliency-seeded and spatial-weighted region merging

Applied Informatics 2016, 3:9

DOI: 10.1186/s40535-016-0024-z

Received: 13 September 2016

Accepted: 2 November 2016

Published: 22 November 2016

Abstract

In this paper, we present a region merging-based method for object segmentation in natural images. The method consists of three separate steps: (1) initial over-segmentation, in which the pixels of each region are as homogeneous as possible and therefore likely to belong to the same object; (2) saliency-seeded interaction, which provides proper prior input to guide the segmentation; and (3) region merging by a newly introduced maximal spatially weighted similarity (MSWS) criterion. Saliency-seeded interaction reflects human intention well yet requires no manual user editing, which makes our method applicable to increasingly large image databases. The MSWS criterion takes into account both the color similarity and the spatial distance of the candidate regions for merging, which allows the region merging-based method to achieve better performance. Extensive experiments show that our method can reliably and automatically segment objects from a great variety of natural images.

Keywords

Object segmentation; Saliency detection; Spatial neighbor; Region merging

Background

Object segmentation is an important task in the field of image processing (Gollmer et al. 2014; Tavakoli and Amini 2013; Seo et al. 2006). In many applications, such as object recognition (Russell et al. 2006) and content-aware image resizing (Avidan and Shamir 2007), one of the core issues is to segment the object(s) of interest from an image. If the object(s) can be segmented correctly, the application performs better, for example with a higher recognition rate or less resizing deformation.

However, this segmentation task is in itself a difficult and still open problem. Over the last three decades, a plethora of methods have been proposed: mean shift (Comaniciu and Meer 2002), fuzzy c-means (Cai et al. 2007; Chen and Zhang 2004), normalized cuts (Shi and Malik 2000), the coherence-connected tree algorithm (Ding et al. 2006), etc. Yet, as reported, they all work well only if the assumption of homogeneity in one or more region attributes holds. In other words, these segmentation methods yield good results when the objects are piece-wise smooth or nearly constant in at least one attribute. On the complex natural images commonly encountered in practice, they often perform poorly, and the objects are frequently split into pieces. In recent years, interactive techniques such as graph cuts (Boykov and Jolly 2001), GrabCut (Rother et al. 2004), and those in Bai and Sapiro (2007), Peng et al. (2011), Xiang et al. (2009) and Li et al. (2004) have received considerable attention. The underlying idea is to use prior user input to guide the segmentation.
Fig. 1

Segmentation results from the method proposed by Ning et al. (2010). 1st row input image (left) and initial mean-shift over-segmentation (right); 2nd row four different interactive inputs of the object (green) and the background (blue); 3rd row the corresponding segmentation results

Fig. 2

Segmentation results based on different merging rules. a Interactive inputs; b color similarity-based results; c MSWS-based results. In a the green lines are the object markers and the blue lines are the background markers

Fig. 3

Schematic flowchart of our proposed method. The image shown in the red box is our object segmentation result

Fig. 4

Results of ten popular saliency detection algorithms. From left to right original image, IT, MZ, GB, SR, AC, CA, FT, LC, HC, and RC

Experiments have shown that if proper prior input is provided, most existing interactive methods can yield satisfactory results for natural images. But providing the proper input is not straightforward (Yang et al. 2010). Quite often, the user, especially a non-expert, has to edit patiently and carefully among all possibly ‘desired’ locations in the image (Rother et al. 2004; Li et al. 2004). If the user fails to provide effective priors, more interactions are required to correct the segmentation. This is tedious, and it is especially difficult when the object and its background have low contrast (Ning et al. 2010), when the object is camouflaged, or when the image is cluttered (Rother et al. 2004). In such cases, despite the interactive input, the segmentation may not yield the desired output (see Fig. 1). This can be partly remedied by a second tier of editing of the initial segmentation results (Li et al. 2004). Another option is to employ multiple types of prior user input, including object and background strokes, soft boundary brushes or boxes, hard edge scribbles, and any combination of these (Rother et al. 2004).

Although this effort improves the segmentation results, the whole process is tedious and impractical, especially for image databases of ever-increasing size (Liu et al. 2011). Manual annotation of such databases is out of the question. This is the main motivation behind our method. Our proposed scheme has two aims: (i) to provide a segmentation method that is effective on a great variety of natural images, where regions are primed by a few background and object seed inputs; and (ii) to acquire every seed input automatically, that is, without any manual user effort.

Photographs of natural scenes reflect real-world variations and are characterized by large ranges of color, texture, shape, or similar attributes. Image objects are not necessarily homogeneous in their attributes, and consequently even the state-of-the-art methods can fail to segment an object in its entirety, and more often the segmentation yields fragmented objects. This gives us the following idea: we can first over-segment an image into regions that are as homogeneous as possible, and then try to merge the object regions that are adjacent and similar to each other. The rationale is that these regions in all likelihood belong to the same object. To this end, we present a merging-based segmentation method in this paper.
Fig. 5

An example of our saliency-seeded interaction. a Input image; b saliency map; c the histogram of the saliency map; d and e the obtained object and background seeds. Green lines denote object seeds and blue lines denote background seeds. The saliency values 39 and 238 in c correspond to \(P_{B}=0.5\) and \(P_{O}=0.05\), respectively

Fig. 6

The explanation of our MSWS criterion. a Regions A–D represent four different homogeneous regions (a and b are object regions, c and d are background regions); b a saliency map helps determine where the object of interest is (object region A denoted ‘O,’ background region D denoted ‘B,’ and unlabeled regions denoted ‘N’ for simplicity); d centers of four regions; e MCS similarity; f spatial distance between two regions; g MSWS similarity; h–j the corresponding labeled results

Fig. 7

Segmentation results with two different similarity measures. a Saliency-seeded interactions; b MCS-based results; c MSWS-based results

Fig. 8

Accuracy-\(P_{O}\) and accuracy-\(P_{B}\) curves of different saliency methods on the MSRA1000 dataset

We introduce a novel rule termed ‘maximal spatially weighted similarity’ (MSWS) to aggregate regions. Specifically, our rule merges regions that not only have the highest color similarity but are also the nearest to each other. That is, the MSWS criterion takes into account both the “color similarity” and the “spatial distance” of the candidate regions for merging. Merging methods in the current literature look for neighboring regions whose color similarity is either above a threshold (Yang et al. 2010) or the highest among all neighbors (Ning et al. 2010), without any distance weighting. Ignoring distance increases the risk that background regions with similar colors are erroneously merged with object regions (see Fig. 2b).

Furthermore, we adopt an interactive merging strategy as recently proposed in Ning et al. (2010). That is, we first generate image clues to direct the merging, and these clues, in the form of simple strokes, roughly indicate the locations of the object and of the background. However, while in Ning et al. (2010), the object and background seeds are all drawn by the user, in our scheme they are automatically extracted. To generate segmentation priors, we have to take into account the following observations:
  • From the prior interaction point of view, the locations of pixels that have different attributes but belong to the same object are often good candidates for priors (see the toucan image in Fig. 1). As a case in point, the toucan consists mostly of black pixels, with a minority of orange and white pixels. For the minority pixels to be segmented into the same region as the black pixels, they have to be marked as prior object seeds.

  • From the human attention point of view, the locations of pixels that have different attributes but belong to the same object are generally the salient places that attract human attention (see the toucan again in Fig. 3). The orange and white pixels, which contrast strongly with the black pixels, have the highest salience and appear as bright regions. At the same time, we observe that the pixels with the lowest salience usually belong to the background.

From these two observations, we conclude that the salient parts of an image, which attract more human attention, are also likely locations for prior interactions. Inspired by this conclusion, we build a saliency-seeded interactive scheme that automatically finds good object seeds (by highest salience) and background seeds (by lowest salience). A typical result of our automatic interaction for the toucan image is shown in Fig. 3. The object marks fall onto a small set of locations where the pixels are largely orange and white, while the background marks all lie in the background.

A brief overview of our ‘saliency-seeded and spatial-weighted’ (SSaSW) region merging-based method is illustrated in Fig. 3. It consists of three main stages: (A) initial over-segmentation; (B) saliency-seeded interaction; and (C) MSWS-based region merging. First, we run an image segmentation algorithm to divide the input image into many small homogeneous regions. Next, with the aid of a saliency detection method, the prior interactions are determined automatically. Finally, the object is extracted from the background when our MSWS-based merging process ends. Extensive experiments show that our method can reliably segment objects from a wide variety of natural images.

In summary, the contributions of this paper mainly include the following:
  1. We build a saliency-seeded interaction scheme that reflects human intention well yet requires no manual user editing. In addition, our interactions can be easily and flexibly embedded into many interactive methods.
  2. We propose a novel rule, MSWS, to aggregate regions. It takes into account both the color similarity and the spatial distance of the candidate regions for merging, which allows the region merging-based method to achieve better performance.

Our merging-based segmentation method

In this section, we detail the three stages of our method.

Initial over-segmentation

There are many low-level homogeneity-based methods that can be used for an initial over-segmentation, such as normalized cuts (Ncuts) (Shi and Malik 2000), k-means (Mignotte 2008), mean shift (Comaniciu and Meer 2002), Otsu’s thresholding (Otsu 1979), and watershed (Vincent and Soille 1991). Our initial segmentation should make the pixels in each region as homogeneous as possible, such that (i) they come from the same object and (ii) the object boundary is well preserved. The results produced by the mean-shift algorithm satisfy these two requirements. In contrast, k-means, Ncuts, and Otsu’s method require the number of regions to be preset, and their computational complexity increases rapidly with this number; their results also usually preserve boundaries poorly. Although watershed results also satisfy the two requirements, watershed tends to produce a very large number of regions, which increases the computational complexity. For these reasons, we choose mean shift to produce the initial over-segmentation. Specifically, we use the EDISON mean-shift software (EDISON Software, http://www.caip.rutgers.edu/riul/research/code.html).

Saliency-seeded automatic interaction

Most existing interactive segmentation methods can yield satisfactory results if proper user interaction is provided. However, image databases are becoming ever larger, and annotating them manually is impractical. Thus, finding an automatic way to determine the prior interaction is very important.

Our motivation for automatic interaction

Saliency detection is a recently developed technique for object extraction (Cheng et al. 2011; Achanta et al. 2009). It seeks to identify the highly informative parts of a scene that attract more human attention. In an image, regions that contrast strongly with their surroundings tend to pop out as salient. Many popular saliency detection methods have been proposed to identify such regions, such as IT (Itti et al. 1998), MZ (Ma and Zhang 2003), GB (Harel et al. 2007), SR (Hou and Zhang 2007), AC (Achanta et al. 2008), CA (Goferman et al. 2010), FT (Achanta et al. 2009), LC (Zhai and Shah 2006), HC (Cheng et al. 2011), and RC (Cheng et al. 2011). In all of them, the salience values of pixels are represented as gray levels normalized to the range [0, 1]; the brighter a pixel, the higher its salience. From their typical results shown in Fig. 4, we observe that pixels with higher salience (shown as brighter pixels) lie near high-contrast positions (e.g., object boundaries) or within high-contrast regions (e.g., a textured region), and that they are all related to the object of interest in the image. In contrast, pixels in the background tend to have lower salience and are shown in black. Interactive methods such as GrabCut (Rother et al. 2004), graph cuts (Boykov and Jolly 2001), or MSRM (maximal similarity-based region merging) (Ning et al. 2010) yield good results when the locations of pixels with higher salience are marked as prior inputs. That is, high-contrast positions or regions are good candidate places for prior user interaction. Inspired by these observations, we build a saliency-seeded automatic interaction scheme as follows.

Our way of automatic interaction

Specifically, we mark pixels with the highest salience as ‘object’ (denoted ‘O’) and pixels with the lowest salience as ‘background’ (denoted ‘B’). That is, we pick the pixels with salience at or above a threshold \(T_O\) as the prior object seeds, and the pixels with salience at or below a threshold \(T_B\) as the prior background seeds (\(T_O> T_B\)):
$$\begin{aligned} O=\{(x,y)\mid s(x,y)\ge {T}_{O}\} \end{aligned}$$
(1)
$$\begin{aligned} B=\{(x,y)\mid s(x,y)\le {T}_{B}\}, \end{aligned}$$
(2)
where s(x, y) is the salience value of pixel (x, y). However, it is difficult to find general-purpose values for these two thresholds, because the objects and backgrounds in different images tend to have different salience values.
We therefore specify two alternative parameters, \(P_{O}\) and \(P_{B}\), that represent the proportions of prior object and background seeds in an image I:
$$\begin{aligned} {\rm Pr}(O)={\rm Pr}(s(x,y)\ge {T}_{O})={P}_{O} \end{aligned}$$
(3)
$$\begin{aligned} {\rm Pr}(B)={\rm Pr}(s(x,y)\le {T}_{B})= {P}_{B}, \end{aligned}$$
(4)
where \({\rm Pr}(\cdot )\) is a probability function and defined as
$$\begin{aligned} {\rm Pr}(O)=\frac{|O|}{|I|};\quad {\rm Pr}(B)=\frac{|B|}{|I|}. \end{aligned}$$
(5)
\(|\cdot |\) denotes the number of elements in a set. We observe that in each saliency map, the proportion of pixels with the highest salience is about 2–5%, and the proportion of pixels with the lowest salience (shown in black) is close to \(50\%\) (see Fig. 5c). We therefore select a value for \(P_{O}\) in the range [0.02, 0.05] and set \(P_{B}\) to 0.5. As shown in Fig. 5d, the object and background seed inputs are well determined.
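Because Eqs. (3)–(5) fix the proportion of seeds rather than the thresholds themselves, \(T_O\) and \(T_B\) can be read off as quantiles of the saliency values. The following sketch illustrates this; the helper name `saliency_seeds` is hypothetical, and using `np.quantile` with its default linear interpolation is our assumption rather than the authors' documented procedure.

```python
import numpy as np

def saliency_seeds(saliency, p_o=0.05, p_b=0.5):
    """Hypothetical helper: object/background seed masks from a saliency map.

    T_O is the (1 - p_o) quantile, so roughly a fraction p_o of pixels satisfy
    s >= T_O (Eqs. 1, 3); T_B is the p_b quantile, so roughly a fraction p_b
    of pixels satisfy s <= T_B (Eqs. 2, 4).
    """
    s = np.asarray(saliency, dtype=float)
    t_o = float(np.quantile(s, 1.0 - p_o))
    t_b = float(np.quantile(s, p_b))
    if t_o <= t_b:
        raise ValueError("saliency map too flat: need T_O > T_B")
    object_seeds = s >= t_o
    background_seeds = s <= t_b
    return object_seeds, background_seeds, t_o, t_b

# Toy saliency map with 100 distinct values in [0, 1].
sal = np.arange(100).reshape(10, 10) / 99.0
obj, bg, t_o, t_b = saliency_seeds(sal, p_o=0.05, p_b=0.5)
print(obj.sum(), bg.sum())  # 5 50
```

When many pixels share the same saliency value (for example, a large uniformly black background), the masks can contain more than the nominal fraction of pixels, because all tied values fall on the same side of the threshold.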
However, this approach still marks too many inputs, especially in the background. To reduce them, we apply the morphological ‘thin’ operation to the marked object and background seeds separately. We use the MATLAB R2010b function bwmorph(BW, operation, n), which applies the specified morphological operation n times to the binary image BW. Specifically, we apply the ‘thin’ operation repeatedly until the image no longer changes, i.e., operation = ‘thin’ and n = Inf. As a result, the ‘thin’ operation removes pixels so that seed regions without holes shrink to a minimally connected stroke, and regions with holes shrink to a ring halfway between each hole and the outer boundary (see Fig. 5e).
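The method relies on MATLAB's bwmorph(BW, 'thin', Inf). As a dependency-free illustration of thinning to convergence, the sketch below implements the classic Zhang–Suen thinning rule in Python/NumPy. This is a swapped-in technique: it is not guaranteed to match the rule bwmorph uses, so the resulting strokes can differ slightly (for instance, Zhang–Suen deletes an isolated 2×2 block entirely). The function name is hypothetical.

```python
import numpy as np

def zhang_suen_thin(mask):
    """Thin a binary mask with the Zhang-Suen rule, repeating until nothing changes.

    Illustrative stand-in for bwmorph(BW, 'thin', Inf); results may differ from it.
    """
    img = np.pad(np.asarray(mask, dtype=np.uint8), 1)  # zero border: every pixel has 8 neighbours
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise starting from north.
                p = [img[r - 1, c], img[r - 1, c + 1], img[r, c + 1], img[r + 1, c + 1],
                     img[r + 1, c], img[r + 1, c - 1], img[r, c - 1], img[r - 1, c - 1]]
                b = int(sum(p))  # number of foreground neighbours
                a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                if step == 0:
                    side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if 2 <= b <= 6 and a == 1 and side_ok:
                    to_delete.append((r, c))
            # Delete in parallel after the full scan of this sub-iteration.
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1].astype(bool)

rect = np.ones((3, 7), dtype=bool)      # a filled seed region without holes
stroke = zhang_suen_thin(rect)
print(int(stroke.sum()), "of", int(rect.sum()), "pixels kept")
```

A one-pixel-wide stroke is already thin and is returned unchanged, which mirrors the "until the image no longer changes" stopping rule described above.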
Fig. 9

Object segmentation of SSaSW based on different saliency maps. From left to right initial image, IT-seeded, MZ-seeded, GB-seeded, SR-seeded, AC-seeded, CA-seeded, FT-seeded, LC-seeded, HC-seeded, RC-seeded, and Ground truth

Fig. 10

Similarity measure comparison between MCS and MSWS. a Input images; b initial mean shift segmentation and the input markers; c and d segmentation results based on MCS and MSWS, respectively

Fig. 11

Segmentation results by RCC and SSaSW. a and d are original images. b and e show the segmentation results from RCC, and c and f are the segmentation results from SSaSW

Fig. 12

Segmentation results with different parameters. a Initial image and saliency map; b saliency-seeded interactions and segmentation results with \(P_{O}=0.05\), \(P_{B}=0.5\); c \(P_{O}=0.02\), \(P_{B}=0.5\)

Thus, only a small portion of image pixels are marked as prior interaction inputs, and they reflect human attention well. More importantly, they are all obtained without any manual user effort and adapt to the image content.

MSWS-based region merging

After the seeds are placed, some over-segmented regions may contain both object seeds and background seeds. Before the merging step, we label each region that contains more object (or background) seeds as an object (or background) marker region, and each region without any seed as a non-marker region. The aim of MSWS-based merging is to assign the correct label ‘O’ or ‘B’ to each non-marker region. The merging process contains two stages, which are executed repeatedly until no new merging occurs. (i) Merging non-marker regions with background marker regions: for each background marker region, if a non-marker region satisfies the MSWS criterion with it, the two regions are merged and the new region is labeled ‘B.’ (ii) Adaptively merging the non-marker regions that remain after the first stage: for each non-marker region, if another non-marker region satisfies the MSWS criterion with it, the two are merged into a new non-marker region. In what follows, we briefly review the principle of maximal color similarity (MCS) used in MSRM and, based on it, explain why the spatial distance between regions also matters for merging.
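The labeling that precedes merging can be sketched as follows, assuming the over-segmentation is given as an integer label map and the seeds as boolean masks of the same shape. The function name is hypothetical, and the tie rule (equal numbers of object and background seeds leave a region as a non-marker) is our assumption, since the text does not specify it.

```python
import numpy as np

def label_marker_regions(region_map, object_seeds, background_seeds):
    """Label each region 'O', 'B', or 'N' (non-marker) by the majority of seeds it contains."""
    labels = {}
    for r in np.unique(region_map):
        inside = region_map == r
        n_o = int(np.count_nonzero(object_seeds & inside))
        n_b = int(np.count_nonzero(background_seeds & inside))
        if n_o > n_b:
            labels[int(r)] = "O"
        elif n_b > n_o:
            labels[int(r)] = "B"
        else:
            labels[int(r)] = "N"  # no seeds at all, or (assumption) a tie
    return labels

region_map = np.array([[0, 0, 1, 1],
                       [2, 2, 3, 3]])
obj = np.zeros((2, 4), dtype=bool)
obj[0, 0] = obj[0, 1] = obj[1, 2] = True
bg = np.zeros((2, 4), dtype=bool)
bg[0, 2] = bg[0, 3] = bg[1, 3] = True
print(label_marker_regions(region_map, obj, bg))  # {0: 'O', 1: 'B', 2: 'N', 3: 'N'}
```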

Overview of MCS

Color is a simple and effective low-level attribute that is commonly used for image segmentation. The idea is that regions from the same object are more similar in color than regions from different objects. Specifically, MCS is a very useful merging principle described in MSRM: it merges two neighboring regions that have the maximal similarity in color. That is, for a region R, let Q denote an adjacent region of R (i.e., a region with at least one pixel adjacent to R); if
$$\begin{aligned} \rho _{c} (R,Q^{*})=\max \limits _{Q\in N(R)}\rho _{c}(R,Q) \end{aligned}$$
(6)
\(Q^{*}\) is called the most similar region to R and is merged with R, where \(\rho _{c}(R,Q)\) denotes the color similarity between R and Q, and N(R) is the set of R’s all adjacent regions. By this “max” operator, the merging process avoids a preset similarity threshold. However, the “max” operator may be somewhat sensitive to noise. To avoid this issue, MSRM uses an RGB histogram to represent each region. In the RGB space, each channel is uniformly quantized into 16 levels, and then a color space of \(16\times 16\times 16=4096\) bins is used to calculate the histogram of each region. MSRM computes the color similarity of regions as the Bhattacharyya coefficient between two histograms:
$$\begin{aligned} \rho _{c} (R,Q)=\sum _{u=1}^{4096}\sqrt{{\rm Hist}_R^u\cdot {\rm Hist}_Q^u}, \end{aligned}$$
(7)
where \({\rm Hist}_R\) and \({\rm Hist}_Q\) denote the normalized color histograms of R and Q respectively, and the superscript u represents the uth bin.
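Eqs. (6) and (7) can be sketched in Python/NumPy as follows: each RGB channel is quantized to 16 levels, the 4096-bin histogram is normalized, and the neighbor with the largest Bhattacharyya coefficient is selected. The function names and the (n, 3) array of 0–255 pixel values per region are assumptions made for illustration.

```python
import numpy as np

LEVELS = 16  # quantization levels per RGB channel -> 16**3 = 4096 bins

def rgb_histogram(pixels, levels=LEVELS):
    """Normalized joint RGB histogram of one region; pixels is an (n, 3) array in 0..255."""
    q = (np.asarray(pixels, dtype=np.int64) * levels) // 256  # per-channel bin 0..levels-1
    idx = (q[:, 0] * levels + q[:, 1]) * levels + q[:, 2]      # flatten to one bin index
    hist = np.bincount(idx, minlength=levels ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(h_r, h_q):
    """Eq. (7): color similarity rho_c(R, Q) of two normalized histograms."""
    return float(np.sum(np.sqrt(h_r * h_q)))

def most_similar_neighbor(r, neighbors, hists):
    """Eq. (6): the adjacent region Q* that maximizes rho_c(R, Q)."""
    return max(neighbors, key=lambda q: bhattacharyya(hists[r], hists[q]))

hists = {
    "R": rgb_histogram([[250, 10, 10]] * 4),                        # red region
    "Q1": rgb_histogram([[250, 10, 10]] * 2 + [[10, 250, 10]] * 2),  # half red, half green
    "Q2": rgb_histogram([[10, 10, 250]] * 4),                       # blue region
}
print(most_similar_neighbor("R", ["Q1", "Q2"], hists))  # Q1
```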

Our MSWS criterion

Note that in MCS all neighboring regions are treated equally during merging, and only color information is used to judge the similarity between regions. This has limitations. The approach may fail in the presence of low-contrast edges or shadows. It may also fail when part of the object is slightly more similar in color to an adjacent background region than to the adjacent object regions, or vice versa.
Fig. 13

F-measure evaluations

Fig. 14

Result comparisons. From left to right graph cuts, GrabCut, and SSaSW. In the first row, the green and blue strokes are the corresponding object and background seeds in graph cuts. The red rectangle around the desired object is the interaction in GrabCut

Fig. 15

Segmentation results on some shadow images and medical vascular images

Fig. 16

Failure cases of SSaSW. 1st row initial images; 2nd row initial mean shift segmentations; 3rd row saliency maps; 4th row saliency-seeded interactions; 5th row segmentation results by our SSaSW; 6th row corresponding human segmentations

We take the yellow flower shown in Fig. 2 as an example. The flower consists of two parts: petals and stamen. Although both parts are yellow, the stamen is slightly darker than the surrounding petals. In Fig. 2b (first row), only parts of the petals are marked as belonging to the object, and a small portion of the background is present in the segmented object. In Fig. 2b (second row), it can be seen that the prior interactions are well designed, but the segmentation problem remains. The object cannot be reliably extracted from the background by either of these two interaction inputs. This example illustrates that even if the prior interactions are well designed, a satisfactory result cannot be obtained for this image. This is mainly because the object of interest is not piece-wise smooth or nearly constant in color and the contrast between the object and background is low. These problems are relatively common in natural images. Therefore, using only color information cannot ensure good segmentation performance for these natural images.

To solve this problem, we propose a novel rule termed maximal spatially weighted similarity (MSWS) to merge regions. It takes into account both the color similarity and the spatial distance of the candidate regions for merging. The implied idea is that regions of the same object are spatially adjacent and their colors are similar enough to each other. That is, one aims to merge the regions that not only have the highest similarity in color, but that also are the nearest to each other. Specifically, for two regions R and Q, we first define the spatial distance as
$$\begin{aligned} \rho _{s} (R,Q)={\Vert center_{R}-center_{Q}\Vert }_{2} \end{aligned}$$
(8)
where \(center_{R}\) and \(center_{Q}\) are the center pixel coordinates of regions R and Q, respectively, and \({\Vert \cdot \Vert }_2\) denotes the Euclidean norm. The lower \(\rho _{s} (R,Q)\) is for a pair of regions, the higher their spatial similarity. Integrating the spatial distance directly into the color similarity, we define the MSWS as
$$\begin{aligned} \rho (R,Q)={\text{ exp }(-{\rho _{s} (R,Q)}/{\sigma ^{2}})}\cdot {\rho _{c} (R,Q)}, \end{aligned}$$
(9)
where \(\sigma\) controls the effect of spatial distance in the maximal spatially weighted similarity measure. In our experiments, we use \(\sigma ^{2}=1\) empirically. Note that, although we choose the RGB color space and Bhattacharyya coefficient to compute the color similarity as in Ning et al. (2010), other color spaces (e.g., HSI) and distance metrics (e.g., Euclidean distance) can also be used here.
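The sketch below implements Eqs. (8) and (9) on toy values. It assumes that the ‘center’ of a region is the mean of its pixel coordinates (the text says only ‘center pixel coordinates’), and the two-bin histograms are hypothetical values chosen to show how the spatial weight can overturn a purely color-based choice.

```python
import numpy as np

def region_center(coords):
    """Assumed definition of center_R: mean (row, col) of the region's pixels."""
    return np.asarray(coords, dtype=float).mean(axis=0)

def color_similarity(h_r, h_q):
    """Bhattacharyya coefficient, Eq. (7)."""
    return float(np.sum(np.sqrt(h_r * h_q)))

def msws(h_r, h_q, c_r, c_q, sigma2=1.0):
    """Eq. (9): exp(-rho_s / sigma^2) * rho_c, with rho_s from Eq. (8)."""
    rho_s = float(np.linalg.norm(c_r - c_q))
    return float(np.exp(-rho_s / sigma2) * color_similarity(h_r, h_q))

h_r = np.array([1.0, 0.0])
c_r = region_center([[0, 0]])
candidates = {
    # name: (histogram, center)
    "far": (np.array([0.81, 0.19]), region_center([[0, 3]])),   # rho_c = 0.9, distance 3
    "near": (np.array([0.25, 0.75]), region_center([[0, 1]])),  # rho_c = 0.5, distance 1
}
by_color = max(candidates, key=lambda k: color_similarity(h_r, candidates[k][0]))
by_msws = max(candidates, key=lambda k: msws(h_r, candidates[k][0], c_r, candidates[k][1]))
print(by_color, by_msws)  # far near
```

With \(\sigma^2=1\) and distances measured in pixels, the spatial factor decays quickly (\(e^{-10}\approx 4.5\times 10^{-5}\) at 10 pixels), so among neighbors at noticeably different distances the nearest tends to dominate; the text does not state whether coordinates are normalized.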

We use Fig. 6 as a toy example to explain the rationale behind our MSWS criterion. Fig. 6a is the initial over-segmentation result. It contains four different homogeneous regions, denoted by A, B, C, and D (Fig. 6c). We assume that A and C are object regions, and B and D are background regions. In the MCS-based labeled result (Fig. 6h), regions B and C are labeled ‘O.’ The labeling result that uses only spatial information is shown in Fig. 6j; in this case, regions B and C are labeled ‘B.’ However, as shown in Fig. 6i, the corresponding MSWS-based result is consistent with the benchmark (Fig. 6a). This shows that our criterion can improve the performance of region merging-based methods by jointly considering the color similarity and the spatial distance of the candidate regions.

Figure 7 shows the segmentation results based on MCS and MSWS criterion. In the MCS-based results, the objects of interest cannot be segmented accurately. In the person image, parts of the object are merged into the background; in the flower image, a small portion of the background regions are erroneously integrated into the object (see Fig. 7b). Figure 7c shows the segmentation results by our proposed method. Clearly, it can effectively and accurately extract the objects from their backgrounds.
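To tie the pieces together, here is a compact sketch of the two-stage merging loop on a region adjacency graph. Several details are our assumptions rather than statements from the text: ‘satisfies the MSWS criterion’ is read, as in MSRM, as ‘the candidate's most similar neighbor under Eq. (9) is the current region’; merged regions get size-weighted histograms and centers; and every region not labeled ‘B’ when merging stops is labeled ‘O’. All names are hypothetical.

```python
import numpy as np

def msws(a, b, sigma2=1.0):
    """Eq. (9) on region records: exp(-||c_a - c_b|| / sigma^2) * Bhattacharyya(h_a, h_b)."""
    d = np.linalg.norm(a["center"] - b["center"])
    return float(np.exp(-d / sigma2) * np.sum(np.sqrt(a["hist"] * b["hist"])))

def absorb(regions, adj, keep, drop):
    """Merge region `drop` into `keep`: update histogram, center, size, and adjacency."""
    ra, rb = regions[keep], regions[drop]
    n = ra["size"] + rb["size"]
    ra["hist"] = (ra["hist"] * ra["size"] + rb["hist"] * rb["size"]) / n
    ra["center"] = (ra["center"] * ra["size"] + rb["center"] * rb["size"]) / n
    ra["size"] = n
    for nb in adj.pop(drop):
        adj[nb].discard(drop)
        if nb != keep:
            adj[nb].add(keep)
            adj[keep].add(nb)
    del regions[drop]

def best_neighbor(regions, adj, q, sigma2):
    return max(adj[q], key=lambda s: msws(regions[q], regions[s], sigma2))

def merge_regions(regions, adj, labels, sigma2=1.0):
    changed = True
    while changed:
        changed = False
        # Stage (i): a non-marker region joins an adjacent background marker region.
        for b in [r for r in regions if labels[r] == "B"]:
            for q in list(adj[b]):
                if q in regions and labels[q] == "N" and best_neighbor(regions, adj, q, sigma2) == b:
                    absorb(regions, adj, b, q)
                    changed = True
        # Stage (ii): remaining non-marker regions merge with each other.
        for r in [r for r in regions if labels[r] == "N"]:
            if r not in regions:
                continue
            for h in list(adj[r]):
                if h in regions and labels[h] == "N" and best_neighbor(regions, adj, h, sigma2) == r:
                    absorb(regions, adj, r, h)
                    changed = True
    return {r: ("B" if labels[r] == "B" else "O") for r in regions}

def make(hist, x):
    return {"hist": np.array(hist, dtype=float), "center": np.array([0.0, x]), "size": 1}

# Four regions in a row: background marker, two non-markers, object marker.
regions = {0: make([1.0, 0.0, 0.0], 0), 1: make([0.9, 0.1, 0.0], 1),
           2: make([0.0, 0.1, 0.9], 2), 3: make([0.0, 0.0, 1.0], 3)}
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
labels = {0: "B", 1: "N", 2: "N", 3: "O"}
print(merge_regions(regions, adj, labels))  # {0: 'B', 2: 'O', 3: 'O'}
```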

Experiments and comparisons

Experiment setting

Datasets

In this section, we evaluate the performance of our proposed algorithm from multiple perspectives. The experiments are conducted on two public image databases. The first is the Berkeley Segmentation Database, denoted BSDS300 (Martin et al. 2001), an information-rich dataset that contains 300 images of complex natural scenes, each with five to ten human hand-labeled ground-truth segmentations. The second database, MSRA1000, is provided by Achanta et al. (2009). It consists of 1000 images with obvious salient objects and clean backgrounds, with a manually generated segmentation for each image.

Parameters setting

\(P_{O}\) and \(P_{B}\) are two important parameters that our method uses to obtain the object and background seeds. To determine their values, we conducted a detailed analysis on the MSRA1000 dataset.

We analyzed this problem mainly from two aspects:
  (i) From the aspect of the accuracy that the saliency detection algorithm brings to our prior interactions, we conducted extensive statistical experiments over ten saliency detection methods with different thresholds \(P_{O}\) and \(P_{B}\). For each saliency method, we computed the average accuracy-\(P_{O}\) and accuracy-\(P_{B}\) curves on the MSRA1000 dataset; all curves are presented in Fig. 8. Fig. 8 shows that when \(P_{O}\le 5\%\), the accuracy is above \(50\%\) for all ten methods (SR performs worst at \(P_{O}=5\%\)), and when \(P_{O}> 5\%\), the accuracy decreases gradually. We consider an accuracy below \(50\%\) unacceptable, so we set the maximum value of \(P_{O}\) to \(5\%\). Fig. 8 also shows that when \(P_{B}\) is near \(50\%\), the accuracy is higher than \(95\%\) for most saliency models.
  (ii) From the aspect of the foreground object size, we computed the proportion of the whole image occupied by the foreground object for every image in the MSRA1000 dataset. Of the 1000 images, only 21 have a very small proportion, less than \(5\%\), and among these only two have a proportion below \(2\%\). We therefore set the minimum value of \(P_{O}\) to \(2\%\). In addition, only a few images have a background proportion below \(50\%\).
Taking these two aspects into consideration, and for a fair comparison with other methods, we select a value for \(P_{O}\) in the range [0.02, 0.05] and set \(P_{B}\) to 0.5 throughout this paper.
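The text does not define ‘accuracy’ precisely. A plausible reading, which we assume here, is the fraction of object seeds that fall inside the ground-truth object (and of background seeds that fall outside it). The hypothetical helpers below compute that for one image, together with the foreground proportion used in aspect (ii); averaging over a dataset for several \(P_{O}\) or \(P_{B}\) values would give curves like those in Fig. 8.

```python
import numpy as np

def seed_accuracy(saliency, gt_mask, p_o, p_b):
    """Assumed accuracy: share of object seeds inside, and background seeds outside, the GT object."""
    s = np.asarray(saliency, dtype=float)
    gt = np.asarray(gt_mask, dtype=bool)
    obj = s >= np.quantile(s, 1.0 - p_o)
    bg = s <= np.quantile(s, p_b)
    acc_o = np.count_nonzero(obj & gt) / max(np.count_nonzero(obj), 1)
    acc_b = np.count_nonzero(bg & ~gt) / max(np.count_nonzero(bg), 1)
    return float(acc_o), float(acc_b)

def foreground_fraction(gt_mask):
    """Proportion of the image covered by the ground-truth object (aspect (ii))."""
    return float(np.mean(np.asarray(gt_mask, dtype=bool)))

sal = np.arange(100).reshape(10, 10) / 99.0
gt = sal >= 0.9  # toy ground truth: the 10 most salient pixels
print(seed_accuracy(sal, gt, 0.05, 0.5), foreground_fraction(gt))  # (1.0, 1.0) 0.1
```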

Qualitative result comparisons

The two main stages of our proposed method are the saliency-seeded automatic interaction and the MSWS-based region merging. To verify their effectiveness, we conduct extensive experiments on the two test datasets.

Results based on different saliency detection methods

Figure 9 illustrates the corresponding segmentation results of SSaSW based on different saliency detection methods IT (Itti et al. 1998), MZ (Ma and Zhang 2003), GB (Harel et al. 2007), SR (Hou and Zhang 2007), AC (Achanta et al. 2008), CA (Goferman et al. 2010), FT (Achanta et al. 2009), LC (Zhai and Shah 2006), HC (Cheng et al. 2011), and RC (Cheng et al. 2011). These images are from the MSRA1000 database. We can clearly see that SSaSW yields satisfactory segmentation results from most of these methods, except for AC and SR. Therefore, most saliency detection methods except for AC and SR can provide the proper automatic interactions for SSaSW. In the following experiments, the RC saliency map is used to automatically determine prior interactions.

Effectiveness analysis of our MSWS

We compare the performance of our MSWS criterion with that of the MCS criterion; note that MCS can be seamlessly embedded into our framework. All experiments are conducted on the BSDS300 database. Figure 10 shows the segmentation results of the MCS- and MSWS-based region merging methods. In these images, some objects contain low-contrast edges, or parts of the background are very similar in color to the adjacent object regions. It is difficult to achieve satisfactory results in these cases with MCS. However, given the same markers, MSWS achieves much better results than MCS.

Quantitative result evaluations

Evaluations on the BSDS300 database

So far, the effectiveness of MSWS has been evaluated only visually, and visual observation is subjective. To demonstrate the performance objectively, we use quantitative performance measures: a probabilistic measure, PRI (Unnikrishnan et al. 2007), and two metrics, VoI (Meila 2005) and GCE (Martin et al. 2001). The three measures are described below:
  1. Probabilistic Rand Index (PRI) (higher is better): The Rand index, as used in Unnikrishnan et al. (2005), is the fraction of pixel pairs whose labels are consistent between the test segmentation S and the ground-truth segmentation G, i.e., pairs that share a label in both or differ in both. PRI (Unnikrishnan et al. 2007) is a simple extension of the Rand index that compares a segmentation with a set of ground-truth segmentations by averaging over them. Given a set of K ground-truth segmentations \(\{G_k\}\) of an image with N pixels, the PRI is defined as
    $$\begin{aligned} \text{ PRI }(S,\{G_k\})=\frac{1}{\binom{N}{2}}\sum \limits _{i<j}[c_{ij}p_{ij}+(1-c_{ij})(1-p_{ij})], \end{aligned}$$
    (10)
    where \(c_{ij}=1\) if pixels i and j have the same label in S and \(c_{ij}=0\) otherwise, and \(p_{ij}\) is the probability that i and j have the same label, estimated as the fraction of the K ground-truth segmentations in which they share a label. PRI is thus based on pairwise relationships and correlates highly with human hand-labeled segmentations.
     
  2. Variation of Information (VoI) (lower distance is better): In contrast to PRI, VoI (Meila 2005) is based on the relationship between a pixel and its own cluster, and views a clustering as an element of a lattice. As a metric, VoI measures the distance between two clusterings (here, segmentations) by the sum of their conditional entropies, \(H(R_1|R_2)+H(R_2|R_1)\), which can equivalently be written as
    $$\begin{aligned} \text{ VoI }(R_1,R_2)=H(R_1)+H(R_2)-2I(R_1,R_2), \end{aligned}$$
    (11)
    where H denotes the entropy of a segmentation and I the mutual information between the segmentations \(R_1\) and \(R_2\). VoI is a form of external evaluation: it measures the amount of information lost or gained in changing from one clustering to the other.
     
  3. Global Consistency Error (GCE) (lower distance is better): GCE, a supervised evaluation measure introduced by Martin et al. (2001), quantifies the consistency between two segmentations. Let \(R(S,p)\) be the set of pixels in the same region as pixel p in segmentation S, let \(|\cdot |\) denote the cardinality of a set, and let \(\cdot \setminus \cdot\) denote set difference. The local refinement error is
    $$\begin{aligned} E (S_1, S_2, p) = \frac {| R\,(S_1, p) \setminus R\, (S_2, p)|}{|R\, (S_1, p)|}. \end{aligned}$$
    (12)
    Then the GCE is defined as
    $$\begin{aligned} \text{ GCE }(S_1, S_2)=\frac{1}{n}\min \left\{\sum \limits _{i}E(S_1, S_2, p_i), \sum \limits _{i}E(S_2, S_1, p_i) \right\}, \end{aligned}$$
    (13)
    where n is the number of pixels in the image. Note that GCE forces all local refinements to be in the same direction, and therefore it does not penalize over-segmentation.
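To make the three definitions above concrete, the following Python sketch computes PRI, VoI, and GCE for segmentations given as flat label sequences (one label per pixel). It is an illustrative implementation written for this text, not the evaluation code used in the experiments; the function names are hypothetical, the pairwise PRI loop is quadratic in the number of pixels (so it only suits tiny inputs), and natural logarithms are assumed for VoI (the base only rescales the values).

```python
from collections import Counter
from itertools import combinations
from math import comb, log


def pri(seg, gts):
    """Probabilistic Rand Index (Eq. 10) of `seg` against the ground truths `gts`.

    Explicit sum over all pixel pairs, O(n^2) in the number of pixels n.
    """
    n = len(seg)
    total = 0.0
    for i, j in combinations(range(n), 2):
        same_in_s = seg[i] == seg[j]                          # c_ij
        p_ij = sum(g[i] == g[j] for g in gts) / len(gts)      # fraction of G_k agreeing
        total += p_ij if same_in_s else 1.0 - p_ij
    return total / comb(n, 2)


def entropy(labels):
    """Entropy (natural log) of the region-size distribution of a segmentation."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two segmentations of the same pixels."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(nab / n * log(nab * n / (count_a[x] * count_b[y]))
               for (x, y), nab in joint.items())


def voi(a, b):
    """Variation of Information (Eq. 11)."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def gce(s1, s2):
    """Global Consistency Error (Eqs. 12-13).

    The set difference of R(S1,p) and R(S2,p) has |R(S1,p)| minus the number of
    pixels sharing both of p's labels, so E(S1,S2,p) = 1 - joint / |R(S1,p)|.
    """
    n = len(s1)
    size1, size2, joint = Counter(s1), Counter(s2), Counter(zip(s1, s2))
    e12 = sum(1.0 - joint[(x, y)] / size1[x] for x, y in zip(s1, s2))
    e21 = sum(1.0 - joint[(x, y)] / size2[y] for x, y in zip(s1, s2))
    return min(e12, e21) / n
```

For example, `gce([0, 0, 1, 1], [0, 0, 0, 0])` is 0 because the first segmentation is a refinement of the second, which illustrates why GCE does not penalize over-segmentation, whereas `voi` for the same pair equals log 2.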
     
Table 1 compares MSWS and MCS on the ten images presented in Fig. 10 in terms of PRI, GCE, and VoI, where 'No.' denotes the image ID. For every image, MSWS outperforms MCS on all three measures. Over all 300 images of the BSDS300 dataset, the average PRI of MSWS is 0.5551, higher than the 0.5476 of MCS, and the average GCE and VoI of MSWS are 0.0561 and 2.0146, lower than the MCS averages of 0.0646 and 2.0519.
Table 1

Quantitative comparison of the results of our method based on MSWS and MCS on the ten images presented in Fig. 10

| No. | PRI (MCS) | PRI (MSWS) | GCE (MCS) | GCE (MSWS) | VoI (MCS) | VoI (MSWS) |
|---|---|---|---|---|---|---|
| 38092 | 0.6229 | *0.6979* | 0.0810 | *0.0604* | 2.4606 | *2.4236* |
| 41033 | 0.7001 | *0.7010* | 0.0293 | *0.0272* | 1.7014 | *1.6997* |
| 62096 | 0.5739 | *0.5962* | 0.2916 | *0.0138* | 1.9264 | *1.227* |
| 101087 | 0.3914 | *0.4014* | 0.0326 | *0.0277* | 2.9608 | *2.9295* |
| 108082 | 0.7099 | *0.7292* | 0.0701 | *0.0568* | 1.2445 | *1.1788* |
| 123074 | 0.3353 | *0.3389* | 0.0337 | *0.0236* | 2.2989 | *2.2529* |
| 160068 | 0.6084 | *0.6487* | 0.1227 | *0.0584* | 2.0237 | *1.9315* |
| 175043 | 0.8036 | *0.8116* | 0.0277 | *0.0107* | 1.111 | *1.1087* |
| 296059 | 0.6776 | *0.6784* | 0.0293 | *0.0256* | 2.0147 | *1.9971* |
| 376043 | 0.6378 | *0.6475* | 0.0271 | *0.0162* | 1.704 | *1.6399* |
| Average (300 images) | 0.5476 | *0.5551* | 0.0646 | *0.0561* | 2.0519 | *2.0146* |

The average values over 300 images of the BSDS300 dataset are also included

Italic indicates best performance

Evaluations on the MSRA1000 database

To demonstrate the effectiveness of our method, we run it with six recently proposed saliency detection methods, LR (Shen and Wu 2012), SF (Perazzi et al. 2012), HS (Yan et al. 2013), MR (Yang et al. 2013), DS (Li et al. 2013), and AMC (Jiang et al. 2013), on the MSRA1000 database, and compare our object segmentation results with the adaptive-thresholding segmentation results of these saliency maps. The adaptive threshold, proposed by Achanta et al. (2009), is image-dependent, since it is derived from the saliency of each image. In the adaptive-thresholding segmentation, each saliency map is first over-segmented by mean shift. The average saliency of each segment is then calculated, together with the overall mean saliency of the entire image. A segment whose average saliency is larger than twice the overall mean saliency is marked as foreground; otherwise, it is marked as background. This yields a binary segmentation map.
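The adaptive-thresholding rule above can be sketched in Python as follows. The sketch assumes the mean-shift over-segmentation is already available as an integer label map of the same size as the saliency map (the mean-shift step itself is not shown); the function name is hypothetical.

```python
from collections import defaultdict


def adaptive_threshold_segmentation(saliency, segments):
    """Binary map: a segment is foreground if its mean saliency > 2 x image mean saliency.

    saliency: 2-D list of floats; segments: 2-D list of integer segment labels
    (assumed to come from a mean-shift over-segmentation computed elsewhere).
    """
    seg_sum = defaultdict(float)
    seg_count = defaultdict(int)
    total, n = 0.0, 0
    for sal_row, seg_row in zip(saliency, segments):
        for s, label in zip(sal_row, seg_row):
            seg_sum[label] += s
            seg_count[label] += 1
            total += s
            n += 1
    threshold = 2.0 * total / n            # twice the overall mean saliency
    foreground = {label for label in seg_sum
                  if seg_sum[label] / seg_count[label] > threshold}
    return [[1 if label in foreground else 0 for label in seg_row]
            for seg_row in segments]
```

For a 2 x 4 image whose segment 0 (two pixels) has saliency 0.9 and whose segment 1 (six pixels) has saliency 0, the image mean is 0.225, the threshold is 0.45, and only segment 0 is marked as foreground.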

The F-measure is used to assess the agreement between each segmentation result and the ground truth; it is defined as
$$\begin{aligned} {\text F}{\text {-measure}} = \frac{(1+\beta ^{2})\times {\rm Precision}\times {\rm Recall}}{\beta ^{2}\times {\rm Precision} + {\rm Recall}}. \end{aligned}$$
(14)
We use \(\beta ^{2}=0.3\) to weigh precision more than recall. Table 2 shows the F-measure scores of SSaSW and of the adaptive-thresholding segmentation. With every saliency method, our method performs better than the adaptive-thresholding segmentation. These results also demonstrate the effectiveness of the proposed saliency-seeded interaction and the maximal spatially weighted similarity criterion.
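The following Python sketch computes precision, recall, and the F-measure of Eq. (14) for binary masks stored as flat 0/1 lists. The function names and the choice to return 0 when a denominator vanishes are assumptions made for illustration.

```python
def precision_recall(pred, gt):
    """Precision and recall of binary mask `pred` against ground truth `gt` (flat 0/1 lists)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)   # true-positive pixels
    n_pred, n_gt = sum(pred), sum(gt)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def f_measure(precision, recall, beta2=0.3):
    """Eq. (14); beta2 < 1 weighs precision more heavily than recall."""
    denom = beta2 * precision + recall
    return (1.0 + beta2) * precision * recall / denom if denom else 0.0
```

Applied to the precision and recall reported for image 2_82_82074 in Table 3 (P = 0.9626, R = 0.5449), `f_measure` gives approximately 0.8179, matching the tabulated value.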
Table 2

F-measure evaluations with different saliency methods on the MSRA1000 database

| Saliency method | Adaptive thresholding | SSaSW |
|---|---|---|
| LR | 0.7837 | *0.8782* |
| SF | 0.8157 | *0.8969* |
| HS | 0.8526 | *0.9034* |
| MR | 0.8943 | *0.9126* |
| DS | 0.8568 | *0.9037* |
| AMC | 0.8944 | *0.9169* |

The best results are highlighted in italics

Furthermore, we compare the segmentation results of SSaSW and RCC (Cheng et al. 2011) with the human-labeled ground truth for each image. RCC is an RC-based cut algorithm that initializes GrabCut with the RC saliency map instead of human input. Figure 11 compares the segmentation results of SSaSW and RCC. As Fig. 11c and f show, SSaSW effectively extracts each object of interest from the background, whereas RCC has difficulty with images containing cluttered and highly textured objects or backgrounds (see Fig. 11b, e). Table 3 lists the precision, recall, and F-measure on these test images and shows that our results are highly consistent with the ground truth. The average F-measure of SSaSW on the MSRA1000 database is 0.8749. These experiments use the parameters \(P_{O}=0.05\), \(P_{B}=0.5\), and \(\sigma ^{2}=1\) throughout.
Table 3

Precision (P), Recall (R), and F-measure values for test images

| Image | P | R | F-measure | Image | P | R | F-measure |
|---|---|---|---|---|---|---|---|
| 0_0_77 | 0.9922 | 0.9224 | 0.9751 | 1_43_43183 | 0.9888 | 0.9903 | 0.9892 |
| 0_0_280 | 0.9948 | 0.9799 | 0.9913 | 1_44_44379 | 0.9932 | 0.9877 | 0.9919 |
| 0_5_5108 | 0.9352 | 0.9402 | 0.9963 | 1_53_53905 | 0.9747 | 0.9523 | 0.9694 |
| 0_7_7923 | 0.9583 | 0.9507 | 0.9565 | 1_67_67202 | 0.9555 | 0.9117 | 0.9450 |
| 0_11_11179 | 0.9437 | 0.9882 | 0.9536 | 2_81_81784 | 0.9947 | 0.9555 | 0.9854 |
| 0_12_12435 | 0.9955 | 0.9683 | 0.9891 | 2_82_82074 | 0.9626 | 0.5449 | 0.8179 |
| 0_14_14991 | 0.9243 | 0.9862 | 0.9379 | 2_89_89895 | 0.9773 | 0.9631 | 0.9740 |
| 0_19_19025 | 0.9423 | 0.9716 | 0.9498 | 2_90_90658 | 0.9836 | 0.9314 | 0.9711 |
| 0_24_24209 | 0.9030 | 0.9482 | 0.9131 | 3_104_104837 | 0.9263 | 0.9847 | 0.9391 |
| 0_24_24861 | 0.9909 | 0.9971 | 0.9923 | 3_117_117435 | 0.9861 | 0.9343 | 0.9736 |
| 0_25_25057 | 0.9829 | 0.9799 | 0.9822 | 3_120_120771 | 1.000 | 0.9824 | 0.9961 |
| 1_39_39670 | 0.9905 | 0.8460 | 0.9529 | 4_143_143776 | 0.9778 | 0.9942 | 0.9815 |

\(P_{O}\) and \(P_{B}\) are two important parameters that our method uses to obtain the object and background seeds. In our experiments, good results are generally obtained with \(P_{O}\) in the range [0.02, 0.05] and \(P_{B}=0.5\). In some cases, SSaSW can obtain better results by adjusting \(P_{O}\) and \(P_{B}\). Figure 12 shows such a case: with the default parameters (\(P_{O}=0.05\)), the background regions circled in red are merged into the object (see Fig. 12b), because several background pixels have high saliency (see Fig. 12a) and the corresponding regions are erroneously assigned to the object marker regions. With \(P_{O}=0.02\), SSaSW produces a relatively accurate result (Fig. 12c).

For a fair comparison with other methods, we further introduce a scheme that avoids tuning \(P_{O}\) for each image. For each image, we run SSaSW with k different values \(P_{O_i}\) (\(i=1, 2, \ldots , k\)) to obtain the corresponding segmentation results \(Z_{P_{O_i}}\). The average map \(\bar{Z}\) is then calculated for each pixel p as
$$\begin{aligned} \bar{Z}(p)=\frac{1}{k}\sum _{i=1}^{k}Z_{P_{O_i}}(p). \end{aligned}$$
(15)
Finally, the object segmentation result M can be obtained as (\(\bar{Z}\) is normalized to [0, 1])
$$\begin{aligned} M(p)=\left\{ \begin{array}{ll} 1,\quad &{}\hbox { if } \bar{Z}(p)\ge 0.5; \\ 0,\quad &{}\hbox {else.} \end{array} \right. \end{aligned}$$
(16)
In this result, \(M(p)=1\) indicates that pixel p belongs to the foreground object, and \(M(p)=0\) indicates that it belongs to the background.

Specifically, in our experiments we vary \(P_{O}\) from 0.02 to 0.05 in steps of 0.01, obtaining four values: \(P_{O_1}=0.02\), \(P_{O_2}=0.03\), \(P_{O_3}=0.04\), and \(P_{O_4}=0.05\). In this way, all results are obtained with a unified parameter setting. The F-measure obtained with this strategy is 0.91, higher than the 0.90 obtained by RCC. Figure 13 shows the F-measure evaluations of SSaSW, RCC, and SSRMf (Li et al. 2011), another saliency-based object segmentation method. SSaSW achieves the highest F-measure score, which confirms its effectiveness.
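Equations (15) and (16) amount to a pixel-wise majority vote over the k segmentation results. The Python sketch below assumes each \(Z_{P_{O_i}}\) is a binary map (1 = object) stored as a 2-D list, in which case the average already lies in [0, 1] and no further normalization is needed; the function name is hypothetical.

```python
def fuse_segmentations(maps, threshold=0.5):
    """Average k binary maps pixel-wise (Eq. 15) and threshold the mean (Eq. 16).

    maps: list of k equally sized 2-D lists of 0/1 values (1 = object),
    e.g. one map per P_O value.
    """
    k = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[1 if sum(m[r][c] for m in maps) / k >= threshold else 0
             for c in range(cols)]
            for r in range(rows)]
```

With a hypothetical `ssasw_segment(image, p_o)` returning one binary map, the setting above would correspond to `fuse_segmentations([ssasw_segment(image, p) for p in (0.02, 0.03, 0.04, 0.05)])`. Because the comparison in Eq. (16) is \(\ge 0.5\), a pixel marked as object in exactly two of the four maps is kept as object.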

To further assess the significance of the above comparisons, we performed statistical t tests; the corresponding p values are reported in Table 4. As expected, all p values are below 0.05. This indicates that the improvements of SSaSW over MSRM (in PRI, GCE, and VoI) and over RCC and SSRMf (in precision, recall, and F-measure) are statistically significant at the 0.05 level.
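The text does not state whether the t tests were paired. Assuming each comparison is a paired t test over per-image scores of the two methods on the same images, the test statistic can be sketched in Python as below; the p value would then follow from the Student t distribution with n - 1 degrees of freedom (for instance, `scipy.stats.ttest_rel` performs the whole test). The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-image scores of two methods on the same images."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)                     # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs)))
```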
Table 4

p values of the statistical t tests for evaluations

| Measure | SSaSW vs. MSRM | Measure | SSaSW vs. RCC | SSaSW vs. SSRMf |
|---|---|---|---|---|
| PRI | 3.6638e−004 | P | 0.0018 | 0.0053 |
| GCE | 4.8741e−003 | R | 0.0286 | 0.0092 |
| VoI | 2.3569e−005 | F-measure | 0.0012 | 0.0017 |

Comparisons with graph cuts and GrabCut

In this section, we compare our method with two interactive segmentation algorithms: graph cuts (Boykov and Jolly 2001) and GrabCut (Rother et al. 2004). Both methods formulate object segmentation as a minimum graph cut problem. For a fair comparison with our region-based algorithm, we extend the classical pixel-based graph cuts and GrabCut methods to a region-based scheme in which the regions produced by mean shift, rather than the pixels, are the nodes of the graph. Both methods require some labeled regions as a prior, i.e., seeds: in graph cuts, the user marks a few strokes as object and background interactions, and in GrabCut, the interaction is a rectangle around the desired object. Although the prior interactions for graph cuts and GrabCut in Fig. 14 are carefully designed, our method achieves segmentation performance comparable to these interactive methods.
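In the region-based scheme above, mean-shift regions replace pixels as graph nodes. A minimal Python sketch of building such a region adjacency graph from a label map is shown below. It assumes 4-connectivity for adjacency; computing edge weights and running the minimum cut itself are not shown, and the function name is hypothetical rather than the implementation used in the experiments.

```python
def region_adjacency_graph(labels):
    """Nodes are region labels; an edge joins two regions that touch (4-connectivity).

    labels: 2-D list of integer region labels, e.g. from a mean-shift over-segmentation.
    Returns (nodes, edges) with each edge stored once as (smaller, larger).
    """
    rows, cols = len(labels), len(labels[0])
    nodes, edges = set(), set()
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            nodes.add(a)
            # Check only right and down neighbors so each pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    b = labels[rr][cc]
                    edges.add((min(a, b), max(a, b)))
    return nodes, edges
```

Graph cuts or GrabCut would then attach weights to these edges (for example, from the color difference between adjacent regions) and to the links between each region and the object and background terminals.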

Results on domain-specific images

To demonstrate the applicability of the proposed method more widely, in this subsection we conduct experiments on domain-specific images, e.g., shadow images and medical images (here, two vascular images). Figure 15 shows the segmentation results, which indicate that our method works well on these images.

On the extension to more features

Our method can benefit from the integration of more feature information. In this subsection, we add texture information to our model, using the three textural features of coarseness, contrast, and directionality (Tamura et al. 1978), as done in Dogra et al. (2012). That is, we use color similarity, spatial proximity, and texture similarity together to define our similarity measure. Table 5 shows the comparison results: integrating texture information yields a better result.
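As one concrete example of these texture features, Tamura's contrast (Tamura et al. 1978) is the standard deviation of the gray levels divided by the fourth root of their kurtosis, \(F_{con}=\sigma/\alpha_4^{1/4}\) with \(\alpha_4=\mu_4/\sigma^4\). The Python sketch below computes it for the gray values of one region. Coarseness and directionality, and the way texture similarity is combined with the color and spatial terms in our model, are not shown; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def tamura_contrast(values):
    """Tamura contrast of a region's gray values: sigma / alpha4 ** (1/4)."""
    values = [float(v) for v in values]
    mu = fmean(values)
    sigma = pstdev(values, mu)                      # population standard deviation
    if sigma == 0.0:
        return 0.0                                  # a flat region has no contrast
    mu4 = fmean([(v - mu) ** 4 for v in values])    # fourth central moment
    alpha4 = mu4 / sigma ** 4                       # kurtosis
    return sigma / alpha4 ** 0.25
```

For the two-level region `[0, 0, 10, 10]`, sigma is 5 and the kurtosis is 1, so the contrast is 5.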
Table 5

Average F-measure values on the MSRA1000 dataset based on our MSWS and MSWS with texture

| MSWS | MSWS with texture |
|---|---|
| 0.8749 | 0.8805 |

Computational complexity of SSaSW

To analyze the efficiency of the proposed method, we discuss its computational complexity and compare it with those of RCC and SSRMf. The running time of our method mainly depends on two parts: the region merging process and the similarity measure. The time complexity of the region merging process is \(O(N^2)\), where N is the number of regions after the initial segmentation. The time complexity of the similarity measure is \(O(M_k)\), where \(M_k\) is the number of pixels in the k-th region. Hence, the worst-case running time complexity of SSaSW is \(O(N^2+MN)\), where \(M=\max _{k=1,\ldots ,N}\{M_k\}\). The running time complexity of SSRMf is approximately equal to that of SSaSW. The RCC method iteratively applies GrabCut (Rother et al. 2004) to refine the segmentation result, and these GrabCut iterations are its most time-consuming step. Thus, the time complexity of RCC is \(O(mn^2|C|)\), where n is the number of nodes, m is the number of edges, and |C| is the cost of the minimum cut in the graph. Clearly, \(n \gg N\), since n is the total number of pixels in an image, whereas N is the number of regions after over-segmentation. Table 6 shows the average time taken by RCC, SSRMf, and SSaSW on the MSRA1000 database. SSaSW and SSRMf are implemented in Matlab; for RCC, we use the authors' C++ implementation. Although SSaSW takes longer to run, its time complexity is lower than that of RCC (and approximately equal to that of SSRMf); the difference in running time is mainly due to the different execution environments.
Table 6

Average time required for object segmentation for images in the MSRA1000 database

| Method | RCC | SSRMf | SSaSW |
|---|---|---|---|
| Time complexity | \(O(mn^2\lvert C\rvert)\) | \(O(N^2+MN)\) | \(O(N^2+MN)\) |
| Time (s) | 0.621 | 12.583 | 12.696 |
| Code | C++ | Matlab | Matlab |

Algorithms were tested on a dual-core 2.6 GHz machine with 2 GB RAM

Failure of SSaSW

So far, we have evaluated the effectiveness of SSaSW on a variety of images. However, SSaSW may fail in the cases summarized in Fig. 16. In the first case, the failure arises from an erroneous over-segmentation that connects a hole (showing blue sky) to the pencil regions. If there were no connection between the hole and the pencil regions, our region merging rule would not merge them into one region, even though the hole has a blue color similar to that of the nearby pencils. In the second case, the result would be better if the saliency-seeded interactions (i.e., high-level semantics) were all accurate, e.g., if the bottle neck were not marked as background. The third case is due to human ambiguity (i.e., subjective labeling): the pixels with the highest saliency values all belong to the 'hand' and are therefore marked as foreground interactions, whereas the ground-truth foreground object for this image in the dataset is the iron handle.

Conclusions

This paper proposes a fully automatic framework of saliency-seeded and spatial-weighted region merging for natural object segmentation. With the aid of a saliency detection method, suitable prior inputs for the object of interest and the background can be obtained automatically; this labeling reflects human intention without requiring any manual user editing. In addition, we present an effective maximal spatially weighted similarity criterion for region merging, which merges the regions that are most similar in color and also nearest to each other. By incorporating both the color similarity and the spatial distance of the candidate regions, the region merging-based method achieves better performance, and salient objects can be reliably segmented from complex backgrounds in a wide range of natural images. Experimental results show that the proposed scheme, which requires no user input, is comparable to current state-of-the-art automatic segmentation techniques and outperforms conventional interactive methods. Our future work will focus on overcoming the failures of SSaSW in some difficult situations and on improving its speed.

Abbreviations

MSWS: maximal spatially weighted similarity

SSaSW: saliency-seeded and spatial-weighted

MCS: maximal color similarity

PRI: probabilistic Rand index

VoI: variation of information

GCE: global consistency error

Declarations

Authors' contributions

JL, JD, and JY conceived and designed the study. JL and LD performed the experiments. JD, JY, and LD reviewed and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported in part by the National Science Fund of China under Grants 91420201, 61472187, 61502235, 61233011, and 61373063, in part by the Key Project of Chinese Ministry of Education under Grant 313030, the 973 Program under Grant 2014CB349303, and in part by the Program for Changjiang Scholars and Innovative Research Team in University Grant IRT13072.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the National Science Fund of China under Grants 91420201, 61472187, 61502235, 61233011, and 61373063, the Key Project of Chinese Ministry of Education under Grant 313030, the 973 Program under Grant 2014CB349303, and the Program for Changjiang Scholars and Innovative Research Team in University under Grant IRT13072. This funding supported the design of the study and the conduct of the experiments.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
School of Computer Science and Engineering, Nanjing University of Science and Technology

References

  1. Achanta R, Estrada F, Wils P, Susstrunk S (2008) Salient region detection and segmentation. In: IEEE international conference on computer vision systems. IEEE, New Jersey
  2. Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey
  3. Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graphics 26:236–246
  4. Bai X, Sapiro G (2007) A geodesic framework for fast interactive image and video segmentation and matting. In: IEEE international conference on computer vision, pp 1–8
  5. Boykov YY, Jolly MP (2001) Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. IEEE Trans Pattern Anal Mach Intell 1:105–112
  6. Cai W, Chen S, Zhang D (2007) Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognit 40:825–838
  7. Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern 34:1907–1916
  8. Cheng MM, Mitra NJ, Huang X, Torr PH, Hu SM (2011) Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 37:409–416
  9. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
  10. Ding J, Chen S, Ma R, Wang B (2006) A fast directed tree based neighborhood clustering for image segmentation. In: International conference on neural information processing. Springer, Berlin, pp 369–378
  11. Dogra DP, Majumdar AK, Sural S, Mukherjee J, Mukherjee S, Singh A (2012) Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. Comput Biol Med 42:925–934
  12. EDISON Software. http://www.caip.rutgers.edu/riul/research/code.html. Accessed 17 June 2013
  13. Goferman S, Zelnik-Manor L, Tal A (2010) Context-aware saliency detection. IEEE Trans Conf Comp Vis Pattern Recogn 34:2376–2383
  14. Gollmer ST, Kirschner M, Buzug TM, Wesarg S (2014) Using image segmentation for evaluating 3D statistical shape models built with groupwise correspondence optimization. Comp Vis Image Underst 125:283–303
  15. Harel J, Koch C, Perona P (2007) Graph-based visual saliency. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 545–552
  16. Hou X, Zhang L (2007) Saliency detection: a spectral residual approach. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey, pp 1–8
  17. Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254–1259
  18. Jiang B, Zhang L, Lu H, Yang M (2013) Saliency detection via absorbing Markov chain. In: IEEE international conference on computer vision. IEEE, New Jersey
  19. Li X, Lu H, Zhang L, Ruan X, Yang M (2013) Saliency detection via dense and sparse reconstruction. In: IEEE international conference on computer vision. IEEE, New Jersey
  20. Li J, Ma R, Ding J (2011) Saliency-seeded region merging: automatic object segmentation. In: Asian conference on pattern recognition. IEEE, New Jersey, p 691
  21. Li Y, Sun JC, Tang SH (2004) Interactive natural image segmentation via spline regression. SIGGRAPH, Los Angeles, pp 303–308
  22. Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H (2011) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33:353–367
  23. Ma Y, Zhang H (2003) Contrast-based image attention analysis by using fuzzy growing. ACM, New York, pp 374–381
  24. Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: IEEE international conference on computer vision. IEEE, New Jersey, pp 416–423
  25. Meila M (2005) Comparing clusterings: an axiomatic view. In: IEEE international conference on machine learning. ACM, Los Angeles
  26. Mignotte M (2008) Segmentation by fusion of histogram-based k-means clusters in different color spaces. IEEE Trans Image Process 17:780–787
  27. Ning J, Zhang L, Zhang D, Wu C (2010) Interactive image segmentation by maximal similarity based region merging. Pattern Recogn 43:445–456
  28. Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9:62–66
  29. Peng B, Zhang L, Zhang D, Yang J (2011) Image segmentation by iterated region merging with localized graph cuts. Pattern Recogn 44:2527–2538
  30. Perazzi F, Krahenbuhl P, Pritch Y, Hornung A (2012) Saliency filters: contrast based filtering for salient object detection. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey, pp 733–740
  31. Rother C, Kolmogorov V, Blake A (2004) GrabCut: interactive foreground extraction using iterated graph cuts. SIGGRAPH, Los Angeles, pp 309–314
  32. Russell BC, Freeman WT, Efros AA, Sivic J, Zisserman A (2006) Using multiple segmentations to discover objects and their extent in image collections. IEEE Comp Soc Conf Comp Vis Pattern Recognit 2:1605–1614
  33. Seo K, Shin J, Kim W, Lee J (2006) Real-time object tracking and segmentation using adaptive color snake model. Int J Cont Autom Sys 4:236–246
  34. Shen X, Wu Y (2012) A unified approach to salient object detection via low rank matrix recovery. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey, pp 853–860
  35. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22:888–905
  36. Tamura H, Mori S, Yamawaki T (1978) Textural features corresponding to visual perception. IEEE Trans Syst Man Cybern 8:460–472
  37. Tavakoli V, Amini AA (2013) A survey of shaped-based registration and segmentation techniques for cardiac images. Comp Vis Image Underst 117:966–989
  38. Unnikrishnan R, Pantofaru C, Hebert M (2005) A measure for objective evaluation of image segmentation algorithms. In: IEEE international conference on computer vision and pattern recognition workshop on empirical evaluation methods in computer vision. IEEE, New Jersey
  39. Unnikrishnan R, Pantofaru C, Hebert M (2007) Toward objective evaluation of image segmentation algorithms. IEEE Trans Pattern Anal Mach Intell 29:929–944
  40. Vincent L, Soille P (1991) Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans Pattern Anal Mach Intell 13:583–598
  41. Xiang S, Nie F, Zhang C, Zhang C (2009) Interactive natural image segmentation via spline regression. IEEE Trans Image Process 18:1623–1632
  42. Yan Q, Xu L, Shi J, Jia J (2013) Hierarchical saliency detection. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey, pp 1155–1162
  43. Yang W, Cai J, Zheng J, Luo J (2010) User-friendly interactive image segmentation through unified combinatorial user inputs. IEEE Trans Image Process 19:2470–2479
  44. Yang C, Lu L, Ruan X, Yang M (2013) Saliency detection via graph-based manifold ranking. In: IEEE international conference on computer vision and pattern recognition. IEEE, New Jersey, pp 3166–3173
  45. Zhai Y, Shah M (2006) Visual attention detection in video sequences using spatiotemporal cues. ACM Multimedia, New York

Copyright

© The Author(s) 2016