Feature extraction of garments based on Gaussian mixture for autonomous robotic manipulation

Recent advancements in the domain of robotics have offered support to humans in their everyday activities with the aim of offloading workers from performing repetitive tasks. The present work highlights one such task in garment industry where the robot may find potential to manipulate different garments including developing a number of robotic skills like laundry pile sorting, garment stacking and garment folding/ unfolding. This paper is aimed to study the integration of hardware and software developed for ClopeMa Project on a human-friendly robotic platform, i.e. a Baxter robot that can safely operate side by side with humans. In particular, the paper discusses integration of RGB-D sensor with the ROS environment and studies utility of garment manipulation. The goal is to present a working platform which can autonomously recognize the configuration of a piece of garment spread out on a flat surface. The algorithm for recognizing the garment consists of first applying Gaussian mixture model (MoG) for background subtraction and then using polygonal approximation to acquire feature points for the foreground of the garment. The proposed algorithm is tested online through series of experiments on towel, pants and t-shirts of various colors and materials. Results under varying lightening conditions witness robustness of the proposed scheme.


I. INTRODUCTION
Present research in the domain of robotics is broadly focused in developing robotic skills, which are similar or superior to humans with the ultimate aim of offloading humans from performing repetitive tasks [1].Among the possible applications, this work highlights requirements for garment autonomous manipulation.This involves analysis of the given garment to detect feasible grasping points which can be used to manipulate different garments and develop different robotic skills such as laundry pile sorting according to color and material, garment stacking and garment folding/ unfolding [2].There are number of methods for the analysis of a given piece of garment but this paper is based on the method developed by clothes perception and manipulation (ClopeMa) project [3] which involves geometric approach for the given piece of garment.This approach includes a complete pipeline for clothes configuration recognition by estimating positions of the most important grasping points (e.g.all four corners of a towel).This geometric strategy involves various fast and dynamic programming based methods to obtain efficient results during analysis of the garment.
The setup used for experiments consist of a Baxter robot [4] having an RGB-D sensor mounted on the top of the robot.In front of the robot and within the range of the sensor, a white table has been used as a support for analyzing different garments laying on it.The whole pipeline for the adopted method is given below.

A. Input Acquisition
The input is a single color image of a piece of garment lying on the table.The original model of the cloth (e.g.towel, pant and shirt) is assumed to be known in advance.The image is taken from a RGB-D sensor attached to the top of the Baxter robot.As the position of table and camera is fixed, the analysis is done only on the cropped image of the table while discarding the area outside the table given by the image.

B. Background Subtraction
This is relatively an important step which requires robustness in the algorithm.The goal of this step is to segment the foreground (garment) from the background (table) by assuming significantly different color of the garment in contrast to table.The color properties of the table can be learned from the data (different images of the background) which is called as background learning [5].The background learning makes this step more robust that can solve many noise issues originated during image acquisition the camera.As an outcome of this step, a gray scale image containing only the garment (as maximum pixel intensity) with a black background is obtained.

C. Contour Detection
The gray scale image is then processed through Canny edge detection algorithm to get the contours of the garment.An alternate option is to apply Moore's algorithm [6] for tracing connected boundary of the region.The difference between both methods in that Canny gives edges which can be broken from several places if the result from the background subtraction is not efficient.In this case, these edges cannot be considered as a contour.So, performance of Canny algorithm is adequate provided background subtraction results are reasonably efficient [7].Moore's algorithm gives a single connected contour having the largest connected area.

D. Contour Simplification and Approximation:
The contour obtained in the last step is very dense and contains many unnecessary points which are of no use to us.This contour is simplified based on polygonal approximation and can get the simplified contour points which can be considered as the vertices of the given garment.The approximation calculates the slope of the boundary (contour) and marks a point as a vertex when it detects a significant change in the slope.

E. Matching:
We assumed that the model of the garment [8] is initially known.The approximated model obtained is matched with the known model defined for the corresponding type of the garment.The matching procedure involves finding the correspondence between the approximated vertices and landmark points defining the specific garment model.The matching considers mainly the local features of the approximated model.If the number of vertices are more than the landmark points of the model, unmatched vertices are discarded.

II. BACKGROUND SUBTRACTION
A robust background subtraction algorithm should handle lightening changes, repetitive motions and long-term scene changes.These problems are usually addressed by making the background model adaptive so that its parameters can track illumination changes.Also, increasing the complexity of the model results in accurate representation of multimodal backgrounds.The algorithm reported in [9] is an adaptive method which uses a mixture of Gaussian distributions to model a multimodal background image sequence.For each pixel, each normal distribution in its background mixture corresponds to the probability of observing a particular intensity or color in the pixel.The background is generated by multiple surfaces appearing in the pixel view.Each surface is represented by a normal distribution having a mean equal to the surface intensity or color and a variance due to surface texture, illumination fluctuations or camera noise.
The first step in the background subtraction is to learn the background color which is a probabilistic distribution of RGB values of the background pixels.The distribution is represented as a mixture of  Gaussians i.e.

𝑝(𝑋) = ∑
, ( ;   ,   )  =1 (1) where  is a normal distribution having mean vector  , and a covariance matrix   . , are weights of the Gaussians for k th component.Number of components () for Gaussian Mixture Model (GMM) is determined empirically based on the number of visible clusters in RGB.e.g. in one of the cases considered in the present work, background is completely white (table) while the foreground is completely black (pant).So, only two GMM components are required.GMM forms Gaussians taking into account all the colors in the image.From the Gaussians, the background color and foreground color can be easily judged in the present instance.The background is modeled as a mixture of adaptive Gaussians.Mixture and adaptive natures are due to multiple background colors and lightening conditions respectively.
The Gaussian's parameters are evaluated using a sequence of images of the background.At each iteration, Gaussians are evaluated using a simple heuristic rule to determine which ones are most likely to correspond to the background.At the end of this procedure, foreground is constructed by the image pixels which are not matched with the background Gaussians.
The background model is obtained as follows: In the first step, the Gaussian distribution for all the pixels { 1 ,  2 … … …   } is calculated on the first image of the background.These initial Gaussian parameters are then updated using other images in a learning process based on K-means approximations detailed in the following procedure.
If a new pixel  +1 has been matching with the already existing Gaussian's (within the limit of 2.5) then Gaussian µ ,+1 and  2  .+1 are updated as follows µ ,+1 = (1-)µ , +  +1 (3) where  = ( +1 ;  , ,  2 . ) and  is a learning rate.The weights of the Gaussians are then updated by where  ,+1 = 1 for matching Gaussian and 0 for all the others.If the new pixel  +1 is not matched with the already existing Gaussian, then the Gaussian for that specific pixel is not updated.After this step, we are proceeding heuristically to choose the Gaussian which has most supporting evidence and less variance.So the Gaussians are ordered by the value of / .Then the first B distributions are chosen as a background model i.e.
where T is the minimum portion of the image that has to be a background (threshold).Fig. 1 is the initial background which is the empty table while Fig. 2a is the captured image having a pant on the table at time t (These images belong to ClopeMa dataset.In this case we only took one background because these are simple images having no problem with the noise).By applying the background subtraction the results obtained are shown in Fig. 2b.It is a gray scale image which contains only foreground (pant).

III. CONTOUR DETECTION
The background subtraction resulted in a gray scale image that contains the foreground only.This gray scale image is then processed to find the connected boundary of the foreground, which can be achieved by two different methods:

A. Edge Detection
One way to get the connected boundary of the foreground image is to process the image based on Canny edge detection algorithm.The result can be seen in the given Fig. 3a.

B. Moore's Approach
The idea behind Moore-Neighbor tracing [10] is simple.The Moore neighborhood of a pixel  is the set of 8 pixels which share a vertex or edge with that pixel.These pixels are named as {P1, P2, …, P8} in Fig. 3b.The Moore's neighborhood is also known as the 8-neighbors or indirect neighbors.Given a digital pattern i.e. a group of white pixels on a background of black pixels (a grid), a white pixel is located and is declared as 'start' pixel.This can be done by starting from the bottom left corner of the grid, scanning each column of pixels from bottom to upwards and from leftmost column to the right until a black pixel is encountered.Being on the start pixel, the contour is extracted by going around the pattern in a clockwise direction without loss of generality.The traversal direction can be arbitrary provided it is consistent throughout the algorithm.
The general idea is that every time a white pixel  is hit, we need to backtrack i.e. go back to the previous black pixel and then go around  in a clockwise direction, visiting each pixel in its Moore neighborhood until a white pixel is encountered.The algorithm terminates when the start pixel is re-visited.The visited white pixels form the contour of the pattern.The results of Moore's algorithm on Fig. 2b is shown in Fig. 4. The results from Moore's algorithm are much better than the Canny because Moore deals with the connected boundary while Canny deals only with the edges which may or may not be connected.

IV. CONTOUR SIMPLIFICATION AND APPROXIMATION
After tracing the contour in Section III, next step is to simplify this contour and to get only the significant points on the contour.The number of distinct points  (where  contains ( 1 ,  2 , …   ) points) on the contour depends upon the image resolution as well as on size of the garment piece.Typically,  has an order of hundreds or thousands.We need to simplify the contour by approximating it with a polygon having  vertices with N<<L.The objective is to select a subsequence of N points ( 1 ,  2 , …   ) that is a subset of ( 1 ,  2 , …   ).Additionally we want to minimize the sum of Euclidean distances of the original points ( 1 ,  2 , …   ) to edges of the approximated polygon ( 1 ,  2 , …   ) as seen in Fig. 5. Simplification procedure for the garment contour is based on the dynamic programming algorithm for the optimal approximation of a close curve [11].This algorithm iteratively computes the optimal approximation of points ( 1 ,  2 , …   ) by n vertices from previously found approximation of ( 1 ,  2 , …   ) where  ∈ { − 1 …  − 1} by n-1 points as demonstrated in Fig. 6.In Fig. 5, the original contour ( 1 ,  2 , …   ) plotted in red is simplified with a polygon in blue.The points ( 1 ,  2, …   ) are the simplified points approximated by a polygon (in blue) while minimizing the distances of the original points   to polygon edges.In Fig. 6, the dynamic programming algorithm for polygonal approximation utilizes previously constructed approximation of points ( 1 ,  2 , …  6 ) by n-1 vertices to obtain approximation of next points ( 1 ,  2 , …  6 ,  7 ).
Here, the N-vertex polygonal contour  obtained from Moore's algorithm is approximated into another polygonal contour  with the minimum number of line segments  such that the approximation error () is less than error bound (can be around 2%) [12].() is defined as a Euclidean distance from the vertices of the curve  to the approximated line segment of curve .() =  1≤≤ (  ,  +1 ) (7) where (  ,   ) =  ≤≤ ; (  ,   ) and   =   ,  +1 = The approximation [11] for the closed curve  with the fixed starting point can be found by first converting into a feasibility graph which is constructed on vertices of the curve  for the given error tolerance.Nodes  = { 1 ,  2 , …   } of the graph G1 are vertices { 1 ,  2 , …   } of the curve .A pair of nodes   and   is connected by an edge if the approximation error for the curve segment {  , +1 ,   } by the line segment (  ,   ) is less than a given error tolerance i.e. (  ,   ) ≤ .
The solution to this problem lies in determining the shortest path in the feasibility graph G1.To find this path in a directed acyclic graph, we introduce 1D discrete state space Ω 1 =   :  = 1, . ..Every point in this state space represents the sub-problem of the node in the graph G1.The cost function () is given as the minimum number of edges in the shortest path and is calculated by dynamic programming for all n=1,2,....N.

A. Bellman and Ford Algorithm
The approximation algorithm computes final contour points which are the cost of the cheapest paths from a starting node to all other nodes in the graph.Thus, the paths afterwards can also be constructed.The algorithm considers each  points as nodes and tries to find the shortest path by satisfying the condition that the approximation error () should be less than error bound.The first estimate is:  The starting node has cost 0, as its distance to itself is obviously 0.
 All other nodes have cost as infinity, which is the worst estimate possible.
From Fig. 7, the subset of total contour points (in the box) has been chosen.Based on which, the algorithm finds the shortest path which leads to polygonal approximation.Afterwards, the algorithm checks every edge for the condition if cost of the source of the edge plus the cost for using the edge (considered as unity) is smaller than the cost of the edge's target i.e.

B. Working of Dynamic Programming
The algorithm calculates shortest paths in a bottom-up fashion.It first calculates the shortest distances for the shortest paths which have at-most one edge in the path.Then, it calculates shortest paths with at-most 2 edges, and so on.After the  ℎ iteration of the outer loop, the shortest paths with at most  edges are calculated.A simple path can contain maximum of |V| -1 edges, so the outer loop runs |v| -1 times.Assuming that there is no negative weight cycle, if we have calculated shortest paths with at most  edges, then an iteration over all edges guarantees to give shortest path with at-most  + 1 edges.The determined shortest path is the required contour points.The working of the algorithm can be illustrated by Figs. 8 (a-c).where ∅  is termed as Normalized Cross Correlation (NCC) which is frequently used in computer vision.It is used to get rid of the big differences between pixel positions that affect the overall efficiency of the algorithm.NCC is calculated for each pixel.For pixel , NCC ( ̂) is calculated as where,  ̅ is the mean of overall image and √∑( −  ̅ ) 2 is standard deviation.The matrix  also has values in the range [0, 1].Each row in  is examined to find the maximum element.Then it is evaluated if the same element is also maximum in the corresponding column of .Holding this condition true implies that we the match is found.

VI. RESULTS
The algorithm has been applied on images from ClopeMa dataset using RGB-D sensor and on-line results have been acquired.Initially, several background models have been formulated with an empty table for 'learning background'.Then the experiments are done on-line on the towels, pants and tshirts of various colors and materials.In these results, green circles show the vertices of the given garment which are actually the feature points of the specific garment.
Setup for the experiments is same as mentioned in Section I.The images acquired from RGB-D senor are at the rate of 30 frames/second while the machine used for processing of the images is Intel core i7 CPU 930 @ 2.80 GHz (64-bit) with 8 GB RAM.For each garment category but with different materials, 2-3 experiments have been done.For the sake of brevity, results corresponding to one material for each garment is presented.Since we have not performed model matching, so the algorithm is detecting the vertices even if the garment is not completely flat on the table.Fig. 9 shows results on a t-shirt orientated on the table.8-10 points have been detected depending on the flatness of the shirt on the table.Thanks to GMM, the presented algorithm is also capable of recognizing correct vertices even by changing the lightening conditions.Varying these conditions, Fig. 13 (1-c) presents the results of contour points corresponding to low, medium and high lightening.One can see that consistent results are obtained even though the conditions are different.This implies that the proposed algorithm is robust enough to take care of noise, shadows and lightening issues.We have presented an algorithm based on the ClopeMa project which can provide the feature points for nearly all type of garments.The robustness of the algorithm is evident by its performance in variable lightening condition.The primary limitation offered by this approach is constituted by the fact that we need a background significantly different from the garment.In case the portion of background i.e. color of table is similar to that of color of the garment, the performance of the presented approach highly depends on the size of the matching portion.In worst cases, the algorithm may give feature points within the cloth by considering the boundary of the matching portion as the boundary of the whole cloth.This may lead to failure of the method due to wrong detection of vertices.Thus, the key assumption of proper working of the approach is that the background color should be different that garment color.
In future, this algorithm can be used for calculating the transformation from the camera to the pixel location to get the 3D point of the features which can be passed to Baxter for the development of numerous robotic skills.

Fig. 3 .
Contour detection: (a) Result of edge detection (b) Moore neighborhood of pixel P.
Fig. 8. Algorithm: (a) Initialization (b) Update list.(c) Contour points determined by dynamic programming.V. FEATURE MATCHING Since the model of the garment on the table is assumed to be known, each model is determined by the vertices of the specific garment.Following the marking of vertices in the known model, we need to match these features (vertices) with the obtained simplified vertices for the specific garment on the table.So, we have two sets of feature points; one from the known model  and other from the contour approximation .Mathematically,  1 ={(1  , 1  , 1  )} =1   2 ={(2  , 2  , 2  )} =1  where  = [, ] are positions of features in the image while 1, 2 are the image patches around feature points which are centered at (3*3).The neighborhood 3*3 image patch can be seen in Fig. 3b.The adjacency matrix is then calculated by, (, ) =  −||1  −2 || 2  2 ⁄ where  is N*M matrix measuring the degree of closeness between the feature  1 in image  1 and feature  in image  2 . is the standard deviation.The matrix  has values in the range [0, 1] and is used to calculate affinity matrix by (, ) = (, ) * 1 2 (∅  (1  , 2  ) + 1)

Fig. 10 Fig. 11 .
Fig.10presents results of application of the algorithm on black and yellow towel (Knitwear material) orientated at various angles with four feature points.Fig.11shows results on the red towel (Cotton toweling material) orientated at different angles while a blue jean spread in multiple orientations is illustrated in Fig.12.As can be seen that the pant contains seven feature points.
. Feature extraction under various lightening conditions: (a) Low (b) Medium (c) High VII.CONCLUSION