CART application to image primitives tracking
Luis Alvarez, Pedro Henriquez, Javier Sánchez
CTIM. Centro de I+D de Tecnologías de la Imagen
Universidad de Las Palmas de G.C.


Introduction

In this paper we present a new method for tracking image primitives based on a CART (Classification and Regression Tree). The tracking procedure uses lines and circles as primitives. We have applied the proposed method to sports event scenarios, specifically soccer matches. The CART parameters are estimated by a learning procedure based on the RGB image channels. To illustrate its performance, the method has been applied to real HD (High Definition) video sequences, and some numerical experiments are shown. The quality of the primitive classification obtained with the decision tree is validated by the percentage error rates and by a comparison with other techniques, such as morphological primitive detection.

Decision tree building and learning to classify primitives

A CART, such as those described in [1], is used to detect the white primitives. A single set of features is used to discriminate between the pixel classes. In the learning procedure, it is important to decide how the channels and thresholds are selected so that the resulting tree stays simple. Building the decision tree requires a learning stage based on a training set that contains information about the different classes; see for instance [2]. For each video sequence we therefore have a classification data set with information about two classes, primitives and background. In our soccer field scenarios, the primitives are usually white and the background is green. The data set contains RGB values obtained from a manual segmentation of the first frame of the sequence. These RGB triplets are used to build a three-channel decision tree, which determines, at each node, the channel that provides the best discrimination. For this purpose, we estimate the impurity of a set of points with a measure based on Gini's index; see for instance [3]:

$$G = \sum_{k} P_k \, (1 - P_k) = 1 - \sum_{k} P_k^2 \qquad (1)$$

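where $P_k$ is the probability that a point of the set belongs to class $k$. In the learning procedure, the channel and the threshold selected at each node are those that minimize the impurity that results from dividing the set of points. The aim is to find the values $X_i$ (channel) and $C_i$ (threshold) which minimize the compound energy in Equation 2, which is the sum of the energies of both child nodes:

$$E(X_i, C_i) = G(S_{<}) + G(S_{\geq}) \qquad (2)$$

where $G(\cdot)$ is the Gini impurity of Equation 1, and $S_{<}$ and $S_{\geq}$ are the subsets of points whose value in channel $X_i$ is, respectively, below the threshold $C_i$ or at least $C_i$.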
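As a minimal sketch of this selection step, assuming the standard Gini impurity $1 - \sum_k P_k^2$ and an exhaustive search over candidate thresholds, the following Python code picks the channel and threshold that minimize the sum of the impurities of the two child nodes. The function names `gini` and `best_split`, the use of midpoints between consecutive values as candidate thresholds, and the sample pixel values are illustrative assumptions, not details of the implementation used in the experiments.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k P_k^2 of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(pixels, labels):
    """Search the channel and threshold minimizing the summed child impurities.

    pixels: list of (R, G, B) tuples; labels: list of class names.
    Returns (channel_index, threshold, energy), or None if no split exists.
    """
    best = None
    for channel in range(3):
        values = sorted({p[channel] for p in pixels})
        # Assumption: candidate thresholds are midpoints of consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            below = [c for p, c in zip(pixels, labels) if p[channel] < threshold]
            above = [c for p, c in zip(pixels, labels) if p[channel] >= threshold]
            energy = gini(below) + gini(above)  # plain sum, as in the text
            if best is None or energy < best[2]:
                best = (channel, threshold, energy)
    return best


# Hypothetical training samples: two white line pixels, two green grass pixels.
pixels = [(250, 250, 250), (240, 245, 238), (40, 140, 50), (60, 160, 70)]
labels = ["primitive", "primitive", "background", "background"]
print(best_split(pixels, labels))  # (0, 150.0, 0.0)
```

Many CART implementations weight the impurity of each child by its share of the points; the plain sum is kept here only to follow the description in the text.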

 

Primitive tracking by means of a decision tree

For each frame, we first initialize the primitive location using the information from the two previous frames. Each of the resulting primitive points falls in one of two situations: either on a white primitive or on the background. If the decision tree classifies the projected point as primitive, we search for the edges of the primitive in both directions along the line perpendicular to the primitive (see Figure 1, left). Once both edges have been found, we take their midpoint as the center of the primitive. If the decision tree classifies the projected point as background, we also search for a primitive pixel in both directions along the perpendicular. However, as soon as a primitive pixel is found, we continue moving only in that direction, searching for the opposite edge of the primitive. Once it has been found, we calculate the midpoint (see Figure 1, right).
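The two search cases can be sketched in Python as follows. This is only an illustration under several assumptions: `is_primitive(x, y)` stands for the decision tree classification of a pixel, `normal` is a unit step along the perpendicular direction, positions are sampled at integer offsets, and `max_steps` is a hypothetical bound that keeps the search finite. None of these names come from the original implementation.

```python
def primitive_center(is_primitive, start, normal, max_steps=50):
    """Return the center of the primitive found along the normal, or None."""

    def label(t):
        # Classify the pixel at integer offset t along the perpendicular line.
        x = round(start[0] + t * normal[0])
        y = round(start[1] + t * normal[1])
        return is_primitive(x, y)

    def last_inside(t, step):
        # Walk from a primitive offset t until the next sample is background.
        for _ in range(max_steps):
            if not label(t + step):
                return t
            t += step
        return None  # no edge found within the step budget

    if label(0):
        # Case 1: the initial point lies on the primitive, search both edges.
        first = last_inside(0, -1)
        last = last_inside(0, +1)
    else:
        # Case 2: the initial point lies on the background. Look in both
        # directions for the nearest primitive pixel, then keep going only
        # in that direction until the opposite edge is reached.
        first = next(
            (s * d for d in range(1, max_steps + 1) for s in (1, -1) if label(s * d)),
            None,
        )
        last = None if first is None else last_inside(first, 1 if first > 0 else -1)

    if first is None or last is None:
        return None
    mid = (first + last) / 2
    return (start[0] + mid * normal[0], start[1] + mid * normal[1])


# Hypothetical vertical white line covering columns 5 to 7 of the image.
is_line = lambda x, y: 5 <= x <= 7
print(primitive_center(is_line, (6, 3), (1, 0)))  # starts on the line
print(primitive_center(is_line, (2, 3), (1, 0)))  # starts on the background
```

Both calls locate the same center, column 6, because the midpoint depends only on the two edges and not on where the search started.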

In both situations, to avoid accepting large white zones, such as advertisements or players wearing white clothes, we limit the thickness of the primitives with a threshold. This threshold is computed dynamically, because primitives that are farther from the camera appear thinner than closer ones. We calculate it as the distance between the projections of a reference point and of a second point obtained by adding a certain thickness to the first. If more pixels than this width limit have been examined, the primitive detection is rejected at that point.
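A possible way to compute such a dynamic threshold is sketched below. It assumes that the projection from the field plane to the image is available as a 3x3 homography `H`, which the text does not state explicitly; the matrix values, the function names `project` and `width_limit`, and the thickness of 0.5 field units are hypothetical.

```python
import math


def project(H, point):
    """Apply a 3x3 homography H (nested lists) to a 2D point of the field plane."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def width_limit(H, reference, direction, thickness):
    """Image distance between the projections of a reference point and of the
    same point displaced by `thickness` along the unit vector `direction`."""
    shifted = (
        reference[0] + thickness * direction[0],
        reference[1] + thickness * direction[1],
    )
    a = project(H, reference)
    b = project(H, shifted)
    return math.hypot(b[0] - a[0], b[1] - a[1])


# Hypothetical projection in which points with larger y are farther away.
H = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.125, 1.0]]
near = width_limit(H, (0.0, 0.0), (1.0, 0.0), 0.5)
far = width_limit(H, (0.0, 8.0), (1.0, 0.0), 0.5)
print(near, far)  # 5.0 2.5
```

With this sample matrix the same physical thickness spans 5 pixels near the camera and 2.5 pixels farther away, which is why a single fixed limit would not be adequate.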

Figure 1: Search for the edges in the primitive tracking procedure. The red point is the initial point estimated from the previous frames. The blue dotted line is the orthogonal line that we examine to find the edges. The green square is the center of the primitive.

Numerical experiments and results

The training data set is obtained by manually segmenting the first frame of the sequence into only two classes: primitives (white lines and circles) and grass. An example of manual segmentation is shown in Figure 2. To achieve the best classification results, we have tested different CART configurations, varying the number of channels. Each test consists of building a decision tree from a set of training data. After the learning procedure, the data are classified with the decision tree and the misclassified pixels are counted. Finally, we compare the decision tree classifications with manual classifications on four random frames extracted from each video. The results are shown in Table 1 and Table 2. We can observe that the best configuration for the decision tree channels is RGB (Figure 3). The performance of the decision tree is also illustrated in Video 1 and Video 2, which show the classification results.
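The percentage error rate used in this comparison can be computed as in the following sketch, which counts the pixels whose predicted class differs from the manual one. The function name `error_rate_percent` and the sample labels are assumptions made for illustration.

```python
def error_rate_percent(predicted, manual):
    """Percentage of pixels whose predicted class differs from the manual label."""
    if len(predicted) != len(manual):
        raise ValueError("both label sequences must have the same length")
    if not manual:
        raise ValueError("at least one labeled pixel is required")
    errors = sum(p != m for p, m in zip(predicted, manual))
    return 100.0 * errors / len(manual)


# Hypothetical labels for four pixels; one of them is misclassified.
predicted = ["primitive", "grass", "grass", "grass"]
manual = ["primitive", "grass", "primitive", "grass"]
print(error_rate_percent(predicted, manual))  # 25.0
```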

Figure 2: Two different classes are used in the manual segmentation: white primitives and grass. Grass is segmented using polygons, whereas segments are used for primitives. Real image.

 
 

Figure 3: Percentage error rates of the decision tree configurations (left: scale model images, right: real images).

Scale model sequence

 

Real sequence (HD)

We have compared the classification results obtained with the decision tree using RGB against those of the morphological method [9]. The morphological method described in [9] takes an average of 2725 ms to process an HD frame on a four-core processor. The decision tree method proposed in this paper takes 7 ms per HD frame on the same processor, that is, it is roughly 390 times faster. In terms of computational complexity, the main difference between the two methods is that evaluating a decision tree is very fast and the method is local (i.e., we only need to process a neighborhood of the primitive location), whereas the morphological operations take much more time and process the whole image. In Figure 6, we show the primitive tracking stage in a frame, with all the points that have been analyzed to find the white primitives.
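To illustrate why evaluating the tree is cheap, the sketch below classifies a pixel with a small decision tree stored as nested tuples: each pixel costs only a few comparisons, one per level of the tree. The tree structure, its thresholds, and the name `classify` are hypothetical; they are not the tree learned in the experiments.

```python
def classify(tree, pixel):
    """Descend the tree until a leaf (a class name) is reached.

    An inner node is a tuple (channel, threshold, below, at_or_above).
    """
    node = tree
    while not isinstance(node, str):
        channel, threshold, below, at_or_above = node
        node = below if pixel[channel] < threshold else at_or_above
    return node


# Hypothetical tree: test the R channel first, then the B channel.
TREE = (0, 150, "background", (2, 150, "background", "primitive"))

print(classify(TREE, (250, 250, 250)))  # primitive
print(classify(TREE, (40, 140, 50)))    # background
```

Since only the pixels examined along the perpendicular search lines need to be classified, the total cost per frame stays small compared with an operation applied to the whole image.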

Conclusions

In this paper we study how to improve the primitive tracking stage of the difficult problem of camera calibration in video sequences. In these videos, each frame usually contains only a small number of visible primitives that can be used to perform the calibration. To track these primitives through the whole video sequence, we propose a new method based on a CART (Classification and Regression Tree). The decision tree is estimated by a learning procedure based on the RGB image channels and a training set. We present some experiments using HD videos of sports events (soccer matches), both on scale models of a soccer field and in real scenarios. The procedure we propose is very fast and accurate. Using a combination of RGB channel information, the maximum classification error obtained is about 0.16% (for images which are not included in the training set). As an application of the proposed method, we calibrate the video sequence using the obtained primitive tracking, and we illustrate the results with some images into which graphic objects have been inserted using the obtained calibration.

Acknowledgement

We acknowledge Mediapro for providing us with the test images used in this paper.

References

  1. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
  2. D. Peña: Análisis de datos multivariantes. Madrid, 2002.
  3. T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Canada, 2001.
  4. J. B. Hayet, J. Piater: On-Line Rectification of Sport Sequences with Moving Cameras. In: MICAI 2007: Advances in Artificial Intelligence, volume 4827, pages 736-746, 2007.
  5. H. Kim, K. S. Hong: Robust Image Mosaicing of Soccer Videos using Self-Calibration and Line Tracking. In: Pattern Analysis and Applications, volume 4, pages 9-19, 2001.
  6. D. Farin, S. Krabbe, P. H. N. de With, W. Effelsberg: Robust camera calibration for sport videos using court models. In: Storage and Retrieval Methods and Applications for Multimedia, volume 5307, pages 80-91, 2004.
  7. D. Farin, J. G. Han, P. H. N. de With: Fast camera calibration for the analysis of sport sequences. In: IEEE International Conference on Multimedia and Expo (ICME), volume 1-2, pages 482-485, 2005.
  8. Y. Watanabe, M. Haseyama, H. Kitajima: A soccer field tracking method with wire frame model from TV images. In: 2004 International Conference on Image Processing, volume 1-5, pages 1633-1636, 2004.
  9. M. Aleman-Flores, L. Alvarez, P. Henriquez, L. Mazorra: Morphological Thick Line Center Detection. In: 7th International Conference on Image Analysis and Recognition, volume 6111, pages 71-80, 2010.
  10. M. Emre Celebi, H. Iyatomi, W. V. Stoecker, R. H. Moss, H. S. Rabinovitz, G. Argenziano, H. P. Soyer: Automatic detection of blue white veil and related structures in dermoscopy images. In: Computerized Medical Imaging and Graphics, volume 32, pages 670-677, 2008.
  11. G. Macchiavello, G. Moser, G. Boni, S. B. Serpico: Automatic unsupervised classification of snow-covered areas by decision-tree classification and minimum error thresholding. In: IEEE International Geoscience and Remote Sensing Symposium, volume 1-5, pages 1251-1254, 2009.
  12. Z. Zili, Q. Qiming, G. Junping, D. Yuzhi, Y. Yunjun, W. Zhaoqiang, D. Fanwei: CART-Based Rare Habitat Information Extraction For Landsat ETM+. In: IEEE International Geoscience and Remote Sensing Symposium, volume 3, pages 1071-1074, 2008.