In this paper we address the problem of inserting virtual content into a video sequence. The method we propose uses only image information. We perform primitive tracking, camera calibration, synchronisation of the real and virtual cameras, and finally rendering, to insert the virtual content into the real video sequence. To simplify the calibration step we assume that the cameras are mounted on a tripod (which is a common situation in practice). The primitive tracking procedure, which uses lines and circles as primitives, is performed by means of a CART (Classification and Regression Tree). Finally, the synchronisation of the virtual and real cameras and the rendering are performed using OpenGL (Open Graphics Library) functions. We have applied the proposed method to sport event scenarios, specifically soccer matches. To illustrate its performance, we have applied it to real HD (High Definition) video sequences and validated its quality by inserting virtual elements into these sequences.
Our implementation consists of several stages, as shown in Figure 1.
Figure 1: Stages in our implementation for inserting virtual content in a video sequence.
Primitive detection is performed by means of a morphological method described in [7], the primitive tracking stage is explained in [14], and the geometry of the tripod and the calibration are described in [16]. Finally, we synchronise the real camera with a virtual camera and project the virtual objects onto the real image using OpenGL [2]. The synchronisation is done by calculating the virtual camera parameters from the real camera parameters, that is, by configuring the virtual camera so that it reproduces the real one.
Figure 2: Top: Real camera, where R_n is the rotation, t̄_n is the translation and C is the image centre, whose components x_c and y_c are taken from the intrinsic parameters. Bottom: Virtual camera, where VUP is the vector that indicates the vertical axis of the camera, width and height are the width and height of the real image, and C_v is the principal point. The points top, left, right and bottom are the known points used to define the clipping planes.
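As an illustration of how the extrinsic parameters in Figure 2 can be mapped to a virtual camera, the following minimal sketch derives the eye position, look-at point and VUP vector expected by a gluLookAt-style call. It assumes the common convention x_cam = R X + t, with the camera looking along +z and the image y axis pointing down; this convention and the function name look_at_from_extrinsics are assumptions made for illustration and are not taken from our implementation.

```python
def look_at_from_extrinsics(R, t):
    """Return (eye, center, up) for a gluLookAt-style virtual camera.

    Assumption: a world point X maps to camera coordinates as R @ X + t,
    the camera looks along its +z axis and the image y axis points down.
    R is a 3x3 rotation given as a list of rows, t is a 3-vector.
    """
    # Camera position in world coordinates: eye = -R^T t
    eye = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]
    # Viewing direction in world coordinates: R^T (0, 0, 1), the third row of R
    forward = list(R[2])
    # Vertical axis (VUP) in world coordinates: R^T (0, -1, 0),
    # i.e. minus the second row of R, because image y points down
    up = [-v for v in R[1]]
    # Any point along the viewing direction can serve as the look-at point
    center = [eye[i] + forward[i] for i in range(3)]
    return eye, center, up
```

The three returned vectors correspond, in order, to the eye, centre and up arguments of gluLookAt. A different sign convention for R or t would change the signs above.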
That conversion is done with the following expressions:
Figure 3: Virtual camera synchronised. The real image is placed in the near clip frame, where the virtual objects are rendered.
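To make the role of the points left, right, bottom and top more concrete, the sketch below shows one standard way to obtain the clipping-plane bounds of a glFrustum-style call from pinhole intrinsics. It is a minimal sketch that assumes square pixels, no skew, a focal length f expressed in pixels, and an image origin at the top-left corner; the function name and the parameters f and near are assumptions and are not necessarily the expressions used in our implementation.

```python
def frustum_from_intrinsics(f, xc, yc, width, height, near):
    """Return (left, right, bottom, top) on the near clipping plane.

    Assumptions: pinhole camera, square pixels, no skew, focal length f in
    pixels, principal point (xc, yc) in pixels measured from the top-left
    corner of an image of size width x height.
    """
    # One pixel on the image plane spans near / f units on the near plane
    scale = near / f
    left = -xc * scale
    right = (width - xc) * scale
    # Image rows grow downwards, whereas the OpenGL y axis points upwards
    top = yc * scale
    bottom = -(height - yc) * scale
    return left, right, bottom, top
```

With these bounds the near clipping plane covers exactly the footprint of the real image, so a principal point away from the image centre simply yields an asymmetric frustum.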
Before rendering, the real image must be segmented. This segmentation allows us to differentiate the grass from the players and the lines, which is useful for the virtual ads that are painted on the grass: we only replace the pixels belonging to the grass, so that the players and lines are not occluded. The segmentation is done by converting the image to HSV space. We calculate the histogram of the H channel and take its maximum, which corresponds to the green of the grass because it is the dominant colour in our scenario. The resulting mask is stored in the alpha channel of the original image, and OpenGL uses this channel to know which pixels are transparent. We then paint the original image as the background of our virtual world, and OpenGL replaces the transparent pixels with pixels from the virtual world.
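The segmentation and replacement steps can be sketched as follows. This is a minimal CPU illustration using only the Python standard library, not our OpenGL implementation: the number of histogram bins, the tolerance around the dominant hue, and the function names grass_mask and composite are assumptions, and the per-pixel selection in composite stands in for the alpha-channel test performed by OpenGL.

```python
import colorsys


def grass_mask(pixels, bins=36, tolerance=1):
    """Return a list of booleans, True where a pixel has the dominant hue.

    pixels is a list of (r, g, b) tuples with components in 0..255.
    bins and tolerance are illustrative assumptions.
    """
    # Hue of every pixel, in [0, 1)
    hues = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0]
            for r, g, b in pixels]
    # Histogram of the H channel
    idx = [min(int(h * bins), bins - 1) for h in hues]
    hist = [0] * bins
    for i in idx:
        hist[i] += 1
    # The most populated bin is taken as the colour of the grass
    peak = hist.index(max(hist))

    def near_peak(i):
        # Hue is circular, so measure the distance around the colour wheel
        d = abs(i - peak)
        return min(d, bins - d) <= tolerance

    return [near_peak(i) for i in idx]


def composite(real, virtual, mask):
    """Replace grass pixels with virtual pixels, keep everything else.

    virtual may contain None where no virtual object is drawn.
    """
    return [v if (m and v is not None) else r
            for r, v, m in zip(real, virtual, mask)]
```

Because only the pixels flagged as grass are replaced, players and field lines remain in front of the inserted graphics.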
We have tested our method on different video sequences, using both scale models and real scenes from soccer matches. The sequences acquired using the scale model consist of frames of 1440 × 809 pixels. The real soccer sequences are high definition video sequences of 1920 × 1080 pixels.
In this paper we study augmented reality in sport scenarios using cameras mounted on a tripod. In these scenarios there is usually only a small number of visible primitives that can be used to perform the calibration. To solve this problem, we first assume that the camera is mounted on a tripod (which is a common situation in practice) and we study the geometry of the tripod from a mathematical point of view. This assumption strongly simplifies the calibration problem and allows us to recover the frame calibration in situations where general calibration techniques fail. Secondly, we use a simple method for primitive tracking based on a CART (Classification and Regression Tree). This method is used in the calibration procedure and takes colour information into account. For camera synchronisation, we establish a correspondence between the real camera and the virtual camera by calculating the virtual camera parameters from the real camera parameters. Finally, we render using OpenGL because it offers easy management of the virtual camera and optimised graphics processing on the graphics card. We present some experiments using HD videos of sport events (soccer matches) with both scale models and real scenarios. In order to validate our approach, we insert some graphics into the video sequences. The numerical results we present are precise and very promising.
This research has been partially supported by the MICINN project reference MTM2010-17615 (Ministerio de Ciencia e Innovación, Spain). We acknowledge MEDIAPRODUCCION S.L. for providing us with the real HD video used in the numerical experiments.