A dataset with detailed annotation and rich semantics


Group activity recognition (GAR), which aims to identify activities performed collectively in videos, has gained significant attention recently. Existing GAR datasets typically annotate only a single Group Activity (GA) instance per sample, carefully selected from original videos.

This approach, while precise, diverges significantly from real-world contexts, which often involve multiple GA instances. Moreover, single word-level labels are insufficient to capture the complex semantic information in a GA, which constrains research on related GA tasks.

To mitigate these limitations, a research team led by Wang Yun-Hong (Beihang University, China) published its findings on 15 December 2024 in Frontiers of Computer Science.

The team proposed FIFAWC, a novel dataset for GAR characterized by three notable distinctions:

  1. Comprehensive annotation: Unlike previous datasets, which annotate a single GA per sample and normalize samples to a uniform frame count, FIFAWC annotates every GA contained in each sample and retains the original frame count. This increases the dataset’s complexity and its practical value for advanced research (see the illustrative sketch after this list).
  2. Semantic description: Each clip in FIFAWC is accompanied by an elaborate caption from sports commentators, ensuring content accuracy and professionalism. This positions FIFAWC as a data foundation for a variety of tasks, such as video captioning and retrieval.
  3. New scenario: FIFAWC departs from previous datasets by featuring soccer match footage. The expansive playing area and rapid movement characteristic of soccer introduce new challenges, such as dynamic camera motion and smaller targets within frames, significantly raising the difficulty of GAR.
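To make the annotation scheme concrete, here is a minimal Python sketch of what a per-clip record with multiple GA instances and a commentator caption could look like. The class and field names are illustrative assumptions for this article, not the released FIFAWC schema.

```python
# Hypothetical sketch of a multi-GA, caption-annotated clip record.
# Names and fields are illustrative assumptions, not the actual FIFAWC format.
from dataclasses import dataclass, field
from typing import List


@dataclass
class GAInstance:
    category: str        # e.g. "corner_kick", "goal_celebration"
    start_frame: int     # first frame of the activity within the clip
    end_frame: int       # last frame (original frame count is preserved)


@dataclass
class ClipAnnotation:
    clip_id: str
    num_frames: int                                              # no uniform frame normalization
    activities: List[GAInstance] = field(default_factory=list)   # all GAs, not just one
    caption: str = ""                                            # commentator-style description


# Example record: one clip containing several group activities plus a caption.
clip = ClipAnnotation(
    clip_id="match_03_clip_117",
    num_frames=412,
    activities=[
        GAInstance("corner_kick", 0, 180),
        GAInstance("goal_celebration", 250, 412),
    ],
    caption="The corner is whipped in and the header finds the net, "
            "sparking wild celebrations on the pitch and the bench.",
)
```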

In the research, they benchmark FIFAWC on two tasks: traditional GAR and the new task of GA video captioning. For GAR, they evaluate the classical detector-based approach ARG and the state-of-the-art detector-free method DFWSGAR.

The results reveal high accuracy at the category level but low accuracy at the sample level, because each sample may contain multiple GAs, reflecting the complexity and challenge of FIFAWC. Compared with PDVC's strong performance on the ActivityNet dataset (a CIDEr score of 25.87), its poor performance on FIFAWC indicates that further research on GA video captioning is necessary.
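For readers unfamiliar with the two evaluation views, the following minimal Python sketch shows one way category-level and sample-level accuracy can diverge when a sample contains several GAs. The helper functions and the toy labels are illustrative assumptions, not the paper's evaluation code.

```python
# Minimal sketch: category-level vs. sample-level accuracy for samples that
# carry a *set* of ground-truth GA labels. Illustrative, not the paper's code.
from typing import List, Set


def category_accuracy(gt: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of individual ground-truth GA labels that were recovered."""
    total = sum(len(g) for g in gt)
    hits = sum(len(g & p) for g, p in zip(gt, pred))
    return hits / total if total else 0.0


def sample_accuracy(gt: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of samples whose full GA label set is predicted exactly."""
    exact = sum(1 for g, p in zip(gt, pred) if g == p)
    return exact / len(gt) if gt else 0.0


# A model may recover most labels (high category-level accuracy) yet rarely
# get every GA in a multi-activity sample right (low sample-level accuracy).
gt = [{"corner_kick", "goal_celebration"}, {"free_kick"}, {"throw_in", "counterattack"}]
pred = [{"corner_kick"}, {"free_kick"}, {"throw_in", "counterattack"}]
print(category_accuracy(gt, pred))  # 4 of 5 labels recovered -> 0.8
print(sample_accuracy(gt, pred))    # 2 of 3 samples exactly right -> ~0.67
```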


More information:
Duoxuan Pei et al, FIFAWC: a dataset with detailed annotation and rich semantics for group activity recognition, Frontiers of Computer Science (2024). DOI: 10.1007/s11704-024-40027-3

Provided by
Higher Education Press
