What if a security camera could not only capture video but understand what’s happening—distinguishing between routine activities and potentially dangerous behavior in real time? That’s the future being shaped by researchers at the University of Virginia’s School of Engineering and Applied Science with their latest breakthrough: an AI-driven intelligent video analyzer capable of detecting human actions in video footage with unprecedented precision and intelligence.
The research paper is published in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence.
The system, called the Semantic and Motion-Aware Spatiotemporal Transformer Network (SMAST), promises a wide range of societal benefits, from enhancing surveillance systems and improving public safety to enabling more advanced motion tracking in health care and refining how autonomous vehicles navigate through complex environments.
“This AI technology opens doors for real-time action detection in some of the most demanding environments,” said Scott T. Acton, professor and chair of the Department of Electrical and Computer Engineering and the lead researcher on the project. “It’s the kind of advancement that can help prevent accidents, improve diagnostics and even save lives.”
AI-driven innovation for complex video analysis
So, how does it work? At its core, SMAST is powered by artificial intelligence. The system relies on two key components to detect and understand complex human behaviors. The first is a multi-feature selective attention model, which helps the AI focus on the most important parts of a scene—like a person or object—while ignoring unnecessary details. This makes the system more accurate at identifying what’s happening, such as recognizing someone throwing a ball instead of just moving their arm.
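To make the idea concrete, here is a minimal sketch of selective attention over multiple feature streams. This is an illustrative toy, not the paper's actual model: the function name `selective_attention`, the scaled dot-product scoring, and the three-stream setup are all assumptions chosen to show how an attention weight can let a detector emphasize the most relevant feature type (e.g. person, object, context) and downweight the rest.

```python
import numpy as np

def selective_attention(features, query):
    """Weight multiple feature streams by their relevance to a query vector.

    features: (num_streams, dim) array, one row per feature type
              (e.g. appearance, object, context).
    query: (dim,) vector representing what the detector is attending to.
    Returns the attention-weighted combination and the per-stream weights.
    """
    # Scaled dot-product scores, one per stream.
    scores = features @ query / np.sqrt(features.shape[1])
    # Softmax turns scores into weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ features, weights

# Toy example: three random feature streams, queried with a vector
# close to the second stream, which should therefore dominate.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
q = feats[1] + 0.1 * rng.normal(size=8)
combined, w = selective_attention(feats, q)
print(w)
```

In a full model these weights would be learned and applied across many spatial locations per frame; the point here is only the mechanism, focusing capacity on the streams that matter for the action being recognized.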
The second key feature is a motion-aware 2D positional encoding algorithm, which helps the AI track how things move over time. Imagine watching a video where people are constantly shifting positions—this tool helps the AI remember those movements and understand how they relate to each other. By integrating these features, SMAST can accurately recognize complex actions in real time, making it more effective in high-stakes scenarios like surveillance, health care diagnostics, or autonomous driving.
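The following is a hedged sketch of what a motion-aware 2D positional encoding could look like, not the paper's exact formulation: it combines a standard sinusoidal encoding of a keypoint's current (x, y) location with an encoding of its frame-to-frame displacement (dx, dy), so the model sees both where something is and how it just moved. The function names and the encoding dimension are illustrative assumptions.

```python
import numpy as np

def sinusoidal_encoding(pos, dim):
    """Standard sinusoidal encoding of a scalar coordinate."""
    i = np.arange(dim // 2)
    freq = 1.0 / (10000 ** (2 * i / dim))
    return np.concatenate([np.sin(pos * freq), np.cos(pos * freq)])

def motion_aware_encoding(x, y, prev_x, prev_y, dim=16):
    """Encode a 2D position together with its motion since the last frame.

    Concatenates encodings of the current (x, y) location with encodings
    of the displacement (dx, dy), giving the transformer a position signal
    that also carries short-term motion information.
    """
    dx, dy = x - prev_x, y - prev_y
    return np.concatenate([
        sinusoidal_encoding(x, dim), sinusoidal_encoding(y, dim),
        sinusoidal_encoding(dx, dim), sinusoidal_encoding(dy, dim),
    ])

# A point at (12, 30) that moved 3 px right and 1 px down since the last frame.
enc = motion_aware_encoding(12, 30, 9, 29)
print(enc.shape)  # (64,) — four dim-16 encodings concatenated
```

Two keypoints at the same location but with different recent motion receive different encodings, which is what lets the model relate movements over time rather than treating each frame in isolation.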
SMAST redefines how machines detect and interpret human actions. Current systems struggle with chaotic, unedited continuous video footage, often missing the context of events. But SMAST’s design allows it to capture the dynamic relationships between people and objects with remarkable accuracy, powered by the same AI components that allow it to learn and adapt from data.
Setting new standards in action detection technology
This technological leap means the AI system can identify actions like a runner crossing a street, a doctor performing a precise procedure or even a security threat in a crowded space. SMAST has already outperformed top-tier solutions across key academic benchmarks including AVA, UCF101-24 and EPIC-Kitchens, setting new standards for accuracy and efficiency.
“The societal impact could be huge,” said Matthew Korban, a postdoctoral research associate in Acton’s lab working on the project. “We’re excited to see how this AI technology might transform industries, making video-based systems more intelligent and capable of real-time understanding.”
More information:
Matthew Korban et al., "A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence (2024). DOI: 10.1109/TPAMI.2024.3377192
Citation:
AI-driven video analyzer sets new standards in human action detection (2024, October 16), retrieved 16 October 2024