
HERMES Project




HERMES Technology


A technology survey was issued among the partners, covering a list of categories intended to help define a robust and coherent platform architecture. The categories deliberately do not follow the task description of each work package strictly; they are kept generic enough to give, in the end, a complete overview of the technological state of the art. The survey yields both a generic and a detailed picture of the technology each partner will provide to HERMES. It is generic in terms of the logical components making up each subsystem, the hardware requirements, and the languages and frameworks used to develop the software components. It is detailed in the more technical aspects: the interfaces and the data structures and formats provided to the upper level have been described. The collected information covers 7 logical blocks:

  1. Sensing infrastructure
  2. Visual Processing
  3. Audio Processing
  4. Low-level Information Fusion
  5. Context Modelling
  6. Indexing, Annotation and Knowledge Conceptualization
  7. Semantic data summarization and meta data processing

The sensing infrastructure consists of several components, which can be grouped into 3 main categories:

  • Perceptual components based on video. These components are mainly capable of tracking and recognising single or multiple targets in indoor or outdoor conditions.
  • Perceptual components based on audio. These components can identify speakers or estimate the direction of arrival of sound from microphone-array data.
  • Middleware for communication between perceptual components.
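
As an illustration of the direction-of-arrival idea mentioned for the audio components, here is a minimal sketch for a two-microphone array. It is not the HERMES implementation: the function names, the brute-force cross-correlation, and the far-field assumption (a plane wave reaching both microphones) are all assumptions made for this example.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) by which b is delayed relative to a.

    Brute-force cross-correlation; a and b must have the same length.
    """
    n = len(a)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_of_arrival(a, b, sample_rate: float, mic_distance: float) -> float:
    """Angle in degrees from the array broadside; positive means the sound
    reached microphone a first (hypothetical convention for this sketch)."""
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = int(mic_distance * sample_rate / SPEED_OF_SOUND)
    delay = best_lag(a, b, max_lag) / sample_rate
    sin_theta = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_distance))
    return math.degrees(math.asin(sin_theta))


# Toy input: a single click that reaches microphone b four samples later.
mic_a = [0.0] * 64
mic_b = [0.0] * 64
mic_a[20] = 1.0
mic_b[24] = 1.0
angle = direction_of_arrival(mic_a, mic_b, sample_rate=16000, mic_distance=0.2)
```

Real arrays typically use more microphones and a more robust correlation measure; the sketch only shows how a time difference between two channels maps to an angle.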

Concerning audio, three components are available:

  • Speech Analysis Component - Capable of online and offline speech processing; the former is focused on real-time scenarios, while the latter is responsible for the extraction, indexing and storage of speech information. This component includes three processing blocks: speaker identification, automatic speech recognition and emotion detection.
  • Speech Information Retrieval Component - This module searches the system by speaker identity, emotional state and speech content, returning a list of speech information items sorted by their estimated relevance to the query.
  • Text-To-Speech Synthesis (TTS) Component - This module converts textual messages into spoken messages; it can be embedded for use on HERMES mobile devices or ordinary home desktop computers.
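
The query behaviour described for the Speech Information Retrieval Component can be sketched as follows. This is only an illustration under stated assumptions: the `SpeechItem` record, its field names, and the term-overlap relevance score are hypothetical and stand in for whatever index and ranking model the actual component uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechItem:
    """Hypothetical index record; field names are illustrative only."""
    speaker: str
    emotion: str
    transcript: str


def relevance(item: SpeechItem, terms: List[str]) -> float:
    """Toy score: fraction of query terms that occur in the transcript."""
    if not terms:
        return 0.0
    words = set(item.transcript.lower().split())
    return sum(t.lower() in words for t in terms) / len(terms)


def search(items: List[SpeechItem], speaker: Optional[str] = None,
           emotion: Optional[str] = None, content: str = "") -> List[SpeechItem]:
    """Filter by speaker and emotion, then rank by content relevance."""
    terms = content.split()
    scored = []
    for it in items:
        if speaker is not None and it.speaker != speaker:
            continue
        if emotion is not None and it.emotion != emotion:
            continue
        score = relevance(it, terms)
        if terms and score == 0.0:
            continue  # a content query was given but nothing matched
        scored.append((score, it))
    # Most relevant first; the sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in scored]


items = [
    SpeechItem("anna", "happy", "we visited the doctor today"),
    SpeechItem("anna", "neutral", "remember to call the doctor tomorrow morning"),
    SpeechItem("ben", "happy", "the doctor called"),
]
results = search(items, speaker="anna", content="doctor tomorrow")
```

Here the second item ranks first because it matches both query terms, while the first item matches only one.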

Moreover, further advanced processing capabilities are available in the following technical areas:

  • Content access to compressed images and videos directly in the compressed domain
  • Shot cut detection to divide videos into sections
  • Mapping of low-level extracted content to high-level semantics through pattern recognition
  • Intelligent video content analysis and interpretation via machine learning and artificial intelligence approaches
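
To make the shot cut detection item concrete, the sketch below uses a common textbook approach: compare grey-level histograms of consecutive frames and declare a cut when they differ strongly. It is an assumption chosen for illustration, not the partners' method. In particular it works on decoded pixel values rather than in the compressed domain, and the function names, bin count and threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised grey-level histogram of a non-empty frame (values 0-255)."""
    counts = [0] * bins
    for p in frame:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5,
                bins: int = 16) -> List[int]:
    """Indices of frames that start a new shot."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        h = histogram(frame, bins)
        if prev is not None:
            # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint.
            dist = 0.5 * sum(abs(x - y) for x, y in zip(h, prev))
            if dist > threshold:
                cuts.append(i)
        prev = h
    return cuts


def shots(n_frames: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Turn cut indices into (start, end) frame ranges, end exclusive."""
    bounds = [0] + cuts + [n_frames]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


# Toy video: two dark frames, two bright frames, one dark frame.
dark, bright = [10] * 64, [240] * 64
video = [dark, dark, bright, bright, dark]
cuts = detect_cuts(video)
sections = shots(len(video), cuts)
```

A fixed threshold is the simplest choice; gradual transitions such as fades would need a different criterion.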

The abovementioned capabilities will be further developed in HERMES towards the construction of a computer-aided memory management system for elderly people, which aims to include:

  • processing of metadata via individual attributes and elements;
  • correlation and establishment of correspondences between metadata attributes across different events and dates;
  • decision support, modelling and decision making to generate reminders;
  • convenient access to events, diary and activities via multimedia playback of memories.
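
One way to picture how correlating metadata attributes across dates could feed reminder generation is sketched below. Everything in it is an assumption made for illustration: the `Event` record, the attribute names, and the rule "an attribute that recurs at a regular interval suggests a reminder one interval later" are hypothetical and do not describe the HERMES decision model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical event record with free-form metadata attributes."""
    day: date
    attributes: Dict[str, str]


def correlate(events: List[Event]) -> Dict[Tuple[str, str], List[date]]:
    """Group dates by each (attribute, value) pair shared by several events."""
    groups = defaultdict(list)
    for ev in events:
        for key, value in ev.attributes.items():
            groups[(key, value)].append(ev.day)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def next_reminder(days: List[date]) -> Optional[date]:
    """Predict the next date if at least three dates are evenly spaced."""
    if len(days) < 3:
        return None
    gaps = {b - a for a, b in zip(days, days[1:])}
    if len(gaps) != 1:
        return None  # irregular spacing: no prediction in this toy rule
    gap = gaps.pop()
    if gap == timedelta(0):
        return None
    return days[-1] + gap


events = [
    Event(date(2010, 3, 1), {"activity": "physiotherapy", "person": "Dr. Smith"}),
    Event(date(2010, 3, 8), {"activity": "physiotherapy"}),
    Event(date(2010, 3, 10), {"activity": "shopping"}),
    Event(date(2010, 3, 15), {"activity": "physiotherapy"}),
]
recurring = correlate(events)
reminder = next_reminder(recurring[("activity", "physiotherapy")])
```

In this toy data the weekly physiotherapy sessions are the only correlated attribute, so the sketch proposes a reminder for the following week.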