ARIA - Quality-Adaptive Media-Flow Architectures for Sensor Data Management
  NSF Award: IIS-0308268

Table of Contents

1. Overview of the ARIA Framework
2. Formal Specification of the Architecture
3. Precision, Delay, and Resource Requirement Models
4. DANS: Extending ARIA toward Distributed Networks

Overview of the ARIA Framework

The main components of the ARIA architecture (Figure 1) are described below.

Figure 1: ARIA architecture

Intelligent Stage
The Intelligent Stage at DMA is equipped with various video cameras and pressure sensors. The new sensors being installed into the Intelligent Stage include: floor mat sensors to generate data for object localization and motion classification; microphone arrays for sound localization and beamforming; a Vicon 8i Realtime system with 8 cameras for tracking the position and motion of a human body marked with reflective markers; ultrasound sensors for object tracking; and video sensors for determining the presence or absence of marked persons, determining their relative spatial positions, and detecting certain simple events. The Intelligent Stage will act as a test-bed for validating the results of the proposed research.

Media-Flow Architecture
Interactive performances will require an innovative information architecture that will process, filter, and fuse sensory inputs and actuate audio-visual responses in real-time, while providing appropriate QoS guarantees. Therefore, we propose to develop an adaptive and programmable ARchitecture for Interactive Arts (ARIA). ARIA will be capable of real-time sensing and streaming of various types of audio, video, and motion data, accessing external data sources, extracting various features from streamed data, and fusing and mapping streams onto output devices as constrained by the QoS requirements. The ARIA media-flow architecture will provide a run-time kernel: a quality-adaptive, programmable media-flow network consisting of fusion operators and media-stream integration paths.

Visual Design Language Interface
ARIA will provide the choreographer with a visual design tool to describe the various processing components, such as sensors and display elements, in the interactive performance. The specifications will include streaming characteristics of the sensors; precisions and computational overheads of the feature extractors; schemas, interfaces, and data access costs of the external data sources; functionality and QoS of the fusion operators and integration pathways in the media-flow network; and display and audio features of the actuators. The language will also provide mechanisms for describing local (per-processing-component) and end-to-end (sensor-to-actuator) delay, QoS, and frequency constraints. As discussed in the related work section, there are various data- and media-flow languages, most of which rely on visual tools. We will develop a visual language to specify and design media-flow networks, building on existing languages (such as Max) used in the domain of media flow specification. This language will be more expressive than the existing languages, in that it will allow for description of QoS constraints along with the media-flow connectivity.

Back to Top

Formal Specification of the Architecture

The ARIA media-flow architecture is modeled as a flow network, G(V,E), where

  • the set, V, of vertices (or nodes) represents the set of sensors, filters, fusion operators, external sources, and actuators;
  • the set, E, of edges represents media streams between components.

Figure 2: An example media flow network

Figure 2 provides an example flow network. Given two vertices vs and ve, a flow path, path(vs, ve), between these two vertices is a sequence, <vs, ..., ve>, of vertices such that each pair of consecutive vertices in the sequence is connected by an edge. If there is no such path between vs and ve, then path(vs, ve) = < >. A flow cycle, then, is a flow path where vs = ve. A flow graph without any cycles is called an acyclic flow graph, and one with cycles is called a cyclic flow graph. In the rest of the section, we formally describe the components that constitute the quality-adaptive media-flow network.
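As an illustrative sketch (class and method names are hypothetical, not part of ARIA), the flow-network, flow-path, and flow-cycle definitions above can be expressed with a simple adjacency-list graph:

```python
# Hypothetical sketch of the G(V, E) flow-network model described above.
from collections import defaultdict

class FlowGraph:
    """Directed graph: vertices are ARIA components (sensors, filters,
    fusion operators, external sources, actuators); edges are streams."""
    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of successor vertices

    def add_edge(self, u, v):
        self.adj[u].add(v)

    def path(self, vs, ve):
        """Return one flow path <vs, ..., ve> as a list, or [] if none exists.
        Because we only return upon traversing an edge into ve, calling
        path(v, v) searches for a flow cycle through v."""
        stack = [(vs, [vs])]
        visited = set()
        while stack:
            v, p = stack.pop()
            for w in self.adj[v]:
                if w == ve:
                    return p + [w]
                if w not in visited:
                    visited.add(w)
                    stack.append((w, p + [w]))
        return []   # path(vs, ve) = < >

    def has_cycle_through(self, v):
        """A flow cycle is a flow path where vs = ve."""
        return self.path(v, v) != []
```

A graph where every `has_cycle_through` check fails for every vertex is an acyclic flow graph in the sense defined above.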

Objects


In ARIA, the basic information unit is an object. An object is produced by a sensor or an external data source. An object does not exist in isolation; it is a part of an object stream (or stream) as discussed below. An ARIA media flow network can consist of multiple streams of objects. Depending on the task, an object can be as simple as a numeric value (such as an integer denoting the pressure applied on a surface sensor) or as complex as an image component segmented out from frames in a video sequence. Each object streamed through ARIA contains:

  • an object payload, such as a string, a numeric value, or an image region, and
  • a meta-data header that describes the object properties:

    - size of the object data, which defines the memory or buffer requirement for the object,

    - precision of the object data (for different object types, precision may mean different things; for an image, its precision may mean its resolution, whereas for a coordinate value, it may mean the level of confidence provided by the object tracking sensors),

    - set of buffer usage stamps of the object as it is transformed through filtering and fusion operations, and

    - set of timestamps acquired by the object each time it goes through sensing, filtering, and fusion operations. The total time delay incurred by the object can be calculated from this set of timestamps.

We denote the set of all object payload types using O. In the rest of the proposal, we refer to object payload types as object types.
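The object structure above (payload plus meta-data header) can be sketched as a small data class; all names and field types here are illustrative assumptions, not ARIA's actual representation:

```python
# Hypothetical sketch of an ARIA object: payload + meta-data header.
from dataclasses import dataclass, field

@dataclass
class AriaObject:
    payload: object                # e.g. a string, a numeric value, or an image region
    size: int                      # memory/buffer requirement for the object, in bytes
    precision: float               # e.g. image resolution or tracking confidence
    buffer_stamps: list = field(default_factory=list)  # buffer usage per operation
    timestamps: list = field(default_factory=list)     # one stamp per sensing/filter/fusion step

    def total_delay(self):
        """Total delay incurred by the object, derived from its timestamps."""
        if len(self.timestamps) < 2:
            return 0
        return self.timestamps[-1] - self.timestamps[0]
```

For example, an object stamped at times 0, 40, and 150 (ms) has incurred a total delay of 150 ms.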

Streams


In ARIA, each stream denotes a transmission channel of objects of the same type. We classify streams into two types: regular and irregular streams.

Regular Streams A regular stream is a data stream, where all objects have a constant frequency, resource requirement, and precision.

  • For example, a sequence of 2-byte surface pressure values, measured within 99.9% precision and generated every 10 milliseconds by a floor sensor, forms a regular object stream.

Consequently, given an object type O ∈ O, we can represent a regular stream type as a quadruple <O, f, b, p>, where f is the frequency with which data is available in the stream, b is the resource requirement of each object, and p ∈ [0, 1] is the precision of the data in the stream. The set of all regular stream types (of objects of type O) is denoted as RS(O).

Irregular Streams An irregular stream, on the other hand, is a data stream where object frequency, resource requirement, and precision are varying or even unpredictable.

  • For example, let us consider a face extraction module that recognizes and segments faces in a video feed and makes the resulting face-objects available for further processing. Since the availability of face-objects in the resulting stream depends on the video feed and since sizes of the objects depend on the closeness of the faces to the camera, the resulting stream is irregular.
Given an object type O ∈ O, we represent an irregular stream type as <O, f, b, p>, where f is the expected frequency with which data is available, b is the expected resource requirement of each object, and p is the expected precision of the objects in the stream. Note that, in addition to the expected frequency value, other stochastic properties of the streams can also be described. In this proposal, for the sake of clarity, we limit the discussion to expected frequencies of the streams. If the stream does not have an expected frequency, then f = ⊥; if the objects in the stream do not have an expected precision, then p = ⊥; and if the objects in the stream do not have an expected size, then b = ⊥. The set of all irregular stream types (of objects of type O) is denoted as IS(O).

Since any stream is either regular or irregular, the set of all stream types (of objects of type O) is AS(O) = RS(O) ∪ IS(O).
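The <O, f, b, p> quadruple can be sketched as follows; the class and field names are hypothetical, and `None` stands in for the undefined value ⊥ used for irregular streams:

```python
# Hypothetical sketch of the <O, f, b, p> stream-type quadruple.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StreamType:
    obj_type: str                 # O: the object payload type
    frequency: Optional[float]    # f: (expected) objects per second; None means f = undefined
    resource: Optional[int]       # b: (expected) resource requirement per object, in bytes
    precision: Optional[float]    # p: (expected) precision in [0, 1]; None means p = undefined
    regular: bool = True          # True for RS(O), False for IS(O)

# The floor-sensor example: a 2-byte pressure value every 10 ms at 99.9% precision.
pressure_stream = StreamType("pressure", frequency=100.0, resource=2, precision=0.999)

# An irregular face-object stream with no expected frequency or size.
face_stream = StreamType("face", frequency=None, resource=None, precision=0.8, regular=False)
```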

Sensors


Sensors record and transmit various features of the real-world environment. In the ARIA architecture, sensors act as stream sources. Each sensor (of object type O), n(O), has a corresponding set, S(O) ⊆ AS(O), of stream types that it can generate. However, at any given point in time, a sensor can generate only one object stream.

  • For example, consider a motion sensor that can either provide high-precision motion information every 100 milliseconds or low-precision motion information every 10 milliseconds. This sensor has two stream types.

A sensor, n(O) = <S(O)>, can make objects of type O available at different frequencies, sizes, and precisions. Consequently, a scalable sensor may need to consider the trade-off between object availability, resource requirements, and quality. Sensors that deliver only regular streams are called regular sensors, whereas those that deliver irregular streams are called irregular sensors.
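The motion-sensor example can be sketched as below; the class name, the tuple encoding of stream types as (O, f, b, p), and the numeric values are all illustrative assumptions:

```python
# Hypothetical sketch of a sensor n(O) = <S(O)>: a set of supported
# stream types, of which at most one is active at any time.
class Sensor:
    def __init__(self, stream_types):
        self.stream_types = set(stream_types)   # S(O), a subset of AS(O)
        self.active = None                      # only one stream at a time

    def select(self, stream_type):
        """Activate one of the supported stream types."""
        if stream_type not in self.stream_types:
            raise ValueError("unsupported stream type")
        self.active = stream_type

# The motion-sensor example: high precision every 100 ms, or low
# precision every 10 ms (precision values here are made up).
hi_res = ("motion", 10.0, 8, 0.99)    # (O, f in Hz, b in bytes, p)
lo_res = ("motion", 100.0, 8, 0.70)
motion_sensor = Sensor([hi_res, lo_res])
motion_sensor.select(hi_res)
```

Switching between `hi_res` and `lo_res` is exactly the availability/quality trade-off a scalable sensor must consider.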

Actuators


While sensors generate object streams, actuators consume object streams and map them to appropriate visual, aural, or haptic outputs.

  • For example, consider a monitor that can display the face-objects delivered to it and can change the picture on the screen every 2 seconds. This monitor is an actuator with a frequency upper-limit.

Each actuator (of object type O), α(O), is of the form α(O) = <S(O)>, where S(O) is the set of object streams that the actuator can accept. Similar to the sensor, an actuator can only accept one object stream at a given time.

Filters


A filter takes an object stream as input, processes and transforms the objects in the stream, and outputs a new stream consisting of the transformed objects.

  • For example, consider a module that takes a stream of face-objects as its input and returns the emotion of the face as its output. This module is a transforming filter. Note that the precision of the result may depend on the number of consecutive faces considered or on what type of heuristic or neural net is used. Consequently, the filter may provide multiple precisions, each with its own end-to-end delay and resource requirement.

More formally, a transforming filter, tf(O, O'), takes a stream of type S = <O, f, b, p> ∈ AS(O) and returns a stream of type S' = <O', f', b', p'> ∈ AS(O'). Each transforming filter, tf, has

  • a set, P, of precision factors it can operate at. Each precision factor, ρi ∈ P, describes the degree of precision change due to the filtering operation, i.e., ρi = p'/p.
  • a set, D, of end-to-end delays. For each precision factor ρi ∈ P, there exists an end-to-end delay δi ∈ D.
  • a set, R, of resource requirements. For each precision factor ρi ∈ P, there is a resource requirement ri ∈ R.
  • an input frequency range, F.

Consequently, each filter, tf(O, O'), is a tuple of the form <O, O', P, D, R, F>. There are some special filters that do not necessarily operate on the "content" of the objects in the stream, but on their higher-level properties or attributes, such as frequencies, precisions, or buffer requirements. Obviously, while altering higher-level properties of the streams they may also alter the contents of the objects. Since these operators may specifically be used in ensuring that the streams satisfy local operator and global end-to-end QoS constraints, we will treat them separately. The special filters consist of

Frequency-Scale Filters A frequency-scale filter, fsf∏, is a special filter which, given a stream s = <O, f, b, p>, returns a stream s' = <O, f × ∏, b', p'>. In other words, it modifies the frequency of the data in the stream as a function of the original data frequency.

Precision-Scale Filters A precision-scale filter, psfρ, is a special filter which, given a stream s = <O, f, b, p>, outputs a stream s' = <O, f', b', p × ρ>. In other words, it modifies (improves or degrades) the precision of the objects that pass through the filter. Note that if p = ⊥, the precision-scale filter still changes the precision of each individual object by a factor of ρ; however, the resulting stream does not have an expected precision.

Resource-Scale Filters A resource-scale filter, bsfκ, is a special filter which, given a stream s = <O, f, b, p>, outputs a stream s' = <O, f', b × κ, p'>. In other words, it modifies the resource requirements of the objects.
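The three special filters above act only on one component of the <O, f, b, p> quadruple. As a sketch (function names hypothetical; streams encoded as (O, f, b, p) tuples with `None` for ⊥):

```python
# Hypothetical sketches of the three special scale filters.
def frequency_scale(stream, factor):
    """Frequency-scale filter: scale the stream frequency f by `factor`."""
    o, f, b, p = stream
    return (o, f * factor, b, p)

def precision_scale(stream, rho):
    """Precision-scale filter: scale precision p by rho, capped at 1.0.
    If p is undefined (None), the stream keeps no expected precision."""
    o, f, b, p = stream
    return (o, f, b, None if p is None else min(1.0, p * rho))

def resource_scale(stream, kappa):
    """Resource-scale filter: scale the per-object resource requirement b by kappa."""
    o, f, b, p = stream
    return (o, f, b * kappa, p)
```

For instance, halving the frequency of a 30 Hz image stream with `frequency_scale(s, 0.5)` yields a 15 Hz stream with the same object type, size, and precision.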



Fusion Operators

A fusion operator,
X(Oin, Oout), is similar to a transforming filter, except that it takes as its input multiple streams (whose types are described by the vector Oin) and returns as its output multiple streams (whose types are described by the vector Oout).

  • Consider a module which receives object-tracking information from multiple redundant sensors and outputs fused highly-precise object-tracking information. The total fusion delay, as well as other resources, would depend on the degree of the final precision.

Therefore, a fusion operator, X(Oin, Oout) has

  • a set, P, of precision function vectors of the form, ρ=[ρ1, ..., ρl], where l = length(Oout) and where ρj is a function that describes the precision of the jth output stream in terms of the precisions of the input streams.
  • a set, D, of end-to-end delays. For each precision function vector ρi ∈ P, there exists an end-to-end delay δi ∈ D.
  • a set, R, of resource requirements. For each precision function vector ρi ∈ P, there exists a resource requirement ri ∈ R.
  • an input frequency range vector, F.

That is, X(Oin, Oout) can be represented as <Oin, Oout, P, D, R, F>.

There are some special fusion operators that do not necessarily operate on the "content" of the objects in the input stream, but on their higher-level properties. The special fusion operators consist of:

  • Duplicators A duplicator, Fn, is a special fusion operator which makes multiple (n) copies of its input.
  • Multiplexers A multiplexer, cn, is a special fusion operator which takes multiple (n) inputs of the same type and outputs only one of them.
  • Synchronizers A synchronizer, Dn, is a special fusion operator which takes n input streams and outputs n streams that are synchronized with respect to the same timestamp.
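The three special fusion operators can be sketched as below; all names are hypothetical, streams are encoded as lists of (timestamp, payload) objects, and the intersection-based synchronizer is one plausible alignment strategy, not ARIA's specified one:

```python
# Hypothetical sketches of the special fusion operators.
def duplicate(stream, n):
    """Duplicator: make n copies of the input stream."""
    return [list(stream) for _ in range(n)]

def multiplex(streams, index):
    """Multiplexer: take n same-typed input streams, output only one of them."""
    return streams[index]

def synchronize(streams):
    """Synchronizer: keep only objects whose timestamps occur in every
    input, so the n output streams are aligned on common timestamps."""
    common = set.intersection(*(set(t for t, _ in s) for s in streams))
    return [[(t, v) for t, v in s if t in common] for s in streams]
```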



External Data Sources

Like fusion operators, external data sources are also represented as <Oin, Oout, P, D, R, F>. The main differences between external data sources and fusion operators are that

  • the external data sources are usually designed to be accessed through queries generated on demand. Therefore, the input streams to external data sources are mostly irregular.
  • the amount of time required for posing a query to an external resource and getting a result is usually much larger compared to the end-to-end delay of fusion operators. Furthermore, unlike the mostly fixed end-to-end delays of fusion operators, the delay in external data sources can vary significantly from query to query.

Therefore, external data sources, in general, have to be treated differently from other fusion operators.



Precision, Delay, and Resource Requirement Models

The delay, resource requirements, and precision characteristics of these ARIA components depend on their parameter assignments. Consequently, both individual component performance and end-to-end ARIA QoS performance are inter-dependent and controllable through the assignment of the parameter values. Given a flow graph G(V,E), let us denote the parameter value assignments (set by the user or by the ARIA optimizer) as A, where A(v) is the parameter value assignment for the operator corresponding to vertex v ∈ V. Using the assignment values, one can then develop various precision, delay, and resource requirement models. Below, we describe one such model:

  • Precision: Precision can be multiplicative. In this case, given an operator, the output precision is calculated by multiplying the precisions of the input streams and the precision assignment of the node. For example, if an image filter can identify common objects in two input images with 80% precision and if the images that are streamed into this filter are both of 90% quality, the precision of the resulting objects can be modeled as 80% × 90% × 90% ≈ 65%. Some operators could actually increase the quality of their input, but the resulting precision cannot be greater than 100%.
  • Delay: Delay is additive in nature. Given an object, the total delay observed by this object can be calculated by adding the delay introduced by the operator that outputs the object to the maximum delay observed over the input objects used by the operator. For example, if an image filter can identify common objects in two input images in 100ms and the input images both take 50ms to be delivered to the filter, then the resulting objects observe a total delay of 100 + max{50, 50} = 150ms.
  • Resource Requirements: Resource requirement is also additive in nature. However, in contrast to delay, the total amount of resources required for an object is calculated by adding the resources needed at the operator that outputs this object to the resources needed to compute each input object used by the operator. For example, when we consider buffer requirements, if an image filter needs 10MB of memory for identifying common objects in two input images and the input images each needed 2MB to be captured and delivered to the filter, then the resulting objects need a total buffer of 10 + 2 + 2 = 14MB to be computed.
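The three models can be summarized as small functions (names hypothetical), using exactly the numbers from the image-filter example above:

```python
# Hypothetical sketch of the multiplicative-precision, additive-delay,
# and additive-resource models described above.
def output_precision(op_precision, input_precisions):
    """Multiplicative precision: product of input precisions and the
    operator's precision assignment, capped at 100%."""
    result = op_precision
    for p in input_precisions:
        result *= p
    return min(result, 1.0)

def output_delay(op_delay, input_delays):
    """Additive delay: operator delay plus the slowest input's delay."""
    return op_delay + max(input_delays)

def output_resources(op_resources, input_resources):
    """Additive resources: operator's needs plus every input's needs."""
    return op_resources + sum(input_resources)
```

For the example filter: `output_precision(0.8, [0.9, 0.9])` gives about 0.648 (≈ 65%), `output_delay(100, [50, 50])` gives 150 ms, and `output_resources(10, [2, 2])` gives 14 MB.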

As discussed in the previous subsection, each object being streamed in ARIA is annotated with a precision value and delay/resource usage stamps.


DANS: Extending ARIA toward Distributed Networks

Networks are becoming pervasive and ubiquitous, and networking capabilities are being integrated into an increasing diversity of equipment, ranging from home consumer electronics to smart-dust-type sensing networks. The recently coined term Ambient Intelligence (AmI) refers to this trend. ARIA is itself an ambient intelligence system, providing AmI working environments in general with (a) the description and semantic integration of service delivery workflows consisting of adaptable sensing, processing, communicating, and actuating components; (b) modular integration of various media processing and actuating components into service delivery workflows, and adaptation of these workflows for providing end-to-end service delivery guarantees; and (c) run-time workflow execution and adaptation to changing resources and network characteristics.

However, migrating ARIA to such an ambient system is challenging, as run-time situations are only partially known (or unknown) in the design phase, and multiple, potentially conflicting, criteria have to be taken into account at runtime. In particular, the inherent context-based adaptation and the distributed nature of the resources require that workflow processes be continuously mapped and remapped to different clusters of resources without interrupting the ongoing services. Thus, this work focuses on enabling adaptation of media processing workflow delivery in media-rich sensory/reactive environments.

We construct DANS (Decentralized, Autonomous, and Network-wide Service Delivery and Multimedia Workflow Processing) as the distributed extension of ARIA. This evolution of ARIA makes the following contributions: (a) a novel decentralized streaming-data workflow processing architecture that organizes resource pools and executes workflows in a purely decentralized manner; (b) an effective yet efficient operator-instance search mechanism, which ensures that qualified operator instances can be found to process objects; and (c) sender-initiated, object-based workflow instantiation, providing high adaptivity to the underlying network and reducing initial resource and maintenance costs.



1st Intl. Workshop on Ambient Intelligence, Media, and Sensing
Istanbul, Turkey