Behavior Markup Language (BML)

Welcome to the BML page. This page now contains all updates from the Vienna workshop in November 2006. Therefore the information on this page supersedes the information found on the earlier ISI wiki. Suggestions from the Paris workshop in June 2007 will not be integrated into this document until they have been discussed on the forums.

A new draft version of BML 1.0 is being developed on a separate page.

2008 MITRE Meeting Subgroups: (Please look inside the above draft version of BML 1.0)

Navigation

BML happens on three websites:

  1. This wiki: Contains the most up-to-date BML specification
  2. BML Project: Mindmakers project page containing important downloads (publications, workshop notes etc.)
  3. BML Forum: Discussion forum where pending changes, future events etc. are discussed

Overview

The Behavior Markup Language or BML is being proposed as a standard XML interface between the level of behavioral planning and behavior realization in the SAIBA framework for multimodal behavior generation in virtual humans. This wiki is associated with the SAIBA Multimodal Behavior Generation Project and the Behavior Markup Language Project on Mindmakers.

For discussing this development, please make sure to join the SAIBA discussion forum on Mindmakers.

Tools and Projects

Here is a summary of some BML tools and projects that are available or are in development.

Project | Institution | Description | In | BML dialect/extensions
Expressive Gesture Repository | Paris 8 | Shared repository of a variety of expressive gestures that get instantiated in BML for a given discourse context | - | -
ACE (ARticulated Communicator Engine) | U.Bielefeld | BML compliant behavior realizer with smart scheduling and blending/co-articulation of gesture and speech | C++ | -
Embodied Conversational Agent Toolkit (ECAT) | HMI U.Twente / ArticuLab | Facilitates the rapid integration of a broad array of character rendering engines by providing a BML compiler and a translator component | - | -
BCBM Rule Builder | ISI / MA&D | Point and click creation of FML to BML mapping rules with a live preview of BML behavior blocks through a network connection to various behavior realizers (SmartBody and UnrealPuppets) | C# | -
SmartBody | ISI / ICT | BML compliant behavior realizer with smart blending/co-articulation of gesture | C++ | sbm
Social Puppets | CADIA / ISI / Alelo | Behavior planner mostly for interactional behavior but also some propositional behavior | Python | -
Ambulation Agents | CADIA / CCP | Behavior planner for an online social game environment | Python | -
Non-Verbal Behavior Generator (NVB) | ISI | BEAT-like system that analyzes a virtual human's communicative intent, emotional state and dialog text, and generates appropriate behavior in BML | C++/XSLT | -
NOVA | DFKI | NOnVerbal Action generator: data-driven approach to gesture generation | - | -
Elckerlyc | HMI U.Twente | BML compliant behavior realizer designed for continuous (as opposed to turn-based) interaction | Java | bmlt

FIXME Add yourselves here (or send Hannes Högni an email)

Top-Level <bml> Block

<bml>

All BML behaviors need to belong to a behavior block. A behavior block is formed by placing one or more BML behavior elements inside a top-level <bml> element. Unless synchronization is specified (see section on Synchronization), it is assumed that all behaviors in a behavior block start at the same time upon arrival in the behavior realizer.

<bml>
   <gaze target="PERSON1"/>
   <speech> Welcome to my humble abode </speech>
</bml>

The order of elements inside the <bml> does not have any semantic meaning.

<required>

It is generally assumed that the behavior realizer will attempt to realize all behaviors in a block, and even if some of the behaviors don't successfully complete for some reason, other behaviors still get carried out. If there is an all-or-nothing requirement for all or some of the behaviors, they can be enclosed in a <required> block inside the <bml> block. In the following example, the entire performance in the BML block will be canceled if either the gaze or the speech behavior is unsuccessful (and an <exception> message sent back from the behavior realizer), but if only the head nod is unsuccessful, the rest will be carried out regardless (and a <warning> message sent back from the behavior realizer).

<bml>
  <required>
    <gaze target="PERSON1"/>
    <speech> Welcome to my humble abode </speech>
  </required>
  <head type="NOD"/>
</bml>
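
The all-or-nothing rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names split_required and feedback_for_failure are hypothetical, and behaviors are identified by tag name purely to keep the example short.

```python
import xml.etree.ElementTree as ET

BML = """
<bml>
  <required>
    <gaze target="PERSON1"/>
    <speech> Welcome to my humble abode </speech>
  </required>
  <head type="NOD"/>
</bml>
"""

def split_required(bml_text):
    """Return (required, optional) lists of behavior tag names in a block."""
    root = ET.fromstring(bml_text)
    required, optional = [], []
    for child in root:
        if child.tag == "required":
            # Everything inside <required> is all-or-nothing.
            required.extend(behavior.tag for behavior in child)
        else:
            optional.append(child.tag)
    return required, optional

def feedback_for_failure(bml_text, failed_tag):
    """Hypothetical policy: a failed required behavior cancels the block
    (exception), any other failure only produces a warning."""
    required, _ = split_required(bml_text)
    return "exception" if failed_tag in required else "warning"

print(split_required(BML))
print(feedback_for_failure(BML, "head"))
```

With the block above, a failed gaze or speech maps to an exception, while a failed head nod maps to a warning.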

Unit of Action and Multiple Blocks

The specification does not dictate how much is placed in a single behavior block, and therefore what the granularity of action specification really is. This allows for the possibility that certain systems will be dealing with shorter spurts of behavior, while others prefer constructing elaborate performances and sending them to the behavior realizer in larger batches.

What happens when a behavior realizer receives a new behavior block while still processing the previous one? One possibility is to add a simple scheduling instruction to the <bml> tag as an attribute, telling the realizer to replace, interrupt or append with the new block (ISI proposed this). However, it has been argued that this starts to impose a higher level of scheduling on the BML specification that should be handled by the behavior scheduler itself. So for now, special scheduling attributes or tags can be implemented as extensions where needed.

Unique Reference IDs

All BML behavior and synchronization elements, including the top-level <bml>, must contain a unique reference id via the id="…" attribute. The value of this attribute can be used to refer to particular instances of BML elements, for example when synchronizing one behavior element with another. The id 'bml' is reserved.

<bml id="bml1">
   <gaze id="gaze1" target="AUDIENCE"/>
   <speech id="speech1" start="gaze1:ready"> Welcome ladies and gentlemen! </speech>
</bml>

The overhead of requiring these attributes is considered negligible compared to the benefit of precise feedback logs that include these identifiers. It is simple for automated BML generators to generate these attributes as well, if they are not already derived from other behaviors. For human-authored BML, we expect that message sending tools can dynamically introduce missing identifiers.

It is proposed that this ID attribute be of the standard XML type 'ID' and that any references to it be of the XML type 'IDREF'. These are described in the standard XML specification.
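
A tool that sends BML could check these rules before passing a block on. The following Python sketch is an assumption about how such a check might look: the helper names behaviors and check_ids are hypothetical, and only the top-level element and its behaviors (including those inside <required>) are inspected.

```python
import xml.etree.ElementTree as ET

def behaviors(root):
    """Yield the behavior elements of a block, looking inside <required>."""
    for child in root:
        if child.tag == "required":
            yield from child
        else:
            yield child

def check_ids(bml_text):
    """Return a list of human-readable problems with id attributes."""
    root = ET.fromstring(bml_text)
    seen, problems = set(), []
    for element in [root, *behaviors(root)]:
        element_id = element.get("id")
        if element_id is None:
            problems.append(f"<{element.tag}> has no id")
        elif element_id == "bml":
            problems.append(f"<{element.tag}> uses the reserved id 'bml'")
        elif element_id in seen:
            problems.append(f"duplicate id '{element_id}'")
        else:
            seen.add(element_id)
    return problems

GOOD = '<bml id="bml1"><gaze id="gaze1" target="AUDIENCE"/></bml>'
BAD = ('<bml id="bml"><gaze id="g1" target="A"/>'
       '<head id="g1" type="NOD"/><speech>hi</speech></bml>')

print(check_ids(GOOD))
print(check_ids(BAD))
```

The same traversal could be used to introduce missing identifiers instead of reporting them.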

Behaviors

Simple Behavior Element

A behavior element describes one kind of a behavior to the behavior realizer. In its simplest form, a behavior element is a single XML tag with a few key attributes:

<bml id="bml1">
   <gaze id="gaze1" target="PERSON1"/>
</bml>

This most compact form is called level 0 of description and is mandatory for all behaviors sent to a behavior realizer. The tag names and attributes are part of the core BML specification.

Levels of Description

BML allows for additional levels of description that go beyond the core BML behavior attributes at level 0 in describing the form of a behavior. Additional levels are embedded within a behavior element as child elements named description, with arbitrary content. The type attribute of the description element should identify the type of content, indicating how it should be interpreted. Even if additional levels are included in a behavior, the core attributes of the behavior element itself cannot be omitted, since level 0 of description is the default fallback.

<bml id="bml1">
   <gaze id="gaze1" target="PERSON1">
      <description level="1" type="RU.ACT">
         <target>PERSON1</target>
         <intensity>0.6</intensity>
         <lean>0.4</lean>
      </description>
      <description level="2" type="ISI.SBM">
         ...
      </description>
   </gaze>
</bml>

The Mandatory Level 0

All BML compliant behavior realizers have to guarantee that they can interpret a level 0 behavior description and display the corresponding behaviors. In those cases where a realizer only provides a special subset of BML, for example a talking head, that should be made very clear, and the behaviors not realized should produce an appropriate feedback message (see section on feedback). Realizers that can interpret any of the higher levels of description should make use of those instead. If a realizer is expecting a description of a certain level higher than 0 but does not receive a description at that level, it should default to the level 0 description.

The level 0 description will always stay well above the level of specific implementations. That is, an ideal level 0 description of a behavior should not reference specific animation files, audio files, or joint names. Behavior tags and attributes should preferably reference actions and body parts by their common verbs and nouns. This calls for a unified set of level 0 behavior description tokens.

Levels Beyond Level 0

Levels beyond level 0 can include existing representation languages such as SSML, ToBI, etc., or new languages can be created that make use of advanced realization capabilities. Each level should be a self-contained description of a behavior, because a behavior realizer may not know how to combine descriptions from multiple levels.

It is generally assumed that as the levels go higher, the level of description will become more complex and detailed. For example, levels 1 or 2 may simply repeat the attributes of level 0 but add a few more attributes for extra expressivity, whereas a level 3 or 4 might introduce a whole new set of parameters that drive a special kind of a detailed dynamic simulation.

If multiple levels of description are given and the realizer is capable of interpreting more than one, it is assumed that the realizer will use the highest possible level of description for realization.
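
The selection rule can be illustrated with a short Python sketch. It assumes that a realizer knows which description types it understands; the function name select_description and the supported_types parameter are hypothetical and not part of BML.

```python
import xml.etree.ElementTree as ET

GAZE = """
<gaze id="gaze1" target="PERSON1">
   <description level="1" type="RU.ACT">
      <target>PERSON1</target>
      <intensity>0.6</intensity>
   </description>
   <description level="2" type="ISI.SBM"/>
</gaze>
"""

def select_description(behavior, supported_types):
    """Pick the highest-level description the realizer understands.

    Returns (level, element). Level 0 means that the core attributes of
    the behavior element itself are used as the fallback.
    """
    best_level, best = 0, behavior
    for description in behavior.findall("description"):
        level = int(description.get("level", "0"))
        if description.get("type") in supported_types and level > best_level:
            best_level, best = level, description
    return best_level, best

gaze = ET.fromstring(GAZE)
print(select_description(gaze, {"RU.ACT"})[0])
print(select_description(gaze, {"RU.ACT", "ISI.SBM"})[0])
print(select_description(gaze, set())[0])
```

A realizer that understands only RU.ACT uses level 1, one that understands both types uses level 2, and one that understands neither falls back to level 0.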

The Core Behavior Element: Level 0 of Description

This mandatory level includes behavior elements with a certain token name along with a minimal set of descriptors as attributes, such as target objects, positions or orientations.

FIXME Note that the attributes for each of the behavior elements below are still subject to review from the sub-groups created at the Vienna meeting.

Ontology of Basic Symbols

To specify values for various types of behavior attributes, we propose a set of certain common symbols with well defined semantics.

Type of Attribute | Symbols | Comments
Direction | LEFT, RIGHT, UP, DOWN, UPRIGHT, UPLEFT, DOWNLEFT, DOWNRIGHT | Indicating a direction from a center

FIXME Propose more types for this table.

<head>

Movement of the head independent of eyes. Types include nodding, shaking, tossing and orienting to a given angle.

Attribute | Type | Use | Default | Description
type | Name | required | - | The category of head movement [NOD, SHAKE, TOSS, ORIENT]
NOD, SHAKE, TOSS
amount | float | optional | 0.0 | (NOD, SHAKE, TOSS) The extent of the movement, where 1.0 is fully extended and 0.0 is the least extended
repeats | int | optional | 1 | (NOD, SHAKE, TOSS) Number of times the basic head motion is repeated
ORIENT
target | WorldID | optional | - | (ORIENT) The world ID of the reference target
angle | Angle | optional | 0.0 | (ORIENT) Orients the head angle degrees in the specified direction from the current head orientation. If a target is also given, the orientation is relative to the orientation towards that target
direction | Direction | optional | RIGHT | (ORIENT) Direction of orientation angle [RIGHT, LEFT, UP, DOWN, ROLLRIGHT, ROLLLEFT]

<torso>

Movement of the orientation and shape of the spine and shoulder.

Attribute | Type | Use | Default | Description
posture | Name | required | - | The name of the posture to assume
transition | Name | optional | - | The name of the animated transition that gets played before final posture is assumed

<face>

Movement of facial muscles to form certain expressions. Types include eyebrow, eyelid and larger expressive mouth movements.

Attribute | Type | Use | Default | Description
type | Name | required | - | The part of the face being controlled [FACS, EYEBROWS, EYELIDS, MOUTH]
amount | float | optional | 0.5 | The amount of movement where 0.0 is the lowest (or closed) position and 1.0 is the highest (or open) position
side | Name | optional | BOTH | Which side of the face is being controlled [BOTH, LEFT, RIGHT]
FACS
au | int | optional | 0 | (FACS only) The Action Unit (AU) reference number for a Facial Action Coding System (FACS) expression
EYEBROWS
shape | Name | optional | FLAT | The shape given to the eyebrows [FLAT, POINTDOWN, POINTUP]
separation | float | optional | 0.5 | (EYEBROWS only) The horizontal distance of the eyebrows from the center of the forehead where 0.0 is the shortest distance and 1.0 the furthest distance
EYELIDS
lid | Name | optional | BOTH | (EYELIDS only) Whether both upper and lower eyelids are affected [BOTH, UPPER, LOWER]
MOUTH
shape | Name | optional | FLAT | The shape given to the mouth [FLAT, SMILE, LAUGH, PUCKER, FROWN]

<gaze>

Coordinated movement of the eyes, neck and head direction, indicating where the character is looking.

Attribute | Type | Use | Default | Description
target | ID | optional | - | The world ID of the reference target
angle | Angle | optional | 0.0 | Orients the gaze angle degrees in the specified direction from the current gaze orientation. If a target is also given, the orientation is relative to the orientation towards that target
direction | Direction | optional | RIGHT | Direction of orientation angle [RIGHT, LEFT, UP, DOWN, UPRIGHT, UPLEFT]

<body>

Full body movement, generally independent of the other behaviors. Types include overall orientation, position and posture.

Attribute | Type | Use | Default | Description
approach | ID | optional | - | The world ID of a target place or thing to approach prior to assuming an indicated posture
proximity | float | optional | 1.0 | How close to approach the target, where 1.0 is “typical” distance for that target (defined elsewhere), and 0.0 is up against the target
face | WorldID | optional | - | The world ID of a reference target for final facing
angle | Angle | optional | 0.0 | The offset angle of final facing, where 0.0 fully faces the reference target
posture | Name | optional | - | The name of the posture to assume
transition | Name | optional | - | The name of the animated transition that gets played before final posture is assumed

<legs>

Movements of the body elements downward from the hip: pelvis, hip, legs including knee, toes and ankle.

Attribute | Type | Use | Default | Description
posture | Name | required | - | The name of the posture to assume
transition | Name | optional | - | The name of the animated transition that gets played before final posture is assumed

<gesture>

Coordinated movement with arms and hands, including pointing, reaching, emphasizing (beating), depicting and signaling.

Attribute | Type | Use | Description
type | Name | required | The category of gesture movement [POINT, REACH, BEAT, DEPICT, SIGNAL]
name | Name | optional | The name of a gesture needed for a DEPICT or a SIGNAL gesture
target | ID | optional | The world ID of a reference target for POINT and REACH gestures

FIXME An interesting question raised by Herwin: What if I want to point/beat using whatever body part is available (head, hands, feet)? This would probably not fall under gesture as it is defined here, but where would we put it?


<speech>

Verbal and paraverbal behavior, including the words to be spoken (for example by a speech synthesizer), prosody information and special paralinguistic behaviors (for example filled pauses).

Attribute | Type | Use | Default | Description
type | string | optional | text/plain | MIME type or other string type identifying the type of contents or referred object
ref | Name | optional | - | Refers to speech data if not contained within the speech element
text | String | optional | - | Unprocessed element to promote legibility with external or encoded types

Unlike other behavior elements, <speech> can contain text and other elements, depending on the value of the type attribute. Alternatively, it can refer to external data such as audio files or utterance IDs. In some cases, this data can introduce new time reference points beyond start, ready, stroke, relax, and end. For instance:

<speech id="s1" type="application/ssml+xml">
  Allows <mark name="wb1"/>
  word   <mark name="wb2"/>
  break  <mark name="wb3"/>
  references.
</speech>

Any BML processor recognizing W3C's SSML should successfully process this behavior and allow following behaviors to refer to the word break time marks using the time marker notation s1:wb1, s1:wb2, or s1:wb3.

When the ref attribute is specified, speech behaviors may also specify times for external resources that don't have their own time markers, using the <tm> element:

<tm> Time Marker

Attribute | Type | Use | Default | Description
id | ID | required | - | The identifier of the generated synch point
time | float | required | - | Time in seconds, relative to the external resource

For example:

<speech id="s1" start="0.0" type="audio/x-wav" ref="utterance1.wav" text="this is very nice">
  <tm id="tm1" time="0.1" /> <!-- This is -->
  <tm id="tm2" time="1.1" /> <!-- very nice -->
</speech>
<gesture id="g1" stroke="s1:tm2" type="BEAT"/>
<head id="h1" stroke="g1:stroke" type="NOD"/>
<gaze id="l1" ready="s1:tm1" relax="s1:tm2" target="book1"/>

In this example, a pre-recorded audio file is played back, with speech beginning at time 0.1. The timings of some of the words have also been extracted and specified as time markers (possibly through a special tool). The beat gesture is told to strike on the second word group. The stroke of the head nod, however, is timed explicitly to occur exactly on the stroke of the beat gesture. While the first two words are uttered, the gaze is turned towards an object called book1.
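
As a rough illustration, a realizer might turn the <tm> elements into a table of named synch points before scheduling the other behaviors. The Python sketch below assumes that the speech start attribute is a plain number of seconds; the function name time_marker_table is hypothetical.

```python
import xml.etree.ElementTree as ET

SPEECH = """
<speech id="s1" start="0.0" type="audio/x-wav" ref="utterance1.wav"
        text="this is very nice">
  <tm id="tm1" time="0.1"/>
  <tm id="tm2" time="1.1"/>
</speech>
"""

def time_marker_table(speech_text):
    """Map 'speech_id:tm_id' to a time in seconds from the block start.

    Each <tm> time is relative to the external resource, so the start
    time of the speech behavior is added to it.
    """
    speech = ET.fromstring(speech_text)
    start = float(speech.get("start", "0.0"))
    speech_id = speech.get("id")
    return {
        f"{speech_id}:{tm.get('id')}": start + float(tm.get("time"))
        for tm in speech.findall("tm")
    }

print(time_marker_table(SPEECH))
```

Other behaviors that reference s1:tm1 or s1:tm2 can then be resolved against this table.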


<lips>

This element is used for controlling lip shapes including the visualization of phonemes for audiovisual speech.

Attribute | Type | Use | Default | Description
viseme | Name | required | - | The name of a viseme to be displayed. It will blend with any expression specified in the FACE element
articulation | float | optional | 0.5 | The extent to which visemes are clearly articulated, where 0.0 represents sloppy and 1.0 represents hyper articulation
flapping | boolean | optional | false | If true, keeps the mouth oscillating between the viseme and the closed position

<wait>

Defines a pause or delay that other behaviors can reference.

Attribute | Type | Use | Default | Description
duration | float | optional | - | Delay or event timeout in seconds
event | REFID | optional | - | Event to wait for (see Events section)
no-event | string | optional | - | Action to take if event is specified and timeout is exceeded. See details below

Wait behaviors describe the act of waiting for a time or event in a communicative act. Valid wait behaviors require either the duration or event to be specified. If both are specified, the duration describes a timeout for event listening.

If the timeout is exceeded, the no-event attribute is processed. The attribute can have three forms. In the form “FAIL”, the entire act aborts without notification. In the form “FAIL: {event declaration}”, the act aborts after sending a specified event. In the form “MESSAGE: {event declaration}”, a specified message is emitted to notify of the event failure, but the act continues as if the event was received.

If the timeout is exceeded and the attribute no-event is not specified, the behavior continues normally as if the event occurred.
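
The three forms, plus the case where no-event is absent, could be handled along the following lines. This Python sketch is not a definitive implementation: the syntax of an event declaration is not defined here, so it is treated as an opaque string, and the names parse_no_event, on_timeout and the CONTINUE label are hypothetical.

```python
def parse_no_event(value):
    """Split a no-event attribute value into (action, event_declaration)."""
    if value is None:
        # Attribute not specified: continue as if the event occurred.
        return ("CONTINUE", None)
    action, _, declaration = value.partition(":")
    action = action.strip()
    declaration = declaration.strip() or None
    if action not in ("FAIL", "MESSAGE"):
        raise ValueError(f"unknown no-event form: {value!r}")
    if action == "MESSAGE" and declaration is None:
        raise ValueError("MESSAGE requires an event declaration")
    return (action, declaration)

def on_timeout(value):
    """Describe what happens when the wait timeout is exceeded."""
    action, declaration = parse_no_event(value)
    return {"abort": action == "FAIL", "emit": declaration}

print(on_timeout(None))
print(on_timeout("FAIL"))
print(on_timeout("FAIL: timeout1"))
print(on_timeout("MESSAGE: late1"))
```

Only the two FAIL forms abort the act; the MESSAGE form and the unspecified case both let it continue.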

New Behaviors

The core BML behavior elements are by no means comprehensive, as much of the ongoing work behind BML involves identifying and defining a broad and flexible library of behaviors. Implementors are encouraged to explore new behavior elements and specialized attributes when making use of BML. However, we request that those experimental components that cannot be embedded within a special level of description, be identified as non-standard BML by utilizing XML namespaces to prefix the elements and attributes.

The following example utilizes customized behaviors from the SmartBody project. Here, we use the namespace sbm (short for SmartBody Module):

<bml>
   <sbm:animation name="CrossedArms_RArm_beat"/>
   <gaze   target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/>
</bml>
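
A consumer of BML can use the namespace prefix to find non-standard components. The Python sketch below assumes that the prefix is bound with an xmlns declaration; the URI http://example.org/sbm is a placeholder chosen for this illustration, not the real SmartBody namespace, and the function name extensions is hypothetical.

```python
import xml.etree.ElementTree as ET

BML = """
<bml xmlns:sbm="http://example.org/sbm">
   <sbm:animation name="CrossedArms_RArm_beat"/>
   <gaze target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/>
</bml>
"""

def extensions(bml_text):
    """List namespaced (non-standard) elements and attributes in a block."""
    root = ET.fromstring(bml_text)
    found = []
    for element in root.iter():
        # ElementTree writes qualified names as '{namespace-uri}local-name'.
        if element.tag.startswith("{"):
            found.append(("element", element.tag))
        for name in element.attrib:
            if name.startswith("{"):
                found.append(("attribute", name))
    return found

for kind, name in extensions(BML):
    print(kind, name)
```

A realizer that does not know the namespace can skip these components and still realize the core gaze behavior.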

Synchronization

Synch Points

A synchronization point or synch point is a point in time that can be shared between two or more behaviors in an effort to synchronize their realization. There are different kinds of synch points, providing different opportunities for synchronization.

Standard Behavior Synch Points

Every behavior is broken down into six phases of realization. Each phase is bounded by a synch point that carries the name of the transition it represents, making it relatively straightforward to align behaviors at meaningful boundaries. The seven synch points are: start, ready, stroke-start, stroke, stroke-end, relax and end.

  • The behavior's preparatory motion occurs between start and ready, and the retraction back to a neutral or previous state occurs between relax and end.
  • The actual behavior takes place between ready and relax, with the most effortful part occurring between stroke-start and stroke-end.
  • The stroke can also be referenced as a point of maximal effort via the stroke time reference, such as in a beat gesture or nod.
  • If the behavior does not involve a single moment of greatest effort, the stroke time reference assumes the same time as stroke-start.
  • Separation of ready and stroke-start allows an anticipation hold in gesture space, just as the separation of stroke-end and relax allows a hold for emphasis or continuation.
  • If no hold is declared or possible, ready is assumed to coincide with stroke-start, and stroke-end should coincide with relax. Similarly, if there is no preparatory movement into gesture space, start will coincide with ready, and relax will coincide with end. For example, in a gaze behavior, setting ready and setting stroke-start should result in the same timing for making eye contact, while setting either stroke-end or relax will declare the time of breaking eye contact.

[Figure: bml_doctor_stroke2.jpg]
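
The coincidence rules listed above can be sketched in Python as follows. This is a deliberate simplification that treats every unspecified synch point as "no hold, no preparatory movement, no single moment of greatest effort"; a real realizer would also use the natural durations of the behavior. The names COINCIDES, fill_defaults and is_ordered are hypothetical.

```python
SYNCH_POINTS = ["start", "ready", "stroke-start", "stroke",
                "stroke-end", "relax", "end"]

# (missing point, point it is assumed to coincide with)
COINCIDES = [
    ("ready", "stroke-start"), ("stroke-start", "ready"),  # no hold
    ("stroke-end", "relax"), ("relax", "stroke-end"),      # no hold
    ("start", "ready"), ("ready", "start"),                # no preparation
    ("end", "relax"), ("relax", "end"),                    # no retraction
    ("stroke", "stroke-start"),                            # no single peak
]

def fill_defaults(times):
    """Fill in unspecified synch points from the ones that are given."""
    filled = dict(times)
    changed = True
    while changed:
        changed = False
        for missing, source in COINCIDES:
            if missing not in filled and source in filled:
                filled[missing] = filled[source]
                changed = True
    return filled

def is_ordered(times):
    """Check that the known synch points do not go backwards in time."""
    known = [times[name] for name in SYNCH_POINTS if name in times]
    return all(a <= b for a, b in zip(known, known[1:]))

gaze = fill_defaults({"ready": 1.0, "relax": 3.0})
print(gaze)
print(is_ordered(gaze))
```

For the gaze example, eye contact is made at 1.0 (start, ready, stroke-start and stroke) and broken at 3.0 (stroke-end, relax and end).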

Event Synch Points

Any BML event (see below) can serve as a synch point. The ID of the event is then used as the name of the synch point when referring to it. It is therefore possible to have more complex behaviors, for example those described at levels beyond 0, emit various events to enable alignment with other basic behaviors.

New Synch Points

New synch points can be introduced. For example the <mark> tag from SSML is used to create synch points for the speech behavior as seen in the example above. Another example from the speech discussion shows the use of a special <tm> tag that provides a synch point at an arbitrary point in time.

When new synch points are introduced for a behavior, it is assumed that start and end will still refer to the first and last synch point for that behavior.

Special Synch Points

Each BML request also has two implicit synch points, bml:start and bml:end, identifying the start of the earliest behavior and the end of the latest behavior.

Aligning to bml:start and bml:end requires special precautions. If there is no offset specified, only start synch points can be aligned to bml:start, and only end synch points can be aligned to bml:end. If there is an offset specified, it must be positive when referring to bml:start and negative when referring to bml:end. These constraints ensure that bml:start and bml:end actually mark the proper start and end points of the behavior set.
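
These precautions amount to a small validity check. The Python sketch below is only one possible reading of the rules; the function name check_block_reference and its parameters are hypothetical.

```python
def check_block_reference(attribute, target, offset=0.0):
    """Return True if aligning `attribute` to `target` is allowed.

    attribute: the synch point being set, for example "start" or "relax"
    target:    "bml:start" or "bml:end" (anything else is not restricted)
    offset:    offset in seconds, 0.0 if none is given
    """
    if target == "bml:start":
        # Without an offset only start points may align; otherwise the
        # offset has to move the point later in time.
        return attribute == "start" if offset == 0 else offset > 0
    if target == "bml:end":
        # Without an offset only end points may align; otherwise the
        # offset has to move the point earlier in time.
        return attribute == "end" if offset == 0 else offset < 0
    return True

print(check_block_reference("start", "bml:start"))
print(check_block_reference("stroke", "bml:start"))
print(check_block_reference("stroke", "bml:start", 0.5))
print(check_block_reference("relax", "bml:end", 0.2))
```

Aligning a stroke directly to bml:start is rejected, while the same stroke with a positive offset is accepted.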

Synching up to Synch Points

Each phase of a behavior can be scheduled relative to any synch point. This is done with the seven optional XML attributes named after the behavior's own synch points: start, ready, stroke-start, stroke, stroke-end, relax, and end. The attribute value may reference another synch point with an optional offset in seconds:

Synch attribute syntax

Standard: source_id:synch_id
With offset: source_id:synch_id + offset
             source_id:synch_id - offset
Shorthand: offset (equivalent to bml:start + offset)

Where…

  • source_id is the ID or name assigned to the owner element of the synch point (or bml when referring to bml:start or bml:end). For example, this would be the ID of another behavior element when referring to that behavior's end synch point.
  • synch_id is the standard name of the behavior's synch point.
  • offset is a time in seconds to offset the alignment.

FIXME Add a formal grammar for the attribute value, inclusive of whitespace.

<!-- Timing example behaviors -->
<gaze start="0.3" end="2.14" /><!-- absolute timing in seconds -->
<gaze stroke="other:stroke" /><!-- relative to another behavior -->
<gaze ready="other:relax + 1.1" /><!-- relative with offset -->
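
Since no formal grammar has been fixed yet, the following Python sketch shows only one possible way to parse these attribute values. The regular expression is an assumption: it allows hyphens inside identifiers only when followed by a letter, so that stroke-end stays a name while "- 0.5" is read as an offset. The names SYNCH_REF and parse_synch_value are hypothetical.

```python
import re

NAME = r"[A-Za-z_]\w*(?:-[A-Za-z_]\w*)*"
NUMBER = r"\d+(?:\.\d+)?"

SYNCH_REF = re.compile(
    rf"""^\s*(?:
            (?P<source>{NAME})\s*:\s*(?P<synch>{NAME})
            (?:\s*(?P<sign>[+-])\s*(?P<offset>{NUMBER}))?
          |
            (?P<shorthand>{NUMBER})
         )\s*$""",
    re.VERBOSE,
)

def parse_synch_value(value):
    """Return (source_id, synch_id, offset_in_seconds) for an attribute."""
    match = SYNCH_REF.match(value)
    if match is None:
        raise ValueError(f"not a synch reference: {value!r}")
    if match.group("shorthand") is not None:
        # A bare offset is shorthand for bml:start + offset.
        return ("bml", "start", float(match.group("shorthand")))
    offset = float(match.group("offset") or 0.0)
    if match.group("sign") == "-":
        offset = -offset
    return (match.group("source"), match.group("synch"), offset)

for text in ["0.3", "other:stroke", "other:relax + 1.1", "g1:stroke-end-0.5"]:
    print(text, "->", parse_synch_value(text))
```

Values that match neither form raise a ValueError, which a sending tool could turn into a message for the author.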

Underspecified Timing Constraints

We are currently considering support for vague / underspecified timing constraints by using the predicates before(..) and after(..). The before(..) predicate indicates that a behavior synch point should occur at or before another synch point. Similarly, the after(..) predicate indicates that a behavior synch point should occur at or after another synch point.

<gaze ready="after( other:stroke )" /><!-- timing with predicates -->

The <synchronize> Element

While normally most synchronization between behaviors can be specified through attributes in the behavior elements themselves, there are cases where synchronization through a special external element might be useful:

  • If the elements that one wants to synchronize to a synch point don't provide synchronization attributes, for example if they are a part of a higher level of description or otherwise embedded (such as SSML).
  • If one wants to specify some tolerance range for a synchronization operation.
  • If one wants to specify a certain priority for a particular synchronization operation.

This additional functionality is supported through a <synchronize> element that stands as a sibling to the behavior elements inside a <bml> block.

FIXME We need the exact syntax here with an example.

<synchronize> element

Attribute | Type | Use | Default | Description
ref-sync | sync-point expression | required | - | The sync-point to which this constraint is relative
constraint | lexicalized | required | - | at, after, before, or extensions via namespaced ids

<sync> element

Attribute | Type | Use | Default | Description
dest | sync-point id | required | - | The sync-point to which this constraint is applied

Events

Information about the occurrence of events is carried inside special event BML elements.

Attribute | Type | Use | Default | Description
type | Name | Required | - | BEHAVIOR: Describe the character itself; WORLD: Describe things happening outside of character
source | ID | Optional | ID of emitter | The ID of emitter, behavior, synch point or entity that caused the event
time | Time | Optional | Time of emission | Time stamp in absolute global time

Events can be emitted from within a BML block with a special emission behavior element. Here an event is emitted once the gesture has reached its stroke:

<bml id="bml1">
   <gesture id="g1" type="POINT" target="chair1"/>
   <emit id="emitter1" start="g1:stroke">
      <event id="event1" type="behavior">
         Optional information.
      </event>
   </emit>  
</bml>

Once emitted, the event portion is sent back to the behavior planner from the behavior realizer:

<bml id="bml2">
   <event id="trigger1" source="bml1:emitter1" type="behavior">
      Optional information.
   </event>
</bml>

The behavior realizer can also generate events autonomously to report on behavior progress. For example, a realizer can be configured to send events for every synch point reached, in which case the source attribute would indicate exactly what synch point caused the event.

The realizer will always generate an event when a block of BML behaviors has finished executing:

<bml id="bml2">
   <event id="finished1" source="bml1" type="behavior" time="123.25">
      Finished.
   </event>
</bml>
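
On the realizer side, such a message could be assembled as in the Python sketch below. The function name finished_event and its parameters are hypothetical, and formatting the time with two decimals is an assumption made only to reproduce the value shown above.

```python
import xml.etree.ElementTree as ET

def finished_event(block_id, event_id, time, feedback_block_id):
    """Build the feedback block reporting that `block_id` has finished."""
    bml = ET.Element("bml", {"id": feedback_block_id})
    event = ET.SubElement(bml, "event", {
        "id": event_id,
        "source": block_id,      # the block that finished executing
        "type": "behavior",
        "time": f"{time:.2f}",   # absolute global time
    })
    event.text = "Finished."
    return ET.tostring(bml, encoding="unicode")

message = finished_event("bml1", "finished1", 123.25, "bml2")
print(message)

# The behavior planner can read the same fields back:
event = ET.fromstring(message).find("event")
print(event.get("source"), event.get("time"))
```

The planner uses the source attribute to tell which block, behavior or synch point the event refers to.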

Feedback

BML provides tags for feedback information from the behavior realizer back to the behavior planner. There are three general kinds of feedback messages.

  1. <event> (see above)
    • Automatically generated when sync points are passed or end of behavior blocks reached.
    • Always states a source ID and the absolute global time.
  2. <exception>
    • Automatically thrown when a behavior or an entire behavior block is canceled or interrupted.
    • Implies that some behavior may not have happened at all.
    • May contain children elements to specify the problem/cause in detail.
  3. <warning>
    • Automatically thrown when a non-fatal synch problem occurred or some part of a behavior description could not be realized as requested.
    • Implies that the requested behavior happened, but perhaps with some modification or fall-back.

Other kinds of information from the behavior realizer have been suggested, such as timing information or information on available body parts before actual execution starts. These could possibly be sent back to the behavior planner as responses to special queries. This is not yet part of the specification.

 
projects/bml/main.txt · Last modified: 2009/09/18 10:59 by herwin
 