NOTE
This page contains a BML specification that will not become official until we move it to the main page. This page is still undergoing changes!
Working Groups (Updated June 2nd 2008)
The Behavior Markup Language, or BML, is a framework and an XML description language for controlling the communicative human behavior of embodied conversational agents.
BML assumes a subset of the SAIBA Multimodal Behavior Generation Framework. Specifically, BML requires a behavior planner and a behavior realizer.
Block Diagram
The behavior planner is responsible for planning and generating the BML message. The exact nature of this planning is unspecified, and may be as simple as utilizing a library of BML messages authored offline.
The behavior realizer is responsible for “realizing” the specified behaviors as a set of character actions, such as speech and animation. The behavior realizer is also responsible for feedback messages informing the planner on the status of progress, warnings, and exceptions.
BML does not specify a specific message transport. Different architectures have drastically different notions of a message. A message may come in the form of a string, an XML document or DOM, a message object, or just a function call. Similarly, identifiers in the message contents might be strings, objects / structs, or memory pointers.
To facilitate portability between platforms, the transport should adhere to the following requirements:
BML 1.0 specifies seven different types of messages:
From the planner to the realizer:
Feedback from the realizer to the planner:
Feedback from the realizer to any component, including itself:
A BML performance request must include at least:
… id attribute of the <bml> block)
The specification does not dictate how much is placed in a single behavior block, and therefore what the granularity of action specification really is. This allows for the possibility that some systems will deal with shorter spurts of behavior (e.g., speech clauses or individual gaze shifts), while others prefer constructing elaborate performances and sending them to the behavior realizer in larger batches (e.g., entire monologues).
Detail what happens when a realizer receives new performance requests before others are complete. One possibility is to add a simple scheduling instruction to the <bml> tag as an attribute, telling the realizer to replace, interrupt or append with the new block (ISI proposed this). However, it has been argued that this starts to impose a higher level of scheduling on the BML specification that should be handled by the behavior planner itself. So for now, special scheduling attributes or tags can be implemented as extensions where needed.
A draft version of the schema is available here
All BML behaviors need to belong to a behavior block. A behavior block is formed by placing one or more BML behavior elements inside a top-level <bml> element. Unless synchronization is specified (see section on Synchronization), it is assumed that all behaviors in a behavior block start at the same time upon arrival in the behavior realizer.
<bml> <gaze target="PERSON1"/> <speech><text>Welcome to my humble abode</text></speech> </bml>
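As a small illustration of the block structure, the following Python sketch parses the block above with the standard library and lists the behavior elements it contains. The helper name `behaviors_in_block` is an assumption made for this example and is not part of BML.

```python
import xml.etree.ElementTree as ET

BLOCK = """
<bml>
  <gaze target="PERSON1"/>
  <speech><text>Welcome to my humble abode</text></speech>
</bml>
"""

def behaviors_in_block(bml_text):
    """Return the tag names of the elements placed directly inside a <bml> block."""
    root = ET.fromstring(bml_text)
    if root.tag != "bml":
        raise ValueError("behaviors must be enclosed in a top-level <bml> element")
    return [child.tag for child in root]

print(behaviors_in_block(BLOCK))
```

With no synchronization attributes present, a realizer would treat both listed behaviors as starting together on arrival.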
It is generally assumed that the behavior realizer will attempt to realize all behaviors in a block, and even if some of the behaviors don't successfully complete for some reason, other behaviors still get carried out. If there is an all-or-nothing requirement for all or some of the behaviors, they can be enclosed in a <required> block inside the <bml> block.
In the following example, the entire performance in the BML block will be aborted if either the gaze or the speech behavior is unsuccessful (and an exception message sent back from the behavior realizer, with the abort flag on), but if only the head nod is unsuccessful, the rest will be carried out regardless (and an exception message sent back from the behavior realizer, with the abort flag off).
<bml> <required> <gaze target="PERSON1"/> <speech><text>Welcome to my humble abode</text></speech> </required> <head type="NOD"/> </bml>
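The all-or-nothing rule can be sketched in Python as follows. This is only an illustration: the function name `failure_aborts_block` is hypothetical, and the `id` attributes were added to the example so that a failed behavior can be named.

```python
import xml.etree.ElementTree as ET

PERFORMANCE = """
<bml id="bml1">
  <required>
    <gaze id="gaze1" target="PERSON1"/>
    <speech id="speech1"><text>Welcome to my humble abode</text></speech>
  </required>
  <head id="nod1" type="NOD"/>
</bml>
"""

def failure_aborts_block(bml_text, failed_id):
    """Return True if a failure of behavior `failed_id` should abort the whole block."""
    root = ET.fromstring(bml_text)
    for required in root.findall("required"):
        # Any behavior enclosed in <required> makes its failure fatal for the block.
        if any(behavior.get("id") == failed_id for behavior in required):
            return True
    return False
```

A realizer following this sketch would set the abort flag on the exception message for `gaze1` or `speech1`, and leave it off for `nod1`.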
All behaviors, <constraint> blocks, and the <bml> block itself must contain a unique reference id via the id attribute. The value of this attribute can be used to refer to particular instances of BML elements, for example when synchronizing one behavior element with another or delivering feedback. The id 'bml' is reserved.
<bml id="bml1"> <gaze id="gaze1" target="AUDIENCE"/> <speech id="speech1" start="gaze1:ready"><text>Welcome ladies and gentlemen!</text></speech> </bml>
It is proposed that this ID attribute be of the standard XML type 'ID' and that any references to it be of the XML type 'IDREF'. These are described in the standard XML specification, including the Name production grammar.
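A planner could check these id rules before sending a request. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, behaviors are taken to be the direct children of <bml> (or of a <required> group), and the check does not validate ids against the XML Name production.

```python
import xml.etree.ElementTree as ET

RESERVED_ID = "bml"

def elements_needing_ids(root):
    """Yield the <bml> block, its behaviors, and its <constraint> blocks."""
    yield root
    for child in root:
        if child.tag == "required":
            yield from child  # behaviors grouped inside <required>
        else:
            yield child

def check_ids(bml_text):
    """Return a list of human-readable problems with the id attributes."""
    problems = []
    seen = set()
    for element in elements_needing_ids(ET.fromstring(bml_text)):
        element_id = element.get("id")
        if element_id is None:
            problems.append(f"<{element.tag}> has no id")
        elif element_id == RESERVED_ID:
            problems.append(f"<{element.tag}> uses the reserved id 'bml'")
        elif element_id in seen:
            problems.append(f"duplicate id '{element_id}'")
        else:
            seen.add(element_id)
    return problems
```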
A behavior element describes one kind of a behavior to the behavior realizer. In its simplest form, a behavior element is a single XML tag with a few key attributes:
<bml id="bml1"> <gaze id="gaze1" target="PERSON1"/> </bml>
This most compact form is called core BML, and is considered to be the lowest priority of description. It is mandatory for all behaviors sent to a behavior realizer. The tag names and attributes are part of the core BML specification.
All BML-compliant behavior realizers have to guarantee that they can interpret the core BML specification and display the corresponding behaviors. In those cases where a realizer only provides a special subset of BML, for example a talking head, that should be made very clear, and the behaviors not realized should produce an appropriate feedback message (see section on feedback). Realizers that can interpret any of the higher priorities of description should make use of those instead.
The core BML description will always stay well above the level of specific implementations. That is, an ideal core BML description of a behavior should not reference specific animation files, audio files, or joint names. Behavior tags and attributes should preferably reference actions and body parts by their common verbs and nouns. This calls for a unified set of core BML behavior description tokens.
BML allows for additional behavior descriptions that go beyond the core BML behavior specification in describing the form of a behavior. Additional descriptions are embedded within a behavior element as children elements of the type description. The type attribute of the description element should identify the type of content, indicating how it should be interpreted. Even if additional descriptions are included in a behavior, the core attributes of the behavior element itself cannot be omitted since the core specification is always the default fallback.
<bml id="bml1">
  <gaze id="gaze1" target="PERSON1">
    <description priority="1" type="RU.ACT">
      <target>PERSON1</target>
      <intensity>0.6</intensity>
      <lean>0.4</lean>
    </description>
    <description priority="2" type="ISI.SBM">
      ...
    </description>
  </gaze>
</bml>
Description elements in BML can include existing representation languages such as SSML, ToBI, etc., or new languages can be created that make use of advanced realization capabilities. Each description element should be a self-contained description of a behavior because a behavior realizer may not know how to combine multiple behavior descriptions.
If a realizer does not know how to interpret the available description types, it should default to the core behavior.
If multiple description elements are given, and a realizer is capable of interpreting more than one, the realizer should use the highest priority description.
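The fallback and priority rules might be implemented along the following lines. This is a sketch, not a normative algorithm: the function name is hypothetical, and because the text does not say whether a larger or smaller priority number ranks higher, the ordering is left as a parameter (defaulting, as an assumption, to larger numbers winning).

```python
import xml.etree.ElementTree as ET

GAZE = ET.fromstring("""
<gaze id="gaze1" target="PERSON1">
  <description priority="1" type="RU.ACT"/>
  <description priority="2" type="ISI.SBM"/>
</gaze>
""")

def choose_description(behavior, supported_types, higher_number_wins=True):
    """Return the preferred <description> the realizer understands.

    Returns None when no description type is supported, meaning the
    realizer should fall back to the core attributes of the behavior.
    """
    usable = [d for d in behavior.findall("description")
              if d.get("type") in supported_types]
    if not usable:
        return None

    def priority(description):
        return float(description.get("priority", "0"))

    return max(usable, key=priority) if higher_number_wins else min(usable, key=priority)
```

For example, a realizer that only understands `RU.ACT` would pick that description, while one that understands neither type would receive `None` and realize the core gaze behavior.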
BML 1.0 Core includes behavior elements with a minimal set of descriptors as attributes, such as target objects, positions or orientations.
Note that the attributes for each of the behavior elements below are currently under review (it now includes changes from the June 2007 meeting in Paris)
Values for various types of behavior attributes can be one of the following:
This element is used for controlling lip shapes including the visualization of phonemes for audiovisual speech.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| viseme | name | required | | The name of a viseme to be displayed. It will blend with any expression specified in the FACE element |
| articulation | float | optional | 0.5 | The extent to which visemes are clearly articulated, where 0.0 represents sloppy articulation and 1.0 represents hyperarticulation |
| flapping | boolean | optional | false | If true, keeps the mouth oscillating between the viseme and the closed position |
Schedules a delay terminated by one or more conditions.
A wait behavior describes a delay without describing other character behavior. While it does not directly describe any active behavior, a scheduled wait behavior that does not overlap with other behaviors might be considered a performance pause. Other behaviors can attempt to synchronize to the wait behavior's stroke to start (or continue) at the end of the delay.
The <wait> element has one optional attribute to limit the duration:
| Attribute | Type | Use | Description |
|---|---|---|---|
| max-wait | float | optional | Maximum delay of the wait behavior |
A wait behavior may also describe its maximum duration using synchronization with other behaviors via the end sync-point. In other words, if the sync-point bound to the wait behavior's end occurs before its condition, the wait behavior is presumed to have timed out (see below).
(Anm: What about the other sync-points? What does it mean to sync to ready/stroke_start/stroke/stroke_end/relax, or is it even valid? Does the meaning change if the sync point is before or after stroke?)
A wait behavior has two optional components described as sub-elements. The first sub-element is the <condition>. The <condition> element does not have any attributes of itself, but identifies another sub-element as the description of the condition to wait for.
Some wait behaviors may use conditions that cannot be scheduled a priori, such as those tied to external stimuli (e.g., events or messages). Since this synchronization is reactive, its precision may be significantly worse than the synchronization between behaviors with known durations and stroke schedules. This is especially true if the referenced sync-point requires some preparatory action (e.g., syncing to a beat gesture's stroke, rather than its stroke-start).
(Anm: Can a single wait behavior have more than one condition? Can this be done by including more than one element in the <condition>, effectively acting as an AND group? And if so, do we want to consider OR group?)
TODO: List/Detail BML 1.0 wait conditions.
If a wait behavior lacks any conditions, it should be considered unsatisfiable and only end via the maximum wait duration (i.e., max-wait attribute) or end sync-point. A wait behavior without any condition, max-wait attribute, or end sync-point attribute may be considered invalid.
The second sub-element of the wait behavior is the <timeout-actions> element, which is only valid if the max-wait attribute or end sync-point attribute was specified in the <wait> element. Timeout actions describe what the behavior should do when an expected condition has not occurred.
Within the <timeout-actions> element, sub-elements describe any additional action to take during timeout. For instance, a behavior may fail using a <fail> element, leading pending behaviors with bound sync-points to also fail or stop processing. Additionally, using the <emit> element, the BML processor may emit an additional message to notify other modules. If the <timeout-actions> element does not contain a <fail> action, other pending behaviors or sync-points should continue as if the condition was satisfied.
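The validity and timeout rules described above can be summarized in a short Python sketch. The function names are hypothetical, and the sketch assumes the time limit is expressed either as a `max-wait` attribute or as an `end` sync attribute on the <wait> element.

```python
import xml.etree.ElementTree as ET

def check_wait(wait):
    """Return a list of problems with a <wait> element."""
    has_condition = wait.find("condition") is not None
    has_limit = wait.get("max-wait") is not None or wait.get("end") is not None
    has_timeout_actions = wait.find("timeout-actions") is not None
    problems = []
    if not has_condition and not has_limit:
        problems.append("no condition, max-wait, or end sync-point: the wait can never end")
    if has_timeout_actions and not has_limit:
        problems.append("<timeout-actions> requires max-wait or an end sync-point")
    return problems

def timeout_outcome(wait):
    """Return 'fail' or 'continue' for a wait whose limit passed before its condition."""
    actions = wait.find("timeout-actions")
    if actions is not None and actions.find("fail") is not None:
        return "fail"      # dependent behaviors fail or stop processing
    return "continue"      # proceed as if the condition had been satisfied
```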
TODO: List/Detail BML 1.0 timeout actions.
TODO: Examples with explanations, ideally showing the four variants with/without conditions and timeout-actions
<wait id="b1_wait_failed" max-wait="4">
  <condition>
    <event-match>
      <id>optional_event_identifier</id>
      <content regex="true">hell( n)?o</content>
    </event-match>
  </condition>
  <timeout-actions>
    <emit><!-- Just like behavior, but no sync-points -->
      <event id="b1_wait_failed">Optional message text</event>
    </emit>
  </timeout-actions>
</wait>
Container elements and sub-elements are used to keep the <wait> behavior extensible into new domains. The <condition> and <timeout-actions> container elements identify the roles of sub-elements, both to BML authors and processors. Extensions to either role should occur as namespaced sub-elements.
Emits a message / event at the stroke sync-point.
<emit> Element
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| source | ID | Required | Emitter ID | The ID of the emitter that caused the event |
<event> Element
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| source | ID | Required | Event ID | The ID of the event emitted, used to match <wait> event attributes |
Example:
<bml id="bml1"> <gesture id="g1" type="POINT" target="chair1"/> <emit id="emitter1" stroke="g1:stroke"> <event id="event1"> Optional information. </event> </emit> </bml>
Upon emitter1:stroke, the realizer should broadcast an event message containing the entire <event> element. If a <wait> behavior with an event attribute matching the <event> id is active, pending sync-points may be invoked. The <emit> and the <wait> characters do not have to match, allowing limited multi-party coordination.
Every event message must include:
<event> element (as a string, XML document, or XML DOM Element)
Optionally, implementations may find it useful to include a separate field for the event identifier, as specified in the <event> element.
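One possible shape for such an event message, assuming a transport that carries dictionaries, is sketched below. The dictionary keys and the function name are assumptions made for this example; BML does not define them.

```python
import copy
import xml.etree.ElementTree as ET

REQUEST = """
<bml id="bml1">
  <gesture id="g1" type="POINT" target="chair1"/>
  <emit id="emitter1" stroke="g1:stroke">
    <event id="event1"> Optional information. </event>
  </emit>
</bml>
"""

def event_message(emit):
    """Build the message a realizer might broadcast for one <emit> behavior."""
    event = emit.find("event")
    if event is None:
        raise ValueError("<emit> has no <event> child")
    detached = copy.copy(event)
    detached.tail = None  # drop whitespace that followed the element in the request
    return {
        "event": ET.tostring(detached, encoding="unicode"),  # the entire <event> element
        "event_id": event.get("id"),                         # optional separate identifier
    }

message = event_message(ET.fromstring(REQUEST).find("emit"))
```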
The core BML behavior elements are by no means comprehensive, as much of the ongoing work behind BML involves identifying and defining a broad and flexible library of behaviors. Implementors are encouraged to explore new behavior elements and specialized attributes when making use of BML. However, we request that those experimental components that do not extend core behaviors using description elements, be identified as non-standard BML by utilizing XML namespaces to prefix the elements and attributes.
The following example utilizes customized behaviors from the SmartBody project. Here, we use the namespace sbm (short for SmartBody Module):
<bml> <sbm:animation name="CrossedArms_RArm_beat"/> <gaze target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/> </bml>
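A realizer can use the namespace prefix to detect non-standard components. The Python sketch below lists the namespaced elements and attributes in a block. Note that a namespace-aware XML parser needs the prefix to be declared, so the example adds an `xmlns:sbm` declaration; the URI `urn:example:sbm` is a placeholder invented for this sketch, not the actual SmartBody namespace.

```python
import xml.etree.ElementTree as ET

SBM_NS = "urn:example:sbm"  # hypothetical placeholder URI

EXTENDED = f"""
<bml xmlns:sbm="{SBM_NS}">
  <sbm:animation name="CrossedArms_RArm_beat"/>
  <gaze target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/>
</bml>
"""

def non_standard_parts(bml_text):
    """List namespaced (non-core) elements and attributes in a <bml> block."""
    found = []
    for element in ET.fromstring(bml_text).iter():
        # ElementTree expands prefixed names to the form "{uri}local".
        if element.tag.startswith("{"):
            found.append(("element", element.tag))
        for name in element.attrib:
            if name.startswith("{"):
                found.append(("attribute", name))
    return found
```

A realizer that does not recognize a reported namespace could skip those parts and still realize the core gaze behavior.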
A synchronization point or sync-point is a point in time that can be shared between two or more behaviors in an effort to synchronize their realization. There are different kinds of sync-points, providing different opportunities for synchronization.
Every behavior is broken down into six phases of realization. Each phase is bounded by a sync-point that carries the name of the transition it represents, making it relatively straight-forward to align behaviors at meaningful boundaries. The seven sync-points are: start, ready, stroke_start, stroke, stroke_end, relax and end.
… start and ready, and the retraction back to a neutral or previous state occurs between relax and end.
… ready and relax, with the most effortful part occurring between stroke_start and stroke_end.
… stroke time reference, such as in a beat gesture or nod.
… stroke time reference assumes the same time as stroke_start.
… ready and stroke_start allows an anticipation hold in gesture space, just as the separation of stroke_end and relax allows a hold for emphasis or continuation.
… ready is assumed to coincide with stroke_start, and stroke_end should coincide with relax. Similarly, if there is no preparatory movement into gesture space, start will coincide with ready, and relax will coincide with end. For example, in a gaze behavior, setting ready and setting stroke_start should result in the same timing for making eye contact, while setting either stroke_end or relax will declare the time of breaking eye contact.
Any BML event (see below) can serve as a sync-point. The ID of the event is then used as the name of the sync-point when referring to it. It is therefore possible to have more complex behaviors emit various events to enable alignment with other basic behaviors.
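Since the seven standard sync-points bound consecutive phases, a scheduler can sanity-check a proposed timing by verifying that the times never decrease in the listed order. The sketch below assumes exactly that ordering rule and allows equal times, so that coinciding sync-points are accepted; the function name is hypothetical.

```python
SYNC_POINTS = ["start", "ready", "stroke_start", "stroke",
               "stroke_end", "relax", "end"]

def out_of_order(times):
    """Return pairs of sync-point names whose times violate the standard order.

    `times` maps sync-point names to seconds; names that are absent are skipped.
    """
    known = [(name, times[name]) for name in SYNC_POINTS if name in times]
    return [(earlier, later)
            for (earlier, t_earlier), (later, t_later) in zip(known, known[1:])
            if t_earlier > t_later]
```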
New sync-points can be introduced. For example the <mark> tag from SSML is used to create sync-points for the speech behavior as seen in the example above. Another example from the speech discussion shows the use of a special <sync> tag that provides a sync-point at an arbitrary point in time.
When new sync-points are introduced for a behavior, it is assumed that start and end will still refer to the first and last sync-point for that behavior.
Each BML performance request also has two implicit sync-points, bml:start and bml:end, identifying the start of the earliest behavior and the end of the latest behavior.
Aligning to bml:start and bml:end requires special precautions. If there is no offset specified, only start sync-points can be aligned to bml:start, and only end sync-points can be aligned to bml:end. If there is an offset specified, it must be positive when referring to bml:start and negative when referring to bml:end. These constraints ensure bml:start and bml:end actually mark the proper start and end points of the behavior set.
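These precautions translate into a small check. The following sketch uses a hypothetical function name and treats the offset as a signed number of seconds, with `None` meaning that no offset was given.

```python
def valid_block_alignment(attribute, target, offset=None):
    """Check an alignment of a behavior's sync attribute to bml:start or bml:end.

    attribute: the behavior's own sync-point being scheduled, e.g. "start"
    target:    "bml:start" or "bml:end"
    offset:    signed offset in seconds, or None when no offset is given
    """
    if target == "bml:start":
        return attribute == "start" if offset is None else offset > 0
    if target == "bml:end":
        return attribute == "end" if offset is None else offset < 0
    raise ValueError(f"not a block-level sync-point: {target!r}")
```

Under this reading, `stroke="bml:start + 0.5"` is acceptable, while `stroke="bml:start"` and `relax="bml:end + 1"` are not.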
Each phase of a behavior can be scheduled relative to any sync-point. This is done with the seven optional XML attributes named after the behavior's own sync-points: start, ready, stroke_start, stroke, stroke_end, relax, and end. The attribute value may reference another sync-point with an optional offset in seconds:
| Form | Sync attribute syntax | Notes |
|---|---|---|
| Standard | source_id:sync_id | |
| With offset | source_id:sync_id + offset or source_id:sync_id - offset | |
| Shorthand | offset | Equivalent to bml:start + offset |
Where…
| Term | Meaning |
|---|---|
| source_id | is the ID or name assigned to the owner element of the sync-point (or bml when referring to bml:start or bml:end). For example, this would be the ID of another behavior element when referring to that behavior's end sync-point. |
| sync_id | is the standard name of the behavior's sync-point |
| offset | is a time in seconds to offset the alignment |
TODO: Add a formal grammar for the attribute value, inclusive of whitespace.
<!-- Timing example behaviors -->
<gaze start="0.3" end="2.14" /><!-- absolute timing in seconds -->
<gaze stroke="other:stroke" /><!-- relative to another behavior -->
<gaze ready="other:relax + 1.1" /><!-- relative with offset -->
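Until a formal grammar exists, the attribute syntax can be approximated with a regular expression. The Python sketch below is one such approximation and not the official grammar. It assumes that ids and sync-point names consist only of letters, digits, and underscores (hyphens are excluded because they would be ambiguous with a negative offset), that whitespace around the sign is optional, and that offsets are unsigned decimal numbers.

```python
import re

SYNC_REF = re.compile(r"""
    ^\s*(?:
        (?P<source>[A-Za-z_]\w*) : (?P<sync>[A-Za-z_]\w*)   # source_id:sync_id
        (?: \s* (?P<sign>[+-]) \s* (?P<offset>\d+(?:\.\d+)?) )?
      |
        (?P<shorthand>\d+(?:\.\d+)?)                        # bare offset
    )\s*$
""", re.VERBOSE)

def parse_sync_ref(value):
    """Parse a sync attribute value into (source_id, sync_id, offset_seconds)."""
    match = SYNC_REF.match(value)
    if match is None:
        raise ValueError(f"not a sync reference: {value!r}")
    if match.group("shorthand") is not None:
        # A bare number is shorthand for bml:start + offset.
        return ("bml", "start", float(match.group("shorthand")))
    offset = float(match.group("offset") or 0.0)
    if match.group("sign") == "-":
        offset = -offset
    return (match.group("source"), match.group("sync"), offset)
```

For instance, `parse_sync_ref("other:relax + 1.1")` yields the source id `other`, the sync-point `relax`, and an offset of 1.1 seconds.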
The <constraint> element provides a container for specifying additional constraints on the performance. BML 1.0 only defines three timing constraints:
<synchronize> declares one or more sync-points should be synchronized with a referenced sync-point notation
<before> declares one or more sync-points should be performed before a referenced sync-point notation
<after> declares one or more sync-points should be performed after a referenced sync-point notation
<synchronize> constraints perform just like the sync-point attribute constraints, performing the sync-points of two or more behaviors at the same time.
<constraint id="synchronize_example"> <synchronize ref="speech1:sync4"> <sync ref="beat1:stroke:2"/> <sync ref="nod1:stroke"/> </synchronize> </constraint>
This generalizes the attribute notation in three ways:
… <required> element) while the presence of the behaviors is still <required>.
<before> constrains one or more sync-points to perform before a specified sync-point notation.
<constraint id="before_example"> <before ref="speech_1:start"> <sync ref="gaze_1:stroke"/> </before> </constraint>
This constraint example requires gaze_1 to acquire its target (complete the stroke sync-point) before speech_1 begins.
<after> constrains one or more sync-points to perform after a specified sync-point notation.
<constraint id="after_example"> <after ref="speech_1:end+2"> <sync ref="gaze_1:relax"/> </after> </constraint>
This constraint example requires two seconds to pass after speech_1 completes before relaxing gaze_1.
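To make the three timing constraints concrete, the sketch below checks a finished schedule against a <constraint> block. It is an illustration under several assumptions: the function names are hypothetical, the schedule is a plain dictionary from sync-point references to seconds, "before" and "after" are treated as non-strict, and a small tolerance stands in for whatever precision a realizer actually achieves.

```python
import re
import xml.etree.ElementTree as ET

REF = re.compile(r"\s*([\w:]+)\s*(?:([+-])\s*(\d+(?:\.\d+)?))?\s*")

def resolve(ref, times):
    """Turn a reference such as 'speech_1:end+2' into a time in seconds."""
    match = REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"cannot parse reference: {ref!r}")
    name, sign, offset = match.groups()
    delta = float(offset) if offset else 0.0
    return times[name] + (-delta if sign == "-" else delta)

def violations(constraint_xml, times, tolerance=1e-3):
    """Return (rule, sync_ref, target_ref) for every constraint that does not hold."""
    holds = {
        "synchronize": lambda t, target: abs(t - target) <= tolerance,
        "before": lambda t, target: t <= target + tolerance,
        "after": lambda t, target: t >= target - tolerance,
    }
    found = []
    for rule in ET.fromstring(constraint_xml):
        if rule.tag not in holds:
            continue  # ignore extension elements
        target = resolve(rule.get("ref"), times)
        for sync in rule.findall("sync"):
            if not holds[rule.tag](resolve(sync.get("ref"), times), target):
                found.append((rule.tag, sync.get("ref"), rule.get("ref")))
    return found
```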
We encourage BML developers to experiment with using the constraint element for the alternative functions through the use of namespaced elements and <description> extensions, for example:
The BML realizer is expected to give feedback about the current state of any BML performance requests, and notification of warnings. This involves a series of messages, but the precise nature of the messages depends upon the messaging architecture. However, there are expectations on the message contents for each message type. Below are the five core BML 1.0 feedback messages.
Notifies the behavior planner that a requested BML performance has begun.
The message must include:
Notifies the behavior planner that a requested BML performance has completed successfully.
The message must include:
Notifies the behavior planner of BML performance progress.
The message must include:
Notifies the behavior planner that requested behaviors and/or constraints have failed to realize, and possibly led to aborting the performance.
Exception message must include at least:
… id attributes from the request BML) that failed (or caused the failure if the performance is aborted)
Optionally, we suggest the error include a human readable string describing the reason for the exception.
Notifies the behavior planner that requested behaviors and/or constraints were modified during realization.
Warning message must include at least:
… id attributes from the request BML) that were modified or replaced to facilitate realization.
Optionally, we suggest the warning include a human readable string describing the reason for the warning.