Welcome to the BML page. This page now contains all updates from the Vienna workshop in November 2006. Therefore the information on this page supersedes the information found on the earlier ISI wiki. Suggestions from the Paris workshop in June 2007 will not be integrated into this document until they have been discussed on the forums.
A new draft version of BML 1.0 is being developed on a separate page.
2008 MITRE Meeting Subgroups: (Please look inside the above draft version of BML 1.0)
BML happens on three websites:
The Behavior Markup Language or BML is being proposed as a standard XML interface between the level of behavioral planning and behavior realization in the SAIBA framework for multimodal behavior generation in virtual humans. This wiki is associated with the SAIBA Multimodal Behavior Generation Project and the Behavior Markup Language Project on Mindmakers.
For discussing this development, please make sure to join the SAIBA discussion forum on Mindmakers.
Here is a summary of some BML tools and projects that are available or are in development.
| Project | Institution | Description | Implemented in |
|---|---|---|---|
| Expressive Gesture Repository | Paris 8 | Shared repository of a variety of expressive gestures that get instantiated in BML for a given discourse context | |
| ACE (Articulated Communicator Engine) | U.Bielefeld | BML compliant behavior realizer with smart scheduling and blending/co-articulation of gesture and speech | C++ |
| Embodied Conversational Agent Toolkit (ECAT) | HMI U.Twente / ArticuLab | Facilitates the rapid integration of a broad array of character rendering engines by providing a BML compiler and a translator component | |
| BCBM Rule Builder | ISI / MA&D | Point-and-click creation of FML to BML mapping rules with a live preview of BML behavior blocks through a network connection to various behavior realizers (SmartBody and UnrealPuppets) | C# |
| SmartBody | ISI / ICT | BML compliant behavior realizer with smart blending/co-articulation of gesture | C++ |
| Social Puppets | CADIA / ISI / Alelo | Behavior planner mostly for interactional behavior but also some propositional behavior | Python |
| Ambulation Agents | CADIA / CCP | Behavior planner for an online social game environment | Python |
| Non-Verbal Behavior Generator (NVB) | ISI | BEAT-like system that analyzes a virtual human's communicative intent, emotional state and dialog text, and generates appropriate behavior in BML | C++/XSLT |
| NOVA | DFKI | NOnVerbal Action generator: data-driven approach to gesture generation. | |
Add yourselves here (or send Hannes Högni an email)
All BML behaviors need to belong to a behavior block. A behavior block is formed by placing one or more BML behavior elements inside a top-level <bml> element. Unless synchronization is specified (see section on Synchronization), it is assumed that all behaviors in a behavior block start at the same time upon arrival in the behavior realizer.
<bml> <gaze target="PERSON1"/> <speech> Welcome to my humble abode </speech> </bml>
The order of elements inside the <bml> does not have any semantic meaning.
It is generally assumed that the behavior realizer will attempt to realize all behaviors in a block, and even if some of the behaviors don't successfully complete for some reason, other behaviors still get carried out. If there is an all-or-nothing requirement for all or some of the behaviors, they can be enclosed in a <required> block inside the <bml> block. In the following example, the entire performance in the BML block will be canceled if either the gaze or the speech behavior is unsuccessful (and an <exception> message sent back from the behavior realizer), but if only the head nod is unsuccessful, the rest will be carried out regardless (and a <warning> message sent back from the behavior realizer).
<bml> <required> <gaze target="PERSON1"/> <speech> Welcome to my humble abode </speech> </required> <head type="NOD"/> </bml>
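As a rough illustration of the all-or-nothing rule, the following Python sketch decides how a realizer might react when some behaviors fail. The function name `block_outcome`, the idea of passing in a set of failed ids, and the `id` attributes added to the example block are assumptions made for this sketch; only the `<required>` semantics come from the text above.

```python
import xml.etree.ElementTree as ET

def block_outcome(bml_text, failed_ids):
    """Return (feedback kind, ids of the behaviors that are still carried out)."""
    root = ET.fromstring(bml_text)
    # Behaviors nested directly in <required> are all-or-nothing for the block.
    required = {el.get("id") for req in root.iter("required") for el in req}
    all_ids = [el.get("id") for el in root.iter()
               if el.tag not in ("bml", "required")]
    failed = set(failed_ids)
    if failed & required:
        return "exception", []            # cancel the entire performance
    remaining = [i for i in all_ids if i not in failed]
    return ("warning" if failed else "ok"), remaining

# The example block from above, with ids added for bookkeeping.
EXAMPLE = """
<bml id="bml1">
  <required>
    <gaze id="gaze1" target="PERSON1"/>
    <speech id="speech1">Welcome to my humble abode</speech>
  </required>
  <head id="head1" type="NOD"/>
</bml>
"""

print(block_outcome(EXAMPLE, ["head1"]))
print(block_outcome(EXAMPLE, ["gaze1"]))
```

A failed head nod leaves the gaze and speech in place and yields a warning, while a failed gaze cancels everything.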
The specification does not dictate how much is placed in a single behavior block, and therefore what the granularity of action specification really is. This allows for the possibility that certain systems will be dealing with shorter spurts of behavior, while others prefer constructing elaborate performances and sending them to the behavior realizer in larger batches.
What happens when a behavior realizer receives a new behavior block while still processing the previous one? One possibility is to add a simple scheduling instruction to the <bml> tag as an attribute, telling the realizer to replace, interrupt or append with the new block (ISI proposed this). However, it has been argued that this starts to impose a higher level of scheduling on the BML specification that should be handled by the behavior scheduler itself. So for now, special scheduling attributes or tags can be implemented as extensions where needed.
All BML behavior and synchronization elements, including the top-level <bml>, must contain a unique reference id via the id="…" attribute. The value of this attribute can be used to refer to particular instances of BML elements, for example when synchronizing one behavior element with another. The id 'bml' is reserved.
<bml id="bml1"> <gaze id="gaze1" target="AUDIENCE"/> <speech id="speech1" start="gaze1:ready"> Welcome ladies and gentlemen! </speech> </bml>
The overhead of requiring these attributes is considered negligible compared to the benefit of precise feedback logs that include these identifiers. It is simple for automated BML generators to generate these attributes as well, if they are not already informed by other behaviors. For human-authored BML, we expect that message-sending tools can dynamically introduce missing identifiers.
It is proposed that this ID attribute be of the standard XML type 'ID' and that any references to it be of the XML type 'IDREF'. These are described in the standard XML specification.
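The idea that a message-sending tool could fill in missing identifiers can be sketched in a few lines of Python. This is only an illustration: the function name `assign_missing_ids` and the naming scheme (tag name plus a running number) are assumptions, not part of the specification.

```python
import xml.etree.ElementTree as ET
from itertools import count

RESERVED_ID = "bml"

def assign_missing_ids(root):
    """Check existing ids, then give every element without an id a fresh one."""
    seen = set()
    for el in root.iter():
        ident = el.get("id")
        if ident is None:
            continue
        if ident == RESERVED_ID:
            raise ValueError("the id 'bml' is reserved")
        if ident in seen:
            raise ValueError(f"duplicate id: {ident}")
        seen.add(ident)
    for el in root.iter():
        if el.get("id") is None:
            # Hypothetical naming scheme: tag name plus the first free number.
            for n in count(1):
                candidate = f"{el.tag}{n}"
                if candidate not in seen:
                    break
            el.set("id", candidate)
            seen.add(candidate)
    return root

block = ET.fromstring(
    '<bml><gaze id="gaze1" target="A"/><gaze target="B"/><speech>Hi</speech></bml>'
)
print([el.get("id") for el in assign_missing_ids(block).iter()])
```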
A behavior element describes one kind of a behavior to the behavior realizer. In its simplest form, a behavior element is a single XML tag with a few key attributes:
<bml id="bml1"> <gaze id="gaze1" target="PERSON1"/> </bml>
This most compact form is called level 0 of description and is mandatory for all behaviors sent to a behavior realizer. The tag names and attributes are part of the core BML specification.
BML allows for additional levels of description that go beyond the core BML behavior attributes at level 0 in describing the form of a behavior. Additional levels are embedded within a behavior element as children elements of the type description with arbitrary content. The type attribute of the description element should identify the type of content, indicating how it should be interpreted. Even if additional levels are included in a behavior, the core attributes of the behavior element itself cannot be omitted since level 0 of description is a default fallback.
<bml id="bml1"> <gaze id="gaze1" target="PERSON1"> <description level="1" type="RU.ACT"> <target>PERSON1</target> <intensity>0.6</intensity> <lean>0.4</lean> </description> <description level="2" type="ISI.SBM"> ... </description> </gaze> </bml>
All BML compliant behavior realizers have to guarantee that they can interpret a level 0 behavior description and display the corresponding behaviors. In those cases where a realizer is only providing a special subset of BML, for example a talking head, that should be made very clear and the behaviors not realized should produce an appropriate feedback message (see section on feedback). Those realizers that can interpret any of the higher levels of description, should make use of those instead. If a realizer is expecting a description of a certain level higher than 0 but does not receive a description at that level, it should default to the level 0 description.
The level 0 description will always stay well above the level of specific implementations. That is, an ideal level 0 description of a behavior should not reference specific animation files, audio files, or joint names. Behavior tags and attributes should preferably reference actions and body parts by their common verbs and nouns. This calls for a unified set of level 0 behavior description tokens.
Levels beyond level 0 can include existing representation languages such as SSML, ToBI, etc., or new languages can be created that make use of advanced realization capabilities. Each level should be a self-contained description of a behavior because a behavior realizer may not know how to combine descriptions from multiple levels.
It is generally assumed that as the levels go higher, the level of description will become more complex and detailed. For example, levels 1 or 2 may simply repeat the attributes of level 0 but add a few more attributes for extra expressivity, whereas a level 3 or 4 might introduce a whole new set of parameters that drive a special kind of a detailed dynamic simulation.
If multiple levels of description are given and the realizer is capable of interpreting more than one, it is assumed that the realizer will use the highest possible level of description for realization.
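The selection rule (use the highest level the realizer can interpret, otherwise fall back to level 0) might look as follows in Python. The function name `choose_description` and the notion of a set of supported `type` strings are assumptions for this sketch; the `level` and `type` attributes are taken from the example earlier in this section.

```python
import xml.etree.ElementTree as ET

def choose_description(behavior, supported_types):
    """Return (level, element) for the highest level this realizer understands.

    Level 0 is the behavior element itself, which is always available.
    """
    best_level, best = 0, behavior
    for desc in behavior.findall("description"):
        level = int(desc.get("level", "0"))
        if desc.get("type") in supported_types and level > best_level:
            best_level, best = level, desc
    return best_level, best

gaze = ET.fromstring(
    '<gaze id="gaze1" target="PERSON1">'
    '<description level="1" type="RU.ACT"><intensity>0.6</intensity></description>'
    '<description level="2" type="ISI.SBM"/>'
    '</gaze>'
)
print(choose_description(gaze, {"RU.ACT"})[0])             # realizer knows RU.ACT only
print(choose_description(gaze, set())[0])                  # falls back to level 0
```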
This mandatory level includes behavior elements with a certain token name along with a minimal set of descriptors as attributes, such as target objects, positions or orientations.
Note that the attributes for each of the behavior elements below are still subject to review from the sub-groups created at the Vienna meeting.
To specify values for various types of behavior attributes, we propose a set of certain common symbols with well defined semantics.
| Type of Attribute | Symbols | Comments |
|---|---|---|
| Direction | LEFT, RIGHT, UP, DOWN, UPRIGHT, UPLEFT, DOWNLEFT, DOWNRIGHT | Indicating a direction from a center |
Propose more types for this table.
Movement of the head independent of eyes. Types include nodding, shaking, tossing and orienting to a given angle.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| type | Name | required | | The category of head movement [NOD, SHAKE, TOSS, ORIENT] |
| NOD, SHAKE, TOSS | | | | |
| amount | float | optional | 0.0 | (NOD, SHAKE, TOSS) The extent of the movement, where 1.0 is fully extended and 0.0 is the least extended |
| repeats | int | optional | 1 | (NOD, SHAKE, TOSS) Number of times the basic head motion is repeated |
| ORIENT | | | | |
| target | WorldID | optional | | (ORIENT) The world ID of the reference target |
| angle | Angle | optional | 0.0 | (ORIENT) Orients the head angle degrees in the specified direction from the current head orientation. If a target is also given, the orientation is relative to the orientation towards that target |
| direction | Direction | optional | RIGHT | (ORIENT) Direction of orientation angle [RIGHT, LEFT, UP, DOWN, ROLLRIGHT, ROLLLEFT] |
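To show how the defaults in the table above could be applied when reading a head behavior, here is a minimal Python sketch. The function name `read_head` and the choice to treat the Angle type as a plain number of degrees are assumptions; the attribute names, value sets and defaults are those listed in the table.

```python
import xml.etree.ElementTree as ET

HEAD_TYPES = {"NOD", "SHAKE", "TOSS", "ORIENT"}
ORIENT_DIRECTIONS = {"RIGHT", "LEFT", "UP", "DOWN", "ROLLRIGHT", "ROLLLEFT"}

def read_head(el):
    """Return the head behavior's attributes with table defaults filled in."""
    kind = el.get("type")
    if kind not in HEAD_TYPES:
        raise ValueError(f"unknown or missing head type: {kind!r}")
    if kind == "ORIENT":
        direction = el.get("direction", "RIGHT")
        if direction not in ORIENT_DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        return {"type": kind,
                "target": el.get("target"),            # no default given
                "angle": float(el.get("angle", "0.0")),  # assumed to be degrees
                "direction": direction}
    return {"type": kind,
            "amount": float(el.get("amount", "0.0")),
            "repeats": int(el.get("repeats", "1"))}

print(read_head(ET.fromstring('<head id="h1" type="NOD" repeats="2"/>')))
```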
Movement of the orientation and shape of the spine and shoulder.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| posture | Name | required | | The name of the posture to assume |
| transition | Name | optional | | The name of the animated transition that gets played before final posture is assumed |
Movement of facial muscles to form certain expressions. Types include eyebrow, eyelid and larger expressive mouth movements.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| type | Name | required | | The part of the face being controlled [FACS, EYEBROWS, EYELIDS, MOUTH] |
| amount | float | optional | 0.5 | The amount of movement where 0.0 is the lowest (or closed) position and 1.0 is the highest (or open) position |
| side | Name | optional | BOTH | Which side of the face is being controlled [BOTH, LEFT, RIGHT] |
| FACS | | | | |
| au | int | optional | 0 | (FACS only) The Action Unit (AU) reference number for a Facial Action Coding System (FACS) expression |
| EYEBROWS | | | | |
| shape | Name | optional | FLAT | The shape given to the eyebrows [FLAT, POINTDOWN, POINTUP] |
| separation | float | optional | 0.5 | (EYEBROWS only) The horizontal distance of the eyebrows from the center of the forehead where 0.0 is the shortest distance and 1.0 the furthest distance |
| EYELIDS | | | | |
| lid | Name | optional | BOTH | (EYELIDS only) Whether both upper and lower eyelids are affected [BOTH, UPPER, LOWER] |
| MOUTH | | | | |
| shape | Name | optional | FLAT | The shape given to the mouth [FLAT, SMILE, LAUGH, PUCKER, FROWN] |
Coordinated movement of the eyes, neck and head direction, indicating where the character is looking.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| target | ID | optional | | The world ID of the reference target |
| angle | Angle | optional | 0.0 | Orients the gaze angle degrees in the specified direction from the current gaze orientation. If a target is also given, the orientation is relative to the orientation towards that target |
| direction | Direction | optional | RIGHT | Direction of orientation angle [RIGHT, LEFT, UP, DOWN, UPRIGHT, UPLEFT] |
Full body movement, generally independent of the other behaviors. Types include overall orientation, position and posture.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| approach | ID | optional | | The world ID of a target place or thing to approach prior to assuming an indicated posture |
| proximity | float | optional | 1.0 | How close to approach the target, where 1.0 is “typical” distance for that target (defined elsewhere), and 0.0 is up against the target |
| face | WorldID | optional | | The world ID of a reference target for final facing |
| angle | Angle | optional | 0.0 | The offset angle of final facing, where 0.0 fully faces the reference target |
| posture | Name | optional | | The name of the posture to assume |
| transition | Name | optional | | The name of the animated transition that gets played before final posture is assumed |
Movements of the body elements downward from the hip: pelvis, hip, legs including knee, toes and ankle.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| posture | Name | required | | The name of the posture to assume |
| transition | Name | optional | | The name of the animated transition that gets played before final posture is assumed |
Coordinated movement with arms and hands, including pointing, reaching, emphasizing (beating), depicting and signaling.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| type | Name | required | | The category of gesture movement [POINT, REACH, BEAT, DEPICT, SIGNAL] |
| name | Name | optional | | The name of a gesture needed for a DEPICT or a SIGNAL gesture |
| target | ID | optional | | The world ID of a reference target for POINT and REACH gestures |
An interesting question raised by Herwin: What if I want to point/beat using whatever body part is available (head, hands, feet)? This would probably not fall under gesture as it is defined here, but where would we put it?
Verbal and paraverbal behavior, including the words to be spoken (for example by a speech synthesizer), prosody information and special paralinguistic behaviors (for example filled pauses).
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| type | string | optional | text/plain | MIME type or other string type identifying the type of contents or referred object |
| ref | Name | optional | | Refers to speech data if not contained within the speech element |
| text | String | optional | | Unprocessed element to promote legibility with external or encoded types |
Unlike other behavior elements, <speech> can contain text and other elements, depending on the value of the type attribute. Alternatively, it can refer to external data like audio files or utterance id's. In some cases, this data can introduce new time reference points beyond start, ready, stroke, relax, and end. For instance:
<speech id="s1" type="application/ssml+xml"> Allows <mark name="wb1"/> word <mark name="wb2"/> break <mark name="wb3"/> references. </speech>
Any BML processor recognizing W3C's SSML should successfully process this behavior and allow following behaviors to refer to the word break time marks using the time marker notation s1:wb1, s1:wb2, or s1:wb3.
When the ref attribute is specified, speech behaviors may also specify times for external resources that don't have their own time markers, using the <tm> element:
<tm> Time Marker
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| id | ID | required | | The identifier of the generated synch point |
| time | float | required | | Time in seconds, relative to the external resource |
For example:
<speech id="s1" start="0.0" type="audio/x-wav" ref="utterance1.wav" text="this is very nice"> <tm id="tm1" time="0.1" /> <!-- This is --> <tm id="tm2" time="1.1" /> <!-- very nice --> </speech> <gesture id="g1" stroke="s1:tm2" type="BEAT"/> <head id="h1" stroke="g1:stroke" type="NOD"/> <gaze id="l1" ready="s1:tm1" relax="s1:tm2" target="book1"/>
Here a pre-recorded audio file is played back, with the speech beginning at time 0.1. The timings of some of the words have also been specified (possibly extracted with a special tool). The stroke of the beat gesture is aligned with the second word group. The stroke of the head nod, however, is timed explicitly to occur exactly on the stroke of the beat gesture. While the first two words are uttered, the gaze is turned towards an object called book1.
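A realizer or planner that wants to resolve references such as s1:tm2 first needs the time markers of the speech element. The Python sketch below collects them into a lookup table; the function name `speech_synch_points` is an assumption, and the times are left relative to the external resource, as the table above describes.

```python
import xml.etree.ElementTree as ET

def speech_synch_points(speech):
    """Map 'speech_id:tm_id' to the marker time in seconds (resource-relative)."""
    sid = speech.get("id")
    return {f"{sid}:{tm.get('id')}": float(tm.get("time"))
            for tm in speech.findall("tm")}

speech = ET.fromstring(
    '<speech id="s1" start="0.0" type="audio/x-wav" ref="utterance1.wav" '
    'text="this is very nice">'
    '<tm id="tm1" time="0.1"/><tm id="tm2" time="1.1"/>'
    '</speech>'
)
print(speech_synch_points(speech))
```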
This element is used for controlling lip shapes including the visualization of phonemes for audiovisual speech.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| viseme | Name | required | | The name of a viseme to be displayed. It will blend with any expression specified in the FACE element |
| articulation | float | optional | 0.5 | The extent to which visemes are clearly articulated, where 0.0 represents sloppy and 1.0 represents hyper articulation |
| flapping | boolean | optional | false | If true, keeps the mouth oscillating between the viseme and the closed position |
Defines a pause or delay that other behaviors can reference.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| duration | float | optional | | Delay or event timeout in seconds |
| event | REFID | optional | | Event to wait for (see Events section) |
| no-event | string | optional | | Action to take if event is specified and timeout is exceeded. See details below |
Wait behaviors describe the act of waiting for a time or event in a communicative act. Valid wait behaviors require either the duration or event to be specified. If both are specified, the duration describes a timeout for event listening.
If the timeout is exceeded, the no-event attribute is processed. The attribute can have three forms. In the form “FAIL”, the entire act aborts without notification. In the form “FAIL: {event declaration}”, the act aborts after sending a specified event. In the form “MESSAGE: {event declaration}”, a specified message is emitted to notify of the event failure, but the act continues as if the event was received.
If the timeout is exceeded and the attribute no-event is not specified, the behavior continues normally as if the event occurred.
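The three forms of the no-event attribute, plus the case where it is absent, can be told apart with a small parser. The following Python sketch is one possible reading: the function name `parse_no_event` and the "CONTINUE" label for the absent case are assumptions, and the event declaration is kept as an opaque string because its syntax is not defined here.

```python
def parse_no_event(value):
    """Return (action, event_declaration) for a no-event attribute value."""
    if value is None:
        # Attribute not specified: continue as if the event occurred.
        return ("CONTINUE", None)
    head, sep, rest = value.partition(":")
    action, declaration = head.strip(), rest.strip()
    if action == "FAIL" and not sep:
        return ("FAIL", None)             # abort without notification
    if action in ("FAIL", "MESSAGE") and declaration:
        return (action, declaration)      # abort or continue, after notifying
    raise ValueError(f"unrecognized no-event value: {value!r}")

print(parse_no_event("FAIL"))
print(parse_no_event("MESSAGE: late1"))
```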
The core BML behavior elements are by no means comprehensive, as much of the ongoing work behind BML involves identifying and defining a broad and flexible library of behaviors. Implementors are encouraged to explore new behavior elements and specialized attributes when making use of BML. However, we request that those experimental components that cannot be embedded within a special level of description, be identified as non-standard BML by utilizing XML namespaces to prefix the elements and attributes.
The following example utilizes customized behaviors from the SmartBody project. Here, we use the namespace sbm (short for SmartBody Module):
<bml> <sbm:animation name="CrossedArms_RArm_beat"/> <gaze target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/> </bml>
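A realizer that does not know an extension can use the namespace to tell core BML apart from non-standard parts. The Python sketch below relies on the fact that ElementTree stores namespaced names as `{uri}local`. Note that an XML parser needs the sbm prefix to be declared; the example above omits the declaration, so the URI used here (`http://example.org/sbm`) is a placeholder assumption, as is the function name `split_extensions`.

```python
import xml.etree.ElementTree as ET

SBM_NS = "http://example.org/sbm"   # placeholder URI, not the real SmartBody one

def split_extensions(root):
    """Return (core element tags, extension element tags, extension attributes)."""
    core, ext_elements, ext_attrs = [], [], []
    for el in root:
        if el.tag.startswith("{"):          # namespaced element: non-standard
            ext_elements.append(el.tag)
            continue
        core.append(el.tag)
        ext_attrs += [name for name in el.attrib if name.startswith("{")]
    return core, ext_elements, ext_attrs

DOC = (
    f'<bml xmlns:sbm="{SBM_NS}">'
    '<sbm:animation name="CrossedArms_RArm_beat"/>'
    '<gaze target="AUDIENCE" sbm:joint-speeds="100 100 100 300 600"/>'
    '</bml>'
)
print(split_extensions(ET.fromstring(DOC)))
```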
A synchronization point or synch point is a point in time that can be shared between two or more behaviors in an effort to synchronize their realization. There are different kinds of synch points, providing different opportunities for synchronization.
Every behavior is broken down into six phases of realization. Each phase is bounded by a synch point that carries the name of the transition it represents, making it relatively straight-forward to align behaviors at meaningful boundaries. The seven synch points are: start, ready, stroke-start, stroke, stroke-end, relax and end.
The preparation for the behavior takes place between start and ready, and the retraction back to a neutral or previous state occurs between relax and end. The behavior itself is carried out between ready and relax, with the most effortful part occurring between stroke-start and stroke-end. Some behaviors have a single stroke time reference, such as in a beat gesture or nod. By default, the stroke time reference assumes the same time as stroke-start. The separation of ready and stroke-start allows an anticipation hold in gesture space, just as the separation of stroke-end and relax allows a hold for emphasis or continuation. If there are no such holds, ready is assumed to coincide with stroke-start, and stroke-end should coincide with relax. Similarly, if there is no preparatory movement into gesture space, start will coincide with ready, and relax will coincide with end. For example, in a gaze behavior, setting ready and setting stroke-start should result in the same timing for making eye contact, while setting either stroke-end or relax will declare the time of breaking eye contact.
Any BML event (see below) can serve as a synch point. The ID of the event is then used as the name of the synch point when referring to it. It is therefore possible to have more complex behaviors, for example those described at levels beyond 0, emit various events to enable alignment with other basic behaviors.
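The coincidence rules for the standard synch points can be made concrete with a short Python sketch. It is a simplification under stated assumptions: every unspecified hold, preparation or retraction phase is treated as having zero length, whereas a real realizer would normally compute those durations itself. The names `fill_defaults` and `in_order` are hypothetical.

```python
SYNCH_POINTS = ["start", "ready", "stroke-start", "stroke",
                "stroke-end", "relax", "end"]

# (missing point, point it is assumed to coincide with), applied in this order.
COINCIDE = [
    ("ready", "stroke-start"), ("stroke-start", "ready"),
    ("stroke", "stroke-start"),
    ("stroke-end", "relax"), ("relax", "stroke-end"),
    ("start", "ready"), ("end", "relax"),
]

def fill_defaults(times):
    """Copy known times onto unspecified synch points that coincide with them."""
    filled = dict(times)
    for missing, source in COINCIDE:
        if missing not in filled and source in filled:
            filled[missing] = filled[source]
    return filled

def in_order(times):
    """True if the known synch points occur in non-decreasing time order."""
    known = [times[name] for name in SYNCH_POINTS if name in times]
    return all(a <= b for a, b in zip(known, known[1:]))

gaze = fill_defaults({"ready": 1.0, "relax": 3.0})
print(gaze, in_order(gaze))
```

For the gaze example, eye contact is made at 1.0 (ready and stroke-start coincide) and broken at 3.0 (stroke-end and relax coincide).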
New synch points can be introduced. For example the <mark> tag from SSML is used to create synch points for the speech behavior as seen in the example above. Another example from the speech discussion shows the use of a special <tm> tag that provides a synch point at an arbitrary point in time.
When new synch points are introduced for a behavior, it is assumed that start and end will still refer to the first and last synch point for that behavior.
Each BML request also has two implicit synch points, bml:start and bml:end, identifying the start of the earliest behavior and the end of the latest behavior.
Aligning to bml:start and bml:end requires special precautions. If there is no offset specified, only start synch points can be aligned to bml:start, and only end synch points can be aligned to bml:end. If there is an offset specified, it must be positive when referring to bml:start and negative when referring to bml:end. These constraints ensure that bml:start and bml:end actually mark the proper start and end points of the behavior set.
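These precautions amount to a simple check. In the Python sketch below, the function name `boundary_alignment_ok` and its argument layout are assumptions; an offset of exactly zero is treated as "no offset", which the text does not state explicitly.

```python
def boundary_alignment_ok(attribute, boundary, offset=0.0):
    """Check a reference from a synch attribute to bml:start or bml:end.

    attribute: the behavior's own synch attribute, e.g. "start" or "stroke"
    boundary:  "start" for bml:start, "end" for bml:end
    offset:    offset in seconds; 0 is treated as "no offset" (assumption)
    """
    if boundary not in ("start", "end"):
        raise ValueError("boundary must be 'start' or 'end'")
    if offset == 0:
        # Without an offset, only start aligns to bml:start and end to bml:end.
        return attribute == boundary
    return offset > 0 if boundary == "start" else offset < 0

print(boundary_alignment_ok("stroke", "start", 0.5))   # allowed
print(boundary_alignment_ok("stroke", "start"))        # not allowed
```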
Each phase of a behavior can be scheduled relative to any synch point. This is done with the seven optional XML attributes named after the behavior's own synch points: start, ready, stroke-start, stroke, stroke-end, relax, and end. The attribute value may reference another synch point with an optional offset in seconds:
Synch attribute syntax
| Form | Syntax | Comments |
|---|---|---|
| Standard | source_id:synch_id | |
| With offset | source_id:synch_id + offset, or source_id:synch_id - offset | |
| Shorthand | offset | Equivalent to bml:start + offset |
Where…
| Term | Meaning |
|---|---|
| source_id | is the ID or name assigned to the owner element of the synch point (or bml when referring to bml:start or bml:end). For example, this would be the ID of another behavior element when referring to that behavior's end synch point. |
| synch_id | is the standard name of the behavior's synch point |
| offset | is a time in seconds to offset the alignment |
Add a formal grammar for the attribute value, inclusive of whitespace.
<!-- Timing example behaviors --> <gaze start="0.3" end="2.14" /><!-- absolute timing in seconds --> <gaze stroke="other:stroke" /><!-- relative to another behavior --> <gaze ready="other:relax + 1.1" /><!-- relative with offset -->
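Until a formal grammar is agreed on, the three forms of the attribute value can be parsed with a regular expression. The Python sketch below is one possible reading, not the official grammar: it assumes that names start with a letter or underscore, that a hyphen inside a name must be followed by a letter (so that stroke-start is a name while relax-1.1 is a name with an offset), and that whitespace is optional everywhere. The function name `parse_sync` is hypothetical.

```python
import re

NAME = r"[A-Za-z_]\w*(?:-[A-Za-z_]\w*)*"
NUMBER = r"\d+(?:\.\d+)?"

SYNC_RE = re.compile(rf"""
    ^\s*
    (?:
        (?P<source>{NAME}) \s* : \s* (?P<synch>{NAME})      # source_id:synch_id
        (?: \s* (?P<sign>[+-]) \s* (?P<offset>{NUMBER}) )?  # optional offset
      |
        (?P<short>{NUMBER})                                 # shorthand offset
    )
    \s*$
""", re.VERBOSE)

def parse_sync(value):
    """Return (source_id, synch_id, offset_in_seconds) for an attribute value."""
    m = SYNC_RE.match(value)
    if m is None:
        raise ValueError(f"not a synch reference: {value!r}")
    if m.group("short") is not None:
        return ("bml", "start", float(m.group("short")))   # bml:start + offset
    offset = float(m.group("offset") or 0.0)
    if m.group("sign") == "-":
        offset = -offset
    return (m.group("source"), m.group("synch"), offset)

for text in ["0.3", "other:stroke", "other:relax + 1.1", "g1:stroke-start - 0.5"]:
    print(text, "->", parse_sync(text))
```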
We are currently considering support for vague or underspecified timing constraints by using the predicates before(..) and after(..). The before(..) predicate indicates that a behavior's sync point should occur at or before another sync point. Similarly, the after(..) predicate indicates that a behavior's sync point should occur at or after another sync point.
<gaze ready="after( other:stroke )" /><!-- timing with predicates -->
While normally most synchronization between behaviors can be specified through attributes in the behavior elements themselves, there are cases where synchronization through a special external element might be useful:
This additional functionality is supported through a <synchronize> element that stands as a sibling to the behavior elements inside a <bml> block.
We need the exact syntax here with an example.
<synchronize> element
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| ref-sync | sync-point expression | required | | the sync point to which this constraint is relative |
| constraint | lexicalized | required | | at, after, before, or extensions via namespaced ids |
<sync> element
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| dest | sync-point id | required | | the sync point to which this constraint is applied |
Information about the occurrence of events is carried inside special event BML elements.
| Attribute | Type | Use | Default | Description |
|---|---|---|---|---|
| type | Name | required | | BEHAVIOR: Describes the character itself; WORLD: Describes things happening outside of the character |
| source | ID | optional | ID of emitter | The ID of the emitter, behavior, synch point or entity that caused the event |
| time | Time | optional | Time of emission | Time stamp in absolute global time |
Events can be emitted from within a BML block with a special emission behavior element. Here an event is emitted once the gesture has reached its stroke:
<bml id="bml1"> <gesture id="g1" type="POINT" target="chair1"/> <emit id="emitter1" start="g1:stroke"> <event id="event1" type="behavior"> Optional information. </event> </emit> </bml>
Once emitted, the event portion is sent back to the behavior planner from the behavior realizer:
<bml id="bml2"> <event id="trigger1" source="bml1:emitter1" type="behavior"> Optional information. </event> </bml>
The behavior realizer can also generate events autonomously to report on behavior progress. For example, a realizer can be configured to send events for every synch point reached, in which case the source attribute would indicate exactly what synch point caused the event.
The realizer will always generate an event when a block of BML behaviors has finished executing:
<bml id="bml2"> <event id="finished1" source="bml1" type="behavior" time="123.25"> Finished. </event> </bml>
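On the planner side, such feedback messages can be read back with a few lines of Python. This sketch assumes the message arrives as a complete `<bml>` block like the ones shown above; the function name `read_events` and the dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_events(message):
    """Return one dictionary per <event> element found in a feedback block."""
    events = []
    for ev in ET.fromstring(message).iter("event"):
        stamp = ev.get("time")
        events.append({
            "id": ev.get("id"),
            "type": ev.get("type"),
            "source": ev.get("source"),
            "time": float(stamp) if stamp is not None else None,
            "info": (ev.text or "").strip(),   # optional free-form content
        })
    return events

FEEDBACK = (
    '<bml id="bml2"> <event id="finished1" source="bml1" type="behavior" '
    'time="123.25"> Finished. </event> </bml>'
)
print(read_events(FEEDBACK))
```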
BML provides tags for feedback information from the behavior realizer back to the behavior planner. There are three general kinds of feedback messages.
- <event> (see above)
- <exception>
- <warning>
Other kinds of information from the behavior realizer have been suggested, such as timing information or information on available body parts before actual execution starts. These could possibly be sent back to the behavior planner as responses to special queries. This is not yet part of the specification.