Using the semantic event dispatch framework

Overview

An important concept used by the semantic event dispatch framework is the notion that it is useful to identify some subset of the larger system for observation and analysis. Such a subset of the system is referred to as a module for convenience, even though the subset may comprise more than one logical module of the system under analysis. The framework then uses instrumentation and the Java Debug Interface (JDI) to generate a stream of events occurring in the module as the program executes, and dispatches that stream of events to registered listeners. Event dispatch for concurrent Java programs is fully supported.

The quickest path to using this framework is to write an event description language (EDL) specification for the Java program of interest. An EDL specification describes the module for which to generate an event stream, and is defined as a set of classes and observables associated with those classes for which events should be dispatched. The full set of observables currently supported is described in the observables section.

The framework is also designed to be utilized as a programmable interface. Event specifications can be supplied to the event dispatcher by creating a class that implements the event specification interface. There are components provided to facilitate implementation of custom filters that can be attached to the event stream to select events of interest for particular analyses. Clients of the framework therefore have great flexibility in defining what events they want to observe, and in selecting relevant events from the stream to perform their analyses.

Writing an Event Description Language (EDL) Specification

An event description language specification consists of a set of sections that define the observables for which events are to be generated in the event stream, and that provide facilities for specifying conditions on the locations in which the events related to those observables should be witnessed. The following is the informal, basic structure of an EDL specification:

  begin EDLSuite <name>
    ( global_array_element_bounds )*
  end
  ( (begin Observables <key>
      begin Preamble
          System-classes: <prog_file>
          (Preamble declarations...)
      end
      (
        (+ | -) observable_event_request [ {
            (locationCondition)*
            } ]
      )*
    end
    )
  | (@import "<edl_file>")
  )+

Preamble declarations:
  ( Module-classes: <prog_file>
    | No-module: ("true" | "false") )
  [ Database-tag: <tag> ]
  [ Type-name: <type_name> (<jni_signature> | <class_name>) ]

For a full formal specification, it is recommended that you refer to the BNF grammar for EDL. It may also be helpful to look at the detailed grammar breakdown, with documentation and additional details on particular productions in the EDL grammar.

The remainder of this document is intended as a quickstart guide for writing EDL specifications.

EDLSuite Section

The EDLSuite section defines a name to be associated with the rest of the EDL specification. This name is used by the framework to store the information derived from the remainder of the specification to make it available to the semantic event dispatcher. The name should be unique; if there is a name collision you will be notified when you run the semantic instrumentor (this is not, however, considered an error, as it may occur if you run the instrumentor more than once on the same EDL specification).

Global array element bounds can also be declared in the EDLSuite section. This is accomplished with one or more "array_element_load_bounds" or "array_element_store_bounds" declarations. The format of these declarations is the same as for the "array_element_load" and "array_element_store" event requests (see below), except that at least one of the minimum or maximum bounds must be given. A global element index bound applies when the element type and event type match those of an array element event request in an Observables section, and that event request does not specify the corresponding bound. Element index bounds specified in particular event requests override global bounds. Note that declaring global array element bounds does not in itself enable observation of array element events; a corresponding array element event request must still be made.
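The override rule for index bounds can be pictured with a small model. The following is only a sketch: the class ElementBounds and its method are hypothetical names invented for illustration and are not part of the framework.

```java
import java.util.OptionalInt;

/** Hypothetical model of a min/max index bound pair; not a framework class. */
final class ElementBounds {
    final OptionalInt min;
    final OptionalInt max;

    ElementBounds(OptionalInt min, OptionalInt max) {
        this.min = min;
        this.max = max;
    }

    /**
     * Returns the bounds in effect for one event request: a bound given on
     * the request wins, and a missing bound falls back to the global value
     * (which may itself be absent).
     */
    static ElementBounds effective(ElementBounds request, ElementBounds global) {
        OptionalInt min = request.min.isPresent() ? request.min : global.min;
        OptionalInt max = request.max.isPresent() ? request.max : global.max;
        return new ElementBounds(min, max);
    }

    public static void main(String[] args) {
        // Global bounds 0..99; the request only overrides the minimum.
        ElementBounds global = new ElementBounds(OptionalInt.of(0), OptionalInt.of(99));
        ElementBounds request = new ElementBounds(OptionalInt.of(10), OptionalInt.empty());
        ElementBounds result = effective(request, global);
        System.out.println("min=" + result.min.getAsInt() + " max=" + result.max.getAsInt());
    }
}
```

In this model the request keeps its own minimum of 10 and inherits the global maximum of 99. The model assumes the global declaration has already been matched to the request by element type and event type.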

Observables Section

An Observables section defines the set of observable events that are to be supplied by the semantic event dispatcher. The key associated with an Observables section is used to uniquely distinguish independent sets of event requests (from here referred to as an "event specification") so that they may operate simultaneously to receive required events from the same run of a semantic event dispatcher. This is most important when using the adaptive event request features of the event dispatcher, in that it enables the event dispatcher to mediate conflicting event enable/disable requests such that listeners requiring certain events are always guaranteed to receive those events.

A particular Observables section must always begin with a Preamble section. The Preamble defines basic information about that particular set of observable event requests. The basic piece of information that is required in all Preamble sections is a reference to the "program list file" that defines all the classes comprising the entire system to be monitored. This is the maximal set of classes over which the semantic instrumentor will operate to perform the necessary actions to capture requested events. The Preamble also supports several optional statements, whose syntax is summarized under "Preamble declarations" in the structure shown earlier.

Observable Event Requests

An observable event request begins with either a '+' or '-' which indicates whether the request will define a rule for including or excluding events related to the observable. Next is the name of the observable to which the rule applies. The syntax of the remainder of the line is then variable depending on the type of the observable the rule addresses, as described in the list of observables below.

For many observables, it is possible to specify conditions that constrain the locations at which the events relevant to the observable will be witnessed. These conditions are specified within braces using the keywords 'in' or 'not'. An 'in' condition specifies that an observable event should only be raised if the event occurred in a given location. A 'not' condition specifies that an observable event should only be raised if the event did not occur in a given location. Requests that permit location conditions require the use of the brace notation; however, specifying no conditions is permitted and is treated as the equivalent of 'any location'.

Many observables can be specified using wildcards. For example, all fields in a class can be included with a notation such as this: 'SomeClass.*'. However, this leads to the possibility that the rules specifying inclusion and exclusion of observables can conflict. Therefore, requests also have a precedence, which is assigned in increasing order as the Observables section is read from top to bottom. Requests with higher precedence always overrule requests of lower precedence, which allows greater flexibility in defining the observables to be included. For example, a rule of lower precedence might specify that a broad set of observables is to be included, while a rule of higher precedence selectively excludes certain observables within that set. For any observable, the matching rule with the highest precedence will always be applied.
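The precedence rule amounts to "the last matching request wins". The sketch below models that idea only. The class PrecedenceSketch, its Rule type, and the field name 'cache' are hypothetical, and location conditions are deliberately left out.

```java
import java.util.List;

/** Hypothetical model of request precedence; names are illustrative only. */
final class PrecedenceSketch {

    /** One '+' or '-' request; its position in the list stands for its precedence. */
    static final class Rule {
        final boolean include;
        final String pattern; // "Class.member" or "Class.*"

        Rule(boolean include, String pattern) {
            this.include = include;
            this.pattern = pattern;
        }

        boolean matches(String observable) {
            if (pattern.endsWith(".*")) {
                // Keep the trailing '.' so "Foo.*" does not match "FooBar.x".
                String prefix = pattern.substring(0, pattern.length() - 1);
                return observable.startsWith(prefix);
            }
            return pattern.equals(observable);
        }
    }

    /**
     * Walks the rules from last (highest precedence) to first and returns the
     * decision of the first match; falls back to the supplied default.
     */
    static boolean isIncluded(List<Rule> rules, String observable, boolean byDefault) {
        for (int i = rules.size() - 1; i >= 0; i--) {
            Rule rule = rules.get(i);
            if (rule.matches(observable)) {
                return rule.include;
            }
        }
        return byDefault;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(true, "SomeClass.*"),       // broad inclusion, lower precedence
                new Rule(false, "SomeClass.cache")); // selective exclusion, higher precedence
        System.out.println(isIncluded(rules, "SomeClass.count", false));
        System.out.println(isIncluded(rules, "SomeClass.cache", false));
    }
}
```

Here 'SomeClass.count' is included by the wildcard rule, while 'SomeClass.cache' is excluded because the later rule has higher precedence.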

Precedence also affects the location conditions; in fact, the precedence is actually associated with each condition (though an implied precedence is assigned for requests that have no conditions). A request with higher precedence but with a location condition that does not conflict with the location condition on a request with lower precedence will not override the rule from the request with lower precedence. If there is a hierarchical relationship between two locations, the location conditions will nest within each other, thus an 'in' rule followed by a 'not' rule can be read as: "the event should be generated if it happens in location X but only if not in location Y within X". For location conditions that conflict with each other without a nesting relationship, the conditions with highest precedence are applied as usual.
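One way to read the nesting behavior is that the most specific matching location decides, with precedence breaking ties. The sketch below is an interpretation of the paragraph above, not the framework's algorithm. The class LocationSketch is hypothetical, locations are simplified to "Class.method" strings, and argument types are ignored.

```java
import java.util.List;

/** Hypothetical model of nested location conditions; not framework code. */
final class LocationSketch {

    static final class Condition {
        final boolean isIn;   // true for 'in', false for 'not'
        final String pattern; // "Class.method" or "Class.*"

        Condition(boolean isIn, String pattern) {
            this.isIn = isIn;
            this.pattern = pattern;
        }

        boolean matches(String location) {
            if (pattern.endsWith(".*")) {
                return location.startsWith(pattern.substring(0, pattern.length() - 1));
            }
            return pattern.equals(location);
        }

        /** An exact method nests inside a class wildcard, so it is more specific. */
        int specificity() {
            return pattern.endsWith(".*") ? 1 : 2;
        }
    }

    /** Decides whether a request with the given conditions applies at a location. */
    static boolean appliesAt(List<Condition> conditions, String location) {
        Condition best = null;
        boolean anyIn = false;
        for (Condition c : conditions) { // list order = increasing precedence
            anyIn |= c.isIn;
            if (c.matches(location)
                    && (best == null || c.specificity() >= best.specificity())) {
                best = c; // a later condition wins ties in specificity
            }
        }
        if (best != null) {
            return best.isIn;
        }
        // No condition names this location: a list of only 'not' conditions
        // still applies elsewhere, while any 'in' condition restricts the request.
        return !anyIn;
    }

    public static void main(String[] args) {
        List<Condition> conditions = List.of(
                new Condition(true, "ProgramCore.*"),
                new Condition(false, "ProgramCore.setup"));
        System.out.println(appliesAt(conditions, "ProgramCore.run"));
        System.out.println(appliesAt(conditions, "ProgramCore.setup"));
    }
}
```

With an 'in ProgramCore.*' condition followed by 'not ProgramCore.setup', the model applies the request in ProgramCore.run but not in ProgramCore.setup, and an empty condition list applies everywhere.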

Location conditions within different observable event requests (for the same type of observable) will also be nested if appropriate. The requests themselves, however, are never nested; they can only override some subset (or superset) of a prior record. This is an important consideration when writing more complex specifications.

The scope of precedence is limited to each event specification (Observables section). In other words, event requests in one event specification do not have a precedence with respect to event requests in another event specification. Each event specification is an independent entity.

Supported Observable Events

The following is a current list of types of events that can be requested in an EDL specification. It is presented using a style similar to EBNF notation, but in a manner that is intended to more clearly illustrate the construction of the event requests, as compared to the formal grammar. The syntax used is as follows:

"new_object" ( <:class_name:>[".*"] | "*" ) "{" ... "}"

Specify the fully qualified name of the class for which allocation by a NEW instruction should be observed. Wildcards are supported. Requires the brace notation for location conditions.

"construct_object" "*" | ( <:class_name:> ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) )

Specify the fully qualified name of the class for which constructor entry should be observed. Wildcard inclusion of all constructors is supported; package qualified wildcards are not supported. If a specific class is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"construct_finish" "*" | ( <:class_name:> ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) )

Specify the fully qualified name of the class for which constructor completion should be observed. Wildcard inclusion of all constructors is supported; package qualified wildcards are not supported. If a specific class is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"get_static" <:class_name:>["."field_name | ".*"] "{" ... "}"

Specify the fully qualified name of the class and static field for which field reads should be observed. Field wildcard is supported to specify all static fields of a class as observable on reads. Requires the braced notation for location conditions.

"put_static" <:class_name:>["."field_name | ".*"] "{" ... "}"

Specify the fully qualified name of the class and static field for which field writes should be observed. Field wildcard is supported to specify all static fields of a class as observable on writes. Requires the braced notation for location conditions.

"get_field" <:class_name:>["."field_name | ".*"] "{" ... "}"

Specify the fully qualified name of the class and instance field for which field reads should be observed. Field wildcard is supported to specify all instance fields of a class as observable on reads. Requires the braced notation for location conditions.

"put_field" <:class_name:>["."field_name | ".*"] "{" ... "}"

Specify the fully qualified name of the class and instance field for which field writes should be observed. Field wildcard is supported to specify all instance fields of a class as observable on writes. Requires the braced notation for location conditions.

"constructor_call" "*" | ( <:class_name:> ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) "{" ... "}"

Specify the fully qualified name of the class for which calls to its constructor(s) should be observed. Wildcard inclusion of all constructor calls is supported; package qualified wildcards are not supported. If a specific class name is given, the signature can be specified. Signature wildcard is supported. Requires the braced notation for location conditions.

"static_call" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) ) "{" ... "}"

Specify the fully qualified name of the class and static method to which calls should be observed. Wildcard inclusion of all static methods in the module, or all static methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Requires the braced notation for location conditions.

"virtual_call" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) ) "{" ... "}"

Specify the fully qualified name of the class and virtual method to which calls should be observed. Wildcard inclusion of all virtual methods in the module, or all virtual methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Requires the braced notation for location conditions.

"interface_call" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) ) "{" ... "}"

Specify the fully qualified name of the class and interface method to which calls should be observed. Wildcard inclusion of all interface methods in the module, or all interface methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Requires the braced notation for location conditions.

"virtual_method_enter" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) )

Specify the fully qualified name of the class and virtual method into which entry should be observed. Wildcard inclusion of all virtual methods in the module, or all virtual methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"virtual_method_exit" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) )

Specify the fully qualified name of the class and virtual method from which exit should be observed. Wildcard inclusion of all virtual methods in the module, or all virtual methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"static_method_enter" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) )

Specify the fully qualified name of the class and static method into which entry should be observed. Wildcard inclusion of all static methods in the module, or all static methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"static_method_exit" "*" | ( <:class_name:> ( ".*" | ( "."method_name ( "*" | ( <:arg_type_1:> ( "," <:arg_type_n:> )* ) | jniMethodSignature ) ) ) )

Specify the fully qualified name of the class and static method from which exit should be observed. Wildcard inclusion of all static methods in the module, or all static methods of a particular class is supported; package qualified wildcards are not supported for specifying classes. If a specific method name is given, the signature can be specified. Signature wildcard is supported. Brace notation for location conditions cannot be used.

"monitor_contend" ( <:class_name:>[".*"] | "*" ) "{" ... "}"

Specify the fully qualified name of the class for which contention for monitors owned by instances of the class should be observed. Wildcards are supported. Requires the braced notation for location conditions.

"monitor_acquire" ( <:class_name:>[".*"] | "*" ) "{" ... "}"

Specify the fully qualified name of the class for which acquisition of monitors owned by instances of the class should be observed. Wildcards are supported. Requires the braced notation for location conditions.

"monitor_pre_release" ( <:class_name:>[".*"] | "*" ) "{" ... "}"

Specify the fully qualified name of the class for which pending release of monitors owned by instances of the class should be observed. Wildcards are supported. Requires the braced notation for location conditions.

"monitor_release" ( <:class_name:>[".*"] | "*" ) "{" ... "}"

Specify the fully qualified name of the class for which the release of monitors owned by instances of the class should be observed. Wildcards are supported. Requires the braced notation for location conditions.

"throw" ( "*" | <:class_name:> ) ["+s"] "{" ... "}"

Specify the fully qualified name of the throwable class for which it should be considered an observable event when an instance of the class is thrown. Wildcard inclusion of all classes in the module is supported; package qualified wildcards are not supported for specifying classes. The optional token '+s' indicates that subclasses of the specified class should also be considered observable throwables. Requires the braced notation for location conditions.

"catch" ( "*" | <:class_name:> ) ["+s"] "{" ... "}"

Specify the fully qualified name of the throwable class for which it should be considered an observable event when an instance of the class is caught. Wildcard inclusion of all classes in the module is supported; package qualified wildcards are not supported for specifying classes. The optional token '+s' indicates that subclasses of the specified class should also be considered observable throwables. Requires the braced notation for location conditions.

"static_init_enter" <:class_name:>[".*"] | "*"

Specify the fully qualified name of the class for which entry into the static initializer should be observed. Wildcards are supported. Brace notation for location conditions cannot be used.

"array_element_load" ( "*" | <:type:> ) ["min:" <uint>] ["max:" <uint>] "{" ... "}"

Specify the type of array element for which array reads should be observed. Wildcard inclusion of all known types is supported. The same set of input formats is accepted for the element type parameter as for the argument type parameters in method selection expressions (such as in a location block).

Minimum and maximum element indexes can also be specified. In the absence of a specified minimum or maximum, corresponding values from the global array_element_load_bounds matching the given type will apply, if any. The presence of only one of the minimum or maximum index bounds, without an overriding global bound for the absent bound, causes all reads from the start of the array to the given element (for maximum) or from the given element to the end of the array (for minimum) to be observed. If the minimum bound is greater than the maximum bound, reads from the "tails" of the array are observed (from the start to the maximum, and from the minimum to the end).

Multiple array_element_load event selections for the same element type may result in merging of the element ranges specified using index bounds.

"array_element_store" ( "*" | <:type:> ) ["min:" <uint>] ["max:" <uint>] "{" ... "}"

Specify the type of array element for which array writes should be observed. Wildcard inclusion of all known types is supported. The same set of input formats is accepted for the element type parameter as for the argument type parameters in method selection expressions (such as in a location block).

Minimum and maximum element indexes can also be specified. In the absence of a specified minimum or maximum, corresponding values from the global array_element_store_bounds matching the given type will apply, if any. The presence of only one of the minimum or maximum index bounds, without an overriding global bound for the absent bound, causes all writes from the start of the array to the given element (for maximum) or from the given element to the end of the array (for minimum) to be observed. If the minimum bound is greater than the maximum bound, writes to the "tails" of the array are observed (from the start to the maximum, and from the minimum to the end).

Multiple array_element_store event selections for the same element type may result in merging of the element ranges specified using index bounds.
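The index bound behavior described for array_element_load and array_element_store, including the "tails" case, can be summarized as a single predicate. This is a sketch under the assumption that both bounds are inclusive, which the text does not state explicitly. The class IndexRangeSketch is a hypothetical name and does not exist in the framework.

```java
import java.util.OptionalInt;

/** Hypothetical predicate for element index bounds; inclusive bounds are assumed. */
final class IndexRangeSketch {

    static boolean isObserved(int index, OptionalInt min, OptionalInt max) {
        // An absent bound places no restriction on that side of the range.
        boolean atOrAboveMin = !min.isPresent() || index >= min.getAsInt();
        boolean atOrBelowMax = !max.isPresent() || index <= max.getAsInt();
        if (min.isPresent() && max.isPresent() && min.getAsInt() > max.getAsInt()) {
            // Inverted bounds select the two tails of the array:
            // start..max and min..end.
            return atOrBelowMax || atOrAboveMin;
        }
        return atOrAboveMin && atOrBelowMax;
    }

    public static void main(String[] args) {
        OptionalInt two = OptionalInt.of(2);
        OptionalInt eight = OptionalInt.of(8);
        System.out.println(isObserved(5, two, eight)); // ordinary range 2..8
        System.out.println(isObserved(5, eight, two)); // tails: 0..2 and 8..end
    }
}
```

In this model, index 5 is observed for "min: 2 max: 8" but not for "min: 8 max: 2", where only the elements up to index 2 and from index 8 onward qualify.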

Importing Other EDL Files

The EDL specification supports a statement of the form:

  @import "other.edl"

These statements must appear outside of any section. When an import statement is encountered, the parser textually inserts the contents of the specified EDL file into the current file at that location. This enables one to store event specifications that select events for particular purposes (analyses) as fragments in a modular fashion, and then combine them to create composite specifications to, for example, check multiple independent properties simultaneously.
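The textual insertion performed for import statements can be illustrated as follows. This is a minimal sketch and not the framework's parser: the class ImportSketch is hypothetical, file contents are supplied through an in-memory map so the example is self-contained, and nested imports are assumed to be expanded recursively.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of textual @import expansion. */
final class ImportSketch {

    // Matches a line consisting solely of: @import "file"
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*@import\\s+\"([^\"]+)\"\\s*$");

    /**
     * Replaces each @import line with the text of the named file.
     * Cyclic imports are not detected in this sketch.
     */
    static String expand(String text, Map<String, String> files) {
        StringBuilder out = new StringBuilder();
        for (String line : text.split("\n")) {
            Matcher m = IMPORT.matcher(line);
            if (m.matches()) {
                String imported = files.get(m.group(1));
                if (imported == null) {
                    throw new IllegalArgumentException("Unknown EDL file: " + m.group(1));
                }
                out.append(expand(imported, files));
            } else {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files =
                Map.of("locks.edl", "begin Observables locks\nend\n");
        String spec = "begin EDLSuite demo\nend\n@import \"locks.edl\"\n";
        System.out.print(expand(spec, files));
    }
}
```

The file name 'locks.edl' and the section contents are placeholders. The point is only that the imported text ends up exactly where the import statement stood.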

Example EDL Specification

The following is a very simple EDL specification intended just to illustrate the basic behaviors described above.

begin EDLSuite example_spec
end
begin Observables example_events
  begin Preamble
    System-classes: sys.prog
    Module-classes: mod.prog

    Type-name: Writer java.io.Writer
  end

  - virtual_call BlockSynchronization.* {
      in ProgramCore.*
      not ProgramCore.setup Writer
    }
  + virtual_call BlockSynchronization.init * {
    }
end

Classes listed in 'sys.prog' comprise the entire Java program of interest. Classes in 'mod.prog' comprise the subset of the system classes that are considered observable by default (all observable events within these classes are considered included unless explicitly excluded).

The first rule specifies that all calls to virtual methods implemented in the class BlockSynchronization are to be excluded if they occur in any method in class ProgramCore except if they occur in method 'setup' of class ProgramCore that takes a Writer as an argument. Note that package qualifiers would be permitted (required, in fact) for BlockSynchronization and ProgramCore if appropriate.

The second rule specifies that all calls to the virtual method 'init' of class BlockSynchronization with any signature are to be included regardless of where they occur. This overrides the first record for calls to method(s) 'init', but not for calls to any other method.

Running the Event Dispatcher

To use the semantic event instrumentor from the command line, first create an EDL specification as described above. Then run the instrumentor using the following command:

java sofya.ed.semantic.SemanticInstrumentor [-dabe] <edl_spec>

Normally, the instrumentor inserts instrumentation to observe when the program execution leaves and enters the module identified in the specification. This is intended to allow the consumer of an event stream to properly maintain the current context of the execution. The optional parameter '-dabe' (disable automatic boundary events) is provided to instruct the instrumentor to not instrument for these events.

Running the Semantic Tracer

To execute a program instrumented for semantic event dispatch on the command line, run the following command:

java sofya.ed.SemanticTracer [-cout] <-md data_file> <-main main_class> [arg_1 arg_2 ...]

-cout

Instructs the tracer to echo the events to the console on stdout.

-md data_file

Specifies the name of the module data file generated by the semantic event instrumentor.

-main main_class

Specifies the fully qualified name of the main class that is used to launch the program. The subsequent optional argument parameters are the arguments that should be passed to the traced program.
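When the instrumentor and tracer are driven from a build script or test harness, the documented command lines can be assembled programmatically. The sketch below only builds the argument lists. The class SofyaCommands and the sample file and class names are hypothetical, and it assumes the framework's classes are already on the classpath of the launched JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the command lines documented above. */
final class SofyaCommands {

    /** java sofya.ed.semantic.SemanticInstrumentor [-dabe] <edl_spec> */
    static List<String> instrumentorCommand(boolean disableBoundaryEvents, String edlSpec) {
        List<String> cmd = new ArrayList<>(
                Arrays.asList("java", "sofya.ed.semantic.SemanticInstrumentor"));
        if (disableBoundaryEvents) {
            cmd.add("-dabe");
        }
        cmd.add(edlSpec);
        return cmd;
    }

    /** java sofya.ed.SemanticTracer [-cout] -md data_file -main main_class [args] */
    static List<String> tracerCommand(boolean echoToConsole, String dataFile,
                                      String mainClass, String... programArgs) {
        List<String> cmd = new ArrayList<>(
                Arrays.asList("java", "sofya.ed.SemanticTracer"));
        if (echoToConsole) {
            cmd.add("-cout");
        }
        cmd.addAll(Arrays.asList("-md", dataFile, "-main", mainClass));
        cmd.addAll(Arrays.asList(programArgs));
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder names; substitute the real module data file and main class.
        System.out.println(String.join(" ",
                tracerCommand(true, "example.dat", "demo.Main", "input.txt")));
    }
}
```

Either list can be handed to java.lang.ProcessBuilder (for example, new ProcessBuilder(cmd).inheritIO().start()) to launch the tool as a child process.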

The semantic tracer generates a binary trace file in the current directory containing the events raised during the execution of the program. There is currently no convenience viewer provided for this type of trace file, nor is there a class for reading the trace file programmatically. The format of this file can be derived from the TraceFileTarget class; however, these trace files are intended primarily as a demonstration of the use of the framework. Clients of the framework should write their own listeners to generate trace files in whatever format is desired, or to process the event stream directly online.

Notes on Implementation and Concurrency Issues

Monitoring multi-threaded systems in Java poses special challenges for instrumentation. It was determined through the course of implementation that it would not be possible to monitor all types of observables solely through the use of instrumentation. The principal problem is that it must be possible to guarantee the atomic integrity of information transmitted from the probes inserted in the instrumentation process. Unfortunately, the only means to guarantee the atomicity of such information for some types of observables is synchronization of probes on a system-global common lock, which excessively interferes with the natural behavior of the system. This degree of interference would also invalidate the information obtained. Therefore, this implementation uses the Java Debug Interface in conjunction with instrumentation to meet the need for monitoring specified observables in a non-disruptive manner.

In order to collect information about some types of specified observables, the implementation applies instrumentation at the bytecode level, by operating on the class files comprising the system. The instrumentor may potentially insert instrumentation into any class in the system, as this is often necessary to monitor all actions related to certain types of observables. Instrumentation is also used in some cases to improve the efficiency of the runtime monitoring over that yielded by the JDI.

Probes are currently inserted to observe allocation of new objects, invocation of constructors and methods, and entry into methods. Method calls and entries are observed using instrumentation probes due to the nature of the JDI. Requesting observation of method entries and exits through the JDI is all or nothing. Constraining observation of such events in the manner desired for this application, for example observing only methods matching a certain signature, would require a filtering process in the tracing tool. Many unnecessary events would be raised by the JDI, in addition to the extra cost of applying this dynamic filtering in the tracing tool. By instead inserting probes in the bytecodes to specifically observe events relevant only to specified methods, a considerable reduction in overhead is achieved.

The result of the instrumentation process is a set of class files containing the probes necessary to monitor a subset of the observables defined in the EDL specification, and a module data file to be supplied to the event dispatcher which contains information necessary for that tool to provide complete coverage of the specified observables. Among other things, this data file contains a compiled version of the EDL specification.

Note that the instrumentation should only ever be applied once, or the behavior is undefined. If the parameters of the instrumentation need to be changed, the system should be recompiled first and then instrumented again.