
System View

The main idea of the general view system is to structure the features of the target system according to their conceptual and logical links. However, some aspects of the target system are spread all over the system making it difficult to assign them to a particular view. Therefore, I introduce the System view that covers the aspects that affect multiple views or even the system as a whole.

User Interface Design

In this section, I will discuss some issues that are related to the user interface of the target system. Today, the main interface between a program and the user is usually a graphical user interface (GUI) and so I will limit the scope of this section to GUIs only. The user interface does not only play an important role with respect to the intended use of the system; it is also important for the designer to know that in most applications, more than 50% of the total system size is dedicated to the UI. However, due to limited space, we can only briefly mention some basic aspects and direct the reader to the literature for more information.

In the case of a multiagent application, the user interface can serve two major purposes. First, it can manage the task-specific user interaction, which is to accept inputs from the user and in turn to present the results of processing the input data. Second, the user interface can be used to monitor and manipulate the system activities of the multiagent system, i.e. the designer (and later the user) must be able to trace the system's activities. The concurrent, distributed approach to problem solving makes this a difficult task that must be carefully executed. Monitoring the system activities is also of vital interest for the development phase of the system with respect to debugging. For monolithic systems, debugging features are supported by the programming language or a development environment; in the case of multiagent systems, however, little or no support is given by existing languages and environments. Thus, either standard agent development frameworks such as ZEUS must be used to visualize the system activities or the system designer must develop and integrate the facilities to support tracing and debugging into the design.

Despite the requirements of a particular class of applications, however, any user interface should comply with the following principles:

Clarity: The entities of the GUI should be clearly related to the entities they represent in the real world and they should be given unambiguous names or icons.
Comprehensibility: The user interface should be intuitive to use and provide an understandable access to the system functions. The key questions that describe this property are What to look at?, What to do?, When to do it?, Why to do it? and finally How to do it?.
Consistency: An individual entity should have the same representation even if it occurs in different contexts, and actions should always lead to the same results in order to guarantee some sort of predictability to the user. Furthermore, the windows, dialogs etc. should have a consistent layout and appearance throughout the entire user interface.
Directness: The user interface should provide a direct and intuitive access to accomplish the tasks. For example, complicated parameter settings in several menus before a function can be activated should be avoided.
Control: It is important to design the interface in a way that provides the user with the feeling that he or she controls the behavior of the system and not vice versa. Thus, the system should query the user before taking action and it should keep the user informed about ongoing computations, loading processes etc., for example by showing an hourglass whenever an action is started.

Thus, a good user interface reflects the needs and capabilities of the user, obeys physical constraints of the hardware and conforms with existing standards. Our recommended iterative design process for the user interface consists of seven steps as described in the following process model:

Know your user In this step the user groups are characterized according to their experience, the estimated use frequency, their skills (e.g. typing or other input devices), the available amount of training, their motivation etc. This characterization will help the designer to develop a general idea of the user interface. It is, for example, a completely different task to develop a user interface for a novice that uses the system occasionally or for an expert who is familiar with similar applications and who regularly uses the system.
Relate to the system The initial idea of the interface must then be related to the system that it is intended to represent towards the user. Therefore, the designer identifies potential points-of-interaction between the GUI and the underlying system. These points-of-interaction are all features that can be visualized or manipulated by the user.
Check standards Before the real design process of the particular user interface starts, the designer should check the initial idea against existing standards and the constraints they may impose on the interface. This step is important in order to avoid a user interface that is not generally accepted by the user community. However, if the entire target system is a customized application, deviations from the standard may be tolerable.
Define menus The menus that usually appear on top of the application window define the overall structure of the GUI as they are usually the first entry point for the user. The menus should be related to functional groups within the system and have descriptive names in order to enable the user to relate the menu titles and entries to particular functions of the system.
Select windows In this step, the windows that represent different aspects of the system are being built. First of all, the designer must decide which information should be grouped together in a single window and then the following steps are executed for each window:
1. Select presentation techniques The most appropriate presentation technique, e.g. textual, graphical or audio representation, is chosen for each of these groups and their elements.
2. Select the appropriate screen-based controls The control elements include the entire palette of tools offered by most existing window system interfaces, e.g. text inputs, slide bars, list pickers, etc.
3. Create the layout In this step, the presentation and control elements are arranged within the window.
Create a help system The help system is an important issue in a user interface when the interface reaches a particular level of complexity. The designer should provide two types of help to the user: the first is a general help system that can be queried for specific topics by the user, and the second is context-specific help that is activated by the user on-the-fly, e.g. by pressing a mouse button while the mouse pointer is over an input field.
Evaluate the interface design The user interface is ideally checked by selected users if they are available. If the current version of the user interface fails in one of the aspects described above, the entire process is re-iterated from step 1.

The seven steps of the previously described process model are independent of a particular programming language or window system and can thus be used to guide the interface design on a very general level of abstraction. In order to speed up the development of the user interface, it is highly recommended to make use of existing software libraries such as Tcl/Tk or Gecco in order to benefit from off-the-shelf components for standard user interface elements.

Although user interface design is probably the most important and most extensive view that covers system-wide properties, there are still other aspects, probably not as exposed as the user interface, that affect the system as a whole. These facets will be discussed in the subsequent sections.


Exception Handling

This aspect of the System view describes the exception handling policy of the target system, a feature that is usually spread all over the entire system and that affects almost every part of the target system. However, the term exception is used in a lot of different contexts, as the following definition shows.

Definition [Exception] An exception is the union of error, exceptional case, rare situation and unusual event.

In order to provide a structure for the different aspects that are covered by the term, the following categorization can be used: Software or design errors are caused by implementation mistakes in the software, e.g. dividing by zero, array index range errors, incorrect loop conditions etc. Hardware errors are the result of failures of the underlying hardware such as memory faults, sensor failures etc. State errors occur if the system's model of the environment is inconsistent with the actual state of the environment; this kind of error is often found in robotic applications. Timing errors, finally, can occur only in real-time systems and are caused by the violation of timing constraints or resource (processor, memory) overload.

In the System view, however, we are only interested in software and design errors and how to handle them. In order to describe the error handling strategy of the target system, it is often useful to characterize the intended mechanism according to the following scheme.

Scope: The scope of the error handling mechanism can either be local or global. A local strategy aims at the individual components of the system and specifies the error handling activities from the individual point-of-view. A global strategy, on the other hand, introduces a central authority to which all errors are reported and that handles the exceptional situation according to a given plan. Multiagent systems seem to naturally suggest a localized exception handling scheme. However, this can sometimes make it difficult to cope with temporal information, e.g. when the designer needs to detect the exact temporal ordering of exceptions during the debugging phase. In this case, a global scheme can significantly reduce the development cost although the idea of a central authority opposes the basic multiagent idea.
Purpose: The purpose of the exception handling process can be error detection or error recovery. The first case is easier to handle as it simply requires some sort of notification mechanism to indicate the presence of an exceptional situation to the designer or the user. Error recovery is usually much harder to achieve as it requires a thorough analysis of the current system state and explicit knowledge of how to handle a particular failure. However, in certain types of multiagent systems, this type of exception handling will have to be considered as a simple system shut-down may not be an acceptable behavior.
Technology: The technology aspect of the exception handling mechanism, finally, deals with the concrete implementation of this mechanism. A language-based approach uses the constructs of the underlying programming language to implement the exception handling strategy whereas an operating system-based approach makes use of services provided by the platform on which the target system is running. The choice of the best technology is a difficult matter that requires weighing several factors according to the requirements of the target system. While a language-based approach is usually easier to use and is mostly platform independent, it nonetheless depends on the expressive power of the target language, which might be too limited for most multiagent applications. An operating system-based approach, on the other hand, is usually more platform dependent but also usually provides more flexibility to the designer.

Finding the exception handling strategy that is suited best for a particular target system is not easy and may have consequences for the entire system. Thus, it is usually a good idea to consider this aspect quite early in the development process, ideally before the actual implementation is started.
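To make the scope and purpose dimensions more concrete, the following is a minimal, language-based sketch in Python of how local detection could be combined with a global reporting authority. The names ExceptionMonitor, ExceptionReport and run_agent_step are hypothetical and chosen only for illustration. The sketch covers error detection only and assumes that all agents run as threads within one process.

```python
import itertools
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionReport:
    """One reported exception; all field names are illustrative."""
    sequence: int           # global arrival order at the monitor
    timestamp: float        # wall-clock time of the report
    agent: str              # name of the reporting agent
    error: BaseException    # the exception that was caught locally


class ExceptionMonitor:
    """Hypothetical central authority that collects reports from all agents."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count()
        self._reports = []

    def report(self, agent, error):
        # The lock ensures that sequence numbers reflect one global order,
        # even when agents report from different threads.
        with self._lock:
            entry = ExceptionReport(next(self._counter), time.time(), agent, error)
            self._reports.append(entry)
        return entry

    def history(self):
        """Return a copy of all reports in arrival order."""
        with self._lock:
            return list(self._reports)


def run_agent_step(monitor, agent, step):
    """Local scope: catch the error inside the agent, then report it globally."""
    try:
        return step()
    except Exception as error:  # detection only; no recovery is attempted
        monitor.report(agent, error)
        return None
```

The sequence number gives the designer the temporal ordering of exceptions that a purely local scheme makes hard to reconstruct, while each agent still catches its own errors.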


Performance Engineering

In this section, I will discuss some ideas that deal with performance aspects of the target system and I will present a micro process model for the Performance Engineering process from Rational.

Definition [Performance Engineering] Performance Engineering is a method to identify and reduce or eliminate performance problems during the software development cycle after the code has been designed and developed.

The performance engineering process shown above is as follows:

Understand In the first step of the Performance Engineering Process the designer must develop a "feeling" for the runtime behavior of the application. This is best done by dividing the application into several phases (e.g. initialization phase, input passing, etc.). Then the designer can decide, according to the separation of the runtime behavior, which phases are the most time consuming and focus the attention on these phases as improvements in these phases are likely to yield the highest gain in performance.
Identify The goal of this step in the Performance Engineering Process is the identification of potential bottlenecks in the selected phases of the application. The sources of bottlenecks are manifold but it is still possible to identify some prototypical classes that cover most existing bottlenecks.
Set Goals When the potential bottlenecks are identified, the designer must set quantifiable goals in order to prioritize the bottlenecks according to their relevance. Several criteria for this are possible, the most obvious being the potential performance improvement. However, the cost (effort) to remove a particular bottleneck should always be weighed against any potential gain.
Performance Improvement Cycle The goal of this step, the most important one in the entire process, is to isolate and eliminate a particular bottleneck. The Improvement Cycle consists mainly of six activities: First of all, the test for a particular bottleneck must be carefully planned in order to enable the designer to focus on the performance aspect under consideration and to exclude any effects that might influence the result. Next, the test is executed and the performance of the program is measured. If the evaluation of the test results shows that the performance is already satisfactory according to the previously defined quality standards, the Performance Improvement Cycle can be aborted. If the performance is below the defined measure, the code is changed in order to remove the bottleneck and the test is executed again. If the next measurement shows that the code change has yielded the desired effect, the cycle is aborted. If this is not the case, the cycle starts again and repeats until either the performance goal is met or the effort exceeds a defined amount.
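The control flow of the Performance Improvement Cycle can be sketched roughly as follows in Python. This is only an illustration under simplifying assumptions: the function names measure and improvement_cycle are hypothetical, the "code changes" are modelled as a prepared list of alternative implementations, and the effort limit is expressed as a maximum number of attempts.

```python
import time


def measure(func, repeats=5):
    """Return the best wall-clock time of `func` over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def improvement_cycle(candidates, goal_seconds, max_attempts):
    """Measure candidate versions in order until the goal or the budget is reached.

    `candidates` is a list of (label, callable) pairs: the current code first,
    followed by successive code changes. Returns (label, seconds, goal_met).
    """
    label, seconds = None, float("inf")
    for attempt, (label, func) in enumerate(candidates, start=1):
        seconds = measure(func)             # execute the test and measure
        if seconds <= goal_seconds:
            return label, seconds, True     # goal met: leave the cycle
        if attempt >= max_attempts:
            break                           # effort budget exhausted
    return label, seconds, False
```

Taking the best of several runs is one simple way to reduce the influence of unrelated system activity on the measurement; other statistics may suit a given test plan better.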

The fundamental step in this process is to identify the bottlenecks that are responsible for performance losses in the target system. Some of the major causes for bottlenecks are the following:

Useless computation

Useless computation is often the result of program changes that make parts of the original code obsolete without removing the then unnecessary code fragments. Another source of useless computation is default computations that are executed even if they are not required. For example, opening connections to remote agents by default can have a severe effect on the system's start-up time.
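One common way to avoid such default computations is to defer them until they are actually needed. The following Python sketch illustrates this for the remote connection example. The class RemoteAgentLink and the connect factory passed to it are hypothetical and stand in for whatever communication layer the target system uses.

```python
class RemoteAgentLink:
    """Hypothetical link to a remote agent that connects only on first use."""

    def __init__(self, address, connect):
        self._address = address
        self._connect = connect      # assumed factory: address -> connection
        self._connection = None      # nothing is opened at start-up

    @property
    def connected(self):
        return self._connection is not None

    def send(self, message):
        # The expensive connection is opened here, not in __init__, so links
        # that are never used add nothing to the start-up time.
        if self._connection is None:
            self._connection = self._connect(self._address)
        return self._connection.send(message)
```

With this arrangement, the cost of opening a connection is paid on the first message instead of at start-up, which is a trade-off the designer has to judge for the application at hand.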

Re-computation

This bottleneck is the result of computing results again although they could be cached for later use. The following example shows a very simple case of re-computation:

if X.getRow() != MAXROW then 
  StartRow = X.getRow() + 1; 
  ...
endif

In the example, the getRow operation on object X is called twice although it could have easily been cached as follows:

Tmp = X.getRow(); 
if Tmp != MAXROW then 
  StartRow = Tmp + 1; 
  ... 
endif

This is a rather trivial example of a kind that the compiler often detects and removes, but it can nonetheless have some impact on the overall runtime behavior, e.g. if the getRow function is computationally expensive or when the entire code sequence occurs within a loop. Generally speaking, the higher the effort to compute a particular result and the more often it is needed, the higher the performance gain through caching.
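When the same result is needed at many places in the program, a cache keyed by the function arguments generalizes the temporary variable used above. The following Python sketch uses functools.lru_cache from the standard library. The function row_cost is a made-up stand-in for an expensive computation, and the approach assumes that the function always returns the same result for the same argument.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def row_cost(row):
    """Stand-in for an expensive, side-effect-free computation."""
    return sum(i * i for i in range(row))


def total_cost(rows):
    """Sum the cost of all rows; repeated rows are served from the cache."""
    return sum(row_cost(row) for row in rows)
```

Calling row_cost.cache_info() reports how many calls were answered from the cache (hits) and how many had to be computed (misses), which can help to judge whether the cache pays off.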

Waiting for service requests to complete

Whenever a program requests a service from the operating system, it is typically blocked until the request is completed. Requests to the operating system are quite frequent in any computer program and consequently, some attention should be paid to these calls and how they can be reduced or transformed such that they are less vulnerable to external effects. Prominent examples of this kind of bottleneck are file access and memory management. In the first case, for example, the designer can make sure to be independent from network delays while accessing a file on a file system that is mounted via NFS if the file is read once and then kept in memory for further fast access. Obviously, this only yields performance gains if the file is accessed more than once. In the case of memory management, it is sometimes useful not to rely on the memory management of the operating system. Allocating a large block of memory and then organizing the memory management locally is often a valuable alternative to the service provided by the operating system. However, even doing so does not prevent the application from being delayed because of page faults and heavy swapping. If this occurs, other mechanisms to speed up memory access must be found.
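The file access case can be illustrated with a small Python sketch that reads each file once and serves later requests from memory. The class FileCache is hypothetical, and the sketch assumes that the files do not change while the program is running; if they can change, some form of invalidation would be needed.

```python
from pathlib import Path


class FileCache:
    """Hypothetical in-memory cache for file contents."""

    def __init__(self):
        self._contents = {}
        self.disk_reads = 0     # number of reads that went to the file system

    def read(self, path):
        key = Path(path)
        if key not in self._contents:
            # Only the first access pays for the (possibly remote) file system.
            self._contents[key] = key.read_bytes()
            self.disk_reads += 1
        return self._contents[key]
```

The counter disk_reads is included only to make the effect observable; it is not required for the technique itself.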

Wrong or missing assumptions about the runtime system

This kind of bottleneck can often be found in the use of function parameters or local functions. Whenever a data structure is passed to a function using the call-by-value mechanism, the entire data structure must be copied on the stack and back again when the function is done. In the case of large structures, this copy operation can take a long time and additionally slow down memory access in the function body. Therefore, any call-by-value with a large data structure should be replaced with a call-by-reference even if the data structure is not changed in the function. Local functions are a similar problem as they are often generated on the heap at runtime. This takes additional time and memory and should be avoided in frequently called functions.

Non-scalable algorithms or data structures

This is a performance killer that is often found in applications that were developed and tested with small example data and that fail to work with the real operational data. A simple example is the following: assume an agent stores all its acquaintances in a list, looks up an entry by linear search whenever it sends a message, and appends a new entry whenever it receives a message from a formerly unknown agent. As long as the system is small, the effect of this list search and update mechanism can be neglected. However, if the system is scaled up to several hundred or even thousands of agents, this scheme will lead to performance losses that could be avoided by using a more efficient search strategy and/or a better data representation.
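The acquaintance example can be sketched in Python as follows. Both classes are hypothetical and offer the same interface; the first stores the acquaintances in a list and searches it linearly, the second uses a hash table (a Python dict), which is one possible "better data representation" for this access pattern.

```python
class ListDirectory:
    """Acquaintances in a list: each lookup scans the entries one by one, O(n)."""

    def __init__(self):
        self._entries = []              # (name, address) pairs

    def add(self, name, address):
        self._entries.append((name, address))

    def lookup(self, name):
        for known, address in self._entries:
            if known == name:
                return address
        return None


class DictDirectory:
    """Acquaintances in a hash table: each lookup is O(1) on average."""

    def __init__(self):
        self._entries = {}

    def add(self, name, address):
        # Keep the first address seen, matching the list-based behavior.
        self._entries.setdefault(name, address)

    def lookup(self, name):
        return self._entries.get(name)
```

For a handful of agents the two variants behave alike; the difference only becomes visible when the number of acquaintances and the message rate grow.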


Deployment

Definition [Software Deployment] Software deployment is the process that covers all of the activities performed after a software system has been developed.

The complexity of the deployment process depends on the complexity of the software and the required system environment. Planning of this process is essential for a successful implementation of the target system at the user site. The deployment life cycle of a software system consists of eight steps:

Release This step is the interface between the development and the deployment process and includes all activities that are necessary to package the software system as well as the knowledge needed to put it into operation at a user site.
Installation This step is usually the most complex activity because it must find and assemble all necessary resources. In this step the system as well as the external resources such as libraries, software packages etc. are either initially deployed at the user site or updated according to the required versions.
Activation The activation of a software system refers to the process of starting the participating components in order to get the system running. For simple software systems, this may only require pushing a button or entering a command line. More complex systems, however, may require more sophisticated, coordinated activities in order to bring all components into operation.
Update Updating an already installed software system means modifying it in order to add new functionality or to remove bugs. Updates are issued by the software provider and taken up by the clients.
Adapt The adaption of an installed system differs from an update in that the former is limited to local changes only while the latter refers to potentially all installed systems as a whole. Adaption thus refers to changes of the system at a particular site in order to adapt it to changes in the system environment.
De-activation De-activation is the inverse process to the activation of the system and refers to a controlled shut-down of all components involved in running the system.
De-installation When the system is no longer required at a user site, it must be removed from the site. This is not necessarily a trivial process because attention must be paid to not disturbing the system environment of the user site, e.g. by the deletion of shared resources. Thus, this step is not simply the process of undoing everything that was done upon the installation of the system; it requires an analysis of the current state of the user site to detect dependencies of other software systems on resources that were installed with the target system.
De-release The final step in the deployment life cycle is reached when a system is regarded obsolete and it is no longer developed or supported by the manufacturer. This step is distinct from the previous one in that it does not mean that the system cannot be used any longer. Rather, the users are free to use the system but they should be aware that no further support will be available.

The deployment of a particular system is usually a highly individualized task and thus hard to capture in a general process model. The following generic deployment process model is thus described at a rather high level of abstraction and will need to be tailored according to characteristics of a specific project. Basically, the deployment process consists of five steps:

Identify and characterize components The components that are relevant for the planning of the system deployment are, for example, the executable files, data sources, hardware devices, external (software) components etc. Each of these components is characterized, for example, by the characteristics of the site where it should be installed, the deadlines it must hold, its availability or a version specification. The goal of this step in the planning process is to obtain a complete picture of every entity that is related to the target system in one way or another.
Describe dependencies In this step, the dependencies between the components are explicitly modeled. A dependency is for example a "uses" relation between components or temporal precedences in the installation process that must be maintained. Note that dependencies can vary between different user sites.
Define Activities For each component, the activities that are necessary to set a component into the state that is required by a user site are specified. Examples for activities are the steps that must be performed to install a software library or the instructions to set up a particular hardware device.
Execute In this step, the previously defined activities are executed in the order imposed by the dependencies that were modeled in step 2.
Start In the final step of the deployment process, the system is set into operation at the user site.

This five-step generic deployment process covers the major parts of the deployment life cycle: the release of a particular system is described in step 1, the installation in step 4, the activation in step 5 and the update activity of the life cycle is captured in steps 1 to 4. Other activities in the life cycle are not directly covered but they can make use of the gathered information, e.g. the de-installation will certainly use the component characterizations and the dependency information when removing a system from a particular site.
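The relation between the dependency model of step 2 and the execution order of step 4 can be illustrated with a short Python sketch. It assumes that the dependencies are available as a mapping from each component to the components it uses, and it relies on graphlib.TopologicalSorter from the Python standard library (version 3.9 or later). The function installation_order and the component names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def installation_order(dependencies):
    """Order components so that each one comes after everything it depends on.

    `dependencies` maps a component to the set of components it uses.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical description of one user site; the component names are made up.
site = {
    "agent-platform": {"runtime-library"},
    "application": {"agent-platform", "database"},
    "database": set(),
    "runtime-library": set(),
}
```

Because dependencies can vary between user sites, such a mapping would have to be provided per site. Reversing the resulting order gives one candidate order for a later de-installation, subject to the check for shared resources described above.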
