MetaConc
Meta-Level Engineering and Tooling for Complex Concurrent Systems

This was a joint project between the Institute for System Software at JKU, the Software Languages Lab at the Vrije Universiteit Brussel, and the University of Kent (UK). It was funded by the Austrian FWF and the Belgian FWO and lasted from March 2016 to February 2021.

Multicore processors have become widely used in devices ranging from phones and tablets to high-end servers. As a result, concurrent programming has become more important, and complex software systems have started to mix a wide range of concurrency models such as Shared Memory (Threads & Locks, Fork/Join, Software Transactional Memory) and Message Passing (Actors, Communicating Sequential Processes) to meet both functional and non-functional requirements. Concurrency, however, has a pervasive influence on the correctness (race conditions, deadlocks) and the performance (lock contention) of the whole system rather than being neatly confined to subsystems. This makes concurrent programs hard to understand and debug. Software tools for concurrent programs usually work at the lowest abstraction level, e.g., memory accesses, instead of the high-level concurrency concepts.

To support the development of such complex systems, we developed a novel tracing approach as well as a meta-level interface (API), which can capture the interaction among different concurrency models to deliver the concepts needed to support tools such as debuggers or profilers.

The main research challenge was to identify a common meta-level interface that captures the essential properties of multiple concurrency models with minimal performance overhead. A second major challenge was to investigate tool support, in particular debugging tools that assist developers in finding errors and improving program comprehension. Classic debugging approaches can change the way a program behaves, for example by altering its timing, so that a concurrency bug no longer shows up while the program is being observed. We worked on minimizing such interference with program execution to avoid hiding concurrency issues.
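
To make the idea of a common meta-level interface more concrete, here is a minimal Python sketch in which two different concurrency models report their high-level events to one shared tracer. It is an illustration only, not the SOMns or MetaConc API: `Event`, `Tracer`, `TracedLock`, `TracedMailbox`, and the event kinds are hypothetical names chosen for this example.

```python
import threading
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Event:
    seq: int      # global order in which the event was observed
    entity: str   # concurrent entity (thread, actor, ...) that caused it
    kind: str     # model-level concept, e.g. "lock.acquire" or "actor.send"
    target: str   # lock or mailbox the event refers to


class Tracer:
    """Single sink that every concurrency model reports to (hypothetical)."""

    def __init__(self):
        self._seq = count()
        self._guard = threading.Lock()
        self.events = []

    def emit(self, entity, kind, target):
        # The guard makes numbering and appending one atomic step.
        with self._guard:
            self.events.append(Event(next(self._seq), entity, kind, target))


class TracedLock:
    """Shared-memory model: reports lock operations, not memory accesses."""

    def __init__(self, name, tracer):
        self.name, self._tracer, self._lock = name, tracer, threading.Lock()

    def acquire(self, entity):
        self._lock.acquire()
        self._tracer.emit(entity, "lock.acquire", self.name)

    def release(self, entity):
        self._tracer.emit(entity, "lock.release", self.name)
        self._lock.release()


class TracedMailbox:
    """Message-passing model: reports sends to the same tracer."""

    def __init__(self, name, tracer):
        self.name, self._tracer, self.messages = name, tracer, []

    def send(self, entity, message):
        self._tracer.emit(entity, "actor.send", self.name)
        self.messages.append(message)


tracer = Tracer()
lock = TracedLock("account_lock", tracer)
mailbox = TracedMailbox("logger", tracer)

lock.acquire("main")
mailbox.send("main", "balance updated")
lock.release("main")

for event in tracer.events:
    print(event)
```

Because both models emit the same event shape, a debugger or profiler built on the tracer would not need model-specific plumbing to see how locks and messages interleave.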

As a result, we devised and implemented:

  • A novel and uniform tracing technique for multi-paradigm concurrent programs that efficiently captures a program's high-level events (e.g., messages, transactions, locking) and allows the user to replay the program with the same outcome as in the original execution. The novel aspects are a uniform meta-level interface for different concurrency models and a new approach to ordering events that overcomes nondeterminism.
  • A uniform trace file format that is based on ordered events and allows different concurrency models to use the same infrastructure for storing and retrieving event data.
  • A novel snapshotting technique that efficiently captures a concurrent program's state in the background without halting its execution. Snapshotting allows the user to start the replay of a concurrent program at points in time other than the beginning of the execution.
  • A prototype of a time-traveling debugger that makes use of our infrastructure to debug and analyze multi-paradigm concurrent programs.
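
The record-and-replay idea behind ordered events can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the MetaConc tracing technique or trace format: the `Trace` class and the `run` function are hypothetical, and a seeded random choice stands in for scheduler nondeterminism. During recording, only the order in which senders' messages are delivered is stored; during replay, that order is enforced.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Ordered list of high-level events (hypothetical, illustration only)."""
    events: list = field(default_factory=list)


def run(senders, trace=None, seed=None):
    """Deliver every sender's messages to one shared receiver.

    Without a trace, the next sender is picked at random (standing in for
    scheduler nondeterminism) and the choice is recorded.
    With a trace, the recorded order is enforced instead.
    """
    rng = random.Random(seed)
    queues = {name: list(msgs) for name, msgs in senders.items()}
    recorded = Trace()
    received = []  # the receiver's observable state
    replay = iter(trace.events) if trace is not None else None

    while any(queues.values()):
        if replay is not None:
            sender = next(replay)  # replay: follow the recorded order
        else:
            sender = rng.choice(sorted(n for n, q in queues.items() if q))
        received.append(queues[sender].pop(0))
        recorded.events.append(sender)  # one event per delivered message
    return received, recorded


senders = {"a": ["a1", "a2"], "b": ["b1", "b2"]}
original, trace = run(senders, seed=1)
replayed, _ = run(senders, trace=trace)
assert replayed == original  # replay reproduces the recorded outcome
```

Note that the trace stores one small entry per message rather than any message contents or memory accesses; in this toy, the ordering alone is enough to reproduce the outcome.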

Tool Workflow

Our debugging tools are fully integrated into the SOMns language implementation. Hence, it is not necessary to modify a program in any way to get debugging support for nondeterministic programs. The workflow to achieve deterministic replay of a recorded execution is as follows:


  1. To locate the root cause of a nondeterministic bug with our tool, one first needs to execute the program under SOMns and to record a trace of the program's execution in which that bug occurs. If the bug does not occur in every execution, it may be necessary to record multiple executions to get a trace that contains the bug.
  2. Once a trace that led to the bug has been captured, the bug can be reproduced deterministically by replaying the recorded execution.
  3. By enabling the Kompos protocol, a compatible debugger such as our Kompos web debugger can be attached. The SOMns VM then collects additional information about the program execution, particularly about the concurrency models in use, and delivers it to the debugger, where it can be used for visualization and for advanced, concurrency-model-specific debugger features. For instance, the communication patterns between concurrent entities can be visualized, and concurrency-model-specific stepping operations and breakpoints are made available.
  4. While replaying the program execution, the program is explored to find the root cause of the bug. A trace can be replayed arbitrarily often. This makes it possible to explore different areas of a program and to start over if a wrong stepping operation was chosen.
  5. Once the root cause of a nondeterministic bug has been identified, all that remains is to fix it.
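
The first steps of this workflow (record until the bug shows up, then replay as often as needed) can be mimicked in a few lines of Python. This is a hedged sketch of the workflow only: it does not use SOMns or Kompos, and `execute` and `record_until_bug` are hypothetical helpers. The "program" is two tasks that each increment a shared counter in two steps, which loses an update under some interleavings.

```python
import random


def execute(schedule=None, rng=None):
    """Run two tasks that each increment a shared counter in two steps
    (read, then write). Returns (final counter, schedule actually used)."""
    counter = 0
    local = {}  # value each task last read from the counter
    steps = {"t1": ["read", "write"], "t2": ["read", "write"]}
    used = []
    replay = iter(schedule) if schedule is not None else None

    while any(steps.values()):
        runnable = sorted(t for t, s in steps.items() if s)
        # Replay follows the recorded schedule; recording picks at random.
        task = next(replay) if replay is not None else rng.choice(runnable)
        op = steps[task].pop(0)
        if op == "read":
            local[task] = counter
        else:
            counter = local[task] + 1
        used.append(task)
    return counter, used


def record_until_bug(max_runs=1000, seed=0):
    """Step 1: record executions until one exhibits the lost update."""
    rng = random.Random(seed)
    for _ in range(max_runs):
        result, trace = execute(rng=rng)
        if result != 2:  # two increments should give 2
            return result, trace
    return None


found = record_until_bug()
if found is not None:
    buggy_result, trace = found
    # Steps 2 and 4: the same trace can be replayed arbitrarily often.
    for _ in range(3):
        assert execute(schedule=trace) == (buggy_result, trace)
    print("buggy schedule:", trace, "-> counter =", buggy_result)
```

In this toy, the recorded schedule plays the role of the trace: once a failing run has been captured, every replay shows the same lost update, so the interleaving can be inspected step by step.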

Downloads

Our implementation is publicly available on GitHub as part of SOMns.

Partners

MetaConc was a joint project of the Institute for System Software at the Johannes Kepler Universität Linz, the Software Languages Lab at the Vrije Universiteit Brussel, and the School of Computing at the University of Kent.


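Institute for System Software, Johannes Kepler Universität Linz
  • Hanspeter Mössenböck (contact)
  • Dominik Aumayr

Software Languages Lab, Vrije Universiteit Brussel
  • Elisa Gonzalez Boix (contact)
  • Carmen Torres López
  • Clément Béra

School of Computing, University of Kent
  • Stefan Marr (contact)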

Funding

The project was funded by FWF Austria and FWO Flanders.

  • FWF: Project I 2491-N31
  • FWO: Project G004816N

Publications