SCOOP Runtime

To implement the SCOOP runtime I use a technique that provides only the essential guarantee: once a processor is acquired, the calls issued on it from another processor are executed in the order they are dispatched.

Implementation of a processor

I represent a processor (essentially) as a work queue. This queue holds the jobs that the processor has to complete. However, the clients that have issued these jobs must have the assurance that they can maintain pre-/postcondition reasoning between the calls they have sent to the supplier processor.

This is the fundamental guarantee that I aim to provide.

Of course, one way to do this is to give each client processor exclusive access to the supplier processor's work queue. This is the situation below, where processor x first acquires the supplier processor and logs all the calls it wishes, followed in turn by the other client processors.

This approach has the benefit of being very straightforward to implement, and thus easy to verify for correctness. However, it introduces a lot of unnecessary contention: all three processors vie for the same lock and the same piece of memory.
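The naive scheme can be sketched as follows. This is an illustrative sketch, not the runtime's actual code: the names and the use of `std::mutex` are assumptions, and the point is only that a client must hold the supplier's single lock for its entire batch of calls so that no other client's calls interleave.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

// Sketch of the naive approach: one shared work queue per supplier
// processor, guarded by a single lock that every client contends on.
struct NaiveProcessor {
    std::mutex lock;                         // all clients contend here
    std::queue<std::function<void()>> work;  // the shared work queue

    // A client logs a whole batch of calls while holding the lock,
    // so no other client's calls can be interleaved with the batch.
    void log_batch(const std::vector<std::function<void()>>& calls) {
        std::lock_guard<std::mutex> hold(lock);  // exclusive access
        for (const auto& c : calls) work.push(c);
    }
};
```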

As a solution to this, I propose using a queue of queues. This is slightly more complicated than the naive solution above, but offers better scalability. Each processor holds a queue, and each element in the queue represents a connection with a client processor. Each client is free to fill these queues as they wish, in parallel with one another.

This has two benefits:
  1. Contention for the supplier processor is reduced: while the queue of queues is not yet full, it can service many different clients.
  2. Since each client has its own private queue for the supplier, the clients share less memory; this lowers memory contention on multicore systems and reduces cross-core communication.
In general, a highly efficient concurrent queue implementation is needed for this. The TBB concurrent_bounded_queue class does this job very well; it is implemented using a system of eight mini-queues and some highly tuned code.

Of course, these just describe the data structures used; the actual computations are performed by a processor thread. However, the processor thread has a very simple implementation: it pulls private queues off the queue of queues and processes the work inside each private queue until it is signaled as done.
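The queue-of-queues layout and the processor loop together can be sketched as below. This is a minimal, single-threaded illustration under stated assumptions: `std::mutex`-guarded deques stand in for TBB's concurrent_bounded_queue, the names (`Processor`, `PrivateQueue`, `acquire`, `run_until_idle`) are hypothetical, and a real processor thread would block on empty queues rather than spin or return.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <mutex>

using Call = std::function<void()>;

// One client's connection to the supplier: contended only by that
// client and the supplier, never by other clients.
struct PrivateQueue {
    std::mutex m;
    std::deque<Call> calls;
    bool done = false;  // client marks the end of its session

    void log(Call c) { std::lock_guard<std::mutex> g(m); calls.push_back(std::move(c)); }
    void finish()    { std::lock_guard<std::mutex> g(m); done = true; }
};

struct Processor {
    std::mutex m;                                   // guards only the outer queue
    std::deque<std::shared_ptr<PrivateQueue>> qoq;  // the queue of queues

    // A client acquires the processor by enqueueing a fresh private
    // queue once; afterwards it logs calls without touching shared state.
    std::shared_ptr<PrivateQueue> acquire() {
        auto pq = std::make_shared<PrivateQueue>();
        std::lock_guard<std::mutex> g(m);
        qoq.push_back(pq);
        return pq;
    }

    // The processor thread's loop: take each private queue in order and
    // drain its calls until the client has signaled it as done.
    void run_until_idle() {
        for (;;) {
            std::shared_ptr<PrivateQueue> pq;
            {
                std::lock_guard<std::mutex> g(m);
                if (qoq.empty()) return;  // real loop would block here
                pq = qoq.front();
                qoq.pop_front();
            }
            for (;;) {
                Call c;
                {
                    std::lock_guard<std::mutex> g(pq->m);
                    if (pq->calls.empty()) {
                        if (pq->done) break;
                        continue;  // spin; real code would wait
                    }
                    c = std::move(pq->calls.front());
                    pq->calls.pop_front();
                }
                c();  // execute outside the lock
            }
        }
    }
};
```

Note how the only shared lock is taken once per client session (in `acquire`) rather than once per call, which is where the reduced contention comes from.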

The private queue

The private queue that links the client and the supplier processor has the responsibility of logging calls and tracking what sort of calls have been logged. Logging a query means waiting for the result to arrive before returning from the private queue.
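The wait-for-result behaviour of a query can be sketched with a promise/future pair. This is an assumption-laden illustration, not the runtime's API: the `enqueue` parameter stands in for however calls reach the supplier, and `log_query` is a hypothetical name.

```cpp
#include <functional>
#include <future>

// Sketch: a query is packaged with a promise and handed to the
// supplier; the client blocks on the matching future until the
// supplier has produced the result.
template <typename T>
T log_query(std::function<T()> query,
            std::function<void(std::function<void()>)> enqueue) {
    std::promise<T> result;
    std::future<T> fut = result.get_future();
    // Capturing by reference is safe: we wait below before returning.
    enqueue([&result, query] { result.set_value(query()); });
    return fut.get();  // block until the supplier executes the query
}
```

In the runtime proper the `enqueue` step would push into the client's private queue, and the supplier thread would run the wrapped call when it reaches it.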

The private queue performs very simple tracking, namely remembering whether the last call made was a query. If the last call was a query, then any subsequent calls logged through the private queue can actually be performed directly by the client instead of by the supplier processor. This is because the client waits for a query's result, so if the last completed action was a query, the supplier cannot have any work remaining. Therefore there is no harm (as the result cannot change) in executing the next call directly on the client's side.
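The bookkeeping above can be sketched as follows. This is a simplified, single-threaded illustration with hypothetical names: the flag tracks whether the supplier is known to be drained, and `log_query` drains the pending calls inline where the real runtime would wait for the supplier thread to do so.

```cpp
#include <functional>
#include <queue>

// Sketch of the private queue's tracking: after a query completes,
// the supplier is known to be idle, so later calls can run locally.
struct TrackingQueue {
    std::queue<std::function<void()>> pending;  // calls for the supplier
    bool last_was_query = false;

    void log_command(std::function<void()> cmd) {
        if (last_was_query) {
            cmd();  // supplier known to be idle: run on the client side
        } else {
            pending.push(std::move(cmd));  // otherwise hand to supplier
        }
    }

    int log_query(std::function<int()> q) {
        // The real runtime waits for the supplier to drain `pending`
        // before the query's result arrives; here we drain inline.
        while (!pending.empty()) { pending.front()(); pending.pop(); }
        last_was_query = true;
        return q();
    }
};
```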