|This page summarizes the project which began in September 2009 to implement shared-memory parallelism into the language of Qaqao. Various aspects of the project are summarized below.|
What is Implicit Parallelism?
Implicit Parallelism is a parallel programming model which aims to take advantage of the parallelism already inherent in the structure of a programming language. This is opposed to Explicit Parallelism which involves user-specified parallel instructions. Though Implicit Parallelism may at first seem a great deal like Automatic Parallelism, it actually differs significantly because language structures can be built in such a way to restrict coding practices.
Why Implicit Parallelism?
Implicit Parallelism simplifies the user's experience with parallel programming. In the modern day, nearly every basic end-user computer has at least 2 cores, and in the near future, this number will increase dramatically. Without the need to explicitly manage parallel resources, the average program can take advantage of simple multi-core processing. Essentially, because of structures already built into the programming language by its very design, parallelism is easy to achieve, and good behavior is easier to evaluate.
Which Structures in Qaqao Will Be Targeted?
What Are the Specific Goals of this Project?
This project aims at the creation of a parallel model for Qaqao threading, which is flexible enough to allow different back-end parallel implementations, and then to implement this model with a simple fully-shared-memory back-end—Pthreads—as a first demo. Therefore, the majority of the work on the project is devoted to developing this model, and investigating methods of validation of the code within the parallel construct.
The parallel model for Qaqao threads is very much like the client/host model used by Cuda and other device languages. Primarily, the idea is that the program written by the user has no knowledge of its parallel implementation. In simple terms, the program simply calls the parallel "umbrella" which overhangs the program (of course, this is not literal, as there is no overhead assuming no parallel constructs), and states the parallel structure it has encountered and how large the iteration space is. The parallel controller then checks the number of threads it has remaining for allocation, checks for the possibility of over-threading, and divides up the iteration space accordingly. It is then also the controller's job to make sure that all returning threads are correctly resolved and values returned where they belong.
Compile-Time Description / Validation
As stated above, the most important feature of Implicit Parallelism is the fact that structures already built into the language can be restrictive enough to allow simpler validation. However, concurrent safety within the structures must still be evaluated. A scheme was developed for this purpose which takes advantage of the fact that everything imaginable is defined as an Object in Qaqao. This scheme is called "Message Labeling".
The idea behind message labeling is that message-passing underlies all operations in Qaqao. In addition, one takes note of the fact that some messages are safer than others were they to be run in parallel with one another. The idea behind this sort of "labeling database" comes from an article published by John Viega of Reliable Software Technologies, and, also from the concept of a "Pure Function," which originated in High-Performance Fortran.
The current labels for Qaqao messages are:
A message is classified based on its operations (i.e. Does it have I/O operations? Does it use bind statements or other unsafe parallel operators?) and the safety levels of any messages which are sent within the message. For instance, any message which sends another messaged labeled "Concurrent Dangerous" to any object without protecting the send will also be classified as "Concurrent Dangerous". An "Incomplete" label is assigned to any message which, when declared, does not know of the internal implementation of one or more of its message calls (i.e. it uses a currently-unimplemented message).
There are a number of features, however, which distinguish this work, making it not simply a copy of the relatively unsucessful project of High Performance Fortran. Most notably, this parallel model takes careful note of memory locality in any way possible. For instance, because of the promise of contiguous memory space in Vectors, they are the preferred structure for parallel programming in Qaqao to maximize L1 cache hits. In addition, the possibility of Vector initialization across the threaded environment to have better paged memory is currently being explored. Notably, it is certainly true that there is only a very limited amount that implicit parallelism can do with regard to locality. The effectiveness of parallelism in this regard largely hinges on the code provided to the compiler.
As stated above, one of the most important forms of static analysis available to the compiler is evaluation of the concurrent safety rating of messages. All messages which are Concurrent Safe can be called freely in the parallel structure. If used in a generally non-intrusive manner (loosely defined), Concurrent Suspicious messages generally are acceptable, though compiler warnings may be generated if the compiler is unsure of the usage. Concurrent Dangerous messages are not allowed in parallel structures if unprotected (e.g. by a Mutex). It is possible, however, that one may have a need for a Concurrent Dangerous call in parallel (though this should very rarely be the case), and, as such, compiler directives are available to temporarily "turn off" static validation of mesages in parallel.
As has been duly noted by a large amount of research, static detection of race conditions is a problem in NP, that is, it cannot be guarenteed to complete in polynomial time. However, heuristic methods of race condition detection do exist. Currently, if an Object is being set in a parallel structure, Qaqao checks whether all sources are/will be the same object, and if this is not easily determined, a warning is generated to alert the programmer of the possible race condition. Of course, this method is quite ineffective as the number of false-positives is exorbitantly high. This area is still being investigated for improvements, and will likely continue to be a focus of the project in subsequent years.
This area of the research is rapidly evolving, and, as such, the nature of the evaluation is not descriptive at this point. More updates will be posted on this area as soon as they become available.
Once the project has reached a more stable state, portions of the source code will be made available under the GPL. Examples and screenshots will be made available once testing of parallel implementations completes.
Qaqao Research—Implicit Parallelism
© Peter Ohmann, 2009
Questions? Mail will be forwarded.