YoGA Philosophy

YoGA aims to let the user work on the GPU from within Yorick, using the CArMA API to simplify access to GPU features. Thanks to Yorick's interpreted environment, it makes it easy to build and debug high-level applications that run on the GPU.

Because the GPU is a "device" attached to a "host", the performance of GPU applications tends to be limited by data transfers between the host and the GPU. Depending on the memory bandwidth of the GPU used, data transfers (host->GPU and GPU->host) as well as memory allocation can kill your acceleration factor. The pattern to follow is therefore: allocate memory space once, transfer the data to the GPU, run the compute-intensive work there, transfer the result back, and then free the memory. How effective this is depends on the amount of data to transfer both ways.
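The effect of allocation and transfer overhead on the acceleration factor can be illustrated with a small cost model. This is only a sketch: the function name, its parameters, and all timings below are hypothetical values chosen for illustration, not measurements of YoGA or of any particular GPU.

```python
def effective_speedup(t_cpu, t_kernel, t_alloc, t_transfer, n_ops, reuse):
    """Return the CPU/GPU time ratio for n_ops operations.

    reuse=True : allocate once, upload once, run n_ops kernels, download once.
    reuse=False: allocate, upload, compute and download for every operation.
    All times share the same unit (milliseconds here); values are hypothetical.
    """
    cpu_total = n_ops * t_cpu
    if reuse:
        # One allocation and one round trip, amortized over all operations.
        gpu_total = t_alloc + 2 * t_transfer + n_ops * t_kernel
    else:
        # Allocation and a full round trip are paid on every operation.
        gpu_total = n_ops * (t_alloc + 2 * t_transfer + t_kernel)
    return cpu_total / gpu_total


# Hypothetical timings in ms: the kernel alone is 20x faster than the CPU.
naive = effective_speedup(10.0, 0.5, 1.0, 2.0, n_ops=100, reuse=False)
amortized = effective_speedup(10.0, 0.5, 1.0, 2.0, n_ops=100, reuse=True)
print(f"transfer every time: {naive:.1f}x")      # about 1.8x
print(f"allocate/transfer once: {amortized:.1f}x")  # about 18.2x
```

With these assumed numbers, paying for allocation and both transfers on every operation reduces a 20x kernel speedup to less than 2x, while keeping the data on the GPU for the whole sequence recovers most of it.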

Yorick plugin

The Yorick implementation uses an opaque object, the Yoga Object, that points to an instance of the CArMA object C++ class. It is built using the standard API for interfacing Yorick packages to the interpreter. This way, persistent objects in GPU memory can be created and manipulated from Yorick. Wrappers associated with this object mimic the basic operations on Yorick variables (alloc / create, destroy / free, print, eval). Hence a Yoga Object can be manipulated in the same way as a standard Yorick variable. Allocation is done once, and destruction is handled either by the user when needed or by Yorick when terminating, which minimizes the chance of a leak.

Additionally, device2host and host2device routines are provided to transfer data between a standard Yorick variable and a Yoga Object.

C wrappers meant to be called from within Yorick have also been added. They wrap calls to the methods of the CArMA object C++ class, using the content of the stack as arguments. These wrappers can be called as functions, in which case they create new Yoga Objects to store the result, or as subroutines, in which case they use pre-existing objects. They provide various mathematical functionalities. Once the objects are created, the user can use these wrappers to build a fast sequence with no memory allocation, perform multiple complex operations on Yoga Objects entirely on the GPU, then transfer the result back (for display, for instance) and finally, if desired, deallocate. See the Practice YoGA page for some details and a practical example.
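The difference between the function form and the subroutine form can be sketched with a host-side toy model. Everything here is hypothetical: DeviceBuffer, add and add_into are illustrative names that only mimic the calling pattern in plain Python, and they are not the YoGA or CArMA API (only the names host2device and device2host are borrowed from the text above, and their real signatures may differ).

```python
class DeviceBuffer:
    """Hypothetical stand-in for a persistent GPU object (not the YoGA API)."""

    allocations = 0  # number of buffers allocated so far

    def __init__(self, n):
        DeviceBuffer.allocations += 1
        self.data = [0.0] * n


def host2device(values):
    """Copy a host list into a newly allocated buffer."""
    buf = DeviceBuffer(len(values))
    buf.data[:] = values
    return buf


def device2host(buf):
    """Copy the buffer content back into a host list."""
    return list(buf.data)


def add_into(out, a, b):
    """Subroutine form: write a + b into a pre-existing buffer."""
    for i in range(len(out.data)):
        out.data[i] = a.data[i] + b.data[i]


def add(a, b):
    """Function form: allocate a new buffer to hold a + b."""
    out = DeviceBuffer(len(a.data))
    add_into(out, a, b)
    return out


a = host2device([1.0, 2.0, 3.0])     # allocation 1
b = host2device([10.0, 20.0, 30.0])  # allocation 2
acc = add(a, b)                      # function form: allocation 3
for _ in range(5):
    add_into(acc, acc, b)            # subroutine form: no new allocation
result = device2host(acc)            # single transfer back at the end
print(result, DeviceBuffer.allocations)
```

In this model the loop reuses the same three buffers however many iterations it runs, which is the property the subroutine form is meant to provide.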

Updated by Damien Gratadour over 10 years ago · 3 revisions