UTTUNGA

CalligoTech’s Uttunga (a Sanskrit word meaning “reaching to a greater height”) is an accelerator card for HPC and AI workloads, powered by our multi-core, Posit-enabled, RISC-V based TUNGA SoC. Uttunga adds Posit-based computing capability to any server (x86, ARM or PowerPC) with PCIe slots, taking it to greater heights of performance.

Posits offer optimized memory usage and computing efficiency. The standard RISC-V floating-point instruction set extensions (F and D) are leveraged to implement basic arithmetic operations in the Posit <32,2> and <64,3> configurations. Other Posit configurations are supported using the programmable gates, described below. Our compilers are enhanced to generate Posit-enabled binaries from any C, C++ or gFortran application, without the need for any source-level modifications.
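As an illustration of what the <32,2> and <64,3> notation refers to, the following Python sketch decodes a Posit bit pattern under the usual <n,es> layout: a sign bit, a variable-length regime, up to es exponent bits, and then the fraction. It is a software illustration only, not a description of the TUNGA hardware or of our compilers; the function name decode_posit and its parameters are hypothetical and chosen for this example.

```python
from fractions import Fraction


def decode_posit(bits: int, nbits: int = 32, es: int = 2) -> Fraction:
    """Return the exact value of an nbits-wide posit given as an unsigned int.

    Hypothetical helper for illustration; assumes the usual <nbits, es> layout.
    """
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (nbits - 1):
        raise ValueError("NaR (Not a Real) has no numeric value")

    negative = bool(bits >> (nbits - 1))
    if negative:
        # Negative posits are stored in two's complement; negate to get magnitude.
        bits = (-bits) & mask

    remaining = nbits - 1                      # bits after the sign bit
    body = bits & ((1 << remaining) - 1)

    # Regime: a run of identical bits, ended by the opposite bit or the word end.
    first = (body >> (remaining - 1)) & 1
    run = 0
    while run < remaining and ((body >> (remaining - 1 - run)) & 1) == first:
        run += 1
    k = run - 1 if first else -run

    consumed = min(run + 1, remaining)         # regime run plus its terminator
    rest_len = remaining - consumed
    rest = body & ((1 << rest_len) - 1)

    # Exponent: up to es bits; bits cut off by the word end count as zero.
    exp_len = min(es, rest_len)
    exponent = 0
    if exp_len:
        exponent = (rest >> (rest_len - exp_len)) << (es - exp_len)

    # Fraction: whatever is left, with an implicit leading 1.
    frac_len = rest_len - exp_len
    frac = rest & ((1 << frac_len) - 1)
    significand = 1 + Fraction(frac, 1 << frac_len)

    scale = k * (1 << es) + exponent           # total power of two
    value = significand * Fraction(2) ** scale
    return -value if negative else value
```

For example, decode_posit(0x40000000) evaluates to 1, decode_posit(0x44000000) to 3/2, and decode_posit(0x7FFFFFFF) to 2**120, the largest <32,2> value. Passing nbits=64 and es=3 decodes the <64,3> configuration in the same way.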


Uttunga serves as a platform for smooth integration of legacy scientific libraries such as BLAS, MAGMA and linear solver libraries, both to off-load work from the host CPU and, more importantly, to enable Posit-based computing. The card integrates transparently into the host memory hierarchy, which avoids the need for data copies, and the accelerator offers standard support for programs. The libraries are organized to expose Posit-enabled routines as compatible replacements for their usual counterparts in BLAS and other libraries.



Uttunga enables energy-efficient computation with iterative linear algebra kernels, such as those used in Physics and Chemistry HPC codes. The QUIRE feature in TUNGA reduces rounding errors and improves computational stability. The QUIRE accumulator produces exact dot products, guaranteed for vectors of up to about 2 billion (2^31) elements. This high-impact feature eliminates the need for unnecessary 64-bit computations.
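The idea behind the QUIRE can be sketched in software: accumulate every product exactly and round only once at the end, instead of rounding after each step. The Python sketch below is an emulation of that principle under stated assumptions, not the TUNGA implementation. It uses ordinary Python floats as a stand-in for the element format and an exact rational accumulator as a stand-in for the QUIRE; both function names are hypothetical.

```python
from fractions import Fraction


def dot_rounded_each_step(xs, ys):
    """Conventional dot product: every multiply and add is rounded."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = acc + x * y
    return acc


def dot_exact_accumulator(xs, ys):
    """Quire-style dot product: exact accumulation, one final rounding.

    Fraction(x) converts a float exactly, so each product and each sum
    below is exact; the only rounding is the final float() conversion.
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)
    return float(acc)


# A small case with heavy cancellation: the exact answer is 1.0.
xs = [2.0 ** 60, 1.0, -(2.0 ** 60)]
ys = [1.0, 1.0, 1.0]

print(dot_rounded_each_step(xs, ys))   # the 1.0 is lost when added to 2**60
print(dot_exact_accumulator(xs, ys))   # the 1.0 survives exact accumulation
```

With step-by-step rounding the small term is absorbed by the large one and the result is 0.0, while the exact accumulator returns 1.0. This is the kind of error that would otherwise push a code toward wider arithmetic.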

 

Programmable gates are part of our TUNGA SoC. The pool of FPGA gates is mainly useful for functions that are on the critical path and that need to be reconfigurable in the field. Examples include accelerating specific tasks for datacenter services, off-loading a wide variety of small tasks from the CPU to speed up processing, and handling non-standard data types to speed up AI training and inference. Applications such as cryptography, AI and host-CPU support functions for variable-precision computing will be implemented on the pool of gates.