This installment of "SYCL Sparkler" explores in depth a reasonably efficient implementation of Homomorphic Encryption using modern C++ with SYCL. As a result of their work, the authors learned some valuable optimization techniques and insights, which they have taken the time to share in this interesting and detailed piece.
A key value of using C++ with SYCL is the ability to be portable while still being able to optimize at a lower level when it is deemed worth the effort. This work helps illustrate how the authors isolated that optimization work, and their thought process for picking what to optimize. The code for this implementation is available open source online. None of the performance numbers shown are intended to provide guidance on hardware selection. The authors offer their results and observations to illustrate the magnitude of changes that may correspond to the optimizations being discussed. Readers will find the information valuable for motivating their own optimization work on their applications using some of the techniques highlighted by these authors.
Key insights shared include: pros and cons of hand-tuned vISA, memory allocation overheads, multi-tile scaling, event-based profiling, algorithm tuning, measuring device throughput, and developing with 'dualities' to increase portability and performance portability.
In a SYCL Sparkler, developers share details of their implementations as well as key insights (lessons learned). They discuss not only what worked, but also what did not work, or at least what did not work initially. Learning to use any programming technique effectively is helped by learning the thought processes that lead to good results, and in turn by learning from the successes and false starts that other experts experience on the road to success. We appreciate the honesty of the authors in exposing their learnings, and happily include such discussions here that would not be explored in depth in most publications.
In this piece, the authors share learnings from a project to create and optimize a SYCL-based GPU backend for Microsoft SEAL. Multiple optimizations are discussed, including organizing work to benefit from local memory, optimizing instructions for modular addition and multiplication operations, and reducing memory allocation costs. Their insights are invaluable lessons from this fascinating implementation. The authors also explore how having two GPU architectures (tiled and not tiled) helped them tune their code to be more portable, and how supporting another duality (Linux and Windows) proved invaluable in substantially expanding test coverage, making their code robust sooner.
Homomorphic Encryption (HE) is an emerging encryption scheme that allows computations to be performed directly on encrypted messages. This property enables promising applications such as privacy-preserving deep learning and cloud computing. Prior works have been proposed to enable practical privacy-preserving applications with architecture-aware optimizations on CPUs, CUDA-enabled GPUs, and FPGAs. However, there was no systematic optimization for the whole HE pipeline on Intel GPUs. We present the first-ever SYCL-based GPU backend for Microsoft SEAL APIs. We perform optimizations at the instruction level, algorithmic level, and application level to accelerate our HE library based on the Cheon, Kim, Kim, and Song (CKKS) scheme on Intel GPUs. The performance is validated on two experimental (non-production) Intel GPUs.
Author(s): Alexander Lyashevsky, Alexey Titov, Yiqin Qiu, and Yujia Zhai
Publisher: Alexander Lyashevsky, Alexey Titov, Yiqin Qiu, and Yujia Zhai
Year: 2023
Language: English
Pages: 84