
CUDA has two APIs:

1. The runtime api (libcudart.so)

2. The driver api (libcuda.so).

The driver API is very close to the OpenCL API and is very low level. Most people use the CUDA runtime API, which is vastly more convenient. The main difficulty with OpenCL and the driver API is that you have to manually load GPU code onto the device, which then returns a handle. You generally have to load the code onto every device, which means multiple handles for the same function. This makes executing kernels quite a lot of work. The runtime API does all of this automatically, which makes programming with CUDA quite easy, since launching a kernel is basically a function call. The CUDA runtime also automatically handles context creation, which is another time saver.
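The contrast is easy to see side by side. A minimal sketch (assuming a CUDA toolchain and an attached device; the PTX filename and kernel name are illustrative):

```cpp
// Runtime API: launching a kernel is essentially a function call,
// and the context is created implicitly on first use.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

void launch_runtime(const float* a, const float* b, float* c, int n) {
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
}

// Driver API: the same launch needs explicit init, context, module, and
// function handles -- one module load (and handle) per device.
#include <cuda.h>
void launch_driver(CUdeviceptr a, CUdeviceptr b, CUdeviceptr c, int n) {
    CUdevice dev;   cuInit(0);            cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoad(&mod, "vecAdd.ptx");  // pre-compiled PTX
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "vecAdd");
    void* args[] = { &a, &b, &c, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1,  // grid dims
                   256, 1, 1,                  // block dims
                   0, 0, args, 0);             // shared mem, stream, params
}
```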

When I first learned OpenCL I was shocked at how difficult it was to write a simple vector add program, since there was all this additional code loading, context creation, etc. The setup / boilerplate was greater than the actual code itself.
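For flavor, here is roughly what that OpenCL boilerplate looks like before a single kernel runs. A sketch only, with error checking omitted; the kernel string and names are illustrative:

```cpp
#include <CL/cl.h>

// The device code ships as a source string, compiled at run time.
const char* src =
    "__kernel void vec_add(__global const float* a, __global const float* b,"
    "                      __global float* c) {"
    "  int i = get_global_id(0); c[i] = a[i] + b[i]; }";

void run(cl_platform_id plat, size_t n) {
    cl_device_id dev;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    // Load and build the kernel by hand -- the part the CUDA runtime
    // API does for you behind the scenes.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vec_add", NULL);

    // ...then create buffers, clSetKernelArg for each argument, and finally:
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
}
```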

It basically boils down to convenience, in my opinion. Couple this with the fact that NVIDIA generally has the most powerful and energy-efficient cards, and it's no surprise they took the market.



> The driver api is very close to the opencl api and is very low level.

They are only realistically comparable from OpenCL 2.0 onwards. But no NVIDIA card supports anything beyond 1.2, and with that decision they basically killed OpenCL.


FWIW ROCm, which is what AMD has indicated they will be investing in moving forward, doesn't support OpenCL 2.0 either.


The open competitor to the runtime API is SYCL.
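For comparison, SYCL gives you single-source kernels like CUDA does, while sitting on top of standard OpenCL. A sketch of a vector add, assuming a SYCL 1.2-era implementation such as ComputeCpp:

```cpp
#include <CL/sycl.hpp>
#include <vector>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    {
        cl::sycl::queue q;  // picks a default device; no manual context setup
        cl::sycl::buffer<float, 1> A(a.data(), a.size());
        cl::sycl::buffer<float, 1> B(b.data(), b.size());
        cl::sycl::buffer<float, 1> C(c.data(), c.size());
        q.submit([&](cl::sycl::handler& h) {
            auto ra = A.get_access<cl::sycl::access::mode::read>(h);
            auto rb = B.get_access<cl::sycl::access::mode::read>(h);
            auto wc = C.get_access<cl::sycl::access::mode::write>(h);
            // The kernel is a lambda in the same source file, as in CUDA.
            h.parallel_for<class vec_add>(
                cl::sycl::range<1>(a.size()),
                [=](cl::sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
        });
    }  // buffers synchronize back to the host vectors here
    return 0;
}
```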


Except, how many cards are shipping production-quality SYCL drivers, or provide GPGPU SYCL graphical debuggers?


https://www.codeplay.com/products/computesuite/computecpp enables SYCL for all OpenCL devices, so Intel, AMD, Nvidia, FPGAs, a lot of things, and smartphones, which is orders of magnitude more devices than CUDA. Products targeting only Nvidia devices are mostly niche markets, which is pathetic. As for debuggers, CodeXL has been extended to support it.

Besides SYCL there's Open{MP/ACC} GPU offloading, which has become viable and portable. There's also HIP/ROCm, which transpiles to OpenCL AND CUDA (best of both worlds?) and can transpile CUDA to HIP almost fully automatically. That's how AMD ported TensorFlow to OpenCL.
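The automatic translation works because HIP mirrors the CUDA API almost one-to-one. A sketch (assumes a ROCm/HIP toolchain; kernel and function names are illustrative):

```cpp
#include <hip/hip_runtime.h>

// This kernel is byte-for-byte what the CUDA version would look like.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

void launch(const float* a, const float* b, float* c, int n) {
    // hipLaunchKernelGGL is the HIP analogue of CUDA's <<<...>>> launch;
    // the hipify tool rewrites cudaMalloc -> hipMalloc,
    // cudaMemcpy -> hipMemcpy, and so on, mostly mechanically.
    hipLaunchKernelGGL(vecAdd, dim3((n + 255) / 256), dim3(256), 0, 0,
                       a, b, c, n);
}
```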


Smartphones? Which ones?

iOS with its ageing OpenCL drivers, or the new Metal Shader drivers?

Or Android, where Google rather uses its own languages, RenderScript and Halide?

Yes, some OEMs do happen to ship non-standard Android drivers that also support OpenCL, but they require a vendor-specific SDK to be actually usable, so they're not an option versus RenderScript or Halide.

Do you happen to actually know CodePlay? They made their name creating compilers with vectorization optimizations for the PS3 and other game consoles.

Their ComputeCpp is a pivot into the GPGPU world, and they aren't doing the community edition just out of the kindness of their hearts, but rather as a path into their products.

"If you want to do things with this release, be prepared to be a pioneer. This release is pre-conformance, which means that we do not implement 100% of the SYCL specification. We currently only support Linux and two OpenCL implementations, by Intel and AMD, but wider support is coming. You may find that some unsupported implementations of OpenCL work with ComputeCpp. That's great, but we don't officially support anything else (yet). Most of the open-source libraries being ported to SYCL are not completed yet. This means that you should only check out some of these projects if you want to do some development yourself. We are building a big vision here: large, complex software highly accelerated on a wide range of processors, entirely by open standards. So, please be patient, or work with us."

Feels like it still needs to mature a little bit.

Even Intel, despite their SYCL contributions to clang (experimental release last July 31st), has been developing their own extensions in parallel, Data Parallel C++, and no one knows in what form they will be contributed back to Khronos, if at all.

Meanwhile CUDA has been developed to be language agnostic from the get-go, with out-of-the-box support for C, C++, and Fortran, and now Julia, Haskell, Java, and .NET support as well.

While Khronos kept banging the "C is good enough" drum until it was too late for vendors to actually care about SPIR-V.


Have you compared performance between your suggested solutions and what can be achieved using hardware vendor platforms? If not, then what's kind of pathetic is how quickly you dismiss the people above who say they HAVE done this before.

If you have seen something we have not when it comes to performance then please by all means share it so we can learn!



