[GPU] [R-package] Installing a GPU-enabled Build in R --- amd gpu. #6786

Open
nipnipj opened this issue Jan 13, 2025 · 7 comments

@nipnipj

nipnipj commented Jan 13, 2025

I'm using Arch Linux and I'm trying to install lightgbm in R with the following commands.
I already installed opencl-amd from AUR.

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
Rscript build_r.R --use-gpu --no-build-vignettes

Then I get the following error message:

-- Looking for CL_VERSION_3_0
-- Looking for CL_VERSION_3_0 - not found
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find OpenCL (missing: OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindOpenCL.cmake:177 (find_package_handle_standard_args)
  CMakeLists.txt:190 (find_package)

The output of clinfo --list is:

Platform #0: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Platform #1: rusticl
 `-- Device #0: AMD Radeon RX 6650 XT (radeonsi, navi23, LLVM 19.1.6, DRM 3.59)
Platform #2: AMD Accelerated Parallel Processing
 `-- Device #0: gfx1032

I'm wondering whether there is a way to pass the GPU platform and device to Rscript build_r.R when building LightGBM. Or, more generally, what is the correct way to build LightGBM in R with GPU support on an AMD GPU?

On other applications I can indicate what gpu platform and device I want to use. Thus I can confirm that the GPU is working.

@jameslamb
Collaborator

Thanks for using LightGBM. Let's try to narrow this further... are you able to build lib_lightgbm like this?

cmake -B build -S .
cmake --build build --target _lightgbm

If yes, then we'll know the issue is specific to the R package.

Either way, could you share the logs from running those commands?

jameslamb changed the title from "Installing a GPU-enabled Build in R --- amd gpu." to "[GPU] [R-package] Installing a GPU-enabled Build in R --- amd gpu." on Jan 13, 2025

@nipnipj
Author

nipnipj commented Jan 13, 2025

This is the output after using cmake -B build -S .

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find OpenCL (missing: OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake/Modules/FindOpenCL.cmake:177 (find_package_handle_standard_args)
  CMakeLists.txt:190 (find_package)
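
For anyone hitting the same "missing: OpenCL_INCLUDE_DIR" error: it means CMake's FindOpenCL module could not locate a directory containing CL/cl.h. Below is a minimal sketch of one way to check for the header and hand the result to CMake. The function name find_opencl_include_dir and the candidate directories are assumptions for illustration (/opt/rocm/include is where the header shows up later in this thread); OpenCL_INCLUDE_DIR is the variable named in the error message itself.

```shell
# Print the first directory (from the arguments) that contains CL/cl.h.
# Returns non-zero if none of the candidates has the header.
# NOTE: hypothetical helper, not part of LightGBM or CMake.
find_opencl_include_dir() {
  for dir in "$@"; do
    if [ -f "$dir/CL/cl.h" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Possible usage (candidate paths are assumptions, adjust for your system):
#   inc=$(find_opencl_include_dir /usr/include /opt/rocm/include) &&
#     cmake -B build -S . -DUSE_GPU=1 -DOpenCL_INCLUDE_DIR="$inc"
```

If the function finds nothing, the OpenCL headers are probably not installed at all, and no CMake flag will help until they are.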

@nipnipj
Author

nipnipj commented Jan 13, 2025

When I try to specify the OpenCL library path explicitly

Rscript build_r.R \
    --use-gpu \
    --no-build-vignettes \
    --opencl-library=/usr/lib/libOpenCL.so

this is the output

Error in source("install.libs.R", local = local.env) : 
  install.libs.R:143:70: unexpected string constant
142: # NOTE: build_r.R replaces the line below
143: command_line_args <- c('-DOpenCL_LIBRARY='/usr/lib/libOpenCL.so.1.0.0''

@nipnipj
Author

nipnipj commented Jan 13, 2025

I also tried cmake -B build -S . -DUSE_GPU=1 -DCMAKE_PREFIX_PATH=/opt/rocm/ which I think works.

CMake Warning (dev) at CMakeLists.txt:196 (find_package):
  Policy CMP0167 is not set: The FindBoost module is removed.  Run "cmake
  --help-policy CMP0167" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Using _mm_prefetch
-- Using _mm_malloc
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/cuysaurus/LightGBM/build

Now, cmake --build build --target _lightgbm seems to work too. I'll confirm.
It didn't:

[ 72%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/data_parallel_tree_learner.cpp.o
In file included from /opt/rocm/include/CL/cl.h:20,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/cl.hpp:19,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/config.hpp:16,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/buffer.hpp:14,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/core.hpp:18,
                 from /home/cuysaurus/LightGBM/src/treelearner/gpu_tree_learner.h:33,
                 from /home/cuysaurus/LightGBM/src/treelearner/parallel_tree_learner.h:15,
                 from /home/cuysaurus/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:9:
/opt/rocm/include/CL/cl_version.h:21:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
   21 | agma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
      |                                                                                                     ^

In file included from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/program.hpp:35,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/kernel.hpp:24,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/memory_object.hpp:16,
                 from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/buffer.hpp:17:
/home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/detail/sha1.hpp: In member function ‘boost::compute::detail::sha1::operator std::string()’:
/home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/detail/sha1.hpp:41:26: error: cannot convert ‘unsigned int [5]’ to ‘unsigned char (&)[20]’
   41 |             h.get_digest(digest);
      |                          ^~~~~~
      |                          |
      |                          unsigned int [5]
In file included from /home/cuysaurus/LightGBM/external_libs/compute/include/boost/compute/detail/sha1.hpp:18:
/usr/include/boost/uuid/detail/sha1.hpp:179:43: note:   initializing argument 1 of ‘void boost::uuids::detail::sha1::get_digest(unsigned char (&)[20])’
  179 | inline void sha1::get_digest(digest_type& digest)
      |                              ~~~~~~~~~~~~~^~~~~~
make[3]: *** [CMakeFiles/lightgbm_objs.dir/build.make:415: CMakeFiles/lightgbm_objs.dir/src/treelearner/data_parallel_tree_learner.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:96: CMakeFiles/lightgbm_objs.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:200: CMakeFiles/_lightgbm.dir/rule] Error 2
make: *** [Makefile:208: _lightgbm] Error 2
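
The error above shows the Boost.Compute copy vendored under external_libs/compute calling get_digest with an unsigned int [5], while the system header /usr/include/boost/uuid/detail/sha1.hpp expects unsigned char (&)[20]. When reporting this kind of mismatch, it can help to state which system Boost is installed. Here is a minimal sketch, assuming the standard BOOST_VERSION macro in boost/version.hpp (encoded as major * 100000 + minor * 100 + patch). The function name boost_version_from_header is hypothetical.

```shell
# Print the Boost version (major.minor.patch) defined in a boost/version.hpp file.
# $1: path to version.hpp. Exits non-zero if BOOST_VERSION is not defined there.
# NOTE: hypothetical helper for illustration.
boost_version_from_header() {
  awk '
    $1 == "#define" && $2 == "BOOST_VERSION" {
      v = $3 + 0
      # BOOST_VERSION = major * 100000 + minor * 100 + patch
      printf "%d.%d.%d\n", int(v / 100000), int(v / 100) % 1000, v % 100
      found = 1
      exit
    }
    END { exit !found }
  ' "$1"
}

# Possible usage (the path is taken from the compiler output above):
#   boost_version_from_header /usr/include/boost/version.hpp
```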

@sofiageo

@nipnipj

It looks to me that boost package for compute in Arch is broken until 1.87 is released. (I'm the opencl-amd maintainer and I tried to build it too)

@jameslamb
Collaborator

On other applications I can indicate what gpu platform and device I want to use.

Can you provide specific details about this? Maybe those things you're referring to would be useful references for LightGBM to follow, but difficult to say without specific examples.

install.libs.R:143:70: unexpected string constant

This is a known issue, tracked in #5960. If you have time and interest, we'd welcome a contribution to fix it... for both Unix-like operating systems and Windows.

It looks to me that boost package for compute in Arch is broken until 1.87 is released.

LightGBM doesn't currently support boost versions that new. We'd welcome help with that if either of you are interested:

So depending on what "broken" means, I guess you won't be able to build the OpenCL-based GPU variant of the {lightgbm} R package on Arch Linux until LightGBM supports newer versions of boost AND boost 1.87 comes out.

It would also be helpful if you could share links to relevant discussions / bug reports we could follow, to better understand what "broken" means.

I'm the opencl-amd maintainer and I tried to build it too

@sofiageo thanks for commenting! What is "it" in this statement? Some variant of LightGBM (the R package? just the shared library?) or something else?

If it's LightGBM, we'd welcome some help to improve this if you have time and interest. One other thing to note... in LightGBM's own CI, we are still using PoCL 1.8, which at this point is over 4 years old (#5596).

@nipnipj
Author

nipnipj commented Jan 19, 2025

Can you provide specific details about this? Maybe those things you're referring to would be useful references for LightGBM to follow, but difficult to say without specific examples.

I meant that I can use my GPU to run models in other tools, for example Stan. See this example (https://mc-stan.org/cmdstanr/articles/articles-online-only/opencl.html), where a platform and a device are selected via the argument opencl_ids = c(0, 0).

The available platforms and devices can be listed by running clinfo --list in a terminal.

In the case of LightGBM, using the GPU is easy with NVIDIA, but I still can't make it work with an AMD GPU.
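
For reference, LightGBM selects the OpenCL platform and device at training time rather than at build time, through the parameters device_type, gpu_platform_id and gpu_device_id. Below is a minimal sketch that turns clinfo --list output like the one posted above into those parameters. The helper name clinfo_to_lgbm_params is hypothetical, and the sketch assumes LightGBM enumerates platforms in the same order as clinfo, which is not guaranteed. It also only becomes useful once a GPU-enabled build succeeds.

```shell
# Read `clinfo --list` output on stdin and print LightGBM parameters for the
# first device of the first platform whose line contains the substring $1.
# Exits non-zero if no platform matches.
# NOTE: hypothetical helper; assumes clinfo and LightGBM order platforms alike.
clinfo_to_lgbm_params() {
  awk -v pat="$1" '
    # Platform lines look like: "Platform #2: AMD Accelerated Parallel Processing"
    $1 == "Platform" && $2 ~ /^#[0-9]+:$/ {
      platform = $2
      gsub(/[#:]/, "", platform)
      matched = (index($0, pat) > 0)
      next
    }
    # Device lines look like: "-- Device #0: gfx1032" (after a leading marker)
    matched && $2 == "Device" && $3 ~ /^#[0-9]+:$/ {
      device = $3
      gsub(/[#:]/, "", device)
      printf "device_type=gpu\ngpu_platform_id=%s\ngpu_device_id=%s\n", platform, device
      found = 1
      exit
    }
    END { exit !found }
  '
}

# Possible usage:
#   clinfo --list | clinfo_to_lgbm_params "AMD Accelerated"
```

The printed key=value lines match the format of a LightGBM CLI config file, and the same names can be passed in the params list from R or Python.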
