[CUDA] Categoric feature with 32k+1 n.o. categories causes fatal exception with `device_type="cuda"` #6784

zansibal · 2025-01-11T22:03:41Z

Description

A categorical feature with exactly 32k+1 categories, for any positive integer k, causes a fatal exception with device_type="cuda". The same code works fine with device_type="cpu".

Reproducible example

import lightgbm as lgb
import numpy as np
X = np.random.randint(0, 97, (1000, 1)) # 97, or 32k+1 for any positive integer k triggers the bug
y = np.random.uniform(-1, 1, 1000)
lgb.train({"device_type": "cuda"}, lgb.Dataset(X, y, categorical_feature=[0]))

Output:

[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Info] Total Bins 97
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 1
[LightGBM] [Info] Start training from score -0.009523
[LightGBM] [Fatal] [CUDA] invalid argument .../LightGBM4.5.0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu 235

---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
Cell In[2], line 5
      3 X = np.random.randint(0, 97, (1000, 1)) # 97, or 32k+1 for any positive integer k triggers the bug
      4 y = np.random.uniform(-1, 1, 1000)
----> 5 lgb.train({'device_type': 'cuda'}, lgb.Dataset(X, y, categorical_feature=[0]))

File .../lib/python3.12/site-packages/lightgbm/engine.py:307, in train(params, train_set, num_boost_round, valid_sets, valid_names, feval, init_model, feature_name, categorical_feature, keep_training_booster, callbacks)
    295 for cb in callbacks_before_iter:
    296     cb(
    297         callback.CallbackEnv(
    298             model=booster,
   (...)
    304         )
    305     )
--> 307 booster.update(fobj=fobj)
    309 evaluation_result_list: List[_LGBM_BoosterEvalMethodResultType] = []
    310 # check evaluation result.

File .../lib/python3.12/site-packages/lightgbm/basic.py:4135, in Booster.update(self, train_set, fobj)
   4133 if self.__set_objective_to_none:
   4134     raise LightGBMError("Cannot update due to null objective function.")
-> 4135 _safe_call(
   4136     _LIB.LGBM_BoosterUpdateOneIter(
   4137         self._handle,
   4138         ctypes.byref(is_finished),
   4139     )
   4140 )
   4141 self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
   4142 return is_finished.value == 1

File .../lib/python3.12/site-packages/lightgbm/basic.py:296, in _safe_call(ret)
    288 """Check the return value from C API call.
    289 
    290 Parameters
   (...)
    293     The return value from C API calls.
    294 """
    295 if ret != 0:
--> 296     raise LightGBMError(_LIB.LGBM_GetLastError().decode("utf-8"))

LightGBMError: [CUDA] invalid argument .../LightGBM4.5.0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu 235

Environment info

LightGBM version or commit hash: 4.5.0

Command(s) you used to install LightGBM

sudo apt install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
sudo apt install nvidia-cuda-toolkit
git clone --recursive https://github.com/microsoft/LightGBM LightGBM-4.5.0
cd LightGBM-4.5.0
git reset --hard 3f7e6081275624edfca1f9b3096bea7a81a744ed # version 4.5.0
mkdir build
cd build
cmake -DUSE_GPU=1 -DUSE_CUDA=1 -DCMAKE_C_COMPILER=/usr/bin/gcc-12 -DCMAKE_CXX_COMPILER=/usr/bin/g++-12 ..
make -j$(nproc)
cd ..
sudo apt install python3-pip
pip install setuptools numpy scipy scikit-learn -U
sh ./build-python.sh install --precompile

Ubuntu 24.04 LTS
Nvidia RTX 4090 GPU

Additional Comments

Tried on multiple GPUs (all in the same machine). I happened to have 97 categories in my use case. I then ran a loop testing every number of categories up to 100 and found that 33 and 65 fail as well.
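
The loop mentioned above is not included in the report. A minimal sketch of what such a scan could look like follows. The names `find_failing_category_counts` and `train_on_cuda` are hypothetical helpers introduced here for illustration, not part of LightGBM. The sketch assumes the failure surfaces as a Python exception (as in the traceback above, where it is a LightGBMError) and, unlike the original example, builds the column so that every category is guaranteed to appear at least once.

```python
import numpy as np


def find_failing_category_counts(train_fn, max_categories=100, n_rows=1000, seed=0):
    """Return the category counts in [2, max_categories] for which train_fn raises.

    train_fn is any callable taking (X, y); it is injected so the scan itself
    does not depend on a CUDA build of LightGBM being available.
    """
    rng = np.random.default_rng(seed)
    failing = []
    for n_cat in range(2, max_categories + 1):
        # Cycle through 0..n_cat-1 so every category occurs, then shuffle the rows.
        X = rng.permutation(np.arange(n_rows) % n_cat).reshape(-1, 1)
        y = rng.uniform(-1, 1, n_rows)
        try:
            train_fn(X, y)
        except Exception:  # LightGBMError is a subclass of Exception
            failing.append(n_cat)
    return failing


def train_on_cuda(X, y):
    """Hypothetical wrapper around the call from the reproducible example."""
    import lightgbm as lgb  # imported lazily; requires a CUDA-enabled build

    lgb.train({"device_type": "cuda"}, lgb.Dataset(X, y, categorical_feature=[0]))


# Usage on a machine with a CUDA build of LightGBM:
# print(find_failing_category_counts(train_on_cuda))
```

Passing the training call in as an argument keeps the scan independent of the device, so the same loop can be pointed at a CPU wrapper to confirm that no category count fails there.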


jameslamb changed the title ~~Categoric feature with 32k+1 n.o. categories causes fatal exception with device_type="cuda"~~ [CUDA] Categoric feature with 32k+1 n.o. categories causes fatal exception with device_type="cuda" Jan 12, 2025

jameslamb added question bug and removed question labels Jan 12, 2025
