A categorical feature with exactly 32k+1 categories, for any positive integer k, causes a fatal exception with device_type="cuda". The same data trains fine with device_type="cpu".
Reproducible example
import lightgbm as lgb
import numpy as np
X = np.random.randint(0, 97, (1000, 1)) # 97, or 32k+1 for any positive integer k triggers the bug
y = np.random.uniform(-1, 1, 1000)
lgb.train({"device_type": "cuda"}, lgb.Dataset(X, y, categorical_feature=[0]))
Output:
[LightGBM] [Warning] Using sparse features with CUDA is currently not supported.
[LightGBM] [Info] Total Bins 97
[LightGBM] [Info] Number of data points in the train set: 1000, number of used features: 1
[LightGBM] [Info] Start training from score -0.009523
[LightGBM] [Fatal] [CUDA] invalid argument .../LightGBM4.5.0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu 235
---------------------------------------------------------------------------
LightGBMError Traceback (most recent call last)
Cell In[2], line 5
3 X = np.random.randint(0, 97, (1000, 1)) # 97, or 32k+1 for any positive integer k triggers the bug
4 y = np.random.uniform(-1, 1, 1000)
----> 5 lgb.train({'device_type': 'cuda'}, lgb.Dataset(X, y, categorical_feature=[0]))
File .../lib/python3.12/site-packages/lightgbm/engine.py:307, in train(params, train_set, num_boost_round, valid_sets, valid_names, feval, init_model, feature_name, categorical_feature, keep_training_booster, callbacks)
295 for cb in callbacks_before_iter:
296 cb(
297 callback.CallbackEnv(
298 model=booster,
(...)
304 )
305 )
--> 307 booster.update(fobj=fobj)
309 evaluation_result_list: List[_LGBM_BoosterEvalMethodResultType] = []
310 # check evaluation result.
File .../lib/python3.12/site-packages/lightgbm/basic.py:4135, in Booster.update(self, train_set, fobj)
4133 if self.__set_objective_to_none:
4134 raise LightGBMError("Cannot update due to null objective function.")
-> 4135 _safe_call(
4136 _LIB.LGBM_BoosterUpdateOneIter(
4137 self._handle,
4138 ctypes.byref(is_finished),
4139 )
4140 )
4141 self.__is_predicted_cur_iter = [False for _ in range(self.__num_dataset)]
4142 return is_finished.value == 1
File .../lib/python3.12/site-packages/lightgbm/basic.py:296, in _safe_call(ret)
288 """Check the return value from C API call.
289
290 Parameters
(...)
293 The return value from C API calls.
294 """
295 if ret != 0:
--> 296 raise LightGBMError(_LIB.LGBM_GetLastError().decode("utf-8"))
LightGBMError: [CUDA] invalid argument .../LightGBM4.5.0/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu 235
Tried on multiple GPUs (all on the same machine). I happened to have 97 categories in my use case. I then ran a loop testing every number of categories up to 100 and found that 33 and 65 fail as well.
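The loop mentioned above could look roughly like the sketch below. It is not the reporter's actual script: the helper names `train_once` and `failing_category_counts` are made up for illustration, the categories are cycled deterministically so that every level is guaranteed to appear (the reproducible example samples them with np.random.randint), and training is limited to one boosting round with logging silenced. It also assumes the process stays usable after a failed CUDA call, which the reporter's loop result suggests but which is not guaranteed.

```python
from typing import Callable, Dict, Iterable


def train_once(n_categories: int, device_type: str = "cuda") -> None:
    """Train one boosting round on a single categorical feature with n_categories levels."""
    # Imported lazily so the sweep helper below can be used without these packages.
    import lightgbm as lgb
    import numpy as np

    n_rows = 1000
    # Assumption: cycle through the levels so each category is present at least once.
    X = (np.arange(n_rows) % n_categories).reshape(-1, 1)
    y = np.random.default_rng(0).uniform(-1, 1, n_rows)
    lgb.train(
        {"device_type": device_type, "verbose": -1},
        lgb.Dataset(X, y, categorical_feature=[0]),
        num_boost_round=1,
    )


def failing_category_counts(
    train: Callable[[int], None], counts: Iterable[int]
) -> Dict[int, str]:
    """Return {n_categories: error message} for every count where train(n) raises."""
    failures: Dict[int, str] = {}
    for n in counts:
        try:
            train(n)
        except Exception as exc:  # in the traceback above this is a LightGBMError
            failures[n] = str(exc)
    return failures


# Example call (needs a CUDA-enabled LightGBM build):
# print(sorted(failing_category_counts(train_once, range(2, 101))))
```

Passing the training function as an argument keeps the sweep independent of the device, so the same helper can be pointed at device_type="cpu" (for example via `lambda n: train_once(n, "cpu")`) to confirm that the CPU path does not fail for the same counts.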
jameslamb changed the title on Jan 12, 2025
from: Categoric feature with 32k+1 n.o. categories causes fatal exception with device_type="cuda"
to: [CUDA] Categoric feature with 32k+1 n.o. categories causes fatal exception with device_type="cuda"
Environment info
LightGBM version or commit hash: 4.5.0
Command(s) you used to install LightGBM
Ubuntu 24.04 LTS
Nvidia RTX 4090 GPU