Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential optimization for CASE WHEN for protecting against divide by zero #11570

Open
andygrove opened this issue Jul 20, 2024 · 3 comments · May be fixed by #14200
Open

Potential optimization for CASE WHEN for protecting against divide by zero #11570

andygrove opened this issue Jul 20, 2024 · 3 comments · May be fixed by #14200
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers optimizer Optimizer rules performance Make DataFusion faster

Comments

@andygrove
Copy link
Member

andygrove commented Jul 20, 2024

Is your feature request related to a problem or challenge?

A common usage of CASE WHEN (particularly in the TPC-DS benchmark) is to protect against divide by zero errors. For example:

CASE WHEN y > 0 THEN x / y ELSE null END

The CaseExpr implementation is quite expensive and we could replace this whole expression with a divide kernel that returns null if the right hand side is zero (Rust has a div_checked function that already provides this functionality).

arrow-rs already has the following code in arrow-array/src/arithmetic.rs, which is really close to what we need. I think we just need a version that returns an Option::None instead of an Err.

#[inline]
fn div_checked(self, rhs: Self) -> Result<Self, ArrowError> {
    if rhs.is_zero() {
        Err(ArrowError::DivideByZero)
    } else {
        self.checked_div(rhs).ok_or_else(|| {
            ArrowError::ComputeError(format!(
                "Overflow happened on: {:?} / {:?}",
                self, rhs
            ))
        })
    }
}

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@andygrove andygrove added enhancement New feature or request optimizer Optimizer rules performance Make DataFusion faster good first issue Good for newcomers labels Jul 20, 2024
@irenjj
Copy link
Contributor

irenjj commented Jul 20, 2024

take

@irenjj
Copy link
Contributor

irenjj commented Jul 28, 2024

I have tried to solve this problem, but encountered some issues: Should we simplify the when_expr and then_expr in case_when_no_expr to div_checked? If it should be handled in this function, how can we recognize it as div_checked?

@andygrove
Copy link
Member Author

@irenjj I think it would be a case of writing code like this to detect this particular pattern:

            } else if when_then_expr.len() == 1 {
                let when_then = &when_then_expr[0];
                if let Some(b) = when_then.1.as_any().downcast_ref::<BinaryExpr>() {
                    if b.op() == Operator::Gt {
                        if let Some(lit) = when_then.1.as_any().downcast_ref::<Literal>() {
                            if matches!(lit.value(), ScalarValue::Int32(Some(0))) {
                                
                            }
                        }
                    }
                }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers optimizer Optimizer rules performance Make DataFusion faster
Projects
None yet
2 participants