Lambda Snapstart kafka connection errors #2715
Comments
Our Kafka support does not support SnapStart or CRaC. The way Kafka works makes it very hard to snapshot. For safety reasons, I would recommend initializing it only after the restore.
Thanks for your reply! There is an initial apache/kafka#13619 which tries to handle CRaC but it looks like there is not that much interest.
Exactly, this is what I want. I do not need a snapshot of a working Kafka client; I need a method to call on the Kafka client so that it reconnects and/or verifies its current connections. This would remove old connections that are gone (because they existed during the snapshot) and create new ones. Can you point me to a method that I could call in the afterRestore method?
You cannot use reactive messaging, but you can create a low-level Kafka client in afterRestore, or create a lazy producer and not use it during the snapshot phase (so basically, initialize it during the first HTTP call).
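For illustration, the "lazy producer" idea can be sketched with plain Java and no Kafka dependency. `LazyClient` and the string "connection" below are hypothetical stand-ins, not part of Quarkus, SmallRye, or the Kafka client. The factory would be where a real producer gets constructed. The point is only that nothing is created until the first call to `get()`, so a snapshot taken before that call contains no connection.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Small demo: the client is not created during "init", only on first use.
class LazyClientDemo {
    public static void main(String[] args) {
        AtomicInteger connections = new AtomicInteger();
        // Stand-in for constructing a real producer; here it only counts creations.
        LazyClient<String> producer =
                new LazyClient<>(() -> "connection-" + connections.incrementAndGet());

        // Init/snapshot phase: nothing has touched the client yet.
        System.out.println("initialized before first use: " + producer.isInitialized());

        // First request after restore: the client is created now and then reused.
        System.out.println("first send uses: " + producer.get());
        System.out.println("second send uses: " + producer.get());
    }
}

/** Hypothetical helper: creates the wrapped client on first use instead of at startup. */
final class LazyClient<T> {
    private final Supplier<T> factory;
    private T instance; // stays null until get() is first called

    LazyClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the client, creating it on the first call. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** True once the client has been created. */
    synchronized boolean isInitialized() {
        return instance != null;
    }
}
```

The trade-off is that the first request after a restore pays the connection cost, in exchange for a snapshot that holds no stale sockets.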
Hm yeah, but I still want to use this dependency...
If it's only to produce, you can use the lazy feature (@ogunalp it should delay the initialization of the producer, right?)
Indeed, I forgot about the |
Thanks, I'll try it with |
lazy-client worked! Thanks. The snapshot is now created without a Kafka connection, so the exception no longer occurs.
Sorry, closed it. @ozangunalp mentioned a test with pausable-channels.
@hamburml is there a test repository that you can share?
@ozangunalp not right now. I try to prepare one this evening. My service only writes into a |
Describe the bug
Copied from quarkusio/quarkus#42286; maybe this is a better place for it :)
Hi,
We use SnapStart on our Quarkus lambdas. Some of them use smallrye-messaging to write messages to or receive messages from Kafka. This works as expected, but unfortunately our logs contain warnings that the connection to a Kafka node was lost, due to either an auth error or firewall blocking.
AFAIK, during the init phase the whole memory of a started Quarkus lambda is stored, and when the lambda is reused it is reloaded into memory to skip the init phase. That also means that pooled connections are "stored" but are in reality already closed.
So I thought I simply needed to close all open Kafka connections before the snapshot is created. I did this with an org.crac.Resource and its beforeCheckpoint method. The warnings in the log are now gone, but it looks like no new connections are initiated, and therefore all messages sent via a channel fail. I also tried KafkaProducer::flush, but that didn't help.
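The behaviour described above can be reproduced in a minimal, dependency-free sketch. Everything here is a hypothetical model: `SnapshotHooks` only imitates the two callbacks of `org.crac.Resource` (the real methods also take a `Context` argument and may throw), and `FakeConnection` stands in for a pooled connection. It is not how the Kafka client behaves internally. It only shows that closing in the checkpoint hook has to be paired with re-creating in the restore hook, otherwise every later send fails.

```java
// Demo: closing before the checkpoint breaks sends until something reconnects.
class RestoreDemo {
    public static void main(String[] args) {
        ReconnectingClient client = new ReconnectingClient();

        client.beforeCheckpoint(); // connection closed, snapshot would be taken here
        try {
            client.send("first");
        } catch (IllegalStateException e) {
            System.out.println("send without restore hook failed: " + e.getMessage());
        }

        client.afterRestore(); // a fresh connection is created
        System.out.println(client.send("second"));
    }
}

/** Hypothetical stand-in for a pooled network connection. */
final class FakeConnection {
    private boolean open = true;

    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }
}

/** Hypothetical, simplified hooks modelled on the two callbacks of org.crac.Resource. */
interface SnapshotHooks {
    void beforeCheckpoint();

    void afterRestore();
}

final class ReconnectingClient implements SnapshotHooks {
    private FakeConnection connection = new FakeConnection();

    @Override
    public void beforeCheckpoint() {
        // Close so the snapshot does not capture a socket that will be dead on restore.
        connection.close();
    }

    @Override
    public void afterRestore() {
        // Without this step the closed connection would be reused forever.
        connection = new FakeConnection();
    }

    String send(String message) {
        if (!connection.isOpen()) {
            throw new IllegalStateException("connection is closed");
        }
        return "sent " + message;
    }
}
```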
Any ideas?
I found quarkusio/quarkus#31401 which is the same issue but with database connections.