Iterate callstack API #4033

Open
wants to merge 1 commit into
base: main

g0djan
Contributor

@g0djan g0djan commented Jan 17, 2025

This PR adds a new WAMR public API that iterates over the runtime call stack frames and executes a user-defined callback on each of them.
To make the most of it, use the following APIs inside the callback.

CAUTION: this API is not thread-safe and is not intended to be. If you need to call it from another thread, ensure that the passed exec_env is suspended.
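A minimal sketch of how a callback-driven walk like this could be used. Only the callback shape (a bool-returning function taking a user pointer and a frame) and the name wasm_iterate_callstack come from this PR. The frame struct, mock_iterate_callstack, record_frame, and the convention that returning false stops the walk are assumptions made for illustration, not the actual WAMR implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a runtime frame; not the WAMR layout. */
typedef struct mock_frame {
    struct mock_frame *prev; /* caller's frame, NULL at the bottom */
    uint32_t func_index;
} mock_frame;

/* Same shape as the callback typedef proposed in this PR. */
typedef bool (*mock_frame_callback)(void *user_data, const mock_frame *frame);

/* Hypothetical iterator: walks from the newest frame to the oldest.
 * ASSUMPTION: a false return from the callback stops the walk. */
void
mock_iterate_callstack(const mock_frame *top, mock_frame_callback cb,
                       void *user_data)
{
    for (const mock_frame *f = top; f != NULL; f = f->prev) {
        if (!cb(user_data, f))
            return;
    }
}

/* Fixed-size trace buffer, so the callback never allocates. */
#define MAX_TRACE 4
typedef struct {
    uint32_t func_indexes[MAX_TRACE];
    size_t count;
} trace_buf;

/* Records one frame; asks the iterator to stop once the buffer is full. */
bool
record_frame(void *user_data, const mock_frame *frame)
{
    trace_buf *buf = (trace_buf *)user_data;
    if (buf->count == MAX_TRACE)
        return false;
    buf->func_indexes[buf->count++] = frame->func_index;
    return buf->count < MAX_TRACE;
}
```

Writing into a caller-owned, preallocated buffer keeps the callback free of allocation and locking, which matters if the walk is triggered from a signal handler.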

Our use case

Sometimes the WAMR runtime gets stuck in production, and we have no data on where in the code compiled to WASM this happens. We currently only detect such situations from a separate native thread. To increase visibility into the problem, we developed an internal solution that requires this API in WAMR. If the separate thread finds that the WASM VM thread is stuck, it interrupts it with a user-defined signal and calls this API to collect the call stack. The main complexity is maintaining async-signal-safety and avoiding segfaults. To that end, we maintain atomic copies of exec_env, exec_env->module_inst, and exec_env->module_inst->module. Those copies are always set to NULL before the referenced memory is freed, and they are always checked for validity before a call to this API. In our use case we guarantee only the absence of crashes; we realize that the frame data we collect might be invalid because of the signal interruption. However, that is highly unlikely and is not a concern for us.
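The "atomic copy, set to NULL before free, check before use" pattern described above could be sketched as follows. This is an illustration under stated assumptions, not our internal code: mock_exec_env, publish_env, unpublish_env, and try_collect_callstack are hypothetical names, and the sketch assumes the signal handler runs on the same thread that owns and frees the environment, so the pointer cannot be freed between the check and the use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wasm_exec_env_t; not the WAMR type. */
typedef struct mock_exec_env {
    int frames_seen;
} mock_exec_env;

/* Published copy that the signal handler is allowed to read.
 * Atomic pointers are lock-free on mainstream platforms, which is
 * what makes loading them from a signal handler safe. */
static _Atomic(mock_exec_env *) g_published_env = NULL;

/* Owner thread: publish the pointer once the env is fully created. */
void
publish_env(mock_exec_env *env)
{
    atomic_store(&g_published_env, env);
}

/* Owner thread: clear the copy BEFORE freeing the underlying memory. */
void
unpublish_env(void)
{
    atomic_store(&g_published_env, NULL);
}

/* Handler side: returns false when there is nothing safe to walk.
 * ASSUMPTION: runs on the thread that owns the env (see lead-in). */
bool
try_collect_callstack(void)
{
    mock_exec_env *env = atomic_load(&g_published_env);
    if (env == NULL)
        return false;
    env->frames_seen++; /* placeholder for the call stack iteration */
    return true;
}
```

The same guard would be repeated for module_inst and module before any of them is dereferenced.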

Have we tried existing WAMR APIs for our use case?

Yes, we've tried wasm_cluster_suspend_thread and wasm_runtime_terminate, as suggested by the maintainers.

  1. In our production environment the runtime often recovers from being stuck, so wasm_runtime_terminate is not a good option for reporting the call stack.
  2. wasm_cluster_suspend_thread doesn't suit us either. Even if it did, we'd still need an API to iterate over stack frames.

@@ -10,6 +10,7 @@
#include "../common/wasm_runtime_common.h"
#include "../common/wasm_memory.h"
#include "../interpreter/wasm_runtime.h"
#include <stdbool.h>
Collaborator

WAMR defines its own boolean type; you should use that instead of stdbool.

* For more details check wasm_iterate_callstack in wasm_export.h
*/
if (!is_tiny_frame(exec_env)) {
//TODO: support standard frames
Collaborator

Should we print an error/warning here? Or, ideally, can we provide an implementation of that? I think there's already code in this file that iterates through both types of frames, so it can be used as a reference.

return;
}

AOTTinyFrame* frame = (AOTTinyFrame*)(top - sizeof(AOTTinyFrame));
Collaborator

I wonder whether we want to iterate from top to bottom or from bottom to top. The bottom pointer is constant, but the top one might change during the signal; so if you iterate from the top and you're unlucky, you might actually start at some very high address because the pointer will be invalid.
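One possible guard against a stale top pointer is to validate it against the stack bounds before dereferencing anything. The sketch below is only an illustration: mock_tiny_frame and walk_tiny_frames are hypothetical, and it assumes fixed-size frames laid out contiguously from bottom, which is not necessarily how AOTTinyFrame is stored in WAMR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size frame record; not WAMR's AOTTinyFrame. */
typedef struct {
    uint32_t func_index;
    uint32_t ip_offset;
} mock_tiny_frame;

/* Walks frames from top down to bottom and returns how many were seen.
 * Returns -1 without touching memory if top lies outside
 * [bottom, boundary] or is not a whole number of frames above bottom,
 * which is what a stale or torn top pointer would look like.
 * On success *oldest_func_index holds the bottom-most frame's index. */
int
walk_tiny_frames(const uint8_t *bottom, const uint8_t *top,
                 const uint8_t *boundary, uint32_t *oldest_func_index)
{
    /* Compare as integers: the candidate may not point into the stack. */
    uintptr_t lo = (uintptr_t)bottom;
    uintptr_t hi = (uintptr_t)boundary;
    uintptr_t t = (uintptr_t)top;

    if (t < lo || t > hi)
        return -1;
    if ((t - lo) % sizeof(mock_tiny_frame) != 0)
        return -1;

    int count = 0;
    while (top > bottom) {
        top -= sizeof(mock_tiny_frame);
        const mock_tiny_frame *frame = (const mock_tiny_frame *)top;
        *oldest_func_index = frame->func_index;
        count++;
    }
    return count;
}
```

A check like this only rules out crashes from a wild pointer; it cannot tell whether an in-range top pointer still describes a consistent set of frames.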

@@ -864,6 +867,37 @@ wasm_runtime_create_exec_env(wasm_module_inst_t module_inst,
WASM_RUNTIME_API_EXTERN void
wasm_runtime_destroy_exec_env(wasm_exec_env_t exec_env);


typedef bool (*wasm_frame_callback)(void*, wasm_frame_ptr_t);
Collaborator

It'd be good to document the meaning of the bool return type.

* - exec_env->module_inst
* - exec_env->module_inst->module
*
* Note for devs: please refrain from such modifications inside of this call
Collaborator

I think it's fine to just keep it in the .c file, no need to expose it publicly.

@@ -26,6 +26,8 @@ typedef struct WASMInterpFrame {
/* Instruction pointer of the bytecode array. */
uint8 *ip;

uint32 func_index;
Collaborator

Why do we need it? We already have the function member above.

* - exec_env->module_inst->module
*
* Note for devs: please refrain from such modifications inside of this call
* - any allocations/freeing memory

I think it's up to the developer to provide a safe environment for these actions. I might just as well call this from within wasm code via a native API, simply to print the call stack at an arbitrary point, where it's perfectly safe to malloc anything. I think the only thing this API needs to document/ensure is that traversing the call stack is async-signal-safe.

3 participants