[Bug]: Inconsistent Data Displayed in Flamegraph #2483

Closed
mossini-smeup opened this issue Oct 28, 2024 · 7 comments · Fixed by #2571

Comments

@mossini-smeup

What happened?

When analyzing the trace flamegraph, some data appears to be inaccurate. Specifically, when there are two spans at the same level in the flamegraph, switching to the "trace timeline" view and then returning to the flamegraph causes the data to change. The duration of the spans seems to increase upon revisiting the flamegraph.

Steps to reproduce

  1. Open the trace flamegraph for a trace containing two spans at the same level.
  2. Note the duration and positioning of the spans.
  3. Switch to the "trace timeline" view.
  4. Return to the flamegraph view.
  5. Observe that the duration and/or positioning of the spans have changed.

Expected behavior

Switching between views should not change the flamegraph display. Span durations and positions should stay the same on every visit.

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

v1.53.0

SDK

Jaeger v1.53.0
Commit b620f0e
Build 2024-01-08T18:05:40Z
Jaeger UI v1.37.0

Pipeline

No response

Storage backend

No response

Operating system

No response

Deployment model

Kubernetes

Deployment configs

No response

@yurishkuro
Member

Can you attach a trace that reproduces the bug? You can use anonymizer from the main repository to clean up sensitive data.

@mossini-smeup
Author

Here's the trace in JSON format: trace.zip

@MAVRICK-1

@yurishkuro can I work on this issue?

@yurishkuro
Member

@MAVRICK-1 sure, feel free, you don't have to ask

@Zen-cronic
Contributor

Zen-cronic commented Jan 8, 2025

here's my finding:

Visual Trace

There's a pattern to how the span gets its data modified:

  1. Take note of the span with 8.17s duration. (Its labelDetail is me-java-rpgle-std::SELECT postgres.MU24020F)
    Image

  2. After navigating to Trace Flamegraph, you can see the duration of another span (me-java-rpgle-std::CHAIN Execution): 0.10min.
    Image

  3. Go back to the timeline view, and the duration of the second span (CHAIN Execution) has increased to 12.14s (start time stays the same).
    Image

  4. Back to the flamegraph and the duration has increased to 0.30min (with the span bar elongating):
    Image

  5. With each navigation between the timeline and flamegraph view, this duration value keeps increasing:

  • Timeline view (s): 8.17 -> 12.14 -> 24.28 -> 36.42 -> 48.55 -> ...
  • Flamegraph view (min): 0.10 -> 0.30 -> 0.51 -> 0.71 -> 0.91 -> ...

Interestingly, this bug doesn't happen when switching between the other graphs/tables, only when the Flamegraph is involved. This led to the hypothesis that something in the flamegraph-related logic keeps updating the span's duration.

Investigation

So I logged the convertedProfile variable that holds the flamegraph data via a simple log in the TraceFlamegraph component.

  useEffect(() => {
    console.log('convertedProfile:', convertedProfile);
  }, [convertedProfile]);

With each navigation to the Trace Flamegraph view page, the convertedProfile object is logged. At first glance, it looks like the same object is being logged, but I diffed the objects with each render. Here's what's changed:

The objects are of Profile type from @pyroscope/models/src.
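
One caveat when comparing objects across renders: if the library mutates the nested arrays in place, a logged reference will show the latest values, so each render needs its own copy before diffing. The following is a minimal sketch of that idea, not the code used in this investigation. `snapshotLevels` and `diffLevels` are hypothetical helpers, and the toy `profile` object only imitates the `flamebearer.levels` shape.

```javascript
// Hypothetical helpers for illustration; not part of Jaeger UI or Pyroscope.

// Copy the nested `levels` arrays so later in-place mutations cannot alter the snapshot.
function snapshotLevels(profile) {
  return profile.flamebearer.levels.map(row => row.slice());
}

// Report changed elements as { rowIndex: { columnIndex: newValue } }.
function diffLevels(before, after) {
  const changes = {};
  after.forEach((row, i) => {
    row.forEach((value, j) => {
      if (!before[i] || before[i][j] !== value) {
        (changes[i] = changes[i] || {})[j] = value;
      }
    });
  });
  return changes;
}

// Toy data standing in for a real Profile object.
const profile = { flamebearer: { levels: [[0, 10, 10, 0], [0, 4, 4, 1]] } };
const firstRender = snapshotLevels(profile);
profile.flamebearer.levels[1][1] += 3; // simulate a library mutating in place
const secondRender = snapshotLevels(profile);

// Logs the single changed element: row 1, column 1, new value 7.
console.log(diffLevels(firstRender, secondRender));
```

The output has the same shape as the diffs below: outer keys are indices into levels, inner keys are element indices, and values are the new numbers.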

Diff

A is the object on the first Flamegraph render, B is on the second render.
The levels prop is a number[][]. In the array at index 7, the elements at indices 1, 2, 5, and 6 increase to the values shown.
In the array at index 8, the elements at indices 4 and 8 increase to the values shown.

updatedDiff A & B: {
  flamebearer: {
    levels:  {
      '7': {
        '1': 59517368,
        '2': 51352103,
        '5': 115280680,
        '6': 115276374
      },
      '8': { '4': 51352103, '8': 115276374 }
    }
  }
}

B is the object on the second Flamegraph render, C is on the third render.
The same elements are updated as above, but to even higher values.

updatedDiff B & C: {
  flamebearer:  {
    levels: {
      '7': {
        '1': 64921126,
        '2': 56755861,
        '5': 127414376,
        '6': 127410070
      },
      '8': { '4': 56755861, '8': 127410070 }
    }
  }
}

Analysis

What's worth noting is that the increase from the previous render to the current render is constant (one of 2 different values, depending on the nested indices).

Increase by 5403758

  • B.levels.7.1 - A.levels.7.1 = 59517368 - 54113610 = 5403758.
  • B.levels.7.2 - A.levels.7.2 = 5403758.
  • B.levels.8.4 - A.levels.8.4 = 5403758.

Increase by 12133696

  • B.levels.7.5 - A.levels.7.5 = 115280680 - 103146984 = 12133696.
  • B.levels.7.6 - A.levels.7.6 = 12133696.
  • B.levels.8.8 - A.levels.8.8 = 12133696.

Generalizing, the diff between any two consecutive renders N-1 and N has this shape:

updatedDiff N-1 & N: {
  flamebearer: {
    levels: {
      '7': {
        '1': xx, // 5403758 difference
        '2': xx, // 5403758 difference
        '5': xx, // 12133696 difference
        '6': xx // 12133696 difference
      },
      '8': {
        '4': xx, // 5403758 difference
        '8': xx // 12133696 difference
      }
    }
  }
}
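
The arithmetic can be re-checked from the numbers quoted in the diffs above. This is a small sketch that hard-codes those values. The "level.element" key format and the `deltas` helper are made up for this illustration, and only the two first-render (A) values that appear in the thread are included.

```javascript
// Values copied from the diffs above; keys are "levelIndex.elementIndex".
// Only two first-render (A) values are quoted in the thread.
const A = { '7.1': 54113610, '7.5': 103146984 };
const B = {
  '7.1': 59517368, '7.2': 51352103, '7.5': 115280680,
  '7.6': 115276374, '8.4': 51352103, '8.8': 115276374,
};
const C = {
  '7.1': 64921126, '7.2': 56755861, '7.5': 127414376,
  '7.6': 127410070, '8.4': 56755861, '8.8': 127410070,
};

// Per-element increase between two renders, for keys present in both.
function deltas(prev, next) {
  const out = {};
  for (const key of Object.keys(next)) {
    if (key in prev) out[key] = next[key] - prev[key];
  }
  return out;
}

console.log('A -> B', deltas(A, B));
console.log('B -> C', deltas(B, C));
```

Both steps give 5403758 for elements 7.1, 7.2 and 8.4, and 12133696 for elements 7.5, 7.6 and 8.8, which matches the constant per-render increase described above.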

Conclusion

Therefore, the culprit is either FlamegraphRenderer or convertJaegerTraceToProfile: one of them is mutating the nested array elements in levels. Both come from the @pyroscope/flamegraph package.

@Zen-cronic
Contributor

this issue could be part of #2534

@yurishkuro
Member

@Zen-cronic would you like to attempt a fix? It seems the easiest solution is to clone the trace before passing to a 3rd party library like pyroscope if it's already known to mutate the data. That would solve the back & forth problem, but will not solve the issue with pyroscope showing incorrect number (8s > .10m). For the latter, maybe it's a bug in the library that can be fixed by upgrading to the latest version (auto-upgrade keeps failing).

Overall though, pyroscope is no longer supported and we should move off of it.
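
A rough sketch of the clone-before-passing idea, under stated assumptions: `mutatingConvert` is a stand-in that only models the symptom and is not Pyroscope's actual code, `cloneTrace` is a hypothetical helper, and the trace object is toy data. The JSON round trip works for plain JSON-like data. A dedicated deep-clone utility could be swapped in.

```javascript
// Stand-in for a third-party converter that mutates its input in place.
// This is NOT Pyroscope's real implementation, only a model of the symptom.
function mutatingConvert(trace) {
  for (const span of trace.spans) {
    span.duration += 1000; // in-place write that leaks back to the caller
  }
  return { total: trace.spans.reduce((sum, s) => sum + s.duration, 0) };
}

// Deep copy via a JSON round trip; adequate for plain JSON-like trace data.
function cloneTrace(trace) {
  return JSON.parse(JSON.stringify(trace));
}

const trace = {
  spans: [
    { spanID: 'a', duration: 5000 },
    { spanID: 'b', duration: 8000 },
  ],
};

// Each "visit" converts a fresh copy, so the original never drifts.
const firstVisit = mutatingConvert(cloneTrace(trace));
const secondVisit = mutatingConvert(cloneTrace(trace));

console.log(firstVisit.total === secondVisit.total); // true
console.log(trace.spans.map(s => s.duration)); // original durations, untouched
```

As noted in the comment above, this would only address the drift between visits. It would not change whatever number the library computes on the first render.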
