
Confusion about dispersion entropy outcomes and probabilities #433

Open
rusandris opened this issue Jan 9, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@rusandris
Contributor

Hi!
I came across a strange behaviour related to the Dispersion outcome space.
Running the example given in Rostaghi, M. and Azami, H. (2016), I get the same symbolic time series as presented in the paper.

using ComplexityMeasures
x = [9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10]
d = Dispersion(; c = 3, m = 2, τ = 1)
codify(d, x) # [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]

However, when I try to calculate the probabilities of the dispersion patterns with m=2,

probs, outc = probabilities_and_outcomes(d, x)
probs
 Probabilities{Float64,1} over 7 outcomes
 [1, 1]  0.09090909090909091
 [1, 2]  0.18181818181818182
 [1, 3]  0.09090909090909091
 [2, 2]  0.09090909090909091
 [2, 3]  0.18181818181818182
 [3, 1]  0.2727272727272727
 [3, 3]  0.09090909090909091

the output contains patterns that aren't even observed in the paper's example ([1, 2]), and others appear with incorrect probabilities ([1, 3]). Is this due to some difference in the definitions/implementation? What am I missing?
Thanks

@Datseris Datseris added the bug Something isn't working label Jan 9, 2025
@Datseris
Member

Datseris commented Jan 9, 2025

cc @kahaaga

(I haven't read the paper)

@kahaaga
Member

kahaaga commented Jan 10, 2025

@rusandris Hey there!

I did a bit of digging. The implementation here differs from the original implementation in step 2 of the paper. They use positive embedding lags; we use negative embedding lags. Otherwise the implementation is identical.
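To see concretely what the sign flip does, here is a minimal sketch in plain Julia (no package calls), using the codified symbols from the example above, for m = 2:

```julia
symbols = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]  # codify(d, x) from above
τ = 1
# Positive lag (Rostaghi & Azami): the pattern at time t is (s_t, s_{t+τ})
pos = [[symbols[t], symbols[t + τ]] for t in 1:length(symbols) - τ]
# Negative lag (this package): the pattern at time t is (s_t, s_{t-τ})
neg = [[symbols[t], symbols[t - τ]] for t in 1 + τ:length(symbols)]
# For m = 2, each negative-lag pattern is the elementwise reverse of the
# positive-lag pattern over the same window, e.g. the paper's [3, 1]
# shows up here as [1, 3].
@assert neg == reverse.(pos)
```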

Sanity check

The critical line is the definition of τs in counts_and_outcomes(o::Dispersion, ...), where we step by -τ instead of +τ. (I added a few @show statements below to track the codified symbols and dispersion patterns.)

function counts_and_outcomes(o::Dispersion, x::AbstractVector{<:Real})
    N = length(x)
    @show symbols = codify(o, x)
    # We must use genembed, not embed, to make sure the zero lag is included
    m, τ = o.m, o.τ
    τs = tuple((x for x in 0:-τ:-(m-1)*τ)...)  # Rostaghi uses 0:+τ:+(m-1)*τ 
    @show dispersion_patterns = genembed(symbols, τs, ones(m)).data 
    cts = fasthist!(dispersion_patterns) # This sorts `dispersion_patterns`
    outs = unique!(dispersion_patterns) # Therefore, outcomes are the sorted patterns.
    c = Counts(cts, (outs, ))
    return c, outcomes(c)
end

If you just put a minus sign in front of your desired lag, you get precisely what they do in the original paper.

julia> x=[9,8,1,12,5,-3,1.5,8.01,2.99,4,-1,10]; d = Dispersion(; c = 3, m = 2, τ = -1); # use -1 lag to get Rostaghi behavior

julia> probs,outc = allprobabilities_and_outcomes(d,x); probs
symbols = codify(o, x) = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]
dispersion_patterns = (genembed(symbols, τs, ones(m))).data = SVector{2, Int64}[[3, 3], [3, 1], [1, 3], [3, 2], [2, 1], [1, 1], [1, 3], [3, 2], [2, 2], [2, 1], [1, 3]]
 Probabilities{Float64,1} over 9 outcomes
 [1, 1]  0.09090909090909091
 [1, 2]  0.0
 [1, 3]  0.2727272727272727
 [2, 1]  0.18181818181818182
 [2, 2]  0.09090909090909091
 [2, 3]  0.0
 [3, 1]  0.09090909090909091
 [3, 2]  0.18181818181818182
 [3, 3]  0.09090909090909091

That'll also give you the dispersion entropy for their example.

julia> information(Shannon(base = ℯ), d, x)
symbols = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]
dispersion_patterns = SVector{2, Int64}[[3, 3], [3, 1], [1, 3], [3, 2], [2, 1], [1, 1], [1, 3], [3, 2], [2, 2], [2, 1], [1, 3]]
1.8462202193216335

Reasoning

The reason we use the negative sign on the lag here is compatibility with Associations.jl, where embedding vectors must be constructed with negative τ for transfer entropy computations and the like to be correct. Depending on your application, positive or negative embedding lags may work equally well.

For large enough real data sets, this implementation detail likely won't matter anyway if the goal is to compute the dispersion entropy, since this quantity only cares about the relative frequencies of dispersion patterns/embedding vectors.
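In fact, for the Shannon dispersion entropy specifically the lag sign should not matter even on short series: every negative-lag pattern is the reverse of a positive-lag pattern over the same window, so the two count distributions are permutations of each other, and Shannon entropy is invariant under relabeling outcomes. A quick sketch of this check (assuming ComplexityMeasures is loaded, using the API shown above):

```julia
using ComplexityMeasures
x = [9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10]
# Package convention (negative internal lag) vs. the paper's convention
h_neg = information(Shannon(base = ℯ), Dispersion(; c = 3, m = 2, τ = 1), x)
h_pos = information(Shannon(base = ℯ), Dispersion(; c = 3, m = 2, τ = -1), x)
# The pattern distributions are reverses of each other, so the
# entropies coincide.
@assert h_neg ≈ h_pos
```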

Solution

This isn't strictly a bug, since the docstring says this outcome space is based on Rostaghi et al., not that it implements it precisely. The choice of sign for the embedding lag is somewhat arbitrary anyway; in fact, the original paper doesn't elaborate on its choice of a positive lag at all, as far as I can see from quickly skimming it again now.

I think the solution here is to add a documentation note explaining this implementation discrepancy. That should be enough to clear up any future confusion, I guess?

@rusandris
Contributor Author

Thank you very much for clearing this up for me! Yes, it seems this can be solved by simply adding a note in the docs.

@rusandris rusandris changed the title Incorrect dispersion entropy outcomes and probabilities Confusion about dispersion entropy outcomes and probabilities Jan 13, 2025