Confusion about dispersion entropy outcomes and probabilities #433
Comments
cc @kahaaga (I haven't read the paper)
@rusandris Hey there! I did a bit of digging. The implementation here differs from the original in step 2 of the paper: they use positive embedding lags, while we use negative embedding lags. Otherwise the implementations are identical.

Sanity check

The critical line is the definition of τs in the following function:

function counts_and_outcomes(o::Dispersion, x::AbstractVector{<:Real})
    N = length(x)
    @show symbols = codify(o, x)
    # We must use genembed, not embed, to make sure the zero lag is included
    m, τ = o.m, o.τ
    τs = tuple((x for x in 0:-τ:-(m-1)*τ)...) # Rostaghi uses 0:+τ:+(m-1)*τ
    @show dispersion_patterns = genembed(symbols, τs, ones(m)).data
    cts = fasthist!(dispersion_patterns) # This sorts `dispersion_patterns`
    outs = unique!(dispersion_patterns) # Therefore, outcomes are the sorted patterns.
    c = Counts(cts, (outs, ))
    return c, outcomes(c)
end

If you just put a minus sign in front of your desired lag, you get precisely what they do in the original paper.

julia> x=[9,8,1,12,5,-3,1.5,8.01,2.99,4,-1,10]; d = Dispersion(; c = 3, m = 2, τ = -1); # use -1 lag to get Rostaghi behavior
julia> probs,outc = allprobabilities_and_outcomes(d,x); probs
symbols = codify(o, x) = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]
dispersion_patterns = (genembed(symbols, τs, ones(m))).data = SVector{2, Int64}[[3, 3], [3, 1], [1, 3], [3, 2], [2, 1], [1, 1], [1, 3], [3, 2], [2, 2], [2, 1], [1, 3]]
Probabilities{Float64,1} over 9 outcomes
[1, 1] 0.09090909090909091
[1, 2] 0.0
[1, 3] 0.2727272727272727
[2, 1] 0.18181818181818182
[2, 2] 0.09090909090909091
[2, 3] 0.0
[3, 1] 0.09090909090909091
[3, 2] 0.18181818181818182
[3, 3] 0.09090909090909091

That'll also give you the dispersion entropy for their example.

julia> information(Shannon(base = ℯ), d, x)
symbols = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]
dispersion_patterns = SVector{2, Int64}[[3, 3], [3, 1], [1, 3], [3, 2], [2, 1], [1, 1], [1, 3], [3, 2], [2, 2], [2, 1], [1, 3]]
1.8462202193216335

Reasoning

The reason we use the negative sign on the lag here is for compatibility with Associations.jl, where we explicitly need embedding vectors constructed using negative lags. For large enough real data sets, this implementation detail likely won't matter anyways if the goal is to compute the dispersion entropy, since this quantity only cares about the relative frequency of dispersion patterns/embedding vectors.

Solution

This isn't strictly a bug, since we say in the docstring that this outcome space is based on Rostaghi et al., not that it implements it precisely. The choice of the sign of the embedding lag is somewhat arbitrary anyways, and in fact the particular choice of a positive embedding lag isn't elaborated on in the original paper at all, as far as I can see from quickly skimming the paper again now.

I think the solution here is to add a documentation note explaining this implementation discrepancy. That should be enough to clear up any future confusion, I guess?
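To make the sign convention concrete, here is a minimal plain-Julia sketch that rebuilds the two-symbol patterns from the symbol sequence above under both conventions. It does not call ComplexityMeasures.jl: `lagged_patterns`, `pattern_probabilities`, and `shannon` are hypothetical helpers written only for this illustration, and the boundary handling (keeping only indices where every lagged entry exists) is an assumption that reproduces the 11 patterns shown above.

```julia
# Symbol sequence from the example above (c = 3 classes).
symbols = [3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3]

# Hypothetical helper (not part of ComplexityMeasures.jl): build length-m
# patterns from `s` using the index offsets 0, step, 2*step, ..., (m-1)*step.
function lagged_patterns(s::AbstractVector{<:Integer}, m::Int, step::Int)
    offsets = [k * step for k in 0:m-1]
    # Assumption: keep only indices for which every offset stays in bounds.
    first_i = 1 - min(0, minimum(offsets))
    last_i = length(s) - max(0, maximum(offsets))
    return [[s[i + o] for o in offsets] for i in first_i:last_i]
end

# Relative frequency of each observed pattern.
function pattern_probabilities(patterns)
    counts = Dict{Vector{Int}, Int}()
    for p in patterns
        counts[p] = get(counts, p, 0) + 1
    end
    n = length(patterns)
    return Dict(p => c / n for (p, c) in counts)
end

# Shannon entropy with the natural logarithm.
shannon(probs) = -sum(p * log(p) for p in values(probs))

# Offsets (0, +1): the forward convention used in the paper.
forward = pattern_probabilities(lagged_patterns(symbols, 2, +1))
# Offsets (0, -1): the backward convention described in this thread.
backward = pattern_probabilities(lagged_patterns(symbols, 2, -1))

println("forward:  [1,2] => ", get(forward, [1, 2], 0.0), ", [1,3] => ", forward[[1, 3]])
println("backward: [1,2] => ", backward[[1, 2]], ", [1,3] => ", backward[[1, 3]])
println("entropies: ", shannon(forward), " vs ", shannon(backward))
```

With the forward offsets, [1, 2] never occurs and [1, 3] has probability 3/11, as in the paper. With the backward offsets every pattern is simply reversed, so [1, 2] gets 2/11 and [1, 3] gets 1/11. Because reversal only relabels the patterns, both conventions give the same Shannon entropy in this example.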
Thank you very much for clearing this up for me! Yes, it seems this can be solved by simply adding a note in the docs.
Hi!
I came across a strange behaviour related to the Dispersion outcome space. Following the same example given in Rostaghi, M. and Azami, H. (2016), I get the same symbolic time series as presented in the paper.
However, when I try to calculate the probabilities of the dispersion patterns with m=2, the output contains patterns that aren't even observed ([1,2]), while others appear with the incorrect probability ([1,3]). Is this due to some difference in the definitions/implementation? What am I missing?
Thanks