Home

RunLengthArrays.jl

This small Julia package implements a new type, RunLengthArray{N,T}, that efficiently encodes an array containing long sequences of repeated values, like the following example:

6 6 6 6 4 4 4 4 4 4 4 10 10 10 10 10 10 …

The RunLengthArray{N,T} type uses a run-length encoding algorithm.

The following program shows its usage:

original = Int64[3, 3, 3, 7, 7, 7, 7];
compressed = CMBSim.RunLengthArray{Int,Int64}(original);

for elem in compressed:
    println(elem)
end

println("The sum of the elements is ", sum(compressed))
println("The 3rd element is ", compressed[3])

push!(compressed, 7)

The type RunLengthArray derives from AbstractArray.

Creating an array

You can create an instance of RunLengthArray{N,T} using one of its three constructors:

The types N and T encode the types of the counter and of the value, respectively.

The third constructor is the most interesting. Suppose that you want to store a sequence containing four instances of the number 4.6, followed by five instances of the number 5.7:

4.6 4.6 4.6 4.6   5.7 5.7 5.7 5.7 5.7

You can achieve this result using one of the two following forms:

# Second constructor
x = RunLengthArray{Int,Float64}([4.6, 4.6, 4.6, 4.6, 5.7, 5.7, 5.7, 5.7, 5.7])

# Third constructor (preferred in this case)
y = RunLengthArray{Int,Float64}([4, 5], [4.6, 5.7])

Operations on run-length arrays

The package defines specialized versions of a few functions in Julia's library, in order to make them more efficient. Here is a list of them:

Calling this functions to run-length arrays can be significantly faster. Here is an example:

julia> compr = RunLengthArray{Int, Float64}([100000, 200000], [1.1, 6.6]);

julia> uncompr = collect(compr);

julia> @benchmark minimum(compr)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     37.372 ns (0.00% GC)
  median time:      39.660 ns (0.00% GC)
  mean time:        48.777 ns (14.87% GC)
  maximum time:     52.816 μs (99.91% GC)
  --------------
  samples:          10000
  evals/sample:     992

julia> @benchmark minimum(uncompr)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     751.926 μs (0.00% GC)
  median time:      776.371 μs (0.00% GC)
  mean time:        788.463 μs (0.00% GC)
  maximum time:     1.094 ms (0.00% GC)
  --------------
  samples:          6326
  evals/sample:     1

Modifying run-length arrays

There are only three operations that can modify a run-length array:

The push! function can be used in two ways:

Here is an example:

x = RunLengthArray{Int, Float64}()
push!(x, (4, 1.1))
push!(x, (6, 2.2))

# The above array is equal to the following:
y = RunLengthArray{Int, Float64}([4, 6], [1.1, 2.2])

API documentation

An array of objects of type T encoded using run-length compression.

This works like an array, but it is designed to be extremely efficient when the array is long and consists of long repetitions («runs») of the same elements. The length of each run is held using type N; this type should be chosen according to the expected maximum length of a run.

Once a RunLengthArray has been created, new elements can be added only at the end of it, using push! (add one element) or append! (add a sequence of elements).

arr = RunLengthArray{Int, Int8}(Int8[3, 3, 3, 7, 7, 7, 7, 7, 7, 4])

longarr = collect(arr)

@assert length(arr) == 10
@assert sum(arr) == 30
@assert arr[1] == 3
@assert arr[2] == 3

println("$(sizeof(arr))")  # Prints 16 (bytes)
println("$(sizeof(longarr))")  # Prints 56 (bytes)
source
runs(arr::RunLengthArray{N,T})

Returns a Array{N,1} type containing the length of each run in the run-length array.

x = RunLengthArray{Int,String}(["X", "X", "X", "O", "O"])
@assert runs(x) == Int[3, 2]

To get a list of the values, use values.

source
Base.append!Method.
append!(arr::RunLengthArray{N,T}, otherarray)

Append an array to a RunLengthArray, modifying it in place.

source
Base.push!Function.
push!(arr::RunLengthArray{N,T}, newval::T) where {N <: Number,T}
push!(arr::RunLengthArray{N,T}, newrun::Tuple{N,T}) where {N <: Number,T}

Append a element or a run of elements to the end of a RunLengthArray. In the second case, the parameter newrun must be a tuple containing the number of repetitions n and the value v: (n, v).

arr = RunLengthArray{Int, Float64}([3, 2], [1.1, 6.5])
@assert length(arr) == 5

# Add 6 instances of the number "1.4" at the end of the array
push!(arr, (6, 1.4))

@assert length(arr) == 11
Base.push!

`

source

Index