RunLengthArrays.jl
This small Julia package implements a new type, RunLengthArray{N,T}
, that efficiently encodes an array containing long sequences of repeated values, like the following example:
6 6 6 6 4 4 4 4 4 4 4 10 10 10 10 10 10 …
The RunLengthArray{N,T}
type uses a run-length encoding algorithm.
The following program shows its usage:
original = Int64[3, 3, 3, 7, 7, 7, 7];
compressed = CMBSim.RunLengthArray{Int,Int64}(original);
for elem in compressed:
println(elem)
end
println("The sum of the elements is ", sum(compressed))
println("The 3rd element is ", compressed[3])
push!(compressed, 7)
The type RunLengthArray
derives from AbstractArray
.
Creating an array
You can create an instance of RunLengthArray{N,T}
using one of its three constructors:
- You can call the constructor without parameters; in this case, the array will be empty. You can fill it later using
push!
andappend!
. - You can pass an array, which will be compressed;
- You can pass two arrays, containing the length and value of each run respectively.
The types N
and T
encode the types of the counter and of the value, respectively.
The third constructor is the most interesting. Suppose that you want to store a sequence containing four instances of the number 4.6, followed by five instances of the number 5.7:
4.6 4.6 4.6 4.6 5.7 5.7 5.7 5.7 5.7
You can achieve this result using one of the two following forms:
# Second constructor
x = RunLengthArray{Int,Float64}([4.6, 4.6, 4.6, 4.6, 5.7, 5.7, 5.7, 5.7, 5.7])
# Third constructor (preferred in this case)
y = RunLengthArray{Int,Float64}([4, 5], [4.6, 5.7])
Operations on run-length arrays
The package defines specialized versions of a few functions in Julia's library, in order to make them more efficient. Here is a list of them:
sum
minimum
maximum
extrema
sort
sort!
Calling this functions to run-length arrays can be significantly faster. Here is an example:
julia> compr = RunLengthArray{Int, Float64}([100000, 200000], [1.1, 6.6]);
julia> uncompr = collect(compr);
julia> @benchmark minimum(compr)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 37.372 ns (0.00% GC)
median time: 39.660 ns (0.00% GC)
mean time: 48.777 ns (14.87% GC)
maximum time: 52.816 μs (99.91% GC)
--------------
samples: 10000
evals/sample: 992
julia> @benchmark minimum(uncompr)
BenchmarkTools.Trial:
memory estimate: 16 bytes
allocs estimate: 1
--------------
minimum time: 751.926 μs (0.00% GC)
median time: 776.371 μs (0.00% GC)
mean time: 788.463 μs (0.00% GC)
maximum time: 1.094 ms (0.00% GC)
--------------
samples: 6326
evals/sample: 1
Modifying run-length arrays
There are only three operations that can modify a run-length array:
push!
(add an element or a run to the end of the array);append!
(add a sequence to the end of the array);sort!
(sort the array in-place).
The push!
function can be used in two ways:
- Passing one value of type
T
will append it to the end of the array; - Passing a tuple with type
Tuple{N,T}
will append a run to the end of the array.
Here is an example:
x = RunLengthArray{Int, Float64}()
push!(x, (4, 1.1))
push!(x, (6, 2.2))
# The above array is equal to the following:
y = RunLengthArray{Int, Float64}([4, 6], [1.1, 2.2])
API documentation
RunLengthArrays.RunLengthArray
— Type.An array of objects of type T
encoded using run-length compression.
This works like an array, but it is designed to be extremely efficient when the array is long and consists of long repetitions («runs») of the same elements. The length of each run is held using type N
; this type should be chosen according to the expected maximum length of a run.
Once a RunLengthArray
has been created, new elements can be added only at the end of it, using push!
(add one element) or append!
(add a sequence of elements).
arr = RunLengthArray{Int, Int8}(Int8[3, 3, 3, 7, 7, 7, 7, 7, 7, 4])
longarr = collect(arr)
@assert length(arr) == 10
@assert sum(arr) == 30
@assert arr[1] == 3
@assert arr[2] == 3
println("$(sizeof(arr))") # Prints 16 (bytes)
println("$(sizeof(longarr))") # Prints 56 (bytes)
RunLengthArrays.runs
— Method.runs(arr::RunLengthArray{N,T})
Returns a Array{N,1}
type containing the length of each run in the run-length array.
x = RunLengthArray{Int,String}(["X", "X", "X", "O", "O"])
@assert runs(x) == Int[3, 2]
To get a list of the values, use values
.
Base.append!
— Method.append!(arr::RunLengthArray{N,T}, otherarray)
Append an array to a RunLengthArray
, modifying it in place.
Base.push!
— Function.push!(arr::RunLengthArray{N,T}, newval::T) where {N <: Number,T}
push!(arr::RunLengthArray{N,T}, newrun::Tuple{N,T}) where {N <: Number,T}
Append a element or a run of elements to the end of a RunLengthArray
. In the second case, the parameter newrun
must be a tuple containing the number of repetitions n
and the value v
: (n, v)
.
arr = RunLengthArray{Int, Float64}([3, 2], [1.1, 6.5])
@assert length(arr) == 5
# Add 6 instances of the number "1.4" at the end of the array
push!(arr, (6, 1.4))
@assert length(arr) == 11
Base.push!
`