Cstruct_cap
Raw memory buffers with capabilities
Cstruct_cap wraps OCaml Stdlib's Bigarray module. Each t is a proxy consisting of an offset, a length, and the actual Bigarray.t buffer. The goal of this module is two-fold: enable zero-copy operation (most functions share the underlying buffer rather than copying it) and statically check read and write capabilities on the underlying buffer (using phantom types).
Each 'a t is parameterized by the available capabilities: read (rd) and write (wr). Accessing the contents of the buffer requires the read capability, and modifying the contents requires the write capability. Capabilities on a buffer can only be dropped, never gained. If code only has the read capability, this does not mean that no other code fragment holds the write capability to the same underlying buffer.
The functions that retrieve bytes (get_uint8 etc.) require a read capability; functions mutating the underlying buffer (set_uint8 etc.) require a write capability. Allocation of a buffer (via create, ...) returns a t with read and write capabilities. ro drops the write capability, wo drops the read capability. The only exception is unsafe_to_bigarray, which returns the underlying Bigarray.t.
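The phantom-type mechanism can be illustrated without cstruct itself. The following is a minimal, hypothetical Stdlib-only model: the module Cap_buf and its use of Bytes.t are assumptions for illustration, not the library's implementation. The type parameter records capabilities as an object type, so passing a read-only proxy to a mutator is rejected at compile time, while all proxies still share one buffer.

```ocaml
(* Hypothetical toy model of capability-indexed buffers; NOT cstruct's code.
   The phantom parameter 'a records capabilities as an object type. *)
module Cap_buf : sig
  type 'a t
  type rdwr = < rd : unit ; wr : unit >
  type ro = < rd : unit >
  type wo = < wr : unit >

  val create : int -> rdwr t                     (* fresh, zero-filled *)
  val ro : < rd : unit ; .. > t -> ro t          (* drop the write capability *)
  val wo : < wr : unit ; .. > t -> wo t          (* drop the read capability *)
  val get_uint8 : < rd : unit ; .. > t -> int -> int
  val set_uint8 : < wr : unit ; .. > t -> int -> int -> unit
end = struct
  (* The parameter is never used at run time: every proxy is the same
     Bytes.t, so dropping a capability copies nothing. *)
  type 'a t = Bytes.t
  type rdwr = < rd : unit ; wr : unit >
  type ro = < rd : unit >
  type wo = < wr : unit >

  let create n = Bytes.make n '\000'
  let ro t = t
  let wo t = t
  let get_uint8 t off = Bytes.get_uint8 t off
  let set_uint8 t off x = Bytes.set_uint8 t off x
end

let () =
  let buf = Cap_buf.create 4 in
  let reader = Cap_buf.ro buf in
  (* Code holding the read-write proxy can still mutate the buffer... *)
  Cap_buf.set_uint8 buf 0 0x2a;
  (* ...and the read-only view observes the change. *)
  assert (Cap_buf.get_uint8 reader 0 = 0x2a)
  (* Cap_buf.set_uint8 reader 0 1 would be a type error: ro lacks wr. *)
```

The last comment reflects the point made above: holding only a read capability says nothing about whether other code can still write to the same buffer.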
Accessors and mutators for fixed size integers (8, 16, 32, 64 bit) are provided for big-endian and little-endian encodings.
type buffer =
( char, Stdlib.Bigarray.int8_unsigned_elt, Stdlib.Bigarray.c_layout )
Stdlib.Bigarray.Array1.t
Type of the underlying buffer. Every t is a proxy onto a value of this type.
equal a b is true iff a and b correspond to the same sequence of bytes (it uses memcmp internally). Both a and b need at least read capability rd.
pp ppf t pretty-prints t on ppf. t needs read capability rd.
val length : 'a t -> int
val check_alignment : 'a t -> int -> bool
check_alignment t alignment is true if the first byte stored in the underlying buffer of t is at a memory address where address mod alignment = 0, and false otherwise. The mod used has C/OCaml semantics, which differ from Python's for negative operands. Typical uses are checking that a buffer is aligned to a page or disk sector boundary.
create len allocates a buffer of size len and a proxy with both read and write capabilities. The buffer is filled with zero bytes.
create_unsafe len allocates a buffer of size len and a proxy with both read and write capabilities.
Note that the returned t will contain arbitrary data, likely including the contents of previously-deallocated cstructs.
Beware! Forgetting to replace this data could cause your application to leak sensitive information.
sub t ~off ~len returns a proxy which shares the underlying buffer of t. It starts at offset off and has length len. The returned value has the same capabilities as t.
sub_copy t ~off ~len is a new copy of sub t ~off ~len that does not share the underlying buffer of t. The returned value has read-write capabilities because modifying it doesn't affect t.
shift t len returns a proxy which shares the underlying buffer of t. The returned value starts len bytes later than the given t and has the same capabilities as t.
shiftv ts n is ts without the first n bytes. It has the property that equal (concat (shiftv ts n)) (shift (concat ts) n). This operation is fairly fast, as it shares the tail of the list. The first item in the returned list is never an empty cstruct, so you get [] if and only if lenv ts = n.
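To make the shiftv contract concrete, here is a hypothetical model over string lists. Plain strings stand in for cstructs, and raising Invalid_argument when n is negative or exceeds lenv ts is an assumption of the sketch. Only the head that is partially consumed gets sliced; the untouched tail of the list is returned as-is, which is what makes the operation cheap.

```ocaml
(* Hypothetical model of shiftv: strings stand in for cstructs. *)
let shiftv ts n =
  if n < 0 then invalid_arg "shiftv: negative shift";
  let rec go ts n =
    match ts with
    | [] -> if n = 0 then [] else invalid_arg "shiftv: list too short"
    | t :: rest ->
      let len = String.length t in
      if n < len then
        (* Only the head is sliced; [rest] is shared, not copied. *)
        String.sub t n (len - n) :: rest
      else
        (* Skipping whole (including empty) items guarantees that the
           first item returned is never empty. *)
        go rest (n - len)
  in
  go ts n

let () =
  let ts = [ "ab"; ""; "cde" ] in
  assert (shiftv ts 1 = [ "b"; ""; "cde" ]);
  assert (shiftv ts 2 = [ "cde" ]);
  assert (shiftv ts 5 = [])  (* lenv ts = 5 *)
```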
split ~start t len returns two proxies extracted from t. The first starts at offset start (default 0) and has length len. The second is the remainder of t following the first. The underlying buffer is shared and the capabilities are preserved.
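The zero-copy behaviour of sub, shift and split follows from the proxy representation (offset, length, buffer) described at the top of this page. A minimal sketch, assuming a hypothetical proxy record over a Stdlib Bigarray (this is not cstruct's actual type or code), shows that slicing only adjusts offsets, so a write through one proxy is visible through every other proxy on the same buffer.

```ocaml
(* Hypothetical proxy: a window [off, off + len) onto a shared buffer. *)
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type proxy = { buffer : buffer; off : int; len : int }

let create len =
  let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  Bigarray.Array1.fill buffer '\000';
  { buffer; off = 0; len }

(* No bytes are copied: only the offset and length change. *)
let sub t ~off ~len =
  if off < 0 || len < 0 || off + len > t.len then invalid_arg "sub";
  { t with off = t.off + off; len = len }

let shift t n = sub t ~off:n ~len:(t.len - n)

(* First part: [start, start + len); second part: everything after it. *)
let split ?(start = 0) t len = (sub t ~off:start ~len, shift t (start + len))

let get t i =
  if i < 0 || i >= t.len then invalid_arg "get";
  Bigarray.Array1.get t.buffer (t.off + i)

let set t i c =
  if i < 0 || i >= t.len then invalid_arg "set";
  Bigarray.Array1.set t.buffer (t.off + i) c

let () =
  let t = create 8 in
  let _first, rest = split t 3 in
  set rest 0 'x';           (* write through the second proxy *)
  assert (get t 3 = 'x')    (* visible through the original *)
```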
val copy : 'a t -> int -> int -> string
copy cstr off len is the same as Cstruct.to_string cstr ~off ~len.
append a b allocates a buffer r of size length a + length b. The content of a is copied to the start of r, and b is copied right after the end of a in r. a and b need at least read capability rd; the returned value has both read and write capabilities.
concat vss allocates a buffer r of size lenv vss. Each v of vss is copied into r, in order. Each v of vss needs at least read capability rd; the returned value has both read and write capabilities.
fillv ~src ~dst copies from src to dst until src is exhausted or dst is full. It returns the number of bytes copied and the remaining data from src, if any. This is useful for buffering data into fixed-size chunks. Each t of src needs at least read capability rd. dst needs at least write capability wr.
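A hypothetical Stdlib model of fillv (strings for src, a Bytes.t for dst; the function below is not the library code) shows how a partially consumed chunk is returned as the new head of the remaining data, which is what makes it convenient for filling fixed-size blocks.

```ocaml
(* Hypothetical model of fillv: copy from [src] into [dst] until one runs out. *)
let fillv ~src ~dst =
  let cap = Bytes.length dst in
  let rec go copied = function
    | [] -> (copied, [])
    | s :: rest ->
      let avail = cap - copied in
      let n = String.length s in
      if n <= avail then begin
        (* The whole chunk fits: copy it and continue with the next one. *)
        Bytes.blit_string s 0 dst copied n;
        go (copied + n) rest
      end else begin
        (* Only a prefix fits: the unconsumed suffix heads the remainder. *)
        Bytes.blit_string s 0 dst copied avail;
        (copied + avail, String.sub s avail (n - avail) :: rest)
      end
  in
  go 0 src

let () =
  let dst = Bytes.create 4 in
  let copied, remaining = fillv ~src:[ "ab"; "cde" ] ~dst in
  assert (copied = 4 && Bytes.to_string dst = "abcd" && remaining = [ "e" ])
```

Calling it again with remaining and a fresh block continues where the previous call stopped.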
rev t allocates a buffer r of size length t and fills it with the bytes of t in reverse order. The given t needs at least read capability rd; the returned value has both read and write capabilities.
memset t x sets all bytes of t to x land 0xFF. t needs at least write capability wr.
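The x land 0xFF rule keeps only the low 8 bits of x. A hypothetical Stdlib analogue on Bytes.t (memset_bytes is not part of cstruct) shows the effect:

```ocaml
(* Hypothetical analogue of memset on Bytes.t: keep only the low byte of x. *)
let memset_bytes b x =
  Bytes.fill b 0 (Bytes.length b) (Char.chr (x land 0xFF))

let () =
  let b = Bytes.create 3 in
  memset_bytes b 0x141;          (* 0x141 land 0xFF = 0x41 = 'A' *)
  assert (Bytes.to_string b = "AAA");
  memset_bytes b (-1);           (* -1 land 0xFF = 0xFF *)
  assert (Bytes.get b 0 = '\xff')
```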
blit src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. It works correctly even if src and dst refer to the same underlying buffer and the src and dst intervals overlap. This function uses memmove internally.
src needs at least read capability rd. dst needs at least write capability wr.
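Because blit uses memmove, an overlapping copy within one buffer behaves as if the source bytes were read before any are overwritten. The same guarantee is documented for Stdlib's Bytes.blit, so a hypothetical helper built on it (blit_within is an illustrative name, not a cstruct function) demonstrates the expected result:

```ocaml
(* Hypothetical helper: copy [len] bytes of [s] from [src_off] to [dst_off]
   within the same buffer, using Stdlib's Bytes.blit, which is documented to
   handle overlapping source and destination intervals. *)
let blit_within s ~src_off ~dst_off ~len =
  let b = Bytes.of_string s in
  Bytes.blit b src_off b dst_off len;
  Bytes.to_string b

let () =
  (* [0, 4) is copied onto [2, 6): the original "abcd" lands at offset 2. *)
  assert (blit_within "abcdef" ~src_off:0 ~dst_off:2 ~len:4 = "ababcd")
```

A naive forward byte-by-byte loop would instead produce "ababab", because it reads bytes it has already overwritten; memmove semantics avoid this.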
blit_from_string src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. This function uses memcpy internally.
dst needs at least write capability wr.
blit_from_bytes src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. This function uses memcpy internally.
dst needs at least write capability wr.
of_string ~off ~len s allocates a buffer and copies the contents of s into it, starting at offset off (default 0) and of length len (default String.length s - off). The returned value has both read and write capabilities.
to_string ~off ~len t is the string representation of the segment of t starting at off (default 0) of size len (default length t - off). t needs at least read capability rd.
to_hex_string ~off ~len t is a fresh OCaml string containing the hex representation of sub t ~off ~len. See Cstruct.to_hex_string.
of_hex ~off ~len s allocates a buffer and copies the content of s starting at offset off (default 0) of length len (default String.length s - off), decoding the hex-encoded characters. Whitespace in the string is ignored, and every pair of hex-encoded characters in s is converted to one byte in the returned t, which is therefore exactly half the size of the non-whitespace characters of s from off of length len.
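The decoding rule can be modelled with the Stdlib alone. In this hypothetical sketch, the name decode_hex is invented, and both the set of ignored whitespace characters (space, tab, newline, carriage return) and rejecting an odd number of digits with Invalid_argument are assumptions; each pair of hex digits yields one byte.

```ocaml
(* Hypothetical model of of_hex's decoding rule, returning a string. *)
let decode_hex s =
  let digit c =
    match c with
    | '0' .. '9' -> Char.code c - Char.code '0'
    | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
    | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
    | _ -> invalid_arg "decode_hex: not a hex digit"
  in
  let out = Buffer.create (String.length s / 2) in
  (* High nibble waiting for its low-nibble partner, if any. *)
  let pending = ref None in
  String.iter
    (fun c ->
      match c, !pending with
      | (' ' | '\t' | '\n' | '\r'), _ -> () (* assumed whitespace: skipped *)
      | c, None -> pending := Some (digit c)
      | c, Some hi ->
        Buffer.add_char out (Char.chr ((hi lsl 4) lor digit c));
        pending := None)
    s;
  if !pending <> None then invalid_arg "decode_hex: odd number of digits";
  Buffer.contents out

let () = assert (decode_hex "de ad\nBE EF" = "\xde\xad\xbe\xef")
```

Eight non-whitespace characters produce four bytes, matching the "exactly half" rule above.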
of_bytes ~off ~len b allocates a buffer and copies the contents of b into it, starting at offset off (default 0) and of length len (default Bytes.length b - off). The returned value has both read and write capabilities.
to_bytes ~off ~len t is the bytes representation of the segment of t starting at off (default 0) of size len (default length t - off). t needs at least read capability rd.
blit_to_bytes src ~src_off dst ~dst_off ~len copies len bytes from src, starting at index src_off, to the byte sequence dst, starting at index dst_off. blit_to_bytes uses memcpy internally.
src needs at least read capability rd.
of_bigarray ~off ~len b is a proxy that contains b with offset off (default 0) of length len (default Bigarray.Array1.dim b - off). The returned value has both read and write capabilities.
unsafe_to_bigarray t converts t into a buffer Bigarray, using Bigarray slicing to allocate a fresh proxy Bigarray that preserves sharing of the underlying buffer.
In other words:
let t = Cstruct_cap.create 10 in
let b = Cstruct_cap.unsafe_to_bigarray t in
Bigarray.Array1.set b 0 '\x42' ;
assert (Cstruct_cap.get_char t 0 = '\x42')
iter lenf of_cstruct t is an iterator over t that returns elements of size lenf t and type of_cstruct t. t needs at least read capability rd, and iter keeps the capabilities of t on of_cstruct.
val fold : ( 'acc -> 'x -> 'acc ) -> 'x iter -> 'acc -> 'acc
fold f iter acc is f (... (f (f acc x1) x2) ...) xN, where x1, ..., xN are the elements produced by iter, in order.
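A hypothetical Stdlib model of the iterator pattern illustrates how lenf splits the input into elements and how fold threads an accumulator through them, here parsing length-prefixed records. Strings stand in for cstructs; the representation 'a iter = unit -> 'a option and stopping when the remaining input is empty or when lenf gives an unusable size are assumptions of this sketch.

```ocaml
(* Assumed iterator shape: each call yields the next element, if any. *)
type 'a iter = unit -> 'a option

(* Hypothetical model of iter: [lenf] sizes the next element from the
   remaining input, [of_chunk] converts that many bytes. *)
let iter lenf of_chunk s : 'a iter =
  let pos = ref 0 in
  fun () ->
    let rest = String.sub s !pos (String.length s - !pos) in
    if rest = "" then None
    else
      match lenf rest with
      | Some n when n > 0 && n <= String.length rest ->
        pos := !pos + n;
        Some (of_chunk (String.sub rest 0 n))
      | _ -> None (* sketch choice: stop on a missing or bad size *)

(* fold f it acc = f (... (f (f acc x1) x2) ...) xN *)
let rec fold f (it : 'x iter) acc =
  match it () with
  | None -> acc
  | Some x -> fold f it (f acc x)

(* Records: one length byte followed by that many payload bytes. *)
let lenf r = Some (1 + Char.code r.[0])
let payload chunk = String.sub chunk 1 (String.length chunk - 1)

let () =
  let it = iter lenf payload "\002ab\001c" in
  assert (fold (fun acc x -> acc @ [ x ]) it [] = [ "ab"; "c" ])
```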
get_char t off returns the character contained in t at offset off. t needs at least read capability rd.
set_char t off c sets the character contained in t at offset off to character c. t needs at least write capability wr.
get_uint8 t off returns the byte contained in t at offset off. t needs at least read capability rd.
set_uint8 t off x sets the byte contained in t at offset off to byte x. t needs at least write capability wr.
module BE : sig ... end
module LE : sig ... end
As with Cstruct, the capabilities interface provides helper functions to help the user parse contents.
head cs is Some (get_char cs h) with h = 0 if rev = false (default) or h = length cs - 1 if rev = true. None is returned if cs is empty.
tail cs is cs without its first (rev is false, default) or last (rev is true) byte, or cs itself if cs is empty.
is_prefix ~affix cs is true iff affix.[zidx] = cs.[zidx] for all indices zidx of affix.
is_suffix ~affix cs is true iff affix.[n - zidx] = cs.[m - zidx] for all indices zidx of affix, with n = length affix - 1 and m = length cs - 1.
is_infix ~affix cs is true iff there exists an index z in cs such that for all indices zidx of affix we have affix.[zidx] = cs.[z + zidx].
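The three definitions above translate directly into code. A hypothetical string-based sketch (strings stand in for cstructs; these are not the library's implementations):

```ocaml
(* Hypothetical models of is_prefix, is_suffix and is_infix on strings. *)
let is_prefix ~affix s =
  let n = String.length affix in
  n <= String.length s && String.sub s 0 n = affix

let is_suffix ~affix s =
  let n = String.length affix and m = String.length s in
  n <= m && String.sub s (m - n) n = affix

(* Try every start position z where affix still fits inside s. *)
let is_infix ~affix s =
  let n = String.length affix and m = String.length s in
  let rec from z = z + n <= m && (String.sub s z n = affix || from (z + 1)) in
  from 0

let () = assert (is_infix ~affix:"b" "abc")
```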
for_all p cs is true iff for all indices zidx of cs, p cs.[zidx] = true.
exists p cs is true iff there exists an index zidx of cs with p cs.[zidx] = true.
trim ~drop cs is cs with its leading and trailing bytes satisfying drop removed. drop defaults to function ' ' | '\r' .. '\t' -> true | _ -> false, that is, space and the bytes 0x09 (tab) through 0x0d (carriage return).
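A hypothetical string-based sketch of trim, using a default predicate equivalent to the one above (written here as '\t' .. '\r', which covers bytes 0x09 to 0x0d); the code is illustrative, not the library's:

```ocaml
(* Default predicate: space or a byte in 0x09..0x0d (tab to carriage return). *)
let default_drop = function ' ' | '\t' .. '\r' -> true | _ -> false

(* Hypothetical model of trim on strings. *)
let trim ?(drop = default_drop) s =
  let len = String.length s in
  let i = ref 0 in
  while !i < len && drop s.[!i] do incr i done;      (* skip the prefix *)
  let j = ref (len - 1) in
  while !j >= !i && drop s.[!j] do decr j done;      (* skip the suffix *)
  String.sub s !i (!j - !i + 1)

let () = assert (trim " \t hi there \r\n" = "hi there")
```

Interior bytes that satisfy drop (the space in "hi there") are kept; only the prefix and suffix runs are removed.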
val span :
?rev:bool ->
?min:int ->
?max:int ->
?sat:( char -> bool ) ->
'a rd t ->
'a rd t * 'a rd t
span ~rev ~min ~max ~sat cs is (l, r) where:
If rev is false (default), l is at least min and at most max consecutive sat-satisfying initial bytes of cs, or is empty if there are no such bytes; r is the remaining bytes of cs.
If rev is true, r is at least min and at most max consecutive sat-satisfying final bytes of cs, or is empty if there are no such bytes; l is the remaining bytes of cs.
If max is unspecified the span is unlimited. If min is unspecified it defaults to 0. If min > max the condition can't be satisfied and the left or right span, depending on rev, is always empty. sat defaults to (fun _ -> true).
The invariant l ^ r = cs holds (with ^ denoting concatenation).
For instance, the ABNF expression:
time := 1*10DIGIT
can be translated to:
let is_digit = function '0' .. '9' -> true | _ -> false in
let (time, _) = span ~min:1 ~max:10 ~sat:is_digit cs in
take ~rev ~min ~max ~sat cs is the matching span of span, without the remaining one. In other words:
(if rev then snd else fst) @@ span ~rev ~min ~max ~sat cs
drop ~rev ~min ~max ~sat cs is the remaining span of span, without the matching one. In other words:
(if rev then fst else snd) @@ span ~rev ~min ~max ~sat cs
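The span rules, and the take and drop definitions built on them, can be checked against a hypothetical string-based model (strings stand in for cstructs; these are not the library's implementations):

```ocaml
(* Hypothetical model of span on strings; defaults follow the text above. *)
let span ?(rev = false) ?(min = 0) ?(max = max_int) ?(sat = (fun _ -> true)) s =
  let len = String.length s in
  (* Count consecutive satisfying bytes from the chosen end, up to max. *)
  let rec count k =
    if k >= max || k >= len then k
    else
      let c = if rev then s.[len - 1 - k] else s.[k] in
      if sat c then count (k + 1) else k
  in
  (* Fewer than min matching bytes means an empty matching span. *)
  let k = let k = count 0 in if k < min then 0 else k in
  if rev then (String.sub s 0 (len - k), String.sub s (len - k) k)
  else (String.sub s 0 k, String.sub s k (len - k))

(* take and drop, exactly as defined in terms of span above. *)
let take ?(rev = false) ?min ?max ?sat s =
  (if rev then snd else fst) (span ~rev ?min ?max ?sat s)

let drop ?(rev = false) ?min ?max ?sat s =
  (if rev then fst else snd) (span ~rev ?min ?max ?sat s)

let is_digit = function '0' .. '9' -> true | _ -> false

let () = assert (span ~min:1 ~max:10 ~sat:is_digit "123abc" = ("123", "abc"))
```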
cut ~sep cs is either the pair Some (l, r) of the two (possibly empty) sub-buffers of cs that are delimited by the first match of the non-empty separator string sep, or None if sep can't be matched in cs. Matching starts from the beginning of cs (rev is false, default) or the end (rev is true).
The invariant l ^ sep ^ r = cs holds.
For instance, the ABNF expression:
field_name  := *PRINT
field_value := *ASCII
field       := field_name ":" field_value
can be translated to:
match cut ~sep:":" value with
| Some (field_name, field_value) -> ...
| None -> invalid_arg "invalid field"
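A hypothetical string-based model of cut (not the library code; raising Invalid_argument for an empty separator is an assumption of the sketch) shows the effect of rev and the l ^ sep ^ r = cs invariant:

```ocaml
(* Hypothetical model of cut on strings; sep must be non-empty. *)
let cut ?(rev = false) ~sep s =
  let n = String.length sep and m = String.length s in
  if n = 0 then invalid_arg "cut: empty separator";
  let matches_at i = String.sub s i n = sep in
  (* Scan forward from 0, or backward from the last position sep fits. *)
  let rec fwd i =
    if i + n > m then None else if matches_at i then Some i else fwd (i + 1)
  in
  let rec bwd i =
    if i < 0 then None else if matches_at i then Some i else bwd (i - 1)
  in
  let found = if rev then bwd (m - n) else fwd 0 in
  match found with
  | None -> None
  | Some i -> Some (String.sub s 0 i, String.sub s (i + n) (m - i - n))

let () =
  assert (cut ~sep:":" "Host:example.org:80" = Some ("Host", "example.org:80"));
  assert (cut ~rev:true ~sep:":" "Host:example.org:80"
          = Some ("Host:example.org", "80"))
```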
cuts ~sep cs is the list of all sub-buffers of cs that are delimited by matches of the non-empty separator sep. Empty sub-buffers are omitted from the list if empty is false (defaults to true).
Matching separators in cs starts from the beginning of cs (rev is false, default) or the end (rev is true). Once one is found, the separator is skipped and matching starts again, that is, separator matches can't overlap. If there is no separator match in cs, the list [cs] is returned.
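The following invariants hold (here concat ~sep stands for joining the pieces with sep between them, and = for byte-wise equality):
concat ~sep (cuts ~empty:true ~sep cs) = cs
cuts ~empty:true ~sep cs <> []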
For instance, the ABNF expression:
arg  := *(ASCII / ",") ; any characters excluding ","
args := arg *("," arg)
can be translated to:
let args = cuts ~sep:"," buffer in
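A hypothetical forward-only string model of cuts (the rev direction is omitted, and the code is illustrative rather than the library's) makes the non-overlapping matching and the concatenation invariant easy to check:

```ocaml
(* Hypothetical model of cuts on strings, scanning left to right only. *)
let cuts ?(empty = true) ~sep s =
  let n = String.length sep and m = String.length s in
  if n = 0 then invalid_arg "cuts: empty separator";
  (* [start] is where the current piece began; after a match, scanning
     resumes just past the separator, so matches never overlap. *)
  let rec go start i acc =
    if i + n > m then List.rev (String.sub s start (m - start) :: acc)
    else if String.sub s i n = sep then
      go (i + n) (i + n) (String.sub s start (i - start) :: acc)
    else go start (i + 1) acc
  in
  let pieces = go 0 0 [] in
  if empty then pieces else List.filter (fun p -> p <> "") pieces

let () =
  assert (cuts ~sep:"," "a,,b" = [ "a"; ""; "b" ]);
  assert (String.concat "," (cuts ~sep:"," "a,,b") = "a,,b")
```

With separator "aa" on "aaa" the result is [""; "a"]: the second possible match overlaps the first and is skipped.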
fields ~empty ~is_sep cs is the list of (possibly empty) sub-buffers that are delimited by bytes for which is_sep is true. Empty sub-buffers are omitted from the list if empty is false (defaults to true). If not defined by the user, is_sep c is true iff c is a US-ASCII white space character, that is one of space ' ' (0x20), tab '\t' (0x09), newline '\n' (0x0a), vertical tab (0x0b), form feed (0x0c), or carriage return '\r' (0x0d).
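A hypothetical string-based model of fields, with the default is_sep covering the six white space bytes listed above (illustrative code, not the library's), shows the effect of empty:

```ocaml
(* Default separator: space, tab, newline, vertical tab, form feed, CR. *)
let is_white = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

(* Hypothetical model of fields: every separator byte ends a piece. *)
let fields ?(empty = true) ?(is_sep = is_white) s =
  let m = String.length s in
  let rec go start i acc =
    if i = m then List.rev (String.sub s start (m - start) :: acc)
    else if is_sep s.[i] then
      go (i + 1) (i + 1) (String.sub s start (i - start) :: acc)
    else go start (i + 1) acc
  in
  let pieces = go 0 0 [] in
  if empty then pieces else List.filter (fun p -> p <> "") pieces

let () =
  assert (fields ~empty:false "  foo bar\tbaz\n" = [ "foo"; "bar"; "baz" ])
```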
find ~rev sat cs is the sub-buffer of cs (if any) that spans the first byte satisfying sat in cs, searching forward from the start of cs (rev is false, default) or backward from the end of cs (rev is true). None is returned if there is no matching byte in cs.
find_sub ~rev ~sub cs is the sub-buffer of cs (if any) that spans the first match of sub in cs, searching forward from the start of cs (rev is false, default) or backward from the end of cs (rev is true). Only bytes are compared, and sub can be on a different base buffer. None is returned if there is no match of sub in cs.
filter sat cs is the buffer made of the bytes of cs that satisfy sat, in the same order.
filter_map f cs is the buffer made of the bytes of cs as mapped by f, in the same order, keeping only the bytes for which f returns Some.
map f cs is cs' with cs'.[i] = f cs.[i] for all indices i of cs. f is invoked in increasing index order.
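For plain strings, the Stdlib already offers close analogues of these three functions (String.map, and Seq.filter / Seq.filter_map via String.to_seq and String.of_seq), which can help when reasoning about what they return. The helper names below are hypothetical and operate on strings, not cstructs:

```ocaml
(* Hypothetical string analogues of filter, filter_map and map. *)
let filter sat s = String.to_seq s |> Seq.filter sat |> String.of_seq
let filter_map f s = String.to_seq s |> Seq.filter_map f |> String.of_seq
let map f s = String.map f s

let () =
  (* filter_map drops bytes mapped to None and keeps the others, in order. *)
  let upper_no_dash c = if c = '-' then None else Some (Char.uppercase_ascii c) in
  assert (filter_map upper_no_dash "a-b-c" = "ABC")
```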