diff --git a/previews/PR239/.documenter-siteinfo.json b/previews/PR239/.documenter-siteinfo.json index 9ceb50c..3ca034d 100644 --- a/previews/PR239/.documenter-siteinfo.json +++ b/previews/PR239/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-06T02:47:43","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-06T23:45:15","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/previews/PR239/devnotes/index.html b/previews/PR239/devnotes/index.html index 66720c8..7fc51bc 100644 --- a/previews/PR239/devnotes/index.html +++ b/previews/PR239/devnotes/index.html @@ -3,4 +3,4 @@ user <--- |state.buffer1| <--- <stream.codec> <--- |state.buffer2| <--- stream When writing data (`state.mode == :write`): - user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream

In the read mode, a user pulls data out of state.buffer1 and pre-transcoded data are filled in state.buffer2. In the write mode, a user pushes data into state.buffer1 and transcoded data are filled in state.buffer2. The default buffer size is 16KiB for each.

State (defined in src/state.jl) has five fields:

The mode field may be one of the following values:

Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.

The initial mode is :idle and mode transition happens as shown in the following diagram: Mode transition

Modes surrounded by a bold circle are states in which the transcoding stream has released resources by calling finalize(codec). The mode transition should happen in the changemode!(stream, newmode) function in src/stream.jl. Trying an undefined transition will throw an exception.

A transition happens according to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and the user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec will return :error and the stream will enter the :panic mode.

Shared buffers

Adjacent transcoding streams may share their buffers. This will reduce memory allocation and eliminate data copy between buffers.

If buffer2 is shared it is considered to be owned by the underlying stream by the stats and position functions.

readdata!(input::IO, output::Buffer) and flush_buffer2(stream::TranscodingStream) do the actual work of reading data from and writing data to the underlying stream. These methods have a special path for shared buffers.

Noop codec

Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. All of these are defined in src/noop.jl.

+ user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream

In the read mode, a user pulls data out of state.buffer1 and pre-transcoded data are filled in state.buffer2. In the write mode, a user pushes data into state.buffer1 and transcoded data are filled in state.buffer2. The default buffer size is 16KiB for each.

State (defined in src/state.jl) has five fields:

The mode field may be one of the following values:

Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.

The initial mode is :idle and mode transition happens as shown in the following diagram: Mode transition

Modes surrounded by a bold circle are states in which the transcoding stream has released resources by calling finalize(codec). The mode transition should happen in the changemode!(stream, newmode) function in src/stream.jl. Trying an undefined transition will throw an exception.

A transition happens according to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and the user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec will return :error and the stream will enter the :panic mode.

Shared buffers

Adjacent transcoding streams may share their buffers. This will reduce memory allocation and eliminate data copy between buffers.

If buffer2 is shared it is considered to be owned by the underlying stream by the stats and position functions.

readdata!(input::IO, output::Buffer) and flush_buffer2(stream::TranscodingStream) do the actual work of reading data from and writing data to the underlying stream. These methods have a special path for shared buffers.

Noop codec

Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. All of these are defined in src/noop.jl.

diff --git a/previews/PR239/examples/index.html b/previews/PR239/examples/index.html index bd7904d..e9c813c 100644 --- a/previews/PR239/examples/index.html +++ b/previews/PR239/examples/index.html @@ -102,4 +102,4 @@ data1 = read(stream, 8) TranscodingStreams.unread(stream, data1) data2 = read(stream, 8) -@assert data1 == data2

The unread operation is different from the write operation in that the unread data are not written to the wrapped stream. The unread data are stored in the internal buffer of a transcoding stream.

Unfortunately, an unwrite operation is not provided because there is no way to cancel write operations that are already committed to the wrapped stream.

+@assert data1 == data2

The unread operation is different from the write operation in that the unread data are not written to the wrapped stream. The unread data are stored in the internal buffer of a transcoding stream.

Unfortunately, an unwrite operation is not provided because there is no way to cancel write operations that are already committed to the wrapped stream.

diff --git a/previews/PR239/index.html b/previews/PR239/index.html index 9109bb0..d08ef66 100644 --- a/previews/PR239/index.html +++ b/previews/PR239/index.html @@ -115,4 +115,4 @@ Bzip2DecompressorStream Decompress data in bzip2 (.bz2) format. -

Notes

Wrapped streams

The wrapper stream takes care of the wrapped stream. Reading or writing data from or to the wrapped stream outside the management will result in unexpected behaviors. When you close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.

Error handling

You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.

+

Notes

Wrapped streams

The wrapper stream takes care of the wrapped stream. Reading or writing data from or to the wrapped stream outside the management will result in unexpected behaviors. When you close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.

Error handling

You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.

diff --git a/previews/PR239/migrating/index.html b/previews/PR239/migrating/index.html index 5bb21f9..7670625 100644 --- a/previews/PR239/migrating/index.html +++ b/previews/PR239/migrating/index.html @@ -1,2 +1,2 @@ -Migration · TranscodingStreams.jl

Migration

How to migrate from v0.10 to v0.11

v0.11 has a few subtle breaking changes to eof and seekend.

Memory(data::ByteData)

The Memory(data::ByteData) constructor was removed. Use Memory(pointer(data), sizeof(data)) instead.

seekend(stream::TranscodingStream)

Generic seekend for TranscodingStream was removed. If the objective is to discard all remaining data in the stream, use skip(stream, typemax(Int64)) instead where typemax(Int64) is meant to be a large number to exhaust the stream. Ideally, specific implementations of TranscodingStream will implement seekend only if efficient means exist to avoid fully processing the stream. NoopStream still supports seekend.

The previous behavior of the generic seekend was something like (seekstart(stream); seekend(stream.stream); stream) but this led to inconsistencies with the position of the stream.

eof(stream::TranscodingStream)

eof now throws an error if called on a stream that is closed or in writing mode. Use !isreadable(stream) || eof(stream) if you need to more closely match previous behavior.

+Migration · TranscodingStreams.jl

Migration

How to migrate from v0.10 to v0.11

v0.11 has a few subtle breaking changes to eof and seekend.

Memory(data::ByteData)

The Memory(data::ByteData) constructor was removed. Use Memory(pointer(data), sizeof(data)) instead.

seekend(stream::TranscodingStream)

Generic seekend for TranscodingStream was removed. If the objective is to discard all remaining data in the stream, use skip(stream, typemax(Int64)) instead where typemax(Int64) is meant to be a large number to exhaust the stream. Ideally, specific implementations of TranscodingStream will implement seekend only if efficient means exist to avoid fully processing the stream. NoopStream still supports seekend.

The previous behavior of the generic seekend was something like (seekstart(stream); seekend(stream.stream); stream) but this led to inconsistencies with the position of the stream.

eof(stream::TranscodingStream)

eof now throws an error if called on a stream that is closed or in writing mode. Use !isreadable(stream) || eof(stream) if you need to more closely match previous behavior.
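The guard suggested above can be sketched as follows. This is a minimal illustration that assumes isreadable returns false for a closed stream, so that the short-circuit avoids calling eof there; the variable name finished is hypothetical.

```julia
using TranscodingStreams

stream = NoopStream(IOBuffer("abc"))
@assert read(stream, String) == "abc"
@assert eof(stream)   # the stream is still open and in read mode here

close(stream)

# Per the note above, eof(stream) throws on a closed stream in v0.11.
# The guard checks readability first and only then asks for eof.
finished = !isreadable(stream) || eof(stream)
@assert finished
```

The same expression can replace bare eof(stream) calls in loops that may run after the stream has been closed or switched to write mode.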

diff --git a/previews/PR239/reference/index.html b/previews/PR239/reference/index.html index e0d4c40..5466dc3 100644 --- a/previews/PR239/reference/index.html +++ b/previews/PR239/reference/index.html @@ -11,7 +11,7 @@ julia> readline(file) "TranscodingStreams.jl" -julia> close(stream)source
Base.transcodeFunction
transcode(
+julia> close(stream)
source
Base.transcodeFunction
transcode(
     ::Type{C},
     data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},
 )::Vector{UInt8} where {C<:Codec}

Transcode data by applying a codec C().

Note that this method does allocation and deallocation of C() in every call, which is handy but less efficient when transcoding a number of objects. transcode(codec, data) is a recommended method in terms of performance.

Examples

julia> using CodecZlib
@@ -23,7 +23,7 @@
 julia> decompressed = transcode(ZlibDecompressor, compressed);
 
 julia> String(decompressed)
-"abracadabra"
source
transcode(
+"abracadabra"
source
transcode(
     codec::Codec,
     data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},
     [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],
@@ -51,4 +51,4 @@
 
 julia> String(decompressed)
 "abracadabra"
-
source
TranscodingStreams.unsafe_transcode!Function
unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.transcode!Function
transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.TOKEN_ENDConstant

A special token indicating the end of data.

TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.

Note

Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.

source
TranscodingStreams.unsafe_readFunction
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int

Copy at most nbytes from input into output.

This function is similar to Base.unsafe_read but is different in some points:

  • It does not throw EOFError when it fails to read nbytes from input.
  • It returns the number of bytes written to output.
  • It does not block if there are buffered data in input.
source
TranscodingStreams.unreadFunction
unread(stream::TranscodingStream, data::AbstractVector{UInt8})

Insert data to the current reading position of stream.

The next read(stream, sizeof(data)) call will read data that are just inserted.

data must not alias any internal buffers in stream

source
TranscodingStreams.unsafe_unreadFunction
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)

Insert nbytes bytes pointed to by data at the current reading position of stream.

The data are copied into the internal buffer and hence data can be safely used after the operation without interfering with the stream.

data must not alias any internal buffers in stream

source
Base.positionMethod
position(stream::TranscodingStream)

Return the number of bytes read from or written to stream.

Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.

source
Base.skipFunction
skip(stream::TranscodingStream, offset)

Read bytes from stream until offset bytes have been read or eof(stream) is reached.

Return stream, discarding read bytes.

This function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.

source

Statistics

TranscodingStreams.StatsType

I/O statistics.

Its object has four fields:

  • in: the number of bytes supplied into the stream
  • out: the number of bytes consumed out of the stream
  • transcoded_in: the number of bytes transcoded from the input buffer
  • transcoded_out: the number of bytes transcoded to the output buffer

Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.

source
TranscodingStreams.statsFunction
stats(stream::TranscodingStream)

Create an I/O statistics object of stream.

source

Codec

TranscodingStreams.NoopType
Noop()

Create a noop codec.

Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.

The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and the throughput will be larger than a naive implementation.

source
TranscodingStreams.NoopStreamType
NoopStream(stream::IO)

Create a noop stream.

source
Base.positionMethod
position(stream::NoopStream)

Get the current position of stream.

Note that this method may return a wrong position when

  • some data have been inserted by TranscodingStreams.unread, or
  • the position of the wrapped stream has been changed outside of this package.
source
TranscodingStreams.CodecType

An abstract codec type.

Any codec supporting the transcoding protocol must be a subtype of this type.

Transcoding protocol

Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.

There are seven functions for a codec to implement:

  • expectedsize: return the expected size of transcoded data
  • pledgeinsize: tell the codec the total input size
  • minoutsize: return the minimum output size of process
  • initialize: initialize the codec
  • finalize: finalize the codec
  • startproc: start processing with the codec
  • process: process data with the codec.

These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but others are optional. expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have a default implementation.

Your codec type is denoted by C and its object by codec.

Errors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.

expectedsize

The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizes and hence result in better performance.

pledgeinsize

The pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while streaming data. A negative insize means unknown content size.

minoutsize

The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.

initialize

The initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.

finalize

The finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the allocated memory in codec before returning or throwing an exception in finalize because otherwise nobody can release the memory. Even when an exception is thrown while finalizing a stream, the stream will go to the close mode for safety.

startproc

The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.

process

The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally you need to return the size of read data, the size of written data, and the :ok status code so that the caller can know how many bytes are consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.

source
TranscodingStreams.expectedsizeFunction
expectedsize(codec::Codec, input::Memory)::Int

Return the expected size of the transcoded input with codec.

The default method returns input.size.

source
TranscodingStreams.pledgeinsizeFunction
pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol

Tell the codec the total input size.

The default method does nothing and returns :ok.

source
TranscodingStreams.minoutsizeFunction
minoutsize(codec::Codec, input::Memory)::Int

Return the minimum output size to be ensured when calling process.

The default method returns max(1, div(input.size, 4)).

source
TranscodingStreams.initializeFunction
initialize(codec::Codec)::Void

Initialize codec.

The default method does nothing.

source
TranscodingStreams.finalizeFunction
finalize(codec::Codec)::Void

Finalize codec.

The default method does nothing.

source
TranscodingStreams.startprocFunction
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol

Start data processing with codec of mode.

The default method does nothing and returns :ok.

source
TranscodingStreams.processFunction
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}

Do data processing with codec.

There is no default method.

source

Internal types

TranscodingStreams.MemoryType

A contiguous memory.

This type works like a Vector.

source
TranscodingStreams.ErrorType

Container of transcoding error.

An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing failed. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).

source
TranscodingStreams.StateType

A mutable state type of transcoding streams.

See Developer's notes for details.

source
+source
TranscodingStreams.unsafe_transcode!Function
unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.transcode!Function
transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.TOKEN_ENDConstant

A special token indicating the end of data.

TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.

Note

Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.

source
TranscodingStreams.unsafe_readFunction
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int

Copy at most nbytes from input into output.

This function is similar to Base.unsafe_read but is different in some points:

  • It does not throw EOFError when it fails to read nbytes from input.
  • It returns the number of bytes written to output.
  • It does not block if there are buffered data in input.
source
TranscodingStreams.unreadFunction
unread(stream::TranscodingStream, data::AbstractVector{UInt8})

Insert data to the current reading position of stream.

The next read(stream, sizeof(data)) call will read data that are just inserted.

data must not alias any internal buffers in stream

source
TranscodingStreams.unsafe_unreadFunction
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)

Insert nbytes bytes pointed to by data at the current reading position of stream.

The data are copied into the internal buffer and hence data can be safely used after the operation without interfering with the stream.

data must not alias any internal buffers in stream

source
Base.positionMethod
position(stream::TranscodingStream)

Return the number of bytes read from or written to stream.

Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.

source
Base.skipFunction
skip(stream::TranscodingStream, offset)

Read bytes from stream until offset bytes have been read or eof(stream) is reached.

Return stream, discarding read bytes.

This function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.

source

Statistics

TranscodingStreams.StatsType

I/O statistics.

Its object has four fields:

  • in: the number of bytes supplied into the stream
  • out: the number of bytes consumed out of the stream
  • transcoded_in: the number of bytes transcoded from the input buffer
  • transcoded_out: the number of bytes transcoded to the output buffer

Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.

source
TranscodingStreams.statsFunction
stats(stream::TranscodingStream)

Create an I/O statistics object of stream.

source

Codec

TranscodingStreams.NoopType
Noop()

Create a noop codec.

Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.

The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and the throughput will be larger than a naive implementation.

source
TranscodingStreams.NoopStreamType
NoopStream(stream::IO)

Create a noop stream.

source
Base.positionMethod
position(stream::NoopStream)

Get the current position of stream.

Note that this method may return a wrong position when

  • some data have been inserted by TranscodingStreams.unread, or
  • the position of the wrapped stream has been changed outside of this package.
source
TranscodingStreams.CodecType

An abstract codec type.

Any codec supporting the transcoding protocol must be a subtype of this type.

Transcoding protocol

Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.

There are seven functions for a codec to implement:

  • expectedsize: return the expected size of transcoded data
  • pledgeinsize: tell the codec the total input size
  • minoutsize: return the minimum output size of process
  • initialize: initialize the codec
  • finalize: finalize the codec
  • startproc: start processing with the codec
  • process: process data with the codec.

These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but others are optional. expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have a default implementation.

Your codec type is denoted by C and its object by codec.

Errors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.

expectedsize

The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizes and hence result in better performance.

pledgeinsize

The pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. This is called after startproc and before process. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while processing data. A negative insize means unknown content size.

minoutsize

The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.

initialize

The initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.

finalize

The finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the allocated memory in codec before returning or throwing an exception in finalize because otherwise nobody can release the memory. Even when an exception is thrown while finalizing a stream, the stream will go to the close mode for safety.

startproc

The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.

process

The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally, you need to return the size of read data, the size of written data, and the :ok status code so that the caller knows how many bytes were consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.
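
To make the protocol concrete, here is a minimal sketch of a codec that implements only process. `UpcaseCodec` is a hypothetical name. It converts ASCII lowercase letters to uppercase and buffers nothing, so it returns :end as soon as it sees an empty input.

```julia
using TranscodingStreams

# Hypothetical stateless codec: ASCII lowercase letters become uppercase.
struct UpcaseCodec <: TranscodingStreams.Codec end

function TranscodingStreams.process(
    codec::UpcaseCodec,
    input::TranscodingStreams.Memory,
    output::TranscodingStreams.Memory,
    error::TranscodingStreams.Error,
)
    # An empty input marks the end of the stream; nothing is buffered here.
    if input.size == 0
        return 0, 0, :end
    end
    # Process only as many bytes as fit into the output region.
    n = Int(min(input.size, output.size))
    for i in 1:n
        b = input[i]
        output[i] = (UInt8('a') <= b <= UInt8('z')) ? b - 0x20 : b
    end
    # Report the bytes consumed and produced so the caller can advance.
    return n, n, :ok
end

result = transcode(UpcaseCodec(), Vector{UInt8}("hello, world"))
text = String(copy(result))
```

Because the other protocol methods have default implementations, `transcode(codec, data)` can be called without explicit `initialize` and `finalize` calls for this particular codec; a codec that allocates resources would still need them.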

TranscodingStreams.expectedsize - Function
expectedsize(codec::Codec, input::Memory)::Int

Return the expected size of the transcoded input with codec.

The default method returns input.size.

TranscodingStreams.pledgeinsize - Function
pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol

Tell the codec the total input size.

The default method does nothing and returns :ok.

TranscodingStreams.minoutsize - Function
minoutsize(codec::Codec, input::Memory)::Int

Return the minimum output size to be ensured when calling process.

The default method returns max(1, div(input.size, 4)).

TranscodingStreams.initialize - Function
initialize(codec::Codec)::Void

Initialize codec.

The default method does nothing.

TranscodingStreams.finalize - Function
finalize(codec::Codec)::Void

Finalize codec.

The default method does nothing.

TranscodingStreams.startproc - Function
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol

Start data processing with codec in the given mode.

The default method does nothing and returns :ok.

TranscodingStreams.process - Function
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}

Do data processing with codec.

There is no default method.


Internal types

TranscodingStreams.Memory - Type

A contiguous memory.

This type works like a Vector: it supports length and byte indexing.
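
A short sketch of that vector-like behavior, wrapping an ordinary byte vector. The variable names are illustrative, and `GC.@preserve` is used because Memory holds only a raw pointer to the data.

```julia
using TranscodingStreams

data = UInt8[0x61, 0x62, 0x63]

GC.@preserve data begin
    # Wrap the vector's bytes without copying them.
    mem = TranscodingStreams.Memory(pointer(data), sizeof(data))
    nbytes = length(mem)   # number of bytes in the region
    first_byte = mem[1]    # read a byte by 1-based index
    mem[3] = 0x7a          # write a byte; this mutates `data` in place
end
```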

TranscodingStreams.Error - Type

Container of transcoding error.

An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing failed. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).

TranscodingStreams.State - Type

A mutable state type of transcoding streams.

See Developer's notes for details.

diff --git a/previews/PR239/search_index.js b/previews/PR239/search_index.js index 4fce4d7..cbb0b33 100644 --- a/previews/PR239/search_index.js +++ b/previews/PR239/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Read-lines-from-a-gzip-compressed-file","page":"Examples","title":"Read lines from a gzip-compressed file","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The following snippet is an example of using CodecZlib.jl, which exports GzipDecompressorStream{S} as an alias of TranscodingStream{GzipDecompressor,S}, where S is a subtype of IO:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nstream = GzipDecompressorStream(open(\"data.txt.gz\"))\nfor line in eachline(stream)\n # do something...\nend\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Note that the last close call closes the wrapped file as well. Alternatively, open(, ) do ... end syntax closes the file at the end:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nopen(GzipDecompressorStream, \"data.txt.gz\") do stream\n for line in eachline(stream)\n # do something...\n end\nend","category":"page"},{"location":"examples/#Read-compressed-data-from-a-pipe","page":"Examples","title":"Read compressed data from a pipe","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The input is not limited to usual files. 
You can read data from a pipe (actually, any IO object that implements standard I/O methods) as follows:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nproc = open(`cat some.data.gz`)\nstream = GzipDecompressorStream(proc)\nfor line in eachline(stream)\n # do something...\nend\nclose(stream) # This will finish the process as well.","category":"page"},{"location":"examples/#Save-a-data-matrix-with-Zstd-compression","page":"Examples","title":"Save a data matrix with Zstd compression","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Writing compressed data is easy. One thing you need to keep in mind is to call close after writing data; otherwise, the output file will be incomplete:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing DelimitedFiles\nmat = randn(100, 100)\nstream = ZstdCompressorStream(open(\"data.mat.zst\", \"w\"))\nwritedlm(stream, mat)\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Of course, open(, ...) do ... end just works:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing DelimitedFiles\nmat = randn(100, 100)\nopen(ZstdCompressorStream, \"data.mat.zst\", \"w\") do stream\n writedlm(stream, mat)\nend","category":"page"},{"location":"examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END","page":"Examples","title":"Explicitly finish transcoding by writing TOKEN_END","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"When writing data, the end of a data stream is indicated by calling close, which writes an epilogue if necessary and flushes all buffered data to the underlying I/O stream. 
If you want to explicitly specify the end of a data chunk for some reason, you can write TranscodingStreams.TOKEN_END to the transcoding stream, which finishes the current transcoding process without closing the underlying stream:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing TranscodingStreams\nbuf = IOBuffer()\nstream = ZstdCompressorStream(buf)\nwrite(stream, \"foobarbaz\"^100, TranscodingStreams.TOKEN_END)\nflush(stream)\ncompressed = take!(buf)\nclose(stream)","category":"page"},{"location":"examples/#Use-a-noop-codec","page":"Examples","title":"Use a noop codec","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The Noop codec does nothing (i.e., buffering data without transformation). NoopStream is an alias of TranscodingStream{Noop}. The following example creates a decompressor stream based on the extension of a filepath:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing CodecXz\nusing TranscodingStreams\n\nfunction makestream(filepath)\n if endswith(filepath, \".gz\")\n codec = GzipDecompressor()\n elseif endswith(filepath, \".xz\")\n codec = XzDecompressor()\n else\n codec = Noop()\n end\n return TranscodingStream(codec, open(filepath))\nend\n\nmakestream(\"data.txt.gz\")\nmakestream(\"data.txt.xz\")\nmakestream(\"data.txt\")","category":"page"},{"location":"examples/#Change-the-codec-of-a-file","page":"Examples","title":"Change the codec of a file","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams are composable: a stream can be an input/output of another stream. 
You can use this to change the format of a file by composing different codecs as below:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing CodecZstd\n\ninput = open(\"data.txt.gz\", \"r\")\noutput = open(\"data.txt.zst\", \"w\")\n\nstream = GzipDecompressorStream(ZstdCompressorStream(output))\nwrite(stream, input)\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Effectively, this is equivalent to the following pipeline:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"cat data.txt.gz | gzip -d | zstd >data.txt.zst","category":"page"},{"location":"examples/#Stop-decoding-on-the-end-of-a-block","page":"Examples","title":"Stop decoding on the end of a block","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Many codecs support decoding concatenated data blocks (or chunks). For example, if you concatenate two gzip files into a single file and read it using GzipDecompressorStream, you will see the byte stream of concatenation of the two files. If you need the part corresponding the first file, you can set stop_on_end to true to stop transcoding at the end of the first block. Note that setting stop_on_end to true does not close the wrapped stream because you will often want to reuse it.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\n# cat foo.txt.gz bar.txt.gz > foobar.txt.gz\nstream = GzipDecompressorStream(open(\"foobar.txt.gz\"), stop_on_end=true)\nread(stream) #> the content of foo.txt\neof(stream) #> true","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"In the case where you need to reuse the wrapped stream, the code above must be slightly modified because the transcoding stream may read more bytes than necessary from the wrapped stream. 
Wrapping the stream with NoopStream solves the problem because any extra data read after the end of the chunk will be stored back in the internal buffer of the wrapped transcoding stream.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing TranscodingStreams\nstream = NoopStream(open(\"foobar.txt.gz\"))\nread(GzipDecompressorStream(stream, stop_on_end=true)) #> the content of foo.txt\nread(GzipDecompressorStream(stream, stop_on_end=true)) #> the content of bar.txt","category":"page"},{"location":"examples/#Check-I/O-statistics","page":"Examples","title":"Check I/O statistics","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams.stats returns a snapshot of the I/O statistics. For example, the following function shows progress of decompression to the standard error:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\n\nfunction decompress(input, output)\n buffer = Vector{UInt8}(undef, 16 * 1024)\n GC.@preserve buffer while !eof(input)\n n = min(bytesavailable(input), length(buffer))\n unsafe_read(input, pointer(buffer), n)\n unsafe_write(output, pointer(buffer), n)\n stats = TranscodingStreams.stats(input)\n print(STDERR, \"\\rin: $(stats.in), out: $(stats.out)\")\n end\n println(STDERR)\nend\n\ninput = GzipDecompressorStream(open(\"foobar.txt.gz\"))\noutput = IOBuffer()\ndecompress(input, output)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"stats.in is the number of bytes supplied to the stream and stats.out is the number of bytes consumed out of the stream.","category":"page"},{"location":"examples/#Transcode-data-in-one-shot","page":"Examples","title":"Transcode data in one shot","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams.jl extends the transcode function to transcode 
a data in one shot. transcode takes a codec object as its first argument and a data vector as its second argument:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\ndecompressed = transcode(ZlibDecompressor, b\"x\\x9cKL*JLNLI\\x04R\\x00\\x19\\xf2\\x04U\")\nString(decompressed)","category":"page"},{"location":"examples/#Transcode-lots-of-strings","page":"Examples","title":"Transcode lots of strings","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"transcode(, data) method is convenient but suboptimal when transcoding a number of objects. This is because the method reallocates a new codec object for every call. Instead, you can use transcode(, data) method that reuses the allocated object as follows. In this usage, you need to explicitly allocate and free resources by calling TranscodingStreams.initialize and TranscodingStreams.finalize, respectively.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing TranscodingStreams\nstrings = [\"foo\", \"bar\", \"baz\"]\ncodec = ZstdCompressor()\nTranscodingStreams.initialize(codec) # allocate resources\ntry\n for s in strings\n data = transcode(codec, s)\n # do something...\n end\ncatch\n rethrow()\nfinally\n TranscodingStreams.finalize(codec) # free resources\nend","category":"page"},{"location":"examples/#Unread-data","page":"Examples","title":"Unread data","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStream supports unread operation, which inserts data into the current reading position. This is useful when you want to peek from the stream. 
TranscodingStreams.unread and TranscodingStreams.unsafe_unread functions are provided:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using TranscodingStreams\nstream = NoopStream(open(\"data.txt\"))\ndata1 = read(stream, 8)\nTranscodingStreams.unread(stream, data1)\ndata2 = read(stream, 8)\n@assert data1 == data2","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The unread operation is different from the write operation in that the unreaded data are not written to the wrapped stream. The unreaded data are stored in the internal buffer of a transcoding stream.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Unfortunately, unwrite operation is not provided because there is no way to cancel write operations that are already committed to the wrapped stream.","category":"page"},{"location":"reference/#Reference","page":"Reference","title":"Reference","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"CurrentModule = TranscodingStreams","category":"page"},{"location":"reference/#TranscodingStream","page":"Reference","title":"TranscodingStream","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStream(codec::Codec, stream::IO)\ntranscode\nTranscodingStreams.unsafe_transcode!\n\nTranscodingStreams.transcode!\nTranscodingStreams.TOKEN_END\nTranscodingStreams.unsafe_read\nTranscodingStreams.unread\nTranscodingStreams.unsafe_unread\nBase.position(stream::TranscodingStream)\nBase.skip","category":"page"},{"location":"reference/#TranscodingStreams.TranscodingStream-Tuple{TranscodingStreams.Codec, IO}","page":"Reference","title":"TranscodingStreams.TranscodingStream","text":"TranscodingStream(codec::Codec, stream::IO;\n bufsize::Integer=16384,\n stop_on_end::Bool=false,\n sharedbuf::Bool=(stream isa TranscodingStream))\n\nCreate a transcoding 
stream with codec and stream.\n\nA TranscodingStream object wraps an input/output stream object stream, and transcodes the byte stream using codec. It is a subtype of IO and supports most of the I/O functions in the standard library.\n\nSee the docs (https://bicycle1885.github.io/TranscodingStreams.jl/stable/) for available codecs, examples, and more details of the type.\n\nArguments\n\ncodec: The data transcoder. The transcoding stream does the initialization and finalization of codec. Therefore, a codec object is not reusable once it is passed to a transcoding stream.\nstream: The wrapped stream. It must be opened before passed to the constructor.\nbufsize: The initial buffer size (the default size is 16KiB). The buffer may be extended whenever codec requests so.\nstop_on_end: The flag to stop reading on :end return code from codec. The transcoded data are readable even after stopping transcoding process. With this flag on, stream is not closed when the wrapper stream is closed with close. Note that if reading some extra data may be read from stream into an internal buffer, and thus stream must be a TranscodingStream object and sharedbuf must be true to reuse stream.\nsharedbuf: The flag to share buffers between adjacent transcoding streams. 
The value must be false if stream is not a TranscodingStream object.\n\nExamples\n\njulia> using TranscodingStreams\n\njulia> file = open(joinpath(dirname(dirname(pathof(TranscodingStreams))), \"README.md\"));\n\njulia> stream = TranscodingStream(Noop(), file);\n\njulia> readline(file)\n\"TranscodingStreams.jl\"\n\njulia> close(stream)\n\n\n\n\n\n","category":"method"},{"location":"reference/#Base.transcode","page":"Reference","title":"Base.transcode","text":"transcode(\n ::Type{C},\n data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},\n)::Vector{UInt8} where {C<:Codec}\n\nTranscode data by applying a codec C().\n\nNote that this method does allocation and deallocation of C() in every call, which is handy but less efficient when transcoding a number of objects. transcode(codec, data) is a recommended method in terms of performance.\n\nExamples\n\njulia> using CodecZlib\n\njulia> data = b\"abracadabra\";\n\njulia> compressed = transcode(ZlibCompressor, data);\n\njulia> decompressed = transcode(ZlibDecompressor, compressed);\n\njulia> String(decompressed)\n\"abracadabra\"\n\n\n\n\n\ntranscode(\n codec::Codec,\n data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},\n [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],\n)::Vector{UInt8}\n\nTranscode data by applying codec.\n\nIf output is unspecified, then this method will allocate it.\n\nNote that this method does not initialize or finalize codec. 
This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\nExamples\n\njulia> using CodecZlib\n\njulia> data = b\"abracadabra\";\n\njulia> codec = ZlibCompressor();\n\njulia> TranscodingStreams.initialize(codec)\n\njulia> compressed = Vector{UInt8}()\n\njulia> transcode(codec, data, compressed);\n\njulia> TranscodingStreams.finalize(codec)\n\njulia> codec = ZlibDecompressor();\n\njulia> TranscodingStreams.initialize(codec)\n\njulia> decompressed = transcode(codec, compressed);\n\njulia> TranscodingStreams.finalize(codec)\n\njulia> String(decompressed)\n\"abracadabra\"\n\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unsafe_transcode!","page":"Reference","title":"TranscodingStreams.unsafe_transcode!","text":"unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)\n\nTranscode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.transcode!","page":"Reference","title":"TranscodingStreams.transcode!","text":"transcode!(output::Buffer, codec::Codec, input::Buffer)\n\nTranscode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. 
This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.TOKEN_END","page":"Reference","title":"TranscodingStreams.TOKEN_END","text":"A special token indicating the end of data.\n\nTOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.\n\nnote: Note\nCall flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.\n\n\n\n\n\n","category":"constant"},{"location":"reference/#TranscodingStreams.unsafe_read","page":"Reference","title":"TranscodingStreams.unsafe_read","text":"unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int\n\nCopy at most nbytes from input into output.\n\nThis function is similar to Base.unsafe_read but is different in some points:\n\nIt does not throw EOFError when it fails to read nbytes from input.\nIt returns the number of bytes written to output.\nIt does not block if there are buffered data in input.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unread","page":"Reference","title":"TranscodingStreams.unread","text":"unread(stream::TranscodingStream, data::AbstractVector{UInt8})\n\nInsert data to the current reading position of stream.\n\nThe next read(stream, sizeof(data)) call will read data that are just inserted.\n\ndata must not alias any internal buffers in stream\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unsafe_unread","page":"Reference","title":"TranscodingStreams.unsafe_unread","text":"unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)\n\nInsert nbytes pointed by data to the current reading position of stream.\n\nThe data are copied into the internal buffer and hence data can be safely used after the operation without interfering the 
stream.\n\ndata must not alias any internal buffers in stream\n\n\n\n\n\n","category":"function"},{"location":"reference/#Base.position-Tuple{TranscodingStream}","page":"Reference","title":"Base.position","text":"position(stream::TranscodingStream)\n\nReturn the number of bytes read from or written to stream.\n\nNote that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.\n\n\n\n\n\n","category":"method"},{"location":"reference/#Base.skip","page":"Reference","title":"Base.skip","text":"skip(stream::TranscodingStream, offset)\n\nRead bytes from stream until offset bytes have been read or eof(stream) is reached.\n\nReturn stream, discarding read bytes.\n\nThis function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Statistics","page":"Reference","title":"Statistics","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Stats\nTranscodingStreams.stats","category":"page"},{"location":"reference/#TranscodingStreams.Stats","page":"Reference","title":"TranscodingStreams.Stats","text":"I/O statistics.\n\nIts object has four fields:\n\nin: the number of bytes supplied into the stream\nout: the number of bytes consumed out of the stream\ntranscoded_in: the number of bytes transcoded from the input buffer\ntranscoded_out: the number of bytes transcoded to the output buffer\n\nNote that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.stats","page":"Reference","title":"TranscodingStreams.stats","text":"stats(stream::TranscodingStream)\n\nCreate an I/O statistics object of 
stream.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Codec","page":"Reference","title":"Codec","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Noop\nTranscodingStreams.NoopStream\nBase.position(::NoopStream)","category":"page"},{"location":"reference/#TranscodingStreams.Noop","page":"Reference","title":"TranscodingStreams.Noop","text":"Noop()\n\nCreate a noop codec.\n\nNoop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.\n\nThe implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and the throughput will be larger than a naive implementation.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.NoopStream","page":"Reference","title":"TranscodingStreams.NoopStream","text":"NoopStream(stream::IO)\n\nCreate a noop stream.\n\n\n\n\n\n","category":"type"},{"location":"reference/#Base.position-Tuple{NoopStream}","page":"Reference","title":"Base.position","text":"position(stream::NoopStream)\n\nGet the current poition of stream.\n\nNote that this method may return a wrong position when\n\nsome data have been inserted by TranscodingStreams.unread, or\nthe position of the wrapped stream has been changed outside of this 
package.\n\n\n\n\n\n","category":"method"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Codec\nTranscodingStreams.expectedsize\nTranscodingStreams.pledgeinsize\nTranscodingStreams.minoutsize\nTranscodingStreams.initialize\nTranscodingStreams.finalize\nTranscodingStreams.startproc\nTranscodingStreams.process","category":"page"},{"location":"reference/#TranscodingStreams.Codec","page":"Reference","title":"TranscodingStreams.Codec","text":"An abstract codec type.\n\nAny codec supporting the transcoding protocol must be a subtype of this type.\n\nTranscoding protocol\n\nTranscoding proceeds by calling some functions in a specific way. We call this \"transcoding protocol\" and any codec must implement it as described below.\n\nThere are six functions for a codec to implement:\n\nexpectedsize: return the expected size of transcoded data\npledgeinsize: tell the codec the total input size\nminoutsize: return the minimum output size of process\ninitialize: initialize the codec\nfinalize: finalize the codec\nstartproc: start processing with the codec\nprocess: process data with the codec.\n\nThese are defined in the TranscodingStreams and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but others are optional. expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have a default implementation.\n\nYour codec type is denoted by C and its object by codec.\n\nErrors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.\n\nexpectedsize\n\nThe expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. 
A good hint will reduce the number of buffer resizing and hence result in better performance.\n\npledgeinsize\n\nThe pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while streaming data. A negative insize means unknown content size.\n\nminoutsize\n\nThe minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.\n\ninitialize\n\nThe initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.\n\nfinalize\n\nThe finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. 
You should release the allocated memory in codec before returning or throwing an exception in finalize because otherwise nobody cannot release the memory. Even when an exception is thrown while finalizing a stream, the stream will become the close mode for safety.\n\nstartproc\n\nThe startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.\n\nprocess\n\nThe process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are a Memory object, which is a pointer to a contiguous memory region with size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally you need to return the size of read data, the size of written data, and :ok status code so that the caller can know how many bytes are consumed and produced in the method. When transcoding reaches the end of a data stream, it is notified to this method by empty input. In that case, the method need to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns :end status code. 
If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.expectedsize","page":"Reference","title":"TranscodingStreams.expectedsize","text":"expectedsize(codec::Codec, input::Memory)::Int\n\nReturn the expected size of the transcoded input with codec.\n\nThe default method returns input.size.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.pledgeinsize","page":"Reference","title":"TranscodingStreams.pledgeinsize","text":"pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol\n\nTell the codec the total input size.\n\nThe default method does nothing and returns :ok.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.minoutsize","page":"Reference","title":"TranscodingStreams.minoutsize","text":"minoutsize(codec::Codec, input::Memory)::Int\n\nReturn the minimum output size to be ensured when calling process.\n\nThe default method returns max(1, div(input.size, 4)).\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.initialize","page":"Reference","title":"TranscodingStreams.initialize","text":"initialize(codec::Codec)::Void\n\nInitialize codec.\n\nThe default method does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.finalize","page":"Reference","title":"TranscodingStreams.finalize","text":"finalize(codec::Codec)::Void\n\nFinalize codec.\n\nThe default method does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.startproc","page":"Reference","title":"TranscodingStreams.startproc","text":"startproc(codec::Codec, mode::Symbol, error::Error)::Symbol\n\nStart data processing with codec of mode.\n\nThe default method does nothing and returns 
:ok.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.process","page":"Reference","title":"TranscodingStreams.process","text":"process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}\n\nDo data processing with codec.\n\nThere is no default method.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Internal-types","page":"Reference","title":"Internal types","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Memory\nTranscodingStreams.Error\nTranscodingStreams.State","category":"page"},{"location":"reference/#TranscodingStreams.Memory","page":"Reference","title":"TranscodingStreams.Memory","text":"A contiguous memory.\n\nThis type works like a Vector method.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.Error","page":"Reference","title":"TranscodingStreams.Error","text":"Container of transcoding error.\n\nAn object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing failed. The error should be set by calling the setindex! method (e.g. 
error[] = ErrorException(\"error!\")).\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.State","page":"Reference","title":"TranscodingStreams.State","text":"A mutable state type of transcoding streams.\n\nSee Developer's notes for details.\n\n\n\n\n\n","category":"type"},{"location":"migrating/#Migration","page":"Migration","title":"Migration","text":"","category":"section"},{"location":"migrating/#How-to-migrate-from-v0.10-to-v0.11","page":"Migration","title":"How to migrate from v0.10 to v0.11","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"v0.11 has a few subtle breaking changes to eof and seekend.","category":"page"},{"location":"migrating/#Memory(data::ByteData)","page":"Migration","title":"Memory(data::ByteData)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"The Memory(data::ByteData) constructor was removed. Use Memory(pointer(data), sizeof(data)) instead.","category":"page"},{"location":"migrating/#seekend(stream::TranscodingStream)","page":"Migration","title":"seekend(stream::TranscodingStream)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"Generic seekend for TranscodingStream was removed. If the objective is to discard all remaining data in the stream, use skip(stream, typemax(Int64)) instead where typemax(Int64) is meant to be a large number to exhaust the stream. Ideally, specific implementations of TranscodingStream will implement seekend only if efficient means exist to avoid fully processing the stream. 
NoopStream still supports seekend.","category":"page"},{"location":"migrating/","page":"Migration","title":"Migration","text":"The previous behavior of the generic seekend was something like (seekstart(stream); seekend(stream.stream); stream) but this led to inconsistencies with the position of the stream.","category":"page"},{"location":"migrating/#eof(stream::TranscodingStream)","page":"Migration","title":"eof(stream::TranscodingStream)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"eof now throws an error if called on a stream that is closed or in writing mode. Use !isreadable(stream) || eof(stream) if you need to more closely match previous behavior.","category":"page"},{"location":"#Home","page":"Home","title":"Home","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: TranscodingStream)","category":"page"},{"location":"#Overview","page":"Home","title":"Overview","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TranscodingStreams.jl is a package for transcoding data streams. Transcoding may be compression, decompression, ASCII encoding, and any other codec. The package exports a data type TranscodingStream, which is a subtype of IO and wraps other IO object to transcode data read from or written to the wrapped stream.","category":"page"},{"location":"","page":"Home","title":"Home","text":"In this page, we introduce the basic concepts of TranscodingStreams.jl and currently available packages. The Examples page demonstrates common usage. The Reference page offers a comprehensive API document.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TranscodingStream has two type parameters, C<:Codec and S<:IO, and hence the concrete data type is written as TranscodingStream{C<:Codec,S<:IO}. 
This type wraps an underlying I/O stream S by a transcoding codec C. C and S are orthogonal and hence you can use any combination of these two types. The underlying stream may be any stream that supports I/O operations defined by the Base module. For example, it may be IOStream, TTY, IOBuffer, or TranscodingStream. The codec C must define the transcoding protocol defined in this package. We already have various codecs in packages listed below. Of course, you can define your own codec by implementing the transcoding protocol described in TranscodingStreams.Codec.","category":"page"},{"location":"","page":"Home","title":"Home","text":"You can install codec packages using the standard package manager. These codec packages are independent of each other and can be installed separately. You won't need to explicitly install the TranscodingStreams.jl package unless you will use lower-level interfaces of it. Each codec package defines some codec types, which is a subtype of TranscodingStreams.Codec, and their corresponding transcoding stream aliases. These aliases are partially instantiated by a codec type; for example, GzipDecompressionStream{S} is an alias of TranscodingStream{GzipDecompressor,S}, where S is a subtype of IO.","category":"page"},{"location":"","page":"Home","title":"Home","text":"\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
| Package | Library | Format | Codec | Stream alias | Description |
|---|---|---|---|---|---|
| CodecZlib.jl | zlib | RFC1952 | GzipCompressor | GzipCompressorStream | Compress data in gzip (.gz) format. |
| | | | GzipDecompressor | GzipDecompressorStream | Decompress data in gzip (.gz) format. |
| | | RFC1950 | ZlibCompressor | ZlibCompressorStream | Compress data in zlib format. |
| | | | ZlibDecompressor | ZlibDecompressorStream | Decompress data in zlib format. |
| | | RFC1951 | DeflateCompressor | DeflateCompressorStream | Compress data in deflate format. |
| | | | DeflateDecompressor | DeflateDecompressorStream | Decompress data in deflate format. |
| CodecXz.jl | xz | The .xz File Format | XzCompressor | XzCompressorStream | Compress data in xz (.xz) format. |
| | | | XzDecompressor | XzDecompressorStream | Decompress data in xz (.xz) format. |
| CodecZstd.jl | zstd | Zstandard Compression Format | ZstdCompressor | ZstdCompressorStream | Compress data in zstd (.zst) format. |
| | | | ZstdDecompressor | ZstdDecompressorStream | Decompress data in zstd (.zst) format. |
| CodecBase.jl | native | RFC4648 | Base16Encoder | Base16EncoderStream | Encode binary in base16 format. |
| | | | Base16Decoder | Base16DecoderStream | Decode binary in base16 format. |
| | | | Base32Encoder | Base32EncoderStream | Encode binary in base32 format. |
| | | | Base32Decoder | Base32DecoderStream | Decode binary in base32 format. |
| | | | Base64Encoder | Base64EncoderStream | Encode binary in base64 format. |
| | | | Base64Decoder | Base64DecoderStream | Decode binary in base64 format. |
| CodecBzip2.jl | bzip2 | | Bzip2Compressor | Bzip2CompressorStream | Compress data in bzip2 (.bz2) format. |
| | | | Bzip2Decompressor | Bzip2DecompressorStream | Decompress data in bzip2 (.bz2) format. |
","category":"page"},{"location":"#Notes","page":"Home","title":"Notes","text":"","category":"section"},{"location":"#Wrapped-streams","page":"Home","title":"Wrapped streams","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The wrapper stream takes care of the wrapped stream. Reading or writing data from or to the wrapped stream outside the management will result in unexpected behaviors. When you close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.","category":"page"},{"location":"#Error-handling","page":"Home","title":"Error handling","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. 
Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.","category":"page"},{"location":"devnotes/#Developer's-notes","page":"Developer's notes","title":"Developer's notes","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"These notes are not for end users but rather for developers who are interested in the design of the package.","category":"page"},{"location":"devnotes/#TranscodingStream-type","page":"Developer's notes","title":"TranscodingStream type","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"TranscodingStream{C,S} (defined in src/stream.jl) has three fields:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"codec: data codec (<:C where C<:Codec)\nstream: data stream (<:S where S<:IO)\nstate: current state (<:State).","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A codec will be implemented by package developers and only a special codec Noop is defined in this package. A stream can be any object that implements at least Base.isopen, Base.eof, Base.close, Base.bytesavailable, Base.unsafe_read, and Base.unsafe_write. All mutable fields are delegated to state and hence the stream type itself is immutable.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A stream has two buffers in the state field. These are used to store pre-transcoded and transcoded data in the stream. The stream passes references of these two buffers to the codec when processing data. 
The following diagram illustrates the flow of data:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"When reading data (`state.mode == :read`):\n user <--- |state.buffer1| <--- <stream.codec> <--- |state.buffer2| <--- stream\n\nWhen writing data (`state.mode == :write`):\n user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"In the read mode, a user pulls data out of state.buffer1 and pre-transcoded data are filled in state.buffer2. In the write mode, a user pushes data into state.buffer1 and transcoded data are filled in state.buffer2. The default buffer size is 16KiB for each.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"State (defined in src/state.jl) has six fields:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"mode: current stream mode (<:Symbol)\ncode: return code of the last codec's method call (<:Symbol)\nerror: exception returned by the codec (<:Error)\nbuffer1: data buffer that is closer to the user (<:Buffer)\nbuffer2: data buffer that is farther from the user (<:Buffer)\nbytes_written_out: number of bytes written to the underlying stream (<:Int64)","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"The mode field may be one of the following values:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":":idle : initial and intermediate mode, no buffered data\n:read : being ready to read data, data may be buffered\n:write: being ready to write data, data may be buffered\n:stop : transcoding is stopped after read, data may be buffered\n:close: closed, no buffered data\n:panic: an exception has been thrown in codec, data may be buffered but we cannot do 
anything","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"The initial mode is :idle and mode transitions happen as shown in the following diagram: (Image: Mode transition)","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Modes surrounded by a bold circle are states in which the transcoding stream has released resources by calling finalize(codec). The mode transition should happen in the changemode!(stream, newmode) function in src/stream.jl. Trying an undefined transition will throw an exception.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A transition happens according to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec returns :error and the stream enters the :panic mode.","category":"page"},{"location":"devnotes/#Shared-buffers","page":"Developer's notes","title":"Shared buffers","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Adjacent transcoding streams may share their buffers. 
This will reduce memory allocation and eliminate data copying between buffers.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"If buffer2 is shared, the stats and position functions consider it to be owned by the underlying stream.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"readdata!(input::IO, output::Buffer) and flush_buffer2(stream::TranscodingStream) do the actual work of reading/writing data from/to the underlying stream. These methods have a special path for shared buffers.","category":"page"},{"location":"devnotes/#Noop-codec","page":"Developer's notes","title":"Noop codec","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. 
All of these are defined in src/noop.jl.","category":"page"}] +[{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Read-lines-from-a-gzip-compressed-file","page":"Examples","title":"Read lines from a gzip-compressed file","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The following snippet is an example of using CodecZlib.jl, which exports GzipDecompressorStream{S} as an alias of TranscodingStream{GzipDecompressor,S}, where S is a subtype of IO:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nstream = GzipDecompressorStream(open(\"data.txt.gz\"))\nfor line in eachline(stream)\n # do something...\nend\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Note that the last close call closes the wrapped file as well. Alternatively, open(, ) do ... end syntax closes the file at the end:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nopen(GzipDecompressorStream, \"data.txt.gz\") do stream\n for line in eachline(stream)\n # do something...\n end\nend","category":"page"},{"location":"examples/#Read-compressed-data-from-a-pipe","page":"Examples","title":"Read compressed data from a pipe","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The input is not limited to usual files. 
You can read data from a pipe (actually, any IO object that implements standard I/O methods) as follows:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nproc = open(`cat some.data.gz`)\nstream = GzipDecompressorStream(proc)\nfor line in eachline(stream)\n # do something...\nend\nclose(stream) # This will finish the process as well.","category":"page"},{"location":"examples/#Save-a-data-matrix-with-Zstd-compression","page":"Examples","title":"Save a data matrix with Zstd compression","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Writing compressed data is easy. One thing you need to keep in mind is to call close after writing data; otherwise, the output file will be incomplete:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing DelimitedFiles\nmat = randn(100, 100)\nstream = ZstdCompressorStream(open(\"data.mat.zst\", \"w\"))\nwritedlm(stream, mat)\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Of course, open(, ...) do ... end just works:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing DelimitedFiles\nmat = randn(100, 100)\nopen(ZstdCompressorStream, \"data.mat.zst\", \"w\") do stream\n writedlm(stream, mat)\nend","category":"page"},{"location":"examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END","page":"Examples","title":"Explicitly finish transcoding by writing TOKEN_END","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"When writing data, the end of a data stream is indicated by calling close, which writes an epilogue if necessary and flushes all buffered data to the underlying I/O stream. 
If you want to explicitly specify the end of a data chunk for some reason, you can write TranscodingStreams.TOKEN_END to the transcoding stream, which finishes the current transcoding process without closing the underlying stream:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing TranscodingStreams\nbuf = IOBuffer()\nstream = ZstdCompressorStream(buf)\nwrite(stream, \"foobarbaz\"^100, TranscodingStreams.TOKEN_END)\nflush(stream)\ncompressed = take!(buf)\nclose(stream)","category":"page"},{"location":"examples/#Use-a-noop-codec","page":"Examples","title":"Use a noop codec","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"The Noop codec does nothing (i.e., buffering data without transformation). NoopStream is an alias of TranscodingStream{Noop}. The following example creates a decompressor stream based on the extension of a filepath:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing CodecXz\nusing TranscodingStreams\n\nfunction makestream(filepath)\n if endswith(filepath, \".gz\")\n codec = GzipDecompressor()\n elseif endswith(filepath, \".xz\")\n codec = XzDecompressor()\n else\n codec = Noop()\n end\n return TranscodingStream(codec, open(filepath))\nend\n\nmakestream(\"data.txt.gz\")\nmakestream(\"data.txt.xz\")\nmakestream(\"data.txt\")","category":"page"},{"location":"examples/#Change-the-codec-of-a-file","page":"Examples","title":"Change the codec of a file","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams are composable: a stream can be an input/output of another stream. 
You can use this to change the format of a file by composing different codecs as below:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing CodecZstd\n\ninput = open(\"data.txt.gz\", \"r\")\noutput = open(\"data.txt.zst\", \"w\")\n\nstream = GzipDecompressorStream(ZstdCompressorStream(output))\nwrite(stream, input)\nclose(stream)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Effectively, this is equivalent to the following pipeline:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"cat data.txt.gz | gzip -d | zstd >data.txt.zst","category":"page"},{"location":"examples/#Stop-decoding-on-the-end-of-a-block","page":"Examples","title":"Stop decoding on the end of a block","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"Many codecs support decoding concatenated data blocks (or chunks). For example, if you concatenate two gzip files into a single file and read it using GzipDecompressorStream, you will see the concatenated byte stream of the two files. If you need only the part corresponding to the first file, you can set stop_on_end to true to stop transcoding at the end of the first block. Note that setting stop_on_end to true does not close the wrapped stream because you will often want to reuse it.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\n# cat foo.txt.gz bar.txt.gz > foobar.txt.gz\nstream = GzipDecompressorStream(open(\"foobar.txt.gz\"), stop_on_end=true)\nread(stream) #> the content of foo.txt\neof(stream) #> true","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"In the case where you need to reuse the wrapped stream, the code above must be slightly modified because the transcoding stream may read more bytes than necessary from the wrapped stream. 
Wrapping the stream with NoopStream solves the problem because any extra data read after the end of the chunk will be stored back in the internal buffer of the wrapped transcoding stream.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing TranscodingStreams\nstream = NoopStream(open(\"foobar.txt.gz\"))\nread(GzipDecompressorStream(stream, stop_on_end=true)) #> the content of foo.txt\nread(GzipDecompressorStream(stream, stop_on_end=true)) #> the content of bar.txt","category":"page"},{"location":"examples/#Check-I/O-statistics","page":"Examples","title":"Check I/O statistics","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams.stats returns a snapshot of the I/O statistics. For example, the following function reports decompression progress to the standard error:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\nusing TranscodingStreams\n\nfunction decompress(input, output)\n buffer = Vector{UInt8}(undef, 16 * 1024)\n GC.@preserve buffer while !eof(input)\n n = min(bytesavailable(input), length(buffer))\n unsafe_read(input, pointer(buffer), n)\n unsafe_write(output, pointer(buffer), n)\n stats = TranscodingStreams.stats(input)\n print(stderr, \"\\rin: $(stats.in), out: $(stats.out)\")\n end\n println(stderr)\nend\n\ninput = GzipDecompressorStream(open(\"foobar.txt.gz\"))\noutput = IOBuffer()\ndecompress(input, output)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"stats.in is the number of bytes supplied to the stream and stats.out is the number of bytes consumed out of the stream.","category":"page"},{"location":"examples/#Transcode-data-in-one-shot","page":"Examples","title":"Transcode data in one shot","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStreams.jl extends the transcode function to transcode 
a data in one shot. transcode takes a codec object as its first argument and a data vector as its second argument:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZlib\ndecompressed = transcode(ZlibDecompressor, b\"x\\x9cKL*JLNLI\\x04R\\x00\\x19\\xf2\\x04U\")\nString(decompressed)","category":"page"},{"location":"examples/#Transcode-lots-of-strings","page":"Examples","title":"Transcode lots of strings","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"transcode(, data) method is convenient but suboptimal when transcoding a number of objects. This is because the method reallocates a new codec object for every call. Instead, you can use transcode(, data) method that reuses the allocated object as follows. In this usage, you need to explicitly allocate and free resources by calling TranscodingStreams.initialize and TranscodingStreams.finalize, respectively.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using CodecZstd\nusing TranscodingStreams\nstrings = [\"foo\", \"bar\", \"baz\"]\ncodec = ZstdCompressor()\nTranscodingStreams.initialize(codec) # allocate resources\ntry\n for s in strings\n data = transcode(codec, s)\n # do something...\n end\ncatch\n rethrow()\nfinally\n TranscodingStreams.finalize(codec) # free resources\nend","category":"page"},{"location":"examples/#Unread-data","page":"Examples","title":"Unread data","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"TranscodingStream supports unread operation, which inserts data into the current reading position. This is useful when you want to peek from the stream. 
TranscodingStreams.unread and TranscodingStreams.unsafe_unread functions are provided:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using TranscodingStreams\nstream = NoopStream(open(\"data.txt\"))\ndata1 = read(stream, 8)\nTranscodingStreams.unread(stream, data1)\ndata2 = read(stream, 8)\n@assert data1 == data2","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The unread operation is different from the write operation in that the unread data are not written to the wrapped stream. The unread data are stored in the internal buffer of a transcoding stream.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Unfortunately, an unwrite operation is not provided because there is no way to cancel write operations that are already committed to the wrapped stream.","category":"page"},{"location":"reference/#Reference","page":"Reference","title":"Reference","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"CurrentModule = TranscodingStreams","category":"page"},{"location":"reference/#TranscodingStream","page":"Reference","title":"TranscodingStream","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStream(codec::Codec, stream::IO)\ntranscode\nTranscodingStreams.unsafe_transcode!\n\nTranscodingStreams.transcode!\nTranscodingStreams.TOKEN_END\nTranscodingStreams.unsafe_read\nTranscodingStreams.unread\nTranscodingStreams.unsafe_unread\nBase.position(stream::TranscodingStream)\nBase.skip","category":"page"},{"location":"reference/#TranscodingStreams.TranscodingStream-Tuple{TranscodingStreams.Codec, IO}","page":"Reference","title":"TranscodingStreams.TranscodingStream","text":"TranscodingStream(codec::Codec, stream::IO;\n bufsize::Integer=16384,\n stop_on_end::Bool=false,\n sharedbuf::Bool=(stream isa TranscodingStream))\n\nCreate a transcoding 
stream with codec and stream.\n\nA TranscodingStream object wraps an input/output stream object stream, and transcodes the byte stream using codec. It is a subtype of IO and supports most of the I/O functions in the standard library.\n\nSee the docs (https://bicycle1885.github.io/TranscodingStreams.jl/stable/) for available codecs, examples, and more details of the type.\n\nArguments\n\ncodec: The data transcoder. The transcoding stream does the initialization and finalization of codec. Therefore, a codec object is not reusable once it is passed to a transcoding stream.\nstream: The wrapped stream. It must be opened before being passed to the constructor.\nbufsize: The initial buffer size (the default size is 16KiB). The buffer may be extended whenever codec requests so.\nstop_on_end: The flag to stop reading on the :end return code from codec. The transcoded data are readable even after the transcoding process stops. With this flag on, stream is not closed when the wrapper stream is closed with close. Note that some extra data may be read from stream into an internal buffer, and thus stream must be a TranscodingStream object and sharedbuf must be true to reuse stream.\nsharedbuf: The flag to share buffers between adjacent transcoding streams. 
The value must be false if stream is not a TranscodingStream object.\n\nExamples\n\njulia> using TranscodingStreams\n\njulia> file = open(joinpath(dirname(dirname(pathof(TranscodingStreams))), \"README.md\"));\n\njulia> stream = TranscodingStream(Noop(), file);\n\njulia> readline(file)\n\"TranscodingStreams.jl\"\n\njulia> close(stream)\n\n\n\n\n\n","category":"method"},{"location":"reference/#Base.transcode","page":"Reference","title":"Base.transcode","text":"transcode(\n ::Type{C},\n data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},\n)::Vector{UInt8} where {C<:Codec}\n\nTranscode data by applying a codec C().\n\nNote that this method does allocation and deallocation of C() in every call, which is handy but less efficient when transcoding a number of objects. transcode(codec, data) is a recommended method in terms of performance.\n\nExamples\n\njulia> using CodecZlib\n\njulia> data = b\"abracadabra\";\n\njulia> compressed = transcode(ZlibCompressor, data);\n\njulia> decompressed = transcode(ZlibDecompressor, compressed);\n\njulia> String(decompressed)\n\"abracadabra\"\n\n\n\n\n\ntranscode(\n codec::Codec,\n data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},\n [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],\n)::Vector{UInt8}\n\nTranscode data by applying codec.\n\nIf output is unspecified, then this method will allocate it.\n\nNote that this method does not initialize or finalize codec. 
This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\nExamples\n\njulia> using CodecZlib\n\njulia> data = b\"abracadabra\";\n\njulia> codec = ZlibCompressor();\n\njulia> TranscodingStreams.initialize(codec)\n\njulia> compressed = Vector{UInt8}()\n\njulia> transcode(codec, data, compressed);\n\njulia> TranscodingStreams.finalize(codec)\n\njulia> codec = ZlibDecompressor();\n\njulia> TranscodingStreams.initialize(codec)\n\njulia> decompressed = transcode(codec, compressed);\n\njulia> TranscodingStreams.finalize(codec)\n\njulia> String(decompressed)\n\"abracadabra\"\n\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unsafe_transcode!","page":"Reference","title":"TranscodingStreams.unsafe_transcode!","text":"unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)\n\nTranscode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.transcode!","page":"Reference","title":"TranscodingStreams.transcode!","text":"transcode!(output::Buffer, codec::Codec, input::Buffer)\n\nTranscode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. 
This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.TOKEN_END","page":"Reference","title":"TranscodingStreams.TOKEN_END","text":"A special token indicating the end of data.\n\nTOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.\n\nnote: Note\nCall flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.\n\n\n\n\n\n","category":"constant"},{"location":"reference/#TranscodingStreams.unsafe_read","page":"Reference","title":"TranscodingStreams.unsafe_read","text":"unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int\n\nCopy at most nbytes from input into output.\n\nThis function is similar to Base.unsafe_read but differs in some respects:\n\nIt does not throw EOFError when it fails to read nbytes from input.\nIt returns the number of bytes written to output.\nIt does not block if there are buffered data in input.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unread","page":"Reference","title":"TranscodingStreams.unread","text":"unread(stream::TranscodingStream, data::AbstractVector{UInt8})\n\nInsert data at the current reading position of stream.\n\nThe next read(stream, sizeof(data)) call will read the data that were just inserted.\n\ndata must not alias any internal buffers in stream.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.unsafe_unread","page":"Reference","title":"TranscodingStreams.unsafe_unread","text":"unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)\n\nInsert nbytes pointed to by data at the current reading position of stream.\n\nThe data are copied into the internal buffer and hence data can be safely used after the operation without interfering with the 
stream.\n\ndata must not alias any internal buffers in stream\n\n\n\n\n\n","category":"function"},{"location":"reference/#Base.position-Tuple{TranscodingStream}","page":"Reference","title":"Base.position","text":"position(stream::TranscodingStream)\n\nReturn the number of bytes read from or written to stream.\n\nNote that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.\n\n\n\n\n\n","category":"method"},{"location":"reference/#Base.skip","page":"Reference","title":"Base.skip","text":"skip(stream::TranscodingStream, offset)\n\nRead bytes from stream until offset bytes have been read or eof(stream) is reached.\n\nReturn stream, discarding read bytes.\n\nThis function will not throw an EOFError if eof(stream) is reached before offset bytes can be read.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Statistics","page":"Reference","title":"Statistics","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Stats\nTranscodingStreams.stats","category":"page"},{"location":"reference/#TranscodingStreams.Stats","page":"Reference","title":"TranscodingStreams.Stats","text":"I/O statistics.\n\nIts object has four fields:\n\nin: the number of bytes supplied into the stream\nout: the number of bytes consumed out of the stream\ntranscoded_in: the number of bytes transcoded from the input buffer\ntranscoded_out: the number of bytes transcoded to the output buffer\n\nNote that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.stats","page":"Reference","title":"TranscodingStreams.stats","text":"stats(stream::TranscodingStream)\n\nCreate an I/O statistics object of 
stream.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Codec","page":"Reference","title":"Codec","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Noop\nTranscodingStreams.NoopStream\nBase.position(::NoopStream)","category":"page"},{"location":"reference/#TranscodingStreams.Noop","page":"Reference","title":"TranscodingStreams.Noop","text":"Noop()\n\nCreate a noop codec.\n\nNoop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.\n\nThe implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers, so the throughput will be higher than that of a naive implementation.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.NoopStream","page":"Reference","title":"TranscodingStreams.NoopStream","text":"NoopStream(stream::IO)\n\nCreate a noop stream.\n\n\n\n\n\n","category":"type"},{"location":"reference/#Base.position-Tuple{NoopStream}","page":"Reference","title":"Base.position","text":"position(stream::NoopStream)\n\nGet the current position of stream.\n\nNote that this method may return a wrong position when\n\nsome data have been inserted by TranscodingStreams.unread, or\nthe position of the wrapped stream has been changed outside of this 
package.\n\n\n\n\n\n","category":"method"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Codec\nTranscodingStreams.expectedsize\nTranscodingStreams.pledgeinsize\nTranscodingStreams.minoutsize\nTranscodingStreams.initialize\nTranscodingStreams.finalize\nTranscodingStreams.startproc\nTranscodingStreams.process","category":"page"},{"location":"reference/#TranscodingStreams.Codec","page":"Reference","title":"TranscodingStreams.Codec","text":"An abstract codec type.\n\nAny codec supporting the transcoding protocol must be a subtype of this type.\n\nTranscoding protocol\n\nTranscoding proceeds by calling some functions in a specific way. We call this \"transcoding protocol\" and any codec must implement it as described below.\n\nThere are seven functions for a codec to implement:\n\nexpectedsize: return the expected size of transcoded data\npledgeinsize: tell the codec the total input size\nminoutsize: return the minimum output size of process\ninitialize: initialize the codec\nfinalize: finalize the codec\nstartproc: start processing with the codec\nprocess: process data with the codec.\n\nThese are defined in the TranscodingStreams module and a new codec type must extend these methods if necessary. Implementing a process method is mandatory but others are optional. expectedsize, minoutsize, pledgeinsize, initialize, finalize, and startproc have a default implementation.\n\nYour codec type is denoted by C and its object by codec.\n\nErrors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.\n\nexpectedsize\n\nThe expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. 
A good hint will reduce the number of buffer resizes and hence result in better performance.\n\npledgeinsize\n\nThe pledgeinsize(codec::C, insize::Int64, error::Error)::Symbol method is used when transcode is called to tell the codec the total input size. This is called after startproc and before process. Some compressors can add this total input size to a header, making expectedsize accurate during later decompression. By default this just returns :ok. If there is an error, the return code must be :error and the error argument must be set to an exception object. Setting an inaccurate insize may cause the codec to error later on while processing data. A negative insize means unknown content size.\n\nminoutsize\n\nThe minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.\n\ninitialize\n\nThe initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.\n\nfinalize\n\nThe finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. 
You should release the allocated memory in codec before returning or throwing an exception in finalize because otherwise nobody else can release the memory. Even when an exception is thrown while finalizing a stream, the stream will enter the close mode for safety.\n\nstartproc\n\nThe startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.\n\nprocess\n\nThe process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally you need to return the size of read data, the size of written data, and the :ok status code so that the caller can know how many bytes are consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. 
If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.expectedsize","page":"Reference","title":"TranscodingStreams.expectedsize","text":"expectedsize(codec::Codec, input::Memory)::Int\n\nReturn the expected size of the transcoded input with codec.\n\nThe default method returns input.size.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.pledgeinsize","page":"Reference","title":"TranscodingStreams.pledgeinsize","text":"pledgeinsize(codec::Codec, insize::Int64, error::Error)::Symbol\n\nTell the codec the total input size.\n\nThe default method does nothing and returns :ok.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.minoutsize","page":"Reference","title":"TranscodingStreams.minoutsize","text":"minoutsize(codec::Codec, input::Memory)::Int\n\nReturn the minimum output size to be ensured when calling process.\n\nThe default method returns max(1, div(input.size, 4)).\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.initialize","page":"Reference","title":"TranscodingStreams.initialize","text":"initialize(codec::Codec)::Void\n\nInitialize codec.\n\nThe default method does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.finalize","page":"Reference","title":"TranscodingStreams.finalize","text":"finalize(codec::Codec)::Void\n\nFinalize codec.\n\nThe default method does nothing.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.startproc","page":"Reference","title":"TranscodingStreams.startproc","text":"startproc(codec::Codec, mode::Symbol, error::Error)::Symbol\n\nStart data processing with codec of mode.\n\nThe default method does nothing and returns 
:ok.\n\n\n\n\n\n","category":"function"},{"location":"reference/#TranscodingStreams.process","page":"Reference","title":"TranscodingStreams.process","text":"process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}\n\nDo data processing with codec.\n\nThere is no default method.\n\n\n\n\n\n","category":"function"},{"location":"reference/#Internal-types","page":"Reference","title":"Internal types","text":"","category":"section"},{"location":"reference/","page":"Reference","title":"Reference","text":"TranscodingStreams.Memory\nTranscodingStreams.Error\nTranscodingStreams.State","category":"page"},{"location":"reference/#TranscodingStreams.Memory","page":"Reference","title":"TranscodingStreams.Memory","text":"A contiguous memory.\n\nThis type works like a Vector method.\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.Error","page":"Reference","title":"TranscodingStreams.Error","text":"Container of transcoding error.\n\nAn object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing failed. The error should be set by calling the setindex! method (e.g. 
error[] = ErrorException(\"error!\")).\n\n\n\n\n\n","category":"type"},{"location":"reference/#TranscodingStreams.State","page":"Reference","title":"TranscodingStreams.State","text":"A mutable state type of transcoding streams.\n\nSee Developer's notes for details.\n\n\n\n\n\n","category":"type"},{"location":"migrating/#Migration","page":"Migration","title":"Migration","text":"","category":"section"},{"location":"migrating/#How-to-migrate-from-v0.10-to-v0.11","page":"Migration","title":"How to migrate from v0.10 to v0.11","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"v0.11 has a few subtle breaking changes to eof and seekend.","category":"page"},{"location":"migrating/#Memory(data::ByteData)","page":"Migration","title":"Memory(data::ByteData)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"The Memory(data::ByteData) constructor was removed. Use Memory(pointer(data), sizeof(data)) instead.","category":"page"},{"location":"migrating/#seekend(stream::TranscodingStream)","page":"Migration","title":"seekend(stream::TranscodingStream)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"Generic seekend for TranscodingStream was removed. If the objective is to discard all remaining data in the stream, use skip(stream, typemax(Int64)) instead where typemax(Int64) is meant to be a large number to exhaust the stream. Ideally, specific implementations of TranscodingStream will implement seekend only if efficient means exist to avoid fully processing the stream. 
NoopStream still supports seekend.","category":"page"},{"location":"migrating/","page":"Migration","title":"Migration","text":"The previous behavior of the generic seekend was something like (seekstart(stream); seekend(stream.stream); stream) but this led to inconsistencies with the position of the stream.","category":"page"},{"location":"migrating/#eof(stream::TranscodingStream)","page":"Migration","title":"eof(stream::TranscodingStream)","text":"","category":"section"},{"location":"migrating/","page":"Migration","title":"Migration","text":"eof now throws an error if called on a stream that is closed or in writing mode. Use !isreadable(stream) || eof(stream) if you need to more closely match the previous behavior.","category":"page"},{"location":"#Home","page":"Home","title":"Home","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: TranscodingStream)","category":"page"},{"location":"#Overview","page":"Home","title":"Overview","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TranscodingStreams.jl is a package for transcoding data streams. Transcoding may be compression, decompression, ASCII encoding, or any other codec. The package exports a data type TranscodingStream, which is a subtype of IO and wraps another IO object to transcode data read from or written to the wrapped stream.","category":"page"},{"location":"","page":"Home","title":"Home","text":"On this page, we introduce the basic concepts of TranscodingStreams.jl and currently available packages. The Examples page demonstrates common usage. The Reference page offers a comprehensive API document.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"TranscodingStream has two type parameters, C<:Codec and S<:IO, and hence the concrete data type is written as TranscodingStream{C<:Codec,S<:IO}. 
This type wraps an underlying I/O stream S with a transcoding codec C. C and S are orthogonal and hence you can use any combination of these two types. The underlying stream may be any stream that supports I/O operations defined by the Base module. For example, it may be IOStream, TTY, IOBuffer, or TranscodingStream. The codec C must implement the transcoding protocol defined in this package. We already have various codecs in packages listed below. Of course, you can define your own codec by implementing the transcoding protocol described in TranscodingStreams.Codec.","category":"page"},{"location":"","page":"Home","title":"Home","text":"You can install codec packages using the standard package manager. These codec packages are independent of each other and can be installed separately. You won't need to explicitly install the TranscodingStreams.jl package unless you will use lower-level interfaces of it. Each codec package defines some codec types, which are subtypes of TranscodingStreams.Codec, and their corresponding transcoding stream aliases. These aliases are partially instantiated by a codec type; for example, GzipDecompressionStream{S} is an alias of TranscodingStream{GzipDecompressor,S}, where S is a subtype of IO.","category":"page"},{"location":"","page":"Home","title":"Home","text":"\n
Package | Library | Format | Codec | Stream alias | Description
CodecZlib.jl | zlib | RFC1952 | GzipCompressor | GzipCompressorStream | Compress data in gzip (.gz) format.
CodecZlib.jl | zlib | RFC1952 | GzipDecompressor | GzipDecompressorStream | Decompress data in gzip (.gz) format.
CodecZlib.jl | zlib | RFC1950 | ZlibCompressor | ZlibCompressorStream | Compress data in zlib format.
CodecZlib.jl | zlib | RFC1950 | ZlibDecompressor | ZlibDecompressorStream | Decompress data in zlib format.
CodecZlib.jl | zlib | RFC1951 | DeflateCompressor | DeflateCompressorStream | Compress data in deflate format.
CodecZlib.jl | zlib | RFC1951 | DeflateDecompressor | DeflateDecompressorStream | Decompress data in deflate format.
CodecXz.jl | xz | The .xz File Format | XzCompressor | XzCompressorStream | Compress data in xz (.xz) format.
CodecXz.jl | xz | The .xz File Format | XzDecompressor | XzDecompressorStream | Decompress data in xz (.xz) format.
CodecZstd.jl | zstd | Zstandard Compression Format | ZstdCompressor | ZstdCompressorStream | Compress data in zstd (.zst) format.
CodecZstd.jl | zstd | Zstandard Compression Format | ZstdDecompressor | ZstdDecompressorStream | Decompress data in zstd (.zst) format.
CodecBase.jl | native | RFC4648 | Base16Encoder | Base16EncoderStream | Encode binary in base16 format.
CodecBase.jl | native | RFC4648 | Base16Decoder | Base16DecoderStream | Decode binary in base16 format.
CodecBase.jl | native | RFC4648 | Base32Encoder | Base32EncoderStream | Encode binary in base32 format.
CodecBase.jl | native | RFC4648 | Base32Decoder | Base32DecoderStream | Decode binary in base32 format.
CodecBase.jl | native | RFC4648 | Base64Encoder | Base64EncoderStream | Encode binary in base64 format.
CodecBase.jl | native | RFC4648 | Base64Decoder | Base64DecoderStream | Decode binary in base64 format.
CodecBzip2.jl | bzip2 | | Bzip2Compressor | Bzip2CompressorStream | Compress data in bzip2 (.bz2) format.
CodecBzip2.jl | bzip2 | | Bzip2Decompressor | Bzip2DecompressorStream | Decompress data in bzip2 (.bz2) format.
","category":"page"},{"location":"#Notes","page":"Home","title":"Notes","text":"","category":"section"},{"location":"#Wrapped-streams","page":"Home","title":"Wrapped streams","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The wrapper stream takes care of the wrapped stream. Reading or writing data from or to the wrapped stream outside the management will result in unexpected behaviors. When you close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.","category":"page"},{"location":"#Error-handling","page":"Home","title":"Error handling","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. 
Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.","category":"page"},{"location":"devnotes/#Developer's-notes","page":"Developer's notes","title":"Developer's notes","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"These notes are not for end users but rather for developers who are interested in the design of the package.","category":"page"},{"location":"devnotes/#TranscodingStream-type","page":"Developer's notes","title":"TranscodingStream type","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"TranscodingStream{C,S} (defined in src/stream.jl) has three fields:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"codec: data codec (<:C where C<:Codec)\nstream: data stream (<:S where S<:IO)\nstate: current state (<:State).","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A codec will be implemented by package developers and only a special codec Noop is defined in this package. A stream can be any object that implements at least Base.isopen, Base.eof, Base.close, Base.bytesavailable, Base.unsafe_read, and Base.unsafe_write. All mutable fields are delegated to state and hence the stream type itself is immutable.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A stream has two buffers in the state field. These are used to store pre-transcoded and transcoded data in the stream. The stream passes references of these two buffers to the codec when processing data. 
The following diagram illustrates the flow of data:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"When reading data (`state.mode == :read`):\n user <--- |state.buffer1| <--- <stream.codec> <--- |state.buffer2| <--- stream\n\nWhen writing data (`state.mode == :write`):\n user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"In the read mode, a user pulls data out of state.buffer1 and pre-transcoded data are filled in state.buffer2. In the write mode, a user pushes data into state.buffer1 and transcoded data are filled in state.buffer2. The default buffer size is 16 KiB for each.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"State (defined in src/state.jl) has six fields:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"mode: current stream mode (<:Symbol)\ncode: return code of the last codec's method call (<:Symbol)\nerror: exception returned by the codec (<:Error)\nbuffer1: data buffer that is closer to the user (<:Buffer)\nbuffer2: data buffer that is farther from the user (<:Buffer)\nbytes_written_out: number of bytes written to the underlying stream (<:Int64)","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"The mode field may be one of the following values:","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":":idle : initial and intermediate mode, no buffered data\n:read : being ready to read data, data may be buffered\n:write: being ready to write data, data may be buffered\n:stop : transcoding is stopped after read, data may be buffered\n:close: closed, no buffered data\n:panic: an exception has been thrown in codec, data may be buffered but we cannot do 
anything","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"The initial mode is :idle and mode transition happens as shown in the following diagram: (Image: Mode transition)","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Modes surrounded by a bold circle are states in which the transcoding stream has released resources by calling finalize(codec). The mode transition should happen in the changemode!(stream, newmode) function in src/stream.jl. Trying an undefined transition will throw an exception.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"A transition happens according to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec will return :error and the stream will end up in :panic.","category":"page"},{"location":"devnotes/#Shared-buffers","page":"Developer's notes","title":"Shared buffers","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Adjacent transcoding streams may share their buffers. 
This will reduce memory allocation and eliminate data copying between buffers.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"If buffer2 is shared, the stats and position functions consider it to be owned by the underlying stream.","category":"page"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"readdata!(input::IO, output::Buffer) and flush_buffer2(stream::TranscodingStream) do the actual work of reading/writing data from/to the underlying stream. These methods have a special path for shared buffers.","category":"page"},{"location":"devnotes/#Noop-codec","page":"Developer's notes","title":"Noop codec","text":"","category":"section"},{"location":"devnotes/","page":"Developer's notes","title":"Developer's notes","text":"Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. All of these are defined in src/noop.jl.","category":"page"}] }