diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index c6239512..8e4ceecc 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-10-17T18:54:52","documenter_version":"1.1.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.3","generation_timestamp":"2023-10-19T06:24:57","documenter_version":"1.1.1"}} \ No newline at end of file diff --git a/dev/devnotes/index.html b/dev/devnotes/index.html index 9ba153de..37049995 100644 --- a/dev/devnotes/index.html +++ b/dev/devnotes/index.html @@ -3,4 +3,4 @@ user <--- |state.buffer1| <--- <stream.codec> <--- |state.buffer2| <--- stream When writing data (`state.mode == :write`): - user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream

In the read mode, a user pulls data out of state.buffer1, and pre-transcoded data are filled into state.buffer2. In the write mode, a user pushes data into state.buffer1, and transcoded data are filled into state.buffer2. The default size of each buffer is 16 KiB.

State (defined in src/state.jl) has five fields:

The mode field may be one of the following values:

Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.

The initial mode is :idle and mode transition happens as shown in the following diagram: Mode transition

Modes surrounded by a bold circle are states in which the transcoding stream has released its resources by calling finalize(codec). Mode transitions should happen in the changemode!(stream, newmode) function in src/stream.jl. Attempting an undefined transition will throw an exception.

A transition happens in response to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and the user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read, and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec will return :error and the stream will enter the :panic mode.

Shared buffers

Adjacent transcoding streams may share their buffers. This will reduce memory allocation and eliminate data copy between buffers.

readdata!(input::IO, output::Buffer) and writedata!(output::IO, input::Buffer) do the actual work of reading data from and writing data to the underlying stream. These methods have a special path for shared buffers.

Noop codec

Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. All of these are defined in src/noop.jl.

+ user ---> |state.buffer1| ---> <stream.codec> ---> |state.buffer2| ---> stream

In the read mode, a user pulls data out of state.buffer1, and pre-transcoded data are filled into state.buffer2. In the write mode, a user pushes data into state.buffer1, and transcoded data are filled into state.buffer2. The default size of each buffer is 16 KiB.

State (defined in src/state.jl) has five fields:

The mode field may be one of the following values:

Note that mode=:stop does not mean there is no data available in the stream. This is because transcoded data may be left in the buffer.

The initial mode is :idle and mode transition happens as shown in the following diagram: Mode transition

Modes surrounded by a bold circle are states in which the transcoding stream has released its resources by calling finalize(codec). Mode transitions should happen in the changemode!(stream, newmode) function in src/stream.jl. Attempting an undefined transition will throw an exception.

A transition happens in response to internal or external events of the transcoding stream. The status code and the error object returned by codec methods are internal events, and the user's method calls are external events. For example, calling read(stream) will change the mode from :idle to :read, and then calling close(stream) will change the mode from :read to :close. When data processing fails in the codec, the codec will return :error and the stream will enter the :panic mode.
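The idea of a guarded mode change can be sketched in plain Julia. This is a toy model only: ToyState, TOY_TRANSITIONS, and toy_changemode! are hypothetical names, and the transition table lists just the few transitions mentioned in this text, not the full diagram implemented by changemode! in src/stream.jl.

```julia
# Hypothetical transition table; it covers only the transitions named in the text.
const TOY_TRANSITIONS = Dict{Symbol,Vector{Symbol}}(
    :idle  => [:read, :panic],
    :read  => [:close, :panic],
    :panic => [:close],
    :close => Symbol[],
)

# Hypothetical stand-in for the stream state; it only tracks the mode.
mutable struct ToyState
    mode::Symbol
end

# Change the mode if the transition is listed, otherwise throw.
function toy_changemode!(state::ToyState, newmode::Symbol)
    if newmode in get(TOY_TRANSITIONS, state.mode, Symbol[])
        state.mode = newmode
        return state
    end
    throw(ArgumentError("cannot change the mode from $(state.mode) to $(newmode)"))
end

state = ToyState(:idle)
toy_changemode!(state, :read)   # an external event such as read(stream)
toy_changemode!(state, :close)  # an external event such as close(stream)
```

Keeping every transition behind one function means an undefined transition fails loudly in a single place instead of leaving the state inconsistent.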

Shared buffers

Adjacent transcoding streams may share their buffers. This will reduce memory allocation and eliminate data copy between buffers.

readdata!(input::IO, output::Buffer) and writedata!(output::IO, input::Buffer) do the actual work of reading data from and writing data to the underlying stream. These methods have a special path for shared buffers.

Noop codec

Noop (NoopStream) is a codec that does nothing. It works as a buffering layer on top of the underlying stream. Since NoopStream does not need to have two distinct buffers, buffer1 and buffer2 in the State object are shared and some specialized methods are defined for the type. All of these are defined in src/noop.jl.

diff --git a/dev/examples/index.html b/dev/examples/index.html index 17f1ec76..74296862 100644 --- a/dev/examples/index.html +++ b/dev/examples/index.html @@ -102,4 +102,4 @@ data1 = read(stream, 8) TranscodingStreams.unread(stream, data1) data2 = read(stream, 8) -@assert data1 == data2

The unread operation differs from the write operation in that the unread data are not written to the wrapped stream. Instead, the unread data are stored in the internal buffer of the transcoding stream.

Unfortunately, an unwrite operation is not provided because there is no way to cancel write operations that have already been committed to the wrapped stream.

+@assert data1 == data2

The unread operation differs from the write operation in that the unread data are not written to the wrapped stream. Instead, the unread data are stored in the internal buffer of the transcoding stream.

Unfortunately, an unwrite operation is not provided because there is no way to cancel write operations that have already been committed to the wrapped stream.

diff --git a/dev/index.html b/dev/index.html index 34bf717e..ef1ff321 100644 --- a/dev/index.html +++ b/dev/index.html @@ -115,4 +115,4 @@ Bzip2DecompressorStream Decompress data in bzip2 (.bz2) format. -

Notes

Wrapped streams

The wrapper stream takes care of the wrapped stream. Reading data from or writing data to the wrapped stream outside of the wrapper's management will result in unexpected behavior. To close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.

Error handling

You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.

+

Notes

Wrapped streams

The wrapper stream takes care of the wrapped stream. Reading data from or writing data to the wrapped stream outside of the wrapper's management will result in unexpected behavior. To close the wrapped stream, you must call the close method of the wrapper stream, which releases allocated resources and closes the wrapped stream.

Error handling

You may encounter an error while processing data with this package. For example, your compressed data may be corrupted or truncated for some reason, and the decompressor cannot recover the original data. In such a case, the codec informs the stream of the error, and the stream goes to an unrecoverable mode. In this mode, the only possible operations are isopen and close. Other operations, such as read or write, will result in an argument error exception. Resources allocated by the codec will be released by the stream, and hence you must not call the finalizer of the codec.

diff --git a/dev/reference/index.html b/dev/reference/index.html index 8614426b..50eb2b1c 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -11,7 +11,7 @@ julia> readline(file) "TranscodingStreams.jl" -julia> close(stream)source
Base.transcodeFunction
transcode(
+julia> close(stream)
source
Base.transcodeFunction
transcode(
     ::Type{C},
     data::Union{Vector{UInt8},Base.CodeUnits{UInt8}},
 )::Vector{UInt8} where {C<:Codec}

Transcode data by applying a codec C().

Note that this method does allocation and deallocation of C() in every call, which is handy but less efficient when transcoding a number of objects. transcode(codec, data) is a recommended method in terms of performance.

Examples

julia> using CodecZlib
@@ -23,7 +23,7 @@
 julia> decompressed = transcode(ZlibDecompressor, compressed);
 
 julia> String(decompressed)
-"abracadabra"
source
transcode(
+"abracadabra"
source
transcode(
     codec::Codec,
     data::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer},
     [output::Union{Vector{UInt8},Base.CodeUnits{UInt8},Buffer}],
@@ -51,4 +51,4 @@
 
 julia> String(decompressed)
 "abracadabra"
-
source
TranscodingStreams.unsafe_transcode!Function
unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.transcode!Function
transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.TOKEN_ENDConstant

A special token indicating the end of data.

TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.

Note

Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.

source
TranscodingStreams.unsafe_readFunction
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int

Copy at most nbytes from input into output.

This function is similar to Base.unsafe_read but is different in some points:

  • It does not throw EOFError when it fails to read nbytes from input.
  • It returns the number of bytes written to output.
  • It does not block if there are buffered data in input.
source
TranscodingStreams.unreadFunction
unread(stream::TranscodingStream, data::Vector{UInt8})

Insert data to the current reading position of stream.

The next read(stream, sizeof(data)) call will read data that are just inserted.

source
TranscodingStreams.unsafe_unreadFunction
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)

Insert nbytes pointed by data to the current reading position of stream.

The data are copied into the internal buffer, and hence data can be safely used after the operation without interfering with the stream.

source
Base.positionMethod
position(stream::TranscodingStream)

Return the number of bytes read from or written to stream.

Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.

source

Statistics

TranscodingStreams.StatsType

I/O statistics.

Its object has four fields:

  • in: the number of bytes supplied into the stream
  • out: the number of bytes consumed out of the stream
  • transcoded_in: the number of bytes transcoded from the input buffer
  • transcoded_out: the number of bytes transcoded to the output buffer

Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.

source
TranscodingStreams.statsFunction
stats(stream::TranscodingStream)

Create an I/O statistics object of stream.

source

Codec

TranscodingStreams.NoopType
Noop()

Create a noop codec.

Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.

The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and the throughput will be larger than a naive implementation.

source
TranscodingStreams.NoopStreamType
NoopStream(stream::IO)

Create a noop stream.

source
Base.positionMethod
position(stream::NoopStream)

Get the current position of stream.

Note that this method may return a wrong position when

  • some data have been inserted by TranscodingStreams.unread, or
  • the position of the wrapped stream has been changed outside of this package.
source
TranscodingStreams.CodecType

An abstract codec type.

Any codec supporting the transcoding protocol must be a subtype of this type.

Transcoding protocol

Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.

There are six functions for a codec to implement:

  • expectedsize: return the expected size of transcoded data
  • minoutsize: return the minimum output size of process
  • initialize: initialize the codec
  • finalize: finalize the codec
  • startproc: start processing with the codec
  • process: process data with the codec.

These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory, but the others are optional: expectedsize, minoutsize, initialize, finalize, and startproc each have a default implementation.

Your codec type is denoted by C and its object by codec.

Errors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.

expectedsize

The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizing and hence result in better performance.

minoutsize

The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.

initialize

The initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.

finalize

The finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the memory allocated in codec before returning from or throwing an exception in finalize because otherwise nobody can release the memory. Even when an exception is thrown while finalizing a stream, the stream will enter the close mode for safety.

startproc

The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.

process

The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally, you need to return the size of the read data, the size of the written data, and the :ok status code so that the caller knows how many bytes were consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.

source
TranscodingStreams.expectedsizeFunction
expectedsize(codec::Codec, input::Memory)::Int

Return the expected size of the transcoded input with codec.

The default method returns input.size.

source
TranscodingStreams.minoutsizeFunction
minoutsize(codec::Codec, input::Memory)::Int

Return the minimum output size to be ensured when calling process.

The default method returns max(1, div(input.size, 4)).

source
TranscodingStreams.initializeFunction
initialize(codec::Codec)::Void

Initialize codec.

The default method does nothing.

source
TranscodingStreams.finalizeFunction
finalize(codec::Codec)::Void

Finalize codec.

The default method does nothing.

source
TranscodingStreams.startprocFunction
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol

Start data processing with codec of mode.

The default method does nothing and returns :ok.

source
TranscodingStreams.processFunction
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}

Do data processing with codec.

There is no default method.

source

Internal types

TranscodingStreams.MemoryType

A contiguous memory.

This type works like a Vector.

source
TranscodingStreams.ErrorType

Container of transcoding error.

An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing fails. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).

source
TranscodingStreams.StateType

A mutable state type of transcoding streams.

See Developer's notes for details.

source
+source
TranscodingStreams.unsafe_transcode!Function
unsafe_transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output without validation of input or output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.transcode!Function
transcode!(output::Buffer, codec::Codec, input::Buffer)

Transcode input by applying codec and storing the results in output with validation of input and output. Note that this method does not initialize or finalize codec. This is efficient when you transcode a number of pieces of data, but you need to call TranscodingStreams.initialize and TranscodingStreams.finalize explicitly.

source
TranscodingStreams.TOKEN_ENDConstant

A special token indicating the end of data.

TOKEN_END may be written to a transcoding stream like write(stream, TOKEN_END), which will terminate the current transcoding block.

Note

Call flush(stream) after write(stream, TOKEN_END) to make sure that all data are written to the underlying stream.

source
TranscodingStreams.unsafe_readFunction
unsafe_read(input::IO, output::Ptr{UInt8}, nbytes::Int)::Int

Copy at most nbytes from input into output.

This function is similar to Base.unsafe_read but is different in some points:

  • It does not throw EOFError when it fails to read nbytes from input.
  • It returns the number of bytes written to output.
  • It does not block if there are buffered data in input.
source
TranscodingStreams.unreadFunction
unread(stream::TranscodingStream, data::Vector{UInt8})

Insert data to the current reading position of stream.

The next read(stream, sizeof(data)) call will read data that are just inserted.

source
TranscodingStreams.unsafe_unreadFunction
unsafe_unread(stream::TranscodingStream, data::Ptr, nbytes::Integer)

Insert nbytes pointed by data to the current reading position of stream.

The data are copied into the internal buffer, and hence data can be safely used after the operation without interfering with the stream.

source
Base.positionMethod
position(stream::TranscodingStream)

Return the number of bytes read from or written to stream.

Note that the returned value will be different from that of the underlying stream wrapped by stream. This is because stream buffers some data and the codec may change the length of data.

source

Statistics

TranscodingStreams.StatsType

I/O statistics.

Its object has four fields:

  • in: the number of bytes supplied into the stream
  • out: the number of bytes consumed out of the stream
  • transcoded_in: the number of bytes transcoded from the input buffer
  • transcoded_out: the number of bytes transcoded to the output buffer

Note that, since the transcoding stream does buffering, in is transcoded_in + {size of buffered data} and out is transcoded_out - {size of buffered data}.

source
TranscodingStreams.statsFunction
stats(stream::TranscodingStream)

Create an I/O statistics object of stream.

source

Codec

TranscodingStreams.NoopType
Noop()

Create a noop codec.

Noop (no operation) is a codec that does nothing. The data read from or written to the stream are kept as-is without any modification. This is often useful as a buffered stream or an identity element of a composition of streams.

The implementations are specialized for this codec. For example, a Noop stream uses only one buffer rather than a pair of buffers, which avoids copying data between two buffers and the throughput will be larger than a naive implementation.

source
TranscodingStreams.NoopStreamType
NoopStream(stream::IO)

Create a noop stream.

source
Base.positionMethod
position(stream::NoopStream)

Get the current position of stream.

Note that this method may return a wrong position when

  • some data have been inserted by TranscodingStreams.unread, or
  • the position of the wrapped stream has been changed outside of this package.
source
TranscodingStreams.CodecType

An abstract codec type.

Any codec supporting the transcoding protocol must be a subtype of this type.

Transcoding protocol

Transcoding proceeds by calling some functions in a specific way. We call this "transcoding protocol" and any codec must implement it as described below.

There are six functions for a codec to implement:

  • expectedsize: return the expected size of transcoded data
  • minoutsize: return the minimum output size of process
  • initialize: initialize the codec
  • finalize: finalize the codec
  • startproc: start processing with the codec
  • process: process data with the codec.

These are defined in the TranscodingStreams module, and a new codec type must extend these methods if necessary. Implementing a process method is mandatory, but the others are optional: expectedsize, minoutsize, initialize, finalize, and startproc each have a default implementation.

Your codec type is denoted by C and its object by codec.

Errors that occur in these methods are supposed to be unrecoverable and the stream will go to the panic mode. Only Base.isopen and Base.close are available in that mode.

expectedsize

The expectedsize(codec::C, input::Memory)::Int method takes codec and input, and returns the expected size of transcoded data. This method will be used as a hint to determine the size of a data buffer when transcode is called. A good hint will reduce the number of buffer resizing and hence result in better performance.

minoutsize

The minoutsize(codec::C, input::Memory)::Int method takes codec and input, and returns the minimum required size of the output memory when process is called. For example, an encoder of base64 will write at least four bytes to the output and hence it is reasonable to return 4 with this method.

initialize

The initialize(codec::C)::Void method takes codec and returns nothing. This is called once and only once before starting any data processing. Therefore, you may initialize codec (e.g. allocating memory needed to process data) with this method. If initialization fails for some reason, it may throw an exception and no other methods (including finalize) will be called. Therefore, you need to release the memory before throwing an exception.

finalize

The finalize(codec::C)::Void method takes codec and returns nothing. This is called once and only once just before the transcoding stream goes to the close mode (i.e. when Base.close is called) or just after startproc or process throws an exception. Other errors that happen inside the stream (e.g. EOFError) will not call this method. Therefore, you may finalize codec (e.g. freeing memory) with this method. If finalization fails for some reason, it may throw an exception. You should release the memory allocated in codec before returning from or throwing an exception in finalize because otherwise nobody can release the memory. Even when an exception is thrown while finalizing a stream, the stream will enter the close mode for safety.

startproc

The startproc(codec::C, mode::Symbol, error::Error)::Symbol method takes codec, mode and error, and returns a status code. This is called just before the stream starts reading or writing data. mode is either :read or :write and then the stream starts reading or writing, respectively. The return code must be :ok if codec is ready to read or write data. Otherwise, it must be :error and the error argument must be set to an exception object.

process

The process(codec::C, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol} method takes codec, input, output and error, and returns a consumed data size, a produced data size and a status code. This is called repeatedly while processing data. The input (input) and output (output) data are Memory objects, each of which is a pointer to a contiguous memory region with a size. You must read input data from input, transcode the bytes, and then write the output data to output. Finally, you need to return the size of the read data, the size of the written data, and the :ok status code so that the caller knows how many bytes were consumed and produced in the method. When transcoding reaches the end of a data stream, this method is notified by an empty input. In that case, the method needs to write the buffered data (if any) to output. If there is no data to write, the status code must be set to :end. The process method will be called repeatedly until it returns the :end status code. If an error happens while processing data, the error argument must be set to an exception object and the return code must be :error.
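The calling contract described above can be illustrated with a toy model in plain Julia. This sketch does not use the package's Memory or Error types: toy_process and toy_transcode are hypothetical names, byte vectors stand in for Memory objects, and the driver loop is only an assumption about how a caller might consume the (consumed, produced, status) triple.

```julia
# Toy stand-in for `process` (hypothetical): upper-cases ASCII letters.
# Returns (bytes consumed, bytes produced, status), mirroring the documented contract.
function toy_process(input::AbstractVector{UInt8}, output::AbstractVector{UInt8})
    # Empty input signals the end of data. This toy codec buffers nothing,
    # so there is nothing left to flush and it reports :end right away.
    isempty(input) && return (0, 0, :end)
    n = min(length(input), length(output))
    for i in 1:n
        b = input[i]
        output[i] = (UInt8('a') <= b <= UInt8('z')) ? b - 0x20 : b
    end
    return (n, n, :ok)
end

# Toy driver: keeps calling toy_process until it returns :end.
function toy_transcode(data::Vector{UInt8}; outsize::Int = 4)
    result = UInt8[]
    output = Vector{UInt8}(undef, outsize)  # deliberately small output region
    pos = 1
    while true
        input = view(data, pos:length(data))  # unconsumed part; empty at the end
        nin, nout, status = toy_process(input, output)
        append!(result, view(output, 1:nout))
        pos += nin
        status == :end && break
    end
    return result
end

upper = String(toy_transcode(Vector{UInt8}("abracadabra")))
```

The small output region forces several calls, which shows why the consumed and produced sizes must both be returned: the caller advances the input by one and collects the output by the other.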

source
TranscodingStreams.expectedsizeFunction
expectedsize(codec::Codec, input::Memory)::Int

Return the expected size of the transcoded input with codec.

The default method returns input.size.

source
TranscodingStreams.minoutsizeFunction
minoutsize(codec::Codec, input::Memory)::Int

Return the minimum output size to be ensured when calling process.

The default method returns max(1, div(input.size, 4)).

source
TranscodingStreams.initializeFunction
initialize(codec::Codec)::Void

Initialize codec.

The default method does nothing.

source
TranscodingStreams.finalizeFunction
finalize(codec::Codec)::Void

Finalize codec.

The default method does nothing.

source
TranscodingStreams.startprocFunction
startproc(codec::Codec, mode::Symbol, error::Error)::Symbol

Start data processing with codec of mode.

The default method does nothing and returns :ok.

source
TranscodingStreams.processFunction
process(codec::Codec, input::Memory, output::Memory, error::Error)::Tuple{Int,Int,Symbol}

Do data processing with codec.

There is no default method.

source

Internal types

TranscodingStreams.MemoryType

A contiguous memory.

This type works like a Vector.

source
TranscodingStreams.ErrorType

Container of transcoding error.

An object of this type is used to notify the caller of an exception that happened inside a transcoding method. The error field is undefined at first but will be filled when data processing fails. The error should be set by calling the setindex! method (e.g. error[] = ErrorException("error!")).
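The error-container pattern can be sketched in plain Julia as follows. ToyError, toy_haserror, and toy_step are hypothetical names invented for this illustration; the sketch only assumes the behavior stated above (an initially undefined error field that is filled through setindex!) and is not the package's Error type.

```julia
# Hypothetical container with an initially undefined `error` field.
mutable struct ToyError
    error::Exception
    ToyError() = new()  # leave the field undefined until a failure happens
end

# `err[] = ex` lowers to `setindex!(err, ex)`; `err[]` lowers to `getindex(err)`.
Base.setindex!(err::ToyError, ex::Exception) = (err.error = ex; err)
Base.getindex(err::ToyError) = err.error
toy_haserror(err::ToyError) = isdefined(err, :error)

# Hypothetical processing step: rejects non-ASCII bytes by filling the
# container and returning :error instead of throwing.
function toy_step(input::Vector{UInt8}, err::ToyError)
    if any(>=(0x80), input)
        err[] = ErrorException("non-ASCII byte")
        return :error
    end
    return :ok
end

err = ToyError()
status = toy_step(UInt8[0x41, 0xff], err)  # status is :error and err[] holds the exception
```

Returning a status code and storing the exception lets the caller decide when to rethrow, for example after releasing resources.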

source
TranscodingStreams.StateType

A mutable state type of transcoding streams.

See Developer's notes for details.

source