Releases · thu-ml/tianshou
0.3.0rc0
This is a pre-release for testing the Anaconda package.
0.2.7
API Change
- support exact n_episode when a list of per-env n_episode limits is given, and save fake data into cache_buffer when self.buffer is None (#184)
- add save_only_last_obs for the replay buffer in order to save memory (#184)
- remove the default value in batch.split() and add the merge_last argument (#185)
- fix tensorboard logging: the horizontal axis now stands for env step instead of gradient step; add test results to tensorboard (#189)
- add max_batchsize in onpolicy algorithms (#189)
- keep only sumtree in segment tree implementation (#193)
- add __contains__ and pop in Batch: key in batch, batch.pop(key, deft) (#189) (see the sketch after this list)
- remove dict return support for collector preprocess_fn (#189)
- remove **kwargs in ReplayBuffer (#189)
- add no_grad argument in collector.collect (#204)
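A minimal sketch of the Batch changes listed above (membership test, dict-style pop, and split with merge_last); the array values and sizes are illustrative only:

```python
import numpy as np
from tianshou.data import Batch

b = Batch(obs=np.arange(5), rew=np.zeros(5))
print("obs" in b)           # True, via the new __contains__ (#189)
print(b.pop("rew", None))   # dict-style pop: returns and removes b.rew, or the default if absent
# batch.split() no longer has a default size (#185); merge_last folds a short final
# minibatch into the previous one (here: chunks of 2 and 3 instead of 2, 2, 1).
for minibatch in b.split(size=2, shuffle=False, merge_last=True):
    print(len(minibatch))
```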
Enhancement
- add DQN Atari examples (#187)
- change the type-checking order in batch.py and converter.py in order to meet the most often case first (#189)
- Numba acceleration for GAE, nstep, and segment tree (#193)
- add policy.eval() in the "watch performance" section of all test scripts (#189)
- add test_returns (both GAE and nstep) (#189)
- improve code coverage (from 90% to 95%) and remove dead code (#189)
- polish examples/box2d/bipedal_hardcore_sac.py (#207)
Bug fix
- fix a bug in MAPolicy: buffer.rew = Batch() doesn't change buffer.rew (thanks mypy) (#207)
- set policy.eval() before collector.collect (#204); this was a bug
- fix shape inconsistency for torch.Tensor in replay buffer (#189)
- potential bugfix for subproc.wait (#189)
- fix RecurrentActorProb (#189)
- fix some incorrect type annotation (#189)
- fix a bug in tictactoe set_eps (#193)
- dirty fix for asyncVenv check_id test
0.2.6
API Change
- Replay buffer allows stack_num = 1 (#165)
- add policy.update to enable post-processing and remove collector.sample (#180)
- Remove collector.close and rename VectorEnv to DummyVectorEnv (#179) (see the sketch after this list)
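A minimal sketch of the rename: code that previously constructed VectorEnv now uses DummyVectorEnv. This assumes classic gym's reset API; CartPole is used purely for illustration:

```python
import gym
from tianshou.env import DummyVectorEnv

# VectorEnv -> DummyVectorEnv (#179); the constructor takes a list of
# environment factory functions.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
obs = envs.reset()   # one observation per sub-environment, stacked along axis 0
envs.close()
```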
Enhancement
- Enable async simulation for all vector envs (#179)
- Improve PER (#159): use a segment tree and enable all Q-learning algorithms to use PER (see the sketch after this list)
- unify single-env and multi-env in collector (#157)
- Pickle compatible for replay buffer and improve buffer.get (#182): fix #84 and make buffer more efficient
- Add ShmemVectorEnv implementation (#174)
- Add Dueling DQN implementation (#170)
- Add profile workflow (#143)
- Add BipedalWalkerHardcore-v3 SAC example (#177) (it is well-trained within about 1 hour)
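A hedged sketch of the improved PER (#159), assuming the 0.2.x constructor with the usual alpha/beta priority exponents; the hyperparameter values are illustrative:

```python
from tianshou.data import PrioritizedReplayBuffer

# PER is now backed by a segment tree (#159); pass this buffer to the collector
# in place of ReplayBuffer to train any Q-learning policy with prioritized replay.
buffer = PrioritizedReplayBuffer(size=20000, alpha=0.6, beta=0.4)
```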
Bug fix
Note: 0.3 is coming soon!
0.2.5
New feature
Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)
Documentation
Add a tutorial of the Batch class to standardize the behavior of Batch: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)
Bugfix
0.2.4.post1
Several bug fixes and enhancements:
- remove deprecated API append (#126)
- Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130)
- Batch.is_empty now correctly recognizes empty over empty Batch (#128)
- reconstruct collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
- add Batch.update to mimic dict.update (#128) (see the sketch after this list)
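A minimal sketch of Batch.update (#128), described above as mimicking dict.update; the keys and values are illustrative only:

```python
from tianshou.data import Batch

b = Batch(a=1)
b.update(a=2, c=3)    # overwrite existing keys and add new ones, like dict.update (#128)
print(b.a, b.c)       # 2 3
print(b.is_empty())   # False; empty-over-empty Batch is now recognized correctly (#128)
```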
0.2.4
Algorithm Implementation
- n_step returns for all Q-learning based algorithms (#51)
- Auto alpha tuning in SAC (#80)
- Reserve policy._state to support saving hidden states in the replay buffer (#19)
- Add sample_avail argument in ReplayBuffer to sample only available indices in RNN training mode (#19) (see the sketch after this list)
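A hedged sketch of the sample_avail option (#19) mentioned above, assuming the 0.2.x ReplayBuffer constructor; the buffer size and stack_num are illustrative:

```python
from tianshou.data import ReplayBuffer

# With frame stacking enabled, sample_avail=True restricts sampling to indices
# where a full stack of stack_num frames is available, as needed for RNN training.
buffer = ReplayBuffer(size=1000, stack_num=4, sample_avail=True)
```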
New Feature
- Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110)
- Advanced slicing method of Batch (#106)
- Batch(kwargs, copy=True) will perform a deep copy (#110) (see the sketch after this list)
- Add random=True argument in collector.collect to perform sampling with a random policy (#78)
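A minimal sketch of the new Batch combinators and the copy=True constructor listed above; the shapes are illustrative only:

```python
import numpy as np
from tianshou.data import Batch

b1 = Batch(obs=np.zeros((2, 4)))
b2 = Batch(obs=np.ones((2, 4)))
cat = Batch.cat([b1, b2])        # concatenate along the first axis -> obs shape (4, 4)
stacked = Batch.stack([b1, b2])  # stack along a new axis -> obs shape (2, 2, 4)
deep = Batch(b1, copy=True)      # copy=True performs a deep copy of the input
deep.obs[0, 0] = 1.0             # does not modify b1 thanks to the deep copy
```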
API Change
- Batch.append -> Batch.cat
- Move the atari wrapper to examples, since it is not a key feature in tianshou (#124)
- Add some pre-defined nets in tianshou.utils.net. Since we only define the API instead of a class, we do not present it in tianshou.net. (#123) (see the sketch after this list)
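A hedged sketch of using one of the pre-defined networks in tianshou.utils.net; the exact constructor arguments of Net have changed across versions, so treat the signature below as an assumption for the 0.2.x line:

```python
from tianshou.utils.net.common import Net

# A small MLP: 2 hidden layers, 4-dimensional observations, 2 action values (#123).
net = Net(layer_num=2, state_shape=(4,), action_shape=2, device="cpu")
```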
Docs
Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html
0.2.3
0.2.2
Algorithm Implementation
- Generalized Advantage Estimation (GAE);
- Update PPO algorithm with arXiv:1811.02553 and arXiv:1912.09729;
- Vanilla Imitation Learning (BC & DA, with continuous/discrete action space);
- Prioritized DQN;
- RNN-style policy network;
- Fix SAC with torch==1.5.0
API change
- change __call__ to forward in policy;
- Add save_fn in trainer;
- Add __repr__ in tianshou.data, e.g. print(buffer) (see the sketch after this list)
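A minimal sketch of the __repr__ addition, assuming the 0.2.x keyword-based ReplayBuffer.add API; the transition values are illustrative only:

```python
from tianshou.data import ReplayBuffer

buf = ReplayBuffer(size=10)
buf.add(obs=0, act=0, rew=0.0, done=False, obs_next=1)
print(buf)   # the new __repr__ prints a readable summary of the stored data
```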
0.2.1
First version with full documentation.
Support algorithms: DQN/VPG/A2C/DDPG/PPO/TD3/SAC