Releases · thu-ml/tianshou
0.3.0rc0
This is a pre-release for testing the Anaconda package.
0.2.7
API Change
- support exact n_episode when a list of per-env n_episode limits is given, and save fake data into cache_buffer when self.buffer is None (#184)
- add save_only_last_obs for the replay buffer in order to save memory (#184)
- remove the default value in batch.split() and add the merge_last argument (#185)
- fix tensorboard logging: the horizontal axis now stands for env step instead of gradient step; add test results to tensorboard (#189)
- add max_batchsize in onpolicy algorithms (#189)
- keep only sumtree in segment tree implementation (#193)
- add __contains__ and pop in Batch: key in batch, batch.pop(key, deft) (#189) (see the sketch after this list)
- remove dict return support for collector preprocess_fn (#189)
- remove **kwargs in ReplayBuffer (#189)
- add no_grad argument in collector.collect (#204)
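A minimal sketch of the Batch changes listed above (membership test, dict-style pop, and split with merge_last); the array values and sizes are illustrative only:

```python
import numpy as np
from tianshou.data import Batch

b = Batch(obs=np.arange(5), rew=np.zeros(5))
print("obs" in b)           # True, via the new __contains__ (#189)
print(b.pop("rew", None))   # dict-style pop: returns and removes b.rew, or the default if absent
# batch.split() no longer has a default size (#185); merge_last folds a short final
# minibatch into the previous one (here: chunks of 2 and 3 instead of 2, 2, 1).
for minibatch in b.split(size=2, shuffle=False, merge_last=True):
    print(len(minibatch))
```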
Enhancement
- add DQN Atari examples (#187)
- change the type-checking order in batch.py and converter.py in order to meet the most often case first (#189)
- Numba acceleration for GAE, nstep, and segment tree (#193)
- add policy.eval() in the "watch performance" section of all test scripts (#189)
- add test_returns (both GAE and nstep) (#189)
- improve code coverage (from 90% to 95%) and remove dead code (#189)
- polish examples/box2d/bipedal_hardcore_sac.py (#207)
Bug fix
- fix a bug in MAPolicy: buffer.rew = Batch() doesn't change buffer.rew (thanks mypy) (#207)
- set policy.eval() before collector.collect (#204); this was a bug
- fix shape inconsistency for torch.Tensor in replay buffer (#189)
- potential bugfix for subproc.wait (#189)
- fix RecurrentActorProb (#189)
- fix some incorrect type annotation (#189)
- fix a bug in tictactoe set_eps (#193)
- dirty fix for asyncVenv check_id test
0.2.6
API Change
- Replay buffer allows stack_num = 1 (#165)
- add policy.update to enable post-processing and remove collector.sample (#180)
- Remove collector.close and rename VectorEnv to DummyVectorEnv (#179) (see the sketch after this list)
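A minimal sketch of the rename: code that previously constructed VectorEnv now uses DummyVectorEnv. This assumes classic gym's reset API; CartPole is used purely for illustration:

```python
import gym
from tianshou.env import DummyVectorEnv

# VectorEnv -> DummyVectorEnv (#179); the constructor takes a list of
# environment factory functions.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(4)])
obs = envs.reset()   # one observation per sub-environment, stacked along axis 0
envs.close()
```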
Enhancement
- Enable async simulation for all vector envs (#179)
- Improve PER (#159): use a segment tree and enable all Q-learning algorithms to use PER (see the sketch after this list)
- unify single-env and multi-env in collector (#157)
- Pickle compatible for replay buffer and improve buffer.get (#182): fix #84 and make buffer more efficient
- Add ShmemVectorEnv implementation (#174)
- Add Dueling DQN implementation (#170)
- Add profile workflow (#143)
- Add BipedalWalkerHardcore-v3 SAC example (#177) (it is well-trained within about 1 hour)
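A hedged sketch of the improved PER (#159), assuming the 0.2.x constructor with the usual alpha/beta priority exponents; the hyperparameter values are illustrative:

```python
from tianshou.data import PrioritizedReplayBuffer

# PER is now backed by a segment tree (#159); pass this buffer to the collector
# in place of ReplayBuffer to train any Q-learning policy with prioritized replay.
buffer = PrioritizedReplayBuffer(size=20000, alpha=0.6, beta=0.4)
```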
Bug fix
Note: 0.3 is coming soon!
0.2.5
New feature
Multi-agent Reinforcement Learning: https://tianshou.readthedocs.io/en/latest/tutorials/tictactoe.html (#122)
Documentation
Add a tutorial of the Batch class to standardize the behavior of Batch: https://tianshou.readthedocs.io/en/latest/tutorials/batch.html (#142)
Bugfix
0.2.4.post1
Several bug fixes and enhancements:
- remove deprecated API append (#126)
- Batch.cat_ and Batch.stack_ now work well with inconsistent keys (#130)
- Batch.is_empty now correctly recognizes empty over empty Batch (#128)
- reconstruct collector: remove the multiple-buffer case, change the internal data to Batch, and add reward_metric for MARL usage (#125)
- add Batch.update to mimic dict.update (#128) (see the sketch after this list)
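A minimal sketch of Batch.update (#128), described above as mimicking dict.update; the keys and values are illustrative only:

```python
from tianshou.data import Batch

b = Batch(a=1)
b.update(a=2, c=3)    # overwrite existing keys and add new ones, like dict.update (#128)
print(b.a, b.c)       # 2 3
print(b.is_empty())   # False; empty-over-empty Batch is now recognized correctly (#128)
```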
0.2.4
Algorithm Implementation
- n_step returns for all Q-learning based algorithms (#51)
- Auto alpha tuning in SAC (#80)
- Reserve policy._state to support saving hidden states in the replay buffer (#19)
- Add sample_avail argument in ReplayBuffer to sample only available indices in RNN training mode (#19) (see the sketch after this list)
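A hedged sketch of the sample_avail option (#19) mentioned above, assuming the 0.2.x ReplayBuffer constructor; the buffer size and stack_num are illustrative:

```python
from tianshou.data import ReplayBuffer

# With frame stacking enabled, sample_avail=True restricts sampling to indices
# where a full stack of stack_num frames is available, as needed for RNN training.
buffer = ReplayBuffer(size=1000, stack_num=4, sample_avail=True)
```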
New Feature
- Batch.cat (#87), Batch.stack (#93), Batch.empty (#106, #110)
- Advanced slicing method of Batch (#106)
- Batch(kwargs, copy=True) will perform a deep copy (#110) (see the sketch after this list)
- Add random=True argument in collector.collect to perform sampling with a random policy (#78)
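A minimal sketch of the new Batch combinators and the copy=True constructor listed above; the shapes are illustrative only:

```python
import numpy as np
from tianshou.data import Batch

b1 = Batch(obs=np.zeros((2, 4)))
b2 = Batch(obs=np.ones((2, 4)))
cat = Batch.cat([b1, b2])        # concatenate along the first axis -> obs shape (4, 4)
stacked = Batch.stack([b1, b2])  # stack along a new axis -> obs shape (2, 2, 4)
deep = Batch(b1, copy=True)      # copy=True performs a deep copy of the input
deep.obs[0, 0] = 1.0             # does not modify b1 thanks to the deep copy
```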
API Change
- Batch.append -> Batch.cat
- Move the atari wrapper to examples, since it is not a key feature in tianshou (#124)
- Add some pre-defined nets in tianshou.utils.net. Since we only define the API instead of a class, we do not present it in tianshou.net. (#123) (see the sketch after this list)
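A hedged sketch of using one of the pre-defined networks in tianshou.utils.net; the exact constructor arguments of Net have changed across versions, so treat the signature below as an assumption for the 0.2.x line:

```python
from tianshou.utils.net.common import Net

# A small MLP: 2 hidden layers, 4-dimensional observations, 2 action values (#123).
net = Net(layer_num=2, state_shape=(4,), action_shape=2, device="cpu")
```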
Docs
Add cheatsheet: https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html
0.2.3
0.2.2
Algorithm Implementation
- Generalized Advantage Estimation (GAE);
- Update PPO algorithm with arXiv:1811.02553 and arXiv:1912.09729;
- Vanilla Imitation Learning (BC & DA, with continuous/discrete action space);
- Prioritized DQN;
- RNN-style policy network;
- Fix SAC with torch==1.5.0
API change
- change __call__ to forward in policy;
- Add save_fn in trainer;
- Add __repr__ in tianshou.data, e.g. print(buffer) (see the sketch after this list)
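A minimal sketch of the __repr__ addition, assuming the 0.2.x keyword-based ReplayBuffer.add API; the transition values are illustrative only:

```python
from tianshou.data import ReplayBuffer

buf = ReplayBuffer(size=10)
buf.add(obs=0, act=0, rew=0.0, done=False, obs_next=1)
print(buf)   # the new __repr__ prints a readable summary of the stored data
```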
0.2.1
First version with full documentation.
Support algorithms: DQN/VPG/A2C/DDPG/PPO/TD3/SAC