Shouldn't Gamma be multiplied by each step reward?
Benny Friedman
Jordi TORRES.AI
Oct 21, 2020
Hi Benny,
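Check this line of code: returnG = list(map(lambda s: s.reward * (GAMMA ** len(s.steps)), elite_candidates))
The discount is applied once to each episode's total reward, with GAMMA raised to the number of steps, so the value of the total return depends on the episode length.
Regards,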
Jordi
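
To make the difference between the two discounting schemes concrete, here is a minimal sketch. The Episode container, the GAMMA value of 0.9, the sample episodes, and both helper functions are assumptions for illustration only; only the returnG expression itself comes from the line quoted above.

```python
from collections import namedtuple

# Hypothetical container: the original code only shows that each
# candidate exposes `.reward` and `.steps`.
Episode = namedtuple("Episode", ["reward", "steps"])

GAMMA = 0.9  # assumed value, not taken from the original post


def length_discounted_returns(episodes, gamma=GAMMA):
    """Discount each episode's total reward once, by gamma ** episode length.

    Same computation as the quoted `returnG` line.
    """
    return [ep.reward * (gamma ** len(ep.steps)) for ep in episodes]


def per_step_discounted_return(step_rewards, gamma=GAMMA):
    """Discount every step reward by gamma ** t (the scheme the question asks about)."""
    return sum(r * (gamma ** t) for t, r in enumerate(step_rewards))


# Two made-up episodes with the same total reward but different lengths.
elite_candidates = [
    Episode(reward=1.0, steps=[0, 1, 2]),
    Episode(reward=1.0, steps=[0, 1, 2, 3, 4, 5]),
]

returnG = length_discounted_returns(elite_candidates)
```

With equal total rewards, the shorter episode ends up with the larger return, which is the dependence on episode length described in the reply.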
Written by Jordi TORRES.AI
Professor at UPC Barcelona Tech & Barcelona Supercomputing Center. Research focuses on Supercomputing & Artificial Intelligence https://torres.ai @JordiTorresAI