Oct 21, 2020
Hi Benny,
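Check this line of code: returnG = list(map(lambda s: s.reward * (GAMMA ** len(s.steps)), elite_candidates))
The value of the total return depends on the episode length: the reward is multiplied by GAMMA raised to the number of steps, so for GAMMA < 1 a longer episode gets a smaller return.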
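A minimal sketch of the effect, assuming each candidate has a `reward` and a `steps` list as that line implies. The `Episode` tuple, the value `GAMMA = 0.9`, and the two sample episodes are made up for illustration and are not taken from the original code:

```python
from collections import namedtuple

# Hypothetical stand-in for the episode objects used in the line above.
Episode = namedtuple("Episode", ["reward", "steps"])

GAMMA = 0.9  # assumed discount factor, for illustration only

# Two made-up episodes with the same reward but different lengths.
elite_candidates = [
    Episode(reward=1.0, steps=[0] * 5),   # short episode
    Episode(reward=1.0, steps=[0] * 20),  # long episode
]

# Same expression as in the line above.
returnG = list(map(lambda s: s.reward * (GAMMA ** len(s.steps)), elite_candidates))

# The shorter episode ends up with the larger discounted return.
print(returnG)
```

With equal rewards, the only thing that separates the two values is `len(s.steps)`.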
regards,
Jordi
Professor at UPC Barcelona Tech & Barcelona Supercomputing Center. Research focuses on Supercomputing & Artificial Intelligence https://torres.ai @JordiTorresAI