Jordi TORRES.AI
Oct 21, 2020


Hi Benny,

Check this line of code: returnG = list(map(lambda s: s.reward * (GAMMA ** len(s.steps)), elite_candidates))

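The total return depends on the episode length: the reward is multiplied by GAMMA ** len(s.steps), so for GAMMA < 1 a longer episode gets a smaller return.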
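
To see the effect in isolation, here is a minimal sketch. The Episode tuple, the GAMMA value of 0.9, and the helper name discounted_returns are assumptions made for illustration; only the expression reward * (GAMMA ** len(steps)) comes from the line above.

```python
from collections import namedtuple

# Hypothetical stand-in for the episode objects held in elite_candidates:
# each one only needs a total reward and the list of steps taken.
Episode = namedtuple("Episode", ["reward", "steps"])

GAMMA = 0.9  # assumed discount factor, chosen only for this example


def discounted_returns(episodes, gamma=GAMMA):
    """Apply reward * gamma ** len(steps) to every episode."""
    return [ep.reward * (gamma ** len(ep.steps)) for ep in episodes]


# Two episodes with the same reward but different lengths.
elite_candidates = [
    Episode(reward=1.0, steps=[0] * 5),   # short episode
    Episode(reward=1.0, steps=[0] * 10),  # long episode
]

returnG = discounted_returns(elite_candidates)
print(returnG)  # the shorter episode ends up with the larger return
```

With the same reward, the 5-step episode keeps more of its value than the 10-step one, which is why the total return changes with episode length.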

Regards,

Jordi


Written by Jordi TORRES.AI

Professor at UPC Barcelona Tech & Barcelona Supercomputing Center. Research focuses on Supercomputing & Artificial Intelligence https://torres.ai @JordiTorresAI
