Hello,
I want to add heuristics to an MCTS implementation, but I still want MCTS to “take over” and make the final decision. Say I have a function policy() that returns a value between 0.0 and 1.0, depending on the strength of the move as determined by heuristics. Is this modification to the UCT formula correct?
UCT = move.rewards/move.visits + exploration_rate * policy(move) * sqrt(log(totalSiblingVisits) / move.visits)
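
To make the formula concrete, here is a minimal Python sketch of how I read it. The `Move` class, the `prior` parameter (standing in for the result of policy(move)), the default exploration_rate of 1.4, and the handling of unvisited moves are all assumptions added for illustration, not part of my actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Move:
    """Hypothetical per-move statistics kept by the tree search."""
    rewards: float  # sum of rewards backed up through this move
    visits: int     # number of times this move has been selected


def modified_uct(move, total_sibling_visits, prior, exploration_rate=1.4):
    """Score a move using the formula from the post.

    `prior` stands in for policy(move), a heuristic value in [0.0, 1.0].
    """
    if move.visits == 0:
        # Assumption: unvisited moves are tried first; this also avoids
        # dividing by zero in the terms below.
        return math.inf
    exploitation = move.rewards / move.visits
    exploration = math.sqrt(math.log(total_sibling_visits) / move.visits)
    # The heuristic scales only the exploration bonus, not the mean reward.
    return exploitation + exploration_rate * prior * exploration
```

Written this way, a move whose policy value is 0.0 gets no exploration bonus at all and is scored on its mean reward alone, while a value of 1.0 gives back the unmodified UCT score.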