Environment Simulator
=====================

Env transition model:

s = (1,1)
  a = right  0.1 0.0 0.1 0.8 0.0
  a = up     0.8 0.0 0.1 0.1 0.0
  a = left   0.1 0.0 0.9 0.0 0.0
  a = down   0.0 0.0 0.9 0.1 0.0
s = (1,2)
  a = right  0.1 0.0 0.8 0.0 0.1
  a = up     0.8 0.0 0.2 0.0 0.0
  a = left   0.1 0.0 0.8 0.0 0.1
  a = down   0.0 0.0 0.2 0.0 0.8
s = (1,3)
  a = right  0.0 0.0 0.1 0.8 0.1
  a = up     0.0 0.0 0.9 0.1 0.0
  a = left   0.0 0.0 0.9 0.0 0.1
  a = down   0.0 0.0 0.1 0.1 0.8
s = (2,1)
  a = right  0.0 0.0 0.2 0.8 0.0
  a = up     0.0 0.1 0.8 0.1 0.0
  a = left   0.0 0.8 0.2 0.0 0.0
  a = down   0.0 0.1 0.8 0.1 0.0
s = (2,2)
  a = right  -99.0 -99.0 -99 -99.0 -99.0
  a = up     -99.0 -99.0 -99 -99.0 -99.0
  a = left   -99.0 -99.0 -99 -99.0 -99.0
  a = down   -99.0 -99.0 -99 -99.0 -99.0
s = (2,3)
  a = right  0.0 0.0 0.2 0.8 0.0
  a = up     0.0 0.1 0.8 0.1 0.0
  a = left   0.0 0.8 0.2 0.0 0.0
  a = down   0.0 0.1 0.8 0.1 0.0
s = (3,1)
  a = right  0.1 0.0 0.1 0.8 0.0
  a = up     0.8 0.1 0.0 0.1 0.0
  a = left   0.1 0.8 0.1 0.0 0.0
  a = down   0.0 0.1 0.8 0.1 0.0
s = (3,2)
  a = right  0.1 0.0 0.0 0.8 0.1
  a = up     0.8 0.0 0.1 0.1 0.0
  a = left   0.1 0.0 0.8 0.0 0.1
  a = down   0.0 0.0 0.1 0.1 0.8
s = (3,3)
  a = right  0.0 0.0 0.1 0.8 0.1
  a = up     0.0 0.1 0.8 0.1 0.0
  a = left   0.0 0.8 0.1 0.0 0.1
  a = down   0.0 0.1 0.0 0.1 0.8
s = (4,1)
  a = right  0.1 0.0 0.9 0.0 0.0
  a = up     0.8 0.1 0.1 0.0 0.0
  a = left   0.1 0.8 0.1 0.0 0.0
  a = down   0.0 0.1 0.9 0.0 0.0
s = (4,2)
  a = right  -99.0 -99.0 -99 -99.0 -99.0
  a = up     -99.0 -99.0 -99 -99.0 -99.0
  a = left   -99.0 -99.0 -99 -99.0 -99.0
  a = down   -99.0 -99.0 -99 -99.0 -99.0
s = (4,3)
  a = right  -99.0 -99.0 -99 -99.0 -99.0
  a = up     -99.0 -99.0 -99 -99.0 -99.0
  a = left   -99.0 -99.0 -99 -99.0 -99.0
  a = down   -99.0 -99.0 -99 -99.0 -99.0

Env reward distribution:
-0.04 -0.04 -0.04 1.0 -0.04 -0.04 -1.0 -0.04 -0.04 -0.04 -0.04

Active ADP Reinforcement Learning Agent
=======================================

State rewards (init to 0):
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

State utilities (init to 2.0):
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Learning starts:
[Trial
1 Step 0] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.04 0.0 0.0 0.0 Frequency of state-action pairs (not shown if 0): Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> -> -> Action: ->; rnd # 0.7960446 [Trial 1 Step 1] cur state (2,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 2.0 2.0 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> -> -> Action: ->; rnd # 0.046558022 [Trial 1 Step 2] cur state (3,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 1.96 2.0 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> -> -> Action: ->; rnd # 0.92861474 [Trial 1 Step 3] cur state (3,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 1.96 2.0 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> -> -> Action: ->; rnd # 0.7331306 [Trial 1 Step 4] cur state (4,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> ^ -> Action: ->; rnd # 0.83997756 [Trial 1 Step 5] cur state (4,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 1.0; ^: 0.0; 
<-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> ^ -> Action: ->; rnd # 0.2601924 [Trial 1 Step 6] cur state (4,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 0.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> 0.0 -> -> ^ ^ Action: ^; rnd # 0.7917761 [Trial 1 Step 7] cur state (4,2); rw -1.0 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 -> -> ^ ^ [Trial 2 Step 8] cur state (1,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 -> -> ^ ^ Action: ->; rnd # 0.22853518 [Trial 2 Step 9] cur state (2,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ -> ^ ^ Action: ->; rnd # 0.4980923 [Trial 2 Step 10] cur state (3,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 2.0 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 0.0; <-: 0.0; 
v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ ^ ^ ^ Action: ^; rnd # 0.7908377 [Trial 2 Step 11] cur state (3,2); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.46295977 [Trial 2 Step 12] cur state (4,2); rw -1.0 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ ^ ^ ^ [Trial 3 Step 13] cur state (1,1); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 2.0 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ ^ ^ ^ Action: ^; rnd # 0.6945982 [Trial 3 Step 14] cur state (1,2); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 
^ ^ ^ ^ Action: ->; rnd # 0.74626094 [Trial 3 Step 15] cur state (1,2); rw -0.04 State utilities (0 means unknown): 2.0 2.0 2.0 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 -> -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.038089752 [Trial 3 Step 16] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 2.0 2.0 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 ^ -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.931123 [Trial 3 Step 17] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 2.0 2.0 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 0.0 ^ -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.3168165 [Trial 3 Step 18] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 2.0 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 
2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 0.0 ^ -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.43859768 [Trial 3 Step 19] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 0.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 0.0 ^ -> -1.0 ^ ^ ^ ^ Action: ->; rnd # 0.16991699 [Trial 3 Step 20] cur state (4,3); rw 1.0 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 ^ ^ ^ ^ [Trial 4 Step 21] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so 
far: ^ -> -> 1.0 ^ -> -1.0 ^ ^ ^ ^ Action: ^; rnd # 0.96401376 [Trial 4 Step 22] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 <- ^ ^ ^ Action: <-; rnd # 0.5329186 [Trial 4 Step 23] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 <- ^ ^ ^ Action: <-; rnd # 0.26704413 [Trial 4 Step 24] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 v ^ ^ ^ Action: v; rnd # 
0.12890291 [Trial 4 Step 25] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.96 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 v ^ ^ ^ Action: v; rnd # 0.15103918 [Trial 4 Step 26] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> ^ ^ ^ Action: ->; rnd # 0.20679152 [Trial 4 Step 27] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> ^ ^ ^ Action: ^; rnd # 0.54986995 [Trial 4 Step 28] cur state (2,1); rw 
-0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> ^ ^ ^ Action: ^; rnd # 0.54419273 [Trial 4 Step 29] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> <- ^ ^ Action: <-; rnd # 0.4738745 [Trial 4 Step 30] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> <- ^ ^ Action: ->; rnd # 0.59614396 [Trial 4 Step 31] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.96 
1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> <- ^ ^ Action: <-; rnd # 0.66908526 [Trial 4 Step 32] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ->; rnd # 0.793936 [Trial 4 Step 33] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: v; rnd # 0.19701117 [Trial 4 Step 34] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9200 1.96 1.96 
1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ->; rnd # 0.9877214 [Trial 4 Step 35] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9120 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ->; rnd # 0.8630126 [Trial 4 Step 36] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ^; rnd # 0.43726683 [Trial 4 Step 37] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown 
if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ^; rnd # 0.7254709 [Trial 4 Step 38] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: ^ -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ^; rnd # 0.24208689 [Trial 4 Step 39] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: <- -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: <-; rnd # 0.7353733 [Trial 4 Step 40] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 
s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: <- -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: <-; rnd # 0.48358268 [Trial 4 Step 41] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: v; rnd # 0.9434789 [Trial 4 Step 42] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.96 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: v; rnd # 0.33431625 [Trial 4 Step 43] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.8800 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = 
(1,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 1.0 ^ -> -1.0 -> v ^ ^ Action: ^; rnd # 0.23984724 [Trial 4 Step 44] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 1.0 <- -> -1.0 -> v ^ ^ Action: ->; rnd # 0.11605853 [Trial 4 Step 45] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.8999 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> -> 1.0 <- -> -1.0 -> v ^ ^ Action: ->; rnd # 0.15743673 [Trial 4 Step 46] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ -> 1.0 <- -> -1.0 -> v ^ ^ Action: ->; rnd # 0.5249627 [Trial 4 Step 47] cur state (4,3); rw 1.0 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 -> v ^ ^ [Trial 5 Step 48] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.9133 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 -> v ^ ^ Action: ->; rnd # 0.34091562 [Trial 5 Step 49] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.9142 1.96 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (2,3) ->: 2.0; ^: 
0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 -> v ^ ^ Action: v; rnd # 0.9275591 [Trial 5 Step 50] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 ^ -> ^ ^ Action: ->; rnd # 0.30363202 [Trial 5 Step 51] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 -> -> ^ ^ Action: ^; rnd # 0.6654932 [Trial 5 Step 52] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 
2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (3,2) ->: 1.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- -> -1.0 -> -> <- ^ Action: ->; rnd # 0.91778487 [Trial 5 Step 53] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> <- ^ Action: <-; rnd # 0.86761856 [Trial 5 Step 54] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> <- ^ Action: ->; rnd # 0.43346757 [Trial 5 Step 55] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s 
= (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> <- ^ Action: <-; rnd # 0.59287214 [Trial 5 Step 56] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> v ^ Action: ->; rnd # 0.4763136 [Trial 5 Step 57] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> v ^ Action: v; rnd # 0.61179584 [Trial 5 Step 58] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.9200 1.96 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; 
v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 -> -> v ^ Action: v; rnd # 0.49740374 [Trial 5 Step 59] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.7484852 [Trial 5 Step 60] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.6502678 [Trial 5 Step 61] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 0.0; 
<-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.13795912 [Trial 5 Step 62] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 0.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.15723121 [Trial 5 Step 63] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.3571875 [Trial 5 Step 64] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.9066 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; 
^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> ^ ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.4221617 [Trial 5 Step 65] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.9066 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> <- ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.8820089 [Trial 5 Step 66] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.9066 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> <- ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.26179814 [Trial 5 Step 67] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.9066 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best 
poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.4036066 [Trial 5 Step 68] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.9100 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: v; rnd # 0.16778785 [Trial 5 Step 69] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.9100 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.91794 [Trial 5 Step 70] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.9000 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ 
-> ^ ^ Action: ->; rnd # 0.9240982 [Trial 5 Step 71] cur state (1,3); rw -0.04 State utilities (0 means unknown): 1.8900 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.5661997 [Trial 5 Step 72] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.8959 1.96 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> v ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: v; rnd # 0.5478104 [Trial 5 Step 73] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 
0.39932293 [Trial 5 Step 74] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> ^ 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.99100995 [Trial 5 Step 75] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> <- 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.4268908 [Trial 5 Step 76] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> <- 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.5331207 [Trial 5 Step 
77] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> <- 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.76222724 [Trial 5 Step 78] cur state (2,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ->; rnd # 0.4574027 [Trial 5 Step 79] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ Action: v; rnd # 0.01791799 [Trial 5 Step 80] cur state (4,3); rw 
1.0 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ [Trial 6 Step 81] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8800 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ Action: ^; rnd # 0.38479125 [Trial 6 Step 82] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.8999 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.4399194 [Trial 6 Step 83] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 
1.96 1.0 1.96 1.96 -1.0 1.9000 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 <- ^ -1.0 ^ -> ^ ^ Action: <-; rnd # 0.8533624 [Trial 6 Step 84] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.9000 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 v ^ -1.0 ^ -> ^ ^ Action: v; rnd # 0.840985 [Trial 6 Step 85] cur state (1,2); rw -0.04 State utilities (0 means unknown): 1.8800 1.9200 1.96 1.0 1.96 1.96 -1.0 1.9000 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v -> v 1.0 v ^ -1.0 ^ -> ^ ^ Action: v; rnd # 0.7192972 [Trial 6 Step 86] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.8560 1.9200 1.96 1.0 1.8160 1.96 -1.0 1.8251 
1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 8.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> v 1.0 ^ ^ -1.0 -> -> ^ ^ Action: ->; rnd # 0.60956824 [Trial 6 Step 87] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.8560 1.9200 1.96 1.0 1.8160 1.96 -1.0 1.8270 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> v 1.0 ^ ^ -1.0 -> -> ^ ^ Action: ->; rnd # 0.3396414 [Trial 6 Step 88] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.8560 1.9200 1.96 1.0 1.8160 1.96 -1.0 1.8270 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> v 1.0 ^ ^ -1.0 -> -> ^ ^ Action: ^; rnd # 0.28070676 [Trial 6 Step 89] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.8560 1.9200 1.96 1.0 1.8160 1.96 -1.0 1.8270 1.8800 1.9200 1.96 
Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> v 1.0 ^ ^ -1.0 -> -> ^ ^ Action: ^; rnd # 0.20821321 [Trial 6 Step 90] cur state (3,3); rw -0.04 State utilities (0 means unknown): 1.8560 1.9200 1.96 1.0 1.8160 1.96 -1.0 1.8270 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: -> -> v 1.0 ^ <- -1.0 -> -> ^ ^ Action: v; rnd # 0.5785366 [Trial 6 Step 91] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6666 1.6086 1.5847 1.0 1.7394 1.96 -1.0 1.8176 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v <- -1.0 -> -> ^ ^ Action: <-; rnd # 0.64791256 [Trial 6 Step 92] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6572 1.5774 1.5376 1.0 1.7371 1.96 -1.0 1.8171 1.8800 1.9200 1.96 Frequency of 
state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v <- -1.0 -> -> ^ ^ Action: <-; rnd # 0.10909575 [Trial 6 Step 93] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.6571 1.5771 1.5371 1.0 1.7371 1.96 -1.0 1.8171 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> ^ ^ Action: ^; rnd # 0.3999663 [Trial 6 Step 94] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6571 1.5771 1.5371 1.0 1.7371 1.96 -1.0 1.8171 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 5.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> ^ ^ Action: v; rnd # 0.13610333 [Trial 6 Step 95] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.6571 1.5771 1.5371 1.0 1.7371 1.96 -1.0 1.8171 1.8800 1.9200 1.96 Frequency of state-action 
pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 5.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> ^ ^ Action: ^; rnd # 0.7679606 [Trial 6 Step 96] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6571 1.5771 1.5371 1.0 1.7371 1.96 -1.0 1.8171 1.8800 1.9200 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> ^ ^ Action: v; rnd # 0.95547974 [Trial 6 Step 97] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6292 1.5599 1.5256 1.0 1.7017 1.8008 -1.0 1.7784 1.8401 1.8800 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v <- -1.0 -> -> -> ^ Action: <-; rnd # 0.348692 [Trial 6 Step 98] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6173 1.5377 1.4980 1.0 1.6971 1.8000 -1.0 1.7771 1.8400 1.8800 1.96 Frequency of state-action pairs (not 
shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 2.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> ^ Action: v; rnd # 0.6247471 [Trial 6 Step 99] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.6171 1.5371 1.4971 1.0 1.6971 1.8199 -1.0 1.7771 1.8400 1.8800 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> ^ Action: ->; rnd # 0.5356418 [Trial 6 Step 100] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.6340 1.5497 1.5071 1.0 1.7162 1.8399 -1.0 1.7969 1.8599 1.8999 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 3.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 1.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> ^ Action: ^; rnd # 0.0024469495 [Trial 6 Step 101] cur state (4,2); rw -1.0 State utilities (0 means unknown): 1.6371 1.5570 1.5169 1.0 1.7171 1.8400 -1.0 1.7971 1.8600 1.9000 1.96 Frequency of state-action pairs (not shown 
if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 3.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- [Trial 7 Step 102] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.6371 1.5570 1.5169 1.0 1.7171 1.8400 -1.0 1.7971 1.8600 1.9000 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 9.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 3.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.509188 [Trial 7 Step 103] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.6398 1.5596 1.5194 1.0 1.7199 1.8400 -1.0 1.7999 1.8600 1.9000 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 3.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.71815956 [Trial 7 Step 104] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.6399 1.5599 1.5199 1.0 1.7200 1.8400 -1.0 1.8000 1.8600 1.9000 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; 
^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 3.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.8384805 [Trial 7 Step 105] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.6044 1.5349 1.5008 1.0 1.6771 1.7919 -1.0 1.7532 1.8115 1.8510 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 4.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 3.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: v; rnd # 0.20946425 [Trial 7 Step 106] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.5925 1.5124 1.4725 1.0 1.6728 1.7998 -1.0 1.7531 1.8132 1.8532 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 4.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 4.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.85288775 [Trial 7 Step 107] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.5647 1.4938 1.4586 1.0 1.6375 1.7576 -1.0 1.7128 1.7705 1.8096 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 
10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 5.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 4.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: v; rnd # 0.2912919 [Trial 7 Step 108] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.5498 1.4704 1.4309 1.0 1.6298 1.7599 -1.0 1.7098 1.7699 1.8099 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 5.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 5.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.8084696 [Trial 7 Step 109] cur state (3,2); rw -0.04 State utilities (0 means unknown): 1.5258 1.4539 1.4180 1.0 1.5989 1.7217 -1.0 1.6740 1.7313 1.7701 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 6.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 5.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: v; rnd # 0.398624 [Trial 7 Step 110] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.5092 1.4303 1.3911 1.0 1.5886 1.7202 -1.0 1.6684 1.7282 1.7682 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 
10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 6.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.6013247 [Trial 7 Step 111] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.5403 1.4501 1.4047 1.0 1.6281 1.7671 -1.0 1.7130 1.7753 1.8161 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 7.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 0.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: <-; rnd # 0.20695078 [Trial 7 Step 112] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.5573 1.4761 1.4353 1.0 1.6380 1.7704 -1.0 1.7183 1.7784 1.8185 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 7.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: ->; rnd # 0.09544718 [Trial 7 Step 113] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.5761 1.4907 1.4477 1.0 1.6600 1.7950 -1.0 1.7421 1.8031 1.8433 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) 
->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 1.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> <- Action: <-; rnd # 0.039084375 [Trial 7 Step 114] cur state (4,2); rw -1.0 State utilities (0 means unknown): 1.5836 1.5032 1.4629 1.0 1.6638 1.7959 -1.0 1.7439 1.8039 1.8439 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v [Trial 8 Step 115] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.5836 1.5032 1.4629 1.0 1.6638 1.7959 -1.0 1.7439 1.8039 1.8439 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 10.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: ->; rnd # 0.91247934 [Trial 8 Step 116] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.5792 1.4996 1.4600 1.0 1.6590 1.7959 -1.0 1.7390 1.8039 1.8439 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 11.0; ^: 3.0; <-: 2.0; 
v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: ->; rnd # 0.96535176 [Trial 8 Step 117] cur state (1,1); rw -0.04 State utilities (0 means unknown): 1.5742 1.4947 1.4551 1.0 1.6540 1.7960 -1.0 1.7340 1.8040 1.8440 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 12.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: ->; rnd # 0.5439003 [Trial 8 Step 118] cur state (2,1); rw -0.04 State utilities (0 means unknown): 1.5771 1.4968 1.4566 1.0 1.6572 1.7960 -1.0 1.7373 1.8040 1.8440 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: ->; rnd # 0.40634257 [Trial 8 Step 119] cur state (3,1); rw -0.04 State utilities (0 means unknown): 1.5773 1.4973 1.4573 1.0 1.6573 1.7960 -1.0 1.7373 1.8040 1.8440 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; 
v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 8.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: ->; rnd # 0.42158538 [Trial 8 Step 120] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.5882 1.5049 1.4630 1.0 1.6705 1.8108 -1.0 1.7517 1.8188 1.8589 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 0.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: v; rnd # 0.6248332 [Trial 8 Step 121] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.5924 1.5122 1.4720 1.0 1.6724 1.8111 -1.0 1.7525 1.8191 1.8591 1.96 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 1.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> v Action: v; rnd # 0.46890223 [Trial 8 Step 122] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.5199 1.4798 1.4550 1.0 1.5425 1.5572 -1.0 1.5532 1.5575 1.5583 1.5600 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; 
v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> -> Action: ->; rnd # 0.47830248 [Trial 8 Step 123] cur state (4,1); rw -0.04 State utilities (0 means unknown): 1.1589 1.1571 1.1559 1.0 1.1596 1.1599 -1.0 1.1598 1.1599 1.1599 1.1600 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: v ^ <- 1.0 v v -1.0 -> -> -> -> Action: ->; rnd # 0.6894144 [Trial 8 Step 124] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.8572 0.9199 0.96 1.0 0.8193 0.9199 -1.0 0.7896 0.8399 0.8799 0.7600 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ -> Action: ->; rnd # 0.12648642 [Trial 8 Step 125] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.8160 0.9199 -1.0 0.7856 0.8399 0.8799 0.3600 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 
2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ -> Action: ->; rnd # 0.33393115 [Trial 8 Step 126] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 -0.039 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ -> Action: ->; rnd # 0.8615943 [Trial 8 Step 127] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 -0.100 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: <-; rnd # 0.3839084 [Trial 8 Step 128] cur state (3,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 
2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 6.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.5812425 [Trial 8 Step 129] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 2.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.2685007 [Trial 8 Step 130] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.75879735 [Trial 8 Step 131] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; 
v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- [Trial 9 Step 132] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7855 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 13.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.779531 [Trial 9 Step 133] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7869 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.40907586 [Trial 9 Step 134] cur state (3,1); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7869 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 
2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 7.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.27392185 [Trial 9 Step 135] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7869 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 3.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.4207862 [Trial 9 Step 136] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.856 0.9199 0.96 1.0 0.816 0.9199 -1.0 0.7869 0.8399 0.8799 0.2133 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 4.0; <-: 3.0; v: 6.0 s = (3,3) ->: 3.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.84524983 [Trial 9 Step 137] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8295 0.8933 0.9333 1.0 0.7896 0.8933 -1.0 0.7607 0.8134 0.8534 0.1956 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; 
^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 4.0; <-: 3.0; v: 6.0 s = (3,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.6761655 [Trial 9 Step 138] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8293 0.8933 0.9333 1.0 0.7893 0.8933 -1.0 0.7602 0.8133 0.8533 0.1955 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 4.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.6013074 [Trial 9 Step 139] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8359 0.8999 0.9399 1.0 0.7959 0.8999 -1.0 0.7668 0.8199 0.8599 0.1999 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- [Trial 10 Step 140] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8359 0.8999 0.9399 1.0 0.7959 0.8999 -1.0 0.7668 0.8199 0.8599 0.1999 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 14.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.24964654 [Trial 10 Step 141] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7679 0.8199 0.8599 0.1999 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ->; rnd # 0.5771543 [Trial 10 Step 142] cur state (3,1); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7679 0.8199 0.8599 0.1999 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 8.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 -> -> ^ <- Action: ^; rnd # 0.042447746 [Trial 10 Step 143] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7360 0.7360 0.7760 0.1440 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 3.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ -> ^ <- Action: <-; rnd # 0.050444484 [Trial 10 Step 144] cur state (4,2); rw -1.0 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7359 0.7016 0.7413 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ -> ^ <- [Trial 11 Step 145] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7359 0.7016 0.7413 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ -> ^ <- Action: ^; rnd # 0.18840349 [Trial 11 Step 146] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7426 0.7026 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 2.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.16398114 [Trial 11 Step 147] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8359 0.9 0.94 1.0 0.7959 0.9 -1.0 0.7426 0.7026 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.409976 [Trial 11 Step 148] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.84 0.9 0.94 1.0 0.7999 0.9 -1.0 0.7466 0.7066 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.437845 [Trial 11 Step 149] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.84 0.9 0.94 1.0 0.7999 0.9 -1.0 0.7466 0.7066 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) 
->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 5.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8567182 [Trial 11 Step 150] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8203 0.8800 0.9200 1.0 0.7804 0.8800 -1.0 0.7278 0.6886 0.7226 -0.178 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 5.0; <-: 3.0; v: 6.0 s = (3,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.23003012 [Trial 11 Step 151] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8200 0.8800 0.9200 1.0 0.7800 0.8800 -1.0 0.7266 0.6866 0.7223 -0.178 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.41847825 [Trial 11 Step 152] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8279 0.8879 0.9279 1.0 0.7878 0.8879 -1.0 0.7343 0.6940 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 12 Step 153] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8279 0.8879 0.9279 1.0 0.7878 0.8879 -1.0 0.7343 0.6940 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.27670515 [Trial 12 Step 154] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8279 0.8879 0.928 1.0 0.7879 0.8879 -1.0 0.7379 0.6979 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 3.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.05989158 [Trial 12 Step 155] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.828 0.888 0.9280 1.0 0.788 0.888 -1.0 0.738 0.6979 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 
10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.53497875 [Trial 12 Step 156] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8308 0.888 0.9280 1.0 0.7908 0.888 -1.0 0.7408 0.7008 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 6.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.15069002 [Trial 12 Step 157] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8308 0.888 0.9280 1.0 0.7908 0.888 -1.0 0.7408 0.7008 0.7298 -0.175 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5633621 [Trial 12 Step 158] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8361 0.8933 0.9333 1.0 0.7961 0.8933 -1.0 0.7460 0.7059 0.7348 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s 
= (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 13 Step 159] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8361 0.8933 0.9333 1.0 0.7961 0.8933 -1.0 0.7460 0.7059 0.7348 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.4457689 [Trial 13 Step 160] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8361 0.8933 0.9333 1.0 0.7961 0.8933 -1.0 0.7481 0.7081 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 4.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7069648 [Trial 13 Step 161] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8361 0.8933 0.9333 1.0 0.7961 0.8933 -1.0 0.7481 0.7081 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 
10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8343032 [Trial 13 Step 162] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8247 0.8933 0.9333 1.0 0.7847 0.8933 -1.0 0.7367 0.6968 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 5.0; <-: 2.0; v: 2.0 s = (1,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.18994617 [Trial 13 Step 163] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8247 0.8933 0.9333 1.0 0.7847 0.8933 -1.0 0.7367 0.6967 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.7348649 [Trial 13 Step 164] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8283 0.8933 0.9333 1.0 0.7883 0.8933 -1.0 0.7403 0.7003 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 7.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5051227 [Trial 13 Step 165] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8283 0.8933 0.9333 1.0 0.7883 0.8933 -1.0 0.7403 0.7003 0.7349 -0.172 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5231733 [Trial 13 Step 166] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8321 0.8971 0.9371 1.0 0.7920 0.8971 -1.0 0.7440 0.7039 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 14 Step 167] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8321 0.8971 0.9371 1.0 0.7920 0.8971 -1.0 0.7440 0.7039 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) 
->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8202366 [Trial 14 Step 168] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8321 0.8971 0.9371 1.0 0.7921 0.8971 -1.0 0.7454 0.7054 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 6.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7682812 [Trial 14 Step 169] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8321 0.8971 0.9371 1.0 0.7921 0.8971 -1.0 0.7454 0.7054 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.739798 [Trial 14 Step 170] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8349 0.8971 0.9371 1.0 0.7949 0.8971 -1.0 0.7482 0.7082 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 8.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.03687972 [Trial 14 Step 171] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8349 0.8971 0.9371 1.0 0.7949 0.8971 -1.0 0.7482 0.7082 0.7384 -0.170 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.44731116 [Trial 14 Step 172] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8377 0.8999 0.9399 1.0 0.7977 0.8999 -1.0 0.7510 0.7109 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 15 Step 173] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8377 0.8999 0.9399 1.0 0.7977 0.8999 -1.0 0.7510 0.7109 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.6614155 [Trial 15 Step 174] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8377 0.9 0.94 1.0 0.7977 0.9 -1.0 0.7520 0.7120 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 7.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.89030546 [Trial 15 Step 175] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8371 0.9 0.94 1.0 0.7914 0.9 -1.0 0.7457 0.7057 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.17956042 [Trial 15 Step 176] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8372 0.9 0.94 1.0 0.7922 0.9 -1.0 0.7465 0.7065 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) 
->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.37262315 [Trial 15 Step 177] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8394 0.9 0.94 1.0 0.7944 0.9 -1.0 0.7487 0.7087 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 9.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.31536442 [Trial 15 Step 178] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8394 0.9 0.94 1.0 0.7944 0.9 -1.0 0.7487 0.7087 0.7411 -0.169 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.04684466 [Trial 15 Step 179] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7509 0.7109 0.7432 -0.168 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 16 Step 180] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7509 0.7109 0.7432 -0.168 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 8.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.05648935 [Trial 16 Step 181] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7400 0.7032 0.7432 -0.168 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ -> ^ <- Action: ->; rnd # 0.034972727 [Trial 16 Step 182] cur state (3,1); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7400 0.7032 0.7432 -0.168 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 9.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ -> ^ <- Action: ^; rnd # 0.009208918 [Trial 16 Step 183] cur state (4,1); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7395 0.6995 0.6595 -0.210 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 10.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 4.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- <- <- Action: <-; rnd # 0.5386818 [Trial 16 Step 184] cur state (3,1); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7395 0.6995 0.6747 -0.035 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 10.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.28612053 [Trial 16 Step 185] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7395 0.6995 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; 
^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 6.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.20590037 [Trial 16 Step 186] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8417 0.9022 0.9422 1.0 0.7967 0.9022 -1.0 0.7395 0.6995 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5025818 [Trial 16 Step 187] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8434 0.9039 0.9439 1.0 0.7984 0.9039 -1.0 0.7412 0.7012 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 17 Step 188] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8434 0.9039 0.9439 1.0 0.7984 0.9039 -1.0 0.7412 0.7012 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 
2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.29669726 [Trial 17 Step 189] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8434 0.9039 0.9439 1.0 0.7984 0.9039 -1.0 0.7434 0.7034 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 9.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.95787495 [Trial 17 Step 190] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8430 0.9039 0.9439 1.0 0.7930 0.9039 -1.0 0.7380 0.6980 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.2398948 [Trial 17 Step 191] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8431 0.9039 0.9439 1.0 0.7942 0.9039 -1.0 0.7392 0.6992 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 11.0; <-: 2.0; 
v: 2.0 s = (1,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.13721663 [Trial 17 Step 192] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8450 0.9039 0.9439 1.0 0.7961 0.9039 -1.0 0.7411 0.7011 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 10.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.014776707 [Trial 17 Step 193] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8450 0.9039 0.9439 1.0 0.7961 0.9039 -1.0 0.7411 0.7011 0.6955 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.21903223 [Trial 17 Step 194] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8464 0.9054 0.9454 1.0 0.7975 0.9054 -1.0 0.7425 0.7024 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = 
(1,2) ->: 2.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 18 Step 195] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8464 0.9054 0.9454 1.0 0.7975 0.9054 -1.0 0.7425 0.7024 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 10.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7111673 [Trial 18 Step 196] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8464 0.9054 0.9454 1.0 0.7975 0.9054 -1.0 0.7442 0.7042 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.5973544 [Trial 18 Step 197] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8465 0.9054 0.9454 1.0 0.7985 0.9054 -1.0 0.7452 0.7052 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 11.0; <-: 2.0; v: 2.0 s 
= (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.7292907 [Trial 18 Step 198] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8481 0.9054 0.9454 1.0 0.8001 0.9054 -1.0 0.7467 0.7067 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5605562 [Trial 18 Step 199] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8481 0.9054 0.9454 1.0 0.8001 0.9054 -1.0 0.7467 0.7067 0.6968 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.22442114 [Trial 18 Step 200] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8493 0.9066 0.9466 1.0 0.8013 0.9066 -1.0 0.7479 0.7079 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 
15.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 19 Step 201] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8493 0.9066 0.9466 1.0 0.8013 0.9066 -1.0 0.7479 0.7079 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 11.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9213517 [Trial 19 Step 202] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8493 0.9066 0.9466 1.0 0.8013 0.9066 -1.0 0.7435 0.7035 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8151987 [Trial 19 Step 203] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8493 0.9066 0.9466 1.0 0.8013 0.9066 -1.0 0.7453 0.7053 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) 
->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 12.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.90000594 [Trial 19 Step 204] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8490 0.9066 0.9466 1.0 0.7970 0.9066 -1.0 0.7410 0.7010 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.99898535 [Trial 19 Step 205] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8486 0.9066 0.9466 1.0 0.7926 0.9066 -1.0 0.7366 0.6966 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8406345 [Trial 19 Step 206] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8483 0.9066 0.9466 1.0 0.7883 0.9066 -1.0 0.7323 0.6923 0.6979 -0.021 Frequency of state-action 
pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.21538728 [Trial 19 Step 207] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8484 0.9066 0.9466 1.0 0.7903 0.9066 -1.0 0.7342 0.6942 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 16.0; <-: 2.0; v: 2.0 s = (1,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8839686 [Trial 19 Step 208] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8403 0.9066 0.9466 1.0 0.7821 0.9066 -1.0 0.7262 0.6862 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 16.0; <-: 2.0; v: 2.0 s = (1,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.99095184 [Trial 19 Step 209] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8396 0.9066 0.9466 1.0 0.7778 0.9066 -1.0 0.7219 0.6819 0.6979 
-0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7379352 [Trial 19 Step 210] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.9066 0.9466 1.0 0.7799 0.9066 -1.0 0.7239 0.6839 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.16909015 [Trial 19 Step 211] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8420 0.9066 0.9466 1.0 0.7820 0.9066 -1.0 0.7260 0.6860 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 12.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9224197 [Trial 19 Step 212] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8387 0.9033 0.9466 1.0 0.7787 
0.9066 -1.0 0.7227 0.6828 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 13.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5367563 [Trial 19 Step 213] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8389 0.9035 0.9466 1.0 0.7789 0.9066 -1.0 0.7229 0.6829 0.6979 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.28582567 [Trial 19 Step 214] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8399 0.9046 0.9476 1.0 0.7799 0.9076 -1.0 0.7239 0.6838 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 20 Step 215] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8399 0.9046 0.9476 1.0 
0.7799 0.9076 -1.0 0.7239 0.6838 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 13.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8061476 [Trial 20 Step 216] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8399 0.9046 0.9476 1.0 0.7799 0.9076 -1.0 0.7254 0.6854 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.058423936 [Trial 20 Step 217] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8402 0.9046 0.9476 1.0 0.7817 0.9076 -1.0 0.7272 0.6872 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.42729008 [Trial 20 Step 218] cur state (2,3); rw -0.04 State utilities (0 means 
unknown): 0.8419 0.9046 0.9476 1.0 0.7835 0.9076 -1.0 0.7289 0.6889 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 14.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8608376 [Trial 20 Step 219] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8389 0.9015 0.9476 1.0 0.7804 0.9076 -1.0 0.7259 0.6860 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.825078 [Trial 20 Step 220] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8358 0.8984 0.9476 1.0 0.7773 0.9076 -1.0 0.7228 0.6829 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.11960977 [Trial 20 Step 221] cur state (3,3); rw 
-0.04 State utilities (0 means unknown): 0.8364 0.8991 0.9476 1.0 0.7780 0.9076 -1.0 0.7234 0.6834 0.6989 -0.020 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 15.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.810572 [Trial 20 Step 222] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8303 0.8929 0.9415 1.0 0.7720 0.9015 -1.0 0.7177 0.6780 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 7.0; <-: 3.0; v: 6.0 s = (3,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9804703 [Trial 20 Step 223] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8290 0.8916 0.9402 1.0 0.7706 0.8945 -1.0 0.7161 0.6763 0.6868 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 8.0; <-: 3.0; v: 6.0 s = (3,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.50121313 [Trial 20 
Step 224] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8291 0.8918 0.9403 1.0 0.7707 0.8953 -1.0 0.7161 0.6761 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 16.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.70949095 [Trial 20 Step 225] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8305 0.8932 0.9417 1.0 0.7720 0.8967 -1.0 0.7174 0.6774 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 21 Step 226] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8305 0.8932 0.9417 1.0 0.7720 0.8967 -1.0 0.7174 0.6774 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 14.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.68281233 [Trial 
21 Step 227] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8305 0.8932 0.9417 1.0 0.7721 0.8967 -1.0 0.7187 0.6787 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.5233007 [Trial 21 Step 228] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8307 0.8932 0.9417 1.0 0.7736 0.8967 -1.0 0.7202 0.6802 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4813726 [Trial 21 Step 229] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8322 0.8932 0.9417 1.0 0.7751 0.8967 -1.0 0.7217 0.6817 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- 
Action: ->; rnd # 0.509184 [Trial 21 Step 230] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8328 0.8937 0.9417 1.0 0.7756 0.8967 -1.0 0.7223 0.6823 0.6888 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 17.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.97506887 [Trial 21 Step 231] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8300 0.8909 0.9389 1.0 0.7729 0.8939 -1.0 0.7196 0.6798 0.6862 -0.028 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.25001174 [Trial 21 Step 232] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8313 0.8923 0.9403 1.0 0.7742 0.8953 -1.0 0.7208 0.6807 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: 
-> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 22 Step 233] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8313 0.8923 0.9403 1.0 0.7742 0.8953 -1.0 0.7208 0.6807 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 15.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.046730578 [Trial 22 Step 234] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8313 0.8923 0.9403 1.0 0.7742 0.8953 -1.0 0.7142 0.6742 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 16.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: <-; rnd # 0.37124664 [Trial 22 Step 235] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8313 0.8923 0.9403 1.0 0.7742 0.8953 -1.0 0.7142 0.6742 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 16.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so 
far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.34143358 [Trial 22 Step 236] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8313 0.8923 0.9403 1.0 0.7742 0.8953 -1.0 0.7157 0.6757 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.91986316 [Trial 22 Step 237] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8310 0.8923 0.9403 1.0 0.7710 0.8953 -1.0 0.7125 0.6725 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.67679286 [Trial 22 Step 238] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8311 0.8923 0.9403 1.0 0.7725 0.8953 -1.0 0.7140 0.6740 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 
2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6107045 [Trial 22 Step 239] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8324 0.8923 0.9403 1.0 0.7738 0.8953 -1.0 0.7153 0.6753 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 18.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8737616 [Trial 22 Step 240] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8298 0.8896 0.9403 1.0 0.7711 0.8953 -1.0 0.7127 0.6728 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.0079776645 [Trial 22 Step 241] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8304 0.8903 0.9403 1.0 0.7718 0.8953 -1.0 0.7133 0.6733 0.6875 -0.027 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 19.0; ^: 2.0; <-: 
2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.06400871 [Trial 22 Step 242] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8317 0.8915 0.9415 1.0 0.7730 0.8965 -1.0 0.7145 0.6744 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 23 Step 243] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8317 0.8915 0.9415 1.0 0.7730 0.8965 -1.0 0.7145 0.6744 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 17.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.31294978 [Trial 23 Step 244] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8317 0.8915 0.9415 1.0 0.7730 0.8965 -1.0 0.7159 0.6759 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; 
<-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7294124 [Trial 23 Step 245] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8318 0.8915 0.9415 1.0 0.7743 0.8965 -1.0 0.7172 0.6772 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.97947276 [Trial 23 Step 246] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8293 0.8915 0.9415 1.0 0.7718 0.8965 -1.0 0.7147 0.6747 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.027881145 [Trial 23 Step 247] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8306 0.8915 0.9415 1.0 0.7731 0.8965 -1.0 0.7160 0.6760 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; 
v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9788089 [Trial 23 Step 248] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8281 0.8890 0.9415 1.0 0.7706 0.8965 -1.0 0.7135 0.6736 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.38753086 [Trial 23 Step 249] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8289 0.8897 0.9415 1.0 0.7714 0.8965 -1.0 0.7142 0.6742 0.6886 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 20.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.21322119 [Trial 23 Step 250] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8299 0.8908 0.9426 1.0 0.7724 0.8976 -1.0 0.7152 0.6752 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = 
(3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 24 Step 251] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8299 0.8908 0.9426 1.0 0.7724 0.8976 -1.0 0.7152 0.6752 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 18.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.70500475 [Trial 24 Step 252] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8299 0.8908 0.9426 1.0 0.7724 0.8976 -1.0 0.7164 0.6764 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.274041 [Trial 24 Step 253] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8301 0.8908 0.9426 1.0 0.7736 0.8976 -1.0 0.7176 0.6776 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s 
= (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.2243579 [Trial 24 Step 254] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8312 0.8908 0.9426 1.0 0.7748 0.8976 -1.0 0.7187 0.6787 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8277868 [Trial 24 Step 255] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8289 0.8885 0.9426 1.0 0.7724 0.8976 -1.0 0.7164 0.6765 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.67090285 [Trial 24 Step 256] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8297 0.8893 0.9426 1.0 0.7732 0.8976 -1.0 0.7172 0.6772 0.6896 -0.026 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 
9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 21.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.41982043 [Trial 24 Step 257] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8306 0.8902 0.9436 1.0 0.7741 0.8986 -1.0 0.7181 0.6780 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 25 Step 258] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8306 0.8902 0.9436 1.0 0.7741 0.8986 -1.0 0.7181 0.6780 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 19.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8003798 [Trial 25 Step 259] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8306 0.8902 0.9436 1.0 0.7741 0.8986 -1.0 0.7191 0.6791 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) 
->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8223948 [Trial 25 Step 260] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8304 0.8902 0.9436 1.0 0.7715 0.8986 -1.0 0.7165 0.6765 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.332596 [Trial 25 Step 261] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8305 0.8902 0.9436 1.0 0.7727 0.8986 -1.0 0.7177 0.6777 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.99397403 [Trial 25 Step 262] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8283 0.8902 0.9436 1.0 0.7705 0.8986 -1.0 0.7155 0.6755 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 
2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.61637396 [Trial 25 Step 263] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8294 0.8902 0.9436 1.0 0.7716 0.8986 -1.0 0.7166 0.6766 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.47482193 [Trial 25 Step 264] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8301 0.8909 0.9436 1.0 0.7723 0.8986 -1.0 0.7173 0.6773 0.6905 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 22.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6474077 [Trial 25 Step 265] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8310 0.8918 0.9444 1.0 0.7732 0.8994 -1.0 0.7181 0.6781 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 
3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 26 Step 266] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8310 0.8918 0.9444 1.0 0.7732 0.8994 -1.0 0.7181 0.6781 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 20.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.76712024 [Trial 26 Step 267] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8310 0.8918 0.9444 1.0 0.7732 0.8994 -1.0 0.7191 0.6791 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7592746 [Trial 26 Step 268] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8311 0.8918 0.9444 1.0 0.7742 0.8994 -1.0 0.7201 0.6801 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; 
<-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8446202 [Trial 26 Step 269] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8260 0.8918 0.9444 1.0 0.7691 0.8994 -1.0 0.7151 0.6751 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.4315995 [Trial 26 Step 270] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8261 0.8918 0.9444 1.0 0.7701 0.8994 -1.0 0.7160 0.6760 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.0962348 [Trial 26 Step 271] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8274 0.8918 0.9444 1.0 0.7714 0.8994 -1.0 0.7173 0.6773 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4130932 [Trial 26 Step 272] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8280 0.8924 0.9444 1.0 0.7720 0.8994 -1.0 0.7179 0.6779 0.6913 -0.025 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 23.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.41353244 [Trial 26 Step 273] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8288 0.8932 0.9452 1.0 0.7728 0.9002 -1.0 0.7186 0.6786 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 27 Step 274] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8288 0.8932 0.9452 1.0 0.7728 0.9002 -1.0 0.7186 0.6786 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 21.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; 
v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7581656 [Trial 27 Step 275] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8288 0.8932 0.9452 1.0 0.7728 0.9002 -1.0 0.7195 0.6795 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.2790333 [Trial 27 Step 276] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8289 0.8932 0.9452 1.0 0.7737 0.9002 -1.0 0.7203 0.6803 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.44882488 [Trial 27 Step 277] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8301 0.8932 0.9452 1.0 0.7748 0.9002 -1.0 0.7215 0.6815 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 s = 
(1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.39535922 [Trial 27 Step 278] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8306 0.8938 0.9452 1.0 0.7754 0.9002 -1.0 0.7221 0.6821 0.6920 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 24.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.070760906 [Trial 27 Step 279] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8313 0.8945 0.9459 1.0 0.7761 0.9009 -1.0 0.7227 0.6827 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 28 Step 280] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8313 0.8945 0.9459 1.0 0.7761 0.9009 -1.0 0.7227 0.6827 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 22.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 
s = (1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.84000677 [Trial 28 Step 281] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8313 0.8945 0.9459 1.0 0.7761 0.9009 -1.0 0.7235 0.6835 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.47742635 [Trial 28 Step 282] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8314 0.8945 0.9459 1.0 0.7769 0.9009 -1.0 0.7243 0.6843 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4955572 [Trial 28 Step 283] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8325 0.8945 0.9459 1.0 0.7779 0.9009 -1.0 0.7253 0.6853 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,2) 
->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.10266805 [Trial 28 Step 284] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8330 0.8950 0.9459 1.0 0.7785 0.9009 -1.0 0.7258 0.6858 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 25.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6927061 [Trial 28 Step 285] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8336 0.8956 0.9465 1.0 0.7791 0.9015 -1.0 0.7264 0.6864 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 29 Step 286] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8336 0.8956 0.9465 1.0 0.7791 0.9015 -1.0 0.7264 0.6864 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 23.0; <-: 2.0; v: 2.0 s = 
(1,2) ->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8017591 [Trial 29 Step 287] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8336 0.8956 0.9465 1.0 0.7791 0.9015 -1.0 0.7271 0.6871 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9677755 [Trial 29 Step 288] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8334 0.8956 0.9465 1.0 0.7770 0.9015 -1.0 0.7250 0.6850 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.75644124 [Trial 29 Step 289] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8335 0.8956 0.9465 1.0 0.7778 0.9015 -1.0 0.7258 0.6858 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; 
^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.48829246 [Trial 29 Step 290] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8345 0.8956 0.9465 1.0 0.7788 0.9015 -1.0 0.7268 0.6868 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5401715 [Trial 29 Step 291] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8349 0.8961 0.9465 1.0 0.7793 0.9015 -1.0 0.7273 0.6873 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 26.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.66229194 [Trial 29 Step 292] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8355 0.8967 0.9471 1.0 0.7799 0.9021 -1.0 0.7278 0.6878 0.6938 -0.023 Frequency of state-action pairs (not 
shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 30 Step 293] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8355 0.8967 0.9471 1.0 0.7799 0.9021 -1.0 0.7278 0.6878 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 24.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.120024025 [Trial 30 Step 294] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8355 0.8967 0.9471 1.0 0.7799 0.9021 -1.0 0.7284 0.6884 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7184302 [Trial 30 Step 295] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8356 0.8967 0.9471 1.0 0.7806 0.9021 -1.0 0.7292 0.6892 0.6938 -0.023 Frequency of state-action pairs 
(not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9225925 [Trial 30 Step 296] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8339 0.8967 0.9471 1.0 0.7789 0.9021 -1.0 0.7274 0.6875 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.49242067 [Trial 30 Step 297] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8348 0.8967 0.9471 1.0 0.7798 0.9021 -1.0 0.7284 0.6884 0.6938 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.44548798 [Trial 30 Step 298] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8352 0.8971 0.9471 1.0 0.7802 0.9021 -1.0 0.7288 0.6888 0.6938 
-0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 27.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.55790216 [Trial 30 Step 299] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8358 0.8977 0.9477 1.0 0.7808 0.9027 -1.0 0.7293 0.6893 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 31 Step 300] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8358 0.8977 0.9477 1.0 0.7808 0.9027 -1.0 0.7293 0.6893 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 25.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.939198 [Trial 31 Step 301] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8358 0.8977 0.9477 1.0 0.7808 0.9027 -1.0 0.7274 0.6874 0.6943 
-0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 26.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8526054 [Trial 31 Step 302] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8358 0.8977 0.9477 1.0 0.7808 0.9027 -1.0 0.7281 0.6881 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.45195752 [Trial 31 Step 303] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8359 0.8977 0.9477 1.0 0.7815 0.9027 -1.0 0.7287 0.6887 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.29109251 [Trial 31 Step 304] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8367 0.8977 0.9477 1.0 0.7823 
0.9027 -1.0 0.7296 0.6896 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.41458207 [Trial 31 Step 305] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8371 0.8981 0.9477 1.0 0.7827 0.9027 -1.0 0.7300 0.6900 0.6943 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 28.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.13986361 [Trial 31 Step 306] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8376 0.8985 0.9481 1.0 0.7832 0.9031 -1.0 0.7305 0.6905 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 32 Step 307] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8376 0.8985 0.9481 1.0 
0.7832 0.9031 -1.0 0.7305 0.6905 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 27.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8177852 [Trial 32 Step 308] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8376 0.8985 0.9481 1.0 0.7832 0.9031 -1.0 0.7310 0.6910 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.014573276 [Trial 32 Step 309] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8377 0.8985 0.9481 1.0 0.7838 0.9031 -1.0 0.7317 0.6917 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4027412 [Trial 32 Step 310] cur state (2,3); rw -0.04 State utilities (0 means 
unknown): 0.8385 0.8985 0.9481 1.0 0.7846 0.9031 -1.0 0.7325 0.6925 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6829335 [Trial 32 Step 311] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8389 0.8989 0.9481 1.0 0.7850 0.9031 -1.0 0.7328 0.6928 0.6947 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 29.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.3597504 [Trial 32 Step 312] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8393 0.8994 0.9486 1.0 0.7855 0.9036 -1.0 0.7333 0.6933 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 33 Step 313] cur state (1,1); rw -0.04 State utilities (0 
means unknown): 0.8393 0.8994 0.9486 1.0 0.7855 0.9036 -1.0 0.7333 0.6933 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 28.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.20890993 [Trial 33 Step 314] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8393 0.8994 0.9486 1.0 0.7855 0.9036 -1.0 0.7338 0.6938 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9690078 [Trial 33 Step 315] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8391 0.8994 0.9486 1.0 0.7838 0.9036 -1.0 0.7321 0.6921 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.23592323 [Trial 33 Step 316] cur state 
(1,3); rw -0.04 State utilities (0 means unknown): 0.8392 0.8994 0.9486 1.0 0.7844 0.9036 -1.0 0.7327 0.6927 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.64823437 [Trial 33 Step 317] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.8994 0.9486 1.0 0.7851 0.9036 -1.0 0.7335 0.6935 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5555841 [Trial 33 Step 318] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8997 0.9486 1.0 0.7855 0.9036 -1.0 0.7338 0.6938 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 30.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 
0.8164121 [Trial 33 Step 319] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8370 0.8964 0.9453 1.0 0.7823 0.9003 -1.0 0.7307 0.6908 0.6921 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 9.0; <-: 3.0; v: 6.0 s = (3,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.3011495 [Trial 33 Step 320] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8371 0.8965 0.9454 1.0 0.7823 0.9010 -1.0 0.7306 0.6906 0.6927 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 31.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.7443539 [Trial 33 Step 321] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8376 0.8971 0.9460 1.0 0.7828 0.9015 -1.0 0.7311 0.6911 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ 
-1.0 ^ <- ^ <- [Trial 34 Step 322] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8376 0.8971 0.9460 1.0 0.7828 0.9015 -1.0 0.7311 0.6911 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 29.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8843322 [Trial 34 Step 323] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8376 0.8971 0.9460 1.0 0.7828 0.9015 -1.0 0.7316 0.6916 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.40101492 [Trial 34 Step 324] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8377 0.8971 0.9460 1.0 0.7834 0.9015 -1.0 0.7322 0.6922 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 
1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.105452895 [Trial 34 Step 325] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8384 0.8971 0.9460 1.0 0.7841 0.9015 -1.0 0.7329 0.6929 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.26342684 [Trial 34 Step 326] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8387 0.8974 0.9460 1.0 0.7844 0.9015 -1.0 0.7332 0.6932 0.6932 -0.024 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 32.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.48257893 [Trial 34 Step 327] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8392 0.8979 0.9465 1.0 0.7849 0.9020 -1.0 0.7337 0.6937 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 
5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 35 Step 328] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8392 0.8979 0.9465 1.0 0.7849 0.9020 -1.0 0.7337 0.6937 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 30.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.6918506 [Trial 35 Step 329] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8392 0.8979 0.9465 1.0 0.7849 0.9020 -1.0 0.7342 0.6942 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.15252131 [Trial 35 Step 330] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8393 0.8979 0.9465 1.0 0.7855 0.9020 -1.0 0.7347 0.6947 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 
2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.042528093 [Trial 35 Step 331] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.8979 0.9465 1.0 0.7861 0.9020 -1.0 0.7353 0.6953 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8079738 [Trial 35 Step 332] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8385 0.8965 0.9465 1.0 0.7847 0.9020 -1.0 0.7339 0.6939 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.29769653 [Trial 35 Step 333] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8388 0.8968 0.9465 1.0 0.7850 0.9020 -1.0 0.7343 0.6943 0.6937 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 33.0; ^: 2.0; 
<-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.1381638 [Trial 35 Step 334] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8393 0.8973 0.9469 1.0 0.7855 0.9025 -1.0 0.7347 0.6947 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 36 Step 335] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8393 0.8973 0.9469 1.0 0.7855 0.9025 -1.0 0.7347 0.6947 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 31.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.6200245 [Trial 36 Step 336] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8393 0.8973 0.9469 1.0 0.7855 0.9025 -1.0 0.7351 0.6951 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 
2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.30892158 [Trial 36 Step 337] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8393 0.8973 0.9469 1.0 0.7860 0.9025 -1.0 0.7356 0.6956 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.48951554 [Trial 36 Step 338] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.8973 0.9469 1.0 0.7866 0.9025 -1.0 0.7362 0.6962 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.26423204 [Trial 36 Step 339] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8976 0.9469 1.0 0.7869 0.9025 -1.0 0.7366 0.6965 0.6941 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; 
<-: 3.0; v: 6.0 s = (3,3) ->: 34.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.40174848 [Trial 36 Step 340] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8407 0.8980 0.9474 1.0 0.7874 0.9029 -1.0 0.7370 0.6970 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 37 Step 341] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8407 0.8980 0.9474 1.0 0.7874 0.9029 -1.0 0.7370 0.6970 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 32.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.98366195 [Trial 37 Step 342] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8407 0.8980 0.9474 1.0 0.7874 0.9029 -1.0 0.7355 0.6955 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 33.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 
10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.58179265 [Trial 37 Step 343] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8407 0.8980 0.9474 1.0 0.7874 0.9029 -1.0 0.7359 0.6959 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.87876475 [Trial 37 Step 344] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8406 0.8980 0.9474 1.0 0.7859 0.9029 -1.0 0.7345 0.6945 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.49987537 [Trial 37 Step 345] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8406 0.8980 0.9474 1.0 0.7864 0.9029 -1.0 0.7350 0.6950 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 
2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.16874641 [Trial 37 Step 346] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8412 0.8980 0.9474 1.0 0.7870 0.9029 -1.0 0.7355 0.6955 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.038930953 [Trial 37 Step 347] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8415 0.8983 0.9474 1.0 0.7873 0.9029 -1.0 0.7358 0.6958 0.6945 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 35.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.054353952 [Trial 37 Step 348] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8419 0.8987 0.9478 1.0 0.7877 0.9033 -1.0 0.7362 0.6962 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; <-: 2.0; 
v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 38 Step 349] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8419 0.8987 0.9478 1.0 0.7877 0.9033 -1.0 0.7362 0.6962 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 34.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.57275325 [Trial 38 Step 350] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8419 0.8987 0.9478 1.0 0.7877 0.9033 -1.0 0.7366 0.6966 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.03116405 [Trial 38 Step 351] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8419 0.8987 0.9478 1.0 0.7882 0.9033 -1.0 0.7371 0.6971 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; 
<-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5883258 [Trial 38 Step 352] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8424 0.8987 0.9478 1.0 0.7887 0.9033 -1.0 0.7377 0.6977 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.48373258 [Trial 38 Step 353] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8427 0.8990 0.9478 1.0 0.7890 0.9033 -1.0 0.7379 0.6979 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 36.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.68114704 [Trial 38 Step 354] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8431 0.8994 0.9481 1.0 0.7894 0.9037 -1.0 0.7383 0.6983 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; 
v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 39 Step 355] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8431 0.8994 0.9481 1.0 0.7894 0.9037 -1.0 0.7383 0.6983 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 35.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.2989782 [Trial 39 Step 356] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8431 0.8994 0.9481 1.0 0.7894 0.9037 -1.0 0.7387 0.6987 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.04677373 [Trial 39 Step 357] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8431 0.8994 0.9481 1.0 0.7898 0.9037 -1.0 0.7391 0.6991 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 
3.0; v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.60182935 [Trial 39 Step 358] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8436 0.8994 0.9481 1.0 0.7903 0.9037 -1.0 0.7396 0.6996 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.900988 [Trial 39 Step 359] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8424 0.8981 0.9481 1.0 0.7891 0.9037 -1.0 0.7384 0.6984 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8728855 [Trial 39 Step 360] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8411 0.8969 0.9481 1.0 0.7878 0.9037 -1.0 0.7371 0.6972 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.70168877 [Trial 39 Step 361] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8415 0.8972 0.9481 1.0 0.7881 0.9037 -1.0 0.7375 0.6975 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 37.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.45831978 [Trial 39 Step 362] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8418 0.8976 0.9485 1.0 0.7885 0.9041 -1.0 0.7378 0.6978 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 40 Step 363] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8418 0.8976 0.9485 1.0 0.7885 0.9041 -1.0 0.7378 0.6978 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 36.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 
2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.71526927 [Trial 40 Step 364] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8418 0.8976 0.9485 1.0 0.7885 0.9041 -1.0 0.7382 0.6982 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.3509425 [Trial 40 Step 365] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8419 0.8976 0.9485 1.0 0.7889 0.9041 -1.0 0.7386 0.6986 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.08651048 [Trial 40 Step 366] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8423 0.8976 0.9485 1.0 0.7894 0.9041 -1.0 0.7391 0.6991 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 
2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.77592665 [Trial 40 Step 367] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8427 0.8979 0.9485 1.0 0.7897 0.9041 -1.0 0.7394 0.6994 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 38.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.646836 [Trial 40 Step 368] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8430 0.8983 0.9488 1.0 0.7900 0.9044 -1.0 0.7397 0.6997 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 41 Step 369] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8430 0.8983 0.9488 1.0 0.7900 0.9044 -1.0 0.7397 0.6997 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 37.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 
2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9218047 [Trial 41 Step 370] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8430 0.8983 0.9488 1.0 0.7900 0.9044 -1.0 0.7384 0.6984 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 38.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.20714647 [Trial 41 Step 371] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8430 0.8983 0.9488 1.0 0.7900 0.9044 -1.0 0.7388 0.6988 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.18994164 [Trial 41 Step 372] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8430 0.8983 0.9488 1.0 0.7905 0.9044 -1.0 0.7392 0.6992 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 
2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.11623496 [Trial 41 Step 373] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8435 0.8983 0.9488 1.0 0.7909 0.9044 -1.0 0.7396 0.6996 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.054790854 [Trial 41 Step 374] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8438 0.8986 0.9488 1.0 0.7912 0.9044 -1.0 0.7399 0.6999 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 39.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5228618 [Trial 41 Step 375] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8441 0.8989 0.9492 1.0 0.7915 0.9047 -1.0 0.7402 0.7002 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = 
(1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 42 Step 376] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8441 0.8989 0.9492 1.0 0.7915 0.9047 -1.0 0.7402 0.7002 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 39.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.61680365 [Trial 42 Step 377] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8441 0.8989 0.9492 1.0 0.7915 0.9047 -1.0 0.7406 0.7006 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.61940634 [Trial 42 Step 378] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8441 0.8989 0.9492 1.0 0.7919 0.9047 -1.0 0.7410 0.7010 0.6962 -0.022 Frequency of state-action pairs (not shown if 
0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.47013748 [Trial 42 Step 379] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8445 0.8989 0.9492 1.0 0.7923 0.9047 -1.0 0.7414 0.7014 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.7734075 [Trial 42 Step 380] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8448 0.8992 0.9492 1.0 0.7926 0.9047 -1.0 0.7417 0.7017 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 40.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.427876 [Trial 42 Step 381] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8451 0.8995 0.9495 1.0 0.7929 0.9050 -1.0 0.7420 0.7020 0.6964 -0.022 Frequency of 
state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 43 Step 382] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8451 0.8995 0.9495 1.0 0.7929 0.9050 -1.0 0.7420 0.7020 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 40.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.35181618 [Trial 43 Step 383] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8451 0.8995 0.9495 1.0 0.7929 0.9050 -1.0 0.7423 0.7023 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.80781615 [Trial 43 Step 384] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8450 0.8995 0.9495 1.0 0.7917 0.9050 -1.0 0.7411 0.7011 0.6964 -0.022 
Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.5401904 [Trial 43 Step 385] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8450 0.8995 0.9495 1.0 0.7921 0.9050 -1.0 0.7415 0.7015 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8832242 [Trial 43 Step 386] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8425 0.8995 0.9495 1.0 0.7895 0.9050 -1.0 0.7389 0.6989 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7697841 [Trial 43 Step 387] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8425 0.8995 0.9495 1.0 0.7899 0.9050 
-1.0 0.7393 0.6993 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5417027 [Trial 43 Step 388] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8430 0.8995 0.9495 1.0 0.7903 0.9050 -1.0 0.7397 0.6997 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9018679 [Trial 43 Step 389] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8418 0.8983 0.9495 1.0 0.7892 0.9050 -1.0 0.7386 0.6986 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.52424866 [Trial 43 Step 390] cur state (3,3); rw -0.04 State utilities (0 means unknown): 
0.8421 0.8986 0.9495 1.0 0.7895 0.9050 -1.0 0.7389 0.6989 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 41.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.015510201 [Trial 43 Step 391] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7392 0.6992 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 44 Step 392] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7392 0.6992 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 41.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.08683485 [Trial 44 Step 393] cur state (2,1); rw -0.04 State utilities (0 means 
unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7369 0.6969 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 3.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: <-; rnd # 0.8159834 [Trial 44 Step 394] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7357 0.6823 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 4.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: <-; rnd # 0.73913836 [Trial 44 Step 395] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7360 0.6860 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 42.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.95702994 [Trial 44 Step 396] cur state 
(1,1); rw -0.04 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7348 0.6848 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 43.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.8156845 [Trial 44 Step 397] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8424 0.8989 0.9497 1.0 0.7898 0.9053 -1.0 0.7352 0.6852 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.75332403 [Trial 44 Step 398] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8425 0.8989 0.9497 1.0 0.7902 0.9053 -1.0 0.7356 0.6856 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 
0.94756395 [Trial 44 Step 399] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8414 0.8989 0.9497 1.0 0.7891 0.9053 -1.0 0.7345 0.6845 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9535378 [Trial 44 Step 400] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8989 0.9497 1.0 0.7880 0.9053 -1.0 0.7334 0.6834 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.19516534 [Trial 44 Step 401] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8989 0.9497 1.0 0.7885 0.9053 -1.0 0.7339 0.6839 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 
^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4089148 [Trial 44 Step 402] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8411 0.8992 0.9497 1.0 0.7888 0.9053 -1.0 0.7342 0.6842 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 42.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.13775057 [Trial 44 Step 403] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8413 0.8995 0.9500 1.0 0.7890 0.9056 -1.0 0.7345 0.6844 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 45 Step 404] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8413 0.8995 0.9500 1.0 0.7890 0.9056 -1.0 0.7345 0.6844 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 44.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> 
-> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.5738495 [Trial 45 Step 405] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8413 0.8995 0.9500 1.0 0.7890 0.9056 -1.0 0.7349 0.6849 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.35546398 [Trial 45 Step 406] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8414 0.8995 0.9500 1.0 0.7894 0.9056 -1.0 0.7352 0.6852 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9943965 [Trial 45 Step 407] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8995 0.9500 1.0 0.7883 0.9056 -1.0 0.7342 0.6842 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 
5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.3075812 [Trial 45 Step 408] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8995 0.9500 1.0 0.7888 0.9056 -1.0 0.7346 0.6846 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9229744 [Trial 45 Step 409] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8398 0.8984 0.9500 1.0 0.7878 0.9056 -1.0 0.7336 0.6836 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.74633133 [Trial 45 Step 410] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8401 0.8987 0.9500 1.0 0.7881 0.9056 -1.0 0.7339 0.6839 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 43.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.11554921 [Trial 45 Step 411] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8403 0.8990 0.9503 1.0 0.7883 0.9058 -1.0 0.7341 0.6841 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 46 Step 412] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8403 0.8990 0.9503 1.0 0.7883 0.9058 -1.0 0.7341 0.6841 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 45.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.5798638 [Trial 46 Step 413] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8403 0.8990 0.9503 1.0 0.7883 0.9058 -1.0 0.7345 0.6845 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 
2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.09873569 [Trial 46 Step 414] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8990 0.9503 1.0 0.7886 0.9058 -1.0 0.7349 0.6849 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.78342116 [Trial 46 Step 415] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8990 0.9503 1.0 0.7891 0.9058 -1.0 0.7353 0.6853 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.69360405 [Trial 46 Step 416] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8411 0.8993 0.9503 1.0 0.7894 0.9058 -1.0 0.7356 0.6856 0.6972 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 
6.0 s = (3,3) ->: 44.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.88912654 [Trial 46 Step 417] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8389 0.8971 0.9481 1.0 0.7872 0.9037 -1.0 0.7335 0.6836 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 10.0; <-: 3.0; v: 6.0 s = (3,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.56510156 [Trial 46 Step 418] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8390 0.8972 0.9482 1.0 0.7873 0.9042 -1.0 0.7335 0.6835 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 45.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.45947385 [Trial 46 Step 419] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8393 0.8974 0.9484 1.0 0.7876 0.9044 -1.0 0.7338 0.6838 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = 
(3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 47 Step 420] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8393 0.8974 0.9484 1.0 0.7876 0.9044 -1.0 0.7338 0.6838 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 46.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.89056474 [Trial 47 Step 421] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8393 0.8974 0.9484 1.0 0.7876 0.9044 -1.0 0.7342 0.6842 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.57270217 [Trial 47 Step 422] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8393 0.8974 0.9484 1.0 0.7879 0.9044 -1.0 0.7345 0.6845 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 
2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.92751884 [Trial 47 Step 423] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8383 0.8974 0.9484 1.0 0.7869 0.9044 -1.0 0.7335 0.6835 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.23108214 [Trial 47 Step 424] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8388 0.8974 0.9484 1.0 0.7873 0.9044 -1.0 0.7339 0.6839 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8409691 [Trial 47 Step 425] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8378 0.8964 0.9484 1.0 0.7863 0.9044 -1.0 0.7329 0.6830 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.48246777 [Trial 47 Step 426] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8381 0.8967 0.9484 1.0 0.7866 0.9044 -1.0 0.7332 0.6832 0.6959 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 46.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.092326105 [Trial 47 Step 427] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8383 0.8970 0.9487 1.0 0.7869 0.9047 -1.0 0.7335 0.6835 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 48 Step 428] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8383 0.8970 0.9487 1.0 0.7869 0.9047 -1.0 0.7335 0.6835 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 47.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 
2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.29555285 [Trial 48 Step 429] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8383 0.8970 0.9487 1.0 0.7869 0.9047 -1.0 0.7338 0.6838 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9224449 [Trial 48 Step 430] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8383 0.8970 0.9487 1.0 0.7859 0.9047 -1.0 0.7328 0.6828 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.29607892 [Trial 48 Step 431] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8383 0.8970 0.9487 1.0 0.7862 0.9047 -1.0 0.7331 0.6831 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = 
(2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.029644608 [Trial 48 Step 432] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8387 0.8970 0.9487 1.0 0.7866 0.9047 -1.0 0.7336 0.6836 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8637152 [Trial 48 Step 433] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8378 0.8960 0.9487 1.0 0.7857 0.9047 -1.0 0.7326 0.6826 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.76523423 [Trial 48 Step 434] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8381 0.8963 0.9487 1.0 0.7860 0.9047 -1.0 0.7329 0.6829 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) 
->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 47.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.11916906 [Trial 48 Step 435] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8383 0.8966 0.9490 1.0 0.7862 0.9050 -1.0 0.7331 0.6831 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 49 Step 436] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8383 0.8966 0.9490 1.0 0.7862 0.9050 -1.0 0.7331 0.6831 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 48.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.10751909 [Trial 49 Step 437] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8383 0.8966 0.9490 1.0 0.7862 0.9050 -1.0 0.7335 0.6835 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = 
(2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.6987751 [Trial 49 Step 438] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8383 0.8966 0.9490 1.0 0.7865 0.9050 -1.0 0.7338 0.6838 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.609909 [Trial 49 Step 439] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8388 0.8966 0.9490 1.0 0.7870 0.9050 -1.0 0.7342 0.6842 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6946896 [Trial 49 Step 440] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8391 0.8969 0.9490 1.0 0.7872 0.9050 -1.0 0.7345 0.6845 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 
57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 48.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.07312971 [Trial 49 Step 441] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7875 0.9053 -1.0 0.7347 0.6847 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 50 Step 442] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7875 0.9053 -1.0 0.7347 0.6847 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 49.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.029454768 [Trial 50 Step 443] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7875 0.9053 -1.0 0.7325 0.6825 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = 
(1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 5.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: <-; rnd # 0.579213 [Trial 50 Step 444] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7875 0.9053 -1.0 0.7327 0.6847 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 50.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.17488319 [Trial 50 Step 445] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7875 0.9053 -1.0 0.7331 0.6851 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.40138698 [Trial 50 Step 446] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8393 0.8972 0.9493 1.0 0.7878 0.9053 -1.0 0.7333 0.6853 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 
2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.72428405 [Trial 50 Step 447] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8397 0.8972 0.9493 1.0 0.7882 0.9053 -1.0 0.7337 0.6857 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9939216 [Trial 50 Step 448] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8388 0.8962 0.9493 1.0 0.7873 0.9053 -1.0 0.7328 0.6848 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.45847535 [Trial 50 Step 449] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8391 0.8965 0.9493 1.0 0.7876 0.9053 -1.0 0.7331 0.6851 0.6967 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 
51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 49.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6670309 [Trial 50 Step 450] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8394 0.8968 0.9495 1.0 0.7878 0.9055 -1.0 0.7334 0.6853 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 51 Step 451] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8394 0.8968 0.9495 1.0 0.7878 0.9055 -1.0 0.7334 0.6853 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 51.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.11525929 [Trial 51 Step 452] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8394 0.8968 0.9495 1.0 0.7878 0.9055 -1.0 0.7337 0.6857 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 
15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7964942 [Trial 51 Step 453] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8394 0.8968 0.9495 1.0 0.7881 0.9055 -1.0 0.7340 0.6860 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.18432522 [Trial 51 Step 454] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8398 0.8968 0.9495 1.0 0.7885 0.9055 -1.0 0.7344 0.6864 0.6969 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.40649462 [Trial 51 Step 455] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8400 0.8971 0.9495 1.0 0.7887 0.9055 -1.0 0.7346 0.6866 0.6969 -0.021 Frequency of state-action 
pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 50.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.13099879 [Trial 51 Step 456] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8403 0.8973 0.9497 1.0 0.7890 0.9057 -1.0 0.7349 0.6869 0.6971 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 52 Step 457] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8403 0.8973 0.9497 1.0 0.7890 0.9057 -1.0 0.7349 0.6869 0.6971 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 52.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.16594112 [Trial 52 Step 458] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8403 0.8973 0.9497 1.0 0.7890 0.9057 -1.0 0.7352 0.6872 0.6971 -0.021 Frequency of 
state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.12741536 [Trial 52 Step 459] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8403 0.8973 0.9497 1.0 0.7892 0.9057 -1.0 0.7355 0.6875 0.6971 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.2813933 [Trial 52 Step 460] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8407 0.8973 0.9497 1.0 0.7896 0.9057 -1.0 0.7358 0.6878 0.6971 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.0054851174 [Trial 52 Step 461] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8409 0.8976 0.9497 1.0 0.7899 0.9057 -1.0 
0.7361 0.6881 0.6971 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 51.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.43166333 [Trial 52 Step 462] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8412 0.8978 0.9499 1.0 0.7901 0.9059 -1.0 0.7363 0.6883 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 53 Step 463] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8412 0.8978 0.9499 1.0 0.7901 0.9059 -1.0 0.7363 0.6883 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 53.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.6287128 [Trial 53 Step 464] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8412 0.8978 0.9499 1.0 0.7901 0.9059 
-1.0 0.7366 0.6886 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.22581679 [Trial 53 Step 465] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8412 0.8978 0.9499 1.0 0.7903 0.9059 -1.0 0.7369 0.6889 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.09759349 [Trial 53 Step 466] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8415 0.8978 0.9499 1.0 0.7907 0.9059 -1.0 0.7372 0.6892 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.79513836 [Trial 53 Step 467] cur state (3,3); rw -0.04 State utilities (0 means unknown): 
0.8418 0.8980 0.9499 1.0 0.7910 0.9059 -1.0 0.7375 0.6895 0.6973 -0.021 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 52.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8627702 [Trial 53 Step 468] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8400 0.8962 0.9481 1.0 0.7892 0.9041 -1.0 0.7357 0.6878 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 11.0; <-: 3.0; v: 6.0 s = (3,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.22202873 [Trial 53 Step 469] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8400 0.8963 0.9482 1.0 0.7892 0.9045 -1.0 0.7357 0.6877 0.6960 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 53.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6335859 [Trial 53 Step 470] cur state (4,3); rw 1.0 
State utilities (0 means unknown): 0.8403 0.8965 0.9484 1.0 0.7894 0.9048 -1.0 0.7360 0.6880 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 54 Step 471] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8403 0.8965 0.9484 1.0 0.7894 0.9048 -1.0 0.7360 0.6880 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 54.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.22140193 [Trial 54 Step 472] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8403 0.8965 0.9484 1.0 0.7894 0.9048 -1.0 0.7363 0.6883 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9022464 [Trial 54 Step 473] cur state (1,2); rw 
-0.04 State utilities (0 means unknown): 0.8402 0.8965 0.9484 1.0 0.7885 0.9048 -1.0 0.7354 0.6874 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 62.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.7992031 [Trial 54 Step 474] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8402 0.8965 0.9484 1.0 0.7888 0.9048 -1.0 0.7356 0.6876 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.07995635 [Trial 54 Step 475] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8406 0.8965 0.9484 1.0 0.7891 0.9048 -1.0 0.7360 0.6880 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.19563586 
[Trial 54 Step 476] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8968 0.9484 1.0 0.7894 0.9048 -1.0 0.7362 0.6882 0.6962 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 54.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.26371378 [Trial 54 Step 477] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8410 0.8970 0.9487 1.0 0.7896 0.9050 -1.0 0.7365 0.6884 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 55 Step 478] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8410 0.8970 0.9487 1.0 0.7896 0.9050 -1.0 0.7365 0.6884 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 55.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 
0.76110566 [Trial 55 Step 479] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8410 0.8970 0.9487 1.0 0.7896 0.9050 -1.0 0.7367 0.6887 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 63.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.49342322 [Trial 55 Step 480] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8411 0.8970 0.9487 1.0 0.7899 0.9050 -1.0 0.7370 0.6890 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.085268795 [Trial 55 Step 481] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8414 0.8970 0.9487 1.0 0.7902 0.9050 -1.0 0.7373 0.6893 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 
^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.6417425 [Trial 55 Step 482] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8416 0.8972 0.9487 1.0 0.7904 0.9050 -1.0 0.7376 0.6896 0.6964 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 55.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.82290006 [Trial 55 Step 483] cur state (3,2); rw -0.04 State utilities (0 means unknown): 0.8399 0.8955 0.9469 1.0 0.7887 0.9033 -1.0 0.7359 0.6880 0.6948 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 12.0; <-: 3.0; v: 6.0 s = (3,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.47007233 [Trial 55 Step 484] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.8955 0.9470 1.0 0.7887 0.9036 -1.0 0.7359 0.6879 0.6952 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 56.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 
2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.2893895 [Trial 55 Step 485] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8402 0.8958 0.9472 1.0 0.7890 0.9039 -1.0 0.7361 0.6881 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 56 Step 486] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8958 0.9472 1.0 0.7890 0.9039 -1.0 0.7361 0.6881 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 56.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.9764541 [Trial 56 Step 487] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8958 0.9472 1.0 0.7890 0.9039 -1.0 0.7353 0.6873 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 57.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; 
v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.15441781 [Trial 56 Step 488] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8402 0.8958 0.9472 1.0 0.7890 0.9039 -1.0 0.7355 0.6875 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 64.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.13389993 [Trial 56 Step 489] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8402 0.8958 0.9472 1.0 0.7892 0.9039 -1.0 0.7358 0.6878 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.58266354 [Trial 56 Step 490] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8405 0.8958 0.9472 1.0 0.7895 0.9039 -1.0 0.7361 0.6881 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.61231995 [Trial 56 Step 491] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8960 0.9472 1.0 0.7898 0.9039 -1.0 0.7363 0.6883 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 57.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4952796 [Trial 56 Step 492] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8410 0.8963 0.9475 1.0 0.7900 0.9041 -1.0 0.7366 0.6886 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 57 Step 493] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8410 0.8963 0.9475 1.0 0.7900 0.9041 -1.0 0.7366 0.6886 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 58.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; 
v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.13912255 [Trial 57 Step 494] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8410 0.8963 0.9475 1.0 0.7900 0.9041 -1.0 0.7369 0.6889 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 65.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.4111544 [Trial 57 Step 495] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8410 0.8963 0.9475 1.0 0.7903 0.9041 -1.0 0.7371 0.6891 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.7423581 [Trial 57 Step 496] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8413 0.8963 0.9475 1.0 0.7906 0.9041 -1.0 0.7374 0.6894 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = 
(3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.8123277 [Trial 57 Step 497] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8405 0.8955 0.9475 1.0 0.7898 0.9041 -1.0 0.7366 0.6886 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.5991053 [Trial 57 Step 498] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8408 0.8957 0.9475 1.0 0.7900 0.9041 -1.0 0.7368 0.6888 0.6956 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 58.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9164183 [Trial 57 Step 499] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8400 0.8949 0.9467 1.0 0.7892 0.9034 -1.0 0.7360 0.6881 0.6949 -0.023 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 
2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 59.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.382082 [Trial 57 Step 500] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7894 0.9036 -1.0 0.7363 0.6883 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 58 Step 501] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7894 0.9036 -1.0 0.7363 0.6883 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 59.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.026183963 [Trial 58 Step 502] cur state (2,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7894 0.9036 -1.0 0.7344 0.6864 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 6.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) 
->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: <-; rnd # 0.65169704 [Trial 58 Step 503] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7894 0.9036 -1.0 0.7346 0.6879 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 60.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.14904642 [Trial 58 Step 504] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7894 0.9036 -1.0 0.7349 0.6882 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 66.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.96225965 [Trial 58 Step 505] cur state (1,2); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7886 0.9036 -1.0 0.7341 0.6874 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 67.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; 
^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ^; rnd # 0.3337683 [Trial 58 Step 506] cur state (1,3); rw -0.04 State utilities (0 means unknown): 0.8402 0.8952 0.9469 1.0 0.7889 0.9036 -1.0 0.7343 0.6876 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.4958877 [Trial 58 Step 507] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8405 0.8952 0.9469 1.0 0.7891 0.9036 -1.0 0.7346 0.6879 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.9836585 [Trial 58 Step 508] cur state (2,3); rw -0.04 State utilities (0 means unknown): 0.8397 0.8944 0.9469 1.0 0.7884 0.9036 -1.0 0.7338 0.6872 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 67.0; ^: 2.0; 
<-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.090022266 [Trial 58 Step 509] cur state (3,3); rw -0.04 State utilities (0 means unknown): 0.8399 0.8946 0.9469 1.0 0.7886 0.9036 -1.0 0.7340 0.6874 0.6951 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 60.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- Action: ->; rnd # 0.117801666 [Trial 58 Step 510] cur state (4,3); rw 1.0 State utilities (0 means unknown): 0.8402 0.8949 0.9472 1.0 0.7888 0.9039 -1.0 0.7343 0.6876 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0 s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0 s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0 s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0 s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0 Best poicy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <- [Trial 59 Step 511] cur state (1,1); rw -0.04 State utilities (0 means unknown): 0.8402 0.8949 0.9472 1.0 0.7888 0.9039 -1.0 0.7343 0.6876 0.6954 -0.022 Frequency of state-action pairs (not shown if 0): s = (1,1) ->: 15.0; ^: 61.0; <-: 2.0; v: 2.0 s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0 s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0 s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0 s = (2,3) ->: 68.0; 
^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.92336786

[Trial 59 Step 512] cur state (1,1); rw -0.04
State utilities (0 means unknown): 0.8402 0.8949 0.9472 1.0 0.7888 0.9039 -1.0 0.7335 0.6868 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 62.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0
s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.31753153

[Trial 59 Step 513] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8402 0.8949 0.9472 1.0 0.7888 0.9039 -1.0 0.7338 0.6871 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 68.0; <-: 2.0; v: 2.0
s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.8160428

[Trial 59 Step 514] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8401 0.8949 0.9472 1.0 0.7880 0.9039 -1.0 0.7330 0.6863 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 69.0; <-: 2.0; v: 2.0
s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0;
<-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.46131545

[Trial 59 Step 515] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8401 0.8949 0.9472 1.0 0.7883 0.9039 -1.0 0.7332 0.6865 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 70.0; <-: 2.0; v: 2.0
s = (1,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.8717455

[Trial 59 Step 516] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8384 0.8949 0.9472 1.0 0.7865 0.9039 -1.0 0.7314 0.6848 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 70.0; <-: 2.0; v: 2.0
s = (1,3) ->: 67.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.08101809

[Trial 59 Step 517] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8384 0.8949 0.9472 1.0 0.7867 0.9039 -1.0 0.7317 0.6850 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s = (1,3) ->: 67.0; ^: 2.0; <-: 2.0;
v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.71392465

[Trial 59 Step 518] cur state (2,3); rw -0.04
State utilities (0 means unknown): 0.8387 0.8949 0.9472 1.0 0.7871 0.9039 -1.0 0.7320 0.6853 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s = (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.30175006

[Trial 59 Step 519] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8389 0.8951 0.9472 1.0 0.7873 0.9039 -1.0 0.7322 0.6855 0.6954 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s = (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 61.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.3604713

[Trial 59 Step 520] cur state (4,3); rw 1.0
State utilities (0 means unknown): 0.8392 0.8954 0.9474 1.0 0.7875 0.9041 -1.0 0.7325 0.6858 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s
= (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-

[Trial 60 Step 521] cur state (1,1); rw -0.04
State utilities (0 means unknown): 0.8392 0.8954 0.9474 1.0 0.7875 0.9041 -1.0 0.7325 0.6858 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 63.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s = (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.7192468

[Trial 60 Step 522] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8392 0.8954 0.9474 1.0 0.7875 0.9041 -1.0 0.7328 0.6861 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 71.0; <-: 2.0; v: 2.0
s = (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.4697585

[Trial 60 Step 523] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8392 0.8954 0.9474 1.0 0.7878 0.9041 -1.0 0.7330 0.6863 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 72.0; <-: 2.0; v:
2.0
s = (1,3) ->: 68.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.2702275

[Trial 60 Step 524] cur state (2,3); rw -0.04
State utilities (0 means unknown): 0.8395 0.8954 0.9474 1.0 0.7881 0.9041 -1.0 0.7333 0.6866 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 72.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.36815637

[Trial 60 Step 525] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8397 0.8956 0.9474 1.0 0.7883 0.9041 -1.0 0.7335 0.6868 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 72.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 62.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.13399202

[Trial 60 Step 526] cur state (4,3); rw 1.0
State utilities (0 means unknown): 0.8399 0.8958 0.9477 1.0 0.7885 0.9043 -1.0 0.7337 0.6871 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s =
(1,2) ->: 2.0; ^: 72.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-

[Trial 61 Step 527] cur state (1,1); rw -0.04
State utilities (0 means unknown): 0.8399 0.8958 0.9477 1.0 0.7885 0.9043 -1.0 0.7337 0.6871 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 64.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 72.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.3788138

[Trial 61 Step 528] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8399 0.8958 0.9477 1.0 0.7885 0.9043 -1.0 0.7340 0.6874 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 72.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.10004538

[Trial 61 Step 529] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8400 0.8958 0.9477 1.0 0.7887 0.9043 -1.0 0.7342 0.6876 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v:
2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 69.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.068799615

[Trial 61 Step 530] cur state (2,3); rw -0.04
State utilities (0 means unknown): 0.8402 0.8958 0.9477 1.0 0.7890 0.9043 -1.0 0.7345 0.6879 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.567874

[Trial 61 Step 531] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8405 0.8960 0.9477 1.0 0.7892 0.9043 -1.0 0.7347 0.6881 0.6958 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 63.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.9269457

[Trial 61 Step 532] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8397 0.8953 0.9469 1.0 0.7885 0.9036 -1.0 0.7340 0.6874 0.6951 -0.022
Frequency of state-action pairs (not shown if 0):
s =
(1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 64.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.06548303

[Trial 61 Step 533] cur state (4,3); rw 1.0
State utilities (0 means unknown): 0.8400 0.8955 0.9472 1.0 0.7887 0.9038 -1.0 0.7342 0.6876 0.6953 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-

[Trial 62 Step 534] cur state (1,1); rw -0.04
State utilities (0 means unknown): 0.8400 0.8955 0.9472 1.0 0.7887 0.9038 -1.0 0.7342 0.6876 0.6953 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 65.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.17793757

[Trial 62 Step 535] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8400 0.8955 0.9472 1.0 0.7887 0.9038 -1.0 0.7345 0.6879 0.6953 -0.022
Frequency of state-action pairs (not shown if
0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 73.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.73915154

[Trial 62 Step 536] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8400 0.8955 0.9472 1.0 0.7889 0.9038 -1.0 0.7347 0.6881 0.6953 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 70.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.41155386

[Trial 62 Step 537] cur state (2,3); rw -0.04
State utilities (0 means unknown): 0.8403 0.8955 0.9472 1.0 0.7892 0.9038 -1.0 0.7350 0.6883 0.6953 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.46690035

[Trial 62 Step 538] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8405 0.8957 0.9472 1.0 0.7894 0.9038 -1.0 0.7352 0.6885 0.6953 -0.022
Frequency
of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 65.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.5470047

[Trial 62 Step 539] cur state (4,3); rw 1.0
State utilities (0 means unknown): 0.8407 0.8960 0.9474 1.0 0.7897 0.9041 -1.0 0.7354 0.6888 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-

[Trial 63 Step 540] cur state (1,1); rw -0.04
State utilities (0 means unknown): 0.8407 0.8960 0.9474 1.0 0.7897 0.9041 -1.0 0.7354 0.6888 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 66.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.31636113

[Trial 63 Step 541] cur state (1,2); rw -0.04
State utilities (0 means unknown): 0.8407 0.8960 0.9474 1.0 0.7897 0.9041 -1.0 0.7357 0.6890 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 74.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.636491

[Trial 63 Step 542] cur state (1,3); rw -0.04
State utilities (0 means unknown): 0.8407 0.8960 0.9474 1.0 0.7899 0.9041 -1.0 0.7359 0.6892 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 75.0; <-: 2.0; v: 2.0
s = (1,3) ->: 71.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.528228

[Trial 63 Step 543] cur state (2,3); rw -0.04
State utilities (0 means unknown): 0.8410 0.8960 0.9474 1.0 0.7901 0.9041 -1.0 0.7362 0.6895 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 75.0; <-: 2.0; v: 2.0
s = (1,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.17996132

[Trial 63 Step 544] cur state (3,3); rw -0.04
State utilities (0 means unknown): 0.8412 0.8962 0.9474 1.0 0.7903 0.9041
-1.0 0.7364 0.6897 0.6956 -0.022
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 75.0; <-: 2.0; v: 2.0
s = (1,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 73.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 66.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ->; rnd # 0.88314533

[Trial 63 Step 545] cur state (3,2); rw -0.04
State utilities (0 means unknown): 0.8397 0.8947 0.9459 1.0 0.7889 0.9026 -1.0 0.7349 0.6883 0.6942 -0.023
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 75.0; <-: 2.0; v: 2.0
s = (1,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 73.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 13.0; <-: 3.0; v: 6.0
s = (3,3) ->: 67.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- ^ <-
Action: ^; rnd # 0.043481648

[Trial 63 Step 546] cur state (4,2); rw -1.0
State utilities (0 means unknown): 0.8187 0.8736 0.9248 1.0 0.7682 0.7337 -1.0 0.7156 0.6711 0.6341 -0.056
Frequency of state-action pairs (not shown if 0):
s = (1,1) ->: 15.0; ^: 67.0; <-: 2.0; v: 2.0
s = (1,2) ->: 2.0; ^: 75.0; <-: 2.0; v: 2.0
s = (1,3) ->: 72.0; ^: 2.0; <-: 2.0; v: 2.0
s = (2,1) ->: 11.0; ^: 2.0; <-: 7.0; v: 2.0
s = (2,3) ->: 73.0; ^: 2.0; <-: 2.0; v: 2.0
s = (3,1) ->: 9.0; ^: 11.0; <-: 2.0; v: 2.0
s = (3,2) ->: 2.0; ^: 14.0; <-: 3.0; v: 6.0
s = (3,3) ->: 67.0; ^: 2.0; <-: 2.0; v: 2.0
s = (4,1) ->: 7.0; ^: 2.0; <-: 5.0; v: 2.0
Best policy so far: -> -> -> 1.0 ^ ^ -1.0 ^ <- <- <-