### operator: q-learn

```
description:
q-learn[iterations, alpha, gamma, op] sp
apply Q-learning to the state transitions defined by op,
using learning rate alpha and discount factor gamma,
where sp is the set of final/goal states

examples:
-- data from here: http://mnemstudio.org/path-finding-q-learning-tutorial.htm
step |0> => |4>
step |1> => |3> + |5>
step |2> => |3>
step |3> => |1> + |2> + |4>
step |4> => |0> + |3> + |5>
step |5> => |1> + |4> + |5>

reward |0> => |0>
reward |1> => |0>
reward |2> => |0>
reward |3> => |0>
reward |4> => |0>
reward |5> => |100>

-- learn the Q and norm-Q values:
q-learn[1000, 1, 0.8, step] |5>

-- now display the results in a table:
table[transition, norm-Q] ket-sort rel-kets[norm-Q]

+------------+--------+
| transition | norm-Q |
+------------+--------+
| 0 -> 4     | 80     |
| 1 -> 3     | 64.0   |
| 1 -> 5     | 100    |
| 2 -> 3     | 64.0   |
| 3 -> 1     | 80     |
| 3 -> 2     | 51.2   |
| 3 -> 4     | 80     |
| 4 -> 0     | 64.0   |
| 4 -> 3     | 64.0   |
| 4 -> 5     | 100    |
| 5 -> 1     | 80     |
| 5 -> 4     | 80     |
| 5 -> 5     | 100.0  |
+------------+--------+

-- next example, data from here (see the cheese example):
-- https://medium.freecodecamp.org/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe

step |A> => |B> + |D>
step |B> => |A> + |E> + |C>
step |C> => |B> + |F>
step |D> => |A> + |E>
step |E> => |E>
step |F> => |E> + |C> + |F>

reward |A> => |0>
reward |B> => |1>
reward |C> => |0>
reward |D> => |2>
reward |E> => |-10>
reward |F> => |10>

-- q-learn[iterations, alpha, gamma, op] set-of-terminal-states:
|null> => q-learn[1000, 1, 0.8, step] (|E> + |F>)

-- now display the transition table:
table[transition, norm-Q] ket-sort rel-kets[norm-Q]

+------------+--------+
| transition | norm-Q |
+------------+--------+
| A -> B     | 66.0   |
| A -> D     | 46.24  |
| B -> A     | 52.8   |
| B -> C     | 80     |
| B -> E     | -20.0  |
| C -> B     | 66.0   |
| C -> F     | 100    |
| D -> A     | 52.8   |
| D -> E     | -20.0  |
| E -> E     | -20.0  |
| F -> C     | 80.0   |
| F -> E     | -20.0  |
| F -> F     | 100.0  |
+------------+--------+

-- define a walk operator in terms of q-walk,
-- then show the greedy walk (always follow the highest norm-Q transition) from each start state:
walk |*> #=> q-walk |_self>
table[start, walk] rel-kets[step]

+-------+-------------------+
| start | walk              |
+-------+-------------------+
| A     | A . B . C . F     |
| B     | B . C . F         |
| C     | C . F             |
| D     | D . A . B . C . F |
| E     | E                 |
| F     | F                 |
+-------+-------------------+
```
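For readers who want to check the first (maze) table outside the language, here is a small stand-alone Python sketch of the same idea. It is an assumption about how q-learn works internally (random-walk episodes, one action per reachable neighbour, episodes ending at a goal state), not a description of the actual implementation; the step/reward data and the parameters [1000, 1, 0.8] are taken from the first example above, and the normalized Q values it produces agree with that example's norm-Q table. A `greedy_walk` helper (a guess at what q-walk does) then follows the best-valued transition from each start state.

```python
import random

# state graph and per-state rewards from the mnemstudio maze example above
step = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
reward = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 100}
goals = {5}

def q_learn(iterations=1000, alpha=1.0, gamma=0.8, seed=0):
    rng = random.Random(seed)
    Q = {}  # Q[(s, s2)] -- one action per reachable neighbour state
    for _ in range(iterations):
        s = rng.choice(list(step))       # random start state each episode
        while True:
            s2 = rng.choice(step[s])     # random exploratory move
            if s2 in goals:
                target = float(reward[s2])   # terminal: no bootstrapping
            else:
                best_next = max(Q.get((s2, n), 0.0) for n in step[s2])
                target = reward[s2] + gamma * best_next
            Q[(s, s2)] = (1 - alpha) * Q.get((s, s2), 0.0) + alpha * target
            if s2 in goals:
                break                    # episode ends at a goal state
            s = s2
    top = max(Q.values())
    return {k: 100 * v / top for k, v in Q.items()}  # scale so max is 100

def greedy_walk(s, norm_Q, max_steps=10):
    # follow the highest-valued transition until a goal state is reached
    path = [s]
    while s not in goals and len(path) <= max_steps:
        s = max(step[s], key=lambda n: norm_Q.get((s, n), float("-inf")))
        path.append(s)
    return path

norm_Q = q_learn()
for (s, s2) in sorted(norm_Q):
    print(f"{s} -> {s2}  {norm_Q[(s, s2)]:g}")
print(greedy_walk(2, norm_Q))  # e.g. the walk 2 . 3 . 1 . 5
```

With these assumptions the transition values converge to the table above (0 -> 4: 80, 3 -> 2: 51.2, 4 -> 5: 100, and so on), since with alpha = 1 and deterministic rewards each update writes the exact Bellman backup.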