All is learned: every aspect of the blue dot's control is self-governed. Autonomous navigation is made possible by experience acquired through off-policy learning. Off-policy learning also allows learning from demonstration, which could further improve performance. Observe the improvement by comparing initial and final proficiency in each video!
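As a minimal sketch of what off-policy value learning looks like (the names and tabular form below are illustrative assumptions, not the neoRL agent's actual update), a Q-learning step bootstraps toward the greedy value regardless of which policy produced the action, so transitions from a demonstrator can be learned from directly:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy TD update: the target uses max_a' Q(s', a'),
    so the transition (s, a, r, s') may come from any behavior
    policy, including a human demonstrator."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```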
Note that the WaterWorld environment involves temporal dynamics due to inertial mechanics, best described as "driving on ice". Results from the WaterWorld environment are therefore directly applicable to maritime autonomy.
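For intuition, here is a minimal sketch of such inertial dynamics; the state variables and constants are assumptions for illustration, not taken from the actual WaterWorld implementation. Actions apply thrust, while velocity persists between steps:

```python
def inertial_step(pos, vel, thrust, dt=0.1, drag=0.02):
    """Velocity integrates applied thrust and carries over across
    steps (only weak drag), so the agent slides rather than stops:
    "driving on ice"."""
    vel = (1.0 - drag) * vel + thrust * dt
    pos = pos + vel * dt
    return pos, vel
```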
A video of recursive desire structures: full category II autonomy. The first layer learns Q-values over the place-cell representation. When actions have a Euclidean significance, state-action values can be interpreted as a desire vector, and desire vectors can in turn establish desires for deeper neoRL nodes. The agent design is illustrated in the figure on the left of this page.
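A hedged sketch of how state-action values might be read out as a desire vector when actions have Euclidean significance; the eight compass-direction actions and the value-weighted readout below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Assumed action set: eight compass directions, each a unit vector.
ACTION_DIRS = np.array([
    [np.cos(th), np.sin(th)]
    for th in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
])

def desire_vector(q_values):
    """Combine Q-values over Euclidean actions into one 2-D desire
    vector: the value-weighted mean of the action directions."""
    return ACTION_DIRS.T @ q_values / (np.sum(np.abs(q_values)) + 1e-8)
```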
(Objective: catch the green dots and avoid the red ones until no green are left, upon which the board is reset. All behavior is learned: autonomous navigation.)
With more Elements-of-Interest (EoI), the navigation challenge becomes harder; here it is mastered by a collaborative PC + OVC neoRL agent with 0th-order desires, where agent desire is governed directly by the location of each EoI in the corresponding NRES modality. With all EoI being algorithmically defined, this video demonstrates an agent with category I autonomy.
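As an illustrative sketch of a 0th-order desire signal (the function name and the signed unit-vector form are assumptions for illustration): the desire points directly from the agent toward each EoI, attractive for green and repulsive for red:

```python
import numpy as np

def zeroth_order_desire(agent_pos, eoi_positions, valence):
    """Desire governed directly by EoI location: the sum of unit
    vectors toward each element, signed by valence
    (+1 to catch green, -1 to avoid red)."""
    desire = np.zeros(2)
    for p, v in zip(eoi_positions, valence):
        d = np.asarray(p, dtype=float) - np.asarray(agent_pos, dtype=float)
        desire += v * d / (np.linalg.norm(d) + 1e-8)
    return desire
```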
For comparison, navigation by autonomous desires is also possible, i.e., full category II autonomy according to the illustration on the left. Eight elements of interest make this challenge difficult for a human pilot, requiring constant vigilance over longer periods of time. The neoRL agent navigates proficiently without tiring. See "Towards neoRL networks; the emergence of purposive graphs".