Definition of self-deception in the context of robot safety
Self-deception, in the context of robot safety (though the cognitive theory behind it also applies to humans), is defined here as a hypothetical phenomenon which has negative consequences and should be prevented from happening.
The essence of this phenomenon is that a robot acts contrary to some of its goals and permissions without knowing it, while incorrectly judging its behaviour as already quite good, or as good as the circumstances permit.
This hypothetical concept of self-deception also involves, as an essential part, the following details. The problem of self-deception arises when the robot's mind possesses the ability / processes of attention. The robot has thereby conjectured which things need to be paid attention to. It can reach such a conclusion because certain (here unspecified) attention-paying behaviour is instrumentally important and will be reinforced. The condition of self-deception sustains itself.
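The self-sustaining aspect can be sketched with a toy value-learning loop (all names and numbers here are invented for illustration, not part of the definition above): the agent keeps value estimates for "paying attention to signal X", and once an estimate turns negative the signal is no longer attended, so the estimate can never be corrected.

```python
# Hypothetical sketch: once attention to a signal is withdrawn, the (wrong)
# estimate that justified withdrawing it is never updated again, so the
# condition sustains itself.

SIGNALS = ["useful", "initially_unlucky"]
true_reward = {s: 1.0 for s in SIGNALS}   # attending either signal actually pays off
value = {s: 0.0 for s in SIGNALS}         # learned estimate of attention's payoff
alpha = 0.5                               # learning rate

def attend(signal, noise=0.0):
    """Reward obtained by attending, with optional one-off noise for the demo."""
    return true_reward[signal] + noise

# First trial: attending the second signal happens to yield a punishing outcome.
value["useful"] += alpha * (attend("useful") - value["useful"])
value["initially_unlucky"] += alpha * (attend("initially_unlucky", noise=-3.0)
                                       - value["initially_unlucky"])

# Greedy attention from now on: only signals with a positive estimate are attended.
for _ in range(100):
    for s in SIGNALS:
        if value[s] > 0:
            # Ignored signals never update, so their estimate stays frozen.
            value[s] += alpha * (attend(s) - value[s])

print(value)  # "initially_unlucky" stays at -1.0 although attending would pay off
```

The robot in this sketch would judge its attention policy "as good as the circumstances permit", since nothing in its remaining experience contradicts the frozen estimate.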
Simultaneously with concluding that some things are important and need to be paid attention to, the robot has conjectured which things could, or even should, be ignored, in relation to and even as a consequence of its goals and permissions. Some things will be ignored because:
a) Paying attention to these things wastes resources of attention / computational capacity; that is, it works against the instrumental usefulness of attention;
b) Paying attention to these things has so far neither produced nor sufficiently strengthened the reinforcement that attending in this way helps the robot achieve its goals or comply with its permissions better. (This point includes the issue that instrumental avoidance is a phenomenon which, in certain important models / configurations of cognition, does not last: it fades with time and especially as a result of applying the avoidance);
c) Paying attention to some things has been punished; that is, such use of attention, or attending to certain things, yields only or predominantly negative conclusions / “experiences” in the context of the robot’s goals and permissions.
(Of course, these three motives / reasons partially overlap, but how they complement one another is more important. They also have different potential countermeasures.)
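Because the three reasons call for different countermeasures, it may help to make the distinction mechanical. The sketch below is a hypothetical classifier (all field names, values, and thresholds are invented) that labels each ignored signal with the reason a)–c) above:

```python
# Hypothetical sketch: tagging each ignored signal with one of the three
# reasons for ignoring, checked in the order c), b), a).
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    attention_cost: float    # computational cost of attending (reason a)
    reinforcement: float     # past payoff from attending (reason b)
    punishment: float        # negative outcomes tied to attending (reason c)

def ignore_reason(sig, budget=1.0):
    """Return the reason this signal is ignored, or None if it is attended."""
    if sig.punishment > 0:
        return "c: attention to this signal was punished"
    if sig.reinforcement <= 0:
        return "b: attending was never reinforced"
    if sig.attention_cost > budget:
        return "a: attending exceeds the attention budget"
    return None  # attended

signals = [
    Signal("goal-relevant", 0.2, 1.0, 0.0),
    Signal("expensive", 2.0, 1.0, 0.0),
    Signal("unrewarded", 0.1, 0.0, 0.0),
    Signal("uncomfortable", 0.1, 1.0, 0.5),
]

for s in signals:
    print(s.name, "->", ignore_reason(s) or "attended")
```

A countermeasure would then target the matching cause: enlarging the budget for a), decoupling reinforcement of attention from task reward for b), and shielding attention itself from punishment for c).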
TODO: negative consequences, attention, reinforcement, punishment, punishment of attention, instrumentally important behaviour, achieving goals, complying with permissions, usefulness of attention, negative conclusions / “experiences”, quite good behaviour, behaviour being as good as the circumstances permit, countermeasures for self-deception, certain important models of cognition, fading of avoidance, avoidance application.