Question:
What does the "Use auto-learned data" option in the SpamAssassin settings do, and how will it affect my filtering?
Answer:
SpamAssassin's auto-learned data, also known as Bayesian learning or
simply "bayes", comes from a system-wide adaptive learning system. It
learns emerging characteristics of both spam and non-spam messages that
the pre-configured tests don't catch.

To determine which messages to learn from, it performs a second scoring
pass on all messages passed through SpamAssassin (excluding blacklists,
whitelists, and a few other settings that might be manipulated by
individuals). If this score is below a configured threshold, the message
is learned as non-spam; if it is above another threshold, it is learned
as spam. This learning step is performed regardless of whether you have
the "Use auto-learned data" flag enabled, and it does not affect the
final score.

When the "Use auto-learned data" flag is enabled, the following happens:
1. The scores assigned to the individual SpamAssassin tests are adjusted to account for the additional score that the bayes filtering may add. These adjusted scores can be seen in the SpamAssassin Test and Scoring Chart under the bayes column.
2. The SpamAssassin Bayesian classifier analyzes the message to compute
a probability that its contents are spam, based on previously learned
messages.
3. A score is chosen based on that calculated probability value and
added to the final score.
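
Steps 2 and 3 above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the bucket boundaries and score values below are invented for the example, the names merely follow SpamAssassin's BAYES_NN rule-naming convention, and the functions `bayes_score` and `final_score` are hypothetical helpers, not part of SpamAssassin. The score adjustment from step 1 is left out for brevity.

```python
# Hypothetical buckets: (upper probability bound, rule name, score).
# Boundaries and scores are illustrative assumptions, not FutureQuest's
# or SpamAssassin's actual configured values.
BUCKETS = [
    (0.05, "BAYES_00", -2.0),
    (0.40, "BAYES_20", -0.5),
    (0.60, "BAYES_50", 0.5),
    (0.95, "BAYES_80", 2.0),
    (1.00, "BAYES_99", 3.5),
]


def bayes_score(probability):
    """Map a spam probability (0.0 to 1.0) to a (rule name, score) pair."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    for upper, name, score in BUCKETS:
        if probability < upper:
            return name, score
    # A probability of exactly 1.0 falls into the last bucket.
    _, name, score = BUCKETS[-1]
    return name, score


def final_score(base_score, probability, use_auto_learned_data):
    """Add the bayes score to the base score only when the flag is enabled."""
    if not use_auto_learned_data:
        return base_score
    _, score = bayes_score(probability)
    return base_score + score
```

With the flag enabled, a message that the classifier considers very likely to be spam picks up extra points; with the flag disabled, the classifier's opinion is ignored and the base score is returned unchanged.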

All messages received within the FutureQuest network that pass through SpamAssassin are used for auto-learning, as are messages from blacklisted IP addresses, as compiled from Realtime Blackhole Lists.
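
The auto-learn decision described earlier (learn as non-spam below one threshold, learn as spam above another) can be sketched in Python as follows. The default values 0.1 and 12.0 mirror the stock defaults of SpamAssassin's `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings; the thresholds actually configured on the FutureQuest network are not stated here, so treat them as assumptions. The function name `auto_learn_decision` is hypothetical.

```python
def auto_learn_decision(learning_score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Decide how a message is auto-learned from its second-pass score.

    learning_score is the score from the separate learning pass, which
    excludes blacklists, whitelists, and similar user-adjustable settings.
    Returns "ham", "spam", or None when the message is not learned at all.
    """
    if learning_score < nonspam_threshold:
        return "ham"   # confidently non-spam: learn as non-spam
    if learning_score > spam_threshold:
        return "spam"  # confidently spam: learn as spam
    return None        # in between: too uncertain to learn from
```

Messages scoring between the two thresholds are skipped, which keeps borderline mail from polluting the learned data.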