Page395
Bayesian Filtering
Bayesian filtering is named after Thomas Bayes, an English clergyman who devised several probability and statistical methods including “a simple mathematical formula used for calculating conditional probabilities” [36].
Bayesian filtering is commonly used to identify spam. Paul Gram described Bayesian filtering to identify spam in his paper “A Plan for Spam” (see www.paulgraham.com/spam.html). He described using a “corpus” of “spam” and “ham,” human-selected groups of spam and non-spam, respectively. He then used Bayesian filtering techniques to automatically assign a mathematical probability that certain “tokens” (words in the email) were indications of spam.
Genetic Algorithms and Programming
Genetic Algorithms and Programming fundamentally change the way software is developed: instead of being coded by a programmer, they evolve to solve a problem. Genetic Algorithms and Programming seek to replicate nature’s evolution, where animals evolve to solve problems. Genetic programming refers to creating entire software programs (usually in the form of Lisp source code); genetic algorithms refer to creating shorter pieces of code (represented as strings called chromosomes).
Both are automatically generated, and then “bred” through multiple generations to improve via Darwinian principles: “Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search. In every generation, a new set of artificial creatures (strings) is created using bits and pieces of the fittest of the old; an occasional new part is tried for good measure. While randomized, genetic algorithms are no simple random walk. They efficiently exploit historical information to speculate on new search points with expected improved performance” [37].
Genetic programming creates random programs and assigns them a task of solving a problem. The fitness function describes how well they perform their task. Crossover “breeds” two programs together (swaps their code). Mutation introduces random changes in some programs. John R. Koza described the process in “Genetic Programming: On the Programming of Computers by Means of Natural Selection”. The process is summarized here:
- “Generate an initial population of random computer programs.
- Execute each program in the population and assign it a fitness value according to how well it solves the problem.
- Create a new population of computer programs.
- Copy the best existing programs.
- Create new computer programs by mutation.
- Create new computer programs by crossover (sexual reproduction)” [38].
Genetic Algorithms and Genetic Programming have been used to program a Pac-Man playing program, robotic soccer teams, networked intrusion detection systems, and many others.