Heuristic algorithms.

In this lecture we present a survey of heuristic (approximate)
methods of discrete optimization: the greedy algorithm, beam search, and hill climbing.
The notion of a neighborhood of problem solutions is also analyzed.

Let us recall the definition of a discrete optimization problem:

Let *X* be an arbitrary finite set (the state space), and let *f: X -> R*
be a function on *X* (the goal function). Then by an optimization problem we mean the problem of finding an element *x_0* in *X* such that

*f(x_0) = max f(x)* over all *x* in *X* (a maximization problem), or
*f(x_0) = min f(x)* over all *x* in *X* (a minimization problem).

Every discrete and finite optimization problem may be solved by an
enumerative search through all possible solutions (all elements of the state space *X*).
However, for many real-sized problems this is practically impossible. Thus, one
needs approximate algorithms which give a satisfactory result in a
short time. Methods of this kind we will call heuristics (heuristic
algorithms).
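For a tiny instance, the enumerative approach is easy to write down. A minimal sketch in Python; the state space and goal function here are purely illustrative:

```python
from itertools import product

# Illustrative state space X: all bit strings of length 10,
# encoded as tuples of 0/1 (1024 elements in total).
X = list(product([0, 1], repeat=10))

def f(x):
    """Illustrative goal function: reward alternating bits."""
    return sum(1 for a, b in zip(x, x[1:]) if a != b)

# Enumerative search: check every element of the state space.
x_best = max(X, key=f)
```

For 10 bits this is instant, but the state space doubles with every extra bit, which is exactly why heuristics are needed for real-sized problems.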

We will start with some exemplary optimization problems. They all
have simple formulations, high practical importance, and no known fast and exact
solutions.

**Example**: Traveling salesman problem (TSP).

Let *G* be a given graph whose edges are labeled with lengths.
Find the shortest path going exactly once through all vertices of *G* (if
one exists).

State space: all possible paths going exactly once through all
vertices. Size of the state space: at most *n!*, where *n* is the
number of vertices.

Goal function: total path length. Our aim is to minimize the goal
function.

The importance of good solutions of this problem is obvious for
transport companies. Some production optimizations may also be based on the TSP
(e.g. designing the route of a welding robot arm).

**Example**: Binary matrix covering problem.

Let a binary matrix *A = {a_{ik}}* be given. Find a
smallest subset *B* of its columns such that every row of *A* has a 1 in at
least one column of *B* (i.e. the columns of *B* cover all rows).

State space: all possible column coverings of *A*. Size of the
state space: less than *2^n*, where *n* is the number of columns.

Goal function: the size of *B*.

This problem arises when we have a set of items with various
features and we have to select a smallest subset of items having all needed
features. E.g. our company may sign contracts with many suppliers
offering various goods, and we want to sign the smallest possible number of
contracts covering all our needs.

**Example**: Knapsack problem.

Suppose we have *n* items, the *i*-th having weight *m_i*
and value *c_i*, and a knapsack with a given weight limit. Find a subset of
items with the maximal total value whose total weight does not exceed the limit.

State space: all possible subsets of items. State space size: *2 ^{n}*.

Goal function: total value of items in knapsack (assuming the
weight limit is not violated).

**General principle**: the greedy algorithm builds the solution
gradually, at each step trying to achieve the maximal (local) benefit.

**Example**: Courses selection problem.

We have a schedule of *n* courses, each starting and finishing at a
given time. Find a maximal subset of non-intersecting courses.

Greedy algorithm: at each step select the course which finishes
earliest (excluding those which intersect with the previously selected ones). In
this case the maximal benefit is the maximal amount of time left.

Greedy algorithm gives optimal solution for this problem.
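The greedy rule can be sketched as follows; the course times are an illustrative example:

```python
def select_courses(courses):
    """Greedy course selection: repeatedly take the course that
    finishes earliest among those not intersecting the chosen ones."""
    selected = []
    last_finish = float("-inf")
    # Sorting by finish time makes "take the earliest-finishing course"
    # a single left-to-right scan.
    for start, finish in sorted(courses, key=lambda c: c[1]):
        if start >= last_finish:   # does not intersect the previous choice
            selected.append((start, finish))
            last_finish = finish
    return selected

# Illustrative schedule: (start, finish) pairs.
schedule = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
```

For this schedule the algorithm selects the three courses (1, 4), (5, 7), (8, 11), which is indeed a maximal non-intersecting subset.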

**Example**: Traveling salesman problem.

On each step go to the nearest unvisited vertex.
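The nearest-neighbor rule can be sketched as follows; the distance matrix is an illustrative example:

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy TSP heuristic: at each step go to the nearest unvisited city."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        # Greedy choice: the closest city we have not visited yet.
        nxt = min(unvisited, key=lambda city: dist[current][city])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Illustrative symmetric distance matrix for 4 cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
```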

**Example**: Matrix covering problem.

At each step add to the solution the column which covers the most
non-covered rows.
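This greedy covering rule can be sketched as follows; the binary matrix is an illustrative example (assuming every row can be covered by some column):

```python
def greedy_cover(matrix):
    """Greedy matrix covering: repeatedly pick the column that covers
    the most rows still uncovered."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    uncovered = set(range(n_rows))
    chosen = []
    while uncovered:
        # The column covering the largest number of uncovered rows.
        best = max(range(n_cols),
                   key=lambda j: sum(matrix[i][j] for i in uncovered))
        chosen.append(best)
        uncovered -= {i for i in range(n_rows) if matrix[i][best]}
    return chosen

# Illustrative binary matrix: rows are needs, columns are suppliers.
A = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
```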

**Example**: Knapsack problem.

At each step we add one item to the knapsack, such that the weight
limit is not violated and the item has the highest value per kilogram.
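The value-per-kilogram rule can be sketched as follows; the items and weight limit are an illustrative example:

```python
def greedy_knapsack(items, limit):
    """Greedy knapsack: take items in order of value per kilogram,
    skipping any item that would violate the weight limit."""
    chosen = []
    total_weight = 0
    # Highest value/weight ratio first.
    for weight, value in sorted(items, key=lambda it: it[1] / it[0],
                                reverse=True):
        if total_weight + weight <= limit:
            chosen.append((weight, value))
            total_weight += weight
    return chosen

# Illustrative items: (weight, value) pairs; the limit is 10 kg.
items = [(5, 10), (4, 40), (6, 30), (3, 20)]
```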

As we have seen, the greedy method is simple, intuitive, and
straightforward. On the other hand, it is in many cases suboptimal, as a greedy
choice (the locally best one) may lead to a globally worse solution. It is
similar to a beginner chess player analyzing the
game from the point of view of immediate benefits; such a player may easily be
trapped by a more experienced one.

The hill climbing algorithm does not build a solution step by step, as the greedy algorithm
does. Every step of the hill climbing algorithm involves a complete solution
(better or worse) of our optimization problem. Hill climbing means a gradual
improvement of a given solution of the problem.

**General principle**: start from a random
solution of the problem, search through its neighbors and find the one with the
highest value of the goal function ("climb the hill"); repeat until a
local optimum is achieved (i.e. none of the neighbors has a better value of *f*).

General scheme of the hill climbing algorithm:

```
x0 = Random(X)
do
    max = x0
    for (x in N(x0))
        if (f(x) > f(max)) max = x
    end for
    if (max == x0) break
    x0 = max
while (1)
```

The set *N(x)* above denotes the set of neighbors of a solution *x*.
Note that it is a neighborhood in the state space, i.e. a kind of similarity
relation on the complete solutions of our problem (the similarity criteria are
defined by us). E.g. in the traveling salesman problem we should not connect the
notion of neighborhood with the distance between vertices ("cities"),
because a vertex is not a complete solution of the TSP. In this case we should
define somehow a distance (and neighborhood) between full paths, e.g. by
regarding two paths as neighbors if they differ by a transposition of two
cities.
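Hill climbing for the TSP with this transposition neighborhood can be sketched as follows; the distance matrix is an illustrative example:

```python
import random

def tour_length(dist, tour):
    """Goal function: total length of the path (to be minimized)."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

def neighbors(tour):
    """Two tours are neighbors if they differ by a transposition
    of two cities."""
    for i in range(len(tour)):
        for j in range(i + 1, len(tour)):
            t = tour[:]
            t[i], t[j] = t[j], t[i]
            yield t

def hill_climb(dist):
    x0 = list(range(len(dist)))
    random.shuffle(x0)                   # start from a random solution
    while True:
        best = min(neighbors(x0), key=lambda t: tour_length(dist, t))
        if tour_length(dist, best) >= tour_length(dist, x0):
            return x0                    # local optimum reached
        x0 = best

# Illustrative symmetric distance matrix for 4 cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
```

Since minimization is wanted here, "climbing" means moving to the neighbor with the smallest tour length.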

The hill climbing method is easy to implement and fast. On the other
hand, it easily falls into local optima, i.e. it may find a solution which is
much worse than the optimal one but cannot be improved by a simple operation. The
final result also depends strongly on the starting point of the search.

General principle of tabu search: as in hill climbing, we
search neighbors of the current solution and go to the best neighbor, but we
exclude from our search a list of *k* recently visited solutions (e.g. *k*=1000).

The difference between tabu search and hill climbing is that
we allow "going down the hill", i.e. accepting a neighbor worse than
the current solution (provided it is the best available neighbor). We can do
this safely, since we will not get into a loop: we cannot return to a solution we
analyzed a while ago (it is on our tabu list). Therefore we will escape from
the local optimum and have a chance of finding better optima.
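A minimal sketch of tabu search, assuming a maximization problem; the goal function on the states 0..9 is an illustrative example with a local optimum at state 1 and the global optimum at state 7:

```python
from collections import deque

def tabu_search(x0, f, neighbors, k=1000, steps=100):
    """Tabu search: move to the best neighbor that is not among the
    k most recently visited solutions; maximizes f."""
    current = x0
    best = current
    tabu = deque([current], maxlen=k)     # the tabu list
    for _ in range(steps):
        candidates = [x for x in neighbors(current) if x not in tabu]
        if not candidates:
            break
        current = max(candidates, key=f)  # may go "down the hill"
        tabu.append(current)
        if f(current) > f(best):
            best = current
    return best

# Illustrative goal function: f(x) = values[x], neighbors are x-1, x+1.
values = [1, 3, 2, 1, 0, 2, 4, 6, 5, 4]
best = tabu_search(0,
                   f=lambda x: values[x],
                   neighbors=lambda x: [y for y in (x - 1, x + 1)
                                        if 0 <= y < len(values)],
                   k=1000, steps=50)
```

Starting from state 0, plain hill climbing would get stuck at state 1; tabu search walks down past the valley and reaches the global optimum at state 7.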

**General principle**: we build the final solution step by step (as in the greedy
algorithm), but we keep not just the one locally best continuation (as in the
greedy method), but *k* of them. On the next step we start from these
*k* partial solutions concurrently and find the next best *k*
continuations. The method may be regarded as a parallel greedy algorithm,
restricted to the best *k* paths at every step.

**Example**: Traveling salesman problem for 7 cities (k=3).

1. Start from a random city (vertex).
2. Find the k nearest cities.
3. Start from all of them in parallel and check the distance to all
   unvisited cities. Find k cities such that the sum of distances of the
   first two steps is minimal.
4. Repeat step 3, remembering the best k partial paths at each step.
5. On the last step, select the best of the paths.

Beam search may be regarded as an improvement of the greedy algorithm. In this
case we do not simply select the (locally) best path to the solution, but
analyze several other possibilities as well. On the other hand, the limited size of
the "search beam" keeps the algorithm time-efficient (we avoid
the exponential growth of the number of possible paths).
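The steps above can be sketched for the TSP as follows; the distance matrix is an illustrative example and *k* is the beam width:

```python
def beam_search_tsp(dist, k=3, start=0):
    """Beam search for TSP: keep only the k best partial paths
    at each step."""
    n = len(dist)

    def length(path):
        return sum(dist[a][b] for a, b in zip(path, path[1:]))

    beam = [[start]]
    for _ in range(n - 1):
        # All one-city continuations of every path in the beam ...
        extended = [path + [city]
                    for path in beam
                    for city in range(n) if city not in path]
        # ... restricted to the k best of them.
        beam = sorted(extended, key=length)[:k]
    return min(beam, key=length)

# Illustrative symmetric distance matrix for 4 cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
```

With k = 1 this degenerates into the greedy nearest-neighbor algorithm; larger k widens the search at a linear (not exponential) cost.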

To use hill climbing or tabu search (or several other methods
presented in the next lectures) we must define a notion of neighborhood on the
state space *X*. For each *x* (a potential solution of our problem,
i.e. an element of the state space) we define a finite set *N(x)* of
its neighbors. The set *N(x)* may be defined arbitrarily, although in
practice we should do it carefully and in connection with the optimization
problem. The principle of all methods similar to hill climbing is to perform
rather small, gradual changes to the solution, thus the definition of
neighborhood should reflect such small changes.

Moreover, the neighborhood of a solution should not be too
large, otherwise the algorithms (which check all neighbors in a loop) will be
too slow. On the other hand, the neighborhood relation should be connected,
i.e. it should be possible to get from any solution to any other solution
by walking from one neighbor to another. Otherwise our algorithms would not be
able to search the whole state space and might miss an optimal solution.

Example: *X* - a plane (discretized with a given grid step, e.g. 0.01).
As a neighbor of a point *x* we may regard any point which is not farther
than 0.01 away in any direction.

Example: vertex covering problem.

Given a graph *G*, find a smallest set of vertices *B* adjacent to
all edges of *G*.

State space *X* is the family of all (potential) coverings, i.e.
all subsets of the set of vertices. We may regard two solutions as neighbors
when they differ by just one vertex. *Note that we are talking about
whole solutions, not single vertices of G!*

The lecture was devoted to optimization techniques, which are based
on one of the following principles:

**Greedy principle**: build a solution
gradually, achieving as much as possible on each step.

**Neighborhood
search**: define a neighborhood relation (i.e. which
solutions we will regard as similar) and search through the state space by
"jumping" from one solution to another (the better one, if possible).

References:

- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein. *Introduction to Algorithms.* MIT Press/McGraw-Hill, 2001.

*See the text above for detailed definitions.*

**hill climbing** - algorithm
which improves a given solution by searching its neighbors

**greedy algorithm** - algorithm which builds
a solution gradually, with the best (local) components

**goal function** - a real-valued function
which we have to optimize

**tabu search** - a modification of the hill climbing
algorithm which allows escaping from local optima

**state space** - a set of all possible
solutions of the optimization problem

**beam search** - a modification of the greedy
algorithm which allows checking many paths in parallel

**neighbor** - two different solutions of
the optimization problem are regarded as neighbors if they meet some
(arbitrarily defined) similarity criteria

**optimization problem** - a task of finding
minimal or maximal value of a goal function on a discrete set (state space)

6.1. Find an example of a traveling salesman problem (an instance of
the distance graph) for which the greedy algorithm will fail to find an optimal
path.

6.2. Solve the knapsack problem with hill climbing method (i.e.
define a neighborhood and a goal function).