[MUD-Dev] request for comments (was: Mud-Dev FAQ)

coder at ibm.net coder at ibm.net
Mon Jan 5 21:33:36 CET 1998


On 05/01/98 at 08:32 AM, Vadim Tkachenko <vadimt at 4cs.com> said: >Jon A.
Lambert wrote:
>> 
>>  Methods of handling coordinate spaces: neighborhoods, tree forms
>>  R-Trees, R*-Trees, 3d arrays, Quad/Oct trees

>What are R-Trees and R*-Trees?

--<cut>--

>From http://www.cs.cuhk.hk/~drsam/methods.html:
- --<cut>--
What is R-Tree

A R-Tree, proposed by Antonin Guttman[1], is an index structure for point
and spatial data at the same time. Insert, delete and search can be
intermixed without periodic reorganization. It uses a tuple to represent a
spatial data in the database. In order to retrieve the data, each tuple
has a unique identifier, tuple-identifier. At the leaf node of a R-Tree,
it has index record that can reference the spatial data. The index record
is (I, tuple-identifier). I is an n-dimensional rectangle and it is the
bounding rectangle of the spatial data indexed. This rectangle is also
known as minimal bounding rectangle, MBR. and each entry in
tuple-identifier is the upper and lower bounds, [upper, lower], of the
rectangle along the
dimension. Non-leaf nodes contain entries (I, childnode-pointer) where I
is the minimal rectangle bounding all the rectangles in the lower nodes'
entries. Childnode-pointer is the pointer to a lower node in the R-Tree.
Let M and m<=M/2 be the maximum and minimum number of entries can be
filled into one node respectively.

Properties of R-Tree

A R-Tree satisfies the following properties:

    A R-Tree is a height balance tree and all leaves are on the same
    level.

    Root node has at least two children unless it is the leaf node.
    Every non-leaf node contains between m and M entries unless it is
    the root.

    For each entries (I, childnode-pointer) in a non-leaf node, I is
    the smallest rectangle that spatially contains all rectangles in
    its child nodes.

    Every leaf node contains between m and M index records unless it
    is the root.

    For each index record (I, tuple-identifier) in a leaf node, I is
    the smallest rectangle that spatially contains the n-dimensional
    data object represented by the indicated tuple.

R*-Tree

The R-tree is based on a heuristic optimization. The optimization
criterion is to minimize the area of each enclosing rectangle in the inner
nodes. R*-Tree which incorporates a combined optimization of area, margin
and overlap of each bounding rectangle in the inner nodes was proposed in
[6]. For slightly higher implementation cost, it outperforms the existing
R-Tree variants.

    Minimizing the area covered by a bounding rectangle should
    minimize the dead space. This will improve performance since
    decisions which paths have to be traversed, can be taken on higher
    level

    Minimizing the overlap between bounding rectangles decreases the
    number of paths to be traversed.

    Minimizing the margin of a bounding rectangle will make the
    rectangle more quadratic. It is because for fixed area, the object
    with the smallest margin is the square. Quadratic rectangles can
    be packed easily and thus building a smaller rectangle.

VP-Tree

Conventional spatial index structures divide the multi-dimensional vector
space into partitions which have approximately the same number of data
points as each other. It facilitates in finding the nearest neighbor of a
given query point because it is only necessary to touch a small number of
partitions. Most partitioning methods are based on absolute coordinate
values of the vector space. R-Tree and R*-Tree described before use this
type of partitioning method. The structures partitioned in this way are
useful for queries based on absolute coordinates, like range queries.
However, in general, it does not maintain any distance information, such
as distance between points within a partition and the partition's
boundaries. Since this information is critical in pruning the search space
for
nearest-neighbor search, index structures using partitioning methods based
on absolute coordinate are thus not so useful for
multi-dimensional nearest-neighbor search.

Nearest-neighbor search by definition is to find out one point with
minimum point-to-point distance from a given query point, so it is natural
to use partitioning method based on relative distance rather than absolute
coordinate values. Vantage-Point tree, or VP-Tree, method was proposed by
Peter N.Yianilos. It uses the partitioning method based on relative
distance and aims for handling multi-dimensional nearest neighbor search.

As mentioned before, VP-Tree method bases the partitioning on the relative
distances among the data points, rather than their absolute coordinate
values. It also bases on a particular vantage
point. Actually, vantage point is nothing special but a point selected
from a vector space, or a set of data points. However, the choice of
vantage point plays an important role in the performance of indexing
algorithm.

- --<cut>--

And for QuadTrees:

- --<cut>--

QUADTREE IMAGES by Bob Glickstein

A ``quadtree'' is a means of encoding an image as a tree structure. Each
node of the tree has up to four children.  The root node represents the
entire image; its children represent the four quadrants of the entire
image; their children represent the sixteen subquadrants; the children of
those represent the sixty-four sub-subquadrants, and so on.

A leaf node corresponds to a single pixel, and is marked with the color of
that pixel (in this implementation, black or white only).  If a non-leaf
node has two or more children of the same color, then that color is stored
in the parent and the children are deleted.  Thus, if an entire quadrant
(subquadrant, sub-subquadrant, etc.) of the image is white, that
information can be stored in a single quadtree node, and all of the
children of that node can be removed.

For certain images, this encoding yields enormous savings in storage size.
 Such images are typically line drawings or other bitmaps with several
areas of solid black or white.  Images with a lot of dithering or
stippling, such as scanned images, tend to yield little or no savings in
space.

An amusing aspect of quadtrees is that they can be displayed according to
a depth-first or a breadth-first traversal of the tree.  In a depth-first
traversal, first the prodemonant color of the entire image is displayed;
then the predominant color of the first quadrant is displayed; then the
predominant color of the first subquadrant of the first quadrant, and so
on.  The user can watch the quadrants and subquadrants being drawn.  A
breadth-first traversal, however, is much more interesting.  Since it
displays first the predominant color of the entire image, followed by the
predominant colors of the four major quadrants, followed by the
predominant colors or the sixteen subquadrants, and so on, the effect is
one of a gradually resolving image with finer and finer detail at each
step.

- --<cut>--
--<cut>--

The recent neighborhoods thread is also probably worth reviewing as
another model.  I'd also suggest spending working thru the Stony Brook
site: 

  http://www.cs.sunysb.edu/~algorith/

It is a gold mine.

--
J C Lawrence                               Internet: claw at null.net
----------(*)                              Internet: coder at ibm.net
...Honourary Member of Clan McFud -- Teamer's Avenging Monolith...




More information about the mud-dev-archive mailing list