So how does SpaceMonger's treemapping algorithm work? While
we can't give you the whole details in a short space, we
can give you an overview of it. The text below is somewhat
technical in nature, but any second-year computer-science student
should be able to follow it without too much difficulty.
The scanner and in-memory database
SpaceMonger maintains the disk's data as a tree of
ScanFile and ScanFolder objects in RAM. A
ScanFolder can be thought of as an array that stores
multiple ScanFiles. Each ScanFile object
contains information about a file, such as its name and size
and type, and a pointer to a possible deeper
ScanFolder.
ScanFolders, in addition to storing an array of
ScanFiles, also store a sum of the sizes of all the
data under them. During the scan, or when data changes, SpaceMonger
updates higher ScanFolder objects when their descendants get
added, changed, or removed to ensure that every ScanFolder
in the tree is always valid for the known data.
SpaceMonger maintains a "watch" on the disk (using the
ReadDirectoryChangesW API), updates the tree when the disk
changes, and notifies anything displaying parts of the disk
(the treemap, the pie chart, etc.) that they need to update
their displays accordingly.
The treemapper
SpaceMonger's treemapper is broken into three parts: The
layout engine, the renderer, and the widget. The layout engine
is responsible for determining where files should be positioned
in the treemap. The renderer is responsible for drawing files
and folders in the treemap. The widget is repsonsible for
accepting and processing user input, such as mouse clicks and
key presses.
(This design is vaguely model-view-controller-esque,
in that the model is the tree itself, the view is jointly
managed by all three parts of the treemapper, and the widget is
entirely responsible for the controller aspects of the design.)
These three components represent roughly 4,200 lines of code,
so again, we'll only give a broad overview to its design, mainly
focusing on the layout engine (the renderer and the widget have
a lot of details, but are conceptually fairly straightforward).
The layout engine is responsible for apportioning the space
in the display to files and folders. SpaceMonger does this
using a pair of recursive algorithms. Our tests determined that
a single good general-purpose algorithm beats special-purpose
algorithms, so SpaceMonger uses the same algorithms to divide
up space no matter how that space is structured.
The layout algorithm is embodied in a function named
LayoutFolder, which is given a folder from the tree and
the coordinates of the folder** in the
display. If the folder is the root of the tree, then the
coordinates are those of the entire display.
LayoutFolder starts by sorting the files within
the folder by size (it uses a counting sort and radix sort
to do this especially quickly, but any sorting algorithm would
work). It then takes the sorted files and passes them and the
folder's coordinates to a
function named DivideDisplayArea. DivideDisplayArea
does the majority of the hard layout work. If there's only one
file or less, the file obviously consumes all of the folder's
space. If there are two or more, though, its task is a bit more
complicated.
DivideDisplayArea takes all of the files it's given,
and, using a greedy algorithm, divides them into two lists, A and
B, whose total sizes are nearly equal***. Then
the available area is divided in half, with each sub-area proportional
to the total size of each list that must fit in it. This split is done horizontally
if the available area has a high aspect ratio, and vertically if
it has a low aspect ratio. The remaining partial lists and
partial areas are then fed back into DivideDisplayArea,
which will eventually recurse until each area only has one file
to place inside it.
When every file has been positioned, DivideDisplayArea
returns back to LayoutFolder, which then goes through
each of its file again and calls LayoutFolder on it if
the file is actually a folder. This, then, causes every file
to be fully positioned in the display.
Once a file is positioned for rendering, rendering it to
the display is a relatively simple phase that can be performed
after layout. Rendering is left as an exercise for the reader.
Pseudocode for a SpaceMonger-like treemapping engine
The algorithms described above can be approximated by the
following (rather lengthy) pseudocode. This pseudocode is
written in a vaguely C-ish style, but should be fairly
comprehensible to any programmer familiar with C, C++, Perl,
Java, Javascript, or PHP (and is likely to be at least
somewhat readable to fans of Pascal and Ada too).
function LayoutFolder(folder_data, folder_rectangle) {
sorted_array = new array(folder_data.num_children);
sort(folder_data.children, sorted_array);
DivideDisplayArea(sorted_array, folder_rectangle,
0, folder_data.num_children,
folder_data.total_size_of_descendants);
foreach (child in folder_data.children) {
if (child is a folder) {
area_for_children = child.rectangle;
LayoutChildren(child, area_for_children);
}
}
}
function DivideDisplayArea(children, area_rectangle,
first_child, num_children_to_layout, totalsize) {
if (num_children_to_layout <= 0) {
return;
}
if (num_children_to_layout == 1) {
children[first_child].rectangle = area_rectangle;
}
children_in_list_a = 0; size_a = 0;
children_in_list_b = 0; size_b = 0;
size_a = children[first_child].size;
children[first_child].list = LIST_A;
children_in_list_a++;
for (child = first_child + 1;
child < first_child + num_children_to_layout
&& (size_a + children[child].size) * 2 < totalsize
&& children[child].size > 0;
child++) {
size_a += children[child].size;
children[child].list = LIST_A;
children_in_list_a++;
}
for (; child < first_child + num_children_to_layout;
child++) {
size_b += children[child].size;
children[child].list = LIST_B;
children_in_list_b++;
}
temp_array_a = new array(children_in_list_a);
temp_array_b = new array(children_in_list_b);
dest_a = 0;
dest_b = 0;
for (src = first_child;
src < first_child + num_children_to_layout; src++) {
if (children[src].list == LIST_A)
temp_array_a[dest_a++] = children[src];
else temp_array_b[dest_b++] = children[src];
}
dest = first_child;
for (src = 0; src < dest_a; src++)
children[dest++] = temp_array_a[src++];
delete temp_array_a;
for (src = 0; src < dest_b; src++)
children[dest++] = temp_array_b[src++];
delete temp_array_b;
x = area_rectangle.x;
y = area_rectangle.y;
width = area_rectangle.width;
height = area_rectangle.height;
if (size_a + size_b <= 0) {
midpoint = 0;
orientation = HORIZONTAL;
}
else {
if (width >= height) {
midpoint = (size_a * width) / totalsize;
orientation = HORIZONTAL;
}
else {
midpoint = (size_a * height) / totalsize;
orientation = VERTICAL;
}
}
if (orientation == HORIZONTAL) {
DivideDisplayArea(children,
Rectangle(x, y, midpoint, height),
first_child, children_in_list_a,
size_a);
DivideDisplayArea(children,
Rectangle(x + midpoint, y, width - midpoint, height),
first_child + children_in_list_a, children_in_list_b,
size_b);
}
else {
DivideDisplayArea(children,
Rectangle(x, y, width, midpoint),
first_child, children_in_list_a,
size_a);
DivideDisplayArea(children,
Rectangle(x, y + midpoint, width, height - midpoint),
first_child + children_in_list_a, children_in_list_b,
size_b);
}
}
Conclusion
SpaceMonger uses many, many optimizations on these techniques
to ensure that the layout is performed very, very fast.
Some of those optimizations are Sixty-Five trade secrets, so we
can't release them. Others include avoiding copies in the
loops above by using pointers; caching size lookups in the
sort arrays so that sizes can be sampled from a variety of
sources (logical, physical, count, etc.); and splitting off
the free space and system space specially so that they are always
separate from the rest of the display. But even without all
these optimizations and tricks, the basic algorithm above
will produce a treemap layout that's very similar to
that of SpaceMonger's, even if it's not as fast or as flexible
as SpaceMonger's.
Footnotes
* Why are we releasing information about
how our treemap algorithm works? Well, we believe in the free
exchange of ideas, for one thing; and for another, we have benefited from
public programming documents ourselves. It would be hypocritical
then to build on knowledge we learned from the public and not return
at least some of our work to the public. You're welcome to use any of the
information you see above any way you see fit, if it benefits
you.
** LayoutFolder, in SpaceMonger, is
not actually given the coordinates of the folder because it
doesn't need them; instead, it's merely given the aspect
ratio of the folder. But most typical implementations would
probably want to include the coordinates of the folder instead.
*** It's impossible for DivideDisplayArea
to do this splitting perfectly. First, it's very unlikely that
the lists can be made to have equal sizes. And second, it's
been shown that finding the "most equally-sized sublists" is a
problem that's NP-complete. Thus SpaceMonger uses a greedy
algorithm that, while not always perfect, usually achieves very,
very good results.