More on Bresenhams run-length slice algoritm

  Journal:    Dr. Dobb's Journal  Nov 1992 v17 n11 p171(6)
  -------------------------------------------------------------------------
  Title:     The good, the bad, and the run-sliced. (Bresenham's run-length
             slice algorithm) (Graphics Programming) (Column)
  Author:    Abrash, Michael
  Attached:  Program: GP-NOV92.ASC  Source code listing.
             Program: XSHARP21.ZIP  Library for 3-D animation.

  Abstract:  Programmers developing graphics applications should never
             believe that they have developed the fastest possible code;
             others often have better ideas.  Performance-intensive
             applications should never have code that performs the same
             calculation more than once. Programmers have a duty to
             implement the desired functions while enabling the hardware to
             work as efficiently as possible.  Bresenham's run-length slice
             algorithm is a good model for efficient programming.  The
             approach implemented by the algorithm steps one pixel at a
             time along the major axis, while an integer term indicating
             how close the line is to advancing halfway to the next major
             pixel along the minor axis is maintained.  The algorithm
             provides an automatic decision-making structure that allows
             infrequent decisions to be made as quickly as possible.
  -------------------------------------------------------------------------
  Full Text:

  Years ago, I worked at a company that asked me to write blazingly fast
  line-drawing code for an AutoCAD driver.  I implemented the basic
  Bresenham's line-drawing algorithm; streamlined it as much as possible;
  special-cased horizontal, diagonal, and vertical lines; broke out
  separate, optimized routines for lines in each octant; and massively
  unrolled the loops.  When I was done, I had line drawing down to a mere
  five or six instructions per pixel, and I handed the code over to the
  AutoCAD driver person, content in the knowledge that I had pushed the
  theoretical limits of the Bresenham's algorithm on the 80x86
  architecture, and that this was as fast as line drawing could get on a
  PC.  That feeling lasted for about a week, until Dave Miller, who these
  days is a Windows display-driver whiz at Engenious Solutions, casually
  mentioned Bresenham's faster run-length slice line-drawing algorithm.

  Remember Bill Murray's safety tip in Ghostbusters?  It goes something
  like this.  Harold Ramis tells the Ghostbusters not to cross the beams of
  the antighost guns.  "Why?"  Murray asks.

  "It would be bad," Ramis says.

  Murray says, "I'm fuzzy on the whole good/bad thing.  What exactly do you
  mean by 'bad'?"  It turns out that what Ramis means by bad is basically
  the destruction of the universe.

  "Important safety tip," Murray comments dryly.

  I learned two important safety tips from my line-drawing experience;
  neither involves the possible destruction of the universe, so far as I
  know, but they are nonetheless worth keeping in mind.  First, never,
  never, never think you've written the fastest possible code.  Odds are,
  you haven't.  Run your code past another good programmer, and he or she
  will probably say, "But why don't you do this?" and you'll realize that
  you could indeed do that, and your code would then be faster.  Or relax
  and come back to your code later, and you may well see another, faster
  approach.  There are a million ways to implement code for any task, and
  you can almost always find a faster way if you need to.

  Second, when performance matters, never have your code perform the same
  calculation more than once.  This sounds obvious, but it's astonishing
  how often it's ignored.  For example, consider the snippet of code in
  Example 1.  Here, the programmer knows which way the line is going before
  the main loop begins--but nonetheless performs that test every time
  through the loop, when calculating the address of the next pixel.  Far
  better to perform the test only once, outside the loop, as shown in
  Example 2.

  Think of it this way.  A program is a state machine.  It takes a set of
  inputs and produces a corresponding set of outputs by passing through a
  set of states.  Your primary job as a programmer is to implement the
  desired state machine.  Your additional job as a performance programmer
  is to minimize the lengths of the paths through the state machine.  This
  means performing as many tests and calculations as possible outside the
  loops, so that the loops themselves can do as little work--pass through
  as few states--as possible.

  Which brings us full circle to Bresenham's run-length slice line-drawing
  algorithm, which just happens to be an excellent example of a minimized
  state machine.  In case you're fuzzy on the good/bad performance thing,
  that's "good"--as in fast.

  Run-length Slice Fundamentals

  First off, I have a confession to make: I'm not sure that the algorithm
  I'll discuss is actually, precisely Bresenham's run-length slice
  algorithm.  It's been a long time since I read about this algorithm; in
  the intervening years, I've misplaced Bresenham's article, and I was
  unable to locate it in time for this column.  (Vermont libraries leave
  something to be desired in the high-tech area.)  As a result, I had to
  derive the algorithm from scratch, which was admittedly more fun than
  reading about it, and also ensured that I understood it inside and out.
  The upshot is that what I discuss may or may not be Bresenham's
  run-length slice algorithm--but it surely is fast.

  The place to begin understanding the run-length slice algorithm is the
  standard Bresenham's line-drawing algorithm.  (I discussed the standard
  Bresenham's algorithm at length in the May 1989 issue of the now-defunct
  Programmer's Journal.)  The basis of the standard approach is stepping
  one pixel at a time along the major axis (the longer dimension of the
  line), while maintaining an integer error term that indicates at each
  major-axis step how close the line is to advancing halfway to the next
  pixel along the minor axis.  Figure 1 illustrates standard Bresenham's
  line drawing.  The key point here is that a calculation and a test are
  performed once for each step along the major axis.

  The run-length slice algorithm rotates matters 90 degrees, with
  salubrious results.  The basis of the run-length slice algorithm is
  stepping one pixel at a time along the minor axis (the shorter
  dimension), while maintaining an integer error term indicating how close
  the line is to advancing an extra pixel along the major axis, as
  illustrated by Figure 2.

  Consider this:  When you're called upon to draw a line with an
  X-dimension of 35 and a Y-dimension of 10, you have a great deal of
  information available, some of which is ignored by standard Bresenham's.
  In particular, because the slope is between 1/3 and 1/4, you know that
  every single run--a run being a set of pixels at the same minor-axis
  coordinate--must be either three or four pixels long.  No other length is
  possible, as shown in Figure 3 (apart from the first and last runs, which
  are special cases that I'll discuss shortly).  Therefore, for this line,
  there's no need to perform an error-term calculation and test for each
  pixel.  Instead, we can just perform one test per run, to see whether the
  run is three or four pixels long, thereby eliminating about 70 percent of
  the calculations in drawing this line.

  Take a moment to let the idea behind run-length slice drawing soak in.
  Periodic decisions must be made to control pixel placement.  The key to
  speed is to make those decisions as infrequently and quickly as possible.
   Of course, it will work to make a decision at each pixel--that's
  standard Bresenham's.  However, most of those per-pixel decisions are
  redundant, and in fact we have enough information before we begin to know
  which are the redundant decisions.  Run-length slice drawing is exactly
  equivalent to standard Bresenham's, but it pares the decision-making
  process down to a minimum.  It's some-what analogous to the difference
  between finding the greatest common divisor of two numbers using Euclid's
  algorithm and finding it by trying every possible divisor.  Both
  approaches produce the desired result, but that which takes maximum
  advantage of the available information and minimizes redundant work is
  preferable.

  Run-length Slice Implementation

  We know that for any line, a given run will always be one of two possible
  lengths.  How, though, do we know which length to select?  Surprisingly,
  this is easy to determine.  For the following discussion, assume that we
  have a slope of 1/3.5, so that X is the major axis; however, the
  discussion also applies to Y-major lines, with X and Y reversed.

  The minimum possible length for any run in an X-major line is
  int(XDelta/YDelta), where XDelta is the X-dimension of the line and
  YDelta is the Y-dimension.  The maximum possible length is
  int(XDelta/YDelta) + 1.  The trick, then, is knowing which of these two
  lengths to select for each run.  To see how we can make this selection,
  refer to Figure 4.  For each one-pixel step along the minor axis (Y, in
  this case), we advance at least three pixels.  The full advance distance
  along X (the major axis) is actually three-plus pixels, because there is
  also a fractional portion to the advance along X for a single-pixel Y
  step.  This fractional advance is the key to deciding when to add an
  extra pixel to a run.  The fraction indicates what portion of an extra
  pixel we advance along X (the major axis) during each run.  If we keep a
  running sum of the fractional parts, we have a measure of how close we
  are to needing an extra pixel; when the fractional sum reaches 1, it's
  time to add an extra pixel to the current run.  Then we can subtract 1
  from the running sum (because we just advanced one pixel), and continue
  on.

  Practically speaking, however, we can't work with fractions because
  floating-point arithmetic is slow and fixed-point arithmetic is
  imprecise.  Therefore, we take a cue from standard Bresenham's and scale
  all the error-term calculations up so that we can work with integers.
  The fractional X (major axis) advance per one-pixel Y (minor axis)
  advance is the fractional portion of XDelta/YDelta.  This value is
  exactly equivalent to (XDelta % YDelta)/YDelta.  We'll scale this up by
  multiplying it by YDelta*2, so that the amount by which we adjust the
  error term up for each one-pixel minor-axis advance is (XDelta %
  YDelta)*2.

  We'll similarly scale up the one pixel by which we adjust the error term
  down after it turns over, so our downward error-term adjustment is
  YDelta*2.  Therefore, before drawing each run, we'll add (XDelta %
  YDelta)*2 to the error term.  If the error term turns over (reaches one
  full pixel), then we'll lengthen the run by 1, and subtract YDelta*2 from
  the error term.  (All values are multiplied by 2 so that the initial
  error term, which involves a 0.5 term, can be scaled up to an integer, as
  discussed below.)

  This is not a complicated process, involving only integer addition and
  subtraction and a single test, and it lends itself to many and varied
  optimizations.  For example, you could break out hardwired optimizations
  for drawing each possible pair of run lengths.  For the aforementioned
  line with a slope of 1/3.5, for example, you could have one routine
  hardwired to blast in a run of three pixels as quickly as possible, and
  another hardwired to blast in a run of four pixels.  These routines would
  ideally have no looping, but rather just a series of instructions
  customized to draw the desired number of pixels at maximum speed.  Each
  routine would know that the only possibilities for the length of the next
  run would be three and four, so they could increment the error term, then
  jump directly to the appropriate one of the two routines depending on
  whether the error term turned over.  Properly implemented, it should be
  possible to reduce the average per-run overhead of line drawing to less
  than one branch, with only two additions and two tests (the number of
  runs must also be counted down), plus a subtraction half the time.  On a
  486, this amounts to something on the order of 150 nanoseconds of
  overhead per pixel, exclusive of the time required to actually write the
  pixel to display memory.

  That's good.

  Run-length Slice Details

  A couple of run-length slice implementation details yet remain.  First is
  the matter of how error-term turnover is detected.  This is done in much
  the same way as it is with standard Bresenham's: The error term is
  initialized to a value equivalent to -1 pixel and incremented for each
  step; when the error term reaches 0, we've advanced one full pixel along
  the major axis, and it's time to add an extra pixel to the current run.
  This means that we only have to test the sign of the error term after
  advancing it to determine whether or not to add an extra pixel to each
  run.

  The second and more difficult detail is balancing the runs so that
  they're centered around the ideal line, and therefore draw the same
  pixels that standard Bresenham's would draw.  If we just drew full-length
  runs from the start, we'd end up with an unbalanced line, as shown in
  Figure 5.  Instead, we have to split the initial pixel plus one full run
  as evenly as possible between the first and last runs of the line, and
  adjust the initial error term appropriately for the initial half-run.

  The initial error term is simply one-half of the normal fractional
  advance along the major axis, because the initial step is only one-half
  pixel along the minor axis.  This half-step gets us exactly halfway
  between the initial pixel and the next pixel along the minor axis.  All
  the error-term adjusts are scaled up by two times precisely so that we
  can scale up this halved error term for the initial run by two times, and
  thereby make it an integer.

  The other trick here is that if an odd number of pixels are allocated
  between the first and last partial runs, we'll end up with an odd pixel,
  since we are unable to draw a half-pixel.  This odd pixel is accounted
  for by adding half a pixel to the error term.

  That's all there is to run-length slice line drawing; the partial first
  and last runs are the only tricky part.  Listing One (page 190) is a
  run-length slice implementation in C.  This is not an optimized
  implementation, nor is it meant to be; this listing is provided so that
  you can see how the run-length slice algorithm works.  Next month, I'll
  move on to an optimized version, but this month's listing will make it
  much easier to grasp the principles of run-length slice drawing, and to
  understand next month's code.

  Notwithstanding that it's not optimized, Listing One is reasonably fast.
  If you run Listing Two (page 191), a sample line-drawing program that you
  can use to test-drive Listing One, you may be as surprised as I was at
  how quickly the screen fills with vectors, considering that Listing One
  is entirely in C and has some redundant divides.  Or perhaps you won't be
  surprised, in which case I suggest you check back next month.

  Next Time

  Next month, I'll switch to assembly language and speed up run-length
  slice lines considerably.  I'll also spend some time discussing the
  limitations of run-length slice drawing, and I'll look at possible
  further optimizations.  After that, perhaps we'll have a look at seed
  fills, or more 3-D animation, or some new 2-D animation topics--or maybe
  something completely different.  Your suggestions are, as always,
  welcome.

  [BACK] Back

Discuss this article in the forums


Date this article was posted to GameDev.net: 7/16/1999
(Note that this date does not necessarily correspond to the date the article was written)

See Also:
Michael Abrash's Articles

© 1999-2011 Gamedev.net. All rights reserved. Terms of Use Privacy Policy
Comments? Questions? Feedback? Click here!