Implementing pinch-zoom and pan/drag in an Android view on the canvas

I was trying to get pinch-zoom and panning working on an Android view today. Basically I was trying to implement the same behavior you see when you use Google Maps (for example). You can zoom in and pan around until the edge of the image, but no further. Also, if the image is fully zoomed out, you can’t pan the image. Implementing the pinch-zoom functionality was pretty easy. I found an example on StackOverflow. I then wanted to implement panning (or dragging) as well. However, I wasn’t able to easily find examples and tutorials for this functionality. I started with this example that comes from the third edition of the Hello, Android! book but I didn’t get too far. So I started playing around a little bit with the events and started writing some code from scratch (using the example from Hello, Android!) so that I could have a better idea of what was happening.

As I mentioned before, getting zoom to work was pretty easy. Implementing panning/dragging was the hard part. The major issues I encountered and subsequently fixed were the following:

Panning continues indefinitely in all directions.
When you zoom and then pan, stop, and then start again, the view jerks to a new position instead of panning from the existing position.
Excessive panning towards the left and top can be constrained, but panning towards the right and bottom is not so easily constrained.

Once I fixed all the problems, I figured that it would be nice to document it for future reference, and I also think it would be a useful resource for others who have the same problem. Now a little disclaimer before I go any further: I’m not an Android expert and I’m really not that great with graphics; I just started learning to program for Android this semester for one of my Masters electives. So there might be a better way of doing all this, and if there is, please let me know! Also, if you want to skip all the explanations and just see the code, you can skip to the full listing at the end.

Let’s start with the simple stuff first, that is implementing pinch-zoom. To implement pinch-zoom, we make use of the ScaleGestureDetector class. This class helps you detect the pinch-zoom event. Using it is pretty simple:

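import android.content.Context;
import android.graphics.Canvas;
import android.view.MotionEvent;
import android.view.ScaleGestureDetector;
import android.view.View;

public class ZoomView extends View {

    private static final float MIN_ZOOM = 1f;
    private static final float MAX_ZOOM = 5f;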

    private float scaleFactor = 1.f;
    private ScaleGestureDetector detector;

    public ZoomView(Context context) {
        super(context);
        detector = new ScaleGestureDetector(getContext(), new ScaleListener());
    }

    @Override
    public boolean onTouchEvent(MotionEvent event) {
        detector.onTouchEvent(event);
        return true;
    }

    @Override
    public void onDraw(Canvas canvas) {
        super.onDraw(canvas);

        canvas.save();
        canvas.scale(scaleFactor, scaleFactor);

        // ...
        // your canvas-drawing code
        // ...

        canvas.restore();
    }

    private class ScaleListener extends ScaleGestureDetector.SimpleOnScaleGestureListener {
        @Override
        public boolean onScale(ScaleGestureDetector detector) {
            scaleFactor *= detector.getScaleFactor();
            scaleFactor = Math.max(MIN_ZOOM, Math.min(scaleFactor, MAX_ZOOM));
            invalidate();
            return true;
        }
    }
}

Your view class has four private members: MIN_ZOOM, MAX_ZOOM, detector, and scaleFactor. The first two are static constants that define the minimum and maximum zoom allowed. The third is of type ScaleGestureDetector and does all the heavy lifting as far as zooming is concerned. The fourth member holds the scaling factor, i.e., a number that represents the amount of "zoom".

Now what does this do?

In the constructor, you initialize the detector. The constructor to ScaleGestureDetector takes two parameters: the current context, and a listener. Our listener is defined inside the class ScaleListener, which extends the convenience class ScaleGestureDetector.SimpleOnScaleGestureListener. Inside the onScale method, we multiply our running scaleFactor by the scale factor reported by the detector, which is the change since the previous scale event. We then check whether the result falls outside our lower and upper bounds; if it does, we clamp it so that it stays between them. We then call invalidate(), which tells Android that the view needs to be redrawn.
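To make the clamping step concrete, here is a minimal plain-Java sketch of the arithmetic in onScale, with no Android classes involved. The class name ScaleClamp and the method applyScale are made up for this illustration; the bounds are the MIN_ZOOM and MAX_ZOOM values from the listing above.

```java
// Hypothetical helper, not part of the Android API: it mirrors the arithmetic in onScale().
final class ScaleClamp {
    static final float MIN_ZOOM = 1f;
    static final float MAX_ZOOM = 5f;

    // gestureFactor stands in for detector.getScaleFactor(), which reports the change
    // since the previous scale event rather than the absolute zoom level.
    static float applyScale(float currentScale, float gestureFactor) {
        float scaled = currentScale * gestureFactor;
        // Clamp the accumulated value into [MIN_ZOOM, MAX_ZOOM]
        return Math.max(MIN_ZOOM, Math.min(scaled, MAX_ZOOM));
    }
}
```

Starting from a scale of 4 and spreading the fingers by a factor of 2 would give 8, which the clamp pulls back to 5; pinching below 1 is pulled back up to 1 in the same way.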

The actual scaling happens inside the onDraw method. There, we save the canvas, set its scaling factor (which is the one we got from the detector), draw anything we need to draw, and then restore the canvas. What happens now is that anything you draw on the canvas is now scaled by the scaling factor. This is what gives you the "zoom" effect.

The code as it stands is not very useful. After you zoom into an object, you can’t really pan around to examine it. So we now need to implement panning. This part gets a little tricky. The reason is that we now have to keep track of two types of touch events: the drag event, and the zoom event. To do this, we have to start looking at the different events that can be generated when the user touches the screen:

@Override
public boolean onTouchEvent(MotionEvent event) {

    //This is the basic skeleton for our code. We examine each of the possible motion-events that can happen
    switch (event.getAction() & MotionEvent.ACTION_MASK) {
        case MotionEvent.ACTION_DOWN:
            //This event happens when the first finger is pressed onto the screen
            /*
             * ... code to handle this event ...
             */
            break;

        case MotionEvent.ACTION_MOVE:
            //This event fires when the finger moves across the screen, although in practice I've noticed that
            //this fires even when you're simply holding the finger on the screen.
            /*
             * ... code to handle this event ...
             */

            break;
        case MotionEvent.ACTION_POINTER_DOWN:
            //This event fires when a second finger is pressed onto the screen
            /*
             * ... code to handle this event ...
             */
            break;

        case MotionEvent.ACTION_UP:
            //This event fires when all fingers are off the screen
            break;

        case MotionEvent.ACTION_POINTER_UP:
            //This event fires when the second finger is off the screen, but the first finger is still on the
            //screen
            /*
             * ... code to handle this event ...
             */
            break;
    }

    detector.onTouchEvent(event);

    /*
     * ... code ...
     */

    return true;
}

Using these events, we can now decide if we need to pan or zoom. So let’s use a variable called mode to keep track of our mode:

public class ZoomView extends View {

    private static final int NONE = 0;
    private static final int DRAG = 1;
    private static final int ZOOM = 2;

    private int mode;

    ...
    ...

    @Override
    public boolean onTouchEvent(MotionEvent event) {

        switch (event.getAction() & MotionEvent.ACTION_MASK) {
            case MotionEvent.ACTION_DOWN:
                //The first finger has been pressed. The only action that the user can take now is to pan/drag so let's
                //set the mode to DRAG
                mode = DRAG;
                ...
                break;

            case MotionEvent.ACTION_MOVE:
                //We don't need to set the mode at this point because the mode is already set to DRAG
                ...
                break;
            case MotionEvent.ACTION_POINTER_DOWN:
                //The second finger has been placed on the screen and so we need to set the mode to ZOOM
                mode = ZOOM;
                break;

            case MotionEvent.ACTION_UP:
                //All fingers are off the screen and so we're neither dragging nor zooming.
                mode = NONE;
                ...
                break;

            case MotionEvent.ACTION_POINTER_UP:
                //The second finger is off the screen and so we're back to dragging.
                mode = DRAG;
                ...
                break;
        }

        detector.onTouchEvent(event);

        /*
         * ... code ...
         */

        return true;
    }
}
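The mode handling above is a small state machine, and it can help to see it on its own. The following is only a sketch in plain Java: TouchAction, TouchMode, and ModeTracker are hypothetical names that stand in for the MotionEvent constants and the mode field, so that the transitions can be run without an Android device.

```java
// Hypothetical stand-ins for the MotionEvent action constants, so the sketch runs without Android.
enum TouchAction { DOWN, MOVE, POINTER_DOWN, POINTER_UP, UP }

enum TouchMode { NONE, DRAG, ZOOM }

final class ModeTracker {
    private TouchMode mode = TouchMode.NONE;

    // Applies one touch action and returns the mode we are in afterwards
    TouchMode onAction(TouchAction action) {
        switch (action) {
            case DOWN:         mode = TouchMode.DRAG; break; // first finger down
            case POINTER_DOWN: mode = TouchMode.ZOOM; break; // second finger down
            case POINTER_UP:   mode = TouchMode.DRAG; break; // second finger lifted
            case UP:           mode = TouchMode.NONE; break; // all fingers lifted
            case MOVE:         break;                        // a move never changes the mode
        }
        return mode;
    }
}
```

Feeding it the sequence DOWN, POINTER_DOWN, POINTER_UP, UP walks through DRAG, ZOOM, DRAG, and NONE, which matches the comments in the listing.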

Now that we’ve identified the different states we are in, we can go on and start to implement the dragging/panning functionality. Panning essentially equates to a translation operation on the canvas. By translating the canvas, we control what part of the zoomed-in canvas we display on the device screen. So what information do we need to translate? Well, think about what we expect to see when we pan. We place our finger on the screen and then we move it by a certain amount in a certain direction. We essentially expect the image to move by that same amount, in that same direction as well. So the direction and magnitude that we need to translate the image by, can be gathered from the coordinate where we first pressed the screen, through each point until we take our finger off the screen. Getting all this information is pretty easy. We can get the X and Y coordinate of the finger using event.getX() and event.getY(). So all we need to do is get the X and Y coordinate when we first press our finger on the screen, and translate the image as we move our finger across the screen, by getting the finger’s X and Y coordinates each time the finger moves. We can do that like this:

public class ZoomView extends View {

    ...
    ...

    //These two variables keep track of the X and Y coordinate of the finger when it first
    //touches the screen
    private float startX = 0f;
    private float startY = 0f;

    //These two variables keep track of the amount we need to translate the canvas along the X
    //and the Y coordinate
    private float translateX = 0f;
    private float translateY = 0f;

    ...
    ...

    @Override
    public boolean onTouchEvent(MotionEvent event) {

        switch (event.getAction() & MotionEvent.ACTION_MASK) {
            case MotionEvent.ACTION_DOWN:
                mode = DRAG;

                //We assign the current X and Y coordinate of the finger to startX and startY. So these variables now
                //hold the X and Y coordinate of the finger as it first touches the screen.
                startX = event.getX();
                startY = event.getY();
                ...
                break;

            case MotionEvent.ACTION_MOVE:
                //We calculate the values of translateX and translateY by finding the difference between the X/Y coordinate
                //and the starting X/Y coordinate. Since this event is fired every time the finger moves, we're constantly
                //updating the values of these two coordinates
                translateX = event.getX() - startX;
                translateY = event.getY() - startY;
                break;

            ...
            ...
        }

        ...
        ...
    }
}

After we get the translation information, we actually want to translate the canvas. We can do this in the onDraw(Canvas canvas) method. Recall that earlier when we had only implemented zooming, we called invalidate() inside the onScale(ScaleGestureDetector detector) listener. The problem is that the onScale(…) method is called only when zooming happens and not when panning or dragging happens. So any calculations we make inside the onTouchEvent(…) method during panning won’t get reflected inside the onDraw(…) method. So what we need to do first is remove the call to invalidate() from the onScale(…) method. When we do that, the method ends up looking like this:

@Override
public boolean onScale(ScaleGestureDetector detector) {
    scaleFactor *= detector.getScaleFactor();
    scaleFactor = Math.max(MIN_ZOOM, Math.min(scaleFactor, MAX_ZOOM));
    return true;
}

This is not a huge deal because all we really care about is the value of scaleFactor; we can call invalidate() whenever we deem it convenient. So now that we have both the scaling and translating information at hand, we can add code to the onDraw(…) method to take this into account. We will also add code to the onTouchEvent(…) method that will call invalidate():

@Override
public boolean onTouchEvent(MotionEvent event) {

    switch (event.getAction() & MotionEvent.ACTION_MASK) {
        ...
        ...
    }

    //This will set the value of scaleFactor
    detector.onTouchEvent(event);

    //The only time we want to re-draw the canvas is if we are panning (which happens when the mode is
    //DRAG and the zoom factor is not equal to 1) or if we're zooming
    if ((mode == DRAG && scaleFactor != 1f) || mode == ZOOM) {
        invalidate();
    }

    return true;
}

@Override
public void onDraw(Canvas canvas) {
    super.onDraw(canvas);

    canvas.save();

    //We're going to scale the X and Y coordinates by the same amount
    canvas.scale(scaleFactor, scaleFactor);

    //We need to divide by the scale factor here, otherwise we end up with excessive panning based on our zoom level
    //because the translation amount also gets scaled according to how much we've zoomed into the canvas.
    canvas.translate(translateX / scaleFactor, translateY / scaleFactor);
    canvas.restore();
}
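The division by scaleFactor is easier to see with numbers. Because canvas.scale(…) is applied before canvas.translate(…), the translation happens in the already-scaled coordinate system, so a point p ends up on screen at scaleFactor * (p + dx). The sketch below models that for a single coordinate in plain Java; ScaleThenTranslate and its methods are hypothetical names used only for this illustration.

```java
// Hypothetical model of canvas.scale(s, s) followed by canvas.translate(dx, ...) for one coordinate.
final class ScaleThenTranslate {
    // The translation is applied inside the scaled coordinate system, so it gets multiplied by scale too
    static float toScreen(float point, float scale, float dx) {
        return scale * (point + dx);
    }

    // Where the canvas origin lands on screen when we pass translate / scale, as onDraw() does
    static float originWithDivision(float scale, float translate) {
        return toScreen(0f, scale, translate / scale);
    }

    // Where the canvas origin lands if we forget to divide
    static float originWithoutDivision(float scale, float translate) {
        return toScreen(0f, scale, translate);
    }
}
```

With a zoom of 2 and a finger movement of 100 pixels, dividing moves the content by exactly 100 pixels, while skipping the division would move it by 200.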

As I had mentioned before, we are able to pan and zoom but we still have problems: panning continues in all directions indefinitely, and our canvas "jerks" when we pan, lift our finger, and then pan again; it doesn’t remember where we left off. Let’s start with the second problem first, as it’s the easier one. The reason that our canvas jerks is precisely because we don’t keep track of where we left off, the last time we panned. This is fine if you’re translating just once from the origin. However, if you want to translate again, you need to remember how much you’ve already translated by, otherwise the canvas is going to jerk back to where you place your finger again. To fix this problem, we use two new variables: previousTranslateX and previousTranslateY:

public class ZoomView extends View {

    ...
    ...

    //These two variables keep track of the amount we translated the X and Y coordinates, the last time we
    //panned.
    private float previousTranslateX = 0f;
    private float previousTranslateY = 0f;

    ...
    ...

    @Override
    public boolean onTouchEvent(MotionEvent event) {

        switch (event.getAction() & MotionEvent.ACTION_MASK) {
            case MotionEvent.ACTION_DOWN:
                mode = DRAG;

                //We assign the current X and Y coordinate of the finger to startX and startY minus the previously translated
                //amount for each coordinate. This works even when we are translating the first time because the initial
                //values for these two variables are zero.
                startX = event.getX() - previousTranslateX;
                startY = event.getY() - previousTranslateY;
                break;

            case MotionEvent.ACTION_MOVE:
                //We calculate the values of translateX and translateY by finding the difference between the X/Y coordinate
                //and the starting X/Y coordinate. Since this event is fired every time the finger moves, we're constantly
                //updating the values of these two coordinates
                translateX = event.getX() - startX;
                translateY = event.getY() - startY;
                break;

            case MotionEvent.ACTION_POINTER_DOWN:
                mode = ZOOM;
                break;

            case MotionEvent.ACTION_UP:
                mode = NONE;

                //All fingers went up, so let's save the value of translateX and translateY into previousTranslateX and
                //previousTranslateY
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;

            case MotionEvent.ACTION_POINTER_UP:
                mode = DRAG;
                //This is not strictly necessary; we save the value of translateX and translateY into previousTranslateX
                //and previousTranslateY when the second finger goes up
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;
        }

        ...
        ...
    }
}
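To convince yourself that this removes the jerk, you can replay two drags by hand. The following plain-Java sketch copies only the bookkeeping from the listing above into a hypothetical PanTracker class (the class and its down, move, and up methods are invented for this illustration and stand in for the corresponding MotionEvent cases).

```java
// Hypothetical class that isolates the start/translate/previousTranslate bookkeeping.
final class PanTracker {
    private float startX, startY;
    private float previousTranslateX, previousTranslateY;
    float translateX, translateY;

    // ACTION_DOWN: offset the start point by what we have already translated
    void down(float x, float y) {
        startX = x - previousTranslateX;
        startY = y - previousTranslateY;
    }

    // ACTION_MOVE: the translation is the finger position relative to the adjusted start point
    void move(float x, float y) {
        translateX = x - startX;
        translateY = y - startY;
    }

    // ACTION_UP: remember where this drag left off
    void up() {
        previousTranslateX = translateX;
        previousTranslateY = translateY;
    }
}
```

If the first drag moves 20 pixels to the right and the finger then comes down somewhere else entirely, the very first move event of the second drag still reports a translation of 20, so the canvas stays put and continues from there.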

We don’t need to make any changes to onDraw(…) because we have appropriately adjusted the values of translateX and translateY by taking into account the previous translation values when we set startX and startY. So now we need to solve the problem of indefinite panning. Let’s tackle the easier aspect of that problem first: stopping panning past the top and left edges of the canvas (i.e., where the X and Y coordinates are zero):

@Override
public void onDraw(Canvas canvas) {
    super.onDraw(canvas);

    canvas.save();

    //We're going to scale the X and Y coordinates by the same amount
    canvas.scale(scaleFactor, scaleFactor);

    //I multiplied translateX and translateY by -1 because it made more sense to me to see if they were less than 0. If they are less
    //than zero, we set their values to 0. Otherwise, we leave them as is. This ensures that we never pan past the top or left edge of the
    //zoomed-in canvas.
    translateX = (translateX * -1) < 0 ? 0 : translateX;
    translateY = (translateY * -1) < 0 ? 0 : translateY;

    //We need to divide by the scale factor here, otherwise we end up with excessive panning based on our zoom level
    //because the translation amount also gets scaled according to how much we've zoomed into the canvas.
    canvas.translate(translateX / scaleFactor, translateY / scaleFactor);
    canvas.restore();
}

Now let’s tackle the more difficult aspect of the indefinite-panning problem: panning past the bottom and right edges of the canvas. This part was hard for me. The solution seems pretty obvious now and so it might not actually be all that difficult; it took me a little time to figure it out however! The height of my display is 320px. I noticed that when I had zoomed in by a factor of 2, the value for translateY was 320. At a zoom factor of 3, the value was 640px and so on. So basically the limit seemed to be the scale factor minus one, times the height of the display. I’m sure if I had spent more time I could have proved why that is, but at the time I was more concerned with getting this to work:

@Override
public void onDraw(Canvas canvas) {
    super.onDraw(canvas);

    canvas.save();

    //We're going to scale the X and Y coordinates by the same amount
    canvas.scale(scaleFactor, scaleFactor);

    //If translateX times -1 is less than zero, let's set it to zero. This takes care of the left bound
    if((translateX * -1) < 0) {
       translateX = 0;
    }

    //This is where we take care of the right bound. We compare translateX times -1 to (scaleFactor - 1) * displayWidth.
    //If translateX is greater than that value, then we know that we've gone over the bound. So we set the value of
    //translateX to (1 - scaleFactor) times the display width. Notice that the terms are interchanged; it's the same
    //as doing -1 * (scaleFactor - 1) * displayWidth
    else if((translateX * -1) > (scaleFactor - 1) * displayWidth) {
       translateX = (1 - scaleFactor) * displayWidth;
    }

    if(translateY * -1 < 0) {
       translateY = 0;
    }

    //We do the exact same thing for the bottom bound, except in this case we use the height of the display
    else if((translateY * -1) > (scaleFactor - 1) * displayHeight) {
       translateY = (1 - scaleFactor) * displayHeight;
    }

    //We need to divide by the scale factor here, otherwise we end up with excessive panning based on our zoom level
    //because the translation amount also gets scaled according to how much we've zoomed into the canvas.
    canvas.translate(translateX / scaleFactor, translateY / scaleFactor);
    canvas.restore();
}
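Here is one way to see where the (scaleFactor - 1) limit comes from, under the assumption that the content you draw exactly fills the view at a zoom of 1. Content that is W pixels wide becomes scaleFactor * W pixels wide after scaling, but only W of those pixels fit on screen. That leaves (scaleFactor - 1) * W pixels hidden, and that is the furthest the content can slide to the left before its right edge comes into view. The sketch below folds both bounds into one function; PanBounds and clamp are hypothetical names, and viewSize plays the role of displayWidth or displayHeight.

```java
// Hypothetical helper that combines the two if/else-if checks from onDraw() for one axis.
final class PanBounds {
    // translate is in screen pixels, scale is assumed to be >= 1,
    // and viewSize is the width (or height) of the visible area in pixels.
    static float clamp(float translate, float scale, float viewSize) {
        // Most negative translation allowed: the hidden part of the scaled content
        float min = (1f - scale) * viewSize;
        // The translation can never be positive, otherwise we would pan past the left/top edge
        return Math.max(min, Math.min(translate, 0f));
    }
}
```

With a 320-pixel view and a zoom of 2, the allowed range is -320 to 0, which agrees with the value I observed; at a zoom of 1 the range collapses to 0, so no panning is possible.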

Now this change will make sure that the user cannot pan past the top, left, bottom, and right bounds of the canvas. There is one other, slight improvement that we can make. Recall that I mentioned earlier that the MotionEvent.ACTION_MOVE event gets triggered even when the finger is not moving. This leads to unnecessary redrawing of the canvas. If we kept track of the distance that the finger moves, we could be sure that we redraw the canvas only when needed:

public class ZoomView extends View {

    ...
    ...

    //This flag reflects whether the finger was actually dragged across the screen
    private boolean dragged = true;
    ...
    ...

    @Override
    public boolean onTouchEvent(MotionEvent event) {

        switch (event.getAction() & MotionEvent.ACTION_MASK) {

            ...
            ...

            case MotionEvent.ACTION_MOVE:
                translateX = event.getX() - startX;
                translateY = event.getY() - startY;

                //We cannot use startX and startY directly because we have adjusted their values using the previous translation values. This is why we need to add those
                //values to startX and startY so that we can get the actual coordinates of the finger.
                double distance = Math.sqrt(Math.pow(event.getX() - (startX + previousTranslateX), 2) + Math.pow(event.getY() - (startY + previousTranslateY), 2));

                if(distance > 0) {
                   dragged = true;
                }

                break;

            case MotionEvent.ACTION_POINTER_DOWN:
                mode = ZOOM;
                break;

            case MotionEvent.ACTION_UP:
                mode = NONE;
                dragged = false;
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;

            case MotionEvent.ACTION_POINTER_UP:
                mode = DRAG;
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;
        }

        detector.onTouchEvent(event);

        //We redraw the canvas only in the following cases:
        //
        // o The mode is ZOOM
        //        OR
        // o The mode is DRAG and the scale factor is not equal to 1 (meaning we have zoomed) and dragged is
        //   set to true (meaning the finger has actually moved)
        if ((mode == DRAG && scaleFactor != 1f && dragged) || mode == ZOOM) {
            invalidate();
        }

        return true;
    }
}
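The distance check can also be written with Math.hypot, which computes the same Euclidean distance as the Math.sqrt and Math.pow combination. The sketch below is a hypothetical variation rather than the code used in this post: DragDetector and hasDragged are made-up names, and the threshold parameter is an addition of mine. Passing 0 reproduces the distance > 0 test above; on a real device you might instead try the value from ViewConfiguration.get(context).getScaledTouchSlop() so that tiny jitters are ignored.

```java
// Hypothetical helper: decides whether the finger has really moved since it went down.
final class DragDetector {
    // downX/downY are the raw coordinates where the finger first touched the screen
    // (startX + previousTranslateX and startY + previousTranslateY in the listing above).
    static boolean hasDragged(float downX, float downY, float x, float y, float threshold) {
        // Math.hypot(dx, dy) is sqrt(dx*dx + dy*dy)
        return Math.hypot(x - downX, y - downY) > threshold;
    }
}
```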

With this change, the canvas is redrawn only when it needs to be redrawn. With this code in place you should be able to perform zooming and panning on your canvas in your view. There is still one drawback that I haven’t taken care of. Usually when you zoom, you want to center on the area that you’re zooming in on. The implementation here doesn’t do that. There is a way to do it, and it involves using the getFocusX() and getFocusY() methods on ScaleGestureDetector. These two values give you the focal point of the gesture, and you can use that to make sure that you are centered on that focal point. I haven’t figured out how to do it exactly, but if/when I do, I’ll make another post about that.

As far as what I’ve covered so far, here is the code in its entirety; I hope you find it useful:

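import android.content.Context;
import android.graphics.Canvas;
import android.view.MotionEvent;
import android.view.ScaleGestureDetector;
import android.view.View;

public class ZoomView extends View {

    //These two constants specify the minimum and maximum zoom
    private static final float MIN_ZOOM = 1f;
    private static final float MAX_ZOOM = 5f;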

    private float scaleFactor = 1.f;
    private ScaleGestureDetector detector;

    //These constants specify the mode that we're in
    private static final int NONE = 0;
    private static final int DRAG = 1;
    private static final int ZOOM = 2;

    private int mode;

    //These two variables keep track of the X and Y coordinate of the finger when it first
    //touches the screen
    private float startX = 0f;
    private float startY = 0f;

    //These two variables keep track of the amount we need to translate the canvas along the X
    //and the Y coordinate
    private float translateX = 0f;
    private float translateY = 0f;

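    //These two variables keep track of the amount we translated the X and Y coordinates, the last time we
    //panned.
    private float previousTranslateX = 0f;
    private float previousTranslateY = 0f;

    //This flag reflects whether the finger was actually dragged across the screen
    private boolean dragged = true;

    //displayWidth and displayHeight are used by onDraw(...). Assumption: they hold the size of this view in
    //pixels, which we pick up in onSizeChanged(...)
    private float displayWidth;
    private float displayHeight;

    public ZoomView(Context context) {
        super(context);
        detector = new ScaleGestureDetector(getContext(), new ScaleListener());
    }

    @Override
    protected void onSizeChanged(int w, int h, int oldw, int oldh) {
        super.onSizeChanged(w, h, oldw, oldh);
        displayWidth = w;
        displayHeight = h;
    }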

    @Override
    public boolean onTouchEvent(MotionEvent event) {

        switch (event.getAction() & MotionEvent.ACTION_MASK) {

            case MotionEvent.ACTION_DOWN:
                mode = DRAG;

                //We assign the current X and Y coordinate of the finger to startX and startY minus the previously translated
                //amount for each coordinate. This works even when we are translating the first time because the initial
                //values for these two variables are zero.
                startX = event.getX() - previousTranslateX;
                startY = event.getY() - previousTranslateY;
                break;

            case MotionEvent.ACTION_MOVE:
                translateX = event.getX() - startX;
                translateY = event.getY() - startY;

                //We cannot use startX and startY directly because we have adjusted their values using the previous translation values.
                //This is why we need to add those values to startX and startY so that we can get the actual coordinates of the finger.
                double distance = Math.sqrt(Math.pow(event.getX() - (startX + previousTranslateX), 2) +
                                            Math.pow(event.getY() - (startY + previousTranslateY), 2)
                                           );

                if(distance > 0) {
                   dragged = true;
                }

                break;

            case MotionEvent.ACTION_POINTER_DOWN:
                mode = ZOOM;
                break;

            case MotionEvent.ACTION_UP:
                mode = NONE;
                dragged = false;

                //All fingers went up, so let's save the value of translateX and translateY into previousTranslateX and
                //previousTranslateY
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;

            case MotionEvent.ACTION_POINTER_UP:
                mode = DRAG;

                //This is not strictly necessary; we save the value of translateX and translateY into previousTranslateX
                //and previousTranslateY when the second finger goes up
                previousTranslateX = translateX;
                previousTranslateY = translateY;
                break;
        }

        detector.onTouchEvent(event);

        //We redraw the canvas only in the following cases:
        //
        // o The mode is ZOOM
        //        OR
        // o The mode is DRAG and the scale factor is not equal to 1 (meaning we have zoomed) and dragged is
        //   set to true (meaning the finger has actually moved)
        if ((mode == DRAG && scaleFactor != 1f && dragged) || mode == ZOOM) {
            invalidate();
        }

        return true;
    }

    @Override
    public void onDraw(Canvas canvas) {
        super.onDraw(canvas);

        canvas.save();

        //We're going to scale the X and Y coordinates by the same amount
        canvas.scale(scaleFactor, scaleFactor);

        //If translateX times -1 is less than zero, let's set it to zero. This takes care of the left bound
        if((translateX * -1) < 0) {
           translateX = 0;
        }

        //This is where we take care of the right bound. We compare translateX times -1 to (scaleFactor - 1) * displayWidth.
        //If translateX is greater than that value, then we know that we've gone over the bound. So we set the value of
        //translateX to (1 - scaleFactor) times the display width. Notice that the terms are interchanged; it's the same
        //as doing -1 * (scaleFactor - 1) * displayWidth
        else if((translateX * -1) > (scaleFactor - 1) * displayWidth) {
           translateX = (1 - scaleFactor) * displayWidth;
        }

        if(translateY * -1 < 0) {
           translateY = 0;
        }

        //We do the exact same thing for the bottom bound, except in this case we use the height of the display
        else if((translateY * -1) > (scaleFactor - 1) * displayHeight) {
           translateY = (1 - scaleFactor) * displayHeight;
        }

        //We need to divide by the scale factor here, otherwise we end up with excessive panning based on our zoom level
        //because the translation amount also gets scaled according to how much we've zoomed into the canvas.
        canvas.translate(translateX / scaleFactor, translateY / scaleFactor);

        /* The rest of your canvas-drawing code */
        canvas.restore();
    }

    private class ScaleListener extends ScaleGestureDetector.SimpleOnScaleGestureListener {
        @Override
        public boolean onScale(ScaleGestureDetector detector) {
            scaleFactor *= detector.getScaleFactor();
            scaleFactor = Math.max(MIN_ZOOM, Math.min(scaleFactor, MAX_ZOOM));
            return true;
        }
    }
}

UPDATE

Chris Rogers figured out how to make sure that the display stays centered on the screen. In the canvas.scale(…) call, do the following instead:

canvas.scale(this.scaleFactor, this.scaleFactor, this.detector.getFocusX(), this.detector.getFocusY());
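The four-argument canvas.scale(sx, sy, px, py) scales about the pivot point (px, py) instead of the origin, which means the pivot itself stays where it is on screen while everything else moves away from it. The plain-Java sketch below models that for one coordinate; PivotScale and scaleAbout are hypothetical names, and pivot stands in for the value returned by getFocusX() or getFocusY(). Note that the pan bounds worked out earlier assume scaling about the origin, so they would presumably need to be revisited when a pivot is used; I have not verified that part.

```java
// Hypothetical model of canvas.scale(s, s, pivot, pivot) for a single coordinate.
final class PivotScale {
    // Equivalent to: move the pivot to the origin, scale, then move it back
    static float scaleAbout(float point, float scale, float pivot) {
        return pivot + scale * (point - pivot);
    }
}
```

A point that sits exactly on the focal point does not move, while a point 50 pixels to its right ends up 100 pixels to its right at a zoom of 2.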
