The image that lied about itself

A few weeks ago, I ran into a puzzling issue at work. Someone was uploading an image which made it past our file-size checks, but caused an OutOfMemoryError and a heap dump when the code attempted to resize it. The information we had from the heap-dump was that it was trying to allocate memory for an image whose resolution was 18,375×11,175. What didn’t make sense is how this image was even getting through our file-size checks, because there is no way we would ever let in an image of that size.

In the code, we have a global limit for the largest file we will accept. We also have a separate limit for image uploads. If the image is over this size, but under the global limit, we will resize the image to a smaller size. The strange part was that the large image was making it past the global check, which meant that the size of the incoming data was below the global limit, but above the image-size limit. How could this be?

On a hunch, I hypothesized that perhaps the entire image wasn’t making it through. Perhaps only a part of the image was making it through with its header left intact. In an image file, there is usually a header that conveys information about the file format, the color space, and the resolution of the image! I figured that even though the data are incomplete, enough information was present in the header to enable the resizing code to make sense of it. When it tries to allocate memory for this image, based on the resolution it gets from the header, it runs out of memory!

To verify this, I manually hex-edited a file that was of proper size to have a ridiculously-large resolution. I then uploaded this file and was able to witness the behavior happening even though the file was of the proper size! So what’s the lesson here? Don’t only rely on file-size limits for images; you have to look at the resolution as well!