Jun 062013
 

I have been working a project for the last few days, that deals with rendering PDF’s in-browser. Initially, I was going to parse the PDF and extract the text content, but then I ran into pdf.js, which is a library developed by Mozilla for rendering PDF’s in-browser via JavaScript. The project I am working on has a requirement that users should be able to select text within the PDF. This is possible using pdf.js. Unfortunately, the example code only shows you how to render a PDF, but not how to enable text-selection. I wasn’t able to find any API access to enable text-selection either. I finally ended up on the #pdfjs IRC channel and the friendly folks there gave me some direction. The logic for enabling text-selection was buried inside the code for Mozilla’s PDF viewer, and was heavily intertwined with the viewer code as well. I spent a few days playing around with the viewer and tracing through the code. I was stumped many times since the code was complex and I know jack about parsing PDF’s. But eventually I was able to focus on the part of the code that actually took care of enabling text-selection.

pdf.js’ approach to enabling text-selection is actually quite clever. The library overlays divs over the PDF, and these divs contain text that matches the PDF text that they are floating over. So when you select the text, you are actually selecting the text inside the overlaid divs. This was fine and dandy, but I was still stuck as far as getting this to work on my project. What I needed was a minimal example that I could adapt for my uses. After a day or two of tracing code, experimenting, debugging, and staring at the screen in frustration, I was eventually able to come up with a minimal example! To accomplish this, I extracted code that was relevant to creating the overlays out of the viewer code, into its own independent file. I also removed a lot of code that was dependent on the viewer itself. Keep in mind that this example doesn’t have functionality like text finding or matching, and that code is also heavily intertwined with the viewer code. All this example does is render a PDF with text-selection enabled. However, I think this is a good start!

If you are interested, you can check out the code on github and a working example on this fiddle.

The pertintent code is as follows (keep in mind you still require additional resources; all of that information is available on github):

window.onload = function () {
    var pdfBase64 = "..."; //base64 representing the PDF

    var scale = 1.5; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

    /**
     * Converts a base64 string into a Uint8Array
     */
    function base64ToUint8Array(base64) {
        var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
        var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
        for (var i = 0; i < raw.length; i++) {
            uint8Array[i] = raw.charCodeAt(i);
        }

        return uint8Array;
    }

    function loadPdf(pdfData) {
        PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
        //missing in the example code for rendering a pdf.

        var pdf = PDFJS.getDocument(pdfData);
        pdf.then(renderPdf);
    }

    function renderPdf(pdf) {
        pdf.getPage(1).then(renderPage);
    }

    function renderPage(page) {
        var viewport = page.getViewport(scale);
        var $canvas = jQuery("<canvas></canvas>");

        //Set the canvas height and width to the height and width of the viewport
        var canvas = $canvas.get(0);
        var context = canvas.getContext("2d");
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        //Append the canvas to the pdf container div
        var $pdfContainer = jQuery("#pdfContainer");
        $pdfContainer.css("height", canvas.height + "px").css("width", canvas.width + "px");
        $pdfContainer.append($canvas);

        //The following few lines of code set up scaling on the context if we are on a HiDPI display
        var outputScale = getOutputScale();
        if (outputScale.scaled) {
            var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
                (1 / outputScale.sy) + ')';
            CustomStyle.setProp('transform', canvas, cssScale);
            CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

            if ($textLayerDiv.get(0)) {
                CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
                CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
            }
        }

        context._scaleX = outputScale.sx;
        context._scaleY = outputScale.sy;
        if (outputScale.scaled) {
            context.scale(outputScale.sx, outputScale.sy);
        }

        var canvasOffset = $canvas.offset();
        var $textLayerDiv = jQuery("<div />")
            .addClass("textLayer")
            .css("height", viewport.height + "px")
            .css("width", viewport.width + "px")
            .offset({
                top: canvasOffset.top,
                left: canvasOffset.left
            });

        $pdfContainer.append($textLayerDiv);

        page.getTextContent().then(function (textContent) {
            var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second zero is an index identifying
            //the page. It is set to page.number - 1.
            textLayer.setTextContent(textContent);

            var renderContext = {
                canvasContext: context,
                viewport: viewport,
                textLayer: textLayer
            };

            page.render(renderContext);
        });
    }

    var pdfData = base64ToUint8Array(pdfBase64);
    loadPdf(pdfData);
};

  3 Responses to “Rendering a PDF with text-selection, using pdf.js”

  1. Very helpful. Thank you.

  2. wow i have been trying to figure out how to do this for a while. thank you!!!!!

Leave a Reply

All original content on these pages is fingerprinted and certified by Digiprove