Skip to content

Rough Book

random musings

Menu
  • About Me
  • Contact
  • Projects
    • bAdkOde
    • CherryBlossom
    • FXCalendar
    • Sulekha
Menu

Rendering a PDF with text-selection, using pdf.js

Posted on June 6, 2013July 17, 2020 by vivin

I have been working a project for the last few days, that deals with rendering PDF's in-browser. Initially, I was going to parse the PDF and extract the text content, but then I ran into pdf.js, which is a library developed by Mozilla for rendering PDF's in-browser via JavaScript. The project I am working on has a requirement that users should be able to select text within the PDF. This is possible using pdf.js. Unfortunately, the example code only shows you how to render a PDF, but not how to enable text-selection. I wasn't able to find any API access to enable text-selection either. I finally ended up on the #pdfjs IRC channel and the friendly folks there gave me some direction. The logic for enabling text-selection was buried inside the code for Mozilla's PDF viewer, and was heavily intertwined with the viewer code as well. I spent a few days playing around with the viewer and tracing through the code. I was stumped many times since the code was complex and I know jack about parsing PDF's. But eventually I was able to focus on the part of the code that actually took care of enabling text-selection.

pdf.js' approach to enabling text-selection is actually quite clever. The library overlays divs over the PDF, and these divs contain text that matches the PDF text that they are floating over. So when you select the text, you are actually selecting the text inside the overlaid divs. This was fine and dandy, but I was still stuck as far as getting this to work on my project. What I needed was a minimal example that I could adapt for my uses. After a day or two of tracing code, experimenting, debugging, and staring at the screen in frustration, I was eventually able to come up with a minimal example! To accomplish this, I extracted code that was relevant to creating the overlays out of the viewer code, into its own independent file. I also removed a lot of code that was dependent on the viewer itself. Keep in mind that this example doesn't have functionality like text finding or matching, and that code is also heavily intertwined with the viewer code. All this example does is render a PDF with text-selection enabled. However, I think this is a good start!

If you are interested, you can check out the code on github and a working example on this fiddle.

The pertintent code is as follows (keep in mind you still require additional resources; all of that information is available on github):

window.onload = function () {
    var pdfBase64 = "..."; //base64 representing the PDF

    var scale = 1.5; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

    /**
     * Converts a base64 string into a Uint8Array
     */
    function base64ToUint8Array(base64) {
        var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
        var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
        for (var i = 0; i < raw.length; i++) {
            uint8Array[i] = raw.charCodeAt(i);
        }

        return uint8Array;
    }

    function loadPdf(pdfData) {
        PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
        //missing in the example code for rendering a pdf.

        var pdf = PDFJS.getDocument(pdfData);
        pdf.then(renderPdf);
    }

    function renderPdf(pdf) {
        pdf.getPage(1).then(renderPage);
    }

    function renderPage(page) {
        var viewport = page.getViewport(scale);
        var $canvas = jQuery("<canvas></canvas>");

        //Set the canvas height and width to the height and width of the viewport
        var canvas = $canvas.get(0);
        var context = canvas.getContext("2d");
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        //Append the canvas to the pdf container div
        var $pdfContainer = jQuery("#pdfContainer");
        $pdfContainer.css("height", canvas.height + "px").css("width", canvas.width + "px");
        $pdfContainer.append($canvas);

        //The following few lines of code set up scaling on the context if we are on a HiDPI display
        var outputScale = getOutputScale();
        if (outputScale.scaled) {
            var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
                (1 / outputScale.sy) + ')';
            CustomStyle.setProp('transform', canvas, cssScale);
            CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

            if ($textLayerDiv.get(0)) {
                CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
                CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
            }
        }

        context._scaleX = outputScale.sx;
        context._scaleY = outputScale.sy;
        if (outputScale.scaled) {
            context.scale(outputScale.sx, outputScale.sy);
        }

        var canvasOffset = $canvas.offset();
        var $textLayerDiv = jQuery("<div />")
            .addClass("textLayer")
            .css("height", viewport.height + "px")
            .css("width", viewport.width + "px")
            .offset({
                top: canvasOffset.top,
                left: canvasOffset.left
            });

        $pdfContainer.append($textLayerDiv);

        page.getTextContent().then(function (textContent) {
            var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second zero is an index identifying
            //the page. It is set to page.number - 1.
            textLayer.setTextContent(textContent);

            var renderContext = {
                canvasContext: context,
                viewport: viewport,
                textLayer: textLayer
            };

            page.render(renderContext);
        });
    }

    var pdfData = base64ToUint8Array(pdfBase64);
    loadPdf(pdfData);
};

13 thoughts on “Rendering a PDF with text-selection, using pdf.js”

  1. vi5in says:
    June 6, 2013 at 8:56 pm

    Rendering a PDF with text-selection, using pdf.js http://t.co/JVEozv9pph

    Reply
  2. rube says:
    February 26, 2014 at 10:42 am

    Very helpful. Thank you.

    Reply
  3. jonas says:
    February 27, 2014 at 10:01 am

    wow i have been trying to figure out how to do this for a while. thank you!!!!!

    Reply
  4. ndundupan says:
    July 1, 2014 at 7:48 pm

    any idea how to adding annotation into pdf js?

    Reply
  5. niks says:
    January 5, 2015 at 4:18 am

    Hi,

    Can you tell me how would you extract the text content? I need to parse a pdf in javascript without rendering.

    Thanks,
    Nikhil

    Reply
  6. Lucas says:
    January 27, 2015 at 8:05 pm

    Hi Vivin,

    Great work on this. I can get this all working with v0.8.223, but as that version has issues with IE, I’m trying to get it working with the latest (v1.0.1118).

    Have you since been able to get this to work with newer versions of PDFjs? It extracts the text properly, but never ends up populating the text layer.

    Reply
  7. prasan says:
    January 28, 2015 at 6:31 am

    Hi , your fiddler example doesn’t work for Safari (5.1.7) windows browser. Display text is like ‘XXXXXXXXXXX..etc’ . Do you have any work around ?

    Reply
  8. Sourav chakraborty says:
    March 12, 2015 at 1:36 am

    where is the pdf file

    Reply
  9. Supriya Waghamale says:
    July 20, 2015 at 11:59 pm

    How to read page from pdf file and implement text selection ?

    Reply
  10. Jerome Chung says:
    August 3, 2015 at 1:26 am

    thanks,it’s save my life.

    Reply
  11. Raj says:
    August 3, 2015 at 11:17 am

    Hi Vivin:
    This is very helpful. However, not sure if you know, the working example on fiddle does not render on Chrome or Safari on OSX at the time of writing.

    Reply
  12. Jamil says:
    January 27, 2016 at 12:35 pm

    Where is the function getOutputScale() defined?

    Reply
    1. krishna says:
      March 2, 2016 at 5:45 pm

      its not working in IE

      Reply

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Archives

  • February 2023
  • April 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • June 2017
  • March 2017
  • November 2016
  • August 2016
  • July 2016
  • June 2016
  • February 2016
  • August 2015
  • July 2014
  • June 2014
  • March 2014
  • December 2013
  • November 2013
  • September 2013
  • July 2013
  • June 2013
  • March 2013
  • February 2013
  • January 2013
  • October 2012
  • July 2012
  • June 2012
  • January 2012
  • December 2011
  • November 2011
  • October 2011
  • September 2011
  • July 2011
  • June 2011
  • May 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • October 2010
  • September 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • August 2008
  • March 2008
  • February 2008
  • November 2007
  • July 2007
  • June 2007
  • May 2007
  • March 2007
  • December 2006
  • October 2006
  • September 2006
  • August 2006
  • June 2006
  • April 2006
  • March 2006
  • January 2006
  • December 2005
  • November 2005
  • October 2005
  • September 2005
  • August 2005
  • July 2005
  • June 2005
  • May 2005
  • April 2005
  • February 2005
  • October 2004
  • September 2004
  • August 2004
  • July 2004
  • June 2004
  • May 2004
  • April 2004
  • March 2004
  • February 2004
  • January 2004
  • December 2003
  • November 2003
  • October 2003
  • September 2003
  • July 2003
  • June 2003
  • May 2003
  • March 2003
  • February 2003
  • January 2003
  • December 2002
  • November 2002
  • October 2002
  • September 2002
  • August 2002
  • July 2002
  • June 2002
  • May 2002
  • April 2002
  • February 2002
  • September 2001
  • August 2001
  • April 2001
  • March 2001
  • February 2001
  • January 2001
  • December 2000
  • November 2000
  • October 2000
  • August 2000
  • July 2000
  • June 2000
  • May 2000
  • March 2000
  • January 2000
  • December 1999
  • November 1999
  • October 1999
  • September 1999
©2023 Rough Book | Built using WordPress and Responsive Blogily theme by Superb
All original content on these pages is fingerprinted and certified by Digiprove