Rough Book

random musings of just another computer nerd

Tag: mozilla

Rendering a PDF with text-selection, using pdf.js

I have been working on a project for the last few days that deals with rendering PDFs in-browser. Initially, I was going to parse the PDF and extract the text content, but then I ran into pdf.js, which is a library developed by Mozilla for rendering PDFs in-browser via JavaScript. The project I am working on has a requirement that users should be able to select text within the PDF. This is possible using pdf.js. Unfortunately, the example code only shows you how to render a PDF, but not how to enable text-selection. I wasn’t able to find any API access to enable text-selection either. I finally ended up on the #pdfjs IRC channel and the friendly folks there gave me some direction. The logic for enabling text-selection was buried inside the code for Mozilla’s PDF viewer, and was heavily intertwined with the viewer code as well. I spent a few days playing around with the viewer and tracing through the code. I was stumped many times since the code was complex and I know jack about parsing PDFs. But eventually I was able to focus on the part of the code that actually took care of enabling text-selection.

pdf.js’ approach to enabling text-selection is actually quite clever. The library overlays divs over the PDF, and these divs contain text that matches the PDF text that they are floating over. So when you select the text, you are actually selecting the text inside the overlaid divs. This was fine and dandy, but I was still stuck as far as getting this to work on my project. What I needed was a minimal example that I could adapt for my uses. After a day or two of tracing code, experimenting, debugging, and staring at the screen in frustration, I was eventually able to come up with a minimal example! To accomplish this, I extracted code that was relevant to creating the overlays out of the viewer code, into its own independent file. I also removed a lot of code that was dependent on the viewer itself. Keep in mind that this example doesn’t have functionality like text finding or matching, and that code is also heavily intertwined with the viewer code. All this example does is render a PDF with text-selection enabled. However, I think this is a good start!
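To make the overlay idea a little more concrete, here is a minimal sketch of how one piece of text could be mapped to the CSS position of a div floating over the rendered page. This is not pdf.js code. The function name overlayStyleFor, the item shape { str, x, y, fontSize }, and the simplification that the top of the div sits exactly one font size above the text baseline are all assumptions made for illustration.

```javascript
// Hypothetical helper for illustration only; it is NOT part of pdf.js.
// item:       { str, x, y, fontSize } in PDF points, where the origin is the
//             bottom-left corner of the page and y is the text baseline (assumed shape).
// pageHeight: height of the page in PDF points.
// scale:      the same "zoom" factor used to render the page.
function overlayStyleFor(item, pageHeight, scale) {
    var fontSizePx = item.fontSize * scale;
    return {
        text: item.str,
        // PDF x grows to the right, same as CSS, so only scaling is needed.
        left: (item.x * scale) + "px",
        // PDF y grows upward while CSS top grows downward, so flip it, then move
        // up by one font size (a simplification that ignores real font metrics).
        top: ((pageHeight - item.y) * scale - fontSizePx) + "px",
        fontSize: fontSizePx + "px"
    };
}

// A 12pt "Hello" one inch from the left and one inch from the top of a
// US Letter page (612 x 792 points), rendered at a scale of 1.5.
var style = overlayStyleFor({ str: "Hello", x: 72, y: 720, fontSize: 12 }, 792, 1.5);
console.log(style);
```

In a browser, each of these style objects would be applied to an absolutely positioned div inside the text layer, so that selecting the div's text looks like selecting the text drawn on the canvas underneath it.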

If you are interested, you can check out the code on GitHub and a working example on this fiddle.

The pertinent code is as follows (keep in mind you still require additional resources; all of that information is available on GitHub):

window.onload = function () {
    var pdfBase64 = "..."; //base64 representing the PDF

    var scale = 1.5; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

    /**
     * Converts a base64 string into a Uint8Array
     */
    function base64ToUint8Array(base64) {
        var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
        var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
        for (var i = 0; i < raw.length; i++) {
            uint8Array[i] = raw.charCodeAt(i);
        }

        return uint8Array;
    }

    function loadPdf(pdfData) {
        PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
        //missing in the example code for rendering a pdf.

        var pdf = PDFJS.getDocument(pdfData);
        pdf.then(renderPdf);
    }

    function renderPdf(pdf) {
        pdf.getPage(1).then(renderPage);
    }

    function renderPage(page) {
        var viewport = page.getViewport(scale);
        var $canvas = jQuery("<canvas></canvas>");

        //Set the canvas height and width to the height and width of the viewport
        var canvas = $canvas.get(0);
        var context = canvas.getContext("2d");
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        //Append the canvas to the pdf container div
        var $pdfContainer = jQuery("#pdfContainer");
        $pdfContainer.css("height", canvas.height + "px").css("width", canvas.width + "px");
        $pdfContainer.append($canvas);

        //Create the text layer div first, because the HiDPI code below needs to reference it
        var canvasOffset = $canvas.offset();
        var $textLayerDiv = jQuery("<div />")
            .addClass("textLayer")
            .css("height", viewport.height + "px")
            .css("width", viewport.width + "px")
            .offset({
                top: canvasOffset.top,
                left: canvasOffset.left
            });

        //The following few lines of code set up scaling on the context if we are on a HiDPI display
        var outputScale = getOutputScale();
        if (outputScale.scaled) {
            var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
                (1 / outputScale.sy) + ')';
            CustomStyle.setProp('transform', canvas, cssScale);
            CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

            if ($textLayerDiv.get(0)) {
                CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
                CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
            }
        }

        context._scaleX = outputScale.sx;
        context._scaleY = outputScale.sy;
        if (outputScale.scaled) {
            context.scale(outputScale.sx, outputScale.sy);
        }

        $pdfContainer.append($textLayerDiv);

        page.getTextContent().then(function (textContent) {
            var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second argument (0) is an index identifying
            //the page. It is set to page.number - 1.
            textLayer.setTextContent(textContent);

            var renderContext = {
                canvasContext: context,
                viewport: viewport,
                textLayer: textLayer
            };

            page.render(renderContext);
        });
    }

    var pdfData = base64ToUint8Array(pdfBase64);
    loadPdf(pdfData);
};
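The code above calls getOutputScale(), which comes from the additional resources and is not shown here. Judging only by how its result is used above, it returns an object with sx, sy, and scaled properties. A rough stand-in, assuming the scale is simply the device pixel ratio, might look like the following. The name computeOutputScale and its parameter are hypothetical, and the real helper may well do more than this.

```javascript
// Hypothetical stand-in for illustration; the real getOutputScale() lives in the
// viewer resources and may differ. The pixel ratio is passed in as a parameter
// (an assumption) so the function can run outside a browser.
function computeOutputScale(devicePixelRatio) {
    var pixelRatio = devicePixelRatio || 1;
    return {
        sx: pixelRatio,
        sy: pixelRatio,
        scaled: pixelRatio !== 1
    };
}

var normal = computeOutputScale(1); // regular display: nothing to scale
var hiDpi = computeOutputScale(2);  // HiDPI display with two device pixels per CSS pixel

// The same CSS transform string that the rendering code builds from the scale.
var cssScale = "scale(" + (1 / hiDpi.sx) + ", " + (1 / hiDpi.sy) + ")";
console.log(normal.scaled, hiDpi.scaled, cssScale);
```

In a browser you would pass window.devicePixelRatio as the argument.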

How to get AimExpress working on Mozilla, running on a Linux Box

So, I was at work and I was using AimExpress on the Windows side. I have Exceed running and I was working on some scripts on the AIX box. I didn’t want to switch back and forth, so I tried running AimExpress on Mozilla. This is Mozilla 1.6, and the website says that it supports Mozilla 1.4 and up. So I try to run it, and of course, it doesn’t work! It says the browser is unsupported! I look all over for documentation on this, but can’t find any. I guess no one has tried to do this.

This bothered me a lot, so I tried getting past it. I closed AimExpress on the Windows side and restarted it. When the login screen came up, I looked at the properties and found the URL for it. I used that in Mozilla and the login screen came up! Schweet! I logged in, but then I got the unsupported browser error… I wasn’t sure what to do… I knew that AimExpress ran on IE as some sort of DHTML application… If I was just able to get the address of the page, I was pretty sure I could run it on Mozilla by accessing that page directly. But they had disabled the context menu on the app, so there was no way I could find anything. There wasn’t even a menu bar. So I looked to see if there were any shortcut keys to View Source or Properties, or anything like that. I saw F11 for Full Screen. So just for the hell of it, I tried it out. And Voila! There was an address bar on the top, and I could see the address!

I took the address and put it in Mozilla. I accessed the page and I got the sign-on thing again. I signed in and I got the stupid “Unsupported Browser” page again. Then I figured they are definitely using cookies (or sessions), so I accessed the AIM DHTML App page again and BINGO! I was logged in! 🙂 Hehehe… I was so proud of myself!

Anyway, so here are the steps:

1. Fire up Mozilla and enter http://aimexpress.aim.com/BuddyList.svc in the address bar.
2. Log in. It will come up with an error page. Don’t worry. Enter http://aimexpress.aim.com/BuddyList.svc in the address bar again.
3. AimExpress should be up and running! Enjoy!

CUPS working

Got CUPS working. I really should document my work… smb://[email protected]/HPDeskjet812C… Anyways, all I have left to do is set up the colour scheme and stuff like that… of KDE… and install Mozilla again… Fun… Fun…

More FreeBSD

I learnt some new stuff. CVSup is pretty neat. I learnt the hard way that when you do a make buildworld you have to rebuild the kernel the non-traditional way. My kernel wasn’t compiling and for a moment I thought I would have to go through all that madness with write failure on transfers again. See, I was trying to get Mozilla to work. It wasn’t finding some shared libraries… at least the downloaded one wasn’t. So I tried to install it from /usr/ports/www/mozilla and /usr/ports/www/mozilla-devel. However, both died on some perl errors. I was also trying to install cups, and this wasn’t working either for the same reason. They were both dying on some errors in Perl5.0x. It was due to an outdated perl module. But anyway, I decided I should reinstall EVERYTHING. So I ran /stand/sysinstall and reinstalled everything through FTP. Best install I ever did. Sure, it took longer, but I had no stupid write transfer errors. Also, all my old stuff was still there. My kernel config file, my X Windows config file, and rc.conf were all there. I just had to edit hosts.conf. Everything else was fine. Even the programs I had installed before were there.

Well, Mozilla works now. There are two other things I am trying to get to work. The first one involves trying to access my printer (which is currently hooked up to my XP box) from FreeBSD. I installed cups and samba and installed my printer from http://localhost:631. However, lpstat says that my printer is disabled and jobs stay in the queue. So I’m not sure what’s wrong. The other involves trying to get a working version of Doom Legacy.

Anyway, through all of this bsdforums.org has been very helpful. There are a lot of people out there who are willing to help you out… After I’m a little more comfortable with FreeBSD, I plan to put up some documentation on this site. I totally love FreeBSD… I think the lines are being drawn… =) I can see why it’s better than Linux! Oh yeah, Mozilla is a pretty good browser. However, I wish they implemented support for IE filters. I’m not sure what their DOM is like, but they probably should all switch to Microsoft’s (sounds horrendous, but it’s the best choice, really) JavaScript DOM. It’s the best one. Netscape has it all backwards. I’m not sure what Mozilla’s DOM looks like, but like I said before, they should implement support for filters… I can’t see my own website on my FreeBSD box because I use filters extensively… cross browser? Yeah RIGHT! Ha!

Your own code. Use IE.

I probably should document my code. It’s not cool coming back to it after three months and wondering what the hell you were smoking when you wrote it.

I managed to figure out (vaguely) how I got fly working. I know I did something weird to the makefiles. I think I had to link it explicitly or something. Anyway, I can add Photos properly now. All I need to do is finish the “Delete Photos” and “Update Photos” options. Then I can put up the photo album for real. I need to make a non IE5.5+ version too… so that other people can use the photo album at least. I wish everyone would use IE. Seriously. Alright, alright… now you’re probably thinking “What? You can’t say that! That’s an affront to nerds everywhere!”… Microsoft is Evil, but it sure as hell would make my coding a whole lot easier if everyone just stuck to IE. Then there’s the .001% Mozilla crowd going, “Oh, Mozilla is SOOOOOO much better than IE and Microsoft is Evil and you’re a traitor!” Yeah, whatever. Makes my job SOOOOOO much easier. Stick to IE and while you’re at it, PLEASE upgrade to the latest version – i.e. 6.0 (eh? eh? get the pun?) If you’re still using IE4.0, that was sooo 4 years ago! UPGRADE!
