Font Encoding and Searchable PDFs

I ran into a weird issue today that I thought I'd share in case anyone else
hits it.

In one of my applications I'm populating PDF forms via CFPDFFORM in
ColdFusion. It works great, but the generated PDFs aren't fully searchable.
In Acrobat Reader (or any other PDF reader I tested) you can search the
PDF, but any data that was programmatically inserted into the form fields
isn't included in the search. So, for example, I can be looking at the name
"Smith" in the PDF, but a search for "Smith" yields 0 results.

It turns out the cause is the encoding of the font used on the form fields.
I chose Arial (in Acrobat Pro on the Mac, if I remember correctly) when I
was creating the empty form, but didn't realize that the version of Arial I
chose used Identity-H encoding. Identity-H is a double-byte encoding, so I
find it a bit odd that it's not searchable, but the solution (at least the
one I've found so far) is to use a font with ANSI encoding instead.
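
If you want a quick way to see which encodings a form template refers to, here is a rough Python sketch. It is not something from my app, and it rests on a big assumption: it only looks at the raw bytes of the file, so font dictionaries stored inside compressed object streams won't be seen. The function names and the sample bytes are made up for illustration. In PDF terms, the ANSI encoding shows up as the name WinAnsiEncoding.

```python
import re

# Matches an /Encoding entry whose value is a name, e.g. "/Encoding /Identity-H"
# or "/Encoding/WinAnsiEncoding". Indirect references such as
# "/Encoding 12 0 R" are deliberately not matched.
ENCODING_NAME = re.compile(rb"/Encoding\s*/([A-Za-z0-9\-]+)")


def encoding_names(pdf_bytes: bytes) -> dict[str, int]:
    """Count each named /Encoding value visible in the raw PDF bytes."""
    counts: dict[str, int] = {}
    for match in ENCODING_NAME.finditer(pdf_bytes):
        name = match.group(1).decode("ascii")
        counts[name] = counts.get(name, 0) + 1
    return counts


def uses_identity_h(pdf_bytes: bytes) -> bool:
    """True if any visible font dictionary names Identity-H as its encoding."""
    return "Identity-H" in encoding_names(pdf_bytes)


# Hypothetical stand-in for two font dictionaries; a real check would read
# the form template from disk with open(path, "rb").read().
sample = (
    b"<< /Type /Font /Subtype /Type0 /BaseFont /ArialMT /Encoding /Identity-H >>\n"
    b"<< /Type /Font /Subtype /TrueType /BaseFont /Helvetica "
    b"/Encoding /WinAnsiEncoding >>\n"
)

print(encoding_names(sample))
print(uses_identity_h(sample))
```

An empty result doesn't prove the form is free of Identity-H fonts, since the dictionaries may simply be compressed. A PDF library that parses the file properly would be more reliable.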

Since I've been generating PDFs with this app for 2+ years now (funny no
one noticed until now!), I guess I'll be regenerating a lot of PDFs if I
want them to be searchable. Luckily there's a function in the app for just
that purpose, but my server's going to hate me for having to do all that
work over again.

Hope that saves someone else's head and nearest wall from unnecessary
abuse.

Comments

Matthew Woodward said…
Turns out this did NOT fix the issue when the PDF form is populated by ColdFusion. We'll see what Adobe Support has to say because I'm at a loss. If I type into the PDF form manually the text is searchable, but if it's put there by CF it isn't.
