Sunday, November 18, 2007

An impending reality - the Patient Contributed Image Repository

Summary: A test site for the PCIR is now up and running, allowing contribution and downloading; feedback is sought on the idea, the contribution process, the contribution agreement, and the site design itself. Go to "".

Long Version.

As you may recall from an earlier blog entry, I have been exploring the feasibility of a repository to which members of the general public could contribute their own digital medical images. Rather than wait for some grand scheme involving multiple protagonists and sources of funding to come together, I thought that it might be easier just to "build it" in the hope that "they will come". The "they", in this case, being patients willing to contribute their data.

By leveraging some very simple tools, existing relatively cheap web and data hosting services and my own time and funds, this turned out to be relatively straightforward, at least for the initial pilot.

If you wish to take a look at the test site that I have created, go to "". Send any comments you have back to me at "".

The principles behind this site are straightforward, if perhaps somewhat naive:
  • that patients have an interest in promoting the common good;
  • that they can be convinced that contributing their own images to the public domain is for the common good;
  • that if only a modest level of effort is required they would be willing to do so;
  • that patients have sufficient basic computer skills, equipment and fast enough connections to do so;
  • that patients will be satisfied that their privacy will be protected;
  • that providing unrestricted downloading will disseminate the images to the most users;
  • most important of all, that the images will actually be useful.
As I said in the introduction, the currently deployed site is a test site, and this is the pilot phase of the project. Success criteria for the pilot include confirmation that:
  • there is sufficient interest in the concept to justify proceeding
  • the ease of use is within range of the target audience of non-medical, non-IT patient contributors
  • the level of effort to obtain images, with or without accompanying information, is feasible
  • the type of images and information collected will be of sufficient use
  • the concept of anonymous contribution to the public domain stands up to legal and ethical scrutiny
The next steps, if the pilot is successful, are to:
  • form the non-profit corporation to manage the effort,
  • have the "contribution agreement" tidied up by the lawyers,
  • start soliciting contributions of images and funds from the general public, and
  • engage the advocacy organizations in promoting and supporting the effort.
If you have any interest in assisting with any aspect of this, please contact me directly. All features of the site are accessible from the PCIR home page.


The basic approach is that:
  • patients agree to contribute their own images and documents TO THE PUBLIC DOMAIN
  • the PCIR receives and de-identifies their images and documents
  • the PCIR distributes the de-identified set to ANYONE WITHOUT RESTRICTION
Or, to put this another way, there is no "consent" to particular uses, and there are no "data use agreements".

The definition of "public domain" is somewhat nebulous in this context; it is well-defined with respect to "creative works" like art and music and literature (and even computer software), but it is unlikely that medical images are creative works. The term is also used in the context of patents and land. Perhaps there can be no formal definition of "public domain" with respect to medical images, or medical records in general, until the term is used in a legislative or common law context to apply to such things, or until it is declared that they should be treated as if they were creative works and subject to copyright (not that I am advocating the latter). Regardless, for the PCIR's purposes, the analogy to creative works may suffice to convey the intent of both the contributor and the PCIR in this regard.

Do patients even have the right to contribute their own images? For that matter, who actually owns them? Certainly in the US, the HIPAA Privacy Rule has clarified that patients have a right to a copy of their medical record, regardless of who owns the "original". This seems to be a general principle that spans international boundaries, including in Europe, where the Privacy Directive specifically addresses access rights in general, not just to medical records. We assume that medical images are to all practical intents and purposes also medical records; though some medical records departments in hospitals may deny this, that seems to be because they do not store the images (nor have a responsibility to); the radiology department does. The PCIR agreement in its current draft proposes that contributors do have such a right and are agreeing that they are not constrained in any manner from exercising it. This seems to be a reasonable strategy until somebody argues otherwise.


You may also be interested in some of the details of how this test site currently works.

To make maintenance of the informative web pages tractable, I use Apache Forrest. This tool, as discussed in a previous blog entry, allows one to construct the source form of the text, organization and external links in a simple XML format, and then to "build" the site using an appropriate "skin" to generate the look and feel. I can't really say enough good things about Forrest. I dare say many folks have commercial web site design tools that are more sophisticated and produce a more visually appealing result, but for the humble novice like myself who is more comfortable at the command line with a plain text editor, Forrest gets the job done.

The uploading tool that the patient uses when they decide to make a contribution is a Java applet. This approach was chosen because platform neutrality is a basic requirement; I dare say something Microsoft Windows specific would cover the majority of potential contributors, but I do not want to exclude anyone if possible. Using Java applets requires that the contributor's browser be both capable of running them and configured to allow them. The applet is invoked through an HTML page that will prompt the user's browser, or the user themselves if necessary, to install a sufficiently recent version of Java. The lowest level of JRE that will work is SE 5 (1.5), due to the need for support of various encryption features used by the applet.

The applet also requires access to resources on the local machine, both in order to read files and CDs to be uploaded, as well as to be able to transfer these over the network to the PCIR. This requires a signed applet, and for the user to agree to "trust" the applet. It seems to have become relatively commonplace nowadays for users to routinely click on "yes I trust you", pretty much regardless of the source of the applet.

It is possible to use a "self-signed certificate" to sign a Java applet and allow it to work, as long as the user does not mind seeing a message that the "digital signature has not been verified"; if one goes to the trouble of obtaining a legitimate code signing certificate from a certifying authority that is installed in the JRE, then the user instead sees a message that the "digital signature has been verified". The difference, frankly, is a little subtle. For the purposes of the test site, the applet used has been signed by PixelMed's verifiable certificate from Comodo.

[As an aside, getting such a code signing certificate is actually reasonably cheap and not too difficult; being inherently cheap, I searched long and hard using Google to find the lowest cost provider. Verisign charges a fortune for these; Thawte is not much cheaper. Comodo themselves have a relatively high price on their own site, but their reselling partners are generally much cheaper. I ended up using KSoftware; their price is right (USD $85 for one year), and though their web site sucks, and causes all sorts of browser error messages, and will not accept credit card numbers until you give up and use their PayPal payment method, eventually you get to the Comodo site and things go smoothly from there. Since PixelMed is a legitimate business entity already, I had no trouble providing the appropriate credentials (in this case a bank statement by fax) and got the certificate almost immediately. I was also worried about getting just a Microsoft Authenticode certificate, which is all Comodo offer, since I had read all sorts of early posts about how to convert the various certificate forms from one to another and into something that the Java jarsigner can use. I need not have worried; since I was using Firefox (on a Mac as it happens), when I picked up my certificate it got automatically saved in the browser's collection of certificates. All I needed to do was then "export" it (to a PKCS12 file, as it happens), specifying a password for that exported file that I would need every time I signed with it; it worked fine with jarsigner when I specified the exported file as the "-keystore" command line option and used the "-storetype pkcs12" option (though I am not sure whether the latter is strictly necessary). The CAcert Wiki was somewhat helpful in figuring out some of this.]

How does the applet manifest itself to the user? Well, when the user navigates to the page, it checks to see if they have agreed to the contribution agreement; if not, it asks them to do so, and then displays the applet in the page. The user can:
  • specify a reason for the exam that they are going to upload,
  • upload an entire CD (e.g., of DICOM images), or
  • upload selected image files (e.g., of scanned documents like reports)
Once they have chosen an upload option, a file dialog appears to allow them to choose what to upload, and then packaging, compression, encryption and transfer begins immediately. When the process is complete, they can upload more if they like.

You can try this out yourself by going to the PCIR upload page, and uploading your own images; please be sure that you really do agree to contribute these to the public domain if you do so.

What is happening behind the scenes is that:
  • on starting the applet, any existing session information (stored in local preferences) is checked, so as not to keep asking the user to re-agree
  • if it is a new session, then the agreement itself is downloaded from the web site (in order to keep the web site version and the applet displayed version in concordance) and rendered in a dialog box to the user; they must agree to it in order to proceed
  • when "enter reason" is clicked, a pop-up dialog is opened with buttons that have automatically generated tear off menus attached to them - these menus allow the user to choose from a pre-defined hierarchy (by category or by alphabetical nesting) of reasons for imaging exams (more about this later)
  • when upload disk or files is selected, a file chooser dialog appears; there are two reasons for the separate buttons: firstly, the default directory is different (e.g., the "My Computer" directory on Windows for CDs, or the "My Documents" folder on Windows for files); secondly, there is a well-known Java bug related to not being able to select entire drives under Windows
  • once the user has selected a CD or a set of files, these are packaged into a zip file, compressed whilst doing so, and encrypted using an AES symmetric cipher
  • the packaged, compressed and encrypted files are then transferred to the PCIR server, together with a copy of the symmetric key that has been RSA encrypted with the current PCIR uploading public key, as well as an encrypted copy of the contribution agreement; the received files are not accessible for downloading
Note that the files chosen by the user never leave their computer in an unencrypted form, satisfying a primary requirement of the upload process: protecting the contributor's privacy, which is of course of paramount concern.
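
For readers who want to see the shape of the package, compress, encrypt and wrap-the-key steps listed above, here is a minimal Java sketch. It is not the applet's actual code. The class and method names are hypothetical, and the specific cipher choices (AES in GCM mode with a 128 bit key, RSA with OAEP padding) are assumptions made for illustration; the description above says only that an AES symmetric cipher and an RSA public key are used. It also assumes a modern Java runtime rather than the SE 5 level the applet targets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: zip and AES-encrypt the files, then RSA-wrap the AES key. */
final class UploadPackageSketch {

    /** What would travel to the server; unreadable without the RSA private key. */
    static final class Sealed {
        final byte[] iv;          // random nonce for AES-GCM (not secret)
        final byte[] wrappedKey;  // AES session key, encrypted with the RSA public key
        final byte[] payload;     // zip archive, encrypted with the AES session key

        Sealed(byte[] iv, byte[] wrappedKey, byte[] payload) {
            this.iv = iv;
            this.wrappedKey = wrappedKey;
            this.payload = payload;
        }
    }

    /** Runs on the contributor's machine: only ciphertext ends up in the result. */
    static Sealed seal(Map<String, byte[]> files, PublicKey uploadPublicKey)
            throws GeneralSecurityException, IOException {
        // A fresh symmetric key for every upload (key size is an assumption).
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey sessionKey = generator.generateKey();

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding"); // mode is an assumption
        aes.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));

        // The zip stream compresses; the cipher stream beneath it encrypts the result.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(new CipherOutputStream(sink, aes))) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }

        // Wrap the session key so that only the private key holder can recover it.
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.WRAP_MODE, uploadPublicKey);
        return new Sealed(iv, rsa.wrap(sessionKey), sink.toByteArray());
    }

    /** Runs on the receiving side, where the RSA private key is held. */
    static Map<String, byte[]> open(Sealed sealed, PrivateKey uploadPrivateKey)
            throws GeneralSecurityException, IOException {
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.UNWRAP_MODE, uploadPrivateKey);
        Key sessionKey = rsa.unwrap(sealed.wrappedKey, "AES", Cipher.SECRET_KEY);

        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, sessionKey, new GCMParameterSpec(128, sealed.iv));

        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(
                new CipherInputStream(new ByteArrayInputStream(sealed.payload), aes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                files.put(entry.getName(), zip.readAllBytes()); // reads to the end of this entry
            }
        }
        return files;
    }
}
```

The point of the two-layer design is that the bulk data is encrypted with a fast symmetric cipher, while the slow public key operation is applied only to the short session key; the contributor's machine never needs to hold any long-term secret.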

Once uploaded, the files enter a manually supervised de-identification process and all images and documents are both mechanically and visually checked for leakage of identifiable information, which is then removed. This includes:
  • editing of the pixel data to remove burned in identification
  • removal of all text strings that contain identifying information
  • checking and removal of either all private attributes, or those that are unsafe
Dates and times are normalized to an epoch, and longitudinal contributions (exams for the same patient on different dates) maintain their relative temporal interval. Some effort is applied to detecting separate contributions for the same individual on different occasions, both by matching of one-way hash values derived from original identifiers, as well as through detection of persistent session information ("cookies") set in the user's computer's preferences (which is helpful only if they use the same computer and same account on it to perform successive uploads).
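
To make the date normalization and hash matching ideas concrete, here is a small Java sketch under stated assumptions. The class and method names are hypothetical, the epoch date is invented for the example, and anchoring the earliest exam to the epoch is only one plausible way to preserve intervals. Likewise, SHA-256 with a secret salt stands in for whatever one-way hash is really used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of two de-identification helpers. */
final class DeidentificationSketch {

    /** Invented for illustration; the actual epoch is not stated. */
    static final LocalDate EPOCH = LocalDate.of(2000, 1, 1);

    /**
     * Shifts every exam date by the same number of days so that the earliest
     * exam lands on EPOCH; intervals between exams are therefore unchanged.
     */
    static List<LocalDate> normalize(List<LocalDate> examDates) {
        List<LocalDate> shifted = new ArrayList<>();
        if (examDates.isEmpty()) {
            return shifted;
        }
        LocalDate earliest = Collections.min(examDates);
        long shiftDays = ChronoUnit.DAYS.between(earliest, EPOCH); // may be negative
        for (LocalDate date : examDates) {
            shifted.add(date.plusDays(shiftDays));
        }
        return shifted;
    }

    /**
     * Derives a one-way token from an original identifier. Two contributions
     * with the same identifier yield the same token, so they can be linked
     * without the identifier itself being retained. The salt is an assumed
     * extra that makes guessing identifiers from tokens harder.
     */
    static String linkageToken(String originalIdentifier, byte[] secretSalt) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(secretSalt);
            byte[] digest = sha256.digest(
                    originalIdentifier.trim().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, exams dated 1 May 2007 and 18 November 2007 would come out as 1 January 2000 and a date the same number of days later, so a reader of the de-identified set can still tell how far apart they were.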

On the download side, since this is only a test site, there is relatively little present; just a few examples. The primary requirement here is to make bulk downloading easy; no unwieldy "shopping cart" interfaces here. The de-identified exams are packaged up as a single set into bzip'd tar files. The reasoning for this is explained in the FAQ, but in short it is to make the most efficient use of the bandwidth and storage available, given that there is no need to "browse" or "visualize" individual images from the PCIR website itself; i.e., you need to download and unpackage the set to use them.

Entering Reasons and Other Conditions

One of the core issues with having patients contribute their own images is that those images would be more useful in context than alone. The PCIR site tries to encourage the contributor to also scan their radiology and pathology reports, but frankly, this may be too burdensome for many of them. With luck, some uploaded CDs may contain at least the radiology reports. A modest amount of information may occasionally be present in the DICOM image headers. As a fallback position, something that the patient themselves was willing to enter might be better than no information at all.

Accordingly, I put some effort into constructing a set of menus from which the patient could choose from a list of categories. After looking at a bunch of different available coding schemes, including SNOMED, ICD-9-CM and ICD-10-CM, I finally settled on the Medical Subject Headings (MeSH) used by the NLM to index articles in medical journals. MeSH seemed to offer a comprehensive range of terms without being too detailed, is not encumbered by expensive or nationally-specific licensing restrictions, can be downloaded in an easily processable XML form, and most importantly, was already organized into hierarchies that translated well into menus. Some massaging was required for the lay person (e.g., to turn words like "neoplasm" into "cancer"), and there were a few missing critical categories (such as for healthy screening exams).
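
As an illustration of why a hierarchy like this translates well into menus, here is a minimal Java sketch. MeSH records each heading's position with dotted "tree numbers", so splitting on the dots gives the path through nested menus. The class and method names are hypothetical, parsing of the MeSH XML is left out (the headings arrive as a plain map), and the only lay-term substitution shown is the "neoplasm" to "cancer" example mentioned above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: turn dotted tree numbers into a nested menu structure. */
final class ReasonMenuSketch {

    /** One menu entry; children are kept sorted by their tree number segment. */
    static final class Node {
        String label;
        final Map<String, Node> children = new TreeMap<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Replaces jargon with lay wording; a real table would be much longer. */
    static String layLabel(String heading) {
        return heading.replace("Neoplasm", "Cancer").replace("neoplasm", "cancer");
    }

    /**
     * Builds the menu tree. Keys are tree numbers such as "X01.100.200"
     * and values are the headings at those positions.
     */
    static Node build(Map<String, String> headingsByTreeNumber) {
        Node root = new Node("Reason for exam");
        // Sorted order visits "X01" before "X01.100", so parents come first.
        for (Map.Entry<String, String> entry
                : new TreeMap<>(headingsByTreeNumber).entrySet()) {
            Node cursor = root;
            for (String segment : entry.getKey().split("\\.")) {
                // A missing ancestor is created with its segment as a placeholder label.
                cursor = cursor.children.computeIfAbsent(segment, Node::new);
            }
            cursor.label = layLabel(entry.getValue());
        }
        return root;
    }

    /** Renders the tree as indented text, one line per menu entry. */
    static void print(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.label).append('\n');
        for (Node child : node.children.values()) {
            print(child, indent + "  ", out);
        }
    }
}
```

Feeding it entries with made-up tree numbers, say "X01" for "Pathological Condition, Sign or Symptom", "X01.100" for "Sign or Symptom" and "X01.100.200" for "Pain", produces a three-level path; each `Node` could then be mapped onto a Swing menu or menu item.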

Let me know what you think of the result, which you can test by going to the PCIR upload page and clicking "Enter Reason".



dickie said...

Great start David. I'm currently trying to drum up some images to post.

Eric said...

Hi David,

Nice idea. Couple of feature suggestions. First, some clarification on "Reason". I interpreted this to be the clinical indications which prompted the exam to be performed rather than a summary report after the fact. I went to upload images taken after a skiing accident. The reasons for the exam were a) Trauma and b) Pain. All the terms provided from the NLM index seemed to be diagnostic rather than indicative in nature; certainly I couldn't find anything to fit my category. Second, while you may get a fair number of uploads from "imaging groupies", I suspect the general public would be more inclined to upload their own images if they got some visibility into how those images were subsequently used. It is one thing to throw your images into a black hole knowing that somebody might use them but not having any visibility into how or when. It is another thing to know "My images were used in a study looking at [name your interesting subject]". While providing such tracking is a set of functionality that would substantially increase the complexity, I think you're likely to get a much greater response from the general public if they feel a sense of participation in specific research.

David Clunie said...

Hi Eric

The reason menu is a bit convoluted; there is actually a "pain" entry, which can be found alphabetically, or by category under "Pathological Condition, Sign or Symptom", then "Sign or Symptom". Trauma is classified as "injury", and in this case, specifically "leg injury"; by category this can be found under "Disorder of Environmental Origin" (!), then "Wound or Injury", etc. Obviously the top level category needs to be made more user friendly!

Also, I noticed that your upload did not run to completion; did you cancel it deliberately, or was there an error, or did you get the impression that it did work?

As to your second point, yes, I had planned to include some information about projects that actually use the data - e.g., article references and web links to projects that credit the PCIR as the source of their material (even though such credit is not a pre-requisite for downloading); but I had not planned to go so far as to link it to specific contributions; indeed, quite the converse - the nature of the contribution process is intended to be as anonymous as possible - once a contribution has been made and the set de-identified, there will be no record kept of where it came from or who contributed it, to minimize risks of privacy breaches.

In particular, there is no way of contacting the contributor (e.g., I cannot contact you directly to tell you that your upload failed).

Thanks for testing the site ... David

Eric said...

Hi David, The upload was left running for about 15 or 20 minutes. It only appeared to make progress during the first few seconds, then progressed only by a few 10s of bytes per minute. Estimate was sitting at 4 hours for the upload, and growing. So I canceled. I was uploading a whole disc, which was IHE portable media compliant (efilm generated). After your de-id and compression steps ran, it was about 80 MB to upload.

David Clunie said...

Hi Eric

Thought it might be something like that. The instructions do suggest leaving the upload to run overnight!

Seriously though, we were having bandwidth problems on the upload server yesterday that seem to be resolved now; feel free to try again, since it should not take that long.