Saturday, July 6, 2013

Pre-Fetching: Zombie Apocalypse or Nirvana?

Summary: Pre-fetching is back, driven by sluggish access to cloud-based archives and the need for a "local cache".

Long Version.

Like characters in a bad horror movie, or an eighties band, pre-fetching is back, resurrected from the dead (if it ever was truly dead).

For a while, with the concept of "all images spinning all the time for all users", we thought we were on a roll in terms of on-demand access. Assuming all those images were spinning "locally", that is. Tape and optical disk were going the way of the dodo, and we didn't have to listen to StorageTek marketing presentations about hierarchical storage masquerading as scientific abstracts at SPIE and SCAR (SIIM) any more. Worst case, one could approach image egalitarianism, i.e., all image access equally fast (or slow) for everyone, if one also made equal bandwidth available.

Not so, it would seem.

When the HIPAA Security rule required everyone in the US to have a means of disaster recovery, and reliable off-site archives came into vogue, it was not expected that these archives would necessarily have on-demand access performance, though it created an obvious opportunity for off-site access. Likewise with the DI-r's in Canada. But nowadays the distinction between the off-site archive and the only archive you have is becoming blurred, as everyone jumps on the "cloud" (aka. Software as a Service (SaaS), or Storage as a Service (STaaS), formerly Application Service Provider (ASP)) bandwagon, based on the naive assumption that if it is good for streaming movies on your smart phone or tablet, the "cloud" must be good for everything else too.

The aggressive marketing of the Vendor Neutral Archive (VNA) concept, often implemented as, or confounded with, cloud storage, has resulted in the introduction of another "layer" between the PACS user and where the images are, in some cases.

Some disks (and disk arrays) and their interfaces are also cheaper, and potentially slower, than others, so even in the absence of awful media like tape and optical disk, the concept of different "tiers" of storage performance (in terms of either access or, in some cases, reliability) has not gone away either. Obsession with regulatory and legal issues has led many people to initially purchase far more expensive storage than is perhaps the minimum necessary to do the "caring for the patient" part of the job, and left a nasty (expensive) taste in some customers' mouths. Regardless, it is hard to argue with the economies of scale a provider like Amazon might be able to obtain (as long as it wasn't branded "medical", aka. unnecessarily regulated, excessively expensive, and ripe for profit-taking).

Anyhow, the buzzword du jour, much bandied about at the last SIIM, was "local cache". I.e., the images that you can access in reasonable time because they live on site and are optimized for performance, and perhaps are already "inside" your PACS and don't need to be retrieved from some other person's product (like a VNA). As opposed to those that are not, for which access performance may suck. Even if you don't have a PACS per se, or access images through it, but perhaps use a (buzzword alert) "universal viewer", the performance difference between images cached in a local server rather than pulled from off-site on demand may be "noticeable", to put it mildly.

I was interested in a comment from someone (can't remember who it was, or what system or architecture they were using), who reported that a colleague genuinely thought that the "A" flag in their study browser stood for "Absent". Apparently it really stands for "Archived", but they drew their own conclusion based on their experience. [Update: Skip Kennedy claims responsibility for telling me this :)]

So, whether you want one or not, it sounds like a "local cache" is in your future, if you don't already have one, whether it be for radiologists' priors or for other users' access to contemporary or older procedures.

How do images get into such a cache in the first place? If the cache is the PACS, the obvious way is to keep the recent stuff, i.e., stuff that was recently acquired, or imported from CD or received from outside for contemporary patient care events (even if they are in the ED or the clinic and have nothing to do with radiology, i.e., are not read again). If the cache is not the PACS, but some pseudo-pod of the off-site archiving allowed to extrude into your local area network (i.e., the on-site box bit of the off-site archiving solution), then likewise, anything recent can be routed to it. But the PACS or local box may fill up, and hence a purging strategy is required (assuming failure and buying more disks are not options, which this discussion presupposes). Not every PACS can do this but let's assume it can. It might even do so intelligently (e.g., purge dead people (assuming Haley Joel Osment doesn't take up radiology), adults, acute not chronic conditions, etc.), but that is a digression.
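The "intelligent purge" heuristics mentioned above can be sketched in a few lines. This is a hypothetical illustration, not any real PACS's API: the field names (`patient_deceased`, `chronic_condition`, etc.) and the weights are invented, and a real system would draw these attributes from the RIS or EMR.

```python
from datetime import date

# Hypothetical study metadata; field names and values are illustrative only.
studies = [
    {"uid": "1.2.3", "study_date": date(2013, 7, 1), "patient_deceased": False,
     "patient_age": 8, "chronic_condition": True, "size_gb": 0.5},
    {"uid": "1.2.4", "study_date": date(2009, 1, 15), "patient_deceased": True,
     "patient_age": 80, "chronic_condition": False, "size_gb": 2.0},
]

def purge_priority(study, today=date(2013, 7, 6)):
    """Higher score = purge first. A crude sketch of the heuristics above:
    age of study, deceased patients, adults, acute rather than chronic."""
    score = (today - study["study_date"]).days      # older studies go first
    if study["patient_deceased"]:
        score += 10000                              # deceased patients won't return
    if study["patient_age"] >= 18:
        score += 1000                               # keep pediatric priors longer
    if not study["chronic_condition"]:
        score += 500                                # acute findings age out faster
    return score

def studies_to_purge(studies, gb_needed):
    """Purge the highest-priority studies until enough space is reclaimed."""
    reclaimed, victims = 0.0, []
    for s in sorted(studies, key=purge_priority, reverse=True):
        if reclaimed >= gb_needed:
            break
        victims.append(s["uid"])
        reclaimed += s["size_gb"]
    return victims
```

The point is only that a purge need not be blind least-recently-used; any attribute the cache can see is fair game for the policy.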

Sooner or later the priors that are potentially useful for new procedures or for clinical care will be purged and access will be slow or non-existent. Enter the pre-fetcher, which tries to bring some intelligence to bear (?bare) on the problem of what to fetch back and when, and hopefully do it in time. The literature from the 1990s and early 2000s is replete with articles about this (just search the SCAR/SIIM, CARS, SPIE Medical Imaging conference proceedings, journals like JDI and even RadioGraphics, as well as textbooks like Bernie Huang's). If you are interested, a couple of classics are Levin and Fielding from SPIE MI 1990, Siegel and Reiner JDI 1998, Andriole et al JDI 2000, Bui et al in JAMIA 2001, and the work of Olivia Sheng's group and Okura et al JDI 2002 on artificial intelligence methods. Approaches range from the simple expedient of the age of the study, through using the modality, the body part or the clinical question. The relevance of the body part in particular will be discussed in a follow-up post here, and was my motivation for addressing this topic in the first place.
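Combining age, modality and body part into a relevance score, as those approaches do, can be sketched as follows. The weights, the affinity table and the decay rule are invented for illustration; they are not taken from any of the cited papers.

```python
# Invented affinity table: how useful a prior of one modality is when
# reading a new study of another modality.
MODALITY_AFFINITY = {
    ("CT", "CT"): 1.0, ("MR", "CT"): 0.7, ("CR", "CT"): 0.4,
}

def prior_relevance(new_study, prior):
    """Score a candidate prior: same body part matters most, then modality
    affinity, with a simple decay by age in years (all weights illustrative)."""
    score = 0.0
    if prior["body_part"] == new_study["body_part"]:
        score += 2.0
    score += MODALITY_AFFINITY.get((prior["modality"], new_study["modality"]), 0.1)
    score /= 1.0 + prior["age_years"]   # older priors are less likely to be hung
    return score

def prefetch_list(new_study, priors, limit=3):
    """Return the UIDs of the top-scoring priors to retrieve ahead of reading."""
    ranked = sorted(priors, key=lambda p: prior_relevance(new_study, p), reverse=True)
    return [p["uid"] for p in ranked[:limit]]
```

A rules engine, or the AI methods of Sheng's group and Okura et al, would replace the hand-tuned weights, but the shape of the problem is the same: rank candidates, fetch the top few.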

One of the important things to bear in mind is that pre-fetching is relevant not just for radiologists' priors before reporting the current procedure. It is also important for the clinicians, who may well be interested to know, even outside the context of a current radiological procedure, that other procedures have been performed in the past, whether locally or at other facilities, and want to access them without delay at the time of patient consultation, or surgery or some other intervention. Figuring out what is relevant for a clinician may be considerably more complicated (to optimize) in some of these scenarios than finding priors for radiology reporting, and some of the systems in these users' offices may be much less robust. In particular, local cache sizes and bandwidth may be relatively low, and so not only is fast on-demand access for large studies like whole body CTs, PETs and breast tomosynthesis challenging, but excessive pre-fetching of all images for every scheduled patient encounter may overwhelm resources, and hence needs to be selective and optimized. An interesting twist to this pre-fetching scenario is that there may be no RIS involved and hence no access to certain events and information; on the other hand the report will likely have been completed and more information may be available from the EMR/EHR/PHR.
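Selective pre-fetching under a constrained clinician-side cache amounts to a budgeting problem, which can be sketched greedily. Everything here is a hypothetical illustration (field names, sizes, budget), standing in for whatever relevance ranking the site actually uses.

```python
# Budget-constrained pre-fetch for a small local cache: rather than pulling
# everything for every scheduled encounter, take the most relevant studies
# that still fit within a byte budget. All field names are illustrative.

def plan_prefetch(candidates, budget_gb):
    """Greedy selection by relevance, skipping studies that would overflow
    the budget (so a huge tomosynthesis study doesn't crowd out everything)."""
    plan, used = [], 0.0
    for c in sorted(candidates, key=lambda c: c["relevance"], reverse=True):
        if used + c["size_gb"] <= budget_gb:
            plan.append(c["uid"])
            used += c["size_gb"]
    return plan
```

Greedy selection is not optimal in the knapsack sense, but it is cheap, predictable, and easy to explain to the person whose disk it is filling.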

Another SIIM theme this year, the decomposition of traditional PACS into its various component parts, archive, display and workflow, for example, seems to be well under way, with new hardware and software technology being brought to bear on classical problems, or having to leverage classical solutions. Hopefully lessons learned in the 1990's will be effectively reapplied, rather than needing to be reinvented. New factors, such as the ability to pre-fetch from central repositories and other facilities, will add interesting challenges, or opportunities if you choose to look at them that way. Likewise, the PACS migration problem potentially overlaps with pre-fetching when the decision is made to migrate patients or studies only on anticipated need, rather than all in advance.

Don't forget though, that "A" should be for "Accessible" not "Absent", and whether it is "Archived" or not should be irrelevant to the users' experience.

It is good to know that accessibility Nirvana (the goal, not the band) is just around the corner, once again.


PS. And yes, before you comment about it, I know about "server side rendering", and about Citrix, and why sometimes the images don't have to live locally, if these mechanisms float your boat.

PPS. Just for clarity, I am obviously not talking about the use of the term "cache" in the HTTP protocol sense, by which means, as Jim Philbin regularly reminds us, non-specific "stuff" that has not changed in its content can be served up closer to where it is needed by various caching proxies using technology that has nothing to do with medical imaging applications. This is one of the major justifications for the WADO-RS DICOM stuff that grew out of the MINT project. Though, of course, if it hasn't been pre-fetched, it won't have been seen by the HTTP caches recently either, and even if it has been pre-fetched, it still might not be cached in the intervening proxies on the way to the user.
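Since DICOM composite instances are immutable once created, they are in principle ideal candidates for that generic HTTP-style caching. A sketch of what that might look like on the wire, with invented header values (the ETag derivation and max-age choice are illustrative, not prescribed by the WADO-RS specification):

```python
# A minimal sketch of HTTP-sense caching of an immutable DICOM instance.

def cache_headers_for_instance():
    """Headers a WADO-RS server might emit for an instance that never changes.
    The specific values here are assumptions for illustration."""
    return {
        "Cache-Control": "public, max-age=31536000, immutable",
        "ETag": '"1.2.840.113619.2.55.3"',  # e.g., derived from the SOP Instance UID
    }

def is_fresh(cache_control, age_seconds):
    """Can an intervening proxy serve its copy without revalidating?
    (Simplified freshness logic in the spirit of RFC 7234.)"""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return age_seconds < int(directive.split("=", 1)[1])
    return False
```

Which is exactly why, as noted above, a proxy that has never seen the instance can't help: freshness only matters once the bytes have flowed through it at least once.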

1 comment:

Unknown said...

Very good point! This problem has existed for a while. Dicom Systems helps our clients deal with it by applying post-C-FIND rules to get the specific data they need, and issuing C-MOVE only for the studies that are needed based on the body part or other returned tags. We also trigger this based on HL7 orders.
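The approach the comment describes — query broadly, filter the returned attributes locally, and only retrieve the matches — can be sketched without the network plumbing. The dicts below stand in for the attributes a real C-FIND response would carry; the function name and rule structure are hypothetical.

```python
# Sketch of 'post C-FIND rules': filter study-level query results by
# returned tags (body part, modality) before issuing any C-MOVE requests.

def studies_to_move(cfind_results, wanted_body_parts, wanted_modalities=None):
    """Select the StudyInstanceUIDs worth a C-MOVE, based on returned tags.
    Each result is a dict standing in for a C-FIND response dataset."""
    selected = []
    for ds in cfind_results:
        if ds.get("BodyPartExamined") not in wanted_body_parts:
            continue
        if wanted_modalities and ds.get("Modality") not in wanted_modalities:
            continue
        selected.append(ds["StudyInstanceUID"])
    return selected
```

In a real deployment the inbound HL7 order would supply `wanted_body_parts`, and the selected UIDs would be handed to the C-MOVE SCU.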