Thursday, September 12, 2013

What Template is that?

Summary: Determining what top-level template, if any, has been used to create a DICOM Structured Report can be non-trivial. Some SOP Classes require a single template, and an explicit Template ID is supposed to always be present; if it isn't, the coded Document Title is a starting point, but it is not always unambiguous.

Long Version.

When Structured Reports were introduced into DICOM (Supplement 23), the concept of a "template" was somewhat nebulous, and was refined over time. Accordingly, the requirement to specify which template was used, if any, to author and format the content, was, and has remained, fairly weak.

The original intent, which remains the current intent, is that if a template was used, its identity should be explicitly encoded. A means for doing so is the Content Template Sequence. Originally this could potentially be encoded at any content item, but that was later clarified by CP 452. In short, the identification applies only to CONTAINER content items, and in particular to the root content item, and consists of a mapping resource (DCMR, in the case of templates defined in PS 3.16) and a string identifier.

The requirement on its presence is:

"if a template was used to define the content of this Item, and the template consists of a single CONTAINER with nested content, and it is the outermost invocation of a set of nested templates that start with the same CONTAINER"

Since the document root is always a container, whenever one of the templates that defines the entire content tree of the SR is used, then by definition, an explicit Template ID is required to be present.
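To make the root-level check concrete, here is a minimal sketch using a hypothetical in-memory model of the content tree (plain dicts whose keys are loosely named after the DICOM attributes; this is not any real toolkit's API). It just reads the (Mapping Resource, Template Identifier) pair from the root CONTAINER, if the producer encoded one.

```python
from typing import Optional, Tuple

# Hypothetical in-memory model of an SR content tree (not a real toolkit API):
# each content item is a dict whose keys loosely mirror DICOM attributes.
#   "ValueType"       -> Value Type, e.g. "CONTAINER", "CODE", "NUM"
#   "ConceptName"     -> (CodeValue, CodingSchemeDesignator) of the concept name
#   "ContentTemplate" -> (MappingResource, TemplateIdentifier), if encoded
#   "ContentSequence" -> list of child content items


def explicit_root_template(root: dict) -> Optional[Tuple[str, str]]:
    """Return the (MappingResource, TemplateIdentifier) pair at the root, or None."""
    if root.get("ValueType") != "CONTAINER":
        raise ValueError("the root content item of an SR is always a CONTAINER")
    return root.get("ContentTemplate")  # None when the producer omitted it


report = {
    "ValueType": "CONTAINER",
    "ConceptName": ("122292", "DCM"),
    "ContentTemplate": ("DCMR", "3202"),
    "ContentSequence": [],
}
print(explicit_root_template(report))  # ('DCMR', '3202')
```

Returning None rather than guessing keeps the "explicit ID missing" case visible to the caller, which is exactly the case the rest of this post is about.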

That said, though most SR producers seem to get this right, sometimes the Template ID is not present, which presents a problem. I don't think this can be excused by lack of awareness of the requirement, or by failure to notice CP 452 (from 2005), since the original requirement in Sup 23 (2000) read:

"Required if a template was used to define the content of this Item".

Certainly CP 452 made things clearer, though, in that it amended the definition to apply not only to the content item itself but also to "its subsidiary" content items.

Some SR SOP Classes define a single template that shall be used; the Key Object Selection (KOS) Document is one example, and the CAD family (Mammography, Chest and Colon CAD) are others. So, even if an explicit Template ID is not present, the expected template can be deduced from the SOP Class. Sometimes, though, such instances are encoded as a generic (e.g., Comprehensive) SR, perhaps because an intermediate system did not support the more specific SOP Class, and so one still needs to check for the template identifier.
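The deduction from the SOP Class amounts to a lookup table, with an explicit DCMR Template ID (when present) taking precedence. The sketch below assumes the same hypothetical dict model of the root content item; the SOP Class UIDs and TIDs are as I understand them from PS 3.4 and PS 3.16 and should be checked against the current standard before being relied on.

```python
from typing import Optional

# SR SOP Classes that mandate a single root template. The UIDs and TIDs are as I
# recall them from PS 3.4 and PS 3.16; verify against the current standard.
SOP_CLASS_TO_TID = {
    "1.2.840.10008.5.1.4.1.1.88.59": "2010",  # Key Object Selection Document
    "1.2.840.10008.5.1.4.1.1.88.50": "4000",  # Mammography CAD SR
    "1.2.840.10008.5.1.4.1.1.88.65": "4100",  # Chest CAD SR
    "1.2.840.10008.5.1.4.1.1.88.69": "4120",  # Colon CAD SR
}


def root_template(sop_class_uid: str, root: dict) -> Optional[str]:
    """Prefer an explicit DCMR Template ID; otherwise infer it from the SOP Class."""
    # Hypothetical key: (MappingResource, TemplateIdentifier) of the root CONTAINER.
    explicit = root.get("ContentTemplate")
    if explicit is not None and explicit[0] == "DCMR":
        return explicit[1]
    # Generic classes (e.g. Comprehensive SR) are absent from the table, so an
    # instance of one without an explicit ID yields None and needs more heuristics.
    return SOP_CLASS_TO_TID.get(sop_class_uid)


COMPREHENSIVE_SR = "1.2.840.10008.5.1.4.1.1.88.33"
print(root_template("1.2.840.10008.5.1.4.1.1.88.59", {}))  # 2010
print(root_template(COMPREHENSIVE_SR, {}))                  # None
```

A KOS or CAD instance that has been re-encoded as Comprehensive SR falls through to None here, which is the situation where one has to look inside the tree instead.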

In the absence of a specific SOP Class or an explicit template identifier, what is a poor recipient to do? One clue can be the concept name of the top-level container content item, which is always coded and always present, and which is referred to as the "document title". In many cases, within the scope of PS 3.16, the same coded concept is used only for a single root template. For example, (122292, DCM, "Quantitative Ventriculography Report") is used only for TID 3202. That's helpful, at least as long as nobody other than DICOM (a vendor, say) has re-used the same code to head a different template.

Other situations are more challenging. The basic diagnostic reporting templates, e.g., TID 2000, 2005 or 2006, are encoded in generic SOP Classes, and furthermore they don't have a single or unique code for the document title; rather, any code can be used, and a defined set of them is drawn from LOINC, corresponding to common radiological procedures. It is not at all unlikely that some completely different template might be used with the same code as (18747-6, LN, "CT Report") or (18748-4, LN, "Diagnostic Imaging Report"), for instance.
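One way to keep the difference between a unique and an ambiguous Document Title explicit is to map each title code to a set of candidate root templates and only commit when the set has exactly one member. The table below is hypothetical and deliberately tiny, built only from the examples in the two preceding paragraphs; a real one would have to be generated from PS 3.16, and even then could not rule out private re-use of a code.

```python
from typing import FrozenSet, Optional, Tuple

Code = Tuple[str, str]  # (CodeValue, CodingSchemeDesignator)

# Hypothetical, deliberately tiny table built only from the examples in the text;
# a real one would be derived from PS 3.16, and still could not rule out a vendor
# re-using one of these codes to head a different template.
TITLE_TO_TIDS = {
    ("122292", "DCM"): frozenset({"3202"}),
    ("18747-6", "LN"): frozenset({"2000", "2005", "2006"}),
    ("18748-4", "LN"): frozenset({"2000", "2005", "2006"}),
}


def candidates_from_title(title: Code) -> FrozenSet[str]:
    """All root templates known to use this Document Title (empty if unknown)."""
    return TITLE_TO_TIDS.get(title, frozenset())


def template_from_title(title: Code) -> Optional[str]:
    """Return a TID only when the Document Title points to exactly one template."""
    candidates = candidates_from_title(title)
    return next(iter(candidates)) if len(candidates) == 1 else None


print(template_from_title(("122292", "DCM")))   # 3202
print(template_from_title(("18747-6", "LN")))   # None (ambiguous)
```

Keeping the candidate set around, rather than collapsing it to a single guess, lets later heuristics (such as looking at specific child content items) narrow it down.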

One case of interest demonstrates that, in the absence of an explicit Template ID, even a specific SOP Class and a relatively specific Document Title are insufficient. For Radiation Dose SRs, the same SOP Class is used for both CT and Projection X-Ray. Both TID 10001 Projection X-Ray Radiation Dose and TID 10011 CT Radiation Dose have the same Document Title, (113701, DCM, "X-Ray Radiation Dose Report").

One can go deeper into the tree, though. One of the children of the Document Title content item is required to be (121058, DCM, "Procedure reported"). For a CT report, it is required to have the enumerated value (P5-08000, SRT, "Computed Tomography X-Ray"), whereas for a Projection X-Ray report, it may have a value of (113704, DCM, "Projection X-Ray") or (P5-40010, SRT, "Mammography"), or something else, because these are defined terms.
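The rule just described can be written as a small heuristic, sketched here against a hypothetical dict model of the content tree (keys loosely named after the DICOM attributes). Because the CT template constrains "Procedure reported" to a single enumerated value while the projection template uses defined terms, the sketch treats any value other than the CT code as indicating TID 10001; that inference is my assumption about how to read the defined terms, not something the standard states.

```python
from typing import Optional

X_RAY_DOSE_REPORT = ("113701", "DCM")   # shared Document Title of TID 10001 and 10011
PROCEDURE_REPORTED = ("121058", "DCM")
CT_PROCEDURE = ("P5-08000", "SRT")      # the value TID 10011 requires


def dose_report_template(root: dict) -> Optional[str]:
    """Heuristically tell TID 10011 (CT) from TID 10001 (Projection X-Ray)."""
    if root.get("ConceptName") != X_RAY_DOSE_REPORT:
        return None
    for child in root.get("ContentSequence", []):
        if child.get("ValueType") == "CODE" and child.get("ConceptName") == PROCEDURE_REPORTED:
            # CT is constrained to one enumerated value; the projection template
            # uses defined terms, so any other value is taken to mean TID 10001.
            return "10011" if child.get("ConceptCode") == CT_PROCEDURE else "10001"
    return None  # "Procedure reported" absent: no basis for a decision


def dose_root(procedure):
    """Build a minimal hypothetical dose report root with one Procedure reported item."""
    return {
        "ValueType": "CONTAINER",
        "ConceptName": X_RAY_DOSE_REPORT,
        "ContentSequence": [
            {"ValueType": "CODE", "ConceptName": PROCEDURE_REPORTED, "ConceptCode": procedure},
        ],
    }


print(dose_report_template(dose_root(CT_PROCEDURE)))           # 10011
print(dose_report_template(dose_root(("P5-40010", "SRT"))))    # 10001
```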

So, in short, at the root level the absence of a Template ID is not the end of the world, and a few heuristics may allow a recipient to proceed.

Indeed, if one is expecting a particular pattern based on a particular template, and that pattern "matches" the content of the tree that one has received, does it really matter? It certainly makes life easier, though, to match a top-level identifier than to have to write a matching rule for the entire tree.
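To give a feel for what "writing a matching rule" involves, here is a minimal structural matcher over a hypothetical dict model of the content tree. A pattern names a required value type, optionally a concept name, and child patterns, each of which must be satisfied by at least one child item. It deliberately ignores order, cardinality, relationship types and conditions, all of which a faithful template matcher would have to handle.

```python
# A minimal structural matcher over a hypothetical dict model of an SR tree
# (keys loosely named after DICOM attributes). Each child pattern must be
# satisfied by at least one child item; order, cardinality, relationship types
# and template conditions are ignored, unlike a real template.

def matches(item: dict, pattern: dict) -> bool:
    if item.get("ValueType") != pattern["ValueType"]:
        return False
    if "ConceptName" in pattern and item.get("ConceptName") != pattern["ConceptName"]:
        return False
    children = item.get("ContentSequence", [])
    return all(any(matches(c, p) for c in children) for p in pattern.get("children", []))


# Hypothetical fragment of what one might expect of an X-Ray Radiation Dose Report.
DOSE_PATTERN = {
    "ValueType": "CONTAINER",
    "ConceptName": ("113701", "DCM"),
    "children": [{"ValueType": "CODE", "ConceptName": ("121058", "DCM")}],
}

received = {
    "ValueType": "CONTAINER",
    "ConceptName": ("113701", "DCM"),
    "ContentSequence": [
        {"ValueType": "CODE", "ConceptName": ("121058", "DCM"),
         "ConceptCode": ("113704", "DCM")},
    ],
}
print(matches(received, DOSE_PATTERN))  # True
```

Even this toy version shows the asymmetry: the pattern has to be spelled out node by node, whereas an explicit Template ID is a single comparison.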

Related to the matter of the identification of the "root" or "top level" template is that of recognizing subordinate or "mini" templates. As you know, most of PS 3.16 is taken up not by monstrously long single templates but rather by invocation of sub-templates. So there are sub-templates for identifying things, measuring things, etc. These are re-used inside lots of application-specific templates.

Certainly "top-down" parsing from a known root template takes one to content items that are expected to be present because of the "inclusion" of one of these sub-templates. These are rarely, if ever, explicitly identified by a Template ID during creation, even though one could interpret that as a requirement if the language introduced in CP 452 is taken literally. Not all "included" sub-templates start with a container, but many do. I have to admit that most of the SRs that I create do not contain Template IDs below the Document Title either, and I should probably revisit that.
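Where a creator has bothered to identify sub-templates, a recipient can simply collect every explicit Template ID in the tree, not just the one at the root. This sketch assumes a hypothetical dict model of the content tree and reports each hit with a path of 1-based indices starting at the root, in the style of a Referenced Content Item Identifier; the private "99MYORG" template in the example is invented for illustration.

```python
from typing import Iterator, Tuple


def explicit_template_ids(item: dict, path: Tuple[int, ...] = (1,)) -> Iterator[tuple]:
    """Yield (path, (MappingResource, TemplateIdentifier)) for every item that has one.

    Paths count from 1 at the root, in the style of a Referenced Content Item Identifier.
    """
    if item.get("ContentTemplate") is not None:
        yield path, item["ContentTemplate"]
    for index, child in enumerate(item.get("ContentSequence", []), start=1):
        yield from explicit_template_ids(child, path + (index,))


report = {
    "ValueType": "CONTAINER",
    "ContentTemplate": ("DCMR", "3202"),
    "ContentSequence": [
        {"ValueType": "CODE"},
        # Hypothetical sub-template container that its creator chose to identify.
        {"ValueType": "CONTAINER", "ContentTemplate": ("99MYORG", "SUB-1"),
         "ContentSequence": [{"ValueType": "NUM"}]},
    ],
}
for path, template in explicit_template_ids(report):
    print(path, template)
# (1,) ('DCMR', '3202')
# (1, 2) ('99MYORG', 'SUB-1')
```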

Why might one want to be able to recognize such a sub-template?

One example is being able to locate and extract measurements or image coordinate references, regardless of where they occur in some unrecognized root template. An explicit Template ID might be of some assistance in such cases, but pattern matching of sub-trees can generally find these pretty easily too. When annotating images based on SRs, for example, I will often just search for all SCOORDs, and explore the neighboring content items to find labels and measurements to display. Having converted an SR to an XML representation also allows one to use XSLT template match patterns and XPath expressions to select even complex patterns, without requiring an explicit ID.
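The "find all SCOORDs and look around" approach can be sketched as a tree walk that yields each SCOORD together with its chain of ancestors, from which nearby labels and measurements can be picked up. This assumes a hypothetical dict model of the content tree; the "99LOCAL" concept codes and the tree shape are made up for illustration, and the walk plays the role that an XPath selection would play over an XML rendering (no particular XML schema is assumed).

```python
from typing import Iterator, List, Tuple


def scoords_with_context(item: dict, ancestors: tuple = ()) -> Iterator[Tuple[dict, tuple]]:
    """Yield (scoord, lineage) for every SCOORD; lineage runs from the root to its parent."""
    lineage = ancestors + (item,)
    for child in item.get("ContentSequence", []):
        if child.get("ValueType") == "SCOORD":
            yield child, lineage
        yield from scoords_with_context(child, lineage)


def nearby_text(lineage: tuple, levels: int = 2) -> List[str]:
    """Collect TEXT values that are children of the nearest `levels` ancestors."""
    texts = []
    for ancestor in reversed(lineage[-levels:]):
        texts += [c["TextValue"] for c in ancestor.get("ContentSequence", [])
                  if c.get("ValueType") == "TEXT"]
    return texts


# Hypothetical measurement: a NUM whose region is given by a child SCOORD, next to
# a TEXT label. Concept codes in the "99LOCAL" scheme are made up for illustration.
report = {
    "ValueType": "CONTAINER",
    "ContentSequence": [
        {"ValueType": "CONTAINER", "ContentSequence": [
            {"ValueType": "TEXT", "ConceptName": ("LABEL", "99LOCAL"), "TextValue": "Lesion 1"},
            {"ValueType": "NUM", "ConceptName": ("LENGTH", "99LOCAL"),
             "NumericValue": 12.5, "Units": "mm",
             "ContentSequence": [
                 {"ValueType": "SCOORD", "GraphicType": "POLYLINE",
                  "GraphicData": [10.0, 10.0, 20.0, 17.5]},
             ]},
        ]},
    ],
}

for scoord, lineage in scoords_with_context(report):
    measurement = lineage[-1]
    print(nearby_text(lineage), measurement.get("NumericValue"), measurement.get("Units"))
# ['Lesion 1'] 12.5 mm
```

How far up to look (the `levels` parameter here) is a judgment call that depends on how the producing templates nest their content.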

David


1 comment:

J. Riesmeier said...

Hi David,

nice article, which summarizes the state of the art in DICOM. But don't you think that it would be helpful to enhance the template identification in the standard?

Many years ago, I was working on SR template detection during my PhD thesis (dissertation). To me it still seems that the template mechanism has been defined in a way that mainly facilitates the creation of SR documents, but not the "consumption".

Unfortunately, the full text of my dissertation is available only in German, but you might be aware that I wrote a SPIE Medical Imaging paper on the main aspects back in 2005/2006: "A unified approach for the adequate visualization of structured medical reports" (see http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1277221 ).

Some of the suggestions that came out of my work were (taken from the SPIE paper):

=== CUT ===

- Simplify the template definitions: The complexity which is caused by the large flexibility of the template structures should be reduced in certain aspects without limiting the basic expressiveness. For example, the order of content items should be mandatory for most templates [11]. Furthermore, each template should contain at least one mandatory content item which has to be defined in the first row of the table. Additional content items should be appended after the last row of the particular level.

- Formalize the template definitions: The automatic detection of templates which have been used to create a document requires that the template definitions are available in an unambiguous, machine-readable format. However, in particular the conditions and value set constraints in the standard currently need appropriate interpretation. Further formalization in this field would definitely help to improve the reliability of the detection process.

- Enhance the template identification: Currently, the standard only allows for marking templates in the document tree which have a single content item with the type “CONTAINER” in the topmost level. In our opinion, it would be more reasonable to mark each content item with the identifier of the template, probably also with the row number from the template table. Alternatively, a new value type “TEMPLATE” could be introduced that could be used to encapsulate certain sub-structures without changing the real content of the document.

=== CUT ===

[11] refers to CP-463: Clarify that order of template rows is significant

In the discussion section, I was also thinking about introducing a way of standardizing the visualization of SR documents:

=== CUT ===

In analogy to the existing Softcopy Presentation State that is used to specify how medical images should be displayed, a Structured Reporting Presentation State (SRPS) could be introduced. That way, the document creator would be able to specify the preferred way of presenting the document to the observer. As with DICOM images, there could be multiple SR Presentation States for a single document, allowing for the definition of multiple views. Based on the presented approach, such an SRPS object would need to cover at least those parts of the knowledge base that are required for the stepwise visualization process. For an SR document that consists exclusively of standard templates from the DCMR, this would be limited to the required display components. The final transformation into a visually perceivable presentation could be left to the visualizing application. This would be another analogy to the existing Softcopy Presentation State.

=== CUT ===

I know that in the meantime a UID reference to an encapsulated PDF has been added, which allows for specifying the intended rendering of the document...

Regards,
Jörg