Saturday, 25 February 2012

Separation of concerns - Refactoring XSLT for business logic

Doing some work for a client in the manufacturing sector, we've been thinking a lot about legacy code, and how to make the business rules which it contains accessible. One example of this is a set of legacy XSLT reports, which operate on XML produced by a package solution. The original reports use mostly simple features of the XSLT language, and kept very closely to the data structures in the XML.

However, as the business requirements have become more complex, the reports are increasingly cross-referencing different parts of the xml tree using foreign keys;
removing duplicates where the XML doesn't model, for instance, a many-many relationship;
adding unique ids for a particular stage in the processing;
and so forth.

So there's a technical issue, how best to resolve a structure clash*, and a management issue, how to keep the code as self-documenting as possible, and so avoid future maintenance headaches.

The approach is to transform the data, selecting the key featues and applying the business rules. It becomes a good place to comment on why certain things have been done slightly differently in different situations. Then the presentation logic can be applied separately.

Background.

An order uses several machines,
each machine's work may be split into several runs,
and each run may perform either or both of two tasks, version A or B.
The fragment below shows planned start dates and times (planstart) for runs, which is a small part of the actual table output.

Original code **


Well, it's not bad in terms of readability***. The calculations are done once at top of file, then applied in lots of places (as well as in the fragment shown). Then some matching to get the results for each run. The comment tells us why the logic applies, not just repeats the code. However, there's a sense that the code getting bulky and is becoming hard to add new rules to. If someone wanted to skim the data structures and the business rules, it wouldn't be trivial.

        <!-- Generate unique ids consistently, and make sure we match the expected sort order everywhere -->
        <xsl:variable name="runs" select="some rules"/>
        <xsl:variable name="generatedidentifiersforruns">
          <xsl:call-template name="generateidentifiersforruns">
            <xsl:with-param name="runs" select="$runs"/> 
          </xsl:call-template>
        </xsl:variable >
  ....
        <table width="745" border="1">
            <tr>
              <td style="font-size:18px; background-color:#CCC;" colspan="13">
                Overview
              </td>
            </tr>
            <tbody>
     <!--Convert template result to nodeset -->
              <xsl:variable name="ids" select="msxsl:node-set($generatedidentifiersforruns)[1]/keyAndID"/>
              <xsl:for-each select="$ids">
                <xsl:sort select="number(generatedID)"/>
                 <xsl:variable name="generatedID" select="number(generatedID)"/>
                <xsl:variable name="machineId" select="number(machine_id)"/>
                <xsl:variable name="planDate" select="string(plan_start)"/>
                <xsl:for-each select="$runs[number(machine_id)=$machineId and string(plan_start)=$planDate]">
                  <!-- expect only one match -->
                  <tr >
                    <td style="background-color:#CCC;" colspan="7">
                      Machine:
                      <xsl:value-of select="machine_name"></xsl:value-of>
                      Start: <xsl:value-of select="plan_start"></xsl:value-of>
                    </td>
                  </tr>
                  <tr>
                    <td>ID</td>
                    <td>Sub Job</td>
                    <td>Sub Job Desc</td>
     <td>Version Id</td>
                    <td>Version Desc</td>
                    ...
                    <td>Job reference</td> 
                    <td>.... total</td>
                  </tr>
                  <xsl:for-each select="task">
                    <xsl:sort select="task_version"/>
                    <tr>
                      <td>
                        <xsl:if test="position()=1">
                          <xsl:value-of select="$generatedID"/>
                        </xsl:if>
                      </td>
                      <td>
                        <xsl:call-template name="formatsubjob">
                          <xsl:with-param name="...." select="."/>
                          <xsl:with-param name="jobnumber" select="...."/>
                        </xsl:call-template>
                      </td>
      .....
                      <td style=" font-size:20px" align="right">
                         <xsl:value-of select="$job_no"/> - <xsl:value-of select="$generatedID"/>
                      </td>
                      <td>
         <!-- Maintenance note: If both A and B are carried out, task_data_total refers to both of them -->
      <xsl:for-each select="task">
           <xsl:if test="position()=1">
                              <xsl:value-of select="task_data_total"/> 
                           </xsl:if >
      </xsl:for-each>   
                      </td>
                   </xsl:for-each>
                  <!-- end task -->
                </xsl:for-each>
                <!-- end run -->
              </xsl:for-each>
              <!-- end runs -->
            </tbody>
          </table>
 

Why change things?

I certainly don't want to change anything that ain't broke. However, there were new requirements concerning whether task A, task B or both were needed (and other conditions to be taken into account).
It was important to keep our heads clear! Well, I was constantly scrolling up and down the xml file and the xslt file while debugging, so decided to write some extension**** methods to get the data only that we were concerned with, and inspect it.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
  <xsl:output method="xml" indent="yes"/>
  <xsl:include href="****.xsl"/>

  <xsl:template name="machine-extensions">
    <xsl:variable name="runs-to-show" select="...."/>
    <xsl:variable name="generatedidentifiersforruns">
      <xsl:call-template name="generateidentifiersforpressmows">
        <xsl:with-param name="runs" select="$runs-to-show"/>
      </xsl:call-template>
    </xsl:variable >
    <xsl:variable name="key-and-id-for-runs" select="msxsl:node-set($generatedidentifiersforruns)[1]/keyAndID"/>
 <!-- aha - we could remove msxsl:node-set which causes a browser dependency, 'cos the generate identifier code is only in a template for sharing reasons -->
 
 <xsl:element name="runs">
        <xsl:for-each  select="$runs-to-show">
          <xsl:variable name="machine_id" select="machine_id"/>
          <xsl:variable name="plan_start" select="plan_start"/>
          <xsl:element name="run">
            <xsl:element name="machine_id" >
              <xsl:value-of select ="$machine_id"/>
            </xsl:element>
            <xsl:element name="machine_name" >
              <xsl:variable name="machine_name">
                <xsl:call-template name="machinenamedisplay">
                  <xsl:with-param name="machine_name" select="machine_name"/>
                </xsl:call-template>
              </xsl:variable>
              <xsl:value-of select ="string($machine_name)"/>
            </xsl:element>
            <xsl:element name="plan_start" >
              <xsl:value-of select ="plan_start"/>
            </xsl:element>
            <xsl:element name="generatedID">
              <xsl:value-of select="string($key-and-id-for-runs[number(press_id)=number($machine_id)
                            and string(plan_start)=string($plan_start)]/generatedID)"/>
            </xsl:element>
            <xsl:element name ="task">
              ...
            </xsl:element>
            <xsl:element name ="machine">
    ...
            </xsl:element>
            <xsl:element name="is_total_for_all_tasks_in_the_run">
              <!-- Explanation of why task A and task B are done together, and when task_data_total refers to both together -->
              <xsl:value-of select="count(task)>1
                  and .... "/>
            </xsl:element>
   ... and lookups on raw materials, anything else the machine and pre-machine people need to know.
          </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

which produces output like this

<?xml version="1.0" encoding="utf-8"?>
<job>
 <runs>
    <run>
      <machine_id>12</machine_id>
      <machine_name>L1 (something)</machine_name>
      <plan_start></plan_start>
      <generatedID>2</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
    <run>
      <machine_id>12</machine_id>
      <machine_name>L1 (something)</machine_name>
      <plan_start></plan_start>
      <generatedID>2</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
    <run>
      <machine_id>13</machine_id>
      <machine_name>L2 (something else)</machine_name>
      <plan_start></plan_start>
      <generatedID>1</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
  </runs>
</job>

And what have we gained?

From the client's point of view, we've added a new feature at reasonable cost.
Design best practice: Separation of concerns, the business logic is in the extensions file. There are more work-practice related rules than I've shown here.
Intermediate results: if any subtotals are done on the report its far easier to see where any problems lie.
Clarity of the report xslt: We can feed the result data into the report, which becomes much easier to maintain.
Consistency: We can feed the result data into several report xslts, rather than repeating code. It's a permissive change, because it would feed an xslt for a touchpad,say, now.
Clearer code: it becomes feasible to clean up the business logic


* The term comes from Jackson Structured Programming - a design paradigm which encouraged you to follow the structure of your incoming, often magnetic tape friendly, data, and/or the structure of your outgoing data or report. A structure clash happens when the two don't coincide.
** Some of the xml element names have been changed to preserve client confidentiallity. Apologies if there are mis-edits.
*** Yes, I know this is a warts and all example. Just bear with it.
**** Not quite extension methods as used in C#, but same principle. Add to what we already have, since we can't get into the package to change it.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.