Saturday, 25 February 2012

Separation of concerns - Refactoring XSLT for business logic

Doing some work for a client in the manufacturing sector, we've been thinking a lot about legacy code, and how to make the business rules which it contains accessible. One example of this is a set of legacy XSLT reports, which operate on XML produced by a package solution. The original reports use mostly simple features of the XSLT language, and kept very closely to the data structures in the XML.

However, as the business requirements have become more complex, the reports are increasingly cross-referencing different parts of the xml tree using foreign keys;
removing duplicates where the XML doesn't model, for instance, a many-many relationship;
adding unique ids for a particular stage in the processing;
and so forth.

So there's a technical issue, how best to resolve a structure clash*, and a management issue, how to keep the code as self-documenting as possible, and so avoid future maintenance headaches.

The approach is to transform the data, selecting the key featues and applying the business rules. It becomes a good place to comment on why certain things have been done slightly differently in different situations. Then the presentation logic can be applied separately.

Background.

An order uses several machines,
each machine's work may be split into several runs,
and each run may perform either or both of two tasks, version A or B.
The fragment below shows planned start dates and times (planstart) for runs, which is a small part of the actual table output.

Original code **


Well, it's not bad in terms of readability***. The calculations are done once at top of file, then applied in lots of places (as well as in the fragment shown). Then some matching to get the results for each run. The comment tells us why the logic applies, not just repeats the code. However, there's a sense that the code getting bulky and is becoming hard to add new rules to. If someone wanted to skim the data structures and the business rules, it wouldn't be trivial.

        <!-- Generate unique ids consistently, and make sure we match the expected sort order everywhere -->
        <xsl:variable name="runs" select="some rules"/>
        <xsl:variable name="generatedidentifiersforruns">
          <xsl:call-template name="generateidentifiersforruns">
            <xsl:with-param name="runs" select="$runs"/> 
          </xsl:call-template>
        </xsl:variable >
  ....
        <table width="745" border="1">
            <tr>
              <td style="font-size:18px; background-color:#CCC;" colspan="13">
                Overview
              </td>
            </tr>
            <tbody>
     <!--Convert template result to nodeset -->
              <xsl:variable name="ids" select="msxsl:node-set($generatedidentifiersforruns)[1]/keyAndID"/>
              <xsl:for-each select="$ids">
                <xsl:sort select="number(generatedID)"/>
                 <xsl:variable name="generatedID" select="number(generatedID)"/>
                <xsl:variable name="machineId" select="number(machine_id)"/>
                <xsl:variable name="planDate" select="string(plan_start)"/>
                <xsl:for-each select="$runs[number(machine_id)=$machineId and string(plan_start)=$planDate]">
                  <!-- expect only one match -->
                  <tr >
                    <td style="background-color:#CCC;" colspan="7">
                      Machine:
                      <xsl:value-of select="machine_name"></xsl:value-of>
                      Start: <xsl:value-of select="plan_start"></xsl:value-of>
                    </td>
                  </tr>
                  <tr>
                    <td>ID</td>
                    <td>Sub Job</td>
                    <td>Sub Job Desc</td>
     <td>Version Id</td>
                    <td>Version Desc</td>
                    ...
                    <td>Job reference</td> 
                    <td>.... total</td>
                  </tr>
                  <xsl:for-each select="task">
                    <xsl:sort select="task_version"/>
                    <tr>
                      <td>
                        <xsl:if test="position()=1">
                          <xsl:value-of select="$generatedID"/>
                        </xsl:if>
                      </td>
                      <td>
                        <xsl:call-template name="formatsubjob">
                          <xsl:with-param name="...." select="."/>
                          <xsl:with-param name="jobnumber" select="...."/>
                        </xsl:call-template>
                      </td>
      .....
                      <td style=" font-size:20px" align="right">
                         <xsl:value-of select="$job_no"/> - <xsl:value-of select="$generatedID"/>
                      </td>
                      <td>
         <!-- Maintenance note: If both A and B are carried out, task_data_total refers to both of them -->
      <xsl:for-each select="task">
           <xsl:if test="position()=1">
                              <xsl:value-of select="task_data_total"/> 
                           </xsl:if >
      </xsl:for-each>   
                      </td>
                   </xsl:for-each>
                  <!-- end task -->
                </xsl:for-each>
                <!-- end run -->
              </xsl:for-each>
              <!-- end runs -->
            </tbody>
          </table>
 

Why change things?

I certainly don't want to change anything that ain't broke. However, there were new requirements concerning whether task A, task B or both were needed (and other conditions to be taken into account).
It was important to keep our heads clear! Well, I was constantly scrolling up and down the xml file and the xslt file while debugging, so decided to write some extension**** methods to get the data only that we were concerned with, and inspect it.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
  <xsl:output method="xml" indent="yes"/>
  <xsl:include href="****.xsl"/>

  <xsl:template name="machine-extensions">
    <xsl:variable name="runs-to-show" select="...."/>
    <xsl:variable name="generatedidentifiersforruns">
      <xsl:call-template name="generateidentifiersforpressmows">
        <xsl:with-param name="runs" select="$runs-to-show"/>
      </xsl:call-template>
    </xsl:variable >
    <xsl:variable name="key-and-id-for-runs" select="msxsl:node-set($generatedidentifiersforruns)[1]/keyAndID"/>
 <!-- aha - we could remove msxsl:node-set which causes a browser dependency, 'cos the generate identifier code is only in a template for sharing reasons -->
 
 <xsl:element name="runs">
        <xsl:for-each  select="$runs-to-show">
          <xsl:variable name="machine_id" select="machine_id"/>
          <xsl:variable name="plan_start" select="plan_start"/>
          <xsl:element name="run">
            <xsl:element name="machine_id" >
              <xsl:value-of select ="$machine_id"/>
            </xsl:element>
            <xsl:element name="machine_name" >
              <xsl:variable name="machine_name">
                <xsl:call-template name="machinenamedisplay">
                  <xsl:with-param name="machine_name" select="machine_name"/>
                </xsl:call-template>
              </xsl:variable>
              <xsl:value-of select ="string($machine_name)"/>
            </xsl:element>
            <xsl:element name="plan_start" >
              <xsl:value-of select ="plan_start"/>
            </xsl:element>
            <xsl:element name="generatedID">
              <xsl:value-of select="string($key-and-id-for-runs[number(press_id)=number($machine_id)
                            and string(plan_start)=string($plan_start)]/generatedID)"/>
            </xsl:element>
            <xsl:element name ="task">
              ...
            </xsl:element>
            <xsl:element name ="machine">
    ...
            </xsl:element>
            <xsl:element name="is_total_for_all_tasks_in_the_run">
              <!-- Explanation of why task A and task B are done together, and when task_data_total refers to both together -->
              <xsl:value-of select="count(task)>1
                  and .... "/>
            </xsl:element>
   ... and lookups on raw materials, anything else the machine and pre-machine people need to know.
          </xsl:element>
        </xsl:for-each>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

which produces output like this

<?xml version="1.0" encoding="utf-8"?>
<job>
 <runs>
    <run>
      <machine_id>12</machine_id>
      <machine_name>L1 (something)</machine_name>
      <plan_start></plan_start>
      <generatedID>2</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
    <run>
      <machine_id>12</machine_id>
      <machine_name>L1 (something)</machine_name>
      <plan_start></plan_start>
      <generatedID>2</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
    <run>
      <machine_id>13</machine_id>
      <machine_name>L2 (something else)</machine_name>
      <plan_start></plan_start>
      <generatedID>1</generatedID>
      <task />
      <task />
<is_total_for_all_tasks_in_the_run>false</is_total_for_all_tasks_in_the_run>
    </run>
  </runs>
</job>

And what have we gained?

From the client's point of view, we've added a new feature at reasonable cost.
Design best practice: Separation of concerns, the business logic is in the extensions file. There are more work-practice related rules than I've shown here.
Intermediate results: if any subtotals are done on the report its far easier to see where any problems lie.
Clarity of the report xslt: We can feed the result data into the report, which becomes much easier to maintain.
Consistency: We can feed the result data into several report xslts, rather than repeating code. It's a permissive change, because it would feed an xslt for a touchpad,say, now.
Clearer code: it becomes feasible to clean up the business logic


* The term comes from Jackson Structured Programming - a design paradigm which encouraged you to follow the structure of your incoming, often magnetic tape friendly, data, and/or the structure of your outgoing data or report. A structure clash happens when the two don't coincide.
** Some of the xml element names have been changed to preserve client confidentiallity. Apologies if there are mis-edits.
*** Yes, I know this is a warts and all example. Just bear with it.
**** Not quite extension methods as used in C#, but same principle. Add to what we already have, since we can't get into the package to change it.

Tuesday, 31 January 2012

Fibonnacci knitting

Other than building software, I enjoy making knitwear. Thanks to
 I've been playing with stripes this month.

The idea of using the Fibonacci numbers to make either a full scarf or to finish a neckline or cuffs isn't a new one, because they converge to converge to the Golden Ratio (which is visually A Good Thing).  And it's always good to use some kind of pattern in the 'dribble zones' of baby wear.

The idea is to come up with a balanced stripe pattern, by adding consecutive numbers
so, if we start with the seeds 0,1  we get the sequence 0, 1, 0+1=1, 1+1=2, 1+2=3, 2+3=5, 3+5=8, 13,21 etc.

The fun starts here. Any set chosen from a smallish range of these number makes a good pattern
so, choosing 2, 3,5, 8 and b (blue yarn) and G (green yarn) our garment can be striped bbGGGbbbbbGGGGGGGGG, which is boring but balanced.
Even better, if we want a 5 row rib edging (to stop the garment rolling)
we mix up the sequence numbers to 5,2,8,3,2,5 to stripe bbbbb(in rib), then change to stocking stitch GGbbbbbbbbGGGbbGGGGG  The result is almost always easy on the eye.

Even better, since the Fibonnaci numbers contains 1 and 2, we can always match our total row count from some part of the sequence. So to knit the center of the fish-and-chip baby jumper, we need 30 rows stocking stitch. My cream and coral version used (5cream +1 coral) 6 times,. The shoulder section was repeats of (3cream+2coral). 

I still haven't played with the Lucas numbers which also converge to the Golden Ratio. The first two Lucas numbers are 2 and 1 instead of 0 and 1, and so we get  2,1,3,4,7,11,18,29 and so on.