myGrid / TAV-709

T2 enactment error with Paul's workflow

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7
    • Fix Version/s: 1.7.1
    • Component/s: None
    • Labels: None

      Description

      A workflow of Paul's attached to TAV-706 (phenotype_to_pubmed.xml, which takes the input "african trypanosomiasis AND mouse") fails to run in T2, although its processor types and construction indicate that it should. The workflow "sticks" when the queue size of the nested workflow is 89. I've found this is consistent when first running the workflow, but not necessarily when clicking Reset and re-running.

      It is very difficult to determine what the problem is because of the lack of decent error reporting and monitoring.

          Activity

          David Withers (Inactive) added a comment -

          This seems to be a problem with the monitor. I'm getting lots of stack traces similar to the one below.

          Exception in thread "net.sf.taverna.t2.workflowmodel.processor.dispatch.events.DispatchJobEvent@3f0f86" 
            java.lang.IllegalStateException: Timer already cancelled. 
            at java.util.Timer.sched(Timer.java:354) 
            at java.util.Timer.schedule(Timer.java:170) 
            at net.sf.taverna.t2.monitor.impl.MonitorImpl.deregisterNode(MonitorImpl.java:136) 
            at net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke$1.receiveResult(Invoke.java:196) 
            at net.sf.taverna.t2.activities.wsdl.WSDLActivity$1.run(WSDLActivity.java:142) 
            at java.lang.Thread.run(Thread.java:613) 
          

          If I turn the monitoring off, this workflow completes and I get the same results as Taverna 1.

          David Withers (Inactive) added a comment -

          The sequence of events that causes this is:

          1. The invoke layer calls MonitorImpl.deregisterNode() which schedules nodeRemovalTimer to call monitorTree.removeNodeFromParent(nodeToRemove).
          2. monitorTree.removeNodeFromParent() throws IllegalArgumentException: node does not have a parent; this kills the timer thread.
          3. The next call to MonitorImpl.deregisterNode() results in nodeRemovalTimer.schedule() throwing IllegalStateException: Timer already cancelled.
          4. This exception propagates back to the invoke layer so the activity invocation doesn't happen.
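
The Timer behaviour behind steps 2 and 3 can be reproduced outside Taverna. The following is a minimal sketch using only java.util.Timer; the class name TimerDeathDemo and the timer name are made up for illustration, and the thrown IllegalArgumentException merely stands in for the failure in monitorTree.removeNodeFromParent().

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TimerDeathDemo {

    /**
     * Runs one task that throws, then keeps trying to schedule another.
     * Returns the message of the IllegalStateException from schedule(),
     * or null if none was seen within the timeout.
     */
    static String scheduleAfterFailedTask() {
        Timer timer = new Timer("nodeRemovalTimer-demo", true); // daemon thread
        CountDownLatch firstTaskRan = new CountDownLatch(1);

        // Step 2: the scheduled task throws an unchecked exception,
        // which terminates the timer thread.
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                firstTaskRan.countDown();
                throw new IllegalArgumentException("node does not have a parent");
            }
        }, 0);

        try {
            firstTaskRan.await(5, TimeUnit.SECONDS);
            // The timer thread needs a moment to unwind, so poll briefly.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (System.nanoTime() < deadline) {
                try {
                    // Step 3: the next schedule() call on the dead timer.
                    timer.schedule(new TimerTask() {
                        @Override
                        public void run() { }
                    }, 0);
                } catch (IllegalStateException e) {
                    return e.getMessage();
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(scheduleAfterFailedTask());
    }
}
```

Running this should report the same "Timer already cancelled." message as the stack traces above, even though cancel() is never called explicitly.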

          There are several problems here:

          1. nodeToRemove doesn't have a parent. Not too sure why but I think it's a timing issue: the parent gets removed before the child? The parent node always seems to be DataflowActivity.
          2. The scheduled TimerTask shouldn't allow an exception to kill the timer thread.
          3. Calls to MonitorImpl.deregisterNode() shouldn't allow monitoring exceptions to stop the activity invocation.
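
For problem 2, one possible guard is to catch unchecked exceptions inside the TimerTask itself so they never reach the timer thread. This is only a sketch under assumptions: GuardedTimerDemo and guarded() are hypothetical names, not part of MonitorImpl, and logging to System.err is a placeholder for whatever reporting the monitor would really use.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class GuardedTimerDemo {

    /** Wraps a body so that no RuntimeException escapes into the Timer thread. */
    static TimerTask guarded(Runnable body) {
        return new TimerTask() {
            @Override
            public void run() {
                try {
                    body.run();
                } catch (RuntimeException e) {
                    // Report and carry on; the timer thread stays alive.
                    System.err.println("monitor task failed: " + e);
                }
            }
        };
    }

    /** Returns true if a task scheduled after a failing task still runs. */
    static boolean secondTaskRuns() {
        Timer timer = new Timer("guarded-demo", true);
        CountDownLatch first = new CountDownLatch(1);
        CountDownLatch second = new CountDownLatch(1);
        try {
            // First task fails in the same way as the node removal did.
            timer.schedule(guarded(() -> {
                first.countDown();
                throw new IllegalArgumentException("node does not have a parent");
            }), 0);
            first.await(5, TimeUnit.SECONDS);

            // With the guard in place this schedule() call does not throw.
            timer.schedule(guarded(second::countDown), 0);
            return second.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("second task ran: " + secondTaskRuns());
    }
}
```

The guard only keeps the timer usable; it does not explain why the node has no parent, so the failure should still be reported rather than silently dropped.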

          I think a solution would be to separate the monitoring and invocation code; perhaps by adding a monitor layer before the invoke layer in the dispatch stack.
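
A rough sketch of that separation, under heavy assumptions: the Monitor and Layer interfaces below are simplified stand-ins invented for illustration and are not the real T2 dispatch stack API. The point is only that monitoring calls sit in their own layer and their failures are isolated from the invocation.

```java
import java.util.ArrayList;
import java.util.List;

class MonitorLayerSketch {

    // Hypothetical, simplified stand-ins for the monitor and a dispatch layer.
    interface Monitor {
        void registerNode(String id);
        void deregisterNode(String id);
    }

    interface Layer {
        String receiveJob(String job);
    }

    /** Stand-in for the invoke layer: performs the activity invocation only. */
    static final class InvokeLayer implements Layer {
        @Override
        public String receiveJob(String job) {
            return "result of " + job;
        }
    }

    /** Sits before the invoke layer and keeps monitor failures away from it. */
    static final class MonitorLayer implements Layer {
        private final Monitor monitor;
        private final Layer next;
        final List<String> monitorFailures = new ArrayList<>();

        MonitorLayer(Monitor monitor, Layer next) {
            this.monitor = monitor;
            this.next = next;
        }

        @Override
        public String receiveJob(String job) {
            safely(() -> monitor.registerNode(job));
            try {
                return next.receiveJob(job);
            } finally {
                safely(() -> monitor.deregisterNode(job));
            }
        }

        // Records monitoring errors instead of letting them propagate.
        private void safely(Runnable monitoringCall) {
            try {
                monitoringCall.run();
            } catch (RuntimeException e) {
                monitorFailures.add(String.valueOf(e.getMessage()));
            }
        }
    }

    /** Invokes a job through a monitor whose deregisterNode() always fails. */
    static String demo() {
        Monitor broken = new Monitor() {
            @Override
            public void registerNode(String id) { }

            @Override
            public void deregisterNode(String id) {
                throw new IllegalStateException("Timer already cancelled.");
            }
        };
        Layer stack = new MonitorLayer(broken, new InvokeLayer());
        return stack.receiveJob("job-1");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this arrangement the invoke layer contains no monitoring code at all, so a broken monitor can degrade reporting but cannot stop an activity from running.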

          David Withers (Inactive) added a comment -

          The root of this bug is child nodes being removed from the monitor tree after their parents have already been removed. I've checked in changes to DataflowActivity and WorkflowInstanceFacadeImpl to fix this.


            People

            • Assignee: David Withers (Inactive)
            • Reporter: Stuart Owen