Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive and is intended to provide reliable access to deleted content from Microsoft KB. All KB articles are owned by Microsoft Corporation.

IEEE 1394 Asynchronous Requests Complete With Incorrect Data or Status and/or Time Out Under High Transfer Rates


Symptoms

When performing Asynchronous transfers at a high rate to or from a device connected to an IEEE 1394 bus, the Microsoft 1394 stack may return incorrect data (for a Read or Lock request) and/or incorrect status (for a Write, Read or Lock request).

In addition, the Microsoft 1394 stack may complete a particular Asynchronous request with a STATUS_IO_TIMEOUT status, when in fact a different Asynchronous request actually timed out but was reported by the Microsoft 1394 stack as having completed successfully.

This problem may affect any device connected to an IEEE 1394 bus that communicates via Asynchronous transactions, such as external hard disks and other specialized devices.


Cause

This symptom is caused by a problem in the Microsoft IEEE 1394 bus driver stack.  The Microsoft IEEE 1394 bus driver stack does not ensure that the Transaction Label (used to match an Asynchronous Request/Response pair) is unique among the uncompleted Asynchronous requests for a given node on the 1394 bus, as required by the IEEE 1394 specification. 

This problem is exposed if for some reason a Response to an Asynchronous Request is not received from the target of the Request.  This lack of response could be due to a packet transmission error, the responding node being too busy to process the request within the required timeout period, or other errors. 

As a result, if the rate of Asynchronous transfers is high enough, a second Asynchronous Request may be sent to a device using the same Transaction Label as a previous Asynchronous Request for which the Microsoft 1394 stack has not yet received a response and which has not yet timed out.  Under these conditions, a node's Response to the second Asynchronous Request with the duplicate Transaction Label will be incorrectly matched to the prior, uncompleted Asynchronous Request.
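
The mismatch described above can be reproduced with a small user-mode model. This is only a sketch: LabelModel, model_send, model_receive and demo_dropped_response are hypothetical names, and the model assumes that a response is matched to the oldest pending request that carries its Transaction Label. It does not reflect the internal data structures of the Microsoft 1394 stack.

```c
#include <assert.h>

#define TLABEL_COUNT 64    /* 6-bit Transaction Label: values 0-63 */
#define NO_REQUEST   (-1)

/* Hypothetical model: one label counter, one tracked request per label. */
typedef struct {
    int next_label;             /* wraps from 63 back to 0 */
    int pending[TLABEL_COUNT];  /* id of the request awaiting a response */
} LabelModel;

void model_init(LabelModel *m) {
    m->next_label = 0;
    for (int i = 0; i < TLABEL_COUNT; i++) m->pending[i] = NO_REQUEST;
}

/* Sends a request WITHOUT checking that the label is free (the defect).
   Returns the label assigned to the request. */
int model_send(LabelModel *m, int request_id) {
    int label = m->next_label;
    m->next_label = (m->next_label + 1) % TLABEL_COUNT;
    if (m->pending[label] == NO_REQUEST)
        m->pending[label] = request_id;
    /* else: the older request keeps the label and is matched first */
    return label;
}

/* A response identifies its request only by label.
   Returns the id of the request that gets completed. */
int model_receive(LabelModel *m, int label) {
    assert(label >= 0 && label < TLABEL_COUNT);
    int matched = m->pending[label];
    m->pending[label] = NO_REQUEST;
    return matched;
}

/* Request 0 never gets a response; requests 1-63 are answered promptly;
   request 64 reuses label 0. Returns the id that request 64's response
   is matched to. */
int demo_dropped_response(void) {
    LabelModel m;
    int labels[TLABEL_COUNT + 1];
    model_init(&m);
    for (int id = 0; id <= TLABEL_COUNT; id++) {
        labels[id] = model_send(&m, id);
        if (id >= 1 && id < TLABEL_COUNT)
            model_receive(&m, labels[id]);
    }
    return model_receive(&m, labels[TLABEL_COUNT]);
}
```

In this model, demo_dropped_response returns 0: the response that belongs to request 64 completes request 0, whose own response was lost.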


Resolution

If you are impacted by this issue and are unable to effectively use the following workaround, please contact Microsoft Product Support to investigate the possibility of obtaining a hotfix for the Microsoft 1394 stack to resolve this issue.

It may be possible to work around this problem in the client driver for an IEEE 1394 device by implementing the following mechanism to prevent duplication of Transaction Labels among outstanding Asynchronous Requests.

The goal is to avoid submitting an Asynchronous Request to the Microsoft 1394 stack that might use the same Transaction Label (values 0-63) as one that has not yet completed.  Therefore, the approach is to sequence the Asynchronous Requests being submitted, and avoid getting more than approximately 63 requests ahead of the oldest one that's not yet completed.  See MORE INFORMATION for background.

However, this workaround has the following limitations:

  • This workaround effectively limits the rate of Asynchronous transfers across the IEEE 1394 bus (specifically, across all devices connected to a specific 1394 host controller on the computer running Windows and using the Microsoft 1394 stack).
  • If one 1394 device fails to complete an Asynchronous transfer, or does not do so within a short time period, this workaround effectively halts the flow of Asynchronous transfers across the entire IEEE 1394 bus until the dropped or delayed Asynchronous transfer is timed out by the Microsoft 1394 stack.  Only then can Asynchronous transfers safely resume on the IEEE 1394 bus.
  • This workaround is only effective if all devices that are connected to the IEEE 1394 bus load the same driver, and thus all Asynchronous transfers (involving all devices) across the IEEE 1394 bus originate from the code that implements this workaround.  If multiple devices are connected to the IEEE 1394 bus which load different drivers, this workaround cannot be used and will not be effective.
  • This workaround cannot take into account Asynchronous transfers which are originated directly by the Microsoft 1394 stack, and thus cannot completely eliminate the occurrence of duplicate Transaction Labels among active Asynchronous transfers.

Sample code illustrating this implementation is provided below:

//
// --------------------------------------------------------------------------
// define Global Async Request tracking resources
//
#define MAX_ASYNC_REQUESTS  60
// bitmap of active Async Requests
ULARGE_INTEGER  gAsyncRequestMask;
// current Async Requests index ( 0 to MAX_ASYNC_REQUESTS - 1 )
ULONG           gAsyncRequestIndex;
// spinlock to protect modification of
// Async Request tracking resources
KSPIN_LOCK  gAsyncRequestIndexLock;
//
// --------------------------------------------------------------------------
// declare Global Async Request tracking routines
//
BOOLEAN
GetNextAsyncRequestIndex(
    OUT PULONG AsyncReqIdx
    );
VOID
FreeAsyncRequestIndex(
    IN ULONG AsyncReqIdx
    );
VOID
InitAsyncRequestIndex();

//
// --------------------------------------------------------------------------
// implement Global Async Request tracking routines
//
BOOLEAN
GetNextAsyncRequestIndex(
    OUT PULONG AsyncReqIdx
    )
/*++
    Routine Description:
        Checks whether the previous Async Request that would use
        the same Transaction Label as the current Async Request
        has completed.
        If so, returns TRUE, and sets the Async Request Index
        (0 to MAX_ASYNC_REQUESTS - 1) for the current Async Request
        (to be used later in a call to FreeAsyncRequestIndex, to mark
        this Async Request as completed).
        If not, returns FALSE.
 
    Arguments:
        AsyncReqIdx - Pointer to the Async Request Index for this
        new request.
    Return Value:
        TRUE is returned (and AsyncReqIdx is set) if there are no
        pending Async Requests that would be using the same TLabel value
        as the current Async Request, and the current Async Request can
        be submitted to the 1394 stack immediately.
        FALSE is returned (and AsyncReqIdx is not set) if there is
        an Async Request still pending that would use the same TLabel
        value as the current Async Request.  In this case, further
        Async Requests must be held/queued until this pending Async
        Request has completed.
--*/
{
    // ULARGE_INTEGER is a union and cannot be initialized from a scalar
    ULARGE_INTEGER SetMask = { 0 };
    // filled in by KeAcquireSpinLock
    KIRQL  Irql;
    BOOLEAN  bCanProceed = TRUE;
    // global value check
    ASSERT(gAsyncRequestIndex < MAX_ASYNC_REQUESTS);
    // acquire spinlock
    KeAcquireSpinLock(&gAsyncRequestIndexLock, &Irql);
    // set the bit mask (64-bit shift, since the index can exceed 31)
    SetMask.QuadPart = ( (ULONGLONG)1 << gAsyncRequestIndex );
    // check whether the previous Async Request 
    // (that would use this same TLabel value)
    // has completed
    if ( gAsyncRequestMask.QuadPart & SetMask.QuadPart ) {
        // The previous Async Request (that would use 
        // this same TLabel value) has NOT completed yet.
        // We need to wait (and possibly initiate recovery
        // for a failed Async Request) before proceeding
        // with further Async Requests.
        bCanProceed = FALSE;
    } else {
        // The previous Async Request (that would use 
        // this same TLabel value) HAS completed.
        // Full speed ahead.
        // set the bit to mark this Async Request as pending
        gAsyncRequestMask.QuadPart |= SetMask.QuadPart;
        // return index value for this Async Request
        *AsyncReqIdx = gAsyncRequestIndex;
        // advance to next Async Request index
        gAsyncRequestIndex++;
        // wrap back to 0 as needed
        if (gAsyncRequestIndex == MAX_ASYNC_REQUESTS) {
            gAsyncRequestIndex = 0;
        }
    }
    // release spinlock
    KeReleaseSpinLock(&gAsyncRequestIndexLock, Irql);
    return bCanProceed;
}
VOID
FreeAsyncRequestIndex(
    IN ULONG AsyncReqIdx
    )
/*++
    Routine Description:
        Marks the indicated Async Request Index as completed so that
        further Async Requests can proceed.
 
    Arguments:
        AsyncReqIdx - Async Request Index for this completed request.
    Return Value:
        None.
--*/
{
    // ULARGE_INTEGER is a union and cannot be initialized from a scalar
    ULARGE_INTEGER ClearMask = { 0 };
    // filled in by KeAcquireSpinLock
    KIRQL  Irql;
    // parameter check
    ASSERT(AsyncReqIdx < MAX_ASYNC_REQUESTS);
    // acquire spinlock
    KeAcquireSpinLock(&gAsyncRequestIndexLock, &Irql);
    // set the bit mask (64-bit shift, since the index can exceed 31)
    ClearMask.QuadPart = ~( (ULONGLONG)1 << AsyncReqIdx );
    // clear the bit to mark this Async Request as completed
    gAsyncRequestMask.QuadPart &= ClearMask.QuadPart;
    // release spinlock
    KeReleaseSpinLock(&gAsyncRequestIndexLock, Irql);
    return;
}

VOID
InitAsyncRequestIndex()
/*++
    Routine Description:
        Initializes the Global Async Request tracking resources.
        This should be called on Start Device (or Driver Entry)
        and Bus Reset events.
    Arguments:
        None.
    Return Value:
        None.
--*/
{
    // initialize bitmap of active Async Requests
    gAsyncRequestMask.QuadPart = 0;
    // initialize current Async Requests index (0 to MAX_ASYNC_REQUESTS - 1)
    gAsyncRequestIndex = 0;
    // initialize spinlock
    KeInitializeSpinLock(&gAsyncRequestIndexLock);
    return;
}

//
// --------------------------------------------------------------------------
// check whether you can proceed with the next Async Request before submitting the Irp
//
NTSTATUS
YourAsyncRequestSubmissionRoutine()
{
...
    while ( !GetNextAsyncRequestIndex(&YourAsyncReqContext->AsyncReqIdx) ) {
    
...
    
        // Wait, retry, etc.
        // Do not proceed with additional Async Requests
        // (pause your queue, etc.) until the current oldest
        // Async Request completes, and GetNextAsyncRequestIndex
        // returns TRUE.


    
...
    
    }
...
    // ensure that AsyncReqIdx value returned by GetNextAsyncRequestIndex
    // is preserved in the context associated with this Async Request,
    // and is accessible in the completion routine for this IRP.
    IoSetCompletionRoutine( 
        Irp,
        YourAsyncRequestCompletionRoutine,
        YourAsyncReqContext,
        TRUE,
        TRUE,
        TRUE
        );
...
    status = IoCallDriver( TargetDeviceObject, Irp );
...
}
NTSTATUS
YourAsyncRequestCompletionRoutine(
    IN PDEVICE_OBJECT   DeviceObject,
    IN PIRP             Irp,
    IN PVOID            Context
    )
{
...
    PYOUR_ASYNC_REQ_CONTEXT YourAsyncReqContext = (PYOUR_ASYNC_REQ_CONTEXT)Context;
...
    FreeAsyncRequestIndex(YourAsyncReqContext->AsyncReqIdx);
...
}

Notes on the above sample implementation:

Defining MAX_ASYNC_REQUESTS as something less than 64 (for example, 60) in this sample code assumes that there may be some reordering of Asynchronous Requests between the time when this check occurs in the 1394 client driver, when the request is submitted to the Microsoft 1394 stack by the client driver, and when the request is assigned a Transaction Label and queued to the host controller's DMA program.  On multi-processor systems with a large number of processors, it may be necessary to adjust this value downward somewhat (to perhaps (60 - Number of Processors)) to provide an adequate safety margin.
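
One way to experiment with this safety margin is to make the window size a run-time value instead of a compile-time constant. The following user-mode sketch mirrors the bitmask logic of the sample code with plain C types. RequestWindow, choose_window and the window_* routines are hypothetical names, the model performs no locking (a driver would still need the spin lock), and the "60 minus processor count" rule is only the suggestion from the note above, not a tested recommendation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_WINDOW 60u   /* value used by the sample code */

typedef struct {
    uint64_t pending;  /* bit i set: slot i has not completed yet */
    unsigned next;     /* next slot to hand out */
    unsigned window;   /* number of slots in use, 1 to 64 */
} RequestWindow;

/* 60 minus the processor count, never going below one slot. */
unsigned choose_window(unsigned processors) {
    return (processors < BASE_WINDOW) ? BASE_WINDOW - processors : 1u;
}

void window_init(RequestWindow *w, unsigned window) {
    assert(window >= 1 && window <= 64);
    w->pending = 0;
    w->next = 0;
    w->window = window;
}

/* Returns true and sets *slot if the next slot in sequence is free.
   Returns false if the request that last used that slot is still pending. */
bool window_acquire(RequestWindow *w, unsigned *slot) {
    uint64_t bit = (uint64_t)1 << w->next;   /* 64-bit shift */
    if (w->pending & bit)
        return false;
    w->pending |= bit;
    *slot = w->next;
    w->next = (w->next + 1) % w->window;
    return true;
}

/* Marks a slot as completed so that it can be handed out again. */
void window_release(RequestWindow *w, unsigned slot) {
    assert(slot < w->window);
    w->pending &= ~((uint64_t)1 << slot);
}
```

With choose_window(4) the window holds 56 slots: the 57th call to window_acquire fails until slot 0 has been released.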

Since the Microsoft 1394 stack implements a single Transaction Label counter per host controller, a similar scope needs to be implemented for this tracking data.  If the 1394 client driver is implemented as a 1394 Virtual Device driver, where a single Device Object is created through which communication to all of the supported nodes on the bus is managed, then these "global" resources could be part of the Device Extension for the 1394 Virtual Device.  If the 1394 client driver is implemented as a PnP driver for bus-enumerated 1394 devices, and thus there is one Device Object per supported device attached to the bus, then these global resources would need to be part of the 1394 client driver's global data.

There is no effective way to cancel a pending Asynchronous Request until it is timed out by the Microsoft 1394 stack.  Thus, if a pending Asynchronous Request has become "stuck" and will eventually time out, the 1394 client driver needs to wait until that request is actually timed out by the Microsoft 1394 stack (2.2 seconds after the request was initially submitted) before it can safely issue any further Asynchronous Requests on the bus.

As an optimization, it may be possible to separately mark the "stuck" Asynchronous Request as one that has failed, clear its bit in the tracking bitmask (by calling FreeAsyncRequestIndex), and avoid sending any more requests to that particular node until that "stuck" request times out and the appropriate recovery actions can be completed for that node.  Continuing to send Asynchronous Requests in that case would cause the Transaction Label of the "stuck" request to be reused, but if such requests are only being sent to a different node than the one to which the "stuck" request has been sent, this would not be expected to have any negative consequences.
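
A minimal sketch of the bookkeeping that this optimization would need is shown below. StuckNodes and the stuck_* routines are hypothetical names, the table assumes node numbers 0-62 on the local bus, and no locking is shown. The client driver would call stuck_mark at the point where it frees the index of a request it has given up on, and stuck_timed_out from the completion routine when the Microsoft 1394 stack finally times that request out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 63   /* assumption: node numbers 0-62 on the local bus */

typedef struct {
    /* per node: requests marked as failed that the stack has not yet timed out */
    unsigned stuck[MAX_NODES];
} StuckNodes;

void stuck_init(StuckNodes *s) {
    for (int i = 0; i < MAX_NODES; i++) s->stuck[i] = 0;
}

/* Called when a request to this node is given up on and its index is freed. */
void stuck_mark(StuckNodes *s, unsigned node) {
    assert(node < MAX_NODES);
    s->stuck[node]++;
}

/* Called when the stack completes the stuck request with a timeout status. */
void stuck_timed_out(StuckNodes *s, unsigned node) {
    assert(node < MAX_NODES && s->stuck[node] > 0);
    s->stuck[node]--;
}

/* New requests may go only to nodes that have no stuck request outstanding. */
bool stuck_can_send(const StuckNodes *s, unsigned node) {
    assert(node < MAX_NODES);
    return s->stuck[node] == 0;
}
```

A counter is used instead of a single flag so that a node with more than one stuck request stays blocked until all of them have timed out.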


More information

Affected Scenarios

There are two scenarios under which this problem may result in incorrect data and/or status being returned for an Asynchronous transfer by the Microsoft 1394 stack:

Scenario 1: Out-of-Order Completion of Asynchronous Requests

It is possible for an IEEE 1394 device (Node) to respond to Asynchronous Requests in an order other than the order in which the Asynchronous Requests were received.

For example, an Asynchronous Read request to a specific offset may map to a hardware register, to which the device (Responding Node) can respond very quickly.  Another Asynchronous Read request to a different offset may map to a memory address that requires additional processing by the software/firmware running on the IEEE 1394 device, and which would require a longer delay before a Response can be provided.

If, under a high rate of Asynchronous Requests across the IEEE 1394 bus, multiple Asynchronous Requests are received by the Responding Node with the same Transaction Label, the following sequence may occur:

  1. The first Asynchronous Request using a specific Transaction Label is mapped to a software-controlled address, which requires additional processing before an Asynchronous Response can be provided.
  2. The second Asynchronous Request using the same Transaction Label is mapped to a hardware-controlled address, which requires very little processing before an Asynchronous Response can be provided.
  3. The Responding Node responds to (issues an Asynchronous Response for) the second Asynchronous Request.
  4. The Microsoft 1394 stack incorrectly matches this response (for the second Asynchronous Request) to the first Asynchronous Request, since the Transaction Labels match, and completes the first Asynchronous Request with the data/status that actually correspond to the second Asynchronous Request.
  5. The Responding Node responds to (issues an Asynchronous Response for) the first Asynchronous Request.
  6. The Microsoft 1394 stack incorrectly matches this response (for the first Asynchronous Request) to the second Asynchronous Request, since the Transaction Labels match, and completes the second Asynchronous Request with the data/status that actually correspond to the first Asynchronous Request.

In this scenario, both Asynchronous Requests appear to be completed successfully.  No error is reported, and it may not be possible for the client driver for the IEEE 1394 device, or any other software component, to detect that the incorrect data/status has been returned.  Thus, silent data corruption may result.
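
The six steps above can be replayed in a few lines of user-mode C. This is a sketch under stated assumptions: PendingRequest, match_response and replay_scenario_1 are hypothetical names, label 7 and the data values are arbitrary, and the matching rule (complete the oldest pending request that carries the label) is an assumption chosen to be consistent with steps 4 and 6.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int      id;     /* order of submission */
    int      label;  /* Transaction Label, 0-63 */
    int      done;   /* nonzero once completed */
    unsigned data;   /* data delivered on completion */
} PendingRequest;

/* Assumed matching rule: complete the oldest pending request that has
   this label. Returns its id, or -1 if no pending request matches. */
int match_response(PendingRequest *q, size_t n, int label, unsigned data) {
    for (size_t i = 0; i < n; i++) {
        if (!q[i].done && q[i].label == label) {
            q[i].done = 1;
            q[i].data = data;
            return q[i].id;
        }
    }
    return -1;
}

/* Request 1 targets a slow, software-controlled address and should
   receive 0xAAAA. Request 2 targets a fast hardware register and
   should receive 0xBBBB. Both carry the same label. */
void replay_scenario_1(PendingRequest q[2]) {
    q[0].id = 1; q[0].label = 7; q[0].done = 0; q[0].data = 0;
    q[1].id = 2; q[1].label = 7; q[1].done = 0; q[1].data = 0;
    match_response(q, 2, 7, 0xBBBBu);  /* fast response arrives first */
    match_response(q, 2, 7, 0xAAAAu);  /* slow response arrives second */
}
```

After replay_scenario_1, request 1 holds 0xBBBB and request 2 holds 0xAAAA. Both are marked as completed and nothing signals the swap.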

Scenario 2: Dropped Response to Asynchronous Requests

If the Microsoft 1394 stack does not receive a response to a pending Asynchronous request, the expected behavior would be for the Asynchronous request that did not receive a valid response to time out and complete with a STATUS_IO_TIMEOUT status.

However, under these conditions, the Microsoft 1394 stack may incorrectly match Asynchronous Response packets to the wrong Asynchronous Request.  This results in the incorrect status and/or data being returned for Asynchronous Requests.  This incorrect matching of Asynchronous Responses to Asynchronous Requests may be repeated for a number of submitted Asynchronous Requests, while the high rate of Asynchronous transfers continues.

After the rate of Asynchronous transfers is reduced, an Asynchronous Request will eventually time out, since the Response that actually matched the Request was incorrectly matched to a previous Asynchronous Request.  However, when this problem occurs, the Asynchronous Request that eventually times out is not the one for which a Response was not received.  The original Asynchronous Request that did not receive a response was already completed with possibly erroneous status and/or data from a different Response.

Thus, silent data corruption may result.  The STATUS_IO_TIMEOUT error indicates that some error occurred at some unknown time, but it may not be possible to determine which Asynchronous transaction actually timed out, and which previously-completed Asynchronous transactions actually contained incorrect data and/or status.

IEEE 1394 Specification Requirements

The IEEE 1394 (1995) specification defines the Transaction Label as a 6-bit value in an Asynchronous packet, with allowable values from 0-63, and requires that no Asynchronous Request may be sent to a given node using a given Transaction Label while there is an active (uncompleted) Asynchronous Request for that node using the same Transaction Label.

The Microsoft IEEE 1394 stack maintains a single Transaction Label counter for all of the nodes connected to a single 1394 host controller.  This counter is incremented for each new Asynchronous Request submitted by the Microsoft IEEE 1394 stack, and reset to 0 when incremented from 63.  Thus, if more than 64 total Asynchronous Requests are submitted by the Microsoft IEEE 1394 stack before the oldest Asynchronous Request has been completed or timed out, this problem may occur.

The 1394 specification requires that, if the Responding node cannot respond to an Asynchronous Request within the timeout period, it should not respond at all.  The IEEE 1394 specifications define this timeout period as a range between 0.1 and 1.0 seconds, depending on the value of a register on each node intended to be set by the node that is performing the role of Bus Master on the IEEE 1394 bus. 

However, the Microsoft IEEE 1394 stack on Windows Vista and earlier uses an internal timeout value of 2.2 seconds, during which an Asynchronous Request will be held in a pending state awaiting a Response packet.  Therefore, this problem could occur when the overall rate of Asynchronous transfers submitted by the Microsoft IEEE 1394 stack (for all nodes on the bus) exceeds 64 Asynchronous transfers per 2.2 seconds.
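
Expressed as a rate, 64 transfers per 2.2 seconds is roughly 29 Asynchronous transfers per second across the whole bus. The helper below, with the hypothetical name max_safe_rate, is only a sketch that makes the arithmetic explicit so that it can be repeated for other values, such as the 60-request window used in the sample code.

```c
#include <assert.h>

/* Sustained request rate (requests per second) above which a label can
   come around again while a request that never receives a response is
   still held pending by the stack. */
double max_safe_rate(unsigned labels, double pending_seconds) {
    assert(pending_seconds > 0.0);
    return (double)labels / pending_seconds;
}
```

For example, max_safe_rate(64, 2.2) is approximately 29.1, and max_safe_rate(60, 2.2) is approximately 27.3.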

References:
IEEE 1394-1995 specification: 6.2.4.3 Transaction label (tl)
IEEE 1394A-2000 specification: 8.3.2.2.6 SPLIT_TIMEOUT register

IEEE specifications can be ordered from the IEEE Standards Association.


Keywords: KB2004130

Article Info
Article ID : 2004130
Revision : 2
Created on : 3/15/2010
Published on : 3/15/2010