1. Introduction
Ultra320 SCSI - Page 1
Source: Adaptec
- Introduction
SCSI celebrates its 20th anniversary with a bang by moving to the seventh generation
of the bus, which introduces a maximum data transfer rate of a staggering 320 MB/sec.
Over the course of the past two decades the protocol has evolved from an 8-bit,
single-ended interface transferring data at 5 MB/sec to a 16-bit, differential
interface transferring data at 160 MB/sec. For the first time the SCSI protocol
has been revised in order to reduce the time spent on processing overhead, resulting
in increased performance. Ultra320 SCSI launched at the end of 2001.

The three electrical signaling levels of SCSI:
SE = Single-ended
HVD SCSI (or Differential SCSI) = High-voltage differential SCSI, based on EIA-485
LVD SCSI = Low-voltage differential SCSI
2. Features
Source: Adaptec
- SCSI Feature Sets
Mandatory features for Ultra320 SCSI as currently defined include:
• 320 megabyte per second transfer rate using DT data transfers
• 32-bit CRC
• Simple domain validation
• Backward compatibility
• Packetized transfers only
• A free-running clock
• Skew compensation
• A training pattern
• Transmitter precompensation with cutback
Optional features for Ultra320 SCSI currently include:
• AAF (adaptive active filtering)
• QAS (quick arbitration and selection)
• Fairness
• AIP (asynchronous information protection)
- What's New
Ultra320 SCSI introduces additional technologies that will reduce overhead
and improve performance. These changes will allow data to transfer safely and
reliably at 320 MB/sec. Ultra320 SCSI includes the following key features:
- Double Transfer Speed: Ultra320 doubles the burst rate across the
SCSI bus to 320 MB/sec, which raises the point at which the attached disk
drives saturate the bus. This results
in increased performance, especially in environments that use extended transfer
lengths or have many devices on a single bus.
- Packetized SCSI: This includes support for packet protocol. Packetized
devices decrease command overhead by transferring commands, data, and status
using DT (double transition) data phases instead of slower asynchronous phases.
This improves performance by maximizing bus utilization and minimizing command
overhead. Packet protocol also enables multiple commands to be
transferred in a single connection. In Ultra160 SCSI, by contrast, only data is
transferred synchronously at 160 MB/sec; the command and status phases still
run asynchronously and are limited to a single transfer
per connection.
- Quick Arbitration and Selection (QAS): This reduces the overhead
of handing control of the SCSI bus from one device to another. This improvement
reduces command overhead and maximizes bus utilization.
- Read and Write Data Streaming: This minimizes the overhead of data
transfer by allowing the target to send one data stream LUN Q-TAG (LQ) packet
followed by multiple data packets. In a non-streaming transfer, there is one
data LQ packet for each data packet. Write data streaming performance is also
increased because the bus turn-around delay (from DT data in to DT data out)
is not incurred between each LQ and data packet.
- Flow Control: This allows the initiator to optimize its pre-fetching
of data during writes and flushing of data FIFOs during reads. The target
indicates when the last packet of a data stream will be transferred, which
allows the initiator to terminate the data pre-fetch or begin flushing
data FIFOs sooner than was previously possible.
- Ultra320 SCSI lines up with PCI-X
Faster I/O performance will saturate the PCI bus, so most host implementations
are tied to PCI-X. Disk drive media rates continue to increase: later this year
drive data rates are expected to exceed 40 MB/sec. At that rate, the average
number of drives in a server (four) can sustain about 160 MB/sec, so SCSI will
need to jump past Ultra160 SCSI in order to support their sustained throughput.
Under standard PCI the host bus has a maximum speed of 66 MHz. This allows
for a maximum transfer rate of 533 MB/sec across a 64-bit PCI bus. With Ultra160
SCSI, two SCSI channels on a single device achieve a maximum transfer rate of
320 MB/sec, leaving plenty of headroom before saturating the PCI bus. However,
at 320 MB/sec, two SCSI channels can now achieve 640 MB/sec, which will saturate
a 64-bit / 66 MHz PCI bus. In addition to doubling the performance of the
host bus from 533 MB/sec to a maximum of 1066 MB/sec, PCI-X adds protocol improvements
that make the bus more efficient than PCI. Together PCI-X and Ultra320
SCSI provide the bandwidth necessary for today’s applications.
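The bandwidth figures in this comparison follow from bus width multiplied by clock rate. The short sketch below reproduces them as a rough check. The function names are illustrative only, and the 66.67 MHz and 133.33 MHz values are the nominal PCI and PCI-X clock rates, which are assumed here rather than quoted from the text.

```python
def bus_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s for a parallel bus moving one word per clock."""
    return width_bits / 8 * clock_mhz


# Assumed nominal clocks: 66.67 MHz for PCI, 133.33 MHz for PCI-X.
PCI_64_66 = bus_bandwidth_mb_s(64, 66.67)    # roughly 533 MB/s
PCI_X_133 = bus_bandwidth_mb_s(64, 133.33)   # roughly 1066 MB/s


def dual_channel_load(per_channel_mb_s: float, channels: int = 2) -> float:
    """Aggregate peak rate of a multi-channel SCSI host adapter."""
    return per_channel_mb_s * channels


for name, rate in (("Ultra160", 160), ("Ultra320", 320)):
    load = dual_channel_load(rate)
    print(
        f"{name}: {load:.0f} MB/s, "
        f"fits 64-bit/66 MHz PCI: {load <= PCI_64_66}, "
        f"fits PCI-X: {load <= PCI_X_133}"
    )
```

These are peak figures only; sustained throughput on either bus is lower because of protocol overhead.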
3. Conclusion
Source: Adaptec
- Conclusion
Ultra320 SCSI is sure to add to the existing legacy of past SCSI technologies.
SCSI has come a long way from its original 5MB/sec transfer rate. At 320 MB/sec,
Ultra320 SCSI is only the latest in SCSI evolution. As technology continues
to move into the 21st century, the industry can continue to look forward to
new and faster SCSI technology. Ultra640 is already in development.
With new technologies such as packetized SCSI, QAS, training, and precompensation,
SCSI will continue to deliver performance safely and reliably for generations
to come. As performance continues to grow, so will the applications that can
take full advantage of greater I/O performance. PCI-X accelerates performance
across the host bus to 1066 MB/sec and Ultra320 SCSI is there to take full advantage
of this available bandwidth.
And as always, SCSI maintains its backward compatibility, allowing customers
to protect their investment while giving them the ability to grow
as their needs increase. No other I/O technology can provide these advantages.
SCSI continues to increase its performance, features, enhancements and market
share. Ultra320 SCSI is the newest example of SCSI’s continued commitment
to providing the industry with the I/O bandwidth necessary for an increasing
number of performance-hungry applications. SCSI will continue to evolve, and
with Ultra640 SCSI already on the roadmap, it will be impossible to replace.
4. Detailed Features
Source: Maxtor
- SCSI features
Additional detail about the features described below is available in the ANSI
standard document SCSI Parallel Interface - 4 (SPI-4). The latest draft of this
standard is available at
ftp://ftp.t10.org/t10/document.00/00-378r0.pdf.
DT (or "double-transition") data transfers: DT transfers use both the asserting
and negating transitions of the ACK and REQ signals on the SCSI bus for clocking
data transfers. This allows the transfer rate to be doubled without increasing
the frequency of the clock signal. Each transition of the clock signal transfers
two bytes of data, because DT transfers are defined for use only with wide (16-bit)
transfers.
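As a rough illustration of how clocking on both edges doubles the rate, the sketch below computes the burst rate from clock frequency, bus width, and the number of clock transitions used per cycle. The function name is illustrative, and the 40 MHz and 80 MHz REQ/ACK clock frequencies are assumptions that are not stated in the text above.

```python
def spi_rate_mb_s(clock_mhz: float, bus_width_bits: int = 16,
                  double_transition: bool = True) -> float:
    """Burst rate in MB/s: clock rate x edges used per cycle x bytes per edge."""
    edges_per_cycle = 2 if double_transition else 1
    bytes_per_edge = bus_width_bits // 8
    return float(clock_mhz * edges_per_cycle * bytes_per_edge)


# Assumed clock frequencies (not given in the text): 40 MHz and 80 MHz.
print(spi_rate_mb_s(40, double_transition=False))  # single-transition, 40 MHz clock
print(spi_rate_mb_s(40))                           # DT, 40 MHz clock: 160 MB/s
print(spi_rate_mb_s(80))                           # DT, 80 MHz clock: 320 MB/s
```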
CRC (or "cyclic redundancy check"): CRC is an algorithm that a sender
uses to generate check bytes from transferred data. These check bytes are then
transmitted immediately following the data. The recipient calculates check bytes
from the received data and compares the result to the check bytes received following
the data. If the two sets of check bytes match, the data is accepted as correct. In this
manner CRC provides improved data reliability. CRC is defined for use only with
DT transfers.
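The sender and recipient roles described above can be sketched as follows. This is a minimal illustration, assuming Python's `zlib.crc32` as a stand-in 32-bit CRC and a big-endian trailer; it does not claim to reproduce the exact bit ordering or framing that SPI-4 defines, and the function names are hypothetical.

```python
import struct
import zlib


def send_with_crc(data: bytes) -> bytes:
    """Sender: append four check bytes computed over the data."""
    return data + struct.pack(">I", zlib.crc32(data))


def receive_and_check(frame: bytes) -> bytes:
    """Recipient: recompute the check bytes and compare with those received."""
    data = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(data) != received_crc:
        raise ValueError("CRC mismatch: data corrupted in transit")
    return data


frame = send_with_crc(b"one sector of payload")
print(receive_and_check(frame))

# Flip a single bit to model a transmission error; the check now fails.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
try:
    receive_and_check(damaged)
except ValueError as err:
    print(err)
```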
Note: DT clocking, CRC, and other protocol components were developed for Ultra160
and patented by Quantum, and are offered under "no-fee" license agreements to
all.
Simple domain validation (also known as "physical layer integrity checking"):
Simple domain validation defines how an initiator can use the INQUIRY command
to query targets to determine their capabilities (e.g., maximum transfer rate),
the system configuration (e.g., the width of the bus), and the basic functionality of
the system components. It also defines how the initiator can use the WRITE BUFFER
and READ BUFFER commands to send known data patterns to the targets and read
them back for simple data integrity validation.
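The query-then-echo sequence described above might look like the sketch below. Everything here is hypothetical: `SimulatedTarget` stands in for a real device, its methods only mimic the INQUIRY, WRITE BUFFER, and READ BUFFER commands, and the step of retrying at a lower rate after a failed comparison is an assumption about how an initiator might react, not something the text specifies.

```python
class SimulatedTarget:
    """Hypothetical stand-in for a SCSI target, used only for illustration."""

    def __init__(self, max_rate_mb_s, corrupt_above_mb_s=None):
        self.max_rate_mb_s = max_rate_mb_s
        self.corrupt_above_mb_s = corrupt_above_mb_s
        self._buffer = b""

    def inquiry(self):
        # Mimics INQUIRY: report the target's capabilities.
        return {"max_rate_mb_s": self.max_rate_mb_s}

    def write_buffer(self, data, rate_mb_s):
        # Mimics WRITE BUFFER; above the threshold, model a marginal link
        # by flipping one bit in every byte.
        limit = self.corrupt_above_mb_s
        if limit is not None and rate_mb_s > limit:
            data = bytes(b ^ 0x01 for b in data)
        self._buffer = data

    def read_buffer(self):
        # Mimics READ BUFFER: return whatever was stored.
        return self._buffer


PATTERN = bytes([0x00, 0xFF, 0x55, 0xAA]) * 16  # known data pattern


def validate(target, candidate_rates=(320, 160, 80, 40)):
    """Return the fastest rate at which the pattern echoes back intact."""
    limit = target.inquiry()["max_rate_mb_s"]
    for rate in candidate_rates:
        if rate > limit:
            continue
        target.write_buffer(PATTERN, rate)
        if target.read_buffer() == PATTERN:
            return rate
    return None


print(validate(SimulatedTarget(320)))
print(validate(SimulatedTarget(320, corrupt_above_mb_s=160)))
```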
Backward compatibility: Backward compatibility means that a device supporting
a new feature set can be used in physical configurations with devices that only
support transfer rates and protocols previously defined for the SCSI interface.
Examples include: the ability for transceivers to operate in "single-ended"
mode (as opposed to the LVD, or "low-voltage differential", mode required by
the higher transfer rates), the ability to tolerate five-volt single-ended signaling
from older devices, and the ability to function properly with the current cable
plant specifications (i.e., 25 meters in a point-to-point configuration or 12
meters with up to 16 devices on the bus).
Information unit transfers (or "IU transfers", also known as "packetized"
or "packetization"): IU transfers provide a protocol to significantly increase
overall system performance. Some of the elements of the protocol that provide
this performance increase include:
• A method for non-data transfers (like commands sent from the initiator to
the target and status sent from the target to the initiator) to occur at the
maximum negotiated data rate of up to 320 megabytes per second for Ultra320
SCSI, as opposed to those same transfers occurring in asynchronous mode at
five megabytes per second;
• A method to transfer SPI information units for a number of I/O processes without
an intervening physical disconnection (e.g., an initiator could send several
packets each containing a queued command to the target during a single physical
connection without intervening BUS FREE phases);
• Minimizing the overhead required by eliminating several bus phase changes
per I/O process. For example, a typical WRITE operation using normal data group
transfers would require ARBITRATION, SELECTION, COMMAND, DATA OUT, STATUS, and
MESSAGE IN phases. The same WRITE operation using IU transfers would only require
ARBITRATION, SELECTION, DATA OUT, and DATA IN phases. The command and data would
be transferred during the DATA OUT phase, and the STATUS and COMMAND COMPLETE
message information would be transferred during the DATA IN phase, all at the
maximum data rate.
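The WRITE example above can be restated as data to make the saving explicit. The sketch below simply lists the two phase sequences from the text and counts the phases that run asynchronously; the variable and function names are illustrative.

```python
# Phase sequences for a WRITE, as listed in the text.
NORMAL_WRITE = ["ARBITRATION", "SELECTION", "COMMAND",
                "DATA OUT", "STATUS", "MESSAGE IN"]
IU_WRITE = ["ARBITRATION", "SELECTION", "DATA OUT", "DATA IN"]

# Information transfer phases that run in asynchronous mode without IU transfers.
ASYNC_PHASES = {"COMMAND", "STATUS", "MESSAGE IN"}


def async_info_phases(sequence):
    """Return the phases in a sequence that transfer at the asynchronous rate."""
    return [phase for phase in sequence if phase in ASYNC_PHASES]


print(len(NORMAL_WRITE) - len(IU_WRITE), "fewer phases with IU transfers")
print("asynchronous phases, normal:", async_info_phases(NORMAL_WRITE))
print("asynchronous phases, IU:    ", async_info_phases(IU_WRITE))
```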
QAS (or "Quick Arbitration and Selection"): QAS allows for increased
overall system performance by providing a method for arbitration to occur without
intervening BUS FREE phases. QAS can only be enabled if information unit transfers
are enabled.
Note: Packetized transfers and QAS can each save several microseconds per operation, as
this is the scale of the time it takes to perform functions like arbitration
and bus turnaround. For example, it takes 3.2 microseconds to transfer one sector
of data (512 bytes) at 160 megabytes per second for Ultra160 SCSI. Since this
time goes down to 1.6 microseconds at 320 megabytes per second for Ultra320
SCSI, it's possible for the overhead required for a single-sector READ command
to be several times greater than the time required to transfer the data for
the command when using normal data group transfers (i.e., "non-packetized" or standard
parallel SCSI transfer mode).
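The sector timings in this note are straightforward to reproduce. In the sketch below, the 5-microsecond overhead is a made-up value chosen only to illustrate "several microseconds"; it is not a measured or specified figure, and the function names are illustrative.

```python
def transfer_time_us(num_bytes: int, rate_mb_s: float) -> float:
    """Time to move num_bytes at rate_mb_s; bytes / (MB/s) gives microseconds."""
    return num_bytes / rate_mb_s


def overhead_ratio(overhead_us: float, num_bytes: int, rate_mb_s: float) -> float:
    """How many times larger the fixed overhead is than the data transfer time."""
    return overhead_us / transfer_time_us(num_bytes, rate_mb_s)


SECTOR_BYTES = 512
ASSUMED_OVERHEAD_US = 5.0  # hypothetical per-command overhead

for name, rate in (("Ultra160", 160), ("Ultra320", 320)):
    t = transfer_time_us(SECTOR_BYTES, rate)
    ratio = overhead_ratio(ASSUMED_OVERHEAD_US, SECTOR_BYTES, rate)
    print(f"{name}: {t:.1f} us per sector, overhead is {ratio:.1f}x the data time")
```

Because the overhead stays fixed while the data time halves, the overhead's share of each single-sector operation grows as the bus gets faster.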
SCSI bus fairness (or simply "fairness"): Fairness prevents a device
from "hogging" the bus by guaranteeing that all devices have an opportunity
to arbitrate. Fairness must be enabled when QAS is enabled, as "hogging" could
potentially be more of an issue with that protocol.
AIP (or "Asynchronous Information Protection"): AIP provides an enhanced
error detection method for the COMMAND, MESSAGE, and STATUS asynchronous information
transfer phases. In systems without AIP, these phases transfer information on
the lower eight data bits of a SCSI bus with only parity protection on those
transfers. AIP transfers error detection information (a BCH Hamming code) on
the upper eight data bits of the data bus simultaneously with the information
transfer. The protection code will detect all errors of three bits or fewer,
all errors of an odd number of bits, and 98.4% of all possible errors.
Free-running clock (sometimes called "FRC"): A free-running clock is
used to improve data integrity of the clock signal by removing intersymbol interference
(or "ISI"). ISI is the effect of a transition on a signal line on transitions
immediately before or after it on the same line. A pulse (or "symbol") will
cause a nearby preceding pulse to shift forward in time, and it will cause a
nearby subsequent pulse to shift backward in time (i.e., a pulse will "interfere"
with the placement in time of adjacent pulses). By having a clock running at
a constant frequency, this effect is neutralized. The free-running clock is
restricted for use with packetized DT transfers at a 320-megabyte per second
or greater transfer rate.
Skew compensation of data signals relative to the clock signal: Skew
is the difference in time between one signal on a bus arriving at a point (e.g.,
a recipient's connector) relative to a second signal launched by the sender
at the same time on another line on the same bus. This is caused by any combination
of several factors, including differences in PCB trace or cable length and different
electrical characteristics of the different signal paths. A device looks for
the state of the data signals during a "data valid" window in time established
by the clock. If a data transition is skewed so much relative to the clock that
it falls outside of the window, the device will not accurately detect the data.
One of the largest numbers in the error budget for Ultra320 is skew. At this
transfer rate a one nanosecond difference in the time a signal arrives at the
recipient relative to the clock could be the difference between good data and
an error. For Ultra320 the receiving device performs skew compensation on all
data signals simultaneously while examining a known data pattern (see the description
of Training pattern that follows for more detail). By knowing when data transitions
should occur on the signal lines, the receiving device determines the time shift
each data signal needs so that it falls at or near the
center of the data valid window. This shift is then applied to the signals on
all subsequent data transmissions.
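A simplified model of this centering step is sketched below. It assumes each line's data valid window is a full 6.25 ns transfer period (the reciprocal of 160 million transfers per second, a figure not given in the text) and that the receiver has measured when each line's transition arrived during the known pattern. Real windows are narrower, and the function names and sample timings are hypothetical.

```python
BIT_PERIOD_NS = 6.25  # assumed transfer period at 160 million transfers per second


def skew_shifts(transition_times_ns, clock_edge_ns, window_ns=BIT_PERIOD_NS):
    """Per-line shift that puts the center of each data window on the clock edge."""
    return [clock_edge_ns - (t + window_ns / 2) for t in transition_times_ns]


def is_sampled_correctly(transition_ns, shift_ns, clock_edge_ns,
                         window_ns=BIT_PERIOD_NS):
    """True if the clock edge falls inside the (shifted) data valid window."""
    start = transition_ns + shift_ns
    return start < clock_edge_ns < start + window_ns


# Hypothetical measured transition times for four data lines, in nanoseconds.
measured = [0.0, 1.0, -1.5, 4.0]
clock_edge = 3.125

shifts = skew_shifts(measured, clock_edge)
for t, s in zip(measured, shifts):
    before = is_sampled_correctly(t, 0.0, clock_edge)
    after = is_sampled_correctly(t, s, clock_edge)
    print(f"transition at {t:+.2f} ns: shift {s:+.2f} ns, "
          f"ok before={before}, ok after={after}")
```

In this model the line whose transition arrives at 4.0 ns is sampled incorrectly until its shift is applied, which mirrors the case of a transition falling outside the window.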
Training pattern: The training pattern is a pre-determined pattern that
is transmitted from the sender to the receiver at a specified time. Because
the receiver knows what the pattern will be (i.e., exactly when data transitions
should occur), it can use portions of this pattern to perform skew compensation.
Other portions of this pattern are used by devices implementing adaptive active
filtering ("AAF", or receiver equalization, described later in this section)
to set the gain of the amplification and other signal adjustments. The definition
of the training pattern in the most recent draft of the ANSI standard allows
the target to control how often the pattern is sent. The pattern may be sent
before each data transmission or after some period of time or event such as
a bus reset caused by a new device being added to the system.
Transmitter precompensation with cutback: Transmitter precompensation
with cutback is an "open loop" method of trying to compensate for signal loss
on the first pulse of a transition by "boosting" the amplitude of the first
part of a transition, or "cutting back" the signal for the remainder of the
transition. This method compensates for some of the signal loss that is most
severe on the first part of a transition. Transmitter precomp is called "open
loop" because there is no standard method for the transmitter to receive feedback
from the receiver as to how much cutback should be used in any particular case
or to adjust dynamically to changes in configurations (e.g., "hot swapping"
of devices in systems).
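The cutback idea can be illustrated with a toy driver model: the first bit after a transition is driven at full amplitude, and repeated bits of the same value are driven at a reduced level. The 0.67 cutback factor and the normalized signed levels below are assumptions chosen for illustration, not values from the standard, and `drive_levels` is a hypothetical name.

```python
def drive_levels(bits, full=1.0, cutback=0.67):
    """Return a signed drive level per bit: full after a transition, cut back otherwise."""
    levels = []
    previous = None
    for bit in bits:
        # A change from the previous bit is a transition, so drive at full strength.
        amplitude = full if bit != previous else cutback
        levels.append(amplitude if bit else -amplitude)
        previous = bit
    return levels


print(drive_levels([1, 1, 1, 0, 1, 0, 0]))
```

Because the factor is fixed in advance rather than tuned from receiver feedback, the model is "open loop" in the same sense as the text describes.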
AAF (or "Adaptive Active Filter", also known as "receiver equalization with
filtering"): AAF uses the training pattern for adaptive equalization of
the received signal while removing unwanted noise components of the signal with
a filter. This method significantly improves the quality of the received signal
(background on Quantum's development and additional detail of this feature are
in the next section below). Using the training pattern to perform this adjustment
of signal amplitude provides for an inherent "closed loop" system that adjusts
signal quality for different cable plants and changes in other conditions. In
addition, a standard method has been developed for a receiver
to disable transmitter precomp in a transmitter. This method was developed because
a transmitter-receiver nexus where the receiver implements AAF provides better
signal quality when transmitter precomp is disabled, and significantly better
signal quality than a nexus with transmitter precomp only.