Bunch of Wires (BoW) PHY Specification
The Open Domain-Specific Architecture BoW Workstream
DRAFT Version 1.9d
January 2nd, 2023
This section provides the glossary used in this specification.
Term | Abbreviation | Definition |
---|---|---|
Bunch of Wires | BoW | The name for the PHY specification defined in this document |
Die-to-die | D2D | Generic term used to refer to on-package interconnect |
BoW Mode | N/A | A specific defined mode of operation for a BoW interface |
picojoules per bit | pJ/bit | Energy required to transport a bit of data over a D2D interface |
PHY | N/A | The set of circuitry physically communicating bits from one die to another |
Core Logic | N/A | Digital logic transmitting data to/from the PHY |
Control Logic | N/A | Logic used to manage the operation of the PHY |
Tera/Giga bits per second | Tbps/Gbps | Measures of the speed of data transmission on the PHY |
Beachfront | N/A | The length of die edge required by a PHY implementation |
Clock | N/A | A signal that regulates the speed of data transmission |
Bump | N/A | Solder balls grown on a die to allow connection to off-die wires |
Channel | N/A | A term for the physical connection between a transmitter and a receiver |
Test | N/A | The process of verifying functional correctness of a circuit |
Initialization | N/A | The process of preparing an interface for data transmission |
This document uses the following terms as defined below.
Contributions to this Specification are made under the terms and conditions set forth in the modified Open Web Foundation Contributor License Agreement (“OWF CLA 1.0”) (“Contribution License”) by:
ANALOG PORT, BLUE CHEETAH ANALOG DESIGN, D-MATRIX, IBM, KEYSIGHT, TESSOLVE, VENTANA MICRO
You can review the signed copies of the applicable Contributor License(s) for this Specification on the OCP website at http://www.opencompute.org/products/specsanddesign
Usage of this Specification is governed by the terms and conditions set forth in the modified Open Web Foundation Final Specification Agreement (“OWFa 1.0”) (“Specification License”).
Notes:
NOTWITHSTANDING THE FOREGOING LICENSES, THIS SPECIFICATION IS PROVIDED BY OCP “AS IS” AND OCP EXPRESSLY DISCLAIMS ANY WARRANTIES (EXPRESS, IMPLIED, OR OTHERWISE), INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE, OR TITLE, RELATED TO THE SPECIFICATION. NOTICE IS HEREBY GIVEN, THAT OTHER RIGHTS NOT GRANTED AS SET FORTH ABOVE, INCLUDING WITHOUT LIMITATION, RIGHTS OF THIRD PARTIES WHO DID NOT EXECUTE THE ABOVE LICENSES, MAY BE IMPLICATED BY THE IMPLEMENTATION OF OR COMPLIANCE WITH THIS SPECIFICATION. OCP IS NOT RESPONSIBLE FOR IDENTIFYING RIGHTS FOR WHICH A LICENSE MAY BE REQUIRED IN ORDER TO IMPLEMENT THIS SPECIFICATION. THE ENTIRE RISK AS TO IMPLEMENTING OR OTHERWISE USING THE SPECIFICATION IS ASSUMED BY YOU. IN NO EVENT WILL OCP BE LIABLE TO YOU FOR ANY MONETARY DAMAGES WITH RESPECT TO ANY CLAIMS RELATED TO, OR ARISING OUT OF YOUR USE OF THIS SPECIFICATION, INCLUDING BUT NOT LIMITED TO ANY LIABILITY FOR LOST PROFITS OR ANY CONSEQUENTIAL, INCIDENTAL, INDIRECT, SPECIAL OR PUNITIVE DAMAGES OF ANY CHARACTER FROM ANY CAUSES OF ACTION OF ANY KIND WITH RESPECT TO THIS SPECIFICATION, WHETHER BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND EVEN IF OCP HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The Bunch of Wires (BoW) specification defines a versatile, open and interoperable physical interface between two chiplets or chip-scale-packages (CSP) in a common package, and is fully backwards compatible with the Bunch of Wires specification. This document specifies the BoW interface PHY layer, and defines a set of die-to-die parallel interfaces that give implementers and adopters the flexibility to trade off throughput per chip edge for design complexity, cost, and packaging technology. The use of BoW is expected to be confined to connecting die placed close to one another within the same package. In this environment, signal attenuation is small and the interface can be simple. The definition of the BoW interface aims to meet the following OCP tenets:
The BoW specification provides several key advantages for chiplet-based systems:
Compared to SerDes, BoW uses a lower data rate per wire, so it requires more wires. However, the lower data rates allow use of single-ended signaling and denser wire packing. In addition, in laminates, BoW can take advantage of multiple wiring layers, and in advanced packaging it can take advantage of the much-increased wire density.
Date | Revision | Author | Description |
---|---|---|---|
07/26/22 | Draft 1.1a | Elad Alon | Initial updates from 1.0 to 1.1 |
09/30/22 | Draft 1.1b | Marek Hempel, Elad Alon | Half-slice and power management added |
10/26/22 | Draft 1.1c | Ken Poulton, Shahab Ardalan, Elad Alon | Configurable directionality, redundancy added |
1/2/23 | Draft 1.1d | Elad Alon | Updated timing requirements, sideband slice definition added |
1/18/23 | Draft 1.9 | Bapi Vinnakota | Updated ODSA overview |
The scope of this document has several levels.
The specification of the BoW interface includes these requirements:
The specification includes recommendations for these elements:
The following activities are outside the scope of this document:
The following aspects may be addressed in subsequent versions of this specification:
This section provides an overview of the BoW physical interface (PHY) and its use in a multi-chiplet design.
The specifications must be met over process variation, supply voltage range and temperature range (PVT). Each implementation must document its supported I/O voltage range, supply voltage range and temperature range.
Table 2 summarizes the conformance points that shall be met in order to comply with the BoW specification. Each of the conformance points is discussed in the specification.
Description | Section | Detail |
---|---|---|
BoW Modes | 5.4 | |
Die-to-die Signals (Wires) | 6.2 | |
Slice Logic Interface | 6.4.1 | |
BoWx Modes and Reach | 7 | Table 8 |
Wire and Slice Ordering | 8 | |
Voltages and Termination Resistance | 9.1 | |
PHY Protection | 9.2 | |
ESD | 9.3 | |
Return Loss and Parasitic Capacitance | 9.4 | |
Clocking | 10.2 | |
Clock and Data Specs | 10 | |
Channel Skew | 12.1 | |
External Facilities | 14.1 | |
Initialization | 14.2 | |
Control Register Mapping | 16 | |
BoW is an energy-efficient, easy-to-use PHY interface between a pair of die inside a single package as shown in Figure 1. The BoW PHY is defined as a single unidirectional slice. Multiple slices are combined to create links of the desired throughput. A link may be symmetric, asymmetric or unidirectional. The BoW PHYs between two die are physically connected through wires on a substrate or interposer. A BoW PHY does not have enough drive strength for off-package interfaces, nor is it designed for buses that are entirely on die.
This document specifies the protocol for a BoW PHY slice. The aggregation of multiple PHYs into a link is beyond the scope of this document.
A BoW PHY slice either transmits or receives 16 bits of data between die. The BoW is a source-synchronous PHY and each transmitting PHY slice transmits a complementary clock signal CLK+ and CLK- with the data. A BoW PHY optionally has two additional wires designated FEC (for Forward Error Correction) and AUX, for other optional functions such as Data Bus Inversion (DBI).
Within the package, the BoW datapath is transported on physical passive wires between the pair of connected die. The specifics of the wires, such as their density, maximum length, impedance characteristics and how they are realized vary with the packaging technology. In order to minimize power, unterminated and source-terminated links will have short reaches requiring chips to be adjacent.
A BoW PHY must be operable in one of the BoW Modes listed in ascending order in Table 3. A BoW Mode defines the speed of clock and data of the PHY on the die-to-die wires. In all modes, the data must be clocked DDR: the chip-to-chip data wire bit rate is double the clock wire frequency. All BoW interfaces faster than BoW-64 should also be able to support BoW-64. Supporting rates other than the defined modes is an implementation choice. There is more detail on BoW Modes in section 7.
BoW Mode | Slice Data Rate (Gbps) | Wire Bit Rate (Gbps/wire) | TxClk (GHz) |
---|---|---|---|
BoW-32 | 32 | 2 | 1 |
BoW-64 | 64 | 4 | 2 |
BoW-128 | 128 | 8 | 4 |
BoW-256 | 256 | 16 | 8 |
BoW-384 | 384 | 24 | 12 |
BoW-512 | 512 | 32 | 16 |
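The relationships between the columns of Table 3 can be cross-checked numerically. The following is a minimal sketch, assuming 16 data wires per slice and DDR clocking as stated above; the function and constant names are illustrative and not part of this specification.

```python
# Illustrative cross-check of Table 3; all names here are hypothetical.
DATA_WIRES_PER_SLICE = 16  # a slice carries 16 data bits (FEC/AUX not counted)

# BoW mode name -> wire bit rate in Gbps/wire, transcribed from Table 3.
WIRE_RATE_GBPS = {
    "BoW-32": 2, "BoW-64": 4, "BoW-128": 8,
    "BoW-256": 16, "BoW-384": 24, "BoW-512": 32,
}

def slice_data_rate_gbps(mode: str) -> int:
    """Slice data rate = wire bit rate x 16 data wires."""
    return WIRE_RATE_GBPS[mode] * DATA_WIRES_PER_SLICE

def tx_clk_ghz(mode: str) -> float:
    """DDR: two bits per clock period, so the clock is half the wire bit rate."""
    return WIRE_RATE_GBPS[mode] / 2

for mode in WIRE_RATE_GBPS:
    print(mode, slice_data_rate_gbps(mode), "Gbps,", tx_clk_ghz(mode), "GHz")
```

The number in each mode name matches the slice data rate that the sketch computes, which is a quick way to confirm a transcription of the table.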
Figure 2 shows the tradeoff between package, data rate, termination, and reach. Source-terminated BoW on laminate allows a longer reach than advanced packaging, but the wider design rules in laminate mean that both of these cases are barely able to reach 8 Gbps/wire. A doubly-terminated link offers longer distances and higher rates, but requires a more complicated receiver design.
Figure 3 shows the logic interface between a BoW slice and the digital Link Layer logic in a chip. The speed at the logic interface (Figure 1) is implementation-dependent. Typically, PCLK will be the TxClk frequency divided by a power of 2, so 250, 500 and 1000 MHz are common rates. The data at the logic interface is SDR (bit rate equal to PCLK frequency).
This section specifies the control and data signals into and out of device logic and package for BoW RX and TX slices.
As shown in Figure 1, each BoW slice consists of a differential clock pair, 16 single-ended data wires, and an optional pair of wires FEC and AUX.
FEC (Forward Error Correction) is an optional signal that allows using FEC to improve the bit error rate (BER), or may be used for redundancy for defect repair. Because FEC uses an additional wire when it is enabled, neither the payload data rate nor the wire data rate is affected. This allows F(PCLK) = F(TxClk) / 2^n with FEC off or on, which simplifies the clock generation and serialization functions. If used, FEC is implemented in the Link Layer, and the PHY treats the FEC bit the same as the other data bits.
AUX is an optional signal that may be used for purposes such as Data Bus Inversion (DBI), flow control, redundancy for defect repair, etc. The Link Layers of Chiplets A and B will need to agree on the details on FEC and AUX usage. An implementation may choose to support the FEC and AUX wires, or to omit both of them. If FEC and AUX are included in a PHY implementation, the PHY carries them in the same way as the data bits without acting on the content.
Table 4 summarizes these signals.
Function | # Wires | Signal Name | Notes |
---|---|---|---|
Clock | 2 | CLK+, CLK- | Differential |
Data | 16 | D0-15 | |
Forward Error Correction | 0/1 | FEC | Optional |
Auxiliary | 0/1 | AUX | Optional |
Data Bus Inversion (DBI) may be used to mitigate simultaneous switching output (SSO) noise or to optimize energy of a BoW PHY by reducing the number of BoW data wires that switch between adjacent data transfer cycles. DBI functionality is optional; it is one of several possible uses of the AUX wire. If implemented, DBI is in the Link Layer and must be implemented on both RX and TX.
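This specification leaves the DBI algorithm to the Link Layer. As one possible illustration only, the sketch below shows a common transition-minimizing scheme over a 16-bit word, with the invert flag assumed to travel on the AUX wire. The function names and the more-than-half threshold are assumptions, not requirements of this specification.

```python
# Hypothetical DBI sketch; the BoW spec does not define the DBI algorithm.
WIDTH = 16
MASK = (1 << WIDTH) - 1

def dbi_encode(word: int, prev_wires: int) -> tuple[int, int]:
    """Return (wire_value, invert_flag) for one transfer.

    Inverts the word when more than half of the 16 data wires would
    otherwise toggle relative to the previous wire state.
    """
    toggles = bin((word ^ prev_wires) & MASK).count("1")
    if toggles > WIDTH // 2:
        return (~word) & MASK, 1
    return word & MASK, 0

def dbi_decode(wire_value: int, invert_flag: int) -> int:
    """Undo the inversion at the receiver."""
    return (~wire_value) & MASK if invert_flag else wire_value & MASK

prev = 0x0000
wires, flag = dbi_encode(0xFFF0, prev)  # 12 of 16 wires would toggle
assert dbi_decode(wires, flag) == 0xFFF0
```

With this scheme at most 8 of the 16 data wires toggle per transfer; the flag wire itself may also toggle, which this sketch does not count.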
Figure 3 shows the data and control signals in the interfaces to the logic in the die in each BoW transmit and receive slice. The data at the slice logic interface must be SDR (Single Data Rate - bit rate equal to the PCLK frequency).
The signals in Table 5 shall constitute the data and clocks in the logic interface of the PHY. N is the ratio of the chip-to-chip per-wire data rate to the logic interface per-wire data rate.
Signal | # Bits | TX Slice | RX Slice | Description |
---|---|---|---|---|
PD | 16*N | In | Out | Data |
PFEC | N or 0 | In | Out | Forward Error Correction (optional) |
PAUX | N or 0 | In | Out | Auxiliary uses (optional) |
PCLK | 1 | Out | Out | |
TxClk | 1 | In | NA | Comes from a PLL or other clock source, not the Link Layer. The TxClk source is usually shared among many TX slices. May be differential |
RxClk | 1 or 0 | NA | Out | May be differential |
Figure 4 shows the block diagram of a bidirectional slice. A bidirectional slice shall have one set of 18 or 20 signal bumps and wires and two sets of signals connecting to the chiplet's core logic (or Link Layer).
In mission mode, each slice must have only the PHY TX or the PHY RX enabled. For loopback test, both may be enabled. For loopback, it is recommended to enable testing one slice at a time to avoid drawing both RX and TX power of all the slices in the whole link at the same time.
The data at the slice logic interface must be SDR (Single Data Rate - bit rate equal to the TXPCLK/RXPCLK frequency).
The signals in Table 6 shall constitute the data and clocks in the logic interface of a bidirectional PHY. N is the ratio of the chip-to-chip per-wire data rate to the logic interface per-wire data rate.
Signal | # Bits | Direction | Description |
---|---|---|---|
TXD | 16*N | In | TX Data |
TXFEC | N or 0 | In | Forward Error Correction (optional) |
TXAUX | N or 0 | In | Auxiliary uses (optional) |
TXPCLK | 1 | Out | SDR clock for the TXD, TXFEC, TXAUX bits |
TXCLK | 1 | In | Comes from a PLL or other clock source, not the Link Layer. The TXCLK source is usually shared among many TX slices. May be differential |
RXD | 16*N | Out | RX Data |
RXFEC | N or 0 | Out | Forward Error Correction (optional) |
RXAUX | N or 0 | Out | Auxiliary uses (optional) |
RXPCLK | 1 | Out | SDR clock for the RXD, RXFEC, RXAUX bits |
RXCLK | 1 or 0 | Out | May be differential |
A BoW interface slice must provide the control and status signals shown in Table 7.
Signal | # Bits | TX Slice | RX Slice | Description |
---|---|---|---|---|
PHYResetB | 1 | In | In | Resets the BoW slice. 0 causes a reset |
PHYReady | 1 | Out | Out | Indicates that the PHY is ready to transmit/receive mission mode data. 1 indicates ready |
PHYIdle | 1 or 0 | In | N/A | Optional signal. Active high indicates to the TX slice that it should enter the clock gated state on the next parallel word aligned clock edge |
The PHYResetB pin shall be asserted by the link controller to initialize the PHY. While the PHYResetB signal is asserted, the PHY shall stay in its reset state. When the PHYResetB signal is de-asserted, the PHY shall perform any necessary self-alignment. The reset states are otherwise implementation-dependent and shall be documented in the datasheet of a particular implementation.
On a TX slice, the PHY shall assert PHYReady to indicate it is transmitting appropriate CLK and PCLK signals, and that it is ready to transmit data.
On an RX slice, when PHYResetB is deasserted, the PHY assumes that the corresponding TX slice is sending CLK and that the TX Link Layer is sending training data on the data wires.
After the RX slice clock self-alignments are complete, each RX PHY slice shall assert its PHYReady pin. How an RX PHY slice determines completion of the self-alignment is implementation-dependent. For instance, it may be determined by observing the settling of the DLL or by a simple timer. PHYReady asserted indicates that any data received will be captured correctly.
Further description of the optional PHYIdle signal and its functionality is provided in Section 11.
There shall be an AMBA APB programming interface to control internal registers for control and status readout of the PHY.
The internal registers are implementation-dependent. The internal registers shall be fully documented in the PHY datasheet.
There shall be a Link Controller (LC) outside the PHY. This will manage initialization of the Link. It may reside on one of the chiplets of the link, in a third chiplet in the package or outside the package.
Communication from the Link Controller across chiplets shall be by a transport mechanism outside the BoW link. This could be a serial link like SPI or I2C, but this is not specified at this time.
Link initialization is described in Section 14. Clocks are described in Section 10.2.
A BoW PHY slice must conform to at least one of the BoW Modes seen in Table 3. The recommended maximum wire reach for different packaging types and terminations is seen in Table 8. Exceeding these reach values may degrade the voltage margins at the receiver. See section 12 for how TX, RX and channels are qualified.
“Laminate” is intended to include organic laminate packages (a.k.a. “buildup”) and similar technologies with approximately 25 μm line and space rules. The minimum wire length for closely spaced chips in these technologies is around 3 mm for the slice closest to the chip edge.
“Advanced” is intended to include silicon interposer and similar technologies. These have much finer line and space dimensions, but traces are usually much more resistive than in organic laminate packages and will be limited to much shorter trace lengths. Due to these short traces, termination is not expected to be useful for implementations targeting Advanced packaging. The minimum wire length in these technologies may be less than 1 mm.
BoW Mode | Wire Bit Rate (Gbps/wire) | TxClk (GHz) | Laminate, No Termination: Reach (mm) | Laminate, Source Termination: Reach (mm) | Laminate, Double Termination: Reach (mm) | Advanced, No Termination: Reach (mm) |
---|---|---|---|---|---|---|
BoW-32 | 2 | 1 | 10 | 20 | 25+ | 4 |
BoW-64 | 4 | 2 | NA | 10 | 25+ | 2 |
BoW-128 | 8 | 4 | NA | 5 | 25+ | 2 |
BoW-256 | 16 | 8 | NA | NA | 25+ | 2 |
BoW-384 | 24 | 12 | NA | NA | 25+ | 2 |
BoW-512 | 32 | 16 | NA | NA | 25+ | 2 |
Adding termination increases the speed and/or reach, at the expense of greater design complexity and power.
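A design tool might encode Table 8 as a lookup to flag wires that exceed the recommended reach. The following is a sketch under stated assumptions: the column keys and function names are hypothetical, NA entries are stored as `None`, and the "25+" entries are stored as 25 because the table gives only a lower bound there.

```python
# Recommended maximum reach in mm, transcribed from Table 8.
# Key names are illustrative. None = NA; "25+" is stored as 25 (lower bound).
COLUMNS = ("laminate/none", "laminate/source", "laminate/double", "advanced/none")

REACH_MM = {
    "BoW-32":  (10,   20,   25, 4),
    "BoW-64":  (None, 10,   25, 2),
    "BoW-128": (None, 5,    25, 2),
    "BoW-256": (None, None, 25, 2),
    "BoW-384": (None, None, 25, 2),
    "BoW-512": (None, None, 25, 2),
}

def recommended_reach_mm(mode: str, option: str):
    """Return the tabulated reach in mm, or None if the combination is NA."""
    return REACH_MM[mode][COLUMNS.index(option)]

def within_recommended_reach(mode: str, option: str, wire_length_mm: float) -> bool:
    """True if the wire length does not exceed the tabulated reach."""
    reach = recommended_reach_mm(mode, option)
    return reach is not None and wire_length_mm <= reach

print(within_recommended_reach("BoW-128", "laminate/source", 4.0))
```

Because the doubly terminated entries are open-ended, a result of `False` for a wire longer than 25 mm in that column means "not covered by the table" rather than "not permitted".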
The physical diagrams and descriptions in this document must be interpreted as looking down at the top layer of the unpackaged chiplets. Since these are flip-chip packages, these views are equivalent to looking through the bottom of the package with the balls up (dead bug view). For the view as seen looking down on a package as mounted on a PCB (live bug view), these views must be mirrored.
A BoW link between two chiplets is made up of wires, slices, and stacks as seen in Figure 5.
The minimal bidirectional reference link is shown in Figure 6.
In this example, each chiplet has one TX slice and one RX slice, arranged in two one-slice stacks on each chiplet. This is a dead-bug view.
For low-bandwidth or constrained die-edge applications a half-link or 8-bit link consisting of 8 data wires per half-slice is defined as shown in Figure 7.
The minimal bidirectional reference half-link is shown in Figure 8.
In this example, each chiplet has one TX half-slice and one RX half-slice, arranged in two half-slice stacks on each chiplet. This is a dead-bug view.
A half-link shall be compatible with a regular-width link by fanning out the signal wires D0-D7, the differential clock pins and optionally AUX and FEC (see Figure 9). Each half-slice shall be connected to a unique regular-width slice on the other chiplet. Connecting two half-slices on one chiplet to a regular slice on another chiplet is not permitted.
To accommodate this compatibility, the intended configuration (8-bit or 16-bit slices) shall be programmed into the PHY at startup via APB. Unused transmit pins should be open-circuited to save power, and unused receive pins should be disabled or make use of a weak pulldown (as discussed in Section 9.2).
Function | # Signals | Signal Name | Notes |
---|---|---|---|
Clock | 2 | CLK+, CLK- | Differential |
Data | 16 | D[15:0] | |
Forward Error Correction | 0/1 | FEC | Optional |
Auxiliary | 0/1 | AUX | Optional |
Each BoW slice consists of a differential clock pair, 16 single-ended data wires, and optional wires FEC and AUX. Each BoW slice is unidirectional when in operation. A PHY may be designed as RX-only and TX-only slices, or each slice may have both TX and RX capability, one of which is selected at configuration time. A bidirectional link is composed of some whole number of slices configured for RX and some whole number of slices for TX.
FEC (Forward Error Correction) is an optional signal that allows using error correction to improve the bit error rate (BER). AUX is an optional signal that may be used for purposes such as DBI, flow control, redundancy, etc. Chiplets A and B will need to agree on the details on FEC and AUX usage, which is defined in the Link Layer.
A BoW interface must conform to these wire order rules at the edge of the chip:
Note that bump patterns are not specified by BoW; only the signal ordering at the chip edge is specified for interoperability.
The reference example in Figure 6 uses hexagonal closest packing for the bumps: two rows for signal bumps and one row for power and ground bumps. In this pattern, the wire pitch is half the bump pitch.
Alternate bump arrangements may include:
Somewhat different wire pitches between two chiplets may be accommodated with fan-out in the chip-to-chip wires. This is limited by the maximum skew due to different wire lengths - see section 12.1.
An example cross section for an organic laminate (a.k.a. “buildup”) package is shown in Figure 10.
In an organic laminate package, signal layers should be alternated with ground layers in order to maintain a controlled impedance of 50 Ω. Each slice position (A, B, C, D) should be associated with one signal layer and there should be no mixing of signals from multiple slices.
In any technology, the position-A slice on chiplet A must be connected to the position-A slice on chiplet B (one must be configured for TX and one for RX). The position-B slices are connected together, and so on.
There is no specified limit to the number of slices in a stack. In organic laminate, the practical limit in 2020 is an 8-2-8 laminate which supports 4 slices as shown in Figure 10. A 7-2-7 laminate may support 4 slices by omitting the top GND layer, but with reduced signal integrity. Layers on the bottom side of the package typically cannot be used for BoW signals due to low via density passing through the thick central core layer.
In advanced packaging technologies, the shorter wire lengths and higher wire resistance suggest the use of non-controlled-impedance wires and unterminated transmitters and receivers. The smaller wire and space dimensions may allow the wires for multiple slices to be interleaved on a single wiring layer. The wire order within each slice must be maintained, even if interleaving with other slices is used.
To optimize the density of hexagonal bump arrays, slices in positions B and D may be offset horizontally by one half the bump pitch as seen in Figure 11. This necessitates a one-bump-pitch horizontal jog in the wires for slices B and D. The practical effect of this 130 μm jog across a 2.5+ mm wire between chiplets is very small.
An alternative arrangement is to keep the slices aligned vertically. This requires adding a small extra vertical space between the slices, for an overall increase of 4% of the slice area.
A BoW interface composed of unidirectional slices must conform to these slice numbering rules:
An example of this numbering is shown in Figure 12.
The signal ordering and slice numbering rules allow BoW chiplets to be connected without signal reordering regardless of chiplet rotations.
For bidirectional links, a pattern of alternating TX and RX stacks should be used. Figure 12 shows an example bidirectional link with 4 stacks of 4 slices each, for 8 TX and 8 RX slices on each chiplet. The first TX stack should be at the left edge of the link.
Asymmetric and unidirectional links may use any slice pattern, but the slice numbering rules must be observed.
In BoW-256 at 16 Gbps/wire, the link in Figure 12 provides a total of 2.0 Tb/s in each direction. In an organic substrate using the hexagonal bump pattern of Figure 6 with a bump pitch of 130 μm, the total edge width is 5.2 mm (4.16 mm without AUX and FEC); the depth from the edge is 1.35 mm. In an interposer, if the bump pitch is 40 μm, the edge width is 1.60 mm (or 1.28 mm without AUX and FEC) and the depth is 0.42 mm.
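The figures quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the counts of 10 bump pitches per stack (8 without AUX and FEC) are inferred from the quoted edge widths rather than taken from a bump map in this specification.

```python
# Illustrative arithmetic for the Figure 12 example; names are hypothetical.
DATA_WIRES_PER_SLICE = 16

def link_throughput_gbps(slices_per_direction: int, wire_rate_gbps: int) -> int:
    """Payload throughput in one direction (FEC/AUX wires not counted)."""
    return slices_per_direction * DATA_WIRES_PER_SLICE * wire_rate_gbps

def edge_width_mm(stacks: int, pitches_per_stack: int, bump_pitch_um: int) -> float:
    """Edge width, assuming each stack spans a whole number of bump pitches.

    pitches_per_stack = 10 (8 without AUX/FEC) is inferred from the
    widths quoted in the text, not specified directly.
    """
    return stacks * pitches_per_stack * bump_pitch_um / 1000

print(link_throughput_gbps(8, 16))   # 8 TX slices at BoW-256
print(edge_width_mm(4, 10, 130))     # organic substrate, with AUX and FEC
print(edge_width_mm(4, 8, 40))       # interposer, without AUX and FEC
```

The first result, 2048 Gbps, is the 2.0 Tb/s quoted for each direction.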
A bidirectional BoW PHY slice is designed to operate as RX or TX, to be configured upon powerup. This allows complete flexibility in link configuration and interoperability and also provides an opportunity for wafer-level loopback testing before package assembly (known good die).
In bidirectional slices, the wires and slices must be numbered as if they are all TX slices.
Bidirectional slices are connected from chiplet to chiplet as in Figure 13: AUX to FEC, D0 to D15, D1 to D14… FEC to AUX.
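The mirrored connection described above can be expressed as a simple index reversal. This is a minimal sketch: the list order AUX, D0 to D15, FEC is only a device for generating the pairs named in the text, the function name is hypothetical, and the clock wires are not covered.

```python
# Signal wires of a bidirectional slice in mirror order (clocks omitted).
# The ordering is an assumption chosen to reproduce the pairs in the text.
WIRE_ORDER = ["AUX"] + [f"D{i}" for i in range(16)] + ["FEC"]

def partner_wire(name: str) -> str:
    """Wire on the far chiplet that connects to `name` on the near chiplet."""
    index = WIRE_ORDER.index(name)
    return WIRE_ORDER[len(WIRE_ORDER) - 1 - index]

for wire in ("AUX", "D0", "D1", "FEC"):
    print(wire, "->", partner_wire(wire))
```

Applying the mapping twice returns the original wire, which is what allows either end of the link to be configured as TX.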
In order to ensure interoperability between differing BoW PHY implementations, this chapter provides a set of electrical specifications that all such BoW PHY implementations must meet.
All BoW implementations must support signaling based on a 0.75 V “I/O voltage”. BoW PHYs may also support higher or lower signaling voltages, but must support 0.75 V based signaling for interoperability.
Note that the simplest implementation is to provide a 0.75 V supply voltage to the BoW VDD bumps, but the supply voltage may be different from the I/O voltage as long as the signal voltages meet the specification.
In doubly terminated modes of operation, the RX termination resistance must be connected to 0V, the I/O voltage, or mid-rail of the I/O voltage (e.g., 375 mV with a 0.75 V I/O voltage). The selection of termination voltage is expected to be static (hardwired) in the RX, and must be specified in the receiver's datasheet. It is expected that Source-Series-Terminated (SST) Transmitters will be largely agnostic to the choice of termination voltage on the receiver. Note that BoW receivers which support the optional data idle state described in Section 11.1 must terminate to 0V, and must not terminate to the I/O voltage or mid-rail.
Regardless of the value selected for the I/O supply voltage, BoW transmitters and receivers must meet the DC termination resistance requirements defined in Table 10. Note that TX/RX termination (output/input) resistance values are skewed low/high compared to the channel impedance in order to ensure that the DC single-ended voltage swing at the RX is never reduced to less than half of the I/O voltage (i.e., 375 mV for a 0.75 V I/O voltage). Note that these termination resistance values must be met with all combinations of data inputs (logical 1 and logical 0), termination voltage selections (ground terminated, supply terminated, or mid-rail terminated), and termination resistance values. For example, a BoW TX must achieve between 36 Ω and 50 Ω resistance when driving a load resistance (modeling the RX termination) of 50-69 Ω with any of the three termination options.
Parameter | Unterminated | Source-Terminated (BoW-256 or lower) | Doubly Terminated (BoW-256 or lower) | Doubly Terminated (BoW-384 or higher) |
---|---|---|---|---|
TX DC Term. | As required to meet TX risetime | 36 Ω - 50 Ω (0.72 - 1.0 Zchan) | 36 Ω - 50 Ω (0.72 - 1.0 Zchan) | 36 Ω - 43 Ω (0.72 - 0.86 Zchan) |
RX DC Term. | - | - | 50 Ω - 69 Ω (1.0 - 1.38 Zchan) | 59 Ω - 69 Ω (1.18 - 1.38 Zchan) |
Within-Slice DC Term. Matching | - | σ = 1.333% (8% over 6 σ) | σ = 0.667% (4% over 6 σ) | σ = 0.667% (4% over 6 σ) |
Especially in doubly terminated modes, within-slice variations of termination resistance would directly result in varying swing levels at each pin. Thus, in order to reduce or eliminate the need for per-pin voltage reference adjustment at the RX, Table 10 also specifies requirements on DC termination resistance matching across all I/O's within a given BoW slice. The σ for this variation in the table must be interpreted as capturing within-slice manufacturing variability across worst-case voltage/temperature operating conditions, and is expected to be primarily influenced by some combination of transistor and explicit resistor matching (with the mix depending on the circuit implementation).
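The relationship between the termination ranges in Table 10 and the minimum DC swing can be checked with a resistive divider. The sketch below assumes an ideal DC model: the TX is a voltage source in series with its termination resistance, the RX termination is a resistor to a fixed rail, and wire resistance is ignored. The function name is hypothetical.

```python
# DC divider model (an assumption): swing at RX = Vio * Rrx / (Rtx + Rrx).
IO_VOLTAGE = 0.75  # volts

def rx_swing_fraction(r_tx_ohm: float, r_rx_ohm: float) -> float:
    """Fraction of the I/O voltage seen as single-ended DC swing at the RX."""
    return r_rx_ohm / (r_tx_ohm + r_rx_ohm)

# Worst case is the highest TX resistance with the lowest RX resistance.
worst_256_or_lower = rx_swing_fraction(50, 50)    # BoW-256 or lower
worst_384_or_higher = rx_swing_fraction(43, 59)   # BoW-384 or higher

print(worst_256_or_lower * IO_VOLTAGE)   # volts of swing at 0.75 V I/O
print(worst_384_or_higher * IO_VOLTAGE)
```

Under this model the worst case for BoW-256 or lower lands exactly at half the I/O voltage, and the tighter ranges for BoW-384 or higher leave additional margin.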
The lifetime of a BoW PHY should not be negatively impacted (i.e., the PHY should not be damaged) despite exposure to the following conditions:
A BoW PHY implementation must document how long it can withstand without damage (i.e., without degradation in part lifetime) an accidental incorrect state with TX configured and enabled on both ends of a wire.
A bidirectional PHY slice shall have the TX circuits disabled (high impedance) upon assertion of PHYResetB and require an APB command to turn them on.
A unidirectional PHY TX slice need not be disabled at reset.
All RX slices must avoid “crowbar” states where a floating or mid-range input voltage can cause a large DC current to flow in the RX circuit. Therefore a PHY must either:
BoW I/O shall be designed to withstand 50 V CDM (Charged Device Model) and 250 V HBM (Human Body Model) at the bumps. This requirement is deemed sufficient for intra-package signaling, similar to other die-to-die interface standards.
Note that for CDM, the ESD current corresponding to 50 V depends on the package size. (E.g., 50 V CDM on a 65 x 65 mm package is ~1.3 A.) A BoW PHY shall document the package size assumed for CDM.
Since BoW PHYs are targeted for relatively dense and simple realizations, it is expected that the primary frequency-dependent parasitics seen at a PHY's I/Os will be capacitive in nature. Table 11 provides limits on the maximum “equivalent” capacitance allowed on each side of each BoW I/O pin. (E.g., a BoW-128 TX is allowed to have up to 500 fF of equivalent capacitance.) Note that while the maximum capacitance specification does increase at lower data-rates, it is recommended that BoW PHY implementations retain as low of a capacitance as practical in order to reduce power consumption and improve signal integrity.
Parameter | BoW-32 or BoW-64 | BoW-128 | BoW-256 | BoW-384 or BoW-512 (Source Term.) | BoW-384 or BoW-512 (Doubly Term.) |
---|---|---|---|---|---|
Maximum Equivalent Capacitance (TX or RX) | 800 fF | 400 fF | 200 fF | 200 fF | 125 fF |
Since the actual frequency-dependent impedance profile of any given implementation may be comprised of a complex electrical network, conformance with the “equivalent” capacitance metric is formally defined by requiring that the magnitude of the return loss of any BoW I/O must be lower than the maximum limits shown in Figures 14 and 15 below. (Note that the return loss requirements are different for TX and RX because of the differences in DC termination between the two sides.) Similarly to DC termination resistance, the maximum s11 magnitude in the figure must be met with all combinations of data inputs (logical 1 and logical 0), termination voltage selections (ground terminated, supply terminated, or mid-rail terminated), and termination resistance values.
While this specification does not place a direct requirement on the bandwidth of a BoW receiver implementation, such receivers should maintain an effective 3 dB bandwidth of at least (0.667/Tbit) Hz. For example, for a BoW-256 PHY, the receiver 3 dB bandwidth is recommended to be at least 10.667 GHz.
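The bandwidth recommendation scales linearly with the wire bit rate, since 1/Tbit is the bit rate. The sketch below is illustrative: the function name is hypothetical, and it uses a factor of exactly 2/3 on the assumption that the 0.667 in the text is a rounding of 2/3, which is what the 10.667 GHz example implies.

```python
# Recommended RX 3 dB bandwidth = factor / Tbit = factor * wire bit rate.
# Using 2/3 rather than 0.667 is an assumption based on the worked example.
BANDWIDTH_FACTOR = 2 / 3

def recommended_rx_bandwidth_ghz(wire_rate_gbps: float) -> float:
    """Recommended minimum receiver 3 dB bandwidth in GHz."""
    return BANDWIDTH_FACTOR * wire_rate_gbps

for mode, rate in (("BoW-128", 8), ("BoW-256", 16), ("BoW-512", 32)):
    print(mode, round(recommended_rx_bandwidth_ghz(rate), 3), "GHz")
```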
The PHY TX serializer shall order data this way (referring to Figure 16):
The RX PHY shall order bits in the same fashion. However, the bits at the RX PHY logic interface P_D[*] may be offset by a multiple of 16 bits from the TX order if the TX and RX PCLK dividers are not aligned. A PHY implementation may provide a way to align the TX and RX dividers, or it may rely on the Link Layer to rotate the RX P_D[*] bits to provide that alignment as part of the training of the Link Layer.
Figure 16 shows the clock and data flow for a single TX slice and a single RX slice. On the TX side, data bits (and optional FEC and AUX bits) come in a wide word from the Link Layer, and are serialized to the line rate. At the RX side, they are sampled with a common slicer clock in most BoW implementations. BoW PHYs may optionally implement per-bit delay adjust or per-bit slicer clock adjust.
BoW PHYs shall be DDR (Double Data Rate) at the chip-to-chip interface: the data bit rate is twice the clock frequency, so data is clocked in on both edges of the clock in the RX slice. BoW PHYs shall be SDR (Single Data Rate) at the logic interface.
Table 12 provides recommended clock and data rates for each BoW mode. The ratio M should be limited to integers, preferably powers of two, and any other ratios should be implemented outside the PHY.
Note that higher PCLK rates (and lower M ratios) help reduce gate count and Link Layer latency, but lower rates are often more power efficient. The best PCLK rate(s) to implement for a particular chiplet will tend to be a function of its process node. For implementations in process nodes at 16 nm and below, supporting 1000 MHz is recommended.
Mode | Data Rate (Gbps) | PCLK (MHz) | Mux Ratio M | Logic Data Width |
---|---|---|---|---|
BoW-32 | 2 | 250 | 8 | 8x18 |
BoW-32 | 2 | 500 | 4 | 4x18 |
BoW-32 | 2 | 1000 | 2 | 2x18 |
BoW-64 | 4 | 250 | 16 | 16x18 |
BoW-64 | 4 | 500 | 8 | 8x18 |
BoW-64 | 4 | 1000 | 4 | 4x18 |
BoW-64 | 4 | 2000 | 2 | 2x18 |
BoW-128 | 8 | 500 | 16 | 16x18 |
BoW-128 | 8 | 1000 | 8 | 8x18 |
BoW-128 | 8 | 2000 | 4 | 4x18 |
BoW-256 | 16 | 500 | 32 | 32x18 |
BoW-256 | 16 | 1000 | 16 | 16x18 |
BoW-256 | 16 | 2000 | 8 | 8x18 |
BoW-384 | 24 | 750 | 32 | 32x18 |
BoW-384 | 24 | 1500 | 16 | 16x18 |
BoW-512 | 32 | 1000 | 32 | 32x18 |
BoW-512 | 32 | 2000 | 16 | 16x18 |
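The relationship between the columns of Table 12 can be sketched as follows: the mux ratio M is the wire data rate divided by the PCLK rate, and the logic data width is M words of 18 bits. The helper names are hypothetical, and the 18-bit word is assumed here to be the 16 data wires plus the optional AUX and FEC wires.

```python
# Illustrative sketch; function names are assumptions, not part of the specification.

BITS_PER_UI = 18  # assumed: 16 data wires + AUX + FEC, matching the "x18" widths


def mux_ratio(data_rate_gbps: float, pclk_mhz: float) -> int:
    """Return M = wire data rate / PCLK rate, rejecting non-integer ratios."""
    ratio = data_rate_gbps * 1000.0 / pclk_mhz
    if ratio != int(ratio):
        raise ValueError("PCLK rate does not give an integer mux ratio")
    return int(ratio)


def is_power_of_two(m: int) -> bool:
    """True when M is a power of two, the preferred case."""
    return m > 0 and (m & (m - 1)) == 0


def logic_data_width(m: int) -> str:
    """Format the logic data width the way Table 12 does, e.g. '32x18'."""
    return f"{m}x{BITS_PER_UI}"


m = mux_ratio(24, 750)  # BoW-384 with a 750 MHz PCLK
print(m, is_power_of_two(m), logic_data_width(m))
```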
Table 13 provides clock and data rates for an example with 4 Gbps wire data rate and M=4 to support a 1 Gbps data rate at the Link-PHY interface.
Signal | Rate | SDR/DDR |
---|---|---|
TxClk | 2 GHz | |
CLK+,CLK- | 2 GHz | |
D[15:0],AUX,FEC | 4 Gbps | DDR |
PCLK | 1 GHz | |
P_D[63:0],P_AUX[3:0],P_FEC[3:0] | 1 Gbps | SDR |
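The entries of Table 13 follow from the DDR and SDR rules above. The sketch below derives them from the wire rate and M; the function and dictionary keys are hypothetical names chosen for illustration.

```python
# Illustrative sketch; names are assumptions, not part of the specification.

def slice_rates(wire_rate_gbps: float, m: int, data_wires: int = 16) -> dict:
    """Derive per-slice clock rates and logic-side bus widths."""
    return {
        # DDR at the die-to-die interface: two bits per clock cycle.
        "txclk_ghz": wire_rate_gbps / 2,
        # SDR at the logic interface: one M-UI word per PCLK cycle.
        "pclk_ghz": wire_rate_gbps / m,
        "logic_rate_gbps": wire_rate_gbps / m,
        # Each wire contributes M bits to the parallel word.
        "p_d_width": data_wires * m,
        "p_aux_width": m,
        "p_fec_width": m,
    }


example = slice_rates(wire_rate_gbps=4, m=4)
print(example)
```

For the 4 Gbps, M=4 example this yields a 2 GHz TxClk, a 1 GHz PCLK, a 64-bit P_D bus, and 4-bit P_AUX and P_FEC buses.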
The DDR clock TxClk is provided to the TX PHY from elsewhere on Chiplet-A. This may come for example from an on-chip PLL (typically shared across multiple slices) or routed from the RxClk of an RX slice on Chiplet-A. In order to meet duty cycle requirements, a Duty Cycle Corrector (DCC) may be needed in the TX slice. TxClk is used to drive the serializers and provide the output CLK+, CLK- to Chiplet-B.
On the RX side, the PHY must align the slicer clock to sample the data correctly. This may be done with a DLL, adjustable delays, or other methods. If the PHY includes control logic to self-align the slicer clock for correct sampling of the data, the PHYReady signal must be asserted after the logic has determined that such alignment is complete. The RX PHY may output the received CLK as RxClk to the logic interface.
All BoW interfaces shall be source synchronous at the die-to-die interface within a slice. No modes of BoW require per-wire or per-slicer delay adjustments, but such capability may be optionally included.
Clock skew between the slices in each direction of a link likely depends on the implementation of the TxClk distribution to all the TX slices. That is, for the data flow from Chiplet A to Chiplet B, the TxClk distribution on Chiplet A probably dominates the clock skew of the TX slices on Chiplet A and the clock skew of the RX slices on Chiplet B, and vice versa for the flow from B to A. The skew between TX CLK signals within one direction of a link should be no more than 150 ps/stack along the chip edge. There is no specification of the skew between TxClk on Chiplet A vs. TxClk on Chiplet B nor between different links.
Note that the dividers creating PCLK in each PHY slice are not required to be aligned. This implies that they will tend to have random starting states, leading to additional PCLK misalignment between slices of up to one PCLK period. PHY implementations may optionally include methods to align these dividers.
On both the TX and RX sides, the Link Layer will usually need to include a Clock Domain Crossing (CDC) to align the data between CoreClk and PCLK. The Link Layer must be able to absorb the slice-to-slice clock skew and core clock distribution skew across a whole BoW link. Word alignment across a link need not be supported by the PHY; if required, it should be done in the Link Layer.
In order to not introduce excess pessimism into the link budgets implied by the BoW specification and avoid unnecessary over-design of BoW PHY circuitry, note that both the TX and RX voltage and timing error component requirements account for deterministic (bounded) terms separately from random (unbounded) terms. However, in order to retain some degree of design flexibility on each of the TX and RX, a bound is always placed on the maximum deterministic error and on the maximum total error budget at the target error rate of 1e-15. Thus, if a given BoW PHY design achieves deterministic error performance better than that requirement set by the deterministic component, the random errors introduced by that design may be increased as long as the total error requirement at 1e-15 probability is still met.
Note that the error rate of 1e-15 is at the level of any individual wire within a slice. In other words, in a conformant BoW interface, no individual wire within the interface would have an error rate exceeding 1e-15.
The maximum 20% - 80% rise-time at the output of BoW TX shall not exceed 23% of a UI. For example, for a BoW-128 transmitter, the 20% - 80% rise-time shall not exceed 28.75 ps. This rise-time shall be simulated with the TX (including all of its parasitics) driving an ideal load of 50 Ω (Zchan).
The maximum timing mismatch between CLK+ and CLK- outputs at the TX shall not exceed 2.5% of a UI. For example, for a BoW-128 TX, the timing mismatch shall not exceed 3.125 ps.
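Both TX edge limits above are fractions of a UI, so they scale with the data rate. A minimal sketch, using hypothetical function names:

```python
# Illustrative sketch; names are assumptions, not part of the specification.

def ui_ps(data_rate_gbps: float) -> float:
    """Unit interval in picoseconds for a given per-wire data rate."""
    return 1000.0 / data_rate_gbps


def max_rise_time_ps(data_rate_gbps: float) -> float:
    """Maximum 20%-80% TX rise time: 23% of a UI."""
    return 0.23 * ui_ps(data_rate_gbps)


def max_clk_mismatch_ps(data_rate_gbps: float) -> float:
    """Maximum CLK+ to CLK- timing mismatch at the TX: 2.5% of a UI."""
    return 0.025 * ui_ps(data_rate_gbps)


# BoW-128 runs at 8 Gbps per wire (UI = 125 ps).
print(f"rise time limit: {max_rise_time_ps(8):.2f} ps")
print(f"CLK mismatch limit: {max_clk_mismatch_ps(8):.3f} ps")
```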
For timing-error specifications provided in the following sections that are impacted by transmitter jitter, this jitter must be evaluated for CLK edges that are up to Nck-d UI earlier than the CLK edge that launched the data bit being captured at the receiver. This is because, even though jitter on the data edges may be correlated with the CLK jitter, the slicer on the RX side is likely to use a different CLK edge due to delays in the RX-side clock alignment circuit (usually a DLL and clock distribution). See Section 10.3.7 for the receiver's requirements. For BoW-256 mode and lower rates, Nck-d = 3 UI. For BoW-384 mode, Nck-d = 5 UI, and for BoW-512 mode, Nck-d = 6 UI.
In order to properly account for the jitter filtering/peaking that will occur due to the difference in delay between the data launching edge at the TX and the data capturing edge at the RX, when evaluating the transmitter's jitter (and whether it meets the requirements described in this document), the jitter at the TX output that is correlated between the CLK and D lines shall be filtered by the following frequency-dependent transfer function:
$$H_{tx\_jit}(j\omega) = 1 - e^{-j\omega t_{clk-d}}$$
where tclk-d is the delay between the CLK edge that launched the data bit and the CLK edge used to capture it. Note that jitter that is not correlated between the CLK and D signals shall not be filtered by this transfer function. (I.e., if the CLK signal and a given D signal have completely independent sources of jitter added to them, such as non-shared portions of the clock distribution network, those jitter sources shall not be filtered by Htx_jit(jω).) Since the total TX jitter after filtering by this transfer function might not be monotonic with tclk-d, and since receiver implementations may realize varying values of tclk-d, a transmitter must meet all related specifications for tclk-d = Nd*Tbit, with Nd taking all integer values between 1 and Nck-d+1.
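The jitter filter and the set of delays a transmitter has to be evaluated against can be sketched as below. The dictionaries restate the per-mode data rates and Nck-d values given in this document; the function names are hypothetical, and the range of Nd is assumed to be inclusive of both 1 and Nck-d+1.

```python
import cmath
import math

# Per-wire data rate (Gbps) and Nck-d (UI) for each mode, as given in this document.
DATA_RATE_GBPS = {"BoW-32": 2, "BoW-64": 4, "BoW-128": 8,
                  "BoW-256": 16, "BoW-384": 24, "BoW-512": 32}
N_CK_D_UI = {"BoW-32": 3, "BoW-64": 3, "BoW-128": 3,
             "BoW-256": 3, "BoW-384": 5, "BoW-512": 6}


def h_tx_jit(freq_hz: float, t_clk_d_s: float) -> complex:
    """Evaluate 1 - exp(-j*omega*t_clk_d) at a single jitter frequency."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 - cmath.exp(-1j * omega * t_clk_d_s)


def delays_to_check_s(mode: str) -> list:
    """Return t_clk_d = Nd * Tbit for Nd = 1 .. Nck-d + 1 (assumed inclusive)."""
    t_bit_s = 1.0 / (DATA_RATE_GBPS[mode] * 1e9)
    return [nd * t_bit_s for nd in range(1, N_CK_D_UI[mode] + 2)]


# Gain applied to correlated 100 MHz jitter for each delay a BoW-256 TX must cover.
for t in delays_to_check_s("BoW-256"):
    print(f"t_clk_d = {t * 1e12:6.1f} ps  |H| = {abs(h_tx_jit(100e6, t)):.4f}")
```

The magnitude of this function is 2|sin(ω·tclk-d/2)|, so correlated jitter at low frequencies is strongly attenuated, while jitter near a frequency of 1/(2·tclk-d) is doubled.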
The total deterministic (bounded) timing errors introduced by the TX shall not exceed 18% of a UI peak-to-peak. The evaluation of these timing errors must include all possible deterministic contributors, such as reference clock, clock distribution networks, duty cycle error (i.e., deviation from 50% duty cycle), skew between CLK and any D line, and power supply variation induced jitter or skew. Note that any such time-dependent error terms (i.e., jitter) that are correlated between the CLK and D lines must be filtered as described in Section 10.3.3. This specification is a peak-to-peak requirement, so if a given design has e.g. +/-5% UI of duty cycle error, this would imply that it can achieve a TX deterministic timing error of no better than 10% UI.
The total timing error introduced between the CLK and any data (D) line at the output of the TX shall not exceed 29% of a UI peak-to-peak at an error rate of 1e-15. The evaluation of errors must encompass all possible deterministic as well as random timing error contributors, including all sources of random jitter in addition to the representative deterministic error sources described in Section 10.3.4.
Assuming a Gaussian distribution for the random jitter, then in order to account for the 1e-15 error rate and peak-to-peak requirement, the total timing error terr,tot may be computed as:
$$t_{err,tot} = t_{err,deterministic} + 15.9\,\sigma_{tj,random}$$
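A short sketch of where a factor close to 15.9 can come from, and how the total budget trades deterministic error against random jitter. It assumes a one-sided Gaussian tail probability of 1e-15 on each side of the peak-to-peak window, which is an assumption about how the factor was derived rather than something stated in this specification. The function names are hypothetical.

```python
from statistics import NormalDist

# Illustrative sketch; names and the tail-probability interpretation are assumptions.

def peak_to_peak_sigma_factor(error_rate: float = 1e-15) -> float:
    """Peak-to-peak width, in sigmas, leaving `error_rate` in each Gaussian tail."""
    z = -NormalDist().inv_cdf(error_rate)
    return 2.0 * z


def total_timing_error_pct(det_pct_ui: float, sigma_pct_ui: float,
                           factor: float = 15.9) -> float:
    """t_err,tot = t_err,deterministic + 15.9 * sigma, all in % of a UI."""
    return det_pct_ui + factor * sigma_pct_ui


def max_random_sigma_pct(det_pct_ui: float, total_limit_pct: float = 29.0,
                         factor: float = 15.9) -> float:
    """Largest RMS random jitter (% UI) that still fits the total TX budget."""
    return (total_limit_pct - det_pct_ui) / factor


print(f"sigma factor at 1e-15: {peak_to_peak_sigma_factor():.2f}")
print(f"max sigma with 18% UI deterministic: {max_random_sigma_pct(18.0):.3f}% UI")
print(f"max sigma with 10% UI deterministic: {max_random_sigma_pct(10.0):.3f}% UI")
```

A design that stays below the 18% UI deterministic limit gains headroom for random jitter, which matches the flexibility described for the error budgets.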
The differential receiver for the CLK signal within a BoW receiver must achieve an input-referred common-mode to differential conversion gain of less than 0.2 V / V. This requirement must be met across any common-mode input frequency less than or equal to 1/Tbit. For example, with a conformant BoW RX, 20mV of common-mode variation on the CLK+/CLK- lines must impact the effective differential input by less than 4mV.
Note that common-mode variations on the CLK+/CLK- lines of ~10-15% of the signal swing may be expected in typical BoW channels.
In order to be compatible with the TX jitter requirements provided in Section 10.3.3, for BoW-256 and lower rate modes, a BoW receiver must capture data using clock edges that were launched by the TX no more than 3 UI earlier than the data being captured. For BoW-384 receivers, data must be captured using clock edges that were launched by the TX no more than 5 UI earlier than the data being captured. For BoW-512 receivers, data must be captured using clock edges that were launched by the TX no more than 6 UI earlier than the data being captured.
A BoW receiver must meet a set of requirements on the following sets of timing and voltage error components:
Maximum RX Deterministic Voltage Error: This term (Verr,det,RX) must include all deterministic voltage errors that would shift the receiver's voltage threshold relative to its ideal position in the middle of the signal swing. For example, any deterministic voltage errors due to residual offset, reference level error, and supply noise must be included. This specification accounts for the double-sided loss in margin, so if a given design has e.g. a residual threshold error of 0mV to +10mV, this would imply that the design can achieve a Verr,det,RX of no better than 20 mV.
Maximum Total Required RX Voltage Margin: This term must include all possible voltage errors at the RX at a probability of 1e-15 or higher. In addition to the deterministic RX voltage error sources mentioned above, error sources such as receiver thermal/flicker noise must therefore be included in this term. For Gaussian random voltage noise, the total required voltage margin (Verr,tot,RX) may be computed as:
$$V_{err,tot,RX} = V_{err,det,RX} + 15.9\,\sigma_{Verr,random,RX}$$
Maximum RX Deterministic Timing Error: This term (terr,det,RX) must include all deterministic timing errors that would shift the receiver's sampling timing relative to its ideal position for any data line. This term must therefore include errors due to e.g. residual sample timing position error, DLL dither, and power supply induced jitter. This specification accounts for the double-sided loss in margin, so if a given design has a mismatch-induced shift of the sampling position (relative to the ideal) of 0% to 5%, this would imply that the design can achieve a terr,det,RX of no better than 10% UI.
Maximum Total RX Timing Error: This term must include all possible timing errors at the RX at a probability of 1e-15 or higher. In addition to the deterministic RX timing error sources mentioned above, error sources such as RX clock receiver or clock distribution random jitter must be included in the total timing error. For Gaussian random jitter, the maximum total required timing error (terr,tot,RX) may be computed as:
$$t_{err,tot,RX} = t_{err,det,RX} + 15.9\,\sigma_{terr,random,RX}$$
Since swing and signal integrity are expected to vary with termination as well as data rate, the RX voltage and timing requirements are termination- and rate-dependent, as outlined in Table 14 and Table 15.
Parameter | BoW-256 | BoW-128 | BoW-128 | BoW-64 or BoW-32 | BoW-64 or BoW-32 |
---|---|---|---|---|---|
Termination | Any Termination | Doubly Terminated | Source- or Unterminated | Doubly Terminated | Source- or Unterminated |
Verr,det,RX | 40 mV | 40 mV | 100 mV | 65 mV | 150 mV |
Verr,tot,RX | 75 mV | 75 mV | 150 mV | 100 mV | 200 mV |
terr,det,RX | 28% Tbit | 28% Tbit | 28% Tbit | 28% Tbit | 28% Tbit |
terr,tot,RX | 36.5% Tbit | 36.5% Tbit | 36.5% Tbit | 36.5% Tbit | 36.5% Tbit |
Parameter | BoW-512 or BoW-384 |
---|---|
Termination | Any Termination |
Verr,det,RX | 40 mV |
Verr,tot,RX | 75 mV |
terr,det,RX | 24% Tbit |
terr,tot,RX | 32.5% Tbit |
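The RX limits in the two tables above can be checked mechanically. The sketch below encodes the table values and applies the Gaussian formulas for the total voltage and timing error. The dictionary keys, function name, and the example design numbers are hypothetical and for illustration only.

```python
# Limits from the RX tables above:
# (Verr_det mV, Verr_tot mV, terr_det % Tbit, terr_tot % Tbit).
# The key strings are illustrative labels, not terms defined by the specification.
RX_LIMITS = {
    ("BoW-512/384", "any"): (40.0, 75.0, 24.0, 32.5),
    ("BoW-256", "any"): (40.0, 75.0, 28.0, 36.5),
    ("BoW-128", "doubly"): (40.0, 75.0, 28.0, 36.5),
    ("BoW-128", "source_or_unterminated"): (100.0, 150.0, 28.0, 36.5),
    ("BoW-64/32", "doubly"): (65.0, 100.0, 28.0, 36.5),
    ("BoW-64/32", "source_or_unterminated"): (150.0, 200.0, 28.0, 36.5),
}

SIGMA_FACTOR = 15.9  # peak-to-peak multiplier for Gaussian terms at 1e-15


def rx_meets_budget(key, verr_det_mv, sigma_v_mv, terr_det_pct, sigma_t_pct):
    """Return True when both deterministic and total RX error limits are met."""
    v_det_max, v_tot_max, t_det_max, t_tot_max = RX_LIMITS[key]
    v_tot = verr_det_mv + SIGMA_FACTOR * sigma_v_mv
    t_tot = terr_det_pct + SIGMA_FACTOR * sigma_t_pct
    return (verr_det_mv <= v_det_max and v_tot <= v_tot_max
            and terr_det_pct <= t_det_max and t_tot <= t_tot_max)


# Made-up design numbers: 30 mV / 2.5 mV rms voltage error, 20% / 1% rms timing error.
design = dict(verr_det_mv=30.0, sigma_v_mv=2.5, terr_det_pct=20.0, sigma_t_pct=1.0)
print(rx_meets_budget(("BoW-256", "any"), **design))
print(rx_meets_budget(("BoW-512/384", "any"), **design))
```

With these example numbers the total timing error is 35.9% Tbit, which fits the 36.5% limit for BoW-256 but exceeds the 32.5% limit for BoW-512 or BoW-384.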
This specification does not place a specific requirement on the overshoot observed by the RX, but it is expected that the overshoot should have magnitude of less than 300 mV for 750 mV I/O supply. Since overshoot is most likely to create potential reliability and other issues in unterminated operating modes, BoW RX's are allowed to turn on termination resistors to reduce the overshoot they observe. The value of the RX termination resistance in this case must be larger than 50 Ω, but is otherwise unconstrained as long as the receiver is able to meet its timing margin requirements with the resulting swing/channel.
BoW TX's therefore shall be designed to achieve their lifetime, reliability, and other requirements regardless of whether the BoW RX selects to operate with or without termination.
The slice to slice clock skew tskew across the width of a BoW transmit link (along the chip edge) must be less than 150 ps/stack. (E.g., for a 4-stack interface, the skew from end to end must be less than 600 ps.) This skew includes only analog delays and specifically does not include any clock-related timing skew due to flip-flops/latches or varying reset states.
The slice to slice clock skew within a stack (orthogonal to the chip edge) for slices that are used within the same link must be less than 50ps/slice.
This skew is expected to be dominated by the TxClk distribution network.
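The two skew limits above scale linearly with the size of the link. A minimal sketch with hypothetical function names; the per-slice limit is read here as 50 ps multiplied by the number of slices spanned within a stack, which is an interpretation rather than a statement of the specification.

```python
# Illustrative sketch; names and the per-slice scaling interpretation are assumptions.

def max_edge_skew_ps(num_stacks: int) -> float:
    """Maximum slice-to-slice clock skew along the chip edge: 150 ps per stack."""
    return 150.0 * num_stacks


def max_in_stack_skew_ps(num_slices: int) -> float:
    """Maximum clock skew orthogonal to the chip edge: 50 ps per slice (assumed scaling)."""
    return 50.0 * num_slices


print(max_edge_skew_ps(4))      # 4-stack interface
print(max_in_stack_skew_ps(2))  # two slices deep within a stack
```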
BoW PHYs may optionally implement the features/states described in this section in order to reduce average power consumption.
The data idle state is defined by having the link layer set the value of all parallel data lines (including the optional AUX and FEC signals) fed into the PHY to logical 0 (0V at the output of the TX). In doubly terminated modes, a BoW PHY supporting this feature must connect the receiver's termination to 0V (ground). I.e., the selection of termination voltage is more restricted than as described in Section 9.
For the remainder of this section, a UI where all of the data lines are at logic 0 (0V) will be referred to as an “IDLE”. Note that real data with values of all 0's can still be transmitted/received by BoW PHY's supporting this feature.
While in the clock gated state, a BoW PHY TX ceases toggling the forwarded differential clocks CLK+/CLK-. In order to ensure proper operation of the PHY upon entry and exit from the clock gated state, a preamble and a postamble are appended to the payload data; these and further detailed requirements for entry, exit, and occupancy in this state are defined in the subsections below.
During the clock gated state, a BoW PHY TX must drive differential low on the CLK+/CLK- signals. Similarly, while the PHY is in the clock gated state, the data signals must be IDLE (i.e., set to the data idle state).
Note that the use of static clock gated and data idle levels might result in aging of the TX/RX circuits within the PHY. PHY implementations may hence need to budget for the impact of this aging and/or include mechanisms to compensate for the errors introduced by such aging.
In order to avoid the need to adjust the time alignment of parallel data words at the input/output of the PHYs due to entry/exit from the clock gated states, all durations of preamble, postamble, and payload must be set such that they are an integer multiple of the largest (between TX and RX) number of UI contained within a parallel input/output. For example, if a BoW TX with 4:1 serialization is communicating with a BoW RX using 8:1 deserialization, the preamble, postamble, and payload must all have durations of Npar*8 UI, where Npar is an integer greater than 1.
Since arbitrary serialization/deserialization ratios could result in very large minimum UI durations (due to the need to find the lowest common multiple), PHYs that support the clock gating mode must use only powers of 2 for the “Mux Ratio M” from Table 12.
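The duration rule above can be sketched as a small validity check. The function names are hypothetical. The check only verifies that a duration is a positive multiple of the larger of the two mux ratios; any additional lower bound on the multiplier is left to the caller.

```python
# Illustrative sketch; names are assumptions, not part of the specification.

def is_power_of_two(m: int) -> bool:
    return m > 0 and (m & (m - 1)) == 0


def duration_granularity_ui(m_tx: int, m_rx: int) -> int:
    """UI granularity for preamble, postamble, and payload durations."""
    for m in (m_tx, m_rx):
        if not is_power_of_two(m):
            raise ValueError("clock gating requires power-of-two mux ratios")
    # With power-of-two ratios, the larger one is also the lowest common multiple.
    return max(m_tx, m_rx)


def is_valid_duration(n_ui: int, m_tx: int, m_rx: int) -> bool:
    """True when n_ui is a positive multiple of the required granularity."""
    return n_ui > 0 and n_ui % duration_granularity_ui(m_tx, m_rx) == 0


# TX with 4:1 serialization talking to an RX with 8:1 deserialization.
print(duration_granularity_ui(4, 8))
print(is_valid_duration(16, 4, 8), is_valid_duration(12, 4, 8))
```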
BoW PHYs must publish as part of their datasheet the preamble, postamble, and payload lengths they support. It is strongly recommended that BoW interfaces support a configurable set of preamble, postamble, and payload lengths, particularly 4, 8, 16, and 32 UI.
As shown in Fig. 17, the PHY_Idle signal must be asserted (set to logic high) one TX parallel clock cycle earlier than the forwarded clocks enter the gated state. Similarly, the PHY_Idle signal must be deasserted (set to logic low) one TX parallel clock cycle earlier than the forwarded clock restarts.
The data driven into the PHY must be IDLE for all TX parallel clock cycles contained within the preamble, the postamble, or the clock gated states. Note that as shown in Fig. 18, if the pre or postamble lengths extend beyond a single TX parallel clock cycle, the pre/postamble duration must be extended by the link layer by padding the payload with additional IDLE data.
To mitigate potential issues with drifts in various circuitry and/or control loops that depend on the clocks actively toggling (e.g., the receiver's DLL), BoW interfaces must not leave the forwarded clock gated for more than 1024 UI or 128 largest (between TX and RX) parallel words, whichever is smaller.
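The limit on gated-clock duration can be written as a single expression. In the sketch below, a parallel word is taken to span M UI, where M is the larger of the TX and RX mux ratios; the function name is hypothetical.

```python
# Illustrative sketch; the function name is an assumption.

def max_clock_gated_ui(m_tx: int, m_rx: int) -> int:
    """Longest allowed clock-gated interval in UI: min(1024 UI, 128 parallel words)."""
    ui_per_word = max(m_tx, m_rx)  # largest parallel word between TX and RX
    return min(1024, 128 * ui_per_word)


print(max_clock_gated_ui(2, 4))    # 128 words of 4 UI
print(max_clock_gated_ui(16, 32))  # the 1024 UI cap applies
```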
BoW does not place any direct requirements on characteristics such as channel loss or crosstalk. Instead, BoW channels are considered conforming if they are able to achieve the required error rate of 1e-15 in conjunction with reference transmitters and receivers that meet all of the requirements provided in Section 9 and Section 10.3.
To assist with evaluating conformance of a given channel, open-source software evaluating signal integrity and the overall link budgets with the reference transmitters and receivers will be provided at a future date.
Within each slice, a BoW channel should not introduce more than 2% UI of skew between any D lines and the CLK lines. For BoW-256, this corresponds to ~187.5 μm of length mismatch on a substrate with an εr of 4.
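The length-mismatch figure above follows from the propagation velocity in the substrate. A minimal sketch, assuming a TEM-like propagation velocity of c/√εr; the function name is hypothetical.

```python
import math

# Illustrative sketch; the function name and the c/sqrt(eps_r) velocity model are assumptions.

C_UM_PER_PS = 299.792458  # speed of light in vacuum, micrometres per picosecond


def max_length_mismatch_um(data_rate_gbps: float, eps_r: float,
                           skew_fraction_ui: float = 0.02) -> float:
    """Convert a skew budget (fraction of a UI) into a trace length mismatch."""
    ui_ps = 1000.0 / data_rate_gbps
    velocity_um_per_ps = C_UM_PER_PS / math.sqrt(eps_r)
    return skew_fraction_ui * ui_ps * velocity_um_per_ps


# BoW-256 (16 Gbps, UI = 62.5 ps) on a substrate with eps_r = 4.
print(f"{max_length_mismatch_um(16, 4.0):.1f} um")
```

Using c ≈ 3e8 m/s instead of the exact value gives the ~187.5 μm quoted above; lower data rates allow proportionally more mismatch.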
Note that the skew recommendation above is based on achieving sufficient timing margin on representative channels; channels with better signal integrity may allow for larger skew between the D and CLK lines as long as they meet the overall timing margin requirements.
Note further that if the BoW PHYs used on a given channel include per-bit delay adjustment, channels with larger skew can be supported. Note however that all of the timing requirements described in Section 10.3 must then be met with the residual skew and its variation over time taken into account.
In laminate packages, the channel characteristic impedance should be between 45 and 55 Ω.
To provide guidance on the types of channels that are expected to meet the requirements for conformance with the BoW reference receiver and transmitter, this section provides examples of typical loss and crosstalk profiles for doubly-terminated channels supporting 16 Gbps operation (which are most sensitive to channel signal integrity). Note that when operating at lower rates, the frequency axes in the figures below should be scaled with the data-rate relative to 16 Gbps.
To avoid the need for equalization, a BoW-256 channel should typically have lower loss than shown in Figure 19.
The total crosstalk observed on an individual signal within a BoW-256 channel should typically be less than ~35% of the signal swing.
Particularly in advanced package applications, it is expected that I/O redundancy will be utilized to improve overall product yield. This section hence describes the requirements BoW slices supporting redundancy must meet in order to ensure interoperability.
To illustrate a few examples of how the redundancy mechanism would be implemented:
These facilities must be provided outside the PHY:
An example topology is shown in Figure 20. The BoW interface communicates to interface and core logic (I&C) blocks.
Replace this with a figure with the LC embedded rather than external.
Note that BoW PHY implementations that do not adopt the order recommended above are not guaranteed to interoperate with other BoW PHY implementations.
Implementation dependent:
To reverse direction on a link with bidirectional slices on both ends, it is expected that a full reset, reconfiguration, and initialization of the slices will be performed. This full procedure is expected to take microseconds to milliseconds, but will be implementation dependent.
PHY configuration is implementation dependent. It may include:
PHY configuration may be hardwired in the chiplet implementation, or it may be programmable.
Link training will be addressed in a future revision of the specification.
The interface control registers are implementation dependent. The registers shall be fully documented in the PHY datasheet.
In order to support die-to-die (in package) testing, within a BoW implementation, either the link layer, the PHY, or both must be able to support the generation (on the TX side) or checking (on the RX side) of repeating data patterns.
Users of BoW systems should check that one or more of the test patterns supported on the TX is also supported on the RX. Such pattern generators / checkers should therefore support the following patterns:
PRBS-9 Pattern, defined by the polynomial $x^{9} + x^{5} + 1$
PRBS-31 Pattern, defined by the polynomial $x^{31} + x^{28} + 1$
Isolated 1 and 0 pattern to test DC wander and single bit response:
A BoW interface may implement loopback testing for several use cases: at chiplet wafer-sort test, post-assembly package test, and debug/validation.
Wafer sort tests are currently only practical for the BoW interface with regular bump pitches (~130 μm), where ATE (automatic testing equipment) probe boards with matching pin pitches are available. Microbump probes will require additional effort.
Unidirectional links should support open-loop testing. In TX open loop testing, shown in Figure 22, Chiplet-A transmits a known test pattern (PRBS9 or PRBS31) to a golden reference receiver through the ATE load board. The received pattern should be verified in the ATE load board.
RX open loop testing, shown in Figure 23, is used for a link where the DUT is only a receiver. A golden reference TX transmits a known pattern (PRBS9 or PRBS31) through the channel to the chiplet. The received pattern should be analyzed for quality and functional tests.
The logic for generating and testing the PRBS sequences is outside the PHY, e.g., in the Link Layer.
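A minimal sketch of a PRBS generator and a self-synchronizing checker of the kind that could sit in the Link Layer. It implements the recurrence b[n] = b[n-order] XOR b[n-tap], with (order, tap) = (9, 5) for PRBS-9 and (31, 28) for PRBS-31. The seed value, bit ordering, and output polarity are assumptions, since this specification does not define them here; the function names are hypothetical.

```python
# Illustrative sketch; seed, bit order, and polarity conventions are assumptions.

def prbs_bits(order: int, tap: int, count: int, seed: int = None) -> list:
    """Generate `count` PRBS bits from a Fibonacci LFSR with feedback at `order` and `tap`."""
    mask = (1 << order) - 1
    state = mask if seed is None else seed & mask
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    out = []
    for _ in range(count):
        # Bit (order-1) holds b[n-order]; bit (tap-1) holds b[n-tap].
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        out.append(new_bit)
    return out


def count_prbs_errors(bits: list, order: int, tap: int) -> int:
    """Count recurrence violations; needs no knowledge of the TX seed."""
    return sum(
        1 for n in range(order, len(bits))
        if bits[n] != (bits[n - order] ^ bits[n - tap])
    )


tx = prbs_bits(9, 5, 1022)          # two full PRBS-9 periods (511 bits each)
rx = list(tx)
rx[100] ^= 1                        # inject a single bit error
print(count_prbs_errors(tx, 9, 5))  # clean stream
print(count_prbs_errors(rx, 9, 5))  # corrupted stream
```

Because the checker compares each bit against earlier received bits, one wire error shows up as several violations (once directly and once through each feedback tap), so the raw count overstates the number of bit errors.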
In bidirectional links, loopback tests may be implemented in several modes:
Both loopback modes may potentially be used for in-field validation bring-up and test. Cooperation across chiplets will be required to execute these tests in the field. Open-loop testing requires the use of a fixed test pattern recognized by both ends and is the only option for unidirectional links. Long loopback mode can be implemented on interposer or organic laminate for validation/verification purposes.
Figure 26 shows how a long loopback mode is executed across two chiplets for in-field validation and test where TX and RX are in different chiplets. Furthermore, this configuration may be expanded to loop back the data from the transmitter of chiplet-A to the receiver of chiplet-A.
This section defines the requirements and characteristics of BoW slices that are nominally intended to be used for carrying sideband information associated with one or more mainband BoW interfaces between a pair of chiplets. The BoW specification does not require the use of BoW sideband slices; this definition is provided only as a PHY option implementers may select.
BoW sideband slices have a number of primary differentiators relative to mainband slices:
An implementer may optionally choose to implement slices that serve the functionality of a mainband or a sideband slice, so long as the implementation meets all of the requirements of both the mainband and sideband specifications in the respective modes.
Table 16 summarizes these signals.
Function | # Wires | Signal Name | Notes |
---|---|---|---|
TX Clock | 1 | TCLK | |
TX Data | 1 | TD | |
TX Frame | 1 | TF | |
RX Clock | 1 | RCLK | |
RX Data | 1 | RD | |
RX Frame | 1 | RF | |
Table 17 summarizes these signals.
Function | # Wires | Signal Name | Notes |
---|---|---|---|
TX Clock | 1 | TCLK | |
TX Data | 1 | TD | |
TX Frame | 1 | TF | |
RX Clock | 1 | RCLK | |
RX Data | 1 | RD | |
RX Frame | 1 | RF | |
SB Reset | 1 | SB_Reset_b | |
Sideband must meet the requirements described in Table 10 for unterminated or source terminated operation. Similarly, the parasitic capacitance and return loss requirements for sideband slices are identical to those of BoW-64 or lower rate mainband slices, and are provided in Table 11 and Figure 14.
A BoW sideband slice receiver should maintain an effective 3dB bandwidth of at least (1.0/Tbit) Hz. For example, for a sideband slice that supports up to the maximum rate of 1 Gb/s, the receiver bandwidth should be at least 1 GHz.
Using the same definitions as Section 10.3 (unless defined differently below), sideband transmitters and receivers must meet the following requirements:
Parameter | Requirement |
---|---|
Verr,det,RX | 250 mV |
Verr,tot,RX | 355 mV |
terr,det,RX | 28% Tbit |
terr,tot,RX | 36.5% Tbit |
As with BoW mainband slices, the specification does not place any direct requirements on characteristics such as channel loss or crosstalk for BoW sideband slice channels. Instead, BoW sideband channels are considered conforming if they are able to achieve the required error rate of 1e-25 in conjunction with reference transmitters and receivers that meet all of the requirements provided in Section 18.3 and Section 18.4.
In addition to physical connectivity, chiplet-based designs require logical connectivity between the die within a package, either in a proprietary protocol or in protocols such as AXI or PCIe.
Figure 27 shows how a BoW interface may be used to transport transactions in multiple protocols. A BoW-based link between two chiplets requires and consists of the following components:
Figure 27 depicts the mapping of functionality to ODSA specifications. The actual implementation of functionality in hardware blocks may not match the specific block decomposition in the figure.
List all the requirements in one summary table with links from the sections.
Requirements | Details | Link to which section in the Spec |
---|---|---|
Contribution License Agreement | OWF CLA 1.0 (modified) | Please refer to Section 1 |
Are All Contributors listed in Sec 1: License? | Yes | |
Did All the Contributors sign the appropriate license for this spec? Final Spec Agreement/HW License? | Yes | |
Which 3 of the 4 OCP Tenets are supported by this Spec? | All four | |
Is there a Supplier(s) that is building a product based on this Spec? (Supplier must be an OCP Solution Provider) | | |
Will Supplier(s) have the product available for GENERAL AVAILABILITY within 120 days? | Seeking exception to have extended period for silicon availability. Test chips expected in 2H’2022 by multiple vendors. | |