



# Digital Design Flow and techniques for HEP Applications

Sandeep Miryala, ASIC Group, Instrumentation Division, smiryala@bnl.gov

Date 06/30/2022



# Outline

- Introduction

- Semi-Custom Design Flow
- Static Timing Analysis (STA)
- Timing Library overview
- Synthesis overview
- Placement and Routing (PnR) Overview
- HEP ASIC requirements
  - TID tolerant digital design
  - SEE tolerant digital design
  - ASICs operating at cryogenic temperatures
- AI on chip or edge computing
- DEMO of design flow using a simple counter



# **Custom vs Semi-Custom**

#### Analog designs are often a full custom design

- Designed at transistor level, with appropriate device sizing
- Amplifiers, Phase Locked Loops (PLLs), Data Converters, Bias References, Drivers, High speed serializers

#### Digital design is mostly a semi-custom design

- Standard cell libraries as part of Process Design Kit (PDK)
- 65nm process has ~730 cells in a standard cell library
- Various standard cell libraries
  - Differ in cell height, measured in routing tracks (7T, 9T, 12T)
  - Differ in Vt flavor of the devices (standard, low and high threshold voltage)
- Typical blocks: command decoders, counters, pattern generators, encoders, error detection and correction algorithms, data compression, scramblers, frame formation, FIFOs, etc.



# **Design Abstraction**





# Semi-Custom Digital Design Flow

- Use of CAD tools at every stage of the design flow
- Automation is through TCL scripts
- Static Timing Analysis (STA) is run at all the stages





# **Static Timing Analysis**






- STA breaks a design down into timing paths, calculates the signal propagation delay along each path, and checks for violations of timing constraints
- Algorithms determine the max and min propagation delays of the data path





- setup slack $= (\text{cycle\_time} - t_{ckq} - t_{pd} - t_{su}) - \text{clock\_skew}$
- hold slack $= (t_{ckq} + t_{pd} - t_h) - \text{clock\_skew}$
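The two slack checks can be sketched in a few lines of Python. This is only an illustration, not how an STA tool is implemented: it assumes clock skew is subtracted as a worst-case penalty in both checks, that the setup check uses the longest data-path delay and the hold check the shortest, and all names and delay values (in ns) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays of one register-to-register path, in nanoseconds (hypothetical values)."""
    t_ckq: float     # clock-to-Q delay of the launching flop
    t_pd_max: float  # longest combinational delay, used for the setup check
    t_pd_min: float  # shortest combinational delay, used for the hold check
    t_su: float      # setup time of the capturing flop
    t_h: float       # hold time of the capturing flop


def setup_slack(p: PathTiming, cycle_time: float, clock_skew: float) -> float:
    # Assumption: skew is treated as a pessimistic penalty.
    return (cycle_time - p.t_ckq - p.t_pd_max - p.t_su) - clock_skew


def hold_slack(p: PathTiming, clock_skew: float) -> float:
    # Assumption: same pessimistic treatment of skew as in the setup check.
    return (p.t_ckq + p.t_pd_min - p.t_h) - clock_skew


path = PathTiming(t_ckq=0.10, t_pd_max=0.60, t_pd_min=0.05, t_su=0.05, t_h=0.03)
for name, slack in (("setup", setup_slack(path, cycle_time=1.0, clock_skew=0.05)),
                    ("hold", hold_slack(path, clock_skew=0.05))):
    status = "MET" if slack >= 0 else "VIOLATED"
    print(f"{name} slack = {slack:.3f} ns ({status})")
```

A negative value from either function would be reported as a violation on that path.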

# **Timing Library Information (.lib)**

#### Timing

- Combinational
  - Delay & Transition Time
- Sequential
  - Delay & Transition Time
  - Hold and Setup
  - Recovery and Removal

#### Power

- Leakage
- Internal Power

#### Noise

- Signal integrity analysis
- Data is arranged in Look Up Table (LUT) format
  - Sample 4 x 4 LUT



| Input transition (rows) vs. output load (columns) | cl1 | cl2 | cl3 | cl4 |
|-------|-----|-----|-----|-----|
| Trin1 | D11 | D12 | D13 | D14 |
| Trin2 | D21 | D22 | D23 | D24 |
| Trin3 | D31 | D32 | D33 | D34 |
| Trin4 | D41 | D42 | D43 | D44 |
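A table like this is only sampled at a few points, so a value between grid points has to be interpolated. The sketch below uses plain bilinear interpolation, assuming rows are indexed by input transition (Trin) and columns by output load (cl). The function name, axis values and delays are made up for illustration, and real timing tools may use a different interpolation scheme.

```python
from bisect import bisect_right


def lut_lookup(index_1, index_2, values, transition, load):
    """Bilinear interpolation in a 2-D table.

    index_1: input transition breakpoints (rows), ascending
    index_2: output load breakpoints (columns), ascending
    values:  values[i][j] is the delay at (index_1[i], index_2[j])
    Points outside the table are linearly extrapolated from the edge segment.
    """
    def bracket(axis, x):
        # Segment [i, i + 1] that contains x, clamped to the table edges.
        i = max(0, min(bisect_right(axis, x) - 1, len(axis) - 2))
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, u = bracket(index_1, transition)
    j, v = bracket(index_2, load)
    lower = values[i][j] * (1 - v) + values[i][j + 1] * v
    upper = values[i + 1][j] * (1 - v) + values[i + 1][j + 1] * v
    return lower * (1 - u) + upper * u


# Hypothetical 4 x 4 delay table (ns) with transition in ns and load in pF.
trin = [0.02, 0.10, 0.40, 1.00]
cl = [0.001, 0.005, 0.020, 0.100]
delay = [
    [0.020, 0.030, 0.060, 0.200],
    [0.030, 0.040, 0.070, 0.210],
    [0.060, 0.070, 0.100, 0.240],
    [0.120, 0.130, 0.160, 0.300],
]

print(lut_lookup(trin, cl, delay, transition=0.06, load=0.003))
```

At a grid point the function returns the stored entry unchanged, which is a quick sanity check for any table lookup code.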



# **Liberate Tool Flow**





## **Define\_Cell and Define\_Arc**

    # Templates
    define_template -type delay \
        -index_1 {0.0224 0.0608 0.12 0.32 0.72 1.6 3.0 } \
        -index_2 {0.0014 0.003 0.0062 0.0125 0.0251 0.0504 0.101 } \
        delay_template_7x7

    define_template -type power \
        -index_1 {0.0224 0.0608 0.12 0.32 0.72 1.6 3.0 } \
        -index_2 {0.0014 0.003 0.0062 0.0125 0.0251 0.0504 0.101 } \
        power_template_7x7

    define_template -type si_immunity \
        -index_1 {0.224 0.608 1.2 3.2 7.2 16.0 30.0 } \
        -index_2 {0.0014 0.003 0.0062 0.0125 0.0251 0.0504 0.101 } \
        si_immunity_template_7x7

    if {[ALAPI_active_cell "AND2_A_XL_TSMC_HVTN"]} {
        # Cell definition
        define_cell \
            -input { A B } \
            -output { Z } \
            -pinlist { A B Z } \
            -delay delay_template_7x7 \
            -power power_template_7x7 \
            -si_immunity si_immunity_template_7x7 \
            AND2_A_XL_TSMC_HVTN

        # Leakage
        define_leakage -when "!A&!B" AND2_A_XL_TSMC_HVTN
        define_leakage -when "A&!B" AND2_A_XL_TSMC_HVTN
        define_leakage -when "!A&B" AND2_A_XL_TSMC_HVTN
        define_leakage -when "A&B" AND2_A_XL_TSMC_HVTN

        # Delay arcs
        # delay arcs from A => Z positive_unate combinational
        define_arc \
            -vector {RxR} \
            -related_pin A \
            -pin Z \
            AND2_A_XL_TSMC_HVTN

        # delay arcs from A => Z positive_unate combinational
        define_arc \
            -vector {FxF} \
            -related_pin A \
            -pin Z \
            AND2_A_XL_TSMC_HVTN

        # delay arcs from B => Z positive_unate combinational
        define_arc \
            -vector {xRR} \
            -related_pin B \
            -pin Z \
            AND2_A_XL_TSMC_HVTN

        # delay arcs from B => Z positive_unate combinational
        define_arc \
            -vector {xFF} \
            -related_pin B \
            -pin Z \
            AND2_A_XL_TSMC_HVTN

        # Dynamic power arcs
        # power arcs from => A hidden
        define_arc \
            -type hidden \
            -vector {Rxx} \
            -pin A \
            AND2_A_XL_TSMC_HVTN

        # power arcs from => A hidden
        define_arc \
            -type hidden \
            -vector {Fxx} \
            -pin A \
            AND2_A_XL_TSMC_HVTN

        # power arcs from => B hidden
        define_arc \
            -type hidden \
            -vector {xRx} \
            -pin B \
            AND2_A_XL_TSMC_HVTN

        # power arcs from => B hidden
        define_arc \
            -type hidden \
            -vector {xFx} \
            -pin B \
            AND2_A_XL_TSMC_HVTN
    }



# **Synthesis**

#### Synopsys Design Constraints (SDC)

- Create Clocks
- IO delays
- Loads
- Timing exceptions
- Don't touch nets





# **Floor planning**

- Initialize with aspect ratio (AR) and core utilization
- Initialize row configuration and cell orientation
- Specify core to pad/IO spacing
- Pins/Pads placement
- Macro placement and orientation
- Blockage Management

#### Power Planning





# Placement

- Automated placement of the standard cells in the placement rows
- Placement Objectives
  - Total wire length
  - Routability
  - Performance
  - Power
  - Heat distribution
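As an illustration of the wire-length objective, the sketch below estimates total wire length with the half-perimeter wirelength (HPWL) of each net, a common placement-stage estimate. HPWL is not named on the slide and is used here only as an assumption, and the sketch simplifies by putting every pin at its cell origin. Cell names, net names and coordinates are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box around one net's (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement):
    """Sum of HPWL over all nets.

    nets:      net name -> list of connected cell names
    placement: cell name -> (x, y) location, in placement-grid units
    """
    return sum(hpwl([placement[cell] for cell in cells]) for cells in nets.values())


placement = {"u1": (0, 0), "u2": (4, 0), "u3": (4, 3), "u4": (10, 3)}
nets = {"n1": ["u1", "u2", "u3"], "n2": ["u3", "u4"]}

print(total_wirelength(nets, placement))
```

A placer that moves `u4` closer to `u3` lowers this estimate, which is the kind of trade-off weighed against routability, timing, power and heat distribution.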

Standard cells in the rows





# **Clock Tree Synthesis**

#### Clock tree objectives and associated issues

- Clock skew
- Long clock insertion delay
- Heavy clock net loading
- Clock is power hungry
- Clock to signal coupling effect
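To make the first two items concrete, the following sketch derives the global skew and the worst insertion delay from per-sink clock arrival times. The sink names and delays (in ns) are hypothetical, and the report format is not that of any particular CTS tool.

```python
def clock_tree_report(insertion_delay_ns):
    """Summarize a clock tree from the insertion delay at each sink pin.

    Global skew is the difference between the latest and the earliest clock
    arrival over all sinks.
    """
    delays = list(insertion_delay_ns.values())
    return {
        "max_insertion_delay": max(delays),
        "min_insertion_delay": min(delays),
        "global_skew": max(delays) - min(delays),
    }


sinks = {"regA/CP": 0.42, "regB/CP": 0.45, "regC/CP": 0.40}
report = clock_tree_report(sinks)
for key, value in report.items():
    print(f"{key}: {value:.3f} ns")
```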





# **CTS Algorithms**

#### H-tree based algorithm



**Pi Configuration** 



# Routing

#### Routing Objectives

- Skew requirements
- Open/Short circuit cleaning
- Routed paths must meet setup and hold timing requirements
- Metal traces must meet DRC constraints
- Layout geometries must meet current density requirements







# Signoff





#### Metal Density





# **Corners & Analysis View**

- Process (SS, TT, FF)
- Voltage (1.08 V, 1.2 V, 1.32 V)
- Temperature (-40 °C, 25 °C, 125 °C)
- RC corners
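The process, voltage and temperature values above multiply quickly. The sketch below simply enumerates every combination to show the size of the full space. The corner naming scheme is hypothetical, RC corners are left out, and a real signoff setup normally selects a subset of these combinations as analysis views rather than running all of them.

```python
from itertools import product

process = ["SS", "TT", "FF"]
voltage_v = [1.08, 1.2, 1.32]
temperature_c = [-40, 25, 125]

# Hypothetical naming scheme: <process>_<voltage>V_<temperature>C
corners = [
    f"{p}_{v:.2f}V_{t}C"
    for p, v, t in product(process, voltage_v, temperature_c)
]

print(len(corners), "PVT combinations")
for name in corners[:3]:
    print(name)
```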





# **Physical Verification**

- DRC: checks the physical layout data against foundry-specific rules to ensure successful fabrication
- LVS: Layout vs Schematic, e.g. checks for shorts and opens
- Antenna checks
- Wire/Bump bonding rules





# Radiation Effects on Semi-Conductor Devices



Sandeep | HEPIC Summer Week 2022


# Total Ionizing Dose (TID)

- Ionizing radiation builds up interface trap states
- TID-induced trapped charge decreases as the oxide gets thinner
- Thick oxide used for device isolation introduces leakage

#### Parameters:

- Temperature
- Bias
- Annealing
- Dose Rate





# **Device Irradiation (65nm)**



#### Narrow Transistors

NMOS less affected by TID

#### **Short Transistors**

PMOS and NMOS devices are degraded

#### Narrow and Short devices

PMOS completely off at 1 Grad



*F. Faccio et al., TWEPP 2015, TID effects in 65nm transistors: summary of long irradiation study at the CERN X-rays facility*

# **TID effects on Digital Circuits**

#### Analog Designs

- Avoid excessively narrow and short transistors
- Keep the same bias for branches of the differential structures
- Recommended minimum sizes (up to 500 Mrad)
  - NMOS: L >= 120 nm, any W ok
  - PMOS: L >= 120 nm, W >= 300 nm

#### Digital Designs (Semi-Custom Design Flow)

- Standard cells often use short and narrow devices
- Speed degradation due to the increase of Vth
- Development of new radiation models
- Characterize new timing libraries and include them as additional corners
  - Using the Liberate tool

Recommendations from Radiation Working Group in RD53



# **Total Ionizing Dose (TID) Tolerant ASICs**

- TID alters the performance of CMOS device
  - Degrades logic gate performance





# **Radiation Effects on 9T NVT Library**






# **Radiation Effects on 12T NVT Library**






## **Digital Radiation Test Chip (DRAD)**

- 9 different standard cell libraries
- Investigated a subset of cells in each library (INV, NAND, NOR, XOR, DFF, Latch)





# Simulation models were pessimistic

L. M. Jara Casas et al., "Characterization of Radiation Effects in 65nm Digital Circuits using DRAD test chip", JINST, 2017

# **Synergetic Applications**

- Characterized timing libraries are shared with the community
  - New models are available from CERN foundry services
  - 100 Mrad, 200 Mrad and 500 Mrad
  - Limited set of corners

#### RD53 Chips

- Same methodology adopted for RD53A and RD53B chips
- Chips operate as expected up to a TID of 500 Mrad

#### ASICs for DUNE Liquid Argon Time Projection Chamber (TPC)

- Digital logic implementations in the COLDATA and ColdADC chips
- Must operate at cryogenic temperature: 87 K (about -186 °C)
- ~225 custom standard cells (65nm  $\rightarrow$  90nm)
- Timing libraries are characterized at all corners (regular as well as cryogenic temperature)
- No failures observed in the digital logic

#### Cryo-CMOS ASICs for quantum computing applications

- Must be designed for 4 K








The timing libraries are characterized for the 9-track standard cell library using the radiation-corner device models released by the CERN foundry services.

This work has been carried out and made available for the community by:

- Sandeep Miryala, ASIC Design Engineer, Fermilab, Batavia, USA, smiryala@fnal.gov
- Grzegorz Deptuch, ASIC Group Leader, Fermilab, Batavia, USA, deptuch@fnal.gov

Please consider acknowledging the designers of these libraries and the CERN ASIC support service in future publications on work that uses these blocks.

This helps recognize the work of fellow designers and motivates others to contribute new material to the community.

#### The libraries are available at the following link

https://gitlab.com.ch/asic\_de

# Single Event Effects (SEE) Classification

#### Soft Errors

- Transient
  - Single Event Transient (SET)
- Static
  - Single Event Upset (SEU)
- Soft Error mitigation
  - Technology level
    - Bulk CMOS vs SOI
  - Cell level
    - DICE latch
  - System level redundancy : Triple Modular Redundancy (TMR)



Mitigate soft errors using TMR

# **SEE Issues in Digital Logic Overview**



#### SEE possibilities in digital logic

- SEUs in registers
- SETs in clock buffers
- SETs in the combinational data path



# Triple Modular Redundancy (TMR)

- Data is replicated on multiple nodes
- Mitigation of SEUs in sequential elements
- Additional area and power consumption





Q = (Q1&Q2)|(Q2&Q3)|(Q3&Q1)
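The voter equation can be checked exhaustively with a few lines of Python. This is a behavioral sketch only: the function name `vote` is hypothetical, and it models stored bits as integers rather than any particular cell implementation.

```python
def vote(q1, q2, q3):
    """Majority of three bits: Q = (Q1&Q2)|(Q2&Q3)|(Q3&Q1)."""
    return (q1 & q2) | (q2 & q3) | (q3 & q1)


# A single upset in any one copy is masked, whatever value was stored.
for stored in (0, 1):
    for upset in range(3):
        copies = [stored] * 3
        copies[upset] ^= 1  # flip exactly one of the three copies
        assert vote(*copies) == stored

# Two simultaneous upsets defeat the voter: the wrong value wins the majority.
print(vote(1, 0, 0))
```

The second case is why upsets must not accumulate or hit two copies at once, which motivates the spacing of triplicated flops discussed in the physical design part.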

# **TMR Insertion by RTL designers**



#### **Disadvantages:**

1. Redundant logic is removed by synthesis tools
2. Additional don't touch commands are needed
3. Complex for chips with a very large number of registers



# Triple Modular Redundancy Generator (TMRG)

- In-house tool developed by CERN
- HDL dependent (Verilog only)
- Successfully used in lpGBT, MPA and SSA ASICs





#### **Logic Synthesis Flow : Generic**







#### **TMR Insertion Automation**





    foreach i [filter -regexp libcell DFQD* [find /designs/$DESIGN -instance instances_seq/*reg*]] {
        change_link -inst $i -design_name /designs/TMR_DFQD
    }



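    module TMR_DFQD (CP, D, Q);
      input CP, D;
      output Q;
      wire CP, D1, D2, D3, Q, Q1, Q2, Q3, n_0;

      // DFFQ1_reg and DFFQ2_reg are assumed here by analogy with DFFQ3_reg
      DFQD1 DFFQ1_reg(.CP (CP), .D (D), .Q (Q1));
      DFQD1 DFFQ2_reg(.CP (CP), .D (D), .Q (Q2));
      DFQD1 DFFQ3_reg(.CP (CP), .D (D), .Q (Q3));
      // Majority voter: inverting majority gate followed by an inverter
      MAOI222D0 p4324D(.A (Q1), .B (Q2), .C (Q3), .ZN (n_0));
      CKND0 Fp4461A(.I (n_0), .ZN (Q));
    endmodule

Each flip-flop type must have its corresponding TMR netlist.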



#### **TMR Insertion Scenarios**


| Scenario | RTL | Synthesis script |
|----------|-----|------------------|
| Triplicating only the registers having \*tmr\* in RTL | <pre>module TOPModule ();<br><br>reg in_tmr;<br><br>endmodule</pre> | <pre>read_netlist TMR_DFQD.v<br>foreach i [filter -regexp libcell DFQD\* [find /designs/\$DESIGN/\* -instance instances_seq/\*tmr\*]] {<br>    change_link -inst \$i -design_name /designs/TMR_DFQD<br>}</pre> |
| Triplicating all the registers in the RTL | <pre>module TOPModule ();<br><br>reg in;<br><br>endmodule</pre> | <pre>read_netlist TMR_DFQD.v<br>foreach i [filter -regexp libcell DFQD\* [find /designs/\$DESIGN/\* -instance instances_seq/\*reg\*]] {<br>    change_link -inst \$i -design_name /designs/TMR_DFQD<br>}</pre> |
| Triplicating in one of the hierarchies | <pre>module submodule();<br>reg test;<br><br>endmodule<br>module TOPModule ();<br><br>submodule hierInst ();<br>endmodule</pre> | <pre>read_netlist TMR_DFQD.v<br>foreach i [filter -regexp libcell DFQD\* [find /designs/\$DESIGN/instances_hier/hierInst -instance instances_seq/\*reg\*]] {<br>    change_link -inst \$i -design_name /designs/TMR_DFQD<br>}</pre> |

## Physical Design

- Flops within the same TMR module must be at least X µm apart
- Voter logic can be placed anywhere without any spacing restrictions



#### Innovus Implementation



#### Innovus 16.X has native space group constraints

- Create space group constraint
- Enable space group constraint during placement
- Place design



### INNOVUS 16.2

Takes horizontal and vertical spacing during place & route

    create_inst_space_group tmrSpaceGrp${n} \
        -inst [dbget [dbget $thPtr.allInsts.cell.isSequential 1 -p2].name] \
        -spacing_x $reg_spacing \
        -spacing_y $reg_spacing





### **SET Mitigation: Combinational Data Path**



- This delay insertion must be handled globally
- TMR at cell level is expensive in area and timing



#### **SET Mitigation : Clock Delay Insertion**



    set CkDel1 0.50
    set CkDel2 1

    # Clock insertion delays on the TMR flops for SET mitigation
    set_ccopt_property insertion_delay -delay_corner {av_typ_dc} $CkDel1 -pin DFFQ2/CP
    set_ccopt_property insertion_delay -delay_corner {av_typ_dc} $CkDel2 -pin DFFQ3/CP
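Why skewing the clocks of the triplicated flops filters a transient can be shown with a small behavioral model. The sketch assumes the first flop is clocked without extra delay, the other two use the 0.50 and 1 insertion delays set above, delays are in ns, and the transient simply inverts the data for its duration. Function names and glitch timings are hypothetical.

```python
def vote(q1, q2, q3):
    """Majority of three bits."""
    return (q1 & q2) | (q2 & q3) | (q3 & q1)


def sampled_bits(good, glitch_start, glitch_width, clock_edge, clock_delays):
    """Bit captured by each flop when a transient inverts the data input.

    A flop captures the wrong value only if its (delayed) clock edge falls
    inside the glitch window [glitch_start, glitch_start + glitch_width).
    """
    samples = []
    for delay in clock_delays:
        t = clock_edge + delay
        hit = glitch_start <= t < glitch_start + glitch_width
        samples.append(good ^ 1 if hit else good)
    return samples


# Assumption: first flop undelayed, then CkDel1 and CkDel2.
delays = (0.0, 0.50, 1.0)

narrow = sampled_bits(good=1, glitch_start=9.9, glitch_width=0.3,
                      clock_edge=10.0, clock_delays=delays)
wide = sampled_bits(good=1, glitch_start=9.9, glitch_width=0.7,
                    clock_edge=10.0, clock_delays=delays)

print("narrow glitch:", narrow, "-> voted", vote(*narrow))
print("wide glitch:  ", wide, "-> voted", vote(*wide))
```

In this model a transient shorter than the spacing between clock edges corrupts at most one copy, so the voter still returns the correct value, while a longer one can corrupt two copies.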







#### **SEE Mitigation: Clock Buffers**




# **Measurements (Proton Beam)**

RD53SEU 08/2018





- Simple triplication (TMR) improved the SEU cross-section by a factor of ~10
- No improvement with memory element spacing






# High Level Synthesis (HLS) tools for a neural processor design



- Reads in weights and biases from QKeras
- Supports only MLP and CNN

#### Catapult

- Maps C++ code to RTL (Verilog / VHDL)
- Also offers verification framework



S. Miryala et al., Peak Prediction Using Multi Layer Perceptron (MLP) for Edge Computing ASICs Targeting Scientific Applications, ISQED, 2022


# Acknowledgements

- Velopix design team
- IC Design team, Nikhef
- RD53 Collaboration
- ASIC group, Fermilab
- DUNE Collaboration
- ASIC group, BNL
- CERN Microelectronics
- BNL LDRD 021-23, PI: Sandeep Miryala
- Cadence Support

#### **!!! THANK YOU FOR YOUR ATTENTION !!!**

