

**From the Institute of Computer Engineering  
of the University of Lübeck  
Director: Prof. Dr. Mladen Berekovic**

**“Design Technology Co-Optimization for below N7 technology nodes”**

Dissertation  
for Fulfillment of  
Requirements  
for the Doctoral Degree  
of the University of Lübeck

from the Department of Computer Sciences and Engineering

Submitted by

Luca Mattii  
from San Miniato (Pisa), Italy.

Lübeck 2020

First referee: **Prof. Dr.-Ing. Mladen Berekovic**

Second referee: **Prof. Dr.-Ing. Harald Michalik**

Date of oral examination: **8th of May 2020**

Approved for printing. Lübeck, **11th of May 2020**

I would like to dedicate this thesis first of all to my family, who continued to support me during all these long years abroad. I would also like to dedicate this achievement to my troubled country, never losing the hope that one day there will be the opportunity and the right conditions to go back, and put this international experience at its service.

\*\*\*

I would first of all like to dedicate this thesis to my family, who continued to support me throughout these long years abroad. I would also like to dedicate this achievement to my troubled country, in the never-abandoned hope that the future may bring the occasion and the right circumstances to put at its service the experience gained during this international adventure.



## Acknowledgements

I would like first and foremost to express my gratitude to **Cadence**, **imec** and the **University of Luebeck**, the three entities that set up the collaboration which funded, hosted and enabled this entire project. I would also like to thank individually all the people who guided and accompanied me during this exciting journey. I will try to acknowledge all of them in chronological order.

My first thanks go to **Luca Aluigi** (*former intern at Cadence*), who helped me establish the first contact with Cadence and, once I was hired, took exceptional care of my training. With his dedication and friendliness he helped me take my first steps in the company and in Munich. I wish to give an exceptional acknowledgement to **Patrick Haspel** (*Program Management Director at Cadence*), who hired me at Cadence in the first place and was my first manager when I joined the Cadence Academic Network team. Thanks to his invaluable guidance I learned countless things about the company, and I had the opportunity to work and interact with most of **Cadence EMEA**. He contributed to setting up the collaboration with imec and to finding a university to support this industrial PhD. I am also extremely grateful to **Anton Klotz** (*Principal Program Manager at Cadence*), my second manager, who together with Patrick strongly endorsed me for this project, regardless of my different background at the time. They gave me the chance to be part of this activity, and with them I learned many of the soft skills that were fundamental for working in an industrial-research environment. I hope I have been able to repay their trust.

A place of honor belongs to **Franck Gerome** (*Senior Account Executive at Cadence*), who was the fundamental and irreplaceable "playmaker" of the whole collaboration. He was always present to discuss and resolve any sort of organizational problem, with incredible drive and cheerfulness, and he was the main creator and promoter of this idea.

This work would not have been possible without the approval and blessing of Cadence's management, which invested resources and manpower to make this happen. I would like to extend my gratitude especially to **Jens Werner** and **Alexander Duesener** (*former VPs EMEA at Cadence*), and to **Paolo Pezzati** (*Senior AE group director at Cadence*), **Peter Groos** (*AE Group Director at Cadence*) and **Sanjay Lall** (*VP EMEA at Cadence*), who supervised and gave continuity to the initiative. Paolo and Jens were particularly crucial in conceiving and creating the activity.

I would like to mention all the **professors of the Cadence Academic Network** with whom I worked in my first role. In particular I would like to thank Prof. **Michael Hübner** (*Professor at Ruhr-Universität Bochum*), who invited me to a dinner with Cadence co-founder **Alberto Sangiovanni-Vincentelli**. I will never forget being introduced to such an inspiring and great Italian.

One of the biggest acknowledgements is for Prof. **Mladen Berekovic** (*Professor at Luebeck University*), who kindly agreed to be my PhD supervisor and constantly supported the activity, first at Braunschweig University of Technology and then at Luebeck University. I would like to thank Mladen not only for his contributions to the academic publications derived from this work, but also for the friendliness and helpfulness he has shown over the last years.

The most important role in enabling the project from the field was played by **Pascal Basset** (*AE Director at Cadence*) and his team. Within his team, my deepest gratitude goes to **Antoine Dekeyser** (*Staff AE at Cadence*). Pascal was my third manager at Cadence, and accepted me for this role under his supervision. He took extreme care in preparing me for the three years at imec, hosting me for two months in the **Velizy Cadence Office**. The atmosphere in the Velizy office was always friendly and cooperative, and such a welcoming environment facilitated my integration in the team. Antoine was incredibly helpful, friendly and efficient in managing my training on P&R and advanced nodes, and chose to give me a real "hands-on" training that proved vital during my stay at imec. Both Antoine and Pascal closely supervised the activity, always showing interest and providing useful feedback in the weekly calls. This made me feel connected to the company at all times while working remotely from Belgium. Another important acknowledgement is for **Fei Zhang** (*Senior Product Engineering Manager at Cadence*), who shared with me his unique knowledge of the tech LEF syntax.

The initiator and technical leader of the collaboration with imec was **Vassilios Gerousis** (*former Distinguished Engineer at Cadence*). Vassilios was among the first people at Cadence to believe in a joint effort with imec, and his invaluable skills, decades of experience and prompt feedback convinced imec R&D to select Cadence as its primary EDA partner. I had the privilege of working closely with him for more than two years, and I would like to express to him my deepest gratitude for involving me in the project and for sharing his knowledge with me. Just before the start of my PhD he had already enabled the first 5nm testchip with imec.

The other initiator and pillar of this R&D cooperation was **Praveen Raghavan** (*former Distinguished Member of Technical Staff at imec*). The list of things for which I need to acknowledge Praveen is very long. He was among the first people at imec to establish the partnership with Cadence, and he was one of the key promoters of the role that I have been covering during this PhD. I have rarely seen a single person so technically competent in so many different domains, from device engineering to machine learning. On top of his technical and scientific skills, he managed and coordinated the R&D team in which I was hosted with unbelievable energy and dedication. He often shielded the team from external pressure, never losing his calm and good mood. With his vision he contributed to the genesis of this work and to many of its ideas. During the two years in which I "dotted-line" reported to him, he gave me the chance to absorb a small part of his knowledge and way of working.

Another outstanding contribution to this work was given by **Peter Debacker** (*R&D Team Leader at imec*). Just like Praveen, Peter is astonishingly capable of seeing the "big picture" and of addressing with competence every single topic of our multidisciplinary research space. He always provided essential contributions and ideas to keep our work going. He was involved in basically all the aspects and achievements of this joint effort, and he succeeded in the difficult task of replacing Praveen in leading the **System Level Design team** at imec. At least the following *imec researchers* need to be explicitly acknowledged for their valuable contributions to the generation of the PDK and the results. Without them this work would not have been possible: **Dragomir Milojevic, Syed Muhammad Yasser Sherazi, Jung Kyu Chae, Rogier Baert, Dimitrios Rodopoulos, Bharani Chava, Pieter Schuddinck, Marie Garcia Bardon, Doyoung Jang, Dmitry Yakimets, Arindam Mallik**.

Special thanks go to Prof. **Dragomir Milojevic** (*Professor at Université Libre de Bruxelles and consultant at imec*). Dragomir not only contributed to the research and results shown in this thesis, but also played a primary role in the dissemination of our results through industry conferences, international papers and journals. His precise and valuable feedback was crucial in producing many of the publications derived from this work, and I am immensely grateful to him for having spent his precious time on the careful review and improvement of my submissions.

I want to thank my "dotted line" management at imec for having renewed their approval and support of this project over the years: **Julien Ryckaert** (*Distinguished Member of Technical Staff at imec*), **Diederik Verkest** (*Program Director at imec*), **Alessio Spessot** (*Program Manager and Team Leader at imec*), **Anda Mocuta** (*Director of Technology Solutions and Enablement at imec*), **An Steegen** (*Executive Vice President Semiconductor Technology and Systems at imec*). I have always been impressed by how efficiently imec's management combines technical focus with attention to customer needs, which is what makes imec really unique. Although I was working at imec as a resident from an external company, I was always treated as a "virtual component" of the team, which allowed better interaction and increased efficiency.

My fourth (and last) Cadence manager, **Stephane Cesari** (*Senior AE Manager at Cadence*), had already been tremendously helpful for this activity in his previous role. As a staff Application Engineer supporting Digital Front End tools, he trained me on logical and physical synthesis and always provided his help whenever needed. As a manager, I have to thank him for having smoothly handled the transition period, together with Antoine, and for having allowed me to write this thesis during my last months of assignment.

Another person I would like to thank is **Philippe Hurat** (*Product Management Director at Cadence*). Philippe always showed great interest in our data and results, and we collaborated on more than one occasion.

These results, like most of the research programs at imec, were made possible by the ecosystem of partners and suppliers that fund and support this pre-competitive platform and guarantee, through their indispensable feedback, the industrial relevance of the research efforts. A key enabler of the results shown in this work was **Arm Holdings**, which kindly provided a 64-bit CPU to be used for our benchmarking experiments. In particular I would like to thank **Brian Cline** (*Principal Research Engineer at Arm*) and **Greg Yeric** (*Fellow at Arm*) for having promoted the involvement of Arm in this project and for having patiently reviewed the results submitted for publication approval.

Last but not least, I would like to thank **Rod Metcalfe** (*Product Management Group Director at Cadence*), **Ming Yue** (*Software Engineering Director at Cadence*) and **Chin-Chi Teng** (*Corporate VP & GM-Research & Development at Cadence*), who showed great enthusiasm for and interest in the continuation of this initiative.

## Abstract

In this work we will initially review the status of Moore's law and introduce the concept of Design Technology Co-Optimization (DTCO) as a holistic alternative to conventional scaling. We will show why this new approach is increasingly needed in order to address the manifold and fundamental limitations arising at the end of the CMOS roadmap. We will consider as state of the art, and as starting reference, a generic 7nm (or N7) technology node based on data derived from literature. In order to quantitatively evaluate, at IP-block level, the Power, Performance, Area (PPA) and Cost deltas of innovative design and technology solutions, multiple predictive Process Design Kits (PDKs) were assembled from the data available at imec and used in conjunction with a state-of-the-art Electronic Design Automation (EDA) flow for benchmarking purposes. This platform has been used to perform multiple types of Design of Experiments (DoE), which enabled the path-finding towards new nodes and the selection and validation of the best design-technology choices through post Place and Route assessments. The principles and roadmap of Extreme Ultra Violet (EUV) lithography will be illustrated, and its advantages will be shown in the context of a comparative analysis between the reference N7 node and an EUV-enabled N7+ node. The value proposition of EUV for the following nodes will also be highlighted. The dimensions for a predictive 5nm node will be proposed, and within this technology multiple design and technology solutions will be explored in order to mimic the PPA benefits of migrating to a new node without changing the ground rules. The track height of the standard cell libraries was progressively reduced from 7.5 down to 5 tracks, and the physical and electrical impact was documented. The Lateral Nano-Wire device will be comparatively evaluated against FinFET as a viable alternative for the 5nm node. The centrality of Electromigration and IR-drop (EMIR) problems as a key bottleneck affecting design closure for high performance designs will be demonstrated, and the reduction of the gear ratio between Metal1 and poly will be investigated as a possible solution. A predictive 3nm node will be defined, and the challenges of designing standard cell libraries with 5.5 and 4.5 tracks in this technology will be described. An IR-drop aware cross-node PPA comparison between our 5nm and 3nm nodes will additionally be presented. This comparison will highlight the value proposition of the Nano-Sheet device for reaching the PPA targets of the 3nm node. The path-finding towards newer nodes will be concluded by a physical study on an innovative 3D device named Complementary FET (CFET), observing that it could provide an opportunity for delivering an additional "half-node" area gain compared to the FinFET-based 3nm node, thus representing an interesting option for a 2nm technology. Finally, we will show several examples demonstrating how taking into account the full System-On-Chip (SoC) complexity can help identify bottlenecks that only emerge at SoC level. This indicates the need to step up from Design Technology Co-Optimization to a System-Technology Co-Optimization (STCO) approach, in which new technology solutions should aim to optimize the whole SoC and its "infrastructure". This can result in very different optimal technologies for each SoC component, paving the way to the concept of hybrid scaling.

## Zusammenfassung (Summary)

In this work we will first review the status of Moore's law and introduce the concept of Design Technology Co-Optimization (DTCO) as a holistic alternative to conventional scaling. We will show why this new approach is becoming increasingly necessary in order to overcome the manifold and fundamental limitations that arise at the end of the CMOS roadmap. We will consider a generic 7nm (or N7) technology node, based on data from the literature, as the state of the art. In order to quantitatively evaluate, at IP-block level, the Power, Performance, Area (PPA) and Cost deltas of innovative design and technology solutions, several predictive Process Design Kits (PDKs) were assembled from the data available at imec and used together with a state-of-the-art Electronic Design Automation (EDA) flow for benchmarking purposes. This platform was used to carry out several types of Design of Experiments, which enable the path-finding towards new nodes and the selection and validation of the best design-technology choices through post Place and Route assessments. The principles and the roadmap of Extreme Ultra Violet (EUV) lithography will be illustrated, and its advantages will be presented in a comparative analysis between the reference N7 node and an EUV-enabled N7+ node. The value proposition of EUV for the following nodes will also be highlighted. The dimensions for a predictive 5nm node will be proposed, and within this technology several design and technology solutions will be investigated in order to mimic the PPA benefits of migrating to a new node without changing the ground rules. The track height of the standard cell libraries was progressively reduced from 7.5 to 5 tracks, and the physical and electrical impact was documented. The Lateral Nano-Wire device will be evaluated against FinFET as a viable alternative for the 5nm node. The centrality of Electromigration and IR-drop (EMIR) problems as a key bottleneck for design closure of high performance designs will be demonstrated, and the reduction of the gear ratio between Metal1 and poly will be investigated as a possible solution. A predictive 3nm node will be defined, and the challenges of designing standard cell libraries with 5.5 and 4.5 tracks in this technology will be described. An IR-drop aware cross-node PPA comparison between our 5nm and 3nm nodes will additionally be presented. This comparison will highlight the value proposition of the Nano-Sheet device for reaching the PPA targets of the 3nm node. The path-finding towards newer nodes will be concluded with a physical study of an innovative 3D device named Complementary FET (CFET), which finds that it could offer an opportunity to obtain additional "half-node" area gains compared to the FinFET-based 3nm node, making it an interesting option for a 2nm technology. Finally, several examples will show how taking into account the full System-On-Chip (SoC) complexity can offer further opportunities to identify bottlenecks that only occur at SoC level. This shows the need to move from Design Technology Co-Optimization to a System-Technology Co-Optimization (STCO) approach, in which the new technology solutions should be aimed at optimizing the whole SoC and its infrastructure. This can result in very different optimal technologies for each SoC component, paving the way to the concept of hybrid scaling.

# Contents

| Section | Title |
|---|---|
| | <b>List of Figures</b> |
| | <b>List of Tables</b> |
| | <b>Nomenclature</b> |
| <b>1</b> | <b>Introduction</b> |
| 1.1 | History and current status of Moore's law |
| 1.1.1 | Conventional Scaling |
| 1.1.2 | Design Technology Co-Optimization |
| 1.1.3 | Scaling knobs from the device to the standard cells |
| 1.1.4 | Current state of the art |
| 1.2 | Challenges of scaling beyond N7 |
| 1.2.1 | Patterning limits |
| 1.2.2 | Device performance |
| 1.2.3 | BEOL parasitics |
| 1.2.4 | Routability |
| 1.2.5 | Cost |
| 1.3 | State of the art |
| 1.4 | Objectives of this thesis |
| 1.5 | Key contributions of this Thesis |
| 1.6 | Organization of the manuscript |
| <b>2</b> | <b>Experimental framework for advanced nodes</b> |
| 2.1 | Assembling the Predictive PDK |
| 2.1.1 | Producing the techlef from the design rules |
| 2.1.2 | Standard cell design |
| 2.1.3 | Library characterization |
| 2.1.4 | BEOL stack choice and modelling |
| 2.2 | Design flow |
| 2.2.1 | Choice of reference design |
| 2.2.2 | Synthesis |
| 2.2.3 | Place and Route |
| 2.2.4 | Power Integrity |
| 2.3 | DoE Design Methodology |
| 2.3.1 | The DTCO space |
| 2.3.2 | DoE examples with rules or patterning options |
| 2.3.3 | DoE example with standard cells |
| 2.3.4 | DoE example with device options |
| 2.3.5 | DoE example with BEOL options |
| 2.3.6 | DoE example with physical design options |
| 2.3.7 | DoE example with different EDA tool versions |
| 2.4 | Summary and conclusions |
| <b>3</b> | <b>EUV and the enablement of N7+ and below</b> |
| 3.1 | Introduction to EUV lithography |
| 3.1.1 | Motivations |
| 3.1.2 | Overview of an EUV system |
| 3.1.3 | EUV status and challenges |
| 3.1.4 | EUV and the roadmap |
| 3.2 | N7 vs N7+ PPAC comparison |
| 3.2.1 | 193i vs EUV patterning |
| 3.2.2 | Physical comparison |
| 3.2.3 | Electrical comparison |
| 3.2.4 | Cost |
| 3.3 | Study on system-level impact of LER |
| 3.3.1 | Impact of stochastic effects as corners |
| 3.3.2 | Statistical STA model |
| 3.4 | Summary and conclusions |
| <b>4</b> | <b>DTCO for the N5 node</b> |
| 4.1 | Moving from 7.5 to 6-Tracks with scaling Boosters |
| 4.1.1 | Alternative Solutions to Pitch Scaling |
| 4.1.2 | Physical Results |
| 4.1.3 | Power and Performance results |
| 4.1.4 | IR-drop results |
| 4.1.5 | Final PPAC Comparison |
| 4.2 | Introduction of Cobalt in the BEOL |
| 4.2.1 | Technology Parameters |
| 4.2.2 | IP-Block Benchmarking |
| 4.3 | Transition to 5-Tracks standard cells |
| 4.3.1 | Design Arcs and patterning |
| 4.3.2 | Physical Results |
| 4.3.3 | Electrical Results |
| 4.4 | FinFet vs NanoWires comparison |
| 4.4.1 | Block Level electrical comparison |
| 4.4.2 | RO level variability assessment |
| 4.5 | High Performance challenges |
| 4.5.1 | Higher Drive cells and physical synthesis |
| 4.5.2 | (EM)IR issues becoming the bottleneck |
| 4.5.3 | Tighter Gear ratio between M1 and poly |
| 4.6 | Summary and conclusions |
| <b>5</b> | <b>Pathfinding for below N5</b> |
| 5.1 | Standard cells for the N3 Node |
| 5.1.1 | Scaling Boosters for the N3 Node |
| 5.1.2 | 5.5-Tracks architecture |
| 5.1.3 | 4.5-Tracks architecture |
| 5.1.4 | 4.5-Tracks architecture with Nanosheet |
| 5.1.5 | Standard cells summary and comparison |
| 5.2 | Cross Node Comparison Between N5 and N3 |
| 5.2.1 | N3 patterning and ruleset |
| 5.2.2 | BEOL in N3 |
| 5.2.3 | Power mesh in N3 |
| 5.2.4 | Physical results |
| 5.2.5 | Electrical Results |
| 5.2.6 | PPAC Summary |
| 5.3 | CFET as alternative solution (N2) |
| 5.3.1 | Introduction to CFET. From the device to P&R |
| 5.3.2 | Physical Results |
| 5.3.3 | Final Comparison |
| 5.4 | Summary and conclusions |
| <b>6</b> | <b>Towards STCO</b> |
| 6.1 | Motivations and Introduction to STCO |
| 6.2 | STCO Boosters |
| 6.2.1 | High-NA EUV |
| 6.2.2 | SRAM scaling |
| 6.2.3 | Emerging Non Volatile Memories |
| 6.2.4 | IGZO as Power Switch Off Cells |
| 6.2.5 | AirGap in BEOL |
| 6.2.6 | Backside PDN |
| 6.3 | Overview on Emerging Devices in the context of Hybrid Scaling |
| 6.3.1 | III-V Materials |
| 6.3.2 | Vertical FET (VFET) |
| 6.3.3 | Negative Capacitance FET (NC-FET) |
| 6.3.4 | From 2.5 to 3D integration |
| 6.4 | New Computing Paradigms |
| 6.4.1 | In-memory computing |
| 6.4.2 | Spiking Neural Networks |
| 6.5 | Summary and conclusions |
| <b>7</b> | <b>Conclusions</b> |
| | <b>Bibliography</b> |
| <b>Appendix A</b> | <b>Components of a predictive PDK</b> |
| A.1 | example of technology .lef |
| A.2 | example of macro .lef |
| A.3 | example of .lib files |
| A.4 | example of .ict file |
| A.5 | File management through SVN repository |
| <b>Appendix B</b> | <b>Examples of EDA scripts</b> |
| B.1 | Logical and Physical synthesis |
| B.2 | Cadence Foundation Flow |
| B.3 | Dynamic Vectorless analysis |
| B.4 | Compare Metrics |

# List of Figures

|      |                                                                                                                                                                                                            |    |
|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.1  | Evolution of transistor count of CPU/microprocessor and memory ICs.<br>Source: [1].                                                                                                                        | 2  |
| 1.2  | Shrinking line widths over time. Source: [1].                                                                                                                                                              | 2  |
| 1.3  | Gate length and node naming evolution. Source: [2].                                                                                                                                                        | 3  |
| 1.4  | Logic transistor density for latest Intel nodes. Source: [3].                                                                                                                                              | 3  |
| 1.5  | Technology Design and EDA interaction in the technology definition phase<br>for advanced nodes.                                                                                                            | 4  |
| 1.6  | Structural comparison of planar and FinFET transistor. Source: [4].                                                                                                                                        | 5  |
| 1.7  | Comparison of electrical properties of planar and FinFET transistors. Source: [4].                                                                                                                         | 6  |
| 1.8  | Main scaling knobs for FinFET process nodes.                                                                                                                                                               | 7  |
| 1.9  | CPPMxPlot                                                                                                                                                                                                  | 9  |
| 1.10 | Cliffs of different patterning techniques for advanced nodes.                                                                                                                                              | 9  |
| 1.11 | Evolution of wavelength used in lithography, and minimum feature size over<br>time. | 10 |
| 1.12 | Different patterning options for advanced nodes: (a) Litho-Etch (LE). (b)<br>Litho-Etch Litho-Etch (LE2). (c) Self-Aligned Double Patterning (SADP).<br>(d) Self-Aligned Quadruple Patterning (SAQP). Source: [5]. | 12 |
| 1.13 | Generation of wire intent from a sea of lines through the usage of a cut mask.                                                                                                                             | 13 |
| 1.14 | Cross section for a FinFET device schematically showing the main sources<br>of parasitic resistances and capacitances. Source: [6].                                                                        | 13 |
| 1.15 | Device $I_{ON}$ degradation due to CPP reduction. Source: [7].                                                                                                                                             | 14 |
| 1.16 | Copper wire resistance per unit length for decreasing metal pitch. Source: [8].                                                                                                                            | 15 |
| 1.17 | Simplified capacitance model for IC interconnects.                                                                                                                                                         | 16 |
| 1.18 | Cross Node Mx RC scaling. Source: [9].                                                                                                                                                                     | 16 |
| 1.19 | ElectricalCliffs                                                                                                                                                                                           | 17 |
| 1.20 | Normalized wafer cost evolution across technology nodes. Source: [10] .                                                                                                                                    | 18 |
| 1.21 | Contribution of P&R blocks to BEOL test-chips aimed at technology definition. | 20 |
| 2.1  | DTCO Flow for imec Design of Experiments. | 24 |
| 2.2  | Steps from technology constraints to <i>tech.lef</i> generation. | 24 |
| 2.3  | Cross sections: FEOL/MOL and first layers of BEOL for imec N7. | 25 |
| 2.4  | BEOL Cross section (dimensions are not to scale). | 28 |
| 2.5  | Comparing metrics for <i>.html</i> reports of different implementation runs: the header indicates the run ID; the column in yellow is considered the "golden" run, and the values for run "0" and run "1" are marked in red or green depending on whether they are worse or better than the golden reference. | 29 |
| 2.6  | Designs used in this work: ARM M0 [11] for pipe-cleaning, LDPC [12] and ARM 64-bit CPU for benchmarking purposes (figures are to scale). | 31 |
| 2.7  | Input and output files for Logical and Physical Synthesis. | 32 |
| 2.8  | Implementation flow in Innovus. | 33 |
| 2.9  | Multidimensional DTCO space to be explored through DoEs. | 35 |
| 2.10 | Typical behaviour of Design Rule Violations (DRCs) as a function of target placement density. | 37 |
| 2.11 | Frequency sweep flowchart for different standard cell libraries. | 38 |
| 3.1  | Basic elements of an optical lithography system. | 44 |
| 3.2  | Evolution of Rayleigh factor across technology nodes. Source: [13]. | 45 |
| 3.3  | Position of EUV wavelengths in the electromagnetic spectrum. | 46 |
| 3.4  | A schematic of the main components of an EUV lithography system. Source: [14]. | 47 |
| 3.5  | Throughput improvement of EUV. Source: [15]. | 49 |
| 3.6  | Non-ideal profile of the metal determined by LER. | 50 |
| 3.7  | Graphical representation of RLS tradeoff. Source: [16]. | 50 |
| 3.8  | Progression of standard cell design style towards unidimensional patterns. | 51 |
| 3.9  | DRC count versus placement density for N7 and N7+. | 55 |
| 3.10 | Wirelength breakdown on the Mx layers for N7 and N7+ at 77.5% density. | 55 |
| 3.11 | Electrical comparison between N7 and N7+ options. | 57 |
| 3.12 | Reduction in the number of process steps with EUV. Source: [17]. | 57 |
| 3.13 | RC corners including both systematic and stochastic variability. A $\sigma_{LER}$ of 2.8nm was assumed. | 59 |
| 3.14 | Normalized RC at block level highlighting variations induced by corners. | 59 |
| 3.15 | Results from statistical STA post-processing: (a) critical path delay distribution for different $\sigma_{LER}$; (b) timing yield as a function of margin for different $\sigma_{LER}$. Source: [18]. | 61 |
| 4.1  | SDB and SAGC reducing the usage of dummy gates. | 66 |
| 4.2  | MINT and M1 open to routing. | 67 |
| 4.3  | PDN architectures: (a) Original. (b) 6-Tracks compatible. | 68 |
| 4.4  | Insertion of porous cells under the M1 VDD/VSS stripes. | 69 |
| 4.5  | Standard cell area for reference library (Library#1). | 70 |
| 4.6  | Scaling factor of standard cells with respect to reference library area. | 70 |
| 4.7  | Summary of the physical results, normalized to the reference run (<b>Run#1</b>). | 72 |
| 4.8  | Slack distributions for 7.5-Tracks and 6-Tracks for the frequency sweep runs. | 74 |
| 4.9  | Post P&R power comparison of 7.5-Tracks 3 Fins and 6-Tracks 2 Fins. | 75 |
| 4.10 | CDF of IR drop values across power mesh. | 77 |
| 4.11 | CDF of IR drop values across power mesh. | 78 |
| 4.12 | Graphical description of the design arcs required for efficient standard cell design and Place and Route. | 81 |
| 4.13 | Comparison of an AO21D1 cell in 6-Tracks vs 5-Tracks libraries. | 82 |
| 4.14 | MOL constraints for 5-Tracks standard cells. (a) With two fins per device. (b) With one fin per device. | 84 |
| 4.15 | Normalized PPA metrics for 6-Tracks and 5-Tracks libraries. | 84 |
| 4.16 | 3-D sketches of FinFET and lateral NWFET. Source: [19]. | 85 |
| 4.17 | Power gains of LNW devices versus FinFET for frequency points where LNW closes timing. | 86 |
| 4.18 | Layout comparison of a D1 and D8 Inverter in a 6-Tracks library. | 88 |
| 4.19 | Timing comparison using high drive cells, with and without the physical synthesis flow. | 88 |
| 4.20 | Motivations behind the traditional usage of 1:1 gear ratio between M1 and Poly. | 91 |
| 4.21 | Poly and M1 grids for different Gear Ratios: 1, 2/3 and 0.5. | 91 |
| 5.1  | (a) Stacked Vias with landing pad. (b) Supervia. | 97 |
| 5.2  | OAI cell in 5.5-Tracks showing the usage of SV pins. | 98 |
| 5.3  | Layout of a 4.5-Tracks AOI cell using MTHS. | 98 |
| 5.4  | Cross section of front end with Buried Power Rail. Source: [20]. | 99 |
| 5.5  | TEM Cross section of Nanosheet devices. Source: [21]. | 100 |
| 5.6  | Top view of the N3 5.5-Tracks template. Source: [22]. | 101 |
| 5.7  | Cross section for the N3 5.5-Tracks cells. Source: [22]. | 101 |
| 5.8  | Top view of N3 4.5-Tracks template. Source: [22]. | 102 |
| 5.9  | Cross section for 4.5-Tracks cells. Source: [22]. | 103 |
| 5.10 | Cross section for 4.5-Tracks variant with Nanosheet device. Source: [22]. | 103 |
| 5.11 | Centrality of IR-drop assessment for a fair PPA benchmarking across nodes and technology options. | 107 |
| 5.12 | Comparison between the power delivery networks in N5 and N3. Deltas shown for different layers and views. (a) MINT layer with only standard cell shapes. (b) M1 layer with power and ground nets and standard cell shapes visible. (c) Power mesh in $M_X$ layers. (d) Power mesh, standard cells and signals in M1. | 110 |
| 5.13 | Performance comparison between reference N5 and N3 libraries. Horizontal dashed lines indicate timing closure thresholds. | 113 |
| 5.14 | Comparison of power related metrics between reference N5 and N3 libraries. (a) Normalized power curves for all libraries. (b) Power gain of N3 runs versus N5. (c) Power density increase in N3 runs versus corresponding N5 runs. | 114 |
| 5.15 | Values of IR at the 99th percentile on $M_X$ layers versus normalized power for different PDN dimensions. (a) N5 power mesh. (b) N3 power mesh. | 115 |
| 5.16 | Final PPA summary of the cross node comparison. | 117 |
| 5.17 | CFET device views. (a) The idea comes from "folding" the nFET and the pFET. (b) 3D view of the nFET stacked on top of the pFET. (c) Cross section of the CFET. | 118 |
| 5.18 | Comparison of an AOI cell in N3 5.5-Tracks and CFET libraries. Dimensions are to scale. | 118 |
| 5.19 | Views of MINT layer in implementation with CFET libraries. (a) MINT shapes in the standard cells. (b) MINT extensions from the router. (c) Pins + extensions. | 121 |
| 5.20 | Top view of IP block showing M1 obstructions. (a) In N3. (b) In the CFET implementation. | 122 |
| 5.21 | M1 view for CFET implementation showing sparse metal cuts with "kissing corners" and denser pin access. | 123 |
| 5.22 | Buried Power Rail mesh in PnR. | 123 |
| 5.23 | Comparison of wire length and via distributions between reference N3 and CFET runs. | 125 |
| 6.1  | STCO Roadmap proposed by imec. | 130 |
| 6.2  | Area pie for a high performance SoC. Adapted from: [23]. | 131 |
| 6.3  | High Density (111) and High Performance (112/122) bit cell scaling versus technology node. Source: [24]. | 132 |
| 6.4  | Floorplans in imec N5. (a) 7.5-Tracks library. (b) 6-Tracks library with SAGC. | 133 |
| 6.5  | Magnetic RAM (MRAM) basic operational scheme. (a) Parallel magnetization between the free and pinned layers results in low resistance ($R_p$). (b) Magnetization that is not parallel yields high resistance ($R_{ap}$). Source: [25]. | 134 |
| 6.6  | Cross section of a ReRAM memory cell. | 134 |
| 6.7  | Comparison of I-V characteristics of IGZO and Si transistors. Source: [26]. | 135 |
| 6.8  | Schematic of a PSO cell. | 136 |
| 6.9  | Power figures showing gains of Airgap in $M_Z$ layers. | 137 |
| 6.10 | Cross section showing backside PDN concept. Source: [27]. | 138 |
| 6.11 | Bulk mobility of Si, Ge, and III-V materials with their respective energy bandgap, where empty and solid symbols are used for holes and electrons respectively. Source: [28]. | 139 |
| 6.12 | 3D view of a Vertical FET. Source: [29]. | 140 |
| 6.13 | Evolution of supply and threshold voltage. Source: [30]. | 141 |
| 6.14 | NC-FET transistor. (a) 3D structure of the transistor. (b) Equivalent capacitor model. Source: [31]. | 142 |
| 6.15 | Concept of using heterogeneous technologies specialized for each block, in conjunction with 3D stacking techniques. | 143 |
| 6.16 | Comparison of different 3D integration techniques. | 144 |
| A.1  | Managing PDK file versions for different nodes and DTCO options through an SVN repository. | 199 |



# List of Tables

|      |                                                                                                                                                              |    |
|------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.1  | Main process parameters assumed as reference N7 in this work. | 8 |
| 1.2  | Comparative table highlighting key contributions of this thesis versus the state of the art. (N7* in [32] standard cells were scaled from 45nm layouts). | 21 |
| 2.1  | Thickness and dielectric constant for reference N7. | 27 |
| 3.1  | Top priority problems identified by the EUV symposium every year. | 48 |
| 3.2  | Expected insertion of EUV into nodes below N7. 193i options are indicated in blue, EUV options in yellow. | 52 |
| 3.3  | Improvements in EUV machines for different generations of NXE machines from ASML. | 52 |
| 3.4  | Comparison of N7 vs. N7+ design rules for metal vias and metal cuts. | 54 |
| 3.5  | Relative deltas in Power and Performance of Multi Corner (MC) runs compared to the single-corner Typical implementation. | 60 |
| 4.1  | Main process parameters for N7 and N5 nodes. | 63 |
| 4.2  | Summary Table of the Scaling Boosters explored. | 64 |
| 4.3  | Impact of different scaling boosters on NAND2 and DFF area. Dimensions of figures are to scale. | 65 |
| 4.4  | Setup of the runs for the physical experiments. "-" indicates that the scaling booster was not used. | 69 |
| 4.5  | Pin density increase determined by scaling boosters (die area to scale). | 73 |
| 4.6  | Design metrics for different frequencies (LDPC design). | 73 |
| 4.7  | IR drop comparison for different PDN dimensions. | 77 |
| 4.8  | Material and dielectric configurations for different BEOL scenarios. | 79 |
| 4.9  | Percent variations of electrical metrics versus reference BEOL. Benchmarking done at maximum target frequency. | 80 |
| 4.10 | Ruleset honouring the constraints in Equations 4.1-4.4. | 81 |
| 4.11 | Normalized area metrics for the three libraries in Figure 4.13. | 82 |
| 4.12 | Timing closure across the frequency sweep for FF and LNW devices. | 85 |
| 4.13 | Maximum placement density, normalized cell and core areas for different PDN scenarios. 6-Tracks and porous 6-Tracks cells are compared. *Numbers for the 8CPP scenario are extrapolated. | 89 |
| 5.1  | Main process parameters for N7, N5 and N3 nodes. | 95 |
| 5.2  | Scaling boosters for N3 node. | 96 |
| 5.3  | Comparative table showing Scaling boosters usage and technology features across different N5 and N3 libraries. | 105 |
| 5.4  | P&R related considerations on N5 and N3 libraries. | 105 |
| 5.5  | Delta in patterning assumptions between N5 and N3. | 108 |
| 5.6  | Ruleset for N3 SAQP patterning + DPT EUV cuts. | 108 |
| 5.7  | Main deltas in BEOL for N5 and N3 $M_X$ layers. | 109 |
| 5.8  | Maximum placement density, normalized cell and core areas for different PDN scenarios in N5 and N3. Areas are normalized to the N5 6-Tracks with 48CPP spacing. *Numbers for the 8CPP scenario are extrapolated in N5. | 112 |
| 5.9  | Power density values, driving the selection of the PDN. The selected PDN and the target frequency determine the target density. *In N5 PDN spacings below 24CPP require the introduction of porous cells. | 115 |
| 5.10 | Comparative table showing Scaling boosters usage and technology features in reference N3 and CFET libraries. | 119 |
| 5.11 | P&R related considerations on N3 and CFET libraries. | 120 |
| 5.12 | Comparison of main physical metrics between reference N3 and CFET runs. .                                                                                                                                                  | 124 |
| 5.13 | Final comparison table between CFET and reference N3. . . . .                                                                                                                                                              | 126 |
| 6.1  | Comparative table for mainstream and emerging memories. Source: [33]. F is the minimum feature size. . . . .                                                                                                               | 134 |
| 6.2  | Setup of the experiments with Airgap introduction in $M_Z$ layers. . . . .                                                                                                                                                 | 137 |

# Nomenclature

## Greek Symbols

$\eta_{LER}$  Average LER

$\lambda$  wavelength in optical lithography system

$\sigma_{LER}$  LER Standard deviation

$\tau$  RC delay

$\theta$  half-angle of the cone of light that can enter the lens

## Other Symbols

$K_1$  Rayleigh factor

$n$  refractive index

193i 193nm wavelength immersion lithography

Co Cobalt

Cu Copper

Ru Ruthenium

M0G Gate Contact

MINT Metal 0 layer

Mx Tightest Metal layer

TaN Tantalum Nitride

## Acronyms / Abbreviations

*CMP* Chemical Mechanical Polishing

*DoF* Depth of Focus

*BEOL* Back End Of Line

*BPR* Buried Power Rail

*C2C* Center-to-Center

*CFET* Complementary FET

*CPP* Contacted Poly Pitch

*DB* Database

*DFF* D Flip-Flop

*DFM* Design For Manufacturability

*DoE* Design of Experiments

*DPT* Double Patterning

*DRC* Design Rule Check

*DRM* Design Rule Manual

*DTCO* Design Technology Co-Optimization

*EDA* Electronic Design Automation

*EM* Electromigration

*FEOL* Front End Of Line

*GR* Gear Ratio

*HEMT* High Electron Mobility Transistor

*HVM* High Volume Manufacturing

*IC* Integrated Circuit

*ILD* Inter Layer Dielectric

*IMD* Inter Metal Dielectric

LE2 Litho-Etch Litho-Etch

LE3 Litho-Etch Litho-Etch Litho-Etch

LE Litho-Etch

LER Line Edge Roughness

MIM Metal-Insulator-Metal

MOL Middle Of Line

EUV Extreme Ultra Violet

MPT Multiple Patterning

MTHS Mid Track Hand Shake

NA Numerical Aperture

NC-FET Negative Capacitance FET

NDR Non Default Rules

NSH Nanosheet device

OPC Optical Proximity Correction

P&R Place and Route

PDK Process Design Kit

PDN Power Delivery Network

PD Placement Density

PGV Power Grid View

PPAC Power Performance Area and Cost

PPA Power Performance and Area

RC Resistance and Capacitance

RET Resolution Enhancement Techniques

RLS Resolution LER and Sensitivity

RO Ring Oscillator

RTL Register Transfer Level

SADP Spacer Assisted Double Patterning

SAGC Self Aligned Gate Contact

SAQP Spacer Assisted Quadruple Patterning

SCE Short Channel Effects

SDB Single Diffusion Break

SDC Spacer Defined Cut

SDC Synopsys Design Constraints

SEM Scanning Electron Microscope

SHE Self Heating Effect

SoC System On Chip

STA Static Timing Analysis

STCO System Technology Co-Optimization

SV Super Via

T2T Tip-to-Tip

TAT Turn Around Time

TH Track Height

TNS Total Negative Slack

TPT Triple Patterning

TSV Through Silicon Vias

TTF median Time To Failure

UHD Ultra High Density

VDD Power Net

VFET Vertical FET

VSS Ground Net

W2W Wafer to Wafer

WNS Worst Negative Slack

WPH Wafers Per Hour



# Chapter 1

## Introduction

### 1.1 History and current status of Moore's law

#### 1.1.1 Conventional Scaling

In 1965, Intel co-founder Gordon Moore formulated a visionary prediction that subsequently became widely known as Moore's law [34]. According to this forecast, the number of transistors per unit area (i.e. transistor density) of an Integrated Circuit (IC) would double approximately every two years at the same cost, leading to an exponential growth of IC complexity and an exponential decrease of the cost per transistor. This hypothesis, initially based on the empirical observation of the trends of the first generations of IC technologies, was fully confirmed across the following decades, as shown in Figure 1.1, and was de facto established by the semiconductor industry as the primary roadmap to fuel a cost-effective advancement of microelectronics.

Historically, the progress in semiconductor manufacturing needed to sustain Moore's law was mainly obtained by scaling the minimum feature size, corresponding to the gate length or the tightest metal width, as illustrated in Figure 1.2.

Industry refers to the different process technologies that were developed over time as "technology nodes" or simply "nodes". Traditionally, the name of the node coincided with the gate length. As shown in Figure 1.3, this naming convention was abandoned in recent nodes, and further discrepancies emerged between different foundries for nodes below 28nm. This inconsistency is only partly due to marketing reasons: as we are going to show, it also reflects the increased complexity of scaling and the difficulty of quantifying it through a single figure. For completeness, Figure 1.4 reports the transistor density of the latest nodes from Intel and the time-line of their insertion into High Volume Manufacturing (HVM), which also confirms that Moore's law has remained valid until the time of writing (2018).

Figure 1.1 Evolution of transistor count of CPU/microprocessor and memory ICs. Source: [1].

Figure 1.2 Shrinking line widths over time. Source: [1].

Figure 1.3 Gate length and node naming evolution. Source: [2].

Figure 1.4 Logic transistor density for latest Intel nodes. Source: [3].

#### 1.1.2 Design Technology Co-Optimization

As the roadmap is pushed to its extreme limit, sustaining Moore's law is increasingly challenging due to fundamental constraints imposed by lithography cliffs, material resistivity, manufacturability, device performance and ultimately wafer cost [35]. In this context, industry started to adopt a more holistic approach for technology nodes below 28nm, commonly referred to as Design Technology Co-Optimization (DTCO) [36] [37]. The key idea of DTCO is to make the process technology choices based on standard cell level [38] and logic-block level [39] assessments. In a similar way, the standard cell design and IC physical design strategies need to be increasingly aware of the new technology features, in order to optimize Power Performance and Area (PPA) at block level, making sure that the technology improvements are effectively translated into real gains at a higher level of abstraction.

Figure 1.5 Technology Design and EDA interaction in the technology definition phase for advanced nodes.

The role of Electronic Design Automation (EDA) software tools, and particularly Place and Route (P&R) tools, is therefore of primary importance to assist decision making and design enablement in the technology definition phase, providing PPA benchmarking results and feedback on the feasibility of supporting new technology features in the digital implementation flow. In some cases the existing versions of the EDA tools will not be able to support, model or properly handle the innovations demanded by the technology or design side. In such situations the danger of a deadlock arises. To avoid it, the deployment of new capabilities needs to be discussed with the EDA company, estimating the effort that such enhancements would require, along with the expected time-line for their readiness. This further highlights the importance of involving EDA in the DTCO loop [40] and of leveraging state-of-the-art versions, or even "beta builds", of the tools, which fully incorporate the latest enhancements and bug fixes. The DTCO loop, graphically described in Figure 1.5, is an iterative process whose target is to converge, producing as outputs:

- the technology specifications
- the standard cells libraries
- the physical design strategies
- the upgraded versions of EDA tools



Figure 1.6 Structural comparison of planar and FinFET transistor. Source: [4].

#### 1.1.3 Scaling knobs from the device to the standard cells

Until the 28nm node, the conventional planar CMOS architecture had made it possible to meet the power and performance specifications targeted by the newer nodes. Shrinking the gate length contributed to improving the transistor switching speed while keeping the leakage current (i.e. the current in the off state) manageable. However, this was no longer applicable when the gate reached a length of approximately 20nm. Beyond those dimensions, multiple Short Channel Effects (SCE) started to severely affect the electrical behaviour of the planar transistor, preventing its adoption for the 22nm node and beyond [41]. The planar structure was then replaced by a three-dimensional structure consisting of silicon channels raised above the level of the insulator and wrapped by the gate from the top and laterally, as illustrated in Figure 1.6. This structure was named FinFET or tri-gate transistor. Its main purpose is to allow the gate to fully deplete the channel and recover electrostatic control through an increased overlap between the channel and the gate oxide, which virtually extends the device width. Unlike the planar transistor, for which the transistor width can be controlled by the circuit designer, for a FinFET device the effective width ( $W_{eff}$ ) is quantized [42] as in Equation 1.1, where $H_{fin}$ and $W_{fin}$ are the fin height and fin width respectively, while $n_{fin}$ is the number of fins per device, which can be selected by the designer. As shown in [41], the adoption of FinFET improved the electrical properties of the device, outperforming the planar transistor in terms of both power and performance. For the first generation of FinFET technology (22nm), Figure 1.7 indicates a reduction of an order of magnitude in leakage current and a 37% reduction in delay at the same operating voltage versus planar. The FinFET device proved to be scalable for the subsequent process nodes, and it is still the device used by all foundries in the most advanced technologies currently in production [3][43][44].

$$W_{eff} = n_{fin}(2 \cdot H_{fin} + W_{fin}) \quad (1.1)$$
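As a quick illustration of the quantization in Equation 1.1, the following minimal sketch evaluates $W_{eff}$ for one to three fins. The fin height and width are the reference N7 values of Table 1.1; the function name is only illustrative.

```python
def effective_width_nm(n_fin: int, h_fin_nm: float, w_fin_nm: float) -> float:
    """Equation 1.1: W_eff = n_fin * (2 * H_fin + W_fin)."""
    return n_fin * (2 * h_fin_nm + w_fin_nm)


# Reference N7 fin dimensions taken from Table 1.1 of this work.
H_FIN_NM = 45.0
W_FIN_NM = 5.0

# The designer can only pick an integer number of fins, so the
# effective width moves in steps of (2 * H_fin + W_fin).
for n_fin in (1, 2, 3):
    print(n_fin, effective_width_nm(n_fin, H_FIN_NM, W_FIN_NM))
```

With these values each additional fin adds 95nm of effective width, so a three-fin device corresponds to 285nm.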



Figure 1.7 Comparison of electrical properties of planar and FinFET transistors. Source: [4]

The standard cell template in Figure 1.8 reports the main scaling knobs in a state-of-the-art FinFET process. Many of these knobs are defined as "pitches", i.e. the center-to-center distance between lines belonging to the same layer. As explained in [35], since the 10nm node a "litho-friendly" design style has demanded unidirectional shapes in the Front End Of Line (FEOL) layers and the first layers of the Back End Of Line (BEOL). In order to shrink the standard cell area by a factor of 2X, as requested by Moore's law, a 0.7 scaling factor needs to be targeted in the Gate Pitch, also called Contacted Poly Pitch (CPP), and in the pitch of the Tightest Metal layer (Mx), since $0.7 \times 0.7 \approx 0.5$. In fact, as these two layers are orthogonal, the product of their pitches is tightly correlated to the standard cell area. In this work we will call Track Height (TH) the ratio between the cell height and the Mx pitch. The fins are manufactured as a regular array running orthogonally with respect to the gate. The cell height, the fin pitch and the p/n separation then determine the maximum number of fins ( $n_{fin}$ in Equation 1.1) that can fit in each device (pFET and nFET). Figure 1.8 is thus an example showing a 7.5-Tracks architecture with three fins per device. We can also see that the width and spacing pattern of Mx can be non-uniform, in order to accommodate an enlarged width for the power (VDD) and ground (VSS) nets. The channel length ( $L_g$ ) of the transistor is defined by the intersection between the gate and the fins. The gate is separated by a dielectric (Gate Spacer) from the active contact (M0A), which is essentially a trench filled with a conductive material (typically tungsten) that provides access to the source and the drain and connects them to the upper layers.

#### 1.1.4 Current state of the art

Table 1.1 documents the values that will be used as the starting reference for all the subsequent considerations in this work. For some experiments, different technologies developed from this initial reference point will become the new reference, and for each benchmark the baseline will be explicitly indicated.

Figure 1.8 Main scaling knobs for FinFET process nodes.

These figures were derived from literature and from internal information available at imec, and are meant to be representative of a state-of-the-art process from a leading foundry. Due to the aforementioned inconsistencies in the naming conventions, this generic reference node could present values similar to those of the N10 and N7 nodes used in production by different foundries at the time of writing. As a metric to quantify area scaling at standard cell level, we propose herein the formula in Equation 1.2, which is agnostic of foundry-specific naming conventions:

$$CellArea = CPP \cdot M_x \cdot TH \cdot PolyCount \quad (1.2)$$
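Equation 1.2 can be evaluated directly for the reference node. The sketch below uses the CPP, $M_X$ pitch and Track Height of Table 1.1, while the poly count of 3 is a purely hypothetical value chosen for illustration, as are the function and variable names. It also checks that scaling both pitches by 0.7 roughly halves the area.

```python
def cell_area_nm2(cpp_nm: float, mx_pitch_nm: float,
                  track_height: float, poly_count: int) -> float:
    """Equation 1.2: CellArea = CPP * Mx * TH * PolyCount."""
    return cpp_nm * mx_pitch_nm * track_height * poly_count


# Reference N7 values from Table 1.1.
CPP_NM = 54.0
MX_PITCH_NM = 40.0
TRACK_HEIGHT = 7.5
POLY_COUNT = 3  # hypothetical cell width in poly pitches (assumption)

reference_area = cell_area_nm2(CPP_NM, MX_PITCH_NM, TRACK_HEIGHT, POLY_COUNT)

# Traditional node-to-node scaling: 0.7x on both pitches, same TH and poly count.
scaled_area = cell_area_nm2(0.7 * CPP_NM, 0.7 * MX_PITCH_NM, TRACK_HEIGHT, POLY_COUNT)
area_ratio = scaled_area / reference_area

print(f"reference area: {reference_area:.0f} nm^2")
print(f"area ratio after 0.7x pitch scaling: {area_ratio:.2f}")
```

Because the poly count and Track Height cancel in the ratio, the 0.49 area factor does not depend on the hypothetical cell chosen here.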

### 1.2 Challenges of scaling beyond N7

Table 1.1 Main process parameters assumed as reference N7 in this work.

| Process parameter        | N7 reference value |
|--------------------------|--------------------|
| Track Height (TH)        | 7.5 tracks         |
| Gate length ( $L_g$ )    | 21 nm              |
| Gate Spacer width        | 8 nm               |
| Fin Height ( $H_{fin}$ ) | 45 nm              |
| Fin Pitch                | 30 nm              |
| Fin Width ( $W_{fin}$ )  | 5 nm               |
| p/n separation           | 85 nm              |
| $M_X$ Pitch              | 40 nm              |
| Gate Pitch (CPP)         | 54 nm              |

Focusing on the variables in Equation 1.2, we can visualize our starting reference point on a plot such as the one in Figure 1.9. In this chart the X-axis shows the CPP in nm which, as seen in Figure 1.8, is correlated to standard cell area scaling in the horizontal dimension (i.e. the width of the cell). The Y-axis is the $M_X$ pitch in nm, which instead enables scaling in the vertical direction, thus reducing the height of the cell. The constant-product curves (branches of hyperbolas) are iso-area curves, graphically representing different options to implement one node and to migrate from one node to another. The dashed arrow indicates the scenario where both dimensions are scaled by a factor of 0.7 across consecutive nodes, which constituted the traditional strategy to achieve the 2X area gain when moving to the next node. Scaling beyond N7 dimensions exclusively through pitch reduction is made increasingly challenging by several factors related to patterning, device performance, BEOL Resistance and Capacitance (RC), routability and finally cost. The major issues and bottlenecks in each of these domains are addressed in subsections 1.2.1 to 1.2.5.

#### 1.2.1 Patterning limits

The first limitation to be considered is patterning resolution, whose cliffs are indicated by the dashed lines in Figure 1.10 for the patterning options available at advanced node geometries (i.e. pitches below 80nm). Until the 28nm node, the progressive reduction of the wavelength used in the photolithography process (Figure 1.11), combined with Design For Manufacturability (DFM), Resolution Enhancement Techniques (RET) and Optical Proximity Correction (OPC), had enabled the scaling of the minimum feature size across the nodes with a single iteration of the traditional Litho-Etch (LE) patterning [13], whose main process steps are sequentially indicated in Figure 1.12 (a). At the 22nm node, with the introduction of pitches in the range of 80nm [13], 193nm wavelength immersion (193i) lithography [45] was no longer able to provide sufficient resolution through a single LE exposure.
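The link between wavelength and printable pitch can be sketched with the Rayleigh criterion, which relates the minimum half-pitch to the Rayleigh factor $K_1$, the wavelength $\lambda$ and the Numerical Aperture (NA) listed in the Nomenclature. The snippet below is only a rough estimate: the NA values (1.35 for 193i, 0.33 for EUV), the 13.5nm EUV wavelength and especially the $K_1$ factors are assumptions chosen for illustration, not data from this work.

```python
def min_half_pitch_nm(k1: float, wavelength_nm: float, numerical_aperture: float) -> float:
    """Rayleigh criterion: half-pitch = k1 * lambda / NA."""
    return k1 * wavelength_nm / numerical_aperture


# Assumed optical settings (illustrative only).
SCANNERS = {
    "193i": {"k1": 0.28, "wavelength_nm": 193.0, "numerical_aperture": 1.35},
    "EUV": {"k1": 0.37, "wavelength_nm": 13.5, "numerical_aperture": 0.33},
}

# Minimum single-exposure pitch is twice the minimum half-pitch.
min_pitch_nm = {
    name: 2 * min_half_pitch_nm(**settings) for name, settings in SCANNERS.items()
}

for name, pitch in min_pitch_nm.items():
    print(f"{name}: minimum single-exposure pitch ~ {pitch:.0f} nm")
```

Under these assumptions the single-exposure limits land close to the 80nm (193i) and 30nm (EUV) pitches quoted in this subsection; different $K_1$ choices would shift them accordingly.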



Figure 1.9 Mx versus CPP plot, indicating N7 reference.



Figure 1.10 Cliffs of different patterning techniques for advanced nodes.



Figure 1.11 Evolution of wavelength used in lithography, and minimum Feature size over time

**LEx:** As a lower wavelength lithography was not yet available, the increasing gap between the minimum feature size and the wavelength (orange area in Figure 1.11) was resolved by decomposing the original layout into two separate masks. In other words, neighbouring shapes that are not printable through a single mask are split between mask1 and mask2, generating two patterns that are less dense than the original, which is reconstituted as the boolean "union" of the two masks. As indicated by the sequence of steps in Figure 1.12 (b), the first pattern is transferred onto an underlying hardmask through a first LE step. The second pattern is then aligned to the first and transferred onto a second hardmask through a second LE iteration. The end result is a final pattern that can target more aggressive pitches and spacings compared to a single LE. This technique is the simplest form of Multiple Patterning (MPT); it is commonly named Litho-Etch Litho-Etch and abbreviated as LELE or LE2. Adding a third LE iteration results in Litho-Etch Litho-Etch Litho-Etch (LELELE or LE3) patterning. Although in principle this procedure could be generalized to a larger number of masks, an unavoidable misalignment exists between the multiple exposures, which makes it difficult in practice to extend this approach and limits its resolution gains.
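The layout decomposition described above is commonly modelled as a graph two-colouring problem: shapes that are too close to be printed together are connected by a conflict edge, and the two colours correspond to mask1 and mask2. The following sketch applies this idea to a one-dimensional array of parallel lines. It is a simplified illustration, not the decomposition engine of any specific EDA tool, and the function name and the 80nm same-mask limit are assumptions.

```python
from collections import deque


def decompose_le2(line_positions_nm, min_same_mask_pitch_nm):
    """Assign each line to mask 0 or 1 so that lines on the same mask are at
    least min_same_mask_pitch_nm apart. Raises ValueError if two masks are
    not enough (an odd conflict cycle exists)."""
    n = len(line_positions_nm)
    conflicts = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(line_positions_nm[i] - line_positions_nm[j]) < min_same_mask_pitch_nm:
                conflicts[i].append(j)
                conflicts[j].append(i)

    mask = [None] * n
    for start in range(n):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:  # breadth-first two-colouring of the conflict graph
            current = queue.popleft()
            for neighbour in conflicts[current]:
                if mask[neighbour] is None:
                    mask[neighbour] = 1 - mask[current]
                    queue.append(neighbour)
                elif mask[neighbour] == mask[current]:
                    raise ValueError("layout is not decomposable with two masks")
    return mask


# Lines at a 48nm pitch, assuming an 80nm single-exposure pitch limit.
print(decompose_le2([0, 48, 96, 144, 192], 80))
```

In this example alternating lines end up on different masks, so each mask carries a relaxed 96nm pitch. Repeating the call with a 32nm pitch raises the error, since every line then also conflicts with its second neighbour and a third mask (LE3) would be needed.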

**EUV:** In order to bridge the gap between the minimum feature size and resolution, an optical system leveraging a wavelength of 13.5nm started to be considered, and alpha demo tools were developed in 2006 by ASML [? ]. This lithography was called Extreme Ultra Violet (EUV), and its main purpose is to limit the increase of mask count, process steps and wafer cost by collapsing the multiple patterning steps into a single LE step with higher resolution. More details will be given in Chapter 3, which is entirely devoted to illustrating the principles and value proposition of EUV for state-of-the-art and future nodes. The expected resolution using the latest EUV machines available at imec is in the range of 30nm for a unidirectional design style, while it is between 35 and 40nm for two-dimensional geometries.

**SAxP:** As geometries for the most advanced nodes moved beyond the resolution of LEx [3] [46], and EUV systems were not yet ready for mass production, spacer assisted patterning techniques were introduced in order to further extend the resolution limits of 193i. The Spacer Assisted Double Patterning (SADP) process flow is reported in Figure 1.12 (c). In this scheme, spacers are grown on the sidewalls of a pre-defined sacrificial layer, called "mandrel", through deposition and etch process steps. Next, the mandrel is removed by an additional etch step, leaving only the spacers, which are then used to define the desired final structures. Since there are two spacers for each mandrel, the feature density is doubled, making it possible to target pitches in the range of 40nm. Iterating this procedure twice results in Spacer Assisted Quadruple Patterning (SAQP), which can further scale down the pitches towards the range of 20nm. Both SADP and SAQP are typically unidirectional techniques [46] that, unlike LEx and EUV patterning, do not directly create the intended pattern: they generate a "sea of lines" that needs to be cut through additional cut (also called block) masks, which separate the continuous lines to define the actual wire intent. This concept is illustrated in Figure 1.13.
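The pitch division obtained with spacers can be summarized in a few lines. The sketch below assumes an idealized process in which every spacer iteration exactly halves the pitch, and takes the 80nm single-exposure 193i pitch mentioned earlier as the mandrel pitch; the function name is illustrative.

```python
def spacer_pitch_nm(mandrel_pitch_nm: float, spacer_iterations: int) -> float:
    """Idealized pitch after repeated spacer steps: each iteration leaves two
    spacers per sacrificial line, halving the pitch."""
    return mandrel_pitch_nm / (2 ** spacer_iterations)


MANDREL_PITCH_NM = 80.0  # single-exposure 193i pitch quoted in the text

sadp_pitch = spacer_pitch_nm(MANDREL_PITCH_NM, 1)  # SADP: one spacer iteration
saqp_pitch = spacer_pitch_nm(MANDREL_PITCH_NM, 2)  # SAQP: two spacer iterations

print(f"SADP pitch: {sadp_pitch:.0f} nm, SAQP pitch: {saqp_pitch:.0f} nm")
```

The resulting 40nm and 20nm values match the pitch ranges given above for SADP and SAQP.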

The considerations in this subsection were mainly focused on the patterning resolutions of the metal layers. Qualitatively similar considerations can be applied to the vias and cut masks, where the main metric to be considered will be the Center-to-Center (C2C) spacing rather than the pitch.

#### 1.2.2 Device performance

While the constraints in subsection 1.2.1 are the major bottleneck limiting Mx scaling, the reduction of CPP is mostly limited by device performance issues. From Figure 1.8 it is clear that the budget that can be allocated for the gate pitch is determined by the Gate Length ( $L_g$ ), the Gate Spacer width ( $W_{SP}$ ) and the active contact width ( $W_{M0A}$ ), which are linked by Equation 1.3:

$$CPP = L_g + 2 \cdot W_{SP} + W_{M0A} \quad (1.3)$$
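Rearranging Equation 1.3 gives the width left for the active contact once the gate length and the two spacers are accounted for. The sketch below uses the gate length and spacer width of Table 1.1; the CPP values other than the 54nm reference are hypothetical sweep points, and the sweep keeps $L_g$ and $W_{SP}$ fixed purely for illustration.

```python
def contact_width_nm(cpp_nm: float, gate_length_nm: float, spacer_width_nm: float) -> float:
    """Equation 1.3 solved for W_M0A: W_M0A = CPP - L_g - 2 * W_SP."""
    return cpp_nm - gate_length_nm - 2 * spacer_width_nm


# Reference N7 values from Table 1.1.
GATE_LENGTH_NM = 21.0
SPACER_WIDTH_NM = 8.0

# 54nm is the reference CPP; 48nm and 42nm are hypothetical scaled pitches.
contact_widths = {
    cpp: contact_width_nm(cpp, GATE_LENGTH_NM, SPACER_WIDTH_NM)
    for cpp in (54.0, 48.0, 42.0)
}

for cpp, width in contact_widths.items():
    print(f"CPP {cpp:.0f} nm -> contact width {width:.0f} nm")
```

With the gate length and spacer width held constant, every nanometre removed from the CPP is taken directly from the contact width, which is why the three terms have to be scaled together.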

All of these knobs are challenged by fundamental limitations. Scaling $W_{SP}$ below 5nm is currently not possible due to reliability issues and to a sharp increase of the gate to source/drain parasitic capacitances [47]. This is intuitive if we consider the cross-sectional view of a FinFET device depicted in Figure 1.14, where it is clear that the parasitic capacitances will be roughly proportional to $1/W_{SP}$. On the other hand, scaling $W_{M0A}$ below 15nm is currently limited by the super-linear increase of contact resistance, even assuming low contact resistivity, and by the loss of volume and hence strain [47]. Figure 1.14 clearly highlights that the contact width is the major contributor to the resistance to access the contacts. Finally, scaling $L_g$ comes with a problem of electrostatic control [19]. In order to maintain control over the SCE, the $L_g$ reduction should be compensated by an increase of the fin height. However, the aspect ratio of the fins is also restricted by variability and reliability concerns, which limit the maximum fin height of processes currently in production to the range of 50nm. The cumulative result of these constraints is that scaling the CPP below 40nm (i.e. the contact width below 20nm) causes a steep degradation of the device $I_{ON}$, as shown by the plot in Figure 1.15, which severely impacts the possibility of using the device for high-performance applications. Therefore all the CPP dimensions that can currently be considered realistic are within the resolution limits of SADP.

Figure 1.12 Different patterning options for advanced nodes: (a) Litho-Etch (LE). (b) Litho-Etch Litho-Etch (LE2). (c) Self-Aligned Double Patterning (SADP). (d) Self-Aligned Quadruple Patterning (SAQP). Source: [5]

Figure 1.13 Generation of wire intent from a sea of lines through the usage of a cut mask.

Figure 1.14 Cross section for a FinFET device schematically showing the main sources of parasitic resistances and capacitances. Source: [6].



Figure 1.15 Device  $I_{ON}$  degradation due to CPP reduction. Source: [7].

#### 1.2.3 BEOL parasitics

In subsection 1.2.1 we have seen that the most aggressive lithography option (SAQP) makes it possible to target pitches in the range of 20nm. Assuming a unitary width to spacing ratio for the tightest metals, we then need to explore BEOL options that can be viable down to widths of 10nm. This is needed to avoid introducing a supplementary bottleneck in the Mx direction, determined by BEOL RCs.

**Interconnect:** Copper (Cu) based dual damascene BEOL has been the workhorse for IC interconnects for more than 20 years, since it was first introduced in a sub-250nm technology [48] to replace Aluminium. In order to prevent Cu diffusion into the surrounding dielectric material, different barrier/liner combinations can be used [49] to encapsulate the metals and vias. The width of this barrier [50] has not scaled significantly over the last nodes, saturating between 4nm and 3nm. This fact, together with the shrinking width of the minimum features, has considerably increased the relative contribution of the barrier/liner to the total width. As the traditionally used Tantalum Nitride (TaN) barrier is significantly more resistive than the Cu core [51], the increased dominance of the barrier/liner adversely affects the total resistivity. Finally, decreasing the line width towards 10nm increases the scattering of charge carriers at interfaces and grain boundaries, further contributing to the resistivity degradation [52]. From these qualitative considerations it is evident that the resistivity increase as a function of metal width scaling will be super-linear. This behaviour has been investigated in [8] and is reported in Figure 1.16, which confirms an extremely sharp increase below 40nm. Wire resistance directly impacts the RC delay ( $\tau$ ), finally resulting in a performance degradation.

Figure 1.16 Copper wire resistance per unit length for decreasing metal pitch. Source: [8].

Another important metric that is challenged by extreme line-width scaling is Electromigration (EM) reliability, which is degraded mainly by an increase of grain boundaries and by a decrease of the void volume that can cause a failure [49] [53]. The need to prevent wire delay and EM from becoming the limiters for the enablement of future technologies has motivated the search for novel materials to replace Copper. Cobalt (Co) and Ruthenium (Ru) are currently being examined as possible candidates to meet increased reliability and performance requirements [54] [55]. The bulk resistivity of Co and Ru is actually higher than that of Cu, but unlike copper these interconnects can be integrated without the need for a barrier (hence the name "barrier-less" materials). As the metal width is progressively scaled, there will be a cross-over point beyond which these materials exhibit a lower line resistance. These materials also outperform Cu in EM reliability, with an increase of up to 4 orders of magnitude in median Time To Failure (TTF).
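The existence of a cross-over point between Cu and barrier-less metals can be illustrated with a deliberately crude geometric model. The sketch below assumes that the barrier does not conduct at all, that the resistivity stays at its approximate bulk value (about 1.68 µΩ·cm for Cu and 6.2 µΩ·cm for Co), and that the line has an aspect ratio of 2; the 3nm barrier is the lower bound quoted above. Size-dependent scattering is ignored, so the absolute numbers are not representative of measured data such as Figure 1.16.

```python
def line_resistance_ohm_per_um(width_nm: float, bulk_resistivity_uohm_cm: float,
                               barrier_nm: float = 0.0, aspect_ratio: float = 2.0) -> float:
    """Resistance per micron of a rectangular line whose conducting core is
    reduced by a non-conducting barrier on both sidewalls and on the bottom."""
    height_nm = aspect_ratio * width_nm
    core_width_nm = width_nm - 2 * barrier_nm
    core_height_nm = height_nm - barrier_nm
    if core_width_nm <= 0 or core_height_nm <= 0:
        raise ValueError("barrier consumes the whole line cross-section")
    resistivity_ohm_m = bulk_resistivity_uohm_cm * 1e-8  # 1 uOhm*cm = 1e-8 Ohm*m
    core_area_m2 = (core_width_nm * 1e-9) * (core_height_nm * 1e-9)
    return resistivity_ohm_m / core_area_m2 * 1e-6  # Ohm/m -> Ohm/um


CU_BULK_UOHM_CM = 1.68  # approximate bulk value (assumption)
CO_BULK_UOHM_CM = 6.2   # approximate bulk value (assumption)

for width in (40.0, 20.0, 10.0, 8.0):
    r_cu = line_resistance_ohm_per_um(width, CU_BULK_UOHM_CM, barrier_nm=3.0)
    r_co = line_resistance_ohm_per_um(width, CO_BULK_UOHM_CM)  # barrier-less
    print(f"width {width:.0f} nm: Cu {r_cu:.1f} Ohm/um, Co {r_co:.1f} Ohm/um")
```

Even in this simplified picture the barrier-less line becomes less resistive than the Cu line at the narrowest widths, because the fixed barrier thickness removes a growing fraction of the Cu cross-section.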

**Dielectric:** A simplified model of the parasitic capacitances in IC BEOL is shown in Figure 1.17, from [56]. In the cross section three layers are considered, and the capacitances are reported for the central metal of the intermediate layer. The coupling capacitance to adjacent wires ( $C_c$ ) increases linearly with pitch scaling and depends linearly on the dielectric constant ( $k$ ) of the Inter Metal Dielectric (IMD). The coupling capacitance to the top and bottom layers is given by the sum of the area capacitances ( $C_{xa}$ ) and fringe capacitances ( $C_{fx}$ ), where the area capacitance is reduced by metal width scaling, by a thickness increase of the Inter Layer Dielectric (ILD), and by a decrease of $k$. Low- $k$ dielectric materials, i.e. materials with a dielectric constant lower than that of silicon dioxide (3.9), started to be introduced since the 28nm node, targeting $k$ values down to 2.5 [57]. Based on this overview we expect a sub-linear node over node increase of capacitance.
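The dependence of $C_c$ on spacing and on $k$ can be sketched with a parallel-plate approximation between two adjacent wires. This ignores fringe fields and the capacitances to the layers above and below, and the metal thickness and spacing values are hypothetical, so the snippet is only meant to show the trends.

```python
EPSILON_0_F_PER_M = 8.854e-12  # vacuum permittivity


def coupling_cap_af_per_um(k: float, metal_thickness_nm: float, spacing_nm: float) -> float:
    """Parallel-plate estimate of the wire-to-wire coupling capacitance per
    unit length: C_c = k * eps0 * thickness / spacing."""
    farad_per_m = k * EPSILON_0_F_PER_M * metal_thickness_nm / spacing_nm
    return farad_per_m * 1e12  # F/m -> aF/um


# Hypothetical geometry: 40nm thick metal, 20nm and 10nm spacing.
c_sio2_20 = coupling_cap_af_per_um(3.9, 40.0, 20.0)
c_sio2_10 = coupling_cap_af_per_um(3.9, 40.0, 10.0)
c_lowk_20 = coupling_cap_af_per_um(2.5, 40.0, 20.0)

print(f"k=3.9, 20 nm spacing: {c_sio2_20:.1f} aF/um")
print(f"k=3.9, 10 nm spacing: {c_sio2_10:.1f} aF/um")
print(f"k=2.5, 20 nm spacing: {c_lowk_20:.1f} aF/um")
```

In this approximation halving the spacing doubles $C_c$, while moving from $k = 3.9$ to $k = 2.5$ reduces it by roughly 36%.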



Figure 1.17 Simplified capacitance model for IC interconnects.



Figure 1.18 Cross Node Mx RC scaling. Source: [9].

The normalized Resistance and Capacitance for the Mx layers published for the latest TSMC nodes are shown in Figure 1.18. The trends confirm the considerations above, with a 3X node over node increase in resistance for the last two nodes and a limited increase in capacitance.

#### 1.2.4 Routability

Combining Figure 1.10 with the considerations in subsections 1.2.2 and 1.2.3, we can mark on the CPP-Mx plot the regions where a major electrical impact can be expected (Figure 1.19). If we start from our initial coordinate (56,40) and apply a 0.7 factor in both dimensions, we can easily calculate that attempting to migrate to new nodes exclusively by reducing the pitches requires targeting a (39,28) point for N5, and (27,19) in order to move to N3 dimensions. Since both these points enter very challenging regions, for all the reasons explained in the previous subsections, it is clear that we need to reduce the third parameter in Equation 1.2: the Track Height. Smaller footprint cells, however, also imply a higher pin density and a reduced number of routing tracks, which pose pin accessibility and routing congestion challenges to the Place and Route (P&R) tool.

Figure 1.19 Device Performance limiting CPP scaling and BEOL RCs limiting Mx scaling; regions where a severe electrical impact can be expected are highlighted in red.

Moreover, as a result of the reduced routing resources within the cell, a subset of standard cells might need to be enlarged horizontally, mitigating the benefits of track-height reduction. Taking both of these potential side effects into account requires multiple sets of standard cells to be designed, and their routability to be evaluated based on a realistic cell distribution. In other words, post P&R experiments are necessary in order to quantify how efficiently the pin density increase due to track-height reduction can be handled by state-of-the-art EDA tools.
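The pitch targets quoted in this subsection follow from repeatedly applying the 0.7 factor to the (56,40) starting coordinate. The sketch below reproduces that arithmetic and, as a comparison, evaluates the area factor obtained from Equation 1.2 when only the Track Height is reduced from 7.5 to 6 tracks. Function and variable names are illustrative.

```python
def scale_pitches(cpp_nm: float, mx_nm: float, factor: float = 0.7, nodes: int = 2):
    """Return the (CPP, Mx) targets obtained by applying the same scaling
    factor to both pitches for a number of consecutive nodes."""
    targets = []
    for _ in range(nodes):
        cpp_nm *= factor
        mx_nm *= factor
        targets.append((cpp_nm, mx_nm))
    return targets


# Starting coordinate quoted in the text.
n5_target, n3_target = scale_pitches(56.0, 40.0)
print(f"N5 target: CPP {n5_target[0]:.1f} nm, Mx {n5_target[1]:.1f} nm")
print(f"N3 target: CPP {n3_target[0]:.1f} nm, Mx {n3_target[1]:.1f} nm")

# Area factor from Track Height reduction alone (pitches unchanged).
track_height_area_factor = 6.0 / 7.5
print(f"area factor for 7.5 -> 6 tracks: {track_height_area_factor:.2f}")
```

The unrounded targets are about (39.2, 28.0) and (27.4, 19.6), in line with the values given above. Track Height reduction from 7.5 to 6 tracks yields a 0.8 area factor without touching either pitch.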

#### 1.2.5 Cost

Based on the technology and lithography assumptions, imec developed a proprietary cost model aimed at evaluating the wafer manufacturing cost [10]. The non-readiness of EUV, and the consequent adoption of multiple patterning for technology nodes below 28nm, caused an increase in the number of masks and process steps that led to the growing node over node wafer cost illustrated in Figure 1.20. The die cost is linked to the wafer cost according to Equation 1.4:

$$DieCost = \frac{WaferCost}{DiesPerWafer \cdot DieYield} \quad (1.4)$$
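Equation 1.4 makes it easy to see how a wafer cost increase eats into the benefit of area scaling. In the sketch below all numbers are hypothetical: the wafer cost is normalized, the next node is assumed to double the dies per wafer with a 30% higher wafer cost, and the yield is kept constant.

```python
def die_cost(wafer_cost: float, dies_per_wafer: float, die_yield: float) -> float:
    """Equation 1.4: DieCost = WaferCost / (DiesPerWafer * DieYield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


# Hypothetical reference node (normalized wafer cost).
reference = die_cost(wafer_cost=1.0, dies_per_wafer=500, die_yield=0.8)

# Hypothetical next node: 2X dies per wafer, 30% higher wafer cost, same yield.
next_node = die_cost(wafer_cost=1.3, dies_per_wafer=1000, die_yield=0.8)

die_cost_ratio = next_node / reference
print(f"die cost ratio next/reference: {die_cost_ratio:.2f}")
```

Under these assumptions the die cost drops to 0.65 of the reference rather than the 0.5 that a constant wafer cost would give; any yield loss at the new node would reduce the gain further.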

Growing yield issues have emerged for advanced nodes, but an accurate assessment of yield is hardly possible in a research environment, and actual data from the foundries are not accessible. Therefore, when cost comparisons are reported in this study, they will mainly refer to wafer cost.

Figure 1.20 Normalized wafer cost evolution across technology nodes. Source: [10]

Besides wafer cost, other development costs have grown significantly in the last generations of technologies [58], such as IC design cost, embedded software development and yield ramp-up. Collectively, these factors contribute to eroding the economic gains of Moore's law, which further justifies the need for a new approach to scaling as in subsection 1.1.2.

## 1.3 State of the art

The current state of the art has already pointed out the importance of DTCO [37]; however, the focus is either on: a) the generation of sub-10nm PDKs and their evaluation at standard cell and/or ring oscillator level, or b) P&R analysis at design level using older or scaled technology nodes. A study at standard cell level comparing 7.5 and 6-Tracks cells for multiple technology nodes down to N5 was presented in [38]. An entire N7 predictive PDK with 7.5-Tracks standard cells for DTCO analysis is presented in [59]. A unified platform for the optimization of power, performance, thermal, area and cost metrics for  $\leq 7$ nm nodes has been shown in [60]. However, all the above works focus on standard cell level comparisons, without taking into account the actual PDN design and post P&R PPA assessments on a real design. As for different device options, a PPA comparison of FinFET and nanowire for 5nm process assumptions is presented in [61]; however, the PPA results concern a ring oscillator circuit only. At P&R level, a comprehensive study was presented in [62], where different cell heights (8 and 12-Tracks) were compared; however, the results refer to the well-established 28nm production technology. For more advanced nodes, PPA metrics for a predictive 7nm FinFET technology are analyzed in [32]. However, the standard cells are generated by applying scaling factors to the existing 45nm library, rather than by designing the actual cells using effective process assumptions (only one standard cell height has been considered).

## 1.4 Objectives of this thesis

Starting from the proposed reference N7, we first want to identify a feasible roadmap to maintain Moore's law for the next two nodes. Based on the targeted dimensions, the first objective is to assemble, for each node, all the components of a predictive Process Design Kit (PDK). The second goal is to set up a state-of-the-art digital implementation flow and use these PDKs for P&R experiments, sweeping patterning, device, standard cell and BEOL options, and benchmarking PPA figures. This simulation activity, also called Design of Experiments (DoE), is aimed at providing important learnings on the best design and technology choices, helping to identify the main bottlenecks, and quantifying the PPA and cost impact at IP logic-block level. The third objective is to use feedback from the DoEs in order to contribute to the definition of the technology. The fourth aim is to contribute to the development of new features in the EDA tools, by submitting to their R&D teams requests to change existing features or deploy new ones, whenever this is made necessary by the technology under test. The final goal is to contribute a realistic place and route intent to patterning test vehicles, which are manufactured in order to identify the best process recipes and materials, and to verify patterning assumptions and design rules. This flow is described in Figure 1.21. These ambitious goals will be pursued not only based on the information and data published in literature, but also by leveraging the internal data and facilities available at imec. This work was entirely based on the collaboration framework between imec and Cadence Design Systems described in [40].

**Target N7+ dimensions:** Starting from the reference N7 node (56,40) we want to study the transition to an EUV-based N7+ node, keeping the pitches unchanged. This "half-node" is meant to provide a faster transition to HVM, reduce the wafer costs and additionally offer the PPA gains that will be shown in detail in Chapter 3.

**Target N5 dimensions:** Our predictive 5nm (or N5) technology targets (42,32) pitches. Within this node we want to show how the gains from pitch scaling can be "amplified" through the usage of design, technology and EDA solutions combined to achieve significant PPA benefits within the same node. One of the major goals in this node is to reduce the Track Height from 7.5 down to 6 and then to 5 tracks, the last one based on EUV patterning.



Figure 1.21 Contribution of P&R blocks to BEOL test-chips aimed to technology definition.

**Target N3 dimensions:** Finally, we want to explore a 3nm (or N3) scenario where we are forced by the device constraints (subsection 1.2.2) to keep the poly pitch unchanged, while aggressively scaling the tightest metal to a pitch of 21nm, which currently requires SAQP patterning for the lines. Within this node we want to test two cell heights: 5.5 and 4.5 tracks.

It is clear that without DTCO and track-height reduction, moving from the N7 to the N5 dimensions proposed here would only yield an area scaling factor of 0.6, which degrades further to about 0.65 in the transition from N5 to N3.
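The two scaling factors quoted above can be reproduced with a short calculation. This sketch assumes that the node pairs are read as (poly pitch, tightest metal pitch) in nm, that the N3 pair is (42, 21) as implied by the description of that node, and that the track height and the number of poly pitches per cell stay constant, so that cell area scales with the product of the two pitches.

```python
def area_scaling(old_node, new_node):
    """Ratio of the (poly pitch x metal pitch) products of two nodes."""
    old_poly, old_metal = old_node
    new_poly, new_metal = new_node
    return (new_poly * new_metal) / (old_poly * old_metal)


# (poly pitch, tightest metal pitch) in nm
N7 = (56, 40)
N5 = (42, 32)
N3 = (42, 21)  # poly pitch unchanged with respect to N5, metal pitch of 21nm

n7_to_n5 = area_scaling(N7, N5)
n5_to_n3 = area_scaling(N5, N3)
print(f"N7 -> N5: {n7_to_n5:.3f}")
print(f"N5 -> N3: {n5_to_n3:.3f}")
```

Because the poly pitch does not scale from N5 to N3, the whole shrink in that transition comes from the metal pitch (21/32), which is why the factor is weaker than in the previous transition.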

## 1.5 Key contributions of this Thesis

To the best of our knowledge this work is the first attempt at showing a comprehensive set of post P&R Design of Experiments based on predictive PDKs for multiple sub-N7 nodes (N7+, N5 and N3), different patterning assumptions (193-i and EUV), different device options (FinFET, lateral nano-wires and nano-sheets), different configurations (1, 2 or 3 fins) and different standard cell heights (7.5, 6, 5, 5.5 and 4.5 Tracks). Additionally, the generation of the results in this thesis is based on a novel experimental framework (Chapter 2) where technology definition, PDK generation and PPA assessment are tightly coupled, in such a way that the technology choices are driven by a systematic set of

Table 1.2 Comparative table highlighting key contributions of this thesis versus state of the art. (N7\* in [32] standard cells were scaled from 45nm layouts).

| Work          | Nodes     | Devices             | Patterning | Track Height      | Post P&R | IR-aware |
|---------------|-----------|---------------------|------------|-------------------|----------|----------|
| [38]          | N5;N3     | FinFET;LNW          | 193i;EUV   | 7.5;6             | ✗        | ✗        |
| [59]          | N7        | FinFET              | 193i;EUV   | 7.5               | ✗        | ✗        |
| [60]          | N7        | FinFET;LNW          | 193i       | 7.5;6             | ✗        | ✗        |
| [61]          | N7;N5     | FinFET;LNW          | 193i       | 7.5;6             | ✗        | ✗        |
| [62]          | N28       | FinFET              | 193i       | 12;8              | ✓        | ✗        |
| [32]          | N7*       | FinFET              | 193i       | n/a               | ✓        | ✗        |
| [63]          | N14       | FinFET              | 193i       | 12                | ✓        | ✓        |
| **This work** | N7+;N5;N3 | FinFET;LNW;NSH;CFET | 193i;EUV   | 7.5;6;5.5;4.5;4;3 | ✓        | ✓        |

DoEs performed at P&R level. A key enabler of this framework is the parallel enhancement and debugging of the EDA tools, which need to be capable of supporting the newly introduced technology features. In the context of this work more than 30 bug-fixes or enhancements were enabled through the collaboration with Cadence R&D. The key results of the thesis are reported in Chapters 3, 4 and 5. Table 1.2 highlights the deltas between the state of the art (as discussed in Section 1.3) and this work, indicating its novelty and major contributions. In [38], [59], [60] and [61] sub-N7 nodes are investigated, but only a transistor-level or standard-cell analysis is performed. On the other hand, in [62], [32] and [63] a post P&R evaluation is present, but the technologies considered are either older than N7, or obtained by applying scaling factors to pre-existing technologies ([32]). Except for [63], none of the other works takes into account chip-level IR-drop. Moreover, none of the other works considers standard cells with fewer than 6 Tracks. Finally, none of the other works includes a post P&R assessment with non-FinFET devices and EUV patterning. On top of the simulation work, this project enabled the first patterning test-chips at 5nm [64] and 3nm [65] dimensions. The results derived from these test-chips were channelled by imec to the top foundries and fabless companies, thus contributing to the industry path-finding towards the next nodes.

## 1.6 Organization of the manuscript

The rest of the thesis is organized as follows. In Chapter 2 we illustrate how the predictive PDKs used for the Design of Experiments were assembled, and give a general overview of the different types of experiments that can be performed. Chapter 3 is devoted to analyzing the deltas between 193-i and EUV patterning. The value proposition of EUV for N7+ and beyond will be clarified, and the gains of an EUV-based N7+ versus the reference 193-i multiple patterning N7 are quantified. Chapter 4 focuses on the enablement of the predictive N5, and demonstrates how a combined usage of "scaling boosters" can mimic the PPA benefits of a new node while keeping ground rules unchanged. Track height reduction from 7.5 to 6 Tracks will be shown, and an option to efficiently enable a transition to 5 Tracks will also be illustrated. In Chapter 5 the results based on our predictive N3 node will be shown, and the PPA benchmarking between N5 and N3 will also be analysed. In the same chapter, physical results based on standard cell libraries for a new device named Complementary FET (CFET) will additionally be presented. In Chapter 6 the idea of moving from DTCO to a System Technology Co-Optimization (STCO) based on full System On Chip (SoC) assessments will be illustrated and justified through some examples. The general conclusions of the work will then be summarized in Chapter 7.

# Chapter 2

## Experimental framework for advanced nodes

The complete DTCO flow that was set up is schematically represented in Figure 2.1. This flow is subdivided into two major parts:

- Generation of the files composing the PDK
- EDA flow used for design benchmarking

What we define as post P&R DTCO is essentially a loop between these two sides, driven by the post P&R feedback. Section 2.1 will show how each of the PDK components is generated based on technology assumptions, while Section 2.2 will describe the EDA flow used for benchmarking. Section 2.3 will provide a comprehensive list of examples of how to use this flow to perform different types of DoEs, aimed at enabling path-finding across the complex DTCO space.

## 2.1 Assembling the Predictive PDK

The PDK generation starts from the technology data and assumptions available at imec. This work will not cover the device modelling part, which has been extensively described in previous works [19] [66] [67] and is outside the scope of this thesis. The subsections below will address the rest of the files. A few examples will be given in Appendix A.

### 2.1.1 Producing the techlef from the design rules

The first step in order to enable the P&R engine to perform a Design Rule Check (DRC) is to collect and list the design rules imposed by the lithography and Design For Manufacturing (DFM) constraints in a Design Rule Manual (DRM). These rules are typically illustrated in the DRM from the foundry, in both graphical and textual form (e.g. Figure 3.4). To make them readable by implementation tools, it is then necessary to encode these rules into a syntax which is EDA-vendor specific. For Cadence Design Systems, this syntax is explained in [68]. For this work it was fundamental to use the latest version of this document, which incorporates all the syntax extensions for advanced nodes. Based on this manual it is then possible to translate the DRM into a file called technology *.lef* or *tech.lef*, which is parsed during the initialization step and used throughout the flow to check DRC violations. An example of how to translate basic DRM rules into *tech.lef* syntax is given in Appendix A.1. The conceptual steps leading from the technology constraints to the *tech.lef* are shown in Figure 2.2.

Figure 2.1 DTCO Flow for imec Design of Experiments.

Figure 2.2 Steps from technology constraints to *tech.lef* generation



Figure 2.3 Cross sections: FEOL/MOL and first layers of BEOL for imec N7.

### 2.1.2 Standard cell design

Designing standard cells for 7nm and below required the introduction of a local interconnect or Middle Of Line (MOL) scheme [69]. The Metal 0 or MINT layer was introduced in the imec N7 technology in order to facilitate the internal routing of the standard cells and provide better connectivity between the Front End of Line (FEOL) and the actual Back End of Line (BEOL) routing layers (i.e. Metal 1 and above). In Figure 2.3, the cross section from the gate level to Metal 2 clarifies the scheme adopted. The gate runs orthogonally with respect to the fins and is separated by a spacer from the M0A active contact. M0G is used to offset the gate laterally in order to guarantee a gridded alignment of the VINT via, which connects both M0A and M0G to the MINT layer. MINT carries most of the cell-internal routing, which is completed on M1 and, for particularly complex cells, through connections up to M2. The layers of the standard cells that are exposed to the router need to comply with the same rule set as the DRM and *tech.lef*. In order to use the standard cell library in the digital implementation flow, a representation of the layout view at a higher level of abstraction needs to be created. This representation, called abstract, contains information about the type and size of the cells, the position of pins or terminals, and the overall size of blockages. The abstracts are used in place of full layouts to improve the performance of P&R tools. For signoff the abstracts are replaced with the actual layouts. The Abstract Generator User Guide [70] explains how to convert the actual binary GDS file from Virtuoso to an abstract in Library Exchange Format (*.lef*) [68]. This ASCII file is used in the digital implementation flow to parse the physical information on the standard cells, which is needed by virtually every P&R engine, starting from the area optimization during synthesis. In this work, the layers in the abstract start from MINT unless otherwise specified. An example of GDS and *.lef* files is given in Appendix A.

### 2.1.3 Library characterization

Based on the Parasitic Extracted (PEX) netlists of the standard cells, library characterization creates the electrical (power and timing) views associated with each cell by running transistor-level analog (SPICE) simulations. The result of the characterization is a file in liberty format, or *.lib* file, which is used by the delay and power calculation engines in the implementation flow in order to achieve SPICE-comparable accuracy while dramatically decreasing runtime. A description of the syntax and attributes of the liberty format is provided in [71]. Until the 90nm node, the most widely used delay model was the NLDM (Non-Linear Delay Model). In this model the delay is considered to be a non-linear function of two independent variables: the input transition time and the output load capacitance. This dependency is captured by look-up tables where, for each combination of the two variables, the delay (or another electrical parameter) is reported, allowing the timing engines to interpolate between the characterized values. As the feature size shrinks, the effect of interconnect resistance can result in large inaccuracies, as the waveforms become highly non-linear. Various modelling approaches provide additional accuracy for the cell output drivers. Broadly, these approaches obtain higher accuracy by modelling the output stage of the driver as an equivalent current source. Examples of these approaches are CCS (Composite Current Source) and ECSM (Effective Current Source Model). For example, the CCS timing models provide additional accuracy for modelling cell output drivers by using a time-varying and voltage-dependent current source. The timing information is provided by specifying detailed models for the receiver pin capacitance and output charging currents under different scenarios [72]. The details of the current-based delay model are described in [73]. Examples of the NLDM, ECSM and CCS formats are given for a standard cell in Appendix A.
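The NLDM look-up described above can be illustrated with a small bilinear interpolation routine. This is only a sketch: the table values, units and function names are hypothetical, and production timing engines may use a different interpolation or extrapolation scheme.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Indices (i, i + 1) of the axis segment used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew index][load index].

    Points outside the characterized range are linearly extrapolated
    from the nearest table segment.
    """
    i0, i1 = _segment(slews, slew)
    j0, j1 = _segment(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    high = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical 3x3 delay table: rows are input transition times (ns),
# columns are output load capacitances (fF), entries are delays (ns).
slews = [0.01, 0.05, 0.20]
loads = [0.5, 2.0, 8.0]
delays = [
    [0.010, 0.020, 0.060],
    [0.015, 0.026, 0.068],
    [0.030, 0.042, 0.090],
]

on_grid = nldm_delay(slews, loads, delays, slew=0.05, load=2.0)
between = nldm_delay(slews, loads, delays, slew=0.03, load=1.25)
print(on_grid, between)
```

A query that coincides with a characterized point returns the table entry itself, while a query between points blends the four surrounding entries.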

### 2.1.4 BEOL stack choice and modelling

In modern process nodes metal pitches, widths and thicknesses gradually increase from the lower to the upper layers of the BEOL [32], as schematically shown in Figure 2.4 and documented by Table 2.1. In the cross-section in Figure 2.4, it is possible to identify the main elements of the BEOL stack: the conductors, the vias, the  $IMD_i$  dielectrics and the  $ILD_i$  dielectrics. Unless otherwise specified, we assume in this work a width-to-spacing ratio of one, so that the metal (and via) width is simply half of the pitch. The pitches for the reference N7 can be grouped into three categories called  $M_X$ ,  $M_Y$  and  $M_Z$ .  $M_X$  layers are the tightest and most expensive, and are used to optimally connect to the small geometries of the cells and to resolve congestion issues (e.g. MINT to M3).  $M_Y$  (e.g. M4,M5) and  $M_Z$  (e.g. M6 to

Table 2.1 Thickness and dielectric constant for reference N7

| layer | dielectric | thickness [nm] | k    | pitch [nm] |
|-------|------------|----------------|------|------------|
| $M_Z$ | ILD | 62                | 2.55 | 80    |
|       | IMD | 72                | 2.55 |       |
| $M_Y$ | ILD | 38                | 2.55 | 48    |
|       | IMD | 48                | 2.55 |       |
| $M_X$ | ILD | 17                | 2.8  | 40    |
|       | IMD | 24                | 2.8  |       |

$M_9$ ) layers offer lower parasitics due to their increased width, thickness and pitch, and should be used as early as possible, especially for longer interconnects. In the experiments in Chapters 3, 4 and 5, the modifications to the stack were mainly done in the  $M_X$  layers, which require the most advanced patterning and are for this reason the most interesting from an advanced-node perspective. For the  $M_X$  layers a pitch of 40nm and SADP patterning are initially assumed in N7. The  $M_Y$  pitch was set to 48nm, while the  $M_Z$  pitch is further relaxed to 80nm, which is printable with a single LE. In the layers using spacer-assisted patterning techniques (SADP/SAQP), 1D routing [74] is required, and even for the  $M_Z$  layers the router mainly uses a preferred direction. The preferred directions are "interleaved" following a Horizontal-Vertical-Horizontal sequence starting from MINT, which is horizontal. Table 2.1 shows the thickness and dielectric constant for the metal and via layers. For the conductors and vias, the resistivity was calculated with the methodology shown in [75]. Cadence extraction tools for technologies below 32nm currently need a binary file called *qrcTechFile*, which is generated by a one-time process characterization step [76]. The characterization is based on an input process description file in *.ict* format, an ASCII file containing geometrical information along with resistivity and dielectric properties for each layer. During the characterization step, a 3D solver extracts the parasitics of a large number of layout patterns, creating look-up tables that are interpolated by the extraction engines used during implementation, based on the actual layout. An example of how to model a conductor, a dielectric and a via layer is supplied in Appendix A.
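The stack conventions described in this subsection (three pitch groups, a width-to-spacing ratio of one, and interleaved preferred directions starting from a horizontal MINT) can be summarized in a short sketch. The assignment of layers to groups follows the examples given in the text (MINT to M3, M4 and M5, M6 to M9); the data structure and function name are hypothetical and do not reflect the *.ict* or *tech.lef* syntax.

```python
# Pitches of the reference N7 stack in nm, per pitch group.
PITCH_NM = {"MX": 40, "MY": 48, "MZ": 80}

# Layers from bottom to top with their pitch group.
LAYER_GROUPS = [
    ("MINT", "MX"), ("M1", "MX"), ("M2", "MX"), ("M3", "MX"),
    ("M4", "MY"), ("M5", "MY"),
    ("M6", "MZ"), ("M7", "MZ"), ("M8", "MZ"), ("M9", "MZ"),
]


def build_stack(width_to_spacing=1.0):
    """Derive width, spacing and preferred direction for every layer."""
    stack = []
    for index, (name, group) in enumerate(LAYER_GROUPS):
        pitch = PITCH_NM[group]
        # pitch = width + spacing, with width / spacing = width_to_spacing
        width = pitch * width_to_spacing / (1.0 + width_to_spacing)
        stack.append({
            "layer": name,
            "group": group,
            "pitch_nm": pitch,
            "width_nm": width,
            "spacing_nm": pitch - width,
            # MINT is horizontal, then directions alternate layer by layer.
            "direction": "H" if index % 2 == 0 else "V",
        })
    return stack


stack = build_stack()
for entry in stack:
    print(entry["layer"], entry["pitch_nm"], entry["width_nm"], entry["direction"])
```

Changing `width_to_spacing` shows how a non-unit ratio would redistribute the same pitch between metal width and spacing, which is the kind of BEOL variation explored in the later chapters.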

## 2.2 Design flow

The digital implementation flow used in this work is entirely based on Cadence tools. The trend witnessed in EDA in recent years, especially for advanced nodes, is to adopt more and more a "monolithic" flow (i.e. all the tools from one vendor). In fact, sharing the same engines can improve predictability and correlation across the different implementation steps, which



Figure 2.4 BEOL Cross section (dimensions are not to scale).

|              |                       | 0                         | 1                        | 2            |
|--------------|-----------------------|---------------------------|--------------------------|--------------|
| Tool Name    | innovus               | innovus                   | innovus                  | innovus      |
| Tool Version | 16.10-d132_1          | 16.10-d132_1              | 16.10-d132_1             | 16.10-d132_1 |
| Machine      | sjfsb452              | sjfsb241                  | sjfsb241                 | sjfib284     |
| Machine Load | 4.45                  | 0.36                      | 1.31                     |              |
| User         | jasond                | jasond                    | jasond                   | jasond       |
| Template     | Type                  | block                     | block                    | block        |
|              | Features              | dft_simple synth_physical | dft_simple synth_spatial | dft_simple   |
| Setup Timing | WNS (ns)              | 1.301                     | 1.295                    | 1.152        |
|              | TNS (ns)              | 0.000                     | 0.000                    | 0.000        |
|              | FEPS                  | 0                         | 0                        | 0            |
|              | WNS Reg2Reg (ns)      | 1.301                     | 1.295                    | 1.152        |
|              | TNS Reg2Reg (ns)      | 0.000                     | 0.000                    | 0.000        |
|              | FEPS Reg2Reg          | 0                         | 0                        | 0            |
| Hold Timing  | WNS (ns)              | 0.003                     | 0.008                    | 0.047        |
|              | TNS (ns)              | 0.000                     | 0.000                    | 0.000        |
|              | FEPS                  | 0                         | 0                        | 0            |
|              | WNS Reg2Reg (ns)      | 0.054                     | 0.045                    | 0.047        |
|              | TNS Reg2Reg (ns)      | 0.000                     | 0.000                    | 0.000        |
|              | FEPS Reg2Reg          |                           |                          |              |
| DRV          | Tran                  | 7                         | 8                        | 7            |
|              | Cap                   | 0                         | 0                        | 0            |
|              | Fanout                | 0                         | 0                        | 0            |
| Physical     | Density (%)           | 91.1                      | 90.7                     | 90.7         |
|              | #Instances            | 10,065                    | 10,029                   | 9,901        |
|              | #Buffers              | 1,010                     | 1,111                    | 669          |
|              | #Inverters            | 492                       | 502                      | 508          |
|              | #Sequentials          | 498                       | 498                      | 498          |
|              | Total Area (μm²)      | 42,593.59                 | 42,544.20                | 42,547.37    |
|              | Buffer Area (μm²)     | 1,178.18                  | 1,284.54                 | 804.91       |
|              | Inverter Area (μm²)   | 313.64                    | 325.81                   | 328.46       |
|              | Sequential Area (μm²) | 2,557.98                  | 2,534.87                 | 2,559.56     |

Figure 2.5 Comparing metrics for *.html* reports of different implementation runs: header indicates the run ID; column in yellow is considered as the "golden" run, and the values for run "0" and run "1" are marked in red or green depending on whether they are worse or better compared to the golden reference.

reduces the number of cycles across the flow, saving Turn Around Time (TAT). Secondly, increased correlation and accuracy help to safely reduce the margins, resulting in more degrees of freedom for the optimization algorithms to improve PPA. Besides the monolithic flow, another major trend that is contributing to increased correlation is moving the "physical awareness" early in the flow, for example through physical synthesis as described in subsection 2.2.2. The physical effects that are starting to be propagated up to synthesis level include congestion, actual wireload, realistic clock, IR-drop and electromigration.

Another advantage of the "RTL-to-GDS" flow is the usage of common metrics and commands across the flow [77]. Cadence also provides a functionality to dump all the relevant PPA metrics from each of the implementation steps into an *.html* report. The capability to compare the metrics of different runs, as shown in Figure 2.5, was very useful for our benchmarking purposes. Examples of how to produce and compare these reports are given in Appendix B.
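The kind of comparison shown in Figure 2.5 can be sketched in a few lines. This is not the Cadence reporting functionality: the function and the "higher is better" table are hypothetical, the three metrics are taken from Figure 2.5, and run "2" is assumed to be the golden run.

```python
# True means that a larger value is better for the metric
# (an assumption of this sketch, not a tool setting).
HIGHER_IS_BETTER = {
    "setup_wns_ns": True,     # more positive slack is better
    "total_area_um2": False,  # smaller area is better
    "buffers": False,         # fewer buffers is better
}


def compare_to_golden(runs, golden_id):
    """Label every metric of every non-golden run as better, worse or same."""
    golden = runs[golden_id]
    verdicts = {}
    for run_id, metrics in runs.items():
        if run_id == golden_id:
            continue
        verdicts[run_id] = {}
        for name, value in metrics.items():
            if value == golden[name]:
                verdicts[run_id][name] = "same"
            elif (value > golden[name]) == HIGHER_IS_BETTER[name]:
                verdicts[run_id][name] = "better"
            else:
                verdicts[run_id][name] = "worse"
    return verdicts


runs = {
    "0": {"setup_wns_ns": 1.301, "total_area_um2": 42593.59, "buffers": 1010},
    "1": {"setup_wns_ns": 1.295, "total_area_um2": 42544.20, "buffers": 1111},
    "2": {"setup_wns_ns": 1.152, "total_area_um2": 42547.37, "buffers": 669},
}
verdicts = compare_to_golden(runs, golden_id="2")
for run_id, result in verdicts.items():
    print(run_id, result)
```

Keeping the direction of improvement explicit per metric avoids the common mistake of treating every increase as a regression.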

### 2.2.1 Choice of reference design

As all the files of the PDKs were being produced and progressively updated in the context of a research environment, a pipe-cleaning phase with a small core was used to validate the flow and debug the files with fast TAT. The core used for this preliminary stage was the ARM M0 core [11], which counts approximately 10K gates. This core is not large enough to fully utilize the metal stack described in subsection 2.1.4, and its design complexity was not considered sufficient for a relevant PPA benchmarking. For this reason, in the experiments shown from Chapter 3 to Chapter 5, more complex designs were adopted, as in Figure 2.6: the LDPC core (50K gates) from [12] and an ARM 64-bit CPU (500K gates). The TAT depends on multiple factors, including area and performance targets, congestion and number of CPUs used, but it is primarily affected by the design size. Moving to larger and more complex cores guarantees more relevant conclusions, at the price of increased runtime. In our case the runtime ranges from tens of minutes for the ARM M0 up to ten or more hours for the 64-bit CPU. The timing constraints for the design are normally specified using an industry-standard format named Synopsys Design Constraints (SDC). A guide to this format and to how to code an *.sdc* file can be found in [78]. In the case of the 64-bit CPU the *.sdc* was kindly provided by ARM along with the RTL.

### 2.2.2 Synthesis

Logical synthesis is the process through which an EDA program called synthesis tool (in our case Cadence® Genus™ Synthesis Solution) maps a Register Transfer Level (RTL) description of a digital circuit into standard cells belonging to a specific technology library. Synthesis is composed of the following major steps:

- **generic synthesis:** converts the RTL to generic elements, based on technology-independent criteria.
- **technology mapping:** maps the design to the technology library, optimizing PPA.
- **incremental optimization:** starts from the mapped design and performs a final incremental optimization.



Figure 2.6 Designs used in this work. ARM M0 [11] for pipe-cleaning, LDPC [12] and ARM 64-bit CPU for benchmarking purposes. (figures are to scale)

As previously mentioned, due to the increased importance of taking "physical awareness" into account early in the flow, physical synthesis tools were created and became more important for advanced nodes. The input and output files for logical and physical synthesis are shown in Figure 2.7, where the additional files for the physical flow are marked. In the physical flow the incremental optimization step is interleaved with a legalized placement, followed by a final optimization that is able to take into account the actual wirelength and wireload, and the relative position of the cells deriving from the actual placement. The interleaved placement is obtained by directly invoking the P&R tool, which on top of the input files normally used by logical synthesis (*.lef*, *.lib*, *.sdc*, RTL) also requires the *tech.lef*, the *qrcTechFile*, and the *.def* file [68] that describes the floorplan to be used for the placement. In terms of outputs, logical synthesis produces the mapped *.sdc* and the optimized netlist, while the physical flow additionally generates the placed Database (DB). These outputs become inputs for the Place and Route. If the logical flow was used, the P&R will start from initialization, followed by a full placement. If physical synthesis was done, the placed DB will be imported into the implementation tool and only an incremental placement will be needed before continuing to the subsequent steps. An example script highlighting the deltas between the logical and physical synthesis is given in Appendix B.
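The difference between the two synthesis flavours in terms of inputs, outputs and P&R entry point can be captured in a small bookkeeping sketch. It does not call any Cadence tool; the key names and the function are hypothetical labels for the files listed above.

```python
LOGICAL_INPUTS = {"rtl", "lib", "lef", "sdc"}
PHYSICAL_EXTRA_INPUTS = {"tech_lef", "qrc_tech_file", "def"}
LOGICAL_OUTPUTS = {"netlist", "mapped_sdc"}
PHYSICAL_EXTRA_OUTPUTS = {"placed_db"}


def synthesis_plan(available_inputs, physical):
    """Check the inputs of a synthesis run and list what it hands over to P&R."""
    required = set(LOGICAL_INPUTS)
    outputs = set(LOGICAL_OUTPUTS)
    if physical:
        required |= PHYSICAL_EXTRA_INPUTS
        outputs |= PHYSICAL_EXTRA_OUTPUTS
    missing = sorted(required - set(available_inputs))
    if missing:
        raise ValueError(f"missing inputs: {', '.join(missing)}")
    first_pnr_step = (
        "incremental placement" if physical
        else "initialization and full placement"
    )
    return {"outputs": sorted(outputs), "first_pnr_step": first_pnr_step}


logical = synthesis_plan({"rtl", "lib", "lef", "sdc"}, physical=False)
physical = synthesis_plan(
    {"rtl", "lib", "lef", "sdc", "tech_lef", "qrc_tech_file", "def"},
    physical=True,
)
print(logical)
print(physical)
```

Requesting the physical flow with only the logical inputs raises an error that names the missing files, which mirrors the extra setup effort that physical synthesis demands.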



Figure 2.7 Inputs and outputs files for Logical and Physical Synthesis.

### 2.2.3 Place and Route

The P&R is the automated process of producing the final IC layout, starting from the gate-level netlist coming from synthesis. In our case the tool used for the implementation was Cadence® Innovus™ Implementation System. The steps constituting a state-of-the-art flow are indicated by the flowchart in Figure 2.8. Although the flow is highly automated, every step in Figure 2.8 requires design- and technology-dependent user directives. The process of iterating through the settings for the different steps in order to achieve design closure is commonly referred to as physical design. After parsing the input files, the design is initialized. The core and chip size, the track patterns, the aspect ratio of the design, and the relative positions of the hard macros are defined during floorplanning. In our case, as logic blocks without macros were used, a simple floorplan with unitary aspect ratio was selected, unless otherwise specified. The next step is to create the Power Delivery Network (PDN), which delivers the power (VDD) and ground (VSS) nets from the upper layers to the standard cells. As extensively shown in Chapters 4 and 5, the PDN has a central role in advanced nodes. Once the floorplanning and power planning have been completed, the standard cells are placed and PPA is concurrently optimized. After each implementation step a timing check is performed, requiring changes upstream in case timing is not met. During placement the clock is considered ideal, meaning that zero delay is assumed through the clock network. The Clock Tree Synthesis (CTS) step uses a detailed routing engine (i.e. fully honoring the design rules) to route the clock tree, concurrently optimizing timing. A separate hold optimization step is also needed after CTS. Until this point, the signals (except for the clock) are routed by fast engines that do not honor the design rules, and that are used to obtain an early estimate of congestion and wireload before the actual legal routing is available. The Routing step is used to connect the signals while avoiding DRC violations. Once routing is finished, a post-route optimization is needed to fix setup and hold timing violations. A final signoff step is normally required, with signoff-accuracy engines for physical verification, extraction, Static Timing Analysis (STA) and power integrity analysis. In our case, as we mostly used this flow for benchmarking purposes, we normally ended the assessment at the post-route stage, since we are more interested in the relative comparison between the figures than in signoff accuracy. Indications on how to set up the Cadence Foundation Flow are provided in Appendix B.

Figure 2.8 Implementation flow in Innovus.

### 2.2.4 Power Integrity

Power integrity analysis checks whether the current densities and the degradation of nominal VDD and VSS are within the limits required by a given technology and design. Power integrity can be checked throughout the entire design flow to identify IR-drop and EM issues. The tool used in this work to perform IR-drop analysis is Cadence® Voltus™ IC Power Integrity Solution, and the power integrity check was typically done at the post-route step. In addition to the electrical and physical libraries used in P&R, a prerequisite for power rail analysis is a Power Grid View (PGV) library, generated as explained in [79]. These views model each standard cell and macro through a distributed RC network and current sources. Power integrity can be analysed in two ways: statically and dynamically. Static analysis considers the average currents from the PGV library and calculates an average IR-drop. Dynamic rail analysis uses the current waveforms and is able to calculate the peak IR-drop of the transient. The dynamic currents can be derived from the activity vectors produced by digital simulation, or through a vector-less approach that statistically estimates the toggle rates. In this work both static and dynamic vector-less assessments were performed. The overall flow is then composed of the following steps:

- Initial one-time step PGV characterization.
- Power calculation: static power and currents for static-IR and dynamic power and currents for dynamic-IR.
- Power grid extraction.
- Calculation of currents and voltages in the power mesh.

A useful what-if functionality is available [79], in order to test the impact of changing capacitance, resistance and current density values without re-running the full flow. In Appendix B we will give an example of how to use this functionality to quickly assess the effect of different BEOL assumptions.
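To make the idea of a static IR-drop calculation and of a resistance what-if more concrete, the following is a minimal Python sketch on a toy model: a single power rail fed from one end, with equal segment resistances and an average current drawn at each tap. This is only an illustration under those assumptions; it is not the algorithm used by the power integrity tool, and all names and numerical values (`static_ir_drop`, the 0.7 V supply, the 2 Ω segments) are hypothetical.

```python
def static_ir_drop(vdd, seg_resistance, tap_currents):
    """Voltage at each tap of a rail fed from one end (toy model).

    vdd            : nominal supply at the feed point [V]
    seg_resistance : resistance of each rail segment between taps [ohm]
    tap_currents   : average current drawn at each tap [A]
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents)  # current still flowing along the rail
    for i_tap in tap_currents:
        v -= seg_resistance * remaining  # drop across the segment feeding this tap
        voltages.append(v)
        remaining -= i_tap               # this tap's current leaves the rail
    return voltages


def worst_ir_drop(vdd, voltages):
    """IR-drop as the difference between nominal and lowest actual voltage."""
    return vdd - min(voltages)


# Hypothetical example: four cells drawing 1 mA each on a 0.7 V rail.
taps = [1e-3, 1e-3, 1e-3, 1e-3]
nominal_drop = worst_ir_drop(0.7, static_ir_drop(0.7, 2.0, taps))
# What-if: the rail resistance is assumed to be 50% higher.
whatif_drop = worst_ir_drop(0.7, static_ir_drop(0.7, 2.0 * 1.5, taps))
print(f"nominal: {nominal_drop * 1e3:.1f} mV, what-if: {whatif_drop * 1e3:.1f} mV")
```

In this linear model the worst-case drop scales directly with the segment resistance, which is why a what-if on resistance can be evaluated without repeating the power calculation.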

## 2.3 DoE Design Methodology

After having set up the flow as described in Sections 2.1 and 2.2, we can apply it in order to perform different types of DoEs. The following subsections (2.3.2 to 2.3.7) categorize the DoEs performed into six major types, giving a general description of each category.

### 2.3.1 The DTCO space

What we refer to as the DTCO space is the complex and multidimensional set of variables from the technology and design sides that can be changed or swept in order to identify the relative deltas between the new option(s) and a given reference point. This concept is illustrated in Figure 2.9: multiple patterning, BEOL, device and standard cell options can be combined. The resulting combination can be implemented using different physical design strategies and enhanced EDA tool versions, often resulting in relative deltas in the Figures of Merit



Figure 2.9 Multidimensional DTCO space to be explored through DoEs.

(FoM) comparable to the ones induced by the technology changes. In order to master this overwhelming complexity we need to adopt a "divide et impera" approach: in other words, we need to perform separate classes of DoEs where we keep part of the DTCO space constant and sweep only those variables that are relevant to extracting a specific learning from a given experiment.
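The "divide et impera" idea can be sketched in a few lines of Python: every axis of the DTCO space is pinned to a reference value, and only the axes relevant to a given DoE are swept. The axis names and option labels below (`patterning`, `ref_stack`, and so on) are hypothetical placeholders chosen for illustration, not identifiers from the actual flow.

```python
from itertools import product

# Hypothetical reference point of the DTCO space.
reference = {
    "patterning": "193i",
    "beol": "ref_stack",
    "device": "ref_device",
    "cells": "7.5T",
    "tool_version": "ref_version",
}


def build_doe(reference, swept):
    """Yield one configuration per combination of swept values.

    All axes that are not listed in `swept` stay at their reference value.
    """
    names = list(swept)
    for values in product(*(swept[name] for name in names)):
        config = dict(reference)
        config.update(zip(names, values))
        yield config


# A patterning DoE: only the patterning axis is swept.
patterning_doe = list(build_doe(reference, {"patterning": ["193i", "EUV"]}))
for run in patterning_doe:
    print(run)
```

Sweeping two axes at once, for instance `{"patterning": [...], "cells": [...]}`, multiplies the number of runs, which is the practical reason for keeping most of the space constant in each class of DoE.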

### 2.3.2 DoE examples with rules or patterning options

One class of experiments involves the comparison between different patterning options. Alternatively, within the same patterning option, we might want to isolate the design rules that are most critical for routability and perform what-if analyses aimed at quantifying the improvements allowed by relaxing critical rules. This practice can be a useful feed-forward to the technologists. As we want to isolate the impact of patterning and design rule changes, these experiments have to be done with the same macro *.lef* while comparing different *tech.lef* versions, which have to be compatible with the standard cell library. Since these experiments are mainly aimed at extracting a "physical" learning, looking at the flow in Figure 2.1 we can bypass a re-spin of synthesis for the different *tech.lef* options and just perform separate P&R runs for each *tech.lef*. Additionally, we deactivate the timing features in order to improve runtime. Examples of physical metrics that we might want to benchmark are:

- Routability and Chip area.
- Wire-length distribution.
- Via Distribution.
- Congestion.

While many of these metrics can be reported, quantified and benchmarked using standard commands available from the tool, it is hard to find a straightforward way to quantify routability. The methodology that was followed in this work in order to identify the routability limit was to sweep the placement density ( $PD$ ) with a resolution of 2.5% and check the number of DRCs (rule violations) as in Figure 2.10. A 2.5% step was chosen as a reasonable compromise between accuracy and number of runs. The typical behaviour that was observed is that the order of magnitude of the number of violations changes abruptly beyond a certain value of $PD$, facilitating an objective identification of the routability limit. The maximum placement density ( $PD_{max}$ ), combined with the standard cell distribution from P&R, determines the chip area according to the relationship in Equation 2.1:

$$ChipArea = \frac{1}{PD_{max}} \sum_i Count_i \cdot CellArea_i \quad (2.1)$$

The total standard cell area in Equation 2.1 is given by the linear combination of the area of each cell type ( $CellArea_i$ ) and its instance count ( $Count_i$ ) in the actual cell distribution. This methodology allows a fair comparison without targeting zero DRCs, which would require intensive manual or semi-manual fixes for all the runs, resulting in limited gains in terms of learning. Examples for this category of DoEs are the ones in subsections 3.2.2 and 4.3.2.
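The procedure above can be sketched in Python as follows. The detection criterion used here, a jump of at least one order of magnitude in the DRC count between two consecutive density steps, is our assumption for turning the "abrupt change" into a rule; the function names, cell names and all numbers are hypothetical.

```python
def find_pd_max(drc_by_density, jump_factor=10.0):
    """Highest placement density reached before the DRC count jumps.

    drc_by_density : dict mapping placement density (0..1) to DRC count
    jump_factor    : ratio between consecutive DRC counts treated as the
                     routability cliff (assumed: one order of magnitude)
    """
    densities = sorted(drc_by_density)
    pd_max = densities[0]
    for prev, curr in zip(densities, densities[1:]):
        prev_count = max(drc_by_density[prev], 1)  # avoid division by zero
        if drc_by_density[curr] / prev_count >= jump_factor:
            break
        pd_max = curr
    return pd_max


def chip_area(pd_max, cell_counts, cell_areas):
    """Equation 2.1: total standard cell area divided by PD_max."""
    total_cell_area = sum(cell_counts[name] * cell_areas[name] for name in cell_counts)
    return total_cell_area / pd_max


# Hypothetical sweep with a 2.5% step and a hypothetical cell distribution.
sweep = {0.700: 12, 0.725: 15, 0.750: 20, 0.775: 450, 0.800: 3000}
counts = {"INV": 1000, "NAND2": 500}
areas = {"INV": 0.03, "NAND2": 0.04}  # arbitrary area units

pd_max = find_pd_max(sweep)
area = chip_area(pd_max, counts, areas)
print(pd_max, area)
```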

### 2.3.3 DoE example with standard cells

In this type of experiment we primarily want to modify the standard cell architecture, based for example on a reduced track-height and/or the usage of innovative technology solutions acting as "scaling boosters". For this class of experiments we should then test multiple versions of the macro *.lef*. As track-height reduction and other scaling boosters can significantly impact the cell geometry and affect the electrical properties of the standard cells, a new cell architecture typically implies a re-characterization of the libraries, which produces new *.lib* files to be associated with each *.lef*. In order to decouple the two evaluations, a first physical comparison can be done using the different *.lef* versions, with the same methodology explained in subsection 2.3.2. The second step of the DoE is to perform a frequency sweep aimed at benchmarking the electrical metrics. This can be achieved as shown in Figure 2.11, by changing the target frequency in the *.sdc* file and re-spinning the whole synthesis and P&R loop for each library and for each frequency step ( $F_i + \Delta F$ ). Since a more challenging frequency target causes an increase in buffer and inverter insertion to achieve timing closure, the initial target density is progressively decreased by a factor  $\Delta PD$



Figure 2.10 Typical behaviour of Design Rule Violations (DRCs) as a function of target placement density.

at each iteration, allowing more area to be allocated for buffering. The experiment is terminated when timing is no longer met for any of the libraries. A list of interesting electrical metrics is provided below:

- **Power:**
  - **Total power:** sum of leakage, dynamic and internal power.
  - **Leakage power:** power consumed in off state.
  - **Dynamic power:** power consumed charging and discharging the output loads.
  - **Internal power:** power consumed inside the cells during the transitions.
- **Timing:**
  - **Worst Negative Slack (WNS):** difference between the period and the delay of the most critical path.
  - **Total Negative Slack (TNS):** sum of all negative slacks in the design for a given *.sdc*.
  - **Failing Paths:** number of paths that don't meet the timing constraints.
- **RC:**



Figure 2.11 Frequency sweep flowchart for different standard cells libraries.

  - **Pin Capacitance:** capacitance associated with the pins.
  - **Wire Capacitance:** capacitance deriving from the wires.
  - **Wire Resistance:** resistance of the wires.
- **Power Integrity:**
  - **IR-drop:** difference between Nominal and actual voltage.
  - **Power Density:** total power divided by core area.
  - **Instance Voltage:** actual voltage on VDD and VSS pins of the instances.
  - **Current Density:** current divided by the cross-section area of the wires.
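As an illustration of the timing definitions in the list above, the following Python sketch derives WNS, TNS and the number of failing paths from a set of path delays. It assumes the simplified definition of slack given in the list (clock period minus path delay), ignoring setup time, clock skew and other terms that a real STA engine accounts for; the function name and the delay values are hypothetical.

```python
def timing_metrics(clock_period, path_delays):
    """Compute WNS, TNS and failing paths under a simplified slack model.

    slack = clock_period - path_delay (no setup time, skew or uncertainty).
    """
    slacks = [clock_period - delay for delay in path_delays]
    negative = [s for s in slacks if s < 0]
    return {
        "wns": min(slacks),            # slack of the most critical path
        "tns": sum(negative),          # sum of all negative slacks
        "failing_paths": len(negative),
    }


# Hypothetical example: 1.0 ns clock period, four timing paths (delays in ns).
metrics = timing_metrics(1.0, [0.80, 1.05, 1.20, 0.95])
print(metrics)
```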

In order to enable the benchmarking in a reasonable turnaround time, the algorithm in Figure 2.11 does not target zero WNS and TNS. Otherwise, even a negligible number of timing violations would have to be manually fixed for every run, which would require great effort and add limited value to the benchmarking. Instead, some conventional limits can be defined to determine whether the sweep should continue or not. A possible criterion could be WNS reaching a certain percentage of the clock period (e.g. 10%) and/or the number of failing paths going beyond a certain percentage of the total timing paths (e.g. 5%). The experiments in subsections 4.1.2, 4.3.2, 5.2.4 and 5.3.2 provide examples of the physical benchmarking with different cell architectures. Examples of electrical benchmarking through frequency sweep can be found in subsections 4.1.3, 4.3.3 and 5.2.5. This typology of experiments can overall be considered a step-up extension of the power and performance experiments done at Ring Oscillator (RO) level, described in detail in [19].
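A minimal Python sketch of the sweep loop and of its stop criterion is given below. It is an interpretation of the flowchart in Figure 2.11 under several assumptions: `run_flow` is a hypothetical callback standing for a full synthesis and P&R run, a library is dropped from the sweep as soon as it violates either limit, and the 10% and 5% thresholds are the example values quoted in the text.

```python
def should_continue(wns, clock_period, failing_paths, total_paths,
                    wns_limit=0.10, failing_limit=0.05):
    """True while both conventional limits are respected."""
    wns_ok = -wns <= wns_limit * clock_period
    failing_ok = failing_paths <= failing_limit * total_paths
    return wns_ok and failing_ok


def frequency_sweep(run_flow, libraries, f_start, delta_f, pd_start, delta_pd):
    """Sweep frequency upwards and density downwards for each library.

    run_flow(lib, freq, pd) is a hypothetical callback returning a dict with
    the keys "wns", "failing_paths" and "total_paths".
    """
    results = []
    freq, pd = f_start, pd_start
    active = list(libraries)
    while active and pd > 0:
        period = 1.0 / freq
        still_meeting = []
        for lib in active:
            report = run_flow(lib, freq, pd)
            results.append((lib, freq, pd, report))
            if should_continue(report["wns"], period,
                               report["failing_paths"], report["total_paths"]):
                still_meeting.append(lib)
        active = still_meeting   # stop when no library meets timing
        freq += delta_f          # F_i + delta F
        pd -= delta_pd           # leave more area for buffering
    return results


# Stand-in for the real flow: each library has a made-up frequency limit.
def fake_flow(lib, freq, pd):
    limit = {"lib_A": 2.0, "lib_B": 3.0}[lib]
    meets = freq <= limit
    return {
        "wns": 0.0 if meets else -0.5 / freq,
        "failing_paths": 0 if meets else 200,
        "total_paths": 1000,
    }


sweep_results = frequency_sweep(fake_flow, ["lib_A", "lib_B"], 1.0, 1.0, 0.80, 0.05)
for lib, freq, pd, report in sweep_results:
    print(lib, freq, round(pd, 3), report["wns"])
```

Treating the two limits as a joint condition is one possible reading of the "and/or" in the criterion; using only one of them would simply require changing the last line of `should_continue`.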

### 2.3.4 DoE example with device options

In some experiments we might want to keep the ground rules unchanged and test different device architectures and device options while keeping the standard cell footprint unchanged. In this scenario a re-characterization is needed for each device option, resulting in different versions of the *.lib* file and a single macro *.lef*. For this DoE we explored three different methods of benchmarking. These methods are complementary, and performing them in sequence maximizes the learning from the experiment. The first step, before even proceeding to a P&R evaluation, is to use a utility named Libscore [80]. As shown in [81], this utility can be used to produce a library-level comparison of metrics like transition times, delay, internal and leakage power, and drive strength. The next step is swapping the different *.lib* files in the same post-route DB, optimized for the reference scenario, and re-running STA for each testcase. This method is useful to isolate the effect of the device in the context of an IP block, but it neglects the impact of a different optimization through the implementation flow. The third way, which is the most complete and runtime intensive, is the re-spin of the synthesis and P&R runs as in Figure 2.11 for each *.lib*. Examples of this DoE type can be found in subsections 4.4, 4.3.3 and 5.2.5.

### 2.3.5 DoE example with BEOL options

In other experiments we might want to test different options in the BEOL. The first scenario is a change in the RC and dielectric properties of the metal stack. These changes could affect all the layers, or just a part of the stack ( $M_X$ ,  $M_Y$  or  $M_Z$  layers), while keeping the number of layers and pitches constant. In order to perform this DoE we need to code a new process description file (e.g. *.ict*) for each R, C and *k* configuration, and re-run the process characterization to produce the corresponding RC techfiles (e.g. *qrcTechFile*). In case we want to isolate the impact deriving exclusively from the technology, we can swap the different RC techfiles in the same post-route DB, and re-run extraction, STA and power calculation. This approach isolates the impact of the technology, neglecting the deltas deriving from a different optimization. In order to take those into account we can re-run the synthesis and P&R loop as in Figure 2.11. Another experiment related to the BEOL is adding/removing layers in the metal stack, mainly to test routability versus cost for the different options. This type of experiment could be required if the congestion analysis detects an overflow of routing resources on specific layers. In this case the *tech.lef* file should be changed, and routability tests as in subsection 2.3.2 should be performed. The experiments in subsections 3.3.1 and 4.2.2 belong to this category of DoE.

### 2.3.6 DoE example with physical design options

With respect to the DoE types presented in the previous subsections, it is important to perform them with the same setup and without radically changing the physical design options. For example we should use the same floorplan, Power Delivery Network (PDN), placement directives, effort level of the optimization engines, etc. This avoids skewing the results of the benchmarking with setups that are too different. If instead we specifically want to test the impact of a modification in the physical design options, a separate DoE can be performed, keeping the PDK components unchanged. Experiments of this kind can be found in subsections 4.1.2 and 4.5.1.

### 2.3.7 DoE example with different EDA tool versions

In the context of a path-finding phase towards a new technology, it is possible that a new feature, demanded by technology or design requirements, is not supported by the existing versions of the EDA tools. In such a scenario enhancements and/or debugging are needed, and it is therefore necessary to communicate with the EDA company, establishing a timeline for the expected integration of the code changes into a "beta" or production release of the tool. In this situation the DoE simply consists of running exactly the same experiment with two different tool versions. Through this type of run, we can either detect a regression of QoR, prove improved results or confirm enhanced capabilities. We witnessed that changing the EDA tool version can have a significant impact on the PPA metrics, or can be the key enabler of a certain DTCO configuration. It is therefore important to specify the tool version, and keep it unchanged for all the other types of DoEs. During this work the version of Cadence® Innovus™ Implementation System was progressively upgraded from version 15.2 to version 18.2, and approximately 50 change requests were filed to Cadence R&D, contributing to the improvement of the tool with respect to the new challenges encountered.

## 2.4 Summary and conclusions

In this chapter we proposed a novel DTCO approach based on post-P&R Design of Experiments, enabling design and technology path-finding. The two pillars of this process are the generation of a predictive PDK and the PPA benchmarking through a state-of-the-art digital implementation flow. It was shown how to build up these two parts, illustrating the basic components of a predictive PDK and the main implementation steps. More importantly, it was explained how to practically use this platform in order to perform different classes of DoEs, aimed at extracting specific learnings and driving the decision making across the complex DTCO space. In this space, the patterning, standard cell, device and BEOL options are deeply interrelated, and are also coupled with physical design choices and EDA advancements. The methodology shown in this chapter tries to master this complexity by decoupling all these aspects into separate DoE types, which were used to generate the results in the subsequent chapters of the thesis.



# Chapter 3

## EUV and the enablement of N7+ and below

In Section 3.1, we provide an introduction to EUV lithography and illustrate its current status. In Section 3.2 we analyse the results of a Post P&R comparison between the reference N7 and an EUV-enabled N7 node with the same pitches, defined as N7+. Finally, we will show in Section 3.3 a BEOL variability study focused on Line-Edge Roughness, considered to be one of the key challenges for EUV adoption.

## 3.1 Introduction to EUV lithography

In this section we will initially cover the basics of optical lithography, showing the fundamental motivations driving the transition to EUV. Then we will describe in more detail the peculiarities of EUV lithography, also reviewing the major challenges that its adoption is posing. Based on the current status of scaling we will then draft a roadmap for the introduction of EUV in the next nodes.

### 3.1.1 Motivations

The basic elements of an optical lithography system are shown in Figure 3.1: the light source which emits light with a certain wavelength, a first lens that collects the light towards the reticle, and a second lens that captures a certain number of diffracted orders, and projects the light towards the resist, which is coated on top of the wafer.

The purpose of photo-lithography advancement is of course to make Moore's law sustainable, by allowing a progressive reduction of the minimum feature size, that in lithography-related literature is often reported as Half Pitch (HP) [13] (i.e. the half of the pitch normally



Figure 3.1 Basic elements of an optical lithography system.

considered in this work). The set of Equations 3.1-3.5 is considered in order to provide a qualitative explanation of the main knobs enabling the reduction of the minimum feature size. Let us now evaluate the interdependencies between all these variables.

$$HP = K_1 \cdot \frac{\lambda}{NA} \quad (3.1)$$

$$DoF = K_2 \cdot \frac{\lambda}{NA^2} \quad (3.2)$$

$$NA = n \cdot \sin \theta \quad (3.3)$$

$$Power = h \cdot \frac{c}{\lambda} \cdot \frac{1}{\Delta t} \quad (3.4)$$

$$Dose = \frac{Power}{Area} \cdot \Delta t \quad (3.5)$$

One straightforward way of reducing the HP is by decreasing the Rayleigh factor  $K_1$  in Equation 3.1, which encapsulates the dependence on process-related factors, for example resist quality and Optical Proximity Correction (OPC). As we see from the plot in Figure 3.2,



Figure 3.2 Evolution of Rayleigh factor across technology nodes. Source: [13].

this factor has been decreasing remarkably over the last technology nodes, helping to improve resolution. Unfortunately, multiple patterning was needed in order to further decrease  $K_1$  below 0.3.

Another knob to reduce the feature size would be increasing the Numerical Aperture (NA). The Numerical Aperture quantifies the amount of light that can be captured by the lens and is given by the refractive index ( $n$ ) of the medium, multiplied by the sine of the half-angle ( $\theta$ ) of the cone of light that can enter the lens (Equation 3.3). We can then increase this number either by enlarging the lenses, which beyond a certain size becomes practically infeasible, or by inserting a more refractive medium between the lens and the resist. The second option is actually what was done by ASML in the transition to 193i lithography, currently the highest resolution lithography available for HVM. “193” indicates the wavelength of the light (193nm) and “i” stands for immersion, which means that a thin layer of water is created and maintained between the lens and the resist, contributing to increased resolution [82]. Unfortunately, as shown in Equation 3.2, increasing NA also has a negative impact on the Depth of Focus ( $DoF$ ), which is an index of how much the exposure system can tolerate offsets in the direction orthogonal to the surface of the resist. A reduced  $DoF$  restricts the margins for thickness variations of the resist, which translates into more aggressive Chemical Mechanical Polishing (CMP) specifications [83]. In summary, NA cannot be improved indefinitely either, and in any case the trade-off with  $DoF$  needs to be considered. The third and most effective way to reduce the half-pitch is by decreasing the wavelength  $\lambda$ . In the past, the wavelength of the source has moved from 365nm to 248nm, and then to 193nm (Figure 1.11). Starting from 2010, industry worked around the non-readiness of EUV using 193i in conjunction with multiple patterning. From Figure 1.11 it is clear that the resulting scenario



Figure 3.3 Position of EUV wavelengths in the electromagnetic spectrum.

is a “deep sub-wavelength” regime with an increasing gap between the minimum feature size and the wavelength. In EUV this gap is bridged by reducing the wavelength by more than one order of magnitude: from 193nm to 13nm. Equation 3.2 shows that reducing  $\lambda$  also adversely affects *DoF*, but unlike for *NA* the dependency is linear rather than quadratic. Additionally, reducing the wavelength has the effect of increasing the energy (and power) of the electromagnetic radiation, as described by Planck’s equation (Equation 3.4). As in Equation 3.5, an increased power makes it possible to deliver the same energy per area required to develop the resist, which is called "Dose", with a lower exposure time, or in other words to increase the throughput of the manufacturing.
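The interdependencies expressed by Equations 3.1-3.5 can be explored numerically with the short Python sketch below. The inputs are illustrative assumptions rather than values taken from this work: the same $K_1 = 0.3$ is used for both systems, NA = 1.35 is assumed for 193i immersion, NA = 0.33 is taken from Table 3.3, and the EUV wavelength is entered as 13.5nm (rounded to 13nm in the text).

```python
import math

H = 6.62607015e-34          # Planck constant [J*s]
C = 299792458.0             # speed of light [m/s]
EV = 1.602176634e-19        # one electronvolt [J]


def half_pitch(k1, wavelength_nm, na):
    """Equation 3.1: HP = K1 * lambda / NA, in nm."""
    return k1 * wavelength_nm / na


def depth_of_focus(k2, wavelength_nm, na):
    """Equation 3.2: DoF = K2 * lambda / NA^2, in nm."""
    return k2 * wavelength_nm / na ** 2


def numerical_aperture(n, theta_deg):
    """Equation 3.3: NA = n * sin(theta)."""
    return n * math.sin(math.radians(theta_deg))


def photon_energy_ev(wavelength_nm):
    """Energy h*c/lambda of a single photon (the term in Equation 3.4), in eV."""
    return H * C / (wavelength_nm * 1e-9) / EV


def exposure_time(dose, area, power):
    """Equation 3.5 solved for the exposure time: dt = Dose * Area / Power."""
    return dose * area / power


for label, wavelength, na in [("193i", 193.0, 1.35), ("EUV", 13.5, 0.33)]:
    print(label,
          f"HP = {half_pitch(0.3, wavelength, na):.1f} nm,",
          f"photon energy = {photon_energy_ev(wavelength):.1f} eV")
```

With $K_1$ held constant, the reduction in half pitch is smaller than the reduction in wavelength, because the EUV system operates at a lower NA than the immersion system.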

### 3.1.2 Overview of an EUV system

Figure 3.3 highlights the position of EUV within the electromagnetic spectrum. It is clear that the range of wavelengths conventionally referred to as EUV actually belongs to the range of soft X-rays, making "Extreme Ultra Violet" a relatively improper name for this radiation. Utilizing such a high energy radiation causes several problems, as it is absorbed by nearly all materials, including air.

For this reason the medium of the EUV lithography system has to be vacuum [84]. On top of this, all its elements need to be reflective instead of refractive (as in the 193i system), which means that mirrors are needed rather than lenses, minimizing the energy loss across the projection system. For this purpose Bragg mirrors are typically used. A Bragg mirror (also called distributed Bragg reflector) is a mirror structure which consists of an alternating sequence of layers of two different optical materials. The most frequently used configuration is a quarter-wave mirror, where the thickness of each layer corresponds to one quarter of the wavelength for which the mirror is designed [85]. For EUV mirrors, multilayer Molybdenum/Silicon (Mo/Si) stacks are normally used. In that case, it is possible to demonstrate that the maximum theoretical value of reflectivity is close to 75% [86]. The remaining 25% of the energy is dissipated as heat, with consequent challenges to the



Figure 3.4 A schematic of the main components of an EUV lithography system. Source: [14].

reliability of the system, and a reduction of the energy that actually arrives at the resist. A simplified schematic for an EUV system is shown in Figure 3.4. As we mentioned, the system operates in vacuum, and the radiation from the EUV source is at first collected and focused onto the mask, which is also reflective. Then the EUV beam goes through a set of 6 mirrors before reaching the resist. Such a sophisticated system results in a cost per machine in the range of 100M EUR, as can be calculated from [87].
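The cumulative effect of the mirror losses can be estimated with a one-line calculation, sketched here in Python. It assumes that every reflection operates at the theoretical maximum reflectivity of 75% quoted above, and it counts only the 6 mirrors between the mask and the resist; the collector and the reflective mask would add further losses, so the result is an optimistic upper bound rather than a figure for a real machine.

```python
def transmitted_fraction(reflectivity, n_reflections):
    """Fraction of the energy left after a chain of identical reflections."""
    return reflectivity ** n_reflections


# Six projection mirrors at the theoretical 75% reflectivity (assumption).
after_projection = transmitted_fraction(0.75, 6)
# Same chain with one extra reflection on the mask.
after_mask_and_projection = transmitted_fraction(0.75, 7)
print(f"{after_projection:.3f} {after_mask_and_projection:.3f}")
```

Even under these optimistic assumptions less than one fifth of the energy reaches the resist, which helps to explain why source power is so tightly linked to throughput.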

### 3.1.3 EUV status and challenges

The yearly International Conference on Extreme Ultraviolet Lithography (which imec helps to organize) provides a forum to discuss and assess the worldwide status of EUV technology and infrastructure readiness. In this context the main challenges are discussed and prioritized. Table 3.1 shows the top four priorities, in decreasing order of importance, identified over the last years. We highlight that from 2016 to 2017 the problem related to the source power was downgraded from priority number one to number two, and the highest priority issue became resist resolution, sensitivity and Line Edge Roughness (LER). We will examine both these focus areas in detail, while the problems categorized as the third and fourth priority will not be discussed, as they are mainly related to mask manufacturing and maintenance, which fall outside the scope of this work.

| Priority | 2014                                             | 2015                                             | 2016                                             | 2017                                             |
|----------|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------|
| #1       | Reliable source operation with >75% availability | Reliable source operation with >75% availability | Reliable source operation with >75% availability | Resist resolution sensitivity & LER              |
| #2       | Resist resolution sensitivity & LER              | Resist resolution sensitivity & LER              | Resist resolution sensitivity & LER              | Reliable source operation with >75% availability |
| #3       | Mask yield & defect inspection infrastructure    | Mask yield & defect inspection infrastructure    | Keeping mask defect free                         | Keeping mask defect free                         |
| #4       | Keeping mask defect free                         | Keeping mask defect free                         | Mask yield & defect inspection infrastructure    | Mask yield & defect inspection infrastructure    |

Table 3.1 Top priority problems identified by the EUV symposium every year.

**The reliable source operation** is tightly related to source power and throughput. In 2017 an important milestone was reached by ASML, which showed more than 100 Wafers Per Hour (WPH) with a 250W power source [15]. This threshold had always been considered a crucial enabler for HVM. The plot in Figure 3.5 documents the massive throughput improvement over the last years, explaining why the priority of this issue was recently downgraded, as in Table 3.1.

**The tradeoff between Resolution, LER and Sensitivity**, also known in literature as the RLS tradeoff, has been indicated as the top priority problem. With Resolution, we refer to the pitch (or the half pitch) as indicated by Equation 3.1. Sensitivity means in this context the Dose required for the activation of the resist (Equation 3.5). Line edge roughness (LER) is the deviation of a feature edge from a smooth, ideal shape. The tradeoff originates from the fact that finding a resist that simultaneously optimizes all three variables is a very challenging chemical engineering problem [88]. Optimizing these variables means reducing the half pitch, the activation Dose and the LER, which is graphically equivalent to minimizing the area of the triangle in Figure 3.7. From this qualitative plot, we can see that for the same target resolution, some resists (Resist C in the plot) will allow a low activation



Figure 3.5 Throughput improvement of EUV. Source [15].

Dose but will be affected by high LER. Other resists (Resist B in the plot) will exhibit low LER values but will require a much higher activation Dose. Since a low activation Dose is very attractive for economic reasons, as it is tightly related to throughput, studying LER in worst-case conditions is advisable. Figure 3.6 shows the non-ideal profile of the side wall of a metal, induced by LER. The non-ideality derives from several factors, such as the non-ideal intensity of the light illuminating the resist across the side wall, the chemical non-uniformities present in the resist, and other stochastic effects. In Figure 3.6,  $x$  is the ideal coordinate of the side wall, while  $x_i$  is the actual coordinate at each point, which can be measured with a Scanning Electron Microscope (SEM). If we consider the distribution of all the $N$ points, we can quantify the non-ideality with the average and the standard deviation as in Equations 3.6 and 3.7 respectively. The profile of the side walls translates into stochastic variations of the metal width, which cause an increased variability of parasitic resistance and capacitance. As the feature size shrinks, the width variation caused by LER becomes an increasing fraction of the nominal width of the metal. Given the growing importance of this problem, we dedicated Section 3.3 of this chapter to studying its system-level impact.

$$\eta_{LER} = \frac{\sum_{i=1}^N x_i}{N} \quad (3.6)$$

$$\sigma_{LER} = \sqrt{\frac{1}{N} \cdot \sum_{i=1}^N (x_i - \eta_{LER})^2} \quad (3.7)$$
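Equations 3.6 and 3.7 translate directly into code. The Python sketch below computes $\eta_{LER}$ and $\sigma_{LER}$ from a list of measured edge coordinates, using the $1/N$ (population) form of the standard deviation as written in Equation 3.7; the function name and the sample coordinates are made up for illustration and do not come from SEM data.

```python
import math


def ler_statistics(edge_coordinates):
    """Return (eta_LER, sigma_LER) as defined in Equations 3.6 and 3.7."""
    n = len(edge_coordinates)
    eta = sum(edge_coordinates) / n
    sigma = math.sqrt(sum((x_i - eta) ** 2 for x_i in edge_coordinates) / n)
    return eta, sigma


# Hypothetical edge positions along a side wall, in nm.
samples = [10.0, 10.5, 9.5, 10.2, 9.8]
eta_ler, sigma_ler = ler_statistics(samples)
print(f"eta = {eta_ler:.3f} nm, sigma = {sigma_ler:.3f} nm")
```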



Figure 3.6 Non ideal profile of the metal determined by LER.



Figure 3.7 Graphical representation of RLS tradeoff. Source: [16]



Figure 3.8 Progression of standard cells design style towards unidimensional patterns.

### 3.1.4 EUV and the roadmap

As previously mentioned, EUV would already have been helpful starting from the 22nm node, in order to avoid the introduction of multiple patterning. As a confirmation of this, we can examine the progression of the standard cell design style from the 28nm to the 7nm node, reported in Figure 3.8. The major trend that we can highlight is the tendency towards a more litho-friendly design, through the progressive replacement of bi-dimensional (2D) shapes with a uni-dimensional (1D) style. For nodes above 28nm, 2D shapes were used for both poly and lowest metal layers. In the transition to the 28nm node the poly becomes 1D. In the 14nm node the usage of 2D shapes in the metal and the contacts is limited through the introduction of a new local interconnect layer. For 10nm and below we observe a transition to a fully 1D style, and the local interconnect also becomes a 1D horizontal layer (MINT). Although from Figure 1.10 we can see that EUV 2D would in principle be applicable for N7 dimensions, we avoided the usage of 2D shapes in our N7+ experiments, also to make the standard cells compatible with both EUV and 193i, enabling a fair comparison.

The target that was set in Chapter 1, to make the transition from N7 to N7+ with the same pitches, is intended to enable a fast transition and ramp-up to a more cost-effective and PPA-efficient node. The roadmap indicating the expected usage of EUV for FEOL, MoL and BEOL layers in below-N7 nodes is given in Table 3.2. For N7+ dimensions all metal and via layers in the MoL and BEOL are targeted with single exposure EUV. For the Fin and Gate layers, given their regular structure, spacer-assisted patterning in 193i will continue to be used. Similar considerations apply to the N5 dimensions, where the pitch of the tightest layer (32nm) is still compatible with single LE EUV resolution. It is worth remarking that through the replacement of spacer-assisted patterning with EUV, the metal cut layers are no longer needed. For our target N3 dimensions (21nm Mx), given the current resolution limits of EUV, it is necessary to assume SAQP for MINT and Mx layers. Moreover, double patterning EUV is considered to be required for vias and metal cuts. It is important to observe that from the first "alpha" EUV systems introduced in 2006 to the current state-of-the-art machines,

| Layer                | N7      | N7+     | N5      | N3       |
|----------------------|---------|---------|---------|----------|
| Expected HVM ramp up | 2017/18 | 2018/19 | 2020/21 | ~2023    |
| **Fin**              | SAQP    | SAQP    | SAQP    | SADP EUV |
| **Fin Cut**          | LE      | LE2     | LE2     | EUV LE2  |
| **Gate**             | SADP    | SADP    | SADP    | SADP     |
| **Gate Cut**         | LE2     | LE2     | EUV     | EUV      |
| **M0A**              | LE3     | EUV     | EUV     | EUV      |
| **MINT**             | SADP    | EUV     | EUV     | SAQP     |
| **MINT Cut**         | LE2     | -       | -       | EUV LE2  |
| **VINT**             | LE3     | EUV     | EUV     | EUV LE2  |
| **Mx**               | SADP    | EUV     | EUV     | SAQP     |
| **Mx Cut**           | LE2     | -       | -       | EUV LE2  |
| **Vx**               | LE3     | EUV     | EUV     | EUV LE2  |

Table 3.2 Expected insertion of EUV into nodes below N7. 193i options are indicated in blue, EUV options in yellow.

| EUV Machine                 | Alpha/Demo Tools | NXE 3100 | NXE 3300B | NXE 3350B | NXE 3400B |
|-----------------------------|------------------|----------|-----------|-----------|-----------|
| **Year of Introduction**    | 2006             | 2010     | 2013      | 2015      | 2017      |
| **Resolution [nm]**         | 40               | 27       | 22        | 16        | 13        |
| **Overlay [nm]**            | n/a              | 7        | 5         | 2.5       | 2         |
| **Throughput [WPH]**        | n/a              | <60      | 70        | 125       | 125       |
| **NA**                      | 0.25             | 0.25     | 0.33      | 0.33      | 0.33      |

Table 3.3 Improvements in EUV machines for different generations of NXE machines from ASML.

there has been a dramatic improvement in all the main specifications. A summary of this evolution is provided in Table 3.3 [89]. Therefore, especially for the N3 node, the patterning specifications should be considered a moving target that is correlated with the improvements in the specifications of these machines. Industry is already looking into new-generation EUV machines with NA in the range of 0.5 [90], which could make it possible to avoid SAQP and EUV LE2 for N3 dimensions and would require a re-spin of all the DTCO work reported in Chapter 5.
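As a rough illustration of why a higher NA matters for the N3 pitches, the sketch below scales the NXE 3400B resolution of Table 3.3 to an NA of 0.5. It assumes that resolution follows the Rayleigh scaling $k_1 \lambda / NA$ with $k_1$ and wavelength unchanged, and that the tabulated resolution can be read as a half-pitch. Both are assumptions made here for illustration, and the function names are hypothetical.

```python
def scaled_resolution_nm(res_nm, na_old, na_new):
    """Resolution at na_new, assuming resolution is proportional to 1/NA."""
    return res_nm * na_old / na_new


def min_single_exposure_pitch_nm(res_nm):
    """Minimum pitch if the resolution is read as a half-pitch (assumption)."""
    return 2.0 * res_nm


res_033 = 13.0  # NXE 3400B resolution in nm, from Table 3.3
res_05 = scaled_resolution_nm(res_033, 0.33, 0.5)

for label, res in (("NA 0.33", res_033), ("NA 0.5", res_05)):
    pitch = min_single_exposure_pitch_nm(res)
    # 21 nm is the target N3 Mx pitch quoted in the text.
    print(f"{label}: resolution {res:.1f} nm, min pitch {pitch:.1f} nm, "
          f"21 nm Mx in one exposure: {pitch <= 21.0}")
```

Under these assumptions the 0.33 NA limit of 26nm sits between the N5 (32nm) and N3 (21nm) pitches, in line with the single-exposure and multi-patterning choices of Table 3.2.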

## 3.2 N7 vs N7+ PPAC comparison

In this section we illustrate the deltas in the patterning options between N7 and N7+ and go through our benchmarking results on Power, Performance, Area and Cost (PPAC), demonstrating and quantifying the value proposition of the N7+ node. These experiments were based on the ARM 64-bit CPU and a 7.5-Tracks library that is compatible with both the 193i and the EUV rulesets.

### 3.2.1 193i vs EUV patterning

Table 3.4 highlights the deltas in the main rules of the $M_X$ layers between the 193i-based N7 node and the EUV-based N7+. The $M_Y$ and $M_Z$ layers are kept the same. In N7, the metal layers use SADP with double patterning (DPT) metal blocks (cuts), while the vias use triple patterning (TPT). In multiple patterning layers the different masks are normally visualized as different colors, so the terms "masks" and "colors" can be used interchangeably in this context. In the EUV ruleset (N7+) both the lines and the vias are printed with a single color, and the dummy extensions of the metal and the metal cuts are eliminated, as the wire intent is directly printed. In the N7 ruleset the vias have different Center-to-Center (C2C) spacings for same color (Vx.C.1) and different color (Vx.C.2), of 100nm and 42nm respectively. The EUV ruleset collapses the three masks into a single mask with a C2C spacing equal to the different-color spacing in 193i. The enclosure of the via by the metal above (Vx.E.1) is slightly increased in the EUV ruleset, from 8nm to 11nm. In N7+ the Tip-to-Tip (T2T) minimum spacing is set to 25nm, while in N7 it is defined by the metal cut width (Bx.W), which is also 25nm. Finally, the complex spacing rules for the metal cuts (Bx.S.x and Bx.L.x) are not applicable to EUV.
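The via spacing rules can be made concrete with a small check. The sketch below encodes only the C2C limits quoted above from Table 3.4. The function names are hypothetical, and the mask count assumes an idealized row of equally spaced vias with cyclic color assignment, which is a simplification of a real layout.

```python
import math

# C2C spacing limits in nm, from Table 3.4 (Vx.C.1 and Vx.C.2).
N7_SAME_COLOR_NM = 100.0
N7_OTHER_COLOR_NM = 42.0
N7PLUS_NM = 42.0  # single EUV mask


def n7_via_pair_ok(c2c_nm, same_color):
    """Check one via pair against the N7 (193i) C2C rules."""
    limit = N7_SAME_COLOR_NM if same_color else N7_OTHER_COLOR_NM
    return c2c_nm >= limit


def n7plus_via_pair_ok(c2c_nm):
    """Check one via pair against the N7+ (EUV) C2C rule."""
    return c2c_nm >= N7PLUS_NM


def masks_for_via_row(c2c_nm):
    """Masks needed in N7 for a row of equally spaced vias (idealized)."""
    if c2c_nm < N7_OTHER_COLOR_NM:
        raise ValueError("spacing violates the different-color rule")
    # Vias closer than the same-color limit must all be on different masks.
    return math.ceil(N7_SAME_COLOR_NM / c2c_nm)


print(masks_for_via_row(42.0), n7plus_via_pair_ok(42.0))
```

In this idealized case a row of vias at the minimum 42nm spacing needs three masks in N7, consistent with the triple patterning assumed for the vias, while the same row is legal on a single mask in N7+.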

### 3.2.2 Physical comparison

The first part of the comparison is the physical analysis, which assesses routability as described in subsection 2.3.2. Figure 3.9 shows that, for placement densities $\geq 77.5\%$, EUV exhibits better routability, with only a few tens of DRCs up to 80% density. For the N7 run, on the other hand, the DRC count is between 100 and 300. Both the N7 and N7+ runs are clearly unroutable beyond 80% density. The increased DRC count in N7 for placement densities close to the routability limit was further investigated. The analysis of the DRC count by type revealed that almost all violations are related to the metal cut spacing rules (Bx.S.x in Table 3.4), which explains the improvement achieved with EUV. We also found that the patterning choice has a very significant impact on wirelength. Figure 3.10

| 193i: N7      |                                                              |                                   |          | EUV: N7+      |                                        |                                   |        |
|---------------|--------------------------------------------------------------|-----------------------------------|----------|---------------|----------------------------------------|-----------------------------------|--------|
| Rule          | Description                                                  | Value [um]                        | Figure   | Rule          | Description                            | Value [um]                        | Figure |
| <b>Mx.W</b>   | Minimum Width                                                | $\geq 0.016$                      |          | <b>Mx.W</b>   | Minimum Width                          | $\geq 0.020$                      |        |
| <b>Mx.S.1</b> | Minimum Spacing<br>Same color                                | $\geq 0.060$                      |          | <b>Mx.S.1</b> | Minimum Spacing<br>Same color          | $\geq 0.020$                      |        |
| <b>Mx.S.2</b> | Minimum Spacing<br>Other color                               | $\geq 0.020$                      |          | <b>N/A</b>    |                                        |                                   |        |
| <b>Mx.A.1</b> | Minimum Area<br>Copper                                       | $\geq 0.00160$<br>um <sup>2</sup> |          | <b>Mx.A.1</b> | Minimum Area<br>Copper                 | $\geq 0.00160$<br>um <sup>2</sup> |        |
| <b>Vx.W</b>   | Via Width                                                    | $= 0.020$                         |          | <b>Vx.W</b>   | Via Width                              | $= 0.020$                         |        |
| <b>Vx.H</b>   | Via Height                                                   | $= 0.020$                         |          | <b>Vx.H</b>   | Via Height                             | $= 0.020$                         |        |
| <b>Vx.C.1</b> | Center-to-Center<br>spacing same color                       | $\geq 0.1$                        |          | <b>Vx.C.1</b> | Center-to-Center<br>spacing same color | $\geq 0.042$                      |        |
| <b>Vx.C.2</b> | Center-to-Center<br>spacing other color                      | $\geq 0.042$                      |          | <b>N/A</b>    |                                        |                                   |        |
| <b>Vx.E.1</b> | Via enclosure metal<br>above                                 | $\geq 0.008$                      |          | <b>Vx.E.1</b> | Via enclosure metal<br>above           | $\geq 0.011$                      |        |
| <b>Vx.E.2</b> | Via enclosure metal<br>below                                 | $\geq 0$                          |          | <b>Vx.E.2</b> | Via enclosure metal<br>below           | $\geq 0$                          |        |
| <b>Bx.W</b>   | Block Width                                                  | $= 0.025$                         |          | <b>T2T</b>    | Tip to Tip                             | $\geq 0.025$                      |        |
| <b>Bx.L.1</b> | Block Length                                                 | $=$<br>Metal<br>Pitch             |          | <b>N/A</b>    |                                        |                                   |        |
| <b>Bx.L.2</b> | Maximum Merged<br>Block Length                               | $\leq 1$                          |          |               |                                        |                                   |        |
| <b>Bx.S.1</b> | Same color<br>spacing<br>on same track                       | $\geq 0.076$                      |          |               |                                        |                                   |        |
| <b>Bx.S.2</b> | Same color diagonal<br>spacing on other<br>track             | $\geq 0.050$                      |          |               |                                        |                                   |        |
| <b>Bx.S.3</b> | Same color spacing<br>on other track with<br>exact alignment | $\geq$<br>Metal<br>Pitch          |          |               |                                        |                                   |        |
| <b>Bx.C</b>   | Block Color                                                  | $=$<br>Metal<br>Color             |          |               |                                        |                                   |        |

Table 3.4 Comparison of N7 vs. N7+ design rules for metals, vias and metal cuts.



Figure 3.9 DRC count versus placement density for N7 and N7+.

Figure 3.10 Wirelength breakdown on the  $M_x$  layers for N7 and N7+ at 77.5% density

reports the comparison of the wirelength breakdown of the $M_x$ layers. In EUV there is only one type of wire on each layer, the wire intent (M1, M2 and M3). Although the wirelength of the wire intent is very similar in N7 and N7+, in N7 there are two further categories of wires: the dummy extensions of the metals ($M_x$ fill) and the floating fills ($M_x$ float). The dummy extensions are patches created to extend the signal wires up to the metal cuts, while the floating fills are wires that are disconnected from signals and only fill the sea of lines between two cuts. The contribution of the dummy and floating fills causes a more than 2X increase of the aggregate $M_x$ wirelength compared to the wire intent alone. In our metal stack this translates into a 1.5X increase of the total wirelength for N7 compared to N7+, which is reasonable considering that the $M_x$ wirelength dominates the total because of the smaller pitch.
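The relation between the 2X and 1.5X factors can be checked with a first-order estimate. Assume, for illustration only, that the $M_x$ wire intent accounts for a fraction $f$ of the N7+ total wirelength, that the fills multiply the $M_x$ wirelength by a factor $k$, and that the wirelength on the upper layers is unchanged. The symbols $f$ and $k$ are introduced here and are not quantities reported by the experiments.

$$\frac{WL_{N7}}{WL_{N7+}} = (1 - f) + k f = 1 + f\,(k - 1)$$

With $k = 2$, a 1.5X total increase corresponds to $f = 0.5$; a larger $k$ implies a somewhat smaller $f$.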

### 3.2.3 Electrical comparison

Figure 3.11 shows the comparison of the electrical metrics between N7 and N7+. The removal of the dummy fills in the $M_X$ layers of N7+ reduces wire capacitance by about 25% compared to N7, while pin capacitance is unchanged between the two technologies, since the same libraries are used. The ratio between wire and pin capacitance is of course design and frequency dependent, but in our runs it ranged between 0.8 and 1. The breakdown of total power into dynamic, internal and leakage power shows a negligible contribution of leakage power, with total power split approximately evenly between dynamic and internal power. Since dynamic power depends on both pin and wire capacitance, the gain in total power is smaller than the gain in wire capacitance. The benchmarking indeed showed a power reduction in the range of 6% for N7+. The performance improvement in N7+ is more remarkable, with a 15% increase versus N7. This result was benchmarked in the frequency sweep, through the timing closure criteria explained in subsection 2.3.3. The explanation of this significant performance boost is twofold. On one side, the absence of the design rules associated with the metal cuts (Table 3.4) leads to lower congestion in the EUV runs and to more degrees of freedom to optimize for timing. On top of this, the lower wire capacitance on the $M_X$ layers facilitates timing closure for the critical paths that are dominated by wire delay. In the benchmarking summarized in Figure 3.11 the same physical design options were used for the N7 and N7+ testcases, in order to make the comparison fairer. However, further gains for N7+ could be explored by using Non Default Rules (NDR) in the $M_X$ layers, which are not allowed in spacer-assisted patterning techniques. Examples of this strategy are double-width and/or double-spacing metals for timing-critical nets, clock nets, or EMIR-critical nets.
In the state-of-the-art N7 node, these types of nets are handled by escaping them to the first non-double-patterning layers through via pillars [43], which mitigates the electrical issues at the cost of increased routing congestion in the $M_X$ layers and of enhanced EDA features [91].
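The link between the 25% wire capacitance reduction and the roughly 6% power reduction can be sketched with a first-order estimate. The function below is a hypothetical helper, not part of the benchmarking flow. It assumes that dynamic power scales linearly with the sum of wire and pin capacitance, that internal and leakage power are unaffected, and that dynamic power is half of the total, as suggested by the breakdown above.

```python
def total_power_reduction(wire_cap_reduction, wire_to_pin_ratio, dynamic_share):
    """First-order estimate of the relative total power reduction.

    Assumes dynamic power is proportional to (wire + pin) capacitance and
    that internal and leakage power do not change.
    """
    # Fraction of the switched capacitance that is wire capacitance.
    wire_fraction = wire_to_pin_ratio / (1.0 + wire_to_pin_ratio)
    dynamic_reduction = wire_cap_reduction * wire_fraction
    return dynamic_share * dynamic_reduction


# Wire/pin capacitance ratios observed in the runs: 0.8 to 1.
for ratio in (0.8, 1.0):
    saving = total_power_reduction(0.25, ratio, 0.5)
    print(f"wire/pin = {ratio}: estimated total power reduction = {saving:.3f}")
```

Under these assumptions the estimate falls between about 5.5% and 6.5%, which is consistent with the power reduction observed in the benchmarking.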

### 3.2.4 Cost

In subsection 1.2.5 we documented the node-over-node increase in wafer cost for the latest 193i nodes, which is mainly due to the increase in the number of process steps caused by multiple patterning. Figure 3.12 reports the inversion of this trend brought about by the introduction of EUV. According to the data reported by ASML in [17], an EUV-based N7 allows a 12% wafer cost reduction and a 9% yield improvement compared to a 193i-based N7. A similar assessment was performed using imec's proprietary cost model, reaching consistent conclusions [92].



Figure 3.11 Electrical comparison between N7 and N7+ options.



Figure 3.12 Reduction in the number of process steps with EUV. Source: [17].

## 3.3 Study on system-level impact of LER

As described in subsection 3.1.3, LER results in stochastic variations of wire width and spacing. At small metal pitches these variations become significant with respect to the nominal width and spacing, and affect resistance and capacitance accordingly. Due to a highly resistive diffusion barrier and to the interface-proximity induced resistivity increase, the impact of LER on resistance is relatively larger than on capacitance [18]. In this section we aim to quantify how this impact propagates from wire resistance to system-level timing, through an analysis based on the ARM 64-bit CPU with a 6-Tracks library and EUV patterning.

### 3.3.1 Impact of stochastic effects as corners

Based on our process assumptions, we calculated the RC corners taking into account both systematic and stochastic variability in the $M_X$ layers, as shown in Figure 3.13. The major contributor to stochastic variability is LER, for which a $\sigma_{LER}$ of 2.8nm was assumed in Figure 3.13. The plot shows the Typical, Best and Worst RC corners in terms of capacitance and resistance per unit length. We can confirm that the resistance variability dominates over the capacitance variability: the worst corner for resistance (i.e. the rightmost point in Figure 3.13) is roughly 3.3X more resistive than the typical corner, while capacitance variations are in the range of ±20%. The worst corner for resistance (point #3) will therefore be the "canary" corner. In this corner LER contributes approximately 40% of the resistance; considering only the systematic sources of variability, the resistance increase compared to the Typical corner would be reduced to 2.4X. Using the same database, we swapped the *qrcTechFile* for the different corners and extracted RC in each scenario. This step aims to quantify how the $M_X$ resistance and capacitance variations propagate to system level. The histogram in Figure 3.14 shows the R and C values for all corners, normalized to the values of the Typical scenario. Total wire capacitance variations are limited to +5%, while resistance is reduced by 25% for corner #1 and increased by 2X for corner #3. Finally, we benchmarked the Power and Performance figures of the Multi Corner (MC) run against the single-corner Typical run. For this comparison we used two values of $\sigma_{LER}$: 2.8nm, which corresponds to a pessimistic scenario, and a more optimistic value of 1.5nm. The relative deltas in power and performance obtained for these two scenarios are shown in Table 3.5.
Variations in power are limited to an increase below 5% compared to the typical run, which is consistent with the limited wire capacitance increase seen in Figure 3.14. Performance degradation is extremely significant for the pessimistic $\sigma_{LER}$ scenario, with a performance loss in the range of 17%, while it is limited to 8% for the testcase with $\sigma_{LER}$ = 1.5nm. A more in-depth analysis showed that the timing degradation in the Multi Corner



Figure 3.13 RC corners including both systematic and stochastic variability. A  $\sigma_{LER}$  of 2.8nm was assumed.



Figure 3.14 Normalized RC at block level highlighting variations induced by corners.

runs is already present after the Clock Tree Synthesis step, which is consistent with the worst resistive corner being the main problem. It is therefore clear that treating LER as an RC corner introduces too much pessimism, triggering the MC optimization engine to try to close timing on corner #3, which results in massive buffer insertion. This led to the idea of adopting the statistical timing approach described in subsection 3.3.2.

### 3.3.2 Statistical STA model

The traditional corner-based model is an extremely pessimistic approach in the case of a stochastic effect with no spatial correlation. In fact, a worst case corner imposes the highest possible value of LER on all wires, which is statistically nearly impossible. Additionally, LER has a significant length dependence, which is not considered in traditional parasitics

|                              | MC<br>( $\sigma_{LER}=1.5\text{nm}$ ) | MC<br>( $\sigma_{LER}=2.8\text{nm}$ ) |
|------------------------------|---------------------------------------|---------------------------------------|
| Power $\Delta$ vs Typ.       | +3%                                   | +4%                                   |
| Performance $\Delta$ vs Typ. | -8%                                   | -17%                                  |

Table 3.5 Relative deltas in Power and Performance of Multi Corner (MC) runs compared to the single-corner Typical implementation.

extraction tools. We therefore adopted a statistical interconnect timing analysis method, based on post-processing a custom report from our standard implementation tool. This report contains detailed wirelength and timing data from the post-route database. In a synchronous digital design analysed with traditional STA, the critical timing path is determined by evaluating all logic paths between storage elements. The delays of all logic cells and wires in a path are calculated and accumulated, and the path with the maximum delay is the critical timing path, which determines the maximum clock frequency. In our post-processing, the wire delays are replaced by delay distributions that take into account the length of the wire and the layer on which it is routed [18]. Considering the delays of the different wire segments ($D_n$) as statistically independent, the total delay of the timing path ($D_p$ in Equation 3.8) is replaced by the convolution of all the Probability Density Functions ($PDF_{D_n}$) along the path (Equation 3.9). The distribution of the critical path delay is then obtained by multiplying the Cumulative Distribution Functions ($CDF_p$) of all paths ($p$) (Equation 3.10).

$$D_p = D_1 + D_2 + \dots + D_n \quad (3.8)$$

$$PDF_p = PDF_{D_1} \circledast PDF_{D_2} \circledast \dots \circledast PDF_{D_n} \quad (3.9)$$

$$CDF_{crit} = \prod_p CDF_p \quad (3.10)$$
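Equations 3.9 and 3.10 can be sketched numerically on a discrete delay grid. The example below is a minimal illustration and not the post-processing flow used in this work: it assumes Gaussian segment delay distributions with arbitrary means and sigmas, whereas the actual distributions depend on wire length and layer [18]. All function names, the grid step and the numerical values are hypothetical.

```python
import numpy as np

DT = 0.1  # step of the delay grid in ps (illustrative)


def gaussian_pmf(mean_ps, sigma_ps, n_bins=500):
    """Discretized, normalized Gaussian delay distribution on the grid k*DT."""
    t = np.arange(n_bins) * DT
    pmf = np.exp(-0.5 * ((t - mean_ps) / sigma_ps) ** 2)
    return pmf / pmf.sum()


def path_pmf(segment_pmfs):
    """Equation 3.9: convolve the distributions of independent segments."""
    out = np.array([1.0])  # distribution of a zero delay
    for pmf in segment_pmfs:
        out = np.convolve(out, pmf)
    return out


def critical_cdf(path_pmfs):
    """Equation 3.10: multiply the CDFs of all paths on a common grid."""
    n = max(len(p) for p in path_pmfs)
    crit = np.ones(n)
    for p in path_pmfs:
        cdf = np.cumsum(p)
        # Beyond its own grid a path CDF has saturated at 1.
        cdf = np.pad(cdf, (0, n - len(cdf)), constant_values=1.0)
        crit *= cdf
    return crit


# Two hypothetical paths with similar nominal delay.
path_a = path_pmf([gaussian_pmf(10.0, 0.5), gaussian_pmf(12.0, 1.0)])
path_b = path_pmf([gaussian_pmf(21.0, 1.5)])
crit = critical_cdf([path_a, path_b])

t = np.arange(len(crit)) * DT
median_ps = t[np.searchsorted(crit, 0.5)]
print(f"median critical delay: {median_ps:.1f} ps")
```

In this toy case the median of the critical delay is larger than the nominal delay of either path, because both paths have a non-negligible probability of being the slowest one.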

Unlike traditional STA, where a single path determines the critical delay, in this statistical analysis many paths contribute to the critical delay distribution because they all have a non-zero probability of being the critical path. Considering the impact of LER on the  $M_X$  layers, we obtained from this analysis the results shown in Figure 3.15.

- **Figure 3.15 (a)** shows the critical path delay distribution, normalized to the target clock period, as a result of LER for different $\sigma_{LER}$. Only for large values of $\sigma_{LER}$ is there a



Figure 3.15 Results from statistical STA post-processing: (a) critical path delay distribution for different  $\sigma_{LER}$ ; (b) Timing yield as a function of margin for different  $\sigma_{LER}$ . Source: [18]

variability larger than 1%. We can also look at results in terms of timing margin and yield.

- **Figure 3.15 (b)** shows the relationship between the two, taking as reference a 50% yield at 0% margin for a $\sigma_{LER}$ of 0.5nm. All curves converge within 2% of the clock period, and the difference between an optimistic $\sigma_{LER}$ of 0.5nm and a pessimistic value of 1nm is at most 1% of the clock period.

In conclusion, thanks to the averaging effect of longer and shorter wires and of multiple stages in each path, LER showed a negligible impact on system-level timing.

## 3.4 Summary and conclusions

In this chapter an introduction to EUV lithography was given, explaining the motivations for its adoption at advanced nodes. A high-level description of an EUV system was provided, along with a status update on the major challenges for EUV adoption in High Volume Manufacturing. A roadmap indicating EUV usage for technology nodes below N7 was proposed. The first opportunity to insert EUV is in the context of an EUV-enabled N7, which was named "N7+" to differentiate it from the 193i-based N7. EUV single patterning is still viable for our predictive N5 node, while at N3 dimensions the need for EUV double patterning was justified. A standard cell library was designed to be compatible with both the N7 and N7+ rulesets, which were also documented. The PPAC comparison based on an ARM 64-bit CPU showed, for the N7+ node, 6% lower power, 15% higher performance and improved routability compared to the reference N7. Furthermore, a 12% wafer cost reduction and a 9% yield improvement are expected. The RC variability due to Line Edge Roughness was recently classified as the top priority problem affecting EUV lithography. Our study showed that modelling this effect through conventional RC corners has a detrimental impact on timing closure, with up to 17% performance loss. This is mainly due to the worst resistive corner, which is more than 3 times more resistive than the typical corner. However, given the stochastic nature of LER, a statistical interconnect timing analysis method was also developed and tested. This study led to the conclusion that, thanks to the averaging effect of longer and shorter wires, the impact of LER on system-level timing is negligible.

# Chapter 4

## DTCO for the N5 node

In this chapter we focus on the results of the DTCO for the N5 node which, in accordance with the roadmap defined in Chapter 1, targets a CPP of 42nm and an $M_X$ pitch of 32nm. The main changes in the process assumptions compared to N7 are summarized in Table 4.1. In Section 4.1 different scaling boosters are illustrated, and it is shown how their combined usage makes it possible to mimic the PPA gains of a new node while keeping the ground rules unchanged. In Section 4.2 we assess the electrical impact of replacing Copper with Cobalt in the $M_X$ layers. The feasibility of a transition to a 5-Tracks cell architecture is investigated in Section 4.3. In Section 4.4 an electrical comparison between FinFET and lateral nanowire devices is presented. Finally, Section 4.5 explains in more detail the issues related to high-performance implementations in N5, motivating the choice of a gear ratio lower than one between M1 and gate pitch.

Table 4.1 Main process parameters for N7 and N5 nodes.

| <b>Node</b>              | <b>N7</b>         | <b>N5</b>         |
|--------------------------|-------------------|-------------------|
| <b>Track Height (TH)</b> | <b>7.5</b>        | <b>6</b>          |
| <b>Process parameter</b> | <b>value [nm]</b> | <b>value [nm]</b> |
| Gate length ( $L_g$ )    | 21                | 18                |
| Gate Spacer width        | 8                 | 8                 |
| Fin Height ( $H_{fin}$ ) | 45                | 45                |
| Fin Pitch                | 30                | 24                |
| Fin Width ( $W_{fin}$ )  | 5                 | 5                 |
| p/n separation           | 85                | 67                |
| $M_X$ Pitch              | 40                | 32                |
| Gate Pitch (CPP)         | 54                | 42                |

| Scaling Booster                          | Description                                                      | Intended impact                                                                    |
|------------------------------------------|------------------------------------------------------------------|------------------------------------------------------------------------------------|
| Number of Tracks reduction               | Re-design of the standard cells with reduced number of tracks.   | More compact cells through cell height reduction.                                  |
| Single-Diffusion Break (SDB)             | Technology allowing to cut fins with only one dummy poly.        | More compact cells through cell width reduction.                                   |
| Self-aligned Gate Contact (SAGC)         | Technology allowing to contact gate over active fins.            | More flexibility in complex cell design, cell width reduction.                     |
| M1 and MINT Open to Routing              | Pins on M0 and M1, allowing for routing in M1 and M0 extensions. | More routing resources available. Helps router for pin access.                     |
| “Vertical” power mesh with outbound rail | Smaller rail footprint reduces VDD/VSS impact on cell area.      | Less routing resources consumed by the Power Delivery Network improve routability. |
| Deep Trench on MINT                      | Increasing height of MINT layer.                                 | Making the “Vertical” power mesh electrically viable.                              |
| Porous cells                             | Two dummy tracks are inserted into the center of largest cells.  | Enlarges some problematic cells but improves routability.                          |

Table 4.2 Summary Table of the Scaling Boosters explored.

## 4.1 Moving from 7.5 to 6-Tracks with scaling Boosters

The results of the experiments in this section are based on the LDPC core. The section is organized as follows. In subsection 4.1.1, novel DTCO solutions alternative to pitch scaling, named "Scaling Boosters", are introduced and described. Subsection 4.1.2 shows the physical results deriving from the adoption of these solutions. Subsections 4.1.3 and 4.1.4 complete the IP-block level analysis with power-performance and IR-drop results respectively. In subsection 4.1.5 a summary and final comparison is provided.

### 4.1.1 Alternative Solutions to Pitch Scaling

In order to enable area scaling at IP block level without modifying the set of ground rules, alternative solutions were explored. These solutions were named "Scaling Boosters" and defined as design, process or EDA options that, when used in conjunction, reduce area at IP block level. The list of the scaling boosters explored, with a description of their intended impact, is provided in Table 4.2. Each of these solutions is described in detail in the following paragraphs. Table 4.3 summarizes the area impact of the scaling boosters that directly decrease cell area. As in [3], a NAND2 and a D Flip-Flop (DFF) were chosen to analyse the impact of scaling, since these cells are representative of simple and complex cells respectively. Complex cells such as the Flip-Flop were implemented as double-height cells.

|                        | 7.5-Tracks                                                                        | 6-Tracks                                                                          | 6-Tracks +SDB                                                                       | 6-Tracks +SAGC                                                                      |
|------------------------|-----------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| <b>NAND2</b>           |  |  |  |  |
| <b>Cell H [nm]</b>     | 240                                                                               | 192                                                                               | 192                                                                                 | 192                                                                                 |
| <b>Cell W [nm]</b>     | 168                                                                               | 168                                                                               | 126                                                                                 | 168                                                                                 |
| <b>Normalized Area</b> | 1                                                                                 | 0.8                                                                               | 0.6                                                                                 | 0.8                                                                                 |
| <b>DFF</b>             |  |  |   |  |
| <b>Cell H [nm]</b>     | 480                                                                               | 384                                                                               | 384                                                                                 | 384                                                                                 |
| <b>Cell W [nm]</b>     | 630                                                                               | 630                                                                               | 588                                                                                 | 462                                                                                 |
| <b>Normalized Area</b> | 1                                                                                 | 0.8                                                                               | 0.74                                                                                | 0.58                                                                                |

Table 4.3 Impact of different scaling boosters on NAND2 and DFF area. Dimensions of figures are in scale.

**Number of Tracks reduction:** In this work the number of tracks is defined as the cell height divided by the MINT pitch. The single height of the standard cells is therefore 240nm and 192nm for 7.5-Tracks and 6-Tracks cells respectively. Moving from a higher ($T_1$) to a lower ($T_2$) number of tracks, the maximum achievable area shrink is given by $T_2/T_1$, which corresponds to a factor of 0.8 for the transition from 7.5 to 6 Tracks. The potential side effects [62] of such a transition are: i) horizontal cell enlargement, due to the increased difficulty of inter-cell connections; ii) decreased routability and placement density; iii) reduced performance at IP-block level due to fin depopulation. As shown in Table 4.3, it was possible to avoid cell enlargement for both the NAND2 and the DFF and to achieve the full 0.8 area shrink. The last two points are discussed in subsection 4.1.2 and subsection 4.1.3 respectively.
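The cell heights and the normalized areas of Table 4.3 follow from simple arithmetic, reproduced in the sketch below. The 32nm MINT pitch is the value implied by the 240nm height of a 7.5-Tracks cell; the function names and the dictionary layout are hypothetical.

```python
MINT_PITCH_NM = 32.0  # implied by 240 nm / 7.5 tracks


def cell_height_nm(tracks, pitch_nm=MINT_PITCH_NM):
    """Single cell height from the number of tracks."""
    return tracks * pitch_nm


def normalized_area(height_nm, width_nm, ref_height_nm, ref_width_nm):
    """Cell area relative to a reference cell."""
    return (height_nm * width_nm) / (ref_height_nm * ref_width_nm)


# (height, width) in nm, taken from Table 4.3.
nand2 = {"7.5T": (240, 168), "6T": (192, 168),
         "6T+SDB": (192, 126), "6T+SAGC": (192, 168)}
dff = {"7.5T": (480, 630), "6T": (384, 630),
       "6T+SDB": (384, 588), "6T+SAGC": (384, 462)}

for name, cells in (("NAND2", nand2), ("DFF", dff)):
    ref_h, ref_w = cells["7.5T"]
    for variant, (h, w) in cells.items():
        area = normalized_area(h, w, ref_h, ref_w)
        print(f"{name:5s} {variant:8s} {area:.3f}")
```

The computed DFF values, about 0.747 with SDB and 0.587 with SAGC, correspond to the 0.74 and 0.58 reported in Table 4.3, which appear to be truncated rather than rounded.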

**Single Diffusion Break (SDB):** The single diffusion break is a scaling booster enabled by the process flow explained in [93]. In essence, from a standard cell design point of view, a more selective fin etch makes it possible to separate different devices with a single dummy gate rather than two (Figure 4.1), yielding an area shrink in the horizontal dimension. Table 4.3 shows that this feature has more impact on simple cells: it allows the 6-Tracks NAND2 to shrink further from 0.8 to 0.6 with respect to the original 7.5-Tracks dimensions, while it has a more limited benefit for the DFF, which reduces from 0.8 to 0.74.

**Self Aligned Gate Contact (SAGC):** An additional technology process flow [94] that is meant to shrink the width of the cells is the SAGC. The SAGC makes it feasible to place the



Figure 4.1 SDB and SAGC reducing the usage of dummy gates.

gate contact over the active area rather than constraining its placement to the p-n separation track. As exemplified in Figure 4.1, this additional degree of freedom makes it possible to better stagger the contacts and effectively reduce the usage of dummy gates. Table 4.3 indicates that this feature is particularly leveraged in complex cells: the 6-Tracks architecture with SAGC further reduces the normalized DFF area to 0.58, while it provides no additional benefit for the NAND2.

**M1 and MINT Open to Routing:** The cell architecture has been engineered with most of the pins on MINT, and the remaining connections completed on M1. The version of the router used in this work [95] has been enhanced by Cadence R&D to better resolve pin accessibility issues by extending the MINT pins in order to skip to free tracks, and to use the depopulated M1 for intra-cell routing and short connections. The clips from the tool in Figure 4.2 illustrate these concepts. The turnaround time, from the submission of the enhancement requests to the availability of a beta (non-production) build incorporating the additional features, was generally less than two months. The development effort clearly depends on how disruptive the modification is for the pre-existing flow, resulting in a significant delta between simple fixes, which could take less than a man-week, and complex methodology changes, which might require multiple man-months to propagate the enhancement across several engines. This consideration further highlights the necessity to involve EDA early in the DTCO loop, and to align on the expected efforts in order to guarantee the timely readiness of the new capabilities.



Figure 4.2 MINT and M1 open to routing.

**"Vertical" power mesh with outbound rail:** In order to improve routability in the 7.5-Tracks scenario, and especially to enable 6-Tracks cells, it was also necessary to co-optimize the cell architecture with the local rails (from MINT to M3) of the Power Delivery Network (PDN). When the cell height is reduced, the traditional solution [69] in Figure 4.3 (a), which uses a multi Critical Dimension (CD) power rail on M2, is no longer applicable, as it would consume too many of the routing resources on M2, critically reducing the number of tracks available for signal routing. This topology also degrades placement quality due to the interaction between the M1 pins in the standard cells and the M1 power staples connecting the M2 to the MINT power rail. These qualitative considerations will be quantified in subsection 4.1.2. The solution adopted here, shown in Figure 4.3 (b), was to remove the power rail on M2, introduce vertical power rails on M1, and use M2 only to strap together the stripes on M1, with parallel stripes added on M3 to decrease the resistance. The solution in the next paragraph was also used to further compensate for the removal of the M2 power rail. The electrical validation will be shown in subsection 4.1.4, where the IR-Drop robustness of this topology will be proved. The novel topology consumes the whole vertical track on M1 for the VDD/VSS stripes, whose utilization was nevertheless already severely constrained in the original topology due to the presence of the M1 staples. The choice of the distance between the vertical stripes ( $S_i$  in Figure 4.3) is thus not determined by cell height, and this spacing plays a fundamental role in the trade-off between routability and IR-Drop. Tightening this dimension inserts more stripes and reduces IR-drop, but challenges routability, because placement quality degrades and fewer signal tracks are available on M1 [9].



Figure 4.3 PDN architectures (a) Original (b) 6-Tracks compatible.

**Deep Trench on MINT:** Removing the M2 power rail poses IR-drop challenges that could be mitigated by using a multi CD power rail on MINT. In our technology we decided to keep the MINT power rail single CD, which enables the more compact standard cell architecture adopted, and to use the deep trench technology from [96] in order to mitigate IR-Drop.

**Porous cells:** In order to facilitate the insertion of the largest cells (e.g. DFF, Full-Adder) below the VDD/VSS stripes, the insertion of two extra tracks in the middle of these cells can be considered [9]. This makes a reduced subset of cells more “porous” to the power stripes at the cost of approximately 15% area enlargement of the individual cells, which is intended to be recovered through increased placement density. This solution is expected to be particularly efficient in a scenario requiring a tight spacing of the vertical stripes (e.g.  $1\mu\text{m}$ ), which, as shown in Figure 4.4, becomes comparable with the width of the largest cells, making placement legalization extremely challenging without this solution.

#### 4.1.2 Physical Results

The scaling boosters were combined into seven different standard cell libraries. The libraries differ in the usage of the scaling boosters and in the P&R setup, as indicated in Table 4.4. The proposed sequence of experiments was set up in order to progressively extract learnings on the area impact of the various scenarios. Absolute cell area by cell type is reported in Figure 4.5 for library #1, used as reference. The scaling factor of each cell with respect to the reference library is plotted for all the other libraries in Figure 4.6.

Results from the comparative analysis are reported in Figure 4.7. Maximum placement densities are plotted on the right-axis scale. Total standard cell area and final chip area are plotted on the left-axis, normalized to the values of the reference run (**Run#1**). Comparing



Figure 4.4 Insertion of porous cells under the M1 VDD/VSS stripes.

|              | Run#1<br>(reference) | Run#2      | Run#3                               | Run#4      | Run#5                               | Run#6      | Run#7                               | Run#8                               | Run#9                               |
|--------------|----------------------|------------|-------------------------------------|------------|-------------------------------------|------------|-------------------------------------|-------------------------------------|-------------------------------------|
| Library ID   | #1                   | #2         | #3                                  | #2         | #3                                  | #4         | #5                                  | #6                                  | #7                                  |
| n-Tracks     | 7.5                  | 7.5        | 7.5                                 | 7.5        | 7.5                                 | 6          | 6                                   | 6                                   | 6                                   |
| SDB          | -                    | -          | -          | -          | -          | -          | ✓          | -          | ✓          |
| SAGC         | -                    | -          | -          | -          | -          | -          | ✓          | -          | ✓          |
| PDN strategy | “Horizontal”         | “Vertical” | “Vertical” | “Vertical” | “Vertical” | “Vertical” | “Vertical” | “Vertical” | “Vertical” |
| PDN spacing  | 2μm                  | 2μm        | 2μm        | 1μm        | 1μm        | 2μm        | 2μm        | 1μm        | 1μm        |
| Porous cells | -                    | -          | ✓          | -          | ✓          | -          | -          | ✓          | ✓          |

Table 4.4 Setup of the runs for the physical experiments. “-” indicates that the scaling booster was not used.



Figure 4.5 Standard cell area for reference library (Library#1).



Figure 4.6 Scaling factor of standard cell with respect to reference library area.

**Run#1** and **Run#2** we analyze the impact of introducing the Vertical PDN on a 7.5-Tracks library without scaling boosters, considering a PDN spacing ( $S_i$  in Figure 4.3) of  $2\mu\text{m}$ , which is suitable for a low IR-drop scenario. Engineering the standard cells with an outbound power rail made it necessary to enlarge some of the cells with respect to the reference library, as shown in Figure 4.6 and confirmed by the increased cell area of **Run#2** with respect to **Run#1**. Nevertheless, this penalty was overcompensated by a placement density increase of 10 percentage points (from 70% to 80%), which resulted in an area gain of 7%. The proposed PDN is therefore already beneficial for 7.5-Tracks, while being an essential enabler for the 6-Tracks topology. Comparing **Run#2** and **Run#3** we quantify the impact of porous cells in the new PDN architecture, still in the  $2\mu\text{m}$  spacing scenario. This solution further enlarges the complex cells, causing cell area to increase by more than 10% with respect to the reference run. In this case the density increase of 2.5 percentage points (from 80% to 82.5%) is not sufficient to compensate for the cell area enlargement, and we conclude that porous cells do not significantly help to reduce area in a relatively large (e.g.  $2\mu\text{m}$ ) PDN spacing scenario, where routability is already good. However, if we tighten the PDN spacing from 2 to  $1\mu\text{m}$  without porous cells, as in **Run#4** and as would be required in a high IR-drop scenario, we observe a placement density degradation of 10 percentage points (from 80% to 70%), with a corresponding chip area increase. In this scenario the porous cells help to recover 7.5 percentage points of placement density, as in **Run#5**. This yields a 5% smaller chip area than **Run#4**, demonstrating the usefulness of porous cells when the PDN spacing is tight (e.g.  $1\mu\text{m}$ ). 
However, compared to **Run#2**, we still witness an area increase of approximately 10%, confirming the expected trade-off between routability and IR-drop. In **Run#6** we switch to the 6-Tracks cell architecture without additional scaling boosters in a  $2\mu\text{m}$  PDN spacing scenario. Comparing cell area for the libraries in **Run#6** and **Run#2**, we notice that it was possible to achieve the full 0.8 area gain on most of the cells. In P&R it was possible to maintain in **Run#6** the same density as in **Run#2** (80%), transforming the cell-level area gain into an actual chip area gain. In **Run#7** we further scale the 6-Tracks library through the combined usage of SDB and SAGC, which shrinks cell area by more than 35% with respect to **Run#6** while losing only 2.5 percentage points of placement density in P&R, in spite of the significant increase in pin density. Comparing **Run#7** with the initial reference run (**Run#1**), we verify that it was possible to reduce chip area below a factor of 0.5 or, in other words, to achieve the area benefits equivalent to a full node without pitch scaling. In **Run#8** and **Run#9** the PDN spacing is tightened to  $1\mu\text{m}$  in a 6-Tracks scenario without and with SAGC and SDB, respectively. Although porous cells help to maintain higher placement densities (75%), we confirm that in both 6-Tracks scenarios the reduction of PDN spacing from 2 to  $1\mu\text{m}$  increases the final chip area by more than 10%. In order to substantiate more quantitatively the pin density increase that needs to be handled by the tool, we compared in



Figure 4.7 Summary of the physical results. Normalized to the reference run (**Run#1**).

Table 4.5 the pin density histograms and heat-maps of a 7.5-Tracks scenario and a 6-Tracks with SDB and SAGC. We witness a dramatic increase in pin density that the tool has to resolve, with significant population of bins with pin densities between 40 and 50% in the most scaled 6-Tracks library.
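
The chain of area factors quoted for **Run#1**, **Run#2**, **Run#6** and **Run#7** can be cross-checked with a rough sketch. It rests on three assumptions that are not stated as such in the text: chip area is approximated as standard cell area divided by placement density, the 0.8 shrink is applied to the whole **Run#6** library, and the "more than 35%" cell shrink of **Run#7** is taken as exactly 35%.

```python
def chip_area_ratio(cell_area_ratio: float, density_ref: float, density_new: float) -> float:
    """Chip area factor, assuming chip area ~ cell area / placement density."""
    return cell_area_ratio * density_ref / density_new


# Run#2 vs Run#1: the text quotes a 7% chip area gain directly.
run2_vs_run1 = 0.93
# Run#6 vs Run#2: 0.8 cell area factor at the same 80% density.
run6_vs_run2 = chip_area_ratio(0.80, 0.80, 0.80)
# Run#7 vs Run#6: 35% cell shrink (assumed exact), density from 80% to 77.5%.
run7_vs_run6 = chip_area_ratio(0.65, 0.80, 0.775)

run7_vs_run1 = run2_vs_run1 * run6_vs_run2 * run7_vs_run6
print(f"Estimated Run#7 / Run#1 chip area factor: {run7_vs_run1:.3f}")
```

Under these assumptions the product lands just under 0.5, in line with the full-node-equivalent area gain reported above.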

#### 4.1.3 Power and Performance results

Moving from 7.5 to 6-Tracks implies the transition from 3 fins per device to 2 fins per device. In order to electrically evaluate the impact of this transition, **Run#2** and **Run#7** were chosen and, starting from the initial frequency (500MHz), a frequency sweep in steps of 500MHz was performed. The summary of the PPA results is presented in Table 4.6. Maximum placement density ( $PD_{max}$ ) was targeted in the first three frequency steps: 500MHz, 1GHz and 1.5GHz. For the highest frequency run (2GHz) the target density was decreased by 5 percentage points. The motivation is that already at 1.5GHz the final density increased by about 5 percentage points due to buffer insertion, indicating challenging timing closure. Lowering the target density at the maximum frequency leaves area for the buffers, keeping the design routable and DRC clean. The area gain of the "boosted" 6-Tracks versus the 7.5-Tracks is consistent with what has been shown in subsection 4.1.2 across the whole frequency range.

Table 4.6 uses the following common Static Timing Analysis metrics: Worst Negative Slack (WNS), Total Negative Slack (TNS) and number of failing paths. From the analysis of



Table 4.5 Pin density increase determined by scaling boosters (die area in scale).

|                                     | 7.5-Tracks - 3Fins |       |        |         | 6-Tracks+SAGC+SDB - 2Fins |       |        |        |
|-------------------------------------|--------------------|-------|--------|---------|---------------------------|-------|--------|--------|
|                                     | 500MHz             | 1GHz  | 1.5GHz | 2GHz    | 500MHz                    | 1GHz  | 1.5GHz | 2GHz   |
| <b>Target placement density [%]</b> | 80                 | 80    | 80     | 75      | 80                        | 80    | 80     | 75     |
| <b>final placement density [%]</b>  | 80.8               | 81.8  | 84.4   | 83.1    | 80.9                      | 81.3  | 87.2   | 86.6   |
| <b>Final Area [um<sup>2</sup>]</b>  | 5530               | 5629  | 5619   | 5919    | 4440                      | 4521  | 4524   | 4762   |
| <b>postroute WNS [ns]</b>           | 0                  | 0     | 0      | -0.029  | 0                         | 0     | 0      | -0.03  |
| <b>postroute TNS [ns]</b>           | 0                  | 0     | 0      | -11.892 | 0                         | 0     | 0      | -17.403 |
| <b>Failing Paths (2100 total)</b>   | 0                  | 0     | 0      | 1296    | 0                         | 0     | 0      | 1682   |
| <b>leakage [mW]</b>                 | 0.37               | 0.37  | 0.39   | 0.46    | 0.27                      | 0.27  | 0.31   | 0.36   |
| <b>switching [mW]</b>               | 6.11               | 13.91 | 20.69  | 28.57   | 4.89                      | 10.95 | 18.59  | 25.14  |
| <b>internal [mW]</b>                | 10.07              | 18.07 | 25.32  | 34.91   | 6.89                      | 13.47 | 17.91  | 24.69  |
| <b>total power [mW]</b>             | 16.55              | 32.35 | 46.40  | 63.94   | 12.06                     | 24.69 | 36.81  | 50.19  |
| <b>wire_cap [pF]</b>                | 49.8               | 61.9  | 58.1   | 57.6    | 47.2                      | 51.1  | 66.0   | 60.5   |
| <b>pin_cap [pF]</b>                 | 74.3               | 76.7  | 78.4   | 83.6    | 52.4                      | 54.9  | 59.8   | 64.7   |

Table 4.6 Design metrics for different frequencies (LDPC design).



Figure 4.8 Slack distributions for 7.5 and 6-Tracks for the frequency sweep runs.

these metrics we observe that timing can be closed without violations up to 1.5GHz for both the 7.5-Tracks and the "boosted" 6-Tracks, while the 2GHz runs fail to reach the frequency target, with more than half of the paths failing. A more detailed way to analyse timing is to compare the slack distributions of the two scenarios across the frequency sweep, as in Figure 4.8. For the initial frequency (500MHz), timing is not challenged, and most of the paths exhibit slacks larger than 500ps. As frequency increases, the distributions shift left and their right tail (larger slack values) shrinks, graphically showing the reduced margin from the timing targets. Finally, for the 2GHz runs the curves are approximately centered on zero, with the lowest values (WNS) reaching -30ps and about half of the area under the curve at slacks below zero (the paths contributing to TNS). From a technology point of view the key learning is that, considering a full library at IP-block level, the transition to 2 fins per device demanded by the 6-Tracks cell architecture can be enabled at iso-performance, as the slack distributions of the two scenarios do not substantially differ at any of the frequency steps examined.
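
As a reminder of how the three Static Timing Analysis metrics of Table 4.6 relate to a slack distribution, the sketch below derives them from a list of per-path slacks. The slack values are made up for illustration and are not taken from the runs; the function name is hypothetical, and WNS is reported as 0 when no path fails, mirroring the convention of Table 4.6.

```python
def sta_summary(slacks_ns):
    """Return WNS, TNS and failing-path count from per-path slacks (ns)."""
    negative = [s for s in slacks_ns if s < 0.0]
    return {
        # Worst Negative Slack: most negative slack, 0 if timing is met.
        "wns_ns": min(negative) if negative else 0.0,
        # Total Negative Slack: sum of all negative slacks.
        "tns_ns": sum(negative) if negative else 0.0,
        # Number of paths violating their timing constraint.
        "failing_paths": len(negative),
    }


# Illustrative slacks only (hypothetical, not measured data).
example_slacks_ns = [0.120, 0.015, -0.004, -0.029, 0.000, -0.011]
print(sta_summary(example_slacks_ns))
```

A distribution centered on zero, as described for the 2GHz runs, therefore shows up as a small WNS combined with a large failing-path count and TNS.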

Power calculation was performed by propagating default switching activities, therefore using statistical methods rather than specific input vectors. Table 4.6 indicates that across all the frequencies the internal and switching power account for most of the total power, with roughly equal contributions. The internal power takes into account the power dissipated by charging and discharging the parasitic capacitances inside the cells, plus the short-circuit power during the transition. The switching power derives instead from the charging and discharging of the load capacitances seen by the driving cells [97]. Leakage power does not exceed 3% of total power in any of the runs, which is partly related to the fact that the analysis was done at a single typical corner. Figure 4.9 shows the linear increase of power versus frequency for the two scenarios. Both internal and switching power are



Figure 4.9 Post P&R power comparison of 7.5-Track 3Fins and 6-Tracks 2 Fins.

lower in the 6-Tracks-2Fins, yielding power savings of more than 20% across the whole frequency range. Investigating the origin of the unchanged performance and reduced power at IP-block level of the 6-Tracks-2Fins is an ambitious goal, as it depends on a plethora of factors such as: cell timing properties, cell power properties, cell distribution, wire distribution, wire resistance, wire capacitance, pin capacitance, frequency, buffer insertion and timing optimization strategy from the tool, congestion, etc. For this reason, building an analytical model that takes this complexity into account would be an extremely challenging task, confirming the necessity of a post P&R approach. Nevertheless we highlight in Table 4.6 the reduced pin capacitance of the 6-Tracks runs (more than 20% lower), which is certainly one of the key contributors to the power benefits and iso-performance.
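
The savings quoted above follow directly from Table 4.6. The short sketch below recomputes the relative reduction in total power and pin capacitance at each frequency step, with the numbers copied from the table; the dictionary keys and the helper name are illustrative.

```python
FREQUENCIES = ("500MHz", "1GHz", "1.5GHz", "2GHz")

# Values copied from Table 4.6.
TOTAL_POWER_MW = {
    "7.5T-3Fins": (16.55, 32.35, 46.40, 63.94),
    "6T-2Fins": (12.06, 24.69, 36.81, 50.19),
}
PIN_CAP_PF = {
    "7.5T-3Fins": (74.3, 76.7, 78.4, 83.6),
    "6T-2Fins": (52.4, 54.9, 59.8, 64.7),
}


def reduction_percent(reference: float, new: float) -> float:
    """Relative reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - new / reference)


power_savings = [
    reduction_percent(ref, new)
    for ref, new in zip(TOTAL_POWER_MW["7.5T-3Fins"], TOTAL_POWER_MW["6T-2Fins"])
]
pin_cap_savings = [
    reduction_percent(ref, new)
    for ref, new in zip(PIN_CAP_PF["7.5T-3Fins"], PIN_CAP_PF["6T-2Fins"])
]

for freq, p_save, c_save in zip(FREQUENCIES, power_savings, pin_cap_savings):
    print(f"{freq:>7}: total power -{p_save:.1f}%, pin cap -{c_save:.1f}%")
```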

#### 4.1.4 IR-drop results

For the runs that closed timing (up to 1.5 GHz) we also verified power integrity, which has been described in [98] as one of the major impediments in single-digit node implementations. As we want to focus the analysis on the lowest layers of the PDN, namely from MINT to M3, we assumed ideal supply voltage (VDD) and ground (VSS) above M4. Because of this assumption, we target an aggressive IR-Drop limit of 2.5% VDD, corresponding to 16 mV. A typical 5% VDD [99] target (on all layers) leaves an additional 2.5% VDD for the upper layers, where an efficient optimization is possible through Non Default Rules (NDR), metal width enlargement, via arrays and dedicated layers for the power mesh on the thickest layers. The approach used to calculate IR-Drop is a vectorless dynamic, domain based analysis [97]. Figure 4.10 reports the Cumulative Distribution Functions (CDF) for

the dynamic IR-Drop values at the nodes of the power mesh. For IR-Drop, too, we observe similar behaviour for the 7.5 and 6-Tracks runs. This is qualitatively understandable if we consider that the area reduction for the 6-Tracks, which increases power density, is counterbalanced by the power reduction seen in subsection 4.1.3 and by an increased "density" of power rails on MINT (every 6 tracks rather than 7.5). The curves in Figure 4.10 shift to higher values of IR-Drop as frequency is increased. We extracted the IR-drop values at the 99th percentile rather than the maximum value, filtering out the extreme hotspots that should be fixed manually and that could be misleading for a comparative analysis. We conclude that the 16mV IR-Drop limit is met for the 500MHz and 1GHz runs, while the 1.5GHz runs come close to the limit. For this range of frequencies (and beyond) it is therefore reasonable to switch to a tighter PDN spacing ( $S_i$ ), moving from  $2\mu\text{m}$  to  $1\mu\text{m}$  as shown in subsection 4.1.2. Using the results from subsection 4.1.2 we can switch from the configurations in **Run#2** and **Run#7** to the ones in **Run#5** and **Run#9**, which use porous cells to maintain high placement densities (77.5% and 75% respectively), corresponding to an area loss in the range of 10% with respect to the corresponding runs at  $2\mu\text{m}$  spacing. Comparing the IR-Drop values between the runs with  $2\mu\text{m}$  and  $1\mu\text{m}$  spacing allows the trade-off between routability and power integrity to be quantified. This comparison is done in Table 4.7 both graphically, through the heat-maps, and quantitatively, through the IR-drop values at the 99th percentile. Both analyses demonstrate a significant reduction of dynamic IR-drop, up to more than 40%, when the PDN spacing is tightened to  $1\mu\text{m}$ , allowing the design to meet the IR-drop target and hypothetically allowing a further frequency increase through the introduction of high drive cells.
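
The effect of reporting the 99th percentile instead of the maximum can be illustrated with a small sketch. The IR-drop samples below are synthetic (a Gaussian bulk plus a handful of artificial hotspots) and do not come from the analysis tool; the nearest-rank percentile definition and all names are assumptions made for the example.

```python
import math
import random


def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs, as plotted in a CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Synthetic node voltages drops in mV: 995 "regular" nodes and 5 hotspots.
rng = random.Random(0)
drops_mv = [abs(rng.gauss(8.0, 2.0)) for _ in range(995)] + [30.0] * 5

LIMIT_MV = 16.0  # 2.5% VDD target quoted in the text
p99_mv = percentile_nearest_rank(drops_mv, 99)
print(f"max = {max(drops_mv):.1f} mV, p99 = {p99_mv:.1f} mV, limit = {LIMIT_MV} mV")
```

In this synthetic case the maximum is dominated by the five hotspots, while the 99th percentile reflects the bulk of the mesh, which is the behaviour that motivates the choice of metric for a comparative analysis.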

#### 4.1.5 Final PPAC Comparison

In this section the impact on post P&R PPA of different combinations of standard cell architectures, design solutions and technology options was investigated based on the imec N7 node, using the same rules across the whole design-technology space. The quantitative area comparison demonstrated the possibility to achieve up to 50% area reduction within the same technology platform. The electrical analysis further proved with post P&R data that these area benefits can be obtained without performance penalty and with a power reduction of more than 20%. Finally, the criticality of the trade-off between routability and IR-Drop for these geometries was quantified, showing that power mesh dimensions for tight IR-Drop requirements can cause an area penalty of more than 10%. All these results combined demonstrated the feasibility of mimicking the PPA gains of a new node through cost-effective DTCO solutions rather than pitch scaling. In fact, based on the cost model available at imec, the estimate of the wafer cost increase due to the adoption of the N5 scaling boosters was



Figure 4.10 CDF of IR drop values across power mesh.



Table 4.7 IR drop Comparison for different PDN dimensions.



Figure 4.11 CDF of IR drop values across power mesh.

in the range of 5%, and it is therefore largely overcompensated by the 50% area shrink. Figure 4.11 consolidates the PPAC comparison.

## 4.2 Introduction of Cobalt in the BEOL

### 4.2.1 Technology Parameters

As illustrated in subsection 1.2.3, the exponential resistivity increase caused by extreme metal width reduction, and the need for improved EM properties, triggered the exploration of new barrierless materials to replace traditional copper based interconnects. As previously mentioned, Cobalt is one of the main candidates, and it has already been introduced in some leading edge nodes. According to the work in [100], Cobalt line resistance starts to outperform Copper for metal widths below 13nm. Since the state-of-the-art metal widths for the processes in production are in the range of 20nm, it is likely that the transition has been mainly driven by the need to improve electromigration and reliability. Finally, the transition to Cobalt is being studied as a promising opportunity to significantly reduce the minimum run length (MRL) of the metals. Although final data are not yet published, early studies seem to indicate an MRL between 30 and 40nm for this material, while 80nm had been assumed for Copper (Table 3.4). Four BEOL scenarios with different  $M_X$  material and dielectric combinations were considered, as in Table 4.8, and their PPA impact was benchmarked in P&R. The reference BEOL is entirely copper based with low-k (2.8) dielectrics. From Scenario 1 we want to learn the electrical impact of replacing Cu with Co on all  $M_X$  layers. Scenario 2 also has Co on all  $M_X$  and a high-k (4.2) dielectric on Metal 1. Replacing low-k

|           | M1       |          | M2       |          | M3       |          |
|-----------|----------|----------|----------|----------|----------|----------|
|           | Material | <i>k</i> | Material | <i>k</i> | Material | <i>k</i> |
| Reference | Cu       | 2.8      | Cu       | 2.8      | Cu       | 2.8      |
| Scenario1 | Co       | 2.8      | Co       | 2.8      | Co       | 2.8      |
| Scenario2 | Co       | 4.2      | Co       | 2.8      | Co       | 2.8      |
| Scenario3 | Co       | 4.2      | Co       | 4.2      | Co       | 4.2      |
| Scenario4 | Co       | 4.2      | Cu       | 2.8      | Cu       | 2.8      |

Table 4.8 Material and dielectric configurations for different BEOL scenarios

with high-*k* dielectrics is favourable for reliability. Scenario3 is expected to be electrically the worst, using Cobalt and high-*k* dielectrics on all  $M_X$ . Scenario4 differs from the reference in M1 only, where Cu and low-*k* are replaced by Co and high-*k*.

### 4.2.2 IP-Block Benchmarking

For the benchmarking, the ARM 64-bit CPU was used, and a re-spin of synthesis and P&R was done for each BEOL scenario. The evaluation was done at maximum frequency, so that the impact on timing would be more evident. The changes in the  $M_X$  layer properties were taken into account both in the *.ict* and in the *.lib* files, as M1 also affects the standard cells. Moreover a lower MRL (35nm) was specified for metals with the Cobalt option. The relative deltas for the main electrical metrics are summarized in Table 4.9. In Scenario1 the same dielectrics as the reference are used, and total power, PinCap and WireCap vary by less than 3%. Violating paths increase by 30% compared to the reference due to the introduction of Cobalt in all  $M_X$ . In Scenario2 WireCap is roughly unchanged, since the M2 and M3 dielectrics are kept low-*k*, but the introduction of the high-*k* dielectric in M1 causes a PinCap and power increase in the range of 5% compared to the reference. This further degrades timing, with a 93% increase in failing paths. Predictably, Scenario3 exhibits the worst electrical figures, as Cobalt and high-*k* dielectrics are used in all  $M_X$ , causing a power increase of 8% and a 130% increase in the number of violating paths versus the baseline. In Scenario4 the PinCap and total power increases are below 3%, while WireCap stays roughly unchanged. The slightly improved timing, with 12% fewer failing paths, can be explained by the lower MRL on M1, which seems to allow the optimization to overcompensate for the high-*k* dielectric in M1. This scenario is particularly interesting because it allows a lower MRL on M1 and a more reliable dielectric, with a minor impact on the electrical metrics. Section 4.3 will show a use of this configuration to design more compact cells.

| %Variations compared to Reference BEOL |             |         |          |                 |          |
|----------------------------------------|-------------|---------|----------|-----------------|----------|
|                                        | Total Power | Pin Cap | Wire Cap | Violating Paths | WNS/Tclk |
| Scenario1                              | 1.58        | 1.77    | -2.84    | 30.53           | 5.2      |
| Scenario2                              | 5.43        | 5.87    | -0.31    | 93.15           | 7.6      |
| Scenario3                              | 8.44        | 7.85    | 9.94     | 130.23          | 6.2      |
| Scenario4                              | 2.61        | 2.80    | 0.65     | -12.18          | 0.2      |

Table 4.9 Percent variations of electrical metrics versus reference BEOL. Benchmarking done at maximum target frequency.

## 4.3 Transition to 5-Tracks standard cells

In this Section we will attempt to further reduce track height from 6 to 5-Tracks. The study will highlight that, unlike for the transition from 7.5 to 6-Tracks, enabling an efficient 5-Tracks cell architecture requires a change in the ruleset, dictated by the design arcs illustrated in subsection 4.3.1. Moreover, as shown in subsections 4.3.2 and 4.3.3, a major PPA tradeoff emerges between the 6-Tracks and 5-Tracks. All the scaling boosters described in Section 4.1 for the 6-Tracks were also used in the 5-Tracks and the P&R benchmarking was done on the ARM 64-Bit CPU.

### 4.3.1 Design Arcs and patterning

As previously mentioned, a significantly more aggressive MRL rule is expected when moving from Copper to Cobalt interconnects. While this matter is still debated in literature, and the actual minimum area reduction needs to be experimentally confirmed, we will show that this specification is crucial in order to fully leverage the area benefits of the 5-Tracks cells. Figure 4.12 graphically illustrates the main design arcs described by Equations 4.1-4.4. These constraints can be analyzed individually by combining standard cell architecture and pin access considerations. In the proposed cell architecture the pins are on  $M0$  (MINT), and the connections internal to the cell are completed using the  $M1$  layer, which is also usable by the router. Through Equation 4.1 we want to impose the possibility to vertically stack two pins on the same track, which is meant to counterbalance the increased pin density with a denser pin access scheme. Through Equations 4.2 and 4.3 we want to



Figure 4.12 Graphical description of the design arcs required for efficient standard cell design and Place and Route.

Table 4.10 Ruleset honouring the constraints in Equations 4.1- 4.4.

| Parameter  | Description                            | Value |
|------------|----------------------------------------|-------|
| $MP_{M0}$  | Metal Pitch of Metal 0                 | 32nm  |
| $CD_{V0}$  | Critical Dimension of Via 0            | 16nm  |
| $EXT_{V0}$ | Minimum Extension of M1 over Via 0     | 11nm  |
| $T2T_{M1}$ | Minimum Tip to Tip distance on Metal 1 | 25nm  |
| $MRL_{M1}$ | Minimum Run Length on Metal 1          | 38nm  |
| $n_T$      | Number of Tracks                       | 5     |

make sure that the vertical abutment of two cells is legal when the most outbound  $M0$  pin is accessed. Finally, Equation 4.4 allows one  $M0$  pin out of every two to escape towards the upper layers, facilitating pin access. These requirements are satisfied with the dimensions in Table 4.10, which are compliant with our N5 EUV ruleset assuming Cobalt metallization for M1.

$$n_T \cdot MP_{M0} \geq 2 \cdot T2T_{M1} + 2 \cdot MRL_{M1} \quad (4.1)$$

$$2 \cdot MP_{M0} \geq 2MRL_{M1} + T2T_{M1} \quad (4.2)$$

$$2 \cdot MP_{M0} \geq 2CD_{V0} + 2 \cdot EXT_{V0} + T2T_{M1} \quad (4.3)$$

$$MRL_{M1} \leq 2CD_{V0} + 2 \cdot EXT_{V0} \quad (4.4)$$
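
As a quick numerical illustration of the first design arc only (Equation 4.1, stacking two pins on one track), the sketch below evaluates both sides of the inequality for a 5-Tracks cell with the Table 4.10 values and with the 65nm $MRL_{M1}$ of the non-compliant library in Figure 4.13. Reusing the 25nm tip-to-tip value for the 65nm case is an assumption, and the function name is illustrative.

```python
def stacked_pins_fit(n_tracks, mp_m0_nm, t2t_m1_nm, mrl_m1_nm):
    """Evaluate Equation 4.1: n_T * MP_M0 >= 2 * T2T_M1 + 2 * MRL_M1."""
    available_nm = n_tracks * mp_m0_nm            # cell height
    required_nm = 2 * t2t_m1_nm + 2 * mrl_m1_nm   # two stacked M1 segments
    return available_nm, required_nm, available_nm >= required_nm


# Table 4.10 values: MP_M0 = 32nm, T2T_M1 = 25nm, n_T = 5.
for mrl_nm in (38, 65):
    available, required, ok = stacked_pins_fit(5, 32, 25, mrl_nm)
    print(f"MRL_M1 = {mrl_nm}nm: need {required}nm, have {available}nm, fits: {ok}")
```

With these inputs the 38nm run length leaves margin within the 160nm cell height, whereas the 65nm run length does not fit.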

| <b>Library</b>     | 6-Tracks | 5-Tracks non D1-D4 compliant<br>( $MRL_{M1}=65\text{nm}$ ) | 5-Tracks D1-D4 compliant<br>( $MRL_{M1}=38\text{nm}$ ) |
|--------------------|----------|------------------------------------------------------------|--------------------------------------------------------|
| <b>Height [μm]</b> | 0.192    | 0.16                                                       | 0.16                                                   |
| <b>Width [μm]</b>  | 0.252    | 0.336                                                      | 0.252                                                  |

Figure 4.13 Comparison of an AO21D1 cell in 6-Tracks vs 5-Tracks libraries.

Table 4.11 Normalized area metrics for the three libraries in Figure 4.13.

| <b>Library</b>                      | <b>Max density</b>  | <b>Standard Cells Area</b> | <b>Core Area</b> |
|-------------------------------------|---------------------|----------------------------|------------------|
| 6-Tracks                            | 0.80                | 1.00                       | 1.00             |
| 5-Tracks ( $MRL_{M1}=65\text{nm}$ ) | 0.75                | 0.89                       | 0.95             |
| 5-Tracks ( $MRL_{M1}=38\text{nm}$ ) | 0.80                | 0.83                       | 0.83             |

### 4.3.2 Physical Results

Figure 4.13 shows an AO21D1 cell in a 6-Tracks library, in a 5-Tracks library not honoring the $MRL_{M1}$ constraint in Table 4.10, and in a 5-Tracks library fully honoring the Table 4.10 requirements. The cell comparison demonstrates that, for complex cells, the area improvements of 5-Tracks are not fully leveraged unless an aggressive $MRL_{M1}$ that satisfies Equations 4.1-4.4 can be used. In fact, in the cell architecture with an $MRL_{M1}$ of 65nm, the impossibility of leveraging a denser intra-cell connection and pin access scheme causes a width enlargement that erodes the gains of the lower cell height. The complete area comparison for all the cells of the three libraries is reported in [101]. Table 4.11 shows the maximum placement density, normalized cell area and normalized core area for the three libraries. For the $MRL_{M1}=65\text{nm}$ library, the standard cell level scaling is not ideal, with an 11% area reduction rather than the ideal 17% (a 5/6 height ratio). Additionally, a routability degradation of 5% in placement density limits the overall core area reduction to 5% compared to the reference 6-Tracks run. Moving to $MRL_{M1}=38\text{nm}$ allows the ideal cell area gain to be matched at iso-placement density, resulting in a core area 17% smaller than that of the 6-Tracks.
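The relation between the three columns of Table 4.11 can be made explicit with a short calculation. The sketch below assumes that the core area is simply the total standard cell area divided by the placement density, and normalizes it to the 6-Tracks reference run. The function name and data layout are illustrative.

```python
REF_DENSITY = 0.80  # maximum placement density of the 6-Tracks reference run


def normalized_core_area(cell_area, density, ref_density=REF_DENSITY):
    """Core area relative to the reference, assuming core = cell area / density."""
    return cell_area * ref_density / density


# (normalized standard cell area, maximum placement density) from Table 4.11
LIBRARIES = {
    "6-Tracks": (1.00, 0.80),
    "5-Tracks (MRL_M1=65nm)": (0.89, 0.75),
    "5-Tracks (MRL_M1=38nm)": (0.83, 0.80),
}

core = {
    name: round(normalized_core_area(area, density), 2)
    for name, (area, density) in LIBRARIES.items()
}
ideal_cell_scaling = 5 / 6  # track-height ratio between 5-Tracks and 6-Tracks

for name, value in core.items():
    print(f"{name}: normalized core area {value:.2f}")
print(f"ideal cell scaling: {ideal_cell_scaling:.2f}")
```

With these inputs the density loss of the 65nm library turns an 11% cell area gain into a 5% core area gain, while the 38nm library carries its cell area gain unchanged to the core.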

### 4.3.3 Electrical Results

Let us finally examine the electrical implications of moving to 5-Tracks. The downside of a lower track height is fin depopulation, which can result in performance issues. Figure 4.14 suggests that, in order to avoid reducing the number of fins per device from 2 to 1 when moving from 6 to 5 Tracks, more aggressive rules should be used in the MOL, such as a tighter p-n separation, a lower poly cut height, and a lower gate extension over the fin. Based on the same standard cell footprints (i.e. the same *.lef*) we created two different electrical flavors (i.e. *.lib*) of the 5-Tracks library: one with two fins per device, and one with a single fin. The results of the PPA benchmarking are summarized in Figure 4.15. Both 5-Tracks libraries deliver a 17% area gain versus the reference 6-Tracks. In the case of the 5-Tracks 2-Fins, this gain is achieved iso-performance and iso-power, while in the case of the 5-Tracks 1-Fin a 20% power gain is obtained, but with a performance loss in the range of 50%. The qualitative explanation is that 6-Tracks 2-Fins and 5-Tracks 2-Fins have similar pin capacitance and $I_{ON}$. The lower pin capacitance of the 1-Fin device option allows a significant power reduction but, unlike in the transition from 3 to 2 fins, it is not enough to compensate for the loss of $I_{ON}$. A fair comparison, using the same MOL for 6 and 5-Tracks, will therefore establish a major tradeoff between area, power and performance, so that the library choice must be tightly tailored to the application domain.

## 4.4 FinFET vs NanoWires comparison

In Chapter 1 we highlighted that the transition from planar to FinFET devices was driven by the need to recover the electrostatic control of the gate over the channel. This concept is taken to the extreme by the lateral nanowire device [102] [47] [66], whose cross section along the metal gate is depicted in Figure 4.16. The main difference from a traditional FinFET is that, for lateral nanowires, each fin consists of multiple vertically stacked cylindrical channels that are completely wrapped by the gate. For this reason these devices are also called Gate All Around (GAA) FETs.

### 4.4.1 Block Level electrical comparison

We adopted the algorithm described in Figure 2.11 in order to electrically compare the Lateral Nanowire (LNW) device with FinFET. The benchmarking was based on the LDPC core, using a 7.5-Tracks library with 3 fins per device. Two different LNW libraries were tested: the first with three vertically stacked nanowires per fin (LNW3), and the second with only two vertically stacked nanowires per fin (LNW2). Table 4.12 indicates the



Figure 4.14 MOL constraints for 5-Tracks standard cells. (a) with two fins per devices. (b) with one fin per device.



Figure 4.15 Normalized PPA metrics for 6-Tracks and 5-Tracks libraries.



Figure 4.16 3-D sketches of FinFET and lateral NWFET. Source: [19].

| Target Frequency | 0.25 fmax | 0.4 fmax | 0.55 fmax | 0.7 fmax | 0.85 fmax | 0.935 fmax | fmax |
|------------------|-----------|----------|-----------|----------|-----------|------------|------|
| <b>FF</b>        | ✓         | ✓        | ✓         | ✓        | ✓         | ✗          | ✗    |
| <b>LNW3</b>      | ✓         | ✓        | ✓         | ✓        | ✓         | ✓          | ✗    |
| <b>LNW2</b>      | ✓         | ✓        | ✓         | ✗        | ✗         | ✗          | ✗    |

✓ Timing met      ✗ Timing not met

Table 4.12 Timing closure across the frequency sweep for FF and LNW devices.

timing closure for each of the three libraries across the frequency sweep. As timing closure criteria we set a threshold of 5% both on the WNS normalized to the clock period and on the number of failing paths relative to the total number of paths. At fmax none of the devices closes timing. At 93.5% of fmax only the LNW3 meets the timing targets. At 85% of fmax and below both the FF and LNW3 libraries pass the timing check. The LNW2 device proved viable only for lower frequency targets, meeting timing at 55% of fmax and below. From these results we can estimate a performance gain of the LNW3 versus FinFET between 5 and 10%, while reducing the number of vertically stacked nanowires to 2 causes a performance drop of more than 40%. Figure 4.17 shows the power gain of the LNW3 and LNW2 versus FinFET for each frequency point where the timing requirements were satisfied. We can see that the LNW3 is approximately 5% more power efficient than the FinFET across the entire frequency sweep. The LNW2, although not suitable for high performance, shows 20% lower power than the FinFET for the frequency range in which both devices are viable. Overall these results show that a GAA FET architecture can outperform FinFET on both power and performance, and that the number of vertically stacked wires should be chosen based on the type of application.
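The timing closure criterion and the reading of Table 4.12 can be sketched as follows. This is only an illustration: the sign convention (WNS negative when violating), the inclusive comparison against the 5% threshold, and all function names are assumptions, and the numbers passed to `timing_met` are made up for the example rather than taken from our runs.

```python
def timing_met(wns_ns, clock_period_ns, failing_paths, total_paths, threshold=0.05):
    """True when both closure criteria are within the threshold.

    wns_ns: worst negative slack, negative when timing is violated (assumed convention).
    """
    wns_ratio = max(0.0, -wns_ns) / clock_period_ns   # WNS normalized to the clock period
    failing_ratio = failing_paths / total_paths       # share of failing paths
    return wns_ratio <= threshold and failing_ratio <= threshold


# Frequency sweep (fractions of fmax) and closure flags transcribed from Table 4.12.
SWEEP = (0.25, 0.4, 0.55, 0.7, 0.85, 0.935, 1.0)
TABLE_4_12 = {
    "FF":   (True, True, True, True, True, False, False),
    "LNW3": (True, True, True, True, True, True, False),
    "LNW2": (True, True, True, False, False, False, False),
}


def highest_closed_target(device):
    """Highest frequency target of the sweep at which the device closes timing."""
    return max(f for f, ok in zip(SWEEP, TABLE_4_12[device]) if ok)


# Hypothetical run: 20 ps of negative slack on a 1 ns clock, 3% of paths failing.
print(timing_met(-0.02, 1.0, 30, 1000))
for device in TABLE_4_12:
    print(device, highest_closed_target(device))
```

Since the sweep is discrete, the highest closing target only brackets the achievable frequency of each device between two neighbouring points.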



Figure 4.17 Power gains of LNW devices versus FinFET for frequency points where LNW close timing.

### 4.4.2 RO level variability assessment

Although we did not perform an IP-block based assessment of device variability in this work, we derived the following conclusions from a Ring Oscillator based study. For the same configuration of FinFET and LNW3, delay spreads can be up to 9% smaller for LNW3 than for FinFET. Going from 3 to 2 fins is possible with just a 4% deterioration in nominal delay and spreads. Going to 1 fin deteriorates the nominal delay by 43% and its spread by 63%. This demonstrates that, besides the performance challenges documented in the previous section, the transition to a single-fin scenario would cause a dramatic increase in variability. This trend makes the EDA efforts towards a Statistical STA (SSTA) approach [103] even more important.

## 4.5 High Performance challenges

In subsection 4.5.1 we will show the performance benefits of introducing high drive cells into the library, extending the study presented in Section 4.1. We will then illustrate in subsection 4.5.2 how the EMIR requirements for high-performance designs further intensify the tradeoff between routability and power integrity. Subsection 4.5.3 explains the idea of tightening the M1 gear ratio with poly in order to resolve this tradeoff.

### 4.5.1 Higher Drive cells and physical synthesis

In order to target higher performance, we can introduce higher drive cells into the library. Drive strength measures the capacity of a cell to deliver a certain current to an output load. Typically, cells with a "1X" or "D1" suffix indicate the unit drive strength, and the labels of the higher drive cells are expressed relative to the unit drive. In the studies presented in the previous sections of this chapter, the maximum drive in the standard cell library was D8. We now want to test the performance improvements that can be obtained by using drives up to D32. The layouts of a D1 inverter (INVD1) and a D8 inverter (INVD8) are compared in Figure 4.18. Cells with different drive strengths have the same functionality, but they differ in the number of stages and output pins, which of course implies an area difference. The expanded library was tested on the ARM 64-bit CPU, initially using the logical synthesis flow. As no performance benefits were observed versus the run with no high drive cells, a physical synthesis flow, as described in Section 2.2, was adopted. Figure 4.19 compares the WNS as a percentage of the clock period for the logical and physical flows, after synthesis and after P&R. Assuming as a conventional threshold for timing closure a WNS of 10% of the clock period, we can see that in logical synthesis no timing issues are detected at the maximum frequency ( $f_{max}$ ), and even at a frequency target 10% higher than $f_{max}$. At P&R severe timing issues emerge, indicating a significant miscorrelation between the wireload estimate at synthesis and the actual wireload in P&R. This optimistic estimate prevents an appropriate usage of the high drive cells. In the physical synthesis flow, instead, the WNS and TNS problems are properly detected through a more realistic wireload estimation derived from placement, allowing the optimization engine to restructure the netlist accordingly. Cell sizing is part of this optimization, leading to a more efficient usage of the higher drive cells. Through the physical flow a better correlation between synthesis and P&R timing metrics was obtained, with a performance improvement in the range of 10%.

### 4.5.2 (EM)IR issues becoming the bottleneck

We already saw in Section 4.1 that, for the power densities corresponding to a frequency target of 1.5GHz, a PDN spacing ( $S_i$ ) in the range of $1\mu m$ was already necessary in order to meet the IR-drop specifications. As modern high performance CPUs normally operate beyond 2GHz, a physical exploration considering PDN scenarios with spacings down to 300nm was performed. Such a dense PDN further aggravates the tradeoff between IR and routability witnessed in Section 4.1. In fact, the more the $S_i$ spacing is reduced, the larger the number of standard cells whose width is comparable with that spacing, causing placement legalization and pin access problems. A design solution



Figure 4.18 Layout comparison of a D1 and D8 Inverter in a 6-Tracks library.



Figure 4.19 Timing comparison using high drive cells, with and without the physical synthesis flow.

| Library           | Metric                       | PDN spacing 48-CPP (2 $\mu$m) | PDN spacing 24-CPP (1 $\mu$m) | PDN spacing 12-CPP (0.5 $\mu$m) | PDN spacing 8-CPP (0.3 $\mu$m) |
|-------------------|------------------------------|-------------------------------|-------------------------------|---------------------------------|--------------------------------|
| 6-Tracks          | <b>Max placement density</b> | 0.8                           | 0.7                           | <i>Unroutable</i>               | <i>Unroutable</i>              |
|                   | <b>Normalized Cell area</b>  | 1                             | 1                             | n/a                             | n/a                            |
|                   | <b>Normalized Core area</b>  | 1                             | 1.14                          | n/a                             | n/a                            |
| 6-Tracks (porous) | <b>Max placement density</b> | n/a                           | 0.775                         | 0.775                           | 0.775*                         |
|                   | <b>Normalized Cell area</b>  | n/a                           | 1.10                          | 1.20                            | 1.30*                          |
|                   | <b>Normalized Core area</b>  | n/a                           | 1.14                          | 1.24                            | 1.34*                          |

Table 4.13 Maximum placement density, normalized cell and core areas for different PDN scenarios. 6-Tracks and porous 6-Tracks cells are compared. \*Numbers for the 8CPP scenario are extrapolated.

to this issue is to make more cell types porous. A possible criterion for deciding whether to make a cell type porous is to calculate the ratio between the width of the cell and the PDN spacing, and to set a threshold beyond which dummy tracks need to be inserted. We introduced porosity if the original cell width was larger than 0.5 times the PDN spacing, obtaining the results in Table 4.13. The 6-Tracks library with no porous cells shows a placement density degradation from 80% to 70% when the PDN spacing is tightened from 2 $\mu$m (48CPP) to 1 $\mu$m (24CPP), with a consequent core area increase. The scenarios with 0.5 $\mu$m or 0.3 $\mu$m spacings proved unroutable with this library. The porous 6-Tracks library was not tested for the 2 $\mu$m spacing since all the cells have widths lower than 1 $\mu$m. For a 1 $\mu$m spacing, the cell enlargement due to the porous cells (+10% area) is roughly compensated by the routability recovery (+7.5% density), and the final core area is comparable with the one obtained with the non-porous library. The porous 6-Tracks library is viable for a 0.5 $\mu$m (12CPP) spacing if all the cells wider than 250nm are made porous, which increases the cell area by 20% compared to the non-porous library. Routability is kept high under these conditions, but the core area degradation goes beyond 20% compared to the 2 $\mu$m spacing scenario. For the 0.3 $\mu$m (8CPP) scenario we extrapolated the data in order to show the trend. Introducing dummy tracks in the cells wider than 150nm would basically mean modifying the greatest part of the library, causing a 30% cell area increase that would propagate to the core area, even under the assumption of keeping the same density as in the 12CPP scenario. The fact that an area degradation of more than half a node is encountered for aggressive PDN spacings shows that the EMIR requirements become a major bottleneck for high performance implementations at advanced nodes. This problem highlights the need for a radical change in the cell architecture, aimed at natively making the cells more porous. This idea is explained in subsection 4.5.3.
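The porosity criterion can be written down in a few lines. The sketch below uses the 0.5 width-to-spacing threshold stated above and the rounded PDN spacings quoted in the text. The function names are illustrative, and the 252nm example width is that of the 6-Tracks AO21D1 cell in Figure 4.13.

```python
POROSITY_RATIO = 0.5  # width / PDN spacing threshold used in this study

# Rounded PDN spacings in nanometres, as quoted in the text.
PDN_SPACINGS_NM = {"48-CPP": 2000, "24-CPP": 1000, "12-CPP": 500, "8-CPP": 300}


def width_threshold_nm(pdn_spacing_nm, ratio=POROSITY_RATIO):
    """Cell width above which dummy tracks are inserted for a given PDN spacing."""
    return ratio * pdn_spacing_nm


def needs_porosity(cell_width_nm, pdn_spacing_nm, ratio=POROSITY_RATIO):
    """True if the cell is wider than the threshold and must be made porous."""
    return cell_width_nm > width_threshold_nm(pdn_spacing_nm, ratio)


thresholds = {name: width_threshold_nm(s) for name, s in PDN_SPACINGS_NM.items()}
for name, limit in thresholds.items():
    print(f"{name}: cells wider than {limit:.0f} nm become porous")

# A 252 nm wide cell is left untouched at 24-CPP but becomes porous at 12-CPP.
print(needs_porosity(252, PDN_SPACINGS_NM["24-CPP"]))
print(needs_porosity(252, PDN_SPACINGS_NM["12-CPP"]))
```

The thresholds fall from 1000nm to 150nm across the four scenarios, which is why an ever larger share of the library has to be modified as the PDN gets denser.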

### 4.5.3 Tighter Gear ratio between M1 and poly

Figure 4.20 (a) shows the standard cell layers from poly to Metal 1 for a technology node not using the MINT layer. In this scheme M1 needs to align with the source and drain contacts. If a second MOL layer (MINT) is added for horizontal routing, we have a scheme like the one in Figure 4.20 (b), which is representative of N7 and below technology nodes. In this case M1 tracks can be moved independently of the poly pitch, and the unitary Gear Ratio (GR) between poly and M1 pitch is no longer a hard constraint. Figure 4.21 shows three different GR scenarios: 1, 2/3 and 0.5. GR=1 is the baseline scenario, which was adopted for all the experiments in Chapters 3 and 4. In this situation poly and M1 have the same pitch and are interleaved with a half-pitch offset. Tightening the pitch of Metal 1 multiplies the routing resources available on M1, allowing a more efficient standard cell design and de-congesting the tightest vertical layer. The increased number of vertical tracks will also make the cells intrinsically porous to the vertical elements of the power grid, offering an attractive solution for the power integrity-routability tradeoff. Adopting a GR of 2/3 in our N5 implies moving to an M1 pitch of 28nm. According to the plot in Figure 1.10, a 28nm pitch pushes EUV single print to its resolution limits, possibly requiring SAQP or EUV double patterning for optimal yield. The third option examined is to tighten the GR to 0.5, which results in a 21nm pitch. We can see from Figure 4.21 that for a 0.5 GR, M1 aligns with both poly and M0A, providing extended degrees of freedom for cell design and routing. Finally we can consider the alignment of the M1 grid and the cell boundary in the three scenarios. For a GR of 1 and 0.5 the cell boundary has the same offset from M1 regardless of the cell orientation. However, in the 2/3 GR case, this offset changes depending on whether the cell is at an even or odd multiple of the CPP. In order to enable more legal combinations of horizontally abutting cells, two different versions of each cell (even and odd) might be produced. While this helps to avoid empty spaces between the cells, saving area, it also creates the significant overhead of duplicating the library. The solution adopted in the transition to N3, described in Chapter 5, was to move to a 0.5 GR. This choice is in line with the aggressive scaling of $M_X$ dictated by our roadmap.
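The even/odd behaviour of the 2/3 gear ratio follows from simple modular arithmetic, as the sketch below illustrates. It assumes cell boundaries at integer multiples of the 42nm CPP and M1 tracks on a regular grid. The position of the first M1 track (`m1_origin_nm`) is a hypothetical parameter: it shifts all offsets equally and does not change whether they alternate.

```python
from fractions import Fraction

CPP_NM = 42  # gate pitch of the N5 node


def m1_pitch_nm(gear_ratio):
    """M1 pitch obtained by scaling the gate pitch by the gear ratio."""
    return gear_ratio * CPP_NM


def boundary_offsets_nm(gear_ratio, n_positions=4, m1_origin_nm=0):
    """Offset of each cell boundary (at n * CPP) from the M1 track at or to its left."""
    pitch = m1_pitch_nm(gear_ratio)
    return [float((n * CPP_NM - m1_origin_nm) % pitch) for n in range(n_positions)]


for gr in (Fraction(1), Fraction(2, 3), Fraction(1, 2)):
    print(f"GR={gr}: M1 pitch {float(m1_pitch_nm(gr)):g} nm, "
          f"boundary offsets {boundary_offsets_nm(gr)}")
```

For GR=1 and GR=0.5 the CPP is an integer multiple of the M1 pitch, so every boundary sees the same offset. For GR=2/3 the 42nm CPP is 1.5 M1 pitches, so the offset alternates by half an M1 pitch (14nm) between even and odd CPP positions.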

## 4.6 Summary and conclusions

A predictive N5 node was defined, and the components of its PDK generated. The transition from a 7.5-Tracks library with 3 Fins per device, to a 6-Tracks library with 2 Fins per device was investigated through a PPA benchmarking. The comparison showed for the 6-Tracks 2-Fins similar performance and up to more than 20% lower power compared to the 7.5-Tracks library. This is explained by the reduced pin capacitance for the 6-Tracks library,



Figure 4.20 Motivations behind the traditional usage of 1:1 gear ratio between M1 and Poly.



Figure 4.21 Poly and M1 grids for different Gear Ratios: 1, 2/3 and 0.5.

that compensates for the reduced drive current. In terms of area, it was possible to maintain high placement densities (up to 80%) for the 6-Tracks, obtaining a 20% core area gain versus the reference 7.5-Tracks. Scaling track height without losing routability was made possible by the adoption of a new PDN architecture with vertical local rails, and by a cell architecture without M2 power rail and an outbound rail in MINT. New EDA features were introduced to support this PDN strategy, and to improve pin access by opening M1 and MINT to the router. The concept of scaling booster was introduced and defined as "Design, Process or EDA options that when used in conjunction allow to improve PPA at IP-block level". A new 6-Tracks library with SAGC and SDB was designed and tested. The routability analysis demonstrated the possibility of achieving, for this library, an area reduction of up to 50% (i.e. a full node) compared to the reference 7.5-Tracks, assuming the same ruleset. As the wafer cost increase due to the scaling boosters was estimated to be below 10%, this approach proved to be a viable and low-cost alternative to pitch scaling. The new PDN architecture, introduced as an enabler for the 6-Tracks, was studied. A significant trade-off between routability and IR-drop was found, with up to 15% area penalty for dense PDN scenarios, corresponding to tight IR-drop requirements. The electrical impact of introducing Cobalt in the BEOL was studied. Literature data show that Cobalt offers significantly reduced EM and minimum area compared to Copper interconnects. At our N5 dimensions, replacing Copper with Cobalt in M1 has a negligible electrical impact. On the other hand, reducing the minimum area constraints on M1 from Copper (70nm) to Cobalt (35nm) assumptions was found to be a key enabler for designing and routing an area efficient 5-Tracks library, with an area gain versus the 6-Tracks of only 5% for Copper M1, and 17% for Cobalt M1 libraries. Keeping the MOL constraints unchanged compared to the 7.5 and 6-Tracks libraries, the transition to 5-Tracks implies a further fin depopulation, leaving room for only one fin per device. The PPA benchmarking of the 5-Tracks 1-Fin showed a 50% performance loss compared to the 6-Tracks 2-Fins, although for lower frequencies the 5-Tracks library is up to 20% more power efficient. On top of this, RO based variability studies highlighted an increase of more than 60% in delay variability for the 1-Fin device (versus the 2-Fin). Experiments to evaluate the electrical impact of replacing the FinFET device with a lateral nanowire device were also performed. Using lateral nanowires with three vertically stacked wires per fin yielded performance gains between 5% and 10% compared to FinFET, with 5% lower power. Reducing the number of vertically stacked nanowires to 2 causes a 50% performance loss compared to the reference FinFET scenario, but with 20% lower power in the low-frequency range. Introducing high drive cells in conjunction with physical synthesis, a 10% performance increase was obtained. However, for higher performance and power densities, the IR-routability trade-off is aggravated, and for extremely dense PDN

scenarios an area penalty of more than half a node was observed. Introducing "porous" cells can partially mitigate the impact, but a structural solution can only be provided by tightening the gear ratio between M1 and poly pitches from a unitary value to 2/3 or 0.5. The concept behind this strategy is to multiply the routing resources on M1 and make all the cells intrinsically porous to the power mesh.



# Chapter 5

## Pathfinding for below N5

In this Chapter we will explore how to enable technology nodes below N5. Our predictive N3 node targets 42nm CPP and 21nm $M_X$, and the other main process assumptions are summarized in Table 5.1. The scaling boosters and standard cell strategies adopted to design 5.5 and 4.5-Tracks libraries in N3 technology will be illustrated in Section 5.1. In Section 5.2, we will present the results of the cross-node comparison between our N5 and N3, including PPA, cost, and IR-drop analysis based on the 64-bit CPU from ARM. In Section 5.3 we will explore the P&R physical results deriving from the adoption of 4-Tracks and 3-Tracks standard cell libraries based on a new device: the Complementary FET (CFET). We will explain how the transition to this device can constitute a viable alternative for targeting a more area efficient and cost-effective node compared to the FinFET based N3. Given the area gains versus the reference N3, the CFET based node can be classified as a potential solution for an N2 technology.

Table 5.1 Main process parameters for N7, N5 and N3 nodes.

| <b>Node</b>              | <b>N7</b>         | <b>N5</b>         | <b>N3</b>         |
|--------------------------|-------------------|-------------------|-------------------|
| <b>Track Height (TH)</b> | <b>7.5</b>        | <b>6</b>          | <b>5.5</b>        |
| <b>Process parameter</b> | <b>value [nm]</b> | <b>value [nm]</b> | <b>value [nm]</b> |
| Gate length ( $L_g$ )    | 21                | 18                | 15                |
| Gate Spacer width        | 8                 | 8                 | 6                 |
| Fin Height ( $H_{fin}$ ) | 45                | 45                | 55                |
| Fin Pitch                | 30                | 24                | 21                |
| Fin Width ( $W_{fin}$ )  | 5                 | 5                 | 5                 |
| p/n separation           | 85                | 67                | 20.5              |
| $M_X$ Pitch              | 40                | 32                | 21                |
| Gate Pitch (CPP)         | 54                | 42                | 42                |

Table 5.2 Scaling boosters for N3 node.

| Scaling Booster             | Description                                                          |
|-----------------------------|----------------------------------------------------------------------|
| Super Via (SV)              | Via connecting layer $n$ with layer $n+2$                            |
| M1 Gear Ratio (GR)          | Lowering Gear Ratio of M1 versus poly                                |
| Spacer Defined Cut (SDC)    | Process flow for a more aggressive Tip-to-Tip with regular pattern   |
| Mid Track Hand Shake (MTHS) | Process flow for a more aggressive Tip-to-Tip with staggered pattern |
| Buried Power Rail (BPR)     | Power rail buried into the substrate                                 |
| Ru Interconnects            | Ruthenium as interconnect material                                   |
| Nanosheet (NSH)             | Gate all around device with enlarged width                           |
| CFET                        | 3D Device with stacked n-type and p-type devices                     |

## 5.1 Standard cells for the N3 Node

In Chapter 4 standard cell libraries of 7.5, 6 and 5-Tracks, with 3, 2 and 1 fins per device respectively, were used in an N5 technology. Here reduced track-height cells of 5.5 and 4.5 tracks will be considered, combined with a tighter metal pitch (21nm) compared to the previous node (32nm), keeping the CPP unchanged (42nm). The proposed architectures for 5.5-Tracks (subsection 5.1.2) and 4.5-Tracks (subsection 5.1.3) have 2 fins and 1 fin per device respectively. Therefore an electrically fair comparison will be between the 6-Tracks in N5 and the 5.5-Tracks in N3, and between the 5-Tracks in N5 and the 4.5-Tracks in N3. The maximum area gain at cell level that can be obtained with the new node is given by the ratio of the cell area metrics in Equation 1.2 across the two nodes. This yields a 40% area shrink for both N3 versus N5 scenarios (2-Fin and 1-Fin architectures).
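The 40% figure can be reproduced with a short calculation. Since Equation 1.2 is not repeated here, the sketch below assumes that the cell area metric is proportional to the product of CPP, metal pitch and number of tracks. This form, like the function names, is an assumption made for illustration.

```python
CPP_NM = 42  # gate pitch, unchanged between N5 and N3


def cell_area_metric(tracks, metal_pitch_nm, cpp_nm=CPP_NM):
    """Area of a one-CPP-wide cell slice: height (tracks * metal pitch) times CPP."""
    return tracks * metal_pitch_nm * cpp_nm


def area_shrink(n5_tracks, n3_tracks, n5_pitch_nm=32, n3_pitch_nm=21):
    """Fractional cell area reduction of the N3 library versus its N5 counterpart."""
    n5 = cell_area_metric(n5_tracks, n5_pitch_nm)
    n3 = cell_area_metric(n3_tracks, n3_pitch_nm)
    return 1 - n3 / n5


two_fin = area_shrink(6, 5.5)   # 6-Tracks N5 versus 5.5-Tracks N3
one_fin = area_shrink(5, 4.5)   # 5-Tracks N5 versus 4.5-Tracks N3

print(f"2-Fin scenario: {two_fin:.1%} cell area shrink")
print(f"1-Fin scenario: {one_fin:.1%} cell area shrink")
```

Under this assumption both scenarios land close to 40%, because the metal pitch reduction from 32nm to 21nm dominates over the half-track difference in cell height.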

### 5.1.1 Scaling Boosters for the N3 Node

On top of the scaling boosters presented for N5 (Section 4.1), the enablement of our N3 libraries required the usage of additional DTCO solutions that are summarized in Table 5.2 and described in the following paragraphs. In the description of the cell architectures we will highlight that unfortunately, many of these solutions are mandatory in order to allow the basic standard cell template. This is quite different from N5, where the scaling boosters were mainly optional solutions to further improve PPA.

**Super Via:** A Super Via is a via fabricated between metal layer  $n$  and metal layer  $n+2$ . The supervia structure can be realized with deep-etch technologies [104]. In Figure 5.1 the



Figure 5.1 (a) Stacked Vias with landing pad. (b) Supervia.

difference between a supervia and two vertically stacked conventional vias is illustrated. The main advantage of the supervia is that it can provide a direct connection to the upper layer without the need for a landing pad in the intermediate layer. In fact the landing pad needs to honor the minimum area and minimum run length constraints, causing increased congestion and reducing the routing resources available. This scaling booster is particularly attractive for advanced nodes, where the escalation of pin densities creates an increasing need for layer promotion and for solutions facilitating pin access. As highlighted in [104], a commercial EDA tool capable of modelling and inserting supervias is not yet available. It is intuitive that such an enhancement is quite disruptive for EDA, as all the input files and optimization engines are based on the assumption that vias and metal layers are interleaved. For these reasons, in this work supervias were used for pins inside the standard cells, and regular vias were kept for signal routing.

**M1 Gear Ratio:** The motivations for a Gear Ratio lower than 1 between M1 and poly were extensively illustrated in subsection 4.5.3. Essentially, having an M1 pitch which is tighter than the gate pitch facilitates standard cell design and improves routability by increasing the number of vertical routing tracks. A 0.5 GR between M1 and poly means that M1 tracks will alternately overlap either with M0A or with poly tracks. This is particularly convenient if used in conjunction with super via pins. In fact, combining the two scaling boosters allows S/D or gate pins to escape directly to Metal 1, as depicted in Figure 5.2. The shapes on Metal 1 will then be accessed normally by the router.

**Spacer Defined Cut:** The idea for a Spacer Defined Cut [23] [22] comes from the tight requirements on tip-to-tip that arise in M0A based on the N3 5.5-Tracks standard cell template in Figure 5.6. The T2T in the range of 10nm that is required to enable such architecture is



Figure 5.2 OAI cell in 5.5-Tracks showing the usage of SV pins



Figure 5.3 Layout of a 4.5-Tracks AOI cell using MTHS.

not achievable through conventional metal cuts. Since this cut will be aligned in the middle of the cells it was proposed to define it through an additional spacer growth step.

**Mid Track Handshake:** The process flow for the Mid Track Handshake (MTHS) is documented in [22]. From our perspective it is more important to highlight the concept and its usage in standard cell design. Designing complex cells in N3 4.5-Tracks can make it necessary to cross-connect the pull-up network and the pull-down network using the middle track, as shown in Figure 5.3. The handshake requires a tip-to-tip on M0A equal to half a MINT pitch (10.5 nm). Moreover, these tip-to-tips are not aligned and they form a staggered pattern. The process in [22] makes it possible to achieve this construct.

**Buried Power Rail:** The Buried Power Rail (BPR) was proposed in [27], [20] and [105]. This process flow is aimed to integrate the power rail in the substrate between the fins, at cell boundaries (Figure 5.4). Compared to a traditional power rail, the BPR can be engineered



Figure 5.4 Cross section of front end with Buried Power Rail. Source: [20].

with an increased aspect ratio, which reduces the resistivity without taking horizontal resources away from the cells. This solution is particularly interesting for extremely low track heights, where the usage of the BPR makes it possible to virtually recover one or more tracks, to be used for routing or inter-cell connections. The buried layer is located one layer below MINT, and will be accessed through special tap cells that provide the via between BPR and MINT. The tap cells will then be connected to the upper layers of the power grid.

**Ruthenium Interconnects:** As explained in subsection 1.2.3, barrier-less materials are being investigated in order to mitigate the exponential increase in line resistance for metal widths of 20nm and below. The intrinsic resistivity of Cobalt and Ruthenium is higher than that of Copper, but the advantage of not having a barrier causes a cross-over for sufficiently reduced line widths. According to [100], for the same trench cross-section, Ru starts to outperform Cu below 16nm widths, while Co starts to outperform Cu below 12nm widths. At our N3 metal widths (12nm), Ru lines are approximately 1.5X less resistive than Co and nearly 2X less resistive than Cu. These considerations motivate the adoption of Ruthenium interconnects as a key scaling booster for both signal lines and power and ground lines.

**Nanosheet Device:** Lateral Nanosheets have been proposed as possible candidates for beyond N7 devices [21]. These devices are Gate All Around structures with enlarged  $W_{eff}$  compared to the lateral nanowires, offering superior electrostatic control over the channel [106]. The TEM image in Figure 5.5 clarifies the concept. In this cross section three nanosheets are vertically stacked for each fin, and completely surrounded by the gate. The channels, which are running orthogonally to the plane of the picture, are enlarged in the horizontal dimension compared to the nanowires [107].



Figure 5.5 TEM Cross section of Nanosheet devices. Source: [21].

**CFET:** Section 5.3 will be entirely dedicated to this scaling booster. The CFET is essentially a 3D device where the p-type and n-type lateral nanowires are stacked.

### 5.1.2 5.5-Tracks architecture

The architecture chosen for the 5.5-Tracks cell has four internal routing tracks and double width VDD/VSS power rails, as shown in Figure 5.6. This scheme is not compatible with the traditional SAQP process, but it is compatible with SADP-EUV or LELE EUV processes. Of the four internal routing tracks, the two northern tracks can be used for the PMOS source/drain and/or gate connections, while the two southern tracks can be used for the NMOS source/drain and/or gate connections. The gate contact is based on SAGC, and can therefore be placed within the active region. Furthermore, two fins per device are used. This choice establishes a trade-off with the p/n separation at the centre of the cell (21.5 nm). Additionally, a specification of 10.5nm is imposed on the M0A T2T. This tight and regular cut was assumed to be enabled by the Spacer Assisted Cut technique. For completeness, the cross section of the standard cell template is reported in Figure 5.7, with all the relevant dimensions.

### 5.1.3 4.5-Tracks architecture

The 4.5-Tracks architecture is based on double width VDD/VSS power rails and only three internal routing tracks as shown in the top view of the standard cells template in Figure 5.8. With respect to the patterning, this scheme is compatible with both SAQP + DPT EUV cuts and DPT EUV. In this case only a single fin per device architecture is viable. Reducing the number of fins allows a relatively relaxed p/n separation of 40nm. The M0A tip-2-tip





Figure 5.8 Top view of N3 4.5-Tracks template. Source: [22]

of 17.5 nm at the border of the cell is tight but is expected to be still achievable by fine tuning the conventional cut/block techniques, while the T2T in the center of the cell is relaxed (compared to the 5.5-Tracks) to 30nm. Similar considerations can be derived from the inspection of the standard cell cross section in Figure 5.9. As will be shown in Section 5.2, the single fin device has a detrimental impact on performance. For this reason another variant of the 4.5-Tracks architecture, based on a Nanosheet device, was investigated.

### 5.1.4 4.5-Tracks architecture with Nanosheet

In [106] it was shown that the Nanosheet outperforms the single fin device at N3 dimensions. In the context of our 4.5-Tracks cell architecture, the adoption of the NSH will therefore be beneficial for performance, at the cost of more aggressive specifications on the MOL spacings, as indicated in Figure 5.10. Setting an 11nm width for the Nanosheet, the M0A T2T at the boundary of the cell again becomes 10.5nm. Unlike the 5.5-Tracks, where a uniform cut was required in the middle of the cell, in this scenario the M0A cuts are staggered, requiring a technique like the MTHS to be manufacturable. In terms of standard cell abstract, the NSH variant will be equivalent to the 1-Fin 4.5-Tracks, as the layers above MINT are not affected by the device change.

### 5.1.5 Standard cells summary and comparison

Table 5.3 provides a global summary of the usage of the scaling boosters in the different standard cell architectures for N3, comparing them to the reference N5 6-Tracks library.



Figure 5.9 Cross section for 4.5-Tracks cells. Source: [22]



Figure 5.10 Cross section for 4.5-Tracks variant with Nanosheet device. Source: [22]

Based on the CPP, the $M_X$ pitch and the track height, and using Equation 1.2, we calculated the target cell area normalized to the N5 6-Tracks library. This yields an ideal 40% area shrink for the N3 5.5-Tracks and a 50% area shrink for both N3 4.5-Tracks libraries. The N5 6-Tracks and the N3 5.5-Tracks have 2 fins per device, while a single fin or a NSH is used for the 4.5-Tracks cells. The double width power rail (21nm) used in all the N3 libraries was motivated by the need to avoid a highly resistive power rail of 10.5nm width. This choice determines a non-uniform track pattern on the horizontal $M_X$ layers (MINT and M2). The track pattern will be: 26.25nm - 3 times 21nm - 26.25nm for the 5.5-Tracks and 26.25nm - 2 times 21nm - 26.25nm for the 4.5-Tracks. While a 32nm pitch (N5) can be printed with EUV SE, moving to a 21nm pitch (N3) requires the usage of DPT EUV, either for the metal cuts of SAQP lines or directly to print the lines. The 5.5-Tracks scheme is only compatible with the second approach. For the N3 cells the major scaling boosters introduced in N5 were also adopted: the SAGC and the SDB. On top of those, the SV pins and the 0.5 GR on the vertical $M_X$ layers (M1 and M3) were used for the N3 cells. Tight tip-to-tip specifications on M0A necessitate the SDC and the MTHS for the 5.5-Tracks and 4.5-Tracks cells respectively. It was decided not to use the BPR, which was introduced with the CFET (Section 5.3). Instead, in order to reduce the resistance of the $M_X$ power and signal lines, Ruthenium interconnects were chosen.
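The area targets and track patterns quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: Equation 1.2 is not reproduced in this section, so it assumes that cell area scales as CPP times cell height, and that cell height equals the track count times the $M_X$ pitch. All variable and function names are hypothetical.

```python
# Illustrative check of cell heights and ideal area scaling.
# Assumption (not Equation 1.2 itself): area ~ CPP * cell height,
# with cell height = number of tracks * Mx pitch. CPP is 42nm for all
# libraries (Table 5.3), so it cancels out in the normalization.

MX_PITCH_NM = {"N5": 32.0, "N3": 21.0}  # tightest horizontal metal pitch

LIBRARIES = {
    "N5 6-Tracks": ("N5", 6.0),
    "N3 5.5-Tracks": ("N3", 5.5),
    "N3 4.5-Tracks": ("N3", 4.5),
}

# Non-uniform track patterns quoted in the text (double width power rails).
TRACK_PATTERNS_NM = {
    "N3 5.5-Tracks": [26.25, 21.0, 21.0, 21.0, 26.25],
    "N3 4.5-Tracks": [26.25, 21.0, 21.0, 26.25],
}


def cell_height_nm(node: str, tracks: float) -> float:
    """Cell height as track count times the Mx pitch of the node."""
    return tracks * MX_PITCH_NM[node]


heights = {name: cell_height_nm(node, tracks)
           for name, (node, tracks) in LIBRARIES.items()}
reference = heights["N5 6-Tracks"]
normalized_area = {name: h / reference for name, h in heights.items()}

for name in LIBRARIES:
    print(f"{name}: height = {heights[name]:.1f} nm, "
          f"normalized area = {normalized_area[name]:.2f}")

# The non-uniform track patterns must add up to the same cell heights.
for name, pattern in TRACK_PATTERNS_NM.items():
    print(f"{name}: pattern sum = {sum(pattern):.1f} nm")
```

Under these assumptions the heights come out as 192nm, 115.5nm and 94.5nm, in line with Table 5.4, and the normalized areas round to the 0.6 and 0.5 targets of Table 5.3.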

## 5.2 Cross Node Comparison Between N5 and N3

In this section a P&R comparison between the N5 and N3 libraries in Table 5.3 will be presented. The benchmarking is based on the 64-Bit CPU from ARM. Examining the cells from a P&R angle, we realize that the width stays unchanged across the three libraries (due to the CPP saturation), while the cell height is dramatically reduced in N3 due to the combined effect of $M_X$ pitch and track-height reduction. This, together with the reduced M1 GR for the N3 cells, translates into 2X the vertical routing resources compared to N5, while the horizontal resources decrease according to the track-height. We finally highlight that the presence of the SV pins makes it impossible to use MINT for routing extensions, while the M1 routing is enabled in both nodes, providing intra-cell and short-range connectivity. Table 5.4 summarizes and quantifies these observations.

Before examining the results of the cross node comparison in detail, we want to highlight that, given the paramount importance of the IR-drop awareness demonstrated in Chapter 4, a fair PPA benchmarking of N3 versus N5 needs to be driven by an IR-drop evaluation. The methodology that we propose is based on the interdependencies that are qualitatively described in Figure 5.11. The PDN spacing (1) is tightly coupled to routability

|                              | N5 (reference) | N3         |                              |                              |
|------------------------------|----------------|------------|------------------------------|------------------------------|
| Feature                      | 6-Tracks       | 5.5-Tracks | 4.5-Tracks                   | 4.5-Tracks (NSH)             |
| CPP [nm]                     | 42             | 42         | 42                           | 42                           |
| Mx [nm]                      | 32             | 21         | 21                           | 21                           |
| Normalized Cell Area         | 1              | 0.6        | 0.5                          | 0.5                          |
| #of Fins/device              | 2              | 2          | 1                            | 1                            |
| Patterning                   | EUV SE         | EUV DPT    | SAQP + EUV DPT CUTS/ EUV DPT | SAQP + EUV DPT CUTS/ EUV DPT |
| Non-Uniform M0 (and M2) grid | ☒              | ☒          | ☒                            | ☒                            |
| SDB                          | ☒              | ☒          | ☒                            | ☒                            |
| SAGC                         | ☒              | ☒          | ☒                            | ☒                            |
| Super Via Pins               | ☒              | ☒          | ☒                            | ☒                            |
| GR 0.5 of M1 (and M3)        | ☒              | ☒          | ☒                            | ☒                            |
| MTHS                         | ☒              | ☒          | ☒                            | ☒                            |
| SDC                          | ☒              | ☒          | ☒                            | ☒                            |
| BPR                          | ☒              | ☒          | ☒                            | ☒                            |
| Interconnects                | Co/Cu          | Ru         | Ru                           | Ru                           |
| NSH                          | ☒              | ☒          | ☒                            | ☒                            |

Table 5.3 Comparative table showing Scaling boosters usage and technology features across different N5 and N3 libraries.

|                                           | N5 6-Tracks | N3 5.5-Tracks | N3 4.5-Tracks |
|-------------------------------------------|-------------|---------------|---------------|
| Normalized Cell Width                     | 1           | 1             | 1             |
| Cell Height [nm]                          | 192         | 115.5         | 94.5          |
| # of Mx Vertical Tracks per CPP           | 1           | 2             | 2             |
| # of Mx Horizontal Tracks per cell height | 6           | 5.5           | 4.5           |
| M0 Open to router                         | Yes         | No            | No            |
| M1 Open to router                         | Yes         | Yes           | Yes           |

Table 5.4 P&R related considerations on N5 and N3 libraries.

and hence determines core area. A higher performance target (3) also affects core area (2), mainly through an increased area allocated for buffers and inverters. Performance target is obviously correlated to total power (4), which divided by core area determines power density (5). An increased power density causes more severe IR-drop (6). If the target IR-drop is not met, a tighter PDN spacing needs to be selected, re-spinning the whole loop. This complexity was addressed with a 4-step approach structured as follows:

- **Step1:** A physical only analysis is performed at low frequencies (or even disabling timing). Placement density is swept for multiple PDN dimensions in order to identify the maximum density and the minimum core area in each PDN scenario. Following the numbering scheme in Figure 5.11, we want to isolate the impact of (1) on (2).
- **Step2:** It is a frequency sweep as described in subsection 2.3.3. The dependencies of (4) and (2) on (3) are identified.
- **Step3:** It is an IR-drop evaluation (as explained in 2.2.4) aimed at verifying the compliance with the IR-drop specification for different power densities. This step covers the arc from (5) to (6), and triggers the arc from (6) to (1) for the scenarios that do not meet the IR target.
- **Step4:** Once the appropriate PDN dimension has been selected for each scenario, a final frequency sweep is launched. Unlike in Step2, the PDN will now be properly dimensioned for each node-library-performance scenario, making an IR-aware PPA comparison possible.

Although in this work we only considered the impact of Static and Dynamic IR-drop on power and ground nets, in advanced nodes other effects have become increasingly dominant, severely affecting design closure: Signal IR, Electromigration (EM), both for VDD/VSS lines and for signals, and Self Heating Effect (SHE). All these parameters are correlated to power density, which is therefore to be considered a metric of prominent importance.

### 5.2.1 N3 patterning and ruleset

The transition to more aggressive pitches in N3  $M_X$  layers requires more aggressive lithography compared to N5, where EUV single exposure was considered viable for the tightest metal pitch (32nm) and for the tightest C2C via spacing (42nm). Based on the current resolution of EUV systems, N3 dimensions require intensive usage of EUV double patterning for both metal and via layers. Table 5.5 shows the delta in the patterning assumptions between N5 and



Figure 5.11 Centrality of IR-drop assessment for a fair PPA benchmarking across nodes and technology options.

N3. Basically all the $M_X$ layers in N3 require either SAQP lines + double patterning EUV for the metal cuts or EUV DPT for the lines. $V_X$ layers in N3 are placed on a 21nm x 21nm grid. A C2C spacing of 21nm is then needed, requiring the transition to DPT EUV also for the vias.

In Table 5.6 the main rules used for SAQP + DPT EUV cuts are documented. The block width (Bx.W), which defines the tip to tip, is equal to 18nm, and the block extends to the adjacent track with a 42nm length. The block spacing rules (Bx.S) are color selective, and the two colors do not interact (Bx.S.4). Moreover the color of the block needs to match the color of the metal (Bx.C). From basic geometrical considerations we can derive that Bx.S.4 and Bx.S.2 will always be honoured, as by construction those spacings are larger than 42nm. The crucial rules will therefore be Bx.S.1 and Bx.S.3, which determine respectively the edge to edge spacing on the same track (32nm) and two tracks away (21nm). While in M2 and M3 the pattern of the blocks will be staggered as depicted in Table 5.6, on M1 the metal cuts will be more regular and aligned at cell boundary, so that the Bx.S.1 and Bx.S.3 rules are also automatically fulfilled. For these reasons, as the physical analysis in subsection 5.2.4 confirmed, M1 is expected to be easily routable, while the congestion and DRC issues will mainly affect M2.

### 5.2.2 BEOL in N3

As explained in the introduction of this chapter and in [100], Ruthenium interconnects were chosen for N3, since they outperform both Copper and Cobalt in terms of line and via resistance for the metal widths considered. Table 5.7 reports the main deltas in the BEOL assumptions for the  $M_X$  layers between N5 and N3. The metal width was shrunk from 16 to 10.5nm keeping the aspect ratio constant (1.5). Also the dielectric constant ( $k$ ) is roughly

| Layer      | N5 Pitch / C2C [nm] | N5 Patterning | N3 Pitch / C2C [nm] | N3 Patterning                            |
|------------|---------------------|---------------|---------------------|------------------------------------------|
| <b>M0</b>  | 32                  | EUV           | 21                  | SAQP + DPT EUV cuts / EUV DPT + Supervia |
| <b>V01</b> | 42                  | EUV           | 21                  | DPT EUV                                  |
| <b>M1</b>  | 42                  | EUV           | 21                  | SAQP + DPT EUV cuts / EUV DPT            |
| <b>V12</b> | 42                  | EUV           | 21                  | DPT EUV                                  |
| <b>M2</b>  | 32                  | EUV           | 21                  | SAQP + DPT EUV cuts / EUV DPT            |
| <b>V23</b> | 42                  | EUV           | 21                  | DPT EUV                                  |
| <b>M3</b>  | 32                  | EUV           | 21                  | SAQP + DPT EUV cuts / EUV DPT            |

Table 5.5 Delta in patterning assumptions between N5 and N3.

| Rule                           | Description                                      | Operator | Value [um]    | Figure |
|--------------------------------|--------------------------------------------------|----------|---------------|--------|
| <b>Mx: SAQP + DPT EUV CUTS</b> |                                                  |          |               |        |
| <b>Mx.W</b>                    | Metal Width                                      | =        | 0.012         |        |
| <b>Mx.S</b>                    | Metal Spacing                                    | =        | 0.009         |        |
| <b>Mx.L</b>                    | Minimum Run Length                               | >=       | 0.032         |        |
| <b>Bx.W</b>                    | Block Width                                      | =        | 0.018         |        |
| <b>Bx.L.1</b>                  | Block Length                                     | =        | 0.042         |        |
| <b>Bx.L.2</b>                  | Maximum Merged Block Length                      | <=       | 1             |        |
| <b>Bx.S.1</b>                  | Same color spacing on same track                 | >=       | 0.032         |        |
| <b>Bx.S.2</b>                  | Same color diagonal spacing on other track       | >=       | 0.032         |        |
| <b>Bx.S.3</b>                  | Same color diagonal spacing With PRL = 0         | >=       | 0.021         |        |
| <b>Bx.S.4</b>                  | Same color spacing on other track with alignment | >=       | 0.021         |        |
| <b>Bx.S.4</b>                  | Different Color Spacing                          | >=       | No Constraint |        |
| <b>Bx.C</b>                    | Block Color                                      | =        | Metal Color   |        |



Table 5.6 Ruleset for N3 SAQP patterning + DPT EUV cuts.

Table 5.7 Main deltas in BEOL for N5 and N3  $M_X$  layers.

|                                                    | N5                   |             | N3                      |
|----------------------------------------------------|----------------------|-------------|-------------------------|
| <b>Material</b>                                    | Cobalt ( <i>Co</i> ) | Copper (Cu) | Ruthenium ( <i>Ru</i> ) |
| <b>Thickness [nm]</b>                              | 24                   | 24          | 16                      |
| <b>Width [nm]</b>                                  | 16                   | 16          | 10.5                    |
| <b>Aspect Ratio</b>                                | 1.5                  | 1.5         | 1.5                     |
| <i>k</i>                                           | 4.2                  | 2.8         | 3                       |
| <b>Line Resistance [$\Omega/\mu m$]</b>            | 670                  | 444         | 995                     |

unchanged in N3. In spite of the improvements achievable with Ruthenium, the cross node comparison of line resistance in Table 5.7 highlights a more than 2X increase for the 10.5nm wide lines compared to the 16nm wide Copper lines, and a roughly 1.5X increase compared to the 16nm wide Cobalt lines. This problem motivated the re-design of the PDN as described in subsection 5.2.3.
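To separate the material contribution from the geometric one, the line resistances in Table 5.7 can be converted into an effective resistivity. The sketch below is a back-of-the-envelope estimate only: it assumes a rectangular cross-section of width times thickness that conducts over its whole area, so the barrier and liner of the Cu and Co lines are folded into the effective value. The names used are hypothetical.

```python
# Effective resistivity from the line resistance values of Table 5.7.
# Assumption: R_per_length = rho_eff / (width * thickness), i.e. a plain
# rectangular conductor; barrier/liner effects end up inside rho_eff.

LINES = {
    # material: (line resistance [Ohm/um], width [nm], thickness [nm])
    "Co (N5)": (670.0, 16.0, 24.0),
    "Cu (N5)": (444.0, 16.0, 24.0),
    "Ru (N3)": (995.0, 10.5, 16.0),
}


def cross_section_nm2(width_nm: float, thickness_nm: float) -> float:
    """Rectangular cross-section area in nm^2."""
    return width_nm * thickness_nm


def effective_resistivity_uohm_cm(r_ohm_per_um: float,
                                  width_nm: float,
                                  thickness_nm: float) -> float:
    """rho_eff = R' * A, converted from Ohm/um * nm^2 to uOhm*cm.

    1 Ohm/um * 1 nm^2 = 1e-12 Ohm*m = 1e-4 uOhm*cm.
    """
    return r_ohm_per_um * cross_section_nm2(width_nm, thickness_nm) * 1e-4


rho_eff = {name: effective_resistivity_uohm_cm(*values)
           for name, values in LINES.items()}
area = {name: cross_section_nm2(w, t) for name, (_, w, t) in LINES.items()}

for name in LINES:
    print(f"{name}: area = {area[name]:.0f} nm^2, "
          f"rho_eff = {rho_eff[name]:.1f} uOhm*cm")
```

With these assumptions the effective resistivity of the 10.5nm Ru line is of the same order as that of the 16nm Cu line, while the cross-section shrinks from 384 nm² to 168 nm². In this simplified view the higher N3 line resistance is therefore mostly a geometric effect.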

### 5.2.3 Power mesh in N3

In Section 4.1 the PDN introduced for N5 was illustrated and the reasons for its adoption were justified. Figure 5.12 shows layer by layer the modifications introduced in the $M_X$ layers of the power mesh for N3, compared to the reference N5 PDN. In N5 we had moved to a single CD (16nm) power rail on MINT that is accessed through 21nm wide vertical stripes in M1 and M3, which run in parallel and are connected through M2 minimum area staples. In the N3 architecture we propose a double CD (21nm) power rail on MINT in order to avoid a 10.5nm rail (Figure 5.12 (a)). For the same reasons we opted for replacing the local vertical rails (M1 and M3) with power staples (Figure 5.12 (c)). In order to keep this PDN viable, we need to intensify the access to the M0 rail with a tighter PDN spacing. It is intuitive from Figure 5.12 (b) that in N3 technology the 0.5 GR that was chosen in M1 and M3 allows for an increased porosity of the standard cells, which will often have one free track between neighbouring M1 shapes. We will demonstrate in subsection 5.2.4 how this native porosity enables efficient P&R even with PDN spacings down to 8CPP (330nm). Additionally, the usage of Ruthenium as material for the $M_X$ interconnects helps to further reduce the resistance of the local MINT rail, which is particularly critical in the N3 architecture. Figure 5.12 (d) shows that, given the power staples on M1 and the standard cell M1 shapes, it was decided to align the metal cuts at cell boundary and in the middle of the cells, which makes the M1 layer extremely regular.



Figure 5.12 Comparison between Power delivery network in N5 and N3. Deltas shown for different layers and views. (a) Mint layer with only standard cells shapes. (b) M1 layer with power and ground nets and standard cells shapes visible. (c) Power mesh in  $M_X$  layers. (d) power mesh, standard cells and signals in M1.

### 5.2.4 Physical results

The physical comparison between N5 and N3 extends the study in subsection 4.5.3, adding the maximum placement density, cell area and core area for the N3 libraries. As previously done in N5, the libraries were tested in four different PDN scenarios, in order to quantify the impact of power mesh tightening. The results reported in Table 5.8 clearly demonstrate that the native porosity introduced in the N3 libraries through the 0.5 GR between M1 and poly makes it possible to achieve high placement densities even for PDN spacings down to 8CPP (0.3um). In fact the 5.5-Tracks library is routable up to 80% density in a 12CPP (0.5um) spacing scenario, with only 2.5% placement density degradation in the 8CPP scenario. The sensitivity of core area to the power mesh is therefore dramatically reduced compared to N5, where for spacings below 1um the increased usage of porous cells caused core area losses between 20 and 30% compared to the 2um spacing scenario. For the 4.5-Tracks library a maximum density of 77.5% was obtained with the most relaxed PDN spacing, with a density loss of 5% for the tightest spacing. The 4.5-Tracks library is therefore slightly less routable and more sensitive to the PDN tightening compared to the 5.5-Tracks. Moreover the cell level area scaling is ideal for the 5.5-Tracks, with 40% area shrink compared to the N5 6-Tracks library, while for the 4.5-Tracks the ideal 50% scaling given by Equation 1.2 is not fully met, due to some standard cells not scaling in an ideal way. The combination of these two issues limits the core area gains that can be achieved with the 4.5-Tracks architecture, with a scaling factor between 0.54 and 0.57 compared to the reference N5 with 2um spacing. As explained before, in order to make a fair area comparison between the two nodes and between the different libraries in N3, we need to understand which columns in Table 5.8 have to be selected and compared for an iso IR comparison. This is made possible by the electrical analysis presented in the next subsection.
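The relationship between the three metrics in Table 5.8 can be illustrated with a short calculation. The sketch below assumes, as a simplification that is not stated explicitly in the text, that core area equals total cell area divided by placement density, normalized to the N5 6-Tracks run with 48CPP spacing. Function and variable names are hypothetical.

```python
# Sketch: normalized core area from normalized cell area and max density.
# Assumption: core area ~ cell area / placement density, normalized to the
# N5 6-Tracks 48CPP scenario (cell area 1, density 0.8) of Table 5.8.

REF_CELL_AREA = 1.0
REF_DENSITY = 0.8


def normalized_core_area(cell_area: float, density: float) -> float:
    """Core area relative to the N5 6-Tracks 48CPP reference."""
    return (cell_area / density) / (REF_CELL_AREA / REF_DENSITY)


# (normalized cell area, max placement density) taken from Table 5.8.
SCENARIOS = {
    "N5 6-Tracks, 24CPP": (1.00, 0.700),
    "N5 6-Tracks (porous), 12CPP": (1.20, 0.775),
    "N3 5.5-Tracks, 8CPP": (0.60, 0.775),
    "N3 4.5-Tracks, 48CPP": (0.52, 0.775),
    "N3 4.5-Tracks, 8CPP": (0.52, 0.725),
}

core_area = {name: normalized_core_area(cell, dens)
             for name, (cell, dens) in SCENARIOS.items()}

for name, value in core_area.items():
    print(f"{name}: normalized core area = {value:.2f}")
```

Under this assumption the computed values round to the core areas listed in Table 5.8 (1.14, 1.24, 0.62, 0.54 and 0.57 for the scenarios above), which suggests that cell area and achievable density are the two dominant contributions to the core area differences.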

### 5.2.5 Electrical Results

As described in the introduction of this chapter, the first frequency sweep was performed with the most relaxed PDN spacing, and the procedure in subsection 2.3.3 was used. In Figure 5.13 the following timing metrics are compared for the different nodes and libraries: the Worst Negative Slack (WNS) as a percentage of the clock period, and the percentage of failing paths. The thresholds adopted as timing closure criteria were 10% for the first metric and 2% for the second. Based on these conventional limits and the analysis of Figure 5.13, we can observe that the N5 6-Tracks 2-Fins, the N3 5.5-Tracks 2-Fins and the N3 4.5-Tracks 1-NSH exhibit similar performance, with the N3 NSH option showing slightly better timing.

|                            |                              | PDN spacing  |              |                   |                   |
|----------------------------|------------------------------|--------------|--------------|-------------------|-------------------|
|                            |                              | 48-CPP (2um) | 24-CPP (1um) | 12-CPP (0.5um)    | 8-CPP (0.3um)     |
| N5<br>6-Tracks             | <b>Max placement density</b> | 0.8          | 0.7          | <i>Unroutable</i> | <i>Unroutable</i> |
|                            | <b>Normalized Cell area</b>  | 1            | 1            | n/a               | n/a               |
|                            | <b>Normalized Core area</b>  | 1            | 1.14         | n/a               | n/a               |
| N5<br>6-Tracks<br>(porous) | <b>Max placement density</b> | n/a          | 0.775        | 0.775             | 0.775*            |
|                            | <b>Normalized Cell area</b>  | n/a          | 1.10         | 1.20              | 1.30*             |
|                            | <b>Normalized Core area</b>  | n/a          | 1.14         | 1.24              | 1.34*             |
| N3<br>5.5-Tracks           | <b>Max placement density</b> | 0.8          | 0.8          | 0.8               | 0.775             |
|                            | <b>Normalized Cell area</b>  | 0.60         | 0.60         | 0.60              | 0.60              |
|                            | <b>Normalized Core area</b>  | 0.60         | 0.60         | 0.60              | 0.62              |
| N3<br>4.5-Tracks           | <b>Max placement density</b> | 0.775        | 0.775        | 0.75              | 0.725             |
|                            | <b>Normalized Cell area</b>  | 0.52         | 0.52         | 0.52              | 0.52              |
|                            | <b>Normalized Core area</b>  | 0.54         | 0.54         | 0.55              | 0.57              |

Table 5.8 Maximum placement density, normalized cell and core areas for different PDN scenarios in N5 and N3. Areas are normalized to the N5 6-Tracks with 48CPP spacing.

\*Numbers for the 8CPP scenario are extrapolated in N5.

On the other hand, the 4.5-Tracks library with 1-Fin device shows a performance loss of at least 12.5% compared to the other options.

Figure 5.14 shows the normalized power, the power gain and the power density increase compared to N5 across the frequency sweep. In the first plot (Figure 5.14(a)), the normalized power curves indicate the expected linear increase versus frequency. The power gain versus N5 is different for each of the N3 libraries at the different frequency points. This information is more clearly quantified in Figure 5.14(b). The 5.5-Tracks 2-Fins library provides power gains between 35% and 25%. The 4.5-Tracks 1-Fin library proved to be the most power efficient, with power gains up to 45% at low frequencies. The 4.5-Tracks with NSH option proved to be the best compromise between area, power and performance, with increased performance compared to the 1-Fin option and lower power compared to the 5.5-Tracks 2-Fins. The power gain of the NSH versus N5 is close to 35% for all frequency points. Target density was progressively decreased for all runs with increasing frequency, as explained in subsection 2.3.3. The area gain of the 5.5-Tracks and 4.5-Tracks versus N5 is therefore constant across the sweep, with core area scaling of 0.6 and 0.54 respectively (Table 5.8). Since area scaling is larger than power scaling in all N3 runs, a power density increase is expected compared to N5. Figure 5.14(c) illustrates that the power density increase is between 5% and 10% for the 1-Fin option at low



Figure 5.13 Performance comparison between reference N5 and N3 libraries. Horizontal dashed lines indicate timing closure thresholds.

frequencies, between 10% and 20% for the 5.5-Tracks 2-Fin, and between 20% and 25% for the 4.5-Tracks with NSH.

In order to find out the PDN requirements for the different power densities in the two nodes, the plots in Figure 5.15 were assembled. In these plots, the dynamic vectorless IR drop on the $M_X$ layers was calculated as a function of power density for the N5 PDN (Figure 5.15(a)) and for the N3 PDN (Figure 5.15(b)). For each node a family of curves corresponding to different PDN spacings (48CPP, 24CPP, 12CPP and 8CPP) was considered. In the N5 PDN the 8CPP spacing was not investigated, since a major re-design of the library would have been needed, as explained in subsection 4.5.3. The values of IR drop are reported as a percentage of VDD and refer to the 99th percentile value, as in subsection 4.1.4. The IR drop target on the $M_X$ layers was set to 2.5% of VDD, and it is indicated by the horizontal dashed lines. Given a certain value of power density, the PDN can be selected as the one with the largest spacing below the IR target. The target density is decreased accordingly, based on the values in Table 5.8. Moreover the target density is decreased by 2.5% for each frequency step. As a result, at the second iteration of the frequency sweep, the power densities will be lower than the ones calculated at the first iteration, which guarantees the convergence of the flow in Figure 5.11. Table 5.9 documents the power densities, the appropriate PDN spacings and the final target densities for all the runs of the cross node comparison. The final PPAC comparison is reported and discussed in the next subsection (5.2.6).
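The selection rule described above can be summarized in a few lines of code. The sketch below is a minimal illustration, not the flow actually used: the IR-drop values are hypothetical placeholders standing in for a reading of the curves in Figure 5.15, the 2.5% density decrement is interpreted as an absolute step of 0.025, and all names are invented for the example. Only the maximum densities are taken from Table 5.8 (N3 5.5-Tracks).

```python
# Sketch of the PDN selection and target density rule.
# The IR-drop numbers below are HYPOTHETICAL, for illustration only.
from typing import Dict, Optional

IR_TARGET_PCT = 2.5  # IR drop target on the Mx layers, as % of VDD


def select_pdn_spacing(ir_by_spacing_pct: Dict[int, float],
                       target_pct: float = IR_TARGET_PCT) -> Optional[int]:
    """Return the largest PDN spacing (in CPP) that meets the IR target."""
    compliant = [cpp for cpp, ir in ir_by_spacing_pct.items()
                 if ir <= target_pct]
    return max(compliant) if compliant else None


def target_density(max_density: float, frequency_step: int,
                   decrement: float = 0.025) -> float:
    """Max density for the chosen PDN, lowered at every frequency step."""
    return round(max_density - decrement * frequency_step, 3)


# Max placement density per PDN spacing, N3 5.5-Tracks (Table 5.8).
MAX_DENSITY_N3_55 = {48: 0.8, 24: 0.8, 12: 0.8, 8: 0.775}

# Hypothetical IR drop (% of VDD) at one power density, per PDN spacing.
hypothetical_ir = {48: 4.1, 24: 2.9, 12: 1.8, 8: 1.2}

spacing = select_pdn_spacing(hypothetical_ir)
density = target_density(MAX_DENSITY_N3_55[spacing], frequency_step=1)
print(f"Selected PDN spacing: {spacing}CPP, target density: {density}")
```

With the placeholder values, the 48CPP and 24CPP meshes exceed the 2.5% target, so the 12CPP mesh is selected, and the target density for the second frequency point becomes 0.775. If no spacing meets the target, the function returns `None`, which corresponds to a scenario requiring a PDN or library re-design.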



Figure 5.14 Comparison of power related metrics between reference N5 and N3 libraries. (a) Normalized power curves for all libraries. (b) Power Gain of N3 runs versus N5. (c) Power density increase in N3 runs versus corresponding N5 runs.



Figure 5.15 Values of IR at 99% percentile on  $M_X$  layers versus normalized power for different PDN dimensions. (a) N5 power mesh. (b) N3 power mesh.

|                            | Power density |         |           |        |
|----------------------------|---------------|---------|-----------|--------|
|                            | 50%fmax       | 75%fmax | 87.5%fmax | fmax   |
| <b>N5 6-Tracks 2Fins</b>   | 0.42          | 0.61    | 0.71      | 0.80   |
| <b>N3 5.5-Tracks 2Fins</b> | 0.47          | 0.70    | 0.84      | 0.97   |
| <b>N3 4.5-Tracks 1Fin</b>  | 0.45          | 0.67    | 0.84      | n/a    |
| <b>N3 4.5-Tracks 1NSH</b>  | 0.51          | 0.74    | 0.88      | 1.00   |
| Selected PDN               |               |         |           |        |
| <b>N5 6-Tracks 2Fins</b>   | 24CPP*        | 24CPP*  | 12CPP*    | 12CPP* |
| <b>N3 5.5-Tracks 2Fins</b> | 24CPP         | 12CPP   | 12CPP     | 12CPP  |
| <b>N3 4.5-Tracks 1Fin</b>  | 24CPP         | 12CPP   | 12CPP     | n/a    |
| <b>N3 4.5-Tracks 1NSH</b>  | 12CPP         | 12CPP   | 12CPP     | 12CPP  |
| Target density             |               |         |           |        |
| <b>N5 6-Tracks 2Fins</b>   | 0.775         | 0.750   | 0.725     | 0.700  |
| <b>N3 5.5-Tracks 2Fins</b> | 0.800         | 0.775   | 0.750     | 0.725  |
| <b>N3 4.5-Tracks 1Fin</b>  | 0.775         | 0.750   | 0.725     | n/a    |
| <b>N3 4.5-Tracks 1NSH</b>  | 0.750         | 0.725   | 0.700     | 0.675  |

Table 5.9 Power density values, driving the selection of the PDN. The selected PDN and the target frequency determine the target density. \*In N5 PDN spacings below 24CPP require the introduction of porous cells.
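As a cross-check of the power density trends, the increase of each N3 library versus N5 can be computed directly from the first block of Table 5.9. The sketch below assumes that these values come from the same frequency sweep as Figure 5.14(c); since only ratios are used, the normalization of the table does not matter. Names are hypothetical.

```python
# Power density increase of the N3 libraries versus N5, from Table 5.9.
# Assumption: the table values belong to the same runs as Figure 5.14(c).

FREQ_POINTS = ["50%fmax", "75%fmax", "87.5%fmax", "fmax"]

N5_POWER_DENSITY = [0.42, 0.61, 0.71, 0.80]

N3_POWER_DENSITY = {
    "N3 5.5-Tracks 2Fins": [0.47, 0.70, 0.84, 0.97],
    "N3 4.5-Tracks 1Fin": [0.45, 0.67, 0.84, None],  # fmax not reached
    "N3 4.5-Tracks 1NSH": [0.51, 0.74, 0.88, 1.00],
}


def increase_pct(n3_value, n5_value):
    """Relative power density increase in percent, or None if missing."""
    if n3_value is None:
        return None
    return (n3_value / n5_value - 1.0) * 100.0


increase = {
    lib: [increase_pct(n3, n5) for n3, n5 in zip(values, N5_POWER_DENSITY)]
    for lib, values in N3_POWER_DENSITY.items()
}

for lib, values in increase.items():
    cells = ", ".join(
        f"{point}: {'n/a' if v is None else f'{v:.1f}%'}"
        for point, v in zip(FREQ_POINTS, values)
    )
    print(f"{lib} -> {cells}")
```

The resulting increases are roughly 7% to 10% for the 1-Fin option at the two lowest frequency points and between about 21% and 25% for the NSH option, consistent with the ranges quoted for Figure 5.14(c).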

### 5.2.6 PPAC Summary

The final PPA comparison is presented in Figure 5.16, where the normalized area and power are indicated by the histogram, and the power and area gain versus N5 for each frequency point is highlighted by the table. The area histograms for N5 show a 17% area loss from 50% $f_{max}$ to $f_{max}$ due to buffer insertion (7%) and PDN tightening (10%). The N3 runs are more area efficient in an iso-IR comparison compared to what was derived by the physical analysis (Table 5.8). In fact the N3 runs are less sensitive to the PDN and do not require porous cells, making it possible to obtain area gains up to 52% and 55% (rather than 40% and 46% in Table 5.8) for the 5.5-Tracks and the 4.5-Tracks respectively. We also notice that the area gain increases for higher performance implementations, where N5 is more impacted by the sensitivity to the PDN. At the lowest frequency the best implementation is the 4.5-Tracks 1-Fin, which optimizes all the metrics with 44% power gain and 53% area gain versus N5. However the 1-Fin option is 12% slower compared to the other N3 libraries, which are iso-performance compared to N5. At $f_{max}$ the 4.5-Tracks with NSH option is the optimal configuration, with 35% power gain and 55% area gain versus N5. Nevertheless the power and area gains (28% and 52%) of the 5.5-Tracks 2-Fin option are quite close to those of the 4.5-Tracks NSH, with reduced process complexity. Although the physical and electrical analysis conducted in this chapter has demonstrated promising results, the main problem in the transition to the new node is cost. A detailed cost analysis for our N3 dimensions is given in [108]. This work indicates that the massive introduction of DPT EUV on the $M_X$ layers, as illustrated in subsection 5.2.1, causes a wafer cost increase in the range of 40% compared to N5 dimensions. Therefore, even considering a full node area gain across the two nodes, the die cost reduction in the new node is lower than 10%.
The adoption of DPT EUV at N3 dimensions is currently unavoidable considering the resolution of the state of the art machines. As explained in [108], a second generation of EUV lithography systems is being deployed in order to increase the Numerical Aperture (NA) from 0.33 to 0.5. This High-NA EUV could make it possible to print N3 dimensions with single exposure, enabling a more cost-effective N3. The change in the patterning assumptions would require a re-spin of the whole DTCO loop. While these solutions are being investigated, new avenues have to be explored, as in section 5.3 and chapter 6.

## 5.3 CFET as alternative solution (N2)

In this section a physical P&R comparison based on the ARM 64-bit CPU will be presented for CFET based libraries. 4-Tracks and 3-Tracks libraries were designed and tested against the FinFET based libraries in section 5.2. The purpose of the section is to investigate the benefits of the CFET using the same P&R ruleset as N3.

Figure 5.16 Final PPA summary of the cross node comparison.

### 5.3.1 Introduction to CFET. From the device to P&R

The concept of the Complementary FET (CFET) device and the process integration flow proposed to manufacture it are described in [109]. Figure 5.17 clarifies the structure of the CFET through different views. Figure 5.17 (a) shows that the idea comes from folding the nFET on top of the pFET (or vice versa) in a conventional 2D FinFET device. This results in the structure in Figure 5.17 (b), where the z-dimension is used in order to save area. The pFET and the nFET share the same gate. The Buried Power Rail (BPR) is used in this configuration, and super vias are also needed in order to connect the bottom device to Metal 0 and the top device to the BPR. These constructs are also visible in the cross section in Figure 5.17 (c).

The comparison of the process features and scaling boosters in the N3 FinFET and the CFET libraries is provided by Table 5.10, where the N3 5.5-Tracks library is considered as reference. Two CFET libraries with 4-Tracks and 3-Tracks were designed keeping the same CPP and $M_X$ pitches of N3. As a consequence, the cell level area gain for the CFET libraries is simply given by the ratio of the track heights. The 4-Tracks architecture allows 2 Fins per device, while the 3-Tracks architecture allows 1 Fin per device. With respect to the patterning assumptions for the $M_X$ layers (and above), we want to keep them unchanged compared to section 5.2. On top of the scaling boosters used in N3, the BPR was introduced in order to enable the extreme track height reduction of the CFET libraries. With the BPR the non-uniform grid on the horizontal $M_X$ layers is avoided, since the multi CD power rail is pushed into a lower layer. In the vertical $M_X$ layers (M1 and M3) the 0.5 GR is not a hard constraint for the CFET, and a pitch relaxation to 2/3 GR can be tested.

Figure 5.17 CFET Device views. (a) The idea comes from "folding" the nFET and the pFET. (b) 3D view of the nFET stacked on top of the pFET. (c) cross section of the CFET.

Figure 5.18 Comparison of an AOI cell in N3 5.5-Tracks, and CFET libraries. Dimensions are in scale.

Figure 5.18 shows the comparison of an AOI cell between the three libraries. In general, from the comparative analysis of standard cells in the CFET library and in the reference N3 cells, several advantages emerged for the CFET architecture. These advantages mainly derive from the increased connectivity between the pMOS and nMOS, which is intrinsically provided by a 3D device. In fact, although super vias are used in the CFET, they are not in the layers that are part of the standard cell abstract. This avoids the super via pins that were present in the N3 cells and prevented routing in MINT. Also the 0.5 GR between poly and M1 (and M3) is no longer required by construction. Additionally, in the greatest part of the standard cells it was possible to avoid utilizing M1 for internal connections. The resulting depopulation of M1 obstructions translates into more routing resources and degrees of freedom available to the router in this layer. These advantages will have to be used to mitigate the decrease in the horizontal routing resources. Across the three libraries, the width of the cells stays unchanged because the same CPP is used. On the other hand, the cell height is scaled proportionally to the track height, reaching extremely reduced values in the CFET cells as indicated in Table 5.11. In the vertical $M_X$ layers we actually want to attempt to relax the GR from 0.5 to 2/3, which causes a corresponding reduction of the routing resources.

| Feature                      | N3 5.5-Tracks | N2 CFET 4-Tracks | N2 CFET 3-Tracks |
|------------------------------|---------------|------------------|------------------|
| CPP [nm]                     | 42            | 42               | 42               |
| $M_X$ pitch [nm]             | 21            | 21               | 21               |
| Normalized Cell Area         | 1             | 0.73             | 0.55             |
| # of Fins/device             | 2             | 2                | 1                |
| Patterning                   | EUV DPT       | EUV DPT          | EUV DPT          |
| Non-Uniform M0 (and M2) grid | ☒             | ☒                | ☒                |
| SDB                          | ☒             | ☒                | ☒                |
| SAGC                         | ☒             | ☒                | ☒                |
| Super Via MOL                | ☒             | ☒                | ☒                |
| GR 0.5 of M1 (and M3)        | ☒             | ?                | ?                |
| MTHS                         | ☒             | ☒                | ☒                |
| SDC                          | ☒             | ☒                | ☒                |
| BPR                          | ☒             | ☒                | ☒                |
| Interconnects                | Ru            | Ru               | Ru               |
| CFET                         | ☒             | ☒                | ☒                |

Table 5.10 Comparative table showing Scaling boosters usage and technology features in reference N3 and CFET libraries.

|                                                   | N3 5.5-Tracks | CFET 4-Tracks | CFET 3-Tracks |
|---------------------------------------------------|---------------|---------------|---------------|
| **Normalized Cell Width**                         | 1             | 1             | 1             |
| **Cell Height [nm]**                              | 115.5         | 84            | 63            |
| **# of $M_X$ Vertical Tracks per CPP**            | 2             | 2             | ?             |
| **# of $M_X$ Horizontal Tracks per cell height**  | 5.5           | 4             | 3             |
| **M0 Open to router**                             | No            | Yes           | Yes           |
| **M1 Depopulated**                                | No            | Yes           | Yes           |

Table 5.11 P&R related considerations on N3 and CFET libraries.
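As a quick cross-check of Tables 5.10 and 5.11, the cell heights and the ideal normalized cell areas can be reproduced from the track count and the $M_X$ pitch. The snippet below is a minimal sketch: the function names are illustrative, and taking the number of vertical tracks per CPP as the inverse of the gear ratio is an assumption of this sketch rather than a value reported in the tables.

```python
from fractions import Fraction

CPP_NM = 42        # contacted poly pitch, from Table 5.10
MX_PITCH_NM = 21   # horizontal M_X pitch, from Table 5.10


def cell_height_nm(tracks, mx_pitch_nm=MX_PITCH_NM):
    """Cell height = number of horizontal M_X tracks x M_X pitch."""
    return tracks * mx_pitch_nm


def normalized_cell_area(tracks, ref_tracks=5.5):
    """With CPP and M_X pitch unchanged, the area ratio is the track ratio."""
    return tracks / ref_tracks


def vertical_pitch_nm(gear_ratio, cpp_nm=CPP_NM):
    """Vertical M_X pitch implied by a gear ratio relative to poly."""
    return float(gear_ratio * cpp_nm)


def vertical_tracks_per_cpp(gear_ratio):
    """Assumption of this sketch: tracks per CPP = 1 / gear ratio."""
    return float(1 / gear_ratio)


libraries = [("N3 5.5-Tracks", 5.5), ("CFET 4-Tracks", 4), ("CFET 3-Tracks", 3)]
for name, tracks in libraries:
    print(name, cell_height_nm(tracks), "nm", round(normalized_cell_area(tracks), 2))

for gr in (Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    print(gr, vertical_pitch_nm(gr), "nm", vertical_tracks_per_cpp(gr))
```

The first loop gives cell heights of 115.5nm, 84nm and 63nm with area ratios of 1, 0.73 and 0.55. The second loop gives the 21nm, 28nm and 42nm pitches corresponding to gear ratios of 0.5, 2/3 and 1.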

Our approach to enable an efficient P&R for the CFET was to leverage its advantages to counterbalance this extreme depopulation of routing resources. The P&R strategies that were used are listed below:

- Extend MINT with the router to allow more flexibility in intercepting free M1 tracks.
- Leverage the Depopulated M1 for more unconstrained routing.
- Allow a denser pin access scheme.
- Leverage the buried power rail in order to reduce the routing resources consumed by the PDN.

**Extend MINT with the router:** Unlike in the FinFET based N3, in the CFET implementation it is possible to allow routing in MINT due to the absence of SV pins. From the visual inspection of Figure 5.19 we can verify that the feature of extending the pins in MINT is heavily used by the tool in conjunction with M1 routing. In Figure 5.19(b) we also observe that the extensions from the tool are quite dense and can be relatively long, showing that pin access challenges can be solved by escaping the pins to free M1 tracks that are relatively far away.

Figure 5.19 Views of MINT layer in implementation with CFET libraries. (a) MINT shapes in the standard cells. (b) MINT extensions from the router. (c) Pins + extensions.

**Leverage the Depopulated M1:** As previously mentioned, the increased connectivity offered by the CFET made it possible to design the greatest part of the cells without using M1. Unfortunately the most complex cells (e.g. Flip Flop, Full Adder) still needed M1 shapes to be completed. The top view of the IP block in Figure 5.20 shows the M1 depopulation using the CFET library (b), compared to the N3 implementation (a) where nearly the whole M1 area is obstructed. In the CFET approximately half of the M1 area will be available for signals, the exact number depending on the usage of complex cells.

**Denser Pin access:** In N3, given the amount of obstruction in the cells, M1 was very regular and the metal cuts were aligned on a grid at the cell boundary (and in the middle of the cells for the power staples). In the CFET scenario we opted for not aligning the metal cuts on a grid, allowing them to be sparse, as shown in Figure 5.21. In fact we are forced to move to sparse metal cuts if we want to allow lines that are two tracks apart to intercept the most outbound pin in MINT from different sides of the cell boundary. This construct is widely used, and by construction also requires "kissing corners" that were not legal in our N3 ruleset. In other words, following the terminology of subsection 5.2.1, Bx.S.3 needs to be reduced to zero for a feasible P&R. The reason why it is unavoidable to have line ends coming from opposite directions at the cell boundary is that this configuration can be forced by a via down to the MINT pins underneath, or by a via up to the M2 layer, which is parallel to MINT.



Figure 5.20 Top view of IP block showing M1 obstructions. (a) In N3. (b) In the CFET implementation.

Given the importance of this rule to proceed in the analysis, this construct was legalized for the CFET, keeping all the other rules unchanged compared to N3. In the CFET we can also fully leverage in M1 the reduced minimum Run Length that we assumed for barrierless materials, in order to intensify the pin access and pick up more than one pin per cell height, while in N3 the pin access scheme was limited to one per cell height for every M1 track.

**Buried Power Rail:** The final enabler for an efficient P&R in the CFET libraries is the BPR. The BPR uses an additional layer that is located at the level of the fins, and can be made thicker and wider, offering reduced resistance compared to the rail in MINT. From an EDA point of view this layer can be modeled as an extra layer below MINT that is not accessible by the router. The BPR is connected to the upper layers of the power grid through special TAP cells containing the vias to access this layer. These cells can be arranged into columns, as shown in Figure 5.22, forming a vertical power rail or power staples that can be connected to the rest of the grid as in N3. According to the estimates in [27], the decreased resistance of this layer can allow a significant relaxation of the spacing between the M1 rails/staples, with a 64CPP spacing required to match the IR drop figures of a 12CPP spacing PDN in N3. As a consequence the routing resources dedicated to the PDN are dramatically decreased and placement legalization is facilitated.
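To give a feel for the numbers, the PDN spacings can be converted from CPP units to microns, and the relaxation enabled by the BPR can be expressed as a reduction in the number of M1 rails/staples across a block. This is only a sketch: the 42nm CPP is the value used for the N3 and CFET libraries, while the helper names and the 100um block width are hypothetical.

```python
CPP_NM = 42  # contacted poly pitch shared by the N3 and CFET libraries


def pdn_spacing_um(spacing_cpp, cpp_nm=CPP_NM):
    """Convert a PDN spacing expressed in CPP into microns."""
    return spacing_cpp * cpp_nm / 1000.0


def spacings_across(width_um, spacing_cpp, cpp_nm=CPP_NM):
    """Number of whole PDN spacings that fit across a block of the given width."""
    return int(width_um // pdn_spacing_um(spacing_cpp, cpp_nm))


n3_spacing = pdn_spacing_um(12)    # 0.504 um
cfet_spacing = pdn_spacing_um(64)  # 2.688 um
relaxation = 64 / 12               # about 5.3x fewer M1 rails/staples

block_width_um = 100.0  # hypothetical block width, for illustration only
print(n3_spacing, cfet_spacing, round(relaxation, 1))
print(spacings_across(block_width_um, 12), spacings_across(block_width_um, 64))
```

Under these assumptions the same block width needs roughly five times fewer vertical rails/staples with the BPR, which is where the saving in M1 routing resources comes from.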

### 5.3.2 Physical Results

Table 5.12 shows the main physical metrics for our CFET runs, comparing them to the reference N3. Initially, relaxing the M1 and M3 pitch to a unitary gear ratio with poly (42nm pitch) was attempted, but routability was extremely low, with the 4-Tracks library unroutable beyond 60% density, and the 3-Tracks run not converging. These values are of course unacceptable, as the placement density degradation cancels the area gains from the standard cells. The M1 and M3 pitch was then changed to 2/3 GR, corresponding to a metal pitch of 28nm. This value is considered at the limit of single exposure EUV, and is currently expected to require EUV DPT. However, if pitch relaxation guarantees routability, it can still be useful to improve reliability, decrease RC, and relax the via grid. With the 2/3 GR in the vertical $M_X$ layers we were able to route the CFET 4-Tracks with only 2.5% placement density loss compared to the reference N3, and to achieve a reasonable (70%) density even in the 3-Tracks scenario. Since the cell level scaling is nearly ideal for both CFET libraries, the cell level gain was smoothly converted into core area gain for the CFET 4-Tracks, with a half-node area shrink (-25%) compared to the baseline N3. The 3-Tracks core area gain is decreased by a 10% density loss, providing a final area reduction of 35% versus the N3 reference.

Figure 5.21 M1 view for CFET implementation showing sparse metal cuts with "kissing corners" and denser pin access.

Figure 5.22 Buried Power Rail mesh in PnR.

| Node                  | N3             | CFET           | CFET           | CFET           | CFET           |
|-----------------------|----------------|----------------|----------------|----------------|----------------|
| Library               | 5.5-Tracks     | 4-Tracks       | 3-Tracks       | 4-Tracks       | 3-Tracks       |
| PDN spacing           | 0.5um (12·CPP) | 2.6um (64·CPP) | 2.6um (64·CPP) | 2.6um (64·CPP) | 2.6um (64·CPP) |
| M1 GR                 | 0.5            | 0.5            | 1              | 2/3            | 2/3            |
| Max Placement Density | 80%            | 60%            | Unroutable     | 77.5%          | 70%            |
| Normalized Cell Area  | 1              | 0.73           | 0.58           | 0.73           | 0.58           |
| Normalized Core Area  | 1              | 0.97           | n/a            | 0.74           | 0.65           |

Table 5.12 Comparison of main physical metrics between reference N3 and CFET runs.
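The interplay between cell level gain, placement density and core area can be approximated to first order as: normalized core area ≈ normalized cell area × (reference density / achieved density). The sketch below applies this to the values of Table 5.12. It is an approximation assumed here for illustration, not the procedure used to generate the table, so small deviations from the reported core areas are expected.

```python
def normalized_core_area(norm_cell_area, density, ref_density=0.80):
    """First-order estimate: core area scales with cell area / placement density.

    ref_density is the max placement density of the N3 reference (80%).
    """
    return norm_cell_area * ref_density / density


runs = {
    "CFET 4-Tracks, 60% density": (0.73, 0.60),
    "CFET 4-Tracks, 77.5% density": (0.73, 0.775),
    "CFET 3-Tracks, 70% density": (0.58, 0.70),
}
for label, (cell_area, density) in runs.items():
    print(label, round(normalized_core_area(cell_area, density), 2))
```

The estimates (about 0.97, 0.75 and 0.66) are close to the 0.97, 0.74 and 0.65 reported in Table 5.12. They show how a drop to 60% density almost entirely cancels the cell level gain of the 4-Tracks library.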

Two other interesting metrics are compared in Figure 5.23: wire length and via count. Total wire length in the CFET runs is reduced by 10% for the 4-Tracks and by 15% for the 3-Tracks. The wire length distributions by layer show that this gain mainly derives from the $M_X$ layers, especially M1 and M3, where the GR was successfully relaxed compared to N3. Another remarkable improvement in the CFET runs is the massive reduction of the via count in the V0 and V1 layers, which translates into a 30% reduction in the total via count for both CFET libraries versus N3. The lower via count is primarily caused by the reduced number of vias (V0 and V1) used in the standard cells. From a qualitative point of view, a reduction in the wire length for the most capacitive layers results in an RC reduction, positively affecting both power and performance. A lower via count instead contributes to a yield improvement.

Figure 5.23 Comparison of wire length and via distributions between reference N3 and CFET runs.

### 5.3.3 Final Comparison

Table 5.13 provides a summary of the comparison between N3 and CFET based runs. Since the physical analysis demonstrated the possibility of achieving at least a half-node area gain with the same (or cheaper) BEOL with the CFET, this device can be categorized as a possible option for an N2 node, or as a more cost effective alternative for N3. An early electrical evaluation based on a ring oscillator was shown in [109]. In that work, the benchmarking of the CFET showed similar performance and power figures compared to N3, which makes it currently more attractive for area and cost benefits than for electrical reasons.

|                                    | N3 5.5-Tracks | CFET 4-Tracks | CFET 3-Tracks |
|------------------------------------|---------------|---------------|---------------|
| **Advantages**                     |               |               |               |
| **Super Via Pins**                 | Yes           | No            | No            |
| **GR M1, M3 (pitch)**              | 0.5 (21nm)    | 2/3 (28nm)    | 2/3 (28nm)    |
| **Normalized Core Area**           | 1             | 0.75          | 0.65          |
| **Challenges**                     |               |               |               |
| **Buried Power Rail**              | No            | Yes           | Yes           |
| **Device**                         | FF            | CFET          | CFET          |
| **Metal Cuts "Kissing" Corners**   | No            | Yes           | Yes           |

Table 5.13 Final comparison table between CFET and reference N3.

## 5.4 Summary and conclusions

The dimensions for a predictive N3 node were proposed, and the PDK components were generated. Multiple scaling boosters for N3 were illustrated, including: super via, M1 gear ratio, Spacer Defined Cut, Mid Track Handshake, Buried Power Rail, Ruthenium interconnects, Nanosheet device. Based on our N3 ruleset, standard cell libraries with two track heights were designed: 5.5-Tracks and 4.5-Tracks. For the 4.5-Tracks, keeping the same standard cell abstract, two devices were evaluated: the 1-Fin FinFET and the lateral Nanosheet. The standard cell architecture and requirements were documented for all these libraries, and compared against the reference N5 6-Tracks. The deltas in the patterning, ruleset and BEOL assumptions between N5 and N3 were clarified, and a cross node PPAC comparison was performed. In order to make this comparison fair, the benchmarking was made IR-drop aware through an algorithmic procedure, which allows PPA figures to be benchmarked "iso-IR".

In the N3 node the 0.5 GR between poly and M1 dramatically decreased the sensitivity of routability to PDN density. In fact for the tightest PDN scenario as little as 2.5% core area increase is observed (for the 5.5-Tracks library), while in the reference N5 area penalties between 20% and 30% had been found. Area gains up to 52% and 55% were shown for the 5.5-Tracks and 4.5-Tracks respectively. At the lowest frequency the 4.5-Tracks 1-Fin is the best option with 44% power gain and 53% area gain versus N5. However the 1-Fin option showed a performance loss in the range of 12% compared to the other N3 libraries, which exhibited similar performance to N5. At maximum frequency the 4.5-Tracks with NSH option is the optimal configuration with 35% power gain and 55% area gain versus N5. Nevertheless the power and area gains (28% and 52%) of the 5.5-Tracks 2-Fin option are quite close to those of the 4.5-Tracks NSH, with reduced process complexity, which makes the value proposition for the 4.5-Tracks less clear. Most importantly, the heavy usage of EUV double patterning in N3 causes a wafer cost increase in the range of 40% compared to N5, allowing only a die cost reduction below 10%, even in the best case area scaling scenario.

In order to find a more cost effective solution a radically new device was explored: the Complementary FET. An introduction to CFET was provided, from the device concept to the standard cells and P&R aspects. The routability tests on CFET libraries with 4 and 3 tracks demonstrated the possibility of achieving at least half a node area gain versus the N3 5.5-Tracks library, using the same (or cheaper) BEOL assumptions. For this reason this device was inserted into the roadmap as a possible option for an N2 node.



# Chapter 6

## Towards STCO

## 6.1 Motivations and Introduction to STCO

We have seen in the previous Chapters that, as conventional scaling driven only by lithography was made infeasible by multiple limitations, industry migrated to a DTCO approach. In this work we highlighted the importance of enacting the DTCO at IP-block level, based on post Place and Route assessments. The results presented in Chapter 5 for our predictive N3 (and CFET based N2), although showing promising data, clearly highlight that new avenues have to be explored in order to guarantee a cost effective scaling roadmap for N3 and beyond. In [23] imec proposed a new strategy to maintain an affordable scaling trend: the System Technology Co-Optimization (STCO). According to this idea, lithography driven scaling and DTCO should be extended by a system level evaluation that takes into account the PPAC bottlenecks at SoC level, in order to find technology solutions that improve them. Figure 6.1 shows the timeline for the insertion of past and future nodes, categorizing them within the three eras of scaling: lithography driven, DTCO driven and STCO driven. STCO is a superset of the other two approaches, which are not meant to be abandoned, but are considered insufficient to keep Moore's law sustainable. The workhorses of the STCO concept are:

- Introduction of higher resolution lithography (High-NA EUV).
- Identification of SoC bottlenecks and of technology solutions that act as scaling boosters for each block of the SoC and its infrastructure (section 6.2).
- Adoption of radically new devices, differentiated for the requirements of each SoC block. These heterogeneous technologies should be co-integrated on the same chip or through 3D integration techniques (section 6.3).
- In the longer term, technologies enabling newer (non Von Neumann) computing paradigms (section 6.4).

Figure 6.1 STCO Roadmap proposed by imec.

The purpose of this Chapter is not to cover in detail all these domains, each of which would require a specific book, but to illustrate at high level the opportunities of a system level approach to scaling. Before going through the details, a simple consideration on the area split at SoC level, and the node over node scaling of the main blocks, can further justify the need for STCO. Figure 6.2, from [23], shows the area split for a high performance SoC between Ultra High Density (UHD) standard cells, High Density (HD) memories, Analog and IO. According to this data, the analog and IO blocks constitute approximately 30% of the SoC area, and they are basically non scalable in an advanced nodes context. This makes these blocks more and more dominant compared to the SRAM, CPU and GPU blocks. Addressing these types of challenges will be the common thread of the next sections. Overall the STCO can be considered a step up and a continuation of the work presented in the previous chapters of this thesis.
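The impact of the non scalable blocks can be illustrated with a simple Amdahl-style estimate. The sketch below assumes that 30% of the SoC area (analog and IO) does not shrink, while the remaining 70% shrinks by a fixed area factor at every node. The 0.5 factor and the function name are hypothetical choices for illustration, not data from [23].

```python
def soc_area_after_nodes(non_scalable_fraction, scaling_factor, nodes):
    """Return (total area, non-scalable share) after a number of node transitions.

    Areas are normalized to the initial SoC area. Only the scalable part is
    multiplied by scaling_factor at every node; the rest stays constant.
    """
    scalable = (1.0 - non_scalable_fraction) * scaling_factor ** nodes
    total = non_scalable_fraction + scalable
    return total, non_scalable_fraction / total


for n in range(4):
    total, share = soc_area_after_nodes(0.30, 0.5, n)  # 0.5 is a hypothetical factor
    print(n, round(total, 3), round(share, 3))
```

Under these assumptions a 50% shrink of the scalable blocks reduces the total SoC area by only 35% after one node, and the analog and IO share grows from 30% to about 46%.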

## 6.2 STCO Boosters

### 6.2.1 High-NA EUV

While EUV systems with 0.33 Numerical Aperture (NA) lenses are ready to start high volume manufacturing (HVM), tool vendors are ramping up EUV systems with 0.55 NA [110]. These machines will be equipped with novel anamorphic lenses capable of reaching resolutions below 10nm. The introduction of High-NA EUV in the mid-term is primarily motivated by the economic implications of scaling. In [108] the advantages of High-NA EUV versus double patterning EUV were quantified, based on imec's cost model. The analysis projected relevant gains for High-NA EUV over EUV DPT with respect to the following metrics: number of process steps, tool time required to process the modules, and module cost. However the wafer cost reduction compared to the EUV DPT scenario is limited to between 10% and 15%. As mentioned in subsection 5.2.6, the EUV DPT scenario determined a 40% node over node wafer cost increase for our N3 versus N5, and therefore even the estimate with High-NA EUV does not fully match the requirements of Moore's law. Moreover, the same work reported an expected cost increase of the High-NA EUV machines between 1.6X and 2X compared to the 0.33 NA machines. Finally the roadmap for the insertion of High-NA EUV into HVM is not stabilized. These considerations justify the exploration of scaling paths that do not directly rely on the uncertain availability of a more aggressive lithography platform.

Figure 6.2 Area pie for a high performance SoC. Adapted from: [23].
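Combining the two figures quoted above gives a rough idea of the residual cost increase. The following sketch assumes that the 10% to 15% High-NA saving applies multiplicatively to the N3 EUV DPT wafer cost, which is a simplification made here and not a statement from [108]. The function name is illustrative.

```python
def wafer_cost_vs_n5(dpt_increase, high_na_saving):
    """N3 wafer cost relative to N5 when High-NA replaces EUV DPT.

    dpt_increase: node over node wafer cost increase with EUV DPT (0.40 = +40%).
    high_na_saving: wafer cost reduction of High-NA versus EUV DPT.
    Assumes the saving applies multiplicatively (simplification).
    """
    return (1.0 + dpt_increase) * (1.0 - high_na_saving)


for saving in (0.10, 0.15):
    print(saving, round(wafer_cost_vs_n5(0.40, saving), 2))
```

With these inputs the N3 wafer cost would still be roughly 19% to 26% higher than N5, which is consistent with the observation that High-NA alone does not restore the historical cost trend.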

### 6.2.2 SRAM scaling

In chapters 3, 4 and 5 we have shown the PPA results deriving from the benchmarking of IP-blocks where SRAM memory modules were absent or had been detached. Of course this is a simplification that can be useful to separately evaluate the impact of the technology solutions on logic and memory. Each of the scaling boosters presented in the previous chapters has a different PPA impact on SRAMs compared to logic, and a detailed analysis of those trends would require a separate work completely focused on SRAM scaling. In [24] and [111] a summary of the DTCO work done at imec on SRAMs for nodes beyond N7 is provided. The scaling trend shown in Figure 6.3 indicates a node over node scaling factor of 0.625, i.e. a smaller area gain than the one found for logic in this work. As a result the SRAM area progressively becomes more dominant with respect to logic. Additionally SRAM scaling in FinFET technology is proportional to the product of CPP and Fin Pitch, and it does not benefit from track height reduction [24]. Figure 6.4 shows the top view of the ARM 64-bit CPU including the SRAM modules in our predictive N5 technology, for a 7.5-Tracks library and a 6-Tracks library with SAGC. Since the area of the SRAM modules is not affected by track height reduction, keeping the same relative position for the memories makes the 6-Tracks floorplan more congested than the 7.5-Tracks one. In fact the channels created by the rectilinear shapes corresponding to the memory macros are narrower in the 6-Tracks scenario, which is therefore more challenging for routability and timing optimization. This example highlights the added value of considering the impact of the technology solutions on logic and memory together, which might modify the conclusions compared to a more simplistic decoupled evaluation.

Figure 6.3 High Density (111) and High Performance (112/122) bit cell scaling versus technology node. Source: [24]

### 6.2.3 Emerging Non Volatile Memories

Conventional memories are rapidly approaching the physical limits of scalability [25]. Traditional volatile and non volatile memories like DRAM and NAND flash are based on the capacitance from an electrical charge stored in the memory cell. This approach, for state of the art nodes, faces severe challenges in achieving sufficient charge to guarantee the sensing margins [112]. For these reasons new resistance based Non Volatile Memories (NVM) such as Magnetic RAM (MRAM) and Resistive RAM (ReRAM) have undergone intensive investigation, and are considered the leading candidates to replace conventional DRAM and flash memories. MRAM uses a tunneling resistance that depends on the relative magnetization directions of ferromagnetic electrodes [113]. In Figure 6.5 the MRAM memory cell is shown, consisting of a pinned ferromagnetic electrode layer, the tunneling oxide and a free ferromagnetic electrode layer. When the magnetizations of the free and pinned layers are parallel (Figure 6.5 (a)), the Magnetic Tunnel Junction (MTJ) stack exhibits low resistance ($R_p$). When the two magnetizations are antiparallel the resistance of the stack is high ($R_{ap}$). The ratio $\frac{R_{ap}-R_p}{R_p}$ defines the sensing ratio between the 1 and 0 states. ReRAM is based on a three-layer structure of top electrode, switching medium and bottom electrode (Figure 6.6). In this case the resistive switching mechanism is based on the formation of a filament in the switching medium when a voltage is applied between the two electrodes. Table 6.1 compares the main characteristics of emerging and mainstream technologies. MRAMs are considered attractive as a replacement for DRAM and the last level cache SRAMs [33], maintaining low programming voltage, fast read/write speed and long endurance. ReRAMs already outperform flash memories in most of the metrics, with superior speed, lower power and better endurance. MRAM and ReRAM can be integrated in a modified BEOL that is compatible with the standard back end.

Figure 6.4 Floorplans in imec N5. (a) 7.5-Tracks library. (b) 6-Tracks library with SAGC.

As emerging NVM technologies progress, a complex system level evaluation is therefore necessary in order to assess how to re-organize the physical design and the memory hierarchy, optimizing the SoC PPA.
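The MRAM sensing ratio defined in this subsection (commonly referred to as the tunnel magnetoresistance ratio) can be computed directly from the two resistance states. The resistance values in this sketch are hypothetical and only serve to illustrate the formula.

```python
def sensing_ratio(r_ap, r_p):
    """Return (R_ap - R_p) / R_p for an MTJ stack."""
    if r_p <= 0:
        raise ValueError("R_p must be positive")
    return (r_ap - r_p) / r_p


# Hypothetical resistances in ohms, chosen only for illustration.
r_p, r_ap = 5000.0, 12500.0
print(sensing_ratio(r_ap, r_p))  # 1.5, i.e. 150%
```

A larger ratio means a wider separation between the two resistance states, and therefore a larger sensing margin for the read circuitry.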



Figure 6.5 Magnetic RAM (MRAM) basic operational scheme. (a) Parallel magnetization between the free and pinned layers results in low resistance ( $R_p$ ). (b) Magnetization that is not parallel yields high resistance ( $R_{ap}$ ). Source: [25].



Figure 6.6 Cross section of a ReRAM memory cell.

Table 6.1 Comparative table for mainstream and emerging memories. Source: [33]. F is the minimum feature size.

| Metric           | SRAM     | DRAM   | NAND Flash | MRAM    | ReRAM   |
|------------------|----------|--------|------------|---------|---------|
| Cell Area        | $100F^2$ | $6F^2$ | $4F^2$     | $6F^2$  | $4F^2$  |
| Multibit         | 1        | 1      | 3          | 1       | 2       |
| Voltage          | 1V       | 1V     | 10V        | 1.5V    | 3V      |
| Read Time        | 1ns      | 10ns   | 10us       | 10ns    | 10ns    |
| Write Time       | 1ns      | 10ns   | 100us      | 10ns    | 10ns    |
| Retention        | n/a      | 64ms   | 10y        | 10y     | 10y     |
| Endurance        | $1E16$   | $1E16$ | $1E4$      | $1E15$  | $1E12$  |
| Write Energy/bit | fJ       | $10fJ$ | $10fJ$     | $0.1pJ$ | $0.1pJ$ |



Figure 6.7 Comparison of I-V characteristics of IGZO and Si transistor. Source: [26].

### 6.2.4 IGZO as Power Switch Off Cells

In [114] a process to fabricate amorphous Indium Gallium Zinc Oxide (IGZO) thin-film transistors in the back end of a standard CMOS BEOL was demonstrated. As shown by the I-V curves in Figure 6.7, these thin film devices have $I_{off}$ currents that are roughly 10 orders of magnitude lower than Silicon based transistors, but $I_{on}$ currents that are between 10 and 100 times lower. This $I_{on}$ degradation prevents their usage for logic, but other interesting use cases can be found. One of the most effective power management techniques is power gating, which uses Power Switch Off (PSO) cells in order to power down specific blocks in standby mode, preventing them from consuming leakage power. According to [115] the area penalty due to PSO cells can be up to 15% of the core area. The schematic of these cells is shown in Figure 6.8. It essentially consists of a transistor that couples or decouples the power gated VDD rail and the regular VDD rail based on a sleep signal. Since the top level part of the PDN is typically in the upper layers of the BEOL, we might think of a scenario where the area burden due to the PSO cells is removed from the front end and the power gating is directly implemented in the back end through the usage of IGZO transistors. Using the IMD or the ILD of the topmost metal layers would allow these devices to be enlarged to compensate for the reduced $I_{on}$, without causing an area penalty or subtracting resources from signal routing.



Figure 6.8 Schematic of a PSO cell.

### 6.2.5 AirGap in BEOL

In subsection 1.2.3 we highlighted that low-k materials serve to reduce the parasitic capacitance of IC interconnects. In [116] and [117] process techniques to use an airgap as dielectric were shown. In theory, using air as dielectric would allow a $k$ of 1, but due to non-idealities, values in the range of 2 have been reported in literature, which are still roughly 20% lower than state of the art ultra low-k dielectrics ($k=2.4$). This technology is easier to integrate into the $M_Z$ layers, where pitches are coarser. As documented in Table 6.2, we set up two runs with modified BEOL in N5 and N3, in order to test the possible benefits of introducing Airgap in the $M_Z$ layers, under the ideal $k=1$ assumption. No evident timing benefits were observed, and even the power benchmarking showed limited gains, as reported in Figure 6.9. This plot shows, for each node, power related metrics normalized to the respective values in the non-airgap implementation. We can see that the reduction of the $k$ value in $M_Z$ causes a wire capacitance reduction in the range of 10% for both nodes. However, Pin Capacitance and Internal Power are not affected by the modification and stay unchanged compared to the non-airgap runs. The dynamic power is a function of both pin and wire capacitance. Since these two capacitances are similar in our design, the gain in dynamic power is roughly a factor of 2 smaller than the wire capacitance gain. Total power is also approximately evenly split between dynamic and internal power, which makes the gain in total power roughly a factor of 4 smaller than the gain in wire capacitance, with final power gains between 2% and 3%. Through an IP-block level analysis it was therefore not possible to find an interesting use case for this technology.
However, stepping up to a system level analysis would allow taking into account the top levels of the clock distribution network, where for very high performance designs signal integrity issues often become the bottleneck [118]. In this context, benchmarking a full high performance SoC with global interconnects would most probably reveal a use case for this technology, by considering aspects that cannot be detected at IP-block level.
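The dilution of the wire capacitance gain described above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model: it assumes that wire and pin capacitance contribute exactly equally to the switched capacitance and that dynamic and internal power split the total exactly 50/50, whereas the text states both only approximately. The function name and its parameters are illustrative.

```python
def power_gain_from_wire_cap(wire_cap_gain,
                             wire_share_of_cap=0.5,
                             dynamic_share_of_power=0.5):
    """Propagate a relative wire-capacitance reduction to power gains.

    wire_cap_gain          -- fractional wire capacitance reduction (0.10 = 10%)
    wire_share_of_cap      -- assumed share of wire cap in the switched capacitance
    dynamic_share_of_power -- assumed share of dynamic power in total power
    """
    # Pin capacitance is unchanged, so only the wire share of the
    # switched capacitance benefits from the lower k value.
    dynamic_gain = wire_cap_gain * wire_share_of_cap
    # Internal power is unchanged, so only the dynamic share of the
    # total power benefits.
    total_gain = dynamic_gain * dynamic_share_of_power
    return dynamic_gain, total_gain


dynamic_gain, total_gain = power_gain_from_wire_cap(0.10)
print(f"dynamic power gain: {dynamic_gain:.1%}")
print(f"total power gain:   {total_gain:.1%}")
```

Under these assumptions a 10% wire capacitance reduction translates into a 5% dynamic power gain and a 2.5% total power gain, consistent with the 2% to 3% range observed in the experiments.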

| Design        | ARM 64 bit CPU |            |
|---------------|----------------|------------|
| Technology    | N5             | N3         |
| Track-Height  | 6 Tracks       | 4.5 Tracks |
| Airgap Layers | Mz k=1         | Mz k=1     |
| Device Option | 2-Fin          | 1-NSH      |
| VDD           | 0.65V          | 0.65V      |
| Frequency     | fmax           | fmax       |

Table 6.2 Setup of the experiments with Airgap introduction in  $M_Z$  layers.



Figure 6.9 Power figures showing gains of Airgap in  $M_Z$  layers.



Figure 6.10 Cross section showing backside PDN concept. Source: [27].

### 6.2.6 Backside PDN

In [27], besides the Buried Power Rail (BPR), a more radical and long term solution was proposed: the Backside PDN. Figure 6.10 illustrates this concept through a cross section of the FEOL. In this scheme the BPR, rather than being connected to the higher levels of a conventional PDN through special TAP cells, is accessed through micro Through Silicon Vias (u-TSVs) that come from the back of the substrate (hence the name backside). To make this possible, wafer thinning techniques [119] will have to be used, and high aspect ratio TSVs with a pitch comparable to the cell height will be required. Since state of the art TSV pitches are above 1um [120], while cell heights for our predictive N5 and N3 nodes are between 250nm and 100nm, a significant technology challenge must clearly be overcome to make this technique viable for advanced nodes, where it is most needed to mitigate the tradeoff between routability and IR-drop. The advantage of the backside PDN is the complete removal of the power mesh from the upper layers, where typically one or more layers are entirely allocated to it. Moreover, the congestion and IR-drop due to the PDN in the $M_Z$ and $M_Y$ layers are also eliminated. Finally, as shown in Figure 6.10, the presence of VDD and GND planes below the substrate could allow the integration of large Metal-Insulator-Metal (MIM) capacitors, further reducing dynamic IR-drop. It is clear that the insertion of this scaling booster needs to be evaluated at system level, as having the global VDD and VSS on the backside needs to be studied in conjunction with novel packaging approaches.
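To give a feeling for the size of the TSV pitch gap mentioned above, the following sketch simply divides the quoted TSV pitch by the quoted cell heights. The 1um value is the lower bound given in the text for state of the art TSVs, and the two cell heights are the extremes of the range quoted for the predictive N5 and N3 libraries; the function name is illustrative.

```python
TSV_PITCH_NM = 1000.0                 # lower bound of state of the art TSV pitch
CELL_HEIGHT_RANGE_NM = (250.0, 100.0)  # tallest and shortest cell height quoted


def pitch_shrink_needed(tsv_pitch_nm, cell_height_nm):
    """Factor by which the TSV pitch must shrink to match the cell height."""
    return tsv_pitch_nm / cell_height_nm


for height in CELL_HEIGHT_RANGE_NM:
    factor = pitch_shrink_needed(TSV_PITCH_NM, height)
    print(f"cell height {height:.0f}nm -> TSV pitch must shrink by at least {factor:.0f}x")
```

With these numbers the TSV pitch has to be reduced by a factor between 4 and 10, and by more if the starting pitch is larger than 1um.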



Figure 6.11 Bulk mobility of Si, Ge, and III-V materials with their respective energy bandgap, where empty and solid symbols are used for hole and electron respectively. Source: [28].

## 6.3 Overview on Emerging Devices in the context of Hybrid Scaling

### 6.3.1 III-V Materials

Enhancing transistor performance has become increasingly difficult at nanoscale dimensions due to the tradeoffs explained in subsection 1.2.2. One possible way out is to introduce new channel materials with increased carrier mobility, which could in principle allow further voltage scaling or increased performance. A promising family of materials is that of the so-called III-V compound semiconductors [121]. These materials are obtained by combining group III elements (e.g. Al, Ga, In) with group V elements (e.g. N, P, As, Sb). Figure 6.11 shows electron and hole mobility for Silicon, Germanium and widely studied III-V semiconductor materials. Electron mobility can increase by more than one order of magnitude, while the advantages for hole mobility are less clear. The benefits of III-V High Electron Mobility Transistors (HEMTs) for RF, mm-Wave and high power applications have been extensively shown in literature [122], [123], [124]. On the other hand, the opportunity to adopt III-V materials to replace Silicon in digital logic is still debated, and for dimensions corresponding to state of the art nodes no significant electrical advantages have been found so far [125], [126].



Figure 6.12 3D view of a Vertical FET. Source: [29].

### 6.3.2 Vertical FET (VFET)

The work in [19] is entirely devoted to a study of the Vertical FET (VFET) device, based on imec process assumptions for the N7 and N5 nodes. In Equation 1.3 we saw how CPP scaling is limited by the constraints on gate length. The optimal solution to decouple CPP and $L_g$ would be to rotate the channel, making it perpendicular to the wafer surface. The resulting structure is called Vertical FET, and its 3D view is shown in Figure 6.12. The top electrode can be directly connected to the upper layers of the metal stack, the gate is wrapped around the vertical channel, and the bottom electrode needs to be routed out to the upper metal layers. Because of the area overhead of the bottom terminal connections, relevant area gains are possible only for simple standard cells (e.g. buffers, inverters) and for SRAMs, which thanks to their regularity allow an area shrink of up to 30% compared to lateral devices [29], [19]. For more complex cells with a larger number of terminals (e.g. Flip-Flop, Full adder, AOI), however, the area shrink obtainable with the VFET is negligible. The benchmarking in [19] also shows for the VFET a 13% performance gain at iso-energy or a 24% energy reduction at iso-performance. Given the greatly increased complexity and cost of the process compared to lateral devices, and the absence of a clear area gain to compensate for it, the cost effectiveness of this device should be carefully evaluated against the system level gains.

### 6.3.3 Negative Capacitance FET (NC-FET)

Figure 6.13 Evolution of supply and threshold voltage. Source: [30].

Figure 6.13 shows that for recent technology nodes the supply voltage (and the threshold voltage) have not been significantly scaled, preventing a further decrease of power and power density, which as we have seen in Chapters 4 and 5 has become one of the major bottlenecks in IC implementation. One of the main causes of the saturation of voltage scaling is that the Subthreshold Swing (SS) of a conventional FinFET device cannot be made lower than a theoretical limit of about 60mV/decade at room temperature, which is imposed by fundamental physics constraints. For this reason, investigating new device architectures that could allow a lower SS, and hence a lower VDD, is highly valuable, especially in the context of ultra low power applications. A device that is being proposed for this purpose is the Negative Capacitance FET (NC-FET) [31]. In the NC-FET a ferroelectric (FE) gate layer is introduced on top of a more conventional gate stack (Figure 6.14). The combination of the external electric field and the polarization effect in the FE material creates a negative voltage drop across the FE layer, resulting in a voltage amplification that improves (i.e. decreases) SS. In [31] an SS as small as 27mV/decade was obtained, allowing a VDD of 0.3V. These promising results should be verified at state of the art geometries, and the trade-off between power and performance evaluated at chip level.
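The 60mV/decade figure quoted above corresponds to the well known thermal limit $SS_{min} = \ln(10)\,kT/q$ of a conventional FET. The sketch below evaluates this expression and compares the gate voltage swing needed to modulate the current by a given number of decades for the thermal limit and for the 27mV/decade reported in [31]. The choice of 4 decades and the assumption of a constant SS over the whole swing are simplifications made here purely for illustration; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # elementary charge in C


def thermal_ss_limit_mv(temperature_k=300.0):
    """Minimum subthreshold swing of a conventional FET in mV/decade."""
    return math.log(10) * K_B * temperature_k / Q_E * 1e3


def gate_swing_mv(ss_mv_per_decade, decades):
    """Gate voltage swing needed to change the current by `decades`
    orders of magnitude, assuming a constant SS (a simplification)."""
    return ss_mv_per_decade * decades


limit = thermal_ss_limit_mv()
print(f"thermal limit at 300K: {limit:.1f} mV/decade")
for ss in (limit, 27.0):
    swing = gate_swing_mv(ss, decades=4)
    print(f"SS = {ss:.1f} mV/decade -> {swing:.0f} mV for 4 decades of current")
```

At 300K the limit evaluates to roughly 59.5mV/decade. Under the constant-SS assumption, a device with 27mV/decade needs less than half the gate swing of a thermally limited device for the same on/off modulation, which is the mechanism that would enable a lower VDD.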

### 6.3.4 From 2.5D to 3D integration

Figure 6.14 NC-FET transistor. (a) 3D structure of the transistor. (b) Equivalent capacitor model. Source: [31].

The considerations in this section highlight the fact that, within the current landscape of emerging devices, it is not possible to identify a universal replacement for the FinFET that can guarantee PPA benefits for every block of the SoC. Instead, in the ideal scenario, a different optimal technology should be chosen for each block, and the resulting SoC should be assembled by integrating heterogeneous technologies. An example of this concept is shown in Figure 6.15. For the digital logic (e.g. CPU and GPU) the most advanced node should be chosen, in order to leverage the area scaling benefits. For the CPU a device option maximizing performance could be selected (e.g. lateral nanowires), while for the GPU a device choice allowing minimum power (e.g. NC-FET) and area should be made. L1 SRAM could be implemented with VFET devices, while the L2/L3 levels could be implemented with MRAM. High performance I/O and RF/mmWave blocks could be optimally implemented in III-V technology, possibly using an older node to minimize the cost with little or no area penalty.

Figure 6.15 Concept of using heterogeneous technologies specialized for each block, in conjunction with 3D stacking techniques.

The most logical way to enable this scenario is through 3D integration techniques. Figure 6.16 shows the progression from 2.5D with interposer towards sequential 3D. The 2.5D with interposer technique is a relatively mature technology and it has been demonstrated multiple times [127], [128]. The main difference from a traditional 2D implementation is that in the 2.5D approach a silicon interposer is placed between the package substrate and the dice, and Through Silicon Vias (TSVs) connect the metallization layers on its upper and lower surfaces. In this case the pitch of the TSVs is coarse (multiple microns), so that only global wiring can be redistributed on the shared metal layers, and the partitioning is done at die level. In Figure 6.16 only two dies are shown for clarity in the 2.5D case, but this method can in principle be extended to an arbitrary number of dies. Wafer to Wafer (W2W) bonding techniques are being improved [129], in order to allow the fabrication of fully 3D ICs. In this case multiple active layers are vertically stacked and their BEOLs are connected face-to-face, face-to-back or back-to-back [130] (in Figure 6.15 a face-to-face scheme is shown). The W2W technique enables 3D interconnects with pitches down to 1um [131], which extends its usage to the intermediate levels of the wiring hierarchy. In this way an IP-block level partitioning can be achieved.

The next step, which is still in a research phase [132], is sequential 3D, where multiple layers of active devices are integrated in a single chip, and the different tiers should be connected at gate level by TSVs with pitches in the range of 100nm. It is important to mention that, while the 2.5D with interposer approach is supported by existing EDA tools, no commercial tools are currently available for W2W and sequential 3D [133]. To support the design of 3DICs, new capabilities would be required in many domains, from automatic partitioning and floorplanning to 3D-aware placement, routing and optimization. Pioneering studies [134], [135] done with academic EDA flows demonstrated for monolithic 3DICs up to 25% performance improvement at iso-power or 20% power improvement at iso-frequency versus 2D, with cost reductions of up to 60%.
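The link between contact pitch and partitioning granularity can be made more tangible by estimating how many vertical connections fit in a given area. The sketch below uses the contact pitches listed in Figure 6.16 and assumes, purely for illustration, that the contacts are placed on a regular square grid; real layouts have keep-out zones and non-uniform placement, so these are upper bounds under that assumption.

```python
# Contact pitches in micrometers, as listed in Figure 6.16.
CONTACT_PITCH_UM = {
    "2.5D with interposer": 10.0,
    "Wafer to wafer bonding": 1.0,
    "Sequential 3D": 0.1,
}


def contacts_per_mm2(pitch_um):
    """Contacts per square millimeter on an assumed regular square grid."""
    contacts_per_side = 1000.0 / pitch_um  # 1 mm = 1000 um
    return contacts_per_side ** 2


for technique, pitch in CONTACT_PITCH_UM.items():
    print(f"{technique:24s} pitch {pitch:5.1f}um -> "
          f"{contacts_per_mm2(pitch):.0e} contacts/mm2")
```

Each tenfold pitch reduction increases the available connection density a hundredfold, from the order of 10^4 to 10^6 and 10^8 contacts per square millimeter, which is why finer pitches allow partitioning to move from die level to IP-block level and finally to gate level.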

|                   | 2.5D with Interposer | Wafer to Wafer Bonding | Sequential 3D |
|-------------------|----------------------|------------------------|---------------|
| **3D Wiring**     | Global               | Intermediate           | Local         |
| **Contact Pitch** | 10um                 | 1um                    | 0.1um         |
| **Partition**     | Die-Level            | IP-Block-Level         | Gate-Level    |

Figure 6.16 Comparison of different 3D integration techniques.

## 6.4 New Computing Paradigms

### 6.4.1 In-memory computing

Over the last several decades, the rate of improvement in processors has exceeded that of memory by several orders of magnitude. The separation between memory and CPU in the Von Neumann architecture, and the need to transfer data between them, have created the primary performance and energy bottleneck in modern computing systems [136]. The idea of processing data within the memory itself is emerging as the ultimate way to break the Von Neumann separation. In [136] it was shown how emerging NVMs, such as the ones discussed in subsection 6.2.3, can be used to create logic families where the logical states are represented by resistance values. The logical states at the inputs and outputs of the logic gates are stored in the NVM elements themselves, which are therefore both the memory cells and the building blocks of the logic gates. In [137] a new architecture called memristive Memory Processing Unit (mMPU) was proposed. The mMPU is essentially a crossbar array of NVM elements in which in-memory processing can be executed, with the outputs of one processing step used as inputs for the next cycle. Whether an element is used for data storage or for processing is decided dynamically by the memory controller according to the executed program. In [136] an automatic synthesis flow capable of optimizing the performance of the in-memory computations was also proposed. This computing platform demonstrates how emerging technologies can be used in the context of radically new computing paradigms. A PPA benchmarking of these types of architectures versus conventional Von Neumann processors would be needed to quantify the advantages, and might point to the need for NVM technology improvements.
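The idea of logic states stored as resistance values, with the result of one step feeding the next, can be illustrated with a toy behavioural model. The sketch below is not the circuit-level logic family of [136] nor the mMPU of [137]: the class `ResistiveCrossbar`, the resistance values, the convention that low resistance encodes logic 1 and the choice of NOR as the in-place primitive are all assumptions made here for illustration.

```python
R_ON, R_OFF = 1e3, 1e6  # assumed low / high resistance states in ohms


class ResistiveCrossbar:
    """Toy model in which every cell stores one bit as a resistance state."""

    def __init__(self, rows, cols):
        self.r = [[R_OFF] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        self.r[row][col] = R_ON if bit else R_OFF

    def read(self, row, col):
        return int(self.r[row][col] == R_ON)

    def nor_in_row(self, row, in_cols, out_col):
        """In-place NOR: the output cell is preset to low resistance
        (logic 1) and switched to high resistance (logic 0) if any
        input cell of the same row is in the low-resistance state."""
        self.r[row][out_col] = R_ON
        if any(self.r[row][c] == R_ON for c in in_cols):
            self.r[row][out_col] = R_OFF


def or_in_memory(xbar, row, a_col, b_col, tmp_col, out_col):
    """OR built from two NOR cycles: the result of the first cycle,
    stored in tmp_col, is the input of the second cycle."""
    xbar.nor_in_row(row, [a_col, b_col], tmp_col)  # cycle 1: NOR(a, b)
    xbar.nor_in_row(row, [tmp_col], out_col)       # cycle 2: NOT of cycle 1
    return xbar.read(row, out_col)


xbar = ResistiveCrossbar(rows=1, cols=4)
for a in (0, 1):
    for b in (0, 1):
        xbar.write(0, 0, a)
        xbar.write(0, 1, b)
        print(a, b, "->", or_in_memory(xbar, 0, 0, 1, tmp_col=2, out_col=3))
```

In this model the same cells act as operands, scratch storage and result, and no data leaves the array between the two cycles, which is the property that distinguishes in-memory computing from a Von Neumann organization.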

### 6.4.2 Spiking Neural Networks

Mimicking the unparalleled energy efficiency of the brain through brain-inspired computing architectures has been proposed for a long time [138]. The basic unit of such a computing system is composed of several synapses, a neuron block and an axon. It mimics biological neural cells: the synapses receive the spikes from the other connected neurons and convert them into currents according to their strength, while the neuron performs a spatio-temporal integration of these signals and generates the output spikes that are propagated through the axon. The hardware implementation of large scale neuromorphic systems has historically been prevented by the lack of compact and low-power synaptic units. In emerging technologies like ReRAM, the conductance of a resistive memory can be incrementally modified by controlling the potential across it, similarly to a biological synapse. In this context, the design of a compact, low power and versatile CMOS neuron that can interface with these resistive synapses could provide the fundamental building block to integrate large scale brain-inspired neuromorphic systems [139], [140], [141].
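The synapse and neuron behaviour described above can be sketched with a discrete-time leaky integrate-and-fire model, a common abstraction that is used here as an assumption and is not taken from [139], [140] or [141]. The synaptic weights stand in for the conductances of resistive memory elements, and the leak factor, threshold and spike trains are arbitrary illustrative values.

```python
def simulate_lif(spike_trains, weights, leak=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    spike_trains -- one list of 0/1 input spikes per synapse
    weights      -- synaptic strengths (conductance-like, arbitrary units)
    Returns the list of output spikes, one per time step.
    """
    membrane = 0.0
    output = []
    for t in range(len(spike_trains[0])):
        # Synapses convert incoming spikes into currents according to
        # their strength (spatial integration over all synapses).
        current = sum(w * train[t] for w, train in zip(weights, spike_trains))
        # The neuron integrates over time, with a leak between steps.
        membrane = leak * membrane + current
        if membrane >= threshold:
            output.append(1)  # spike propagated through the axon
            membrane = 0.0    # reset after firing
        else:
            output.append(0)
    return output


inputs = [
    [1, 0, 1, 0, 0, 1],  # synapse 0
    [0, 1, 1, 0, 0, 0],  # synapse 1
]
print(simulate_lif(inputs, weights=[0.6, 0.3]))
```

With these values the neuron fires only at the third time step, when the coincident spikes on both synapses add to the charge accumulated in the previous steps. Changing a weight, which in a ReRAM based implementation would correspond to incrementally changing a conductance, shifts the firing behaviour.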

## 6.5 Summary and conclusions

The challenges encountered in enabling cost-effective technology nodes below N5 demonstrate the necessity to start exploring radically new approaches to scaling. In this chapter it was proposed to step up from DTCO to a System Design Technology Co-Optimization (STCO) strategy. In fact, considering the complexity of an SoC, it is easy to find important bottlenecks that cannot be resolved by scaling digital logic. As an example, the analog and I/O blocks are nearly non-scalable at advanced nodes, thus becoming more and more dominant in terms of area. STCO is to be considered an extension and a superset of DTCO. The main workhorses proposed to enable this concept are STCO scaling boosters, emerging devices, 3D integration techniques and new computing paradigms.

The STCO scaling boosters are technology solutions that, although not directly scaling digital logic, help to improve PPA at SoC level by improving the efficiency of specific blocks or of the SoC infrastructure. This of course requires moving from IP-block level to SoC level benchmarking to properly quantify the benefits. Examples of STCO scaling boosters were illustrated and their possible usage within the SoC was suggested. Emerging Non Volatile memories could replace DRAM and the L2/L3 caches, offering reduced area and non-volatility, or outperform Flash memories with higher performance, lower power and better endurance. IGZO thin film transistors could be integrated in the BEOL and used as PSO cells, relieving the front end of an area burden of up to 15%. AirGap dielectric could be introduced in the $M_Z$ layers in order to alleviate signal integrity problems for high performance designs. A backside PDN using ultra low pitch TSVs could allow the PDN to be completely removed from the back end, and guarantee an extremely reduced IR-drop.

In the beyond-silicon landscape a plethora of new devices is emerging, but none of them seems to offer a scaling roadmap that can provide benefits for all the blocks of the SoC. III-V transistors are widely used in RF/mmWave applications but have so far shown no clear advantage for logic. The Vertical FET device allows SRAMs to be shrunk by 30% but shows limited gains for complex standard cells. The Negative Capacitance FET is expected to allow a significant VDD reduction, and it is therefore a good candidate for ultra low power applications. The most natural and logical idea is therefore to have an SoC using heterogeneous technologies, assembled through 3D integration techniques. The reduction of TSV pitches and the evolution of the EDA tools will progressively allow the technologies to be mixed at a finer level of granularity. Finally, two examples of new computing paradigms were given: in-memory computing and spiking neural networks. Both of them highlight that even in such disruptive scenarios the need for a co-optimization between the technology and the system will persist.

# Chapter 7

## Conclusions

In this work we proposed a novel DTCO approach based on post P&R Design of Experiments (DoE) on predictive PDKs, enabling design and technology path-finding for technology nodes below N7. The motivations for DTCO were explained, and the major challenges posed by scaling beyond N7 were illustrated. The two pillars of this work are the generation of predictive PDKs and the PPA benchmarking through a state of the art digital implementation flow. It was shown how to build up these two parts, illustrating the basic components of a predictive PDK and the main implementation steps. More importantly, it was explained how to use this platform in practice to perform different classes of DoEs, aimed at extracting specific learnings and driving the decision making across the complex DTCO space. In this space, the patterning, standard cell, device and BEOL options are deeply interrelated, and are also coupled with physical design choices and EDA advancements. The methodology we adopted tries to master this complexity by decoupling all these aspects into separate DoE types, which were used to generate the results of the thesis.

An introduction to EUV lithography was given, explaining the motivations for its adoption at advanced nodes. A high level description of an EUV system was provided, along with a status update on the major challenges for EUV adoption in High Volume Manufacturing. A roadmap indicating EUV usage for technology nodes below N7 was proposed. The first opportunity to insert EUV is in the context of an EUV enabled N7, which was named "N7+" to differentiate it from the reference 193i based N7. EUV single patterning is still viable for our predictive N5 node, while at N3 dimensions the need for EUV double patterning was justified. A standard cell library was designed to be compatible with both the N7 and N7+ rule-sets, which were also documented. The PPAC comparison showed for the N7+ node 6% lower power, 15% improved performance and improved routability compared to the reference N7. Furthermore, a 12% wafer cost reduction and a 9% yield improvement are expected. The RC variability due to Line Edge Roughness (LER) was recently classified as the top priority problem affecting EUV lithography. Our study showed that modelling this effect through the conventional RC corners has a detrimental impact on timing closure, with up to 17% performance loss. This is mainly due to the worst resistive corner, which is more than 3 times more resistive than the typical corner. However, given the stochastic nature of LER, a statistical interconnect timing analysis method was also developed and tested. This study led to the conclusion that, thanks to the averaging effect over longer and shorter wires, the impact of LER on system level timing is negligible.

A predictive N5 node was defined, and the components of its PDK were generated. The transition from a 7.5-Tracks library with 3 Fins per device to a 6-Tracks library with 2 Fins per device was investigated through a PPA benchmarking. The comparison showed for the 6-Tracks 2-Fins library similar performance and, in the best case, more than 20% lower power compared to the 7.5-Tracks library. This is explained by the reduced pin capacitance of the 6-Tracks library, which compensates for the reduced drive current. In terms of area, it was possible to maintain high placement densities (up to 80%) for the 6-Tracks library, obtaining a 20% core area gain versus the reference 7.5-Tracks. Scaling track height without losing routability was made possible by the adoption of a new PDN architecture with vertical local rails, and by a cell architecture without M2 power rail and an outbound rail in MINT. New EDA features were introduced to support this PDN strategy, and to improve pin access by opening M1 and MINT to the router. The concept of scaling booster was introduced and defined as "Design, Process or EDA options that when used in conjunction allow to improve PPA at IP-block level". A new 6-Tracks library with SAGC and SDB was designed and tested. The routability analysis demonstrated the possibility to achieve for this library up to 50% (i.e.
a full node) reduced area compared to the reference 7.5-Tracks, assuming the same ruleset. As the wafer cost increase due to the scaling boosters was estimated to be below 10%, this approach proved to be a viable and low-cost alternative to pitch scaling. The new PDN architecture, introduced as an enabler for the 6-Tracks library, was studied. A significant trade-off between routability and IR-drop was found, with up to 15% area penalty for dense PDN scenarios, corresponding to tight IR-drop requirements. The electrical impact of introducing Cobalt in the BEOL was studied. Literature data show that Cobalt offers significantly reduced EM and a smaller minimum area compared to Copper interconnects. At our N5 dimensions, replacing Copper with Cobalt in M1 has a negligible electrical impact. On the other hand, reducing the minimum area constraint on M1 from the Copper (70nm) to the Cobalt (35nm) assumption was found to be a key enabler for designing and routing an area efficient 5-Tracks library, whose area gain versus the 6-Tracks is only 5% for Copper M1 libraries, compared to 17% for Cobalt M1 libraries. Keeping the MOL constraints unchanged compared to the 7.5 and 6-Tracks libraries, the transition to 5-Tracks implies a further fin depopulation, leaving room for only one fin per device. The PPA benchmarking of the 5-Tracks 1-Fin library showed a 50% performance loss compared to the 6-Tracks 2-Fins, although for lower frequencies the 5-Tracks library is up to 20% more power efficient. On top of this, RO based variability studies highlighted a more than 60% increase in delay variability for the 1-Fin device (versus the 2-Fin). Experiments to evaluate the electrical impact of replacing the FinFET device with a lateral Nanowire device were also performed. Using lateral nanowires with three vertically stacked wires per fin allowed performance gains between 5% and 10% compared to FinFET, with 5% lower power. Reducing the number of vertically stacked nanowires to 2 causes a 50% performance loss compared to the reference FinFET scenario, but with 20% lower power in the low-frequency range. By introducing high drive cells in conjunction with physical synthesis, a 10% performance increase was obtained. However, for higher performance and power densities the IR-routability trade-off is aggravated, and for extremely dense PDN scenarios an area penalty of more than half a node was observed. Introducing "porous" cells can partially mitigate the impact, but a structural solution can only be provided by tightening the gear ratio between M1 and poly pitches from a unitary value to 2/3 or 0.5. The concept behind this strategy is to multiply the routing resources on M1 and make all the cells intrinsically porous to the power mesh.

The dimensions for a predictive N3 node were proposed, and the PDK components were generated. Multiple scaling boosters for N3 were illustrated, including super via, M1 gear ratio, Spacer Defined Cut, Mid Track Handshake, Buried Power Rail, Ruthenium interconnects and the Nanosheet device. Based on our N3 ruleset, standard cell libraries with two track heights were designed: 5.5-Tracks and 4.5-Tracks. For the 4.5-Tracks, keeping the same standard cell abstract, two devices were evaluated: the 1-Fin FinFET and the lateral Nanosheet.
The standard cell architecture and requirements were documented for all these libraries, and compared against the reference N5 6-Tracks. The deltas in the patterning, ruleset and BEOL assumptions between N5 and N3 were clarified, and a cross node PPAC comparison was performed. In order to make this comparison fair, the benchmarking was made IR-drop aware through an algorithmic procedure, allowing PPA figures to be benchmarked "iso-IR". In the N3 node the 0.5 GR between poly and M1 dramatically decreased the sensitivity of routability to PDN density. In fact, for the tightest PDN scenario a core area increase of as little as 2.5% was observed (for the 5.5-Tracks library), while in the reference N5 area penalties between 20% and 30% had been found. Area gains of up to 52% and 55% were shown for the 5.5-Tracks and 4.5-Tracks respectively. At the lowest frequency the 4.5-Tracks 1-Fin is the best option, with 44% power gain and 53% area gain versus N5. However, the 1-Fin option showed a performance loss in the range of 12% compared to the other N3 libraries, which exhibited performance similar to N5. At maximum frequency the 4.5-Tracks with NSH option is the optimal configuration, with 35% power gain and 55% area gain versus N5. Nevertheless, the power and area gains (28% and 52%) of the 5.5-Tracks 2-Fin option are quite close to those of the 4.5-Tracks NSH, with reduced process complexity, which makes the value proposition for the 4.5-Tracks less clear. Most importantly, the heavy usage of EUV double patterning in N3 causes a wafer cost increase in the range of 40% compared to N5, allowing only a die cost reduction below 10%, even in the best case area scaling scenario.

In order to find a more cost effective solution, a radically new device was explored: the Complementary FET (CFET). An introduction to CFET was provided, from the device concept to the standard cells and P&R aspects. The routability tests on CFET libraries with 4 and 3 tracks demonstrated the possibility to achieve at least half a node area gain versus the N3 5.5-Tracks library, using the same (or cheaper) BEOL assumptions. For this reason this device was inserted into the roadmap as a possible option for an N2 node.

The challenges encountered in enabling cost-effective technology nodes below N5 demonstrate the necessity to start exploring radically new approaches to scaling. In Chapter 6 it was proposed to step up from DTCO to a System Design Technology Co-Optimization (STCO) strategy. In fact, considering the complexity of an SoC, it is easy to find important bottlenecks that cannot be resolved by scaling digital logic. As an example, the analog and I/O blocks are nearly non-scalable at advanced nodes, thus becoming more and more dominant in terms of area. STCO is to be considered an extension and a superset of DTCO. The main workhorses proposed to enable this concept are STCO scaling boosters, emerging devices, 3D integration techniques and new computing paradigms. The STCO scaling boosters are technology solutions that, although not directly scaling digital logic, help to improve PPA at SoC level by improving the efficiency of specific blocks or of the SoC infrastructure.
This of course requires moving from IP-block level to SoC level benchmarking to properly quantify the benefits. Examples of STCO scaling boosters were illustrated and their possible usage within the SoC was suggested. Emerging Non Volatile memories could replace DRAM and the L2/L3 caches, offering reduced area and non-volatility, or outperform Flash memories with higher performance, lower power and better endurance. IGZO thin film transistors could be integrated in the BEOL and used as PSO cells, relieving the front end of an area burden of up to 15%. AirGap dielectric could be introduced in the $M_Z$ layers in order to alleviate signal integrity problems for high performance designs. A backside PDN using ultra low pitch TSVs could allow the PDN to be completely removed from the back end, and guarantee an extremely reduced IR-drop.

In the beyond-silicon landscape a plethora of new devices is emerging, but none of them seems to offer a scaling roadmap that can provide benefits for all the blocks of the SoC. III-V transistors are widely used in RF/mmWave applications but have so far shown no clear advantage for logic. The Vertical FET device allows SRAMs to be shrunk by 30% but shows limited gains for complex standard cells. The Negative Capacitance FET is expected to allow a significant VDD reduction, and it is therefore a good candidate for ultra low power applications. The most natural and logical idea is therefore to have an SoC using heterogeneous technologies, assembled through 3D integration techniques. The reduction of TSV pitches and the evolution of the EDA tools will progressively allow the technologies to be mixed at a finer level of granularity. Finally, two examples of new computing paradigms were given: in-memory computing and spiking neural networks. Both of them highlight that even in such disruptive scenarios the need for a co-optimization between the technology and the system will persist.



# Bibliography

- [1] M. Stojcev et al. The limits of semiconductor technology and oncoming challenges in computer microarchitectures and architectures. 17:285–312, 01 2005.
- [2] Intel manufacturing day: Nodes must die, but moore’s law lives! <https://www.semiwiki.com/forum/content/6698-intel-manufacturing-day-nodes-must-die-but-moores-law-lives-e.html>, March 2017.
- [3] K. Mistry. 10 nm technology leadership, 2017.
- [4] Intel’s 22nm tri-gate transistors. <https://www.realworldtech.com/intel-22nm-finfet/>, May 2011.
- [5] Tech brief: Multiple patterning makes miniaturization possible. lamresearch.com, March 2015.
- [6] D. Jang. Device exploration of nanosheet transistors for sub-7-nm technology node. 64:2707–2713, 2017.
- [7] M. G. Bardon et al. Extreme scaling enabled by 5 tracks cells: Holistic design-device co-optimization for finfets and lateral nanowires. In *2016 IEEE International Electron Devices Meeting (IEDM)*, pages 28.2.1–28.2.4, Dec 2016.
- [8] Zsolt Tőkei et al. On-chip interconnect trends, challenges and solutions: How to keep RC and reliability under control. *VLSI Technology, 2016 IEEE Symposium*, pages 1 – 2, June 2016.
- [9] Cliff Hou. A smart design paradigm for smart chips. *2017 IEEE International Solid-State Circuits Conference*, February 2017.
- [10] Arindam et al. Maintaining moore’s law: enabling cost-friendly dimensional scaling, 2015.
- [11] Cortex-m0. <https://developer.arm.com/products/processors/cortex-m/cortex-m0>.
- [12] T. Hawkins. 802.3an ldpc decoder, 2007.
- [13] Lars Liebmann. *Design Technology Co-Optimization in the Era of Sub-Resolution IC Scaling*. SPIE PRESS, 2016.
- [14] Sematech pushes extreme uv lithography forward. <http://spie.org/newsroom/0079-sematech-pushes-extreme-uv-lithography-forward>, April 2006.

- [15] I. Fomenkov. Euv source for high volume manufacturing: Performance at 250 w and key technologies for power scaling. *Source Workshop*, 2017.
- [16] N. Mojarrad. Beyond euv lithography: a comparative study of efficient photoresists' performance. *Nature Scientific Reports*, 2015.
- [17] H. Meiling. Role of euv and its business opportunity. *Investor Day*, 2016.
- [18] R. Baert et al. System-level impact of interconnect line-edge roughness. *IEEE International Interconnect Technology Conference (IITC)*, 2018.
- [19] D. Yakimets. *Vertical Transistors A slippery path towards the ultimate CMOS scaling*. KU Leuven, 2017.
- [20] Julien Ryckaert. Scaling beyond 7nm: Design-technology co-optimization at the rescue. In *ISPD*, 2016.
- [21] Y. M. Lee et al. Accurate performance evaluation for the horizontal nanosheet standard-cell design space beyond 7nm technology. In *2017 IEEE International Electron Devices Meeting (IEDM)*, pages 29.3.1–29.3.4, Dec 2017.
- [22] S.M.Y. Sherazi et al. Track height reduction for standard-cell in below 5nm node: how low can you go?, 2018.
- [23] R. H. Kim et al. Imec n7, n5 and beyond: Dtco, stco and euv insertion strategy to maintain affordable scaling trend, 2018.
- [24] P. Weckx et al. Stacked nanosheet fork architecture for sram design and device co-optimization toward 3nm. In *2017 IEEE International Electron Devices Meeting (IEDM)*, pages 20.5.1–20.5.4, Dec 2017.
- [25] Y. Song et al. What lies ahead for resistance-based memory technologies? 46(8):30–36, August 2013.
- [26] S. Yamazaki. Unique technology from japan to the world —super low power lsi using caac-os. *Semiconductor Energy Laboratory*, May 2015.
- [27] B. Chava et al. Dtco exploration for efficient standard cell power rails, 2018.
- [28] Chandan Yadav. *Compact Modeling of Capacitance and Current in Silicon and III-V Transistors*. ResearchGate, 2016.
- [29] An Steegen. Nanotechnology creating magic. *Semicon Korea*, 2018.
- [30] Hei Wong et al. On the scaling of subnanometer eot gate dielectrics for ultimate nano cmos technology. *Microelectronic Engineering*, 138:57 – 76, 2015. Next-Generation Electronic Materials, Devices, and Sensors / Biosensors.
- [31] F. Liu et al. Negative capacitance transistors with monolayer black phosphorus. *npj Quantum Materials*, 2016.
- [32] Kyungwook Chang et al. Power benefit study of monolithic 3d ic at the 7nm technology node. *Low Power Electronics and Design (ISLPED)*, pages 201 – 206, July 2015.

- [33] S. Yu et al. Emerging memory technologies: Recent trends and prospects. *IEEE Solid-State Circuits Magazine*, 8:43–56, 2016.
- [34] G. Moore. Cramming more components onto integrated circuits. *Electronics Magazine*, 38:114–117, 1965.
- [35] Julien Ryckaert et al. Design technology co-optimization for n10. *Proceedings of the IEEE 2014 Custom Integrated Circuits Conference*, pages 1 – 8, September 2014.
- [36] G. Northrop. Design technology co-optimization in technology definition for 22nm and beyond. In *2011 Symposium on VLSI Technology - Digest of Technical Papers*, pages 112–113, June 2011.
- [37] Greg Yeric et al. The past present and future of design-technology co-optimization. *Proceedings of the IEEE 2013 Custom Integrated Circuits Conference*, pages 1 – 8, September 2013.
- [38] Lars Liebmann et al. Overcoming scaling barriers through design technology cooptimization. *Symposium on VLSI Technology*, pages 1 – 2, June 2016.
- [39] L. Mattii et al. Post place and route design-technology co-optimization for scaling at single-digit nodes with constant ground rules. *Journal of Micro/Nanolithography, MEMS, and MOEMS*, 17:17 – 17 – 13, 2018.
- [40] L. Mattii. Eda role in the design-technology co-optimization towards n7. *CDNLive EMEA 2016*, 2016.
- [41] C. H. Jan et al. A 22nm soc platform technology featuring 3-d tri-gate and high-k/metal gate, optimized for ultra low power, high performance and high density soc applications. In *2012 International Electron Devices Meeting*, pages 3.1.1–3.1.4, Dec 2012.
- [42] V. Mahor et al. Future of vlsi design: The finfet logic circuits. *International Journal of Electronics, Electrical and Computational System*, 6(11), 2017.
- [43] S. Y. Wu et al. A 7nm cmos platform technology featuring 4th generation finfet transistors with a 0.027um2 high density 6-t sram cell for mobile soc applications. In *2016 IEEE International Electron Devices Meeting (IEDM)*, pages 2.6.1–2.6.4, Dec 2016.
- [44] R. Xie et al. A 7nm finfet technology featuring euv patterning and dual strained high mobility channels. In *2016 IEEE International Electron Devices Meeting (IEDM)*, pages 2.7.1–2.7.4, Dec 2016.
- [45] Soichi Owa and Hiroyuki Nagasaka. Advantage and feasibility of immersion lithography. *Journal of Micro/Nanolithography, MEMS, and MOEMS*, 3:3 – 3 – 7, 2004.
- [46] S. Owa et al. Immersion lithography extension to sub-10nm nodes with multiple patterning. *Proc.SPIE*, 9052:9052 – 9052 – 9, 2014.
- [47] Marie Bardon et al. Extreme scaling enabled by 5 tracks cells: Holistic design-device co-optimization for finfets and lateral nanowires. *Electron Devices Meeting (IEDM), 2016 IEEE International*, pages 1 – 4, December 2016.

- [48] D. Edelstein et al. Full copper wiring in a sub-0.25 µm cmos ulsi technology. In *International Electron Devices Meeting. IEDM Technical Digest*, pages 773–776, Dec 1997.
- [49] S. Beyne et al. Study of electromigration mechanisms in 22nm half-pitch cu interconnects by 1/f noise measurements. In *2017 IEEE International Interconnect Technology Conference (IITC)*, pages 1–3, May 2017.
- [50] I. Ciofi et al. Modeling of via resistance for advanced technology nodes. *IEEE Transactions on Electron Devices*, 64(5):2306–2313, May 2017.
- [51] L. Li et al. Cu diffusion barrier: Graphene benchmarked to tan for ultimate interconnect scaling. In *2015 Symposium on VLSI Technology (VLSI Technology)*, pages T122–T123, June 2015.
- [52] S. Dutta et al. Sub-100 nm<sup>2</sup> cobalt interconnects. *IEEE Electron Device Letters*, 39(5):731–734, May 2018.
- [53] F. Griggio et al. Reliability of dual-damascene local interconnects featuring cobalt on 10 nm logic technology. In *2018 IEEE International Reliability Physics Symposium (IRPS)*, pages 6E.3–1–6E.3–5, March 2018.
- [54] O. V. Pedreira et al. Reliability study on cobalt and ruthenium as alternative metals for advanced interconnects. In *2017 IEEE International Reliability Physics Symposium (IRPS)*, pages 6B–2.1–6B–2.8, April 2017.
- [55] C. K. Hu et al. Future on-chip interconnect metallization and electromigration. In *2018 IEEE International Reliability Physics Symposium (IRPS)*, pages 4F.1–1–4F.1–6, March 2018.
- [56] C. C. Liu et al. An accurate interconnect test structure for parasitic validation in on-chip machine learning accelerators. *CoRR*, abs/1701.03181, 2017.
- [57] Z. Wang et al. Highly effective low-k dielectric test structures and reliability assessment for 28nm technology node and beyond. In *2017 China Semiconductor Technology International Conference (CSTIC)*, pages 1–3, March 2017.
- [58] J. Khim. Large die assembly technology in tsv packages. *Semicon Taiwan*, 2016.
- [59] Lawrence T. Clark et al. Asap7: A 7-nm finfet predictive process design kit. *Microelectronics Journal*, 53:105–115, July 2016.
- [60] S. C. Song et al. Unified technology optimization platform using integrated analysis (utopia) for holistic technology, design and system co-optimization at <= 7nm nodes. *IEEE Symposium on VLSI Circuits (VLSI-Circuits)*, pages 1 – 2, June 2016.
- [61] Praveen Raghavan et al. 5nm: Has the time for a device change come? *17th International Symposium on Quality Electronic Design (ISQED)*, pages 275 – 277, March 2016.
- [62] Andrew B. Kahng et al. Mixed cell-height implementation for improved design quality in advanced nodes. *Computer-Aided Design (ICCAD)*, pages 854 – 860, November 2015.

- [63] B. W. Ku, T. Song, A. Nieuwoudt, and S. K. Lim. Transistor-level monolithic 3d standard cell layout optimization for full-chip static power integrity. In *2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)*, pages 1–6, July 2017.
- [64] First look: 5nm. <http://semiengineering.com/first-look-5nm/>, October 2015.
- [65] Cadence, imec disclose 3-nm effort. <https://www.eetimes.com/document.asp?doc_id=1333016>, February 2018.
- [66] V. Moroz et al. Modeling and optimization of group iv and iii-v finfets and nano-wires. In *2014 IEEE International Electron Devices Meeting*, pages 7.4.1–7.4.4, Dec 2014.
- [67] C. C. McAndrew et al. Best practices for compact modeling in verilog-a. *IEEE Journal of the Electron Devices Society*, 3(5):383–396, Sept 2015.
- [68] Cadence Design Systems, Inc. *LEF/DEF 5.8 Language Reference*, 2016.
- [69] Sayed Sherazi et al. Architectural strategies in standard-cell design for the 7 nm and beyond technology node. *Journal of Micro/Nanolithography, MEMS, and MOEMS*, 15, February 2016.
- [70] Cadence Design Systems, Inc. *Virtuoso Abstract Generator User Guide 6.1.6*, 2015.
- [71] Liberty reference manual. <https://people.eecs.berkeley.edu/~alanmi/publications/other/liberty07_03.pdf>, 2013.
- [72] J. Bhasker. *Static Timing Analysis for Nanometer Designs*. Springer, 2009.
- [73] R. Goyal et al. Current based delay models: A must for nanometer timing. 01 2005.
- [74] Werner Gillijns et al. Impact of a sadp flow on the design and process for n10/n7 metal layers. *Design-Process-Technology Co-optimization for Manufacturability IX*, March 2015.
- [75] Ivan Ciofi et al. Impact of wire geometry on interconnect rc and circuit delay. *IEEE Transactions on Electron Devices*, 63:2488 – 2496, May 2016.
- [76] Cadence Design Systems, Inc. *Quantus QRC Techgen Reference Manual*, 2016.
- [77] Cadence Design Systems, Inc. *Innovus Stylus Common UI Text Reference Manual*, 2018.
- [78] S. Gangadharan et al. *Constraining Designs for Synthesis and Timing Analysis*. Springer, 2013.
- [79] Cadence Design Systems, Inc. *Power and Rail Analysis Using Voltus IC Integrity*, 2015.
- [80] Cadence Design Systems, Inc. *How to Effectively Use Libscore Functionality*, 2017.
- [81] D. Milojevic. Library-level characterization of sub-10nm processing nodes. *CDNLive EMEA 2017*, 2017.

- [82] N. Kate et al. Liquid immersion lithography system having a tilted showerhead relative to a substrate, 2006. US Patent 7,256,864.
- [83] H. Lee. Improvement of depth of focus control using wafer geometry, 2015.
- [84] Y. Khopkar et al. Evaluating vacuum components for particle performance for euv lithography, 2014.
- [85] R. Paschotta. article on 'bragg mirrors' in the encyclopedia of laser physics and technology. *Encyclopedia of Laser Physics and Technology*, 2008.
- [86] M. Singh et al. Improved theoretical reflectivities of extreme ultraviolet mirrors, 2000.
- [87] Asml integrated report 2017. <https://staticwww.asml.com/doclib/investor/annual_reports/2017/downloadcenter/reports/asml_20180207_2017_Integrated_Report_based_on_US_GAAP_FINAL.pdf>, February 2018.
- [88] G. M. Gallatin et al. Resolution, ler, and sensitivity limitations of photoresists, 2008.
- [89] The 20-year journey to the chips of tomorrow. <https://medium.com/@ASMLcompany/the-20-year-journey-to-the-chips-of-tomorrow-4df3ac1ebc72>, 2016.
- [90] J. Van Schoot et al. High-na euv lithography enabling moore's law in the next decade. *SPIE proc.*, 2017.
- [91] Cadence Design Systems, Inc. *Via Pillar Flow in Innovus*, 2018.
- [92] A. Mallik et al., 2015.
- [93] R. Xie, K.Y. Lim, M.G. Sung, and R.R.H. Kim. Methods of forming single and double diffusion breaks on integrated circuit products comprised of finfet devices and the resulting products, August 9 2016. US Patent 9,412,616.
- [94] G. Bouche, A.C.H. Wei, G.P. Wells, A.P. Labonte, and J. Wan. Self-aligned gate contact formation, May 2 2017. US Patent 9,640,625.
- [95] Cadence Design Systems, Inc. *Innovus User Guide*, 2016.
- [96] B.E. Kastenmeier and R.E. Evazians. Semiconductor device having conductors with different dimensions and method for forming, July 12 2012. US Patent App. 13/004,988.
- [97] Cadence Design Systems, Inc. *Voltus IC Power Integrity Solution manual*, 2016.
- [98] Jim Dodrill. Voltage variations & sta panel. *ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems*, March 2017.
- [99] Sandeep Samal. Full chip impact study of power delivery network designs in gate-level monolithic 3-d ics. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, October 2016.
- [100] Ivan Ciofi et al. Rc modeling, scaling trends and mitigation approaches. *Materials for Advanced Metallization (MAM) Conference*, 2018.

- [101] L. Mattii et al. Efficient place and route enablement of 5-tracks standard-cells through euv compatible n5 ruleset, 2018.
- [102] Dmitry Yakimets et al. Lateral nwfet optimization for beyond 7nm nodes. *International Conference on IC Design & Technology (ICICDT)*, pages 1 – 4, June 2015.
- [103] A. Elzeftawi. Addressing process variation and reducing timing pessimism at 16nm and below. [www.cadence.com](http://www.cadence.com), 2016.
- [104] S. Pal et al. Supervia: Relieving routing congestion using double-height vias. *UCLA: Nanosystems Computer-Aided Design Laboratory*, 2017.
- [105] L. Zhu et al. Assessing benefits of a buried interconnect layer in digital designs. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 36(2):346–350, Feb 2017.
- [106] M. G. Bardon et al. Power-performance trade-offs for lateral nanosheets on ultra-scaled standard cells. *Symposium on VLSI Technology*, 2018.
- [107] S. D. Kim et al. Performance trade-offs in finfet and gate-all-around device architectures for 7nm-node and beyond. In *2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S)*, pages 1–3, Oct 2015.
- [108] A. Mallik et al. Euvl gen 2.0: key requirements for constraining semiconductor cost in advanced technology node manufacturing, 2018.
- [109] J. Ryckaert et al. The complementary fet (cfet) for cmos scaling beyond n3. *Symposium on VLSI Technology*, 2018.
- [110] J. V. Schoot et al. High-na euv lithography enabling moore’s law in the next decade, 2017.
- [111] M. K. Gupta et al. Device circuit and technology co-optimisation for finfet based 6t sram cells beyond n7. In *2017 47th European Solid-State Device Research Conference (ESSDERC)*, pages 256–259, Sept 2017.
- [112] Seok-Hee Lee. Scaling trends and challenges of advanced memory technology. In *Proceedings of Technical Program - 2014 International Symposium on VLSI Technology, Systems and Application (VLSI-TSA)*, pages 1–1, April 2014.
- [113] W. H. Butler et al. Spin-dependent tunneling conductance of Fe|MgO|Fe sandwiches. *Phys. Rev. B*, 63:054416, Jan 2001.
- [114] L. Chi et al. 1-v full-swing depletion-load a-in-ga-zn-o inverters for back-end-of-line compatible 3d integration. *IEEE Electron Device Letters*, 37(4):441–444, April 2016.
- [115] Power Forward Initiative. A practical guide to low-power design user experience with cpf. *Semiconductor Energy Laboratory*.
- [116] J. P. G. de Mussy et al. Novel selective sidewall airgap process [single damascene interconnects]. In *Proceedings of the IEEE 2005 International Interconnect Technology Conference, 2005.*, pages 150–152, June 2005.

- [117] C. Penny et al. Reliable airgap beol technology in advanced 48 nm pitch copper/ulk interconnects for substantial power and performance benefits. In *2017 IEEE International Interconnect Technology Conference (IITC)*, pages 1–4, May 2017.
- [118] N. Chen et al. A highly integrated rfsoc design for 3g smart phone application. In *2016 IEEE 66th Electronic Components and Technology Conference (ECTC)*, pages 1309–1315, May 2016.
- [119] P. Kettner et al. Thin wafer handling and processing-results achieved and upcoming tasks in the field of 3d and tsv. In *2009 11th Electronics Packaging Technology Conference*, pages 787–789, Dec 2009.
- [120] F. X. Che et al. Study on warpage and stress of tsv wafer with ultra-fine pitch vias for high density chip stacking. In *2017 IEEE 19th Electronics Packaging Technology Conference (EPTC)*, pages 1–7, Dec 2017.
- [121] J. A. del Alamo et al. Iii–v cmos: What have we learned from hemts? In *IPRM 2011 - 23rd International Conference on Indium Phosphide and Related Materials*, pages 1–4, May 2011.
- [122] D. Marti et al. 150-ghz cutoff frequencies and 2-w/mm output power at 40 ghz in a millimeter-wave algan/gan hemt technology on silicon. *IEEE Electron Device Letters*, 33(10):1372–1374, Oct 2012.
- [123] A. R. Jha. Advances in iii-v transistors (hemts and hbts) for mm-wave applications. In *12th International Symposium on Electron Devices for Microwave and Optoelectronic Applications, 2004. EDMO 2004.*, pages 5–8, Nov 2004.
- [124] Y. Niida et al. 3.6 w/mm high power density w-band inalgan/gan hemt mmic power amplifier. In *2016 IEEE Topical Conference on Power Amplifiers for Wireless and Radio Applications (PAWR)*, pages 24–26, Jan 2016.
- [125] K. D. Cantley et al. Performance analysis of iii-v materials in a double-gate nmosfet. In *2007 IEEE International Electron Devices Meeting*, pages 113–116, Dec 2007.
- [126] R. Chau et al. Opportunities and challenges of iii-v nanoelectronics for future high-speed, low-power logic applications. In *IEEE Compound Semiconductor Integrated Circuit Symposium, 2005. CSIC '05.*, pages 4 pp.–, Oct 2005.
- [127] H. Wang et al. Design of test structures for electrical and reliability measurements in a 2.5d tsv interposer. In *2013 14th International Conference on Electronic Packaging Technology*, pages 41–45, Aug 2013.
- [128] P. Kumar et al. Validating 2.5d system-in-package inter-die communication on silicon interposer. In *2014 IEEE Electrical Design of Advanced Packaging Systems Symposium (EDAPS)*, pages 97–100, Dec 2014.
- [129] I. Sugaya et al. New precision wafer bonding technologies for 3dic. In *2015 International 3D Systems Integration Conference (3DIC)*, pages TS7.1.1–TS7.1.7, Aug 2015.

- [130] G. Neela et al. Techniques for assigning inter-tier signals to bondpoints in a face-to-face bonded 3dic. In *2013 IEEE International 3D Systems Integration Conference (3DIC)*, pages 1–6, Oct 2013.
- [131] J. De Vos et al. Importance of alignment control during permanent bonding and its impact on via-last alignment for high density 3d interconnects. In *2016 IEEE International 3D Systems Integration Conference (3DIC)*, pages 1–5, Nov 2016.
- [132] A. Mallik et al. The impact of sequential-3d integration on semiconductor scaling roadmap. In *2017 IEEE International Electron Devices Meeting (IEDM)*, pages 32.1.1–32.1.4, Dec 2017.
- [133] K. Chang et al. Cascade2d: A design-aware partitioning approach to monolithic 3d ic with 2d commercial tools. In *2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pages 1–8, Nov 2016.
- [134] K. Acharya et al. Monolithic 3d ic design: Power, performance, and area impact at 7nm. In *2016 17th International Symposium on Quality Electronic Design (ISQED)*, pages 41–48, March 2016.
- [135] K. Chang et al. Match-making for monolithic 3d ic: Finding the right technology node. In *2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC)*, pages 1–6, June 2016.
- [136] R. Ben Hur et al. Simple magic: Synthesis and in-memory mapping of logic execution for memristor-aided logic. In *2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pages 225–232, Nov 2017.
- [137] R. B. Hur et al. Memory processing unit for in-memory processing. In *2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, pages 171–172, July 2016.
- [138] X. Wu et al. A cmos spiking neuron for brain-inspired neural networks with resistive synapses and in situ learning. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 62(11):1088–1092, Nov 2015.
- [139] Il Song Han. Biologically plausible vlsi neural network implementation with asynchronous neuron and spike-based synapse. In *Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.*, volume 5, pages 3244–3248 vol. 5, July 2005.
- [140] B. Abinaya et al. An event based cmos quad bilateral combination with asynchronous sram architecture based neural network using low power. In *2015 2nd International Conference on Electronics and Communication Systems (ICECS)*, pages 995–999, Feb 2015.
- [141] T. Morie. Cmos circuits and nanodevices for spike based neural computing. In *2015 IEEE International Meeting for Future of Electron Devices, Kansai (IMFEDK)*, pages 112–113, June 2015.

## Authored and co-authored papers and journals

- L. Mattii et al.: **EDA Role In The Design-Technology Co-Optimization Towards N7.** – *CDNLive EMEA*, May 2016, Munich, Germany
- L. Mattii et al.: **IR-drop aware Design & Technology Co-Optimization for N5 node with different device and cell height options.** – *ICCAD*, October 2017, Irvine, USA
- L. Mattii et al.: **Design-Technology-EDA solutions for scaling at single digit nodes with constant ground rules.** – *CDNLive EMEA*, May 2017, Munich, Germany
- L. Mattii et al.: **Post place and route design-technology co-optimization for scaling at single-digit nodes with constant ground rules.** – *Journal of Micro/Nanolithography, MEMS, and MOEMS*, January 2018
- L. Mattii et al.: **Efficient place and route enablement of 5-tracks standard-cells through EUV compatible N5 ruleset.** – *SPIE*, February 2018, San Jose, USA
- P. Raghavan et al.: **Metal stack optimization for low-power and high-density for N7-N5.** – *SPIE*, February 2016, San Jose, USA
- Y. Sherazi et al.: **Low track height standard cell design in iN7 using scaling boosters.** – *SPIE*, February 2017, San Jose, USA
- C. Tabery et al.: **In-design and signoff lithography physical analysis for 7/5nm.** – *SPIE*, February 2017, San Jose, USA
- Y. Sherazi et al.: **Track height reduction for standard-cell in below 5nm node: How low can you go.** – *SPIE*, February 2018, San Jose, USA
- B. Chava et al.: **DTCO exploration for efficient standard cell power rails.** – *SPIE*, February 2018, San Jose, USA
- R. Baert et al.: **System-level impact of interconnect line-edge roughness.** – *IEEE International Interconnect Technology Conference (IITC)*, June 2018, Santa Clara, USA
- M. Bardon et al.: **Power-performance Trade-offs for Lateral NanoSheets on Ultra-Scaled Standard Cells.** – *Symposia on VLSI Technology and Circuits*, June 2018, Honolulu, USA

## imec internal works

- L. Mattii et al.: **Place and Route: PPA, IR-Drop analysis, metallization and standard cell options.** – *imec Partner Technical Week (PTW)*, April 2016, Leuven, Belgium
- L. Mattii et al.: **IR-drop awareness in the DTCO loop for imec N7.** – *imec Partner Technical Week (PTW)*, October 2016, Leuven, Belgium
- L. Mattii et al.: **Post P&R exploration of EUV patterning options in iN7.** – *imec Partner Technical Week (PTW)*, April 2017, Leuven, Belgium
- L. Mattii et al.: **Post Place and Route comparison between iN7 and iN5.** – *imec Partner Technical Week (PTW)*, October 2017, Leuven, Belgium
- L. Mattii et al.: **Post Place and Route comparison for iN7 and iN5 on Arm 64-bit CPU.** – *imec Partner Technical Week (PTW)*, April 2018, Leuven, Belgium
- L. Mattii et al.: **Place and Route exploration of CFET libraries with 4 and 4 tracks.** – *imec Partner Technical Week (PTW)*, April 2018, Leuven, Belgium



# Appendix A

## Components of a predictive PDK

### A.1 example of technology .lef

Example of Multiple patterning Via:

```
LAYER prV12
  TYPE CUT ;
  MASK 3 ;
# V12 width and length
  PROPERTY LEF58_CUTCLASS "
    CUTCLASS VX      WIDTH 0.024 LENGTH 0.016 ;
  " ;

# V12 Other color Spacing
  PROPERTY LEF58_SPACINGTABLE "
    SPACINGTABLE
    CENTERTOCENTER VX TO VX
    CUTCLASS      VX
    VX           0.042  0.042 ;
  " ;

# V12 Same color Spacing
  PROPERTY LEF58_SPACINGTABLE "
    SPACINGTABLE
    SAMEMASK
    CENTERTOCENTER VX TO VX
    CUTCLASS      VX
    VX           0.100  0.100 ;
  " ;

# V12 Enclosure
  PROPERTY LEF58_ENCLOSURE "
    ENCLOSURE CUTCLASS VX BELOW END 0.000 SIDE 0.000 ;
    ENCLOSURE CUTCLASS VX ABOVE END 0.008 SIDE 0.000 ;
  " ;

END prV12
```
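
The two spacing tables of the prV12 listing can be read as a mask-dependent check: vias on different masks may sit closer together than vias on the same mask. The following is a minimal sketch of that reading, not the rule engine of any tool. The function name `via_spacing_ok`, the `(x, y, mask)` tuple layout and the use of a plain Euclidean centre-to-centre distance are assumptions made for illustration; only the two limits come from the listing.

```python
import math

# Centre-to-centre limits in microns, copied from the prV12 listing.
DIFFERENT_MASK_MIN = 0.042  # "Other color Spacing"
SAME_MASK_MIN = 0.100       # "Same color Spacing"


def via_spacing_ok(a, b):
    """Check two vias given as hypothetical (x, y, mask) tuples.

    x and y are the via centre coordinates in microns. The distance is
    simplified to a Euclidean centre-to-centre distance (an assumption).
    """
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    limit = SAME_MASK_MIN if a[2] == b[2] else DIFFERENT_MASK_MIN
    # Small tolerance so that vias exactly at the limit are accepted.
    return distance >= limit - 1e-9


# Two vias 42 nm apart: legal on different masks, illegal on the same mask.
different_mask_result = via_spacing_ok((0.0, 0.0, 1), (0.042, 0.0, 2))
same_mask_result = via_spacing_ok((0.0, 0.0, 1), (0.042, 0.0, 1))
```

In this sketch a pair of vias at 42 nm centre-to-centre passes only when the two vias are assigned to different masks, which is the behaviour the comments in the listing describe.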

Example of SAxP line + Double color cut:

```
LAYER CUT_M2
  TYPE MASTERSLICE ;
  MASK 2 ;
  PROPERTY LEF58_TYPE "
    TYPE TRIMMETAL ; " ;
  PROPERTY LEF58_TRIMMEDMETAL "
    TRIMMEDMETAL PRM2 ; " ;
  PROPERTY LEF58_TRIMSHAPE "
    TRIMSHAPE EXTENSIONMODEL ADJACENTTRACK EXACTWIDTH 0.024
    MAXLENGTH 1.0000 USEMETALMASK ; " ;
  PROPERTY LEF58_SPACING "
    SPACING 0.050 PRLSPACING 0.050 0.084 ENDTOEND 0.050 PRL 0
    SAMEMASK ; " ;
END CUT_M2

LAYER PRM2
  TYPE ROUTING ;
  MASK 2 ;
  DIRECTION HORIZONTAL ;
  PITCH 0.032 0.032 ;
  WIDTH 0.016 ;
  OFFSET 0 0 ;
  AREA 0.00128 ;
  SPACING 0.016 ;
  MAXWIDTH .160 ;
  MINCLOSEDAREA 0.001 ;
  PROPERTY LEF58_SPACING "SPACING 0.020 SAMEMASK ; " ;
  PROPERTY LEF58_SPACING "SPACING 0.021 ENDOFLINE 0.017 WITHIN 0.000
    SAMEMASK ;" ;
  PROPERTY LEF58_RECTONLY "RECTONLY ;" ;
END PRM2
```
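
As a quick sanity check on a routing layer definition such as PRM2, one can verify that the declared pitch equals width plus spacing. The sketch below does this on an abridged copy of the listing. The helper `layer_numbers` and the regex-based extraction are illustrative assumptions and not a full LEF parser; `Decimal` is used so that the comparison is exact.

```python
import re
from decimal import Decimal

# Abridged copy of the PRM2 layer from the listing above.
LAYER_TEXT = """
LAYER PRM2
  TYPE ROUTING ;
  DIRECTION HORIZONTAL ;
  PITCH 0.032 0.032 ;
  WIDTH 0.016 ;
  SPACING 0.016 ;
END PRM2
"""


def layer_numbers(text):
    """Return the first numeric value of the PITCH, WIDTH and SPACING statements."""
    values = {}
    for key in ("PITCH", "WIDTH", "SPACING"):
        # Only match statements that start a line, so PROPERTY strings are ignored.
        match = re.search(rf"^\s*{key}\s+([0-9.]+)", text, flags=re.MULTILINE)
        if match:
            values[key] = Decimal(match.group(1))
    return values


nums = layer_numbers(LAYER_TEXT)
pitch_is_consistent = nums["PITCH"] == nums["WIDTH"] + nums["SPACING"]
```

For the values in the listing the check holds: a 16 nm line plus a 16 nm space gives the 32 nm pitch.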

### A.2 example of macro .lef

Example of NAND2 cell:

```
MACRO ND2D1
  CLASS CORE ;
  ORIGIN 0 0 ;
  FOREIGN ND2D1 0 0 ;
  SIZE 0.168 BY 0.24 ;
  SYMMETRY X Y ;
  SITE core ;
  PIN A1
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER prMINT ;
      RECT 0.009 0.1095 0.075 0.1305 ;
    END
  END A1
  PIN A2
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER prMINT ;
      RECT 0.093 0.1095 0.159 0.1305 ;
    END
  END A2
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    PORT
      LAYER prMINT ;
      RECT -0.021 0.1735 0.189 0.1945 ;
    END
  END VDD
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    PORT
      LAYER prMINT ;
      RECT -0.021 0.0455 0.189 0.0665 ;
    END
  END VSS
  PIN ZN
```

```

DIRECTION OUTPUT ;
USE SIGNAL ;
PORT
  LAYER prVINT1 ;
  RECT 0.0720 0.1415 0.0960 0.1625 ;
  RECT 0.0720 0.0775 0.0960 0.0985 ;
  LAYER prMINT ;
  RECT 0.009 0.0775 0.159 0.0985 ;
  RECT 0.009 0.1415 0.159 0.1625 ;
  LAYER PRM1 ;
  RECT 0.072 0.072 0.096 0.168 ;
END
END ZN
END ND2D1

```
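
A macro abstract of this kind is easy to summarise with a few lines of scripting. The sketch below extracts the name, the footprint and the pin directions from an abridged copy of the ND2D1 macro, and expresses the cell height in routing tracks. The helper `macro_summary` is hypothetical, and the 0.032 µm track pitch is an assumption taken from the PRM2 layer of section A.1, not something stated in the macro itself.

```python
import re
from decimal import Decimal

# Abridged copy of the ND2D1 macro (port geometry omitted).
MACRO_TEXT = """
MACRO ND2D1
  SIZE 0.168 BY 0.24 ;
  PIN A1
    DIRECTION INPUT ;
  END A1
  PIN A2
    DIRECTION INPUT ;
  END A2
  PIN ZN
    DIRECTION OUTPUT ;
  END ZN
END ND2D1
"""

# Assumed horizontal routing pitch in microns (PRM2 value from section A.1).
ASSUMED_TRACK_PITCH = Decimal("0.032")


def macro_summary(text):
    """Return (name, width, height, {pin: direction}) for a single LEF macro."""
    name = re.search(r"^\s*MACRO\s+(\S+)", text, flags=re.MULTILINE).group(1)
    size = re.search(r"SIZE\s+([0-9.]+)\s+BY\s+([0-9.]+)", text)
    width, height = Decimal(size.group(1)), Decimal(size.group(2))
    pins = dict(re.findall(r"PIN\s+(\S+)\s+DIRECTION\s+(\S+)", text))
    return name, width, height, pins


name, width, height, pins = macro_summary(MACRO_TEXT)
height_in_tracks = height / ASSUMED_TRACK_PITCH
```

Under the assumed pitch, the 0.24 µm height of ND2D1 corresponds to 7.5 tracks.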

Example of Flip Flop cell:

```

MACRO DFCNQSTKD1
  CLASS CORE ;
  ORIGIN 0 0 ;
  FOREIGN DFCNQSTKD1 0 0 ;
  SIZE 0.546 BY 0.48 ;
  SYMMETRY X Y ;
  SITE core ;
  PIN CP
    DIRECTION INPUT ;
    USE CLOCK ;
    PORT
      LAYER prMINT ;
      RECT 0.45 0.1095 0.537 0.1305 ;
    END
  END CP
  PIN D
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER prMINT ;
      RECT 0.009 0.1095 0.18 0.1305 ;
    END
  END D
  PIN VDD
    DIRECTION INOUT ;
    USE POWER ;
    PORT
      LAYER prMINT ;
      RECT -0.021 0.2855 0.18 0.3065 ;
      RECT 0.282 0.2855 0.567 0.3065 ;
      RECT -0.021 0.1735 0.567 0.1945 ;
    END
  END VDD
  PIN VSS
    DIRECTION INOUT ;
    USE GROUND ;
    PORT
      LAYER prMINT ;
      RECT -0.021 0.4135 0.348 0.4345 ;
      RECT 0.45 0.4135 0.567 0.4345 ;
      RECT -0.021 0.0455 0.567 0.0665 ;
    END
  END VSS
  PIN CDN
    DIRECTION INPUT ;
    USE SIGNAL ;
    PORT
      LAYER prVINT1 ;
      RECT 0.3240 0.2535 0.3480 0.2745 ;
      RECT 0.3240 0.1095 0.3480 0.1305 ;
      LAYER prMINT ;
      RECT 0.282 0.1095 0.348 0.1305 ;
      RECT 0.24 0.2535 0.348 0.2745 ;
      LAYER PRM1 ;
      RECT 0.324 0.104 0.348 0.280 ;
    END
  END CDN
  PIN Q
    DIRECTION OUTPUT ;
    USE SIGNAL ;
    PORT
      LAYER prVINT1 ;
      RECT 0.4920 0.3815 0.5160 0.4025 ;
      RECT 0.4920 0.3175 0.5160 0.3385 ;
      LAYER prMINT ;
      RECT 0.45 0.3175 0.537 0.3385 ;
      RECT 0.45 0.3815 0.537 0.4025 ;
      LAYER PRM1 ;
      RECT 0.492 0.312 0.516 0.408 ;
    END
  END Q
  OBS
LAYER prMINT ;
  RECT 0.009 0.4455 0.096 0.4665 ;
  RECT 0.009 0.3815 0.096 0.4025 ;
  RECT 0.009 0.0135 0.096 0.0345 ;
  RECT 0.009 0.2535 0.138 0.2745 ;
  RECT 0.009 0.2055 0.138 0.2265 ;
  RECT 0.009 0.1415 0.138 0.1625 ;
  RECT 0.009 0.0775 0.138 0.0985 ;
  RECT 0.114 0.4455 0.18 0.4665 ;
  RECT 0.009 0.3495 0.18 0.3705 ;
  RECT 0.114 0.0135 0.18 0.0345 ;
  RECT 0.156 0.2535 0.222 0.2745 ;
  RECT 0.198 0.4455 0.264 0.4665 ;
  RECT 0.198 0.2855 0.264 0.3065 ;
  RECT 0.198 0.1095 0.264 0.1305 ;
  RECT 0.198 0.3495 0.306 0.3705 ;
  RECT 0.156 0.1415 0.348 0.1625 ;
  RECT 0.156 0.0775 0.348 0.0985 ;
  RECT 0.198 0.0135 0.348 0.0345 ;
  RECT 0.366 0.4135 0.432 0.4345 ;
  RECT 0.114 0.3815 0.432 0.4025 ;
  RECT 0.324 0.3495 0.432 0.3705 ;
  RECT 0.009 0.3175 0.432 0.3385 ;
  RECT 0.366 0.1415 0.432 0.1625 ;
  RECT 0.366 0.1095 0.432 0.1305 ;
  RECT 0.366 0.0775 0.432 0.0985 ;
  RECT 0.282 0.4455 0.537 0.4665 ;
  RECT 0.45 0.3495 0.537 0.3705 ;
  RECT 0.366 0.2535 0.537 0.2745 ;
  RECT 0.156 0.2055 0.537 0.2265 ;
  RECT 0.45 0.1415 0.537 0.1625 ;
  RECT 0.45 0.0775 0.537 0.0985 ;
  RECT 0.366 0.0135 0.537 0.0345 ;

LAYER prVINT1 ;
  RECT 0.0720 0.3495 0.0960 0.3705 ;
  RECT 0.0720 0.3175 0.0960 0.3385 ;
  RECT 0.1140 0.2535 0.1380 0.2745 ;
  RECT 0.1140 0.2055 0.1380 0.2265 ;
  RECT 0.1560 0.4455 0.1800 0.4665 ;
  RECT 0.1560 0.0135 0.1800 0.0345 ;
  RECT 0.2400 0.2855 0.2640 0.3065 ;
  RECT 0.2400 0.1095 0.2640 0.1305 ;
  RECT 0.2820 0.3495 0.3060 0.3705 ;
  RECT 0.2820 0.0775 0.3060 0.0985 ;
  RECT 0.3660 0.4135 0.3900 0.4345 ;
  RECT 0.3660 0.3175 0.3900 0.3385 ;
  RECT 0.3660 0.2535 0.3900 0.2745 ;
  RECT 0.3660 0.1095 0.3900 0.1305 ;
  RECT 0.3660 0.0135 0.3900 0.0345 ;
  RECT 0.4080 0.3815 0.4320 0.4025 ;
  RECT 0.4080 0.1415 0.4320 0.1625 ;
  RECT 0.4080 0.0775 0.4320 0.0985 ;
  RECT 0.4080 0.3495 0.4320 0.3705 ;
  RECT 0.4500 0.2535 0.4740 0.2745 ;
  RECT 0.4920 0.2055 0.5160 0.2265 ;
  RECT 0.4920 0.1415 0.5160 0.1625 ;
  RECT 0.4920 0.0775 0.5160 0.0985 ;
  RECT 0.4920 0.0135 0.5160 0.0345 ;

LAYER PRM1 ;
  RECT 0.072 0.305 0.096 0.376 ;
  RECT 0.114 0.190 0.138 0.280 ;
  RECT 0.156 0.009 0.180 0.472 ;
  RECT 0.198 0.009 0.222 0.472 ;
  RECT 0.240 0.104 0.264 0.312 ;
  RECT 0.282 0.072 0.306 0.376 ;
  RECT 0.366 0.248 0.390 0.440 ;
  RECT 0.366 0.009 0.390 0.136 ;
  RECT 0.408 0.338 0.432 0.408 ;
  RECT 0.408 0.072 0.432 0.210 ;
  RECT 0.450 0.248 0.474 0.376 ;
  RECT 0.492 0.009 0.516 0.232 ;

LAYER prV12 ;
  RECT 0.1140 0.1900 0.1380 0.2100 ;
  RECT 0.1980 0.1900 0.2220 0.2100 ;
  RECT 0.4080 0.1900 0.4320 0.2100 ;

LAYER PRM2 ;
  RECT 0.0945 0.1900 0.4515 0.2100 ;
  END
END DFCNQSTKD1

```
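
The obstruction geometry of the flip-flop also shows that the vertical PRM1 shapes sit on a regular grid. As a rough illustration, the sketch below recovers that grid pitch as the greatest common divisor of the horizontal offsets between the shapes, and expresses the cell width in multiples of it. The coordinate list is copied by hand from the PRM1 obstruction rectangles above. Reading the resulting pitch as a physical design parameter is an interpretation, not a statement from the listing.

```python
from decimal import Decimal
from functools import reduce
from math import gcd

# Distinct left x-coordinates (microns) of the PRM1 obstruction shapes in DFCNQSTKD1.
PRM1_LEFT_EDGES = ["0.072", "0.114", "0.156", "0.198", "0.240", "0.282",
                   "0.366", "0.408", "0.450", "0.492"]
CELL_WIDTH = "0.546"  # from the SIZE statement of the macro


def to_nm(text):
    """Convert a micron string to an integer number of nanometres."""
    return int(Decimal(text) * 1000)


edges_nm = sorted({to_nm(x) for x in PRM1_LEFT_EDGES})
# Offsets of every shape from the leftmost one; their GCD is the grid pitch.
offsets = [x - edges_nm[0] for x in edges_nm[1:]]
grid_pitch_nm = reduce(gcd, offsets)
width_in_pitches = Decimal(to_nm(CELL_WIDTH)) / grid_pitch_nm
```

With the coordinates of the listing, the shapes fall on a 42 nm grid and the 0.546 µm cell width is 13 such pitches.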

### A.3 example of .lib files

Example of Inverter .lib file in ECSM format.

```

cell (INV1) {
    ecm_vtn : 0.35;
    ecm_vtp : 0.35;

area : 0.03;
cell_leakage_power : 4.54975;
leakage_power () {
    value : 4.54989;
    when : "(I * !(ZN))";
}
leakage_power () {
    value : 4.54962;
    when : "(!(I) * ZN)";
}
pin (ZN) {
    direction : output;
    function : "!(I)";
    max_capacitance : 0.12127;
    timing () {
        related_pin : "I";
        timing_sense : negative_unate;
        timing_type : combinational;
        cell_rise (delay_template_7x7) {
            index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
0.0448141, 0.115949, 0.3");
            index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
0.0194458, 0.0485611, 0.12127");
            values ( \
"0.00335254, 0.00530689, 0.010015, 0.0217976, 0.0505884,
0.122615, 0.300129", \
"0.00409603, 0.00602921, 0.0107541, 0.0224899, 0.0513477,
0.122841, 0.300836", \
"0.00569017, 0.00811754, 0.0127568, 0.0244178, 0.0530619,
0.12504, 0.302774", \
"0.00782638, 0.0114506, 0.0179332, 0.0297408, 0.0586944,
0.131499, 0.307903", \
"0.0108663, 0.0161925, 0.0257478, 0.0428425, 0.0728573,
0.144361, 0.321192", \
"0.0146708, 0.0223811, 0.0366881, 0.0620898, 0.106091,
0.181803, 0.35874", \
"0.0187316, 0.0299765, 0.0510517, 0.0886282, 0.154219,
0.267064, 0.45919" \
);
        }
        rise_transition (delay_template_7x7) {
            index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
0.0448141, 0.115949, 0.3");
    index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
              0.0194458, 0.0485611, 0.12127");
    values ( \
              "0.00201513, 0.00385998, 0.00848931, 0.0198333,
              0.0483891, 0.119772, 0.294567", \
              "0.00203101, 0.00385906, 0.00846885, 0.0199212,
              0.0484416, 0.118906, 0.294539", \
              "0.00305525, 0.00433548, 0.00848252, 0.0198254,
              0.0486136, 0.1197, 0.29457", \
              "0.00492692, 0.00674523, 0.0103062, 0.0199913, 0.0482429,
              0.120278, 0.293989", \
              "0.0084972, 0.0110769, 0.0161276, 0.0250458, 0.0486231,
              0.120003, 0.294234", \
              "0.0156795, 0.0196358, 0.0268796, 0.0395214, 0.0634856,
              0.120392, 0.294614", \
              "0.0297508, 0.0366228, 0.0477077, 0.0668167, 0.0991537,
              0.158871, 0.298304" \
);
    ecsm_waveform ("0") {
        index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                   0.9, 0.95";
        values : \
                  "0.0863379, 0.086526, 0.0869129, 0.0873141, 0.0877318,
                  0.0881871, 0.0887036, 0.0893292, 0.0901428,
                  0.0914847, 0.0927247";
    }
    ecsm_waveform ("1") {
        index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                   0.9, 0.95";
        values : \
                  "0.0865835, 0.0869451, 0.0876844, 0.0884592, 0.089279,
                  0.0901415, 0.0911358, 0.0923192, 0.0938623,
                  0.0964228, 0.0988053";
    }
    ecsm_waveform ("2") {
        index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                   0.9, 0.95";
        values : \
                  "0.0870559, 0.087848, 0.0894743, 0.0911676, 0.0929747,
                  0.0948495, 0.0970553, 0.0996569, 0.103055, 0.108664,
                  0.113853";
    }
    ecsm_waveform ("3") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
      "0.0881405, 0.090009, 0.0938599, 0.097856, 0.102045,
       0.106632, 0.111587, 0.117689, 0.125969, 0.138893,
       0.151545";
}
ecsm_waveform ("4") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
      "0.0907954, 0.0953429, 0.104675, 0.114491, 0.124378,
       0.135423, 0.14781, 0.16288, 0.182361, 0.214623,
       0.244978";
}
ecsm_waveform ("5") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
      "0.0974064, 0.108671, 0.13177, 0.155443, 0.180457,
       0.207449, 0.238146, 0.275215, 0.324523, 0.403113,
       0.477544";
}
ecsm_waveform ("6") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
      "0.113907, 0.141832, 0.198568, 0.25699, 0.318197,
       0.384963, 0.460112, 0.551557, 0.673064, 0.873741,
       1.06947";
}
ecsm_waveform ("7") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
      "0.0864256, 0.0868608, 0.0875034, 0.0880472, 0.0884735,
       0.0889306, 0.0894484, 0.0900782, 0.0909019,
       0.0922388, 0.0934496";
}
ecsm_waveform ("8") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
    "0.0869483, 0.0875429, 0.0884232, 0.0891849, 0.0899899,
     0.0908638, 0.0918448, 0.0930439, 0.0946109,
     0.0971246, 0.0994826";
}
ecsm_waveform ("9") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0877078, 0.0885938, 0.0902074, 0.091887, 0.0936328,
     0.0955887, 0.0977281, 0.100356, 0.103791, 0.109335,
     0.114454";
}
ecsm_waveform ("10") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0888868, 0.0907478, 0.094577, 0.0985397, 0.102787,
     0.107324, 0.112342, 0.118461, 0.126612, 0.139661,
     0.152267";
}
ecsm_waveform ("11") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0915484, 0.0960959, 0.105434, 0.115209, 0.125136,
     0.136182, 0.148571, 0.163651, 0.183169, 0.215412,
     0.245664";
}
ecsm_waveform ("12") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0981551, 0.109394, 0.132318, 0.156032, 0.181116,
     0.207676, 0.237999, 0.274938, 0.32449, 0.40231,
     0.479344";
}
ecsm_waveform ("13") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.114663, 0.142596, 0.19935, 0.257796, 0.318969,
     0.38567, 0.460974, 0.552335, 0.674025, 0.874663,
     1.07028";
}
ecsm_waveform ("14") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.08566653, 0.0867589, 0.0880293, 0.088975, 0.0897708,
         0.0905247, 0.0912461, 0.0920303, 0.0930309,
         0.0944959, 0.0959357";
}
ecsm_waveform ("15") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0866885, 0.0879587, 0.0895639, 0.0908547, 0.0919403,
         0.0929521, 0.0939628, 0.0951902, 0.0968602,
         0.0996397, 0.102381";
}
ecsm_waveform ("16") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0879164, 0.0896729, 0.092042, 0.0938902, 0.0956534,
         0.0975913, 0.0997686, 0.102373, 0.10582, 0.111382,
         0.116465";
}
ecsm_waveform ("17") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0900688, 0.09265, 0.0965231, 0.10047, 0.104661,
         0.109252, 0.11417, 0.120296, 0.128567, 0.141369,
         0.15413";
}
ecsm_waveform ("18") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0934808, 0.0980155, 0.107306, 0.116812, 0.126943,
         0.137896, 0.150314, 0.165426, 0.185174, 0.217127,
         0.246571";
}
ecsm_waveform ("19") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
    "0.100051, 0.11129, 0.134147, 0.158162, 0.182587,
     0.209874, 0.240527, 0.277862, 0.326157, 0.405901,
     0.480723";
}
ecsm_waveform ("20") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.116543, 0.14446, 0.201194, 0.259606, 0.320861,
     0.387608, 0.462706, 0.554176, 0.675624, 0.876313,
     1.07208";
}
ecsm_waveform ("21") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0829162, 0.0854737, 0.0882455, 0.0900357, 0.0914437,
     0.0926609, 0.0938364, 0.0949627, 0.0962648,
     0.0981529, 0.100023";
}
ecsm_waveform ("22") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0842729, 0.0871179, 0.0904851, 0.0927436, 0.0946041,
     0.0962852, 0.0978474, 0.0994889, 0.101473,
     0.104529, 0.107293";
}
ecsm_waveform ("23") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0866112, 0.0901734, 0.0944585, 0.0975818, 0.100314,
     0.102768, 0.105203, 0.107888, 0.111566, 0.117712,
     0.123817";
}
ecsm_waveform ("24") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0901571, 0.0949526, 0.101059, 0.105852, 0.110012,
     0.114575, 0.11971, 0.125843, 0.134024, 0.147077,
     0.159299";
}
ecsm_waveform ("25") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.096222, 0.102887, 0.112542, 0.12221, 0.132413,
         0.143529, 0.155592, 0.170453, 0.190668, 0.221937,
         0.253118";
}
ecsm_waveform ("26") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.104893, 0.116391, 0.139433, 0.163484, 0.188172,
         0.216333, 0.246362, 0.283762, 0.333258, 0.411179,
         0.486729";
}
ecsm_waveform ("27") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.121548, 0.149477, 0.20622, 0.264803, 0.326321,
         0.392737, 0.467716, 0.558792, 0.681341, 0.881418,
         1.07639";
}
ecsm_waveform ("28") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0751254, 0.0816282, 0.0876408, 0.0910986, 0.0936006,
         0.0957009, 0.0976762, 0.0995958, 0.101588,
         0.104356, 0.107092";
}
ecsm_waveform ("29") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0768998, 0.0838443, 0.09076, 0.0950233, 0.0981905,
         0.101027, 0.103577, 0.1061, 0.108878, 0.112818,
         0.116366";
}
ecsm_waveform ("30") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
    "0.0803593, 0.0880044, 0.0964903, 0.102091, 0.106596,
     0.110582, 0.114451, 0.118219, 0.12275, 0.129352,
     0.135434";
}
ecsm_waveform ("31") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95";
    values : \
    "0.0866671, 0.095842, 0.10728, 0.115031, 0.121634,
     0.127677, 0.133579, 0.140077, 0.148639, 0.163054,
     0.177648";
}
ecsm_waveform ("32") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95";
    values : \
    "0.0964312, 0.108729, 0.124378, 0.136228, 0.1467,
     0.157692, 0.169904, 0.184851, 0.204969, 0.23647,
     0.26723";
}
ecsm_waveform ("33") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95";
    values : \
    "0.112185, 0.129159, 0.153308, 0.17707, 0.202333,
     0.229196, 0.259791, 0.297072, 0.345411, 0.425192,
     0.499918";
}
ecsm_waveform ("34") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95";
    values : \
    "0.13405, 0.163066, 0.220076, 0.278666, 0.340034,
     0.406027, 0.48194, 0.572899, 0.695311, 0.895754,
     1.0909";
}
ecsm_waveform ("35") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
    0.9, 0.95";
    values : \
    "0.0553622, 0.0693487, 0.0838552, 0.0907791, 0.0955796,
     0.0995054, 0.103015, 0.106459, 0.109917, 0.115146,
     0.122327";
}
ecsm_waveform ("36") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0564577, 0.0732424, 0.0881393, 0.0964378, 0.102344,
         0.107216, 0.111788, 0.116074, 0.120548, 0.127182,
         0.133163";
}
ecsm_waveform ("37") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0607198, 0.0788947, 0.0968258, 0.107074, 0.114978,
         0.121523, 0.127874, 0.133953, 0.140454, 0.149485,
         0.157556";
}
ecsm_waveform ("38") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0700377, 0.0895019, 0.111917, 0.125985, 0.136862,
         0.146924, 0.156178, 0.165507, 0.17635, 0.192136,
         0.205877";
}
ecsm_waveform ("39") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.087129, 0.11106, 0.138856, 0.158615, 0.175968,
         0.190925, 0.205726, 0.2221, 0.242783, 0.278349,
         0.313688";
}
ecsm_waveform ("40") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.113162, 0.144133, 0.18388, 0.213665, 0.239607,
         0.266637, 0.296894, 0.334057, 0.383667, 0.461967,
         0.537332";
}
ecsm_waveform ("41") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
    "0.151074, 0.194301, 0.256893, 0.31556, 0.376799,
     0.443574, 0.518731, 0.610174, 0.731688, 0.932364,
     1.12809";
}

ecsm_waveform ("42") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0, 0.0417621, 0.073075, 0.0867889, 0.0960055,
     0.103566, 0.110287, 0.11654, 0.123375, 0.136698,
     0.172901";
}

ecsm_waveform ("43") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.00415395, 0.0410651, 0.0777969, 0.0943512, 0.105679,
     0.114811, 0.123085, 0.130974, 0.139087, 0.152246,
     0.174188";
}

ecsm_waveform ("44") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.00777315, 0.0510987, 0.0896272, 0.109728, 0.12422,
     0.135886, 0.147117, 0.157436, 0.168211, 0.184201,
     0.199631";
}

ecsm_waveform ("45") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0196007, 0.066153, 0.111578, 0.137302, 0.157358,
     0.173463, 0.189445, 0.204118, 0.220078, 0.242282,
     0.261826";
}

ecsm_waveform ("46") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
     0.9, 0.95";
    values : \
    "0.0471859, 0.0955983, 0.150883, 0.185981, 0.21444,
     0.239054, 0.262179, 0.285134, 0.312839, 0.352474,
     0.386748";
}

ecsm_waveform ("47") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.0894216, 0.149976, 0.221027, 0.271656, 0.314179,
         0.351899, 0.389962, 0.430527, 0.481356, 0.56945,
         0.657851";
}
ecsm_waveform ("48") {
    index_1 : "0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
               0.9, 0.95";
    values : \
        "0.159741, 0.236195, 0.336883, 0.412264, 0.476751,
         0.544024, 0.619065, 0.710568, 0.832776, 1.03276,
         1.22826";
}
ecsm_capacitance (fall) {
    threshold_pct : "70.0";
    values : \
        "0.000339402, 0.000353384, 0.00036221, 0.000366373,
         0.000368082, 0.00036873, 0.000368969, \
         0.000368186, 0.000378663, 0.000386673, 0.000391106,
         0.000393009, 0.000393699, 0.000393913, \
         0.000393308, 0.000393126, 0.000398431, 0.000401837,
         0.000403471, 0.000404098, 0.00040416, \
         0.000403393, 0.00040825, 0.000405236, 0.000406814,
         0.000407743, 0.000408066, 0.000408083, \
         0.000410977, 0.00041037, 0.000409924, 0.000409879,
         0.000409709, 0.000409699, 0.000409604, \
         0.000421923, 0.000419235, 0.000418294, 0.000416247,
         0.000415644, 0.000414695, 0.000414112, \
         0.000424124, 0.000425303, 0.00042348, 0.000422767,
         0.000420278, 0.000417903, 0.000416724";
}
ecsm_capacitance (fall) {
    threshold_pct : "50.0";
    values : \
        "0.000361355, 0.000374137, 0.000382772, 0.000387167,
         0.000389049, 0.000389781, 0.000390049, \
         0.00039028, 0.000395677, 0.000400869, 0.000404291,
         0.000405862, 0.000406451, 0.000406642, \
         0.00041982, 0.000413325, 0.00041209, 0.00041258,
         0.000412986, 0.00041258, 0.000412464, \
    0.000443898, 0.000437316, 0.000426226, 0.000420632,
    0.000418142, 0.000416942, 0.000415633, \
0.000466391, 0.000455706, 0.000443063, 0.000431955,
    0.000424385, 0.000420436, 0.000418582, \
0.000489363, 0.000477747, 0.000464411, 0.000448567,
    0.000437383, 0.000429141, 0.000424479, \
0.00051302, 0.000500088, 0.000485297, 0.000470054,
    0.000453622, 0.000439464, 0.000430442";
}
ecsm_capacitance (fall) {
    threshold_pct : "30.0";
    values : \
    "0.000383253, 0.000392332, 0.000398774, 0.000402223,
    0.000403761, 0.000404377, 0.000404611, \
0.000417231, 0.000415154, 0.000415045, 0.000416011,
    0.000416418, 0.000416574, 0.000416631, \
0.000461423, 0.000442239, 0.000430185, 0.000424933,
    0.000422737, 0.000419982, 0.000419434, \
0.000533589, 0.000488303, 0.000454767, 0.000437184,
    0.000429152, 0.000425598, 0.000422296, \
0.000634368, 0.000576159, 0.000503046, 0.000461632,
    0.000441087, 0.000431209, 0.000426993, \
0.000644004, 0.000642785, 0.000602647, 0.000513701,
    0.000467331, 0.000445326, 0.000434401, \
0.000648675, 0.000647122, 0.000646083, 0.000616936,
    0.000521858, 0.000469632, 0.000444873";
}
}
cell_fall (delay_template_7x7) {
    index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
    0.0448141, 0.115949, 0.3");
    index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
    0.0194458, 0.0485611, 0.12127");
    values ( \
    "0.0034676, 0.00550566, 0.0104225, 0.0224271, 0.052295,
    0.127232, 0.31066", \
    "0.00420427, 0.00621899, 0.0110833, 0.023169, 0.0529816,
    0.127889, 0.311518", \
    "0.00587731, 0.00834427, 0.0130677, 0.0250606, 0.0548789,
    0.129932, 0.313467", \
    "0.00817934, 0.0118593, 0.0184261, 0.0304457, 0.0600514,
    0.134712, 0.318392", \
    "0.0116508, 0.017028, 0.0267092, 0.0439937, 0.0745947,
    0.148936, 0.332365", \
    "0.0165942, 0.0243544, 0.0387695, 0.0645047, 0.109245,
     0.186219, 0.369725", \
    "0.0235311, 0.0349326, 0.0561381, 0.0939816, 0.160388,
     0.275094, 0.470612" \
);
}

fall_transition (delay_template_7x7) {
    index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
              0.0448141, 0.115949, 0.3");
    index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
              0.0194458, 0.0485611, 0.12127");
    values ( \
        "0.0020361, 0.00391714, 0.00858225, 0.0202622, 0.0485294,
         0.120908, 0.29588", \
        "0.00202729, 0.00389959, 0.0085165, 0.0200598, 0.0486105,
         0.1211, 0.295698", \
        "0.00306626, 0.00432597, 0.00852014, 0.0200542,
         0.0487401, 0.120844, 0.29582", \
        "0.00495468, 0.00680563, 0.0102771, 0.019973, 0.0486213,
         0.120622, 0.295593", \
        "0.00856476, 0.0111807, 0.0163032, 0.0253694, 0.0485789,
         0.120977, 0.296271", \
        "0.0157013, 0.0197161, 0.0271501, 0.0397862, 0.0636442,
         0.120302, 0.296372", \
        "0.0300137, 0.0366322, 0.0479006, 0.0676158, 0.100576,
         0.158867, 0.298448" \
);
}

ecsm_waveform ("0") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0840928, 0.0842901, 0.0846947, 0.0851118, 0.0855414,
         0.0860098, 0.0865279, 0.0871479, 0.0879656,
         0.0892279, 0.0904434";
}

ecsm_waveform ("1") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0843512, 0.0847301, 0.0855044, 0.0863117, 0.0871596,
         0.0880478, 0.0890591, 0.0902289, 0.0917899,
         0.094175, 0.0965154";
}

ecsm_waveform ("2") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0848457, 0.0856757, 0.0873753, 0.0891402, 0.0910001,
         0.0929647, 0.0951578, 0.0977225, 0.101139,
         0.106408, 0.111531";
}
ecsm_waveform ("3") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0859828, 0.0879401, 0.0919547, 0.0960854, 0.100512,
         0.104969, 0.110313, 0.116348, 0.12436, 0.137018,
         0.148791";
}
ecsm_waveform ("4") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0887692, 0.093543, 0.103269, 0.113283, 0.123867,
         0.134837, 0.147155, 0.161812, 0.181333, 0.212356,
         0.241612";
}
ecsm_waveform ("5") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0956981, 0.107485, 0.131406, 0.156386, 0.18159,
         0.209774, 0.2404, 0.277293, 0.325419, 0.401594,
         0.473203";
}
ecsm_waveform ("6") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.113002, 0.142304, 0.201711, 0.26268, 0.326014,
         0.393202, 0.469053, 0.558561, 0.677941, 0.871131,
         1.05879";
}
ecsm_waveform ("7") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0842121, 0.0846516, 0.0853048, 0.0858495, 0.0862799,
     0.0867465, 0.0872634, 0.0878768, 0.0886803,
     0.0899798, 0.0911454";
}

ecsm_waveform ("8") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0847383, 0.085331, 0.0862307, 0.0870226, 0.0878632,
     0.0887612, 0.0897797, 0.0909222, 0.0924866,
     0.094872, 0.0972367";
}

ecsm_waveform ("9") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0855191, 0.0864151, 0.0881041, 0.0898524, 0.091682,
     0.0936254, 0.0957787, 0.0983689, 0.101749, 0.107084,
     0.112107";
}

ecsm_waveform ("10") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0867249, 0.0886741, 0.0926846, 0.0968137, 0.101187,
     0.105711, 0.110763, 0.116874, 0.124903, 0.137263,
     0.149553";
}

ecsm_waveform ("11") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0894821, 0.0942452, 0.103987, 0.114142, 0.124361,
     0.135524, 0.147943, 0.162752, 0.18192, 0.213184,
     0.241949";
}

ecsm_waveform ("12") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0964155, 0.108212, 0.132332, 0.156935, 0.18334,
     0.210431, 0.241436, 0.278035, 0.32593, 0.403007,
     0.472775";
}
ecsm_waveform ("13") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.113712, 0.14301, 0.202427, 0.263414, 0.326785,
         0.39406, 0.469762, 0.559112, 0.678649, 0.871652,
         1.05974";
}
ecsm_waveform ("14") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0835177, 0.0845303, 0.0858864, 0.0868386, 0.0876767,
         0.0884195, 0.0891347, 0.0899049, 0.0908635,
         0.092281, 0.093708";
}
ecsm_waveform ("15") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0845825, 0.0858261, 0.0875101, 0.0887589, 0.0898748,
         0.0908865, 0.0918802, 0.0930849, 0.0947106,
         0.0974037, 0.100057";
}
ecsm_waveform ("16") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0857558, 0.0875536, 0.0899552, 0.0918542, 0.0936716,
         0.0956099, 0.0978295, 0.100374, 0.103747, 0.109133,
         0.11407";
}
ecsm_waveform ("17") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0879689, 0.0906132, 0.0946191, 0.098742, 0.10303,
         0.107603, 0.112685, 0.118796, 0.126742, 0.13942,
         0.151113";
}
ecsm_waveform ("18") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0915384, 0.0962901, 0.106007, 0.115906, 0.126625,
     0.137421, 0.149962, 0.164646, 0.183829, 0.215206,
     0.244078";
}

ecsm_waveform ("19") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0984677, 0.11025, 0.134193, 0.159114, 0.184415,
     0.212474, 0.243072, 0.279958, 0.328047, 0.404378,
     0.47565";
}

ecsm_waveform ("20") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.115756, 0.14505, 0.20447, 0.265444, 0.32879,
     0.396009, 0.47181, 0.561264, 0.680699, 0.873825,
     1.06164";
}

ecsm_waveform ("21") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0809908, 0.0833347, 0.0862148, 0.0880499, 0.0894461,
     0.0907215, 0.0918703, 0.0930045, 0.0943101,
     0.0961928, 0.0978998";
}

ecsm_waveform ("22") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0822125, 0.0851839, 0.0885156, 0.0907975, 0.0927213,
     0.0944015, 0.0959739, 0.0976032, 0.0995704,
     0.102511, 0.105152";
}

ecsm_waveform ("23") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0845781, 0.0882132, 0.0925344, 0.0957435, 0.0984871,
     0.100968, 0.103393, 0.106021, 0.109584, 0.115394,
     0.121325";
}
ecsm_waveform ("24") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0881819, 0.093061, 0.0992618, 0.104172, 0.108434,
         0.112988, 0.118125, 0.124145, 0.13198, 0.144813,
         0.156649";
}
ecsm_waveform ("25") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.094381, 0.101171, 0.111138, 0.121166, 0.131461,
         0.142594, 0.154988, 0.169788, 0.188942, 0.220197,
         0.2491";
}
ecsm_waveform ("26") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.103227, 0.115102, 0.139104, 0.163846, 0.1897,
         0.217254, 0.247616, 0.284469, 0.332826, 0.407939,
         0.481536";
}
ecsm_waveform ("27") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.120562, 0.149817, 0.209241, 0.27023, 0.333617,
         0.400934, 0.476557, 0.565823, 0.685438, 0.878345,
         1.06665";
}
ecsm_waveform ("28") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0732794, 0.0799634, 0.0860637, 0.0895595, 0.0920805,
         0.094193, 0.0961823, 0.0981242, 0.100101, 0.102821,
         0.105444";
}
ecsm_waveform ("29") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0750635, 0.0821888, 0.0891884, 0.0935045, 0.0967038,
     0.0995702, 0.102157, 0.104685, 0.107449, 0.111258,
     0.114744";
}

ecsm_waveform ("30") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0785423, 0.0864003, 0.0949528, 0.100625, 0.105232,
     0.109251, 0.113151, 0.116929, 0.121377, 0.127746,
     0.133681";
}

ecsm_waveform ("31") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0847891, 0.0943661, 0.105462, 0.113526, 0.120502,
     0.126536, 0.132482, 0.138895, 0.147145, 0.161261,
     0.175165";
}

ecsm_waveform ("32") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.0948557, 0.107384, 0.123244, 0.135443, 0.146059,
     0.157137, 0.169443, 0.184022, 0.203552, 0.234298,
     0.263977";
}

ecsm_waveform ("33") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.110762, 0.128285, 0.153173, 0.177913, 0.203999,
     0.231478, 0.262, 0.29889, 0.347154, 0.422894,
     0.495349";
}

ecsm_waveform ("34") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.133487, 0.163436, 0.223024, 0.283974, 0.347185,
     0.414907, 0.490301, 0.580245, 0.699103, 0.892843,
     1.07981";
}
ecsm_waveform ("35") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0540376, 0.0687267, 0.0834222, 0.0904079, 0.0952268,
         0.0991363, 0.102696, 0.106109, 0.109575, 0.115,
         0.120997";
}
ecsm_waveform ("36") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0551311, 0.0724673, 0.0877271, 0.0960877, 0.102024,
         0.106897, 0.111531, 0.115804, 0.120346, 0.126608,
         0.132454";
}
ecsm_waveform ("37") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0589267, 0.0780862, 0.0964026, 0.106719, 0.114745,
         0.121312, 0.127736, 0.133869, 0.140342, 0.149127,
         0.157055";
}
ecsm_waveform ("38") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.068897, 0.0887122, 0.111513, 0.125788, 0.136841,
         0.147047, 0.15633, 0.165574, 0.176618, 0.19229,
         0.206073";
}
ecsm_waveform ("39") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.086248, 0.110643, 0.138898, 0.15906, 0.176554,
         0.191787, 0.206636, 0.222704, 0.242975, 0.276974,
         0.310491";
}
ecsm_waveform ("40") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
    "0.112464, 0.144218, 0.184811, 0.215178, 0.241461,
     0.268761, 0.299045, 0.33548, 0.38389, 0.459184,
     0.53317";
}

ecsm_waveform ("41") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.151559, 0.19553, 0.260047, 0.321169, 0.384394,
         0.452268, 0.527439, 0.617541, 0.736156, 0.930118,
         1.11748";
}

ecsm_waveform ("42") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0, 0.0427982, 0.0757943, 0.0893729, 0.0986329,
         0.106073, 0.112856, 0.119387, 0.126185, 0.138799,
         0.173422";
}

ecsm_waveform ("43") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.00537416, 0.0431728, 0.0803491, 0.0970336, 0.108398,
         0.117475, 0.125916, 0.133666, 0.142002, 0.154539,
         0.174961";
}

ecsm_waveform ("44") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.00785399, 0.0527106, 0.0922734, 0.112288, 0.127003,
         0.13868, 0.149937, 0.160189, 0.170826, 0.185729,
         0.201066";
}

ecsm_waveform ("45") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0199427, 0.0679461, 0.113806, 0.139886, 0.160175,
         0.176524, 0.192685, 0.207502, 0.223423, 0.245147,
         0.263619";
}
ecsm_waveform ("46") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0484317, 0.098135, 0.153422, 0.188999, 0.21841,
         0.24293, 0.266482, 0.289575, 0.316288, 0.354445,
         0.38808";
}
ecsm_waveform ("47") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.0910676, 0.152969, 0.224532, 0.27607, 0.319313,
         0.357636, 0.39485, 0.434938, 0.484403, 0.568798,
         0.652228";
}
ecsm_waveform ("48") {
    index_1 : "0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2,
               0.1, 0.05";
    values : \
        "0.161975, 0.240272, 0.342859, 0.419562, 0.484954,
         0.553154, 0.62852, 0.71801, 0.837376, 1.03028,
         1.21908";
}
ecsm_capacitance (rise) {
    threshold_pct : "30.0";
    values : \
        "0.000337908, 0.000351925, 0.000360669, 0.000364853,
         0.000366609, 0.000367196, 0.000367387, \
         0.000367237, 0.000377642, 0.000385997, 0.000390465,
         0.000392425, 0.000393071, 0.000393235, \
         0.000388669, 0.000392627, 0.000398071, 0.000401538,
         0.000403168, 0.000403733, 0.000403851, \
         0.000403919, 0.000403322, 0.00040498, 0.00040665,
         0.000407607, 0.00040795, 0.000407973, \
         0.000410705, 0.000410251, 0.000409799, 0.000409944,
         0.00040965, 0.000409683, 0.000409633, \
         0.000421758, 0.000419106, 0.000417327, 0.000416187,
         0.000415615, 0.00041463, 0.000414077, \
         0.000424755, 0.000425144, 0.000423404, 0.000422686,
         0.000420217, 0.000418052, 0.000416712";
}
ecsm_capacitance (rise) {
    threshold_pct : "50.0";
values : \
"0.000360762, 0.000373704, 0.000382278, 0.000386684,
 0.000388607, 0.000389301, 0.000389537, \
0.000390258, 0.000395561, 0.000401211, 0.000404678,
 0.0004052, 0.000405797, 0.000405971, \
0.000417577, 0.000413836, 0.00041257, 0.000413224,
 0.000413488, 0.000413612, 0.000413636, \
0.000442332, 0.000434216, 0.000426373, 0.000421133,
 0.000418855, 0.000416818, 0.000416087, \
0.00046413, 0.00045449, 0.000442666, 0.00043219,
 0.00042491, 0.000421254, 0.000419433, \
0.000484533, 0.000474735, 0.000461188, 0.000448266,
 0.000437777, 0.000429815, 0.000425388, \
0.000503933, 0.000493982, 0.000481827, 0.000468465,
 0.000453252, 0.000439976, 0.000431257";
}

ecsm_capacitance (rise) {
  threshold_pct : "70.0";
  values : \
"0.00038423, 0.000393663, 0.000400049, 0.000403541,
 0.000405128, 0.000405736, 0.000405957, \
0.000418192, 0.000416587, 0.000417208, 0.000418037,
 0.000416613, 0.000416832, 0.000416899, \
0.00045928, 0.000442585, 0.000431907, 0.00042705,
 0.000424349, 0.000422952, 0.000422555, \
0.000528359, 0.00048474, 0.000455398, 0.000438816,
 0.000431333, 0.000426575, 0.000425167, \
0.000631891, 0.000568644, 0.000500648, 0.000461465,
 0.00044243, 0.00043354, 0.000428937, \
0.000643452, 0.000643496, 0.000593539, 0.000510921,
 0.000467514, 0.000446892, 0.00043644, \
0.000648325, 0.000647983, 0.000645735, 0.000610485,
 0.000518682, 0.000469719, 0.000446256";
}

internal_power () {
  related_pin : "I";
  rise_power (power_template_7x7) {
    index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
      0.0448141, 0.115949, 0.3");
    index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
      0.0194458, 0.0485611, 0.12127");
    values ( \
    "9.93131e-05, 0.000104969, 0.00010846, 0.000110127,
     0.000110533, 0.00011063, 0.000116064", \
    "9.39162e-05, 9.98259e-05, 0.000105046, 0.000108592,
     0.000110239, 0.000110687, 0.000116345", \
    "0.000104474, 0.000105882, 0.000106727, 0.000108893,
     0.000110212, 0.000110368, 0.000116164", \
    "0.00015718, 0.000146052, 0.00013651, 0.000126264,
     0.000119062, 0.000115018, 0.000118042", \
    "0.000340948, 0.000308445, 0.000262781, 0.000214853,
     0.000170211, 0.000141595, 0.000132612", \
    "0.000866602, 0.0008031, 0.000702017, 0.000569104,
     0.000414471, 0.000282746, 0.000207879", \
    "0.00224378, 0.00216219, 0.00199342, 0.00171469,
     0.00133114, 0.000924823, 0.000607151" \
);
}

fall_power (power_template_7x7) {
    index_1 ("0.001, 0.00258734, 0.00669433, 0.0173205,
     0.0448141, 0.115949, 0.3");
    index_2 ("0.0005, 0.00124863, 0.00311815, 0.00778684,
     0.0194458, 0.0485611, 0.12127");
    values (
        "9.97004e-05, 0.000105458, 0.000109176, 0.000110937,
         0.000111065, 0.000111203, 0.000115254", \
        "9.38779e-05, 0.00010016, 0.000105653, 0.000109306,
         0.000110479, 0.000110998, 0.000114822", \
        "0.000104202, 0.000106118, 0.000106757, 0.000109011,
         0.000110685, 0.000111211, 0.000115542", \
        "0.000158412, 0.000145965, 0.000136108, 0.000126085,
         0.000118826, 0.000114226, 0.000117382", \
        "0.000340677, 0.000308096, 0.000262408, 0.000220155,
         0.000170108, 0.000141802, 0.000131498", \
        "0.000865408, 0.000801752, 0.00070537, 0.000567273,
         0.000412256, 0.000281087, 0.000204636", \
        "0.00223998, 0.00215936, 0.0019903, 0.00171054,
         0.00132699, 0.000920284, 0.000596869" \
);
}

pin (I) {
    direction : input;
    max_transition : 0.3;
    capacitance : 0.00051302;
    rise_capacitance : 0.000503933;
    rise_capacitance_range (0.000360762, 0.000503933);
    fall_capacitance : 0.00051302;
    fall_capacitance_range (0.000361355, 0.00051302);
}
}

```
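As a reading aid for the ECSM waveform tables above, the following minimal sketch extracts a 50% crossing time and a 30%-70% transition time from one table. It assumes that `index_1` holds normalized voltage levels and that `values` holds the corresponding times in the library time unit; neither is stated in this excerpt. The helper `crossing_time` is hypothetical and uses plain linear interpolation, which is not necessarily what a timing tool does.

```python
# Illustrative sketch only: data copied from ecsm_waveform ("41") above.
# ASSUMPTION: index_1 = normalized voltage levels, values = times at which
# the waveform reaches each level (library time unit, not given here).
INDEX_1 = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
VALUES = [0.151559, 0.19553, 0.260047, 0.321169, 0.384394,
          0.452268, 0.527439, 0.617541, 0.736156, 0.930118, 1.11748]


def crossing_time(level, levels=INDEX_1, times=VALUES):
    """Return the linearly interpolated time at which the waveform crosses `level`."""
    for i in range(len(levels) - 1):
        l0, l1 = levels[i], levels[i + 1]
        if min(l0, l1) <= level <= max(l0, l1):
            frac = (level - l0) / (l1 - l0)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError(f"level {level} is outside the tabulated range")


# Time to reach half swing, and transition time between the 70% and 30% levels.
t50 = crossing_time(0.5)
slew_30_70 = crossing_time(0.3) - crossing_time(0.7)
print(f"t50 = {t50:.6f}, slew(30-70) = {slew_30_70:.6f}")
```

The 30% and 70% levels are chosen here only because they also appear as `threshold_pct` values in the `ecsm_capacitance` groups; any other pair of tabulated levels can be used in the same way.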

## A.4 Example of .ict file

Example of Conductor layer:

```

conductor PRM2 {
    min_spacing          0.01
    min_width            0.016
    delta_height         0.000
    delta_layer          ILD2
    thickness            0.024
    gate_forming_layer  FALSE
    wire_top_enlargement 0.000628893351396
    wire_bottom_enlargement -0.000628893351396
    rho
        rho_silicon_widths      0.0134543426441 0.0142543794902
        0.0151019890029 0.016 0.0169514095097 0.0179593927729
        0.01902731384 0.0201587367983 0.0213574376667 0.022627416998
        0.048
        rho_silicon_thicknesses 0.0192 0.0216 0.024 0.0264 0.0288
        rho_values              0.103439821126 0.101029313766
        0.0986461898954 0.0963035022837 0.0940141481147
        0.0917905339075 0.0896442313782 0.087585641393
        0.0856236842884 0.0837655344357 0.0687746109504
    0.099657109793 0.097233415223 0.0948367718086 0.0924804231165
    0.0901774298176 0.0879403323222 0.0857808082054 0.0837093416268
    0.0817349225334 0.0798647925037 0.0648270009075
    0.0965646015145 0.0941332398703 0.0917289402786 0.0893650733368
    0.0870547903086 0.0848106851104 0.0826444552493 0.0805665788665
    0.0785860251058 0.0767100136023 0.0616656602116
    0.0940143169278 0.0915796313751 0.0891722078554 0.0868054961639
    0.0844926863359 0.082246371305 0.0800782126667 0.0779986265503
    0.0760165061831 0.0741389958641 0.0591140929425
    0.0918919823244 0.0894572223276 0.0870500056379 0.08468382649
    0.0823718782161 0.0801267175474 0.0779599359912 0.075881855048
    0.0739012611742 0.0720251941532 0.0570340438274
}
}
}
```

Example of Via layer:

```
via    prV12 {  
    bottom_layer      PRM1  
    top_layer         PRM2  
    area_resistance  66.7143535945 0.00016384 48.6442635641  
                    0.00020736 36.819840797 0.000256 28.7230294651 0.00030976  
                    22.9712714323 0.00036864 13.1572696083 0.000576 6.64648926901  
                    0.001024 4.01341840115 0.0016  
}
```
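The `area_resistance` entry above is a flat list of numbers. A minimal sketch of how it could be read is given below, assuming the list alternates resistance and via area (the magnitudes suggest this: the resistance falls as the area grows). Units are not given in this excerpt. The helpers `parse_pairs` and `resistance_for_area` are hypothetical, and the piecewise-linear lookup is an assumption, not the documented behaviour of the extraction tool.

```python
# Illustrative sketch only: tokens copied from via prV12 above.
# ASSUMPTION: the flat list alternates (resistance, area) values.
AREA_RESISTANCE = (
    "66.7143535945 0.00016384 48.6442635641 0.00020736 "
    "36.819840797 0.000256 28.7230294651 0.00030976 "
    "22.9712714323 0.00036864 13.1572696083 0.000576 "
    "6.64648926901 0.001024 4.01341840115 0.0016"
)


def parse_pairs(text):
    """Split a flat token string into (resistance, area) tuples."""
    nums = [float(tok) for tok in text.split()]
    if len(nums) % 2:
        raise ValueError("expected an even number of values")
    return list(zip(nums[0::2], nums[1::2]))


def resistance_for_area(area, pairs):
    """Piecewise-linear lookup of resistance by area, clamped at the table ends."""
    table = sorted(pairs, key=lambda p: p[1])
    if area <= table[0][1]:
        return table[0][0]
    if area >= table[-1][1]:
        return table[-1][0]
    for (r0, a0), (r1, a1) in zip(table, table[1:]):
        if a0 <= area <= a1:
            return r0 + (r1 - r0) * (area - a0) / (a1 - a0)


pairs = parse_pairs(AREA_RESISTANCE)
for resistance, area in pairs:
    print(f"area = {area:.8f}  resistance = {resistance:.4f}")
```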

Example of dielectric layers:

```
dielectric    DB2 {  
    conformal          FALSE  
    delta_height       0.000  
    delta_layer        IMD1  
    thickness          0.007  
    dielectric_constant 5.5  
}  
  
dielectric    ILD2 {  
    conformal          FALSE  
    delta_height       0.000  
    delta_layer        DB2  
    thickness          0.017  
    dielectric_constant 2.8  
}  
  
dielectric    IMD2 {  
    conformal          FALSE  
    delta_height       0.000  
    delta_layer        ILD2  
    thickness          0.024  
    dielectric_constant 2.8  
}  
  
dielectric    IMD2dmg {  
    conformal          TRUE  
    expandedFrom       PRM2  
    delta_height       0.000  
    delta_layer        ILD2  
    thickness          0.000
    topThickness          0.000
    sideExpand            0.0015
    dielectric_constant  4.5
}
```



## A.5 File management through SVN repository

Figure A.1 Managing PDK file versions for different nodes and DTCO options through an SVN repository.

Descriptive names for the PDN components.



# Appendix B

## Examples of EDA scripts

## B.1 Logical and Physical synthesis

```
#####
## Preset global variables and attributes
#####
set DESIGN ArmM0

set GEN_EFF medium
set MAP_OPT_EFF high
set PHYS_EFF high

set DATE [clock format [clock seconds] -format "%b%d-%T"]

set TESTCASE_PHY testcase_phy

set _OUTPUTS_PATH outputs_${TESTCASE_PHY}
set _REPORTS_PATH reports_${TESTCASE_PHY}
set _LOG_PATH logs_${TESTCASE_PHY}

if {![file exists ${_LOG_PATH}]} {file mkdir ${_LOG_PATH}; puts "Creating directory ${_LOG_PATH}"}
if {![file exists ${_OUTPUTS_PATH}]} {file mkdir ${_OUTPUTS_PATH}; puts "Creating directory ${_OUTPUTS_PATH}"}
if {![file exists ${_REPORTS_PATH}]} {file mkdir ${_REPORTS_PATH}; puts "Creating directory ${_REPORTS_PATH}"}

set_attribute init_lib_search_path {./PDK/LIB/} /
set_attribute init_hdl_search_path {./PDK/RTL/} /
set_attribute script_search_path {./scripts} /
## Uncomment and specify machine names to enable super-threading.
##set_attribute super_thread_servers {<machine names>} /
##For design size of 1.5M - 5M gates, use 8 to 16 CPUs. For designs >
## 5M gates, use 16 to 32 CPUs
set_attribute max_cpus_per_server 8 /

# For Debug Purposes
set_attr super_thread_debug_directory super_thread_debug_directory /
set_attr heartbeat 600 /
set_attr information_level 9 /
set_attr pbs_debug_level 1 /
set_attr phys_flow_effort high /
#set_attribute phys_legalization_enhancement true /

## Include leakage and dynamic power in QoS reporting
set_attribute qos_report_power true /

## Innovus executable path and global settings
set_attribute invs_gzip_interface_files true /
#SC set_attribute time_recovery_arcs true /
#SC set env(ENCOUNTER) <Innovus executable path>
#SC set_attribute innovus_executable <invs_exe_path> / ;# Set path to the Innovus executable to be used by synth -to_placed
regexp {[0-9]+(\.[0-9]+)} [get_attribute program_version /] exe_ver exe_sub_ver
  puts "Executable Version: $exe_ver"
#SC set_attribute time_recovery_arcs true /

#####
## Library setup
#####
set_attribute timing_use_ecsm_pin_capacitance true /


set_attribute library { \
./PDK/LIB/N05_6T_MINT.lvt.tt.vdd0.65.T25_ecsm.lib \
./PDK/LIB/N05_6T_MINT.lvt.tt.vdd0.65.T25_High_Drive_ecsm.lib \
./PDK/LIB/N05_6T_MINT.lvt.tt.vdd0.65.T25_nldm.extra_ff.lib \
./PDK/LIB/N05_clkgate.lvt.tt.vdd0.65.T25_nldm.scan.lib \
} /


set_attribute lef_library { \
./PDK/LEF/N05_6TMint_6TM2_M1open_outboundpg_tech.lef \
./PDK/LEF/main_EUV_6T.lef \
./PDK/LEF/main_EUV_6T.dup.lef \
./PDK/LEF/extra_EUV_6T.lef \
./PDK/LEF/scan_EUV_6T.lef \
} /


#Cells having problems in Liberate
set_attribute avoid 1 { RCA0I211D8 RCA0I211D12 NR3D12 NR3D8 INR2XD12
NR4D4 NR4D8 NR4D12 AOI211D8 AOI211D12 }

#INNOVUS SETTINGS FOR PHYSICAL FLOW

set_attribute innovus_executable /icd/flow/INNOVUS/INNOVUS162/16.21-e020_1/lnx86/bin/innovus /
set_attribute invs_temp_dir ./INVS_DB /
set_attribute invs_preload_script ./SCRIPTS_design/pre_load.tcl /
set_attribute invs_postload_script ./SCRIPTS_design/post_load.tcl /


set_attribute qrc_tech_file ./PDK/TECH/qrcTechFile /
#SC set_attribute number_of_routing_layers <value> /designs/$DESIGN

##set_attribute congestion_effort <low|medium|high> /
set_attribute lp_insert_clock_gating true /

## Power root attributes
#set_attribute lp_clock_gating_prefix <string> /
#set_attribute lp_power_analysis_effort <high> /
#set_attribute lp_power_unit mW /
#set_attribute lp_toggle_rate_unit /ns /
set_attribute lp_multi_vt_optimization_effort low /


#####
## Load Design
#####

##Default undriven/unconnected setting is 'none'.
##set_attribute hdl_unconnected_input_port_value 0 | 1 | x | none /
##set_attribute hdl_undriven_output_port_value 0 | 1 | x | none /
##set_attribute hdl_undriven_signal_value 0 | 1 | x | none /
## generates <signal>_reg[<bit_width>] format
#set_attribute hdl_array_naming_style %s\[%d\] /
set_attribute hdl_track_filename_row_col true /

read_hdl "./PDK/RTL/CORTEXM0DS.v ./PDK/RTL/cortexm0ds_logic.v"
elaborate $DESIGN

time_info Elaboration
check_design -unresolved
time_info Check_Design_Elaboration

#Reading floorPlan (and PowerPlan) for physical synthesis
read_def -no_nets FPLAN.def

#####
## Constraints Setup
#####

read_sdc ./PDK/RTL/ARM_M0_1_8G.sdc
time_info Read_sdc

report timing -lint
time_info Report_timing_lint_Read_sdc
#####
## read in def file.
#####

#SC read_def <file_name>

#####
## Define cost groups (clock-clock, clock-output, input-clock, input-
## output)
#####

## Uncomment to remove already existing costgroups before creating
## new ones.
## rm [find /designs/* -cost_group *]

if {[llength [all::all_seqs]] > 0} {
    define_cost_group -name I2C -design $DESIGN
    define_cost_group -name C2O -design $DESIGN
    define_cost_group -name C2C -design $DESIGN
    path_group -from [all::all_seqs] -to [all::all_seqs] -group C2C -name C2C
    path_group -from [all::all_seqs] -to [all::all_outs] -group C2O -name C2O
    path_group -from [all::all_inps] -to [all::all_seqs] -group I2C -name I2C
}

define_cost_group -name I2O -design $DESIGN
path_group -from [all::all_inps] -to [all::all_outs] -group I2O -name I2O

foreach cg [find / -cost_group *] {
    report timing -cost_group [list $cg] >> ${_REPORTS_PATH}/${DESIGN}_pretim.rpt
}
#####
## Leakage/Dynamic power/Clock Gating setup.
#####

#set_attribute lp_clock_gating_cell [find /lib* -libcell <cg_libcell_name>] "/designs/$DESIGN"
set_attribute max_leakage_power 0.0 "/designs/$DESIGN"
#set_attribute lp_power_optimization_weight <value from 0 to 1> "/designs/$DESIGN"
#set_attribute max_dynamic_power <number> "/designs/$DESIGN"
#set_attribute lp_optimize_dynamic_power_first true "/designs/$DESIGN"
## read_tcf <TCF file name>
## read_saif <SAIF file name>
## read_vcd <VCD file name>

#### To turn off sequential merging on the design
#### uncomment & use the following attributes.
##set_attribute optimize_merge_flops false /
##set_attribute optimize_merge_latches false /
#### For a particular instance use attribute 'optimize_merge_seqs' to turn off sequential merging.

#####
## Synthesizing to generic
#####


set_attribute syn_generic_effort $GEN_EFF /
#syn_generic -physical
syn_generic -physical
time_info SYN_GEN

write_snapshot -outdir $_REPORTS_PATH -tag generic
report_summary -outdir $_REPORTS_PATH
report datapath > $_REPORTS_PATH/generic/${DESIGN}_datapath.rpt
time_info SYN_GEN_REPORTS
write_db -to_file ./DB/generic/${DESIGN}_generic.db
#####
## Synthesizing to gates
#####

set_attribute syn_map_effort $MAP_OPT_EFF /
#syn_map -physical
syn_map -physical
time_info SYN_MAP

write_snapshot -outdir $_REPORTS_PATH -tag map
report_summary -outdir $_REPORTS_PATH
foreach cg [find / -cost_group *] {
    report timing -cost_group [list $cg] > $_REPORTS_PATH/${DESIGN}_[vbasename $cg]_post_map.rpt
}
time_info SYN_MAP_REPORTS

## Intermediate netlist for LEC verification..
write_hdl -lec > ${_OUTPUTS_PATH}/${DESIGN}_intermediate.v
write_do_lec -revised_design ${_OUTPUTS_PATH}/${DESIGN}_intermediate.v \
    -logfile ${_LOG_PATH}/rtl2intermediate.lec.log \
    > ${_OUTPUTS_PATH}/rtl2intermediate.lec.do
write_db -to_file ./DB/intermediate/${DESIGN}_int.db
## ungroup -threshold <value>

#####
## Optimize Netlist
#####

## Uncomment to remove assigns & insert tiehilo cells during
## Incremental synthesis
## set_attribute remove_assigns true /

##set_remove_assign_options -buffer_or_inverter <libcell> -design <design|subdesign>
##set_attribute use_tiehilo_for_const <none|duplicate|unique> /
set_attribute syn_opt_effort $MAP_OPT_EFF /
#syn_opt -physical
syn_opt -physical
time_info SYN_OPT

write_snapshot -outdir $_REPORTS_PATH -tag syn_opt
report_summary -outdir $_REPORTS_PATH

#SC write_snapshot -innovus -outdir $_REPORTS_PATH -tag incr_physical
#SC report_summary -outdir $_REPORTS_PATH

foreach cg [find / -cost_group -null_ok *] {
    report timing -cost_group [list $cg] > $_REPORTS_PATH/${DESIGN}_[vbasename $cg]_post_opt.rpt
}

time_info SYN_OPT_REPORTS

#####
## QoS Prediction & Optimization.
#####

#SC set_attribute invs_temp_dir ${_OUTPUTS_PATH}/genus_invs_pred /
#SC set_attribute syn_opt_effort $PHYS_EFF /
#SC syn_opt -physical
#SC write_snapshot -innovus -outdir $_REPORTS_PATH -tag syn_opt_physical
#SC report_summary -outdir $_REPORTS_PATH

report clock_gating > $_REPORTS_PATH/${DESIGN}_clockgating.rpt
report power -depth 0 > $_REPORTS_PATH/${DESIGN}_power.rpt
report gates -power > $_REPORTS_PATH/${DESIGN}_gates_power.rpt
report datapath > $_REPORTS_PATH/${DESIGN}_datapath_incr.rpt
report messages > $_REPORTS_PATH/${DESIGN}_messages.rpt

#####
## Final: write Innovus file set (verilog, SDC, config, etc.)
#####

#SC write_snapshot -innovus -outdir $_REPORTS_PATH -tag final_physical
#SC report_summary -outdir $_REPORTS_PATH

write_hdl > ${_OUTPUTS_PATH}/${DESIGN}_m.v
## write_script > ${_OUTPUTS_PATH}/${DESIGN}_m.script
write_sdc > ${_OUTPUTS_PATH}/${DESIGN}_m.sdc

write_db -to_file ./DB/final/${DESIGN}_final.db
write_design -innovus -basename ./DB/final/${DESIGN}_final
#####
### write_do_lec
#####

write_do_lec -golden_design ${_OUTPUTS_PATH}/${DESIGN}_intermediate.v \
    -revised_design ${_OUTPUTS_PATH}/${DESIGN}_m.v \
    -logfile ${_LOG_PATH}/intermediate2final.lec.log \
    > ${_OUTPUTS_PATH}/intermediate2final.lec.do
## Uncomment if the RTL is to be compared with the final netlist..
## write_do_lec -revised_design ${_OUTPUTS_PATH}/${DESIGN}_m.v -logfile ${_LOG_PATH}/rtl2final.lec.log > ${_OUTPUTS_PATH}/rtl2final.lec.do
puts "Final Runtime & Memory."
time_info FINAL
puts "====="
puts "Synthesis Finished . . . . ."
puts "====="

file copy [get_attribute stdout_log /] ${_LOG_PATH}/.

##quit

```

## B.2 Cadence Foundation Flow

## B.3 Dynamic Vectorless analysis

```

read_design -physical_data ../DBS/postroute.enc.dat ArmM0

read_spef ../postroute.spef -rc_corner typ

set_power_analysis_mode \
    -report_missing_nets true \
    -method dynamic_vectorless \
    -create_binary_db true \
    -power_grid_library { \
        ./techview/tech_techonly.cl \
        ./stdcellview/stdcell_stdcells.cl \
    } \
    -write_static_currents false

set_dynamic_power_simulation -period 5ns -resolution 10ps

set_power_output_dir dynamic_power
report_power -outfile dynamic_power/power.rpt

set_rail_analysis_mode \
    -method           dynamic \
    -accuracy        hd \
    -work_directory_name work_dynamic \
    -save_voltage_waveforms true \
    -decap_cell_list { DCAP1 DCAP2 DCAP4 DCAP8 DCAP16 } \
    -filler_cell_list { FILL1 FILL2 FILL4 FILL8 FILL16 FILL32 FILL64 } \
    -power_grid_library { \
        ./techview/tech_techonly.cl \
        ./stdcellview/stdcell_stdcells.cl \
    }

set_pg_nets -net VDD -voltage 0.65 -threshold 0.60
set_pg_nets -net VSS -voltage 0.0 -threshold 0.05

set_power_pads \
    -net           VDD \
    -format        xy \
    -file          ../VDD_test.pp

set_power_pads \
    -net           VSS \
    -format        xy \
    -file          ../VSS_test.pp

set_rail_analysis_domain -name all -pwrnets VDD -gndnets VSS

set_power_data -reset

set_power_data \
  -format current \
{ \
  dynamic_power/dynamic_VDD.ptiavg \
  dynamic_power/dynamic_VSS.ptiavg \
}

set_dynamic_rail_simulation \
  -resolution 10ps \
  -voltage 1.0

analyze_rail \
  -results_directory      ./dynamicRailResults \
  -type                  domain \
  all

##### TESTING DEEP TRENCH ON MINT

scale_what_if_resistance -global -net VDD -layer prMINT -scale 0.5
scale_what_if_resistance -global -net VDD -layer prMINT -scale 0.5
analyze_rail \
  -results_directory      ./dynamicRail_DEEP_TRENCH \
  -type                  domain \
  all

```

## B.4 Compare Metrics

```

um_compare_html -gold RPT/ArmM0_Testcase1.html \
    -comp RPT/ArmM0_Testcase2.html \
    -comp RPT/ArmM0_TestcaseN.html > PPA_Comparison.html

```