

RUHR-UNIVERSITÄT BOCHUM

Horst Görtz Institute for IT-Security

# A Non-Linear/Linear Instruction Set Extension for Lightweight Ciphers

Susanne Engels, <u>Elif Bilge Kavun</u>, Hristina Mihajloska, Christof Paar, Tolga Yalçın



#### **Overview**

- Lightweight and Pervasive Devices
  - Which Devices and Applications?
  - Security Need
- Standard/Lightweight Cryptography
  - Current Solutions
  - Software-oriented Solutions?
- Instruction Set Extension: NLU
  - ISE Model
  - Hardware Unit
  - Applications
- Conclusion and Future Directions








DEVELOPMENT OF COMPUTERS AND CONSTRAINED DEVICES

[Figure: development of computers and constrained devices over time]



LIGHTWEIGHT APPLICATIONS



- Electronic passports
- Logistics
- Road toll-collection



**CONSTRAINED DEVICES?** 







- More precisely:
  - RFID-Tags: Radio-Frequency Identification
  - Smart Cards
  - Wireless Sensors



**CONSTRAINED DEVICES?** 

- Low power and energy consumption
  - Active devices with on-chip batteries
  - Battery-less passive devices that rely on limited EM-transmitted power
- Low area and complexity
  - Gate count, I/O pin count, storage
- Constrained communication bandwidth
  - Due to increased device mobility and power constraints





THE NEED FOR SECURING CONSTRAINED DEVICES



- Access control: car key systems, internet banking, etc.
- Enforcing business models: electronic wallets, SIM cards, etc.
- Anti-counterfeiting: gaming, batteries, etc.
- Privacy protection: GSM, medical sensors, etc.



















Requirements: The trade-off





LIGHTWEIGHT CRYPTOGRAPHY REQUIREMENTS?

- No strict criteria, but common features:
  - Should be cheaper than traditional cryptography
  - A reduced level of security is sufficient
    - Key sizes below 128 bits
  - Short block sizes





LIGHTWEIGHT BLOCK CIPHERS

- Algorithms with particularly low implementation costs
  - Tailored to fulfill previously mentioned requirements
- Examples:
  - PRESENT, CLEFIA (ISO standards), KLEIN, LED, mCrypton, etc.




Mostly targeted at low gate count in hardware!



LIGHTWEIGHT BLOCK CIPHERS - SOFTWARE SOLUTIONS

- 8-bit microprocessors are widely used in the market
  - Hardware-optimized, *software-unfriendly* lightweight ciphers are in practice mostly implemented on 8-bit microprocessors!
  - This results in larger code size and more clock cycles
- We should proceed with software-friendly solutions and designs!



LIGHTWEIGHT BLOCK CIPHERS - SOFTWARE SOLUTIONS

Software-friendly solutions?

Not much there...



LIGHTWEIGHT BLOCK CIPHERS - SOFTWARE SOLUTIONS: RELATED WORK

- Not necessarily for lightweight block ciphers
- Implementation of cipher-specific instructions
  - Plugging the specific cipher into the main module as a coprocessor
  - Increases microprocessor area!
- Other works introduce complex instructions




A first attempt: **NLU**!!!






NON-LINEAR/LINEAR INSTRUCTION SET EXTENSION FOR LIGHTWEIGHT CIPHERS

- In block ciphers:
  - Non-linear refers to substitution, which introduces confusion
  - Linear refers to permutation, which introduces diffusion
- Block ciphers are designed to provide both!
  - S-box layer for substitution
  - Mixing layer for permutation
- They are essential but also costly in software!






ISE MODEL

- Special instructions
  - To realize non-linear and linear layers of block ciphers
  - To reduce cycle count and code size
- A unified hardware block for:
  - Cycle-consuming substitution and permutation operations
- Call the new instructions in software!
  - Results in fewer cycles...




### Hardware block should be cheap!



ISE MODEL

- For substitution:
  - Sboxes expressed in their Algebraic Normal Form (ANF)
- For permutation:
  - Linear layer expressed in binary matrix multiply-and-add form




Very simple and generic architecture!



- Non-linear operations: expressed in their ANF
  - In Boolean logic, ANF is a canonical form that writes a function as an XOR of AND terms (monomials) of its inputs
  - ANF makes it easy to define the function
  - Gives better results in software than using a lookup table for the S-box
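As a concrete illustration of ANF (a sketch of ours, not code from the paper), the binary Möbius transform converts a truth table into ANF coefficients. Applied to the PRESENT S-box, it yields the XOR-of-AND-terms expression of each output bit. Monomial index `i` stands for the product of the inputs $x_j$ whose bit `j` is set in `i`; the function names are illustrative.

```python
# PRESENT 4-bit S-box, as given in the PRESENT specification
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table of f -> ANF coefficients of f.

    Coefficient i is 1 iff the monomial made of the x_j with bit j set in i appears in f.
    """
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1  # number of input variables
    for j in range(n):
        for x in range(len(coeffs)):
            if x & (1 << j):
                coeffs[x] ^= coeffs[x ^ (1 << j)]
    return coeffs


def monomial_name(i):
    """Render monomial index i as e.g. 'x1x2'; index 0 is the constant 1."""
    if i == 0:
        return "1"
    return "".join(f"x{j}" for j in range(4) if i & (1 << j))


def sbox_anf(sbox):
    """For each output bit y_k of a 4-bit S-box, list the monomial indices in its ANF."""
    result = []
    for k in range(4):
        table = [(sbox[x] >> k) & 1 for x in range(16)]
        coeffs = anf_coefficients(table)
        result.append([i for i, c in enumerate(coeffs) if c])
    return result


if __name__ == "__main__":
    for k, terms in enumerate(sbox_anf(SBOX)):
        print(f"y{k} = " + " + ".join(monomial_name(i) for i in terms))
```

For output bit $y_0$ this gives $x_0 \oplus x_2 \oplus x_1x_2 \oplus x_3$.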



HARDWARE - NON-LINEAR UNIT



- $m_i$ used for masking the unused ANF components
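The slide does not spell out the datapath, so the following Python model is only a behavioral sketch under two assumptions: the unit forms all 16 monomials of a 4-bit input, and each output bit is the XOR of the monomials whose mask bit $m_i$ is set. Loading a different mask set reconfigures the same logic for another 4-bit S-box. The mask words are our own ANF of the PRESENT S-box; we make no claim about how they map onto the NLD constants used later, and all names are hypothetical.

```python
# Per-output-bit mask words for the PRESENT S-box (y0..y3). Bit i of a word is
# the mask bit m_i enabling monomial i, where bit j of i set means x_j is a factor.
# Derived offline from the S-box truth table with a binary Moebius transform.
PRESENT_MASKS = [0x0152, 0x3D84, 0x2F19, 0x29C7]

# Masks for the identity mapping y_k = x_k: only the single-variable monomial x_k is enabled.
IDENTITY_MASKS = [1 << 1, 1 << 2, 1 << 4, 1 << 8]


def monomials(x):
    """Return a 16-bit word whose bit i is the value of monomial i for 4-bit input x."""
    word = 0
    for i in range(16):
        if (x & i) == i:  # every variable of monomial i is 1
            word |= 1 << i
    return word


def masked_anf(x, masks):
    """Output bit k = XOR over i of (m_i AND monomial_i(x)) = parity(masks[k] & monomials(x))."""
    terms = monomials(x)
    y = 0
    for k, mask in enumerate(masks):
        y |= (bin(terms & mask).count("1") & 1) << k
    return y


if __name__ == "__main__":
    print([format(masked_anf(x, PRESENT_MASKS), "X") for x in range(16)])
```

Swapping `PRESENT_MASKS` for `IDENTITY_MASKS` turns the same evaluator into the identity mapping without touching the logic, which is the point of a mask-configured unit.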






















HARDWARE - LINEAR UNIT

Linear operations: expressed in binary matrix multiplication form






- $m_i$ used for masking
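A binary matrix multiplication reduces to AND gates (the masks) followed by XOR (parity) trees. The Python sketch below assumes an 8x8 matrix stored as eight 8-bit row masks, where bit `c` of row `r` acts as the mask bit selecting input bit `c` for output bit `r`; this layout and the names are illustrative, not the unit's actual configuration format.

```python
def gf2_matvec(rows, x):
    """Multiply an 8x8 binary matrix by an 8-bit vector over GF(2).

    rows[r] is an 8-bit mask whose bit c is the matrix entry M[r][c];
    output bit r is the parity (XOR) of the input bits selected by that mask.
    """
    y = 0
    for r, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << r
    return y


def rotl8_matrix(k):
    """Matrix of an 8-bit rotate-left by k: output bit r takes input bit (r - k) mod 8."""
    return [1 << ((r - k) % 8) for r in range(8)]


IDENTITY = [1 << r for r in range(8)]

if __name__ == "__main__":
    print(bin(gf2_matvec(rotl8_matrix(1), 0b1000_0011)))  # prints 0b111
```

Any bit permutation within a byte is a matrix with exactly one set bit per row, so the same AND/XOR structure covers permutations and general linear mixing alike.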



**HARDWARE** 

NLU: Overall unit










**HARDWARE** 

- Shift registers (a FIFO) make it possible to perform both:
  - $out[n] = mask \times in[n]$
  - $out[n] = (mask \times in[n]) + out[n-i], \quad i = 1, \dots, 4$
- This is the matrix multiply-and-add form!
  - Used in PRESENT...



**NLU Instructions** 

| Instruction                          | Syntax        | Operation                                                                                 |
|--------------------------------------|---------------|-------------------------------------------------------------------------------------------|
| NLD (load NLU configuration)         | NLD n, K      | $CONF \leftarrow CONF \ll K[MSB-n]$ if $n > 0$; otherwise $CONF \leftarrow CONF \ll K$    |
| NNL (NLU non-linear operation)       | NNL Rd, Rs    | $Rd(7:4) \leftarrow ANF[Rs(7:4)]$; $Rd(3:0) \leftarrow ANF[Rs(3:0)]$                      |
| NMU (NLU multiply operation)         | NMU Rd, Rs    | $Rd \leftarrow M \times Rs$; $FIFO \leftarrow FIFO \ll M \times Rs$                       |
| NMA (NLU multiply-and-add operation) | NMA s, Rd, Rs | $Rd \leftarrow M \times Rs + FIFO(s)$; $FIFO \leftarrow FIFO \ll [M \times Rs + FIFO(s)]$ |
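The FIFO semantics of NMU and NMA can be modeled in a few lines. This Python sketch assumes that `FIFO(s)` is the result written `s` instructions earlier (so $out[n] = (M \times in[n]) + out[n-s]$ with $s \in \{1,\dots,4\}$), that $+$ is XOR over GF(2), and that $M$ is stored as eight row masks; the class and helper names are hypothetical. `replay_y7_y6` follows the PRESENT linear-layer instruction sequence given in this talk, where (as inferred from that code) $A_i$ sits in register r(25-i) and `s = 2` lets two accumulations, $Y_7$ in r10 and $Y_6$ in r11, share the FIFO.

```python
from collections import deque


def gf2_matvec(rows, x):
    """8x8 binary matrix (rows[r] bit c = M[r][c]) times 8-bit vector x over GF(2)."""
    y = 0
    for r, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << r
    return y


class NluLinearModel:
    """Behavioral sketch of NMU/NMA with a 4-entry result FIFO.

    Assumptions: FIFO(s) is the result produced s instructions ago (s = 1..4),
    '+' is XOR over GF(2), and M is held as eight 8-bit row masks.
    """

    DEPTH = 4

    def __init__(self):
        self.matrix = [0] * 8  # stands in for the M loaded via NLD
        self.fifo = deque([0] * self.DEPTH, maxlen=self.DEPTH)  # fifo[0] is the newest

    def nmu(self, rs):
        """NMU Rd, Rs: Rd <- M x Rs; the result is shifted into the FIFO."""
        out = gf2_matvec(self.matrix, rs)
        self.fifo.appendleft(out)  # the oldest entry falls off the far end
        return out

    def nma(self, s, rs):
        """NMA s, Rd, Rs: Rd <- M x Rs + FIFO(s); the result is shifted into the FIFO."""
        if not 1 <= s <= self.DEPTH:
            raise ValueError("s must be between 1 and 4")
        out = gf2_matvec(self.matrix, rs) ^ self.fifo[s - 1]
        self.fifo.appendleft(out)
        return out


def replay_y7_y6(m, a):
    """Replay the PRESENT sequence: m[k] plays M0k, a[i] plays A_i (held in r(25-i))."""
    nlu = NluLinearModel()
    nlu.matrix = m[3]
    r10 = nlu.nmu(a[4])  # NMU r10, r21
    r11 = nlu.nmu(a[0])  # NMU r11, r25
    for k in (2, 1, 0):  # M02, M01, M00
        nlu.matrix = m[k]
        r10 = nlu.nma(2, a[7 - k])  # FIFO(2) holds the previous r10 value
        r11 = nlu.nma(2, a[3 - k])  # FIFO(2) holds the previous r11 value
    return r10, r11
```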





**APPLICATIONS: PRESENT**

- 64-bit block size, 80/128-bit key size
- Pure substitution-permutation network
- 4x4-bit S-box
- 31 rounds
- Secure against linear and differential cryptanalyses
- ISO standard!
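For a known-answer reference, here is a compact Python model of PRESENT-80 following the published specification: S-box lookup, the bit permutation $P(i) = 16i \bmod 63$ (with $P(63) = 63$), and the 80-bit key schedule. It is a conventional LUT-style reference against which an NLU-based implementation could be checked, not the authors' code; the function names are ours.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def s_layer(state):
    """Apply the 4-bit S-box to all 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state):
    """Move bit i to position P(i) = 16*i mod 63, with bit 63 fixed."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out


def round_keys_80(key):
    """32 round keys from an 80-bit key (leftmost 64 bits of the key register)."""
    keys = []
    for rc in range(1, 33):
        keys.append(key >> 16)
        key = ((key << 61) | (key >> 19)) & MASK80           # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on top nibble
        key ^= rc << 15                                      # round counter into bits 19..15
    return keys


def present80_encrypt(plaintext, key):
    """31 rounds of addRoundKey, sBoxLayer, pLayer, then a final addRoundKey."""
    keys = round_keys_80(key)
    state = plaintext
    for i in range(31):
        state = p_layer(s_layer(state ^ keys[i]))
    return state ^ keys[31]


if __name__ == "__main__":
    print(hex(present80_encrypt(0, 0)))
```

With an all-zero plaintext and key, `present80_encrypt` returns `0x5579c1387b228445`, the first test vector of the PRESENT specification.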



```
; load ANF bits for the PRESENT S-box
NLD 0, 0xB3
NLD 0, 0x92
NLD 0, 0x67
NLD 0, 0x0B
NLD 0, 0xDE
NLD 0, 0x43
NLD 0, 0x4A
NLD 0, 0x80
; perform non-linear S-Box operation
NNL r18, r18
NNL r19, r19
NNL r20, r20
NNL r21, r21
NNL r22, r22
NNL r23, r23
NNL r24, r24
NNL r25, r25
```
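Assuming the NLD sequence above loads the ANF of the PRESENT S-box (as its comment says), each `NNL Rd, Rs` substitutes both nibbles of one register, so eight NNL instructions on r18..r25 cover PRESENT's full 64-bit S-box layer. The Python sketch below checks that equivalence against a table lookup; treating r18 as the most significant state byte is an assumption, and `nnl` is a behavioral stand-in, not the hardware.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def nnl(rs):
    """Behavioral stand-in for NNL Rd, Rs: substitute the high and low nibble of Rs."""
    return (SBOX[rs >> 4] << 4) | SBOX[rs & 0xF]


def s_layer_64(state):
    """Reference PRESENT S-box layer on a 64-bit state held as an int."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def s_layer_via_nnl(state):
    """Split the state into r18..r25 (r18 = most significant byte) and apply NNL to each."""
    regs = state.to_bytes(8, "big")
    return int.from_bytes(bytes(nnl(r) for r in regs), "big")


if __name__ == "__main__":
    print(hex(s_layer_via_nnl(0x0123456789ABCDEF)))
```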



$$
\begin{aligned}
y_{\{63\dots56\}} &= a_{\{63,59,55,51,47,43,39,35\}} \\
y_{\{55\dots48\}} &= a_{\{31,27,23,19,15,11,7,3\}} \\
y_{\{47\dots40\}} &= a_{\{62,58,54,50,46,42,38,34\}} \\
y_{\{39\dots32\}} &= a_{\{30,26,22,18,14,10,6,2\}} \\
y_{\{31\dots24\}} &= a_{\{61,57,53,49,45,41,37,33\}} \\
y_{\{23\dots16\}} &= a_{\{29,25,21,17,13,9,5,1\}} \\
y_{\{15\dots8\}} &= a_{\{60,56,52,48,44,40,36,32\}} \\
y_{\{7\dots0\}} &= a_{\{28,24,20,16,12,8,4,0\}}
\end{aligned}
$$
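These groupings follow from PRESENT's bit permutation, which moves bit $i$ to position $P(i) = 16i \bmod 63$ (and bit 63 to 63). Since $16 \cdot 4 = 64 \equiv 1 \pmod{63}$, output bit $j$ comes from input bit $4j \bmod 63$. A short Python sketch (helper names are ours) regenerates each row, listed from the most to the least significant output bit:

```python
def p_inverse(j):
    """Source bit of output bit j under PRESENT's pLayer P(i) = 16*i mod 63, P(63) = 63."""
    return 63 if j == 63 else (4 * j) % 63  # 4 is the inverse of 16 modulo 63


def output_byte_sources(k):
    """Input bit indices feeding output byte y_{8k+7..8k}, from MSB to LSB."""
    return [p_inverse(j) for j in range(8 * k + 7, 8 * k - 1, -1)]


if __name__ == "__main__":
    for k in range(7, -1, -1):
        print(f"y[{8 * k + 7}..{8 * k}] = a{output_byte_sources(k)}")
```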





**APPLICATIONS: PRESENT** 

$$
\begin{aligned}
Y_7 &= M_{00}A_7 \oplus M_{01}A_6 \oplus M_{02}A_5 \oplus M_{03}A_4 \\
Y_6 &= M_{00}A_3 \oplus M_{01}A_2 \oplus M_{02}A_1 \oplus M_{03}A_0 \\
Y_5 &= M_{10}A_7 \oplus M_{11}A_6 \oplus M_{12}A_5 \oplus M_{13}A_4 \\
Y_4 &= M_{10}A_3 \oplus M_{11}A_2 \oplus M_{12}A_1 \oplus M_{13}A_0 \\
Y_3 &= M_{20}A_7 \oplus M_{21}A_6 \oplus M_{22}A_5 \oplus M_{23}A_4 \\
Y_2 &= M_{20}A_3 \oplus M_{21}A_2 \oplus M_{22}A_1 \oplus M_{23}A_0 \\
Y_1 &= M_{30}A_7 \oplus M_{31}A_6 \oplus M_{32}A_5 \oplus M_{33}A_4 \\
Y_0 &= M_{30}A_3 \oplus M_{31}A_2 \oplus M_{32}A_1 \oplus M_{33}A_0
\end{aligned}
$$
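The byte-matrix form can be checked mechanically. This Python sketch (our own construction, not from the slides) probes the PRESENT permutation with single-bit inputs to extract each 8x8 binary matrix $M_{jk}$, taking $A_7$ as the most significant state byte, and then verifies that the equations above reproduce the permutation. Because the layer is linear over GF(2), checking all 64 unit vectors covers every input. Each extracted $M_{jk}$ turns out to contain only two non-zero entries.

```python
def p_layer(state):
    """PRESENT pLayer: bit i moves to position P(i) = 16*i mod 63, with P(63) = 63."""
    out = 0
    for i in range(64):
        dest = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << dest
    return out


def byte(x, k):
    """Byte k of a 64-bit value (k = 7 is the most significant byte)."""
    return (x >> (8 * k)) & 0xFF


def gf2_matvec(rows, x):
    """8x8 binary matrix (rows[r] bit c = M[r][c]) times 8-bit vector x over GF(2)."""
    y = 0
    for r, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << r
    return y


def extract_matrices():
    """m[j][k] = M_jk, probed as the map from input byte A_{7-k} to output byte Y_{7-2j}."""
    m = [[[0] * 8 for _ in range(4)] for _ in range(4)]
    for j in range(4):
        for k in range(4):
            for c in range(8):
                y = byte(p_layer(1 << (8 * (7 - k) + c)), 7 - 2 * j)
                for r in range(8):
                    if (y >> r) & 1:
                        m[j][k][r] |= 1 << c
    return m


def p_layer_via_matrices(state, m):
    """Y_{7-2j} = XOR_k M_jk A_{7-k} and Y_{6-2j} = XOR_k M_jk A_{3-k}."""
    out = 0
    for j in range(4):
        hi = lo = 0
        for k in range(4):
            hi ^= gf2_matvec(m[j][k], byte(state, 7 - k))
            lo ^= gf2_matvec(m[j][k], byte(state, 3 - k))
        out |= (hi << (8 * (7 - 2 * j))) | (lo << (8 * (6 - 2 * j)))
    return out
```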



```
; state is in registers r18 to r25
; write M03 to NLU
NLD 0, 0x00
NLD 0, 0x80
NLD 0, 0x40
```

```
; temporary registers r10, r11
NMU r10, r21
NMU r11, r25
; write M02 to NLU
NLD 0, 0x00
NLD 0, 0x00
NMA 2, r10, r20
NMA 2, r11, r24
; write M01 to NLU
NLD 0, 0x00
NLD 0, 0x00
NMA 2, r10, r19
NMA 2, r11, r23
; write M00 to NLU
NLD 0, 0x00
NLD 0, 0x00
NMA 2, r10, r18
NMA 2, r11, r22
; write M13 to NLU
NLD 0, 0x40
NLD 0, 0x04
```



**APPLICATIONS: AES** 



- 128-bit block size, 128/192/256-bit key size
- Substitution-permutation network
- 8x8-bit S-box
- 10/12/14 rounds
- Secure against linear and differential cryptanalyses
- NIST standard!






RESULTS

- Hardware unit synthesized with the UMC 90 nm low-leakage Faraday library
- Area cost:
  - 1752 GE
- Power consumption:
  - 28.59 µW @ 100 kHz




Low-cost!!!




#### Performance results

| Implementation   | Clock cycles | Flash memory utilization | Time-area product (TAP) (cycles·bytes) | TAP gain (%) |
|------------------|--------------|--------------------------|----------------------------------------|--------------|
| PRESENT (LUT)    | 10792                        | 660 bytes                      | $7.1 \times 10^{6}$                          | 0                  |
| PRESENT (NLU)    | 6017                         | 406 bytes                      | $2.4 \times 10^{6}$                          | 66                 |
| CLEFIA (compact) | 42124                        | 2170 bytes                     | $91.4 \times 10^{6}$                         | 0                  |
| CLEFIA (fast)    | 28684                        | 3046 bytes                     | $87.4 \times 10^{6}$                         | 4                  |
| CLEFIA (NLU)     | 15268                        | 1912 bytes                     | $29.2 \times 10^{6}$                         | 68                 |
| SERPENT (ANF)    | 49314                        | 7220 bytes                     | $356.0 \times 10^{6}$                        | 0                  |
| SERPENT (LUT)    | 106338                       | 2620 bytes                     | $278.6 \times 10^{6}$                        | 22                 |
| SERPENT (NLU)    | 45431                        | 2960 bytes                     | $134.5 \times 10^{6}$                        | 62                 |
| AES (LUT)        | 3159                         | 1570 bytes                     | $4.96 \times 10^{6}$                         | 0                  |
| AES (NLU)        | 2826                         | 1402 bytes                     | $3.96 \times 10^{6}$                         | 20                 |
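The TAP and gain columns follow directly from the cycle and flash columns. The Python sketch below recomputes them, assuming (consistent with the zero entries) that each gain is measured against the first listed implementation of the same cipher: TAP = cycles × bytes and gain = $1 - \mathrm{TAP}/\mathrm{TAP}_{\text{baseline}}$. The function name is ours.

```python
# (cipher, implementation, clock cycles, flash bytes), copied from the table above
RESULTS = [
    ("PRESENT", "LUT", 10792, 660),
    ("PRESENT", "NLU", 6017, 406),
    ("CLEFIA", "compact", 42124, 2170),
    ("CLEFIA", "fast", 28684, 3046),
    ("CLEFIA", "NLU", 15268, 1912),
    ("SERPENT", "ANF", 49314, 7220),
    ("SERPENT", "LUT", 106338, 2620),
    ("SERPENT", "NLU", 45431, 2960),
    ("AES", "LUT", 3159, 1570),
    ("AES", "NLU", 2826, 1402),
]


def tap_gains(rows):
    """Map (cipher, impl) -> (TAP, gain in % vs. the first row of the same cipher)."""
    baseline = {}
    gains = {}
    for cipher, impl, cycles, flash in rows:
        tap = cycles * flash
        baseline.setdefault(cipher, tap)  # the first row per cipher is the reference
        gains[(cipher, impl)] = (tap, round(100 * (1 - tap / baseline[cipher])))
    return gains


if __name__ == "__main__":
    for (cipher, impl), (tap, gain) in tap_gains(RESULTS).items():
        print(f"{cipher:8} {impl:8} TAP={tap / 1e6:7.2f}e6 gain={gain}%")
```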






### **Conclusion and Future Directions**

- A generic instruction set extension for lightweight block ciphers
- Extremely simple and very low-cost design
- Improves software implementations of lightweight block ciphers
  - Time-area product reductions of 20-70%
- Modular architecture allows it to be used in 4-, 16-, 32-, and 64-bit CPUs
  - Extending it to 32-bit microprocessors is possible in the future
- Performance results of more ciphers to be added



# **Thanks for Listening!**

Any Questions?

elif.kavun@rub.de