Understanding and Securing TLS

TLSeminar

Project Presentations Schedule
Team Grapefruit, Wed, 26 Apr 2017

Here is the schedule for the project presentations for Friday, 28 April. Each multi-person team will have up to 20 minutes to present (which should include at least 5 minutes for questions, so plan on a 15-minute or shorter presentation), and each single-person team will have up to 15 minutes to present (which should include at least 3 minutes for questions, so plan on a 12-minute or shorter presentation).

9:20-9:30am: Breakfast
9:30-9:50am: Line Of Trust (Anant Kharkar, Sam Havron, Bill Young, Joshua Holtzman)
9:50-10:05am: SoK: Software Testing Methods Applied to SSL/TLS: Lessons in Discovering Implementation Bugs (Darion Cassel)
10:05-10:20am: SSLTimer: Testing an SSL Implementation with respect to Timing Attack Vulnerability (Yuchi Tian)
10:20-10:40am: Minimal TLS (Tianyi Jin, Cyrus Malekpour, Bhuvanesh Murali, Daniel Saha)
10:40-10:50am: Break
10:50-11:10am: Quantum Menace (Reid Bixler, Collin Berman)
11:10-11:25am: SSL Skimmer (Adam Imeson)
11:25-11:40am: TIE: TLS Invariants Exploration (Ben Lowman)
11:40am-noon: DecentralizedCA (Bargav Jayaraman, Hannah Li)

TLS Outside The Web
Team Pineapple, Sun, 23 Apr 2017

As the web evolves, taking on physical form with embedded devices, or becoming more abstracted with cloud computing, the question of how to secure these new kinds of connections becomes paramount. Can TLS work for non-web scenarios?

Sizzle and DTLS improve TLS performance in an attempt to make the protocol feasible on embedded devices with limited power and bandwidth - but sacrifice some security in the process. Developers of non-browser software like Cloud services, mobile apps, and payment transfer services misunderstand or intentionally disable vital security measures - and SSL APIs fail to communicate how to secure connections and implement authentication. Meanwhile, Amazon Web Services offers a heavily centralized alternative to securing the Cloud and IoT devices - but potentially at a high privacy cost.

Sizzle: TLS For Embedded Devices

Vipul Gupta, Matthew Millard, Stephen Fung, Yu Zhu, Nils Gura, Hans Eberle, Sheueling Chang Shantz. Sizzle: A Standards-based end-to-end Security Architecture for the Embedded Internet. Pervasive and Mobile Computing, 2005.

Sizzle is an end-to-end security architecture for embedded devices, fully implemented on the 8-bit Berkeley/Crossbow Mica2 “mote” platform. It can complete an abbreviated SSL (TLS 1.0) handshake in less than 4 seconds, and the mote devices were envisioned to function as sensors and actuators connected to a wireless network. When this paper was published (2005), Sizzle running on a “mote” was the world’s smallest and least resource-intensive secure web server.

With the goal of allowing highly constrained embedded devices to offer secure connections, the authors use 160-bit elliptic curve cryptography (ECC) to reduce the system’s computational load. Using ECC instead of RSA yields a 4x speedup (5x with session reuse) in public-key operations. Figures 1 (full handshake) and 2 (abbreviated handshake) in the paper compare the time required to transfer 450 bytes of data over HTTPS.
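
As a rough illustration of that performance gap (not from the paper, and run on modern PC hardware rather than an 8-bit mote), the sketch below uses the Python cryptography library to time the server's private-key operation with RSA versus ECC. The key sizes are assumptions chosen for what the library supports, not the 160-bit curve Sizzle used, so only the relative gap is meaningful.

    import time
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec, padding, rsa

    rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    ec_key = ec.generate_private_key(ec.SECP256R1())
    message = b"x" * 450  # roughly the 450-byte payload used in the paper's comparison

    def avg_ms(op, n=50):
        # Average wall-clock time of one private-key operation, in milliseconds.
        start = time.perf_counter()
        for _ in range(n):
            op()
        return (time.perf_counter() - start) / n * 1000

    rsa_ms = avg_ms(lambda: rsa_key.sign(message, padding.PKCS1v15(), hashes.SHA256()))
    ec_ms = avg_ms(lambda: ec_key.sign(message, ec.ECDSA(hashes.SHA256())))
    print(f"RSA-2048 private-key op: {rsa_ms:.2f} ms, ECDSA P-256: {ec_ms:.2f} ms")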

Authentication for the Internet of Things

Thomas Kothmayr, Corinna Schmitt, Wen Hu, Michael Brünig, Georg Carle. DTLS based security and two-way authentication for the Internet of Things. Ad Hoc Networks. May 2013.

Datagram TLS (DTLS) is, for all intents and purposes, a “lightweight” version of TLS that can run on devices less powerful than traditional computers, which made it a natural fit for the Internet of Things as that revolution took off. According to its specification, it provides the same security guarantees as TLS, except that it runs over datagrams (UDP) rather than a reliable stream. This brings its own costs: DTLS must tolerate packet reordering and datagram loss, and if a message is larger than a single datagram it may be fragmented or lost. Unlike TLS, relatively few libraries support the protocol, as can be seen in the graphic below:

Source: https://en.wikipedia.org/wiki/Datagram_Transport_Layer_Security

We already have secure-channel techniques for other settings: an access point and, say, a printer that share a pre-shared key can use WPA, and a sensor talking to a security endpoint can use a VPN. But what about a TLS-style secure connection among Internet of Things devices? That is the primary problem DTLS seeks to solve. Since TCP plus TLS imposes too much overhead on small, battery-starved, low-bandwidth devices, the paper turns to DTLS, which uses the lighter-weight UDP transport and a slimmed-down version of TLS while maintaining the same security guarantees.

Earlier, we discussed Sizzle, but it is important to notice that Sizzle only provides one-way authentication between the client and the node. If you have a thermostat, for example, that would allow someone to change your thermostat even though they could not read its information. This is not exactly the security that we want.

Let’s take a closer look at DTLS. The image below depicts a DTLS record. It looks similar to the TLS record we have already seen, so there is not much cause for concern or intrigue about the structure itself. To its right is the handshake protocol. The main thing to notice is the extra, optional ClientHello/HelloVerifyRequest exchange at the start; after that, the rest of the protocol matches what we have seen all semester with TLS.

Source: DTLS based security and two-way authentication for the Internet of Things

The proposed protocol operates at the application layer. The reasons for this choice are that lower-layer security protocols do not provide end-to-end encryption, and secure connections must be established across the mesh network. This is better than relying on routing that is agnostic of payload protection, where data may traverse an unsecured link or node.

Finally, we can see how the proposed system actually works. It is built around publishers and subscribers, which are the entities in the secured network. An access control server stores the access rights for all the publishers (or motes). To initialize any connection, the subscriber is verified by the access control server and granted a ticket. The subscriber then presents this ticket to the publisher, and the publisher verifies its legitimacy before setting up the connection. In the larger scheme of things, the connection may look like:


Source: DTLS based security and two-way authentication for the Internet of Things

So overall, when looking at the Internet of Things, we must balance performance, energy consumption, memory, and the other limits of the hardware against the level of security we can achieve. Through DTLS, the researchers were able to build a system that is reasonably secure and that achieves speedups of up to 163 milliseconds, depending on the protocol and encryption used. Overall, a secure network for the Internet of Things is slowly becoming a reality.

The Most Dangerous Code In The World

Martin Georgiev, Subodh Iyengar, Suman Jana, Rishita Anubhai, Dan Boneh, Vitaly Shmatikov. The Most Dangerous Code in the World: Validating SSL Certificates in Non-Browser Software. ACM CCS 2012.

SSL certificate validation in non-browser applications is frequently broken. These applications don’t implement SSL directly, but rather use libraries or wrappers which can have various flaws and vulnerabilities. These added vulnerabilities exist for a variety of reasons, but many are introduced by app developers’ lack of security understanding and by poor APIs. Rather than exposing high-level security properties of network tunnels, many APIs expose low-level details of the SSL protocol to app developers. In many cases, option values passed to API functions are misinterpreted by app developers, leading to security risks. In other cases, attempts to fix certificate validation bugs introduce further insecurity, sometimes accidentally breaking or disabling certificate validation entirely.

While apps should do both chain-of-trust and hostname verification, few do either. Chain-of-trust verification is often bypassed altogether, and hostname verification is often done against the Common Name instead of the SubjectAltName recommended by RFC 2818. Many applications also lack a robust mechanism for checking certificate revocation or letting the user do so. OpenSSL provides chain-of-trust verification but requires applications to do their own hostname verification, while JSSE may or may not do either, depending on how it is used. This dependency on app developers to understand and implement potentially complex verification is unreasonable and leads to security vulnerabilities.
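
For contrast, the sketch below (a minimal example, not code from the paper) shows what a correctly configured client looks like when the library handles both checks; Python's ssl module is used here, with example.com standing in for the real service.

    import socket
    import ssl

    hostname = "example.com"                  # stand-in for the real service
    context = ssl.create_default_context()    # loads the system's trusted root CAs
    context.verify_mode = ssl.CERT_REQUIRED   # chain-of-trust verification
    context.check_hostname = True             # hostname (SubjectAltName) verification

    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            print("Negotiated:", tls.version())
            print("Peer subject:", tls.getpeercert()["subject"])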

To avoid shop-for-free attacks, many vendor websites use SSL tunnels to communicate sensitive payment information, but these tunnels are often not set up securely. The authors attempted attacks on these tunnels in a controlled environment using self-signed certificates and certificates with incorrect Common Names. In Amazon’s Flexible Payments Service PHP library, cURL’s hostname-verification parameter was set to false instead of the correct value 2; a later bug fix changed it to true, but cURL interprets true as 1, which only checks that a Common Name exists rather than that it matches the host, so connections remained vulnerable to MitM attacks.

One issue for app developers is the need to test applications before going live: they either need valid certificates for localhost (which CAs do not issue) or they have to disable certificate validation. Developers often forget to re-enable the validation afterward, and “disabling proper certificate validation appears to be the developers’ preferred solution to any problem with SSL libraries.”
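
One way out of this dilemma is to keep validation enabled and explicitly trust a local test CA instead. The sketch below assumes a locally generated CA file (local-test-ca.pem) and a test server on localhost:8443; both names are hypothetical examples, not a prescribed workflow.

    import requests

    # The anti-pattern the paper warns about -- tempting during testing, dangerous in release:
    # requests.get("https://localhost:8443", verify=False)

    # Keep validation on, and trust only the local test CA for local runs:
    resp = requests.get("https://localhost:8443", verify="local-test-ca.pem")
    print(resp.status_code)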

App developers should test their applications using abnormal and invalid certificates, and should protect themselves by not blindly trusting libraries and by not disabling certificate validation during testing. SSL library developers should also be more explicit about what their APIs do and do not validate, and should avoid pushing the responsibility for validation onto app developers.

AWS IoT Security Overview

Amazon Web Services (AWS) is the largest cloud service provider in the world, offering a wide range of products from computing and network resources to game development services. One of their newer offerings, AWS IoT, enables developers to link internet-connected devices to the cloud.

The use case for this service is the Internet of Things. Physical sensors in our homes, offices, and cities collect data from their surroundings. Software can then analyze this data and leverage actuators to effect changes in the environment. For example, smart thermostats use temperature sensors to decide whether to heat or cool our houses. These thermostats can learn our daily routines, allowing them to control the temperature more efficiently than classical programmable thermostats. Furthermore, since the thermostats are connected to the internet, homeowners are able to remotely control their home temperature from their mobile phone.

AWS IoT provides application developers a protocol, Message Queue Telemetry Transport (MQTT), for communication between Internet-connected things and the cloud. Sensors use MQTT to publish data to a message broker, which passes on the sensor readings to subscribing smart appliances and other devices. These messages are also processed by a rules engine, allowing developers to interface with other AWS services, such as their storage and data processing solutions.


Source: Amazon

All these messages to and from AWS IoT need to be secured. To ensure privacy, TLS is used to encrypt all MQTT traffic between devices and the message broker. Devices are authenticated using one of three identity types. First, clients can use the standard X.509 certificates used in HTTPS, which are authenticated using challenge-response TLS Client Authentication. AWS IoT also allows clients to use two AWS-specific identity types: IAM roles, and Amazon Cognito Federated Identities.
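
A minimal sketch of what such a mutually authenticated MQTT client might look like is shown below, using the paho-mqtt library (1.x API); the endpoint, topic, and file names are placeholders, and a real deployment would use the credentials provisioned for the specific device.

    import ssl
    import paho.mqtt.client as mqtt

    client = mqtt.Client(client_id="thermostat-42")
    client.tls_set(
        ca_certs="AmazonRootCA1.pem",       # CA that signed the message broker's certificate
        certfile="device-certificate.pem",  # X.509 client certificate (the device's identity)
        keyfile="device-private-key.pem",   # device private key
        tls_version=ssl.PROTOCOL_TLSv1_2,
    )
    client.connect("example-ats.iot.us-east-1.amazonaws.com", port=8883)
    client.publish("home/thermostat/temperature", payload="21.5", qos=1)
    client.disconnect()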

Even after a device has authenticated itself to the message broker, it is only allowed to execute an operation if it has been given the appropriate permission. An entity’s permissions are specified by AWS IoT policies, which are attached to the entity’s identity (certificate, IAM role, or Cognito identity). If an entity needs access to AWS services outside of IoT, IAM policies are used instead of IoT policies. These policies provide developers fine-grained control of authorization.
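
To give a concrete feel for such a policy, the sketch below (policy name, ARNs, and account ID are all hypothetical) creates a publish-only IoT policy and attaches it to a device certificate with boto3.

    import json
    import boto3

    # Allow one device to connect and publish to one topic, and nothing else.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["iot:Connect", "iot:Publish"],
            "Resource": [
                "arn:aws:iot:us-east-1:123456789012:client/thermostat-42",
                "arn:aws:iot:us-east-1:123456789012:topic/home/thermostat/temperature",
            ],
        }],
    }

    iot = boto3.client("iot", region_name="us-east-1")
    iot.create_policy(policyName="thermostat-publish-only",
                      policyDocument=json.dumps(policy_document))
    iot.attach_policy(policyName="thermostat-publish-only",
                      target="arn:aws:iot:us-east-1:123456789012:cert/example-certificate-id")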

AWS End-To-End Hardware Security

Brandon Lewis. AWS, Microchip deliver trust anchor for end-to-end IoT security. Embedded Computing Design, 7 October 2016.

Digital certificates are ingrained in almost every aspect of our digital lives. Signed by certificate authorities (CAs), they help authenticate the identities of parties involved in the electronic exchange of information, preventing threats such as MITM attacks. With the Internet of Things (IoT) introducing billions of clients that communicate with the cloud in a two-way fashion, the need for mutual authentication of both clients and servers has steadily increased.

To solve the problems related to certificate provisioning for client-side devices, Amazon Web Services (AWS) released the “Use Your Own Certificate” feature in April 2016. It allows original equipment manufacturers (OEMs) to register digital certificates signed by a third-party CA with the AWS IoT platform using an API, which can happen even before the devices come online. This opens a new possibility for OEMs to generate cryptographic keys for a device during production. The latest capability in the AWS IoT portfolio, Just-In-Time Registration (JITR), is also partly based on this process. As the term implies, devices can automatically connect to and be recognized by the AWS IoT cloud the first time they request service from the platform. Provided devices are pre-equipped with unique, trusted private keys and the correct server configurations and policies, this immediate, autonomous onboarding of devices with cloud services can remain secure despite the large numbers of connections.

Fortunately, good news has arrived from the hardware side of the IoT spectrum as well. One example is the ECC508A, a 2 mm x 3 mm tamper-resistant CryptoAuthentication device based on elliptic curve Diffie-Hellman (ECDH) algorithms. The chip’s architectural features include internally encrypted memory, isolated power rails, a memory and logic shield, internal clock generation, and a lack of probe points, protecting against a range of real-world threats. It also improves overall system performance by running TLS cryptographic operations on a hardware accelerator. With the ECC508A acting as the root of trust, Microchip can act as a third-party CA, signing device certificates in an offline verification process. OEMs are then able to obtain robust security and automatic cloud authentication by adding a single component to their bill of materials (BoM).

Security remains essential to the continued rollout of IoT in every market. Device makers can now separate security from business functions and build on that foundation. The ECC508A evaluation kit is already available, which should encourage those who care deeply about the future of cyber security.

Conclusion

As we have seen, TLS is widely used outside of the web: in embedded devices, the Internet of Things, Amazon Web Services, and many other settings. The problem, however, is how these devices choose to implement their own versions of TLS and the problems that ensue. We have seen interesting applications of TLS that nonetheless fundamentally break the security the protocol is intended to provide. Many of the problems come down to carelessness in the production of these devices, such as pinning a single certificate across every device of a given type.

Besides TLS, alternative protocols have been designed for minimal data use and enhanced security throughout all communication between these devices. Because of the limited resources provided by embedded devices and IoT devices, many of the protocols must consider how much CPU, battery, and memory will be used at a time.

As time goes on, more and more of these embedded devices will be in our homes and cities, and there is a very real possibility of our physical safety being put in harm’s way by a compromised device. Internet-enabled doors, heating systems, windows, safes, and more could be locked, overdriven, or unlocked in ways that endanger their owners. While TLS was never designed with uses outside the web in mind, we must begin to consider alternative protocols, or improve the current ones, to ensure the security of our technology outside of our browsers.

TLS 1.3
Team Mango, Fri, 24 Mar 2017

TLS Evolves: Version 1.3

TLS v1.3 is a major revision to TLS to simplify the protocol, and improve its security and performance. In order to get a good understanding of TLS v1.3 and where it is heading in the future, we will first look at where TLS has been.

Looking Backward: Retro TLS

SSL/TLS has a storied past. SSL v1.0 was never released. Netscape, the company that originally developed SSL, circulated it internally but decided not to release it to the public because it had several flaws including a lack of data integrity protection.

After that non-starter, the timeline runs from SSL 2.0 (1995) and SSL 3.0 (1996) through TLS 1.0 (1999), TLS 1.1 (2006), and TLS 1.2 (2008).

A more thorough timeline can be found at SSL/TLS and PKI History.

So what does TLS v1.3 bring to the table? Let’s take a look…

Faster Handshake

TLS 1.3 introduces a significantly slimmer Handshake Protocol than previous versions. In order to understand the implications of these changes, we first review the Handshake Protocol used in TLS 1.2.

In TLS 1.2, the client begins the handshake with a ClientHello, and the server responds with a ServerHello (along with its certificate and key-exchange parameters). The client then sends its ClientKeyExchange and Finished messages, and the server completes the handshake with its own Finished.

In contrast, TLS 1.3 folds the key-share messages into the ClientHello and ServerHello, so each side sends one fewer flight and the full handshake completes in a single round trip (a small client-side example follows the handshake figures below).

TLS 1.2 Handshake
TLS 1.2 Handshake

TLS 1.3 Handshake
TLS 1.3 Handshake
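
As an aside, modern TLS libraries let a client insist on this one-round-trip handshake. The sketch below is illustrative only (it assumes an OpenSSL build with TLS 1.3 support, which post-dates the drafts discussed here) and uses Python's ssl module to refuse anything older and report what was negotiated.

    import socket
    import ssl

    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3   # refuse TLS 1.2 and below

    with socket.create_connection(("example.com", 443)) as sock:
        with context.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.version())   # "TLSv1.3" if the server supports it
            print(tls.cipher())    # an AEAD suite such as AES-GCM or ChaCha20-Poly1305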

AuthLoop: End-to-End Cryptographic Authentication for Telephony over Voice Channels, Bradley Reaves, Logan Blue, and Patrick Traynor. USENIX Security Symposium. August 2016.

On the subject of modifications and adaptations to the typical TLS 1.2 system of encryption and authentication, we explore AuthLoop: a TLS-style authentication protocol specifically designed for telephony networks. In this domain, the system must span three different types of telephony networks: cellular, VoIP, and PSTN. However, TLS handshake transmission times over such channels were extremely slow, averaging 98 seconds per handshake, which is completely infeasible for most phone calls. AuthLoop keeps the authentication and shared-secret elements of TLS and adds a freshness/liveness component analogous to the Heartbeat protocol. On the other hand, AuthLoop removes RSA and the cipher-agreement messages. Furthermore, AuthLoop does not encrypt messages and therefore has no Record Protocol. After this slimming down, the average transmission time dropped drastically to 4.8 seconds.

0-RTT Resumption

A major new feature in the TLS 1.3 draft is support for 0-RTT session resumption. In TLS 1.2, establishing a connection to a new server required at least 4 trips between the server and client to make an HTTP request and receive a response. With a session-ID or session ticket, that could be reduced to 3 trips per connection. TLS 1.3 by default reduces the number for new connections to only 3 trips per connection, but also adds support for a new mode termed 0-RTT. In this mode, resumed HTTPS connections require only 2 trips, which is the bare minimum required for a full HTTP query and response. In this mode, TLS 1.3 adds barely any additional latency cost over a plain HTTP request!

0-RTT
TLS 1.3 0-RTT (Source: https://blog.cloudflare.com/tls-1-3-overview-and-q-and-a/)

However, the addition of 0-RTT resumption to the protocol has an important implication for the security features provided by the protocol. Because TLS 1.3 session tickets, which enable 0-RTT resumption, are stateless on the server, such requests from the client are trivially vulnerable to replay attacks. An attacker who can intercept an encrypted client message can re-send it to the server, tricking the server into processing the same request twice (which could be serious, for example, if the request is “transfer $x to Bob”).

To remedy this, the protocol authors recommend that data sent as 0-RTT early data be idempotent, that is, non-state-changing; servers should not act on a state-changing request received in 0-RTT mode. This has arguably been the most controversial part of the new standard, as it puts the onus on some higher-level protocol to solve a problem that TLS has historically been responsible for. Even worse, it is not solved directly by HTTP but rather must be specifically kept in mind by web developers.
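
As a rough illustration (not part of the TLS specification or of HTTP), the sketch below shows one application-level guard of the kind web developers are left to build: state-changing requests carry a client-chosen request ID, and each ID is processed at most once, so a replayed 0-RTT copy has no effect.

    seen_request_ids = set()   # in production this would be a shared, expiring store

    def handle_transfer(request_id, amount, to_account):
        """Process a (hypothetical) transfer request at most once per request ID."""
        if request_id in seen_request_ids:
            return "duplicate request ignored"       # a replayed 0-RTT copy lands here
        seen_request_ids.add(request_id)
        # ... perform the transfer exactly once ...
        return f"transferred {amount} to {to_account}"

    print(handle_transfer("req-001", 100, "Bob"))    # processed
    print(handle_transfer("req-001", 100, "Bob"))    # replay: ignored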

Deployment

Deployment of TLS 1.3 remains loosely in the future as the protocol specification finishes its final draft. Current TLS 1.3 drafts include 0-RTT by requiring servers to set up a profile that defines its use. However, as with many other features in earlier TLS protocols, 0-RTT data is not compatible with older servers. A server using TLS 1.3 can choose which early data to accept in 0-RTT mode and which to buffer.

Data Center use of Static Diffie-Hellman: While ephemeral (EC) Diffie-Hellman is in nearly all ways an improvement over the TLS RSA handshake, it has a limitation in certain enterprise settings. Specifically, the use of ephemeral (PFS) ciphersuites is not compatible with enterprise network monitoring tools such as Intrusion Detection Systems (IDS) that must passively monitor intranet TLS connections made to endpoints under the enterprise’s control. Such monitoring is ubiquitous and indispensable in some industries, and loss of this capability may slow adoption of TLS 1.3.

Deployment of TLS 1.3 across the web faces several industry concerns, most notably regarding Static RSA (no forward secrecy), as posted from an email exchange between Andrew Kennedy, an employee at BITS (the technology policy division of the Financial Services Roundtable http://www.fsroundtable.org/bits), and Kenny Paterson.

Andrew Kennedy writes,

… While I am aware and on the whole supportive of the significant contributions to internet security this important working group has made in the last few years I recently learned of a proposed change that would affect many of my organization’s member institutions: the deprecation of RSA key exchange.

Deprecation of the RSA key exchange in TLS 1.3 will cause significant problems for financial institutions, almost all of whom are running TLS internally and have significant, security-critical investments in out-of-band TLS decryption.

Like many enterprises, financial institutions depend upon the ability to decrypt TLS traffic to implement data loss protection, intrusion detection and prevention, malware detection, packet capture and analysis, and DDoS mitigation.

Kenny’s response: (excerpted from https://www.ietf.org/mail-archive/web/tls/current/msg21278.html)

Hi Andrew,

My view concerning your request: no.

Rationale: We’re trying to build a more secure internet.

Meta-level comment:

You’re a bit late to the party. We’re metaphorically speaking at the stage of emptying the ash trays and hunting for the not quite empty beer cans.

More exactly, we are at draft 15 and RSA key transport disappeared from the spec about a dozen drafts ago. I know the banking industry is usually a bit slow off the mark, but this takes the biscuit.

Cheers,

Kenny

Anti-Downgrade Prevention and Detection

Downgrade resilience in key-exchange protocols by Karthikeyan Bhargavan, Christina Brzuska, Cédric Fournet, Markulf Kohlweiss, Santiago Zanella-Béguelin and Matthew Green in IEEE Symposium on Security and Privacy (SP), 2016.

TLS 1.2 suffers from various downgrade and man-in-the-middle attacks like Logjam, FREAK and POODLE. Logjam exploits the option of using legacy “export-grade” 512-bit Diffie-Hellman groups in TLS 1.2: it forces susceptible servers to downgrade to cryptographically weak 512-bit Diffie-Hellman groups, which can then be broken. FREAK is a man-in-the-middle attack that affects the OpenSSL stack, the default Android web browser, and some Safari browsers; it tricks servers into negotiating a TLS connection with cryptographically weak 512-bit export-grade RSA keys. POODLE exploits a vulnerability in SSL 3.0 but applies to TLS 1.2 once the attacker performs a version rollback to SSL 3.0 through a man-in-the-middle attack.

The above problems can be countered with correct downgrade protection. While TLS 1.2 does implement downgrade protection, it fails to do so correctly. Downgrade protection requires exchanging MACs of the Finished messages between client and server to ensure that the negotiated parameters have not been modified by a MITM attacker. TLS 1.2 does not hash all of the negotiated parameters in its MAC, allowing an attacker to alter the unhashed parameters and launch downgrade attacks. TLS 1.3 fixes this by hashing all of the parameters, and also isolates TLS 1.2 and lower version messages (which have downgrade-resilience issues) by requiring a TLS 1.3 server to set part of its ServerRandom nonce to a fixed sentinel value when it receives a ClientHello from a TLS 1.2 or below client. This signals TLS 1.3 clients, which reject any pre-1.3 handshake containing the fixed value sequence.

Downgrade Resilience in Key-Exchange Protocols

Downgrade protection primarily relies on the MACs in the Finished messages, which in turn rely on the strength of the group and the negotiated algorithms and hash. If a client and server support a weak group, an attacker can downgrade the group and break the master secret to forge the MACs, as in Logjam.

The figure below shows the faulty downgrade resilience of TLS 1.2, where the TLS 1.2 server fails to hash the negotiated parameters like protocol version (v), chosen parameters (a_R) and server identity (pk_R) in its hash message hash_1(.) (see subfigure (b) of the below figure).

Downgrade Protection in TLS 1.2
TLS 1.0 - 1.2 with (EC)DHE key exchange (a), where messages labeled with * occur only when client authentication is enabled, and (b) its downgrade protection sub-protocol
Source: https://eprint.iacr.org/2016/072.pdf

Draft 10 of TLS 1.3 implements the following downgrade protection mechanism which rectifies the above mistake and consequently hashes all the negotiated parameters. Notice the hash_1(H(m_1, m_2, -)) in the message sent by server (subfigure (b) in the figure below), which hashes all the negotiated parameters in m_2.

Downgrade Protection in TLS 1.3 Draft 10
TLS 1.3 1-RTT mode with server-only authentication (a) and its downgrade protection sub-protocol (b)
Source: https://eprint.iacr.org/2016/072.pdf

However, three downgrade attacks remain possible against TLS 1.3 as described in Draft 10. First, an attacker downgrades the connection to TLS 1.2 or lower and mounts any of the downgrade attacks mentioned above; this succeeds as long as the attacker can forge the Finished MACs. Second, an attacker uses the TLS fallback mechanism to block TLS 1.3 connections and allow only TLS 1.2 connections to go through; even if the endpoints implement the fallback protection mechanism, the attacker can use one of the TLS 1.2 downgrade attacks to break the connection. Third, in Draft 10 of the TLS 1.3 protocol the handshake hashes restart upon receiving a Retry message, so the attacker can downgrade the Diffie-Hellman group for some classes of negotiation functions.

TLS 1.3 draft 11 counters the above three attacks by incorporating two countermeasures. First, the protocol continues the handshake hashes over retries (subfigure (a) of the figure below). Second, TLS 1.3 servers always signal their highest supported version in the server nonce, even when they negotiate a lower version such as TLS 1.0: on receiving a ClientHello from a TLS 1.2 or below client, the server sets part of its ServerRandom to a specific fixed value. TLS 1.3 clients that receive a TLS 1.2 or below ServerHello check for this value and abort if they see it; the figure below shows the client check using the verifyVersion function. This allows detection of downgrade attacks over and above the Finished-message protection, as long as ephemeral cipher suites are used, and so prevents attacks targeted at (EC)DHE.

Downgrade Protection in TLS 1.3 Draft 11
TLS 1.3 Draft 11 Update on Downgrade Resilience in Key-Exchange Protocols
Source: https://eprint.iacr.org/2016/072.pdf

The TLS 1.3 server sends a ServerHello message in response to a ClientHello when it can find an acceptable set of algorithms and the client’s key_share extension is acceptable; otherwise it responds with a handshake_failure fatal alert. The ServerHello message contains the server’s random value, which incorporates the downgrade protection mechanism. If the ClientHello indicates support only for TLS 1.2 or below, the last eight bytes of the server’s random value MUST be set to: 44 4F 57 4E 47 52 44 01. If the ClientHello indicates support only for TLS 1.1 or below, the last eight bytes SHOULD be set to: 44 4F 57 4E 47 52 44 00. TLS 1.3 clients are required to check for these values in the random field of server responses.
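
A minimal sketch of the client-side check, using the sentinel bytes quoted above (the surrounding handshake machinery is omitted):

    # "DOWNGRD" followed by 0x01 or 0x00, per the quoted draft text.
    DOWNGRADE_TLS12 = bytes.fromhex("444f574e47524401")
    DOWNGRADE_TLS11 = bytes.fromhex("444f574e47524400")

    def check_server_random(server_random: bytes, negotiated_version: str) -> None:
        """Abort if a TLS 1.3-capable client sees the downgrade sentinel in a pre-1.3 handshake."""
        assert len(server_random) == 32
        tail = server_random[-8:]
        if negotiated_version != "TLS 1.3" and tail in (DOWNGRADE_TLS12, DOWNGRADE_TLS11):
            raise ConnectionError("downgrade detected: aborting handshake")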

Authenticated Encryption

Up until now, we’ve mostly concerned ourselves with the “MAC-Encode-Encrypt” (MEE) packet construction method. In a nutshell, MEE follows three steps:

  1. Calculate a MAC over the payload
  2. Append the MAC and an appropriate amount of padding to the payload
  3. Encrypt the modified payload to generate a ciphertext

As we’ve discussed in previous classes, the CBC mode of operation has its downsides; adversaries can break encryption by utilizing padding oracle attacks, since padding can only exist in a handful of values and lengths. Moreover, it’s impossible to actually verify the integrity of the ciphertext until the MAC has been revealed by decrypting the ciphertext. The duration required to decrypt the tampered ciphertext and validate the MAC leaks sensitive (and potentially useful) timing information to adversaries.

Encrypt-then-MAC

In general, MEE is inferior to its cousin, “Encrypt-then-MAC” (ETM). In ETM, as opposed to MEE, the plaintext is encrypted before the MAC is calculated. Intuitively, it makes sense that ETM is more secure—any tampering of the ciphertext is immediately evident when the MAC is calculated, therefore no decryption takes place (and no timing information is leaked). Additionally, assuming the ciphertext appears random, the MAC also appears random and reveals no information about the underlying ciphertext.
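
To make the ordering concrete, here is a minimal Encrypt-then-MAC sketch (an illustration using the Python cryptography library, not TLS's actual record processing): AES-CBC for encryption, HMAC-SHA256 computed over the ciphertext, and verification before any decryption takes place.

    import os
    from cryptography.hazmat.primitives import hashes, hmac, padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    enc_key, mac_key = os.urandom(32), os.urandom(32)

    def etm_encrypt(plaintext: bytes) -> bytes:
        iv = os.urandom(16)
        padder = padding.PKCS7(128).padder()
        padded = padder.update(plaintext) + padder.finalize()
        encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
        ciphertext = iv + encryptor.update(padded) + encryptor.finalize()
        mac = hmac.HMAC(mac_key, hashes.SHA256())
        mac.update(ciphertext)                      # the MAC covers the ciphertext (ETM)
        return ciphertext + mac.finalize()

    def etm_decrypt(record: bytes) -> bytes:
        ciphertext, tag = record[:-32], record[-32:]
        verifier = hmac.HMAC(mac_key, hashes.SHA256())
        verifier.update(ciphertext)
        verifier.verify(tag)                        # raises before any decryption happens
        decryptor = Cipher(algorithms.AES(enc_key), modes.CBC(ciphertext[:16])).decryptor()
        padded = decryptor.update(ciphertext[16:]) + decryptor.finalize()
        unpadder = padding.PKCS7(128).unpadder()
        return unpadder.update(padded) + unpadder.finalize()

    print(etm_decrypt(etm_encrypt(b"attack at dawn")))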

Galois/Counter Mode (GCM)

Before we jump on the ETM bandwagon, however, let’s take a look at yet another mode of operation, Galois/Counter Mode (GCM). GCM is an authenticated encryption algorithm that provides confidentiality and integrity, and does so extremely efficiently.

GCM
Galois/Counter Mode (credit: Wikipedia)

GCM At-A-Glance

  1. Sequentially number blocks
  2. Encrypt block numbers with block cipher E
  3. XOR result of encryption with plaintext to produce ciphertext
  4. Combine ciphertext with authentication code to produce authentication tag

The authentication tag can be used to verify the integrity of the data upon decryption, similar to an HMAC. If this kind of authenticated encryption seems superior, that’s because it is! TLS 1.3 only provides support for GCM, CCM, and ChaCha20-Poly1305, another authenticated encryption construction. Say goodbye to MAC-then-encrypt.
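
As a small demonstration of the AEAD interface (using the Python cryptography library; the plaintext and associated data are arbitrary examples), a single call produces ciphertext plus tag, and any tampering makes decryption fail before plaintext is released:

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    aesgcm = AESGCM(key)
    nonce = os.urandom(12)                  # 96-bit nonce; must never repeat under one key

    ciphertext = aesgcm.encrypt(nonce, b"attack at dawn", b"record header")  # data + AAD
    print(aesgcm.decrypt(nonce, ciphertext, b"record header"))

    tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
    try:
        aesgcm.decrypt(nonce, tampered, b"record header")
    except Exception as e:                  # cryptography raises InvalidTag here
        print("integrity check failed:", type(e).__name__)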

TLS v1.3 Removals

An overview of TLS 1.3 and Q&A

In TLS v1.3, every feature was scrutinized to determine whether it was really necessary and secure, and scrapped otherwise. In particular, the removals include static RSA key exchange (no forward secrecy) and the non-AEAD MAC-then-encrypt cipher suites discussed above, along with legacy features such as compression and renegotiation.

Formal Verification

TLS 1.3 is the first revision of the TLS protocol to incorporate formal verification during development. Cas Cremers, Marko Horvat, Sam Scott, and Thyla van der Merwe’s paper, Automated Analysis of TLS 1.3: 0-RTT, Resumption and Delayed Authentication, provides a recent (February 2016) description of the challenges and results of such an analysis. In the blog post associated with their work, the authors contextualize their verification efforts:

The various flaws identified in TLS 1.2 and below, be they implementation- or specification-based, have prompted the TLS Working Group to adopt an “analysis-before-deployment” design paradigm in drafting the next version of the protocol. After a development process of many months, the TLS 1.3 specification is nearly complete. In the spirit of contributing towards this new design philosophy, we model the TLS 1.3 specification using the Tamarin prover, a tool for the automated analysis of security protocols.

The authors are able to prove that revision 10 of the specification meets the goals of authenticated key exchange for any combination of unilaterally or mutually authenticated handshakes. Further, the authors discovered a previously unknown attack on the protocol during a PSK-resumption handshake. The 11th revision of the protocol included a fix for this attack.

Protocol Model

The authors used the Tamarin prover for their analysis. Tamarin is an interactive theorem proving environment (similar to Coq) specially designed for the verification of protocols such as TLS. As TLS is already an abstract specification, encoding TLS 1.3 into the Tamarin specification language was relatively straightforward. “Rules” (functions) over this specification captured honest-party and adversary actions alike. The following state diagram depicts the client TLS state (as defined in Tamarin) and transitions between the states (Tamarin rules) for an entire session.

Partial client state machines for TLS 1.3 revision 10

Source: Automated Analysis of TLS 1.3

Proven Security Properties

The next step in the analysis involved encoding the desired security properties of TLS 1.3 as Tamarin lemmas.

Each lemma must hold over its respective domain of states (a subset of the nodes in the client state machine above, for example). While proof assistants like Tamarin are capable of constructing simple proofs, a significant amount of manual effort was required to prove the enumerated lemmas. As such, a notable contribution of this work is the actual Tamarin proof artifact itself, not just what was and wasn’t proven. The authors claim their Tamarin abstractions and proofs were constructed with extensibility to future TLS development in mind.

Discovered Attack

While verifying the delayed authentication mechanism portion of the protocol, an attack was discovered which violated client authentication; an adversary is able to impersonate a client while communicating with the server.

Step 1. The victim client, Alice, establishes a connection with the man-in-the-middle attacker, Charlie. Charlie establishes a connection with Bob, the server with which Alice wishes to connect. A PSK is established for each connection: PSK_1 and PSK_2, respectively.

Client Authentication Attack: Step 1
Source: Automated Analysis of TLS 1.3

Step 2. Alice sends a random nonce, nc, to Charlie using PSK_1. Charlie reuses this nonce to initiate a PSK-resumption handshake with Bob. Bob responds with random nonce ns and the server Finished message using PSK_2. Charlie reuses ns and recomputes the Finished message for Alice using PSK_1. Alice returns her Finished message to Charlie. Charlie then recomputes this Finished message for Bob using PSK_2.

Client Authentication Attack: Step 2

Source: Automated Analysis of TLS 1.3

Step 3. Charlie makes a request to Bob that requires client authentication. Charlie is thus prompted for his certificate and verification. This request is re-encrypted and forwarded to Alice. To compute the verification signature of this forwarded request, Alice uses the session_hash value, which is the hash of all handshake messages excluding the Finished messages. This session_hash value will match that of Charlie and Bob’s, and thus Charlie can re-encrypt Alice’s signature for Bob. Bob accepts Alice’s certificate and verification as valid authentication for Charlie.

Client Authentication Attack: Step 3

Source: Automated Analysis of TLS 1.3

The discovery of this attack is noteworthy in that it was completely unexpected by the TLS Working Group.

The fix, which forces the session_hash value to include the Finished messages, had even been suggested earlier in an official pull request, but was rejected at the time.

The authors make a strong case that formal verification has been an extremely valuable part of the design process of TLS 1.3. The speed with which the fix was incorporated into subsequent protocol revisions suggests that the TLS Working Group shares this sentiment.

Two Oakland 2017 papers provide further reports on formal verification efforts for TLS 1.3, up through Draft 18.

Usable Security in TLS
Team Pineapple, Wed, 22 Mar 2017

Introduction

In 1999, Alma Whitten and Doug Tygar performed a usability analysis of PGP 5.0 called Why Johnny Can’t Encrypt. PGP Corp claimed its product “makes complex mathematical cryptography accessible for novice computer users”.

However, Whitten and Tygar’s evaluation, based on experiments with users, argued that PGP’s user interface design made computer security usable only for people who are “already knowledgeable in that area”. This does not seem to be a fair expectation for users. Today, around 40% of the entire world population use the internet. When the paper was published, that percentage was 4.1%. Thus, as the number of internet users has skyrocketed, the security community has become increasingly aware of how important it is to make security understandable and accessible to typical users.

Usability is, of course, important for any service that is a vital part of the lives of billions. However, for security in particular, there are higher stakes: confusing user interfaces can lead to sensitive data being exposed, from financial transactions to identification information. HTTPS users are vulnerable to significant, real-life risks when technical language abounds and browsers inundate them with too many warnings (many of which are false positives).

Humans are arguably the weakest link, so usability and communication must be taken into account in any model that expects the user to act safely. When a user interface fails to effectively communicate security consequences, it is unreasonable to blame an improperly informed user for exposing themselves to those dangers.

Users Are Not The Enemy

Users Are Not The Enemy, Anne Adams and Martina Angela Sasse. Communications of the ACM. December 1999.

Even perfectly written software can become vulnerable to attacks when the user does not understand its behavior. In Users Are Not the Enemy, Adams and Sasse discuss how users can both knowingly and unknowingly compromise computer security mechanisms such as password authentication. Few users knew how to construct a secure password, or even understood why they needed one. As a result, most passwords revolved around familiar and related phrases, words, or character patterns. Because the average person finds it difficult to remember passwords, most users rarely change them, and some prefer to record them in plaintext.

Users are often the cause of security failures, but they are not the true enemy. Without sufficient knowledge of attack techniques, typical users understandably fall back on their own models of security threats and priorities. Meanwhile, security experts and developers do not always understand users’ perceptions, tasks, or needs. Rather than quickly dismissing users as unknowing collaborators in cyberspace crime, we should explain why their accounts are worth protecting; users are more motivated to practice security-conscious behavior when they perceive a threat.

The Source Awakens

The Source Awakens. Presentation at USENIX Enigma 2016 by Matthew Smith.

We often see breaking news articles detailing the most recent hack or password leak. These are not attacks targeting individual users who happen to be uneducated in security principles, but attacks on companies’ products, developed by computer scientists and engineers who are supposed to be experts in their fields. If these developers are making mistakes, is the cause a lack of programming ability or a lack of understanding of the security issues? The latter is far more likely: developers make mistakes when they are unaware of the security issues latent in their systems.

A story which probably hits home for many software developers: trying to Google an approach or solution to the problem or feature they want to implement. The developer clicks on the first StackOverflow link, gets a solution, implements it, and keeps it as long as it apparently works. The developer then continues with the task without ever questioning the validity of the code just added. What the developer doesn’t realize is that the code found and used may have serious security flaws, such as this code for self-signed certificate validation:

validation
Source: The Source Awakens

The code seems to work. But upon closer inspection, it clearly does nothing to check certificates: whatever certificate it sees, it simply approves, regardless of what is contained within. So of course the code “works” when the developer incorporates it into their program (at least it gives no apparent errors), but it introduces a major security vulnerability. Similarly, developers may use a dummy certificate validator or self-signed certificates for testing, but then forget to remove them in the release.
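
For readers who have not seen the slide, a minimal Python analogue of the same anti-pattern (this is not the code from the talk) looks like the following: the context neither validates the chain nor checks the hostname, so it "works" against any server, including one run by a man in the middle.

    import socket
    import ssl

    insecure = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    insecure.check_hostname = False          # no hostname verification
    insecure.verify_mode = ssl.CERT_NONE     # no chain-of-trust verification

    with socket.create_connection(("example.com", 443)) as sock:
        with insecure.wrap_socket(sock, server_hostname="example.com") as tls:
            print("connected, but to whom?", tls.version())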

Thus, usability is a security issue for developers using APIs and libraries, not just for end users using graphical interfaces.

Malware Analysis Tools

There are several decompilers that take binary code and output decompiled source code, such as Hex-Rays (shown below). But the generated code isn’t very readable to the average programmer, as in this example from the Simda malware’s domain generation code:

hexrays
Source: The Source Awakens

Smith discusses how his team created a new system called DREAM++ that outputs more readable decompiled code, producing something like:

dream++
Source: The Source Awakens

By producing more readable decompiled code, both security experts as well as typical programmers were able to much more easily and quickly understand the decompiled programs.

Browser Warnings

Improving SSL Warnings: Comprehension and Adherence by Adrienne Porter Felt, Alex Ainslie, Robert Reeder, et al. ACM Conference on Human Factors in Computing (CHI), 2015.

Security experts and programmers will always attempt to do what is best for their end-users, but the problem is that (very) often the end-user will circumvent any attempt at stopping whatever they intend to do. Looking at HTTPS specifically, the inherent security boils down to the end user’s choice of whether or not to proceed to a possibly unsafe webpage.

Therefore, it is the web browser’s responsibility to provide adequate notice to a user when their connection is not secure. Research conducted in 2015 by Google and the University of Pennsylvania found that reducing technical jargon in the Chrome browser’s SSL warnings and focusing on brevity and simplicity increased comprehension of the warnings. Still, only 49% of respondents could answer the following question correctly:

What might happen if you ignored this error while checking your email?

  • Your computer might get malware

  • A hacker might read your email

(Technically both are true, but realistically the user is expected to understand that invalid HTTPS means an attacker may be able to read their communications in plaintext.)

The researchers also found that the improved design of browser warnings could influence up to 30% more users to go “Back to Safety” rather than “Proceed to Unsafe”. The new design is shown below:

chrome37
Source: Improving SSL Warnings: Comprehension and Adherence

If this seems familiar, it is because it was introduced in Chrome 37. This new design requires an extra click from a user to proceed to the potentially unsafe site. Notice that the unsafe option is a dark gray text link contrasted with a safe blue button.

Below, a blast from the past: SSL warnings from pre-2015 Chrome versions.

sslwarning
Source: Self Signed Certificates with Postman

Noticeably, this warning has too much technical jargon and is neither short nor simple. The user can also easily proceed to the site straight from the warning. Google’s research was meant to improve the experience of seeing these warnings and to minimize the security risks taken by the end user.

TLS Errors on the Web

Here’s My Cert, So Trust Me, Maybe? Understanding TLS Errors on the Web by Devdatta Akhawe, Bernhard Amann, Matthias Vallentin, & Robin Sommer. International World Wide Web Conference (WWW) 2013.

The end user isn’t the only source of security issues when it comes to browser usability in TLS. The behavior of web browsers may not be studied as much as vulnerabilities in TLS, but it can be nearly as dangerous. When browsers report TLS errors, they usually do not distinguish between real attacks and benign errors, leaving the decision of whether to continue to the end user, which is itself a security issue. On most occasions, errors turn out to be “false positives”, which can include errors due to server misconfiguration, self-signed certificates (as mentioned previously), and name validation errors. As a result, end users quickly train themselves to click through these warnings, regardless of the content, making it unlikely that they will pay enough attention when a real attack comes along.

This paper reports on a large-scale measurement study of common TLS warnings. A total of 11.5 billion SSL connections on all ports were captured over a nine-month period in their experiment. The warnings and errors produced were then categorized based on where they occurred during the verification: chain building errors, chain validation errors, and name validation errors. These correspond to the three separate steps of certificate validation of the Network Security Services (NSS) library, used by Firefox (and Chrome on Linux). The following is an algorithm that can translate the NSS responses into their categorization:

nssalgorithm
Source: Here’s My Cert, So Trust Me, Maybe?

The results of this experiment indicate a clear opportunity, and need, for reducing false warnings. A 1.54% false warning rate is unacceptable if we understand that benign scenarios are orders of magnitude more common than actual attacks.

Based on this analysis, a number of concrete recommendations are proposed to help browser vendors lower the risk of such unwanted habituation: for example, advocating free TLS certificates from authorities like StartCom, using a more relaxed name validation algorithm that accepts multiple levels for an asterisk, or enabling AIA support or preloading all known intermediate authorities in the browser cache. Implementing these changes would significantly improve browser usability for end users and, in turn, reduce the attacks and security issues caused by habituated users.

Rethinking Connection Security Indicators

Rethinking Connection Security Indicators by Adrienne Porter Felt, Robert Reeder, Alex Ainslie, Helen Harris, Max Walker, Christopher Thompson, Mustafa Emre Acer, Elisabeth Morant, and Sunny Consolvo. Twelfth Symposium on Usable Privacy and Security (SOUPS), July 2016.

By seriously considering the ramifications of the symbology we use to send messages to the users, we can work to develop more useful and informative messages. This section focused on the findings of a study conducted to figure out how users react to various symbols appearing in their browsers (most often in the URL field). The study found that a surprising number of users could identify one or more of the significant features in a good connection, but the majority of users could not identify issues with a bad connection or one not using HTTPS.

Only about 40% of the people in the study were able to identify an HTTP connection as insecure or what protocol was being used; many of the participants believed the symbol had to do with the favicon of the page or that the symbol included general information about the page such as permissions. A summary of the results of this research can be seen in the table below.

icons
Source: Rethinking Connection Security Indicators

The researchers attempted to identify the three symbols that would be the most effective for conveying information to the user. They then conducted more research regarding how users would interpret these symbols.

urls
Source: Rethinking Connection Security Indicators

results
Source: Rethinking Connection Security Indicators

The researchers were able to confidently determine that any sign of an error is interpreted to be essentially the same, regardless of color or shape. Secondly, they determined that even if users were shown to be on https or ‘secure’ sites, they still had a ‘healthy level of paranoia’ in only being ‘somewhat’ confident in the security of the site that they were currently accessing.

Based on these results, browsers need to start asking what the real dangers on the internet are. Is it worse to have a faulty certificate configuration or no HTTPS connection at all? Browsers can also consider blacklists of sites that are known for putting users’ information or connections at risk. Browsers could also do more to put security in a more central or significant location, such as showing a pop-up before every page with information on the connection type and associated risks. We need to become more aware of what messages we send to users through the particular symbols we choose for the various types of connections.

Conclusion

We’ve covered several studies showing progress, and the need for more research, in the area of usable security. Despite the work done to bridge the gap between experts and users, there is still a gulf between each group’s understanding of the other. In some cases, browser designers assume that users know what a certificate is, while users assume that the green lock in their URL bar indicates a website is free of malware.

The philosophical debate between informing users more effectively and simply manipulating them to make “correct” decisions is also unresolved. The major browsers have adopted a hybrid approach with TLS error warnings that both provide basic information on the error and require significant user effort to circumvent.

Such error warnings are commonplace, which raises another issue. Users tend to become accustomed to error messages (habituation), assuming them to be false positives for genuine threats, because they actually are false positives in the vast majority of cases. Benign server administration errors account for essentially all TLS warnings that users see. Browser manufacturers are engaged in efforts to reduce the amount of noise users experience while sifting through warnings of potential hazards.

Work at the intersection of security and human-computer interaction continues to progress, but more needs to be done to ensure the security of future internet users. We can’t eliminate people, the weakest links in the chain, so we must remember to design systems around their fallibility. While security is always at the forefront of most security researchers’ minds, we must also begin to consider the users who will interact with these features as part of the system itself.

TLS Interception and SSL Inspection
Team Poppyseed, Mon, 20 Mar 2017

The fact that “SSL inspection” is a phrase that exists, should be a blazing red flag that what you think SSL is doing for you is fundamentally broken. Compounding the problem are the mistakes that SSL inspection software authors are making.

Will Dormann (2015), Carnegie Mellon Software Engineering Institute CERT/CC Blog

Recent History

TLS Interception, also referred to as SSL Inspection, is a topic that has been in the news in recent years and months. Back in 2014, researchers from Brigham Young University published a paper titled TLS Proxies: Friend or Foe?, in which they deployed a Flash application via a Google AdWords campaign to identify client-server certificate mismatches across the web. They discovered a wide prevalence of adware, malware, and TLS proxy products presenting certificates trusted by the client but not issued by the server, in most instances acting negligently by introducing security vulnerabilities. One parental filter they tested replaced untrusted certificates with trusted ones, bypassing browser warning screens. This is exactly the type of man-in-the-middle attack HTTPS aims to prevent.

In February 2015, vulnerabilities involving two advertising injectors, one of which came preinstalled on Lenovo PCs, were found to severely compromise the security of end users. Later that same year, German journalist Hanno Böck looked at three popular antivirus suites and found that all of them lowered security, either by exposing end users to vulnerabilities like FREAK and CRIME or by supporting less secure encryption algorithms.

In early 2017, researchers teamed up with Google, Mozilla, and Cloudflare for an internet-wide survey, The Security Impact of HTTPS Interception (Zakir Durumeric, Zane Ma, Drew Springall, Richard Barnes, Nick Sullivan, Elie Bursztein, Michael Bailey, J. Alex Halderman, Vern Paxson; NDSS 2017). TLS interception software was assessed based on how the observed TLS handshake differed from the TLS parameters the client's browser would normally advertise. In all but two of the tested products, security was reduced, and in some cases serious vulnerabilities were introduced. Most recently, in February 2017, a Chrome 56 update took almost a third of Montgomery County Public Schools’ fleet of 50,000 Chromebooks offline, because the school system’s web filter, BlueCoat Proxy, did not properly handle TLS 1.3. When Chrome attempted to connect using TLS 1.3, the BlueCoat software abruptly terminated the connection rather than negotiating down to TLS 1.2.

How SSL/TLS interception works

SSL/TLS interception is performed by software on “middleboxes” located in between the client and HTTPS website or on the client’s machine, in the case of malware, anti-virus software, and ad injectors. Middlebox software has both legitimate and illegitimate use cases including proxies or content filters, antivirus suites, content cachers, advertising injectors, and malware.

Middlebox framework
Source: The Security Impact of HTTPS Interception (2017)

Middlebox proxy software relies on the client having previously installed a root certificate into its operating system’s trust store. Any outgoing SSL/TLS connection from the client is terminated and re-established by the proxy to the server, with the proxy acting as a man-in-the-middle. In an ideal deployment, the proxy’s ClientHello mirrors the TLS parameters expressed in the client’s ClientHello, so the server sees the same parameters the client intended to offer. The proxy can then inspect plaintext and establish a TLS connection back to the client using a certificate signed by the installed root, circumventing browser warnings and silently intercepting the connection between client and server.

Superfish

In 2015, there was an incident involving Lenovo PCs shipped with a preinstalled image-advertisement optimizer developed by Superfish. Superfish used Komodia’s “SSL hijacker” tool to intercept HTTPS connections in order to gather image data for its ad optimization engine. Komodia’s tool works like other SSL inspectors: it first installs a root certificate on the client machine and then MITMs all TLS connections to HTTPS websites, issuing certificates chained to the preinstalled Komodia root instead of the target HTTPS server’s certificate in order to bypass browser warnings.

To generate certificates that the browser would trust (by virtue of the newly installed root CA) for any website the user visits, the software had to create new certificates on the fly, which meant the root CA's private signing key had to be stored on the user's device. That private key was embedded in the software and could be trivially extracted by the end user. Worse, Komodia used the same private key for every machine running Superfish. It didn't take long for security researcher Robert Graham to crack the password protecting the private key (hint: it was 'komodia'). With this key, an adversary could MITM any client running Superfish by presenting certificates signed with the hardcoded key. To compound the problem, users were not alerted to the presence of the Superfish software on their new Lenovo laptops.
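To see why this design was fatal, here is a minimal sketch (using the third-party Python cryptography package; the file name is hypothetical) of the final step once the password-protected key had been pulled out of the Superfish binary: anyone holding the PEM and the recovered password obtains the very signing key that every affected laptop was configured to trust.

from cryptography.hazmat.primitives.serialization import load_pem_private_key

# Hypothetical file name for the encrypted PEM extracted from the Superfish binary
with open("komodia_root_key.pem", "rb") as f:
    encrypted_pem = f.read()

# The recovered password unlocks the key the client OS was told to trust
root_key = load_pem_private_key(encrypted_pem, password=b"komodia")
print(root_key.key_size)   # the same private key shipped on every affected machine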

Komodia released a security notice saying they fixed the issue by updating the software to create unique certificates per installation and randomly generated passwords. They also addressed other potential vulnerabilities such as updating their list of supported cipher suites and verifying certificate revocation statuses (they support OCSP). The countermeasures outlined in their security notice serve as a starting point for all HTTPS interception software developers.

PrivDog

Shortly after the Superfish incident, another piece of TLS interception software named PrivDog made by Adtrustmedia was also found to be vulnerable. PrivDog is an advertising program which intercepts HTTPS connections and replaces “bad” advertisements with advertisements approved by Adtrustmedia.

PrivDog, like the aforementioned Superfish, simply replaced the certificates of HTTPS servers with new certificates signed by the root certificate it installed on the affected machine. However, the PrivDog software performed no validation of the original certificate presented by the target server. Not only did it make untrusted certificates appear trusted, but legitimate websites with EV certificates had their certificates replaced by PrivDog's, removing the green browser indicator. Any website an affected user visited with an invalid certificate would appear valid, with no browser warnings. An adversary could easily MITM a client running PrivDog simply by presenting a self-signed certificate!
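The contrast with correct behavior is easy to see in code. Below is a minimal sketch using Python's ssl module (example.com stands in for any HTTPS site): the first context validates the upstream certificate chain and hostname, which is what any client or interception proxy should do before trusting a connection; the second context, configured the way PrivDog effectively behaved toward upstream servers, accepts anything, including a self-signed in-the-middle certificate.

import socket
import ssl

hostname = "example.com"

# Proper validation: verify the chain and the hostname before trusting anything
ctx = ssl.create_default_context()                 # CERT_REQUIRED + hostname checking
with socket.create_connection((hostname, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.getpeercert()["subject"])        # only reached for a valid certificate

# The PrivDog-style failure: accept any upstream certificate at all, then
# re-sign the connection with a locally trusted root for the browser
ctx_noverify = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx_noverify.check_hostname = False
ctx_noverify.verify_mode = ssl.CERT_NONE           # a self-signed MITM certificate now passes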

The Security Impact of HTTPS Interception

The Security Impact of HTTPS Interception. Zakir Durumeric, Zane Ma, Drew Springall, Richard Barnes, Nick Sullivan, Elie Bursztein, Michael Bailey, J. Alex Halderman, Vern Paxson. Network and Distributed Systems Security Symposium (NDSS) 2017.

In early 2017, researchers teamed up with Google, Mozilla, and Cloudflare to measure TLS interception in an internet-wide study. They observed that TLS interception can be detected from the server's point of view by identifying a mismatch between the TLS handshake a popular browser would send and the handshake actually observed. Going one step further, by examining the TLS handshakes of popular interception software they were able to construct fingerprints for some of the most widely used interception products.
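The fingerprinting idea can be sketched in a few lines. The data below is hypothetical and greatly simplified (the study matched full ClientHello details: cipher suites, extensions, supported curves, and compression methods), but it shows the shape of the heuristic: compare what the User-Agent claims against what the handshake actually offers.

# Hypothetical fingerprint data, for illustration only
EXPECTED_CIPHERS = {
    # browser family claimed in the User-Agent -> cipher suites it really offers
    "Firefox": ["TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
                "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"],
}
INTERCEPTOR_FINGERPRINTS = {
    # known interception products -> the cipher suites their proxies send
    "ExampleProxy 1.0": ["TLS_RSA_WITH_AES_128_CBC_SHA",
                         "TLS_RSA_WITH_RC4_128_SHA"],
}

def classify(browser_from_user_agent, observed_ciphers):
    """Return (intercepted, suspected_product) for one observed ClientHello."""
    if observed_ciphers == EXPECTED_CIPHERS.get(browser_from_user_agent):
        return False, None
    for product, fingerprint in INTERCEPTOR_FINGERPRINTS.items():
        if observed_ciphers == fingerprint:
            return True, product
    return True, "unknown interceptor"

print(classify("Firefox", ["TLS_RSA_WITH_AES_128_CBC_SHA",
                           "TLS_RSA_WITH_RC4_128_SHA"]))   # (True, 'ExampleProxy 1.0')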

The study measured interception from the vantage point of the Cloudflare CDN, the Firefox update servers, and popular e-commerce sites. The study found that roughly 5-10% of measured HTTPS connections were intercepted, and that much of the interception software reduced end-user security in one way or another, with 97%, 54%, and 32% of intercepted connections to the Firefox update servers, Cloudflare, and the e-commerce sites, respectively, becoming less secure. Interestingly, the only middlebox software to earn a grade of 'A' was BlueCoat Proxy.

Grades for middlebox interception
Source: The Security Impact of HTTPS Interception

Chrome 56 update breaks Bluecoat Proxy v6.5

Note these issues are always bugs in the middlebox products. TLS version negotiation is backwards compatible, so a correctly-implemented TLS-terminating proxy should not require changes to work in a TLS-1.3-capable ecosystem […] That these products broke is an indication of defects in their TLS implementations
David Benjamin, Chromium Bug Tracker (2017)

On 21 February 2017, shortly after the above paper was published, mishandling of TLS 1.3 connections by BlueCoat Proxy left thousands of clients without an internet connection after an automatic Chrome 56 update. The problem wasn't that BlueCoat Proxy didn't implement TLS 1.3, but that it didn't gracefully negotiate down to TLS 1.2, which it does support. Instead, the software simply terminated the incoming connection. This left tens of thousands of Chromebooks used by Montgomery County Public Schools students temporarily unable to connect to the internet. The temporary workaround was for individual users to alter Chrome's internal settings (chrome://flags/#ssl-version-max) to disable TLS 1.3, until a more general fix was delivered the following day by Chromium, which rolled back enabling TLS 1.3 by default.

Going Forward

Whether the cost is availability or end-user security, these incidents expose the fragility of TLS interception software. Google has reached out to middlebox vendors to help them resolve these issues, but system administrators should take the risks of TLS interception seriously. There are, however, situations where interception is necessary, such as when companies are legally required to monitor the traffic of their employees to comply with regulations (as in the financial industry). Vendors should at the same time strive to fix their products for the security of their users, and organizations that deploy TLS interception software should choose products in an informed manner and carefully weigh the risks the interception software itself introduces.

Grades for various client-side interception software
Source: The Security Impact of HTTPS Interception (2017)

Survey and Revised Schedule
David Evans, Wed, 8 Mar 2017

I hope everyone is enjoying Spring Break!

You can see the results of the survey here: Mid-Course Survey Results (I edited one response that mentioned another student since that wasn’t intended to be public, but otherwise it is everything that was submitted.)

Based on the survey responses, and the depletion of Team Poppyseed, I have made some adjustments to the teams and schedule. Team Sesame, Team Cinnamon, and Team Poppyseed have been turned into a Smoked Salmon Shmear, and reformed as Team Pineapple (which is Team Sesame and some additions from Poppyseed) and Team Mango. Teams should wrap up their remaining blogging responsibilities as they were (that is, Team Poppyseed is still responsible for completing the blog for Class 7, which hopefully is well underway already).

Team Pineapple:
Adam Imeson (aei5uj)
Bethlehem Naylor (bn9eb)
Bhuvanesh Murali (bm4cr)
Collin Berman (cmb5nh)
Daniel Saha (drs5ma)
Haina Li (hl3wb)
Joshua Holtzman (jmh2ba)
Reid Bixler (rmb3yz)
Tianyi Jin (tj2cw)

Team Mango:
Anant Kharkar (agk7uc)
Bargav Jayaraman (bj4nq)
Benjamin Lowman (brl2xx)
Bill Young (wty5dn)
Cyrus Malekpour (cm7bv)
Darion Cassel (dfc9ed)
Sam Havron (sgh7cc)
Yuchi Tian (yt8mn)

I don’t want to increase the presentation and blogging workload for the rest of the semester, though, especially since I want to ensure everyone has time to work on your project. So, each team will still be leading every third week, and blogging every third week, and we will do something else in the other classes.

Team Mango is scheduled to lead the next class on March 17 (this is what was assigned to Team Cinnamon, so not a new responsibility for most of the team, and the new recruits from Team Poppyseed should have limited roles expected for this first presentation since they just joined the reformulated team). Team Pineapple is responsible for Blogging and Food for the class. Note that according to the survey results, 73.4% of the class is sick of bagels. So, Team Pineapple is challenged to come up with something more interesting for breakfast.

The full revised schedule is below. Note that the team that is not leading the class is responsible for both Blogging and Food.

Date | Topic | Team Mango | Team Pineapple
Class 8 (17 Mar) | TBD | Lead | Blog, Food
Class 9 (24 Mar) | TBD | Blog, Food | Lead
Class 10 (31 Mar) | TBD | |
Class 11 (7 Apr) | TBD | Lead | Blog, Food
Class 12 (14 Apr) | TBD | Blog, Food | Lead
Class 13 (21 Apr) | TBD | |
Class 14 (28 Apr) | Mini-Conference | |

Timing Attacks
Team Cinnamon, Wed, 1 Mar 2017

Digression: Notable News

Two newsworthy events occurred this week that are of high importance and relevant to TLS in general: SHA-1 Collisions and the Cloudflare Leak. (Those are discussed in the separate linked posts.)

Introduction

Timing attacks exploit information leaked through timing side channels to learn private data. In this threat model, an attacker is able to observe the time required for certain parts of a program at runtime and thereby gain information about the execution path followed.

Consider the following snippet of Python code as an example:

def comp(a, b):
    if len(a) != len(b):
        return False
    for c1, c2 in zip(a, b):
        if c1 != c2:
            return False   # returns early at the first mismatching character
    return True

The comp() function performs string comparison, returning True if the strings are character-wise equal and False otherwise. Let us assume for the sake of simplicity that the lengths of the strings are public (we are not concerned with timing leaks from the initial length comparison).

Suppose we compare the strings "hello" and "catch". These words fail the string comparison immediately, since the first letters are different. Thus, the function returns False after a single iteration of the loop. In contrast, comparing "hello" with "hella" will require 5 iterations before returning False. An attacker can use the resulting timing delay to determine that "hello" and "hella" share beginning characters, whereas "hello" and "catch" do not. If "hello" was a password, an attacker who could measure the timing precisely enough to count loop iterations would be able to incrementally guess each letter of the string.

One solution to this problem would be to use a Boolean flag initialized to True. If the words have mismatched characters, the flag will be set to False, and the flag will be returned after comparing all characters. In theory, this masks the timing leak; in practice, a compiler may optimize such code to return early, reinstating the timing leak. Furthermore, there may still be timing leaks in the execution of the code, such as variation in the instruction cache depending on which statements are entered.
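For completeness, here is a sketch of the flag-based idea for byte strings, alongside the standard library helper that exists for exactly this purpose. As the paragraph above warns, even the flag version is only as constant-time as the interpreter and hardware allow.

import hmac

def comp_flagged(a: bytes, b: bytes) -> bool:
    # Accumulate differences instead of returning early (lengths still assumed public)
    if len(a) != len(b):
        return False
    diff = 0
    for c1, c2 in zip(a, b):
        diff |= c1 ^ c2
    return diff == 0

def comp_library(a: bytes, b: bytes) -> bool:
    # Library comparison designed to take time independent of where the inputs differ
    return hmac.compare_digest(a, b)

assert comp_flagged(b"hello", b"hello") and not comp_flagged(b"hello", b"hella")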

Thus, we see from this simple example that verifying constant-time implementation of code has many challenges, which we further discuss below.

Remote Timing Attacks are Practical

Remote Timing Attacks are Practical. David Brumley and Dan Boneh. Computer Networks, 48(5), 701-716, 2005.

At the heart of RSA decryption is a modular exponentiation \( m = c^d~mod~N\), where \(N = pq\) is the RSA modulus, \(d\) is the private decryption exponent, and \(c\) is the ciphertext being decrypted. OpenSSL uses the Chinese Remainder Theorem (CRT) to perform this exponentiation. With Chinese remaindering, the exponentiation \( m = c^d~mod~N\) is computed in two steps:

  1. Evaluate \( m_1 = c^{d_1} mod~p\) and \( m_2 = c^{d_2} mod~q\) (\(d_1\) and \(d_2\) are precomputed from \(d\)).

  2. Combine \(m_1\) and \(m_2\) using CRT to yield m.

Both steps could have timing side channels if not implemented carefully.
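A sketch of the two-step decryption with toy parameters (the recombination shown is Garner's formula, one common way of carrying out step 2; real keys are of course far larger):

# Toy RSA key, insecure sizes, for illustration only
p, q = 61, 53
N = p * q                              # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (pow(x, -1, m) needs Python 3.8+)

# Quantities precomputed from d, corresponding to d_1 and d_2 in the text
d_p = d % (p - 1)
d_q = d % (q - 1)
q_inv = pow(q, -1, p)

def decrypt_crt(c):
    m_1 = pow(c, d_p, p)               # step 1: two half-size exponentiations
    m_2 = pow(c, d_q, q)
    h = (q_inv * (m_1 - m_2)) % p      # step 2: recombine the halves (Garner's formula)
    return m_2 + h * q

assert decrypt_crt(pow(123, e, N)) == 123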

Chinese Remainder Theorem

Suppose that \(n_1,n_2,…,n_r\) are pairwise relatively prime positive integers, and let \(c_1,c_2,…,c_r\) be integers.

Then the system of congruences,

\(X \equiv c_1 (mod~n_1)\)
\(X \equiv c_2 (mod~n_2)\)

\(X \equiv c_r (mod~n_r)\)

has a unique solution modulo \(N = n_1n_2…n_r\)

Gauss’s Algorithm

\(X \equiv c_1N_1{N_1}^{-1} + c_2N_2{N_2}^{-1} + … + c_rN_r{N_r}^{-1}~(mod~N)\), where

\(N_i = N/n_i\)
\({N_i}{N_i}^{-1} \equiv 1~(mod~n_i)\)
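Gauss's algorithm translates directly into a few lines of Python; a quick sketch with a classic example:

from functools import reduce

def crt_gauss(residues, moduli):
    # X = sum of c_i * N_i * (N_i^{-1} mod n_i), reduced mod N
    N = reduce(lambda a, b: a * b, moduli)
    total = 0
    for c_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        total += c_i * N_i * pow(N_i, -1, n_i)   # pow(x, -1, m) needs Python 3.8+
    return total % N

# X = 2 (mod 3), X = 3 (mod 5), X = 2 (mod 7)  ->  X = 23 (mod 105)
assert crt_gauss([2, 3, 2], [3, 5, 7]) == 23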

Hasted’s Broadcast Attack

Hasted’s Broadcast Attack relies on cases when the public exponent \(e\) is small or when partial knowledge of the secret key is available. If \(e\) (public) is the same across different sites, the attacker can use Chinese Remainder Theorem and decrypt messages!

Hasted’s Broadcast Attack works as follows:

Montgomery Reduction

Montgomery Reduction is an algorithm that allows modular arithmetic to be performed efficiently when the modulus is large.

The reduction takes advantage of the fact that \(x~mod~2^n\) is easier to compute than \(x~mod~q\); the reduction simply strips off all but the \(n\) least significant bits.

Steps needed for the reduction (computing \(T R^{-1}~mod~q\) for \(R = 2^n\), odd modulus \(q\), \(0 \leq T < qR\), and precomputed \(q' = -q^{-1}~mod~R\)):

  1. \(m = (T~mod~R) \cdot q'~mod~R\), which requires only masking, since \(R\) is a power of two.

  2. \(t = (T + mq) / R\); the numerator is divisible by \(R\) by construction, so this is just a shift.

  3. If \(t \geq q\), subtract \(q\). The result is \(T R^{-1}~mod~q\).
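A direct transcription of those steps (toy parameters; real implementations work limb by limb on large integers):

def montgomery_reduce(T, q, n):
    # Returns T * R^{-1} mod q, for R = 2^n, odd modulus q, and 0 <= T < q*R
    R = 1 << n
    q_prime = (-pow(q, -1, R)) % R              # precomputable; pow(x, -1, m) needs Python 3.8+
    m = ((T & (R - 1)) * q_prime) & (R - 1)     # step 1: (T mod R) * q' mod R, masks only
    t = (T + m * q) >> n                        # step 2: exact division by R (a shift)
    return t - q if t >= q else t               # step 3: one conditional subtraction

q, n = 97, 8                                    # R = 256 > q
R = 1 << n
T = 5000                                        # any value with 0 <= T < q*R
assert montgomery_reduce(T, q, n) == (T * pow(R, -1, q)) % q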


Protections against timing attacks

There are numerous defenses against timing attacks, but known defenses are either expensive or only provide partial mitigation.
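One standard countermeasure for RSA, and the one recommended by Brumley and Boneh, is blinding: randomize the value being exponentiated so the decryption time no longer correlates with the attacker-chosen ciphertext, at the cost of some extra computation per decryption. A minimal sketch with toy parameters:

import secrets
from math import gcd

def rsa_decrypt_blinded(c, d, e, N):
    # Pick a random blinding factor r coprime to N (retry in the unlikely bad case)
    r = secrets.randbelow(N - 2) + 2
    while gcd(r, N) != 1:
        r = secrets.randbelow(N - 2) + 2
    c_blind = (pow(r, e, N) * c) % N       # blind: c' = r^e * c mod N
    m_blind = pow(c_blind, d, N)           # timing now depends on r, not just on c
    return (m_blind * pow(r, -1, N)) % N   # unblind: m = m' * r^{-1} mod N

# Check with a toy key (not secure parameters)
N, e, d = 3233, 17, 2753
assert rsa_decrypt_blinded(pow(123, e, N), d, e, N) == 123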

Remote Timing Attacks are Still Practical

Billy Bob Brumley and Nicola Tuveri. “Remote timing attacks are still practical.” European Symposium on Research in Computer Security. Springer Berlin Heidelberg, 2011.

Timing attacks target cryptosystems and protocols that do not run in constant time. Implementations of elliptic curve signature schemes aim to provide side-channel resistance against timing attacks; for instance, scalar multiplication is performed via Montgomery's ladder, which executes a fixed sequence of field operations for every bit of the scalar. Brumley and Tuveri reveal a timing attack vulnerability in OpenSSL's implementation of Montgomery's ladder that leaks the server's private key.

What is Montgomery’s Ladder?

Consider the right-to-left square-and-multiply algorithm to compute an exponentiation operation:

Montgomery ladder
Right-to-Left Square-and-Multiply Algorithm
Source: https://cr.yp.to/bib/2003/joye-ladder.pdf

The above algorithm performs more operations when a bit is set, opening the door to a timing attack. Montgomery's power ladder method, on the other hand, performs the same number of operations in both cases. This prevents timing-based side-channel attacks and also makes the algorithm parallelizable, since the two updates in each step are independent. The algorithm is as below:

Montgomery ladder
Montgomery’s Ladder
Source: https://cr.yp.to/bib/2003/joye-ladder.pdf
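A Python transcription of the two algorithms in the figures, using modular exponentiation in place of elliptic-curve scalar multiplication (the control flow, which is what a timing attack observes, has the same shape):

def square_and_multiply(x, k, p):
    # Right-to-left square-and-multiply: the extra multiplication happens only
    # when a bit of k is set, so running time depends on the bits of k
    r = 1
    while k:
        if k & 1:
            r = (r * x) % p
        x = (x * x) % p
        k >>= 1
    return r

def montgomery_ladder(x, k, p):
    # Montgomery's ladder: one multiplication and one squaring per bit,
    # whatever the bit's value. Note that looping over exactly k.bit_length()
    # bits still reveals the *length* of k, which is precisely the leak in
    # OpenSSL's implementation described below.
    r0, r1 = 1, x
    for i in reversed(range(k.bit_length())):
        if (k >> i) & 1:
            r0, r1 = (r0 * r1) % p, (r1 * r1) % p
        else:
            r1, r0 = (r0 * r1) % p, (r0 * r0) % p
    return r0

assert montgomery_ladder(3, 5, 7) == square_and_multiply(3, 5, 7) == pow(3, 5, 7)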

OpenSSL’s implementation of Montgomery’s Ladder

OpenSSL implements the Elliptic Curve Digital Signature Algorithm (ECDSA), which a TLS server can use to sign its handshake messages and authenticate itself to clients. For binary fields, the elliptic curves used have the following form:

ECC
NIST recommends two standard families of curves: 1. set \(a_2 = 1\) and choose \(a_6\) pseudo-randomly, or 2. choose \(a_2\) from \(\{0,1\}\) and set \(a_6 = 1\). Either family can be used for digital signatures.

Parties select a private key \(d\) with \(0 < d < n\) and a public key \([d]G\), and then generate digital signatures using elliptic curves as follows:

digital

OpenSSL uses Montogmery’s ladder to compute the above digital signatures since it requires multiple exponentiation operations. However, OpenSSL’s implementation has a flaw that leads to timing attack. Below is OpenSSL’s implementation:

OpenSSL
OpenSSL’s implementation of Montgomery’s Ladder
Source: https://gnunet.org/sites/default/files/Brumley%20%26%20Tuveri%20-%20Timing%20Attacks.pdf
As indicated in the third line of the code, OpenSSL sets the number of ladder iterations based on the bit length of k, and therefore leaks information about the position of the most significant set bit of k. Since the time taken to compute the scalar multiplication is proportional to that bit length, a remote attacker can tell when the random nonce k has leading zero bits. The attacker collects many signatures, uses the timing measurements to keep those generated with a nonce that has leading zero bits, and then launches a lattice attack on the collected signatures to recover the TLS server's private ECDSA signing key.

Countermeasure

A possible countermeasure as proposed by Brumley and Tuveri is to pad the scalar \(k\):

counter
Countermeasure to OpenSSL’s flaw
Source: https://gnunet.org/sites/default/files/Brumley%20%26%20Tuveri%20-%20Timing%20Attacks.pdf
This ensures that the scalar actually used always has the same bit length, so the number of ladder iterations no longer depends on \(k\) and no side-channel information is leaked. Moreover, the modification adds essentially no computational overhead.
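A sketch of the idea behind the padding: on a curve whose group has order \(n\), \([k]G = [k + n]G\), so an equivalent scalar of fixed bit length can always be used.

def pad_scalar(k, n):
    # Return k + n or k + 2n so that the scalar actually used always has
    # bit length exactly n.bit_length() + 1; the computed point is unchanged
    k_hat = k + n
    if k_hat.bit_length() == n.bit_length():
        k_hat += n
    return k_hat

# Any 0 < k < n now drives the same number of ladder iterations
n = (1 << 192) - 237          # stand-in for a 192-bit group order, illustration only
for k in (1, 12345, n - 1):
    assert pad_scalar(k, n).bit_length() == n.bit_length() + 1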

Sources

Marc Joye and Sung-Ming Yen. “The Montgomery powering ladder.” International Workshop on Cryptographic Hardware and Embedded Systems. Springer Berlin Heidelberg, 2002.

Billy Bob Brumley and Nicola Tuveri. “Remote timing attacks are still practical.” European Symposium on Research in Computer Security. Springer Berlin Heidelberg, 2011.

Howgrave-Graham, Nick A., and Nigel P. Smart. “Lattice attacks on digital signature schemes.” Designs, Codes and Cryptography 23.3 (2001): 283-290.

Cache Timing Attacks

At this point, it seems as though we’ve seen everything—there couldn’t possibly be another side-channel attack, right?

Wrong.

Timing attacks can also leverage the CPU cache as a side channel. Since the issue stems from hardware design, it is difficult for application developers to address; the behaviors that influence cache access patterns are proprietary features hidden away inside today's processors.

Intel CPU Cache

In cache terminology, hits occur when queried data is present in the cache and misses occur when data must be fetched from main memory. Consider the L1 and L2 caches of the Intel Sandy Bridge processor. Both are 8-way set associative caches consisting of sets with 64-byte lines (64 sets in L1, 512 sets in L2, for a total of 32KB and 256KB in storage, respectively).

For those unfamiliar with computer architecture, addresses of information in the cache are split into three components: tag, set, and offset. An address looks something like this:

1111 0000 1111 0000 1111        000011       110000
Tag                              Set         Offset
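With 6 offset bits and 6 set-index bits, the split is just masking and shifting; a quick sketch:

OFFSET_BITS, SET_BITS = 6, 6          # 64-byte lines, 64 sets (Sandy Bridge L1)

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, index, offset

addr = 0b1111_0000_1111_0000_1111_000011_110000
print(split_address(addr))            # tag = 0xF0F0F, set index = 3, offset = 48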

Now that we’ve reviewed the memory hierarchy, let’s take a look at some attacks that use variations in cache timing and operation to their advantage.

PRIME+PROBE

The PRIME+PROBE attack is carried out by filling the victim's cache with attacker-controlled data. As the victim carries out normal tasks on their machine, some of the attacker's data is evicted from the cache. All the while, the attacker monitors the cache contents, keeping careful track of which of its cache lines were evicted. This reveals which cache sets the victim touched, giving the attacker insight into the victim's memory access patterns and hence its activities.

EVICT+TIME

The EVICT+TIME attack is carried out by evicting a single line of an AES lookup table from the cache, so that all of the AES lookup tables are cached except for that one line. The attacker then triggers the encryption process. As you might imagine, if the encryption accesses the partially evicted lookup table, it takes longer to complete. By timing exactly how long encryption takes, the attacker can determine which indices of which tables were accessed. Because table lookups depend on the AES encryption key, the attacker thus gains knowledge about the key.

Cache Games

The Cache Games attack targets AES-128 in OpenSSL v0.9.8 and is capable of recovering the full secret key. In the attack, a non-privileged spy process conducts a 2.8-second observation of approximately 100 AES encryptions (1.56KB of data) and then performs 3 minutes of computation on a separate machine. The spy process is able to abuse the default Linux scheduler, unfortunately named the Completely Fair Scheduler, to monitor cache accesses with extremely fine precision, thus gaining information about the key.

In order to abuse the CFS, the spy process creates hundreds of threads that immediately block. After a few cycles, the first thread awakens, checks for memory accesses by the target code, and then signals for the next thread to awaken. This continues for all threads. To filter out noise, Cache Games uses an ANN that takes bitmaps of activations and outputs points with a high probability of access.

Cache Games ANN
An ANN takes bitmaps of activations and outputs
points with high probability of access

The nature of the AES encryption process—consisting of 10 rounds of 16 memory accesses—allows the spy process to construct a list of partial-key candidates that is continually refined as more encryptions are repeated. The CFS maintains processes in a red-black tree and associates with each process a total runtime. The scheduler calculates a “max execution time” for each process by dividing the total time it has been waiting by the number of processes in the tree. Whichever process minimizes this value is selected to run next.

Mitigation

CacheBleed

Yuval Yarom, Daniel Genkin, and Nadia Heninger. CacheBleed: A Timing Attack on OpenSSL Constant Time RSA. CHES 2016.

In keeping with the trend of affixing "-bleed" to information-leakage vulnerabilities, CacheBleed is a recent (c. 2016) attack on RSA decryption as implemented in OpenSSL v1.0.2 on Intel Sandy Bridge processors. The attack exploits timing differences caused by cache-bank conflicts to glean information about the multipliers used during RSA decryption.

Cache banks were a new feature in Sandy Bridge processors, designed to allow multiple accesses to the same cache line to be served in the same cycle. As it turns out, each cache bank can only serve one operation at a time, so if a cache-bank conflict occurs, one request is delayed!

In order to carry out the attack, the attacker and victim run on the same hyperthreaded core, sharing the L1 cache. The attacker issues a huge number of requests to a single cache bank. By carefully measuring how many cycles the requests take to complete, the attacker can discern whether the victim accessed that cache bank at some point. After many queries, the attack succeeds in extracting both 2048-bit and 4096-bit secret keys.

Mitigation

The upside to CacheBleed is that it is highly complicated, requiring the attacker to share a hyperthreaded core with the RSA decryption it targets - hardly an everyday scenario. In any case, there are other pieces of "low-hanging fruit" in computer systems that attackers are more likely to target. Nonetheless, Haswell processors changed the cache-bank design so that these conflicts no longer occur. The only other mitigation is to disable hyperthreading entirely.

Cloudflare Leak
Team Cinnamon, Tue, 28 Feb 2017

The Cloudflare Leak

How the Incident Developed

Cloudflare is an Internet infrastructure company that provides security and performance services to millions of websites. On February 17th, 2017, Tavis Ormandy, a security researcher from Google's Project Zero, noticed that some HTTP requests running through Cloudflare were returning corrupted web pages.

Tavis Ormandy's Tweet
https://twitter.com/taviso/status/832744397800214528

The problem Tavis noticed was that, under certain circumstances, when the Cloudflare "edge servers" returned a web page, they ran past the end of a buffer and appended dumps of memory to the HTML, memory that contained information such as auth tokens, HTTP POST bodies, HTTP cookies, and other private information [1]. To make matters worse, search engines both indexed and cached this data, so that for a while it was searchable.

Example Leaked Data
http://pastebin.com/AKEFci31

Since the discovery of the bug, Cloudflare has worked with Google and other search engines to remove the affected cached pages.

The Impact

Data could have been leaking as early as September 22nd, but Cloudflare reported that the period of highest impact was from February 13th through February 18th, with around 1 in every 3,300,000 HTTP requests having potential memory leakage [1]. It is difficult to assess how much data was leaked, especially since the corrupted results and their cached versions were quickly removed from search engines, but Wired reported that data from large companies such as Fitbit, Uber, and OKCupid was found in a sample of the affected, corrupted pages.

Cloudflare asserts that the leak did not reveal any private keys, and even though other sensitive information was leaked, it did not appear in the HTML content of particularly high-traffic sites, so the damage was mitigated.

Overall, about 3000 customer’s sites triggered the bug, but, as previously noted, the data leaked could have come from any other Cloudflare customer. Cloudflare is aware of about 150 customers who were affected in that way.

The Bug Itself

As mentioned earlier, the problem resulted from a buffer being overrun and thus additional data from memory being written to the HTML of web pages. But how did this happen, and why did it happen now?

Some of Cloudflare’s services rely on modifying, or “rewriting,” HTML pages as they are routed through the edge servers. In order to do this rewriting, Cloudflare reads and parses the HTML to find elements that require changing. Cloudflare used to use a HTML parser written using a project called Ragel that converts a description of a regular language into a finite state machine. However, about a year ago they decided that the Ragel-based parser was a source of technical debt and wrote a new parser called cf-html to replace it [1].

Cloudflare first rolled this new parser out for their Automatic HTTPS Rewrites service and have since been slowly migrating other services away from the Ragel-based parser. In order to use these parsers, Cloudflare adds them as a module to NGINX, a load balancer [1].

As it turned out, the parser written with Ragel had actually had a bug in it for several years, but there was no memory leak because of the particular configuration of the internal NGINX buffers. When cf-html was adopted, the buffers changed slightly, enabling the leak.

The actual bug was caused by what you might expect: a pointer error in the C code generated by Ragel (but the bug was not the fault of Ragel).

C Code Bug
https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/

As can be guessed from this snippet, the cause of the bug was that the check for the end of the buffer was done using the equality operator, ==, instead of >=, which would have caught the overrun. That snippet is the generated code. Let's look at the code that generated it.

In order to check for the end of the buffer when parsing a <script> tag, this piece of code was used:

Ragel Bug Code
https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/

What it means is that in order to parse the end of the tag, zero or more unquoted_attr_char are parsed followed by whitespace, /, or > signifying the end of the tag. If there is nothing wrong with the script tag, the parser will move to the code in the @{ }. If there is a problem, the parser will move to the $lerr{ } section.

The bug was triggered if a web page ended with a malformed HTML tag such as <script type=. The parser would transition to dd("script consume_attr failed"), which just prints debug output, but then instead of failing, it transitions to fgoto script_consume_attr;, which means it tries to parse another attribute.

Notice that the @{ } block has an fhold while the $lerr{ } block does not. It was the lack of the fhold in the second block that caused the leak. In the generated code, fhold is equivalent to p--, so if the malformed-tag error happens at the end of the buffer, p ends up past the end of the document, the equality check for the end of the buffer never fires, and p overruns the buffer.
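A much-simplified model of that failure mode, in Python rather than Ragel-generated C (the strings here are made up): once the parser steps past the end of the document without holding position, the equality check never fires and whatever sits after the document in memory gets emitted.

document = "<html><script type="                  # the page ends in a truncated tag
adjacent_memory = "AUTH_TOKEN=abc123; other tenants' data"
buffer = document + adjacent_memory               # what happens to sit after the page

p, pe = 0, len(document)                          # pe marks the intended end of the parse
emitted = []
while p != pe and p < len(buffer):                # a '>= pe' check would stop the overrun
    ch = buffer[p]
    emitted.append(ch)
    # Model of the missing fhold: on an '=' the state machine steps forward
    # again instead of holding, so at the truncated tag p jumps over pe
    p += 2 if ch == "=" else 1

print("".join(emitted))                           # includes bytes from adjacent_memory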

The Response From Cloudflare

Cloudflare seems to have responded relatively well to this bug. After the bug was brought to their attention, they performed an initial mitigation, disabling all of the services that used cf-html, within 47 minutes. Luckily, Cloudflare has a 'global kill' [1] feature that enables the global disabling of any service. Since Email Obfuscation was the main cause of the leak, it was disabled first, and Automatic HTTPS Rewrites was disabled about 3 hours later. About 7 hours after the bug was detected, a fix was deployed globally. As mentioned previously, Cloudflare also contacted search engines to get the affected pages and their cached versions removed from the web.

What was less impressive about Cloudflare's response were the lessons they said they learned in their incident report. Essentially, the lessons learned amounted to saying the bug was a corner case in an "ancient piece of software" and that they will be looking to "fuzz older software" to find other potential problems [1]. We hope Cloudflare will take more time to examine this incident seriously and consider the systemic issues that allowed such a problem to occur and persist.

Further Thoughts

Bugs like this expose the difficulty of ensuring software correctness. It is quite unlikely that a corner case like this would have been caught by human eyes, and even a fuzzer would have had to trigger some exceptional conditions to expose the bug. On the other hand, many tools and processes exist for detecting these types of problems, and there is little excuse for not using them on security-critical software.

Reference

[1] https://blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/

SHA-1 Collisions
Team Cinnamon, Tue, 28 Feb 2017

The First SHA-1 Collision

On February 23rd, 2017, researchers from Google and CWI Amsterdam announced the first SHA-1 collision. As proof of this claim, two PDFs were published that yield the same SHA-1 hash despite containing different content (PDF 1, PDF 2).

PDF Collision
https://shattered.it/static/shattered.png

Weaknesses in SHA-1 had been known since the work of Xiaoyun Wang and colleagues in 2004, but this is the first known attack to produce an actual SHA-1 collision. Although SHA-1 was deprecated by NIST in 2011, many systems still use it extensively (git, SVN, even some certificate authorities). The researchers argue that these findings should reinforce the need to move to more secure hashing algorithms:

We hope that our practical attack against SHA-1 will finally convince the industry that it is urgent to move to safer alternatives such as SHA-256.

Attack Details

SHA-1 takes an arbitrary-length message and computes a 160-bit hash. It divides the (padded) input into \(k\) blocks \(M_1, M_2, …, M_k\) of 512 bits each. The 160-bit internal state \(CV_j\), called the chaining value, is initialized to some initial value \(CV_0 = IV\). Then, each block \(M_j\) is fed to a compression function \(h\) that updates the chaining value: \(CV_{j+1} = h(CV_j, M_{j+1})\). \(CV_k\) is the output of the hash.

The attack implements the best known theoretical collision attack outlined by Stevens (2013) (one of the leaders of this effort). This attack is an identical-prefix collision attack, where a given prefix \(P\) is extended with two distinct near collision block pairs such that they collide for any suffix \(S\):

$$ \text{SHA-1}(P||M_1^{(1)}||M_2^{(1)}||S) = \text{SHA-1}(P||M_1^{(2)}||M_2^{(2)}||S) $$

Finding both the first and second near collision block pairs, (\(M_1^{(1)}, M_1^{(2)}\)) and (\(M_2^{(1)}, M_2^{(2)}\)), respectively, was completed using slightly modified algorithms from Stevens’ work. Broadly speaking, differences in the first block pair cause a small difference in the output chaining value, which is “canceled” by the difference in the second block pair. The remaining identical suffixes ensure a collision. Differential paths are leveraged as a precise description of the differences in block pairs and how these differences evolve through the hashing steps. This description is the foundation of a search over the possible block pairs. Note that once the collision block pairs are found for a particular prefix, any number of colliding inputs can be found since S can be anything.
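These properties are easy to check with the published files (file names assumed to be those of the two colliding PDFs released alongside the paper). Because the two inputs have the same length and SHA-1 is computed block by block, the collision survives any identical suffix, while SHA-256 confirms the files really differ:

import hashlib

pdf1 = open("shattered-1.pdf", "rb").read()
pdf2 = open("shattered-2.pdf", "rb").read()

print(hashlib.sha1(pdf1).hexdigest() == hashlib.sha1(pdf2).hexdigest())      # True
print(hashlib.sha256(pdf1).hexdigest() == hashlib.sha256(pdf2).hexdigest())  # False: different content

suffix = b"arbitrary trailing data"
print(hashlib.sha1(pdf1 + suffix).hexdigest() ==
      hashlib.sha1(pdf2 + suffix).hexdigest())                               # still True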

The PDF format is exploited by packaging the differing collision blocks into an embedded JPEG image. In the example collision, the differing blocks are aligned such that the backgrounds of the two PDFs differ.

PDF Encoding
https://shattered.it/static/pdf_format.png

A significant contribution of this work is to apply these algorithms at the scale necessary for practical execution. While the source code for these computations has not yet been released (the authors are allowing a grace period to move to modern hashing algorithms), the changes required to scale this attack are highly non-trivial. Combined, the computations required approximately 6500 CPU years and 100 GPU years. At the time of publishing, the authors estimate the total cost of their attack (via AWS) at $110,000, easily within the reach of criminals. This attack is estimated to be approximately 100,000 times faster than a brute force search.

Full technical details of the attack are outlined in the released paper: Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, Yarik Markov. The first collision for full SHA-1. (Released 23 February, 2017)

Testing and Verification of TLS
Team Sesame, Fri, 24 Feb 2017

Introduction

TLS/SSL is the most widely adopted protocol for securing online communication. However, as we have seen in the past few weeks, creative attackers have found it riddled with exploitable weaknesses. Rather than just reacting to attacks as they are discovered, many projects instead proactively seek out potential security flaws to implement remedies before vulnerabilities are exploited. This is done primarily by testing and verification, the topic of this week’s blog post.

First, we will examine some motivating attacks on TLS implementations, including the Heartbleed attack, CRIME attack, and the infamous “goto fail”, as well as their solutions. Next, we will discuss differential testing, a technique that creates well-formed but forged certificates, called “frankencerts”, and uses them to compare the responses of many popular implementations such as OpenSSL - and how they strengthened their defenses afterward (or didn’t). Finally, we introduce verification, which takes advantage of the relationship between computer systems and mathematics to rigorously prove properties about programs, either by type-checking existing programs or building a program from scratch starting with abstract refinement types.

Heartbleed

Heartbleed by Codenomicon (2014)

OpenSSL’s heartbleed bug exploits heartbeat requests between a client and a server. In the OpenSSL implementation, the client does not check the actual lengths of the message but trusts the length field in the request message. If the real message is shorter than the length specified in the length field, then the payload returned in the response heartbleed can also contain what happened to be in the allocated memory buffer. In this manner, secret keys, login credentials, sensitive data, and miscellaneous memory are all at risk of exposure.

Heartbleed
Source: Wikipedia
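A toy simulation of the pattern (plain Python, nothing to do with OpenSSL's actual code): the responder stores the request payload next to unrelated data and then trusts the attacker-supplied length when building the reply.

process_memory = bytearray(64)
process_memory[16:49] = b"SECRET_KEY=hunter2;session=abc123"   # unrelated data nearby

def handle_heartbeat(payload: bytes, claimed_length: int) -> bytes:
    process_memory[0:len(payload)] = payload          # store the real 4-byte payload
    return bytes(process_memory[0:claimed_length])    # echo back claimed_length bytes

# The request carries a 4-byte payload but claims it is 40 bytes long
print(handle_heartbeat(b"ping", 40))   # b'ping\x00...SECRET_KEY=hunter2;sessi'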

Compression Ratio Info-Leak Mass Exploitation (CRIME)

The CRIME Attack by Juliano Rizzo and Thai Duong (2013)

CRIME attacks, developed by Juliano Rizzo and Thai Duong in 2012, exploit TLS compression by injecting chosen plaintext into a victim's web requests and observing the resulting compressed length. Tampering with web requests is made possible by injecting malicious JavaScript code through a network attack. Through many trials, the attacker can recover the user's cookies and hijack the session.

CRIME
Source: Infosec Institute
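The underlying length side channel is easy to demonstrate with a compressor (a toy illustration; a real attack recovers the cookie one byte at a time and needs careful padding and repetition to make the small length differences reliable):

import zlib

SECRET_COOKIE = "sessionid=7f3a9c"   # the value the attacker wants to recover

def observed_length(attacker_controlled: str) -> int:
    # The attacker sees only the length of the compressed (then encrypted) request
    body = f"GET /?q={attacker_controlled} HTTP/1.1\r\nCookie: {SECRET_COOKIE}\r\n"
    return len(zlib.compress(body.encode()))

# A guess that duplicates the secret compresses better than one of the same
# length that does not
print(observed_length("Cookie: sessionid=7f3a9c"))   # shorter
print(observed_length("Cookie: ABCDEFGHI=XYZWVU"))   # longer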

Goto Fail;

Understanding Apple ‘goto fail’ Vulnerability by Amit Sethi (2014)

In February 2014, Apple released a security update for many versions of its operating systems to fix the following vulnerability in the function SSLVerifySignedServerKeyExchange.

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
  … other checks …
  fail:
    … buffer frees (cleanups) …
    return err;
Source: David A. Wheeler

The indentation of the second goto fail; is misleading. The lack of curly brackets means that the second goto fail; is always executed, skipping the vital signature check and accepting both good and bad signatures.

Differential Testing

Using Frankencerts for Automated Adversarial Testing of Certificate Validation by Chad Brubaker, Suman Jana, Baishakhi Ray, Sarfraz Khurshid, and Vitaly Shmatikov (2014)
An empirical study of goto in C code from GitHub repositories by Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Éric Tanter, Shane McIntosh, Audris Mockus, and Ahmed E. Hassan (2015)

Differential testing is the process of searching for bugs in software by running multiple programs on the same inputs. If there is a discrepancy between one program’s results on a given input and another’s, it’s likely that one of the implementations is bugged. A differential testing system will flag such results for human review.

Brubaker et al. used this technique to test server authentication for many implementations of SSL/TLS as presented in their 2014 “Frankencerts” paper. The researchers initially had to deal with the dual problems of generating test case certificates and successfully interpreting the results of acceptance and rejection.

Generating certificates proved to be a challenge due to the nature of an SSL certificate. Simply randomly fuzzing valid certificates is unlikely to produce data parsable as a certificate. Manually creating test cases would take too long and potentially miss edge cases that a more statistically comprehensive generation method would find. The researchers decided on creating frankencerts by scanning 243,246 current certificates from the internet and randomly permuting their parts. This process guaranteed that the 8,127,600 frankencerts would be parsable as certificates and resulted in a wide variety of unusual combinations of extensions, key usage constraints, odd CAs, and other possible variations within a cert.

The researchers then ran OpenSSL, NSS, GnuTLS, CyaSSL, PolarSSL, MatrixSSL, OpenJDK, and Bouncy Castle on the frankencerts, looking for discrepancies between results. Because each of these libraries is intended to implement the same X.509 certificate validation logic, any discrepancy in certificate acceptance between them would indicate that SOMEONE was screwing up. Thus the researchers dealt with the second issue, that of interpreting the results of their tests.

Running the differential test on all 8 million frankencerts and the quarter-million real certificates produced 208 discrepancies that, when manually investigated, uncovered many serious flaws in the various implementations (error results are bolded):

Differential Testing Results
Source: Chad Brubaker, Suman Jana, Baishakhi Ray, Sarfraz Khurshid, and Vitaly Shmatikov

Any of the invalid acceptances would allow for an in-the-middle attack to be crafted against the implementation in question. The researchers contacted the developers of the various SSL/TLS implementations affected and reported their results before publishing, allowing the developers time to fix their bugs.

Chen and Su’s mucert builds on the frankencert method. Mucerts are randomly fuzzed test certificates generated from a seed set of certificates. The fuzzing, or mutating, occurs in accordance with a number of guidelines that prevent mutations from generating unparseable certificates. Chen and Su statistically evaluated mucerts and compared them to frankencerts, finding that mucerts attained drastically higher levels of corner case coverage.

Verification

Verifying s2n HMAC with SAW by Joey Dodds (2016)
Implementing TLS with Verified Cryptographic Security by Karthikeyan Bhargavan, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti,and Pierre-Yves Strub (2013)
Software Foundations by Benjamin C. Pierce, et al. (2017)

Software testing can only assert properties about program behaviors covered in the test suite. There will always be missed edge cases and odd bugs that will show up in code. This is where we turn to formal specifications to ‘prove’ the correctness of our code.

But how do we go about 'proving' this correctness? Let's look at what we're trying to solve in the first place:

For each behavior X in the code, formally prove that X does what it should and nothing else; for those familiar with proofs and mathematical abstractions, the "nothing else" part amounts to a non-existence proof. In formal verification, we must prove that a program satisfies a formal specification of its behavior. By using the methods of mathematics, we can make strong statements regarding the correctness of our cryptographic implementations, for example of TLS.

Before getting to formal verification, we must first distinguish verification from validation. Validation asks questions like "Does it meet the requirements?", "Is it the right product?", and "Does it function according to design?", whereas verification asks, "Does this software meet a given specification under given assumptions?"

One requirement for nearly any stronger property is to first show that the code satisfies type safety. A program is considered type safe if and only if the only operations that can be performed on data are those sanctioned by the data's type. For example, consider a program in which integers, floats, and doubles are implicitly cast among one another. In a type-safe program, a value of one type can never be misinterpreted as a value of another (most importantly, a pointer type would never be treated as a non-pointer, and vice versa).

While the below program is well-typed, it illustrates some of the challenges faced when attempting to show type safety for typical C code:

Type Safety of Code

There are a variety of formal concepts that are used in program verification including model checking, deductive verification, equivalence checking, theorem proving, and correctness by construction. The problem that arises, though, is that some of these rely on concepts that simply are not feasible in commercial-level programs. For example, model checking exhaustively checks whether a model of the program meets a given specification, but every behavior in the program must have proper transitions and states resulting in the state explosion problem. Put simply, as more behaviors are added to simple programs, the number of states in the model exponentially grows. Due to this, it is challenging to scale model checking to large systems, although extensive work has been done toward this goal and often complex programs can be abstracted in ways that enable useful model checking.

One popular approach to verification is to use expressive type systems. Through the Curry-Howard isomorphism, we can directly relate programs with mathematical proofs, and therefore have the ability to create refinement types: types endowed with a predicate that is assumed to hold for any element of the refined type. This means we can use refinement types to encode the safety properties of the system; type-checking the program then establishes that those safety properties hold.

There’s a lot that can be done with formal verification and TLS is a great possible use for fixing errors created during implementation. There are actually a number of projects for verifying parts of TLS out there that are attempting to combine together to formally verify TLS as a whole. This joint project is known as Project Everest. Everest is a pretty lofty goal considering the difficulty of formally verifying even small-level programs and scripts, but has made considerable progress towards building a fully verified TLS implementation. The eventual goal of Everest is that when all the projects are combined together, they will generate a C library that not only implements TLS 1.3 but is also proven secure, essentially circumventing any possible flaws in the TLS protocol.

Coq

Team Cinnamon demonstrated a formal verification tool called Coq during class. Coq is an assistive tool (a proof assistant) for the verification of code. The software implements a variety of mathematical and computational strategies to work, in combination with the user, to formally verify code, functions, and theorems. In class we ran through a couple of sample proofs to show the functionality and potential of Coq as a verification tool and to give an example of how formal verification software works.

A download for Coq is available at https://coq.inria.fr/, which offers different versions for different architectures. We used CoqIDE, which allows editing Coq files with helpful syntax highlighting and is useful when first learning Coq. For a good introduction to Coq, along with a number of exercises, a group at UPenn has a great lab focused on Coq.

Sample Coq Code
Source: The Coq Proof Assistant

Coq is an extremely powerful tool for formally verifying computer programs, but it can be difficult to learn, requiring a change in mentality for most programmers. For any problems or difficulties encountered when using Coq, there is plenty of documentation available in Coq's Reference Manual.

Conclusion

Implementing cryptographic protocols correctly remains a huge challenge. There are tools available now that can prove interesting properties about software, including even the absence of certain types of side channels. It is important to remember, though, that anything we prove about a program is a proof about a property based on a model of the execution environment, and assumptions about adversary capabilities. As verification and testing tools get more sophisticated, those models can come increasingly close to the actual environment and capabilities of real adversaries, but today, there remains a large gap.