Internet-Draft | HashedElision | February 2024 |
Appelcline, et al. | Expires 4 August 2024 | [Page] |
This document discusses the privacy and human rights benefits of data minimization via the methodology of hashed data elision and how it can help protocols to fulfill the guidelines of RFC 6973: Privacy Considerations for Internet Protocols and RFC 8280: Research into Human Rights Protocol Considerations. Additional details discuss how the extant Gordian Envelope draft can provide further benefits in these categories.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-appelcline-hashed-elision/.¶
Source for this draft and an issue tracker can be found at https://github.com/BlockchainCommons/WIPs-IETF-draft-hashed-elision.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 August 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
IETF released guidelines for privacy considerations in 2013 with [RFC6973] and then expanded upon that with human-rights considerations in 2017 with [RFC8280]. Both RFCs provide thoughtful ideas for how privacy can be improved in internet protocols, and how that can support human rights on the internet.¶
However, as generalized guidelines these RFCs don’t provide the specifics that might be required to incorporate these guidelines into new protocols. This leads to privacy threats such as correlation, secondary use, and unnecessary disclosure of data. This document suggests more specific areas of work based in part on the Data Minimization suggestions of §6.1 of RFC 6973, and expands them to also support some of the Human Rights Guidelines outlined in §6.2 of RFC 8280. It does so through the advancement of a hashed data elision methodology, which allows for the optional removal of data while maintaining hashes of that data to ensure data integrity.¶
Digital data transmission often operates on an all-or-nothing basis: sharing data means full disclosure. There is no standard methodology for minimizing data nor for eliding parts of a data packet. Releasing large packets of data, much of it unnecessary, can threaten privacy in multiple ways:¶
Methodologies for minimizing the amount of data shared at any one time can reduce all of these privacy dangers.¶
§6.1 of RFC 6973 lists anonymity and pseudonymity as two methodologies for creating Data Minimization. This means removing uniquely identifying data and/or reducing the amount of personal data that is transmitted.¶
Though anonymity and pseudonymity are minimal requirements for improving the privacy of digital data, they are insufficient, as data that is pseudonymous or anonymous can still be correlated. Best addressing privacy requires reducing the data found in any disclosure to the bare minimum required for that specific disclosure.¶
Data Minimization focuses on cutting out content that is not required for a specific task. Though it is a general requirement for improving privacy, applying it in a simplistic manner is not sufficient.¶
This is because simplistic Data Minimization excises data completely, leaving no trace of what was removed, which can cause problems for the Integrity and potentially the Authenticity of the original data set. Both are needed per the Guidelines for Human Rights, as outlined in §6.2.16 and §6.2.17 of RFC 8280.¶
A better solution for Data Minimization would not ignore other Human Rights needs as it improves privacy. Hashed data elision, which preserves hashes for data that has been cut, can provide such a solution.¶
A more nuanced Data Minimization system can solve many problems. However, there are also situations where a party doesn’t need to know any specific data, but instead requires proof that a general data precept is true. The traditional example is proof that someone is 21 or older, for buying alcohol in the United States. With Data Minimization alone, the person's precise age would still be provided, even though all that's actually needed is an affirmation that the person was born at least 21 years ago.¶
In these cases, privacy threats can be reduced even more by providing no data, simply the proof that a general precept is true. This can offer very strong protection against Correlation (§5.2.1 of RFC 6973) and obviously minimizes Disclosure (§5.2.4 of RFC 6973).¶
Though some systems such as BBS+ Signatures and other Zero-Knowledge Proof systems can support superior anti-correlation with “proof of knowledge of the undisclosed signature”, a simpler salted and hashed data elision can often provide easier solutions for many classes of “inclusion” proofs.¶
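The role of the salt can be illustrated with a minimal Python sketch. It assumes SHA-256 commitments and a hypothetical birth-year claim; the names `commit`, `guess_birth_year`, and `verify` are illustrative and are not taken from any specification. An unsalted hash of a low-entropy value can be recovered by enumeration, while a salted one can only be checked by a party that has been given the salt.

```python
import hashlib
import hmac
import secrets


def commit(value: str, salt: bytes = b"") -> bytes:
    """Hash a claim value, optionally prefixed with a random salt."""
    return hashlib.sha256(salt + value.encode()).digest()


def guess_birth_year(target: bytes, years=range(1900, 2025)):
    """Try to recover a birth year by hashing every candidate without a salt."""
    for year in years:
        if commit(str(year)) == target:
            return year
    return None


def verify(value: str, salt: bytes, commitment: bytes) -> bool:
    """Check a revealed value and salt against a published commitment."""
    return hmac.compare_digest(commit(value, salt), commitment)


# Unsalted: the small value space makes the hash easy to reverse.
unsalted = commit("1990")
recovered = guess_birth_year(unsalted)

# Salted: enumeration fails unless the holder reveals the 16-byte salt.
salt = secrets.token_bytes(16)
salted = commit("1990", salt)
blind_guess = guess_birth_year(salted)
```

In this sketch the holder would hand `("1990", salt)` only to a verifier who needs the value, and that verifier calls `verify` against the commitment it already holds.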
This section tries to identify and structure areas of work to address the aforementioned problems by turning the guidelines of RFC 6973 and RFC 8280 into more precise specifications or requirements. It focuses on hashed data elision as a core area of work, but in a section on optional areas of work discusses more specific advancements that can further support RFC 6973 and especially RFC 8280.¶
As suggested by RFC 6973, Data Minimization is a prime methodology for improving privacy and reducing problems such as Correlation, Secondary Use, and Disclosure.¶
To support Data Minimization, a specification MUST:¶
Elision is the obvious requirement for Data Minimization: it's the removal of data. The question of who can elide data becomes more important when data is signed as a means of authentication, such as in credentials. In these situations, elision is traditionally restricted to the issuer of the credential, which effectively prevents the holder from eliding data. Supporting Data Minimization requires that the holder be able to elide data as well, while keeping any signatures valid.¶
As noted in §2.3, above, simplistic Data Minimization can cause other human rights problems such as a lack of Authenticity or Integrity checking. This can be resolved in a specification by requiring a fingerprint that can be used to verify elided data.¶
To incorporate deterministic hashing, a specification MUST:¶
A fingerprint that is generated through a hash function such as SHA-256 or a newer function such as BLAKE3 will generally meet the first two requirements.¶
The third requirement is designed to support the requirements for Data Minimization in §3.1.1, above. If data is hashed, and any signature is applied to the hash rather than to the original data, then a holder can choose to elide the data or not, as they see fit, and the signature remains valid. This is the strong core of deterministic hashed data elision, harmonizing Data Minimization and data integrity.¶
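A minimal Python sketch of this idea follows. It is an illustration under stated assumptions, not the encoding used by any particular specification: the field layout, the helper names (`leaf_digest`, `root_digest`, `elide`, `sign`), and the sample credential are all hypothetical, and an HMAC stands in for the issuer's public-key signature because the Python standard library has no asymmetric signature primitive. Salting is omitted to keep the sketch short; without salts, low-entropy fields could be guessed from their digests.

```python
import hashlib
import hmac


def leaf_digest(name: str, value: str) -> bytes:
    """Digest of one field; the name is length-prefixed to avoid ambiguity."""
    encoded = name.encode()
    prefix = len(encoded).to_bytes(4, "big")
    return hashlib.sha256(prefix + encoded + value.encode()).digest()


def root_digest(fields: dict) -> bytes:
    """Combine field digests; an elided field already holds its digest (bytes)."""
    parts = []
    for name in sorted(fields):
        value = fields[name]
        parts.append(value if isinstance(value, bytes) else leaf_digest(name, value))
    return hashlib.sha256(b"".join(parts)).digest()


def elide(fields: dict, name: str) -> dict:
    """Return a copy in which one field's value is replaced by its digest."""
    redacted = dict(fields)
    if not isinstance(redacted[name], bytes):
        redacted[name] = leaf_digest(name, redacted[name])
    return redacted


def sign(key: bytes, root: bytes) -> bytes:
    """HMAC stand-in (assumption) for the issuer's signature over the root digest."""
    return hmac.new(key, root, hashlib.sha256).digest()


issuer_key = b"hypothetical-issuer-key"
credential = {"name": "Alice", "birthdate": "1990-04-01", "country": "US"}
signature = sign(issuer_key, root_digest(credential))

# The holder elides a field without needing the issuer's key.
shared = elide(credential, "birthdate")
still_valid = hmac.compare_digest(signature, sign(issuer_key, root_digest(shared)))
```

Because the signature covers only the root digest, and the root digest is computed from field digests rather than field values, the elided copy verifies exactly as the full credential does.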
Because data does not always need to be shared to provide the verification required by a validator, support of data proofs can provide additional privacy and human rights benefits.¶
To enable inclusion proofs, a specification MUST:¶
Through this methodology, a holder can create a proof for a specific bit of data, such as their residence in a specific country or state, demonstrate that proof’s creation, and show that it matches the hash of elided data. However, the holder does so only if and when they wish: only the hash is ever publicly known, and the data is never known unless the holder produces a proof. Usually, the proof is only offered to an entity that is verifying a specific data element, effectively turning it from a data revelation method into a data verification method. This provides strong Data Minimization that is holder controlled.¶
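The holder and verifier roles described here can be sketched in Python as follows. This is a simplified illustration, assuming salted SHA-256 digests and a flat list of published digests; the `Proof` type, the function names, and the sample claims are hypothetical and do not reflect the format of any specification.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Proof:
    name: str
    value: str
    salt: bytes


def claim_digest(name: str, value: str, salt: bytes) -> bytes:
    """Salted digest of one claim; each part is length-prefixed."""
    hasher = hashlib.sha256()
    for part in (salt, name.encode(), value.encode()):
        hasher.update(len(part).to_bytes(4, "big"))
        hasher.update(part)
    return hasher.digest()


def issue(claims: dict):
    """Return the public digest list and the holder's private proofs."""
    proofs = {n: Proof(n, v, secrets.token_bytes(16)) for n, v in claims.items()}
    published = sorted(claim_digest(p.name, p.value, p.salt) for p in proofs.values())
    return published, proofs


def verify(published: list, proof: Proof) -> bool:
    """Accept the proof only if its digest appears in the published list."""
    candidate = claim_digest(proof.name, proof.value, proof.salt)
    return any(hmac.compare_digest(candidate, d) for d in published)


published, proofs = issue({"name": "Alice", "country": "US", "state": "CA"})

# The holder reveals only the country claim to a verifier.
accepted = verify(published, proofs["country"])

# A different value with the same salt does not match any published digest.
forged = Proof("country", "FR", proofs["country"].salt)
rejected = not verify(published, forged)
```

The verifier learns the country claim and nothing else: the remaining digests stay opaque because each is protected by its own salt.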
Though other methodologies exist for proving the content of data, such as Zero-Knowledge Proofs and BBS+ Signatures, inclusion proofs based on hashes provide a much easier solution that is pragmatically more likely to be implemented and thus is more accessible and useable today.¶
Support for inclusion proofs can also allow for the use of herd privacy, where data about a specific user is contained within a much larger hash of data, which can be widely published without danger. This puts all the agency for data revelation in an individual user’s hands and does so without any need to “phone home”, meaning that not even the original publisher of the data would know when that data was being checked.¶
To facilitate herd privacy, a specification MUST:¶
Herd privacy provides further benefits to privacy because a credential publisher can publish data without ever having contact with credential holders, and those holders can then choose to reveal that data, or not, all without the publisher's knowledge. Requirements #1-4 suggest one way to do so using hashed elision and Merkle trees such that other information can’t be guessed from the revelation of hashes, but requirement #5 says that other methodologies would be acceptable provided they meet the core needs of a herd privacy system.¶
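One way such a Merkle-tree approach might look is sketched below in Python. It is an illustration only and does not claim to satisfy the numbered requirements: it assumes SHA-256, per-record random salts, domain-separation prefixes for leaves and interior nodes, and duplication of the last node on odd-sized levels (a simplification). All names and sample records are hypothetical.

```python
import hashlib
import secrets


def leaf_hash(salt: bytes, record: bytes) -> bytes:
    """Salted leaf digest, prefixed so it cannot be confused with a node."""
    return hashlib.sha256(b"\x00" + salt + record).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def padded(level: list) -> list:
    """Duplicate the last entry when a level has an odd number of nodes."""
    return level + [level[-1]] if len(level) % 2 else level


def build_levels(leaves: list) -> list:
    """Return every level of the tree, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = padded(levels[-1])
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def audit_path(levels: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        cur = padded(level)
        sibling = index ^ 1
        path.append((cur[sibling], sibling < index))
        index //= 2
    return path


def verify(root: bytes, salt: bytes, record: bytes, path: list) -> bool:
    """Recompute the root from one record and its audit path."""
    cur = leaf_hash(salt, record)
    for sibling, sibling_is_left in path:
        cur = node_hash(sibling, cur) if sibling_is_left else node_hash(cur, sibling)
    return cur == root


records = [b"alice:degree", b"bob:degree", b"carol:degree", b"dave:degree", b"erin:degree"]
salts = [secrets.token_bytes(16) for _ in records]
levels = build_levels([leaf_hash(s, r) for s, r in zip(salts, records)])
root = levels[-1][0]  # the only value the publisher needs to make public

carol_path = audit_path(levels, 2)
carol_ok = verify(root, salts[2], records[2], carol_path)
```

In this sketch the publisher distributes only `root`. Each holder keeps their own record, salt, and audit path, and a verifier can check them against the published root offline, without contacting the publisher.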
Using hashed data elision as a foundation would improve the privacy of almost any IETF protocol.¶
The Gordian Envelope Internet-Draft [GordianEnvelope] is one example of a specification that supports hashed data elision. It could be used to enable all of the Core Areas of Work. It also goes further, incorporating additional functionality that can provide better support for RFC 6973 and RFC 8280 through additional features, including the following.¶
A hashed data elision system can be expanded to support both encryption and compression functions, as encrypted and compressed data can also be represented by their hashes without revealing any information about the original data.¶
Incorporating encryption into a data specification offers the highest level of privacy and of Data Minimization possible, as data can only be viewed by select individuals with the decryption key. This is especially important for Confidentiality, which is referenced in §6.2.15 of RFC 8280.¶
Hashing encrypted data also improves Authenticity, per §6.2.17 of RFC 8280. As with other sorts of elided data, signatures will remain valid even following encryption or compression, provided the signatures are applied to the data hash, not the original data.¶
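The compression case can be sketched with the Python standard library. The sketch assumes that a transformed element carries the digest of its original, untransformed content, so that anything referencing the element by digest is unaffected; the `Compressed` type and function names are hypothetical. An encrypted element could plausibly be handled the same way, but it is not shown because the standard library offers no authenticated encryption.

```python
import hashlib
import zlib
from dataclasses import dataclass


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class Compressed:
    payload: bytes   # zlib-compressed content
    declared: bytes  # digest of the original, uncompressed content


def compress(content: bytes) -> Compressed:
    return Compressed(zlib.compress(content), digest(content))


def element_digest(element) -> bytes:
    """Digest that a parent structure or signature would reference."""
    if isinstance(element, Compressed):
        return element.declared
    return digest(element)


def decompress(element: Compressed) -> bytes:
    """Restore the content and confirm it matches the declared digest."""
    content = zlib.decompress(element.payload)
    if digest(content) != element.declared:
        raise ValueError("content does not match declared digest")
    return content


plain = b"a long, repetitive attachment " * 50
packed = compress(plain)
same_digest = element_digest(packed) == element_digest(plain)
```

Since `element_digest` returns the same value for the plain and compressed forms, a signature applied to that digest verifies against either one.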
As currently detailed, the Gordian Envelope Internet-Draft also supports several other Guidelines for Human Rights Considerations that are listed in §6.2 of RFC 8280:¶
Support for privacy and for human rights has another requirement: it needs to be kept simple so that it finds actual use.¶
Gordian Envelope is a fundamentally simple data format that only achieves complexity through iterative structure design.¶
As outlined, the general concept of hashed data elision and the specific design of Gordian Envelope provide a wide variety of privacy advancements. They offer strong support for Data Minimization and other guidelines found in RFC 6973 and RFC 8280.¶
The biggest remaining privacy concern is accidental correlation, which can arise if different parties hold different versions of the same data that have been elided in different ways. This is currently seen as an acceptable side effect of an elision system that allows for Authenticity and Integrity, and it can be offset by careful creation of Envelope structures, such as gathering small groups of data into distinct, elided branches.¶
However, the question also remains open as to whether there might be more expansive and more automated solutions.¶
Hashed data elision is intended to strengthen communication security, primarily by enhancing confidentiality (through elision) while also maintaining data integrity (through hashing). Supporting this with a signature system that signs hashes rather than original data also allows for peer entity authentication, creating a strong foundation for overall communication security.¶
However, that security depends on the strength of hashing algorithms and encryption/signature algorithms. Strong, unbroken hashes and encryption schemes are required. Potential threats to hashes and encryption such as quantum computing would result in threats to any hashed data elision system.¶
This document has no IANA actions. Gordian Envelope has already been assigned CBOR tag #200 by IANA.¶
The authors are grateful for the support of the CBOR working group in discussions of Gordian Envelope and general guidance within the IETF.¶