The CBOR, dCBOR, and Gordian Envelope Book
Wolf McNally
Christopher Allen
Colophon
The CBOR, dCBOR, and Gordian Envelope Book
by
Wolf McNally
Christopher Allen
and other contributors to Blockchain Commons.
© 2025 Blockchain Commons
The content is written in Markdown and built using mdBook.
All code examples are in Rust, using the `dcbor` and `bc-envelope` crates available on crates.io.
This project is open source and collaborative. Contributions require a signed Contributor License Agreement (CLA) as described in the repository. Unless otherwise noted, the content of this book is licensed under the BSD-2-Clause Plus Patent License. You are free to use, modify, and redistribute the material under these terms.
For more information, source code, and contribution guidelines, visit the Blockchain Commons GitHub organization.
⚖️ Attribution Notice: This book is a Blockchain Commons publication. If you encounter a derivative work or modified version that omits attribution or fails to reflect the license terms, please refer to the canonical source at https://cborbook.com or https://github.com/BlockchainCommons/cbor-book.
Introduction
TL;DR: Who is This Book For?
- If your primary goal is to understand a modern, efficient binary serialization format that offers significant performance and size benefits over JSON, Part I provides a comprehensive guide to CBOR.
- If your application requires absolute, verifiable consistency – for digital signatures, content hashing, consensus, or interoperable verification – Part II delves into the principles of determinism and the specifics of dCBOR, including a tutorial for the `dcbor` Rust crate.
- For those building applications that require structured, verifiable, and privacy-preserving data – smart documents – Part III explores the groundbreaking capabilities of Gordian Envelope, including usage of the `bc-envelope` Rust crate.
Navigating the Landscape of Modern Data Representation
Modern software engineering demands tools that can handle the dual pressures of performance and trust. As systems grow more distributed and data security becomes paramount, developers must move beyond formats that were designed for a simpler era. JSON, while readable and ubiquitous, can introduce unacceptable inefficiencies—slowing performance, bloating payloads, and leaving room for ambiguity where certainty is required.
This book introduces a progressive technological stack—CBOR, dCBOR, and Gordian Envelope—that addresses these modern challenges head-on. Together, they form a path from compact binary encoding to cryptographically verifiable and privacy-preserving data structures. Understanding these tools enables engineers and decision-makers to build faster, leaner, and more trustworthy systems—without compromise.
Part I: The Foundation – Achieving Efficiency with CBOR
At the core of this stack is CBOR: Concise Binary Object Representation, defined in RFC 8949. It was designed for constrained environments, enabling implementations with minimal memory and CPU usage—ideal for IoT devices, embedded systems, and high-throughput applications.
CBOR offers significant efficiency over JSON. Its binary encoding means faster parsing, smaller messages, and lower latency. It also supports an extended data model, including binary byte strings, which simplifies integration with existing JSON systems while enabling more advanced use cases.
Crucially, CBOR is extensible. New tags can be introduced without breaking older implementations, reducing long-term maintenance burdens. Protocols can evolve without costly version negotiations, and features can be rolled out faster and more safely.
While JSON emphasizes human readability, CBOR prioritizes performance, resource efficiency, and future-proof extensibility. In many systems—especially where network bandwidth, power, or processing time is scarce—CBOR isn’t just better. It’s necessary.
Part II: The Guarantee – Ensuring Verifiable Consistency with dCBOR
Efficiency alone isn’t enough for systems that rely on trust. Distributed ledgers, digital signatures, and content-addressed data all depend on one principle: the exact same data must always serialize to the exact same bytes.
CBOR provides guidelines for deterministic encoding—but leaves enough leeway to produce different byte sequences for the same logical structure. This is a problem for cryptography and consensus protocols, where even one byte of variance invalidates signatures or breaks agreement.
Deterministic CBOR (dCBOR) solves this. It is a strict profile of CBOR that eliminates ambiguity. It defines canonical rules for numeric encoding (e.g., converting `2.0` to `2`, collapsing NaNs into a single representation), mandates lexicographic sorting of map keys by their encoded form, and forbids features like indefinite-length values that undermine determinism.
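To make these rules concrete, here is a small byte-level sketch in plain Rust (not yet using the `dcbor` crate); the byte values follow RFC 8949's preferred serialization and the dCBOR rules just described:

```rust
fn main() {
    // Plain CBOR preferred serialization keeps 2.0 as a float, shrunk to
    // half-precision: 0xF9 (major type 7, half float) followed by 0x4000.
    let plain_cbor_2_0: [u8; 3] = [0xF9, 0x40, 0x00];

    // dCBOR numerically reduces 2.0 to the integer 2, which encodes as the
    // single byte 0x02 (major type 0, value 2).
    let dcbor_2_0: [u8; 1] = [0x02];

    // Every NaN payload collapses to one canonical half-precision quiet NaN.
    let dcbor_nan: [u8; 3] = [0xF9, 0x7E, 0x00];

    println!("plain CBOR 2.0: {:02X?}", plain_cbor_2_0);
    println!("dCBOR 2.0:      {:02X?}", dcbor_2_0);
    println!("dCBOR NaN:      {:02X?}", dcbor_nan);
}
```

Two encoders following these rules cannot disagree about these bytes, which is exactly the property that signatures and hashes rely on.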
dCBOR isn’t a fork. It’s fully valid CBOR, but with stricter rules and mandatory validation. Encoders must produce canonical output. Decoders must reject anything that isn’t. This guards against inconsistency, manipulation, and protocol divergence—critical when trust is on the line.
For engineering teams building security-critical systems, dCBOR provides the byte-level reliability needed to anchor hashing, signing, auditing, and cross-platform integrity checks. It replaces ambiguity with assurance.
Part III: The Breakthrough – Secure, Structured, Privacy-Enhancing Data with Gordian Envelope
Once determinism is in place, it becomes possible to build something far more powerful: a structured, secure, and privacy-aware data format that can adapt to the demands of modern identity, privacy, and trust. That’s what Gordian Envelope delivers.
Built atop dCBOR, Gordian Envelope enables deeply structured data with built-in cryptographic integrity. It’s a semantic format—often modeled as subject-predicate-object triples—wrapped in a Merkle-like digest tree. This structure guarantees that every element, not just the whole, can be independently verified.
What sets Envelope apart is holder-controlled elision: the ability to redact or hide portions of the data without invalidating the overall structure or breaking attached signatures. This enables minimal disclosure, progressive trust, and user-controlled privacy—foundational principles for self-sovereign identity and modern data sovereignty.
Envelope also supports advanced layering: encryption, compression, nested signatures, and semantic annotations. These features don’t just bolt on—they integrate directly with the underlying structure, allowing powerful capabilities like verifiable redaction, authenticated subtrees, and selective disclosure proofs.
Its use cases are broad and high-impact: verifiable credentials, digital wallets, secure logs, privacy-preserving data sharing, and cryptographic asset management. More fundamentally, Envelope shifts control from institutions to individuals. Data no longer belongs solely to the issuer. It’s held, managed, and selectively revealed by the user.
This is the architecture of trust, built from the bottom up: efficient encoding, deterministic consistency, and cryptographic structure, all aligned with privacy and user agency.
CBOR is the foundation. dCBOR is the guarantee. Gordian Envelope is the future.
From XML to JSON to CBOR
A Lingua Franca for Data?
In modern computing, data exchange is foundational to everything from web browsing to microservices and IoT devices. The ability for different systems to represent, share, and interpret structured information drives our digital world. Yet no single perfect format has emerged to meet all needs. Instead, we've seen an evolution of data interchange formats, each addressing the specific challenges and technical requirements of its time.
This narrative traces three pivotal data formats: Extensible Markup Language (XML), JavaScript Object Notation (JSON), and Concise Binary Object Representation (CBOR). We explore their origins and motivations, examine their core design principles and inherent trade-offs, and follow their adoption trajectories within the evolving digital landscape. The journey begins with XML's focus on robust document structure, shifts to JSON's web-centric simplicity and performance, and advances to CBOR's binary efficiency for constrained devices. Understanding this evolution reveals not just technical specifications, but the underlying pressures driving innovation in data interchange formats.
The Age of Structure: XML's Rise from Publishing Roots
Modern data interchange formats trace back not to the web, but to challenges in electronic publishing decades earlier. SGML provided the complex foundation that XML would later refine and adapt for the internet age.
The SGML Inheritance: Laying the Foundation
In the 1960s-70s, IBM researchers Charles Goldfarb, Ed Mosher, and Ray Lorie created Generalized Markup Language (GML) to overcome proprietary typesetting limitations. Their approach prioritized content structure over presentation. GML later evolved into Standard Generalized Markup Language (SGML), formalized as ISO 8879 in 1986.
SGML innovated through its meta-language approach, providing rules for creating custom markup languages. It allowed developers to define specific vocabularies (tag sets) and grammars (Document Type Definitions or DTDs) for different document types, creating machine-readable documents with exceptional longevity independent of processing technologies.
SGML gained traction in sectors managing complex documentation: government, military (CALS DTD), aerospace, legal publishing, and heavy industry. However, its 150+ page specification with numerous special cases complicated parser implementation, limiting broader adoption.
The web's emergence proved pivotal for markup languages. Tim Berners-Lee selected SGML as HTML's foundation due to its text-based, flexible, non-proprietary nature. Dan Connolly created the first HTML DTD in 1992. While HTML became ubiquitous, it drifted toward presentation over structure, with proliferating browser-specific extensions. SGML remained too complex for widespread web use, creating demand for a format that could bring SGML's structural capabilities to the internet in a more accessible form.
W3C and the Birth of XML: Taming SGML for the Web
By the mid-1990s, the web needed more structured data exchange beyond HTML's presentational focus. In 1996, the W3C established an XML Working Group, chaired by Jon Bosak of Sun Microsystems, to create a simplified SGML subset suitable for internet use while maintaining extensibility and structure.
The W3C XML Working Group developed XML with clear design goals, formalized in the XML 1.0 Specification (W3C Recommendation, February 1998):
- Internet Usability: Straightforward use over the internet
- Broad Applicability: Support for diverse applications beyond browsers
- SGML Compatibility: XML documents should be conforming SGML documents
- Ease of Processing: Simple program development for XML processing
- Minimal Optional Features: Few or no optional features
- Human Readability: Legible and clear documents
- Rapid Design: Quick design process
- Formal and Concise Design: Formal specification amenable to standard parsing
- Ease of Creation: Simple document creation with basic tools
- Terseness is Minimally Important: Conciseness was not prioritized over clarity
SGML compatibility was strategically crucial. By defining XML as a valid SGML subset, existing SGML parsers and tools could immediately process XML documents when the standard was released in 1998. This lowered adoption barriers for organizations already using SGML and provided an instant software ecosystem. The constraint also helped the working group achieve rapid development by limiting design choices, demonstrating an effective strategy for launching the new standard.
Designing XML: Tags, Attributes, Namespaces, and Schemas
XML's structure uses nested elements marked by tags. An element consists of a start tag (`<customer>`), an end tag (`</customer>`), and content between them, which can be text or other nested elements. Start tags can contain attributes for metadata (`<address type="billing">`). Empty elements use syntax like `<br/>` or `<br></br>`. This hierarchical structure makes data organization explicit and human-readable.
As XML usage expanded, combining elements from different vocabularies created naming conflicts. The "Namespaces in XML" Recommendation (January 1999) addressed this by qualifying element names with unique identifiers, typically URIs. This uses the `xmlns` attribute, often with a prefix (`xmlns:addr="http://www.example.com/addresses"`), creating uniquely identified elements (`<addr:street>`). Default namespaces can be declared (`xmlns="URI"`) for unprefixed elements, but don't apply to attributes. Though URIs ensure uniqueness, they needn't point to actual online resources.
XML documents are validated using schema languages. XML initially used Document Type Definitions (DTDs) from SGML, which define allowed elements, attributes, and nesting rules. To overcome DTD limitations (non-XML syntax, poor type support), the W3C developed XML Schema Definition (XSD), standardized in 2001. XSD offers powerful structure definition, rich data typing, and rules for cardinality and uniqueness. XSD schemas are themselves written in XML.
XML's structure enabled supporting technologies: XPath for node selection, XSL Transformations (XSLT) for document transformation, and APIs like Document Object Model (DOM) for in-memory representation or Simple API for XML (SAX) for event-based streaming.
While XML effectively modeled complex data structures with extensibility and validation, its power introduced complexity. Creating robust XSD schemas was challenging, leading some to prefer simpler alternatives like RELAX NG or Schematron. Namespaces solved naming collisions but complicated both document authoring and parser development. XML's flexibility allowed multiple valid representations of the same data, potentially hindering interoperability without strict conventions. This inherent complexity, combined with verbosity, eventually drove demand for simpler formats, especially where ease of use and performance outweighed validation and expressiveness. The tension between richness and simplicity significantly influenced subsequent data format evolution.
XML's Reign and Ripples: Adoption and Impact
Following its 1998 standardization, XML quickly became dominant across computing domains throughout the early 2000s, offering a standard, platform-independent approach for structured data exchange.
XML formed the foundation of Web Services through SOAP (Simple Object Access Protocol), an XML-based messaging framework operating over HTTP. Supporting technologies like WSDL (Web Services Description Language) and UDDI (Universal Description, Discovery and Integration) completed the "WS-*" stack for enterprise integration.
Configuration Files widely adopted XML due to its structure and readability. Examples include Java's Log4j, Microsoft .NET configurations (`web.config`, `app.config`), Apache Ant build scripts, and numerous system parameters.
In Document Formats and Publishing, XML fulfilled its original promise by powering XHTML, RSS and Atom feeds, KML geographic data, and specialized formats like DocBook. Its content-presentation separation proved valuable for multi-channel publishing and content management.
As a general-purpose Data Interchange format, XML facilitated cross-system communication while avoiding vendor lock-in and supporting long-term data preservation.
This widespread adoption fostered a rich ecosystem of XML parsers, editors, validation tools, transformation engines (XSLT), data binding utilities, and dedicated conferences, building a strong technical community.
The Seeds of Change: XML's Verbosity Challenge
Despite its success, XML carried the seeds of its own partial decline. A key design principle—"Terseness in XML markup is of minimal importance"—prioritized clarity over compactness, requiring explicit start and end tags for every element.
While enhancing readability, this structure created inherent verbosity. Simple data structures required significantly more characters in XML than in more compact formats. For example, `{"name": "Alice"}` in JSON versus `<name>Alice</name>` in XML added substantial overhead, especially for large datasets with many small elements.
This verbosity became problematic as the web evolved. The rise of AJAX in the mid-2000s emphasized frequent, small data exchanges between browsers and servers for dynamic interfaces. In this context, minimizing bandwidth usage and parsing time became critical. XML's larger payloads and complex parsing requirements created performance bottlenecks.
The XML community recognized these efficiency concerns, leading to initiatives like the W3C's Efficient XML Interchange (EXI) Working Group, which developed a standardized binary XML format. While EXI offered significant compaction, it highlighted the challenge of retrofitting efficiency onto XML's tag-oriented foundation without adding complexity.
The decision to deprioritize terseness, while distinguishing XML from SGML, had unintended consequences. As the web shifted toward dynamic applications prioritizing speed and efficiency, XML's verbose structure became a liability. This created an opportunity for a format that would optimize for precisely what XML had considered minimal: conciseness and ease of parsing within web browsers and JavaScript.
The Quest for Simplicity: JSON's Emergence in the Web 2.0 Era
As XML's verbosity and complexity became problematic in web development, particularly with AJAX's rise, a simpler alternative emerged directly from JavaScript.
JavaScript's Offspring: Douglas Crockford and the "Discovery" of JSON
JSON (JavaScript Object Notation) originated with Douglas Crockford, an American programmer known for his JavaScript work. In 2001, Crockford and colleagues at State Software needed a lightweight format for data exchange between Java servers and JavaScript browsers without plugins like Flash or Java applets.
Crockford realized JavaScript's object literal syntax (e.g., `{ key: value }`) could serve this purpose. Data could be sent from servers embedded in JavaScript snippets for browsers to parse, initially using the `eval()` function. Crockford describes this as a "discovery" rather than invention, noting similar techniques at Netscape as early as 1996.
The initial implementation sent HTML documents containing `<script>` tags that called JavaScript functions, passing data as object literal arguments. One refinement: all keys required double quotes to avoid conflicts with JavaScript reserved words.
After their first choice of name conflicted with the existing JSpeech Markup Language, they settled on "JavaScript Object Notation," or JSON. In 2002, Crockford acquired json.org and published the grammar and reference parser. Developers quickly submitted parsers for various languages, demonstrating JSON's broader potential.
Motivation: A Lightweight Alternative for a Faster Web
JSON addressed the need for a simpler, lighter data interchange format than XML. Crockford aimed for minimalism, believing "the less we have to agree on in order to inter-operate, the more likely we're going to be able to inter-operate well." He wanted a standard simple enough to fit on a business card.
When challenged that JSON was merely reinventing XML, Crockford famously replied, "The good thing about reinventing the wheel is that you can get a round one."
JSON arrived at the perfect time. AJAX techniques created demand for optimized, small data transfers between servers and browsers. Though "AJAX" meant "Asynchronous JavaScript and XML," JSON proved better for many cases. Its syntax maps directly to JavaScript objects and arrays, making client-side parsing trivial. Its lightweight nature reduced bandwidth usage and improved web application responsiveness.
Despite originating from JavaScript, JSON's success wasn't confined to browsers. Its simplicity made it remarkably easy to implement across programming languages. The core structures—objects (maps/dictionaries), arrays (lists), strings, numbers, booleans, and null—are fundamental to most modern languages. This ease of cross-language implementation drove its rapid adoption, transforming it from a JavaScript-specific solution into a de facto standard for web APIs and configuration files industry-wide. Simplicity became a powerful catalyst for language independence and widespread adoption.
Designing JSON: Key-Value Pairs, Arrays, and Minimal Types
JSON's syntax is deliberately minimal, built on just a few structural elements from JavaScript:
- Objects: Unordered key-value pairs in curly braces `{}`. Keys must be double-quoted strings, followed by a colon `:`, with comma-separated pairs. Example: `{ "name": "Alice", "age": 30 }`.
- Arrays: Ordered value sequences in square brackets `[]`, separated by commas. Example: `[ "apple", "banana", "cherry" ]`.
Values can only be:
- String: Double-quoted Unicode characters
- Number: Numeric values (without type distinction)
- Boolean: `true` or `false` (lowercase)
- Null: `null` (lowercase)
- Object: Nested JSON object
- Array: Nested JSON array
This text-based structure is human-readable and directly maps to common programming data structures, making it developer-friendly.
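As a quick illustration of that direct mapping in this book's language of choice, here is a minimal sketch using the widely used `serde_json` crate (an assumption made only for this example; the book's own tutorials use other crates):

```rust
use serde_json::{json, Value};

fn main() -> serde_json::Result<()> {
    // JSON text maps straight onto generic values: maps, arrays, strings, numbers.
    let v: Value = serde_json::from_str(r#"{ "name": "Alice", "age": 30 }"#)?;
    assert_eq!(v["name"], json!("Alice"));
    assert_eq!(v["age"], json!(30));

    // ...and back out to text.
    println!("{}", serde_json::to_string(&v)?);
    Ok(())
}
```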
JSON intentionally omits XML features like comments, namespaces, and attributes. Crockford deliberately excluded comments, noting they were often misused in formats like XML for parsing directives or metadata, potentially breaking interoperability. The recommended approach is to include commentary as regular data with conventional keys like `"_comment"`.
Native support arrived in ECMAScript 5 (2009) with `JSON.parse()` and `JSON.stringify()` methods, providing safe alternatives to `eval()` for parsing. The `stringify` method supports optional `replacer` functions for output control, and objects can implement `toJSON()` to customize serialization.
JSON vs. XML: A Paradigm Shift
JSON and XML reflect fundamentally different design philosophies:
- Format Type: XML is a markup language for structured documents; JSON is purely a data interchange format derived from JavaScript object literals.
- Structure: XML uses hierarchical tags with elements, attributes, and text. JSON uses key-value pairs and ordered arrays.
- Verbosity: XML's tag structure creates inherent verbosity. JSON's minimal syntax produces more compact representations, often 30-40% smaller.
- Readability: Both are text-based, but JSON's simpler structure is typically easier to scan and comprehend.
- Parsing: JSON parsing is simpler and faster, with native support in JavaScript. XML requires more complex parsers to handle tags, attributes, namespaces, and validation.
- Features: XML includes comments, namespaces, attributes, and robust schema languages (DTD, XSD). JSON is intentionally minimal, with extensions like JSON Schema and JSON-LD handled separately.
- Data Types: JSON supports basic types (string, number, boolean, null, object, array). XML lacks built-in types without schemas, but XSD enables rich typing.
This comparison reveals the shift: XML prioritized structure, extensibility, and validation for complex documents, while JSON emphasized simplicity, usability, and performance for web APIs.
Rapid Ascent: JSON Becomes the Language of APIs
JSON's alignment with web technologies drove its widespread adoption during the "Web 2.0" and AJAX era. It quickly dominated RESTful web APIs, with surveys showing over 85% of APIs using JSON as their default format.
Its utility extended to configuration files and data storage, particularly in NoSQL databases like MongoDB (using BSON) and browser storage via `localStorage`.
JSON's adoption grew organically through developer preference and the ease of creating parsers across languages, as seen in implementations at json.org. Formal standardization followed with ECMA-404 and IETF RFC 8259.
A key factor in JSON's success is its remarkable stability. As Crockford emphasized, JSON is "finished"—it has no version number, and its core specification remains unchanged since inception. This stability contrasts with technologies requiring frequent updates, avoiding the fragmentation and compatibility issues CBOR later explicitly designed against. By providing a simple, reliable foundation, JSON allowed a rich ecosystem to flourish around it without requiring constant adaptation to core changes, proving stability to be a decisive feature for infrastructure technologies.
The Need for Speed (and Size): Enter CBOR
While JSON offered a much-needed simplification and performance boost over XML for web APIs, its text-based nature still presented limitations in certain demanding environments. The relentless push for greater efficiency, particularly driven by the rise of the Internet of Things (IoT), paved the way for a format that combined JSON's data model with the compactness and speed of binary encoding: CBOR.
Beyond Text: The Motivation for Binary
Text-based formats like JSON have inherent inefficiencies compared to binary representations:
- Parsing Speed: Text parsing requires interpreting character sequences, computationally costlier than decoding structured binary data. Binary formats map more directly to machine data types.
- Message Size: Numbers, booleans, and repeated keys consume more bytes as text than with optimized binary encodings. Comparisons consistently show CBOR significantly reducing data size versus JSON.
- Binary Data Handling: JSON lacks a native binary data type (needed for images, cryptographic keys, sensor readings). Such data requires Base64 encoding, adding complexity and increasing size by ~33%.
These limitations become critical in constrained environments characteristic of IoT:
- Limited Resources: Minimal CPU, memory, and battery power
- Constrained Networks: Low bandwidth, high latency connections (LoRaWAN, NB-IoT, Bluetooth LE)
In these scenarios, minimizing message size conserves bandwidth and energy, while reducing processing overhead extends battery life. CBOR was specifically designed to provide JSON's flexible data model in a compact, efficiently processable binary form optimized for resource-constrained environments.
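A back-of-the-envelope sketch of the size arithmetic above, using only the standard library (illustrative figures; a real JSON payload also carries a field name and any escaping):

```rust
fn main() {
    let raw_len = 32usize; // e.g., a 32-byte key or SHA-256 digest

    // CBOR byte string: 2 header bytes (0x58 0x20 = byte string, length 32) + raw bytes.
    let cbor_len = 2 + raw_len;

    // Base64 expands every 3 bytes into 4 characters (with padding), the ~33%
    // overhead noted above; a JSON string also needs surrounding quotes.
    let base64_len = ((raw_len + 2) / 3) * 4;
    let json_string_len = base64_len + 2;

    println!("raw: {raw_len} B, CBOR byte string: {cbor_len} B, JSON Base64 string: {json_string_len} B");
    // raw: 32 B, CBOR byte string: 34 B, JSON Base64 string: 46 B
}
```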
IETF Standardization: Building on the JSON Model
CBOR was developed within the IETF specifically for constrained environments, with Carsten Bormann and Paul Hoffman as key contributors.
CBOR intentionally builds upon the JSON data model, supporting equivalent types (numbers, strings, arrays, maps, booleans, and null) while adding native support for binary byte strings to address a key JSON limitation.
Initially standardized in RFC 7049 (2013), CBOR was updated in RFC 8949 (2020) as Internet Standard 94 (STD 94). Importantly, RFC 8949 maintains full wire-format compatibility with its predecessor.
The standard articulates clear design goals:
- Compact Code Size: Implementable with minimal code footprint for memory-constrained devices
- Reasonable Message Size: Significantly smaller than JSON without complex compression
- Extensibility without Version Negotiation: Future extensions remain compatible with existing decoders
- Schema-Free Decoding: Self-describing data items, parsable without predefined schemas
- Broad Applicability: Suitable for both constrained nodes and high-volume applications
- JSON Compatibility: Support for JSON data types with reasonable conversion
CBOR effectively synthesizes lessons from both JSON and XML. It adopts JSON's familiar data model while optimizing for constrained environments through binary encoding and size efficiency. For extensibility, CBOR provides semantic tags (registered via IANA) that allow new data types to be incorporated without breaking backward compatibility—combining JSON's simplicity with XML's extensibility approach.
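As a small illustration of the tag mechanism, one of the examples in RFC 8949 wraps an epoch timestamp in tag 1; a decoder that does not recognize the tag can still parse (or skip) the integer inside it:

```rust
fn main() {
    // Diagnostic notation 1(1363896240): tag 1 ("epoch-based date/time")
    // wrapping an unsigned integer (2013-03-21T20:04:00Z).
    let tagged: [u8; 6] = [
        0xC1,                   // major type 6 (tag), tag number 1
        0x1A,                   // major type 0 (unsigned int), 4-byte argument follows
        0x51, 0x4B, 0x67, 0xB0, // 1_363_896_240
    ];
    println!("tagged epoch date: {:02X?}", tagged);
}
```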
Where CBOR Shines: Constrained Environments
CBOR has established itself primarily in Internet of Things (IoT) and constrained environments where its compact representation of complex data structures provides crucial efficiency gains.
Key IETF protocols leveraging CBOR include:
- CoAP (Constrained Application Protocol): A lightweight HTTP alternative for constrained networks using CBOR payloads. Mappings exist for protocols like IEC 61850 (smart grids), showing performance benefits over HTTP/XML or WS-SOAP.
- COSE (CBOR Object Signing and Encryption): Defines cryptographic operations using CBOR, building on JOSE concepts but with binary efficiency. Fundamental to IoT security and used in FIDO WebAuthn passkey authentication.
- ACE (Authentication and Authorization for Constrained Environments): Security framework for IoT resource access using CBOR and COSE.
- Device Management: Protocols like CORECONF apply NETCONF/YANG concepts to constrained devices via CBOR.
- Certificate Representation: C509 creates smaller X.509 certificates than traditional DER/ASN.1 encoding.
Beyond IETF standards, formats like CBOR-LD and CBL compress semantic web data for IoT applications.
Widespread implementation support across languages (C, C++, Go, Rust, Python, Java, Swift, etc.) facilitates CBOR integration across diverse systems.
While CBOR adoption grows within constrained systems and security protocols, it remains younger than XML and less dominant than JSON in general web APIs. Its binary nature sacrifices human readability for efficiency, making it less suitable where direct inspection and manual editing are priorities.
The Trajectory: CBOR's Place and Future
CBOR's evolution optimizes for binary efficiency while maintaining JSON's flexible data model. Its growth centers on environments where these optimizations matter most: IoT, M2M communication, and security protocols.
As billions more IoT devices deploy, demand for efficient communication will increase, strengthening CBOR's position. Its integration into security mechanisms like COSE, particularly with passwordless authentication (WebAuthn/Passkeys), drives further adoption. CBOR's semantic tags provide extensibility without breaking backward compatibility.
In Part II, we'll explore another crucial CBOR advantage: deterministic encoding. This property ensures consistent serialization, essential for cryptographic applications including signatures, hashing, secure messaging, and distributed consensus protocols.
Despite these strengths, CBOR won't likely displace JSON in web APIs and general data interchange, where human readability and JavaScript integration remain paramount advantages.
Conclusion: An Evolving Landscape of Data Representation
The XML-JSON-CBOR evolution demonstrates technology's pattern of moving from feature-rich solutions toward specialized formats for specific use cases. SGML offered comprehensive features but complexity; XML simplified it for web documents; JSON further streamlined for web APIs; CBOR then optimized for binary efficiency in constrained environments.
The future likely holds coexistence rather than a single dominant format, with selection driven by application requirements. Specialized formats like CBOR achieve superior performance within their niches through deliberate trade-offs, such as exchanging human readability for size and processing speed.
Comparative Overview of XML, JSON, and CBOR
Feature | XML | JSON | CBOR |
---|---|---|---|
Originator/Body | W3C (Jon Bosak et al.) | Douglas Crockford; later ECMA, IETF | IETF (Carsten Bormann, Paul Hoffman) |
Primary Goal | Structured Docs, Web Data Exchange | Simple/Lightweight Web APIs, Data Interchange | Binary Efficiency, Compactness, Constrained Environments (IoT) |
Format Type | Markup Language (Text) | Data Format (Text) | Data Format (Binary) |
Base Model | SGML Subset | JavaScript Object Literal Subset | JSON Data Model Extension |
Structure | Tag-based Tree (Elements, Attributes) | Key-Value Pairs (Objects) & Ordered Values (Arrays) | Key-Value Pairs (Maps) & Ordered Values (Arrays) |
Schema/Validation | DTD, XSD (Built-in, Strong) | JSON Schema (Separate Spec, Optional) | CDDL (Separate Spec, Optional) |
Human Readability | High (Verbose) | High (Concise) | Low (Binary) |
Size/Efficiency | Verbose, Less Efficient Parsing | Lightweight, Efficient Parsing | Very Compact, Highly Efficient Parsing |
Extensibility | Namespaces, Schema | Via conventions (e.g., JSON-LD), JSON Schema | Semantic Tags (IANA Registry) |
Native Binary Support | No (Requires Encoding, e.g., Base64) | No (Requires Encoding, e.g., Base64) | Yes (Byte String Type) |
Primary Use Cases | Documents (HTML, DocBook), SOAP, Config Files | REST APIs, Config Files, NoSQL Data | IoT Protocols (CoAP), Security (COSE), Constrained Devices |
References
- W3C Recommendation: Extensible Markup Language (XML) 1.0 (Fifth Edition) – The foundational W3C specification defining XML.
- IETF RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format – The current IETF standard defining JSON, essential for understanding its formal specification.
- IETF RFC 8949: Concise Binary Object Representation (CBOR) – The IETF standard defining CBOR, its data model, binary encoding, and extensibility.
- Walsh, N. "A Technical Introduction to XML" – Clearly outlines the original design goals and motivations behind XML's creation.
- "The Rise and Rise of JSON" – Two-Bit History – Provides an excellent narrative on JSON's origins, motivations, and the context of its emergence relative to XML.
- CBOR.io (Official CBOR Website) – Authoritative overview of CBOR, its rationale, features, and links to specifications and implementations.
- json.org – The original website by Douglas Crockford where JSON was first formally described and popularized.
- AWS: "JSON vs XML – Difference Between Data Representations" – A representative comparison highlighting the practical differences and trade-offs between JSON and XML, explaining JSON's rise in web APIs.
- Corbado Glossary: "What is CBOR?" – A clear explanation of CBOR's purpose, benefits (efficiency, compactness), relationship to JSON, and relevance in the IoT context.
- DuCharme, B. "A brief, opinionated history of XML" – Offers valuable historical context on XML's roots in SGML and its early development and adoption.
CBOR vs. The Other Guys
The Binary Serialization Landscape
In the previous chapter, we traced data interchange formats from verbose XML to simpler JSON, highlighting the quest for better ways to represent and exchange data. JSON's simplicity and performance advantages over XML made it dominant for web APIs. However, its text-based nature limits efficiency in size and processing speed. This led to CBOR (Concise Binary Object Representation), which retains JSON's familiar data model while leveraging binary encoding for compactness and performance—crucial for constrained environments like the Internet of Things (IoT).
CBOR exists within a broader landscape of binary serialization formats, each with specific goals and trade-offs. Understanding how CBOR compares to alternatives helps appreciate its strengths and make informed format decisions. This chapter surveys several prominent binary formats:
- BSON (Binary JSON): Developed by MongoDB for internal storage and wire format, extending JSON with database-centric types and optimizing for traversability.
- Protocol Buffers (Protobuf): Google's high-performance, schema-driven format designed for efficient Remote Procedure Calls and data archival.
- MessagePack: A fast, compact binary alternative to JSON, used for network communication and caching.
- Avro: An Apache project emphasizing robust schema evolution, common in big data ecosystems like Hadoop and Kafka.
We'll compare these formats based on origins, data models, encoding strategies, schema approaches, performance characteristics, extensibility mechanisms, and typical use cases.
A fundamental distinction exists between schema-optional formats (CBOR, BSON, MessagePack) and schema-driven formats (Protocol Buffers, Avro). Schema-optional formats embed type information with the data, allowing parsing without prior structural knowledge—like JSON. This offers flexibility but introduces overhead and may require runtime validation. Schema-driven formats rely on external schemas known by both sender and receiver, potentially enabling more compact encodings (omitting field names/types) and compile-time validation, but requiring schema management and reducing data self-description. This core difference often reflects each format's origin—whether designed for flexible document storage like BSON or for high-performance, predefined message structures like Protobuf.
BSON: Binary JSON Tailored for Databases
BSON emerged directly from MongoDB's needs. While MongoDB embraced JSON's flexible document model, raw JSON proved suboptimal for database operations due to its limited type system (lacking dates and binary data) and inefficiencies when parsing text for queries and indexing.
MongoDB created BSON to address these limitations—providing a binary JSON representation optimized for storage efficiency, rapid traversal, and enhanced type support while preserving schema flexibility. BSON serves as MongoDB's native format for both storage and network transfer.
Design and Encoding
BSON documents serialize as binary data with explicit type and length information. Each document begins with a 4-byte total size, followed by a sequence of elements, ending with a null byte. Each element contains a one-byte type code, null-terminated field name, and type-specific value encoding. The inclusion of length prefixes enables MongoDB to quickly traverse documents and access specific fields without parsing entire structures.
BSON extends JSON's data model with several database-essential types:
- ObjectId: A 12-byte unique identifier (timestamp + machine ID + process ID + counter), commonly used as primary key.
- Date: 64-bit integer representing milliseconds since Unix epoch.
- Binary Data (BinData): Direct embedding of byte arrays with subtype indicators, avoiding Base64 encoding.
- Timestamp: Special 64-bit type (seconds since epoch + ordinal) for MongoDB replication logs.
- Additional Numeric Types: 32-bit integers (`int32`), 64-bit integers (`int64`), 64-bit floats (`double`), and 128-bit high-precision decimals (`Decimal128`) for financial applications.
- Deprecated Types: Including `Undefined` (generally discouraged).
A notable design choice is BSON's array encoding—represented as standard BSON documents with string keys matching array indices ("0", "1", "2"). While simplifying internal representation (everything is a document), this adds overhead compared to more efficient array encodings.
BSON prioritizes traversability and in-place updates. Length prefixes enable field skipping during reads, while fixed-size numeric encodings simplify value modification without rewriting entire documents.
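To see that layout concretely, here is a hand-assembled sketch of the BSON bytes for the one-field document `{"a": 1}` (byte values per the BSON specification; the same map in CBOR takes four bytes):

```rust
fn main() {
    // BSON for {"a": 1} with "a" stored as an int32 -- 12 bytes total.
    let bson: [u8; 12] = [
        0x0C, 0x00, 0x00, 0x00, // int32 total document length = 12 (little-endian)
        0x10,                   // element type 0x10 = 32-bit integer
        0x61, 0x00,             // field name "a", null-terminated
        0x01, 0x00, 0x00, 0x00, // int32 value 1 (little-endian)
        0x00,                   // document terminator
    ];

    // The equivalent CBOR map: A1 (map of 1), 61 61 (text "a"), 01 (int 1).
    let cbor: [u8; 4] = [0xA1, 0x61, 0x61, 0x01];

    // The leading length is what lets MongoDB skip whole documents quickly.
    assert_eq!(bson[0] as usize, bson.len());
    println!("BSON: {} bytes, CBOR: {} bytes", bson.len(), cbor.len());
}
```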
Pros and Cons
BSON's primary strengths derive from its MongoDB integration. It enables faster document traversal than parsing JSON text, with richer data types (dates, binary data, ObjectIds, high-precision decimals) essential for database operations. It maintains JSON's schema flexibility while allowing MongoDB to build indexes on document fields for efficient querying.
However, BSON has notable limitations. Type and length prefixes, along with verbose array encoding, often make BSON documents larger than equivalent JSON, particularly for small documents. It's generally less space-efficient than MessagePack or Protobuf. Like most binary formats, it lacks human readability. Its extended types prevent lossless conversion to standard JSON, limiting interoperability. BSON remains largely confined to the MongoDB ecosystem and lacks built-in RPC mechanisms.
Comparison vs. CBOR
Both CBOR and BSON are schema-optional binary formats extending the JSON data model, but with different design priorities. BSON optimizes for database storage and traversal, using length prefixes and specialized types like `ObjectId` and `Decimal128`, sometimes sacrificing compactness. CBOR prioritizes conciseness and implementation simplicity for network transmission in constrained environments, typically achieving smaller message sizes. While BSON offers database-centric types, CBOR employs a more general type system extended through standardized tags (for dates, bignums, etc.). BSON remains closely tied to MongoDB, whereas CBOR exists as an IETF standard (RFC 8949) used across various internet protocols.
BSON's design clearly reflects its purpose as MongoDB's internal format. The need for rapid field access drove the inclusion of length prefixes, while database requirements dictated specialized types like `Date`, `BinData`, and `ObjectId`. These adaptations make BSON more than just binary JSON—it's an extended format tailored for database operations. This specialization benefits MongoDB but creates trade-offs in size and general interoperability compared to formats designed for broader use cases. The term "Binary JSON" can therefore be somewhat misleading, as its extended types prevent guaranteed lossless round-tripping with standard JSON.
Protocol Buffers: Schema-Driven Performance
Protocol Buffers (Protobuf) originated at Google as a mechanism for serializing structured data, designed to be smaller, faster, and simpler than XML. Initially created for internal RPC and data storage, Google open-sourced it in 2008.
Design and Encoding
Protobuf takes a fundamentally schema-driven approach. Data structures ("messages") must be defined in `.proto` files using Protobuf's Interface Definition Language (IDL).
The workflow centers on the `protoc` compiler, which processes `.proto` files to generate source code in various languages (C++, Java, Python, Go, C#, etc.). This generated code provides message classes with methods for field access, serialization, and parsing.
The binary format prioritizes compactness and speed. Instead of field names, each field uses a unique field number (tag) paired with a wire type indicating the encoding method. Wire types specify how much data to read (e.g., `VARINT` for variable-length integers, `LEN` for length-delimited data like strings).
Encoding techniques include Varints (using fewer bytes for smaller numbers) and ZigZag encoding (for efficient negative number representation). The data model supports numerous scalar types (`int32`, `uint64`, `bool`, `string`, etc.), nested messages, `repeated` fields (arrays), `map` fields (key-value pairs), and `oneof` (mutually exclusive fields).
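A minimal sketch of those two techniques in plain Rust (not generated Protobuf code), showing why small magnitudes stay short on the wire:

```rust
/// Encode an unsigned integer as a varint: little-endian base-128 groups,
/// with the high bit of each byte acting as a continuation flag.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// ZigZag-map a signed integer so small negative values become small
/// unsigned values, which then encode as short varints.
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, [0xAC, 0x02]); // the classic two-byte example

    assert_eq!(zigzag(-1), 1);
    assert_eq!(zigzag(1), 2);
    println!("varint(300) = {:02X?}", buf);
}
```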
Protobuf handles schema evolution well. As long as field numbers remain stable, developers can typically add or remove optional/repeated fields without breaking compatibility. Parsers skip unknown fields, enabling forward compatibility. However, changing field types is generally unsafe, and using `required` fields (discouraged in newer versions) limits evolution flexibility.
Pros and Cons
Protobuf's advantages derive from its schema-driven approach, delivering high performance with compact message sizes by replacing field names with numeric tags. The schema and generated code provide compile-time type safety and simplified data access. Its evolution capabilities allow systems to change without breaking compatibility. Language-neutral code generation suits polyglot environments.
However, these schema requirements create notable limitations. Protobuf data isn't self-describing—the `.proto` definition is essential for interpreting the binary data. The format isn't human-readable. The workflow requires compiling `.proto` files and managing generated code, reducing flexibility for dynamic data structures. It can be suboptimal for very large messages (over a few megabytes) or multi-dimensional numeric arrays common in scientific computing. While widely adopted, Protobuf lacks formal standardization by bodies like IETF or W3C.
Comparison vs. CBOR
The fundamental difference is their schema approach. Protobuf mandates schemas (`.proto` files) and compilation. CBOR is schema-optional with self-describing data containing embedded type indicators. While CBOR supports validation with schema languages like CDDL, schemas aren't required for basic parsing.
This creates distinctions in self-description (CBOR yes, Protobuf no), encoding strategy (CBOR uses type indicators with string map keys; Protobuf uses numeric field tags and wire types), flexibility (CBOR higher, Protobuf more rigid but safer), and extensibility (CBOR uses IANA-registered tags, Protobuf uses `.proto`-defined options/extensions).
Performance comparisons are nuanced. Protobuf excels in speed and size, particularly for RPC with pre-shared schemas. CBOR also prioritizes efficiency, especially minimizing codec size for constrained devices. Results depend heavily on data, implementation quality, and use case. For standardization, CBOR is an IETF standard (RFC 8949), while Protobuf remains a Google-driven de facto standard.
Protobuf's philosophy achieves performance, compactness, and type safety through mandatory schemas and code generation—highly effective in controlled environments where schema management is feasible. This tight coupling yields efficiency gains but sacrifices the flexibility and self-description offered by formats like JSON or CBOR. The trade-off is clear: Protobuf prioritizes performance and structural rigidity, whereas CBOR favors flexibility and self-description while maintaining binary efficiency.
MessagePack: The Compact JSON Alternative
MessagePack emerged around 2008-2009, created by Sadayuki Furuhashi. Its goal was to provide a more efficient binary serialization format than JSON – "like JSON, but fast and small." It addresses scenarios where JSON's verbosity creates bottlenecks, such as network communication (RPC, message queues) and data caching (e.g., in Memcached).
Design and Encoding
MessagePack defines a binary format mirroring JSON's fundamental data types (null, boolean, integer, floating-point, string, array, map) while enabling transparent conversion between formats.
Beyond JSON types, MessagePack adds:
- `bin` (Binary Data): Efficient storage for raw byte sequences.
- `ext` (Extension Type): Mechanism for application-specific types, consisting of an integer type code (tag) and a byte string payload.
The encoding prioritizes compactness. Small integers can be encoded in a single byte. Short strings need only a length prefix followed by UTF-8 bytes. Arrays and maps include their element count as a prefix. Unlike JSON, MessagePack allows any data type as map keys, not just strings. Data types and lengths are indicated by initial encoded bytes.
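For a sense of that compactness, here is a hand-assembled sketch of the MessagePack bytes for `{"a": 1}` (following the fixmap/fixstr/fixint rules in the MessagePack specification):

```rust
fn main() {
    // JSON: {"a":1} is 7 bytes of text.
    let json = br#"{"a":1}"#;

    // MessagePack encodes the same map in 4 bytes:
    let msgpack: [u8; 4] = [
        0x81,       // fixmap with 1 entry
        0xA1, 0x61, // fixstr of length 1: "a"
        0x01,       // positive fixint 1
    ];

    println!("JSON: {} bytes, MessagePack: {} bytes", json.len(), msgpack.len());
}
```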
Pros and Cons
MessagePack delivers greater efficiency than JSON through smaller serialized output, optimized type encodings, potentially faster network transmission, and reduced storage requirements. Serialization and deserialization can outperform standard JSON libraries, though actual performance depends on implementations and data characteristics. It supports native binary data with an extension mechanism for custom types and offers implementations across numerous programming languages.
However, MessagePack sacrifices human-readability, complicating debugging. A significant limitation affects streaming: since arrays and maps require upfront element counts, streaming serialization becomes difficult when total counts aren't known in advance, potentially requiring complete in-memory buffering. While often faster than JSON, the margin varies with implementation quality and optimization. Compared to CBOR, MessagePack lacks formal standardization through bodies like IETF (its specification resides on GitHub), and its `ext` mechanism provides less structure than CBOR's IANA-registered tags.
Comparison vs. CBOR
CBOR and MessagePack both aim to be efficient, schema-less binary alternatives to JSON with native binary data support, but differ in key aspects:
- Encoding Details: CBOR supports indefinite-length arrays and maps (beneficial for streaming when total size is unknown), while MessagePack typically requires fixed collection counts.
- Standardization: CBOR is a formal IETF standard (RFC 8949) developed through consensus, whereas MessagePack uses a community-maintained specification. Many view CBOR as a more rigorous standard inspired by MessagePack.
- Extensibility: CBOR employs a standardized semantic tag system with an IANA registry for extended types (dates, URIs, bignums). MessagePack uses a simpler but less structured `ext` type where applications define tag meanings.
- Performance and Size: Comparisons vary by implementation and data. CBOR prioritizes small codec size (for constrained devices) alongside message compactness, while MessagePack focuses primarily on message size and speed.
- Conceptual Simplicity: MessagePack's shorter specification appears simpler, but CBOR's unification of types under its major type/additional info system and tag mechanism offers conceptual clarity.
MessagePack pioneered the "binary JSON" concept to improve network performance, optimizing for complete, known data structures rather than streaming scenarios. Its widespread adoption demonstrates market demand. However, CBOR's formal standardization, streaming support through indefinite-length items, and standardized tag registry target broader applications, particularly for constrained devices and internet protocols.
Avro: Mastering Schema Evolution
Apache Avro emerged from Apache Hadoop around 2009, designed specifically to address schema evolution challenges in large-scale data processing systems. In environments like Hadoop or Kafka data pipelines, where producers and consumers evolve independently, Avro enables seamless schema changes without breaking compatibility. It offers rich data structures and integrates easily with dynamic languages, without requiring code generation.
Design and Encoding
Avro is schema-based, with schemas typically defined in JSON (though an alternative C-like IDL is available). A fundamental aspect of Avro is that the schema used to write data is always required to read that data. The binary encoding contains no field names or type identifiers—just concatenated field values in schema-defined order. This creates compact data that depends entirely on the schema for interpretation. Writer schemas typically accompany the data in file headers or through schema registry services. Avro also supports JSON encoding for debugging purposes.
Avro includes primitive types (`null`, `boolean`, `int`, `long`, `float`, `double`, `bytes`, `string`) and complex types (`record`, `enum`, `array`, `map`, `union`, `fixed`). Records contain named fields, arrays hold sequences, maps store key-value pairs (string keys only), and unions allow values of several specified types—commonly used for optional fields by including `null` (e.g., `["null", "string"]`).
Avro's strength lies in its well-defined schema evolution rules:
- Fields can be added or removed only if they have a default value, which readers use when the field is missing.
- Field renaming uses `aliases` in the reader's schema to recognize data written with old names.
- Type changes are generally forbidden, with limited exceptions (e.g., `int` to `long`).
- For enums, adding symbols is backward compatible; removing or renaming breaks compatibility.
When reading data with a different but compatible schema, Avro uses schema resolution—comparing field names (and aliases) and applying defaults to present data according to the reader's schema.
Pros and Cons
Avro's main advantage is sophisticated schema evolution handling, making it ideal for systems with frequent or independent schema changes. JSON-defined schemas are relatively easy to manage. The binary encoding is compact since it omits field names and tags. Avro integrates well with dynamic languages when schemas are available at runtime. It has strong adoption within the Apache ecosystem, particularly Hadoop, Spark, and Kafka.
The primary disadvantage is requiring the writer's schema during deserialization, introducing schema management complexity and often necessitating a schema registry. While compact, some benchmarks suggest Avro may be slower than Protobuf in certain scenarios. The binary format is not human-readable, and developers must carefully follow schema evolution rules to maintain compatibility.
Comparison vs. CBOR
Avro and CBOR represent fundamentally different schema philosophies. Avro requires schemas for reading and writing, with design centered on schema resolution. CBOR is schema-optional and self-describing; schemas (like CDDL) can validate but aren't needed for parsing.
This affects encoding: Avro omits field identifiers, relying on schema field order. CBOR includes type information and map keys, making it interpretable without external schemas.
Avro handles schema evolution explicitly through resolution rules, defaults, and aliases. CBOR's self-describing nature allows parsers to skip unknown data, but complex changes may require application-level logic or tag conventions. CBOR offers greater ad-hoc flexibility, while Avro enforces structure through schemas. Their ecosystems also differ—Avro dominates Big Data/Apache contexts, while CBOR prevails in IoT and IETF protocols.
Avro's design clearly optimizes for schema evolution in large-scale, long-lived data systems. By requiring the writer's schema at read time, it enables powerful resolution capabilities, allowing independent producer and consumer evolution. This contrasts with Protobuf's reliance on stable tag numbers and CBOR's schema-optional flexibility. The trade-off is explicit: Avro gains robust evolution and dynamic language integration, but requires schema management and produces data that's not self-contained.
Comparative Analysis: Choosing the Right Tool
Having examined several binary serialization formats, it's clear that each addresses specific needs in the data interchange landscape. BSON optimizes for MongoDB's database operations. Protocol Buffers achieves high performance and type safety for RPC through mandatory schemas. MessagePack provides a compact binary alternative to JSON for network communication. Avro specializes in managing schema evolution for data pipelines. CBOR offers a standardized, binary-efficient encoding of the JSON data model with emphasis on constrained environments and extensibility.
No single format suits all use cases. The optimal choice depends on specific application requirements. Key decision factors include schema requirements (mandatory vs. optional), performance needs vs. flexibility, schema evolution complexity, ecosystem compatibility, and specialized features like native data types or standardized extensibility mechanisms.
The following table summarizes the key distinctions between these formats:
Feature | CBOR | BSON | Protocol Buffers | MessagePack | Avro |
---|---|---|---|---|---|
Origin/Primary Goal | IETF / Constrained Env Efficiency | MongoDB / DB Storage & Traversal | Google / RPC Performance & Size | Furuhashi / JSON Alternative (Speed/Size) | Apache / Schema Evolution |
Schema Handling | Optional | Optional | Required (.proto IDL) | Optional | Required (JSON or IDL) |
Schema Location | N/A or Separate (e.g., CDDL) | N/A | Separate (.proto file) | N/A | With Data (Files) or Registry |
Self-Describing? | Yes | Yes | No | Yes | No (Binary requires schema) |
Encoding Basis | JSON Model + Tags | Extended JSON Model | Schema Tags/Numbers | JSON Model + ext type | Schema Field Order |
Extensibility | IANA Tags | Custom Types (DB-centric) | Proto Extensions/Options | ext type | Schema Evolution Rules |
Schema Evolution | Implicit (Tags/Skipping) | Implicit | Explicit (Tag Stability) | Implicit (`ext`/Skipping) | Explicit (Resolution, Defaults, Aliases) |
Typical Size | Compact | Variable (can be large) | Very Compact | Compact | Compact (Binary) |
Typical Speed | Fast (esp. constrained codec) | Fast Traversal (DB context) | Very Fast (RPC context) | Fast | Fast |
Standardization | IETF RFC 8949 | De facto (MongoDB) | De facto (Google) | Community Spec | Apache Project |
Primary Use Cases | IoT, CoAP, COSE, Security, Deterministic Needs | MongoDB | RPC, Microservices, Internal Comms | Network Comms, Caching, RPC | Big Data (Hadoop, Kafka), Data Pipelines |
Note: Size and speed comparisons are general tendencies; actual performance depends heavily on data structure, implementation quality, and specific workload.
This comparison highlights the complex trade-offs between formats. Protocol Buffers excels when validation, compactness, and RPC performance are critical in environments where schema management is feasible. Avro offers superior schema evolution capabilities for large-scale data pipelines, despite requiring schema distribution mechanisms. BSON serves specialized needs within the MongoDB ecosystem. MessagePack provides an efficient binary alternative to JSON for network communication, though with potential streaming limitations. CBOR stands out when IETF standardization, constrained device support, binary-efficient JSON encoding, standardized extensibility, or deterministic encoding are priorities.
Why Choose CBOR?
Based on the preceding comparisons, CBOR presents a unique combination of features that make it the preferred choice in several specific contexts:
-
JSON Data Model Fidelity in Binary: CBOR provides a direct binary encoding for the familiar JSON data model. This lowers the adoption barrier for developers already comfortable with JSON, unlike formats requiring different structural concepts or mandatory schemas.
-
Efficiency for Constrained Environments: CBOR was explicitly designed for the Internet of Things and constrained environments. This yields encoders and decoders with small code footprints, efficient processing, and significantly reduced message sizes compared to JSON—all critical for resource-limited devices.
-
IETF Standardization and Integration: As an IETF standard (RFC 8949), CBOR benefits from rigorous review and a stable specification. It integrates within the broader internet protocol ecosystem, serving as a payload format in CoAP and forming the basis of COSE (CBOR Object Signing and Encryption), crucial for security in constrained environments.
-
Standardized Extensibility via Tags: CBOR includes a well-defined mechanism for extending the basic data model using semantic tags. These IANA-registered tags provide standardized ways to represent richer semantics while allowing basic decoders to skip tags they don't understand. This offers a more structured approach than MessagePack's
ext
type. -
Schema-Optional Flexibility: CBOR remains schema-optional like JSON. Data is self-describing, allowing for parsing without predefined schemas—advantageous for evolving systems or ad-hoc data exchange. When validation is needed, external schema languages like CDDL (RFC 8610) can be employed.
-
Native Binary Data Support: CBOR includes a native byte string type, allowing efficient representation of binary data without inefficient text encodings like Base64 required by JSON.
-
Deterministic Encoding Potential: RFC 8949 Section 4.2 explicitly defines rules for deterministic encoding, ensuring the same data structure always serializes to identical byte sequences—critical for cryptographic applications where reproducibility is essential.
While CBOR offers these advantages, it's not human-readable like JSON. In high-performance RPC scenarios with fixed schemas, optimized Protobuf implementations might offer better raw performance. Though its ecosystem is growing, particularly in IoT and security domains, it might not have the breadth of tooling found for JSON or Protobuf in every application area.
CBOR occupies a compelling position in the serialization landscape—a standardized, extensible, and efficient binary format built on the widely understood JSON data model. Its design for constrained environments, IETF protocol integration, and support for deterministic encoding make it well-suited for IoT, secure communication, and verifiable data structures, all without imposing the mandatory schemas found in Protocol Buffers or Avro.
CBOR as a Foundation for Blockchain Commons
Blockchain Commons' specifications, including dCBOR (Deterministic CBOR) and Gordian Envelope, build directly on CBOR primarily due to its deterministic encoding capabilities.
Gordian Envelope, a "smart documents" format containing cryptographic material like keys and verifiable credentials, relies on cryptographic hashing for data integrity and selective disclosure. These functions require deterministic serialization—identical semantic data must produce identical byte sequences when encoded.
CBOR's RFC 8949 explicitly defines a "Deterministically Encoded CBOR" profile that mandates preferred integer encodings and lexicographically ordered map keys. This standardized approach to determinism gives CBOR a significant advantage over JSON (which lacks universal canonicalization) and other binary formats where determinism isn't prioritized.
While RFC 8949 established deterministic guidelines, Blockchain Commons identified remaining ambiguities that could lead to inconsistent implementations. Their dCBOR application profile, documented as an IETF Internet-Draft, further refines these rules by rejecting duplicate map keys and establishing precise numeric reduction rules to ensure values like 10, 10.0, and 10.00 encode identically.
Beyond determinism, CBOR offered additional advantages: structured binary representation suitable for cryptographic data, conciseness, standardized tag-based extensibility, IETF standardization, compatibility with constrained environments (important for hardware wallets), and platform independence.
CBOR thus provided the standardized deterministic foundation that Blockchain Commons refined through dCBOR to build secure, interoperable systems like Gordian Envelope—topics covered in later chapters.
Conclusion: A Diverse Binary Ecosystem
The evolution from XML to binary formats like BSON, Protocol Buffers, MessagePack, Avro, and CBOR reflects a landscape where no single "best" serialization format exists. Each represents specific design choices optimized for particular contexts.
- BSON prioritizes efficient storage and traversal in MongoDB, extending JSON with specialized types at the cost of compactness and broader interoperability.
- Protocol Buffers achieves performance and compactness for RPC through mandatory schemas and code generation, trading flexibility and self-description.
- MessagePack offers a compact binary JSON alternative for network communication, despite potential streaming limitations.
- Avro excels at schema evolution in data pipelines, requiring schema availability but providing robust compatibility features.
- CBOR delivers an IETF-standardized, binary-efficient JSON encoding that balances flexibility with performance, offering standardized extensibility and serving constrained environments and deterministic encoding needs.
These diverse formats will continue to coexist, with developers selecting tools that match their project requirements. CBOR's position as a standardized, efficient format based on the JSON model ensures its relevance, particularly for IoT, secure systems, and verifiable data structures.
A Practical Introduction to CBOR
From Comparison to Construction
In the previous chapter, we explored the diverse landscape of binary serialization formats, comparing CBOR to its contemporaries like BSON, Protocol Buffers, MessagePack, and Avro. We saw how each format emerged from different needs and design philosophies, resulting in distinct trade-offs between schema requirements, performance, compactness, and features like schema evolution. CBOR, standardized by the IETF as RFC 8949, carved out its niche by providing a binary encoding based on the familiar JSON data model, optimized for efficiency (especially in constrained environments like IoT), extensibility, and standardization within internet protocols.
Having understood why CBOR exists and how it relates to other formats, we now shift our focus to how it works. This chapter provides a practical introduction to the core mechanics of CBOR encoding. The goal is not to replicate the exhaustive detail of RFC 8949, but rather to quickly equip engineers with a solid working understanding of how fundamental data types are represented in CBOR.
⚠️ NOTE: Wherever this book may conflict with RFC 8949, the RFC is authoritative. This book is intended to be a practical guide, not a definitive reference. We will also use the term "CBOR" interchangeably to refer to both the encoding and the data model, unless otherwise specified.
We will progressively build up understanding by examining common data structures, comparing their representation in:
- JSON: The familiar text-based format.
- CBOR Diagnostic Notation: A human-readable text format, similar to JSON but extended for CBOR's features, used for documentation and debugging.
- Hexadecimal CBOR: The actual binary representation shown as hexadecimal bytes, which is how CBOR data is transmitted or stored.
We will focus on the most common, definite-length encodings and the concept of "preferred serialization" – using the shortest possible form where choices exist. Advanced topics such as semantic tags (Major Type 6), indefinite-length encoding, full deterministic encoding rules (beyond preferred serialization), schema definition with CDDL, and CBOR sequences will be introduced in later chapters. By the end of this chapter, you should be able to look at simple CBOR byte sequences and understand the data they represent.
✅ TIP: The CBOR Playground is an excellent tool if you would like to follow along with the examples, converting CBOR Diagnostic Notation to binary and back.
The Core Encoding: Major Types and Additional Information
At the heart of CBOR's encoding lies a simple yet powerful structure contained within the first byte (and potentially subsequent bytes) of any data item. This initial byte conveys two crucial pieces of information:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ 7 │ 6 │ 5 │ 4 │ 3 │ 2 │ 1 │ 0 │
├────┴────┴────┼────┴────┴────┴────┴────┤
│ MAJOR TYPE │ ADDITIONAL INFORMATION │
└──────────────┴────────────────────────┘
- Major Type (MT): The high-order 3 bits (bits 5, 6, and 7) define the general category of the data item. There are 8 major types (0 through 7).
- Additional Information (AI): The low-order 5 bits (bits 0 through 4) provide specific details about the data item, whose meaning depends on the Major Type. This can range from encoding the entire value directly (for small integers or simple constants) to indicating the length of subsequent data or specifying the precision of a floating-point number.
This initial byte structure allows a CBOR decoder to immediately understand the fundamental type and size characteristics of the data item it is encountering, enabling efficient parsing without requiring a predefined schema. All multi-byte numerical values in CBOR are encoded in network byte order (big-endian).
Let's break down the Major Types and see how the Additional Information works for each:
Major Type | Bits (MT) | Meaning | Notes |
---|---|---|---|
0 | 000 | Unsigned Integer | Values from 0 to 2⁶⁴−1 |
1 | 001 | Negative Integer | Encodes −1 − n, where n is the encoded argument |
2 | 010 | Byte String | Sequence of raw bytes |
3 | 011 | Text String | UTF-8 encoded string |
4 | 100 | Array | Ordered list of data items |
5 | 101 | Map | Pairs of keys and values |
6 | 110 | Tag | Semantic qualifier for the following item |
7 | 111 | Simple Values / Floating-Point | Booleans, null, undefined, floats, etc. |
The Additional Information values (0-31) modify the meaning of the Major Type:
AI Value | Bits (AI) | Meaning |
---|---|---|
0–23 | 00000 –10111 | Value or length is encoded directly (literal value) |
24 | 11000 | Next 1 byte contains the value or length (uint8) |
25 | 11001 | Next 2 bytes contain the value or length (uint16) |
26 | 11010 | Next 4 bytes contain the value or length (uint32) |
27 | 11011 | Next 8 bytes contain the value or length (uint64) |
28–30 | 11100 –11110 | Reserved for future use |
31 | 11111 | Indefinite-length indicator (MT 2–5) or “break” stop code (MT 7) |
AI values 0-27 are used for encoding the length of the data item or the value itself, with 24-27 called the 1+1
, 1+2
, 1+4
, and 1+8
encodings, respectively. The AI value 31 is used for indefinite-length items, which we will cover in a later chapter.
┌──────┐
1 │ 0-23 │
└──────┘
┌────┐┌────┐
1+1 │ 24 ││ │
└────┘└────┘
┌────┐┌────┬────┐
1+2 │ 25 ││ │ │
└────┘└────┴────┘
┌────┐┌────┬────┬────┬────┐
1+4 │ 26 ││ │ │ │ │
└────┘└────┴────┴────┴────┘
┌────┐┌────┬────┬────┬────┬────┬────┬────┬────┐
1+8 │ 27 ││ │ │ │ │ │ │ │ │
└────┘└────┴────┴────┴────┴────┴────┴────┴────┘
Understanding this MT/AI structure is the key to decoding CBOR. We will now see it in action as we explore specific data types. An appendix contains a table of all 256 possible Major Type and Additional Information combinations.
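Before moving on, it may help to see the split in code. The following minimal Rust sketch (hand-rolled, not using the dcbor crate or any other library; the function name is illustrative) extracts the Major Type and Additional Information from an initial byte:

```rust
/// Split a CBOR initial byte into its Major Type (high-order 3 bits)
/// and Additional Information (low-order 5 bits).
fn split_initial_byte(byte: u8) -> (u8, u8) {
    let major_type = byte >> 5;        // bits 5-7
    let additional_info = byte & 0x1f; // bits 0-4
    (major_type, additional_info)
}

fn main() {
    // 0x83 starts an array (Major Type 4) of 3 elements (AI 3).
    assert_eq!(split_initial_byte(0x83), (4, 3));
    // 0x19 starts an unsigned integer (Major Type 0) whose value
    // follows in the next 2 bytes (AI 25).
    assert_eq!(split_initial_byte(0x19), (0, 25));
    println!("initial-byte checks passed");
}
```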
Simple Scalar Types: Integers, Booleans, and Null
Let's start with the simplest data types common to JSON and CBOR: integers, booleans, and null.
Integers (Major Types 0 & 1)
CBOR distinguishes between unsigned integers (Major Type 0) and negative integers (Major Type 1). The Additional Information determines how the integer's value (or argument) is encoded.
- Small Integers (0-23): If the unsigned integer is between 0 and 23 inclusive, it's encoded directly in the Additional Information bits of the initial byte (Major Type 0).
- Larger Integers: For values 24 or greater, the Additional Information takes the value 24, 25, 26, or 27, indicating that the actual integer value follows in the next 1, 2, 4, or 8 bytes, respectively, in network byte order (big-endian).
- Negative Integers: Encoded using Major Type 1. The value encoded is `−1 − argument`. So an argument of 0 represents the integer `-1`, an argument of 9 represents `-10`, and so on. The argument itself is encoded using the same rules as unsigned integers (AI 0-23 for arguments 0-23, AI 24-27 for larger arguments).
Preferred Serialization: CBOR allows multiple ways to encode the same number (e.g., the number 10
could theoretically be encoded using 1, 2, 4, or 8 bytes following an initial byte with AI 24, 25, 26, or 27). However, the standard strongly recommends preferred serialization, which means always using the shortest possible form. This avoids ambiguity and unnecessary padding. For non-negative integers, this means:
Value Range | AI Value | Bytes Used After Initial Byte | Total Encoding Size |
---|---|---|---|
0–23 | 0–23 | 0 | 1 byte |
24–255 | 24 | 1 | 2 bytes |
256–65,535 | 25 | 2 | 3 bytes |
65,536–4,294,967,295 | 26 | 4 | 5 bytes |
4,294,967,296–2⁶⁴−1 | 27 | 8 | 9 bytes |
The same principle applies to the argument for negative integers.
Examples (Preferred Serialization):
JSON | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
0 | 0 | 00 | 0 | 0 | Value 0 directly encoded |
10 | 10 | 0a | 0 | 10 | Value 10 directly encoded |
23 | 23 | 17 | 0 | 23 | Value 23 directly encoded |
24 | 24 | 18 18 | 0 | 24 | Value in next byte; 0x18 = 24 |
100 | 100 | 18 64 | 0 | 24 | Value in next byte; 0x64 = 100 |
1000 | 1000 | 19 03e8 | 0 | 25 | Value in next 2 bytes; 0x03e8 = 1000 |
1000000 | 1000000 | 1a 000f4240 | 0 | 26 | Value in next 4 bytes; 0x000f4240 = 1,000,000 |
-1 | -1 | 20 | 1 | 0 | -1 = -1 - 0 |
-10 | -10 | 29 | 1 | 9 | -10 = -1 - 9 |
-100 | -100 | 38 63 | 1 | 24 | Argument in next byte; 0x63 = 99 → -1 - 99 = -100 |
-1000 | -1000 | 39 03e7 | 1 | 25 | Argument in next 2 bytes; 0x03e7 = 999 → -1000 |
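If you want to see preferred serialization as code, here is a minimal Rust sketch (hand-rolled, with illustrative names; not the dcbor crate) that encodes a major type and its argument using the shortest available head. Negative integers reuse the same helper by passing Major Type 1 and the argument n, where the value is −1 − n.

```rust
/// Encode a CBOR head (initial byte plus argument) using preferred
/// serialization: the shortest encoding that can hold `argument`.
fn encode_head(major_type: u8, argument: u64) -> Vec<u8> {
    let mt = major_type << 5;
    match argument {
        0..=23 => vec![mt | argument as u8],
        24..=0xff => vec![mt | 24, argument as u8],
        0x100..=0xffff => {
            let mut out = vec![mt | 25];
            out.extend_from_slice(&(argument as u16).to_be_bytes());
            out
        }
        0x1_0000..=0xffff_ffff => {
            let mut out = vec![mt | 26];
            out.extend_from_slice(&(argument as u32).to_be_bytes());
            out
        }
        _ => {
            let mut out = vec![mt | 27];
            out.extend_from_slice(&argument.to_be_bytes());
            out
        }
    }
}

fn main() {
    assert_eq!(encode_head(0, 10), vec![0x0a]);               // 10
    assert_eq!(encode_head(0, 1000), vec![0x19, 0x03, 0xe8]); // 1000
    assert_eq!(encode_head(1, 99), vec![0x38, 0x63]);         // -100 = -1 - 99
    println!("integer encoding checks passed");
}
```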
Booleans and Null (Major Type 7)
CBOR uses Major Type 7 for various simple values and floating-point numbers. The boolean values true
and false
, and the null
value, have specific, fixed Additional Information values.
JSON | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
false | false | f4 | 7 | 20 | Simple value: false |
true | true | f5 | 7 | 21 | Simple value: true |
null | null | f6 | 7 | 22 | Simple value: null |
CBOR also defines an undefined
simple value (f7
, MT 7, AI 23), which doesn't have a direct equivalent in standard JSON but may be useful in certain protocols.
Strings: Bytes and Text
CBOR has distinct types for byte strings (arbitrary sequences of bytes) and text strings (sequences of Unicode characters encoded as UTF-8). This is a key advantage over JSON, which lacks native binary support and typically requires base64 encoding for binary data.
Byte Strings (Major Type 2)
Byte strings use Major Type 2. The Additional Information encodes the length of the string in bytes, following the same rules as unsigned integers (AI 0-23 for lengths 0-23, AI 24-27 + subsequent bytes for longer lengths). The raw bytes of the string immediately follow the initial byte(s).
Examples: Definite Length Byte Strings
In CBOR diagnostic notation, byte strings are represented using hexadecimal encoding prefixed with h
and enclosed in single quotes.
Description | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
Empty byte string | h'' | 40 | 2 | 0 | Length 0 bytes |
Bytes 0x01, 0x02, 0x03 | h'010203' | 43 010203 | 2 | 3 | Length 3 bytes; followed by 01 02 03 |
24 bytes (e.g., all 0x00 ) | h'…' | 58 18 … | 2 | 24 | Length in next byte; 0x18 = 24; followed by 24 bytes |
Text Strings (Major Type 3)
Text strings use Major Type 3 and are explicitly defined as UTF-8 encoded Unicode strings. The Additional Information (AI) specifies the length in bytes of the UTF-8 encoding, not the number of Unicode characters, which can be (and often is) different. In diagnostic notation, text strings are enclosed in double quotes (like JSON).
Examples: Definite Length Text Strings
JSON | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
"" | "" | 60 | 3 | 0 | Empty string; length 0 |
"a" | "a" | 61 61 | 3 | 1 | Length 1 byte; 0x61 = 'a' |
"hello" | "hello" | 65 68656c6c6f | 3 | 5 | Length 5 bytes; 68 65 6c 6c 6f = 'hello' |
"IETF" | "IETF" | 64 49455446 | 3 | 4 | Length 4 bytes; 49 45 54 46 = 'IETF' |
"ü" | "ü" | 62 c3bc | 3 | 2 | Length 2 bytes; c3 bc is UTF-8 for 'ü' |
"你好" | "你好" | 66 e4bda0e5a5bd | 3 | 6 | Length 6 bytes; e4 bd a0 e5 a5 bd is UTF-8 for '你好' |
⚠️ NOTE: CBOR does not perform string escaping like JSON does (e.g., for quotes or backslashes). Since the length is provided upfront, the decoder knows exactly how many bytes constitute the string content. So the string `"Hello"`, including the quotes, is seven bytes long, and the CBOR encoding would be eight bytes:
67 # Text(7 bytes)
2248656C6C6F22 # "Hello"
If you use the CBOR Playground to convert this to Diagnostic Notation, you'll get:
"\"Hello\""
So backslash escapes are part of CBOR Diagnostic Notation, but not part of the CBOR encoding itself.
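A short, hand-rolled Rust sketch (illustrative names; limited to strings of at most 255 UTF-8 bytes) makes the point concrete: the encoder writes a Major Type 3 head carrying the UTF-8 byte length and then copies the bytes verbatim, with no escaping.

```rust
/// Encode a &str as a definite-length CBOR text string (Major Type 3).
/// For brevity this sketch only handles lengths up to 255 bytes
/// (AI 0-23 and the 1+1 form with AI 24).
fn encode_text(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes(); // UTF-8 byte length, not character count
    let mut out = match bytes.len() {
        len @ 0..=23 => vec![0x60 | len as u8],  // MT 3, length in AI
        len @ 24..=255 => vec![0x78, len as u8], // MT 3, AI 24, length byte
        _ => unimplemented!("longer strings use AI 25-27"),
    };
    out.extend_from_slice(bytes); // raw bytes, no escaping
    out
}

fn main() {
    assert_eq!(encode_text("IETF"), vec![0x64, 0x49, 0x45, 0x54, 0x46]);
    // "ü" is one character but two UTF-8 bytes, so the length is 2.
    assert_eq!(encode_text("ü"), vec![0x62, 0xc3, 0xbc]);
    // The quote characters in "\"Hello\"" are ordinary content bytes.
    assert_eq!(encode_text("\"Hello\"").len(), 8);
    println!("text string encoding checks passed");
}
```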
Collections: Arrays and Maps
CBOR supports ordered sequences of items (arrays) and unordered key-value pairs (maps), mirroring JSON's structures but with some key differences. This section focuses on definite-length collections, where the number of elements or pairs is known upfront.
Arrays (Major Type 4)
Arrays use Major Type 4. The Additional Information encodes the number of data items (elements) contained within the array, using the same encoding rules as unsigned integers (AI 0-23 for counts 0-23, AI 24-27 + subsequent bytes for larger counts). The encoded data items follow the initial byte(s) in sequence.
Like JSON, CBOR Diagnostic Notation uses square brackets []
with comma-separated elements.
Examples: Definite Length Arrays
Arrays in CBOR use Major Type 4. The Additional Information (AI) specifies the number of elements in the array. The elements are encoded sequentially after the initial byte.
JSON | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
[] | [] | 80 | 4 | 0 | Array with 0 elements |
[1, 2, 3] | [1, 2, 3] | 83 01 02 03 | 4 | 3 | Array with 3 elements; 01 , 02 , 03 encode integers 1, 2, and 3 |
[true, null] | [true, null] | 82 f5 f6 | 4 | 2 | Array with 2 elements; f5 = true, f6 = null |
no equivalent | ["a", h'01'] | 82 61 61 41 01 | 4 | 2 | Array with 2 elements; 61 61 = "a", 41 01 = byte string h'01' |
Maps (Major Type 5)
Maps (also known variously as dictionaries or associative arrays) use Major Type 5. The Additional Information encodes the number of pairs in the map (not the total number of keys and values). Again, the encoding follows the rules for unsigned integers. The key-value pairs follow the initial byte(s), with each key immediately followed by its corresponding value (key1, value1, key2, value2,...).
A significant difference from JSON is that CBOR map keys can be any CBOR data type (integers, strings, arrays, etc.), not just text strings.
Examples: Definite Length Maps
CBOR maps use Major Type 5. The Additional Information (AI) specifies the number of key-value pairs. Keys and values follow in alternating sequence. Diagnostic notation uses curly braces {}
with comma-separated key: value
pairs.
JSON | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
{} | {} | a0 | 5 | 0 | Map with 0 key-value pairs |
{"a": 1} | {"a": 1} | a1 61 61 01 | 5 | 1 | 1 pair: key "a" (61 61 ), value 1 (01 ) |
{"a": 1, "b": 2} | {"a": 1, "b": 2} | a2 61 61 01 61 62 02 | 5 | 2 | 2 pairs: "a" →1 , "b" →2 ; encoded in sequence |
no equivalent | {1: "one", 2: "two"} | a2 01 63 6f6e65 02 63 74776f | 5 | 2 | 2 pairs: 1 →"one" , 2 →"two" ; strings encoded as 63 (length 3) |
⚠️ NOTE: Although map entries have to be serialized in some order, CBOR maps are considered orderless. This means that CBOR decoders and applications will typically not treat the order of pairs as significant, and neither should you. Similarly, nothing in the core CBOR specification requires that map keys be unique. Theoretically you could have multiple pairs with the same key, but many implementations will simply keep one of the pairs and discard the others. You should therefore never rely on the behavior of particular implementations regarding the order of keys or duplicate keys. The deterministic encoding profiles we'll discuss later in this book address these ambiguities.
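To tie arrays and maps together, the following hand-rolled Rust sketch (illustrative example data, not a library API) assembles the map {"a": 1, "b": [2, 3]} byte by byte, showing how the pair count, keys, values, and a nested array head compose:

```rust
fn main() {
    // {"a": 1, "b": [2, 3]} assembled by hand.
    let mut cbor: Vec<u8> = Vec::new();
    cbor.push(0xa2);                       // map, 2 pairs (MT 5, AI 2)
    cbor.extend_from_slice(&[0x61, b'a']); // key "a": text, length 1
    cbor.push(0x01);                       // value 1 (MT 0, AI 1)
    cbor.extend_from_slice(&[0x61, b'b']); // key "b": text, length 1
    cbor.push(0x82);                       // value: array, 2 elements (MT 4, AI 2)
    cbor.push(0x02);                       // element 2
    cbor.push(0x03);                       // element 3
    assert_eq!(
        cbor,
        vec![0xa2, 0x61, 0x61, 0x01, 0x61, 0x62, 0x82, 0x02, 0x03]
    );
    // Prints "a26161016162820203".
    println!("{}", cbor.iter().map(|b| format!("{b:02x}")).collect::<String>());
}
```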
Floating-Point and Other Simple Values (Major Type 7)
Major Type 7 serves as a catch-all for simple values (like true
, false
, and null
, covered earlier) and floating-point numbers.
Floating-Point Numbers (Major Type 7)
CBOR supports IEEE-754 binary floating-point numbers in half, single, and double precision. The Additional Information (AI) field specifies the precision, and the bytes that follow are in network byte order (big-endian).
Precision | AI Value | Bytes After Initial Byte | Total Size | Notes |
---|---|---|---|---|
Half-precision | 25 | 2 bytes | 3 bytes | 16-bit float (float16 ) |
Single-precision | 26 | 4 bytes | 5 bytes | 32-bit float (float32 ) |
Double-precision | 27 | 8 bytes | 9 bytes | 64-bit float (float64 ) |
Preferred Serialization for Floating-Point Numbers
Similar to integers, preferred serialization for floating point values dictates using the shortest floating-point representation that can exactly encode a given value. If a number can be precisely represented in float16
, it is encoded that way instead of using float32
or float64
.
Value | CBOR Diagnostic | CBOR Hex | MT | AI | Explanation |
---|---|---|---|---|---|
0.0 | 0.0 | f9 00 00 | 7 | 25 | Half-precision (float16 ); zero |
1.0 | 1.0 | f9 3c 00 | 7 | 25 | Half-precision; 0x3c00 encodes 1.0 |
-1.5 | -1.5 | f9 be 00 | 7 | 25 | Half-precision; 0xbe00 encodes -1.5 |
100000.0 | 100000.0 | fa 47 c3 50 00 | 7 | 26 | Single-precision; 0x47c35000 encodes 100000.0 (too large for half-precision) |
1.1 | 1.1 | fb 3f f1 99 99 99 99 99 9a | 7 | 27 | Double-precision; only this width exactly encodes 1.1 |
3.14159 | 3.14159 | fb 40 09 21 f9 f0 1b 86 6e | 7 | 27 | Double-precision; no shorter width represents this value exactly |
1.0e+300 | 1.0e+300 | fb 7e 37 e4 3c 88 00 75 9c | 7 | 27 | Double-precision; high magnitude |
Infinity | Infinity | f9 7c 00 | 7 | 25 | Half-precision encoding for positive infinity |
NaN | NaN | f9 7e 00 | 7 | 25 | Half-precision encoding for NaN (payload may vary) |
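As a sketch of the shrinking check, the following hand-rolled Rust function chooses between single and double precision by round-tripping the value through f32. A complete implementation would also try half-precision (AI 25), which needs manual bit manipulation or a crate such as half and is omitted here for brevity.

```rust
/// Encode an f64 using the shortest IEEE-754 width that represents it
/// exactly. This sketch only chooses between single and double precision.
fn encode_float(value: f64) -> Vec<u8> {
    let as_f32 = value as f32;
    if f64::from(as_f32) == value {
        // Exact in 32 bits: MT 7, AI 26, then 4 bytes big-endian.
        let mut out = vec![0xfa];
        out.extend_from_slice(&as_f32.to_be_bytes());
        out
    } else {
        // Needs 64 bits: MT 7, AI 27, then 8 bytes big-endian.
        let mut out = vec![0xfb];
        out.extend_from_slice(&value.to_be_bytes());
        out
    }
}

fn main() {
    // 100000.0 is exact in single precision.
    assert_eq!(encode_float(100000.0), vec![0xfa, 0x47, 0xc3, 0x50, 0x00]);
    // 1.1 has no exact shorter representation, so double precision is used.
    assert_eq!(encode_float(1.1)[0], 0xfb);
    println!("float encoding checks passed");
}
```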
Other Simple Values
Besides false
, true
, null
, and undefined
(AI 20-23), Major Type 7 allows for simple values 0 through 19 (encoded directly with AI 0-19) and 32 through 255 (encoded using AI 24 followed by one byte). The specific meanings of these simple values are generally undefined by the core CBOR specification and are reserved for specific profiles or applications.
Value Range | Encoding Method | Semantics |
---|---|---|
0–19 | MT 7, AI = value (1-byte) | Reserved |
20 | MT 7, AI = 20 (0xf4) | false |
21 | MT 7, AI = 21 (0xf5) | true |
22 | MT 7, AI = 22 (0xf6) | null |
23 | MT 7, AI = 23 (0xf7) | undefined |
24 | MT 7, AI = 24, followed by 1 byte | Reserved |
25–27 | MT 7, AI = 25–27, followed by 2–8 bytes | Floating-point numbers |
28–30 | MT 7, AI = 28–30 | Reserved |
31 | MT 7, AI = 31 (0xff) | "break" stop code |
32–255 | MT 7, AI = 24, followed by 1 byte (value) | Reserved |
Notes:
- Values 0–19 are currently unassigned and reserved for future use.
- Values 20–23 represent the simple values
false
,true
,null
, andundefined
, respectively. - Value 24 is reserved and not used for encoding simple values.
- Values 25–27 are used to encode floating-point numbers of different precisions:
- 25: Half-precision (16-bit)
- 26: Single-precision (32-bit)
- 27: Double-precision (64-bit)
- Values 28–30 are reserved for future extensions.
- Value 31 is used as a "break" stop code to indicate the end of an indefinite-length item.
- Values 32–255 are currently unassigned; new assignments require registration in the IANA CBOR Simple Values registry (Specification Required).
For the most up-to-date information, refer to the IANA CBOR Simple Values registry.
Putting It Together: A Nested Example
Now let's combine these elements into a more complex, nested structure. Consider the following JSON object:
JSON
{
"name": "Gadget",
"id": 12345,
"enabled": true,
"parts": [
"bolt",
"nut"
],
"spec": {
"size": 10.5,
"data": "AQAA/w=="
}
}
Note that the "data"
value in JSON is base64 encoded, representing the bytes 0x01, 0x00, 0x00, 0xff
. In CBOR, we can represent this directly as a byte string.
CBOR Diagnostic Notation:
{
"name": "Gadget",
"id": 12345,
"enabled": true,
"parts": [
"bolt",
"nut"
],
"spec": {
"size": 10.5,
"data": h'010000ff'
}
}
CBOR Hexadecimal Encoding with Commentary:
a5 # map(5 pairs follow)
64 6e616d65 # key 0: text (4 bytes, "name")
66 476164676574 # value 0: text (6 bytes, "Gadget")
62 6964 # key 1: text (2 bytes, "id")
19 3039 # value 1: unsigned(12345)
67 656e61626c6564 # key 2: text (7 bytes, "enabled")
f5 # value 2: primitive(21) (true)
65 7061727473 # key 3: text(5 bytes, "parts")
82 # value 3: array(2 elements follow)
64 626f6c74 # element 0: text(4 bytes, "bolt")
63 6e7574 # element 1: text(3 bytes, "nut")
64 73706563 # key 4: text(4 bytes, "spec")
a2 # value 4: map(2 pairs follow)
64 73697a65 # key 0: text(4 bytes, "size")
f9 4940 # value 0: float(10.5) (half-precision)
64 64617461 # key 1: text(4 bytes, "data")
44 010000ff # value 1: bytes(4 bytes, h'010000ff')
This example demonstrates how the basic building blocks combine to represent complex, nested data structures efficiently.
Conclusion: Foundations Laid
This chapter has laid the groundwork for understanding CBOR by dissecting its core encoding mechanism. We've seen how the header byte, through its Major Type and Additional Information fields, defines the structure and type of every data item. We explored the preferred binary representations for fundamental types inherited from the JSON data model – integers (positive and negative), booleans, null, text strings, arrays, and maps – along with CBOR's native byte strings and standard floating-point numbers. By consistently comparing JSON, CBOR Diagnostic Notation, and the raw hexadecimal CBOR, we've illuminated the direct mapping between the familiar data model and its concise binary encoding.
With this foundation, you should now be able to interpret the structure of basic CBOR data items encoded using definite lengths and preferred serialization. You understand how CBOR achieves compactness while remaining self-describing at a fundamental level, allowing decoders to process data without prior schema knowledge.
However, this is just the beginning of the CBOR story. We intentionally deferred several important features to establish this core understanding:
- Semantic Tags (Major Type 6): CBOR's powerful extensibility mechanism for adding meaning beyond the basic types.
- Indefinite-Length Items: Encoding strings, arrays, and maps when their final size isn't known upfront, crucial for streaming applications.
- CBOR Sequences: Transmitting multiple independent CBOR data items back-to-back in a stream.
- Schema Definition (CDDL): Formal languages like CDDL used to define and validate the structure of CBOR data.
- Deterministic Encoding: The stricter rules beyond preferred serialization needed to guarantee identical byte sequences for identical data, essential for cryptographic applications.
These advanced topics build upon the fundamentals covered here. In the upcoming chapters, we will explore CBOR's extensibility through tags, dive deep into the requirements and techniques for achieving deterministic encoding (dCBOR), and see how these elements combine to create robust, verifiable data structures like Gordian Envelope.
Extending Semantics with CBOR Tags
Beyond Basic Types: The Need for Meaning
In the previous chapter, we explored the fundamental mechanics of CBOR encoding, focusing on how basic data types like integers, strings, arrays, and maps are represented in a compact binary form. We saw how CBOR leverages a simple initial byte structure (Major Type and Additional Information) to create a self-describing format at the byte level, closely mirroring the familiar JSON data model but optimized for efficiency.
However, real-world data often carries meaning beyond these fundamental structures. How do we distinguish a simple integer representing a count from one representing seconds since an epoch? How do we represent a date, a URI, or a number larger than standard 64-bit integers can hold? While applications could implicitly agree on the meaning of specific fields (e.g., "the 'timestamp' field is always epoch seconds"), this approach lacks standardization and can lead to ambiguity and interoperability issues.
CBOR addresses this need for richer semantics through its tagging mechanism. Tags allow us to annotate underlying data items, providing additional context or type information without fundamentally changing the encoding structure. They are a cornerstone of CBOR's extensibility, enabling the representation of a vast range of data types beyond the core set, from standard types like dates and URIs to application-specific structures.
This chapter delves into CBOR Tags (Major Type 6). We will explore:
- How tags work mechanically.
- Their purpose in adding semantic meaning and enabling extensibility.
- The IANA registry that standardizes tag definitions.
- The different ranges of tag numbers and their implications for interoperability.
- A selection of commonly used ("notable") tags with practical examples.
By the end of this chapter, you will understand how to use and interpret CBOR tags, unlocking a powerful feature for representing complex and meaningful data structures efficiently.
⚠️ NOTE: As before, this chapter aims for practical understanding. For definitive details, always refer to the official specification, RFC 8949 (https://datatracker.ietf.org/doc/html/rfc8949), and the IANA registries it defines.
Tagging Mechanism (Major Type 6)
CBOR dedicates Major Type 6 specifically for tags. A tag consists of two parts:
- Tag Number: An unsigned integer (ranging from 0 up to 2⁶⁴−1) that identifies the tag's meaning.
- Tag Content: A single, subsequent CBOR data item that is being tagged.
┌──────────────────────┐
│ TAG HEADER BYTE │ → Major Type 6 + AI (determines length of tag number)
├──────────────────────┤
│ TAG NUMBER BYTES │ → (0 to 8 bytes depending on AI)
└──────────────────────┘
↓
┌──────────────────────┐
│ TAGGED DATA ITEM │ → Any valid CBOR item (primitive, array, map, etc.)
└──────────────────────┘
The encoding follows the standard CBOR pattern. The initial byte has its high-order 3 bits set to 110
(Major Type 6). The low-order 5 bits (Additional Information) encode the tag number itself, using the same rules used for all the major types:
Tag Number Range | Initial Byte | Additional Bytes | Total Tag Header Size | Notes |
---|---|---|---|---|
0 to 23 | 0xC0 to 0xD7 | None | 1 byte | Tag number in AI (0–23) |
24 to 255 | 0xD8 | 1 byte (uint8) | 2 bytes | AI = 24 |
256 to 65535 | 0xD9 | 2 bytes (uint16) | 3 bytes | AI = 25 |
65536 to 4294967295 | 0xDA | 4 bytes (uint32) | 5 bytes | AI = 26 |
4294967296 to 2⁶⁴−1 | 0xDB | 8 bytes (uint64) | 9 bytes | AI = 27 |
Immediately following the initial byte(s) that encode the tag number comes the complete encoding of the single data item that serves as the tag's content.
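The head encoding for tags follows the same pattern as every other major type, as this hand-rolled Rust sketch (illustrative names, not a published API) shows; the tag content would be appended immediately after the returned bytes.

```rust
/// Build the header bytes for a CBOR tag (Major Type 6).
/// The tag content is encoded immediately after these bytes.
fn tag_header(tag: u64) -> Vec<u8> {
    match tag {
        0..=23 => vec![0xc0 | tag as u8],   // tag number in AI
        24..=0xff => vec![0xd8, tag as u8], // 1+1 form
        0x100..=0xffff => {
            let mut out = vec![0xd9];       // 1+2 form
            out.extend_from_slice(&(tag as u16).to_be_bytes());
            out
        }
        0x1_0000..=0xffff_ffff => {
            let mut out = vec![0xda];       // 1+4 form
            out.extend_from_slice(&(tag as u32).to_be_bytes());
            out
        }
        _ => {
            let mut out = vec![0xdb];       // 1+8 form
            out.extend_from_slice(&tag.to_be_bytes());
            out
        }
    }
}

fn main() {
    assert_eq!(tag_header(2), vec![0xc2]);         // unsigned bignum tag
    assert_eq!(tag_header(37), vec![0xd8, 0x25]);  // UUID tag
    assert_eq!(tag_header(200), vec![0xd8, 0xc8]); // Gordian Envelope tag
    println!("tag header checks passed");
}
```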
Example: Tag 2 (unsigned bignum) applied to the byte string h'0102'
CBOR Hex | MT | AI | Explanation |
---|---|---|---|
c2 | 6 | 2 | Tag(2): Major Type 6, AI encodes tag number 2 |
42 | 2 | 2 | Byte String (Major Type 2), length = 2 bytes |
0102 | – | – | Tag Content: raw bytes 0x01 , 0x02 |
CBOR Diagnostic Notation: 2(h'0102')
⚠️ NOTE: If you put this diagnostic notation into the CBOR playground, convert it to its hexadecimal representation and back, you will get the value
258
! This is because the playground understands that byte strings tagged with Tag 2 (unsigned bignum) are interpreted as a single integer value. In this case, the first byte0x01
is the most significant byte, and the second byte0x02
is the least significant byte, leading to the calculation:(1 * 256 + 2) = 258
. This is the playground enforcing preferred serialization of numbers, which is a feature of the playground, not a requirement of CBOR itself.
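The playground's arithmetic is easy to reproduce. This hand-rolled Rust sketch folds the big-endian magnitude bytes of a Tag 2 content into an integer (assuming, for illustration, that the value fits in a u128):

```rust
/// Interpret the content of a Tag 2 (unsigned bignum) byte string as an
/// integer. Real bignums may exceed u128; this sketch assumes they fit.
fn unsigned_bignum_value(magnitude: &[u8]) -> u128 {
    magnitude
        .iter()
        .fold(0u128, |acc, &byte| (acc << 8) | u128::from(byte))
}

fn main() {
    // 2(h'0102') carries the magnitude bytes 0x01 0x02.
    assert_eq!(unsigned_bignum_value(&[0x01, 0x02]), 258); // 1 * 256 + 2
    // 2(h'010000000000000000') is 2^64, one past the Major Type 0 range.
    assert_eq!(
        unsigned_bignum_value(&[0x01, 0, 0, 0, 0, 0, 0, 0, 0]),
        1u128 << 64
    );
    println!("bignum checks passed");
}
```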
Purpose of Tags
Why introduce this extra layer? Tags serve several crucial purposes aligned with CBOR's design goals:
- Adding Semantics: Tags provide standardized meaning to underlying data. Tag 1 indicates that an integer or float represents epoch-based time; Tag 32 indicates a text string is a URI. This allows applications to interpret data correctly without relying solely on field names or out-of-band agreements.
- Extensibility: Tags are CBOR's primary mechanism for defining new data types beyond the basic set, without requiring version negotiation. New standards or applications can define tags for specialized data structures (like cryptographic keys, geographic coordinates, or domain-specific objects) and register them, allowing the CBOR ecosystem to grow organically.
- Interoperability Hints: Some tags provide guidance for converting CBOR data to other formats, particularly JSON which lacks native support for types like byte strings or dates. Tags 21-23, for example, suggest how binary data might be represented using base64 or hex encoding if conversion is necessary.
- Type System Augmentation: Tags allow CBOR to represent data types common in programming languages but not directly present in the basic JSON model, such as unsigned 64-bit integers, arbitrarily large integers (bignums), specific date/time formats, UUIDs, and more.
This mechanism of using an inline prefix tag number followed by the content provides a compact, binary-native way to convey type and semantic information. This contrasts with more verbose text-based approaches like XML namespaces or JSON-LD contexts, aligning with CBOR's goal of message size efficiency.
Decoder Behavior
Crucially, CBOR decoders are not required to understand the semantics of every tag they encounter. This is a key aspect of CBOR's extensibility and forward compatibility. A generic decoder encountering an unknown tag N
followed by content C
can simply:
- Decode the tag number
N
. - Decode the tag content
C
. - Pass both
N
andC
to the application.
The application can then decide whether it understands tag N
and how to interpret C
based on it. If the application doesn't recognize tag N
, it might treat C
as opaque data, ignore it, or raise an error, depending on the application's logic. This allows systems to process messages containing newer, unknown tags without failing, provided the application logic can handle the tagged data appropriately (perhaps by ignoring it).
Tag Nesting
Tags can be nested. A tag can enclose another tag, which in turn encloses a data item. For example, consider TagA(TagB(ItemC))
. The interpretation applies from the inside out: TagB
modifies or adds semantics to ItemC
, and then TagA
applies to the result of TagB(ItemC)
.
Later in the book we'll discuss Gordian Envelope. The CBOR diagnostic notation for a very simple envelope containing just a text string might look like this:
200(201("Hello, envelope!"))
Tag 200 is registered with IANA as "Gordian Envelope". So anytime you encounter tag 200, you know you're looking at a Gordian Envelope. The tag 201 represents dCBOR (deterministic CBOR), which we'll also cover in this book. In the Gordian Envelope specification, an Envelope containing just dCBOR is a LEAF
node, which can be any valid dCBOR; in this case, a text string.
Finding Your Tags: The IANA Registry
With potentially 2⁶⁴ tag numbers available, how do we ensure that different applications don't use the same number for conflicting purposes? With that many tags, you could just pick them at random and be pretty certain nobody else is using them, but there's a better way! The Internet Assigned Numbers Authority (IANA) maintains the official Concise Binary Object Representation (CBOR) Tags registry.
This registry serves as the central, authoritative source for standardized tag assignments. Its importance cannot be overstated:
- Interoperability: The registry ensures that a specific tag number (especially in the lower, standardized ranges) consistently refers to the same semantic meaning and expected data item type across different implementations and protocols that adhere to the standards. This prevents conflicts where one application might use tag
X
for dates while another uses it for URIs. - Discovery: It provides a public catalog where developers can look up existing tags for common data types (like dates, bignums, UUIDs, MIME messages, etc.) before defining their own. This encourages reuse and avoids unnecessary proliferation of tags for the same concept.
The registry is presented as a table with columns including:
- Tag: The tag number.
- Data Item: The expected CBOR type(s) of the tag content (e.g., text string, byte string, array, integer).
- Semantics: A brief description of the tag's meaning.
- Reference: A pointer to the document (an RFC or other stable specification) that defines the tag in detail.
Tag Number Ranges and Registration Procedures
The IANA registry doesn't treat all tag numbers equally. The vast space from 0 to 2⁶⁴−1 is divided into distinct ranges, each with its own allocation policy. These policies reflect the intended use and required level of standardization for tags within that range. Understanding these ranges is crucial for choosing appropriate tags and understanding their interoperability implications.
The primary ranges and their procedures are:
-
Range 0-23 (Standards Action)
- Encoding: These are the most compact tags, encoded directly within the initial byte (
0xc0
to0xd7
). - Procedure: Requires Standards Action. Assignment typically necessitates definition within an IETF Request for Comments (RFC) or a standard from another recognized body. This is the most rigorous process, requiring that the IETF adopt the tag as part of a formal standard.
- Intended Use: Reserved for core, fundamental, and widely applicable data types expected to be broadly interoperable (e.g., standard date/time formats, bignums, basic content hints).
- Encoding: These are the most compact tags, encoded directly within the initial byte (
-
Range 24-32767 (Specification Required)
- Encoding: Covers tags requiring 1 additional byte (
0xd8 xx
, for tags 24-255) and the lower half of tags requiring 2 additional bytes (0xd9 xxxx
, for tags 256-32767). - Procedure: Requires Specification Required. This means a stable, publicly accessible specification document defining the tag's semantics, expected data item format, and intended use must exist. IANA-appointed experts review the specification before registration. It's less formal than full Standards Action but still requires clear documentation and review.
- Intended Use: Suitable for well-defined data types used within specific protocols, communities, or domains (e.g., COSE security tags, MIME messages, UUIDs, URIs, dCBOR, and Gordian Envelope). These tags are expected to be interoperable among parties using the defining specifications.
- Encoding: Covers tags requiring 1 additional byte (
-
Range 32768 - 18446744073709551615 (First Come First Served - FCFS)
- Encoding: Covers the upper half of 2-byte tags, all 4-byte tags (
0xda xxxxxxxx
), and all 8-byte tags (0xdb xxxxxxxxxxxxxxxx
). This is the vast majority of the tag number space. - Procedure: First Come First Served (FCFS). Registration is granted to the first applicant who provides the required information (based on the RFC 8949 template), including contact details and preferably a URL pointing to a description of the semantics. The review is primarily for completeness, not semantic detail or overlap (beyond the number itself).
- Intended Use: Designed for application-specific tags, experimental use, vendor-specific extensions, or types where broad standardization isn't necessary or desired. Useful for rapid development or closed ecosystems.
- Encoding: Covers the upper half of 2-byte tags, all 4-byte tags (
This tiered structure represents a deliberate design choice, reflecting a spectrum from highly standardized and stable core types to flexible application-specific extensions. It reserves the most efficiently encoded tag numbers (0-23) for the most common, universally understood types, while providing ample space for innovation and specific needs in the higher ranges. All registration requests, regardless of range, must follow the basic template defined in RFC 8949 to ensure a minimum level of documentation.
Choosing and Using Tags Wisely
The existence of different registration ranges has direct practical consequences for developers choosing tags:
-
Interoperability Guarantees:
- Standards Action (0-23): Offers the highest likelihood of interoperability. Implementations aiming for broad CBOR compliance should recognize and potentially handle these tags. Use them whenever your data semantically matches a tag in this range.
- Specification Required (24-32767): Provides good interoperability within the community that uses the defining specification. Consumers outside this community may not recognize the tag without consulting the specification. Ideal for domain-specific standards (e.g., security tokens, IoT protocols).
- FCFS (32768+): Offers the lowest inherent interoperability guarantee. Use primarily for private or application-specific data types where producers and consumers are tightly coupled or have explicitly agreed on the tag's meaning. Relying on FCFS tags for broad, unspecified interoperability is risky.
⚠️ NOTE: The Danger of "Squatting": Never use an unregistered tag number from the Standards Action (0-23) or Specification Required (24-32767) ranges for your own private or experimental purposes. This practice, sometimes called "tag squatting," inevitably leads to collisions when IANA officially assigns that number for a different purpose. It breaks interoperability and creates significant problems down the line. Use the FCFS range for experimentation or application-specific needs.
How to Register your own FCFS Tags
✅ TIP: There is no charge to register a new tag.
- Check the IANA CBOR Tags Registry to ensure that there isn't already an existing tag that does what you want. If you find one, use that instead of creating a new one.
- Write your specification. This should be a stable, publicly accessible document that defines the tag's semantics, expected data item format, and intended use. It can be as simple as a GitHub Gist, but it should be clear, unambiguous, and have a stable URL.
- Check the IANA CBOR Tags Registry to ensure the tag you want isn't already taken.
- Review the section of RFC 8949 that describes the registry, the registration process for tags, and the template for submitting a registration request.
- Review the IANA list of protocol registries. You'll find the one called
CBOR Tags
, which also lists the IANA experts assigned to review tag registrations if they are in the Specification Required range. - Fill out the IANA "General Request for Assignments" form.
The form itself is very simple. You will need to provide:
- Your name and email address.
- The type of assignment you're requesting (CBOR Tags).
- The registry you're requesting the assignment from (the CBOR Tags registry).
- A reason for the assignment. This information is optional, but helpful and recommended.
- "Additional Information". For each tag, you're registering provide information corresponding to a column in the IANA registry. We recommend you review at the registry for examples:
- Tag: The tag number.
- Data Item: The expected CBOR type(s) of the tag content (e.g., text string, byte string, array, integer).
- Semantics: A brief description of the tag's meaning.
- Reference: A pointer to the document (an RFC or other stable specification) that defines the tag in detail.
That's it! Submit the form, and IANA will respond to your request by email.
Notable Tags
The IANA CBOR Tags registry is authoritative and growing, listing hundreds of registered tags. Navigating this full list can be daunting. Fortunately, the IETF community maintains a document, Notable CBOR Tags, which serves as a curated guide or "roadmap" to a selection of the most commonly used, interesting, or otherwise "notable" tags, particularly those defined since the original CBOR specification.
The Notable CBOR Tags Internet-Draft organizes the tags it covers into a number of interesting categories, including:
-
RFC 7049 (original CBOR specification) Tags defined in the original CBOR specification, including standard date/time strings, bignums, decimal fractions, and base64 encodings.
-
Security Tags used in security contexts, such as COSE (CBOR Object Signing and Encryption) and CBOR Web Tokens (CWT).
-
CBOR-based Representation Formats Tags used in CBOR-based representation formats like YANG-CBOR.
-
Protocols Tags utilized in specific protocols, including DOTS (DDoS Open Threat Signaling) and RAINS (Another Internet Naming Service).
-
Datatypes Tags representing advanced datatypes, such as advanced arithmetic types, variants of undefined, and typed/homogeneous arrays.
-
Domain-Specific Tags tailored for specific domains, including human-readable text and extended time formats.
-
Platform-Oriented Tags related to specific platforms or programming languages, such as Perl, JSON, and unusual text encodings.
-
Application-Specific Tags designed for particular applications, including enumerated alternative data items.
-
Implementation Aids Tags intended to assist with implementation, such as invalid tags and programming aids for simple values.
✅ TIP: While the IANA registry is the definitive source, the "Notable CBOR Tags" draft provides valuable context and summaries for many practical tags.
A Few Commonly Used Tags
Let's explore a few of the most fundamental and useful tags, many defined in the original CBOR specification and detailed further in the notable tags draft:
Tag 0: Standard Date/Time String
- Content: UTF-8 string
- Semantics: Represents a date and time expressed as a string, following the standard format defined in RFC 3339 (a profile of ISO 8601). This is a human-readable format.
- Diagnostic:
0("2013-03-21T20:04:00Z")
- Hex Example:
C0 # tag(0)
74 # text(20)
323031332D30332D32315432303A30343A30305A # "2013-03-21T20:04:00Z"
Tag 1: Epoch-Based Date/Time
- Content: Integer or Floating-point number
- Semantics: Represents a point in time as a numeric offset (in seconds, with optional fractional part for floats) from the standard Unix epoch (1970-01-01T00:00:00Z UTC). More compact and suitable for computation than Tag 0.
- Diagnostic (Integer):
1(1363896240)
- Hex Example (Integer):
C1 # tag(1)
1A 514B67B0 # unsigned(1363896240)
- Diagnostic (Float):
1(1698417015.123)
- Hex Example (Float - double precision):
C1 # tag(1)
FB 41D94EF25DC7DF3B # 1698417015.123
✅ TIP: The choice between integer and float depends on the need for sub-second precision. More advanced time tags exist (e.g., Tag 1001) offering higher precision and timescale information, but Tag 1 remains the basic epoch representation.
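As an illustration (hand-rolled, with the simplifying assumption that the seconds value fits in a u32 so the 1+4 head can be used), a Tag 1 timestamp for the current time can be assembled like this:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Encode the current time as CBOR Tag 1 with an integer number of
/// seconds since the Unix epoch (sketch: assumes the value fits in u32).
fn tag1_now() -> Vec<u8> {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_secs() as u32;
    let mut out = vec![0xc1, 0x1a];             // tag(1), then MT 0 / AI 26
    out.extend_from_slice(&secs.to_be_bytes()); // 4-byte big-endian seconds
    out
}

fn main() {
    let bytes = tag1_now();
    assert_eq!(bytes.len(), 6); // 2 header bytes + 4 seconds bytes
    println!("{:02x?}", bytes);
}
```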
Tag 2 and 3: Bignums
- Content: Byte string
- Semantics: Represents an arbitrarily large non-negative integer (Tag 2) or negative integer (Tag 3) that need not fit into the 64-bit ranges of Major Types 0 and 1. The byte string contains the magnitude of the integer in network byte order (big-endian), with no leading zero bytes permitted in preferred/deterministic encoding.
- Diagnostic (representing 18446744073709551616):
2(h'010000000000000000')
- Hex Example (representing 18446744073709551616):
C2 # Tag(2, non-negative bignum)
49 010000000000000000 # Byte String (length 9 bytes, 18446744073709551616)
Tag 32: URI
- Content: UTF-8 string
- Semantics: Identifies the text string content as a Uniform Resource Identifier according to RFC 3986.
- Diagnostic:
32("http://cbor.io/")
- Hex Example:
D8 20 # tag(32)
6F # text(15)
687474703A2F2F63626F722E696F2F # "http://cbor.io/"
Tag 37: UUID
- Content: Byte string (must be 16 bytes long)
- Semantics: Identifies the byte string content as a Universally Unique Identifier, as defined in RFC 9562.
- Diagnostic:
37(h'f81d4fae7dec11d0a76500a0c91e6bf6')
- Hex Example:
D8 25 # Tag(37) - uses 1+1 encoding (0xd8 0x25)
50 # Byte String (length 16 bytes)
f81d4fae7dec11d0a76500a0c91e6bf6
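Because the tag number 37 needs the 1+1 head and the UUID is always 16 bytes, the full encoding can be assembled with fixed header bytes, as in this hand-rolled Rust sketch (illustrative, not a library API):

```rust
/// Wrap a 16-byte UUID in CBOR Tag 37:
/// d8 25 = tag(37), 50 = byte string of length 16, then the raw bytes.
fn tag37_uuid(uuid: [u8; 16]) -> Vec<u8> {
    let mut out = vec![0xd8, 0x25, 0x50];
    out.extend_from_slice(&uuid);
    out
}

fn main() {
    let uuid = [
        0xf8, 0x1d, 0x4f, 0xae, 0x7d, 0xec, 0x11, 0xd0,
        0xa7, 0x65, 0x00, 0xa0, 0xc9, 0x1e, 0x6b, 0xf6,
    ];
    let encoded = tag37_uuid(uuid);
    assert_eq!(encoded.len(), 19); // 3 header bytes + 16 UUID bytes
    assert_eq!(&encoded[..3], &[0xd8, 0x25, 0x50]);
    println!("tag 37 checks passed");
}
```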
Example: Tags in Action:
Let's see how these tags combine with basic CBOR types to represent a more complex data structure. Consider this JSON object representing a hypothetical sensor reading message:
JSON:
{
"sensorID": "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
"captureTime": "2023-10-27T14:30:15.123Z",
"reading": -12.345,
"readingScale": -3,
"rawValue": -12345,
"statusURL": "https://example.com/status/f81d4fae",
"alertPayload": "AQIDBA=="
}
Here, sensorID
is a UUID, captureTime
is a standard timestamp, reading
could be represented as a decimal fraction (-12345 * 10^-3
), statusURL
is a URI, and alertPayload
is base64-encoded binary data (0x01020304
).
CBOR Diagnostic Notation (using tags):
{
"sensorID": 37(h'f81d4fae7dec11d0a76500a0c91e6bf6'), // Tag 37 for UUID
"captureTime": 0("2023-10-27T14:30:15.123Z"), // Tag 0 for RFC3339 string
// Alternative: Represent reading as decimal fraction
// "reading": 4([-3, -12345]), // Tag 4 for decimal fraction
"reading": -12.345, // Using standard float for simplicity here
"readingScale": -3, // Simple integer
"rawValue": -12345, // Simple integer
"statusURL": 32("https://example.com/status/f81d4fae"), // Tag 32 for URI
"alertPayload": h'01020304' // Direct byte string
// Alternative: Use Tag 22 hint if JSON interop requires base64
// "alertPayload": 22(h'01020304')
}
This example illustrates how tags integrate seamlessly into the CBOR structure. Tag 37 clearly identifies the sensorID
bytes as a UUID, Tag 0 provides a standard string representation for captureTime
, and Tag 32 marks the statusURL
string as a URI. We chose to represent reading
as a standard float, but Tag 4 could have been used for exact decimal precision if required by the application. For alertPayload
, we used a direct byte string, as CBOR handles binary natively; Tag 22 could be added as a hint if this data frequently needs conversion to base64 for JSON compatibility. The tags add semantic precision and clarity beyond what the JSON representation alone could offer directly.
Conclusion: The Power of Extensibility
CBOR Tags (Major Type 6) are the primary mechanism for extending CBOR's data model beyond its fundamental types. They provide a standardized way to imbue data items with specific semantic meaning, enabling the representation of complex types like dates, times, large or high-precision numbers, URIs, UUIDs, and much more, all while maintaining CBOR's characteristic compactness. The IANA registry plays a vital role in ensuring interoperability by providing a central authority for tag definitions, while the tiered registration system balances the need for stable, standardized core tags with flexibility for application-specific extensions.
Understanding tags—how they work, where to find them, the implications of different ranges, and how to apply common ones—is key to leveraging the full power of CBOR. They allow engineers to model complex, meaningful data structures efficiently and in a way that promotes clarity and potential interoperability.
Looking ahead, tags are not just an isolated feature; they interact significantly with other advanced CBOR concepts:
- Deterministic Encoding (dCBOR): As we will explore later, achieving a canonical, byte-for-byte identical encoding for the same logical data requires strict rules. These rules apply to tags as well, mandating preferred serialization for tag numbers, and often requiring the consistent presence or absence of specific tags for certain semantic types. This is essential for applications like digital signatures or content-addressable storage where byte-level reproducibility is paramount.
- Application Profiles (COSE, Gordian Envelope): Many higher-level protocols and data formats built upon CBOR rely heavily on specific tags to define their structures and semantics. CBOR Object Signing and Encryption (COSE) uses tags extensively to identify signed, MACed, and encrypted messages and related security parameters. Similarly, the Gordian Envelope specification, which we will cover in detail later in this book, defines its own set of tags to structure its secure, layered data format. A solid grasp of CBOR tags is fundamental to working with these important application profiles.
Mastering CBOR tags moves us beyond simply encoding basic data structures towards building rich, extensible, and semantically precise data formats suitable for a wide range of applications, from constrained IoT devices to complex web protocols.
Indefinite-Length Items
Introduction: Beyond Known Lengths
In A Practical Introduction to CBOR, we established the foundational mechanics of CBOR encoding, focusing on how the initial byte(s) of a data item—through the interplay of Major Type (MT) and Additional Information (AI)—convey the item's type and, crucially, its size or value. We saw how integers, strings, arrays, and maps are typically encoded using definite lengths, where the exact number of bytes (for strings) or elements/pairs (for collections) is specified upfront using AI values 0 through 27. This approach, particularly when combined with preferred serialization rules, leads to compact and efficiently parsable representations, provided the size of the data item is known before serialization begins.
However, there are common scenarios where determining the total size of a data item in advance is impractical, inefficient, or even impossible. Consider these situations:
- Incremental Generation: A system might generate a large log entry or document piece by piece, appending data as it becomes available. Calculating the final size would require buffering the entire content first.
- Network Streaming: Sensor data or results from a long-running computation might need to be transmitted over a network as soon as parts are ready, without waiting for the entire dataset to be complete.
- Data Pipelines: An intermediate process might receive data chunks from one source and need to forward them immediately in CBOR format to the next stage, without the memory or latency budget to assemble the complete object first.
For these kinds of streaming applications, requiring the total length upfront negates the benefits of incremental processing. CBOR addresses this challenge directly with indefinite-length encoding, a mechanism specifically designed for situations where the size of certain data items is not known when serialization starts. This alternative encoding applies only to byte strings (Major Type 2), text strings (Major Type 3), arrays (Major Type 4), and maps (Major Type 5).
This chapter delves into the practical details of indefinite-length CBOR encoding. We will explore the specific encoding mechanism, examine how strings and collections are represented using this method, discuss its primary use cases and practical implications for parsers, survey its application in real-world protocols, and crucially, understand why this flexible encoding is explicitly disallowed in deterministic CBOR profiles. By the end of this chapter, you will have a solid working knowledge of how CBOR handles streaming data and the trade-offs involved.
The Indefinite-Length Mechanism: AI 31 and the "Break" Code
The core mechanism for indefinite-length encoding leverages a specific value within the Additional Information (AI) part of the initial byte, alongside a unique stop code.
Recall from the Practical Introduction chapter the structure of the initial byte in any CBOR data item:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ 7 │ 6 │ 5 │ 4 │ 3 │ 2 │ 1 │ 0 │
├────┴────┴────┼────┴────┴────┴────┴────┤
│ MAJOR TYPE │ ADDITIONAL INFORMATION │
└──────────────┴────────────────────────┘
While AI values 0 through 27 are used to encode literal values or definite lengths/counts, the AI value 31 (binary 11111
) serves a distinct purpose related to indefinite-length items.
Signaling the Start: When AI value 31 is combined with the Major Types that support indefinite lengths (2, 3, 4, and 5), it signals the start of an indefinite-length data item of that specific type. It essentially acts as a marker indicating, "An item of this type begins here, but its total length is not provided; subsequent data items or chunks will follow until a specific terminator is encountered."
Applicable Major Types: It is crucial to remember that indefinite-length encoding is only defined for the following Major Types:
Major Type | Description |
---|---|
2 | Byte String |
3 | Text String |
4 | Array |
5 | Map |
Other Major Types (0, 1, 6, 7) do not have an indefinite-length encoding mechanism defined via AI 31 in this manner.
The Universal Terminator: The 0xff
"Break" Code: To signal the end of an indefinite-length sequence (whether it's chunks of a string or elements/pairs of a collection), CBOR defines a unique, single-byte stop code: 0xff
. This byte is often referred to as the "break" code.
The encoding of the break code itself is Major Type 7 (Simple Values / Floating-Point) with Additional Information 31. This specific combination (111 11111
binary) is reserved solely for this purpose. Its structure ensures it cannot be mistaken for the start of any standard CBOR data item, making it an unambiguous terminator for indefinite-length sequences. A parser encountering 0xff
in a context where it's expecting the next chunk of an indefinite string, the next element of an indefinite array, or the next key/value of an indefinite map knows that the indefinite-length item is now complete.
The following table summarizes the specific initial bytes used to start indefinite-length items and the universal break code:
Type | MT | AI | Encoding | Description |
---|---|---|---|---|
Indefinite Byte String | 2 | 31 | 5f | Start of indefinite byte string |
Indefinite Text String | 3 | 31 | 7f | Start of indefinite text string |
Indefinite Array | 4 | 31 | 9f | Start of indefinite array |
Indefinite Map | 5 | 31 | bf | Start of indefinite map |
Break Code | 7 | 31 | ff | End of any indefinite-length item |
Understanding these specific byte values (5f
, 7f
, 9f
, bf
for starting, ff
for stopping) is key to recognizing and parsing indefinite-length CBOR data streams.
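These marker bytes follow directly from the initial-byte layout. A minimal Rust sketch (standard library only, no CBOR crate involved) confirms the arithmetic:

```rust
// Minimal sketch: an initial byte is the major type in bits 7-5 and the
// additional information in bits 4-0, so AI 31 yields the markers below.
fn initial_byte(major_type: u8, additional_info: u8) -> u8 {
    (major_type << 5) | additional_info
}

fn main() {
    assert_eq!(initial_byte(2, 31), 0x5f); // start indefinite byte string
    assert_eq!(initial_byte(3, 31), 0x7f); // start indefinite text string
    assert_eq!(initial_byte(4, 31), 0x9f); // start indefinite array
    assert_eq!(initial_byte(5, 31), 0xbf); // start indefinite map
    assert_eq!(initial_byte(7, 31), 0xff); // break code (major type 7, AI 31)
}
```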
Streaming Data: Indefinite-Length Strings
Indefinite-length strings provide a way to encode byte sequences or UTF-8 text without knowing the total number of bytes beforehand. They achieve this by breaking the string content into manageable chunks.
The fundamental concept is that an indefinite-length string is represented as:
- The specific start marker (
5f
for byte strings,7f
for text strings). - A sequence of zero or more definite-length string chunks of the same major type.
- The
0xff
break code.
The logical value of the complete string is obtained by concatenating the content (the raw bytes or UTF-8 text, excluding the definite-length headers) of these chunks in the order they appear.
Indefinite-Length Byte Strings (Major Type 2, AI 31)
- Encoding Structure: An indefinite-length byte string starts with
5f
, followed by zero or more definite-length byte string chunks, and terminates withff
.
5f [chunk1][chunk2]... ff
- Chunk Structure: Each
[chunkN]
must be a complete, definite-length byte string data item (Major Type 2, AI 0-27). For example,43 010203
represents a chunk containing the 3 bytes0x01
,0x02
,0x03
. An empty chunk, encoded as40
, is also valid and contributes nothing to the concatenated value. - Examples:
- An empty byte string encoded using indefinite length:
- CBOR Diagnostic:
_ h''
- CBOR Hex:
5f ff
- CBOR Diagnostic:
- The byte sequence
0x01, 0x02, 0x03, 0x04, 0x05
encoded indefinitely with two chunks:- CBOR Diagnostic:
_ h'010203' h'0405'
- CBOR Hex:
5f 43 010203 42 0405 ff
5f
: Start indefinite byte string43 010203
: Chunk 1 (definite length 3, bytes01 02 03
)42 0405
: Chunk 2 (definite length 2, bytes04 05
)ff
: Break code
- CBOR Diagnostic:
- The same byte sequence encoded indefinitely with a single chunk:
- CBOR Diagnostic:
_ h'0102030405'
- CBOR Hex:
5f 45 0102030405 ff
5f
: Start indefinite byte string45 0102030405
: Chunk 1 (definite length 5, bytes01 02 03 04 05
)ff
: Break code
- CBOR Diagnostic:
Notice that the same logical byte sequence (0102030405
) can be represented in multiple ways using indefinite-length encoding, depending on the chunking strategy. This flexibility is the core benefit for streaming, but it also introduces non-canonical representations. Furthermore, compared to the definite-length encoding (45 0102030405
), the indefinite-length versions carry an overhead of at least two bytes (the 5f
start marker and the ff
break code), plus the header bytes for each chunk. This trade-off between flexibility, overhead, and canonicality is central to understanding indefinite-length encoding.
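As a concrete illustration, here is a minimal Rust sketch of an indefinite-length byte string encoder. It is not a general CBOR encoder: it assumes every chunk is shorter than 24 bytes so that each chunk header fits in a single byte (0x40 plus the length). Running it reproduces the two encodings shown above and makes the chunking-dependent, non-canonical nature of the output obvious:

```rust
/// Sketch of an indefinite-length byte string encoder. Assumes every chunk is
/// shorter than 24 bytes, so each chunk header is the single byte 0x40 + len.
fn encode_indefinite_bytes(chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = vec![0x5f]; // major type 2, AI 31: start indefinite byte string
    for chunk in chunks {
        assert!(chunk.len() < 24, "sketch only handles single-byte chunk headers");
        out.push(0x40 + chunk.len() as u8); // definite-length chunk header
        out.extend_from_slice(chunk);       // chunk content
    }
    out.push(0xff); // break code
    out
}

fn main() {
    let two_chunks: [&[u8]; 2] = [&[0x01, 0x02, 0x03], &[0x04, 0x05]];
    let one_chunk: [&[u8]; 1] = [&[0x01, 0x02, 0x03, 0x04, 0x05]];
    // Same logical value, two different (non-canonical) encodings:
    assert_eq!(
        encode_indefinite_bytes(&two_chunks),
        [0x5f, 0x43, 0x01, 0x02, 0x03, 0x42, 0x04, 0x05, 0xff]
    );
    assert_eq!(
        encode_indefinite_bytes(&one_chunk),
        [0x5f, 0x45, 0x01, 0x02, 0x03, 0x04, 0x05, 0xff]
    );
}
```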
Indefinite-Length Text Strings (Major Type 3, AI 31)
- Encoding Structure: An indefinite-length text string starts with
7f
, followed by zero or more definite-length text string chunks, and terminates withff
.
7f [chunk1][chunk2]... ff
- Chunk Structure: Each
[chunkN]
must be a complete, definite-length text string data item (Major Type 3, AI 0-27), meaning it must contain a sequence of bytes that constitutes valid UTF-8 encoding. For example,63 666f6f
represents a chunk containing the 3 bytes for the UTF-8 string "foo". - UTF-8 Integrity Constraint: This is a critical rule specific to indefinite-length text strings: chunk boundaries must not occur in the middle of a multi-byte UTF-8 character sequence. Each individual chunk, when decoded, must result in a valid UTF-8 string. The concatenation of these valid chunks naturally forms the final, valid UTF-8 string. This constraint implies that an encoder generating indefinite-length text strings must be UTF-8 aware. When deciding where to split the text into chunks during streaming, it cannot simply cut after an arbitrary number of bytes; it must ensure the cut occurs only at a character boundary. This adds a layer of complexity compared to encoding indefinite-length byte strings, where chunks can be split arbitrarily.
- Examples:
- An empty text string encoded using indefinite length:
- CBOR Diagnostic:
_ ""
- CBOR Hex:
7f ff
- CBOR Diagnostic:
- The text string
"Hello World"
encoded indefinitely with three chunks:- CBOR Diagnostic:
_ "Hello" " " "World"
- CBOR Hex:
7f 65 48656c6c6f 61 20 65 576f726c64 ff
7f
: Start indefinite text string65 48656c6c6f
: Chunk 1 ("Hello", definite length 5)61 20
: Chunk 2 (" ", definite length 1)65 576f726c64
: Chunk 3 ("World", definite length 5)ff
: Break code
- CBOR Diagnostic:
- The text string
"你好"
(UTF-8 bytes:e4 bda0 e5 a5bd
) encoded indefinitely:- Valid Chunking (split between characters):
- CBOR Diagnostic:
_ "你" "好"
- CBOR Hex:
7f 63 e4bda0 63 e5a5bd ff
(Chunk 1: "你", length 3; Chunk 2: "好", length 3)
- CBOR Diagnostic:
- Invalid Chunking (attempting to split within a character): An encoder must not produce, for example, a chunk ending in
e4 bd
followed by a chunk starting witha0
. Each chunk's byte sequence must stand alone as valid UTF-8.
- Valid Chunking (split between characters):
Similar to byte strings, indefinite-length text strings offer streaming flexibility at the cost of overhead and non-canonical representation, with the added requirement of maintaining UTF-8 validity within each chunk.
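In Rust, the boundary check is straightforward because `str::is_char_boundary` reports whether a byte offset falls between characters. The following sketch (illustrative only) backs up from a desired chunk size to the nearest valid boundary before emitting a chunk; encoders in other languages need equivalent care:

```rust
/// Sketch: pick a chunk boundary at or below `max_len` bytes that does not
/// split a multi-byte UTF-8 character.
fn chunk_boundary(s: &str, max_len: usize) -> usize {
    let mut end = max_len.min(s.len());
    while !s.is_char_boundary(end) {
        end -= 1; // back up until we land on a character boundary
    }
    end
}

fn main() {
    let s = "你好"; // six bytes of UTF-8: e4 bd a0  e5 a5 bd
    // Cutting at 4 bytes would split "好", so the boundary falls back to 3.
    assert_eq!(chunk_boundary(s, 4), 3);
    assert_eq!(&s[..3], "你");
}
```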
Streaming Collections: Indefinite-Length Arrays and Maps
Just as strings can be streamed chunk by chunk, CBOR allows arrays and maps to be encoded without knowing the total number of elements or pairs upfront.
The principle is straightforward:
- Start with the specific indefinite-length marker (
9f
for arrays,bf
for maps). - Encode the elements (for arrays) or key-value pairs (for maps) sequentially, one after another.
- Terminate the sequence with the
0xff
break code.
Indefinite-Length Arrays (Major Type 4, AI 31)
- Encoding Structure: An indefinite-length array starts with
9f
, followed by zero or more encoded data items (its elements), and terminates withff
.
9f [item1][item2][item3]... ff
- Element Structure: Each
[itemN]
can be any valid CBOR data item, including integers, strings (definite or indefinite), floats, booleans, null, tags, or even other arrays and maps (definite or indefinite). - Nesting: Indefinite-length arrays can freely contain other indefinite-length items, allowing for complex, nested structures to be streamed.
- Examples:
- An empty array encoded using indefinite length:
- CBOR Diagnostic:
[_]
- CBOR Hex:
9f ff
- CBOR Diagnostic:
- The array
[1, "two", true]
encoded indefinitely:- CBOR Diagnostic:
[_ 1, "two", true]
- CBOR Hex:
9f 01 63 74776f f5 ff
9f
: Start indefinite array01
: Element 1 (integer 1)63 74776f
: Element 2 (text string "two")f5
: Element 3 (true)ff
: Break code
- CBOR Diagnostic:
- A nested indefinite array
[_ 1, [_ "a", "b"], 3]
, whose second element is itself an indefinite array:- CBOR Diagnostic:
[_ 1, [_ "a", "b"], 3]
- CBOR Hex:
9f 01 9f 61 61 61 62 ff 03 ff
9f
: Start outer indefinite array01
: Outer element 1 (integer 1)9f
: Start inner indefinite array (Outer element 2)61 61
: Inner element 1 ("a")61 62
: Inner element 2 ("b")ff
: Break code for inner array03
: Outer element 3 (integer 3)ff
: Break code for outer array
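For illustration, the `[_ 1, "two", true]` example above can be produced incrementally, which is exactly the streaming pattern indefinite-length arrays are designed for. The element encodings in this sketch are hard-coded from the example rather than produced by a real CBOR encoder:

```rust
// Sketch: emitting the array [1, "two", true] incrementally as an
// indefinite-length array.
fn main() {
    let mut out = vec![0x9f];                          // start indefinite array
    out.push(0x01);                                    // element 1: integer 1
    out.extend_from_slice(&[0x63, 0x74, 0x77, 0x6f]);  // element 2: "two"
    out.push(0xf5);                                    // element 3: true
    out.push(0xff);                                    // break code
    assert_eq!(out, [0x9f, 0x01, 0x63, 0x74, 0x77, 0x6f, 0xf5, 0xff]);
}
```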
Indefinite-Length Maps (Major Type 5, AI 31)
- Encoding Structure: An indefinite-length map starts with
bf
, followed by zero or more key-value pairs encoded sequentially (key1, value1, key2, value2,...), and terminates withff
.
bf [key1][value1][key2][value2]... ff
- Pair Structure: Each key and each value can be any valid CBOR data item. Crucially, the data items between the
bf
marker and theff
break code must come in pairs. A map must contain an even number of data items following the initialbf
. - Nesting: Indefinite-length maps can contain indefinite-length items as either keys or values.
- Examples:
- An empty map encoded using indefinite length:
- CBOR Diagnostic:
_ {}
- CBOR Hex:
bf ff
- CBOR Diagnostic:
- The map
{"a": 1, "b": false}
encoded indefinitely:- CBOR Diagnostic:
_ {"a": 1, "b": false}
- CBOR Hex:
bf 61 61 01 61 62 f4 ff
bf
: Start indefinite map61 61
: Key 1 ("a")01
: Value 1 (integer 1)61 62
: Key 2 ("b")f4
: Value 2 (false)ff
: Break code
- CBOR Diagnostic:
- A map containing an indefinite-length byte string as a value
{"data": _ h'01' h'02'}
:- CBOR Diagnostic:
_ {"data": _ h'01' h'02'}
- CBOR Hex:
bf 64 64617461 5f 41 01 41 02 ff ff
bf
: Start indefinite map64 64617461
: Key ("data")5f
: Start indefinite byte string (Value)41 01
: Chunk 1 (h'01'
)41 02
: Chunk 2 (h'02'
)ff
: Break code for byte stringff
: Break code for map
- CBOR Diagnostic:
The requirement for an even number of items between bf
and ff
is an important validation check for parsers. If a parser encounters the ff
break code immediately after reading a key but before reading its corresponding value, it indicates a malformed indefinite-length map. This adds a slight amount of state tracking (ensuring pairs are complete) compared to parsing indefinite-length arrays.
Use Cases and Practical Considerations
The primary motivation for indefinite-length encoding is to support streaming scenarios where data sizes are unknown upfront.
- Network Protocols: In protocols designed for constrained environments or transferring large objects, the ability to send data in chunks without pre-calculating the total size is valuable. CoAP (Constrained Application Protocol) Block-Wise Transfers (RFC-7959) is often cited in this context. While CoAP itself manages the blocking at the protocol level, and the payloads within those blocks are often CBOR using definite lengths for simplicity, the overall concept aligns with handling large data incrementally. Indefinite-length CBOR could be used within such frameworks, although definite-length chunks are common in practice.
- Log Generation/Aggregation: Systems that generate extensive logs or aggregate log streams from various sources can benefit. An application can start an indefinite-length array or map for a log record, append fields (potentially including large, streamed strings) as they become available, and finalize the record with the break code without needing to buffer the entire structure in memory first.
- Data Pipelines: When CBOR data flows through multiple processing stages, using indefinite-length encoding can sometimes avoid the need for intermediate stages to buffer entire large strings or collections just to determine their length before passing them on.
However, using indefinite-length items introduces practical considerations for implementation:
- Parser Implementation: Parsing definite-length items is often simpler. The parser reads the length L, potentially allocates memory for L bytes or L items, and then reads exactly that amount of data. Parsing indefinite-length items requires a different logic: the parser reads the start marker (
5f
/7f
/9f
/bf
), then enters a loop, reading one complete data item (a chunk, an element, or a key-value pair) at a time. After each item, it must check if the next byte is the0xff
break code. If not, it continues the loop; if it is, the indefinite item is complete. This typically involves more state management within the parser. - Buffering Considerations: While indefinite-length encoding allows the sender to stream data without knowing the total size, it doesn't automatically eliminate the need for buffering on the receiver's side. If the receiving application needs the entire concatenated string value, or needs access to all array elements simultaneously, before it can perform its processing, it will still have to accumulate the incoming chunks or elements in memory until the
0xff
break code is received. The primary benefit of streaming often accrues to the sender by reducing memory requirements and latency-to-first-byte, but the receiver's processing model dictates whether it can also process the data incrementally or must buffer. - Nesting Complexity: Parsing nested indefinite-length items requires careful management. When a parser encounters an indefinite-length start marker while already parsing another indefinite-length item, it must correctly associate the eventual
0xff
break codes with their corresponding start markers. This is typically handled using a stack internally within the parser to keep track of the nesting depth and the type of indefinite item currently being parsed.
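To make the chunk-reading loop concrete, the sketch below decodes a single indefinite-length byte string by reading one definite-length chunk at a time until it sees the 0xff break code. It is deliberately minimal: it only accepts chunks shorter than 24 bytes and handles neither nesting nor the other major types a real parser must support:

```rust
/// Sketch: decode one indefinite-length byte string by looping over
/// definite-length chunks until the break code. Only single-byte chunk
/// headers (lengths 0-23) are handled.
fn decode_indefinite_bytes(input: &[u8]) -> Result<Vec<u8>, &'static str> {
    if input.first() != Some(&0x5f) {
        return Err("not an indefinite-length byte string");
    }
    let mut value = Vec::new();
    let mut pos = 1;
    loop {
        match input.get(pos) {
            Some(&0xff) => return Ok(value), // break code: item complete
            Some(&header) if (0x40..0x58).contains(&header) => {
                let len = (header - 0x40) as usize;
                let chunk = input.get(pos + 1..pos + 1 + len).ok_or("truncated chunk")?;
                value.extend_from_slice(chunk); // concatenate chunk content
                pos += 1 + len;
            }
            Some(_) => return Err("unsupported or malformed chunk"),
            None => return Err("missing break code"),
        }
    }
}

fn main() {
    let encoded = [0x5f, 0x43, 0x01, 0x02, 0x03, 0x42, 0x04, 0x05, 0xff];
    assert_eq!(decode_indefinite_bytes(&encoded).unwrap(), vec![1u8, 2, 3, 4, 5]);
}
```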
Indefinite-Length Items in the Wild
While indefinite-length encoding is a standard part of the CBOR specification (RFC-8949), its adoption in specific protocols and applications appears less widespread than definite-length encoding.
As mentioned, CoAP Block-Wise Transfers (RFC-7959) provides a mechanism conceptually similar to streaming, allowing large resources (which might be represented in CBOR) to be transferred in chunks over constrained networks. However, the specification focuses on the CoAP-level blocking and doesn't mandate the use of CBOR indefinite-length encoding within those blocks. Implementations often favor definite-length CBOR for the block payloads due to simpler handling and the deterministic nature often desired, even if the overall resource size isn't known initially by the CoAP endpoints.
Finding other prominent, standardized protocols that mandate or heavily rely on CBOR indefinite-length encoding can be challenging. This might be partly attributed to the implications for deterministic encoding (discussed next) and the fact that many applications prioritize predictability or can manage buffering to determine definite lengths.
Nonetheless, the mechanism exists as a standard tool for scenarios where a sender truly cannot determine the size beforehand, particularly in highly resource-constrained environments or pure streaming pipelines where avoiding buffering on the sender side is paramount.
Why Not Deterministic? The Canonical Conundrum
One of the most significant implications of indefinite-length encoding is its incompatibility with deterministic encoding requirements.
The Goal of Deterministic Encoding: As outlined in RFC 8949, Section 4.2 ("Core Deterministic Encoding Requirements"), and forming the basis for profiles like dCBOR, the primary goal of deterministic encoding is to ensure that any given CBOR data model instance has exactly one, unambiguous, canonical binary representation. This property is absolutely critical for several use cases:
- Cryptographic Signatures: To verify a digital signature over CBOR data, the verifier must be able to reconstruct the exact sequence of bytes that was originally hashed and signed. If multiple valid encodings exist for the same logical data, signature verification becomes unreliable or impossible.
- Hashing: When using cryptographic hashes for data integrity checks, content addressing (like in distributed systems or blockchains), or indexing, it's essential that identical data always produces the identical hash. This requires a single, canonical byte representation.
- Data Comparison: In databases or distributed systems, comparing data items for equality often relies on simple byte-wise comparison for efficiency. This only works correctly if the encoding is canonical.
The Ambiguity of Indefinite-Length: Indefinite-length encoding fundamentally breaks the canonical requirement because it allows the same logical data (a specific string, array, or map) to be encoded into multiple, different byte sequences based solely on how the sender chooses to chunk the data (for strings) or simply by virtue of using the indefinite markers instead of definite ones.
Consider the simple byte string h'01020304'
:
- Definite-Length Encoding (Canonical):
44 01020304
(1 initial byte + 4 content bytes = 5 bytes total) - Indefinite-Length (1 chunk):
5f 44 01020304 ff
(1 start byte + 1 chunk header byte + 4 content bytes + 1 break byte = 7 bytes total) - Indefinite-Length (2 chunks):
5f 42 0102 42 0304 ff
(1 start + 1+2 chunk1 + 1+2 chunk2 + 1 break = 8 bytes total) - Indefinite-Length (4 chunks):
5f 41 01 41 02 41 03 41 04 ff
(1 start + 4*(1+1) chunks + 1 break = 10 bytes total)
All four representations above correspond to the same logical sequence of four bytes. However, they result in distinct binary encodings (44...
, 5f 44...
, 5f 42...
, 5f 41...
).
Violation of Canonical Requirement: This inherent possibility of multiple valid byte sequences for identical data directly violates the core principle of deterministic, canonical encoding. There is no single "preferred" way to chunk an indefinite-length string, making the representation inherently ambiguous from a byte-sequence perspective.
Exclusion from Deterministic Profiles: Consequently, specifications defining deterministic CBOR encoding, such as RFC 8949 Section 4.2.2 ("Length-Determinism"), explicitly forbid the use of indefinite-length encoding. Any data item whose initial byte is 5f
, 7f
, 9f
, or bf
is disallowed in contexts requiring Core Deterministic Encoding or similar canonical profiles. This exclusion is not arbitrary; it is a necessary consequence of prioritizing byte-for-byte reproducibility over the flexibility offered by indefinite-length streaming. Applications requiring canonical forms must use definite-length encoding, which necessitates knowing the size of strings and the counts for collections before serialization.
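The point is easy to demonstrate: the encodings listed above differ byte for byte, so any signature or hash computed over them differs as well, even though they decode to the same logical value. A trivial Rust check:

```rust
// Sketch: the canonical definite-length encoding and two of the
// indefinite-length encodings of h'01020304' shown above. The byte sequences
// differ, so byte-oriented signatures or hashes computed over them differ too.
fn main() {
    let definite = vec![0x44, 0x01, 0x02, 0x03, 0x04];
    let one_chunk = vec![0x5f, 0x44, 0x01, 0x02, 0x03, 0x04, 0xff];
    let two_chunks = vec![0x5f, 0x42, 0x01, 0x02, 0x42, 0x03, 0x04, 0xff];
    assert_ne!(definite, one_chunk);
    assert_ne!(definite, two_chunks);
    assert_ne!(one_chunk, two_chunks);
}
```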
Conclusion: Flexibility vs. Predictability
Indefinite-length encoding stands as a specialized feature within the CBOR standard, designed to address the practical challenge of serializing data whose size is unknown when encoding begins. By using dedicated start markers (5f
, 7f
, 9f
, bf
) based on Major Type combined with Additional Information 31, and a universal 0xff
break code, CBOR allows byte strings, text strings, arrays, and maps to be constructed incrementally. For strings, this involves concatenating definite-length chunks; for collections, it involves appending elements or key-value pairs sequentially until the break code is encountered.
The primary advantage of this mechanism is its ability to support streaming applications, enabling senders (especially those with limited memory or needing low latency) to transmit data without first buffering the entire object to calculate its size.
However, this flexibility comes with significant trade-offs, including non-canonical representations leading to exclusion from deterministic profiles, potential overhead in encoding size, and increased complexity in parsing logic. The requirement for UTF-8 integrity in indefinite-length text strings and dealing with nested indefinite items adds further complexity for implementers.
CBOR Sequences: Streaming Independent Data Items
Introduction: Beyond Single Items
Previous chapters have explored the encoding of individual Concise Binary Object Representation (CBOR) data items, covering fundamental types like integers, strings, booleans, and null, as well as structured types like arrays and maps. We examined how definite-length and indefinite-length encodings work, and how semantic tags (Major Type 6) extend the basic data model. The focus thus far has been on representing self-contained, individual data structures, analogous to a single JSON document or a distinct object in memory.
However, many real-world applications involve data that doesn't naturally fit into a single, monolithic structure. Consider scenarios like generating log entries over time, receiving a continuous stream of sensor measurements, or exchanging a series of independent commands and responses between systems. While it's possible to wrap such sequences within a top-level CBOR array (Major Type 4), perhaps even an indefinite-length one, this approach can be inefficient or semantically awkward. It forces a collection structure onto items that might be fundamentally independent, and it requires either knowing the total count upfront (for definite-length arrays) or managing start and end markers (for indefinite-length arrays).
To address these scenarios more directly and efficiently, the IETF defined CBOR Sequences in (RFC-8742). A CBOR Sequence provides a way to represent a stream or series of distinct CBOR data items without enclosing them in an overarching container like an array.
Formally, a CBOR Sequence is defined recursively as a sequence of bytes that is either:
- An empty (zero-length) sequence of bytes.
- The sequence of bytes representing a single, well-formed encoded CBOR data item (as defined in (RFC-8949)), immediately followed by another CBOR Sequence.
In essence, a CBOR Sequence is generated by simply concatenating the binary encodings of zero or more individual CBOR data items. This concatenation is the core mechanism. Crucially, there are no explicit delimiters, framing bytes, or termination markers inserted between the constituent CBOR items within the sequence itself. This minimalist design is possible because standard CBOR data items are inherently self-delimiting; the initial byte(s) of any CBOR item contain information about its type and length (or value), allowing a parser to determine exactly where that item ends and the next one begins. This contrasts sharply with formats like JSON Text Sequences ((RFC-7464)), which require explicit markers (like the ASCII Record Separator character followed by a newline) between JSON texts because JSON values themselves are not always self-delimiting when concatenated.
The official media type associated with this format is application/cbor-seq
. Additionally, the structured syntax suffix +cbor-seq
has been registered, allowing other media types to indicate that their underlying structure is a CBOR Sequence, analogous to how +cbor
signifies a base of a single CBOR item.
This definition highlights the fundamental simplicity of CBOR Sequences – mere concatenation. The absence of sequence-level headers, item counts, or termination markers is a deliberate design choice rooted in CBOR's self-describing nature. Since each CBOR item encodes its own type and length, a parser can theoretically determine item boundaries without extra framing. Adding such framing would introduce overhead, running counter to CBOR's goal of conciseness, particularly for applications streaming many small items. However, this simplicity places the responsibility of determining the overall end of the sequence entirely on mechanisms external to the sequence format itself, such as the end of a file or the closure of a network connection. This characteristic has significant implications for how sequences are transported and how errors are handled, which will be explored later in this chapter.
Why CBOR Sequences? Technical Motivations
The design of CBOR Sequences stems from specific technical requirements where traditional single-item encodings or array structures fall short. The primary motivations include enabling efficient streaming, facilitating incremental processing, offering potential efficiency gains over arrays, and leveraging the simplicity of concatenation.
-
Streaming Data: CBOR Sequences are particularly well-suited for scenarios where data is generated or consumed continuously over time, and the total volume or number of items is not known when the process begins. Common examples include streaming application logs, transmitting time-series sensor data from IoT devices, or handling real-time event feeds. In such cases, appending a new, independently encoded CBOR item to the end of an existing sequence is straightforward. This contrasts with definite-length CBOR arrays, which require the element count to be specified upfront, and indefinite-length arrays, which, while streamable, still represent a single logical collection that must eventually be terminated by a specific 'break' byte (
0xFF
). Sequences allow indefinite extension without modifying previously transmitted data. -
Incremental Processing: A key advantage is that sequences allow both producers and consumers to operate on discrete items one at a time. A producer can fully encode and transmit a single CBOR item. A consumer can receive the bytes for that item, decode it completely, and process it before the next item even begins to arrive. This model avoids the need for complex streaming parsers or encoders that must handle partially received or generated structures (like elements within a large array). This simplification is especially valuable for resource-constrained environments, such as IoT devices, where memory limitations might make buffering large, monolithic arrays impractical.
-
Efficiency Compared to Arrays: When representing a list or sequence of items, CBOR arrays (Major Type 4) introduce a small amount of overhead. A definite-length array requires an initial byte (or potentially more for very large counts) to encode the number of elements it contains. An indefinite-length array requires an initial byte indicating the indefinite type (
0x9F
for arrays,0xBF
for maps) and must be terminated by a final0xFF
break byte. For a sequence containing N items, using a CBOR array introduces 1 to 9 bytes of structural overhead (for the count or start/end markers). In contrast, a CBOR Sequence adds zero bytes of overhead beyond the concatenated bytes of the items themselves. While this overhead is often negligible, it can become significant when dealing with a very large number of very small items, a common pattern in sensor data or event streams. -
Simplicity of Concatenation: The definition itself highlights this: generating a CBOR Sequence is achieved simply by concatenating the byte representations of individually encoded CBOR items. Furthermore, concatenating two valid CBOR Sequences always results in another valid CBOR Sequence. This property can simplify certain data aggregation pipelines or forwarding proxies where streams of CBOR items from different sources need to be merged.
These motivations reveal a fundamental design choice: CBOR Sequences prioritize the representation of a flow of independent items over a structured collection. CBOR arrays and maps (Major Types 4 and 5) represent semantically coherent, single data items within the CBOR data model; they possess a defined structure and element/pair count. Sequences, lacking this inherent enclosing structure, are better suited for streams where items might be processed individually and may not form a single logical entity. The decision between using an array or a sequence often hinges on whether the data is conceptually viewed as "one large object containing parts" or "many small, sequential objects". If the application requires an explicit marker for the end of the sequence within the data stream itself, (RFC-8742) suggests that encoding the items within a CBOR array might be the more appropriate representation. Choosing sequences implies a shift away from processing a single, potentially large, structure towards processing a series of smaller, independent units.
Encoding: Simple Concatenation
The mechanism for encoding a CBOR Sequence is remarkably straightforward: individually encode each constituent data item according to the standard CBOR encoding rules defined in (RFC-8949) (and detailed in a preceding chapter), and then simply concatenate the resulting byte strings in the desired order. No additional bytes, delimiters, or framing information are introduced between the encoded items as part of the sequence format itself.
Let's illustrate this with hexadecimal examples:
-
Example 1: Simple Sequence
Consider encoding the sequence of values: 1, then "foo", then true.
- The integer
1
(Unsigned Integer, Major Type 0, Additional Information 1) encodes as a single byte:01
. - The text string
"foo"
(Text String, Major Type 3, Additional Information 3 indicates length 3) encodes as the header byte63
followed by the 3 UTF-8 bytes for "foo" (66 6f 6f
):63 666f6f
. - The boolean
true
(Simple Value, Major Type 7, Additional Information 21) encodes as a single byte:f5
. The resulting CBOR Sequence is the direct concatenation:01 63 666f6f f5
.
- The integer
-
Example 2: Sequence Containing Structured Items
Consider encoding the sequence:
[10, false]
, then{"a": -1}
.- The array
[10, false]
(Array, Major Type 4, Additional Information 2 indicates 2 elements) encodes as:82
(header) followed by0a
(encoding for 10) andf4
(encoding for false):82 0a f4
. - The map
{"a": -1}
(Map, Major Type 5, Additional Information 1 indicates 1 pair) encodes as:a1
(header) followed by61 61
(encoding for key "a") and20
(encoding for value -1):a1 61 61 20
. The resulting CBOR Sequence is the concatenation:82 0a f4 a1 61 61 20
.
- The array
-
Example 3: Empty Sequence
An empty CBOR Sequence, containing zero items, is represented by an empty (zero-length) sequence of bytes.
It is instructive to contrast the CBOR Sequence encoding with CBOR array encodings for the same logical list of items. Taking the data from Example 1 (1
, "foo"
, true
):
- As a Definite-Length Array
[1, "foo", true]
: Encoded as83 01 63 666f6f f5
. The initial byte83
signifies Major Type 4 (Array) with Additional Information 3 (three elements follow). - As an Indefinite-Length Array
[1, "foo", true]
: Encoded as9f 01 63 666f6f f5 ff
. This starts with9f
(Major Type 4, AI 31 - indefinite-length array), includes the encoded elements, and ends with theff
(Major Type 7, AI 31 - break code) marker.
The following table summarizes these differences visually, including a comparison to JSON:
Data Model | Representation | Hexadecimal Encoding | Overhead Bytes | Framing Mechanism |
---|---|---|---|---|
[1, "foo", true] | JSON | [31, 2c, 22 66 6f 6f 22, 2c, 74 72 75 65] (ASCII) | Variable | Text delimiters ([ , , , ] ) |
[1, "foo", true] | CBOR Definite Array | 83 01 63 666f6f f5 | 1 (83 ) | Initial byte (Type 4 + Count 3) |
[1, "foo", true] | CBOR Indefinite Array | 9f 01 63 666f6f f5 ff | 2 (9f , ff ) | Start marker (9f ) + Break (ff ) |
1, "foo", true | CBOR Sequence | 01 63 666f6f f5 | 0 | None (Self-delimiting items) |
This comparison clearly shows that CBOR Sequences eliminate the structural overhead associated with arrays by relying entirely on the self-delimiting nature of the constituent CBOR items. This byte-level difference underscores the efficiency motivation, especially for streams of numerous small items.
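Because a sequence is nothing more than concatenation, producing one requires no CBOR-specific machinery at all. The sketch below simply joins the item encodings from Example 1; in a real application the individual encodings would come from a CBOR encoder such as the dcbor crate:

```rust
// Sketch: a CBOR Sequence is just the concatenation of independently encoded
// items. The byte values are taken from Example 1 above.
fn main() {
    let item1: Vec<u8> = vec![0x01];                     // 1
    let item2: Vec<u8> = vec![0x63, 0x66, 0x6f, 0x6f];   // "foo"
    let item3: Vec<u8> = vec![0xf5];                      // true
    let sequence = [item1, item2, item3].concat();
    assert_eq!(sequence, [0x01, 0x63, 0x66, 0x6f, 0x6f, 0xf5]);
}
```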
Decoding: Reading Item by Item
Decoding a CBOR Sequence involves processing the input byte stream iteratively or recursively, extracting one complete CBOR data item at a time until the stream is exhausted. The fundamental process is:
- Check for End: Determine if the input stream or remaining buffer is empty. If it is, the sequence decoding is complete.
- Decode Item: Attempt to decode a single, complete CBOR data item starting from the current position in the stream/buffer. This requires the decoder to correctly interpret the initial byte(s) to understand the item's major type, additional information, and any subsequent length or value bytes, thereby determining the total number of bytes constituting this single item.
- Yield and Consume: If decoding the item is successful, yield or otherwise process the resulting data model value. Advance the position in the stream/buffer by the number of bytes consumed by the decoded item.
- Repeat: Go back to Step 1 with the remainder of the stream/buffer.
The self-delimiting property of standard CBOR items is the cornerstone of this process; the decoder must be able to precisely identify the boundaries of each item based solely on the CBOR encoding rules.
Handling Stream Boundaries and Errors:
- Normal Termination: Successful decoding concludes when the input stream is fully consumed exactly after a complete CBOR item has been decoded.
- Truncation: If the input stream ends unexpectedly while decoding an item (i.e., the header indicates more bytes are needed than are available), this signifies truncation. A decoder designed for streaming data might pause at this point, waiting for more bytes to arrive before declaring an error. For file-based decoding, this typically indicates an incomplete file.
- Malformed Item: If the bytes encountered do not form a well-formed CBOR data item (e.g., invalid major type/additional information combination, inconsistent length information), the decoder loses its ability to determine where the erroneous item ends. Because there are no explicit delimiters between items in a sequence, the decoder cannot reliably find the beginning of the next potential item. Consequently, a single malformed item usually prevents the decoding of the remainder of the sequence. While sophisticated error recovery might be attempted in some implementations, it is not guaranteed by the specification.
- Missing Items: The CBOR Sequence format itself provides no way to detect if expected items are simply absent from the end of the sequence. If the stream terminates cleanly after the last item that was actually present, the decoder will report success. Detecting missing items requires application-level logic, such as checking expected counts, using timeouts in network protocols, or implementing specific acknowledgement mechanisms.
The fragility in the face of malformed items is a direct consequence of the design choice to omit explicit delimiters for the sake of conciseness. Unlike newline-delimited formats like NDJSON, where a parser can often resynchronize by searching for the next newline character even after encountering invalid JSON, a CBOR Sequence parser relies entirely on the internal integrity of each item to navigate the byte stream. If an item's structure is compromised, the parser effectively becomes lost. This implies that applications relying on CBOR Sequences should prioritize robust data validation before or during sequence generation, or they must be prepared for the possibility that transmission errors affecting a single item could render a large portion of a sequence unusable. For applications demanding higher resilience against such errors, incorporating additional framing or error-checking mechanisms at a higher protocol layer might be necessary.
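The decoding loop can be sketched in a few lines of Rust. The toy `item_length` function below understands just enough CBOR (small unsigned integers, short text strings, and the simple values false/true/null) to walk the example sequence; a real decoder must handle every major type and length encoding, and would surface truncation and malformed-item errors as described above:

```rust
/// Toy single-item length calculation: only small unsigned integers, short
/// text strings, and false/true/null are recognized. Anything else is treated
/// as malformed, which (as discussed) halts decoding of the whole sequence.
fn item_length(input: &[u8]) -> Result<usize, &'static str> {
    let initial = *input.first().ok_or("empty input")?;
    match initial {
        0x00..=0x17 => Ok(1),                              // small unsigned integer
        0x60..=0x77 => Ok(1 + (initial - 0x60) as usize),  // short text string
        0xf4..=0xf6 => Ok(1),                              // false, true, null
        _ => Err("toy decoder: unsupported item"),
    }
}

/// Step through the sequence, splitting off one complete item at a time.
fn split_sequence(mut input: &[u8]) -> Result<Vec<&[u8]>, &'static str> {
    let mut items = Vec::new();
    while !input.is_empty() {
        let len = item_length(input)?;
        if len > input.len() {
            return Err("truncated item"); // stream ended mid-item
        }
        let (item, rest) = input.split_at(len);
        items.push(item);
        input = rest;
    }
    Ok(items) // empty remainder: the sequence decoded cleanly
}

fn main() {
    let sequence = [0x01, 0x63, 0x66, 0x6f, 0x6f, 0xf5]; // 1, "foo", true
    let items = split_sequence(&sequence).unwrap();
    assert_eq!(items.len(), 3);
}
```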
CBOR Sequences in Diagnostic Notation
The current draft for CBOR Extended Diagnostic Notation (EDN) proposes a way to represent CBOR Sequences in a human-readable format using <<
and >>
as delimiters, with the items separated by commas:
<< item1, item2, item3, ... >>
The sequence from the example given above would be represented as:
<< 1, "foo", true >>
If you enter this into the CBOR Playground and convert it to the serialized hexadecimal representation, you'll see that it converts the sequence to a CBOR byte string:
46 # bytes(6)
0163666F6FF5
If we manually parse this out, we can see that the first byte 0x46
indicates a byte string of length 6, followed by the bytes for the integer 1
, the string "foo"
, and the boolean true
with no other delimiters or framing:
46 # bytes(6)
01 # unsigned(1)
63666F6F # "foo"
F5 # true
The fact that the byte string header 0x46
is included might be confusing, as it implies that the sequence is a single item.
When we convert the serialized sequence back into diagnostic notation, we just get the byte string representation, as we would expect:
h'0163666F6FF5'
CBOR arrays begin with a header that specifies the array's fixed length, and indefinite-length arrays begin with the indefinite array item and end with the break item. But sequences do not themselves have delimiters or other framing.
The EDN notation is just a way to represent the sequence in a human-readable format, but it does not change the underlying encoding of the sequence itself. Sequences serialized this way are therefore not self-identifying. A CBOR decoder could be instructed to decode a byte string as a sequence, but the fact that it is a sequence cannot be determined by inspecting the byte string itself.
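The relationship between the wrapped form and the bare sequence is easy to see at the byte level. The sketch below (assuming the short, single-byte byte-string header) strips the 0x46 header and recovers exactly the concatenation produced in the encoding section:

```rust
// Sketch: the playground output wraps the sequence in a byte string. Dropping
// the single-byte header 0x46 (major type 2, length 6) leaves the bare CBOR
// Sequence, identical to the concatenation shown earlier.
fn main() {
    let wrapped = [0x46, 0x01, 0x63, 0x66, 0x6f, 0x6f, 0xf5];
    assert_eq!(wrapped[0], 0x40 + 6);      // byte string header, length 6
    let sequence = wrapped[1..].to_vec();  // the embedded sequence
    assert_eq!(sequence, [0x01, 0x63, 0x66, 0x6f, 0x6f, 0xf5]);
}
```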
Practical Use Cases
The characteristics of CBOR Sequences make them suitable for a variety of applications, particularly those involving streams or sequences of independent data units.
-
Streaming Applications: This is a primary motivator for CBOR Sequences.
- Logs, Metrics, and Events: Systems generating continuous streams of structured log entries, performance metrics, or discrete events can encode each entry/event as an individual CBOR item and concatenate them into a sequence for transmission or storage. The low overhead is advantageous, and the independence of items aligns well with the nature of logs or events.
- Sensor Data Feeds: IoT devices often transmit time-series data from sensors. Using CBOR Sequences allows sending each reading or batch of readings as a separate item, benefiting from CBOR's general compactness and avoiding the per-sequence overhead of arrays, which can be significant for frequent, small readings.
-
Record Sequences (Binary NDJSON Equivalent): CBOR Sequences provide a binary alternative to text-based formats like Newline Delimited JSON (NDJSON) or JSON Lines (JSONL). They can be used for efficiently transferring large datasets as a sequence of records, such as rows from a database query or batches of results from an API, allowing for incremental processing on the receiving end. The key advantage over NDJSON is the potential for smaller size and faster parsing due to the binary encoding and native support for binary data types without base64 encoding.
-
Message Delimitation in Protocols: In network protocols built on persistent connections (like TCP or WebSockets), where multiple distinct messages need to be exchanged, each message can be encoded as a single CBOR data item. A CBOR Sequence can represent the stream of these messages. For example, a sequence of Remote Procedure Call (RPC) requests or responses, or a stream of server-sent events, could be structured this way. However, a critical caveat applies: the CBOR Sequence format itself does not provide message framing over stream transports like raw TCP. The protocol implementation must rely on the transport layer (e.g., WebSocket message boundaries) or add its own framing mechanism (like length prefixing) to allow the receiver to distinguish individual messages within the stream.
-
Sequential File Formats: Large datasets can be stored in files as a CBOR Sequence, allowing applications to read or write the file incrementally, processing one CBOR item at a time without loading the entire file into memory. This approach might be combined with proposed mechanisms for adding CBOR magic numbers or tags at the beginning of the file or sequence for identification purposes.
Observing these use cases reveals a common underlying pattern: they often rely on an external mechanism to determine the boundaries of the sequence or the items within it, especially in streaming contexts. CBOR Sequence defines the content format (concatenated self-delimiting items) but not the framing for transport or storage. Streaming logs over TCP relies on the connection lifecycle or application-level protocols; WebSocket usage relies on WebSocket framing; file storage relies on the end-of-file marker. Therefore, while CBOR Sequences offer an efficient encoding for sequential data, engineers must consciously address how these sequences (or the individual items within them) are delimited and detected within the specific transport or storage context being used. Relying solely on the CBOR Sequence format definition without considering this framing aspect can lead to implementation pitfalls.
Comparison with Alternatives
Choosing the right data representation format involves understanding the trade-offs. CBOR Sequences should be compared against other relevant CBOR structures (arrays) and common streaming formats (like NDJSON).
-
CBOR Sequences vs. CBOR Arrays (Definite and Indefinite):
- Structure: A fundamental difference lies in the data model. A CBOR Sequence represents multiple, independent top-level data items concatenated together. A CBOR Array (Major Type 4) is always a single top-level data item, whose content is an ordered list of elements.
- Framing: Sequences have no internal framing; boundaries are implicit based on item self-delimitation. Definite-length arrays encode the element count in their header. Indefinite-length arrays use a specific start byte (
0x9F
) and require a terminating0xFF
break byte.1 - Overhead: Sequences introduce zero framing overhead. Definite arrays add 1-9 bytes for the count. Indefinite arrays add exactly 2 bytes (
0x9F
+0xFF
). - Processing Model: Sequences naturally lend themselves to item-by-item streaming and processing. While indefinite-length arrays also allow streaming of their elements, they are still conceptually processed as a single array unit that is only complete upon encountering the
0xFF
marker. Definite-length arrays typically imply processing as a whole unit once all elements are received. - Use Case Alignment: Sequences are ideal for streams of independent items, especially when the total count is unknown or potentially unbounded, and minimizing overhead is paramount. Arrays are better suited when the data represents a single, semantically coherent list, the structure of the collection itself is significant, and an explicit end marker (for indefinite) or count (for definite) is desirable within the CBOR structure itself.
- Error Handling: As discussed, a malformed item in a sequence can prevent decoding the rest of the stream. Errors within an array element might be more contained, although recovery, especially in indefinite arrays, can still be challenging.
-
CBOR Sequences vs. NDJSON / JSON Lines:
- Encoding: Sequences use CBOR's binary encoding, which is typically more compact and faster to parse than NDJSON's text-based JSON encoding for each line.
- Delimitation: Sequences rely on the self-delimiting nature of CBOR items. NDJSON uses an explicit newline character (
\n
) after each JSON text, making it line-oriented. - Efficiency: CBOR Sequences generally offer better performance in terms of size and processing speed due to their binary nature.
- Error Handling: The explicit newline delimiter in NDJSON often makes it easier for parsers to skip over a malformed JSON line and attempt to process the next one. CBOR Sequences lack this explicit delimiter, making recovery from malformed items harder.
- Binary Data: CBOR has native support for byte strings (Major Type 2), allowing efficient embedding of binary data. NDJSON requires binary data to be encoded within JSON strings, typically using Base64, which adds significant size overhead (around 33%) and processing complexity.
- Human Readability: NDJSON, being text-based, is human-readable with standard text tools. CBOR Sequences require specialized tools for inspection.
The following table summarizes the key characteristics of these alternatives:
Feature | CBOR Sequence | CBOR Indefinite Array | NDJSON / JSON Lines |
---|---|---|---|
Encoding | Binary | Binary | Text (JSON per line) |
Structure | Concatenated items (multi-root) | Single array item (single root) | Concatenated lines (multi-root) |
Item Delimitation | Implicit (self-delimiting) | Explicit (0xFF break marker) | Explicit (newline \n ) |
Overhead | None (beyond items) | 2 bytes (0x9F /0xBF + 0xFF ) | 1 byte per item (\n ) |
Processing | Item-by-item (native streaming) | Element-by-element streaming | Line-by-line (native streaming) |
Readability | Low (requires tools) | Low (requires tools) | High (text editor-friendly) |
Binary Data | Native (byte strings) | Native (byte strings) | Requires Base64 encoding |
Error Recovery | Difficult (malformed item breaks stream) | Difficult (malformed element) | Easier (can skip bad lines) |
Standard | RFC 8742 | RFC 8949 | Informal spec (ndjson.org) |
This comparison highlights that CBOR Sequences offer a high-performance, low-overhead binary format for streaming independent items, trading off some error recovery robustness and human readability compared to text-based alternatives like NDJSON, and differing structurally from CBOR arrays.
Practical Advice for Engineers
When considering or implementing CBOR Sequences, several practical aspects require attention to ensure correct and robust behavior.
-
Guidance: When to Choose Sequences Over Arrays:
- Favor Sequences when:
- The data naturally represents a stream of independent records, events, or messages.
- The total number of items is unknown upfront or potentially unbounded.
- Minimizing encoding overhead is a primary concern (e.g., for high-frequency, small items).
- Incremental, item-by-item processing is the desired model for both producer and consumer.
- The simplicity of direct concatenation aligns with the application logic (e.g., merging streams).
- Favor Arrays (Definite or Indefinite) when:
- The data represents a single, semantically coherent collection or list.
- The overall structure of the collection itself is meaningful.
- The total count of items is known (definite) or will eventually be known (indefinite).
- An explicit end marker within the CBOR data stream itself is required (indefinite arrays provide
0xFF
). - Compatibility with systems expecting a single top-level CBOR item is necessary.
-
Transport Layer Considerations (Framing):
- A critical point often overlooked is that CBOR Sequence format does not inherently solve message framing over stream-based transports like TCP. TCP provides a reliable byte stream, but it does not preserve message boundaries. Sending a raw CBOR Sequence (concatenated items) over TCP means the receiver might receive partial items, multiple items, or parts of multiple items in a single
read()
call. The receiver cannot reliably identify item boundaries just by looking at TCP packet arrivals. - To handle this, a framing mechanism must be implemented above TCP but below or as part of the application logic utilizing CBOR Sequences:
- Length Prefixing: Before sending each CBOR item, transmit its length as a fixed-size integer (e.g., 4 bytes network order) or a variable-length integer (like Protobuf varints). The receiver first reads the length, then reads exactly that many bytes to get the complete CBOR item. This is a common pattern but reintroduces framing overhead.
- Delimiter-Based Framing: Use a specific byte sequence (chosen carefully to avoid collision with valid CBOR data) to mark the end of one CBOR item and the start of the next. This is generally less robust and less common than length prefixing.
- Higher-Level Protocols: Utilize protocols that provide built-in message framing. WebSockets, for instance, deliver data in discrete messages; each WebSocket message could contain exactly one CBOR item from the sequence. HTTP/2 streams or QUIC streams also offer framing capabilities.
- Self-Contained Items: If each item in the sequence is itself a complex structure like a COSE object, it might be possible (though potentially complex) to parse partially received data to determine if a complete object has arrived. This relies heavily on the internal structure of each item.
- Connection Lifecycle: For simple request-response or single-shot transfers, closing the TCP connection can signal the end of the sequence or item. This is inefficient for continuous streams.
- The essential takeaway is that the application or protocol designer must explicitly choose and implement a framing strategy when using CBOR Sequences over raw stream transports.
-
Implementation Notes & Common Patterns:
- Library Support: Check if the CBOR library being used offers specific support for sequences. For example, the Go
fxamacker/cbor
library providesUnmarshalFirst
andDiagnoseFirst
functions that decode only the first item from a byte slice and return the remaining bytes, facilitating iterative processing of a sequence. StandardUnmarshal
functions in many libraries might error if trailing bytes exist after the first item, as per RFC-8949 for single items. - Buffering: When reading from a network stream or file, employ appropriate buffering. Read data chunks into a buffer, attempt to decode one or more complete CBOR items from the buffer, consume the bytes corresponding to successfully decoded items, and handle cases where an item might be split across buffer boundaries (requiring more data to be read).
- Generators/Iterators: In programming languages offering generator functions or iterators (like Python, JavaScript, C#), these constructs provide an idiomatic way to implement a CBOR Sequence decoder. The decoder function can yield each successfully decoded item one at a time, naturally supporting the incremental processing model.
- Library Support: Check if the CBOR library being used offers specific support for sequences. For example, the Go
-
Potential Pitfalls:
- Framing Neglect: The most common pitfall is assuming CBOR Sequences provide message framing over TCP or similar streams. Always implement explicit framing.
- Error Handling Brittleness: Underestimating the consequence that a single malformed item can halt the processing of the rest of the sequence. Implement input validation or accept the potential for data loss on errors.
- Security Gaps: Remember that CBOR Sequences themselves offer no cryptographic protection. If integrity, authenticity, or confidentiality are required, each item (or potentially batches of items) must be individually protected using mechanisms like COSE. Securing the sequential relationship (preventing reordering, deletion, or insertion) often requires additional application-level mechanisms like sequence numbers or chained signatures.
- Resource Exhaustion: While sequences facilitate streaming, a naive decoder implementation that buffers all decoded items in memory before processing can still lead to memory exhaustion. Ensure that processing keeps pace with decoding in a truly incremental fashion.
- Ambiguity/Compatibility: Ensure both communicating parties understand that a CBOR Sequence is being used. Employing the
application/cbor-seq
media type or the+cbor-seq
structured syntax suffix in protocols that support content types (like HTTP, CoAP) can help avoid ambiguity.
Understanding these points requires recognizing the layered nature of communication protocols. CBOR Sequence (RFC 8742) operates at the data representation layer, defining how to encode the content of a stream. Framing mechanisms (length prefixing, WebSocket messages) operate at the transport or session layer to define message boundaries. Security mechanisms like COSE operate at the application layer to protect the content. File system metadata or magic numbers provide context at the storage layer. Engineers must address requirements at each relevant layer; expecting the CBOR Sequence format alone to solve framing or security problems will lead to incomplete or flawed implementations.
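As one concrete example of framing layered on top of a sequence, the sketch below uses a 4-byte big-endian length prefix in front of each encoded item, one of the options discussed earlier. The framing scheme, function names, and prefix size are illustrative choices, not part of any CBOR specification:

```rust
use std::io::{self, Read, Write};

/// Sketch: 4-byte big-endian length prefix framing for CBOR items sent over a
/// byte stream such as TCP. The framing is an application-level choice layered
/// on top of the CBOR Sequence format.
fn write_framed<W: Write>(mut w: W, item: &[u8]) -> io::Result<()> {
    let len = u32::try_from(item.len()).expect("item too large for 4-byte prefix");
    w.write_all(&len.to_be_bytes())?; // length prefix
    w.write_all(item)                 // the encoded CBOR item itself
}

fn read_framed<R: Read>(mut r: R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;    // read the prefix first...
    let mut item = vec![0u8; u32::from_be_bytes(len_bytes) as usize];
    r.read_exact(&mut item)?;         // ...then exactly that many bytes
    Ok(item)
}

fn main() -> io::Result<()> {
    let mut buffer = Vec::new();
    write_framed(&mut buffer, &[0x63, 0x66, 0x6f, 0x6f])?; // "foo"
    write_framed(&mut buffer, &[0xf5])?;                    // true
    let mut cursor = io::Cursor::new(buffer);
    assert_eq!(read_framed(&mut cursor)?, [0x63, 0x66, 0x6f, 0x6f]);
    assert_eq!(read_framed(&mut cursor)?, [0xf5]);
    Ok(())
}
```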
Conclusion: Sequences in the CBOR Ecosystem
CBOR Sequences should be viewed as a specific tool within the broader CBOR toolkit, complementing definite-length items, indefinite-length items (including arrays and maps), and semantic tags. They are the appropriate choice when the primary goal is to stream or serialize a sequence of independent CBOR items with minimal overhead, particularly when the total count is unknown. When data represents a single logical collection, or when explicit framing within the CBOR structure itself is desired, CBOR arrays remain the more suitable option.
Looking ahead, understanding CBOR Sequences provides context for related concepts:
- CDDL (Concise Data Definition Language): While CDDL (RFC-8610) is primarily used to define the structure of single CBOR data items, conventions exist to describe the expected content of items within a sequence. This often involves defining an array structure in CDDL and adding explanatory text stating that the elements represent the items in a sequence, or using the
.cborseq
control operator for sequences embedded within byte strings.
- Deterministic Encoding: Rules for deterministic encoding, such as Core Deterministic Encoding defined in (RFC-8949), apply to each individual CBOR item within a sequence if a canonical byte representation is required for those items. The sequence structure (concatenation) is itself inherently deterministic.
By understanding the mechanics, motivations, and practical considerations of CBOR Sequences, engineers can effectively leverage this format for efficient data streaming and serialization in appropriate contexts.
CBOR Schemas with CDDL
Defining Structure Amidst Flexibility
In previous chapters, we explored the fundamental mechanics of CBOR encoding, learning how basic data types like integers, strings, arrays, and maps are represented in binary. We saw how CBOR's structure, based on Major Types and Additional Information, allows for a self-describing format that is efficient, especially in constrained environments. However, while CBOR itself defines how individual data items are encoded, it doesn't inherently restrict the overall structure of the data exchanged between systems. An application might expect a map with specific keys and value types, or an array containing a precise sequence of elements. Without a way to formally define these expectations, interoperability relies solely on human-readable documentation and the diligence of implementers – a scenario prone to errors and ambiguity.
While CBOR itself is a schemaless and self-describing format, there are many times a formal schema can be helpful to define the structure of the data being exchanged. This is especially true in cases where the data is complex, or when multiple systems need to interoperate. A schema can help ensure that all parties agree on the expected structure and types of data, reducing the risk of errors and misunderstandings.
This is where the Concise Data Definition Language (CDDL) comes in. Standardized by the IETF in RFC-8610 (and updated by RFC-9682), CDDL provides a formal, unambiguous, and human-readable notation specifically designed to express the structure of data formats that use CBOR (and, conveniently, JSON, due to its data model being a subset of CBOR's). Its primary goal is to make defining protocol messages and data formats easy and clear.
Having understood how individual CBOR items are built, we now turn to specifying what structure those items should collectively form. This chapter introduces the essentials of CDDL, focusing on equipping engineers with the practical knowledge needed to define and understand CBOR schemas. We will cover:
- The core concepts and syntax of CDDL.
- How to represent standard CBOR types and literal values.
- Defining arrays, maps, and the crucial concept of CDDL groups for sequences.
- Using operators to control occurrences, choices, and value constraints.
- Building complex schemas by composing simpler, reusable rules.
- The role of CDDL in data validation and the tooling ecosystem.
A key focus will be understanding how CDDL models sequences of CBOR items using groups, a concept distinct from CBOR arrays or maps, and how this directly relates to the sequential nature of CBOR encoding. By the end of this chapter, you should be able to read and write basic CDDL schemas to define the structure of your CBOR data, laying the foundation for more robust and interoperable systems. We will prioritize practical application over exhaustive coverage of every CDDL feature, aiming for a solid working understanding rather than covering every detail of the full specification.
Validating CDDL Interactively
Before diving into the details of CDDL, it's helpful to have a way to validate and experiment with CDDL schemas interactively. cddl.anweiss.tech provides a convenient online tool for this purpose.
Core Concepts: Rules, Assignments, and Types
CDDL achieves its goal of unambiguous structure definition through a relatively simple grammar, inspired by Augmented Backus-Naur Form (ABNF) but tailored for CBOR/JSON data models. At its heart, a CDDL specification consists of one or more rules.
Rules and Assignments
A rule defines a name for a specific data structure or type constraint. The most basic assignment operator is `=`, which assigns a name (the rule name) on the left to a type definition on the right. Rule names are typically lowercase identifiers, potentially containing hyphens or underscores.
CDDL:
; This is a comment. Comments start with a semicolon and run to end-of-line.
my-first-rule = int ; Assigns the name "my-first-rule" to the CBOR integer type.
device_id = uint ; Assigns the name "device_id" to the CBOR unsigned integer type.
CDDL is whitespace-insensitive, except within literal strings. Comments are essential for documenting the schema and explaining the intent behind rules.
Besides the basic assignment `=`, CDDL provides two other assignment operators for extending existing rules.
`/=` appends alternative choices to an existing rule. So a specification for an integer or a text string can be defined as:
CDDL:
my-rule = int / tstr
or alternatively:
CDDL:
my-rule = int
my-rule /= tstr
`//=` appends alternative group choices to an existing rule. This is used for adding choices between sequences, which we'll explore when discussing groups.
The Prelude: Standard Definitions
Every CDDL specification implicitly includes a set of predefined rules known as the prelude. This prelude defines convenient names for common CBOR types and some basic constraints. You don't need to define these yourself; they are always available:
| Category | Name | Description |
|---|---|---|
| Basic Types | bool | Boolean value |
| | uint | Unsigned integer (technically, a non-negative integer) |
| | nint | Negative integer |
| | int | Unsigned or negative integer |
| | float16 | 16-bit floating point |
| | float32 | 32-bit floating point |
| | float64 | 64-bit floating point |
| | float | Any floating point |
| | bstr | Byte string |
| | tstr | Text string |
| | any | Any single CBOR data item |
| Constants | null | Null value |
| | true | Boolean true |
| | false | Boolean false |
| | undefined | Undefined value |
| Aliases | nil | Alias for null |
| | bytes | Alias for bstr |
| | text | Alias for tstr |
These prelude types form the building blocks for more complex definitions. For instance, instead of just saying `int`, you can often be more specific using `uint` or `nint` if the sign is known. Alternatively, a way to think about the definition of `int` is that it is a union of `uint` and `nint`, but the prelude provides a more convenient shorthand:
CDDL:
int = uint / nint
float = float16 / float32 / float64
Representing Basic CBOR Types and Literals
CDDL provides direct ways to refer to the fundamental CBOR data types, largely leveraging the names defined in the prelude. It also allows specifying literal values that must appear exactly as written.
Standard Types
As seen above, the prelude provides names for most standard CBOR types:
- Integers: `uint`, `nint`, `int`.
- Floating-Point: `float16`, `float32`, `float64`, `float`. These correspond to the IEEE 754 half-, single-, and double-precision formats supported by CBOR's Major Type 7.
- Simple Values: `bool`, `true`, `false`, `null`, `undefined`. These map directly to the specific simple values in Major Type 7.
- Strings: `tstr` (UTF-8 text string, Major Type 3), `bstr` (byte string, Major Type 2).
- Catch-all: `any` represents any single, well-formed CBOR data item.
CDDL:
; Examples using standard types
message-counter = uint
temperature = float
is-active = bool
user-name = tstr
raw-payload = bstr
any-value = any ; Allows any CBOR item including `null`.
Literal Values
CDDL allows you to specify that a data item must be a specific literal value:
- Integers: `10`, `0`, `-1`, `42`.
- Floats: `1.5`, `-0.0`, `3.14159`.
- Text Strings: `"hello"`, `""`, `"a specific key"`. Text strings are enclosed in double quotes. Escaping rules similar to JSON apply within the CDDL source (e.g., `"\"quoted\""`), but the resulting CBOR string itself contains the literal characters without CDDL escapes.
- Byte Strings: `h'010203'`, `h''`. Byte strings are represented using hexadecimal notation prefixed with `h` and enclosed in single quotes.
CDDL:
; Examples using literal values
message-type = 1 ; The value must be the integer 1
protocol-version = "1.0" ; The value must be the text string "1.0"
fixed-header = h'cafef00d' ; The value must be these specific four bytes
status-code = 200 / 404 / 500 ; The value must be one of these integers
Literal values are often used as discriminators in choices or as fixed keys in maps.
Defining Collections: Arrays, Maps, and Groups
Beyond simple scalar types, CDDL provides syntax for defining the structure of CBOR arrays (Major Type 4) and maps (Major Type 5). Crucially, it also introduces the concept of groups, delimited by parentheses (`()`), to define sequences of items that are not enclosed within a CBOR array or map structure. Understanding the distinction between these is vital for correctly modeling CBOR data.
Arrays (`[]`)
CDDL uses square brackets `[]` to define CBOR arrays. Inside the brackets, you specify the type(s) of the elements that the array should contain.
CDDL:
; An array containing exactly three unsigned integers
triplet = [uint, uint, uint]
; An array containing a text string followed by any CBOR item
labelled-item = [tstr, any]
; An empty array
empty-array = []
; An array where the first element is a boolean, and the second is either an int or null
mixed-array = [bool, int / null]
Occurrence indicators (covered later) can be used to specify variable numbers of elements. The definition within the brackets describes the sequence of CBOR items expected within the CBOR array structure itself.
Maps (`{}`)
CDDL uses curly braces `{}` to define CBOR maps. Inside the braces, you define the expected key-value pairs. A key difference from JSON is that CBOR map keys can be any CBOR data type, not just strings. CDDL reflects this flexibility.
There are two primary ways to specify map members:
- `key: type`: This form requires the key to be a literal `tstr` or `int`, or a `tstr` or `int` type that has a single literal value constraint. It's a shorthand commonly used when keys are simple strings or integers.
- `keytype => valuetype`: This is the more general form. `keytype` can be any CDDL type definition (e.g., `tstr`, `uint`, `my-custom-rule`, a literal value), and `valuetype` defines the type of the corresponding value.
CDDL:
; A map with specific string keys
simple-object = {
"name": tstr,
"age": uint,
is-verified: bool ; Bare words are shorthand for "is-verified": bool
}
; A map using integer keys and the => syntax
indexed-data = {
1 => tstr, ; Key is integer 1, value is text string
2 => bstr, ; Key is integer 2, value is byte string
? 3 => float ; Key 3 is optional (using '?')
}
; A map where keys must be unsigned integers and values are text strings
; The '*' indicates zero or more occurrences of this key/value pattern
lookup-table = {
* uint => tstr
}
; An empty map
empty-map = {}
Important Considerations for Maps:
- Key Types: Remember that CBOR allows non-string keys. CDDL fully supports defining maps with integer, byte string, or even complex keys.
- Order: Although key-value pairs must be serialized in some order in the CBOR encoding, CDDL map definitions, like CBOR maps themselves, are generally considered orderless. Validation typically checks for the presence of required keys and type correctness, not the specific order in the encoded bytes. Deterministic encoding profiles, discussed later in this book, impose strict ordering rules.
- Uniqueness: The core CBOR specification doesn't strictly require map keys to be unique. However, most applications assume unique keys, and CDDL validation tools often enforce uniqueness by default or provide options to control this behavior. Relying on duplicate keys is generally discouraged.
Groups (`()`) - Defining Sequences
Perhaps the most distinctive structural element in CDDL compared to JSON-centric schema languages is the group, denoted by parentheses `()`. A group defines an ordered sequence of one or more CBOR data items without implying an enclosing CBOR array or map structure.
This concept directly mirrors how CBOR works: data items are encoded sequentially one after another. A group in CDDL allows you to name and constrain such a sequence.
CDDL:
; A group containing an unsigned integer followed by a text string
record-header = (uint, tstr)
; A group containing two floats
point-2d = (float, float)
At first glance, `point-2d = (float, float)` might look similar to `point-array = [float, float]`. However, they define fundamentally different structures:
- `point-array` defines a CBOR array (e.g., `[1.0, 2.5]`, encoded starting with `0x82`) containing two floats.
- `point-2d` defines a sequence of two CBOR floats (e.g., `1.0` followed by `2.5`, encoded as `0xf93c00` followed by `0xf94100`, assuming preferred serialization).
Why are groups useful?
- Partial Arrays: Groups can be used to define partial arrays or sequences of items without needing to wrap them in a sub-array structure. For example, a `key` type that is an array of three items, where the first is a text string and the second and third are byte strings, could be defined as:
CDDL:
key = [key-info, bstr, bstr]
key-info = tstr
But if the two byte strings are the key's conceptual "body", how would we use CDDL to make that clear? Using groups! The following definition is equivalent to the above, but it makes the relationship between the key and its body clear:
CDDL:
key = [key-info, key-body]
key-info = tstr
key-body = (bstr, bstr)
- Structuring Map Members: Groups can structure related members within a map without requiring a nested map.
CDDL:
person = {
name: tstr,
address
}
; Group defines the address structure
address = (
street: tstr,
city: tstr,
zip: uint
)
This defines that the `street`, `city`, and `zip` keys are logically related and should appear (conceptually) together, but they remain direct members of the `person` map, the above definition being equivalent to:
CDDL:
person = {
name: tstr,
street: tstr,
city: tstr,
zip: uint
}
This makes `address` a reusable group that can be referenced in multiple places, enhancing modularity and readability.
- Defining Choices Between Sequences (Group Choice `//`): Allows choosing between different sequences of items.
CDDL:
message = {
header,
payload // error-report
}
header = (version: uint, msg_id: uint)
payload = (data: bstr)
error-report = (code: int, reason: tstr)
Groups are powerful because they leverage the fundamental sequential nature of CBOR encoding. While JSON schema languages might struggle to represent bare sequences outside of arrays, CDDL embraces them, providing a precise way to model structures common in binary protocols where items follow each other without explicit delimiters.
Defining Cardinality
CDDL provides several operators to control the cardinality of elements within arrays, maps, and groups. These operators specify how many times a given type or group can occur in a sequence.
`?`: Optional (zero or one time).
CDDL:
optional-id = [ ?uint ] ; Array with 0 or 1 uint
config = ( tstr, ?bool ) ; Group: tstr, optionally followed by bool
`*`: Zero or more times.
CDDL:
int-list = [ *int ] ; Array with any number of ints (including zero)
byte-chunks = ( *bstr ) ; Group: sequence of zero or more byte strings
`+`: One or more times.
CDDL:
non-empty-list = [ +tstr ] ; Array with at least one text string
data-record = ( uint, +float ) ; Group: uint followed by one or more floats
`n*m`: Specific range (`n` to `m` times, inclusive). If `n` is omitted, it defaults to 0. If `m` is omitted, it defaults to infinity.
CDDL:
rgb-color = [ 3*3 uint ] ; Array with exactly 3 uints
short-ids = [ 1*5 int ] ; Array with 1 to 5 ints
max-10-items = [ *10 any ] ; Array with 0 to 10 items of any type
at-least-2 = [ 2* bstr ] ; Array with 2 or more byte strings
These indicators provide fine-grained control over the cardinality of elements within sequences and arrays.
Choices
CDDL offers two ways to define alternatives:
- Type Choice (`/`): Allows choosing between different types for a single data item slot.
CDDL:
identifier = tstr / uint ; An identifier is either a text string or a uint
config-value = bool / int / tstr / null
measurement = [ tstr, int / float ] ; Array: string followed by an int OR a float
- Group Choice (`//`): Allows choosing between different groups (sequences) of items. This is used when the choice affects multiple items or map members.
CDDL:
contact-method = {
(email: tstr) //
(phone: tstr) //
postal-address
}
postal-address = (street: tstr, city: tstr)
response = {
(status: 200, body: bstr) // (status: 500, error: tstr)
}
In the `response` example, the choice affects both the `status` value and the subsequent item (`body` or `error`).
Value Constraints and Control Operators
Beyond type and occurrence, CDDL allows constraints on the actual values or properties of data items using literal value ranges and control operators (often called "dot operators"). Control operators act as extensions to the core grammar, providing hooks for more sophisticated validation. The prelude defines several useful ones, and others can be defined by specific CDDL profiles or applications.
- Ranges (`..`): Defines an inclusive range for numerical types or literal values.
CDDL:
age = uint .le 120 ; Using prelude .le (less than or equal)
percentage = 0..100 ; Value must be int between 0 and 100 inclusive
temperature = -40..50 ; Value must be int between -40 and 50
http-status-ok = 200..299 ; Integer range for successful HTTP status
first-byte = 0x00..0xFF ; Integer range using hex literals
Range checks can also be combined with prelude operators like `.lt` (less than), `.le` (less than or equal), `.gt` (greater than), `.ge` (greater than or equal), `.eq` (equal), and `.ne` (not equal).
- Common Control Operators:
  - `.size uint` / `.size (min..max)`: Constrains the size (length). For `bstr` and `tstr`, it's the number of bytes. For arrays, it's the number of elements. For maps, it's the number of key-value pairs.
CDDL:
short-string = tstr .size (1..64)
sha256-hash = bstr .size 32
coordinate = [ float ] .size 2 ; Array must have exactly 2 floats
simple-map = { * tstr => any } .size (1..5) ; Map with 1 to 5 pairs
⚠️ NOTE: The whitespace before a dot operator is significant. If you get errors, check for missing whitespace.
  - `.regexp tstr`: Validates that a `tstr` matches a given regular expression pattern (the syntax follows XML Schema Definition Language (XSD) style regular expressions, as per XSD Appendix F).
CDDL:
email = tstr .regexp "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
iso-date = tstr .regexp "\d{4}-\d{2}-\d{2}"
  - `.cbor type` / `.cborseq type`: Validates that a `bstr` contains bytes that are a valid CBOR encoding of the specified `type`, or a sequence (`.cborseq`) of items matching `type`. This is useful for embedding CBOR within CBOR.
CDDL:
signed-data = {
payload: bstr .cbor any, ; Payload is bytes that decode to some CBOR item
signature: bstr
}
message-stream = bstr .cborseq log-entry ; Bytes contain a sequence of log entries
log-entry = [timestamp, tstr] ; Each log entry is a timestamp followed by a message
timestamp = uint
These operators allow schema authors to declaratively state constraints without needing to specify the validation logic itself. CDDL tools interpret these declarations to perform the checks.
✅ TIP: dCBOR, which we will discuss later in this book, also defines two additional operators, `.dcbor` and `.dcborseq`, which are exactly like `.cbor` and `.cborseq` except that they also require the encoded data item(s) to be valid dCBOR.
The following table summarizes the most frequently used operators for controlling structure and content:
Operator | Name | Meaning | Example Usage |
---|---|---|---|
? | Optional | Zero or one occurrence | ? int |
* | Zero or More | Zero or more occurrences | * tstr |
+ | One or More | One or more occurrences | + bool |
n*m | Range Occurrence | n to m occurrences | 2*4 float |
/ | Type Choice | Choose between listed types | int / tstr |
// | Group Choice | Choose between listed groups | (int) // (tstr, bool) |
.. | Value Range | Value within numerical range | 0..100 |
.size | Size Control | Constrain byte/element/pair count | tstr .size (1..10) |
.regexp | Regex Control | Match text string pattern | tstr .regexp "..." |
.cbor | Embedded CBOR | Byte string is valid CBOR of type | bstr .cbor my_type |
.cborseq | Embedded CBOR Sequence | Byte string is valid CBOR sequence | bstr .cborseq my_type |
Building and Reusing Definitions
While the basic types and operators are powerful, the real strength of CDDL for defining complex data structures lies in its ability to build definitions compositionally by referencing other rules.
Rule Referencing
Once a rule is defined with a name, that name can be used anywhere a type definition is expected in another rule. This allows breaking down complex structures into smaller, manageable, named components.
CDDL:
; Define a structure for a person's name
name-structure = {
first: tstr .size (1..50),
last: tstr .size (1..50),
? middle: tstr .size (1..50) ; Optional middle name
}
; Define contact information choices
contact-info = email-address / phone-number
email-address = tstr .regexp "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
phone-number = tstr .regexp "\+?[1-9]\d{1,14}" ; Simple E.164 regex example
; Define the main person structure, referencing other rules
person = {
name: name-structure, ; Use the name-structure rule here
age: uint .le 120, ; Note the whitespace before .le
? contact: contact-info ; Optional contact info, using the choice rule
}
In this example, the `person` rule is defined using references to `name-structure` and `contact-info`. This makes the `person` definition concise and readable. If the structure of a name or contact information needs to change, the modification only needs to happen in one place (the `name-structure` or `contact-info` rules), improving maintainability.
Modularity and Readability
This compositional approach is key to managing complexity in large data format specifications. By breaking down the overall structure into logical, named sub-components (rules and groups), CDDL schemas become:
- More Readable: Each rule focuses on a specific part of the data structure.
- More Maintainable: Changes to a shared structure are localized.
- More Reusable: Common structures (like timestamps, identifiers, addresses) can be defined once and referenced wherever needed.
This mirrors good software engineering practices, applying principles of modularity and abstraction to data definition. This compositional design aids in creating the unambiguous descriptions that are a primary goal of CDDL.
Practical Example: The "Gadget" Revisited
Let's revisit the nested JSON/CBOR example from a previous chapter and define its structure using CDDL:
JSON/CBOR Diagnostic:
{
"name": "Gadget",
"id": 12345,
"enabled": true,
"parts": [ "bolt", "nut" ],
"spec": { "size": 10.5, "data": h'010000ff' }
}
CDDL Definition:
CDDL:
; Define the top-level type for validation (often the first rule)
top-level = gadget
; Define the main gadget structure
gadget = {
"name": tstr .size 1.., ; Name must be at least 1 byte
"id": uint,
"enabled": bool,
"parts": [ + tstr ], ; Array of one or more text strings
"spec": gadget-spec ; Reference the gadget-spec rule
}
; Define the structure for the specification sub-object
gadget-spec = {
"size": float, ; Allows float16, float32, or float64
"data": bstr ; The raw binary data
}
This CDDL schema precisely defines the expected structure:
- A map (`gadget`) with five required keys: `"name"`, `"id"`, `"enabled"`, `"parts"`, and `"spec"`.
- `"name"` must be a non-empty text string.
- `"id"` must be an unsigned integer.
- `"enabled"` must be a boolean.
- `"parts"` must be an array containing one or more text strings.
- `"spec"` must be a map conforming to the `gadget-spec` rule.
- The `gadget-spec` map requires keys `"size"` (a float) and `"data"` (a byte string).
Notice how the CDDL directly defines the `data` field as `bstr`, reflecting CBOR's native handling of binary data, unlike the base64 encoding necessary in the JSON representation. This schema clearly communicates the expected format for any system processing "gadget" data.
Validation and the Tooling Ecosystem
Defining a schema is only part of the story. A major practical benefit of using CDDL is the ability to automatically validate CBOR data against a schema.
The Concept of Validation
Validation is the process of checking whether a given CBOR data instance conforms to the rules specified in a CDDL schema. Conceptually, a CDDL validator tool takes two inputs:
- The CDDL schema definition (e.g., a `.cddl` file).
- The CBOR data instance (usually as raw bytes).
The validator then processes the CBOR data according to the rules defined in the schema, starting from a designated root rule (often the first rule in the file, or explicitly specified). It outputs whether the data is valid according to the schema, often providing details about any discrepancies if validation fails.
Benefits of Validation
Automated validation provides significant benefits:
- Error Detection: Catch malformed data early, whether from external sources or internal bugs.
- Interoperability: Ensure that systems exchanging CBOR data adhere to the agreed-upon structure.
- API Contract Enforcement: Use CDDL schemas as machine-readable contracts for APIs that consume or produce CBOR.
- Security: Validate that incoming data conforms to expected structural constraints, preventing certain classes of injection or processing errors. While not a substitute for comprehensive security analysis, structural validation is a valuable defense layer.
Tooling Ecosystem
A growing ecosystem of tools and libraries supports working with CDDL. While this book won't provide tutorials for specific tools, it's important to be aware of their existence:
-
Implementations: Libraries for parsing CDDL and validating CBOR/JSON data are available in various languages, including Rust (
cddl-rs
), Node.js (cddl
), and potentially others like Python, Go, or Java (e.g., via wrappers likecddl2java
mentioned in ). -
Functionality: Common features include:
- Parsing CDDL schemas into an Abstract Syntax Tree (AST).
- Validating CBOR data against a CDDL schema.
- Validating JSON data against a CDDL schema.
- Checking CDDL syntax conformance.
- Some tools might offer experimental features like generating documentation or code stubs, though code generation is not a primary design goal of CDDL itself.
-
Online Tools: Resources like cddl.anweiss.tech offer CDDL validation, allowing interactive experimentation.
The availability of these tools enables a schema-driven development workflow. The CDDL schema can serve as a central artifact for documentation, automated testing (validation), runtime checks, and ensuring consistency across different parts of a system or between collaborating teams. This elevates CDDL from merely a descriptive language to an active component in building robust CBOR-based applications.
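To make that workflow concrete, here is a minimal sketch of validating the "gadget" CBOR from the earlier example against its schema using the `cddl` crate (the Rust implementation mentioned above). The helper name and signature (`validate_cbor_from_slice`, including its optional feature-flag argument) have varied between crate versions, and `gadget.cbor` is a hypothetical test fixture, so treat this as an illustration of schema-driven validation rather than exact API documentation.
Rust:
// Sketch only: check the cddl crate's documentation for the exact current API.
fn main() {
    // The schema from the "Gadget" example, embedded as a string.
    let schema = r#"
        gadget = {
            "name": tstr .size (1..64),
            "id": uint,
            "enabled": bool,
            "parts": [ + tstr ],
            "spec": gadget-spec
        }
        gadget-spec = {
            "size": float,
            "data": bstr
        }
    "#;
    // CBOR-encoded gadget bytes, e.g. produced by another system or a test fixture.
    let cbor_bytes = std::fs::read("gadget.cbor").expect("read gadget.cbor");
    // Validate the CBOR instance against the schema (the first rule is typically the root).
    match cddl::validate_cbor_from_slice(schema, &cbor_bytes, None) {
        Ok(()) => println!("gadget.cbor conforms to the schema"),
        Err(e) => eprintln!("validation failed: {e:?}"),
    }
}
The same schema string can then be reused in tests, CI checks, and runtime ingestion paths, which is what makes the schema the central artifact of this workflow.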
Conclusion: Laying the Schema Foundation
This chapter has introduced the Concise Data Definition Language (CDDL) as the standard way to define the structure of CBOR data. We've moved from understanding how individual CBOR items are encoded to specifying what overall structure those items should form in a given application or protocol.
We covered the core concepts: rules defined using assignments (`=`, `/=`, `//=`), the use of standard types from the prelude (`uint`, `tstr`, `bool`, etc.), and the specification of literal values. We explored how CDDL defines CBOR arrays (`[]`) and maps (`{}`), noting the flexibility of map keys in CBOR. Crucially, we delved into CDDL groups (`()`) and their role in defining sequences of items without explicit CBOR delimiters, highlighting how this feature directly maps to CBOR's sequential encoding and distinguishes CDDL from JSON-centric schema languages. We also learned how to control structure using occurrence indicators (`?`, `*`, `+`, `n*m`), define choices (`/`, `//`), and apply constraints using value ranges (`..`) and practical control operators like `.size`, `.regexp`, and `.cbor`. Finally, we saw how rule referencing enables modular, readable, and reusable schema design, and how the existence of validation tools makes CDDL a practical asset for development.
Best Practices for Writing CDDL:
As you start defining your own CBOR structures with CDDL, keep these practices in mind:
- Clarity over Brevity: Prioritize making the schema easy to understand. Use comments (`;`) liberally to explain intent and choices.
- Meaningful Names: Choose descriptive names for rules that reflect their purpose.
- Modularity: Break down complex structures into smaller, well-named rules. This improves readability, maintainability, and reuse.
- Start Specific, Generalize Carefully: Define the expected structure as precisely as possible initially. Use broad types like `any` or wide occurrence ranges (`*`) only when truly necessary, as overly permissive schemas offer less validation value.
- Consider the CBOR Data Model: Think about how your CDDL definition maps to the underlying CBOR types and encoding, especially regarding the distinction between groups (`()`) and container types like arrays (`[]`) and maps (`{}`).
With the fundamentals covered here, you are equipped to use CDDL to bring clarity and rigor to your CBOR-based data formats. This foundation is essential as we move forward to explore more advanced CBOR topics. CDDL schemas are instrumental in understanding and validating the structures used within CBOR Tags, ensuring the correctness of data before applying deterministic encoding rules (dCBOR), and understanding the precise layout of nested structures like Gordian Envelope.
Determinism: Why Consistent Encodings Matter
1.1 Introduction: The Illusion of Sameness
Consider a common scenario in software engineering: comparing two data structures that represent the same logical information. Perhaps they are configuration objects loaded from different sources, snapshots of system state taken at different times but believed to be identical, or messages exchanged between distributed components. A developer might reasonably expect that if these structures hold the same values—field for field, element for element—a simple byte-wise comparison of their serialized forms would confirm their equality. Yet, surprisingly often, this comparison fails. Two objects, logically identical, produce different sequences of bytes when serialized.
This discrepancy arises because many common data serialization formats, including text-based ones like JSON and even efficient binary formats like Protocol Buffers or CBOR itself, allow flexibility in how data is represented. The same logical map might have its keys ordered differently; the same number might be encoded with varying precision or length; the same string might have subtle variations in character encoding or normalization. While these variations are often semantically irrelevant at the data model level, they result in distinct byte sequences.
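To see how easily this variability creeps in, consider the following sketch, which uses only the Rust standard library and a deliberately naive textual "serializer" (the `serialize_naive` and `serialize_sorted` helpers are invented for illustration; no CBOR is involved). Two maps holding identical entries can serialize differently simply because `HashMap` iteration order is unspecified, while sorting the keys first makes the output a pure function of the content.
Rust:
use std::collections::{BTreeMap, HashMap};
/// Naive "serialization": write key=value pairs in whatever order the HashMap yields.
fn serialize_naive(map: &HashMap<&str, i64>) -> String {
    map.iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}
/// Deterministic variant: iterate in sorted key order via a BTreeMap.
fn serialize_sorted(map: &HashMap<&str, i64>) -> String {
    let sorted: BTreeMap<_, _> = map.iter().collect();
    sorted
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}
fn main() {
    let mut a = HashMap::new();
    a.insert("name", 1);
    a.insert("id", 2);
    a.insert("enabled", 3);
    // Logically identical content, built in a different insertion order.
    let mut b = HashMap::new();
    b.insert("enabled", 3);
    b.insert("id", 2);
    b.insert("name", 1);
    // May or may not be equal: HashMap iteration order is unspecified and varies.
    println!("naive equal:  {}", serialize_naive(&a) == serialize_naive(&b));
    // Always equal: the output is a deterministic function of the logical content.
    println!("sorted equal: {}", serialize_sorted(&a) == serialize_sorted(&b));
}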
This phenomenon, where serialization yields inconsistent byte outputs for logically equivalent inputs, can be deeply problematic. Processes downstream that rely on these byte representations—such as cryptographic hashing, digital signature verification, distributed consensus mechanisms, or simple data comparison—may behave unpredictably or fail entirely. This variability acts much like a hidden, uncontrolled input, introducing non-determinism into systems that are otherwise expected to be predictable, leading to bugs that are notoriously difficult to diagnose and fix, akin to issues stemming from uninitialized memory or thread race conditions. Understanding and controlling this variability through deterministic encoding is therefore not merely an academic exercise but a practical necessity for building robust, secure, and interoperable systems. This chapter explores the fundamental need for deterministic encoding, the challenges involved, and the landscape of previous efforts to achieve it.
1.2 Defining Deterministic Encoding
At its core, Deterministic Encoding is an encoding process designed to eliminate ambiguity. It employs specific rules and makes deliberate choices during serialization to ensure that logically equivalent inputs at the data model level always produce the exact same sequence of encoded bytes. This is distinct from the general term Serialization, which simply refers to the process of representing data model items (like numbers, strings, arrays, maps) as encoded data items, potentially allowing for multiple valid representations.
The term Canonicalization is often used synonymously with deterministic encoding, emphasizing the goal of producing a single, standard, or "canonical" form for any given piece of data. Several systems aim for this canonical property, where the serialization guarantees byte consistency for the same in-memory data structure, regardless of the implementation or environment.
Within the CBOR ecosystem (RFC 8949), related concepts exist that represent steps towards reducing variability, though they don't necessarily guarantee full cross-implementation determinism on their own:
- Preferred Serialization: A recommendation aiming for the shortest possible encoding for a data item's head (the initial bytes indicating type and length/value), without expending extra effort like sorting map keys.
- Basic Serialization: Builds on Preferred Serialization by adding the constraint that indefinite-length encoding (where the total length isn't known upfront) must not be used for strings, arrays, or maps.
While Preferred and Basic Serialization reduce encoding variability, true Deterministic Encoding, such as CBOR Common Deterministic Encoding (CDE), imposes stricter rules, like mandatory map key sorting, to achieve the goal of a unique byte sequence for equivalent data.
Understanding why these stricter rules are necessary requires examining the common sources of non-determinism in data serialization:
- Map/Object Key Order: In data models like JSON objects or CBOR maps, the order of key-value pairs is generally considered semantically insignificant. `{"name": "Alice", "id": 123}` is logically the same as `{"id": 123, "name": "Alice"}`. However, without a rule mandating a specific order (e.g., sorting keys alphabetically), serializers might output these pairs in different orders, leading to different byte sequences. This is a major source of non-determinism in formats like JSON, Protobuf, and basic CBOR. Deterministic schemes typically mandate sorting keys based on a well-defined comparison, such as lexicographical sorting of the UTF-16 key strings (as in JCS) or byte-wise lexicographical sorting of the encoded keys (as in CBOR CDE).
Number Representation: Numbers can often be encoded in multiple ways:
- Integers: Small integers might fit into short forms, but longer encodings could technically be valid in some formats. Varint encodings (used in Protobuf) can sometimes represent the same number using different byte lengths, especially if leading zeros aren't strictly prohibited. CBOR's Preferred Serialization aims for the shortest form, but deterministic rules make this mandatory. Arbitrary-precision integers (bignums) also need clear rules to avoid ambiguity with standard integer types.
- Floating-Point Numbers: These present significant challenges. IEEE 754 allows multiple binary representations (e.g., half, single, double precision), and the same value might be representable in several. Special values like NaN (Not a Number) can have different binary payloads, and positive zero (+0) and negative zero (−0) have distinct representations. Deterministic schemes must specify canonical forms, such as always using the shortest valid representation and defining a single canonical NaN.
-
String Encoding & Unicode: While UTF-8 is the dominant encoding today, subtleties remain. The most significant is Unicode normalization. A single character with an accent (like 'é') can often be represented either as a single precomposed character (U+00E9) or as a base character ('e', U+0065) followed by a combining accent mark (U+0301). These result in different byte sequences but represent the same visual character. Some canonicalization schemes require normalizing strings to a specific form (like NFC or NFD) before encoding, while others, like JCS and CBOR CDE, explicitly avoid this step, considering it an application-level concern due to complexity and potential information loss.
-
Indefinite Lengths: Formats like CBOR allow encoding arrays, maps, and strings without specifying their length upfront, using a special "break" marker to signal the end. This "indefinite-length" encoding is useful for streaming but introduces non-determinism, as the same data could be encoded with either a definite or indefinite length. Deterministic schemes like CBOR CDE typically disallow indefinite-length items.
-
Default Values / Optional Fields: In formats like Protobuf, if a field is set to its default value (e.g., an integer field to 0), it might be omitted entirely during serialization, or it might be explicitly included. Since the deserialized result is the same in either case (the field has the default value), this creates representational ambiguity. Deterministic schemes often require omitting default values, similar to how ASN. DER forbids encoding default values.
-
Extensibility Issues (Tags/Unknown Fields): How a format handles data not explicitly defined in its schema can impact determinism. Protobuf preserves "unknown fields" encountered during parsing, which aids forward/backward compatibility but significantly hinders canonicalization because the type (and thus canonical representation) of these fields isn't known. CBOR uses tags for extensibility; while the content within a tag might be canonicalized according to standard rules, ensuring that application-level data is consistently mapped to specific tags and representations might require additional application-specific rules (sometimes called Application-Level Deterministic Representation or ALDR).
It becomes clear that achieving deterministic encoding involves navigating a spectrum of choices. At one end lies basic, potentially non-deterministic serialization. Moving along the spectrum, we encounter implementation-specific determinism (where a single library might be deterministic but not interoperable), recommended practices like CBOR's Preferred or Basic Serialization, and finally, fully specified canonical forms like ASN.1 DER, JCS, BCS, or CBOR CDE, which aim for a single, universally verifiable representation. The choice of where to be on this spectrum depends heavily on the application's requirements for consistency, interoperability, and security.
1.3 The Motivation: Why Determinism is Crucial
The quest for deterministic encoding is driven by the significant problems that arise from its absence. When the same logical data can manifest as different byte sequences unpredictably, it introduces a subtle but pervasive form of non-determinism into computing systems, leading to a range of issues that can be difficult for engineers to anticipate and resolve.
One major consequence is the emergence of hard-to-diagnose bugs. Systems relying on byte-wise comparisons or hashing of serialized data may fail intermittently or produce inconsistent results depending on factors like which library version is used, the internal state of the serializer (e.g., hash table iteration order affecting map key output), or even timing variations. Debugging such issues is challenging because the root cause lies not in the application logic itself, but in the seemingly innocuous step of data serialization. Failures might appear non-reproducible until the underlying serialization variability is understood and controlled.
Furthermore, non-deterministic serialization can undermine security guarantees. Digital signatures, for instance, rely on the verifier being able to compute the exact same hash of the message data as the signer. If the data is re-serialized between signing and verification, and the serialization is non-deterministic, the hashes will mismatch, causing valid signatures to fail verification. This not only breaks functionality but could potentially be exploited in certain scenarios. Similarly, consensus protocols in distributed systems depend on nodes agreeing on the state based on identical data representations; non-determinism breaks this agreement.
Inefficiency is another consequence. Caching mechanisms often use hashes of data as keys. If logically identical data produces different serialized forms (and thus different hashes), caches will suffer unnecessary misses, leading to redundant computation or data transfer. Content-addressable storage systems lose their deduplication benefits if identical content doesn't serialize identically.
Finally, non-determinism severely hinders interoperability. If different systems, or even different versions of the same software, serialize the same data differently, they may be unable to reliably communicate or agree on shared state. This is particularly problematic in heterogeneous environments or long-lived systems where components evolve independently. Protocol Buffers' documentation explicitly warns that its default "deterministic" mode is not canonical across languages or versions precisely for these reasons.
These diverse problems highlight a fundamental point: non-determinism in serialization erodes the foundation of trust in computational processes. Digital systems rely on predictable, repeatable behavior to function correctly and securely. When the basic representation of data—its byte sequence—becomes unpredictable for the same logical content, the operations built upon that representation (comparison, hashing, verification, agreement) become inherently unreliable. This variability undermines the integrity and dependability required for critical applications, from secure communication and financial transactions to distributed databases and verifiable records. Achieving deterministic, canonical encoding is therefore essential for building systems where computational results can be consistently verified and trusted. The need for deterministic processes is not unique to serialization; it's a recurring theme in diverse fields like coding theory, machine learning, and state machine design, reflecting a general need for predictable and reliable computation.
1.4 Key Use Cases Demanding Determinism
The need for deterministic encoding is not theoretical; it is driven by the practical requirements of numerous critical computing applications. Several key use cases fundamentally depend on the ability to produce a consistent, predictable byte representation for data.
1.4.1 Distributed Consensus
Distributed systems, ranging from replicated databases to modern blockchain networks, rely on consensus algorithms (such as Paxos, Raft, or variants of Byzantine Fault Tolerance (BFT)) to ensure that multiple independent nodes agree on a single, consistent state or order of operations. This agreement process frequently involves nodes proposing, validating, and replicating data structures like transaction logs, state updates, or proposed blocks.
A core requirement for these algorithms is that all non-faulty nodes must reach the same decision based on the same information. Often, this involves nodes independently processing received data, serializing it (or parts of it), and then hashing the result to compare with hashes received from other nodes or to include in subsequent proposals. If the serialization process is non-deterministic, two nodes processing the exact same logical transaction or block data could generate different byte sequences. These different sequences would produce different cryptographic hashes, leading the nodes to disagree, even though they started with identical information. This disagreement prevents the system from reaching consensus, potentially halting progress or leading to inconsistent states across nodes.
Blockchains are a prominent example where this is critical. In a decentralized network without a central authority, nodes must independently verify transactions and agree on the contents of new blocks to add to the chain. This verification relies heavily on cryptographic hashing and consistent data representation. Deterministic serialization ensures that all nodes compute the same hashes for the same transactions and blocks, enabling the consensus mechanism (whether Proof-of-Work, Proof-of-Stake, or BFT-based) to function correctly and maintain the integrity of the shared ledger. Formats like Binary Canonical Serialization (BCS) were explicitly designed with this use case in mind, providing guaranteed byte consistency for consensus in blockchain environments.
In essence, for decentralized systems that establish trust algorithmically through consensus protocols, deterministic encoding is not merely a technical optimization but a foundational requirement. It ensures that all participants operate on verifiably identical representations of shared data, making algorithmic agreement possible and enabling trust in the absence of a central coordinator. Without it, the entire model of decentralized consensus breaks down.
1.4.2 Verifiable Data and Digital Signatures
Digital signatures are a cornerstone of modern digital security, providing three key properties:
- Authenticity: Verifying the identity of the signer.
- Integrity: Ensuring the data has not been altered since it was signed.
- Non-repudiation: Preventing the signer from later denying that they signed the data.
The process typically involves creating a cryptographic hash (a fixed-size digest) of the data to be signed, and then encrypting this hash using the signer's private key. To verify the signature, a recipient recalculates the hash of the received data using the same hash algorithm, decrypts the received signature using the signer's public key, and compares the recalculated hash with the decrypted hash. If they match, the signature is valid.
This entire process hinges on one critical assumption: both the signer and the verifier must be able to produce the exact same hash from the same logical data. Since cryptographic hashes are extremely sensitive to input changes (a single bit flip drastically changes the output), the byte sequence fed into the hash function must be identical for both parties.
If the data is serialized non-deterministically, the signer might serialize the data one way, calculate a hash, and sign it. The verifier might receive the same logical data, but upon re-serializing it (perhaps using a different library or version), obtain a different byte sequence. This different byte sequence will produce a different hash, causing the signature verification to fail, even though the data's integrity was never compromised and the signature itself is cryptographically sound. This necessitates a deterministic, canonical representation of the data before hashing and signing.
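A small sketch makes the failure mode concrete. It uses the `sha2` crate (an assumed dependency for illustration; it is not otherwise part of this book's examples) to hash two serializations of the same logical record that differ only in key order:
Rust:
use sha2::{Digest, Sha256};
fn main() {
    // Two serializations of the same logical record: identical fields, different key order.
    let signer_bytes = br#"{"amount":100,"currency":"USD"}"#;
    let verifier_bytes = br#"{"currency":"USD","amount":100}"#;
    let signer_hash = Sha256::digest(signer_bytes);
    let verifier_hash = Sha256::digest(verifier_bytes);
    // The digests are completely different, so a signature computed over signer_hash
    // can never be verified against verifier_hash, even though no data changed.
    println!("hashes match: {}", signer_hash == verifier_hash);
}
Because the digests differ, a signature made over the first form cannot verify against the second, which is exactly why both parties must hash an agreed-upon canonical encoding.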
This requirement is crucial for applications like Verifiable Credentials (VCs), where data integrity proofs (often digital signatures) are used to ensure the authenticity and tamper-evidence of claims. Standards like the W3C Data Integrity specification explicitly involve transforming data into a canonical form before hashing and signing/proving.
An important advantage of using canonicalization in this context is that it decouples the format used for signing from the format used for transmission or storage. Data can be signed based on its canonical form, but then transmitted or displayed in a more convenient, possibly non-canonical format (e.g., pretty-printed JSON for readability). The verifier simply needs to re-canonicalize the received data according to the agreed-upon rules before performing the verification step. This avoids forcing systems to use potentially inefficient or human-unfriendly formats solely for the purpose of signing, offering flexibility without sacrificing security.
1.4.3 Content-Addressable Systems and Caching
Content-Addressable Storage (CAS) is a storage paradigm where data is identified and retrieved based on a cryptographic hash of its content, rather than a user-assigned name or location (like a file path). The hash acts as the unique address for the data. This approach inherently relies on deterministic encoding: the same content must always produce the same hash to be reliably stored and retrieved.
CAS offers several significant advantages:
- Automatic Deduplication: If the same piece of content is stored multiple times, it will always generate the same hash. CAS systems recognize this and store the actual data only once, simply adding references to the existing content. This can lead to substantial storage savings, especially in backup systems or large datasets with redundant information.
- Data Integrity Verification: The content hash serves as a built-in checksum. When data is retrieved, its hash can be recalculated and compared to the requested address (hash). A mismatch immediately indicates data corruption.
- Suitability for Distributed Systems: Content addressing works well in distributed or decentralized environments (like IPFS or Git) because data can be located and retrieved based solely on its hash, without needing a central directory or knowledge of specific server locations.
Deterministic encoding underpins the reliability of CAS. If serialization were non-deterministic, identical logical content could produce different hashes, defeating deduplication and potentially causing data retrieval issues. Furthermore, trustworthy deduplication relies on the guarantee that only truly identical data maps to the same hash. While cryptographic hash collisions are extremely rare with strong functions, non-deterministic serialization could theoretically create attack vectors if an adversary could manipulate the serialization process to force a hash collision between different logical data, potentially tricking a system into retrieving incorrect information. Deterministic encoding ensures that the hash reliably represents the logical content, making deduplication both efficient and secure.
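As a sketch of the idea, the toy store below keys blobs by the SHA-256 of their bytes (again using the `sha2` crate as an assumed dependency; the `ContentStore` type is invented for illustration). Deduplication falls out naturally: storing the same canonical bytes twice creates only one entry.
Rust:
use sha2::{Digest, Sha256};
use std::collections::HashMap;
/// A toy content-addressable store: blobs are keyed by the SHA-256 of their bytes.
struct ContentStore {
    blobs: HashMap<[u8; 32], Vec<u8>>,
}
impl ContentStore {
    fn new() -> Self {
        Self { blobs: HashMap::new() }
    }
    /// Store a blob and return its content address (the hash of its bytes).
    fn put(&mut self, bytes: &[u8]) -> [u8; 32] {
        let mut key = [0u8; 32];
        key.copy_from_slice(&Sha256::digest(bytes));
        // Identical content always produces the same key, so it is stored only once.
        self.blobs.entry(key).or_insert_with(|| bytes.to_vec());
        key
    }
    fn get(&self, key: &[u8; 32]) -> Option<&[u8]> {
        self.blobs.get(key).map(Vec::as_slice)
    }
}
fn main() {
    let mut store = ContentStore::new();
    let a = store.put(b"the same canonical bytes");
    let b = store.put(b"the same canonical bytes"); // deduplicated: no new entry
    assert_eq!(a, b);
    assert_eq!(store.blobs.len(), 1);
    assert!(store.get(&a).is_some());
}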
Similarly, caching mechanisms benefit greatly from deterministic encoding. Hashes derived from canonical representations of data serve as excellent cache keys. When a system needs to check if a piece of data (e.g., a database query result, a complex object, a web resource bundle) is already in the cache, it can compute the canonical hash of the data and look it up. If the serialization were non-deterministic, logically identical data might produce different hashes upon subsequent requests, leading to cache misses and forcing redundant computations or data fetches. Content-addressable web bundles, for example, leverage this principle to improve browser cache efficiency by ensuring that a bundle's content hash only changes if the content itself changes. Deterministic behavior is also a sought-after property in lower-level caching systems within hardware and operating systems to ensure predictable performance.
1.4.4 Other Applications
Beyond these major areas, deterministic encoding provides benefits in several other contexts:
- Secure Comparison and Fingerprinting: Comparing large datasets or complex objects for equality can be done efficiently and securely by comparing the hashes of their canonical representations. This avoids transmitting the full data and ensures that only truly identical data matches. This is useful for verifying configuration consistency, detecting changes in stored records, or fingerprinting data for various tracking purposes.
- Testing and Diagnostics: In automated testing, ensuring that a given input always produces the exact same byte output simplifies verification, allowing for simple byte-wise comparisons of expected versus actual results. For diagnostics, presenting logged data or system states in a canonical form minimizes inconsequential differences (like map key order), making it easier for humans or tools to spot meaningful changes. It can also help in reproducing bugs that might otherwise seem non-deterministic due to variations introduced by serialization.
- Object Hashing: Creating consistent, cross-language hash values for complex, nested data structures (often represented as combinations of lists, maps, and primitive types in memory) requires a canonical representation strategy. This is essential for using such objects reliably in hash tables or other contexts requiring stable identifiers derived from the object's state. Naive approaches like hashing the default string representation often fail due to non-determinism.
1.5 The Challenges of Achieving Determinism
While the need for deterministic encoding is clear, achieving it presents several non-trivial technical challenges. These stem from the inherent ambiguities in data representation and the need to impose strict, unambiguous rules across diverse platforms and implementations. Overcoming the sources of non-determinism identified earlier requires careful algorithmic design and often involves trade-offs.
-
Map Key Sorting: Defining a consistent order for map keys requires specifying a stable sorting algorithm that works identically everywhere. Lexicographical sorting is a common choice. However, the details matter: should the sort operate on the raw key strings (e.g., based on UTF-16 code units, as in JCS) or on the encoded byte representation of the keys (as in CBOR CDE)? Each choice has implications for implementation complexity and performance. Furthermore, sorting adds computational overhead compared to simply iterating through a map's elements in whatever order the underlying implementation provides.
-
Floating-Point Representation: The complexities of IEEE 754 floating-point arithmetic make canonicalization difficult. Rules must precisely define how to handle different precisions (half, single, double), ensuring the shortest valid representation is chosen. Canonical forms must be defined for special values like NaN (potentially collapsing different NaN payloads into one) and distinguishing +0 from −0. An additional complication is that floating-point calculations themselves can sometimes yield slightly different results across different hardware platforms or compiler optimizations, meaning the values input to the serializer might differ even before encoding rules are applied.
Number Representation Ambiguity: A value like '42' could potentially be represented as a standard integer, a floating-point number, or even a bignum in some formats. A canonical scheme must provide unambiguous rules for choosing the representation, such as always preferring the simplest integer type if the value fits.
-
Unicode Normalization: Deciding how to handle different Unicode representations of the same visual character is a significant challenge. Enforcing a specific Normalization Form (like NFC or NFD) ensures that visually identical strings have the same canonical byte sequence, but it adds a potentially costly processing step and might not be desirable in all applications (e.g., if preserving the exact original byte sequence is important). Schemes like JCS and CBOR CDE deliberately omit mandatory Unicode normalization, pushing the responsibility to the application layer if needed. This simplifies the canonicalization protocol but means that
{"café": 1}
and{"cafe\u0301": 1}
might have different canonical forms despite looking identical. -
Handling Extensibility: Integrating extensibility mechanisms (like Protobuf's unknown fields or CBOR's tags) with canonicalization is difficult. Preserving unknown fields, crucial for Protobuf's compatibility story, fundamentally conflicts with canonicalization because their type and structure aren't known. For CBOR tags, while the tag's content can often be canonicalized using standard rules, ensuring the consistent use and representation of the tags themselves often requires application-level agreements (ALDR) beyond the scope of the core canonical encoding specification. A truly universal canonical format might need to restrict or disallow unknown data or require extensions to define their own canonicalization rules.
-
Performance Overhead: Implementing the rules required for canonicalization—sorting keys, normalizing numbers, checking for shortest forms, potentially normalizing Unicode—inevitably adds computational cost compared to simpler serialization methods that don't enforce these constraints. This overhead might be negligible in many applications but can be significant in high-throughput or resource-constrained environments.
This inherent trade-off between the robustness offered by canonicalization and the potential performance impact is a key consideration. Systems must carefully evaluate their specific needs. The desire for performance often leads developers to use simpler, potentially non-deterministic serialization methods by default. This explains why canonical encoding isn't universally applied and why formats like CBOR offer different levels of determinism (Preferred, Basic, CDE), allowing applications to choose the appropriate balance between strictness and speed.
1.6 Surveying the Landscape: Previous Efforts
Over the years, various efforts have been made to address the need for deterministic or canonical representations, particularly for common data formats used in distributed systems and security protocols. Examining these provides valuable context and highlights recurring patterns and challenges.
1.6.1 JSON's Canonicalization Conundrum
JSON (JavaScript Object Notation), despite its ubiquity, lacks a built-in canonical representation defined in its base specification (RFC 8259). This omission means that naive serialization of JSON objects can easily lead to non-deterministic output due to varying property order and potential whitespace differences.
Several approaches have emerged to fill this gap:
-
RFC 8785: JSON Canonicalization Scheme (JCS): This is arguably the most prominent standard for JSON canonicalization. JCS achieves determinism by defining a strict set of rules:
- Data Subset: Input JSON must conform to the I-JSON profile (RFC 7493), which disallows duplicate object keys and imposes limits on number precision.
- Primitive Serialization: Relies on the well-defined serialization of primitives (strings, numbers, booleans, null) specified by ECMAScript. Whitespace between tokens is forbidden.
- String Handling: Specifies precise escaping rules for control characters and special characters like backslash and double-quote. Notably, it does not mandate Unicode normalization.
- Number Handling: Numbers are serialized according to ECMAScript rules, effectively using IEEE 754 double-precision representation.
- Object Key Sorting: Object properties MUST be sorted recursively based on the lexicographical order of their keys, comparing the keys as sequences of UTF-16 code units.
- Array Element Order: The order of elements within JSON arrays is preserved.
- Encoding: The final output must be UTF-8 encoded.
JCS is published as an Informational RFC, meaning it's not an IETF standard but represents a community consensus. It has seen adoption in specific contexts, such as for JSON Web Key (JWK) Thumbprints (RFC 7638) and systems like Keybase, and libraries exist in multiple languages. However, it is not universally adopted across the JSON ecosystem, leading to a degree of fragmentation where applications might implement their own ad-hoc canonicalization or use different schemes.
- ObjectHash: This represents a different philosophy. Instead of producing a canonical text representation, ObjectHash computes a cryptographic hash directly from the semantic structure of a JSON-like object (lists, dictionaries, primitives). It defines specific hashing procedures for each type, including sorting dictionary keys before hashing. A key feature is its support for redaction: parts of a structure can be replaced by the hash of the redacted part, allowing verification of the overall structure even with hidden data. This approach avoids intermediate text serialization altogether.
- Other Ad-hoc Methods: Many systems implement simpler, non-standardized canonicalization, often just involving sorting object keys alphabetically before using a standard JSON serializer. While better than no canonicalization, these methods lack the precise rules for primitive serialization found in JCS and may not be interoperable.
The situation for JSON highlights the difficulty of retrofitting canonicalization onto a widely adopted, flexible format without a single, mandated standard.
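To see what JCS-style rules buy in practice, here is a minimal Rust sketch (not part of any of the specifications above) that approximates canonical output for simple, ASCII-keyed data using the `serde_json` crate. It assumes `serde_json` is built without its `preserve_order` feature, so object keys are stored in a sorted `BTreeMap`; full RFC 8785 conformance (UTF-16 code-unit key ordering, ECMAScript number formatting) requires more than this.

```rust
use serde_json::Value;

fn main() {
    // Two logically identical objects written with different key order and whitespace.
    let a: Value = serde_json::from_str(r#"{ "b": 2, "a": 1 }"#).unwrap();
    let b: Value = serde_json::from_str(r#"{"a":1,"b":2}"#).unwrap();

    // serde_json's default (BTreeMap-backed) object sorts keys on serialization,
    // and `to_string` emits no insignificant whitespace.
    let canonical_a = serde_json::to_string(&a).unwrap();
    let canonical_b = serde_json::to_string(&b).unwrap();

    assert_eq!(canonical_a, r#"{"a":1,"b":2}"#);
    assert_eq!(canonical_a, canonical_b);
}
```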
Comparison of Selected JSON Canonicalization Approaches
Feature | RFC 8785 JCS | ObjectHash | Ad-hoc Sorting + JSON.stringify |
---|---|---|---|
Approach | Canonical Text Serialization | Direct Cryptographic Hashing of Structure | Text Serialization after Key Sort |
Output | UTF-8 JSON Text | Cryptographic Hash (e.g., SHA-256) | JSON Text (often UTF-8) |
Basis | RFC 8785 (Informational) | Custom Specification | Application-specific |
Object/Map Key Ordering | Mandatory Lexicographical Sort (UTF-16 units) | Mandatory Lexicographical Sort (before hashing) | Typically Lexicographical Sort (implementation varies) |
Number Handling | ECMAScript standard (IEEE 754 double) | Specific hashing rule for numbers | Depends on underlying JSON serializer |
String Handling | ECMAScript standard; No Unicode Normalization | Specific hashing rule; No Unicode Normalization | Depends on underlying JSON serializer |
Extensibility/Unknowns | Constrained by I-JSON; No explicit unknown handling | Handles basic JSON types; Redaction mechanism | Depends on underlying JSON serializer |
Key Features | Interoperable text form, Cryptographic use | Redactability, Avoids text intermediate | Simplicity (potentially fragile) |
Adoption Notes | Used in JWK Thumbprint, Keybase; Libraries exist | Used in specific projects (e.g., Certificate Transparency logs) | Common but non-standardized |
1.6.2 ASN.1 Distinguished Encoding Rules (DER)
Abstract Syntax Notation One (ASN.1) is a mature standard from the ITU-T for defining data structures, widely used in telecommunications and security protocols. Associated with ASN.1 are several encoding rule sets that specify how to serialize data structures into bytes. The most relevant for canonicalization is the Distinguished Encoding Rules (DER), specified in ITU-T X.690.
DER is a specialized subset of the more flexible Basic Encoding Rules (BER). While BER allows multiple ways to encode the same value (e.g., different length specifications, constructed vs. primitive string forms), DER restricts these choices to ensure that any given ASN.1 value has exactly one valid DER encoding. This canonical property is achieved primarily through restriction: DER mandates specific choices where BER offers flexibility:
- Length Encoding: Length fields must use the definite form and the minimum possible number of octets. Indefinite lengths (allowed in BER) are prohibited.
- String Types: Primitive encoding must be used for string types like OCTET STRING and BIT STRING (BER allows constructed forms). Unused bits in the final octet of a BIT STRING must be zero.
- Boolean Values: FALSE must be encoded as a single byte `0x00`, and TRUE as a single byte `0xFF`.
- Set Ordering: Elements within a SET OF construct must be sorted according to their tag value and encoded bytes.
- Default Values: Fields with default values defined in the ASN.1 schema must NOT be encoded if they hold the default value.
The primary application of DER is in Public Key Infrastructure (PKI), particularly for encoding X.509 digital certificates and Certificate Revocation Lists (CRLs). The unambiguous nature of DER is critical for ensuring that certificates can be parsed and validated consistently across different systems and that digital signatures covering certificate contents are reliable.
DER's success lies in its long-standing use and effectiveness within its specific domain (PKI). It demonstrates that canonicalization can be achieved and maintained over decades. However, ASN.1 and DER are often perceived as complex and potentially verbose compared to more modern formats like JSON or CBOR, which has limited their adoption in web-centric APIs and applications.
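To make DER's philosophy of restriction concrete, the following hand-written sketch shows the byte level for a single BOOLEAN value; the byte values follow X.690, and no ASN.1 library is involved.

```rust
fn main() {
    // BER permits any nonzero content byte for TRUE, for example 0x01:
    let ber_true_variant: [u8; 3] = [0x01, 0x01, 0x01]; // tag, length, content
    // DER mandates exactly one encoding: 0xFF for TRUE and 0x00 for FALSE.
    let der_true: [u8; 3] = [0x01, 0x01, 0xFF];
    let der_false: [u8; 3] = [0x01, 0x01, 0x00];

    // Both BER forms decode to the same value, but only one byte sequence is
    // valid DER, so hashes and signatures over the DER form are stable.
    assert_ne!(ber_true_variant, der_true);
    assert_eq!(der_false[2], 0x00);
}
```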
1.6.3 Hashing and Signing Strategies
The interaction between data serialization and cryptographic operations like hashing and digital signing is a critical area where determinism is paramount. The overwhelmingly standard practice is to canonicalize the data first, then apply the cryptographic operation (hash or signature) to the resulting canonical byte stream.
Signing non-canonical data introduces significant risks. A signature created over one specific byte representation might be valid only for that exact sequence. If the recipient re-serializes the data differently (due to non-deterministic rules), the signature verification will fail, even if the logical data is unchanged. This can lead to false integrity failures or, in more complex scenarios, potentially allow an attacker to craft a different serialization of the same logical data that bypasses certain checks while still matching the original signature under lenient verification rules. Canonicalization before signing ensures that the signature is bound to the semantic content rather than a specific, potentially fragile, byte layout.
It is important to distinguish the canonicalization of the message input from the determinism of the signature algorithm itself. Some signature algorithms, like DSA and ECDSA, traditionally required a random number (nonce) for each signature. Flaws in the random number generation process have historically led to catastrophic private key compromises. To mitigate this, deterministic signature generation schemes have been developed, such as RFC 6979 for DSA/ECDSA and the inherent design of algorithms like Ed25519. These schemes derive the necessary nonce deterministically from the private key and the message hash, eliminating the need for external randomness during signing.
Therefore, achieving robust and reliable digital signatures often involves ensuring determinism at two distinct layers:
- Deterministic Message Representation: Using canonicalization to ensure the input to the hash function is consistent.
- Deterministic Signature Computation: Using algorithms that derive internal randomness (like nonces) deterministically to avoid reliance on potentially flawed external random sources. Both layers address different potential failure points and contribute to the overall security and reliability of the signing process.
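The first layer can be illustrated with a short sketch. It assumes the `dcbor` crate (introduced in detail later in this book) and the widely used `sha2` crate; it is an illustration of the canonicalize-then-hash pattern, not a prescribed API.

```rust
use std::collections::HashMap;
use dcbor::prelude::*;
use sha2::{Digest, Sha256};

fn main() {
    // Two maps with the same logical content, built in different insertion orders.
    let mut a: HashMap<String, i32> = HashMap::new();
    a.insert("beta".into(), 2);
    a.insert("alpha".into(), 1);

    let mut b: HashMap<String, i32> = HashMap::new();
    b.insert("alpha".into(), 1);
    b.insert("beta".into(), 2);

    // Deterministic encoding makes insertion order irrelevant...
    let bytes_a = a.to_cbor().to_cbor_data();
    let bytes_b = b.to_cbor().to_cbor_data();
    assert_eq!(bytes_a, bytes_b);

    // ...so the digest is stable and is safe to sign or compare.
    assert_eq!(Sha256::digest(&bytes_a), Sha256::digest(&bytes_b));
}
```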
1.6.4 Other Binary Formats (Brief Mention)
While JSON and ASN.1/DER represent major text and schema-driven binary approaches, other binary formats also grapple with determinism:
- Protocol Buffers (Protobuf): As mentioned earlier, Protobuf offers a "deterministic serialization" mode. However, the documentation clearly states this is not canonical. It guarantees byte consistency only for a specific binary build and schema version, but not across different language implementations, library versions, or even schema evolution (due to the handling of unknown fields). Its design prioritizes compatibility and efficiency over strict canonicalization. Specific deterministic schemes have been layered on top for specific use cases, like Cosmos SDK's ADR-027, which adds rules for field ordering, varint encoding, and default value handling.
- Binary Canonical Serialization (BCS): In contrast to Protobuf, BCS was explicitly designed from the ground up with canonicalization as a primary goal. Originating in the Diem blockchain project (defunct) and now used widely in the Move language ecosystem (Sui, Aptos), BCS aims for simplicity, efficiency, and a guaranteed one-to-one mapping between in-memory values and byte representations. It defines strict rules for encoding primitives (little-endian integers, ULEB128 for lengths), sequences, maps (implicitly requiring ordered keys for canonicalization, though the base spec focuses more on structs), and structs (fields serialized in definition order). Its primary motivation is to support cryptographic hashing and consensus mechanisms in blockchains.
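A tiny sketch, assuming the `bcs` crate published by the Move ecosystem, shows the flavor of these rules: fixed-width little-endian integers and ULEB128 length prefixes leave exactly one byte representation per value.

```rust
fn main() {
    // A u32 is always four little-endian bytes.
    assert_eq!(bcs::to_bytes(&1u32).unwrap(), vec![1, 0, 0, 0]);

    // A sequence is its ULEB128-encoded length followed by its elements in order.
    assert_eq!(bcs::to_bytes(&vec![1u8, 2, 3]).unwrap(), vec![3, 1, 2, 3]);
}
```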
1.7 Lessons Learned: Successes, Shortcomings, and the Path Forward
The survey of deterministic and canonical encoding efforts reveals valuable lessons about what has worked, what challenges persist, and what properties are desirable in future solutions.
Successes:
- Domain-Specific Stability: ASN.1 DER demonstrates that a strict canonical encoding standard can achieve long-term stability and interoperability within a well-defined domain like PKI, serving as the foundation for X.509 certificates for decades.
- Addressing Common Formats: Efforts like JCS (RFC 8785) provide a viable, albeit not universally adopted, solution for canonicalizing JSON, leveraging existing widespread technologies like ECMAScript for primitive serialization.
- Purpose-Built Solutions: Formats like BCS show that when canonicalization is a primary design goal, especially for demanding use cases like blockchain consensus, efficient and effective binary formats can be created.
- Enabling Critical Patterns: Canonicalization is demonstrably essential for enabling robust digital signatures, reliable distributed consensus, and efficient content-addressable storage and caching.
Shortcomings and Persistent Challenges:
- Fragmentation: For popular, flexible formats like JSON, the lack of a single, universally mandated canonicalization standard leads to fragmentation, with multiple competing schemes or ad-hoc solutions.
- Design Conflicts: Some formats, like Protocol Buffers, have core design features (e.g., handling of unknown fields for compatibility) that inherently conflict with the requirements for true canonicalization across different contexts.
- Complexity and Overhead: Achieving canonicalization often introduces complexity in implementation and runtime overhead due to steps like sorting or normalization, creating a trade-off against performance. This can make canonical forms less appealing for performance-critical applications if the benefits are not strictly required. ASN.1/DER, while successful, is often perceived as overly complex for simpler web-based use cases.
- Handling Nuances: Accurately and consistently handling the subtleties of floating-point arithmetic and Unicode across all platforms remains a persistent challenge requiring explicit and careful rule definition.
- Extensibility: Integrating extensibility mechanisms (like tags or user-defined types) into a canonical framework without requiring constant updates to the core specification or relying heavily on application-level agreements remains difficult.
A recurring theme emerges from these observations: the inherent tension between designing a data format for maximum flexibility and extensibility, and achieving strict, simple canonicalization. Features enhancing flexibility—multiple number encodings, optional fields, variable map ordering, mechanisms for unknown data—often introduce the very ambiguities that canonicalization seeks to eliminate. Consequently, canonical formats frequently achieve their goal by restricting the flexibility of the underlying data model or base encoding rules (DER restricts BER, CDE restricts CBOR, JCS restricts JSON via I-JSON). Designing a format that balances flexibility with ease of canonicalization requires careful consideration from the outset.
The Path Forward:
The increasing prevalence of distributed systems, the demand for verifiable data (like VCs), and the constant need for robust security mechanisms ensure that the need for reliable deterministic and canonical encoding will only grow. An ideal solution, building on the lessons learned, should strive for:
- Unambiguity: Clear, precise rules that leave no room for interpretation and lead to a single, verifiable canonical form.
- Efficiency: Minimize computational overhead compared to non-canonical serialization, making it practical for a wider range of applications.
- Simplicity: Easy to understand and implement correctly, reducing the likelihood of errors (a key goal of CBOR itself).
- Robustness: Handle common data types, including integers, floating-point numbers, and strings (with clear rules regarding Unicode), effectively.
- Well-Defined Extensibility: Provide a clear path for extending the format without breaking canonical properties or requiring constant core specification updates.
These desirable properties set the stage for exploring more advanced solutions designed to meet these needs within the context of modern data formats. The subsequent chapters will delve into how dCBOR aims to provide such a solution within the CBOR ecosystem.
From CBOR, to CDE, to dCBOR
2.1 Introduction: The Need for Stronger Guarantees
The previous chapter established a fundamental challenge in modern software engineering: the attainment of "sameness" at the byte level. Logically identical data structures, when serialized, often yield different byte sequences due to flexibilities inherent in many common encoding formats. This variability, while sometimes semantically irrelevant at the data model level, introduces a pernicious form of non-determinism into systems. Processes relying on byte-wise comparison, cryptographic hashing, digital signature verification, or distributed consensus can fail unpredictably, leading to hard-to-diagnose bugs and undermining security guarantees.
The Concise Binary Object Representation (CBOR), standardized as RFC 8949 (STD 94), was designed with goals including efficiency, small code size, and extensibility. Recognizing the importance of consistency, the base CBOR specification itself incorporates mechanisms aimed at reducing encoding variability. Section 4.1 of RFC 8949 introduces Preferred Serialization, which provides recommendations for choosing the most efficient encoding, particularly the shortest form for the initial bytes (head) indicating an item's type and length/value, and specific representations for floating-point numbers. Building upon this, Section 4.2.2 defines Basic Serialization, which mandates the use of preferred serialization and adds the strict requirement that indefinite-length encodings (where the total length is not known upfront) must not be used for strings, arrays, or maps.
These built-in features represent valuable steps toward consistency. However, they deliberately stop short of guaranteeing strict, cross-implementation deterministic encoding. Preferred Serialization, for instance, is largely a set of recommendations, not absolute requirements for a CBOR document to be considered valid. Even Basic Serialization, while stricter by disallowing indefinite lengths, leaves significant sources of variability unaddressed. Key limitations remain:
- Map Key Order: The order of key-value pairs in CBOR maps (major type 5) is explicitly considered semantically insignificant in the data model, similar to JSON objects. RFC 8949, therefore, does not mandate any specific order for serialization. Consequently, different CBOR libraries, or even different executions of the same library (depending on internal hash table implementations), might output the keys of a logically identical map in different orders, resulting in different byte sequences.
- Number Representation Choices: While Preferred Serialization aims for the shortest forms, potential ambiguities can persist without stricter enforcement, particularly around floating-point edge cases (like NaN or signed zeros) or how integers near the boundaries of different encoding lengths are handled.
- Implementation Variance: Most critically, the section in RFC 8949 titled "Deterministically Encoded CBOR" (Section 4.2) explicitly acknowledges that achieving deterministic encoding may involve application-specific decisions, providing flexibility rather than a single, universal set of rules. This inherent flexibility means that different CBOR encoders, even when attempting to produce "deterministic" output according to the base standard's guidelines, might make slightly different choices, leading to byte-level inconsistencies across platforms, languages, or library versions.
This deliberate balance in RFC 8949 reflects a common approach in standards development: providing flexibility for broad applicability while offering guidance for common needs. Mandating full canonicalization, including potentially costly operations like map key sorting, for all CBOR use cases might impose unnecessary overhead. Preferred and Basic Serialization offer levels of consistency suitable for many applications. However, the remaining ambiguities highlighted the need for more rigorous, standardized rules for applications where absolute, interoperable byte consistency is not just desirable but essential – use cases like cryptographic verification, consensus protocols, and content-addressable storage. This gap set the stage for the development of CBOR Common Deterministic Encoding (CDE) and, subsequently, dCBOR.
2.2 Stepping Up: CBOR Common Deterministic Encoding (CDE)
Recognizing the limitations of base CBOR's determinism guidelines for critical applications, the IETF CBOR Working Group initiated work on the CBOR Common Deterministic Encoding (CDE) specification (draft-ietf-cbor-cde). CDE represents a community effort to define a standardized, stricter set of encoding rules built upon CBOR, aiming to provide a reliable baseline for deterministic output that can be shared across diverse applications and implemented as a selectable feature in generic CBOR encoders. Its purpose is to systematically eliminate the ambiguities left open by RFC 8949's Section 4.2, thereby facilitating interoperable deterministic encoding.
CDE achieves this by mandating specific choices where base CBOR (even with Basic Serialization) offered flexibility. Conceptually, the key rules introduced by CDE include:
- Mandatory Map Key Sorting: CDE decisively addresses the map key ordering problem. It requires that the key-value pairs in a CBOR map (major type 5) be sorted based on the byte-wise lexicographical order of the encoded representation of each key. This means the raw bytes of the encoded key determine the sort order, not the semantic value of the key itself. This choice represents a pragmatic approach, favoring implementation simplicity and unambiguousness over potentially more complex semantic sorting (e.g., Unicode collation for text keys). While perhaps counter-intuitive in edge cases (like numerically equivalent keys encoded differently), sorting by encoded bytes provides a clear, efficient, and universally applicable rule, eliminating a major source of non-determinism.
- Strict Number Representations: CDE tightens the rules for number encoding beyond Preferred Serialization:
  - Integers: Positive integers from `0` up to `2⁶⁴−1` MUST be encoded using unsigned integer types (major type 0). Negative integers from `−1` down to `−(2⁶⁴)` MUST be encoded using negative integer types (major type 1). For integers outside this 64-bit range, CBOR tags 2 (positive bignum) and 3 (negative bignum) MUST be used, following preferred serialization rules, and crucially, the byte string content of these tags MUST NOT contain leading zero bytes.
  - Floating-Point Numbers: All floating-point values MUST use their preferred serialization (typically the shortest IEEE 754 representation that accurately represents the value). CDE clarifies specific handling:
    - Positive zero (`+0`) and negative zero (`-0`) are encoded using their distinct IEEE 754 representations (e.g., negative zero as `0xf98000` for half-precision) without further special action.
    - `NaN` (Not a Number) values follow preferred serialization, using the canonical NaN encoding from IEEE 754. This often involves using the shortest form by removing trailing zeros in the payload and ensuring quiet NaNs have the leading significand bit set to 1.
    - Importantly, CDE explicitly prohibits mixing integer and floating-point types based on mathematical value. A value represented as a float in the data model MUST be encoded as a float, even if it is mathematically equivalent to an integer (e.g., 10.0 is encoded as a float, not the integer 10). This maintains a clear separation between types at the encoding level.
- Disallowing Indefinite Lengths: CDE fully incorporates the rule from Basic Serialization, prohibiting the use of indefinite-length encodings for text strings (major type 3), byte strings (major type 2), arrays (major type 4), and maps (major type 5). Length must always be specified definitively.
- Requiring Basic Validity: CDE mandates that encoders MUST produce CBOR that meets the "Basic Validity" requirements of RFC 8949 (Section 5.3.1). This includes ensuring that map keys are unique and that text strings contain valid UTF-8 sequences. Furthermore, CDE decoders MUST check for these validity conditions.
CDE, therefore, establishes a significantly more constrained profile of CBOR compared to Basic Serialization. It provides a robust foundation for achieving interoperable deterministic encoding suitable for many applications. However, the CDE specification also acknowledges that some applications might have even more specific requirements regarding the deterministic representation of application-level data. It introduces the concept of Application-Level Deterministic Representation (ALDR) rules, which operate on top of CDE to handle application-specific semantics, such as defining equivalence between different numeric types (e.g., integer 10 vs float 10.0) if needed by the application. By focusing on canonicalization within CBOR's type system and leaving cross-type semantics to the application layer, CDE maintains a manageable scope and broad applicability, serving as a common denominator for deterministic needs.
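The effect of sorting by encoded key bytes (rather than by semantic value) can be seen with a few hand-encoded keys; the sketch below uses plain byte slices and the standard library only, so it illustrates the ordering rule without depending on any CBOR implementation.

```rust
fn main() {
    // Hand-written preferred encodings of three possible map keys:
    let key_int_10: &[u8] = &[0x0a];        // unsigned(10)
    let key_int_100: &[u8] = &[0x18, 0x64]; // unsigned(100)
    let key_text_a: &[u8] = &[0x61, 0x61];  // text "a"

    let mut keys = vec![key_text_a, key_int_100, key_int_10];
    keys.sort(); // lexicographic comparison of the raw encoded bytes

    // 0x0a < 0x18... < 0x61..., so both integer keys sort before the text key.
    assert_eq!(keys, vec![key_int_10, key_int_100, key_text_a]);
}
```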
2.3 The Vision and the Need: Enter dCBOR
While CDE represented a significant advancement towards standardized deterministic encoding, the impetus for an even stricter set of rules emerged from a specific vision and a pressing practical need. This vision was largely articulated by Christopher Allen and pursued through the work of Blockchain Commons, focusing on enabling a new generation of secure, private, and user-controlled digital interactions. Central to this vision is the concept of the Gordian Envelope, designed as a format for "smart documents" capable of handling complex, hierarchical data while prioritizing user privacy and control.
The Gordian vision emphasizes several key principles, including independence, privacy, resilience, and openness. Gordian Envelope aims to embody these principles by being:
- Structure-Ready: Capable of reliably encoding and storing diverse information structures, ranging from simple data items to semantic triples and property graphs, allowing for the creation of rich, interoperable "smart documents".
- Privacy-Ready: Designed to protect user privacy through mechanisms supporting progressive trust and minimal disclosure. A core feature is elision, which allows the holder of an Envelope (not just the original issuer) to selectively redact or hide specific parts of the data before sharing it, revealing only what is necessary for a given interaction.
- Verifiable: Built upon a foundation of cryptographic integrity. Envelopes incorporate a built-in, Merkle-like digest tree, where components of the Envelope are cryptographically hashed. This structure allows for verification of data integrity and authenticity, even after parts have been elided, through Merkle proofs.
Part III of this book will explore the Gordian Envelope in detail, including its structure, use cases, and the cryptographic mechanisms that underpin its functionality.
Translating this vision into a working system created a concrete technical requirement for Blockchain Commons. The functionality of Gordian Envelope, particularly its reliance on cryptographic hashing for the Merkle structure and elision mechanisms, demanded a serialization format with absolute, unambiguous byte-level consistency. CBOR was selected as the underlying format due to its inherent advantages: conciseness, binary efficiency, extensibility, and its status as an IETF standard.
However, any variability in the serialization of Envelope components would lead to different cryptographic hashes. This would break the integrity of the Merkle tree, render elision proofs invalid, and undermine the entire system's security and verifiability guarantees. The features enabling privacy and verification in Gordian Envelope are thus directly dependent on the deterministic nature of the underlying data representation. While base CBOR offered some guidance, and CDE was emerging as a stronger baseline, neither provided the specific, rigorous, and unwavering guarantees required for the core mechanics of Gordian Envelope. Blockchain Commons identified this gap and recognized the need to define and implement a stricter profile of CBOR deterministic encoding – the profile that became known as dCBOR.
2.4 A Tale of Two Standards: The Emergence of CDE and dCBOR
The path to standardized, highly deterministic CBOR involved parallel development and subsequent harmonization between the specific needs driving dCBOR and the broader goals of the IETF CBOR Working Group. The initial impetus and definition for the stricter rules required by Gordian Envelope originated within Blockchain Commons, spearheaded by Lead Researcher Wolf McNally. This internal effort focused on creating a CBOR encoding profile that eliminated ambiguities left unaddressed even by CBOR's preferred and basic serialization modes, ensuring the absolute byte consistency needed for Envelope's hash-based structures. Early implementations and specifications for this stricter profile were developed to meet these internal requirements.
Recognizing the potential value of this work for the wider community and the importance of standardization, Blockchain Commons brought their findings and proposals to the IETF CBOR Working Group. This engagement included presentations, active participation in mailing list discussions (starting around February 2023), and the submission of individual Internet-Drafts detailing their proposed deterministic CBOR profile, draft-mcnally-deterministic-cbor.
This input, alongside other potential use cases for deterministic encoding, influenced the direction of the CBOR Working Group. The WG recognized the need for a standardized common baseline for deterministic encoding that could serve a wide range of applications. This led to the development of the CBOR Common Deterministic Encoding (CDE) specification (draft-ietf-cbor-cde), primarily edited by Carsten Bormann, a key figure in the CBOR community. CDE was designed to capture the essential requirements for achieving interoperable determinism, such as map key sorting and canonical number representations, establishing an intermediate layer between base CBOR and more specialized needs.
Through discussion and collaboration within the working group, a clear relationship between CDE and the Blockchain Commons proposal emerged. CDE solidified its position as the official IETF WG effort defining the common deterministic encoding profile. The stricter set of rules developed by Blockchain Commons, initially conceived to meet Gordian Envelope's needs, was then positioned as dCBOR: a specific application profile built on top of CDE. This layering allows applications requiring the baseline determinism of CDE to use it directly, while applications with more stringent requirements, like Gordian Envelope, can adopt the dCBOR profile, which incorporates all CDE rules plus additional constraints.
This collaborative process is reflected in the co-authorship of later versions of the dCBOR Internet-Draft, which includes Wolf McNally, Christopher Allen, Carsten Bormann, and Laurence Lundblade, representing both the originators of the dCBOR requirements and key contributors to the broader CBOR and CDE standardization efforts. This convergence signifies a successful harmonization, ensuring that dCBOR exists as a well-defined extension within the CDE framework, rather than a divergent standard. The development also highlights the iterative nature of IETF work, with ongoing discussions and potential refinements, and illustrates a common pattern in standards development: a specific, implementation-driven need catalyzes a broader standardization effort, often resulting in layered specifications that cater to both general and specialized requirements.
2.5 Understanding Profiles: Layering Constraints
The relationship between CBOR, CDE, and dCBOR is best understood through the concept of a "profile," a common mechanism used within the IETF and other standards bodies to manage the evolution and specialization of technical specifications. RFC 6906, which defines the 'profile' link relation type, provides a useful definition: a profile allows resource representations (or, by extension, data formats) to indicate that they follow additional semantics – such as constraints, conventions, or extensions – beyond those defined by the base specification (like a media type or, in this case, the base CBOR standard).
Crucially, a profile is defined not to alter the fundamental semantics of the base specification for consumers unaware of the profile. This means a generic CBOR parser should still be able to process data encoded according to a CBOR profile like CDE or dCBOR, even if it cannot validate the profile-specific constraints. Profiles allow different communities or applications to tailor a base standard for their specific needs, promoting interoperability within that community without requiring changes to the underlying, more general standard.
Applying this concept:
- CDE is a Profile of CBOR: The CDE specification explicitly defines itself as a profile that builds upon the Core Deterministic Encoding Requirements of RFC 8949. It selects specific encoding options permitted by base CBOR (like preferred number representations) and mandates them. It also adds new constraints not present in the base standard, most notably the requirement to sort map keys lexicographically based on their encoded bytes. These rules constrain the flexibility of base CBOR to achieve a common level of determinism.
- dCBOR is a Profile of CDE: The dCBOR specification, in turn, explicitly defines itself as an application profile that conforms to, and further constrains, CDE. It inherits all the rules mandated by CDE (including map key sorting, canonical number forms, no indefinite lengths) and then adds its own, stricter requirements. These additional dCBOR-specific constraints include numeric reduction (treating certain floats and integers as equivalent), mandatory Unicode NFC normalization for strings, and limitations on allowed simple values.
This layered profiling approach offers significant advantages. It allows standardization to occur at different levels of granularity, catering to both general needs (CDE) and highly specific application requirements (dCBOR). It promotes interoperability because the profiles build upon each other hierarchically, ensuring that dCBOR data is also valid CDE data, which is also valid CBOR data. This avoids "forking" the standard, where incompatible versions might arise, and instead anchors specialized requirements within the established ecosystem. The use of profiles is thus a key tool enabling standards like CBOR to remain stable at their core while adapting to new and demanding use cases through well-defined, constrained profiles.
2.6 The Hierarchy: CBOR, CDE, and dCBOR
The relationship established through profiling creates a clear hierarchy: dCBOR is a specialized subset of CDE, which is itself a specialized subset of the possible encodings allowed by the base CBOR specification. This can be represented as:
dCBOR ⊆ CDE ⊆ CBOR
This subset relationship has a crucial practical implication known as the validity chain:
- Any sequence of bytes that constitutes a valid dCBOR encoding is, by definition, also a valid CDE encoding.
- Any sequence of bytes that constitutes a valid CDE encoding is, by definition, also a valid CBOR encoding (specifically, one that conforms to Basic Serialization plus map sorting and other CDE rules).
The reverse, however, is not true. A generic CBOR document may violate CDE rules (e.g., use indefinite lengths or unsorted map keys), and a CDE document may violate dCBOR rules (e.g., contain a float like 10.0 instead of the integer 10, or use non-NFC strings).
This hierarchy means that basic parsing compatibility is maintained. A generic CBOR decoder can parse the structure of CDE or dCBOR data. A CDE-aware decoder can parse dCBOR data and validate its conformance to CDE rules. However, only a dCBOR-aware decoder can fully validate all the specific constraints imposed by the dCBOR profile, such as numeric reduction or NFC string normalization. This ensures that dCBOR can be integrated into existing CBOR/CDE workflows without breaking basic interoperability, while still allowing for stricter validation where required.
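A short sketch shows the practical consequence, assuming the `dcbor` crate enforces decode-time validation as described above: bytes that are valid base CBOR but not canonical are rejected by a dCBOR decoder.

```rust
use dcbor::prelude::*;

fn main() {
    // Canonical (preferred) encoding of the integer 10 is the single byte 0x0a.
    let canonical = vec![0x0a];
    assert!(CBOR::try_from_data(canonical).is_ok());

    // 0x18 0x0a also means 10 in base CBOR, but it is not the shortest form,
    // so a decoder applying the dCBOR profile must reject it.
    let non_preferred = vec![0x18, 0x0a];
    assert!(CBOR::try_from_data(non_preferred).is_err());
}
```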
The key differences introduced at each level, representing progressively tighter constraints on the encoding process, can be summarized as follows:
- CBOR (Basic Serialization) → CDE:
- Map Key Order: Becomes mandatory; keys MUST be sorted lexicographically based on their encoded byte representation.
- Number Encoding: Preferred/shortest forms become mandatory, with specific canonical rules for floats (including NaN) and large integers (tags 2/3 without leading zeros). Mixing integer/float types for mathematically equivalent values is prohibited.
- Basic Validity: Explicitly required (no duplicate map keys, valid UTF-8).
- CDE → dCBOR:
- Numeric Reduction: Mandatory; floating-point numbers that are numerically equal to integers within the range [−2⁶³, 2⁶⁴−1] MUST be encoded as integers. All `NaN` values MUST be reduced to a single canonical half-precision quiet NaN (`0xf97e00`).
- Simple Values: Restricted; only `false`, `true`, `null`, and floating-point values (major type 7, subtypes 20-27) are permitted. Other simple values (subtypes 0-19, 28-255) are disallowed.
- String Normalization: Mandatory; all text strings MUST be encoded in Unicode Normalization Form C (NFC).
- Duplicate Map Keys: Explicitly rejected by decoders (building on CDE's requirement for encoders not to emit them).
- Validity Checking by Decoders: Decoders MUST reject any data that does not conform to dCBOR rules, including CDE rules.
The following table provides a comparative overview of how key sources of non-determinism are handled at each level:
Feature | CBOR (Basic Serialization) | CDE (CBOR Common Deterministic Encoding) | dCBOR (Application Profile) |
---|---|---|---|
Map Key Order | Not specified (any order allowed) | Mandatory: Lexicographical sort of encoded key bytes | Mandatory: Lexicographical sort of encoded key bytes |
Integer Encoding | Preferred (shortest head) mandatory; No indefinite length | Preferred mandatory; Strict rules for 64-bit range & tags 2/3 (no leading zeros) | Inherits CDE rules |
Float Encoding | Preferred (shortest IEEE 754) mandatory; No indefinite length | Preferred mandatory; Canonical NaN; No int/float mixing | Inherits CDE rules; Canonical NaN reduced to 0xf97e00 |
Numeric Reduction | Not applicable | Not specified (handled by ALDR if needed) | Mandatory: Float-to-int reduction; Canonical NaN reduction |
Indefinite Lengths | Disallowed (for strings, arrays, maps) | Disallowed | Disallowed |
Allowed Simple Values | All simple values (0-255) potentially allowed | All simple values potentially allowed | Restricted: Only `false`, `true`, `null`, floats allowed |
String Normalization | Not specified | Not specified | Mandatory: Unicode NFC |
Duplicate Map Keys | Invalid CBOR (handling not mandated) | Encoder MUST NOT emit; Decoder MUST check basic validity | Encoder MUST NOT emit; Decoder MUST reject |
Validity Checking by Decoders | Not specified | Decoder MUST check basic validity | Decoder MUST reject any data that does not conform to dCBOR rules |
These differences highlight how dCBOR makes specific, opinionated choices about semantic equivalence that go beyond the more generic baseline of CDE. For example, the numeric reduction rule embeds the semantic decision that, within the dCBOR profile, the integer `2` and the float `2.0` should produce identical byte sequences. Similarly, mandating NFC strings embeds the decision that different Unicode representations of the same visual character should yield the same bytes. While these choices might not be suitable for all applications, they are crucial for use cases like Gordian Envelope where achieving unambiguous byte-level representation for semantically equivalent data is paramount for hash-based verification.
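A brief sketch, assuming the `dcbor` crate covered in the following chapter, shows numeric reduction collapsing these two representations to one byte sequence.

```rust
use dcbor::prelude::*;

fn main() {
    let from_float: CBOR = (2.0f64).to_cbor();
    let from_int: CBOR = (2i32).to_cbor();

    // Both reduce to the one-byte canonical encoding of the integer 2,
    // so any digest computed over them is identical.
    assert_eq!(from_float.to_cbor_data(), vec![0x02]);
    assert_eq!(from_float.to_cbor_data(), from_int.to_cbor_data());
}
```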
2.7 Laying the Foundation: Why dCBOR for Gordian Envelope
The rigorous constraints imposed by the dCBOR profile are not arbitrary; they directly enable the core functionality and security goals of the Gordian Envelope system. Revisiting the requirements outlined in Section 2.3, the necessity of dCBOR becomes clear:
- Merkle Tree Integrity: Gordian Envelope's structure relies on a Merkle-like tree where the digest (cryptographic hash) of each component contributes to the digests of its parent components, culminating in a single root hash for the entire Envelope. This structure allows for efficient verification of the Envelope's integrity. This mechanism is critically dependent on the absolute byte consistency provided by dCBOR. Any variation in the serialization of a sub-envelope – whether due to map key order, number representation choices, or string normalization differences – would result in a different hash. This differing hash would propagate up the tree, changing the root hash and invalidating any integrity proofs. dCBOR's strict rules ensure that the same logical Envelope content always produces the exact same byte sequence, guaranteeing stable and reproducible hashes across different systems, libraries, and time.
- Elision Reliability: The privacy-enhancing feature of elision allows a holder to redact parts of an Envelope while proving that the redacted parts were originally present. This is achieved by replacing the elided sub-envelope with its pre-computed digest. For a recipient to verify the integrity of the partially elided Envelope, they must be able to trust that the provided digests accurately represent the original, now-hidden content. This trust relies entirely on the guarantee that the hash of any given sub-envelope is unique and unchanging. dCBOR provides this guarantee. If the serialization were non-deterministic, the hash computed by the issuer might differ from a hash computed later (e.g., by the holder before elision or by the verifier on a similar structure), rendering the elision mechanism unreliable.
- Content Addressing: Envelopes, identified by their root hash, can be used in content-addressable systems. dCBOR ensures that two Envelopes containing the exact same logical information will always produce the identical root hash, enabling reliable storage, retrieval, and deduplication based on content.
While CDE provides a strong baseline for determinism, its rules alone might not suffice for the specific semantic requirements of Gordian Envelope. Consider these examples:
- Numeric Equivalence: An application using Gordian Envelope might consider the integer `2` and the floating-point number `2.0` to be semantically identical within its data model. In fact, this is common in extremely popular environments like JavaScript, which do not distinguish between integer and floating-point types. CDE, however, explicitly encodes these differently, and requires that users select whether they are encoding an integer or a floating-point type. If both representations were allowed in an Envelope, they would produce different hashes, breaking comparisons and potentially invalidating Merkle proofs if one form were substituted for the other. dCBOR's mandatory numeric reduction rule addresses this directly by forcing `2.0` to be encoded as the integer `2`, ensuring a single, canonical byte representation for these semantically equivalent values.
- String Equivalence: Similarly, if an application treats precomposed `é` (U+00E9) and decomposed `e` + combining accent `´` (U+0065 U+0301) as identical, CDE's lack of mandatory normalization could lead to different byte sequences and different hashes for otherwise identical data. dCBOR's requirement for NFC normalization ensures that such visually identical strings produce the same canonical byte sequence, preserving hash consistency (see the sketch after this list).
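A minimal sketch, assuming the `dcbor` crate applies the NFC rule automatically when encoding text (as the profile requires), shows the two spellings collapsing to one encoding.

```rust
use dcbor::prelude::*;

fn main() {
    let precomposed = "caf\u{00e9}";  // "café" with a single precomposed é
    let decomposed = "cafe\u{0301}";  // "cafe" followed by a combining acute accent

    // As Rust strings the two differ byte for byte...
    assert_ne!(precomposed.as_bytes(), decomposed.as_bytes());

    // ...but dCBOR's mandatory NFC normalization yields one canonical encoding.
    assert_eq!(
        precomposed.to_cbor().to_cbor_data(),
        decomposed.to_cbor().to_cbor_data()
    );
}
```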
Therefore, dCBOR is more than just a stricter version of CDE; it is the specifically tailored, rigorously deterministic foundation upon which the advanced security, privacy, and verifiability features of Gordian Envelope are constructed. The decision to use dCBOR reflects a design philosophy where the demanding requirements of the application layer directly informed the choice and definition of the underlying serialization layer, ensuring the necessary properties were available rather than compromising the application's goals.
Another less obvious, but no less important goal of dCBOR is to minimize the semantic burden placed on users and application developers. Determinism at the highest application level is not easy to achieve, with applications needing to define their own Application Level Deterministic Rules (ALDRs). Having to also think about low-level encoding issues is a challenge, especially when it's mostly re-inventing the wheel. By enforcing strict, unambiguous encoding rules at the serialization layer, dCBOR ensures that all data conforming to its profile is represented in a single, canonical byte form. This means that higher-level abstractions—such as application logic, cryptographic protocols, or data modeling frameworks—can operate with the assurance that the underlying data representation is always consistent and deterministic. Developers do not need to implement their own normalization, canonicalization, or equivalence checks for common sources of ambiguity like numeric types or Unicode strings. Instead, they can rely on dCBOR's guarantees, simplifying application code and reducing the risk of subtle bugs or security vulnerabilities arising from inconsistent data encoding. This separation of concerns enables robust, verifiable systems to be built atop dCBOR, confident that the foundational layer will always provide the determinism required for reliable operation.
2.8 Conclusion: A Path to Verifiable Data
The journey from base CBOR's initial determinism guidelines to the rigorous specification of dCBOR illustrates a common pattern in the evolution of technical standards: as applications become more sophisticated, the need for stronger guarantees from underlying protocols increases. Base CBOR (RFC 8949), with its Preferred and Basic Serialization options, provided foundational steps towards encoding consistency, balancing flexibility with efficiency. However, for applications demanding absolute, interoperable byte-level agreement, these steps proved insufficient.
The IETF CBOR Working Group addressed this gap by developing the CBOR Common Deterministic Encoding (CDE) profile, establishing a standardized baseline that mandates key rules like map key sorting and canonical number representations. CDE offers a significant improvement for many use cases requiring reliable data comparison or hashing.
Yet, driven by the specific, demanding requirements of systems like Gordian Envelope – systems built on verifiable data structures, cryptographic hashing, and privacy-preserving techniques like elision – an even stricter level of determinism was necessary. This led to the definition of dCBOR, an application profile layered on top of CDE, which introduces additional constraints such as numeric reduction and mandatory string normalization.
This progression – CBOR → CDE → dCBOR – is a response to the growing need for trustworthy digital systems. The subtle issues arising from serialization non-determinism can have profound impacts on the reliability and security of applications involving digital signatures, distributed consensus, content-addressable storage, and verifiable credentials. dCBOR, by providing an unambiguous, canonical byte representation for logical data according to its specific rules, serves as a critical enabling technology. It lays the necessary foundation for building robust, secure, and interoperable systems like Gordian Envelope, paving the way for a future where digital data can be more reliably verified, shared, and controlled by its users.
Using dCBOR
So after all that discussion of the motivation for dCBOR, let's just recap its rules all in one place, and specifically how they differ from basic CBOR:
- Map Keys: No duplicates. Must be serialized sorted lexicographically by the serialized key.
- Numeric Values: "Preferred Serialization" isn't just preferred, it's required.
- Numeric Reduction: Floating point values that can accurately be represented as integers must be serialized as integers.
- Indefinite Length: Indefinite length values are not allowed.
- Simple Values: Only `false`, `true`, and `null` are allowed.
- Strings: Must be encoded in Unicode Normalization Form C (NFC).
- Decoders: Must check all the rules above and reject any serialization that doesn't conform to them.
Pretty simple, right?
It gets even simpler when you use a CBOR library that supports dCBOR, as the implementation should take care of all the details for you. In fact, a good API will even make it impossible to create invalid dCBOR serializations.
The `dcbor` crate is the Rust reference implementation of dCBOR from Blockchain Commons, and in this chapter we'll show you how easy it is to use.
Installation
This will add the latest version of the `dcbor` crate to your `Cargo.toml` file:
cargo add dcbor
Basic Usage
`dcbor` includes a `prelude` module that re-exports all the types and traits you need to use dCBOR:
use anyhow::Result;
use std::{collections::HashMap, vec};
// This is all you need to import to use the library.
use dcbor::prelude::*;
#[rustfmt::skip]
pub fn main() {
// Encode the integer 42
let i = 42;
let cbor = i.to_cbor();
// Check the diagnostic representation
assert_eq!(cbor.diagnostic(), "42");
// Check the hex representation
assert_eq!(cbor.hex(), "182a");
// Check the CBOR data
assert_eq!(cbor.to_cbor_data(), vec![0x18, 0x2a]);
}
#[test]
#[rustfmt::skip]
fn test_2() -> Result<()> {
let a = 42;
let cbor = a.to_cbor();
let b = i32::try_from(cbor)?;
assert_eq!(a, b);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_3() -> Result<()> {
let a = 42;
let cbor = a.to_cbor();
// Decode as a u8
let b = u8::try_from(cbor.clone())?;
assert_eq!(a as u8, b);
// Decode as an f64
let c = f64::try_from(cbor)?;
assert_eq!(a as f64, c);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_4() -> Result<()> {
let a = 1.23456;
let cbor = a.to_cbor();
// Decode as an f64
let b = f64::try_from(cbor.clone())?;
assert_eq!(a, b);
// Cannot decode as a u8
assert!(u8::try_from(cbor).is_err());
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_5() -> Result<()> {
let a = "Hello, dCBOR!";
let cbor = a.to_cbor();
// Decode as an f64 fails
assert!(f64::try_from(cbor.clone()).is_err());
// Decode as a String succeeds
let b = String::try_from(cbor)?;
assert_eq!(a, b);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_6() -> Result<()> {
// Encode a vector of 8-bit unsigned integers
let a: Vec<u8> = vec![1, 2, 3, 4, 5];
let cbor = a.to_cbor();
// Decode as Vec of a compatible type: 32-bit signed integers
let b: Vec<i32> = Vec::try_from(cbor.clone())?;
assert_eq!(b, vec![1, 2, 3, 4, 5]);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_7() -> Result<()> {
// Encode a vector of 8-bit unsigned integers
let a: Vec<u8> = vec![1, 2, 3, 4, 5];
let cbor = a.to_cbor();
let hex = cbor.hex_annotated();
let expected_hex = r#"
85 # array(5)
01 # unsigned(1)
02 # unsigned(2)
03 # unsigned(3)
04 # unsigned(4)
05 # unsigned(5)
"#.trim();
assert_eq!(hex, expected_hex);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_8() -> Result<()> {
// Encode a vector of 8-bit unsigned integers
let a = vec![1, 2, 3, 4, 5];
let byte_string = CBOR::to_byte_string(a);
let cbor = byte_string.to_cbor();
let hex = cbor.hex_annotated();
let expected_hex = r#"
45 # bytes(5)
0102030405
"#.trim();
assert_eq!(hex, expected_hex);
let b: Vec<u8> = ByteString::try_from(cbor)?.into();
assert_eq!(b, vec![1, 2, 3, 4, 5]);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_9() -> Result<()> {
let v: Vec<CBOR> = vec![
true.into(),
false.into(),
CBOR::null(),
];
let cbor = v.to_cbor();
let diagnostic = cbor.diagnostic();
let expected_diagnostic = "[true, false, null]";
assert_eq!(diagnostic, expected_diagnostic);
let hex = cbor.hex_annotated();
let expected_hex = r#"
83 # array(3)
f5 # true
f4 # false
f6 # null
"#.trim();
assert_eq!(hex, expected_hex);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_10() -> Result<()> {
// Compose an array of CBOR values
let v: Vec<CBOR> = vec![
true.into(),
false.into(),
CBOR::null(),
];
// Convert the array to a single CBOR object, which would
// be serialized to CBOR data or recovered from it.
let cbor: CBOR = v.to_cbor();
// Recover the array from the CBOR object
let v2: Vec<CBOR> = cbor.try_into_array()?;
// Check the length of the array
assert_eq!(v2.len(), 3);
// For the first value (`true`), extract it so it could be saved for later.
let t = v2[0].clone().try_into_bool()?;
assert!(t);
// For the second value (`false`), just assert that it is false.
assert!(v2[1].is_false());
// For the third value (`null`), assert that it is null.
assert!(v2[2].is_null());
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_11() -> Result<()> {
// Create a HashMap with String keys and Vec<String> values
let mut h: HashMap<String, Vec<String>> = HashMap::new();
h.insert("animals".into(), vec!("cat".into(), "dog".into(), "horse".into()));
h.insert("colors".into(), vec!["red".into(), "green".into(), "blue".into()]);
// Convert the HashMap to a CBOR object
let cbor = h.to_cbor();
// Check the representation in CBOR diagnostic notation
let diagnostic = cbor.diagnostic();
let expected_diagnostic = r#"
{
"colors":
["red", "green", "blue"],
"animals":
["cat", "dog", "horse"]
}
"#.trim();
assert_eq!(diagnostic, expected_diagnostic);
// Serialize the CBOR to binary data
let data: Vec<u8> = cbor.to_cbor_data();
// Check the hex representation of the serialized data
let hex = hex::encode(&data);
let expected_hex = "a266636f6c6f7273836372656465677265656e64626c756567616e696d616c73836363617463646f6765686f727365";
assert_eq!(hex, expected_hex);
// Deserialize the data back into a CBOR object
let cbor2: CBOR = CBOR::try_from_data(data)?;
// Convert the CBOR object back into a HashMap
let h2: HashMap<String, Vec<String>> = cbor2.try_into()?;
// Check that the original and deserialized HashMaps are equal
assert_eq!(h, h2);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_12() -> Result<()> {
// Create a HashMap with integer keys and Vec<String> values
let mut h: HashMap<usize, Vec<String>> = HashMap::new();
h.insert(1, vec!("cat".into(), "dog".into(), "horse".into()));
h.insert(2, vec!["red".into(), "green".into(), "blue".into()]);
// Convert the HashMap to a CBOR object
let cbor = h.to_cbor();
// Check the representation in CBOR diagnostic notation
let diagnostic = cbor.diagnostic_flat();
let expected_diagnostic = r#"
{1: ["cat", "dog", "horse"], 2: ["red", "green", "blue"]}
"#.trim();
assert_eq!(diagnostic, expected_diagnostic);
// Convert the CBOR object back into a HashMap
let h2: HashMap<usize, Vec<String>> = cbor.try_into()?;
// Check that the original and deserialized HashMaps are equal
assert_eq!(h, h2);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_13() -> Result<()> {
// Create a HashMap with CBOR for its keys and values
let mut h: HashMap<CBOR, CBOR> = HashMap::new();
h.insert(1.into(), vec![CBOR::from("cat"), "dog".into(), "horse".into()].into());
h.insert(2.into(), vec![CBOR::from("red"), "green".into(), "blue".into()].into());
// Convert the HashMap to a CBOR object
let cbor = h.to_cbor();
// Check the representation in CBOR diagnostic notation
let diagnostic = cbor.diagnostic_flat();
let expected_diagnostic = r#"
{1: ["cat", "dog", "horse"], 2: ["red", "green", "blue"]}
"#.trim();
assert_eq!(diagnostic, expected_diagnostic);
// Convert the CBOR object back into a HashMap
let h2: HashMap<CBOR, CBOR> = cbor.try_into()?;
// Check that the original and deserialized HashMaps are equal
assert_eq!(h, h2);
Ok(())
}
Many common types are directly convertible into dCBOR. Thanks to dCBOR's numeric reduction, you don't even need to specify whether common numeric types should be serialized as integers or floating point: the `dcbor` library will automatically choose the best representation for you.
Note that when you use `value.to_cbor()` or `CBOR::from(value)`, you're not actually encoding the CBOR serialization in that moment. You're creating an intermediate representation of the data (an instance of `CBOR`) that can be serialized later, when you call a method like `to_cbor_data`.
Converting back from CBOR is also easy: you simply specify the type you want to convert to, and the `dcbor` library will do the rest. You use the `try_from` method to convert from CBOR to a Rust type, which will succeed if the CBOR can be accurately converted to that type. If the conversion fails, it will return an error:
use anyhow::Result;
use std::{collections::HashMap, vec};
// This is all you need to import to use the library.
use dcbor::prelude::*;
#[test]
#[rustfmt::skip]
fn test_2() -> Result<()> {
let a = 42;
let cbor = a.to_cbor();
let b = i32::try_from(cbor)?;
assert_eq!(a, b);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_3() -> Result<()> {
let a = 42;
let cbor = a.to_cbor();
// Decode as a u8
let b = u8::try_from(cbor.clone())?;
assert_eq!(a as u8, b);
// Decode as an f64
let c = f64::try_from(cbor)?;
assert_eq!(a as f64, c);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_4() -> Result<()> {
let a = 1.23456;
let cbor = a.to_cbor();
// Decode as an f64
let b = f64::try_from(cbor.clone())?;
assert_eq!(a, b);
// Cannot decode as a u8
assert!(u8::try_from(cbor).is_err());
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_5() -> Result<()> {
let a = "Hello, dCBOR!";
let cbor = a.to_cbor();
// Decode as an f64 fails
assert!(f64::try_from(cbor.clone()).is_err());
// Decode as a String succeeds
let b = String::try_from(cbor)?;
assert_eq!(a, b);
Ok(())
}
#[test]
#[rustfmt::skip]
fn test_13() -> Result<()> {
// Create a HashMap with CBOR for its keys and values
let mut h: HashMap<CBOR, CBOR> = HashMap::new();
h.insert(1.into(), vec![CBOR::from("cat"), "dog".into(), "horse".into()].into());
h.insert(2.into(), vec![CBOR::from("red"), "green".into(), "blue".into()].into());
// Convert the HashMap to a CBOR object
let cbor = h.to_cbor();
// Check the representation in CBOR diagnostic notation
let diagnostic = cbor.diagnostic_flat();
let expected_diagnostic = r#"
{1: ["cat", "dog", "horse"], 2: ["red", "green", "blue"]}
"#.trim();
assert_eq!(diagnostic, expected_diagnostic);
// Convert the CBOR object back into a HashMap
let h2: HashMap<CBOR, CBOR> = cbor.try_into()?;
// Check that the original and deserialized HashMaps are equal
assert_eq!(h, h2);
Ok(())
}
In the following example we use `try_from` to convert from CBOR to both a `u8` and an `f64`. Both succeed, because the value `42` can be represented as both an 8-bit unsigned integer and a 64-bit floating point number:
#[test]
fn test_3() -> Result<()> {
    let a = 42;
    let cbor = a.to_cbor();
    // Decode as a u8
    let b = u8::try_from(cbor.clone())?;
    assert_eq!(a as u8, b);
    // Decode as an f64
    let c = f64::try_from(cbor)?;
    assert_eq!(a as f64, c);
    Ok(())
}
✅ NOTE: Observe the call to `clone()` above, which we need because the `try_from` method consumes the `CBOR` value, and we still need an instance for the second `try_from` call. Instances of `CBOR` are immutable, and the `dcbor` library implements structure sharing, so cloning is always cheap.
Below we encode a floating point value with a non-zero fractional part, which succeeds in being decoded back to floating point, but fails to decode back to an integer, because precision would be lost:
#[test]
fn test_4() -> Result<()> {
    let a = 1.23456;
    let cbor = a.to_cbor();
    // Decode as an f64
    let b = f64::try_from(cbor.clone())?;
    assert_eq!(a, b);
    // Cannot decode as a u8: precision would be lost
    assert!(u8::try_from(cbor).is_err());
    Ok(())
}
The nice thing about this idiom is that it's not just for numeric types. You can use it with any type that implements `TryFrom<CBOR>`, like `String`:
#[test]
fn test_5() -> Result<()> {
    let a = "Hello, dCBOR!";
    let cbor = a.to_cbor();
    // Decode as an f64 fails
    assert!(f64::try_from(cbor.clone()).is_err());
    // Decode as a String succeeds
    let b = String::try_from(cbor)?;
    assert_eq!(a, b);
    Ok(())
}
It even works for vectors:
#[test]
fn test_6() -> Result<()> {
    // Encode a vector of 8-bit unsigned integers
    let a: Vec<u8> = vec![1, 2, 3, 4, 5];
    let cbor = a.to_cbor();
    // Decode as Vec of a compatible type: 32-bit signed integers
    let b: Vec<i32> = Vec::try_from(cbor.clone())?;
    assert_eq!(b, vec![1, 2, 3, 4, 5]);
    Ok(())
}
Byte Strings
But the last example raises an interesting question: is our `Vec<u8>` being serialized as a CBOR array or a CBOR byte string? Let's check:
#[test]
fn test_7() -> Result<()> {
    // Encode a vector of 8-bit unsigned integers
    let a: Vec<u8> = vec![1, 2, 3, 4, 5];
    let cbor = a.to_cbor();
    let hex = cbor.hex_annotated();
    let expected_hex = r#"
85 # array(5)
01 # unsigned(1)
02 # unsigned(2)
03 # unsigned(3)
04 # unsigned(4)
05 # unsigned(5)
"#.trim();
    assert_eq!(hex, expected_hex);
    Ok(())
}
As you can see, the header byte specifies an array of five elements, followed by five CBOR data items for the integers `1`, `2`, `3`, `4`, and `5`. So the `Vec<u8>` is being serialized as a CBOR array, not a byte string.
In Rust, `Vec<u8>` is often used to represent a string or buffer of bytes, but in CBOR a byte string is its own type, distinct from an array. The `CBOR` type provides a static method `CBOR::to_byte_string` that converts a `Vec<u8>` into a CBOR byte string:
#[test]
fn test_8() -> Result<()> {
    // Encode a vector of 8-bit unsigned integers as a byte string
    let a = vec![1, 2, 3, 4, 5];
    let byte_string = CBOR::to_byte_string(a);
    let cbor = byte_string.to_cbor();
    let hex = cbor.hex_annotated();
    let expected_hex = r#"
45 # bytes(5)
0102030405
"#.trim();
    assert_eq!(hex, expected_hex);
    let b: Vec<u8> = ByteString::try_from(cbor)?.into();
    assert_eq!(b, vec![1, 2, 3, 4, 5]);
    Ok(())
}
The serialization in this example is identical to the last, except for the header byte: `0x85` for a five-element array versus `0x45` for a byte string of length 5.

Notice that recovering the byte string is also different. Since a byte string is not an array, we can't extract it as a `Vec<u8>` directly. Instead, we extract it as the type `ByteString`, and then convert that to a `Vec<u8>` using `.into()`.

`ByteString` is just a wrapper around `Vec<u8>`, with most of the same capabilities, but the `dcbor` library treats it as a CBOR byte string, not a CBOR array.
Simple Values: `false`, `true`, and `null`

dCBOR allows only three simple values: `false`, `true`, and `null`, and the `dcbor` library provides a set of conveniences for working with them. In the example below we create a CBOR array containing `[true, false, null]`, and then check its CBOR diagnostic notation and annotated hex serialization:
#[test]
fn test_9() -> Result<()> {
    let v: Vec<CBOR> = vec![
        true.into(),
        false.into(),
        CBOR::null(),
    ];
    let cbor = v.to_cbor();
    let diagnostic = cbor.diagnostic();
    let expected_diagnostic = "[true, false, null]";
    assert_eq!(diagnostic, expected_diagnostic);
    let hex = cbor.hex_annotated();
    let expected_hex = r#"
83 # array(3)
f5 # true
f4 # false
f6 # null
"#.trim();
    assert_eq!(hex, expected_hex);
    Ok(())
}
Something interesting is going on here: our array has three values, two of which are booleans while the third is its own type: `null`. CBOR is designed to handle such heterogeneous arrays with no problem. But Rust (unlike some languages, such as JavaScript) doesn't have a `null` value (preferring `Option<T>` for values that may not be present). Rust also doesn't natively support `Vec`s containing mixed types. So how does the `dcbor` library handle this?

First, note that our array is declared not as a `Vec<bool>` but as a `Vec<CBOR>`. The `CBOR` type can hold any CBOR value, including complex values like nested arrays and maps. In the context of the `vec!` macro composing a `Vec<CBOR>`, the Rust boolean values `true` and `false` can be converted directly using `.into()`, and that's what we're doing here.

Rust has no `null` value, so the `dcbor` library provides a `CBOR::null()` method that returns a `CBOR` instance representing the `null` value.

And since all three elements of the array are converted directly into `CBOR`, there is no problem constructing the heterogeneous array.

✅ NOTE: Of course, dCBOR doesn't support CBOR `undefined` or any of the other simple values, so the `dcbor` API doesn't give you a way to construct them!
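As a further illustration (not part of the chapter's test file), the same pattern extends to mixing numbers, strings, and nested arrays, since each element is converted to `CBOR` before the `Vec<CBOR>` is built:

use dcbor::prelude::*;

fn main() {
    // Mix an integer, a string, and a nested array in one Vec<CBOR>.
    let v: Vec<CBOR> = vec![
        1.into(),
        "two".into(),
        vec![CBOR::from(3), 4.into()].into(),
    ];
    let cbor = v.to_cbor();
    // Flat diagnostic notation, along the lines of: [1, "two", [3, 4]]
    println!("{}", cbor.diagnostic_flat());
}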
Extracting from a Heterogeneous Array

So now that we've gotten ourselves into this situation, how do we get the values back out? The `dcbor` library provides a set of methods for testing and extracting the CBOR major types, as well as unique values like `true`, `false`, and `null`.

In the example below we begin by extracting our CBOR array from the composed `CBOR` instance. We then demonstrate several methods for either extracting values or testing them against expected values.
#[test]
fn test_10() -> Result<()> {
    // Compose an array of CBOR values
    let v: Vec<CBOR> = vec![
        true.into(),
        false.into(),
        CBOR::null(),
    ];
    // Convert the array to a single CBOR object, which would
    // be serialized to CBOR data or recovered from it.
    let cbor: CBOR = v.to_cbor();
    // Recover the array from the CBOR object
    let v2: Vec<CBOR> = cbor.try_into_array()?;
    // Check the length of the array
    assert_eq!(v2.len(), 3);
    // For the first value (`true`), extract it so it could be saved for later.
    let t = v2[0].clone().try_into_bool()?;
    assert!(t);
    // For the second value (`false`), just assert that it is false.
    assert!(v2[1].is_false());
    // For the third value (`null`), assert that it is null.
    assert!(v2[2].is_null());
    Ok(())
}
Maps
As long as all the types contained in a Rust `HashMap` or `BTreeMap` are supported by CBOR (we'll discuss how to make your own types CBOR-compatible in a later chapter), converting them to CBOR and back is straightforward.
In the example below we round-trip a Rust `HashMap` with `String` keys and `Vec<String>` values all the way to serialized CBOR data and back again:
use anyhow::Result;
use std::collections::HashMap;
// This is all you need to import to use the library.
use dcbor::prelude::*;

#[test]
#[rustfmt::skip]
fn test_11() -> Result<()> {
    // Create a HashMap with String keys and Vec<String> values
    let mut h: HashMap<String, Vec<String>> = HashMap::new();
    h.insert("animals".into(), vec!["cat".into(), "dog".into(), "horse".into()]);
    h.insert("colors".into(), vec!["red".into(), "green".into(), "blue".into()]);
    // Convert the HashMap to a CBOR object
    let cbor = h.to_cbor();
    // Check the representation in CBOR diagnostic notation
    let diagnostic = cbor.diagnostic();
    let expected_diagnostic = r#"
{
"colors":
["red", "green", "blue"],
"animals":
["cat", "dog", "horse"]
}
"#.trim();
    assert_eq!(diagnostic, expected_diagnostic);
    // Serialize the CBOR to binary data
    let data: Vec<u8> = cbor.to_cbor_data();
    // Check the hex representation of the serialized data
    let hex = hex::encode(&data);
    let expected_hex = "a266636f6c6f7273836372656465677265656e64626c756567616e696d616c73836363617463646f6765686f727365";
    assert_eq!(hex, expected_hex);
    // Deserialize the data back into a CBOR object
    let cbor2: CBOR = CBOR::try_from_data(data)?;
    // Convert the CBOR object back into a HashMap
    let h2: HashMap<String, Vec<String>> = cbor2.try_into()?;
    // Check that the original and deserialized HashMaps are equal
    assert_eq!(h, h2);
    Ok(())
}
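A `BTreeMap` can stand in for the `HashMap` above. The following is a minimal sketch, written for this book edit rather than taken from the book's test file, and it assumes the `BTreeMap` conversions mirror the `HashMap` ones demonstrated in this chapter (dCBOR sorts map keys deterministically during encoding either way):

use anyhow::Result;
use std::collections::BTreeMap;
use dcbor::prelude::*;

fn main() -> Result<()> {
    // A BTreeMap with String keys and Vec<String> values, mirroring
    // the HashMap example above. (Assumes BTreeMap has the same
    // CBOR conversions as HashMap.)
    let mut h: BTreeMap<String, Vec<String>> = BTreeMap::new();
    h.insert("animals".into(), vec!["cat".into(), "dog".into()]);
    h.insert("colors".into(), vec!["red".into(), "green".into()]);

    // Convert to CBOR, serialize to bytes, then recover the map.
    let cbor = h.to_cbor();
    let data = cbor.to_cbor_data();
    let h2: BTreeMap<String, Vec<String>> = CBOR::try_from_data(data)?.try_into()?;

    // The round trip preserves the map.
    assert_eq!(h, h2);
    Ok(())
}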
Those familiar with JSON know that it supports only string keys. CBOR, by contrast, allows any CBOR value as a key, and integer keys are a common pattern because they encode much more compactly:
use anyhow::Result;
use std::collections::HashMap;
// This is all you need to import to use the library.
use dcbor::prelude::*;

#[test]
#[rustfmt::skip]
fn test_12() -> Result<()> {
    // Create a HashMap with integer keys and Vec<String> values
    let mut h: HashMap<usize, Vec<String>> = HashMap::new();
    h.insert(1, vec!["cat".into(), "dog".into(), "horse".into()]);
    h.insert(2, vec!["red".into(), "green".into(), "blue".into()]);
    // Convert the HashMap to a CBOR object
    let cbor = h.to_cbor();
    // Check the representation in CBOR diagnostic notation
    let diagnostic = cbor.diagnostic_flat();
    let expected_diagnostic = r#"
{1: ["cat", "dog", "horse"], 2: ["red", "green", "blue"]}
"#.trim();
    assert_eq!(diagnostic, expected_diagnostic);
    // Convert the CBOR object back into a HashMap
    let h2: HashMap<usize, Vec<String>> = cbor.try_into()?;
    // Check that the original and deserialized HashMaps are equal
    assert_eq!(h, h2);
    Ok(())
}

#[test]
#[rustfmt::skip]
fn test_13() -> Result<()> {
    // Create a HashMap with CBOR for its keys and values
    let mut h: HashMap<CBOR, CBOR> = HashMap::new();
    h.insert(1.into(), vec![CBOR::from("cat"), "dog".into(), "horse".into()].into());
    h.insert(2.into(), vec![CBOR::from("red"), "green".into(), "blue".into()].into());
    // Convert the HashMap to a CBOR object
    let cbor = h.to_cbor();
    // Check the representation in CBOR diagnostic notation
    let diagnostic = cbor.diagnostic_flat();
    let expected_diagnostic = r#"
{1: ["cat", "dog", "horse"], 2: ["red", "green", "blue"]}
"#.trim();
    assert_eq!(diagnostic, expected_diagnostic);
    // Convert the CBOR object back into a HashMap
    let h2: HashMap<CBOR, CBOR> = cbor.try_into()?;
    // Check that the original and deserialized HashMaps are equal
    assert_eq!(h, h2);
    Ok(())
}
Note the use of `diagnostic_flat()` in these examples, which returns the diagnostic notation with no line breaks or indentation. In previous examples we also used either `hex()` or `hex_annotated()`, depending on the desired formatting.
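For a quick side-by-side of these formatting helpers, the short sketch below (not taken from the book's test file, but using only the methods already shown in this chapter) prints the same value in each form:

use dcbor::prelude::*;

fn main() {
    // Encode a small array so there is something to format.
    let a: Vec<u8> = vec![1, 2, 3];
    let cbor = a.to_cbor();

    // Diagnostic notation; larger nested structures are wrapped and indented.
    println!("{}", cbor.diagnostic());
    // Diagnostic notation with no line breaks or indentation.
    println!("{}", cbor.diagnostic_flat());
    // The serialized encoding as plain hex.
    println!("{}", cbor.hex());
    // The same bytes, one CBOR item per line with annotations.
    println!("{}", cbor.hex_annotated());
}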
🚧 Work in Progress: More in this chapter and more chapters forthcoming!
Introduction to Gordian Envelope
🚧 forthcoming...
Envelope Semantics and Structure
🚧 forthcoming...
Envelope Encoding and Processing
🚧 forthcoming...
Practical Applications and Patterns
🚧 forthcoming...
Tooling and Libraries
🚧 forthcoming...
Advanced Topics
🚧 forthcoming...
CBOR Header Bytes
This table shows all possible CBOR header byte values and their meanings.
|    | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _a | _b | _c | _d | _e | _f |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 0_ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 1_ | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | int 1+1 | int 1+2 | int 1+4 | int 1+8 | | | | |
| 2_ | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 | -9 | -10 | -11 | -12 | -13 | -14 | -15 | -16 |
| 3_ | -17 | -18 | -19 | -20 | -21 | -22 | -23 | -24 | neg 1+1 | neg 1+2 | neg 1+4 | neg 1+8 | | | | |
| 4_ | bstr 0 | bstr 1 | bstr 2 | bstr 3 | bstr 4 | bstr 5 | bstr 6 | bstr 7 | bstr 8 | bstr 9 | bstr 10 | bstr 11 | bstr 12 | bstr 13 | bstr 14 | bstr 15 |
| 5_ | bstr 16 | bstr 17 | bstr 18 | bstr 19 | bstr 20 | bstr 21 | bstr 22 | bstr 23 | bstr 1+1 | bstr 1+2 | bstr 1+4 | bstr 1+8 | | | | bstr indef |
| 6_ | str 0 | str 1 | str 2 | str 3 | str 4 | str 5 | str 6 | str 7 | str 8 | str 9 | str 10 | str 11 | str 12 | str 13 | str 14 | str 15 |
| 7_ | str 16 | str 17 | str 18 | str 19 | str 20 | str 21 | str 22 | str 23 | str 1+1 | str 1+2 | str 1+4 | str 1+8 | | | | str indef |
| 8_ | arr 0 | arr 1 | arr 2 | arr 3 | arr 4 | arr 5 | arr 6 | arr 7 | arr 8 | arr 9 | arr 10 | arr 11 | arr 12 | arr 13 | arr 14 | arr 15 |
| 9_ | arr 16 | arr 17 | arr 18 | arr 19 | arr 20 | arr 21 | arr 22 | arr 23 | arr 1+1 | arr 1+2 | arr 1+4 | arr 1+8 | | | | arr indef |
| a_ | map 0 | map 1 | map 2 | map 3 | map 4 | map 5 | map 6 | map 7 | map 8 | map 9 | map 10 | map 11 | map 12 | map 13 | map 14 | map 15 |
| b_ | map 16 | map 17 | map 18 | map 19 | map 20 | map 21 | map 22 | map 23 | map 1+1 | map 1+2 | map 1+4 | map 1+8 | | | | map indef |
| c_ | tag 0 | tag 1 | tag 2 | tag 3 | tag 4 | tag 5 | tag 6 | tag 7 | tag 8 | tag 9 | tag 10 | tag 11 | tag 12 | tag 13 | tag 14 | tag 15 |
| d_ | tag 16 | tag 17 | tag 18 | tag 19 | tag 20 | tag 21 | tag 22 | tag 23 | tag 1+1 | tag 1+2 | tag 1+4 | tag 1+8 | | | | |
| e_ | val 0 | val 1 | val 2 | val 3 | val 4 | val 5 | val 6 | val 7 | val 8 | val 9 | val 10 | val 11 | val 12 | val 13 | val 14 | val 15 |
| f_ | val 16 | val 17 | val 18 | val 19 | false | true | null | undef | val 1+1 | float 16 | float 32 | float 64 | | | | break |
Legend:
- `1+1` = 1 header byte + 1 data byte (24...255)
- `1+2` = 1 header byte + 2 data bytes (256...65535)
- `1+4` = 1 header byte + 4 data bytes
- `1+8` = 1 header byte + 8 data bytes
- `int` = non-negative integer
- `neg` = negative integer
- `bstr` = byte string + length
- `str` = UTF-8 text string + length
- `arr` = array + length
- `map` = map + length
- `tag` = semantic tag + value
- `val` = simple value
- `false` = simple value 20
- `true` = simple value 21
- `null` = simple value 22
- `undef` = simple value 23
- `float 16/32/64` = half/single/double precision float
- `indef` = indefinite length
- `break` = stop code for indefinite items
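As a worked example of reading the table: each header byte packs a 3-bit major type (each major type occupies two rows of the table) and a 5-bit additional-information field. The short sketch below (plain Rust, independent of the dcbor crate) splits a byte into those two fields:

// Split a CBOR header byte into its major type (high 3 bits)
// and additional information (low 5 bits), per RFC 8949.
fn split_header(byte: u8) -> (u8, u8) {
    (byte >> 5, byte & 0x1f)
}

fn main() {
    // 0x85 reads "arr 5" in the table: major type 4 (array), additional info 5.
    assert_eq!(split_header(0x85), (4, 5));
    // 0x18 reads "int 1+1": major type 0 (unsigned integer), additional info 24,
    // meaning the actual value follows in the next byte.
    assert_eq!(split_header(0x18), (0, 24));
    // 0xf6 reads "null": major type 7 (simple/float), additional info 22.
    assert_eq!(split_header(0xf6), (7, 22));
}

Additional-information values 24 through 27 correspond to the 1+1, 1+2, 1+4, and 1+8 entries in the legend, where the actual value or length follows the header byte in the next 1, 2, 4, or 8 bytes.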