Embedding XML Inside PDF/A-3: The Object Chain Explained

When you generate an electronic invoice in PDF/A-3 format, it is not enough to place the XML file somewhere inside the document. There is a precise structure mandated by the international standard ISO 19005-3 that defines how the file is embedded, how it is identified, and how the reader links it to the visible page. This technical guide breaks down that embedding mechanism at the level of the file structure: how the XML file becomes part of the document as an EmbeddedFile object, how the Associated Files relationship links it to the invoice, what role the File Specification Dictionary and the /AFRelationship key play, and how the receiving system extracts the XML copy programmatically.

This page assumes you already know what the PDF/A-3 format is and why the Zakat, Tax and Customs Authority (ZATCA) adopted it for human-readable invoices. If you have not read up on that yet, start with the guide on the PDF/A-3 format in the electronic invoice, which explains the format and its archiving properties. Here we focus on the embedding mechanism alone, because it is the part developers most often get wrong when building an invoice generator from scratch.

Why a hybrid file at all: a page for the human and data for the machine

An electronic invoice in business-to-business (B2B) transactions needs two representations at the same time. The first is a visible page read by a procurement officer: the supplier’s name, its tax number, the line items, the amounts, the QR code. The second is a structured data file read by the receiving accounting system without human intervention. If you sent each representation in a separate file, the recipient would risk loading one file and losing the other, or dealing with two contradictory copies of the same invoice.

The hybrid file resolves this conflict. It is a single PDF/A-3 file that carries the invoice page in its visible layer and an embedded XML file matching the page’s content in its internal structure. The human recipient opens the file and sees the invoice, while the recipient’s automated system extracts the XML and processes it. Both copies live in one file, so there is no separation and no contradiction. This same principle underlies well-established international standards such as Germany’s ZUGFeRD and France’s Factur-X, both of which rely on a PDF/A-3 file carrying an embedded XML.

The essential point is that the embedded XML is not a casual attachment like an image you drop into an email. It is defined inside the file structure as the official alternative representation of the visible invoice. This definition is what distinguishes correct embedding from merely pasting a file. To reach it, we need to understand several objects in the PDF structure that work together.

This principle carries an important implicit requirement: content parity between the two layers. The visible page and the embedded XML file must carry the same data: the same amounts, the same line items, the same tax number. Any divergence between them makes the invoice contradict itself, so the officer sees one figure while the system records another. This is why the two representations are not built separately but generated from a single data source, so the visible page is a visual presentation of the same data the XML carries. This parity is a practical condition for accepting the invoice in business transactions, not a formal detail.
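
The single-source idea can be sketched as follows. This is a minimal illustration, not the way any particular generator works: the Invoice and InvoiceLine classes, the function names, and the simplified XML element names are all hypothetical (a real invoice follows the UBL schema). The point is only that both renderers read the same object, so the total on the page and the total in the XML cannot drift apart.

```python
from dataclasses import dataclass
from decimal import Decimal
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class InvoiceLine:
    description: str
    amount: Decimal


@dataclass(frozen=True)
class Invoice:
    supplier_name: str
    tax_number: str
    lines: tuple

    @property
    def total(self) -> Decimal:
        # Computed once from the lines, so both renderers report the same figure.
        return sum((line.amount for line in self.lines), Decimal("0"))


def render_xml(invoice: Invoice) -> bytes:
    """Machine-readable layer. Element names are simplified placeholders, not UBL."""
    root = ET.Element("Invoice")
    ET.SubElement(root, "SupplierName").text = invoice.supplier_name
    ET.SubElement(root, "TaxNumber").text = invoice.tax_number
    for line in invoice.lines:
        item = ET.SubElement(root, "Line")
        ET.SubElement(item, "Description").text = line.description
        ET.SubElement(item, "Amount").text = str(line.amount)
    ET.SubElement(root, "Total").text = str(invoice.total)
    return ET.tostring(root, encoding="utf-8")


def render_page_text(invoice: Invoice) -> list:
    """Human-readable layer: the text rows a page renderer would draw."""
    rows = [
        f"Supplier: {invoice.supplier_name}",
        f"Tax number: {invoice.tax_number}",
    ]
    rows += [f"{line.description}: {line.amount}" for line in invoice.lines]
    rows.append(f"Total: {invoice.total}")
    return rows
```

Because neither renderer accepts figures from anywhere except the shared Invoice object, a parity check between the two layers reduces to checking that both were produced from the same instance.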

The four objects that build the embedding

Embedding a file inside PDF/A-3 rests on a chain of interlinked objects. Each object is a dictionary holding key-value pairs and pointing to the next object in the chain. Understanding this chain is a condition for generating a valid file or diagnosing a rejected one. The four objects are:

  • Embedded File Stream: the object that carries the bytes of the XML file itself, of type /EmbeddedFile.
  • File Specification Dictionary: the object that describes the embedded file: its name, its relationship to the document, its type, of type /Filespec.
  • EmbeddedFiles Name Tree: the index that registers the embedded file in the document catalog so the reader can find it.
  • Associated Files Array: the array that links the file to the document or to the page as an associated file, via the key /AF.

These objects work together to answer three questions: where are the file’s bytes? What is its name and relationship? And how does the reader find it and know it is an associated file rather than a mere attachment? We take each in turn.

Figure: The four objects for embedding XML in PDF/A-3 (the structure by which an XML file is linked inside a PDF/A-3 document): the Catalog holding the /AF array, the EmbeddedFiles name tree (/EmbeddedFiles), the File Specification Dictionary (/Filespec), and the embedded file stream (/EmbeddedFile). These four objects link the XML file to the document formally.

The embedded file stream: where the XML bytes are stored

At the heart of the embedding sits a Stream object that carries the actual content of the XML file byte by byte. This object holds a dictionary describing it, followed by the raw bytes between the keywords stream and endstream. The dictionary declares the object’s type as /EmbeddedFile, specifies the content type via the key /Subtype, and records the length along with optional extra data such as the creation or modification date and the original file size.

5 0 obj
<<
  /Type /EmbeddedFile
  /Subtype /text#2Fxml
  /Length 4096
  /Params <<
    /Size 4096
    /ModDate (D:20260624120000+03'00')
  >>
>>
stream
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
  ... structured invoice content ...
</Invoice>
endstream
endobj

Note the value of /Subtype. MIME types are written inside a PDF with a special encoding that replaces the character / with the sequence #2F, so text/xml is written as text#2Fxml. This is a fine detail that trips up anyone writing the structure by hand, producing a file that fails validation. Some generators use application#2Fxml instead, and both are acceptable as long as they accurately describe the file’s content.
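
The encoding rule can be sketched as a small helper. The function name pdf_name is hypothetical. It applies the general PDF name rule: any byte outside the printable ASCII range, any delimiter character, and the # character itself are written as # followed by two hexadecimal digits.

```python
# Characters that delimit tokens in PDF syntax and therefore cannot appear raw in a name.
PDF_DELIMITERS = b"()<>[]{}/%"


def pdf_name(value: str) -> str:
    """Encode a string such as a MIME type as a PDF name object."""
    out = ["/"]
    for byte in value.encode("utf-8"):
        needs_escape = (
            byte < 0x21            # control characters and space
            or byte > 0x7E         # non-ASCII bytes
            or byte in PDF_DELIMITERS
            or byte == 0x23        # the '#' escape character itself
        )
        out.append(f"#{byte:02X}" if needs_escape else chr(byte))
    return "".join(out)
```

Calling pdf_name("text/xml") gives /text#2Fxml, the form used in the listing above.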

The /Params dictionary is optional but recommended. It carries metadata about the original file before embedding: its size in bytes in /Size, its last modification date in /ModDate, and sometimes a checksum. This data helps the extracting system verify the file’s integrity after extraction and compare its size against what is recorded.
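
As a rough sketch of how a generator might fill these fields, the following builds the stream object from the XML bytes. The function names are hypothetical, the stream is left uncompressed, and the checksum is written as an MD5 digest under the /CheckSum key, which is the form the PDF specification describes for embedded file parameters.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime in the D:YYYYMMDDHHmmSS+HH'mm' style."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("moment must be timezone-aware")
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"D:{moment:%Y%m%d%H%M%S}{sign}{hours:02d}'{minutes:02d}'"


def embedded_file_object(obj_num: int, xml_bytes: bytes, modified: datetime) -> bytes:
    """Serialize an uncompressed /EmbeddedFile stream object (illustrative only)."""
    size = len(xml_bytes)
    checksum = hashlib.md5(xml_bytes).hexdigest().upper()
    header = (
        f"{obj_num} 0 obj\n"
        "<<\n"
        "  /Type /EmbeddedFile\n"
        "  /Subtype /text#2Fxml\n"
        f"  /Length {size}\n"      # length of the stream data, here the raw XML
        "  /Params <<\n"
        f"    /Size {size}\n"      # size of the original file before embedding
        f"    /ModDate ({pdf_date(modified)})\n"
        f"    /CheckSum <{checksum}>\n"
        "  >>\n"
        ">>\n"
        "stream\n"
    ).encode("ascii")
    return header + xml_bytes + b"\nendstream\nendobj\n"
```

Deriving /Length and /Size from the actual bytes, rather than typing them in, removes one of the easiest ways to produce a file whose recorded size disagrees with its content.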

The File Specification Dictionary: the embedded XML’s identity card

The file stream alone is not enough. The document needs an object that describes this stream: what the file name shown to the user is, where it finds its bytes, and what its relationship to the document is. This object is the File Specification Dictionary of type /Filespec. It is the identity card of the embedded file, and the central object around which the rest of the chain revolves.

6 0 obj
<<
  /Type /Filespec
  /F (invoice.xml)
  /UF (invoice.xml)
  /AFRelationship /Alternative
  /Desc (Saudi e-invoice XML representation)
  /EF <<
    /F 5 0 R
    /UF 5 0 R
  >>
>>
endobj

We break down the keys of this dictionary one by one, because each carries a meaning that affects the file’s acceptance:

  • /Type /Filespec: declares that this object is a file specification dictionary. Required in PDF/A-3.
  • /F and /UF: the file name as it appears to the user. /F is the traditional form, and /UF is the newer Unicode form. They should match. A name such as invoice.xml is clear and meaningful.
  • /AFRelationship: the most important key in the embedding. It defines the relationship of the embedded file to the containing document. We devote the next section entirely to it.
  • /Desc: a short textual description of the file. Optional but useful for the human reader browsing the attachments.
  • /EF (Embedded File): the dictionary that links the file specification to the actual byte stream. The value 5 0 R is an indirect reference pointing to object number 5, which carries the XML bytes. Here the loop between description and content is completed.

The reference 5 0 R is what is called an indirect reference in the PDF structure. The first number is the object identifier, the second is the generation number, and the letter R indicates that it is a reference, not a direct value. With this reference, the /Filespec dictionary links itself to the /EmbeddedFile stream without duplicating the bytes.
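
A generator can guard against the most common slips in this dictionary by building it from parameters. The sketch below is an assumption-laden illustration: the function names are hypothetical, it handles ASCII file names only (a non-ASCII /UF value would need a Unicode-encoded string), and it always uses generation number 0.

```python
def pdf_literal(text: str) -> str:
    """Write an ASCII string as a PDF literal string, escaping \\, ( and )."""
    if not text.isascii():
        raise ValueError("this sketch only handles ASCII strings")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def filespec_object(obj_num: int, file_name: str, stream_obj_num: int,
                    relationship: str = "Alternative", description: str = "") -> str:
    """Serialize a /Filespec dictionary that points at an embedded file stream."""
    name = pdf_literal(file_name)       # one value reused for /F and /UF, so they always match
    ref = f"{stream_obj_num} 0 R"       # indirect reference: object number, generation, R
    lines = [
        f"{obj_num} 0 obj",
        "<<",
        "  /Type /Filespec",
        f"  /F {name}",
        f"  /UF {name}",
        f"  /AFRelationship /{relationship}",
    ]
    if description:
        lines.append(f"  /Desc {pdf_literal(description)}")
    lines += ["  /EF <<", f"    /F {ref}", f"    /UF {ref}", "  >>", ">>", "endobj"]
    return "\n".join(lines) + "\n"
```

Calling filespec_object(6, "invoice.xml", 5, description="Saudi e-invoice XML representation") reproduces the listing above, with the /EF entries referring to object 5.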

The /AFRelationship key: what makes the file an associated file rather than a mere attachment

This key is the essence of what distinguishes a correct hybrid invoice file. /AFRelationship defines the nature of the relationship between the embedded file and the document’s visible content. The ISO 19005-3 standard defines several possible values for this key, the most notable being:

  • /Source: the embedded file is the source from which the visible page was derived.
  • /Data: the file carries the data from which the visible presentation is derived, such as the figures behind a table or chart.
  • /Alternative: the file is an alternative representation of the same visible-page content, in another format the machine reads.
  • /Supplement: the file is supplementary content that adds to the page without being an alternative representation of it.
  • /Unspecified: the relationship is unspecified. Not recommended in invoices, because it leaves the reader without any indication.

In a business invoice, the correct value is usually /Alternative or /Data depending on the standard you follow. The rationale behind /Alternative is that the XML file is not a separate attachment but the invoice itself expressed in a format systems can read. The visible page and the XML file carry the same content, one for the human and the other for the machine. This definition specifically is what turns the file from “a PDF with an attachment” into “a recognized hybrid invoice.”

The common mistake here is that the developer embeds the XML file without setting /AFRelationship, or leaves it at a value that does not express the true relationship. The result is a file that an ordinary PDF reader may open and show the attachment, but a system looking for an associated invoice will not recognize it as the approved alternative representation. Setting this key correctly is a condition for the invoice’s acceptance by standard-compliant systems.
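
On the receiving side, the check might look like the sketch below. The classify function and its return labels are hypothetical, and treating /Alternative and /Data as the two accepted values is an assumption taken from the discussion above; the standard you follow may accept only one of them.

```python
from enum import Enum
from typing import Optional


class AFRelationship(Enum):
    SOURCE = "Source"
    DATA = "Data"
    ALTERNATIVE = "Alternative"
    SUPPLEMENT = "Supplement"
    UNSPECIFIED = "Unspecified"


# Assumption: the receiving system treats these two values as the invoice's structured form.
ACCEPTED_FOR_INVOICE = {AFRelationship.ALTERNATIVE, AFRelationship.DATA}


def classify(raw_value: Optional[str]) -> str:
    """Classify the /AFRelationship value read from a file specification dictionary."""
    if raw_value is None:
        return "missing"                  # the key was never set
    try:
        relationship = AFRelationship(raw_value.lstrip("/"))
    except ValueError:
        return "unknown"                  # not one of the five values listed above
    if relationship in ACCEPTED_FOR_INVOICE:
        return "structured-invoice"
    return "other-associated-file"
```

A file whose key is missing or /Unspecified falls through to a label the receiver can reject or log, instead of being silently processed as the invoice.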

Table: The /AFRelationship values and their meanings (how the relationship between the XML file and the visible document is described).

Value           Meaning
/Source         The document’s original source
/Data           Dependent data
/Alternative    Alternative copy (recommended for the invoice)
/Supplement     Supplement
/Unspecified    Unspecified

The invoice usually uses /Alternative to link the XML to the PDF copy.

The Associated Files array: linking the file to the document

Defining a file with a valid /Filespec dictionary is not on its own enough for the file to count as a recognized associated file. It must be registered in the Associated Files array via the key /AF at the document level, or at the level of a specific page. This array is precisely what the PDF/A-3 standard added to support associated files, and it is what distinguishes it from older versions.

1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
  /Names 3 0 R
  /AF [6 0 R]
>>
endobj

The Catalog object is the root of the document structure. The key /AF here is an array holding a single reference 6 0 R pointing to the file specification dictionary we defined. With this registration, the file becomes associated at the level of the entire document, that is, linked to the invoice as a whole rather than to a specific page. In the invoice, where the XML file represents the whole invoice, this is the correct logical link.

Alternatively, the associated file can be linked to a specific page by placing the key /AF inside the page dictionary itself. But in the hybrid invoice, linking at the document level is the adopted practice, because the structured data pertains to the whole document. Combining registration of the file in the /AF array and in the name tree (next) is what guarantees that every compliant reader will find the file and recognize its relationship.

The EmbeddedFiles name tree: how the reader finds the file

Alongside the /AF array, the file must be registered in the EmbeddedFiles Name Tree. This tree is an index that links each embedded file’s name to its specification dictionary, and it lives inside the Names Dictionary in the Catalog. It is the traditional mechanism PDF readers rely on to display the list of attachments to the user.

3 0 obj
<<
  /EmbeddedFiles 4 0 R
>>
endobj

4 0 obj
<<
  /Names [(invoice.xml) 6 0 R]
>>
endobj

The Names Dictionary 3 0 obj points via the key /EmbeddedFiles to the name tree 4 0 obj. The latter holds a /Names array made up of pairs: the file name as a text string, followed by a reference to its specification dictionary. Here the pair (invoice.xml) 6 0 R links the displayed name to dictionary number 6, which carries all the file’s details.

The file’s presence in two places, the /AF array and the name tree, may seem like duplication. But they serve two different purposes. The name tree is an old mechanism that makes the file visible as an attachment in any PDF reader. The /AF array is a newer mechanism that identifies it as an associated file with a defined relationship. The old reader sees the attachment via the tree, while the PDF/A-3-compliant reader understands the relationship via /AF. Registering the file in both guarantees the widest possible compatibility.
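
The double registration can be checked mechanically. The sketch below assumes the PDF has already been parsed into a plain Python mapping from object number to dictionary, with indirect references represented as integers; that model and the function name registration_report are hypothetical, and only a flat /Names array is handled (a real name tree may be split across /Kids nodes).

```python
def registration_report(objects: dict, catalog_num: int = 1) -> dict:
    """Report, per file specification, whether it is listed in /AF and in the name tree."""
    catalog = objects[catalog_num]
    af_refs = set(catalog.get("AF", []))

    # Catalog -> /Names dictionary -> /EmbeddedFiles name tree -> flat /Names array
    names_dict = objects.get(catalog.get("Names"), {})
    tree = objects.get(names_dict.get("EmbeddedFiles"), {})
    flat_pairs = tree.get("Names", [])
    tree_refs = set(flat_pairs[1::2])     # odd positions hold the references

    return {
        ref: {"in_af": ref in af_refs, "in_name_tree": ref in tree_refs}
        for ref in sorted(af_refs | tree_refs)
    }
```

An entry that appears in the name tree but not in /AF is exactly the "ordinary attachment" case: visible in a reader’s attachment panel, but carrying no declared relationship to the document.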

The difference between the associated file and the traditional attachment

Before the PDF/A-3 standard, a PDF file could carry attachments via the name tree alone, or via File Attachment Annotations bound to a point on a page. These traditional attachments are files the document carries without any semantic relationship linking them to it. The reader sees them as paper clips on the margin of the document, and does not know whether they are the source of the content, supplementary data, or an alternative copy.

The PDF/A-3 standard moved these files from the margin of the document into the core of its structure. The associated file is not a paper clip but a defined part of the document with a stated relationship via /AFRelationship, registered in the /AF array at the document or page level. This semantic shift is what enabled the hybrid invoice. Without the defined relationship, the recipient’s system could not distinguish the XML file representing the invoice from any other file that might be attached to the document for a different purpose.

In practice, this means it is not enough for the developer to “attach” the XML file the old way. If they did, they would produce a file that an ordinary PDF reader opens and shows the attachment, but that fails before a system looking for an approved associated invoice. The difference is not in the file’s presence but in how it is defined, registered, and linked. This is exactly what the object chain we explained adds on top of the traditional attachment.

The order of embedding relative to the cryptographic signature

The electronic invoice does not stop at embedding XML; it also carries a cryptographic signature that guarantees its integrity and that it has not been tampered with after issuance. Here the question of order arises: is the XML embedded before or after signing? The answer is decisive, because an ordering mistake invalidates the signature.

The rule is that the file structure must be completed, including embedding the XML and registering it in the Associated Files array, before applying the cryptographic signature. The signature computes a hash over the file’s content in its final state. If the embedded file were added after signing, the document’s content would change, so the signed hash would no longer match, and the signature would be rejected on verification. That is why embedding is a step that precedes sealing, not one that follows it.
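
The effect of the ordering can be shown with a deliberately simplified model: a plain SHA-256 digest over the whole file stands in for the signature. Real PDF signatures hash a declared byte range and involve certificates, so this is an illustration of the principle only, and the function names and placeholder byte strings are hypothetical.

```python
import hashlib


def seal(file_bytes: bytes) -> str:
    """Stand-in for signing: record a digest of the file in its final state."""
    return hashlib.sha256(file_bytes).hexdigest()


def verify(file_bytes: bytes, sealed_digest: str) -> bool:
    """Stand-in for verification: recompute the digest and compare."""
    return hashlib.sha256(file_bytes).hexdigest() == sealed_digest


# Placeholder content, not real PDF structure.
page_only = b"visible invoice page"
embedding = b" + embedded XML, /Filespec, /AF and name tree entries"

# Correct order: complete the structure, then seal.
complete_file = page_only + embedding
digest_after_embedding = seal(complete_file)

# Wrong order: seal first, then add the embedded file.
digest_before_embedding = seal(page_only)
file_changed_after_sealing = page_only + embedding
```

Here verify(complete_file, digest_after_embedding) succeeds, while verify(file_changed_after_sealing, digest_before_embedding) fails, because the bytes that were sealed are no longer the bytes being checked.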

This ordering explains why it is recommended to leave invoice generation to an approved system. Building the structure, then signing, then verifying is a sequential chain, and any flaw in its ordering drops either the PDF/A-3 status or the signature’s validity. The approved system executes these steps in the correct order automatically, producing a file that is signed, compliant, and loaded with the embedded XML in a single consistent operation.

How the receiving system extracts the XML file

The purpose of all this structure is to let the receiving system extract the XML file automatically and process it. Extraction runs in reverse of the order in which we built the file. The system starts from the document’s root and follows the references until it reaches the bytes. The steps in order:

  • The system opens the Catalog object and looks for the key /AF. Its presence means the document carries associated files.
  • It follows the reference in the /AF array to the file specification dictionary /Filespec.
  • It reads the key /AFRelationship to learn the file’s nature. A value of /Alternative or /Data indicates it is the invoice’s structured representation.
  • It follows the key /EF then its subkey /F to the /EmbeddedFile stream carrying the bytes.
  • It reads the bytes between stream and endstream and decompresses them if they were compressed, obtaining the raw XML text.
  • It verifies the file’s integrity by comparing its size against the value of /Size in /Params if present, then passes it to an XML parser to process the invoice.

This sequence is what makes the hybrid file fully machine-processable. The system does not need to read the visible page or interpret the graphics. It is enough for it to follow the chain of references to reach the invoice’s structured data. And because the standard defines this path precisely, any compliant system extracts the file the same way, regardless of the program that generated the invoice.

Here the value of setting /AFRelationship, which we explained earlier, shows itself. If the generator left it at a wrong value, the system might extract the file but would not trust it as the approved alternative representation, treating it as a casual attachment. Setting it correctly is what makes extraction reliable and authoritative, not merely reading bytes.
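
The extraction steps listed above can be sketched as a short walk over the object chain. As a simplifying assumption, the PDF is modelled as an already-parsed Python mapping from object number to dictionary, with references as integers and the stream bytes under a hypothetical "data" key; a production system would use a PDF library to obtain these objects. The function name and the choice to accept /Alternative and /Data are likewise illustrative.

```python
import zlib


def extract_invoice_xml(objects: dict, catalog_num: int = 1):
    """Follow Catalog -> /AF -> /Filespec -> /EF /F -> stream and return (name, xml_bytes)."""
    catalog = objects[catalog_num]
    for spec_ref in catalog.get("AF", []):            # steps 1 and 2: Catalog and /AF array
        spec = objects[spec_ref]
        if spec.get("AFRelationship") not in ("Alternative", "Data"):
            continue                                  # step 3: not the structured invoice
        stream = objects[spec["EF"]["F"]]             # step 4: /EF then /F to the stream
        data = stream["data"]
        if stream.get("Filter") == "FlateDecode":     # step 5: decompress if needed
            data = zlib.decompress(data)
        expected_size = stream.get("Params", {}).get("Size")
        if expected_size is not None and expected_size != len(data):
            raise ValueError("extracted size does not match /Params /Size")  # step 6
        return spec.get("UF") or spec.get("F"), data
    raise LookupError("no associated file with an accepted /AFRelationship")
```

The returned bytes can then be handed to an XML parser. Files that are embedded but carry another relationship, or no /AF registration at all, are never reached by this walk, which mirrors the trust distinction described above.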

Figure: Extracting the XML file from PDF/A-3 (how the reader extracts the embedded XML file): 1. Open the Catalog. 2. Read the /AF array. 3. Reach the /Filespec. 4. Extract the XML stream and pass it to the parser. With these steps, the reader separates the machine-readable XML from the visual presentation.

Common mistakes that drop the invoice’s status

Building the object chain by hand is prone to fine violations that make the file fail validation or be rejected by compliant systems. The most frequent:

  • Absence of the /AF array: embedding the file in the name tree only without registering it as an associated file. The result is an ordinary attachment, not a recognized associated invoice.
  • A wrong or missing /AFRelationship value: leaving out the key or setting it to /Unspecified strips the file of its meaning as the invoice’s alternative representation.
  • Improper /Subtype encoding: writing text/xml without replacing / with #2F produces an invalid value in the PDF structure.
  • Mismatch of /F and /UF: two different names for the file confuse some readers.
  • Fonts not embedded in the visible page: a violation of the fundamental PDF/A requirement, which drops the status regardless of the embedding’s correctness.
  • Prohibited content in the page: JavaScript, dynamic content, or encryption, all of which the standard forbids.

To verify the file’s integrity after generating it, pass it through a specialized validation tool such as veraPDF at the required level. The tool outputs a report detailing every violation of the standard, including embedding issues and the associated-files relationship. This is an indispensable step before sending the invoice in a production environment.

The practical takeaway is that generating a PDF/A-3 file by hand in a production environment is fraught with risk. The object chain is delicate, and any error in a reference, an encoding, or a relationship drops the invoice’s status with no visible warning to the user. It is better for an approved accounting system to build this structure automatically.

How Qoyod handles embedding XML inside PDF/A-3

Qoyod generates a PDF/A-3 invoice compliant with the Authority’s requirements without the developer or accountant having to deal with the object chain by hand. The system builds the embedded file stream and the file specification dictionary, sets the /AFRelationship key to the correct value, and registers the file in both the Associated Files array and the name tree, producing a hybrid file valid for both human reading and machine processing at once.

Beyond embedding, Qoyod handles the rest of the electronic invoice’s requirements: embedding the fonts in the visible page, the cryptographic signature, managing the Cryptographic Stamp Identifier (CSID) certificate automatically, generating the QR code, and linking the invoice to the Fatoora platform. This way the file comes out compliant with the Authority from the first generation, without the risk of the fine violations that hand-building falls into.

Frequently asked questions

What is the difference between the embedded file (EmbeddedFile) and the associated file (Associated File)?

The embedded file is an old concept meaning that the bytes of a file are stored inside the document. The associated file is a concept added by PDF/A-3, and it means an embedded file registered in the /AF array with a defined relationship to the content via /AFRelationship. Every associated file is an embedded file, but not every embedded file is associated.

What is the correct value for the /AFRelationship key in a business invoice?

Usually /Alternative, because the XML file is an alternative representation of the visible page carrying the same content in a machine-readable format. Some standards use /Data instead. What matters most is that the value is not left empty or set to /Unspecified.

Why is the file registered in both the /AF array and the name tree?

The name tree makes the file visible as an attachment in any old PDF reader, while the /AF array identifies it as an associated file with a defined relationship for the PDF/A-3-compliant reader. Registering it in both guarantees the widest compatibility across systems.

Do I need to compress the XML file before embedding it?

Compression is optional. The XML can be embedded raw or compressed with a filter such as /FlateDecode declared in the stream dictionary. Compression reduces the file size, and the extracting system decompresses it automatically before reading the bytes.

Can I build the embedding structure by hand as a developer?

Technically yes, but it is not recommended in production. The object chain is delicate, and any error in a reference, an encoding, or a relationship drops the invoice’s status. It is better to rely on an approved accounting system that builds the structure compliant with the Authority automatically.

How do I make sure the embedding is sound before sending?

Pass the file through a validation tool such as veraPDF at the required level. The tool inspects the embedding structure, the associated-files relationship, and the rest of the standard’s requirements, and outputs a report of every violation. Fix every error before approving the file in a production environment.

Practical takeaway

Embedding XML inside PDF/A-3 is not merely pasting a file, but building a chain of interlinked objects: a /EmbeddedFile stream carrying the bytes, a /Filespec dictionary describing it, an /AFRelationship key defining its relationship to the invoice, and an /AF array and name tree registering it so the reader finds and recognizes it. This structure specifically is what turns an ordinary file into a recognized hybrid invoice.

Setting each key correctly is a condition for the invoice’s acceptance and automated processing. And if you want to go deeper into the invoice’s technical representation formats, review the guide on XML invoice: the technical format of the electronic invoice and the guide on the electronic invoice structure. To understand the container format itself and its archiving properties, start from the guide on the PDF/A-3 format in the electronic invoice. And to see the full picture of the electronic invoicing requirements in Saudi Arabia, review the page on Qoyod’s electronic invoicing software.
