When you generate an electronic invoice in PDF/A-3 format, it is not enough to place the XML file somewhere inside the document. There is a precise structure mandated by the international standard ISO 19005-3 that defines how the file is injected, how it is identified, and how the reader links it to the visible page. This technical guide breaks down that embedding mechanism at the level of the file structure: how the XML file becomes part of the document as an EmbeddedFile object, how the Associated Files relationship links it to the invoice, what role the File Specification Dictionary and the /AFRelationship key play, and how the receiving system extracts the XML copy programmatically.
This page assumes you already know what the PDF/A-3 format is and why the Zakat, Tax and Customs Authority (ZATCA) adopted it for human-readable invoices. If you have not read up on that yet, start with the guide on the PDF/A-3 format in the electronic invoice which explains the format and its archiving properties. Here we focus on the embedding mechanism alone, because it is the part developers most often get wrong when building an invoice generator from scratch.
Why a hybrid file at all: a page for the human and data for the machine
An electronic invoice in business-to-business (B2B) transactions needs two representations at the same time. The first is a visible page read by a procurement officer: the supplier’s name, its tax number, the line items, the amounts, the QR code. The second is a structured data file read by the receiving accounting system without human intervention. If you sent each representation in a separate file, the recipient would risk loading one file and losing the other, or dealing with two contradictory copies of the same invoice.
The hybrid file resolves this conflict. It is a single PDF/A-3 file that carries the invoice page in its visible layer and an embedded XML file matching the page’s content in its internal structure. The human recipient opens the file and sees the invoice, while the recipient’s automated system extracts the XML and processes it. Both copies live in one file, so there is no separation and no contradiction. This same principle underlies well-established international standards such as Germany’s ZUGFeRD and France’s Factur-X, all of which rely on a PDF/A-3 file carrying an embedded XML.
The essential point is that the embedded XML is not a casual attachment like an image you drop into an email. It is defined inside the file structure as the official alternative representation of the visible invoice. This definition is what distinguishes correct embedding from merely pasting a file. To reach it, we need to understand several objects in the PDF structure that work together.
This principle carries an important implicit requirement: content parity between the two layers. The visible page and the embedded XML file must carry the same data: the same amounts, the same line items, the same tax number. Any divergence between them makes the invoice contradict itself, so the officer sees one figure while the system records another. This is why the two representations are not built separately but generated from a single data source, so the visible page is a visual presentation of the same data the XML carries. This parity is a practical condition for accepting the invoice in business transactions, not a formal detail.
The four objects that build the embedding
Embedding a file inside PDF/A-3 rests on a chain of interlinked objects. Each object is a dictionary holding key-value pairs and pointing to the next object in the chain. Understanding this chain is a condition for generating a valid file or diagnosing a rejected one. The four objects are:
- Embedded File Stream: the object that carries the bytes of the XML file itself, of type /EmbeddedFile.
- File Specification Dictionary: the object that describes the embedded file (its name, its relationship to the document, its type), of type /Filespec.
- EmbeddedFiles Name Tree: the index that registers the embedded file in the document catalog so the reader can find it.
- Associated Files Array: the array that links the file to the document or to the page as an associated file, via the key /AF.
These objects work together to answer three questions: where are the file’s bytes? What is its name and relationship? And how does the reader find it and know it is an associated file rather than a mere attachment? We take each in turn.
Figure: the object chain, from the Catalog holding the /AF array, through the EmbeddedFiles name tree (/EmbeddedFiles) and the File Specification Dictionary (/Filespec), to the embedded file stream (/EmbeddedFile).
The embedded file stream: where the XML bytes are stored
At the heart of the embedding sits a Stream object that carries the actual content of the XML file byte by byte. This object holds a dictionary describing it, followed by the raw bytes between the keywords stream and endstream. The dictionary declares the object’s type as /EmbeddedFile via the key /Type, specifies the content (MIME) type via the key /Subtype, and records the length in /Length along with optional extra data such as the creation date and the original file size.
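As a rough illustration of the layout described above, the following Python sketch serializes an uncompressed embedded file stream. The function name, the object number 5, the placeholder XML payload, and the fixed date are assumptions made for the example, not part of any official tooling.

```python
import hashlib
from datetime import datetime, timezone

def embedded_file_object(obj_num: int, xml: bytes, mod: datetime) -> bytes:
    """Serialize an uncompressed /EmbeddedFile stream as an indirect object."""
    # Assumes `mod` is already in UTC, hence the trailing "Z".
    mod_date = mod.strftime("D:%Y%m%d%H%M%SZ")
    # /CheckSum holds the MD5 digest of the uncompressed file, as a hex string.
    checksum = hashlib.md5(xml).hexdigest().upper()
    header = (
        f"{obj_num} 0 obj\n"
        "<< /Type /EmbeddedFile\n"
        "   /Subtype /text#2Fxml\n"
        f"   /Length {len(xml)}\n"
        f"   /Params << /Size {len(xml)} /ModDate ({mod_date}) /CheckSum <{checksum}> >>\n"
        ">>\n"
        "stream\n"
    ).encode("ascii")
    return header + xml + b"\nendstream\nendobj\n"

xml_bytes = b"<Invoice><ID>INV-001</ID></Invoice>"  # hypothetical payload
obj = embedded_file_object(
    5, xml_bytes, datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
)
print(obj.decode("ascii"))
```

The sketch writes the bytes raw, so /Length and /Size are equal; with a compression filter they would differ.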
Note the value of /Subtype. It is stored as a PDF name object, and inside a name the character / is a delimiter, so it must be replaced with the escape sequence #2F: text/xml is written as text#2Fxml. This is a fine detail that trips up anyone writing the structure by hand, producing a file that fails validation. Some generators use application#2Fxml instead, and both are acceptable as long as they accurately describe the file’s content.
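The escaping rule can be sketched in a few lines of Python. The helper name pdf_name is an assumption for illustration; the rule it applies (escape delimiters, the # character, whitespace, and anything outside printable ASCII as # followed by two hex digits) is the general rule for PDF name objects.

```python
def pdf_name(text: str) -> str:
    """Encode text as a PDF name object, escaping bytes as #xx where needed."""
    delimiters = b"()<>[]{}/%#"
    out = ["/"]
    for byte in text.encode("utf-8"):
        # Escape whitespace, control bytes, non-ASCII bytes, delimiters and '#'.
        if byte < 0x21 or byte > 0x7E or byte in delimiters:
            out.append(f"#{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)

print(pdf_name("text/xml"))         # /text#2Fxml
print(pdf_name("application/xml"))  # /application#2Fxml
```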
The /Params dictionary is optional but recommended. It carries metadata about the original file before embedding: its size in bytes in /Size, its last modification date in /ModDate, and sometimes a checksum. This data helps the extracting system verify the file’s integrity after extraction and compare its size against what is recorded.
The File Specification Dictionary: the embedded XML’s identity card
The file stream alone is not enough. The document needs an object that describes this stream: what the file name shown to the user is, where it finds its bytes, and what its relationship to the document is. This object is the File Specification Dictionary of type /Filespec. It is the identity card of the embedded file, and the central object around which the rest of the chain revolves.
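A minimal sketch of such a dictionary, generated from Python, might look as follows. The function name and the /Desc text are assumptions; the object numbers 6 (file specification) and 5 (stream) follow the numbering used in this guide. The sketch also assumes an ASCII file name without parentheses or backslashes, which would otherwise need escaping.

```python
def filespec_object(obj_num: int, filename: str, stream_obj: int,
                    relationship: str = "Alternative") -> str:
    """Serialize a /Filespec dictionary that points at an /EmbeddedFile stream."""
    ref = f"{stream_obj} 0 R"  # indirect reference to the byte stream
    return (
        f"{obj_num} 0 obj\n"
        "<< /Type /Filespec\n"
        f"   /F ({filename})\n"
        f"   /UF ({filename})\n"
        f"   /AFRelationship /{relationship}\n"
        "   /Desc (Structured invoice data)\n"
        f"   /EF << /F {ref} /UF {ref} >>\n"
        ">>\n"
        "endobj\n"
    )

filespec_text = filespec_object(6, "invoice.xml", 5)
print(filespec_text)
```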
We break down the keys of this dictionary one by one, because each carries a meaning that affects the file’s acceptance:
- /Type /Filespec: declares that this object is a file specification dictionary. Required in PDF/A-3.
- /F and /UF: the file name as it appears to the user. /F is the traditional form, and /UF the newer Unicode form. They should match. A name such as invoice.xml is clear and meaningful.
- /AFRelationship: the most important key in the embedding. It defines the relationship of the embedded file to the containing document. We devote the next section entirely to it.
- /Desc: a short textual description of the file. Optional but useful for the human reader browsing the attachments.
- /EF (Embedded File): the dictionary that links the file specification to the actual byte stream. The value 5 0 R is an indirect reference pointing to object number 5, which carries the XML bytes. Here the loop between description and content is completed.
The reference 5 0 R is what is called an indirect reference in the PDF structure. The first number is the object identifier, the second is the generation number, and the letter R indicates that it is a reference, not a direct value. With this reference, the /Filespec dictionary links itself to the /EmbeddedFile stream without duplicating the bytes.
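To make the three parts of a reference concrete, here is a small Python sketch that splits one apart. The helper name parse_reference is hypothetical.

```python
import re

def parse_reference(token: str) -> tuple[int, int]:
    """Split an indirect reference such as '5 0 R' into (object, generation)."""
    match = re.fullmatch(r"(\d+)\s+(\d+)\s+R", token.strip())
    if match is None:
        raise ValueError(f"not an indirect reference: {token!r}")
    return int(match.group(1)), int(match.group(2))

print(parse_reference("5 0 R"))  # (5, 0)
```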
The /AFRelationship key: what makes the file an associated file rather than a mere attachment
This key is the essence of what distinguishes a correct hybrid invoice file. The /AFRelationship key defines the nature of the relationship between the embedded file and the document’s visible content. The ISO 19005-3 standard defines several possible values for this key, the most notable being:
- /Source: the embedded file is the source from which the visible page was derived.
- /Data: the file carries structured data used to extract the document’s content.
- /Alternative: the file is an alternative representation of the same visible-page content, in another format the machine reads.
- /Supplement: the file is supplementary content that adds to the page without being an alternative representation of it.
- /Unspecified: the relationship is unspecified. Not recommended in invoices, because it leaves the reader without any indication.
In a business invoice, the correct value is usually /Alternative or /Data depending on the standard you follow. The rationale behind /Alternative is that the XML file is not a separate attachment but the invoice itself expressed in a format systems can read. The visible page and the XML file carry the same content, one for the human and the other for the machine. This definition specifically is what turns the file from “a PDF with an attachment” into “a recognized hybrid invoice.”
The common mistake here is that the developer embeds the XML file without setting /AFRelationship, or leaves it at a value that does not express the true relationship. The result is a file that an ordinary PDF reader may open and show the attachment, but a system looking for an associated invoice will not recognize it as the approved alternative representation. Setting this key correctly is a condition for the invoice’s acceptance by standard-compliant systems.
| Value | Meaning |
|---|---|
| /Source | the document’s original source |
| /Data | dependent data |
| /Alternative | alternative copy (recommended for the invoice) |
| /Supplement | supplement |
| /Unspecified | unspecified |
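A receiving system could model the values in the table above with a small enumeration. The following Python sketch is one possible shape; the class and function names are assumptions, and the set of values accepted as "the invoice" should be adjusted to the standard you follow.

```python
from enum import Enum

class AFRelationship(Enum):
    SOURCE = "/Source"
    DATA = "/Data"
    ALTERNATIVE = "/Alternative"
    SUPPLEMENT = "/Supplement"
    UNSPECIFIED = "/Unspecified"

# Values this sketch treats as the structured invoice; an assumption, not a rule.
INVOICE_VALUES = {AFRelationship.ALTERNATIVE, AFRelationship.DATA}

def is_invoice_relationship(filespec: dict) -> bool:
    """True if the file specification declares an invoice-grade relationship."""
    raw = filespec.get("/AFRelationship")
    try:
        return AFRelationship(raw) in INVOICE_VALUES
    except ValueError:  # key missing or value not in the table
        return False

print(is_invoice_relationship({"/AFRelationship": "/Alternative"}))  # True
print(is_invoice_relationship({"/AFRelationship": "/Unspecified"}))  # False
print(is_invoice_relationship({}))                                   # False
```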
The Associated Files array: linking the file to the document
Defining a file with a valid /Filespec dictionary is not on its own enough for the file to count as a recognized associated file. It must be registered in the Associated Files array via the key /AF at the document level, or at the level of a specific page. This array is precisely what the PDF/A-3 standard added to support associated files, and it is what distinguishes it from older versions.
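A document-level registration can be sketched as below. The function name and the object numbers 1 (Catalog) and 2 (page tree) are assumptions; 3 and 6 follow the numbering used in this guide. A real PDF/A-3 Catalog carries further entries, such as metadata and output intents, that are omitted here.

```python
def catalog_object(obj_num: int, pages_obj: int, names_obj: int,
                   filespec_objs: list[int]) -> str:
    """Serialize a Catalog whose /AF array lists the associated files."""
    af_refs = " ".join(f"{n} 0 R" for n in filespec_objs)
    return (
        f"{obj_num} 0 obj\n"
        "<< /Type /Catalog\n"
        f"   /Pages {pages_obj} 0 R\n"
        f"   /Names {names_obj} 0 R\n"
        f"   /AF [ {af_refs} ]\n"
        ">>\n"
        "endobj\n"
    )

catalog_text = catalog_object(1, pages_obj=2, names_obj=3, filespec_objs=[6])
print(catalog_text)
```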
The Catalog object is the root of the document structure. The key /AF here is an array holding a single reference 6 0 R pointing to the file specification dictionary we defined. With this registration, the file becomes associated at the level of the entire document, that is, linked to the invoice as a whole rather than to a specific page. In the invoice, where the XML file represents the whole invoice, this is the correct logical link.
Alternatively, the associated file can be linked to a specific page by placing the key /AF inside the page dictionary itself. But in the hybrid invoice, linking at the document level is the adopted practice, because the structured data pertains to the whole document. Combining registration of the file in the /AF array and in the name tree (next) is what guarantees that every compliant reader will find the file and recognize its relationship.
The EmbeddedFiles name tree: how the reader finds the file
Alongside the /AF array, the file must be registered in the EmbeddedFiles Name Tree. This tree is an index that links each embedded file’s name to its specification dictionary, and it lives inside the Names Dictionary in the Catalog. It is the traditional mechanism PDF readers rely on to display the list of attachments to the user.
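The two objects involved can be sketched as follows, using a single-node tree, which is enough for one attachment. The function name is an assumption; the object numbers 3, 4, and 6 follow the numbering used in this guide, and file names are assumed to be plain ASCII.

```python
def names_objects(names_obj: int, tree_obj: int, files: dict[str, int]) -> str:
    """Serialize the Names dictionary and a single-node EmbeddedFiles name tree."""
    # Name-tree keys are kept sorted; each key is followed by its /Filespec reference.
    pairs = " ".join(f"({name}) {num} 0 R" for name, num in sorted(files.items()))
    return (
        f"{names_obj} 0 obj\n"
        f"<< /EmbeddedFiles {tree_obj} 0 R >>\n"
        "endobj\n"
        f"{tree_obj} 0 obj\n"
        f"<< /Names [ {pairs} ] >>\n"
        "endobj\n"
    )

names_text = names_objects(3, 4, {"invoice.xml": 6})
print(names_text)
```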
The Names Dictionary 3 0 obj points via the key /EmbeddedFiles to the name tree 4 0 obj. The latter holds a /Names array made up of pairs: the file name as a text string, followed by a reference to its specification dictionary. Here the pair (invoice.xml) 6 0 R links the displayed name to dictionary number 6, which carries all the file’s details.
The file’s presence in two places, the /AF array and the name tree, may seem like duplication. But they serve two different purposes. The name tree is an old mechanism that makes the file visible as an attachment in any PDF reader. The /AF array is a newer mechanism that identifies it as an associated file with a defined relationship. The old reader sees the attachment via the tree, while the PDF/A-3-compliant reader understands the relationship via /AF. Registering the file in both guarantees the widest possible compatibility.
The difference between the associated file and the traditional attachment
Before the PDF/A-3 standard, a PDF file could carry attachments via the name tree alone, or via File Attachment Annotations bound to a point on a page. These traditional attachments are files the document carries without any semantic relationship linking them to it. The reader sees them as paper clips on the margin of the document, and does not know whether they are the source of the content, supplementary data, or an alternative copy.
The PDF/A-3 standard moved these files from the margin of the document into the core of its structure. The associated file is not a paper clip but a defined part of the document with a stated relationship via /AFRelationship, registered in the /AF array at the document or page level. This semantic shift is what enabled the hybrid invoice. Without the defined relationship, the recipient’s system could not distinguish the XML file representing the invoice from any other file that might be attached to the document for a different purpose.
In practice, this means it is not enough for the developer to “attach” the XML file the old way. If they did, they would produce a file that an ordinary PDF reader opens and shows the attachment, but that fails before a system looking for an approved associated invoice. The difference is not in the file’s presence but in how it is defined, registered, and linked. This is exactly what the object chain we explained adds on top of the traditional attachment.
The order of embedding relative to the cryptographic signature
The electronic invoice does not stop at embedding XML; it also carries a cryptographic signature that guarantees its integrity and that it has not been tampered with after issuance. Here the question of order arises: is the XML embedded before or after signing? The answer is decisive, because an ordering mistake invalidates the signature.
The rule is that the file structure must be completed, including embedding the XML and registering it in the Associated Files array, before applying the cryptographic signature. The signature computes a hash over the file’s content in its final state. If the embedded file were added after signing, the document’s content would change, so the signed hash would no longer match, and the signature would be rejected on verification. That is why embedding is a step that precedes sealing, not one that follows it.
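The effect of the ordering can be illustrated with a deliberately simplified Python sketch. It hashes a placeholder byte string as a whole, which is an assumption made for brevity: real PDF signatures cover specific byte ranges and involve certificates, none of which is modelled here.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest of the given bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Placeholder bytes standing in for a finished file with the XML already embedded.
pdf_with_xml = b"%PDF-1.7 [page objects] [embedded invoice.xml]"
signed_digest = digest(pdf_with_xml)  # what the signature step would record

# Correct order: nothing changes after signing, so the digests still match.
matches_when_untouched = digest(pdf_with_xml) == signed_digest

# Wrong order: embedding after signing changes the bytes and breaks the match.
late_embed = pdf_with_xml + b" [late attachment]"
matches_after_late_embed = digest(late_embed) == signed_digest

print(matches_when_untouched, matches_after_late_embed)  # True False
```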
This ordering explains why it is recommended to leave invoice generation to an approved system. Building the structure, then signing, then verifying is a sequential chain, and any flaw in its ordering drops either the PDF/A-3 status or the signature’s validity. The approved system executes these steps in the correct order automatically, producing a file that is signed, compliant, and loaded with the embedded XML in a single consistent operation.
How the receiving system extracts the XML file
The purpose of all this structure is to let the receiving system extract the XML file automatically and process it. Extraction runs in reverse of the order in which we built the file. The system starts from the document’s root and follows the references until it reaches the bytes. The steps in order:
- The system opens the Catalog object and looks for the key /AF. Its presence means the document carries associated files.
- It follows the reference in the /AF array to the file specification dictionary /Filespec.
- It reads the key /AFRelationship to learn the file’s nature. A value of /Alternative or /Data indicates it is the invoice’s structured representation.
- It follows the key /EF then its subkey /F to the /EmbeddedFile stream carrying the bytes.
- It reads the bytes between stream and endstream, decompresses them if they were compressed, obtaining the raw XML text.
- It verifies the file’s integrity by comparing its size against the value of /Size in /Params if present, then passes it to an XML parser to process the invoice.
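The steps above can be sketched in Python against a simplified in-memory model. The model is an assumption made for illustration: parsed objects are plain dictionaries keyed by object number, references are bare integers, and the stream bytes sit under a made-up "data" key. A production system would use a real PDF parser instead.

```python
import zlib

def extract_invoice_xml(objects: dict[int, dict], root: int) -> bytes:
    """Follow Catalog -> /AF -> /Filespec -> /EF -> stream and return the XML."""
    catalog = objects[root]
    for ref in catalog.get("/AF", []):               # open Catalog, walk /AF
        filespec = objects[ref]
        if filespec.get("/AFRelationship") not in ("/Alternative", "/Data"):
            continue                                 # not the invoice data
        stream = objects[filespec["/EF"]["/F"]]      # follow /EF then /F
        data = stream["data"]
        if stream.get("/Filter") == "/FlateDecode":  # decompress if needed
            data = zlib.decompress(data)
        size = stream.get("/Params", {}).get("/Size")
        if size is not None and size != len(data):   # integrity check on size
            raise ValueError("embedded file size does not match /Params /Size")
        return data
    raise LookupError("no associated invoice file found in /AF")

xml = b"<Invoice><ID>INV-001</ID></Invoice>"  # hypothetical payload
objects = {
    1: {"/Type": "/Catalog", "/AF": [6]},
    6: {"/Type": "/Filespec", "/F": "invoice.xml",
        "/AFRelationship": "/Alternative", "/EF": {"/F": 5}},
    5: {"/Type": "/EmbeddedFile", "/Filter": "/FlateDecode",
        "/Params": {"/Size": len(xml)}, "data": zlib.compress(xml)},
}
print(extract_invoice_xml(objects, root=1).decode())
```

The returned bytes would then be handed to an XML parser; that step is left out to keep the sketch focused on the reference chain.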
This sequence is what makes the hybrid file fully machine-processable. The system does not need to read the visible page or interpret the graphics. It is enough for it to follow the chain of references to reach the invoice’s structured data. And because the standard defines this path precisely, any compliant system extracts the file the same way, regardless of the program that generated the invoice.
This is where setting /AFRelationship correctly, as explained earlier, pays off. If the generator left it at a wrong value, the system might extract the file but would not trust it as the approved alternative representation, treating it as a casual attachment. Setting it correctly is what makes extraction reliable and authoritative, not merely reading bytes.
Figure: the extraction flow. Open the Catalog, read the /AF array, reach the /Filespec, then extract the XML stream and pass it to the parser.
Common mistakes that drop the invoice’s status
Building the object chain by hand is prone to fine violations that make the file fail validation or be rejected by compliant systems. The most frequent:
- Absence of the /AF array: embedding the file in the name tree only without registering it as an associated file. The result is an ordinary attachment, not a recognized associated invoice.
- A wrong or missing /AFRelationship value: leaving out the key or setting it to /Unspecified strips the file of its meaning as the invoice’s alternative representation.
- Improper /Subtype encoding: writing text/xml without replacing / with #2F produces an invalid value in the PDF structure.
- Mismatch of /F and /UF: two different names for the file confuse some readers.
- Fonts not embedded in the visible page: a violation of the fundamental PDF/A requirement, which drops the status regardless of the embedding’s correctness.
- Prohibited content in the page: JavaScript, dynamic content, or encryption, all of which the standard forbids.
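The embedding-related items in the list above lend themselves to a quick pre-flight check. The Python sketch below is a hypothetical helper working on simplified dictionaries, with name values kept as raw strings; it does not cover fonts or prohibited content and is no substitute for a full validator.

```python
def lint_embedding(catalog: dict, filespec: dict, stream: dict) -> list[str]:
    """Return findings for the embedding mistakes that can be checked structurally."""
    findings = []
    if not catalog.get("/AF"):
        findings.append("Catalog has no /AF array")
    if filespec.get("/AFRelationship") in (None, "/Unspecified"):
        findings.append("/AFRelationship is missing or /Unspecified")
    # A second '/' after the leading one means the MIME type was not escaped.
    if "/" in stream.get("/Subtype", "")[1:]:
        findings.append("/Subtype contains an unescaped '/'")
    if filespec.get("/F") != filespec.get("/UF"):
        findings.append("/F and /UF differ")
    return findings

bad = lint_embedding(
    catalog={"/Type": "/Catalog"},
    filespec={"/F": "invoice.xml", "/UF": "Invoice.xml"},
    stream={"/Subtype": "/text/xml"},
)
for finding in bad:
    print("-", finding)
```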
To verify the file’s integrity after generating it, pass it through a specialized validation tool such as veraPDF at the required level. The tool outputs a report detailing every violation of the standard, including embedding issues and the associated-files relationship. This is an indispensable step before sending the invoice in a production environment.
The practical takeaway is that generating a PDF/A-3 file by hand in a production environment is fraught with risk. The object chain is delicate, and any error in a reference, an encoding, or a relationship drops the invoice’s status with no visible warning to the user. It is better for an approved accounting system to build this structure automatically.
How Qoyod handles embedding XML inside PDF/A-3
Qoyod generates a PDF/A-3 invoice compliant with the Authority’s requirements without the developer or accountant having to deal with the object chain by hand. The system builds the embedded file stream and the file specification dictionary, sets the /AFRelationship key to the correct value, and registers the file in both the Associated Files array and the name tree, producing a hybrid file valid for both human reading and machine processing at once.
Beyond embedding, Qoyod handles the rest of the electronic invoice’s requirements: embedding the fonts in the visible page, the cryptographic signature, managing the Cryptographic Stamp Identifier (CSID) certificate automatically, generating the QR code, and linking the invoice to the Fatoora platform. This way the file comes out compliant with the Authority from the first generation, without the risk of the fine violations that hand-building falls into.
Frequently asked questions
What is the difference between the embedded file (EmbeddedFile) and the associated file (Associated File)?
The embedded file is an old concept meaning that the bytes of a file are stored inside the document. The associated file is a concept added by PDF/A-3, and it means an embedded file registered in the /AF array with a defined relationship to the content via /AFRelationship. Every associated file is an embedded file, but not every embedded file is associated.
What is the correct value for the /AFRelationship key in a business invoice?
Usually /Alternative, because the XML file is an alternative representation of the visible page carrying the same content in a machine-readable format. Some standards use /Data instead. What matters most is that the value is not left empty or set to /Unspecified.
Why is the file registered in both the /AF array and the name tree?
The name tree makes the file visible as an attachment in any old PDF reader, while the /AF array identifies it as an associated file with a defined relationship for the PDF/A-3-compliant reader. Registering it in both guarantees the widest compatibility across systems.
Do I need to compress the XML file before embedding it?
Compression is optional. The XML can be embedded raw or compressed with a filter such as /FlateDecode declared in the stream dictionary. Compression reduces the file size, and the extracting system decompresses it automatically before reading the bytes.
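A short Python sketch of the round trip, assuming a made-up repetitive XML payload: zlib produces the deflate data that /FlateDecode expects, /Length records the stored (compressed) size, and /Size in /Params records the original size.

```python
import zlib

# Hypothetical payload; repetitive XML compresses well.
xml = b"<Invoice>" + b"<Line><Amount>100.00</Amount></Line>" * 50 + b"</Invoice>"
compressed = zlib.compress(xml)

stream_dict = {
    "/Type": "/EmbeddedFile",
    "/Filter": "/FlateDecode",
    "/Length": len(compressed),      # size of the bytes stored in the stream
    "/Params": {"/Size": len(xml)},  # size of the original, uncompressed file
}

restored = zlib.decompress(compressed)  # what the extracting system does
print(len(xml), "->", len(compressed), "bytes; round trip ok:", restored == xml)
```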
Can I build the embedding structure by hand as a developer?
Technically yes, but it is not recommended in production. The object chain is delicate, and any error in a reference, an encoding, or a relationship drops the invoice’s status. It is better to rely on an approved accounting system that builds the structure compliant with the Authority automatically.
How do I make sure the embedding is sound before sending?
Pass the file through a validation tool such as veraPDF at the required level. The tool inspects the embedding structure, the associated-files relationship, and the rest of the standard’s requirements, and outputs a report of every violation. Fix every error before approving the file in a production environment.
Practical takeaway
Embedding XML inside PDF/A-3 is not merely pasting a file, but building a chain of interlinked objects: a /EmbeddedFile stream carrying the bytes, a /Filespec dictionary describing it, an /AFRelationship key defining its relationship to the invoice, and an /AF array and name tree registering it so the reader finds and recognizes it. This structure specifically is what turns an ordinary file into a recognized hybrid invoice.
Setting each key correctly is a condition for the invoice’s acceptance and automated processing. And if you want to go deeper into the invoice’s technical representation formats, review the guide on XML invoice: the technical format of the electronic invoice and the guide on the electronic invoice structure. To understand the container format itself and its archiving properties, start from the guide on the PDF/A-3 format in the electronic invoice. And to see the full picture of the electronic invoicing requirements in Saudi Arabia, review the page on Qoyod’s electronic invoicing software.