Description
Calling XGraphics.FromPdfPage() on an existing PDF page that has a /Contents array with multiple streams can produce a corrupted PDF. Adobe Acrobat reports "An error exists on this page. Acrobat may not display the page correctly." Visual content may also be lost.
Simply opening and immediately disposing the XGraphics object (without drawing anything) is enough to trigger the corruption.
Versions affected
- PDFsharp 6.1.1
- PDFsharp 6.2.4
How to reproduce
The attached file example.pdf is a single-page PDF generated by "Microsoft Print to PDF". Its page has 3 content streams in the /Contents array (~200KB + ~200KB + ~65KB uncompressed).
```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf.IO;

string inputPdf = @"example.pdf";
string outputPdf = @"output_after_xgraphics.pdf";

using var doc = PdfReader.Open(inputPdf, PdfDocumentOpenMode.Modify);
var page = doc.Pages[0];

// Just open and immediately close XGraphics; nothing is drawn
using (var gfx = XGraphics.FromPdfPage(page)) { }

doc.Save(outputPdf);
```
Open output_after_xgraphics.pdf in Adobe Acrobat. It will display the error dialog:
"An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
The original example.pdf opens without error. We have generated thousands of PDF files in this same manner, and have only recently started seeing this issue on a few PDF files.
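To find out which generated files have the multi-stream layout described above, a sketch like the following lists every page whose /Contents array holds more than one stream, together with each stream's uncompressed size. It only uses PDFsharp calls that already appear in the repro and the workaround (PdfReader.Open, Contents.Elements, GetDictionary, Stream.UnfilteredValue). The class and method names ContentStreamReport, FindMultiStreamPages and PrintReport are hypothetical. This only flags candidate files; it is an assumption, not something established in this report, that every multi-stream page triggers the error.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper (not part of PDFsharp): reports pages whose /Contents
// array holds more than one content stream, with each stream's decoded size.
public static class ContentStreamReport
{
    public static List<(int PageNumber, int[] StreamSizes)> FindMultiStreamPages(PdfDocument doc)
    {
        var result = new List<(int PageNumber, int[] StreamSizes)>();
        for (int i = 0; i < doc.PageCount; i++)
        {
            var elements = doc.Pages[i].Contents.Elements;
            if (elements.Count <= 1)
                continue; // zero or one stream: the same pages the workaround skips

            var sizes = new int[elements.Count];
            for (int j = 0; j < elements.Count; j++)
            {
                // UnfilteredValue decodes the stream (e.g. FlateDecode) before it is measured.
                sizes[j] = elements.GetDictionary(j)?.Stream?.UnfilteredValue?.Length ?? 0;
            }
            result.Add((i + 1, sizes));
        }
        return result;
    }

    public static void PrintReport(string path)
    {
        // Opened the same way as in the repro; nothing is saved, so the file is not changed.
        using var doc = PdfReader.Open(path, PdfDocumentOpenMode.Modify);
        foreach (var (page, sizes) in FindMultiStreamPages(doc))
        {
            Console.WriteLine(
                $"{path}, page {page}: {sizes.Length} content streams " +
                $"({string.Join(" + ", sizes)} bytes uncompressed)");
        }
    }
}
```

For the attached file, `ContentStreamReport.PrintReport("example.pdf")` should list page 1 with three streams. Once FlattenPageContentStreams (below) has run on a document, FindMultiStreamPages returns an empty list, because the workaround leaves every page with at most one stream.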
Workaround
Flatten the multi-stream /Contents array into a single stream before calling XGraphics.FromPdfPage():
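```csharp
using System.IO;
using PdfSharp.Pdf;

private static void FlattenPageContentStreams(PdfDocument pdf)
{
    for (int i = 0; i < pdf.PageCount; i++)
    {
        var page = pdf.Pages[i];
        if (page.Contents.Elements.Count <= 1)
            continue;

        // Concatenate the decoded bytes of every content stream, in order.
        // A newline separates the streams: the PDF spec only allows the split at
        // token boundaries, so without whitespace adjacent tokens could merge.
        using var combined = new MemoryStream();
        for (int j = 0; j < page.Contents.Elements.Count; j++)
        {
            var dict = page.Contents.Elements.GetDictionary(j);
            var data = dict?.Stream?.UnfilteredValue;
            if (data != null)
            {
                combined.Write(data, 0, data.Length);
                combined.WriteByte((byte)'\n');
            }
        }

        // Replace with a single uncompressed content stream:
        // drop every entry except the first...
        while (page.Contents.Elements.Count > 1)
            page.Contents.Elements.RemoveAt(page.Contents.Elements.Count - 1);

        // ...then overwrite the first stream with the combined, unencoded bytes.
        var remaining = page.Contents.Elements.GetDictionary(0);
        if (remaining != null)
        {
            remaining.Stream.Value = combined.ToArray();
            remaining.Elements.SetInteger("/Length", (int)combined.Length);
            // The data is no longer encoded, so remove the filter and its parameters.
            remaining.Elements.Remove("/Filter");
            remaining.Elements.Remove("/DecodeParms");
        }
    }
}
```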
Environment
- .NET 8.0, Windows 11
- Test PDF generated by "Microsoft Print to PDF" printer driver
example.pdf