If you have ever needed to export html to a pdf, or do pdf generation in asp.net c# in general, then you will know that this can be a massive pain to get right! Recently I had to do this for a website I was working on, so I thought I would share how I got on and what tools I ended up using to achieve it, and more importantly for FREE, without using some of the £2000 packages you can buy…
When I started this task I headed on over to nuget.org to see if there were any packages that I could use, and it turns out there are! https://www.nuget.org/packages?q=pdf Unfortunately most of the packages that show up are paid solutions that need a hefty licence for commercial use (£2000 plus etc).
I did notice that one package was free for commercial use but was limited to only 10 pages, and as the tool I was creating wouldn’t need to go over 10 pages I decided to look into this further. The tool looked promising, and could convert a webpage from either a url or actual html into a pdf. But after various attempts and hours spent trying to get it working perfectly, the main flaw seemed to be that everything it generated from the html was basically an image of how the page looks, and it doesn’t actually insert text into the pdf. This was no good and not fit for what I needed…
Back to the drawing board…
I took a google around and eventually came across a tool called “WkHtmlToPdf”, described on http://wkhtmltopdf.org/ as “open source (LGPLv3) command line tools to render HTML into PDF and various image formats using the Qt WebKit rendering engine. These run entirely ‘headless’ and do not require a display or display service.” GREAT!
It also has a nuget wrapper to integrate into visual studio / c# etc via Install-Package Codaxy.WkHtmlToPdf
After a bit of tweaking of some sample code, and of the html I passed into the parser, this worked great. It is what I eventually used, and it converted my html page into a pdf document pretty much exactly as it was.
The wrapper package was last edited in 2012, and there are newer packages, but this is the only one I found that would actually work and did everything I needed to do.
You need to install this small application on your computer / web server. You can get it from http://wkhtmltopdf.org/downloads.html (note where it gets installed to, as the path will be needed in the code).
You will then need to install this package in your visual studio solution: https://www.nuget.org/packages/Codaxy.WkHtmlToPdf/ by running: Install-Package Codaxy.WkHtmlToPdf
var url = "http://www.neil-redfern.com/";
// when I installed wkhtmltopdf to my machine, it was in a different place than expected by the plugin, so I just had to change where it was looking for the .exe and it worked fine
PdfConvert.Environment.WkHtmlToPdfPath = @"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe";
// convert the page at the url and write the pdf to disk
PdfConvert.ConvertHtmlToPdf(new PdfDocument
{
    Url = url
}, new PdfOutput
{
    OutputFilePath = Server.MapPath("/uploads/export.pdf")
});
// read the generated pdf back in and return it as a file download
byte[] fileBytes = System.IO.File.ReadAllBytes(Server.MapPath("/uploads/export.pdf"));
string fileName = "export.pdf";
return File(fileBytes, System.Net.Mime.MediaTypeNames.Application.Pdf, fileName);
I ran into a few issues along the way. These may not be a problem for you, as it all depends on how your html is set up, but the pdf export basically needs some content / css output in certain ways for it to render correctly.
Responsive CSS – the website was responsive and also mobile first. The problem the pdf export had with this was that it wasn’t picking up any media queries and was defaulting to the mobile site’s css. To fix this I simply exported the css needed for the desktop elements and inserted it into the html that was getting exported to pdf.
Svgs / possibly background images – some logos in the pdf were svgs and were output via a css class. The export didn’t like this, and I needed to replace these with actual images in the html e.g. <img src="image.png" />
Links – links around anything other than plain text were causing issues, with the links not appearing. To fix this I simply made sure every link wrapped plain text only, rather than an image or other html.
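The responsive CSS fix above (inserting the desktop styles into the html before export) could be sketched roughly like this. The names PdfHtmlHelper and InlineCss are hypothetical and not part of wkhtmltopdf or the Codaxy package. It is plain string handling, and it assumes the desktop rules have already been pulled out of their media queries into a single css string.

```csharp
using System;

public static class PdfHtmlHelper
{
    // Hypothetical helper: injects a <style> block holding the desktop css
    // just before </head>, so the rules apply without any media query.
    public static string InlineCss(string html, string desktopCss)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        if (string.IsNullOrEmpty(desktopCss)) return html;

        string styleBlock = "<style>" + desktopCss + "</style>";
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);

        // No </head> found: fall back to putting the styles at the very start.
        return headEnd < 0
            ? styleBlock + html
            : html.Insert(headEnd, styleBlock);
    }
}
```

The returned string is what you would then hand to the converter in place of the original html.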
The only outstanding issue I can see at the moment is page breaks. For example, I couldn’t find any way to force a section onto a new page, so sections can end up split between pages. What would be nice is a way to wrap a section in e.g. <div class="section"></div> so that the parser keeps that entire section together on one page, moving the whole thing onto a new page if it doesn’t fit. I am still looking into this and will update if I find a solution.
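One thing that may be worth trying for this, as an untested sketch rather than a confirmed fix: page-break-inside: avoid and page-break-before: always are standard CSS print properties, and WebKit-based renderers such as wkhtmltopdf are commonly reported to respect them, though results may vary between versions. The helper below wraps each chunk of html in a <div class="section"> and adds those rules to the page. PdfSectionHelper, WrapSections and the new-page class are hypothetical names made up for the example.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PdfSectionHelper
{
    // Standard CSS print rules. Whether the pdf renderer honours them
    // is an assumption that needs checking against your own output.
    public const string PageBreakCss =
        ".section { page-break-inside: avoid; } " +
        ".new-page { page-break-before: always; }";

    // Hypothetical helper: builds a full html document with the rules above
    // in the head and each section wrapped in <div class="section">.
    public static string WrapSections(IEnumerable<string> sections)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head><style>").Append(PageBreakCss).Append("</style></head><body>");
        foreach (string section in sections)
        {
            sb.Append("<div class=\"section\">").Append(section).Append("</div>");
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

Adding the new-page class to an element would be the way to request a hard break before it, under the same assumption.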