Using Spire.Doc to convert documents

Some time back, I wrote an article about my first thoughts on Spire.Doc for .NET. For those who are not familiar with the product, Spire.Doc for .NET is a professional Word .NET library designed for developers to create, read, write, convert and print Word document files from any .NET (C#, VB.NET, ASP.NET) platform with fast, high-quality performance. As an independent Word .NET component, Spire.Doc for .NET doesn’t need Microsoft Word to be installed on the machine, yet it lets developers build Microsoft Word document creation capabilities into their own .NET applications.

Background

This article demonstrates and reviews the capabilities Spire.Doc provides for converting documents from one format to another. We are long past the days when many developers would install Microsoft Office on the server to manipulate documents. First, it was a pretty bad design and practice. Second, Microsoft never intended Microsoft Office to be used as a server component, and it wasn’t built for interpreting and manipulating documents on the server side. This gave birth to the idea of libraries like Spire.Doc.

While we are on the subject, Office Open XML is worth mentioning. Office Open XML (also informally known as OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Microsoft announced in November 2005 that it would co-sponsor standardization of the new version of its XML-based formats through Ecma International, as “Office Open XML”. Open XML has brought more standardization to the structure of Office documents, and with the Open XML SDK developers can perform a lot of basic operations in a fairly straightforward way. There are still gaps, however, such as converting a Word document into a different format like PDF, image or HTML, to name a few. This is where libraries such as Spire.Doc come to the rescue of us developers.
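
To make the "zipped, XML-based" point concrete, here is a minimal sketch that lists the parts inside a .docx file using only the .NET base class library. It is not part of Spire.Doc or the demo project; the class name DocxInspector and the default file name are assumptions for illustration, and on the classic .NET Framework the project needs references to System.IO.Compression and System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxInspector
{
    // Returns the names of the parts stored inside an OOXML package.
    // A .docx file is an ordinary ZIP archive, so ZipFile can open it directly.
    public static string[] ListParts(string path)
    {
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            return archive.Entries.Select(entry => entry.FullName).ToArray();
        }
    }

    public static void Main(string[] args)
    {
        // Assumed default: the sample file name used later in this article.
        string path = args.Length > 0 ? args[0] : "This is a Test Document.docx";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        // Print one part name per line.
        foreach (string part in ListParts(path))
        {
            Console.WriteLine(part);
        }
    }
}
```

Run against a Word document, this typically prints entries such as [Content_Types].xml and word/document.xml, which is the raw material the Open XML SDK works with.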

Document Conversion

I will use the rest of this article to demonstrate various scenarios that Spire.Doc can cover. All the examples demonstrated in this article are available in the project at Spire.Doc Demo, which you can download to get your hands dirty. The project I have been using for the demonstration is a simple console application, but the library supports other platforms such as Web or Silverlight as well.

In their own words, Spire.Doc claims the following, which we will put to the test in the rest of the article.

“Spire.Doc for .NET enables converting Word documents to most common and popular formats.”

The first step to start using Spire.Doc is to add references in your project to the libraries Spire.Doc, Spire.License and Spire.Pdf, which are packaged in the Spire.Doc component.

You will need a valid Spire.Doc license to use the library; otherwise an evaluation warning is displayed on the document. To set the license, simply provide the path to the license file, and the library takes care of applying and validating the license information. There are other ways to load the license as well, such as retrieving it dynamically from a location or adding it as an embedded resource. Detailed documentation is available here.


 FileInfo licenseFile = new FileInfo(@"C:\ManasBhardwaj\license.lic");
 Spire.License.LicenseProvider.SetLicenseFile(licenseFile);

To validate the basic features, I am using a Word document containing simple text, an image and a table. You can find the original document in the Spire.Doc Demo.

The crux of the library is of course the Document class. So we start by creating the Document object and loading the document from the file. The beauty of the Document object is that with just three lines of code (create, load, save) you can convert quite a complex Word document, with different elements such as those used in this one, to a totally different format, in this case HTML.


//Create word document
Document document = new Document();
document.LoadFromFile(@"This is a Test Document.docx");

To Html


//Convert the file to HTML format
document.SaveToFile("Test.html", FileFormat.Html);

By now we should have the converted document ready for use. Let’s see what happened behind the scenes. You will observe that the new HTML document has been created along with additional files and a folder. These hold the additional information present in your Word document. In this case, the folder contains the test image we added to the document and the style sheet contains the styling for the table. Thus, the conversion not only converts your data but keeps additional information such as styling intact as well.

Changing just a single parameter converts the document to another format such as PDF, as shown below. What I like about this is that one framework handles multiple conversions without any additional styling or configuration per format. Note too that the conversion itself is done in memory within your process, so beyond reading the input and writing the output you don’t have to worry about file system rights and the like. I remember a past project where we wanted to do such a conversion and ended up passing the data from one component to another to produce the PDF, and still could not retain the same layout across different formats.

To Pdf


//Convert the file to PDF
document.SaveToFile("Test.Pdf", FileFormat.PDF);

A few lines of code and you have the PDF document. The license warning in it appears only because I am using the trial version. Once you have a valid license file, it will disappear.
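
If the output should not touch the disk at all, for example in a web application that streams the PDF back to the caller, the same conversion can be pointed at a stream. The sketch below assumes that Document offers a SaveToStream(Stream, FileFormat) overload mirroring SaveToFile; please check it against the version of Spire.Doc you have installed. The class and method names (InMemoryConversion, ConvertToPdfBytes) are made up for this illustration.

```csharp
using System;
using System.IO;
using Spire.Doc;

class InMemoryConversion
{
    // Loads a Word document and returns its PDF rendering as a byte array.
    // Assumption: Document.SaveToStream(Stream, FileFormat) exists and
    // behaves like SaveToFile, only writing to the given stream instead.
    static byte[] ConvertToPdfBytes(string docxPath)
    {
        Document document = new Document();
        document.LoadFromFile(docxPath);

        using (MemoryStream output = new MemoryStream())
        {
            document.SaveToStream(output, FileFormat.PDF);
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] pdf = ConvertToPdfBytes(@"This is a Test Document.docx");
        Console.WriteLine("PDF size in bytes: " + pdf.Length);
    }
}
```

The returned bytes can then be written to an HTTP response or stored in a database without the process ever needing write access to a folder.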

To Xml

A quick peek at the FileFormat enumeration shows that it supports as many as 24 different formats. My personal favorite is XML. It expands the possibilities of what you can do with the data in the document. For example, you can consume a Word document and create an XML file out of the raw document.


//Convert the file to Xml
document.SaveToFile("Test.Xml", FileFormat.Xml);
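
Since only the FileFormat argument changes between the conversions shown so far, they can be collapsed into one loop. This is a small sketch rather than code from the demo project: the class name MultiFormatConversion is invented, and it assumes the same Document instance can be saved several times in a row.

```csharp
using System;
using System.Collections.Generic;
using Spire.Doc;

class MultiFormatConversion
{
    static void Main()
    {
        // Load the source document once.
        Document document = new Document();
        document.LoadFromFile(@"This is a Test Document.docx");

        // Output file name mapped to the target format.
        var targets = new Dictionary<string, FileFormat>
        {
            { "Test.html", FileFormat.Html },
            { "Test.pdf", FileFormat.PDF },
            { "Test.xml", FileFormat.Xml }
        };

        // Assumption: the loaded document can be saved repeatedly.
        foreach (KeyValuePair<string, FileFormat> target in targets)
        {
            document.SaveToFile(target.Key, target.Value);
            Console.WriteLine("Saved " + target.Key);
        }
    }
}
```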

To Image

And what about converting the document to an image file? Spire.Doc can convert a document page to an Image object, which can then be saved in any ImageFormat supported by the .NET Framework.


//Save image file.
Image image = document.SaveToImages(0, ImageType.Metafile);
image.Save("Test.tif", System.Drawing.Imaging.ImageFormat.Tiff);
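
The snippet above renders only the first page (index 0). For a multi-page document you would probably want one image per page. The sketch below assumes that Document has a SaveToImages(ImageType) overload returning an Image array and that ImageType lives in the Spire.Doc.Documents namespace, as in the .NET Framework builds I am aware of; verify both against your version. The output file names are made up for the example.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using Spire.Doc;
using Spire.Doc.Documents; // assumed home of the ImageType enum

class AllPagesToImages
{
    static void Main()
    {
        Document document = new Document();
        document.LoadFromFile(@"This is a Test Document.docx");

        // Assumption: this overload renders every page and returns one Image per page.
        Image[] pages = document.SaveToImages(ImageType.Bitmap);

        for (int i = 0; i < pages.Length; i++)
        {
            // Hypothetical naming scheme: Test-page1.png, Test-page2.png, ...
            string fileName = string.Format("Test-page{0}.png", i + 1);
            pages[i].Save(fileName, ImageFormat.Png);
            pages[i].Dispose();
        }

        Console.WriteLine("Pages saved: " + pages.Length);
    }
}
```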

Conclusion

Spire.Doc is a very capable and easy-to-use product for converting Word documents to other formats. If you also use the reporting capability, then it’s even better. As with any third-party product, there are usually other ways to do the same thing, and you need to weigh the benefits against the cost of buying the product or of replicating the functionality another way.

In terms of licensing and pricing, it’s not very expensive compared to other products on the market offering the same functionality. Thus, real value for money in my opinion.

Disclosure of Material Connection: I received one or more of the products or services mentioned above for free in the hope that I would mention it on my blog. Regardless, I only recommend products or services I use personally and believe my readers will enjoy.
