Generating Word documents from templates using Python

For one of my customers, every project I work on requires filling in Word templates with the same fixed information, for example:
– Project ID;
– Project name;
– Project description;
– Release date;
– etc.

The same information has to be provided in several documents: technical analysis, test book, test report, technical specifications, etc.

So I decided to automate the generation of the base documents (which I then complete by hand, of course) with a Python script and a simple approach:
– a template directory holding template documents that contain placeholders;
– a Python script that uses a dictionary, whose keys and values are the search and replace terms, to produce a new document from each template.

It is not hard once you know how, but keep in mind that I am talking about the newer .docx format for both the templates and the output files. A .docx file is basically zipped XML (or, more precisely, a ZIP archive containing a number of XML documents), unlike the old .doc format, which is a pain to work with.
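To see the "zipped XML" structure for yourself, the standard zipfile module is enough. The sketch below builds a tiny stand-in archive in memory (the part names follow the usual Word layout, but the content is made up and is not a valid Word file) and lists its parts; list_parts is a hypothetical helper, not part of my script.

```python
import io
import zipfile


def list_parts(docx_file):
    """Return the names of the parts stored inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(docx_file) as docx:
        return docx.namelist()


# Stand-in archive built in memory; a real template would be opened by path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:t>{{PID}}</w:t>")
    archive.writestr("word/header1.xml", "<w:t>{{PROJECT_TITLE}}</w:t>")
    archive.writestr("word/footer1.xml", "<w:t>{{DATE}}</w:t>")

parts = list_parts(buffer)
print(parts)
```

With a real file you would call list_parts with the path of the template; body, headers and footers each live in their own XML part.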

So basically the script needs to read all the .docx files in a directory and, for each one:
– unzip the .docx file into its individual parts;
– modify the relevant parts (header, body, footer);
– re-zip all the parts into a new .docx.

In the end I also decided to search and replace placeholders in the generated file name, so that I can use a file naming convention with variable parts.
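As a minimal sketch of the file name idea: the helper below (build_output_path is a hypothetical name, and the template name is invented for the example) applies the same dictionary to the file name. Replacing path separators inside values is my own assumption, added because a date such as 01/02/2024 would otherwise be read as nested directories.

```python
import os


def build_output_path(out_dir, template_name, values):
    """Replace placeholders in the template file name and join it to out_dir."""
    name = template_name
    for key, value in values.items():
        # Assumption: path separators inside a value are not wanted in a file name.
        safe = str(value).replace("/", "-").replace(os.sep, "-")
        name = name.replace(str(key), safe)
    return os.path.join(out_dir, name)


path = build_output_path(
    "/OutputDir",
    "{{PID}}_test_report_{{DATE}}.docx",  # hypothetical template file name
    {"{{PID}}": "P12345", "{{DATE}}": "01/02/2024"},
)
print(os.path.basename(path))
```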

This is the script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import datetime
import os
import zipfile
from xml.sax.saxutils import escape


def create_doc(template_file, out_file, replaceText):
    """Copy template_file to out_file, replacing the placeholders in every XML part."""
    outdir = os.path.dirname(out_file)
    if outdir and not os.path.exists(outdir):
        os.makedirs(outdir)
    with zipfile.ZipFile(template_file) as templateDocx, \
            zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as newDocx:
        for file in templateDocx.infolist():
            # read() returns bytes, so search and replace with bytes as well
            # (Word writes its XML parts as UTF-8)
            content = templateDocx.read(file)
            if file.filename.endswith('.xml'):
                for key, value in replaceText.items():
                    # escape() keeps values containing &, < or > valid XML
                    content = content.replace(str(key).encode('utf-8'),
                                              escape(str(value)).encode('utf-8'))
            newDocx.writestr(file.filename, content)


basepath_in = '/TemplateDir/'
basepath_out = '/OutputDir/'

# Placeholders (keys) and the values that replace them
replaceText = {
    "{{AUTHOR}}": 'Gianluca Nieri',
    "{{PID}}": 'P12345',
    "{{PROJECT_TITLE}}": 'Go to Mars',
    "{{PROJECT_AREA}}": 'Long shots',
    "{{PROJECT_DESCRIPTION}}": "Take a ship and go.",
    "{{DATE}}": datetime.datetime.today().strftime("%d/%m/%Y"),
    "{{STATUS}}": "FINAL",
}

for template_file in os.listdir(basepath_in):
    if template_file.endswith('.docx'):
        # Placeholders are also replaced in the output file name
        out_file = os.path.join(basepath_out, template_file)
        for key, value in replaceText.items():
            out_file = out_file.replace(str(key), str(value))
        # Pass the full template path, not just the bare file name
        create_doc(os.path.join(basepath_in, template_file), out_file, replaceText)

 

A few comments:
– basepath_in is the directory containing the .docx templates;
– basepath_out is the directory where the new files are created from the templates;
– replaceText is the dictionary with the placeholders as keys and the target values as values.

As you can see, it just works.

I wrote a plain top-down script that does what I need, not a class or a module: it is up to you to modify it as you prefer.

So generating Word documents from templates using Python is not so complicated, and the script is more than useful for me (and I hope for you too).

NOTE:

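If you insert a placeholder in the template file and it is not replaced at runtime, try cutting it and pasting it back as plain text: I noticed that Word very often inserts formatting codes in the middle of a placeholder, so it no longer appears as one contiguous string in the XML, and that breaks the script.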
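A quick way to spot this problem is to compare the raw XML of a part with the same text stripped of its tags. The sketch below does that with a regular expression; find_split_placeholders is a hypothetical helper and the sample XML is heavily simplified compared with what Word really writes.

```python
import re


def find_split_placeholders(xml_text, placeholders):
    """Return placeholders visible in the text but broken up by tags in the raw XML."""
    plain_text = re.sub(r"<[^>]+>", "", xml_text)
    return [p for p in placeholders if p in plain_text and p not in xml_text]


# Simplified sample: {{PID}} is split across three runs, {{AUTHOR}} is intact.
sample = (
    "<w:r><w:t>{{</w:t></w:r><w:r><w:t>PID</w:t></w:r><w:r><w:t>}}</w:t></w:r>"
    "<w:r><w:t>{{AUTHOR}}</w:t></w:r>"
)
broken = find_split_placeholders(sample, ["{{PID}}", "{{AUTHOR}}", "{{DATE}}"])
print(broken)  # ['{{PID}}']
```

Any placeholder this reports needs to be retyped or pasted as plain text in the template. Note that a placeholder occurring both intact and split in the same part would not be reported by this simple check.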