Recently I came across a project where one collaborator was generating CSV files that would then be used to produce a static minisite with an ElasticSearch-powered search. I decided to use Github Pages and Travis-CI for two main reasons:
- a simple `git push` would let anyone update the site at any time
- Travis-CI could take care of static file generation and search indexing whenever a new set of CSVs was deployed
If you have a similar need, read on. This post is informed by many blog posts around the web (see footnotes 1-4).
NOTE: This is not a recommendation on how to produce static sites; there are better tools out there for that. Check out StaticSiteGenerators if you need a proper system for static site generation. This tutorial is more like the README I wish I had found on the web while looking to solve this particular problem.
About Travis-CI and Github Pages
Github Pages is a quick and easy way to host static websites (you do need to know git and have a Github account, but you already do, right?).
Travis-CI is a service that lets you trigger arbitrary code whenever you push changes to a code repository. The most popular use is code-testing. We will use the free version that requires your repository to be public. Take this into account if you require your code to be private.
In this example we will use Travis-CI to execute some Python code in the repository which takes care of indexing, static file generation and repository updates. The project in question has two branches: the mandatory `gh-pages` branch, which Github will use for hosting the static site, and a `csv` branch which will receive the latest CSVs. The `gh-pages` branch will be updated every time a new `push` arrives in the `csv` branch.
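If you are starting from scratch, the two branches can be created with plain git. This is a hypothetical one-time local setup: the branch names come from this post, but the folder, file and identity values are purely illustrative.

```shell
# Hypothetical one-time setup (run locally, once).
mkdir demo-site && cd demo-site
git init
git config user.name "example" && git config user.email "example@example.com"

# The csv branch receives the raw data files
git checkout -b csv
mkdir csv && touch csv/data.csv
git add . && git commit -m "first csv drop"

# The gh-pages branch is what Github Pages will serve
git checkout -b gh-pages
git branch
```

After this, pushing both branches to Github and enabling Pages on the repository gives you the layout the rest of the post assumes.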
Suppose the project has a basic structure like this:
```
.travis.yml
index.html
csv/
    data.csv
    static.txt
python/
    indexer.py
    build.sh
    requirements.txt
javascripts/...
images/...
css/...
```
`indexer.py` will update the ElasticSearch index using `data.csv`, and also generate `static.txt` as a sort of pre-caching of the site. This may sound a little roundabout, but bear with me: this structure was actually useful in our case. I will eventually publish the final site.
You will also notice a `python/build.sh` file. This file contains the steps you would use to create the index and static file manually. It is basically the list of UNIX commands you would type in your terminal to do the process yourself, only that you want Travis-CI to do it for you (magic!).
Set up permissions
Your Github account needs to allow Travis-CI some operations in your repositories.
NOTE: Make sure you consult others on security. I am not an expert on this subject. Refer to the footnotes for more details. I will just cover the basics.
In Github:
- Click on your avatar in the top-right and select Settings > Personal access tokens
- Generate a new token with these permissions: `user:email`, `read:org`, `repo_deployment`, `repo:status`, `write:repo_hook`, `public_repo`
- IMPORTANT: Save the token somewhere you can easily retrieve it, because Github shows it only once
In Travis-CI:
- Install the Travis Ruby gem on your machine and log in:

```
gem install travis
travis login
```
- Go to your Travis-CI profile and turn on the repository you want to activate
- Click the little gear icon to access the settings for that repo
- Add any environment variables that your scripts use, such as the URL to your ElasticSearch service or the `path/to/some/file` in the repository
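Inside any Python run by Travis-CI, these settings surface as ordinary environment variables. A minimal sketch of reading them, assuming hypothetical variable names `ES_URL` and `BASEPATH` (the defaults exist only so the snippet also runs on a local machine):

```python
import os

# In production these would be set in the Travis-CI settings panel;
# the fallbacks here only make the snippet runnable locally.
es_url = os.environ.get("ES_URL", "http://localhost:9200")
basepath = os.environ.get("BASEPATH", "./csv/")

print("indexing into %s, reading CSVs from %s" % (es_url, basepath))
```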
Let’s look at a trimmed-down (useless) version of the Python `indexer.py` file (the `# comments` in the code will clarify the main parts):
```python
#!/usr/bin/python
import csv
import os
# elasticsearch and elasticsearch_dsl are not in python by default
from elasticsearch import Elasticsearch
from elasticsearch import helpers
from elasticsearch_dsl import connections, Index, DocType, Nested, String, GeoPoint, Integer

...

# now an example function that uses an environment variable
def process_csv(filename):
    basepath = os.environ['BASEPATH']
    readpath = basepath + filename
    print "loaded " + readpath
    response = open(readpath)
    reader = csv.DictReader(response)
    # down here some code to index
    # also produce some flat static.txt file
    ...

# note the presence of prints
def main():
    process_csv("data.csv")

if __name__ == "__main__":
    main()
```
The file has two purposes:
- index the CSV in ElasticSearch
- produce a `csv/static.txt`
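To make the elided parts less abstract, here is one possible shape for those two steps: turn each CSV row into a bulk action for elasticsearch-py's `helpers.bulk`, and build the flat text lines alongside. The index name, column names and output format are all made up for the example, and the actual `helpers.bulk(es, actions)` call is left out so the sketch runs without a live cluster.

```python
import csv
import io

# Stand-in for csv/data.csv; the real columns would differ.
CSV_DATA = "name,city\nAda,London\nAlan,Wilmslow\n"

def build_actions(rows, index_name="minisite"):
    # One bulk-index action per CSV row, in the shape helpers.bulk() expects.
    return [{"_index": index_name, "_source": dict(row)} for row in rows]

actions = build_actions(csv.DictReader(io.StringIO(CSV_DATA)))
# helpers.bulk(es, actions) would go here with a real Elasticsearch client.

# The "pre-cached" flat file: one plain-text line per record.
static_lines = ["%(name)s - %(city)s" % a["_source"] for a in actions]
print("\n".join(static_lines))
```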
The post-deploy script
You may have noticed a `requirements.txt` file above. Vanilla Python in Travis-CI does not have every module by default; we need this file to tell Travis what to install once the repository is deployed. You can add as many modules as you want, these are just examples:
```
# requirements.txt
elasticsearch==1.7.0
elasticsearch-dsl==0.0.8
```
Now let’s look at the `build.sh` file. This is where the magic happens! This is also where the token we created above enters the scene. We will encrypt it in a minute.
But first:
Travis-CI requires a `.travis.yml` file (you might have noticed it in the root folder, next to `index.html`) that describes what happens once a new deploy is detected. Let’s start with the basic structure (once again, the `# comments` will clarify):
```yaml
language: python
branches:
  except:
    - gh-pages # pushes to this branch will be ignored
  only:
    - csv # pushes to this branch will activate travis-ci
python:
  - '2.7' # the python version required
install:
  - pip install -r ./python/requirements.txt # to install the needed extra modules
```
Once we have that file, we can add the encrypted Github token using the following command:
```
travis encrypt GH_TOKEN="whatever_github_generated" --add
```
The `--add` flag will append the encrypted string to the `.travis.yml` file like so:
```yaml
language: python
branches:
  except:
    - gh-pages
  only:
    - csv
python:
  - '2.7'
install:
  - pip install -r ./python/requirements.txt
env:
  global:
    - secure: encrypted-stuff-here
```
Note the new `env > global > secure` structure in the above snippet, where the Travis-CI command-line program inserted some `encrypted-stuff-here` automatically.
This creates a new environment variable named `GH_TOKEN` available to any scripts run by Travis-CI (similar to adding the variable in the settings panel, but in a more secure way). We will also add a variable for the repository name. You may want to encrypt it too, but I will leave it in plain text for example purposes:
```yaml
env:
  global:
    - GH_REPO="myaccount/myrepo"
    - secure: encrypted-stuff-here
```
Now we need to create the build script itself, `python/build.sh`. The steps are:
- `clone` the repository to a new folder (I had to do it this way because the scope in Travis-initiated processes seems to be limited to a single branch and I was not able to pull/push to other branches)
- `checkout` and `pull` the latest code in the `csv` branch
- `checkout` and `merge` the code into the `gh-pages` branch
- run the indexing and output new static files
- `add` the new files and create a new `commit`
- `push` the result to `gh-pages`
Below is a condensed version of this script. The `echo`s (and all other terminal-visible commands in your scripts, such as `print`) will be visible in the Travis console, so you can debug what may be going wrong:
```bash
#!/bin/bash
export REPO_URL="https://$GH_TOKEN@github.com/$GH_REPO.git"

git config --global user.name "travis-bot"
git config --global user.email "travis"

echo "Clone to the new folder"
git clone $REPO_URL _cloned
cd _cloned

echo "Getting csv branch"
git checkout csv

echo "Pulling"
git pull origin csv

echo "Checkout of gh-pages"
git checkout -b gh-pages origin/gh-pages

echo "Merging"
git merge csv -m "merge from travis-ci"

echo "Run the index"
python ./python/indexer.py

echo "Checking status"
git status

echo "Adding new files in /csv folder"
git add csv
git commit -m "new deploy from travis-ci"

echo "Push"
git push origin gh-pages
```
Now we need to tell Travis-CI to add executable permissions to this file and to run it as part of the build lifecycle. We do that by adding `before_install` and `script` sections to `.travis.yml`. The end result looks like:
```yaml
language: python
branches:
  except:
    - gh-pages
  only:
    - csv
python:
  - '2.7'
before_install:
  - chmod 755 ./python/build.sh
install:
  - pip install -r ./python/requirements.txt
script:
  - "./python/build.sh"
env:
  global:
    - GH_REPO="account/repo"
    - secure: travis-generated-stuff
```
And voilà! Once these files are added to the repository, upon the next push to the `csv` branch Travis-CI will trigger the scripts and update both the data in ElasticSearch and the Github Pages website.
Hope this is useful to you, and do contact me if there are any glaring issues/omissions in this quick example. Special thanks to @auremoser for her feedback while writing this text.