UK departmental spending over GBP 25000

This repository contains scripts to acquire, clean and process the spending information released by the UK central government.

ETL stages

The pipeline is split into stages, which need to be run in the following order:

build_index - finds all related metadata (datasets tagged spend-transactions) on data.gov.uk
retrieve - tries to fetch all the files listed in that metadata
extract - attempts to parse CSV/XLS/... files and load them into a database
scan_columns - does some initial processing needed by the later stages
map_columns - hands interpretation of the column names over to the user
condense - tries to establish a common column schema
format - tries to munge numbers and dates into a consistent form
suppliers - queries opencorporates.org to resolve supplier names
export - writes the result to a CSV file
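The format stage is the most mechanical of these steps. The minimal Python sketch below illustrates the kind of munging it involves; it is not the repository's code. The function names parse_amount and parse_date and the list of accepted date layouts are hypothetical, and the sketch assumes day-first (UK) dates and amounts written with £/GBP markers, thousands separators, or accounting-style brackets for negative values.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical set of date layouts; real spending files may use others.
# Day-first order is assumed because the data is published in the UK.
DATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y", "%Y-%m-%d", "%d-%b-%Y", "%d %B %Y")


def parse_amount(raw: str) -> Optional[Decimal]:
    """Turn strings like '£25,000.00', 'GBP 1,234' or '(500)' into a Decimal."""
    text = raw.strip()
    # Accounting style: a bracketed amount is negative.
    negative = text.startswith("(") and text.endswith(")")
    # Drop currency markers, thousands separators, brackets and whitespace.
    cleaned = re.sub(r"(GBP|£|,|\(|\)|\s)", "", text, flags=re.IGNORECASE)
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None  # not a number at all, e.g. 'n/a'
    return -value if negative else value


def parse_date(raw: str) -> Optional[date]:
    """Try each known layout in turn and return the first one that fits."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Returning None instead of raising lets a caller count or log unparseable cells rather than abort a whole file.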

Running the scripts

Some of the scripts are written as tests, so they are run with nosetests. Adding -v prints the name of each stage as it runs, -x stops at the first error, and --with-xunit writes an XUnit-style XML report (nosetests.xml by default).

The scripts run this way are build_index, retrieve, extract, condense and format.

The other scripts can be run directly with the Python interpreter.
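A small driver can run every stage in the order listed under ETL stages. This is a hypothetical sketch, not part of the repository: it assumes each stage lives in a file named <stage>.py in the current directory (which may not hold for every stage), runs the nose-based stages with the flags described above plus nose's --xunit-file option so that each stage gets its own report, and stops at the first stage that fails.

```python
import subprocess
import sys
from typing import List, Sequence

# Stage order as given in the ETL stages list.
STAGES = [
    "build_index", "retrieve", "extract", "scan_columns", "map_columns",
    "condense", "format", "suppliers", "export",
]

# Stages that are written as nose tests, per "Running the scripts".
NOSE_STAGES = {"build_index", "retrieve", "extract", "condense", "format"}


def command_for(stage: str) -> List[str]:
    """Build the command line for one stage (assumes the file is <stage>.py)."""
    script = f"{stage}.py"
    if stage in NOSE_STAGES:
        # -v names each stage, -x stops at the first error, and
        # --with-xunit plus --xunit-file writes a per-stage XML report.
        return ["nosetests", "-v", "-x", "--with-xunit",
                f"--xunit-file={stage}.xml", script]
    # Everything else is a plain script run with the current interpreter.
    return [sys.executable, script]


def run_pipeline(stages: Sequence[str] = STAGES) -> None:
    """Run the stages in order, stopping at the first one that fails."""
    for stage in stages:
        cmd = command_for(stage)
        print("==>", " ".join(cmd))
        # stdin is inherited, so the interactive map_columns stage still works.
        subprocess.run(cmd, check=True)
```

Call run_pipeline() from the repository root, or pass a shorter list such as run_pipeline(["condense", "format"]) to resume part-way through. With check=True a failing stage raises subprocess.CalledProcessError, which carries the spirit of -x across stages.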

Open Issues

?

Punted

PDFs
Zip files containing multiple CSVs
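Both punted formats can be recognised from the downloaded bytes before any parsing is attempted. The minimal Python sketch below is hypothetical (punted_reason is not a function in this repository): it looks for the %PDF- signature and, for ZIP archives, only flags those that contain .csv members, since XLSX and ODS spreadsheets are ZIP containers too.

```python
import io
import zipfile
from typing import Optional


def punted_reason(payload: bytes) -> Optional[str]:
    """Return why a downloaded file would be punted, or None if it can be parsed."""
    # PDF files start with the ASCII signature "%PDF-".
    if payload.startswith(b"%PDF-"):
        return "pdf"
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            names = archive.namelist()
        # XLSX/ODS files are ZIP containers as well, so only flag archives of CSVs.
        if any(name.lower().endswith(".csv") for name in names):
            return "zip of csv files"
    return None
```

Checking the bytes rather than the file name is a deliberate choice: a download's extension or Content-Type may not match its contents.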

