Department for Business, Energy & Industrial Strategy (BEIS)

Energy Statistics Data Management Solutions

Incomplete applications: 11 (9 SME, 2 large)
Completed applications: 22 (19 SME, 3 large)

Important dates
Published Wednesday 2 January 2019
Deadline for asking questions Wednesday 9 January 2019 at 11:59pm GMT
Closing date for applications Wednesday 16 January 2019 at 11:59pm GMT

Overview

Summary of the work Work with BEIS to deliver a step change in the data management systems that underpin the critical evidence base of energy statistics. Design and implement optimal data and technical architecture, database models, governance, standards and processes to help us exploit the value of our data assets.
Latest start date Monday 11 February 2019
Expected contract length 3 months
Location London
Organisation the work is for Department for Business, Energy & Industrial Strategy (BEIS)
Budget range Maximum day rates of £800 p/d (including any expenses & VAT).
Expected total budget for ~3 months project is £180-200k, depending on team composition and skills.

About the work

Why the work is being done We need to deliver a step change to the data management and processing systems that support production of BEIS's energy statistics. This is the first phase of work to further understand requirements, develop solutions and deliver initial outcomes. We wish to find enthusiastic and skilled data and technical specialists to work alongside BEIS statisticians and system experts in a 3 month project (Feb-Apr 2019).
Problem to be solved Much of our data on energy is stored and manipulated in linked Excel spreadsheets that have grown organically over time. This is inefficient and carries risk of error. We wish to move to a modern data warehousing solution, with automated (yet transparent) reproducible processes for managing, assuring and analysing our data assets. We need to develop solid and scalable data standards, models, dictionaries, governance and architecture to enable high quality data management with solutions for permissioning, audit, revisions, versioning and linking.
Who the users are and what they need to do As a statistician I need to:
- spend less time processing data so I can spend more time using it and gaining insight.
- have harmonised data definitions, standards and governance so that I am confident in linking and using data to provide high quality outputs
- be able to easily discover and update my data, control access and audit revisions so that I can adhere to data security requirements and am confident of the 'truth' (one possible revision-tracking pattern is sketched after this list)
- have reproducible and transparent processes for validating and manipulating data so that I am confident of the quality of my statistics.
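
To make the audit, revision and versioning needs above concrete, the following is a minimal illustrative sketch (in Python, one of the languages available in the Department) of a revision-tracking pattern in which new figures are appended as numbered revisions rather than overwriting earlier values. The table, column and series names are hypothetical, and SQLite is used only so the sketch is self-contained; it is not a proposed design.

```python
# Illustrative only: append-only revision tracking for recorded figures.
# Table, column and series names are hypothetical; SQLite is used purely so
# the sketch is self-contained, not as a proposed technology choice.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observation (
        series_id   TEXT NOT NULL,     -- which data series the value belongs to
        period      TEXT NOT NULL,     -- e.g. '2018-Q3'
        value       REAL NOT NULL,
        revision    INTEGER NOT NULL,  -- 1 = first recorded figure, 2+ = revisions
        recorded_at TEXT NOT NULL,     -- audit trail: when the value was entered
        recorded_by TEXT NOT NULL,     -- audit trail: who entered it
        PRIMARY KEY (series_id, period, revision)
    )
""")

def record_value(series_id, period, value, user):
    """Store a new revision rather than overwriting, so history stays auditable."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM observation "
        "WHERE series_id = ? AND period = ?",
        (series_id, period),
    ).fetchone()
    conn.execute(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)",
        (series_id, period, value, latest + 1,
         datetime.now(timezone.utc).isoformat(), user),
    )

record_value("gas_supply_gwh", "2018-Q3", 1234.5, "statistician_a")
record_value("gas_supply_gwh", "2018-Q3", 1230.0, "statistician_b")  # a later revision
```
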
Early market engagement
Any work that’s already been done An internal scoping phase has been carried out to identify the problems we are aiming to address, the benefits to be realised, initial requirements and some small-scale testing of solutions. There is an existing cloud-based internal secure infrastructure in which new data warehousing and processing solutions can be developed. There is also similar work taking place in other government departments that we would want to learn from (Reproducible Analytical Pipelines).

We want to focus this phase of work on storing and handling data in a structured warehouse, e.g. using MS SQL Server Management Studio and R.
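
As a purely illustrative sketch of the 'structured warehouse' loading step described above, the snippet below reads a single spreadsheet return with pandas and lands it in a SQL Server staging table via SQLAlchemy. The file, server, database, schema and table names are hypothetical placeholders, the ODBC driver version is an assumption, and an equivalent approach in R (e.g. readxl with DBI/odbc) would be equally valid.

```python
# Illustrative sketch only: land one spreadsheet return in a staging table.
# File path, server, database, schema and table names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Read the raw return exactly as submitted; keep everything as text at the
# staging layer so that validation happens as an explicit later step.
raw = pd.read_excel("quarterly_gas_return_example.xlsx", sheet_name=0, dtype=str)

# SQL Server connection via ODBC (driver name/version is an assumption).
engine = create_engine(
    "mssql+pyodbc://@EXAMPLE-SERVER/EnergyStatsStaging"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)

# Append to a staging schema; downstream jobs would validate and transform
# the data into modelled warehouse tables.
raw.to_sql("stg_quarterly_gas_return", engine, schema="staging",
           if_exists="append", index=False)
```
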
Existing team You will be working closely with internal staff in the Analyst Directorate in BEIS, primarily the Energy Statistics team and the Data Analytics team, who have formed a joint working group for this project. Within Energy Stats there are ~15 individuals involved in the Energy Balances statistics (published monthly, quarterly and annually) who have the detailed knowledge of the data in scope and the business rules to be incorporated into the agreed solution. The Data Analytics team host the cloud-based data analysts system (CBAS) upon which development work is carried out, and include IT specialists.
Current phase Alpha

Work setup

Address where the work will take place The Analyst Directorate are located at 1 Victoria Street, London. It is expected that you will be co-located for at least 3 days per week throughout the duration of this project.
Working arrangements It is imperative that you work closely and collaboratively with the internal team. The knowledge and expertise around the data and statistics reside in individuals and internal documentation. Development work will need to be on the CBAS platform, which can only be accessed through BEIS IT systems.
Knowledge transfer into existing teams is a necessary outcome of this work.
We therefore expect you to be co-located with the team at least 3 days per week and to work collaboratively through digital tools. We do not expect to pay travel and subsistence expenses (these should be included in total costs/day rates).
Security clearance Basic security clearance is required (can be arranged although this could delay work starting).

Additional information

Additional terms and conditions

Skills and experience

Buyers will use the essential and nice-to-have skills and experience to help them evaluate suppliers’ technical competence.

Essential skills and experience
  • Previous experience of designing databases and new data and technical architecture solutions that meet user / business needs
  • Strong knowledge and understanding, and a track record of delivering automated data pipelines and producing robust and reproducible analysis
  • Experience in working with complex, messy & interlinked datasets
  • Previous experience of using MS SQL 2012 and above
  • Previous experience of using SQL Server Integration Services
  • Previous experience of using version control software and knowledge management solutions such as gitlab
  • Ability to work closely and collaboratively with internal data and technical experts
  • Ability to clearly communicate key criteria, trade-offs and risks to underpin business and development decisions
  • Ability to transfer skills and documentation for business-as-usual operations and future development
  • Previous experience of using scripting languages such as Python and R
  • Previous experience of MS Excel and MS Access 2010 or later
Nice-to-have skills and experience
  • A track record of delivering data management systems for government
  • Knowledge of energy data
  • Broader technical skills around JavaScript and PHP
  • Ability to train and embed knowledge and skills into internal teams
  • Knowledge of using geographic or spatial data with databases
  • Previous experience of Linux OS, such as Ubuntu 16 (or later) or similar
  • Previous experience of using R
  • Previous experience of working with PostgreSQL or another open source SQL database

How suppliers will be evaluated

How many suppliers to evaluate 3
Proposal criteria
  • Approach and methodology
  • Team composition and skills
  • Technical solution
  • Time frame
  • Value for money
  • Knowledge transfer
  • Project management
  • Risk management
Cultural fit criteria
  • Work as part of the Energy Stats and Data Analytics team
  • Be transparent and collaborative when designing solutions and making decisions
  • Take responsibility for driving and delivering work
  • Ensure knowledge sharing and transfer into BEIS
  • Work with data experts with a range of technical expertise
Payment approach Capped time and materials
Assessment methods
  • Written proposal
  • Case study
  • Work history
  • Reference
Evaluation weighting
Technical competence 50%
Cultural fit 20%
Price 30%

Questions asked by suppliers

Supplier question Buyer answer
1. Are you happy to house your data using Microsoft Access (and SharePoint) or are you considering moving to an Oracle based solution? We would prefer a solution which is not MS Access or Oracle based. Our preference is for an MS SQL or similar open source solution for data storage.
2. Is this solution for internal/departmental use only or would you be looking for a public facing interface as well? Initially the solution is only for internal use within BEIS. There are existing methods in the Dept for providing publicly available outputs (e.g. apps, visualisations, reports) which the solution should be able to feed, and similarly it should be possible to integrate the solution with a public facing interface (e.g. dissemination/publishing function) in future, but creation of a public interface is not within the scope of this piece of work.
3. How many users are intending to use the system, and would you be looking for data administration services following the implementation? The primary customer is around 15 statisticians in the Energy Stats team in BEIS, with around a further 30 analysts who will either use the data outputs or look to scale up the solution to their areas.
The solution should be fully transparent and able to be maintained, updated and extended by the lead statisticians.
4. Is the £800 per day an absolute requirement? If a supplier could fulfil the project requirement with fewer, more highly skilled individuals and deliver below or significantly below the budget, would the department consider lifting the day rate cap? Yes, this would be considered if the proposal fully justifies it.
5. Is the £800 per day rate an average across all roles or an absolute cap on all roles? It is not an absolute cap if value for money and justification can be provided for other day rates and if the total is within budgetary constraints.
6. Was the initial scoping work described in the tender listing completed using an internal team or supported using an external supplier? This was completed by an internal team.
7. Please can you confirm the cloud platform that is used to host CBAS? We don’t see the specific identity of the cloud provider as relevant; however, the following may help: the CBAS remote virtualised cloud infrastructure is based on Microsoft Hyper-V. It provides Infrastructure as a Service with server VM instances (running Windows 2012 R2 and Ubuntu 16 LTS) and client VM instances (running Windows 10) on a local high-bandwidth LAN. Users connect to the client VM desktop over the internet. Some technologies on the CBAS environment are MS SQL Server, PostgreSQL, Python, R, SPSS, SAS and STATA. This should not be taken as a constraint to your proposal or any suggested implementation.
8. Can you provide an outline of the 'existing cloud' infrastructure that is available for this project? The CBAS remote virtualised cloud infrastructure is based on Microsoft Hyper-V. It provides Infrastructure as a Service with server VM instances (running Windows 2012 R2 and Ubuntu 16 LTS) and client VM instances (running Windows 10) on a local high-bandwidth LAN. Users connect to the client VM desktop over the internet. Some technologies on the CBAS environment are MS SQL Server, PostgreSQL, Python, R, SPSS, SAS and STATA. This should not be taken as a constraint to your proposal or any suggested implementation.
9. Who is the cloud provider? We don’t see the specific identity of the cloud provider as relevant.
The CBAS remote virtualised cloud infrastructure is based on Microsoft Hyper-V. It provides Infrastructure as a Service with server VM instances (running Windows 2012 R2 and Ubuntu 16 LTS) and client VM instances (running Windows 10) on a local high-bandwidth LAN. Users connect to the client VM desktop over the internet. Some of the core technologies that we use are MS SQL Server, PostgreSQL, Python, R, SPSS, SAS and STATA. Note that the above should not be taken as a constraint to your proposal or any suggested implementation.
10. What is the core database technology, and is the department using Database as a Service? Our primary relational database technology of choice is MS SQL Server, with PostgreSQL as a second choice if the primary is inappropriate for any reason. Note that the above should not be taken as a necessary constraint to your proposal or any suggested implementation.
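
Illustrative only: one way to keep the choice between MS SQL Server and PostgreSQL a configuration detail rather than something baked into pipeline code. The environment variable name and connection strings below are hypothetical placeholders, not details of the CBAS environment.

```python
# Illustrative sketch: select the database backend from configuration so that
# MS SQL Server (preferred) or PostgreSQL (fallback) can be swapped without
# changing pipeline code. All connection details are hypothetical.
import os
from sqlalchemy import create_engine

DATABASE_URLS = {
    "mssql": "mssql+pyodbc://@EXAMPLE-SERVER/EnergyStats"
             "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes",
    "postgres": "postgresql+psycopg2://energy_stats@localhost/energy_stats",
}

backend = os.environ.get("ENERGY_STATS_DB", "mssql")  # hypothetical setting
engine = create_engine(DATABASE_URLS[backend])
```
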
11. Is your given max day rate of £800 including VAT correct? Should it read £800 / day excluding VAT? The figure of £800 is a maximum day rate that would need to include VAT and all expenses. The intention is to remain within total project budget including these elements.
12. Is this role outside IR35 or inside IR35? The Department requires evidence that the person or company is registered to pay tax. Applications can be made by either a registered person(s) or a company.
13. Can the department share the "scoping phase" document mentioned in the tender listing? This was an internal business case document that has not been published. However, relevant information has been placed into a document which is available at this link: https://docs.google.com/document/d/1Vhz7g5EnNSmRxDDfWRdeXacAbiCkYgj9M_9LYryLqTo/edit?usp=sharing

Note that the scoping phase covered a high level view of a longer term programme of work, much of which is out of scope for this specific set of requirements.
14. What is the nature of the geospatial data currently envisaged? Everything happens somewhere - therefore most of the data we collect and report on has some geospatial attributes, e.g. country, region, local authority. Standard geographic codes (as set out by the ONS/GSS) are used to represent this metadata.
Some of our datasets have additional dimensions such as the location of the site (e.g. power station), which could be address or postcode based and may not follow a standardised format/code.
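
For illustration: standard ONS/GSS geography codes follow a nine-character shape (a letter followed by eight digits, e.g. E92000001 for England), so even a minimal format check can separate them from free-text locations such as postcodes. The sketch below shows such a check; real validation would look codes up against the ONS register rather than just testing the pattern.

```python
# Illustrative only: a format check for ONS/GSS geography codes
# (one letter followed by eight digits, e.g. E92000001 for England).
import re

GSS_CODE_PATTERN = re.compile(r"^[A-Z]\d{8}$")

def looks_like_gss_code(code: str) -> bool:
    """Return True if the string matches the nine-character GSS code shape."""
    return bool(GSS_CODE_PATTERN.match(code))

assert looks_like_gss_code("E92000001")      # England
assert not looks_like_gss_code("SW1A 1AA")   # a postcode, not a GSS code
```
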
15. Does any of the work relate to remote devices and installations? If by this, you mean receiving data directly from such devices, then no it does not.
16. What databases or applications does BEIS currently use e.g. Access? We currently have a range of different databases and analytical software available and used by analysts including SAS, MS SQL, Access, Excel, SPSS, R, Python etc.
The focus of this work is data that is mainly currently stored and managed in Excel, with some processing via macros and VBA. There is very limited use of Access.
17. How much progress has been made ontologically? With respect to the development of a data ontology, we are at the initial stages. There is a standard for energy products (https://unstats.un.org/unsd/classifications/Family/Detail/2007) but not as yet for flows (e.g. imports, exports, production) or consumption. We are considering the adoption of the data ontology used by international bodies (e.g. the European Union or the International Energy Agency). The EU are in the final stages of standardising their SDMX protocol.
These may need to be adapted to meet specific requirements on UK reporting.
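
Purely as an illustration of the dimensional approach discussed above, the sketch below shows one possible 'tidy' record shape for an energy observation keyed by product, flow, geography and period. The field names and example codes are hypothetical and do not represent an agreed ontology or a real SDMX data structure definition.

```python
# Illustrative only: a hypothetical record shape for a single energy
# observation, keyed by the kinds of dimensions discussed above.
from dataclasses import dataclass

@dataclass(frozen=True)
class EnergyObservation:
    product: str    # e.g. a code from the UN standard for energy products
    flow: str       # e.g. 'production', 'imports', 'exports', 'consumption'
    geography: str  # e.g. an ONS/GSS geography code
    period: str     # e.g. '2018', '2018-Q3', '2018-09'
    value: float
    unit: str       # e.g. 'GWh', 'ktoe'

obs = EnergyObservation(product="natural_gas", flow="imports",
                        geography="E92000001", period="2018-Q3",
                        value=1234.5, unit="GWh")
```
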
18. What is the volume and condition of the data currently in scope? What is the source of this data? There are forty data collections, collected on a monthly, quarterly and annual basis, covering ~200 of the larger energy companies in the UK. Archive copies of the questionnaires are available at https://webarchive.nationalarchives.gov.uk/20161207232641/https://www.gov.uk/government/publications/statistical-surveys.
Data complexity varies from a few to several hundred data points (averaging <50 per collection). Data are held mainly in spreadsheets, with a small number of SAS and MS Access databases. The total volume of data involved is less than 1TB, including historical records.
Not all of the 40 data collections are in scope for this project. We will discuss with the successful contractor which to prioritise.
19. What is the volume / frequency of source data to be processed? There are forty data collections, collected on a monthly, quarterly and annual basis, covering ~200 of the larger energy companies in the UK. Archive copies of the questionnaires are available at https://webarchive.nationalarchives.gov.uk/20161207232641/https://www.gov.uk/government/publications/statistical-surveys.
Data complexity varies from a few to several hundred data points (averaging <50 per collection). Data are held mainly in spreadsheets, with a small number of SAS and MS Access databases. The total volume of data involved is less than 1TB, including historical records.
Not all of the 40 data collections are in scope for this project. We will discuss with the successful contractor which to prioritise.
20. How many different data sources are there? There are forty collections, although not all are in scope - we can agree prioritisation with the successful supplier. E.g.:
Quarterly gas return: ~30 spreadsheets from suppliers which are linked to other spreadsheets to aggregate and provide contextual information. A simple return with 50-100 data points p/m.
Monthly oil stocks from ~50 companies, which includes cross-referencing against other company data using linking, VBA and macros (one way this step could be scripted is sketched after this list).
Monthly DORS returns, linked to AccessDB and spreadsheets with VBA and macros for aggregation and checking; complex checks and balances and several hundred data points.
Quarterly renewable electricity; compiled by a contractor from surveys, administrative returns and estimations.
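
Illustrative only: the cross-referencing described for the oil stocks example, currently done with spreadsheet links, VBA and macros, could in principle be reproduced as a transparent scripted step, for example with a pandas join. All file names, column names and the tolerance below are hypothetical placeholders.

```python
# Illustrative sketch: cross-reference monthly returns against company
# reference data and flag discrepancies, instead of checking by eye across
# linked workbooks. All names and the tolerance are hypothetical.
import pandas as pd

returns = pd.read_excel("monthly_oil_stocks_returns_example.xlsx",
                        dtype={"company_id": str})
reference = pd.read_excel("company_reference_data_example.xlsx",
                          dtype={"company_id": str})

checked = returns.merge(reference, on="company_id", how="left",
                        suffixes=("", "_reference"))
checked["stocks_discrepancy"] = (
    (checked["reported_stocks_kt"] - checked["expected_stocks_kt"]).abs() > 1.0
)
print(checked.loc[checked["stocks_discrepancy"],
                  ["company_id", "reported_stocks_kt", "expected_stocks_kt"]])
```
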
21. What tools are currently used to analyse/visualise the data? For the data in scope of this project we predominantly use Excel, with some VBA, macros and linking across spreadsheets. Visualisation is usually through Excel charting functions, or GIS mapping tools (ArcGIS or QGIS). We have some experimental prototypes using R Shiny and Java for some of our data and are keen to explore this further. We are open to other suggestions for ways to carry out analysis and visualisations using software available within the Dept.
22. Are you expecting to continue using the same tools as part of the solution? We are open to recommendations that will meet our requirements, which can be managed and developed by the stats team in future and which are compatible with available BEIS software.
23. How will the project be managed? Will it be run as an agile project? Suppliers should recommend an appropriate project management approach that fits with the way they will tackle the requirements. An agile-style approach would seem to be suitable.
24. What scope is there for 3rd party software? There is a large range of software already available on the CBAS platform (see other responses). To add software to our system it would need to pass our software acceptance criteria and testing. To give a firmer view, we would need you to identify the specific software and its required operating system/platform.
25. Have information requirements been established? This question is somewhat unclear and may benefit from clarification. We have clear output requirements which relate to maintaining our published statistics and we know the data we receive from suppliers. Business rules for translating from inputs to outputs are documented in part - this varies by data series. The internal scoping phase identified high level (statistical) user requirements. More detailed requirements are likely to need to be established.
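
As a purely illustrative example of how a documented business rule could be captured as an explicit, testable check rather than an implicit spreadsheet formula, the sketch below encodes a hypothetical rule that reported components should sum to the reported total within a tolerance. The rule and tolerance are invented for illustration and are not real BEIS business rules.

```python
# Illustrative only: a hypothetical business rule expressed as a testable
# check (reported components should sum to the reported total).
def components_sum_to_total(total: float, components: list[float],
                            tolerance: float = 0.5) -> bool:
    """Return True if the reported components add up to the reported total."""
    return abs(sum(components) - total) <= tolerance

# Example: a return reporting a total of 100.0 split into three components.
assert components_sum_to_total(100.0, [40.0, 35.0, 25.0])
assert not components_sum_to_total(100.0, [40.0, 35.0, 20.0])
```
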
26. How many spreadsheets are in scope, and of what size? There are forty data collections on a monthly, quarterly and annual basis covering ~200 of the larger energy companies in the UK. E.g.:
Quarterly gas return: ~30 spreadsheets linked to other spreadsheets to aggregate and provide contextual information. A simple return with 50-100 data points p/m.
Monthly oil stocks from ~50 companies, which includes cross-referencing against other company data using linking, VBA and macros.
Monthly DORS returns with complex checks and balances and several hundred data points.
Quarterly renewable electricity; compiled by a contractor from surveys, administrative returns and estimations.
Total volume of data involved is less than 1TB, including historical records.
27. How many spreadsheets are currently used to store the existing data? There are forty data collections, collected on a monthly, quarterly and annual basis, covering ~200 of the larger energy companies in the UK. Archive copies of the questionnaires are available at https://webarchive.nationalarchives.gov.uk/20161207232641/https://www.gov.uk/government/publications/statistical-surveys
Data complexity varies from a few to several hundred data points (averaging <50 per collection). Data are held mainly in spreadsheets, with a small number of SAS and MS Access databases. The total volume of data involved is less than 1TB, including historical records.
Not all of the 40 data collections are in scope for this project. We will discuss with the successful contractor which to prioritise.