Awarded to Mobilise Cloud Services Ltd

Start date: Monday 8 November 2021
Value: £84,850
Company size: SME
The National Archives

UK Government Web Archive production database (GWDB) replacement project

6 Incomplete applications

5 SME, 1 large

15 Completed applications

14 SME, 1 large

Important dates

Published
Thursday 2 September 2021
Deadline for asking questions
Thursday 9 September 2021 at 11:59pm GMT
Closing date for applications
Thursday 16 September 2021 at 11:59pm GMT

Overview

Off-payroll (IR35) determination
Supply of resource: the off-payroll rules will apply to any workers engaged through a qualifying intermediary, such as their own limited company
Summary of the work
The UK Government Web Archive requires a supplier to design, develop and deliver a replacement data management system for key website and crawl data, using AWS cloud-based technologies. Data in the existing database must be mapped, merged and migrated into the proposed solution.
Latest start date
Friday 1 October 2021
Expected contract length
4-5 months (to include a support period, estimated 4 weeks)
Location
No specific location, for example they can work remotely
Organisation the work is for
The National Archives
Budget range
Up to £100,000. Suppliers are requested to provide their rate cards with their submissions. Budget based on rates in the region of £750-£995/day (max.)

About the work

Why the work is being done
The National Archives’ Web Archiving team (UKGWA) use a proprietary, legacy SQL database (called GWDB) for the data management of critical website and crawl data for the UK Government Web Archive (https://www.nationalarchives.gov.uk/webarchive/). The system is buggy and difficult to use. GWDB is not flexible, and the team have to find workarounds for issues (including holding data in Excel sheets) rather than adapt the system to meet their needs. The existing system runs on old infrastructure that is not well supported, and there is a lack of documentation and institutional knowledge about the system.
The Web Archiving team need a new, flexible data management system that can be supported and maintained by the team, avoiding future technical debt.
Problem to be solved
We require a multidisciplinary team to develop a replacement database system to manage the website and crawl data. The supplier will need to produce requirements and a data model to support the required processes and reports. The supplier will design, build, test, and implement a cloud-based data system that meets the needs of UKGWA.

The UKGWA team use the current system to track all websites included in the web archive. Data is sent via XML files to a third party to initiate crawls of the websites. The data is well understood by the UKGWA team (although the data model is not), and established processes and data flows must be maintained to ensure continuity of service. The new system must hold most data from the current GWDB system and support many of its functions.
The team requires a new system, including a new content model and data quality/validation rules. It must have a user-friendly, intuitive front-end that enables efficient data input and management. A search/reporting facility is required that allows users to query, filter and export data. Reports/results lists must be configurable and exportable to standard formats.
The system must be hosted in AWS and will be supported by the UKGWA team.
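For illustration only, a data quality/validation rule of the kind mentioned above might look something like the following Python sketch; the field names and constraints are assumptions to be confirmed during the requirements work, not part of the specification.

    # Illustrative sketch only: hypothetical validation rules for a crawl record.
    # Field names and constraints are assumptions, not the agreed data model.
    from urllib.parse import urlparse

    def validate_crawl_record(record: dict) -> list:
        """Return a list of validation errors for a single crawl record."""
        errors = []

        # The homepage URL must be an absolute http(s) URL.
        parsed = urlparse(record.get("homepage_url", ""))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("homepage_url must be an absolute http(s) URL")

        # Crawl frequency is expressed in whole months and must be positive.
        frequency = record.get("crawl_frequency_months")
        if not isinstance(frequency, int) or frequency < 1:
            errors.append("crawl_frequency_months must be a positive integer")

        return errors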
Who the users are and what they need to do
The primary user of the data management system is the Web Archiving team. They use the system to record the websites and social media channels that need to be archived. The information details whether, how and when the websites should be crawled. The team query this data, run reports and export information from the database to help with decision making. XML files are produced from the database via scheduled tasks or when they are triggered manually. These files contain all the key configurations for the crawl of each website, and they are sent from the system to a third party on a daily basis.
The Web Archiving team need to be able to handle user management; add, amend and delete records; and query, filter and export data from the database. The system needs to support the team’s workflow (search, view, edit, sign-off). The team supplements the database with data held in Excel files, which creates a fragmented workflow with multiple stages and hand-offs. These datasets and workflows should be incorporated into the new system wherever possible. At the end of the project, the new system must be delivered in a way that allows the Web Archiving team to develop it going forward.
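Purely as a sketch of the kind of scheduled XML export described above (the element names, output path and schedule are assumptions; the real structure is defined by the existing XML output specification):

    # Illustrative sketch only: producing a daily crawl-configuration XML file
    # from database records. Element names and the output path are assumptions;
    # the actual schema is defined by the existing XML output specification.
    import xml.etree.ElementTree as ET
    from datetime import date

    def write_crawl_config(records, output_dir="."):
        root = ET.Element("crawlTargets", date=date.today().isoformat())
        for record in records:
            target = ET.SubElement(root, "target")
            ET.SubElement(target, "domainName").text = record["domain_name"]
            ET.SubElement(target, "homepageUrl").text = record["homepage_url"]
            ET.SubElement(target, "crawlMode").text = record["crawl_mode"]
        path = "{}/crawl-config-{}.xml".format(output_dir, date.today().isoformat())
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
        return path

In an AWS setting, an export of this kind could be triggered by a scheduled job (for example an EventBridge rule invoking a Lambda function), but that choice is left to the design phase.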
Early market engagement
Any work that’s already been done
The team has a good understanding of (1) current workflows that rely on the GWDB data, (2) the specification of the XML output file, (3) the data requirements, (4) many of the functional requirements of the new system.

The existing SQL database and its Reporting Services function are hosted in-house. It is neither large nor very complex. The system has search functionality and Reporting Services that allow the UKGWA team to run reports and export data. A copy of the database is available, but with limited documentation.
Coding and security standards - see https://www.gov.uk/service-manual/design/services-for-government-users
Existing team
The UK Government Web Archiving Team is a small team of 9 specialists within the Digital Directorate at The National Archives. The team is highly skilled in web archiving and will be the primary source of information during this project.

Technologies that can be supported by the Web Archiving Team:
• Relational databases - PostgreSQL, MySQL or MariaDB hosted on AWS RDS.
• Our primary scripting language is Python.
• CI/CD - GitHub and GitHub Actions with Docker images pushed to our Dockerhub organisation.
• Cloud infrastructure - AWS.
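As a rough illustration of how a report could be run and exported against one of the supported databases using Python, the team's primary scripting language (the table and column names are invented for the example):

    # Illustrative sketch only: running a report query against a PostgreSQL
    # database (for example hosted on AWS RDS) and exporting the results to CSV.
    # The table and column names are invented for this example.
    import csv
    import psycopg2

    def export_active_domains(dsn, output_path):
        query = (
            "SELECT domain_name, homepage_url, crawl_frequency_months, scheduled_crawl_date "
            "FROM domains WHERE status = 'active' ORDER BY scheduled_crawl_date"
        )
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(query)
                with open(output_path, "w", newline="") as f:
                    writer = csv.writer(f)
                    # Header row taken from the cursor description, then the data rows.
                    writer.writerow([col[0] for col in cur.description])
                    writer.writerows(cur.fetchall())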
Current phase
Not started

Work setup

Address where the work will take place
The National Archives, Kew, Richmond, Surrey TW9 4DU
Working arrangements
Flexible
Security clearance
Baseline security clearance will be required.

Additional information

Additional terms and conditions
Relevant National Archives and Civil Service policies and terms and conditions

Skills and experience

Buyers will use the essential and nice-to-have skills and experience to help them evaluate suppliers’ technical competence.

Essential skills and experience
  • Strong, demonstrable experience in database design and data modelling.
  • Must demonstrate excellent competence in building front-end applications to interrogate and view a database. (Describe a recent project. What, where, when, duration, result.)
  • Must have relevant experience of delivering solutions using AWS services. (Describe a recent project. What, where, when, duration, result.)
  • Must have experience of deploying a suitably structured team to deliver a database or data management system. (Set out indicative roles and team structure.)
  • Must have experience of rapid delivery. (Describe when you have delivered a database solution to a short timescale.)
  • Must demonstrate excellent competence in working in an Agile way to deliver capability incrementally. (Please can you describe how you have achieved this.)
  • Must have experience in designing solutions that include generation of output files to a predefined schedule. (Please can you describe how you have achieved this.)
  • Must demonstrate excellent competence in providing handover training and deployment support. (Describe a recent project. What, where, when, duration, result.)
  • Must have experience working with subject matter experts. (Describe how you have effectively worked with SMEs.)
Nice-to-have skills and experience
  • Evidence of guaranteeing the design and build of a database system, where the ongoing support of such is provided by another party
  • Experience of designing solutions that include an API for accessing data
  • Ability to provide innovative ideas whilst delivering the core requirements
  • Demonstrate understanding and ability to deliver digital services/products to the Government Digital Service standards

How suppliers will be evaluated

All suppliers will be asked to provide a written proposal.

How many suppliers to evaluate
5
Proposal criteria
  • Demonstrated understanding of scope of work
  • Track record of meeting or exceeding requirements
  • Proven skills in developing data management systems based on examples of previous work
  • Proven skills in implementing replacement legacy systems, including evaluation of workflows, data requirements, data mapping and migration
  • Evidence of creative approaches and ability to design interfaces to meet user needs
  • Capacity to perform work within timescale and budget
Cultural fit criteria
  • Have a collaborative and flexible working approach, e.g. working with in-house technical and other digital specialists
  • Approach to supporting teams to adopt new technologies
  • Examples of delivering transition, knowledge transfer and handover of code
  • An appreciation for the importance of technical documentation as a means of ensuring ongoing maintainability of systems
  • Demonstrable commitment to a diverse working environment, with a team comprised of experts from a wide variety of backgrounds
Payment approach
Capped time and materials
Additional assessment methods
  • Case study
  • Work history
  • Reference
  • Presentation
Evaluation weighting

Technical competence
50%
Cultural fit
15%
Price
35%

Questions asked by suppliers

1. Is it outside or inside IR35?
We have checked the requirements (to the best of our knowledge) using the assessment tool found at https://www.gov.uk/guidance/check-employment-status-for-tax and the determination for the role(s) as advertised is that the intermediaries legislation does apply to this engagement.
2. In terms of working arrangements, would primarily remote from the UK with onsite work when required be acceptable?
In principle, yes, provided you can work core UK business hours. Meetings will be required, and for some periods (such as testing) we would need someone able to respond to issues/questions as they arise in core business hours. If work were to be off-site in a significantly different time zone, you will need to provide a plan for how you would manage communications.
3. From reading this advert, we get the impression that the team is predisposed to an AWS solution.
For our solution to be effective, skill sets will need to be re-learned, which can be a hurdle that is undesirable for clients.
Before we respond, can you confirm if a semantic knowledge graph solution would be an option?
We believe that the solution needs to be a relational database and not a semantic knowledge graph system. The dataset is relatively small and highly structured, and it does not contain complex many-to-many relationships. The data model is unlikely to change often, and we will need to run queries over whole tables.
4. Can you provide an estimate of the number of workflows that will need to be supported in the front-end?
The system will need to support searching, adding, amending and deleting records. We are looking for the supplier to work with us to establish the workflows needed, but we would expect there to be a small number.
5. Do you have any high level architectural diagrams of the system that you can share?
The current system is poorly documented. Part of the scoping section of the project will be uncovering this type of documentation.
6. How large is the existing database?
The current database is approx. 2GB.
7. Can you provide any more details on the functional requirements of the relational database?
The system will need to support searching, adding, amending and deleting records. It will need to produce configurable output files in XML format according to a schedule and on demand. Users must be able to run reports to analyse the data (search/filter/sort functionality) and export these reports. Users need to enter and manage lists of URLs, and the system must support a bulk import of this data.
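As an illustration of the bulk URL import mentioned above (the CSV format and the "url" column name are assumptions about the input, not a requirement):

    # Illustrative sketch only: bulk import of a list of URLs from a CSV file.
    # The file format and the "url" column name are assumptions about the input.
    import csv

    def load_urls(csv_path):
        """Split the rows of a CSV file into accepted URLs and rejected values."""
        accepted, rejected = [], []
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                url = (row.get("url") or "").strip()
                if url.startswith(("http://", "https://")):
                    accepted.append(url)
                else:
                    rejected.append(url or "<empty>")
        return accepted, rejected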
8. Are you storing the raw archived data in the relational database or is it metadata linked to the archived pages in some backing store?
The database will store the metadata related to the parameters needed to run a web crawl. Data such as (but not limited to):
Domain ID; Parent Domain ID; Domain Type; Government Dep; Domain Name; Homepage URL; Status; Crawl Frequency (months); Scheduled Crawl Date; Exceptional Crawl Date; Crawl Mode; Special Instructions; Archivist Notes; Date added; Closure date; Crawl ID

The system will not store the WARC files (archive data) created by the crawlers.
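For illustration only, the fields listed above might map onto a record structure along the following lines; the types and optionality are assumptions to be confirmed during data modelling with the UKGWA team.

    # Illustrative sketch only: one possible shape for a crawl-metadata record
    # based on the fields listed above. Types and optionality are assumptions.
    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class DomainRecord:
        domain_id: int
        parent_domain_id: Optional[int]
        domain_type: str
        government_department: str
        domain_name: str
        homepage_url: str
        status: str
        crawl_frequency_months: int
        scheduled_crawl_date: Optional[date]
        exceptional_crawl_date: Optional[date]
        crawl_mode: str
        special_instructions: Optional[str]
        archivist_notes: Optional[str]
        date_added: date
        closure_date: Optional[date]
        crawl_id: Optional[int]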
9. Is there a requirement to develop APIs to feed the public portal?
We will need a simple JSON API that allows us to create tools to facilitate internal processes or allow external suppliers to access the data without the need for the current XML output files.
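To illustrate the kind of simple JSON API envisaged (the framework, endpoint path and field names below are examples only, not requirements):

    # Illustrative sketch only: a minimal JSON endpoint exposing crawl records.
    # The framework (Flask), the /api/domains path and the field names are
    # examples; the actual API would be designed during the project.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/domains")
    def list_domains():
        # In the real system these records would come from the relational database.
        domains = [
            {"domain_name": "example.gov.uk", "status": "active", "crawl_frequency_months": 3},
        ]
        return jsonify(domains)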
10. Do you need a new front-end web portal for public users or are you reusing the existing Public Search facility?
The system is not for public users, but it will be web-based, accessible via a browser behind a login screen, for National Archives staff only. The system is a back-end tool for the Web Archiving team to manage data.
11. Can you provide an example of what you would export from the database for review?
An example of one of the XML files we need the system to produce, and an example of a report the current system produces (it is very likely that we will require changes to the report, but it’s a typical example), can be found here: https://tna-ukgwa-sharing.s3.eu-west-2.amazonaws.com/gwdb2project/documents.zip