Awarded to Create/Change

Start date: Monday 17 February 2020
Value: £286,000
Company size: SME
The Cabinet Office is leading this development, with scope to partner with other departments.

Analysing and Managing Documents - Discovery and Alpha

22 Incomplete applications

18 SME, 4 large

54 Completed applications

44 SME, 10 large

Important dates

Published
Tuesday 12 November 2019
Deadline for asking questions
Tuesday 19 November 2019 at 11:59pm GMT
Closing date for applications
Tuesday 26 November 2019 at 11:59pm GMT


Summary of the work
Discovery and Alpha to research, build and test approaches to automate document management, including analysis (content, metadata), categorisation (document-type, topic) and deletion/retention automation.

Particularly interested in automated analysis of recurrent file-types through common data-entities, e.g. minutes containing “decisions” / “actions”, as well as exploring cross-document content and thematic linkages.
Latest start date
Monday 13 January 2020
Expected contract length
18 weeks, incl. 6-week Discovery and 12-week Alpha (however we welcome suggested time-frames).
Location
No specific location, for example they can work remotely
Organisation the work is for
The Cabinet Office is leading this development, with scope to partner with other departments.
Budget range
£225k to £300k ex. VAT.

All travel, subsistence and other expenses must be included in the overall cost.

As the contract extends into the next financial year, there may be a need for proposals to accommodate a breakpoint at year end - this will be clarified for shortlisted providers in the call for proposals.

About the work

Why the work is being done
This work is being done as part of the Better Information for Better Government programme. This is a cross-government initiative to improve how government manages its digital information and facilitates collaboration online. The Cabinet Office is leading work to adapt core processes and attitudes to information in departments to fit with new digital ways of working in the 21st century civil service.
Problem to be solved
The government holds a wealth of information across a high volume of digital documents received or created by public servants. Accessing, analysing and retrieving this information is largely manual and resource-intensive, with the government being at risk of legal and reputational damage if it fails to do so effectively.

Moreover, there is an opportunity-cost in not accessing and utilising valuable information held in recent and legacy digital documents to inform current government work.

The challenge is therefore to explore how government can automate the analysis of existing digital documents, to better manage and exploit its digital information.
Who the users are and what they need to do
We have identified a backlog of epics that we are keen to explore and refine (acknowledging that these will not all be in-scope for development in the alpha).

These include:

- As a public servant, I need to be able to find documents by key characteristics (content, metadata), so that I can access relevant information easily.

- As a KIM professional, I need to know what document-type a file is (e.g. a business case), so that I can process the document correctly.

- As a public servant, I need to be able to find related documents, so that I can access all relevant information.

- As an inquirer, I need to be able to search high document volumes easily, so that I can provide accurate and timely responses.

Further details can be shared for information.
Early market engagement
Any work that’s already been done
The programme recently commenced an alpha to explore the use of Natural Language Processing and Machine Learning to automate management of emails, with the aim of automating email categorisation (including taxonomic) and deletion/retention decisions. See:

In the latter part of this opportunity we intend to explore how the emerging outcomes of this work can be exploited for the analysis of digital documents.

We have also conducted pre-discovery research into this problem-space, identifying potential use-cases, user-needs and some existing practices in government, including development of related open-source software.
Existing team
You'll be working with our internal team: a product manager, deputy product manager and a service owner.

There will also be a requirement to work with the supplier delivering the email alpha service.
Current phase
Not started

Work setup

Address where the work will take place
Remote with some travel to London offices.
Working arrangements
We hope to co-locate the product manager with the team for at least half the time in each sprint, to act as a subject matter expert and to be a core part of development.

Please note - we aim to complete the initial sift and call for proposals in early December. We therefore anticipate that presentations will take place between 16 and 19 December. This will be confirmed with shortlisted suppliers in the call for proposals.
Security clearance
All members of the team need at least BPSS. SC would be preferable where possible. If needed, we can work with the chosen supplier to secure this clearance once this work has commenced.

Additional information

Additional terms and conditions
We may need to agree additional clauses on security and data protection with the supplier due to the sensitivity of the data being handled.

Skills and experience

Buyers will use the essential and nice-to-have skills and experience to help them evaluate suppliers’ technical competence.

Essential skills and experience
  • Experience managing successful delivery within a complex multi-organisation landscape (3 points)
  • Experience using natural language processing and machine learning technologies (3 points)
  • Experience of successfully designing and delivering services aligned to GDS service standards or equivalent (3 points)
  • Experience of digital projects handling complex, sensitive data at scale (3 points)
Nice-to-have skills and experience
  • Experience using a variety of user research methods, including prototyping (2 points)
  • Experience of managing successful delivery alongside other suppliers (2 points)

How suppliers will be evaluated

All suppliers will be asked to provide a written proposal.

How many suppliers to evaluate
Proposal criteria
  • Proposed approach and methodology - including an overview of intended sprint plan / timeframes (8 points)
  • Technical competency and capability (10 points)
  • How the supplier intends to identify and meet user needs (8 points)
  • How the approach is designed to meet our organisational goal (8 points)
  • How the supplier has identified risks and dependencies and offered approaches to manage them (5 points)
  • How the supplier intends to ensure effective knowledge transfer (5 points)
Cultural fit criteria
  • Evidence of open, transparent and collaborative working, both internally and with clients (6 points)
  • Evidence of strategic awareness; understanding the relevant landscape (political, legal, technological) and shaping design and delivery accordingly (6 points)
  • Evidence of working in a creative environment where idea-exchange, originality and innovation are valued (5 points)
  • Team structure (3 points)
Payment approach
Time and materials
Additional assessment methods
Evaluation weighting

Technical competence


Cultural fit




Questions asked by suppliers

Payment approach will be CAPPED TIME AND MATERIALS
2. Are there any technology constraints for the content management or the AI/ML stacks?
The documents will be held across a number of virtual and physical servers and in various digital formats. As part of the Discovery we anticipate determining the appropriate document scope, taking technical feasibility into account.

Regarding the stack - our email asset will be held in a Cabinet Office AWS account. Any integration would therefore need to be compatible with this and developed in line with the GDS technology strategy.
3. What volume of these documents?
We will aim to process thousands of documents within the Alpha. More detailed volumes and scope can be established in the Discovery.
4. Who adds the documents, and how often?
Documents in-scope are generated by civil servants. We anticipate focusing on a sizeable sample of digitally archived documents created over 5 years ago.
5. What types of documents?
All forms of digital documents are potentially in-scope. We anticipate these are primarily in Microsoft Word and Google Docs, however there will be other file formats including (but not limited to) Microsoft Excel and PowerPoint, plus Google Sheets and Slides.
6. What connections are in place between documents? How should users interact with documents? Will it be just a search and a list of documents in response, or should documents be organised in some kind of structure?
Documents are currently held in a variety of digital storage systems and file directories. They may have linkages, e.g. via labels, however we do not know to what extent. This will need to be explored in the Discovery.

The exact design of the system regarding user-interaction, storage and search and retrieval will need to be determined during this engagement.
7. Should connections between documents be automatic, or would users set them up?
We are keen to explore the potential to create automatic linkages, potentially with user reinforcement (i.e. confirmation/rejection).
8. Will the system be widely available, or will just a limited number of users work with it?
We anticipate that the scope of the Alpha will be limited to specific user group(s) of up to 200.

We expect the final product would have a far wider user base. The exact scope of this would become clearer as we refine the product development as part of the Discovery and Alpha.
9. What capacity (number of simultaneous users, requests, new documents per unit time, etc.) should the system support?
We will need to establish this as part of the Discovery.
10. Will documents need to be edited? Will users add/delete metadata, or will metadata be detected automatically from documents?
We anticipate that existing metadata would be used as the basis for some of the automated analysis. We also expect there to be some user action to confirm/edit/reject the automated analysis.
11. What is the current process in place for working with these documents? Could you please provide concrete use cases? How would you like to improve this process?
Archived documents that we expect to be in-scope for this Discovery and Alpha are currently managed largely manually. In some instances these are supported by AI products that analyse document metadata to help streamline document management. However, we are seeking to go further and automate analysis of document content in order to support this process.

We anticipate that the user research conducted within the Discovery will help identify more detailed use-cases.
12. Have you defined what success looks like for the project delivery, and do you have any performance metrics that you would like tracked?
We do not have confirmed performance metrics, however we will seek to establish and baseline these ahead of the Alpha development.
13. Was Cabinet Office supported by an external supplier during Alpha or any previous phases of the project?
This specific project hasn't yet started and therefore there have been no previous stages of development.

For the 'Email Filing and Sorting' Alpha referenced in our requirements, the Cabinet Office are partnering with Faculty.
14. With regard to the recent alpha to explore the use of Natural Language Processing and Machine Learning to automate management of emails: was this bespoke development, or did it use a platform, e.g. IBM Watson?
This is a bespoke product that is currently in development.
15. Can you list the current document management tools in use, that will be in scope for this Discovery and Alpha?
During the Discovery we will need to carry out detailed scoping, including establishing the department(s) we intend to partner with, as well as the corpus and age of documents we intend to analyse.

Doing so will enable us to identify specific document management tools that will be in-scope. As such this can't be confirmed at this stage. Example tools however may include AODocs, SharePoint, Microsoft Fileshare and Google Team Folder.
16. Does Cabinet Office already have a document management solution in place that meets part of this requirement? If so, is the intention of this work to assess options for replacing or upgrading that solution?
Our intention is to explore the feasibility of developing a product that classifies documents according to their content, as well as metadata. One outcome of this may therefore be a recommendation to replace or upgrade the existing document management system, however this is not necessarily a core aim of this project.

Whilst exact scope will need to be determined within the Discovery, as this will be dependent on the department(s) we partner with, example document management tools may include AODocs, SharePoint, Microsoft Fileshare and Google Team Folder.
17. Are you planning to take the project through a GDS service assessment?
All development will need to be undertaken in accordance with the Technology Code of Practice and Service Standard.

As such, we anticipate the product will be taken through an Alpha assessment, either through formal GDS assessment or a departmental assessment. This will be determined as scope is refined during Discovery.
18. Have you determined whether the solution to this business problem is an existing tool/service or a bespoke build, or will this determination need to be part of the Discovery phase?
Our research to date indicates that there are no existing off-the-shelf products that enable document analysis and categorisation by content in the manner we require.

We have therefore procured a joint Discovery and Alpha on the understanding that a bespoke solution is required.
19. Will the final system be provided for all Government departments or is the scope limited to Cabinet Office?
The scope of the programme is cross-government and as such, the scope for the final product may be cross-government.

For the Alpha, however, we envisage limiting scope to 1 or 2 departments (which may or may not include the Cabinet Office).
20. What AI products are currently in use that analyse document metadata?
We have experimented with a data classification tool and have used this to analyse metadata to determine the value of information in a digital collection.
21. Are you looking for suppliers with experience in any specific technology or any specific machine-learning technology (e.g. TensorFlow)?
Whilst we are looking for suppliers with Machine Learning experience, this does not have to be in any specific Machine Learning technology.

All development will need to take into account the TCOP and NCSC Cloud Guidance.
22. Who is the supplier running the e-mail alpha?
For the Email Filing and Sorting service, the Cabinet Office have partnered with Faculty.
23. What tools have been used for work in response to the “Filing and Sorting Emails (Alpha)”?
We are using a bespoke development platform. Further details may be shared with the successful supplier.
24. Can you share either of the pre-discovery use cases or discovery work done for the e-mail alpha?
Yes - user stories can be made available on request. Please email to request a copy.

Please note - these stories were derived from an initial internal workshop and will require refinement. We also do not anticipate that all will be in-scope for delivery within the Alpha.
25. Is there any “stack” that pre-exists, or that can be provisioned by the Cabinet Office, in support of the Alpha?
There is no prescribed stack for this development.
Hosting will need to either use Cabinet Office's AWS space or the GDS PaaS. This will need to be determined during scoping and Discovery.

All development will need to take into account the TCOP and NCSC Cloud Guidance.
26. Was pre-discovery conducted by an external vendor, and will they be a participant in this proposal?
Pre-discovery had a limited focus and was carried out in-house by the team.