Office for National Statistics

PU-20-0089 ONS Matching System

Incomplete applications: 19 (13 SME, 6 large)

Completed applications: 31 (17 SME, 14 large)

Important dates

Published: Friday 20 September 2019
Deadline for asking questions: Friday 27 September 2019 at 11:59pm GMT
Closing date for applications: Friday 4 October 2019 at 11:59pm GMT

Overview

Summary of the work: A system to match records (people) from two datasets and assure that the quality of the matches meets strict targets. The immediate need relating to successful delivery of the 2021 Census is for a solution to be ready for testing in March 2020.
Latest start date: Sunday 1 December 2019
Expected contract length: 12-18 months with possible extension
Location: No specific location, eg they can work remotely
Organisation the work is for: Office for National Statistics
Budget range:

About the work

Why the work is being done: The ONS is aiming to publish outputs from the 2021 Census within a year of collection. To achieve this, we must have completed a matching exercise within an 8-week timeframe. The matching exercise itself involves matching Census data with a follow-up coverage survey (e.g. by name, sex, date of birth and address) to identify whether we have over- or under-counted different parts of the population, which influences the final statistical outputs we produce.
Problem to be solved: We require a capability to enable the matching of records from two datasets and to assure that the quality of the matches meets strict targets.

The immediate need relating to successful delivery of the 2021 Census is for a solution to be ready for testing in March 2020 to provide confidence that we will be able to complete a matching exercise on Census datasets within an 8-week timeframe during 2021. A secondary need relates to potential re-use of the capability for wider ONS business needs beyond the Census with other datasets.
Who the users are and what they need to do: As a clerical matcher, I need to compare two or more records that have been marked as potential matches by an automated algorithm, so that I can determine if they really are the same person or not.
Early market engagement: The ONS conducted an early market engagement, hosting two webinars with the opportunity for suppliers to ask questions about the procurement.

Links to the webinar recordings, a write-up of the questions and answers, and a summary of the topics discussed can be found on the ONS InTend site:

https://in-tendhost.co.uk/ons/

Registration is required, but free. Once complete, please search ‘Clerical Matching’ within the ‘Current Opportunities’ area and register your interest. We will confirm you are registered on the DOS3 Framework prior to enabling access to the documents.
Any work that’s already been done: For Census, the ONS methodology team have developed an automated matching algorithm that has been assured, including by external panels, to provide confidence that we can match 91% of records automatically. We need to use this algorithm for the automated matching step for Census. Generally, this means either integrating the algorithm into a matching system, or the ONS running the automated matching and the system utilising the output (we would need to agree the interface/format specification for this; an illustrative sketch of one possible format follows at the end of this section).

For wider ONS use (outside of Census), different algorithms may be used and/or we could utilise existing/off-the-shelf automated matching within a system/product.
Existing team: The ONS has a data integration team that the supplier will be working with. This team consists of experts in record matching with a deep knowledge of the matching problems we need to solve and the quality targets that need to be met.
Current phase: Not started
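To make the interface/format point concrete, below is a minimal, hypothetical sketch (in Python, matching the Python/PySpark skills listed as nice-to-have later in this opportunity) of how a matching system might consume the output of an ONS-run automated matching step. The file name, column names and status values are illustrative assumptions only; the real specification would be agreed with the ONS.

# Hypothetical sketch only: the real interface/format specification would be
# agreed with the ONS. The file name, column names and status values below
# are assumptions.
import csv

def load_candidate_matches(path):
    """Read automated-matching output into a list of candidate match pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Assumed columns: census_id, survey_id, match_score, match_status
        return [
            {
                "census_id": row["census_id"],
                "survey_id": row["survey_id"],
                "match_score": float(row["match_score"]),
                "match_status": row["match_status"],  # e.g. "auto" or "clerical_review"
            }
            for row in reader
        ]

# Pairs the algorithm could not resolve with sufficient confidence would be
# routed to clerical matchers through the system's workflow.
candidates = load_candidate_matches("automated_matching_output.csv")
clerical_queue = [c for c in candidates if c["match_status"] == "clerical_review"]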

Work setup

Address where the work will take place:
Office for National Statistics
Segensworth Road
Titchfield
Fareham
Hampshire
PO15 5RR
Working arrangements: The expectation is that some time on site will be required to work with the users and to test and understand requirements. It is generally not a problem for the supplier to work from their own location if we can schedule some face-to-face time.

Face-to-face time would primarily be at our Titchfield site with some limited time needed in Newport.

See https://www.ons.gov.uk/aboutus/contactus/officelocations
Security clearance: There may be some SC and BC (BPSS) clearance required; we also utilise CTC (Counter Terrorism Check) for some roles. Generally, staff accessing Census data will need to be SC cleared.

Additional information

Additional terms and conditions:

Skills and experience

Buyers will use the essential and nice-to-have skills and experience to help them evaluate suppliers’ technical competence.

Essential skills and experience
  • Expert knowledge of data matching processes and algorithms (6 points)
  • Experience of storing/retrieving/displaying large volumes of multi-page images/scanned documents (PDFs) (3 points)
  • Evidence of working with large volumes of data (the Census is circa 60 million records) (4 points)
  • Experience with fuzzy-match based searching (4 points)
  • Evidence of developing/configuring/optimising user interfaces for displaying data records (5 points)
  • Experience of integrating simple case and workflow management (3 points)
  • Evidence of support for data capture as part of a user workflow (e.g. storing case notes against decisions) (3 points)
  • Experience of systems able to scale to meet increases in concurrent usage (4 points)
  • Evidence of working with both Welsh (or international) and English data and typefaces (4 points)
  • Experience of working with highly sensitive data and the associated assurance requirements (4 points)
  • Knowledge of the latest NCSC guidance and principles (2 points)
  • Experience of public, private or hybrid cloud hosting architecture & solutions (5 points)
  • Experience integrating solutions with other systems and architecture (4 points)
  • Evidence of applying the best practice principles of the Government Service Standard (4 points)
  • Evidence of gathering user needs in accordance with the Government Service Standard (4 points)
  • Evidence of user-centred design (4 points)
  • Proven record of effectively using agile methods to design, build and deliver (4 points)
  • Experience developing production-ready solutions incrementally (3 points)
  • Experience running and supporting an operational system (5 points)
  • Experience of logging, auditing and monitoring of system and user activity in accordance with agreed metrics (4 points)
Nice-to-have skills and experience
  • Applying user research insights to product development (1 point)
  • Experience of system migration/install (e.g. to/from Cloud to private data centre) (2 points)
  • Knowledge of Apache Hadoop based stacks (HDFS, Hue, Hive, Impala, Spark) (2 points)
  • Knowledge of Python and PySpark (1 point)
  • Experience of VMWare/VSphere (1 point)
  • Knowledge of working with geospatial data (2 points)
  • Experience working with APIs (1 point)
  • Experience of different data formats (e.g. CSV, JSON, Avro, Parquet) (1 point)

How suppliers will be evaluated

How many suppliers to evaluate: 5
Proposal criteria
  • How the approach or solution will meet the needs of our users (4 points)
  • How the approach or solution meets our goals and targets (4 points)
  • Estimated timeframes for the work (4 points)
  • Identification of risks and dependencies and proposed approaches to managing them (4 points)
  • Value for money (4 points)
Cultural fit criteria
  • Be transparent and collaborative when making decisions (4 points)
  • Have a no-blame culture and encourage people to learn from their mistakes (4 points)
  • Take responsibility for their work (4 points)
  • Challenge the status quo (4 points)
  • Be comfortable standing up for their discipline (4 points)
  • Can work with clients with low technical expertise (2 points)
Payment approach: Capped time and materials
Assessment methods
  • Written proposal
  • Case study
  • Work history
  • Reference
  • Presentation
Evaluation weighting

Technical competence: 70%
Cultural fit: 10%
Price: 20%

Questions asked by suppliers

1. Please, can you specify the budget for the project? Thanks As advised in the Webinar Q&A document, ONS require the most economically advantageous proposal from all tenderers. The commercial submissions from tenderers will form part of the tender evaluation at the next stage for the tenderers shortlisted to the tender stage. At this time, we are not disclosing the budget.
2. Can you download all the questions that require answering within the application rather than discovering each one as you work screen by screen, so we get an overview of the entire response required? We have followed the process within the Portal with all Skills and Experience criteria requiring a response being listed.
3. What is the overall budget for this piece of work? As advised in the Webinar Q&A document, ONS require the most economically advantageous proposal from all tenderers. The commercial submissions from tenderers will form part of the tender evaluation at the next stage for the tenderers shortlisted to the tender stage. At this time, we are not disclosing the budget.
4. To clarify: the solution on offer must perform automated matching, using the algorithm created by ONS, on the 75% of data that is already in digital format in the Cloudera Hadoop environment, and it must extract and match the data from the 25% of data that is stored in .pdf format. Is that correct? Does the system need to have the capability to extract census data from the .pdf files and then perform matching and exception reporting on the data stored in the .pdfs? The solution must either perform automated matching using the ONS-supplied algorithm, or the ONS will run the algorithm on our internal system and provide the output to the solution. In either case, the algorithm runs against all the data (whether from online capture or scanned/OCR’d data from paper questionnaire images). There is no requirement for the solution to extract data from images; this will already have been done. The exception reporting/manual matching processes will be performed on all matches the automated algorithm doesn’t have sufficient confidence in, regardless of the source (paper/online).

Please review the pre-tender market engagement documents
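To illustrate the answer above, the following is a hedged PySpark sketch of how scored pairs from the automated algorithm could be split into auto-accepted matches and a clerical-review queue, regardless of whether records came from online capture or scanned paper. The paths, the match_confidence column and the 0.95 threshold are assumptions for illustration, not part of the ONS specification.

# Illustrative only: the paths, column name and threshold are assumptions,
# not part of the ONS specification.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("match-triage").getOrCreate()

# Output of the ONS-supplied automated matching algorithm (all capture sources).
scored = spark.read.parquet("/data/automated_matching_output")

# Pairs the algorithm is sufficiently confident in pass straight through;
# the remainder are queued for clerical review, whatever the source.
auto_accepted = scored.filter(F.col("match_confidence") >= 0.95)
clerical_review = scored.filter(F.col("match_confidence") < 0.95)

auto_accepted.write.mode("overwrite").parquet("/data/matches/auto")
clerical_review.write.mode("overwrite").parquet("/data/matches/clerical_queue")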
5. What format should responses be in? Please submit as much detail as possible, including at least one example to support your experience, in responding to each question, in line with the Framework guidance. We are happy for bullet points, for example, to be used, providing your response fully addresses the question raised. Please ensure your response is fully relevant to the question asked.
6. In the ‘Nice-to-have skills and experience’, the item ‘Applying user research insights to product development (1 point)’ appears to already be covered under the essential criteria, is that correct? We agree this is the case; as a result, there is no need to answer this specific question in the nice-to-have skills and experience.
Suppliers are advised that, as per the Framework, the nice-to-have responses will be scored and only used in the event of a tie when shortlisting the five suppliers to proceed through to the tender stage. All essential skills and experience responses will be scored and used as part of the shortlisting process.
7. Please can you confirm the process once we've submitted our EOI responses? As detailed in the webinar, we will evaluate the responses to the EOI criteria. The five highest scoring suppliers at the EOI evaluation will proceed to the tender stage.

EOI responses to be submitted on the DOS Portal by 23.59 on 4/10/19.

Indicative ITT dates are:

ITT issued via Intend to the five shortlisted suppliers: 11/10/19;
ITT responses by 12 noon on 1/11/19;
Clarification Interviews if required w/c 18/11/19.

Tenderers will respond in line with the evaluation criteria, with our specification to be issued at the ITT stage. This will formalise the information all suppliers have been advised of during the process.
8. Good Morning

Are you expecting any pricing at this stage?
Good morning

No. This is the Expression of Interest Stage. The suppliers shortlisted through to the tender stage will price the requirements based on the criteria stated on the portal, capped time and materials. The tender stage will be evaluated as stated on the portal with the financial element contributing to the overall evaluation score.

It is anticipated that a standard Framework template will be issued to tenderers to respond to the cost and quality requirements. This applies to the tender stage only. The EOI responses are via the DOS Portal only.
9. You mention "Evidence of working with both Welsh (or international) and English data and typefaces". Is this required for natural language processing on large batch data sources, or front end (for eg GUI) design in multiple user languages? Please see the response contained on the Q&A responses from the Webinars available on Intend.
10. For the question "Evidence of Experience of system migration/install (e.g. to/from Cloud to private data centre)", are you looking for suppliers to show their experience of moving from cloud back to private data centres? Please provide examples demonstrating the install/movement of systems and infrastructure either from public cloud to private data centres or from private data centres to public cloud, to demonstrate you understand the logistics, challenges and approach for this kind of migration. Please see the pre-tender market engagement documents on InTend for more details.
11. Can the solution take data from the Cloudera environment, perform matching, display exceptions and matches, and push the results back into the ONS Cloudera platform for onward processing? Yes, that may be practical if we can agree the interface/integration specification/format etc. and the approach doesn’t overly burden the ONS in terms of significant effort to implement/support.
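A minimal sketch of the round trip described above, assuming Hive-managed tables on the Cloudera platform and PySpark; the table names are illustrative assumptions, and any real integration would depend on the interface specification agreed with the ONS.

# Illustrative round trip only: table names are assumptions and the
# interface/integration specification would need to be agreed with the ONS.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ons-matching-io")
    .enableHiveSupport()  # read/write Hive tables on the Cloudera cluster
    .getOrCreate()
)

census = spark.table("census.responses")        # assumed Hive table
survey = spark.table("census.coverage_survey")  # assumed Hive table

# ... automated matching, exception display and clerical resolution would run here ...
resolved = census.join(survey, on="match_key", how="inner")  # placeholder join

# Push the resolved matches back to the ONS platform for onward processing.
resolved.write.mode("overwrite").saveAsTable("census.resolved_matches")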
12. We have attempted to access the Webinar and documentation via the portal but we haven't been able to access it, even after our expression of interest. Thanks for advising. All documentation on InTend is available. Please could you look again and let us know if you still cannot access it.
13. Evidence of support for data capture as part of a user workflow: Are you able to please define what kind of support you are referring to in this case? Please see the pre-tender market engagement documents on InTend, specifically the Q&A summary.
14. You quote – “Limited availability to extend, modify, build and integrate with internal infrastructure” – can you expand and clarify this and describe whether there are any examples of this linking to any infrastructure systems/services or is this expected for the solution to be stand alone? We are open to a standalone solution or one that integrates with existing ONS infrastructure (e.g. our Hadoop cluster) – as long as the approach doesn’t require significant effort from the ONS as we have limited capacity at this time.
15. The timeline stated is December 2019 – March 2020, but you also highlight this being a 12-18 month engagement; can you elaborate on this thinking and provide a steer? For March 2020 we're asking for something sufficiently representative of the final system to enable us to run a realistic test (representative of the real thing), using a large volume of real Census data, producing matched/non-matched outputs, timed metrics and management information. We expect the system itself will need to be iterated on after this to improve it, ready for the actual Census matching in 2021.
16. What are the minimum requirements for users searching to find matches during clerical search, and to what extent should this incorporate more advanced techniques/technologies? Further, does this vary for use in Census matching compared with application to further use cases? The minimum requirement is for a matcher to be able to search on any variable, or a combination, and to be able to do so using wildcards or similar. For example, a combination of occupation, marital status, religion, town and ‘first name is similar to Sandra, or %andr%’.
Nice-to-have extensions include machine learning/algorithmic/smart methods of finding candidates.
Potential use cases outside Census may involve searching across multiple datasets that may differ significantly in available variables.
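As a hedged illustration of the minimum search requirement described above, a clerical search combining several variables with a wildcard on first name might look like the following PySpark sketch; the column names, example values and data path are illustrative assumptions.

# Illustrative only: column names, values and the data path are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clerical-search").getOrCreate()
records = spark.read.parquet("/data/census_records")

# Combination search: occupation, marital status, religion and town, plus a
# case-insensitive wildcard match on first name (%andr% finds Sandra, Andrea).
candidates = records.filter(
    (F.col("occupation") == "Teacher")
    & (F.col("marital_status") == "Married")
    & (F.col("religion") == "Christian")
    & (F.col("town") == "Fareham")
    & F.lower(F.col("first_name")).like("%andr%")
)
candidates.show()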