Awarded to MDRxTECH

Start date: Tuesday 4 January 2022
Value: £110,228
Company size: large
The National Archives

Enriching court judgments and legislation documents, adding hyperlinks and creating Linked Data

3 Incomplete applications

2 SME, 1 large

13 Completed applications

11 SME, 2 large

Important dates

Tuesday 9 November 2021
Deadline for asking questions
Tuesday 16 November 2021 at 11:59pm GMT
Closing date for applications
Tuesday 23 November 2021 at 11:59pm GMT


Off-payroll (IR35) determination
Contracted out service: the off-payroll rules do not apply
Summary of the work
Turn the references in court judgments to other cases and legislation, into hyperlinks. Enrich the documents, identifying citations and references to other named entities, and extract the enriched data into a knowledge graph.
Latest start date
Saturday 1 January 2022
Expected contract length
4 months, with an option to extend by up to an additional 2 months
No specific location, for example they can work remotely
Organisation the work is for
The National Archives
Budget range
Up to £150,000

About the work

Why the work is being done
The National Archives manages and is developing a new service to provide public access to Court Judgments and Tribunal Decisions. We want to turn the references in these important legal documents into hyperlinks. This is to enable users to move seamlessly between judgments, legislation and other official documents such as guidance documents, for example those held in the UK Government Web Archive. The main priority is to enrich Court Judgments, which we are storing in the Legal Document Mark-up Language XML (also called Akoma Ntoso). These documents contain a variety of textual references to other sources, including other cases, legislation and other official documents. We want to turn those textual references into hyperlinks for users, by enriching the data we hold in the Legal Document Mark-up Language. We also want to identify various named entities in the documents to improve our intellectual control over the collection. Once the documents are enriched, we want to extract all the additional information into a Linked Data knowledge graph that can support searching and browsing features in the new public service, and also enable data analysis of the collection.
Problem to be solved
Develop and configure an automated Natural Language Processing (NLP) capability to process the texts of court judgments, to identify references to legislation, cases (both UK and abroad) and significant named entities.
The results need to be added to the source documents, creating enriched documents with hyperlinks, and extracted into a Linked Data Knowledge Graph. We prefer hyperlinks to sources managed by The National Archives but will need to link to external sources too. There are 50,000 documents stored in LegalDocML XML to enrich. We will also use the NLP component in a publishing pipeline for new documents. Where there is a public source and a high confidence in the reference, we will create a hyperlink. These should be as specific as the granularity of the reference. The NLP capability should identify a wider set of references than just those that can become hyperlinks.
One challenge is identifying an initial full reference and then its subsequent use through indirect reference (“this Act”), abbreviation or acronym. Another part of the problem is modelling the enriched data with regard to different confidence levels. False positives or erroneous hyperlinks are misleading for users, but may be acceptable in a knowledge graph.
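As an illustration only, the confidence-based approach described above could be sketched as follows. This is a minimal sketch, not the GATE pipeline or the intended solution: the regular expressions, the confidence scores, the 0.9 threshold, the example.org URL scheme and the use of a <ref href="..."> wrapper are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional
from xml.sax.saxutils import escape, quoteattr

# Assumed pattern for neutral citations such as "[2021] EWCA Civ 123".
NEUTRAL_CITATION = re.compile(
    r"\[(\d{4})\] (UKSC|UKPC|EWCA Civ|EWCA Crim|EWHC) (\d+)"
)
# Assumed pattern for indirect references such as "this Act".
INDIRECT = re.compile(r"\b[Tt]h(?:is|at|e) Act\b")


@dataclass
class Reference:
    start: int
    end: int
    text: str
    kind: str
    confidence: float          # illustrative score, not a measured value
    href: Optional[str]        # None when no public target is known


def find_references(text: str) -> List[Reference]:
    """Collect every candidate reference, whether or not it can be linked."""
    refs = []
    for m in NEUTRAL_CITATION.finditer(text):
        year, court, number = m.groups()
        # Hypothetical URL scheme, used only for this example.
        href = f"https://example.org/{court.lower().replace(' ', '/')}/{year}/{number}"
        refs.append(Reference(m.start(), m.end(), m.group(0), "case", 0.95, href))
    for m in INDIRECT.finditer(text):
        # Unresolved indirect reference: recorded, but with low confidence.
        refs.append(Reference(m.start(), m.end(), m.group(0),
                              "legislation-indirect", 0.4, None))
    return sorted(refs, key=lambda r: r.start)


def enrich(text: str, refs: List[Reference], threshold: float = 0.9) -> str:
    """Wrap only high-confidence, resolvable references in a <ref> element."""
    out, pos = [], 0
    for r in refs:
        out.append(escape(text[pos:r.start]))
        if r.href is not None and r.confidence >= threshold:
            out.append(f"<ref href={quoteattr(r.href)}>{escape(r.text)}</ref>")
        else:
            out.append(escape(r.text))
        pos = r.end
    out.append(escape(text[pos:]))
    return "".join(out)


sample = "In Smith v Jones [2021] EWCA Civ 123 the court considered this Act."
found = find_references(sample)
enriched = enrich(sample, found)
```

In this sketch every detected reference stays in the list returned by find_references, so low-confidence items remain available for the knowledge graph even though only the high-confidence case citation becomes a hyperlink.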
Who the users are and what they need to do
The main users of the whole service are likely to be legal professionals, law students, and academics.
We expect users to start their journey from a web search, for example using Google or Bing. By linking the documents together, we can improve the search engine’s ability to rank the documents, and ultimately help improve the search results.
Once the user has arrived at the service, we know from user research that they value hyperlinks between documents, as it saves time and aids research. However, the wrong link, or a broken link, is frustrating, creates confusion and undermines confidence.
Users have sophisticated research questions. For example: how has a specific provision in legislation been interpreted by the courts? Or, which later judgments build on a particular precedent? Our service can’t directly answer these questions, but by creating a knowledge graph of extracted data we can begin to support a more sophisticated user interface for searching and browsing, so that users can more easily research questions like these for themselves.
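To make the idea concrete, the two research questions above can be expressed as lookups over subject-predicate-object statements. The following is a simplified sketch in plain Python rather than a real Linked Data store; the identifiers (judgment/A, act/X/section/1) and the predicate names (cites, refersTo) are hypothetical and do not reflect any agreed data model.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements extracted from
# enriched judgments. All identifiers are placeholders.
TRIPLES = [
    ("judgment/A", "cites", "judgment/B"),
    ("judgment/C", "cites", "judgment/B"),
    ("judgment/C", "refersTo", "act/X/section/1"),
    ("judgment/A", "refersTo", "act/X/section/2"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so reverse lookups are cheap."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index


def subjects(index, predicate, obj):
    """Return, in sorted order, every subject linked to obj by predicate."""
    return sorted(index.get((predicate, obj), set()))


index = build_index(TRIPLES)

# "Which later judgments build on a particular precedent?"
citing_b = subjects(index, "cites", "judgment/B")

# "How has a specific provision been interpreted by the courts?"
on_section_1 = subjects(index, "refersTo", "act/X/section/1")
```

A production knowledge graph would more likely use RDF and a query language such as SPARQL; the point here is only that extracted references, once stored as statements, can be traversed in the reverse direction.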
Early market engagement
We have regular conversations with legal publishers, who have similar needs to ours, to enrich and link court judgments and legislation. Through those conversations we have developed a good appreciation of what it is possible to do using automated tools, versus manual editorial work.
Our experience using GATE (the General Architecture for Text Engineering) gives us confidence around the feasibility of this work.
Any work that’s already been done
We have developed and deployed a data enrichment pipeline for legislation documents using GATE (the General Architecture for Text Engineering). This uses data for the titles of legislation and has rules that identify various types of legislation reference. We think this solution can be extended to support references in court judgments.
We have developed a parser that turns court judgments into the Legal Document Mark-up Language and stores the documents in a Marklogic database. We anticipate storing the documents and the knowledge graph together in the database, which will also provide search for end users.
Existing team
The supplier’s team will deliver the work. The National Archives team will include a Data Scientist, a Product Manager, a Delivery Manager and a User Researcher.
Current phase

Work setup

Address where the work will take place
Mostly remote but some meetings as necessary onsite at The National Archives, Kew, Surrey TW9 4AD.
Working arrangements
The supplier will work in accordance with Agile methodologies to scope, plan, and deliver the work incrementally, with daily stand-ups and active communication, and will conduct regular ‘show and tell’ sessions to demonstrate progress. Online meetings will take place via Microsoft Teams with Slack available for quick communication.

The National Archives’ staff will be available during UK core hours (10am-4pm) each working day. The supplier will provide their own equipment and technology but will be given access to our organisational tracking app and Slack resources as appropriate.
Security clearance
Baseline clearance will be required (BPSS)

Additional information

Additional terms and conditions

Skills and experience

Buyers will use the essential and nice-to-have skills and experience to help them evaluate suppliers’ technical competence.

Essential skills and experience
  • Experience of natural language processing and enrichment of texts in XML
  • Experience of modelling linked data
  • Experience of generating linked data from NLP pipelines
  • Experience of cloud deployments, in particular AWS
  • Experience of documenting technical solutions so they can be maintained by others
Nice-to-have skills and experience
  • Experience of developing pipelines for the General Architecture for Text Engineering
  • Experience of the Legal Document Mark-up language
  • Experience of working with legislation documents
  • Experience of working with court judgments or tribunal decision documents

How suppliers will be evaluated

All suppliers will be asked to provide a written proposal.

How many suppliers to evaluate
Proposal criteria
  • Evidence of delivering natural language processing solutions
  • Evidence of delivering linked data solutions
  • Evidence of familiarity with the GDS Service Standard
  • Team structure, including the relevance of the team members' skills and experience
Cultural fit criteria
  • Work in an open and transparent way, sharing work in progress and involving others as you go
  • Explain what methods you propose to use to engage, communicate, constructively challenge and work effectively with our team and other suppliers
  • Describe how you propose to support positive working relationships throughout the life of the contract
Payment approach
Capped time and materials
Additional assessment methods
  • Work history
  • Presentation
Evaluation weighting

Technical competence


Cultural fit




Questions asked by suppliers

1. May I ask if there is an incumbent supplier?
There is no incumbent supplier.
2. Within the 50,000 documents, how many pages are there in the average document?
Whilst the documents are not paginated in the target format for data enrichment (LegalDocML XML), in terms of document size, we estimate they are 8-10 A4 pages long on average, when printed.
3. Within the 50,000 documents, what is the content structure predominately: Text, Tables, Pictures/diagrams
There is a header portion of the document setting out the main information (neutral citation, date, court, parties, judge/s and representatives). The rest of the document consists largely of headings, sub headings and numbered paragraphs of text. There are some blocks of quoted content, most often from a section of legislation. Occasionally there are tables and images.
4. Within the 50,000 documents, what is the content structure predominately: Text, Tables, Pictures/diagrams
We store the documents in LegalDocML, which provides us with a data model for court judgments in XML.
5. Within the 50,000 documents, will there be any hand written text?
6. You mention use of AWS, will you be provisioning your own cloud, or would you prefer the supplier to provide a managed cloud service?
We expect suppliers to use TNA’s provisioning of AWS cloud services.
7. You mention the requirement to document the technical solution so that it can be maintained by others. Are you intending to provide your own support and ongoing maintenance of the NLP Automations, or would you like the service provider to provide support as a managed service?
We would like suppliers to document their solution so that it can be supported and maintained, either in-house, or by a third party under a support contract.
8. What organisations (if any) did you work with to conduct and complete the discovery and alpha work?
Work to date has been largely delivered in-house.
9. At what stage in the bidding process will the discovery and alpha outputs be made available to prospective bidders?
These will be shared with the appointed supplier post award.
10. Can you provide details of the current technology and approach that the parser uses to turn court judgments into the Legal Document Mark-up Language?
The parser is a C# application which has been developed using the Microsoft Office Open XML SDK. The styling information for the whole document is extracted into a <presentation> <style> element block in LegalDocML. Further styling information (font size, weight, decoration etc) is then included in style attributes, as it is needed. <span>s are used for this inline, as in HTML. The body of the document is in the <judgmentBody> element and is marked up using <level>, <paragraph>, <num>, <content> and <p> elements. Specific parts of the document <header> are marked up with semantic elements such as <neutralCitation>, <docDate>, <judge>, <party> etc.
11. Does the legislation reference solution successfully identify both direct and indirect forms of reference, and to what level of accuracy?
Yes, the current legislation reference solution identifies both direct and indirect forms of reference to other pieces of legislation, within and for legislation documents. It does this to a good level of accuracy, in part benefitting from the formal structure the documents provide (of Parts, Chapters, Sections etc). We have not tried the current solution for identifying legislation references in Court Judgments so we do not know how successful it will be.
12. Does the £150,000 budget include all expected costs for both all services and any potential software licences to achieve the desired outcomes?
Software licences will be procured separately. Suppliers will need to justify their proposed solution. Our preference is to use open source tools, such as GATE, for data enrichment. We have made the decision to store the documents using Marklogic and we envisage using this for storing the linked data. Hosting is also procured separately. We use AWS.
13. How will the 2 month extension to the initial 4 months work?
The optional extension gives The National Archives flexibility if the supplier is able to achieve the outcome but needs more time.
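For suppliers unfamiliar with the structure described in the answer to question 10, the following sketch reads a heavily simplified judgment and pulls out the neutral citation and the numbered paragraphs. It is an assumption-laden illustration: the sample document is invented, it omits the XML namespace and the <presentation> block that real LegalDocML files carry, and the summarise function is a hypothetical helper, not part of the existing parser.

```python
import xml.etree.ElementTree as ET

# Invented, simplified sample using the element names mentioned in answer 10.
# Real documents declare an XML namespace, which is omitted here.
SAMPLE = """
<judgment>
  <header>
    <p><neutralCitation>[2021] EWCA Civ 123</neutralCitation></p>
    <p><docDate date="2021-02-03">3 February 2021</docDate></p>
    <p><judge>Lord Justice Example</judge></p>
  </header>
  <judgmentBody>
    <level>
      <paragraph>
        <num>1.</num>
        <content><p>This appeal concerns <span style="font-style:italic">costs</span>.</p></content>
      </paragraph>
      <paragraph>
        <num>2.</num>
        <content><p>We dismiss the appeal.</p></content>
      </paragraph>
    </level>
  </judgmentBody>
</judgment>
"""


def summarise(xml_text):
    """Return the neutral citation and a list of (number, text) paragraphs."""
    root = ET.fromstring(xml_text.strip())
    citation = root.findtext("header//neutralCitation")
    paragraphs = []
    for para in root.iter("paragraph"):
        num = para.findtext("num", default="").strip()
        # itertext() flattens inline <span> styling into plain text.
        text = " ".join("".join(p.itertext()).strip() for p in para.iter("p"))
        paragraphs.append((num, text))
    return citation, paragraphs


citation, paragraphs = summarise(SAMPLE)
```

Flattening each <p> to plain text like this is what an NLP component would typically need before looking for references, with the character offsets then mapped back onto the original mark-up.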