Extract medical codes from text

POST /construe/extract

Converts natural language text into structured medical codes.

Usage of CPT is subject to AMA requirements: see PhenoML Terms of Service.

Requires Bearer authentication

Body parameters

text: string, required

Natural language text to extract codes from

system: object, optional

name: string, optional

Code system name. Can be a built-in system or a custom system name.

Built-in systems:

  • SNOMED_CT_US_LITE - version 20240901
  • RXNORM - version 11042024
  • ICD-10-CM - version 2025
  • ICD-10-PCS - version 2025
  • LOINC - version 2.78
  • HPO - version 2025
  • CPT - version 2025

Custom systems:

  • Any valid system name uploaded via /construe/upload. Requires a paid plan.

version: string, optional

Code system version. Must match the version available in your environment.

config: object, optional

chunking_method: string, optional, default sentences

Method for splitting input text into chunks before code extraction.

  • none - Treat the full input as a single chunk.
  • sentences - Split on sentence boundaries (supports citations).
  • paragraphs / topics / soap_note - LLM-based chunking.
  • clinical_ner_extract - Extract clinical concepts (problems, tests, treatments) and use each as a chunk (supports citations).
  • fasthpocr - Extract HPO concepts directly with category annotations (supports citations). Requires system: HPO; other systems are rejected. Causes most other config options to be ignored.

Allowed values: none, sentences, paragraphs, topics, soap_note, clinical_ner_extract, fasthpocr

max_codes_per_chunk: integer, optional

Maximum number of codes to extract per chunk. If not specified, uses system-specific defaults:

  • SNOMED: 10
  • LOINC, HPO, RXNORM: 20
  • All other systems: 5
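
The default rules above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and treating any system name that starts with "SNOMED" (for example SNOMED_CT_US_LITE) as the "SNOMED" case is an assumption, not documented matching behavior.

```python
from typing import Optional

# Defaults as listed above. How the service matches system names is an assumption.
_DEFAULT_MAX_CODES = {"LOINC": 20, "HPO": 20, "RXNORM": 20}


def default_max_codes_per_chunk(system_name: str) -> int:
    """Return the documented default for a code system (hypothetical helper)."""
    name = system_name.upper()
    if name.startswith("SNOMED"):  # assumed to cover SNOMED_CT_US_LITE
        return 10
    return _DEFAULT_MAX_CODES.get(name, 5)  # all other systems: 5


def effective_max_codes(system_name: str, max_codes_per_chunk: Optional[int] = None) -> int:
    """An explicit max_codes_per_chunk wins; otherwise fall back to the default."""
    if max_codes_per_chunk is not None:
        return max_codes_per_chunk
    return default_max_codes_per_chunk(system_name)
```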

code_similarity_filter: number (float), optional, default 0.9

Threshold for filtering similar codes (0.0-1.0)

validation_method: string, optional, default simple

Method for validating extracted codes:

  • none - No validation, returns all candidate codes
  • simple - LLM-based validation
  • medication_search - LLM-based validation tailored for medication concepts
  • chunk_code_jaccard_similarity - Token-level Jaccard similarity between source text chunk and code description

Allowed values: none, simple, medication_search, chunk_code_jaccard_similarity

chunk_code_jaccard_similarity_filtering_threshold: number (float), optional, default 0.2

Minimum Jaccard similarity (0.0-1.0) for a code to be considered valid when using the "chunk_code_jaccard_similarity" validation method. Ignored by other methods.
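
To give a feel for what a token-level Jaccard threshold of 0.2 means, here is a minimal sketch of the general technique. The tokenizer (lowercased alphanumeric runs) and the inclusive comparison are assumptions; the service's actual tokenization and comparison are not documented here.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Assumed tokenizer: lowercase alphanumeric runs, duplicates collapsed."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(chunk: str, description: str) -> float:
    """|A intersect B| / |A union B| over the two token sets."""
    a, b = tokenize(chunk), tokenize(description)
    if not a and not b:
        return 0.0  # avoid dividing by zero when both sides are empty
    return len(a & b) / len(a | b)


def passes_threshold(chunk: str, description: str, threshold: float = 0.2) -> bool:
    """Treat the threshold as an inclusive minimum (an assumption)."""
    return jaccard_similarity(chunk, description) >= threshold
```

Under these assumptions, the chunk "atrophic right kidney" and the description "Cyst of kidney, acquired" share one token out of six distinct tokens, which falls below the default threshold.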

include_rationale: boolean, optional, default true

Whether to include explanations for why each code was extracted

include_ancestors: boolean, optional, default false

Whether to include ancestor/parent codes in the results

include_invalid: boolean, optional, default false

Whether to include codes that failed validation in the results

include_citations: boolean, optional, default false

Whether to include source text citations for each extracted code. Citations show the exact text spans (with character offsets) that led to each code. Supported when chunking_method is "sentences", "clinical_ner_extract", or "fasthpocr".

extraction_context: string, optional

Optional context describing the goal of the extraction. Required when min_context_relevance is greater than 0.

min_context_relevance: number (double), optional, default 0

Minimum relevance score (0.0–1.0) a chunk must reach to proceed to code extraction. Chunks are scored by an LLM against the extraction_context goal. Chunks below this threshold are dropped, reducing noise and extraction cost. Set to 0 (the default) to disable relevance filtering and extract from all chunks. Requires the "extraction_context" field when set above 0.
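
Several parameters above depend on one another, so it can help to check a request body before sending it. The sketch below is a hypothetical client-side helper, not part of the API. It assumes that extraction_context and min_context_relevance sit under config, as their position in the parameter list suggests, and the server may enforce these rules differently.

```python
from typing import Any, Dict, List

# Chunking methods documented above as supporting citations.
CITATION_CHUNKERS = {"sentences", "clinical_ner_extract", "fasthpocr"}


def check_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems: List[str] = []
    config = body.get("config", {})
    system_name = body.get("system", {}).get("name")
    chunking = config.get("chunking_method", "sentences")  # documented default

    if not body.get("text"):
        problems.append("text is required")
    if chunking == "fasthpocr" and system_name != "HPO":
        problems.append("fasthpocr requires system name HPO")
    if config.get("include_citations", False) and chunking not in CITATION_CHUNKERS:
        problems.append("include_citations is not supported by this chunking_method")

    relevance = config.get("min_context_relevance", 0)
    if not 0.0 <= relevance <= 1.0:
        problems.append("min_context_relevance must be between 0.0 and 1.0")
    elif relevance > 0 and not config.get("extraction_context"):
        problems.append("extraction_context is required when min_context_relevance > 0")
    return problems
```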

consistency_effort: string, optional, default none

How much effort to spend ensuring consistent results across repeated requests. Higher levels apply stricter filtering to remove borderline results that may vary between calls, improving determinism at the cost of additional latency.

When validation_method is set to a value other than "none", consistency is applied to the validation step: codes must be unanimously validated across multiple rounds to be included.

When validation_method is "none" and min_context_relevance is set above 0, consistency is applied to the relevance ranking step instead: chunks must pass the relevance threshold in every round to be included.

Allowed values: none, low, medium, high
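
The unanimity rule described above can be pictured as a set intersection across rounds. This is a conceptual sketch only: the function name is hypothetical, and how many rounds each effort level runs is not stated here.

```python
from typing import List, Set


def unanimous(rounds: List[Set[str]]) -> Set[str]:
    """Keep only the items (codes or chunk ids) accepted in every round."""
    if not rounds:
        return set()
    return set.intersection(*rounds)
```

For example, if three validation rounds accept {"Q60.3", "R93.42"}, {"Q60.3"}, and {"Q60.3", "N28.1"}, only "Q60.3" survives.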

Returns

Successfully extracted codes

Response fields

system: object, required

name: string, optional

Code system name. Can be a built-in system or a custom system name.

Built-in systems:

  • SNOMED_CT_US_LITE - version 20240901
  • RXNORM - version 11042024
  • ICD-10-CM - version 2025
  • ICD-10-PCS - version 2025
  • LOINC - version 2.78
  • HPO - version 2025
  • CPT - version 2025

Custom systems:

  • Any valid system name uploaded via /construe/upload. Requires a paid plan.

version: string, optional

Code system version. Must match the version available in your environment.

codes: object[], required

code: string, required

The extracted code

description: string, required

Short description of the code

valid: boolean, required

Whether the code passed validation. Always true unless include_invalid is set to true, in which case invalid codes will have this set to false.

reason: string, optional

Explanation for why this code was extracted (if include_rationale is true)

is_ancestor: boolean, optional

Whether this code is an ancestor (parent) of an extracted code rather than directly extracted. Only present when include_ancestors is true.

citations: object[], optional

Source text references showing where this code was found in the input. Only present when include_citations is true and the chunking method supports citations. Ancestor codes do not receive citations.

text: string, required

The exact text span containing evidence for the code

begin_offset: integer, required

Starting byte offset in the original input text (0-indexed)

end_offset: integer, required

Ending byte offset (exclusive), such that input[begin_offset:end_offset] == text
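
The offset invariant can be checked on the client. The sketch below takes the "byte offset" wording literally and assumes the input is UTF-8 encoded; that encoding is an assumption. For pure ASCII input, byte and character offsets coincide, so the distinction only matters for non-ASCII text.

```python
def citation_matches(input_text: str, citation: dict) -> bool:
    """Check that input[begin_offset:end_offset] == text, using UTF-8 byte offsets."""
    raw = input_text.encode("utf-8")  # assumed encoding for byte offsets
    span = raw[citation["begin_offset"]:citation["end_offset"]]  # end is exclusive
    return span == citation["text"].encode("utf-8")
```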

categories: object[], optional

Higher-level groupings the extracted code belongs to (e.g. HPO category terms). Only populated by full-extraction chunking methods such as "fasthpocr".

uri: string, required

Identifier for the category term (e.g. an HPO URI).

label: string, required

Human-readable label for the category term.
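
When include_ancestors or include_invalid is enabled, the codes array mixes directly extracted, ancestor, and invalid entries. A minimal sketch of separating them using the valid and is_ancestor fields described above (the function name is hypothetical):

```python
from typing import Any, Dict, List, Tuple

Code = Dict[str, Any]


def partition_codes(response: Dict[str, Any]) -> Tuple[List[Code], List[Code], List[Code]]:
    """Split response codes into (direct, ancestors, invalid)."""
    direct: List[Code] = []
    ancestors: List[Code] = []
    invalid: List[Code] = []
    for item in response.get("codes", []):
        if not item["valid"]:  # valid is a required field
            invalid.append(item)
        elif item.get("is_ancestor", False):  # only present with include_ancestors
            ancestors.append(item)
        else:
            direct.append(item)
    return direct, ancestors, invalid
```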

POST Request
curl -X POST 'https://experiment.app.pheno.ml/construe/extract' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "Patient is a 14-year-old female, previously healthy, who is here for evaluation of abnormal renal ultrasound with atrophic right kidney.",
  "system": {
    "name": "ICD-10-CM",
    "version": "2025"
  }
}'
200 OK Response
{
  "system": {
    "name": "ICD-10-CM",
    "version": "2025"
  },
  "codes": [
    {
      "code": "Q60.3",
      "description": "Renal hypoplasia, unspecified",
      "valid": true
    },
    {
      "code": "R93.42",
      "description": "Abnormal findings on diagnostic imaging of kidney",
      "valid": true
    },
    {
      "code": "N28.1",
      "description": "Cyst of kidney, acquired",
      "valid": true
    }
  ]
}
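
The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl example above; the function names are hypothetical, and error handling is omitted.

```python
import json
import urllib.request

API_URL = "https://experiment.app.pheno.ml/construe/extract"


def build_extract_request(text: str, system_name: str, system_version: str,
                          api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {"text": text, "system": {"name": system_name, "version": system_version}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_codes(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling extract_codes(build_extract_request(text, "ICD-10-CM", "2025", "YOUR_API_KEY")) would return a dictionary shaped like the response above, assuming the key is valid.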