Extract medical codes from text
/construe/extract
Converts natural language text into structured medical codes.
Usage of CPT is subject to AMA requirements: see PhenoML Terms of Service.
Body parameters
text (string, required): Natural language text to extract codes from.
system (object, optional)
system.name (string, optional): Code system name. Can be a built-in system or a custom system name.
Built-in systems:
- SNOMED_CT_US_LITE - version 20240901
- RXNORM - version 11042024
- ICD-10-CM - version 2025
- ICD-10-PCS - version 2025
- LOINC - version 2.78
- HPO - version 2025
- CPT - version 2025
Custom systems:
- Any valid system name uploaded via /construe/upload. Requires a paid plan.
system.version (string, optional): Code system version. Must match the version available in your environment.
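The required and optional body fields described so far can be assembled as in the sketch below. It only builds the JSON body; the helper name `build_extract_request` is hypothetical, and the HTTP method, base URL, and authentication are not stated in this section, so no network call is made.

```python
import json
from typing import Any, Optional


def build_extract_request(
    text: str,
    system_name: Optional[str] = None,
    system_version: Optional[str] = None,
    config: Optional[dict] = None,
) -> dict:
    """Assemble a JSON-serializable body for /construe/extract (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    body: dict[str, Any] = {"text": text}
    # "system" is optional; include only the sub-fields that were provided.
    system = {}
    if system_name is not None:
        system["name"] = system_name
    if system_version is not None:
        system["version"] = system_version
    if system:
        body["system"] = system
    if config:
        body["config"] = dict(config)
    return body


body = build_extract_request(
    "Patient reports chest pain and shortness of breath.",
    system_name="ICD-10-CM",
    system_version="2025",
)
print(json.dumps(body, indent=2))
```

Omitted optional fields are left out of the body entirely rather than sent as null, which lets the service apply its own defaults.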
config (object, optional)
config.chunking_method (string, optional, default sentences): Method for splitting input text into chunks before code extraction.
- none - Treat the full input as a single chunk.
- sentences - Split on sentence boundaries (supports citations).
- paragraphs / topics / soap_note - LLM-based chunking.
- clinical_ner_extract - Extract clinical concepts (problems, tests, treatments) and use each as a chunk (supports citations).
- fasthpocr - Extract HPO concepts directly with category annotations (supports citations). Requires system: HPO; other systems are rejected. Causes most other config options to be ignored.
Allowed values: none, sentences, paragraphs, topics, soap_note, clinical_ner_extract, fasthpocr
config.max_codes_per_chunk (integer, optional): Maximum number of codes to extract per chunk. If not specified, uses system-specific defaults:
- SNOMED: 10
- LOINC, HPO, RXNORM: 20
- All other systems: 5
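The default table above can be mirrored on the client, for example to estimate how many codes a request may return. This is a sketch only: `default_max_codes` is a hypothetical helper, and treating every system whose name starts with "SNOMED" (such as SNOMED_CT_US_LITE) as the "SNOMED" row is an assumption, as is custom systems falling under "all other systems".

```python
def default_max_codes(system_name: str) -> int:
    """Return the documented default for max_codes_per_chunk (client-side mirror)."""
    name = system_name.upper()
    # Assumption: the "SNOMED" row covers names such as SNOMED_CT_US_LITE.
    if name.startswith("SNOMED"):
        return 10
    if name in {"LOINC", "HPO", "RXNORM"}:
        return 20
    # Every other system, including ICD-10-CM, ICD-10-PCS, and CPT.
    return 5


for system in ("SNOMED_CT_US_LITE", "RXNORM", "ICD-10-CM"):
    print(system, default_max_codes(system))
```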
config.code_similarity_filter (float, optional, default 0.9): Threshold for filtering similar codes (0.0-1.0).
config.validation_method (string, optional, default simple): Method for validating extracted codes:
- none - No validation, returns all candidate codes
- simple - LLM-based validation
- medication_search - LLM-based validation tailored for medication concepts
- chunk_code_jaccard_similarity - Token-level Jaccard similarity between source text chunk and code description
Allowed values: none, simple, medication_search, chunk_code_jaccard_similarity
config.chunk_code_jaccard_similarity_filtering_threshold (float, optional, default 0.2): Minimum Jaccard similarity (0.0-1.0) for a code to be considered valid when using the "chunk_code_jaccard_similarity" validation method. Ignored by other methods.
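To give a feel for what a token-level Jaccard threshold of 0.2 means, the sketch below computes the similarity between a text chunk and a code description. The tokenizer (lowercased alphanumeric runs) and the inclusive `>=` comparison are assumptions; the service's actual tokenization and comparison are not documented here.

```python
import re


def tokenize(text: str) -> set:
    """Assumed tokenizer: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(chunk: str, description: str) -> float:
    """Size of the token intersection divided by the size of the token union."""
    a, b = tokenize(chunk), tokenize(description)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def passes_threshold(chunk: str, description: str, threshold: float = 0.2) -> bool:
    # Assumption: "minimum" is read as inclusive (>=).
    return jaccard(chunk, description) >= threshold


score = jaccard("type 2 diabetes", "Type 2 diabetes mellitus")
print(score)  # 3 shared tokens out of 4 distinct tokens -> 0.75
```

Because the score depends on shared tokens only, a chunk that paraphrases a code description without reusing its words can score 0.0, which is one reason the default threshold is low.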
config.include_rationale (boolean, optional, default true): Whether to include explanations for why each code was extracted.
config.include_ancestors (boolean, optional, default false): Whether to include ancestor/parent codes in the results.
config.include_invalid (boolean, optional, default false): Whether to include codes that failed validation in the results.
config.include_citations (boolean, optional, default false): Whether to include source text citations for each extracted code. Citations show the exact text spans (with character offsets) that led to each code. Supported when chunking_method is "sentences", "clinical_ner_extract", or "fasthpocr".
config.extraction_context (string, optional): Optional context describing the goal of the extraction. Required when min_context_relevance is greater than 0.
config.min_context_relevance (double, optional, default 0): Minimum relevance score (0.0–1.0) a chunk must reach to proceed to code extraction. Chunks are scored by an LLM against the extraction_context goal. Chunks below this threshold are dropped, reducing noise and extraction cost. Set to 0 (the default) to disable relevance filtering and extract from all chunks. Requires the "extraction_context" field when set above 0.
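Several config options depend on one another, so a client may want to check them before sending a request. The sketch below collects the constraints stated above into one hypothetical helper, `check_config`; how the service itself reports these conditions (error versus silently omitting citations) is not specified here, so the messages are illustrative.

```python
from typing import Optional

# Chunking methods documented as supporting citations.
CITATION_METHODS = {"sentences", "clinical_ner_extract", "fasthpocr"}


def check_config(config: dict, system_name: Optional[str] = None) -> list:
    """Return human-readable problems found in a config dict (hypothetical helper)."""
    problems = []
    method = config.get("chunking_method", "sentences")  # documented default
    if method == "fasthpocr" and system_name != "HPO":
        problems.append("fasthpocr requires system name HPO")
    if config.get("include_citations", False) and method not in CITATION_METHODS:
        problems.append(f"citations are not supported with chunking_method={method!r}")
    relevance = config.get("min_context_relevance", 0)
    if not 0.0 <= relevance <= 1.0:
        problems.append("min_context_relevance must be between 0.0 and 1.0")
    elif relevance > 0 and not config.get("extraction_context"):
        problems.append("extraction_context is required when min_context_relevance > 0")
    return problems


print(check_config({"min_context_relevance": 0.5}))
print(check_config({"chunking_method": "fasthpocr", "include_citations": True}, "HPO"))
```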
config.consistency_effort (string, optional, default none): How much effort to spend ensuring consistent results across repeated requests. Higher levels apply stricter filtering to remove borderline results that may vary between calls, improving determinism at the cost of additional latency.
When validation_method is set to a value other than "none", consistency is applied to the validation step: codes must be unanimously validated across multiple rounds to be included.
When validation_method is "none" and min_context_relevance is set above 0, consistency is applied to the relevance ranking step instead: chunks must pass the relevance threshold in every round to be included.
Allowed values: none, low, medium, high
Successfully extracted codes
Response fields
system (object, required)
system.name (string, optional): Code system name. Can be a built-in system or a custom system name.
Built-in systems:
- SNOMED_CT_US_LITE - version 20240901
- RXNORM - version 11042024
- ICD-10-CM - version 2025
- ICD-10-PCS - version 2025
- LOINC - version 2.78
- HPO - version 2025
- CPT - version 2025
Custom systems:
- Any valid system name uploaded via /construe/upload. Requires a paid plan.
system.version (string, optional): Code system version. Must match the version available in your environment.
codes (object[], required)
codes[].code (string, required): The extracted code.
codes[].description (string, required): Short description of the code.
codes[].valid (boolean, required): Whether the code passed validation. Always true unless include_invalid is set to true, in which case invalid codes will have this set to false.
codes[].reason (string, optional): Explanation for why this code was extracted (if include_rationale is true).
codes[].is_ancestor (boolean, optional): Whether this code is an ancestor (parent) of an extracted code rather than directly extracted. Only present when include_ancestors is true.
codes[].citations (object[], optional): Source text references showing where this code was found in the input. Only present when include_citations is true and the chunking method supports it. Ancestor codes do not receive citations.
codes[].citations[].text (string, required): The exact text span containing evidence for the code.
codes[].citations[].begin_offset (integer, required): Starting byte offset in the original input text (0-indexed).
codes[].citations[].end_offset (integer, required): Ending byte offset (exclusive), such that input[begin_offset:end_offset] == text.
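Because the offsets are described as byte offsets, slicing a Python str directly can give the wrong span when the input contains non-ASCII characters. The sketch below slices the encoded bytes instead. The UTF-8 encoding is an assumption (the encoding is not stated here), and the note and citation are hand-written for illustration, not real service output.

```python
def citation_matches(input_text: str, citation: dict, encoding: str = "utf-8") -> bool:
    """Check input[begin_offset:end_offset] == text using byte offsets."""
    raw = input_text.encode(encoding)  # assumed encoding, see lead-in
    span = raw[citation["begin_offset"]:citation["end_offset"]]
    return span.decode(encoding, errors="replace") == citation["text"]


# "è" occupies two bytes in UTF-8, so byte and character offsets diverge after it.
note = "Fièvre and dry cough"
citation = {"text": "dry cough", "begin_offset": 12, "end_offset": 21}

print(citation_matches(note, citation))  # True: byte slice recovers the span
print(note[12:21])                       # character slice is shifted: "ry cough"
```

For pure ASCII input the two slicing approaches agree, so a mismatch of this kind tends to appear only on notes with accented characters or symbols.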
codes[].categories (object[], optional): Higher-level groupings the extracted code belongs to (e.g. HPO category terms). Only populated by full-extraction chunking methods such as "fasthpocr".
codes[].categories[].uri (string, required): Identifier for the category term (e.g. an HPO URI).
codes[].categories[].label (string, required): Human-readable label for the category term.
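A common way to consume the response is to keep only codes that passed validation and were extracted directly rather than added as ancestors. The sketch below does this over a hand-written sample that follows the field layout above; the sample is illustrative, not actual service output, and `direct_valid_codes` is a hypothetical helper.

```python
# Hand-written sample shaped like the response fields; not real service output.
sample_response = {
    "system": {"name": "ICD-10-CM", "version": "2025"},
    "codes": [
        {
            "code": "E11.9",
            "description": "Type 2 diabetes mellitus without complications",
            "valid": True,
        },
        {
            # Present only because include_ancestors was (hypothetically) true.
            "code": "E11",
            "description": "Type 2 diabetes mellitus",
            "valid": True,
            "is_ancestor": True,
        },
        {
            # Present only because include_invalid was (hypothetically) true.
            "code": "I10",
            "description": "Essential (primary) hypertension",
            "valid": False,
        },
    ],
}


def direct_valid_codes(response: dict) -> list:
    """Return codes that passed validation and are not ancestor entries."""
    return [
        entry["code"]
        for entry in response["codes"]
        # is_ancestor is optional, so treat a missing key as False.
        if entry["valid"] and not entry.get("is_ancestor", False)
    ]


print(direct_valid_codes(sample_response))  # ['E11.9']
```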