UCD.js Docs

Codegen

Architecture notes for @ucdjs/codegen

@ucdjs/codegen turns raw Unicode data files into generated TypeScript artifacts.

Role

  • Loads raw UCD text files into RawDataFile structures.
  • Uses a language model (model-driven field generation) to infer the fields and types of each file.
  • Produces generated interfaces and field-name constants for downstream packages.

Mental Model

There are two layers:

  • processFile() is the shared file-loading primitive
  • runFieldsCodegen() is the higher-level orchestration for field inference and code emission

The package is intentionally narrow. It does not own store discovery or output persistence. It accepts files, processes them, and returns generated content.

Method Flows

processFile(filePath, version, processor)

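Sequence (participants: Caller, processFile(), readFile(), RawDataFile, processor callback):

  • Caller calls processFile(filePath, version, processor).
  • processFile() reads the file from disk via readFile().
  • If the read fails, readFile() reports an error and processFile() returns null to the caller.
  • Otherwise readFile() returns the UTF-8 content and processFile() constructs new RawDataFile(content).
  • processFile() invokes processor(datafile, fileName, version), which returns a result or null.
  • processFile() returns that result or null to the caller.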
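The processFile() flow can be sketched roughly as follows. This is a minimal illustration, not the package's actual source: the RawDataFile class here is a stand-in that only stores the content, the Processor type is hypothetical, and deriving fileName from the base name of the path is an assumption.

```typescript
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

// Stand-in for the real RawDataFile (assumption): it only keeps the raw text.
class RawDataFile {
  constructor(public readonly content: string) {}
}

// Hypothetical callback type matching processor(datafile, fileName, version).
type Processor<T> = (
  datafile: RawDataFile,
  fileName: string,
  version: string,
) => Promise<T | null> | T | null;

async function processFile<T>(
  filePath: string,
  version: string,
  processor: Processor<T>,
): Promise<T | null> {
  let content: string;
  try {
    // Read the file as UTF-8 text.
    content = await readFile(filePath, "utf-8");
  } catch {
    // Read failure: return null so a batch run can continue.
    return null;
  }

  const datafile = new RawDataFile(content);
  // Assumption: fileName is the base name of the path, extension included.
  return processor(datafile, basename(filePath), version);
}
```

The sketch only maps read failures to null, as in the flow above. Whether exceptions thrown by the processor are also converted to null is not stated here, so that branch is left out.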

runFieldsCodegen()

Sequence (participants: Caller, runFieldsCodegen(), LanguageModel, concurrency limiter, generateFields(), interface + const builders):

  • Caller calls runFieldsCodegen({ files, model/openaiKey }).
  • runFieldsCodegen() creates the model and the concurrency limiter.
  • For each input file, processing is scheduled through the limiter.
  • Inline content: fields are generated from RawDataFile(content).
  • File path: processFile(filePath, version, callback) is called, and the callback generates fields from the RawDataFile.
  • generateFields() returns the inferred fields, and the builders turn them into generated TypeScript source.
  • runFieldsCodegen() returns the non-null generated files to the caller.
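The orchestration pattern in runFieldsCodegen() (schedule each file through a limiter, then keep only non-null results) might look like the sketch below. Everything in it is illustrative: createLimiter, runFieldsCodegenSketch, the InputFile, Field, and Generated types, and the injected inferFields function (standing in for the language model call) are hypothetical names. Only the inline-content branch is shown, and treating a model failure as a skipped file is an assumption.

```typescript
// Hypothetical shapes used only for this sketch.
type Field = { name: string; type: string };
type InputFile = { fileName: string; version: string; content: string };
type Generated = { fileName: string; version: string; fields: Field[] };
type InferFields = (content: string) => Promise<Field[] | null>;

// Minimal concurrency limiter: at most `limit` tasks run at the same time.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // hand the slot directly to the next waiter
      else active--;
    }
  };
}

async function runFieldsCodegenSketch(
  files: InputFile[],
  inferFields: InferFields,
  concurrency = 4,
): Promise<Generated[]> {
  const limit = createLimiter(concurrency);

  const results = await Promise.all(
    files.map((file) =>
      limit(async (): Promise<Generated | null> => {
        try {
          const fields = await inferFields(file.content);
          if (!fields) return null;
          return { fileName: file.fileName, version: file.version, fields };
        } catch {
          // Assumption: a failed inference skips the file instead of aborting the batch.
          return null;
        }
      }),
    ),
  );

  // Only non-null generated files are returned to the caller.
  return results.filter((r): r is Generated => r !== null);
}
```

Because the limiter wraps the whole per-file task, the inference function itself stays unaware of concurrency, which matches the idea of limiting at the orchestration layer.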

Generated artifact assembly

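Sequence (participants: Caller, generateFieldsCode(), knitwork builders):

  • Caller passes fields and file metadata to generateFieldsCode().
  • generateFieldsCode() dedupes field names.
  • buildInterface() is called with the PascalCase file name.
  • buildStringArray() is called with the field names.
  • generateFieldsCode() returns { fields, code, fileName, version } to the caller.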
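As a rough illustration of the assembly step, the sketch below dedupes field names, derives a PascalCase interface name, and emits an interface plus a field-name array. It uses plain string templates instead of the knitwork builders, and several details are assumptions: the Field shape, the first-occurrence-wins dedupe rule, the constant's name, and the helper names toPascalCase and generateFieldsCodeSketch.

```typescript
// Hypothetical field shape for this sketch.
type Field = { name: string; type: string };

// "emoji-data.txt" becomes "EmojiData": drop the extension, split on non-alphanumerics.
function toPascalCase(fileName: string): string {
  return fileName
    .replace(/\.[^.]+$/, "")
    .split(/[^A-Za-z0-9]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Assumption: when a field name repeats, the first occurrence wins.
function dedupeFields(fields: Field[]): Field[] {
  const seen = new Set<string>();
  const unique: Field[] = [];
  for (const field of fields) {
    if (seen.has(field.name)) continue;
    seen.add(field.name);
    unique.push(field);
  }
  return unique;
}

function generateFieldsCodeSketch(fields: Field[], fileName: string, version: string) {
  const unique = dedupeFields(fields);
  const typeName = toPascalCase(fileName);

  // Plain string templates stand in for the interface and string-array builders.
  const members = unique.map((f) => `  ${f.name}: ${f.type};`).join("\n");
  const names = unique.map((f) => JSON.stringify(f.name)).join(", ");

  const code =
    `export interface ${typeName} {\n${members}\n}\n\n` +
    // The constant name is hypothetical.
    `export const ${typeName}Fields = [${names}] as const;\n`;

  return { fields: unique, code, fileName, version };
}
```

Nothing in this step depends on the model, so calling it twice with the same fields and metadata yields the same source text.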

Design Notes

  • processFile() deliberately returns null on failure so batch codegen can continue.
  • Field inference is model-based, but code emission is deterministic once fields are known.
  • Concurrency limiting happens at the orchestration layer rather than inside the processor callback.
  • Generated identifiers are sanitized before code emission.
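To make the last note concrete, here is one way identifier sanitization could work. The rules shown (replace invalid character runs with an underscore, prefix a leading digit, suffix a reserved word) are assumptions for illustration, and sanitizeIdentifier is a hypothetical helper rather than the package's function.

```typescript
// A few JavaScript reserved words; a real list would be longer.
const RESERVED = new Set(["class", "default", "delete", "function", "new", "return", "var"]);

function sanitizeIdentifier(raw: string): string {
  // Collapse every run of characters that cannot appear in an identifier.
  let id = raw.trim().replace(/[^A-Za-z0-9_$]+/g, "_");
  if (id === "") return "_";
  // Identifiers cannot start with a digit.
  if (/^[0-9]/.test(id)) id = `_${id}`;
  // Avoid emitting a reserved word as a name.
  if (RESERVED.has(id)) id = `${id}_`;
  return id;
}
```

Sanitizing before emission keeps the emitted interface and constant syntactically valid even when a model returns a field name containing spaces or punctuation.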

Testing Use

Areas worth covering in tests:

  • disk-backed vs inline-content processing
  • processor failure and null filtering
  • duplicate field-name handling
  • deterministic code emission from inferred fields
  • model wiring and concurrency behavior in runFieldsCodegen()
