Usage

Basic Usage

Note

To load images, we rely on the OpenSlide library. Please only use images that are supported by OpenSlide. If you encounter any issues with loading images, please check the OpenSlide documentation for supported formats. If you are using a virtual environment, ensure it is activated before running the command. If you want to use .tiff or other plain images that are not supported by OpenSlide, install libvips and convert them to pyramid format. For example, the following command converts a plain image to pyramid format:

vips tiffsave input.tiff output.tiff --tile --tile-width 256 --tile-height 256 --pyramid --compression jpeg --Q 90 --vips-progress

This will create a pyramid image that can be used with CellViT.
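When many files need converting, the command above can be scripted. The following is a minimal sketch, assuming the `vips` executable is available on the PATH. The helper names `build_vips_command` and `convert_to_pyramid` and the file names are hypothetical; the sketch only assembles the same arguments shown above and hands them to `subprocess.run`.

```python
import subprocess
from pathlib import Path


def build_vips_command(src: Path, dst: Path, tile_size: int = 256, quality: int = 90) -> list[str]:
    """Assemble the `vips tiffsave` call shown above as an argument list."""
    return [
        "vips", "tiffsave", str(src), str(dst),
        "--tile",
        "--tile-width", str(tile_size),
        "--tile-height", str(tile_size),
        "--pyramid",
        "--compression", "jpeg",
        "--Q", str(quality),
        "--vips-progress",
    ]


def convert_to_pyramid(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if vips exits with an error."""
    subprocess.run(build_vips_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(" ".join(build_vips_command(Path("input.tiff"), Path("output.tiff"))))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.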

This package is designed as a command-line tool. Configuration can be provided either directly via the CellViT CLI or by using a configuration file. The configuration file is a YAML file containing the settings for the inference pipeline. The main script is located in the cellvit module, and can be run using the following command:

cellvit-inference

You then have to either specify a configuration file

cellvit-inference --config <path_to_config_file>

or provide the required parameters directly in the command line. To list all available parameters, run:

cellvit-inference --help

You can run inference either on a single slide or on a batch of slides.

Configuration Options

The configuration options are divided into several sections, each with its own purpose. Below is a summary of the main sections and their parameters. They appear in the .yaml file as well as in the CLI.

Danger

Exactly one of the process_wsi or process_dataset options must be selected. They are mutually exclusive.

Section	Name	Description	Type	Default	Required
General
	model	Segmentation model to use. Choices: ["SAM", "HIPT"]	str	➖	✔️
	nuclei_taxonomy	Defines the nuclei classification taxonomy. Choices: ["binary", "pannuke", "consep", "lizard", "midog", "nucls_main", "nucls_super", "ocelot", "panoptils"]	str	"pannuke"	➖
Inference
	gpu	GPU ID to use for inference	int	0	➖
	enforce_amp	Whether to use Automatic Mixed Precision (AMP) for inference	bool	false	➖
	batch_size	Number of images (1024 x 1024 patches) processed per batch	int	8	➖
Output Settings
	outdir	Path to the output directory where results will be stored	str	➖	✔️
	geojson	Whether to export results in GeoJSON format (for QuPath or other tools)	bool	false	➖
	graph	Whether to generate a cell graph representation	bool	false	➖
	compression	Whether to use Snappy compression for output files	bool	false	➖
System
	cpu_count	Number of CPU cores to use for inference	int	System configuration	➖
	ray_worker	Number of Ray workers to use for inference (limited by cpu_count)	int	System configuration	➖
	ray_remote_cpus	Number of CPUs per ray worker	int	System configuration	➖
	memory	RAM in MB to use	int	System configuration	➖
Debug
	debug	Enables debug mode. This changes the logger level and requires ray[default]. Also outputs the segmentation mask of the tissue preprocessing	bool	false	➖
Processing Mode: Process a Single Whole Slide Image (WSI)
	wsi_path	Path to the Whole Slide Image (WSI) file	str	➖	✔️
	wsi_mpp	Microns per pixel (spatial resolution of the slide)	float	Extracted automatically from file (if available)	➖
	wsi_magnification	Magnification level of the slide (e.g., 40)	int	Extracted automatically from file (if available)	➖
Processing Mode: Process a Dataset (Multiple WSI Files)
	wsi_folder	Path to a folder containing multiple WSI files	str	➖	✔️ (if wsi_filelist is NOT used)
	wsi_filelist	Path to a CSV file listing WSI files (must have a 'path' column)	str	➖	✔️ (if wsi_folder is NOT used)
	wsi_extension	File extension of WSI files (used for detection within wsi_folder)	str	svs	➖
	wsi_mpp	Microns per pixel (spatial resolution of the slide). Overwrites slide settings and also mpp set in the filelist	float	Extracted automatically from file (if available)	➖
	wsi_magnification	Magnification level of the slide (e.g., 40). Overwrites slide settings and also magnification set in the filelist	int	Extracted automatically from file (if available)	➖

YAML-Configuration

The configuration file for CellViT Inference is structured in .yaml format. Below is an example configuration with explanations for each setting.

# ==========================
# CellViT Inference Config
# ==========================

# Model selection (REQUIRED)
model:                # REQUIRED | str: Segmentation model to use.
                      # Choices: ["SAM", "HIPT"]
# Nuclei classification taxonomy (OPTIONAL)
nuclei_taxonomy:      # OPTIONAL | str: Defines the nuclei classification taxonomy.
                      # Choices: ["binary", "pannuke", "consep", "lizard", "midog", "nucls_main", "nucls_super", "ocelot", "panoptils"]
                      # Default: "pannuke"

# ==========================
# Inference Settings (OPTIONAL)
# ==========================
inference:
  gpu:                # OPTIONAL | int: GPU ID to use for inference.
                      # Default: 0 (use first available GPU)
  enforce_amp:        # OPTIONAL | bool: Whether to use Automatic Mixed Precision (AMP) for inference.
                      # Default: false (disabled)
  batch_size:         # OPTIONAL | int: Number of images (1024 x 1024 patches) processed per batch.
                      # Default: 8

# ==========================
# Output Settings
# ==========================
output_format:
  outdir:             # REQUIRED | str: Path to the output directory where results will be stored.
  geojson:            # OPTIONAL | bool: Whether to export results in GeoJSON format (for QuPath or other tools).
                      # Default: false (disabled)
  graph:              # OPTIONAL | bool: Whether to generate a cell graph representation.
                      # Default: false (disabled)
  compression:        # OPTIONAL | bool: Whether to use Snappy compression for output files.
                      # Default: false (disabled)

# ==========================
# Processing Mode (Choose One)
# ==========================
# Either `process_wsi` (for a single image) or `process_dataset` (for multiple images) should be used.

# --- Process a Single Whole Slide Image (WSI) ---
process_wsi:
  wsi_path:           # REQUIRED | str: Path to the Whole Slide Image (WSI) file.
  wsi_mpp:            # OPTIONAL | float: Microns per pixel (spatial resolution of the slide).
                      # Default: Extracted automatically from file (if available).
  wsi_magnification:  # OPTIONAL | int: Magnification level of the slide (e.g., 20, 40).
                      # Default: Extracted automatically from file (if available).

# --- Process a Dataset (Multiple WSI Files) ---
process_dataset:
  wsi_folder:         # REQUIRED (if `wsi_filelist` is NOT used) | str: Path to a folder containing multiple WSI files.
                      # Either `wsi_folder` OR `wsi_filelist` must be provided (not both).
  wsi_extension:      # OPTIONAL | str: File extension of WSI files (used for detection within wsi_folder).
                      # Default: "svs"
  wsi_filelist:       # REQUIRED (if `wsi_folder` is NOT used) | str: Path to a CSV file listing WSI files.
                      # CSV must have a 'path' column, with optional 'wsi_mpp' and 'wsi_magnification' columns.
                      # If 'wsi_mpp' and 'wsi_magnification' are provided, they override global settings.
  wsi_mpp:            # OPTIONAL | float: Microns per pixel (spatial resolution).
                      # Default: Extracted automatically from file (if available).
                      # Can be used with both `wsi_folder` and `wsi_filelist`.
  wsi_magnification:  # OPTIONAL | int: Magnification level of the slides.
                      # Default: Extracted automatically from file (if available).
                      # Can be used with both `wsi_folder` and `wsi_filelist`.

# ==========================
# System Settings (OPTIONAL)
# ==========================
system:
  cpu_count:          # OPTIONAL | int: Number of CPU cores to use for inference.
                      # Default: Uses system configuration.
  ray_worker:         # OPTIONAL | int: Number of ray workers to use for inference. Limited by cpu_count.
                      # Default: Uses system configuration.
  ray_remote_cpus:    # OPTIONAL | int: Number of CPUs per ray worker.
                      # Default: Uses system configuration.
  memory:             # OPTIONAL | int: RAM in MB to use.
                      # Default: Uses system configuration.

# ==========================
# Debug Settings (OPTIONAL)
# ==========================
debug:                # OPTIONAL | bool: Enables debug mode. This changes the logger level and requires ray[default].
                      # Default: false

Examples for .yaml configuration files can be found in the Examples section.
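For scripted runs it can be convenient to generate a filled-in configuration instead of editing the template by hand. The following is a minimal sketch using only the standard library. The helper `to_yaml`, the paths, and the chosen values are hypothetical, and the sketch assumes that optional keys may simply be omitted so that their defaults apply.

```python
import json


def to_yaml(data: dict, indent: int = 0) -> str:
    """Render a nested dict of scalars as block-style YAML.

    Scalars are written with json.dumps, whose output for str, int, float
    and bool is also valid YAML. Lists are not handled in this sketch.
    """
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return "\n".join(lines)


# Key layout follows the template above; all values are illustrative only.
config = {
    "model": "SAM",
    "nuclei_taxonomy": "pannuke",
    "inference": {"gpu": 0, "enforce_amp": False, "batch_size": 8},
    "output_format": {"outdir": "./results", "geojson": True},
    "process_wsi": {"wsi_path": "./slides/example.svs"},
}

if __name__ == "__main__":
    # Redirect this output to a file and pass it via --config.
    print(to_yaml(config))
```

If PyYAML is installed, `yaml.safe_dump(config, sort_keys=False)` serves the same purpose and also handles lists.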

Note

The configuration file must be in YAML format.
Either run a single WSI or a dataset of WSIs, but not both at the same time.
The wsi_path parameter (process_wsi) and the wsi_folder or wsi_filelist parameters (process_dataset) are mutually exclusive.
The wsi_mpp and wsi_magnification parameters can be set globally or per WSI in the file list.
The output_format section allows you to customize the output format and compression settings.
The system section allows you to customize the CPU and memory settings for inference.
The debug section allows you to enable debug mode for more detailed logging.
The configuration file can be passed as a command-line argument using the --config flag.
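The rules in this note can be checked before a run is started. The following is a minimal pre-flight sketch that operates on an already parsed configuration (a plain dict). It is not CellViT's own validation; the function name `check_config` and the error messages are hypothetical.

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed config dict (empty if none)."""
    problems = []
    if config.get("model") not in ("SAM", "HIPT"):
        problems.append("model must be 'SAM' or 'HIPT'")
    if not (config.get("output_format") or {}).get("outdir"):
        problems.append("output_format.outdir is required")

    # Exactly one processing mode may be present.
    has_wsi = bool(config.get("process_wsi"))
    has_dataset = bool(config.get("process_dataset"))
    if has_wsi == has_dataset:
        problems.append("exactly one of process_wsi / process_dataset must be set")
    elif has_wsi:
        if not config["process_wsi"].get("wsi_path"):
            problems.append("process_wsi.wsi_path is required")
    else:
        # Within process_dataset, exactly one source of slides may be given.
        dataset = config["process_dataset"]
        has_folder = bool(dataset.get("wsi_folder"))
        has_filelist = bool(dataset.get("wsi_filelist"))
        if has_folder == has_filelist:
            problems.append("exactly one of wsi_folder / wsi_filelist must be set")
    return problems


if __name__ == "__main__":
    example = {
        "model": "SAM",
        "output_format": {"outdir": "./results"},
        "process_wsi": {"wsi_path": "./slides/example.svs"},  # hypothetical path
    }
    print(check_config(example))
```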

CLI-Configuration

The CLI configuration allows you to specify the parameters directly in the command line.

General configuration

usage: cellvit-inference [-h] [--config CONFIG] [--model {SAM,HIPT}] [--nuclei_taxonomy {binary,pannuke,consep,lizard,midog,nucls_main,nucls_super,ocelot,panoptils}] [--gpu GPU]
                  [--enforce_amp] [--batch_size BATCH_SIZE] [--outdir OUTDIR] [--geojson] [--graph] [--compression] [--cpu_count CPU_COUNT] [--ray_worker RAY_WORKER]
                  [--ray_remote_cpus RAY_REMOTE_CPUS] [--memory MEMORY] [--debug]
                  {process_wsi,process_dataset} ...

Perform CellViT++ inference

positional arguments:
  {process_wsi,process_dataset}
                        Select processing mode
    process_wsi         Process a single Whole Slide Image
    process_dataset     Process multiple WSI files

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to a YAML configuration file. If provided, CLI arguments are ignored. (default: None)
  --model {SAM,HIPT}    Segmentation model to use (default: None), REQUIRED
  --nuclei_taxonomy {binary,pannuke,consep,lizard,midog,nucls_main,nucls_super,ocelot,panoptils}
                        Defines the nuclei classification taxonomy (default: pannuke), OPTIONAL
  --debug               Enable debug mode (changes logger level and requires ray[default]) (default: False), OPTIONAL

Inference Settings:
  --gpu GPU             GPU ID to use for inference (default: 0), OPTIONAL
  --enforce_amp         Whether to use Automatic Mixed Precision (AMP) for inference (default: False), OPTIONAL
  --batch_size BATCH_SIZE
                        Number of images processed per batch (default: 8), OPTIONAL

Output Settings:
  --outdir OUTDIR       Path to the output directory where results will be stored (default: None), REQUIRED
  --geojson             Whether to export results in GeoJSON format (for QuPath or other tools) (default: False), OPTIONAL
  --graph               Whether to generate a cell graph representation (default: False), OPTIONAL
  --compression         Whether to use Snappy compression for output files (default: False), OPTIONAL

System Settings:
  --cpu_count CPU_COUNT
                        Number of CPU cores to use for inference (default: None), OPTIONAL
  --ray_worker RAY_WORKER
                        Number of ray worker to use for inference (limited by cpu-count) (default: None), OPTIONAL
  --ray_remote_cpus RAY_REMOTE_CPUS
                        Number of CPUs per ray worker (default: None), OPTIONAL
  --memory MEMORY       RAM in MB to use (default: None), OPTIONAL

Process a single image

The general options described above must be placed before the process_wsi subcommand, and the options specific to it come after:

cellvit-inference [previous options] process_wsi [wsi_options]

The process_wsi options are:

usage: cellvit-inference process_wsi [-h] --wsi_path WSI_PATH [--wsi_mpp WSI_MPP]
                                    [--wsi_magnification WSI_MAGNIFICATION]

options:
  -h, --help            show this help message and exit
  --wsi_path WSI_PATH   Path to the Whole Slide Image (WSI) file, REQUIRED
  --wsi_mpp WSI_MPP     Microns per pixel (spatial resolution of the slide), OPTIONAL
                        Default: Extracted automatically from file (if available)
  --wsi_magnification WSI_MAGNIFICATION
                        Magnification level of the slide (e.g., 40), OPTIONAL
                        Default: Extracted automatically from file (if available)
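The ordering of general options, subcommand, and subcommand options can be sketched as a small command builder. This is an illustrative helper, not part of CellViT; the function name `build_wsi_command` and the paths are hypothetical, and only flags listed in the help text above are used.

```python
import shlex
from typing import Optional


def build_wsi_command(
    model: str,
    outdir: str,
    wsi_path: str,
    geojson: bool = False,
    wsi_mpp: Optional[float] = None,
    wsi_magnification: Optional[int] = None,
) -> list[str]:
    """Assemble a single-slide call: general options first, then the subcommand."""
    cmd = ["cellvit-inference", "--model", model, "--outdir", outdir]
    if geojson:
        cmd.append("--geojson")  # switch without a value: present or absent
    cmd += ["process_wsi", "--wsi_path", wsi_path]
    if wsi_mpp is not None:
        cmd += ["--wsi_mpp", str(wsi_mpp)]
    if wsi_magnification is not None:
        cmd += ["--wsi_magnification", str(wsi_magnification)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and values, used only for illustration.
    cmd = build_wsi_command("SAM", "./results", "./slides/example.svs",
                            geojson=True, wsi_mpp=0.25, wsi_magnification=40)
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`.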

Process a dataset

cellvit-inference [previous options] process_dataset [wsi_options]

The process_dataset options are:

usage: cellvit-inference process_dataset [-h] (--wsi_folder WSI_FOLDER | --wsi_filelist WSI_FILELIST) [--wsi_extension WSI_EXTENSION] [--wsi_mpp WSI_MPP]
                                    [--wsi_magnification WSI_MAGNIFICATION]

options:
  -h, --help            show this help message and exit
  --wsi_folder WSI_FOLDER
                        Path to a folder containing multiple WSI files, REQUIRED if wsi_filelist is NOT used
  --wsi_filelist WSI_FILELIST
                        Path to a CSV file listing WSI files (must have a 'path' column), REQUIRED if wsi_folder is NOT used
  --wsi_extension WSI_EXTENSION
                        File extension of WSI files (used for detection within wsi_folder), OPTIONAL
  --wsi_mpp WSI_MPP     Microns per pixel (spatial resolution), OPTIONAL
                        Default: Extracted automatically from file (if available)
                        Can be used with both wsi_folder and wsi_filelist
  --wsi_magnification WSI_MAGNIFICATION
                        Magnification level of the slides, OPTIONAL
                        Default: Extracted automatically from file (if available)
                        Can be used with both wsi_folder and wsi_filelist
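The notation `(--wsi_folder WSI_FOLDER | --wsi_filelist WSI_FILELIST)` in the usage line means that exactly one of the two options must be given. The following sketch reproduces that behavior with a required mutually exclusive group from Python's `argparse`. It is a stand-alone illustration and not CellViT's actual parser; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mirrors the process_dataset options above."""
    parser = argparse.ArgumentParser(prog="process_dataset")
    # required=True: one of the two must be given; the group forbids giving both.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--wsi_folder", type=str)
    source.add_argument("--wsi_filelist", type=str)
    parser.add_argument("--wsi_extension", type=str, default="svs")
    parser.add_argument("--wsi_mpp", type=float, default=None)
    parser.add_argument("--wsi_magnification", type=int, default=None)
    return parser


if __name__ == "__main__":
    # Hypothetical folder, used only for illustration.
    args = build_parser().parse_args(["--wsi_folder", "./slides", "--wsi_mpp", "0.25"])
    print(args)
```

Passing both options, or neither, makes `argparse` print an error and exit with a non-zero status.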

Note

The --wsi_path option (process_wsi) and the --wsi_folder or --wsi_filelist options (process_dataset) are mutually exclusive.
The wsi_mpp and wsi_magnification parameters can be set globally or per WSI in the file list.
The output settings allow you to customize the output format and compression.
The system settings allow you to customize the CPU and memory usage for inference.
The --debug flag enables debug mode for more detailed logging.
The configuration file can be passed as a command-line argument using the --config flag.
The --wsi_folder option is used to specify a folder containing multiple WSI files.
The --wsi_filelist option is used to specify a CSV file listing WSI files, even from different folders. Provide the full WSI paths in the path column.
The --wsi_extension option is used to specify the file extension of WSI files (e.g., "svs").
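A file list for the wsi_filelist option can be written with Python's `csv` module. The following is a minimal sketch using the column names given in the YAML template ('path', plus the optional 'wsi_mpp' and 'wsi_magnification'). The helper `write_filelist` and the slide paths are hypothetical, and leaving optional cells empty for slides without known values is an assumption that should be verified against the actual loader.

```python
import csv
import io
from typing import Iterable, TextIO

FIELDS = ["path", "wsi_mpp", "wsi_magnification"]


def write_filelist(rows: Iterable[dict], stream: TextIO) -> None:
    """Write one CSV row per slide; keys missing from a row become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical slides located in different folders.
    slides = [
        {"path": "/data/a/slide_1.svs", "wsi_mpp": 0.25, "wsi_magnification": 40},
        {"path": "/data/b/slide_2.svs"},
    ]
    buffer = io.StringIO()
    write_filelist(slides, buffer)
    print(buffer.getvalue())
```

To write to disk, open the target file with `newline=""` as recommended by the `csv` module and pass the file object as `stream`.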