Concepts

Information on some key concepts in the anonapi CLI

Batch

A file holding one or more job ids. This makes it possible to easily query or modify several jobs at once. See the batch command.

Mapping

A file that contains everything that is needed to create one or more anonymization jobs.

A typical mapping file will look like this:

## Description ##
Mapping created February 12 2020

## Options ##
root_source_path, \\server\share2\data
project,          Wetenschap-Algemeen
destination_path, \\server\share\folder

## Mapping ##
source,                            patient_id, patient_name, description
folder:example/folder1,            001,        Patient1,     All files from folder1
study_instance_uid:123.12178,      002,        Patient2,     A StudyInstanceUID from PACS
accession_number:12345678.1234567, 003,        Patient3,     An AccessionNumber from PACS
fileselection:a/fileselection.txt, 004,        Patient4,     A selection of files in a

This is a CSV (comma separated values) file that can be edited by any editor. The most convenient way to edit is probably the edit command.

A mapping consists of three sections:

Description

This can contain any text. A description of what this mapping is for

Options

Parameters that are the same for each job. The following parameters can be set:

Parameter

Description

destination_path

Write data to this UNC path after anonymization

pims_key

Use this PIMS project to pseudonymize

project

Anonymize according to this project

root_source_path

Path sources are all relative to this UNC path

Note

Any paths defined in this section have to be UNC paths. No windows drive letters like H:\ or linux mounts such as /mnt/data allowed

Mapping

Parameters that are different for each job. The following parameters can be set:

Parameter

Description

description

Job description, free text

pseudo_id

Pseudonym for Patient ID to set in anonymized data

pseudo_name

Pseudonym for Patient name to set in anonymized data

source

Data to anonymize comes from this source

The value of the source parameter is a source identifier. The different types of identifiers are listed below.

For an overview of map functions, see map.

Input file

A csv or excel file that contains one or more columns with folders, pseudonyms or accession numbers. A file like this can be used as an input for map functions such as add-study-folders to add multiple values at once.

Example input file containing folders and pseudonyms:

folder        pseudonym
folder1       studyA
folder2/st1   studyB
folder2/st2   studyC

The column headers (‘folder’ and ‘pseudonym’ above) are used to identify type of data and to find where the columns are in the file. The following column types are currently supported:

Parameter

Allowed column names

accession_number

accession number, acc nr

path

folder, map, path

pseudo_name

pseudoID, pseudonym, name

Finding column headers ignores case and space characters. For example, the following are all valid column headers for accession number: accession number, Accession Number, accession_number, accession-number, AccessionNumber

Information that is not recognized as valid is ignored. For example, the following input file is valid and contains the same information as the example given above:

Some descriptive text that will just be ignored when
parsing this as an input file.

Columns with headers that are not recognized are ignored as well.
Below, 'folder' and 'pseudonym' will be recognized, others ignored

folder        value   pseudonym  comment
folder1       A       studyA
folder2/st1   A       studyB     this column
folder2/st2   B       studyC     will be ignored

Source Identifier

Used in mapping to indicate where the data for a job is coming from. Always of the form <identifier_type>:<value>. Types of identifiers:

Folder

Example: folder:mydata/experiment1

Refers to all files in the given folder, relative to the source root path.

Note

If the folder contains any files that are not valid DICOM, the job will fail. Only use this identifier if you want to anonymize all files in a folder, and the folder contains only valid DICOM

File selection

Example: fileselection:mydata/patient1/fileselection.txt

Refers to all the paths listed in the fileselection file. Contrary to the Folder identifier, file selection can be used in a folder where there are non-DICOM files or where only part of the files should be anonymized. When creating a fileselection with add-study-folders or add, non-DICOM files can be excluded automatically

Study instance UID

Example: study_instance_uid:123.1217.23234.2323

Refers to a single study. The anonymization server will retrieve this study from PACS by matching the DICOM tag StudyInstanceUID.

Accession number

Example: accession_number:12345678.1234567

Refers to a single study. The anonymization server will retrieve this study from PACS by matching the DICOM tag AccessionNumber.

Job

The basic unit of information on an anonymization server. A job specifies three things. Where the data is, how to anonymize it and where it should go. For working with jobs see job.

File Selection

A file typically called fileselection.txt that contains a list of paths. A selection can be a data source for a job. It makes it possible to specify which files should be sent for anonymization and which should not. Methods like add-study-folders and add only include valid DICOM files in a selection.

The contents of a typical file selection that contains 4 file paths:

description: a typical file selection
id: bfc33f5e-d1cc-472e-aa05-31a5979d52be
selected_paths:
- folder1/1.dcm
- folder1/2.dcm
- folder2/1.dcm
- folder4/raw/raw1.dcm

A selection file can be edited by any text editor. See select.

Note

Selected paths are always relative to the location of fileselection.txt. Selected paths are always in a path on or below the selection file.

Server

An anonymization server fetches, anonymizes and delivers your data according to the jobs it has in its database. Servers can retrieve data from PACS or from network shares. The anonapi CLI can work with multiple servers. See Server commands.

UNC paths

Any path sent to the anonymization server should be a UNC path. A UNC path is any path starting with:

\\<server_name>\<share_name>

For example:

\\umcfilesp01\research\folder1\file.dcm
\\server1\share2\myfolder\

UNC paths are mandatory for creating anonymization jobs because they are well supported in most operating systems and unambiguous. In contrast, windows drive letters such as C:\, mapped network drives such as X:\ and linux mounts like /mnt/share1 can refer to different locations on different computers.

You can find more unc_path_info online.

Finding a UNC path

Windows

In windows shares are often mapped to a drive letter such as H:\ or X:\. To find the UNC path for these drive letters, open windows explorer (start menu -> explorer) and expand the computer icon in the lower left side:

Finding UNC paths in windows

In this example (H:) radngdata$ (\\umcfs097) corresponds to the UNC path \\umcfs097\radngdata$ note the path in this case includes the final $

Linux

In linux UNC paths are mounted in fstab. Use:

$ less  /etc/fstab

To find out which UNC path is mapped to which mount point.