Concepts

Key concepts used in the anonapi CLI.

Batch

A file holding one or more job ids. This makes it possible to easily query or modify several jobs at once. See the batch command.

Mapping

A file that contains everything that is needed to create one or more anonymization jobs.

A typical mapping file will look like this:

## Description ##
Mapping created February 12 2020

## Options ##
root_source_path, \\server\share2\data
project,          Wetenschap-Algemeen
destination_path, \\server\share\folder

## Mapping ##
source,                            patient_id, patient_name, description
folder:example/folder1,            001,        Patient1,     All files from folder1
study_instance_uid:123.12178,      002,        Patient2,     A StudyInstanceUID from PACS
accession_number:12345678.1234567, 003,        Patient3,     An AccessionNumber from PACS
fileselection:a/fileselection.txt, 004,        Patient4,     A selection of files in a

This is a CSV (comma-separated values) file that can be edited in any text editor. The most convenient way to edit it is probably the edit command.

A mapping consists of three sections:

Description

This can contain any text, for example a description of what this mapping is for.

Options

Parameters that are the same for each job. The following parameters can be set:

Parameter	Description
destination_path	Write data to this UNC path after anonymization
pims_key	Use this PIMS project to pseudonymize
project	Anonymize according to this project
root_source_path	Path sources are all relative to this UNC path

Note

Any paths defined in this section have to be UNC paths. Windows drive letters like H:\ and Linux mounts such as /mnt/data are not allowed.

Mapping

Parameters that are different for each job. The following parameters can be set:

Parameter	Description
description	Job description, free text
pseudo_id	Pseudonym for Patient ID to set in anonymized data
pseudo_name	Pseudonym for Patient name to set in anonymized data
source	Data to anonymize comes from this source

The value of the source parameter is a source identifier. The different types of identifiers are listed below.

For an overview of map functions, see map.
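To make the three-section layout concrete, here is a minimal sketch of how such a file could be read with the Python standard library. The function parse_mapping and the SAMPLE text are hypothetical and only illustrate the format described above. They are not part of anonapi, which may parse mappings differently.

```python
import csv

# Hypothetical sample, shortened from the example mapping file above
SAMPLE = r"""## Description ##
Mapping created February 12 2020

## Options ##
root_source_path, \\server\share2\data
project,          Wetenschap-Algemeen

## Mapping ##
source,                 patient_id, patient_name, description
folder:example/folder1, 001,        Patient1,     All files from folder1
"""


def parse_mapping(text):
    """Split mapping text into (description, options, rows).

    Illustrative only: assumes '## Name ##' section headers and
    comma-separated values, as in the example file.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers look like '## Options ##'
        if len(stripped) > 4 and stripped.startswith("##") and stripped.endswith("##"):
            current = stripped.strip("#").strip().lower()
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(line)

    description = "\n".join(sections.get("description", []))

    # Options: one 'name, value' pair per line, shared by all jobs
    options = {}
    for row in csv.reader(sections.get("options", []), skipinitialspace=True):
        if len(row) >= 2:
            options[row[0].strip()] = row[1].strip()

    # Mapping: the first row is the header, each following row is one job
    reader = csv.DictReader(sections.get("mapping", []), skipinitialspace=True)
    rows = [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]
    return description, options, rows


description, options, rows = parse_mapping(SAMPLE)
print(options["project"], len(rows))
```

Each dictionary in rows corresponds to one job, while options holds the values that apply to every job in the mapping.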

Input file

A CSV or Excel file that contains one or more columns with folders, pseudonyms or accession numbers. A file like this can be used as input for map functions such as add-study-folders to add multiple values at once.

Example input file containing folders and pseudonyms:

folder        pseudonym
folder1       studyA
folder2/st1   studyB
folder2/st2   studyC

The column headers (‘folder’ and ‘pseudonym’ above) are used to identify the type of data and to find where the columns are in the file. The following column types are currently supported:

Parameter	Allowed column names
accession_number	accession number, acc nr
path	folder, map, path
pseudo_name	pseudoID, pseudonym, name

Column header matching ignores case and separator characters such as spaces, underscores and hyphens. For example, the following are all valid column headers for accession number: accession number, Accession Number, accession_number, accession-number, AccessionNumber

Information that is not recognized as valid is ignored. For example, the following input file is valid and contains the same information as the example given above:

Some descriptive text that will just be ignored when
parsing this as an input file.

Columns with headers that are not recognized are ignored as well.
Below, 'folder' and 'pseudonym' will be recognized, others ignored

folder        value   pseudonym  comment
folder1       A       studyA
folder2/st1   A       studyB     this column
folder2/st2   B       studyC     will be ignored
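The header matching described above can be sketched as follows. The names ALLOWED, normalize and column_type are hypothetical and not taken from anonapi. The sketch assumes that matching works by lower-casing a header and dropping spaces, underscores and hyphens, which is consistent with the valid examples listed earlier but may differ from the actual implementation.

```python
import re

# Allowed column names per parameter, copied from the table above
ALLOWED = {
    "accession_number": ["accession number", "acc nr"],
    "path": ["folder", "map", "path"],
    "pseudo_name": ["pseudoID", "pseudonym", "name"],
}


def normalize(header):
    """Lower-case a header and drop spaces, underscores and hyphens."""
    return re.sub(r"[\s_\-]", "", header).lower()


# Lookup from normalized column name to parameter
LOOKUP = {
    normalize(name): parameter
    for parameter, names in ALLOWED.items()
    for name in names
}


def column_type(header):
    """Return the parameter for a header, or None if it is not recognized."""
    return LOOKUP.get(normalize(header))


for header in ["folder", "value", "pseudonym", "comment", "AccessionNumber"]:
    print(header, "->", column_type(header))
```

Headers that return None, such as 'value' and 'comment' in the example input file, would simply be skipped.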

Source Identifier

Used in mapping to indicate where the data for a job is coming from. Always of the form <identifier_type>:<value>. Types of identifiers:

Folder

Example: folder:mydata/experiment1

Refers to all files in the given folder, relative to the source root path.

Note

If the folder contains any files that are not valid DICOM, the job will fail. Only use this identifier if you want to anonymize all files in a folder and the folder contains only valid DICOM.

File selection

Example: fileselection:mydata/patient1/fileselection.txt

Refers to all the paths listed in the fileselection file. Unlike the Folder identifier, a file selection can be used in a folder that contains non-DICOM files or where only some of the files should be anonymized. When creating a fileselection with add-study-folders or add, non-DICOM files can be excluded automatically.

Study instance UID

Example: study_instance_uid:123.1217.23234.2323

Refers to a single study. The anonymization server will retrieve this study from PACS by matching the DICOM tag StudyInstanceUID.

Accession number

Example: accession_number:12345678.1234567

Refers to a single study. The anonymization server will retrieve this study from PACS by matching the DICOM tag AccessionNumber.
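As an illustration of the <identifier_type>:<value> form, the following sketch splits a source identifier string into its two parts. SourceIdentifier, KNOWN_TYPES and parse_source are hypothetical names chosen for this example, and the set of known types is taken from the examples above. This is not anonapi's own parser.

```python
from typing import NamedTuple

# Identifier types as they appear in the examples above
KNOWN_TYPES = {"folder", "fileselection", "study_instance_uid", "accession_number"}


class SourceIdentifier(NamedTuple):
    identifier_type: str
    value: str


def parse_source(text):
    """Split '<identifier_type>:<value>' into its two parts."""
    # Split on the first colon only, so the value may itself contain colons
    identifier_type, separator, value = text.strip().partition(":")
    if not separator or not value:
        raise ValueError(f"Expected <identifier_type>:<value>, got {text!r}")
    if identifier_type not in KNOWN_TYPES:
        raise ValueError(f"Unknown identifier type {identifier_type!r}")
    return SourceIdentifier(identifier_type, value)


print(parse_source("folder:mydata/experiment1"))
print(parse_source("accession_number:12345678.1234567"))
```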

Job

The basic unit of information on an anonymization server. A job specifies three things: where the data is, how to anonymize it, and where it should go. For working with jobs, see job.

File Selection

A file typically called fileselection.txt that contains a list of paths. A selection can be a data source for a job. It makes it possible to specify which files should be sent for anonymization and which should not. Methods like add-study-folders and add only include valid DICOM files in a selection.

The contents of a typical file selection that contains 4 file paths:

description: a typical file selection
id: bfc33f5e-d1cc-472e-aa05-31a5979d52be
selected_paths:
- folder1/1.dcm
- folder1/2.dcm
- folder2/1.dcm
- folder4/raw/raw1.dcm

A selection file can be edited by any text editor. See select.

Note

Selected paths are always relative to the location of fileselection.txt, and always refer to files in or below the folder that contains the selection file.
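The rule in this note can be expressed as a small check. The function is_valid_selected_path is a hypothetical helper, not part of anonapi, and it assumes selected paths use forward slashes as in the example above.

```python
from pathlib import PurePosixPath


def is_valid_selected_path(path):
    """True if path is relative and stays in or below the selection file's folder."""
    parts = PurePosixPath(path).parts
    # Absolute paths start with '/', and '..' would step above the selection file
    return not PurePosixPath(path).is_absolute() and ".." not in parts


for candidate in ["folder1/1.dcm", "folder4/raw/raw1.dcm", "../other/1.dcm", "/mnt/data/1.dcm"]:
    print(candidate, is_valid_selected_path(candidate))
```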

Server

An anonymization server fetches, anonymizes and delivers your data according to the jobs it has in its database. Servers can retrieve data from PACS or from network shares. The anonapi CLI can work with multiple servers. See Server commands.

UNC paths

Any path sent to the anonymization server should be a UNC path. A UNC path is any path starting with:

\\<server_name>\<share_name>

For example:

\\umcfilesp01\research\folder1\file.dcm
\\server1\share2\myfolder\

UNC paths are mandatory for creating anonymization jobs because they are well supported in most operating systems and are unambiguous. In contrast, Windows drive letters such as C:\, mapped network drives such as X:\ and Linux mounts like /mnt/share1 can refer to different locations on different computers.

You can find more information on UNC paths online.
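A quick way to check whether a path has the \\<server_name>\<share_name> prefix is a regular expression. The sketch below is only an illustration. UNC_PREFIX and is_unc_path are hypothetical names, and the check that the anonymization server itself performs may be stricter or looser.

```python
import re

# Two backslashes, a server name, one backslash, a share name
UNC_PREFIX = re.compile(r"^\\\\[^\\/]+\\[^\\/]+")


def is_unc_path(path):
    """True if path starts with \\\\<server_name>\\<share_name>."""
    return UNC_PREFIX.match(path) is not None


for candidate in [
    r"\\umcfilesp01\research\folder1\file.dcm",
    r"\\server1\share2\myfolder",
    r"X:\research\folder1",
    "/mnt/share1",
]:
    print(candidate, is_unc_path(candidate))
```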

Finding a UNC path

Windows

In Windows, shares are often mapped to a drive letter such as H:\ or X:\. To find the UNC path for these drive letters, open Windows Explorer (start menu -> explorer) and expand the computer icon in the lower left.

For example, an entry shown as (H:) radngdata$ (\\umcfs097) corresponds to the UNC path \\umcfs097\radngdata$. Note that the path in this case includes the final $.

Linux

In Linux, network shares are typically mounted via fstab. To find out which UNC path is mapped to which mount point, use:

$ less  /etc/fstab
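Reading fstab by hand can be tedious when there are many entries. The sketch below lists mount points with their UNC paths. It assumes shares are mounted with filesystem type cifs (or smbfs) and a device field of the form //server/share, which is common but not guaranteed on every system. The function unc_mounts and the sample entry are hypothetical.

```python
def unc_mounts(fstab_text):
    """Map mount points to UNC paths for cifs entries in fstab text."""
    mounts = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # fstab fields: device, mount point, filesystem type, options, ...
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        if fs_type in ("cifs", "smbfs") and device.startswith("//"):
            # //server/share becomes \\server\share
            mounts[mount_point] = device.replace("/", "\\")
    return mounts


# Hypothetical fstab content for illustration
SAMPLE_FSTAB = """\
# <device>             <mount point>    <type>  <options>  <dump> <pass>
UUID=abcd-1234         /                ext4    defaults   0      1
//umcfs097/radngdata$  /mnt/radngdata   cifs    ro         0      0
"""

for mount_point, unc_path in unc_mounts(SAMPLE_FSTAB).items():
    print(mount_point, "->", unc_path)
```

To run this against a real system, pass in the contents of /etc/fstab, for example open("/etc/fstab").read().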