Dataset tools#


Available commands:

  • shelephant init: Initialize a new dataset.

  • shelephant add: Add storage location to dataset.

  • shelephant remove: Remove storage location from dataset.

  • shelephant rename: Rename storage location.

  • shelephant update: Update dataset.

  • shelephant status: Show status of files.

  • shelephant info: Show global information about dataset.

  • shelephant diff: Show difference between two storage locations.

  • shelephant lock: Lock as storage location.

  • shelephant send_storage: Send .shelephant/storage/{name}.yaml to storage location.

  • shelephant get_storage: Get .shelephant/storage/{name}.yaml from storage location.

  • shelephant cp: Copy files from one location to another.

  • shelephant mv: Move files from one location to another (both local).

  • shelephant rm: Remove files from one location.

  • shelephant pwd: Print equivalent directory in the storage location.

  • shelephant git: Run git command on the database directory (.shelephant).

  • shelephant gitignore: Add all symbolic links at .shelephant to dataset’s .gitignore.

usage: shelephant [-h] [--version]

Positional Arguments#


Possible choices: status, info, update, cp, mv, rm, pwd, diff, gitignore, send_storage, get_storage, add, remove, rename, lock, git, init

Command to run.

Named Arguments#


--version

show program’s version number and exit

shelephant init#

Initialize a shelephant database by creating a directory .shelephant with an ‘empty’ database. Use shelephant add to add storage locations.

usage: shelephant init [-h] [--version]

Named Arguments#


--version

show program’s version number and exit

shelephant add#

Add a storage location to the database. The database in .shelephant is updated as follows:

  • The name is added to .shelephant/storage.yaml.

  • A file .shelephant/storage/<name>.yaml is created with the search settings and the present state of the storage location.

  • A symlink .shelephant/data/<name> is created to the storage location. (If --ssh is given, the symlink is a dead link.)


A special case is

shelephant add here --rglob '*.h5'

which helps to investigate your database directory. Note that here is a reserved name and that you should not specify the root.

usage: shelephant add [-h] [--ssh str] [--mount Path] [--prefix Path]
                      [--rglob str] [--glob str] [--exec str] [--skip str]
                      [--shallow] [-q] [--version]
                      str [Path]

Positional Arguments#


Name of the storage location.


Path to the storage location.

Named Arguments#


--ssh

SSH host (e.g. user@host).

--mount

Optional mount location for SSH host.

--prefix

Add prefix to all files.

--rglob

Search pattern for Path(root).rglob(...).

Default: []

--glob

Search pattern for Path(root).glob(...).

Default: []

--exec

Command to run from root.

Default: []

--skip

Pattern to skip (Python regex).

Default: []

--shallow

Do not compute checksums.

Default: False

-q, --quiet

Do not print progress.

Default: False


--version

show program’s version number and exit
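
The search options --rglob, --glob and --skip are described in terms of pathlib patterns and Python regular expressions. The following is a minimal sketch of how such a search could be combined, not shelephant's actual implementation. The function name search_files is hypothetical, and it is an assumption that a path is skipped when any --skip pattern matches anywhere in the relative path (re.search).

```python
import re
from pathlib import Path


def search_files(root, rglob=(), glob=(), skip=()):
    """Return sorted file paths, relative to root, that match the patterns.

    rglob: patterns passed to Path(root).rglob(...) (recursive search).
    glob:  patterns passed to Path(root).glob(...) (non-recursive search).
    skip:  regular expressions; assumption: a path is dropped if any of
           them matches somewhere in the relative path.
    """
    root = Path(root)
    found = set()
    for pattern in rglob:
        found.update(p for p in root.rglob(pattern) if p.is_file())
    for pattern in glob:
        found.update(p for p in root.glob(pattern) if p.is_file())
    relative = sorted(p.relative_to(root).as_posix() for p in found)
    return [p for p in relative if not any(re.search(s, p) for s in skip)]
```

With this sketch, search_files(root, rglob=["*.h5"]) would correspond roughly to the file selection of shelephant add <name> <root> --rglob '*.h5'.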

shelephant remove#

Remove a storage location from the database. The database in .shelephant is updated as follows:

  • The name is removed from .shelephant/storage.yaml.

  • .shelephant/storage/<name>.yaml is removed.

  • The symlink .shelephant/data/<name> is removed.

usage: shelephant remove [-h] [--version] str

Positional Arguments#


Name of the storage location.

Named Arguments#


--version

show program’s version number and exit

shelephant update#

Update the database. This command always updates the symbolic links, and optionally updates the available files and checksums of one or more storage locations.

usage: shelephant update [-h] [--version] [--base-link] [--clean] [-s]
                         [--verbose] [--chunk <lambda>] [--force] [-q]
                         [str] [Path ...]

Positional Arguments#


Update storage location(s).


Update only specific paths on location.

Named Arguments#


--version

show program’s version number and exit

--base-link

Update link .shelephant/data/{name} based on .shelephant/storage/{name}.yaml.

Default: False

--clean

Clean database entry with symlinks.

Default: False

-s, --shallow

Do not compute checksums.

Default: False


--verbose

Verbose commands.

Default: False

--chunk

Chunk size for computing checksums (bytes).

Default: 30000000000.0

--force

Force update of path(s).

Default: False

-q, --quiet

Do not print progress.

Default: False
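
To illustrate what a chunk size means when computing checksums, here is a minimal sketch that computes a SHA-256 checksum by reading a file in pieces, so that a large file never has to fit in memory at once. It uses only Python's hashlib. The function name sha256_of_file and its default chunk size are hypothetical, and this page does not state how shelephant itself applies --chunk.

```python
import hashlib


def sha256_of_file(path, chunk_size=2**20):
    """Return the hex SHA-256 digest of a file, read chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

The digest does not depend on the chunk size; only the memory use and the number of read calls change.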

shelephant status#

Status of the storage locations.


Use --list or --print0 to get a list of files instead of a table. Use for example as:

shelephant cp source dest $(shelephant status --copies 1 --list)

or to copy in batches of 100:

shelephant status --copies 1 --print0 | xargs -n 100 -0 shelephant cp source dest $@

Alternatively, limit the number of paths per call with the --nout (-n) option of shelephant status:

shelephant cp source dest $(shelephant status --copies 1 --list -n 100)
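
As a rough illustration of the batching done by the xargs -n 100 -0 pipeline above, the sketch below splits a null-separated list of paths into groups of a fixed maximum size. It assumes that --print0 separates paths with null characters, as xargs -0 expects. The helper batches is hypothetical and not part of shelephant.

```python
def batches(null_separated, size):
    """Split a null-separated string of paths into lists of at most `size` paths."""
    paths = [p for p in null_separated.split("\0") if p]
    return [paths[i:i + size] for i in range(0, len(paths), size)]
```

Each returned group corresponds to the path arguments of one shelephant cp source dest call.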

usage: shelephant status [-h] [--version] [--min-copies int] [--copies int]
                         [--ne] [--na] [--unknown] [--list] [--print0]
                         [-n int] [--table str] [--in-use str] [--not-on str]
                         [--on str] [-b]
                         [str ...]

Positional Arguments#


Filter to paths (either one directory, or multiple files).

Named Arguments#


--version

show program’s version number and exit

--min-copies

Show files with minimal number of copies.

--copies

Show files with specific number of copies.

--ne

Show files with unequal copies.

Default: False

--na

Show files unavailable somewhere.

Default: False

--unknown

Show files with unknown sha256.

Default: False

--list

Print list of files (no table).

Default: False

--print0

Print list of files separated by null characters (no table).

Default: False

-n, --nout

Maximal number of output arguments.


--table

Select print style.

--in-use

Select storage location in use (use ‘none’ for unavailable).

--not-on

List files that are not on a storage location.

--on

Limit to files available on storage location.

Default: []

-b, --relative-to-base

Show path relative to base directory of dataset.

Default: False
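
The filters --copies, --ne, --na and --unknown all classify files by comparing what each storage location holds. The sketch below shows one possible reading of these terms on a toy data model, a dictionary mapping each location name to its {path: sha256} entries. The function summarize_copies, the data layout, and the exact meaning given to each flag are assumptions for illustration, not shelephant's definitions.

```python
def summarize_copies(locations):
    """Classify every path found on any storage location.

    locations: dict mapping location name -> dict mapping path -> sha256,
               where sha256 is None if the checksum is unknown.
    Returns a dict mapping path -> flags (interpretation is an assumption):
      copies:  number of locations that hold the path
      na:      True if at least one location does not hold the path
      ne:      True if the known checksums differ between locations
      unknown: True if at least one location has no checksum for the path
    """
    all_paths = sorted(set().union(*(files.keys() for files in locations.values())))
    summary = {}
    for path in all_paths:
        checksums = [files[path] for files in locations.values() if path in files]
        known = {c for c in checksums if c is not None}
        summary[path] = {
            "copies": len(checksums),
            "na": len(checksums) < len(locations),
            "ne": len(known) > 1,
            "unknown": any(c is None for c in checksums),
        }
    return summary
```

In this reading, a filter such as --copies 1 would keep the paths whose "copies" value equals 1.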

shelephant info#

Show global information about dataset.

usage: shelephant info [-h] [--version] [--cachedir] [--basedir] [str ...]

Positional Arguments#


Name of the storage location(s).

Named Arguments#


--version

show program’s version number and exit

--cachedir

Print cache-dir and quit.

Default: False

--basedir

Print basedir (containing ‘.shelephant’) and quit.

Default: False
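
The basedir is described as the directory that contains .shelephant. A common way to locate such a directory is to walk up from the current directory until it is found. The sketch below shows that approach; the function find_basedir is hypothetical, and it is an assumption that shelephant searches parent directories in this way.

```python
from pathlib import Path


def find_basedir(start="."):
    """Return the closest directory at or above `start` that contains .shelephant.

    Returns None if no such directory exists.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".shelephant").is_dir():
            return candidate
    return None
```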

shelephant lock#

Lock as storage location.

usage: shelephant lock [-h] [--version] str

Positional Arguments#


Name of the storage location.

Named Arguments#


--version

show program’s version number and exit

shelephant cp#

Copy files between storage locations and update the database. After copying, the checksums on the destination are recomputed and the database updated. Use:

  • -s, --shallow to skip the checksum computation (store only path/size/mtime).

  • -x, --no-update to skip the database update altogether.


The paths that you specify are reduced to only the paths known to exist on the source. If you know that the paths exist, but they are not part of the database (or it is outdated), use -e, --exists to avoid the filter.


To make a clone, call shelephant cp source destination . from the dataset’s root.


The copied files are added to the database of the destination. There is no check that this fits dump and search settings.
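
The path filter described above (requested paths are reduced to those known on the source unless -e, --exists is given) can be sketched as follows. The function filter_known and its arguments are hypothetical, and the sketch only handles exact path matches; how shelephant treats directories is not covered here.

```python
def filter_known(requested, known_on_source, exists=False):
    """Reduce the requested paths to those known to exist on the source.

    requested:       paths given on the command line
    known_on_source: paths recorded in the database for the source
    exists:          if True, trust the caller and skip the filter
    """
    if exists:
        return list(requested)
    known = set(known_on_source)
    return [path for path in requested if path in known]
```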

usage: shelephant cp [-h] [--version] [--colors str] [-f] [-q] [-n] [-x] [-e]
                     [-s] [--mode str]
                     str str Path [Path ...]

Positional Arguments#


name of the source


name of the destination


path(s) to copy

Named Arguments#


--version

show program’s version number and exit

--colors

color scheme [none, dark]

Default: “dark”

-f, --force

overwrite without prompt

Default: False

-q, --quiet

do not print progress

Default: False

-n, --dry-run

print copy-plan and exit

Default: False

-x, --no-update

no database update

Default: False

-e, --exists

all paths exist on source

Default: False

-s, --shallow

do not compute checksums

Default: False


--mode

use ‘sha256’, ‘rsync’, and/or ‘basic’ to compare files

Default: “sha256,basic”

shelephant mv#

Move files from one storage location to another.


The moved files are added to the database of the destination. There is no check that this fits dump and search settings.

usage: shelephant mv [-h] [--version] [--colors str] [-f] [-q] [-n]
                     str str Path [Path ...]

Positional Arguments#


name of the source.


name of the destination.


path(s) to move.

Named Arguments#


--version

show program’s version number and exit

--colors

Color scheme [none, dark].

Default: “dark”

-f, --force

Overwrite without prompt.

Default: False

-q, --quiet

Do not print progress.

Default: False

-n, --dry-run

Print copy-plan and exit.

Default: False

shelephant rm#

Remove files from a storage location.


This removes the actual data. The link is also removed if there is no alternative source left.

usage: shelephant rm [-h] [--version] [-f] [-q] [-n] str Path [Path ...]

Positional Arguments#


name of the source.


path(s) to remove.

Named Arguments#


--version

show program’s version number and exit

-f, --force

Overwrite without prompt.

Default: False

-q, --quiet

Do not print progress.

Default: False

-n, --dry-run

Print copy-plan and exit.

Default: False

shelephant pwd#

Print the directory in a storage location that is equivalent to the current working directory.

usage: shelephant pwd [-h] [--version] [--base] [--abspath] str

Positional Arguments#


name of the source.

Named Arguments#


--version

show program’s version number and exit

--base

Print the base directory.

Default: False

--abspath

Print absolute path.

Default: False
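
One way to think about an "equivalent directory" is as the current directory's path relative to the dataset's base directory, re-rooted at the storage location. The sketch below shows that path arithmetic with pathlib. The function equivalent_dir and all example paths are hypothetical, and this is only an assumed interpretation of what shelephant pwd computes.

```python
from pathlib import PurePosixPath


def equivalent_dir(basedir, storage_root, cwd):
    """Map `cwd` (inside `basedir`) onto the same relative path under `storage_root`.

    Raises ValueError if `cwd` is not inside `basedir`.
    """
    relative = PurePosixPath(cwd).relative_to(PurePosixPath(basedir))
    return PurePosixPath(storage_root) / relative
```

For example, with a dataset at /data/set and a storage location rooted at /mnt/nas/set, the directory /data/set/run/1 maps to /mnt/nas/set/run/1.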

shelephant gitignore#

Add all managed symbolic links to the .gitignore in the dataset’s root.


This is the /path/to/dataset/.gitignore file, not /path/to/dataset/.shelephant/.gitignore.

usage: shelephant gitignore [-h] [--version]

Named Arguments#


--version

show program’s version number and exit
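
As a rough sketch of the idea behind this command, the code below collects symbolic links below a dataset root and appends those not yet listed to the root's .gitignore. The function add_symlinks_to_gitignore is hypothetical. It is an assumption that every symbolic link outside the .shelephant directory counts as managed; shelephant may select links differently.

```python
from pathlib import Path


def add_symlinks_to_gitignore(dataset_root):
    """Append symbolic links found below dataset_root to its .gitignore.

    Links inside .shelephant are ignored (assumption). Returns the list of
    newly added entries, as paths relative to dataset_root.
    """
    root = Path(dataset_root)
    gitignore = root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    links = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_symlink() and ".shelephant" not in p.relative_to(root).parts
    )
    new = [link for link in links if link not in existing]
    if new:
        gitignore.write_text("\n".join(existing + new) + "\n")
    return new
```

Calling the function a second time adds nothing, because entries already present in .gitignore are skipped.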