Dataset tools#

shelephant#

Available commands:

command        description

init           Initialize a new dataset.
add            Add storage location to dataset.
remove         Remove storage location from dataset.
rename         Rename storage location.
update         Update dataset.
status         Show status of files.
info           Show global information about dataset.
diff           Show difference between two storage locations.
lock           Lock as storage location.
send_storage   Send .shelephant/storage/{name}.yaml to storage location.
get_storage    Get .shelephant/storage/{name}.yaml from storage location.
cp             Copy files from one location to another.
mv             Move files from one location to another (both local).
rm             Remove files from one location.
pwd            Print equivalent directory in the storage location.
git            Run git command on the database directory (.shelephant).
gitignore      Add all symbolic links at .shelephant to dataset’s .gitignore.

usage: shelephant [-h] [--version]
                  {status,info,update,cp,mv,rm,pwd,diff,gitignore,send_storage,get_storage,add,remove,rename,lock,git,init}

Positional Arguments#

command

Possible choices: status, info, update, cp, mv, rm, pwd, diff, gitignore, send_storage, get_storage, add, remove, rename, lock, git, init

Command to run.

Named Arguments#

--version

show program’s version number and exit

shelephant init#

Initialize a shelephant database by creating a directory .shelephant with an ‘empty’ database. Use shelephant add to add storage locations.

usage: shelephant init [-h] [--version]

Named Arguments#

--version

show program’s version number and exit

shelephant add#

Add a storage location to the database. The database in .shelephant is updated as follows:

  • The name is added to .shelephant/storage.yaml.

  • A file .shelephant/storage/<name>.yaml is created with the search settings and the present state of the storage location.

  • A symlink .shelephant/data/<name> to the storage location is created (if --ssh is given, the symlink is a dead link).

Note

A special case is

shelephant add here --rglob '*.h5'

which helps to investigate your database directory. Note that here is a reserved name and that you should not specify the root.

usage: shelephant add [-h] [--ssh str] [--mount Path] [--prefix Path]
                      [--rglob str] [--glob str] [--exec str] [--skip str]
                      [--shallow] [-q] [--version]
                      str [Path]

Positional Arguments#

name

Name of the storage location.

root

Path to the storage location.

Named Arguments#

--ssh

SSH host (e.g. user@host).

--mount

Optional mount location for SSH host.

--prefix

Add prefix to all files.

--rglob

Search pattern for Path(root).rglob(...).

Default: []

--glob

Search pattern for Path(root).glob(...).

Default: []

--exec

Command to run from root.

Default: []

--skip

Pattern to skip (Python regex).

Default: []

--shallow

Do not compute checksums.

Default: False

-q, --quiet

Do not print progress.

Default: False

--version

show program’s version number and exit
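The interplay of --rglob, --glob and --skip can be illustrated with a minimal sketch. It assumes that the patterns are passed to Path(root).rglob(...) and Path(root).glob(...) as stated above, and that a file is skipped when any --skip regex matches somewhere in its relative path. The function name search and the matching rule are assumptions made for illustration, not shelephant's actual implementation.

```python
import re
import tempfile
from pathlib import Path


def search(root, rglob=(), glob=(), skip=()):
    """Return sorted relative paths of files under root that match a pattern and are not skipped."""
    root = Path(root)
    found = set()
    for pattern in rglob:
        # Recursive search: matches at any depth below root.
        found.update(p for p in root.rglob(pattern) if p.is_file())
    for pattern in glob:
        # Non-recursive search: matches relative to root only.
        found.update(p for p in root.glob(pattern) if p.is_file())
    relative = sorted(p.relative_to(root).as_posix() for p in found)
    # Assumption: a path is dropped if any regex matches anywhere in it.
    return [r for r in relative if not any(re.search(s, r) for s in skip)]


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.h5", "sub/b.h5", "sub/notes.txt", "bak/c.h5"]:
        path = Path(tmp) / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("data")
    result = search(tmp, rglob=["*.h5"], skip=[r"^bak/"])

print(result)  # ['a.h5', 'sub/b.h5']
```

With --glob '*.h5' instead of --rglob '*.h5', only a.h5 would be found in this layout, because Path.glob does not descend into subdirectories for a plain pattern.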

shelephant remove#

Remove a storage location from the database. The database in .shelephant is updated as follows:

  • The name is removed from .shelephant/storage.yaml.

  • .shelephant/storage/<name>.yaml is removed.

  • The symlink .shelephant/data/<name> is removed.

usage: shelephant remove [-h] [--version] str

Positional Arguments#

name

Name of the storage location.

Named Arguments#

--version

show program’s version number and exit

shelephant update#

Update the database. This command always updates the symbolic links, and optionally updates the available files and checksums of one or more storage locations.

usage: shelephant update [-h] [--version] [--sync-search] [--base-link]
                         [--clean] [-s] [--verbose] [--chunk <lambda>]
                         [--force] [-q]
                         [str] [Path ...]

Positional Arguments#

name

Update storage location(s).

path

Update only specific paths on location.

Named Arguments#

--version

show program’s version number and exit

--sync-search

Set the same search settings for all locations (except ‘here’).

Default: False

--base-link

Update link .shelephant/data/{name} based on .shelephant/storage/{name}.yaml.

Default: False

--clean

Clean database entry with symlinks.

Default: False

-s, --shallow

Do not compute checksums.

Default: False

--verbose

Verbose commands.

Default: False

--chunk

Chunk size for computing checksums (bytes).

Default: 30000000000.0

--force

Force update of path(s).

Default: False

-q, --quiet

Do not print progress.

Default: False
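The idea behind a chunk size for checksums can be sketched as follows: a file is read in pieces of a fixed number of bytes and fed to the hash incrementally, so that memory use stays bounded. The function name sha256_chunked is hypothetical, and whether shelephant applies --chunk in exactly this way is an assumption. The sketch only shows that the resulting sha256 does not depend on the chunk size.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_chunked(path, chunk_size=1 << 20):
    """Compute the sha256 of a file by reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            block = handle.read(chunk_size)
            if not block:  # empty bytes object: end of file reached
                break
            digest.update(block)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"shelephant" * 1000)
    small = sha256_chunked(sample, chunk_size=7)
    large = sha256_chunked(sample, chunk_size=10**6)

reference = hashlib.sha256(b"shelephant" * 1000).hexdigest()
print(small == large == reference)  # True
```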

shelephant status#

Status of the storage locations.

Tip

Use --list or --print0 to get a list of files instead of a table. Use for example as:

shelephant cp source dest $(shelephant status --copies 1 --list)

or to copy in batches of 100:

shelephant status --copies 1 --print0 | xargs -n 100 -0 shelephant cp source dest $@

You can also limit the number of files with the --nout (-n) option of shelephant status:

shelephant cp source dest $(shelephant status --copies 1 --list -n 100)
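What xargs -n 100 -0 does with such a list can be sketched in a few lines: it splits the input on null characters and passes at most 100 names per call. The sketch below assumes that --print0 separates file names with null characters (as the use with xargs -0 suggests). The function name batches is hypothetical.

```python
def batches(print0_output, size):
    """Split a null-separated list of paths into batches of at most size paths."""
    paths = [p for p in print0_output.split("\0") if p]
    return [paths[i : i + size] for i in range(0, len(paths), size)]


# Stand-in for the output of a command that prints null-separated paths.
output = "\0".join(f"file_{i:03d}.h5" for i in range(250)) + "\0"

groups = batches(output, 100)
print([len(g) for g in groups])  # [100, 100, 50]
```

Each batch would correspond to one invocation of the copy command with the paths of that batch as arguments.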

usage: shelephant status [-h] [--version] [--min-copies int] [--copies int]
                         [--ne] [--na] [--unknown] [--list] [--print0]
                         [-n int] [--table str] [--in-use str] [--not-on str]
                         [--on str] [-b]
                         [str ...]

Positional Arguments#

path

Filter to paths (either one directory, or multiple files).

Named Arguments#

--version

show program’s version number and exit

--min-copies

Show files with minimal number of copies.

--copies

Show files with specific number of copies.

--ne

Show files with unequal copies.

Default: False

--na

Show files unavailable somewhere.

Default: False

--unknown

Show files with unknown sha256.

Default: False

--list

Print list of files (no table).

Default: False

--print0

Print list of files separated by null characters (no table), e.g. for use with xargs -0.

Default: False

-n, --nout

Maximal number of output arguments.

--table

Select print style.

Default: “SINGLE_BORDER”

--in-use

Select storage location in use (use ‘none’ for unavailable).

--not-on

List files that are not on a storage location.

--on

Limit to files available on storage location.

Default: []

-b, --relative-to-base

Show path relative to base directory of dataset.

Default: False
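The filters above can be illustrated on a toy model. The sketch assumes a hypothetical in-memory view that maps each path to the storage locations holding it, together with the sha256 recorded there (None if unknown). It also assumes that --min-copies means "at least this many copies" and that --ne compares the known checksums. None of this reflects shelephant's internal data structures.

```python
# Hypothetical view of the database: path -> {location: sha256 or None}.
FILES = {
    "a.h5": {"laptop": "aaa", "nas": "aaa"},
    "b.h5": {"laptop": "bbb"},
    "c.h5": {"laptop": "ccc", "nas": "ddd"},
    "d.h5": {"nas": None},
}
LOCATIONS = ["laptop", "nas"]


def copies(files, n):
    """Paths stored on exactly n locations (compare --copies)."""
    return sorted(p for p, where in files.items() if len(where) == n)


def min_copies(files, n):
    """Paths stored on at least n locations (assumed meaning of --min-copies)."""
    return sorted(p for p, where in files.items() if len(where) >= n)


def unequal(files):
    """Paths whose known checksums differ between locations (compare --ne)."""
    return sorted(
        p
        for p, where in files.items()
        if len({h for h in where.values() if h is not None}) > 1
    )


def unavailable(files, locations):
    """Paths missing on at least one location (compare --na)."""
    return sorted(
        p for p, where in files.items() if any(loc not in where for loc in locations)
    )


def unknown(files):
    """Paths with an unknown checksum on some location (compare --unknown)."""
    return sorted(p for p, where in files.items() if None in where.values())


print(copies(FILES, 1))               # ['b.h5', 'd.h5']
print(min_copies(FILES, 2))           # ['a.h5', 'c.h5']
print(unequal(FILES))                 # ['c.h5']
print(unavailable(FILES, LOCATIONS))  # ['b.h5', 'd.h5']
print(unknown(FILES))                 # ['d.h5']
```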

shelephant info#

Show global information about dataset.

usage: shelephant info [-h] [--version] [--cachedir] [--basedir] [str ...]

Positional Arguments#

location

Name of the storage location(s).

Named Arguments#

--version

show program’s version number and exit

--cachedir

Print cache-dir and quit.

Default: False

--basedir

Print basedir (containing ‘.shelephant’) and quit.

Default: False
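Locating the basedir can be sketched as a walk from a starting directory up through its parents until a directory containing .shelephant is found. This is a minimal sketch of that idea. The function name find_basedir is hypothetical and the actual lookup in shelephant may differ.

```python
import tempfile
from pathlib import Path


def find_basedir(start):
    """Return the closest directory at or above start that contains '.shelephant', else None."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".shelephant").is_dir():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve() / "dataset"
    nested = base / "runs" / "2024"
    nested.mkdir(parents=True)
    (base / ".shelephant").mkdir()
    found = find_basedir(nested)
    is_base = found == base

print(is_base)  # True
```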

shelephant lock#

Lock as storage location.

usage: shelephant lock [-h] [--version] str

Positional Arguments#

name

Name of the storage location.

Named Arguments#

--version

show program’s version number and exit

shelephant cp#

Copy files between storage locations and update the database. After copying, the checksums on the destination are recomputed and the database updated. Use:

  • -s, --shallow to skip the checksum computation (store only path/size/mtime).

  • -x, --no-update to skip the database update altogether.

Note

The paths that you specify are reduced to only the paths known to exist on the source. If you know that the paths exist, but they are not part of the database (or it is outdated), use -e, --exists to avoid the filter.
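The filter described in this note can be sketched as follows: the requested paths are intersected with the paths that the database lists for the source, unless the filter is switched off. The function name select_paths and its arguments are hypothetical. The sketch only mirrors the behaviour described here, not shelephant's code.

```python
def select_paths(requested, known_on_source, exists=False):
    """Reduce requested paths to those known on the source, unless exists is set."""
    if exists:
        # Counterpart of -e, --exists: trust the caller, keep every path.
        return list(requested)
    known = set(known_on_source)
    return [p for p in requested if p in known]


known = ["a.h5", "b.h5"]
requested = ["a.h5", "new.h5"]

print(select_paths(requested, known))               # ['a.h5']
print(select_paths(requested, known, exists=True))  # ['a.h5', 'new.h5']
```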

Tip

To make a clone call shelephant cp source destination . from the dataset’s root.

Note

The copied files are added to the database of the destination. There is no check that this fits dump and search settings.

usage: shelephant cp [-h] [--version] [--colors str] [-f] [-q] [-n] [-x] [-e]
                     [-s] [--mode str]
                     str str Path [Path ...]

Positional Arguments#

source

name of the source

destination

name of the destination

path

path(s) to copy

Named Arguments#

--version

show program’s version number and exit

--colors

color scheme [none, dark]

Default: “dark”

-f, --force

overwrite without prompt

Default: False

-q, --quiet

do not print progress

Default: False

-n, --dry-run

print copy-plan and exit

Default: False

-x, --no-update

no database update

Default: False

-e, --exists

all paths exist on source

Default: False

-s, --shallow

do not compute checksums

Default: False

--mode

use ‘sha256’, ‘rsync’, and/or ‘basic’ to compare files

Default: “sha256,basic”

shelephant mv#

Move files from one storage location to another.

Note

The moved files are added to the database of the destination. There is no check that this fits dump and search settings.

usage: shelephant mv [-h] [--version] [--colors str] [-f] [-q] [-n]
                     str str Path [Path ...]

Positional Arguments#

source

name of the source.

destination

name of the destination.

path

path(s) to move.

Named Arguments#

--version

show program’s version number and exit

--colors

Color scheme [none, dark].

Default: “dark”

-f, --force

Overwrite without prompt.

Default: False

-q, --quiet

Do not print progress.

Default: False

-n, --dry-run

Print copy-plan and exit.

Default: False

shelephant rm#

Remove files from a storage location.

Warning

This removes the actual data. The link is also removed if there is no alternative source left.

usage: shelephant rm [-h] [--version] [-f] [-q] [-n] str Path [Path ...]

Positional Arguments#

source

name of the source.

path

path(s) to remove.

Named Arguments#

--version

show program’s version number and exit

-f, --force

Remove without prompt.

Default: False

-q, --quiet

Do not print progress.

Default: False

-n, --dry-run

Print copy-plan and exit.

Default: False

shelephant pwd#

Print the directory in a storage location that is equivalent to the current working directory.

usage: shelephant pwd [-h] [--version] [--base] [--abspath] str

Positional Arguments#

source

name of the source.

Named Arguments#

--version

show program’s version number and exit

--base

Print the base directory.

Default: False

--abspath

Print absolute path.

Default: False
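The notion of an "equivalent directory" can be sketched as a path computation: take the current directory relative to the dataset's base directory and append it to the root of the storage location. The function name equivalent_dir and the example paths are hypothetical. The sketch assumes POSIX-style paths and does not touch the file system.

```python
from pathlib import PurePosixPath


def equivalent_dir(cwd, basedir, storage_root):
    """Map a directory inside the dataset to the matching directory in a storage location."""
    # relative_to raises ValueError if cwd is not inside basedir.
    relative = PurePosixPath(cwd).relative_to(PurePosixPath(basedir))
    return PurePosixPath(storage_root) / relative


nested = equivalent_dir("/home/me/dataset/runs/a", "/home/me/dataset", "/mnt/nas/dataset")
top = equivalent_dir("/home/me/dataset", "/home/me/dataset", "/mnt/nas/dataset")

print(nested)  # /mnt/nas/dataset/runs/a
print(top)     # /mnt/nas/dataset
```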

shelephant gitignore#

Add all managed symbolic links to the .gitignore in the dataset’s root.

Note

This is the /path/to/dataset/.gitignore file, not /path/to/dataset/.shelephant/.gitignore.

usage: shelephant gitignore [-h] [--version]

Named Arguments#

--version

show program’s version number and exit
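The effect of shelephant gitignore can be approximated with a short sketch. It assumes that the managed links are the symbolic links found in the dataset tree outside .shelephant, and that each link is listed in the root .gitignore by its path relative to the dataset's root. Both points are assumptions made for illustration, as is the function name ignore_symlinks.

```python
import os
import tempfile
from pathlib import Path


def ignore_symlinks(dataset_root):
    """Append symlinks below dataset_root (outside .shelephant) to its .gitignore; return the added entries."""
    root = Path(dataset_root)
    gitignore = root / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    links = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".shelephant" in dirnames:
            dirnames.remove(".shelephant")  # do not descend into the database
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links.append(path.relative_to(root).as_posix())
    new = [link for link in sorted(links) if link not in existing]
    if new:
        gitignore.write_text("\n".join(existing + new) + "\n")
    return new


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".shelephant").mkdir()
    (root / "sub").mkdir()
    (root / "real.h5").write_text("data")
    (root / "link.h5").symlink_to(root / "real.h5")
    (root / "sub" / "link2.h5").symlink_to(root / "real.h5")
    first = ignore_symlinks(root)
    second = ignore_symlinks(root)  # nothing new on a second call
    content = (root / ".gitignore").read_text()

print(first)   # ['link.h5', 'sub/link2.h5']
print(second)  # []
```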