Python module#

Dataset#

shelephant.dataset.Location(root[, ssh, ...])

Location information.

SSH interface#

shelephant.ssh.is_file(hostname, path[, verbose])

Check if a file exists on a remote system.

shelephant.ssh.is_dir(hostname, path[, verbose])

Check if a directory exists on a remote system.

shelephant.ssh.has_keys_set(hostname)

Check if the ssh keys are set for a given host.

shelephant.ssh.tempdir(hostname)

Create a temporary directory on a remote system.

scp interface#

shelephant.scp.copy(source_dir, dest_dir, files)

Copy files using scp.

rsync interface#

shelephant.rsync.diff(source_dir, dest_dir, ...)

Check if files are different using rsync.

shelephant.rsync.copy(source_dir, dest_dir, ...)

Copy files using rsync.

local interface#

shelephant.local.diff(source_dir, dest_dir, ...)

Check if files exist.

shelephant.local.copy(source_dir, dest_dir, ...)

Copy files using shutil.copy2.

shelephant.local.remove(source_dir, files[, ...])

Remove files using os.remove.

shelephant.local.move(source_dir, dest_dir, ...)

Move files using os.replace.

Type conversion#

shelephant.convert.flatten(data)

Flatten a nested list to a one dimensional list.

shelephant.convert.squash(data)

Squash a dictionary to a single list.

shelephant.convert.get(data, key)

Get an item from a nested dictionary.

shelephant.convert.split_key(key)

Split a key separated by "/" into a list.

YAML handling#

shelephant.yaml.read(filename[, default])

Read YAML file and return its content.

shelephant.yaml.read_item(filename[, key])

Get an item from a YAML file.

shelephant.yaml.dump(filename, data[, ...])

Dump data to YAML file.

shelephant.yaml.preview(data[, width])

Print data formatted as YAML.

File information#

shelephant.compute_hash.compute_sha256(files)

Get the sha256 hash and size of a list of files.

File operations#

shelephant.path.filter_deepest(files)

Return list with only the deepest paths.

shelephant.path.dirnames(files[, return_unique])

Get the os.path.dirname of all file paths.

shelephant.path.makedirs(dirnames[, force])

(Prompt and) Create directories that do not yet exist.

Formatted print#

shelephant.output.copyplan(status[, colors, ...])

Print copy plan.

Command-line interface#

shelephant.cli.shelephant_cp(args[, paths, ...])

Command-line tool, see --help.

shelephant.cli.shelephant_diff(args)

Command-line tool, see --help.

shelephant.cli.shelephant_dump(args)

Command-line tool, see --help.

shelephant.cli.shelephant_hostinfo(args)

Command-line tool, see --help.

shelephant.cli.shelephant_mv(args[, paths])

Command-line tool, see --help.

shelephant.cli.shelephant_parse(args)

Command-line tool, see --help.

shelephant.cli.shelephant_rm(args[, paths])

Command-line tool, see --help.

Details#

cli#

shelephant.cli.shelephant_cp(args: list[str], paths: list[str] = None, filter_paths: bool = True)#

Command-line tool, see --help.

Parameters:
  • args – Command-line arguments (should be all strings).

  • paths – Instead of reading files from the source YAML-file, specify a list of paths to copy.

  • filter_paths – If True, paths that are not in files of the YAML-file are ignored. If False, all paths are copied; this requires the paths to exist on the source.

Returns:

List of changed files.

Note

For input from a dataset (paths is not None) the storage locations can have a prefix. paths is a list of paths relative to the root of the dataset. For example:

foo/bar/a.txt
foo/bar/b.txt

Consider that

  • source1 only stores files and folders in foo.

  • source2 only stores files and folders in foo/bar.

Then:

shelephant_cp(["source1.yml", "source2.yml"], paths=["foo/bar/a.txt", "foo/bar/b.txt"])

will effectively run a copy of:

copy(
    sourcepath="/path/to/root/of/source1/bar",
    destpath="/path/to/root/of/source2",
    files=["a.txt", "b.txt"],
)

And:

shelephant_cp(["source2.yml", "source1.yml"], paths=["foo/bar/a.txt", "foo/bar/b.txt"])

will effectively run a copy of:

copy(
    sourcepath="/path/to/root/of/source2",
    destpath="/path/to/root/of/source1/bar",
    files=["a.txt", "b.txt"],
)
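
The prefix handling described in this note can be sketched with plain path arithmetic. This is an illustration only, not shelephant's implementation: the helper plan_copy and its arguments are hypothetical, and the sketch assumes that one prefix is nested inside the other.

```python
from pathlib import PurePosixPath


def plan_copy(source_root, source_prefix, dest_root, dest_prefix, paths):
    """Hypothetical helper: map dataset-relative paths onto two prefixed locations."""
    src_prefix = PurePosixPath(source_prefix)
    dst_prefix = PurePosixPath(dest_prefix)
    # Assumption: the deeper prefix is the common base that both locations can address.
    base = src_prefix if len(src_prefix.parts) >= len(dst_prefix.parts) else dst_prefix
    return {
        "sourcepath": str(PurePosixPath(source_root) / base.relative_to(src_prefix)),
        "destpath": str(PurePosixPath(dest_root) / base.relative_to(dst_prefix)),
        "files": [str(PurePosixPath(path).relative_to(base)) for path in paths],
    }


plan = plan_copy(
    "/path/to/root/of/source1",
    "foo",
    "/path/to/root/of/source2",
    "foo/bar",
    ["foo/bar/a.txt", "foo/bar/b.txt"],
)
```

Swapping the two locations swaps sourcepath and destpath, which matches the second case in the note.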
shelephant.cli.shelephant_diff(args: list[str])#

Command-line tool, see --help.

Parameters:

args – Command-line arguments (should be all strings).

shelephant.cli.shelephant_dump(args: list[str])#

Command-line tool, see --help.

Parameters:

args – Command-line arguments (should be all strings).

shelephant.cli.shelephant_hostinfo(args: list[str])#

Command-line tool, see --help.

Parameters:

args – Command-line arguments (should be all strings).

shelephant.cli.shelephant_mv(args: list[str], paths: list[str] = None)#

Command-line tool, see --help.

Parameters:
  • args – Command-line arguments (should be all strings).

  • paths – Paths to move (if not given, all files in source are moved).

shelephant.cli.shelephant_parse(args: list[str])#

Command-line tool, see --help.

Parameters:

args – Command-line arguments (should be all strings).

shelephant.cli.shelephant_rm(args: list[str], paths: list[str] = None)#

Command-line tool, see --help.

Parameters:
  • args – Command-line arguments (should be all strings).

  • paths – Paths to remove (if not given, all files in source are removed).

compute_hash#

shelephant.compute_hash.compute_sha256(files: list[Path], sha256: bool = True, progress: bool = True) → tuple[list[str], list[int]]#

Get the sha256 hash and size of a list of files.

Parameters:
  • files – A list of files.

  • sha256 – Calculate the sha256 hash.

  • progress – Show a progress bar.

Returns:

A tuple of lists of (size, mtime, sha256).
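
For orientation, hashing and sizing a list of files can be sketched with the standard library alone. The helper sha256_and_size and its chunk_size argument are hypothetical. The sketch omits the progress bar and the mtime, and is not a copy of compute_sha256.

```python
import hashlib
import os
import tempfile


def sha256_and_size(files, chunk_size=1 << 20):
    """Hypothetical helper: return (hashes, sizes) for a list of file paths."""
    hashes, sizes = [], []
    for name in files:
        digest = hashlib.sha256()
        with open(name, "rb") as handle:
            # Read in chunks so that large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        hashes.append(digest.hexdigest())
        sizes.append(os.path.getsize(name))
    return hashes, sizes


with tempfile.TemporaryDirectory() as tmp:
    example = os.path.join(tmp, "a.txt")
    with open(example, "wb") as handle:
        handle.write(b"hello")
    hashes, sizes = sha256_and_size([example])
```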

convert#

shelephant.convert.flatten(data: list[list]) → list#

Flatten a nested list to a one dimensional list.

Parameters:

data – A nested list.

Returns:

A one dimensional list.

shelephant.convert.get(data: dict[dict], key: str | list[str]) → dict | list | str | int | float#

Get an item from a nested dictionary.

Parameters:
  • data – A nested dictionary.

  • key

    The item to read. E.g.

      • [] for a YAML file containing only a list.
      • ['foo'] for a plain YAML file.
      • ['key', 'to', 'foo'] for a YAML file with nested items.

    An item specified as str separated by “/” is also accepted.

Returns:

The read item.
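
The key conventions above can be illustrated with a few lines of plain Python. get_sketch is a hypothetical stand-in written for this page, not the library function. In particular, dropping empty components when splitting on "/" is an assumption.

```python
def get_sketch(data, key):
    """Hypothetical helper: walk a nested dictionary along a list or "/"-separated key."""
    if isinstance(key, str):
        # Assumption: empty components (e.g. from a leading "/") are ignored.
        key = [part for part in key.split("/") if part]
    item = data
    for part in key:
        item = item[part]
    return item


nested = {"key": {"to": {"foo": [1, 2]}}}
by_list = get_sketch(nested, ["key", "to", "foo"])
by_string = get_sketch(nested, "key/to/foo")
whole = get_sketch([1, 2], [])
```

An empty key returns the data unchanged, which corresponds to the "YAML file containing only a list" case.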

shelephant.convert.split_key(key: str) → list[str]#

Split a key separated by “/” into a list.

Parameters:

key – A key.

Returns:

A list of key components.

shelephant.convert.squash(data: dict[list]) → list#

Squash a dictionary to a single list. For example:

>>> squash({"foo": [1, 2], "bar": {"foo": [3, 4], "bar": 5}})
[1, 2, 3, 4, 5]

Parameters:

data – A nested dictionary.

Returns:

A one dimensional list.
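
One way to obtain the behaviour of the example above is a small recursion over dictionary values. squash_sketch is hypothetical and may differ from the library in edge cases; for instance, it also flattens lists nested inside lists, which is an assumption.

```python
def squash_sketch(data):
    """Hypothetical helper: collect all leaf values of nested dicts/lists in one list."""
    if isinstance(data, dict):
        return [item for value in data.values() for item in squash_sketch(value)]
    if isinstance(data, list):
        return [item for value in data for item in squash_sketch(value)]
    return [data]


result = squash_sketch({"foo": [1, 2], "bar": {"foo": [3, 4], "bar": 5}})
```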

dataset#

class shelephant.dataset.Location(root: str | Path, ssh: str = None, mount: Path = None, prefix: Path = None, files: list[str] = [], description: str = None)#

Location information.

Attributes:

  • Location.root: The base directory.

  • Location.ssh (optional): [user@]host

  • Location.hostpath: root if ssh is not set, ssh:"root" if ssh is set.

  • Location.prefix (optional): Prefix to add to all paths.

  • Location.python (optional): The python executable on the ssh host.

  • Location.dump (optional): Location of “dump” file – file with list of files.

  • Location.search (optional): Commands to search for files, see shelephant.search.search().

  • Location.files(): List of files (with properties).

  • Location.description: Description of the location.

Initialize:

  • Read from yaml file:

    location = Location.from_yaml("location.yaml")
    
  • Create from scratch:

    location = Location(root="~/data", ...)
    
asdict() → dict#

Return as dictionary.

Returns:

Dictionary.

check_changes(paths: list[Path] = None, progress: bool = False, verbose: bool = False)#

Remove sha256 from all files of which the size/mtime has changed.

Parameters:
  • paths – List of paths to check. Paths must be relative to the root of the dataset. Default: all files in the database.

  • progress – Show progress bar (only relevant if ssh is not set).

  • verbose – Show verbose output (only relevant if ssh is set).

copy_files(other)#

Copy files (and size/mtime/sha256) from other location.

Parameters:

other – Other location.

diff(other) → dict#

Compare the database entries of two locations.

Warning

The information is taken from the database. There are no checks that this information is up to date.

Parameters:

other – Other location.

Returns:

Dictionary with differences:

{
    "==" : [ ... ], # in a and b, equal sha256
    "?=" : [ ... ], # in a and b, unknown sha256
    "!=" : [ ... ], # in a and b, different sha256
    "->" : [ ... ], # in a not in b
    "<-" : [ ... ], # in b not in a
}
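
The five categories can be illustrated on two plain dictionaries that map a path to its sha256 (or None if unknown). diff_sketch and this data layout are assumptions made for the example; the Location class stores its database differently.

```python
def diff_sketch(a, b):
    """Hypothetical helper: compare two {path: sha256 or None} mappings."""
    result = {"==": [], "?=": [], "!=": [], "->": [], "<-": []}
    for path in sorted(set(a) | set(b)):
        if path not in b:
            result["->"].append(path)  # in a not in b
        elif path not in a:
            result["<-"].append(path)  # in b not in a
        elif a[path] is None or b[path] is None:
            result["?="].append(path)  # in both, at least one sha256 unknown
        elif a[path] == b[path]:
            result["=="].append(path)  # in both, equal sha256
        else:
            result["!="].append(path)  # in both, different sha256
    return result


here = {"a.txt": "abc", "b.txt": None, "c.txt": "123", "d.txt": "xyz"}
there = {"a.txt": "abc", "b.txt": "def", "c.txt": "456", "e.txt": "uvw"}
status = diff_sketch(here, there)
```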

files(info: bool = True) → list#

Return as list of files. Items without sha256 and size are returned as str, items with sha256 and size are returned as dict.

Parameters:

info – Return sha256 and size (if available).

Returns:

List of files.

classmethod from_yaml(path: str | Path)#

Read from yaml file.

Parameters:

path – Path to yaml file.

Returns:

Location.

getinfo(paths: list[Path] = None, max_size: int = None, progress: bool = False, verbose: bool = False)#

Compute sha256/size/mtime of all files for which this information is not available.

To compute the sha256/size/mtime for only a fraction of the files, set max_size. This will stop the computation when the total size exceeds max_size. You can then call this function recursively (with clean=False) to flush your buffer.

Parameters:
  • paths – List of paths to check. Must be relative to the root of the dataset.

  • max_size – Compute the sha256/size/mtime until the total size exceeds max_size.

  • progress – Show progress bar (only relevant if ssh is not set).

  • verbose – Show verbose output (only relevant if ssh is set).
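
The effect of max_size can be pictured as processing files in batches, where each batch ends as soon as the running total exceeds the limit. The sketch below is a hypothetical model of that rule on a plain {name: size} dictionary; select_batch is not part of shelephant, and the exact stopping rule of getinfo may differ.

```python
def select_batch(sizes, max_size=None):
    """Hypothetical helper: names handled before the running total exceeds max_size."""
    batch, total = [], 0
    for name, size in sizes.items():
        batch.append(name)
        total += size
        # The file that pushes the total over the limit is still part of this batch.
        if max_size is not None and total > max_size:
            break
    return batch


pending = {"a.bin": 4, "b.bin": 4, "c.bin": 4}
batches = []
while pending:
    batch = select_batch(pending, max_size=6)
    batches.append(batch)
    for name in batch:
        del pending[name]
```

Repeating the call until nothing is pending mirrors calling getinfo several times with the same max_size.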

has_info() → bool#

Check if sha256/size/mtime is available for all files.

Returns:

True if sha256/size/mtime is available for all files.

property hostpath: str#

Return:

  • root if ssh is not set.

  • ssh:"root" if ssh is set.

is_mounted() → bool#

Check if a location is a local directory, or if a remote directory is mounted.

Returns:

True if mounted.

isavailable(mount: bool = False) → bool#

Check if location is available.

Parameters:

mount – Check if mount is available.

Returns:

True if available.

overwrite_yaml(path: str | Path)#

Overwrite yaml file. This function only changes the file if the content has indeed changed.

Parameters:

path – Path to yaml file.

read(verbose: bool = False, getinfo: bool = False)#

Read files from location.

  • If dump is set, read from dump file. This overwrites the database (sha256/size/mtime will only be available if they are in the YAML file).

  • If search is set, search for files. This preserves sha256/size/mtime if paths are already in the database (there is no check that they are still accurate).

Parameters:
  • verbose – Print progress (only relevant if ssh is set).

  • getinfo – Get sha256/size/mtime (calls getinfo).

remove(paths: list[str])#

Remove files from list of files.

Parameters:

paths – List of paths to remove.

remove_info(paths: list[Path] = None)#

Remove sha256/size/mtime for a list of files.

Parameters:

paths – List of paths to remove.

sort(key: str = 'files')#

Sort files.

Parameters:

key – Key to sort by (files, size, sha256).

to_yaml(path: str | Path, force: bool = False)#

Write to yaml file.

Parameters:
  • path – Path to yaml file.

  • force – Do not prompt to overwrite file.

local#

shelephant.local.copy(source_dir: str, dest_dir: str, files: list[str], progress: bool = True)#

Copy files using shutil.copy2.

Parameters:
  • source_dir – Source directory

  • dest_dir – Destination directory

  • files – List of file-paths (relative to source_dir and dest_dir).

  • progress – Show progress bar.

shelephant.local.diff(source_dir: str, dest_dir: str, files: list[str]) → dict[list[str]]#

Check if files exist.

Parameters:
  • source_dir (str) – Source directory (optionally with hostname).

  • dest_dir (str) – Destination directory (optionally with hostname).

  • files (list) – List of file-paths (relative to source_dir and dest_dir).

  • verbose – Verbose commands.

Returns:

Dictionary with differences:

{
    "?=" : [ ... ], # in source_dir and dest_dir
    "->" : [ ... ], # in source_dir not in dest_dir
    "<-" : [ ... ], # in dest_dir not in source_dir
}
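
A minimal sketch of such an existence check, using only os.path. exists_diff is a hypothetical name, and leaving out files that are missing on both sides is an assumption.

```python
import os
import tempfile


def exists_diff(source_dir, dest_dir, files):
    """Hypothetical helper: classify files by where they exist."""
    result = {"?=": [], "->": [], "<-": []}
    for name in files:
        in_source = os.path.isfile(os.path.join(source_dir, name))
        in_dest = os.path.isfile(os.path.join(dest_dir, name))
        if in_source and in_dest:
            result["?="].append(name)  # present on both sides, content not compared
        elif in_source:
            result["->"].append(name)
        elif in_dest:
            result["<-"].append(name)
    return result


with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as dest:
    for folder, name in [(source, "a.txt"), (dest, "a.txt"), (source, "b.txt"), (dest, "c.txt")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("x")
    status = exists_diff(source, dest, ["a.txt", "b.txt", "c.txt"])
```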

shelephant.local.move(source_dir: str, dest_dir: str, files: list[str], progress: bool = True)#

Move files using os.replace.

Parameters:
  • source_dir – Source directory

  • dest_dir – Destination directory

  • files – List of file-paths (relative to source_dir and dest_dir).

  • progress – Show progress bar.

shelephant.local.remove(source_dir: str, files: list[str], progress: bool = True)#

Remove files using os.remove.

Parameters:
  • source_dir – Source directory

  • files – List of file-paths (relative to source_dir).

  • progress – Show progress bar.

output#

shelephant.output.autoprint(text: str)#

Print text to stdout. If the text is longer than the terminal height, it will be piped to a pager.

shelephant.output.copyplan(status: dict[list[str]], colors: str = 'none', display: bool = True, max_align: int = 80) → str#

Print copy plan.

Parameters:
  • status

    Dictionary of copy status. E.g.:

    {
        '->' : ['file1', 'file2'],
        '!=' : ['file3'],
        '==' : ['file4'],
    }
    

  • colors – Color theme name, see theme().

  • display – Display output (False: return as string).

  • max_align – Maximum width of the first column.

Returns:

Output string (if display=False).

shelephant.output.diff(status: dict[list[str]], colors: str = 'none', display: bool = True, max_align: int = 80) → str#

Print copy plan.

Parameters:
  • status

    Dictionary of status. E.g.:

    {
        '==' : ['file4'],
        '?=' : [],
        '!=' : ['file3'],
        '->' : ['file1', 'file2'],
        '<-' : [],
    }
    

  • colors – Color theme name, see theme().

  • display – Display output (False: return as string).

  • max_align – Maximum width of the first column.

Returns:

Output string (if display=False).

path#

shelephant.path.cwd(dirname: Path)#

Set the cwd to a specified directory:

with cwd("foo"):
    # Do something in foo

Parameters:

dirname – The directory to change to.
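
A context manager of this kind is commonly written with contextlib. The sketch below is a generic version, not the library source: cwd_sketch is a hypothetical name, and restoring the previous directory on exit is assumed from the with-statement usage shown above.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def cwd_sketch(dirname):
    """Hypothetical helper: temporarily change the working directory."""
    origin = os.getcwd()
    os.chdir(dirname)
    try:
        yield
    finally:
        # Restore the previous directory, even if the body raised an exception.
        os.chdir(origin)


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with cwd_sketch(tmp):
        inside = os.getcwd()
    after = os.getcwd()
```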

shelephant.path.dirnames(files: list[str], return_unique: bool = True) → list[str]#

Get the os.path.dirname of all file paths.

Parameters:
  • files – List of file paths.

  • return_unique – Filter duplicates.

Returns:

List of dirnames.

shelephant.path.filter_deepest(files: list[str]) → list[str]#

Return list with only the deepest paths.

For example:

>>> filter_deepest(["foo/bar/dir", "foo/bar"])
["foo/bar/dir"]

Parameters:

files – List of paths.

Returns:

List of paths.
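
The idea of "deepest paths" can be sketched by dropping every path that is a parent directory of another path in the list. filter_deepest_sketch is hypothetical; it compares "/"-separated components and keeps the input order, both of which are assumptions.

```python
def filter_deepest_sketch(files):
    """Hypothetical helper: drop paths that are a parent of another path in the list."""
    parts = [tuple(path.split("/")) for path in files]
    keep = []
    for path, mine in zip(files, parts):
        # A path is a parent if a longer path starts with all of its components.
        is_parent = any(
            len(other) > len(mine) and other[: len(mine)] == mine for other in parts
        )
        if not is_parent:
            keep.append(path)
    return keep


deepest = filter_deepest_sketch(["foo/bar/dir", "foo/bar"])
```

Comparing whole components (instead of string prefixes) avoids treating foo/bar as a parent of foo/barbaz.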

shelephant.path.makedirs(dirnames: list[str], force: bool = False)#

(Prompt and) Create directories that do not yet exist. This function creates parent directories if needed.

Parameters:
  • dirnames – List of directory paths.

  • force – Create directories without prompt.

shelephant.path.tempdir()#

Set the cwd to a temporary directory:

with tempdir():
    # Do something in the temporary directory

rsync#

Copy & query using rsync.

(c) Tom de Geus, 2021, MIT

shelephant.rsync.copy(source_dir: str, dest_dir: str, files: list[str], options: str = '-a', verbose: bool = False, progress: bool = True)#

Copy files using rsync. This is a wrapper around rsync {options:s} --files-from.

Parameters:
  • source_dir – Source directory. If remote: [user@]host:path.

  • dest_dir – Destination directory. If remote: [user@]host:path.

  • files – List of file-paths (relative to source_dir and dest_dir).

  • options – Options passed to rsync.

  • verbose – Verbose commands.

  • progress – Show progress bar.

shelephant.rsync.diff(source_dir: str, dest_dir: str, files: list[str], options: str = '-nai', verbose: bool = False) → dict[list[str]]#

Check if files are different using rsync.

Note

rsync uses basic criteria such as file size and creation and modification date. This is much faster than using checksums but is only approximate. See rsync manual.

Parameters:
  • source_dir (str) – Source directory (optionally with hostname).

  • dest_dir (str) – Destination directory (optionally with hostname).

  • files (list) – List of file-paths (relative to source_dir and dest_dir).

  • verbose – Verbose commands.

Returns:

Dictionary with differences:

{
    "==" : [ ... ], # equal
    "!=" : [ ... ], # not equal
    "->" : [ ... ], # in source_dir not in dest_dir
}

scp#

shelephant.scp.copy(source_dir: str, dest_dir: str, files: list[str], options: str = '-p', verbose: bool = False, progress: bool = True)#

Copy files using scp.

Parameters:
  • source_dir – Source directory. If remote: [user@]host:path.

  • dest_dir – Destination directory. If remote: [user@]host:path.

  • files – List of file-paths (relative to source_dir and dest_dir).

  • options – Options passed to scp.

  • verbose – Verbose commands.

  • progress – Show progress bar.

search#

shelephant.search.search(*settings: dict, root: Path = PosixPath('.')) → list[Path]#

Search for files using a list of settings, as follows:

[
    {"rglob": "*.py", "skip": ["\..*", "build"]},
    {"exec": "find . -name '*.cpp'"},
]

Parameters:
  • settings – A list of settings.

  • root – The root directory to search in.

Returns:

A list of paths.
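
To make the "rglob" setting more concrete, here is a minimal sketch built on pathlib and re. rglob_sketch is hypothetical, and how "skip" is applied is an assumption: here a path is dropped if any of its components fully matches one of the regular expressions. The "exec" setting is not covered.

```python
import re
import tempfile
from pathlib import Path


def rglob_sketch(root, pattern, skip=()):
    """Hypothetical helper: recursive glob with regex-based skipping."""
    compiled = [re.compile(expr) for expr in skip]
    found = []
    for path in sorted(Path(root).rglob(pattern)):
        relative = path.relative_to(root)
        # Assumption: skip a path if any component fully matches a skip pattern.
        if any(regex.fullmatch(part) for regex in compiled for part in relative.parts):
            continue
        found.append(relative)
    return found


with tempfile.TemporaryDirectory() as tmp:
    for name in ["main.py", "build/gen.py", ".venv/lib.py", "src/util.py"]:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")
    found = [path.as_posix() for path in rglob_sketch(tmp, "*.py", skip=[r"\..*", "build"])]
```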

ssh#

shelephant.ssh.has_keys_set(hostname: str) → bool#

Check if the ssh keys are set for a given host.

Parameters:

hostname – Hostname.

Returns:

True if the host can be accessed without password.

shelephant.ssh.is_dir(hostname: str, path: str, verbose: bool = False) → bool#

Check if a directory exists on a remote system. Uses ssh.

Parameters:
  • hostname – Hostname.

  • path – Directory (path on hostname).

  • verbose – Verbose commands.

Returns:

True if the directory exists, False otherwise.

shelephant.ssh.is_file(hostname: str, path: str, verbose: bool = False) → bool#

Check if a file exists on a remote system. Uses ssh.

Parameters:
  • hostname – Hostname.

  • path – Filename (path on hostname).

  • verbose – Verbose commands.

Returns:

True if the file exists, False otherwise.

shelephant.ssh.tempdir(hostname: str)#

Create a temporary directory on a remote system. Uses ssh.

with tempdir("localhost") as remote_tempdir:
    print(remote_tempdir)

yaml#

shelephant.yaml.dump(filename: str | Path, data: list | dict, force: bool = False, width: int = inf)#

Dump data to YAML file.

Parameters:
  • filename – The output filename.

  • data – The data to dump.

  • force – Do not prompt to overwrite file.

  • width – The maximum line-width of the file.

shelephant.yaml.overwrite(filename: str | Path, data: list | dict)#

Overwrite existing YAML file with data. This function only changes the file if the content has indeed changed.

Parameters:
  • filename – The output filename.

  • data – The data to dump.
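
The "only change the file if the content has changed" behaviour can be sketched as a compare-before-write. The helper below is hypothetical and works on plain text to stay dependency-free. It does not serialize YAML, and creating a missing file is an assumption.

```python
import os
import tempfile


def write_if_changed(filename, text):
    """Hypothetical helper: write text only if it differs from the current content."""
    if os.path.isfile(filename):
        with open(filename, encoding="utf-8") as handle:
            if handle.read() == text:
                return False  # identical content: leave the file untouched
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(text)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.yaml")
    first = write_if_changed(target, "foo: 1\n")
    second = write_if_changed(target, "foo: 1\n")
    third = write_if_changed(target, "foo: 2\n")
```

Skipping the write keeps the modification time of an unchanged file intact, which matters for tools that use the mtime to detect changes.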

shelephant.yaml.preview(data: list | dict, width: int = inf)#

Print data formatted as YAML.

Parameters:
  • data – The data to dump.

  • width – The maximum line-width of the file.

shelephant.yaml.read(filename: str | Path, default=None) → list | dict#

Read YAML file and return its content.

Parameters:
  • filename – The YAML file to read.

  • default – The default value to return if the file is empty.

Returns:

The content of the YAML file.

shelephant.yaml.read_item(filename: str | Path, key: str | list[str] = []) → list | dict#

Get an item from a YAML file.

Parameters:

key

The item to read. E.g.

  • [] for a YAML file containing only a list.
  • ['foo'] for a plain YAML file.
  • ['key', 'to', 'foo'] for a YAML file with nested items.

An item specified as str separated by “/” is also accepted.

Returns:

The content of the item.