datalad_next.shell

A persistent shell connection

This module provides a context manager that establishes a connection to a shell and can be used to execute multiple commands in that shell. Shells are usually remote shells, e.g. connected via an ssh-client, but local shells like zsh, bash or PowerShell can also be used.

The context manager returns an instance of ShellCommandExecutor that can be used to execute commands in the shell via the method ShellCommandExecutor.__call__(). The method will return an instance of a subclass of ShellCommandResponseGenerator that can be used to retrieve the output of the command, the result code of the command, and the stderr-output of the command.

Every response generator expects a certain output structure and is responsible for ensuring that this structure is generated. To this end every response generator provides a method ShellCommandResponseGenerator.get_command_list(). The method ShellCommandExecutor.__call__ will pass the user-provided command to ShellCommandResponseGenerator.get_command_list() and receive a list of final commands that should be executed in the connected shell and that will generate the expected output structure. Instances of ShellCommandResponseGenerator therefore have four tasks:

Create a final command list that is used to execute the user provided command. This could, for example, execute the command, print an end marker, and print the return code of the command.

Parse the output of the command and yield it to the user.

Read the return code and provide it to the user.

Provide stderr-output to the user.
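The first task can be illustrated with a minimal sketch. The helper names build_posix_command_list and new_end_marker are hypothetical and not part of datalad_next, and the command list that the library actually generates may look different. The sketch only shows the idea of wrapping a user command so that a POSIX shell prints an end marker and the return code after the command's output:

```python
import secrets


def new_end_marker() -> bytes:
    # Hypothetical helper: a random hex marker needs no shell quoting.
    return secrets.token_hex(16).encode()


def build_posix_command_list(command: bytes, end_marker: bytes) -> list[bytes]:
    """Return the lines sent to the shell for one user command (illustrative)."""
    return [
        # 1. Run the user-provided command.
        command + b"\n",
        # 2. Print the end marker, a space, and the command's return code ($?).
        b'printf "%s %d\\n" ' + end_marker + b" $?\n",
    ]


marker = new_end_marker()
for line in build_posix_command_list(b"ls -l /etc/passwd", marker):
    print(line)
```

The remaining tasks (parsing stdout, reading the return code, and collecting stderr) then operate on the output that these lines produce in the shell.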

A very versatile example of a response generator is the class VariableLengthResponseGenerator. It can be used to execute a command that will result in an output of unknown length, e.g. ls, and will yield the output of the command to the user. It does that by using a random end marker to detect the end of the output and read the trailing return code. This is suitable for almost all commands.
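The end-marker approach can be sketched as follows. This is not the implementation of VariableLengthResponseGenerator; the class MarkerParser is a hypothetical stand-in, and the output layout it assumes (command output, then the marker, a space, the return code, and a newline) is an assumption made for this illustration. The sketch shows why the marker has to be searched for in every chunk, including across chunk boundaries:

```python
from typing import Iterable, Iterator, Optional


class MarkerParser:
    """Yield command output until the marker appears, then store the return code."""

    def __init__(self, chunks: Iterable[bytes], marker: bytes):
        self.chunks = iter(chunks)
        self.marker = marker
        self.returncode: Optional[int] = None

    def __iter__(self) -> Iterator[bytes]:
        buffer = b""
        keep = len(self.marker) - 1
        for chunk in self.chunks:
            buffer += chunk
            pos = buffer.find(self.marker)
            if pos >= 0:
                break
            # Hold back a tail that could be the start of a marker that is
            # split across two chunks; everything before it is safe to yield.
            cut = max(len(buffer) - keep, 0)
            if cut:
                yield buffer[:cut]
                buffer = buffer[cut:]
        else:
            raise RuntimeError("stream ended before the end marker was seen")
        if pos:
            yield buffer[:pos]
        # Everything after the marker up to the next newline is the return code.
        tail = buffer[pos + len(self.marker):]
        if b"\n" not in tail:
            for chunk in self.chunks:
                tail += chunk
                if b"\n" in tail:
                    break
        self.returncode = int(tail.split(b"\n", 1)[0].strip())


parser = MarkerParser([b"line1\nli", b"ne2\nMAR", b"KER 2", b"\n"], b"MARKER")
output = b"".join(parser)
print(output, parser.returncode)
```

If the command itself printed the marker, the parser would stop too early, which is the residual risk that a random marker keeps small.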

If VariableLengthResponseGenerator is so versatile, why not just implement its functionality in ShellCommandExecutor? There are two major reasons for that:

Although the VariableLengthResponseGenerator is very versatile, it is not the most efficient implementation for commands that produce large amounts of output. In addition, there is also a minimal risk that the end marker is part of the output of the command, which would trip up the response generator. Putting response generation into a separate class makes it possible to implement specific operations more efficiently and more safely. For example, DownloadResponseGenerator implements the download of files. It takes a remote file name as user "command" and creates a final command list that emits the length of the file, a newline, the file content, a return code, and a newline. This allows DownloadResponseGenerator to parse the output without relying on an end marker, thus increasing efficiency and safety.
Factoring out the response generation creates an interface that can be used to support the syntax of different shells and the difference in command names and options in different operating systems. For example, the response generator class VariableLengthResponseGeneratorPowerShell supports the invocation of commands with variable length output in a PowerShell.
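The length-prefixed layout described for downloads (length, newline, content, return code, newline) can be parsed without searching for a marker. The following is a minimal sketch under that assumption; parse_download is a hypothetical function and not the parser used by DownloadResponseGenerator:

```python
import io
from typing import BinaryIO


def parse_download(stream: BinaryIO, chunk_size: int = 65536) -> tuple[bytes, int]:
    """Read '<length>\\n<content><returncode>\\n' from a binary stream."""
    # The first line announces how many content bytes follow.
    remaining = int(stream.readline())
    parts = []
    while remaining:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("stream ended inside the file content")
        parts.append(chunk)
        remaining -= len(chunk)
    # After exactly `length` bytes, the next line holds the return code.
    returncode = int(stream.readline())
    return b"".join(parts), returncode


content, returncode = parse_download(io.BytesIO(b"11\nhello\nworld0\n"))
print(content, returncode)
```

Because the parser counts bytes instead of inspecting them, the content may contain any byte sequence, including something that looks like an end marker.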

In short, response generator classes encapsulate details of shell syntax and operation implementation. That allows support of different shell syntaxes and the efficient implementation of specific higher-level operations, e.g. download. It also allows users to extend the functionality of ShellCommandExecutor by providing their own response generator classes.

The module datalad_next.shell.response_generators provides two generally applicable abstract response generator classes:

VariableLengthResponseGenerator

FixedLengthResponseGenerator

The functionality of the former is described above. The latter can be used to execute a command that will result in output of known length, e.g. echo -n 012345. It reads the specified number of bytes and a trailing return code. This is more performant than the variable length response generator (because it does not have to search for the end marker). In addition, it does not rely on the uniqueness of the end marker. It is most useful for operations like download, where the length of the output can be known in advance.

As mentioned above, the classes VariableLengthResponseGenerator and FixedLengthResponseGenerator are abstract. The module datalad_next.shell.response_generators provides the following concrete implementations for them:

VariableLengthResponseGeneratorPosix

VariableLengthResponseGeneratorPowerShell

FixedLengthResponseGeneratorPosix

FixedLengthResponseGeneratorPowerShell

When shell() is executed it will use a VariableLengthResponseGenerator subclass to skip the login message of the shell. This is done by executing a zero command (a command that will possibly generate some output, and successfully return) in the shell. The zero command is provided by the concrete implementation of class VariableLengthResponseGenerator. For example, the zero command for POSIX shells is test 0 -eq 0, for PowerShell it is Write-Host hello.
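The zero-command idea can be tried locally with a plain sh subprocess. This sketch does not use datalad_next; it assumes a POSIX sh on the PATH, the function name and the fixed marker are hypothetical, and the echo line only stands in for a login message that a remote shell would print by itself:

```python
import subprocess

MARKER = b"ZERO_COMMAND_DONE"  # hypothetical; a real marker would be random


def skip_banner_with_zero_command() -> tuple[list[bytes], int]:
    """Start `sh`, discard everything up to the zero command's marker line."""
    proc = subprocess.Popen(["sh"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        # Stand-in for the login message of a remote shell.
        proc.stdin.write(b"echo 'Welcome to example-host'\n")
        # Zero command, followed by the marker and the zero command's return code.
        proc.stdin.write(b"test 0 -eq 0; echo " + MARKER + b" $?\n")
        proc.stdin.flush()
        skipped = []
        while True:
            line = proc.stdout.readline()
            if not line:
                raise EOFError("shell exited before the marker was seen")
            if line.startswith(MARKER):
                return skipped, int(line.split()[1])
            skipped.append(line)
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()


skipped, returncode = skip_banner_with_zero_command()
print(skipped, returncode)
```

Once the marker line has been read, the shell's stdout is at a known position, and the output of every later command can be attributed to that command.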

Because there is no way for shell() to determine the kind of shell it connects to, the user can provide an alternative response generator class in the zero_command_rg_class-parameter. An instance of that class will then be used to execute the zero command. Currently, the following two response generator classes are available:

VariableLengthResponseGeneratorPosix: works with POSIX-compliant shells, e.g. sh or bash. This is the default.

VariableLengthResponseGeneratorPowerShell: works with PowerShell.

Whenever a command is executed via ShellCommandExecutor.__call__(), the class identified by zero_command_rg_class will be used by default to create the final command list and to parse the result. Users can override this on a per-call basis by providing a different response generator class in the response_generator-parameter of ShellCommandExecutor.__call__().

`ShellCommandExecutor`(process_inputs, stdout, ...)	Execute a command in a shell and return a generator that yields output
`ShellCommandResponseGenerator`(stdout_gen, ...)	An abstract class that specifies the minimal functionality of a response generator
`VariableLengthResponseGenerator`(stdout)	Response generator that handles outputs of unknown length
`VariableLengthResponseGeneratorPosix`(stdout)	A variable length response generator for POSIX shells
`VariableLengthResponseGeneratorPowerShell`(stdout)	A variable length response generator for PowerShell shells
`FixedLengthResponseGenerator`(stdout, length)	Response generator for efficient handling of outputs of known length
`FixedLengthResponseGeneratorPosix`(stdout, length)
`FixedLengthResponseGeneratorPowerShell`(...)
`DownloadResponseGenerator`(stdout)	Response generator interface for efficient download
`DownloadResponseGeneratorPosix`(stdout)	A response generator for efficient download commands from Linux systems
`operations.posix.upload`(shell, local_path, ...)	Upload a local file to a named file in the connected shell
`operations.posix.download`(shell, ...[, ...])	Download a file from the connected shell
`operations.posix.delete`(shell, files, *[, ...])	Delete files on the connected shell

datalad_next.shell.shell(shell_cmd: list[str], *, credential: str | None = None, chunk_size: int = 65536, zero_command_rg_class: type[VariableLengthResponseGenerator] = <class 'datalad_next.shell.response_generators.VariableLengthResponseGeneratorPosix'>) → Generator[ShellCommandExecutor, None, None]

Context manager that provides an interactive connection to a shell

This context manager uses the provided argument shell_cmd to start a shell-subprocess. Usually the commands provided in shell_cmd will start a client for a remote shell, e.g. ssh.

shell() returns an instance of ShellCommandExecutor in the as-variable. This instance can be used to interact with the shell. That means, it can be used to execute commands in the shell, receive the data that the commands write to their stdout and stderr, and retrieve the return code of the executed commands. All commands that are executed via the returned instance of ShellCommandExecutor are executed in the same shell instance.

Simple example that invokes a single command:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result = ssh(b'ls -l /etc/passwd')
...     print(result.stdout)
...     print(result.returncode)
...
b'-rw-r--r-- 1 root root 2773 Nov 14 10:05 /etc/passwd\n'
0

Example that invokes two commands, the second of which exits with a non-zero return code. The error output is retrieved from result.stderr, which contains all stderr data that was written since the last command was executed:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     print(ssh(b'head -1 /etc/passwd').stdout)
...     result = ssh(b'ls /no-such-file')
...     print(result.stdout)
...     print(result.returncode)
...     print(result.stderr)
...
b'root:x:0:0:root:/root:/bin/bash\n'
b''
2
b"Pseudo-terminal will not be allocated because stdin is not a terminal.\r\nls: cannot access '/no-such-file': No such file or directory\n"

The following example demonstrates how to use the check-parameter to raise a CommandError-exception if the return code of the command is not zero. This delegates error handling to the calling code and helps to keep the code clean:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     print(ssh(b'ls /no-such-file', check=True).stdout)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 279, in __call__
    return create_result(
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 349, in create_result
    result.to_exception(command, error_message)
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 52, in to_exception
    raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'ls /no-such-file' failed with exitcode 2 [err: 'cannot access '/no-such-file': No such file or directory']

Manual checking of the return code:

>>> from datalad_next.shell import shell
>>> def file_exists(file_name):
...     with shell(['ssh', 'localhost']) as ssh:
...         result = ssh(f'ls {file_name}')
...         return result.returncode == 0
...
>>> print(file_exists('/etc/passwd'))
True
>>> print(file_exists('/no-such-file'))
False

An example for result content checking:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result = ssh('grep root /etc/passwd', check=True).stdout
...     if len(result.splitlines()) != 1:
...         raise ValueError('Expected exactly one line')

For long-running commands, generator-based result fetching can be used. To use generator-based output, the command has to be executed with the method ShellCommandExecutor.start(). This method returns a generator that provides command output as soon as it is available:

>>> import time
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result_generator = ssh.start(b'c=0; while [ $c -lt 6 ]; do head -2 /etc/passwd; sleep 2; c=$(( $c + 1 )); done')
...     for result in result_generator:
...         print(time.time(), result)
...     assert result_generator.returncode == 0
1713358098.82588 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358100.8315682 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358102.8402972 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358104.8490314 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358106.8577306 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358108.866439 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'

(The exact output of the above example might differ, depending on the length of the first two entries in the /etc/passwd-file.)

Parameters:

shell_cmd (list[str]) -- The command to execute the shell. It should be a list of strings that is given to iter_subproc() as args-parameter. For example: ['ssh', '-p', '2222', 'localhost'].
chunk_size (int, optional) -- The size of the chunks that are read from the shell's stdout and stderr. This also defines the size of stored stderr-content.
zero_command_rg_class (type[VariableLengthResponseGenerator], optional, default: 'VariableLengthResponseGeneratorPosix') --
shell() uses an instance of the specified response generator class to execute the zero command ("zero command" is the command used to skip the login messages of the shell). This class will also be used as the default response generator for all further commands executed in the ShellCommandExecutor-instance that is returned by shell(). Currently, the following concrete subclasses of VariableLengthResponseGenerator exist:
- VariableLengthResponseGeneratorPosix: compatible with POSIX-compliant shells, e.g. sh or bash.
- VariableLengthResponseGeneratorPowerShell: compatible with PowerShell.

Yields:

ShellCommandExecutor