sync.aws
Sync - AWS
aws_cli(*cmd)
aws_cli invokes the AWS CLI in-process from Python to execute awscli commands.
Warning
This is not the most elegant way of using awscli. However, it has been a convenient function in data science projects.
This function is adapted from https://github.com/boto/boto3/issues/358#issuecomment-372086466
AWS credential env variables should be configured before calling this function. The awscli command should be wrapped as a tuple. To download data from S3 to a local path, use
>>> aws_cli(('s3', 'sync', 's3://s2-fpd/augmentation/', '/tmp/test'))
Similarly, upload is done in the following way
>>> # local_path = ''
>>> # remote_path = ''
>>> aws_cli(('s3', 'sync', local_path, remote_path))
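Since the function reads AWS credentials from the environment, a full call might look like the following minimal sketch; the credential values, region, bucket name, and local directory are hypothetical, while the environment variable names are the standard ones read by awscli/boto3.

```
import os

from haferml.sync.aws import aws_cli

# Hypothetical credentials; in practice these are usually set outside the code
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_DEFAULT_REGION"] = "eu-central-1"

# Download a hypothetical S3 prefix into a local folder
aws_cli(("s3", "sync", "s3://my-bucket/some-prefix/", "/tmp/some-prefix"))
```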
Parameters:

Name | Type | Description | Default |
---|---|---|---|
*cmd | | tuple of awscli command. | () |
Source code in haferml/sync/aws.py

````python
def aws_cli(*cmd):
    """
    aws_cli invokes the aws cli processes in python to execute awscli commands.

    !!! warning
        This is not the most elegant way of using awscli.
        However, it has been a convenient function in data science projects.

    This function is adapted from https://github.com/boto/boto3/issues/358#issuecomment-372086466

    AWS credential env variables should be configured before calling this function.

    The awscli command should be wrapped as a tuple. To download data from S3 to a local path, use

    ```
    >>> aws_cli(('s3', 'sync', 's3://s2-fpd/augmentation/', '/tmp/test'))

    Similarly, upload is done in the following way

    >>> # local_path = ''
    >>> # remote_path = ''
    >>> aws_cli(('s3', 'sync', local_path, remote_path))
    ```

    :param *cmd: tuple of awscli command.
    """
    old_env = dict(os.environ)

    try:
        # Set up environment
        env = os.environ.copy()
        env["LC_CTYPE"] = "en_US.UTF"
        os.environ.update(env)
        # Run awscli in the same process
        exit_code = create_clidriver().main(*cmd)
        # Deal with problems
        if exit_code > 0:
            raise RuntimeError(f"AWS CLI exited with code {exit_code}")
    finally:
        os.environ.clear()
        os.environ.update(old_env)
````
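As the source shows, a non-zero exit code from the CLI driver is raised as a RuntimeError, so failed transfers can be caught explicitly. A minimal sketch, with a hypothetical bucket and local path:

```
from haferml.sync.aws import aws_cli

try:
    aws_cli(("s3", "sync", "s3://my-bucket/data/", "/tmp/data"))
except RuntimeError as error:
    # e.g. log the failure and re-raise, or fall back to local data
    print(f"S3 sync failed: {error}")
    raise
```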
s3_download(path, folder)
s3_download downloads files from S3.
>>> s3_download(config_path, base_folder)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
path | str | s3 uri | required |
folder | str | destination folder | required |
Source code in haferml/sync/aws.py

````python
def s3_download(path, folder):
    """
    `s3_download` downloads files from S3.

    ```
    >>> s3_download(config_path, base_folder)
    ```

    :param path: s3 uri
    :type path: str
    :param folder: destination folder
    :type folder: str
    """
    if not path.startswith("s3://"):
        raise Exception(f"{path} is not S3 uri!")
    else:
        # e.g., s3://mein-work/abc/performance/model_performance_log.json
        s3_bucket = path.split("/")[2]
        s3_filepath = "/".join(path.split("/")[3:])
        # get the name of the config file
        s3_filename = path.split("/")[-1]
        # local config path is constructed from base folder and filename
        path = os.path.join(folder, s3_filename)

        s3 = boto3.client("s3")
        s3.download_file(s3_bucket, s3_filepath, path)
````
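The destination file name is taken from the last segment of the S3 URI and joined onto `folder`, so a call like the following sketch (with a hypothetical bucket and key) would write the object to /tmp/configs/config.json:

```
from haferml.sync.aws import s3_download

# Hypothetical S3 uri and local folder
s3_download("s3://my-bucket/experiments/config.json", "/tmp/configs")
# -> downloads the object to /tmp/configs/config.json
```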