I recently started using cloud computing services.
Amazon seems to be the preferred provider of cloud services, and rightly so: the breadth of its services and how far they can be customized are currently unparalleled. Although I had experimented with some Amazon Web Services (AWS) before (e.g. S3 storage), I had never used it for computing.
Unfortunately, Amazon imposes quite restrictive limits on new users (for as long as you can’t get out of the Free Tier):
- limit of two usage zones (this wouldn’t be a problem, were it not for:)
- all possible zones to choose from are in the US
- really weak VMs available
I contacted support to have my zones changed and my limits raised so I could start real work, and I was willing to pay the costs, but it took 3 days to get a response, which was basically “tough luck”. This seems to be a general pattern.
So I figured other providers might offer better conditions to new users because of their smaller market share. That turned out to be the case with Google Cloud Platform:
- $300 credit to spend over 60 days - very attractive;
- unrestricted choice of zones;
- more choice of VMs for starting users;
- simpler interface (also fewer features);
- competitive prices per hour and per GB of storage compared with AWS.
Computing and storage aren’t as separated as in AWS. The computing service is called Google Compute Engine (GCE), similar to AWS’ EC2. Long-term storage is called Google Cloud Storage (GCS) and is equivalent to AWS’ S3. Disks can be mounted on instances in a way equivalent to AWS’ EBS storage.
Following is a series of notes on how to interface with GCE and GCS, written mostly for the future me.
Mounting new disks in instances:
    df -h  # see mounted volumes
    sudo mkdir /projects
    sudo chown user:user /projects
    sudo /usr/share/google/safe_format_and_mount -m "mkfs.ext4 -F" /dev/sdb /projects
To mount the disk at startup, add a line like this to /etc/fstab:

    /dev/sdaX /media/mydata ext4 defaults 0 0
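A minimal sketch of adding such a line without duplicating it on repeated runs. The function name `add_fstab_entry` is my own invention, and it takes the target file as an argument on purpose, so it can be tried on a scratch file before touching the real `/etc/fstab`.

```shell
#!/usr/bin/env bash
# Append an fstab-style entry to FILE only if MOUNTPOINT is not already listed.
# Usage: add_fstab_entry FILE DEVICE MOUNTPOINT [FSTYPE]
# (hypothetical helper, not part of any Google tooling)
add_fstab_entry() {
    local file="$1" device="$2" mountpoint="$3" fstype="${4:-ext4}"
    # Skip if a non-comment line already has this mount point as its second field
    if awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$file"; then
        echo "entry for $mountpoint already present in $file"
        return 0
    fi
    printf '%s %s %s defaults 0 0\n' "$device" "$mountpoint" "$fstype" >> "$file"
}

# Try it on a scratch file first
scratch="$(mktemp)"
add_fstab_entry "$scratch" /dev/sdb /projects
add_fstab_entry "$scratch" /dev/sdb /projects   # second call changes nothing
cat "$scratch"
```

Once the output looks right, the same line can go into `/etc/fstab` as root, and `sudo mount -a` will report any mistake before the next reboot does.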
You can give an external IP to your instances and transfer files easily.
You can use FileZilla by adding your instance key (Edit -> Preferences -> SFTP -> Add key...) and using
Much like AWS EC2: create a new instance, install all your software and save an image of the instance. Next time, start a new instance from this image and voilà, all your software is there.
Unfortunately, I haven’t found a way of sharing images :disappointed:.
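As a rough sketch, the create/save/reuse cycle looks like this with `gcloud`. The instance, disk, image and zone names are placeholders I made up, and the `run` wrapper is a hypothetical helper that only prints the commands unless `DRY_RUN=0` is set, so nothing gets created by accident.

```shell
#!/usr/bin/env bash
# Placeholder names: replace with your own
INSTANCE="my-instance"
DISK="my-instance"          # assumes the boot disk kept the default name (same as the instance)
IMAGE="my-software-image"
ZONE="europe-west1-b"

# Print commands by default; set DRY_RUN=0 to actually execute them
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Stop the instance so the disk is in a consistent state
run gcloud compute instances stop "$INSTANCE" --zone "$ZONE"
# 2. Save an image of its boot disk
run gcloud compute images create "$IMAGE" --source-disk "$DISK" --source-disk-zone "$ZONE"
# 3. Later: start a fresh instance from that image
run gcloud compute instances create "new-instance" --image "$IMAGE" --zone "$ZONE"
```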
gcloud: manage services, instances, configurations, permissions
gsutil: manage cloud storage (upload, download to and from local)
Uploading to GCS
Upload in parallel to Google Cloud Storage:
    pip install crcmod
    # configure ~/.boto
    # uncomment parallel_process_count line
    # or use this: https://github.com/afrendeiro/dotfiles/blob/master/.boto

    # with rsync
    gsutil -m rsync -r . gs://storage/data/

    # selectively, using grep
    ls -d /localdir/data/mapped/* |  # full paths, so this works from any directory
        grep -F .dups.bam |          # select samples
        grep -v _string_ |           # exclude some samples based on some string
        gsutil -m cp -I gs://storagedir/data/mapped/  # upload
e.g. upload bigwig tracks and hub, make them publicly accessible
    gsutil -m rsync data/bigWig gs://storage/bigWig/
    gsutil cp trackHub_hg19.txt gs://storage/bigWig/
    gsutil -m acl ch -g All:R gs://storage/bigWig/*
Auto-resumable uploads, pretty fast.
Uploaded ~250 BAM files (1-5 GB each) overnight!
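The selective upload can also be wrapped in a small function, which makes it easy to check what would be sent before sending anything. This is only a sketch: `select_files` is a name I made up, and it filters with `find` rather than `grep`, which is my substitution, not the original approach.

```shell
#!/usr/bin/env bash
# List full paths of files in DIR whose names end in SUFFIX,
# skipping any whose names contain EXCLUDE.
# Usage: select_files DIR SUFFIX EXCLUDE   (hypothetical helper)
select_files() {
    local dir="$1" suffix="$2" exclude="$3"
    find "$dir" -maxdepth 1 -type f -name "*${suffix}" ! -name "*${exclude}*" | sort
}

# Demo on a throwaway directory with empty files
demo="$(mktemp -d)"
touch "$demo/a.dups.bam" "$demo/b.dups.bam" "$demo/b_string_.dups.bam" "$demo/a.bam.bai"
select_files "$demo" ".dups.bam" "_string_"   # lists a.dups.bam and b.dups.bam only

# Real use: gsutil reads the paths to copy from stdin when given -I
# select_files /localdir/data/mapped .dups.bam _string_ | gsutil -m cp -I gs://storagedir/data/mapped/
```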