Cryospheric Data Lakes

Licenses: Open Data Commons Attribution (data) | LGPL v3 (code) | CC BY-SA 4.0 (other content)

Open-source big data tools to handle various cryospheric remote sensing datasets.

Data lake

... a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms, usually object blobs or files... ~Wikipedia

Contents

Find the underlying data used in this project here (or at least links to the sources, since the files may be too big to host directly).

Examine the code here, which works with the data to produce some (hopefully) scientifically meaningful outputs (whatever that means). You may find some interesting Dockerfiles and Python 3 code inside (if that clicks with you).

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Pre-requisites

You should have some form of git installed for version control. Ideally, docker should be installed too to fully replicate this scientific development environment, unless you do not have root/admin privileges. Conda users may skip the docker install, but should take note of the section below on setting up a conda environment.

For Debian/Ubuntu-based systems, you can try something like:

sudo apt install git docker-ce

Note: You may need to set up Docker's package repository first to install docker-ce. See the instructions for Debian and Ubuntu.
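For reference, a rough sketch of that repository setup on Ubuntu might look like the following (these commands follow Docker's older official instructions and the key/repository URLs may have changed, so do check the official docs first):

# Add Docker's official GPG key and apt repository (verify against the current official docs)
sudo apt install ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install docker-ce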

For Windows, if you have chocolatey (recommended!), it can be as easy as:

choco install git docker

For Mac OS X:

TODO??
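Until proper instructions are written, one possible route (untested here, and assuming you already have Homebrew installed) would be:

brew install git                  # version control
brew install --cask docker        # Docker Desktop; older Homebrew versions use 'brew cask install docker'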

Cloning the repository

With git installed, fire up your command prompt and run a git clone using this repository's URL:

git clone <repo-url>

Alternatively, download the zip file from here, and unzip it.

The standard clone command above will skip over some submodules, such as the external tutorials I have cloned into the tuts folder. To get absolutely everything (beware, beware!), you can do:

git clone --recursive <repo-url>
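If you have already cloned the repository without the --recursive flag, the submodules can also be pulled in afterwards with:

git submodule update --init --recursive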

Setting up the conda environment (for Anaconda/Miniconda users)

You can replicate most of the libraries used in this repository by running:

conda env create --file=environment.yml
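Once created, activate the environment before running anything. The environment's name is whatever the name: field in environment.yml says, so substitute it below:

conda activate <environment-name>    # use the name defined in environment.yml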

Running the code

To try out the code (which downloads big data files, processes the data, etc.), you can use a JupyterLab or notebook environment. Start one by running either of the commands below:

jupyter lab
jupyter notebook

Alternatively, you can use the atom-hydrogen-beta docker container here to ensure ease of reproducibility (aka mitigate dependency hell problems). Yes, I like to do my code writing and execution inside that 'atom' docker container with interactive Hydrogen functionality!!

(Animated demo of the Atom editor running code interactively with Hydrogen)
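As a rough sketch of how that might be run (the image name below is a placeholder; the real name, tag, and any display-forwarding options needed for a GUI editor like Atom are covered on the docker container page linked above):

docker pull <dockerhub-user>/atom-hydrogen-beta                              # placeholder image name
docker run -it --rm -v "$(pwd)":/home/workspace <dockerhub-user>/atom-hydrogen-beta    # mount point is illustrative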

But of course, you can install the libraries yourself.

Contributing

Feel free to submit a pull request or an issue (nice ways of saying hi!) if you'd like to see something in here that isn't here yet.

License

Data

Any raw data (e.g. binary satellite files) used here is licensed as per the upstream source. Derived datasets are licensed under the Open Data Commons Attribution License unless otherwise stated.

Code

Source code used in the handling of the data is licensed under the GNU Lesser General Public License v3.0.

Other

Other forms of content (such as documentation) in this project repository which are not covered by the above two licenses are licensed under the Creative Commons Attribution-ShareAlike 4.0 License. Linked submodules (e.g. in the tuts folder) are subject to their respective upstream licenses.