XArray: the power of pandas for multidimensional arrays
Processing thousands of satellite images to understand air quality in the UK - it's efficient and easy with XArray
Monday 17th, 12:30 (Ferrier Hall)
"I wish there were a way to easily manipulate this huge multi-dimensional array in Python...", I thought, as I stared at a large chunk of satellite data on my laptop. The data came from a satellite measuring air quality, and I wanted to slice and dice it in some supposedly simple ways. Using pure numpy - the go-to library whenever the words 'multi-dimensional', 'array' and 'python' appear in the same sentence - was just such a pain. What I wished for was something like pandas, with datetime indexes, fancy ways of selecting subsets, group-by operations and so on, but something that would work with my huge multi-dimensional array.
The solution: XArray - a wonderful library that brings the power of pandas to multi-dimensional data. In this talk, I will introduce XArray by showing how just a few lines of code can answer questions about my data that would need a lot of complex code with pure numpy: questions like 'What is the average air quality in March?', 'What is the time series of air quality in Southampton?' and 'What is the seasonal average air quality for each census output area?'.
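As a rough illustration, here is how the first two questions, plus a per-grid-cell seasonal average, might look in XArray. This is a minimal sketch, not the speaker's code: the variable name `no2`, the grid, the dates and the random values are all hypothetical stand-ins for real satellite data, and Southampton is approximated as the grid cell nearest 50.9° N, 1.4° W.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for a stack of satellite images: random values on a
# small lat/lon grid, one image per day for 2016. NOT real measurements.
times = pd.date_range("2016-01-01", "2016-12-31", freq="D")
lats = np.arange(49.0, 59.0, 0.5)   # 20 rows, roughly spanning the UK
lons = np.arange(-8.0, 2.0, 0.5)    # 20 columns
rng = np.random.default_rng(0)
no2 = xr.DataArray(
    rng.random((len(times), len(lats), len(lons))),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="no2",
)

# 'What is the average air quality in March?'
# A partial date string selects the whole month; then average over time.
march_mean = no2.sel(time="2016-03").mean("time")

# 'What is the time series of air quality in Southampton?'
# Pick the grid cell nearest to approx. 50.9 N, 1.4 W (assumed location).
southampton = no2.sel(lat=50.9, lon=-1.4, method="nearest")

# Seasonal averages (DJF, MAM, JJA, SON) for every grid cell,
# grouping on the season derived from the datetime index.
seasonal = no2.groupby("time.season").mean("time")
```

The per-output-area version of the last question would additionally need the census boundary polygons to assign pixels to areas, which this sketch leaves out.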
After demonstrating how easily XArray answers these questions, I will introduce the fundamental XArray data types and show how indexes can be added to raw arrays to make full use of XArray's power. I will discuss how to get data in and out of XArray, and how XArray can use dask for high-performance data processing on multiple cores or distributed across multiple machines. Finally, I will leave you with a taster of some of XArray's advanced features, including seamless access to data over the internet using OPeNDAP, applying complex functions across arrays, and XArray extension libraries.
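A minimal sketch of some of the core ideas mentioned above, under made-up assumptions: a raw numpy array (its shape, coordinate values and the variable names `no2` and `cloud_fraction` are hypothetical) is given labelled dimensions as an XArray `DataArray`, combined into a `Dataset`, selected by label, exported to pandas, and passed through `xr.apply_ufunc` so that a plain numpy function runs while the coordinates are kept.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A raw 3-D numpy array, as it might come out of a file reader:
# axis 0 = time, axis 1 = y, axis 2 = x. Values here are made up.
raw = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Attach dimension names and coordinate labels (the "indexes").
# The coordinate values are hypothetical.
da = xr.DataArray(
    raw,
    dims=("time", "y", "x"),
    coords={
        "time": pd.to_datetime(["2016-03-01", "2016-03-02"]),
        "y": [10.0, 20.0, 30.0],
        "x": [1.0, 2.0, 3.0, 4.0],
    },
    name="no2",
)

# A Dataset groups several DataArrays that share coordinates,
# much as a pandas DataFrame groups several Series.
ds = xr.Dataset({"no2": da, "cloud_fraction": xr.zeros_like(da)})

# Label-based selection instead of positional numpy indexing.
first_day = ds["no2"].sel(time=pd.Timestamp("2016-03-01"))

# Getting data out: a pandas DataFrame indexed by (time, y, x).
df = ds.to_dataframe()

# Applying a plain numpy function while keeping the labels:
# apply_ufunc moves the "time" core dimension to the last axis,
# so this computes a per-pixel range over time (max - min).
def pixel_range(arr):
    return arr.max(axis=-1) - arr.min(axis=-1)

time_range = xr.apply_ufunc(pixel_range, ds["no2"], input_core_dims=[["time"]])
```

For files, `xr.open_dataset` and `Dataset.to_netcdf` are the usual entry and exit points; with dask installed, passing `chunks=` to `open_dataset` (or calling `.chunk()` on an existing object) gives lazily evaluated, dask-backed arrays. Both need extra packages (a netCDF backend, dask), so they are left out of the runnable sketch.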
- The speaker suggested this session is suitable for data scientists.