Parallel tile fetching and CPU-and-memory statistics

The hips package now supports parallel tile fetching, using either the urllib or the aiohttp package.

With aiohttp, the fetched tile data is paired with its HipsTileMeta to create a HipsTile object as soon as the response is read. This keeps the tile data aligned with its metadata; otherwise, tiles could get swapped during drawing.

async def fetch_tile_aiohttp(url: str, meta: HipsTileMeta, session, timeout: float) -> HipsTile:
    """Fetch a HiPS tile asynchronously using aiohttp."""
    async with session.get(url, timeout=timeout) as response:
        raw_data = await response.read()
        return HipsTile(meta, raw_data)
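
To illustrate why pairing data and metadata inside the coroutine matters, here is a minimal sketch that uses only asyncio. The `TileMeta` and `Tile` classes, the `fetch_tile` function, and the simulated delays are hypothetical stand-ins, not the hips classes or a real network fetch. The tiles finish in a different order than they were requested, yet every payload still carries the right metadata.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TileMeta:
    """Hypothetical stand-in for HipsTileMeta."""
    order: int
    ipix: int


@dataclass(frozen=True)
class Tile:
    """Hypothetical stand-in for HipsTile: metadata plus raw payload."""
    meta: TileMeta
    raw_data: bytes


async def fetch_tile(meta: TileMeta, delay: float) -> Tile:
    # Simulated latency; a real fetch would read the HTTP response here.
    await asyncio.sleep(delay)
    raw_data = f"tile-{meta.order}-{meta.ipix}".encode()
    # Pair the payload with its metadata before returning.
    return Tile(meta, raw_data)


async def fetch_in_completion_order(metas, delays):
    coros = [fetch_tile(meta, delay) for meta, delay in zip(metas, delays)]
    tiles = []
    # as_completed yields results as they finish, not in request order.
    for finished in asyncio.as_completed(coros):
        tiles.append(await finished)
    return tiles


metas = [TileMeta(order=3, ipix=i) for i in range(4)]
delays = [0.04, 0.01, 0.03, 0.02]
tiles = asyncio.run(fetch_in_completion_order(metas, delays))
for tile in tiles:
    print(tile.meta.ipix, tile.raw_data)
```

Because each result is self-describing, the consumer never has to rely on list positions to work out which tile it is holding.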

We also limit the number of simultaneously open connections using the aiohttp.TCPConnector class. The connector object is passed to the aiohttp.ClientSession constructor, as the code block below shows:

async def fetch_all_tiles_aiohttp(tile_metas: List[HipsTileMeta], hips_survey: HipsSurveyProperties, progress_bar: bool, n_parallel: int, timeout: float) -> List[HipsTile]:
    """Fetch HiPS tiles from a remote URL using aiohttp."""
    connector = aiohttp.TCPConnector(limit=n_parallel)
    async with aiohttp.ClientSession(connector=connector) as session:
        futures = []
        for meta in tile_metas:
            url = hips_survey.tile_url(meta)
            future = asyncio.ensure_future(fetch_tile_aiohttp(url, meta, session, timeout))
            futures.append(future)
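
The effect of capping concurrency can be seen without a network by swapping the connector for an asyncio.Semaphore. This is only an analogy, assuming that a bounded semaphore behaves similarly enough to a connection limit for illustration; `fetch_one`, `fetch_all`, and the `stats` dictionary are hypothetical names. The sketch schedules 20 simulated fetches and records how many were ever in flight at once.

```python
import asyncio


async def fetch_one(index, limiter, stats):
    # Only n_parallel coroutines can be inside this block at a time.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated download
        stats["active"] -= 1
        return index


async def fetch_all(n_tiles, n_parallel):
    limiter = asyncio.Semaphore(n_parallel)
    stats = {"active": 0, "peak": 0}
    futures = [
        asyncio.ensure_future(fetch_one(i, limiter, stats))
        for i in range(n_tiles)
    ]
    # Await in submission order so results line up with the requests.
    results = [await future for future in futures]
    return results, stats["peak"]


results, peak = asyncio.run(fetch_all(n_tiles=20, n_parallel=5))
print("peak simultaneous fetches:", peak)
```

All 20 tasks are scheduled up front, but the peak never exceeds `n_parallel`, which is the behaviour the connector limit is meant to give for open connections.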

Another recently added feature is progress bar reporting, for which we use the tqdm package. We wrap the list of asyncio.Future objects in the tqdm function, which advances the progress bar as we await each future in turn.

if progress_bar:
    from tqdm import tqdm
    futures = tqdm(futures, total=len(tile_metas), desc='Fetching tiles')

tiles = []
for future in futures:
    tiles.append(await future)
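
One consequence of this pattern is that the futures are awaited in submission order, so the results keep the order of the requests and the bar advances in that order rather than in completion order. The sketch below shows this with a hypothetical `progress` generator that stands in for tqdm (it only records counts in a list), and simulated delays instead of real fetches.

```python
import asyncio


def progress(iterable, total, log):
    """Hypothetical minimal stand-in for tqdm: count items as they are consumed."""
    for count, item in enumerate(iterable, start=1):
        yield item
        # Execution resumes here only after the loop body, including the
        # await on this item, has finished.
        log.append(f"{count}/{total}")


async def work(index, delay):
    await asyncio.sleep(delay)  # simulated download
    return index


async def main():
    delays = [0.03, 0.01, 0.02]  # the first task is the slowest
    futures = [
        asyncio.ensure_future(work(i, d)) for i, d in enumerate(delays)
    ]
    log, results = [], []
    for future in progress(futures, total=len(futures), log=log):
        results.append(await future)
    return results, log


results, log = asyncio.run(main())
print(results, log)
```

Even though the second task finishes first, it is reported only after the first one has been awaited; all tasks still run concurrently in the background.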

Using parallel tile fetching, the overall fetch time is reduced by almost 75%¹. The statistics shown below are for the high-level make_sky_image function, the result of which can be seen on our Getting started page. The response time using synchronous fetching is:

tile-fetch-async

After adding support for asynchronous fetching, the response time improved:

tile-fetch-sync
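
A comparison of this kind can be reproduced in miniature with simulated latencies. The sketch below is not the script used for the measurements above; `fake_fetch`, the tile count, and the delay are made-up values, and the resulting numbers say nothing about real HiPS servers. It only shows how the percentage reduction is computed from two wall-clock timings.

```python
import asyncio
import time


async def fake_fetch(delay):
    await asyncio.sleep(delay)  # stands in for one tile download


async def fetch_sequential(n_tiles, delay):
    for _ in range(n_tiles):
        await fake_fetch(delay)


async def fetch_concurrent(n_tiles, delay):
    await asyncio.gather(*(fake_fetch(delay) for _ in range(n_tiles)))


def timed(coro_factory):
    """Run a coroutine to completion and return the elapsed seconds."""
    start = time.perf_counter()
    asyncio.run(coro_factory())
    return time.perf_counter() - start


n_tiles, delay = 8, 0.05
t_seq = timed(lambda: fetch_sequential(n_tiles, delay))
t_par = timed(lambda: fetch_concurrent(n_tiles, delay))

# Percentage reduction relative to the sequential baseline.
reduction = 100 * (t_seq - t_par) / t_seq
print(f"sequential {t_seq:.2f}s, concurrent {t_par:.2f}s, "
      f"reduction {reduction:.0f}%")
```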

To monitor how this affects our CPU and memory usage, we use the open-source package psrecord, which builds on psutil.

The package provides a command-line interface that takes a process ID (PID) and monitors that process's activity. In addition to the standard text log, it can plot the result using matplotlib.

plt-make-sky-image

To fully understand the above plot for the make_sky_image function, let's look at the primary steps involved in the HipsPainter class:

Asynchronously fetch the HiPS tiles
Split each parent tile into four child tiles to fix the tile distortion issue
Apply projective transformation to each tile

The entire process took around \(50\) seconds to finish, of which almost \(40\) seconds were spent on tile fetching. In the beginning, CPU consumption is high because of the creation of threads and Future objects. After this, the CPU is almost idle for around \(30\) seconds while we await the results.

Towards the end, a large rise can be seen in both CPU and memory consumption: this is the drawing phase. Because we apply a projective transformation to each tile separately, this step is computationally expensive.

The full activity text log can be viewed for a detailed analysis. A document containing a response-time comparison between urllib, grequests, aiohttp, and asyncio can be viewed as well. These response times were calculated using this Python script.

¹ This percentage was calculated using urllib.