Parallel tile fetching and CPU-and-memory statistics

The hips package now supports parallel tile fetching, using either urllib or aiohttp.

With aiohttp, the fetched tile data is coupled with its HipsTileMeta to create a HipsTile object. This ensures that tile data cannot become misaligned with its metadata. Without this coupling, tiles could be swapped during drawing.

async def fetch_tile_aiohttp(url: str, meta: HipsTileMeta, session, timeout: float) -> HipsTile:
    """Fetch a HiPS tile asynchronously using aiohttp."""
    async with session.get(url, timeout=timeout) as response:
        raw_data = await response.read()
        return HipsTile(meta, raw_data)
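A small, self-contained simulation shows why the metadata needs to travel with the data. This is a sketch only. `Meta`, `Tile`, and `fake_fetch` are hypothetical stand-ins for HipsTileMeta, HipsTile, and the aiohttp request, and random delays replace network latency. Results collected with asyncio.as_completed arrive in completion order rather than request order. A tile that did not carry its own metadata could therefore end up drawn in the wrong position.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for HipsTileMeta: identifies one tile."""
    order: int
    ipix: int


@dataclass
class Tile:
    """Hypothetical stand-in for HipsTile: the metadata travels with the data."""
    meta: Meta
    raw_data: bytes


async def fake_fetch(meta: Meta) -> Tile:
    # Simulated network latency, so responses finish in an unpredictable order.
    await asyncio.sleep(random.uniform(0.0, 0.02))
    return Tile(meta, f"pixels-{meta.ipix}".encode())


async def fetch_all(metas: List[Meta]) -> List[Tile]:
    tasks = [asyncio.ensure_future(fake_fetch(m)) for m in metas]
    tiles = []
    # as_completed yields results in completion order, not request order.
    for next_done in asyncio.as_completed(tasks):
        tiles.append(await next_done)
    return tiles


metas = [Meta(order=3, ipix=i) for i in range(8)]
tiles = asyncio.run(fetch_all(metas))
# Whatever order the tiles arrived in, each one still knows where it belongs.
for tile in tiles:
    print(tile.meta.ipix, tile.raw_data)
```

Because the pairing happens inside the coroutine that performs the request, no later step has to reconstruct which response belongs to which tile.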

We also limit the number of simultaneously open connections with the aiohttp.TCPConnector class. The connector instance is passed to the aiohttp.ClientSession constructor. The whole procedure is shown in the code block below:

import asyncio
from typing import List

import aiohttp

# HipsTile, HipsTileMeta and HipsSurveyProperties come from the hips package.
async def fetch_all_tiles_aiohttp(tile_metas: List[HipsTileMeta], hips_survey: HipsSurveyProperties,
                                  progress_bar: bool, n_parallel: int, timeout: float) -> List[HipsTile]:
    """Fetch HiPS tiles from a remote URL using aiohttp."""
    # Cap the number of simultaneously open connections at n_parallel.
    connector = aiohttp.TCPConnector(limit=n_parallel)
    async with aiohttp.ClientSession(connector=connector) as session:
        futures = []
        for meta in tile_metas:
            url = hips_survey.tile_url(meta)
            # Schedule the request now; its result is awaited later.
            future = asyncio.ensure_future(fetch_tile_aiohttp(url, meta, session, timeout))
            futures.append(future)

Another recently added feature is progress bar reporting with the tqdm package. It works by wrapping the list of asyncio.Future objects in tqdm. Iterating over the wrapped list and awaiting each future in turn advances the bar.

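        # Continues inside the `async with` block of fetch_all_tiles_aiohttp.
        if progress_bar:
            from tqdm import tqdm
            futures = tqdm(futures, total=len(tile_metas), desc='Fetching tiles')

        tiles = []
        for future in futures:
            # Awaiting in list order keeps tiles in the same order as tile_metas.
            tiles.append(await future)

    return tiles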
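You can try both mechanisms above without network access or extra dependencies: the cap on concurrent connections and the progress counter that advances as futures are awaited. In this sketch, asyncio.Semaphore stands in for aiohttp.TCPConnector(limit=n_parallel). This is a swapped-in technique for illustration and not what hips uses. A plain callback stands in for tqdm. `fake_fetch`, `fetch_all`, and `print_progress` are hypothetical names.

```python
import asyncio
from typing import Callable, List, Tuple


async def fake_fetch(i: int, sem: asyncio.Semaphore, state: dict) -> int:
    """Hypothetical tile request that records how many run at the same time."""
    async with sem:  # at most n_parallel requests get past this point at once
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # simulated network latency
        state["active"] -= 1
    return i


async def fetch_all(n_tiles: int, n_parallel: int,
                    report: Callable[[int, int], None]) -> Tuple[List[int], int]:
    sem = asyncio.Semaphore(n_parallel)
    state = {"active": 0, "peak": 0}
    # Schedule every request up front, as fetch_all_tiles_aiohttp does.
    futures = [asyncio.ensure_future(fake_fetch(i, sem, state)) for i in range(n_tiles)]
    results = []
    for done, future in enumerate(futures, start=1):
        results.append(await future)  # list order matches request order
        report(done, n_tiles)         # progress advances once per awaited future
    return results, state["peak"]


def print_progress(done: int, total: int) -> None:
    print(f"Fetching tiles: {done}/{total}")


results, peak = asyncio.run(fetch_all(20, 5, print_progress))
print(f"peak concurrent requests: {peak}")
```

With 20 requests and a limit of 5, no more than five simulated requests are ever in flight. The results still come back in request order because the futures are awaited in list order.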

With parallel tile fetching, the overall fetch time is reduced by almost 75%**. The statistics below are for the high-level make_sky_image function, whose output can be seen on our Getting started page. The response time with synchronous fetching is:

tile-fetch-async

With asynchronous fetching enabled, the response time improves:

tile-fetch-sync
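The percentage reduction quoted above is 100 × (1 - t_async / t_sync). The sketch below computes it and, as a rough illustration only, compares sequential and concurrent execution of simulated requests with a fixed, hypothetical latency. It does not reproduce the measurements shown in the figures.

```python
import asyncio
import time


def percent_reduction(t_sync: float, t_async: float) -> float:
    """Relative reduction in response time, in percent."""
    return 100.0 * (1.0 - t_async / t_sync)


async def fake_request(latency: float) -> None:
    await asyncio.sleep(latency)  # stands in for waiting on the network


async def sequential(n: int, latency: float) -> None:
    for _ in range(n):
        await fake_request(latency)  # each request waits for the previous one


async def concurrent(n: int, latency: float) -> None:
    await asyncio.gather(*(fake_request(latency) for _ in range(n)))


def timed(coro) -> float:
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


t_sync = timed(sequential(10, 0.02))
t_async = timed(concurrent(10, 0.02))
print(f"sync {t_sync:.3f}s, async {t_async:.3f}s, "
      f"reduction {percent_reduction(t_sync, t_async):.0f}%")
```

For example, dropping from 40 s to 10 s is a 75% reduction. Real tile servers add bandwidth limits and server-side queuing, so measured gains are smaller than this idealized simulation suggests.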

To monitor how this affects our CPU and memory usage, we use the open-source package psrecord, which is built on top of psutil.

psrecord provides a command-line interface that takes the process ID (PID) of the process to monitor. In addition to writing a plain-text log, it can plot the recorded activity with matplotlib.
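psrecord samples a process's CPU and memory usage at a fixed interval. A minimal sketch of such a sampling loop, written directly against psutil, could look like the following. `sample_process` is a hypothetical helper and not part of psrecord's API. It only records values and does not plot them.

```python
import os
import time
from typing import List, Tuple

import psutil


def sample_process(pid: int, n_samples: int, interval: float) -> List[Tuple[float, float, float]]:
    """Return (elapsed seconds, CPU %, resident memory in MB) for each sample."""
    proc = psutil.Process(pid)
    proc.cpu_percent(interval=None)  # first call only primes the CPU counter
    start = time.perf_counter()
    samples = []
    for _ in range(n_samples):
        time.sleep(interval)
        cpu = proc.cpu_percent(interval=None)  # CPU % since the previous call
        rss_mb = proc.memory_info().rss / 1024 ** 2
        samples.append((time.perf_counter() - start, cpu, rss_mb))
    return samples


for elapsed, cpu, mem in sample_process(os.getpid(), n_samples=3, interval=0.1):
    print(f"{elapsed:6.2f} s  CPU {cpu:5.1f} %  RSS {mem:7.1f} MB")
```

psrecord itself is run from the command line, for example psrecord <pid> --log activity.txt --plot plot.png.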

plt-make-sky-image

To understand the plot above for the make_sky_image function, let's look at the primary steps performed by the HipsPainter class:

  • Asynchronously fetch the HiPS tiles
  • Split each parent tile into four child tiles to fix the tile distortion issue
  • Apply projective transformation to each tile
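The splitting step relies on the HEALPix nested numbering used by HiPS. A tile at order k with index ipix has four children at order k + 1, with indices 4·ipix through 4·ipix + 3. The sketch below computes those indices and cuts a square pixel array into four quadrants. `child_indices` and `split_quadrants` are hypothetical helpers. The quadrant-to-child mapping depends on the tile orientation convention, which this sketch does not model.

```python
from typing import List, Tuple

Grid = List[List[int]]


def child_indices(order: int, ipix: int) -> Tuple[int, List[int]]:
    """HEALPix nested scheme: the children of (order, ipix) live at order + 1."""
    return order + 1, [4 * ipix + i for i in range(4)]


def split_quadrants(data: Grid) -> List[Grid]:
    """Cut a square pixel grid with an even side length into four equal quadrants."""
    n = len(data)
    if n % 2 or any(len(row) != n for row in data):
        raise ValueError("expected a square grid with an even side length")
    h = n // 2
    top, bottom = data[:h], data[h:]
    return [
        [row[:h] for row in top],     # top-left
        [row[h:] for row in top],     # top-right
        [row[:h] for row in bottom],  # bottom-left
        [row[h:] for row in bottom],  # bottom-right
    ]


order, children = child_indices(3, 10)
quads = split_quadrants([[r * 4 + c for c in range(4)] for r in range(4)])
print(order, children)
```

Each child covers a quarter of the parent's sky area. Transforming four smaller tiles instead of one larger tile is what reduces the distortion introduced by the projective transformation.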

The entire process took around 50 seconds, of which almost 40 seconds were spent fetching tiles. At the beginning, CPU consumption is high because of the creation of threads and Future objects. After that, the CPU is almost idle for around 30 seconds while we await the results.

Finally, both CPU and memory consumption rise sharply. This is the drawing phase. Because the projective transformation is applied to each tile separately, this phase is computationally expensive.
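A projective transformation (homography) maps pixel coordinates through a 3×3 matrix in homogeneous coordinates. Four corner correspondences are enough to determine it. The sketch below uses NumPy to estimate such a matrix by solving the standard 8×8 linear system, and then applies it to points. It illustrates the mapping only and is not the hips implementation. Warping an actual tile additionally means evaluating the mapping and interpolating for every output pixel. Repeating that for every tile is why the drawing phase is costly.

```python
import numpy as np


def estimate_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for H (with H[2, 2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(h: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through H using homogeneous coordinates."""
    ones = np.ones((len(points), 1))
    mapped = np.hstack([points, ones]) @ h.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w to leave homogeneous space


# Map the corners of a unit tile onto an arbitrary (non-degenerate) quadrilateral.
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst = np.array([[0.0, 0.0], [2.0, 0.2], [2.1, 1.9], [-0.1, 1.5]])
H = estimate_homography(src, dst)
centre = apply_homography(H, np.array([[0.5, 0.5]]))
print(centre)
```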

The full activity text log is available for detailed analysis. A document comparing response times for urllib, grequests, aiohttp, and asyncio is also available. These response times were measured using this Python script.

** This percentage was calculated using urllib
