The Two-Factor Authentication System at CERN

This blog post is a summary of my project at the European Organization for Nuclear Research (CERN) where I worked in their Identity and Access Management (IAM) team.

My project was to introduce Two-Factor Authentication (2FA) in the Keycloak system. CERN started migrating to the Keycloak Identity Provider (IdP) as part of the MALT project, which aimed to move away from Microsoft products. The project was cancelled in late 2021; however, some services were allowed to continue development. The IAM team was allowed to continue development of the CERN SSO, which was based on Keycloak.

We developed a custom 2FA implementation with Keycloak which allowed users to optionally log in from the SSO login page. Internally, this setup posed a few …

Rate limiting in HAProxy and Nginx

Rate-limiting is a common strategy for safeguarding a server from potential DDoS attacks or sudden peaks in network traffic. It instructs the server to block requests from IP addresses that send an unusually high number of requests to the system.
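To make the idea concrete, here is a minimal Python sketch of per-IP rate-limiting with a sliding time window. It is only an illustration of the concept and not the mechanism HAProxy or Nginx use internally; the class name, its parameters, and the example IP addresses are all assumptions made for this sketch.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client IP within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, whitelist=()):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.whitelist = set(whitelist)  # IPs that are never limited
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True if the request should be served, False if blocked."""
        if ip in self.whitelist:
            return True
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical usage: 3 requests per 10 seconds, one whitelisted address.
limiter = SlidingWindowLimiter(3, 10.0, whitelist={"10.0.0.1"})
decisions = [limiter.allow("192.0.2.7", now=t) for t in (0, 1, 2, 3, 11)]
```

The fourth request is rejected because three requests already fall inside the window, while the fifth is accepted again once the oldest ones have expired. Whitelisted addresses bypass the counter entirely.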

We can apply rate-limiting to both Nginx and HAProxy. Nginx runs on each end node hosting the service, while HAProxy serves as the load-balancer and distributes incoming requests among available nodes. This post describes how to rate-limit requests on both Nginx and HAProxy and shows how to whitelist IPs and rate-limit a single URL. The final section shows how to apply this configuration in Puppet.

1. Rate-limiting in HAProxy

This section describes how to configure HAProxy to rate-limit …

Creating a JSON logger for Flask

By default, Flask writes logs to the console in plain-text format. This can be limiting if you intend to store your logs in a text file and periodically send them to a central monitoring service. For example, Kibana only accepts JSON logs by default.

You might also want to enrich your logs with additional metadata, e.g. timestamps, method names, and log type (Warn, Debug, etc.). In this post we will use the Python logging library to modify Flask’s logging format and write the logs to a text file. At the end we will see how to periodically send these logs to an external service using Flume.
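As a rough illustration of the approach, the sketch below subclasses logging.Formatter so that every record is emitted as one JSON object per line. The class name, the helper function, and the choice of fields are assumptions for this example, not necessarily what the full post uses.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "function": record.funcName,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_json_file_handler(path):
    """Create a handler that appends JSON log lines to the file at `path`."""
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonFormatter())
    return handler
```

Since a Flask application's app.logger is a standard logging.Logger, a handler built this way could be attached with app.logger.addHandler(build_json_file_handler("app.log")), where the file name is a placeholder.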

In our app we would like to set up two types of loggers. One for …

My course portfolio for Computational Photography

This semester I took Georgia Tech’s Computational Photography course. It was a very hands-on course, consisting mostly of assignments and projects. This post includes the results for all its assignments and projects.

1. Pyramid blending

The goal of this assignment was to combine two separate images into a seamlessly blended image, using a mask. The input took a left image, a right image, and a mask, which was a binary image used to overlap the two inputs.

Pyramid blending

2. Panoramas

For this assignment we wrote code to align and stitch together a series of images into a panorama. We followed the book Computer Vision: Algorithms and Applications and used homography techniques to create the output picture.


After stitching …

Building a Camera Obscura

A camera obscura is the predecessor of modern-day cameras. It works by letting light in through a small pinhole and projecting it onto a surface (e.g. a wall).

To build a room obscura we need to choose a room which gets plenty of sunlight. I chose my bedroom for this.


Next we need to identify all the sources of light and completely seal them. In my case I used garbage bags to cover my two windows.


I used packing tape instead of duct tape, as duct tape can tear paint off the wood. However, this led to some light bleeding in from the edges. I fixed this using aluminum foil.


I experimented with three pinholes of various sizes …

Passwordless logins with Yubikey

Yubikey is currently the de facto device for U2F authentication. It enables adding an extra layer of security on top of SSH, system login, signing GPG keys, and so on. It is also compatible with several other authentication methods, such as WebAuthn and PAM.

This post will show how to leverage your Yubikey for unlocking the system lock-screen, both with and without using a password. It will then delve into how to automatically lock the screen when the Yubikey is unplugged.

To achieve logins with Yubikeys we require a PAM configuration. PAM or Pluggable Authentication Modules define the authentication flow for common Linux utilities, such as sudo, su, and passwd. We will override the default authentication flow for the xlock …

Plotting graphical data using RRDtool and a Python Collectd plugin

Collectd is a Unix daemon used for periodically collecting system usage statistics, which can help identify CPU or memory bottlenecks. The collected data can then be transformed into graphs using RRDtool or a Grafana dashboard (Grafana provides real-time graphs and complex search queries).

The daemon itself is modular and functions through external plugins, with each plugin performing a distinct function. This post will explore a plugin which collects weather information for a given city. The first section will explain how the plugin configuration works and how to plot a graph of the output data using RRDtool. Finally, we will delve into the plugin internals and see how it is written.

Note: For an intro on how to set up Collectd …

The Kerberos Authentication System for Single Sign-On (SSO)

When working with authentication protocols the commonly used technique in the past was known as authentication by assertion. In this scheme a user logs in to their machine which then authenticates their request to a remote server. Once the authentication is finished the user can then communicate with other services. This provides a very low level of security, which has led to numerous vulnerabilities in the early versions of the rlogin Unix login utility.

An alternative solution is for the user to repeatedly provide their password each time they wish to use a service. This however requires the user to send their plain text password over the network, which could potentially be intercepted by a third-party user and can get …

Programmatically organising your backpacking trip using Google My Maps

This blog post has been converted from a presentation I gave during the Thematic CERN School of Computing 2019.

When planning a journey to a new country or city, it helps to mark down all the places you would like to visit and eventually create a travel plan for each day. I personally use Google Maps for finding places of interest, including historical buildings, museums, and libraries. As an example, if I were to visit, say, Split, Croatia, I could search for “places to visit split” on Google Maps. It will then list all the attractions based on features such as reviews and popularity.

Things to do in Split

Although it is possible to individually “Save” each place in Google Maps, it does not …

Building RPM packages with rpmbuild, Koji, and GitLab-CI

The RPM system lets the user query and update software packages. It also allows examining package interdependencies and verifying package file permissions. This blog post will describe the process of building an RPM package using the rpmbuild utility and will then explain how to schedule build tasks using Koji. Finally, it will describe how to automate the build pipeline using continuous integration in GitLab.

1. RPM Package Manager

RPM Package Manager is an open-source package management system which was originally designed for Red Hat Linux, but it is now supported on most Linux distributions. RPM packages can generally be of two types:

  • Binary RPM: A binary RPM contains the compiled binary of a complete application (or a library …

Google Summer of Code 2018 final evaluation report

Link to GitHub repository: https://github.com/BoostGSoC18/geometry

The work is present under the following branches:


The goal of this project was to implement the direct and inverse geodesic algorithms in the Boost Geometry library. These methods were proposed by Charles Karney in his paper in 2011.

In a previous blog post, the inaccuracy of the existing methods was discussed, which provided inconsistent results for nearly antipodal points. To monitor the progress, a weekly report was provided through GitHub, which summarized the work done. Finally, benchmarks were performed against existing methods in Boost Geometry. The performance metrics used were execution time and accuracy.

Additional material, such as utility scripts for parsing the …

Using variadic templates with lambda expressions in C++ for constrained optimization

Constrained optimization problems are encountered in numerous domains, such as protein folding, Magnetic Resonance Image reconstruction, and radiation therapy. In such a problem, we are given an objective function which is to be minimized or maximized subject to constraints on some variables. The constraints can be either soft or hard, and can be specified using boolean operators, such as equality, relational, and conditional operators.
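The idea of expressing hard constraints as small boolean functions can be sketched in a few lines. The example below is written in Python rather than C++, purely for brevity, with lambda expressions standing in for constraints and *args standing in for a variable-length constraint list. The function names, the objective, and the brute-force search over candidates are assumptions made for illustration.

```python
def is_feasible(point, *constraints):
    """A point is feasible when every hard constraint evaluates to True."""
    return all(constraint(point) for constraint in constraints)


def best_feasible(objective, candidates, *constraints):
    """Brute-force minimizer: keep feasible candidates, return the best one."""
    feasible = [p for p in candidates if is_feasible(p, *constraints)]
    return min(feasible, key=objective) if feasible else None


# Minimize the squared distance to (3, 3) on a small integer grid,
# subject to x + y <= 4 and x >= 1.
grid = [(x, y) for x in range(5) for y in range(5)]
solution = best_feasible(
    lambda p: (p[0] - 3) ** 2 + (p[1] - 3) ** 2,
    grid,
    lambda p: p[0] + p[1] <= 4,
    lambda p: p[0] >= 1,
)
```

Any number of constraints can be passed without changing the function signature, which is the property that variadic templates provide in the C++ setting.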

This post provides insight on how to model constraints using lambda expressions, and how to pass a varying number of constraints to a function using variadic templates. Before moving on with the C++ implementation, it will be helpful to review how variadic functions are used in C and how they differ from the …

Inaccuracy in Boost Geometry geodesic algorithms for nearly antipodal points

Antipodal points, or antipodes, are the two most geographically distant points on a sphere, that is, points that are diametrically opposite to each other. If a line is drawn between these two points, it passes through the center of the sphere and forms its diameter. Nearly antipodal points are pairs that lie close to this configuration.
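For intuition, the following sketch computes the antipode of a point and the great-circle distance between two points on a perfect sphere using the haversine formula. This is a simplified spherical model and not one of the ellipsoidal geodesic algorithms in Boost Geometry; the function names and the mean Earth radius value are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation


def antipode(lat_deg, lon_deg):
    """Return the diametrically opposite point as (lat, lon) in degrees."""
    anti_lon = lon_deg + 180.0 if lon_deg <= 0 else lon_deg - 180.0
    return -lat_deg, anti_lon


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0: rounding can push the value slightly above it
    # exactly in the antipodal case.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


start = (40.0, -75.0)
opposite = antipode(*start)
distance = great_circle_km(*start, *opposite)  # about half the circumference
```

Even in this simple model the antipodal case needs care: the argument of asin sits at the edge of its domain, where small rounding errors are amplified.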

Computing the great circle distance between these two points is often a corner case for most geodesic computations, and the distance is either overestimated or underestimated. In case of Vincenty’s formulae, the solution fails to converge, or provides inaccurate results. This can have major implications in applications which rely on accurate results, such as flight navigation systems. The software can handle this either by doing an error analysis check and providing specific values …

An overview of activation functions used in neural networks

An activation function is used to introduce non-linearity in an artificial neural network. It allows us to model a class label or score that varies non-linearly with independent variables. Non-linearity means that the output cannot be replicated from a linear combination of inputs; this allows the model to learn complex mappings from the available data, and thus the network becomes a universal approximator. On the other hand, a model which uses a linear function (i.e. no activation function) is unable to make sense of complicated data, such as speech and video, and is effective only up to a single layer.
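A tiny numerical check illustrates why stacking layers without an activation function gains nothing. In the sketch below, a two-layer scalar "network" with hand-picked weights (all values are arbitrary assumptions) remains an affine function unless an activation such as ReLU is placed between the layers.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def two_layer(x, activation=None):
    """Two stacked scalar layers with an optional activation in between."""
    hidden = 2.0 * x - 1.0           # first layer
    if activation is not None:
        hidden = activation(hidden)  # the only source of non-linearity
    return -3.0 * hidden + 0.5       # second layer


def midpoint_gap(f, a, b):
    """Zero for any affine f; a non-zero value reveals non-linearity."""
    return f((a + b) / 2) - (f(a) + f(b)) / 2


linear_gap = midpoint_gap(two_layer, -1.0, 2.0)
relu_gap = midpoint_gap(lambda x: two_layer(x, relu), -1.0, 2.0)
```

Without an activation the two layers collapse into the single affine map -6x + 3.5, so the gap is zero. With ReLU in between, the gap is non-zero.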

To allow backpropagation through the network, the selected activation function should be differentiable. This property is required to compute …

Parallel tile fetching and CPU-and-memory statistics

The hips package now supports parallel tile fetching. The user can achieve this using either the urllib or the aiohttp package.

In the case of aiohttp, the fetched tile data is coupled with HipsTileMeta to create a HipsTile object. This ensures there is no misalignment of tile data; otherwise, tiles could get swapped during the drawing period.

async def fetch_tile_aiohttp(url: str, meta: HipsTileMeta, session, timeout: float) -> HipsTile:
    """Fetch a HiPS tile asynchronously using aiohttp."""
    async with session.get(url, timeout=timeout) as response:
        raw_data = await response.read()
        return HipsTile(meta, raw_data)

We also limit the number of simultaneously open connections using the aiohttp.TCPConnector class. The returned object is passed to aiohttp.ClientSession’s __init__ method. This procedure can be understood in …

Google Summer of Code 2017 final evaluation report

Link to GitHub repository: http://github.com/hipspy/hips

In addition to the main hips repository, I also maintained my personal HIPS-to-Py repository on GitHub. This contains Jupyter notebooks which showcase the functionality in hips and numerous related Python scripts. The Wiki page contains a short description of hips. It also contains links to resource documents and telcon notes, which are hosted on Google Docs.

List of Pull Requests

Work related to HiPS tile drawing

Fixing tile distortion issue in hips package

As documented in the tile distortion issue section, the previous technique for drawing HiPS tiles introduces astrometry offsets for distorted tiles.

An example of such distortions can be viewed at this link (uncheck “Activate deformations reduction algorithm” to view the astrometry offsets): http://cds.unistra.fr/~boch/AL/test-reduce-deformations2.html

To overcome this issue, the parent tile is divided into four child tiles if it meets either of the following two criteria:

  • One edge is greater than 300 pixels when projected
  • Or, the ratio of the smaller diagonal to the larger diagonal is less than 0.7 and one of the diagonals is greater than 150 pixels when projected
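The two criteria can be sketched as a small standalone check. The function below is a hypothetical illustration and not the implementation in the hips package: it assumes the corners are given as four (x, y) pixel positions listed in order around the tile, and its name is made up for this example.

```python
import math


def is_distorted_sketch(corners):
    """Apply the two distortion criteria to four projected (x, y) corners."""
    edges = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    diagonals = [
        math.dist(corners[0], corners[2]),
        math.dist(corners[1], corners[3]),
    ]
    longest_diagonal = max(diagonals)
    if longest_diagonal == 0:
        return False  # degenerate tile: all corners coincide
    diagonal_ratio = min(diagonals) / longest_diagonal
    # Criterion 1: one edge is longer than 300 pixels.
    # Criterion 2: diagonal ratio below 0.7 and one diagonal above 150 pixels.
    return max(edges) > 300 or (diagonal_ratio < 0.7 and longest_diagonal > 150)


small_square = [(0, 0), (100, 0), (100, 100), (0, 100)]
large_square = [(0, 0), (400, 0), (400, 400), (0, 400)]
thin_rhombus = [(0, 0), (100, 20), (200, 0), (100, -20)]
```

Here small_square passes both checks, large_square fails the edge criterion, and thin_rhombus fails the diagonal criterion, so the latter two would be split into child tiles.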

For handling these checks, a function is_tile_distorted is introduced:

def is_tile_distorted(corners: tuple) -> bool …

RGB tile drawing in hips package

The hips package now supports RGB tile drawing. To make this possible, the output image dimensions had to be altered according to the following configuration:

The output image shape is two dimensional for grayscale, and three dimensional for color images:

  • shape = (height, width) for FITS images with one grayscale channel
  • shape = (height, width, 3) for JPG images with three RGB channels
  • shape = (height, width, 4) for PNG images with four RGBA channels
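The mapping from tile format to output shape can be written down directly. This helper is only a sketch of the rule listed above; the function name and the lowercase format strings are assumptions and do not reflect the hips package API.

```python
def output_shape(tile_format, height, width):
    """Return the output image shape for a given tile format."""
    channels = {"fits": None, "jpg": 3, "png": 4}  # None means grayscale
    if tile_format not in channels:
        raise ValueError(f"Unsupported tile format: {tile_format!r}")
    depth = channels[tile_format]
    return (height, width) if depth is None else (height, width, depth)
```

For instance, output_shape("png", 512, 512) yields a shape with a trailing dimension of four for the RGBA channels.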

In addition to this, in the case of the JPG and PNG formats, the tiles are flipped in the vertical direction, which leads to incorrect tile drawing using the previous technique. The figure below is taken from the hips paper, figure 6, which shows the inverted tiles.

HiPS inverted tiles figure

To overcome this, the …

Parameterized testing using Pytest

Pytest provides a feature for parameterized testing in Python. The built-in pytest.mark.parametrize decorator enables parametrization of arguments for a test function. This allows the user to compare the values for input and output.

Here is a typical example which shows its usage:

get_hips_order_for_resolution_pars = [
    dict(tile_width=512, resolution=0.01232, resolution_res=0.06395791924665553, order=4),
    dict(tile_width=256, resolution=0.0016022, resolution_res=0.003997369952915971, order=8),
    dict(tile_width=128, resolution=0.00009032, resolution_res=0.00012491781102862408, order=13),
]

@pytest.mark.parametrize('pars', get_hips_order_for_resolution_pars)
def test_get_hips_order_for_resolution(pars):
    hips_order = _get_hips_order_for_resolution(pars['tile_width'], pars['resolution'])
    assert hips_order == pars['order']
    hips_resolution = hp.nside2resol(hp.order2nside(hips_order))
    assert_allclose(hips_resolution, pars['resolution_res'])

Without the support of parameterized testing, the code would have to be duplicated three times …

Creating custom decorators in Python 3.6

In the hips package, often data has to be fetched from remote servers, especially HiPS tiles. One way to cut back on the queries was by introducing the hips-extra repository. This contains HiPS tiles from various HiPS surveys. This allows us to quickly fetch tiles from local storage, which makes the testing process less time-consuming.

As the hips-extra repository does not come with the standard hips package, the user has to clone it manually. Its availability is checked using an environment variable. This can be set using:

$ export HIPS_EXTRA=/path/to/hips-extra

In Python, the path can be retrieved using the os module: os.environ['HIPS_EXTRA']. Now, what if the user does not have the hips-extra repository …

HiPS tile drawing

One of the major parts of the hips package is the ability to draw HiPS tiles onto a larger sky image. This involves using a projective transformation for computing and drawing a HiPS tile at the correct location. The discussion below is for the tile containing the galactic center pixel values. To achieve this, several steps are involved.

Computing boundaries of a HiPS tile

A tile is defined by four corners. hips uses the astropy_healpix.HEALPix.boundaries_skycoord function, which returns the angles ($\theta$ and $\phi$) in radians wrapped inside the astropy.coordinates.SkyCoord class. This contains the four corners of a HiPS tile in the order (North, West, South, East). A snippet which computes the corners of a HiPS tile is provided …

Type annotations in Python 3.6 and using Mypy as a static type checker

The main goal of type annotations is to open up Python code for static analysis. It makes it easier to debug and maintain code because each type is explicitly stated. It also makes the code review process simpler as the parameters and return types can be inferred from the function header. These changes were introduced in PEP 484.

In this regard, static type checking is the most important. It allows support for off-line third-party type checkers, such as Mypy, which will be introduced in a later section.
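As a small illustration, the function below is annotated with PEP 484 style type hints. The function itself is a made-up example. At runtime Python does not enforce these annotations, but they can be inspected with typing.get_type_hints, and a static checker such as Mypy can use them to flag mismatched calls before the code runs.

```python
from typing import List, Optional, get_type_hints


def mean_resolution(resolutions: List[float],
                    default: Optional[float] = None) -> Optional[float]:
    """Return the mean of `resolutions`, or `default` for an empty list."""
    if not resolutions:
        return default
    return sum(resolutions) / len(resolutions)


# The annotations are ordinary metadata available for introspection.
hints = get_type_hints(mean_resolution)

# A call such as mean_resolution("abc") would be reported by a static
# type checker, because a str is not a List[float].
```

The parameter and return types can be read straight from the function header, which is the benefit for code review mentioned above.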

Purpose of annotations

The typing module in Python 3.6 contains many definitions that are useful in statically typed code. For instance, the Any type is used by default for every argument and …

An overview of Hierarchical Progressive Surveys (HiPS) and the HEALPix framework

Hierarchical Progressive Surveys (HiPS) is a scheme for describing astronomical images that provides a solution for managing large amounts of data. Underneath, HiPS utilizes the HEALPix framework for mapping a sphere (in this case, part of the sky) and transforms it into HiPS tiles and HiPS pixels, which contain the astronomical data. The HiPS scheme emphasizes usability and abstracts the scientific details to reach a wider audience. This can be further built upon for statistical analysis of large datasets. A brief overview of HEALPix is given below before moving on to the workings of HiPS.
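As a quick preview of the quantities involved, the standard HEALPix relations between order, nside, pixel count, and pixel size fit in a few lines of plain Python. The function names here are made up for this sketch; libraries such as healpy provide equivalent helpers.

```python
import math


def order_to_nside(order):
    """Each increase in order doubles the resolution parameter nside."""
    return 2 ** order


def nside_to_npix(nside):
    """A HEALPix map has 12 base pixels, each split into nside * nside."""
    return 12 * nside * nside


def nside_to_resolution(nside):
    """Approximate pixel size in radians: square root of the pixel area."""
    return math.sqrt(4 * math.pi / nside_to_npix(nside))


nside = order_to_nside(4)                # 16
npix = nside_to_npix(nside)              # 3072
resolution = nside_to_resolution(nside)  # roughly 0.064 radians
```

All pixels have equal area, so the resolution follows from dividing the full sphere (4π steradians) by the number of pixels.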

Introduction to HEALPix

HEALPix, an acronym of ‘Hierarchical Equal Area isoLatitude Pixelization of a sphere’, is a framework for discretizing high resolution data. It …

An introduction to coordinate systems used in Astronomy

From Wikipedia:

In geometry, a coordinate system is a system which uses one or more numbers, or coordinates, to uniquely determine the position of the points or other geometric elements on a manifold such as Euclidean space.

The following text briefly explains the coordinate systems being used in astronomy, some of which are listed below:


RA (right ascension) and DEC (declination) are the longitudes and latitudes of the sky. RA corresponds to east / west direction, similar to longitude, while DEC measures north / south directions, like latitude.
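Because RA and DEC behave like longitude and latitude, a sky position can be converted to a unit vector on the celestial sphere, and the angle between two positions follows from a dot product. The sketch below uses only the math module; the function names are assumptions for this example, and libraries such as Astropy offer far more complete tools.

```python
import math


def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert RA/DEC in degrees to Cartesian (x, y, z) on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angle in degrees between two sky positions."""
    v1 = radec_to_unit_vector(ra1, dec1)
    v2 = radec_to_unit_vector(ra2, dec2)
    dot = sum(a * b for a, b in zip(v1, v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

For example, the celestial north pole (DEC = 90) is separated by 90 degrees from every point on the celestial equator (DEC = 0), regardless of RA.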


World Coordinate System (WCS) is a set of transformations that map pixel locations in an image to their real-world units, such as their position on the sky sphere. These transformations can …

A comparison of response times using URLLib, GRequests, and asyncio

For the HiPS client, multiple tiles have to be fetched, and doing so concurrently saves time. To achieve this, we create a separate thread for each outgoing request, so that requests are sent concurrently. The comparison is done using Python’s threading library, and the elapsed time is calculated using the time module. For fetching the tiles, the urllib, grequests, aiohttp, and asyncio packages are used. The HiPS survey chosen for this comparison is alasky.u-strasbg.fr.
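The thread-per-request idea can be sketched without touching the network. In the example below, fake_fetch is a hypothetical stand-in that sleeps for a fixed delay instead of downloading a tile, and the "tile-N" names are placeholders, so the timings it produces only illustrate the mechanism and are unrelated to the measurements reported in this post.

```python
import threading
import time


def fake_fetch(url, results, index, delay=0.05):
    """Simulate a slow request and store the result at a fixed position."""
    time.sleep(delay)  # stands in for network latency
    results[index] = f"data from {url}"


def fetch_sequential(urls):
    results = [None] * len(urls)
    for index, url in enumerate(urls):
        fake_fetch(url, results, index)
    return results


def fetch_threaded(urls):
    results = [None] * len(urls)
    threads = [
        threading.Thread(target=fake_fetch, args=(url, results, index))
        for index, url in enumerate(urls)
    ]
    for thread in threads:
        thread.start()  # all requests are now in flight together
    for thread in threads:
        thread.join()   # wait until every request has finished
    return results


def timed(func, urls):
    start = time.perf_counter()
    output = func(urls)
    return output, time.perf_counter() - start


urls = [f"tile-{i}" for i in range(10)]
sequential_data, sequential_time = timed(fetch_sequential, urls)
threaded_data, threaded_time = timed(fetch_threaded, urls)
```

Writing each result to a pre-assigned index keeps the output in request order even though the threads may finish in any order.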

For fetching 10 tiles, it takes the following time (in seconds):

Elapsed Time URLLib (without concurrency): 3.5430831909179688
Elapsed Time URLLib (with concurrency): 0.388397216796875
Elapsed Time URLLib (with aiohttp): 0.3900480270385742
Elapsed Time GRequests: 1.6238431930541992

Similarly, for fetching 100 tiles, it takes:

Elapsed Time URLLib (without concurrency …

My First Article

Hello World!

This blog will be extensively used for posting GSoC updates, apart from other technical ramblings.