Archive

2024

Handle Irregular Bursts of Files using EventBridge and Glue Workflow

7 minute read

Exploring ways of handling irregular and sudden bursts of multiple files for data processing using event-driven architecture on AWS. This blog post showcases how to use S3 notifications with EventBridge to trigger a Glue Workflow whose trigger conditions combine a number of events and a batch window.

Personal Project - pytransflow

33 minute read

I’m thrilled to present pytransflow, a Python library I developed in my free time. pytransflow simplifies record-level processing through transformation flows defined in YAML files. I hope you find this library engaging and that it sparks your interest to both use and contribute to it.

Nginx Reverse Proxy and Lua Scripting

13 minute read

Exploring the implementation of Lua scripting for dynamically altering API requests in an Nginx Reverse Proxy. This investigation opens up possibilities to write and run dynamic content using Lua scripts directly within the Nginx server, making it a powerful tool for web applications.

2023

S3 Batch Operations - Lambda

4 minute read

A brief guide outlining the process of setting up and running S3 Batch Operations Jobs with Lambda integration.

Exploring Pytest Fixtures: Notes and Examples

11 minute read

Here, I present a compilation of notes and practical scenarios drawn from my experience, demonstrating effective use of pytest fixtures. These examples show how fixtures can refine and improve the architecture of your testing module.

Fine-Tuning Glue Export File Size for Athena Queries

13 minute read

Exploring different strategies for fine-tuning the output file size in AWS Glue and consolidating small files during post-processing. By implementing these techniques, you’ll not only enhance the efficiency of Athena queries but also significantly reduce the cost associated with querying large da...

Exploring AWS CloudWatch Alarms

10 minute read

Exploring the functionality of AWS CloudWatch alarms, understanding their operation, configuration, and practical application within CDK applications. Learn to define and customize alarms, including adjusting periods, evaluation ranges, and handling missing data, to ensure robust monitoring and e...

Personal Project - Automating numerical calculations and implementing ML models

10 minute read

The objective of this project is to develop a system enabling scientists to automate numerical calculations on remote clusters and build an internal database of calculation outcomes. It also involves training machine learning models on these calculations and seamlessly integrating them for numeri...

Authentication and Authorization in FARM Stack using JWT

11 minute read

Discover the fundamentals of JWT authentication and its advantages within distributed systems and microservice architectures. Explore the integration of authentication into the FARM stack, consisting of FastAPI, React, and MongoDB, utilizing JSON Web Tokens (JWT).

2022

Local Python Development Environment

11 minute read

Discover a comprehensive guide on configuring your local machine for Python projects. This guide provides an overview of the most commonly used tools throughout the development process.

PGAS and Coarray Fortran

14 minute read

Exploring the PGAS paradigm and experimenting with coarrays in Fortran. Learning about the principles behind PGAS, Fortran coarrays, and their applications in parallel programming.

Understanding systemd and creating Linux services

16 minute read

Delve into the fundamentals of systemd, covering dependencies, unit files, and service configuration. Explore the process of configuring custom applications as systemd services. Learn how to efficiently manage and run applications using systemd within your system.

Dynamic generation of multiple CI/CD parent-child pipelines using GitLab

11 minute read

Set up GitLab CI/CD locally for easier experimentation and testing. Investigate methods for creating nested parent-child pipelines and explore the process and advantages of implementing this approach. Learn how to streamline your development workflow with nested pipelines for better organization ...

SageMaker Serverless Inference using BYOC

9 minute read

Discover techniques for deploying custom models within Docker images using SageMaker and serverless inference. Explore the functionalities and benefits of each approach. Learn how to deploy your models for scalable and efficient inference.

Understanding Big Data File Formats

9 minute read

Dive into the structure of popular Big Data file formats like Parquet, Avro, and ORC. Understand their unique features and advantages. Learn how these formats optimize data storage and processing.

DynamoDB Stream, Lambda, and S3 - Local Setup

6 minute read

Create a simple application that utilizes DynamoDB Stream, Lambda, and S3 services. Set it up locally for easy development, testing, and experimentation. This setup demonstrates how these AWS services can work together.

Achieving Scalable Multilingual Semantic Search

19 minute read

Understand the basics of seq2seq architecture and artificial neural networks (ANNs). Learn about multilingual models and their applications. Discover how to use these technologies to achieve scalable multilingual semantic search.

DICOM File Processing

14 minute read

Discover how to handle and process DICOM files. Explore popular free and open-source libraries that can help you develop applications for efficient DICOM processing. These tools and libraries make managing medical images much easier and more straightforward.

2021

DICOM File Format Basics

14 minute read

Explore the fundamentals of the DICOM file format! This quick introduction covers the basics of DICOM’s structure, its essential uses, and tips for easily navigating its complex and abstract components.
