  • Distributed Denial of Wallet Attack on Serverless Pay-as-you-go Model
    (IEEE, 2022-11-15)
    The serverless pay-as-you-go model in the cloud enables payment for services during execution, billing resources at the smallest, most granular level, in line with the original idea behind the pay-as-you-go model in the cloud. The disadvantage of this payment model is that it is exposed to financial damage if serverless services come under attack. This paper defines and experimentally validates three types of attacks that can cause financial damage to the serverless pay-as-you-go model: the first is Blast DDoW (Distributed Denial of Wallet), the second is Continual Inconspicuous DDoW, and the third is Background Chained DDoW. We discuss the financial damage and consequences of each type of attack.
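    The billing exposure described in this abstract can be sketched with a minimal, hypothetical cost model; the linear per-invocation billing and the $0.40-per-million unit price below are illustrative assumptions, not figures from the paper:

    ```python
    def ddow_cost(requests_per_second: float, duration_s: float,
                  price_per_million_invocations: float) -> float:
        """Illustrative pay-as-you-go cost model: the victim's bill equals
        invocations x unit price, so cost grows linearly with attack traffic.
        (Simplified: real bills also include compute time, memory, and egress.)"""
        invocations = requests_per_second * duration_s
        return invocations / 1_000_000 * price_per_million_invocations

    # e.g. a sustained 1,000 req/s flood for one day (86,400 s) at a
    # hypothetical $0.40 per million invocations:
    daily = ddow_cost(1_000, 86_400, 0.40)  # 86.4M invocations -> $34.56
    ```

    The point of the model is that the attacker pays nothing while the victim's cost scales with request volume, which is what distinguishes Denial of Wallet from classical Denial of Service.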
  • High-Performance Serverless Request Generator: Capable of Generating a Hundred Thousand Requests Per Second
    (IEEE, 2023-05-22)
    Depending on their scope, performance testing systems are distributed, scalable systems that can use many distributed instances to generate workload. Server-based systems used in performance testing cannot cope with flexible workloads, and their specifications may generate high costs for an extensive number of generated requests. Our goal was to develop a Virtual Patient Generator for testing an electrocardiogram streaming system for monitoring thousands of patients. To deliver a fully functional workload generator of a hundred thousand requests per second, in this paper we develop a High-Performance Serverless Request Generator as an implementation of the Distributed Serverless Workload Generator concept. The system architecture comprises serverless services, Pub/Sub, and Cloud Storage, capable of delivering a cheaper version of a flexible number of requests and scalable to an even higher extent. We present limitations, performance, and advantages by comparing the presented serverless load testing system with other workload generators available as SaaS or with distributed load testing systems based on servers.
  • FinOps in Cloud-Native Near Real-Time Serverless Streaming Solutions
    (IEEE, 2023-11-21)
    FinOps is a novel discipline in cloud computing and technology management that aims to optimize an organization’s cloud spending, enhance financial resource management, and promote collaboration between technology and finance teams for efficient cloud resource utilization and cost control. The adoption of cloud-native and serverless architectures has revolutionized the way organizations design and deploy their near real-time streaming solutions. These solutions are vital for various applications, including data analytics, monitoring, and content delivery. However, understanding the cost implications of scaling these solutions to accommodate varying numbers of concurrent users remains a challenge. This paper presents a cost analysis for a cloud-native near real-time serverless streaming solution, considering both streaming and idle times of the system. The cost of cloud services is heavily influenced by efficient processing management and the selection of suitable services; this choice can result in substantial cost differences across cloud providers, ranging from expensive to cost-efficient options. This paper explores the critical factors impacting costs and highlights how cost levels differ depending on them.
  • An Overview of Legal Artificial Intelligence Assistants Landscape
    (IEEE, 2025-11-25)
    Kostov, Alen
    This survey presents the current landscape of AI legal tools, serving both legal professionals and the general public. It compares existing solutions, while also addressing technological and business challenges that shape their development and use. The findings contribute to a clearer understanding of the role and potential of AI assistants in the legal domain, offering insights relevant to both practitioners and researchers.
  • CUDA Calculation of Shannon Entropy for a Sliding Window System
    (IEEE, 2024-11-26)
    Velichkovski, Gordon
    Entropy algorithms are crucial in fields where assessing randomness, uncertainty, or complexity is vital. As datasets grow, efficient entropy calculations become important. This work explores the parallelization of Shannon entropy calculations, using GPU acceleration through CUDA for sliding window systems. By leveraging GPUs’ parallel architecture, the approach achieves up to 15x speedup for large datasets. However, smaller datasets show limited improvements due to overhead, underscoring the need for optimization to harness GPU acceleration’s potential.
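    As a point of reference for what is being parallelized, a minimal CPU sketch of sliding-window Shannon entropy might look like the following; the windowing scheme is an assumption for illustration, not the paper's CUDA kernel:

    ```python
    import numpy as np

    def shannon_entropy(window: np.ndarray) -> float:
        """Shannon entropy (in bits) of the symbol distribution in one window."""
        _, counts = np.unique(window, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def sliding_entropy(signal: np.ndarray, window: int, step: int = 1) -> np.ndarray:
        """Entropy of every sliding window. Each window is independent of the
        others, which is what makes the computation embarrassingly parallel
        and a natural fit for one-thread-per-window GPU execution."""
        starts = range(0, len(signal) - window + 1, step)
        return np.array([shannon_entropy(signal[s:s + window]) for s in starts])
    ```

    A window of four distinct symbols yields the maximum entropy log2(4) = 2 bits, while a constant window yields 0, which is a quick sanity check for any parallel implementation.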
  • Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs
    (IEEE, 2025-06-02)
    Petrovski, Nikola
    Training large language models requires extensive processing, made possible by many high-performance computing resources. This study compares multi-node and multi-GPU environments for training large language models using electrocardiogram data. It provides a detailed mapping of current frameworks for distributed deep learning in multi-node and multi-GPU settings, including Horovod from Uber, DeepSpeed from Microsoft, and the built-in distributed capabilities of PyTorch and TensorFlow. We compare various multi-GPU setups for different dataset configurations, utilizing multiple HPC nodes independently and focusing on scalability, speedup, efficiency, and overhead. The analysis leverages HPC infrastructure with SLURM, Apptainer (Singularity) containers, CUDA, PyTorch, and shell scripts to support training workflows and automation. We achieved a sub-linear speedup when scaling the number of GPUs, with values of 1.6x for two GPUs and 1.9x for four GPUs.
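    The scalability metrics named in the abstract relate in a standard way; the efficiency values below are derived from the reported speedups (1.6x on two GPUs, 1.9x on four), not figures stated in the paper:

    ```python
    def parallel_efficiency(speedup: float, n_gpus: int) -> float:
        """Parallel efficiency = speedup / GPU count (1.0 is ideal linear scaling)."""
        return speedup / n_gpus

    # Sub-linear speedups reported in the abstract:
    eff_2 = parallel_efficiency(1.6, 2)  # 0.8   -> 80% efficiency on 2 GPUs
    eff_4 = parallel_efficiency(1.9, 4)  # 0.475 -> 47.5% efficiency on 4 GPUs
    ```

    The drop from 80% to 47.5% efficiency as GPUs are added is the communication and synchronization overhead the study quantifies.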
  • Optimal Scalable Real-Time ECG Monitoring of Thousands of Concurrent Patients
    (IEEE, 2024-05-20)
    Gushev, Pano
    This paper explores the transformation of electrocardiogram (ECG) monitoring from traditional offline to Real-Time analysis, enabled by high-speed mobile networks and affordable data plans. The transition to live monitoring presents challenges in data streaming and processing and the necessity of balancing immediacy with accuracy. We optimize two critical aspects, cloud architecture and scalability, under the broader umbrella of cloud efficiency by evaluating the architecture’s components and their contribution to overall efficiency. The focus is on accommodating over a thousand concurrent patients streaming ECG data while maintaining cost-effectiveness, constrained by a Near Real-Time Round Trip Time (RTT) of ≤ 3 seconds, achieving a throughput of ≥ 333.333 msgs/s.
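    The throughput bound follows directly from the patient count and the RTT constraint; a minimal sketch, assuming one message per patient must be served within each RTT window:

    ```python
    def required_throughput(patients: int, rtt_seconds: float) -> float:
        """Minimum sustained message rate so every patient's message
        completes within the round-trip-time bound."""
        return patients / rtt_seconds

    # 1,000 concurrent patients under an RTT bound of 3 s:
    rate = required_throughput(1_000, 3.0)  # ~333.333 msgs/s
    ```

    Tightening the RTT bound or adding patients raises the required rate linearly, which is why the architecture's scalability is the central concern.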
  • Serverless Implementations of Real-time Embarrassingly Parallel Problems
    (IEEE, 2022-11-15)
    In this paper, we conduct experiments to deploy a scalable serverless computing solution for real-time monitoring of thousands of patients with streaming electrocardiograms, as an example of embarrassingly parallel tasks originally executed on two virtual machines. The research question is to find the speedup of such a solution versus classical virtual machine approaches with sequential or parallel threads. The challenge of migrating an existing service to a serverless solution is to adapt and reconfigure the code for the serverless platform, to write the code to invoke the service in parallel and asynchronously, and to use the other cloud services needed for the whole solution to be functional and scalable. Evaluation of various solutions matching the migration challenges to Google Cloud Run, Google Cloud Compute Engine, and Google Cloud Storage (customization of code, configuration of services) shows that greater speedups can be achieved by dividing the embarrassingly parallel tasks into sub-tasks executed as a serverless service. We achieved the highest speedup of almost 40 for the serverless solution compared to sequential execution on a virtual machine, and a speedup of 23 compared to parallel execution using virtual machines.
  • Loop Unrolling Impact on CUDA Matrix Multiplication Operations
    (IEEE, 2024-11-26)
    Stefkovski, Vojdan
    This paper investigates the impact of loop unrolling on CUDA matrix multiplication operations’ performance across NVIDIA GPUs. We benchmarked both basic and unrolled kernels with varying unroll factors (2, 4, 8, and 16) and CUDA block sizes (8, 16, and 32) on matrices ranging from 128 × 128 to 4096 × 4096. Using two GPUs, the GeForce RTX 4060 and GTX TITAN X, we analyze how unrolling factors impact execution time. Our findings indicate that loop unrolling, particularly with factors of 8 and 16 and a block size of 32, yields significant performance gains on larger matrices. These results confirm loop unrolling as an effective optimization technique for CUDA matrix operations, providing insights for developers to enhance computational efficiency across different GPU architectures.
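    The unrolling transformation itself can be illustrated with a small, language-agnostic sketch; the example below is a Python dot product unrolled by a factor of 4, not the paper's CUDA kernels, where `#pragma unroll` performs the equivalent rewrite:

    ```python
    def dot_unrolled4(a, b):
        """Dot product with the inner loop unrolled by a factor of 4:
        four multiply-adds per iteration into independent accumulators,
        which reduces loop-control overhead and exposes instruction-level
        parallelism -- the effect studied on the CUDA kernels."""
        n = len(a)
        acc0 = acc1 = acc2 = acc3 = 0.0
        i = 0
        while i + 4 <= n:  # main unrolled loop
            acc0 += a[i] * b[i]
            acc1 += a[i + 1] * b[i + 1]
            acc2 += a[i + 2] * b[i + 2]
            acc3 += a[i + 3] * b[i + 3]
            i += 4
        tail = sum(a[j] * b[j] for j in range(i, n))  # remainder when n % 4 != 0
        return acc0 + acc1 + acc2 + acc3 + tail
    ```

    Larger unroll factors (8, 16) extend the same pattern, which is why the gains reported in the abstract appear mainly on large matrices, where the amortized loop overhead dominates.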
  • CardioHPC: Serverless Approaches for Real-Time Heart Monitoring of Thousands of Patients
    (IEEE, 2022-11)
    Amza, Andrei; Hohenegger, Armin; Prodan, Radu
    We analyze a heart monitoring center for patients wearing electrocardiogram sensors outside hospitals. This prevents serious heart damage and increases life expectancy and health-care efficiency. In this paper, we address the problem of providing a scalable infrastructure for the real-time processing scenario of at least 10,000 patients simultaneously, and an efficient fast processing architecture for the postponed scenario, in which patients upload data after completed measurements. CardioHPC is a project to realize a simulation of these two scenarios using digital signal processing algorithms and artificial intelligence-based detection and classification software for automated reporting and alerting. We elaborate on the challenges we met in experimenting with two serverless implementations: 1) container-based on Google Cloud Run, and 2) Function-as-a-Service (FaaS) on AWS Lambda. Experimental results present the effect of overhead on request and transfer time, and the speedup achieved, by analyzing the response time and throughput of both the container-based and FaaS implementations as serverless workflows.