Faculty of Computer Science and Engineering

Permanent URI for this community: https://repository.ukim.mk/handle/20.500.12188/5

The Faculty of Computer Science and Engineering (FCSE) at UKIM is the largest and most prestigious faculty in the field of computer science and technology in Macedonia, and one of the largest in the region. Its teaching staff consists of 50 professors and 30 associates, including many “best in field” personnel, such as the most-cited scientists in Macedonia and the most influential professors in the ICT industry in the Republic of Macedonia.

Search Results

    Item type: Publication
    A Multimodal Vision–Language Framework for Intelligent Detection and Semantic Interpretation of Urban Waste
    (MDPI AG, 2026-04-03)
    Jonuzi, Verda Misimi
    ;
    Urban waste management remains a significant challenge for environmental sustainability and the development of smart-city infrastructure. This study proposes a multimodal vision–language framework that combines real-time object detection with automated semantic interpretation and structured semantic analysis for intelligent urban waste monitoring. A custom dataset of 2247 manually annotated images was constructed from publicly available sources (TrashNet and TACO), enabling robust multi-class detection across six waste categories. Two state-of-the-art object detection models, YOLOv8m and YOLOv10m, were trained and evaluated on a fixed 70/15/15 train–validation–test split. Under this configuration, YOLOv8m achieved a mAP@50 of 90.5% and a mAP@50–95 of 87.1%, slightly outperforming YOLOv10m (89.5% and 86.0%, respectively). YOLOv8m was also faster at inference, reaching 120 FPS compared with 105 FPS for YOLOv10m. To estimate how stable performance is across data partitions, stratified 5-fold cross-validation was conducted. YOLOv8m achieved a mean precision of 0.9324 and a mean mAP@50–95 of 0.9315 ± 0.0575 across folds, indicating generally stable performance while also revealing variability associated with dataset heterogeneity. Beyond object detection, the framework integrates MiniGPT-4 to generate context-aware textual descriptions of detected waste items, enhancing semantic interpretability and user engagement. In addition, GPT-5 Vision serves as an auxiliary module for structured semantic classification and category suggestion: it analyzes object crops and multi-class scenes and produces constrained JSON outputs containing category labels, concise descriptions, and recyclability indicators.
Overall, the proposed YOLOv8–MiniGPT-4–GPT-5 Vision pipeline shows that combining accurate real-time detection with multimodal semantic reasoning can improve interpretability and support interactive, semantically enriched waste analysis in smart-city and environmental monitoring scenarios.
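The two detection metrics quoted in the abstract differ only in how strictly a predicted box must overlap a ground-truth box. The sketch below assumes the standard COCO-style convention used by common YOLO tooling, since the abstract does not spell it out: a detection counts as correct at threshold t when its Intersection-over-Union (IoU) with the ground-truth box is at least t. mAP@50 uses t = 0.50, while mAP@50–95 averages AP over the ten thresholds 0.50, 0.55, …, 0.95. The helper names are hypothetical, and the sketch covers only the box-matching ingredient, not the full precision–recall computation.

```python
def iou(box_a, box_b):
    """Intersection-over-Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not touch).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: mAP@50 uses only 0.50, while mAP@50-95
# averages AP over the ten thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS_50_95 = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matches_at(box_pred, box_true, threshold):
    """Hypothetical helper: True if the prediction counts as a hit at this IoU threshold."""
    return iou(box_pred, box_true) >= threshold
```

For example, boxes (0, 0, 10, 10) and (1, 0, 11, 10) have an IoU of about 0.82, so they match at 7 of the 10 thresholds. A slightly offset box like this counts fully toward mAP@50 but only partially toward mAP@50–95, which is why the latter is typically the lower figure.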
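The stratified 5-fold cross-validation step can be sketched as follows. This is an illustrative stand-in, not the authors' code: it assumes each image is assigned a single stratification label (for example its dominant waste category), because the abstract does not say how images containing several objects were stratified. It also assumes that a figure such as 0.9315 ± 0.0575 is the mean ± sample standard deviation of the per-fold scores, which is one common convention. All function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that roughly preserve class proportions.

    Assumption: one stratification label per image (e.g. its dominant category).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so every fold gets its share of each class and fold sizes differ by at most 1.
        for i, idx in enumerate(idxs):
            folds[(offset + i) % k].append(idx)
        offset += len(idxs)
    return folds


def cross_validation_splits(folds):
    """Yield (train_indices, test_indices), holding out each fold exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores (needs >= 2 folds)."""
    return statistics.mean(scores), statistics.stdev(scores)
```

With ten "plastic" and five "glass" images and k = 5, every fold receives two plastic images and one glass image, so each held-out fold mirrors the overall 2:1 class ratio. In practice a library routine such as scikit-learn's `StratifiedKFold` would usually replace the hand-written splitter.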
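The GPT-5 Vision module is described as returning constrained JSON with a category label, a concise description, and a recyclability indicator, so a downstream consumer would typically validate each reply before trusting it. The sketch below assumes a hypothetical schema: the field names (`category`, `description`, `recyclable`) and the six category names, borrowed from TrashNet's class list, are illustrative and are not taken from the paper. It uses only Python's standard `json` module and rejects any reply that is malformed, incomplete, or outside the allowed label set.

```python
import json

# Hypothetical schema: field names and the category set are illustrative
# assumptions (categories borrowed from TrashNet), not the paper's format.
ALLOWED_CATEGORIES = {"cardboard", "glass", "metal", "paper", "plastic", "trash"}
REQUIRED_FIELDS = {"category": str, "description": str, "recyclable": bool}


def parse_model_output(raw: str) -> dict:
    """Parse one JSON reply and raise ValueError if it violates the assumed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    # Every required field must be present and have the expected Python type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} must be of type {expected_type.__name__}")
    # Constrain the label to the closed category set.
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    return data
```

Rejecting out-of-schema replies, rather than trying to repair them, keeps malformed model output from reaching downstream consumers; one possible fallback is to keep the detector's own class label for that object.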