ppatrik-dev/prelearn

Bachelor's Thesis Project

Author: Patrik Procházka (xprochp00@stud.fit.vut.cz)
Year: 2025/2026

Interactive Application for Data Preprocessing

PreLearn is an interactive web application for preprocessing structured data such as CSV, Excel, JSON, and XML files.

The application supports 6 data formats and includes 26 preprocessing methods, covering both fundamental and advanced data processing techniques. To support learning, it also provides 8 example datasets with predefined preprocessing scenarios.


Project Structure

app/
├── data/
│   ├── examples/          # Sample datasets for demonstrations
│   └── tmp/               # Temporary upload and cache files
│
├── src/
│   ├── core/              # Preprocessing logic and pipeline engine
│   ├── gui/               # Graphical user interface components
│   ├── utils/             # Shared utility functions and helpers
│   └── app.py             # Main application entry point
│
├── tests/
│   ├── data/              # Data input validation and parser tests
│   └── pipeline/          # Pipeline processing and integration tests
│
├── Dockerfile             # Docker container definition
├── docker-compose.yml     # Container setup configuration
├── requirements.in        # Source dependency definitions
├── requirements.txt       # Compiled project dependencies
├── LICENSE                # Project license information
└── README.md              # Project documentation

Installation Guide

This section describes the steps required to build, run, and manage the PreLearn application using Docker and Docker Compose in both development and production environments.

Requirements

Install the required technologies:

  • Docker
  • Docker Compose

Note: An active internet connection is required for loading external application icons.


Build

Build Docker images:

docker compose build

Development

Start the development environment:

docker compose up dash-dev

The application will be available at:

http://localhost:8050

Production

Start the production environment:

docker compose up dash-prod

The application will be available at:

http://localhost:8050

Stop

Stop and remove all running containers:

docker compose down

Stop and remove containers together with their volumes:

docker compose down -v

User Manual

This section describes the step-based workflow of the application, guiding the user from data upload through configuration to execution of the preprocessing pipeline.

Step 1: File Upload and Data Loading

The initial screen is split into two main parts, allowing users either to upload their own datasets or to restore a previously saved preprocessing workflow.

  • On the left side, users can upload data files for preprocessing.
  • On the right side, users can upload an exported preprocessing archive to restore a previous workflow.

If no custom data is available, the lower section provides example datasets grouped by preprocessing categories, including predefined operations for typical use cases.
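
The layout of the exported preprocessing archive is not described in this README. As a purely illustrative sketch, a workflow could be stored as a ZIP file holding a single JSON document with the ordered steps. The file name `pipeline.json`, the step fields, and both function names below are assumptions, not PreLearn's actual format or API.

```python
import io
import json
import zipfile

# Hypothetical archive layout: a ZIP holding one JSON file with the ordered steps.
PIPELINE_FILE = "pipeline.json"  # assumed name, not taken from PreLearn


def export_pipeline(steps: list[dict]) -> bytes:
    """Pack the ordered list of steps into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(PIPELINE_FILE, json.dumps({"steps": steps}, indent=2))
    return buffer.getvalue()


def restore_pipeline(data: bytes) -> list[dict]:
    """Read the steps back; raise ValueError if the archive is not usable."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            payload = json.loads(archive.read(PIPELINE_FILE))
    except (zipfile.BadZipFile, KeyError, json.JSONDecodeError) as exc:
        raise ValueError("not a valid pipeline archive") from exc
    return payload["steps"]
```

Restoring an archive produced by `export_pipeline` returns the same list of steps, while arbitrary bytes are rejected with a `ValueError` instead of being half-loaded.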


Step 2: Data Management and Pipeline Overview

After loading data, the main workspace provides an overview of both the dataset and the preprocessing workflow.

  • Uploaded datasets are shown in interactive tables with sorting and filtering support.
  • A history panel tracks all preprocessing steps, enabling navigation, editing, insertion, and removal of steps.
  • Users can switch between raw data preview and processed outputs for each step.

The interface also supports downloading processed data and exporting the full preprocessing pipeline.
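
One way to picture the history panel is as an ordered list of steps with a cursor that selects which output is shown. The sketch below is a minimal model under that assumption; the class names, the method identifiers, and the cursor convention (0 for raw data, k for the output after k steps) are hypothetical and not taken from the PreLearn source.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    method: str                       # hypothetical method identifier
    params: dict = field(default_factory=dict)


@dataclass
class History:
    steps: list[Step] = field(default_factory=list)
    cursor: int = 0                   # 0 = raw data, k = output after k steps

    def insert(self, index: int, step: Step) -> None:
        index = max(0, min(index, len(self.steps)))  # clamp to a valid slot
        self.steps.insert(index, step)
        self.cursor = index + 1       # show the output of the new step

    def edit(self, index: int, **params) -> None:
        self.steps[index].params.update(params)

    def remove(self, index: int) -> Step:
        removed = self.steps.pop(index)
        self.cursor = min(self.cursor, len(self.steps))
        return removed

    def go_to(self, position: int) -> list[Step]:
        """Select a view and return the steps that produce it."""
        if not 0 <= position <= len(self.steps):
            raise IndexError("position outside history")
        self.cursor = position
        return self.steps[:position]
```

Calling `go_to(0)` corresponds to the raw data preview, and any later position corresponds to the processed output of that step.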


Step 3: Method Selection and Configuration

The preprocessing pipeline is defined using a side panel with three sections:

  • Select: Choose preprocessing methods grouped by category.
  • Order: Define execution order and assign compatible dataset columns.
  • Config: Adjust parameters for individual methods.

Changes are applied via the Apply Operations button, which updates the pipeline state. If no valid changes are detected, the button remains disabled.
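
The enabling rule for the Apply Operations button can be sketched as a comparison between the applied pipeline state and the pending one. The function name, the operation fields, and the validity rule (each operation names a method and at least one column) are assumptions made for illustration; the real application may validate differently.

```python
def apply_disabled(applied: list[dict], pending: list[dict]) -> bool:
    """Return True when the Apply button should stay disabled."""
    if pending == applied:
        return True                   # nothing changed since the last apply
    # Assumed validity rule: every operation names a method and at least one column.
    return not all(op.get("method") and op.get("columns") for op in pending)
```

With this rule, reordering methods or assigning a new column enables the button, while an operation without any assigned column keeps it disabled.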


Step 4: Execution and Results

Once configured, the pipeline can be executed to apply all preprocessing steps.

  • The system displays dataset statistics such as missing values, duplicates, distribution, and dataset size.
  • Each step provides detailed outputs in dedicated result cards.
  • Outputs may include textual summaries or interactive visualizations depending on the method type.

This structure ensures full transparency of the preprocessing process and supports iterative refinement of the workflow.
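
As a rough illustration of the dataset statistics mentioned in this step, the sketch below computes dataset size, missing values per column, and duplicated rows using only the standard library. It assumes rows are dictionaries with hashable values and treats `None` and the empty string as missing; these choices and the function name are hypothetical, not PreLearn's implementation.

```python
from collections import Counter

MISSING = (None, "")                  # assumed markers for a missing value


def dataset_stats(rows: list[dict]) -> dict:
    """Summarize size, missing values per column, and duplicated rows."""
    columns = list(rows[0]) if rows else []
    missing = {c: sum(1 for r in rows if r.get(c) in MISSING) for c in columns}
    # A row counts as a duplicate if an identical row appeared earlier.
    counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing": missing,
        "duplicates": duplicates,
    }
```

Running the same function on the data before and after a step gives a simple before/after comparison of the kind a result card could display.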


License

This project is licensed under the MIT License. See the LICENSE file for details.
