Author: Patrik Procházka (xprochp00@stud.fit.vut.cz)
Year: 2025/2026
PreLearn is an interactive web application for preprocessing structured data such as CSV, Excel, JSON, and XML files.
The application supports 6 data formats and includes 26 preprocessing methods, covering both fundamental and advanced data processing techniques. To support learning, it also provides 8 example datasets with predefined preprocessing scenarios.
app/
├── data/
│ ├── examples/ # Sample datasets for demonstrations
│ └── tmp/ # Temporary upload and cache files
│
├── src/
│ ├── core/ # Preprocessing logic and pipeline engine
│ ├── gui/ # Graphical user interface components
│ ├── utils/ # Shared utility functions and helpers
│ └── app.py # Main application entry point
│
├── tests/
│ ├── data/ # Data input validation and parser tests
│ └── pipeline/ # Pipeline processing and integration tests
│
├── Dockerfile # Docker container definition
├── docker-compose.yml # Container setup configuration
├── requirements.in # Source dependency definitions
├── requirements.txt # Compiled project dependencies
├── LICENSE # Project license information
└── README.md # Project documentation
This section describes the steps required to build, run, and manage the PreLearn application using Docker and Docker Compose in both development and production environments.
Install the required technologies:
- Docker
- Docker Compose
Note: An active internet connection is required for loading external application icons.
Build Docker images:
docker compose build
Start the development environment:
docker compose up dash-dev
Application will be available at:
http://localhost:8050
Start the production environment:
docker compose up dash-prod
Application will be available at:
http://localhost:8050
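Containers can take a few seconds to start serving requests after `docker compose up`. The following is a minimal sketch, using only the Python standard library, of a readiness check that polls the address above until it answers with HTTP 200. The function name `wait_for_app` and its timeout defaults are hypothetical and not part of PreLearn.

```python
import http.client
import time
import urllib.request


def wait_for_app(url: str = "http://localhost:8050",
                 timeout: float = 30.0,
                 interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper for scripts or CI jobs; not part of PreLearn itself.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or a slow start: the container is not ready yet
            pass
        time.sleep(interval)
    return False
```

For example, a script could start the production service in detached mode (`docker compose up -d dash-prod`) and then call `wait_for_app()` before opening the browser or running end-to-end checks.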
Stop all running containers:
docker compose down
Remove containers and volumes:
docker compose down -v
This section describes the step-based workflow of the application, guiding the user from data upload through configuration to execution of the preprocessing pipeline.
The initial screen is split into two main parts, allowing users either to upload their own datasets or to restore a previously saved preprocessing workflow.
- On the left side, users can upload data files for preprocessing.
- On the right side, users can upload an exported preprocessing archive to restore a previous workflow.
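A data upload has to be routed to the right parser based on its file type. The sketch below detects the format from the file extension. It is an assumption-based illustration: it maps only the four formats named in this README (CSV, Excel, JSON, XML), and the function and constant names are hypothetical. PreLearn's actual detection logic and its two remaining formats are not described here.

```python
from pathlib import Path

# Hypothetical mapping covering only the formats named in this README;
# the application supports further formats that are not listed here.
FORMAT_BY_EXTENSION = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".xls": "excel",
    ".json": "json",
    ".xml": "xml",
}


def detect_format(filename: str) -> str:
    """Return a format label for an uploaded data file, case-insensitively."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```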
If no custom data is available, the lower section offers example datasets grouped by preprocessing category, each with predefined operations for typical use cases.
After loading data, the main workspace provides an overview of both the dataset and the preprocessing workflow.
- Uploaded datasets are shown in interactive tables with sorting and filtering support.
- A history panel tracks all preprocessing steps, enabling navigation, editing, insertion, and removal of steps.
- Users can switch between raw data preview and processed outputs for each step.
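The history panel can be modeled as an ordered list of steps plus a cursor that marks which output is being previewed. The following minimal sketch assumes that layout. `Step`, `History`, the cursor convention (`-1` meaning the raw data preview), and the example method names are all hypothetical, not PreLearn's internal classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    method: str                                   # hypothetical method identifier
    columns: list[str]
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class History:
    steps: list[Step] = field(default_factory=list)
    cursor: int = -1                              # -1 = raw data preview

    def append(self, step: Step) -> None:
        self.steps.append(step)
        self.cursor = len(self.steps) - 1

    def insert(self, index: int, step: Step) -> None:
        self.steps.insert(index, step)
        self.cursor = index                       # preview the newly inserted step

    def edit(self, index: int, **params: Any) -> None:
        self.steps[index].params.update(params)

    def remove(self, index: int) -> Step:
        removed = self.steps.pop(index)
        # Shift the cursor left if it pointed at or after the removed step,
        # so it keeps previewing the same step (or the one before the removed one)
        if self.cursor >= index:
            self.cursor -= 1
        return removed

    def navigate(self, index: int) -> None:
        if not -1 <= index < len(self.steps):
            raise IndexError(f"No step at position {index}")
        self.cursor = index
```

Keeping the cursor separate from the step list allows steps to be inserted or removed anywhere without losing track of the output currently shown.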
The interface also supports downloading processed data and exporting the full preprocessing pipeline.
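An exported pipeline must contain enough information to be restored later from the right-hand upload area on the initial screen. The sketch below shows one possible archive layout: a ZIP file holding a JSON manifest of the steps plus the raw datasets. The file names (`pipeline.json`, `data/`), the manifest fields, and both function names are hypothetical; PreLearn's real export format may differ.

```python
import io
import json
import zipfile
from typing import Any

MANIFEST = "pipeline.json"  # hypothetical manifest name inside the archive


def export_pipeline(steps: list[dict[str, Any]], datasets: dict[str, bytes]) -> bytes:
    """Pack the step list and raw datasets into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST, json.dumps({"version": 1, "steps": steps}, indent=2))
        for name, content in datasets.items():
            archive.writestr(f"data/{name}", content)
    return buffer.getvalue()


def restore_pipeline(blob: bytes) -> tuple[list[dict[str, Any]], dict[str, bytes]]:
    """Read back the step list and datasets written by export_pipeline."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read(MANIFEST))
        datasets = {
            info.filename.removeprefix("data/"): archive.read(info)
            for info in archive.infolist()
            if info.filename.startswith("data/") and not info.is_dir()
        }
    return manifest["steps"], datasets
```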
The preprocessing pipeline is defined using a side panel with three sections:
- Select: Choose preprocessing methods grouped by category.
- Order: Define execution order and assign compatible dataset columns.
- Config: Adjust parameters for individual methods.
Changes are applied via the Apply Operations button, which updates the pipeline state. If no valid changes are detected, the button remains disabled.
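The enable/disable rule for Apply Operations can be expressed as two checks: the pending pipeline must differ from the applied one, and every step must be assigned only compatible columns. Below is a minimal sketch under those assumptions. The `NUMERIC_METHODS` set, the column type labels, and both function names are hypothetical.

```python
from typing import Any

# Hypothetical: methods that may only be assigned to numeric columns.
NUMERIC_METHODS = {"min_max_scale", "z_score"}


def is_valid(step: dict[str, Any], column_types: dict[str, str]) -> bool:
    """Every assigned column must exist and be compatible with the method."""
    columns = step.get("columns", [])
    if any(column not in column_types for column in columns):
        return False
    if step["method"] in NUMERIC_METHODS:
        return all(column_types[column] == "numeric" for column in columns)
    return True


def apply_enabled(pending: list[dict[str, Any]],
                  applied: list[dict[str, Any]],
                  column_types: dict[str, str]) -> bool:
    """Enable the button only for a valid pipeline that differs from the applied one."""
    if pending == applied:
        return False  # no changes detected
    return all(is_valid(step, column_types) for step in pending)
```

Because lists are compared element by element, reordering steps in the Order section also counts as a change.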
Once configured, the pipeline can be executed to apply all preprocessing steps.
- The system displays dataset statistics such as missing values, duplicates, value distributions, and dataset size.
- Each step provides detailed outputs in dedicated result cards.
- Outputs may include textual summaries or interactive visualizations depending on the method type.
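The overview statistics listed above can be computed in a single pass over the data. The following stdlib-only sketch assumes rows are dictionaries and that `None` or an empty string counts as missing. The function name is hypothetical. A DataFrame-based implementation would more typically use pandas calls such as `DataFrame.isna` and `DataFrame.duplicated`.

```python
from collections import Counter
from typing import Any

MISSING = (None, "")  # assumption: what counts as a missing value


def dataset_statistics(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Size, missing values per column, duplicate rows and value counts."""
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if r.get(c) in MISSING) for c in columns}
    # A row is a duplicate if an identical row appeared earlier
    row_counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(count - 1 for count in row_counts.values())
    distribution = {
        c: Counter(r.get(c) for r in rows if r.get(c) not in MISSING) for c in columns
    }
    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing": missing,
        "duplicates": duplicates,
        "distribution": distribution,
    }
```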
This structure ensures full transparency of the preprocessing process and supports iterative refinement of the workflow.
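Executing the pipeline amounts to applying each step to the output of the previous one and keeping a record per step for its result card. The sketch below illustrates that sequential pattern under simple assumptions: rows are dictionaries, and each transform returns the new rows together with a text summary. `run_pipeline`, `StepResult`, and the two example transforms are hypothetical and do not represent the engine in `src/core/`.

```python
from dataclasses import dataclass
from typing import Any, Callable

Rows = list[dict[str, Any]]
Transform = Callable[[Rows], tuple[Rows, str]]


@dataclass
class StepResult:
    name: str
    rows_before: int
    rows_after: int
    summary: str  # textual output shown on the step's result card


def run_pipeline(rows: Rows, steps: list[tuple[str, Transform]]) -> tuple[Rows, list[StepResult]]:
    """Apply steps in order, feeding each one the previous step's output."""
    results = []
    for name, transform in steps:
        new_rows, summary = transform(rows)
        results.append(StepResult(name, len(rows), len(new_rows), summary))
        rows = new_rows
    return rows, results


def drop_missing(rows: Rows) -> tuple[Rows, str]:
    kept = [r for r in rows if all(v not in (None, "") for v in r.values())]
    return kept, f"removed {len(rows) - len(kept)} rows with missing values"


def drop_duplicates(rows: Rows) -> tuple[Rows, str]:
    seen, kept = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # order-independent row fingerprint
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept, f"removed {len(rows) - len(kept)} duplicate rows"
```

Storing the per-step results separately from the final rows lets the interface show every intermediate output, not just the end result.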
This project is licensed under the MIT License. See the LICENSE file for details.