feat(research): add section on required syntax elements

This commit is contained in:
2026-05-13 10:48:43 +02:00
parent 987886bc66
commit 5551f381cd

View File

@@ -8,6 +8,8 @@ Finally, there is also the option to define a new ad-hoc syntax, which may or ma
NB: The option to define the annotations in Python comments will not be considered. Although it would allow custom syntax while keeping the code valid Python, it does not fit the vision for this project, nor is it suitable for a full type system implementation.
The framework must not only allow defining data-frame schemas and custom types, but also operations (e.g. scaling a length), inter-compatibility (e.g. adding latitudes doesn't make sense but adding lengths does), and ad-hoc transformation (e.g. using a scaler from `sklearn` should be allowed and it will transform the type).
## Comparison
|**Syntax**|**Using Python constructs**|**Valid Python code**|
@@ -24,7 +26,24 @@ In terms of integration, the first option seems the most well suited as it provi
Looking at the following examples, my personnal preference would go towards the last option. The only notable downsides with that option is the need to compile the code to make it become valid Python, and the fact that it doesn't integrate into any Python LSP as is.
## Dataframe definition examples
## Required syntax elements
- Defining a data-frame schema
- Defining a column with a type
- Giving a column a name
- Specifying constraints on a column (could be defined in the type itself for simplicity)
- Defining a custom type
- A type must be based on a underlying Python type
- A type can have properties (e.g. a GeoCoordinate has a latitude and a longitude)
- Defining operations
- Defining allowed operations between the same or different types, and the resulting type
- Defining ad-hoc transformations (e.g. `sklearn` scaler)
Defining operations needs to be simple and concise. Many types will support basic mathematical operations with unit-less factors (e.g. scaling), or self-operations (e.g. addition, subtraction, ratio).
In a further development, we may want the framework to support units. This would be a more general kind of types with many similar operations. A dedicated unit management system might be useful to avoid redundant and verbose code.
## Data-frame definition examples
### Python syntax - using Python constructs