HEL/TB-Docs


LordBaryhobal f5c8f6fa62

fix(research): typo in syntax brainstorming

2026-05-13 11:15:40 +02:00


Syntax Brainstorming

The syntax depends on the kind of implementation. For example, we might want the type system to be valid Python and use Python classes, functions and annotations. In that case, the syntax would be quite restricted by the set of valid Python expressions that have no direct effect on the program.

Moreover, if we do use Python's builtin syntax, there are two possible approaches: either define real Python classes and functions in Python, or simply borrow the syntax and parse it externally, without any real Python semantics.
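As a rough illustration of the second approach, the standard `ast` module can read an annotation without ever executing it, so `Frame` and `Column` do not need to exist as Python objects. This is only a sketch, assuming Python 3.9 or later; the helper name `extract_columns` and the shortened `SOURCE` snippet are made up for this example.

```python
import ast

# The snippet is only parsed, never executed, so pd, Frame and Column
# do not have to be importable.
SOURCE = '''
df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["height", float],
]] = pd.read_csv("data.csv")
'''


def extract_columns(source: str) -> dict[str, list[tuple[str, str]]]:
    """Map each annotated variable to its (column name, type name) pairs."""
    schemas: dict[str, list[tuple[str, str]]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.AnnAssign) or not isinstance(node.target, ast.Name):
            continue
        columns = []
        for sub in ast.walk(node.annotation):
            # Match subscripts of the form Column["name", some_type]
            if (
                isinstance(sub, ast.Subscript)
                and isinstance(sub.value, ast.Name)
                and sub.value.id == "Column"
                and isinstance(sub.slice, ast.Tuple)
                and len(sub.slice.elts) == 2
            ):
                name_node, type_node = sub.slice.elts
                if isinstance(name_node, ast.Constant):
                    columns.append((name_node.value, ast.unparse(type_node)))
        if columns:
            schemas[node.target.id] = columns
    return schemas


print(extract_columns(SOURCE))
```

The types come back as source text (`"bool"`, `"float"`), which is what an external checker would need, since it gives the names its own meaning instead of Python's.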

Finally, there is also the option to define a new ad-hoc syntax, which may or may not reuse constructs similar to those present in Python. Such a program would need to be compiled to become valid, parsable and executable Python code. This also means that an extension of the Python Language Server would need to be created for developers to use the framework effectively.

NB: The option to define the annotations in Python comments will not be considered. Although it would allow custom syntax while keeping the code valid Python, it does not fit the vision for this project, nor is it suitable for a full type system implementation.

The framework must not only allow defining data-frame schemas and custom types, but also operations (e.g. scaling a length), inter-compatibility (e.g. adding latitudes doesn't make sense but adding lengths does), and ad-hoc transformations (e.g. using a scaler from sklearn should be allowed, and it will transform the type).

Comparison

Syntax	Using Python constructs	Valid Python code
Python	Yes	Yes
Python	No	Yes
Custom	Yes	No
Custom	No	No

In terms of integration, the first option seems the best suited, as it provides a simple Python package that can be added to any Python project, but it has multiple disadvantages:

- May be complex to work well with Python's builtin type and annotation system
- Can be quite verbose
- Doesn't involve the creation of a custom parser

Looking at the following examples, my personal preference would go towards the last option. The only notable downsides of that option are the need to compile the code to make it valid Python, and the fact that it doesn't integrate into any Python LSP as is.

Required syntax elements

- Defining a data-frame schema
  - Defining a column with a type
  - Giving a column a name
  - Specifying constraints on a column (could be defined in the type itself for simplicity)
- Defining a custom type
  - A type must be based on an underlying Python type
  - A type can have properties (e.g. a GeoCoordinates has a latitude and a longitude)
- Defining operations
  - Defining allowed operations between the same or different types, and the resulting type
- Defining ad-hoc transformations (e.g. sklearn scaler)

Defining operations needs to be simple and concise. Many types will support basic mathematical operations with unit-less factors (e.g. scaling), or self-operations (e.g. addition, subtraction, ratio).

In a later stage of development, we may want the framework to support units. Units would be a more general kind of type, with many similar operations. A dedicated unit management system might be useful to avoid redundant and verbose code.

Data-frame definition examples

Python syntax - using Python constructs

from datetime import datetime
from typing import Annotated

from midas import Frame, Column
import pandas as pd

df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]] = pd.read_csv("data.csv")
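For the annotation above to be real, executable Python, `Frame[...]` and `Column[...]` must evaluate to actual objects. One possible way to get there is `__class_getitem__`. The classes below are hypothetical stand-ins, not the actual `midas` API, and `list` replaces `pd.DataFrame` so the sketch runs without pandas.

```python
from typing import Annotated, get_args


class Column:
    """Hypothetical column descriptor, written Column["name", dtype]."""

    def __init__(self, name: str, dtype: type) -> None:
        self.name = name
        self.dtype = dtype

    def __class_getitem__(cls, item):
        # Column["height", float] passes the 2-tuple ("height", float)
        name, dtype = item
        return cls(name, dtype)


class Frame:
    """Hypothetical schema holder, written Frame[Column[...], ...]."""

    def __init__(self, columns) -> None:
        self.columns = tuple(columns)

    def __class_getitem__(cls, item):
        # A single column arrives bare, several arrive as a tuple
        columns = item if isinstance(item, tuple) else (item,)
        return cls(columns)


# Stand-in for Annotated[pd.DataFrame, Frame[...]]
Schema = Annotated[list, Frame[Column["verified", bool], Column["height", float]]]

# get_args() returns the annotated type followed by the metadata objects
base, frame = get_args(Schema)
for column in frame.columns:
    print(column.name, column.dtype.__name__)
```

Because the schema is ordinary `Annotated` metadata, existing type checkers would simply see the base type and ignore the schema, and a dedicated checker would have to read the metadata itself.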

Python syntax - without Python constructs

from __future__ import annotations

from datetime import datetime
from typing import Annotated

import pandas as pd

df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]] = pd.read_csv("data.csv")

# or

df: pd.DataFrame = pd.read_csv("data.csv")
"""midas
column 'verified' bool
column 'birth_year' int
column 'height' float
column 'name' str
column 'date' datetime
"""
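The string-literal variant could be read by a very small external parser. The sketch below assumes one `column '<name>' <type>` directive per line after a leading `midas` marker, which is inferred from the example above. The function name `parse_midas_block` is made up.

```python
import shlex

MIDAS_BLOCK = """midas
column 'verified' bool
column 'birth_year' int
"""


def parse_midas_block(text: str) -> list[tuple[str, str]]:
    """Return the (column name, type name) pairs declared in a midas block."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "midas":
        raise ValueError("not a midas block")
    columns = []
    for line in lines[1:]:
        if not line.strip():
            continue  # tolerate blank lines inside the block
        # shlex handles the quoted column name, including names with spaces
        keyword, name, type_name = shlex.split(line)
        if keyword != "column":
            raise ValueError(f"unknown directive: {keyword!r}")
        columns.append((name, type_name))
    return columns


print(parse_midas_block(MIDAS_BLOCK))
```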

Custom syntax - using Python constructs

from datetime import datetime

import pandas as pd

Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]
df: pd.DataFrame = pd.read_csv("data.csv")

Custom syntax - without Python constructs

from datetime import datetime

import pandas as pd

Frame[
    Column<bool>{name: "verified"},
    Column<int>{name: "birth_year"},
    Column<float>{name: "height"},
    Column<str>{name: "name"},
    Column<datetime>{name: "date"}
]
df: pd.DataFrame = pd.read_csv("data.csv")
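Since this variant is not valid Python, it would need a compilation step. As a minimal sketch of what such a step might look like, the regular expression below rewrites each `Column<T>{name: "..."}` entry into the subscript form used in the other examples. It only covers this one construct and simple type names, so a real compiler would need a proper grammar. The function name `to_python` is made up.

```python
import re

# Matches Column<dtype>{name: "column name"}, tolerating optional whitespace
COLUMN_RE = re.compile(
    r'Column<(?P<dtype>\w+)>\s*\{\s*name:\s*"(?P<name>[^"]+)"\s*\}'
)


def to_python(source: str) -> str:
    """Rewrite custom Column<...>{...} entries as Column["name", dtype]."""
    return COLUMN_RE.sub(
        lambda m: f'Column["{m["name"]}", {m["dtype"]}]',
        source,
    )


custom = 'Frame[\n    Column<bool>{name: "verified"},\n    Column<int>{name: "birth_year"}\n]'
print(to_python(custom))
```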

Custom types examples

Python syntax - using Python constructs

from midas import Type

class Latitude(Type[float]): ...
class Longitude(Type[float]): ...

class GeoCoordinates(Type[tuple[Latitude, Longitude]]):
    @property
    def lat(self) -> Latitude:
        return self[0]

    @property
    def lon(self) -> Longitude:
        return self[1]
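To show that class definitions of this shape can actually run, here is one hypothetical way to implement the `Type` base. It assumes Python 3.9 or later, and that `Type[X]` should produce a class deriving from the runtime type behind `X`. The real `midas.Type` may work quite differently.

```python
from typing import get_origin


class Type:
    """Hypothetical base: Type[X] builds a class deriving from X's runtime type."""

    def __class_getitem__(cls, underlying):
        # tuple[Latitude, Longitude] cannot be passed to type() as a base,
        # so fall back to its origin (tuple); float has no origin.
        runtime_base = get_origin(underlying) or underlying
        return type(
            f"Type[{runtime_base.__name__}]",
            (runtime_base,),
            {"underlying": underlying},
        )


class Latitude(Type[float]): ...
class Longitude(Type[float]): ...


class GeoCoordinates(Type[tuple[Latitude, Longitude]]):
    @property
    def lat(self) -> Latitude:
        return self[0]

    @property
    def lon(self) -> Longitude:
        return self[1]


coords = GeoCoordinates((Latitude(46.52), Longitude(6.63)))
print(type(coords.lat).__name__, type(coords.lon).__name__)
```

One limitation of this design is that arithmetic on a `Latitude` falls back to `float` and returns a plain `float`, so allowed operations and their result types would still have to be declared separately.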

Python syntax - without Python constructs

...

Custom syntax - using Python constructs

type Latitude[float] = ...  # `= ...` is just for syntax highlighting
type Longitude[float] = ...

type GeoCoordinates[Latitude, Longitude]:
    lat: Latitude
    lon: Longitude

Custom syntax - without Python constructs

type Latitude<float>
type Longitude<float>

type GeoCoordinates<Latitude, Longitude>{
    lat Latitude
    lon Longitude
}

Operations

Custom syntax - without Python constructs

type Latitude<float>
type Longitude<float>
type LatitudeDiff<float>
type LongitudeDiff<float>
type Distance<float>

op <Latitude> - <Latitude> = <LatitudeDiff>
op <Longitude> - <Longitude> = <LongitudeDiff>

op <LatitudeDiff> + <LatitudeDiff> = <LatitudeDiff>
op <LongitudeDiff> + <LongitudeDiff> = <LongitudeDiff>
op <LatitudeDiff> - <LatitudeDiff> = <LatitudeDiff>
op <LongitudeDiff> - <LongitudeDiff> = <LongitudeDiff>

op <GeoCoordinates>.distance(<GeoCoordinates>) = <Distance>
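To give an idea of what the `op` rules could compile to, the sketch below stores the latitude rules in a lookup table keyed by operand types and operator, and rejects any combination that is not declared. The `Quantity` wrapper and the `OPS` table are assumptions made for this example, not part of the proposal.

```python
import operator
from dataclasses import dataclass

FUNCS = {"+": operator.add, "-": operator.sub}


@dataclass(frozen=True)
class Quantity:
    """Hypothetical wrapper around a float that only allows declared ops."""

    value: float

    def _apply(self, symbol: str, other: object) -> "Quantity":
        result_type = OPS.get((type(self), symbol, type(other)))
        if result_type is None:
            raise TypeError(
                f"undeclared operation: {type(self).__name__} {symbol} "
                f"{type(other).__name__}"
            )
        return result_type(FUNCS[symbol](self.value, other.value))

    def __add__(self, other):
        return self._apply("+", other)

    def __sub__(self, other):
        return self._apply("-", other)


class Latitude(Quantity): ...
class LatitudeDiff(Quantity): ...


# One entry per `op <A> symbol <B> = <C>` rule
OPS = {
    (Latitude, "-", Latitude): LatitudeDiff,
    (LatitudeDiff, "+", LatitudeDiff): LatitudeDiff,
    (LatitudeDiff, "-", LatitudeDiff): LatitudeDiff,
}

print(Latitude(46.5) - Latitude(45.0))
```

Under these rules, adding two `Latitude` values raises a `TypeError`, which matches the earlier remark that adding latitudes doesn't make sense.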
