feat(research): add base research for syntax prototype

2026-05-13 10:07:46 +02:00
parent 5fb1d118d9
commit 987886bc66

research/01_syntax.md Normal file

@@ -0,0 +1,158 @@
# Syntax Brainstorming
The syntax depends on the kind of implementation. For example, we might want the type system to be valid Python and to use Python classes, functions and annotations. In that case, the syntax would be restricted to the set of valid Python expressions that have no direct effect on the program.
Moreover, if we do use Python's built-in syntax, there are two possible approaches: either define real Python classes and functions, or simply borrow the syntax and parse it externally, without any real Python semantics.
Finally, there is also the option of defining a new ad-hoc syntax, which may or may not reuse constructs present in Python. A program written this way would have to be compiled to become valid, parsable and executable Python code. It also means that an extension of the Python Language Server would need to be created for developers to use the framework effectively.
NB: The option to define the annotations in Python comments will not be considered. Although it would allow custom syntax while keeping the code valid Python, it does not fit the vision for this project, nor is it suitable for a full type system implementation.
## Comparison
|**Syntax**|**Using Python constructs**|**Valid Python code**|
|:--------:|:-------------------------:|:-------------------:|
| Python | Yes | Yes |
| Python | No | Yes |
| Custom | Yes | No |
| Custom | No | No |
In terms of integration, the first option seems best suited, as it provides a simple Python package that can be added to any Python project. It has several disadvantages, however:
- May be complex to work well with Python's builtin type and annotation system
- Can be quite verbose
- Doesn't involve the creation of a custom parser
Looking at the following examples, my personal preference goes to the last option. Its only notable downsides are the need to compile the code to turn it into valid Python, and the fact that it doesn't integrate with any Python LSP as is.
## Dataframe definition examples
### Python syntax - using Python constructs
```python
from datetime import datetime
from typing import Annotated
from midas import Frame, Column
import pandas as pd
df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]] = pd.read_csv("data.csv")
```
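For this variant to be real Python, `Frame[...]` and `Column[...]` have to evaluate to actual objects at runtime. A minimal sketch of how that could work, assuming `__class_getitem__` is used to capture the subscript arguments: the `Frame` and `Column` classes below are hypothetical stand-ins, not the actual `midas` implementation, `Holder` is only there to carry the annotation, and `object` replaces `pd.DataFrame` to avoid the pandas dependency.

```python
from typing import Annotated, get_type_hints


class Column:
    """Hypothetical column descriptor built by Column["name", dtype]."""

    def __init__(self, name: str, dtype: type) -> None:
        self.name = name
        self.dtype = dtype

    def __class_getitem__(cls, item):
        # Column["verified", bool] arrives here as the tuple ("verified", bool)
        name, dtype = item
        return cls(name, dtype)


class Frame:
    """Hypothetical schema holder built by Frame[Column[...], ...]."""

    def __init__(self, columns) -> None:
        self.columns = tuple(columns)

    def __class_getitem__(cls, item):
        # A single column is passed bare, several are passed as a tuple
        columns = item if isinstance(item, tuple) else (item,)
        return cls(columns)


class Holder:
    # `object` stands in for pd.DataFrame in this sketch
    df: Annotated[object, Frame[
        Column["verified", bool],
        Column["birth_year", int],
    ]]


# include_extras=True keeps the Annotated metadata instead of stripping it
hints = get_type_hints(Holder, include_extras=True)
frame = hints["df"].__metadata__[0]
schema = {column.name: column.dtype for column in frame.columns}
```

Here the schema is an ordinary object that a runtime checker could compare against the loaded data. The price is that static type checkers are likely to reject the string literal inside `Column[...]`, which is part of the "complex to work well with Python's builtin type and annotation system" concern above.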
### Python syntax - without Python constructs
```python
from __future__ import annotations
from datetime import datetime
from typing import Annotated
import pandas as pd
df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]] = pd.read_csv("data.csv")
# or
df: pd.DataFrame = pd.read_csv("data.csv")
"""midas
column 'verified' bool
column 'birth_year' int
column 'height' float
column 'name' str
column 'date' datetime
"""
```
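Since `Frame` and `Column` are never defined in this variant, the annotation has to be read by an external tool. One possible way to do this is with the standard `ast` module, which parses the source without evaluating it. The sketch below is only an illustration: `extract_columns` is a hypothetical helper, and it only recognises the `Column["name", type]` form.

```python
import ast

SOURCE = '''
df: Annotated[pd.DataFrame, Frame[
    Column["verified", bool],
    Column["birth_year", int],
]] = pd.read_csv("data.csv")
'''


def extract_columns(source: str) -> dict[str, dict[str, str]]:
    """Map each annotated variable to its {column name: type expression}."""
    schemas: dict[str, dict[str, str]] = {}
    for node in ast.walk(ast.parse(source)):
        # Only annotated assignments to a plain name, e.g. `df: ... = ...`
        if not isinstance(node, ast.AnnAssign) or not isinstance(node.target, ast.Name):
            continue
        columns: dict[str, str] = {}
        for sub in ast.walk(node.annotation):
            is_column = (
                isinstance(sub, ast.Subscript)
                and isinstance(sub.value, ast.Name)
                and sub.value.id == "Column"
                and isinstance(sub.slice, ast.Tuple)
                and len(sub.slice.elts) == 2
            )
            if not is_column:
                continue
            name_node, type_node = sub.slice.elts
            if isinstance(name_node, ast.Constant) and isinstance(name_node.value, str):
                # Types stay as source text: nothing is imported or evaluated
                columns[name_node.value] = ast.unparse(type_node)
        if columns:
            schemas[node.target.id] = columns
    return schemas


schemas = extract_columns(SOURCE)
```

The names `pd`, `Frame` and `Column` never need to exist for this to work, which is the point of the "without Python constructs" approach. The sketch assumes Python 3.9 or later for `ast.unparse` and the simplified `Subscript.slice` layout.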
### Custom syntax - using Python constructs
```python
from datetime import datetime
import pandas as pd
Frame[
    Column["verified", bool],
    Column["birth_year", int],
    Column["height", float],
    Column["name", str],
    Column["date", datetime]
]
df: pd.DataFrame = pd.read_csv("data.csv")
```
### Custom syntax - without Python constructs
```python
from datetime import datetime
import pandas as pd
Frame[
    Column<bool>{name: "verified"},
    Column<int>{name: "birth_year"},
    Column<float>{name: "height"},
    Column<str>{name: "name"},
    Column<datetime>{name: "date"}
]
df: pd.DataFrame = pd.read_csv("data.csv")
```
## Custom types examples
### Python syntax - using Python constructs
```python
from midas import Type
class Latitude(Type[float]): ...
class Longitude(Type[float]): ...

class GeoCoordinates(Type[tuple[Latitude, Longitude]]):
    @property
    def lat(self) -> Latitude:
        # First element of the underlying tuple
        return self[0]

    @property
    def lon(self) -> Longitude:
        # Second element of the underlying tuple
        return self[1]
```
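For `self[0]` to work, `Type[tuple[...]]` has to produce something that behaves like a tuple at runtime. One way this could be achieved, purely as an assumption about a possible design, is to have `Type[T]` return a fresh base class deriving from the runtime type behind `T`. The `Type` class and the `__midas_base__` attribute below are hypothetical.

```python
from typing import get_origin


class Type:
    """Hypothetical base: Type[T] yields a class deriving from T's runtime type."""

    def __class_getitem__(cls, base):
        # tuple[Latitude, Longitude] -> tuple, plain float -> float
        runtime = get_origin(base) or base
        # Keep the full annotation around for a later checker (assumed attribute)
        return type(f"Type[{runtime.__name__}]", (runtime,), {"__midas_base__": base})


class Latitude(Type[float]): ...
class Longitude(Type[float]): ...


class GeoCoordinates(Type[tuple[Latitude, Longitude]]):
    @property
    def lat(self) -> Latitude:
        return self[0]

    @property
    def lon(self) -> Longitude:
        return self[1]


coords = GeoCoordinates((Latitude(46.52), Longitude(6.63)))
```

With this design a `Latitude` is still usable anywhere a `float` is expected, while `Latitude` and `Longitude` remain distinct classes that a checker could tell apart.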
### Python syntax - without Python constructs
...
### Custom syntax - using Python constructs
```python
type Latitude[float] = ... # `= ...` is just for syntax highlighting
type Longitude[float] = ...
type GeoCoordinates[Latitude, Longitude]:
    lat: Latitude
    lon: Longitude
```
### Custom syntax - without Python constructs
```python
type Latitude<float>
type Longitude<float>
type GeoCoordinates<Latitude, Longitude> {
    lat Latitude
    lon Longitude
}
```
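As noted earlier, a custom syntax has to be compiled into valid Python before it can run. To give an idea of what that step involves, here is a deliberately naive sketch that rewrites single-line `type Name<Base>` declarations into the class-based form from the first variant. The `compile_line` helper and the regular expression are hypothetical, the block form with `{ ... }` is not handled, and a real implementation would need a proper parser.

```python
import re

# Matches single-line declarations such as `type Latitude<float>`
DECL = re.compile(r"^type\s+(?P<name>\w+)<(?P<base>[\w\s,\[\]]+)>\s*$")


def compile_line(line: str) -> str:
    """Translate one custom `type` declaration; leave other lines untouched."""
    match = DECL.match(line.strip())
    if match is None:
        return line
    bases = [base.strip() for base in match["base"].split(",")]
    # Several parameters are assumed to describe a tuple of those types
    inner = bases[0] if len(bases) == 1 else f"tuple[{', '.join(bases)}]"
    return f"class {match['name']}(Type[{inner}]): ..."


compiled = [compile_line(line) for line in ["type Latitude<float>", "x = 1"]]
```

Even this toy version shows why editor support becomes an issue: the file the developer edits is not the file Python executes, so an unmodified Python language server cannot analyse it.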