From 987886bc6625ac2e57057f9d0ba18120a3264fad Mon Sep 17 00:00:00 2001
From: LordBaryhobal
Date: Wed, 13 May 2026 10:07:46 +0200
Subject: [PATCH] feat(research): add base research for syntax prototype

---
 research/01_syntax.md | 158 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 research/01_syntax.md

diff --git a/research/01_syntax.md b/research/01_syntax.md
new file mode 100644
index 0000000..133ebf9
--- /dev/null
+++ b/research/01_syntax.md
@@ -0,0 +1,158 @@
+# Syntax Brainstorming
+
+The syntax depends on the kind of implementation. For example, we may want the type system's syntax to be valid Python and to use Python classes, functions and annotations. In that case, the syntax would be restricted to the set of valid Python expressions that have no direct effect on the program.
+
+Moreover, if we do use Python's builtin syntax, there are two possible approaches: either define real classes and functions in Python, or simply reuse the syntax and parse it externally, without giving it any real Python semantics.
+
+Finally, there is also the option to define a new ad-hoc syntax, which may or may not reuse constructs found in Python. Programs using it would then need to be compiled into valid, parsable and executable Python code. This also means that an extension for a Python language server would need to be created for developers to use the framework effectively.
+
+NB: The option to define the annotations in Python comments will not be considered. Although it would allow custom syntax while keeping the code valid Python, it does not fit the vision for this project, nor is it suitable for a full type system implementation.
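+
+To make the two Python-syntax approaches above more concrete, here is a minimal sketch of each, assuming Python 3.9+. `Column`, `Frame` and `extract_columns` are hypothetical names for illustration only, not an existing midas API, and `list` stands in for `pd.DataFrame` to avoid a pandas dependency. In the first approach, `__class_getitem__` turns `Column["verified", bool]` into a real object that can be read back from an `Annotated` hint with `typing.get_args`. In the second, the annotated code is never executed: the standard `ast` module parses the source and the same column information is collected externally.
+
+```python
+import ast
+from datetime import datetime
+from typing import Annotated, get_args
+
+
+# Approach 1 (hypothetical): real classes give the syntax a runtime meaning
+class Column:
+    def __init__(self, name: str, dtype: type) -> None:
+        self.name = name
+        self.dtype = dtype
+
+    # Implicitly a classmethod: Column["verified", bool] returns an instance
+    def __class_getitem__(cls, params: tuple) -> "Column":
+        name, dtype = params
+        return cls(name, dtype)
+
+
+class Frame:
+    def __init__(self, columns: tuple) -> None:
+        self.columns = columns
+
+    def __class_getitem__(cls, params) -> "Frame":
+        # A single column arrives bare, several arrive as a tuple
+        if not isinstance(params, tuple):
+            params = (params,)
+        return cls(params)
+
+
+# Approach 2 (hypothetical): read the same syntax from source, never running it
+def extract_columns(source: str) -> dict:
+    """Map each annotated variable name to its (column, type name) pairs."""
+    schemas = {}
+    for node in ast.walk(ast.parse(source)):
+        if not (isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name)):
+            continue
+        found = []
+        for sub in ast.walk(node.annotation):
+            # Match Column["name", type]; slices are bare nodes since Python 3.9
+            if (
+                isinstance(sub, ast.Subscript)
+                and isinstance(sub.value, ast.Name)
+                and sub.value.id == "Column"
+                and isinstance(sub.slice, ast.Tuple)
+                and len(sub.slice.elts) == 2
+            ):
+                name_node, type_node = sub.slice.elts
+                if isinstance(name_node, ast.Constant) and isinstance(type_node, ast.Name):
+                    found.append((sub.lineno, sub.col_offset, name_node.value, type_node.id))
+        if found:
+            # ast.walk has no specified order, so restore source order
+            schemas[node.target.id] = [(name, t) for _, _, name, t in sorted(found)]
+    return schemas
+
+
+# `list` stands in for pd.DataFrame so the sketch does not need pandas
+hint = Annotated[list, Frame[Column["verified", bool], Column["date", datetime]]]
+_, frame = get_args(hint)
+print([(c.name, c.dtype.__name__) for c in frame.columns])
+# [('verified', 'bool'), ('date', 'datetime')]
+
+SOURCE = '''
+df: Annotated[pd.DataFrame, Frame[
+    Column["verified", bool],
+    Column["date", datetime]
+]] = pd.read_csv("data.csv")
+'''
+print(extract_columns(SOURCE))
+# {'df': [('verified', 'bool'), ('date', 'datetime')]}
+```
+
+The difference shows up even at this scale: the first approach gets runtime objects for free but has to fit into the `typing` machinery, while the second only needs the source text, so `Frame` and `Column` do not have to exist when the extraction runs.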
+
+## Comparison
+
+|**Syntax**|**Using Python constructs**|**Valid Python code**|
+|:--------:|:-------------------------:|:-------------------:|
+| Python | Yes | Yes |
+| Python | No | Yes |
+| Custom | Yes | No |
+| Custom | No | No |
+
+In terms of integration, the first option seems the best suited, as it provides a simple Python package that can be added to any Python project, but it has multiple disadvantages:
+- May be complex to get working well with Python's builtin type and annotation system
+- Can be quite verbose
+- Doesn't involve the creation of a custom parser
+
+Looking at the following examples, my personal preference goes towards the last option. Its only notable downsides are the need to compile the code into valid Python and the fact that it doesn't integrate with any Python LSP out of the box.
+
+## Dataframe definition examples
+
+### Python syntax - using Python constructs
+
+```python
+from datetime import datetime
+from typing import Annotated
+
+from midas import Frame, Column
+import pandas as pd
+
+# Frame and Column would be real classes provided by the midas package
+df: Annotated[pd.DataFrame, Frame[
+    Column["verified", bool],
+    Column["birth_year", int],
+    Column["height", float],
+    Column["name", str],
+    Column["date", datetime]
+]] = pd.read_csv("data.csv")
+```
+
+### Python syntax - without Python constructs
+
+```python
+# Postponed evaluation (PEP 563): annotations are kept as strings and never
+# evaluated, so Frame and Column do not need to exist at runtime
+from __future__ import annotations
+
+from datetime import datetime
+from typing import Annotated
+
+import pandas as pd
+
+df: Annotated[pd.DataFrame, Frame[
+    Column["verified", bool],
+    Column["birth_year", int],
+    Column["height", float],
+    Column["name", str],
+    Column["date", datetime]
+]] = pd.read_csv("data.csv")
+
+# or
+
+df: pd.DataFrame = pd.read_csv("data.csv")
+# A bare string literal is a no-op at runtime; an external tool would parse it
+"""midas
+column 'verified' bool
+column 'birth_year' int
+column 'height' float
+column 'name' str
+column 'date' datetime
+"""
+```
+
+### Custom syntax - using Python constructs
+
+```python
+from datetime import datetime
+
+import pandas as pd
+
+# Frame and Column are never imported: this block must be compiled away
+# before the code can run
+Frame[
+    Column["verified", bool],
+    Column["birth_year", int],
+    Column["height", float],
+    Column["name", str],
+    Column["date", datetime]
+]
+df: pd.DataFrame = pd.read_csv("data.csv")
+```
+
+### Custom syntax - without Python constructs
+
+```python
+from datetime import datetime
+
+import pandas as pd
+
+Frame[
+    Column{name: "verified"},
+    Column{name: "birth_year"},
+    Column{name: "height"},
+    Column{name: "name"},
+    Column{name: "date"}
+]
+df: pd.DataFrame = pd.read_csv("data.csv")
+```
+
+## Custom types examples
+
+### Python syntax - using Python constructs
+
+```python
+from midas import Type
+
+class Latitude(Type[float]): ...
+class Longitude(Type[float]): ...
+
+class GeoCoordinates(Type[tuple[Latitude, Longitude]]):
+    @property
+    def lat(self) -> Latitude:
+        return self[0]
+
+    @property
+    def lon(self) -> Longitude:
+        # Second element of the (Latitude, Longitude) tuple
+        return self[1]
+```
+
+### Python syntax - without Python constructs
+
+...
+
+### Custom syntax - using Python constructs
+
+```python
+type Latitude[float] = ...  # `= ...` is just for syntax highlighting
+type Longitude[float] = ...
+
+type GeoCoordinates[Latitude, Longitude]:
+    lat: Latitude
+    lon: Longitude
+```
+
+### Custom syntax - without Python constructs
+
+```python
+type Latitude
+type Longitude
+
+type GeoCoordinates{
+    lat Latitude
+    lon Longitude
+}
+```
\ No newline at end of file