#import "/metadata.typ": *
// highlight box for security callouts
#pagebreak()
= #i18n("implementation-title", lang:option.lang) <sec:impl>
#add-chapter()[
While the complete application did not change from the design in @sec:design, some adjustments have been made as improvements.
Namely, a key/value pair has been added to the _nodes\_interface_, as listed in @tab:nodes_interface_addition.
#figure(
table(
columns: (auto, auto, auto, auto),
align: center,
table.header("name", "key", "data construction", "data size"),
[Battery percent of charge],[0x05],[Integer giving the raw percentage],[1B],
),
caption: [Nodes interface addition during the implementation],
)<tab:nodes_interface_addition>
This addition to the _nodes\_interface_ was motivated by the nodes' uptime.
Observations showed that their batteries ran out after a few days instead of the expected weeks.\
Hence, the battery level has been added to the values reported by the nodes.
== Nodes
No @llm has been used at any point during the nodes implementation.
The nodes implementation mostly followed the design.
The implementation was ultimately done in C using ZephyrOS, hence the use of
classes is in spirit only.
The choice of this embedded @os was motivated by its good fit with Nordic hardware, alongside the
opportunity for the assigned student to try this solution out. The major elements that differ from the nodes design are described below.
=== Window opening status
The window opening status is observed with a switch.
This switch is expected to be pressed by a closed window and released when the window is opened.\
The switch is active low to ensure that a node without a switch does not report the room window as open.
The switch _NC_ pin must be wired to the Thingy _EXT0_ pin.
#pagebreak()
=== Battery level monitoring
Monitoring of the nodes proved very limited, as their values are sent only in the BLE frames.
Furthermore, the battery state of the nodes raised many open questions during the first integration.\
Hence, the _battery_percent_unit_ has been added to the nodes.
This unit retrieves the battery voltage and estimates the level of charge from it, following the #link("https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/boards/nordic/battery")[Nordic example] #footnote[https://github.com/zephyrproject-rtos/zephyr/tree/main/samples/boards/nordic/battery].
The resulting class diagram is given in @fig:nodes_class_diagram_impl.
#let nodes_class_diagram_impl = [
#figure(
image("../resources/img/nodes_class_diagram_impl.svg"),
caption: [Nodes class diagram following implementation]
) <fig:nodes_class_diagram_impl>
]
#nodes_class_diagram_impl
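
As a rough illustration of the voltage-to-percentage step, the sketch below interpolates linearly along a discharge curve and packs the result as the key `0x05` entry listed in @tab:nodes_interface_addition. It is written in Python for brevity, whereas the node firmware is in C. The curve points and the names `DISCHARGE_CURVE`, `battery_percent` and `encode_battery_entry` are hypothetical and are not taken from the firmware or from the Nordic sample.

```python
# Hypothetical discharge curve: (millivolts, percent), highest voltage first.
# These points are illustrative only, not measured on the nodes.
DISCHARGE_CURVE = [(4200, 100), (3700, 50), (3300, 10), (3000, 0)]


def battery_percent(millivolts: int) -> int:
    """Estimate the charge level by linear interpolation along the curve."""
    top_mv, top_pct = DISCHARGE_CURVE[0]
    if millivolts >= top_mv:
        return top_pct
    # Walk consecutive curve points until the segment containing the voltage.
    for (hi_mv, hi_pct), (lo_mv, lo_pct) in zip(DISCHARGE_CURVE, DISCHARGE_CURVE[1:]):
        if millivolts >= lo_mv:
            span = hi_mv - lo_mv
            return lo_pct + (millivolts - lo_mv) * (hi_pct - lo_pct) // span
    return 0  # below the last curve point: considered empty


def encode_battery_entry(percent: int) -> bytes:
    """Key 0x05 followed by the raw percentage on one byte."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bytes([0x05, percent])
```

Integer arithmetic is used throughout so that the same computation carries over to a microcontroller without floating-point support.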
=== @ble period
The designed @ble behaviour does not comply with the protocol specifications.
As such, the @ble sending status is managed manually.
As described in @fig:nodes_sequence_diagram, the data is advertised once per supervisor loop.\
During this advertisement, the @ble data is updated and sending is enabled.
After a short amount of time, sending is disabled until the next advertisement.
The duration of this sending window has been empirically set to 500 ms.
This time window ensures that the gateway is able to receive the measurements.
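
The manual sending window can be sketched as follows. This is a Python illustration of the sequence only: the firmware is written in C, and `FakeAdvertiser`, `advertise_once` and the injectable `sleep` parameter are hypothetical names introduced here.

```python
import time
from typing import Callable

ADV_WINDOW_S = 0.5  # sending window from the text (500 ms)


class FakeAdvertiser:
    """Hypothetical stand-in for the BLE stack; it only records calls."""

    def __init__(self) -> None:
        self.active = False
        self.payload = b""
        self.log: list[str] = []

    def update(self, payload: bytes) -> None:
        self.payload = payload
        self.log.append("update")

    def start(self) -> None:
        self.active = True
        self.log.append("start")

    def stop(self) -> None:
        self.active = False
        self.log.append("stop")


def advertise_once(adv: FakeAdvertiser, payload: bytes,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """One supervisor-loop iteration: refresh data, send for a bounded window."""
    adv.update(payload)
    adv.start()
    try:
        sleep(ADV_WINDOW_S)
    finally:
        adv.stop()  # sending stays disabled until the next iteration
```

Stopping in a `finally` clause guarantees that sending is disabled even if the wait is interrupted.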
=== Battery usage
Each node, with a fully charged battery, was observed to run for 3 days before shutting down.
This battery consumption is clearly too high, and the cause was found thanks to Rémi.\
The sensing period of the CCS811 air quality sensor for each drive mode is listed in @tab:ccs811_drive_mode.
#figure(
table(
columns: (auto, auto),
align: center,
table.header("CCS811 drive mode value", "air quality sensing period [s]"),
[0], [disabled],
[1 (default)],[1],
[2], [10],
[3], [60],
[4], [0.25]
),
caption: [Air quality sensor CCS811 drive mode],
)<tab:ccs811_drive_mode>
Setting the drive mode to 3 instead of the default mode 1 avoids 59 out of 60 measurements per minute, reducing battery consumption accordingly.\
The resulting battery consumption is shown in @fig:nodes:battery_consumption.
#let nodes_battery_consumption = [
#figure(
image("../resources/img/nodes_battery_consumption.png", width: 90%),
caption: [Node battery consumption over time]
) <fig:nodes:battery_consumption>
]
#nodes_battery_consumption
In @fig:nodes:battery_consumption, the consumption peaks occur
when the air quality is measured.
The period of these peaks is 60 seconds, as intended by the CCS811's drive mode 3.
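
To make the saving concrete, the periods from @tab:ccs811_drive_mode can be turned into measurements per minute. The function names below are illustrative.

```python
# Sensing period in seconds per CCS811 drive mode, as listed in the table.
# Mode 0 disables sensing, so it has no period and is left out.
DRIVE_MODE_PERIOD_S = {1: 1.0, 2: 10.0, 3: 60.0, 4: 0.25}


def measurements_per_minute(mode: int) -> float:
    """Number of air quality measurements taken in one minute."""
    return 60.0 / DRIVE_MODE_PERIOD_S[mode]


def avoided_per_minute(old_mode: int, new_mode: int) -> float:
    """Measurements saved per minute when switching drive mode."""
    return measurements_per_minute(old_mode) - measurements_per_minute(new_mode)
```

With these periods, `avoided_per_minute(1, 3)` evaluates to 59.0, which matches the figure quoted above.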
=== Deployment
#let nodes_deployment_simple = [
#figure(
image("../resources/img/node_naked.png", width: 90%),
caption: [Node deployment without window opening status sensor]
) <fig:nodes_deployment_simple>
]
#let nodes_deployment_window = [
#figure(
image("../resources/img/node_window.png",width: 75%),
caption: [Node deployment with window opening status sensor]
) <fig:nodes_deployment_window>
]
#let text_deployment = [
Several nodes were deployed on site.
All but four were set up without a window sensor.
As such, they needed nothing more than some glue to fix them to the wall.
This deployment is illustrated with the picture in @fig:nodes_deployment_simple.
]
#let uglify = false
#if uglify [
#text_deployment
#grid(
columns: (1.5fr, 1fr),
column-gutter: 2em,
align: center+horizon,
nodes_deployment_simple,
nodes_deployment_window
)
] else [
#grid(
columns: (1.5fr, 1fr),
column-gutter: 2em,
grid.cell(align: left+top, text_deployment),
grid.cell(rowspan: 2, align: center+horizon, nodes_deployment_window),
grid.cell(align: center+bottom, nodes_deployment_simple)
)
]
#v(1em)
For the remaining four nodes, a window sensor was wired to the node.
This deployment is illustrated with the picture in @fig:nodes_deployment_window.
In @fig:nodes_deployment_window, the switch that detects whether the window is open is glued so that it is pressed when the window is closed and released otherwise.
In the picture, the window is open, so the switch is released.
== Gateway
@llm was used for debugging specific technical code issues (@ble filtering,
payload decoding, asyncio thread safety, @mqtt reconnection) and to
support the writing of this section.
=== From design to implementation
The design specified passive @ble advertising with @uuid filtering.
The final implementation differs on several points discovered during
development and integration with our custom node firmware.
=== First implementation: GATT connections
The first approach used active GATT connections with the stock Nordic
firmware. The Pi connected to each Thingy:52 and subscribed to
characteristic notifications for temperature (ef680201), humidity
(ef680203) and @co2 (ef680204).

This implementation allowed running an overnight test with two
Thingy:52 nodes placed in two separate rooms for seven hours, windows
closed. The first node was in a room with four occupants, where @co2 rose
progressively from 400 @ppm to a peak of 1071 @ppm. The second was in
an empty room and stayed stable between 400 and 465 @ppm. This
confirmed that the data was meaningful and the communication chain
was working.

However, this approach hit a hard limit with BlueZ: the Linux
Bluetooth stack does not support scanning and connecting
simultaneously. Any connection attempt while the scanner is active
raises org.bluez.Error.InProgress. The workaround was to stop the
scanner before each connection and restart it afterwards, which
introduced race conditions when multiple nodes were detected at the
same time. Some nodes also occasionally ignored connection attempts,
blocking the script until a ten-second timeout was added via
asyncio.wait_for().
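
The timeout guard can be sketched with `asyncio.wait_for` as below. The `connect` argument is a hypothetical coroutine function standing in for the real connection call, which is not shown in this report.

```python
import asyncio
from typing import Awaitable, Callable

CONNECT_TIMEOUT_S = 10.0  # ten-second timeout mentioned in the text


async def connect_with_timeout(connect: Callable[[], Awaitable[None]],
                               timeout: float = CONNECT_TIMEOUT_S) -> bool:
    """Await a connection attempt, giving up after `timeout` seconds.

    Returns True when the connection completed and False when it timed out.
    """
    try:
        await asyncio.wait_for(connect(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # The node ignored the attempt: report failure instead of blocking.
        return False
```

When the timeout expires, `asyncio.wait_for` cancels the pending attempt, so a silent node no longer blocks the processing of the others.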
=== Switch to passive advertising
After a team review confirming that our custom node firmware uses
broadcasting rather than GATT, the implementation was fully
refactored. The GATT connection logic was removed entirely and
replaced by a passive BleakScanner.
=== Discovering the right filter
The original design used the service @uuid ef680100 to identify
Thingy:52 packets. This @uuid was found during the stock firmware
exploration phase and worked correctly at the time. When integrating
with our custom node firmware, no packets were detected despite the
node being physically nearby. A Python investigation script
(scan_uuid.py) was written to display all @ble devices in range with
their raw data. This revealed that our custom firmware does not
announce any service @uuid. Instead, it uses the manufacturer data
field with company_id = 0xffff, as defined in the firmware
specification. Note that 0xffff is not an officially registered
company identifier and should ideally be replaced by a dedicated
one such as 0x025A (HES-SO Valais) in a future iteration.

The @uuid filter was replaced by a company_id filter. The scan script
also confirmed that the raw payload matched the expected format,
14 bytes with keys 0x01 to 0x05, validating the consistency between
the firmware and the gateway.
=== Integration challenges
*False positives from third-party devices.* Several @ble devices in
the environment also use company_id = 0xffff, triggering the
gateway filter and flooding the logs. Adding a strict 14-byte payload
size check eliminated them — the Thingy:52 payload is always exactly
14 bytes as confirmed by the firmware author.
*Non-standard payload from one node.* One node (DC:06:D9:40:7A:CB)
was broadcasting a 13-byte payload with an unknown key 0x05,
apparently flashed with a different firmware version. The exact size
filter excludes it automatically.
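
Taken together, the `company_id` filter and the exact-size check amount to the following sketch. It assumes the advertisement is available as a mapping from company identifier to raw bytes, which is how bleak exposes `manufacturer_data`, and that the 14 bytes are counted without the company identifier itself. The function name is hypothetical.

```python
from typing import Optional

COMPANY_ID = 0xFFFF  # company identifier used by the custom firmware
PAYLOAD_SIZE = 14    # exact payload size expected from a node


def select_node_payload(manufacturer_data: dict[int, bytes]) -> Optional[bytes]:
    """Return the node payload, or None if the advertisement is not ours."""
    payload = manufacturer_data.get(COMPANY_ID)
    if payload is None:
        return None  # advertisement from another vendor
    if len(payload) != PAYLOAD_SIZE:
        return None  # same company_id but unexpected size
    return payload
```

Under these assumptions, both the third-party devices and the node broadcasting 13 bytes are rejected by the size check.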
*@co2 sensor warm-up.* After each reboot, the @co2 sensor
returns 0xFFFFFFFF during its warm-up period. The gateway discards
these values silently to avoid publishing invalid data to the
database. A proper error reporting protocol would require agreement
across all components — gateway, database and UI — which was not
defined at specification time and is out of scope for this version.
*Network firewall.* The school network blocks outbound connections
on port 8883. The broker was reconfigured to also accept connections
on port 80 as a workaround.
*@mqtt disconnection.* In production, the gateway occasionally lost its
connection to the broker due to network instability. The most common
error cases (disconnection, publish failure and failed initial
connection) are now detected and terminate the process with exit
status 1, allowing systemd to restart the gateway automatically within
10 seconds. This crash-and-restart approach is a pragmatic solution: a proper
implementation would handle reconnection logic explicitly without
crashing, but this was sufficient for the current version and was
validated overnight with two automatic recoveries.
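
A minimal sketch of the crash-and-restart handler is given below. It assumes the process is terminated with `os._exit` so that the exit also takes effect when called from a callback thread, and that systemd is configured to restart the service. Both points, like the name `make_fatal_handler`, are assumptions rather than a description of the actual gateway code.

```python
import os
import sys
from typing import Callable


def make_fatal_handler(
    exit_fn: Callable[[int], None] = os._exit,
) -> Callable[[str], None]:
    """Build a handler that logs the failure and terminates the process.

    A non-zero exit status lets a supervisor such as systemd restart the
    service. `exit_fn` is injectable so the handler can be tested.
    """

    def fatal(reason: str) -> None:
        print(f"gateway: fatal MQTT error: {reason}", file=sys.stderr, flush=True)
        exit_fn(1)

    return fatal
```

The same handler can be called from each detected error case, for example with the reasons "disconnected", "publish failed" or "initial connection failed".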
=== @ble channel hopping and deduplication
@ble advertising uses three dedicated channels (37, 38 and 39). When
a node broadcasts, it sends the same packet on all three channels
in quick succession. The BleakScanner can capture the same packet
on multiple channels within the same window, resulting in the same
measurement being published multiple times to the broker.
To address this, a deduplication mechanism was implemented — the
gateway maintains a cache of the last publication timestamp per MAC
address and ignores packets from the same node received within a
10-second window. This value was defined during a team meeting and
is sufficient to filter duplicates from the same broadcast frame
without missing the next one, which arrives at minimum 2 minutes
later.
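
The deduplication cache can be sketched as follows, assuming a monotonic clock and an in-memory dictionary. The class and method names are hypothetical.

```python
import time
from typing import Optional

DEDUP_WINDOW_S = 10.0  # deduplication window from the text


class Deduplicator:
    """Remember the last publication time per MAC address."""

    def __init__(self, window_s: float = DEDUP_WINDOW_S) -> None:
        self.window_s = window_s
        self._last_published: dict[str, float] = {}

    def should_publish(self, mac: str, now: Optional[float] = None) -> bool:
        """Return True if this packet is the first one seen in the window."""
        now = time.monotonic() if now is None else now
        last = self._last_published.get(mac)
        if last is not None and now - last < self.window_s:
            return False  # duplicate of a frame that was already published
        self._last_published[mac] = now
        return True
```

The timestamp is only refreshed when a packet is published, so the window is measured from the first copy of a frame and does not slide with each duplicate.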
=== Remote debugging via Tailscale
Tailscale was installed on the Pi and on the developers' machines to
allow @ssh access from anywhere. This was essential during the custom firmware integration phase — monitoring logs and pushing code updates remotely
made the debugging loop much faster.
#pagebreak()
== Database & API <sec:impl:db>
#include "database/implementation.typ"
#pagebreak()
== User interface <sec:impl:ui>
=== Overview
The UI is a single-page Angular application providing real-time air quality monitoring for building rooms. It consists of two main views: an interactive floor-plan map and a per-room detail page, both backed by a @rest @api served by a Go gateway that reads from InfluxDB.
=== Technology choices
_Why Angular over React, Vue and Svelte?_
Four front-end frameworks were considered: React, Vue 3, Svelte, and Angular.
*React* is the most widely adopted option and has the largest ecosystem. However, it is a library rather than a framework: routing, @http, forms, and dependency injection all require third-party packages, which increases maintenance burden in a team project.
*Vue 3* offers a gentler learning curve and a Composition @api close in spirit to Angular Signals. It was a serious candidate, but its dependency injection model is weaker and its TypeScript support less strict than Angular's by default.
*Svelte* compiles components to vanilla JavaScript, producing very small bundles. However, it lacks a built-in @http client, a mature DI system, and established patterns for large-scale reactive data flows.
*Angular* was chosen for the following reasons:
*• Built-in @http:short client with interceptors* : Basic Auth injection and error handling are first-class features, not library choices.
*• Dependency injection* : Services are singletons by default (providedIn: 'root'), making shared state and polling logic straightforward to centralise.
*• Strong TypeScript integration* : The compiler catches type mismatches between @api responses and component models at build time, which is valuable given that the backend response format (raw InfluxDB rows) required explicit mapping.
*• Signals* : Reactive local state without the verbosity of RxJS BehaviorSubject for simple derived values, while RxJS is retained for asynchronous streams (@http polling).
*• Team familiarity* : Prior exposure to Angular reduces framework-specific overhead and keeps focus on domain logic.
The main trade-off is bundle size and framework verbosity: Angular produces larger initial bundles than Svelte or Vue and requires more boilerplate for simple components. For a real-time monitoring dashboard served over a local network, this was considered acceptable.
=== Interactive Room Map (room-map)
#let room-with-data = [
#figure(
image("../resources/img/ui_images/dashbord_room_with_data.png", width: 100%),
caption: [Room with data],
alt: "Full dashboard view: floor plan with CO₂ badges, legend sidebar visible, with room data and different CO₂ colors"
) <fig:room-with-data>
]
#room-with-data
As illustrated in @fig:room-with-data, the floor plan is a static @svg file served as a public asset and injected via innerHTML after being fetched over HTTP. A separate @svg overlay, sharing the same viewBox, renders @co2 badges positioned on each room's geometric center.
*Pan & zoom* is implemented with pointer events and CSS transform: translate / scale on a canvas `<div>`. Touch support is enabled via touch-action: none. The wheel event listener is registered outside Angular's zone (NgZone.runOutsideAngular) to avoid triggering change detection on every scroll tick. A fitToView() call is deferred 400 ms after ngAfterViewInit to ensure the @svg dimensions are known before computing the initial scale.
*Polling* is set to 30 seconds, fetching /api/v1/rooms/:id/current for each room with a sensor in parallel via forkJoin. The last-updated timestamp reflects the moment the @http response is received (browser local time), not the sensor timestamp.
=== Room Detail Page (room-details-panel)
#let room-details-panel = [
#figure(
image("../resources/img/ui_images/detail_panel.png", width: 100%),
caption: [Room details panel],
alt: "Detail page of a room with real sensor data: CO₂ hero card, metric cards, and history table with several rows."
) <fig:room-details-panel>
]
#room-details-panel
As illustrated in @fig:room-details-panel, the page is split into a fixed-height header, a left panel showing current metrics (@co2 hero card, temperature, humidity, window state), and a right panel showing a paginated 24-hour history table.
A single combineLatest pipeline, gated by takeUntilDestroyed, merges three concurrent streams: room metadata, latest reading (polled every 15 s), and history (polled every 15 s). This avoids multiple independent subscriptions and ensures all three values are updated atomically on each cycle.
When the @api returns no data, the UI displays explicit messages rather than leaving panels empty, preventing any ambiguity between missing data and a zero value.
@co2 levels are determined by thresholds defined in `co2-levels.config.ts`, which are shared by the map badges and the detail page and aligned with the Go gateway and the Telegram notification service.
=== REST Service Integration
The Go @api exposes endpoints under /api/v1/ with Basic Auth and a CORS policy restricted to \*.e.kb28.ch. Two adaptations were required:
1. *Angular dev proxy* (proxy.conf.json) : Forwards /api requests from localhost:4200 to https://api.db.e.kb28.ch, bypassing the browser CORS restriction during development.
2. *Response mapping* : The Go @api returns raw InfluxDB rows (co2_ppm, temp, window: bool, …). A toSensorReading() function in SensorService normalises these fields into the app's SensorReading model before they reach any component.
A functional basicAuthInterceptor is registered via provideHttpClient(withInterceptors([…])). It is a no-op when credentials are empty, allowing unauthenticated development against a server with auth disabled.
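
The no-op behaviour of the interceptor can be sketched as follows. The sketch is in Python rather than TypeScript and only mirrors the logic. It assumes that a missing username or password counts as empty credentials, and the function names are hypothetical.

```python
import base64
from typing import Optional


def basic_auth_header(username: str, password: str) -> Optional[str]:
    """Return the Authorization header value, or None without credentials."""
    if not username or not password:
        return None  # no-op: the request goes out unchanged
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def with_auth(headers: dict[str, str], username: str, password: str) -> dict[str, str]:
    """Return a copy of `headers`, adding Authorization only when needed."""
    value = basic_auth_header(username, password)
    if value is None:
        return dict(headers)
    return {**headers, "Authorization": value}
```

Returning a copy rather than mutating the input mirrors the immutable request objects used by HTTP interceptors.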
=== CI/CD Pipeline
Two GitHub Actions workflows cover the UI. The first (`angular-ci.yml`) runs on every pull request targeting `main` and acts as the quality gate. The second (`ui.yml`) extends this into a full build-and-deploy pipeline on every push to `main`.
==== Quality gate (pull-request checks)
The `angular-ci.yml` workflow runs on Node 20, scoped to the `ui/` path, and executes the following steps in order:
*• TypeScript compilation* : `tsc --noEmit` catches type errors independently of the build step.
*• Lint* : ESLint enforces code style rules across all source files.
*• Prettier check* : `format:check` fails the build if any file is not formatted, keeping the diff clean in reviews.
*• Dev and production builds* : Both configurations are built to catch environment-specific issues early (e.g. a missing prod-only environment field).
*• Unit tests* : Karma runs headless in ChromeHeadless with code coverage enabled. The coverage report is uploaded as a GitHub Actions artefact (7-day retention).
==== Build-and-deploy pipeline (main branch)
As illustrated in @fig:ci-cd, the `ui.yml` workflow is structured as three sequential jobs gated by `needs`:
*1. Build & test (`ci`)* : identical quality checks to the PR gate, plus credential injection: a `sed` pass replaces `__API_USERNAME__` and `__API_PASSWORD__` in `environment.prod.ts` with values from GitHub Secrets immediately before the production build. The plaintext credentials never touch the source tree or the container image filesystem.
*2. Docker build & push (`docker`)* : only runs when the `ci` job succeeds and the branch is `main`. The image is built from `ui/` and pushed to GitHub Container Registry (`ghcr.io`) under two tags: a commit-SHA tag for traceability and a `latest` tag for the deploy step to reference. The image name is normalised to lowercase to satisfy Docker image naming rules, which require lowercase repository names.
*3. Deploy (`deploy`)* : connects to the physical server over @ssh using a certificate-authenticated key pair stored as GitHub Secrets. The deploy script pulls the exact SHA-tagged image, stops and removes the previous container, and starts a new one with `--restart unless-stopped` on port 80. The private key is written to a temporary file and deleted immediately after the @ssh session closes.
#let ci-cd = [
#figure(
image("../resources/img/ui_images/ci_cd.png"),
caption: [CI/CD Pipeline],
alt: "CI/CD Pipeline Execution"
) <fig:ci-cd>
]
#ci-cd
==== Secret management
Seven secrets are required: `API_USERNAME` and `API_PASSWORD` (injected into the production build), `SSH_PRIVATE_KEY` and `SSH_CERTIFICATE` (deploy authentication), and `SSH_HOST`, `SSH_PORT` and `SSH_USER` (server coordinates). None of these values appear in any committed file; the dev environment uses empty strings and proxies requests locally via `proxy.conf.json`.
=== Notification
The notification service is a Spring Boot 3.3.5 microservice (Java 17) that polls @co2 sensor data from the
backend @api and sends Telegram alerts when air quality degrades, as illustrated in @fig:notification.
#let notification = [
#figure(
image("../resources/img/ui_images/notification.png", width: 90%),
caption: [Telegram notification],
alt: "Telegram alert"
) <fig:notification>
]
#notification
=== Application Security
==== Secret Management
*Zero secrets in source code.* All sensitive values are injected exclusively
via environment variables: never hardcoded, never logged.

*FindSecBugs caught* `DM_DEFAULT_ENCODING` on `getBytes()` during SAST
integration, fixed immediately by switching to `getBytes(StandardCharsets.UTF_8)`.
==== STRIDE Threat Model
The notification microservice was analysed using the STRIDE framework,
identifying five threat categories and their corresponding controls,
as summarised in @tab:threat_model.
#figure(
table(
columns: (auto, 1fr, 1fr),
stroke: 0.5pt,
[*Threat*], [*Scenario*], [*Control*],
[Spoofing], [Impersonating the @co2 backend], [Basic Auth + URL injected via GitHub Secret],
[Tampering], [Altering sensor readings], [Read-only, service never writes to the API],
[Info. Disclosure], [Leaking Telegram bot token], [Env vars only, masked in CI logs],
[Denial of Service], [Telegram alert spam], [Deduplication via `ConcurrentHashMap`],
[Elevation of Privilege], [Unauthorized channel access], [Private channel, bot is sole admin],
),
caption: [STRIDE threat model for the notification microservice],
)<tab:threat_model>
As shown in @tab:threat_model, each identified threat is mitigated at
the service level without relying on external infrastructure. Spoofing
of the CO₂ backend is prevented by Basic Auth with credentials injected
via GitHub Secrets, never hardcoded. Tampering is structurally impossible
as the service operates in read-only mode against the API. Token
disclosure is avoided by confining the Telegram bot token to environment
variables, masked in CI logs. Alert flooding is controlled by a
`ConcurrentHashMap`-based deduplication mechanism that suppresses repeated
notifications for the same condition. Finally, unauthorised channel access
is prevented by restricting the Telegram channel to a private scope with
the bot as sole administrator.
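
One way to read the deduplication control is sketched below in Python: an alert is sent only when the condition of a room changes. The actual service is written in Java with a `ConcurrentHashMap`, so the lock-protected dictionary, the class name and the room and level strings are all illustrative assumptions.

```python
import threading


class AlertDeduplicator:
    """Send an alert only when the condition for a room changes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the concurrent map
        self._last_level: dict[str, str] = {}

    def should_notify(self, room: str, level: str) -> bool:
        """Return True when `level` differs from the last notified level."""
        with self._lock:
            if self._last_level.get(room) == level:
                return False  # same condition as last time: suppress
            self._last_level[room] = level
            return True
```

Under this reading, a room that stays at the same level across successive polls produces a single alert, and a new alert is sent as soon as the level changes.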
#pagebreak()
==== Security Testing Pipeline
Deployment is gated behind three blocking security jobs, as shown in
@tab:security_test_pipeline. If any fails, the Docker image is not
built and the service is not deployed.
#figure(
table(
columns: (auto, auto, auto, 1fr),
stroke: 0.5pt,
[*Job*], [*Tool*], [*Blocking*], [*Checks*],
[`ci`], [SpotBugs + FindSecBugs], [Yes], [Java static vulns, encoding, crypto],
[`sast-codeql`], [CodeQL], [Yes], [Injection, SSRF, unsafe deserialization],
[`dast`], [OWASP ZAP], [Yes], [HTTP headers, runtime misconfigs],
[`dependency-check`], [OWASP Dep. Check], [No], [CVEs in third-party libraries],
),
caption: [Security test pipeline for the notification microservice],
)<tab:security_test_pipeline>
As detailed in @tab:security_test_pipeline, three jobs are strictly
blocking: `ci` runs SpotBugs with the FindSecBugs plugin to catch
Java-specific vulnerabilities such as encoding issues and unsafe
cryptographic usage; `sast-codeql` performs semantic analysis to detect
injection flaws, SSRF, and unsafe deserialization; and `dast` runs an
OWASP ZAP baseline scan against the running service to identify HTTP
header misconfigurations and runtime exposure. The `dependency-check`
job is non-blocking but reports CVEs in third-party libraries for
visibility without halting delivery.
== Physical model
The physical model follows the design.
== Conclusion
The integration of each unit was eased by defining the corresponding interfaces beforehand.\
The integration suffered some minor issues, all of them small enough to be resolved within a few hours of work.
While some issues remain, as listed in @sec:validation, the complete application can be demonstrated.
Furthermore, the implementation of the gateway and database enables efficient modification of the deployment on site.\
For instance, moving a node only requires relocating it physically and updating the mapping file accordingly.
]