CI-tron Components

The executor service

The main service in a CI-tron gateway is called the executor. It coordinates the other services to enable time-sharing of the test machines, also known as Devices Under Test (DUTs).

You can interact with this service using executorctl, our command-line client, and/or our REST API.

The executor coordinates the different states of the DUTs. Here is a flow chart of all the states a DUT can be in:

    graph TD
        subgraph "DUT state machine"
            START --> |is retired?| RETIRED
            START --> |is marked ready for service?| QUICK_CHECK
            QUICK_CHECK --> |Success| IDLE
            QUICK_CHECK --> |Failed| TRAINING
            TRAINING --> |Failed| TRAINING
            TRAINING --> |Success| IDLE
            RETIRED --> |Activate| QUICK_CHECK
            IDLE --> |Retire| RETIRED
            IDLE --> |Job received| QUEUED
            QUEUED --> RUNNING
            RUNNING --> IDLE
        end
    


Let’s see what every state of a DUT means:
  • IDLE: The device is available (but powered down to save energy), waiting for a job.

  • TRAINING: The device is being tested for boot reliability (20 rounds by default).

  • RETIRED: The device is undergoing maintenance, and cannot accept jobs.

  • QUICK_CHECK: The device is verifying that its current configuration matches what is described in the database.

  • QUEUED: The device has been chosen to execute a job, but the executor isn’t ready just yet (expected to last less than one second).

  • RUNNING: The device is running a job.
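
The transitions above can be sketched as a simple lookup table. This is an illustration of the flow only, not the executor's actual implementation, and the event names are hypothetical:

```python
# Illustrative sketch of the DUT state machine above. Event names are
# hypothetical; this is not the executor's actual code.
TRANSITIONS = {
    ("START", "is_retired"): "RETIRED",
    ("START", "marked_ready_for_service"): "QUICK_CHECK",
    ("QUICK_CHECK", "success"): "IDLE",
    ("QUICK_CHECK", "failed"): "TRAINING",
    ("TRAINING", "failed"): "TRAINING",
    ("TRAINING", "success"): "IDLE",
    ("RETIRED", "activate"): "QUICK_CHECK",
    ("IDLE", "retire"): "RETIRED",
    ("IDLE", "job_received"): "QUEUED",
    ("QUEUED", "executor_ready"): "RUNNING",
    ("RUNNING", "job_done"): "IDLE",
}

def next_state(state: str, event: str) -> str:
    """Return the next DUT state for a given event."""
    return TRANSITIONS[(state, event)]
```

For instance, `next_state("IDLE", "job_received")` returns `QUEUED`.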

Executor configuration

The executor service is configured through the use of environment variables. In a CI-tron gateway, the default configuration is stored at /etc/base_config.env, while user-provided overrides are usually located in /config/config.env.

Here are the options relevant to most deployments, usually set in /config/config.env:

  • FARM_NAME: Name of the test farm. If the gateway’s hostname ends with […]-gateway, the farm name is derived from the prefix; otherwise this option is mandatory. (default: None)

  • EXECUTOR_REGISTRATION_JOB: Local path to the registration job (default: $package_dir/job_templates/register.yml.j2)

  • EXECUTOR_BOOTLOOP_JOB: Local path to the bootloop job (default: $package_dir/job_templates/bootloop.yml.j2)

  • SERGENT_HARTMAN_BOOT_COUNT: How many rounds of testing should be used to qualify a test machine. When set to 0, a DUT is considered ready for service as soon as registration occurs. A negative value disables registration/training/quick_check altogether (default: 100)

  • SERGENT_HARTMAN_QUALIFYING_BOOT_COUNT: How many successful rounds of testing should be used to qualify a test machine (default: 100)

  • SERGENT_HARTMAN_REGISTRATION_RETRIAL_DELAY: How many seconds should be waited after an unsuccessful registration attempt before trying another one (default: 120)

  • SERGENT_HARTMAN_QUICK_CHECK: Should the DUTs be booted and checked after a gateway reboot, or after being re-activated from retirement (default: enabled)

  • IMAGESTORE_PATH: The base path where the executor should store the container images it downloads when asked by storage:imagestore:. This folder should contain a config.yml file that documents how clients are supposed to get access to the image store. See Imagestore config.yml’s format for details about its format. (default: $TMP/imagestores/)

  • IMAGESTORE_PULL_CMD: The command to execute to pull the container in the image store (default: podman --root ${imgstore} pull --tls-verify=${tls_verify} --platform=${platform} ${image_name})

  • IMAGESTORE_IMAGE_EXISTS_CMD: The command to execute to check if an image exists in the store (default: podman --root ${imgstore} image exists ${image_name})
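
For example, a /config/config.env overriding some of these options could look like this (all values are illustrative, not recommendations):

```
# /config/config.env -- example overrides (illustrative values)
FARM_NAME=my-farm
SERGENT_HARTMAN_BOOT_COUNT=20
SERGENT_HARTMAN_QUALIFYING_BOOT_COUNT=20
SERGENT_HARTMAN_QUICK_CHECK=enabled
```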

The following config options are partially auto-generated, and set via /config/minio/minio.env:

  • MINIO_URL: URL to the local minio service, accessible both locally and by test machines (default: http://ci-gateway:9000)

  • MINIO_ROOT_USER: Admin username for the local minio service (default: minioadmin)

  • MINIO_ROOT_PASSWORD: Admin password for the local minio service (default: minio-root-password)

And here are the lower-level options:

  • BOOTS_DISABLE_SERVERS: Set to a non-empty value to disable netbooting services (DHCP and TFTP). (default: None)

  • BOOTS_DHCP_IPv4_SOCKET_NAME: Name of the socket to use for the DHCP server, as set by systemd’s socket activation unit using FileDescriptorName= (default: dhcp_ipv4)

  • BOOTS_TFTP_IPv4_SOCKET_NAME: Name of the socket to use for the TFTP server, as set by systemd’s socket activation unit using FileDescriptorName= (default: tftp_ipv4)

  • BOOTS_DB_USER_FILE: Path to the file overriding the default boots_db.yml.j2 file (default: /config/boots_db.yml.j2)

  • CONSOLE_PATTERN_DEFAULT_MACHINE_UNFIT_FOR_SERVICE_REGEX: Automatically tag a DUT as unfit for service if it generates a line matched by this regular expression (default: None)

  • EXECUTOR_HOST: Binding address for the HTTP service (default: 0.0.0.0)

  • EXECUTOR_PORT: Binding port for the HTTP service (default: 80)

  • EXECUTOR_HTTP_IPv4_SOCKET_NAME: Name of the socket to use for the HTTP server, as set by systemd’s socket activation unit using FileDescriptorName= (default: http_ipv4). Overrides EXECUTOR_HOST/EXECUTOR_PORT.

  • EXECUTOR_URL: HTTP URL of the executor service, reachable locally and from the test machines (default: http://ci-gateway)

  • EXECUTOR_ARTIFACT_CACHE_ROOT: Folder to use as a cache for the kernel/initrd artifacts used by the jobs (recommended, default: None)

  • EXECUTOR_VPDU_ENDPOINT: Automatically add a virtual PDU for local testing (format: host:port, default: None)

  • GITLAB_CONF_FILE: Path to the gitlab runner configuration file, which will be overridden as new test machines are added to the farm (default: /etc/gitlab-runner/config.toml)

  • GITLAB_CONF_TEMPLATE_FILE: Template to use for the creation of the gitlab runner configuration file (default: $package_dir/templates/gitlab_runner_config.toml.j2)

  • GITLAB_ALLOW_INSECURE: Allow MARS_DB_FILE to reference a GitLab instance using an http:// URL rather than an https:// one (default: false)

  • MARS_DB_FILE: Path to the database (default: /config/mars_db.yaml)

  • MINIO_ADMIN_ALIAS: Alias set up by the executor to refer to the minio instance specified by MINIO_URL, MINIO_ROOT_USER, and MINIO_ROOT_PASSWORD (default: local)

  • PRIVATE_INTERFACE: Network interface connected to the DUTs’ network (default: private)

  • SALAD_URL: URL to the salad service (default: http://ci-gateway:8005)

Imagestore config.yml’s format

mount:                                     # List of mount points the DUTs can mount the image store
  - type: nfs                              # The name of the filesystem
    src: "ci-gateway:/imagestores"         # The source of the filesystem
    opts:                                  # The list of mounting options that should be set
      - vers=4.2
      - ro
      - addr=10.42.0.1
  - ...                                    # Add more mounting methods here

Registry config.yml’s format

The goal of this configuration file is to allow job descriptions to redirect accesses to container registries to a local proxy, thus saving bandwidth and reducing execution time. This is done using the following function: {{ registry.to_local_proxy("$IMAGE_NAME") }}.

images:
  - match: ^quay.io            # Regular expression matched against the image name
    replace: ci-gateway:8100   # String to replace the match with
  - ...                        # Add more replacement rules here
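
The replacement rules above amount to a regular-expression substitution. The following sketch (not the actual registryd implementation; first-match-wins behaviour is an assumption of this sketch) shows the intended effect on a quay.io image name:

```python
import re

# Rules as in the config.yml example above; each rule rewrites the part of
# the image name matched by `match` into `replace`.
rules = [{"match": r"^quay.io", "replace": "ci-gateway:8100"}]

def to_local_proxy(image_name: str) -> str:
    """Rewrite an image name to point at the local registry proxy."""
    for rule in rules:
        new_name, count = re.subn(rule["match"], rule["replace"], image_name)
        if count:
            return new_name
    return image_name  # no rule matched: keep the original registry

print(to_local_proxy("quay.io/fedora/fedora:latest"))
# ci-gateway:8100/fedora/fedora:latest
```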

Note

This file is meant to be auto-generated by the registryd service, based on the content of the registry description file found in /config/registries/.

Check out /config/registries/8100_quay.yml.j2.example for more details.

Power Delivery Unit (PDU) module

The goal of the project is to create a library that can speak with as many PDU
models as possible without the need for configuration files. The configuration
is meant to be stored externally, and merely passed to this helper library
during the instantiation of a PDU object.

The library:

 * Exposes the list of supported drivers / models
 * Instantiates PDUs
 * Lists the available ports of a PDU
 * Sets / gets the state of ports

## Supported PDUs

Any SNMP-enabled PDU is supported by this project, but the following models have
pre-baked configuration to make things easier:

 * APC's Masterswitch: `apc_masterswitch`
 * Cyberpower's PDU41004: `cyberpower_pdu41004`
 * Cyberpower's pdu15swhviec12atnet: `cyberpower_pdu15swhviec12atnet`
 * Devantech Ltd's network based relays (`ETH` and `WIFI` series only): `devantech`
 * KernelChip's network based relays (Laurent-2,5,112,128): `kernelchip`
 * Shelly devices: `shelly`
 * Tasmota devices: `tasmota`
 * TP-Link's Smart Switch with Power-Over-Ethernet: `snmp_poe_tplink`
 * A generic Power-Over-Ethernet switch: `snmp_poe`
 * A generic PPPS-compatible USB hub driver: `usbhub` [^usbhub]
 * A generic SNMP driver: `snmp`
 * A virtual PDU: `vpdu`
 * A dummy PDU: `dummy`

See [Instantiating an SNMP-enabled PDU](#instantiating-an-snmp-enabled-pdu) for more
information on how to set up your PDU.

[^usbhub]:

    * Requires your user to be able to write to `$devpath/*-port*/disable`. Sample udev rule:
        * `SUBSYSTEM=="usb", DRIVER=="hub", ACTION=="add", RUN+="/bin/sh -c \"chown -f root:plugdev $sys$devpath/*-port*/disable || true\"", RUN+="/bin/sh -c \"chmod -f 664 $sys$devpath/*-port*/disable || true\""`
    * Supported devices: See uhubctl's [compatibility list](https://github.com/mvp/uhubctl#compatible-usb-hubs)
    * Supported hub selection modes:
      * By serial ID:
        * `serial`: Unique identifier that can be used to find the hub, regardless of where it is plugged. Ignored when `location` is set.
      * By controller path, USB bus address, and vendor/product IDs:
        * `controller`: Sysfs path to the USB controller the hub is connected to (e.g. `/sys/devices/pci0000:00/0000:00:14.0`)
        * `devices`: A list of USB devices that make up the hub (usually one for USB2 and one for USB3). Expected format:
          * `devpath`: A string of dot-separated integers representing the USB port address to be found in any of the USB controller's busses
          * `idVendor`: The USB vendor ID of the USB device
          * `idProduct`: The USB product ID of the USB device
          * `maxchild`: The number of ports of the USB hub (optional)
          * `speed`: The speed of the connection in Mbps to the USB hub (optional)
        * `on_unknown_port_state`: Specify what to do when the devices have mixed states (default: `restore_last_state`)
          * `do_nothing`: Do not take corrective actions, consider the state as UNKNOWN
          * `turn_off`: Turn off the port
          * `turn_on`: Turn on the port
          * `restore_last_state`: Restore the last state seen, or turn the port off if no state was known

## Gotchas

Be warned that the current interface is *not* stable just yet.

## Instantiating an SNMP-enabled PDU

### Already-supported PDUs

If your PDU is in the supported list, then you are in luck: the only
information needed is the model name and the hostname of the
device:

    pdu = PDU.create(model="<<model>>", config={"hostname": "<<ip_address>>"})

or

    from pdu.drivers.apc import ApcMasterswitchPDU
    pdu = ApcMasterswitchPDU(config={"hostname": "<<ip_address>>"})

### Supported parameters

Here is the list of parameters that are supported for SNMP-based PDUs:

 * `hostname`: hostname or IP address of SNMP agent
 * `version`: the SNMP version to use; 1, 2 (equivalent to 2c) or 3
 * `community`: SNMP community string (used for both R/W) (v1 & v2)
 * `security_username`: security name (v3)
 * `privacy_protocol`: privacy protocol (`DES`, `AES`, or `None`) (v3)
 * `privacy_password`: privacy passphrase (v3)
 * `auth_protocol`: authentication protocol (`MD5`, `SHA`, or `None`) (v3)
 * `auth_password`: authentication passphrase (v3)
 * `context_engine_id`: context engine ID, will be probed if not supplied (v3)
 * `security_engine_id`: security engine ID, will be probed if not supplied (v3)

As an example, use the following configuration if you want to use SNMPv3 rather
than the default SNMPv1 protocol:

    from pdu.drivers.apc import ApcMasterswitchPDU
    pdu = ApcMasterswitchPDU(config={"hostname": "<<ip_address>>", "version": 3,
                                     "security_username": "<<username>>",
                                     "auth_protocol": "SHA", "auth_password": "<<password>>",
                                     "privacy_protocol": "AES", "privacy_password": "<<password>>"})

### Other SNMP-enabled PDUs

If your PDU model is not in the list above, you will need to use the generic SNMP
driver, which will require a lot more information from you. Here is an example
for the `apc_masterswitch`:

    pdu = PDU.create(model="snmp", config={
            "hostname": "10.0.0.42",
            "outlet_labels": "1.3.6.1.4.1.318.1.1.4.4.2.1.4",
            "outlet_status": "1.3.6.1.4.1.318.1.1.4.4.2.1.3",
            "state_mapping": { "ON": 1, "OFF": 2, "REBOOT": 3}
    })

or, for the `cyberpower_pdu15swhviec12atnet` PDU:

    from pdu.drivers.snmp import ManualSnmpPDU

    pdu = ManualSnmpPDU(config={
            "hostname": "10.0.0.42",
            "outlet_labels": "1.3.6.1.4.1.3808.1.1.5.6.3.1.2",
            "outlet_status": "1.3.6.1.4.1.3808.1.1.5.6.3.1.3",
            "outlet_ctrl": "1.3.6.1.4.1.3808.1.1.5.6.5.1.3",
            "state_mapping": { "ON": 2, "OFF": 3, "REBOOT": 4}
            "inverse_state_mapping": { 1: "ON", 2: "OFF", 3: "REBOOT" }
    })

To figure out which values you need to set, I suggest you use a combination of
snmpwalk and [this online MIB browser](https://mibs.observium.org/) to find the
following fields and values:

 * `outlet_labels`: OID that contains the ports' labels. In the case of APC's masterswitch,
    the address is `1.3.6.1.4.1.318.1.1.4.4.2.1.4`.
 * `outlet_status`: OID that contains the ports' statuses. In the case of APC's masterswitch,
    the address is `1.3.6.1.4.1.318.1.1.4.4.2.1.3`.
 * `outlet_ctrl`: OID that allows setting the ports' statuses. This setting is optional and
    will default to `outlet_status` if missing.
 * `state_mapping`: Table specifying which integer number to use when trying to set a particular port state.
    Format: `{ "ON": 2, "OFF": 3, "REBOOT": 4}`
 * `inverse_state_mapping`: Table specifying which integer number corresponds to which status when reading a port state.
    Format: `{ 1: "ON", 2: "OFF", 3: "REBOOT" }`

Try these values, and adjust them until the PDU responds as expected!

Once you have collected all this information, feel free to
[open an issue](https://gitlab.freedesktop.org/gfx-ci/ci-tron/-/issues/new)
to ask us to add this information to the list of drivers. Make sure to include
the curl command line you used to register your PDU!

## Frequently Asked Questions

### Why not use pdudaemon?

We initially wanted to use [pdudaemon](https://github.com/pdudaemon/pdudaemon),
but since it does not allow reading back the state from the PDU, it isn't
possible to make sure that the communication with the PDU is working, which
reduces the reliability and debuggability of the system.

Additionally, pdudaemon requires a configuration file, which is contrary to the
objective of the project to be as stateless as possible and leave configuration
outside of the project. The configuration could have been auto-generated
on-the-fly but since there is no way to check if the communication with the PDU
is working, it would make for a terrible interface for users.

Finally, most of the drivers in the project use a telnet interface rather
than SNMP, which makes them brittle and stateful. See for
[yourself](https://github.com/pdudaemon/pdudaemon/blob/master/pdudaemon/drivers/apc7952.py#L65).

MarsDB

MarsDB is the database for all the runtime data of the CI instance:

  • List of PDUs connected

  • List of test machines

  • List of Gitlab instances where to expose the test machines

Its location is set using the MARS_DB_FILE environment variable. The file is live-editable: you can edit it directly and changes will be reflected instantly in the executor.

Machines can be added to MarsDB by POSTing or PUTing to the /api/v1/dut/ REST endpoint. Fields in the REST API match the ones found in the database, but some fields cannot be set when creating a machine, for safety reasons: we want to enforce a separation between fields that are meant to be auto-generated and fields that are meant to be manually configured (denoted by the (MANUAL) tag in the DB file description below).

The most prominent manual fields are pdu and pdu_port_id, which means a newly-added machine won’t be usable until it is manually associated to its PDU port by editing the DB file. An easier way to enroll a new machine is the discovery process: POST the pdu and pdu_port_id fields to the /api/v1/dut/discover endpoint. This initiates the discovery sequence: the executor turns this port ON, waits for the machine to register itself, then automatically associates the machine with the PDU port specified in the discovery process. Using the discovery process allows a machine to go through the TRAINING process without further manual intervention.

Here is an annotated sample file, where AUTO means you should not be modifying this value (and all children of it) while MANUAL means that you are expected to set these values by editing the DB file manually, or through the REST interface. All the other values should be machine-generated, for example using the machine-registration container:

pdus:                                           # List of all the power delivery units (MANUAL)
  APC:                                          # Name of the PDU
    driver: apc_masterswitch                    # The [driver of your PDU](pdu/README.md)
    config:                                     # The configuration of the driver (driver-dependent)
      hostname: 10.0.0.2
  VPDU:                                         # A virtual PDU, spawning virtual machines
    driver: vpdu
    config:
      hostname: localhost:9191
    reserved_port_ids: []                       # List of reserved ports in the PDU where no virtual DUT can be added (REST)
duts:                                           # List of all the test machines
  de:ad:be:ef:ca:fe:                            # MAC address of the machine
    base_name: gfx9                             # Most significant characteristic of the machine. Basis of the auto-generated name
    ip_address: 192.168.0.42                    # IP address of the machine
    tags:                                       # List of tags representing the machine
    - amdgpu:architecture:GCN5.1
    - amdgpu:family:RV
    - amdgpu:codename:RENOIR
    - amdgpu:gfxversion:gfx9
    - amdgpu:APU
    - amdgpu:pciid:0x1002:0x1636
    manual_tags:                                # List of tags that cannot be automatically generated (MANUAL)
    - freesync_display
    local_tty_device: ttyUSB0                   # Test machine's serial port to talk to the gateway
    gitlab:                                     # List of GitLab instances to expose this runner on
      freedesktop:                              # Parameters for the `freedesktop` GitLab instance
        token: <token>                          # Token given by the registration process (AUTO)
        exposed: true                           # Should this machine be exposed on `freedesktop`? (MANUAL)
        runner_id: 4242                         # GitLab's runner ID associated to this machine
        acl:                                    # Access control list for this DUT on that GitLab instance (see explanations in the sub-section below)
          deny:
            users:                              # Username of the user creating the job
              - bad_user
          allow:
            projects:                           # Full path (relative to the instance) of the project that created the job
              - gfx-ci/ci-tron
            projects_in_groups:                 # Matches if the project is in of one of these groups
              - gfx-ci
    pdu: APC                                    # Name of the PDU to contact to turn ON/OFF this machine (MANUAL/REST)
    pdu_port_id: 1                              # ID of the port where the machine is connected (MANUAL/REST)
    pdu_off_delay: 30                           # How long should the PDU port be off when rebooting the machine? (REST)
    ready_for_service: true                     # The machine has been tested and can now be used by users (AUTO/REST)
    is_retired: false                           # The user specified that the machine is no longer in use
    first_seen: 2021-12-22 16:57:08.146275      # When was the machine first seen in CI (AUTO)
    comment: null                               # Field used to add a quick note about a DUT for admins (MANUAL/REST)
gitlab:                                         # Configuration of anything related to exposing the machines on GitLab (MANUAL)
  freedesktop:                                  # Name of the gitlab instance
    url: https://gitlab.freedesktop.org/        # URL of the instance
    registration_token: <token>                 # API token with the `create_runner` scope. For instance runners, you also need `admin_mode`. For project or group tokens, `role` must be `Maintainer` or `Owner`.
    runner_type: (instance|group|project)_type  # Where you want to register your runner
    group_id: <group id>                        # ID of the group you want to register the runners in. Only needed for group_type runners
    project_id: <project id>                    # ID of the project you want to register the runners in. Only needed for project_type runners
    access_token: <token>                       # A `read_api` or a `manage_runner` token, used to verify consistency between the local and gitlab state. For project or group tokens, `role` must be `Maintainer` or `Owner`.
    expose_runners: true                        # Expose the test machines on this instance? Handy for quickly disabling all machines
    maximum_timeout: 21600                      # Maximum timeout allowed for any job running on our test machines
    acl:                                        # Access control list for the DUT & gateway runners on this GitLab instance, when no ACL rule on the DUT/gateway runner matches (see explanations in the sub-section below)
      ...
    gateway_runner:                             # Expose a runner that will run locally, and not on test machines
      token: <token>                            # Token given by the registration process (AUTO)
      exposed: true                             # Should the gateway runner be exposed?
      runner_id: 4243                           # GitLab's runner ID associated to this machine
      acl:                                      # Access control list for the gateway runner (see explanations in the sub-section below)
        allow:                                  # At least one `allow` item must be defined to allow the gateway runner to be exposed
          users:
            - eric

Access Control Lists (ACL)

The acl: items in the example above can be a bit complex, so let’s look at them in detail:

  • They are split into two levels: the DUT/gateway ACL and the instance ACL. The former is more specific and is evaluated first; the latter is only evaluated if no rule matched.

  • The deny list is evaluated before allow, so if a job would match both, it is rejected.

  • If no rule matches, the default decision depends on whether any rule was set at either level:

    • If there was no ACL rule set anywhere, you don’t want any restriction, so the default decision is to allow.

    • If an ACL rule was set, you care about access control, so the default decision is to deny.
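
The decision steps above can be sketched as follows. This is an illustration of the rules, not the executor's implementation, and the hypothetical matches() helper only handles the users and projects rule types for brevity:

```python
# Illustrative sketch of the ACL decision logic (not the executor's code).
def matches(rules, job):
    """Hypothetical matcher: a rule list matches if the job's user or
    project appears in it."""
    return (job.get("user") in rules.get("users", [])
            or job.get("project") in rules.get("projects", []))

def acl_decision(runner_acl, instance_acl, job):
    def evaluate(acl):
        if matches(acl.get("deny", {}), job):   # deny is evaluated first
            return "deny"
        if matches(acl.get("allow", {}), job):
            return "allow"
        return None                             # no rule matched at this level

    # The runner-level ACL is more specific, so it is evaluated first; the
    # instance ACL is only consulted when no runner-level rule matched.
    for acl in (runner_acl, instance_acl):
        decision = evaluate(acl)
        if decision is not None:
            return decision

    # Default decision: allow only if no rule was set at either level.
    any_rules = any(acl.get(key) for acl in (runner_acl, instance_acl)
                    for key in ("allow", "deny"))
    return "deny" if any_rules else "allow"
```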

Warning

Gateway runners can’t be exposed unless they define an ACL with at least one allow – typically, the farm admin(s).

Danger

Keep in mind that gateway runners are --privileged, so don’t give access to them to people you don’t trust!

See the podman documentation for more information.

Frequently asked questions

  • How do I move runners from one GitLab project to another?

There is currently no easy way of doing so. The best solution is to run the following command for every runner in MarsDB:

$ curl -X DELETE "https://gitlab.example.com/api/v4/runners" --form "token=<token>"

The executor will periodically check the validity of the tokens, and upon seeing they got deleted, it will re-create them in the new project.
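
For farms with many runners, the same operation can be scripted. Here is a sketch using Python's standard library, where the GitLab URL and the token list (which you would normally collect from mars_db.yaml) are placeholders:

```python
# Revoke every runner token listed in MarsDB -- the equivalent of running
# the curl command above once per runner. URL and tokens are placeholders.
import urllib.parse
import urllib.request

tokens = ["glrt-token-1", "glrt-token-2"]   # collected from mars_db.yaml

for token in tokens:
    req = urllib.request.Request(
        "https://gitlab.example.com/api/v4/runners",
        data=urllib.parse.urlencode({"token": token}).encode(),
        method="DELETE",
    )
    # urllib.request.urlopen(req)  # uncomment to actually revoke the token
```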

REST API

The executor includes a REST API with various endpoints available.

Endpoint /duts

Method: GET

Lists the available machines and their information (IP address, tags, …)

curl localhost:8000/api/v1/duts

Endpoint /dut/

Method: POST, PUT

Adds a new machine to MARS_DB_FILE; if there is an ongoing discovery process, it will use this data to set the PDU and port_id.

This endpoint is used from the machine_registration.py script.

Endpoint /dut/<machine_id>

Method: GET

Lists all the information of a selected machine. machine_id is the MAC address.

curl localhost:8000/api/v1/dut/<machine_id>
curl localhost:8000/api/v1/dut/52:54:00:11:22:0a

Method: DELETE

Remove the machine from the database, and all its associated GitLab runner tokens.

curl -X DELETE localhost:8000/api/v1/dut/<machine_id>

Method: PATCH

Update one or more of the DUT’s editable fields:

  • comment (str): Specify a comment about the DUT meant for the farm admins

  • firmware_boot_time (float): Number of seconds needed by the DUT’s firmware to request the boot parameters after powering up

  • is_retired (bool): Tag the DUT as retired/active (see the DUT state machine above)

  • manual_tags (list[str]): Overwrite the manual tags

  • pdu_off_delay (float): Number of seconds needed to ensure the machine is fully off

  • ready_for_service (bool): Tag the DUT as ready for service (see the DUT state machine above)

curl -X PATCH localhost:8000/api/v1/dut/52:54:00:11:22:0a \
    -H 'Content-Type: application/json' \
    -d '{"pdu_off_delay": 10, "comment": "this is an example comment"}'

Endpoint /duts/<machine_id>/boot.ipxe

Method: GET

TODO: To be documented.

Endpoint /dut/<machine_id>/quick_check

Method: GET

Returns true if a quick check of the machine has been queued, false otherwise.

curl localhost:8000/api/v1/dut/<machine_id>/quick_check

Method: POST

Queue a quick check on the machine. No parameters are needed.

curl -X POST localhost:8000/api/v1/dut/<machine_id>/quick_check

Endpoint /dut/discover

Method: GET

Shows whether a discovery process is ongoing, along with the data of this discovery: pdu, port_id, and start date.

curl localhost:8000/api/v1/dut/discover

Method: POST

Launches a discovery process: the executor boots the machine behind the given PDU/port_id, and stores this data in discover_data to be used by the machine_registration.py script.

curl -X POST localhost:8000/api/v1/dut/discover \
    -H 'Content-Type: application/json' \
    -d '{"pdu": "VPDU", "port_id": 10}'

If no machines show up, the discovery process will automatically time out after 150 seconds by default. This value can be changed using the timeout parameter:

curl -X POST localhost:8000/api/v1/dut/discover \
    -H 'Content-Type: application/json' \
    -d '{"pdu": "VPDU", "port_id": 10, "timeout": 60}'

Method: DELETE

Erases all the discovery data; discover_data will be emptied.

curl -X DELETE localhost:8000/api/v1/dut/discover

Endpoint /dut/<machine_id>/cancel_job

Method: POST

Cancels the jobs running on a machine. machine_id is the MAC address.

curl -X POST localhost:8000/api/v1/dut/<machine_id>/cancel_job
curl -X POST localhost:8000/api/v1/dut/52:54:00:11:22:0a/cancel_job

Endpoint /pdus

Method: GET

Lists the available PDUs and their port_ids, with information such as label and state.

curl localhost:8000/api/v1/pdus

Endpoint /pdu/<pdu_name>

Method: GET

Lists all the information of a selected PDU.

curl localhost:8000/api/v1/pdu/<pdu_name>
curl localhost:8000/api/v1/pdu/VPDU

Endpoint /pdu/<pdu_name>/port/<port_id>

Method: GET

Lists the information of a port_id: label, min_off_time, and state.

curl localhost:8000/api/v1/pdu/<pdu_name>/port/<port_id>
curl localhost:8000/api/v1/pdu/VPDU/port/10

Method: PATCH

Turns a port OFF or ON.

curl -X PATCH localhost:8000/api/v1/pdu/VPDU/port/10 \
    -H 'Content-Type: application/json' \
    -d '{"state": "on"}'

Reserve or un-reserve a port. Use true to reserve, false to un-reserve.

curl -X PATCH localhost:8000/api/v1/pdu/VPDU/port/10 \
    -H 'Content-Type: application/json' \
    -d '{"reserved": true}'

Endpoint /full-state

Method: GET

Provides all the information from the endpoints /pdus, /duts, and /dut/discover in a single call.

Endpoint /jobs

Method: POST

Used to submit jobs. To be documented.

Executor client - executorctl

The executor client executorctl can be used to list the DUTs exposed through this CI farm, and run jobs on them.

The executor client can be found in git under executor/client and installed with pip.

$ executorctl run -t $machine_tag /path/to/job/file

Examples of jobs that can be run under vivian can be found at job_templates.

SALAD

SALAD provides automatic mapping between serial consoles and their attached
test machine. This is done with a service running by default on port 8100
and a REST interface running on port 8005.

The REST API provides the following endpoints:

    curl -s http://localhost:8005/api/v1/machine/

and

    curl -s http://localhost:8005/api/v1/machine/AA:BB:CC:DD:EE:FF

Where the host (localhost), the port (8005), and the MAC address
(AA:BB:CC:DD:EE:FF) should be replaced by values valid for your deployment.

GFX Info

Access the list of supported GPUs, from their codename to the environment they
are used in. The primary purpose is to generate a list of tags that can be used
to compare the results coming from machines running automated testing.

## How to

TODO

Machine Registration Container

The machine registration container is responsible for the following functions:

  • Registering new test machines:

    • Creating tags for the test machine without manual intervention, with the help of GFX Info;

    • Finding out which TTY device is connected to the gateway’s SALAD service;

  • Verifying that the state of the test machine matches the state found in MarsDB, including verifying that the serial console is talking to the gateway;

  • Verifying that the test machine has the wanted list of tags, and hiding the hardware that is not needed for testing (useful for multi-GPU setups).

Find out more about it in the source code.