How to work interactively on the CI Machines
When working on GEOS, you may encounter compilation errors or test failures that only appear in specific Continuous Integration (CI) builds. These are best troubleshot directly in the target environment. The preferred approach is to use Docker to replicate the failing image locally. If Docker is not available on your machine, or if the problem is inherently tied to the CI configuration, you can instead connect to the CI machines directly. Here are the steps to do so:
Step 1: Add a GitHub Actions step to establish a connection
First, reduce the number of jobs you trigger as much as possible by commenting out the CI configurations you do not need for debugging. Then, in your branch, add the following GitHub Actions (GHA) step to .github/workflows/build_and_test.yml (see the full documentation of the action at https://github.com/lhotari/action-upterm).
- name: ssh
  uses: lhotari/action-upterm@v1
  with:
    ## limits ssh access and adds the ssh public key for the user which triggered the workflow
    limit-access-to-actor: true
    ## limits ssh access and adds the ssh public keys of the listed GitHub users
    limit-access-to-users: GitHubLogin
Add the action after whichever step triggers the error. For a build failure, it is best to place it after the build, test and deploy step. It is also important to prevent the job from exiting on failure: by default, once a step fails, the remaining steps (including the upterm one) are skipped. For instance, comment out the following lines in the build, test and deploy step:
set -e
exit ${EXIT_STATUS}
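If you prefer to make this edit from the command line, the sketch below comments out both lines with sed. It assumes the lines appear exactly as shown above (possibly indented) in the workflow file you edited; the function name disable_fail_fast is hypothetical, and the -i.bak form is used because it works with both GNU and BSD (macOS) sed.

```shell
# Hypothetical helper: comment out `set -e` and `exit ${EXIT_STATUS}` in a workflow
# file so that a failing build/test step does not abort the job before upterm runs.
disable_fail_fast() {
  local workflow="$1"
  # -i.bak edits in place (portable across GNU and BSD sed); the backup is removed afterwards.
  # \1 preserves the original indentation of each matched line.
  sed -i.bak \
    -e 's/^\([[:space:]]*\)set -e[[:space:]]*$/\1# set -e/' \
    -e 's/^\([[:space:]]*\)exit \${EXIT_STATUS}[[:space:]]*$/\1# exit ${EXIT_STATUS}/' \
    "$workflow" && rm -f "${workflow}.bak"
}
```

Usage: disable_fail_fast <path-to-workflow-file>, then review the result with git diff before committing.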
You can now commit the changes and push them to your remote branch.
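A minimal sketch of this step, assuming you are on your debugging branch and that the remote is named origin; the function name push_debug_step and the commit message are placeholders.

```shell
# Hypothetical helper: commit the modified workflow file and push the current branch.
push_debug_step() {
  local workflow="$1"
  # "origin HEAD" pushes the current branch to a branch of the same name on origin
  git add "$workflow" &&
    git commit -m "DEBUG: add upterm step for interactive CI session" &&
    git push origin HEAD
}
```

Usage: push_debug_step <path-to-workflow-file>. Remember to drop this commit once debugging is finished.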
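Step 2: Inspect the CI log and retrieve the server address
Once the workflow reaches the upterm step, its log shows output similar to the following. The line starting with SSH Session contains the command you will use to connect: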
Run lhotari/action-upterm@v1
upterm
Auto-generating ~/.ssh/known_hosts by attempting connection to uptermd.upterm.dev
Pseudo-terminal will not be allocated because stdin is not a terminal.
Warning: Permanently added 'uptermd.upterm.dev' (ED25519) to the list of known hosts.
[email protected]: Permission denied (publickey).
Adding actor "GitHubLogin" to allowed users.
Fetching SSH keys registered with GitHub profiles: GitHubLogin
Fetched 2 ssh public keys
Creating a new session. Connecting to upterm server ssh://uptermd.upterm.dev:22
Created new session successfully
Entering main loop
=== Q16OBOFBLODJVA3TRXPL
Command: tmux new -s upterm -x 132 -y 43
Force Command: tmux attach -t upterm
Host: ssh://uptermd.upterm.dev:22
SSH Session: ssh Q16oBofblOdjVa3TrXPl:ZTc4NGUxMWRiMjI5MDgudm0udXB0ZXJtLmludGVybmFsOjIyMjI=@uptermd.upterm.dev
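If you save the upterm step's log to a file (for example by copying it from the workflow's page on GitHub), the short filter below prints the ssh command from the SSH Session line. It is a hypothetical convenience, not part of upterm; it assumes the log format shown above, tolerates a timestamp prefix on each line, and strips carriage returns in case the copied text contains them.

```shell
# Hypothetical helper: read an upterm log on stdin and print the first "SSH Session" command.
extract_upterm_ssh() {
  # Drop carriage returns, then keep only the text after "SSH Session: " on matching lines
  tr -d '\r' | sed -n 's/.*SSH Session: \(ssh .*\)$/\1/p' | head -n 1
}
```

Usage: extract_upterm_ssh < upterm.log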
Step 3: Connect to the machine via ssh
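You can now open a terminal on your own machine and ssh to the upterm server using the command from the SSH Session line. Access is granted through the SSH keys registered on the allowed GitHub profiles, so connect from a machine that holds a matching private key, e.g.,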
ssh Q16oBofblOdjVa3TrXPl:ZTc4NGUxMWRiMjI5MDgudm0udXB0ZXJtLmludGVybmFsOjIyMjI=@uptermd.upterm.dev
Step 4: Run the docker container interactively
Once you are connected to the machine, it is convenient to run the Docker container interactively. First, list all containers, including stopped ones:
docker ps -a
This displays the ID of the existing Docker container, which you can use to commit the container's state to a new image:
docker commit <id> debug_image
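Then run the new image interactively, e.g. (the environment variables below correspond to one CI configuration; adjust them to match the one you are debugging):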
docker run -it --volume=/home/runner/work/GEOS/GEOS:/tmp/geos -e ENABLE_HYPRE=ON -e ENABLE_HYPRE_DEVICE=CUDA -e ENABLE_TRILINOS=OFF --cap-add=SYS_PTRACE --entrypoint /bin/bash debug_image
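The three commands above can be wrapped in a small shell function on the runner. This is a sketch under assumptions: it takes the most recently created container (docker ps -a -l -q) to be the CI container, reuses the image name debug_image, and forwards any extra arguments (such as the -e flags for your configuration) to docker run; the function name debug_last_container is hypothetical.

```shell
# Hypothetical helper: snapshot the most recently created container and open a shell in it.
debug_last_container() {
  local id
  # -a: include stopped containers, -l: only the latest created one, -q: print only its ID
  id="$(docker ps -a -l -q)" || return 1
  if [ -z "$id" ]; then
    echo "no container found" >&2
    return 1
  fi
  # Save the container's current filesystem state as the image "debug_image"
  docker commit "$id" debug_image >/dev/null || return 1
  # Mount the GEOS checkout, allow ptrace for debuggers, and forward extra options ("$@")
  docker run -it \
    --volume=/home/runner/work/GEOS/GEOS:/tmp/geos \
    --cap-add=SYS_PTRACE \
    "$@" \
    --entrypoint /bin/bash debug_image
}
```

For example: debug_last_container -e ENABLE_HYPRE=ON -e ENABLE_HYPRE_DEVICE=CUDA -e ENABLE_TRILINOS=OFF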
Step 5: Cancel the workflow
Once you are done, do not forget to cancel the workflow!
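Besides the Cancel workflow button on the run's page, this can be done from your own machine with the GitHub CLI (gh). The sketch below assumes gh is installed and authenticated, and that the most recent in-progress run on your current branch is the debugging run; the function name cancel_debug_run is hypothetical.

```shell
# Hypothetical helper: cancel the newest in-progress workflow run on the current branch.
cancel_debug_run() {
  local branch run_id
  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  # Ask gh for the newest in-progress run on this branch and keep only its numeric ID
  run_id="$(gh run list --branch "$branch" --status in_progress --limit 1 \
    --json databaseId --jq '.[0].databaseId')" || return 1
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no in-progress run found on branch $branch" >&2
    return 1
  fi
  gh run cancel "$run_id"
}
```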