From cabb9fca11e0babcae88f365ce669d5159bc3e67 Mon Sep 17 00:00:00 2001
From: dbischof90 <dbischof90@noreply.codeberg.org>
Date: Sat, 14 Oct 2023 17:42:40 +0000
Subject: [PATCH] admin: installation: remote data folder (#193)

Documentation proposal for remote data mounting, following up from https://codeberg.org/forgejo/forgejo/issues/1590

Co-authored-by: Daniel Bischof <daniel.bischof@protonmail.com>
Reviewed-on: https://codeberg.org/forgejo/docs/pulls/193
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: dbischof90 <dbischof90@noreply.codeberg.org>
Co-committed-by: dbischof90 <dbischof90@noreply.codeberg.org>
---
 docs/admin/installation.md | 145 +++++++++++++++++++++++++++++++------
 1 file changed, 124 insertions(+), 21 deletions(-)

diff --git a/docs/admin/installation.md b/docs/admin/installation.md
index 6e2e05ea..836f05a0 100644
--- a/docs/admin/installation.md
+++ b/docs/admin/installation.md
@@ -146,6 +146,109 @@ services:
 +      - ./postgres:/var/lib/postgresql/data
 ```
 
+### Hosting repository data on remote storage systems
+
+You can also keep the data and repository folders on a remote drive such as a
+network-attached storage system. There are many possible solutions; this section
+focuses on a fairly minimal setup with NFS and explains which measures have to be
+taken in general, so that administrators can adapt it to
+their individual setup.
+
+We describe one possible setup and highlight the aspects that
+an administrator has to consider when the hosting environment differs.
+An important assumption of the Forgejo image is that it owns the folders it writes into
+and reads from. This is an issue on network shares, since file-system permissions are a machine-local
+concept and don't translate over the network easily.
+
+We assume that a server with the hostname `server` is accessible and shares a folder `/repositories`
+via NFS. On that server, append an entry like the following to `/etc/exports`:
+
+```shell
+[...]
+/repositories	*(rw,sync,all_squash,sec=sys,anonuid=1024,anongid=100)
+```
+
+Four aspects to consider:
+
+- The folder is exported as `rw`, meaning clients can both read and write in the folder.
+- The folder is exported as `sync`. This is NFS-specific and means that write operations block until they are finished. This is
+  not essential but increases the robustness against file corruption.
+- The `all_squash` setting maps all file accesses to an anonymous user, meaning that the files of a user with the UID `1050`
+  and those of a user with the UID `1051` are both mapped to a single UID on the server.
+- We set the anonymous UID and GID to explicit values on the server with `anonuid=1024,anongid=100`. Hence all files will be owned by
+  a user with the UID `1024`, belonging to the group `100`. Make sure the UID is available and a group with that ID is present.
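+
+As a minimal sketch of a more restrictive variant: the `*` in the entry above allows any host to mount the share.
+Assuming your Forgejo host lives in the subnet `192.0.2.0/24` (a placeholder, replace it with your own network),
+the export could be limited to that subnet like this:
+
+```shell
+# /etc/exports on the NFS server; the subnet is an illustrative placeholder
+/repositories	192.0.2.0/24(rw,sync,all_squash,sec=sys,anonuid=1024,anongid=100)
+```
+
+After editing `/etc/exports`, the export table usually has to be reloaded, for example with `exportfs -ra` on the server.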
+
+In effect, we can now create and write files and folders on the remote share. With the `all_squash` setting, all client
+users are mapped to one user on the server, so data writable by one user is writable by all users; in practice every file behaves as if it had
+`rwxrwxrwx` (abbreviated "`0777`") permissions. We can also "fake-own" data, since all `chown` calls are now mapped to the anonymous user. This
+behaviour is important for the rest of the setup.
+We now mount this folder on the `client`, which will host Forgejo, at `/mnt/repositories`...
+
+```shell
+# mount -o hard,timeo=10,retry=10,vers=4.1 server:/repositories /mnt/repositories/
+```
+
+... and create two folders inside it
+
+```shell
+$ mkdir /mnt/repositories/conf
+$ mkdir /mnt/repositories/data
+```
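+
+To make the mount of `server:/repositories` persistent across reboots, an entry in `/etc/fstab` on the `client` is one option.
+The following is only a sketch that reuses the options from the `mount` command above; adjust it to your distribution and network:
+
+```shell
+# /etc/fstab on the client (assumed equivalent of the manual mount command)
+server:/repositories	/mnt/repositories	nfs	hard,timeo=10,retry=10,vers=4.1	0	0
+```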
+
+Note the `hard` option in the client setup: it blocks all file operations while the share is not available.
+This NFS-specific setting prevents state changes which could potentially corrupt the repository data.
+
+We will use the `rootless` image, which embeds the `ssh` server for Forgejo.
+A possible `docker-compose` file would look like this
+(shown as a diff against [our initial example](#installation-with-docker)):
+
+```yaml
+version: "3"
+
+networks:
+  forgejo:
+    external: false
+
+services:
+  server:
+-    image: codeberg.org/forgejo/forgejo:1.20
++    image: codeberg.org/forgejo/forgejo:1.20-rootless
+    container_name: forgejo
+    environment:
++      - USER_UID=1024
++      - USER_GID=100
+-      - USER_UID=1000
+-      - USER_GID=1000
+
+    restart: always
+    networks:
+      - forgejo
+    volumes:
+-      - ./forgejo:/var/lib/gitea
++      - /mnt/repositories/data:/var/lib/gitea
++      - /mnt/repositories/conf:/etc/gitea
+      - /etc/timezone:/etc/timezone:ro
+      - /etc/localtime:/etc/localtime:ro
+    ports:
+      - "3000:3000"
+      - "222:22"
+```
+
+This writes the configuration into the `conf` folder we created and all other data into the `data` folder.
+Make sure that `USER_UID` and `USER_GID` match the `anonuid` and `anongid` values
+in the NFS export, so that the Forgejo user sees files and folders with its own UID and GID
+in the respective folders and thus identifies itself as the sole owner of the folder structure.
+
+Using the `rootless` image here solves another problem resulting from the file-system ownership issue.
+If we create ssh keys on the `client` and save them on the `server`, they too will effectively have `0777` permissions, which `openssh` rejects.
+It is important for all involved tools that these files are not writable by just anybody with a login, so you would get an error if you tried to use them.
+Changing the permissions will not succeed either because of the chosen `all_squash` setup, which was necessary to get a working ownership
+mechanism on the server. The `rootless` image embeds its own `ssh` server and thereby circumvents the problem entirely.
+
+Note that this is a comparatively simple setup which does not necessarily reflect the reality of your network.
+User mapping and ownership could in principle be handled more cleanly with Kerberos, which is, however, out of scope
+for this guide.
+
 ## Installation from binary
 
 ### Install Forgejo and git, create git user
@@ -163,7 +266,7 @@ and make it executable:
 `# cp forgejo-1.20.5-0-linux-amd64 /usr/local/bin/forgejo`
 `# chmod 755 /usr/local/bin/forgejo`
 
-Make sure `git` and `git-lfs` are installed:  
+Make sure `git` and `git-lfs` are installed:
 `# apt install git git-lfs`
 
 Create a user `git` on the system. Forgejo will run as that user, and when accessing git through ssh
@@ -190,12 +293,12 @@ like Fedora, CentOS etc.), run this instead:
 
 Now create the directories Forgejo will use and set access rights appropriately:
 
-`# mkdir /var/lib/forgejo`  
-`# chown git:git /var/lib/forgejo && chmod 750 /var/lib/forgejo`  
+`# mkdir /var/lib/forgejo`
+`# chown git:git /var/lib/forgejo && chmod 750 /var/lib/forgejo`
 _This is the directory Forgejo will store its data in, including your git repos._
 
-`# mkdir /etc/forgejo`  
-`# chown root:git /etc/forgejo && chmod 770 /etc/forgejo`  
+`# mkdir /etc/forgejo`
+`# chown root:git /etc/forgejo && chmod 770 /etc/forgejo`
 _This is the directory Forgejo's config, called `app.ini`, is stored in. Initially it needs to
 be writable by Forgejo, but after the installation you can make it read-only for Forgejo because
 then it shouldn't modify it anymore._
@@ -214,15 +317,15 @@ setup instructions.
 
 Forgejo provides a
 [systemd service script](https://codeberg.org/forgejo/forgejo/src/branch/forgejo/contrib/systemd/forgejo.service).
-Download it to the correct location:  
+Download it to the correct location:
 `# wget -O /etc/systemd/system/forgejo.service https://codeberg.org/forgejo/forgejo/raw/branch/forgejo/contrib/systemd/forgejo.service`
 
 If you're _not_ using sqlite, but MySQL or MariaDB or PostgreSQL, you'll have to edit that file
 (`/etc/systemd/system/forgejo.service`) and uncomment the corresponding `Wants=` and `After=` lines.
 Otherwise it _should_ work as it is.
 
-Now enable and start the Forgejo service, so you can go on with the installation:  
-`# systemctl enable forgejo.service`  
+Now enable and start the Forgejo service, so you can go on with the installation:
+`# systemctl enable forgejo.service`
 `# systemctl start forgejo.service`
 
 ### Forgejos web-based configuration
@@ -231,10 +334,10 @@ You should now be able to access Forgejo in your local web browser, so open http
 
 If it doesn't work:
 
-- Make sure the forgejo service started successfully by checking the output of  
-  `# systemctl status forgejo.service`  
-  If that indicates an error but the log lines underneath are too incomplete to tell what caused it,  
-  `# journalctl -n 100 --unit forgejo.service`  
+- Make sure the forgejo service started successfully by checking the output of
+  `# systemctl status forgejo.service`
+  If that indicates an error but the log lines underneath are too incomplete to tell what caused it,
+  `# journalctl -n 100 --unit forgejo.service`
   will print the last 100 lines logged by Forgejo.
 
 You should be greeted by Forgejo's "Initial Configuration" screen.
@@ -258,11 +361,11 @@ So far, so good, but we're not quite done yet - some manual configuration in the
 
 ### Further configuration in Forgejo's app.ini
 
-Stop the forgejo service:  
+Stop the forgejo service:
 `# systemctl stop forgejo.service`
 
 While at it, make `/etc/forgejo/` and the `app.ini` read-only for the git user (Forgejo doesn't
-write to it after the initial configuration):  
+write to it after the initial configuration):
 `# chmod 750 /etc/forgejo && chmod 640 /etc/forgejo/app.ini`
 
 Now (as root) edit `/etc/forgejo/app.ini`
@@ -303,10 +406,10 @@ The following changes are recommended if dealing with many large files:
 
 - By default **LFS data uploads expire** after 20 minutes - this can be too short for big files,
   slow connections or slow LFS storage (git-lfs seems to automatically restart the upload then -
-  which means that it can take forever and use lots of traffic)..  
+  which means that it can take forever and use lots of traffic)..
   If you're going to use LFS with big uploads, increase thus limit, by adding a line
   `LFS_HTTP_AUTH_EXPIRY = 180m` (for 180 minutes) to the `[server]` section.
-- Similarly there are timeouts for all kinds of git operations, that can be too short.  
+- Similarly there are timeouts for all kinds of git operations, that can be too short.
   Increasing all those git timeouts by adding a `[git.timeout]` section
   below the `[server]` section:
   ```ini
@@ -340,7 +443,7 @@ The following changes are recommended if dealing with many large files:
   HTTP_PORT = 80
   ```
 
-When you're done editing the app.ini, save it and start the forgejo service again:  
+When you're done editing the app.ini, save it and start the forgejo service again:
 `# systemctl start forgejo.service`
 
 You can test sending a mail by clicking the user button on the upper right of the Forgejo page
@@ -360,7 +463,7 @@ Keep in mind that:
 - You need to specify the path to the config (app.ini) with `--config /etc/forgejo/app.ini`
   (or `-c /etc/forgejo/app.ini`).
 
-So all in all your command might look like:  
+So all in all your command might look like:
 `$ sudo -u git forgejo -w /var/lib/forgejo -c /etc/forgejo/app.ini admin user list`
 
 > **_For convenience_**, you could create a `/usr/local/bin/forgejo.sh` with the following contents:
@@ -370,7 +473,7 @@ So all in all your command might look like:
 > sudo -u git forgejo -w /var/lib/forgejo -c /etc/forgejo/app.ini "$@"
 > ```
 >
-> and make it executable:  
+> and make it executable:
 > `# chmod 755  /usr/local/bin/forgejo.sh`
 >
 > Now if you want to call `forgejo` on the commandline (for the default system-wide installation
@@ -378,8 +481,8 @@ So all in all your command might look like:
 > line shown above.
 
 You can always call forgejo and its subcommands with `-h` or `--help` to make it output usage
-information like available options and (sub)commands, for example  
-`$ forgejo admin user -h`  
+information like available options and (sub)commands, for example
+`$ forgejo admin user -h`
 to show available subcommands to administrate users on the commandline.
 
 ## Installation from package