- The ambiguous character detection is an important security feature to
combat against sourcebase attacks (https://trojansource.codes/).
- However there are a few problems with the feature as it stands
today (i) it's apparantly an big performance hitter, it's twice as slow
as syntax highlighting (ii) it contains false positives, because it's
reporting valid problems but not valid within the context of a
programming language (ambiguous charachters in code comments being a
prime example) that can lead to security issues (iii) charachters from
certain languages always being marked as ambiguous. It's a lot of effort
to fix the aforementioned issues.
- Therefore, make it configurable in which context the ambiguous
character detection should be run, this avoids running detection in all
contexts such as file views, but still enable it in commits and pull
requests diffs where it matters the most. Ideally this also becomes an
per-repository setting, but the code architecture doesn't allow for a
clean implementation of that.
- Adds unit test.
- Adds integration tests to ensure that the contexts and instance-wide
is respected (and that ambigious charachter detection actually work in
different places).
- Ref: https://codeberg.org/forgejo/forgejo/pulls/2395#issuecomment-1575547
- Ref: https://codeberg.org/forgejo/forgejo/issues/564
- Remove unnecessary checks for `ctx.Repo.TreePath`, because it will
already early-return if it's empty.
- Simplify `performBlame` to extract the repoPath from the context.
- Don't render the topics, as they are not shown in the blame
page (there's a condition in the template for the blame page).
- Fix that `performBlame` doesn't call `NotFound`, it should instead
only return the error.
- Fix that the error handlings call `ServerError` instead of `NotFound`.
- Simplify `bypass-blame-ignore` to use `ctx.FormBool`.
- Remove `TreeLink`, `HasParentPath` and `ParentPath` as it's not used
in the blame template.
- Inline `BranchLink` and `RawFileLink` string operations.
- Move around `NumLines` to make it clear the error is handled.
- Less code, less things the code does, more readable and thus more
maintanable!
(cherry picked from commit e02baca55c)
(cherry picked from commit 74e00620ca)
Refactor Hash interfaces and centralize hash function. This will allow
easier introduction of different hash function later on.
This forms the "no-op" part of the SHA256 enablement patch.
Closes #26329
This PR adds the ability to ignore revisions specified in the
`.git-blame-ignore-revs` file in the root of the repository.
![grafik](https://github.com/go-gitea/gitea/assets/1666336/9e91be0c-6e9c-431c-bbe9-5f80154251c8)
The banner is displayed in this case. I intentionally did not add a UI
way to bypass the ignore file (same behaviour as Github) but you can add
`?bypass-blame-ignore=true` to the url manually.
---------
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
To avoid duplicated load of the same data in an HTTP request, we can set
a context cache to do that. i.e. Some pages may load a user from a
database with the same id in different areas on the same page. But the
code is hidden in two different deep logic. How should we share the
user? As a result of this PR, now if both entry functions accept
`context.Context` as the first parameter and we just need to refactor
`GetUserByID` to reuse the user from the context cache. Then it will not
be loaded twice on an HTTP request.
But of course, sometimes we would like to reload an object from the
database, that's why `RemoveContextData` is also exposed.
The core context cache is here. It defines a new context
```go
type cacheContext struct {
ctx context.Context
data map[any]map[any]any
lock sync.RWMutex
}
var cacheContextKey = struct{}{}
func WithCacheContext(ctx context.Context) context.Context {
return context.WithValue(ctx, cacheContextKey, &cacheContext{
ctx: ctx,
data: make(map[any]map[any]any),
})
}
```
Then you can use the below 4 methods to read/write/del the data within
the same context.
```go
func GetContextData(ctx context.Context, tp, key any) any
func SetContextData(ctx context.Context, tp, key, value any)
func RemoveContextData(ctx context.Context, tp, key any)
func GetWithContextCache[T any](ctx context.Context, cacheGroupKey string, cacheTargetID any, f func() (T, error)) (T, error)
```
Then let's take a look at how `system.GetString` implement it.
```go
func GetSetting(ctx context.Context, key string) (string, error) {
return cache.GetWithContextCache(ctx, contextCacheKey, key, func() (string, error) {
return cache.GetString(genSettingCacheKey(key), func() (string, error) {
res, err := GetSettingNoCache(ctx, key)
if err != nil {
return "", err
}
return res.SettingValue, nil
})
})
}
```
First, it will check if context data include the setting object with the
key. If not, it will query from the global cache which may be memory or
a Redis cache. If not, it will get the object from the database. In the
end, if the object gets from the global cache or database, it will be
set into the context cache.
An object stored in the context cache will only be destroyed after the
context disappeared.
As discussed in #22847 the helpers in helpers.less need to have a
separate prefix as they are causing conflicts with fomantic styles
This will allow us to have the `.gt-hidden { display:none !important; }`
style that is needed to for the reverted PR.
Of note in doing this I have noticed that there was already a conflict
with at least one chroma style which this PR now avoids.
I've also added in the `gt-hidden` style that matches the tailwind one
and switched the code that needed it to use that.
Signed-off-by: Andrew Thornton <art27@cantab.net>
---------
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
This PR follows #21535 (and replace #22592)
## Review without space diff
https://github.com/go-gitea/gitea/pull/22678/files?diff=split&w=1
## Purpose of this PR
1. Make git module command completely safe (risky user inputs won't be
passed as argument option anymore)
2. Avoid low-level mistakes like
https://github.com/go-gitea/gitea/pull/22098#discussion_r1045234918
3. Remove deprecated and dirty `CmdArgCheck` function, hide the `CmdArg`
type
4. Simplify code when using git command
## The main idea of this PR
* Move the `git.CmdArg` to the `internal` package, then no other package
except `git` could use it. Then developers could never do
`AddArguments(git.CmdArg(userInput))` any more.
* Introduce `git.ToTrustedCmdArgs`, it's for user-provided and already
trusted arguments. It's only used in a few cases, for example: use git
arguments from config file, help unit test with some arguments.
* Introduce `AddOptionValues` and `AddOptionFormat`, they make code more
clear and simple:
* Before: `AddArguments("-m").AddDynamicArguments(message)`
* After: `AddOptionValues("-m", message)`
* -
* Before: `AddArguments(git.CmdArg(fmt.Sprintf("--author='%s <%s>'",
sig.Name, sig.Email)))`
* After: `AddOptionFormat("--author='%s <%s>'", sig.Name, sig.Email)`
## FAQ
### Why these changes were not done in #21535 ?
#21535 is mainly a search&replace, it did its best to not change too
much logic.
Making the framework better needs a lot of changes, so this separate PR
is needed as the second step.
### The naming of `AddOptionXxx`
According to git's manual, the `--xxx` part is called `option`.
### How can it guarantee that `internal.CmdArg` won't be not misused?
Go's specification guarantees that. Trying to access other package's
internal package causes compilation error.
And, `golangci-lint` also denies the git/internal package. Only the
`git/command.go` can use it carefully.
### There is still a `ToTrustedCmdArgs`, will it still allow developers
to make mistakes and pass untrusted arguments?
Generally speaking, no. Because when using `ToTrustedCmdArgs`, the code
will be very complex (see the changes for examples). Then developers and
reviewers can know that something might be unreasonable.
### Why there was a `CmdArgCheck` and why it's removed?
At the moment of #21535, to reduce unnecessary changes, `CmdArgCheck`
was introduced as a hacky patch. Now, almost all code could be written
as `cmd := NewCommand(); cmd.AddXxx(...)`, then there is no need for
`CmdArgCheck` anymore.
### Why many codes for `signArg == ""` is deleted?
Because in the old code, `signArg` could never be empty string, it's
either `-S[key-id]` or `--no-gpg-sign`. So the `signArg == ""` is just
dead code.
---------
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Change all license headers to comply with REUSE specification.
Fix #16132
Co-authored-by: flynnnnnnnnnn <flynnnnnnnnnn@github>
Co-authored-by: John Olheiser <john.olheiser@gmail.com>
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.
Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.
Close #19913
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Prototyping
* Start work on creating offsets
* Modify tests
* Start prototyping with actual MPH
* Twiddle around
* Twiddle around comments
* Convert templates
* Fix external languages
* Fix latest translation
* Fix some test
* Tidy up code
* Use simple map
* go mod tidy
* Move back to data structure
- Uses less memory by creating for each language a map.
* Apply suggestions from code review
Co-authored-by: delvh <dev.lh@web.de>
* Add some comments
* Fix tests
* Try to fix tests
* Use en-US as defacto fallback
* Use correct slices
* refactor (#4)
* Remove TryTr, add log for missing translation key
* Refactor i18n
- Separate dev and production locale stores.
- Allow for live-reloading in dev mode.
Co-authored-by: zeripath <art27@cantab.net>
* Fix live-reloading & check for errors
* Make linter happy
* live-reload with periodic check (#5)
* Fix tests
Co-authored-by: delvh <dev.lh@web.de>
Co-authored-by: 6543 <6543@obermui.de>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: zeripath <art27@cantab.net>
* remove unnecessary web context data fields, and unify the i18n/translation related functions to `Locale`
* in development, show an error if a translation key is missing
* remove the unnecessary loops `for _, lang := range translation.AllLangs()` for every request, which improves the performance slightly
* use `ctx.Locale.Language()` instead of `ctx.Data["Lang"].(string)`
* add more comments about how the Locale/LangType fields are used
Fix #17514
Given the comments I've adjusted this somewhat. The numbers of characters detected are increased and include things like the use of U+300 to make à instead of à and non-breaking spaces.
There is a button which can be used to escape the content to show it.
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Gwyneth Morgan <gwymor@tilde.club>
Co-authored-by: silverwind <me@silverwind.io>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
* Some refactors related repository model
* Move more methods out of repository
* Move repository into models/repo
* Fix test
* Fix test
* some improvements
* Remove unnecessary function
Use check attribute code to check the assigned language of a file and send that in to
chroma as a hint for the language of the file.
Signed-off-by: Andrew Thornton <art27@cantab.net>
There are multiple places where Gitea does not properly escape URLs that it is building and there are multiple places where it builds urls when there is already a simpler function available to use this.
This is an extensive PR attempting to fix these issues.
1. The first commit in this PR looks through all href, src and links in the Gitea codebase and has attempted to catch all the places where there is potentially incomplete escaping.
2. Whilst doing this we will prefer to use functions that create URLs over recreating them by hand.
3. All uses of strings should be directly escaped - even if they are not currently expected to contain escaping characters. The main benefit to doing this will be that we can consider relaxing the constraints on user names and reponames in future.
4. The next commit looks at escaping in the wiki and re-considers the urls that are used there. Using the improved escaping here wiki files containing '/'. (This implementation will currently still place all of the wiki files the root directory of the repo but this would not be difficult to change.)
5. The title generation in feeds is now properly escaped.
6. EscapePound is no longer needed - urls should be PathEscaped / QueryEscaped as necessary but then re-escaped with Escape when creating html with locales Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
The call to html.EscapeString in routers/web/repo/blame.go:renderBlame is extraneous
as the commit message is now rendered by the template. The template will correctly
escape strings - therefore we are currently double escaping.
This PR fixes this.
Fix #17492
Signed-off-by: Andrew Thornton <art27@cantab.net>
Adds a link to each blame hunk, to view the blame of an earlier version of the file, similar to GitHub. Also refactors the blame render from fmtstring based to template based.
* Fix blame bottom line and add blame prior button
* Jump to previous parent commit from the commit.
* Fix previous commit link
* Fix previous blame link
* Fix the given file not exist in the previous commit.
* Fix blameRow struct not export
* fix theming issues, rename template var
* remove unused LastCommit fetch
* fix location of blame-hunk divider
* rewrite previous commit checks
* remove duplicate commit lookup
its already resolved and stored in ctx.Repo.Commit!
* split out blamePart processing into function
Co-authored-by: rogerluo410 <rogerluo410@gmail.com>
* refactor routers directory
* move func used for web and api to common
* make corsHandler a function to prohibit side efects
* rm unused func
Co-authored-by: 6543 <6543@obermui.de>