// Copyright 2023 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package issues

import (
	"context"
	"fmt"
Optimization of labels handling in issue_search (#4228)
This PR optimizes the SQL query and de-duplicates the label IDs when generating the query string on the issue page.
<hr/>
### Background
Some time ago, BingBot and some other crawlers brought my instance to its knees with requests containing a lot of label IDs, like this one:
```
[07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"
```
Since each label ID implies a join, the query becomes exponentially more expensive for the database engine (at least on PostgreSQL, though SQLite suffers a little too).
Thus, this PR proposes two enhancements:
* rewrite the database query to use only one squashed condition,
* deduplicate the label ids when generating the URL.
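The deduplication step can be sketched in isolation. The following is a minimal, standalone illustration and not the code from this PR: `splitLabelIDs` and `sortedKeys` are hypothetical helper names, and plain maps stand in for the set type used in the real code. Positive IDs are treated as "must have this label", negative IDs as "must not have this label", and `0` is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// splitLabelIDs is a hypothetical helper (not part of the PR). It removes
// duplicate label IDs and separates them into included (positive) and
// excluded (negative, returned as positive values) lists. Zero is ignored.
func splitLabelIDs(ids []int64) (included, excluded []int64) {
	inc := map[int64]struct{}{}
	exc := map[int64]struct{}{}
	for _, id := range ids {
		switch {
		case id > 0:
			inc[id] = struct{}{}
		case id < 0:
			exc[-id] = struct{}{}
		}
	}
	return sortedKeys(inc), sortedKeys(exc)
}

// sortedKeys returns the keys of the set in ascending order so that the
// result is deterministic.
func sortedKeys(set map[int64]struct{}) []int64 {
	keys := make([]int64, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	return keys
}

func main() {
	inc, exc := splitLabelIDs([]int64{19, 25, 19, -3, 25, -3, 0})
	fmt.Println(inc, exc) // prints "[19 25] [3]"
}
```

With the IDs reduced to a distinct set, a single counting condition per set is enough: the number of matching `issue_label` rows must equal the number of included IDs, and must be zero for the excluded IDs.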
### Performance comparison
Here are some timings on PostgreSQL-backed Forgejo 7.0.4 instances:
```sh
$ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0"
real 0m10,491s
user 0m0,017s
sys 0m0,008s
```
...and with the patch:
```sh
$ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0"
real 0m0,094s
user 0m0,012s
sys 0m0,013s
```
### Annex
This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :)
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Co-authored-by: Chl <chl@xlii.si>
Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
	"strconv"
	"strings"

	"code.gitea.io/gitea/models/db"
	"code.gitea.io/gitea/models/organization"
	repo_model "code.gitea.io/gitea/models/repo"
	"code.gitea.io/gitea/models/unit"
	user_model "code.gitea.io/gitea/models/user"
	"code.gitea.io/gitea/modules/container"
	"code.gitea.io/gitea/modules/optional"

	"xorm.io/builder"
	"xorm.io/xorm"
)

// IssuesOptions represents options of an issue.
type IssuesOptions struct { //nolint
	Paginator          *db.ListOptions
	RepoIDs            []int64 // overwrites RepoCond if the length is not 0
Include public repos in doer's dashboard for issue search (#28304)
It will fix #28268 .
<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">
<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">
## :warning: BREAKING :warning:
But we need to give up some features:
<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">
However, such abandonment may fix #28055 .
## Background
When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.
<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">
It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?
Whatever, it has been broken by #26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes #28268.
## How to fix it
It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.
But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"
<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">
Before #26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of #26012).
And to keep this, #26012 implemented it in an inefficient way, just counting
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.
https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338
## Give up unnecessary features
We may be able to resolve `TODO: use "group by" of the indexer engines to
implement it`. I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".
And the real question is, is it worth it? Why should we need to know
the counts grouped by repos?
Let me show you my search dashboard on gitea.com.
<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">
I don't think the long repo list helps with anything.
And if we agree to abandon it, things will be much easier. That is this
PR.
## TODO
I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.
<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">
The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.
Or please block this PR and help to complete it.
Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
2023-12-07 00:26:18 -05:00
	AllPublic          bool    // include also all public repositories
	RepoCond           builder.Cond
	AssigneeID         int64
	PosterID           int64
	MentionedID        int64
	ReviewRequestedID  int64
	ReviewedID         int64
	SubscriberID       int64
	MilestoneIDs       []int64
	ProjectID          int64
	ProjectColumnID    int64
	IsClosed           optional.Option[bool]
	IsPull             optional.Option[bool]
	LabelIDs           []int64
	IncludedLabelNames []string
	ExcludedLabelNames []string
	IncludeMilestones  []string
	SortType           string
	IssueIDs           []int64
	UpdatedAfterUnix   int64
	UpdatedBeforeUnix  int64
	// prioritize issues from this repo
	PriorityRepoID int64
	IsArchived     optional.Option[bool]
	Org            *organization.Organization // issues permission scope
	Team           *organization.Team         // issues permission scope
	User           *user_model.User           // issues permission scope
}

// applySorts sort an issues-related session based on the provided
// sortType string
func applySorts(sess *xorm.Session, sortType string, priorityRepoID int64) {
	switch sortType {
	case "oldest":
		sess.Asc("issue.created_unix").Asc("issue.id")
	case "recentupdate":
		sess.Desc("issue.updated_unix").Desc("issue.created_unix").Desc("issue.id")
	case "leastupdate":
		sess.Asc("issue.updated_unix").Asc("issue.created_unix").Asc("issue.id")
	case "mostcomment":
		sess.Desc("issue.num_comments").Desc("issue.created_unix").Desc("issue.id")
	case "leastcomment":
		sess.Asc("issue.num_comments").Desc("issue.created_unix").Desc("issue.id")
	case "priority":
		sess.Desc("issue.priority").Desc("issue.created_unix").Desc("issue.id")
	case "nearduedate":
		// 253370764800 is 01/01/9999 @ 12:00am (UTC)
		sess.Join("LEFT", "milestone", "issue.milestone_id = milestone.id").
			OrderBy("CASE " +
				"WHEN issue.deadline_unix = 0 AND (milestone.deadline_unix = 0 OR milestone.deadline_unix IS NULL) THEN 253370764800 " +
				"WHEN milestone.deadline_unix = 0 OR milestone.deadline_unix IS NULL THEN issue.deadline_unix " +
				"WHEN milestone.deadline_unix < issue.deadline_unix OR issue.deadline_unix = 0 THEN milestone.deadline_unix " +
				"ELSE issue.deadline_unix END ASC").
			Desc("issue.created_unix").
			Desc("issue.id")
	case "farduedate":
		sess.Join("LEFT", "milestone", "issue.milestone_id = milestone.id").
			OrderBy("CASE " +
				"WHEN milestone.deadline_unix IS NULL THEN issue.deadline_unix " +
				"WHEN milestone.deadline_unix < issue.deadline_unix OR issue.deadline_unix = 0 THEN milestone.deadline_unix " +
				"ELSE issue.deadline_unix END DESC").
			Desc("issue.created_unix").
			Desc("issue.id")
	case "priorityrepo":
		sess.OrderBy("CASE "+
			"WHEN issue.repo_id = ? THEN 1 "+
			"ELSE 2 END ASC", priorityRepoID).
			Desc("issue.created_unix").
			Desc("issue.id")
	case "project-column-sorting":
		sess.Asc("project_issue.sorting").Desc("issue.created_unix").Desc("issue.id")
	default:
		sess.Desc("issue.created_unix").Desc("issue.id")
	}
}

func applyLimit(sess *xorm.Session, opts *IssuesOptions) {
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012)
Fix #24662.
Replace #24822 and #25708 (although it has been merged)
## Background
In the past, Gitea supported issue searching with a keyword and
conditions in a less efficient way. It worked by searching for issues
with the keyword and obtaining limited IDs (as it is heavy to get all)
on the indexer (bleve/elasticsearch/meilisearch), and then querying with
conditions on the database to find a subset of the found IDs. This is
why the results could be incomplete.
To solve this issue, we need to store all fields that could be used as
conditions in the indexer and support both keyword and additional
conditions when searching with the indexer.
## Major changes
- Redefine `IndexerData` to include all fields that could be used as
filter conditions.
- Refactor `Search(ctx context.Context, kw string, repoIDs []int64,
limit, start int, state string)` to `Search(ctx context.Context, options
*SearchOptions)`, so it supports more conditions now.
- Change the data type stored in `issueIndexerQueue`. Use
`IndexerMetadata` instead of `IndexerData` in case the data has been
updated while it is in the queue. This also reduces the storage size of
the queue.
- Enhance searching with Bleve/Elasticsearch/Meilisearch, make them
fully support `SearchOptions`. Also, update the data versions.
- Keep most logic of database indexer, but remove
`issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is
the entry point to search issues.
- Start a Meilisearch instance to test it in unit tests.
- Add unit tests with almost full coverage to test
Bleve/Elasticsearch/Meilisearch indexer.
---------
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
	if opts.Paginator == nil || opts.Paginator.IsListAll() {
		return
	}

	start := 0
	if opts.Paginator.Page > 1 {
		start = (opts.Paginator.Page - 1) * opts.Paginator.PageSize
	}
	sess.Limit(opts.Paginator.PageSize, start)
}

func applyLabelsCondition(sess *xorm.Session, opts *IssuesOptions) {
	if len(opts.LabelIDs) > 0 {
		if opts.LabelIDs[0] == 0 {
			sess.Where("issue.id NOT IN (SELECT issue_id FROM issue_label)")
		} else {
			// deduplicate the label IDs for inclusion and exclusion
			includedLabelIDs := make(container.Set[int64])
			excludedLabelIDs := make(container.Set[int64])
			for _, labelID := range opts.LabelIDs {
				if labelID > 0 {
					includedLabelIDs.Add(labelID)
				} else if labelID < 0 { // 0 is not supported here, so just ignore it
					excludedLabelIDs.Add(-labelID)
				}
			}
			// ... and use them in a subquery of the form :
			//  where (select count(*) from issue_label where issue_id=issue.id and label_id in (2, 4, 6)) = 3
			// This equality is guaranteed thanks to unique index (issue_id,label_id) on table issue_label.
			if len(includedLabelIDs) > 0 {
				subQuery := builder.Select("count(*)").From("issue_label").Where(builder.Expr("issue_id = issue.id")).
					And(builder.In("label_id", includedLabelIDs.Values()))
				sess.Where(builder.Eq{strconv.Itoa(len(includedLabelIDs)): subQuery})
			}
			// or (select count(*)...) = 0 for excluded labels
			if len(excludedLabelIDs) > 0 {
				subQuery := builder.Select("count(*)").From("issue_label").Where(builder.Expr("issue_id = issue.id")).
					And(builder.In("label_id", excludedLabelIDs.Values()))
				sess.Where(builder.Eq{"0": subQuery})
			}
		}
	}

	if len(opts.IncludedLabelNames) > 0 {
		sess.In("issue.id", BuildLabelNamesIssueIDsCondition(opts.IncludedLabelNames))
	}

	if len(opts.ExcludedLabelNames) > 0 {
		sess.And(builder.NotIn("issue.id", BuildLabelNamesIssueIDsCondition(opts.ExcludedLabelNames)))
	}
}

func applyMilestoneCondition(sess *xorm.Session, opts *IssuesOptions) {
	if len(opts.MilestoneIDs) == 1 && opts.MilestoneIDs[0] == db.NoConditionID {
		sess.And("issue.milestone_id = 0")
	} else if len(opts.MilestoneIDs) > 0 {
		sess.In("issue.milestone_id", opts.MilestoneIDs)
	}

	if len(opts.IncludeMilestones) > 0 {
		sess.In("issue.milestone_id",
			builder.Select("id").
				From("milestone").
				Where(builder.In("name", opts.IncludeMilestones)))
	}
}

func applyProjectCondition(sess *xorm.Session, opts *IssuesOptions) {
	if opts.ProjectID > 0 { // specific project
		sess.Join("INNER", "project_issue", "issue.id = project_issue.issue_id").
			And("project_issue.project_id=?", opts.ProjectID)
	} else if opts.ProjectID == db.NoConditionID { // show those that are in no project
		sess.And(builder.NotIn("issue.id", builder.Select("issue_id").From("project_issue").And(builder.Neq{"project_id": 0})))
	}
	// opts.ProjectID == 0 means all projects,
	// do not need to apply any condition
}

func applyProjectColumnCondition(sess *xorm.Session, opts *IssuesOptions) {
	// opts.ProjectColumnID == 0 means all project columns,
	// do not need to apply any condition
	if opts.ProjectColumnID > 0 {
		sess.In("issue.id", builder.Select("issue_id").From("project_issue").Where(builder.Eq{"project_board_id": opts.ProjectColumnID}))
	} else if opts.ProjectColumnID == db.NoConditionID {
		sess.In("issue.id", builder.Select("issue_id").From("project_issue").Where(builder.Eq{"project_board_id": 0}))
	}
}

func applyRepoConditions(sess *xorm.Session, opts *IssuesOptions) {
|
2023-05-19 10:17:48 -04:00
|
|
|
if len(opts.RepoIDs) == 1 {
|
|
|
|
opts.RepoCond = builder.Eq{"issue.repo_id": opts.RepoIDs[0]}
|
|
|
|
} else if len(opts.RepoIDs) > 1 {
|
|
|
|
opts.RepoCond = builder.In("issue.repo_id", opts.RepoIDs)
|
|
|
|
}
|
Include public repos in doer's dashboard for issue search (#28304)
It will fix #28268 .
<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">
<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">
## :warning: BREAKING :warning:
But we need to give up some features:
<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">
However, such abandonment may fix #28055 .
## Background
When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.
<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">
It's a confusing design. Think about this: what does "In your
repositories" mean when the user switches to an org? Repos that belong to
the user, or to the org?
In any case, this was broken by #26012 and its follow-up PRs. Since then,
it only searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, which causes #28268.
## How to fix it
It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.
But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"
<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">
Before #26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of #26012).
To keep this feature, #26012 implemented it inefficiently by counting
the issues repo by repo, so it cannot work when `AllPublic` is true:
counting this way across all public repos is practically impossible.
https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338
## Give up unnecessary features
We may be able to resolve `TODO: use "group by" of the indexer engines to
implement it`. I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".
And the real question is: is it worth it? Why do we need to know
the counts grouped by repos?
Let me show you my search dashboard on gitea.com.
<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">
I don't think the long repo list helps with anything.
And if we agree to abandon it, things will be much easier. That is this
PR.
## TODO
I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.
<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">
The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.
Or please block this PR and help to complete it.
Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
2023-12-07 00:26:18 -05:00
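As a rough illustration of the `AllPublic` scope described in the message above, the sketch below builds the same shape of WHERE fragment using plain string handling. The `repoScope` type and its `whereClause` method are hypothetical names invented for this example; they are not part of Gitea or of the xorm builder, and the real code composes `builder.Cond` values instead.

```go
package main

import "strings"

// repoScope is a hypothetical stand-in for the RepoIDs and AllPublic
// fields of IssuesOptions; it exists only for this illustration.
type repoScope struct {
	RepoIDs   []int64
	AllPublic bool
}

// whereClause returns an illustrative SQL fragment and its bound arguments.
// It mirrors the shape of the repo condition only; it is not the SQL that
// the xorm builder actually emits.
func (s repoScope) whereClause() (string, []any) {
	var parts []string
	var args []any

	switch {
	case len(s.RepoIDs) == 1:
		// A single repo is matched with equality.
		parts = append(parts, "issue.repo_id = ?")
		args = append(args, s.RepoIDs[0])
	case len(s.RepoIDs) > 1:
		// Several repos are matched with an IN list.
		marks := strings.TrimSuffix(strings.Repeat("?,", len(s.RepoIDs)), ",")
		parts = append(parts, "issue.repo_id IN ("+marks+")")
		for _, id := range s.RepoIDs {
			args = append(args, id)
		}
	}

	if s.AllPublic {
		// Widen the scope: explicitly listed repos OR any public repo.
		parts = append(parts, "issue.repo_id IN (SELECT id FROM repository WHERE is_private = ?)")
		args = append(args, false)
	}

	if len(parts) == 0 {
		return "", nil // no repo restriction at all
	}
	return "(" + strings.Join(parts, " OR ") + ")", args
}
```

The point of the sketch is that `AllPublic` is joined with OR rather than AND, so it can only widen the set of repos searched, never narrow it.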
|
|
|
if opts.AllPublic {
|
|
|
|
if opts.RepoCond == nil {
|
|
|
|
opts.RepoCond = builder.NewCond()
|
|
|
|
}
|
|
|
|
opts.RepoCond = opts.RepoCond.Or(builder.In("issue.repo_id", builder.Select("id").From("repository").Where(builder.Eq{"is_private": false})))
|
|
|
|
}
|
2023-05-19 10:17:48 -04:00
|
|
|
if opts.RepoCond != nil {
|
|
|
|
sess.And(opts.RepoCond)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyConditions(sess *xorm.Session, opts *IssuesOptions) {
|
2023-05-18 06:45:25 -04:00
|
|
|
if len(opts.IssueIDs) > 0 {
|
|
|
|
sess.In("issue.id", opts.IssueIDs)
|
|
|
|
}
|
|
|
|
|
2023-05-19 10:17:48 -04:00
|
|
|
applyRepoConditions(sess, opts)
|
2023-05-18 06:45:25 -04:00
|
|
|
|
2024-03-02 10:42:31 -05:00
|
|
|
if opts.IsClosed.Has() {
|
|
|
|
sess.And("issue.is_closed=?", opts.IsClosed.Value())
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
if opts.AssigneeID > 0 {
|
|
|
|
applyAssigneeCondition(sess, opts.AssigneeID)
|
|
|
|
} else if opts.AssigneeID == db.NoConditionID {
|
|
|
|
sess.Where("issue.id NOT IN (SELECT issue_id FROM issue_assignees)")
|
|
|
|
}
|
|
|
|
|
|
|
|
if opts.PosterID > 0 {
|
|
|
|
applyPosterCondition(sess, opts.PosterID)
|
|
|
|
}
|
|
|
|
|
|
|
|
if opts.MentionedID > 0 {
|
|
|
|
applyMentionedCondition(sess, opts.MentionedID)
|
|
|
|
}
|
|
|
|
|
|
|
|
if opts.ReviewRequestedID > 0 {
|
|
|
|
applyReviewRequestedCondition(sess, opts.ReviewRequestedID)
|
|
|
|
}
|
|
|
|
|
|
|
|
if opts.ReviewedID > 0 {
|
|
|
|
applyReviewedCondition(sess, opts.ReviewedID)
|
|
|
|
}
|
|
|
|
|
|
|
|
if opts.SubscriberID > 0 {
|
|
|
|
applySubscribedCondition(sess, opts.SubscriberID)
|
|
|
|
}
|
|
|
|
|
|
|
|
applyMilestoneCondition(sess, opts)
|
|
|
|
|
|
|
|
if opts.UpdatedAfterUnix != 0 {
|
|
|
|
sess.And(builder.Gte{"issue.updated_unix": opts.UpdatedAfterUnix})
|
|
|
|
}
|
|
|
|
if opts.UpdatedBeforeUnix != 0 {
|
|
|
|
sess.And(builder.Lte{"issue.updated_unix": opts.UpdatedBeforeUnix})
|
|
|
|
}
|
|
|
|
|
2023-08-15 10:50:12 -04:00
|
|
|
applyProjectCondition(sess, opts)
|
2023-05-18 06:45:25 -04:00
|
|
|
|
2024-05-27 04:59:54 -04:00
|
|
|
applyProjectColumnCondition(sess, opts)
|
2023-05-18 06:45:25 -04:00
|
|
|
|
2024-03-02 10:42:31 -05:00
|
|
|
if opts.IsPull.Has() {
|
|
|
|
sess.And("issue.is_pull=?", opts.IsPull.Value())
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2024-03-02 10:42:31 -05:00
|
|
|
if opts.IsArchived.Has() {
|
|
|
|
sess.And(builder.Eq{"repository.is_archived": opts.IsArchived.Value()})
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
applyLabelsCondition(sess, opts)
|
|
|
|
|
|
|
|
if opts.User != nil {
|
2024-03-02 10:42:31 -05:00
|
|
|
sess.And(issuePullAccessibleRepoCond("issue.repo_id", opts.User.ID, opts.Org, opts.Team, opts.IsPull.Value()))
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// teamUnitsRepoCond returns a query condition matching the IDs of repos that the given org team can access with the given unit types
|
|
|
|
func teamUnitsRepoCond(id string, userID, orgID, teamID int64, units ...unit.Type) builder.Cond {
|
|
|
|
return builder.In(id,
|
|
|
|
builder.Select("repo_id").From("team_repo").Where(
|
|
|
|
builder.Eq{
|
|
|
|
"team_id": teamID,
|
|
|
|
}.And(
|
|
|
|
builder.Or(
|
|
|
|
// Check if the user is member of the team.
|
|
|
|
builder.In(
|
|
|
|
"team_id", builder.Select("team_id").From("team_user").Where(
|
|
|
|
builder.Eq{
|
|
|
|
"uid": userID,
|
|
|
|
},
|
|
|
|
),
|
|
|
|
),
|
|
|
|
// Check if the user is in the owner team of the organisation.
|
|
|
|
builder.Exists(builder.Select("team_id").From("team_user").
|
|
|
|
Where(builder.Eq{
|
|
|
|
"org_id": orgID,
|
|
|
|
"team_id": builder.Select("id").From("team").Where(
|
|
|
|
builder.Eq{
|
|
|
|
"org_id": orgID,
|
|
|
|
"lower_name": strings.ToLower(organization.OwnerTeamName),
|
|
|
|
}),
|
|
|
|
"uid": userID,
|
|
|
|
}),
|
|
|
|
),
|
|
|
|
)).And(
|
|
|
|
builder.In(
|
|
|
|
"team_id", builder.Select("team_id").From("team_unit").Where(
|
|
|
|
builder.Eq{
|
|
|
|
"`team_unit`.org_id": orgID,
|
|
|
|
}.And(
|
|
|
|
builder.In("`team_unit`.type", units),
|
|
|
|
),
|
|
|
|
),
|
|
|
|
),
|
|
|
|
),
|
|
|
|
))
|
|
|
|
}
|
|
|
|
|
|
|
|
// issuePullAccessibleRepoCond requires that userID is not zero and that the query joins the repository table
|
|
|
|
func issuePullAccessibleRepoCond(repoIDstr string, userID int64, org *organization.Organization, team *organization.Team, isPull bool) builder.Cond {
|
|
|
|
cond := builder.NewCond()
|
|
|
|
unitType := unit.TypeIssues
|
|
|
|
if isPull {
|
|
|
|
unitType = unit.TypePullRequests
|
|
|
|
}
|
|
|
|
if org != nil {
|
|
|
|
if team != nil {
|
|
|
|
cond = cond.And(teamUnitsRepoCond(repoIDstr, userID, org.ID, team.ID, unitType)) // special team member repos
|
|
|
|
} else {
|
|
|
|
cond = cond.And(
|
|
|
|
builder.Or(
|
|
|
|
repo_model.UserOrgUnitRepoCond(repoIDstr, userID, org.ID, unitType), // team member repos
|
|
|
|
repo_model.UserOrgPublicUnitRepoCond(userID, org.ID), // user org public non-member repos, TODO: check repo has issues
|
|
|
|
),
|
|
|
|
)
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
cond = cond.And(
|
|
|
|
builder.Or(
|
|
|
|
repo_model.UserOwnedRepoCond(userID), // owned repos
|
|
|
|
repo_model.UserAccessRepoCond(repoIDstr, userID), // user can access repo in a unit independent way
|
|
|
|
repo_model.UserAssignedRepoCond(repoIDstr, userID), // user has been assigned accessible public repos
|
|
|
|
repo_model.UserMentionedRepoCond(repoIDstr, userID), // user has been mentioned accessible public repos
|
|
|
|
repo_model.UserCreateIssueRepoCond(repoIDstr, userID, isPull), // user has created issue/pr accessible public repos
|
|
|
|
),
|
|
|
|
)
|
|
|
|
}
|
|
|
|
return cond
|
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyAssigneeCondition(sess *xorm.Session, assigneeID int64) {
|
|
|
|
sess.Join("INNER", "issue_assignees", "issue.id = issue_assignees.issue_id").
|
2023-05-18 06:45:25 -04:00
|
|
|
And("issue_assignees.assignee_id = ?", assigneeID)
|
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyPosterCondition(sess *xorm.Session, posterID int64) {
|
|
|
|
sess.And("issue.poster_id=?", posterID)
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyMentionedCondition(sess *xorm.Session, mentionedID int64) {
|
|
|
|
sess.Join("INNER", "issue_user", "issue.id = issue_user.issue_id").
|
2023-05-18 06:45:25 -04:00
|
|
|
And("issue_user.is_mentioned = ?", true).
|
|
|
|
And("issue_user.uid = ?", mentionedID)
|
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyReviewRequestedCondition(sess *xorm.Session, reviewRequestedID int64) {
|
2023-09-02 22:12:38 -04:00
|
|
|
existInTeamQuery := builder.Select("team_user.team_id").
|
|
|
|
From("team_user").
|
|
|
|
Where(builder.Eq{"team_user.uid": reviewRequestedID})
|
|
|
|
|
2023-09-21 07:59:50 -04:00
|
|
|
// if the review is approved or rejected, it should not be shown in the review requested list
|
|
|
|
maxReview := builder.Select("MAX(r.id)").
|
|
|
|
From("review as r").
|
|
|
|
Where(builder.In("r.type", []ReviewType{ReviewTypeApprove, ReviewTypeReject, ReviewTypeRequest})).
|
|
|
|
GroupBy("r.issue_id, r.reviewer_id, r.reviewer_team_id")
|
|
|
|
|
2023-09-02 22:12:38 -04:00
|
|
|
subQuery := builder.Select("review.issue_id").
|
|
|
|
From("review").
|
|
|
|
Where(builder.And(
|
2023-09-21 07:59:50 -04:00
|
|
|
builder.Eq{"review.type": ReviewTypeRequest},
|
2023-09-02 22:12:38 -04:00
|
|
|
builder.Or(
|
|
|
|
builder.Eq{"review.reviewer_id": reviewRequestedID},
|
|
|
|
builder.In("review.reviewer_team_id", existInTeamQuery),
|
|
|
|
),
|
2023-09-21 07:59:50 -04:00
|
|
|
builder.In("review.id", maxReview),
|
2023-09-02 22:12:38 -04:00
|
|
|
))
|
2024-06-11 14:47:45 -04:00
|
|
|
sess.Where("issue.poster_id <> ?", reviewRequestedID).
|
2023-09-02 22:12:38 -04:00
|
|
|
And(builder.In("issue.id", subQuery))
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applyReviewedCondition(sess *xorm.Session, reviewedID int64) {
|
2023-05-18 06:45:25 -04:00
|
|
|
// Query for pull requests where you are a reviewer or commenter, excluding
|
2024-03-11 05:24:23 -04:00
|
|
|
// any pull requests already returned by the review requested filter.
|
2023-05-18 06:45:25 -04:00
|
|
|
notPoster := builder.Neq{"issue.poster_id": reviewedID}
|
|
|
|
reviewed := builder.In("issue.id", builder.
|
|
|
|
Select("issue_id").
|
|
|
|
From("review").
|
|
|
|
Where(builder.And(
|
|
|
|
builder.Neq{"type": ReviewTypeRequest},
|
|
|
|
builder.Or(
|
|
|
|
builder.Eq{"reviewer_id": reviewedID},
|
|
|
|
builder.In("reviewer_team_id", builder.
|
|
|
|
Select("team_id").
|
|
|
|
From("team_user").
|
|
|
|
Where(builder.Eq{"uid": reviewedID}),
|
|
|
|
),
|
|
|
|
),
|
|
|
|
)),
|
|
|
|
)
|
|
|
|
commented := builder.In("issue.id", builder.
|
|
|
|
Select("issue_id").
|
|
|
|
From("comment").
|
|
|
|
Where(builder.And(
|
|
|
|
builder.Eq{"poster_id": reviewedID},
|
|
|
|
builder.In("type", CommentTypeComment, CommentTypeCode, CommentTypeReview),
|
|
|
|
)),
|
|
|
|
)
|
2024-06-11 14:47:45 -04:00
|
|
|
sess.And(notPoster, builder.Or(reviewed, commented))
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2024-06-11 14:47:45 -04:00
|
|
|
func applySubscribedCondition(sess *xorm.Session, subscriberID int64) {
|
|
|
|
sess.And(
|
2023-05-18 06:45:25 -04:00
|
|
|
builder.
|
|
|
|
NotIn("issue.id",
|
|
|
|
builder.Select("issue_id").
|
|
|
|
From("issue_watch").
|
|
|
|
Where(builder.Eq{"is_watching": false, "user_id": subscriberID}),
|
|
|
|
),
|
|
|
|
).And(
|
|
|
|
builder.Or(
|
|
|
|
builder.In("issue.id", builder.
|
|
|
|
Select("issue_id").
|
|
|
|
From("issue_watch").
|
|
|
|
Where(builder.Eq{"is_watching": true, "user_id": subscriberID}),
|
|
|
|
),
|
|
|
|
builder.In("issue.id", builder.
|
|
|
|
Select("issue_id").
|
|
|
|
From("comment").
|
|
|
|
Where(builder.Eq{"poster_id": subscriberID}),
|
|
|
|
),
|
|
|
|
builder.Eq{"issue.poster_id": subscriberID},
|
|
|
|
builder.In("issue.repo_id", builder.
|
|
|
|
Select("repo_id").
|
|
|
|
From("watch").
|
|
|
|
Where(builder.And(builder.Eq{"user_id": subscriberID},
|
|
|
|
builder.In("mode", repo_model.WatchModeNormal, repo_model.WatchModeAuto))),
|
|
|
|
),
|
|
|
|
),
|
|
|
|
)
|
|
|
|
}
|
|
|
|
|
|
|
|
// Issues returns a list of issues by given conditions.
|
2023-08-07 15:26:40 -04:00
|
|
|
func Issues(ctx context.Context, opts *IssuesOptions) (IssueList, error) {
|
2023-05-18 06:45:25 -04:00
|
|
|
sess := db.GetEngine(ctx).
|
|
|
|
Join("INNER", "repository", "`issue`.repo_id = `repository`.id")
|
|
|
|
applyLimit(sess, opts)
|
|
|
|
applyConditions(sess, opts)
|
|
|
|
applySorts(sess, opts.SortType, opts.PriorityRepoID)
|
|
|
|
|
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012)
Fix #24662.
Replace #24822 and #25708 (although it has been merged)
## Background
In the past, Gitea supported issue searching with a keyword and
conditions in a less efficient way. It worked by searching for issues
with the keyword and obtaining limited IDs (as it is heavy to get all)
on the indexer (bleve/elasticsearch/meilisearch), and then querying with
conditions on the database to find a subset of the found IDs. This is
why the results could be incomplete.
To solve this issue, we need to store all fields that could be used as
conditions in the indexer and support both keyword and additional
conditions when searching with the indexer.
## Major changes
- Redefine `IndexerData` to include all fields that could be used as
filter conditions.
- Refactor `Search(ctx context.Context, kw string, repoIDs []int64,
limit, start int, state string)` to `Search(ctx context.Context, options
*SearchOptions)`, so it supports more conditions now.
- Change the data type stored in `issueIndexerQueue`. Use
`IndexerMetadata` instead of `IndexerData` in case the data has been
updated while it is in the queue. This also reduces the storage size of
the queue.
- Enhance searching with Bleve/Elasticsearch/Meilisearch, make them
fully support `SearchOptions`. Also, update the data versions.
- Keep most of the database indexer's logic, but remove
`issues.SearchIssueIDsByKeyword` in `models` to avoid confusion about
where the entry point for searching issues is.
- Start a Meilisearch instance to test it in unit tests.
- Add unit tests with almost full coverage to test
Bleve/Elasticsearch/Meilisearch indexer.
---------
Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
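The reason the old approach returned incomplete results can be sketched in a few lines: if the keyword match is truncated before the other conditions are applied, matching issues beyond the cut-off are lost. The sketch below applies the keyword and the filters together and pages only afterwards. It is a minimal in-memory illustration, not the indexer code. `SearchOptions` here is a hypothetical, trimmed-down struct, and `issueDoc`, `search`, and `exampleSearch` are names invented for this example.

```go
package main

import (
	"context"
	"strings"
)

// SearchOptions is a hypothetical, reduced options struct for illustration;
// the real one carries many more filter fields.
type SearchOptions struct {
	Keyword  string
	RepoIDs  []int64
	IsClosed *bool // nil means "do not filter by state"
	Skip     int
	Limit    int // 0 means "no limit"
}

// issueDoc is a hypothetical indexed document.
type issueDoc struct {
	ID       int64
	RepoID   int64
	Title    string
	IsClosed bool
}

// search applies the keyword and every filter first, and pages only at the
// end, so the returned total counts every match rather than one page.
func search(ctx context.Context, docs []issueDoc, opts *SearchOptions) ([]int64, int64, error) {
	if err := ctx.Err(); err != nil {
		return nil, 0, err
	}
	repos := make(map[int64]struct{}, len(opts.RepoIDs))
	for _, id := range opts.RepoIDs {
		repos[id] = struct{}{}
	}
	kw := strings.ToLower(opts.Keyword)

	var matched []int64
	for _, d := range docs {
		if kw != "" && !strings.Contains(strings.ToLower(d.Title), kw) {
			continue
		}
		if _, ok := repos[d.RepoID]; len(repos) > 0 && !ok {
			continue
		}
		if opts.IsClosed != nil && d.IsClosed != *opts.IsClosed {
			continue
		}
		matched = append(matched, d.ID)
	}

	total := int64(len(matched))
	start := min(max(opts.Skip, 0), len(matched))
	end := len(matched)
	if opts.Limit > 0 && start+opts.Limit < end {
		end = start + opts.Limit
	}
	return matched[start:end], total, nil
}

// exampleSearch shows a call: open issues in repo 1 matching "crash", one per page.
func exampleSearch() ([]int64, int64, error) {
	closed := false
	docs := []issueDoc{
		{ID: 1, RepoID: 1, Title: "Crash on start"},
		{ID: 2, RepoID: 2, Title: "crash in parser", IsClosed: true},
		{ID: 3, RepoID: 1, Title: "Docs typo"},
		{ID: 4, RepoID: 1, Title: "Crash on exit"},
	}
	return search(context.Background(), docs, &SearchOptions{
		Keyword: "crash", RepoIDs: []int64{1}, IsClosed: &closed, Limit: 1,
	})
}
```

Passing a single options struct, as in the refactored `Search(ctx context.Context, options *SearchOptions)` signature, also means a new filter can be added as a field without changing every call site. The `min` and `max` built-ins used here assume Go 1.21 or later.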
|
|
|
issues := IssueList{}
|
2023-05-18 06:45:25 -04:00
|
|
|
if err := sess.Find(&issues); err != nil {
|
|
|
|
return nil, fmt.Errorf("unable to query Issues: %w", err)
|
|
|
|
}
|
|
|
|
|
2023-07-22 10:14:27 -04:00
|
|
|
if err := issues.LoadAttributes(ctx); err != nil {
|
2023-05-18 06:45:25 -04:00
|
|
|
return nil, fmt.Errorf("unable to LoadAttributes for Issues: %w", err)
|
|
|
|
}
|
|
|
|
|
|
|
|
return issues, nil
|
|
|
|
}
|
|
|
|
|
2023-07-31 02:28:53 -04:00
|
|
|
// IssueIDs returns a list of issue ids by given conditions.
|
|
|
|
func IssueIDs(ctx context.Context, opts *IssuesOptions, otherConds ...builder.Cond) ([]int64, int64, error) {
|
|
|
|
sess := db.GetEngine(ctx).
|
|
|
|
Join("INNER", "repository", "`issue`.repo_id = `repository`.id")
|
|
|
|
applyConditions(sess, opts)
|
|
|
|
for _, cond := range otherConds {
|
|
|
|
sess.And(cond)
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2023-07-31 02:28:53 -04:00
|
|
|
applyLimit(sess, opts)
|
|
|
|
applySorts(sess, opts.SortType, opts.PriorityRepoID)
|
|
|
|
|
|
|
|
var res []int64
|
|
|
|
total, err := sess.Select("`issue`.id").Table(&Issue{}).FindAndCount(&res)
|
2023-05-18 06:45:25 -04:00
|
|
|
if err != nil {
|
2023-07-31 02:28:53 -04:00
|
|
|
return nil, 0, err
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|
|
|
|
|
2023-07-31 02:28:53 -04:00
|
|
|
return res, total, nil
|
2023-05-18 06:45:25 -04:00
|
|
|
}
|