1
0
Fork 0
mirror of https://codeberg.org/forgejo/forgejo.git synced 2024-12-30 14:09:42 -05:00
forgejo/models/issues/issue_search.go

490 lines
16 KiB
Go
Raw Normal View History

// Copyright 2023 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT
package issues
import (
"context"
"fmt"
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
"strconv"
"strings"
"code.gitea.io/gitea/models/db"
"code.gitea.io/gitea/models/organization"
repo_model "code.gitea.io/gitea/models/repo"
"code.gitea.io/gitea/models/unit"
user_model "code.gitea.io/gitea/models/user"
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
"code.gitea.io/gitea/modules/container"
"code.gitea.io/gitea/modules/optional"
"xorm.io/builder"
"xorm.io/xorm"
)
// IssuesOptions represents options of an issue.
type IssuesOptions struct { //nolint
Paginator *db.ListOptions
RepoIDs []int64 // overwrites RepoCond if the length is not 0
Include public repos in doer's dashboard for issue search (#28304) It will fix #28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> ## :warning: BREAKING :warning: But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix #28055 . ## Backgroud When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by #26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes #28268. ## How to fix it It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before #26012, it was being done in the DB, but it caused the results to be incomplete (see the description of #26012). And to keep this, #26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 ## Give up unnecessary features We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. ## TODO I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
2023-12-07 00:26:18 -05:00
AllPublic bool // include also all public repositories
RepoCond builder.Cond
AssigneeID int64
PosterID int64
MentionedID int64
ReviewRequestedID int64
ReviewedID int64
SubscriberID int64
MilestoneIDs []int64
ProjectID int64
ProjectColumnID int64
IsClosed optional.Option[bool]
IsPull optional.Option[bool]
LabelIDs []int64
IncludedLabelNames []string
ExcludedLabelNames []string
IncludeMilestones []string
SortType string
IssueIDs []int64
UpdatedAfterUnix int64
UpdatedBeforeUnix int64
// prioritize issues from this repo
PriorityRepoID int64
IsArchived optional.Option[bool]
Org *organization.Organization // issues permission scope
Team *organization.Team // issues permission scope
User *user_model.User // issues permission scope
}
// applySorts sort an issues-related session based on the provided
// sortType string
func applySorts(sess *xorm.Session, sortType string, priorityRepoID int64) {
switch sortType {
case "oldest":
sess.Asc("issue.created_unix").Asc("issue.id")
case "recentupdate":
sess.Desc("issue.updated_unix").Desc("issue.created_unix").Desc("issue.id")
case "leastupdate":
sess.Asc("issue.updated_unix").Asc("issue.created_unix").Asc("issue.id")
case "mostcomment":
sess.Desc("issue.num_comments").Desc("issue.created_unix").Desc("issue.id")
case "leastcomment":
sess.Asc("issue.num_comments").Desc("issue.created_unix").Desc("issue.id")
case "priority":
sess.Desc("issue.priority").Desc("issue.created_unix").Desc("issue.id")
case "nearduedate":
// 253370764800 is 01/01/9999 @ 12:00am (UTC)
sess.Join("LEFT", "milestone", "issue.milestone_id = milestone.id").
OrderBy("CASE " +
"WHEN issue.deadline_unix = 0 AND (milestone.deadline_unix = 0 OR milestone.deadline_unix IS NULL) THEN 253370764800 " +
"WHEN milestone.deadline_unix = 0 OR milestone.deadline_unix IS NULL THEN issue.deadline_unix " +
"WHEN milestone.deadline_unix < issue.deadline_unix OR issue.deadline_unix = 0 THEN milestone.deadline_unix " +
"ELSE issue.deadline_unix END ASC").
Desc("issue.created_unix").
Desc("issue.id")
case "farduedate":
sess.Join("LEFT", "milestone", "issue.milestone_id = milestone.id").
OrderBy("CASE " +
"WHEN milestone.deadline_unix IS NULL THEN issue.deadline_unix " +
"WHEN milestone.deadline_unix < issue.deadline_unix OR issue.deadline_unix = 0 THEN milestone.deadline_unix " +
"ELSE issue.deadline_unix END DESC").
Desc("issue.created_unix").
Desc("issue.id")
case "priorityrepo":
sess.OrderBy("CASE "+
"WHEN issue.repo_id = ? THEN 1 "+
"ELSE 2 END ASC", priorityRepoID).
Desc("issue.created_unix").
Desc("issue.id")
case "project-column-sorting":
sess.Asc("project_issue.sorting").Desc("issue.created_unix").Desc("issue.id")
default:
sess.Desc("issue.created_unix").Desc("issue.id")
}
}
func applyLimit(sess *xorm.Session, opts *IssuesOptions) {
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
if opts.Paginator == nil || opts.Paginator.IsListAll() {
return
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
}
start := 0
if opts.Paginator.Page > 1 {
start = (opts.Paginator.Page - 1) * opts.Paginator.PageSize
}
sess.Limit(opts.Paginator.PageSize, start)
}
func applyLabelsCondition(sess *xorm.Session, opts *IssuesOptions) {
if len(opts.LabelIDs) > 0 {
if opts.LabelIDs[0] == 0 {
sess.Where("issue.id NOT IN (SELECT issue_id FROM issue_label)")
} else {
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
// deduplicate the label IDs for inclusion and exclusion
includedLabelIDs := make(container.Set[int64])
excludedLabelIDs := make(container.Set[int64])
for _, labelID := range opts.LabelIDs {
if labelID > 0 {
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
includedLabelIDs.Add(labelID)
} else if labelID < 0 { // 0 is not supported here, so just ignore it
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
excludedLabelIDs.Add(-labelID)
}
}
Optimization of labels handling in issue_search (#4228) This PR optimizes the SQL query and de-duplicate the labels' ids when generating the query string, on the issue page. <hr/> ### Background Some time ago, BingBot and some other crawlers have been putting my instance on its knees with requests containing a lot of label ids, like this one : ``` [07/Aug/2023:11:28:37 +0200] "GET /Dolibarr/sendrecurringinvoicebymail/issues?q=&type=all&sort=&state=closed&labels=1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c10%2c2%2c1%2c1%2c10%2c10%2c7%2c6%2c10%2c10%2c3%2c2%2c1%2c5%2c10%2c1%2c6%2c2%2c7%2c3%2c7%2c6%2c10%2c1%2c10%2c1%2c1%2c7%2c7%2c1%2c1%2c1%2c1%2c10%2c10%2c1%2c2%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c1%2c2%2c1%2c12%2c6%2c6%2c10&milestone=0&project=-1&poster=0 HTTP/1.1" 499 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36" ``` Since each of the label ids implies a join, it grows exponentially expensive for the database engine (at least on PostgreSQL but SQLite suffers a little too). Thus, this PR proposes two enhancements: * rewrite the database query to use only one squashed condition, * deduplicate the label ids when generating the URL. ### Performance comparison Here are some timings on Postgresql-backed, Forgejo 7.0.4 instances : ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m10,491s user 0m0,017s sys 0m0,008s ``` ...and with the patch: ```sh $ time curl -s -o /dev/null "http://localhost:3000/toto/tata/issues?q=&type=all&sort=&labels=19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25%2c19%2c25&state=open&milestone=0&project=0&assignee=0&poster=0" real 0m0,094s user 0m0,012s sys 0m0,013s ``` ### Annex This issue was originally proposed to [Gitea](https://github.com/go-gitea/gitea/pull/26460) but didn't get much attention, and I switched to Forgejo in the meantime :) Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/4228 Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org> Co-authored-by: Chl <chl@xlii.si> Co-committed-by: Chl <chl@xlii.si>
2024-06-28 01:11:57 -04:00
// ... and use them in a subquery of the form :
// where (select count(*) from issue_label where issue_id=issue.id and label_id in (2, 4, 6)) = 3
// This equality is guaranteed thanks to unique index (issue_id,label_id) on table issue_label.
if len(includedLabelIDs) > 0 {
subQuery := builder.Select("count(*)").From("issue_label").Where(builder.Expr("issue_id = issue.id")).
And(builder.In("label_id", includedLabelIDs.Values()))
sess.Where(builder.Eq{strconv.Itoa(len(includedLabelIDs)): subQuery})
}
// or (select count(*)...) = 0 for excluded labels
if len(excludedLabelIDs) > 0 {
subQuery := builder.Select("count(*)").From("issue_label").Where(builder.Expr("issue_id = issue.id")).
And(builder.In("label_id", excludedLabelIDs.Values()))
sess.Where(builder.Eq{"0": subQuery})
}
}
}
if len(opts.IncludedLabelNames) > 0 {
sess.In("issue.id", BuildLabelNamesIssueIDsCondition(opts.IncludedLabelNames))
}
if len(opts.ExcludedLabelNames) > 0 {
sess.And(builder.NotIn("issue.id", BuildLabelNamesIssueIDsCondition(opts.ExcludedLabelNames)))
}
}
func applyMilestoneCondition(sess *xorm.Session, opts *IssuesOptions) {
if len(opts.MilestoneIDs) == 1 && opts.MilestoneIDs[0] == db.NoConditionID {
sess.And("issue.milestone_id = 0")
} else if len(opts.MilestoneIDs) > 0 {
sess.In("issue.milestone_id", opts.MilestoneIDs)
}
if len(opts.IncludeMilestones) > 0 {
sess.In("issue.milestone_id",
builder.Select("id").
From("milestone").
Where(builder.In("name", opts.IncludeMilestones)))
}
}
func applyProjectCondition(sess *xorm.Session, opts *IssuesOptions) {
if opts.ProjectID > 0 { // specific project
sess.Join("INNER", "project_issue", "issue.id = project_issue.issue_id").
And("project_issue.project_id=?", opts.ProjectID)
} else if opts.ProjectID == db.NoConditionID { // show those that are in no project
sess.And(builder.NotIn("issue.id", builder.Select("issue_id").From("project_issue").And(builder.Neq{"project_id": 0})))
}
// opts.ProjectID == 0 means all projects,
// do not need to apply any condition
}
func applyProjectColumnCondition(sess *xorm.Session, opts *IssuesOptions) {
// opts.ProjectColumnID == 0 means all project columns,
// do not need to apply any condition
if opts.ProjectColumnID > 0 {
sess.In("issue.id", builder.Select("issue_id").From("project_issue").Where(builder.Eq{"project_board_id": opts.ProjectColumnID}))
} else if opts.ProjectColumnID == db.NoConditionID {
sess.In("issue.id", builder.Select("issue_id").From("project_issue").Where(builder.Eq{"project_board_id": 0}))
}
}
func applyRepoConditions(sess *xorm.Session, opts *IssuesOptions) {
if len(opts.RepoIDs) == 1 {
opts.RepoCond = builder.Eq{"issue.repo_id": opts.RepoIDs[0]}
} else if len(opts.RepoIDs) > 1 {
opts.RepoCond = builder.In("issue.repo_id", opts.RepoIDs)
}
Include public repos in doer's dashboard for issue search (#28304) It will fix #28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> ## :warning: BREAKING :warning: But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix #28055 . ## Backgroud When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by #26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes #28268. ## How to fix it It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before #26012, it was being done in the DB, but it caused the results to be incomplete (see the description of #26012). And to keep this, #26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 ## Give up unnecessary features We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. ## TODO I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
2023-12-07 00:26:18 -05:00
if opts.AllPublic {
if opts.RepoCond == nil {
opts.RepoCond = builder.NewCond()
}
opts.RepoCond = opts.RepoCond.Or(builder.In("issue.repo_id", builder.Select("id").From("repository").Where(builder.Eq{"is_private": false})))
}
if opts.RepoCond != nil {
sess.And(opts.RepoCond)
}
}
func applyConditions(sess *xorm.Session, opts *IssuesOptions) {
if len(opts.IssueIDs) > 0 {
sess.In("issue.id", opts.IssueIDs)
}
applyRepoConditions(sess, opts)
if opts.IsClosed.Has() {
sess.And("issue.is_closed=?", opts.IsClosed.Value())
}
if opts.AssigneeID > 0 {
applyAssigneeCondition(sess, opts.AssigneeID)
} else if opts.AssigneeID == db.NoConditionID {
sess.Where("issue.id NOT IN (SELECT issue_id FROM issue_assignees)")
}
if opts.PosterID > 0 {
applyPosterCondition(sess, opts.PosterID)
}
if opts.MentionedID > 0 {
applyMentionedCondition(sess, opts.MentionedID)
}
if opts.ReviewRequestedID > 0 {
applyReviewRequestedCondition(sess, opts.ReviewRequestedID)
}
if opts.ReviewedID > 0 {
applyReviewedCondition(sess, opts.ReviewedID)
}
if opts.SubscriberID > 0 {
applySubscribedCondition(sess, opts.SubscriberID)
}
applyMilestoneCondition(sess, opts)
if opts.UpdatedAfterUnix != 0 {
sess.And(builder.Gte{"issue.updated_unix": opts.UpdatedAfterUnix})
}
if opts.UpdatedBeforeUnix != 0 {
sess.And(builder.Lte{"issue.updated_unix": opts.UpdatedBeforeUnix})
}
applyProjectCondition(sess, opts)
applyProjectColumnCondition(sess, opts)
if opts.IsPull.Has() {
sess.And("issue.is_pull=?", opts.IsPull.Value())
}
if opts.IsArchived.Has() {
sess.And(builder.Eq{"repository.is_archived": opts.IsArchived.Value()})
}
applyLabelsCondition(sess, opts)
if opts.User != nil {
sess.And(issuePullAccessibleRepoCond("issue.repo_id", opts.User.ID, opts.Org, opts.Team, opts.IsPull.Value()))
}
}
// teamUnitsRepoCond returns query condition for those repo id in the special org team with special units access
func teamUnitsRepoCond(id string, userID, orgID, teamID int64, units ...unit.Type) builder.Cond {
return builder.In(id,
builder.Select("repo_id").From("team_repo").Where(
builder.Eq{
"team_id": teamID,
}.And(
builder.Or(
// Check if the user is member of the team.
builder.In(
"team_id", builder.Select("team_id").From("team_user").Where(
builder.Eq{
"uid": userID,
},
),
),
// Check if the user is in the owner team of the organisation.
builder.Exists(builder.Select("team_id").From("team_user").
Where(builder.Eq{
"org_id": orgID,
"team_id": builder.Select("id").From("team").Where(
builder.Eq{
"org_id": orgID,
"lower_name": strings.ToLower(organization.OwnerTeamName),
}),
"uid": userID,
}),
),
)).And(
builder.In(
"team_id", builder.Select("team_id").From("team_unit").Where(
builder.Eq{
"`team_unit`.org_id": orgID,
}.And(
builder.In("`team_unit`.type", units),
),
),
),
),
))
}
// issuePullAccessibleRepoCond userID must not be zero, this condition require join repository table
func issuePullAccessibleRepoCond(repoIDstr string, userID int64, org *organization.Organization, team *organization.Team, isPull bool) builder.Cond {
cond := builder.NewCond()
unitType := unit.TypeIssues
if isPull {
unitType = unit.TypePullRequests
}
if org != nil {
if team != nil {
cond = cond.And(teamUnitsRepoCond(repoIDstr, userID, org.ID, team.ID, unitType)) // special team member repos
} else {
cond = cond.And(
builder.Or(
repo_model.UserOrgUnitRepoCond(repoIDstr, userID, org.ID, unitType), // team member repos
repo_model.UserOrgPublicUnitRepoCond(userID, org.ID), // user org public non-member repos, TODO: check repo has issues
),
)
}
} else {
cond = cond.And(
builder.Or(
repo_model.UserOwnedRepoCond(userID), // owned repos
repo_model.UserAccessRepoCond(repoIDstr, userID), // user can access repo in a unit independent way
repo_model.UserAssignedRepoCond(repoIDstr, userID), // user has been assigned accessible public repos
repo_model.UserMentionedRepoCond(repoIDstr, userID), // user has been mentioned accessible public repos
repo_model.UserCreateIssueRepoCond(repoIDstr, userID, isPull), // user has created issue/pr accessible public repos
),
)
}
return cond
}
func applyAssigneeCondition(sess *xorm.Session, assigneeID int64) {
sess.Join("INNER", "issue_assignees", "issue.id = issue_assignees.issue_id").
And("issue_assignees.assignee_id = ?", assigneeID)
}
func applyPosterCondition(sess *xorm.Session, posterID int64) {
sess.And("issue.poster_id=?", posterID)
}
func applyMentionedCondition(sess *xorm.Session, mentionedID int64) {
sess.Join("INNER", "issue_user", "issue.id = issue_user.issue_id").
And("issue_user.is_mentioned = ?", true).
And("issue_user.uid = ?", mentionedID)
}
func applyReviewRequestedCondition(sess *xorm.Session, reviewRequestedID int64) {
existInTeamQuery := builder.Select("team_user.team_id").
From("team_user").
Where(builder.Eq{"team_user.uid": reviewRequestedID})
// if the review is approved or rejected, it should not be shown in the review requested list
maxReview := builder.Select("MAX(r.id)").
From("review as r").
Where(builder.In("r.type", []ReviewType{ReviewTypeApprove, ReviewTypeReject, ReviewTypeRequest})).
GroupBy("r.issue_id, r.reviewer_id, r.reviewer_team_id")
subQuery := builder.Select("review.issue_id").
From("review").
Where(builder.And(
builder.Eq{"review.type": ReviewTypeRequest},
builder.Or(
builder.Eq{"review.reviewer_id": reviewRequestedID},
builder.In("review.reviewer_team_id", existInTeamQuery),
),
builder.In("review.id", maxReview),
))
sess.Where("issue.poster_id <> ?", reviewRequestedID).
And(builder.In("issue.id", subQuery))
}
func applyReviewedCondition(sess *xorm.Session, reviewedID int64) {
// Query for pull requests where you are a reviewer or commenter, excluding
// any pull requests already returned by the review requested filter.
notPoster := builder.Neq{"issue.poster_id": reviewedID}
reviewed := builder.In("issue.id", builder.
Select("issue_id").
From("review").
Where(builder.And(
builder.Neq{"type": ReviewTypeRequest},
builder.Or(
builder.Eq{"reviewer_id": reviewedID},
builder.In("reviewer_team_id", builder.
Select("team_id").
From("team_user").
Where(builder.Eq{"uid": reviewedID}),
),
),
)),
)
commented := builder.In("issue.id", builder.
Select("issue_id").
From("comment").
Where(builder.And(
builder.Eq{"poster_id": reviewedID},
builder.In("type", CommentTypeComment, CommentTypeCode, CommentTypeReview),
)),
)
sess.And(notPoster, builder.Or(reviewed, commented))
}
func applySubscribedCondition(sess *xorm.Session, subscriberID int64) {
sess.And(
builder.
NotIn("issue.id",
builder.Select("issue_id").
From("issue_watch").
Where(builder.Eq{"is_watching": false, "user_id": subscriberID}),
),
).And(
builder.Or(
builder.In("issue.id", builder.
Select("issue_id").
From("issue_watch").
Where(builder.Eq{"is_watching": true, "user_id": subscriberID}),
),
builder.In("issue.id", builder.
Select("issue_id").
From("comment").
Where(builder.Eq{"poster_id": subscriberID}),
),
builder.Eq{"issue.poster_id": subscriberID},
builder.In("issue.repo_id", builder.
Select("id").
From("watch").
Where(builder.And(builder.Eq{"user_id": subscriberID},
builder.In("mode", repo_model.WatchModeNormal, repo_model.WatchModeAuto))),
),
),
)
}
// Issues returns a list of issues by given conditions.
func Issues(ctx context.Context, opts *IssuesOptions) (IssueList, error) {
sess := db.GetEngine(ctx).
Join("INNER", "repository", "`issue`.repo_id = `repository`.id")
applyLimit(sess, opts)
applyConditions(sess, opts)
applySorts(sess, opts.SortType, opts.PriorityRepoID)
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
issues := IssueList{}
if err := sess.Find(&issues); err != nil {
return nil, fmt.Errorf("unable to query Issues: %w", err)
}
if err := issues.LoadAttributes(ctx); err != nil {
return nil, fmt.Errorf("unable to LoadAttributes for Issues: %w", err)
}
return issues, nil
}
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
// IssueIDs returns a list of issue ids by given conditions.
func IssueIDs(ctx context.Context, opts *IssuesOptions, otherConds ...builder.Cond) ([]int64, int64, error) {
sess := db.GetEngine(ctx).
Join("INNER", "repository", "`issue`.repo_id = `repository`.id")
applyConditions(sess, opts)
for _, cond := range otherConds {
sess.And(cond)
}
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
applyLimit(sess, opts)
applySorts(sess, opts.SortType, opts.PriorityRepoID)
var res []int64
total, err := sess.Select("`issue`.id").Table(&Issue{}).FindAndCount(&res)
if err != nil {
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
return nil, 0, err
}
Refactor and enhance issue indexer to support both searching, filtering and paging (#26012) Fix #24662. Replace #24822 and #25708 (although it has been merged) ## Background In the past, Gitea supported issue searching with a keyword and conditions in a less efficient way. It worked by searching for issues with the keyword and obtaining limited IDs (as it is heavy to get all) on the indexer (bleve/elasticsearch/meilisearch), and then querying with conditions on the database to find a subset of the found IDs. This is why the results could be incomplete. To solve this issue, we need to store all fields that could be used as conditions in the indexer and support both keyword and additional conditions when searching with the indexer. ## Major changes - Redefine `IndexerData` to include all fields that could be used as filter conditions. - Refactor `Search(ctx context.Context, kw string, repoIDs []int64, limit, start int, state string)` to `Search(ctx context.Context, options *SearchOptions)`, so it supports more conditions now. - Change the data type stored in `issueIndexerQueue`. Use `IndexerMetadata` instead of `IndexerData` in case the data has been updated while it is in the queue. This also reduces the storage size of the queue. - Enhance searching with Bleve/Elasticsearch/Meilisearch, make them fully support `SearchOptions`. Also, update the data versions. - Keep most logic of database indexer, but remove `issues.SearchIssueIDsByKeyword` in `models` to avoid confusion where is the entry point to search issues. - Start a Meilisearch instance to test it in unit tests. - Add unit tests with almost full coverage to test Bleve/Elasticsearch/Meilisearch indexer. --------- Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
2023-07-31 02:28:53 -04:00
return res, total, nil
}