[Feature Request]: Improving search #325
Comments
I agree with your suggestions on this, however there are already several open issues and PRs pertaining to these features.
A better-written issue comes late though, as often happens... There were some discussions in Discord as well. Maybe some of that needs to be summarized in GitHub somehow. I don't like the accidental nature of PRs for such a fundamental feature, without settling on the design first. (And I'm not speaking about UI.) I'd probably go as far as suggesting we write an (E)BNF description of the search syntax before attempting to implement it. And there are other considerations as well, such as familiarity to a random user, syntax extensibility, and compatibility with possible UI helpers.
yes
I was actually thinking about this - in all honesty, I might make a PR of a markdown file detailing search syntax. Though, I still have not used TagStudio enough to be confident in it. If we want to do this, one thing I would like is a general consensus on reserved characters and such, though we could just update it before merging it.
perhaps something like this:
Search syntax
This section describes the (planned) search syntax used in TagStudio.
General structure
Searches are parsed from the inner-most group outward, then left to right. Eg,
Boolean operators
By default, search terms are in the AND mode. For example,
Searching fields
By default, search terms apply to tags. There are a few special exceptions:
Common field attributes:
Fields have attributes too - searching them is commonly done in these ways. Note that
Tag/field rules
Generally, tags with characters that are used for search (spaces,
Very nicely written @mm12. I have just one big problem with your suggestion.
A lot of the suggested forbidden characters are super useful and popular in tag names and shouldn't have to be escaped. Some notable tag examples from Danbooru: The only restrictions I support are the following:
And if we want to allow wildcards in tag searches:
Also, can you please clarify what you meant when you said this?
And I don't understand what you meant when you said this:
Good point. To clarify, I didn't mean they should always be escaped, just that they often will, in some contexts. Emote tags (
All the functionality that the current tag system supports can be described by the spec (from my understanding). In the case where the system is upgraded,
I will edit the comment to clarify these.
…StudioDev#272 and TagStudioDev#325) Adds ability to check the existence of fields of any type using the following syntax: ```has_<field>``` ```has_<field>:<True|False>``` Adds the ability to search the content of text_line and text_box fields using the following syntax: ```<field>:<text>``` (Replace whitespace with underscore _ in `text`) Updated test_search.py for new behavior.
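As a rough illustration of the syntax described in the commit message above, here is a minimal sketch of how a single search term could be classified. This is not the code from the PR: `FieldTerm` and `parse_field_term` are hypothetical names, accepting `true`/`false` case-insensitively is an assumption, and so is turning underscores in `<text>` back into spaces.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FieldTerm:
    """Hypothetical parsed form of one field-related search term."""
    field: str
    kind: str  # "has" for existence checks, "text" for content searches
    value: Union[bool, str]


def parse_field_term(term: str) -> Optional[FieldTerm]:
    """Classify a term as has_<field>, has_<field>:<True|False> or <field>:<text>.

    Returns None for anything else, which a caller would treat as a plain tag.
    """
    name, sep, value = term.partition(":")
    if name.startswith("has_") and len(name) > len("has_"):
        field = name[len("has_"):]
        if not sep:
            # Bare has_<field> means "the field exists".
            return FieldTerm(field, "has", True)
        if value.lower() in ("true", "false"):  # assumption: case-insensitive
            return FieldTerm(field, "has", value.lower() == "true")
        raise ValueError(f"expected True or False after {name!r}, got {value!r}")
    if sep and name and value:
        # Assumption: underscores in the text stand in for whitespace.
        return FieldTerm(name, "text", value.replace("_", " "))
    return None


print(parse_field_term("has_title:False"))
print(parse_field_term("description:happy_new_year"))
```

One consequence of this shape is that a field literally named `has_something` cannot be searched by content without extra rules, which is the kind of ambiguity a written spec would need to settle.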
Hey @mm12, I just got done implementing some field search syntax in my PR #310, and I was hoping to get your input on it. I also have boolean operators implemented pretty much identically to your suggestions from previous commits, except all tags and parentheses need trailing whitespace in my syntax. I haven't implemented quotation marks, wildcards, or regular expressions in my syntax. I notice that you still say that unmatched parentheses and colons without spaces should be escaped in your current spec. In order to try to understand your reasoning, my current PR ignores these restrictions. If you clone my repository, try searching for
Though, what I am looking to do is make this application scalable in terms of entry count and search complexity. To address this, I have 2 suggestions that can be a starting point here:
Planned Features > Database Migration
Good point. I've no experience with them. If they come with their own query language, or the query language needs to be adapted somehow, it will be good to think about it now.
Preserving relevant discussion from Discord. Start of the discussion: https://discord.com/channels/1229183630228848661/1229309667528806420/1260037269297827931 THEHWIZ — 07/09/2024 4:58 AM
Sam L. — 07/09/2024 5:17 AM (reply to THEHWIZ)
CyanVoxel — 07/09/2024 5:49 AM
Killy — 07/09/2024 1:44 PM
Killy — 07/09/2024 1:55 PM
Killy — 07/09/2024 2:08 PM
Yoylo — 07/09/2024 2:56 PM
Killy — 07/09/2024 4:00 PM
gawi. — 07/09/2024 5:14 PM
Killy — 07/09/2024 6:41 PM
gawi. — 07/09/2024 11:04 PM
Sam L. — 07/10/2024 2:24 AM (reply to Killy)
Sam L. — 07/10/2024 2:28 AM (reply to gawi)
Killy — 07/10/2024 2:35 AM (reply to Sam L)
Gherkin — 07/10/2024 2:39 AM
Killy — 07/10/2024 2:40 AM
Gherkin — 07/10/2024 2:40 AM
Sam L. — 07/10/2024 2:41 AM (reply to Killy)
Killy — 07/10/2024 2:41 AM
Sam L. — 07/10/2024 2:42 AM (reply to Killy)
Killy — 07/10/2024 2:43 AM (reply to Sam L)
Gherkin — 07/10/2024 2:43 AM
NiX — 07/10/2024 2:44 AM (reply to Gherkin)
Sam L. — 07/10/2024 2:46 AM (reply to Gherkin)
NiX — 07/10/2024 3:10 AM (reply to Sam L)
Killy — 07/10/2024 3:12 AM
Gherkin — 07/10/2024 3:29 AM
NiX — 07/10/2024 3:33 AM (reply to Gherkin)
Sam L. — 07/10/2024 3:33 AM (reply to Killy)
NiX — 07/10/2024 3:34 AM (reply to NiX)
NiX — 07/10/2024 3:36 AM (reply to Sam L)
Gherkin — 07/10/2024 3:37 AM (reply to Sam L)
Killy — 07/10/2024 3:39 AM (reply to Sam L)
Sam L. — 07/10/2024 3:39 AM (reply to Gherkin)
NiX — 07/10/2024 3:42 AM (reply to Sam L)
Sam L. — 07/10/2024 3:42 AM (reply to Killy)
NiX — 07/10/2024 3:45 AM (reply to Sam L)
Killy — 07/10/2024 3:47 AM (reply to NiX)
Sam L. — 07/10/2024 3:48 AM (reply to Killy)
NiX — 07/10/2024 3:48 AM (reply to Killy)
Sam L. — 07/10/2024 3:49 AM
NiX — 07/10/2024 3:52 AM
Sam L. — 07/10/2024 3:53 AM (reply to NiX)
Killy — 07/10/2024 3:57 AM (reply to NiX)
NiX — 07/10/2024 3:58 AM (reply to Killy)
Killy — 07/10/2024 4:00 AM
NiX — 07/10/2024 4:02 AM
Killy — 07/10/2024 4:03 AM
Killy — 07/10/2024 4:49 AM
Killy — 07/10/2024 5:38 AM
Sam L. — 07/10/2024 6:56 AM (reply to Killy)
Killy — 07/10/2024 7:08 AM
Sam L. — 07/10/2024 7:53 AM (reply to Killy)
K — 07/10/2024 2:47 PM (reply to Killy)
Killy — 07/10/2024 2:49 PM (reply to Sam L)
Killy — 07/10/2024 3:19 PM
gawi. — 07/10/2024 6:15 PM
Killy — 07/10/2024 7:49 PM
Gherkin — 07/10/2024 10:55 PM
mister — 07/11/2024 4:01 PM
Killy — 07/11/2024 4:58 PM
Killy — 07/11/2024 5:25 PM
In #314 (comment) I'm contemplating support for Set Theory operations in the search query. Mathematically, a set containing A and B is written as As such, operator-less
I have been thinking a lot about how tags, fields, and field contents are identified. Currently, @mm12's suggestion has quotation marks used to facilitate a more literal representation of tag and field identifiers:
And @mm12's suggestion has a regular expression option for matching field contents:
My suggestion for string matching is this:
Use Cases
Tag identifiers, field identifiers, and field content would all use the exact same text matching system, except that field content would match possible substrings rather than needing to match the whole string like in the other two cases.
Delimiters
Surrounding an expression with a delimiter would allow users to include whitespace while giving an indicator for the syntax. Multiple delimiter options give users a way to avoid unnecessary escaping.
Different Syntaxes
Escaping
Escaping using backslash is actually kind of horrifying in this context, because there are three scenarios we would need to simultaneously accommodate with our system:
If we wanted to escape every backslash purely for the sake of simplifying the first two cases, then we are creating backslash hell for anyone who wants to use backslashes in regex syntax. And if we wanted to "pass through" backslashes to our syntax, we would have to do so selectively in order to accommodate the first two cases. Then we would suddenly have very different rules for strings of backslashes followed by delimiters compared to strings of backslashes standing on their own. For these reasons, I prefer a "padding" approach. The only rules the user needs to know are that all pairs of delimiter characters are reduced to a single character, and that if they don't escape a delimiter character, then it may end up being interpreted as the end of the string. (Specifically if it is followed by whitespace, the end of the search query, or, when not matching field content, a colon
Has Field
I'm doing away with the
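The "padding" rules from the Escaping part of this comment can be sketched roughly as follows. This is only an illustration under assumptions: `read_delimited` and its `allow_colon_end` flag are hypothetical, the delimiter is whatever character the string starts with, and treating a lone delimiter in any other position as a literal character is my reading of the rule rather than something the comment states outright.

```python
from typing import Tuple


def read_delimited(query: str, start: int = 0, allow_colon_end: bool = True) -> Tuple[str, int]:
    """Read one delimited string that begins at query[start].

    Returns the decoded text and the index just past the closing delimiter.
    Set allow_colon_end=False when matching field content, where a colon
    after the delimiter does not end the string.
    """
    delim = query[start]
    out = []
    i = start + 1
    n = len(query)
    while i < n:
        ch = query[i]
        if ch != delim:
            out.append(ch)
            i += 1
            continue
        # A doubled delimiter always collapses to one literal character.
        if i + 1 < n and query[i + 1] == delim:
            out.append(delim)
            i += 2
            continue
        nxt = query[i + 1] if i + 1 < n else ""
        # A single delimiter ends the string only before whitespace,
        # the end of the query, or (optionally) a colon.
        if nxt == "" or nxt.isspace() or (allow_colon_end and nxt == ":"):
            return "".join(out), i + 1
        # Assumption: anywhere else, a lone delimiter is kept literally.
        out.append(ch)
        i += 1
    raise ValueError("unterminated delimited string")


text, end = read_delimited('"say ""hi"" now" rest')
print(text, end)
```

Because backslashes are never special here, a regex such as `\d+` passes through untouched, which is the point of preferring padding over backslash escaping.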
Checklist
Description
I know this is somewhat on the roadmap, but I thought I would share some specifics of how search should be improved. It is very important that work on search functionality starts early, to make sure the system is developed in a way that supports the future implementation of features (i.e., we don't want to be in a situation where implementing a standard search feature would require major changes).
Solution
End goal: A fully featured search system. This could make use of Elastic and/or OpenSearch. Desirable qualities:
filename:<query> - this should be able to be used with any given type of field. Of course, by default, it is assumed to be a tag.
a happy new year - it becomes a_happy_new_year, but the underscores do not get shown to the user. This means that spaces and underscores are effectively the same.
"a happy new year" - however, this means that items with a quote in them will need to be escaped: a tag named "wow thats cool" would need to be searched with something like "\"wow thats cool\"" instead.
Alternatives
No response
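To make the space/underscore and quoting ideas from the Solution section concrete, here is a minimal sketch. It is not how TagStudio handles tags today: `canonical`, `display` and `split_terms` are hypothetical helpers, and honouring backslash escapes only inside double quotes is an assumption.

```python
from typing import List


def canonical(name: str) -> str:
    """Stored form of a tag name: spaces become underscores."""
    return name.replace(" ", "_")


def display(name: str) -> str:
    """Form shown to the user: underscores are rendered as spaces."""
    return name.replace("_", " ")


def split_terms(query: str) -> List[str]:
    """Split a query into canonical terms, keeping quoted phrases together."""
    terms: List[str] = []
    buf: List[str] = []
    in_quotes = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_quotes and ch == "\\" and i + 1 < len(query):
            # Inside quotes, a backslash makes the next character literal.
            buf.append(query[i + 1])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch.isspace() and not in_quotes:
            if buf:
                terms.append(canonical("".join(buf)))
                buf = []
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote in query")
    if buf:
        terms.append(canonical("".join(buf)))
    return terms


print(split_terms('"a happy new year" a_happy_new_year'))
print(split_terms(r'"\"wow thats cool\""'))
```

With this sketch the quoted and underscored spellings of the same tag produce the same term, and a tag whose name contains quotes is still reachable through the escaped form.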