Imported tags are case-sensitive, creating duplicates

Steps to reproduce

  1. Import data with tags of varying case (e.g., from Workflowy), or:
  2. Create items with a mix of #someTag vs. #SomeTag
  3. Observe how both now show up in auto-completion and tag pane, as if they are different things.

Expected result

Only one instance of each tag should show up (probably the first one found), with no duplicates.

Actual result

Many duplicate tags now show up in auto-completion and tag pane.

Environment

Desktop browser, not platform-specific.


Additional comments

For search, searching for #someTag also finds/highlights instances of #SomeTag, so case sensitivity is not consistently enforced.

WF indexed tags case-insensitively, so I have a large mix of imported data. Cleaning it all up is difficult because there’s no search/replace. Depending on place in a sentence, I have #Demo and #demo, for example, but it’s the same tag! Now I cannot click on a single tag to find all its instances, and auto-completion is cluttered.

I propose that all tags are indexed lower-case, and that the form displayed is the first instance found in the document (since the tag list is updated all the time with changes anyway).
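To make the proposal concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not how Dynalist or WF actually index tags: the `TAG_RE` pattern and the `build_tag_index` name are hypothetical.

```python
import re

# Hypothetical tag pattern: '#' followed by word characters or hyphens.
TAG_RE = re.compile(r"#[\w-]+")


def build_tag_index(items):
    """Map each lower-cased tag to the first spelling seen in document order."""
    display = {}
    for text in items:
        for tag in TAG_RE.findall(text):
            # The lower-cased form is the index key; setdefault keeps the
            # first spelling and ignores later case variants.
            display.setdefault(tag.lower(), tag)
    return display


items = [
    "Show the #Demo to the team",
    "Recorded a #demo yesterday",
    "#someTag vs. #SomeTag",
]
print(build_tag_index(items))
# {'#demo': '#Demo', '#sometag': '#someTag'}
```

Rebuilding the index whenever the document changes would let the displayed spelling follow whichever instance currently comes first.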

Hopefully no one should be relying on current behavior. Previous posts, such as this one, never mentioned relying on case sensitivity.

The problem is more complicated with tag pane. For example, if you have #someTag, #someTAG, #SOMETAG, and #sometag, which one should be shown in the tag pane?

The easiest and most straightforward answer would be to always use all lowercase (thus “#sometag”), but that hurts readability for many tags.

Or we could count all the spellings of the same tag (case-insensitive) and try to decide which is the most common one, but from our last internal discussion on this, it adds so much complexity to our code that it’s not really worth the benefit (we could be wrong on this).

I know we cannot expect users to always create tags using the built-in autocomplete, but I’m kinda curious why these tags exist in the first place. If WorkFlowy enforced only one spelling, why are there duplicates? Were they typed in manually?

Yes, they were typed manually in WF. That’s what I meant above by “Depending on place in a sentence”. WF treated these as the same tag:

  • #CodeReview needs to be done
  • Feature X has passed #codeReview!

This is just an example; I think for my real data it’s actually more common with single-word tags, like #demo or #discovery.

The problem of which one to choose is one I acknowledged above. I thought about lower-casing all as well. I think separating how you store tags vs. display is definitely a part of that.

Looking for the most common occurrence does sound very slow. That is why my suggestion is to use the first occurrence for display purposes. I’m pretty sure this is how WF solved it, and I think it’s actually reset each time the doc is indexed. Since tags were auto-completed from the current scope, you could have #Demo or #demo show up in the auto-complete depending on which occurred first; this seemed to work just fine.

But seriously, almost anything in DL that eliminated/combined the dupes would be fine with me as a solution. I know auto-completion is a standard response to data issues (e.g., with tag trailing punctuation), but DL supports import! If non-compliant data is not cleaned up somehow during import, or at least warned about, this leaves things in a non-ideal state. And supporting mixed case is a huge advantage in readability, which is why I see this issue.

Sounds reasonable in general.

It’s possible to auto-complete tags from all documents though (it’s in the options), which would make it a tad more complicated.

Another thing: wouldn’t it annoy some users that they have no control over how the tag is displayed? They probably won’t understand that the first tag in the first document decides how it looks in the tag pane, and will find out they can’t really do anything about it.

Or maybe they can just rename it to a consistent case? Do you think they would do that?

@Erica, I’m glad you’re thinking in-depth about this implementation. I think there are lots of ways to explode this feature to make it more complicated, such as allowing some preference to control display, allowing users to rename tags for custom display and storing that, etc.

But from looking at my (rather large) list of real tags, I would be 100% fine with them being all lowercase. None are longer than a few words, and lowercasing would not hurt their readability. The huge win would be combining the dupes. I obviously don’t know about others’ data.

If you do go with the “1st usage sets display” approach, clicking the tag already shows you its instances (well, kind of), so a user would see right away where the top hit came from.

Another easy approach I can think of is a hover tooltip over the tag that shows found variations. This is more informative without a lot of new UI. You could do this both for lowercase and 1st use approach. E.g., I hover over #codecoverage and it shows me that it found #CodeCoverage, #codeCoverage. These could even have their own sub-counts, so the user could decide which one should “win” if they do manual cleanup (though I wouldn’t care about this).
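Roughly, the tooltip data could be collected like this. This is a sketch in Python under my own assumptions, not anything Dynalist ships: `TAG_RE`, `tag_variants`, and `tooltip` are hypothetical names. The same counts would also answer the earlier "most common spelling" question.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tag pattern: '#' followed by word characters or hyphens.
TAG_RE = re.compile(r"#[\w-]+")


def tag_variants(items):
    """Group spellings under their lower-cased tag and count each spelling."""
    variants = defaultdict(Counter)
    for text in items:
        for tag in TAG_RE.findall(text):
            variants[tag.lower()][tag] += 1
    return variants


def tooltip(variants, tag):
    """List the spellings found for a tag, most frequent first, with counts."""
    counts = variants.get(tag.lower(), Counter())
    return ", ".join(f"{spelling} ({n})" for spelling, n in counts.most_common())


items = [
    "#CodeCoverage is too low",
    "Raise #codeCoverage this sprint",
    "#CodeCoverage report is out",
]
print(tooltip(tag_variants(items), "#codecoverage"))
# #CodeCoverage (2), #codeCoverage (1)
```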

Just to echo what I said on another thread, a good guideline on hashtags is “What would Twitter do?” There, tags and mentions are unambiguously case-insensitive.

  • Linking to #sharesomegreatnews still finds the common tag (just to pick a random trending one)
  • Twitter is not confused when I mention @eriCA instead of @Erica, and will go to that person

Yeah I agree, we never thought tags should be case sensitive; it was partly out of laziness when implementing the tag pane. We went through the thought process above and decided to just show them separately.

So yeah, they should’ve been case-insensitive if we had sorted out this issue back then.

Are you saying it’s too late now? It doesn’t seem so! And also, tag auto-completion seems an even more important area than tag pane for daily use.

No, that’s not what I meant. I meant that we knew tags should be case insensitive from the beginning (we use Twitter too and we know how it should work), but because we didn’t sort out the issue we discussed here back then, it wasn’t properly handled. It’s an explanation for the status quo, if you want to think of it that way.

Exact same issue I’m still asking about (with no reply), here: New Tag Pane feature seems to have broken autocomplete capitalization of previous tags (Bug on ticket fixed, yet the same issue as here, was not).

Pls update - what’s progress and position on this issue at Dynalist HQ?

EDIT: For clarity, here’s the latest I added on other ticket:

Sorry for the late reply, we have been in the middle of an office move recently, and started replying to the piled up forum threads in the past few days. The order is a little random, and I’m sorry it hasn’t gotten a reply yet.

I’ll reply to the issue under your own post.

Pls keep it here/merge - this bug is the exact same issue (other was not)

“other was not”?

Would you prefer that I close the other one as a duplicate?

Yes, detail here, you already closed other ticket.

On top of the details listed, the search results don’t even return the Case-Sensitive tags, they return ALL case-insensitive tags. So, cleaning up the mess of tags that I’ve needed to clean up for over a year now just got even more painful :frowning: .

Can we please get a confirmation - is anything happening with this?
If so, eta?

Do you mind telling me what you’re trying to clean up here? Are you trying to merge “#msg” and “#Msg”?

That’s right, along with many other tags.

It occurred to me that a case-sensitive find and replace would also resolve this issue.
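As a rough illustration of what a case-sensitive find and replace on tags would need to do, here is a small Python sketch. It is not a Dynalist feature; `rename_tag` and the rule for where a tag ends are assumptions of mine.

```python
import re


def rename_tag(text, old, new):
    """Replace exact-case occurrences of the tag `old` with `new`."""
    # The negative lookahead keeps '#Msg' from matching inside a longer
    # tag such as '#MsgLog'. No IGNORECASE flag, so the match is case-sensitive.
    pattern = re.compile(re.escape(old) + r"(?![\w-])")
    # A function replacement inserts `new` literally, with no escape handling.
    return pattern.sub(lambda match: new, text)


line = "#Msg sent, #msg read, #MsgLog kept"
print(rename_tag(line, "#Msg", "#msg"))
# #msg sent, #msg read, #MsgLog kept
```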

As an aside, not a DL issue, any suggestions on deleting duplicates?

Am I missing something, is there existing merge functionality?

Something like that – for tags under “Document tags” (not for “All tags” yet), you can right click, select “Rename” and rename it to another case-sensitive tag.

I tried renaming “#Msg” to “#msg” and it seems to work.

Right, didn’t know, yet mine are over multiple lists.

That does cut down some time though, thanks :+1:

Any updates in other regards, perchance? :slightly_smiling_face: