Imported tags are case-sensitive, creating duplicates

Alex_Pasternak · November 18, 2017, 1:55am

Yes, they were typed manually in WF. That’s what I meant above by “Depending on place in a sentence”. WF treated these as the same tag:

#CodeReview needs to be done
Feature X has passed #codeReview!

This is just an example, I think for my real data it’s actually more common with single-word tags, like #demo or #discovery.

The problem of which one to choose is one I acknowledged above. I thought about lower-casing them all as well. I think separating how you store tags from how you display them is definitely a part of that.

Looking for the most frequent occurrence does sound very slow. That is why my suggestion is to use the first occurrence for display purposes. I’m pretty sure this is how WF solved it, and I think it’s actually reset each time the doc is indexed. Since tags were auto-completed from the current scope, you could have #Demo or #demo show up in the auto-complete depending on which occurred first; this seemed to work just fine.
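To make the “first occurrence sets display” idea concrete, here is a minimal sketch. It assumes nothing about how WF or DL actually index tags; `merge_tags` and the simple `#word` pattern are hypothetical and only illustrate the approach.

```python
import re

# Assumption: a tag is "#" followed by word characters, so trailing
# punctuation such as "!" is not part of the tag.
TAG_RE = re.compile(r"#(\w+)")

def merge_tags(lines):
    """Group tags case-insensitively; the first spelling seen is the display form."""
    display = {}  # case-folded tag -> first spelling encountered
    counts = {}   # case-folded tag -> total occurrences across all spellings
    for line in lines:
        for match in TAG_RE.finditer(line):
            tag = match.group(1)
            key = tag.casefold()
            display.setdefault(key, tag)  # later spellings never overwrite the first
            counts[key] = counts.get(key, 0) + 1
    return {display[key]: counts[key] for key in display}

notes = [
    "#CodeReview needs to be done",
    "Feature X has passed #codeReview!",
    "#demo today",
]
print(merge_tags(notes))  # {'CodeReview': 2, 'demo': 1}
```

This is a single pass over the text, so it avoids the cost of counting which spelling is most frequent before deciding what to display.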

But seriously, almost anything in DL that eliminated/combined the dupes would be fine with me as a solution. I know auto-completion is a standard response to data issues (e.g., with tag trailing punctuation), but DL supports import! If non-compliant data is not cleaned up somehow during import, or at least warned about, this leaves things in a non-ideal state. And supporting mixed case is a huge advantage in readability, which is why I see this issue.

Erica · November 19, 2017, 10:32pm

Sounds reasonable in general.

It’s possible to auto-complete tags from all documents though (it’s in the options), which would make it a tad more complicated.

Another thing: wouldn’t it annoy some users that they have no control over how the tag is displayed? They probably won’t understand that the first tag in the first document decides how it looks in the tag pane, and they’ll find out they can’t really do anything about it.

Or maybe they can just rename it to a consistent case? Do you think they would do that?

Alex_Pasternak · November 20, 2017, 2:47pm

@Erica, I’m glad you’re thinking in-depth about this implementation. I think there are lots of ways to expand this feature and make it more complicated, such as adding a preference to control display, allowing the user to rename tags for custom display and storing that, etc.

But from looking at my (rather large) list of real tags, I would be 100% fine with them being all lowercase. None are longer than a few words, and this would not lose their readability. The huge win would be combining dupe ones. I obviously don’t know about others’ data.

If you do go with the “1st usage sets display” approach, clicking the tag already shows you its instances (well, kind of), so a user would see right away where the top hit came from.

Another easy approach I can think of is a hover tooltip over the tag that shows the variations found. This is more informative without a lot of new UI. You could do this for both the lowercase and the 1st-use approach. E.g., I hover over #codecoverage and it shows me that it found #CodeCoverage, #codeCoverage. These could even have their own sub-counts, so the user could decide which one should “win” if they do manual cleanup (though I wouldn’t care about this).
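The tooltip idea could be backed by something like the sketch below. It is only an illustration: `tag_variants` and `tooltip` are made-up names, and the tag pattern is the same simplifying assumption of `#` plus word characters.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tag is "#" followed by word characters.
TAG_RE = re.compile(r"#(\w+)")

def tag_variants(lines):
    """Map each case-folded tag to a Counter of the exact spellings found."""
    variants = defaultdict(Counter)
    for line in lines:
        for tag in TAG_RE.findall(line):
            variants[tag.casefold()][tag] += 1
    return variants

def tooltip(variants, key):
    """Build tooltip text listing each spelling with its sub-count, most used first."""
    spellings = variants[key].most_common()
    return ", ".join(f"#{spelling} ({count})" for spelling, count in spellings)

notes = [
    "Raise #CodeCoverage on module A",
    "Check #codeCoverage report",
    "#CodeCoverage is at 80%",
]
print(tooltip(tag_variants(notes), "codecoverage"))
# #CodeCoverage (2), #codeCoverage (1)
```

The same per-spelling counts would also let a user pick which variation should “win” during manual cleanup.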

Alex_Pasternak · November 22, 2017, 5:37pm

Just to echo what I said on another thread, a good guideline on hashtags is “What would Twitter do?” There, tags and mentions are unambiguously case-insensitive.

Linking to #sharesomegreatnews still finds the common tag (just to pick a random trending one)
Twitter is not confused when I mention @eriCA instead of @Erica, and will go to that person

Erica · November 26, 2017, 2:45pm

Yeah, I agree. We never thought tags should be case-sensitive; it was partly due to laziness when implementing the tag pane. We went through the thought process above and decided to just show them separately.

So yeah, they should’ve been case-insensitive if we had sorted out this issue back then.

Alex_Pasternak · November 26, 2017, 5:01pm

Are you saying it’s too late now? It doesn’t seem so! And also, tag auto-completion seems an even more important area than tag pane for daily use.

Erica · November 27, 2017, 6:28am

No, that’s not what I meant. I meant that we knew tags should be case-insensitive from the beginning (we use Twitter too and we know how it should work), but because we didn’t sort out the issue we discussed here back then, it wasn’t properly handled. It’s an explanation for the status quo, if you want to think of it that way.

Morgan_Newall · August 8, 2018, 12:07pm

This is the exact same issue I’m still asking about (with no reply) here: “New Tag Pane feature seems to have broken autocomplete capitalization of previous tags” (the bug in that ticket was fixed, yet the issue described here was not).

Please update: what’s the progress and position on this issue at Dynalist HQ?

EDIT: For clarity, here’s the latest I added on other ticket:

Erica · August 10, 2018, 6:10pm

Sorry for the late reply. We have been in the middle of an office move recently, and only started replying to the piled-up forum threads in the past few days. The order is a little random, and I’m sorry yours hasn’t gotten a reply yet.

I’ll reply to the issue under your own post.

Morgan_Newall · August 10, 2018, 7:34pm

Please keep it here or merge them - this bug is the exact same issue (the other was not).

Erica · August 10, 2018, 7:40pm

“other was not”?

Would you prefer that I close the other one as a duplicate?

Morgan_Newall · August 11, 2018, 4:40am

Yes, the details are here; you already closed the other ticket.

Morgan_Newall · August 15, 2018, 4:30pm

On top of the details listed, search doesn’t even return only the exact-case tag; it returns ALL case-insensitive matches. So, cleaning up the mess of tags that I’ve needed to clean up for over a year now just got even more painful.

Can we please get a confirmation - is anything happening with this?
If so, ETA?

Erica · August 16, 2018, 3:26am

Do you mind telling me what you’re trying to clean up here? Are you trying to merge “#msg” and “#Msg”?

Morgan_Newall · August 16, 2018, 7:03am

That’s right, along with many other tags.

It occurred to me that a case-sensitive find and replace would also resolve this issue.
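For anyone cleaning up exported text by hand, a case-sensitive find and replace could look roughly like this. It is a sketch only and not an existing DL feature; `rename_tag` is a hypothetical helper, and it assumes tags consist of `#` plus word characters.

```python
import re

def rename_tag(text, old, new):
    """Replace the exact-case tag #old with #new.

    Other casings of the tag and longer tags that merely start with
    `old` (e.g. #Msgs) are left untouched.
    """
    # re is case-sensitive by default; (?!\w) stops the match from
    # swallowing the front of a longer tag.
    pattern = re.compile("#" + re.escape(old) + r"(?!\w)")
    # A function replacement avoids treating backslashes in `new` as escapes.
    return pattern.sub(lambda m: "#" + new, text)

text = "#Msg sent, #msg read, #Msgs archived"
print(rename_tag(text, "Msg", "msg"))
# #msg sent, #msg read, #Msgs archived
```

Running it once per unwanted spelling would fold every variation into the one you want to keep.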

As an aside, not a DL issue, any suggestions on deleting duplicates?

Morgan_Newall · August 16, 2018, 1:43pm

Am I missing something? Is there existing merge functionality?

Erica · August 16, 2018, 2:47pm

Something like that: for tags under “Document tags” (not for “All tags” yet), you can right-click, select “Rename”, and rename it to the same tag with different casing.

I tried renaming “#Msg” to “#msg” and it seems to work.

Morgan_Newall · August 16, 2018, 3:00pm

Right, I didn’t know that, but mine are spread over multiple lists.

That does cut down some time though, thanks.

Any updates in other regards, perchance?

David_Eiffert · December 28, 2019, 5:06am

Hi there! New pro user here. Any update on this? I tag keywords inside sentences, and sometimes the tagged word is at the beginning of the sentence so it’s capitalized - “Tag” vs “tag”.
I know I can rename the tag to make them all lower case, but then those sentences will begin with a lower case letter, which is problematic when exporting.

Shida · December 28, 2019, 4:28pm

I believe the biggest difficulty we encountered in the past (if you read the discussion above) is that we’re unsure how to display the merged tags.

For example, if you have #someTag, #someTAG, #SOMETAG, and #sometag, which one should be shown in the tag pane?