Yes, they were typed manually in WF. That's what I meant above by "Depending on place in a sentence". WF treated these as the same tag:
- #CodeReview needs to be done
- Feature X has passed #codeReview!
This is just an example; in my real data I think it's actually more common with single-word tags, like #demo or #discovery.
The problem of which one to choose is one I acknowledged above. I thought about lower-casing everything as well. I think separating how you store tags from how you display them is definitely part of that.
Looking for the most frequent occurrence does sound very slow. That is why my suggestion is to use the first occurrence for display purposes. I'm pretty sure this is how WF solved it, and I think it's actually reset each time the doc is indexed. Since tags were auto-completed from the current scope, you could have #Demo or #demo show up in the auto-complete depending on which occurred first; this seemed to work just fine.
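To make the first-occurrence idea concrete, here is a minimal Python sketch. It is only an illustration of the approach, not how WF or DL actually implement it: the `index_tags` function, the `TAG_RE` pattern, and the choice of `casefold()` as the storage key are all my assumptions.

```python
import re

# Assumed tag syntax: '#' followed by word characters, so trailing
# punctuation such as '!' is not part of the tag.
TAG_RE = re.compile(r"#\w+")

def index_tags(text):
    """Map each case-folded tag (storage key) to the spelling of its
    first occurrence in the text (display form)."""
    display = {}
    for match in TAG_RE.finditer(text):
        tag = match.group(0)
        # setdefault keeps the first spelling seen and ignores later variants.
        display.setdefault(tag.casefold(), tag)
    return display

doc = "- #CodeReview needs to be done\n- Feature X has passed #codeReview!\n"
tags = index_tags(doc)
print(tags)
```

This is a single pass over the document with no counting, so rebuilding it on every re-index is cheap, and the display form simply follows whichever spelling comes first in the current scope.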
But seriously, almost anything in DL that eliminated/combined the dupes would be fine with me as a solution. I know auto-completion is a standard response to data issues (e.g., with tag trailing punctuation), but DL supports import! If non-compliant data is not cleaned up somehow during import, or at least warned about, this leaves things in a non-ideal state. And supporting mixed case is a huge advantage in readability, which is why I see this issue.
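For the import case, something along these lines would already cover it. This is a hypothetical sketch in Python, assuming tags look like `#word`; `normalize_on_import` is a made-up name and has nothing to do with DL's real importer. It rewrites each case variant to the first spelling seen and collects a warning for every merge.

```python
import re

# Assumed tag syntax: '#' followed by word characters.
TAG_RE = re.compile(r"#\w+")

def normalize_on_import(text):
    """Return (cleaned_text, warnings), where every case variant of a tag
    is replaced by the first spelling encountered in the text."""
    canonical = {}  # case-folded tag -> first spelling seen
    warnings = []

    def replace(match):
        tag = match.group(0)
        first = canonical.setdefault(tag.casefold(), tag)
        if tag != first:
            warnings.append(f"{tag} merged into {first}")
        return first

    return TAG_RE.sub(replace, text), warnings

cleaned, notes = normalize_on_import("#Demo today, #demo tomorrow.")
print(cleaned)
print(notes)
```

If rewriting the imported text feels too invasive, the same function could return only the warnings and leave the merging to the index.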