Sysmo DB
SYSMO-1590

Indexing is missing some terms in an Excel file, possibly due to parentheses

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.21
    • Fix Version/s: unscheduled
    • Labels: None

      Description

      Olga reported that searching for "Monoisotopic" did not return the expected result, https://seek.sysmo-db.org/data_files/1057, which contains that term.

      The file does contain that text, but surrounded by parentheses ().
      It looks like the parentheses interfere with the indexing and searching of the text.
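To illustrate the suspected failure mode, here is a minimal Python sketch. It is not SEEK's indexing code; the two tokenizer functions and the sample cell text are hypothetical. It assumes an indexer that splits only on whitespace, so a parenthesised word is stored with its brackets attached and no longer matches the bare search term.

```python
import re


def whitespace_tokens(text):
    # Naive tokenizer: lowercases and splits on whitespace only, so any
    # surrounding punctuation stays attached to the word.
    return text.lower().split()


def word_tokens(text):
    # Alternative: keep only runs of ASCII letters and digits, which drops
    # parentheses and other surrounding punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


cell = "Mass (Monoisotopic)"  # hypothetical spreadsheet cell content

print("monoisotopic" in whitespace_tokens(cell))  # False
print("monoisotopic" in word_tokens(cell))        # True
```

Under these assumptions the first tokenizer indexes "(monoisotopic)", which explains a missed hit, while the second indexes the bare word.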

          Activity

          Stuart Owen added a comment -

          I'm re-opening this.
          Although the fix looks correct, it may have opened up another issue. Since this change was deployed, re-indexing on SEEK regularly gets stuck and I have to kick-start it. I suspect it is getting stuck on some data files, but the code that should trigger a timeout after 15 minutes doesn't seem to be firing.
          This needs looking into.
          A short-term fix could be to reduce the maximum size of data file that gets indexed.
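A rough Python sketch of the two safeguards mentioned in this comment, for discussion only. It is not SEEK's code: the function names and the 1 MB limit are hypothetical, and running the text extraction as a separate process is an assumption chosen here because a child process can be killed when the deadline passes. Only the 15-minute figure comes from the comment.

```python
import subprocess
import sys

MAX_INDEXABLE_BYTES = 1 * 1024 * 1024  # hypothetical size limit (1 MB)
EXTRACT_TIMEOUT_SECONDS = 15 * 60      # the 15 minutes mentioned above


def should_index(size_bytes, max_bytes=MAX_INDEXABLE_BYTES):
    # Short-term fix: skip content extraction for files above the limit.
    return size_bytes <= max_bytes


def run_extractor(command, timeout_seconds=EXTRACT_TIMEOUT_SECONDS):
    # Run a (hypothetical) extraction command as a child process.
    # subprocess.run kills the child if the timeout expires, so a stuck
    # file cannot block the indexing queue indefinitely.
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return None  # caller can log the file and move on
    return result.stdout


if __name__ == "__main__":
    print(should_index(5 * 1024 * 1024))  # False
    fast = [sys.executable, "-c", "print('extracted text')"]
    print(run_extractor(fast, timeout_seconds=10))
```

A `None` return marks a file that exceeded the deadline, which would also give a list of the data files that cause the re-indexing to hang.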

          Stuart Owen added a comment (edited) -

          I needed to back out the change to the tokenizer, as it broke searching by DOI.
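The comment does not say how the tokenizer change broke DOI search. One plausible mechanism, assumed here purely for illustration, is that a tokenizer which discards punctuation also splits a DOI at its dot and slash. The Python sketch below is not SEEK's tokenizer; the function names, the simplified DOI pattern, and the DOI itself are made up. It shows the split and one possible workaround: pull DOI-like strings out first, then tokenize the rest.

```python
import re

# Simplified, assumed shape of a DOI: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def word_tokens(text):
    # Punctuation-stripping tokenizer: keeps only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def doi_aware_tokens(text):
    # Keep DOI-like strings whole, then tokenize what is left.
    lowered = text.lower()
    dois = [d.rstrip(").,;") for d in DOI_PATTERN.findall(lowered)]
    remainder = DOI_PATTERN.sub(" ", lowered)
    return dois + word_tokens(remainder)


doi = "10.1000/xyz123"  # made-up DOI for illustration

print(word_tokens(doi))
# ['10', '1000', 'xyz123']
print(doi_aware_tokens("See 10.1000/xyz123 (Monoisotopic)"))
# ['10.1000/xyz123', 'see', 'monoisotopic']
```

With this approach a search for the full DOI still matches a single token, while parenthesised words are indexed without their brackets.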

            People

            • Assignee: Quyen Nguyen
            • Reporter: Stuart Owen
            • Votes: 0
            • Watchers: 1

                Time Tracking

                Estimated: 3h
                Remaining: 3h
                Logged: Not Specified