• mspencer712@lemmy.fmhy.ml
    1 year ago

    Yep, Longest Common Subsequence matching is usually greedy, so what you get is the earliest set of lines that satisfies the search. That happens when you treat a file as a list of lines and only match those.
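
    To make the "earliest set of lines" point concrete, here is a rough sketch assuming a plain textbook dynamic-programming LCS over lines. Real tools differ (git, for example, defaults to Myers' algorithm), and `lcs_pairs` is a made-up name for illustration only.

```python
def lcs_pairs(a, b):
    """Return (i, j) index pairs for one longest common subsequence of a and b.

    Illustrative only: when several alignments are equally long,
    this walk always takes the earliest matching line.
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Greedy step: accept the first match we run into.
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


old = ["def a():", "    return 1", "", "def c():", "    return 1"]
new = ["def a():", "    return 1", "", "def b():", "    return 1", "",
       "def c():", "    return 1"]

matched_new = {j for _, j in lcs_pairs(old, new)}
added = [line for j, line in enumerate(new) if j not in matched_new]
```

    Here the blank line after `a()` could pair with either blank line in the new file. The greedy walk picks the first one, so `added` is `def b():`, its body, and the blank line after it. An equally long alignment would instead report the blank line before `def b():` as added. Nothing in a line-only match prefers one over the other.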

    You can get better results with more syntax or content awareness. Chunk into paragraphs or code blocks or functions, then sentences or statement lists, then lines, then words, etc. I think Beyond Compare can do this.
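
    A minimal sketch of the first level of that idea, assuming blank lines are a good enough stand-in for paragraph or function boundaries. This is not how Beyond Compare or any particular tool does it; `split_blocks` and `block_diff` are hypothetical names, and only Python's standard `difflib.SequenceMatcher` is used.

```python
import difflib


def split_blocks(text):
    """Split text on blank lines: a crude stand-in for paragraphs or functions."""
    return [block.strip("\n") for block in text.split("\n\n") if block.strip()]


def block_diff(old_text, new_text):
    """Compare whole blocks instead of single lines.

    Returns a list of ("-", block) and ("+", block) tuples.
    A fuller version would recurse into replaced blocks and diff
    them by line, then by word.
    """
    old_blocks = split_blocks(old_text)
    new_blocks = split_blocks(new_text)
    matcher = difflib.SequenceMatcher(None, old_blocks, new_blocks, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.extend(("-", block) for block in old_blocks[i1:i2])
        if tag in ("insert", "replace"):
            changes.extend(("+", block) for block in new_blocks[j1:j2])
    return changes


old_text = "def a():\n    return 1\n\ndef c():\n    return 1\n"
new_text = ("def a():\n    return 1\n\n"
            "def b():\n    return 1\n\n"
            "def c():\n    return 1\n")
changes = block_diff(old_text, new_text)
```

    Because each function is matched as a unit, the new `b()` comes out as a single added block rather than a run of lines that may straddle two functions.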

    • GroteStreet 🦘@aussie.zone
      1 year ago

      Oh that’s not uncommon in the industry. Especially when dealing with legacy code.

      Personal best was 40k lines in a file called misc.c containing all the global functions that don’t fit anywhere else.

      Runner-up was the one where each developer dumped their miscellaneous functions in their own file so they didn’t have to deal with merge conflicts. Which means we had x1.c, x2.c, x3.c … etc.

      • xedrak@kbin.social
        1 year ago

        Oh trust me, I know. Personal best is 20k lines in a Java file that served as the main control flow of the entire software. Just because it’s common doesn’t make me any less disgusted 😂

        Thankfully now I’m the asshole senior who gets to prevent this kind of stuff from happening in the first place. But like you said, that doesn’t help with legacy applications lol.

      • YMS@kbin.social
        1 year ago

        Best I can offer is a combined UI and logic class with 12,500 lines currently. It started out with fewer than 3,000 lines in the year 2000 (using the brand new Java 1.3), grew to 14,000 over time and survived our recent project-wide, year-long cleanup with only minor losses of code lines.

    • Dandroid@dandroid.app
      1 year ago

      I work for one of the mega corporations as a decently high level software engineer. My team’s job is to maintain legacy code. This is my life. 😞

      • GroteStreet 🦘@aussie.zone
        1 year ago

        Ah, a fellow member of the janitorial staff. Some of this shit has been there so long it’s seeped through the walls. There’s no way to get rid of it, short of demolishing the whole building.

    • aidan@lemmy.worldOP
      1 year ago

      You should see the Firefox source code; there are many files like that. Honestly it’s better than having 100,000 files, which is what would happen at the size of Firefox.

      • YMS@kbin.social
        1 year ago

        As someone who professionally works in a project with many, many thousands of files (I don’t know the exact number right now, but we’re coming close to 10 million lines of code), many of them thousands of lines long (see my other comment): No, longer files are not better than more files.

        • aidan@lemmy.worldOP
          1 year ago

          It depends. Obviously if stuff is unrelated then it should be in separate files, but having one folder with 1000 files, each containing a single function, would I think be very exhausting to search through to understand the code.

    • aidan@lemmy.worldOP
      1 year ago

      It’s okay, I’ve only used it for contributing to Firefox so I’m not that familiar.