• 12 Posts
  • 19 Comments
Joined 1 year ago
Cake day: June 14th, 2023

  • Now instead of just querying the goddamn database, a one line fucking SQL statement, I have to deal with the user team

    Exactly, you understand very well the purpose of microservices. You can submit a patch if you need that feature now.

    Funnily enough I’m the technical lead of the team that handles the user service in an insurance company.

    Because other teams were accessing our data directly without consulting us, we ran into legal issues: people were using raw addresses to guess where customers lived instead of going through our endpoints.

    I guess some people really hate the validation that service layers have.
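    To make the point concrete, here is a minimal sketch of what a service-layer accessor buys you over a raw SQL query. All names and the clearance rule are invented for illustration, not taken from any real insurance system:

```python
# Hypothetical user-service accessor. A raw SQL SELECT would expose the
# street address to every caller; the endpoint enforces policy instead.
RAW_USERS = {42: {"name": "Ada", "street": "12 Rue X", "city": "Lyon"}}

def get_user_location(user_id, caller_has_address_clearance=False):
    """Endpoint-style accessor: hides street-level data unless authorized."""
    user = RAW_USERS[user_id]
    if caller_has_address_clearance:
        return {"street": user["street"], "city": user["city"]}
    return {"city": user["city"]}  # coarse, legally safer data only

print(get_user_location(42))  # {'city': 'Lyon'}
```

    A one-line SQL statement is faster to write, but every policy like this one then has to be re-implemented (or forgotten) by every consumer of the table.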


  • It does seem odd that scraping activity from just two accounts allegedly managed to cause such an extended server outage. The irony of this situation also hasn’t been lost on online creatives, who have extensively criticized both companies (and generative AI systems in general) for training their models on masses of online data scraped from their works without consent. Stable Diffusion and Midjourney have both been targeted with several copyright lawsuits, with the latter being accused of creating an artist database for training purposes in December.

    As far as I know, they do not have copyright over the output of their models. Apart from banning the users, they pretty much have no way to stop this. And even if they did hold copyright, it's still legally unsettled whether training LLMs constitutes a copyright violation.

    In a similar fashion, a lot of the recent chat LLMs have been trained on output from ChatGPT. After all, why pay humans to produce training data when your competitor has already done it for you?

  • These models do not see letters but tokens. For the model, “violet” is probably two tokens, viol and et. Short of memorizing the number of letters in each token, there is no way for the model to know the number of letters in a word.

    This is also why the GPT family sucks at addition: its tokenizer has single symbols for common numbers like 14. This means that to compute 14 + 1 it cannot reuse the knowledge that 4 + 1 = 5, because it cannot see the link between the token 4 and the token 14. The Llama tokenizer fixes this by splitting numbers into individual digits, and is thus much better at basic arithmetic even with much smaller models.
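    A toy greedy longest-match tokenizer makes both effects visible. The vocabulary below is invented for illustration (it is not GPT's actual merge table), but the mechanics are the same: once “viol” or “14” is a single token, the letters and digits inside it are gone:

```python
# Toy longest-match tokenizer -- a simplified stand-in for BPE.
# VOCAB is a made-up illustration, not a real model's vocabulary.
VOCAB = {"viol", "et", "14", "1", "4", "5", "+", " "}

def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("violet", VOCAB))           # ['viol', 'et'] -- 6 letters, 2 tokens
print(tokenize("14 + 1", VOCAB))           # ['14', ' ', '+', ' ', '1']
# Llama-style digit splitting: drop the multi-digit token from the vocab
print(tokenize("14 + 1", VOCAB - {"14"}))  # ['1', '4', ' ', '+', ' ', '1']
```

    With digit splitting, the “4” inside 14 is the same token as a standalone 4, so digit-level arithmetic patterns can transfer.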


  • For folks who aren’t sure how to interpret this, what we’re looking at here is early work establishing an upper bound on the complexity of a problem that a model can handle based on its size. Research like this is absolutely essential for determining whether these absurdly large models are actually going to achieve the results people have already ascribed to them on any sort of consistent basis. Previous work on monosemanticity and superposition are relevant here, particularly with regards to unpacking where and when these errors will occur.

    I’m not sure this work accomplishes that. Sure, it builds on previous work showing that a transformer can be simulated by a TC0 circuit family. However, the limits of this fact are not clear. The paper even admits as much:

    Our result on the limitations of T-LLMs as general learners comes from Proposition 1 and Theorem 2. On the one hand, T-LLMs are within the TC0 complexity family; on the other hand, general learners require at least as hard as P/poly-complete. In the field of circuit theory, it is known that TC0 is a subset of P/poly and commonly believed that TC0 is a strict subset of P/poly, though the strictness is still an open problem to be proved.

    I believe this is one of the weakest points of the paper, as it bases all of its reasoning on an unproven conjecture. And you can implement many things with a TC0 circuit: addition, multiplication, basic logic; heck, you can even build transformers.
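    For intuition on why addition fits in such a shallow circuit family, here is a sketch (my own, not from the paper) of binary addition with flat carry formulas. Each carry is a single big OR of ANDs over generate/propagate signals, i.e. conceptually constant depth with unbounded fan-in; Python just evaluates those formulas sequentially:

```python
# Each carry c[i+1] = OR over j<=i of ( g[j] AND p[j+1] AND ... AND p[i] ),
# a flat formula needing no ripple -- the textbook reason addition is in TC0
# (in fact AC0). Bit lists are little-endian.

def add_flat_carry(a_bits, b_bits):
    """Add two equal-length little-endian bit lists via flat carry formulas."""
    n = len(a_bits)
    g = [a & b for a, b in zip(a_bits, b_bits)]  # carry generated at i
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # carry propagated at i
    c = [0] * (n + 1)
    for i in range(n):
        c[i + 1] = int(any(g[j] and all(p[j + 1:i + 1]) for j in range(i + 1)))
    return [p[i] ^ c[i] for i in range(n)] + [c[n]]

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bs):
    return sum(b << i for i, b in enumerate(bs))

print(from_bits(add_flat_carry(to_bits(14, 8), to_bits(1, 8))))  # 15
```

    So showing that transformers live in TC0 doesn’t by itself make them weak; the class already contains plenty of useful computation.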

    There is still something that bothers me: why did the paper define a general learner as (at least) a universal circuit for the set of all circuits of polynomial size? Why this restriction? I tried googling “general learner” and “universal circuit” and only came up with this paper.

    While searching, I found that this paper was rejected; you can find the reviews here: https://openreview.net/forum?id=e5lR6tySR7

    If you are looking for a paper on the limits of T-LLMs, “What Algorithms can Transformers Learn? A Study in Length Generalization” may prove more informative: https://arxiv.org/pdf/2310.16028.pdf It explains why transformers are so bad at addition.

    Here is the key part of their abstract:

    Specifically, we leverage RASP (Weiss et al., 2021)— a programming language designed for the computational model of a Transformer— and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths.

  • If you’re looking for the mathematical side, I’d recommend The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition). That said, it’s quite old school: you won’t find any information on why the newer techniques work (especially deep learning). Still, if you want to understand bootstrapping, bias-variance decomposition, or the curse of dimensionality, I’d say it’s one of the best books.

    I’ll also share the recommended readings of the different EPFL courses I took for my master’s degree:

    • Linear algebra and learning from data
    • The Elements of Statistical Learning: Data Mining, Inference, and Prediction / Friedman
    • Understanding Machine Learning / Shalev-Shwartz
    • Neural Networks and Deep Learning / Nielsen
    • Machine Learning: A Probabilistic Perspective / Murphy
    • Pattern Recognition and Machine Learning / Bishop
    • Reinforcement Learning / Sutton
    • Deep Learning / Goodfellow