• 12 Posts
  • 19 Comments
Joined 1 year ago
Cake day: June 14th, 2023

  • Now instead of just querying the goddamn database, a one line fucking SQL statement, I have to deal with the user team

    Exactly, you understand very well the purpose of microservices. You can submit a patch if you need that feature now.

    Funnily enough I’m the technical lead of the team that handles the user service in an insurance company.

    Because other teams were accessing our data directly without consulting us, we ran into legal issues: people were using raw addresses to guess where customers lived instead of going through our endpoints.

    I guess some people really hate the validation that service layers have.
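    To make the point concrete, here is a minimal sketch of what a service-layer accessor buys you over a raw SQL query. All names and the clearance rule are invented for illustration, not taken from any real insurance system:

```python
# Hypothetical user-service accessor. A raw SQL SELECT would expose the
# street address to every caller; the endpoint enforces policy instead.
RAW_USERS = {42: {"name": "Ada", "street": "12 Rue X", "city": "Lyon"}}

def get_user_location(user_id, caller_has_address_clearance=False):
    """Endpoint-style accessor: hides street-level data unless authorized."""
    user = RAW_USERS[user_id]
    if caller_has_address_clearance:
        return {"street": user["street"], "city": user["city"]}
    return {"city": user["city"]}  # coarse, legally safer data only

print(get_user_location(42))  # {'city': 'Lyon'}
```

    A one-line SQL statement is faster to write, but every policy like this one then has to be re-implemented (or forgotten) by every consumer of the table.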


  • It does seem odd that scraping activity from just two accounts allegedly managed to cause such an extended server outage. The irony of this situation also hasn’t been lost on online creatives, who have extensively criticized both companies (and generative AI systems in general) for training their models on masses of online data scraped from their works without consent. Stable Diffusion and Midjourney have both been targeted with several copyright lawsuits, with the latter being accused of creating an artist database for training purposes in December.

    As far as I know, they do not have copyright over the output of their models. Apart from banning the users, they pretty much have no way to stop this. And even if they did hold copyright, it's still legally unsettled whether training LLMs constitutes a copyright violation.

    In a similar fashion, a lot of the recent chat LLMs have been trained on output from ChatGPT. After all, why pay humans to produce training data when your competitor has already done it for you?

  • These models do not see letters but tokens. For the model, “violet” is probably two tokens, viol and et. Short of memorizing the number of letters in each token, there is no way for the model to know the number of letters in a word.

    This is also why the GPT family sucks at addition: its tokenizer has single symbols for common numbers like 14. This means that to compute 14 + 1 it cannot reuse the knowledge that 4 + 1 = 5, because it cannot see the link between the token 4 and the token 14. The Llama tokenizer fixes this by splitting numbers into individual digits, and is thus much better at basic arithmetic even with much smaller models.
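    A toy greedy longest-match tokenizer makes both effects visible. The vocabulary below is invented for illustration (it is not GPT's actual merge table), but the mechanics are the same: once “viol” or “14” is a single token, the letters and digits inside it are gone:

```python
# Toy longest-match tokenizer -- a simplified stand-in for BPE.
# VOCAB is a made-up illustration, not a real model's vocabulary.
VOCAB = {"viol", "et", "14", "1", "4", "5", "+", " "}

def tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("violet", VOCAB))           # ['viol', 'et'] -- 6 letters, 2 tokens
print(tokenize("14 + 1", VOCAB))           # ['14', ' ', '+', ' ', '1']
# Llama-style digit splitting: drop the multi-digit token from the vocab
print(tokenize("14 + 1", VOCAB - {"14"}))  # ['1', '4', ' ', '+', ' ', '1']
```

    With digit splitting, the “4” inside 14 is the same token as a standalone 4, so digit-level arithmetic patterns can transfer.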


  • For folks who aren’t sure how to interpret this, what we’re looking at here is early work establishing an upper bound on the complexity of a problem that a model can handle based on its size. Research like this is absolutely essential for determining whether these absurdly large models are actually going to achieve the results people have already ascribed to them on any sort of consistent basis. Previous work on monosemanticity and superposition are relevant here, particularly with regards to unpacking where and when these errors will occur.

    I’m not sure this work accomplishes that. Sure, it builds on previous work showing that a transformer can be simulated by a TC0 circuit family. However, the limits of this fact are not clear. The paper even admits as much:

    Our result on the limitations of T-LLMs as general learners comes from Proposition 1 and Theorem 2. On the one hand, T-LLMs are within the TC0 complexity family; on the other hand, general learners require at least as hard as P/poly-complete. In the field of circuit theory, it is known that TC0 is a subset of P/poly and commonly believed that TC0 is a strict subset of P/poly, though the strictness is still an open problem to be proved.

    I believe this is one of the weakest points of the paper, as it bases all of its reasoning on an unproven conjecture. And you can implement many things with a TC0 circuit: addition, multiplication, basic logic; heck, you can even build transformers.
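    For intuition on why addition fits in such a shallow circuit family, here is a sketch (my own, not from the paper) of binary addition with flat carry formulas. Each carry is a single big OR of ANDs over generate/propagate signals, i.e. conceptually constant depth with unbounded fan-in; Python just evaluates those formulas sequentially:

```python
# Each carry c[i+1] = OR over j<=i of ( g[j] AND p[j+1] AND ... AND p[i] ),
# a flat formula needing no ripple -- the textbook reason addition is in TC0
# (in fact AC0). Bit lists are little-endian.

def add_flat_carry(a_bits, b_bits):
    """Add two equal-length little-endian bit lists via flat carry formulas."""
    n = len(a_bits)
    g = [a & b for a, b in zip(a_bits, b_bits)]  # carry generated at i
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # carry propagated at i
    c = [0] * (n + 1)
    for i in range(n):
        c[i + 1] = int(any(g[j] and all(p[j + 1:i + 1]) for j in range(i + 1)))
    return [p[i] ^ c[i] for i in range(n)] + [c[n]]

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bs):
    return sum(b << i for i, b in enumerate(bs))

print(from_bits(add_flat_carry(to_bits(14, 8), to_bits(1, 8))))  # 15
```

    So showing that transformers live in TC0 doesn’t by itself make them weak; the class already contains plenty of useful computation.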

    There is still something that bothers me: why did the paper define a general learner as (at least) a universal circuit for the set of all circuits of polynomial size? Why this restriction? I tried googling “general learner” and “universal circuit” and only came up with this paper.

    While searching, I found that this paper was rejected; you can find the reviews here: https://openreview.net/forum?id=e5lR6tySR7

    If you are looking for a paper on the limits of T-LLMs, “What Algorithms can Transformers Learn? A Study in Length Generalization” may prove more informative: https://arxiv.org/pdf/2310.16028.pdf It explains why transformers are so bad at addition.

    Here is the key part of their abstract:

    Specifically, we leverage RASP (Weiss et al., 2021)— a programming language designed for the computational model of a Transformer— and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths.

  • If you’re looking for the mathematical side, I’d recommend The Elements of Statistical Learning: Data Mining, Inference, and Prediction (second edition). That said, it’s quite old school: you won’t find any information on why the newer techniques work (especially deep learning). Still, if you want to understand bootstrapping, bias-variance decomposition, or the curse of dimensionality, I’d say it’s one of the best books.

    I’ll also share the recommended readings of the different EPFL courses I took for my master’s degree:

    • Linear algebra and learning from data
    • The Elements of Statistical Learning: Data Mining, Inference, and Prediction / Friedman
    • Understanding Machine Learning / Shalev-Shwartz
    • Neural Networks and Deep Learning / Nielsen
    • Machine Learning: A Probabilistic Perspective / Murphy
    • Pattern Recognition and Machine Learning / Bishop
    • Reinforcement Learning / Sutton
    • Deep Learning / Goodfellow