In this post, we describe our Mixture-of-PageRanks (MixPR) RAG system, which is built to perform long-context tasks in a highly computationally efficient manner. We cover the key features of the algorithm and the SOTA results it achieves across a variety of long-context benchmarks. MixPR can augment any existing foundation model, robustly outperforms frontier long-context models, and can extend effective LLM context lengths into the billions of tokens, all while running efficiently on CPU.
Long-context LLMs are highly capable but require enormous compute and memory resources. This makes them expensive to serve in the cloud and renders them largely impractical for compute-constrained scenarios such as on-device applications. This raises the question of how we can reduce the cost of processing long contexts without hurting performance.
At Zyphra, we are developing fast but effective RAG systems to preprocess long-context inputs for LLMs. The idea is to use RAG to select a subset of the text most relevant for the task at hand before sending it to the LLM. By only inputting a small portion of the context to the LLM, the cost of LLM inference can be drastically reduced.
We previously presented a preliminary graph-based RAG system engineered for this purpose and demonstrated its effectiveness on the HashHop benchmark, achieving state-of-the-art accuracy with unprecedented speed and memory efficiency.
Now, we present an expanded version of our system and demonstrate its effectiveness across an extensive set of long-context benchmarks. Details can be found in our new paper. This new version of the algorithm, which we call Mixture-of-PageRanks (MixPR), achieves SOTA results across a variety of standard long-context benchmarks while being highly computationally efficient. Below is a summary of the algorithm and the main results.
Previous works applying RAG to long-context tasks have two main limitations. First, they tend to focus on simple QA-style questions, even though many long-context tasks are more complex, e.g., reasoning or summarization. Second, they do not analyze or focus on reducing the compute cost of the retrieval pipeline.
RAG models are normally tested on standard information-retrieval benchmarks, where it is assumed that a pre-embedded vector database is available to answer questions at test time. However, in long-context tasks the data the RAG system retrieves from, i.e., the long context, is given at test time. This means the RAG system must perform not just fast retrieval but also fast database construction. If a RAG system is very slow at constructing the database, it will be much less useful as a long-context preprocessing algorithm.
We have developed a RAG system that can handle long-context inputs and solve complex tasks beyond simple QA. Our system uses a novel retriever based on PageRank, the classic graph-based ranking algorithm originally developed for Google's search engine. A novel feature of our algorithm is that it mixes two different types of PageRank, which correspond to two main categories of tasks: local and global retrieval.
Tasks requiring local retrieval are those where we want to retrieve text chunks that are semantically and syntactically related to the query, e.g., QA, key-value retrieval, reasoning, etc. Tasks requiring global retrieval are those where we want to retrieve the most important parts of the overall document, where importance is independent of query-relatedness. For example, for summarizing books we may want to retrieve the text chunks describing the book’s main events, which will not directly relate to the specific content of the phrase “summarize this book” but will nonetheless be important in relation to the structure of the events in the book.
We argue that standard PageRank (PR) provides a way to address global retrieval tasks. Given an adjacency matrix, A, linking items in a database, PR assigns importance to each item based only on its structural importance in the graph (e.g., the number of incoming connections and the importance of its neighbors). We visualize this in the bottom of the figure above, where structurally important nodes are shown in dark blue.
Conversely, Personalized PageRank (PPR) can provide a way to do local retrieval for difficult tasks. PPR weights items in a database based both on their relatedness to a ‘personalization vector’ and based on their structural importance. We use the personalization vector to bias weights toward a node representing the query. We visualize this in the middle row of the figure above.
PPR can be seen as working akin to approximate Bayesian inference: it tries to maximize the likelihood, P(X | Z), of the input data X (i.e., maximize similarity to the query), while also maximizing the prior probability, P(Z), of the node (i.e., the structural importance of the node in the graph). In local retrieval tasks, an X is given in the form of a query. In global retrieval tasks, no X is given (the query does not contain relevant information to be searched), and so we simply maximize the prior probability/structural importance over our weights Z.
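To make the relationship between the two variants concrete, here is a minimal, dense-matrix power-iteration sketch in which standard PR is simply PPR with a uniform teleportation vector. The damping factor, tolerance, and normalization choices are illustrative defaults, not the settings used in MixPR.

```python
import numpy as np

def pagerank(A, personalization=None, damping=0.85, tol=1e-8, max_iter=100):
    """Power iteration over a column-stochastic transition matrix.

    A:               (n, n) nonnegative adjacency / similarity matrix.
    personalization: teleportation distribution over nodes; None gives
                     standard PR, a query-similarity vector gives PPR.
    """
    n = A.shape[0]

    # Column-normalize so each column is a probability distribution.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    M = A / col_sums

    # Uniform teleportation = standard PageRank; query-biased = personalized PageRank.
    v = np.full(n, 1.0 / n) if personalization is None else personalization / personalization.sum()

    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (M @ r) + (1.0 - damping) * v
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r_next
```

Retrieval then amounts to taking the top-k chunks by score, with k chosen to fit the LLM's context budget.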
Finally, we need a way to do routing between PR and PPR based on the type of query. We use the base LLM in the RAG system with a zero-shot prompt to classify the query as either requiring global retrieval (route to PR) or local retrieval (route to PPR). This simple method worked very well on multiple LLMs.
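As an illustration of how lightweight this router can be, the sketch below performs the classification with a single zero-shot call. The prompt wording and the `llm` callable are hypothetical stand-ins, not the exact prompt used in our experiments.

```python
ROUTING_PROMPT = (
    "You will be given a user query about a long document.\n"
    "Answer with exactly one word:\n"
    "  LOCAL  - the query asks about specific facts, entities, or passages\n"
    "  GLOBAL - the query asks for a summary or overall view of the document\n\n"
    "Query: {query}\n"
    "Answer:"
)

def route_query(llm, query: str) -> str:
    """Classify a query as needing global (PR) or local (PPR) retrieval."""
    answer = llm(ROUTING_PROMPT.format(query=query)).strip().upper()
    return "global" if answer.startswith("GLOBAL") else "local"
```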
The final retrieval algorithm can be summarized as follows:
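Below is a rough end-to-end sketch of the retrieval step, reusing the `pagerank` and `route_query` helpers sketched above. The `embed` function (assumed to return row-normalized chunk embeddings), the chunking, and `top_k` are assumptions made for illustration.

```python
import numpy as np

def mixpr_retrieve(llm, embed, chunks, query, top_k=20):
    """Sketch of MixPR retrieval: build a similarity graph over chunks,
    route the query, then score chunks with PR or PPR."""
    E = embed(chunks)                  # (num_chunks, d), rows L2-normalized
    A = E @ E.T                        # cosine-similarity graph
    np.fill_diagonal(A, 0.0)           # drop self-loops

    if route_query(llm, query) == "global":
        scores = pagerank(A)           # structural importance only
    else:
        q = embed([query])[0]
        personalization = np.maximum(E @ q, 0.0)   # bias toward query-similar chunks
        if personalization.sum() == 0:
            personalization = None                 # fall back to uniform teleportation
        scores = pagerank(A, personalization=personalization)

    top = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in sorted(top)]        # keep original document order
```

The retrieved chunks are then concatenated and passed to the LLM in place of the full long context.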
Next, how do we implement MixPR so it can be executed quickly, in real time, on compute-limited hardware? Although the abstract MixPR algorithm is simple, it alone does not specify an implementation that is compute- and memory-efficient. We need a way to embed the text, construct a graph between text chunks, and perform retrieval quickly.
Previous SOTA graph-RAG systems tend to have a slow embedding process in which an LLM writes out entity pairs from the text, which are then used to construct a graph (e.g., see here and here).
To speed this process up, we embed text chunks with a symbolic program that creates sparse embeddings from keyword statistics using the TF-IDF algorithm. This program is fast and runs entirely on the CPU, and the embeddings are stored in memory-efficient sparse matrices. To build the graph, we compute the proximity matrix of cosine-similarity values between all text embeddings and use it directly as the adjacency matrix. This operation can be done with a single matrix multiply.
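A minimal sketch of this construction, assuming scikit-learn's `TfidfVectorizer` as the keyword-statistics embedder (the default settings here are placeholders, not the exact configuration from the paper):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def build_sparse_graph(chunks):
    """Sparse TF-IDF chunk embeddings and a cosine-similarity adjacency matrix,
    all kept in scipy sparse (CSR) format so everything runs quickly on CPU."""
    vectorizer = TfidfVectorizer()          # rows are L2-normalized by default,
    E = vectorizer.fit_transform(chunks)    # so dot products are cosine similarities
    A = E @ E.T                             # adjacency matrix from one sparse matmul
    return vectorizer, E, A
```

At query time, `vectorizer.transform([query])` produces a sparse embedding of the query in the same space, which can serve as the personalization vector for PPR.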
The figure above shows that this process of chunking text, embedding, constructing the graph, and retrieving text is very fast, taking only a few seconds to process contexts of a million-plus tokens on a desktop CPU, and much faster than the standard pipeline that uses dense embeddings.
A small sample of our results can be seen in the figures above and below; please see our paper for the full results. We test our MixPR model across multiple long-context benchmarks, including RULER, a synthetic long-context suite with over 12 different tasks; BABILong, a natural-language long-context reasoning benchmark; and HashHop, a synthetic long-context reasoning benchmark. All of these benchmarks contain tasks that go beyond simple QA.
Across the benchmarks there are 8 tasks requiring multi-hop retrieval. Compared to nearest neighbor baselines and the base LLMs, MixPR performs better on difficult multi-hop retrieval questions.
MixPR also outcompetes nearest neighbor baselines on global retrieval tasks.
Importantly, MixPR, which uses standard PR for global retrieval and PPR for local retrieval, outcompetes a retriever that uses PPR for all tasks, supporting our claim that mixing PR and PPR, rather than relying on PPR alone, provides performance benefits.
How does MixPR compare to SOTA on these benchmarks? We match the SOTA model on HashHop, outcompeting it at very long context lengths. We also achieve SOTA results on RULER, where MixPR models outperform base long-context LLMs. MixPR with GPT-4o achieves the second-best result on BABILong, behind only a specialized recurrent memory transformer finetuned specifically for BABILong, and ranks first among previous RAG models on the BABILong leaderboard. See our paper for full result tables.
At Zyphra, we are committed to developing AI systems that can work on a variety of devices, including compute-constrained devices like phones and personal computers. We believe RAG will play an important role in dealing with the computational costs of conditioning language models on large databases of texts. Our MixPR RAG system provides evidence for this claim by showing how RAG can be used in a highly compute efficient way while still being effective on long-context tasks.
RAG is not all we are working on here at Zyphra. To get more information about our work on model training, data curation, and algorithmic innovation, check out our other blog posts.
We argue that standard PageRank (PR) provides a way to address global retrieval tasks. Given an adjacency matrix, A, linking items in a database, PR assigns importance to each item based only on its structural importance in the graph (e.g., the number of incoming connections and the importance of its neighbors). We visualize this at the bottom of the figure above, where structurally important nodes are shown in dark blue.
Conversely, Personalized PageRank (PPR) provides a way to do local retrieval for difficult tasks. PPR weights items in the database based both on their relatedness to a ‘personalization vector’ and on their structural importance. We use the personalization vector to bias the weights toward a node representing the query. We visualize this in the middle row of the figure above.
PPR can be seen as akin to approximate Bayesian inference: it tries to maximize the likelihood, P(X | Z), of the input data X (i.e., maximize similarity to the query), while also maximizing the prior probability, P(Z), of the node (i.e., the structural importance of the node in the graph). In local retrieval tasks, an X is given in the form of a query. In global retrieval tasks, no X is given (the query does not contain relevant information to be searched), and thus we only want to maximize the prior probability, i.e., structural importance, over our weights Z.
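To make the two modes concrete, here is a minimal sketch (an illustration following standard PageRank conventions, not our exact implementation) of computing PR and PPR by power iteration over a sparse, column-normalized similarity graph. Standard PR corresponds to a uniform restart (personalization) vector, i.e., the prior-only case above, while PPR concentrates the restart mass on the query node.

import numpy as np
from scipy import sparse

def pagerank_scores(A, personalization=None, alpha=0.85, iters=50):
    # A: sparse, nonnegative adjacency/similarity matrix over chunks (n x n).
    # personalization: restart distribution over nodes; None (uniform) gives
    # standard PR, while a query-concentrated vector gives Personalized PageRank.
    n = A.shape[0]
    col_sums = np.asarray(A.sum(axis=0)).ravel()
    col_sums[col_sums == 0] = 1.0
    M = A @ sparse.diags(1.0 / col_sums)            # column-stochastic transition matrix
    p = np.full(n, 1.0 / n) if personalization is None else personalization / personalization.sum()
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * (M @ r) + (1.0 - alpha) * p     # PageRank fixed-point iteration
    return r                                        # higher score = more important chunk

In the Bayesian reading above, the restart vector p plays the role of the likelihood (similarity to the query) and the graph structure plays the role of the prior; with a uniform p, the iteration reduces to standard PR, i.e., structural importance alone.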
Finally, we need a way to route between PR and PPR based on the type of query. We use the base LLM in the RAG system with a zero-shot prompt to classify the query as requiring either global retrieval (routed to PR) or local retrieval (routed to PPR). This simple method worked well across multiple LLMs.
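As an illustration of the router (the prompt below is a paraphrase rather than the exact prompt from our paper, and call_llm stands in for whatever chat-completion client is in use):

ROUTER_PROMPT = (
    "You will be given a query over a long document. Reply with exactly one word.\n"
    "Reply 'global' if the query concerns the document as a whole (e.g., summarization).\n"
    "Reply 'local' if it asks about specific facts, entities, or reasoning over particular passages.\n"
    "Query: {query}"
)

def route_query(query, call_llm):
    # Returns 'global' (route to standard PR) or 'local' (route to PPR).
    reply = call_llm(ROUTER_PROMPT.format(query=query)).strip().lower()
    return "global" if reply.startswith("global") else "local"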
The final retrieval algorithm can be summarized as follows: chunk and embed the long context, build a similarity graph over the chunks, classify the query to route between PR and PPR, run the selected PageRank variant over the graph, and pass the top-scoring chunks to the LLM.
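A minimal end-to-end sketch of that loop, reusing the pagerank_scores and route_query helpers above, might look as follows; chunk_text and embed_and_link are placeholders for the chunking and graph-construction steps described in the next section (with the query appended as the final node of the graph).

import numpy as np

def mixpr_retrieve(context, query, call_llm, chunk_text, embed_and_link, top_k=16):
    # Select the top_k chunks of `context` most useful for answering `query`.
    chunks = chunk_text(context)
    A = embed_and_link(chunks, query)              # similarity graph; query is the last node
    if route_query(query, call_llm) == "local":
        p = np.zeros(A.shape[0])
        p[-1] = 1.0                                # restart at the query node -> PPR
    else:
        p = None                                   # uniform restart -> standard PR
    scores = pagerank_scores(A, personalization=p)
    keep = np.argsort(scores[:-1])[::-1][:top_k]   # rank chunk nodes, drop the query node
    return [chunks[i] for i in sorted(keep)]       # selected chunks in document order

The retrieved chunks can then be concatenated in document order and passed to the LLM together with the query.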
Next, how do we implement MixPR so it can be executed quickly, in real time, on compute-limited hardware? Although the abstract MixPR algorithm is simple, it alone does not specify how to implement the algorithm so that it is compute- and memory-efficient. We need a way to embed the text, construct a graph between text chunks, and perform retrieval quickly.
Previous SOTA graph RAG systems tend to have a slow embedding process in which an LLM extracts entity pairs from the text, which are then used to construct a graph (e.g., see here and here).
To speed this process up, we embed text chunks using a symbolic program that creates sparse embeddings from keyword statistics via the TF-IDF algorithm. This program is fast and runs entirely on the CPU. The embeddings are stored in memory-efficient sparse matrices. To construct the adjacency matrix, we compute the matrix of pairwise cosine similarities between all chunk embeddings and use it directly as the graph's adjacency matrix. This operation can be done with a single matrix multiply.
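As a rough illustration of this step (using scikit-learn's TfidfVectorizer in place of our exact keyword-statistics program, and without any sparsification of the similarity matrix), the sparse embeddings and the similarity-based adjacency matrix can be built in a few lines, with all pairwise cosine similarities obtained from a single sparse matrix multiply once the rows are L2-normalized:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

def embed_and_link(chunks, query):
    # Sparse TF-IDF embeddings for all chunks plus the query (appended as the last row),
    # and a cosine-similarity adjacency matrix computed with one sparse matmul.
    texts = list(chunks) + [query]
    X = TfidfVectorizer().fit_transform(texts)   # (n+1) x vocab CSR matrix, CPU-only
    X = normalize(X)                             # L2-normalize rows
    A = X @ X.T                                  # pairwise cosine similarities
    A.setdiag(0.0)                               # remove self-links
    return A.tocsr()

For very long contexts one might additionally threshold or top-k-sparsify A to bound its memory footprint; the sketch keeps the full similarity matrix for clarity.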
The figure above shows that this process of chunking the text, embedding, constructing the graph, and retrieving text is very fast, taking only a few seconds to process contexts of over a million tokens on a desktop CPU. It is also much faster than the standard pipeline that uses dense embeddings.
A small sample of our results can be seen in the figures above and below; please see our paper for the full results. We test our MixPR model across multiple long-context benchmarks, including RULER, a synthetic long-context dataset with over 12 different tasks; BABILong, a natural-language long-context reasoning benchmark; and HashHop, a synthetic long-context reasoning benchmark. All of these benchmarks contain tasks that go beyond simple QA.
Across the benchmarks there are 8 tasks requiring multi-hop retrieval. On these difficult multi-hop questions, MixPR outperforms both nearest-neighbor baselines and the base LLMs.
MixPR also outperforms nearest-neighbor baselines on global retrieval tasks.
Importantly, MixPR, which uses standard PR for global retrieval and PPR for local retrieval, outperforms a retriever that applies PPR to every task. This supports our claim that mixing PR and PPR, rather than relying on PPR alone, provides performance benefits.
How does MixPR compare to SOTA on these benchmarks? On HashHop, we match the SOTA model and outperform it at very long context lengths. We also achieve SOTA results on RULER, where MixPR models outperform the base long-context LLMs. MixPR with GPT-4o achieves the second-best result on BABILong, behind only a specialized recurrent-memory transformer finetuned specifically for BABILong, and ranks first among RAG models on the BABILong leaderboard. See our paper for the full result tables.
At Zyphra, we are committed to developing AI systems that can work on a variety of devices, including compute-constrained devices like phones and personal computers. We believe RAG will play an important role in managing the computational costs of conditioning language models on large bodies of text. Our MixPR RAG system provides evidence for this claim by showing how RAG can be used in a highly compute-efficient way while remaining effective on long-context tasks.
RAG is not all we are working on here at Zyphra. To get more information about our work on model training, data curation, and algorithmic innovation, check out our other blog posts.