Noam Slonim

Noam Slonim

Noam Slonim (Hebrew: נעם סלונים; born in Jerusalem) is an Israeli computer scientist, specializing in Natural Language Processing and the application of Large language models. He is a Research Scientist at Google Research Israel (since September 2025) and formerly an IBM Distinguished Engineer. He founded and served as Principal Investigator of Project Debater and led Language Model Utilization at IBM Research. Beyond his scientific achievements, Slonim had a writing and media career. He was a writer for Season 4 of The Cameric Five TV comedy show, published a weekly column in Haaretz on brain science, and co-created and wrote the Israeli sitcom Puzzle. He was also the head writer for Seasons 2 and 3 of the sitcom Ha-movilim and featured in the 2020 documentary The Debater. In October 2025, his debut novel, Questionable Memories, was published by Kinneret Publishing Group. == Education and research interests == Slonim graduated from the Hebrew University of Jerusalem in 1996 with a B.S. degree in Computer Science, Physics, and Mathematics. In 2002 he completed Ph.D. summa cum laude at the Interdisciplinary Center for Neural Computation at the Hebrew University, under the supervision of Professor Naftali Tishby. His thesis focused on the theory and applications of the Information Bottleneck method. From 2003 till 2006 he did post-doctoral studies at the Lewis-Sigler Institute for Integrative Genomics at Princeton University, working with Professor Bill Bialek and Professor Saeed Tavazoie. He joined IBM Research in 2007. Slonim holds over 30 patents (granted or pending) and has co-authored more than 100 scientific publications. In 2025, he joined Google Research Israel as a research scientist. == Research activities == From 1998 to 2003 he worked on the theory and applications of the Information Bottleneck method, suggesting various cluster analysis algorithms inspired by this method, and demonstrating the practical value of these algorithms on various domains. From 2003 to 2006 he worked on developing Machine Learning algorithms that rely on Information Theory concepts, and applied these algorithms to the analysis of various types of Genomics data. In 2011 he proposed to develop the first Artificial Intelligence system that can meaningfully participate in a full live debate with an expert human debater. This work gave rise to Project Debater, that debated expert human debaters in several live events during 2018 and 2019. In 2020, Slonim delivered the opening keynote at the EMNLP conference, describing the IBM Research work on developing Project Debater. From 2022 to 2025, he led IBM Research efforts applying large language models to practical use cases; in 2025 he moved to Google Research Israel as a Research Scientist. == Writing and video career == In 1996 Slonim was a writer for Season 4 of The Cameric Five TV comedy show. In 1997–1998 he published a weekly column in Haaretz newspaper, focused on brain science research. In 1997–1999 he co-created and co-wrote the Israeli sitcom, Puzzle. In 2008–2010 he was the head writer of Season 2 and Season 3 of the Israeli Sitcom, Ha-movilim. In 2020 he was featured in the documentary The Debater, an official selection of the 2020 Copenhagen International Documentary Film Festival. In 2025, his debut novel, Questionable Memories, was published by Kinneret Publishing Group.

Multi-model database

In the field of database design, a multi-model database is a database management system designed to support multiple data models against a single, integrated backend. In contrast, most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. Document, graph, relational, and key–value models are examples of data models that may be supported by a multi-model database. == Background == The relational data model became popular after its publication by Edgar F. Codd in 1970. Due to increasing requirements for horizontal scalability and fault tolerance, NoSQL databases became prominent after 2009. NoSQL databases use a variety of data models, with document, graph, and key–value models being popular. A multi-model database is a database that can store, index and query data in more than one model. For some time, databases have primarily supported only one model, such as: relational database, document-oriented database, graph database or triplestore. A database that combines many of these is multi-model. This should not be confused with multimodal database systems such as Pixeltable or ApertureDB, which focus on unified management of different media types (images, video, audio, text) rather than different data models. For some time, it was all but forgotten (or considered irrelevant) that there were any other database models besides relational. The relational model and notion of third normal form were the default standard for all data storage. However, prior to the dominance of relational data modeling, from about 1980 to 2005, the hierarchical database model was commonly used. Since 2000 or 2010, many NoSQL models that are non-relational, including documents, triples, key–value stores and graphs are popular. Arguably, geospatial data, temporal data, and text data are also separate models, though indexed, queryable text data is generally termed a "search engine" rather than a database. The first time the word "multi-model" has been associated to the databases was on May 30, 2012 in Cologne, Germany, during the Luca Garulli's key note "NoSQL Adoption – What’s the Next Step?". Luca Garulli envisioned the evolution of the 1st generation NoSQL products into new products with more features able to be used by multiple use cases. The idea of multi-model databases can be traced back to Object–Relational Data Management Systems (ORDBMS) in the early 1990s and in a more broader scope even to federated and integrated DBMSs in the early 1980s. An ORDBMS system manages different types of data such as relational, object, text and spatial by plugging domain specific data types, functions and index implementations into the DBMS kernels. A multi-model database is most directly a response to the "polyglot persistence" approach of knitting together multiple database products, each handing a different model, to achieve a multi-model capability as described by Martin Fowler. This strategy has two major disadvantages: it leads to a significant increase in operational complexity, and there is no support for maintaining data consistency across the separate data stores, so multi-model databases have begun to fill in this gap. Multi-model databases are intended to offer the data modeling advantages of polyglot persistence, without its disadvantages. Operational complexity, in particular, is reduced through the use of a single data store. == Benchmarking multi-model databases == As more and more platforms are proposed to deal with multi-model data, there are a few works on benchmarking multi-model databases. For instance, Pluciennik, Oliveira, and UniBench reviewed existing multi-model databases and made an evaluation effort towards comparing multi-model databases and other SQL and NoSQL databases respectively. They pointed out that the advantages of multi-model databases over single-model databases are as follows : == Architecture == The main difference between the available multi-model databases is related to their architectures. Multi-model databases can support different models either within the engine or via different layers on top of the engine. Some products may provide an engine which supports documents and graphs while others provide layers on top of a key-key store. With a layered architecture, each data model is provided via its own component. == User-defined data models == In addition to offering multiple data models in a single data store, some databases allow developers to easily define custom data models. This capability is enabled by ACID transactions with high performance and scalability. In order for a custom data model to support concurrent updates, the database must be able to synchronize updates across multiple keys. ACID transactions, if they are sufficiently performant, allow such synchronization. JSON documents, graphs, and relational tables can all be implemented in a manner that inherits the horizontal scalability and fault-tolerance of the underlying data store. == Theoretical Foundation for Multi-Model Databases == The traditional theory of relations is not enough to accurately describe multi-model database systems. Recent research is focused on developing a new theoretical foundation for these systems. Category theory can provide a unified, rigorous language for modeling, integrating, and transforming different data models. By representing multi-model data as sets and their relationships as functions or relations within the Set category, we can create a formal framework to describe, manipulate, and understand various data models and how they interact.

Latent space

A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another. Position within the latent space can be viewed as being defined by a set of latent variables that emerge from the resemblances between the objects. In most cases, the dimensionality of the latent space is chosen to be lower than the dimensionality of the feature space from which the data points are drawn, making the construction of a latent space an example of dimensionality reduction, which can also be viewed as a form of data compression. Latent spaces are usually fit via machine learning, and they can then be used as feature spaces in machine learning models, including classifiers and other supervised predictors. The interpretation of latent spaces in machine learning models is an ongoing area of research, but achieving clear interpretations remains challenging. The black-box nature of these models often makes the latent space unintuitive, while its high-dimensional, complex, and nonlinear characteristics further complicate the task of understanding it. Analysis of the latent space geometry of diffusion models reveals a fractal structure of phase transitions in the latent space, characterized by abrupt changes in the Fisher information metric. Some visualization techniques have been developed to connect the latent space to the visual world, but there is often not a direct connection between the latent space interpretation and the model itself. Such techniques include t-distributed stochastic neighbor embedding (t-SNE), where the latent space is mapped to two dimensions for visualization. Latent space distances lack physical units, so the interpretation of these distances may depend on the application. == Embedding models == Several embedding models have been developed to perform this transformation to create latent space embeddings given a set of data items and a similarity function. These models learn the embeddings by leveraging statistical techniques and machine learning algorithms. Here are some commonly used embedding models: Word2Vec: Word2Vec is a popular embedding model used in natural language processing (NLP). It learns word embeddings by training a neural network on a large corpus of text. Word2Vec captures semantic and syntactic relationships between words, allowing for meaningful computations like word analogies. GloVe: GloVe (Global Vectors for Word Representation) is another widely used embedding model for NLP. It combines global statistical information from a corpus with local context information to learn word embeddings. GloVe embeddings are known for capturing both semantic and relational similarities between words. Siamese Networks: Siamese networks are a type of neural network architecture commonly used for similarity-based embedding. They consist of two identical subnetworks that process two input samples and produce their respective embeddings. Siamese networks are often used for tasks like image similarity, recommendation systems, and face recognition. Variational Autoencoders (VAEs): VAEs are generative models that simultaneously learn to encode and decode data. The latent space in VAEs acts as an embedding space. By training VAEs on high-dimensional data, such as images or audio, the model learns to encode the data into a compact latent representation. VAEs are known for their ability to generate new data samples from the learned latent space. == Multimodality == Multimodality refers to the integration and analysis of multiple modes or types of data within a single model or framework. Embedding multimodal data involves capturing relationships and interactions between different data types, such as images, text, audio, and structured data. Multimodal embedding models aim to learn joint representations that fuse information from multiple modalities, allowing for cross-modal analysis and tasks. These models enable applications like image captioning, visual question answering, and multimodal sentiment analysis. To embed multimodal data, specialized architectures such as deep multimodal networks or multimodal transformers are employed. These architectures combine different types of neural network modules to process and integrate information from various modalities. The resulting embeddings capture the complex relationships between different data types, facilitating multimodal analysis and understanding. == Applications == Embedding latent space and multimodal embedding models have found numerous applications across various domains: Information retrieval: Embedding techniques enable efficient similarity search and recommendation systems by representing data points in a compact space. Natural language processing: Word embeddings have revolutionized NLP tasks like sentiment analysis, machine translation, and document classification. Computer vision: Image and video embeddings enable tasks like object recognition, image retrieval, and video summarization. Recommendation systems: Embeddings help capture user preferences and item characteristics, enabling personalized recommendations. Healthcare: Embedding techniques have been applied to electronic health records, medical imaging, and genomic data for disease prediction, diagnosis, and treatment. Social systems: Embedding techniques can be used to learn latent representations of social systems such as internal migration systems, academic citation networks, and world trade networks.

Generalized multidimensional scaling

Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling, in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. GMDS is an emerging research direction. Currently, main applications are recognition of deformable objects (e.g. for three-dimensional face recognition) and texture mapping.

Geographical cluster

A geographical cluster is a localized anomaly, usually an excess of something given the distribution or variation of something else. Often it is considered as an incidence rate that is unusual in that there is more of some variable than might be expected. Examples would include: a local excess disease rate, a crime hot spot, areas of high unemployment, accident blackspots, unusually high positive residuals from a model, high concentrations of flora or fauna, physical features or events like earthquake epicenters etc... Identifying these extreme regions may be useful in that there could be implicit geographical associations with other variables that can be identified and would be of interest. Pattern detection via the identification of such geographical clusters is a very simple and generic form of geographical analysis that has many applications in many different contexts. The emphasis is on localized clustering or patterning because this may well contain the most useful information. A geographical cluster is different from a high concentration as it is generally second order, involving the factoring in of the distribution of something else. == Geographical cluster detection == Identifying geographical clusters can be an important stage in a geographical analysis. Mapping the locations of unusual concentrations may help identify causes of these. Some techniques include the Geographical Analysis Machine and Besag and Newell's cluster detection method.

Photoanalysis

Photoanalysis (or photo analysis) refers to the study of pictures to compile various types of data, for example, to measure the size distribution of virtually anything that can be captured by photo. Photoanalysis technology has changed the way mines and mills quantify fragmented material. Images are an effective way to document conditions before, after, and even during blasting activities. The technology is advancing at a high rate, and lenses, storage media memory, light sensitivity and resolution have been improving steadily. Today's digital cameras and camcorders include high-resolution optics, compact size, automatic time and date stamps, good battery life, shutters to freeze motion, and computers to autofocus and eliminate jitter using image stabilization. == Mining == Photoanalysis in mining operations can provide an automated system that forewarns a company of potential problems with materials, leading to economies and reduced damage caused from over-sized materials. It can also help determine the effectiveness of blasts. A company can use this technology to monitor materials moving on a conveyor belt in an underground environment, to measure piles left over from a blast, and even measure the amount of material being carried by dump trucks or vessels to a destination. Photoanalysis is being used on SAG mills worldwide to control the size of rock being crushed. Companies are using this technology to determine the size of particles being processed in the SAG Mill.[1] Archived 2009-05-23 at the Wayback Machine Having oversize material entering the SAG mill makes an operation less efficient, costing companies money in electrical and maintenance costs. Photoanalysis technology can eliminate unwanted material before it enters the mill, keeping rock crushing costs low. == Forestry == Wood chip size can affect the overall quality of a product. With automated photoanalysis systems, companies can remove any unwanted wrong-size particles without stopping their mill process. Photoanalysis can affect how efficiently forestry companies operate. In mills worldwide, photoanalysis technology is improving the use of lumber products, cutting back on the amount of trees being used to operate, and saving companies money through quality control optimization.[2] With the current downturn in the North American forestry industry, operators are looking at making their mills more efficient and effective when processing materials. Photoanalysis technology helps identify any weaknesses in the process by continuously monitoring different sections of an operation. == Agriculture == Agricultural companies can, using photoanalysis, monitor conveyor belts of food without contaminating the product by touching it. Other benefits of photoanalysis systems include: Automated removal of any unwanted material on food conveyor Improved quality control for the most important parts of the agricultural process Pinpoint accuracy that helps the efficiency and effectiveness of product handling techniques The importance of photoanalysis technology is being noticed by the agricultural industry as it identifies any unwanted materials going through the process. In an example, if a mouse is on a conveyor of corn, photoanalysis technology would be able to identify the unwanted object and remove it before it contaminates the whole process. == Origins of photoanalysis technology == Photoanalysis technology was created by using the Waterloo Image Enhancement Process in the 1980s. After further development of the imaging process with explosives producer DuPont, engineers Tom Palangio and Takis Katsabanis began selling photoanalysis software commercially. They later renamed the process WipFrag, standing for Waterloo Image Process Fragmentation Today, photoanalysis technology has evolved into stabilized and portable systems that can automatically capture and analyze results instantly. Thousands of these products are currently being used around the world to measure fragmented material. == Photoanalysis equipment photos == == Fragmentation analysis == Fragmentation analysis is becoming a popular term in mining, agricultural and forestry industries. With the majority of money in these industries directed towards the proper sizing of materials, companies are using fragmentation analysis to determine various factors within an operation.[3] The two main ways a company keeps track of fragmented material are through manual and automated sieving procedures. Manual sieving involves extracting a sample of material to analyze the size distribution. The results can be tabulated within two days. Automated sieving is an advanced way of sieving materials running through a process. Without having to extract the material, photoanalysis can take place, allowing for immediate results with pinpoint accuracy. == Blast Fragmentation Software == Operators are using fragmentation analysis to determine the effectiveness of various blasts. With automated sieving technology, workers can track the success of these blasts and receive instant results. Companies are using these results to determine what blasting method yielded the best results for their specific operation. The common variables associated with blast optimization are the provided Particle Size Distribution (PSD) from a shovel fragmentation system, geology including rock type and fracturing, and energy factor. By using photoanalysis the fragmented materials can be monitored, offering pinpoint accuracy and allowing mine operators to make adjustments to future blasting procedures. See Optical Granulometry to view the automated sieving process. == Pre-crushing analysis == Maintenance costs can be significantly reduced if an operation focuses on the fragmentation of the particles passing through their process. Automated sieving systems can detect and help remove any oversize material before it enters the crusher and causes maintenance problems. It also helps determine the effectiveness of the mining process prior to crushing; the sizing of material is always a critical part of operations in the mining, forestry and agricultural industries. Having an analysis taking place at every major point in an operation allows for the proper tracking of material being processed. Engineers can then determine what part of the process needs improving based solely on the size of material. == Post-crushing analysis == Measuring how effective industrial crushers are, can help save a company millions of dollars in energy costs on an annual basis. There are two components that affect a typical crusher: the size of the material inputted, and the speed at which the crusher is moving. If the user can find a perfect balance between these two components, the materials will be crushed to the right size in the shortest time possible. Meeting the material standards set by governments and large companies can be hard. Having a post-crushing analysis taking place ensures that no oversize material gets shipped; eliminating the chance of getting fined for not meeting industry specifications.

Analogical modeling

Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah. It is applicable to language modeling and other categorization tasks. Analogical modeling is related to connectionism and nearest neighbor approaches, in that it is data-based rather than abstraction-based; but it is distinguished by its ability to cope with imperfect datasets (such as caused by simulated short term memory limits) and to base predictions on all relevant segments of the dataset, whether near or far. In language modeling, AM has successfully predicted empirically valid forms for which no theoretical explanation was known (see the discussion of Finnish morphology in Skousen et al. 2002). == Implementation == === Overview === An exemplar-based model consists of a general-purpose modeling engine and a problem-specific dataset. Within the dataset, each exemplar (a case to be reasoned from, or an informative past experience) appears as a feature vector: a row of values for the set of parameters that define the problem. For example, in a spelling-to-sound task, the feature vector might consist of the letters of a word. Each exemplar in the dataset is stored with an outcome, such as a phoneme or phone to be generated. When the model is presented with a novel situation (in the form of an outcome-less feature vector), the engine algorithmically sorts the dataset to find exemplars that helpfully resemble it, and selects one, whose outcome is the model's prediction. The particulars of the algorithm distinguish one exemplar-based modeling system from another. In AM, we think of the feature values as characterizing a context, and the outcome as a behavior that occurs within that context. Accordingly, the novel situation is known as the given context. Given the known features of the context, the AM engine systematically generates all contexts that include it (all of its supracontexts), and extracts from the dataset the exemplars that belong to each. The engine then discards those supracontexts whose outcomes are inconsistent (this measure of consistency will be discussed further below), leaving an analogical set of supracontexts, and probabilistically selects an exemplar from the analogical set with a bias toward those in large supracontexts. This multilevel search exponentially magnifies the likelihood of a behavior's being predicted as it occurs reliably in settings that specifically resemble the given context. === Analogical modeling in detail === AM performs the same process for each case it is asked to evaluate. The given context, consisting of n variables, is used as a template to generate 2 n {\displaystyle 2^{n}} supracontexts. Each supracontext is a set of exemplars in which one or more variables have the same values that they do in the given context, and the other variables are ignored. In effect, each is a view of the data, created by filtering for some criteria of similarity to the given context, and the total set of supracontexts exhausts all such views. Alternatively, each supracontext is a theory of the task or a proposed rule whose predictive power needs to be evaluated. It is important to note that the supracontexts are not equal peers one with another; they are arranged by their distance from the given context, forming a hierarchy. If a supracontext specifies all of the variables that another one does and more, it is a subcontext of that other one, and it lies closer to the given context. (The hierarchy is not strictly branching; each supracontext can itself be a subcontext of several others, and can have several subcontexts.) This hierarchy becomes significant in the next step of the algorithm. The engine now chooses the analogical set from among the supracontexts. A supracontext may contain exemplars that only exhibit one behavior; it is deterministically homogeneous and is included. It is a view of the data that displays regularity, or a relevant theory that has never yet been disproven. A supracontext may exhibit several behaviors, but contain no exemplars that occur in any more specific supracontext (that is, in any of its subcontexts); in this case it is non-deterministically homogeneous and is included. Here there is no great evidence that a systematic behavior occurs, but also no counterargument. Finally, a supracontext may be heterogeneous, meaning that it exhibits behaviors that are found in a subcontext (closer to the given context), and also behaviors that are not. Where the ambiguous behavior of the nondeterministically homogeneous supracontext was accepted, this is rejected because the intervening subcontext demonstrates that there is a better theory to be found. The heterogeneous supracontext is therefore excluded. This guarantees that we see an increase in meaningfully consistent behavior in the analogical set as we approach the given context. With the analogical set chosen, each appearance of an exemplar (for a given exemplar may appear in several of the analogical supracontexts) is given a pointer to every other appearance of an exemplar within its supracontexts. One of these pointers is then selected at random and followed, and the exemplar to which it points provides the outcome. This gives each supracontext an importance proportional to the square of its size, and makes each exemplar likely to be selected in direct proportion to the sum of the sizes of all analogically consistent supracontexts in which it appears. Then, of course, the probability of predicting a particular outcome is proportional to the summed probabilities of all the exemplars that support it. (Skousen 2002, in Skousen et al. 2002, pp. 11–25, and Skousen 2003, both passim) === Formulas === Given a context with n {\displaystyle n} elements: total number of pairings: n 2 {\displaystyle n^{2}} number of agreements for outcome i: n i 2 {\displaystyle n_{i}^{2}} number of disagreements for outcome i: n i ( n − n i ) {\displaystyle n_{i}(n-n_{i})} total number of agreements: ∑ n i 2 {\displaystyle \sum {n_{i}^{2}}} total number of disagreements: ∑ n i ( n − n i ) = n 2 − ∑ n i 2 {\displaystyle \sum {n_{i}(n-n_{i})}=n^{2}-\sum {n_{i}^{2}}} === Example === This terminology is best understood through an example. In the example used in the second chapter of Skousen (1989), each context consists of three variables with potential values 0-3 Variable 1: 0,1,2,3 Variable 2: 0,1,2,3 Variable 3: 0,1,2,3 The two outcomes for the dataset are e and r, and the exemplars are: 3 1 0 e 0 3 2 r 2 1 0 r 2 1 2 r 3 1 1 r We define a network of pointers like so: The solid lines represent pointers between exemplars with matching outcomes; the dotted lines represent pointers between exemplars with non-matching outcomes. The statistics for this example are as follows: n = 5 {\displaystyle n=5} n r = 4 {\displaystyle n_{r}=4} n e = 1 {\displaystyle n_{e}=1} total number of pairings: n 2 = 25 {\displaystyle n^{2}=25} number of agreements for outcome r: n r 2 = 16 {\displaystyle n_{r}^{2}=16} number of agreements for outcome e: n e 2 = 1 {\displaystyle n_{e}^{2}=1} number of disagreements for outcome r: n r ( n − n r ) = 4 {\displaystyle n_{r}(n-n_{r})=4} number of disagreements for outcome e: n e ( n − n e ) = 4 {\displaystyle n_{e}(n-n_{e})=4} total number of agreements: n r 2 + n e 2 = 17 {\displaystyle n_{r}^{2}+n_{e}^{2}=17} total number of disagreements: n r ( n − n r ) + n e ( n − n e ) = n 2 − ( n r 2 + n e 2 ) = 8 {\displaystyle n_{r}(n-n_{r})+n_{e}(n-n_{e})=n^{2}-(n_{r}^{2}+n_{e}^{2})=8} uncertainty or fraction of disagreement: 8 / 25 = .32 {\displaystyle 8/25=.32} Behavior can only be predicted for a given context; in this example, let us predict the outcome for the context "3 1 2". To do this, we first find all of the contexts containing the given context; these contexts are called supracontexts. We find the supracontexts by systematically eliminating the variables in the given context; with m variables, there will generally be 2 m {\displaystyle 2^{m}} supracontexts. The following table lists each of the sub- and supracontexts; x means "not x", and - means "anything". These contexts are shown in the venn diagram below: The next step is to determine which exemplars belong to which contexts in order to determine which of the contexts are homogeneous. The table below shows each of the subcontexts, their behavior in terms of the given exemplars, and the number of disagreements within the behavior: Analyzing the subcontexts in the table above, we see that there is only 1 subcontext with any disagreements: "3 1 2", which in the dataset consists of "3 1 0 e" and "3 1 1 r". There are 2 disagreements in this subcontext; 1 pointing from each of the exemplars to the other (see the pointer network pictured above). Therefore, only supracontexts containing this subcontext will contain any disagreements. We use a simple rule to identify the homogeneous supraco