LLM Data Assets

amazon.com
https://aws.amazon.com › blogs › machine-learning › search-enterprise...
Search enterprise data assets using LLMs backed by knowledge …
Nov 27, 2024 · In this post, we present a generative AI-powered semantic search solution that empowers business users to quickly and accurately find relevant data assets across various enterprise data sources.
arxiv.org
https://arxiv.org › abs
Datasets for Large Language Models: A Comprehensive Survey
Feb 28, 2024 · Abstract: This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructure analogous to a root system that sustains and nurtures the development of LLMs.
github.com
https://github.com › mlabonne › llm-datasets
GitHub - mlabonne/llm-datasets: Curated list of datasets and …
Data is the most valuable asset in LLM development. When building a dataset, we target the three following characteristics: Accuracy: Samples should be factually correct and relevant to their corresponding instructions. This can involve using solvers for math and unit tests for code.
arxiv.org
https://arxiv.org › abs
[2503.18792] REALM: A Dataset of Real-World LLM Use Cases
1 day ago · It categorizes LLM applications and explores how users' occupations relate to the types of applications they use. By integrating real-world data, REALM offers insights into LLM adoption across different domains, providing a foundation for future research on their evolving societal roles. A dedicated dashboard presents the data.
github.com
https://github.com › LLMDataHub
LLMDataHub: Awesome Datasets for LLM Training - GitHub
Training a chatbot LLM that can follow human instruction effectively requires access to high-quality datasets that cover a range of conversation domains and styles. In this repository, we provide a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each ...
projectpro.io
https://www.projectpro.io › article › llm-datasets-for-training
15+ High-Quality LLM Datasets for Training your LLM Models
Oct 28, 2024 · Large language models (LLMs) are fueled by vast amounts of text data, ranging from books and code to articles and web crawl information. This data equips LLMs with the statistical knowledge to understand human language patterns. Here, we'll discuss some popular datasets for training LLMs for text generation tasks.
tsinghua.edu.cn
https://dbgroup.cs.tsinghua.edu.cn › ligl › papers
[PDF]
LLM for Data Management - Tsinghua University
We introduce three typical LLM-based data management applications, including database optimization (e.g., system diagnosis), data processing (e.g., data standardization), and data...
arxiv.org
https://arxiv.org › abs
OpenLLM-RTL: Open Dataset and Benchmark for LLM-Aided …
6 days ago · The automated generation of design RTL based on large language model (LLM) and natural language instructions has demonstrated great potential in agile circuit design. However, the lack of datasets and benchmarks in the public domain prevents the development and fair evaluation of LLM solutions. This paper highlights our latest advances in open datasets and benchmarks from three perspectives ...
ieee.org
https://ieeexplore.ieee.org › document
An LLM-Based Framework for Synthetic Data Generation
The demand for high-quality datasets is rapidly increasing across sectors such as healthcare, finance, and cybersecurity, yet challenges like data scarcity and privacy concerns persist. To address this, we introduce a framework for synthetic data generation that empowers users to create realistic datasets while maintaining privacy. The framework leverages fine-tuned Large Language Models (LLMs ...
ieee.org
https://ieeexplore.ieee.org › document
Data-Prep-Kit: getting your data ready for LLM ... - IEEE Xplore
Data preparation is the first and a very important step towards any Large Language Model (LLM) development. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK they can prepare data on …