Research
Our work builds on our members' prior research on large language models for code and reinforcement learning, reflecting deep expertise in both academia and industry.
Code LLM
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
2020
Code LLM
Seed-Coder: Let the Code Model Curate Data for Itself
2025
Publications
2025
FullStack Bench: Evaluating LLMs as Full Stack Coders
Code LLM benchmark
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Code LLM benchmark
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
LLM
Seed-Coder: Let the Code Model Curate Data for Itself
Code LLM
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
LLM
LLM-Powered Test Case Generation for Detecting Tricky Bugs
Code LLM
Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points
Code Post-training
CodeDPO: Aligning Code Models with Self Generated and Verified Source Code
Code Post-training
SEAlign: Alignment Training for Software Engineering Agent
Code Agent
2024
Deep Learning for Code Generation: A Survey
Code LLM
Poison Attack and Poison Detection on Deep Source Code Processing Models
Code LLM
Selene: Pioneering Automated Proof in Software Verification
Code LLM
TrickyBugs: A Dataset of Corner-case Bugs in Plausible Programs
Code LLM benchmark
HITS: High-coverage LLM-based Unit Test Generation via Method Slicing
Code LLM
CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-Level Coding Challenges
Code Agent
HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position
Code LLM
2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
LLM benchmark
Retrieval-Generation Synergy Augmented Large Language Models
LLM
Improved Visual Story Generation with Adaptive Context Modeling
LLM
Who Judges the Judge: An Empirical Study on Online Judge Tests
Code LLM
Self-Edit: Fault-Aware Code Editor for Code Generation
Code Agent
2022
ToolCoder: Teach Code Generation Models to Use API Search Tools
Code Agent
Towards Robustness of Deep Program Processing Models—Detection, Estimation, and Enhancement
Code LLM
Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words
Foundation Model
One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code
Foundation Model
2021
2020
Hierarchical Poset Decoding for Compositional Generalization in Language
LLM
Fact-aware Sentence Split and Rephrase with Permutation Invariant Training
LLM
Generating Adversarial Examples for Holding Robustness of Source Code Processing Models
Code LLM
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Code LLM
Learning to Represent Programs with Heterogeneous Graphs
Code LLM