TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition.
LLM-Pruner: On the Structural Pruning of Large Language Models.
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.
CrystalNet: Faithfully Emulating Large Production Networks.
Scalable Tail Latency Estimation for Data Center Networks.
A Caching Framework for Microservice Applications.
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
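
As a rough illustration of the technique named in the first entry above (not the TensorGPT authors' implementation), the sketch below factorizes a toy embedding matrix with a Tensor-Train (TT) decomposition via sequential truncated SVDs. The vocabulary and hidden sizes, the reshape into a 4-way tensor, the rank cap, and the helper names tt_svd / tt_reconstruct are all illustrative assumptions.

import numpy as np

def tt_svd(tensor, max_rank):
    # Decompose a d-way tensor into TT cores via sequential truncated SVDs (TT-SVD).
    dims = tensor.shape
    cores, rank, rest = [], 1, tensor
    for k in range(len(dims) - 1):
        rest = rest.reshape(rank * dims[k], -1)
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        new_rank = min(max_rank, len(s))
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        rest = np.diag(s[:new_rank]) @ vt[:new_rank]   # carry the remainder forward
        rank = new_rank
    cores.append(rest.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the TT cores back into the full tensor (to check the approximation).
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

# Toy "embedding matrix" (vocabulary 4096, hidden size 256), reshaped into a
# 4-way tensor; real embedding tables have far more structure than random noise,
# so this only demonstrates the parameter-count arithmetic, not the achievable error.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4096, 256)).astype(np.float32)
cores = tt_svd(emb.reshape(16, 16, 16, 256), max_rank=32)
approx = tt_reconstruct(cores).reshape(emb.shape)
print("params:", emb.size, "->", sum(c.size for c in cores))
print("relative error:", np.linalg.norm(approx - emb) / np.linalg.norm(emb))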
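
Similarly, for the last entry above, the following is a minimal sketch of the generic idea behind post-training weight quantization (symmetric per-row int8 scaling), not ZeroQuant's actual algorithm; the shapes and function names are assumptions for illustration only.

import numpy as np

def quantize_per_row(w, num_bits=8):
    # Symmetric per-row quantization: one float scale per output row, ints in [-127, 127].
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                      # avoid divide-by-zero for all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 weight matrix from the int8 values and scales.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 1024)).astype(np.float32)   # toy weight matrix
q, scale = quantize_per_row(w)
err = np.abs(dequantize(q, scale) - w).max()
print("max abs error:", err, "| bytes:", q.nbytes + scale.nbytes, "vs", w.nbytes)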