This paper presents a hierarchical control framework for quadrupedal locomotion that unifies the complementary strengths of model-based optimization and reinforcement learning. We develop a convex Quadratic Programming (QP) solver based on the primal-dual Chambolle-Pock algorithm, which handles constrained optimization efficiently enough to support both massively parallel policy training and real-time deployment. Our hierarchical framework employs learned policies for high-level control that is robust to real-world perturbations, while a low-level whole-body controller, powered by the proposed solver, ensures safety and energy efficiency. Extensive benchmarks and experimental validation demonstrate measurable improvements in energy consumption, constraint satisfaction, and task transferability across simulated and real-world environments.
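
To make the core solver concrete, the following is a minimal NumPy sketch of a Chambolle-Pock (primal-dual hybrid gradient) iteration for a box-constrained QP of the form min_x ½xᵀQx + qᵀx subject to l ≤ Ax ≤ u. The function name, step-size choice, stopping rule, and dense linear solve are illustrative assumptions, not the paper's implementation, which is presumably batched and GPU-parallel for policy training.

```python
import numpy as np

def chambolle_pock_qp(Q, q, A, l, u, max_iters=5000, tol=1e-8):
    """Sketch: solve min_x 0.5 x^T Q x + q^T x  s.t.  l <= A x <= u."""
    n, m = Q.shape[0], A.shape[0]
    # Convergence of the primal-dual iteration requires tau * sigma * ||A||_2^2 <= 1.
    op_norm = np.linalg.norm(A, 2)
    tau = sigma = 1.0 / max(op_norm, 1e-12)
    x, y = np.zeros(n), np.zeros(m)
    M = Q + np.eye(n) / tau  # positive definite whenever Q is PSD

    for _ in range(max_iters):
        x_prev = x
        # Primal prox: argmin_z 0.5 z^T Q z + q^T z + ||z - v||^2 / (2 tau),
        # i.e. the linear system M z = v / tau - q (M could be factored once).
        v = x - tau * (A.T @ y)
        x = np.linalg.solve(M, v / tau - q)
        # Dual prox via the Moreau identity: for the box indicator f,
        # prox_{sigma f*}(w) = w - sigma * clip(w / sigma, l, u).
        w = y + sigma * (A @ (2.0 * x - x_prev))  # extrapolated primal point
        y = w - sigma * np.clip(w / sigma, l, u)
        if np.linalg.norm(x - x_prev) <= tol * (1.0 + np.linalg.norm(x)):
            break
    return x
```

In a whole-body-control setting, Q and q would typically encode the task-space tracking objective and A the friction-cone, torque-limit, and contact constraints; the first-order, projection-based structure of the iteration is what makes it amenable to both batched training-time use and real-time deployment.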