Representing scene appearance with a global image descriptor (BoW, NetVLAD, etc.) is a widely adopted approach to Visual Place Recognition (VPR). The main reasons are that appearance descriptors can be made invariant to radiometric and perspective changes, and that their compactness lets them scale to large environments. However, metric localization with such descriptors (a problem called Appearance-based Localization, or AbL) is much less accurate than techniques that exploit observations of 3D landmarks, which are the standard for visual localization. In this paper, we propose ALLOM (Appearance-based Localization with Local Observation Models), which addresses AbL by leveraging the topological location of a robot within a map to achieve accurate metric estimates. This topology-assisted metric localization is implemented with a sequential Monte Carlo Bayesian filter that applies a specific observation model to each place in the environment, thus exploiting the local correlation between the pose and the appearance descriptor within each region. ALLOM also uses the topological structure of the map to detect possible loss of tracking and to relocalize the robot effectively by applying VPR. Our proposal demonstrates superior metric localization compared to several state-of-the-art AbL methods under a wide range of conditions.
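To make the idea of a particle filter with per-region observation models concrete, the following toy sketch may help. It is not the ALLOM implementation: the 1-D pose, the scalar "descriptor", the linear local models in `REGIONS`, the threshold `lost_threshold`, and the uniform re-seeding used as a stand-in for VPR-based relocalization are all assumptions made purely for illustration.

```python
import math
import random

# Hypothetical 1-D map: each region covers an interval of poses and owns a
# local linear observation model d = a * x + b with Gaussian noise sigma.
REGIONS = [
    {"lo": 0.0, "hi": 5.0, "a": 1.0, "b": 0.0, "sigma": 0.2},
    {"lo": 5.0, "hi": 10.0, "a": -2.0, "b": 20.0, "sigma": 0.2},
]


def region_of(x):
    """Return the region containing pose x, or None if x is off the map."""
    for r in REGIONS:
        if r["lo"] <= x < r["hi"]:
            return r
    return None


def likelihood(x, d):
    """Unnormalized likelihood of descriptor d at pose x under the local model."""
    r = region_of(x)
    if r is None:
        return 0.0
    err = d - (r["a"] * x + r["b"])
    return math.exp(-0.5 * (err / r["sigma"]) ** 2)


def pf_step(particles, u, d, rng, motion_sigma=0.1, lost_threshold=1e-3):
    """One predict / weight / resample step; returns (particles, lost)."""
    # Predict: apply odometry u plus Gaussian motion noise.
    moved = [x + u + rng.gauss(0.0, motion_sigma) for x in particles]
    # Update: weight each particle with the model of the region it lies in.
    weights = [likelihood(x, d) for x in moved]
    mean_w = sum(weights) / len(weights)
    if mean_w < lost_threshold:
        # Treated as loss of tracking: re-seed uniformly over the map
        # (an assumed stand-in for a VPR-based relocalization).
        lo, hi = REGIONS[0]["lo"], REGIONS[-1]["hi"]
        return [rng.uniform(lo, hi) for _ in particles], True
    # Resample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(particles)), False


def estimate(particles):
    """Point estimate: mean of the particle set."""
    return sum(particles) / len(particles)
```

In this sketch each particle is scored only by the model of the region it currently occupies, which is one simple way to exploit a local pose-descriptor correlation; a low average weight is used as a crude loss-of-tracking signal.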