Title: UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

[Abstract] Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV’s position over a non-centered map are also illustrated.
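The two components the abstract highlights, a multi-layer map re-centered on the UAV's position and a double deep Q-network trained with combined experience replay, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`center_map`, `sample_combined`, `ddqn_targets`), array shapes, and the padding value are assumptions made for illustration.

```python
import numpy as np

def center_map(global_map: np.ndarray, uav_pos: tuple, pad_value: float = 0.0) -> np.ndarray:
    """Re-center a multi-layer grid map of shape (H, W, C) on the UAV cell.

    The output has shape (2H-1, 2W-1, C) so the UAV always sits at the exact
    center, giving the convolutional layers a translation-invariant view.
    """
    h, w, c = global_map.shape
    x, y = uav_pos
    centered = np.full((2 * h - 1, 2 * w - 1, c), pad_value, dtype=global_map.dtype)
    # Shift the global map so cell (x, y) lands on the center cell (h-1, w-1).
    centered[h - 1 - x:2 * h - 1 - x, w - 1 - y:2 * w - 1 - y, :] = global_map
    return centered

def sample_combined(buffer, latest_transition, batch_size, rng):
    """Combined experience replay: a uniform minibatch from the replay buffer
    is always augmented with the most recent transition."""
    idx = rng.integers(0, len(buffer), size=batch_size - 1)
    return [buffer[i] for i in idx] + [latest_transition]

def ddqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.95):
    """Double DQN targets: the online network selects the next action and the
    target network evaluates it, which reduces Q-value overestimation.

    q_next_online, q_next_target: arrays of shape (batch, num_actions).
    """
    best_actions = np.argmax(q_next_online, axis=1)
    q_eval = q_next_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval
```

Centering the map is reported in the abstract to bring considerable gains in learning efficiency; intuitively, the agent then observes obstacles and sensor nodes relative to its own position rather than in absolute grid coordinates.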

[Authors] Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

By szf