Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins

1New York University, 2Carnegie Mellon University, 3Columbia University

* denotes equal contribution.

Visuo-Skin (ViSk) is a simple yet effective framework that leverages low-dimensional skin-based tactile sensing for visuotactile policy learning in the real world.

Abstract

Visuo-motor policy learning has advanced robotic manipulation, but mastering precise, contact-rich tasks remains challenging due to vision's limitations in reasoning about contacts.

To address this, several efforts have been made to integrate tactile sensors into policy learning. However, many of these efforts rely on optical tactile sensors that are either confined to recognition tasks or require complex dimensionality reduction steps for policy learning. This work instead learns policies with magnetic skin sensors, which are natively low-dimensional, highly sensitive, and cheap to integrate on robotic platforms.

To do this effectively, we present ViSk, a simple framework that uses a transformer-based policy and treats skin sensor readings as additional tokens alongside vision-based observations. Evaluated across four complex real-world tasks (credit card swiping, plug insertion, USB insertion, and bookshelf retrieval), ViSk significantly outperforms vision-only and prior optical-tactile baselines. Further analysis reveals that combining tactile and visual modalities enhances both policy performance and spatial generalization, yielding an average improvement of 27.5% across tasks.
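The core idea above, projecting each low-dimensional skin reading into the policy transformer's embedding space and concatenating it with the vision tokens, can be sketched as follows. This is an illustrative sketch only: the dimensions, the `skin_to_tokens` helper, and the linear projection are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical dimensions (illustrative assumptions, not from the paper).
EMBED_DIM = 256
NUM_PATCH_TOKENS = 196   # e.g. 14x14 patch tokens from a ViT-style vision encoder
SKIN_DIM = 15            # a low-dimensional magnetic skin reading

rng = np.random.default_rng(0)

# In a trained policy this projection would be learned; here it is random.
W_skin = rng.standard_normal((SKIN_DIM, EMBED_DIM)) * 0.02
b_skin = np.zeros(EMBED_DIM)

def skin_to_tokens(skin_reading: np.ndarray) -> np.ndarray:
    """Project a (SKIN_DIM,) skin vector to a single (1, EMBED_DIM) token."""
    return (skin_reading @ W_skin + b_skin)[None, :]

# Placeholder vision tokens standing in for a vision encoder's output.
vision_tokens = rng.standard_normal((NUM_PATCH_TOKENS, EMBED_DIM))
skin_reading = rng.standard_normal(SKIN_DIM)

# The policy transformer then attends over vision and skin tokens jointly.
tokens = np.concatenate([vision_tokens, skin_to_tokens(skin_reading)], axis=0)
print(tokens.shape)  # (197, 256): 196 vision tokens + 1 skin token
```

Because the skin data is natively low-dimensional, it enters the sequence as a handful of extra tokens, with no dimensionality-reduction pipeline of the kind optical tactile images require.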

ViSk Architecture

Figure 2: Overview of the ViSk architecture.

Policy Learning for 4 Precise Tasks

The following videos show learned ViSk policy rollouts executed on the robot at 1x speed.

Plug Insertion

USB Insertion

Card Swiping

Brown Book Retrieval

Experimental Results

For each task, we run 10 evaluations across 3 seeds (30 rollouts per task) on held-out target object positions unseen during training.

Policy Performance out of 30 Rollout Evaluations


Comparison between Sensors


Generalization to Unseen Task Variations


Zero-Shot Generalization Results