Alessandro Simoni

I am a Ph.D. Candidate at AImageLab, University of Modena and Reggio Emilia, Italy.

My research focuses on Computer Vision and Deep Learning applied to Collaborative Robotics, more specifically on topics such as 3D Object Reconstruction and Human/Robot Pose Estimation. I work under the supervision of Prof. Roberto Vezzani.

Email  |  CV  |  Google Scholar  |  Github  |  LinkedIn

Research activities

In the first part of my Ph.D. I tackled the task of 3D Object Reconstruction. More recently, I have been working on robotic tasks involving Pose Estimation and Grasping. Representative papers are highlighted.

Authored publications:


Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps
Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani
Under Review

Thanks to a novel 3D pose representation composed of two decoupled heatmaps, efficient deep networks from the 2D HPE domain can be adapted to accurately compute 3D joint locations in world coordinates. Moreover, depth maps are used to bridge the gap between synthetic and real data.


Multi-Category Mesh Reconstruction From Image Collections
Alessandro Simoni, Stefano Pini, Roberto Vezzani, Rita Cucchiara
3DV, 2021

arXiv  |  bibtex  |  code  |  poster  |  slides  |  presentation (video)

A multi-category mesh reconstruction framework infers the textured meshes of objects, learning category-specific priors in an unsupervised manner and obtaining smooth shapes through a dynamic mesh subdivision approach.


Improving Car Model Classification through Vehicle Keypoint Localization
Alessandro Simoni, Andrea D'Eusanio, Stefano Pini, Guido Borghi, Roberto Vezzani
VISAPP, 2021 (Oral Presentation)

paper  |  bibtex  |  slides  |  presentation (video)

A multi-task framework combines visual features and keypoint localization features to improve car model classification accuracy.


Future Urban Scenes Generation Through Vehicles Synthesis
Alessandro Simoni, Luca Bergamini, Andrea Palazzi, Simone Calderara, Rita Cucchiara
ICPR, 2020

arXiv  |  bibtex  |  code  |  poster  |  slides  |  presentation (video)

A two-stage approach in which interpretable information is exploited by a novel view synthesis architecture to reproduce the future visual appearance of vehicles in an urban scene.

Co-authored publications:


Unsupervised Detection of Dynamic Hand Gestures from Leap Motion Data
Andrea D'Eusanio, Stefano Pini, Guido Borghi, Alessandro Simoni, Roberto Vezzani
ICIAP, 2021

paper  |  bibtex

An unsupervised approach for training a Transformer-based architecture that learns to detect dynamic hand gestures in a continuous temporal sequence.


SHREC 2021: Skeleton-based hand gesture recognition in the wild
Ariel Caputo, Andrea Giacchetti, Simone Soso, Deborah Pintani, Andrea D'Eusanio, Stefano Pini, Guido Borghi, Alessandro Simoni, Roberto Vezzani, Rita Cucchiara, et al.
Computers & Graphics, 2021

paper  |  bibtex

A Transformer-based architecture combined with a Finite State Machine (FSM) detects and classifies gestures; one of the methods proposed in the SHREC 2021 contest.


Extracting Accurate Long-term Behavior Changes from a Large Pig Dataset
Luca Bergamini, Stefano Pini, Alessandro Simoni, Roberto Vezzani, Simone Calderara, Rick B. D'Eath, Robert B. Fisher
VISAPP, 2021

paper  |  bibtex  |  dataset

Given a large annotated pig dataset, long-term pig behavior analysis is possible, even though estimates from individual frames can be noisy.


A Transformer-Based Network for Dynamic Hand Gesture Recognition
Andrea D'Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara
3DV, 2020

paper  |  bibtex

A Transformer-based architecture that recognizes dynamic hand gestures by exploiting information from a single active depth sensor (depth maps and surface normals).


Multimodal Hand Gesture Classification for the Human-Car Interaction
Andrea D'Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara
Informatics, 2020

paper  |  bibtex

A multimodal combination of CNNs that takes RGB, depth, and infrared images as input, achieving a good level of light invariance, a key requirement for vision-based in-car systems.

Reviewing Service
Conferences:
  • IEEE International Conference on Pattern Recognition (ICPR)

Journals:
  • IEEE Robotics and Automation Letters (RA-L)

Workshops:
  • Towards a Complete Analysis of People: From Face and Body to Clothes (T-CAP)

Courses and Summer Schools
  • Advanced Course on Data Science and Machine Learning - ACDL 2021, Certosa di Pontignano (SI), Italy (certificate)


Source code

Credit for style and layout