If you struggle to explain why your data science project is difficult, it often comes down to the data not being “ready to go”. But how do you explain that to people who have never worked with data themselves?

A great way to explain “what’s taking so long” when getting results out of data is the concept of data readiness. It allows you to state exactly how far the data is from where it needs to be to perform an analysis.

Data readiness consists of three bands, each of which we can divide further into more specific data readiness levels. …

Natural Language Processing (NLP) has been booming since 2018 with ELMo, BERT, etc. In recent months, the field has made even more progress. But what gives? What can industry leverage? Here are three key takeaways from AMLD’s NLP track.

Superstars: Transformer Models

Transformer models are all the rage these days. They have beaten the previously dominant long short-term memory networks on many NLP tasks, setting a new state of the art. But what are they really about?

The key ingredient: Parallelization. As opposed to long short-term memory networks, Transformer models are not required to ingest sentences one word after another. Instead, they can be fed complete sentences at once.
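To make the contrast concrete, here is a minimal NumPy sketch under stated assumptions. It is not taken from the talks: the recurrent part uses a plain tanh recurrence as a simplified stand-in for a long short-term memory network, the attention part is a single unmasked attention head, and all names, sizes and random weights (`recurrent_encode`, `self_attention`, `W_x`, `W_q`, ...) are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                      # toy sizes: 5 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d))      # one "sentence" of random embeddings

# Hypothetical weights, for illustration only.
W_x = rng.normal(size=(d, d)) * 0.1
W_h = rng.normal(size=(d, d)) * 0.1
W_q = rng.normal(size=(d, d)) * 0.1
W_k = rng.normal(size=(d, d)) * 0.1
W_v = rng.normal(size=(d, d)) * 0.1


def recurrent_encode(x):
    """Simplified recurrence (not a full LSTM): step t needs the state of step t-1."""
    h = np.zeros(d)
    states = []
    for token in x:                    # inherently sequential loop over tokens
        h = np.tanh(token @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head self-attention: all tokens handled in a few matrix products."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)      # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores, axis=-1) # each row sums to 1
    return weights @ v, weights


rnn_states = recurrent_encode(x)           # shape (seq_len, d), built step by step
att_states, att_weights = self_attention(x)  # shape (seq_len, d), built at once
```

The loop in `recurrent_encode` cannot be spread across tokens because every step waits for the previous hidden state, while `self_attention` has no such dependency, so its matrix products can be handed to parallel hardware as a whole.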

Leveraging parallel computing is their main advantage, since this allows for training on more data than ever before (i.e. ridiculous amounts of text). …

Mikio Braun gave a great talk at AMLD 2020 on the topic of getting ML into production. Mikio is a staff scientist at GetYourGuide and previously worked as a senior data scientist at Zalando. He might have an idea or two about applied ML, so I thought it was worth sharing my takeaways from his talk.

His main theme: How do we “do ML”? Is algorithms + tools + data all there is to it? Maybe, but surely they all have further facets to explore. Mikio focuses on three topics:

  1. Where and when to apply ML (and when not)?
  2. In which architecture or…

Today, the fourth edition of Applied Machine Learning Days kicked off its keynotes. Here are my top three takeaways.

Meaningful data remains a challenge

Data remains the main asset, but making this data FAIR (findable, accessible, interoperable and reusable) is still a major challenge. Companies and their C-suites seem to realize this more and more.

  • Christopher Bishop of Microsoft Research Lab made (again) a great case for compute & data beating domain knowledge in the long run, referring to the “Bitter Lesson”, a recent blog post by Rich Sutton. Worth a read!
  • Asif Jan presented how Roche is running a…

Dennis Meier

Datascience Teamlead
