Accelerating Projects in Machine Learning with Applied ML Prototypes - Cloudera Blog
It’s no secret that advancements like AI and machine learning (ML) can have a major impact on business operations. In Cloudera’s recent report, Limitless: The Positive Power of AI, we found that 87% of business decision makers are achieving success through existing ML programs. Among the top benefits of ML, 59% of decision makers cite time savings, 54% cite cost savings, and 42% believe ML enables employees to focus on innovation as opposed to manual tasks.
Data practitioners are at the top of the list of employees who are now able to put more focus on innovation.
Cloudera has seen a lot of opportunity to extend even more time-saving benefits specifically to data scientists with the debut of Applied Machine Learning Prototypes (AMPs). These AMPs help kickstart projects in machine learning by providing working examples of how to solve common data science use cases, enabling data scientists to move faster and focus more time on driving further innovation.
AMPs are fully built end-to-end data science solutions that allow data scientists to go from an idea to a fully working machine learning solution in a fraction of the time. Accessible with a single click from Cloudera Machine Learning or via public GitHub repositories, AMPs provide an end-to-end framework for building, deploying, and monitoring business-ready ML applications.
AMPs were born from the observation that data scientists very rarely start a new project from scratch. The pattern that we most often observe is that after data scientists understand the problem and the data that they have to work with, they search the internet to find an example of something similar to what they are trying to accomplish. Unfortunately, this pattern of development has some significant drawbacks: (1) there’s little visibility into the author’s credibility; (2) there’s no guarantee that the code they find uses current best practices; and (3) there’s no way to know in advance whether the libraries used will work in their current environment.
AMPs are the solution to this age-old (well, 21st-century-old) problem. Every AMP was built by a member of Cloudera’s ML research group, Fast Forward Labs. Each AMP goes through a rigorous review process by some of the brightest and most credible ML minds. AMPs are periodically reviewed and updated to ensure that methods and libraries are up to date. Lastly, each AMP ships with a requirements file so that a clean and consistent environment can be deployed with the correct dependencies.
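To make the requirements-file point concrete, here is a minimal Python sketch of how pinned dependencies could be compared against what is installed in an environment. It is an illustration only, not how AMPs themselves are deployed: the helper names `parse_pins` and `check_environment` are hypothetical, and the parser assumes simple `name==version` lines rather than the full requirements syntax.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: pinned_version} for simple 'name==version' lines."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # this sketch skips unpinned or more complex specifiers
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def _installed_version(name):
    """Look up the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(pins, installed_version=None):
    """Return {name: (wanted, found)} for every pin the environment does not match."""
    lookup = installed_version or _installed_version
    problems = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems[name] = (wanted, found)
    return problems
```

Calling `check_environment(parse_pins(open("requirements.txt").read()))` would list any package whose installed version differs from its pin, which is the kind of mismatch a shipped requirements file is meant to prevent. The file name here is an assumption about a typical project layout.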
For anyone who might be thinking, “If you’re releasing complete machine learning projects, aren’t you already doing the data scientist’s job for them?”, the answer is a resounding no. These AMPs absolutely provide a starting point and give data scientists a bit of a head start on their projects, but they still require coding and iteration to fit the specific use case. By rolling out AMPs, we’re helping large organizations accelerate past the deployment hump that often occurs, despite large initial investments in ML.
The Fast Forward Labs team has developed and released more than a dozen AMPs to date, with more to come. AMPs so far include:
We are still hard at work on some new AMPs, too. One much-anticipated, soon-to-be-released AMP is another flavor of distributing Python workloads, this time with Ray. Much like Dask, Ray is a unified framework for scaling AI and Python applications. This AMP will give practitioners an example of another way to distribute their data science workloads.
The biggest benefit of AMPs is the ability to fast-track adoption of machine learning. For one biotech company, the Streamlit AMP helped to get new apps in their tenant, enabling their data scientists to communicate results with business users. They also used the Churn Prediction demo for onboarding, as a reference of ML and Python best practices. Companies also rely on AMPs like continuous model monitoring to improve their MLOps capabilities. For other use cases, like natural language processing (NLP), we have a number of AMPs that can help.
AMPs are great demonstration tools for practitioners to use during conversations with their internal stakeholders, proofs of concept, and workshops. They are a great way to demonstrate value and pave the way for quick wins with machine learning. They are available immediately to download from GitHub. If you’d like to talk to us about how to do more with your machine learning, get in touch (contact info/link here).
If this blog inspired you to try your hand at creating your own AMP, then we’ve got just the thing for you. Cloudera, along with AMD, is sponsoring a hackathon where participants are tasked with creating their own unique applied ML prototype. Winning entrants will receive a cash prize, and their projects will be reviewed by Cloudera Fast Forward Labs and added to the AMP Catalog.
If you have a project that you would love to share with the community, are looking to differentiate your resume from the masses, and/or could use some extra cash, then sign up for your chance to win!