Talent [R]evolution

How a Big Data Architect transformed a retailer’s resource management

Reading Time: 6 minutes

If we think of the future of retail, what springs to mind? It could be same-day drone delivery, virtual changing rooms, or wholly unique, fully personalised products. Such services might be just on the horizon, but the architecture that could support them is already here – big data. To piece together what can be done with this vast amount of information, businesses need a big data architect as a crucial member of the project team.

Both online and in-store retailers are embracing data-driven strategies. This helps them better understand their customers, optimise prices and predict sales patterns. Retailers are always finding innovative new ways to get the insights they need from both structured and unstructured data sources, so long as they have the right tools and team at their disposal.

A good example of how retailers can effectively exploit big data is a recent project I did with a chain of stores that wanted to better distribute resources among their locations. With a neural network model, the team refined an existing system to streamline their operations dramatically. Here, we’ll explore this use case alongside some key facts about big data in retail.

What is big data architecture?

For those unfamiliar with the concepts, we’ll kick off with a definition and then get a bit deeper into the mechanics. Big data architecture is designed to handle the ingestion, processing and analysis of data that is too large and varied for traditional database systems.

Architecting big data solutions generally involves batch processing of big data sources, real-time processing, interactive exploration, predictive analytics and machine learning. The architecture itself is organised into layers that handle different stages and processes. So what are the layers of a big data architecture?


In essence, the “layers” provide a logical way to organise the architecture’s components. Each layer performs a specific function; a layer isn’t necessarily a separate machine or process, but a means of rationalising the structure. These layers are the data sources, data massaging and storage, analysis, and consumption layers. Let’s look at what each one does.

Big data sources

The big data architect will be able to identify which data are required for the analysis the business wants to perform. The sources for a big data architecture will come from various channels. The collection point could be either a primary or secondary source; for example, it could be direct from the retailer’s payment systems or from a third-party shopping platform. The velocity and volume of the data will vary depending on its origin, as will its format, which may be structured, semi-structured or unstructured.

Data massaging and storage

This layer acquires data from the sources and converts them into the format appropriate to the analysis process. For instance, an image might need to be processed so it can be stored in a Relational Database Management System or a Hadoop architecture. In some cases, this phase might be skipped if the analysis layer can read data in its native format. There are also data governance processes and regulatory standards to consider, which will affect where and how the data is stored.
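As a sketch of what this layer does, the snippet below converts raw, semi-structured sales events into uniform rows ready for storage and analysis, and drops anything malformed. The field names (`store_id`, `sold_at`, `amount`) are illustrative assumptions, not any particular retailer’s schema:

```python
import json
from datetime import datetime
from typing import Optional

def massage_record(raw: str) -> Optional[dict]:
    """Convert one raw JSON sales event into an analysis-ready row.

    A minimal sketch of the 'massaging' step; the field names are
    illustrative assumptions, not a real retailer's schema.
    """
    try:
        event = json.loads(raw)
        return {
            "store_id": str(event["store_id"]),
            # Normalise timestamps to a single ISO date format
            "sold_on": datetime.fromisoformat(event["sold_at"]).date().isoformat(),
            "amount": round(float(event["amount"]), 2),
        }
    except (json.JSONDecodeError, KeyError, ValueError):
        return None  # malformed events are dropped, not stored

raw_events = [
    '{"store_id": 12, "sold_at": "2023-05-01T10:30:00", "amount": "19.99"}',
    'not json at all',
]
rows = [r for r in (massage_record(e) for e in raw_events) if r is not None]
```

In a real pipeline this transformation would run inside an ingestion framework rather than a loop, but the principle – converting heterogeneous inputs into one analysable format – is the same.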


The analysis layer

The analysis layer crunches the data that’s been prepared by the massaging layer. Designing this layer will require careful consideration by the big data architect. Decisions will need to be made to ensure that the process produces the desired insights. This includes understanding whether the source and analysis tools are compatible and then, in turn, selecting the right algorithms to get the information you’re looking for.


The consumption layer

The consumption layer ingests the output of the analysis layer. This could be visualisation tools, business intelligence platforms, or a good old-fashioned presentation prepared by a human. It can be challenging to articulate the findings of the analysis layer, especially within the context of big data, but this is where the business will really start to extract value.

What are the areas in the retail industry that can benefit from big data?

Now we’ve outlined the basic components of big data architecture, let’s look at how it can support strategy in retail. Big data analytics can be applied to every stage of the supply chain and customer journey; from forecasting demand to optimising pricing, data offers strategists a wealth of invaluable information. In the following section, I’ll break down in a little more detail the areas where big data could make all the difference.

Predictive models and forecasting trends

The most exciting thing that big data can do is help us predict the future. Certainly, it’s not a crystal ball, but it can still be a game changer for business strategists. In retail, such strategies can be deployed to harvest insights from sources including historic sales data and social media. This helps create models indicating what your customers are likely to buy and when, providing greater visibility on supply, demand and profit margins.
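To make the idea concrete, here is a deliberately simple forecasting sketch: predicting next week’s sales as a trailing moving average of recent weeks. Real retail models would use richer signals (seasonality, promotions, social media) and a trained model, so treat this as an illustration of the principle only:

```python
def forecast_next(sales, window=3):
    """Forecast the next period's sales as a trailing moving average.

    A toy illustration of demand forecasting; production systems
    would use trained models with many more features.
    """
    recent = sales[-window:]
    return sum(recent) / len(recent)

# Toy weekly unit sales for one product
weekly_sales = [100, 110, 120, 130, 140, 150]
prediction = forecast_next(weekly_sales)  # average of the last 3 weeks
```

Even this naive baseline already gives a buyer something actionable – an expected order quantity – which is why forecasting is usually the first big data use case retailers adopt.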

Price optimisation

Online retail giants like Amazon have been trailblazers in harnessing data to optimise prices. Online, consumers are empowered to price check in a matter of seconds, meaning it’s usually the best price that wins. To secure the sale, prices need to be updated in real time. Although the uncertainty can cause headaches for third-party sellers, dynamic pricing ultimately boosts sales and improves margins.
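A minimal sketch of such a repricing rule is shown below: undercut the competitor slightly, but never drop below a minimum margin over cost. The thresholds and parameter names are illustrative assumptions, not any real marketplace’s logic:

```python
def reprice(our_cost, competitor_price, min_margin=0.10, undercut=0.01):
    """Return a price that undercuts the competitor without
    falling below a minimum margin over cost.

    A hypothetical dynamic-pricing rule for illustration only.
    """
    floor = our_cost * (1 + min_margin)       # lowest acceptable price
    candidate = competitor_price - undercut   # just below the competitor
    return round(max(candidate, floor), 2)

price = reprice(our_cost=10.00, competitor_price=12.50)
```

In production, this decision would be fed by a continuous stream of competitor prices and demand signals, and re-run every time an input changes.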

Customer behaviour and personalisation

One of the most powerful applications of big data in retail is monitoring customer behaviour and preferences. By analysing customer data, retailers can identify the most popular touchpoints, marketing channels and products. They can also monitor when and where conversion takes place in order to optimise the customer journey.

This information will also facilitate deeper personalisation in product development and marketing efforts. Consumers are much more likely to respond to tailor-made marketing – see Amazon’s seminal “other customers also bought…” algorithm – making such strategies vital to retail in the digital age.
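The intuition behind that kind of recommendation can be sketched with a simple basket co-occurrence count: products frequently bought alongside a given item are ranked first. Real recommenders use far more sophisticated models; the baskets here are invented toy data:

```python
from collections import Counter

def also_bought(baskets, product, top_n=2):
    """Rank the products most often bought together with `product`.

    A toy co-occurrence sketch in the spirit of 'other customers
    also bought'; production recommenders are far richer.
    """
    co_counts = Counter()
    for basket in baskets:
        if product in basket:
            for other in basket:
                if other != product:
                    co_counts[other] += 1
    return [item for item, _ in co_counts.most_common(top_n)]

baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "jam"},
]
recs = also_bought(baskets, "bread")  # butter co-occurs most often
```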

Case study: Big data architecture for retail network

This case study demonstrates the value of big data analytics for retailers. My client was a chain of stores that needed to implement a system that could predict the resource needs of each location and allocate them accordingly. They already had a system in place, but it was my role to make it work as well as possible.

The existing system made predictions using a TensorFlow neural network and allocated resources using an operations research (OR) algorithm implemented in Python and PySpark on Spark. Everything was running on the client’s GPUs, and the system had two major shortcomings: its predictions lacked accuracy, and the overall execution time was far too slow.

The method

In order to increase the prediction accuracy of the model, I set about improving the data quality. This was achieved via a two-pronged approach: first, the data was properly prepared and cleaned; second, policies were implemented to ensure no “dirty” or redundant data was fed to the system. This missing-data governance strategy ensured the architecture was only handling valuable, insight-rich information.
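The gate described above can be sketched as a simple filter: reject rows missing required fields, and keep only the first copy of exact duplicates. This is an illustration of the kind of policy involved, not the client’s code; the required field names are assumptions:

```python
def apply_missing_data_policy(rows, required=("store_id", "date", "units")):
    """Drop rows that are missing required fields or duplicated.

    A minimal sketch of a 'dirty data' gate; field names are
    illustrative, not taken from the client's pipeline.
    """
    seen = set()
    clean = []
    for row in rows:
        if any(row.get(field) in (None, "") for field in required):
            continue  # incomplete row: reject rather than guess values
        key = tuple(row[field] for field in required)
        if key in seen:
            continue  # exact duplicate: keep only the first copy
        seen.add(key)
        clean.append(row)
    return clean

rows = [
    {"store_id": "A1", "date": "2023-05-01", "units": 40},
    {"store_id": "A1", "date": "2023-05-01", "units": 40},   # duplicate
    {"store_id": "A2", "date": "2023-05-01", "units": None}, # missing field
]
clean_rows = apply_missing_data_policy(rows)
```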

Redundant data was also reduced by slimming down data dimensionality. For those unfamiliar with the term, dimensionality is the number of variables – columns or features – used to describe each record. This was achieved via principal component analysis (PCA), keeping only the most relevant principal components.
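The mechanics can be sketched in a few lines of NumPy: centre the data, find the directions of greatest variance, and keep only the top ones. This is an illustrative sketch, not the client’s pipeline, and the toy data is constructed so that one column adds a dimension but almost no new information:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components.

    A minimal NumPy sketch of PCA-based dimensionality reduction;
    variable names are illustrative.
    """
    X_centred = X - X.mean(axis=0)
    cov = np.cov(X_centred, rowvar=False)
    # eigh returns ascending eigenvalues for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return X_centred @ eigvecs[:, order]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
# Third column is nearly a copy of the first: an extra dimension
# carrying almost no extra information, so PCA can discard it.
X = np.column_stack([base, base[:, 0] + 0.01 * rng.normal(size=100)])
X_reduced = pca_reduce(X, n_components=2)
```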

Finally, the existing neural network architecture was tweaked and the loss function changed – in layman’s terms, the loss function is how you score the network’s mistakes, and thereby steer it, during training. We also adjusted the features that were fed to the neural network to get a better handle on the results we wanted.
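To illustrate what changing a loss function means in practice – without claiming this is the loss the client actually used – compare mean squared error with the Huber loss on data containing one outlier. MSE penalises the outlier quadratically and lets it dominate training; Huber switches to a linear penalty for large errors, so outliers pull on the model less:

```python
def mse_loss(y_true, y_pred):
    """Mean squared error: penalises large mistakes quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic near zero, linear for large errors,
    so outliers dominate training less than under MSE."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

y_true = [10.0, 12.0, 11.0, 50.0]   # one outlying observation
y_pred = [10.5, 11.5, 11.0, 12.0]
```

Under MSE the single outlier contributes almost all of the loss, whereas under Huber its influence is bounded – which is exactly the kind of behavioural change that swapping a loss function buys you.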

Summary of approach:

  • Data quality improved through proper preparation and cleaning.
  • Missing data policy implemented. 
  • Data dimensionality was reduced through principal components analysis and selection of relevant PCA variables.
  • Adjustment of neural network architecture by optimising the number of neurons and layers.
  • Optimisation of loss function.
  • Update of features fed to the neural network.

The results

These adjustments meant that execution time improved significantly. The algorithms were redistributed across the machines best suited to running them. Neural network training is naturally parallelisable: training examples can be distributed across different processors – data parallelism – so that training scales out without sacrificing accuracy. This also meant the network ran well on the client’s existing GPUs.
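The data-parallel idea can be sketched with a toy one-parameter model: shard the training examples, compute a partial gradient per shard concurrently, then average the gradients before updating the weight. This is a conceptual sketch, not the client’s training code, and frameworks like TensorFlow handle this distribution for you:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_gradient(shard, weight):
    """Gradient of mean squared error for y = weight * x on one shard.

    Toy model for illustration; each worker sees only its own examples.
    """
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(examples, weight, n_workers=2, lr=0.01):
    """One data-parallel training step: shard the examples, compute
    per-shard gradients concurrently, average them, and update."""
    shards = [examples[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(lambda s: partial_gradient(s, weight), shards))
    avg_grad = sum(grads) / len(grads)
    return weight - lr * avg_grad

examples = [(x, 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
new_weight = data_parallel_step(examples, weight=0.0)  # moves towards 3.0
```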

Meanwhile, the OR algorithms, which run better on CPUs, were migrated there to solve the remaining performance issues. The result was system performance that met the client’s requirements in regard to both accuracy and running time. This supercharged their existing resource management strategy, transforming the way they managed their network of stores.

Summary of achievements:

  • Data parallelism was achieved for the neural network.
  • Processes were moved to the processing units – GPU or CPU – best suited to them.
  • Overall accuracy and run time met the client’s requirements.
  • Improved resource management across locations.

Get a big data architect to transform your data strategy

There is a wealth of information out there that could transform retail. It can help businesses get to know their customers better, meet their demands, and run their operations more efficiently – all they need to do is know how to handle it. This is where a big data architect can help. They’ll be able to create a solution to rationalise information, process it, and get insights that have a meaningful relationship to business objectives.

However, a challenge is that it’s often difficult for retailers to get a senior data architect on staff. For a start, these profiles are difficult to come by; a big data architect is one of the most in-demand profiles across industries. Furthermore, for many, implementing a process such as the one described here is done on a project-by-project basis. This can make justifying such a highly qualified salaried staff member difficult. 

That said, this doesn’t mean such expertise is out of reach. Outvise connects clients with high-end business and technology experts to deliver on ambitious data projects. With a network of over 37,000 data scientists, engineers and consultants based all over the world with a variety of different professional backgrounds, you can find a big data architect with the experience you need in a matter of clicks. 

A big data architect could hold the key to making your business run better – so start exploring the Outvise portfolio to get your project on the road. 

PhD in High Energy Physics with more than 20 years of expertise analyzing data and applying machine learning, artificial intelligence and big data solutions to data of all kinds in international centers.
