Our Market Dominance Guy, Chris Beall, is flying solo again this week as he meets with data guru Tom Zheng. Tom is a business intelligence engineer and works as an independent contractor in the field of data analysis. In other words, he spends his days making sense out of those large quantities of data that tend to pile up in businesses. As CEO of ConnectAndSell, Chris uses Tom’s data analysis services to guide him through the often-confusing pathways that data can create. As Chris says, because data is kept in ways that are not always optimal for analysis, business leaders need people like Tom to help make sense of it, so they’ll know if they’re dominating their market or not, or if they’re making or losing money on different parts of their business.
With education credentials in economics and finance, Tom employs his talents in data engineering, data architecture, and data analysis. Along with these skills, he brings a great bedside manner — coupled with brutal honesty — to his data-sharing sessions with CEOs. To get to an actionable truth about the numbers he’s analyzing for companies, Tom uses a series of questions that he asks himself and his clients: “Is this data meaningful? Is it true? And does it lead us somewhere or not?” Take a listen to how expert data analysis can help you dominate your market on today’s Market Dominance Guys’ episode, “Giving Your Data the Sniff Test.”
About Our Guest
Tom Zheng, a business intelligence engineer and independent contractor, is a seasoned data guru who specializes in transforming your company’s data into actionable insights. With experience throughout the entire data value chain, from designing robust data ecosystems and implementing automated pipelines to visualizing results in meaningful ways, Tom’s unique storytelling approach will help you discover those “Aha!” moments needed to challenge traditional thinking and assumptions. He can be reached at (416) 877-5412.
Here is the full transcript from this episode:
Hey everybody, Chris Beall here without Corey Frank. I don't know how I can get by a Market Dominance Guys episode or two without him, but I don't know, I'm going to be brave and I'm going to plunge right in. I'm here with Tom Zheng. Tom is a data guru. He's a guy who makes sense out of data. He's a master of the tools and the techniques and the mindset that it takes to take those big piles of data that tend to pile up in our businesses and help you make sense out of them. And he's working with me at ConnectAndSell to make sense out of our, let's say, 50 million rows a year of data.
We call them rows in the data business. So think of it as we do 50 million dials a year. Each one generates some data. And if we wanted to figure out anything about that data, there's a couple of different ways we could go. I could have somebody do a project and go try to figure some stuff out. But what I've been doing with Tom, and this is very Market Dominance Guys relevant, is we spend some time every day exploring the data together. I've come up with a name for it. I call it a data concierge. So the CEO or a CRO, or whoever really cares about trying to understand the business and figure out what to do about the business, can actually exercise their curiosity directly on the data, but without becoming an expert on the tools, and have somebody to help them think through what makes sense. What's a good hypothesis? How might we go about addressing that hypothesis and get some facts? So welcome, Tom.
Thanks Chris. Glad to be here.
It's awesome to have you here. And this is one of these things: we've talked about a lot of stuff on Market Dominance Guys, but it tends to be about sales. And it tends to be about sales as though my thesis, which is that you can pave a market with trust and with trust-based conversations or trust-yielding conversations, and then harvest that market over three or four years as folks come into the decision process, it's as though that thesis just operates by itself, right? And I think that's BS actually, when you really look at a real business and then you think, "Well, how does it work?" I've got to be getting feedback from the outside world somehow that says, "Hey, what you're trying is either working or not, or accomplishing X or Y, or maybe doing something surprisingly wonderful that you didn't even imagine."
And that tends to be, I'll say hidden in the data in two ways. One is data is fundamentally complex. Maybe three ways. Tom might give us some more, but one is it's fundamentally complex. It represents lots of different things that happen and you're trying to figure out what does it all really, really mean and how does it go together? And the other is data is kept in ways that are not optimal for analysis. So we tend to build systems that are good for operating, getting something done, but we tend not to build them so that they're great for figuring out what was going on.
So I'm just going to have a chat here with Tom, and he's going to enlighten us about something that I think every person who cares about market dominance should be thinking about, which is, who is helping you make sense of the data in your business, such that you can tell if you're dominating or not? You can tell if you're making or losing money on different parts of the business, and you can even uncover opportunities to go and build the business where you might not have seen those opportunities before. So I'm going to ask Tom just a little bit about his background first, and then we'll go from there. So, Tom, how did you fall into this? I know you used to teach little kids how to play the piano. Is that the background that led you in this direction, or was it something else?
Well, sort of. I mean, if you listen to studies, there's been lots of studies that show people who play piano tend to be better at math, right? Because music inherently is numbers-based. And so growing up, I've always been better on the numbers side than on the language side. And so when I went to university, I pursued a degree in economics and finance. I went to business school, but my major was in economics and finance. And so more so in the economics part, that's where you deal with a lot of data, right? So that was kind of the entryway into learning more about statistics and working with large data sets. And so when I started off in my career, I worked for a financial consulting company that specialized in helping banks revise processes and become more lean, essentially. But I was involved in a lot of data-based projects and technology-driven projects, which further honed my technology skills.
And after a few years of working consulting, I then ended up working in a brand new industry, which was the cannabis industry here in Canada. And so a couple of years ago when they legalized cannabis, that's when I jumped into that industry and I worked as a data engineer. And I've since left that industry and now I am an independent consultant and my specialties are really anything to do with data specifically in data engineering, architecture and analysis.
Fascinating, fascinating. What was it that as you went into the cannabis industry, what were the data areas that were interesting there? What was the subject of... What was the mystery that they were trying to resolve that was most intriguing to you?
Well, considering it's a brand new industry with no standardizations, one of the challenges, and what people like myself were trying to fix, was to create a standardized schema of how we capture our data and to design it. For example, how do you create a database table? What columns do you need in a particular dimension table? These are all things that were unknown because cannabis has never really been legalized, at least in North America, on a large scale, right? And additionally, there were no mature technology players. So everybody was designing their technology from scratch. And when people designed their technology, they often didn't have a data lens to it, because the idea is let's capture the data first before worrying about analyzing it tomorrow. And so one of the challenges of working in that industry is that data was often unclean and you spent a lot of time having to cleanse that data before you could even use it.
Oh, appropriate for the cannabis industry. A lot of cleansing was necessary before it's ready for use.
What can I say? What can I say? Something that struck me is when folks are doing sales-oriented kind of work, they tend to be using a CRM and CRMs have the endearing quality, but also frustrating that you can extend them by adding fields, adding objects and so forth. And that often is done by folks without any data backgrounds. So we have a couple of fields in our CRM that are laughable. One of them says, for instance, if I recall correctly, something like at the account level, there's a field that says 2017 revenue. And nobody who designs data for analysis or even maintenance would ever create such a field, but to the person that was trying to keep track of revenue that year, they thought, "Well, how simple, I'll just make a field that says 2017 revenue and drop it right here on the account object." Not thinking ahead to does that mean I need a 2018 revenue field and what is going to update that whatever process updated the 2017 revenue field it has to be changed in 2018 and so forth and so on.
That's an innocent example; it's not an egregious example. I would say we have, my guess is we have probably 150 data fields that have been added to our CRM over time, including some new objects. I would say 10 to 20% of those are somehow in meaningful use. So we don't know what they are. And if you try to... And some of them changed their meaning over time. So you'll say, "Oh yeah, back in 2019, the way we used this field was we put in it the number of hours that it took to sell the deal, but that became uninteresting, but we kind of liked the field. And so we decided to put in the actual cycle time of the deal itself, the total number of days between first engagement and first close, and we just thought that'd be better." And there you are, the analyst, trying to make sense out of this. How do you tackle stuff like that? And have you been faced with this, I'll call it the extensible data model done by amateur modelers problem, that's full of lots of data? What do you do with that problem?
Absolutely. So this is something I see all the time, and the reason why it happens is down... I can boil it down to one word, which is convenience, right? Often people will make customizations for their own convenience. And so whenever somebody adds a custom column or field, often they are just manipulating or filtering or aggregating data that already exists in their system, but it makes it more convenient to access. And so before I answer the question of how I deal with these types of custom columns, the first thing I will say is that as a general best practice, if you are running your company, or at least the IT side of your company's systems, you should always try to avoid adding these columns of convenience. Because if you use an analytical tool, for example Power BI, Tableau, or Qlik, there are very easy ways for you to recreate those columns of convenience directly on your report. And so it negates the need for you to actually add it into your system.
Whenever I see those things, it's always a big pain because they're not often labeled correctly. And what I mean by that is as a best practice for data governance, you should have this thing called a data dictionary, which is basically a tool that allows you to add metadata to all of your data sources, right? So in your case, for example, 2017 revenue, if a data dictionary tool was used, then the original author of that custom column could say, "These were the filters applied. This is how I aggregated and transformed the data." And so as a result, the data analyst or the end consumer does not need to make assumptions or reverse engineer how that column was calculated, right?
But in the absence of documentation, which is quite frequent, somebody like myself would have to reverse engineer and figure out how that column was calculated, if it's not inherently obvious. And it does take up a lot of time, but I always tend to do that before I run any sort of analysis, because I'd rather give you no information than to give you wrong information, which we all thought was correct. And you end up making wrong business decisions out of that.
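To make the data dictionary idea concrete, here is a minimal sketch in Python. It is not any particular data-governance tool; the `FieldEntry` class, the `register` and `undocumented` helpers, and the sample entry for a "2017 revenue" style field are all hypothetical and exist only to show what metadata the original author of a custom column might record.

```python
from dataclasses import dataclass


@dataclass
class FieldEntry:
    """Metadata for one custom field (all attribute names are illustrative)."""
    name: str
    description: str
    source_fields: list
    filters: str
    aggregation: str
    owner: str = "unknown"


# The "dictionary" itself: field name -> documented metadata.
DATA_DICTIONARY = {}


def register(entry):
    """Record how a custom field was derived so nobody has to reverse engineer it."""
    DATA_DICTIONARY[entry.name] = entry


def undocumented(columns):
    """Return the columns that have no data dictionary entry, in input order."""
    return [c for c in columns if c not in DATA_DICTIONARY]


# Hypothetical entry; the filter and aggregation text are made up for illustration.
register(FieldEntry(
    name="revenue_2017",
    description="Total revenue booked for the account in calendar year 2017",
    source_fields=["opportunity.amount", "opportunity.close_date"],
    filters="closed-won opportunities with close_date in 2017",
    aggregation="sum of amount per account",
))

# Columns found on the account object (also hypothetical).
missing = undocumented(["revenue_2017", "cycle_time"])
print(missing)
```

Any column that shows up in `missing` is one the analyst would have to reverse engineer before trusting it in an analysis.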
Yeah. It's pretty easy, I think, to be led down the garden path by thinking that a field means one thing, analyzing it, drawing a conclusion, and then chasing that conclusion, turning it into a hypothesis about the business. Maybe even getting other people excited about it. So now it has political implications because you've made some claim too. Even as the CEO, or maybe especially as a CEO, you don't want to say something and then have to walk it back. And while we're all so excited, the particular danger I see, actually, is you say it, everybody believes it, you realize it isn't true, and then you can't get them to stop believing it.
I think it happens a fair amount. So as you work through that, here you are, you're working with somebody like me who says, "Hey, Tom, give me a hand on this stuff." And you're looking into the data and you're finding these labels that are ambiguous, or they don't seem to match up with the data itself. Say you were to find two different times for an event, and one time appeared to be the time at the beginning and the other is the time at the end. We just went through this today. That's why I'm bringing this up, folks. And you say, "Well, let me just take a look and see. How do those spread out?" So it seems like one of the first things you do is you say, "Look, let's just count all the values and just eyeball it, sort it top to bottom." And then in the case we were looking at today, you found a bunch of negative times, and we're pretty sure that time is never negative. That is directly... Duration being negative is kind of... things don't take minus 10 seconds. That doesn't happen in the actual world we live in.
So there you are. You now have this piece of evidence that there's an issue with the actual data itself. You don't really know, is that an issue that's going to make a difference or not? Or can I just... Is it just some error that was made in data input or whatever, in some small fraction? You've got to make a call there, right? How do you make that call? And then do you do that alone? Or the people... I would think the people who had to do with creating those fields or filling them are generally not there. They're gone, right? Everybody's always [crosstalk 00:15:54]. What do you do, and how do you get past that to start to get to the good stuff?
Well, being an analyst, you have to have a degree of reasonableness, right? So with all data sets, there are going to be erroneous records, erroneous data and it's up to you often as the person on the front line to decide whether or not something is acceptable or not. So call it the sniff test. Right? So in the case of our analysis, even though we did discover negative time, the negative time did not represent a big chunk of all of the available data rows, let's say. And so in this case even if we did include it in our analysis, it would not make a grand impact.
But every analyst should, whenever they discover something that seems odd, figure out the magnitude of that data, right? Figure out how much it would impact your final number if you were to include that data, and if it's not statistically significant, then don't waste your time. Just include it and then call it out when you actually report those numbers. That's the way I recommend other analysts go about it. Because sometimes people can go down a rabbit hole where you spend an entire day trying to cleanse a piece of data which ultimately doesn't have any material effect on your final numbers.
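The sniff test described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis actually run on the dial data: the `sniff_test` function, the 5% tolerance, and the sample durations are all made up, and "material" is approximated as the relative shift in the mean when the suspect rows are included.

```python
from collections import Counter
from statistics import mean


def sniff_test(durations, tolerance=0.05):
    """Measure how much suspect (negative) durations move the final number.

    Assumes durations is non-empty and that the clean rows have a nonzero mean.
    The tolerance is an arbitrary threshold chosen for illustration.
    """
    suspect = [d for d in durations if d < 0]
    clean = [d for d in durations if d >= 0]
    mean_all = mean(durations)
    mean_clean = mean(clean)
    relative_shift = abs(mean_all - mean_clean) / abs(mean_clean)
    return {
        # Count every value and sort it, lowest first, to eyeball the odd ones.
        "lowest_values": sorted(Counter(durations).items())[:3],
        "suspect_share": len(suspect) / len(durations),
        "relative_shift": relative_shift,
        "material": relative_shift > tolerance,
    }


# Made-up sample: 99 dials of 40 seconds and one impossible -10 second dial.
sample = [40] * 99 + [-10]
result = sniff_test(sample)
print(result)
```

In this made-up sample the negative row is 1% of the data and shifts the mean by about 1.25%, so it falls under the tolerance: include it, call it out, and move on.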
Yeah. My old chemistry professor, I remember from high school, said to me once, and I think I was probably 15 years old, said something that still sticks with me, which is, "A difference is a difference if it makes a difference." But of course that's tricky. You can get kind of circular on that. You can assume that it doesn't make a difference, then find out that it was the thing that made all the difference. There's a lot of thinking, I think, that goes into this process.
One of the ways we've been working, one of the things I'd like the audience to think about, is this. If you are engaged in a market dominance play, so what you're doing is you've identified a market, you've made a list, you're having yourself or your folks talk to folks on that list. You're building trust. You're trying to stay out of the red ocean of everybody who's currently in market and everybody's fighting over those deals, and go to the blue ocean where you're early, so to speak. And you're going to very inexpensively use technology and good techniques and good attitude in order to talk to people multiple times. So you're doing all that stuff. I'm going to make a recommendation that you find yourself somebody who you can work with. And I mean, work with intensively to iterate on these particular matters and to allow your curiosity to guide you into a couple of things. One is, is this data that we're looking at meaningful or not? Second, is it true or not? Third, does it lead us somewhere or not? And I think you need to iterate quickly.
If you were to look at how you and I are working together, we're having a touchpoint every day, unless I'm on an airplane or whatever. And during that touchpoint, you're showing me what you've come up with from the previous day. That's the general rhythm of what's happening. And then I'm going, "Huh, that's interesting. That makes me think of this." Or you're saying, "When I got to this point, this didn't quite look right. This looks like this might be wrong in some way. Or I discovered something interesting." And we iterate. So we have this daily iteration cycle, but how often do you think we iterate or pivot or maneuver within, say, a one-hour touchpoint session? And is that normally how you've worked with folks in the past? Or is that something that's a little bit new and different?
Well, between you and I, we definitely pivot a lot, and I actually see that as a good thing, right? It's basically the whole concept of if you're going to fail, then fail fast. So often we will analyze some data only to find out that, you know what, this isn't actually the data that we want to analyze. So a great example is when we were trying to make our own analysis as to whether or not a phone number is direct, only to find out, well, why does it matter if we identify a number as being a direct number or not, when we should just always dial the number that has a faster navigation counter? Right? So to the audience, that example might not have made much sense, but I hope it did. But nonetheless, my point is it's important to be able to be agile, especially if you are looking into a data concierge service.
Because historically speaking, most companies would treat data analysis like, for example, software development, where you capture all of the requirements up front and then you provide a time estimate and then you develop the work and then you present it. Right. But ultimately the problem with that is that 99% of the time, people don't know what they want. Right. I always like to reference that meme, or that segment of the movie The Notebook, where the guy is trying to ask his lady friend, "What do you want?" And she's like, "I don't know. I don't know." Right. And it's become a meme where people turn it into dinner conversations like, "What do you want for dinner?" "I don't know. What do you want?" "I don't know." And it's the same thing for data as well. Often business leaders don't actually know what the most important KPIs are to successfully running their business.
Because you can't know what you want if you don't know that it exists. Right. Or you can't know what you want until you figure out it's statistically significant. So that's why I think data concierge is something that you've identified Chris, I think it's so important that lots of companies who do want to succeed, utilize this new approach, as opposed to the standard report building format of capturing all of the requirements upfront.
Because in my past experience, 95% of the reports that I built look new and shiny for the first couple of days. And then eventually nobody ends up using them. And how do I know this? Well, it's because in the tool that I use, which is Power BI, every single interaction with the report is logged and I have access to those logs. And so what do I do? I run my own report using the logged data, only to find out that a report gets used quite heavily initially. And then it's just crickets.
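As a rough sketch of running a report on report usage, the Python below counts views per week since launch and finds the first week in which views fall to half of launch-week usage. It does not use Power BI's actual log format or API; the list of view dates, the launch date, and the function names are all hypothetical, and the sample numbers are invented to show the shape of the calculation.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_views(view_dates, launch):
    """Count views per week since launch (week 0 is the launch week).

    Assumes view_dates is non-empty and every date is on or after launch.
    """
    counts = Counter((d - launch).days // 7 for d in view_dates)
    return [counts.get(week, 0) for week in range(max(counts) + 1)]


def weeks_until_half(views_per_week):
    """Return the first week whose views are at most half of week 0, else None."""
    first = views_per_week[0]
    for week, views in enumerate(views_per_week):
        if views <= first / 2:
            return week
    return None


# Invented usage pattern: heavy at first, then tapering off.
launch = date(2021, 1, 4)
views = [
    launch + timedelta(days=7 * week)
    for week, n in enumerate([8, 5, 3, 1])
    for _ in range(n)
]

per_week = weekly_views(views, launch)
print(per_week, weeks_until_half(per_week))
```

With the invented numbers, usage drops to half of its launch-week level by week 2.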
Interesting. Interesting. So the half-life of a report with regard to its actual utility seems to be somehow inversely correlated to the detail level of the specification that went into it. So the more you specify and the more certain you are that you got it right up front, and everybody's talking about it, thinking about it, crafting it, but they're not looking at the actual data. Yeah. I'm just making this up. I have a feeling it's probably true. Then the shorter the time the report will be considered actually valuable and will be used on a daily basis by the people running the company.
That's right. In other words, the simpler the report, the more it'll actually get used. Right? So the best example is, give me a sales report. Just tell me how much money I have made this week. Those types of reports, straight and simple, are going to get used a lot. But once you start saying, give me a sales report, but only show the top five teams, right, or the top five individuals and their sales, then it starts being used less, because other people might say, "Well damn, I need the top 10," or, "Oh, I only need my team." And so it doesn't meet everybody's requirements if you get it to be more detailed. And so as a result, it starts getting used less and less and less.
And eventually as well, when it comes to your standard report building process, you often find new data points, or you find irregularities in the data or something that fails your assumptions, and then you've got to revise it. And then you've got to create a new report in the future, and so forth. Everybody within an organization is constantly learning because of its data. And so that's why, if I had to put a number on it, I would say the half-life of a report is usually just about two weeks.
Wow. Wow. See, you spend a bunch of money. You spend even more time. You do all the specifications and you end up with something that lasts for two weeks, which means it wasn't providing much value in the two weeks either, because otherwise it would have been hung onto. It's fascinating. Well, I mean, I really like the way we're doing this. I actually brought it to some folks at Microsoft and asked, you have companies, your customers like Intel or Boeing, these are big companies where I'm sure the CEO would love to have a private process where they can ask questions of the business without depending on individuals in the business to give them answers. In fact, I call being a CEO being in the lonely minds club. We're assumed to have no hearts, and so we can't be the lonely hearts club, but the loneliness comes from the fact that no matter how you set up an organization, if you're at the top of it, your people are obliged to lie to you, whether they want to or not.
That is... The unvarnished truth doesn't know how to move to the singularity at the top of a company. But the data itself contains somewhere in it, the unvarnished truth. So why not sit with somebody? And I like to do it every day. I think that's kind of the sensible amount of time to spend in iteration, right? To ask direct questions of the data.
But as a CEO, I'm not going to learn the tools. And you know me, I'm not the least toolsy guy in the world. Right. I built a little bit of code in my life and that kind of thing. But when I watch you with Power BI, I can say, "Hey Tom, what do you think, instead of just having the Y-axis be the number of dials and the X-axis be the duration of the navigation of a dial, what if we looked at the actual volume by multiplying those two together? And then plotting that against something else, whatever it is we want to plot it against. What do you think?" And you'll just go, "Sure, absolutely. I can do that. Hang on a second." And you'll go click and some things will happen. And here we are. It's very important that we're screen sharing at the time, and I'll get a visual on that instantly. And I might see something in it and you might see something in it, or it could be nothing, but it didn't cost a lot of time and nobody had to write a specification and it'll spark curiosity.
So it's almost like, if data is the new oil, you don't want to just go drill where some bunch of people walking around on the ground said, "Well, we found oil in a place once where there was a mesquite tree and there was a cow nearby and it was noon." So here's a mesquite tree and here's a cow and it's noon, let's drill here and spend the next six weeks drilling a hole in the ground and then find out there's no oil down there. You want to drill a ways and sniff around. I think in the oil business, they use neutron activation analysis to do this correctly. And then you want to steer the drill toward the more promising oil. And if you're running into really hard rock and you can't get through it, well, maybe you want to go another direction. Right. Is that a reasonable analogy for this kind of thing?
Yeah, I would say so. I mean, the biggest issue, I think, with your traditional method of upstream reporting is that people can easily fudge the numbers and tell a different viewpoint of that story. I wouldn't say lying, but you can just conveniently forget a filter or hide a filter somewhere, or present a completely separate view of what's actually going on in the world. Right. And I mean, any good data analyst will know exactly how to fudge the numbers to make the numbers look good. And I've done that for other executives as well, right? Usually for middle and senior managers like directors or senior directors, often I would present to them the data and the results. And they say, "Oh, no, no, this doesn't look good. Help me make it look better." Right. That's the issue with your standard method of reporting.
But ultimately, if your data is recorded correctly, assuming there's not any sort of catastrophic failure in your technology stack, data doesn't lie. Right. And so as the CEO, you have a fiduciary duty to do what's best for the organization as a whole. So why accept anything less than the actual truth? I mean, the truth might not look good, but how can you make good business decisions if you're not presented the absolute truth? Right. So that's why I think the traditional way of reporting does need some sort of reform. But the one thing that I would be a little bit concerned about is just how many CEOs out there are really willing to commit, let's say, an hour each day going through the data with the data concierge.
Because I genuinely mean this, Chris, I think you are one of the hard-working CEOs who actually give a shit, and pardon my language, because lots of CEOs out there just want people to do the work and they're not intellectually curious themselves. And so if you are going to utilize a data concierge service, you have to be intellectually curious and you have to understand your business very well.
Yeah. That's interesting. I don't know if I'm special in this regard. I actually think, here's my hypothesis and my observation about CEOs: CEOs have a hard time getting the truth out of anywhere. And so sometimes they despair of getting it at all. And so... But I do believe that, and I know very few exceptions among the CEOs that I know. And by the way, Market Dominance Guys is all about a CEO audience, right? This is about folks who want to dominate markets, and middle managers don't get to dominate markets. Maybe they get to play. Maybe they get to sort of be the CEO of their own world. General managers are always CEOs.
Some people are CEOs who carry funny titles and you just kind of go, is that really a CEO? Like I would say, Matt McCorkell over at Kaeser Compressors, he carries this title of manager of branch operations. Does that sound like a CEO? No, but I guarantee it, Matt McCorkell is a CEO. I've worked with him, and he's driving for improved results holistically for the company within the constraints as he sees them and believes them. And he is relentlessly curious. So what I find is the curiosity is there, but the pick has been blunted on the hard rock of trying to get to actionable truth that you can believe. Because you're making big bets. You're making big bets. Here's a big bet you and I are talking about, which is, I'll call it the direct number bet, right. Do we have enough information in our system about navigation times (we thought it was direct numbers, but it's really navigation times) in order to automatically choose the best possibility of the ones on offer for trying to reach somebody?
And I find when we're doing that, and I think this might be a little difference between me and some other folks, I find it super helpful to have analogies. Analogies are soft, but there is an old experiment that was done where folks are asked to figure out, from the values on some playing cards and a rule, whether the rule is actually being followed or not. Like all face cards have an odd number on the other side or something like that. Right. And people have a heck of a time reasoning through stuff like that. But if you take the same problem and you express it in terms of, there are some people at a table in a restaurant and the waiter or waitress has got to figure out who's of drinking age or not, who they can serve the drink to, and you put the same problem in those words, exactly the same mathematical problem, everybody can reason instantly.
It's the phone number thing you were talking about. My example is, okay, so you're trying to get enough eggs in order to make this recipe. And the recipe calls for lots of eggs, maybe 12 dozen eggs. And you know stores that carry the eggs and you know the navigation time to get to the store: how long it takes on average, and the midpoint too, 50% longer, 50% shorter. That's called the median, for folks who don't like these sorts of things. Right. And I have two forks, two ways I can go. Well, if I don't know if the store is open at all and I can't call and find out, and then once I get there, if I don't know if they have eggs today or not, what's my best strategy?
Well, the best strategy is always take the fastest route so if not this store, you have some time left to go to another one because you only have so much time to get anything done. That analogy is easy to think through. Okay. As soon as I'm concrete and I have a road and I'm in my car and then get to choose the long way or the short way, and then it's like, "Well, let's go find the short ways." And always choose them if we can. Have you found that you find yourself needing to explain to somebody that you're working with in terms of an analogy, so they can think through something because they can't do it with the playing cards, but they can do it when they're the waiter or the waitress?
[crosstalk 00:34:20] problem.
Absolutely. And that's such an important thing. I mean, it's not as much of a skill set when you're a traditional data analyst, because you're often not the one telling the story, but if you want to be, in my opinion, a top-notch data analyst, you need to have a good enough business background to be able to convert these data concepts into analogies as well. Because the long story short is that 90% of the people in the room are not going to be as strong when it comes to data science as you are. That's why you work in data and your audience does not, right? They are business stakeholders. And so that's why it's very important to be able to translate from numbers into English.
And it's funny that we bring this topic up because historically I've always worked with data scientists who are extremely smart in their fields, but they don't know how to properly convey the end result. And so as a result, they lead meetings where they just end up speaking gibberish. But people assume that what they're speaking is correct, because holy crap, this guy's using a lot of big words and statistical concepts, he must be smart, right. But at the end of the day, what are you here to do? You're here to drive business value and you want to make sure that your audience can understand the value. And so analogies are a great way to be able to deliver on some of those results.
Well, and I would think also there's one more thing, which is, we'll go back to that truth thing. So here I am a CEO trying to figure out what's the next great move to make for this company? And also how can I avoid screwing up in some really bad way that I'm going to regret? So we're looking at this stuff together and I come up with an analogy. And one of the things that I really enjoy about working with you is you don't just accept the analogy. You'll point out where it's flawed.
Well, it could be like that... You're very gentle about it, by the way, which I think comes from those piano lessons for the four-year-olds and stuff like that. So I get to be the four-year-old and I go, "Well, is it like this, teacher?" And you go, "Well, it's almost like that. But if you want the chord to sound better, move this finger over one of these keys that [inaudible 00:36:38], and then it will be a major chord, and for this part of the song it sounds better because it's happy. Whereas this other one sounded kind of sad and anxious." Or something. Can you hear the difference, right? You're very... Bedside manner, I'll call it, but still with brutal truthfulness. Like, "No, Mr. CEO, that analogy doesn't cut it, it's wrong." That seems like an important skill.