AI Research

How do financial executives respond to the use of artificial intelligence in financial reporting and auditing?

Review of Accounting Studies

Financial reporting quality can benefit from companies and auditors using artificial intelligence (AI) in complex and subjective financial reporting areas. However, benefits will only accrue if managers incorporate AI-based information into their financial reporting decisions, which the popular press and academic literature suggest is uncertain. We use a multi-method approach to examine how financial executives view and respond to AI. In a survey, respondents describe various uses of AI at their companies, spanning from simple to complex functions. While managers are not averse to the use of AI by their companies or their auditors, they appear to be uncertain about how auditors’ use of AI will directly benefit their companies. In an experiment that manipulates whether a company and/or its auditor use AI, managers whose companies use AI record larger audit adjustments for a complex accounting estimate when the …

Author(s)	Cassandra Estep, Emily E Griffith, Nikki L MacKenzie
Date	2024
Topic	audit analytics, ai in financial reporting, decision support systems

Biometrics, Privacy, and Authentication

Biometrics and Neuroscience Research in Business and Management: Advances and Applications

Biometric data hold the potential for untold benefits for people, from personalized medicine to the rapid deployment of medical care. In addition to health-oriented applications, biometric data can support business operations, for example by improving the efficiency of marketing. While the benefits may be more readily apparent, what are the potential downsides to the collection and use of biometric data? In this chapter, we discuss the opportunities that exist for using biometric data as well as the risks associated with its collection. Drawing on lessons from businesses’ monetization of consumer data, we draw parallels between the unintended consequences of permitting unchecked data collection to support business operations and the opportunistic collection of biometric data.

Author(s)	David Schweidel
Date	2024
Topic	biometric data analysis, personalized medicine, data monetization

TM-OKC: AN UNSUPERVISED TOPIC MODEL FOR TEXT IN ONLINE KNOWLEDGE COMMUNITIES.

MIS Quarterly

Online knowledge communities (OKCs), such as question-and-answer sites, have become increasingly popular venues for knowledge sharing. Accordingly, it is necessary for researchers and practitioners to develop effective and efficient text analysis tools to understand the massive amount of user-generated content (UGC) on OKCs. Unsupervised topic modeling has been widely adopted to extract human-interpretable latent topics embedded in texts. These identified topics can be further used in subsequent analysis and managerial practices. However, existing generic topic models that assume documents are independent are inappropriate for analyzing OKCs where structural relationships exist between questions and answers. Thus, a new method is needed to fill this research gap. In this study, we propose a new topic model specifically designed for the text in OKCs. We make three primary contributions to the …

Author(s)	Dongcheng Zhang, Kunpeng Zhang, Yi Yang, David A Schweidel
Date	2024
Topic	unsupervised topic modeling, text analysis, Natural Language Processing

Moving Beyond ChatGPT: Applying Large Language Models in Marketing Contexts

NIM Marketing Intelligence Review

When OpenAI introduced ChatGPT to the public, people were amazed by what large language models (LLMs), the type of generative AI behind the chat-like surface, were able to produce. Even if LLMs might seem like sentient machines, they should more appropriately be viewed as “stochastic parrots” or eager-to-please interns. But despite their apparent prowess, LLMs are not trained for a particular context. Should the text attract new customers? Engage current customers? Will it be used for direct mail or a blog post? The intended use of the text will ultimately dictate what makes for successful content. If we could filter previously developed content and use only that which has been deployed successfully for a particular task, we could try to recreate the formula for success. This is not wishful thinking, but rather offers an accessible approach to tailoring LLMs for marketing applications that we have successfully demonstrated in the search engine marketing process.

Author(s)	David A Schweidel, Martin Reisenbichler, Thomas Reutterer
Date	2024
Topic	Large Language Models, Generative AI, content filtering

The creator economy: An introduction and a call for scholarly research

Bloggers, streamers, artists, celebrities, musicians and service providers are just a few examples of creators who aim to monetize their talent by generating and posting digital content. Aided by technological platforms and AI tools, they form a complex and dynamic ecosystem of economic activity, estimated to be worth over $100 billion and growing rapidly. In this editorial we explore the creator economy from a marketing perspective, addressing questions such as: How can creators optimize their content, establish their brand, build their content composition, and expand their audience? How do platforms create the right mix of creators and curate their content? What challenges and opportunities are presented for traditional firms? We define the basic terminology and identify key stakeholders. We propose research questions related to creators, consumers, firms, and platforms, and discuss the implications for …

Author(s)	Renana Peres, Martin Schreier, David A Schweidel, Alina Sorescu
Date	2024
Topic	content optimization, audience analysis, platform curation

Frontiers in Operations: Valuing Nursing Productivity in Emergency Departments

Manufacturing & Service Operations Management

Problem definition: We quantify the increase in productivity in emergency departments (EDs) from increasing nurse staff. We then estimate the associated revenue gains for the hospital and the associated welfare gains for society. The United States is over a decade into the worst nursing shortage crisis in history fueled by chronic underinvestment. To demonstrate to hospital managers and policymakers the benefits of investing in nursing, we clarify the positive downstream effects of doing so in the ED setting. Methodology/results: We use a high-resolution data set of patient visits to the ED of a major U.S. academic hospital. Time-dependent hazard estimation methods (nonparametric and parametric) are used to study how the real-time service speed of a patient varies with the state of the ED, including the time-varying workloads of the assigned nurse. A counterfactual simulation is used to estimate the gains from …

Author(s)	Hao Ding, Sokol Tushe, Diwas Singh KC, Donald KK Lee
Date	2024
Topic	time-dependent hazard estimation, nonparametric methods, parametric methods

Boosted generalized normal distributions: Integrating machine learning with operations knowledge

arXiv (preprint)

Applications of machine learning (ML) techniques to operational settings often face two challenges: i) ML methods mostly provide point predictions whereas many operational problems require distributional information; and ii) they typically do not incorporate the extensive body of knowledge in the operations literature, particularly the theoretical and empirical findings that characterize specific distributions. We introduce a novel and rigorous methodology, the Boosted Generalized Normal Distribution (bGND), to address these challenges. The Generalized Normal Distribution (GND) encompasses a wide range of parametric distributions commonly encountered in operations, and bGND leverages gradient boosting with tree learners to flexibly estimate the parameters of the GND as functions of covariates. We establish bGND's statistical consistency, thereby extending this key property to special cases studied in the ML literature that lacked such guarantees. Using data from a large academic emergency department in the United States, we show that the distributional forecasting of patient wait and service times can be meaningfully improved by leveraging findings from the healthcare operations literature. Specifically, bGND performs 6% and 9% better than the distribution-agnostic ML benchmark used to forecast wait and service times, respectively. Further analysis suggests that these improvements translate into a 9% increase in patient satisfaction and a 4% reduction in mortality for myocardial infarction patients. Our work underscores the importance of integrating ML with operations knowledge to enhance distributional forecasts.

Author(s)	Ragip Gürlek, Francis de Vericourt, Donald KK Lee
Date	2024
Topic	boosted generalized normal distribution, distributional forecasting, gradient boosting

Realtime, multimodal invasive ventilation risk monitoring using language models and BoXHED

arXiv (preprint)

Author(s)	Arash Pakbin, Aaron Su, Donald KK Lee, Bobak J Mortazavi
Date	2024
Topic	Natural Language Processing, Computer Vision, Reinforcement Learning

Modeling the evolution of customer balances

SSRN (preprint)

Customer balances are critical drivers of customer lifetime value in numerous industries such as banking, asset management, brokerage, and financial technology. However, no prior research focuses on accurately projecting the evolution of individual-level customer balances. We propose the first model specifically suited to this task, addressing the unique empirical challenges associated with balance-driven businesses. Our model leverages the empirical regularity that the distribution of non-zero customer balances is remarkably similar to a log-Laplace distribution, implying that incorporating this parametric knowledge yields greater efficiency and goodness-of-fit. The proposed parametric machine learning model does this by modeling customer balances with a log-Laplace distribution, allowing the parameters of the Laplace distribution to vary flexibly as a function of customer covariates. Using data from a major US bank, we demonstrate that our approach outperforms both purely parametric and non-parametric alternatives. Furthermore, the model offers valuable insights for marketers, such as identifying which customer segments to prioritize for acquisition. For example, we find that younger and older customers tend to generate higher revenues, and customers without credit history are more valuable than those with low credit scores. The proposed approach provides a robust, portable foundation for improving customer acquisition and retention strategies in balance-driven industries.

Author(s)	Ragip Gürlek, Daniel McCarthy, Stephen Samaha, Rex Du, Donald KK Lee
Date	2024
Topic	parametric modeling, customer segmentation, predictive analytics

Diversity in Frontline Employee Perceptions: Policies and Procedures, Training, and Leadership as Drivers of Service Equality

Production and Operations Management

Author(s)	Eve D Rosenzweig, Ken Kelley, Elliot Bendoly
Date	2024
Topic	Natural Language Processing, Computer Vision, Reinforcement Learning

Humans’ Use of AI-Assistance: The Effect of Loss Aversion on Willingness to Delegate Decisions

We conduct an experiment in which subjects classify images. Subjects are presented with an image and must then select the set of image keywords that best represents the image. Subjects are presented with 20 images for practice and 40 for monetary compensation. We randomly assign participants to either monetary incentives framed as an opportunity for gain or monetary incentives framed as an opportunity for loss. Participants are given the option to delegate the image classification to a human expert or an AI if they do not want to make the selection on their own. In this study, we measure participants' delegation decisions as well as their situational awareness.

Author(s)	Joseph Buckman, Jesse Bockstedt
Date	2024
Topic	image classification, human-ai interaction, decision-making

Preferential Latent Space Models for Networks with Textual Edges

arXiv (preprint)

Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. Other examples include co-author networks and social media networks. The useful textual information carried in the edges is often discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we propose to represent the text document between each pair of nodes as a vector counting the appearances of keywords extracted from the corpus, and introduce a new and flexible preferential latent space network model that can offer direct insights on how contents of the textual exchanges modulate the relationships between nodes. We establish identifiability conditions for the proposed model and tackle model estimation with a computationally efficient projected gradient descent algorithm. We further derive the non-asymptotic error bound of the estimator from each step of the algorithm. The efficacy of our proposed method is demonstrated through simulations and an analysis of the Enron email network.

Author(s)	Maoyu Zhang, Biao Cai, Dong Li, Xiaoyue Niu, Jingfei Zhang
Date	2024
Topic	keyword extraction, latent space modeling, network analysis

Fast community detection in dynamic and heterogeneous networks

Journal of Computational and Graphical Statistics

Dynamic heterogeneous networks describe the temporal evolution of interactions among nodes and edges of different types. While there is a rich literature on finding communities in dynamic networks, the application of these methods to dynamic heterogeneous networks can be inappropriate, due to the involvement of different types of nodes and edges and the need to treat them differently. In this article, we propose a statistical framework for detecting common communities in dynamic and heterogeneous networks. Under this framework, we develop a fast community detection method called DHNet that can efficiently estimate the community label as well as the number of communities. An attractive feature of DHNet is that it does not require the number of communities to be known a priori, a common assumption in community detection methods. While DHNet does not require any parametric assumptions on the …

Author(s)	Maoyu Zhang, Jingfei Zhang, Wenlin Dai
Date	2024
Topic	community detection, dynamic networks, heterogeneous networks

On difference-based gradient estimation in nonparametric regression

Statistical Analysis and Data Mining: The ASA Data Science Journal

We propose a framework to directly estimate the gradient in multivariate nonparametric regression models that bypasses fitting the regression function. Specifically, we construct the estimator as a linear combination of adjacent observations with the coefficients from a vector-valued difference sequence, so it is more flexible than existing methods. Under the equidistant designs, closed-form solutions of the optimal sequences are derived by minimizing the estimation variance, with the estimation bias well controlled. We derive the theoretical properties of the estimators and show that they achieve the optimal convergence rate. Further, we propose a data-driven tuning parameter-selection criterion for practical implementation. The effectiveness of our estimators is validated via simulation studies and a real data application.

Author(s)	Maoyu Zhang, Wenlin Dai
Date	2024
Topic	nonparametric regression, gradient estimation, estimation variance

Fast robust location and scatter estimation: a depth-based method

Technometrics

The minimum covariance determinant (MCD) estimator is ubiquitous in multivariate analysis, the critical step of which is to select a subset of a given size with the lowest sample covariance determinant. The concentration step (C-step) is a common tool for subset-seeking; however, it becomes computationally demanding for high-dimensional data. To alleviate the challenge, we propose a depth-based algorithm, termed FDB, which replaces the optimal subset with the trimmed region induced by statistical depth. We show that the depth-based region is consistent with the MCD-based subset under a specific class of depth notions, for instance, the projection depth. With the two suggested depths, the FDB estimator is not only computationally more efficient but also reaches the same level of robustness as the MCD estimator. Extensive simulation studies are conducted to assess the empirical performance of our …

Author(s)	Maoyu Zhang, Yan Song, Wenlin Dai
Date	2024
Topic	minimum covariance determinant, multivariate analysis, statistical depth

Learning Brain Connectivity in Social Cognition with Dynamic Network Regression

The Annals of Applied Statistics

The supplementary materials provide the extended models with time-varying covariates and low-rank covariate effects, the simulation results of parameter tuning, the sensitivity analysis under model misspecifications, the computational cost of DNetReg, and additional results from real data analysis.

Author(s)	Maoyu Zhang, Biao Cai, Wenlin Dai, Dehan Kong, Hongyu Zhao, Jingfei Zhang
Date	2024
Topic	time-varying covariates, low-rank covariate effects, parameter tuning

From Clicks to Returns: Website Browsing and Product Returns

SSRN (preprint)

Online retailers are challenged by frequent product returns, which amount to a staggering annual value of nearly $1 trillion in the US alone (The New Yorker 2023). While existing research has focused on managing returns using a purchase/return framework, we explore how prepurchase customer activities on retailers’ websites can improve product return management. We demonstrate that such information provides important insights and can inform retailers’ return management strategies. Using data from a large European apparel retailer, we propose and estimate a joint model of customer search, purchase, and returns. The model-free evidence and our empirically based customer-journey model consistently show how specific customer browsing patterns are linked to product returns. More specifically, we find that purchasing the last clicked product, browsing fewer products, using filters, and browsing a more focused variety of products are linked to a lower return probability. Using our model, we show how strategic adjustments of product visibility on the website can improve retailers’ overall performance.

Author(s)	Marat Ibragimov, Siham El Kihal, John R Hauser
Date	2024
Topic	joint modeling, customer journey analysis, predictive analytics

The Spillover Effect of Fraudulent Reviews on Product Recommendations

Management Science

As the prevalence of user-generated reviews has been growing, the pervasiveness of fraudulent reviews has been increasing as well. In an effort to alleviate the consequences of fraudulent reviews, platforms have been using machine-learning algorithms for fraudulent review detection. However, the current business practice of simply removing fraudulent reviews might not be sufficient, as even their temporary presence might create spillover effects that propagate through other shopping tools. In particular, we examine and discover a long-lasting, significant adverse impact of fraudulent reviews through their propagation to recommender systems, even long after successfully detecting and removing all fraud incidents. We conduct additional analyses further examining the intensity and evolution of the spillover effect over time across different dimensions, such as the cost of the fraudulent activity, the …

Author(s)	Panagiotis Adamopoulos
Date	2024
Topic	fraudulent review detection, recommender systems, spillover effects

Consumer Social Connectedness and Persuasiveness of Collaborative-Filtering Recommender Systems: Evidence From an Online-to-Offline Recommendation App

Production and Operations Management

Consumers often rely on their social connections or social technologies, such as (automated) system-generated recommender systems, to navigate the proliferation of diverse products and services offered in online and offline markets and cope with the corresponding choice overload. In this study, we investigate the relationship between the consumers’ social connectedness and the economic impact of recommender systems. Specifically, we examine whether the social connectedness levels of consumers moderate the effectiveness of online recommendations toward increasing product demand levels. We study this novel research question using a combination of datasets and a demand-estimation model. Interestingly, the empirical results show a positive moderating effect of social connectedness on the demand effect of online-to-offline …

Author(s)	Panagiotis Adamopoulos, Vilma Todri
Date	2024
Topic	recommender systems, demand estimation, social network analysis

The Impact of Generative AI on Advertising Effectiveness

The advent of generative artificial intelligence (genAI) is reshaping industries, including advertising, where its ability to generate ads is gaining traction. However, debates persist regarding whether genAI can outperform human experts, and if so, to what extent and in which tasks it excels. Through secondary data analysis and lab experiments, this study investigates the effectiveness of genAI in ad creation and modification compared to human experts. Our findings suggest that while generative AI-“modified” ads do not outperform human experts’ ads, generative AI-“created” ads do. We argue that this indicates the proficiency of visual generative AI in creation tasks but its limitations in modification tasks. Additionally, AI can enhance product package design, demonstrating its effectiveness in creation and ideation tasks. The study contributes empirical evidence on AI’s impact on advertising and sheds light on its role across different task levels.

Author(s)	Hyesoo Lee, Panagiotis Adamopoulos, Vilma Georgia Todri, Anindya Ghose
Date	2024
Topic	generative adversarial networks, visual generative AI, ad creation

The Impact of Generative Artificial Intelligence on Higher Education: Disruption or Seamless Integration?

SSRN (preprint)

As higher education stands at the crossroads of tradition and technological innovation, generative artificial intelligence (AI) presents unprecedented opportunities and challenges. This research seeks to unravel the complexities of generative AI's impact, exploring whether its integration into higher education disrupts traditional modes of teaching or enhances educational practices and outcomes. In this research, I examine student performance across four courses that vary the roles of AI and human instructors in content generation and delivery. I found that students achieved an average of 5.7% more points on quizzes after attending a purely human-generated and delivered course compared to students who attended a purely AI-generated and delivered course. Furthermore, students who attended a hybrid human-generated and AI-delivered course gained, on average, 4.3 additional points compared to a pure human-generated and delivered course. Finally, students who attended the hybrid AI-generated and human-delivered course received, on average, 2.7 fewer points when compared to a purely AI-generated and delivered course. Thus, human-generated content is superior to AI-generated content for higher education, whereas AI-generated delivery (voice and avatar) can enhance students' learning. I further discuss the opportunities and implications of generative AI in higher education.

Author(s)	Rajiv Garg
Date	2024
Topic	Generative AI, educational technology, hybrid learning

Segmenting Bitcoin Transactions for Price Movement Prediction

Journal of Risk and Financial Management

Cryptocurrencies like Bitcoin have received substantial attention from financial exchanges. Unfortunately, arbitrage-based financial market price prediction models are ineffective for cryptocurrencies. In this paper, we utilize standard machine learning models and publicly available transaction data in blocks to predict the direction of Bitcoin price movement. We illustrate our methodology using data we merged from the Bitcoin blockchain and various online sources. This gave us the Bitcoin transaction history (block IDs, block timestamps, transaction IDs, senders’ addresses, receivers’ addresses, transaction amounts), as well as the market exchange price, for the period from 13 September 2011 to 5 May 2017. We show that segmenting publicly available transactions based on investor typology helps achieve higher prediction accuracy compared to the existing Bitcoin price movement prediction models in the literature. This transaction segmentation highlights the role of investor types in impacting financial markets. Managerially, the segmentation of financial transactions helps us understand the role of financial and cryptocurrency market participants in asset price movements. These findings provide further implications for risk management, financial regulation, and investment strategies in this new era of digital currencies.

Author(s)	Yuxin Zhang, Rajiv Garg, Linda L Golden, Patrick L Brockett, Ajit Sharma
Date	2024
Topic	price prediction, transaction segmentation, investor typology

What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts

arXiv (preprint)

We examine how large language models (LLMs) interpret historical stock returns and compare their forecasts with estimates from a crowd-sourced platform for ranking stocks. While stock returns exhibit short-term reversals, LLM forecasts over-extrapolate, placing excessive weight on recent performance similar to humans. LLM forecasts appear optimistic relative to historical and future realized returns. When prompted for 80% confidence interval predictions, LLM responses are better calibrated than survey evidence but are pessimistic about outliers, leading to skewed forecast distributions. The findings suggest LLMs manifest common behavioral biases when forecasting expected returns but are better at gauging risks than humans.

Author(s)	Shuaiyu Chen, T Clifton Green, Huseyin Gulen, Dexin Zhou
Date	2024
Topic	Large Language Models, forecasting, behavioral biases

Alternative Data in Active Asset Management

Alternative data are data gathered from nontraditional sources beyond company filings and analyst research. Alternative data are crucial in investing, offering unique insights and competitive advantages. The demand for alternative data has skyrocketed in the past two decades, due to regulatory changes and the growing importance of intangible assets such as intellectual property. Alternative data cover various sources, including firm-released information, government-released information, information about investor attention and trading, and third-party information. However, the alternative data landscape is constantly evolving due to alpha decay, technological advancements, regulatory changes, and market efficiency. These challenges require investors to continuously adapt their strategies, discover new data sources, and develop sophisticated analysis techniques to maintain an edge in an increasingly data-driven financial world.

Author(s)	T Clifton Green, Shaojun Zhang
Date	2024
Topic	alternative data analysis, data sourcing techniques, investment strategy optimization

Downstream task-oriented generative model selections on synthetic data training for fraud detection models

arXiv (preprint)

Devising procedures for downstream task-oriented generative model selection is an unresolved problem of practical importance. Existing studies have focused on the utility of a single family of generative models. They provide limited insight into how synthetic data practitioners should select the best family of generative models for synthetic training tasks given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selection problem in the case of training fraud detection models and investigate the best practice under different combinations of model interpretability and model performance constraints. Our investigation supports that, while Neural Network (NN)-based and Bayesian Network (BN)-based generative models are both well suited to the synthetic training task under a loose model interpretability constraint, BN-based generative models are better than NN-based ones when training fraud detection models on synthetic data under a strict model interpretability constraint. Our results provide practical guidance for machine learning practitioners who are interested in replacing their real training datasets with synthetic ones, and shed light on more general downstream task-oriented generative model selection problems.

Author(s)	Yinan Cheng, Chi-Hua Wang, Vamsi K Potluru, Tucker Balch, Guang Cheng
Date	2024
Topic	Generative AI models, synthetic data, fraud detection

LLM-driven Imitation of Subrational Behavior: Illusion or Reality?

arXiv (preprint)

Modeling subrational agents, such as humans or economic households, is inherently challenging due to the difficulty in calibrating reinforcement learning models or collecting data that involves human subjects. Existing work highlights the ability of Large Language Models (LLMs) to address complex reasoning tasks and mimic human communication, while simulation using LLMs as agents shows emergent social behaviors, potentially improving our comprehension of human conduct. In this paper, we propose to investigate the use of LLMs to generate synthetic human demonstrations, which are then used to learn subrational agent policies through Imitation Learning. We make an assumption that LLMs can be used as implicit computational models of humans, and propose a framework to use synthetic demonstrations derived from LLMs to model subrational behaviors that are characteristic of humans (e.g., myopic behavior or preference for risk aversion). We experimentally evaluate the ability of our framework to model sub-rationality through four simple scenarios, including the well-researched ultimatum game and marshmallow experiment. To gain confidence in our framework, we are able to replicate well-established findings from prior human studies associated with the above scenarios. We conclude by discussing the potential benefits, challenges and limitations of our framework.

Author(s)	Andrea Coletta, Kshama Dwarakanath, Penghang Liu, Svitlana Vyetrenko, Tucker Balch
Date	2024
Topic	Large Language Models, imitation learning, synthetic data generation

ABIDES-Economist: Agent-Based Simulation of Economic Systems with Learning Agents

arXiv (preprint)

We introduce a multi-agent simulator for economic systems comprised of heterogeneous Households, heterogeneous Firms, Central Bank and Government agents, that could be subjected to exogenous, stochastic shocks. The interaction between agents defines the production and consumption of goods in the economy alongside the flow of money. Each agent can be designed to act according to fixed, rule-based strategies or learn their strategies using interactions with others in the simulator. We ground our simulator by choosing agent heterogeneity parameters based on economic literature, while designing their action spaces in accordance with real data in the United States. Our simulator facilitates the use of reinforcement learning strategies for the agents via an OpenAI Gym style environment definition for the economic system. We demonstrate the utility of our simulator by simulating and analyzing two hypothetical (yet interesting) economic scenarios. The first scenario investigates the impact of heterogeneous household skills on their learned preferences to work at different firms. The second scenario examines the impact of a positive production shock to one of two firms on its pricing strategy in comparison to the second firm. We aspire that our platform sets a stage for subsequent research at the intersection of artificial intelligence and economics.

Author(s)	Kshama Dwarakanath, Svitlana Vyetrenko, Peyman Tavallali, Tucker Balch
Date	2024
Topic	Reinforcement Learning, multi-agent systems, economic simulation

Six Levels of Privacy: A Framework for Financial Synthetic Data

arXiv (preprint)

Synthetic Data is increasingly important in financial applications. In addition to the benefits it provides, such as improved financial modeling and better testing procedures, it poses privacy risks as well. Such data may arise from client information, business information, or other proprietary sources that must be protected. Even though the process by which Synthetic Data is generated serves to obscure the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of "levels" of privacy that are useful for categorizing Synthetic Data generation methods and the progressively improved protections they offer. While the six levels were devised in the context of financial applications, they may also be appropriate for other industries. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, privacy risks, and privacy attacks. We close with details of the "Six Levels" that include defenses against those attacks.

Author(s)	Tucker Balch, Vamsi K Potluru, Deepak Paramanand, Manuela Veloso
Date	2024
Topic	synthetic data generation, privacy preservation, financial modeling

Atlas-X Equity Financing: Unlocking New Methods to Securely Obfuscate Axe Inventory Data Based on Differential Privacy

arXiv (preprint)

Banks publish a daily list of available securities/assets (axe list) to selected clients to help them effectively locate Long (buy) or Short (sell) trades at reduced financing rates. This reduces costs for the bank, as the list aggregates the bank's internal firm inventory per asset for all clients of long as well as short trades. However, this is somewhat problematic: (1) the bank's inventory is revealed; (2) trades of clients who contribute to the aggregated list, particularly those deemed large, are revealed to other clients. Clients conducting sizable trades with the bank and possessing a portion of the aggregated asset exceeding are considered to be concentrated clients. This could potentially reveal a concentrated client's trading activity to their competitors, thus providing an unfair advantage over the market. Atlas-X Axe Obfuscation, powered by new differentially private methods, enables a bank to obfuscate its published axe list on a daily basis while under continual observation, thus maintaining an acceptable inventory Profit and Loss (P&L) cost pertaining to the noisy obfuscated axe list while reducing the clients' trading activity leakage. Our main differential privacy innovation is a differentially private aggregator for streams (time series data) of both positive and negative integers under continual observation. For the last two years, the Atlas-X system has been live in production across three major regions (USA, Europe, and Asia) at J.P. Morgan, a major financial institution, facilitating significant profitability. To our knowledge, it is the first differential privacy solution to be deployed in the financial sector. We also report benchmarks of our algorithm based on …

Author(s)	Antigoni Polychroniadou, Gabriele Cipriani, Richard Hua, Tucker Balch
Date	2024
Topic	Differential Privacy, time series data, data obfuscation

Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark

arXiv (preprint)

Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, which is a critical task across many domains, spanning healthcare, finance, climate, energy, and many more. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, encompassing both univariate and multivariate forms. We introduce a comprehensive taxonomy of time series features, a critical framework that delineates various characteristics inherent in time series data. Leveraging this taxonomy, we have systematically designed and synthesized a diverse dataset of time series, embodying the different outlined features. This dataset acts as a solid foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models readily comprehend effectively and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of points queried within a series and the overall time series length.

Author(s)	Elizabeth Fons, Rachneet Kaur, Soham Palande, Zhen Zeng, Svitlana Vyetrenko, Tucker Balch
Date	2024
Topic	Large Language Models, time series analysis, univariate time series

HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

arXiv (preprint)

A myriad of different Large Language Models (LLMs) face a common challenge in contextually analyzing table question-answering tasks. These challenges are engendered from (1) finite context windows for large tables, (2) multi-faceted discrepancies amongst tokenization patterns against cell boundaries, and (3) various limitations stemming from data confidentiality in the process of using external models such as gpt-3.5-turbo. We propose a cooperative game dubbed "HiddenTables" as a potential resolution to this challenge. In essence, "HiddenTables" is played between the code-generating LLM "Solver" and the "Oracle" which evaluates the ability of the LLM agents to solve Table QA tasks. This game is based on natural language schemas and importantly, ensures the security of the underlying data. We provide evidential experiments on a diverse set of tables that demonstrate an LLM's collective inability to generalize and perform on complex queries, handle compositional dependencies, and align natural language to programmatic commands when concrete table schemas are provided. Unlike encoder-based models, we have pushed the boundaries of "HiddenTables" to not be limited by the number of rows - therefore we exhibit improved efficiency in prompt and completion tokens. Our infrastructure has spawned a new dataset "PyQTax" that spans across 116,671 question-table-answer triplets and provides additional fine-grained breakdowns & labels for varying question taxonomies. Therefore, in tandem with our academic contributions regarding LLMs' deficiency in TableQA tasks, "HiddenTables" is a tactile manifestation of how LLMs …

Author(s)	William Watson, Nicole Cho, Tucker Balch, Manuela Veloso
Date	2024
Topic	Large Language Models, table question-answering, Natural Language Processing

LETS-C: Leveraging Language Embedding for Time Series Classification

arXiv (preprint)

Recent advancements in language modeling have shown promising results when applied to time series data. In particular, fine-tuning pre-trained large language models (LLMs) for time series classification tasks has achieved state-of-the-art (SOTA) performance on standard benchmarks. However, these LLM-based models have a significant drawback due to the large model size, with the number of trainable parameters in the millions. In this paper, we propose an alternative approach to leveraging the success of language modeling in the time series domain. Instead of fine-tuning LLMs, we utilize a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and multilayer perceptron (MLP). We conducted extensive experiments on well-established time series classification benchmark datasets. We demonstrated LETS-C not only outperforms the current SOTA in classification accuracy but also offers a lightweight solution, using only 14.5% of the trainable parameters on average compared to the SOTA model. Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification while maintaining a lightweight model architecture.

Author(s)	Rachneet Kaur, Zhen Zeng, Tucker Balch, Manuela Veloso
Date	2024
Topic	language modeling, time series classification, language embedding

Distributionally and adversarially robust logistic regression via intersecting Wasserstein balls

arXiv (preprint)

Empirical risk minimization often fails to provide robustness against adversarial attacks in test data, causing poor out-of-sample performance. Adversarially robust optimization (ARO) has thus emerged as the de facto standard for obtaining models that hedge against such attacks. However, while these models are robust against adversarial attacks, they tend to suffer severely from overfitting. To address this issue for logistic regression, we study the Wasserstein distributionally robust (DR) counterpart of ARO and show that this problem admits a tractable reformulation. Furthermore, we develop a framework to reduce the conservatism of this problem by utilizing an auxiliary dataset (e.g., synthetic, external, or out-of-domain data), whenever available, with instances independently sampled from a nonidentical but related ground truth. In particular, we intersect the ambiguity set of the DR problem with another Wasserstein ambiguity set that is built using the auxiliary dataset. We analyze the properties of the underlying optimization problem, develop efficient solution algorithms, and demonstrate that the proposed method consistently outperforms benchmark approaches on real-world datasets.

Author(s)	Aras Selvi, Eleonora Kreacic, Mohsen Ghassemi, Vamsi Potluru, Tucker Balch, Manuela Veloso
Date	2024
Topic	adversarial robust optimization, empirical risk minimization, logistic regression

Empirical Equilibria in Agent-based Economic systems with Learning agents

arXiv (preprint)

We present an agent-based simulator for economic systems with heterogeneous households, firms, central bank, and government agents. These agents interact to define production, consumption, and monetary flow. Each agent type has distinct objectives, such as households seeking utility from consumption and the central bank targeting inflation and production. We define this multi-agent economic system using an OpenAI Gym-style environment, enabling agents to optimize their objectives through reinforcement learning. Standard multi-agent reinforcement learning (MARL) schemes, like independent learning, enable agents to learn concurrently but do not address whether the resulting strategies are at equilibrium. This study integrates the Policy Space Response Oracle (PSRO) algorithm, which has shown superior performance over independent MARL in games with homogeneous agents, with economic agent-based modeling. We use PSRO to develop agent policies approximating Nash equilibria of the empirical economic game, thereby linking to economic equilibria. Our results demonstrate that PSRO strategies achieve lower regret values than independent MARL strategies in our economic system with four agent types. This work aims to bridge artificial intelligence, economics, and empirical game theory towards future research.

Author(s)	Kshama Dwarakanath, Svitlana Vyetrenko, Tucker Balch
Date	2024
Topic	multi-agent reinforcement learning, policy space response oracle

Ensemble Methods for Sequence Classification with Hidden Markov Models

arXiv (preprint)

We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency. These models are particularly effective in domains such as finance and biology, where traditional methods struggle with high feature dimensionality and varied sequence lengths. Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in scenarios with data imbalance, where the negative class is the majority (e.g., normal data) and the positive class is the minority (e.g., anomalous data), often with extreme distribution skews. We propose a novel training approach for HMM Ensembles that generalizes to multi-class problems and supports classification and anomaly detection. Our method fits class-specific groups of diverse models using random data subsets, and compares likelihoods across classes to produce composite scores, achieving high average precisions and AUCs. In addition, we compare our approach with neural network-based methods such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce environments. Motivated by real-world use cases, our method demonstrates robust performance across various benchmarks, offering a flexible framework for diverse applications.

Author(s)	Maxime Kawawa-Beaudan, Srijan Sood, Soham Palande, Ganapathy Mani, Tucker Balch, Manuela Veloso
Date	2024
Topic	hidden markov models, ensemble methods, sequence classification
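The abstract above describes comparing likelihoods across class-specific models. A minimal sketch of that idea follows, reduced to one discrete-emission HMM per class rather than an ensemble; the function names, the `(start, trans, emit)` tuple layout, and the toy parameters are illustrative assumptions, not the authors' implementation.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for discrete emissions.

    Assumes the sequence has nonzero probability under the model.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    log_p = math.log(total)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        # Propagate one step through the transition matrix, then weight by emission.
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
        total = sum(alpha)
        log_p += math.log(total)
        alpha = [a / total for a in alpha]
    return log_p

def anomaly_score(obs, normal_hmm, anomalous_hmm):
    """Log-likelihood ratio; positive values favor the anomalous class."""
    return log_likelihood(obs, *anomalous_hmm) - log_likelihood(obs, *normal_hmm)

# Hypothetical two-state models over symbols {0, 1}: (start, trans, emit).
trans = [[0.9, 0.1], [0.1, 0.9]]
normal_hmm = ([0.5, 0.5], trans, [[0.9, 0.1], [0.8, 0.2]])
anomalous_hmm = ([0.5, 0.5], trans, [[0.1, 0.9], [0.3, 0.7]])
```

Because the score is a difference of log-likelihoods computed by the same recursion, sequences of different lengths can be scored without padding. An ensemble version would presumably fit several such models per class on random data subsets and combine their scores.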

Auditing and Enforcing Conditional Fairness via Optimal Transport

arXiv (preprint)

Conditional demographic parity (CDP) is a measure of the demographic parity of a predictive model or decision process when conditioning on an additional feature or set of features. Many algorithmic fairness techniques exist to target demographic parity, but CDP is much harder to achieve, particularly when the conditioning variable has many levels and/or when the model outputs are continuous. The problem of auditing and enforcing CDP is understudied in the literature. In light of this, we propose novel measures of conditional demographic disparity (CDD) which rely on statistical distances borrowed from the optimal transport literature. We further design and evaluate regularization-based approaches based on these CDD measures. Our methods, FairBiT and FairLeap, allow us to target CDP even when the conditioning variable has many levels. When model outputs are continuous, our methods target full equality of the conditional distributions, unlike other methods that only consider first moments or related proxy quantities. We validate the efficacy of our approaches on real-world datasets.

Author(s)	Mohsen Ghassemi, Alan Mishler, Niccolo Dalmasso, Luhao Zhang, Vamsi K Potluru, Tucker Balch, Manuela Veloso
Date	2024
Topic	Conditional Demographic Parity, algorithmic fairness, statistical distances
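To make the notion of a transport-based conditional disparity concrete, here is a rough sketch that audits model outputs with a one-dimensional Wasserstein-1 distance inside each level of the conditioning variable. The frequency weighting, the binary group coding (0/1), and both function names are assumptions for illustration; the abstract does not specify the exact distance or aggregation the authors use.

```python
from collections import defaultdict

def wasserstein_1d(a, b):
    """W1 between two empirical samples: integral of |F_a(x) - F_b(x)| dx."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    dist = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Advance the empirical CDF counters up to the left edge of the interval.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        dist += abs(i / len(a) - j / len(b)) * (right - left)
    return dist

def conditional_disparity(outputs, groups, levels):
    """Frequency-weighted average over levels of W1 between the two groups.

    Levels in which one group has no samples are skipped (an assumption).
    """
    by_level = defaultdict(lambda: {0: [], 1: []})
    for y, g, z in zip(outputs, groups, levels):
        by_level[z][g].append(y)
    n = len(outputs)
    total = 0.0
    for parts in by_level.values():
        if parts[0] and parts[1]:
            weight = (len(parts[0]) + len(parts[1])) / n
            total += weight * wasserstein_1d(parts[0], parts[1])
    return total
```

A value of zero means the two groups' output distributions coincide within every level that contains both groups, which is stricter than matching their means.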

Limited or Biased: Modeling Subrational Human Investors in Financial Markets

Journal of Behavioral Finance

Human decision-making in real life deviates significantly from optimal decisions made by fully rational agents, primarily due to computational limitations or psychological biases. While existing studies in psychology and economics have discovered various types of human limitations and biases, a comprehensive framework to transfer these findings into models of subrational investors in financial markets is lacking. In this study, we introduce a unified framework that uses reinforcement learning (RL) to incorporate five different aspects of human subrationality: bounded rationality, myopic behavior, prospect-biased behavior, and optimistic and pessimistic behaviors. Unlike data-driven approaches, our model is trained based on a high-fidelity multi-agent market simulator, which is not limited by the availability of subrational investor trading data. Our framework demonstrates investment behavior that is …

Author(s)	Penghang Liu, Kshama Dwarakanath, Svitlana S Vyetrenko, Tucker Balch
Date	2024
Topic	Reinforcement Learning, multi-agent systems, financial modeling

Fairwasp: Fast and optimal fair wasserstein pre-processing

Proceedings of the AAAI Conference on Artificial Intelligence

Recent years have seen a surge of machine learning approaches aimed at reducing disparities in model outputs across different subgroups. In many settings, training data may be used in multiple downstream applications by different users, which means it may be most effective to intervene on the training data itself. In this work, we present FairWASP, a novel pre-processing approach designed to reduce disparities in classification datasets without modifying the original data. FairWASP returns sample-level weights such that the reweighted dataset minimizes the Wasserstein distance to the original dataset while satisfying (an empirical version of) demographic parity, a popular fairness criterion. We show theoretically that integer weights are optimal, which means our method can be equivalently understood as duplicating or eliminating samples. FairWASP can therefore be used to construct datasets which can be fed into any classification method, not just methods which accept sample weights. Our work is based on reformulating the pre-processing task as a large-scale mixed-integer program (MIP), for which we propose a highly efficient algorithm based on the cutting plane method. Experiments demonstrate that our proposed optimization algorithm significantly outperforms state-of-the-art commercial solvers in solving both the MIP and its linear program relaxation. Further experiments highlight the competitive performance of FairWASP in reducing disparities while preserving accuracy in downstream classification settings.

Author(s)	Zikai Xiong, Niccolò Dalmasso, Alan Mishler, Vamsi K Potluru, Tucker Balch, Manuela Veloso
Date	2024
Topic	fair machine learning, data preprocessing, demographic parity

Method and system for synthetic event series generation

A method for using multidimensional Hawkes processes for modeling and generating sequential events data in order to improve accuracy with respect to parameter estimation for various domains such as finance, epidemiology, and personalized recommendations is provided. The method includes: receiving information that relates to a sequence of events; modeling, based on the received information, the sequence of events by a multidimensional Hawkes process that relates to a conditional density function that includes a base intensity component and a cross-activation matrix component; defining, based on the conditional density function, a log-likelihood function that is dimensionally separable; and determining a maximum-likelihood estimation of a solution to the log-likelihood function along at least one dimension. The maximum-likelihood estimation may be determined by adapting a Frank-Wolfe algorithm by …

Author(s)	ZHAO Renbo, Niccolo Dalmasso, Mohsen Ghassemi, Vamsi Krishna Potluru, Tucker Richard Balch, Manuela Veloso
Date	2024
Topic	multidimensional hawkes processes, parameter estimation, maximum-likelihood estimation

Method and system for detecting anomalous behavior in stream data

A method and a system for detecting an anomalous sequence of events in stream data are provided. The method includes: receiving a first set of raw data; analyzing the first set of raw data in order to determine a first event sequence; applying a first Hidden Markov Model (HMM) to the first event sequence in order to generate a first output; and determining, based on the first output, whether the first event sequence is classifiable as being an anomalous event sequence. The HMM is trained by using known sequences of normal events and event sequences that are known to be anomalous.

Author(s)	Tucker Richard Balch, Veronica MEJIA BUSTAMANTE, CHO Nicole, Matthew Howard, Maxime Kawawa-beaudan, MANI Ganapathy, Ivan Rankenburg, Andrew J Schrager, Srijan Sood, VANN Jared, Manuela Veloso
Date	2024
Topic	hidden markov model, anomaly detection, event sequence analysis

Method and system for obtaining conditional demographic parity through optimal transport in data-driven model

A method and a system for obtaining conditional demographic parity in the construction of a data-driven model are provided. The method includes: identifying features associated with the model; determining a first joint distribution of model outputs and a feature based on a first level of a particular one of the features and a second joint distribution of model outputs and a feature based on a second level of the particular feature; computing a bi-causal transport distance between the first joint distribution and the second joint distribution; computing a regularizer based on the bi-causal transport distance; and applying the regularizer to the model.

Author(s)	Luhao Zhang, Mohsen Ghassemi, Ivan Brugere, Niccolo Dalmasso, Alan Mishler, Vamsi Krishna Potluru, Tucker Richard Balch, Manuela Veloso
Date	2024
Topic	Conditional Demographic Parity, bi-causal transport distance, model regularization

Method and system for ensuring privacy protection for datasets using space partitioning techniques

Preserving privacy of individuals while publishing a dataset for public use is a known challenge. A de facto standard for privacy is differential privacy (DP), which is widely used in the literature and in practice. Many existing results on differential privacy aim to preserve quality of answers for a certain class of queries for a survey. However, a more general problem studies the release of a differentially private synthetic dataset that can be used for downstream tasks without additional privacy leaks. Recent work mainly applies generative adversarial networks (GAN) and utilizes divergence metrics such as Jensen-Shannon divergence and Wasserstein distance as a metric of quality to compare synthetic and original datasets. Another class of utility metrics is based on kernels, e.g., distance in reproducing kernel Hilbert space (RKHS) or similarly maximum mean discrepancy (MMD). The …

Author(s)	Navid Nouri, Eleonora Kreacic, Vamsi Krishna Potluru, Tucker Richard Balch, Manuela Veloso
Date	2024
Topic	Differential Privacy, generative adversarial networks, Jensen-Shannon divergence

Shining a Light on Hurricane Damage Estimation via Nighttime Light Data: Pre-Processing Matters

Amidst escalating climate change, hurricanes are inflicting severe socioeconomic impacts, marked by heightened economic losses and increased displacement. Previous research utilized nighttime light data to predict the impact of hurricanes on economic losses. However, prior work did not provide a thorough analysis of the impact of combining different techniques for pre-processing nighttime light (NTL) data. Addressing this gap, our research explores a variety of NTL pre-processing techniques, including value thresholding, built masking, and quality filtering and imputation, applied to two distinct datasets, VSC-NTL and VNP46A2, at the zip code level. Experiments evaluate the correlation of the denoised NTL data with economic damages of Category 4-5 hurricanes in Florida. They reveal that the quality masking and imputation technique applied to VNP46A2 shows a substantial correlation with economic damage …

Author(s)	Nancy Thomas, Saba Rahimi, Annita Vapsi, Cathy Ansell, Elizabeth Christie, Daniel Borrajo, Tucker Balch, Manuela Veloso
Date	2024
Topic	nighttime light data processing, data pre-processing techniques, correlation analysis

Method and system for differentially private learning of hawkes processes

A method for preserving privacy with respect to modeling event sequence data is provided. The method includes: receiving information about a sequence of events; modeling the event sequence by a Hawkes process that has an intensity that includes an exogenous base intensity rate and an endogenous component that has an excitation rate and a decay rate; analyzing the received information; and determining estimated values of the exogenous base intensity rate and the excitation rate, such that an accuracy of the estimates corresponds to a length of time over which the sequence of events is observed. Differential privacy is introduced by adding noise to the sequence of events in order to preserve the privacy of individuals associated with the events, and a cost of the differential privacy is expressible as an additional length of observation time required to ensure the accuracy of the estimates.

Author(s)	Mohsen Ghassemi, Eleonora Kreacic, Niccolo Dalmasso, Vamsi Krishna Potluru, Tucker Richard Balch, Manuela Veloso
Date	2024
Topic	hawkes process, Differential Privacy, event sequence modeling
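For readers unfamiliar with the intensity described in this abstract, the sketch below evaluates a univariate Hawkes intensity with a base rate, an excitation rate, and a decay rate. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are common textbook choices assumed here; the abstract does not state the exact functional form, and the noise-adding privacy step is not shown.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity at time t: mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    mu    -- base intensity rate (events arriving without a trigger)
    alpha -- excitation rate (jump in intensity caused by each past event)
    beta  -- decay rate (how quickly that jump fades)
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# Hypothetical usage: one event at time 0, intensity evaluated one time unit later.
example = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0)
```

Only events strictly before `t` contribute, so the intensity relaxes back toward `mu` as past events recede.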

Method and system for optimal stopping using fast probabilistic learning algorithms

A method for using a Gaussian Process-based algorithm to approximate an optimal stopping of a time series that corresponds to a sequence of events is provided. The method includes: receiving information that relates to an event sequence; estimating, based on the received information, a first potential reward that is obtained by stopping the event sequence at a first time, and a set of respective second potential rewards that are obtained by stopping the event sequence at corresponding times; and determining, based on the estimated first and second potential rewards, an optimal time for stopping the event sequence. The event sequence may include a numerical sequence that is modeled as a statistical learning method via a Gaussian Process (GP) function and/or a deep GP function that indicates a probability density distribution of the items in the numerical sequence over a predetermined time interval.

Author(s)	Kshama Dwarakanath, Danial Dervovic, Peyman Tavallali, Svitlana Vyetrenko, Tucker Richard Balch
Date	2024
Topic	gaussian process, optimal stopping, time series analysis

Method and system for simulation of limit order book markets

A method for using an artificial intelligence (AI) model to simulate a limit order book market in order to facilitate study and evaluation of trading strategies is provided. The method includes: receiving information that relates to a state of the market at a particular time; and determining, based on the information, a potential market action that is expected to occur. The determination is made by applying an AI algorithm that implements a machine learning technique to determine the potential market action. The AI algorithm is trained by using historical data that relates to the market.

Author(s)	Andrea Coletta, Svitlana Vyetrenko, Tucker Richard Balch
Date	2024
Topic	limit order book simulation, trading strategy evaluation, historical data analysis

Method and system for forecasting time series by image inpainting

Methods and systems for using images that represent time-series data to forecast corresponding images depicting future values of the time-series data are provided. The method includes: receiving a set of time-series data; converting the set of time-series data into a partial first image that includes a blank region to which future data to be included in the first set of time-series data corresponds; and performing an inpainting operation with respect to the partial first image by generating pixels for filling in the blank region in order to produce an augmented version of the first image. A machine learning algorithm that is trained by using historical time-series data may be used to perform the inpainting operation.

Author(s)	Manuela Veloso, Zhen Zeng, Naftali Y Cohen, Srijan Sood, Jacob Reinier Maat, Tucker Richard Balch
Date	2024
Topic	time-series forecasting, image inpainting, Computer Vision

EXPRESS: Consumer Social Connectedness and Persuasiveness of Collaborative-Filtering Recommender Systems: Evidence from an Online-to-Offline Recommendation App

Production and Operations Management

Consumers often rely on their social connections or social technologies, such as (automated) system-generated recommender systems, to navigate the proliferation of diverse products and services offered in online and offline markets and cope with the corresponding choice overload. In this study, we investigate the relationship between the consumers’ social connectedness and the economic impact of recommender systems. Specifically, we examine whether the social connectedness levels of consumers moderate the effectiveness of online recommendations toward increasing product demand levels. We study this novel research question using a combination of datasets and a demand-estimation model. Interestingly, the empirical results show a positive moderating effect of social connectedness on the demand effect of online-to-offline …

Author(s)	Panagiotis Adamopoulos, Vilma Todri
Date	2024
Topic	recommender systems, demand estimation, social network analysis

Applied AI for finance and accounting: Alternative data and opportunities

Big data and artificial intelligence (AI) have transformed the finance industry by altering the way data and information are generated, processed, and incorporated into decision-making processes. Data and information have emerged as a new class of assets, facilitating efficient contracting and risk-sharing among corporate stakeholders. Researchers have also increasingly embraced machine learning and AI analytics tools, which enable them to exploit empirical evidence to an extent that far surpasses traditional methodologies. In this review article, prepared for a special issue on Artificial Intelligence (AI) and Finance in the Pacific-Basin Finance Journal, we aim to provide a summary of the evolving landscape of AI applications in finance and accounting research and project future avenues of exploration. Given the burgeoning mass of literature in this field, it would be unproductive to attempt an exhaustive catalogue …

Author(s)	Sean Shun Cao, Wei Jiang, Lijun Lei, Qing Zhou
Date	2024
Topic	ai analytics, empirical evidence exploitation, risk-sharing mechanisms

Explore Groundbreaking AI Research By Goizueta Faculty Experts

How do financial executives respond to the use of artificial intelligence in financial reporting and auditing?

Review of Accounting Studies

Biometrics, Privacy, and Authentication

Biometrics and Neuroscience Research in Business and Management: Advances and Applications

TM-OKC: AN UNSUPERVISED TOPIC MODEL FOR TEXT IN ONLINE KNOWLEDGE COMMUNITIES.

MIS Quarterly

Moving Beyond ChatGPT: Applying Large Language Models in Marketing Contexts

NIM Marketing Intelligence Review

The creator economy: An introduction and a call for scholarly research

Frontiers in Operations: Valuing Nursing Productivity in Emergency Departments

Manufacturing & Service Operations Management

Boosted generalized normal distributions: Integrating machine learning with operations knowledge

arXiv (preprint)

Realtime, multimodal invasive ventilation risk monitoring using language models and BoXHED

arXiv (preprint)

Modeling the evolution of customer balances

SSRN (preprint)

Diversity in Frontline Employee Perceptions: Policies and Procedures, Training, and Leadership as Drivers of Service Equality

Production and Operations Management

Humans’ Use of AI-Assistance: The Effect of Loss Aversion on Willingness to Delegate Decisions

Preferential Latent Space Models for Networks with Textual Edges

arXiv (preprint)

Fast community detection in dynamic and heterogeneous networks

Journal of Computational and Graphical Statistics

On difference-based gradient estimation in nonparametric regression

Statistical Analysis and Data Mining: The ASA Data Science Journal

Fast robust location and scatter estimation: a depth-based method

Technometrics

Learning Brain Connectivity in Social Cognition with Dynamic Network Regression

The Annals of Applied Statistics

From Clicks to Returns: Website Browsing and Product Returns

SSRN (preprint)

The Spillover Effect of Fraudulent Reviews on Product Recommendations

Management Science

Consumer Social Connectedness and Persuasiveness of Collaborative-Filtering Recommender Systems: Evidence From an Online-to-Offline Recommendation App

Production and Operations Management

The Impact of Generative AI on Advertising Effectiveness

The Impact of Generative Artificial Intelligence on Higher Education: Disruption or Seamless Integration?

SSRN (preprint)

Segmenting Bitcoin Transactions for Price Movement Prediction

Journal of Risk and Financial Management

What Does ChatGPT Make of Historical Stock Returns? Extrapolation and Miscalibration in LLM Stock Return Forecasts

arXiv (preprint)

Alternative Data in Active Asset Management

Downstream task-oriented generative model selections on synthetic data training for fraud detection models

arXiv (preprint)

LLM-driven Imitation of Subrational Behavior: Illusion or Reality?

arXiv (preprint)

ABIDES-Economist: Agent-Based Simulation of Economic Systems with Learning Agents

arXiv (preprint)

Six Levels of Privacy: A Framework for Financial Synthetic Data

arXiv (preprint)

Atlas-X Equity Financing: Unlocking New Methods to Securely Obfuscate Axe Inventory Data Based on Differential Privacy

arXiv (preprint)

Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark

arXiv (preprint)

HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

arXiv (preprint)

LETS-C: Leveraging Language Embedding for Time Series Classification

arXiv (preprint)

Distributionally and adversarially robust logistic regression via intersecting Wasserstein balls

arXiv (preprint)

Empirical Equilibria in Agent-based Economic systems with Learning agents

arXiv (preprint)

Ensemble Methods for Sequence Classification with Hidden Markov Models

arXiv (preprint)

Auditing and Enforcing Conditional Fairness via Optimal Transport

arXiv (preprint)

Limited or Biased: Modeling Subrational Human Investors in Financial Markets

Journal of Behavioral Finance