Microsoft has provided its regular weekly update on improvements for Bing AI, and there are some impressive strides forward this week.
Neowin spotted that the chatbot now has reduced latency spikes when it comes to certain answers, meaning you won’t be hanging around as long for a reply in these cases.
Fun fact: internally, we are most excited about something majority of people find boring. Yesterday we released a completely reworked backend for inner monologue, reducing time to first token by ~25%, and, far more importantly, making latency more stable, reducing spikes: pic.twitter.com/E0zBZ3lHyY (June 29, 2023)
As Mikhail Parakhin, Microsoft’s head of Advertising and Web Services, explained on Twitter, this is the result of Bing AI getting a “completely reworked backend for inner monologue”, which in plain terms means the chatbot’s ‘thinking’ process has been streamlined.
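To make the two measures in the tweet concrete, here is a minimal sketch of how a typical time to first token and a latency spike could be summarized from a set of samples. The function name `latency_summary` and every number below are invented for illustration; they are not Bing measurements, and this is not how Microsoft necessarily tracks latency.

```python
import statistics


def latency_summary(ttft_seconds):
    """Summarize time-to-first-token samples (in seconds).

    Returns the median (the typical wait), the worst sample (the spike),
    and how many times larger the spike is than the typical wait.
    """
    median = statistics.median(ttft_seconds)
    worst = max(ttft_seconds)
    return {"median": median, "worst": worst, "spike_ratio": worst / median}


# Hypothetical samples: made-up numbers, purely illustrative.
before = [2.0, 2.0, 2.0, 2.0, 8.0]   # one large spike
after = [1.5, 1.5, 1.5, 1.5, 2.0]    # faster and steadier

if __name__ == "__main__":
    b, a = latency_summary(before), latency_summary(after)
    reduction = 1 - a["median"] / b["median"]
    print(f"median reduced by {reduction:.0%}")
    print(f"spike ratio: {b['spike_ratio']:.2f}x -> {a['spike_ratio']:.2f}x")
```

In this toy data the median drops by a quarter, but the bigger change is that the slowest reply is no longer several times slower than the typical one, which is the kind of stability the tweet calls "far more important".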
A second benefit this week is better handling of sports-related queries, so the range of topics covered in this field is now considerably broader, from the NHL to the FIFA Women’s World Cup.
Analysis: The importance of being snappy
A snappier Bing AI is vital. Whatever query you throw at the chatbot, you want the response to arrive with a minimum of delay. If you find yourself tapping your foot (or perhaps your finger on the keyboard) while waiting for Bing to get back to you, that’s off-putting, and it might remind you of live chats, where waits can be frustrating if a help agent is dealing with multiple simultaneous queries.
Responsiveness is a key area for Bing to do well in (alongside, of course, the accuracy and usefulness of the response delivered, which is always paramount). So Microsoft is quite right to keep honing performance on this front.
That said, you can still be subject to waits that aren’t the fault of the backend (the inner workings of Bing as it processes queries), but are purely due to traffic spikes. When lots of people are using the chatbot, things get more sluggish simply because of the volume it has to cope with.
Incidentally, Parakhin elaborated on this in the above Twitter thread, noting that the major peaks of usage occur when folks arrive at work, at three main times: when the US East coast arrives, then when Europe comes to work, then the Far East. Those peaks fall at 7am, 5pm and 1am PDT, if you’re curious.
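For anyone who wants to check what those PDT times mean locally, the conversion can be sketched in a few lines of Python. This is an illustration under stated assumptions: the article names regions rather than cities, so the three zones below (US Eastern, Central European and Japanese time) are stand-ins, and the fixed UTC offsets are the summer-time values in effect in late June 2023.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time UTC offsets (assumed; valid for late June 2023).
PDT = timezone(timedelta(hours=-7), "PDT")

# Illustrative stand-ins for the three regions mentioned in the thread.
LOCAL_ZONES = {
    "US East coast (EDT)": timezone(timedelta(hours=-4), "EDT"),
    "Central Europe (CEST)": timezone(timedelta(hours=2), "CEST"),
    "Japan (JST)": timezone(timedelta(hours=9), "JST"),
}

PEAK_HOURS_PDT = [7, 17, 1]  # 7am, 5pm and 1am PDT, as quoted


def local_hours(peak_hour_pdt):
    """Return {zone label: local hour (0-23)} for one peak hour given in PDT."""
    moment = datetime(2023, 6, 29, peak_hour_pdt, tzinfo=PDT)
    return {
        label: moment.astimezone(tz).hour for label, tz in LOCAL_ZONES.items()
    }


if __name__ == "__main__":
    for hour in PEAK_HOURS_PDT:
        print(f"{hour:02d}:00 PDT ->", local_hours(hour))
```

Under these assumptions, 7am PDT is 10am on the US East coast, 1am PDT is 10am in central Europe, and 5pm PDT is 9am in Japan, which suggests the three quoted times are listed in clock order rather than in the same order as the regions.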