by brittlewis12 899 days ago
you can absolutely access and continue all your past chats in cnvrs!

would love to hear what you think: https://testflight.apple.com/join/ERFxInZg


EDIT: Attempting to converse with any Q4_K_M 7B parameter model on a 15 Pro Max... the phone just melts down. It feels like it is producing about one token per minute. MLC-Chat can handle 7B parameter models just fine even on a 14 Pro Max, which has less RAM, so I think there is an issue here.

EDIT 2: Even using StableLM, I am experiencing a total crash of the app fairly consistently if I chat in one conversation, then start a new conversation and try to chat in that. On a related note, since chat history is saved... I don't think it's necessary to have a confirmation prompt if the user clicks the "new chat" shortcut in the top right of a chat.

-----

That does seem much nicer than MLC Chat. I really like the selection of models and saving of conversations.

It looks like you’re still using the old version of TinyLlama. The 1.0 release is out now: https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGU...

Microsoft recently re-licensed Phi-2 to be MIT instead of non-commercial, so I would love to see that in the list of models. Similarly, there is a Dolphin-Phi fine tune.

The topic of discussion here is Mistral-7B v0.2, which is also missing from the model list, unfortunately. There are a few Mistral fine tunes in the list, but obviously not the same thing.

I also wish I could enable performance metrics to see how many tokens/sec the model was running at after each message, and to see how much RAM is being used.
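For reference, a tokens/sec readout like the one requested above can be sketched in a few lines. This is only an illustration, not how cnvrs works: `measure_tokens_per_second` and its `clock` parameter are hypothetical names, and the token stream stands in for whatever the inference backend yields.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_tokens_per_second(
    token_stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, int, float]:
    """Consume a stream of generated tokens and report throughput.

    Hypothetical helper: `token_stream` is any iterable of token strings,
    and `clock` is injectable so the timing can be tested deterministically.
    """
    start = clock()
    pieces = []
    for token in token_stream:
        pieces.append(token)
    elapsed = clock() - start

    count = len(pieces)
    # Guard against a zero-length interval (empty or instantaneous stream).
    rate = count / elapsed if elapsed > 0 else 0.0
    return "".join(pieces), count, rate
```

The rate here includes prompt processing time if the stream only starts yielding after the prompt is evaluated, so a real readout might time the two phases separately.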

On the whole, this app seems really nice!

Wow, thanks so much for taking the time to test it out and share such great feedback!

Thrilled about all those developments! More model options as well as link-based GGUF downloads on the way.

On the 7b models: I’m very sorry for the poor experience. I wouldn’t recommend 7b over Q2_K at the moment, unless you’re on a 16GB iPad (or an Apple Silicon Mac!). This needs to be much clearer; as you observed, the consequences can be severe. The larger models, and even 3b Q6_K, can be crash-prone due to memory pressure. Will work on improving the handling of low-level out-of-memory errors very soon.

Will also investigate the StableLM crashes, I’m sorry about that! Hopefully TestFlight recorded a trace. Just speculating, but it may be a similar issue to the larger models: the higher-fidelity quant (Q6_K) combined with the context length eventually running out of RAM. Could you give the Q4_K_M a shot? I heard something similar from a friend yesterday, so I’m curious if you have a better time with that. Perhaps that’s a more sensible default.

Re: the overly-protective new chat alert, I agree, thanks for the suggestion. I’ll incorporate that into the next build. Can I credit you? Let me know how you’d like for me to refer to you, and I’d be happy to.

Finally, please feel free to email me any further feedback, and thanks again for your time and consideration!

britt [at] bl3 [dot] dev

I just checked and MLC Chat is running the 3-bit quantized version of Mistral-7B. It works fine on the 14 Pro Max (6GB RAM) without crashing, and is able to stay resident in memory on the 15 Pro Max (8GB RAM) when switching with another not-too-heavy app. 2-bit quantization just feels like a step too far, but I’ll give it a try.
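As a rough back-of-envelope check on the numbers above, the weights-only footprint of a 7B model at different quantization levels can be estimated as parameters times bits per weight. This sketch assumes nominal bit widths (2, 3, 4, 6); real GGUF K-quants and MLC's formats store extra scale data, so actual files are somewhat larger. The function name `weight_gib` is made up for illustration.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Assumption: every weight costs exactly `bits_per_weight` bits.
    The KV cache, activations, and runtime overhead come on top of this.
    """
    return n_params * bits_per_weight / 8 / 2**30


PARAMS_7B = 7e9  # nominal parameter count for a "7B" model

if __name__ == "__main__":
    for bits in (2, 3, 4, 6):
        print(f"{bits}-bit 7B: ~{weight_gib(PARAMS_7B, bits):.2f} GiB of weights")
```

Keep in mind that iOS does not hand an app all of the device's physical RAM, so the usable budget on a 6GB or 8GB phone is noticeably smaller than the headline figure.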

Regarding credit, I definitely don’t need any. Just happy to see someone working on a better LLM app!

FYI, just submitted a new update for review with a few small but hopefully noticeable changes, thanks in no small part to your feedback:

1. StableLM Zephyr 3b Q4_K_M is now the built-in model, replacing the Q6_K variant.

2. More aggressive RAM headroom calculation, with forced fallback to CPU rather than failing to load or crashing.

3. New status indicator for Metal when model is loaded (filled bolt for enabled, vs slashed bolt for disabled.)

4. Metal will now also be enabled for devices with 4GB RAM or less, but only when the selected model can comfortably fit in RAM. Previously, only devices with at least 6GB had Metal enabled.
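The headroom check and CPU fallback described in points 2 and 4 could look roughly like the sketch below. This is not the app's actual code: `LoadPlan`, `plan_model_load`, and the 25% default headroom are all assumptions made for illustration, and a real implementation would query the OS for available memory.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    use_gpu: bool  # True: load with GPU (Metal) acceleration; False: CPU only
    reason: str


def plan_model_load(
    model_bytes: int,
    available_bytes: int,
    headroom_fraction: float = 0.25,  # assumed safety margin, not the app's value
) -> LoadPlan:
    """Decide whether to enable the GPU or fall back to CPU.

    The model must fit inside the available RAM minus a reserved headroom;
    otherwise the GPU is disabled instead of failing to load or crashing.
    """
    if not 0.0 <= headroom_fraction < 1.0:
        raise ValueError("headroom_fraction must be in [0, 1)")

    budget = available_bytes * (1.0 - headroom_fraction)
    if model_bytes <= budget:
        return LoadPlan(True, "model fits within the RAM budget")
    return LoadPlan(False, "model exceeds the RAM budget; falling back to CPU")
```

Basing the decision on the model fitting the budget, rather than on a fixed device RAM threshold, is what would let a 4GB device still get GPU acceleration for a small enough model.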

Thank you so much again for your time!

The fallback does seem to work! Although the 4-bit 7B models only run at 1 token every several seconds.

I still wish Phi-2, Dolphin Phi-2, and TinyLlama-Chat-v1.0 were available, but I understand you have plans to make it easier to download any model in the future.

4-bit StableLM and 2-bit 7B models do seem to be working more consistently.

That’s great to hear. I’m sorry again about that poor experience, and please do reach out if you have any other feedback!

Britt

My free / mostly open source app also stores conversation history, synced via iCloud

https://ChatOnMac.com

edit: I can't reply to you below: Do you have the right app, there's no TestFlight just App Store link - if it's ChatOnMac then it should have a dropdown at the top of the chat room to select a model. If it's empty or otherwise bugged out please let me know what you see in the top menu. It filters the available model presets based on how much RAM you have available, so let me know what specific device you have and I can look into it. Thank you.

The model presets are also configurable by forking the bot and loading your own via GitHub (bots run inside sandboxed hidden webviews inside the app). But this is not ergonomically friendly just yet.

I was excited when I saw this, but I'm having trouble with it (and it looks like I'm not the only one). As others have pointed out, the download link on your site does open TestFlight. I've since deleted that version and installed the official version from the App Store after revisiting this thread in search of answers.

I now have the full version installed on my iPhone 15 Pro, and I have added my OpenAI key, but none of the models I've selected (3.5 Turbo, 4, 4 Turbo) work. My messages in the chat have a red exclamation mark next to them, which opens an error message stating 'Load failed' when clicked. If I click 'Retry Message', the entire app crashes.

Apologies for the rough edges and bad experience - I’ve just soft launched without announcement until this post. I will have a hotfix up soon. Thanks for the report.

No stress. Best of luck!

> Do you have the right app, there's no TestFlight just App Store link

On chatonmac.com, the "Download on the App Store" button does not link to the App Store for me either - I get a modal titled "Public Beta & Launch Day News" with "Join the TestFlight Beta" and "Launch Day Newsletter Signup Form".

Hello, I like your app and the ethics you push forward. Do you plan to add the ability to request DALL-E 3 images within the chat? I’ve yet to find an app which does that and lets me use my own API key.

It’s planned. This is just the v1 MVP. I’ll have a hotfix out soon. Thanks for the suggestion and context.

Hey, I tried the TestFlight. What are the steps after a fresh download for hooking it up to a model?

I saw you can specify an OpenAI key, but presume it would also take llama or something else.

This is really nice to use. Especially compared to MLC. Well done!

Thank you so much for taking the time to try it out!